NETWORK MODELS AND ASSOCIATED APPLICATIONS
MATHEMATICAL PROGRAMMING STUDIES

Editor-in-Chief
R.W. COTTLE, Department of Operations Research, Stanford University, Stanford, CA 94305, U.S.A.

Co-Editors
L.C.W. DIXON, Numerical Optimisation Centre, The Hatfield Polytechnic, College Lane, Hatfield, Hertfordshire AL10 9AB, England
B. KORTE, Institut für Ökonometrie und Operations Research, Universität Bonn, Nassestrasse 2, D-5300 Bonn 1, W. Germany
T.L. MAGNANTI, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.
M.J. TODD, School of Operations Research and Industrial Engineering, Upson Hall, Cornell University, Ithaca, NY 14853, U.S.A.

Associate Editors
E.L. ALLGOWER, Colorado State University, Fort Collins, CO, U.S.A.
R. BARTELS, University of Waterloo, Waterloo, Ontario, Canada
V. CHVATAL, McGill University, Montreal, Quebec, Canada
J.E. DENNIS, Jr., Rice University, Houston, TX, U.S.A.
B.C. EAVES, Stanford University, CA, U.S.A.
R. FLETCHER, University of Dundee, Dundee, Scotland
J.-B. HIRIART-URRUTY, Université de Clermont II, Aubière, France
M. IRI, University of Tokyo, Tokyo, Japan
R.G. JEROSLOW, Georgia Institute of Technology, Atlanta, GA, U.S.A.
D.S. JOHNSON, Bell Telephone Laboratories, Murray Hill, NJ, U.S.A.
C. LEMARECHAL, INRIA-Laboria, Le Chesnay, France
L. LOVASZ, University of Szeged, Szeged, Hungary
L. MCLINDEN, University of Illinois, Urbana, IL, U.S.A.
M.W. PADBERG, New York University, New York, U.S.A.
M.J.D. POWELL, University of Cambridge, Cambridge, England
W.R. PULLEYBLANK, University of Calgary, Calgary, Alberta, Canada
K. RITTER, University of Stuttgart, Stuttgart, W. Germany
R.W.H. SARGENT, Imperial College, London, England
D.F. SHANNO, University of Arizona, Tucson, AZ, U.S.A.
L.E. TROTTER, Jr., Cornell University, Ithaca, NY, U.S.A.
H. TUY, Institute of Mathematics, Hanoi, Socialist Republic of Vietnam
R.J.B. WETS, University of Kentucky, Lexington, KY, U.S.A.
C. WITZGALL, National Bureau of Standards, Washington, DC, U.S.A.

Senior Editors
E.M.L. BEALE, Scicon Computer Services Ltd., Milton Keynes, England
G.B. DANTZIG, Stanford University, Stanford, CA, U.S.A.
L.V. KANTOROVICH, Academy of Sciences, Moscow, U.S.S.R.
T.C. KOOPMANS, Yale University, New Haven, CT, U.S.A.
A.W. TUCKER, Princeton University, Princeton, NJ, U.S.A.
P. WOLFE, IBM Research Center, Yorktown Heights, NY, U.S.A.
NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM · NEW YORK · OXFORD
MATHEMATICAL PROGRAMMING STUDY A PUBLICATION OF THE MATHEMATICAL PROGRAMMING SOCIETY
15
Network Models and Associated Applications

Edited by D. KLINGMAN and J.M. MULVEY

T.E. Baker, V. Balachandran, R.S. Barr, R.E. Brooks, R.L. Crum, R.S. Dembo, F. Glover, J.G. Klincewicz, D. Klingman, R.R. Love, Jr., D.J. Nye, V. Srinivasan, G.L. Thompson, J.S. Turner

1981

NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM · NEW YORK · OXFORD
© THE MATHEMATICAL PROGRAMMING SOCIETY - 1981
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
This book is also available in journal format on subscription.
ISBN: 0444 86203 X
Published by: NORTH-HOLLAND PUBLISHING COMPANY AMSTERDAM . NEW YORK . OXFORD
Sole distributors for the U.S.A. and Canada: Elsevier North-Holland, Inc. 52 Vanderbilt Avenue New York, N.Y. 10017
Library of Congress Cataloging in Publication Data

Main entry under title:
Network models and associated applications. (Mathematical programming study ; 15)
"A publication of the Mathematical Programming Society."
Includes bibliographies and index.
1. Network analysis (Planning) I. Baker, T.E. II. Mulvey, J.M. (John M.) III. Klingman, D. (Darwin) IV. Series.
T57.85.N467 001.4'24 81-3952
ISBN 0-444-86203-X (Elsevier North-Holland) AACR2
PRINTED IN THE NETHERLANDS
PREFACE

The past forty years have witnessed a marked increase in the acceptance of mathematical modeling and solution techniques by both business and government. This acceptance of the tools of mathematical programming is a direct result of changes in the size and complexities of decision problems, modeling techniques, computer hardware, mathematical solution algorithms, computer implementations, and management education. The eight papers in this Mathematical Programming Study illustrate how these changes are affecting the network flow field. In short, we tried to select a set of papers which collectively offer a view of where applied network programming is and where it is probably going.

To meet these objectives, we used two criteria in selecting papers. Specifically, each paper had to present a model which is currently being used or has strong potential of being used, or to respond to an algorithmic need generated by an application. Also, each paper had to originate in the field of network flows. The latter criterion reflects both a desire to keep the focus of this Mathematical Programming Study on a single field and the strong association between the attributes of the network flow field and the specification of the first criterion.

The network field has two distinguishing characteristics. First, it has been driven, in large part, by the challenge of practice. Historically, the field of network flows was one of the first, among many, within the broad category of mathematical programming to yield practical optimization models. For example, Kantorovich's Nobel Prize winning work deals with modeling machine production and distribution decisions as network models. Stemming from a need to solve these models, the field developed the first usable pencil and paper heuristics and optimization algorithms (e.g., Charnes, Cooper, Dantzig, Flood, Ford, Fulkerson, Hitchcock, Kantorovich, Koopmans, Kuhn [1-11]). Second, this field has proven to be an excellent indicator of things to come in other areas. Network solution algorithms were among the first developed; pioneering work in the implementation of computer optimization software took place with the development of a transportation code on the National Bureau of Standards' Eastern Automatic Computer in 1952. Today, network algorithms are continuing to lead the way in the design of more efficient computer implementations for mathematical programming.

The papers in this study demonstrate that this area continues to prosper. The first three articles present models which have been used extensively. The first paper by Barr and Turner sets forth an interesting nonintuitive transportation formulation of a microdata file merging problem. The size of their model
challenges one's imagination. Computational results on solving problems with 30 000 equations and 12 million variables on a Univac 1108 are presented. The second paper by T.E. Baker describes an integer generalized network model of a difficult process changeover problem. The author discusses a man/machine interactive solution which uses an extremely efficient generalized network code to solve each subproblem. In the third paper, Brooks develops generalized network models for natural gas pipeline distribution problems. This paper, like others in this Study, illustrates that the solution efficiency of network algorithms is dramatic indeed. For example, Brooks indicates that by exploiting the generalized network structure of the model he was able to reduce his solution costs by a factor of 40 to 1.

A review of the operator theory of parametric programming as applied to network problems appears as the fourth paper. Balachandran, Srinivasan, and Thompson describe how their approach is adaptable to many applications. In the fifth paper, Crum and Nye describe a generalized network model for managing cash flows within a large insurance company. This model has several unusual features which keep the problem in network form; otherwise, it would be difficult (if not impossible) to solve the problem in a practical setting. Traffic scheduling problems are likewise amenable to network models, as Love demonstrates in paper six. He employs Benders decomposition to isolate the network portion of the problem so that efficient network solution strategies can be fully exploited.

The last two papers involve computer solution strategies for network-related problems. First, the nonlinear separable network problem is solved by means of an approximate second-order method. Herein, Dembo and Klincewicz show the tractability of the concept by solving real-life problems drawn from water resource planning. The general linear programming problem is investigated in the last paper. Many researchers have noted that most linear programming applications possess a large network component, as much as 99%. Building on this idea, Glover and Klingman provide detailed algorithmic steps for specializing the revised simplex method so that the network portion is maintained as a separate entity. Although the basic idea has been proposed previously, the authors suggest novel data structures for taking advantage of the network partition. Preliminary computational experience indicates that the simplex SON algorithm may be superior to commercial LP packages for certain categories of problems.

In summary, these eight papers span many of the recent developments in the network flow field: (1) new application areas, (2) novel modeling concepts which allow many problems to be captured in a network setting, and (3) extensions of highly efficient solution strategies. We are excited by the pace of research activity and expect that it will continue for many years to come.
References

[1] A. Charnes and W.W. Cooper, "Generalizations of the warehousing model", Operational Research Quarterly 6 (1955) 131-172.
[2] A. Charnes and W.W. Cooper, "The stepping stone method of explaining linear programming calculations in transportation problems", Management Science 1 (1954) 49-69.
[3] G.B. Dantzig, "Application of the simplex method to a transportation problem", in: T.C. Koopmans, ed., Activity analysis of production and allocation (Wiley, New York, 1951) pp. 359-373.
[4] M.M. Flood, "On the Hitchcock distribution problem", Pacific Journal of Mathematics 3 (1953) 369-386.
[5] L.R. Ford, Jr. and D.R. Fulkerson, "Solving the transportation problem", Management Science 3 (1956) 24-32.
[6] F.L. Hitchcock, "The distribution of a product from several sources to numerous localities", Journal of Mathematical Physics 20 (1941) 224-230.
[7] L. Kantorovich, "Matematicheskie metody organizatsii i planirovaniya proizvodstva", Leningrad University, Leningrad (1939). [Translated as: "Mathematical methods in the organization and planning of production", Management Science 6 (1960) 366-422.]
[8] L. Kantorovich and M.K. Gavurin, "The application of mathematical methods to problems of freight flow analysis", in: Collection of problems concerned with increasing the effectiveness of transports (Publication of the Akademii Nauk SSSR, Moscow-Leningrad, 1949) pp. 110-138.
[9] T.C. Koopmans, "Optimum utilization of the transportation system", Proceedings of the International Statistical Conference, Washington, DC, 1947 (Vol. 5 reprinted as Supplement to Econometrica 17 (1949)).
[10] T.C. Koopmans and S. Reiter, "A model of transportation", in: T.C. Koopmans, ed., Activity analysis of production and allocation (Wiley, New York, 1951) pp. 222-259.
[11] H.W. Kuhn, "The Hungarian method for the assignment problem", Naval Research Logistics Quarterly 2 (1955) 83-97.
CONTENTS

Preface  v
(1) Microdata file merging through large-scale network technology, R.S. Barr and J.S. Turner  1
(2) Using generalized networks to forecast natural gas distribution and allocation during periods of shortage, R.E. Brooks  23
(3) A branch and bound network algorithm for interactive process scheduling, T.E. Baker  43
(4) Applications of the operator theory of parametric programming for the transportation and generalized transportation problems, V. Balachandran, V. Srinivasan and G.L. Thompson  58
(5) A network model of insurance company cash flow management, R.L. Crum and D.J. Nye  86
(6) Traffic scheduling via Benders decomposition, R.R. Love, Jr.  102
(7) An approximate second-order algorithm for network flow problems with convex separable costs, R.S. Dembo and J.G. Klincewicz  125
(8) The simplex SON algorithm for LP/embedded network problems, F. Glover and D. Klingman  148
Mathematical Programming Study 15 (1981) 1-22. North-Holland Publishing Company
MICRODATA FILE MERGING THROUGH LARGE-SCALE NETWORK TECHNOLOGY

Richard S. BARR
Edwin L. Cox School of Business, Southern Methodist University, Dallas, TX, U.S.A.

J. Scott TURNER
School of Business Administration, Oklahoma State University, Stillwater, OK, U.S.A.
Received 30 May 1979
Revised manuscript received 8 April 1980

This paper describes the results of over four years of research, development, implementation and use of a system which will optimally merge microdata files. The merge process requires solving transportation problems with up to 50 000 constraints and 60 million variables, and is now being performed on a production basis. The resultant statistical data files fuel U.S. tax policy evaluation models which are used today in the design and analysis of Federal legislation. Computational experience with this pioneering optimization software is described.

Key words: Network, Large-Scale Optimization, Microsimulation, Linear Programming, Microdata, File Merging.
1. Introduction and overview

In analyzing economic policy, one of the most important tools currently available is the microanalytic model. With this class of econometric models, calculations are performed on individual sample observations of decision units, called microdata, to forecast aggregate population effects of changes in public policy. The significance of this technique is underscored by its use in virtually every Federal agency and a growing number of State governments for the evaluation of policy proposals. This paper focuses on the models used extensively by the U.S. Department of the Treasury's Office of Tax Analysis (OTA) to evaluate tax revision and reform proposals for the Administration and for Congress.

One of the strengths of the microanalytic technique is its direct use of sample observations rather than aggregated data. The need for high quality and completeness of these models' microdata is evident from the importance of their end-use applications: legislation design and public policy analysis. But for a variety of reasons, including cost and legality, data is rarely collected specifically for policy models. Instead, they inevitably rely on data accumulated as a part of program implementation (for example, I.R.S. tax forms) or from a survey commissioned for a different purpose (e.g., Census Bureau data). Therefore, the
quality of a model's data often depends on more than the sampling and recording procedures; the data from a single source may be ill-specified or incomplete. In this case, the problem becomes more complex; multiple sources are used and files are merged to form a composite data file.

Merging involves matching each observation record in one file with one or more records in another file. In this manner, composite records are formed which contain the data items from both original files. This paper explores some of the difficulties associated with the merging process and describes a new technique for their resolution. Until recently, merging has been performed in either an ad hoc or a heuristic manner, but research at OTA [23, 24] has shown that an optimal merge can be defined by the solution to a large-scale, linear programming transportation problem. This optimal merging not only produces the best possible match but also preserves the complete statistical structure of the original files. Because of the unusually large nature of the network optimization problems, a new state-of-the-art solution system was designed to accommodate problems of up to 50 000 constraints and 65 million variables and is currently run on a production basis on Treasury computer systems. This paper describes the environment of the merge problem, the optimal merge model, and the pioneering mathematical programming system devised to meet this special set of needs.

In summary, public policy models often require data that is unavailable from existing sources and separate surveys would cost tens of millions of dollars apiece. The file merging process described herein is used to combine available sources for a small fraction of that cost. Thus, the objective of the optimal merging approach is the cost-effective preparation of high-quality data for input to the public decision-making process.
2. OTA tax models
The main responsibility of OTA is the evaluation of proposed tax code revisions. In the personal tax area, proposed changes are analyzed to determine the effect they would have on the tax liability of families or individuals having certain characteristics. From the analysis of a set of exhaustive and mutually exclusive classes (based on such characteristics as tax return income class, family size, age of family head, and demographics) it can be determined, for example, how a proposed change affects the Federal tax liability of a husband-wife filing unit (joint return) with two dependent exemptions and with an adjusted gross income between $15 000 and $20 000. From these components, the total variation of tax revenue is determined.

The tax policy changes to be analyzed come both from the Administration via the Treasury's Assistant Secretary for Tax Policy and from the tax-related Congressional committees (Ways and Means, Senate Finance, and Joint
Committee on Taxation). The process is usually iterative, with one alternative leading to another, and subject to overall constraints such as a specific limit on the total change in revenue. As a result, the computer models may be run hundreds of times in response to a series of "what if" questions. Two microeconomic models in heavy use at OTA are the Federal Personal Income Tax Model and the Transfer Income Model. Descriptions of these models follow.
2.1. Federal Personal Income Tax Model

The Federal Personal Income Tax Model is used to assess proposed tax law changes in terms of their effects on distribution of after-tax income, the efficiency with which the changes will operate in achieving their objectives, the effects the changes are likely to have on the way in which individuals compute their taxes, and the implications for the level and composition of the GNP. For example, a proposal might be made to increase the standard deduction from $2200 to $2600, impose a floor on itemized medical deductions equal to 5 percent of adjusted gross income, and eliminate gasoline taxes as an allowable deduction. Because of interactions among variables, the combined effect of these changes is quite different from the sum of the isolated effects. For example, many taxpayers would switch from itemization to the standard deduction.

2.2. Transfer Income Model (TRIM)

The Transfer Income Model (TRIM) is an enormous and complex microanalytic model used by almost every Federal department for analysis of transfer income programs such as welfare and social security. It generates total budget requirements and detailed distributional effects of new transfer programs or changes to existing programs. Moreover, the model can describe the impact of simultaneous program changes. For example, TRIM can ascertain the effect of the cost-of-living component in social security on the food stamp program's transfers.
3. Sources of microdata
The OTA models make heavy use of two sources of microdata: the Statistics of Income file and the Current Population Survey. As microdata, these files contain complete records from reporting units (individuals or households) but, for reasons of privacy and computational efficiency, only a representative subset of the population records are included. Each record is assigned a "weight" designating the number of reporting units represented by the particular record.
The resulting microdata file is a compromise between a complete census file and fully aggregated data. Thus, sufficient detail remains to support microanalysis of the population, while partial aggregation protects individual privacy and greatly diminishes the computational burden.
3.1. Statistics of Income (SOI)

The SOI file is generated annually by the Internal Revenue Service and consists of personal tax return data. Returns are sampled at random from 15 to 20 income strata; selection rates differ by stratum and by sources of income (e.g., business or farm). Thus, the basic microdata record is a personal tax return with 100 to 200 recorded data items, together with a weight equal to the reciprocal of the sampling rate. The sum of all weights equals the total number of returns (e.g., 82 million in 1975). For computational efficiency, the OTA tax models make use of a subsample of 50 000 records taken from this file. Comparison of a large number of tabulations produced from this subsample, with comparable tabulations based on the full SOI, show an agreement of ±0.2 percent; hence the subsample provides a very accurate representation of the full SOI.
3.2. Current Population Survey (CPS)

This survey is generated monthly by the Bureau of the Census, which interviews approximately 67 000 households, representing some 64 000 potential tax returns, to obtain information on work experience, education, demographics, et cetera. Questions are asked on the individual level as well as on the family level, and questions vary each month. The primary purpose of the CPS is to estimate the unemployment rate. Each March, an in-depth survey is made that includes some sources of income that are common to the SOI and some that are not, such as social security and workman's compensation. Because of the presence of individual and household data and the inclusion of most sources of income, such data are very useful for analysis of tax policies and Federal transfer programs.
4. Merging microdata files

A typical problem in tax policy evaluation occurs when no single available data file contains all the information needed for an analysis. For example, if the policy question is the incidence and revenue effect of including Old Age Survivors Disability Insurance (OASDI) benefits in adjusted gross income, the Personal Statistics of Income (SOI) microdata file cannot be used in its original form since OASDI benefits are not included. Census files (e.g., CPS) with OASDI benefits do not of themselves allow a complete analysis of the effect of
including this benefit, since information on allowable itemizations and capital gains are not in these files.

In an attempt to resolve this problem, procedures for matching or merging two microdata files have been proposed. They fall into the general categories of exact matches and statistical matches. In an exact match, the records for identical reporting units are contained in each file and are mated, usually on the basis of a unique identifier such as the social security number. Statistical merges involve files whose records are taken from the same population but are not necessarily from the same reporting units. In this case, matching of records is performed on the basis of their "closeness" with respect to the attributes common to the two files, as illustrated in Fig. 1.

[Figure 1 shows two files being merged: each file A record carries a weight a_i, common items p_i1, ..., p_iR, and file A-only items through p_iS; each file B record carries a weight b_j, common items q_j1, ..., q_jR, and file B-only items through q_jT. Each composite record of file C carries an assigned weight x_ij together with the items of both original records. The inter-record dissimilarity measure (distance function) is c_ij = f(p_i1, ..., p_iR, q_j1, ..., q_jR).]

Fig. 1. Statistical file merging.

4.1. Difficulties in obtaining exact matches

While in many instances exact matching may be the preferable approach, in practice there are several accompanying problems: insignificant sample overlap, lack of unique identifiers, confidentiality and expense. In the OASDI example mentioned earlier, the necessary information for analysis exists in the SOI and CPS files together. However, exact matching would be useless because an insignificantly small number of persons will appear
in both sample files. Thus, even if exact matching were not in violation of the confidentiality strictures, the information gain for policy purposes would be insignificant.

Another prevalent problem is the absence of unique record identifiers. As a result, even given a significant overlapping of two data files, a 100 percent mapping of identical records between files is very unlikely (using common attributes) since the data values are subject to both measurement and recording errors. The situation in which two samples contain identical reporting units without unique identifiers is not typical when publicly available files are used. When this problem does arise, the application of a statistical matching procedure using common attributes produces as good a mapping of records as is possible, given the quality of the recorded attributes.

But even in situations where exact matching is possible, it is often precluded by confidentiality and cost considerations. In many instances privacy legislation guarantees respondents that information given for one file will not be used to "check up" on information given for another file. It may also be significantly more costly to achieve an exact match than a statistical match since, even if unique identifiers are present, many nonresponse items and recording errors are possible. A great deal of effort can be spent handling these "exception" records that cannot be matched without obtaining additional data. Depending upon the analytic purpose of the matched file, use of a statistical merging procedure may be best.

4.2. Statistical and constrained merges
Matching data files with the restriction that the means and variance-covariance matrix of data items in each file be fully retained in the matched file is designated as constrained matching. The equivalence of this restriction to the addition of a series of constraints to the merge process will be developed in subsequent sections. Examples of constrained matching are given by Budd [10] and by Turner and Gilliam [23]; see [22] for a history and survey of statistical matching.

The simplest case for statistical constrained matching occurs when two probability samples of equal size with equal record weights are merged. In this case, for purposes of matching, all record weights can be set equal to one. The condition for constrained matching is that each record in both files is matched with one and only one record in the other file. Consider two files, A and B, both with n records:

$$x_{ij} = \begin{cases} 1, & \text{if the } i\text{th record in file A is matched with the } j\text{th record in file B},\\ 0, & \text{if the } i\text{th record in file A is not matched with the } j\text{th record in file B}; \end{cases} \tag{1}$$

$$\sum_{i=1}^{n} x_{ij} = 1, \quad \text{for } j = 1, 2, \ldots, n; \tag{2}$$

$$\sum_{j=1}^{n} x_{ij} = 1, \quad \text{for } i = 1, 2, \ldots, n. \tag{3}$$
Equality constraints (2) and (3) ensure that the condition for constrained matching is met.

4.3. The assignment model of a constrained merge
Each microdata record consisting of r items can be viewed as a point in a Euclidean r-dimensional space. It can be shown for the example above that, under certain assumptions, the permutation of the records (points) in set B that satisfies the pertinent maximum likelihood condition has the following mathematical form:

minimize
$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}, \tag{4}$$

subject to
$$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \ldots, n, \tag{5}$$
$$\sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, \ldots, n, \tag{6}$$

where
$$x_{ij} = \begin{cases} 1, & \text{if the } i\text{th record in A is matched with the } j\text{th record in B},\\ 0, & \text{otherwise}; \end{cases} \tag{7}$$
$$c_{ij} = f(p_{i1}, p_{i2}, \ldots, p_{ir}, q_{j1}, q_{j2}, \ldots, q_{jr});$$

p_ik = value of the kth common data item in record i of file A;
q_jk = value of the kth common data item in record j of file B.

The mathematical model given by expressions (4) through (7) is the assignment model. The optimal constrained matching of records in file A with records in file B is obtained by using any one of the known assignment algorithms (see [4]) to find a set of x_ij values that minimize expression (4) while satisfying constraints (5), (6) and (7). In this model, originally posed in [24], c_ij is a measure of inter-record dissimilarity based on a comparison of corresponding record attributes. The specification of this function is dependent upon the statistical properties of the data items p_ik and q_jk and, given certain distributional assumptions, is uniquely determined (see [16]). Thus, the parameter c_ij can be viewed as the "distance" between record i of file A and record j of file B, and the problem of determining
a set of x_ij values that minimize the aggregate distance between matched records also yields the assignment problem.

Consider a pair of files with two common attributes: wages and salaries earned (p_i1 and q_j1) and the sex of the reporting unit (p_i2 and q_j2). A simplistic distance function might take the form:

$$c_{ij} = w_1 |p_{i1} - q_{j1}| + w_2 s_{ij},$$
where s_ij = 0 if p_i2 = q_j2, else s_ij = 1; and w_1 and w_2 are weights reflecting the relative importance and magnitude of the respective items. While a unique measure has been derived, in practice distance functions are designed to emphasize those items of importance to the merge file user. In either case, a file obtained using the assignment model or the following transportation formulation is said to be an optimal constrained match, as it has been optimized with respect to a given distance function.
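To make this concrete, the following minimal sketch (an illustration of ours, not the authors' production code) computes the simplistic distance function above for two small invented files of equal size and unit weights, then solves the resulting assignment model (4)-(7) with SciPy; the data and the weight choices w1 = 1 and w2 = 10 000 are hypothetical.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    n = 5                                  # equal-size files, unit record weights
    wages_a = rng.uniform(0, 50_000, n)    # common item 1: wages and salaries
    wages_b = rng.uniform(0, 50_000, n)
    sex_a = rng.integers(0, 2, n)          # common item 2: sex of reporting unit
    sex_b = rng.integers(0, 2, n)

    w1, w2 = 1.0, 10_000.0                 # illustrative importance weights
    # c[i, j] = w1*|p_i1 - q_j1| + w2*s_ij, where s_ij = 0 iff the sexes agree
    c = w1 * np.abs(wages_a[:, None] - wages_b[None, :]) + \
        w2 * (sex_a[:, None] != sex_b[None, :])

    rows, cols = linear_sum_assignment(c)  # optimal x_ij for model (4)-(7)
    print("matches:", list(zip(rows.tolist(), cols.tolist())))
    print("aggregate distance:", c[rows, cols].sum())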
4.4. The transportation model of a constrained merge

A matching situation more typical of policy analysis problems is a constrained merge of two microdata files with variable weights in both files and an unequal number of records in the files. Let a_i be the weight of the ith record in file A, and let b_j be the weight of record j in file B. Suppose that file A has m records and that file B has n records. Also suppose that the following condition holds:

$$\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j. \tag{8}$$
The condition for a constrained matching of file A and file B is given by:

$$\sum_{j=1}^{n} x_{ij} = a_i, \quad \text{for } i = 1, 2, \ldots, m, \tag{9}$$
$$\sum_{i=1}^{m} x_{ij} = b_j, \quad \text{for } j = 1, 2, \ldots, n, \tag{10}$$
$$x_{ij} \geq 0, \quad \text{for all } i \text{ and } j, \tag{11}$$
where x_ij represents the weight assigned to the composite record formed by merging record i of file A with record j of file B, with a zero value indicating that the records are not matched. An example of constrained matching using expressions (8) through (11) is given in [10, 22]. If c_ij is specified as in the assignment model example given earlier, and if the objective is to minimize the aggregate after-matching distance between two files (A and B) that satisfy (8), then the problem becomes:

minimize
$$z = \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij}, \tag{12}$$

subject to
$$\sum_{j=1}^{n} x_{ij} = a_i, \quad \text{for } i = 1, 2, \ldots, m, \tag{13}$$
$$\sum_{i=1}^{m} x_{ij} = b_j, \quad \text{for } j = 1, 2, \ldots, n, \tag{14}$$
$$x_{ij} \geq 0, \quad \text{for all } i \text{ and } j. \tag{15}$$
Note that expressions (13), (14) and (15) are the conditions for constrained matching and that the mathematical model given by (12) through (15) is a linear program. Moreover, this problem is the classical uncapacitated transportation model [11]. This last observation is extremely important for computational reasons, as described in a subsequent section. The dual problem for this model is: maximize
$$w = \sum_{i=1}^{m} a_i u_i + \sum_{j=1}^{n} b_j v_j, \tag{16}$$

subject to
$$u_i + v_j \leq c_{ij}, \quad \text{for all } i \text{ and } j, \tag{17}$$
$$u_i, v_j \text{ unrestricted in sign.} \tag{18}$$
The analogy between this formulation of the merge process and the transportation network model described earlier in this volume also provides an intuitively appealing means of visualizing the underlying common problem. In the merge model analogy depicted in Fig. 2, the nodes represent individual microdata records whose weights are given as the supply and demand values.
[Figure 2 shows a bipartite transportation network. Network component: origin nodes with supply values (a_i), arcs with flows (x_ij) and unit costs (c_ij), and destination nodes with demand values (b_j); for example, an arc carries x_11 = 400 at cost c_11 = 10. Merge model equivalent: CPS records with CPS record weights, matches with assigned weights and distances, and SOI records with SOI record weights.]

Fig. 2. Example constrained merge as represented by a transportation network model.
The network arcs correspond to record matching combinations, and the associated flows and costs represent the merge record weights and distance function values, respectively. The objective is to determine the set of record matches and associated weights such that the original record weight totals are maintained at a minimum overall distance.

The solution to this problem identifies the records in file B that are to be merged with each record in file A. In contrast with the assignment model, this problem permits a record in one file to be split or to be matched with more than one record in the other file. But since the weight of the original record is apportioned among the otherwise identical split records, the marginal and joint distributions of each file's variables are preserved. (See the appendix for proofs.) Therefore, this optimal merging not only minimizes aggregate information loss in the matching process, but preserves the complete statistical structure of the original files, two important characteristics missing from all other available merging schemes.

Unconstrained matching of two microdata files is given by applying either constraints (13) or (14) but not both. In this case the item means and variance-covariance matrix of only one of the files is preserved in the matching process. Okner [21] describes an example of unconstrained matching which is the model of (12), (13), and (15). See [7] for a critique of unconstrained matching. The transportation model for optimal constrained microdata matching was originally posed in [24] and further discussion is given in [23]. A theoretical formulation of an optimal constrained merging is given in Kadane [16], where it is corroborated that under certain conditions constrained matching is analytically equivalent to the transportation model.
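As a small worked illustration (ours, with invented weights and distances), the constrained merge (12)-(15) can be stated and solved directly as an ordinary linear program with SciPy. A general LP solver is only workable at toy sizes; the totally dense merge problems described next require the specialized network approach of Section 5.

    import numpy as np
    from scipy.optimize import linprog

    a = np.array([2000.0, 3000.0])         # file A record weights (supplies)
    b = np.array([400.0, 2500.0, 2100.0])  # file B record weights (demands)
    c = np.array([[10.0, 4.0, 7.0],        # inter-record distances c_ij
                  [6.0, 3.0, 5.0]])
    assert a.sum() == b.sum()              # condition (8)

    m, n = c.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1     # row sums equal a_i: constraints (13)
    for j in range(n):
        A_eq[m + j, j::n] = 1              # column sums equal b_j: constraints (14)

    res = linprog(c.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]))
    x = res.x.reshape(m, n)                # x[i, j]: weight of composite record (i, j)
    print(x, res.fun)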
5. An optimal file merge system

In the transportation network model given above, the number of constraints is (m + n). Since each x_ij represents the merging of two records, there are up to (mn) problem variables in a constrained file merge. These dimensions can be extremely large, considering typical sizes of m and n and the fact that the problem is totally dense (any of the mn variables might be positive). For example, to merge the CPS and SOI files directly would involve over 110 000 constraints and 3 billion variables. Since problems of this magnitude are far beyond the capability of the best general-purpose linear programming system and, even if they were divided into a series of subproblems, solution would involve an inordinate amount of machine time, a large-scale network solution system for the optimally-constrained merge problem was developed.

This Extended Transportation System (ETS) makes use of recent research into network solution techniques [2, 5, 6, 13-15, 18, 20] and is based on a specialization of the primal simplex method. This system has been
used to solve some of the largest known optimization problems and is the only file merge system of its kind in existence.
5.1. Computational aspects of the primal simplex method

The primal simplex method as specialized to transportation problems has many computational advantages over other approaches. First, a simplex basis for an m × n problem corresponds to a spanning tree with (m + n - 1) arcs. As such, a basis can be represented compactly using lists of node labels and corresponding flow values. This same data structure carries the basis inverse implicitly and, in conjunction with a set of node potentials and list structures for their maintenance, dramatically streamlines the simplex pricing and pivoting steps. It is through these elegant mathematical structures that the superior efficiencies, in terms of solution speed and memory requirements, are attained by this approach.

It is also important to note that, in contrast to out-of-kilter and primal-dual approaches which require all problem data to be in primary storage, only the basic arc data need be so maintained. In addition, the arc cost/distance data may be inspected piecemeal and can therefore reside on a secondary storage device and inspected in pages, or blocks of data. Identification of efficient rules for paging and pivot selection has been the subject of much research [9, 13, 14, 18, 20].

Another valuable characteristic of network problems in general is the automatic integrality of variables when all supplies and demands are whole numbers. When the distance data are also integer-valued, no program data need be represented as real numbers with the attendant concerns of numerical round-off and error tolerances.
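The following sketch (ours, for illustration only) shows one way a spanning-tree basis yields the node potentials with no explicit basis inverse: fix one redundant dual and solve u_i + v_j = c_ij along the tree in a single traversal.

    from collections import deque

    def node_potentials(basic_arcs, cost, m, n):
        """Compute duals (u, v) from a transportation basis spanning tree.

        basic_arcs: the (m + n - 1) basic (i, j) pairs forming a spanning
        tree over m origin and n destination nodes; cost[i][j]: distances.
        """
        adj = [[] for _ in range(m + n)]       # destination j is node m + j
        for i, j in basic_arcs:
            adj[i].append(m + j)
            adj[m + j].append(i)
        pot = [None] * (m + n)
        pot[0] = 0.0                           # anchor the redundant dual variable
        queue = deque([0])
        while queue:
            k = queue.popleft()
            for t in adj[k]:
                if pot[t] is None:
                    i, j = (k, t - m) if k < m else (t, k - m)
                    pot[t] = cost[i][j] - pot[k]   # enforce u_i + v_j = c_ij
                    queue.append(t)
        return pot[:m], pot[m:]

    # Pricing a nonbasic arc (i, j) then reduces to c_ij - u_i - v_j, so the
    # "basis inverse" work is just this tree walk plus list maintenance.
    u, v = node_potentials([(0, 0), (0, 1), (1, 1)], [[10, 4], [6, 3]], 2, 2)
    print(u, v)   # u = [0.0, -1.0], v = [10.0, 4.0]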
5.2. The ETS solution system

In designing a network solution system for the OTA merge problem, the hardware available was a UNIVAC 1108 with only 150 000 36-bit words of primary storage, plus disk and drum secondary mass storage. This limited amount of memory plus the enormous size of the problem precluded even the use of an available paged-data primal-simplex network code [18] because of the need to maintain in primary storage a basis of size (6m + 6n) words plus a page of arc distance data. Even when the problem specifications were reduced to 50 000 constraints and 65 million variables, primary storage was insufficient. The result was a twofold problem: first, the major data processing task of efficiently handling the arc distance data and, secondly, the extension of network solution technology to a new level to accommodate problems of this magnitude. The following sections describe ETS features designed to meet these needs.
5.2.1. Transportation problem optimizer

The primal simplex transportation code with the smallest known memory requirements is used. ETS employs a modification of the SUPERT code by Barr [2] which stores the basis in (4m + 4n) locations. Special packing techniques reduce this memory requirement to (2m + 2n), thus allowing a 50 000 constraint basis to be maintained in 100 000 words; the remaining locations are used for storing the program and pages of the arc distance data. It should be noted, however, that this condensed storage technique markedly increases the computational burden associated with the execution steps since every reference to problem data requires either a packing or an unpacking operation. Preliminary testing indicated that, as a result, solution times have been increased by a factor of from two to four over normal, unpacked data storage.

Partially offsetting this implementational disadvantage is the high efficiency of the transportation optimizer. The SUPERT code uses an independently derived variant of the ATI algorithm [14] and compares favorably with state-of-the-art primal simplex network codes. As shown in Table 1, in a comparison of aggregate solution times on a standard set of small transportation problems [19], the PNET-I code [15] is 73% slower than SUPERT, the GNET code [9] requires 106% more time, and the ARC-II code [5] is roughly comparable. Besides these primal simplex-based codes, the times for the SUPERK out-of-kilter code [3] are over five times those of SUPERT. These network codes have the disadvantage of being designed for more general capacitated problems but have the advantage
of more advanced data structures. These programs also have substantially greater memory requirements than the ETS transportation optimizer.

Table 1
Total solution times on transportation problems on a CDC 6600 (a)

NETGEN problem    m     n    Arcs    SUPERT (b)   PNET-I (c)   GNET (d)   ARC-II (d)   SUPERK
 1               100   100   1300      0.42         0.92         1.06       0.60         3.72
 2               100   100   1500      0.59         0.98         1.08       0.68         4.25
 3               100   100   2000      0.70         1.20         1.45       0.76         4.39
 4               100   100   2200      0.70         1.07         1.44       0.68         4.27
 5               100   100   2900      0.85         1.61         1.76       0.90         4.23
 6               150   150   3150      1.29         2.28         2.45       1.60         7.09
 7               150   150   4500      1.70         2.79         3.39       1.62         8.11
 8               150   150   5155      1.95         3.11         4.06       2.17         8.61
 9               150   150   6075      2.05         3.29         4.12       2.11         DNR
10               150   150   6300      2.04         4.08         4.68       2.81         DNR
Total time:                           12.29        21.33        25.49      13.93

DNR = Did not run in 201 000 words of memory.
(a) All programs compiled under FTN with OPT = 2. Times are elapsed CPU time exclusive of input and output.
(b) Modified row most negative pivot strategy used (see [14]).
(c) Modified node most negative pivot strategy used (see [13]).
(d) Default pivot strategy used.
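For flavor, here is a minimal sketch of the kind of field packing involved (an illustration of ours; the actual ETS layout on the 36-bit UNIVAC word is not documented here): two 18-bit fields share one word, and every access pays a shift-and-mask toll, which is the two-to-fourfold slowdown noted above.

    BITS = 18
    MASK = (1 << BITS) - 1              # low 18 bits of a 36-bit word

    def pack(hi, lo):
        """Store two 18-bit nonnegative fields in one 36-bit word."""
        assert 0 <= hi <= MASK and 0 <= lo <= MASK
        return (hi << BITS) | lo

    def unpack(word):
        """Recover both fields; every data reference pays this cost."""
        return word >> BITS, word & MASK

    word = pack(40_321, 112)            # e.g., a node label and a flow value
    assert unpack(word) == (40_321, 112)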
5.2.2. Nondense problem generation

The density, d, of a transportation problem is defined as the number of problem arcs divided by (mn), the number of arcs if all origin-destination pairs are considered. Because of the enormous size of (mn) in the merge model, problems with d < 1 are generated using a sampling window that restricts consideration to a subset of the possible matches for a given record. Several heuristic schemes are employed to determine this window, and these schemes are based primarily on comparisons of dominant items in the distance function so as to consider the "most likely" matches. Specifically, one scheme narrows the window of consideration to the t records in file B that match most closely a given record in file A, based on one or more common attributes. This has been used with t = 500 and 1000, with the attribute being adjusted gross income. Since the merge file in this case was used in tax policy models, the income attribute was deemed to be of key importance; however, the size of the window still allows the various other factors, as expressed in the distance function, to influence the match process.
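A sketch of one such window heuristic (our reconstruction of the idea, not the OTA generator): for each file A record, generate arcs only to the t file B records nearest in adjusted gross income.

    import numpy as np

    def window_arcs(agi_a, agi_b, t):
        """Arcs (i, j) restricted to the t file B records whose adjusted
        gross income is closest to that of file A record i; problem
        density drops from 1 to roughly t/n."""
        order = np.argsort(agi_b)              # file B sorted by AGI
        sorted_agi = agi_b[order]
        arcs = []
        for i, value in enumerate(agi_a):
            k = int(np.searchsorted(sorted_agi, value))
            lo = max(0, min(k - t // 2, len(order) - t))
            arcs.extend((i, int(j)) for j in order[lo:lo + t])
        return arcs

    arcs = window_arcs(np.array([12_000.0, 55_000.0]),
                       np.linspace(0.0, 100_000.0, 1_000), t=5)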
5.2.3. Distance function scaling

The range of the distance function values is reduced to 64 categories to permit exploitation of the machine wordsize by the data packing scheme described above. This is necessitated by a worst-case analysis of the size of the problem's dual variables (computed from sums of the c_ij values) and the number of bits available for their storage. But even with this scaling, a sufficient degree of distance value differentiation is available to produce an excellent match for the problems under consideration (see Section 5.4 regarding match quality).
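One plausible reading of such a scaling, sketched for illustration (the exact ETS rule is not given in the paper): map raw distances onto the integers 0-63 by equal-width binning, so every value fits its allotted bit field and all data stay integral.

    import numpy as np

    def scale_to_categories(c, levels=64):
        """Quantize raw distances into integer categories 0 .. levels-1."""
        lo, hi = float(c.min()), float(c.max())
        width = (hi - lo) / levels or 1.0   # guard against a constant matrix
        return np.minimum((c - lo) / width, levels - 1).astype(int)

    raw = np.array([[0.0, 925.0, 15_000.0], [40.0, 8_000.0, 63_999.0]])
    print(scale_to_categories(raw))         # entries all lie in 0 .. 63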
5.2.4. Phase 1/phase 2 solution strategy

Initially the construction of a feasible basis is attempted from a single pass of the problem variables. If a feasible solution is not found, artificial arcs are added to form the starting basis and must be purged by the solution process. The wordsize restriction necessitates the use of a "phase 1/phase 2" solution approach instead of the more efficient "Big M" method of eliminating artificial variables from the solution basis. Since the actual merge problem is totally dense (d = 1), these artificial variables correspond to legitimate matching possibilities that fell outside of one record's window. However, their associated inter-record distances are unknown and are assumed to be extremely large. Phase 1 is used to drive these variables out of solution so as to form an initial feasible basis for phase 2 optimization. This approach is a costly one, time-wise, as demonstrated in Section 5.3; however, OTA deemed merged file quality to be more valuable than the
additional machine time. The effect of allowing these variables to remain in the merged file has not been investigated.

5.2.5. Closeness to optimality calculations
Two new procedures are incorporated in ETS to compute "closeness to optimality" figures for intermediate solutions from this primal algorithm. The objective function value associated with a given primal simplex basis is an upper bound on the optimal solution value. Hence if a similar lower bound can be determined, a conservative measure of closeness to optimality can also be calculated. Such a measure can be used to terminate the solution procedure when a given suboptimal solution is deemed to be "good enough".

Normally, a feasible solution to the dual problem must be constructed (at great computational cost) in order to arrive at a lower bound on the optimal objective function value, but the special structure of transportation problems can be exploited to expedite calculation of such a lower bound. Both algorithms are detailed in [1, 8] and the more successful, in terms of strength of the bound, will now be presented.

From duality theory it is known that, for any feasible solution {x_ij} to the primal problem (12)-(15) with value z and any feasible solution {u_i, v_j} for the dual problem (16)-(18) with value w, the relationship w ≤ z holds. Moreover, w ≤ w* = z* ≤ z, where w* and z* are the optimal solution values for the dual and primal problems. Hence, the objective function value w for any dual feasible solution is a lower bound on the optimal solution value. The following algorithm constructs just such a solution and bound from a primal feasible transportation basis.

For each primal feasible solution to the transportation problem, the simplex method associates a dual solution {u_i, v_j}, the node potentials. If the primal solution is not optimal, then the dual solution is not feasible and one or more (nonbasic) arcs violate constraint (17). In particular, if arc (i, j) is dual infeasible,

$$\Pi_{ij} = u_i + v_j - c_{ij} > 0.$$

If node potential u_i is decreased in value by Π_ij, the arc (i, j) becomes dual feasible. This new dual solution is obtained by a change of variables using the relation u'_i = u_i - Π_ij, which yields

$$u'_i + v_j = c_{ij}.$$

Moreover, the dual feasibility of any other arc (i, k) out of node i is not altered: since Π_ij > 0, if u_i + v_k ≤ c_ik, then

$$u'_i + v_k = (u_i - \Pi_{ij}) + v_k < c_{ik}.$$

No other arcs are affected by this change of dual variables. The result of this substitution is a dual solution with at least one fewer dual infeasible arc and the new objective function value w' = w - a_i Π_ij. This procedure can be repeated for all dual infeasible arcs until a dual feasible solution is obtained. The objective function value for this final solution is then a lower bound on z*.
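A compact sketch of this bound for a dense problem (our own illustration, not the ETS routine; it applies the largest violation Π_ij at each origin node in one sweep, which reaches the same dual feasible point as the arc-by-arc repetition described above):

    import numpy as np

    def dual_lower_bound(u, v, c, a, b):
        """Repair node potentials into a dual feasible (u', v) and return
        w = sum(a_i u'_i) + sum(b_j v_j), a lower bound on z*."""
        u = np.asarray(u, float).copy()
        v, c = np.asarray(v, float), np.asarray(c, float)
        a, b = np.asarray(a, float), np.asarray(b, float)
        for i in range(len(u)):
            worst = (u[i] + v - c[i]).max()  # max Pi_ij = u_i + v_j - c_ij at node i
            if worst > 0:                    # one decrease repairs every arc at i
                u[i] -= worst
        assert (u[:, None] + v[None, :] <= c + 1e-9).all()   # constraint (17)
        return float(a @ u + b @ v)

    # With potentials from a suboptimal basis, the gap z - w bounds how far
    # the current solution can be from optimal.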
While this bound requires substantial processing to calculate, the bound becomes quite strong as intermediate solutions approach the optimal. This was verified by testing on medium-sized (250 000 arc) problems. Because of the speed of ETS, however, all production problems have been run to optimality. In other instances where greater machine time restrictions exist, this bound can be used to evaluate the quality of a suboptimal solution.
5.2.6. Pricing strategies

The pricing procedure is enhanced through the use of a multipricing technique for pivot selection that has been shown to drop solution time for large problems to half of that required when using the best pivot selection of earlier studies [13, 20, 25]. This tactic scans a page of arc data for pivot-eligible arcs (Π_ij > 0) and generates a "candidate list" of such arcs with predefined length l. The arc with the largest Π_ij value is selected, removed from the list, and pivoted into the basis. The remaining arcs in the list are then repriced. The "most eligible" candidate arc is selected from the revised list and the process continues until k such candidates are chosen or all candidate arcs on the list price nonpositive. At that point the list is replenished and the process repeated. This continues until the entire page of arc data prices dual feasible or until s passes of the page have been made. When all pages price nonpositive, optimality has been achieved. The selection of values for the parameters k, l, and s determines the effectiveness of the pricing procedure.

It should also be noted that all arc data input is "double-buffered", a systems programming technique which permits the pricing and pivoting operations to be carried out simultaneously with the paging in of arc distance data. In this manner, the central processing unit will rarely have to wait for a subsequent page of data to be read into primary storage from disk.
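In outline, one round of this candidate-list multipricing might look like the following sketch (illustrative only; `reduced_cost` and `pivot_into_basis` stand for hooks into the simplex engine and are assumed, not part of any documented ETS interface):

    def multiprice_page(page_arcs, reduced_cost, pivot_into_basis, l, k):
        """One pass over a page: build a candidate list of up to l eligible
        arcs, then pivot at most k times, repricing survivors between
        pivots."""
        candidates = []
        for arc in page_arcs:              # scan the page for Pi_ij > 0
            if reduced_cost(arc) > 0:
                candidates.append(arc)
                if len(candidates) == l:
                    break
        pivots = 0
        while candidates and pivots < k:
            candidates.sort(key=reduced_cost, reverse=True)  # reprice survivors
            best = candidates.pop(0)       # "most eligible" candidate
            if reduced_cost(best) <= 0:    # list went stale: replenish instead
                break
            pivot_into_basis(best)         # changes potentials / reduced costs
            pivots += 1
        return pivots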
5.2.7. Other ETS implementation aspects

The system is written entirely in FORTRAN to increase its maintainability and portability. Of course, the use of a higher level language is not without its cost in efficiency, since assembly language programming would allow full exploitation of a particular machine's architecture. The execution times of some mathematical programming codes have been shown to improve by 30 percent to 300 percent through the inclusion of assembly coding in critical areas alone [17]. ETS also includes the capability for resuming the optimization process from a suboptimal solution, a command language for execution control, and report generation options.
5.3. Recent ETS usage

In order to assess the impact of tax rebate proposals and President Carter's tax reform initiatives, a merge of the 1975 SOI file and Survey of Income and Education (SIE) file (a one-time survey, equivalent to the CPS) was performed in the fall of 1978. The results were used in the preparation of [27]. Similar files have been used in the past to analyze former Secretary William Simon's fundamental tax reform proposals, the results of which appeared in [26].

Because of the enormity of the problem (110 094 constraints), the merge was broken into six subproblems based on census region. Each subproblem was optimized and the ETS solution statistics for these runs are given in Table 2. It should be noted that the solution times would be markedly reduced if data packing were not used and if key portions of the system were coded in assembly language. And, since the effect of many of the system parameters such as pivoting strategy and page size has not been researched, even these extremely fast times should not be construed as the best attainable with ETS. Recent comparisons between a FORTRAN-language primal network code and a state-of-the-art, commercial, general linear programming system (APEX III) have shown the specialized approach to be 130 times faster [11]. Using this figure as a basis of comparison, a general-purpose mathematical programming system running on a dedicated UNIVAC 1108 would require over seven months to solve these problems.

[Table 2. ETS solution statistics for the six census-region merge subproblems.]

The values in Table 2 show that phase 1 required approximately one-third of the solution time to drive out artificial variables constituting an average of 6.4% of the initial basis. This is also an indication of the time that could potentially be saved by the Big M method or by the construction of an initial primal feasible solution. The "percent degenerate pivots" figures show that these transportation problems have relatively little degeneracy, a characteristic noted in studies of smaller transportation problems. This is in sharp contrast with assignment and transshipment network problems, which have been shown to exhibit over 95 and 80 percent degenerate pivots, respectively [4, 6, 13]. A more curious finding from these statistics is that the number of pivots is highly correlated with the number of constraints (ρ² = 0.92) but not with the number of variables (ρ² = 0.06). This may indicate that a much larger window could be used in the problem generator without drastically escalating the solution times.
5.4. Quality of the merged file

Properly assessing the quality of a merge file is a difficult task since no generally accepted measures of "goodness" have been established and the theory in this area has only recently begun to be investigated. (The derivation of measures of match quality and their interrelationships with distance function
definitions are important topics for future research.) Somewhat simplistic measures can be used, however, to give a broad-brush indication of the degree of agreement between the records joined to form the composite file. To this end, Tables 3 and 4 provide summary information regarding the merge file described above.

As depicted earlier in Fig. 1, a composite record is formed by mating a record in file A with a record in file B and assigning a record weight. This record then contains duplicate items since some attributes appear in both original files. These common items are, of course, used in the distance function calculation but specific values can also be compared to see how well individual records matched. Table 3 shows percentages of agreement and average differences between like items in the composite records. For example, 95.1 percent of the merged records had the same I.R.S. tax schedule code. These measurements are calculated using the record weights, so as to reflect the degree of agreement for the merged populations rather than the matched samples.

These figures indicate that, by minimizing the aggregate distance function values and maintaining the record weight constraints, a very strong match can be obtained. It should be noted that 100 percent agreement between items is virtually impossible since the match is between different samples. For example, if the match were made on the basis of schedule code alone and all constraints relaxed, the best possible level of agreement would be 98.2 percent.
Table 3
Item analysis of the complete merged file

Common data item                                    Matched records relationship   Weighted percentage (a) of records or value
1. Schedule code (single, joint,                    % agreement                    95.1%
   married filing separately, etc.)
2. Age of tax filer                                 % within 5 years               60.9%
                                                    % within 10 years              91.7%
3. Size of family                                   % agreement                    70.2%
                                                    % within 2                     97.4%
4. Race                                             % agreement                    89.3%
5. Sex                                              % agreement                    94.6%
6. Adjusted gross income (including                 average difference             $925
   all taxable sources of income)                   % within $1000                 79.6%
                                                    % within $2000                 92.6%
7. Wages and salaries                               average difference             $637
                                                    % within $1000                 86.7%
                                                    % within $2000                 95.0%

(a) Percentages based on sums of record weights with indicated agreement as a percentage of the total of all record weights.
Table 4
Composite agreement count for six common items in the complete merged file

Number of item agreements (a)   Percent of records (weighted) (b)   Cumulative percent (weighted) (b)
6                               68.6%                               68.6%
5                               22.0                                90.6
4                                6.4                                97.0
3                                2.1                                99.1
2                                0.6                                99.8
1                                0.2                               100.0
0                                0.0                               100.0

(a) Categories of item agreement in a composite record: (1) same schedule codes; (2) ages within ten years; (3) family size within two; (4) same race; (5) same sex; and (6) adjusted gross income within $2000.
(b) Percentage based on sums of record weights exhibiting such agreement as a percentage of the total of the record weights, 82 215 537 (the number of tax filers).
To identify record agreement on multiple items, six agreement categories were defined, the number of categories of agreement in each record was counted, and the results are summarized in Table 4. Again using weighted counts, 68.6 percent of the merge file records agree in all six categories and over 90 percent agree in five or more categories. Therefore, this particular file is a good match not only on individual items but on combinations of items as well. Postmerge calculations also verified the retention of the statistical structure of both original files' data.

Note that while the figures in Tables 3 and 4 could be improved by relaxing either constraints (13) or (14), this would yield distortions in the aggregate statistics for all data items from the corresponding original file. Such distortions could significantly alter the results obtained by the personal income tax and transfer income models.
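The weighted counting behind Table 4 is straightforward; a sketch (ours, with hypothetical record fields) follows.

    import numpy as np

    def agreement_distribution(matched_pairs, weights, agree_fns):
        """Weighted percentage of composite records at each agreement count.

        matched_pairs: (record_a, record_b) tuples for the composite file;
        weights: the assigned x_ij weights; agree_fns: one predicate per
        category, e.g. lambda ra, rb: abs(ra["age"] - rb["age"]) <= 10.
        """
        counts = np.zeros(len(agree_fns) + 1)
        for (ra, rb), w in zip(matched_pairs, weights):
            counts[sum(fn(ra, rb) for fn in agree_fns)] += w
        return 100.0 * counts[::-1] / counts.sum()   # percent at 6, 5, ..., 0

    # Example predicates for two of the six categories used above:
    same_schedule = lambda ra, rb: ra["schedule"] == rb["schedule"]
    agi_within_2000 = lambda ra, rb: abs(ra["agi"] - rb["agi"]) <= 2000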
6. Summary

Whereas separate surveys for different informational needs would cost tens of millions of dollars apiece, this optimal, constrained merge technique can bring about the merging of available sources for a small fraction of that amount. And, as its use continues, the ETS merge system is proving to be a cost-effective means of providing new, high-quality data resources for the public decision-making process.
Appendix. Preservation of item statistics in constrained merging

In this section we show that the means and variance-covariance matrix of items in a given file A are preserved in a file resulting from a fully constrained statistical merge with another file B. This is a consequence of including constraints for the original record weights in the merge process and the inclusion of all of the original items from both files in the composite file. (See Fig. 1.) This discussion does not apply to any relationships between items that were originally in different files.
A.1. Arithmetic mean

The arithmetic mean of a data item in the merge file will retain its value from the originating file even though records may be split in the matching process. This is because the sum of the weights of any split records equals the weight of the original record. To demonstrate this, let $p_{ir}$ represent the value of the $r$th data item in the $i$th record of file A, and $a_i$ denote the record weight in that file of $m$ records. The mean of item $r$ is given as

$$\bar p_r = \Big(\sum_{i=1}^{m} p_{ir} a_i\Big) \Big/ \Big(\sum_{i=1}^{m} a_i\Big).$$

When file A is merged with an $n$-record file B, let $x_{ij}$ again represent the weight assigned to the composite record formed by merging record $i$ of file A with record $j$ of file B. In the fully constrained model, up to $(m + n - 1)$ of these values are positive, with the remaining zero values indicating that the records are not matched. Constraint (9) ensures that

$$\sum_{j=1}^{n} x_{ij} = a_i \quad \text{for } i = 1, 2, \ldots, m.$$

Therefore, the mean of the same item $r$ in the merged file is given as

$$p^*_r = \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} p_{ir} x_{ij}\Big) \Big/ \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}\Big) = \Big(\sum_{i=1}^{m} p_{ir} a_i\Big) \Big/ \Big(\sum_{i=1}^{m} a_i\Big),$$

which is equivalent to the expression for $\bar p_r$. This relationship holds for any item in either of the original files.
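A minimal numerical check of this argument, with made-up weights and values, shows the mechanism directly: because the $x_{ij}$ assigned to record $i$ sum to its original weight $a_i$ (constraint (9)), splitting records leaves the weighted mean unchanged.

```python
# Illustrative data only: two file-A records split among three composite
# records; the weighted mean of item r is preserved.

a = [10.0, 30.0]                 # original file-A record weights a_i
p = [100.0, 200.0]               # values of item r in file A

# x[i][j]: composite record weights; each row sums to a[i] (constraint (9))
x = [[4.0, 6.0, 0.0],
     [0.0, 12.0, 18.0]]

mean_A = sum(ai * pi for ai, pi in zip(a, p)) / sum(a)
mean_merged = (sum(p[i] * xij for i, row in enumerate(x) for xij in row)
               / sum(xij for row in x for xij in row))
assert abs(mean_A - mean_merged) < 1e-12   # p*_r equals the file-A mean
```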
A.2. Variance-covariance matrices

For a similar analysis of the items' variance-covariance properties, let $p_{ir}$ and $p_{is}$ represent, respectively, the $r$th and $s$th data items in the $i$th record of file A. The following expression defines $\sigma^2_{rs}$ as the variance of item $r$ (if $r = s$) or the covariance of the two items (if $r \neq s$) in the original file:

$$\sigma^2_{rs} = \Big(\sum_{i=1}^{m} a_i (p_{ir} - \bar p_r)(p_{is} - \bar p_s)\Big) \Big/ \Big(\sum_{i=1}^{m} a_i\Big).$$

In a fully constrained merge file, the variances and covariances are given as

$$\sigma^{2*}_{rs} = \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}(p_{ir} - p^*_r)(p_{is} - p^*_s)\Big) \Big/ \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}\Big).$$

Since $p^*_r = \bar p_r$ and $p^*_s = \bar p_s$,

$$\sigma^{2*}_{rs} = \Big(\sum_{i=1}^{m} (p_{ir} - \bar p_r)(p_{is} - \bar p_s) \sum_{j=1}^{n} x_{ij}\Big) \Big/ \Big(\sum_{i=1}^{m} a_i\Big) = \Big(\sum_{i=1}^{m} a_i (p_{ir} - \bar p_r)(p_{is} - \bar p_s)\Big) \Big/ \Big(\sum_{i=1}^{m} a_i\Big),$$

which is equivalent to $\sigma^2_{rs}$. This equivalence applies to any items in either file A or file B. These relationships demonstrate that the constrained merge process preserves the statistical content of both original files. Such would not be the case if either weight constraint (9) or (10) were omitted, in which case distributional distortions would be introduced for items in the unconstrained file(s).
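The same illustrative data, extended to a second item, verifies the covariance identity numerically (again a sketch with assumed numbers, not production code).

```python
# Two items r and s on the same toy merge: the weighted covariance computed
# from the composite records equals the file-A value.

a = [10.0, 30.0]
pr = [100.0, 200.0]              # item r
ps = [5.0, 9.0]                  # item s
x = [[4.0, 6.0, 0.0],
     [0.0, 12.0, 18.0]]

def wmean(vals, wts):
    return sum(v * w for v, w in zip(vals, wts)) / sum(wts)

mr, ms = wmean(pr, a), wmean(ps, a)          # merged means equal these by A.1
cov_A = sum(ai * (r - mr) * (s - ms) for ai, r, s in zip(a, pr, ps)) / sum(a)

wts = [xij for row in x for xij in row]
# item values repeated per composite record, aligned with the flattened weights
rs = [(pr[i], ps[i]) for i, row in enumerate(x) for _ in row]
cov_M = sum(w * (r - mr) * (s - ms) for w, (r, s) in zip(wts, rs)) / sum(wts)
assert abs(cov_A - cov_M) < 1e-12
```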
Acknowledgment

Thanks are given to Darwin Klingman, University of Texas at Austin, and David Karney, Williams Companies, for their valuable suggestions and contributions to this paper. We also wish to acknowledge Alan J. Goldman, John M. Mulvey and the referees, whose critiques led to a much-improved paper. Finally, we thank Harvey Galper, Nelson McClung and Gary A. Robbins of the Office of Tax Analysis for their strong interest in and support of this project.
References

[1] Analysis, Research and Computation, Inc., "Extended Transportation System (ETS) programmer technical reference manual", P.O. Box 4067, Austin, TX (1975).
[2] Richard S. Barr, "Primal simplex network codes: A computational study", Research Report, Edwin L. Cox School of Business, Southern Methodist University, Dallas, TX (1980).
[3] Richard S. Barr, Fred Glover and Darwin Klingman, "An improved version of the out-of-kilter method and a comparative study of computer codes", Mathematical Programming 7 (1974) 60-86.
[4] Richard S. Barr, Fred Glover and Darwin Klingman, "The alternating basis algorithm for assignment problems", Mathematical Programming 13 (1977) 1-13.
[5] Richard S. Barr, Fred Glover and Darwin Klingman, "Enhancements to spanning tree labelling procedures for network optimization", INFOR 17 (1) (1979) 16-33.
[6] Richard S. Barr, Joyce Elam, Fred Glover and Darwin Klingman, "A network augmenting path basis algorithm for transshipment problems", in: A.V. Fiacco and K.O. Kortanek, eds., Extremal methods and systems analysis (Springer, Berlin, 1980).
[7] Richard S. Barr and J. Scott Turner, "New techniques for statistical merging of microdata files", in: R. Haveman and K. Hollenbeck, eds., Microeconomic simulation models for public policy analysis (Academic Press, New York, 1980).
[8] Richard S. Barr and J. Scott Turner, "A new, linear programming approach to microdata file merging", in: U.S. Department of the Treasury, 1978 Compendium of tax research (U.S. Government Printing Office, Washington, D.C., 1978) pp. 129-155.
[9] Gordon H. Bradley, Gerald G. Brown and Glenn W. Graves, "Design and implementation of large scale primal transshipment algorithms", Management Science 24 (1) (1977) 1-34.
[10] Edward C. Budd, "The creation of a microdata file for estimating the size distribution of income", Review of Income and Wealth 17 (4) (1971) 317-334.
[11] A. Charnes and W.W. Cooper, Management models and industrial applications of linear programming (Wiley, New York, 1961).
[12] Fred Glover, John Hultz and Darwin Klingman, "Improved computer-based planning techniques, Part I", Interfaces 8 (4) (1978) 16-25.
[13] Fred Glover, David Karney and Darwin Klingman, "Implementation and computational comparisons of primal, dual and primal-dual computer codes for minimum cost network flow problems", Networks 4 (3) (1974) 192-211.
[14] Fred Glover, David Karney, Darwin Klingman and A. Napier, "A computational study on start procedures, basis change criteria, and solution algorithms for transportation problems", Management Science 20 (5) (1974) 793-813.
[15] Fred Glover, Darwin Klingman and Joel Stutz, "Augmented threaded index method for network optimization", INFOR 12 (3) (1974) 293-298.
[16] Joseph Kadane, "Some statistical properties in merging data files", in: U.S. Department of the Treasury, 1978 Compendium of tax research (U.S. Government Printing Office, Washington, D.C., 1978).
[17] James A. Kalan, private communication.
[18] David Karney and Darwin Klingman, "Implementation and computational study on an in-core/out-of-core primal network code", Operations Research 24 (6) (1976) 1056-1077.
[19] D. Klingman, A. Napier and J. Stutz, "NETGEN: A program for generating large scale capacitated assignment, transportation, and minimum cost flow network problems", Management Science 20 (5) (1974) 814-821.
[20] John M. Mulvey, "Pivot strategies for primal-simplex network codes", Journal of the Association for Computing Machinery 25 (2) (1978) 266-270.
[21] Benjamin Okner, "Constructing a new data base from existing microdata sets: The 1966 merge file", Annals of Economic and Social Measurement 1 (1972) 325-342.
[22] Daniel B. Radner, "The development of statistical matching in economics", 1978 Proceedings of the American Statistical Association, Social Statistics Section (1978).
[23] J. Scott Turner and Gary B. Gilliam, "Reducing and merging microdata files", OTA Paper 7, Office of Tax Analysis, U.S. Department of the Treasury, Washington, D.C. (1975).
[24] J. Scott Turner and Gary A. Robbins, "Microdata set merging using microdata files", Research Report, Office of Tax Analysis, U.S. Department of the Treasury, Washington, D.C. (1974).
[25] V. Srinivasan and G.L. Thompson, "Benefit-cost analysis of coding techniques for the primal transportation algorithm", Journal of the Association for Computing Machinery 20 (1973) 194-213.
[26] U.S. Department of the Treasury, Blueprints for basic tax reform (U.S. Government Printing Office, Washington, D.C., 1978).
[27] U.S. Department of the Treasury, The President's 1978 tax program (U.S. Government Printing Office, Washington, D.C., 1978).
Mathematical Programming Study 15 (1981) 23-42. North-Holland Publishing Company
USING GENERALIZED NETWORKS TO FORECAST NATURAL GAS DISTRIBUTION AND ALLOCATION DURING PERIODS OF SHORTAGE

Robert E. BROOKS
Transportation and Economic Research Associates, Los Angeles, CA 90028, U.S.A.

Received 14 June 1978
Revised manuscript received 16 September 1980

During the 1970's the United States began to experience for the first time a shortage in natural gas. Various regulatory agencies, including the Federal Power Commission and state public utilities commissions, developed guidelines for allocation of short gas supplies according to prioritized lists of end-uses. During this period also the author developed a family of natural gas distribution models to assist analysts and decision makers in government and industry to forecast shortages and to evaluate alternative strategies for dealing with them. In this paper the author describes the techniques used in modelling the complex system of natural gas transmission, distribution, and allocation under changing conditions of supply and demand.

Key words: Allocation, Distribution, Linear Programming, Natural Gas, Network Model, Pipelines, Regulation, Transportation.
1. Introduction

1.1. The natural gas delivery system of the United States

Getting natural gas to final consumers in the U.S. is a complex, multi-level process involving thousands of small and hundreds of large private companies, a large number of publicly owned utilities, as well as various state and local regulatory agencies and the Federal Energy Regulatory Commission (FERC). The process begins with private exploration of onshore lands and offshore waters for geological signs indicative of the potential of underground petroleum deposits. While such exploration has traditionally been centered in the Southwest U.S., it is now increasingly moving to more remote areas and deeper waters, as is evidenced by recent activity in Alaska, the Canadian Arctic, the U.S. Outer Continental Shelf (OCS), and the Rocky Mountains. The promise of the OCS has resulted not only in increased activity in the traditional Gulf Coast areas (off Southern Louisiana and Texas), but also off the U.S. East and West Coasts and the Gulf of Alaska. Very recently attention has centered on the overthrust area of the Rocky Mountains [20]. Major finds in this part of Utah and Southwest Wyoming have resulted in several proposals for new pipelines to transport gas both east and west from this area.
Getting Alaskan gas to the lower 48 states is expected to be a major challenge for the gas industry, both physically and financially. It is estimated that the total cost of all linkages in the Alaskan gas transportation system (Alaska, Canadian, Western U.S. and Northern U.S. legs) will run to at least $20 billion, making it the largest single privately financed venture in U.S. history. Additional sources of gas to satisfy the Nation's energy needs have also created widespread interest. Liquefied Natural Gas (LNG) from Algeria, Indonesia, Nigeria, and Venezuela can meet some of this demand, but it also carries with it some worrisome dangers [18]. Conversion of coal, the United States' most abundant fossil fuel, also offers a potentially prolific source of gas, though the costs of proven technologies for doing so are very high [28]. Other potential sources include conversion of organic wastes and biomass into gaseous fuel, gas from coal seams, and dissolved gas in geopressured zones along the Gulf Coast. These sources are in very early stages of development and have not been implemented to any significant degree. President Carter's 1979 address to the Nation on the creation of a Synfuels program and deregulation of high cost gas may result in a much greater effort in these directions, however. These changes in national gas supply are beginning to have a significant impact on the patterns of natural gas delivery in the U.S. There have been several instances of natural gas companies with excess pipeline capacity examining possibilities for conversion to crude oil or products lines. One highly publicized attempt involved El Paso Natural Gas Company and Standard Oil of Ohio (SOHIO) [12]. SOHIO negotiated with El Paso to convert one of El Paso's transmission lines for moving gas from West Texas to California into a crude oil pipeline going in the opposite direction. In May of 1979, however, SOHIO abandoned its plan, citing five years of government red tape as having made the project no longer potentially profitable.
1.2. Model development

To analyze the effect of such changes on the distribution and allocation of natural gas in the U.S., several models have been developed. The most noteworthy of these models are the North American continental oil and gas models of Debanne [7, 8, 10] and the GASNET models developed by the author in the period 1974-1978 [1, 2, 3, 4, 5], culminating with the GASNET3 system to be described in this paper. These models have been utilized by the Department of Energy and its predecessor agencies in several studies involving regional natural gas curtailment forecasting [17], analysis of Alaskan natural gas transportation systems [29], regional alternative fuel demand [26, 27], potential capacity bottlenecks in the gas transportation system in the 1980's and 1990's [22], and capital requirements for new and expanded energy transportation capacity [21]. GASNET3 has also been proposed and is being evaluated for use in gas company planning as a gas balancing model for storage and operations planning.
In the following sections, the structure and use of this important tool will be described.
2. Structure of the GASNET3 model
2.1. Transactions to be modelled

The GASNET3 model is designed to explicitly represent most of the major gas pipeline and distribution companies in the U.S. This representation therefore includes much more than just movements of gas in the pipelines themselves. It also includes transactions (receipts and deliveries) between the producers, the pipelines, the distributors, and the end-users. The transactions which are modelled are only those which involve a physical transfer of gas. These physical transfers take a number of forms, as described below.
2.1.1. Sales by producers to pipeline transmission companies

By far the greatest bulk of producers' sales are to pipeline transmission companies. These sales usually take place at the wellhead, at gasoline plant outlets, or along a pipeline's transmission line. Prices on these sales are governed by the Federal Energy Regulatory Commission and existing contracts between the producers and pipelines. Maximum volumes available are determined by these contracts and by the production capacities of the wells and plants delivering gas to the pipeline.

2.1.2. Sales by producers to distribution companies

Sales by producers directly to distribution companies are usually classified as intrastate sales since the gas sold will not be moving in interstate commerce. Such sales are not regulated by FERC but might be governed by state public utility or conservation commissions.

2.1.3. Sales by producers to consumers

Sometimes producers sell directly to consumers located in or near the producing region. These are usually industrial sales, often involving the petro-chemical industry. Such sales are not Federally regulated. Sometimes the industry in question will have its own pipeline to transport the gas from the producing area to its plant. Sometimes a buyer will purchase gas from a producer and have it transported by pipeline hundreds of miles to the consuming area.

2.1.4. Sales by pipeline transmission companies to distributors

The great majority of pipeline sales are to distributors who then retail the gas to final consumers. These sales are regulated by FERC when the pipeline in question is a jurisdictional pipeline involved in interstate commerce. Prices on
these sales are based on estimates by FERC of the pipeline's cost of service and a fair and reasonable rate of return on its capital rate base.
2.1.5. Sales by pipelines to consumers

Pipeline companies frequently sell gas directly to end-users. Mainline industrial sales are sales to industrial users along a main line of the pipeline. Direct community sales are sales to final consumers by the distribution division of an integrated transmission/distribution company. Mainline and direct sales are not regulated by FERC.
2.1.6. Sales by pipelines to other pipelines

A substantial quantity of natural gas is sold each year by one pipeline to another. These sales can involve either long term arrangements or emergency deliveries. Some pipelines receive all of their gas supply this way (e.g. Algonquin Gas Transmission from its parent company Texas Eastern Transmission). Others use such transfers to level out temporary regional supply-demand imbalances due to weather or other factors. Such interpipeline sales are regulated by FERC when they involve jurisdictional (interstate) pipelines.
2.1.7. Interdepartmental pipeline sales

In several cases pipeline transmission companies are also involved in gas distribution and/or electricity production and distribution. Sometimes the gas department in such companies sells gas to the electricity department to use in gas-fired electrical generators. These are called interdepartmental sales.
2.1.8. Exchanges between pipelines

Another form of transaction between pipelines is called an exchange. In this transaction one pipeline delivers a certain quantity of gas at one location to a second pipeline, and in exchange the second pipeline delivers gas to the first at another location. Due to fluctuating demand and supply, exchanges do not always balance out at the end of each year. Pipelines therefore keep records which measure the net gas owed by or to their exchange partners.
2.1.9. Transportation of gas by pipelines for other pipelines

There are a number of cases where pipelines purchase or produce gas in a producing area in which they have no transmission facilities. In these instances they must contract with other pipelines to transport gas to their own pipelines, which may be a few miles or hundreds of miles away. In these transactions the transporting pipeline acts like a common carrier, charging a tariff to cover its cost and return.
2.1.10. Transportation of gas by pipelines for distributors or consumers

Under certain circumstances distributors and consumers buy gas directly from
producers even though they do not have transportation facilities. In these cases they must arrange with a pipeline to transport the gas for them. In such transactions the pipelines charge a tariff to recover their operating cost and return.

2.1.11. Sales by distributors to consumers

By far the greatest portion of final gas sales is made by distributors to consumers in various "end-use sectors" such as residences, commercial establishments, industries, and so on. These sales are usually made under the regulation of state public utility commissions.

2.1.12. Physical transportation

This "transaction" involves the transportation of gas by the transmission company through its pipeline network for delivery at other locations "downstream". It is here that the greatest physical effort in the system occurs, including the use of part of the gas for pressurization at compressor stations. Limits on the amount of gas which can be transported are determined by the size of the pipeline and the horsepower available at each compressor station. Operating and maintenance expenses incurred in transmission operations can be used to compute the cost of gas transportation. Losses and use of gas as pipeline fuel can also be computed. For jurisdictional pipelines these costs are all involved in the determination of prices which the pipelines can charge in recovering expenses while earning a fair and reasonable return on capital plant investment.

2.1.13. Gas storage

Finally, gas can be injected into storage and withdrawn from storage at a later date when gas demand is greater. Such a transaction may involve a storage facility owned by the pipeline itself, or the facility may be owned by a separate company entirely. Gas distributors are the primary users of gas storage facilities to control both seasonal and shorter term demand fluctuations.

2.2. Model structure

In this section we will formulate a general model of a natural gas delivery system containing all of the transaction types discussed above. This model contains a "generalized network" (linear network with linear gains or losses) as its central component [13].

2.2.1. The natural gas network

A natural gas pipeline is quite naturally represented as a network. It consists of points at which gas enters the system, points at which it leaves, and connections between these points. The points where gas is received or delivered are represented by nodes and the connections between the nodes are represented
by arcs. A typical network model for a single pipeline transmission company might look like that in Fig. 2.1. In this example, circles represent transshipment nodes which are part of the pipeline's physical transmission system. Triangles represent producers or producer groups (sources of supply). Squares represent distributors. X-filled circles represent transshipment nodes of other pipelines. Hexagons represent gas storage areas. Thus the pipeline represented in Fig. 2.1 receives gas from producers in regions 1 and 2, receives additional supplies from another pipeline company in region 7, delivers gas to distributors in regions 2, 5, 6, and 8, and to pipeline companies in regions 4 and 9, and stores gas in region 3. The pipeline's actual network runs from region 1 to 2, 2 to 3, 3 to 4, 4 to 5, 4 to 6, 2 to 7, 7 to 8, and 7 to 9.

In the GASNET3 data base four pieces of data are needed to define each transaction in the network. The first specifies the type of transaction, while the remaining three specify location and companies involved. These transactions can be coded as follows: XS_{ijk}: deliveries by producer group i in region j to pipeline k; XX_{klm}: deliveries by pipeline k in region l to pipeline m; XD_{kln}: deliveries by pipeline k in region l to distributor n; and XT_{kjl}: deliveries by pipeline k from region j to region l. The second letters in the variable names have mnemonic significance: S refers to supplies, D to distributors, T to transshipment, and X to interpipeline crossings.¹

In the case of the natural gas network in the U.S. there are over 100 pipelines
Fig. 2.1. Typical pipeline network model.
¹ In this and the remaining sections gas storage transactions will be considered to be a special form of supply for producer groups. This is a simple way to model a static situation where only one period is being analyzed at a time. Thus those transactions will represent net storage changes in the given period.
such as the one in Fig. 2.1, interconnected in up to three different ways:
- exchanging gas among themselves;
- competing for supply in the same regions;
- competing for sales to distributors in other regions.

Thus a producer class in a given region (triangle) may supply several pipelines in that region, and a distributor in a given region (square) may be supplied by several pipelines in that region. Also, a pipeline may both receive and deliver gas to several other pipelines in a given region.

Two more transactions are needed to complete the model. First of all, distributors sell gas to a number of different consumer classes in a given region, as shown in Fig. 2.2. This simple network shows distributor n in region l delivering gas to consuming sectors 1, 2, ..., N (diamonds). Each of these transactions is symbolized as XC_{nlp}: deliveries by distributor n in region l to consumer class p. (C refers to consumer.) Finally, for cases where producer groups deliver directly to distributors, the component in Fig. 2.3 is needed. These are symbolized by XI_{ijn}: deliveries by producer class i in region j to distributor n. (I refers to intraregional deliveries.)
Fig. 2.2. Distributor network model.
Fig. 2.3. Producer to distributor transactions.
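To make the four-part transaction coding concrete, the following sketch shows one plausible way to represent such records; the class name and field layout are hypothetical illustrations, not GASNET3's actual data base format.

```python
# Hypothetical representation of the four-part transaction coding described
# above; field names are illustrative only.
from dataclasses import dataclass

@dataclass
class Transaction:
    kind: str     # 'XS', 'XI', 'XT', 'XD', 'XX', or 'XC'
    who: int      # producer group, pipeline, or distributor, depending on kind
    region: int   # region of the transaction (origin region for 'XT')
    to: int       # receiving pipeline/distributor/consumer class, or destination region for 'XT'

flows = [
    Transaction('XS', who=1, region=1, to=7),   # producer group 1, region 1 -> pipeline 7
    Transaction('XT', who=7, region=1, to=2),   # pipeline 7 ships from region 1 to region 2
    Transaction('XD', who=7, region=2, to=3),   # pipeline 7, region 2 -> distributor 3
    Transaction('XC', who=3, region=2, to=1),   # distributor 3, region 2 -> consumer class 1
]
```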
2.2.2. Regional aggregation level

To a large extent the level of aggregation chosen by the modeler is situational. In the case of natural gas transactions a great deal of data is available at a very disaggregated level for the interstate pipelines. For example, this data has been compiled by county and by Bureau of Economic Analysis economic area (BEA), as well as by state [22]. Theoretically one could include every junction in the network. While this would be essential if the purpose were dispatching gas in individual pipelines, it is not necessary for a national model used for studying longer time frames and regional distribution patterns. The GASNET3 modelling system has been developed to handle essentially any regional disaggregation the user may want. The data sets currently available to the GASNET3 user provide a model of the natural gas network based on 140 substate level regions in the U.S., Canada, and Mexico. Each state is represented by at least one region. In cases where states can be naturally divided into different distribution service areas or producing areas, this has been done. On the average each state is divided into about three substate regions.
2.2.3. Network parameters and relations

Each transaction in the GASNET3 network is specified by parameters which define its cost, any limitations placed on its size (flow), and any losses that could result. In the case of shipment, the arc cost is simply the unit cost of transportation along that arc and is dependent on the distance and, to some degree, on the quantity of gas transported. In the case of sales to distributors or other pipelines, the cost is the unit cost involved in selling that gas. In the case of producer sales to pipelines or distributors, the cost is identical to the price, since the base price of gas in the ground is zero prior to sale. Two kinds of limitations can be placed on the amount of a transaction: physical and contractual. The physical constraints on a pipeline are determined by the horsepower of its compressor stations and the size of its pipes. These two factors essentially define an upper limit, or pipeline capacity, on transshipments. Depending on the type of connection made and the size of the receiving pipe, similar upper limits can be placed on deliveries to distributors or other pipelines. Producer deliveries are limited by gas field pressures, horsepower at field compressor stations, and by the size of the receiving transmission or distribution lines. Contractual limits are determined by agreements between the various parties (producers, pipelines, distributors, consumers). These agreements call for the producer to provide a definite minimal supply over a certain period of time. Each buyer may have several contracts with different producers in a given region. The time series of dedicated future supply to such a buyer will show a gradual decline as contracts run out. As these contracts expire, new ones may or may not take their place, depending on the ability of the producers to supply new gas and their ability to sell to other buyers who might be willing to pay more. Thus
lower limits on transaction levels can be estimated on the basis of current levels and average contract expiration rates. One final limitation on gas supply involves gas losses. Gas can be lost from the network due to two factors: (1) actual losses in transmission, distribution, and storage, and (2) use of gas as compressor station fuel. In GASNET3 each node (area) is described by an efficiency factor which measures the losses occurring there. Given the parameters specified above, the problem of determining an economically efficient distribution pattern can be expressed mathematically as:

minimize
$$\sum_{(i,j)\in N} c_{ij} x_{ij} \tag{2.1a}$$

subject to
$$\sum_{i\in B_j} e_j x_{ij} - \sum_{k\in A_j} x_{jk} = 0 \quad \text{for } j \in T \cup D, \tag{2.1b}$$
$$\sum_{j\in A_i} x_{ij} = s_i \quad \text{for } i \in S, \tag{2.1c}$$
$$\sum_{i\in B_j} e_j x_{ij} = d_j \quad \text{for } j \in C, \tag{2.1d}$$
$$x_{ij} \le U_{ij} \quad \text{for } (i,j) \in N, \tag{2.1e}$$
$$x_{ij} \ge L_{ij} \quad \text{for } (i,j) \in N, \tag{2.1f}$$
$$x_{ij} \ge 0 \quad \text{for } (i,j) \in N, \tag{2.1g}$$

where $x_{ij}$ is a flow or transaction on arc $(i, j)$ between nodes $i$ and $j$, which may be either an XS, XI, XT, XD, or XC; $e_j$ is the efficiency of pipeline operations in node $j$ (i.e. after accounting for losses); $c_{ij}$ is the unit cost for the transaction; $s_i$ is supply at node $i$; $d_j$ is demand at node $j$; $U_{ij}$ and $L_{ij}$ are upper and lower limits on the transaction $(i, j)$; $A_j$, $B_j$ are the sets of nodes having arcs leading from and to node $j$, respectively; $S$, $T$, $D$, and $C$ are the sets of supply, transshipment, distributor, and consumer nodes, respectively; and $N$ is the set of all arcs in the network. This model is the standard minimum cost transshipment problem with "gains"; however, modifications are in order to represent the current natural gas situation.
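As a concrete illustration of (2.1), the following sketch solves a four-arc toy instance as an ordinary linear program; the network, costs, and efficiencies are assumed values, and GASNET3 itself used the specialized NETG code described in Section 5 rather than a general LP solver.

```python
# Toy instance of model (2.1): two supply nodes feed one transshipment node T
# (with losses), which serves two consumer nodes. A sketch, not GASNET3.
from scipy.optimize import linprog

# Arcs: 0: S1->T, 1: S2->T, 2: T->C1, 3: T->C2
c = [1.0, 1.5, 2.0, 2.5]          # unit costs c_ij
e_T, e_C1, e_C2 = 0.98, 1.0, 1.0  # node efficiencies e_j (losses at T only)

A_eq = [
    [1, 0, 0, 0],                 # (2.1c) supply node S1: x_0 = s_1
    [0, 1, 0, 0],                 # (2.1c) supply node S2: x_1 = s_2
    [e_T, e_T, -1, -1],           # (2.1b) node T: e_T * inflow = outflow
    [0, 0, e_C1, 0],              # (2.1d) consumer C1: e_C1 * inflow = d_1
    [0, 0, 0, e_C2],              # (2.1d) consumer C2: e_C2 * inflow = d_2
]
b_eq = [60.0, 40.0, 0.0, 50.0, 48.0]            # s_1, s_2, 0, d_1, d_2
bounds = [(0, 70), (0, 70), (0, 60), (0, 60)]   # (L_ij, U_ij) per arc (2.1e)-(2.1g)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.status, res.x)          # optimal flows x_ij
```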
2.2.4. Excess demand

The model (2.1) assumes that supplies are sufficient to satisfy demands even after losses (2.1d). In the natural gas market today, and for the foreseeable future, this is not necessarily true. Prices during the 1960's were kept so low relative to the price of oil that a tremendous demand for gas was created. Low prices also reduced investment in exploration and development of gas supplies, which was needed to keep pace with the growing demand. Thus curtailments in gas supply began occurring in the early 1970's. In spite of talk about a glut in gas supply in 1979 and 1980, curtailments are still taking place. While decreases in curtailments
have occurred due to fuel switching combined with increased production due to higher gas prices, excess demand still exists in many areas of the country. Thus (2.1d) must be modified:

$$\sum_{i\in B_j} e_j x_{ij} + E_j = d_j \quad \text{for } j \in C, \tag{2.1d'}$$

where $E_j$ is excess demand for gas which must be satisfied by some alternative fuel (or not at all). In addition, the objective function must also be changed:

$$\sum_{(i,j)\in N} c_{ij} x_{ij} + \sum_{j\in C} P_j E_j, \tag{2.1a'}$$

where $P_j$ is the cost per unit of excess demand at node $j$. Note that the $E_j$ are variables with only one subscript. In a generalized network model this type of arc is called a self-loop [13]. They are, of course, equivalent to slack variables in standard linear programming terminology. The next subsection takes up the problem of setting the costs $P_j$.
2.2.5. Priority of end-use gas allocation

In GASNET3 a number of different end-uses of natural gas are defined. For example, distributors can sell gas to residential, commercial, and a variety of industrial customers, including electric utilities. Because of past and projected gas shortages, FERC and state public utility commissions have attempted to establish a reasonable strategy for allocation of gas by end-use sector. Modelling a priority allocation system in GASNET3 involves choosing the excess demand costs $P_j$ in an appropriate fashion so that the desired allocation will be selected. By setting the value of $P_j$ for the highest priority end-use sector sufficiently high, the model will allocate as much gas as it can to satisfy the demand in that sector before it allocates gas to other sectors. Thus, to produce a hierarchical allocation, the costs $P_j$ should decrease for each lower priority end-use category. The guideline used in GASNET3 for the selection of these costs is that they be sufficiently separated in magnitude so that consecutive categories do not "overlap", i.e., so that no category is only partially satisfied before the next is supplied. This can happen if there is a large differential in transportation costs between two regions. For example, suppose that region 1 has enough gas to supply both high priority and low priority users while region 2 has excess demand in its high priority sector. Let $P_1$ and $P_2$ represent excess demand costs for the two sectors and $C_{12}$ the cost of transporting gas from region 1 to region 2. If an incremental unit of gas available at node 1 is used to reduce the excess demand in sector 2, this will save the system $P_2$ in penalties. If this gas is sent to region 2 to reduce excess demand in high priority sector 1 instead, the system cost will be reduced by $e_2 P_1 - C_{12}$, where $e_2$ is the efficiency of region 2 and $C_{12}$ is the transport cost between 1 and 2. Thus if one selects $P_1$ to be greater than $(P_2 + C^*)/e^*$, where $C^*$
is the maximum transportation cost and $e^*$ the lowest efficiency between two regions, then the system will allocate gas preferentially to sector 1. The reasoning is directly extendable to more than two hierarchical levels (a small numerical sketch of this rule follows Section 2.2.6).

An alternative to this approach would be to make multiple runs, establishing levels of deliveries to priority one in the first run, fixing them, running again to find deliveries to priority two, fixing them, and so on through the priority list. This approach was not taken because of the greatly increased expense which would be involved in attaining each complete solution. Such an approach might be feasible if the solution code had an efficient advanced basis restart mechanism.

2.2.6. Price dependent formulations

Since (2.1) is a linear program, there exists a dual problem whose variables are prices satisfying marginal pricing constraints. This is the type of pricing which should prevail in a competitive unregulated industry. Unfortunately this is not the situation in the natural gas industry. Regulation permeates this industry. Whereas the minimum cost network model might be valid for some purposes, it may not be for others. Thus, for example, for short term forecasting where prices are fixed by contract and regulations, the model (2.1) might serve perfectly adequately for allocation and a cost minimization objective for the system as a whole. If one were interested in equilibrium or market clearing prices, however, one would have to utilize price dependent supply and/or demand relations and the model would become non-linear. Such models can be expressed as quadratic programs [1], but solving large quadratic programs can be expensive and computationally difficult.² An additional complexity involves the modelling of FERC restrictions on prices of gas moved in interstate commerce. Here a cost of service approach is used which involves average costs rather than marginal costs. A non-linear formulation of this situation involves quadratic constraints [1]. Specialized network codes with quadratic capabilities might be sufficiently powerful to solve these problems and reduce computer costs to acceptable levels. Until such codes are available, it will not be possible to fully utilize the non-linear regulated model of the natural gas system.

² Professor J.G. Debanne of the University of Ottawa has pointed out that such models can be solved approximately using step-wise approximations of supply and demand functions [10, 11]. In addition, Enquist and Beck at the University of Texas at Austin have developed some preliminary unpublished results on quadratic network codes.
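The following sketch spells out the penalty-selection rule of Section 2.2.5. It is illustrative only; the number of priority levels, $C^*$, $e^*$, base penalty, and slack are assumed values, not GASNET3's actual settings.

```python
# Sketch of the penalty-selection rule: choose excess-demand costs so that
# P_k > (P_{k+1} + C*)/e* for consecutive priority categories, preventing a
# higher category from being only partially satisfied before the next one.

def priority_penalties(n_levels, c_star, e_star, p_lowest=100.0, slack=1.0):
    """Return penalties, highest priority first, satisfying the separation rule."""
    penalties = [p_lowest]                  # lowest priority category
    for _ in range(n_levels - 1):
        penalties.append((penalties[-1] + c_star) / e_star + slack)
    return list(reversed(penalties))

print(priority_penalties(4, c_star=25.0, e_star=0.9))
# e.g. [234.7..., 185.3..., 140.8..., 100.0]
```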
3. Generation of GASNET3 networks and scenarios
3.1. GASNET3 data base and sources

The GASNET3 data base from which the user draws his network, supply, and demand estimates consists of network definition, production, and consumption
data sets at the substate area level for over 100 pipeline companies and 200 distributors for the base year 1975. This data base is being updated to include 1976-79 data also. The primary sources for this data base have been the Federal Energy Regulatory Commission, the state energy and public utility commissions, and surveys of natural gas distributors. FERC Form 2, Form 14, and Form 15 reports [24] provide highly detailed data on jurisdictional pipelines' transactions, imports and exports, and seasonality of supplies and sales. State regulatory commissions have supplied detailed production data and detailed intrastate pipeline transactions data. Brown's Directory [6] provides the results of annual surveys of gas distribution companies, which include detailed reports of sales and prices to the various consuming sectors within the local franchise areas of the distributor.

3.2. Generating the network

In this section the methods used to establish the parameters defining a network for GASNET3 are described, in particular the transaction levels, pipeline flows, multipliers, costs, and capacities.

3.2.1. Transaction levels and prices

For most XS, XD, and XX transactions, historical levels and prices defined for a selected base period are available directly from FERC Form 2 reports [24] or state gas utility commission reports [23]. In these reports the data is usually presented in a very detailed format which must then be aggregated to the desired level. For example, in the Form 2 reports one pipeline might report sales to a particular distributor at twenty or thirty different places, or on several different contracts at the same place. If these places were located within two different regions defined for the particular level of aggregation being modelled, the sales and prices contained in this Form 2 would have to be split up and summed for each of these two regions. Note that XS transactions include two kinds of gas: purchases from producers and pipeline "own-production", i.e., gas produced from pipeline owned gas wells. XD transactions can also include sales by the pipeline to mainline industrial and other "direct" customers as well as sales for resale to distributors. XX transactions include sales, transportation for others, and exchanges. The XI transactions are given directly in state or company reports, or they must be computed indirectly as the difference between distributors' total receipts and their receipts from pipelines. For this estimate two sources of data are required, for example, Brown's Directory [6] or Moody's Public Utilities Manual [16] for total distributor receipts and Form 2 for receipts from jurisdictional pipelines. Sometimes it is impossible to resolve inconsistencies through the use of these data sources alone and expert judgment has to be used. For example,
several entries in Brown's have been discovered to have incorrect assignments of quantities to units, i.e., MCF versus therms or cubic feet. These data are corrected prior to insertion in the model data base.
3.2.2. Pipeline flows

XT transactions are in a different class from those previously discussed because they are not defined by raw data. Pipeline flows are not sales which are reported. Instead they comprise the movements of gas in the pipeline network. Since these data are not reported, they must be estimated. Note that a state level model and a substate level model do not have the same network. Whereas sales and receipts can be simply summed up over all subregions within each state to get state totals, flows over a substate level network cannot be summed up to get flows on a state level network. In fact, the two network models will in general have an entirely different structure. One mechanism used in estimating historical flows is to build a network model for each company and to use mass balance at each node in the model. (See Fig. 3.1.) Mass balance at each node requires that

$$e_k \Big(\sum_{i\in B_k} XT_{ik} + IN_k\Big) = \sum_{j\in A_k} XT_{kj} + OUT_k \tag{3.1}$$

for each node $k$. Given a reasonable estimate for the set of node efficiency factors $(e_k)$*, and the total receipts ($IN_k$) and deliveries ($OUT_k$) at each node, one can compute the flows ($XT_{ik}$) between each set of connected nodes in the network. Unfortunately, there are some complications. First, the efficiency factors are not necessarily known with certainty. Second, if there are alternative pathways between nodes, a unique solution might not exist. (If all $e_j = 1$, then it would certainly not exist.) The first problem is handled by using an iterative solution mechanism. Initial estimates for $e_k$ are made and used in solving for the $XT_{ik}$. These $XT_{ik}$ are then
"~,,
~
~ XTkj
tN Fig. 3.1. Estimating flows by mass balance. * Node etSciency factors define how much gas is lost in each node.
used to solve for second estimates of $e_k$, which are in turn used to solve for second estimates of $XT_{ik}$, and so on. While this method is heuristic, it has worked without any problem in practice. Two or three iterations usually produce a solution within a few percent of the sequence limits. The second problem is handled by solving the set of linear equations as a linear program with a dummy objective function and pipeline capacities on each arc.
The LP algorithm selects a set of flows which satisfy the capacity constraints of the network. While this might not be a unique solution, it is at least a reasonable one. Positive lower bounds guarantee that no arc will have a zero flow.

3.2.3. Efficiency factors

Efficiency factors for each transmission company are computed by allocating total losses (including pipeline fuel usage) to the various nodes of the company's model. In GASNET3, node rather than arc efficiencies are used. This point of view places greater attention on the nodes, since all the physical equipment of the network, all transaction points, and all losses occur in the nodes, which represent the service areas of the pipelines, while arcs merely represent the boundaries between adjacent areas. Losses in a network are approximately proportional to the distance traveled and the quantity moved, and inversely proportional to the pipe diameter [9]. To allocate total pipeline losses to each node, we use the total pipeline "distance" and "volume" for each node. Pipeline "distance" is computed as the inventory of pipe mileages in each region weighted by cross-section. Thus four six inch pipes 10 miles long would be equivalent to one twelve inch pipe 10 miles long or one twenty-four inch pipe 2.5 miles long. The total input to each node, i.e., receipts plus flows in from other nodes, is used for pipeline "volume". Losses allocated to node $j$ are then

$$L_j = \frac{V_j D_j / d_j}{\sum_k (V_k D_k / d_k)}\, L, \tag{3.2}$$

where $L$ is total company losses, $V_j$ is volume in $j$, $D_j$ is distance in $j$, $d_j$ is the average pipe diameter, and the summation is over all nodes in the company. The true engineering relation between $L_j$ and flow quantity $XT_{ij}$ is highly non-linear and must therefore be considered approximately linear only within a sufficiently narrow band about the historical flow rate used in the estimation process [11]. The efficiency factor is then computed as

$$e_j = 1 - L_j / V_j. \tag{3.3}$$

Note that this is equivalent to (3.1), where

$$V_j = \sum_{i\in B_j} XT_{ij} + IN_j \tag{3.4}$$

and

$$V_j - L_j = \sum_{k\in A_j} XT_{jk} + OUT_j. \tag{3.5}$$

As stated previously, computing $e_j$ using (3.2), (3.3), and (3.4), one generates an improved value for the flows $XT_{ij}$. These values are then used to calculate refined estimates for $e_j$.
3.2.4. Transmission costs

Costs to be ascribed to transmission arcs (XT) are derived in a similar fashion. In this case, in place of the distance concept of Section 3.2.3, we use the geographical distance between two points on the pipeline located in neighboring regions as the pipeline distance, and pipeline arc flow as the volume factor. The unit average cost for arc $(i, j)$ is then

$$c_{ij} = \frac{M_{ij} \cdot XT_{ij}}{\sum_{(i,j)\in N} M_{ij}\, XT_{ij}}\, C, \tag{3.6}$$

where $C$ is total company transmission cost excluding the cost of fuel used for pumping, $M_{ij}$ is the mileage between nodes $i$ and $j$, $XT_{ij}$ is the estimated flow between $i$ and $j$, and $N$ is the set of all arcs in the company.
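A short sketch of the allocation in (3.6), with assumed figures: total company transmission cost is spread over arcs in proportion to mileage times flow.

```python
# Cost allocation per (3.6); all numbers are illustrative assumptions.
C = 1_000_000.0                          # total transmission cost, excluding pumping fuel
arcs = {("r1", "r2"): (150.0, 39.3),     # arc: (mileage M_ij, estimated flow XT_ij)
        ("r2", "r3"): (240.0, 33.3)}

denom = sum(M * xt for M, xt in arcs.values())
cost = {a: M * xt / denom * C for a, (M, xt) in arcs.items()}   # cost ascribed to each arc
print(cost)
```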
3.2.5. Price markups

Price markups must be computed for each transaction arc in the model. These represent the "costs" of these transactions. For XS and XI we simply use the wellhead price, since it represents the increase in value from the ground to first sale. If the prices for transactions are known, then markups can be computed as price minus cost at that point. But the average cost of gas at any point in the network is determined by the average cost of each of its components. An equation for the average cost at any node can be stated as

$$C_j = \Big(\sum_k P_{kj} X_{kj} + \sum_{i\in B_j} (C_i + c_{ij})\, XT_{ij}\Big) \Big/ \Big(\sum_k X_{kj} + \sum_{i\in B_j} XT_{ij}\Big), \tag{3.7}$$

where the $C_j$ are node costs, $P_{kj}$ prices of transactions $X_{kj}$ between company $k$ and company $j$, and $c_{ij}$ transmission costs for arc flow $XT_{ij}$, and the summations over index $k$ refer to all companies $k$ delivering gas to company $j$. If $P_{kj}$, $X_{kj}$, $c_{ij}$, and $XT_{ij}$ are known, this becomes a set of simultaneous equations for the $C_j$. When these are solved, the markup for transaction $(j, k)$ can be computed simply as

$$q_{jk} = P_{jk} - C_j. \tag{3.8}$$
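The simultaneous equations for the node costs are linear in the $C_j$ and can be solved directly. The following sketch does so for a two-node chain; all prices, flows, and costs are assumed numbers, not data from the GASNET3 base.

```python
# Illustrative solve of the node-cost equations (3.7) for a two-node chain.
# Node 1 buys 50 units at $1.80; node 2 buys 20 units at $2.10 and receives
# 39.5 units from node 1 at cost C_1 + c_12.
import numpy as np

c12 = 0.15                       # transmission cost on arc (1, 2)
# unknowns: C_1, C_2; node 2: C_2 - (39.5/59.5)*C_1 = (20*2.10 + 39.5*c12)/59.5
A = np.array([[1.0, 0.0],
              [-39.5 / 59.5, 1.0]])
b = np.array([1.80,
              (20 * 2.10 + 39.5 * c12) / 59.5])
C1, C2 = np.linalg.solve(A, b)
markup = 2.50 - C2               # (3.8): markup for a sale priced at $2.50
print(round(C2, 4), round(markup, 4))
```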
3.2.6. Capacities

Capacities for XT transactions are not computed for the GASNET3 data base, but are estimated on the basis of FERC Flow Diagrams [25]. These documents are submitted by the pipelines each year to indicate the structure of their pipelines and give information on compressor station capacity at points in their networks. Sometimes these capacities are not given and must be estimated. In some cases the data is better and more simply related in FERC Form 2 reports than in the Flow Diagrams. Upper bounds on XS, XI, XX, and XD arcs are preferably specified on the basis of actual physical limitations. When this data is not available, an alternative is to use historical data on peak day or peak hour sales. One can then scale these data to the base time period of the network data base for use as upper limits on these transactions. Where none of these data are available, one can use average load factor estimates which, when divided into historical transaction levels, will produce estimates of upper bounds on these transactions (a one-line sketch of this calculation follows Section 3.3).

3.3. Supply and demand base data sets

In the GASNET3 data base the user has access to supply and demand data sets for use in forecasting future supply and demand. These data sets are actually production and consumption data sets for the base year 1975 by state and substate areas. The production data sets are based on production estimates filed by state energy and conservation commissions and collected and published by the International Oil Scouts Association [30]. Consumption data is based on distributor sales as compiled and published by Harcourt, Brace, Jovanovich [6].
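The load-factor fallback of Section 3.2.6 amounts to a single division; the sales level and load factor below are assumed numbers.

```python
# Load-factor estimate of an arc capacity bound; numbers are illustrative.
annual_sales = 36_500.0                     # historical transaction level (MMcf/year)
load_factor = 0.65                          # assumed average load factor
upper_bound = annual_sales / load_factor    # estimated capacity bound U_ij for the arc
```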
3.4. Generating scenarios

To generate a scenario, the user of GASNET3 first selects network definition, supply, and demand data sets from the GASNET3 data base. He then chooses a forecast date, inflation rate, allocation strategy, and annual contract expiration rates which represent his assessment of the situation he wishes to analyze. The latter parameters are used to establish lower bounds on certain transaction levels by deflating the base period transaction quantities at an annual rate equal to the expiration rate for that type of transaction. In addition, he may choose growth (or decline) rates for supply and demand data sets if he is using historical base period production and consumption data sets in the GASNET3 data base. Alternatively, he can prepare his own supply and demand data or network.
3.5. Running the model

Upon generating a scenario, the user can then run the model to compute the optimal distribution strategy for gas given his particular mix of constraints and
allocation strategy. The network representing this problem is solved using a generalized network algorithm and the solution printed for the user in any of eight reports individually accessible to him.
4. Availability and application of the model

GASNET3 is currently operational as an analytical tool for use by TERA and other firms on the BCS Mainstream EKS system. Applications of GASNET3 have included its use by TERA as the basis for the National Energy Transportation Study [22] and other Department of Energy analyses of energy transportation facilities [21]. GASNET3 is also being examined by two natural gas companies as a possible tool for evaluating the local supply effect of imported natural gas from Mexico on their own systems. Earlier models in the GASNET family have been used by the Department of Energy and predecessor agencies in the analysis of the Alaskan natural gas transportation system, short term natural gas curtailments, and regional alternative fuels demand. For more localized applications, GASNET3 is currently being evaluated as a replacement for the existing Gas Balancing Model of a large natural gas distributor in the United States. In this application GASNET3 will be used in a multi-period mode for planning gas storage injections and withdrawals to account for seasonal and weather related fluctuations in demand. Its primary advantage over the current system, which is based on an out-of-kilter approach [19], is that it can account for losses in compressor stations through the use of a generalized network modelling structure. The primary modelling change necessitated in this usage, as compared to the National model just described, is that the multi-company one-period model becomes a single-company multi-period model. Indices corresponding to companies are reinterpreted as times. Otherwise the formulation remains the same.
5. Computational experience with the model

Early tests with the GASNET3 model showed that use of a multiple purpose linear programming language such as IBM's MPSX to solve for flows on its large (2000 nodes, 5000 arcs) networks would be impractical. Therefore, a special purpose generalized network program called NETG [15] was tested for this purpose and found to reduce costs by a factor of 40 to 1 compared to MPSX [4]. NETG is a computerized implementation of the highly efficient Extended Augmented Predecessor Index (EAPI) procedure applied to generalized network
Table 5.1. Results of GASNET3 test
Size: 2180 nodes, 4734 arcs (quantities are CPU seconds)

Module    IBM 370/158    CYBER 175    Ratio
INPUT     9.60           2.79         3.44
SETUP     16.71          4.41         3.79
CHECK     6.44           2.29         2.81
NETG      38.58          15.43        2.50
REPORT    13.53          12.01        1.13
Total     84.86          36.93        2.30
problems [14]. This code extends the capabilities of other network algorithms by allowing positive, negative, and zero multipliers. A typical test of a GASNET3 scenario produced the results in Table 5.1. The names in the left column represent five modules within the GASNET3 system. INPUT and SETUP are used to generate a scenario and prepare it for solution. CHECK is used to examine the scenario for infeasibility prior to solution. NETG solves the network. REPORT is the GASNET3 report writer. Note the relative strengths of the IBM and CDC systems. CDC does much better in data crunching (NETG) and only slightly better in input/output (REPORT), but with a higher cost per CPU second. Cost for a typical run ranges between $75 and $150 for both machines, depending on the reports selected.
6. Areas of continuing research and development

A number of additions are planned to the current data base and program set. The data base is scheduled for a complete update to include 1976-1979 data. Additional effort will be aimed at increasing the comprehensiveness of the alternate fuel data base, including SNG, manufactured gas, bottled propane, etc. Further research and development in the area of price related models will also be undertaken. This will include an examination of the potential of currently available generalized network codes in handling non-linear relations such as those described previously. At least one pricing module will be selected for inclusion in the GASNET3 program set. Research in the area of econometric estimation of supply and demand relations for use in GASNET3 forecasts will also be needed in conjunction with work in the area of pricing. This will enable the development of an equilibrium model with price dependent supply and demand. Finally, research efforts will be undertaken in the application of computer graphics to GASNET3 solutions. Plotting of distributions on national or regional
maps, computerized mapping of transmission systems, and schematics of the pipeline network or portions of it are all prospective uses for a GASNET3 computer graphics capability.
Acknowledgment

The author wishes to acknowledge and thank Joe Debanne, John Mulvey and Darwin Klingman for their valuable criticism and recommendations during the course of preparation of this paper.
References

[1] R.E. Brooks, "Allocation of natural gas in times of shortage: A mathematical programming model of the production, transmission, and demand for natural gas under Federal Power Commission regulation", unpublished Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA (1975).
[2] R.E. Brooks, "GASNET: A mathematical programming model for the allocation of natural gas in the United States", Working Paper #34-1976, Graduate School of Business Administration, University of Southern California, Los Angeles (1976).
[3] R.E. Brooks, "The development and implementation of the FEA natural gas transmission model", unpublished final report, Chase Econometric Associates, Inc., Bala Cynwyd (1976).
[4] R.E. Brooks, "The development and implementation of GASNET2: The EPRI natural gas transmission, distribution, and pricing model", unpublished final report, Robert Brooks and Associates, Norwalk (1977).
[5] R.E. Brooks, GASNET3 natural gas transportation modelling system version 1.1, preliminary user's guide, Robert Brooks and Associates, Norwalk (1978).
[6] Z. Chastain, ed., Brown's Directory of North American Gas Companies (Harcourt, Brace, Jovanovich Publications, Duluth, 1976).
[7] J.G. Debanne, "A model for continental oil supply and demand", Journal of Petroleum Technology (1971).
[8] J.G. Debanne, "A systems approach for oil and gas policy analysis in North America", Proceedings of the 8th World Petroleum Conference (Moscow, 1971).
[9] J.G. Debanne, "The optimal design of natural gas pipelines", Working Paper 77-28, University of Ottawa, Ottawa (1977).
[10] J.G. Debanne, "A regional techno-economic energy supply-distribution-demand model for North America", Computers and Operations Research 2 (1975) 153-193.
[11] J.G. Debanne, communication to the author (1979).
[12] Foster Associates, Foster natural gas report 1200, Washington (1979) 1.
[13] F. Glover, J. Hultz, D. Klingman and J. Stutz, "Generalized networks: A fundamental computer based planning tool", Management Science 24 (1978).
[14] F. Glover, D. Klingman and J. Stutz, "Extensions of the augmented predecessor index method to general network problems", Transportation Science 7 (1973) 374-384.
[15] D. Klingman, NETG user's guide, Analysis, Research and Computation, Austin (1977).
[16] Moody's Investor Service, Moody's Public Utilities Manual, New York (1976).
[17] J. Neri, "Short term natural gas forecasting model", Federal Energy Administration, Bureau of Applied Analysis, Washington (1977).
[18] Office of Technology Assessment, Transportation of Liquefied Natural Gas, USGPO (1977).
[19] SHARE Distribution Agency, "Out-of-kilter network routine", Hawthorne, New York (1967).
[20] T.D. Stacy, World Oil (Gulf Publishing, Houston, August 1979) pp. 40-41.
[21] TERA Inc., Capital cost of new and expanded capacity for the transportation of energy materials, prepared under DOE Contract EC-77-C-01-8596, Arlington (1978).
[22] TERA Inc., Disaggregating regional energy supply demand and flow data to 173 BEAs in support of the National Energy Transportation Study, prepared under DOE Contract EJ-78-C-01-6322, Arlington (1979).
[23] Texas Railroad Commission, Annual gas utilities reports (Austin, annual).
[24] U.S. Federal Energy Regulatory Commission, Form 2, Form 14 and Form 15 reports of interstate pipeline companies (FERC, Washington, annual).
[25] U.S. Federal Energy Regulatory Commission, Flow diagrams of the interstate pipeline companies (FERC, Washington, annual).
[26] U.S. Federal Energy Regulatory Commission, Final environmental impact statement, Cities Service Gas Company, Docket No. RP75-62, FERC, Washington (1978).
[27] U.S. Federal Energy Regulatory Commission, Final environmental impact statement, Northern Natural Gas Company, Docket No. RP76-52, FERC, Washington (1975).
[28] U.S. Federal Energy Regulatory Commission, Natural gas survey, synthesized gaseous hydrocarbon fuel, USGPO (1978).
[29] U.S. Federal Power Commission, Natural gas distribution analysis, the lower 48 states, El Paso Alaska Company, Docket No. CP75-96 et al., FPC, Washington (1977).
[30] J.L. Wiggins, ed., International oil and gas development yearbook, Part II, 46, International Oil Scouts Association, Austin (1976).
Mathematical Programming Study 15 (1981) 43-57. North-Holland Publishing Company
A BRANCH AND BOUND NETWORK ALGORITHM FOR INTERACTIVE PROCESS SCHEDULING

Thomas E. BAKER
Exxon Corporation, Florham Park, NJ, U.S.A.

Received 8 February 1980
Revised manuscript received 12 July 1980
A multi-facility multi-product production scheduling problem is considered in terms of a general class of process unit operations scheduling problems which are common in the refining and chemicals processing industries. A generalized network formulation is used to model the conversion of unit processing capacity to finished products. A specialized branch and bound algorithm is used to enforce the restriction that only one operation can be run per unit at any given time. The algorithm minimizes total costs, which consist of unit operating costs, processing costs, inventory holding costs, setup and changeover costs. A procedure is developed by which the setup and changeover costs are used to estimate bounds for the network model in the branch and bound algorithm. All other costs are incorporated in the network formulation. It is shown that the algorithm is more efficient in those problems for which the setup and changeover costs are small, or in problems in which a lower bound for the setup and changeover costs can be accurately estimated. The implementation of the algorithm in an interactive process scheduling system is discussed in terms of the human engineering factors involved.
Key words: Production Scheduling, Interactive Scheduling, Branch and Bound, Network Model.
1. Introduction

In the refining, chemicals processing, and manufacturing industries there is a wide variety of unit operations scheduling problems which have a very similar set of characteristics. The units represent the processing capabilities which must be allocated to various operations which produce the different grades of product. In the class of problems addressed by this paper, the demands over time for the products may be discrete or continuous and are considered to be known. The general problem is one of scheduling the operations of the units in such a way as to keep the inventories of all grades within their limits, while minimizing the total operating costs, which consist of inventory holding costs, setup costs, changeover costs, operating costs for the units, and processing costs for the operations. The scheduling function generally consists of allocating the available unit capacity to the various operations, of sequencing the operations on each unit, and of determining the optimal run length for each operation.

The recent literature contains many references to multi-facility, multi-commodity production scheduling algorithms, and the general approach reported in
the present paper parallels many of those efforts. Most notable among these references are the uses of network production models reported by Dorsey, Hodgson and Ratliff [1, 2], and Love and Vemuganti [6]. However, the present approach appears to be unique for a number of reasons. First, the algorithm resides in an interactive scheduling system. The human scheduler who uses this system, in general, has intimate knowledge of his scheduling problem, but has no knowledge of mathematical programming. Thus, the algorithm must be truly robust in that it must be able to withstand unsympathetic intervention on the part of the user. Second, the network model used by the algorithm is transparent to the user, and is considered by the rest of the scheduling system to be only an approximate model. This property changes completely the way in which bounds are treated in the branch and bound process, as will be discussed in a later section. Third, due to the specialized structure of the network formulation for the process scheduling problem, the branch and bound process itself can be reduced to a special compact and efficient form.
2. Interactive environment
As mentioned in the introduction, the user of the process scheduling system understands all the critical aspects of his problem, and is quite capable of producing good schedules on his own. It is well recognized that the user may have knowledge of scheduling considerations which are impossible to capture within any given model in the scheduling system. For this reason, the user is given complete control over the schedules produced by the system, and is given the capability to interact with the optimization process at almost any level. Unfortunately, when the combinatorial nature of a scheduling problem becomes too complex, the human scheduler tends to rely on simplifying rules of thumb. The real role of the optimization algorithms in the process scheduling system is to guide the user away from his cherished rules of thumb when those rules of thumb become nonoptimal.

With these considerations in mind, the interactive process scheduling system was designed with four basic elements (data, solution, simulation and algorithms) as shown in Fig. 1. Through a CRT the user is able to display and manipulate the basic data in the system which define the scheduling problem. Through the screen he can define the complete plant layout, the characteristics of production units, processing options, storage configurations, demands for finished products, availability of raw materials, etc. He is also able to display the current schedule, or solution, in the system, which is evaluated by means of a deterministic simulation model. Every change which the user makes to the basic data, or to the current schedule, is evaluated immediately by the simulator in order to produce a new solution screen. Thus, the model which resides in the simulator, and which was defined
[Figure: the four basic elements DATA (basic data and sequence or assignment data), SIMULATION, SOLUTION (schedule and evaluation) and ALGORITHM, connected by the user actions CHANGE DATA, RUN SIMULATION, ALTER SOLUTION and RUN ALGORITHM, by the TEST loop REPLACE INCUMBENT SOLUTION WITH BETTER SOLUTION, and by INDIRECT CONTROL OF ALGORITHM through the basic data.]
Fig. 1. Interactive scheduling system.
by the user through alterations to the basic data tables, becomes the reference model for all algorithms which reside in the process scheduling system. Next to the user's mental model of the scheduling problem, the simulator becomes the best model available. The optimization algorithms in the system may operate on approximate models, such as our network model. However, as indicated by the TEST loop in Fig. 1, the solutions produced by these algorithms are generally checked against the simulation model before being displayed on the solution screen. As a result of the above structure, the branch and bound algorithm, which will be described in Section 4, takes on a unique form. During the branching process the evaluations of the objective function produced by the network model are considered to be partial evaluations of the cost, and are used only to estimate bounds for the branching process. During the branching process, when a true evaluation of the total cost is required, the system relies on the simulator. A lengthy consideration of the human factors involved in the process scheduling environment has led us to design an interactive system which is neither conversational nor overtly didactic. Users quickly adapt to representations of their problem in terms of data arrays and incidence tables and, in general, prefer not to be encumbered with a predetermined conversational mode of operation. Online descriptions of system functions are available upon request but these features are seldom used after the initial training phase.
To the extent that it is possible on a CRT, scheduling information is presented to the user in a spatial rather than numerical form. For example, on the main scheduling screen time is represented on the horizontal axis and each major scheduling resource is assigned a number of lines on the vertical axis. Operations are represented as horizontal blocks whose horizontal dimensions are proportional to the processing times. The resulting bar chart allows the user to see at a glance the state of his production units at any point in time. Critical inventory problems are flagged on an exception basis at the point in time when the problem occurs. In a similar vein, the user is provided with the facility of altering sequences and run lengths by simply moving characters about the scheduling screen. Reliance on spatial representations eliminates the need for the user to remember specialized input formats and enables the user to make full use of his innate pattern recognition capabilities.
3. Network model
The network model used by the algorithm is a simple multi-time period network flow model which follows unit processing capacity and the inventories of the different grades of products. The processing model is shown in Fig. 2.
[Figure: the total capacity of a unit flows through a chain of period nodes (capacity after period k1, etc.); the arcs leaving each period node carry the days allocated to operations P_1, P_2, ..., P_p; all flows are in days.]
Fig. 2. Processing model.
The scheduling time horizon for each unit is divided into a number of scheduling periods. As indicated by the notation used in Fig. 2, the scheduling periods for each unit are independent of the periods associated with other units. Each period is represented in Fig. 2 by a node. The flows out of these nodes, which intersect with the inventory model, represent the number of days which each unit time period spends on running the various operations denoted by P_p. Processing costs by unit by operation are assigned to these processing arcs. All flows in the processing portion of the model are in days. The structure of the inventory model is shown in Fig. 3. For each grade which is followed in the model, the scheduling time horizon is broken up into a number of fixed time periods. These time periods are independent for each grade. All flows in the inventory model are in tons. Inventory holding costs are assigned to those arcs which represent the inventory at the end of each time period. Product is added to the inventory portion of the model through the processing arcs, P_p. Each operation, or process, deals with only one grade, but more than one process can make the same grade. The processing model (in days) and the inventory model (in tons) are tied together through the processing arc multipliers, r_kjp (in tons/day), which represent the rate at which unit k can produce grade j through operation p. The time periods in the inventory model are fixed in length, while the processing model has variable length time periods.
[Figure: for each grade G_j, a chain of inventory nodes (periods j1, j2, ...) with starting inventory, demand in period j1, demand in period j2, inventory at the end of each period, and closing inventory; processing arcs enter with multipliers, e.g. P_1*r_1j1 and P_2*r_2j2, the arc P_p*r_kjp converting days on unit U_k through operation P_p into tons of grade G_j; all flows in tons.]
Fig. 3. Inventory model.
A solution to the combined model yields, for each unit, a sequence of operations and the run length of each operation. It should be noted that the model is approximate in that the length of a unit production period is not forced to equal the length of a demand period with which it is associated. The network model, as posed above, has two main limitations: setup and changeover costs for moving from one operation to another cannot be represented, and the formulation does not prevent more than one operation from being run on a given unit in any production period. Both of these limitations will be overcome to some degree in the branch and bound algorithm. The combined network model represented in Fig. 4 shows a number of possible operations, P_p, emanating from each production period node. These groups of arcs, as indicated in Fig. 4, may be thought of as multiple choice decisions. However, since the term multiple choice usually refers to a set of 0-1 decisions, the term cardinal group will be used throughout the following.
[Figure: the processing and inventory models joined by cardinal groups of processing arcs, e.g. the arc with multiplier r_111 into grade G_1.]
Fig. 4. Combined model with cardinal groups.
Definition 1. A cardinal group is a set of non-negative variables for which there is an implied constraint that at most one variable is allowed to take on a positive value. A cardinal group is called cardinal feasible if the implied constraint is met. A solution to the network problem is considered cardinal feasible if all the cardinal groups in the solution are cardinal feasible.
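To make Definition 1 concrete, the check below is a minimal sketch in Python; the group index lists `groups` (the sets I_g) and the flow values `v` are assumed inputs of this illustration, not part of the original system.

```python
def cardinal_feasible(groups, v, tol=1e-9):
    """Definition 1: a solution is cardinal feasible when every cardinal
    group I_g contains at most one variable with a positive value."""
    return all(sum(v[i] > tol for i in Ig) <= 1 for Ig in groups)

# Example: the first group has two positive arcs, so the solution fails.
groups = [[0, 1, 2], [3, 4]]
v = {0: 3.5, 1: 1.0, 2: 0.0, 3: 0.0, 4: 2.0}
print(cardinal_feasible(groups, v))  # False
```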
4. Branch and bound algorithm

The decision process, which involves a series of choices among groups of continuous variables, has two special properties which suggest a special purpose branch and bound algorithm. First, there are no integer variables, only discrete relationships between continuous variables. Second, the multiple choice nature of the variables in the cardinal groups greatly reduces the size of the branching tree, i.e., if one of the variables in a cardinal group is chosen to be positive, all the other variables in the group must be bounded at zero. The following branch and bound algorithm, which takes advantage of these properties, is a depth-first branching algorithm which maintains a master list of the unsolved subproblems. This master list is processed on a last-in, first-out basis, a simple backtracking enumeration scheme. As indicated in Fig. 4, every arc which connects the processing model with the inventory model is a member of a cardinal group. In general, there will be one cardinal group for each unit time period. As shown in the appendix, when the above formulation is represented in algebraic form, these processing arcs, or cardinal group arcs, would have to be written with five subscripts indicating the allocation of operation p to unit k in time period m producing grade j in demand period n. In order to avoid this notation in the following, the cardinal group arcs will be labelled simply by the index i. Let v_i be the value of variable i, let I_g be the set of indices for the variables in cardinal group g, let G be the set of indices for all the variables in cardinal groups, and let s_i be the state of the bounds imposed on variable i:

If s_i = 1 or 2, then 0 ≤ v_i ≤ MAX_i; if s_i = 0, then v_i = 0. (1)

Let S = {s_i} be the working set of state variables of the cardinal groups. During the branching process, the bounds on the processing arcs in the network model are determined by the state variables in the working set, S. Unsolved subproblems are defined by storing an altered version of the working set in the master list of unsolved subproblems. The stored subproblems are retrieved from the master list to the working set during the backtracking process.
4.1. Bound evaluation

The following algorithm assumes that the total objective function, f, is made up of two components: f_N, the objective function produced by the network model, and f_S, the evaluation of the setup and changeover costs:

f = f_N + f_S. (2)

If we can develop a lower bound, b_S, for the setup and changeover costs, f_S, then we have a lower bound on f in terms of the objective function from the network model, f_N:

f ≥ f_N + b_S. (3)

If we let b represent the value of the incumbent best solution, the usual branch and bound test, f ≤ b, can now be expressed in terms of the objective function of the network model:

f_N ≤ b − b_S = b_N. (4)

This network bound, b_N, replaces the bound usually associated with the branch and bound process. However, it is only an estimated bound. The incumbent best solution is not replaced unless the total objective function, f, is less than the total bound, b. Methods of estimating the setup and changeover cost bound, b_S, will be discussed in Section 4.3.
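As a worked illustration with assumed numbers: if the incumbent best schedule costs b = 100 and the setup and changeover bound is estimated at b_S = 12, then b_N = b − b_S = 88, and a subproblem whose network solution gives f_N = 91 > 88 can be discarded by (4) without a simulator call, even though its true total cost is never computed.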
4.2. The Algorithm (ZAP)

Initialize the working set, S, by setting s_i = 1 for all i in G. Set the overall objective function bound, b, equal to infinity unless a valid current solution is available to produce a starting bound. Estimate a lower bound, b_S, for the setup and changeover costs.

Step 1: Solve the network model with bounds on the arcs determined by the working set, S, according to (1).

Step 2: If the solution is infeasible, or if the objective function from the network model, f_N, is greater than the network bound, b_N = b − b_S, proceed to Step 3. Otherwise, proceed to Step 4.

Step 3: Terminate if the master list is empty. Otherwise, restore the working set, S, from the last subproblem stored in the master list and return to Step 1.

Step 4: If the solution is cardinal feasible, proceed to Step 7. Otherwise, find the first cardinal group, g, which is not cardinal feasible and proceed to Step 5.

Step 5: Choose c such that

v_c = min{v_i | v_i > 0, i ∈ I_g}. (5)

Proceed to Step 6.

Step 6: Create a stored set by copying the working set with the following alterations:

s_i = 2 if i = c; s_i = 0 if i ∈ I_g, i ≠ c. (6)

Add the characteristics of the stored set to the master list. Alter the working set by setting s_c = 0. Return to Step 1.

Step 7: Evaluate the total objective function, f. If f > b, proceed to Step 8. Otherwise, store the solution as the best incumbent solution, let b = f, and proceed to Step 8.

Step 8: If v_i = 0 for all i in G for which s_i = 1, return to Step 3. Otherwise, choose c such that

v_c = min{v_i | v_i > 0, s_i = 1, i ∈ G} (7)

and return to Step 6.

The logic followed by the above algorithm is first to produce cardinal feasible solutions with minimum distortion (Step 5) and then to minimize the number of production runs in the schedule (Step 8). At any level in the branching tree, it is possible to restore the last subproblem from the master list if it is known which variable was selected at each previous level in Step 5 and the level at which each cardinal group was restored in Step 3. Thus the maximum total storage required for the master list is equal to the number of cardinal group arcs plus the number of cardinal groups.

In the current production version of the ZAP algorithm, the network subproblems are solved by the generalized network code, NETG [4]. The total objective function, f, is evaluated by means of a deterministic simulation model. The solution to the network problem produces a sequence of operations for each unit and the run length for each operation which, in turn, serve as input to the simulation model. At present, the lower bound on the setup and changeover costs is determined by a simplistic calculation, which first determines which operations have to be run for a feasible schedule, and then determines the minimum setup and changeover costs for running those required operations.

Obviously, the value of b_S is critical to the bounding efficiency of the above algorithm. If the setup and changeover costs represent a large component of the total objective function, or if it is not possible to estimate a good lower bound, b_S, then the algorithm becomes less efficient in its search for the best solution. In the worst case, where all of the costs in the total objective function are made up of setup and changeover costs, the algorithm degenerates to a complete enumeration of all feasible production sequences.
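The Python sketch below shows one way the search loop just described might be organized. It is an illustration, not the production code: `solve_network` (standing in for a call to a network code such as NETG) and `simulate` (standing in for the deterministic simulation model) are hypothetical callbacks, and the working set S is kept as a plain dictionary of the states defined in (1).

```python
def zap(groups, solve_network, simulate, b_S, b=float("inf")):
    """Depth-first ZAP search; `groups` are the index lists I_g."""
    G = [i for Ig in groups for i in Ig]
    S = {i: 1 for i in G}          # state (1): 1 or 2 -> 0 <= v_i <= MAX_i; 0 -> v_i = 0
    master, best = [], None        # LIFO master list of stored working sets
    while True:
        feasible, f_N, v = solve_network(S)              # Step 1
        if not feasible or f_N > b - b_S:                # Step 2: prune on f_N > b_N
            if not master:                               # Step 3
                return best, b
            S = master.pop()
            continue
        g = next((Ig for Ig in groups
                  if sum(v[i] > 0 for i in Ig) > 1), None)   # Step 4
        if g is not None:
            c = min((i for i in g if v[i] > 0),
                    key=lambda i: v[i])                  # Step 5, eq. (5)
        else:
            f = simulate(v)                              # Step 7: true total cost
            if f <= b:
                best, b = dict(v), f
            cands = [i for i in G if S[i] == 1 and v[i] > 0]
            if not cands:                                # Step 8: nothing left to drop
                if not master:
                    return best, b
                S = master.pop()
                continue
            c = min(cands, key=lambda i: v[i])           # Step 8, eq. (7)
            g = next(Ig for Ig in groups if c in Ig)
        stored = dict(S)                                 # Step 6, eq. (6):
        stored[c] = 2                                    #   fix c in the stored set,
        for i in g:
            if i != c:
                stored[i] = 0                            #   zero the rest of its group
        master.append(stored)
        S = dict(S)
        S[c] = 0                                         # working set: exclude c
```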
4.3. Extensions The above algorithm has been coded with state variables defined as in (1) in order to interact more efficiently with the network code, NETG. However, the
algorithm can be extended to semi-continuous branching problems with positive lower bounds on processing arcs, and to bivalent mixed integer programming problems. In order to handle semi-continuous flows in cardinal groups, (1) is replaced by:

If s_i = 1, then 0 ≤ v_i ≤ MAX_i; if s_i = 0, then v_i = 0; if s_i = 2, then MIN_i ≤ v_i ≤ MAX_i. (8)

For bivalent variables, (1) is replaced by:

If s_i = 1, then MIN_i ≤ v_i ≤ MAX_i; if s_i = 0, then v_i = MIN_i; if s_i = 2, then v_i = MAX_i. (9)
In the bivalent case, the variables have to be rescaled by their lower bounds in order to maintain the definition of cardinality. For both the semi-continuous and bivalent cases, the branch and bound algorithm outlined above is valid if Step 7 is altered to evaluate the total objective function only if the positive variables satisfy the bounds implied by state s_i = 2. In the version of the algorithm described above, the setup and changeover cost bound, b_S, is estimated before the branch and bound search begins. This bound, however, could instead be determined for the current working set, S, in Step 2. Since the working set typically contains a partial assignment of operations to units, the bound, b_S, should become progressively stronger as the level of the tree increases. The efficiency of the search should increase dramatically in problems with a weak initial bound.
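As a sketch of how the three state definitions can share one interface, the hypothetical helper below maps a state variable s_i to the arc bounds prescribed by (1), (8) or (9); the `regime` argument is an assumption of this illustration, not a feature of the original code.

```python
def arc_bounds(s, MAX, MIN=0.0, regime="continuous"):
    """Return the (lower, upper) bounds on v_i implied by state s."""
    if regime == "continuous":        # definition (1)
        return (0.0, MAX) if s in (1, 2) else (0.0, 0.0)
    if regime == "semi_continuous":   # definition (8)
        return {1: (0.0, MAX), 0: (0.0, 0.0), 2: (MIN, MAX)}[s]
    if regime == "bivalent":          # definition (9): v_i fixed at a bound
        return {1: (MIN, MAX), 0: (MIN, MIN), 2: (MAX, MAX)}[s]
    raise ValueError("unknown regime: " + regime)
```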
5. Operational experience The algorithm described above has been implemented in a general purpose interactive process scheduling system which is currently in use at a number of Exxon locations. When a user of the interactive process scheduling system enters the ZAP command, the network model is generated automatically on the basis of the basic data which the user has defined in the system. Since the process scheduling system is used for a large variety of scheduling problems, we found that it was a non-trivial task to design a matrix generator which would produce a valid network model for problems of all shapes and sizes. The basic logic followed by the network generator will be outlined here only briefly. The matrix generator first determines, for each grade, the total demand for that grade during the scheduling period. On the basis of the working inventory for that grade, the generator determines roughly how many production runs of that grade will be required during the scheduling period. The number of production runs for each grade forms the basis of the determination of the number of
time periods which are required for each grade in the inventory model. The processing model is then generated to have approximately the same number of unit production periods as there are demand periods in the inventory model, i.e., the total number of demand periods for all grades is roughly equal to the total number of production periods for all units. The processing arcs connecting the processing model to the inventory model are then generated on the basis of basic data. The number of processing arcs is carefully determined in order to give the model sufficient processing flexibility without making the decision process unduly laborious. The user can communicate with the ZAP algorithm at several different levels. By altering his basic data, he indirectly influences the structure of the network model by restricting unit/operation combinations and by modifying the number of production periods to be generated. He directly influences the branch and bound search by forcing selected operations into the schedule via a special command on the main scheduling screen. Since it is part of the basic design philosophy of the interactive process scheduling system to avoid long delays in returning a solution to the user, the algorithm will interrupt the search after a predetermined number of subproblems and return to the user with the best solution found to that point. The user is then given the choice of continuing the search, or of changing the solution or basic data. If the changes made by the user invalidate the branching tree, then a model is regenerated and the branching algorithm starts from scratch.
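The sizing logic described above can be paraphrased in a few lines of Python. This is a rough sketch under stated assumptions: `total_demand` and `working_inventory` per grade are hypothetical inputs, and the rounding rule is only one plausible reading of "roughly how many production runs".

```python
import math

def size_network_model(total_demand, working_inventory):
    """Estimate, per grade, the number of production runs (and hence demand
    periods); the processing model is then sized so that the total number
    of unit production periods is about the total number of demand periods."""
    runs = {g: max(1, math.ceil(total_demand[g] / working_inventory[g]))
            for g in total_demand}
    total_demand_periods = sum(runs.values())
    return runs, total_demand_periods
```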
6. Computational results

The dimensions of the problems listed in Table 1 are typical of the scheduling problems which are encountered in the process scheduling system. In all of the problems shown in Table 1, the setup and changeover cost component of the objective function was reasonably large, causing the network bound, b_N, to be quite weak. The results show, however, that even with a weak network bound the infeasibility bounding carried out by Step 2 of the algorithm can limit the search, even in large problems, to computable levels. In practice, the product demands and inventory constraints under which a plant must operate naturally reduce the burden placed on the branch and bound search. All of the problems shown in Table 1 were run on an IBM 370/168 with a FORTRAN G compiler. The CPU times shown for the problems include model generation, the branch and bound search, network solution, and the calls to the simulator for the evaluation of cardinal feasible solutions. Except for Problem 8, all branch and bound searches were completed in the time shown, establishing optimality with respect to the model formulated by the ZAP algorithm. In order to compare the results shown in Table 1 with mixed integer programming results published elsewhere, the equivalent mixed integer program-
Table 1
Computational results
(the column headings were lost in reproduction; each row lists the thirteen values reported for one test problem)

1  1  6  6  22   82  10  60   328   .92   201    .86   1.84
2  2  4  4  16   48   8  32   384   .51   148    .49   1.05
3  3  6  6  24   56  12  32   500  1.03   277   1.68   2.78
4  3  8  8  32   82  15  50  1210  3.19   116    .85   4.19
5  4  9  6  26   62  12  36  1376  2.69   188   1.23   4.08
6  3  9  9  36   96  18  60  2340  7.76     1    .01   8.10
7  5  9  9  31   82  15  51   738  1.82   202   2.22   4.19
8  5  9  9  40  104  20  64  2000  8.06  1219  13.23  21.75*
9  5  8  8  21   61  10  40  2268  3.89   406   3.86   7.99

* Search not complete.
ming formulations of the problems shown above would have the same number of integer variables as there are cardinal group arcs. The number of rows in an equivalent MIP formulation would be equal to the number of network nodes plus the number of cardinal groups plus the number of cardinal group arcs. The number of columns in an equivalent formulation would be equal to the number of network arcs plus the number of cardinal group arcs.
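The size comparison stated above reduces to simple arithmetic; the small helper below (an illustration only) computes the equivalent MIP dimensions from the four network counts given in the text.

```python
def equivalent_mip_size(nodes, arcs, groups, group_arcs):
    """Rows, columns and integer variables of the MIP equivalent to one
    network subproblem, per the counting rules in the text."""
    rows = nodes + groups + group_arcs
    cols = arcs + group_arcs
    integer_vars = group_arcs
    return rows, cols, integer_vars

# E.g., 100 nodes, 300 arcs, 20 cardinal groups and 60 cardinal group arcs
# give a MIP with 180 rows, 360 columns and 60 integer variables.
```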
7. Conclusion
The power of a generalized network code has been applied to those aspects of the unit operations scheduling problem which best fit its capabilities, namely those of resource allocation, continuous cost minimization, and conservation of flow for product movement and unit processing capacity. However, setup and changeover costs and the cardinality conditions for unit operations cannot be handled directly in the network model and must be considered in the branch and bound portion of the algorithm. The combined power of the specialized branch and bound algorithm and the generalized network code produces solutions with sufficient speed that the user often is unaware of a wait time difference between
a call to the optimization algorithm and a call to any of the simple data retrieval commands in the interactive scheduling system. There are, however, a number of limitations of the algorithm in terms of its applicability to the general process scheduling problem for which the interactive process scheduling system was developed. For example, the use of a network code requires that the model formulation follow only one product per operation. In the general scheduling problem, it may be necessary to follow splitting operations with byproducts and blending operations with a combination of blend stocks. In its present form, the algorithm follows only one major grade per operation. In the same sense, the formulation used in the present algorithm is limited to single stage production problems. Both of these limitations could be overcome at the expense of solving the subproblems as standard linear programming problems. User response to the algorithm has been generally favorable. Since the user has the capability of altering or rejecting any of the solutions produced by the algorithm, he does not feel compelled to understand the optimization procedure upon which the algorithm is based. The users, however, seem very quick to understand the power of the algorithm on an input/output basis. They readily understand the relationships between alterations in basic data and the corresponding variations in optimal solutions. As a standalone optimization procedure, the algorithm described above is capable of producing good solutions to a wide variety of unit operations scheduling problems. However, in an interactive environment, the algorithm not only produces good schedules but increases the user's appreciation for those factors which characterize good solutions to his scheduling problem.
Appendix. Mathematical formulation for the network model

A.1. Notation

Indices
k: unit index, from 1 to K units
j: grade index, from 1 to J grades
m: unit time period index, from 1 to M_k periods
n: grade time period index, from 1 to N_j periods
p: operation index
T_k^m: set of all operations p which can be run on unit k in time period m
T_j^n: set of all operations p which can affect grade j in time period n

Variables
U_k^m: capacity (in days) remaining on unit k at the end of time period m
G_j^n: inventory (in tons) of grade j at the end of time period n
P_kjp^mn: days spent running operation p on unit k in production period m affecting grade j in demand period n

Given data
D_j^n: demand for grade j in time period n
r_kjp: rate (in tons/day) at which unit k produces grade j through operation p
MAX_j: maximum inventory for grade j
MIN_j: minimum inventory for grade j
G_j^0: starting inventory for grade j
RUN_p: maximum run length for operation p
U_k^0: total capacity for unit k
pcost_kjp: production cost of operation p
icost_j^n: inventory holding cost for grade j
A.2. Formulation

minimize  Σ_{k=1}^{K} Σ_{m=1}^{M_k} Σ_{j=1}^{J} Σ_{n=1}^{N_j} Σ_{p ∈ T_k^m ∩ T_j^n} pcost_kjp · P_kjp^mn + Σ_{j=1}^{J} Σ_{n=1}^{N_j} icost_j^n · G_j^n

subject to

U_k^m = U_k^{m−1} − Σ_{j=1}^{J} Σ_{n=1}^{N_j} Σ_{p ∈ T_k^m ∩ T_j^n} P_kjp^mn, 0 ≤ U_k^m, 0 ≤ P_kjp^mn ≤ RUN_p, for all k = 1, ..., K and m = 1, ..., M_k, (a1)

G_j^n = G_j^{n−1} − D_j^n + Σ_{k=1}^{K} Σ_{m=1}^{M_k} Σ_{p ∈ T_k^m ∩ T_j^n} r_kjp · P_kjp^mn, MIN_j ≤ G_j^n ≤ MAX_j, for all j = 1, ..., J and n = 1, ..., N_j, (a2)
where (a1) corresponds to the production model and (a2) corresponds to the inventory model.

References

[1] R.C. Dorsey, T.J. Hodgson and H.D. Ratliff, "A production-scheduling problem with batch processing", Operations Research 22 (1974) 1271-1279.
[2] R.C. Dorsey, T.J. Hodgson and H.D. Ratliff, "A network approach to a multi-facility multi-product production scheduling problem without back ordering", Management Science 21 (7) (1975) 813-822.
[3] A.M. Geoffrion and G.W. Graves, "Scheduling parallel production lines with changeover costs: Practical application of a quadratic assignment/LP approach", Operations Research 24 (4) (1976) 595-610.
[4] F. Glover, J. Hultz, D. Klingman and J. Stutz, "Generalized networks: A fundamental computer-based planning tool", Management Science 24 (12) (1978) 1209-1220.
[5] B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan, "Job-shop scheduling by implicit enumeration", Management Science 24 (4) (1977) 441-450.
[6] R.R. Love, Jr. and R.R. Vemuganti, "The single-plant mold allocation problem with capacity and changeover restrictions", Operations Research 26 (1) (1978) 159-165.
[7] A.S. Manne, "Programming of economic lot sizes", Management Science 4 (1958) 115-135.
[8] A.T. Mason and C.L. Moodie, "A branch and bound algorithm for minimizing cost in project scheduling", Management Science 18 (4) (1971) B-158-B-173.
[9] A.H.G. Rinnooy Kan, B.J. Lageweg and J.K. Lenstra, "Minimizing total costs in one-machine scheduling", Operations Research 23 (5) (1975) 908-927.
[10] E. Uskup and S.B. Smith, "A branch-and-bound algorithm for two-stage production-sequencing problems", Operations Research 23 (1) (1975) 118-136.
Mathematical Programming Study 15 (1981) 58-85. North-Holland Publishing Company
APPLICATIONS OF THE OPERATOR THEORY OF PARAMETRIC PROGRAMMING FOR THE TRANSPORTATION AND GENERALIZED TRANSPORTATION PROBLEMS

V. BALACHANDRAN
Graduate School of Management, Northwestern University, Evanston, IL 60201, U.S.A.

V. SRINIVASAN
Graduate School of Business, Stanford University, Stanford, CA 94305, U.S.A.

G.L. THOMPSON
Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, PA 15213, U.S.A.
Received 4 January 1978
Revised manuscript received 21 January 1979

Many operations research problems can be modelled as the transportation problem (TP) or generalized transportation problem (GTP) of linear programming together with parametric programming of the rim conditions (warehouse availabilities and market requirements) and/or the unit costs (and/or the weight coefficients in the case of the GTP). The authors have developed an operator theory for simultaneously performing such parametric programming calculations. The present paper surveys the application of this methodology to several classes of problems: (a) optimization models involving a TP or GTP plus the consideration of an additional important factor (e.g., production smoothing, cash management); (b) bicriterion TP and GTP (e.g., multimodal TP involving cost/time trade-offs, TP with total cost/bottleneck time trade-offs); (c) multi-period growth models (e.g., capacity expansion problems); (d) extensions of TP and GTP (e.g., stochastic TP and GTP, convex cost TP); (e) branch and bound problems involving the TP or GTP as subproblems (e.g., traveling salesman problem, TP with quantity discounts); and (f) algorithms for solving the TP and GTP (e.g., a polynomially bounded primal basic algorithm for the TP, a weight operator algorithm for the GTP). The managerial and economic significance of the operators is discussed.

Key words: Transportation Networks, Parametric Programming, Generalized Transportation Problems, Operator Theory, Network Application.
1. Introduction

A survey of the industrial and government applications of mathematical programming revealed that perhaps as much as 70% of such applications fell in the category of network flow problems, in particular transportation problems (TP) and generalized transportation problems (GTP) [32]. The intuitive appeal of these models, the impressive computational performance of their solution algorithms [13,
31, 48, 55] and the ability to reformulate several more complex O.R. problems as a single TP (or GTP) or a sequence of them [32, 37] are probable reasons for this popularity. Our purpose in this paper is to suggest ways to further increase the applicability of TP and GTP. The special mathematical structures of TP and GTP have prevented their applications to a broader class of problems and have limited the considerations that can be taken into account in solving practical problems. Specifically, our focus is on a class of problems which cannot be solved directly as TP or GTP but which can be solved as the TP or GTP together with parametric programming procedures for examining the effects on the optimal solution of continuous changes in the data of the problem. The present paper surveys the previous work in this area. For greater details, proofs, numerical illustrations, and more rigorous discussion of the ideas presented here, appropriate references will be provided. In the remainder of this section, we give a flavor of the algorithms employed to perform parametric programming calculations using what we call "operators" and motivate the value of operator theoretic algorithms in increasing the applicability of TP and GTP. The remaining sections discuss the different classes of applications where operator theoretic methods have proved useful in conjunction with a TP or a GTP formulation. To fix ideas, let us consider a TP with a set I = {i} = {1, 2, ..., m} of warehouses, a set J = {j} = {1, 2, ..., n} of markets with unit costs {c_ij}, availabilities {a_i} and requirements {b_j} where Σ_{i∈I} a_i = Σ_{j∈J} b_j. Denoting by x_ij the amount shipped from warehouse i to market j, we define the transportation problem P to be:
min Σ_{(i,j) ∈ [I×J]} c_ij x_ij = Z, (1)

s.t. Σ_{j∈J} x_ij = a_i for i ∈ I, (2)

Σ_{i∈I} x_ij = b_j for j ∈ J, (3)

x_ij ≥ 0 for (i, j) ∈ [I×J]. (4)
The GTP is the same as the TP except that (2) is replaced by
Σ_{j∈J} e_ij x_ij ≤ a_i for i ∈ I. (5)
(In the Machine Loading context [20], e_ij (> 0) refers to the per unit production time of product j on machine i. The restriction Σ_{i∈I} a_i = Σ_{j∈J} b_j does not arise for the GTP. Furthermore, in the GTP, a_i and b_j may be in different units.) For expositional ease, our discussion in the rest of this section will be in the context of the TP. The reader is referred to [44, 45] for the more general capacitated (or upper bounded) TP (where (4) is replaced by 0 ≤ x_ij ≤ U_ij) and to [7, 8, 9, 10] for the results in the case of the GTP. We assume that the reader is familiar with the MODI [17, pp. 308-313 and 41,
ch. 11] or stepping-stone method [15, ch. II] for solving the TP (adaptation of the primal simplex method to the TP) with its terminology such as transportation tableau, basis B, cycle, basic solution (i.e., {x_ij} satisfying (2)-(3) with x_ij = 0 for nonbasic cells), primal feasibility (i.e., (4)) and dual feasibility, i.e.,
u_i + v_j ≤ c_ij for (i, j) ∈ [I×J] (6)
where the dual variables {u_i} for i ∈ I and {v_j} for j ∈ J are defined so as to satisfy

u_i + v_j = c_ij for (i, j) ∈ B. (7)
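Since (7) fixes the duals only up to an additive constant, they are conventionally computed by setting one dual to zero and propagating u_i + v_j = c_ij over the basis tree. The sketch below is an assumed implementation of that standard step, with the basis given as a set of (row, column) cells indexed from 0 and the costs as a dictionary keyed by cell.

```python
from collections import defaultdict, deque

def modi_duals(basis, cost):
    """Propagate u_i + v_j = c_ij (eq. (7)) over the basis tree, with u_0 = 0."""
    adj = defaultdict(list)
    for (i, j) in basis:
        adj[("row", i)].append(("col", j))
        adj[("col", j)].append(("row", i))
    u, v = {0: 0.0}, {}
    queue = deque([("row", 0)])
    while queue:
        kind, k = queue.popleft()
        for kind2, k2 in adj[(kind, k)]:
            if kind2 == "col" and k2 not in v:
                v[k2] = cost[k, k2] - u[k]
                queue.append((kind2, k2))
            elif kind2 == "row" and k2 not in u:
                u[k2] = cost[k2, k] - v[k]
                queue.append((kind2, k2))
    return u, v
```

Dual feasibility (6) can then be checked by verifying u[i] + v[j] ≤ c_ij on the nonbasic cells.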
By a cell we mean an index pair (p, q) with row p ∈ I and column q ∈ J. The basis B can also be represented as a tree Q in the graph [I ∪ J, (I × J)]. When any basic cell (p, q) ∈ B is dropped, the tree splits into two subtrees with row p in one subtree Q_R and column q in the second subtree Q_C. The sets of rows and columns in Q_R (Q_C) are denoted by I_R and J_R (I_C and J_C) respectively. The subsets I_R and I_C (J_R and J_C) partition I (J) [44, pp. 217-218]. An operator δT(P) determines the sequence of optimum solutions (i.e., {x_ij}, {u_i}, {v_j} and Z) as the problem P with data {a_i}, {b_j} and {c_ij} is transformed into problems P^T(δ) with data {a_i^T}, {b_j^T} and {c_ij^T} which are the linear functions (8)-(10) below of a single parameter δ for all 0 ≤ δ < ∞:
a_i^T = a_i + δα_i for i ∈ I, (8)

b_j^T = b_j + δβ_j for j ∈ J, (9)

c_ij^T = c_ij + δγ_ij for (i, j) ∈ [I×J] (10)
where the prespecified values for {α_i}, {β_j} and {γ_ij} are unconstrained in sign but such that

Σ_{i∈I} α_i = Σ_{j∈J} β_j. (11)
Constraint (11) is required so as to satisfy Σ_{i∈I} a_i^T = Σ_{j∈J} b_j^T. By imposing restrictions on the values for {α_i}, {β_j} and {γ_ij}, we get some important special cases of the operator δT(P): (A) Rim operators δR(P) where γ_ij = 0 for all (i, j) ∈ [I×J], i.e., the data changes are only in the rim conditions {a_i} and {b_j}. (B) Cost operators δC(P) where α_i = 0 for i ∈ I and β_j = 0 for j ∈ J, i.e., the data changes are only in the cost coefficients {c_ij}. The rim operators are further classified into: (A1) (Plus) cell rim operator δR⁺_pq(P) where α_i = 0 for i ∈ I − {p} with α_p = 1 and β_j = 0 for j ∈ J − {q} with β_q = 1, i.e., all data remain the same except a_p^T = a_p + δ and b_q^T = b_q + δ. The name cell operator arises from the fact that (p, q) is a cell in the transportation tableau.
(A2) (Minus) cell rim operator δR⁻_pq(P). This is the same as (A1) above except α_p = −1 and β_q = −1, i.e., all data remain the same except a_p^T = a_p − δ and b_q^T = b_q − δ. (A3) Area rim operator δR_A(P), which is any rim operator, without any restrictions on {α_i} and {β_j} aside from (11). The cost operators are likewise classified into: (B1) (Plus) cell cost operator δC⁺_pq(P) where γ_ij = 0 for (i, j) ∈ [I × J] except that γ_pq = 1, i.e., the only data change is c_pq^T = c_pq + δ. (B2) (Minus) cell cost operator δC⁻_pq(P). Same as (B1) above except that γ_pq = −1, i.e., all data remain the same except that c_pq^T = c_pq − δ. (B3) Area cost operator δC_A(P), which is any cost operator without any restrictions on {γ_ij}. The cell rim operator considers a simultaneous increase (or decrease) in warehouse p and market q. This concept is generalized to that of binary rim operators in [22], where the possibility of increasing a_p to a_p + δ and decreasing a_q to a_q − δ with all other data remaining the same is considered. (Likewise b_p can be changed to b_p + δ and b_q changed to b_q − δ with all other data remaining the same.) In the case of capacitated transportation problems with upper bounds {U_ij}, we have bound operators as well, which can be reduced to rim operators [44, pp. 221-223]. For the GTP, we also have weight operators which examine the effects on the optimal solution of changes in the {e_ij} [9]. Algorithms for implementing rim and cost cell and area operators are discussed in [44, 45] for the TP and in [7, 8, 9, 10] for the GTP. (The references [8, 44] also discuss the application of the more general operator δT(P).) These algorithms can roughly be described as follows:
Algorithm I (for applying δT(P))

(i) Determine the basic optimum solution to the problem P. Let B_1 be the optimal basis. Let k = 1 and δ_1 = 0.

(ii) Determine the maximum extent μ_k such that for δ_k ≤ δ ≤ δ_k + μ_k the basis B_k continues to be optimal for the problem P^T(δ). For this range of δ, the optimal primal and dual solutions can be easily determined since the optimal basis is known. If μ_k = ∞, stop. Otherwise, go to (iii).

(iii) Determine an alternate optimal basis B_{k+1} for the problem P^T(δ) for δ = δ_k + μ_k. If no such basis can be found, the problem P^T(δ) is infeasible for δ > δ_k + μ_k; stop. If such a basis can be found, set δ_{k+1} = δ_k + μ_k, k = k + 1 and go to (ii).

In step (ii) of the above algorithm, we determine the optimal solution as a function of δ with the optimal basis remaining the same. Such operators are referred to as basis preserving operators and are denoted by light face letters such as δT(P), δR(P), δC⁺_pq(P), etc. In the case of cell rim operators with (p, q) ∉ B_k, δR⁺_pq(P) amounts to shifting the amounts {x_ij} around the cycle created by adding (p, q) to B_k. (If (p, q) ∈ B_k, only x_pq gets altered.) The objective
function Z increases by (δ − δ_k)(u_p + v_q). In the case of area rim operators, the transformed solution is obtained as

x_ij^T = x_ij^0 + (δ − δ_k) y_ij for (i, j) ∈ [I × J] (12)

where {y_ij} satisfy (2)-(3) with a_i and b_j replaced by α_i and β_j and such that y_ij is zero for nonbasic cells. The {x_ij^0} correspond to the values of {x_ij} for P^T(δ_k). The objective function Z increases by (δ − δ_k)[Σ_{i∈I} α_i u_i + Σ_{j∈J} β_j v_j]. Since the optimal basis remains the same and the costs {c_ij} are not changing, the optimal dual variables do not change for basis preserving rim operators. The value for μ_k is determined as the maximum value for (δ − δ_k) so that the transformed {x_ij} satisfy the nonnegativity constraints (4). Thus for the problem P^T(δ_k + μ_k) at least one basic cell, say (r, s), reaches the value x_rs = 0 and any further application of the basis preserving operator with basis B_k would drive x_rs negative. The alternate basis B_{k+1} is determined from B_k by finding a cell (e, f) such that B_{k+1} = B_k − {(r, s)} + {(e, f)} is also an optimal basis for P^T(δ_k + μ_k). The cell (e, f) is determined so that

c_ef − u_e − v_f = min_{(i,j) ∈ [I_C × J_R]} (c_ij − u_i − v_j) (13)
where the sets I_C and J_R are determined by dropping (r, s) from B_k (see earlier discussion). If the set [I_C × J_R] is empty, it can be shown that the problem P^T(δ) has no feasible solution for δ > δ_k + μ_k.

For cell cost operators δC⁺_pq with (p, q) ∈ B, step (ii) of the above algorithm amounts to determining the sets I_R, J_R, I_C, J_C obtained by dropping (p, q) from B_k. The optimal dual values for the basis preserving operator are obtained by merely increasing the {u_i} to {u_i + (δ − δ_k)} for i ∈ I_R and decreasing {v_j} to {v_j − (δ − δ_k)} for j ∈ J_R and leaving the remaining duals unchanged. It can also be shown that the objective function increases by (δ − δ_k)x_pq. (Similar remarks apply to δC⁻_pq with (p, q) ∈ B.) If (p, q) ∉ B, the duals {u_i} and {v_j} and the objective function Z remain unaltered for δC⁺_pq. For the area cost operator δC_A, we define

u_i^T = u_i^0 + (δ − δ_k)u_i^* for i ∈ I, (14)

v_j^T = v_j^0 + (δ − δ_k)v_j^* for j ∈ J (15)

where u_i^* and v_j^* satisfy (7) with c_ij replaced by γ_ij. The {u_i^0} and {v_j^0} correspond to the duals for P^T(δ_k). The objective function Z increases by (δ − δ_k) Σ_{(i,j) ∈ [I×J]} γ_ij x_ij. For the basis preserving cost operators, since the basis and the rim conditions do not change, the primal solution {x_ij} does not change for δ_k ≤ δ ≤ δ_k + μ_k. The value μ_k is determined as the maximum value for (δ − δ_k) so that the transformed {u_i} and {v_j} satisfy the dual feasibility conditions (6) for the problem P^T(δ). Thus for the problem P^T(δ_k + μ_k), at least one of the nonbasic cells (e, f) satisfies c_ef = u_e + v_f and any further application of the basis preserving cost operator with basis B_k would make c_ef < u_e + v_f, thus violating the dual
feasibility condition (6). The alternate optimal basis B_{k+1} is determined by adding (e, f) to B_k and eliminating the 'giver' cell (r, s) with the smallest value for x_rs. (In the cycle created by adding (e, f) to B_k we mark alternate cells of the cycle as 'getters' and 'givers', starting with (e, f) as a 'getter'.)

The operators are computationally easy to apply. In particular, the computational steps involved in applying cell operators are even easier, and this is the reason we have provided specialized algorithms for the cell operators rather than treating them merely as area operators. As will be seen in the rest of this paper, cell operators tend to arise frequently in many applications.

Parametric programming is much more valuable for transportation problems compared to general linear programs. To see this, consider the linear program

min C'X, s.t. AX = b and X ≥ 0 (16)

where C and X are (N × 1), b is (M × 1) and A is (M × N). Now assume that the requirements vector b can be changed to b + δd where d is a given (M × 1) vector and δ ≥ 0 is a scalar, but at a cost gδ (g is a scalar). The problem of determining the optimal δ can be formulated as

min C'X + gδ, s.t. AX − δd = b; X, δ ≥ 0. (17)
We note that (17) is also a linear program in the variables X, δ, so that it can be solved directly as such and no parametric program is necessary. However, such is not the case if (16) is a TP. In that case, the constraint matrix A has the special "echelon-diagonal" pattern [41, pp. 227-228] with N = mn and M = m + n, and the primal transportation algorithm effectively uses this structure. However, the presence of the vector d in (17) makes the coefficient matrix [A −d] no longer possess the special structure. Thus (17) is not a TP. Of course, (17) can be solved directly as a linear program, but it will be computationally more efficient to solve (17) by applying an area rim operator (the α_i and β_j will be directly determined from d) to the optimum solution of the TP (16). As remarked earlier, the objective function increases with a marginal cost of Σ_{i∈I} α_i u_i + Σ_{j∈J} β_j v_j = f (say). Consequently, the overall marginal cost is (f + g) and we apply the area rim operator until (f + g) becomes nonnegative. The above example illustrates why parametric programming is likely to prove more valuable for transportation problems as compared to general linear programs.

In addition to providing practical benefits in solving problems which do not directly fit as TP or GTP, the operator theoretic algorithms also provide theoretical insights. For instance: (i) In the linear cost capacity expansion problem discussed in Section 5.3 [21], where market demands are monotone nondecreasing over time, it can be shown
that there exists an optimal solution in which the warehouse capacities are nondecreasing even though such constraints are not explicitly imposed. Although some of these insights may be obtained by alternate means such as lattice theory [56], our operator theoretic algorithms have the advantage of obtaining such a solution (in addition to proving existence). (ii) By defining an additional warehouse (m + 1) and an additional market (n + 1) (with c_{m+1,j} = 0 for j ∈ J, c_{i,n+1} = 0 for i ∈ I, c_{m+1,n+1} = M (a large positive number), a_{m+1} = 0 and b_{n+1} = 0), the cell rim operator δR⁺_{m+1,n+1} provides a means for determining the downward marginal cost (= u_{m+1} + v_{n+1}) of the transportation system as a whole (i.e., the marginal rate at which the total optimal cost will go down if the volume handled in the system is reduced) [45, p. 250]. (iii) The cell cost operators can be used in an algorithm for solving the TP itself, with costs translated if necessary so that c_ij ≥ 0 for all (i, j). Consider any primal basic feasible solution to this problem. By temporarily defining c_ij = 0 for (i, j) ∈ B, the feasible solution is optimal to the transformed problem. We can now restore the costs for (i, j) ∈ B one by one using the (plus) cell cost operator. It can be shown that this algorithm converges to an optimum within 2 Σ_{i∈I} a_i iterations (assuming a_i and b_j to be integers and the primal problem to be nondegenerate) [52]. This is the first primal basic algorithm that we know of with a polynomial bound on the number of iterations. Another interesting property of this cell cost operator algorithm is that, even if the problem is primal degenerate, it will converge to an optimum without any need to (slightly) perturb the data of the problem.

If T, T_1, T_2 are operators, and if δ_1 and δ_2 are scalars, it can easily be shown that the following theoretical properties hold:

(i) δ_1T_1(δ_2T_2(P)) = δ_2T_2(δ_1T_1(P)) (commutativity);

(ii) δ_1T(δ_2T(P)) = (δ_1 + δ_2)T(P).
Although parametric programming has been well investigated in the context of linear programming [27, 28], far fewer results were available in the context of TP and GTP before our papers [7, 8, 9, 10, 44, 45]. For instance, [1, 53] concern themselves with an analysis of the 'stability' of an optimal basis with respect to data variations. To examine the maximum value δ for which the current basis is feasible when c_pq is changed to c_pq + δ, these approaches would involve the determination of (m − 1) × (n − 1) cycles and solving a set of (m − 1) × (n − 1) inequalities in δ. The cell cost operator algorithm, on the contrary, does not involve any determination of cycles at all and evaluates the minimum of only (m × n)/4 numbers, on the average. Consequently, the procedures in [1, 53] for continuous data variations are not computationally efficient for the parametric programming of transportation problems, although admittedly there is some overlap in the underlying theory. The determination of B_{k+1} from B_k in step (iii) of the algorithm for rim operators is the same as finding an adjacent dual
feasible basis in the dual simplex method for the transportation problem [11, 16, 33]. It should be emphasized that these last referenced papers are concerned with discrete changes in rim conditions, whereas the operator theoretic algorithms deal with continuous variations in all the data (rims, costs, upper bounds, weights) of the TP and GTP.
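To summarize this section's machinery in executable form, here is a schematic rendering of Algorithm I together with the extent computation for a basis preserving (plus) cell cost operator. It is a sketch under stated assumptions: the three callbacks and the data layout (dual dictionaries u and v, costs keyed by cell, and the sets I_R and J_C obtained by dropping the basic cell (p, q)) are hypothetical conveniences, not the authors' implementation.

```python
def algorithm_I(solve_P, extent, alternate_basis):
    """Schematic Algorithm I: sweep delta from 0, holding each basis B_k
    over [delta_k, delta_k + mu_k], until mu_k is infinite or P^T(delta)
    becomes infeasible."""
    B, delta, ranges = solve_P(), 0.0, []        # step (i)
    while True:
        mu = extent(B, delta)                    # step (ii): range where B stays optimal
        ranges.append((delta, delta + mu, B))
        if mu == float("inf"):
            return ranges
        B = alternate_basis(B, delta + mu)       # step (iii)
        if B is None:
            return ranges                        # infeasible beyond delta + mu
        delta += mu

def cost_operator_extent(u, v, cost, basis, I_R, J_C):
    """Extent mu_k of the operator delta*C+_pq with (p, q) in B: duals in
    the subtree containing row p shift with delta, and the first nonbasic
    cell (e, f) in [I_R x J_C] to reach c_ef = u_e + v_f limits the move."""
    slacks = [cost[i, j] - u[i] - v[j]
              for i in I_R for j in J_C if (i, j) not in basis]
    return min(slacks) if slacks else float("inf")
```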
2. Managerial and economic significance of operators

One of the important benefits of the operator theory approach is the managerial and economic significance of the shadow prices associated with each of the cell operators. The shadow prices enable us to determine the effects on the optimal cost Z of ceteris paribus changes in the parameters of the problem. Among other things, they permit the identification and explanation of the 'transportation paradox' for the TP case and the related 'production paradox' for the GTP case. Considering first the cell cost operator, the change in the optimal cost Z associated with the operator δC⁺_pq was stated in Section 1 to be δx_pq, so that the shadow price (i.e., rate of change in optimal cost) is x_pq. This shadow price examines the effect of an increase (or decrease) in the unit cost of transportation along route (p, q). Such shadow prices would be of significant managerial assistance in bargaining rates with trucking companies and in reacting to changed conditions caused by strikes, rate changes, etc.

The cell rim operators are discussed in [44, 45] for the TP and [7, 8, 10] for the GTP. To aid interpretation, we first augment the given m × n problem by an additional row, m + 1, and column, n + 1. All costs in the new row and column are zero except c_{m+1,n+1} = M for the TP case (M is a large positive number). For the GTP all costs in the new row and column are zero except c_{m+1,j} = M for j ∈ J. For the TP the a_{m+1} and b_{n+1} are chosen to make the sum of the availabilities a_i equal to the sum of the requirements b_j. In the GTP case a_{m+1} = M and b_{n+1} is not needed and hence not defined. We define I' = I ∪ {m + 1} and J' = J ∪ {n + 1}. These extra rows and columns serve somewhat different purposes in the two cases. However, in either case, once the enlarged problem is solved, we can define the optimal dual matrix D with entries

d_pq = e_pq u_p + v_q, for p ∈ I' and q ∈ J' (18)
where e_pq = 1 for all p and q in the TP case, and u_p and v_q are the optimal values for the dual variables. We now indicate how each of the (m + 1) × (n + 1) entries d_pq in (18) has at least one (and sometimes more than one) shadow price interpretation. We concentrate on the TP case. The optimal dual matrix defined in (18) can be partitioned into the four areas as shown in Fig. 1. In addition there are special
[Figure: the (m+1) × (n+1) optimal dual matrix D, partitioned into area A1 (entries d_pq for rows 1, ..., m and columns 1, ..., n), area A2 (column n+1, entries d_{p,n+1}), area A3 (row m+1, entries d_{m+1,q}) and area A4 (entry d_{m+1,n+1}), with the basis cells (k, n+1) and (m+1, l) marked.]
Fig. 1. Optimal dual matrix D.
row and column indices, k and l, such that cells (k, n + 1) and (m + 1, l) are in the optimal basis. (Every basis must have such cells in the last row and column.) Corresponding to any optimal basis, the following facts can be shown concerning the matrix D shown in Fig. 1: (a) The entries d_pq (defined in (18)) are unique even though u_p and v_q are not unique in the TP case. (b) The d_pq in A1, for p ∈ I and q ∈ J, can be either positive or negative even when all the c_ij are positive. (c) In areas A2, A3 and A4 we have d_{p,n+1} ≤ 0 for p ∈ I', and d_{m+1,q} ≤ 0 for q ∈ J'. The quantities d_{m+1,l} = 0 and d_{k,n+1} = 0. (d) In A4 we have d_{m+1,n+1} = −d_{kl}. The interpretations of the dual prices {d_pq} are given in Table 1 for the case Σ_{i∈I} a_i = Σ_{j∈J} b_j. (A more complete table is given in [45, p. 250] to include the cases of Σ_{i∈I} a_i > or < Σ_{j∈J} b_j.) The shadow prices given in Table 1 are useful to a manager in guiding decisions as to which warehouse capacities are to be increased, which markets should be sought after, etc. From Table 1 we note that for simultaneous changes in warehouse p and market q, the shadow price for
Relevant cell rim operator(s)
Shadow price
8R~,q
~dpq
bq b. c. d. e. f.
Increase in ap Decrease in ap Increase in bq Decrease in ba Decrease in the total shipments in the system
8Rj;.,.I 6R~l[6R~.l.l]
6R:,.1.~ ,SRiq[6R~.,.d
~R:,.~.,.~
dp.,~l -dpl
d,,.i.q -d~
d,,§
Note: ~,~1 a, = ~l~j bj before the changes are made. Indices k and I are chosen so that (k, n + I) and (m § I, I) are in the current basis.
decrease in a_p and b_q is just the negative of the shadow price for the increase in a_p and b_q. However, the shadow prices for ceteris paribus changes in a_p (or b_q) are not so simply related for increases as compared to decreases. Such an asymmetry in shadow prices is not at all apparent from a direct linear programming analysis of the problem. Another rich interpretation from the operator theory approach is the so-called 'downward marginal cost' of the transportation system, i.e., the rate at which the optimal cost would change if the total shipments in the system were reduced. As shown in [45, p. 250], this can be accomplished by the operator δR⁺_{m+1,n+1} with associated shadow price d_{m+1,n+1}. The upward marginal cost, on the other hand, requires the slightly different setup elaborated in [46].

The interpretations in Table 1 hold only over a finite range of δ, say 0 ≤ δ ≤ μ. The extent, μ, for the basis preserving operator can be calculated by the methods given in [44]. For the case when all the c_ij are nonnegative but some d_pq < 0 in A1, i.e., p ∈ I, q ∈ J, and the extent of the operator δR⁺_pq is positive (μ > 0), we observe the 'transportation paradox' by applying δR⁺_pq with δ > 0. That is, we can 'ship more (total tonnage) for less (total cost)'. When the extent of an operator is μ = 0, we have the degenerate case. It is shown in [45] that by a finite number of basis changes the shadow price d_pq and corresponding operator δR⁺_pq having positive extent μ can be determined. In [22] Fong and Srinivasan show how to simultaneously calculate such nondegenerate shadow prices for all p and q.

The interpretations of the entries of the dual matrix for the GTP case are similar. The following facts are proved in [7] for any optimal basis B: (a) u_p and v_q are unique; (b) u_p ≤ 0 for p ∈ I; and (c) v_q ≥ 0 for q ∈ J. There is no downward marginal cost evaluator. Again, when all the unit costs c_ij are nonnegative but d_pq < 0 for some (p, q) ∈ A1 and δR⁺_pq has a strictly positive extent μ, we have the 'production paradox', that is, it is possible to 'produce more (total output) for less (total cost)'. In the GTP case we have a new kind of operator, the weight operator (see [9]), to change a cell weight e_pq (the efficiency of machine p in producing product q). The effect on Z of this operator is, in general, not piecewise linear, in contrast to all other operators.
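The matrix D of (18) is cheap to tabulate once the enlarged problem is solved. The sketch below assumes dual vectors u (length m+1) and v (length n+1) and, for the GTP, a weight matrix e; defaulting e_pq to 1 recovers the TP case.

```python
def dual_matrix(u, v, e=None):
    """Entries d_pq = e_pq * u_p + v_q over I' x J' (eq. (18))."""
    return [[(e[p][q] if e is not None else 1.0) * u[p] + v[q]
             for q in range(len(v))]
            for p in range(len(u))]
```

In this 0-based layout the last column plays the role of column n+1 and the last row that of row m+1, so that (reading against Table 1) the last column holds the shadow prices for increasing the a_p, the last row those for increasing the b_q, and the corner entry the downward marginal cost of the whole system.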
3. Optimization problems involving a TP (or GTP) plus the consideration of an important additional factor

In formulating an operations research model as a TP (or GTP), it often happens that some important aspect(s) of the real problem is not taken into account in the model. For instance, it may be that some of the rim conditions and/or cost coefficients are controllable to an extent rather than 'given', and
the manager naturally wants to know which controls he should exercise, and to what extent, in order to obtain the best solution. In such situations, we could make use of the operator theory to take into account such an additional consideration in that problem. Two illustrations of this concept are provided below, one in the context of production smoothing and the other in the context of cash management. If, however, it is essential to consider several additional factors, it may be necessary to abandon the operator theory approach and resort instead to a decomposition formulation [18], with the additional factors serving the role of the master problem and the TP (or GTP) considered as the subproblem.

3.1. Converting Bowman's production scheduling model [12] into a production smoothing model

Production smoothing [36] (also known as aggregate planning) is the problem of deciding how to absorb fluctuations in demand by changes in production level, inventory fluctuations and variations in the size of the labor force. Bowman [12] has formulated the production scheduling problem (i.e., work force changes are not taken into account) as a TP where the 'warehouses' represent the 'production possibilities' (initial inventory and the regular and overtime capacities for the n periods in the horizon). The 'markets' denote the forecasted demands for the n periods and the final inventory. Costs of storage and the different costs for production at regular and overtime are the unit costs of the problem.
We can incorporate hiring and firing costs (workforce change costs) into the Bowman model with the help of rim operators as follows. Let the workforce level at the beginning be W*, which we will, for simplicity, assume to be the desired ending workforce as well. We first solve the transportation model using P_R = q_R W* for the regular production capacity for each of the n periods, where q_R is the average production capacity per worker at regular time. Let P_O = α q_O W* be the overtime capacity in each period, where q_O is the average overtime production capacity per worker and α is the maximum fraction of the workforce which can be used for overtime purposes. Let d_1, d_2, ..., d_n and D_1, D_2, ..., D_n be the rates of reduction in total costs (which can be found with the help of the shadow prices in Table 1) associated with increases in production capacities (regular and overtime) during the n periods. If we hire H_i workers in the i-th period (and keep them up to the end of the n-th period), then this increases the regular production capacity by q_R H_i and the overtime capacity by α q_O H_i for the periods i, i+1, ..., n. Thus the ceteris paribus cost reduction (provided H_i is small) is given by

    H_i[q_R(d_i + d_{i+1} + ⋯ + d_n) + α q_O(D_i + D_{i+1} + ⋯ + D_n)] − H_i h_i

where h_i is the unit cost of hiring. In the context of production scheduling, the total production capacity (regular and overtime) for the n periods usually
exceeds the total market demands, so that from [45, p. 250] the shadow prices for decreases in production capacities will be just the negatives of the corresponding shadow prices for increases in production capacities. Consequently, denoting f_i to be the unit cost of firing and letting H_i and F_i be the number of workers hired and fired in period i, the net cost reduction becomes

    Δ = Σ_{i=1}^{n} [(H_i − F_i) Σ_{j=i}^{n} (q_R d_j + α q_O D_j) − H_i h_i − F_i f_i],    (19)

subject to the constraints

    Σ_{i=1}^{n} (H_i − F_i) = 0  and  H_i, F_i ≥ 0  for i = 1, 2, ..., n.
We can solve the linear program of maximizing Δ subject to the constraints. Actually, it is not necessary to use the simplex procedure for solving this problem, since the solution can be inferred by collecting the coefficients of H_i and F_i in the square brackets of (19). The intent is mainly to find in which periods workers are to be hired and when excess workers should be fired. Then we can apply the relevant basis preserving rim operators up to the maximum permissible extent, update the dual variables after changing the basis structure, and reevaluate (19). If it still pays to hire and fire, we can continue; otherwise we stop. The resulting solution can be shown to provide an optimum solution to the production smoothing problem.
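Since the solution can be read off from the coefficients just mentioned, a short computation suffices. The following sketch in Python collects the bracketed coefficients of (19) and reports the most profitable hire/fire pairing; the rates d_j and D_j, which in practice come from the shadow prices of Table 1, and all other numbers here are invented for illustration.

    import numpy as np

    # Illustrative data (hypothetical): n = 4 periods.
    d = np.array([0.8, 0.6, 0.5, 0.3])   # regular-capacity reduction rates d_j
    D = np.array([0.4, 0.3, 0.2, 0.1])   # overtime-capacity reduction rates D_j
    qR, qO, alpha = 10.0, 4.0, 0.25      # capacity per worker and overtime fraction
    h = np.array([4.0, 4.0, 4.0, 4.0])   # unit hiring costs h_i
    f = np.array([2.0, 2.0, 2.0, 2.0])   # unit firing costs f_i

    # g[i] = sum_{j >= i} (qR*d_j + alpha*qO*D_j): value of one worker added in period i.
    g = np.cumsum((qR * d + alpha * qO * D)[::-1])[::-1]

    hire_coef = g - h                    # net gain per worker hired in period i
    fire_coef = -g - f                   # net gain per worker fired in period i
    i, j = hire_coef.argmax(), fire_coef.argmax()
    if hire_coef[i] + fire_coef[j] > 0:  # pairing respects sum(H_i - F_i) = 0
        print(f"hire in period {i+1}, fire in period {j+1}; "
              f"gain per worker = {hire_coef[i] + fire_coef[j]:.2f}")
    else:
        print("no profitable hire/fire swap; stop")

Each such swap is valid only up to the extent of the corresponding basis preserving rim operators; after applying them, the duals, and hence d_j and D_j, must be recomputed as described above.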
3.2. Determining the minimum cash balance in the transshipment model for cash management
The cash management problem is concerned with optimally financing net cash outflows and investing net cash inflows of a firm while simultaneously determining payment schedules for incurred liabilities. The multi-period cash management problem is formulated in [43] as a transshipment model with m 'warehouses' corresponding to different sources of cash (e.g., cash sales, receipts from accounts receivable, maturing short-term securities and line of credit), and the n 'markets' corresponding to different uses of cash (e.g., cash purchases, payments on accounts and notes payable). Since cash can be both a source and a use, transshipments are permitted across different time periods (e.g., short-term investments and/or short-term loans). Yields on securities, interest rates on loans, and discounts and penalties on accounts and notes payable constitute the unit costs of the problem. The transshipment formulation takes full account of payment schedules, short-term financing, and securities transactions. It, however, ignores the important problem of determining the 'minimum' cash balance itself. Cash deposits in excess of some absolute minimum cash balance requirement prove valuable since they improve the firm's credit ratings and the banker's goodwill. But
deposits also mean foregone revenue from alternative investment in securities. Consequently, the financial manager needs to know the effect on the optimum total cost of maintaining different levels for the 'minimum' cash balance, so as to arrive at a subjective decision regarding its magnitude. This important additional consideration can be implemented through a cell rim operator. Increasing the 'minimum' cash balance is equivalent to an increase in the net cash outflow for the first period, i.e., market #1. The effect of an increase of δ in the 'requirement' of 'market 1' can be examined through the operator δR⁺_{m+1,1} (see Table 1). By applying Algorithm 1 outlined in Section 1, we get an increasing, piecewise linear, convex cost curve demonstrating the opportunity costs (i.e., foregone revenue) of maintaining increasing levels for the 'minimum' cash balance.
4. Bicriterion transportation problems

In industrial logistics problems, the cost of transportation and the average time of transportation are often considered to be the two most important objectives. In military logistics, the cost of transportation and the bottleneck time (the maximum time taken along any route actually used in the solution) are considered most important. Since a feasible solution which minimizes one objective does not, in general, minimize the other objective as well, we may concentrate our attention on the set of all efficient (Pareto-optimal or nondominated) solutions as defined below. Denoting the two (minimization) objectives as

    F(x) = Σ_{(i,j)∈[I×J]} f_{ij} x_{ij},    (20)

    G(x) = Σ_{(i,j)∈[I×J]} g_{ij} x_{ij},    (21)

we define a solution pair (F, G) to be the values of the two objectives corresponding to a primal feasible solution X (i.e., satisfying (2)-(4)). A solution pair (F, G) is efficient if there exists no other solution pair (F', G') such that either (i) F' ≤ F and G' < G or (ii) F' < F and G' ≤ G. It is shown in [51, pp. 4-5] that the set of all efficient solution pairs can be obtained by finding the feasible solutions which minimize

    (1 − δ)F(x) + δG(x)    (22)

for all values of δ in the range 0 < δ < 1. Having determined the set of all efficient solution pairs (which can be displayed in a graph with axes F and G), the manager can subjectively choose one of the points by trading off one objective against the other and implement the corresponding solution X. From (20)-(22) we may rewrite the compound objective as

    minimize Σ_{(i,j)∈[I×J]} [f_{ij} + δ(g_{ij} − f_{ij})] x_{ij}.    (23)
By defining c_{ij} = f_{ij} and γ_{ij} = (g_{ij} − f_{ij}), it is easily seen that the problem (23) is solved by the area cost operator δC_A (see (10)) applied over the range 0 < δ < 1. Thus cost operators play a valuable role in the determination of the set of all efficient solutions in bicriterion transportation problems. Two illustrations of this concept are provided below, one in the context of multimodal cost vs. time transportation problems and the second in the context of solving transportation problems where bottleneck time is considered as an objective in addition to total cost. If, however, several objectives have to be considered simultaneously, we may have to resort to a more complicated procedure such as that employed in the multi-criteria simplex method [58].
The area cost operator approach to solving the bicriterion transportation problem is also useful in the solution of a TP on which one additional constraint such as

    Σ_{(i,j)∈[I×J]} γ_{ij} x_{ij} ≥ H    (24)

is to be imposed. Such a constraint arises, for instance, to limit the total amount of time spent by a salesman on a subtour [30]. Denoting the Lagrange multiplier associated with (24) as δ, the Lagrangian becomes

    minimize Σ_{(i,j)∈[I×J]} (c_{ij} − δγ_{ij}) x_{ij}.    (25)

Consequently, to solve (1)-(4) with the additional constraint (24), we can first solve (1)-(4) as a standard TP. If (24) is satisfied, then we are finished; otherwise, we apply the area cost operator δC_A (see (10)) by increasing δ from 0 to a positive number δ* large enough to satisfy (24). (If δ* = ∞, then this would imply that the constraint (24) cannot be satisfied.)
4.1. Determining cost vs. time Pareto-optimal frontiers in multi-modal TP
To consider the multiple modes of transportation, Srinivasan and Thompson [51] generalize the problem (1)-(4) by defining an additional subscript k ∈ K = {1, 2, ..., p} to denote the transportation mode (e.g., railroad, highway, air). (2)-(4) are generalized to

    Σ_{j∈J} Σ_{k∈K} x_{ijk} = a_i    for i ∈ I,    (26)

    Σ_{i∈I} Σ_{k∈K} x_{ijk} = b_j    for j ∈ J,    (27)

    x_{ijk} ≥ 0    for (i, j) ∈ [I × J] and k ∈ K.    (28)
Defining {c_{ijk}} to be the unit costs and {t_{ijk}} to be the transportation times, the total cost C and the (weighted) average shipment time T' (each time t_{ijk} weighted by the corresponding x_{ijk}) are given by

    C(X) = Σ_{(i,j)∈[I×J]} Σ_{k∈K} c_{ijk} x_{ijk},    (29)

    T'(X) = [Σ_{(i,j)∈[I×J]} Σ_{k∈K} t_{ijk} x_{ijk}] / [Σ_{(i,j)∈[I×J]} Σ_{k∈K} x_{ijk}].    (30)

Since the denominator of (30) is equal to Σ_{i∈I} a_i = Σ_{j∈J} b_j, a constant, we can, for mathematical convenience, consider the (weighted) total transportation time

    T(X) = Σ_{(i,j)∈[I×J]} Σ_{k∈K} t_{ijk} x_{ijk}    (31)

to be the objective instead of T'(X). If we ignore the subscript k for a moment, the problem of finding all efficient solution pairs (C, T) can, by analogy to (20)-(23), be accomplished by the application of an area cost operator δC_A over the range 0 < δ < 1. For a given value of δ the objective (22) becomes

    minimize Σ_{(i,j)∈[I×J]} [(1 − δ)c_{ij} + δt_{ij}] x_{ij}.    (32)

Since there is a choice with respect to the transportation mode, it is clear that for each route (i, j) we should choose that mode for which the compound unit cost

    d_{ijk} = (1 − δ)c_{ijk} + δt_{ijk}    (33)

is a minimum. The mode at which this minimum is attained for a cell (i, j) is a function of δ, however. The algorithm for determining the set of all efficient solution pairs (C, T) uses the area cost operator algorithm to apply δC_A in the range 0 < δ < 1 while simultaneously keeping track of the mode at which d_{ijk} is a minimum for each (i, j). The detailed algorithm, its justification and a numerical example are provided in [51].
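For a fixed cell (i, j), each mode contributes a line d_{ijk}(δ) = (1 − δ)c_{ijk} + δt_{ijk}, so the optimal mode as a function of δ is given by the lower envelope of p lines. A minimal sketch, with invented costs and times for a single cell, that locates the δ intervals on which each mode attains the minimum in (33):

    import numpy as np

    c = np.array([8.0, 5.0, 2.0])   # unit costs c_ijk of three modes (invented)
    t = np.array([1.0, 4.0, 9.0])   # unit times t_ijk of the same modes

    # d_k(delta) = (1 - delta) c_k + delta t_k for every mode k, on a fine grid.
    grid = np.linspace(0.0, 1.0, 100001)
    best = np.argmin((1 - grid)[:, None] * c + grid[:, None] * t, axis=1)

    changes = np.flatnonzero(np.diff(best))          # grid indices where the mode flips
    bounds = [0.0] + [grid[i + 1] for i in changes] + [1.0]
    modes = [int(best[0])] + [int(best[i + 1]) for i in changes]
    for k, (a, b) in zip(modes, zip(bounds[:-1], bounds[1:])):
        print(f"mode {k} is optimal for delta in [{a:.4f}, {b:.4f}]")

On these data the envelope passes through all three modes: the cheap slow mode for small δ, the fast expensive mode as δ approaches 1, which is exactly the bookkeeping the algorithm of [51] maintains for every cell simultaneously.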
4.2. Algorithms for minimizing total cost, bottleneck time and bottleneck shipment in TP

Consider a TP that has unit cost c_{ij} and time t_{ij} when a good is transported from warehouse i to market j. We can define the following objectives for every feasible solution satisfying (2)-(4):

    Total Cost                      TC = Σ_{(i,j)∈[I×J]} c_{ij} x_{ij},    (34)

    Bottleneck Time                 BT = max_{{(i,j) | x_{ij} > 0}} t_{ij},    (35)

    Shipment on Bottleneck Routes   SB = Σ_{{(i,j) | t_{ij} = BT}} x_{ij}.    (36)
Thus BT measures the maximum time taken by any shipping route actually used (x_{ij} > 0) in the solution, while SB gives the total shipping amount over all routes having this maximum time. In [50] Srinivasan and Thompson provide an algorithm for determining all the efficient solution pairs (TC, BT) and, for each such BT, the set of all efficient (TC, SB) solution pairs. This algorithm may be briefly outlined as follows. The algorithm starts with the solution to the standard TP (1)-(4) (i.e., minimize TC) to define the solution pair (TC¹, BT¹). At the k-th iteration of the algorithm with current solution pair (TC^k, BT^k), define π (Ω) to be the set of cells (i, j) ∈ [I × J] with t_{ij} > BT^k (t_{ij} = BT^k). Let us define a π-solution to be a feasible solution X (i.e., satisfying (2)-(4)) with x_{ij} = 0 for all (i, j) ∈ π. Then, as in (22), the set of all efficient (TC, SB) solution pairs corresponding to BT^k can be obtained by minimizing (1 − δ)TC + δSB for 0 < δ < 1 while restricting attention to π-solutions. As can be seen from (34) and (36), this minimization is easily accomplished through an area cost operator. As δ → 1 the minimum value of SB is attained. If this minimum is zero, the value of BT decreases and we obtain, corresponding to the current X, the new solution pair (TC^{k+1}, BT^{k+1}). The algorithm continues with the revised sets π and Ω. If, however, as δ → 1 the minimum value of SB in the k-th iteration is strictly positive, the algorithm stops, since in that case BT^k is the minimum attainable value for BT.
Srinivasan and Thompson [50] also provide a faster algorithm for determining the (TC, BT) efficient solutions when SB is not considered as an objective per se. This alternate algorithm may be thought of as a modification of the algorithm just outlined, the main difference being the manner in which SB is driven down to zero. In the previous algorithm SB was driven to zero by obtaining a set of (TC, SB) efficient solution pairs with the limiting SB = 0. In the alternate algorithm, the x_{ij} for (i, j) ∈ Ω (i.e., {(i, j) | t_{ij} = BT^k}) are driven to zero one by one. Consequently, cell cost operators (which are much easier to apply computationally) are used in the alternate algorithm rather than the area cost operator used in the previous algorithm. Both algorithms are illustrated with examples in [50]. If all c_{ij} = 0, the second algorithm reduces to the Szwarc-Hammer algorithm [54, 34] for the Bottleneck Transportation Problem (BTP). Since all c_{ij} = 0, any primal basic feasible solution is optimal in terms of TC and hence can be used to start the algorithm. This algorithm for the BTP is extremely efficient (it takes only about a second, on the average, to solve a 100 × 100 problem on the UNIVAC 1108 computer, FORTRAN V compiler) and takes only about 35% of the computation time required by the threshold algorithm for the BTP [25]. Although the BT and SB objectives considered here have not been very widely applied so far, it is to be hoped that the existence of the efficient algorithms discussed here for solving these problems will make their applications to areas such as assembly line balancing and personnel selection [50] more widespread.
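A rough rendering of the (TC, BT) trade-off just outlined: where [50] drives SB to zero with cost operators, the sketch below simply re-solves a restricted TP from scratch for each candidate bottleneck level, forbidding all cells with t_{ij} above that level, and records the resulting (TC, BT) pairs. The data are invented; this reproduces the frontier, not the operator mechanics.

    import numpy as np
    from scipy.optimize import linprog

    def solve_tp(cost, supply, demand, allowed):
        """Min-cost TP restricted to cells where allowed[i, j]; None if infeasible."""
        m, n = cost.shape
        A_eq = []
        for i in range(m):
            r = np.zeros((m, n)); r[i, :] = 1; A_eq.append(r.ravel())
        for j in range(n):
            col = np.zeros((m, n)); col[:, j] = 1; A_eq.append(col.ravel())
        bounds = [(0, None) if ok else (0, 0) for ok in allowed.ravel()]
        res = linprog(cost.ravel(), A_eq=np.array(A_eq),
                      b_eq=np.concatenate([supply, demand]), bounds=bounds)
        return (res.fun, res.x.reshape(m, n)) if res.success else (None, None)

    c = np.array([[4.0, 6.0, 9.0], [5.0, 3.0, 8.0]])     # unit costs c_ij
    t = np.array([[2.0, 5.0, 1.0], [6.0, 2.0, 4.0]])     # times t_ij
    supply = np.array([30.0, 50.0])
    demand = np.array([20.0, 25.0, 35.0])

    for level in np.unique(t)[::-1]:          # candidate bottleneck levels, decreasing
        TC, X = solve_tp(c, supply, demand, t <= level)
        if TC is None:                        # BT cannot be reduced any further
            break
        print(f"TC = {TC:.1f}, BT = {t[X > 1e-9].max():.0f}")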
5. Multi-period growth models

Rim operators have also proved useful in obtaining solutions to multi-period growth models where the total volume handled in the logistic system increases through time. Depending on the nature of the constraints in the particular growth model, the three problem types discussed below have been considered.

5.1. Optimal growth paths problem

Consider a production-distribution system with prespecified lower bounds {a_i} on the amounts of production in factories i ∈ I = {1, 2, ..., m} and lower bounds {b_j} on the volumes supplied to markets j ∈ J = {1, 2, ..., n}. Given linear costs of production, transportation and market expansion, we are interested in the optimal growth path, i.e., the time sequence of production amounts, market volumes and shipments along the individual routes, as the total volume handled in the system, i.e., Σ_{(i,j)∈[I×J]} x_{ij}, is increased through time. To solve this problem, Srinivasan and Thompson [46] add an (n+1)-st column to correspond to the variables x_{i,n+1} = N − S_i, where N is a large positive number and {S_i} are the increments to production over and above {a_i}. Similarly, an (m+1)-st row is added to correspond to the variables x_{m+1,j} = N − T_j, where {T_j} are the increments to market volumes over and above {b_j}. It is shown in [46] for the TP and in [2] for the GTP that the optimal growth path can be obtained by the application of the cell rim operator δR⁺_{m+1,n+1}.

5.2. Multi-location plant sizing and timing

Rao and Rutenberg [39] consider a dynamic multi-location growth problem where the demands in the n markets grow at varying (but known) rates. There are economies of scale in building plant capacity, and the authors assume that the sequence in which the plants will be built first, second, etc. has been decided in advance. The plant sizing and timing issues are decided by iteratively solving two subproblems: (a) Given the plant sizes, the timing of expansions is decided by evaluating the effect on total cost of changes in the times at which each of the plants is built; this is done through an area rim operator δR_A, where the prespecified market growths define the {β_j} for the area rim operator. (b) Given the timings of expansion, the plant sizes are decided by the method of feasible directions in nonlinear programming [57, Ch. 15], where the value of an incremental unit of capacity at plant p at time t is determined as u_p(t) + v_{n+1}(t) (see the shadow price given in row (b) of Table 1). Rao and Rutenberg report excellent results in terms of both the goodness of the solution obtained in comparison to alternate methods and computational efficiency.
5.3. Multi-period multi-region capacity expansion problem

In this section we describe the application of area rim operators in solving a multi-period capacity expansion and shipment planning problem for a single product under a linear cost structure [21]. The product can be manufactured in the set I = {1, 2, ..., m} of producing regions and is required by the set J = {1, 2, ..., n} of markets in each of the time periods (e.g., years) K = {1, 2, ..., T}. Let K' = K − {1}. Let {r_j^0} be the initial demand in market j and let r_j^t ≥ 0 be the known increment in market j's demand in time period t. Thus Σ_{τ=0}^{t} r_j^τ represents the demand in market j at time t. Let {q_i^0} be the initial production capacity in region i and let {q_i^t} be the cumulative capacity added in region i from periods 1 to t. Thus the total production capacity in region i at time t is q_i^0 + q_i^t. The demand in a market can be satisfied by production and shipment from any of the regions, but must be met exactly during each time period (i.e., no backlogging or inventorying). Let c_{ij} be the unit cost of 'shipping' from region i to market j. (This includes transportation costs and variable costs of production, including the costs of maintaining a unit of capacity.) Let k_i be the unit cost of capacity expansion in region i. (Proportional capacity expansion costs, as assumed here, may be realistic when the production capacity is rented or subcontracted or when the fixed costs are relatively small. Moreover, the optimum solution to the problem with linear costs can be used to provide a lower bound to the objective function of the problem with concave expansion costs.) Let h_i be the unit cost of maintaining s_i^t units of idle capacity in region i. It is assumed that all the costs are stationary, but the model can be easily extended to take into account inflationary effects. Let g_i be the terminal (or resale) value of a unit of capacity in region i at time T. Let α be the discount factor per period. Then the problem of determining a schedule of capacity expansions for the regions and a schedule of shipments from the regions to the markets, so as to minimize the discounted capacity expansion and shipment costs, can be formulated as the problem P below:

    min  Σ_{t∈K} α^{t−1} Σ_{(i,j)∈[I×J]} c_{ij} x_{ij}^t + Σ_{t∈K} α^{t−1} Σ_{i∈I} h_i s_i^t
         + Σ_{i∈I} k_i q_i^1 + Σ_{t∈K'} Σ_{i∈I} α^{t−1} k_i (q_i^t − q_i^{t−1}) − Σ_{i∈I} α^T g_i q_i^T    (37)

    s.t.  Σ_{j∈J} x_{ij}^t + s_i^t − q_i^t = q_i^0    for i ∈ I and t ∈ K,    (38)

          Σ_{i∈I} x_{ij}^t = Σ_{τ=0}^{t} r_j^τ    for j ∈ J and t ∈ K,    (39)

          q_i^t − q_i^{t−1} ≥ 0    for i ∈ I and t ∈ K',    (40)

          x_{ij}^t, s_i^t, q_i^t ≥ 0    for i ∈ I, j ∈ J and t ∈ K.    (41)
The objective function (37) gives the minimum total time discounted shipment, idle capacity maintenance and capacity expansion costs less the salvage value
for the capacity. After making suitable assumptions on the salvage values g_i and rearranging terms, (37) can be rewritten as

    min  Σ_{t∈K} α^{t−1} [Σ_{(i,j)∈[I×J]} c_{ij} x_{ij}^t + Σ_{i∈I} {h_i s_i^t + k_i' q_i^t}]    (42)

where k_i' = (1 − α)k_i. The constraints (38) state that the amount shipped out of region i at time t (= Σ_{j∈J} x_{ij}^t) plus whatever is left as idle capacity (= s_i^t) should be equal to the net capacity q_i^0 + q_i^t. (38) can be rewritten as

    Σ_{j∈J} x_{ij}^t + s_i^t + (N − q_i^t) = q_i^0 + N    for i ∈ I    (43)

where N is a large positive number. Thus, if we define x_{i,n+1}^t = s_i^t and x_{i,n+2}^t = (N − q_i^t), (43) becomes

    Σ_{j∈J''} x_{ij}^t = q_i^0 + N    for i ∈ I    (44)

where J'' = J ∪ {(n+1)} ∪ {(n+2)}. We note that

    0 ≤ x_{i,n+2}^t ≤ N    for i ∈ I    (45)

since N − q_i^t ≥ 0. The constraints (40) become

    −x_{i,n+2}^t + x_{i,n+2}^{t−1} ≥ 0    for i ∈ I, t ∈ K'.    (46)

Constraints (40) (and equivalently (46)) are the coupling constraints in P linking the time periods in K. Consequently, if we drop the constraints (46), the problem P splits into a sequence of problems P_t (t ∈ K). As will be seen later, this sequence of problems can be solved by effective and repeated use of area rim operators, sequentially starting from the solution of problem P_0 to solve P_1, from the solution of P_1 to solve P_2, etc. Augmenting the set I by a 'dummy' region (m+1) to pick up any extra amounts in the (n+1) and (n+2) columns and defining I' = I ∪ {m+1}, the problem P_t becomes

    min  Σ_{(i,j)∈[I'×J'']} c_{ij} x_{ij}^t    (47)

    s.t.  Σ_{j∈J''} x_{ij}^t = q_i^0 + N    for i ∈ I',    (48)

          Σ_{i∈I'} x_{ij}^t = Σ_{τ=0}^{t} r_j^τ    for j ∈ J'',    (49)

          x_{i,n+2}^t ≤ N    for i ∈ I',    (50)

          x_{ij}^t ≥ 0    for (i, j) ∈ [I' × J''],    (51)

where the values of c_{ij}, q_i^0 and r_j^τ for i = m+1; j = (n+1), (n+2) are to be appropriately defined.
The interesting thing to note is that P_t is a TP and, what is more, problem P_t differs from P_{t−1} only in the right-hand side of the constraints (49). The requirement of market j increases by r_j^t in going from P_{t−1} to P_t. Thus the optimal solution to P_t can be obtained from that of P_{t−1} by the application of the area rim operator δR_A(β, α) with δ = 1, α_i = 0 for i ∈ I' and β_j = r_j^t for j ∈ J''. (The definition of r_{n+2}^t is such that Σ_{j∈J''} β_j = Σ_{i∈I'} α_i = 0.) It can be proved that the solutions so obtained satisfy the constraints (46) (equivalently (40)), thus providing the optimal solution to the original problem P. Consequently, a 10 period, 10 region, 200 market problem is reduced from a linear program P with 2190 constraints and 20 200 variables to a TP with 11 regions and 202 markets at time 0, together with the application of the area rim operator to obtain the transportation solutions for the next 10 periods. A detailed illustration of this approach is given in [21].
Rim operators have proved useful in other capacity expansion problems also. In [23], Fong and Srinivasan extend the formulation discussed above to the case where the costs may be nonstationary, the demands are not necessarily increasing, and the capacity expansion costs have fixed components as well. The heuristic algorithms used in [23] start with a feasible solution and improve it by swapping capacities between two regions over the planning horizon if that would reduce the total cost. The binary rim operators discussed earlier [22] prove useful in this context. The heuristic solutions so obtained are only about 0.8% away from the optimum and are obtained considerably faster (often by a factor of more than 10) compared to mixed integer exact procedures.
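The essence of the reduction is that successive subproblems P_t share everything except the market requirements in (49). The sketch below re-solves each period's TP from scratch after adding the demand increments β_j = r_j^t, where [21] instead carries the optimal basis forward with the area rim operator δR_A; capacities are taken large enough to meet all demands, and a zero-cost slack market stands in loosely for the (n+1) and (n+2) columns. All data are invented.

    import numpy as np
    from scipy.optimize import linprog

    def solve_tp(cost, supply, demand):
        """Balanced TP solved as an LP; returns the optimal cost."""
        m, n = cost.shape
        A_eq = []
        for i in range(m):
            r = np.zeros((m, n)); r[i, :] = 1; A_eq.append(r.ravel())
        for j in range(n):
            col = np.zeros((m, n)); col[:, j] = 1; A_eq.append(col.ravel())
        res = linprog(cost.ravel(), A_eq=np.array(A_eq),
                      b_eq=np.concatenate([supply, demand]))
        return res.fun

    c = np.array([[3.0, 7.0, 0.0],       # two regions, two markets,
                  [6.0, 2.0, 0.0]])      # plus a zero-cost slack column
    supply = np.array([100.0, 100.0])    # capacities q_i^0 + q_i^t, held fixed here
    demand = np.array([30.0, 30.0])      # initial requirements sum_{tau<=0} r_j^tau

    increments = [np.zeros(2), np.array([5.0, 8.0]), np.array([6.0, 4.0])]
    for t_, beta in enumerate(increments):   # beta_j = r_j^t, with beta = 0 at t = 0
        demand = demand + beta               # P_t differs from P_{t-1} only here (49)
        slack = supply.sum() - demand.sum()
        print(f"P_{t_}: shipment cost =",
              round(solve_tp(c, supply, np.append(demand, slack)), 1))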
6. Extensions of the TP and GTP
Certain extensions of the standard TP or GTP (e.g., the stochastic TP and GTP, or the TP and GTP with piecewise linear, convex transportation costs) can be solved by first solving a standard TP or GTP and then sequentially applying the relevant operators. We illustrate this concept below in the context of Balachandran's [4] algorithm for the stochastic GTP. The application of cost operators for solving the piecewise linear cost TP is discussed in detail by Caccetta [14].
6.1. The stochastic TP and GTP

Garstka [26] has presented an algorithm for solving the stochastic transportation problem by utilizing the concept of 'stochastic programming with simple recourse'. He allows the demands b_j to have a known marginal probability distribution. By utilizing the per unit penalties of under and over supply, he introduces an equivalent convex function as the objective and thus solves the 'simple recourse' problem. We discuss below Balachandran's [4] application of operator theory to provide a solution procedure for the extension of the above formulation to the stochastic GTP:
    min  Σ_{(i,j)∈[I×J]} c_{ij} x_{ij}    (52)

    s.t.  Σ_{j∈J} e_{ij} x_{ij} ≤ a_i    for i ∈ I,    (53)

          Σ_{i∈I} x_{ij} = b_j    for j ∈ J,    (54)

          x_{ij} ≥ 0    for (i, j) ∈ [I × J].    (55)
(If e_{ij} = 1 for all (i, j) ∈ [I × J] this reduces to the stochastic TP.) Following the usual assumptions of 'stochastic programming' with recourse [26], with b_j having a marginal probability density function f(b_j), let us define the per unit penalty costs p_j (≥ 0) and d_j (≥ 0) respectively for under and over supply. Then the equivalent stochastic GTP can be shown to be (see [4] for details):

    minimize  Z_1 + Z_2    (56)

    subject to  Σ_{j∈J} e_{ij} x_{ij} ≤ a_i    for i ∈ I,    (57)

                x_{ij} ≥ 0    for (i, j) ∈ [I × J]    (58)

where

    Z_1 = Σ_{(i,j)∈[I×J]} ½(2c_{ij} − p_j + d_j) x_{ij},    (59)

    Z_2 = Σ_{j∈J} (p_j + d_j) ∫_{Σ_{i∈I} x_{ij}}^{b_{jm}} (b_j − Σ_{i∈I} x_{ij}) f(b_j) db_j    (60)

(b_{jm} is the median value of the random variable b_j). The following properties of the optimal solution {x_{ij}} can be shown to be true [4]:
(1) The objective function is convex.
(2) For any (i, j) ∈ [I × J], if c_{ij} > p_j, then x_{ij} = 0.
(3) For any j ∈ J, if c_{ij} ≥ (p_j − d_j)/2 for all i ∈ I, then Σ_{i∈I} x_{ij} ≤ b_{jm}. Note that if p_j < d_j the above inequality trivially holds if c_{ij} ≥ 0.
(4) For any i ∈ I, if c_{ij} + d_j < 0 for all j ∈ J, then Σ_{j∈J} e_{ij} x_{ij} = a_i.
With these properties, it is shown in [4] that the optimal solution to the stochastic GTP can be obtained by solving an initial GTP in which the rim conditions for the demands (columns) are first set equal to b_{jm} for every j and the objective function coefficient for x_{ij} is taken as ½(2c_{ij} − p_j + d_j). The subsequent rim conditions for the b_j are obtained by iteratively solving the relationship

    v_j^k = (p_j + d_j) ∫_{b_j^{k+1}}^{b_{jm}} f(b_j) db_j    (61)

where v_j^k is the 'known' optimal dual corresponding to iteration k and b_j^{k+1} is the
unknown rim condition for the j-th column, to be evaluated from the above relationship. The newly evaluated {b_j^{k+1}} are used to obtain the next set of optimal solutions {x^{k+1}} and the corresponding duals {u^{k+1}} and {v^{k+1}}. However, it is easily seen that the set of optimal primal and dual solutions for iteration (k+1) can be obtained from those for iteration k by utilizing the area rim operator δR_A [8, 10], defining β_j = b_j^{k+1} − b_j^k and applying δR_A up to δ = 1. The algorithm terminates when v_j^{k+1} = v_j^k (or b_j^{k+1} = b_j^k) for every j. A convergence proof is also indicated in [4].

7. Branch and bound problems with the TP or GTP as subproblems
In solving many OR models by branch-and-bound, it often happens that the subproblems are TPs or GTPs. Consequently, as we move along the branch-and-bound tree, we may obtain the optimal solution to the (k+1)-st subproblem by post-optimization of the k-th subproblem, rather than inefficiently having to solve the (k+1)-st subproblem from the start. Usually, the parameters of the (k+1)-st subproblem differ from those of the k-th subproblem in either the rim conditions or the unit costs. Thus, by appropriately defining the relevant operator and applying it to the k-th subproblem, the (k+1)-st problem can be solved readily. We illustrate this method in the context of solving the traveling salesman problem [42, 49] and the TP with quantity discounts [6]. However, the application of operator theory has been indicated in several other applications that involve branch-and-bound: optimal facility location under random demand with general cost structure [5], assignment of sources to uses [47], allocation of jobs to be processed by computers in a computer network [3], Decision CPM [49], lock-box decision models [38], the one-machine job shop scheduling problem [29], etc.
7.1. Subtour elimination algorithm for solving asymmetric traveling salesman problems

Eastman [19] and Shapiro [40] have solved the traveling salesman problem by using the linear assignment problem as a relaxation, with subtour elimination, in a branch-and-bound algorithm. Denoting the number of cities as n and c_{ij} as the distance from the i-th city to the j-th city, the assignment subproblem is given by (1)-(4) with m = n, a_i = 1 for i ∈ I, b_j = 1 for j ∈ J and c_{jj} = ∞ for j ∈ J. The solution to the assignment problem may contain subtours, however. For instance, if the assignment solution to a four-city problem yielded a value of 1 for x_{13}, x_{31}, x_{42} and x_{24}, with all other x_{ij} = 0, this results in the two subtours (1 → 3 → 1) and (2 → 4 → 2). Starting with this solution for subproblem A (say), we may avoid the subtour (1 → 3 → 1) by branching into two subproblems B and C by prohibiting the routes (1, 3) and (3, 1) respectively. The solution to subproblem B can be obtained from that of A by applying the operator MC⁺_{13}, where M is a large positive number. (By making the cost along route (1, 3) prohibitively large, we drive x_{13} → 0.) If we now want to backtrack from B to A, we have to apply δC⁻_{13}
to bring the unit cost of cell (1, 3) back to its original value. Thus, by the application of suitable cell cost operators, one can easily traverse up and down the branch-and-bound tree. Note that only one assignment subproblem needs to be solved from the start, with all remaining subproblems solved by the use of cell cost operators.
There is yet another advantage of using operator theory for this problem. Note that the increase in the value of the optimal objective function Z by applying MC⁺_{pq} is greater than that by applying μC⁺_{pq}, where μ is the maximum extent to which the basis preserving operator can be applied (since M ≥ μ). Now, from Section 1, the change in the value of Z by applying μC⁺_{pq} is μx_{pq} = μ, since x_{pq} = 1 whenever a route is to be prohibited. Consequently, μ is a lower bound on the effect on Z of prohibiting the route (p, q). Since μ refers to the maximum extent of the basis preserving cost operator, it can be computed readily without having to alter the current solution in any way. The fast computation of such lower bounds considerably speeds up the branch-and-bound algorithm.
Computational experience reported in [42] shows that this operator theory-based subtour elimination algorithm for the asymmetric traveling salesman problem (i.e., c_{ij} not necessarily equal to c_{ji}) is considerably faster than (i) previous subtour elimination algorithms, and (ii) the 1-arborescence approach of Held and Karp [35]. The operator-based algorithm takes only about 1.7 seconds on the UNIVAC 1108 (FORTRAN V) for a 50-city problem. However, for the symmetric traveling salesman problem, the subtour elimination approaches (including the present one) are completely dominated by the Held and Karp 1-tree approach [35].
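A compact sketch of the Eastman-type subtour elimination scheme described above, assuming SciPy is available and nonnegative distances. Where the text moves between subproblems with the cost operators MC⁺_{pq} and δC⁻_{pq}, this sketch simply re-solves each assignment relaxation from scratch with a big-M cost on the prohibited routes, so it illustrates the branching logic but not the operator-based post-optimization or the μ lower bounds.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def subtours(succ):
        """Decompose an assignment (successor array) into its cycles."""
        seen, cycles = set(), []
        for start in range(len(succ)):
            cyc, i = [], start
            while i not in seen:
                seen.add(i); cyc.append(i); i = succ[i]
            if cyc:
                cycles.append(cyc)
        return cycles

    def solve_atsp(cost):
        M = cost.sum() + 1.0                     # prohibitive 'big M' cost
        best = {"z": np.inf, "tour": None}

        def branch(forbidden):
            c = cost.copy()
            np.fill_diagonal(c, M)               # no city assigned to itself
            for (p, q) in forbidden:
                c[p, q] = M                      # stand-in for applying MC+_pq
            rows, cols = linear_sum_assignment(c)
            z = c[rows, cols].sum()
            if z >= best["z"] or z >= M:         # relaxation bound: prune branch
                return
            cycles = subtours(cols)
            if len(cycles) == 1:                 # a single tour: new incumbent
                best["z"], best["tour"] = z, cols.copy()
                return
            for i in min(cycles, key=len):       # eliminate the shortest subtour:
                branch(forbidden | {(i, cols[i])})   # prohibit one of its arcs

        branch(frozenset())
        return best["z"], best["tour"]

    cost = np.array([[0, 2, 9, 10], [1, 0, 6, 4],
                     [15, 7, 0, 8], [6, 3, 12, 0]], dtype=float)
    print(solve_atsp(cost))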
7.2. TP with quantity discounts

The TP with quantity discounts may be formulated as follows:

    minimize  Z = Σ_{(i,j)∈[I×J]} c*_{ij} x_{ij},    (62)

    subject to  Σ_{j∈J} x_{ij} = a_i    for i ∈ I,    (63)

                Σ_{i∈I} x_{ij} = b_j    for j ∈ J,    (64)

                0 ≤ x_{ij} ≤ λ_{ij}^r    for (i, j) ∈ [I × J]    (65)

where, for each (i, j),

    c*_{ij} = c_{ij}^k    if λ_{ij}^{k−1} ≤ x_{ij} < λ_{ij}^k, k = 1, 2, ..., r,    (66)

with 0 = λ_{ij}^0 < λ_{ij}^1 < ⋯ < λ_{ij}^r and c_{ij}^k > c_{ij}^{k+1} for k = 1, 2, ..., r−1.
Let us refer to the problem (62)-(65) as P_f. It can be shown that the fixed charge TP is a special case of problem P_f. The problem P_f is solved by an algorithm [6] in which one initially solves a problem with c*_{ij} = c_{ij}^r for each (i, j). Obviously, if the optimal solution X* = {x*_{ij}} satisfies (66), then the solution is optimal. Else, a branching rule is provided to find a cell (s, t) from among the 'interval infeasible' cells, from which the branch-and-bound procedure can be applied. Let us say that the current optimal x*_{st} should be such that λ_{st}^{k−1} ≤ x*_{st} < λ_{st}^k, due to the cost c*_{st} = c_{st}^k used, but in fact it is interval infeasible in that x*_{st} < λ_{st}^{k−1}. This condition leads to two branches (or subproblems) corresponding to x_{st} ≥ λ_{st}^{k−1} and x_{st} < λ_{st}^{k−1}. In branch 1, the lower bound restriction x_{st} ≥ λ_{st}^{k−1} has to be applied while leaving the current c*_{st} unchanged. This subproblem can be easily verified to be equivalent to the TP in which the cell (s, t) corresponds to x'_{st} = x_{st} − λ_{st}^{k−1} (rather than x_{st}), a_s is changed to a_s − λ_{st}^{k−1} and b_t is changed to b_t − λ_{st}^{k−1}. Thus the subproblem of branch 1 can be solved by applying δR⁻_{st} to the current solution, where δ = λ_{st}^{k−1}. In branch 2, the current solution satisfies the constraint x_{st} < λ_{st}^{k−1}. However, the unit cost c*_{st} is to be replaced by the 'interval-feasible' cost, i.e., the c_{st}^l corresponding to the current x*_{st}. Thus the unit cost has to be increased from c_{st}^k to c_{st}^l. This is easily accomplished by the cell cost operator δC⁺_{st}, where δ = c_{st}^l − c_{st}^k.
The detailed branch-and-bound procedure using the cell rim and cost operators in the manner outlined above is discussed and illustrated with an example in [6].
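A small helper makes the branching test concrete: given the breakpoints λ^k and the cost interval actually priced into a cell, it flags interval infeasibility and prints the two branches described above. The breakpoints, costs and cell values here are hypothetical.

    import bisect

    def interval_index(x, lam):
        """Return k such that lam[k-1] <= x < lam[k], with lam[0] = 0."""
        return max(1, bisect.bisect_right(lam, x))

    # Breakpoints 0 = lam^0 < lam^1 < lam^2 for one cell, and the decreasing
    # unit costs c^1 > c^2 > c^3 of its three discount intervals (invented).
    lam = [0.0, 10.0, 25.0]
    c = {1: 9.0, 2: 7.0, 3: 6.0}

    x_st, k_used = 6.0, 3          # solved with the cheapest cost c^3 ...
    k_true = interval_index(x_st, lam)
    if k_true != k_used:           # ... but x_st falls in interval 1: infeasible
        print(f"cell is interval infeasible: x = {x_st} lies in interval {k_true}")
        print(f"branch 1: impose x >= {lam[k_used - 1]} keeping cost c^{k_used}")
        print(f"branch 2: raise the unit cost by {c[k_true] - c[k_used]} to c^{k_true}")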
8. Algorithms for TP and GTP

Besides its value in solving nonstandard TPs and GTPs, as outlined in Sections 6 and 7.2, it is interesting that operator theory can be used to develop algorithms for solving the TP (or GTP) itself. The basic idea is to first 'solve' a trivial TP (or GTP) for which the optimum solution can be obtained by inspection. One then uses operator theory to change the parameters of the problem, starting from those of the trivial problem and ending with those of the true problem that is to be solved. We illustrate this concept below in the context of a cost operator algorithm for the TP and a weight operator algorithm for the GTP.
8.1. Cost operator algorithm for the TP [52]

The proposed algorithm starts with the determination of a primal basic feasible solution. The unit costs corresponding to the basic cells in the initial solution are then altered so that the solution is dual feasible as well, and hence optimal for the problem with the modified costs. For instance, for a problem with
nonnegative c_{ij}, the solution becomes dual feasible by setting c_{ij} = 0 for (i, j) ∈ B. The altered costs are then successively restored to their true values, with appropriate changes in the primal and dual solutions, using cell cost operators. When all the altered costs (at most m + n − 1 of them, corresponding to the basic cells) are restored to their true values, one obtains an optimum solution for the original problem.
The cell cost operator algorithm has many interesting theoretical features. First, it converges in a finite number of steps to an optimum, even without perturbation of the rim conditions, as against other primal basic methods. Second, it converges to an optimum in (2T − 1) iterations for primal nondegenerate transportation problems, where T denotes the sum of the (integer) warehouse availabilities {a_i} (also the sum of the (integer) market requirements {b_j}). This bound on the number of steps is much smaller than the bound for the Ford-Fulkerson algorithm for the transportation problem [24, Ch. 3.1], by a factor of approximately min(m, n)/2, where m and n are the numbers of warehouses and markets respectively. For primal degenerate transportation problems, however, the cell cost operator bound is slightly weaker than the Ford-Fulkerson bound, by a factor of approximately two. As against most primal basic algorithms, which have exponential bounds, the cell cost operator algorithm has the more desirable polynomial bound. However, in terms of average computation times, the primal MODI algorithm [17, pp. 308-313] is faster than the cell cost operator algorithm. Details of the cell cost operator algorithm as well as the related area cost operator algorithm are given in [52].
In a similar manner, one can devise an algorithm for the transportation problem using rim operators. For instance, any dual basic feasible solution can be used to start the algorithm. The rim conditions can be altered (e.g., by setting a_i = 0 for i ∈ I and b_j = 0 for j ∈ J) so that the solution is primal feasible and hence optimal for the altered problem. The rim conditions can then be restored to their true values using rim operators, with appropriate changes in the primal and dual solutions. When all the rim conditions are restored to their true values, an optimum solution to the original problem will have been determined. The cost (and rim) operator algorithms can be easily generalized to the GTP.
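The starting point of the cost operator algorithm is easy to verify concretely. In the sketch below (invented data), a northwest-corner basis is constructed; zeroing the unit costs of its basic cells makes the duals u_i = v_j = 0 dual feasible, since every reduced cost then equals the (nonnegative) original c_{ij}, so the starting solution is optimal for the modified problem. The successive restoration of the true basic-cell costs by cell cost operators is not reproduced here.

    import numpy as np

    def northwest_corner(supply, demand):
        """Primal basic feasible solution of a balanced TP by the NW-corner rule."""
        s, d = supply.astype(float).copy(), demand.astype(float).copy()
        X = np.zeros((len(s), len(d)))
        basis, i, j = [], 0, 0
        while i < len(s) and j < len(d):
            q = min(s[i], d[j])
            X[i, j] = q; basis.append((i, j))
            s[i] -= q; d[j] -= q
            if s[i] == 0 and i + 1 < len(s):
                i += 1                      # row exhausted: move down
            else:
                j += 1                      # column exhausted: move right
        return X, basis

    c = np.array([[4.0, 6.0, 9.0], [5.0, 3.0, 8.0]])
    supply, demand = np.array([30.0, 50.0]), np.array([20.0, 25.0, 35.0])

    X, basis = northwest_corner(supply, demand)
    c_mod = c.copy()
    for (i, j) in basis:
        c_mod[i, j] = 0.0                   # altered costs: basic cells made free
    # With u_i = v_j = 0, reduced costs equal c_mod >= 0, so X is optimal for c_mod.
    print(X)
    print("modified-cost optimum value:", (c_mod * X).sum())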
8.2. Weight operator algorithm for the GTP [9]

An interesting approach to solving the GTP is to use cell weight operators on a TP solution. Since a transportation problem is easier in basis structure, we can solve the GTP with the given unit costs and rim values by first setting all the weight coefficients {e_{ij}} identically equal to unity. This converts the GTP to an ordinary TP, so that the solution is easily obtained. One can then successively restore the original weights of the GTP from the current unit weights by the application of cell weight operators. In the GTP formulation of the cash management problem [43, p. 1358], the weight coefficients are typically very
close to one so that the weight operator algorithm could be expected to be computationally efficient in solving this class of problems.
9. Conclusions

We have shown in Sections 3 through 8 that many classes of OR models can be solved by a TP or GTP formulation together with the application of rim/cost/weight operators for varying the parameters of the problem. Furthermore, as discussed in Sections 1 and 2, operator theory also provides rich economic and managerial interpretations as well as interesting theoretical insights. We hope that the operator theory of parametric programming will prove useful in further enhancing the successful use of the TP and GTP in the practice of operations research.
References

[1] J. Abraham, "Über die Stabilität der Lösungen im Transportproblem der linearen Programmierung", Czechoslovak Mathematical Journal 8 (1958) 131-138.
[2] V. Balachandran, "Optimal production growth for the machine loading problem", Naval Research Logistics Quarterly 22 (1975) 593-607.
[3] V. Balachandran, "An integer generalized transportation model for optimal job assignment in computer networks", Operations Research 24 (1976) 742-759.
[4] V. Balachandran, "The stochastic generalized transportation problem: an operator theoretic approach", International Journal of Networks 9 (1979) 61-70.
[5] V. Balachandran and S. Jain, "Optimal facility location under random demand with general cost structure", Naval Research Logistics Quarterly 23 (1976) 421-436.
[6] V. Balachandran and A. Perry, "Transportation type problems with quantity discounts", Naval Research Logistics Quarterly 23 (1976) 195-209.
[7] V. Balachandran and G.L. Thompson, "An operator theory of parametric programming for the generalized transportation problem, I: Basic theory", Naval Research Logistics Quarterly 22 (1975) 79-100.
[8] V. Balachandran and G.L. Thompson, "An operator theory of parametric programming for the generalized transportation problem, II: Rim, cost and bound operators", Naval Research Logistics Quarterly 22 (1975) 101-125.
[9] V. Balachandran and G.L. Thompson, "An operator theory of parametric programming for the generalized transportation problem, III: Weight operators", Naval Research Logistics Quarterly 22 (1975) 297-315.
[10] V. Balachandran and G.L. Thompson, "An operator theory of parametric programming for the generalized transportation problem, IV: Global operators", Naval Research Logistics Quarterly 22 (1975) 317-339.
[11] E. Balas and P. Ivanescu (P.L. Hammer), "On the transportation problem, I and II", Cahiers du Centre de Recherche Opérationnelle (Brussels) 4 (1962) 98-116 and 131-160.
[12] E.H. Bowman, "Production scheduling by the transportation method of linear programming", Operations Research 4 (1956) 100-103.
[13] G.H. Bradley, G.G. Brown and G.W. Graves, "Design and implementation of large-scale primal transshipment algorithms", Management Science 24 (1977) 1-34.
[14] L. Caccetta, "The transportation problem with piecewise linear costs", Working Paper, Department of Mathematics, University of Western Australia (1976).
[15] A. Charnes and W.W. Cooper, Management models and industrial applications of linear programming, Vol. 1 (Wiley, New York, 1961).
[16] A. Charnes and M. Kirby, "The dual method and the method of Balas and Ivanescu for the transportation model", Cahiers du Centre de Recherche Opérationnelle (Brussels) 6 (1964) 5-18.
[17] G.B. Dantzig, Linear programming and extensions (Princeton University Press, Princeton, NJ, 1963).
[18] G.B. Dantzig and P. Wolfe, "Decomposition principle for linear programs", Operations Research 8 (1960) 101-111.
[19] W.L. Eastman, "Linear programming with pattern constraints", unpublished doctoral dissertation, Harvard University, Boston (1958).
[20] K. Eisemann, "The generalized stepping stone method for the machine loading model", Management Science 11 (1964) 154-176.
[21] C.O. Fong and V. Srinivasan, "Multiperiod capacity expansion and shipment planning with linear costs", Naval Research Logistics Quarterly 23 (1976) 37-52.
[22] C.O. Fong and V. Srinivasan, "Determining all nondegenerate shadow prices for the transportation problem", Transportation Science 11 (1977) 199-222.
[23] C.O. Fong and V. Srinivasan, "The multi-region dynamic capacity expansion problem, I and II", Operations Research 29 (1981) to appear.
[24] L.R. Ford and D.R. Fulkerson, Flows in networks (Princeton University Press, Princeton, NJ, 1963).
[25] R.S. Garfinkel and M.R. Rao, "The bottleneck transportation problem", Naval Research Logistics Quarterly 18 (1971) 465-472.
[26] S.J. Garstka, "Computation in stochastic programs with recourse", unpublished doctoral thesis, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, PA (1970).
[27] S.I. Gass and T. Saaty, "The computational algorithm for the parametric objective function", Naval Research Logistics Quarterly 2 (1955) 39-45.
[28] S.I. Gass and T. Saaty, "Parametric objective function, Part II: Generalization", Operations Research 3 (1955) 395-401.
[29] L. Gelders and P.R. Kleindorfer, "Coordinating aggregate and detailed scheduling decisions in the one-machine job shop, Part I: Theory", Operations Research 22 (1974) 46-60.
[30] D.H. Gensch and E.A. Kervinen, "The traveling salesman's subtour problem", Working Paper, School of Business Administration, University of Wisconsin, Milwaukee, WI (1975).
[31] F. Glover, D. Karney, D. Klingman and A. Napier, "A computational study on start procedures, basis change criteria, and solution algorithms for transportation problems", Management Science 20 (1974) 793-813.
[32] F. Glover and D. Klingman, "New advances in the solution of large scale network and network related problems", Research Report CS-177, Center for Cybernetic Studies, The University of Texas at Austin (1974).
[33] W. Grabowski and W. Szwarc, "Optimum transportation bases", Zastosowania Matematyki (Warsaw) 9 (1968) 357-389.
[34] P.L. Hammer, "Time-minimizing transportation problems", Naval Research Logistics Quarterly 16 (1969) 345-357.
[35] M. Held and R.M. Karp, "The traveling salesman problem and minimum spanning trees, Part II", Mathematical Programming 1 (1971) 6-25.
[36] C. Holt, F. Modigliani, J. Muth and H. Simon, Planning production, inventories and workforce (Prentice-Hall, Englewood Cliffs, NJ, 1960).
[37] J.M. Mulvey, "Network relaxations for 0-1 integer programs", Paper No. WA 3.4, presented at the Joint National Meeting of the Operations Research Society of America and The Institute of Management Sciences, Las Vegas, NV, 1975.
[38] A. Perry, "An optimum seeking algorithm for locating lock boxes", Midwest AIDS Proceedings (1975).
[39] R.C. Rao and D.P. Rutenberg, "Multilocation plant sizing and timing", Management Science 23 (1977) 1187-1198.
[40] D.M. Shapiro, "Algorithms for the solution of the optimal cost and bottleneck traveling salesman problems", unpublished doctoral thesis, Washington University, St. Louis, MO (1966).
[41] M. Simonnard, Linear programming (Prentice-Hall, Englewood Cliffs, NJ, 1966).
[42] T.H.C. Smith, V. Srinivasan and G.L. Thompson, "Computational performance of three subtour elimination algorithms for solving asymmetric traveling salesman problems", Annals of Discrete Mathematics 1 (1977) 495-506.
[43] V. Srinivasan, "A transshipment model for cash management decisions", Management Science 20 (1974) 1350-1363.
[44] V. Srinivasan and G.L. Thompson, "An operator theory of parametric programming for the transportation problem, I", Naval Research Logistics Quarterly 19 (1972) 205-225.
[45] V. Srinivasan and G.L. Thompson, "An operator theory of parametric programming for the transportation problem, II", Naval Research Logistics Quarterly 19 (1972) 227-252.
[46] V. Srinivasan and G.L. Thompson, "Determining optimal growth paths in logistics operations", Naval Research Logistics Quarterly 19 (1972) 575-599.
[47] V. Srinivasan and G.L. Thompson, "An algorithm for assigning uses to sources in a special class of transportation problems", Operations Research 21 (1973) 284-295.
[48] V. Srinivasan and G.L. Thompson, "Benefit-cost analysis of coding techniques for the primal transportation algorithm", Journal of the Association for Computing Machinery 20 (1973) 194-213.
[49] V. Srinivasan and G.L. Thompson, "Solving scheduling problems by applying cost operators to assignment models", in: S.E. Elmaghraby, ed., Symposium on the theory of scheduling and its applications (Springer, Berlin, 1973) pp. 399-425.
[50] V. Srinivasan and G.L. Thompson, "Algorithms for minimizing total cost, bottleneck time, and bottleneck shipment in transportation problems", Naval Research Logistics Quarterly 23 (1976) 567-595.
[51] V. Srinivasan and G.L. Thompson, "Determining cost vs. time Pareto-optimal frontiers in multi-modal transportation problems", Transportation Science 11 (1977) 1-19.
[52] V. Srinivasan and G.L. Thompson, "Cost operator algorithms for the transportation problem", Mathematical Programming 12 (1977) 372-391.
[53] W. Szwarc, "The stability of the transportation problem", Mathematica 4 (1962) 397-400.
[54] W. Szwarc, "The time transportation problem", Zastosowania Matematyki 8 (1966) 231-242.
[55] G.L. Thompson, "Recent theoretical and computational results for transportation and related problems", in: W.H. Marlow, ed., Modern trends in logistic research (MIT Press, Cambridge, MA, 1976).
[56] A.F. Veinott, Jr., "Lattice programming", Working Paper, Department of Operations Research, Stanford University, Stanford, CA (1977).
[57] H.M. Wagner, Principles of operations research with applications to managerial decisions (Prentice-Hall, Englewood Cliffs, NJ, 1975).
[58] M. Zeleny, Linear multiobjective programming (Springer, Berlin, 1974).
Additional Applications of Operator Theory (Added in Proof)

[59] R.V. Nagelhout and G.L. Thompson, "A single source transportation algorithm", Computers and Operations Research 7 (1980) 185-198.
[60] R.V. Nagelhout and G.L. Thompson, "A cost operator approach to multistage location-allocation", European Journal of Operational Research 6 (1981) 149-161.
[61] R.V. Nagelhout and G.L. Thompson, "A study of the bottleneck single source transportation problem", Management Sciences Research Report 456, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, PA (1980).
Mathematical Programming Study 15 (1981) 86-101. North-Holland Publishing Company
A NETWORK MODEL OF INSURANCE COMPANY CASH FLOW MANAGEMENT

Roy L. CRUM and David J. NYE

University of Florida, Gainesville, FL, U.S.A.

Received 12 January 1979
Revised manuscript received 12 July 1980
Cash flow management activities of property-casualty insurance companies deal with both insurance underwriting operations and investment portfolio pursuits. For the greatest overall efficiency, these two dimensions require simultaneous consideration and coordination to effect the optimum strategy for each. The complexities involved in managing the joint cash flow are such that the judgmental methods or heuristic guidelines often employed are inadequate to investigate properly the various tradeoffs. Analytical techniques, such as various forms of general linear and nonlinear programming, have been proposed to aid in these investigations, but are of little operational value because of problem size. In other applications, network analysis has been able to overcome this type of difficulty. Hence, in this paper an application of a network model to the problem of managing the overall cash flow of a medium sized property-casualty insurance company is described. The model allows for coordination of the two dimensions, and an example situation is indicated.
Key words: Generalized Networks, Cash Flow Management, Investment Decisions, Portfolio Theory, Insurance Companies.
1. Introduction

There are two dimensions to the problem of effective management in property-casualty insurance company operations. One involves the composition of the insurance portfolio: Which lines should be emphasized, how much of each should be sold, and when and under what conditions should efforts be expended to make sales? These and related questions require consideration of factors such as the expected loss ratio for each line, the amount of time that claims payments can be delayed without implicit penalty, the ramifications of further delays, and the alternatives available for investing or otherwise utilizing the cash flow. The other dimension concerns the firm's investment portfolio: What is the proper mix between debt and equity securities, what maturity structure should be sought, how much should be invested in which securities, and when to buy, and when to sell? Legal requirements and capital market conditions are crucial in answering these questions; but equally important is the amount and pattern of cash flow generated by the insurance operations. Thus, for the greatest overall efficiency, both insurance and investment dimensions require simultaneous consideration and coordination to effect the optimum strategy for each.
Several methodologies have been proposed to achieve this needed
coordination. One method involves informal 'judgment', for instance: Make the best investment possible for the time period for which the cash is available while postponing payouts. In the presence of high inflation, however, postponing the payment of claims might have an implicit (or explicit) cost higher than the return that could be earned by investing the funds. Paying the claims early avoids the cost, but then the period for which the cash is available also changes. Such dynamic interactions tend to reduce the efficiency of cash flow management based on heuristic decision rules, even when the heuristics are complex and some flexibility is introduced. Without the further capability to investigate systematically the various tradeoffs, inefficiencies are unavoidable.
Other techniques such as linear programming [9], goal programming [10] and chance constrained programming [11] have been proposed to circumvent these inefficiencies. These formulations have proven inadequate for many applications for two reasons. First, managers who would use such models frequently neither understand nor trust the mathematical formulation, particularly in highly uncertain situations, and thus resist their use. Second, because of computational limitations, a model with sufficient complexity to capture the necessary interrelationships is very slow and expensive to solve, if indeed an optimum can be obtained at all.
Applications of network analysis to insurance company cash flow management have not been reported in the literature, but, as shown in [3, 4, 5 and 6] among others, network formulations have advantages over other models in terms of understandability and rapid solution time. Thus, in this paper we describe such an application of network modeling to the cash flow management problem of a multiple line property-casualty insurance company.
The fundamental structure of a generalized network model is explained in the next section. This is followed by a discussion of the relationships found in the various subsections of the insurance company model formulation. The input data for an example are then presented, and the findings from an analysis of several model runs are discussed. The paper concludes with an explanation of how a model such as this one can be employed in the cash flow planning and management process of an insurance company.

1. Model structure
A network model consists of m nodes that are connected pairwise by a set A of directed cash flow arcs. The variable b_i represents the supply of cash or the demand for funds at node i (where supply is denoted as a negative quantity and demand as positive). An arc which connects node i to node j is denoted by an ordered pair (i, j). Each arc (i, j) in a generalized network can then be described in terms of five parameters. Defining L_{ij} and U_{ij} as, respectively, the lower and upper bounds on the amount of flow, x_{ij}, on the arc (i, j) (U_{ij} need not be finite), a multiplier, p_{ij}, is applied to the actual flow leaving node i, and c_{ij} is the unit cost
of the flow x_{ij} from node i to node j. This cost or objective function coefficient may be an interest cost (or receipt) or it may be an opportunity cost. A mathematical statement of a generalized network is shown below:

    minimize  Σ_{(i,j)∈A} c_{ij} x_{ij},

    subject to  Σ_{(k,i)∈A} p_{ki} x_{ki} − Σ_{(i,j)∈A} x_{ij} = b_i,    i = 1, 2, ..., m,

                L_{ij} ≤ x_{ij} ≤ U_{ij},    (i, j) ∈ A.

Generalized network models of this form are natural representations of many cash flow phenomena. Thus, they are often rather easy and straightforward to structure adequately without sacrificing much of the richness of the underlying process. This is shown to be true for the insurance company cash flow model described below. To facilitate exposition, the model is arbitrarily broken down into three subsections corresponding to the three primary activities of most multiple line property-casualty insurance companies: the insurance portfolio, the investment portfolio, and the acquisition of external capital.
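To fix ideas, the sketch below solves a toy three-node generalized network of this form with SciPy's linprog: node 1 supplies cash to nodes 2 and 3, and the arc into node 3 carries a multiplier standing for, say, an investment yield. The node balance implemented is the one stated above (multiplied inflow minus raw outflow equals b_i); all numbers are invented.

    import numpy as np
    from scipy.optimize import linprog

    # Arcs (i, j, L, U, p, c): bounds, multiplier and unit cost, all hypothetical.
    arcs = [(0, 1, 0.0, 80.0, 1.00, 2.0),    # node 1 -> node 2, plain transfer
            (0, 2, 0.0, 80.0, 1.05, 1.0)]    # node 1 -> node 3, 5% gain en route
    b = np.array([-100.0, 60.0, 42.0])       # supply (negative) and demands

    A_eq = np.zeros((len(b), len(arcs)))
    for a, (i, j, L, U, p, c) in enumerate(arcs):
        A_eq[i, a] -= 1.0                    # raw flow x_ij leaves node i
        A_eq[j, a] += p                      # multiplied flow p_ij x_ij arrives at j
    res = linprog([c for *_, c in arcs], A_eq=A_eq, b_eq=b,
                  bounds=[(L, U) for _, _, L, U, _, _ in arcs])
    print(res.x)                             # expected flows: [60, 40]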
2. Insurance portfolio

The main driver of the insurance company model is the cash flow generated by insurance sales. Assuming m lines of insurance written by the firm and an n-period planning horizon for the model (where the periods are not restricted to be of equal length), INS_ij represents the actual dollar volume of insurance written in line i during period j. The maximum amount of insurance that can be written in period j for line i is designated MV_ij, so CR_ij = MV_ij − INS_ij represents any additional insurance that could have been sold in line i during period j had the model solution chosen to do so. Since there will be some underwriting expenses for line i during period j, UE_ij, where UE_ij is a decimal fraction of INS_ij, only INS_ij[1 − UE_ij] dollars are actually received by the firm from insurance sales.
A variation of this formulation may merit consideration for some lines. If it can be assumed that the amount of insurance sold in line i during period j is an increasing function of the underwriting expense, then the model will choose the level of sales as a function of incremental profitability. That is, assume that underwriting expense is an increasing step function of the amount of sales (perhaps higher commissions are required to induce agents to expend more effort to increase sales). Define UE_ijk as the underwriting expense for line i during period j in step k. Similarly, INS_ijk represents the dollar volume of
insurance written in line i during period j that is subject to the underwriting expense in step k. INSijk is bounded from above by UINSijk, representing the maximum sales volume attainable in step k. Then INSij = Σ_{k=1}^{K} INSijk, and the dollar amount received by the firm is Σ_{k=1}^{K} INSijk[1 - UEijk]. Note that since UEijk < UEij(k+1), a dollar of flow in step k delivers more to the firm than a dollar of flow in step k+1. So long as the objective function coefficients are the same for the flow in each step (they equal zero for this variable in the model), the arc corresponding to the step that permits the greatest flow to arrive at node j will be used to capacity before flow is nonzero on any higher cost steps. Thus, with this alteration, the model determines the level of sales for line i as a function of the increasing underwriting cost.
The insurance in force will give rise to claims for payment and thus to an outflow of cash from the firm. The maximum level of claims that could be incurred in line i for insurance written in period j, CLij, is a function of the maximum insurance that could be written, MVij, and the loss ratio, LRij, where the loss ratio is defined as losses incurred divided by premiums earned. Specifically, CLij = MVij[LRij]. From this it should be evident that CRij[LRij] represents the reduction of claims from CLij if INSij is less than MVij, so that only insurance sold can give rise to claims.
The underwriting profit realized each period by the firm from sales of line i is a function of the sum of the loss ratio, LRij, and the underwriting expenses, UEij. If LRij + UEij < 100 percent, there is an underwriting profit associated with writing the line. Otherwise there is an underwriting loss.
Assuming that the model elects to sell insurance in line i during period j, then INSij is greater than zero and claims payments must be made. The amount of the claim equals CLij - CRij[LRij]. The model has flexibility to determine the time distribution of payments, but the longer that claims are delayed, the higher will be the implicit cost to the firm. This penalty cost may be related to such things as inflation, or it may represent management's estimated cost of ill-will caused by late payments. CPijk represents the dollar amount paid during period j for claims arising out of insurance written in line i during period k, where k ≤ j. This payment may be reduced by a penalty cost, LPijk, where i, j, and k are defined as above, so that only CPijk[1 - LPijk] is available to satisfy the claim. It is likely that LPijk increases as payments are delayed for longer periods, so more and more dollars will be required to satisfy the claim.
An important relationship exists between underwriting profit (or loss) and the time distribution of claims payments. Even though at current premium levels an insurance line may not have an underwriting profit, an underwriting profit may not be necessary for profitably writing insurance. If there is a long payment lag for claims, perhaps the firm can earn a sufficient annual return on the gross amount received so that the investment profit will offset the underwriting loss (and any late payment penalties). This does not necessarily imply, however, that the heuristic decision rule of postponing payments as long as possible is always called for. Many insurance companies prefer to settle some claims immediately rather than contest them and risk a much larger payment in the future. Depending on the penalty structure, the temporal distribution of claims payments, and investment conditions in the capital market, writing insurance at an underwriting loss to acquire timely cash flow may be the most profitable course of action. This can even occur when the payment lag will not permit investment of the proceeds for sufficient time to overcome the underwriting loss--the underwriting loss may represent the cheapest "cost of capital". An integrated analytical model is useful for "sorting through" these complexities and identifying the most profitable course of action.
It is also necessary to pay claims for insurance written before the start of the model but still in force. The amount of these claims for line i is denoted PCi. Then PPCij represents payments in period j to satisfy the claims PCi. Again, a penalty function for delayed payment may be introduced so that PPCij[1 - LPij] is the effective payment.
To this point cash inflows have been generated by the firm from insurance sales and cash outflows have been incurred to pay claims. ICBj represents the initial cash balance in the firm in period j before any operations are undertaken. As operations commence and the additional cash flows are generated, for all periods except the first, CBj-1,j denotes the amount of cash carried over by the firm from the previous period. A lower bound on this flow represents a desired minimum cash balance. In a similar vein, CBj,j+1 is the cash balance transferred to the next period. A master sink is used as a cash flow collection device by the model and represents the planning horizon.
The general relationships described above can be portrayed by four sets of equations. For the cash flow nodes, the insurance nodes, the existing claims nodes and the new claims nodes respectively, the equation sets are:

Σ_{i=1}^{m} INSij[1 - UEij] - Σ_{i=1}^{m} Σ_{k=1}^{j} CPijk - Σ_{i=1}^{m} PPCij + CBj-1,j - CBj,j+1 = ICBj
    for j = 1, ..., n+1.   (1)

-INSij - CRij = -MVij   for i = 1, ..., m; j = 1, ..., n.   (2)
Σ_{j=1}^{n} PPCij[1 - LPij] = PCi   for i = 1, ..., m.   (3)

Σ_{k=j}^{n} CPikj[1 - LPikj] + CRij[LRij] = CLij   for i = 1, ..., m; j = 1, ..., n.   (4)
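The payment-lag tradeoff described above is simple compounding arithmetic. In the sketch below the investment return and the lag are invented assumptions; the loss and expense ratios are borrowed from line 6 of the example in Section 5.

    # Can investment income earned during the claims payment lag offset an
    # underwriting loss?  All figures are hypothetical.
    premium = 100.0        # premiums written for the line
    loss_ratio = 0.77      # expected losses / premiums earned
    expense_ratio = 0.25   # underwriting expense fraction
    annual_return = 0.08   # assumed investment return
    lag_years = 3.0        # assumed average claims payment lag

    underwriting_profit = premium * (1.0 - loss_ratio - expense_ratio)  # -2.0
    invested = premium * (1.0 - expense_ratio)                          # cash received
    investment_income = invested * ((1.0 + annual_return) ** lag_years - 1.0)
    print(underwriting_profit + investment_income)  # about 17.5: worth writing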
A graphical representation is also feasible. Fig. 1 portrays these relationships for two lines of insurance and three periods.
Fig. 1. Insurance portfolio operations.
3. Investment portfolio

The investment side of the company's operations includes investment in P types of long-term securities and Z different forms of short-term financial assets. Starting with an initial investment in long-term asset p, IVLp, the firm can increase the amount by contributing additional capital during period j, designated by LTIjp. There will usually be a transactions charge or brokerage commission associated with this investment in security p during period j, denoted TCLjp, so that only LTIjp[1 - TCLjp] dollars actually enter the long-term investment account.
It is also possible to utilize the long-term investment as a source of capital by reducing the amount invested. This reduction can take two forms. First, if it involves only the current period return on investment (in the form of interest or dividends), ROIjp, there is no transaction cost involved. However, an upper bound equal to the expected dollar value of the return on investment is required to differentiate between this flow and a liquidation of principal (return of the investment), or RLTjp. Liquidating the principal, or selling the stock or bonds, involves a brokerage charge, LCLjp, so that only RLTjp[1 - LCLjp] dollars are available for other uses by the firm. Clearly, without the upper bound restriction on the return on investment flow, this liquidating charge could be circumvented.
The final possibility with long-term financial assets is to retain them in the portfolio to earn an additional return. LTPj,j+1,p represents the balance invested in long-term asset p that is carried forward to the next period. Similarly, LTPj-1,j,p is the balance carried forward from the previous period, and it earns interest at the rate LINj-1,j,p. Thus, LTPj-1,j,p[1 + LINj-1,j,p] is the interest plus principal available in the next period. Depending on such things as legal constraints, charter provisions, or management's investment diversification policy, upper and/or lower bounds can be placed on the flow LTPj,j+1,p. With an integrated model such as this one, the implicit cost to the firm of such externally imposed constraints can be explored.
In a similar manner, the Z different forms of short-term financial assets are modeled. An initial value of investment in short-term asset z, IVSz, can be input if deemed appropriate. STIjz then represents additional investment in short-term asset z during period j, with TCSjz denoting any transactions cost associated with this incremental investment. The model determines the maturity structure of the short-term portfolio, with RSTj,k,z representing the return in period j of principal invested in short-term asset z during period k, where k < j. The interest earned on this investment is SINj,k,z, so the cash flow received during period j from investment made in short-term asset z in period k is RSTj,k,z[1 + SINj,k,z].
Although associated with all of the firm's operations rather than the investment operation alone, another use of cash that will reduce the amount of capital available for investment is included in the model at this point. There are fixed
overhead expenses such as building depreciation and maintenance costs, salaries of supervisory personnel, and utilities, among others, that must be met each period in order to remain in business. The sum of all of these fixed operating costs during period j, FOCj, must be paid during the period; thus, the model selects the best source of funds to make the payment, denoted CCj.
These relationships can be shown algebraically by reference to the equation systems for long-term investments, short-term investments, and fixed operating costs respectively:

LTIj,p[1 - TCLj,p] + LTPj-1,j,p[1 + LINj-1,j,p] - ROIj,p - RLTj,p - LTPj,j+1,p = IVLp
    for p = 1, ..., P; j = 1, ..., n.   (5)
STIj,z[1 - TCSj,z] - Σ_{k=j+1}^{n+1} RSTk,j,z = IVSz   for z = 1, ..., Z; j = 1, ..., n.   (6)
CCj = FOCj   for j = 1, ..., n.   (7)
Also, the cash node equation is expanded from the form shown in (1) to include the investment portfolio:

Σ_{p=1}^{P} ROIj,p + Σ_{p=1}^{P} RLTj,p[1 - LCLj,p] + Σ_{z=1}^{Z} Σ_{k=1}^{j-1} RSTj,k,z[1 + SINj,k,z]
    - Σ_{p=1}^{P} LTIj,p - Σ_{z=1}^{Z} STIj,z - CCj
    + [insurance portfolio relationships] = ICBj   for j = 1, ..., n+1.   (1a)
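The need for the upper bound on the ROIjp flow noted above can be seen with a few invented numbers; this arithmetic is an illustration, not part of the model.

    # Why ROIjp needs an upper bound: without it, principal could leave the
    # long-term account by the uncharged "return on investment" arc and so
    # dodge the liquidation charge.  Figures are hypothetical.
    expected_roi = 60.0    # expected dividends/interest: the bound on ROIjp
    withdrawal = 100.0     # principal the firm wants back this period
    LCL = 0.02             # brokerage charge on liquidated principal

    via_liquidation = withdrawal * (1.0 - LCL)   # 98.0 reaches cash, charge paid
    via_roi = min(withdrawal, expected_roi)      # the bound caps the free arc at 60
    print(via_liquidation, via_roi)              # excess principal must pay the charge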
A graphical representation of these relationships is given in Fig. 2 for one long-term investment and one short-term asset.
4. External capital sources
Both the insurance portfolio and the investment portfolio generate cash inflows and, at times, both require a net expenditure of funds. Not infrequently the most profitable means of satisfying this cash need is to acquire capital from external sources, either in the debt or the equity markets. Considering first the debt markets, there are T sources from which loans can be obtained. BLtj represents the maximum amount of capital available from source t in period j, and the per period interest rate is IBLtj. If the entire line of credit is subscribed, the dollar amount to be repaid (interest plus principal) in period j + 1 is BRt,j+1, where BRt,j+1 = BLtj[1 + IBLtj].
Fig. 2. The investment portfolio.
The firm can also choose to borrow less than BLtj (or zero), so ABBtj denotes the dollar amount actually borrowed from source t during period j. In this case, ANBtj is the amount not borrowed, and ANBtj = BLtj - ABBtj. It is convenient to assume that all loans mature in one period. This causes no loss of generality, since the model is free to borrow in period j + 1 to repay borrowing in period j; only the periodic interest must be paid each period. ARBt,j+1 denotes the amount repaid in period j + 1 because of borrowing from source t during period j. Thus,
ARBt,j+1 = ABBtj[1 + IBLtj]. Since ABBtj may be less than BLtj, ARBt,j+1 may be less than BRt,j+1. However, BRt,j+1 = ABBtj[1 + IBLtj] + ANBtj[1 + IBLtj], so the second term represents the reduction of the maximum repayment for capital not borrowed during period j from source t. Thus, the model requires repayment by the firm only for the amount actually borrowed.
The other source of external capital is the equity markets. Stock is normally issued in large blocks, but it is likely that several sizes of issue are feasible. Another complicating feature of stock issues is that there is usually a fixed underwriting charge that cannot be approximated adequately as being completely variable. Thus, a zero-one integer constraint is appropriate to handle the fixed charge.¹ Assume that S mutually exclusive issue sizes are feasible in period j.² SKsj indicates the availability for sale of stock issue s in period j by a value of -1. The sale of this issue s during period j is represented by a zero-one integer variable STKsj. The multiplier on this arc is the total net price realized from the sale, NPRsj. Thus, STKsj[NPRsj] is the dollar cash flow received by the firm from the sale of stock issue s in period j.
Although the firm will only receive STKsj[NPRsj] dollars from the sale of the stock, the total yield to investors is computed on the basis of their purchase price (NPRsj plus the flotation cost). Since common stock is a perpetuity, this capital contributed by investors will never have to be repaid (unless the firm is liquidated); but a distortion will be introduced into the model if allowance is not made to preserve this principal plus any expected appreciation in the market price. Thus, at the horizon (represented by the master sink), a flow must be induced equal to the return of principal plus the expected capital gains yield.³ This is accomplished by associating with each stock issue a node with a per share demand equal to the initial market (purchase) price, P0, times (1 + Ks)^{n+1-j}, or one plus the total yield accrued between period j and the horizon. This demand per share times the number of shares issued is denoted by DRsj. If STKsj equals zero, meaning that stock issue s was not placed in the market in period j, then a dummy variable, DUMsj, is required to satisfy the demand. The value of DUMsj equals 1 - STKsj, so it will also be a zero-one integer variable.
¹ The use of this zero-one integer constraint means that a continuous generalized network solution code cannot be used to solve the model [1, 7 and 8]. If a mixed integer generalized network code is not available, the constraint can be removed and the continuous approximation employed. Even in this case, the zero-one condition appears to hold for most problems.
² For more information about how to model mutually exclusive zero-one integer constraints in cash flow networks see [2].
³ The capital gains yield is equal to the earnings retention ratio times the expected return on book equity. For a further explanation of this point, see [13].
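The repayment identity above can be checked with invented figures; the line size, rate, and drawdown below are arbitrary.

    # Repayment bookkeeping for one credit line and one period.
    BL, IBL = 100.0, 0.03      # line of credit and per-period interest rate
    ABB = 40.0                 # amount actually borrowed
    ANB = BL - ABB             # amount not borrowed
    ARB = ABB * (1.0 + IBL)    # actual repayment due next period
    BR = BL * (1.0 + IBL)      # maximum repayment if the line were fully drawn
    assert abs(ARB + ANB * (1.0 + IBL) - BR) < 1e-9   # BR = ABB[1+IBL] + ANB[1+IBL]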
The multiplier applied to DUMsj is DSKsj and is numerically equal to DRsj, or P0(1 + Ks)^{n+1-j} times the number of shares issued. If STKsj equals one, meaning that stock issue s was sold in the market in period j, then DUMsj[DSKsj] = 0 and other channels must be utilized to satisfy the stock yield requirements in periods j + 1 through n + 1. Let DIVsjr denote dividends paid in period r on stock issue s issued in period j, where r > j. UDIVsjr is an upper bound on this variable, and for periods j + 1 through n its value equals the expected dollar dividend for the period. For period n + 1, or the horizon, the value of UDIVsj,n+1 is the expected dollar dividend plus the return of principal adjusted for the capital gains yield. Thus,

UDIVsj,n+1 = DRsj - Σ_{r=j+1}^{n} DIVsjr.
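For one issue this bookkeeping reduces to the following arithmetic; the price, yield, horizon, and dividend figures are invented.

    # Required-yield demand for stock issue s sold in period j.
    P0, Ks = 20.0, 0.10        # purchase price per share; total yield per period
    n, j, shares = 4, 2, 1000  # horizon, issue period, shares in the issue
    DR = P0 * (1.0 + Ks) ** (n + 1 - j) * shares   # demand at the issue's node
    expected_divs = [1500.0, 1500.0]               # dividends for periods j+1 .. n
    UDIV_horizon = DR - sum(expected_divs)         # bound on the horizon dividend arc
    print(DR, UDIV_horizon)                        # 26620.0 23620.0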
By this device, the correct intertemporal cash flow yield characteristics will be embedded in the model when STKsj equals zero or one. This is also true for the continuous approximation when STKsj happens to take the value zero or one; however, if 0 < STKsj < 1 in that case, some intertemporal distortion will take place--but the distortion will be one of timing rather than amount. The stock issue and required yield equations are respectively

-STKsj - DUMsj = -SKsj   for s = 1, ..., S; j = 1, ..., n.   (8)

DSKsj(DUMsj) + Σ_{r=j+1}^{n+1} DIVsjr = DRsj   for s = 1, ..., S; j = 1, ..., n.   (9)
DIVsjr ≤ UDIVsjr   for all s, j, r;   (10)

STKsj, DUMsj = (0, 1) integer.

For the credit lines and repayment,

-ABBtj - ANBtj = -BLtj   for t = 1, ..., T; j = 1, ..., n.   (11)

ANBtk[1 + IBLtk] + ARBtjk = BRtj   for t = 1, ..., T; j = 2, ..., n+1; k = j - 1.   (12)
Finally, the expanded cash node equation is

Σ_{s=1}^{S} STKsj(NPRsj) + Σ_{t=1}^{T} ABBtj - Σ_{s=1}^{S} Σ_{r=1}^{j-1} DIVsrj - Σ_{t=1}^{T} ARBtj,j-1
    + (insurance portfolio terms) + (investment portfolio terms) = ICBj
    for j = 1, ..., n + 1.   (1b)
The graph structure for one line of credit and one issue of stock is shown in Fig. 3.
Figure 3. External sources of capital.

Notice that in Figs. 1, 2 and 3 there are no objective function coefficients, cij, shown in the model. Similar to the Weingartner horizon model [12], the generalized network equivalent of which is presented in [2], all relevant cash flow impacts are portrayed by the variables, with the objective being to maximize the net cash flow into the master sink that is available at the horizon. This is
equivalent to maximizing the value of the firm after repaying all incremental capital acquired by the model. As proved by Weingartner [12], this is equivalent to maximizing the value of the existing equity--the appropriate objective for a public corporation.
5. Example

To illustrate the use of this insurance company cash management model, assume that the company has assets of $100 000 and an annual premium volume of $80 000. Sales are evenly divided among eight different insurance lines, and management has a choice of four asset categories: cash, short-term securities, bonds, and common stock.

5.1. Insurance returns
Each insurance line is differentiated from other lines on the basis of risk, length of time between claim occurrence and claim payment, and underwriting profitability. Risk is measured by the standard deviation of the loss ratio. The standard deviation is in the 9-11 percent range for the risky lines and 4-7 percent for the "safe" lines, and the coefficient of variation for the risky lines is in the 0.13 to 0.18 range versus 0.07 to 0.10 for the safer lines. These estimates are derived from industry data for the period 1957-1975. Of course, the loss ratios are known with certainty only on an ex post basis, so target expected loss ratios are needed for the model. Ex ante, management can vary the target loss ratios to investigate the sensitivity of the model to changes in these parameters. In this way, control limits could be established within which variation in the loss ratio for a given line would not be of particular concern.
As shown in Table 1, several time lags between claims occurrence and claims payment are included. Minimum, maximum and intermediate time periods are specified for each line, and in some cases penalties are assessed when claims payments are delayed. These penalties are levied by an increase in the expected loss ratio. Although conscious delay of this sort may not actually be a decision variable, it is commonly recognized that there are tradeoffs between profitability and claims payment lag. The model captures this relationship.
Underwriting profitability by line is the next important feature captured in the model and is summarized in Table 2. All combinations of the three factors of underwriting profitability, claims payment lag and risk level are tested using the eight insurance lines described in Table 3.

5.2. Results
Since the primary purpose of this paper is to illustrate how network models can be applied to an insurance situation, a detailed analysis of tests of the model
Table 1
Time lag between claim occurrence and claim payment

Line   Allowable time lag
1      12, 18 and 24 months
2      3, 4 and 5 years
3      1, 2 and 3 months
4      6, 9 and 12 months
5      9, 12 and 18 months
6      2, 3 and 4 years
7      2, 3 and 4 weeks
8      1, 2 and 3 months
Table 2
Underwriting profit by line

                         Line
                         1      2       3      4       5      6       7      8
Expected loss ratio      0.66   0.61    0.70   0.75    0.82   0.77    0.65   0.68
Underwriting expenses    0.30   0.43    0.24   0.31    0.16   0.25    0.31   0.38
Underwriting profit
  (loss)                 0.04   (0.04)  0.06   (0.06)  0.02   (0.02)  0.04   (0.06)
Table 3
Insurance line characteristics

Line   Underwriting profits   Claims payment lag   Risk level
1      Positive               Long                 High
2      Negative               Long                 High
3      Positive               Short                High
4      Negative               Short                High
5      Positive               Long                 Low
6      Negative               Long                 Low
7      Positive               Short                Low
8      Negative               Short                Low
is not presented. An important general result is that the network approach was able to capture some of the significant relationships in management's decision making. One such relationship exists between ex ante underwriting profits and the supply of insurance. The results were in conformity with the notion that an underwriting profit is not a necessary condition for insurance to be profitably supplied to the market. The relationship and dynamic interactions between the firm's insurance and investment activities were also highlighted. From a purely finance viewpoint, an
expected underwriting loss may be considered a cost of capital. As such, the firm's underwriting decisions should be made in light of its investment financing opportunities. Clearly, underwriting activities involve many institutional factors; however, the financial approach advocated here emphasizes the relationship between underwriting profits, the length of the claims payment lag, and investment opportunities.
Related to this point is the fact that results for the entire model period are optimized. Thus, for a given term structure, cash flows are managed with consideration given to both immediate and future investment financing opportunities.
A final point is that unexpected allocations occurred at the horizon due to distortions in the opportunity set at that time. This tends to affect the dynamics of the planning process; but if the model horizon is carefully set several periods longer than the planning horizon, distortion in the relevant allocations can be eliminated.
6. Observations and conclusions
This paper has developed a network model of a multi-line property-casualty insurer's operations. It proved possible to integrate the variables which play an important part in the planning process, including expected profitability, the time distribution of claims payments, the composition of the insurance portfolio, and the selection and financing of the firm's investments.
The use of integer variables increases the potential of the network model. Many important discontinuous variables associated with the issuance of new common stock, as well as with other fixed or mixed cost situations, can be modeled directly without having to rely on continuous approximations. In the network format, the problem size limitations and slow solution speeds encountered with integer or mixed integer solution codes can be avoided.
If the model were coupled to an efficient problem generator and report writer, it could be used to investigate public policy implications of the firm's pricing and underwriting decisions by following a "what if" approach. Using expected values for the insurance portfolio, planners could ask what happens if a requested price increase is disallowed, or what happens if claims payments are accelerated. Fast, quantitative answers to these and other questions would be available for use in policy planning.
The main limitation that is immediately obvious with the generalized network formulation is that it is impossible to maintain portfolio composition ratios in either the insurance or the investment portfolio without resorting to additional (non-network) constraints. Approximations using upper and lower bounds are feasible, however, and generally prove to be satisfactory. This is particularly true if the model is used in an on-line interactive mode whereby "fine tuning" of
the bounds is easy to perform. This difficulty notwithstanding, we believe that the model developed in this paper adequately demonstrates how an interactive network cash flow management system could be used to benefit the planning processes of property-casualty insurance companies.
References

[1] A. Charnes, F. Glover, D. Karney, D. Klingman and J. Stutz, "Past, present and future development, computational efficiency, and practical use of large scale transportation and transshipment codes", Computers and Operations Research 2 (1975) 53-65.
[2] R.L. Crum, D.D. Klingman and L.A. Tavis, "Implementation of large-scale financial planning models: Solution efficient transformations", The Journal of Financial and Quantitative Analysis 1 (1979) 137-152.
[3] F. Glover, J. Hultz, D. Klingman and J. Stutz, "Generalized networks: A fundamental computer-based planning tool", Management Science 12 (1979) 1209-1220.
[4] F. Glover, J. Hultz and D. Klingman, "Improved computer-based planning techniques", Interfaces 4 (1968) 16-25.
[5] F. Glover, D. Karney and D. Klingman, "Implementation and computational comparisons of primal, dual and primal-dual computer codes for minimum cost network flow problems", Networks 3 (1974) 191-212.
[6] F. Glover and D. Klingman, "New advances in the solution of large-scale network and network-related problems", in: A. Prekopa, ed., Progress in operations research, Vol. 1, Colloquia Mathematica Societatis János Bolyai 12 (North-Holland, Amsterdam, 1976) pp. 441-460.
[7] F. Glover and D. Klingman, "Network application in industry and government", Research Report CCS-247, Center for Cybernetic Studies, The University of Texas at Austin (September 1975).
[8] F. Glover and J. Mulvey, "Equivalence of the 0-1 integer programming problem to discrete generalized and pure networks", Operations Research, to appear.
[9] A.E. Hofflander, Jr. and M. Drandell, "A linear programming model of profitability, capacity and regulation in insurance management", Journal of Risk and Insurance 1 (1969) 41-50.
[10] D.R. Klock and S.M. Lee, "A note on decision models for insurers", Journal of Risk and Insurance 3 (1974) 537-543.
[11] H.E. Thompson, John P. Matthews and R.C.L. Li, "Insurance exposure and investment risks: An analysis using chance-constrained programming", Operations Research 5 (1974) 991-1007.
[12] H.M. Weingartner, Mathematical programming and the analysis of capital budgeting problems (Prentice-Hall, Englewood Cliffs, NJ, 1963).
[13] J.F. Weston and E.F. Brigham, Managerial finance, Sixth Edition (The Dryden Press, Hinsdale, IL, 1978).
Mathematical Programming Study 15 (1981) 102-124. North-Holland Publishing Company
TRAFFIC SCHEDULING VIA BENDERS DECOMPOSITION

Robert R. LOVE, Jr.
The Kelly-Springfield Tire Company, Cumberland, MD, U.S.A.

Received 1 May 1978
Revised manuscript received 21 April 1980
This paper presents a mathematical formulation and a solution technique for a class of traffic scheduling problems. Each problem in this class is characterized by a company-owned fleet of tractors and trailers, a calendar of shipments to be made during a specified scheduling period, and an option to contract any shipment to an independent hauler. The solution technique uses the Benders decomposition algorithm to determine the routing of each tractor and trailer in the fleet and the shipments to be made by independent haulers. Computational results are summarized, and the application of the model to Kelly-Springfield's traffic scheduling problem is discussed.
Key words: Scheduling, Network Applications, Routing, Integer Programming.
0. Introduction

Suppose a company owns a fleet consisting of a specified number of tractors and a specified number, usually larger, of trailers. The reason for having more trailers than tractors is that tractors represent a much larger capital investment and can be used while trailers are being loaded and unloaded. During the period to be scheduled, or scheduling horizon, the company has a calendar of shipments which must be made. The basic unit of shipment is one trailer load. Each of these shipments can be made by using a company-owned tractor and trailer or by engaging an independent hauler. The calendar of shipments specifies for each shipment the source, the destination, and the time the trailer is to begin loading the shipment. Independent haulers' fees, trailer loading and unloading times, and company fleet interlocation travel times and travel costs for full trailers, for empty trailers, and for unattached tractors are assumed known. The company knows the initial locations of all its tractors and trailers and has determined the final distribution of tractors and trailers about various locations which would be desirable at the end of the scheduling horizon.
A feasible traffic schedule which minimizes total traffic cost for the scheduling horizon is sought. To be feasible, a traffic schedule must satisfy the following conditions: (1) Each shipment must be made, either by the fleet or by an independent hauler. (2) The initial locations of the tractors and trailers must be considered. (3) The final distribution requirements for tractors and trailers must be met.
(4) The routing of each tractor and of each trailer must adhere to the calendar of shipments, trailer loading and unloading times, and interlocation travel times.
The total cost of a traffic schedule is composed of the costs of: (1) shipments made by the fleet, (2) shipments made by independent haulers, (3) movements of empty trailers, and (4) movements of unattached tractors.
This traffic scheduling problem is an extension of the tanker scheduling problem, which determines the minimum number of tankers required to meet a fixed schedule. The tanker scheduling problem was first formulated by Dantzig and Fulkerson [6] in 1954 and can be solved efficiently as a network flow problem. If the number of tractors equals the number of trailers, and if a tractor and a trailer are considered a unit for routing, then the traffic scheduling problem can be viewed as a tanker scheduling problem. Familiarity with the tanker scheduling problem would be helpful in following the development of the mathematical formulation of the traffic scheduling problem. The algorithm to be presented for the traffic scheduling problem is an extension of the algorithm developed for a production planning problem having the same mathematical structure [18].
In Section 1, we formulate the traffic scheduling problem as a linear integer program. Section 2 presents an example of the traffic scheduling problem. Section 3 demonstrates the applicability of the Benders decomposition technique for mixed integer linear programs to the linear integer program. In Section 4, we present the network representations for the subproblems obtained from the decomposition. In Section 5, we describe the Benders decomposition algorithm for the traffic scheduling problem and present two heuristic procedures developed to obtain approximate solutions to the resulting Benders integer programs. Section 6 summarizes computational results for each of these procedures. In Section 7, we consider extensions of the traffic scheduling model. Section 8 discusses the application of the traffic scheduling model to the traffic scheduling problem at The Kelly-Springfield Tire Company. Section 9 discusses the applicability of the general traffic scheduling model.

1. Model formulation
1.1. Model formulation overview

The problem of generating a traffic schedule can be viewed as having two subproblems, a trailer scheduling subproblem and a tractor scheduling subproblem. The trailer scheduling subproblem can be considered a Master Problem which develops a routing for each trailer in the fleet and determines the shipments to be made by the fleet and the shipments to be made by independent haulers. The tractor scheduling subproblem develops a routing for each tractor in the fleet which accomplishes all trailer movements implied by the trailer routings. The interplay between the trailer and tractor subproblems is considered in the tractor sub-
problem. If a shipment is to be made by the fleet as determined in the trailer subproblem, then there is a requirement for a tractor to make the shipment in the tractor subproblem. If a shipment is to be made by an independent hauler as determined in the trailer subproblem, then there is no requirement for a tractor to make the shipment in the tractor subproblem. If an empty trailer is moved in the trailer subproblem to begin loading a shipment or to satisfy a final distribution requirement for trailers, then there is a requirement for a tractor in the tractor subproblem to make the empty trailer movement.
1.2. Trailer subproblem
Consider the problem of developing a routing for each trailer in the fleet. If a specified shipment is to be made by the fleet, then a trailer must be available at the origin of the shipment when the shipment is to begin loading. This trailer makes the shipment and becomes available for redeployment at the destination when the shipment has been unloaded. If a specified shipment is to be made by an independent hauler, there is no trailer required at the origin of the shipment and no trailer subsequently available at the destination.
If the fleet has N trailers, and there are D shipments available for consideration by the fleet, then there are N + D potential requirements for trailers; namely: (1) N requirements for trailers to satisfy the final distribution of trailers, and (2) D requirements for trailers to make the shipments by the fleet. We will denote the final distribution requirements as demands 1 through N, and the shipment requirements as demands N + 1 through N + D. Similarly, there are N + D potential supplies of trailers to satisfy these demands; namely: (1) N trailers available from the initial distribution of trailers, and (2) D trailers available after making shipments by the fleet. We will denote trailers available initially as supplies 1 through N, and trailers available after making shipments by the fleet as supplies N + 1 through N + D. If a shipment is made by an independent hauler, there is no demand for a trailer in the fleet to make the shipment, and there is no trailer available when the shipment has been made.
We define Yij for i, j = 1, ..., N + D to be a 0-1 variable which is 1 if and only if supply i for trailers as described in the preceding paragraph is used to satisfy demand j. Let Aij be the cost associated with Yij = 1. Then Aij is the cost of the empty trailer movement (if any) from the location of supply i to the location of demand j if Yij = 1 is feasible, and ∞ otherwise. The assignment Yij = 1 is feasible if supply i can actually be used to satisfy demand j, i.e., if (1) the time supply i is available plus (2) the travel time (if any) from the location of supply i to the location of demand j is not greater than (3) the time for demand j.
We define Zi for i = 1, ..., D to be a 0-1 variable which is 1 if and only if shipment i is made by an independent hauler. Let Bi be the penalty cost (positive or negative) incurred by using an independent hauler to make shipment i, i.e., Bi is equal to (1) the cost of making shipment i by an independent hauler minus (2) the cost of making shipment i by the fleet.
The trailer subproblem of the traffic scheduling problem can be modeled as follows:

Minimize  Σ_{i=1}^{N+D} Σ_{j=1}^{N+D} Aij Yij + Σ_{i=1}^{D} Bi Zi,   (1)

subject to

Σ_{j=1}^{N+D} Yij = 1,   i = 1, ..., N,   (2)

Σ_{j=1}^{N+D} Yij + Z(i-N) = 1,   i = N+1, ..., N+D,   (3)

Σ_{i=1}^{N+D} Yij = 1,   j = 1, ..., N,   (4)

Σ_{i=1}^{N+D} Yij + Z(j-N) = 1,   j = N+1, ..., N+D,   (5)

{Yij, Zi} = 0, 1   for all i, j.   (6)
Eqs. (2) represent trailers in the fleet available initially; eqs. (3) denote trailers available after making shipments by the fleet. Eqs. (4) represent the final distribution requirements for the trailers in the fleet; eqs. (5) denote trailers required to make shipments by the fleet. It should be noted that if shipment i is made by an independent hauler (Zi = 1), there is no requirement for a trailer to make the shipment (eqs. (5)), and there is no trailer available after the shipment has been made (eqs. (3)).
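The feasibility test that decides between a finite Aij and ∞ is the heart of the cost matrix. The sketch below is illustrative only; the function and the linear time/cost rules are invented, not taken from the paper.

    INF = float("inf")

    def trailer_arc_cost(avail_time, avail_loc, demand_time, demand_loc,
                         travel_time, travel_cost):
        # Supply can serve demand only if the empty trailer arrives in time.
        if avail_time + travel_time(avail_loc, demand_loc) <= demand_time:
            return travel_cost(avail_loc, demand_loc)
        return INF

    # With illustrative linear time/cost rules:
    t = lambda a, b: 4 * abs(a - b)
    c = lambda a, b: 100 * abs(a - b) + 50
    print(trailer_arc_cost(16, 5, 20, 4, t, c))   # 150: feasible, arrives at 20
    print(trailer_arc_cost(18, 5, 20, 4, t, c))   # inf: would arrive at 22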
1.3. Tractor subproblem

Consider the problem of developing a routing for each tractor in the fleet, given the routings for the trailers in the fleet. If a specified shipment is to be made by the fleet as determined in the trailer subproblem, then a tractor must be available at the origin of the shipment when the shipment has been loaded. This tractor makes the shipment and becomes available for redeployment when the shipment arrives at the destination. If a shipment is to be made by an independent hauler as determined in the trailer subproblem, there is no tractor required at the origin of the shipment, and no tractor subsequently available at the destination.
If the fleet has M tractors, then the M + D potential requirements for tractors are: (1) M requirements for tractors to satisfy the final distribution of tractors, and (2) D requirements for tractors to make the shipments by the fleet. We will denote the final distribution requirements as demands 1 through M, and the shipment requirements as demands M + 1 through M + D. Similarly, the M + D potential supplies of tractors are: (1) M tractors available from the initial distribution of tractors, and (2) D tractors available after making shipments by the fleet. We will denote tractors available initially as supplies 1 through M, and tractors available after making shipments by the fleet as supplies M + 1 through M + D.
There may be additional requirements for tractors and supplies of tractors implied by the routings for the trailers. Suppose in the trailer subproblem that Yij = 1 for a specified i and j where the location of supply i is not the same as the location of demand j. This implies that an empty trailer is to move from the location of supply i to the location of demand j. Hence, in the tractor subproblem, there is a requirement for a tractor at the location of supply i to make the empty trailer movement, and there is a tractor available at the location of demand j when the empty trailer movement has been made. If, in the trailer schedule, Yij = 1 for a specified i and j where the location of supply i is the location of demand j, there is no empty trailer movement, and hence no associated requirement for or supply of a tractor in the tractor subproblem.
In the trailer subproblem, it is assumed that empty trailer movements are made so that the empty trailer arrives precisely at the time of the trailer requirement; i.e., any slack time is spent at the source of the empty trailer movement. Hence, the tractor becomes available for redeployment at the time of the trailer requirement. Since there are N + D requirements for trailers, there are N + D potential supplies of tractors associated with empty trailer movements. We will denote tractors available after making empty trailer movements as supplies M + D + 1 through M + N + 2D.
The time of the requirement for a tractor to make a specified empty trailer movement is dependent on: (1) the time of the trailer requirement, and (2) the interlocation travel time from the source to the destination of the empty trailer movement. Hence, a unique tractor requirement could be associated with each Yij in the trailer subproblem. It follows that there are (N + D)² potential requirements for tractors associated with empty trailer movements; we will denote these requirements as demands M + D + 1 through M + D + (N + D)². It should be noted that, given a schedule for the trailers in the fleet, eqs. (4) and (5) imply that at most N + D of the (N + D)² potential requirements for tractors to move empty trailers will be actual requirements.
We define Xij for i = 1, ..., M + N + 2D; j = 1, ..., M + D to be a 0-1 variable which is 1 if and only if supply i for tractors as described in the preceding paragraphs is used to satisfy demand j. Let Cij be the cost associated with Xij = 1. Then Cij is the cost of the tractor movement (if any) from the location of supply i to the location of demand j if Xij = 1 is feasible, and ∞ otherwise. The assignment Xij = 1 is feasible if: (1) the time supply i is available, plus (2) the travel time (if any) from the location of supply i to the location of demand j is not greater than (3) the time for demand j.
We define Wijk for i = 1, ..., N + M + 2D; j, k = 1, ..., N + D to be a 0-1 variable which is 1 if and only if supply i for tractors is used to satisfy the demand for a tractor to make the empty trailer movement implied by Yjk = 1. Let Fijk be the cost of the tractor movement (if any) associated with Wijk = 1 if Wijk = 1 is feasible, and ∞ otherwise. Finally, we define Ejk to be a 0-1 scalar which is 1 if and only if an empty
trailer movement is implied by Yjk = 1; i.e., if and only if the location of supply j for trailers is not the same as the location of demand k.
Given a routing for each trailer in the fleet, the tractor scheduling subproblem of the traffic scheduling problem can be modeled as follows:

Minimize  Σ_{i=1}^{N+M+2D} Σ_{j=1}^{M+D} Cij Xij + Σ_{i=1}^{N+M+2D} Σ_{j=1}^{N+D} Σ_{k=1}^{N+D} Fijk Wijk,   (7)

subject to

Σ_{j=1}^{M+D} Xij + Σ_{j=1}^{N+D} Σ_{k=1}^{N+D} Wijk = 1,   i = 1, ..., M,   (8)

Σ_{j=1}^{M+D} Xij + Σ_{j=1}^{N+D} Σ_{k=1}^{N+D} Wijk = 1 - Z(i-M),   i = M+1, ..., M+D,   (9)

Σ_{j=1}^{M+D} Xij + Σ_{j=1}^{N+D} Σ_{k=1}^{N+D} Wijk = Σ_{j=1}^{N+D} Ej(i-M-D) Yj(i-M-D),   i = M+D+1, ..., M+N+2D,   (10)

Σ_{i=1}^{N+M+2D} Xij = 1,   j = 1, ..., M,   (11)

Σ_{i=1}^{N+M+2D} Xij = 1 - Z(j-M),   j = M+1, ..., M+D,   (12)

Σ_{i=1}^{N+M+2D} Wijk = Ejk Yjk,   j = 1, ..., N+D; k = 1, ..., N+D,   (13)

{Xij, Wijk} = 0, 1   for all i, j, k.   (14)
Eqs. (8) represent tractors in the fleet available initially; eqs. (9) denote tractors available after making shipments by the fleet. It should be noted that if a shipment is made by an independent hauler, there is no tractor available. Eqs. (10) signify tractors available after moving empty trailers to satisfy trailer requirements. Eqs. (11) represent the final distribution requirements for tractors in the fleet; eqs. (12) denote tractors needed to make shipments by the fleet. Again, it should be noted that if a shipment is made by an independent hauler, there is no requirement for a tractor. Finally, eqs. (13) signify tractors needed to move empty trailers.
Given the model formulations for the trailer and tractor subproblems, the traffic scheduling problem can be modeled as the following integer linear program:

(TSP)
Minimize  (1) + (7),
subject to  (2)-(6), (8)-(14).
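The coupling term Ejk·Yjk on the right of (10) and (13) is simple to state in code. The following sketch is illustrative; the function name and location indices are invented.

    # A tractor is required for an empty trailer movement only when the trailer
    # assignment Yjk = 1 joins two different locations (Ejk = 1).
    def tractor_requirement(Yjk, supply_location, demand_location):
        Ejk = 1 if supply_location != demand_location else 0
        return Ejk * Yjk

    print(tractor_requirement(1, 5, 4))  # 1: empty move 5 -> 4 needs a tractor
    print(tractor_requirement(1, 3, 3))  # 0: trailer is already in place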
A typical traffic scheduling problem could have 20 tractors, 40 trailers, and 200 shipments to be considered. The corresponding integer linear program for this problem would contain 22,423,000 binary variables and 58,760 constraints. It should be noted that each element of the coefficient matrix for the traffic scheduling problem is 0, +1, or -1. Each column of the matrix associated with the variables Xij and Wijk contains two nonzero elements. Each column associated with Yij has two nonzero elements if Eij = 0 and four nonzero elements if Eij = 1. Each column associated with Zi has four nonzero elements. It has not been possible to reduce the traffic scheduling problem to a network flow problem.
2. A traffic scheduling example

Suppose a company owns a fleet consisting of two tractors and four trailers, and this fleet services five locations. The scheduling horizon is one working week (144 hours) beginning at 7:00 a.m. Monday (time = 0) and ending at 7:00 a.m. Sunday (time = 144). This scheduling horizon establishes a 24-hour maintenance period each week for the fleet. The calendar of shipments available to be made by the fleet during a given week is listed in Table 1. The trailer load time and the trailer unload time for each shipment is 4 hours. The travel time from location i to location j for shipments, for empty trailer movements, and for unattached tractor movements is 4 × |i - j|, where | | denotes absolute value; the associated travel cost is (100 × |i - j|) + 50. The cost to make a shipment using an independent hauler is four times the associated fleet cost.
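In code form, these rules are a direct transcription of the formulas above; only the function names are invented.

    # Example cost rules: time and cost are linear in the location gap.
    def travel_time(i, j):
        return 4 * abs(i - j)

    def travel_cost(i, j):
        return 100 * abs(i - j) + 50

    def hauler_fee(i, j):
        return 4 * travel_cost(i, j)   # independent hauler: 4x the fleet cost

    print(travel_time(1, 4), travel_cost(1, 4), hauler_fee(1, 4))   # 12 350 1400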
The initial distribution (at time = 0) of tractors and of trailers in the fleet is given by Table 2.
Table 1
Calendar of shipments

Shipment number   Source   Destination   Time to begin loading
1                 1        4             17
2                 4        5             20
3                 5        3             37
4                 4        1             58
5                 4        5             63
6                 3        2             85
7                 5        3             89
8                 5        4             94
9                 3        5             97
10                1        3             120
Table 2
Initial distribution of the fleet

Tractor number   Initial location      Trailer number   Initial location
1                5                     1                5
2                1                     2                1
                                       3                3
                                       4                3
The final distribution (at time = 144) of tractors and of trailers should be the same as the initial distribution; i.e., locations 1 and 5 should each have one tractor and one trailer, and location 3 should have two trailers.
Consider the trailer scheduling subproblem for this traffic scheduling problem. A routing for each trailer in the fleet is given by Table 3. Table 3 indicates, for example, that trailer 4 begins its route by making shipment 6. The trailer begins loading shipment 6 at location 3 at time 85 and finishes unloading the shipment at location 2 at time 97. Trailer load and unload times are each 4 hours, and the interlocation travel time is 4 hours. Trailer 4 then moves empty from location 2 to location 1, beginning at time 116 and ending at time 120. Trailer 4 completes its route by making shipment 10 from location 1 to location 3. The routings for the trailers given by Table 3 imply that shipment 8 will be made by an independent hauler.

Table 3
Trailer routings

Trailer number   Source   Destination   Description              Trailer requirement time   Trailer redeployment time
1                5        4             Empty trailer movement   16                         20
1                4        5             Shipment 2               20                         32
1                5        3             Shipment 3               37                         53
1                3        4             Empty trailer movement   59                         63
1                4        5             Shipment 5               63                         75
1                5        3             Shipment 7               89                         105
2                1        4             Shipment 1               17                         37
2                4        1             Shipment 4               58                         78
3                3        5             Shipment 9               97                         113
4                3        2             Shipment 6               85                         97
4                2        1             Empty trailer movement   116                        120
4                1        3             Shipment 10              120                        136

Consider the problem of developing a routing for each tractor in the fleet, given the routings for the trailers as listed in Table 3. The tractor routings have
been incorporated in a total fleet schedule given by Table 4. Table 4 indicates that tractor 1 begins its route by moving empty trailer 1 from location 5 to location 4. This empty trailer movement begins at time 16 and ends at time 20. Trailer 1 is then loaded with shipment 2 from time 20 to time 24. Shipment 2 is made from location 4 to location 5 using tractor 1 from time 24 to time 28. Tractor 1 is then available for redeployment and trailer 1 becomes available for redeployment after the shipment has been unloaded at time 32. The remainder of the fleet schedule given by Table 4 can be described in a similar manner. It should be noted that the fleet schedule given by Table 4 satisfies the final distribution requirements for tractors and trailers. A traffic schedule for the example problem is given by Table 4 and by specifying that shipment 8 should be made by an independent hauler. It can be verified that this traffic schedule is optimal. The optimal value of the objective function for the traffic scheduling problem is 3700. If all shipments were made by independent haulers, the value of the objective function would be 9200. Thus the fleet generated savings of 5500 for the week scheduled.
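The 9200 figure can be cross-checked from Table 1 and the cost rules of this section; the sources and destinations are as listed, and the independent hauler fee is four times the fleet cost.

    # Total cost if every shipment in Table 1 went to an independent hauler.
    shipments = [(1, 4), (4, 5), (5, 3), (4, 1), (4, 5),
                 (3, 2), (5, 3), (5, 4), (3, 5), (1, 3)]
    total = sum(4 * (100 * abs(i - j) + 50) for i, j in shipments)
    print(total)   # 9200, matching the value quoted above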
3. Applicability of Benders decomposition technique

Consider the partitioning of the variables for the traffic scheduling problem (TSP) into two sets x and y given by:

x = {Xij, Wijk},   y = {Yij, Zi}.

Then the traffic scheduling problem can be written as

Minimize  c1x + c2y,

subject to  A3y = d,
    A1x + A2y = b,   (15)
    x, y all 0-1 variables,
where A3 is the matrix of coefficients for eqs. (2)-(5). Several observations regarding the structure of the matrices A3 and A1 can be made. Each column of the matrix of coefficients for eqs. (2)-(5) has exactly two nonzero coefficients, both +1. It can be shown that the matrix A3 is unimodular [5], and the problem

Minimize  c2y,

subject to  A3y = d,   (16)
    y all 0-1 variables,
has a network representation and can be solved as an assignment problem using network algorithms [2, 4, 8, 13, 17, 19, 20]. Likewise, it can be shown that, given any values of the variables in y which satisfy the constraints in (16), the remaining problem in x,

Minimize  c1x,

subject to  A1x = b - A2y,   (17)
    x all 0-1 variables,

has a network representation and can also be solved using a network assignment algorithm. Thus, for a given y satisfying the constraints in (16), the integer restriction on the variables in x can be relaxed. This fact suggests the applicability of the Benders decomposition technique for mixed integer linear programs [1, 3, 10, 11, 15] to the traffic scheduling problem.
4. Network representations of the subproblems

Increased understanding of the traffic scheduling problem may be gained by constructing the network representations of the subproblems (16) and (17) derived in the previous section.
Subproblem (16) is the trailer scheduling subproblem given by (1)-(6) and has a network representation given by Fig. 1. Supply i for trailers, for i = 1, ..., N + D, is available at location Ri at time Qi. Demand j for trailers, for j = 1, ..., N + D, is required at location Sj at time Pj. We will denote by T(R, S) and C(R, S) the time and cost to move an empty trailer from location R to location S. The supply at each node on the left side of the graph equals 1 trailer. Nodes in the upper left represent trailers available initially; nodes in the lower left represent trailers available after making shipments by the fleet. The demand at each node on the right side of the graph equals 1 trailer. Nodes in the upper right represent the final distribution requirements for trailers; nodes in the lower right represent trailers needed to make shipments by the fleet. A flow of 1 unit in arc A1 signifies that trailer 1 is used only to satisfy final requirement 3 for trailers. The cost of arc A1 is C(R1, S3) if T(R1, S3) ≤ P3 - Q1, and ∞ otherwise. A flow of 1 unit in arc A2 implies that trailer 2 is used first to make shipment 2. The cost of arc A2 is C(R2, SN+2) if T(R2, SN+2) ≤ PN+2 - Q2, and ∞ otherwise.

Fig. 1. Trailer network formulation.

It should be noted that a flow of 1 unit in arc A5 exhausts trailer supply N + 4 and satisfies trailer demand N + 4.
Subproblem (17) is the tractor scheduling subproblem given by (7)-(14) and has a network representation given by Fig. 2. Supply i for tractors, for i = 1, ..., M + N + 2D, is available at location Gi at time Hi. This supply is a tractor available initially if i ≤ M, a tractor available after making a shipment by the fleet if M < i ≤ M + D, and a tractor available after making an empty trailer movement to satisfy demand k for trailers if i = M + D + k. In the latter case, Gi = Sk and Hi = Pk, where Sk and Pk are the location and time for trailer requirement k. Demand i for tractors, for i = 1, ..., M + N + 2D, is required at location Ui at time Vi. This demand is a final requirement for tractors if i ≤ M, a requirement for a tractor to make a shipment by the fleet if M < i ≤ M + D, and a requirement for a tractor to make an empty trailer movement to satisfy demand k for trailers if i = M + D + k. We will denote by T'(G, U) and C'(G, U) the time and cost to move an unattached tractor from location G to location U.

Fig. 2. Tractor network formulation.

The nodes in the upper left of the graph represent tractors available initially and have a supply of 1 tractor. The nodes in the middle left denote tractors available after making shipments by the fleet and have a supply of 0 or 1
depending upon whether shipment (i - M) is being made by an independent hauler or by the fleet. Nodes in the lower left of the graph depict tractors available after moving empty trailers. Node M + D + k has a supply of Ejk if Yjk = 1, and a supply of 0 if all Yjk = 0. Nodes in the upper right represent final requirements for tractors; nodes in the middle right denote tractors required to make shipments by the fleet. Nodes in the lower right depict tractors required to move empty trailers. The demand for each node on the right is equal to the supply for the corresponding node on the left.
A flow of one unit in arc A1 signifies that tractor 1 is used only to satisfy final requirement 2. A flow of one unit in arc A2 implies that tractor 2 is used first to make shipment 1. Arc A3 denotes that tractor M is used first to move an empty trailer to satisfy requirement 1 for trailers. Arc A4 signifies that the tractor used to make shipment 1 completes its route by satisfying final requirement 2. Arc A5 implies that the tractor used to make shipment 2 is then used to move an empty trailer to satisfy requirement 2 for trailers. A flow of one unit in arc A6 denotes that the tractor used to move an empty trailer to satisfy requirement 1 for trailers is then used to make shipment D. The cost of the arc from supply node i to demand node j is given by C'(Gi, Uj) if T'(Gi, Uj) ≤ Vj - Hi, and ∞ otherwise.
5. The Benders decomposition algorithm

We describe the Benders decomposition algorithm for the traffic scheduling problem. The Benders integer program at iteration K has the form

Minimize  z,   (18)

subject to

z ≥ c2y + u^(k)(b - A2y),   k = 1, ..., K,   (19)

A3y = d,   (20)

0 ≤ y ≤ 1 and integer.   (21)
Let y^(K) denote an optimal solution to problem (18)-(21). We define u^(1) to be a zero vector and u^(k), for k = 2, 3, ..., K, to be an optimal solution to the dual of problem (17) for y = y^(k-1), given by

Maximize  u(b - A2y^(k-1)),

subject to  uA1 ≤ c1,   (22)
    u ≥ 0.

An optimal solution to problem (22) is obtained as a by-product of the network assignment algorithm for problem (17) with y = y^(k-1). The Benders integer program for K = 1 is equivalent to problem (16) and can be solved as an assignment problem. However, for K > 1, the Benders integer program is an integer linear program with (N + D)² + D binary variables and 2(N + D) + K constraints. For realistic values of N and D, problem (18)-(21) cannot be solved efficiently using state-of-the-art integer programming techniques [9, 11].
It is not critical to the application of the Benders decomposition algorithm that we determine optimal solutions to the Benders integer programs. We present two procedures developed to obtain approximate solutions. Both procedures involve constructing a surrogate constraint [12] and solving an assignment problem as follows. Suppose αk ≥ 0 is determined for k = 1, ..., K such that Σ_{k=1}^{K} αk = 1. Multiply the kth Benders constraint in (19) by αk for k = 1, ..., K and sum the results to obtain the surrogate constraint

z ≥ c2y + Σ_{k=1}^{K} αk u^(k)(b - A2y).   (23)
Consider the problem of minimizing (18) subject to constraints (23), (20), and (21). This problem is equivalent to the following integer linear program:

Minimize  W(K) = c2y + Σ_{k=1}^{K} αk u(k)(b − A2y),

subject to

A3y = d,  0 ≤ y ≤ 1 and integer.  (24)
Problem (24) can be efficiently solved as an assignment problem. Both procedures consider the y vector obtained from the solution of problem (24) as an approximate solution to the Benders integer program at iteration K. This solution is henceforth denoted by y(K) and is used in problem (22) to generate another Benders constraint. The two procedures differ in the method and effort used to determine the multipliers αk. These methods are now described.

Procedure 1. Relax the integer restriction in the Benders integer program (18)-(21) and solve the resulting linear programming problem by the Dantzig-Wolfe decomposition algorithm [7, 14]. The Benders constraints (19) plus the convexity constraint form the constraints of the master problem, and constraints (20) and (21) are the subproblem constraints. The multipliers αk are the transfer prices at optimality, and y(K) is the basic extremal column at optimality yielding the minimum value of z in constraints (19). If we denote by z0 the optimum value of z for the linear program (18)-(21), it can be verified from the structure of the cost and right-hand side vectors for the Dantzig-Wolfe decomposition formulation that

z0 = Σ_{k=1}^{K} αk u(k)(b − A2y(K)).

These αk constitute a best set of multipliers in the sense that the optimum value of W(K) in problem (24) obtained by using these αk is at least as large as the optimum value of W(K) obtained by using any other set of αk (see [12]).

Procedure 2. Let αk = 1/K for k = 1, ..., K and let y(K) be a corresponding optimum solution to problem (24).

An attractive feature of the Benders decomposition technique for MILPs is the ability to recognize a good solution. The bounding information available at each iteration provides this capability. Both procedures described previously yield an upper bound and a lower bound for the optimal objective function value of problem (15) at each iteration. For both procedures, the upper bound at iteration K is given by

UK = min_{k=1,...,K} {c1x(k) + c2y(k)},
where x(k) is an optimal solution to problem (17) for y = y(k). The lower bound LK at iteration K for Procedure 1 is given by W0(K), where W0(K) is the optimal objective function value of problem (24). For Procedure 2, the lower bound LK at iteration K is given by

LK = max_{k=1,...,K} {W(k)}.
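A minimal sketch of the Procedure 2 bookkeeping, assuming each iteration records the fleet cost c1x(k), the independent-hauler cost c2y(k), and the surrogate optimum W(k); the record layout and function names are illustrative, not from the paper.

def procedure2_multipliers(K):
    # Procedure 2 weights every Benders cut equally: alpha_k = 1/K.
    return [1.0 / K] * K

def bounds(history):
    # history[k] = (c1x, c2y, W) for Benders iteration k + 1.
    # Upper bound U_K: best feasible objective value found so far.
    # Lower bound L_K (Procedure 2): best surrogate optimum W(k) so far.
    U = min(c1x + c2y for (c1x, c2y, W) in history)
    L = max(W for (_, _, W) in history)
    return U, L

def converged(U, L, eps):
    # Termination criterion (1) of the next section: relative gap within eps.
    return (U - L) / L <= eps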
6. Computational results

The Benders decomposition algorithm for each of the two procedures described previously was programmed on an IBM 370/165 using the FORTRAN IV language. The programs require approximately 250 kilobytes of core. Assignment problems having the form of problems (17) and (24) are solved using the Hungarian method [8, 17]. The program for each procedure employs three termination criteria: (1) (UK − LK)/LK ≤ ε for some iteration K, where ε is a sufficiently small user-supplied nonnegative constant such as 0.05, 0.02, or 0.01; (2) maximum Benders iterations allowed; (3) maximum CPU time allowed. An attractive feature of the program is the ability to stop execution at any time and restart the solution technique from the last complete Benders iteration, thereby allowing user interaction via the three termination criteria.

Each of the procedures was applied to ten computer-generated problems. The first five problems had 16 trailers, 8 tractors, 10 locations, and 100 shipments to be made during a 2-week (336-hour) scheduling horizon. The last five problems had 67 trailers, 34 tractors, 30 locations, and 400 shipments to be made during a 2-week period. The initial locations of the trailers and tractors and the sources and destinations of the shipments were determined using a random distribution. The final distribution of trailers and of tractors was equal to the initial distribution. The time interval between trailer requirements for successive shipments was randomly distributed with the mean equal to the length of the scheduling horizon divided by the number of shipments. The distance between location i and location j was defined to be (100 × |i − j|) + 50, where | | denotes absolute value. The costs for fleet shipments, independent hauler shipments, empty trailer movements, and unattached tractor movements were proportional to distances.

Table 5 presents the computational results for Procedure 1, i.e., determining the αk by linear programming. The termination criteria were ε = 0.01, maximum iterations allowed = 19, and maximum CPU time allowed = 7200 seconds. A solution that is within 1% of optimality is well within the normal accuracy of fleet operating cost data. Maximum iterations and CPU time allowed reflected computer space (256 kilobytes per partition) and computer processing capacity considerations. The first three columns report the iterations and CPU time in seconds required before (UK − LK)/LK is less than or equal to 0.05, 0.02, and 0.01, respectively. If termination criterion 2 or 3 was employed, the rightmost column contains (UK − LK)/LK for the last complete Benders iteration. Table 6 presents the computational results for Procedure 2, i.e., setting all αk equal to 1/K at iteration K. The termination criteria were ε = 0.01, maximum iterations allowed = 49 for Problems 1-5 and 19 for Problems 6-10, and maximum CPU time allowed = 7200 seconds.
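The test-problem generation just described can be sketched as follows; the exponential spacing of trailer requirements is an assumption (the text states only the mean of the random interval), as are all names used here.

import random

def generate_shipments(num_locations, num_shipments, horizon=336.0):
    def distance(i, j):
        # Distance rule from the text: (100 * |i - j|) + 50.
        return 100 * abs(i - j) + 50
    shipments, t = [], 0.0
    mean_gap = horizon / num_shipments
    for _ in range(num_shipments):
        t += random.expovariate(1.0 / mean_gap)   # mean inter-requirement time
        src = random.randrange(num_locations)
        dst = random.randrange(num_locations)
        # Costs of all movement types are taken proportional to distance.
        shipments.append({'time': t, 'from': src, 'to': dst,
                          'cost': distance(src, dst)})
    return shipments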
Table 5
Computational results for Procedure 1 (αk determined by linear programming)

            (UK − LK)/LK ≤ 0.05     (UK − LK)/LK ≤ 0.02     (UK − LK)/LK ≤ 0.01     Last iteration
Problem     Iteration  Time (sec)   Iteration  Time (sec)   Iteration  Time (sec)   Number  (UK − LK)/LK
 1          3          40           5          99           10         365          -       -
 2          1          4            8          290          19         1531         -       -
 3          3          50           6          177          19         1599         -       -
 4          2          72           6          251          7          335          -       -
 5          1          4            8          269          >19        >1421        19      0.0108
 6          2          2030         >3         >7200        >3         >7200        3       0.0249
 7          2          1855         >3         >7200        >3         >7200        3       0.0297
 8          2          1628         >4         >7200        >4         >7200        4       0.0224
 9          2          1821         >4         >7200        >4         >7200        4       0.0258
10          2          1443         >4         >7200        >4         >7200        4       0.0278
Table 6
Computational results for Procedure 2 (αk = 1/K)

            (UK − LK)/LK ≤ 0.05     (UK − LK)/LK ≤ 0.02     (UK − LK)/LK ≤ 0.01     Last iteration
Problem     Iteration  Time (sec)   Iteration  Time (sec)   Iteration  Time (sec)   Number  (UK − LK)/LK
 1          4          17           12         52           >49        >211         49      0.0117
 2          1          4            10         44           39         173          -       -
 3          3          15           13         64           >49        >240         49      0.0125
 4          2          9            8          33           11         49           -       -
 5          1          4            9          38           39         166          -       -
 6          4          661          8          1321         >19        >3138        19      0.0123
 7          3          533          8          1422         >19        >3379        19      0.0112
 8          4          687          9          1547         >19        >3265        19      0.0119
 9          5          708          16         2264         >19        >2689        19      0.0175
10          7          1338         17         3250         >19        >3632        19      0.0163
These results indicate that Procedure 2 is superior to Procedure 1 because of the computational effort required to obtain the αk by the Dantzig-Wolfe decomposition technique. In most applications, the accuracy of the mathematical model (1)-(14) and the precision of the fleet operating cost data do not justify the search for an optimal solution. A feasible solution to the mathematical program having an objective function value within 2% of the optimal value can be considered sufficient. Procedure 2 can be expected to yield such a solution in 1 minute or less for 100-shipment problems and in 1 hour or less for 400-shipment problems.
7. Extensions

The determination of the multipliers αk in Procedure 1 by the Dantzig-Wolfe decomposition algorithm required excessive computational time. Klingman and Russell [16] have developed a technique for efficiently solving constrained transportation problems. The feasibility of using this technique to obtain a 'best' set of multipliers should be examined. These multipliers could then be used to construct a surrogate constraint, and a problem having the form of problem (24) could be solved to obtain an approximate solution to the integer program.

If the cost of operating a tractor and a trailer can be divided into fixed and variable components, the traffic scheduling algorithm can be used to determine the optimal fleet size and the optimal initial distribution of tractors and trailers (assuming the final distribution must be the same as the initial). The fixed charges are added to the costs of all arcs in (2) and (8) except for arcs which satisfy final distribution requirements and have the source and sink of the arc at the same location. Opportunity costs can be considered in the traffic scheduling problem formulation (1)-(14) by including an extra factor in the cost of each arc in the trailer and tractor subproblems.

Labor contracts often specify that drivers are paid delay time if their schedule requires layovers in a city other than their home city. If the tractors are all based in the same city (e.g., company headquarters), then a delay cost can easily be incorporated in the cost of each arc in the tractor subproblem. A tractor is normally associated with a team of drivers. Hence it may be desirable to balance the lengths of the tractor routes to achieve an equitable division of the total workload among the drivers. The length of a tractor route in the traffic scheduling problem formulation (TSP) is limited by the maximum distance a tractor can travel during the scheduling horizon. If the tractors are all based in the same city, then a penalty cost can be incorporated in the cost of each arc in the tractor subproblem to encourage a balance in the lengths of the tractor routes. For example, if the scheduling horizon is one week and it is desirable to have all drivers work from 7:00 a.m. Monday to 3:00 p.m. Friday, an 'idle' penalty cost can be associated with each arc that implies a tractor will be idle in the home city during this period. Similarly, an 'overtime' penalty cost can be associated with each arc that implies a tractor will not be idle in the home city during the period 3:00 p.m. Friday to 7:00 a.m. Monday.
8. Application of the traffic scheduling model The Kelly-Springfield Tire Company operates a fleet consisting of 34 tractors and 67 trailers. This fleet makes shipments of two types: (1) shipments of raw materials from supplier locations to Kelly-Springfield's four manufacturing
facilities, and (2) shipments of finished goods (tires) from these manufacturing facilities to Kelly-Springfield distribution centers. The fleet was established and continues to function primarily to provide reliable service for shipments of raw materials, thereby allowing the factories to operate with significantly lower raw material inventories. Shipments of finished goods are made to avoid costly empty trailer movements, since many of the sources of raw materials are located near distribution centers.

The traffic scheduling problem for Kelly-Springfield is: given the raw materials shipments which are available to be made by the fleet during the subsequent week, determine a traffic schedule which (1) maximizes the number of raw materials shipments made by the fleet, and (2) maximizes the savings generated by the fleet. The savings generated by a traffic schedule is defined to be (1) the sum of independent haulers' fees for shipments of raw materials and finished goods made by the fleet, minus (2) the total operating cost of the fleet for the traffic schedule. The traffic scheduler has complete flexibility in determining shipments of finished goods to be made by the fleet; i.e., it is assumed that the factories will stage production to satisfy the finished goods shipments required by the fleet. Any production in excess of fleet requirements will be shipped to the distribution centers by independent haulers.

There are restrictions which the traffic scheduler must consider in developing a schedule for the fleet. Each of the tractors in the fleet must begin and end its weekly schedule at fleet operations headquarters located in Cumberland, MD. There are two reasons for this restriction: (1) maintenance on the tractors is performed at fleet headquarters during the weekend, and (2) all of the drivers reside in Cumberland. Cumberland is an excellent location for fleet headquarters since located there are (1) Kelly-Springfield's largest distribution center, (2) one of Kelly-Springfield's four factories, and (3) a shuttle to move finished goods shipments and empty trailers between the factory and the distribution center. Hence, each tractor in the fleet can be scheduled to return to Cumberland at the end of a week with a shipment of raw materials for the factory or a shipment of finished goods for the distribution center. The shuttle moves the empty trailers from the distribution center to the factory during the weekend. Each tractor in the fleet can then begin the subsequent week's schedule with a finished goods shipment from the Cumberland factory.

The traffic scheduler must consider the procedures for loading and unloading of shipments at distribution centers, factories, and raw materials suppliers. Kelly-Springfield's fleet consists of 34 tractors and 67 trailers. Each factory and each distribution center serviced by the fleet maintains a constant inventory of trailers, 33 in total. When the fleet arrives at a distribution center with a shipment of finished goods, the trailer is unhitched, an empty trailer from the distribution center's inventory is hitched, and the fleet continues its schedule. When the fleet arrives at a factory with a shipment of raw materials, the trailer is unhitched, a trailer (probably containing a shipment of finished goods) from the
factory's inventory is hitched, and the fleet continues its schedule. When the fleet arrives at a raw materials supplier, the tractor must remain with the trailer while the shipment is loaded.

How does Kelly-Springfield's traffic scheduling problem differ from the problem described in the Introduction and modeled in Section 1? The distribution of Kelly-Springfield's excess trailers (over tractors) remains constant; i.e., trailers are only 'swapped' at factories or distribution centers. Since the identity of individual trailers is not significant, we can consider a tractor and a trailer to be a unit for scheduling purposes. If all raw materials shipments and finished goods shipments available to be considered by the fleet were given, then the traffic scheduling problem would have the form of the trailer subproblem (1)-(6), with a unit now being a tractor and a trailer instead of a trailer. This problem could be solved by a network assignment algorithm as illustrated by Fig. 1. The loading time for a finished goods shipment would be the time required to hitch the trailer. The loading time for a raw materials shipment would be the time required to load the trailer. The unloading time for a finished goods shipment or a raw materials shipment would be the time required to unhitch the trailer. The initial supplies and final requirements would be 34 units located in Cumberland at the beginning of the week and at week's end. The primary objective of making as many raw materials shipments as possible could be attained by adding an arbitrarily large constant to the cost of making each such shipment by an independent hauler.

However, the finished goods shipments to be considered by the fleet are not given. Instead, finished goods shipments can be scheduled to accommodate shipments of raw materials by the fleet. This additional flexibility must be incorporated in the model for the Kelly-Springfield traffic scheduling problem. Suppose we include only raw materials shipments available to be considered by the fleet in the model formulation (1)-(6). We define a route from supply i to demand j to be a feasible (timewise) path which the tractor and trailer (ignoring swaps) could follow from the location of supply i to the location of demand j, making any nonnegative number of finished goods shipments. The cost of a route is (1) the operating cost of the fleet for the route, minus (2) the sum of independent haulers' fees for finished goods shipments included in the route. We define Aij to be the minimum cost over all routes from supply i to demand j if a route exists, and ∞ otherwise. Then the Kelly-Springfield traffic scheduling problem can be modeled as the integer linear program (1)-(6), and can be solved using a network assignment algorithm.

Consider the problem of determining a minimum cost route from supply i to demand j for the Kelly-Springfield traffic scheduling problem. It should be noted that each supply is available at a factory location, since the initial units are available in Cumberland, and the remaining units are available after making raw materials shipments to the factories. The first 'leg' of a minimum cost route from supply i to demand j must be either (1) a direct movement of the empty trailer
from the factory to the demand location, or (2) a minimum cost (fleet operating cost minus independent hauler's fee) route to the demand location consisting of a finished goods shipment to a distribution center followed by an empty trailer movement to the demand location, or (3) a minimum cost route to a factory consisting of a finished goods shipment to a distribution center followed by an empty trailer movement to the factory. If alternative (1) is not feasible, then Aij is equal to ∞. The validity of (2) and (3) is dependent on the fact that the minimum cost route corresponds to the minimum time route in all cases. At the end of the first leg of the minimum cost route we are at either the demand location or a factory. In the latter case, the second leg of a minimum cost route must lie in the alternatives described above, as must all succeeding legs. These observations can be used to construct an implicit enumeration scheme [9, 11] to determine a minimum cost route from supply i to demand j. It should be noted that if two partial routes from supply i to demand j lead to a common factory, and if one partial route is both more expensive and more time consuming, then this partial route can be fathomed.

The solution technique for the Kelly-Springfield traffic scheduling problem uses the network algorithm developed for the trailer subproblem to solve the assignment problem given by (1)-(6), with the Aij determined using the implicit enumeration scheme described in the previous paragraph. An arbitrarily large constant is added to the independent hauler's fee for each raw materials shipment to achieve the primary objective of maximizing the number of raw materials shipments made by the fleet. Kelly-Springfield's fleet contains 34 tractors and 67 trailers, and there are approximately 100 raw materials shipments available to be considered by the fleet each week. The algorithm requires approximately 400 kilobytes of core and 6 CPU seconds on an IBM 370/165 to generate a weekly traffic schedule.

The algorithm was used to generate a weekly traffic schedule for the Kelly-Springfield fleet each week for the period May 1 through May 28, 1978. These weekly traffic schedules were evaluated by the Traffic Department. Their analysis determined that the automated traffic schedules were comparable in quality to the manual schedules generated for the same period. This conclusion was based on comparisons of weekly total scheduled miles, weekly total empty miles, and weekly total savings generated by the fleet. The efficiency of the algorithm and the quality of the resulting schedules indicate that the algorithm has potential as a fleet scheduling tool. The traffic scheduling model and algorithm could also be used to rapidly evaluate the effects of modifications to the fleet size, expansion or contraction of the calendar of raw materials shipments available to the fleet, and changes in independent haulers' fees.

To date, the traffic scheduling algorithm has not been implemented for the Kelly-Springfield fleet because a large number of changes to the fleet schedule are made during the actual operation of the fleet. The fleet scheduler, by constructing the initial schedule manually, gains familiarity with this schedule.
This familiarity allows him to make better decisions when schedule changes are required. It should be noted that the efficiency of the traffic scheduling algorithm makes feasible an online traffic scheduling system which would be responsive to situations requiring fleet schedule changes during fleet operation. The fleet scheduler, using a remote terminal, would begin the traffic scheduling process by entering the initial and final distribution requirements and the shipments available for consideration by the fleet. The traffic scheduling algorithm would determine an initial schedule for the fleet and report it via the terminal or via a remote printer. The fleet scheduler could modify the list of shipments via the terminal as required during fleet operation and the algorithm would determine an updated schedule. The network formulation would 'freeze' the fleet schedule to date by placing lower bounds of 1 on the appropriate arcs to reflect shipments that have been or are currently being made. The use of lower bounds on arcs would also allow the fleet scheduler to specify a route for a given tractor in the fleet or a portion of a route that must be followed by some tractor in the fleet. The algorithm would then determine the remainder of the fleet schedule. It should be noted that the processing time for modifications to the fleet schedule could be dramatically reduced by using the previous solution (adjusted if necessary, e.g. for additional shipments) as a quick-start solution. The data support aspects of this interactive traffic scheduling system must be considered.
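One way to realize the fathoming rule used in the implicit enumeration of routes (Section 8) is a Pareto filter on the partial routes reaching a common factory; a sketch, assuming each partial route is summarized by its (cost, time) pair:

def fathom_partial_routes(routes):
    # routes: (cost, time) pairs for partial routes from supply i toward
    # demand j that end at the same factory. A route is fathomed when some
    # other route to that factory is at least as cheap and at least as fast.
    kept, best_time = [], float('inf')
    for cost, time in sorted(set(routes)):   # ascending cost, then time
        if time < best_time:
            kept.append((cost, time))
            best_time = time
    return kept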
9. Applicability of the general traffic scheduling model

The principal difference between the general traffic scheduling model (TSP) and the Kelly-Springfield model is that the general model allows movements of unattached tractors, i.e., tractors without trailers. In the Kelly-Springfield model, trailers are only dropped at company locations; and if a trailer is dropped, another trailer is picked up before the tractor continues its route. The practice of moving unattached tractors is known in the trucking industry as 'bobtailing'. Although this practice is probably not common in private (company-owned) fleets, it is used extensively by trucking company or 'common carrier' fleets. If a company engages a common carrier to make a shipment, and if the tractor and driver(s) remain with the trailer while it is being loaded, the company pays very costly 'detention' charges if the trailer is not loaded within a specified period. The company avoids detention charges and may obtain a reduced rate for the shipment if the common carrier 'spots' the trailer at the company location for pickup at a later time. The common carrier avoids having a tractor and driver(s) tied up during the specified period before detention charges are applicable. Hence a common carrier will frequently spot a trailer at a given location and have the tractor bobtail to another location. The general model would be preferred to the Kelly-Springfield model for the scheduling of these
common carrier fleet operations. It should be noted that for the scheduling of common carrier fleets, an independent hauler is another common carrier and the cost to make a shipment by an independent hauler is the profit lost if the shipment is not made by the fleet being scheduled.
Acknowledgment The author wishes to thank Professor John M. Mulvey of Princeton University and an anonymous referee for their comments and suggestions which have significantly improved this paper.
References

[1] E. Balas and C. Bergthaller, "Benders method revisited", Management Sciences Research Report No. 401, Carnegie-Mellon University (1977).
[2] R. Barr, F. Glover and D. Klingman, "The alternating basis algorithm for assignment problems", Mathematical Programming 13 (1977) 1-13.
[3] J.F. Benders, "Partitioning procedures for solving mixed-variables programming problems", Numerische Mathematik 4 (1962) 238-252.
[4] G.H. Bradley, G.G. Brown and G.W. Graves, "Design and implementation of large-scale primal transshipment algorithms", Management Science 24 (1977) 1-34.
[5] P. Camion, "Characterization of totally unimodular matrices", Proceedings of the American Mathematical Society 16 (1965) 1068-1073.
[6] G.B. Dantzig and D.R. Fulkerson, "Minimizing the number of tankers to meet a fixed schedule", Naval Research Logistics Quarterly 1 (1954) 217-222.
[7] G.B. Dantzig and P. Wolfe, "The decomposition of mathematical programming problems", Operations Research 8 (1960) 101-111.
[8] L.R. Ford, Jr. and D.R. Fulkerson, Flows in networks (Princeton University Press, Princeton, NJ, 1962).
[9] R.S. Garfinkel and G.L. Nemhauser, Integer programming (Wiley, New York, 1972).
[10] A.M. Geoffrion and G.W. Graves, "Multicommodity distribution system design by Benders decomposition", Management Science 20 (1974) 822-844.
[11] A.M. Geoffrion and R.E. Marsten, "Integer programming algorithms: A framework and state-of-the-art survey", Management Science 18 (1972) 465-491.
[12] F. Glover, "Surrogate constraints", Operations Research 16 (1968) 741-749.
[13] F. Glover, D. Karney, D. Klingman and A. Napier, "A computational study on start procedures, basis change criteria, and solution algorithms for transportation problems", Management Science 20 (1974) 793-813.
[14] G. Hadley, Linear programming (Addison-Wesley, Reading, MA, 1962).
[15] T.C. Hu, Integer programming and network flows (Addison-Wesley, Reading, MA, 1969).
[16] D. Klingman and R. Russell, "Solving constrained transportation problems", Operations Research 23 (1975) 91-106.
[17] H.W. Kuhn, "The Hungarian method for the assignment problem", Naval Research Logistics Quarterly 2 (1955) 83-97.
[18] R.R. Love, Jr., "Multi-commodity production and distribution scheduling with capacity and changeover restrictions", Ph.D. Dissertation, The Johns Hopkins University, Baltimore (1974).
[19] J.M. Mulvey, "Testing of a large-scale network optimization program", Mathematical Programming 15 (1978) 291-314.
[20] V. Srinivasan and G. Thompson, "Benefit-cost analysis of coding techniques for the primal transportation algorithm", Journal of the Association for Computing Machinery 20 (1973) 194-213.
Mathematical Programming Study 15 (1981) 125-147. North-Holland Publishing Company
A SCALED REDUCED GRADIENT ALGORITHM FOR NETWORK FLOW PROBLEMS WITH CONVEX SEPARABLE COSTS*

Ron S. DEMBO
SOM, Yale University, Box 1A, New Haven, CT 06520, U.S.A.
John G. KLINCEWICZ
Bell Laboratories, Holmdel, NJ 07733, U.S.A.

Received 21 August 1978
Revised manuscript received 20 November 1979
In this paper we present an algorithm for the convex-cost, separable network flow problem. It makes explicit use of second-order information and also exploits the special network programming data structures originally developed for the linear case. A key and new feature of the method is the use of a preprocessing procedure that resolves the problem of degeneracy encountered in reduced gradient methods. Some preliminary computational experience with the algorithm on water distribution problems is also presented. Its performance is compared with that of a reduced gradient and a convex simplex code.
Key words: Nonlinear Network Optimization, Nonlinear Programming, Reduced Gradient Method.
1. Introduction

In recent years, the development of special data structures for storing and updating the spanning tree of a network has resulted in new, efficient primal simplex algorithms for network flow problems with linear costs. Transshipment and assignment problems with tens of thousands of arcs and nodes are now routinely solved in a fraction of the time required by a production linear programming software system [5, 12, 15]. As a natural consequence, researchers have used this available new technology to design algorithms for network optimization models with convex objective functions [1, 7, 8]. This has not been merely an esoteric exercise: there are important engineering and economic problems in which a convex separable objective has to be minimized, such as water distribution [7, 13] and resistive electrical network problems [8]. Also, multicommodity nonlinear network flow problems arise in equilibrium models for traffic assignment either on road [9, 21] or computer networks [2, 3, 6].

* This work was supported in part by National Science Foundation Grant No. ENG78-21615.
Apart from some very recent developments [2, 3], programming algorithms for convex separable network flow problems have been based on methods that use a linearized subproblem to generate search directions. Three algorithms that have been used extensively are the Frank-Wolfe [9, 10, 21], convex simplex [7, 14] and piecewise-linearization methods [7, 18, 19]. The use of linear search direction procedures has been motivated primarily by: (i) the ability to capitalize on recent improvements to algorithms for linear flow problems and (ii) the natural decomposition that is induced in the multicommodity case. Unfortunately, algorithms that neither use nor attempt to approximate second-order information can exhibit slow convergence. Indeed, the primary dissatisfaction with the above algorithms has been the extensive empirical evidence of their poor convergence behavior.

The purpose of this paper is twofold. First, we wish to examine a reduced gradient algorithm in this context, since folklore has it that it outperforms the above algorithms empirically and because its asymptotic convergence rate is known to be superior to that of the Frank-Wolfe and convex simplex methods. Moreover, reduced gradient algorithms are able to exploit the efficient network data structures that have been developed for linear network programming. Our second objective is to develop procedures for enhancing the rate of convergence of reduced gradient methods by introducing second-order information, but without incurring a drastic increase in overhead per iteration. We feel that we have achieved some degree of progress toward attaining both these goals, as is evidenced by the computational results presented in Section 5. The first goal requires an efficient mechanism for handling degeneracy. Here we have developed a novel procedure (the 'maximal basis' procedure of Section 3) that has application to general linearly-constrained optimization. To achieve the second goal we have introduced a dynamic scaling technique, coupled with a heuristic for conditioning the reduced Hessian, which appears to dramatically improve computational performance.

In the following section we give a precise statement of the convex separable network flow problem and define our notation.
2. Problem formulation
A directed network consists of a finite set of nodes and a finite set of arcs. Each arc originates at some node i and terminates at another node k. Multiple arcs are allowed; that is, more than one arc may be directed between the same pair of nodes. In a network flow model, several of the nodes have associated with them either a supply (net inflow) or demand (net outflow) of a particular commodity. This commodity then travels from node to node across the directed arcs so that the supply satisfies the demand. The total amount of the commodity that travels
across a particular arc is called the flow on that arc. A nonlinear network flow problem (NLN) is a mathematical programming problem of the form:

(NLN)  minimize  f(x),  (2.1)

subject to

Σ_{j∈Wi} xj − Σ_{j∈Vi} xj = bi   for i = 1, ..., m,  (2.2)

lj ≤ xj ≤ uj   for j = 1, ..., n,  (2.3)
where f: Rⁿ → R, and

m = number of nodes,
n = number of arcs,
xj = flow on arc j,
bi = fixed supply (if positive) or demand (if negative) at node i,
Wi = set of arcs originating at node i,
Vi = set of arcs terminating at node i,
uj = maximum allowable flow on arc j,
lj = minimum allowable flow on arc j.

In this paper we are concerned with applications in which the objective function f(x) is convex and separable, that is,

f(x) = Σ_{j=1}^{n} fj(xj),
where each fj(xj) is a convex function. However, the algorithm we propose is applicable to more general functions, provided one is prepared to generate search directions based on an underlying separable model. It is convenient to introduce a fictitious collector node, numbered m + 1, into the network. We also introduce additional (artificial) arcs that lead from each of the supply nodes to the collector node and from the collector node to each of the demand nodes. A high cost is placed on these artificial arcs to ensure that no flow is carried on them in an optimal solution to (NLN). With these additional arcs, the flow conservation constraints (2.2) may be expressed in matrix form as: A x = b.
(2.4)
Each column of A corresponds to an arc and each row to a node of the directed network. The only nonzero elements in a column are a +1 in the row corresponding to the node where the arc originates and a - 1 in the row corresponding to the node where the arc terminates. The row corresponding to the collector node is not included in A and hence columns corresponding to artificial arcs have only one nonzero element. By construction, A is therefore an m by n + m matrix of rank m ( = number of nodes).
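A sketch of how the node-arc incidence matrix with the fictitious collector node might be assembled (nodes numbered 0, ..., m−1 here; the high-cost handling of artificial arcs is omitted, and all names are illustrative):

def incidence_with_collector(m, arcs, b):
    # Real arc (i, k): +1 in row i (origin), -1 in row k (terminus).
    # Artificial arcs run supply node -> collector or collector -> demand
    # node; the collector row is dropped, so these columns keep a single
    # nonzero entry, exactly as described above.
    columns = []
    for i, k in arcs:
        col = [0] * m
        col[i], col[k] = 1, -1
        columns.append(col)
    for i in range(m):
        if b[i] > 0:                  # supply node -> collector
            col = [0] * m; col[i] = 1
            columns.append(col)
        elif b[i] < 0:                # collector -> demand node
            col = [0] * m; col[i] = -1
            columns.append(col)
    return columns                    # m rows by (n + artificials) columns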
Following [20] we partition columns of A as follows:
A = [B  S  N],  (2.5)
where the columns of B form a basis; the columns of S correspond to superbasic arcs, that is, nonbasic arcs whose flow may vary between the bounds (2.3); and the columns of N correspond to nonbasic arcs with flow at either an upper or lower bound. In a similar fashion we define and partition the following:

x = [xB xS xN]ᵀ = vector of current flows,

g(x) = [gB gS gN]ᵀ = gradient of the objective function,

G(x) = diag(GB, GS, GN) = Hessian of f(x), assumed to be separable and convex,

p = [pB pS pN]ᵀ = current search direction.
By construction of A we know the following [5]: (i) For any basis B of the columns of A, there exists an ordering of the rows and columns such that B is upper triangular. (ii) The arcs corresponding to columns in a basis form a spanning tree of the network. Most of the gains made in the development of primal simplex algorithms for the linear network flow problem are a result of attempts to fully exploit this structure. Special node-length vectors called the depth array, the predecessor array and the threaded-index store the information contained in the basic spanning tree. A complete description of these data structures can be found in the literature [5]. In addition to these data structures that appear in the primal simplex linear network codes, we also keep a node-length vector called the reverse-thread which allows us to trace the rows in a triangular basis matrix in reverse order. All of the above arrays can be easily updated whenever a change is made in the basis. Use of these structures allows the following procedures to be executed efficiently: (i) Compute first-order estimates of the multipliers associated with constraints (2.2) in (NLN), from

Bᵀw = gB,  (2.6)
where B is an upper triangular basis and gB is the vector of gradient elements associated with the basic arcs. The threaded-index and predecessor arrays are used to solve this lower triangular system of equations by forward-substitution. (ii) Find a feasible direction for basic variables by solving

BpB = −SpS,  (2.7)
where B is an upper triangular basis and pS is a given superbasic search direction vector. The reverse-thread and predecessor arrays are used to solve this upper triangular system of equations by back-substitution. (iii) Identification of the cycle formed by the addition of an arc to the basic spanning tree. Any nonbasic arc forms a cycle with some subset of the basic arcs, which can be identified with the help of the predecessor and depth arrays. (iv) Identification of the subtrees formed by the removal of a basic arc. Removing an arc from the basis divides the nodes into two sets, each of which is spanned by a subset of the remaining basic arcs. The threaded-index and depth arrays can be used to identify each set of nodes. Any nonbasic arc that leads from a node in one set to a node in the other can then enter the basis in place of the removed arc.
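The forward-substitution pass in (i) amounts to one sweep down the tree in preorder; a sketch, where the array conventions (predecessor, arc orientation) are assumptions rather than the code's actual layout:

def tree_duals(pred, pred_arc, preorder, g, orient):
    # pred[v]: predecessor of node v in the basic spanning tree; preorder
    # lists the root first, each node after its predecessor.
    # pred_arc[v]: index of the basic arc joining v to pred[v].
    # orient[v]: +1 if that arc is directed pred[v] -> v, else -1.
    # Solves B'w = gB, i.e. w[from] - w[to] = g[arc] for every basic arc.
    w = {preorder[0]: 0.0}            # fix the root potential at zero
    for v in preorder[1:]:
        if orient[v] > 0:             # arc runs pred[v] -> v
            w[v] = w[pred[v]] - g[pred_arc[v]]
        else:                         # arc runs v -> pred[v]
            w[v] = w[pred[v]] + g[pred_arc[v]]
    return w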
3. Computing a feasible Newton search direction for (NLN)
Given a feasible point, x, a constrained Newton algorithm for (NLN) would involve solving the following quadratic program to compute a feasible search direction.
(NS)  minimize  ½pᵀG(x)p + g(x)ᵀp,  (3.1)

subject to

Ap = 0,  (3.2)

pj ≥ 0  if xj = lj,  (3.3)

pj ≤ 0  if xj = uj.  (3.4)
Under certain smoothness assumptions (see [22] for example) an algorithm that uses the direction generated by (NS) will exhibit a Q-quadratic rate of convergence. However, in a large-scale setting, the cost of solving (NS) at each iteration would probably be prohibitive. Our aim here is to approximate a solution to (NS) as closely as possible, bearing in mind that storage is at a premium for large problems and that too much time spent computing search directions might make the algorithm impractical. It is instructive to see how the solution to (NS) would be computed in order to best generate approximations to it. If one ignores the (complicating) constraints pj ≥ 0 and pj ≤ 0, then (NS) may be solved via a primal or dual approach, both requiring the solution of a positive-definite, symmetric system. Since our algorithm may be viewed as one that approximates either the primal or dual system, we describe one of these approaches in detail.
3.1. Primal constrained-Newton
Let Z be an (n + m) by n matrix whose columns form a basis for the null space of A, giving AZ = 0. If the change of variables p = Zy is made, then (NS) becomes an unconstrained problem whose solution may be obtained by solving:

(ZᵀGZ)y = −(Zᵀg).  (3.5)
The optimal solution p to (NS) is recovered from:

p = Zy.  (3.6)
We will restrict ourselves to matrices Z of the form:

Z = [ −B⁻¹S ]
    [    I  ]    (3.7)
    [    0  ]
bearing in mind the ill-conditioning that may result in (3.5) from a poor choice of basis. We do this for two reasons. Firstly, there does not seem to be any other practical choice of Z for large-scale programming and secondly, the above-mentioned network programming data structures make it very inexpensive to operate with the Z defined in (3.7). To overcome any induced ill-conditioning we propose a heuristic in Section 4 which appears to work extremely well on our test problem set. Note that for the above choice of Z, (3.5) and (3.6) become

(ZᵀGZ)pS = −(Zᵀg),  (3.8)

BpB = −SpS,  (3.9)

pN = 0.  (3.10)
At this stage it is worth recalling that we have ignored constraints (3.3) and (3.4) in (NS). These constraints pose no problem for nonbasic or superbasic variables. They do, however, pose a serious difficulty for basic variables.

3.2. Computing a search direction when some basic variables are at their bound
Most of the computational effort in a Newton-type algorithm for (NLN) will be concentrated in solving or approximating a solution to (3.8). If, after all this effort, a direction pS is computed that results in a basic search direction pB (computed using (3.9)) violating one of the constraints pBj ≥ 0 (if xBj = lj) or pBj ≤ 0 (if xBj = uj), then a new basis must be found, a new search direction (pB, pS) computed, and so on. This can be a very costly procedure since it results in iterations where no progress is made in the objective function. The analogy in linear programming is a degenerate pivot, which can occur very frequently in network problems. To overcome this difficulty, we propose introduction of a preprocessing procedure which constructs a basis that guarantees that, for a
particular choice of pS, pB will satisfy (3.3) and (3.4), thereby ensuring that no null steps will be taken.
3.3. The concept of a maximal basis

Definition 1. An arc is termed free if the current value of flow on that arc is not at a bound.

Definition 2. A basis B is maximal if there is no alternate basis with a greater number of free basic arcs.

A maximal basis corresponds to the following combinatorial problem:

(MB)  maximize over the choice of basis B:  Σ_{j=1}^{n+m} cjzj,  (3.11)

where

cj = 1 if arc j is free, 0 if arc j is not free;
zj = 1 if arc j is in the basis, 0 if arc j is not in the basis.
This is a maximal spanning tree problem with costs cj on the arcs. It is well known that it may be solved by a greedy algorithm that proceeds as follows. The tree is built one arc at a time. First, the arcs j with cj = 1 are considered (in any order). Each free arc examined is included in the basis if possible; otherwise, it can be excluded from further consideration. The motivation for defining a maximal basis is given in the theorem below.
Theorem 1. Suppose a maximal number of free arcs is in the basis. Then an adjustment of flow on a free superbasic arc requires adjustment of flow only on free basic arcs in order to satisfy (3.2).

Proof. Call the superbasic arc j. It forms a cycle with some subset of the basic arcs. Any change in the flow on j will only affect basic arcs in this cycle. If some basic arc in this cycle were at one of its bounds, we could replace it with the superbasic arc j, thereby increasing the number of free arcs in the basis. This contradicts the initial assumption of a maximal basis.

Theorem 1 assures us that if we operate with a maximal basis, the only out-of-basis arcs that need cause any concern with regard to feasibility are those moving away from a bound.
Theorem 1 may be shown to be valid in the more general context of linearly constrained optimization. This has implications for the reduced gradient method and will be pursued elsewhere.
Constructing and maintaining a maximal basis simply requires: (i) identifying free arcs, (ii) partitioning the variables into basic arcs and those not in the basis and (iii) solving the resulting combinatorial optimization problem (MB) (3.11) using a greedy algorithm, as sketched below. It is worth pointing out that in a 'cold start' situation with no free out-of-basis arcs, an all-artificial basis is maximal. In the following section we describe a scaled reduced gradient (SRG) algorithm which operates with a maximal basis.
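A sketch of the greedy construction: it is the classical maximal spanning tree computation with free arcs examined first (the union-find is an implementation choice, not from the paper).

def maximal_basis(num_nodes, arcs, is_free):
    # Kruskal-style greedy: consider free arcs before bounded ones and
    # add an arc whenever it joins two distinct tree components.
    parent = list(range(num_nodes))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    basis = []
    ordered = sorted(range(len(arcs)), key=lambda j: not is_free[j])
    for j in ordered:
        i, k = arcs[j]
        ri, rk = find(i), find(k)
        if ri != rk:
            parent[ri] = rk
            basis.append(j)
    return basis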
4. The scaled reduced gradient (SRG) algorithm

The reduced gradient algorithm uses the direction:

p = Zv,  (4.1)

where

v = −Zᵀg.  (4.2)
This is simply the direction of steepest descent in the subspace defined by the current active constraints. One may also view (4.1) as an approximation to the constrained-Newton direction (3.5) and (3.6) in which the reduced Hessian ZᵀGZ is replaced by an identity matrix. The chief advantage of the reduced gradient direction is that it is cheap to compute and requires relatively little storage. Its main disadvantages are that it is usually a poor approximation to the Newton direction and that it generates an algorithm with a linear rate of convergence that depends on the conditioning of ZᵀGZ (see [17, p. 265]). One possible way of improving the approximation, without placing a burden on storage requirements or the cost of computing a feasible direction, is to replace ZᵀGZ in (3.5) by some suitable (easy to invert) positive definite matrix M. That is, we propose the scaled search direction

p = Zv,  (4.3)

where

Mv = −Zᵀg.  (4.4)
Using the definition of Z in (3.7), (4.3) and (4.4) may be computed as follows.

The SRG search direction p = (pB, pS, pN)ᵀ:

Step 1:  Bᵀw = gB,  (4.5)

Step 2:  Mv = Sᵀw − gS.  (4.6)

Step 3:  (pS)j = 0 if (xS)j = lj and vj < 0,
         (pS)j = 0 if (xS)j = uj and vj > 0,  (4.7)
         (pS)j = vj otherwise,

         pN = 0.  (4.8)

Step 4:  BpB = −SpS.  (4.9)
Remarks. (i) Eq. (4.5) yields a first-order estimate, w, of the optimal Kuhn-Tucker multipliers. (ii) The conditions in (4.7) ensure that (3.3) and (3.4) are satisfied for the superbasic variables. (iii) If the basis B is maximal, pB computed using (4.9) guarantees that the feasibility conditions (3.3) and (3.4) will also be satisfied for the basic variables.

Theorem 2. Assume that basis B is maximal with respect to the superbasic variables and M is a positive definite scaling matrix. Then the SRG direction is a feasible descent direction.

The proof is straightforward and will be omitted here. The theorem below justifies the introduction of a scaling matrix M.

Theorem 3 (Rate of convergence of the SRG algorithm). Suppose (i) f(xB, xS) is a convex quadratic function having a unique minimum at (xB*, xS*), (ii) SRG produces a sequence {(xB(k), xS(k))} converging to (xB*, xS*), and (iii) the partition x = (xB, xS) is the same throughout the tail of the sequence. Then the sequence of objective function values {f(x(k))} converges to f(x*) linearly with a ratio no greater than [(Q − q)/(Q + q)]², where Q and q are, respectively, the largest and smallest eigenvalues of the matrix M^(−1/2)ZᵀGZM^(−1/2).

The proof of this theorem is almost identical to that given in [17] for the reduced gradient algorithm. Details may be found in [16]. By careful choice of M one may therefore enhance the rate of convergence of the reduced gradient method. It is instructive to examine the structure of ZᵀGZ for our particular choice of Z. Using (3.7) we have:

ZᵀGZ = SᵀB⁻ᵀGBB⁻¹S + GS.
(4.10)
It is clear from (4.10) that the choice of M should not be independent of the choice of basis. Rather, we propose choosing the partition [B S N] and the matrix M so that they reinforce each other in reducing the condition number of M^(−1/2)ZᵀGZM^(−1/2). We have experimented with two easily computable scaling matrices: (I) M = GS and (II) M = diag(ZᵀGZ).
(I) M = GS. If all the basic arcs were free and linear (i.e. GB = 0), then ZᵀGZ = GS and SRG with M = GS is Newton's method. Thus the choice of M = GS, coupled with a conditioning procedure that chooses [B S N] so that ‖GB‖ is as small as possible should, in theory, enhance the rate of convergence. A heuristic for approximating this choice of basis is given in Section 4.1 below.

(II) M = diag(ZᵀGZ). For convex separable network flow problems the elements of diag(ZᵀGZ) are cheap to compute since

(ZᵀGZ)ii = Σ_{j∈Ci} Gjj + Gii,  (4.11)

where Ci is the set of basic arcs that appear in a cycle with superbasic arc i. In this case the choice of basis partition is not as obvious as in (I). Our experiments reported in Section 5 indicate that attempting to keep ‖GB‖ small works well.
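A sketch of (4.11); cycle_arcs is assumed to return the basic arcs on the cycle that superbasic arc i forms with the spanning tree (recoverable from the predecessor and depth arrays), and G_diag holds the diagonal of the separable Hessian.

def reduced_hessian_diagonal(superbasics, cycle_arcs, G_diag):
    # (Z'GZ)_ii = G_ii plus the diagonal Hessian entries of the basic
    # arcs on the cycle formed by superbasic arc i, per (4.11).
    return {i: G_diag[i] + sum(G_diag[j] for j in cycle_arcs(i))
            for i in superbasics}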
4.1. Conditioning the basis

In view of the above discussion we seek a maximal basis with minimal ‖GB‖. To find such a basis would probably be too expensive to warrant the expected benefits. Instead we propose the heuristic below, which reduces ‖GB‖ without affecting the maximality of the basis.
Conditioning Procedure

Step 1: Create a list of the free basic arcs whose cost functions have the largest second derivative values. The length of this list is determined by choice of a parameter MTEST.
Step 2: Set j = 1.
Step 3: Consider the jth arc on the list. Search to find a free superbasic arc (i) whose cost function has a smaller second derivative value, and (ii) that can enter the basis in place of the jth arc on the list.
Step 4: If such an arc is found, pivot it into the basis. If j < MTEST, set j ← j + 1 and go to Step 3.

Executing the above at every iteration would probably be too expensive. We therefore introduce a second parameter, MITER. The above steps are then performed every MITER iterations after an initial feasible flow has been obtained. Furthermore, in the course of computation, if no pivot results on a particular application of these steps, we double the value of MITER.
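The procedure can be sketched as follows; can_swap and pivot abstract the network tests and basis update described in Section 2 and are assumptions.

def condition_basis(basic, superbasic, hess_diag, is_free, can_swap, pivot, mtest):
    # Examine the MTEST free basic arcs with the largest second
    # derivatives (Step 1) and try to replace each with a free
    # superbasic arc having a smaller second derivative (Steps 3-4).
    worst = sorted((b for b in basic if is_free[b]),
                   key=lambda b: -hess_diag[b])[:mtest]
    for b in worst:
        for s in superbasic:
            if is_free[s] and hess_diag[s] < hess_diag[b] and can_swap(b, s):
                pivot(b, s)            # s enters the basis in place of b
                break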
4.2. Identifying the superbasic variables

The size and content of the superbasic set is likely to have a marked effect on
the performance of the algorithm. In a large-scale setting a candidate list structure is essential. Exactly how such a list should be constructed can only be determined by experiment and will not be pursued here. An immediate consequence of Theorem 1 is that the only out-of-basis arcs that are of concern in defining the superbasic set are those that are at their bounds. We therefore need a mechanism for testing whether or not these arcs may be included in the superbasic set. Since the SRG search direction can be obtained with almost no additional computational effort when the above test is carried out, the two computations are executed simultaneously in the following procedure.
Test for a nonbasic arc to enter the superbasic set

Step 1: Assume arc j leads from node i to node k. Compute

vj = wi − wk − ∂f/∂xj

(i.e., the jth element of Nᵀw − gN).
Step 2: If j is at its lower bound and vj < 0, or if j is at its upper bound and vj > 0, then fix the search direction for arc j at zero for this iteration (i.e., j remains nonbasic), and proceed to test the next arc. If not, execute Step 3.
Step 3: Trace around the cycle that j forms with the basic arcs. If vj is positive (negative) and if there are arcs at their upper bound in the same (opposite) direction as j around the cycle, or there are arcs at their lower bound in the opposite (same) direction to j around the cycle, then fix the search direction for arc j at zero for this iteration (i.e., j remains nonbasic) and proceed to test the next arc. In this case, we refer to arc j as a blocked arc. If j is not blocked, execute Step 4.
Step 4: Compute (pS)j = vj/Mjj and increment −SpS.

Remark. In the convex simplex algorithm, a blocked arc identified in Step 3 would be pivoted into the basis. We do not do this since it would require recomputing superbasic search directions already determined.
4.3. Global convergence of the algorithm

From the above discussion, the search direction computation will yield one of two possible outcomes. Case (i): a nonzero descent direction will be calculated, or Case (ii): the only arcs which violate the optimality conditions will be blocked arcs. The following theorem guarantees that if Case (ii) occurs and we pivot blocked arcs into the basis one at a time, then eventually a nonzero descent direction will result.
Theorem 4. If Case (ii) occurs, it is always possible to generate a basis that allows a nonzero adjustment of flow.

Proof. This is equivalent to finding an improving basis for a linear programming problem with linear cost coefficients cj = ∂f/∂xj and may be achieved without cycling, provided some appropriate pivoting rule [4] is followed.

The algorithm will always generate a descent direction unless optimality has been achieved. Two more components are needed before we can prove convergence to a minimum from an arbitrary feasible starting point: (a) a step in the direction of search must be chosen to guarantee a sufficient decrease in the objective function [11], and (b) an anti-jamming [17] mechanism should be incorporated into the algorithm. To achieve (a) we use the safeguarded steplength algorithm suggested in [11]. To achieve (b) we pivot blocked arcs into the basis only when we are close to optimality with respect to the current superbasic set. That is, when:

‖Sᵀw − gS‖∞ ≤ δ.  (4.12)
The tolerance δ may be varied in the course of computation. It should be large relative to the optimality tolerance on the reduced gradient during the initial stages and small when near a solution. An overview of the SRG algorithm is given in the flowchart below (Fig. 1).
5. Computational experience

An experimental FORTRAN code [16] for solving water distribution problems, based on the ideas presented in Section 4, has been written. This section reports some computational results comparing the behaviour of our SRG code with that of the convex simplex (CS) code of Helgason and Kennington [14] and the reduced gradient (RG) option in our code. All the codes were written in FORTRAN, compiled using the F40 Compiler and run in double precision on a DECSYSTEM 20/50 computer operating under TOPS-20.
5.1. Test problems, optimality criteria and parameter settings

The three test problems, which are by current standards representative of small, medium and large single-commodity nonlinear flow problems, are the following: (i) a 30 node, 46 arc problem, (ii) a 150 node, 196 arc problem and (iii) a 666 node, 906 arc problem.
Fig. 1. Flowchart of the SRG algorithm: from an initial maximal basis, calculate w from (4.5); test each nonbasic arc for entry to the superbasic set and calculate the search directions for superbasic arcs from (4.7), incrementing −SpS; where necessary, pivot a blocked arc into the basis in place of an arc at a bound; compute the basic variable search direction; perform the conditioning procedure every MITER iterations; compute α approximating min f(x + αp) and update x.
All three are extracted from the water-distribution system of Dallas, TX, which is described in [7]. In all cases, SRG and its RG option start with an all-artificial basis, and they terminate when:

max_i |(Sᵀw − gS)_i| ≤ 0.1.  (5.1)
This is the same as the optimality criterion used by the convex simplex code (CS) and the criterion used by city planners in Dallas. For the water distribution problems tested here, the number of bounded arcs is relatively small and therefore, blocked arcs are likely to occur infrequently. Thus, relatively little work is required to test the nonbasic arcs and hence we chose to allow as many arcs as possible to enter the superbasic set at each
iteration. The pivoting tolerance, δ, is set dynamically according to:

δ = 0.9 max_{j∈C} |(Aᵀw − g)_j|,  (5.2)
where C is the set of blocked arcs. For other problems, where the blocking phenomenon may occur frequently, careful study of the choice of δ is warranted. The SRG algorithm requires a positive definite scaling matrix M. In order to guarantee this for our choices of M, we modify M as follows:

Mjj ← max(Mjj, HTOL).
(5.3)
The experiments reported here all used HTOL = 10⁻⁷. The codes used in our experiments differ with regard to the line-search procedure used. Helgason and Kennington's code (CS) uses interval bisection and takes advantage of the fact that at each iteration in the convex simplex algorithm, flow changes on only one superbasic arc and on those basic arcs forming a cycle with it. For SRG and RG, the flow may change on any number of arcs during the course of an iteration. Naturally, then, the proportion of time per iteration spent doing linesearch in SRG and RG is higher than in CS. The linesearch algorithm used by SRG and RG is a successive polynomial approximation procedure with safeguards, based on [11]. We use the Newton step

α(0) = −g(x)ᵀp / pᵀG(x)p  (5.4)
to initiate the steplength algorithm. The algorithm proceeds until it finds an α* such that

|g(x + α*p)ᵀp| / |g(x)ᵀp| ≤ η,  0 < η < 1.  (5.5)
Then α* is checked to see whether

f(x + α*p) − f(x) ≤ μα*g(x)ᵀp,  0 < μ < 1
(5.6)
holds. If it does not satisfy (5.6), then the value of α* is successively halved until the criterion (5.6) is met. The choice of η in (5.5) determines the accuracy of the linesearch. A small value of η corresponds to an accurate linesearch, whereas η near 1 results in a very inaccurate linesearch. Since, in our case, function and gradient evaluations are expensive relative to the work done per iteration, we settled for a moderately inaccurate linesearch with η = 0.2. Criterion (5.6) demands that the decrease in objective function value be some fraction of the expected decrease, α*gᵀp, based on a first-order estimate. In all our experiments, the relatively high value of μ = 0.5 was used, the
effect being to require a large function decrease relative to the steplength α*. In our experience the Newton step (5.4) invariably satisfies both (5.5) and (5.6) near a minimum, and (5.6) is invoked relatively infrequently. We note that since convexity and the availability of the Hessian are assumed, (5.6) may be replaced by:

f(x + α*p) − f(x) ≤ μ[α*g(x)ᵀp + ½(α*)²pᵀG(x)p],  0 < μ < 1.
(5.7)
This criterion is likely to provide a more robust algorithm but was not explored here. An additional tolerance parameter, ~, was introduced in the case of SRG with M = Gs as follows. The scaling matrix M = Gs was introduced only when
    ||α*p|| ≤ ε(||x|| + 1).    (5.8)
For ε small, this has the effect of using RG until one is close to optimality and then switching to SRG (M = G_S). We found a value of ε = 0.001 to be suitable but were able to improve convergence on particular problems using different values.
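To fix ideas, here is a schematic Python reading of the steplength procedure built from (5.4)-(5.6). It is a sketch, not the authors' code: the paper's implementation fits successive polynomials as in [11], whereas this sketch falls back on bisection of the directional derivative; all names are illustrative.

    def linesearch(f, grad, x, p, G, eta=0.2, mu=0.5):
        # Directional derivative of f along p at steplength a.
        phi = lambda a: grad(x + a * p) @ p
        g0p = phi(0.0)                      # g(x)^T p, negative for descent p
        alpha = -g0p / (p @ (G @ p))        # Newton step (5.4)
        lo, hi = 0.0, alpha
        while phi(hi) < 0.0:                # bracket the one-dimensional minimum
            hi *= 2.0
        a = min(alpha, hi)
        while abs(phi(a)) > eta * abs(g0p): # accuracy criterion (5.5)
            if phi(a) < 0.0:
                lo = a
            else:
                hi = a
            a = 0.5 * (lo + hi)
        while f(x + a * p) - f(x) > mu * a * g0p:   # sufficient decrease (5.6)
            a *= 0.5
        return a

With η = 0.2 and μ = 0.5, the settings used in the experiments, the Newton step typically passes both tests near a minimum, as the text notes.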
5.2. Computational results Tables 5.1, 5.2, 5.3 and 5.4 contain a summary of our test runs on the 30, 150 and 666 node problems using three different conditioning strategies and two different linesearch tolerances. In the discussion below we isolate various aspects of the SRG algorithm that are highlighted in these results.
5.2.1. The effect of conditioning
As can be seen from Tables 5.1, 5.2 and 5.3, conditioning dramatically affects the convergence of both RG and SRG. In particular, consider the case of SRG with M = G_S. This algorithm, without conditioning, converges more slowly than RG for each of the three test problems. However, with conditioning, SRG does consistently better than RG (even under a variety of linesearch and other tolerance settings). Thus it appears that the heuristic procedure significantly improves the conditioning of G_S^{-1/2} Z^T G Z G_S^{-1/2}. Based on the observation that the more conditioning we attempted, the better SRG performed, we are confident that other, improved heuristics for conditioning can be found and predict that they will greatly improve the convergence of SRG. The heuristic settings in Tables 5.1, 5.2 and 5.3 represent a compromise that was relatively cheap to implement in terms of CPU time. Proportionately, the 30-node problem received the most conditioning and showed the greatest improvement in convergence (219 iterations without vs. 50 with conditioning). Note that convergence of RG improved substantially with some conditioning
[Tables 5.1, 5.2, 5.3 and 5.4 appear here. They summarize the test runs on the 30, 150 and 666 node problems for CS, RG, SRG (M = G_S) and SRG (M = diag{Z^T G Z}): iterations to termination, total and per-iteration CPU times, time spent conditioning, linesearch times, and function evaluations per iteration, under the conditioning parameter (MTEST) settings and the two linesearch tolerance settings discussed in the text. The tabular entries themselves are not recoverable from the scan.]
but deteriorated somewhat when the parameter MTEST was increased further (see Tables 5.1 and 5.2). This is most likely due to the fact that the heuristic was not designed to minimize the condition number of Z^T G Z. The most obvious result of these experiments, however, is that for any given setting of the conditioning parameter MTEST, SRG (M = diag{Z^T G Z}) outperforms both RG and SRG (M = G_S). It requires fewer iterations than RG even if the conditioning heuristic is not applied. For all parameter settings, there is a gradual increase in optimization time per iteration from RG to SRG (M = G_S) to SRG (M = diag{Z^T G Z}). This is to be expected since second derivative values are not stored and so the diagonal elements of M must be computed at each iteration.
5.2.2. Time spent conditioning
Tables 5.1, 5.2 and 5.3 show that conditioning adds only roughly 3% to the total time taken to reach an optimum. This low overhead is largely due to the data structures, which permit basic and nonbasic variables to be exchanged in an efficient manner.
5.2.3. Linesearch times
(Readers are cautioned that CPU times were measured in a multiprogramming environment in which a 10% variation in measured CPU times can occur.) The average linesearch time per iteration is lower for CS than for RG or SRG. This is to be expected since far fewer variables change from one iteration to the next in CS than in RG and SRG. However, the extra time spent obtaining a direction and performing a linesearch (in SRG and RG) is more than compensated for by improved rates of convergence. On the average, SRG and RG spend more time on one-dimensional searches than on actually computing a search direction. This is particularly interesting since we demand a relatively inaccurate linesearch (η = 0.2 in (5.5)). Tables 5.1, 5.2 and 5.3 show that SRG averages approximately 1.3 function evaluations per iteration. Since one function evaluation per iteration is a lower bound, one cannot gain much by relaxing η any further. Those results do, however, imply that spending more time computing a better search direction may be warranted, in order to reduce the total number of iterations. An interesting aspect of SRG (M = diag{Z^T G Z}) is that the number of function evaluations and the linesearch time per iteration are less than for SRG with M = G_S or for RG. This implies that the initial Newton step α⁰ of (5.4) is almost always a better predictor of the step to the minimum in the direction p when M = diag{Z^T G Z} is used.
5.2.4. The effect of linesearch tolerance
Experiments reported in Tables 5.1, 5.2 and 5.3 used linesearch parameter settings η = 0.2, μ = 0.5. We have also tested the SRG algorithm under a variety of other parameter settings. The qualitative results remained fairly consistent;
that is, SRG with M = diag{Z^T G Z} generally required the fewest iterations and the least CPU time, followed by SRG with M = G_S and in turn by RG. Table 5.4 displays some results obtained with parameter settings η = 0.001 and μ = 0.0001. This choice of parameters results in a relatively accurate linesearch, controlled primarily by criterion (5.5). In comparing the results in Table 5.4 with the results in Tables 5.1, 5.2 and 5.3, we note that, with the accurate linesearch, RG required more iterations to satisfy the optimality criterion than it did with the relatively inaccurate linesearch. This is also true for SRG with M = G_S on the 150 node and 666 node problems. The number of iterations required by SRG with M = diag{Z^T G Z}, however, was almost unchanged. This is some indication that the reduced gradient algorithm is more robust under this diagonal scaling than with M = G_S or M = I. As expected, the decrease in η resulted in an increase in linesearch time per iteration and in function evaluations required per iteration. The fact that the linesearch routine requires so much CPU time per iteration compared to optimization time per iteration is one of the key reasons we recommend an inaccurate linesearch.
5.2.5. Function evaluation routine
The objective function f(x) = Σ_j f_j(x_j) is convex and separable. It is not necessary to store the individual arc costs f_j, since they can be generated when needed. However, the results reported thus far were obtained with a code which does store the vector of arc costs at any current feasible point. With the individual arc costs in storage, they do not have to be re-evaluated as long as the arc flows do not change. This can save time during the linesearch routine, since the cost for an arc j does not then have to be recalculated at intermediate points if p_j = 0. For the most part the savings in linesearch time was in the neighborhood of 20% or less.
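A sketch of this caching device (Python; names are illustrative, with fj a list of per-arc cost functions):

    def trial_objective(fj, x, p, alpha, cost_cache):
        # Evaluate f(x + alpha*p) for a separable objective f(x) = sum_j f_j(x_j).
        # cost_cache[j] stores f_j(x_j) at the current feasible point, so arcs
        # with p[j] == 0 need not be re-evaluated at intermediate linesearch
        # points; their flow does not change along the direction p.
        total = 0.0
        for j in range(len(x)):
            if p[j] != 0.0:
                total += fj[j](x[j] + alpha * p[j])
            else:
                total += cost_cache[j]
        return total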
5.2.6. Storage requirements
The SRG/RG computer code does use more in-core storage than CS. Whereas CS requires 6 node-length and 3 arc-length vectors, the SRG/RG test code uses, in addition, one node-length vector (the reverse thread) and 4 arc-length vectors (the search directions, the gradients, the arc costs f_j(x), and a working vector used in the linesearch). However, both SRG and RG could work with two fewer arc-length vectors (the arc costs f_j(x) and the working vector), at the expense of an increase in linesearch time.
Acknowledgements

We are indebted to John Mulvey and Gilles Gastou for their careful reading and comments on an earlier version of this paper. Also we wish to thank Jeff Kennington for providing his convex simplex code and the Dallas test problem.
References

[1] A.I. Ali, R.V. Helgason and J.L. Kennington, "The convex cost network flow problem: A state-of-the-art survey", Technical Report OREM 78001, Southern Methodist University (January 1978).
[2] D.P. Bertsekas, "Algorithms for optimal routing of flow in networks", Coordinated Science Laboratory Working Paper, University of Illinois at Champaign-Urbana (June 1978).
[3] D.P. Bertsekas, E. Gafni and K.S. Vastola, "Validation of algorithms for routing of flow in networks", Proceedings of 1978 IEEE Conference on Decision and Control, San Diego, CA (January 1979).
[4] R.G. Bland, "New finite pivoting rules for the simplex method", Mathematics of Operations Research 2 (1977) 103-107.
[5] G.H. Bradley, G.G. Brown and G.W. Graves, "Design and implementation of large scale primal transshipment algorithms", Management Science 24 (1977) 1-34.
[6] D.G. Cantor and M. Gerla, "Optimal routing in packet switched computer networks", IEEE Transactions on Computers C-23 (1974) 1062-1068.
[7] M. Collins, L. Cooper, R. Helgason, J. Kennington and L. LeBlanc, "Solving the pipe network analysis problem using optimization techniques", Management Science 24 (1978) 747-760.
[8] L. Cooper and J. Kennington, "Steady state analysis of nonlinear resistive electrical networks using optimization techniques", Technical Report IEOR 77012, Southern Methodist University (October 1977).
[9] M. Florian, "An improved linear approximation algorithm for the network equilibrium (packet switching) problem", Publication No. 251, Département d'informatique et de recherche opérationnelle, Université de Montréal (1977).
[10] M. Frank and P. Wolfe, "An algorithm for quadratic programming", Naval Research Logistics Quarterly 3 (1956) 95-110.
[11] P.E. Gill and W. Murray, "Safeguarded steplength algorithms for optimization using descent methods", Report NAC 37, National Physical Laboratory, Teddington, Middlesex (August 1974).
[12] F. Glover, D. Karney and D. Klingman, "Implementation and computational comparisons of primal, dual and primal-dual computer codes for minimum cost network flow problems", Networks 4 (1974) 191-212.
[13] M.A. Hall, "Hydraulic network analysis using (generalized) geometric programming", Networks 6 (1976) 105-130.
[14] R.V. Helgason and J.L. Kennington, "An efficient specialization of the convex simplex method for nonlinear network flow problems", Technical Report IEOR 77017, Southern Methodist University (April 1978).
[15] D. Karney and D. Klingman, "Implementation and computational study on an in-core, out-of-core primal network code", Operations Research 24 (1976) 1056-1077.
[16] J.G. Klincewicz, "Algorithms for network flow problems with convex separable costs", Dissertation, Yale University, New Haven (1979).
[17] D.G. Luenberger, Introduction to linear and nonlinear programming (Addison-Wesley, Reading, MA, 1973).
[18] R.R. Meyer, "Two-segment separable programming", Management Science 25 (1979) 385-395.
[19] R.R. Meyer, "Algorithms for a class of 'convex' nonlinear integer programs", in: W.W. White, ed., Computers and Mathematical Programming, National Bureau of Standards Special Publication 502 (1978).
[20] B.A. Murtagh and M.A. Saunders, "Large scale linearly constrained optimization", Mathematical Programming 14 (1978) 41-72.
[21] S. Nguyen, "A mathematical programming approach to equilibrium methods of traffic assignment with fixed demands", Publication No. 138, Département d'informatique et de recherche opérationnelle, Université de Montréal (1973).
[22] R.A. Tapia, "Diagonalized multiplier methods and quasi-Newton methods for constrained optimization", Journal of Optimization Theory and Applications 22 (1977) 135-192.
Mathematical Programming Study 15 (1981) 148-176. North-Holland Publishing Company

THE SIMPLEX SON ALGORITHM FOR LP/EMBEDDED NETWORK PROBLEMS*
Fred GLOVER
University of Colorado, Graduate School of Business, Boulder, CO 80309, U.S.A.
Darwin KLINGMAN
University of Texas, Department of General Business, BEB 608, Austin, TX 78712, U.S.A.
Received 3 February 1979
Revised manuscript received 16 October 1979

This paper develops a special partitioning method for solving LP problems with embedded network structure. These problems include many of the large-scale LP problems of practical importance, particularly in the fields of energy, scheduling, and distribution. The special partitioning method, called the simplex special ordered network (SON) procedure, applies to LP problems that contain both non-network rows and non-network columns, with no restriction on the form of the rows and columns that do not exhibit a network structure. Preliminary computational results are reported for an all-FORTRAN implementation of the simplex/SON algorithm called NET/LP. The test problems are real-world models of physical distribution and scheduling systems. NET/LP has solved problems with 6200 rows and 22,000 columns in less than 3 minutes, counting all I/O, on an AMDAHL V-6 with a FORTRAN H compiler.

Key words: Linear Programming, Multi-Commodity Networks, Networks, Transportation, Distribution.
1. Introduction

The dramatic successes of the past several years in solving pure network problems [2, 3, 5, 6, 8, 19, 30, 41] have motivated consideration of methods for solving more general linear programming (LP) problems with embedded network structure. For example, in the realm of pure networks (capacitated minimum cost flow problems), the computational study [21] demonstrates that special purpose network computer codes are 150-200 times faster than the state-of-the-art LP code, APEX-III. Subsequent studies of 'singly constrained' networks (LP problems consisting of a network plus one additional side constraint) demonstrated that specialized methods also yield substantial computational advantages for problems that do not exhibit pure network structures, but which are 'almost networks'. The papers [18, 20, 31] show that these problems can be solved 25-50 times faster than APEX-III. Many practical LP problems, however, contain embedded networks with multiple side constraints and multiple side variables, and so it is extremely important to determine whether an efficient specialized

* This research was partially supported by FEA contract CR-03-70128-00 with Analysis, Research, and Computation, Inc., and by ONR Projects NR047-172 and NR047-021 with the Center for Cybernetic Studies, The University of Texas at Austin.
method can be developed for these problems. The purpose of this paper is to describe such a method, called the simplex special ordered network (SON) algorithm. The simplex SON algorithm is a primal basis partitioning method that employs special updating and labeling procedures to accelerate computations involving the network-LP interface. A preliminary FORTRAN implementation of this method solves real-world physical distribution models 25-50 times faster than APEX-III, confirming that it is possible to create a marriage of network and LP methodology that has advantages for more general problems with embedded network structure.

For definitional purposes, we refer to an LP/embedded network problem as a capacitated or uncapacitated linear program in which the coefficient matrix A can be characterized as follows. The A matrix contains m + q rows and n + p columns which are ordered and scaled such that each column of the m × n submatrix A_mn, consisting of the first m rows and n columns of A, has at most one +1, one −1, and zeros elsewhere. A major portion of the LP literature has been devoted to problems in which: (a) m = n = 0 (standard LP problems); (b) p = 0 (multicommodity networks and constrained network problems); (c) p = 0 and the submatrix A_mn contains only one non-zero entry per column (LP/generalized upper bounding (GUB) problems); (d) p = q = 0 (pure network problems).

The success with special classes of LP/embedded network problems, already noted, has led to speculations [10, 18, 21] that good results can also be obtained by extending these ideas to problems where p and q are less than n and m, respectively. Motivation for such an extension that leads to a highly efficient implementation has come from a number of the major practitioners of linear programming. In addition, the members of SHARE, and a number of industrial and governmental agencies that have large-scale LP/embedded network problems, have strongly stressed the need for such a method. Several of these individuals and groups have urged us to undertake such a development, leading to the results reported in this paper.

One of the important applications to which the simplex SON method is relevant is the national energy model PIES, developed by the Federal Energy Agency [24]. In the PIES model, q ≤ 20, m = 2500, n = 4400, and p = 4500. In general, it is our experience that most large-scale LP problems involving production scheduling, physical distribution, facility location, personnel assignment, or personnel promotion contain a large embedded network component, sometimes consisting of several smaller embedded networks. Coupling constraints (q > 0) arise, for example, from economies of scale, limitations on the total number of promotions, capacity restrictions on modes of transportation (e.g., pipelines, barges), limitations on shared resources, multiple criteria, or from combining the outputs of subdivisions to meet overall demands. Coupling columns (p > 0) arise from activities which involve different time periods (e.g., storage), production alternatives (e.g., refinery activities), or which involve
different subdivisions (e.g., assembly). For example, Agrico Chemical Fertilizer Company has physical distribution and facility location problems where p = 0, m = 6200, n = 22000 and q = 20.
1.1. History of methods for solving special classes of LP/embedded network problems

There are two basic approaches which have been employed to develop specialized techniques for the above special classes (cases (b), (c), and (d)) of LP/embedded network problems: decomposition and partitioning methods. Decomposition approaches are further characterized as price-directive or resource-directive. The papers [3, 11, 12, 13, 16, 43, 44, 45, 47] give variations of price-directive decomposition and the papers [3, 14, 34, 40, 42] give variations of resource-directive decomposition. Partitioning approaches can be divided into partitioning block diagonal structured linear problems or general partitioning to exploit embedded substructure within general linear problems. Because they share the same basic principles, we will briefly review the literature of both.

The general idea of partitioning block diagonal structured programs was originally proposed by Dantzig [14a] and, in a slightly different setting, by Charnes and Cooper [9, Vol. 2]. Later Bennett [7], Hartman and Lasdon [26], Heesterman [28], Kaul [33], and Weber and White [46] independently developed primal solution procedures of more general scope. The paper by Hartman and Lasdon [26] contains an excellent description of this approach and procedures for handling the working inverse. Further, their paper contains computational results of such an algorithm. Grigoriadis and Ritter proposed a dual method [25]. Dantzig and Van Slyke [15] then proposed their well-known GUB specialization of the primal simplex algorithm for the case where each block contains only one row. The general block diagonal procedures were further refined and specialized for multicommodity network problems by Saigal [38]. This specialization involves carrying a working basis inverse whose size need not exceed the number of saturated arcs. Hartman and Lasdon [27] developed efficient procedures for updating this working basis inverse. However, neither Saigal nor Hartman and Lasdon discuss how this procedure may be efficiently implemented. Maier [36] refined their procedures and initiated implementation discussions. Kennington et al. [1, 29, 30] streamlined the implementation procedures of Maier and conducted extensive computational testing.

Charnes and Cooper [9, Vol. 2] originally proposed partitioning to exploit embedded substructure within general linear problems in their Double Reverse Method. Bakes [4] independently proposed an analogous algorithm when p = 0. Klingman and Russell [35] developed additional specializations for exploiting pure network substructure and Hultz and Klingman [31, 32] for generalized network substructure (i.e., where each column of A_mn may have at most two
arbitrary non-zero coefficients). These papers initiated in-depth discussions of implementation techniques. Actual implementations of such algorithms for pure and generalized network structures are respectively reported in Glover et al. [20] and in Hultz and Klingman [31]. Schrage [40a, 40b] and Glover [17a] developed specializations of the simplex method to VUB and GVUB problems, which are also subsumed by the LP/embedded network problem framework. The work by McBride [37] and Graves and McBride [23] on factorization redeveloped and refined the general procedures of [9, 14a]. Further, McBride's dissertation [37] discussed specialization of these procedures for exploiting pure network substructure.
1.2. Form of the simplex SON method for LP/embedded network problems

The simplex SON method constitutes a highly efficient way to modify and implement the steps of the primal simplex algorithm for the completely general case of embedded pure network problems (where p > 0 and q > 0). The efficiency is the direct result of exploiting the pure network portion A_mn of the coefficient matrix and the network-LP interface by special labeling and updating procedures. The starting point for the algorithm, following the natural design of partitioning methods, is to subdivide the coefficient matrix into network and non-network components. By reference to this subdivision, a basis inverse compactification procedure is employed that maintains a working basis inverse, V^{-1}, whose dimension equals m + q less the rank of the basic subcolumns of A associated with A_mn. The size of V^{-1} therefore varies dynamically. This is one of the major features of this algorithm that distinguishes it from partitioning algorithms designed for constrained networks [31, 32, 35]. The basic variables not associated with V^{-1} are stored in a special graph form called the master basis tree. The development of the master basis tree and the procedures for using it to efficiently replace arithmetic operations constitute the principal contributions of this paper. We show that the operations normally performed by using the full basis inverse can instead be performed by special labeling and graph traversal techniques [5] applied to the master basis tree and its interface with V^{-1}.

The organization of the simplex SON method maintains the network portion of the basis as large as possible at each iteration, thereby enabling these labeling and list processing procedures to operate on a maximally dimensioned part of the basis. This in turn minimizes the size of V^{-1}. The resulting advantages over the standard LP implementation approach are several. First, the graph traversal operations reduce both the amount of work needed to perform the algorithmic steps and the amount of computer memory required to store essential data. Second, the algorithm orients the execution of operations in a manner that is best suited to the design of computers (making extensive use of linked list structures, pointers, and logical operations in place of
arithmetic operations.) Third, the method is less susceptible to round-off error and numerical inaccuracy. Most of the operations are performed using original problem data and only the residual portion of the basis inverse associated with V^{-1} is subject to the slower customary updating, with its greater attendant susceptibility to round-off error and numerical inaccuracy. (The graph traversal procedures also automatically eliminate checking or performing arithmetic operations on zero elements.)

2. Problem notation
The LP/embedded network problem and its dual may be stated mathematically as follows:

Primal

Minimize
    c_n x_n + c_p x_p,    (1)
subject to
    A_mn x_n + A_mp x_p = b_m,    (2)
    A_qn x_n + A_qp x_p = b_q,    (3)
    0 ≤ x_n ≤ u_n,    (4)
    0 ≤ x_p ≤ u_p.    (5)

Dual

Maximize
    w_m b_m + w_q b_q − v u_n − δ u_p,    (6)
subject to
    w_m A_mn + w_q A_qn − v ≤ c_n,    (7)
    w_m A_mp + w_q A_qp − δ ≤ c_p,    (8)
    w_m, w_q unrestricted,    (9)
    v, δ ≥ 0    (10)
where A_mn is (m × n), A_mp is (m × p), A_qn is (q × n), and A_qp is (q × p). The remaining vectors are conformable vectors whose subscripts indicate their dimensionality. Each column of the matrix A_mn contains at most one +1, one −1, and zeros elsewhere. Thus A_mn corresponds to a pure network problem. For this reason the (2) portion of the LP/embedded network problem will be referred to as node constraints or simply nodes. The x_n variables will be referred to as arc variables or simply arcs. The arcs will be further classified as ordinary arcs, which have exactly two non-zero entries in A_mn, and as slack arcs, which have exactly one non-zero entry in A_mn. In graph terminology, a simple graph G(V, E) is a finite set of vertices V and a finite set of edges E connecting the vertices. Each element of E is identified with an unordered pair of distinct elements of V. Schematically, each edge connects
two distinct vertices, which are then considered to be adjacent. If the edge set E is expanded to contain edges connecting the same two vertices (parallel edges), then G(V, E) is called a general graph. If each edge of the general graph has an implied direction, then the graph is sometimes called a digraph or directed graph, the vertices are referred to as nodes, and the edges are referred to as arcs. The underlying pure network problem A_mn defines one or more connected digraphs as follows. Each row of A_mn corresponds to a node and each column to an arc. The −1 entry in a column indicates the node where the arc begins (from node) and the +1 entry in a column indicates the node where the arc ends (to node). If a column has only one non-zero entry (a slack arc) the endpoints of the arc are incident on the same node. This representation of A_mn may consist of several disjoint connected digraphs. Because each arc is associated with a variable, it has lower and upper bounds and an objective function coefficient.
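To illustrate the column-to-arc correspondence just described, the following sketch (Python, with A_mn held as a dense NumPy array purely for illustration) recovers the arc list:

    import numpy as np

    def columns_to_arcs(A_mn):
        # Each column of A_mn has at most one +1 and one -1: the -1 row is
        # the from node, the +1 row the to node. A column with a single
        # non-zero entry is a slack arc, with both endpoints at that node.
        arcs = []
        for j in range(A_mn.shape[1]):
            col = A_mn[:, j]
            minus = np.flatnonzero(col == -1)
            plus = np.flatnonzero(col == 1)
            if minus.size and plus.size:           # ordinary arc
                arcs.append((minus[0], plus[0]))
            else:                                  # slack arc
                k = minus[0] if minus.size else plus[0]
                arcs.append((k, k))
        return arcs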
3. Basis structure
Using the standard bounded variable simplex algorithm, a basis B for the LP/embedded network problem is a matrix composed of a linearly independent set of column vectors selected from the coefficient matrix
    A = [A_mn  A_mp]
        [A_qn  A_qp].
The variables associated with the column vectors of B are considered to be basic variables x_B and all others are non-basic variables x_N at their lower or upper bound. Without loss of generality, it will be assumed that A has full row rank. Any basis B for the LP/embedded network problem will, therefore, be a nonsingular matrix of order (m + q) × (m + q). Clearly, any basis matrix B may be partitioned as follows:

            xB1  xB2
    B = [B11  B12]    (11)
        [B21  B22]

where B11 is a nonsingular submatrix of A_mn. Thus the basic variables xB1 associated with the

    [B11]
    [B21]

columns are exclusively arc variables. The basic variables xB2 associated with the

    [B12]
    [B22]

columns may also contain arc variables. (xB1 and xB2 are written above their associated components of B in (11).) Based on the indicated partitioning of (11), the basis inverse B^{-1} may be
written as follows:

    B^{-1} = [B11^{-1} + B11^{-1} B12 V^{-1} B21 B11^{-1}    −B11^{-1} B12 V^{-1}]    (12)
             [−V^{-1} B21 B11^{-1}                                       V^{-1}]
where V = B22 − B21 B11^{-1} B12. The motivation for this way of partitioning (11) is to factor out the submatrix B11 of A_mn and thereby exploit its inherent triangularity by viewing and storing B11 as a digraph. This handling of B11 has several advantages: (1) the graph contains only the nonzero components of B11, (2) any operations involving B11 may be performed by traversing the associated digraph using appropriately designed labeling techniques, (3) since B11 contains only original problem data, numerical errors are reduced. The only matrices required to generate the basis inverse, as seen from (12), are B, B11^{-1} and V^{-1}. The efficiency of generating the needed components of B^{-1} in performing the steps of the simplex method depends on the size of B11 and the techniques used to store it, the labeling procedures used with B11 to eliminate matrix multiplications involving B11^{-1}, and the procedures used to maintain the partition of B. The developments of the following sections show how to handle these considerations effectively.
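The partitioned inverse (12) is the standard block (Schur complement) formula and can be checked numerically. The following sketch is illustrative only, since the algorithm never forms B11^{-1} explicitly but replaces those operations by tree traversals:

    import numpy as np

    def partitioned_inverse(B11, B12, B21, B22):
        # Assemble B^{-1} from (12): only B11^{-1} and the working inverse
        # V^{-1}, with V = B22 - B21 B11^{-1} B12, are needed.
        B11i = np.linalg.inv(B11)
        Vi = np.linalg.inv(B22 - B21 @ B11i @ B12)
        top = np.hstack([B11i + B11i @ B12 @ Vi @ B21 @ B11i, -B11i @ B12 @ Vi])
        bottom = np.hstack([-Vi @ B21 @ B11i, Vi])
        return np.vstack([top, bottom])

For any nonsingular B whose B11 block is nonsingular, the result agrees with a direct inverse of the assembled matrix.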
3.1. Graph representation of B11

As in the case of A_mn, B11 defines a digraph. The structure of this digraph is a set of disjoint spanning trees, each of which is augmented by an arc whose from node and to node are the same node in B11. Such an arc of B11 will be called a simple loop. (This structure implies that B11 is a block diagonal matrix and each block is triangular. The blocks each consist of a spanning tree plus a simple loop. Thus, each block is a quasi-tree.) This structural property of B11 follows directly from the fact that B11 is a square nonsingular matrix and that B11 is a submatrix of A_mn. Thus each column of B11 contains at most two non-zero unit entries of opposite sign. A column of B11 corresponding to a simple loop may be of two types according to whether the associated column of A_mn is a slack arc or an ordinary arc. If the A_mn column is an ordinary arc, then the partitioning of B has split this arc between B11 and B21; that is, one of its unit entries lies in B11 and the other in B21. Because of this, the algorithm stores and uses B11 by keeping a larger digraph than the one corresponding to B11. This larger digraph, called the master basis tree, contains every node in A_mn plus another node called the master root; thus, it always contains m + 1 nodes and m arcs. The nodes of this tree that correspond to rows of B21, since they are external to the nonsingular network structure of B11, are called externalized roots (ER's). The master basis tree contains all of the ordinary arcs in B11. Simple loops in
B11 are contained in the master tree in a modified form. If the simple loop is a slack arc of A_mn, then the simple loop is replaced by an arc between the master root and its unique node. If the simple loop is an ordinary arc in A_mn it is replaced by an arc between its nodes in A_mn. Such arcs thus join ER's to nodes in B11. To complete the master basis tree each ER is connected to the master root by an externalized arc (EA). Fig. 1 graphically depicts a master basis tree.

It is important to stress that the master basis tree is a conceptual scheme designed to allow the simplex/SON algorithm to efficiently maintain the partitioning of B while keeping the B11 portion at maximum size during each iteration. This construction should not be confused with the simple model device sometimes employed in pure network settings, where a pseudo root is added for the purpose of giving each slack arc two endpoints. The connections represented by the master basis tree include both network and 'extra network' structures (mediated by externalized roots and arcs), and the rules for operating on these structures are of a very special type. To understand the function of these rules it should be noted that a multiplicity of cases must be considered when executing a basis exchange step. For example, if an arc variable is to be added to the basis, or transferred from xB2 to xB1, the endpoint(s) of the arc may consist of (1) two nodes in a single block (quasi-tree) in B11, (2) nodes in different blocks (quasi-trees) in B11, (3) a node in B21 and a node in B11, (4) two nodes in B21, (5) a single node in B11, (6) a single node in B21
[Fig. 1. A master basis tree: the externalized arcs join the master root to the externalized roots, and the remaining arcs form the quasi-tree blocks of B11. The drawing itself is not recoverable from the scan.]
(where cases (5) and (6) apply to slack arcs of A_mn). A similar multiplicity of cases applies to removing an arc from the basis, and the combinations that result from both adding and removing arcs are still more numerous. The use of the master basis tree permits all of these cases to be unified in a particularly convenient fashion. The rules characterizing the conditions for adding and deleting arcs, and specifying the appropriate restructuring of the master basis tree, are as follows.
3.2. Fundamental exchange rules

(1) An arc of xB2 can admissibly be added to the B11 portion of the basis, without deleting another, if and only if its loop in the master basis tree contains at least one EA. (Such a loop can contain at most two EA's.) The updated form of the master basis tree then occurs in the following manner: (a) Add the new arc and drop any EA from the loop. (b) Change the status of the ER formerly met by the dropped EA to that of an ordinary node, transferring its row from the B21 to the B11 portion of the basis.

(2) An arc can be deleted from B11 (removing a component of xB1) without adding another as follows: (a) Identify the node of the selected arc that is farthest from the master root. (b) Change this node into an ER node by moving this node to B21 and attaching it to the master root by a newly created EA. At the same time delete the selected arc from B11.

(3) An arc can be added to B11 and another simultaneously removed from B11 as follows: (a) If the loop in the master basis tree created by the added arc includes the arc to be dropped, then the exchange step is handled exactly as an exchange step of an ordinary network basis. (Thus no EA's are added or dropped, and no nodes alter their status as ordinary nodes or ER nodes.) (b) If the loop in the master basis tree created by the added arc does not include the arc to be dropped, then the exchange may be performed as a two-part process that applies the preceding rules 1 and 2 in either order (as long as the exchange is valid).

(4) B11 and B21 can be restructured, without adding or deleting basis arcs xB1, by an exchange step that drops any EA and adds another EA to any node of the isolated tree (excluding the master root) created by dropping the first. This step is accomplished by interchanging the ER status and ordinary node status of two nodes, which swaps their corresponding rows in B11 and B21.

It should be remarked that the EA's have a special interpretation in these rules. Since the master basis tree spans all nodes of A_mn, and always contains the same total number of arcs (including EA's), the number of EA's corresponds to the number of non-arc variables in the basis (elements of xB2) that are required to give the basis full row rank for the A_mn portion of the problem. It may also be observed that the A_mn portion of each column of B21 (the portion associated with the ER's) contains at most one non-zero entry. In
particular, this partial column is the zero vector if its associated arc (element of xB1) does not meet an ER, and otherwise contains a −1 entry if the from node of the arc meets an ER and a +1 entry if the to node of the arc meets an ER. No slack arcs of A_mn can contain entries in B21, or else their B11 columns would be zero vectors. However, it is possible for slack arcs of A_mn to have entries in B22, and also for ordinary arcs of A_mn to have both of their non-zero entries in B22 (meeting two ER's at their endpoints). In this case such a slack arc or ordinary arc, when added to the master basis tree, creates a loop that includes an EA, and thus can be moved from xB2 to xB1 by the Fundamental Exchange Rules. The validity of the Fundamental Exchange Rules is expressed in the following result.

Theorem 1. B11 is maintained as a nonsingular matrix by the addition and deletion of arcs if and only if the Fundamental Exchange Rules are applied.

Proof. The master basis tree maintains a linearly independent superstructure which, by rooting each block (quasi-tree) of B11 at an ER, and each slack arc of A_mn in B11 at the master root, assures that B11 is nonsingular. Further, it is readily verified that each operation prescribed by the Fundamental Exchange Rules preserves these structural relationships. The primary requirement is to determine that none of the possible ways of modifying B11 is overlooked. We will not trace the details of a full itemization of cases, since they are somewhat tedious, but simply remark that each of the alternatives can be directly observed to be handled correctly by the unifying construction of the master basis tree. (The reader who wishes to trace these cases in detail will be aided by reference to Fig. 2.)
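As an indication of how little work the exchange rules require, here is a minimal Python sketch of Rule 2 under the array representation of Section 3.3. The names are illustrative, and the thread, cardinality, and last-node updates (handled as in [5, 17]) are omitted:

    MASTER_ROOT = 0

    def externalize_endpoint(u, v, p, dh):
        # Rule 2: delete the basis tree arc joining nodes u and v from B11.
        # The endpoint farthest from the master root becomes an externalized
        # root (ER), re-attached to the master root by a newly created EA.
        er = u if dh[u] > dh[v] else v
        p[er] = MASTER_ROOT    # the new EA replaces the deleted arc
        dh[er] = 1             # an ER hangs directly below the master root
        return er              # this node's row moves from B11 to B21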
It may be noted that Rule 4 of the Fundamental Exchange Rules, while not directly concerned with the addition and deletion of arcs, can affect the density of B21, and indirectly the composition of V. The density of any row of B21 associated with A_mn can be minimized (reduced to a single non-zero, if any non-zeros exist) by a single application of Rule 4 to swap the status of the associated ER node with that of the last node of its subtree (i.e., swapping a row of B11 with a row of B21), using the last node function [5]. Note that by successively applying this procedure each ER could be structured so that it has only one arc of A_mn incident on it. More important and compelling is the issue of whether there exists an easily specifiable way to apply the Fundamental Exchange Rules to maximize the dimension of B11 (the number of components of xB1) or whether, due to the influence of the non-network basis structure, a particular configuration for B11 cannot be augmented except by a complex strategy of removing current arcs and adding new ones. The resolution of this issue, which relies on the fact that
[Fig. 2. Examples of the Fundamental Exchange Rules: three examples each of Rules 1 and 2, two examples of Rule 3a (the deleted arc lies on the cycle created by the added arc), one example of Rule 3b (the deleted arc does not lie on that cycle, so the two exchanges can be done in either order), and one example of Rule 4. The legend marks the master root, the externalized roots, the deleted and added arcs of each example, ER's that become ordinary nodes when the EA into them is deleted, and ordinary nodes that become ER's when an EA is connected to them. The drawings themselves are not recoverable from the scan.]
maximizing the dimension of B11 is equivalent to minimizing the number of EA's in the master basis tree, is expressed in the following result.

Theorem 2. The dimension of B11 is maximized (and the number of EA's minimized) by successively applying Rule 1 of the Fundamental Exchange Rules until no more arcs are admissible to be added by this rule.

Proof. Suppose instead a master basis tree is obtained that cannot admit the addition of any arc by Rule 1, yet that another master basis tree exists with fewer EA's. Since any tree can be transformed into any other by some sequence of pairwise arc exchanges, producing a tree at each stage, we can find some tree T1 in such a sequence which allows no swap reducing the number of EA's, and another tree T2, obtained by a swap in T1, such that a swap in T2 will reduce the number of EA's. (Any single swap that reduces the number of EA's is an application of Rule 1.) Thus, the tree with a smaller number of EA's is two swaps away from T1. But a basic result of tree exchanges is that if two trees are two swaps apart, then the swaps can be executed in either sequence, producing a tree at each step (see, e.g. [22]). This proves the theorem by contradiction.

Theorem 2, which applies either to adding incoming arcs to xB1 or transferring arcs from xB2 to xB1, also makes it possible to maintain B11 at its maximum dimension at each iteration of the primal simplex method, as will be demonstrated subsequently. For the special case in which the LP/embedded network problem has no non-network constraints (though any number of non-network variables), the following observation is useful.

Corollary 1. If the nodes of A_mn span all the rows of A (i.e., if A_qn is empty), then every basic arc variable can be included in xB1.

Proof. Any basic arc variable that cannot be included in xB1 must create a cycle that does not include an EA. This implies linear dependence in the arc variable columns of B. These columns must contain zeros in B12 and B22; otherwise their arcs would intersect an ER and their loop would contain an EA. Then B must be singular, contrary to the fact that it is a basis.

The value of this corollary is that it allows all basic network arcs automatically to be included in xB1 for LP/embedded network problems without side constraints (q = 0), avoiding the work of checking to see whether the inclusion of any particular arc is admissible.
3.3. Labeling algorithms for accelerating basis computations

Drawing on the network topology of B11 as embodied in the construction of the master basis tree, we turn now to the determination of special algorithms for
processing this master tree. The algorithms are specifically designed to carry out computations involving B^{-1} that are required in the steps of the primal simplex method (pricing out the basis, determining the representation of the incoming variable, etc.). In particular, we are concerned with identifying the most effective way to take advantage of the network structure of B11 embedded in the partitioned inverse. The principal computations of the primal simplex method that involve B11^{-1} can be segregated into three classes concerned with computing: (1) B11^{-1}G, (2) HB11^{-1}, and (3) HB11^{-1}G. By allowing H and G in each of these classes to be either a vector or a matrix (and in the case of a vector, to have either single or multiple non-zero components), it is possible to express the generic forms of all of the simplex method calculations involving B11 in the partitioned inverse. In this section we provide the procedures capable of performing these calculations in a manner that most fully exploits the structure of B11. Significantly, it turns out that the most effective list processing and labeling procedures differ in each of the several alternative cases.

The list structures and functions used in these procedures are those commonly employed in network optimization. For completeness, we include explicit descriptions of these lists. To provide a common visual frame of reference, the root node r may be viewed as the highest node in the tree with all the other nodes hanging below it. The tree is then represented by keeping a pointer list which contains for each node w (other than the root) the unique node v above node w which is connected to w by an arc. This upward pointer is called the predecessor of node w and will be denoted by p(w). Correspondingly, node w is called an immediate successor of node v. For convenience, we will assume that the predecessor of the root, p(r), is zero. Fig. 3 illustrates a tree rooted at node 1, the predecessors of the nodes, and other functions to be described subsequently. The predecessor of a node is identified in the p array. For example, the predecessor of node 16 is node 5.

Fig. 3 illustrates additional tree information expressed in terms of node functions, which are often used in computer implementation procedures for solving network problems. The first of these functions, the thread function, is denoted by t(x). This function is a 'downward' tree pointer. As illustrated in Fig. 3 by the dashed line, function t may be thought of as a connecting link (thread) which passes through each node exactly once in a top to bottom, left to right sequence, starting from the root node. For example, in Fig. 3, t(1) = 2, t(2) = 4, t(4) = 5, t(5) = 16, t(16) = 8, etc. Letting n denote the number of nodes in the tree, the function t satisfies the following inductive characteristics: (a) The set {r, t(r), t²(r), …, t^{n−1}(r)} is precisely the set of nodes of the rooted tree, where by convention t²(r) = t(t(r)), t³(r) = t(t²(r)), etc. The nodes r, t(r), …, t^{k−1}(r) will be called the antecedents of node t^k(r). (b) For each node i other than node t^{n−1}(r), t(i) is one of the nodes such that
p(t(i)) = i, if such nodes exist. Otherwise, let x denote the first node in the predecessor path of i to the root which has an immediate successor y such that y is not an antecedent of node i. In this case, t(i) = y. (c) t^n(r) = r; that is, the 'last node' of the tree threads back to the root node.

The reverse thread function, rt(x), is simply a pointer which points in the reverse order of the thread. That is, if t(x) = y, then rt(y) = x. Fig. 3 also lists the reverse thread function values. The depth function, dh(x), indicates the number of nodes in the predecessor path of node x to the root, not counting the root node itself. If one conceives of the nodes in the tree as arranged in levels where the root is at level zero, and where all nodes 'one node away' from the root are at level one, etc., then the depth function simply indicates the level of a node in the tree. (See Fig. 3.) The cardinality function, c(x), specifies the number of nodes contained in the subtree associated with node x. By the nodes in the subtree associated with node x, we mean the set of all nodes w such that the predecessor path from w to the root contains x. (See Fig. 3.) The last subtree node function, f(x), specifies the node in the subtree of x that is encountered last when traversing the nodes of this subtree in 'thread order'.
[Fig. 3. A tree rooted at node 1, together with its node-length function arrays: predecessor p(x), node potential d(x), thread t(x), reverse thread rt(x), depth dh(x), cardinality c(x), and last node in subtree f(x). The numeric entries are not recoverable from the scan.]
More precisely, f(x) = y where y is the unique node in the subtree of x such that t(y) is not also a node in the subtree of x. (See Fig. 3.) Note that both the domain and the range of each of the above discrete functions consist of a subset of the problem nodes and thus are independent of the number of arcs in the LP/embedded network problem. Since the master basis tree contains m + 1 nodes, a one-dimensional array of size m + 1, called a node-length array, is allocated to each function during computer implementation. The procedures for updating the values of the functions when the tree is reconfigured are discussed in [5, 17].

In computing a particular vector x = B11^{-1}G, the components of x are associated with columns of B11 via the equation B11 x = G, and are thus associated with arcs. Similarly, the components of the vector W = HB11^{-1} are associated with rows of B11 via the equation WB11 = H and are thus associated with nodes. Consequently, in computing these vectors we will let:

x_k = the component of x associated with the basis arc k (whose endpoints are nodes k and p(k)),
W_k = the component of W associated with node k.

These latter definitions will be modified only to allow x and W to represent matrices rather than vectors, in which case each row of x will be thought of as a row of variables associated with the corresponding arc, and each column of W as a column of variables associated with the corresponding node. Finally, we will refer to a basis arc as conformable if its A_mn direction agrees with its basis predecessor orientation (i.e., the from node of the arc lies at the predecessor node and the to node of the arc lies at the successor node), and refer to the arc as nonconformable otherwise.
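A minimal sketch of these node-length arrays on a four-node tree (illustrative data, not the tree of Fig. 3), together with the thread-order subtree traversal that the B algorithms below rely on:

    # Rooted tree: node 1 is the root; nodes 2 and 3 hang below 1, node 4
    # below 2. Index 0 is unused so node numbers index the arrays directly.
    p  = [0, 0, 1, 1, 2]    # predecessor: p(1) = 0 by convention
    t  = [0, 2, 4, 1, 3]    # thread: 1 -> 2 -> 4 -> 3 -> back to 1
    dh = [0, 0, 1, 1, 2]    # depth: root at level zero
    c  = [0, 4, 2, 1, 1]    # cardinality: sizes of the subtrees
    f  = [0, 3, 4, 3, 4]    # last node of each subtree in thread order

    def subtree_in_thread_order(k):
        # Visit the subtree of node k by following the thread from k until
        # the last subtree node f(k) is reached.
        nodes, i = [k], k
        while i != f[k]:
            i = t[i]
            nodes.append(i)
        return nodes            # e.g. subtree_in_thread_order(2) == [2, 4]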
A. Algorithms for computing x = B11^{-1}G (solving for x in the equation B11 x = G)
A1. G is a column vector with one non-zero element. Let G_0 equal the non-zero element G_k of G, associated with node k.
A1.1. Let q = p(k). Let x_k = G_0 if the arc is conformable, and let x_k = −G_0 if the arc is nonconformable.
A1.2. If q is a root (either an ER or the master root), stop. All non-zero elements of x have been assigned the proper value. Otherwise, let k = q and return to A1.1.
A2. G is a column vector with two non-zero elements. Use the depth or cardinality function in conjunction with the predecessor function so that the trace from the two nodes that correspond to the non-zeros of G can be interrupted where their paths intersect.
A2.1. Apply A1 to each non-zero, independently. Stop by the criterion of A1.2 if the paths do not intersect first.
A2.2. If the paths intersect before either meets a root, temporarily stop at step
A1.2 when q is the intersection node, and redefine G_0 to be the sum of the two non-zeros of G. If G_0 is zero, stop. Otherwise, continue A1 to its termination.
A3. G is a column vector with more than two non-zero elements.
Option 1. Apply a generalized form of A2, not proceeding beyond any path intersection until all paths meeting at that point have been traced to it. The value of G_0 at that point becomes the sum of the non-zeros on the starting nodes of the paths meeting there. (This can be useful if most of the paths are known to intersect only at roots.)
Option 2. (Generally preferred.) Let G_i denote the ith element of G (whether non-zero or not). Using the last node function, identify the last node of the master tree, and designate this node to be node k.
A3.1. If p(k) is not a root, execute step A1.1 with G_0 replaced by G_k, and let G_q = G_q + G_k. (A3.1 can be skipped if G_k = 0, thus calculating only non-zero x_k values.)
A3.2. Let k = rt(k). If k is the master root, stop. Otherwise, return to A3.1. (Note that in this procedure one may avoid checking whether p(k) is a root in A3.1 by allowing 'fictitious' variables x_k to be associated with EA's.)
A4. G is a matrix (and hence x is also a matrix). Let G_i denote the ith row of G and let x_i denote the ith row of x. Then apply algorithm A3, Option 2.
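A sketch of algorithm A1 (Python; arc k is taken to join node k to its predecessor p(k), conformable[k] records whether its A_mn direction agrees with that orientation, and all names are illustrative):

    def algorithm_A1(k, Gk, p, conformable, is_root):
        # Solve B11 x = G when G has a single non-zero Gk at node k: every
        # arc on the predecessor path from k to a root carries +/- Gk.
        x = {}
        while True:
            x[k] = Gk if conformable[k] else -Gk   # step A1.1
            q = p[k]
            if is_root(q):      # step A1.2: q is an ER or the master root
                return x        # all non-zero components of x are assigned
            k = q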
B. Algorithms for computing W = HB11^{-1} (solving for W in the equation WB11 = H)

B1. H is a row vector with one non-zero element. Let H_k denote the non-zero element of H, which occurs in the kth position, associated with the arc k of the basis tree joining nodes k and p(k).
B1.1. Let H_0 = H_k if the arc is conformable and let H_0 = −H_k otherwise. Let i = k.
B1.2. Let W_i = H_0.
B1.3. If, by the last node function, node i is the last node of the subtree headed by node k, stop. All non-zeros of W have been generated (with the value H_0). Otherwise, let i = t(i) and return to B1.2.
B2. H is a row vector with more than one non-zero element. If the subtrees containing the arcs associated with the non-zeros of H are known to be disjoint, apply B1 independently to each subtree. Otherwise, select any node k of the master tree (possibly an ER or the master root) which heads a subtree whose arcs 'contain' all non-zeros of H. Definitionally, for the following, let W_i = 0 if node i represents a root node (the master root or an ER), and let H_i = 0 if arc i represents an externalized arc (which may always be regarded as conformable).
Option 1. Let W_k = 0 and i = t(k). Let k* denote the last node of the subtree headed by k.
B2.1. Let q = p(i). Then let

    W_i = W_q + H_i if arc i is conformable,
    W_i = W_q − H_i if arc i is nonconformable.
B2.2. Let i = t(i). If i = k*, stop. All non-zero elements of W have been determined. Otherwise, return to B2.1.
Option 2. (Generally preferable if H has few non-zeros.) Define two lists, a subtree header list SH(j) and a last node list LN(j), j = 1, …, J. To begin, let SH(1) = k and let LN(1) = k*, the last node in the subtree headed by k. Let W_k = 0 and i = t(k).
B2.3. Let H_0 = H_i if arc i is conformable and let H_0 = −H_i otherwise.
B2.4. Let W_i = H_0. If i = k*, go to B2.7.
B2.5. Let i = t(i). If H_i = 0, go to B2.4. Otherwise, let H_0 = H_0 + H_i if arc i is conformable and let H_0 = H_0 − H_i if arc i is nonconformable.
B2.6. If i = k*, let W_i = H_0 and go to B2.7. Otherwise, let SH(J) = i if the last node in the subtree headed by i is k* (updating the header list to name the most recent node whose last node is k*). Otherwise, let J = J + 1, let SH(J) = i, let k* denote the last node in the subtree headed by i, and go to B2.4.
B2.7. If J = 1, stop. All non-zero components of W have been determined. Otherwise, let J = J − 1, let k = SH(J), let k* = LN(J) and let H_0 = W_k. Then go to B2.5.
B3. H is a matrix (hence W is also a matrix). Let H_i denote the ith column of H and let W_i denote the ith column of W. Then apply algorithm B2, Option 1 or Option 2. (If many of the columns of H are 0 vectors, Option 2 is preferable. If, further, non-zero columns of H have few non-zero elements, then a variant of Option 2 can be applied, particularly by using pointers that name only non-zero elements.)

Algorithms A1, A2, B1, and B2 Option 1 are direct counterparts of algorithms already used in network optimization methods. Algorithms A3, A4, B2 Option 2, and B3 are new, designed to handle the special requirements of the LP/embedded network problem, with the same types of computational advantages that have derived from their simpler predecessors. One of the principal features shared by all of these algorithms, which provides a primary basis for exploitation by the method of the next section, is given in the following result.

Theorem 3. The value assigned to a variable at any iteration of any of the preceding algorithms is the correct solution value for the indicated equation and is not modified at any subsequent iteration.
Proof. The fact that a solution value, once assigned, is not changed thereafter,
can be ascertained by tracing the steps of the algorithms. That this value is correct, in the case of the new algorithms, follows from the structural characteristics of the master basis tree already established, and from the list processes employed in these algorithms (see e.g. [5]).

The usefulness of the 'once and for all' determination of values indicated in Theorem 3 manifests itself in the simplex SON optimization procedure by providing the basis for three additional algorithms which, together with the algorithms preceding, yield the ability to accelerate the calculation of more complex matrix products. The basis of these algorithms lies in identifying the appropriate manner to accumulate a matrix product at intermediate stages, conserving time and memory. The form of these algorithms is as follows.
C. Algorithms for computing Z = HB11^{-1}G + Z_0
C1. G is a column vector and H is a matrix. Begin with Z equal to the column vector Z_0.
C1.1. Compute x = B11^{-1}G by the appropriate member of the A algorithms (depending on the structure of G).
C1.2. As each non-zero element x_i of the column vector x is computed, let Z = Z + x_i H_i, where H_i is the ith column of H.
C2. H is a row vector and G is a matrix. Begin with Z equal to the row vector Z_0.
C2.1. Compute W = HB11^{-1} by the appropriate member of the B algorithms.
C2.2. As each non-zero element W_i of the row vector W is computed, let Z = Z + W_i G_i, where G_i is the ith row of G.
C3. G and H are both matrices. Begin with Z equal to the matrix Z_0.
Option 1.
C3.1. Compute x = Bh~G by algorithm A4. C3.2. As each non-zero row x~ of x is computed, let Zk = Zk + Hkixi
for each row Zk of Z where Hki is the kth element of the ith column Hi of H.
Or alternatively, Zj = Zi + H~x~j for each column Z~ of Z where x~i is the jth component of xi. Option 2. C3.3. C o m p u t e W = H B h ~ by algorithm B3.
C3.4. As each non-zero column W_i of W is computed, let Z_k = Z_k + W_{ki} G_i for each row Z_k of Z, where W_{ki} is the kth element of W_i and G_i is the ith row of G. Or alternatively, Z_j = Z_j + W_i G_{ij} for each column Z_j of Z, where G_{ij} is the jth component of G_i.
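The accumulation pattern in these steps is easy to exhibit. The following sketch of C1 uses dense numpy arrays and a generic solve as a stand-in for the appropriate A algorithm; the names are illustrative. The essential point is C1.2: only the non-zero elements x_i of X trigger a column update of Z.

```python
import numpy as np

def algorithm_C1(B11, G, H, Z0):
    """Sketch of C1: Z = H B11^{-1} G + Z0, with G a column vector.

    np.linalg.solve stands in for the structured A algorithms; the
    accumulation below is the content of step C1.2.
    """
    Z = Z0.copy()
    x = np.linalg.solve(B11, G)       # X = B11^{-1} G (stand-in for A)
    for i in np.flatnonzero(x):       # only non-zero x_i contribute
        Z += x[i] * H[:, i]           # Z = Z + x_i H_i
    return Z

B11 = np.array([[1.0, 0.0], [1.0, 1.0]])
print(algorithm_C1(B11, np.array([2.0, 2.0]), np.eye(2), np.zeros(2)))
```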
The validity of these algorithms follows directly from the validity of the A algorithms and B algorithms, and from the admissible options for organizing matrix computations. A special consequence of these methods, which is uniform throughout all calculations of X, W, or Z, is as follows.

Corollary 2. The most efficient versions of the A, B, and C algorithms, when either G or H is a matrix, result by storing G by row and storing H by column.

Proof. Immediate from the structure of the algorithms.

This outcome has noteworthy implications for the unified implementation of the A, B, and C algorithms in the simplex SON procedure. In particular, the identity of G in the simplex SON procedure (when G is a matrix) is characteristically B_12, and the identity of H (when H is a matrix) is characteristically B_21 or an augmented form of B_21. Thus, by Corollary 2, it is preferable to store B_12 by row and B_21 by column. This constitutes a departure from the methodology of previous compact basis procedures. That is, the result that the most effective computation arises by storing different components of B differently provides a new organizational strategy for compact basis methods. With these foundations, we are now ready to specify the complete form of the simplex SON optimization method.
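In present-day terms, Corollary 2 is a prescription for mixing row-major and column-major sparse storage within one basis. A small scipy illustration of the principle (the matrices are random stand-ins for B_12 and B_21):

```python
import scipy.sparse as sp

# Corollary 2: store G (characteristically B12) by row and H
# (characteristically B21) by column, so that the row access G_i in
# C2.2 and the column access H_i in C1.2 are both contiguous.
B12 = sp.random(50, 8, density=0.1, random_state=0).tocsr()   # by row
B21 = sp.random(8, 50, density=0.1, random_state=1).tocsc()   # by column

G_i = B12[3, :]    # cheap row slice in CSR
H_i = B21[:, 3]    # cheap column slice in CSC
print(G_i.nnz, H_i.nnz)
```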
4. Algorithmic steps

This section describes in detail the basic simplex operations as they pertain to the LP/embedded network problem. These operations include initialization, checking for optimality, finding the representation of the vector entering the basis, the basis exchange step, and calculating updated dual variable values. As noted in Section 3, it will be assumed that the basis is partitioned as in (11), that B_11 is stored using a master basis tree, and that V^{-1} is the only portion of the basis inverse that is being kept.
4.1. Initialization

An initial basis B for the LP/embedded network problem can be obtained by selecting a set of variables whose columns in A are linearly independent and then augmenting these columns by appending slack or artificial variables to the problem so as to satisfy the constraints (2) and (3). Columns for artificial variables whose unit entries occur in the first m rows can be treated as an augmentation of the A_mn matrix. Given an initial basis, Fundamental Exchange Rule 1 can be used to partition the basis so that the dimensionality of B_11 is maximum. Next, V = B_22 - B_21 B_11^{-1} B_12 is computed by algorithm C3, letting Z_0 = B_22, H = -B_21, and G = B_12. V^{-1} is then calculated. Once a basis has been selected and partitioned, the complementary slackness conditions can be used to obtain dual variable values which satisfy:
(w_m, w_q) [B_11 B_12; B_21 B_22] = (c_B1, c_B2)   (13)
where c_B1 and c_B2 are the vectors of objective function coefficients associated with x_B1 and x_B2. In expanded form, (13) becomes:
w_m B_11 + w_q B_21 = c_B1,   (14)

w_m B_12 + w_q B_22 = c_B2.   (15)
Eqs. (14) and (15) may be rewritten as follows:
w_m B_11 = c_B1 - w_q B_21,   (16)

w_q (B_22 - B_21 B_11^{-1} B_12) = c_B2 - c_B1 B_11^{-1} B_12.   (17)

Noting that V = B_22 - B_21 B_11^{-1} B_12, (17) can be stated as:

w_q = (c_B2 - c_B1 B_11^{-1} B_12) V^{-1}.   (18)
w_q could be recomputed at future iterations using (18) and algorithm C2 to find -c_B1 B_11^{-1} B_12. However, as q (the number of constraints in (3)) increases, the number of operations involved in this process becomes prohibitive. Thus, for large q, it is better to treat w_q as an extra row of V^{-1} and update it at each iteration. Given an initial value of w_q, (16) provides an excellent framework for the calculation of w_m. Once the right-hand side of (16) has been calculated, w_m may be computed by algorithm B2.
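For orientation, the whole initialization can be written down directly from these formulas. The dense sketch below uses generic numpy solves where the paper prescribes algorithms C3, C2, and B2 (the tree and list machinery is the entire point of those algorithms; this is only a check on the algebra):

```python
import numpy as np

def initialize(B11, B12, B21, B22, cB1, cB2):
    """Dense sketch of the initialization: V, then w_q by (18), w_m by (16)."""
    B11_inv_B12 = np.linalg.solve(B11, B12)     # B11^{-1} B12
    V = B22 - B21 @ B11_inv_B12                 # computed by C3 in the paper
    wq = np.linalg.solve(V.T, cB2 - cB1 @ B11_inv_B12)   # (18)
    wm = np.linalg.solve(B11.T, cB1 - wq @ B21)          # (16), via B2
    return V, wq, wm
```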
4.2. Determination of optimality

Let

P_j = [P_{m,j} ; P_{q,j}]
denote the jth column vector of the coefficient matrix A, where P_{m,j} is associated with constraints (2), and P_{q,j} is associated with constraints (3). Optimality of both the primal and dual solution occurs when the dual solution (as well as the primal solution) is feasible. A determination of this can be made by first calculating the updated objective coefficients c*_j associated with each primal column vector as follows:

c*_j = c_j - (w_m P_{m,j} + w_q P_{q,j}).   (19)
Dual feasibility is achieved when the following conditions are satisfied for all j:

c*_j ≥ 0,  x_j = 0,
c*_j = 0,  x_j basic,      (20)
c*_j ≤ 0,  x_j = u_j.

If any one of the conditions in (20) is not satisfied, then the associated primal column vector may be selected to enter the basis.
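The test (19)-(20) translates directly into code. A dense sketch with illustrative names (the tolerance and the data layout are assumptions of the sketch, not the paper's):

```python
import numpy as np

def find_entering(c, Pm, Pq, wm, wq, x, u, is_basic, tol=1e-9):
    """Sketch of (19)-(20): return a column violating dual feasibility,
    or None if the current solution is optimal.

    Pm and Pq hold the columns P_{m,j} and P_{q,j} for constraints (2)
    and (3); x and u are the primal values and upper bounds.
    """
    c_star = c - (wm @ Pm + wq @ Pq)              # (19)
    for j in range(len(c)):
        if is_basic[j]:
            continue                              # c*_j = 0 holds by (20)
        if x[j] == 0 and c_star[j] < -tol:        # wants c*_j >= 0
            return j
        if x[j] == u[j] and c_star[j] > tol:      # wants c*_j <= 0
            return j
    return None
```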
4.3. Finding the representation of the entering vector

Let

P_s = [P_{m,s} ; P_{q,s}]

denote the column vector selected to enter the basis matrix. The representation
α = [α_B1 ; α_B2]

of P_s in terms of B must be computed so that the vector to leave the basis can be determined. This computation involves solving the following system of equations:

α_B1 = B_11^{-1} P_{m,s} + B_11^{-1} B_12 V^{-1} B_21 B_11^{-1} P_{m,s} - B_11^{-1} B_12 V^{-1} P_{q,s},   (21)

α_B2 = V^{-1} (-B_21 B_11^{-1} P_{m,s} + P_{q,s}).   (22)
Substituting α_B2 into (21) yields

α_B1 = B_11^{-1} (P_{m,s} - B_12 α_B2).   (23)
It is thus efficient to first calculate α_B2 and then to use this result to find α_B1. To compute α_B2, first compute the quantity within the parentheses of (22) by algorithm C1 and then multiply by V^{-1}. To compute α_B1, use algorithm A3. Using this method, a ratio test should be performed immediately on α_B2 before the computation of α_B1 is made, since degenerate pivots occur frequently in network problems and, therefore, unnecessary computations can be avoided.
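In dense form the prescribed order of computation reads as follows; np.linalg.solve stands in for algorithms C1 and A3, and the early ratio test is indicated by a comment only:

```python
import numpy as np

def entering_representation(B11, B12, B21, V_inv, Pms, Pqs):
    """Sketch of (22)-(23): alpha_B2 first, then alpha_B1."""
    inner = Pqs - B21 @ np.linalg.solve(B11, Pms)   # parenthesized term of (22)
    alpha_B2 = V_inv @ inner                        # (22)
    # As the text prescribes, a ratio test on alpha_B2 belongs here;
    # frequent degenerate pivots often make the next step unnecessary.
    alpha_B1 = np.linalg.solve(B11, Pms - B12 @ alpha_B2)   # (23)
    return alpha_B1, alpha_B2
```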
4.4. The basis exchange step

Let P_r
denote the vector selected to leave the basis. This column is identified by the minimum ratio calculation. The updated form of this column, B^{-1} P_r, is a unit vector whose non-zero entry occurs in the pivot row. To execute the basis exchange step, it is necessary to identify the segment of this row in B^{-1} that affects the updating of V^{-1}. There are two cases:

(1) The outgoing variable is an element of x_B2. The relevant segment of the pivot row is simply the row of V^{-1} that corresponds to the outgoing variable.

(2) The outgoing variable is an element of x_B1. In this case, the pivot row segment that affects the updating of V^{-1} does not lie in V^{-1} itself, but is contained in the upper right quadrant of the partitioned B^{-1}. Consequently, from (12), this row segment has the form

-e_i B_11^{-1} B_12 V^{-1}   (24)
where e_i is the unit row vector with a 1 in the position corresponding to the row of B^{-1} associated with the outgoing variable. The computation of (24) may be accomplished by first computing -e_i B_11^{-1} B_12, using algorithm C2 (and algorithm B1 in the initial step). The result is then multiplied by V^{-1}. Once the appropriate pivot row segment is thus determined, the updating of V^{-1} (and of w_q) proceeds in the customary manner by a pivot step restricted to this portion of B^{-1}. The final type of operation of the simplex SON method is the modification of V^{-1} by the addition or deletion of a row or column.
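For case (1), where the pivot row is already a row of V^{-1}, the restricted pivot is the ordinary explicit-inverse update confined to that block. A dense sketch follows; the treatment of w_q as an extra row updated with the entering column's reduced cost c*_s is my reading of the text, stated here as an assumption:

```python
import numpy as np

def restricted_pivot(V_inv, wq, alpha_B2, r, c_star_s):
    """Pivot V^{-1} on row r (case (1): outgoing variable in x_B2).

    alpha_B2 is the x_B2 segment of the entering column's representation;
    wq is carried along as an extra row of V^{-1} (assumption: its row
    operation uses the reduced cost c*_s of the entering column).
    """
    pivot_row = V_inv[r, :] / alpha_B2[r]
    for k in range(V_inv.shape[0]):
        if k != r:
            V_inv[k, :] -= alpha_B2[k] * pivot_row
    V_inv[r, :] = pivot_row
    wq -= c_star_s * pivot_row
    return V_inv, wq
```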
4.5. Changing the dimensionality of V^{-1}

The dimensionality of V^{-1} is determined by the dimensionality of x_B2 (hence also of x_B1), and this in turn rests on the application of the Fundamental Exchange Rules. As previously noted, the initial composition of x_B1 and x_B2 can be determined by successive application of Rule 1 to maximize the dimension of B_11, and consequently to minimize the dimension of V^{-1}. Once the simplex SON method begins with this condition satisfied, Theorems 1 and 2 imply that the basis exchange step can maintain this condition at each iteration by the transfer of at most one element between x_B1 and x_B2 (depending on the configuration of the master basis tree, which may be modified by the addition of an arc corresponding to the incoming variable or by the deletion of an arc corresponding to the outgoing variable). In particular, by applying Rule 1, 2, or 3 of the Fundamental Exchange Rules to the incoming and outgoing variables (if one or both of them correspond to arcs), and by applying Rule 1 to a potential transfer variable (an element of x_B2 corresponding to an arc) or, if appropriate, applying Rule 3 to the transfer variable and the outgoing variable, the resulting reconfiguration of the master basis tree will automatically maximize the dimension of B_11 and minimize the dimension of V^{-1}. In each case, V^{-1} will be modified by the addition or deletion of at most one row and at most one column (in each combination that leaves V^{-1} square). The objective, then, is to show how this modification of V^{-1} can be brought about.
The deletion of a row or column of V^{-1} may be accomplished in a straightforward manner. Therefore we address the operations involved in the addition of such a vector. (In each case, the additions should take place before the execution of the pivot steps, allowing the pivot step to determine their newly updated forms.)

Adding a row to V^{-1}. The only row that may be added to V^{-1} is that of the pivot row, and then only if it does not already lie in V^{-1} (i.e., only if the outgoing variable is an element of x_B1). The way to identify and calculate this updated row is by (24). Note that the addition of a row to V^{-1} does not entail any additional computation, since in this case the form of the pivot row must be determined in any event.

Adding a column to V^{-1}. Any column of B^{-1} not already overlapping V^{-1} must lie in the left half of the partitioned basis inverse (12). Consequently, the updated and original forms of this column are given, respectively, by

θ = [θ_B1 ; θ_B2]   and   e_j = [e_k ; 0]

where the unit elements of e_k and e_j occur in the same position. With this identification, we may compute θ from the explicit representation of the partitioned basis inverse (12) in precisely the manner used earlier to compute the basis representation α of the column for the incoming variable. In particular, the form of θ is obtained by substituting θ_B1, θ_B2, e_k and 0, respectively, for α_B1, α_B2, P_{m,s} and P_{q,s} in (22) and (23). This yields

θ_B1 = B_11^{-1} (e_k - B_12 θ_B2)   (25)

and

θ_B2 = -V^{-1} B_21 B_11^{-1} e_k.   (26)

The calculation of θ_B2 proceeds by first applying algorithm C1 to compute B_21 B_11^{-1} e_k (and using algorithm A1 in the initial step), followed by multiplication by V^{-1}. Then, unless both a row and a column are simultaneously to be added to V^{-1}, it is unnecessary to compute θ_B1, since θ_B2 is the portion of B^{-1} required. On the other hand, if a row is to be added to V^{-1} as well as a column, it is necessary to compute the pivot row element of θ_B1. Thus, letting e_i denote the unit row vector as previously defined in the generation of the updated pivot row, this element of θ_B1 is given by

e_i B_11^{-1} (e_k - B_12 θ_B2).   (27)
This can be computed by algorithm C2, using algorithm B1 in the first part, and noting that the matrix stipulation for G may be replaced by the stipulation that G is a column vector. The use of the A, B, and C algorithms in these calculations materially accelerates the steps of the simplex SON method, following the indicated prescriptions.
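A dense rendering of (25)-(27), again with generic solves in place of algorithms C1/A1 and C2/B1, and computing the single needed element of θ_B1 by extracting it from a full solve (the structured algorithms avoid exactly this):

```python
import numpy as np

def column_addition_segments(B11, B12, B21, V_inv, k, i=None):
    """Sketch of (25)-(27) for adding a column (and possibly a row) to V^{-1}.

    k indexes the unit vector e_k; i, if given, is the pivot-row index
    for which the single element of theta_B1 in (27) is required.
    """
    ek = np.zeros(B11.shape[0])
    ek[k] = 1.0
    theta_B2 = -V_inv @ (B21 @ np.linalg.solve(B11, ek))      # (26)
    if i is None:
        return theta_B2, None                 # theta_B1 not needed
    theta_B1 = np.linalg.solve(B11, ek - B12 @ theta_B2)      # cf. (25)
    return theta_B2, theta_B1[i]              # (27): e_i picks one element
```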
5. Implementation and computational testing

5.1. Implementation

We have implemented a preliminary FORTRAN version of the simplex SON method for capacitated LP problems where A_mp = 0. This in-core code, called NET/LP, employs super-sparsity (i.e., it stores only the unique non-zero elements of A) and keeps V^{-1} in product form. NET/LP also employs the predecessor, thread, reverse thread, cardinality, and last node functions. NET/LP has been run on a CDC 6600, a CYBER-74, and an AMDAHL V-6.

The computer memory space utilized by NET/LP depends on several factors, including (i) the computer, (ii) the number of unique non-zeros in the original problem, and (iii) the number of unique non-zeros in the ETA file. Consequently, it is impossible to specify in simple terms an exact formula for the amount of memory required by the program. An approximate formula for IBM 370 and AMDAHL computers is 46 bytes per network row, 40 bytes per LP row, 12 bytes for each arc variable (i.e., for each element of x_n) plus 4 bytes for each non-zero coefficient in its LP rows, and 8 bytes for each non-arc variable (i.e., for each element of x_p) plus 4 bytes for each non-zero coefficient in its LP rows. In addition, NET/LP keeps a variably dimensioned working space array which contains the pool of unique non-zero elements of A and V^{-1}.

NET/LP first optimally solves the network portion of the LP/embedded network problem. This optimal network basis is then augmented by appropriate slack or artificial variables to form a starting basis for the entire problem. During the solution of the network problem, NET/LP employs the modified row minimum start [19] and the standard Phase I-II method for handling artificial variables. If the optimal network basis is augmented by artificial variables, NET/LP minimizes the sum of infeasibility in Phase I for the full LP/embedded network problem.
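The storage formula can be packaged as a quick estimator. The per-variable non-zero counts are problem data the text does not fix, so the figures in the example call are hypothetical:

```python
def netlp_bytes(network_rows, lp_rows, arcs, arc_lp_nonzeros,
                non_arcs, non_arc_lp_nonzeros):
    """Approximate NET/LP core storage on IBM 370/AMDAHL machines,
    excluding the variably dimensioned pool for A's unique non-zeros
    and V^{-1} (per-item figures as stated in the text)."""
    return (46 * network_rows + 40 * lp_rows
            + 12 * arcs + 4 * arc_lp_nonzeros
            + 8 * non_arcs + 4 * non_arc_lp_nonzeros)

# First Agrico model of Table 1, with hypothetical LP-row non-zero counts:
print(netlp_bytes(3179, 20, 15831, 31662, 40, 80))
```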
5.2. Computational testing

In order to evaluate the computational merits of NET/LP, we tested the following three classes of problems: (i) pure network problems, (ii) GUB/LP problems, and (iii) embedded network/LP problems where A_mp = 0 and m is large relative to q.

The first class, pure network, was selected in order to determine the efficiency of NET/LP relative to a state-of-the-art special purpose code for solving network problems. To conduct our comparison we used the network code PNET-I of [19] and modified its pivot criterion to correspond to that of NET/LP. Our analysis disclosed that PNET-I is only twice as fast as NET/LP on pure network problems. This is surprising in view of the fact that PNET-I is 150-200 times faster than commercial LP codes such as APEX-III and MPSX-370. However, since NET/LP is designed to exploit network structures, it is
relevant to identify the reasons for the difference in speed between NET/LP and PNET-I. These reasons are as follows:

(1) NET/LP uses double precision floating point arithmetic while PNET-I uses integer arithmetic.

(2) NET/LP, due to its use of super-sparsity and its ability to solve any general LP problem, stores the original problem data in a more complex manner than PNET-I. Consequently, NET/LP requires more time to access the original problem during basis exchanges and pricing operations.

(3) During the solution of the network problem, NET/LP uses the standard Phase I-II method, while PNET-I uses the BIG-M method, which is substantially more efficient for pure network problems.

Because of these factors, the difference in solution speed between NET/LP and PNET-I is much less than might be expected. In general, we found that NET/LP is able to solve network problems with 2000 nodes and 9000 arcs in less than 15 seconds using a FORTRAN G compiler on an AMDAHL V-6.

The second class of problems, GUB/LP problems, was selected because generalized upper bound constraints can be viewed as a very simple form of network constraints. Further, since the GUB feature has been dropped from most of the major commercial LP codes, we felt that some evaluation of NET/LP on GUB problems would be of interest to practitioners. Our test problems were furnished by a major airplane manufacturer. The GUB portion represents the assignment of plane-types to routes. The typical problem contained 80 GUB rows, 14 non-network rows, and 130 arc variables. On these small problems NET/LP was four times faster than MPSX-370.

The third class of problems is the one which NET/LP is specifically designed to solve. Most of our computational experience with this problem class is for real problems which were solved by Agrico Chemical Company. These problems involve the determination of optimal production and distribution schedules, and were solved on an AMDAHL V-6 using a FORTRAN G compiler. Unfortunately, it was not possible to compare NET/LP on these problems to another code because of the proprietary nature of the problem data. (In addition, it is difficult to obtain free computer time on an IBM 370/168 or an AMDAHL V-6 for the purpose of benchmarking against MPSX, MPSX-370, or MPS-III.) Table 1 contains typical solution statistics on Agrico's three largest product models.

Subsequent to this testing, Agrico acquired a FORTRAN H compiler. Agrico has determined that the FORTRAN H compiler reduces total run time, including all I/O, by approximately 45%. Consequently, the times in Table 1 would probably be reduced by 45% using the FORTRAN H compiler. Furthermore, Agrico's comparison of PNET-I and NET/LP using the H compiler indicates that NET/LP is only 20% slower than PNET-I.

In order to provide some comparison of NET/LP to a commercial LP code on the third class of problems, we solved one randomly generated problem on the
Table 1
Solution statistics on NET/LP*

  Network rows ≡ m   Non-network rows ≡ q   Arcs ≡ n   Non-arcs ≡ p   Total time**   Total pivots
  3179               20                     15 831     40             103 seconds    10 248
  3442                6                     21 898     12             180 seconds    17 817
  6192               10                     21 939     20             351 seconds    10 345

* AMDAHL V-6 with FORTRAN G compiler.
** Including all I/O processing.
Table 2
NET/LP vs. APEX-III

  Network rows ≡ m   Non-network rows ≡ q   Arcs ≡ n   Non-arcs ≡ p   NET/LP   APEX-III
  1000               1                      5000       1              $3.11*   $210.68**

* AMDAHL V-6 with FORTRAN G compiler.
** Including all I/O processing.
CYBER-74 computer using NET/LP and APEX-III. Table 2 contains the problem specifications and indicates that this problem can be solved at least 70 times less expensively with NET/LP than with APEX-III.

While the above computational testing is quite limited, it indicates that the simplex SON algorithm may be extremely efficient for solving large embedded network/LP problems. At this point, however, we would caution the reader that it would be premature to extrapolate these results to other problems and, in particular, to other problem classes. An exhaustive computational study is required before any general inferences can be drawn.

Acknowledgments

We wish to acknowledge many helpful conversations with Harvey Greenberg during the development of the simplex SON procedure, particularly dealing with potential applications. We are also grateful to Leon Lasdon, John Mulvey, and Richard O'Neill for their valuable editorial comments on an earlier version of this paper. Furthermore, we would like to thank David Karney, who did the major part of the implementation and computational testing reported in the final section.
References

[1] A. Ali, R. Helgason, J. Kennington and H. Lall, "Solving multi-commodity network flow problems", Technical Report IEOR 77015, Southern Methodist University (1977).
[2] Analysis, Research, and Computation, Inc., "Development and computational testing on a capacitated primal simplex transshipment code", ARC Technical Research Report, Austin, TX.
[3] A.A. Assad, "Multicommodity network flows: Computational experience", Working Paper No. OR 058-76, Massachusetts Institute of Technology (1976).
[4] M.D. Bakes, "Solution for special linear programming problems with additional constraints", Operational Research Quarterly 17 (4) (1966) 425-445.
[5] R. Barr, F. Glover and D. Klingman, "Enhancements of spanning tree labeling procedures for network optimization", INFOR 17 (1) (1979).
[6] R. Barr, F. Glover and D. Klingman, "The alternating basis algorithm for assignment problems", Mathematical Programming 13 (1977) 1-13.
[7] J.M. Bennett, "An approach to some structured linear programming problems", Operations Research 14 (1966) 636-645.
[8] G. Bradley, G. Brown and G. Graves, "Design and implementation of large-scale primal transshipment algorithms", Management Science 24 (1977) 1-35.
[9] A. Charnes and W. Cooper, Management models and industrial applications of linear programming, Vols. I and II (Wiley, New York, 1961).
[10] A. Charnes, F. Glover, D. Karney, D. Klingman and J. Stutz, "Past, present and future of large-scale transshipment computer codes and applications", Computers and Operations Research 2 (1975) 71-81.
[11] H. Chen and C.G. DeWald, "A generalized chain labeling algorithm for solving multicommodity flow problems", Computers and Operations Research 1 (1974) 437-465.
[12] J.E. Cremeans, R.A. Smith and G.R. Tyndall, "Optimal multi-commodity network flows with resource allocation", Naval Research Logistics Quarterly 17 (1970) 269-280.
[13] J.E. Cremeans and H.S. Weigel, "The multicommodity network flow model revised to include vehicle per time period and node constraints", Naval Research Logistics Quarterly 19 (1972) 77-89.
[14] H. Crowder, M. Held and P. Wolfe, "Validation of subgradient optimization", Mathematical Programming 6 (1974) 62-88.
[14a] G.B. Dantzig, "Upper bounds, secondary constraints, and block triangularity in linear programming", Econometrica 23 (1955) 174-183.
[15] G.B. Dantzig and R.M. Van Slyke, "Generalized upper bounding techniques", Journal of Computer and System Science 1 (3) (1967) 213-226.
[16] L.R. Ford and D.R. Fulkerson, "A suggested computation for maximal multi-commodity network flows", Management Science 5 (1) (1958) 97-101.
[17] J. Gilsinn and C. Witzgall, "A performance comparison of labeling algorithms for calculating shortest path trees", NBS Technical Note 772, U.S. Department of Commerce (1973).
[17a] F. Glover, "Compact LP bases for a class of IP problems", Mathematical Programming 12 (1977) 102-109.
[18] F. Glover, J. Hultz and D. Klingman, "Improved computer-based planning techniques", Interfaces 8 (4) (1978) 16-25.
[19] F. Glover, D. Karney and D. Klingman, "Implementation and computational study on start procedures and basis change criteria for a primal network code", Networks 4 (3) (1974) 191-212.
[20] F. Glover, D. Karney, D. Klingman and R. Russell, "Solving singly constrained transshipment problems", Transportation Science 12 (4) (1978).
[21] F. Glover and D. Klingman, "Capsule view of future developments on large-scale network and network-related problems", Research Report CCS 238, Center for Cybernetic Studies, The University of Texas at Austin (1975).
[22] F. Glover and D. Klingman, "Finding minimum weight node order constrained spanning trees", in: B. Roy, ed., Combinatorial Programming: Methods and Applications (D. Reidel Publishing Co., Dordrecht, Holland, 1975) pp. 60-71.
[23] G.W. Graves and R.D. McBride, "The factorization approach to large-scale linear programming", Mathematical Programming 10 (1) (1976) 91-111.
[24] H.J. Greenberg and R.P. O'Neill, "A computational perspective of PIES", Supply and Integration Analysis Division, Federal Energy Administration (1977).
[25] M.D. Grigoriadis and K. Ritter, "A decomposition method for structured linear and nonlinear programs", Journal of Computer and System Sciences 3 (1969) 335-360.
[26] J.K. Hartman and L.S. Lasdon, "A generalized upper bounding method for doubly coupled linear programs", Naval Research Logistics Quarterly 17 (4) (1970) 411-429.
[27] J.K. Hartman and L.S. Lasdon, "A generalized upper bounding algorithm for multicommodity network flow problems", Networks 1 (1972) 333-354.
[28] A.R.G. Heesterman, "Special simplex algorithm for multisector problems", Numerische Mathematik 12 (1968) 288-306.
[29] R. Helgason and J. Kennington, "A product form representation of the inverse of a multicommodity cycle matrix", Technical Report IEOR 76003, Southern Methodist University (1976).
[30] R. Helgason, J. Kennington and J. Lall, "Primal simplex network codes: State-of-the-art implementation technology", Technical Report IEOR 76014, Department of Industrial Engineering and Operations Research, Southern Methodist University (1976).
[31] J. Hultz and D. Klingman, "Solving singularly constrained generalized network problems", Applied Mathematics and Optimization 4 (1978) 103-119.
[32] J. Hultz and D. Klingman, "Solving constrained generalized network problems", Research Report CCS 257, Center for Cybernetic Studies, The University of Texas at Austin (1976).
[33] R.N. Kaul, "An extension of generalized upper bounded techniques for linear programming", ORC 65-27, University of California, Berkeley, CA (1965).
[34] J. Kennington and M. Shalaby, "An effective subgradient procedure for minimal cost multicommodity flow problems", Technical Report IEOR 75010, Department of Industrial Engineering and Operations Research, Southern Methodist University (1976).
[35] D. Klingman and R. Russell, "On solving constrained transportation problems", Operations Research 23 (1) (1975) 91-107.
[36] S.F. Maier, "A compact inverse scheme applied to a multicommodity network with resource constraints", Technical Report No. 71-8, Operations Research House, Stanford University (1971).
[37] R.D. McBride, "Factorization in large-scale linear programming", Working Paper No. 200, University of California (Los Angeles, CA, 1973).
[38] R. Saigal, "Multicommodity flows in directed networks", ORC Report 66-24, Operations Research Center, University of California, Berkeley, CA (1966).
[39] R. Saigal, "Multicommodity flows in directed networks", Technical Report No. ORC 67-38, Operations Research Center, University of California, Berkeley, CA (1967).
[40] M. Sakarovitch, "The multi-commodity maximum flow problem", Technical Report No. ORC 66-25, Operations Research Center, University of California, Berkeley, CA (1966).
[40a] L. Schrage, "Implicit representation of generalized variable upper bounds in linear programming", Mathematical Programming 14 (1978) 11-20.
[40b] L. Schrage, "Implicit representation of variable upper bounds in linear programs", Mathematical Programming Study 4 (1975) 118-132.
[41] V. Srinivasan and G. Thompson, "Accelerated algorithms for labeling and relabeling of trees with applications for distribution problems", JACM 19 (4) (1972) 712-726.
[42] C. Swoveland, "Decomposition algorithms for the multi-commodity distribution problem", Working Paper No. 184, Western Management Science Institute, University of California, Los Angeles, CA (1971).
[43] C. Swoveland, "A two-stage decomposition algorithm for a generalized multi-commodity flow problem", INFOR 11 (3) (1973) 232-244.
[44] J.A. Tomlin, "Minimum-cost multicommodity network flows", Operations Research 14 (1) (1966) 45-51.
[45] J.A. Tomlin, "Mathematical programming models for traffic network problems", Unpublished Dissertation, Department of Mathematics, University of Adelaide, Australia (1967).
[46] D.W. Webber and W.W. White, "An algorithm for solving large structured linear programming problems", IBM New York Scientific Center Report No. 320-2946 (1968).
[47] R.D. Wollmer, "Multicommodity networks with resource constraints: The generalized multicommodity flow problem", Networks 1 (1972) 245-263.