ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION

GERAD 25th Anniversary Series

Essays and Surveys in Global Optimization
Charles Audet, Pierre Hansen, and Gilles Savard, editors

Graph Theory and Combinatorial Optimization
David Avis, Alain Hertz, and Odile Marcotte, editors

Numerical Methods in Finance
Hatem Ben-Ameur and Michèle Breton, editors

Analysis, Control and Optimization of Complex Dynamic Systems
El-Kébir Boukas and Roland Malhamé, editors

Column Generation
Guy Desaulniers, Jacques Desrosiers, and Marius M. Solomon, editors

Statistical Modeling and Analysis for Complex Data Problems
Pierre Duchesne and Bruno Rémillard, editors

Performance Evaluation and Planning Methods for the Next Generation Internet
André Girard, Brunilde Sansò, and Felisa Vázquez-Abad, editors

Dynamic Games: Theory and Applications
Alain Haurie and Georges Zaccour, editors

Logistics Systems: Design and Optimization
André Langevin and Diane Riopel, editors

Energy and Environment
Richard Loulou, Jean-Philippe Waaub, and Georges Zaccour, editors
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION

Edited by

CHARLES AUDET
École Polytechnique de Montréal and GERAD

PIERRE HANSEN
HEC Montréal and GERAD

GILLES SAVARD
École Polytechnique de Montréal and GERAD

Springer

Charles Audet, École Polytechnique de Montréal & GERAD, Montréal, Canada
Pierre Hansen, HEC Montréal & GERAD, Montréal, Canada
Gilles Savard, École Polytechnique de Montréal & GERAD, Montréal, Canada
Library of Congress Cataloging-in-Publication Data
A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-10: 0-387-25569-9
ISBN-13: 978-0387-25569-9
ISBN 0-387-25570-2 (e-book)
Printed on acid-free paper.

© 2005 by Springer Science+Business Media, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science + Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.
9 8 7 6 5 4 3 2 1
SPIN 11053194
Foreword
GERAD celebrates this year its 25th anniversary. The Center was created in 1980 by a small group of professors and researchers of HEC Montréal, McGill University and the École Polytechnique de Montréal. GERAD's activities achieved sufficient scope to justify its conversion in June 1988 into a Joint Research Centre of HEC Montréal, the École Polytechnique de Montréal and McGill University. In 1996, the Université du Québec à Montréal joined these three institutions. GERAD has fifty members (professors), more than twenty research associates and post-doctoral students and more than two hundred master's and Ph.D. students.

GERAD is a multi-university center and a vital forum for the development of operations research. Its mission is defined around the following four complementary objectives:

- The original and expert contribution to all research fields in GERAD's area of expertise;
- The dissemination of research results in the best scientific outlets as well as in society in general;
- The training of graduate students and post-doctoral researchers;
- The contribution to the economic community by solving important problems and providing transferable tools.

GERAD's research thrusts and fields of expertise are as follows:

- Development of mathematical analysis tools and techniques to solve the complex problems that arise in management sciences and engineering;
- Development of algorithms to resolve such problems efficiently;
- Application of these techniques and tools to problems posed in related disciplines, such as statistics, financial engineering, game theory and artificial intelligence;
- Application of advanced tools to the optimization and planning of large technical and economic systems, such as energy systems, transportation/communication networks, and production systems;
- Integration of scientific findings into software, expert systems and decision-support systems that can be used by industry.
One of the major events marking the celebration of the 25th anniversary of GERAD is the publication of ten volumes covering most of the Center's research areas of expertise. The list follows: Essays and Surveys in Global Optimization, edited by C. Audet, P. Hansen and G. Savard; Graph Theory and Combinatorial Optimization, edited by D. Avis, A. Hertz and O. Marcotte; Numerical Methods in Finance, edited by H. Ben-Ameur and M. Breton; Analysis, Control and Optimization of Complex Dynamic Systems, edited by E.K. Boukas and R. Malhamé; Column Generation, edited by G. Desaulniers, J. Desrosiers and M.M. Solomon; Statistical Modeling and Analysis for Complex Data Problems, edited by P. Duchesne and B. Rémillard; Performance Evaluation and Planning Methods for the Next Generation Internet, edited by A. Girard, B. Sansò and F. Vázquez-Abad; Dynamic Games: Theory and Applications, edited by A. Haurie and G. Zaccour; Logistics Systems: Design and Optimization, edited by A. Langevin and D. Riopel; Energy and Environment, edited by R. Loulou, J.-P. Waaub and G. Zaccour.

I would like to express my gratitude to the Editors of the ten volumes, to the authors who accepted with great enthusiasm to submit their work, and to the reviewers for their benevolent work and timely response. I would also like to thank Mrs. Nicole Paradis, Francine Benoit and Louise Letendre and Mr. André Montpetit for their excellent editing work.

The GERAD group has earned its reputation as a worldwide leader in its field. This is certainly due to the enthusiasm and motivation of GERAD's researchers and students, but also to the funding and the infrastructures available. I would like to seize the opportunity to thank the organizations that, from the beginning, believed in the potential and the value of GERAD and have supported it over the years. These are HEC Montréal, École Polytechnique de Montréal, McGill University, Université du Québec à Montréal and, of course, the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds québécois de la recherche sur la nature et les technologies (FQRNT).

Georges Zaccour
Director of GERAD
Contents
Foreword
Contributing Authors
Preface

1 Unilateral Analysis and Duality
  Jean-Paul Penot

2 Monotonic Optimization: Branch and Cut Methods
  Hoang Tuy, Faiz Al-Khayyal, and Phan Thien Thach

3 Duality Bound Methods in Global Optimization
  Reiner Horst and Nguyen Van Thoai

4 General Quadratic Programming
  Nguyen Van Thoai

5 On Solving Polynomial, Factorable, and Black-box Optimization Problems Using the RLT Methodology
  Hanif D. Sherali and Jitamitra Desai

6 Bilevel Programming
  Stephan Dempe

7 Applications of Global Optimization to Portfolio Analysis
  Hiroshi Konno

8 Optimization Techniques in Medicine
  Panos M. Pardalos, Vladimir L. Boginski, Oleg Alexan Prokopyev, Wichai Suharitdamrong, Paul R. Carney, Wanpracha Chaovalitwongse, and Alkis Vazacopoulos

9 Global Optimization in Geometry - Circle Packing into the Square
  Péter Gábor Szabó, Mihály Csaba Markót, and Tibor Csendes

10 A Deterministic Global Optimization Algorithm for Design Problems
   Frédéric Messine
Contributing Authors

FAIZ AL-KHAYYAL
Georgia Institute of Technology, USA
faiz@isye.gatech.edu

VLADIMIR L. BOGINSKI
University of Florida, USA
vb@ufl.edu

PAUL R. CARNEY
University of Florida, USA
carnepr@peds.ufl.edu

WANPRACHA CHAOVALITWONGSE
Rutgers, The State University of New Jersey, USA
wchaoval@rci.rutgers.edu

TIBOR CSENDES
University of Szeged, Hungary
csendes@inf.u-szeged.hu

STEPHAN DEMPE
Technical Univ. Bergakademie Freiberg, Germany
dempe@math.tu-freiberg.de

JITAMITRA DESAI
Virginia Polytechnic Institute and State University, USA
jidesai@vt.edu

REINER HORST
University of Trier, Germany
horst@uni-trier.de

HIROSHI KONNO
Chuo University, Japan
konno@indsys.chuo-u.ac.jp

MIHÁLY CSABA MARKÓT
University of Szeged, Hungary
markot@inf.u-szeged.hu

FRÉDÉRIC MESSINE
Université de Pau et des Pays de l'Adour, France
messine@univ-pau.fr

PANOS M. PARDALOS
University of Florida, USA
pardalos@ufl.edu

JEAN-PAUL PENOT
Université de Pau et des Pays de l'Adour, France
Jean-Paul.Penot@univ-pau.fr

OLEG ALEXAN PROKOPYEV
University of Florida, USA
oap4ripe@ufl.edu

HANIF D. SHERALI
Virginia Polytechnic Institute and State University, USA
hanifs@vt.edu

WICHAI SUHARITDAMRONG
University of Florida, USA
wichais@ufl.edu

PÉTER GÁBOR SZABÓ
University of Szeged, Hungary
pszabo@inf.u-szeged.hu

PHAN THIEN THACH
Institute of Mathematics, VNNCST, Vietnam
ptthach@math.ac.vn

HOANG TUY
Institute of Mathematics, Vietnam
htuy@math.ac.vn

NGUYEN VAN THOAI
University of Trier, Germany
thoai@uni-trier.de

ALKIS VAZACOPOULOS
Dash Optimization Inc., USA
av@dashoptimization.com
Preface
Global optimization aims at solving the most general problem of deterministic mathematical programming: to find the global optimum of a nonlinear, nonconvex, multivariate function of continuous and/or integer variables subject to constraints which may themselves be nonlinear and nonconvex. Moreover, a proof of optimality of the solution found is also requested. Not surprisingly, and due to its difficulty, global optimization has long been neglected and, for instance, was not even mentioned in many books on nonlinear programming from the 1960s through the 1980s. The situation has, however, drastically changed in the last 25 years, in which global optimization has become an important field in itself, with its own Journal of Global Optimization and about 100 books published on theory and applications. The essays and surveys of the present volume are testimonies of the successes obtained by many researchers in this field, the power of its methods, the diversity of its applications, as well as the variety of the ideas and research poles explored.

A first series of papers addresses mathematical properties and algorithms for general global optimization problems or specialized classes thereof. In his essay "Unilateral Analysis and Duality," Jean-Paul Penot introduces one-sided versions of Lagrangians and perturbations and, using concepts from generalized convexity, presents the main features of duality in a general framework. In "Monotonic Optimization: Branch and Cut Methods," Hoang Tuy, Faiz Al-Khayyal and Phan Thien Thach consider monotonic optimization problems dealing with multivariate monotonic functions and differences of monotonic functions. They derive several new families of cuts. Reiner Horst and Nguyen Van Thoai survey "Duality Bound Methods in Global Optimization," which use Lagrangian duality and branch-and-bound. In another paper, Nguyen Van Thoai also surveys "General Quadratic Programming," focusing on three topics: optimality conditions, duality and solution methods. In their survey "On Solving Polynomial, Factorable, and Black-Box Optimization Problems Using the RLT Methodology," Hanif D. Sherali and Jitamitra Desai present a discussion of the Reformulation-Linearization/Convexification Technique applied to several classes of problems. Finally, Stephan Dempe surveys the thriving field of "Bilevel Programming," i.e., hierarchical models in which an objective function is minimized over the graph of the solution set mapping of a parametric optimization problem.

A second series of papers addresses applications of global optimization in various fields. Hiroshi Konno surveys "Applications of Global
Optimization to Portfolio Analysis," an area which has been explored extensively with much success. Mean-risk models under nonconvex transaction costs, minimal transaction unit and cardinality constraints are discussed, as well as several bond portfolio optimization problems. Panos Pardalos and co-workers survey recent applications of "Optimization Techniques in Medicine," a rapidly emerging interdisciplinary research area. They consider problems of diagnosis, risk prediction, brain dynamics and epileptic seizure prediction, as well as treatment planning. Péter Gábor Szabó, Mihály Csaba Markót and Tibor Csendes survey in depth a much studied problem in their paper "Global Optimization in Geometry - Circle Packing into the Square." The main tool used is interval analysis. Finally, Frédéric Messine illustrates the state of the art in solving a most difficult problem, i.e., a mixed-integer problem with complicated constraints involving trigonometric functions, in his essay "A Deterministic Global Optimization Algorithm for Design Problems." The conjunction of interval analysis with constraint propagation techniques within a branch-and-bound framework leads to the exact solution of complex electrical motor design problems, which would have appeared completely out of reach just a few years ago.
CHARLES AUDET
PIERRE HANSEN
GILLES SAVARD
GERAD
Chapter 1
UNILATERAL ANALYSIS AND DUALITY

Jean-Paul Penot

Dedicated to Jean-Jacques Moreau on the occasion of his 80th birthday
Abstract
We introduce one-sided versions of Lagrangians and perturbations. We relate them, using concepts from generalized convexity. In such a way, we are able to present the main features of duality theory in a general framework encompassing numerous special instances. We focus our attention on the set of multipliers. We look for an interpretation of multipliers as generalized derivatives of the performance function associated with a dualizing parameterization of the given problem.

1.
Introduction
Unilateral analysis is the study of phenomena or concepts in which one-sided notions arise and for which directions cannot be reversed. Optimization is one of its main streams, since in general one is interested in a minimization process and the maximization question is without interest, or vice versa. Thus optimality conditions are of the same nature. A typical appearance of this point of view has emerged with the notion of subdifferential in convex analysis and its extensions in nonsmooth analysis and generalized convexity. One-sided estimates also play an important role in some practical methods such as the branch and bound method. A number of results in duality theory are also of one-sided essence.

Duality theories for optimization problems have received much attention, both from a theoretical point of view and from a computational concern. The aim of the present article is to stress the use of one-sided analysis in duality theories. Specifically, we examine the possibility of obtaining some flexibility in replacing equalities with inequalities in the definitions of Lagrangians and perturbations. Such a step is in accordance with a number of generalizations used in optimization, convex analysis and nonsmooth analysis which led to what is often called unilateral analysis or one-sided analysis. As in recent monographs dealing with generalized convexity (Pallaschke and Rolewicz, 1997; Rubinov, 2000a; Rubinov and Yang, 2003; Singer, 1997), our framework is quite simple and general: we do not suppose the decision space X is endowed with any structure and the admissible (or feasible) set A ⊂ X is any subset. Also, we do not impose any special property on the objective function f : X → ℝ of the constrained problem
0")
minimize f (x) subject to x E A
we consider. We give special attention to the case where the feasible set A is the value F(0_Z), at some base point 0_Z of some parameter space Z, of a multimapping F : Z ⇉ X. In such a case (which contains many special situations, in particular mathematical programming problems), a natural perturbation of problem (P) is available and a duality theory can be applied provided a coupling function between Z and some other space Y is available. Another classical duality theory is the Lagrangian theory, in which a family (ℓ_y)_{y∈Y} of hopefully simple functions is used to approach the objective function f : X → ℝ ∪ {+∞} on A, or rather the modified function f_A = f + ι_A, where ι_A is the indicator function of A given by ι_A(x) = 0 if x ∈ A, ι_A(x) = +∞ if x ∈ X\A. The two approaches have been compared in some cases (see, for example, Magnanti, 1974; Ekeland and Témam, 1976; Penot, 1995a; Rockafellar, 1974b; Rockafellar and Wets, 1997; Tind and Wolsey, 1981, ...). But it seems that such a comparison is lacking in a more general framework. We undertake it in Section 4 after a short review of the two schemes mentioned above. That comparison relies on the concepts of conjugacy and generalized (or abstract) convexity, which go back to Fan (1956, 1963) and Moreau (1970); see also Balder (1977), Briec and Horvath (2004), Dolecki and Kurcyusz (1978), Komiya (1981), Penot and Rubinov (2005), Rockafellar and Wets (1997), and Singer (1997). In Section 3, the definitions and results from abstract convexity which will be used in this paper are collected. More information, sometimes in excruciating detail, can be found for example in the monographs Pallaschke and Rolewicz (1997), Rubinov (2000a), Rubinov and Yang (2003), and Singer (1997). For the convex case, which serves as a model for the whole theory, we refer to Aubin (1979), Aubin and Ekeland (1975), Auslender (2000), Azé (1997), Borwein and Lewis (2000), Ekeland and Témam (1976), Laurent (1972), Hiriart-Urruty (1998), Minoux (1983), Moreau (1970), Rockafellar (1974b), and Zălinescu (2002) for example.
Although the notion of duality and the related notion of polarity (Volle, 1985; Penot, 2000) can be set in a more general framework (Martinez-Legaz and Singer, 1995; Penot, 2000; Singer, 1986), and even in the broad setting of lattices (Birkhoff, 1966; Martinez-Legaz and Singer, 1990; Ore, 1944; Penot, 2000; Volle, 1985), we limit our approach to the case of conjugacies, in the line of Moreau (1970), Balder (1977), Martinez-Legaz (1988b, 1995), Pallaschke and Rolewicz (1997), Penot (1985), Penot and Volle (1987, 1990a), Rubinov and Yang (2003), Singer (1986), and Tind and Wolsey (1981). In Penot and Rubinov (2005) and in some examples presented here, the case in which the parameter space Z is endowed with some preorder is considered. As there, we give particular attention to the set of multipliers here. In particular, we relate this set to the solution set of the dual problem and to the subdifferential of the performance (or value) function p associated with the perturbation. Here the concept of multiplier is a global one and the subdifferential is a set related to the coupling, and not a local subdifferential as in nonsmooth analysis. It follows that, contrary to the case studied in Gauvin (1982), Gauvin and Janin (1990), Gauvin and Tolle (1977), Penot (1997a,b, 1999), Rockafellar (1974a,b, 1982, 1993), no constraint qualification and no growth conditions are required. Several examples are gathered in Section 5. We end the paper with some observations concerning the links with calmness and penalization. Recent contributions (Aubin and Ekeland, 1975; Bourass and Giner, 2001; Clarke and Nour, 2005; Huang and Yang, 2003; Ioffe, 2001; Ioffe and Rubinov, 2002; Nour, 2004; Penot, 2003c; Rolewicz, 1994, 1996; Rubinov and Yang, 2003, ...) show the interest of dealing with dualities in a nonclassical framework. We hope the present paper will contribute to the usefulness of such a general approach.
2.
Sub-Lagrangians and multipliers
The Lagrangian approach to duality in optimization is general and simple. It relies on the elementary inequality

    sup_{y∈Y} inf_{x∈X} L(x, y) ≤ inf_{x∈X} sup_{y∈Y} L(x, y)    (1.1)

valid for any function L : X × Y → ℝ̄ := ℝ ∪ {−∞, +∞}. In order to take advantage of this inequality, we suppose that we are given a family (ℓ_y)_{y∈Y} of extended real-valued functions on X such that ℓ_y ≤ f_A := f + ι_A for each y ∈ Y, or, equivalently, setting L(x, y) := L_y(x) := ℓ_y(x),

    sup_{y∈Y} L(x, y) ≤ f_A(x)    for each x ∈ X.    (1.2)
Then we say that the bifunction L : X × Y → ℝ̄ is a sub-Lagrangian of problem (P). The change of the equality f_A(·) = sup_{y∈Y} L(·, y) defining a Lagrangian into an inequality brings some versatility to the approach. We get an idea of the flexibility of the notion of sub-Lagrangian in observing that if f is bounded below by m on A, then for Y := {0}, L(x, y) = m and L(x, y) = m + ι_A(x) are sub-Lagrangians, but in general they are not Lagrangians. More generally, if (A_y)_{y∈Y} is a family of subsets of X such that A ⊂ ∩_{y∈Y} A_y and if f is bounded below by m_y on A_y, then one gets a sub-Lagrangian in setting L(x, y) = m_y + ι_{A_y}(x) for x ∈ X, y ∈ Y. We will see that most properties of Lagrangians can be preserved under appropriate assumptions. The first one is the weak duality property. Introducing the dual functional d_L by

    d_L(y) := inf_{x∈X} L(x, y),    y ∈ Y,    (1.3)

we define the Lagrangian dual problem as

    (D_L)    maximize d_L(y) over y ∈ Y.
Then, in view of relations (1.1) and (1.2), one has the obvious weak duality inequality

    sup(D_L) ≤ inf(P).    (1.4)

This inequality may offer useful estimates. One says that there is no duality gap when this inequality is an equality. The no-duality-gap property may occur for a sub-Lagrangian which is not a Lagrangian. It always occurs when some multiplier is available. Here, the notion of multiplier we adopt is a global notion, not a local one as for the notion of Lagrange or Karush-Kuhn-Tucker multiplier; however, under generalized convexity assumptions, both concepts coincide. The precise definition is as follows; we strive to keep close to the classical notion of convex analysis, in spite of the fact that L is just a sub-Lagrangian.

DEFINITION 1.1 One says that an element ȳ of Y is a multiplier (for the sub-Lagrangian L) if inf_{x∈X} L(x, ȳ) = inf_{x∈A} f(x), or, in other terms, if d_L(ȳ) = inf f_A(X). The set of multipliers will be denoted by M (or M_L if one needs to stress the dependence on L).

A number of studies have been devoted to obtaining conditions ensuring the no-gap property or the existence of multipliers (see Aubin, 1979; Moreau, 1964; Balder, 1977; Dolecki and Kurcyusz, 1978; Huang and Yang, 2003; Janin, 1973; Lindberg, 1980; Pallaschke and Rolewicz, 1997; Rockafellar, 1974a; Rubinov et al., 1999a,
2000, 2002; Rubinov and Yang, 2003, for instance). Some form of convexity or quasi-convexity implying an inf-sup property is often involved (see Aubin and Ekeland, 1975; Auslender, 2000; Moreau, 1964; Simons, 1998, ...). A prototype of such results is the Sion-von Neumann minimax theorem (Aubin, 1979; Rubinov and Yang, 2003; Simons, 1994; Sion, 1954, 1958) which represents a noticeable step outside the realm of convex analysis.

The very definition of a multiplier leads to the following obvious observation.

PROPOSITION 1.1 An element ȳ of Y belongs to the set M of multipliers if, and only if, it belongs to the set S*_L of solutions to the dual problem (D_L) and there is no duality gap. Thus, when M is nonempty, one has M = S*_L.

When a multiplier ȳ is available, one gets the value of problem (P) by solving the unconstrained problem

    (Q_ȳ)    minimize L(x, ȳ) over x ∈ X,

which is easier to solve than (P) in general. In fact, much more can be expected.
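As a simple illustration of these notions (a standard convex toy example, not taken from the surrounding text), take X = ℝ, f(x) = x², A = [1, +∞), and the usual Lagrangian of Example 1 below, L(x, y) = x² + y(1 − x) for y ≥ 0 (with L(x, y) = −∞ for y < 0). Then

    d_L(y) = inf_{x∈ℝ} (x² + y(1 − x)) = y − y²/4,

which is maximized at ȳ = 2 with d_L(2) = 1 = inf_{x∈A} f(x). Hence ȳ = 2 is a multiplier, there is no duality gap, and the unconstrained problem (Q_ȳ) is solved by x̄ = ȳ/2 = 1, which is also the solution of (P), in agreement with Proposition 1.2 below.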
PROPOSITION 1.2 Suppose L is a sub-Lagrangian of (P). If x̄ ∈ X belongs to the set S of solutions to (P) and if ȳ ∈ M, then x̄ belongs to the set S_ȳ of solutions to (Q_ȳ) and L(x̄, ȳ) = f_A(x̄). Conversely, given ȳ ∈ Y, if x̄ ∈ S_ȳ and if L(x̄, ȳ) = f_A(x̄), then x̄ ∈ S and ȳ ∈ M.

Proof. Let x̄ ∈ S and ȳ ∈ M. Then, for any x ∈ X, in view of the assumption f_A(x̄) = inf(P) = d_L(ȳ) ≤ L(x, ȳ), one has L(x̄, ȳ) ≤ f_A(x̄) ≤ L(x, ȳ), so that x̄ is a solution to (Q_ȳ); moreover, taking x = x̄ in these last inequalities one gets L(x̄, ȳ) = f_A(x̄). Conversely, let ȳ ∈ Y and let x̄ be a solution to (Q_ȳ) such that L(x̄, ȳ) = f_A(x̄). Then, for any x ∈ X, one has f_A(x̄) = L(x̄, ȳ) = d_L(ȳ) ≤ inf(P) ≤ f_A(x), so that x̄ is a solution to (P); in particular x̄ ∈ A, d_L(ȳ) = inf(P) and ȳ is a multiplier. □

When the assumptions of the preceding statement are satisfied and when L(x̄, ȳ) is finite, one can show that (x̄, ȳ) ∈ X × Y is a saddle-point of L on X × Y in the sense that for any (x, y) ∈ X × Y one has

    L(x̄, y) ≤ L(x̄, ȳ) ≤ L(x, ȳ).
The knowledge of a multiplier leads to an unconstrained problem, a clear advantage. On the other hand, in many cases the function L(·, ȳ) is nonsmooth, even when the data are smooth. Note that for many methods of global minimization smoothness is not as important as for a local search. In some cases, as in Examples 1 and 2 below, the nonsmoothness is rather mild when f and g are smooth, since L(·, ȳ) is quasi-differentiable in the terminology of Pchenitchny (1990), i.e. this function has directional derivatives which are sublinear and continuous as functions of the direction.

It is well known that multipliers may exist while the primal problem (P) has no solution (Aubin and Ekeland, 1975; Luenberger, 1969, ...). One of the advantages of the generality of our approach is that it makes clear the symmetry of the situation. Turning the dual problem (D_L) into a minimization problem

    (P*_L)    minimize −d_L(y) over y ∈ Y,
called the adjoint problem, one can note that −L̃, given by −L̃(y, x) := −L(x, y) for (y, x) ∈ Y × X, is a Lagrangian of (P*_L) since sup_{x∈X} −L(x, y) = −d_L(y) for any y ∈ Y. This symmetry has been observed in Balder (1977, p. 337) in the case the optimization problem is of mathematical programming type. Then the bi-adjoint problem is

    (P**_L)    minimize d_{LL}(x) := sup_{y∈Y} L(x, y) over x ∈ X.
When L is a Lagrangian, the bi-adjoint problem is just (P). In the general case (P**_L) may be simpler than (P) and its value is such that

    sup(D_L) ≤ inf(P**_L) ≤ inf(P).

One always has (P***_L) = (P*_L). The following statement is a direct application of the preceding proposition to the adjoint problem.

COROLLARY 1.1 When sup(D_L) = inf(P**_L), the multipliers of the adjoint problem (P*_L) are the solutions of (P**_L). In particular, when L is a Lagrangian and when there is no duality gap, the multipliers of the adjoint problem (P*_L) are the solutions of (P).
An illustration of the sub-Lagrangian theory is offered by the familiar case of linear programming, in which X is a normed space, f is a continuous linear form c on X and the constraint set is given by

    {x ∈ X : x ≥_B 0, b ≥_C Ax},

where A is now a continuous linear map from X to another normed space Z, ≥_B, ≥_C being the preorders in X and Z defined by closed convex cones B and C respectively, Y := Z* and b ∈ Z. When the polar cone

    C° := {y ∈ Y : ∀z ∈ C, ⟨y, z⟩ ≤ 0}

is difficult to deal with, one may replace it with a smaller cone D and use the sub-Lagrangian L given by

    L(x, y) := ⟨c, x⟩ + ⟨y, b − Ax⟩    for (x, y) ∈ B × D,
    L(x, y) := +∞    for (x, y) ∈ (X\B) × Y,
    L(x, y) := −∞    for (x, y) ∈ B × (Y\D).

Then, the dual problem turns out to be of a similar form:

    maximize ⟨y, b⟩    subject to A⊤y ≥_{B°} c, y ≥_D 0.

The possibility of choosing D in different ways (for instance taking for D a finite dimensional polyhedral cone) yields estimates which can be easier to get than the ones obtained by taking D = C° for the usual Lagrangian. On the other hand, it may be more difficult to find multipliers.
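As a simple illustration of such a choice (an example of ours, in the notation above): take Z = ℝ² ordered by C = ℝ²_−, so that C° = ℝ²_+, and let D := {(t, t) : t ≥ 0}, a polyhedral ray contained in C°. The restricted dual problem then has a single scalar variable t ≥ 0,

    maximize t(b₁ + b₂)    subject to t A⊤(1, 1) ≥_{B°} c, t ≥ 0,

which aggregates the two inequality constraints into one; by weak duality every feasible t still yields a lower bound on inf(P), although in general a weaker one than the bound obtained with D = C°.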
3.
Sub-perturbations and generalized convexity
Another means to get duality results is the theory of perturbations. It relies on some notions of generalized convexity we recall now. Then we will show that a one-sided character can be given to the theory.

Given two sets Y, Z paired by means of a coupling function c : Y × Z → ℝ̄ := ℝ ∪ {−∞, +∞}, one defines the Fenchel-Moreau conjugacy p ↦ p^c from ℝ̄^Z to ℝ̄^Y by setting

    p^c(y) := sup_{z∈Z} (c(y, z) − p(z)),    y ∈ Y,

and the reverse conjugacy q ↦ q^c from ℝ̄^Y to ℝ̄^Z by setting analogously

    q^c(z) := sup_{y∈Y} (c(y, z) − q(y)),    z ∈ Z.

Throughout we adopt the convention (+∞) + (−∞) = (−∞) + (+∞) = +∞ and the rule r − s = r + (−s); we also assume that sup ∅ = −∞ and inf ∅ = +∞. A first appearance of duality arises from the inequality p ≥ p^{cc} := (p^c)^c. Another way to arrive at duality relationships consists in using the family (c(y, ·))_{y∈Y} of functions on Z (which could be called c-linear or generalized linear) and the associated family A_c := {c(y, ·) + r : y ∈ Y, r ∈ ℝ} of c-affine functions (or generalized affine functions). Functions p ∈ ℝ̄^Z which can be represented as suprema of families A_c(p) ⊂ A_c will be called c-convex; we write p ∈ Γ_c(Z).
The Fenchel-Moreau theorem relates the two approaches.
THEOREM 1.1 For any function p ∈ ℝ̄^Z, the biconjugate p^{cc} coincides with the A_c-convex hull co_{A_c}(p) of p given by:

    co_{A_c}(p)(z) := sup{c(y, z) + r : (y, r) ∈ Y × ℝ, c(y, ·) + r ≤ p},    z ∈ Z.    (1.7)
Given some z̄ ∈ Z, the function p is said to be c-convex at z̄ if the equality p^{cc}(z̄) = p(z̄) holds. When p(z̄) is finite, one has p^{cc}(z̄) = p(z̄) and the supremum in (1.7) with z = z̄ is attained if, and only if, the c-subdifferential ∂^c p(z̄) is nonempty, where the c-subdifferential of p at z̄ is defined by

    ∂^c p(z̄) := {y ∈ Y : p(z) ≥ p(z̄) + c(y, z) − c(y, z̄) for all z ∈ Z}.    (1.8)
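For the classical coupling of Example 1 below, these notions reduce to the usual Fenchel conjugate, convexity and subdifferential. As a simple illustration (a standard computation, not from the original text): for Z = Y = ℝ, c(y, z) = yz and p(z) = z²/2, one finds

    p^c(y) = sup_{z∈ℝ} (yz − z²/2) = y²/2,    p^{cc}(z) = sup_{y∈ℝ} (yz − y²/2) = z²/2 = p(z),

so p is c-convex at every z̄, with ∂^c p(z̄) = {z̄}; the supremum in (1.7) is attained by the c-affine function z ↦ z̄ z − z̄²/2.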
The idea of using generalized convexity in the study of Lagrangians has been extensively exposed and exploited (Balder, 1977; Dolecki and Kurcyusz, 1978; Huang and Yang, 2003; Janin, 1973; Horvath, 1990; Kurcyusz, 1975; Lindberg, 1980; Martinez-Legaz, 1988b; Moreau, 1970; Pallaschke and Rolewicz, 1997; Penot, 1985; Penot and Volle, 1990a; Rockafellar, 1974a; Rubinov et al., 1999a,b, 2000, 2002; Rubinov and Simsek, 1995a,b, 2000; Rubinov and Yagublov, 1986; Rubinov and Yang, 2003, ...).

The perturbational approach assumes that one disposes of a parameter space Z and a function P : X × Z → ℝ̄ which represents a perturbation of the given problem in the sense that for some base point 0_Z of the parameter space Z one has P(x, 0_Z) = f_A(x) for each x ∈ X, where, as above, f_A is the extended objective function given by f_A := f + ι_A. Here we relax this condition in calling a function P : X × Z → ℝ̄ such that P(x, 0_Z) ≤ f_A(x) for each x ∈ X a sub-perturbation of (P). We associate to P the performance (or value) function p given by

    p(z) := inf_{x∈X} P(x, z),    z ∈ Z.
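To connect with the Lagrangian approach on a concrete case (a hedged illustration with our own choice of perturbation, continuing the toy example of Section 2): for X = ℝ, f(x) = x², A = [1, +∞), take Z = ℝ, 0_Z = 0 and P(x, z) := x² + ι_{ℝ_+}(x − 1 − z), which is a perturbation since P(x, 0) = f_A(x). Then

    p(z) = inf{x² : x ≥ 1 + z} = (max(1 + z, 0))²,

so that p(0) = 1 = inf(P); with the classical coupling c(y, z) = yz one finds ∂^c p(0) = {2}, in accordance with the fact (Proposition 1.7 below) that the multiplier ȳ = 2 of the associated Lagrangian L(x, y) = x² + y(1 − x), y ≥ 0, belongs to ∂^c p(0_Z).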
The relation p^{cc}(0_Z) ≤ p(0_Z) leads to the perturbational dual problem

    (D_P)    maximize d_P(y) := −(p^c(y) − c(y, 0_Z)) over y ∈ Y,
and again we have the weak duality inequality sup(D_P) ≤ inf(P). Under the following finiteness assumption
(F)
c(., Oz)
is a finitely valued function
Unilateral Analysis and Duality
1
the objective dp of (Dp) can be written
If furthermore one assumes the simplifying assumption (s)
c(y, Oz) = O for all y E Y,
the objective function of (Dp) becomes -pC. The expression of dp in (1.10) is also valid when the following assumption holds:
( F ' ) c(., Oz) takes its values in R U {-oo), one of them is finite and ~ ( O Z )< too. In such a case, for each y E Y one has pC(y) > -oo and C(Y,02)) = C(Y,02) - pC(y). Transforming the problem (Dp) into a minimization problem, we get the perturbational adjoint problem
(pa
minimize pC(y) y E Y.
Again, a certain symmetry appears since one can associate to that problem a natural perturbation Q: Y x Z -t given by Q(y, x) := pC(y)c(y, x). Then, the performance function q associated with this perturbation is just -pCC: for each z E Z one has q(x) := infyE Q(y, z ) = inf,,y (pC(y)- ~ ( ~ 1 = 4 )-pCC(z). We note that when (F)is satisfied, we can reduce the situation to the case when (S) holds. Indeed, if (S) does not hold, we can replace the function c with the function co given by co(y,z) = c(y, z) - c(y, Oz). We have
Hence, the objectives of the dual problems with respect to c and co coincide: c,,(', Oz) - pCO = -pCO(.) = c(., Oz) - pC(.). (a)
Since under assumption (Fj (resp. (S))one has respectively
pCC(Oz)= sup
inf
( P ( x , z) - c(y, 2)).
y ~( xy, ~ ) E X X Z
Such a relation may incite one to look for a coupling of X × Z with another space as in Ekeland and Témam (1976) and elsewhere. However, we prefer to stress the fact that no structure is required on X. Let us relate the set S*_P of solutions to (D_P) to the c-subdifferential at 0_Z of p, or rather of its biconjugate p^{cc}, as defined in (1.8). The following result is well known in the convex case (see Rockafellar, 1974b). In the case of a general coupling c, it is given in Penot and Volle (1990a, Prop. 6.1); we present the proof for completeness.
PROPOSITION 1.3 Suppose the value of (Dp) is finite and assumption (F)holds. Then the set S$ of solutions to (Dp) coincides with aCpCC(Oz). If there is no duality gap, then S ; = aCp(Oz).
a
Proof. In view of the rules we have adopted for the extension to of the addition in R, for any function f : Z -+ @ finite at some w E Z we have
Thus, under assumption (F),one has f j E dCp(Oz) iff -pC(y) 2 p(Oz) c(fj,Oz) and since pCCC = pC,one has y E aCpCC(Oz) iff -pC(y) 1 pCC(Oz)c(y, 02). Now y E ST, iff pC(y)- c(y, Oz) 2 pC(y)- c(fj,Oz) for every y E Y, iff -pCC(Oz)2 pC(fj)- c(fj,Oz) iff -pC(y) pCC(Oz)- c(fj,Oz) iff fj E aCpcc(Oz). When there is no duality gap, or more generally when pCC(Oz)= p(Oz), one has aCpCC(Oz) = aCp(Oz).
>
Passages between the two approaches
4.
Our aim now is to compare the two approaches. In order to do so, we will assume a coupling function c : Y x Z -+ is given. We also assume a base point Oz has been chosen and that (F)holds, i.e. that c(y, Oz) is finite for all y E Y.
a
From the perturbational approach to the Lagrangian approach Given a sub-perturbation P : X x Z -+ of (P)and x E X , we denote
4.1
by Px the function P ( x , and by P,C := (PX)'its conjugate function; similarly, for a function L : X x Y -+ we set Lx := L(x, a)
a
a).
PROPOSITION 1.4 Let P be a sub-perturbation of (P). Then the function L given by L(x, y) := zEZ inf ( ~ ( xz), - c(y, z )
+C(~,OZ)),
x E X, y E Y,
(1.11)
1
Unilateral Analysis and Duality
11
or L ( x , y ) := c(y,Oz) - P,C(y), is a Lagrangian of the relaxed problem
(Pcc)consisting i n minimizing over X the function P,CC(Oz). Thus L is a sub-Lagrangian for ( P ) . Moreover, the objective functions of the dual problems associated with P and L coincide. Thus, the two dual problems have the same values and the same sets of solutions. When the simplifying assumption (S)holds, the Lagrangian L satisfies the simple relation Lx=-P,C YXEX. (1.12) Note that L is not necessarily a Lagrangian for ( P ) , even when P is a perturbation of ( P ) ;thus, our presentation of duality is somewhat more natural than the usual one.
Proof. The first assertion follows from the equalities
Thus
and L is a sub-Lagrangian for ( P ) . Now the dual objective function associated with L is given by
d L ( y ) := inf L ( x , y ) = xEX
= Zinf EZ
inf ( x , z ) ~ X x(Z ~ ( xz ), - c ( y , 2)
( ~ ( 2 ) C(Y,
z))
+ c ( y ,0 2 ) )
+ C ( Y , Oz) = c ( y ,O Z ) - pC(y)= d p ( y ) .
In the following corollary we recover a classical statement. It stems from the fact that under its assumptions one has equality in (1.13).
COROLLARY 1.2 If 19 is a perturbation o f , ( P ) and i f for each x E X one has P,CC(Oz) = Px(Oz), then the function L given by (1.11) is a Lagrangian of ( P ).
4.2
From the Lagrangian approach to the perturbational approach
We have shown that any perturbational dual problem can be considered as a Lagrangian dual. There is a reverse passage; which is described as follows. We still assume the coupling function c satisfies condition ( F ) .
12
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
PROPOSITION 1.5 For any sub-Lagrangian (resp. Lagrangian) L : X x Y -+ of ( P ) , the function P defined by the following formula is a sub-perturbation (resp. a perturbation) of ( P ) : P ( x ,z ) := - inf ( c ( ~O z,)
- c(y,z ) - L ( x ,y ) ) .
YEY
Moreover, the dual objective functions d L and d p associated with L and P respectively are such that d L 5 d p . More precisely, when ( S ) holds, one has (1.14) Px = ( - L x ) C . and d p = d K , where K is the sub-Lagrangian function of ( P ) given by K ( x 7Y ) := - ( - L x ) C C ( ~ ) . Proof. The first assertion stems from the cancelling o f c(y,O z ) when taking z = O z in the formula giving P. The rule for the extended addition we adopted shows that for any (x,y , z ) E X x Y x Z we have
(see also Moreau, 1970, Prop. 3c). Hence, for any y E Y , ~ P ( Y := ) C(Y,
Oz) - p C ( y ) := c(y,0 2 ) + Zinf E Z ( p ( z )- c(y,z ) ) ,
= c(y,Oz)
+ xinf inf ( ~ ( zx),- ~ ( yz ), ) , EX ZEZ
2 inf L ( x ,y ) = d L ( y ) . xEX
Suppose now that ( S )holds. Then, clearly, Px = (- LX)' for each x E X . Thus, for each y E Y , one has d~ ( Y )
-pC(y) =
2 ~ $L5
= inf -(-L,)"(y) xEX
(PX
( z )- ~ ( yz ), ) = xinf -Pz(y) EX
= inf K ( x ,y ) = d K ( y ) . XEX
The fact that K is a sub-Lagrangian o f ( P ) when ( S ) holds derives from the observation that constant functions on Y are c-convex, so that hCC2 b i f h E EY and b E R are such that h 2 b; in particular, for each x ~ x o n e h a s - K ( X , ~ ) = ( - L ~ ) ~s ~ i n~c e-- ~ L x~>(- fXA )( x ) . The next assertion is a consequence of the relation K x = Lx when -Lx is c-convex for x E X , what amounts to -L, = (-Lx)CC.
COROLLARY 1.3 If ( S ) holds and if for each x E X the function -Lx := - L ( x , .) is c-convex, then, for the sub-perturbation P associated with the sub-Lagrangian L, one has d L = d p .
1 Unilateral Analysis and Duality
13
One may wonder what happens when one makes successively the two passages described a,bove. For simplicity, we use assumption (S). Let us start first with a sub-Lagrangian L. Then we denote by LP the sub-Lagrangian associated with the perturbation P deduced from L via relation (1.14): for any (x, y) E X x Y, by (1.12), one has L ~ ( XY), := -P,"(y), hence L ~ ( XY) , = -(-L,)"(y) = K ( x , y). Now let us start with a sub-perturbation P and let us denote by pLthe sub-perturbation associa,ted with the sub-Lagrangian L deduced from P via relation (1.12): for any (x, z) E X x Z , by (1.14), one has p L ( x ,2) := (-L,)'(Z), hence p L ( x ,x) = P,CC(x). Thus, when (S) holds, it is possible to characterize the family of subperturbations which are obtained from a sub-Lagrangian and the subLagrangians which are obtained from a sub-perturbation via the preceding processes.
1.4 When (S) holds, a sub-perturbation P is obtained from COROLLARY i for each x E X one has P, = PC , C and a sub-Lagrangian if, and only z a sub-Lagrangian L is obtained from a sub-perturbation i i and only if, for each x E X one has L, = -(-Lx)CC. Proof. When P is deduced from L by the preceding process, for each hence PC , C = (-L,)CCC = (-L,)' = P,. x E X one has P, = (-L,)', Conversely, when for each x E X one has P, = P,CC,setting L, := -Pi, one has P, = (--L,)' and P is deduced from L by the preceding process. The proof of the second assertion is similar.
4.3
Perturbing a constraint
Let us illustrate the passages we have just described by an important example as in Balder (1977), Dolecki and Kurcyusz (1978), Kurcyusz (1975), and Pallaschke and Rolewicz (1997, 1998), in which the admissible set A is the value F(Oz) at some reference point Oz of a parameter space Z of some multimapping F : Z 4 X representing a means to perturb A into another feasible set. Such a formulation is quite versatile since it can be adopted whenever the feasible set depends on some parameters. Let G : X 4 Z be the inverse of F, so that A = {x E X : Oz E G(x)). Then it is natural to study the perturbed problem
(Pz1
minimize f (x) subject to x E F (2).
Then, its performance function p, given by p(x) := inf(f (x) : x E F ( z ) ) = inf(f (x): x E G(x))
14
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
is associated to the perturbation P given by
where LS is the indicator function of a set S. Assuming (F) holds, the dual problem consists in maximizing the dual function
From Proposition 1.4, this dual function coincides with the dual function dL deduced from the sub-Lagrangian L associated with the perturbation P which is given by
where the complementary function k is defined by k(x, y) := c(y, Oz) - sup{c(y, z ) : z E G(x)),
(x, y) E X x Y. (1.16)
The supremum in the second term can be called the c-support of G(x); it does not depend on the objective function f . When (S) holds and G(x) := C - g(x), where C is a subset of Y, g : X -+ 2, Z is a vector space and when c(y, .) is additive, the sub-Lagrangian takes the familiar form L(x, Y) = f (4+ C(Y, g ( 4 ) - sup{c(y, 4 : z E C). The following proposition gives a criterion in order that L be a Lagrangian. It is close to Dolecki and Kurcyusz (1978, Prop. 7.5), although the assumptions and the conclusion are different. As there, we say that a subset C of Z is c-convex if C is either Z or an intersection of chalf spaces, i.e. of sets of the form {z E Z : c(y, z) 5 r) for some (y, r) E Y x R. When the indicator function LC is c-convex, C is clearly c-convex.
PROPOSITION 1.6 Let ( P z ) ,the perturbation P and the sub-Lagrangian L be as above. Suppose that (S) holds and that for any y E Y and any X > 0 there exist y', y" E Y such that c(yf,.) 5 Xc(y, -) 5 c(yU,.). If for each x E X the set G(x) is a c-convex proper subset of 2 , then L is a Lagrangian. Proof. For each x E A = F(0) we have Oz E G(x), hence k(x, y) 5 0 for each y E Y. Since G(x) # Z , there exist some y E Y, r E R such that G(x) c {z : c(y, z) r ) . Ther? k(x, y) -r. Using our assumption on c, given F > 0, replacing y by some y' corresponding to some X 5 ~ / r , -E, hence SUPyEyk(x, y) = 0 for x E A. NOWfor we get k(x, y')
<
>
>
1
Unilateral Analysis and Duality
15
x E X \ A we have Oz 6 G(x), so that there exists y E Y and r E R such that 0 = c(y, Oz) > r 2 sup{c(y, z ) : z E G(x)). Replacing y by some y" E Y, it follows that k(x, .) takes values as large as required. Thus 0 SUP,, k(x, y) = +cm and L is a Lagrangian. In Dolecki and Kurcyusz (1978, Prop. 7.5) the set {c(y, +):y E Y) is supposed to be stable by homotheties and addition of constants; under F(x) = 0 it is proved that infZExSUP,, L(x, y) = the assumption inf (P), a weaker conclusion.
nzEz
5.
Examples of duality schemes
Let us present various examples showing the versatility of the two approaches. EXAMPLE1 (CONVEXDUALITY) Suppose Y, Z are locally convex topological vector spaces in duality. Then one can take for c the usual coupling (., .) and one recovers the familiar convex duality schemes. In particular, when (P) is given as in the preceding subsection, with G(x) = C -. g(x), with C a closed convex cone of Z with polar cone Y+, one gets the usual Lagrangian
EXAMPLE 2 (SUBAFFINE DUALITY, MART~NEZ-LEGAZ, 1988~,~; PENOT AND VOLLE,1987, 1988, 1 9 9 0 ~ , 2003,. ~, . . ) The following coupling function is more appropriate to the study of general quasiconvex problems. Given a locally convex topological vector space Z with dual space Z*: taking Y := Z* x R, this coupling is given by
where r A s := min(r, s) for r , s E R. Initiated in Penot and Volle (1990a), a full characterization of the class of CQ-convexfunction has been given in Martinez-Legaz (198813, Prop. 4.2): f : Z -+ is CQ-convexiff f is lower semicontinuous, quasiconvex and for any X < sup f ( 2 ) there exists a continuous affine function g such that g A X 5 f. EXAMPLE 3 (LOWERQUASICONVEX DUALITY) Suppose Y, Z are as in Example 1 and c is taken as
Note that c,(y, x) = CQ ((y, O), 2). When (P) is given as in Example 1, G(x) = C - g(x) with C a closed proper convex cone with polar Y+,
16
ESSA Y S AND SURVEYS IN GLOBAL OPTIMIZATION
since for any x E X , y E Y+, one has k(x, y) = inf{(y, g(x) - w)+ : w E C) = (y,g(x))+, while k(x, y) = 0 for x E X , y E Y\Y+ and one gets which is given by the lower Lagrangian L< : X x Y --+
Note that L< is a Lagrangian of ( P ) since L<(x,y) 5 f (x) for (x, y) E X x Y, L<(x, 0) = f (x) and, for z := g(x) E Z \ C one can find jj E Y+ , + m . It such that (jj, z) > 0, so that SUPyEy+(y, z)+ s ~ p ~ , ~ ( tZ)j j = has been used in Martinez-Legaz (1988b), Penot (2003b), and Penot and Zalinescu (2000).
>
Let us observe that since L< majorizes the usual Lagrangian L given in Example 1, the dual functional d< associated with L< majorizes the classical dual functional d associated with L, so that
the duality gap is reduced by using the dual problem (D_{L_<}) instead of (D_L). This fact is illustrated by the following sub-example: Let X = ℝ = Z, C = ℝ_−, and let f, g be given by f(x) = 3x, g(x) = 1 − x³. Then d(y) = −∞ for any y ∈ ℝ but d_<(y) = min(y − 2/√y, 3) for y ∈ ℝ_+ (with 1/0 = +∞), so that sup(D_L) = −∞, sup(D_{L_<}) = 3 = inf(P). The choice of c_< is more natural than it may appear at first glance. With this coupling function one has
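As a brief check of the stated formula (a computation of ours, using the expression of the lower Lagrangian above): for y > 0 and x ≥ 1 one has (y(1 − x³))_+ = 0, so the infimum of L_<(·, y) over [1, +∞) equals 3; for x < 1, the function x ↦ 3x + y(1 − x³) attains its minimum at x = −1/√y, with value y − 2/√y. Hence

    d_<(y) = inf_{x∈ℝ} (3x + (y(1 − x³))_+) = min(y − 2/√y, 3),

and sup_{y≥0} d_<(y) = 3 is reached for every y ≥ 4, so that there is indeed no duality gap for the lower Lagrangian in this example.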
>
+
EXAMPLE 4 (FUNCTIONAL DUALITY) Let Z be a set, let Oz be a fixed element of Z and let Y be a subset of the set RZ of real-valued functions on Z. When c: Y x Z -t R is the evaluation mapping given by c(y, z) := y(z) and when Y is a cone of IEZ, the condition on c of Proposition 1.6 is satisfied. If Z is a topological space, if for each x E X the set G(x) is a closed proper subset of Z and if Y contains the set of lower semicontinuous functions on Z with values in [ O , l ] , then the c-convexity of the sets G(x) is satisfied. If the space Z is uniformisable, then one can take for Y the space of continuous functions on Z (compare with FloresBazBn, 1995; Martinez-Legaz, 1988b). If Z is a metric space, one can
1 Unilateral Analysis and Duality
17
take for Y the space of Lipschitz functions (see Martinez-Legaz, 1988b; Pallaschke and Rolewicz, 1997; Penot , 2003c; Rolewicz, 1994). DUALITY, BERTSEKAS, 1982; EXAMPLE 5 (AUGMENTED LAGRANGIAN HESTENES,1975; MUANG AND YANG,2003; JANIN,1973; PENOT, AND VOLLE,1 9 9 0 ~ ROCKAFELLAR, ; 1 9 7 4 ~ RUBINOV ; 2002; PENOT AND YANG,2003; YANGA N D HUANG, 2001) Let Z be a metric space with base point Oz, and let c : Y x Z -+ R be a coupling function satisfying (S). Suppose one is given some firm function k: Z + R+ i.e. a function such that k(Oz) = 0 and (z,) + Oz whenever (k(zn)) --+ 0. Then one can introduce the set Y = R+ x Y, and the coupling function 6: Y x Z + R defined by
Then the sub-Lagrangian L associated with 6 and a sub-perturbation P : X x Z + R, := R U {+oo) is called the augmented sub-Lagrangian since it is deduced from the sub-Lagrangian associated with c and the sub-perturbation P through relation (1.11) which turns out to ( ~ ( xz), - c(y, z) i ( x , y, r ) = zinf EZ
+ rk(z)),
(x, y, r) E X x Y x R.
+
The dual function d is given by d(y, r) = --p"y, +) = -(p rk)c(y). Under some growth condition one can obtain the zero duality gap property whenever p is lower semicontinuous at Oz (see Huang and Yang, 2003; Penot, 2002; Penot and Volle, 1990b; Rockafellar, 1974a; Rubinov and Yang, 2003, . . .). Moreover, in the usual case of Example 1, the augmented dual.function d is smoother than d = -pC. EXAMPLE 6 (STARSHAPED DUALITY, PENOT, 1997c, 2001; RUBINOV, 2 0 0 0 ~ RUBINOV ; AND SIMSEK, 2000; RUBINOV A N D YAGUBLOV, 1986; SHVEIDEL,1997; ZAFFARONI,2 0 0 4 ~ , Let ~ ) Z be a convex cone in a normed vector space and let Y be the set of continuous superlinear functions on Z , c being the evaluation mapping given by c(y, z) := y (z). Then a nice and simple separation theorem (see Zaffaroni, 2004b, and the appendix) shows that the family of c-convex subsets containing 0 coincides with the family of closed starshaped subsets of Z , i.e. closed subsets S such that tx E S for any t E [O,1] and x E S .
A specialization of this duality occurs when Y = Rn, and Z = Rn+ and one considers the coupling function em defined on Y x Z by
18
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
When G is defined by G(x) = {x E JRn+ : x 2 g(x)), where g : X + Z is some mapping, a computation of k shows that the corresponding subLagrangian is given by:
It is clearly a Lagrangian. with the usual convention sup 0 = -a.
7 (MODULAR DUALITY) Given a set S, let Z be a subfamily EXAMPLE of the set Pf(S)of finite subsets of S endowed with the order given by inclusion and let Y be a subset of the space of real-valued functions on S. Assume 0 E Z and set for y E Y, x E 2,
with the convention c(y, 0) = 0. This example is important for discrete optimization: when Z is a distributive sublattice of Pf(S),c-linear functions correspond to modular functions, i.e. functions f : Pf (S)+ JR satisfying
These functions are very simple as they are determined by their values on 0 and the singletons:
Any function f : Pf (S)+ JR which is submodular, i.e. such that
can be extended to a convex function f : R: + R given by f(y) = ~ f Xi f=(Ai)~ if y = x i k= , AilA, with Xi E R+\{O), Ak C Ak-1 C . . . C A1 are distinct subsets (see LovAsz, 1983, Prop. 4.1). The fact that one disposes of a sandwich theorem (Frank, 1982) and of duality results (Fujishige, 1984; Martinez-Legaz, 1995) points out the analogies with convexity.
(2,s)
be a preordered space EXAMPLE 8 (HOMOTONE DUALITY) Let and let Y be the set of homotone (i.e. nondecreasing) functions on Z. Then the evaluation mapping (y, z) H y(z) is a coupling. The special case Z = Rd, Y being the set of superadditive homotone functions null at 0 is considered in Tind and Wolsey (1981); for positively homogeneous homotone functions see Martinez-Legaz and Rubinov (2001),
1
19
Unilateral Analysis and Duality
Martinez-Legaz et al. (2002), Rubinov and Glover (1997, 1998a,b, 1999), and Rubinov et al. (1999a,b, 2000, 2002). The important case Z is a distributive lattice and Y is formed of modular or submodular functions is also included in the present example. EXAMPLE 9 (SHADYCONJUGACY) Let Z be a n.v.s. and let Y be its dual. Let cV and cV be the couplings given by cV(y,z) = - L [ (z) ~ ~ ( Z ) , for a subset S of Z , is the indicator and z) = - L [ ~ < ~ ~where function of S and [y 5 11 (resp. [y < 11) stands for {z E Z : (y, z) 5 1) (resp. {z E 2:(y, z) < 1)). For such examples condition ( F ) is satisfied, and these conjugacies have some interest (see Penot, 1997c, 2005; Thach, 1991, 1993, 1994, 1995; Volle, 1985). EXAMPLE10 (PARTIALSUBLEVEL DUALITY)Let Z be a n.v.s. and let
Y := Z* x IR- . Let cC be given by cC(z, (z*,r)) = -L[,.>,~ (2). Then condition (F) is not satisfied for this coupling which has been used in Penot (2000), Singer (l986), Volle (1985).
11 (INTEGRAL DUALITY) Let p 2 1, let (S,S,a) be a meaEXAMPLE sured space (in brief S), let E be a separable Banach space, and for i = 0 , . . . , k, let Fi: S x E t IfB be a measurable integrand, with associated integral functional fi on X := Lp(S,E) given by fi(x) := fi (s,x(s)) d a ( ~ for ) x E Lp(S,E). Suppose f,(x) > - m for each x E X and the Slater condition: {x E X : fo(x) < + m , fi(x) < 0, i = I ,..., k ) # 0 holds. Let f = fo and A := {X E X : fi(x) 5 0 , i = 1 , . . . , k). Then, setting for (x, y) E X x IRk
&
L(x,Y)= f ( x )
+ ylfl(x) + + ykfk(x), '"
the existence of multipliers can be ensured under appropriate assumptions (Aubin and Ekeland, 1975; Bourass and Giner, 2001) although ( P ) is a nonconvex problem. Other examples are given in Balder (1977), Briec and Horvath (2004), Briec et al. (2OO5), Martinez-Legaz (1988b), Penot (2002), Penot (2000), Rubinov (2000a,b), Rubinov and Andramonov (1999), Rubinov and Glover (1997, 1998a,b, 1999), and Rubinov et al. (1999a,b, 2000, 2002) Rubinov and Simsek (1995a,b, 2000), Rubinov and Yagublov (l986), Rubinov and Yang (2OO3), Shveidel (1997), Simons (1994, l998), Singer (1986, 1997), and Thach (1991, 1995) for instance.
6.
Multipliers and subdifferentials
In the next result we suppose L is the sub-Lagrangian associated with a sub-perturbation P and a coupling c and we compare the set M of multipliers of L with dCp(Oz). Again we suppose assumption (F) holds.
~
~
20
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
PROPOSITION 1.7 Let P be a sub-perturbation of ( P ) and let L be the sub-Lagrangian associated with P :
Then, M c aCp(OZ)and one has p(Oz) = inf(P) if M is nonempty. When p(OZ) = inf ( P ) one has
Proof. Let y E M. Then, by Definition 1.1, Propositions 1.4 and 1.1 inf(P) = sup(VL) = sup(Vp) 5 p(Oz) I inf(P) and equality holds. Moreover, for each z E Z , one has
y is a multiplier, hence fj E aCp(OZ). Now suppose p(Oz) = inf(P) and let jj E aCp(Oz),so that for every x E X, z E Z one has as
Thus, one gets for x E X L(x, y) := inf ( P ( x ,z) - c(y, z) zEZ
+ c(y, OZ)) 2 p(Oz) = inf (P)
and infZExL(x, y) 2 inf(P). Since the opposite inequality always holds in view of weak duality, this inequality is an equality and one gets y E
M.
0
REMARK1 (a) The preceding result can also be deduced from Propositions 1.1, 1.3 and 1.4. Since L is deduced from the sub-perturbation P, it is a sub-Lagrangian of ( P ) and the solution set SIT, to (Vp) is also the solution set SL to (DL). Given ji E M we have sup(VL) = inf(P) = p(OZ) and thus, by Proposition 1.3, S; coincides with aCp(Oz). For the converse, we note that when dCp(Oz) is nonempty we have aCp(Oz)= aCpCC(Oz) and pCC(Oz)= ~ ( O Z ) , so that, when p(Oz) = inf(P), there is no duality gap and then again aCp(O.z)= S; = M . (b) The equality inf(P) = p(OZ) may occur for a sub-perturbation which is not a perturbation.
1 Unilateral Analysis and Duality
21
EXAMPLE 12 In the setting of Example 3, c being given by (1.17), we have seen that we get the lower subdifferential, or Plastria subdifferential, given for Oz = 0 by
A variant of this subdifferential is the infra-difleerential, or Gutikrrez subdifferential, given by dSp(0z) := {y E Y : 'dz E
i p(Oz)],p(z) 2 p(Oz) + (Y, 2 - 02))
where 5 p(Oz)] := p-' ( ] - - m , p ( ~ ~ ) ]Clearly, ). one has d
+
7.
+
Relationships with calmness and penalization
Now let us deal with links between multipliers and the notion of calmness, assuming Z is a metric space and assumption (F)holds for some coupling function c: Y x Z -+ @. Recall that a function q: Z -+ is said to be calm (at Oz) if q(Oz) is finite and if there is a constant k E R+ such that 4(2) 2 q(0z) - kd(z, Oz) b'z E 2. (1.19) Such a notion has been widely used (see Clarke, 1983j. In the present section we consider it for the performance function p associated with some sub-perturbation P.of the primal problem ( P ) .
22
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
The first assertion of the following proposition corresponds to a familiar situation; the other two are rather unusual.
PROPOSITION 1.8 (a) Suppose that the set M of multipliers for the sub-Lagrangian associated with a sub-perturbation P is non empty, and that for some y E M the function c(y, .) is calm. Then the performance function p is calm. (b) If p is calm, if p(Oz) = inf(P) and if for any k E R+ the function -kd(., Oz) is c-subdiflerentiable at Oz, then the set M of multipliers is non empty. (c) If p is calm and c-convex, if p(Oz) = inf(P) and if any calm cconvex function is c-subdiflerentiable at Oz, then the set M of multipliers is non empty. Proof. (a) Let y E M be such that c(y, .) is calm. Then, by Proposition 1.7, we have y E dCp(Oz)or p(-) 2 p(Oz) c(y, -) - c(y, Oz). Taking k E R+ such that (1.19) is satisfied for q := c(y, .), we get (1.19) for 4 = P(b) Suppose (1.19) is satisfied for q := p and p(Oz) = inf(F'). Associating to Ic E R+ some y E dC( - kd(., OZ)) (OZ) i.e. some y E Y such that -kd(.,Oz) 2 4 ~ 1 . ) c(y,Oz), we get
+
for each x E Z. Since the assumption of the last assertion of Proposition 1.7 is satisfied, we have y E dCp(Oz)c M. (c) Our assumptions ensure that aCp(Oz) is non empty. Moreover, since p(Oz) = inf (P),we have dCp(Oz)c M by Proposition 1.7. 0 It is easy to check calmness of the functions c(y, .) (y E Y) in Examples 1 and 2. The condition of assertion (b) is satisfied in the case of Example 6: for c = c, one can take the vector y = (k, k , . . . , k) . In the case of Example 1, the sandwich theorem ensures that any convex function which is calm at Oz is subdifferentiable. We now use a simple sufficient condition for calmness to get existence of multipliers. For other results in that vein see Lasserre (1980), Penot (2002) and their bibliographies. Here we suppose X and Z are metric spaces and that the perturbation is given as in (1.Is), with A = F - ' ( 0 ~ ) . The distance of x E X to A is denoted by d(x, A) := infaEAd(x, a). PROPOSITION 1.9 Assume X is a metric space, the objective function f is Lipschitxian on X and there exists a constant K > 0 such that
1
Unilateral Analysis and Duality
23
Then p is calm. Thus, if the assumptions of assertions (b) or (c) of the preceding proposition are satisfied, the set M of multipliers is non empty. Proof. Let K' > K and let z E Z\{Oz). It follows from (1.20) that for each x E F ( z ) there exists some v E A such that d ( x ,v ) 5 ~ ' d ( Oz z, ) . If X is the Lipschitz rate of f , we obtain that
Taking the infimum over x E F ( z ) we get p(z) 2 p(Oz) - d X d ( z ,0 2 ) . Taking the supremum over K' > K , it follows that (1.19) holds for q = p with k = KX. 0 In the next result we point out a link between general multipliers and penalization: the knowledge of some multiplier gives some information about the threshold of the penalization coefficient. Again we consider problem ( P ) , with A = G V 1 ( o z ) . For r E R+, we denote by fr the penalized fwction given by
Note that it is much easier to use this function than the function x H f ( x ) kd(x,A) which is the natural penalized function. When relation (1.20) holds, taking the infimum over z E G ( x ) , one can pass from f kd(., A ) to f , since d(.,A ) 5 ~d ( G ( . )0, 2 ) This observation points out the importance of (1.20) which has been the object of numerous studies.
+
+
PROPOSITION1.10 Suppose Y and Z are metric spaces with base points O y and O z respectively. Suppose the set M of multipliers is nonempty and that for some y > 0 the coupling function c satisfies
Then, for r 2 yd(Oy, M ) , one has inf f r ( X ) = inf f ( A ) . Proof. Let x E X, y E M and let z E G ( x ) . Using the notation of relation (1.16),one has
hence
24
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
inf f (A) = inf L ( X , y) = inf (f (x') %/EX
+ k(x', Y))
I f (x)
Taking the infimum over y E M and z E G(x) one gets, for r $(OY 7 M ) ,
>
Since f,(x) = f (x) for x E A, we get inf f (A) = inf f, (X). For related results, see Bonnans (1990), Lasserre (1980), Penot (1992, 2002), Thibault (1995), Yang and Huang (2001), Ye et al. (1997) and their references. Examples of coupling functions satisfying relation (1.22) are given in Penot (2003~).
Appendix In this appendix we present a simple proof of the assertion of Example 6, inspired by Zaffaroni (2004a); see Rubinov and Simsek (2000), Shveidel (1997), Zaffaroni (2004b) for related material.
PROPOSITION 1.11 A subset S of a normed space Z is closed and starshaped if, and only iJ it contains 0 and is c-convex, where c is the evaluation mapping c: Y x Z --t R given by c(y, z) = y(z) and Y is the set of continuous superlinear functions on Z. Proof. If S := L: r], where p is a continuous superlinear function on Z and r E R, and if 0 E S, then one has r 0 and S is clearly closed and starshaped. Since these properties are preserved by taking intersections, the condition is sufficient. In order to prove it is necessary, let us show that if S is a closed and starshaped subset of Z and if 2 E Z\S, then, there exist some continuous superlinear function p on Z and r E R+ such that p(z) L: r for each z E S and p(2) > r . Since S is closed, there exists some s E ( 0 , l ) and some r > 0 such that the open ball V with center w := s2 and radius r does not intersect S . Let C := (0, + m ) V be the cone generated by V. Since S is starshaped, we have S f l [I, +oo)V = 0.Moreover, we have W + C C[ 1 , + m ) V s i n c e f o r a n y t > O , v ~ Vw, e h a v e w + t v = (1+t)v1 for v' := (1 t)-'w t ( l t)-'v E V, by convexity of V. Therefore, S n (w C ) = 0.Since 2 = s-'w = w (s-l - l)w E w C , the result is a consequence of the following lemma of independent interest. 0
>
+
+
+
+
+
+
LEMMA1.1 Given an open convex cone C # Z of a normed space Z and w E C there exists some continuous superlinear function p on Z such that p(w) = 1 and w C = {z E Z : p(z) > 1).
+
1 Unilateral Analysis and Duality
Proof. Let C+ := (43)'be the dual cone of C and let
K := {y E C + : (y, w) = 1). Let r > 0 be such that B ( w , r ) c C. Then, for each y E K and each z E B(0, r ) , we have w - z E C , hence
so that llyll I r-l. Thus, K is weak* compact. Let p be given by p(z) := infuEK(y,2). Then, the compactness of K ensures that p(z) > 1 for each z E w + C since for each y E K one has C c {u E Z : (y,u) > 0) and since there exists some y E K such that p(z) = (y, z). Now let us show that p(x) I 1 for each x E Z\(w C ) . Then x - w C. By the Hahn-Banach theorem, one can find some y E Z*\{O) such that
+
4
Since C is a cone we have y E C f . Moreover, since C is open and w E C we note that (y, w) > 0. Without loss of generality we may assume (y, w) = 1. Thus y E K and p(x) 5 (y, x) 5 (y, w) inf{(y, c) : c E C) = 1. 0
+
One may wonder whether the function p given by p(z) := d ( z , Z\C) which is obviously superlinear could be of some use for such a matter (shrinking C if necessary).
Acknowledgements The author thanks C. Ziilinescu for his accurate criticisms and remarks.
References Aubin, J.-P. (1979). Mathematical Methods of Games and Economic Theory. North-Holland, Amsterdam. Aubin, J.-P. and Ekeland, I. (1975). Minimisation de crithres intB graux. Comptes Rendus de 1'Acade'mie des Sciences Paris. Se'rie A-B, 281(9):A285-A288. Aubin, J.-P. and Ekeland, I. (1984). Applied Nonlinear Analysis. Wiley, New York. Auslender, A. (2000). Existence of optimal solutions and duality results under weak conditions. Mathematical Programming, 88(1):A45-A59.
Az6, D. (1997). Ele'ments d 'analyse convexe et variationnelle, Ellipses, Paris.
26
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Bachir, H., Daniilidis, A., and Penot, J.-P. (2002). Lower subdifferentiability and integration. Set-Valued Analysis, 10(1):89-108. Balder, E.J. (1977). An extension of duality-stability relations to nonconvex problems. SIAM Journal on Optimization, 15:329-343. Balder, E.J. (1980). Nonconvex duality-stability relations pertaining to the interior penalty method. Mathematische Operationsforschung und Statistik (Serie Optimization), 11:367-378. Ben-Tal, A. and Teboulle, M. (1996). A conjugate duality scheme generating a new class of differentiable duals. SIAM Journal on Optimization, 6(3):617-625. Bertsekas, D .P. (l975). Necessary and sufficient conditions for a penalty function to be exact. Mathematical Programming, 9:87-89. Bertsekas, D.P (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. Birkhoff, G. (1966). Lattice Theory. American Mathematical Socity, Providence, RI. Bonnans, J.F. (1990). Thkorie de la penalisation exacte. RAIRO Mode'lisation Mathe'matique et Analyse Nume'rique, 24(2):197-210. Bonnans, J.F. and Shapiro, A. (2000). Perturbation Analysis of Optimization Problems. Springer Series in Operations Research, Springer, New York. Borwein, J.M. and Lewis, A.S. (2000). Convex Analysis and Nonlinear Optimization. CMS Books in Mathematics. Springer-Verlag, New York. Bourass, A. and Giner, E. (2001). Muhn-Tucker conditions and integral functionals. Journal of Convex Analysis, 8:533-553. Breckner, W.W. and Kassay, G. (1997). A systematization of convexity concepts for sets and functions. Journal of Convex Analysis, 4(1):109127. Briec, W. and Horvath, Ch. (2004). B-convexity. Optimization, 53(2): 103-127. Briec, W., Horvath, Ch., and Rubinov, A. (2005). Separation in BConvexity. Forthcoming in: Pacific Journal of Optimization.
1 Unilateral Analysis and Duality
27
Clarke, F.H. (1983). Optimization and Nonsmooth Analysis. Wiley, New York. Clarke, F.H. and Nour, C. (2005). Nonconvex duality in optimal control. Forthcoming in: SIAM Journal on Control and Optimization. Crouzeix, J.-P. (1977a). Contribution B l'Btude des fonctions quasiconvexes. Thbe d j ~ t a tUniversitk , de Clermont 11. Crouzeix, J.-P. (197713). Conjugacy in quasiconvex analysis. In: Convex Analysis and its Applications (A. Auslender, ed.). Lecture Notes in Economics and Mathematica.1 Systems, vol. 144, pp.66-99, SpringerVerlag, Berlin. Crouzeix, J.-P. (2003). La convexit6 g6nBralisBe en Bconomie mathkmatique. ESAIM: Proceedings, 13:31-40. Daniilidis, A. and Martinez-Legaz, J .E. (2002). Characterization of evenly convex sets and evenly convex functions. Journal of Mathematical Analysis and Applications, 273:58-66. Dolecki, S. and Kurcyusz, S. (1978). On @-convexity in extremal problems. SIAM Journal of Control and Optimization, 16:277-300. Dolecki, S. and Rolewicz, S. (1979). Exact penalties for local minima. SIAM Journal of Control and Optimization, 17(5):596-606. Eberhard, E., Nyblom, M., and Ralph, D. (1998). Applying generalised convexity notions to jets. In: Generalized Convexity, Generalised Monotonicity: Recent Results (J.-P. Crouzeix et al., eds.), pp. 111-157 Kluwer, Dordrecht . Eberhard, A. and Nyblom, M. (1998). Jets, generalised convexity, proximal normality and differences of functions. Nonlinear Analysis, 34:319-360. Ekeland, I. and Tkmam, R. (1976). Analyse Convexe et Problimes Variationnels. Dunod, Paris; English transl., North Holland, Amsterdam. Evtushenko, Yu., Rubinov, A.M., and Zhadan, V.G. (2001). General Lagrange-type functions in constrained global optimization, 11. Exact auxiliary functions. Optimization Methods and Software, 16:231-256. Fan, K. (1956). On systems of linear inequalities. Annals of Mathematics Studies, 38:99--156. Fan, K. (1963). On the Krein-Milman theorem. Proceedings of Symposia in Pure Mathematics, 7:2ll-219.
28
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Flachs, J. (1981). Global saddle-point duality for quasi-concave programs. I. Mathematical Programming, 20:327-347. Flachs, J . (1982). Global saddle-point duality for quasi-concave programs. 11. Mathematical Programming, 24:326-345. Flachs, J . and Pollatschek, M.A. (1969). Duality theorems for certain programs involving minimum or maximum operations. Mathematical Programming, l6:348-370. Flores-BazBn, F. (1995). On a notion of subdifferentiability for nonconvex functions. Optimization, 33:l-8. Flores-BazBn, F. and Martinez-Legaz, J.E. (1998). Simplified global optimality conditions in generalized conjugation theory. In: J.-P. Crouzeix, J.E. Martinez-Legaz, and M. Volle (eds.), Generalized Convexity, Generalized Monotonicity, pp. 303-329. Kluwer, Dordrecht. Fujishige, S. (1984), Theory of submodular programs: A Fenchel-type min-max theorem and subgradients of submodular functions. Mathematical Programming, 29:142-155. Frank, A. (1982). An algorithm for submodular functions on graphs. Annals of Discrete Mathematics 16:97-120. Gauvin, J . (1982). The'orie de la Programmation Mathe'matique non Convexe. Les Publications CRM, Montrhal; Revised version: (1994), Theory of Nonconvex Programming; Les Publications CRM, Montrhal. Gauvin, J . and Janin, R. (1990). Directional derivative of the value function in parametric optimization. Annals of Operations Research, 27:237-252. Gauvin, J. and Tolle, J.W. (1977). Differential stability in nonlinear programming. SIAM Journal of Control and Optimization 15:294-311. Giannessi, F . (1984). Theorems of the alternative and optimality conditions. Journal of Optimization Theory and Applications, 42(3): 331365. Goh, C.J. and Yang, X.Q. (2001). Nonlinear Lagrangian theory for nonconvex optimization. Journal of Optimization Theory and Applications, 109(1):99-121. Greenberg, H.P. and Pierskalla, W.P. (1973). Quasiconjugate function and surrogate duality. Cahiers du Centre d ' ~ t u d ede Recherche Ope'rationnelle, 15:437-448.
1
Unilateral Analysis and Duality
29
Hayashi, M. and Komiya, H. (1982). Perfect duality for convexlike programs. Journal of Optimization Theory and Applications, 38:179-189; 107-113. Hestenes, M.R. (1975). Optimization Theory. The Finite Dimensional Case. Wiley-Interscience, New York. Hiriart-Urruty, J.-B. (1998). Optimisation et Analyse Convexe. Presses Universitaires de France, Paris. Horvath, C. (1990). Convexit6 g6n6ralis6e et applications. Mkthodes topologiques en analyse convexe, MontrBal, Qc, 1986), S6minaire de Mathkmatiques Sup&rieures,vol. 110, pp. 79-99, Presses de l'Universit6 de Montrkal, Montrkal, QC, 1990. Huang, X.X. and Yang, X.Q. (2003). A unified augmented Lagrangian approach to duality and exact penalization. Mathematics of Operations Research, 28(3):533-552. Ioffe, A.D. (2001). Abstract convexity and non-smooth analysis, Abstract convexity and non-smooth analysis. Advances in Mathematical Economics, vol. 3, pp. 45-61, Springer, Tokyo. Ioffe, A.D. and Rubinov, A.M. (2002). Abstract convexity and nonsmooth analysis: Global aspects. Advances in Mathematical Economics, vol. 4, pp. 1-23, Springer, Tokyo. Janin, R. (1973). Sur la dualitd en programmation dynamique. Comptes Rendus de 1'Acade'mie des Sciences. Se'rie A-B, A277:1195-1197. Komiya, H. (1981). Convexity on a topological space. Fundamenta Mathematicae, 11l(2):107-113. Kurcyusz, S. (1975). Some remarks on generalized Lagrangians. Optimization Techniques, 362-388. Kutateladze, S.S. and Rubinov, A.M. (1972). Minkowski duality and its applications. Russiari Mathematical Surveys, 27:137-191. Lasserre, J.B. (1980). Exact penalty function and Lagrange multipliers. RAIRO Automatique, 14(2):117-126. Laurent, P.-J. (1972). Approximation et Optimisation. Hermann, Paris. Li, D. (1995). Zero duality gap for a class of nonconvex optimization problems. Journal of Optimization Theory and Applications, 85:309324.
30
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Li, D. and Sun, X. (2001). Nonlinear Lagrangian methods in constrained nonlinear optimization. In: Optimization Methods and Applications, Applied Optimization, vol. 52; pp. 267-277, Kluwer, Dordrecht. Lindberg, P.O. (1980). A generalization of Fenchel conjugation giving generalized Lagrangians and symmetric nonconvex duality. Survey of Mathematical Programming, (Proceedings of the Ninth International Mathematical Programming Symposium, Budapest, l976), vol. 1, pp. 249-267, North-Holland, Amsterdam-Oxford-New York. LovAsz, L. (1983). Submodular functions and convexity. In: Mathematical Programming, the State of the Art (Bonn 1982) (A. Bachem, M. Grotschel, and B. Korte, eds.), pp. 235-257, Springer Verlag, Berlin. Luenberger, D. (1969). Optimization by Vector Space Methods. Wiley, New York. Magnanti, T.L. (1974). Fenchel and Lagrange duality are equivalent. Mathematical Programming, 7:253-258. Martinez-Legaz, J.-E. (1988a). Quasiconvex duality theory by generalized conjugation methods. Optimization, 19:603-652. Martinez-Legaz, J.-E. (1988b). On lower subdifferentiable functions. In: Trends in Mathematical Optimization (K.H. Hoffmann et al., eds), Internationale Schriftenreihe zur Numerischen Mathematik, vol. 84, pp. 197-232, Birkhauser, Basel. Martinez-Legaz, J.-E. (1995). Fenchel duality and related properties in generalized conjugation theory. International Conference in Applied Analysis (Hanoi, 1993), Southeast Asian Bulletin of Mathematics, 19:99-106. Martinez-Legaz, J.-E. and Rubinov, A.M. (2001). Increasing positively homogeneous functions defined on Rn.Acta Mathernatica Vietnam, 26(3):313-331. Martinez-Legaz, J.-E. Rubinov, A.M., and Singer, I. (2002). Downward sets and their separation and approximation properties. Journal of Global Optimization 23(2):111--137. Martinez-Legaz, J.-E. and Singer, I. (1990). Dualities between complete lattices. Optimization, 21:481-508. Martinez-Legaz, J.-E,. and Singer, I. (1995). Subdifferentials with respect to dualities. Zeitschriift fur Operations Research. Mathematical Methods of Operations Research 42(l):109-125.
1 Unilateral Analysis and Duality
31
Minoux, M. (1983). Programmation Mathe'matique. The'orie et Algorithmes, Dunod, Paris. Moreau, J.-J. (1964). ThBorkmes "inf-sup". Comptes Rendus de l'Acad6mie des Sciences. Paris, 258:2720-2722. Moreau, J.-J. (1970). Inf-convolution, sous-additivit6, convexit6 des fonctions numBriques. Journal de Mathe'matiques Pures et Applique'es, 4:109-154. Moreau, J.-J. (2003). Fonctionnelles Convezes. SBminaire sur les Bquations aux dBriv6es partielles, Collkge de France, Paris, 1967 Nour, C. (2004). Smooth and nonsmooth duality for free time problem. Optimal Control, Stabilization and Nonsmooth Analysis, Lecture Notes in Control and Information Sciences, vol. 301, pp. 323-331, Springer Verlag, Heidelberg. Oettli, W. (1985). RBgularisation et stabilit6 pour les problkmes "infsup". Mannh,eimer Berichte, 26:9-10. Oettli, W. and Schlager, D. (1998). Conjugate functions for convex and nonconvex duality. Journal of Gbbal Optimixation, 13(4):337-347. Oettli, W., Schlager, D., and ThBra, M. (2000). Augmented Lagrangians for general side constraints, Optimization (Namur, 1998) (Nguyen, Van Hien et al., eds.), Lecture Notes in Economics and Mathematical Systems, vol. 481, pp. 329-338, Springer, Berlin. Ore, 0 . (1944). Galois connexions. Transactions of the American Mathematical Society, 55:493-513. Pallaschke, D. and Rolewicz, S. (1997). Foundations of Mathematical Optimization. Mathematics and its Applications, vol. 388, Kluwer Academic Publishers Group, Dordrecht, 1997. Pallaschke, D. and R.olewicz, S. (1998). Penalty and augmented Lagrangian in general optimization problems. In: Charlemagne and his heritage. 1200 years of civilixation and science in Europe (Aachen, 1995), vol. 2, pp. 423-437. Brepols, Turnhout. Pchenitchny, B.N. (1990). Necessary conditions for an extremum, penalty functions and regularity. In: Perspectives in Control Theory (Sielpia, 1988), Progress in Systems and Control Theory, vol. 2, pp. 286-296. Birkhauser Boston, Boston, MA.
32
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Penot, J.-P. (1985). Modified and augmented Lagrangian theory revisited and augmented. Unpublished lecture, Fermat Days, Toulouse. Penot, J.-P. (1992). Estimates of the exact penalty coefficient threshold. Utilitas Mathematica, 42: 147-161. Penot, J.-P. (1995a). Analyse unilaterale et dualit& Communication to the Mode Meeting, Brest. Penot, J.-P. (l995b). Generalized convexity in the light of nonsmooth analysis. In: Recent developments in optimization (Dijon, 1994), Lecture Notes in Economics and Mathematical Systems, vol. 429, pp. 269290, Springer, Berlin, 1995. Penot, J.-P. (1997a). Generalized derivatives of a performance function and multipliers in mathematical programming. In: Parametric optimization and related topics, IV (Enschede, 1995), Approximation & Optimization, vol. 9, pp. 281-298, Lang, Frankfurt am Main. Penot, J.-P. (199713). Multipliers and generalized derivatives of performance functions. Journal of Optzmization Theory and Applications, 93(3):609-618. Penot, J.-P. (1997~).Duality for radiant and shady programs. Acta Mathematical Vzetnamica, 22(2):541-566. Penot, J.-P. (1999). Points de vue sur l'analyse de sensibilite en programmation mathkmatique. In: A. Decarreau, R. Janin, R. Philippe, and A. Pietrus (eds.), Actes des sixiimes journe'es du groupe mode), pp. 176-203, Editions Atlantiques, Poitiers. Penot , J .-P. (2000). What is quasiconvex analysis? Optimization, 47:35110. Penot, J.-P. (2001). Duality for anticonvex problems. Journal of Global Optimization, 19:163-182. Penot , J.-P. (2002). Augmented Lagrangians, duality and growth conditions. Journal of Nonlinear Convex Analysis, 3(3):283-302. Penot, J.-P. (2003a). Characterization of solution sets of quasiconvex programs. Journal of Optimixation Theory and Applications, 117(3):627--636. Penot , J.-P. (2OO3b). A Lagrangian approach to quasiconvex analysis. Journal of Optimixation Theory and Applications, 117(3):637-647.
1
Unilateral Analysis and Duality
33
Penot , J.-P. (2003~).Rotundity, smoothness and duality. Control and Cybernetics, 32(4):711--733. Penot, J.-P. (2005). The bearing of duality on microeconomics. Advances in Mathematical Economics, 7:113-139. Penot, J.-P. and Quang, P.H. (1993). Cutting Plane Algorithms and Approximate Lower Subdifferentiability. Fortcoming in Journal of Optimization Theory and Applications. Penot, J.-P. and Rubinov, A.M. (2005). Multipliers and General Lagrangians. Submitted. Penot, J.-P. and Sach, P.H. (1997). Generalized monotonicity of subdifferentials and generalized convexity. Journal of Optimization Theory and Applications, 94(1):251-262. Penot, J.-P. and Volie, M. (1987). Dualit6 de Fenchel et quasi-convexit6. Comptes Rendus des Se'ances de I'Acade'mie des Sciences. Se'rie I. Mathe'matique, 304(13):371-374. Penot, J.-P. and Volle, M. (1988). Another duality scheme for quasiconvex problems. In: K.H. Hoffmann et al. (eds.), Trends in Mathematical Optimzzation, pp. 259-275. Internationale Schriftenreihe zur Numerischen Mathematik, vol. 84, Birkhauser, Basel. Penot, J.-P. and Volle, M. (1990a). On quasi-convex duality. Mathematics of Operations Research, 15(4):597-625. Penot, J.-P. and Volle, M. (1990b). On strongly convex and paraconvex dualities. In: A. Cambini, E. Castagnoli, L. Martein, P. Mazzoleni, S. Schaible (eds.), Generalized convexity and fractional programming with economic applications (Pisa, Italy, 1988), pp. 198-218. Lecture Notes in Economics and Mathematical Systems, vol. 345, Springer Verlag, Berlin. Penot, J.-P. and Volle, M. (2003). Surrogate programming and multipliers in quasiconvex programming. SIAM Journal of Control Optimization, 42(6):1994--2003. Penot, J.-P. and Zaiinescu, C. (2000). Elements of quasiconvex subdifferential calculus. Journal of Convex Analysis, 7:243-269. Plastria, F. (1985). Lower subdifferentiable functions and their minimization by cutting plane. Journal of Optimization Theory and Applications, 46(1):37-54.
34
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Pshenichnii, B. and Daniline, Y. (1977). Me'thodes nume'riques duns les problimes d 'extre'mum. Mir, Moscow. Rockafellar, R.T. (1974a). Augmented Lagrange multiplier functions and duality in nonconvex programming, SIAM Journal of Control Optimization, 12:268-285. Rockafellar, R.T. (1974b). Conjugate Duality and Optimization. Lectures given at the Johns Hopkins University (Baltimore, 1973), Conference Board of the Mathematical Sciences Regional Conference Series in Applied Mathematics, no. 16, Society for Industrial and Applied Mathematics, Philadelphia, Pa. Rockafellar, R.T. (1982). Lagrange multipliers and subderivatives of optimal value functions in nonlinear programming. Nondifferential and variational techniques in optimization (Lexington, 1980), Mathematical Programming Study, 17:28-66. Rockafellar, R.T. (1993). Lagrange multipliers and optimality. SIAM Review, 35:183-238. Rockafellar, R.T. and Wets, R.J.-B. (1997). Variational Analysis. Springer, Berlin. Rolewicz, S. (1994). Convex analysis without linearity. Control and Cybernetics, 23:247-256. Rolewicz, S. (1996). Duality and convex analysis in the absence of linear structure. Mathematica Japonica, 44:165-182. Rubinov, A.M. (2000a). Abstract Convexity and Global Optimixation. Kluwer, Dordrecht. Rubinov, A.M. (2000b). Radiant sets and their gauges. In: V. Demyanov and A.M. Rubinov (eds.), Quasidiflerentiability and Related Topics. Kluwer, Dordrecht. Rubinov, A.M. and Andramonov, M. (1999). Lipschitz programming via increasing convex-along-rays functions. Optimization Methods and Software, 10:763-781. Rubinov, A.M. and Glover, B .M. (1997). On generalized quasiconvex conjugation. In: Y. Censor and S. Reich (eds.), Recent Developments in Optimization Theory and Nonlinear Analysis, pp. 199-216. American Mathematical Society, Providence, RI.
1 Unilateral Analysis and Duality
35
Rubinov, A.M. and Glover, B.M. (1998a). Quasiconvexity via two steps functions. In: J.-P. Crouzeix, J.-E. Martinez-Legaz and M. Volle (eds.), Generalized Convexity, Generalized Monotonicity, pp. 159-183. Kluwer Academic Publishers, Dordrecht. Rubinov, A.M. and Glover, B.M. (1998b). Duality for increasing positively homogeneous functions and normal sets. RAIRO Recherche Opt!rationnelle, 32:105-123. Rubinov, A.M. and Glover, B.M. (1999). Increasing convex-along-rays functions with applications to global optimization. Journal of Optimization Theory and Applications, 102:615-642. Rubinov, A.M., Glover, B.M., and Yang, X.Q. (1999a). Extended Lagrange and penalty functions in continuous optimization. Optimization, 46:327-351. Rubinov, A.M., Glover, B.M., and Yang, X.Q. (1999b). Decreasing functions with applications to optimization. SIAM Journal on Optimization, 10(1):289-313. Rubinov, A.M., Glover, B.M., and Yang, X.Q. (2000). Nonlinear unconstrained optimization methods: A review. In: Progress in optimization (Perth, 1998),pp. 65-77. Applied Optimization, vol. 39, Kluwer Acad. Publ., Dordrecht. Rubinov, A.M., Huang, X.X., and Yang, X.Q. (2002). The zero duality gap property and lower semicontinuity of the perturbation function. Mathematics of Operations Research, 27:775-791. Rubinov, A.M. and Simsek, B. (1995a). Conjugate quasiconvex nonnegative functions. Optimization, 35:l-22. Rubinov, A.M. and Simsek, B. (199513). Dual problems of quasiconvex maximization. Bulleti~of the Australian Mathematical Society, 51:139-144. Rubinov, A.M. and Simsek, B. (2000). Separability of star-shaped sets with respect to infinity. Progress in Optimization (Perth, 1998) (X. Yang et al:, eds.), Applied Optimization, vol. 39, pp. 45-63, Kluwer Acad. Publ., Dordrecht. Rubinov, A.M. and Yagublov, A.A. (1986). The space of star-shaped sets and its applications in nonsmooth optimization. Mathematical Programming Study, 29: 176-202.
36
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Rubinov, A.M. and Yang, X.Q. (2003). Lagrange-Type Functions in Constrained Nonconvex Optimization. Kluwer Academic Publishers. Shveidel, A. (1997). Separability of star-shaped sets and its application to an optimization problem. Optimization, 40:207-227. Simons, S. (1994). A flexible minimax theorem. Acta Mathematica Hungarica, 63(2):119-132. Simons, S. (1998). Mznimax and Monotonicity. Lecture Notes in Mathematics, vol. 1693, Springer, Berlin. Singer, I. (1986). Some relations between dualities, polarities, coupling functions and conjugations. Journal of Mathematical Analysis and Applications, 1l5:1-22. Singer, I. (1997). Abstract Convex Analysis. John Wiley, New York. Sion, M. (1954). On the existence of functions having given partial derivatives on a curve. Transactions of the American Mathematical Society, 77:179-201. Sion, M. (1958). On general minimax theorems. Pacific Journal of Mathematics, 8:171-176. Thach, P.T. (1991). Quasiconjugate of functions, duality relationships between quasiconvex minimization under a reverse convex constraint and quasiconvex maximization under a convex constraint and application. Journal of Mathematical Analysis and Applications, 159:299-322. Thach, P.T. (1993). Global optimality criterion and a duality with a zero gap in nonconvex optimization. SIAM Journal on Mathematical Analysis, 24(6):1537-1556. Thach, P.T. (1994). A nonconvex duality with zero gap and applications. SIAM Journal on Optimization, 4(1):44-64. Thach, P.T. (1995). Diewert-Crouzeix conjugation for general quasiconvex duality and applications. Journal of Optimization Theory and Applications, 86(3):719-743. Thibault, L. (1995). Proprie'tks des sous-diflrentiels de fonctions localement lipschitziennes de'finies sur un espace de Banach skparable. Applications. Ph.D. Thesis, Universit6 Montpellier. Tind, J. and Wolsey, L.A. (1981). An elementary survey of general duality theory in mathematical programming. Mathematical Programming, 21:241-261.
1
Unilateral Analysis and Duality
37
Volle, M. (1985), Conjugaison par tranches. Annali di Matematica Pura ed Applicata. Series IV, 139:279-312. Yang, X.Q. and Huang, X.X. (2001). A nonlinear Lagrangian approach to constrained optimization problems. SIAM Journal on Optimixation, 11(4):1119-1144. Ye, J.J., Ye, X.Y., and Zhu, Q.J. (1997). Exact penalization and necessary optimality conditions for generalized bilevel programming problems. SIAM Journal on Optimixation, 7:481-507. Zaffaroni, A. (2004a). Is every radiant function the sum of quasiconvex functions? Mathematical Methods of Operations Research, 59:221-233. Zaffaroni, A. (2004b). A Conjugation Scheme for Radiant Functions. Preprint, University di Lecce. ZMinescu, C. (2002). Convex Analysis i n General Vector Spaces. World Scientific, Singapore.
Chapter 2
MONOTONIC OPTIMIZATION: BRANCH AND CUT METHODS Hoang Tuy Faiz Al-Khayyal Phan Thien Thach Abstract
1.
Monotonic optimization is concerned with optimization problems dealing with multivariate monotonic functions and differences of monotonic functions. For the study of this class of problems a general framework (Tuy, 2000a) has been earlier developed where a key role was given to a separation property of solution sets of monotonic inequalities similar to the separation property of convex sets. In the present paper the separation cut is combined with other kinds of cuts, called reduction cuts, to further exploit the monotonic structure. Branch and cuts algorithms based on an exhaustive rectangular partition and a systematic use of cuts have proved to be much more efficient than the original polyblock and copolyblock outer approximation algorithms.
Introduction
Monotonic optimization, or more generally d.m. (differences of monotonic) optimization, is concerned with nonconvex optimization problems described by means of monotonic and d.m. functions. This class of problems is very wide and includes a large majority of nonconvex problems encountered in the applications, such as: multiplicative programming, quadratically constrained quadratic optimization, polynomial optimization, posynomial optimization, Lipschitz optimization, fractional programming, generalized fractional programming, etc. (Tuy, 2000a; Rubinov et al., 2001; Phuong and Tuy, 2002, 2003; Luc, 2001; Tuy et al., 2004; Tuy and Luc, 2000; Tuy and Nghia , 2003, . . . .) For the numerical study of this class of problems a theory of d.m. optimization (Tuy, 2000a) has been recently developed which shares many common features with d.c. optimization (Tuy, 1995). Just as the ba-
40
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
sic problem of d.c. optimization is convex maximization under convex constraints, the basic problem of d.m. optimization is maximization of a monotonic function under monotonic constraints. In d.c. optimization the separation property of convex sets in its various forms plays a fundamental role. Similarly, in monotonic optimization a key role is given to a separation property of normal sets (i.e. upper level sets of increasing functions), where however, separation is achieved by a cone congruent to the nonnegative orthant rather than by a halfspace. Based on this specific separation property, normal sets can be approximated as closely as desired by particular geometric objects called "polyblocks" (the analogues of polytopes), so that monotonic optimization problems can be solved by polyblock outer approximation algorithms analogous to polyhedral outer approximation algorithms for convex maximization. Limited computational experience with polyblock algorithms has shown that they work quite well for problems whose equivalent monotonic formulation has relatively small dimension n, typically n 5 10 (Rubinov et al., 2001; Tuy and Luc, 2000; Tuy et al., 2004; Luc, 2001; Phuong and Tuy, 2002). However, for n 2 5 the algorithm often converges slowly near to the optimum and needs a large number of iterations to reach the desired accuracy. This phenomenon is quite common for outer approximation procedures. According to the outer approximation scheme, each new polyblock is derived from the previous one by cutting off some unfit portion disclosed by the separation property. But, as has been often observed, cuts of this kind usually become shallower and shallower in high dimension. One may wonder whether it is possible to make these cuts deeper by removing the whole removable portion and not only part of it as we did in the original polyblock algorithms. On the other hand, before initializing the polyblock and copolyblock algorithms in Tuy (2000a) a reduction operation is applied to the box containing the feasible set. This reduction actually results from two consecutive cuts, each removing the complement of a cone congruent to the negative or the positive orthant. The question arises as to whether monotonicity properties could be exploited more efficiently by combining reduction with separation cuts and branching to produce algorithms capable of handling larger problems than previously. The purpose of this paper is to investigate these possibilities and to develop branch and cut algorithms for monotonic optimization based on a systematic use of valid cuts exploiting the monotonic structure. In Section 2 we review some basic geometric concepts needed for the foundation of monotonic optimization, such as normal and conormal sets, their separation property, together with the approximation of these sets
Monotonic optimization
41
by polyblocks and copolyblocks. The cuts underlying this approximation are called separation cuts because they are generated by the specific separation property of normal and conormal sets. In Section 3 we introduce valid reduction cuts, which can be used for replacing a given box by a smaller one without losing any feasible solution contained in the box. Section 4 presents a refined version of the polyblock algorithm for solving the canonical monotonic optimization problem. The refinement consists in a maximal use of monotonicity cuts for tightening the outer approximating polyblocks and accelerating the convergence of the outer approximation procedure. As a result a new polyblock algorithm is developed which can also be considered a special branch and cut algorithm combining outer approximation with branching and range reduction. Since dimensionality is a formidable obstacle for every deterministic global optimization method, Section 5 discusses some techniques for transforming nonconvex optimization problems into monotonic optimization problems of reduced dimension and also for handling problems with many intermediate variables. A drawback of the polyblock approximation algorithm is that in the most general case it may be very difficult to obtain by this algorithm an adequate approximate optimal solution in finitely many iterations. To overcome this drawback, Section 6 presents a finite procedure for computing an adequate approximate optimal solution. Another potential difficulty with polyblock algorithms is that the collection of partition sets may quickly increase in size as the algorithm proceeds because at each iteration a partition set may be subdivided into n subsets (n = dimension of the problem). Therefore, in Section 7 a branch-reduce-and-bound algorithm for general d.m. optimization is presented in which successive partition always proceeds by bisection as in conventional rectangular algorithms. Finally, to illustrate the practicability of the proposed algorithms, Section 8 presents some numerical examples taken from the literature and known to be among the hardest test problems. Regarding the notation, throughout the paper, for two vectors x, y E Rn we write u r= x A y, v = x V y to mean ui = min(xi, yi), vi = max(xi, yi), Yi = 1,.. . ,n. For any finite set { x l , . . . ,xm) C Rn we k to mean ui = minixi,. . . x m i ), vi = write u = A ~ = ~vX=~Vg1x , max{xi,. . . ,xT) Y i = 1,.. . ,n.
2.
Basic concepts
We first briefly review some basic concepts of monotonic optimization (Tuy, 2000a).
42
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Increasing function, normal set and polyblock For any two vectors x, y E Rn we write x 5 y (x < y, resp.) to mean X i 5 yi (xi < yi, resp.) for every i = 1 , . . . , n. If a 5 b then the box [a, b] ((a, b], resp.) is the set of all x E Rn satisfying a 5 x 5 b (a < x 5 b, resp.). When x 5 y we also say that y dominates x. As usual, ei denotes the i-th unit vector of Rn, i.e. the vector such that ei = 1, e; = 0 V j # i, and e the vector of n ones, i.e. e = Cy'l ei. A function f : R? + R is said to be increasing if f (x') 1 f (x) when x' 1 x 1 0; strictly increasing if, in addition, f ( x t ) > f (x) when x' > x. A function f is said to be 2.1
decreasing (strictly decreasing) if -f is increasing (strictly increasing). Let [a, b] be a box in R3. A set G c [a,b] is said to be normal in [a, b] (or briefly, normal) if x E G =+ [a,x] C G. A set H C [a,b] is said to be conormal in [a, b] (or briefly, conormal) if x 6 H + [a,x] f l H = 0. (Conormal sets have been previously called "reverse normal" in Tuy (2000a)l If g, h : R 3 + R are increasing functions then clearly the set G = {x E [a, b] I g(x) 5 0) is normal, while the set H = {x E [a,b] I h(x) 1 0) is conormal. Given a set A c [a, b] the normal hull of A, written Al, is the smallest normal set containing A. The conormal hull of A, written LA, is the smallest conormal set containing A. (i) The normal hull of a set A c [a,b] C Rn+ is the PROPOSITION 2.1 set A1 = uZEA[a, z]. If A is compact then so is A]. (ii) The conormal hull of a set A c [a, b] c Rn+ is the set [A = uzEA[z,b]. If A is compact then so is [A. Proof. It suffices to prove (i), because the proof of (ii) is similar. Let P = uzEA[a,z]. Clearly P is normal and P > A, hence P > A1 . Conversely, if x E P then x E [a,z] for some z E A c A], hence x E A1 by normality of A], so that P c ~1 and therefore, P = A]. If A is compact then A is contained in a ball B centered at 0, and if xk E A], k = 1 , 2 , .. . , then since xk E [a, zk] C B, there exists a subsequence {k,) C {1,2,. . .) such that zku + z0 E A, x k -+ xO E [a, zO],hence xOE A], proving the compactness of A:. 0
A polyblock P is the normal hull of a finite set V c [a, b] called its vertex set and is denoted by V = vertP. By Proposition 2.1, P = uZEv[a,z]. A vertex z of a polyblock is called proper if there is no vertex z' # z "dominating" z, i.e. such that z' 2 z. The set of proper vertices of P is denoted by pvertP. An improper vertex or improper
43
Monotonic optimization
element of V is an element of V which is not a proper vertex. Obviously, a polyblock is fully determined by its proper vertex set; more precisely, P = ( p v e r t ~ ) li.e. , a polyblock is the normal hull of its proper vertices. Similarly, a copolyblock (reverse polyblock) Q is the conormal hull of a finite set T c [a, b] called its vertex set. By Proposition 2.1, Q = uzET[z,b]. A vertex z of a copolyblock is called proper if there is no vertex z' # z "dominated" by z, i.e. such that z' 5 z. An improper vertex or improper element of T is an element of T which is not a proper vertex. Obviously, a copolyblock is fully determined by its proper vertex set; more precisely, a copolyblock is the conormal hull of its proper vertices.
PROPOSITION 2.2 a polyblock.
(i) The intersection of finitely many polyblocks is
(ii) The intersection of finitely many copolyblocks is a copolyblock. Proof. If TI, T2 are the vertex sets of two polyblocks PI,P2, respectively, then PI n P 2 = ( U Z E T [a, ~ 21) n (U,€TZ [a,Y] = U Z E T ~ , ~ E[a, T Zz] n [a,Y] = U z E ~ l , y E[a, ~ 22 A y] where u = A y means ui = min{zi, yi) b'i = 1,. . . , n. Similarly, if TI, T2 are the vertex sets of two copolyblocks Q1, Q2, reT ~ [z, b] n [Y, b] = U ~ C , T ? / E~T ~[z V Y, b] spectively, then Qi n Q2 = U ~ E ,,ET~ where v = 2 V y means vi = max{zi, yi) b'i = 1, . . . , n. Finally observe that if x E [a, b] then the set [a, b] \ (x, b] is a polyblock with proper vertices
2.2
The canonical monotonic opt irnizat ion problem
As was proved in Tuy (2000a), by simple manipulations any optimization problem dealing with increasing or decreasing functions can be reduced to the following canonical form:
where [a, b] C Rn+;f , g, h: Rn+ + R are increasing functions, and f , h are U.S.C.(upper semi-continuous) while g is 1.s.c. (lower semi-continuous). Setting
the problem can alternatively be written as max{ f (x) I x E G n H
c [a,b])
44
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
with G, H being closed normal and conormal subsets in [a, b], respectively. Sometimes it will be convenient to refer to the constraint g(x) 5 0 (i.e. x E G) as the normal constraint, and the constraint h(x) 2 0 (i.e. x E H) as the conorrnal constraint. Of course the problem is the same as that of minimizing the lower semi-continuous decreasing function -f (x) over the set G n H . Also note that a minimization problem such as min{f (x) I x E G n H , x E [a, b]).
(MO/B)
- Specifically, can be converted to an equivalent maximization problem. by setting x = a + b - y, = -f(a+ b-y), G = a + b-G, H = a b - H, it is easily seen that the problem MO/B is equivalent to the following MO/A:
+
of(y)
Therefore, in the sequel, we will mostly restrict attention to the problem MO/A. For a closed normal set G in [a, b] a point 3 E G is called an upper boundary point if the cone K$ := {x I x > Z ) contains no point x E G. The set of all upper boundary points of G is called its upper boundary and denoted by d+G. Clearly, if 2 E [a, b] \ G then the first point of G in the line segment joining 2 to a is an upper boundary point of G.
PROPOSITION 2.3 Let G be a closed normal set in a box [a, b], and 2 E [a, b] \ G. If 3 is any point on d+G such that IT;. < 2 then the cone K$ := {x 1 x > IT;.) separates 2 strictly from G. Proof. If there were x E G such that x > 3 then by normality, [Z,x] C G, hence G n K: > [Z,x] n Kz # 0, conflicting with Z being an upper 0 boundary point. We shall refer to the cone K: as a separation cut with vertex Z. The next Corollary shows that with respect to compact normal sets, polyblocks behave like polytopes with respect to compact convex sets.
2.1 Any compact normal set is the intersection of a family COROLLARY of polyblocks. In other words, any compact normal set G can be approximated as closely as desired by a polyblock. Proof. Clearly, if G c [a, b] then Po := [a, b] is a polyblock containing G. Therefore, the family I of polyblocks containing G is not empty. We have G = niEIRibecause if there were x E niErPi\ G there would
45
Monotonic optimization
exist, by the above Proposition, a polyblock P contradiction.
> G such that
x $ P, a 0
Based on these properties, a method for solving problem MO/A consists in generating inductively a nested sequence of polyblocks outer approximating the feasible set:
in such a way that max{f(x) I x E Pk)\ max{f(x) ( x E G n H ) . At each iteration, a vertex zk of the current polyblock Pkis chosen such that f (zk) is maximal among all vertices z of Pk belonging to H (if no such z exists, the algorithm terminates: either a current best feasible solution exists and then it is an optimal solution of MO/A; or else the problem MO/A is infeasible). If zk E G then zk is the sought global maximum. Otherwise, a point xk E dSG is determined such that the set (xk,zk] = {x: xk < x 5 zk) contains no feasible solution and can be removed from [a,zk] without losing any feasible solution. If also xk E H then xk is a feasible solution and can be used to update the current best feasible solution. By removing (xk,zk] from Pk,a new polyblock Pk+l, smaller than Pk,is formed which excludes zk while still containing at least a global optimal solution of the problem. The procedure can then be repeated at the next iteration. Under mild conditions, namely: f (x) upper semi-continuous, a E G, Int G
# a), b E H,
x
> a 'dx E H
it has been shown in Tuy (2000a) that as Ic -+ +co the sequence xk converges to a global optimal solution of MO/A. The convergence speed of the above method critically depends on two operations:
I.) given a point zk E [a, b] \ G, select a boundary point xk of G such that the set {x: xk < x 5 zk) can be removed from [a,zk] without losing any feasible solution; 2. generate the new polyblock set.
Pk+1and
compute its proper vertex
Although the rules given for these operations in the original polyblock approximation algorithm in Tuy (2000a) have proven to perform satisfactorily on problems which can be reduced to equivalent monotonic
46
ESSA Y S AND SURVEYS IN GLOBAL OPTIMIZATION
optimization problems of small dimension, it turns out that the convergence guaranteed by these rules is too slow in the general case. It is therefore of interest to examine how one can speed up the convergence by using more sophisticated rules for these operations, especially for the construction of the new polyblock Pk+l.
3.
Valid reduction cuts
Given a number y E f (GnH) and a box [p, q] c [a,b] we would like to check whether the box [p, q] contains at least one solution to the system
v,
q'] c [p, q] still containing the and to find, if possible, a smaller box best solution of (2.2) in b,q], i.e., a solution x of (2.2) in [p, q] with maximal value f ( x ) . Observe that if g(q) 2 0 then for every point x0 E [p,q] satisfying (2.2) the line segment joining xOto q intersects the surface g(x) = 0 at a point x' E [p, q] satisfying
v,
q'] contains all x E Therefore, it suffices that 0 I h(x), f ( x ) 2 y,i.e., satisfying
PROPOSITION 2.4
(i) If g(p)
(ii)
> 0 or min{h(q), f (q) - y) < 0 then
b, q] satisfying If g(p) 5 0 then the box b, q'] there is no x E
b, q] satisfying g(x) =
(2.2). where q' = p
+ Cy=,ai(qi - pi)e2,
with
b, q]. = q - Cy=,Pi(qi - pi)ei,
still contains all feasible solutions to (2.3) in (iii) If h,(q) 2 0, then the box with
v,q] where p'
still contains all feasible solutions to (2.3) in
b,q]
47
Monotonic optimization
Proof. It suffices to prove (ii) because (iii) can be proved analogously, while (i) is obvious. Since qQ= aiqi + (1 - ai)pi with 0 5 ai 5 1, it follows that pi 1q i 1 qi Vi = 1 , . . . ,n , i.e. [p, q'] c [p, q]. Recall that
+
For any x E G fl b, q] we have, by normality, [p, x] c G, so xi = p (xi - pi)ei E G, i = 1,.. . ,n. But xi L: qi, SO xi = p a(qi - pi)ei with 0 1a 1. This implies that a 5 ai, i.e. xi 5 p + ai(qi - pi)e2, i = 1,.. . ,n , and consequently x q', i.e. x E [p, q']. Thus, G n [p, q] c G n [p, q'], which completes the proof because the converse inclusion is obvious from the fact [p, q'] c [p, q].
<
+
<
Clearly the box b, q'] defined in (ii) is obtained from [p, q] by cutting off the set U & ~ { X I xi > q:), while the box v , q ] defined in (iii) is obtained from [p,q] by cutting off the set U ~ = ~ { XI xi < pi). The cut U ~ = ~ {I xi X > q:) is referred to as an upper y-valid cut with vertex q' and the cut U ~ = ~ {I xi X < pi) as a lower y-valid cut with vertex p', applied to the box [p, ql. Using these cuts we next define a box redy[p, q] referred to as y-valid reduction of [p, qj:
where
As a consequence of the above, we can also state
PROPOSITION 2.5 Let E = {x E G n H I f (x) > y), and let P be a polyblock containing E, with proper vertex set V. Let V' be the set obtained from V by deleting every x E V satisfying redy[a,x] = 0 and replacing every other z E V with the highest corner z' of the box redy [a,z] = [a, z']. Then the polyblock P' generated by V' satisfies { x E G ~ IHf ( x ) > y ) c P'c P.
(2.9)
Proof. Since E (7 [a,x] = 8 for every deleted z , while E f l [a, z] C redy [a,x] := [a,x'] for every other x, the proposition follows. 0
48
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
We shall refer to the polyblock P' with vertex set V' as the y-valid reduction of the polyblock P and denote it by redyP. As we saw the reduction amounts to a number of monotonicity cuts.
The polyblock algorithm
4.
Recall that at the k t h iteration of the polyblock outer approximation method for solving MO/A outlined in Subsection 2.2 we have a polyblock Pk > G n H n {x I f (x) 2 3/lc) where yk is the current best value (the objective function value at the best feasible solution so far available). Then let zk be a maximizer of f (x) over the proper vertex set vk of Pk. If zk E G then it is the sought optimal solution. Otherwise, two operations are performed: 1. Select a point xk E d+G such that the set {x I xk < x 5 zk) can be removed from [a,zk] without losing any feasible solution x with f (4 2 7. 2. Generate the new polyblock Pk+land compute its proper vertex set Vk+l. Let us describe how to perform these two operations.
4.1
Computing the boundary point xk.
Given a point zk E [a, b] \ G we want to select a boundary point xk of G such that the set {n: E [a,b] x > x k ) cuts off zk without cutting off any point of G. As in Tuy (2000a) we take xk = n(zk), where n(xk) is the first point of G on the line segment joining zk to a , i.e.
I
n(zk) = zk - Xk(z.k ' - a ) ,
Xk
= min{X
I g(zk - X(zk - a)) 5 0).
Then the convergence speed depends on how fast zk - xk
-+
0 as k
-+
+m.
Experience has shown that, with this choice of x k , if the optimal solution lies in a strip {x I ai xi ai E ) with E > 0 very small for some i E (1,. . . ,n ) , then as zk approaches this strip the ratio Xk = Izf - x$l/ (zf - ai) may decrease slowly, causing a slow convergence of zk to the optimal solution. To prevent such an event, it suffices to arrange that (2.10) x > a + a e V X E H.
<
< +
where a > 0 is chosen not too small compared to Ilb - all (e.g. a = 1/411b - all). This can be achieved by simply shifting the origin to -ae. A point x E G f l H is said to be an upper basic solution of MO/A if there is no y E G n H such that y x, y # x. Clearly an upper basic
>
49
Monotonic optimization
solution must be an upper boundary point of G and for any y E G n H there is an upper basic solution x y, namely x = zn, where z' E argmax{xl I x E G n H, x 2 y), xi E argmax{zi I x E G n H, z 2 xi-') for i = 2 , . . . , n. Therefore, an optimal solution of the problem MO/A can always be found among the upper basic solutions.
>
PROPOSITION 2.6 Assume that (2.10) holds. Let a' = a - ae, G = (G - Rn+)n [a', b]. Then G is a closed normal set such that every upper basic optimal solution of the problem
is also an upper basic optimal solution of MO/A. Proof. That G is closed is clear. To see that G is normal, let x E G, so that x E [a', b] and x E G - Rn+,i.e. x y for some y E G. If x' E [a', x] then on the one hand, a' x' x 5 b, hence x' E [a', b] , on the other hand, x' x 5 y E G, hence x' E G - Rn+, so x' E G, proving the normality of G. Now let Z be an upper basic optimal solution of (2.11). Then Z E G and whenever x' E G, x' 2 3 then x' = 3. Since Z E G we have Z 5 y for some y E G c G, hence Z = y E G. Furthermore, if x' E G and x' 2 Z then, since G c G, we must have x' = Z, so 3 is an upper extreme point of G and obviously an optimal solution of
< <
<
<
MO/A. Note that a' < a E G c G, so Int G # 0 and for any z E [a', b] \ G the line segment joining z to a' meets I ~ + G at a unique point. Thus upon a change of notation: a c a', G t G one can assume that condition (2.10) holds for problem MO/A. REMARK2.1 The set G = (G - Rn+) fl [a', b] consists of all x E [a', b] such that x 5 y, g(y) 0 for some y. For a given z E [a', b] \ G the first point of G on the line segment joining z to a' can be computed by solving (eg. by a Bolzano binary search) the subproblem
<
min{X I z
4.2
- X(z
-a') -
< o , ~ ( <~ 0), a 5 y < b).
Generating the new polyblock
The next question concerns the derivation of the new polyblock obtained from the current one by removing an unfit portion, i.e., Given a polybloclc P with V = pvertP, a vertex v E V and a point x E [a,v) such that G f l H c P \ (x, b], determine a new polyblock P' satisfying P \ (x, b] c P' c P \ {v).
50
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
In the original Algorithm 1 in Tuy (2000a), PI was taken to be the polyblock generated by the set
The drawback of this polyblock PI is that it still contains a significant portion of ( x ,b], if there are more than one element v E V satisfying v > x. The next proposition indicates a simple way to compute the proper vertex set of the polyblock P \ ( x ,b]. For any two x, y let J ( z , y) = { j I zj > y j ) and if a x 5 z 5 b then define zi = z ( x i - zi)ei, i = 1,. . . , n , so that, by (2.1), xi := z A ui.
<
+
PROPOSITION2.7 Let P be a polyblock with proper vertex set V c [a,b] and let x E [a,b] satisfy V, := { z E V I z > x ) # 0. Then the polyblock PI := P \ ( x ,b] has vertex set
TI = (V\ V*)u {xi = z + (xi - zi)ei I z E V*,i E { I , ...,n ) ) . (2.13)
+
The improper elements of T 1 are those zi = z ( x i - zi)ei for which there exists y E V$ := { z E V I z 2 x ) such that J ( z ,y) = { i ) . In other words, the proper vertex set of the polyblock PI := P \ ( x ,b] is obtained from TI by removing improper elements according to the following rule: For every pair z E V*, y E V . compute J ( z ,y) = { j I zj > y j ) and if J ( z , y) consists of a single element i then remove zZ. Proof. Since [a,z] n ( x ,b] = 0 for every z E V \ V*, it follows that P \ ( x ,b] = PI U P2, where PI is the polyblock with vertex V \ V, and P2 = (UZEV*[a,x])'\(x,b] = UZEV* ( [ a x]\(x, , b]).Noting that [a,b]\(x, b] is a polyblock with vertices given by (2.I ) , we can then write [a,z]\ ( x ,b] = [a,z] n ( [ a b] , \ ( x ,b])= [a,z] n (Ui=l,...,,[a, u i ] )= Ui=l,...,n[a,z] fl [a,ui] = Ui=l,...,,[a, x A ui],hence P2 = ~ { [ za A, ui] I z E V*,i = 1 , . . . ,n ) , which shows that the vertex set of P \ ( x ,b] is the set TI given by (2.13). It remains to show that every y E V \ V* is proper, while a zZ = z + ( x i - zi)ei with z E V* is improper if and only if J ( z ,y) = { i ) for some y E V . . Since every y E V \ V* is proper in V , while zZ z E V for every zi it is clear that every y E V \ V, is proper. Therefore, an improper element must be some zi such that xi y for some y E TI. Two cases are possible:
<
<
51
Monotonic optimization
<
<
xi we must have x y, In the first case (y E V), since obviously x : furthermore, z j = zj yj tij # i, hence, since z $ y, it i.e. y E ;V follows that zi > yi, i.e. J ( z , y) = {i). In the second case, zi 5 y1 for some y E V* and some 1 E { I , . . . ,n). Then it is clear that we cannot y$ tij = 1,..., n. If 1 = i then this have y = z, SO y # z and 2; implies that zj = 2% yj = yj tij # i, hence , since z $ y it follows that zi > yi and J&,y) = {i). On the other hand, if 1 # i then from zj y; tij = 1,...,n we have y yj tij # i and again since z f y, we must have zi > yi, so J ( z , y) = {i). Thus an improper xi must satisfy J ( z , y) = {i) for some y E V . . Conversely, if J ( x , y) = {i) for some : then z! yj V j # i, hence, since z: = xi and xi = y: if y E V* yEV or else Xi yi lf y E V,S, it follows that zi yi if y E V, or else zi y, if y E V,f; so in either case zi is improper. This completes the proof of 0 the Proposition.
<
<
<
<
<
4.3
<
<
<
Polyblock algorithm
With the above rules for computing xk and for determining the new polyblock at each iteration k, we now formulate a polyblock algorithm which substantially improves upon an earlier algorithm proposed in Tuy (2000a) for solving problem MO/A. As was shown in Subsection 4.1, without loss of generality we may assume that condition (2.10) ) holds. Furthermore, since the problem is obviously infeasible if b $! H, whereas b is an obvious optimal solution if b E G n H, we may assume that
Algorithm PA
[Polyblock Algorithm]
Proof. Step 0 Let Pl be an initial polyblock containing G n H and let Vj. be its proper vertex set. For example, Pl = [a,b], Vj. = {b). Let 2' be the best feasible solution available and CBV = f ( 3 ' ) (if no feasible solution is available, set CBV = -m). Set k = 1. Let Step 1. b + ~ { EzFk).
Fk = red7Pk and Vk = pvertPk, for y = CBV.
Reset
Step 2. 1f Vk = 0 terminate: if CBV = - m the problem is infeasible; if CBV > - m the current best feasible solution zk is an optimal solution. If Vk # 0, select zk E argmax{f (z) I z E ck). If g(zk) L 0, Step 3. then terminate: zk is an optimal solution. Otherwise go to Step 4.
52
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Step 4. Compute xk = zk - Xk(zk - a ) , with Xk = min{X I zk - X(zk - a ) E G) (so xk is the first point of G on the halfline from zk to a). Determine the new current best feasible solution zk+' and the new current best value CBV. Compute the proper vertex set Vk+l of Pk+l= Fk\ (xk,b] according to Proposition 2.7. Step 5.
Increment k and return to Step 1.
THEOREM 2.1 Whenever infinite Algorithm PA generates an infinite sequence {xk) every cluster point of which is a global optimal solution. Proof. First note that by replacing Pk with pk = redyPk in Step 1, all z E Vk such that f (z) < y are deleted, so that at any iteration k all feasible solutions z satisfying f (z) 1 CBV are contained in the polyblock Fk. This justifies the conclusions when the algorithm terminates at Step 2 or Step 3. Therefore, it suffices to consider the case when the algorithm is infinite. Condition (2.10) implies that
where p = alllb - all. We contend that zk - xk --+ 0 as k --+ oo. In fact, suppose the contrary, that there exist 7 > 0 and an infinite sequence kl such that llzkl - xklII 2 7 > 0 b'l. For all p > 1 we have zkp $! (xkl,zkl] because Pk, C Pkl\ (xkl, b] . Hence, 11 zkp - zkl11 1 mini,' ,...,, Izikl - xik ' 1. k On the other hand, mini=l,,,,,n(zil - ai) pllzk1 - all by (2.15) because zkl E H, while xk' lies on the line segment joining a to zk1, so zik ' - xfl = kl z? - ai/llzk1 - all llzkl - xkl 11 2 pllzk1 - x k 111 b'i, i.e. lzi 1 1 , O I I Z ~ ~ - xk1ll. Therefore, Ilzk* - zk111 L mini=l,...,n Izikl - xikl I > pllzkl - xkl 11 2 pq, conflicting with the boundedness of the sequence {zkl) c [a, b]. Thus, zk - xk -+ 0 and by boundedness, we may assume, by passing to subsequences if necessary, that xk --+ 3, zk --+ 3. Then, since zk E H, xk E G b'k, it follows that E G n H , i.e. 3 is feasible. Furthermore, f ( z k ) f ( z ) b'z E Pk > G n H , hence by letting k +oo: f (3) f (x) b'x E G n H, i.e. 3 is an optimal solution. 0
>
XP
>
>
--+
REMARK2.2 Algorithm PA differs from the original Algorithm 1 in Tuy (2000a) mainly in that monotonicity cuts are used systematically to reduce the feasible portion currently still of interest. All the remarks about implementation issues in Tuy (2000a), Section 5, for the original Algorithm 1 in Tuy (2000a), also apply for Algorithm PA. In particular,
53
Monotonic optimization
(i) To avoid storage problems in connection with the growth of the it may be useful to restart the Algorithm whenever set exceeds a prescribed limit L. If o x is the point where we would like to restart the algorithm (usually, o z = xk or current best solution), then Step 5 should be modified as follows.
vk,
vkI
Step 5. If IVk+ll 5 L, then set k Otherwise go to Step 6.
t
k
+ 1 and return to Step 1.
-
Step6. R e d e f i n e ~ ~ = n ( Q x ) , V ~ + ~ = { b + ( x ~ - b ~ ) ..., e ~ ,n), i=1, and return to Step 1. With this modification an occurrence of Step 6 means a restart, i.e. the beginning of a new cycle of iterations. (ii) Computational experience seems to suggest that the convergence should usually be faster for problems with a more balanced feasible set G n H; in other words, e.g. if a box [a,b] can be chosen so that min{bi - ai/bj - a j I i < j ) is more or less near to 1. To achieve this, a rescaling may often be useful.
REMARK 2 . 3 Since in practice we must stop the algorithm at some iteration k , the question arises as to when to stop (how large k should be) to obtain a zk sufficiently close to an optimal solution. If the problem MO/A does not involve any conormal constraint, i.e. has the form max{f (4 1 9(x>5 0, x E [a, bl),
+
then xk is always feasible, so when f (xk)- f (xk) < E , then f (xk) E > f ( z k ) 2 max{f(x) 1 g(x) 5 0 5 h(x),x E [a, b ] ) , i.e. xk will give an &-optimalsolution of MO/A. In the general case, xk may not be feasible and it may happen that Algorithm PA cannot provide an &-optimal solution in finitely many iterations. Since however, g(zk) - g(xk) -+ 0 and g(xk) 5 0, we must have g(zk) ) I for k sufficiently large. Then xk is an E-approximate optimal solution, in the sense that it is an optimal solution of the perturbed problem
Although f (zk) tends to the optimal value w of MO/A, the drawback of the algorithm is that it does not indicate how small should be E > 0 to have f (zk) sufficiently close to w .
54
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Copolyblock algorithm
4.4
A similar copolyblock algorithm can be formulated for solving the monotonic minimization problem
Without loss of generality one can assume that
(G, H have the same meanings as previously). For a given number 6 E f (G n H ) (eg. 6 is the value of the objective function at the best feasible solution so far available) let
By 6-valid reduction of a box
b,q] c [a,b] we mean the box
where
If Q is a copolyblock with proper vertex set V then the &valid reduction of Q is the copolyblock redsQ generated by the set V' obtained from V by deleting every z E V satisfying reds[z, b] = 0 and replacing every other z E V by the lowest corner z' of the box reds[z, b]. The next proposition is the analogue of Proposition 2.7:
PROPOSITION 4.2' 1 Let Q be a copolyblock with proper vertex set V c [a, b] and let x E [a, b] satisfy V, := {x E V I z < x ) # 0. T h e n the copolyblock Q' := Q \ [a,x) has vertex set
T' = (V \ V*) U {xi = z .t(xi - zi)ei 1 z E V*, i E (1,.. . ,n ) ) . The improper elements of T' are those xi = x+ (xi - zi)ei for which there exists y E V$ := (x E V I x 5 x ) such that J ( y , z ) := { j I yj > zj) = { i ) . I n other words the proper vertex set of the copolyblock Q' := Q\[a, x) is obtained from T' by removing improper elements according to the rule:
55
Monotonic optimization For every pair z E V*, y E V$ compute J ( y , z) = { j ( yj J ( y , z) consists of a single element i then remove z2.
> zj) and zf
With the above background the copolyblock algorithm for solving MO/B reads as follows (assuming f , g 1.s.c. and h U.S.C.):
Algorithm QA [Copolyblock Algorithm] Let Q1 be an initial copolyblock containing G n H and let Step 0. & be its proper vertex set. For example, Q1 = [a, b], Vl = {a). Let 3' be the best feasible solution available and CBV = f (3') (if no feasible solution is available, set CBV = +m). Set k = 1.
Step 1. Let = redsQk and Reset a c ~ { EzVk).
Vk = p V e r t G ~ kfor , 6 = CBV.
Step 2. 1f Vk = 0 terminate: if CBV = +m the problem is infeasible; if CBV < +m the current best feasible solution zk is an optimal solution. Step 3. If Pk# 0, select zk E argmin{f (z) I z E Vk). If h(zk) 2 0, then terminate: zk is an optimal solution. Otherwise go to Step 4.
+
Step 4. Compute xk = zk Xk(b - zk), with Xk = min { A I h(zk X(b - z k ) ) 2 0) (so xk is the first point of H on the halfline from zk to b). Determine the new current best feasible solution zkfl and the new current best value CBV. Compute the proper vertex set Vk+1 of \ [a,xk) according to Proposition 4.2'. Qk+1 =
+
o~~
Step 5..
Increment k and return to Step 1
REMARK2.4 Just as with the Algorithm PA for MO/A, although the condition x < b b'x E G is sufficient for the convergence of the algorithm, to prevent possible jams near to certain facets of the box [a,b] one should arrange so that x < b - a e VXEG, where cr > 0 is a constant not too small compared to Ilb - all. Also, if there is no normal constraint, i.e. if the problem has the form
then xk yields an &-optimal solution when f (xk) - f (zk) ) i. In the general case, xk can only yield, for k so large that g(xk) ) E , an iapproximate optimal solution of MO/B, i.e. an optimal solution of the
56
5.
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Dimension reduction techniques
At the present state of knowledge, no deterministic global optimization method can pretend to be able to solve efficiently general nonconvex problems of large dimension. The monotonic optimization method is not an exception. Nevertheless, a host of large-scale nonconvex problems can be reduced to monotonic optimization problems of much reduced dimension. Therefore in practice the range of problems solvable by the monotonic optimization approach extends far beyond problems of small dimension. Since in any event the dimension of a global optimization problem is a crucial factor determining its difficulty, it is of utmost importance, before embarking on a global solution procedure, to reduce the dimension whenever possible. As shown in the next examples, this can often be done by simple transformations.
EXAMPLE 2.1 Let ui (x), i = 1,. . . ,m, be positive-valued increasing functions on Rn+, and D = {x I Ax 5 c) be a polytope in RF. Since A = A+ -A- where A+, A- are matrices with nonnegative components, the constraint Ax 5 c is in fact a d.m. constraint, so the "multiplicative programming" problem
is a monotonic optimization problem in Rn+, In most practical cases, m is much smaller than n. Choosing yi = ui(x) as new variables we can rewrite this problem as
Since the function
nL1yt is increasing on RT while
is a normal compact set, hence contained in some box [a, b] (e.g. ai = min{ui(x) I x E D), bi = ~ ~ x { u ~ ( (xx E) D)), this is a monotonic optimization problem in RI;.:
Monotonic optimization
57
Using this transformation fairly large scale multiplicative programs with hundreds of variables and up to 10 functions u ~ ( x )can be solved efficiently by the polyblock approximation algorithm, even in its original version (see e.g. Rubinov et al., 2001), Tuy and Luc (2000); Phuong and Tuy (2003). Similarly, the problem
is equivalent to
More generally, a problem of the form
with
where G is a compact normal set G defined by
For computational experiments with the polyblock algorithm on problems of the above form the reader is referred to Phuong and Tuy (2003).
EXAMPLE 2.2 Consider the problem
where the objective function depends only on some but not all variables. If also the conormal constraint depends only on y, i.e. h(y, t ) G k(y), the problem is actually equivalent to a problem in y only, namely
In fact, if (y, t) is a feasible solution to (2.19) then g(y, c) 5 g(y, t) 1 0, and conversely, if y is,a feasible solution to (2.20) then obviously (y, c) is a feasible solution to (2.19).
58
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
If t E R we show that Algorithm PA can be modified so as to work essentially in the y-space (i.e. Rn) The problem (2.19) can be rewritten as
where
(G is a normal set, H a conormal set in [a,b].) Suppose now that Algorithm PA is applied to solve (2.18). As usual we assume (2.10), i.e. (y, t) > (a, c) Yx = (y, t) E H. Consider the point zk = (uk,s k ) E H \ G chosen in Step 3. Let ik= (uk,s k ) , where ik - min{t I h(uk,t) 2 0), so that ikE H but ( u k , t ) $ H for every t < ik.If ikE G then ikis feasible and by the choice of zk it follows that ikis an optimal solution. So let ik$ G and denote by xk = ( y k , t k )the first point of G on the line segment joining ikto (a, c). By removing (xk,xk] from the box [(a,c) ,zk], we obtain a polyblock with vertices
Since in view of (2.10) xk < ikwe have tk < sk,hence xkln+' $ H and will be dropped. Thus, only x k l , . . . , xkn will remain for consideration, as if we were working in Rn. This discussion suggests that to solve problem (2.21) by Algorithm PA, Steps 3 and 4 should be modified as follows:
vk # 0, select xk = (u, s ) E argmax{f
(2) I r E vk}. Compute ik-- min{t I h(uk,t) 0). If ik:= (uk,ik)E G, terminates: an optimal solution has been obtained. Otherwise, go to Step 4.
Step 3.
If
k
k
>
Compute xk = ik- A k ( i k - (a, c)), with Ak = min {A I Step 4. i k - A ( z k - (a, c)) E G). Determine the new current best feasible solution zk+' and the new current best value CBV. Compute the proper vertex set Vk+' of Pk+1= & \ (zk,b] according to Proposition 7. In this manner, the algorithm will essentially work in the y-space and can be viewed as a polyblock outer approximating procedure in this space. The above method cannot be easily extended to the general case of (2.19) when t E Rm with m > 1. However, to take advantage of the fact that the objective function depends on a small number of variables one
Monotonic optimization
Figure 2.1. Inadequate &-approximateoptimal solution.
can use a branch and bound procedure (see Sections 7,8 below), with branching performed on the y-space.
6.
Successive incumbent transcending algorithm
The &-approximateoptimal solution, as computed by Algorithm PA for MO/A (or Algorithm QA for MO/B) in finitely many iterations, may not be an adequate approximate optimal solution. In fact it may be infeasible and for a given E > 0 it may sometimes give an objective function value quite far from the actual optimal value of the problem, as illustrated by the example depicted in Figure 1.1 where x* is almost feasible but not feasible. To overcome this drawback, in this Section we propose a finite algorithm for computing a more adequate approximate optimal solution of MO/A. 0) # 0 (this is a mild Assume that {x E [a,b] I g(x) < 0, h(x) assumption that can often be made to hold by shifting a to a' < a). For E > 0 satisfying {x E [a, b] I g(x) 5 - E , h(x) 0) # 0, we say that a feasible solution Z is essentially E-optimal if
> >
Clearly an infinite sequence {z(E)),E \ 0, of essentially &-optimalsolutions will have a cluster point x* which is a nonisolated feasible solution satisfying f (x*) = max{ f (x) I x E S*),
60
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
where S* = cl{x I g(x) < 0 5 h ( x ) , x E [a, b]). Note that S* may be a proper subset of the feasible set {x I g(x) 5 0 5 h(x), x E [a,b]) (which is closed, since g is 1.s.c. and h is u.s.c.). Such a nonisolated feasible solution x* is referred to as an essential optimal solution. Basically, the proposed algorithm for finding an essentially e-optimal solution of MO/A is a procedure for successively solving a sequence of incumbent transcending subproblems of the following form: (*) Given a real number y, find a feasible solution with an objective function value exceeding y, or else prove that no such solution exists. As will be shortly seen each of these subproblems reduces to a MO/B problem.
6.1
Incumbent transcending subproblem
For any given y E R U (-oo) consider the problem min{g(x) min f (x) 2 y, h(x)
> 0, x E [a,b]).
Since this is a MO/B problem without normal constraint, an e-optimal solution of it can be found by Algorithm QA in finitely many iterations (Remark 4). Denote the optimal values of MO/A and (Bly) by max MO/A and min B/y, respectively. (i) Ifmin(B/y) > 0 then any feasible solution 2 of MO/A such that f (Z) y - e is an e-optimal solution of MO/A. Hence, if min(B/y) > 0 for y = -oo then MO/A is infeasible.
PROPOSITION 2.8
>
(ii) If min(B/y) < 0 then any feasible solution o x of (Bly) such that g ( o x ) < 0 is a feasible solution of MO/A with f ( o x ) 2 y. (iii) If min(B/y) = 0 then any feasible solution 3 of MO/A such that g(Z) 5 - E and f (2) y - E is essentially &-optimal.
>
Proof. (i) If min(B/y) > 0 then, since every feasible solution x of MO/A satisfies g(x) 5 0, it cannot be feasible to (Bly). But h(x) 0, x E [a, b], hence f (x) < y. Consequently, max (MOIA) < y, i.e. f (z) y - E > rnax (MOIA) - E , and hence 2 is an &-optimalsolution of MO/A. (ii) If o x E [a, b] is a feasible solution of (Bly) while g(Gx) < 0, then g(Bx) < 0, f ( 6 x ) y, h ( o x ) 2 0, hence Gx is a feasible solution of MO/A) with f ( o x ) 2 y. (iii) If min(B/r) = 0 then any x E [a, b] such that g(x) -E, h(x) 0, is infeasible to (Rly), hence must satisfy f (x) < y. This implies that y sup{f(x)l g(x) 5 - E , h(x) 0, x E [a, b]), so if a feasible solution n: of MO/A satisfies g(3) 5 - E and f (3) 2 - E , then f (2) E 2
>
>
>
>
<
>
>
A/
+
61
Monotonic optimization
sup{ f (x) I g(x) 5 -e, h(x) y &-optimalto MO/A.
6.2
2 0, x
E [a, b]), and so 3 is essentially 0
Successive incumbent transcending algorithm for MO/A
Proposition 2.8 can be used to devise a successive incumbent transcending procedure for finding an &-essentialoptimal solution of MO/A. Before stating the algorithm we need some definitions. A box [p, q] c [a,b] can be replaced by a smaller one without losing any x E [p, q] satisfying g(x) 0, f (x) y, h(x) 2 0, i.e. without losing any x E [p, q] satisfying
<
>
g(x) 5 0 5 h,(x) := min{f (x) - y, h(x)). This reduced box redo[p, q], called a valid reduction of [p, q], is defined by
where n
As in Section 3, it can easily be proved that the box redo[p, q] still contains all feasible solutions x € b,q] of (Bly) with g(x) 0. For any given copolyblock Q with proper vertex set V, denote by redoQ the copolyblock whose vertex is obtained from V by deleting all z E V satisfying redo[z,b] = 0 and replacing every other z E V with the lowest corner z' of the box redo[z,b]. Also for any z E [a, b] denote by py(z) the first point where the line segment joining z to b intersects the surface h-, (x) := mini f (x) - y, h(x)) = 0, i.e.
<
p-,(z) = z
+ p(b - z),
Algorithm SIT/A
with
p = min {p' I h-, (z
+ pt(b -
2))
L 0).
[Successive Incumbent Transcending for MO/A]
Let 3 be a best feasible solution available for MO/A, Step 0. and let Vl be the proper = py(z), Q1 = [a, b] \, [a, 7 = f (3) E , vertex set of the copolyblock Q1 (if no feasible solution is available, set y = -co,Q1 = [a,bj). Set k = 1.
+
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
62
-
-
Let Vk = redoVk. If Vk = 0, terminate: 3 is an €-optimal Step 1. solution of MO/A if y > -oo, while the problem is infeasible if y = -oo. # 0, reset a +- ~ { Ez and go to Step 2. If
vk
vk)
Select zk E argmin{g(z) I z E vk). Compute xk = ,o-, (zk), Step 2. the first point satisfying hy(x) = 0 on the line segment joining zk to b. If g(xk) 5 0, then xk is a feasible solution of MO/A with f (xk) 2 y : go to Step 3. If, on the contrary, g(xk) > 0, then go to Step 4. Step 3. Return to S t e 0~ with 3 copolyblock generated by Vk.
t
xk, Vl
+-
vk and Q1 being the
If g(zk) > 00,terminate: Z is an €-optimal solution. If Step 4. - E < g(zk) 0, terminate: Z is essentially e-optimal. If g(zk) 5 -e, then let Qk be the copolyblock with (proper) vertex set Pk. Compute the copolyblock Qk+1 := Q~\ (xk,a] and its proper vertex set Vk+1according to Proposition 4.2'. Increment k and return to Step 1.
<
THEOREM 2.2 Algorithm SIT/A terminates after finitely many iterations, yielding either an E-optimal solution Z of MO/A or a feasible solution Z that is essentially e-optimal.
vk
<
Proof. By construction spans a copolyblock Qk > {x I g(x) 0, f ( x ) y). Therefore, if Step 1 occurs then Qk = 0, so no feasible solution x exists satisfying f (x) 2 y = f (Z) E . Hence Z is an €-optimal solution if y > -oo while infeasibility is detected if y = -oo. In Step 2, if g(xk) 0, then xk is feasible to both MO/A and (Bly), which implies that xk is a feasible solution to MO/A with f (xk) 2 y. In Step 4, if g(zk) > 0, the inclusion {x I g(x) 0, f (x) 2 y) c Qk shows that min(B/y) > 0, and so, by Proposition 2.8, It. is an €-optimal solution of MO/A. If -E < g(zk) 0 , then Z is a feasible solution of MO/A with f(3) = y - E , while - E < min{g(x) I f ( x ) 2 y, h(x) 2 0 , x E [a,b]), -e, hence f ( x ) < y = f(Z) E for all x E [a,b] satisfying g(x) h(x) 0. This means that Z is essentially €-optimal. There remains to show that the algorithm is finite. Since at every occurrence of Step 3 the current best value f (z) improves at least by E > 0 while it is bounded above by f (b), it follows that Step 3 cannot occur infinitely many times. Therefore, there is Lo such that for all k ko we have g(xk) > O! and also g(zk) ) 0 < g(xk). From this moment the algorithm works exactly as a procedure QA for solving the problem (B/y). Since g(xkj- g(zk) + 0 for k -++oo by the convergence of this procedure, the event -E < g(zk) 0 must occur for sufficiently 0 large k. This completes the proof.
>
+
<
<
>
< +
<
>
<
63
Monotonic optimization
REMARK 2.5 The above algorithm SIT/A proceeds by copolyblock approximation of the set {x E [a, b] I g(x) 5 0 5 h(x), f (x) 2 w), where w is the optimal value of MO/A. A similar algorithm SIT/B can be developed for MO/B, which proceeds by polyblock approximation of the set {x E [a, b] I g(x) 5 0 5 h(x), f (x) a), where a is the optimal value of MO/B.
<
7. 7.1
BRB algorithm for general D.M. optimization D.M. functions
A function f : RT
R is said to be a d.m. function if it can be represented as a difference of two increasing functions: f (x) = f l ( x ) f2 (x), where f l , fi : R r + R are increasing. --t
Examples of d.m. functions 1. Any linear, quadratic, polynomial function, or more generally, any signomial P ( x ) = C, c,xa, for x E R3, where a = ( a l , . . . , a n ) 0, and x" = X ~ . x?, ~ ca X E R. ~ In~ fact P ( x ) = Pl(x) - P2(x), where Pl(x) is the sum of terms c,xff with c, > 0 and -P2(x) is the sum of terms coxa with c, < 0. 2. Any convex function f : [a, b] c RT -t R. In fact, since the 8f (x) is bounded, we can take M > K = sup{IIpII 1 p E set UzE[a,bl af (x),x E [a,b]). Then the function g(x) = f (x) M C:=l xi satisfies, for a 5 x 5 x' b:
>
+
+
<
n
n
>
>
where p E 8f (x), hence g(xl) - g(x) ( M - K ) ELl(xk - xi) 0. This shows that g(x) is increasing and hence that f (x) is d.m. because M C:=l xi is obviously increasing. 3. Any Lipschitz function f (x) on [a,b] with Lzpschitz constant K . Specifically, g(x) = f (x) M Cy.lxi where M > K is increasing on [a, b], because for a x x' b one has
+ < < <
Denote by DM[a.,b] the class of d.m. functions on [a, b]. The above examples show that this class is extremely wide and includes all d.c.
64
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
(differences of convex functions) as well as Lipschitz functions. In Tuy (2000b) we have also shown that many composite functions of the form f (g(x)) are actually members of DM[a, b]. The following property is very useful in monotonic optimization.
PROPOSITION 2.9 DM[a, b] is a vector lattice. Proof. That DM[a, b] is a vector space is plain. To see that it is a lattice we write
and note that the sum, the upper and the lower envelopes of finitely 0 many increasing functions are increasing. COROLLARY 2.2 Any conjunctive or disjunctive system of d.m. inequalities is equivalent to a single one: gi(x) 5 0 Vi = 1, . . . ,m gi(x)
7.2
* 2=1, max gi (x) 10, ...,m ,
gi(x) 1 0 . I 0 for at least one i = l , . .. , m H i=l,min ...,m
D .M. optimization problem
Consider now the general d.m. optimization problem under d.m. constraints: (DM) max{f ( 4 I g(x) - h(x) 0, x E [a,bl}
<
where f (x) = fl (x) - fi(x), and f l , fi, g, h are continuous increasing functions on [a,b]. The next proposition shows how to convert a problem (DM) into a canonical monotonic optimization problem MO/A.
PROPOSITION 2.10 There exists an x E b,q] satisfying g(x) - h(x) 10 if and only if h(q) - g(p) 2 0 and there exists a t E R such that (x, t) satisfies
Proof. If there exists x E b,q] satisfying g(x) - h(x) 5 0, then there exists t' E R such that g(x) t' 1h!x) and setting t' = -t+ h(q) we then have g(x)+t h(q) h(x)+t; furthermore, since g(p) 5 g(x) 1h(q)-t,
<
<
<
65
Monotonic optimization
h(q) 2 h(x) 2 h(q) - t it follows that 0 5 t 5 h(q) - g ( ~ ) Conversely, . if (x, t) satisfies (2.23) then it is obvious that x E b, q] and g(x) i h(x), 0 i.e. g(x) - h(x) 5 0 . As a consequence of this Proposition the problem (DM) can be rewritten as
Based on this fact, problem (DM) can be solved by solving the equivalent canonical monotonic optimization problem (2.24) (by Algorithm PA or better, by Algorithm SIT/A). We now present a direct approach to (DM), not requiring additional variables. This new algorithm will be referred to as a branch-reduce-andbound (briefly, BRB) algorithm, because it involves three basic operations: 1. Branching upon the nonconvex variables of the problem; 2. Reducing any partition set before bounding;
3. Bounding over each partition set. Let us first describe these operations and state the convergence conditions.
7.3
Branching
Branching is performed by rectangular subdivision (see e.g. Tuy, 1998)). Specifically, the subdivision of a box M = [p,q] is carried out by choosing a point v E M and an index i E (1,. . . ,n), and dividing M into two subboxes by the hyperplane xi = vi (such a subdivision is referred to as a partition via (v, i)). The most commonly used subdivision is the standard bisection, i.e. a partition (v, i) where v is the midpoint of a longest side of M and i is the index corresponding to this side. An iterative partition process is said to be exhaustive if any infinite filter (i.e., nested sequence) of partition sets it generates shrinks to a singleton (see e.g. Tuy, 1998). It has been proved that an iterative partition process that involves only bisections is exhaustive.
7.4
Reduct ion
Let S be the feasible set of (DM). Given a number y E f (S) and a box M = b, q] we say that a box M' = q'] c M is a y-valid reduction
v,
66
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
of M and denote it by red,M if every x E S fl b, q] satisfying f (x) 2 y is still contained in M' (so that no feasible solution x with f (x) 2 y is lost when M is replaced by M').
PROPOSITION 2.11
(i) If h(q) - g(p)
red, [p,q] = 0.
< 0, or fl(q) - f2(p) < y, then
where, for i = 1 , . . . ,n,
Proof. This result could be derived from Proposition 2.4, but a direct proof is also simple. Consider any x E b , q ] satisfying g(x) I h(x). Since g(.), h(.) are increasing, g(p) 5 g(x) 5 h(x), hence h(x) 2 g(p). If x 2 p' then there is i such that xi < pi = qi - ai(qi - pi), i.e. xi = qi - a(qi - pi) with a > ai, which, in view of the definition of ai (see (2.26)), implies that h (q - (qi - xi)ei) = h(q - a(qi - pi)ei) < g(p), and hence h(x) h(q-a(qi -xi)ei) < g(p), conflicting with h(x) 2 g(p). Therefore, x 2 p', i.e. x E q]. Similarly, if x $ q' then there is i such that xi > 9: = pi Pi(qi - p!,), i.e. xi = p', P(qi -pi) with p > Pi and from the definition of pi (see (2.27)) it follows that g(p' (xi - p!,)ei) = (PI P(qi - p:)ei) > h(q), and hence g(x) 2 g (P' (qi - p:)ei) > h(q) 7 conflicting with g(x) I h(q). Therefore, x I q', i.e. x E W,q']. Thus any x E b, q] satisfying g(x) - h(x) 0 must lie in the box q']. Analogously, any x E. b, q] satisfying f2(x) - fl(x) 7 I 0 must lie in the box q']. 0
<
+
v,
+
+
+
<
v,
+
+
v,
REMARK 2.6 When there are more than one d.m. constraints gj(x) hj(x) 5 0, j = I , . . . ,m , the formulas (2.26)-(2.27) should be replaced by the following
67
Monotonic optimization
The reduction operation thus consists in applying monotonicity cuts to remove certain portions (sometimes a whole partition set) currently of no more interest. This operation ensures that any infinite filter of partition sets generated by an exhaustive subdivision process shrinks to a feasible solution, as shown in the following proposition.
PROPOSITION 2.12 Let {MkL= bkl,qk~]) be any infinite filter of boxes such that h(qkl)- g(pkl) 2 0 'dl = 1 , 2 , . . . , and n E I M k L= (3). Then 3 is a feasible solution of problem (DM). Proof. Clearly pkl,qkl -+ Z. Since g(qkl) - h(pkl) 5 0, it follows by continuity that g(2) - h(3) 5 0, i.e. 3 is a feasible solution of problem (DM). 0
Bounding
7.5
Let y be the current best objective function value and let M = b, q] be a y-valid reduction of a box in the current partition. We must compute an upper bound for f (x) over the box b, q], i.e. a number
An obvious upper bound is
As simple as it is, this bound satisfies the consistency condition, namely: 3 then P ( M ) = fi(q) - f2(p) fl(2) - f2(3)= f(Z). Conif p,q sequently, as will be seen shortly, convergence of the algorithm will be ensured when this bounding is used in conjunction with an exhaustive subdivision. The reduction operation can be conceived of as an additional device to speed up the convergence. Nevertheless, since the bound (2.29) is generally not very tight, better bounds are often needed to enhance efficiency. One way to do that consists in applying one or two iterations of the PA procedure for solving the bounding subproblem in (2.28), as was indicated in the original paper Tuy (2000a). Alternatively, one may combine exploiting monotonicity with any partial convexity present in the problem. For instance, suppose that the constraint set is G n H n b, q] where G, H are normal and conormal closed sets, respectively, while a concave function f (x), together with a convex set C, are available such that --f
--f
6
68
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Then an upper bound of f (x) over the set G n H n given by the optimal value of the problem
b, q] is obviously
which is a convex problem and can be solved by currently available efficient algorithms. Let w be an optimal solution of (2.30). Then f (w) fl(q) - f2(p) with usually the strict inequality holding, so f (w) gives a better upper bound than fl(q) - f2(p) for f ( x ) over G n H n b,q]. If w E G f?H then f (w) is the exact maximum of f (x) over G n H . Otherwise, either w f G or w f H. If w $ G, define
<
Then y E @G, so the rectangle {x I y < x 5 q) can be removed without losing any feasible point. If w f H, then letting
we have x f H b'x E b, vj, so the latter rectangle can be removed without losing any feasible point. After removing the just defined rectangle (y, q] or b, v), one has a polyblock still containing all the feasible solutions in b,q], so one more iteration of the polyblock approximation will help to obtain a more refined upper bound over M = b, q]. Often it may be useful to write f (x) as a sum of several functions such that an upper bound can be easily computed for each of these functions: P ( M ) can then be taken to be the sum of the separate upper bounds. An example illustrating this method will be given in Section 8.
Algorithm and convergence We are now in a position to state the proposed algorithm for (DM). Algorithm BRB [Branch-Reduce-and-Bound Algorithm] Start with PI = {MI), Ml = [a,b], R1 = 0. If a best Step 0. feasible solution is available let CBV (current best value) denote the value of f (x) at this point. Otherwise, set CBV = -m. Set k = 1.
Step 1.
For each box M E Pk:
Compute its y-valid reduction redyM for y = CBV, as described in Proposition 2.11; m
Delete M if red,M = 0;
69
Monotonic optimization
# 8;
rn
Replace M by redyM if red,M
m
If redyM = [p,q] then compute an upper bound P ( M ) 2 f 2 (p) for f (x) over the feasible solutions in M .
f i (q) -
Step 2. Let Prkbe the collection of boxes that results from Pk after completion of Step 1. From Rk remove all M E Rk such that P ( M ) 1 CBV and let RL be the resulting collection. Let Mk = RrkU Prk. Step 3. If Mk = 8 then terminate: the problem is infeasible (if CBV = - m ) , or CBV is the optimal value and the feasible solution 3 with f (3) = CBV is an optimal solution (if CBV > -m). Otherwise, let Mk E argmax{P(M) I M E M k ) . Divide Mk into two subboxes by the standard bisection. Step 4. Let Pk+1be the collection of these two subboxes of Mk.
Step 5.
Let Rk+1= Mk\ {Mk). Increment k and return to Step 1.
THEOREM 2.3 Whenever infinite Algorithm BRB generates an infinite filter of boxes {Mkl) whose intersection yields a global optimal solution. Proof. If the algorithm is infinite, it must generate an infinite filter of boxes {Mkl) and by Proposition 2.12, n z l M k L = {z) with 3 being a feasible solution. Therefore, if the problem is infeasible, the algorithm must stop at some iteration where no box remains for consideration, giving evidence of infeasibility. Otherwise, if MkL= bkl,q k l ) ] then
<
hence liml-t+mP(Mkl) f(Z). On the other hand, since MkLcorresponds to the maximum of P ( M ) among the current set of boxes, we have
and hence,
Since 3 is feasible, it follows that Z is an optimal solution.
0
REMARK2.7 Just like Algorithm PA, Algorithm BRB can guarantee in finitely many iterations only an &-approximate optimal solution (by
70
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
+
removing in Step 3 all M E Mk such that P ( M ) E 5 CBV). If the objective function f (x) is increasing (fi = 0) then an &-essential optimal solution can be computed in finitely many iterations by using a successive incumbent transcending algorithm analogous to Algorithm SIT/A for MO/A, in which, for each incumbent value y the subproblem min{g(x) - h(x) I fl (x) 2 y E , x E [a,b ] } should be solved by a BRB algorithm until a better value than y or an evidence has been produced that y is the &-optimalvalue.
+
Further methods for improving bounds
7.6
The bound (2.29) depends upon the size of the box b, q] and is tightened by the reduction operation. Below we indicate some methods for further improving the bounds. (i) Grid method A simple bounding method consists in applying a number of iterations of Algorithm PA, starting with an initial approximating polyblock constructed by means of a finite grid of the upper boundary of G. Take a set U = {cO,cl,..., cn) c {X E Rr I C : = l ~ i = 1). For example, let U consist of the follow ing points c0 = e,/n
- (n
(barycentre of unit simplex)
+ 1)e - nek
k = 1, ...,n n2 (ck is barycentre of simplex spanned by cO,ei, i c -
# k) .
For each k = 0 , 1 , . . . , n, let xk be the intersection of af G with the halfline from p in the direction ck. Construct a set T as follows:
+ (xp - qi)ei i = 1 , .. . , n.
Step 0. Set k = 1.
Let T = {ul,. . . un) with ui = q
Step k.
Compute xk. Let Tk,, = {x E T I z
.
+
> xk), and compute
where xi = x (xik - xi)ei. Let Tk+1 be the set obtained from Ti by removing every zi such that { j I zj > yj} = {i} for some y E T,& = { z E T I x 2 xk). If k < n, let k t k + 1 and go back to Step k. If k = n , stop. If T is the last set obtained by the above procedure then it can be used as initial PI (with P(z) for every x E PI))for one or several iterations of Algorithm PA.
71
Monotonic optimization
The more dense the grid U the tighter the upper bound, but also the more costly the computation will be. Therefore a reasonable trade-off should be resolved between grid denseness and bound quality. (ii) Convex relaxation Assume that the constraint set is G n H n b , q] with G, H being normal and conormal closed sets, respectively. As we saw in Subsection 5.3 one way to compute an upper bound for max{f (x) I x E G n H f l [p, q]) is to solve a convex relaxation (2.30) of this problem. To this end the following propositions may be useful.
PROPOSITION 2.13 The convex hull C O Gof a compact normal set G c b, q] C W 3 is a nomnal set. The convex hull co H of a compact conormal set H C [p, q] is a conormal set.
c:
Xixi Proof. Let x E COG,so that, by Caratheodory theorem, x = withxi E G, Xi 2 OandCyZ:Xi = 1. I f p < y x then y - p = a ( x - p ) Xi (xi - p) = for some a E [ O , l ] , hence y = p a ( x - p) = p a ~y::Xi[p+ a ( x i -p)]. Since p + a ( x i - p ) E G , i = 1,...,n 1, it follows that y E co G, proving the normality of G. The second assertion is proved analogously.
+
<
+ ~;=f: +
PROPOSITION 2.14 The convex or concave envelope of an increasing ( a decreasing, resp.) function over a box b, q] is an increasing ( a decreasing, resp.) function. Proof. Recall that the convex envelope cp(x) of a function f (x) over a box b, q] is given by
where f ( x ) = f (x) if x E b, q] and f ( x ) = +oo if x $! [p, q]. (see e.g. Tuy, 1998, Proposition 2.10). Now assume that f (x) is increasing and let XI 2 x 0, so that u = x' - z 2 0. For every set { X I , : .. , x n + l ) C Wn there corresponds a set {x", . . . , xIn+') with xIi = x2 u such that X2x1', Xi = 1 if and only if x' = x = Xix2, X 2 > - 0, Xi 0, Xi = 1 and this correspondence is 1 - 1. Therefore, according to (2.32):
>
~yzt c:
>
+
72
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
>
and since f ( d i ) f(xi) whenever xi E [p, q] it follows that cp(xl) 2 cp(x). An analogous argument can be used to prove that the convex envelope of a decreasing function is a decreasing function. Since the concave envelope of an increasing function f (x) is the negative of the convex envelope of the decreasing function -f (x) the assertion about concave envelope follows. 0 As a consequence of the above propositions, for the monotonic optimization problem
the convex relaxation (2.30) where e f ( x ) is the concave envelope of f (x) and C is the convex hull of G nH is a convex optimization problem which is also a monotonic optimization problem (maximizing a concave increasing function over a convex normal set).
8.
Illustrative examples
To illustrate the performance of the proposed algorithms as compared with the original algorithms in Tuy (2000a) we present some numerical examples. All computations were done on a P C Pentium IV 2.53 GHZ, RAM 256Mb DDR.
EXAMPLE 2.3 Consider the problem max
{& i=l
++si
X)
jdi, X)
Ti
Ax<~,x>O
where x E R12 and ci, di E R12, Ti, S i E R, A E R given below.
~ q E R15 ~ are~
~
73
Monotonic optimization
In order to reduce the dimension of the problem (see Section 5) we first computeaconstant C>OsuchthatC+mini,l,,,.,5 (c,~ ) + ~ i / (xd) ~+ ,s ~ 0 for all x satisfying Ax 5 q, x 1 0, then determine 2,Pi and bi such that bi (ci, X) Ti/ (di, X) si C = (2, X) Pi/ (di,X) si 2 0, i = 1, ...,5. Then the problem can be written as a problem MO/A in y E IR5 (with x being now intermediate variables)
>
>
+
+ +
Solving this problem with tolerance w
Optimal value: 16.077978.
w
Optimal solution:
+
E
+
= 0.01 yields:
To obtain these results, Algorithm PA needed 184 iterations, 13.84 sec., 1687 linear subprograms, while Algorithm 1 in Tuy (2000a) required 620 iterations, 65.58 sec., 11867 linear subprograms.
74
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
EXAMPLE 2.4 (Test problem 1, Chapter 4 Floudas et al., 1999)
+ 2 = z this problem becomes min (0.1666672~- 4.08z5 + 31.2875z4 - 106.666667z3 0<%<13
By the change of variable x
Algorithm BRB, with tolerance 0.01, found the optimal solution z = 11.999847, i.e. x = 9.999847, with objective function value: -29763.232229 at iteration 1106 and confirmed its optimality at iteration 1918 (note that each iteration is executed very fast) . Computation time: 0.015 sec., maximal number of nodes in each iteration: 12. EXAMPLE 2.5 (Test problem 5, Chapter 4 Floudas et al., 1999) min (20: --5<xi,x2<5
-
+
' 6 1.05~: -xl - x1x2 6
+ x2
(the three-hump camel-back function). By the change of variables x (5,5) = z this problem becomes min
(0.166667z7 - 1.05zt
0<%1,%2<5
+2 4 +
222 -
+
21x2).
Algorithm BRB, with tolerance 0.01, found the optimal solution: z = (5,5), i.e. x = (O! 0) with objective function value 0, right at the beginning (iteration 0) and confirmed its optimality at iteration 3131. Computational time: 0.219 sec., maximal number of nodes in each iteration: 697 . EXAMPLE 2.6 (taken from Murtagh and Saunders, 1983) This is a small but nevertheless difficult problem, as was confirmed in (Floudas, 2000, pp. 388-390):
75
Monotonic optimization
+
Changing the variables xi t xi 5, and writing the constraints as a system of d.m. inequalities yields the problem
An attempt to solve this problem by Algorithm PA failed, spending hours without finding even a feasible solution, but the problem was solved in just about 2 minutes by Algorithm BRB using an exhaustive subdivision of the box [O, 10e] c IW? and a bounding method exploiting both monotonicity and partial convexity. Denote by E the feasible set of the problem. For any given box M = b,q] c [O,lOe] the bounding subproblem is to compute a lower bound for (2.33) min{f (4 I 2 E E n b,41). Setting yl = zl - 6, y2 = zl - 2 2 , y3 = 2 2 - 23, y4 = 24 - 23, y5 = z4 - 25 consider the following relaxation of (2.33):
Clearly a lower bound for the optimal value of the latter problem (hence, also for the optimal value of (2.33) is P ( M ) = P1(M) P2(M), where
+
76
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
But , since p2 seen that
+ < 42 + p3
q3
implies that
p2 - q3
< q2
- p3,
it is easily
while the problem (2.35) which is convex can be solved easily. In the implementation we even replaced p2 (M) by its underestimate pzl P22 P24 P 2 5 , where
+ +
+
Thus, a lower bound over a box M =
b,q] was taken to be
The results of computation (on a PC Pentium IV 2.53 GHz) were as follows: w
Global optimal solution
w
Optimal value: 0.029311 (tolerance 0.01).
w
The optimal solution was found at iteration 11274 and confirmed at iteration 14199.
Monotonic optimization Maximal number of nodes generated: 3022. Computation time: 123.516 sec.
Acknowledgments The authors are indebted to Shabbir Ahmed (Georgia Institute of Technology, Atlanta) for several useful discussions and to Ng.T. Hoai Phuong (Institute of Mathematics, Hanoi) for the illustrative examples in Section 8. We are also grateful to Gilles Savard ( ~ c o l ePolytechnique, Montrkal) for many remarks and suggestions which have helped to correct a number of minor mistakes and misprints. The research of the authors was supported in part by US National Science Foundation Grant DMI-990826 and VN National Program of Basic Research 2001-2003.
References Floudas, C.A. (2000). Deterministic Global Optimixation. Theory, Methods and Applications. Nonconvex Optimization and its Applications, vol. 37. Kluwer Academic Publishers, Dordrecht. Floudas, C.A. et al. (1999). Handbook of Test Problems i n Local and Global Optimixation. Nonconvex Optimization and its Applications, vol. 33, Kluwer Academic Publishers, Dordrecht. Luc L.T. (2001). Reverse polyblock approximation for optimization over the weakly efficient set and efficient set. Acta Mathematica Vietnamica, 26(1):65-80. Murtagh, B.A. and Saunders, S.N. (1983). MINOS 5.4 U s e r ' s Guide. Systems Optimization Laboratory, Department of Operations research, Stanford University. Phuong, Ng.T.H. and H. Tuy (2002). A monotonicity based approach to nonconvex quadratic minimization. Vietnam Journal of Mathematics 30(4):373-393. Phuong, Ng.T.H. and H. Tuy, H. (2003). A unified approach to generalized fractional programming. Journal of Global Optimixation, 26:229259. Rubinov, A., Tuy, H., and Mays, H. (2001). Algorithm for a monotonic global optimization problem. Optimization, 49:205-221. Tuy, H. (1987). Convex programs with an additional reverse convex constraint. Journal of Optimization Theory and Applications, 52:463-486.
78
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Tuy, H. (1995). D.C. optimization: Theory, methods and algorithms. In: R. Horst and P. Pardalos (eds.), Handbook of Global Optimization, pp. 149-216. Kluwer Academic Publishers, Dordrecht. Tuy, H. (1998). Convex Analysis and Global Optimization. Nonconvex Optimization and its Applications, vol. 22. Kluwer Academic Publishers, Dordrecht . Tuy, H. (1999). Normal sets, polyblocks and monotonic optimization. Vietnam Journal of Mathematics, 27:277-300. Tuy, H. (2000a). Monotonic optimization: Problems and solution approaches. SIAM Journal on Optimization, 11(2):464-494. Tuy, H. (2000b). Which Functions Are D.M.? Preprint, Institute of Mathematics, Hanoi.
Tuy, H. (2001) Convexity and monotonicity in global optimization. In: N. Hadjisavvas and P.M. Pardalos (eds.), Advances in Convex Analysis and Global Optimization, pp. 569-594. Kluwer Academic Publishers, Dordrecht . Tuy, H. and Luc, L.T. (2000). A new approach to optimization under monotonic constraint. Journal of Global Optimization, 18:l-15. Tuy, H. and Nghia, 1Vg.D. (2003). Reverse polyblock approximation for generalized multiplicative/fractional programming. Vietnam Journal of Mathematics, 31:391-402. Tuy, H., Thach, P.T., and Konno, H. (2004). Optimization of polynomial fractional functions. Journal of Global Optimization, 29(1):19-44.
Chapter 3
DUALITY BOUND METHODS IN GLOBAL OPTIMIZATION Reiner Horst Nguyen Van Thoai Abstract
1.
This article presents an overview of the use of Lagrange-duality bounds within a branch and bound scheme for solving nonconvex global optimization problems. Convergence properties and application possibilities of the method are discussed.
Introduction
A widely employed algorithmic concept for the global minimization of various types of nonconvex functions over convex and nonconvex sets uses successively refined partitioning of the feasible set coupled with a bounding procedure yielding suitable lower bounds for the minimum of the objective function over each element of the partitions. Most often this concept is presented in the form of a branch and bound scheme, which is actually one of the most promising methods for solving multiextremal global optimization problems. In many cases, lower bounding procedures are established using suitable types of underestimating functions of the nonconvex functions involved in the problem under consideration. As a result, lower bounds are computed by solving 'relaxed' problems in the same space of variables as that of the original problem. Algorithms based on this kind of lower bounding procedures usually have the following convergence properties (cf., e.g., Horst and Tuy, 1996; Horst et al., 2000): (a) if the underlying problem has an optimal solution, and the algorithm does not terminate at an optimal solution after finitely many iterations, then it generates infinite subsequences of feasible and/or infeasible points each of them converging to optimal solutions;
80
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
(b) if the feasible set of the underlying problem is empty, then the algorithm terminates after finitely many iterations indicating this fact. An alternative to the above bounding procedures by underestimating functions is the application of Lagrange-dual problems. It is well known from weak duality that a lower bound for the optimal value of each minimization problem can be obtained by solving the corresponding Lagrange-dual problem. The main advantage of this duality bounding procedure is that in many cases it provides better lower bounds than those computed by solving 'relaxed' problems as mentioned above. However, there are two essential questions to be answered while applying this method. The first question is how to solve the Lagrange-dual problem for a given nonconvex problem, and the second is concerned with the convergence of these algorithms. The purpose of this overview is to discuss systematically convergence properties of branch and bound algorithms using duality bounds and to collect some interesting problem classes where duals of nonconvex problems can be efficiently solved. The article is organized as follows. In Section 2, we present the Lagrangian duality bound and the bound obtained when solving the socalled 'convexification problem', in which all nonconvex functions are replaced by their convex envelopes. Useful properties of both kinds of lower bounds are discussed. A branch and bound scheme using duality bounds and its convergence are discussed in Section 3 for the case where the feasible set of the underlying problem is compact. In Section 4, the general algorithm is then applied to establish a decomposition procedure for a wide class of problems in which the variables can be divided into two suitable groups. Convergence properties of the resulting algorithms are discussed including some cases of unbounded feasible sets. The last section contains a collection of some interesting classes of global optimization problems to which our branch and bound scheme can be applied successfully.
2. 2.1
Bounding by Lagrangian duality and convexification Duality bounds
The nonconvex global optimization problem to be considered is generally formulated as follows:
3 Duality Bound Methods i n Global Optimization
81
where C is a nonempty convex subset of Rn, 11,I2 are finite index sets and f , and gi(i E Il u 12) are lower semicontinuous (1.s.c.) real functions defined on C . For X E ~ l ' l I ~ l ' ~ 1x, E C define the Lagrangian
of Problem (3.1) and on the set
define the dual criterion d: A -t R,
X
H d(X)
= inf L(x,X) XEC
Then the dual of (3.1) is defined as the problem d* = sup{d(X): X E A).
(3.2)
It is clear that for each X E A it holds d(X) 5 f *. Consequently, we have that d* 5 f*, which means that a lower bound of the optimal value of Problem (3.1) can be obtained by solving its dual Problem (3.2). If f is convex on C and a certain 'constraint qualification' is fulfilled for Problem (3.1)) then we have strong duality (see, e.g., Geoffrion, 1971; Mangasarian, 1979; Bazaraa et al., 1993):
In our nonconvex case, however, a duality gap
has to be expected. The following result gives some useful properties of the duality bound d*.
PROPOSITION 3.1 (i) Assume that inf{f (x) : x E C ) > -00 and there > 0 for all x E C. Then exists X0 E A such that CieIlu12 d* = +oo.
gi(~)~P
(ii) A = f * - d * 5 f*-inf{f(x):
X E
C).
82
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
>
(iii) Let Cj, j = 1,2, be compact subsets of C satisfying C1 Cz and for each j , let d; be the optimal value of the dual of Problem (3.1) with C = C j . T h e n d"; d;. (iv) Let d be the optimal value of the dual of that problem, which is resulted from Problem (3.1) by adding to it some redundant constraints. T h e n d d*.
>
Proof.
(i) Let
r = {A
E IRlIllf
1'21
: X = kXO,k
> 0). Then
The last equation follows from the assumption in (i) by letting k -t +oo. (ii) follows from the fact t,hat d* = sup{d(X): X E A) inf{f(x): x E C3,
2
d(0) =
(iii) and (iv) follow immediately from the definition of duality bounds.0
xiEIlU12
REMARK 3.1 (a) The condition g i ( ~ ) ~> : 0 for x E C, is fulfilled if, e.g., there is an index j such that {x: I(: E C,gj(x) 5 0 )
=O.
In this case, one can choose X0 = (0,. . . , 0 , A:, 0 , . . . , 0 ) with XjO = 1. (b) The monotonicity property (iii) is useful in the application of duality bounds within a branch and bound scheme, which is considered in the next sections. (c) Based on Property (iv) one can use redundant constraints to improve duality bounds in some interesting cases, see, e.g., Shor and Stetsyuk (2002).
83
3 Duality Bound Methods in Global Optimization
2.2
Convex envelopes and convexification bound
We recall the concept of the convex envelope of a nonconvex fuiiction which is a basic tool in theory and algorithms of nonconvex global optimization, see e.g., Falk and Hoffman (1976), Horst and Tuy (1996), and Horst et al. (2000).
3.1 Let C c Rn be nonempty convex, and let f : C DEFINITION be 1.s.c. on C. The function x
H
cp,,
--+
R
'PC,,-. C - + R , (x) := sup{h(x):h:C R convex, h 5 f on C}
,
-f
is said to be the convex envelope of f over C. Notice that it is often convenient to eliminate formally the set C by setting
and replacing cpCtf accordingly by its extension cp,,, : Rn -+ R U {oo). It is well-known and easy to see that cpCqf is 1.s.c. on C, and hence is representable as the pointwise supremum of the affine minorants of f . Geometrically cpCvf is the function whose (closed) epigraph (i.e., the set of points on and above its graph) coincides with the convex hull of the epigraph of f . The following basic properties and their proofs can be found, e.g., in Horst and Tuy (1996) and Horst et al. (2000).
PROPOSITION 3.2 Assume that in Definition 3.1, C is compact and let D c C be convex7g g: Rn R be an afine function. Then -+
(i) m := min{f (x) : x E C ) = min{cpClf(x) : x E C ) , (ii) {Y E C : f ( y ) = m)
(iv) cp,,f+,
= cp,,
C {Y
E C : cp,,,
(y) = m),
,+ g.
Notice that the result (ii) can be precised: it is easy to see that the set of global minimizers of cpCvf over C is the convex hull of the set of global minimizers of f over C. For each nonconvex optimization problem of the type (3.1) with we can construct the following convexified problem:
12
= 0,
84
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Obviously, cp* is also a lower bound of f *. We call cp* a convexification bound.
2.3
Relationship between duality bounds and convexification bounds
A very nice property of duality bounds is that in many interesting cases, they are at least as good as convexification bounds. More precisely, this property is shown in the following. PROPOSITION 3.3 Assume that in Problem (3.1) I2 = 0 and some constraint qualification is fulfilled for the convexified Problem (3.3). T h e n
(ii) d* = cp* for the case where gi, i E I I are all linear.
be the Lagrangian of Problem (3.3). Then it follows from the above assumption and the definition of convex envelopes that d* = sup inf L(x, A) X>O "cEC
> sup inf z ( x , A) = cp* X>O x E C
(ii) From Proposition 3.2(iv), it follows that on C it holds
Thus, from Proposition 3.2(i) we have for each A 2 0 that
A) = inf z ( x , A), inf L(x, A) = inf cpC,~(x,
xEC
XEC
xEC
13 which implies that d* = cp*. In principle, duality bounds can be strictly better than convexification bounds. The following special case serves as an example (see Diir, 2002).
PROPOSITION 3.4 Assume that in Problem (3.1) and Problem (3.3) the following conditions are fulfilled: (i) I2 = 0; C is compact; (ii) f is strictly concave o n C , i.e.
3 Duality Bound Methods in Global Optimization
85
(iii) -gi, i E II are strictly concave and continuously differentiable o n C , and there i s 3 E C such that gi(3) < 0 , i E I l ; (iv) f * > cp*; (v) cpc,! is not constant o n any linesegment contained in C .
T h e n d* > cp*, i.e. in this case, the duality bound i s strictly better than the convexification bound.
3.
Branch and bound methods using duality bounds
The branch and bound scheme is one of the most promising methods developed for solving multiextremal global optimization problems. The main idea of this scheme consists of two basic operations: successively refined partitioning of the feasible set and estimation of lower and upper bounds for the optimal value of the objective function over each subset generated by the partitions. In this section, we present a branch and bound scheme using duality bounds for solving global optimization problems of the form
where C is a simple n-dimensional compact convex set as used in the first relaxation step of a branch and bound procedure, e.g., a simplex or a rectangle, f and gi (i = 1,. . . ,m ) are lower semicontinuous functions on C. The additional assumption on C is made here for the implementability and convergence of the algorithm. Let
3.1
Branch and bound scheme
For each set R for the problem
C , we denote by p(R) the duality bound computed
and by F ( R ) a finite set of feasible points of this problem, if there exists any.
Branch and bound algorithm Initialization: Set R' = C. Compute p(R1) and F ( R ~ c ) R l n ~ Set . p1 = p(R1). If F ( R ~ # ) 0, then compute yl = min{f (x) : x E F ( R ~ ) ) and choose x1 such that f ( x l ) = yl, otherwise, set yl = + m . If p1 = + m , then set R 1 = 0, otherwise, set R1 = {R1), k = 1.
86
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Iteration k (i) If !Xk = 0, then stop: either Problem (3.1) has no feasible solution or xk is an optimal solution. (ii) If !Xk # 0, then perform a partition of Rk obtaining {R!, . . . , R:), where R:i = 1 , . . . ,r are nonempty n-dimensional sets satisfying UTzl R: = R ~i ,n t ~ mi n t ~ =? 0, for i # j . (iii) For each i = 1,. . . ,r compute ,u(R:) and F (R:) (iv) Set yk+l = min{yk, min{ f (~x): x E
U;=l
c R:
fl
L.
F(R~))).
(v) Choose xkS1 such that f (xk+l)= yk+l. (vi) Set !XkS1= !Xk \ { R ~U{R: ) : p ( R f ) < yk+l, i = 1,.. . , T ) . (vii) If !Xk+1# 0, then set pk+: = min{p(R) : R E !Xk+l) and choose such that p(Rkfl) = pk+l, otherwise, set pk+l = Rktl E y,++l. Go to iteration k t k + 1.
3.2
Convergence
Whenever the above algorithm does not terminate after finitely many iterations, it generates at least one infinite nested sequence of compact partition sets {RQ)such that RQS1c RQ for all q. We obtain the convergence of the algorithm in the following sense.
T H E O R E3.1 M Assume that the algorithm generates a n infinite nested sequence {RQ) of partition sets such that
T h e n each optimal solution of the problem min{ f (x) : x E R*)
(3.5)
is a n optimal solution of Problem (3.4). Proof. For each q let xQE RQsuch that f (xQ)= min{f (x) : x E Rq). Let x* be an accumula,tion point of the sequence {xq). Then x* E R* and, by co. passing to a subsequence if necessary, assume that xQ -+ x* as q Since RQ+' c RQ 'dq and x* E RQ 'dq, it follows with the definition of f (x4) that f (x4) f (xQfl ) f (x*). Hence lim,,, f ( 2 4 ) exists satisfying limq+COf (xq:) f (x*). On the other hand, lower semicontinuity f (xQj2 f (z*), so that we have limq,CO f (xQ)= f (x*). implies lim,,, -+
<
<
<
3 Duality Bound Methods in Global Optimization
87
Since f (x4) = min{ f ( x ): x E R4) 5 min{ f ( x ): x E R * ) 5 f (x*)Vq, it follows that lim f ( x q )= lim mini f ( x ): x E R q )
q+m
q+m
5 min{ f ( x ): x
E
R*)
< f (x*) = lim f (xq), q4C0
which implies that lim mint f ( x ): x E R q ) = min{ f ( x ): x E R * ) = f ( x * ) .
q-+m
Replacing each set Rq by L f l Rq and taking into account the fact that R* c L , we obtain in a similar way lim mini f ( x ): x E L
q+m
n R q ) = min{ f ( x ): x E L f l R*) = min{f(x): x E R * ) .
As shown in the proof of Proposition 3.l(ii), for each q it holds that min{ f ( x ): x E L fR lQ )2 p(R4) 2 mini f ( x ): x E R4). Therefore, we obtain p* = lim p ( ~ k = ) min{ f ( x ): x E R * ) . k-+CO
Since f* 2 p* and R*
c L , it follows that
f * 2 min{ f ( x ): x E R*) 2 f
*, hence f * = min{ f ( x ): x
E
R*),
i.e., each optimal solution of Problem (3.5) is an optimal solution of Problem (3.4).
THEOREM 3.2 Assume that the algorithm generates an infinite nested sequence { R ~ of) partition sets converging to a single point r*, i.e.,
and assume in addition that the function gi, i = 1 , . . . ,m, in Problem (3.4) are continuous. Then r* is an optimal solution of Problem (3.4). Proof. From Theorem 3.1, we only need to show that r* E L. Suppose r* 4 L. Then there is an index j such that gj(r*) > 0. From the continuity of gj and the assumption R~ -t { r * ) as k -+ m, there exists ko such that g j ( x ) > 0 for all x E R ~ OSince . min{f ( x ) :x E R ~ O>) -GO,
88
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
it follows from Proposition 3.1 (i) and Remark 3.1(a) that p ( ~ k O )= +m, which implies that the partition set RkOhas to be removed from further 0 consideration. This contradiction implies that r* E L. Notice that the results of Theorems 3.1 and 3.2 can also be derived from the approach given in Dur (2001).
4.
Decomposition met hod using duality bounds
In this section, we discuss a decomposition method for solving a class of global optimization problems, in which the variables can be divided into two groups in such a way that whenever a group is fixed, all functions involved in the problem must have the same structure with regard to the other group, e.g., linear, convex or concave, etc. Based on the decomposition idea of Benders, a corresponding 'master problem' is defined on the space of one of two variable groups. For solving the resulting master problem, the branch and bound algorithm using duality bound presented in the previous section is applied. Convergence properties of the algorithm for this problem class is established in the next subsection. A special class of so-called partly convex programming problems is considered thereafter. The results presented in this section are originated from Thoai (2002a,b).
4.1
Decomposition branch and bound algorithm
The class of nonconvex global optimization problems to be considered here can be formulated as follows: min F(x, y) s.t.Gi(x,y) 5 0 (i = 1,.. . , m) x E C, y E Y,
(3.6)
where C is a compact convex subset of Rn, Y a closed convex subset of RP, and F and Gi (i = 1,.. . ,m) are continuous functions defined on a suitable set containing C x Y. To apply the branch and bound algorithm presented in the previous section, we also assume in addition that C has a simple structure as e.g., a simplex or a rectangle. We denote by Z the feasible set of Problem (3.6), i.e., Z = { ( x , ~ )Gi(x,y) : 5 O(i = 1 , . . . , m ) , x E C , y E Y), and assume that Problem (3.6) has an optimal solution. Define a function 4 : Rn -+ R by
(3.7)
3 Duality Bound Methods i n Global Optimization
89
and agree that 4(x) = +oo whenever the feasible set of Problem (3.8) is empty. Then Problem (3.6) can be formulated equivalently as min{4(x): x E C}.
(3.9)
More precisely, we state the equivalence between the Problems (3.6) and (3.9) in the following proposition whose proof is obvious.
PROPOSITION 3.5 A point (x*,y*) i s optimal t o Problem (3.6), if and only if r* is a n optimal solution of Problem (3.9), and y* is a n optimal solution of Problem (3.9) with x = x*. In view of Proposition 3.5, instead of Problem (3.6), we consider Problem (3.9) which is usually called the 'master problem'. For solving the master problem in Rn, the branch and bound algorithm presented in Section 3.1 is applied. Notice that partitioning is applied only in the space of the x-variables. For each partition set R c C , a lower bound p(R) of the optimal value of the problem min{4(x) : x E R} = min{F(x, y): Gi(x, y) 5 O(i = 1,..., m ) , x E R, y E Y} is obtained by solving the dual problem, i.e.,
We now establish some convergence properties of the branch and bound algorithm applied to Problem (3.9). Let C0 be the subset of C consisting of all points x E C such that Problem (3.8) has an optimal solution, i.e.,
Note that, since Problem (3.6) is solvable, it follows from Proposition 3.5 that Go # 0. Further, let M : C0 -4 RP be a point-to-set mapping defined by M ( x ) = {y E RP: Gi(x,y) 5 O(i = 1 , . . . , m ) , y E Y}.
(3.11)
The following definition, which is introduced based on well known concepts from convex analysis and parametric optimization (see, e.g., Berge, 1963; Hogan, 1973; Bank et al., 1983), is used for establishing convergence of the algorithm.
90
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
DEFINITION 3.2 (i) We say that the function 'dual-proper' at a point xOE C0 if
4
in Problem (3.9) is
(ii) The function 4 is 'upper semicontinuous (u.s.c)' at xO E CO if for each sequence ( 2 4 ) c C O , limq-+m XQ = xO the inequality lim+(xQ)5 4(x0) holds. (iii) A point-to-set mapping M : C0 + RP is called 'lower semicontinuous according to Berge (1.s.c.B.)' a t xOE C0 if for each open set R satisfying R n M(x') # 0 there exists an open ball, C, around xOsuch that R n M ( x ) # 0 for all x E C n CO.
THEOREM 3.3 Assume that the decomposition branch and bound algorithm generates an infinite subsequence {RQ) of partition sets such that (i) RQ+' c RQfor all q and limq,, (ii) the function
RQ=
RQ= {r*) c Go,
4 is dual-proper at r*,
(iii) there exists qo satisfying p(RQ0)> -00, and (iv) there exists a compact set YO c Y such that for each X 2 0 and for each x E C , the set of optimal solutions of the problem min{F(x, y ) + C E 1 Gi(x, y)Xi : y E Y), if it exists, has a nonempty subset in YO. Then r* is an optimal solution of Problem (3.9), i.e., the point r*, together with each optimal solution of the Subproblem (3.8) with x = r*, is an optimal solution of Problem (3.6). Proof. From Assumption (iii) and the monotonicity property of duality bounds, it follows that p(RQ)> -00 for q 2 qo. For each q 2 qo, let
and let XQ be an optimal solution of the problem max{wq(X): X i.e., p(RQ)= wq(XQ).Moreover, let
2 01,
91
3 Duality Bound Methods in Global Optimization
First, we show that w*(A) = supq wq(A) for each A. By definition, it is obvious that w*(A) 2 supq wq(X). On the other hand, for each q, let 2 4 E R4, y4 E Yo c Y such that
Then limq+OO(xQ, yq) = (r*,y*), where r* E C0 as in Assumption (i) and y* E YO c Y, by assumption (iv). This implies that SUP 4
wq(A) = lim wq(A) = F(T*, Y*) q-+w
+ C Gi(r*,y*)Ai 2 w*(X). i=l
Thus, we have w*(A) = sup, wq(A). Since the sequence {p(R'J)) of lower bounds is nondecreasing and bounded by the optimal value of Problem (3.6), its limit, p*, exists, and we have p* = lim p(Rq) = lim wq(Aq)= lim maxwq(A) Q4W q+w q+w A20 = sup max wq(A) = max sup wq(A) = max w*(A). q A20 A20 q x>o Since 4 is dual-proper at r * (Assumption (ii)), it follows that p* = @(r*), which implies that r* is an optimal solution of Problem (3.9), and hence, the point r*,together with each optimal solution of the Subproblem (3.8) with x = r*, forms an optimal solution of Problem (3.6). 0 REMARK3.2 If Y is a compact set, then Conditions (iii) and (iv) in Theorem 3.3 can obviously be removed.
THEOREM 3.4 Let the assumptions of Theorem 3.3 be fulfilled. Further, assume that throughout the algorithm one has F(R4) # 0 for each q, and the function qh is upper semicontinuous at r*. Then each accumulation point of the sequence {xq) generated by the algorithm, at which 4 is upper semicontinuous, is an optimal solution of Problem (3.9). Proof. Let x* be an accumulation point of {xq), (note that accumulation points exist because of the compactness of C). By passing to a subsequence if necessary, we assume that limq,OO 2 4 = x*. Since lim R q =
4+00
n
~q
= {,*I
co,
92
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
and 4 is upper semicontinuous at r*, it follows that for each q there is a point 7-4 E R4 such that 4(r4) < +co and &$(rq) 5 $(r*). Since {4(xq)) is nonincreasing and bounded by the optimal value of Problem (3.6), and 4(x4) 4(r4) for each q, it follows from the upper semicontinuity of 4 at x* that
<
which implies by Theorem 3.3 that x* is, as well as r*, an optimal solu0 tion of Problem (3.9). From Theorems 3.3 and 3.4, we see that the dual-properness and the upper semicontinuity of the function 4 play the key role for the convergence of the algorithm. In the remaining part of this subsection, we discuss the upper semicontinuity of 4. First, consider the case where in Problem (3.6) the function F does not depend on y, i.e., we have F (x, y) = f (x). Obviously, in this case 4 is upper semicontinuous if f is upper semicontinuous. In general, if we consider Problem (3.6) as a nonlinear parametric optimization problem of the form
then q5 is usually called the optimal value function of this problem. Properties including continuity of optimal value functions are discussed in Bank et al. (1983), Dantzig et al. (1967), and Fiacco (1983) in connection with the sensitivity and stability analysis of several problem classes. For the convergence of our algorithm, we only need to consider the upper semicontinuity of the function +. This is often established by the investigation of the lower semicontinuity (according to Berge) of the point-to-set mapping M defined in (3.11). Below we present a classical result with a simple proof concerning the relationship between the upper semicontinuity of 4 and the lower semicontinuity of M .
3.6 Assume that the point-to-set mapping M defined in PROPOSITION (3.11) is 1.s.c.B. at x0 E CO. Then the objective function 4 of Problem (3.9) is u.s.c at xO zf the function F is upper semicontinuous. E M(x') such 2 4 = xO. Further, let Proof. Let ( 2 4 ) c CO,limq,, = +(xO). Since M is 1.s.c.B. at xO, there exists a sequence that F (xO, {yq) and an index ij such that yQE M(xq) for q 2 ij and yq -+ yo. From F(x4, yq) for each q, it the upper continuity of F , and since 4(x4) follows that 6 4 ( x 4 ) I i m ~ ( ~yq) 4 , F(x', yo) = 4(x0), which implies 0 that 4 is U.S.C.at xO.
<
<
<
3 Duality Bound Methods in Global Optimization
93
The above proposition allows us to consider the lower semicontinuity (according to Berge) of the point-to-set mapping M , instead of establishing the upper semicontinuity of the function 4. A comprehensive representation of theoretical results on the lower semicontinuity (according to Berge) of M can be found, e.g., in Bank et al. (1983). In the framework of our branch and bound algorithm, we shall consider some special cases, to which the algorithm can be applied successfully. PROPOSITION 3.7 Assume that the system Gi(x, y) 5 0 (i = 1,. . . ,m), y E Y in problem (3.6) can be described as a linear system of the form
where A, B and d are respectively two matrices and a vector of appropriate sixes. Then M is 1.s.c.B. at each point of CO. Proof. See Thoai (2002a).
0
The next result concerns a class of problems originally considered by Dantzig et al. (1967). To our purpose, we consider this problem class in a more general form as follows. For each i = 1 , . . . , ml 5 m let the functions Gi (x, y) be given by
where a i : C 4 Rp and bi: C 4 R are continuous functions for i = 1 , . . . ,m l , and assume that the system Gi(x, y) 5 0 (i = 1 , . . . ,m), y E Y is described by
where H is an m2 x p matrix and h E RP. For each point x E C0 let
where for each j = 1 , . . . ,m2, Hj is the j t h row of the matrix H, and ) Hj let A(x) and H ( x ) be matrices having the rows ai(x) (i E ~ ( x ) and ( j E ~ ( x ) )respectively. ,
PROPOSITION 3.8 Assume that in Problem (3.6) the functions Gi (i = 1 , . . . , m l ) and the system Gi(x, y) 5 0 (i = 1 , . . . , m ) , y E Y are given as in (3.13) and (3.14). Then M is 1.s.c.B. at x E C0 if one of the following conditions is satisfied:
94
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
(i) I ( x ) U J ( x ) = 0; (ii) The matrix
(tjfr,) has full rank.
Proof. This proposition follows from Bank et al. (1983, Theorem 3.2.l(3) and Corollary 3.2.3.2(2)). 0
4.2
Partly convex programming
An interesting special case of Problem (3.6) is the so-called 'partly convex programming problem'. A function f : C x Y + R is called partly convex if the function f jx, .) is convex on Y for each x E C. Problem (3.6) is called partly convex program if F and Gi (i = 1 , . . . ,m) are partly convex functions defined on C x Y. As in (3.7), we denote by Z the feasible set of Problem (3.6). We also assume that the partly convex functions F and Gi (i = 1 , . . . , m) are all continuous on C x Y, and Problem (3.6) has an optimal solution. Based on the partial convexity of the functions F and Gi (i = 1, . . . ,m), we obtain the following convergence property of our decomposition branch and bound algorithm.
THEOREM 3.5 Assume that the decomposition branch and bound algorithm generates an infinite subsequence of partition sets, {RQ), such that (i) Rqtl
c Rq for all q
and lim,,,
Rq =
nZi Rq = {r*),
(ii) there is zero duality gap at r*, i. e., min{F(r*,y): Gi(r*,y)
< O(i = 1 , . . . , m), y E Y)
Then ({r*)x Y) f l Z # 0 and (r*,y*) is an optimal solution of Problem (3.6), where y* is an optimal solution of the convex program min{F(r*, y) : Gi(r*,y) 5 O(i = 1,. . . , m), y E Y). Proof. See Thoai (2002b)- For related results, see also Diir and Horst (1997).
5.
Application to some nonconvex problems
In this section, the general branch and bound algorithm is applied to some interesting amd important problems in global optimization. The main task is the computation of duality bounds, i.e., solving the dual problems of given nonconvex optimization problems.
3 Duality Bound Methods i n Global Optimization
5.1
95
Concave minimization under reverse convex constraints
Consider the problem class (see Dur and Horst, 1997) min f (x) s.t. gi(x) 5 0, i = I , . . . , m , x E C, where f : Rn -+ R and gi: Rn -+ R i = 1 , . . . , m, are concave mappings, and C c Rn is a polytope with known vertex set V(C) . Since f (x) Cgl Xigi (x)is concave in x for all X E RT, the dual to (3.16) becomes
+
which is equivalent to the linear program (in t E R, X E Rm) max t
5.2
Problems with linear-concave constraints
For an application of our decomposition branch and bound method, we take problems of the following type (cf. Ben-Tal et al., 1994; Dur and Horst, 1997; Thoai, 2002a): min cT y s.t. A(x)y 5 b XEC, yEY, where c E RP, b E Rm, C and Y polytopes in Rn and RP, respectively, and A(%) is a continuous matrix mapping C -+ RmXP. Assume that each entry aij(x) of A(x) is a concave function C --, R (alternatively it turns out that quasiconcavity of each row U T ( X )is~ sufficient for the practical applicability of dual bounds). Notice that (3.17) includes bilinearly constrained problems and various practical problems such as, for example, the pooling problem in oil refineries (cf., e.g., Ben-Tal et al., 1994). Often, one encounters the condition y 2 0 rather than y E Y, C polytope. However, when upper bounds on y are known, which is often the case, the conditions y 2 0 can be 'replaced' by y E Y, with the
96
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
compact set Y defined as Y := {y E Rp: 0 5 y 5 ye), y > 0 sufficiently large, e = (1,.. . , l)TE Rp. The dual of Problem (3.17) is max min {cry
' X E XEC ~
+ XT ( ~ ( xy )- b) ).
YEY
When Y is the above box with y sufficiently large, Problem (3.18) reduces to a linear program, if we assume that there is X E RZ;. such that AT(x)X c 2 0 Vx E C. This assumption is fulfilled, for example, when A(x) has a row with positive entries for all x E C . Notice that such a row can always be generated by adding the redundant constraint e T y 5 yp to the original constraints. Given the above assumption, Problem (3.18) reduces to
+
rnax -bTX
+
2 0 vx X E EX?.
s.t. A ~ ( X ) X
E
c
(3.19)
Let aT(x)X+cj denote the j t h row in AT(x)X+c. Then the constraints in (3.191 are equivalent to X E R y and
But, by our concavity assumption on the elements of A(x), each minimum in (3.20) is attained at a vertex x of C so that (3.19) reduces to the linear program max -bTX s.t. a T ( x ) ~ + c2 ~0 ,
x E V ( C ) , j = 1,...,p
(3.21)
X E RT, where V(C) denotes the vertex set of C. Notice that C is often a simplex or a p-rectangle with known vertex set.
5.3
Maximizing the sum of affine ratios
Let A E Rmxp, b E Rm and P := {y E RP:Ay bounded. Furthermore, for i = 1 , 2 , . . . ,n, let
> b,y
2 0) be
with ci, di E RP and ai, Pi E R be affine functions satisfying di(y) > 0 for y E P.
97
3 Duality Bound Methods i n Global Optimization
Consider the following problem of maximizing the sum of linear fractions over polytopes. n.
s.t. Ay
2 b,
y
2 0.
Notice that, since in practical instances of Problem (3.22) the number n of quotients in the objective function can be expected to be considerably smaller than the number p of decision variables involved, we propose the following reformulation which allows the design of convergent branch and bound methods where branching (partitioning) is employed only in the space Rn. For i = 1, . . . ,n , determine the numbers
Notice that the above fractional problems can be solved by linear programming techniques (see, e.g., Charnes and Cooper, 1962). Let =(17...7n),3=
(
1
,
3
)
and
C = {X
E Rn:
5 x 5 3).
Then, introducing the new variable vector x = ( x l , . . . , x,), it is easy to see that Problem (3.22) is equivalent to the problem
If y* is an optimal solution of (3.22), then (x*,y*) with
is an optimal solution of (3.23). Conversely, if (x*,y*) is an optimal solution of (3.23), then x: = ni(y*)/di(y*),i = 1,. . . , n, and y* is an optimal solution of (3.22). For all x E C , let
98
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Then, the constraints in (3.23) can be reformulated as
and the dual of (3.23) is
As shown in Diir and Horst (1997) and Diir et al. (2001), the dual Problem (3.24) reduces to a linear program.
Optimization problem over the efficient set
5.4
Consider the following multiple objective linear programming problem: maxbiT y
(
l
,
,n),
s.t. D y + d L O , y L O ,
(3.25)
where D is a .rr x p matrix, d E Rn and bi E IWP \ (0) for i = 1 , . . . ,n. The vectors bi are called 'criterion vectors' of Problem (3.25). Let Y be the feasible set of Problem (3.25) and B the n x p matrix having the rows by,. . . ,.b: A point y E Y is called an efficient solution By and of Problem (3.25), if there is no point z E Y such that Bz Bz # By. It is well known that a point y E Y is efficient if and only if there exists a point x E Rn, x j > 0 ( j = 1 , . . . , n ) , Cy=lx j = 1 such that x T ~ =y max{xTBz: z E Y}. Letting 6
> 0 be a sufficiently small number, C the simplex defined by
and c E RP, we consider the optimization problem min {cTy : x E C, y E
Y,max{xTBz: z E Y}5 x T ~ y } } .
(3.27)
3 Duality Bound Methods i n Global Optimization
Define the function h: C
-t
99
iR by
h(x) = m a x { x T B ~ :z E Y).
(3.28)
Then Problem (3.27) can be rewritten in the form
Problem (3.29) is an important approach in multiple objective programming . It is a difficult multiextremal global optimization problem, since its feasible set is in general nonconvex. Notice that while applying our decomposition branch and bound algorithm, for each x E C there is y E Y such that xTBy = max{xTBz: z E Y),
i.e., h(x) - xTBy = 0.
Therefore we have
which implies that C0 = C. From this, for each simplex R C C a set F ( R ) c C0 n R can be chosen in an arbitrary manner. It is shown in Le Thi et al. (2002) and Thoai (2002a) that for a given simplex R E C, the dual of Problem (3.29) with C = R can be converted into an ordinary linear program (in the variables X E ElT, t E R) given by: max dTX hRt (3.30) ~ . tD . ~ --X( B ~ x ) c~> 0 (VX E v(R))
+
+
(A, t) 2 0, where V(R) is the vertex set of R and hR is the optimal value of the following linear program (in variables (y, r) E IWp+'):
5.5
Nonconvex quadratic programming problem
5.5.1 Indefinite quadratic problem with convex constraints. Consider the following quadratic programming problem:
+
min f (x) = X ~ Q XqTx
100
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
where 11, I2are finite index sets, Q, Ai, i E Il are real n x n matrices, ci E Rn and di E R b'i E Il U 12. It can be shown that the dual of Problem (3.31) is equivalent to a semidefinite program in the following sense (cf. Novak, 2000). Let L(x, A) be the Lagrangian of Problem (3.31), 2 E Rn an arbitrary point, Q(X, a) E R(n+l)x(n+l)the matrix defined by
and S a subset of RIr11+Ir21x IW defined by
S = {(A, a) : Q(A, a)
0, Xi 1 0, i E I I ) .
Then the optimal value of the dual of (3.31) is equal to the optimal value of the semidefinite program max{a: (A, a) E S ) .
General quadratic problem with one additional quadratic constraint. The general quadratic programming problem with one additional quadratic constraint is formulated as follows. 5.5.2
+
min f (x) = x T ~ x qTx
where Q, B are real n x n matrices, A is a real m x n matrix, q, c E Rn, d E Rm, h E R, and C is a rectangle defined by
with a0 and b0 being vectors of Rn with finite components. Let D be a subset of Rn defined by
Notice that in general, the set D is nonconvex and even nonconnected. For each rectangle R = {x E Rn: a x b) C_ C (C R?), let a, 6, ,B and p be vectors of Rn defined by
< <
ai=min{Qix: x E R),
6i
=mm{Qix:
X E
R)-ai,
pi = min{Bix : x 'E R), pi = max{Bix : x E R) - pi,
3 Duality Bound Methods in Global Optimization
101
where for each i = 1 , . . . ,n , Qi and Bi are the ith rows of the matrices Q and B, respectively. Using two additional vectors of variables, y, z E Rn, we transform Problem (3.32) into
+ +
min F ( x , y) = xTy (q a ) T x s.t. Qx - y - a 5 0 Ax+d
In Thoai (2002a), it is shown that a point x* is an optimal solution of Problem (3.32) if and only if there exist y* and z* such that (x*,y*, z*) is an optimal solution of Problem (3.33). In this case we have f (x*) = F(x*,y*). Moreover, a lower bound p(R) for the Problem (3.33) is obtained by computing by p(R) = (q+ a ) T a v(R), where v(R) is the optimal value of the following linear program (in variables s E Rn, u e Rn, v E Rm, w E Rn, t E EX, fi E Rn, and w E Rn):
+
+ (Qa - a ) T u + (Aa + d ) T ~ ) +~(aTp w+ aTc + h)t - bTfi - pT6}
v(R) = max{(a +(Ca - ~
-
b)Ts
Linear multiplicative programming problem Let Z be a polytope in RP, and ci(y) ( i = 1,. . . ,n) affine functions satisfying ci(y) > 0 for any y E 2.The problem 5.6
is called a 'linear multiplicative programming problem'. By using n additional variables xi (i = 1, . . . ,n ) , we can rewrite the above problem
102
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
in the following equivalent form: n
min f (L) =
xi i=l ~ . t c. i ( y ) - x i < O , ( i = l , ..., n), x€C,y€Z, where C is a rectangle defined by C = { X E R ~l:<xi :
...,n ) )
with 1: 5 min{ci(y): y E Z) and u: 2 max{ci(y): y E Z), i = I , . . . ,n. Problem (3.36) is a special case of Problem (3.6) in which the objective function does not depend on y. For the application of our decomposition branch and bound algorithm to Problem (3.36), notice that the function 4 according to (3.36) is dualproper at each point in CO. Moreover, since the objective function of (3.36) does not depend on y, the continuity of 4 follows from the continuity of f over C. For each polytope R c C, let V(R) denote the vertex set of R. Moreover, assume that the system ci(y) - xi 5 0, (i = l , . . . ,n), x € R, y € Z can be rewritten as
where E is the n x n unit matrix, B and D are matrices of sizes n x p and n x p, respectively, c E Rn, and d E RT. Then, as shown in Thoai (2002a), t,he dual problem according to Problem (3.36) is equivalent to the following linear program (in the variables (A1,X2, t) E EXn+=+'):
6.
Conclusions
A thorough investigation of the use of the Lagrange dual of a nonconvex global minimization problem as lower bounding procedure in a branch and bound method reveals astonishing nice theoretical and practical properties:
3 Duality Bound Methods in Global Optimization
103
1. Lagrangian dual lower bounds always lead to convergent algorithms where, in contrast to convex duality, no regularity condition is needed. An optimal solution can be determined even if the underlying branch and bound approach never provides upper bounds via feasible points. 2. The Lagrange dual bound is always at least as good and sometimes better than the bound obtained by the convexified problem in which each nonconvex function is replaced by its convex envelope (the uniformly best convex underestimator). 3. For many important classes of global optimization problems the Lagrange-dual problem can be shown to be equivalent to a linear program, and hence can be solved in polynomial time. These include concave minimization under reverse convex constraints, bilinear programming, optimization of the sum of affine ratios, multiobjective programming, quadratic problems with quadratic constraints and linear multiplicative programs.
References Bank, B., Guddat, J., Klatte, D., Kummer, B., and Tammer, K. (1983). Nonlinear Parametric Optimization. Birkhauser, Basel. Bazaraa, M.S., Sherali, H.D., and Shetty, C.M. (1993). Nonlinear Programming: Theory and Algorithms. 2nd ed., John Wiley & Sons, Inc. Ben-Tal, A., Eiger, G., and Gershovitz, V. (1994). Global minimization by reducing the duality gap, Mathematical Programming, 63:193-212. Berge, C. (1963). Topologigal Spaces. Macmillan, New York. Charnes, A. and Cooper, W.W. (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9:181-186. Dantzig, G.B., Folkman, J., and Shapiro, N. (1967). On the continuity of the minimum set of a continuous function. Journal of Mathematical Analysis and Applications, 17:519-548. Dur, M. (2001). Dual bounding procedures lead to convergent branch and bound algorithms. Mathematical Programming, 91: 117-125. Dur, M. (2002). A class of problems where dual bounds beat underestimation bounds. Journal of Global Optimization, 22:49-57. Dur, M. and Horst, R. (1997). Lagrange duality and partitioning techniques in nonconvex global optimization. Journal of Optimization Theory and Applications, 95:347-369.
104
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Diir, M., Horst, R., and Thoai, N.V. (2001). Solving sum-of-ratios fractional programs using efficient points. Optimization 49:447-466. Falk, J.E., and Hoffman, K.L. (1976). A successive underestimation method for concave minimization problems. Mathematics of Operations Research, 1:251-259. Fiacco, A.V. (1983). Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Academic Press, New York. Geoffrion, A.M. (1971). Duality in nonlinear programming: a simplified application-oriented development. SIAM Review, 13:1-37. Hogan, W.W. (1973). Point-to-set maps in mathematical programming. SIAM Review, 15:591-603. Horst, R. and Pardalos, P.M. (eds.) (1995). Handbook of Global Optimization. Kluwer Academic Publishers. Horst, R., Pardalos, P.M., and Thoai, N.V. (2000). Introduction to Global Optimixation. 2nd ed., Kluwer, Dordrecht, The Netherlands. Horst, R. and Tuy, H. (1996). Global Optimixation: Deterministic Approaches. 3rd ed., Springer, Berlin. Le Thi, H.A., Pham, D.T., and Thoai, N.V. (2002). Combination between global and local methods for solving an optimization problem over the efficient set. European Journal of Operations Research, 142:258-270. Mangasarian, O.L. (1979). Nonlinear Programming. Robert E. Krieger Publishing Company, Huntington, New York. Nowak, I. (2000). Dual bounds and optimality cuts for all-quadratic programs with convex constraints. Journal of Global Optimixation, 18:337-356. Shor, N.Z. and Stetsyuk, P.I. (2002). Lagrangian bounds in multiextremal polynomial and discrete optimization problems. Journal of Global Optimization, 23: 1-41. Thoai, N.V. (2000). Duality bound method for the general quadratic programming problem with quadratic constraints. Journal of Optimization Theory and Applications, 107:331-354. Thoai, N.V. (2002a) Convergence and application of a decomposition method using duality bounds for nonconvex global optimization. Journal of Optimization Theory and Applications, 113:165-193.
3 Duality Bound Methods i n Global Optimization
105
Thoai, N.V. (2002b) Convergence of duality bound method in partly convex programming. Journal of Global Optimization, 22:263-270.
Chapter 4
GENERAL QUADRATIC PROGRAMMING Nguyen Van Thoai Abstract
1.
The general quadratic programming problem is a typical multiextremal optimization problem, which can have local optima different from global optima. This chapter presents an overview of actual results on general quadratic programming, focusing on three topics: optimality conditions, duality and solution methods. Optimality conditions and solution methods are discussed in the sense of global optimization.
Introduction
In this chapter, by 'general quadratic programming problem' we mean an optimization problem, in which all functions involved are quadratic or linear and, in general, local optima can be different from global optima. The class of general quadratic programming problems plays a prominent role in the field of nonconvex global optimization because of its theoretical aspects as well as its wide range of applications. On the one hand, many real world problems arising from economies and engineering design can be directly modelled as quadratic programming problems. On the other hand, general quadratic programming includes as special cases the equivalent formulations of many important and well studied optimization problems, e.g., linear zero-one programs, assignment problems, maximum clique problems, linear complementarity problems, bilinear problems, packing problems, etc. Last but not least, some special quadratic programming problems are used as basic subproblems in trust region methods in nonlinear programming. Applications of general quadratic programming problems can be found in almost all of the references given at the end of this chapter. The present overview focuses on actual results of three topics in general quadratic programming: optimality conditions, duality and solution
108
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
methods. Optimality conditions and solution methods are discussed in the sense of global optimization. In Section 2, global optimality conditions and duality results for some special problems are presented. Section 3 deals with solution methods. Three basic approaches which are successfully used in global optimization and different techniques for the realization of them in general quadratic programming are discussed.
2.
Global optimality conditions and duality
In this section, we consider general quadratic optimization problems given by
where 11,I2are two finite index sets, A, Qi, i E Il U Iz,are n x n symmetric real matrices and b, qi E Rn, di E R for all i . The notation (., .) stands for the standard scalar product in Rn. If I2= 0 and all functions f,gi, i E Il are convex, then (4.1) is the well known convex quadratic programming problem. Since the subject of this chapter consists of nonconvex problems, we assume throughout the chapter that in the case I2 = 0, at least one of the functions f , gi, i E 11,is nonconvex. For convenience, we sometimes write the objective function of problem (4.1) in the form f (x) = (Ax, x) 2(b, z). Applications of problem (4.1) includes subproblems required in trust region methods (cf., e.g., Mod, 1993; Martinez, 1994; Heinkenschloss, 1994, and references given therein), and quadratic integer programming problems, since each constraint xi E { O , 1 ) can be replaced by the constraint x i ( l - xi) = 0. The purpose of this Section is to present global optimality conditions and properties of Lagrangian duality for some interesting special cases of problem (4.1).
+
2.1
Global optimality conditions
We begin with two special cases of problem (4.1) where II = 0, I2= (1) and I 2 = 0, 11 = {I), respectively. These are generalizations of two problems used in trust region methods:
+
min f (x) = ;(Ax, X) ( b , 2) s.t. gl(x) = (Qlx, x) (ql, 2) dl = 0
+
+
(4.2)
4 General Quadratic Programming
109
and
+
min f ( x ) = ( A x ,x ) (b,x ) s.t. g l ( x ) = ( Q l x , ~ ) ( q l , ~+)d l 5 0.
+
(4.3)
T H E O R E4.1 M ( C F . MORI?, 1993) Assume that i n problem (4.2), the function gl is nonlinear, i.e. Q1 is not a null matrix, satisfying
i n f { g l ( x ): x E Rn) < 0 < sup{gl ( x ): x E Rn). Then a point x* is a global optimal solution of problem (4.2) i f and only if g l ( x * ) = 0 and there is a multiplier A* E R such that
(ii) A
+ A*Q1 is positive
semidefinite.
Notice that ( i ) is the well known Kuhn-Tucker condition for problem (4.2). For problem (4.3), by using an additional variable y E R , one can rewrite this problem equivalently in the form
+ (b,
min f ( x ,y ) = ; ( A X X, ) s.t. g l ( x , y ) = ( Q I x ,x )
2)
+ ( q i ,x ) + dl + y2 = 0.
(4.4)
As a result, one obtains the following optimality condition for problem (4.3) from Theorem 4.1.
T H E O R E4.2 M ( C F . MORE, 1993) Assume that i n problem (4.3), the function gl is nonlinear satisfying
< 0.
inf {gl ( x ): x E Rn)
Then a point x* is a global optimal solution of problem (4.3) if and only if gl(x*) 5 0 and there is a multiplier A* 0 such that
>
+
(ia) ( A A*Ql)x*
+ b + X*ql = 0 ,
(ib) X*gl(x*) = 0 , (ii) A
+ X*Q1
is positive semidefinite.
Some interesting illustrative examples to Theorems 4.1 and 4.2 are given in Hiriart-Urruty (1998). Another kind of optimality conditions is presented in Hiriart-Urruty (2001). Hiriart-Urruty considers a special case of problem 4.3 given by max f ( x ) = & ( A x x, ) s.t. g l ( x ) = ( Q i x ,2 )
+ (b,x )
+ ( q i ,x ) + dl I 0 ,
(4.5)
110
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
where f is convex nonlinear, gl is convex and there is xO E Rn such that gl(xO)< 0 (Slater's condition). A global optimality condition for problem (4.5) is formulated as follows.
THEOREM 4.3 (CF. HIRIART-URRUTY, 2001) A point x* with g1(x*) = 0 is a global optimal solution of problem (4.5) if and only if it holds that (Ad,d) - (Qld,d)
(gz::;:)
5 0 'dd :d 0, where
Actually, Theorem 4.3 is equivalent to Theorem 4.1 and is obtained from an optimal condition for the problem of maximizing a convex function f over a closed convex subset C c Rn (cf. Hiriart-Urruty, 1989, 2001). For this problem, x* E C is a global optimal solution if and only if 8, f (x*) c N,(C, x*) Ye > 0, (4.6)
2 0, the &-subdifferential of f at x* is defined by 8, f (x*) = { S :d Rn : f (x) 2 f (x*) + ( s ,x - x*) - E 'dx E Rn)
where, for
E
and the set of e-normal directions to C at x* is defined by
Using the support functions of 8, f (x*) and N,(C, x*) and the quadratic structures of f and C = {x E Rn: gl(x) 5 01, one obtains Theorem 4.3 from (4.6). The next special case of problem (4.1) is the following min f (x) =
4
(AX,
x)
+ (b, x)
+
s.t. gi(x) = ( Q i x , ~ ) ( q i , ~+) d l I 0 g2(x) = (Q22,x) + (q2,x) + d2 i 0.
(4.7)
For the establishment of optimality conditions for this problem, the situation is rather troublesome. In general, necessary conditions and sufficient conditions for global optimality of (4.7) had to be established separately. From results of second order conditions, one obtains the following sufficient condition.
THEOREM 4.4 (CF. HIRIART-URRUTY, 1998) A feasible point x* of problem (4.7) is a global optimal solution if there exist multipliers X1 2 0, X2 0 such that
>
v f (x*) + Xlvgl(x*) + X2Vg2(x*) = 0,
4 General Quadratic Programming
111
Xlgl(x*) = X2g2(x*) = 0, X2Q2 is positive semidefinite.
A + XIQl
+
The above result can be extended to the case of more than two quadratic constraints. As an example, consider the following problem of minimizing a quadratic function over the intersection of ellipsoids: min f ( x ) = ( A x ,x )
+ 2(b,x )
s.t. g i ( x ) = ( Q i ( x - a i ) , ( x - a i ) ) - r i 2 5 0 ,
i=l,
...,m ,
(4.8)
where A is a symmetric matrix, Qi ( i = 1 , . . . , m) are symmetric positive semidefinite matrices and ai E Rn, ri > 0, ( i = 1, . . . , m) are given vectors and numbers, respectively. Obviously, each set
is an ellipsoid with the center ai and the radius
Ti.
THEOREM 4.5 ( C F . FLIPPO AND JANSEN, 1996) A feasible point x* of problem (4.8) is a global optimal solution if there exist multipliers Xi 0, i = 1 , . . . , m such that
>
A+
x
XiQi is positive semidefinite.
i=l
We discuss now necessary conditions. Let x* be a global optimal solution of problem (4.7). If no constraint is active at x*, i.e., gl(x*) < 0, g2(x*) < 0, then it is well known that V f ( x * ) = 0 and A is positive semidefinite. If only one constraint is active at x*, say gl ( x * ) = 0, g2(x*) < 0, and V g l ( x * ) # 0, then it follows from (local) second or0 such that V f ( x * ) der necessary conditions that there exists X 1 XIVgl(x*)= 0 and A XIQ1 has at most one negative eigenvalue. For the case that both constraints are active at x*, i.e. gl(x*) = g(x*) = 0, necessary conditions are established based on the behavior of the gradients of gl and g2 at x*.
+
>
+
THEOREM 4.6 ( C F . PENGAND Y U A N ,1997) Let x* be a global optimal solution of problem (4.7). (a) If V g l ( x * ) and V g 2 ( x * )are linearly independent, then there exist A1 2 0, X z 2 0 such that
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
+
+
(i) V f (x*) XIVgl(x*) X2Vg2(x*)= 0 and (ii) A XIQl X2Q2 has at most one negative eigenvalue.
+
+
(b) If Vgl(x*) = aVgz(x*) # 0 for a A2 2 0 such that (i) holds and (iii) A
> 0, then there exist X1
> 0,
+ XIQl + X2Q2 is positive semidefinite.
To get some necessary and sufficient conditions for global optimality of problem (4.7), one has to make some more assumptions. In HiriartUrruty (2001), the following special case of problem (4.7) is considered:
where f is convex nonlinear, gl, g2 are convex and there is xO E Rn such that gl(xO)< 0, g2(x0) < 0 (Slater's condition). Based on Condition (4.6), necessary and sufficient conditions for global optimality are obtained for problem (4.9). These results are extensions of Theorem 4.3.
THEOREM 4.7 (CF. HIRIART-URRUTY, 2001) (a) A point x* with gl(x*) = g2(x*) = 0 is a global optimal solution of problem (4.9) if and only i f there exist XI 0, X2 0 such that
>
>
'dd E Iiit T(C,x*), where C is the convex feasible set of problem (4.9) and T ( C , x * ) the tangent cone to C at x* defined by
(b) A point x* with gl(x*) = 0, g2(x*)< 0 is a global optimal solution of problem (4.9) i f and only if there exist X1 0 such that
>
Ax*
+ b = Xi(Qlx.* + qi), b'd
E Int T ( C ,x*),
4 General Quadratic Programming
113
where
Another special case of problem (4.7) is considered in Stern and Wolkowicz (1995). It is the problem min f (x) = (Ax, x) - 2(b, x) s.t. - oo 5 p 5 (Qx, x) 5 a 5 +oo,
(4.10)
where A and Q are symmetric matrices.
THEOREM 4.8 ( C F . STERNAND WOLKOWICZ, 1995) Let x* be a feasible point of problem (4.10) and assume that the following 'constraint qualification' holds at x*: Qx* = 0
implies ,8 < 0
< a.
T h e n x* is a global optimal solution if and only if there exists X E R such that (A - XQ)x* = b, (A - XQ) is positive semidefinite, X(/3 - (Qx*,x*)) 0 2 ~ ( ( Q x *x*) , - a).
>
2.2
Duality
General quadratic programming is a rare class of nonconvex optimization problems in which one can construct primal-dual problem pairs without any duality gap. A typical example for this is problem (4.10). For XI 2 0, X2 1 0 define the Lagrangian
of problem (4.10) and the dual function
Then the Langrangian dual problem of (4.10) is defined as the problem
Assume that THEOREM 4.9 (CF. STERNAND WOLKOWICZ, 1995) problem (4.10) has a n optimal solution. T h e n strong duality holds for the problem pair (4.10)-(4.11), i.e., f * = d*, where f * and d* denote the optimal values o f problem (4.10) and problem (4.11), respectively.
114
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Two interesting special cases of problem (4.10) are respectively the problems with Q = I (the unit matrix), ,b' < 0 < a and Q = I, ,b' = a > 0. Notice that the problem of minimizing a quadratic function over an ellipsoid, the constrained eigenvalue problem and the quadratically constrained least squares problem can be converted into these special cases (cf. Pham and Le Thi, 1995; Flippo and Jansen, 1996). There is another way to construct a dual problem of (4.10). For XI 5 0, X2 5 0 such that the matrix (A - XIQ X2Q) is regular, define the quadratic dual function
+
Then the optimization problem SUP h(X1, X2) s.t.X1 1 0, A:! 5 0, (A - XIQ X2Q) is positive definite.
+
(4.12)
can be considered as a dual of problem (4.10), and we have the following.
Assume that THEOREM 4.10 (CF. STERNAND WOLKOWICZ, 1995) problem (4.10) has an optimal solution and there is X E IR such that the matrix A - XQ is posztive definite. Then strong duality holds for the problem pair (4.10)-(4.12), i.e., f* = h*, where f* and h* denote the optimal values of problem (4.10) and problem (4.12), respectively.
3.
Solution met hods
The general quadratic programming problem is NP-hard (cf. Sahni, 1974; Pardalos and Vavasis, 1991). In this section, we present some main solution methods for the global optimization of this NP-hard problem. In general, these methods are developed based on three basic concepts which are successfully used in global optimization. We describe these concepts briefly before presenting different techniques for the realization of them in general quadratic programming. For details of three basic concepts, see, e.g., Horst and Tuy (1996), Horst and Pardalos (1995), and Horst et al. (2000, 1991). It is worth noting that most techniques to be presented here can be applied to integer and mixed integer quadratic programming problems (which do not belong to the subject of this overview).
3.1
Basic concepts
To establish this concept, we conOuter approximation (OA). sider the problem of minimizing a linear function ( c ,x) over a closed
4
General Quadratic Programming
115
subset F C W1. This problem caAn be replaced by the problem of Unding an extreme optimal solution of the problem min{(c, x): x G F } , where F denotes the convex hüll of F . Let C\ be any closed convex set containing F and assume that x1 is an optimal solution of problem min{(c, x): x G C\], Then x1 is also an optimal solution of the original problem whenever x1 G F. The basic idea of the outer approximation concept is to construct iteratively a sequence of convex subsets {C/e}, k = 1,2,... such that C\ D C2 D • • • D F and the corresponding sequence {xk} such that for each /c, xk is an optimal solution of the relaxed problem min{(c, x): x G Ck}> This process is performed until finding xk G F. An OA procedure is convergent if it holds that xk —> x* G F for k —> +00. Branch and bound scheme (BB). The BB scheme is developed for the global optimization of problem /* = min{/(x): x G F} with / being a continuous function and F a compact subset of Mn. It begins with a convex compact set C\ D F and proceeds as follows. Compute a lower bound JJL\ and an upper bound 71 for the optimal value of the problem min{/(x) : x G CiDF}. (71 = f{xl) if some feasible solution x1 G F is found, otherwise, 71 = +00). At Iteration k > 1, if +00 > fik > jk o r jj,k z=z 4-00, then stop, (in the first case, xk with f(xk) = 7^ is an optimal solution, in the second case, the underlying problem has no solution). Otherwise, divide Ck into finitely many convex sets Ckx,..., Ckr satisfying Ui=i Cki — Ck andC/c- D C^. = 0 for i =fi j , (the sets Ck and C/- are called 4partition sets'). Compute for each partition set a lower bound andan upper bound. Update the lower bound by choosing the minimum of lower bounds according to all existing partition sets, and Update the upper bound by using feasible points found so far. Delete all partition sets such that the corresponding lower bounds are bigger than or equal to the actual upper bound. If not all partition sets are deleted, let Ck+i be a partition set with the minimum lower bound, and go to Iteration k + 1. A BB algorithm is convergent if it holds that 7^ \ /* and/or jjik / * / * for k —> + 0 0 .
Combination of B B and OA. In many situations, the use of the BB scheme in combination with an OA in the bounding procedure can lead to efficient algorithms. Such a combination is called branch and cut algorithm, if an OA procedure using convex polyhedral subsets Ck Vfc > 1 is applied.
116
3.2
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Reformulation-linearization techniques
Consider quadratic programming problems of the form min f (x) = (c, x) s.t. gi(x) 5 0, i = 1 , . . . , I (ai,x) - bi 5 0,
(4.13)
i = 1, ... , m ,
where c E Rn, ai E Rn, bi E R 'di = 1,.. . ,m, and for each i = 1 , . . . ,I, the quadratic function gi is given by
with di, q;, Q i k , Qil being given real numbers for all i, k , I . It is assumed that the polyhedral set X = {x E Rn: (ai, x) - bi 5 0, i = 1 , . . . ,m) is bounded and contained in Rn+= {x E R n : x 2 0). The first linear relaxation of problem (4.13) is performed as follows. For each quadratic function of the form
define additional variables v k = x k2 , k = 1 , ...,n, and wkl=xkxl, k = 1 , ..., n - 1 ; 1 = k + 1 ,
..., n.
From (4.15), one obtains the following linear function in variables x, v, w: n
n
n-1
n
The linear program (in variables x, v and w) min f (x) = (c, x) s.t. [gi(x)]&O, i = 1 , [(bi - (ai,x))(bj- (aj,x))le 2 0,
...,I
(4.17)
V1 5 i 5 j 5 m
is then a linear relaxation of (4.13) in the following sense (cf. Sherali and Tuncbilek, 1995; Audet et al., 2000): Let f * and f be the optimal values of problems (4.13) and (4.17), respectively, and let ( 3 ,v, G ) be an optimal solution of (4.17). Then
4 General Quadratic Programming
117
(a) f * 2 f and (b) if @k = 3; V k = 1 , . . . ,n, zEkl = % k 3 l V k = 1 , . . . ,n 1,. . . ,n , then Z is an optimal solution of (4.13).
-
1; 1 = k
+
Geometrically, the convex hull of the (nonconvex) feasible set of problem (4.13) is relaxed by the projection of the polyhedral feasible set of problem (4.17) on Rn. As well-known, this projection is polyhedral. In the case that the condition in (b) is not fulfilled, i.e., either flk # 3; for at least one index k or zEkl # 3kZl for at least one index pair ( k ,l), a family of linear inequalities have to be added to problem (4.17) to cut zE) off from the feasible set of (4.17) without cutting off the point (3,@, any feasible point of (4.13). To this purpose, several kinds of cuts are discussed in connection with branch and bound procedures. Resulting branch and cut algorithms can be found, e.g., in Al-Khayyal and Falk (1983) and Audet et al. (2000).
3.3
Lift-and-project techniques
The first ideas of lift-and-project techniques were proposed by Sherali and Adams (1990) and Lovkz and Schrijver (1991) for zero-one optimization. ~ h e s basic e ideas can be applied to programming as follows. The quadratic programming problem to be considered is given in the form min f (x) = (c, x)
where C is a compact convex subset of Rn, c E Rn, and each function gi is given by (4.19) gi(x) = ( Q ~ x2) , 2(qi, X) di
+
+
with Qi being n x n symmetric matrix, qi E Rn, and di E R. To each vector x = ( x l , . . . ,x , ) ~ E Rn, the symmetric matrix X = xxT E Rnxn with elements Xij = xixj (i, j = 1 , . . . , n ) is assigned. Let Snbe the set of n x n symmetric matrices. Then each quadratic function
on Rn is lifted to a linear function on Rn x Sn defined by
where (Q, X ) = X Esn.
CYz1QijXij stands for the inner product
of Q,
118
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Thus, the set
can be approximated by the projection of the set {(x, X ) E Rn x S n : (Q, X )
+ 2(q,x) + d I0)
on Rn. By this way, the feasible set of problem (4.18) is approximated by the set {x E Rn: (Qi,X )
+ 2(qi,x j + ci'i 5 0 for some X E Sn, i = 1 , .x
C } , (4.20)
and problem (4.18) is then relaxed by the problem min f (x) = (c, x) s.t. ( Q i , ~ ) + 2 ( q i , x ) + . d i < 0 , i = l , ...,m XEC, X E S n .
(4.21)
Next, notice that for each x E Rn, the matrix
is positive semidefinite. 'Therefore, problem (4.18) can also be relaxed by the problem min f (x) = (c, x) s.t. ( Q ~ , X ) + ~ ( ~ ~ , X ) +i~=~l ,<..., O ,m xEC, X € S n
( $)
(4.22)
is positive semidefinite.
If the set C is represented by a finite set of linear inequalities, then problem (4.21) is a linear program (LP), which actually corresponds to problem (4.17). Let the set C be represented by a finite set of linear matrix inequalities. Then problem (4.22) is a semidefinite programming problem (SDP). To improve LP/SDP relaxations of problem (4.18) within a OA procedure, convex quadratic inequality constraints have to be added to problem (4.18) without cutting off any feasible point of it. Different classes of such additional quadratic inequality constraints are proposed in Fujie and Kojima (1997) and Kojima and Tunqel (2000a,b). Aspects of implementation and parallel computation techniques are given in Takeda et al. (2000, 2002).
4 General Quadratic Programming
3.4
119
d.c. decomposition and convex envelope 'techniques
3.4.1 d.c. cecomposition. The idea of representing a general quadratic function as the difference of two convex quadratic functions is quite simple: for each symmetric matrix Q E EXnxn, let p(Q) be the spectral radius (the maximal eigenvalue) of Q, then Q can be rewritten as the difference of two positive semidefinite matrices A, B with A = Q+XI, B = X I , where X 2 p(Q) and I is the unit matrix in Sn. Consider now the quadratic programming problem (4.18). Each quadratic function gi(x) can be represented as the difference of two convex functions as follows. , 2(qi,X) di = ai (x) - bi (x) with gi (z) = ( Q ~ Z X)
+
ai(x) = ((Q'
+
+ XiI)x, + 2(qi, + di, X)
bi (x) = Xi (x, x), Xi
X)
and
2 P(Q~).
Thus, problem ( (4.18) is rewritten in the form min f (x) = (c, x) s.t.ai(x)-.bi(x)<0, i = l , ..., m x E C. Let
Then problem (4.24) can be rewritten as a program i11 EXnf
l:
min f (x) = (c, x)
A function is called d.c. if it can be represented as a difference of two convex functions. Each of such representations is called a d.c. decomposition of this function. Problems of the form (4.24), where all functions ai(x), bi(x) are convex, are called d.c. programming problems, and problem (4.25) is called canonical d.c. program. Theoretical properties and solution methods for these problems can be found, e.g., in Horst and Tuy (1996), Horst et al. (2000), Le Thi and Pham (1997, 1998), Phong et al. (1995), and Horst and Thoai (1999).
120
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
3.4.2 Convex envelope techniques. The concept of convex envelopes of nonconvex functions is a basic tool in theory and algorithms of global optimization, see e.g., Falk and Hoffman (1976), Horst and Tuy (1996), Horst et al. (2000). Let C C Rn be nonempty convex, and let f : C + R be a lower semicontinuous function on C. The function
c
R, x H (PC, (x) := sup{h(x) : h : C --t R convex, h 5 f on C) 'PC,fZ
-+
is said to be the convex envelope of f over C. Convex relaxations of quadratic programming problems with inequality constraints can be obtained by replacing all nonconvex quadratic functions by their convex envelopes. Linear relaxations are then simply obtained from convex relaxations. To construct convex envelopes of quadratic functions, two kinds of convex sets, simplices and rectangles, are chosen for C. Relaxation techniques using convex enveiopes in connection with branch and bound procedures for general quadratic problems can be found, e.g., in Pardalos and Rosen (1987), Al-Khayyal et al. (1995), Horst and Thoai (1996), Horst and Tuy (1996), Raber (1998), Horst et al. (2000), and Locatelli and Raber (2002).
3.5
Bilinear programming techniques
The problem of minimizing a concave quadratic function f(x)=(Qx,x)+(q,x),
with
Q€Rnxn,
qeRn
over a polyhedral set X E Rn+ can be reduced to an equivalent bilinear programming problem
in a sense that from an optimal solution (x*,y*) of the latter problem one has two optimal solutions x*, y* of the former, and from an optimal solution x* of the former one obtains an optimal solution (x*,x*) of the latter (cf. Konno, 1976). In general, quadratic programming problems of the form min f (x) = (&Ox,x)
+ (qO,
2)
4 General Quadratic Programming
121
where Qi (i = 0 , . . . ,I)are symmetric n x n matrices, qi (i = 0 , . . . , I ) , ai (i = 1,. . . ,m) vectors of Rn, di (i = 1, . . . ,I),bi (i = 1,. . . ,m) real numbers, and C a polyhedral set with simple structure (e.g., RT, simplex or rectangle) can be reduced to one of the following equivalent bilinear programming problems: min(QOx,Y)
+ (qO,4
s.t. ( Q ~ x , ~ ) + ( ~ ~5 ,0 x, ) i+= ~1,..., ~ I ( a 2 , x ) - b i < O , i = l , ..., m, xi - yi = 0, i = l , . . . , n. XEC, y E C
(4.27)
and
x E C, y i E ~ i , i = o ,..., I, where, for each i = 0 , . . . ,I,
<
..., n} with !j = min{(&;,x): (ai,x) - bi < 0 , i = 1 , . . . , m , x E C}, u j = max{(Q$, x) : (ai, x) - bi < 0, i = 1,.. . ,m, x E C} and s i = { y ~ R n : 4 y
Q; is the j t h row of the matrix Qi. Solutions methods for bilinear programming problems of the types (4.27) and (4.28) can be found e.g., in Al-Khayyal and Falk (1983), Floudas and Visweswaran (l99O), Visweswaran and Floudas (1990), Floudas and Visweswaran (1995), Al-Khayyal et al. (1995) and references given therein. In general, the number of additional variables yi in problem (4.27) is n. However, this number can be reduced based on the structures of the matrices Qi, i = 0 , . . . ,I. In Hansen and Jaumard (1992) and Brimberg et al. (2002) techniques for optimizing the number of additional variables are proposed.
3.6
Duality bound techniques
Consider optimization problems given by
122
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
where C is a nonempty convex subset of Rn, 11,I2are finite index sets and f , and gi (i E II U 12) are continuous functions on C . For X E RI111+lr21,x E C define the Lagrangian
of problem (4.29) and on the set
define the dual function d: A
-+
R,
X
H d(X)
= inf
xEC
L ( x , A).
Then the dual of (4.29) is defined as the problem
d* = sup{d(X): X E A).
(4.30)
It is clear that for each X E A it holds d(X) 5 f *. Consequently, we have d* I f * , which means that a lower bound of the optimal value of problem (4.29) can be obtained by solving its dual problem (4.30). In general, a duality gap
has to be expected. For some special cases of general quadratic problems described by (4.29) with C = Rn, dual problems can be formulated equivalently as semidefinite programming problems (cf., e.g., Vandenberghe and Boyd, 1996; Novak, 1999, 2000; Shor, 1998; Shor and Stetsyuk, 2002). If we formulate quadratic problems of the form (4.29) as bilinear programming problems of the form (4.28), then it can be shown that dual problems of the resulting bilinear problems are equivalent to linear programs, (cf., e.g., Ben-Tal et al., 1994; Diir and Horst, 1997; Thoai, 1998, 2000, 2002). In general, duality bounds can be improved by adding to the original quadratic problem some suitable redundant quadratic constraints. Examples to this topic can be found in Shor (1998) and Shor and Stetsyuk (2002).
3.7 Optimizing quadratic functions over special convex sets
There are three interesting special cases of quadratic programming problems which can be formulated as

    min{ f(x) : x ∈ F },                                            (4.31)

where f is a general quadratic function and F is, respectively, a ball, a simplex, or a box in R^n.
Optimizing quadratic functions over a ball. When F is a ball given by

    F = { x ∈ R^n : ||x|| ≤ r },                                    (4.32)

where ||·|| denotes the Euclidean norm and r > 0, problem (4.31) is the most important subproblem used within trust region methods for solving nonlinear programming problems. Most procedures for solving this subproblem have been developed in connection with trust region methods. A semidefinite programming approach for this problem is given in Rendl and Wolkowicz (1997). A d.c. programming approach can be found in Pham and Le Thi (1998).

Standard quadratic programming problem. This is the case where f = x^T Q x, a quadratic form, and F is the standard simplex defined by

    F = { x ∈ R^n : x_1 + ... + x_n = 1, x_j ≥ 0, j = 1, ..., n }.  (4.33)

Applications of (4.33) include the quadratic knapsack problem (cf. Pardalos et al., 1991), the portfolio selection problem, and the maximum clique problem. Theoretical results and solution methods for the standard quadratic programming problem can be found in Pardalos et al. (1991), Bomze (1998, 2002), and Nowak (1999).

Optimizing quadratic functions over a box. The third special problem we want to mention here is the case where F is a box of the form

    F = { x ∈ R^n : t ≤ x ≤ u }                                     (4.34)

with t and u being given vectors. Methods for solving this special problem include the procedure proposed in Bomze and Danninger (1993), branch and bound algorithms (cf., e.g., Pardalos and Rosen, 1987; Hansen et al., 1993; Le Thi and Pham, 1998; Thoai, 2000; Horst et al., 2000), cutting plane techniques (cf., e.g., Yajima and Fujie, 1998), and heuristic procedures of interior point type proposed in Han et al. (1992).
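For the ball-constrained case (4.32), a standard way to solve (4.31) numerically is via the optimality conditions (Q + λI)x = −q, ||x|| ≤ r, λ ≥ 0: after an eigendecomposition, one searches for the multiplier λ through the so-called secular equation. The sketch below illustrates this (it ignores the degenerate "hard case" and is not taken from any of the references above).

    import numpy as np

    def trust_region_subproblem(Q, q, r, tol=1e-10, max_iter=200):
        """Minimize 0.5*x'Qx + q'x subject to ||x|| <= r (illustrative sketch).
        Solves ||(Q + lam*I)^{-1} q|| = r by bisection on lam; the degenerate
        "hard case" (q orthogonal to the eigenspace of the smallest eigenvalue)
        is not treated here."""
        w, V = np.linalg.eigh(Q)               # Q = V diag(w) V'
        qt = V.T @ q

        def x_of(lam):
            return -V @ (qt / (w + lam))

        if w[0] > 0:                           # try an interior (unconstrained) minimizer
            x0 = x_of(0.0)
            if np.linalg.norm(x0) <= r:
                return x0

        # Boundary solution: ||x(lam)|| is decreasing in lam > max(0, -w_min).
        lo = max(0.0, -w[0]) + 1e-12
        hi = lo + 1.0
        while np.linalg.norm(x_of(hi)) > r:
            hi *= 2.0
        for _ in range(max_iter):
            lam = 0.5 * (lo + hi)
            if np.linalg.norm(x_of(lam)) > r:
                lo = lam
            else:
                hi = lam
            if hi - lo < tol:
                break
        return x_of(hi)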
References

Al-Khayyal, F.A. and Falk, J.E. (1983). Jointly constrained biconvex programming. Mathematics of Operations Research, 8:273-286.
Al-Khayyal, F.A., Larsen, C., and van Voorhis, T. (1995). A relaxation method for nonconvex quadratically constrained quadratic programs. Journal of Global Optimization, 6:215-230.
Audet, C., Hansen, P., Jaumard, B., and Savard, G. (2000). A branch and cut algorithm for nonconvex quadratically constrained quadratic programming. Mathematical Programming, A87:131-152.
Bazaraa, M.S., Sherali, H.D., and Shetty, C.M. (1993). Nonlinear Programming: Theory and Algorithms. 2nd edition, John Wiley & Sons, Inc.
Ben-Tal, A., Eiger, G., and Gershovitz, V. (1994). Global minimization by reducing the duality gap. Mathematical Programming, 63:193-212.
Bomze, I.M. (1998). On standard quadratic optimization problems. Journal of Global Optimization, 13:369-387.
Bomze, I.M. (2002). Branch and bound approaches to standard quadratic optimization problems. Journal of Global Optimization, 22:17-37.
Bomze, I.M. and Danninger, G. (1993). A global optimization algorithm for concave quadratic problems. SIAM Journal on Optimization, 3:826-842.
Brimberg, J., Hansen, P., and Mladenović, N. (2002). A note on reduction of quadratic and bilinear programs with equality constraints. Journal of Global Optimization, 22:39-47.
Dür, M. and Horst, R. (1997). Lagrange duality and partitioning techniques in nonconvex global optimization. Journal of Optimization Theory and Applications, 95:347-369.
Falk, J.E. and Hoffman, K.L. (1976). A successive underestimation method for concave minimization problems. Mathematics of Operations Research, 1:251-259.
Flippo, O.E. and Jansen, B. (1996). Duality and sensitivity in nonconvex quadratic optimization over an ellipsoid. European Journal of Operational Research, 94:167-178.
Floudas, C.A., Aggarwal, A., and Ciric, A.R. (1989). Global optimum search for nonconvex NLP and MINLP problems. Computers and Chemical Engineering, 13:1117-1132.
Floudas, C.A. and Visweswaran, V. (1990). A global optimization algorithm (GOP) for certain classes of nonconvex NLPs: I. Theory. Computers and Chemical Engineering, 14:1397-1417.
Floudas, C.A. and Visweswaran, V. (1995). Quadratic optimization. In: R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization, pp. 217-269. Kluwer Academic Publishers.
Fujie, T. and Kojima, M. (1997). Semidefinite programming relaxation for nonconvex quadratic programs. Journal of Global Optimization, 10:367-380.
Geoffrion, A.M. (1971). Duality in nonlinear programming: A simplified application-oriented development. SIAM Review, 13:1-37.
Han, C.G., Pardalos, P.M., and Ye, Y. (1992). On the solution of indefinite quadratic problems using an interior point algorithm. Informatica, 3:474-496.
Hansen, P. and Jaumard, B. (1992). Reduction of indefinite quadratic programs to bilinear programs. Journal of Global Optimization, 2:41-60.
Hansen, P., Jaumard, B., Ruiz, M., and Xiong, J. (1993). Global optimization of indefinite quadratic functions subject to box constraints. Naval Research Logistics, 40:373-392.
Heinkenschloss, M. (1994). On the solution of a two ball trust region subproblem. Mathematical Programming, 64:249-276.
Hiriart-Urruty, J.-B. (1989). From convex to nonconvex optimization. In: Nonsmooth Optimization and Related Topics, Plenum, New York, pp. 217-268.
Hiriart-Urruty, J.-B. (1998). Conditions for global optimization. II. Journal of Global Optimization, 13:349-367.
Hiriart-Urruty, J.-B. (2001). Global optimality conditions in maximizing a convex quadratic function under convex quadratic constraints. Journal of Global Optimization, 21:445-455.
Horst, R. and Pardalos, P.M. (eds.). (1995). Handbook of Global Optimization. Kluwer Academic Publishers, Dordrecht.
Horst, R., Pardalos, P.M., and Thoai, N.V. (2000). Introduction to Global Optimization, 2nd edition. Kluwer, Dordrecht, The Netherlands.
Horst, R. and Thoai, N.V. (1996). A new algorithm for solving the general quadratic programming problem. Computational Optimization and Applications, 5:39-49.
Horst, R. and Thoai, N.V. (1999). DC programming: Overview. Journal of Optimization Theory and Applications, 103:1-43.
Horst, R., Thoai, N.V., and Benson, H.P. (1991). Concave minimization via conical partitions and polyhedral outer approximation. Mathematical Programming, 50:259-274.
Horst, R. and Tuy, H. (1996). Global Optimization: Deterministic Approaches. 3rd edition. Springer, Berlin.
Kojima, M. and Tunçel, L. (2000). Cones of matrices and successive convex relaxations of nonconvex sets. SIAM Journal on Optimization, 10:750-778.
Kojima, M. and Tunçel, L. (2000). Discretization and localization in successive convex relaxation methods for nonconvex quadratic optimization. Mathematical Programming, A89:79-111.
Konno, H. (1976). Maximizing a convex quadratic function subject to linear constraints. Mathematical Programming, 11:117-127.
Le Thi, H.A. and Pham, D.T. (1997). Solving a class of linearly constrained indefinite quadratic problems by d.c. algorithm. Journal of Global Optimization, 11:253-285.
Le Thi, H.A. and Pham, D.T. (1998). A branch and bound method via d.c. optimization algorithms and ellipsoidal technique for box constrained nonconvex quadratic problems. Journal of Global Optimization, 13:171-206.
Locatelli, M. and Raber, U. (2002). Packing equal circles into a square: A deterministic global optimization approach. Discrete Applied Mathematics, 122:139-166.
Lovász, L. and Schrijver, A. (1991). Cones of matrices and set functions and 0-1 optimization. SIAM Journal on Optimization, 1:166-190.
Mangasarian, O.L. (1979). Nonlinear Programming. Robert E. Krieger Publishing Company, Huntington, New York.
Martinez, J.M. (1994). Local minimizers of quadratic functions on Euclidean balls and spheres. SIAM Journal on Optimization, 4:159-176.
Moré, J.J. (1993). Generalizations of the trust region problem. Optimization Methods & Software, 2:189-209.
Nowak, I. (1999). A new semidefinite bound for indefinite quadratic forms over a simplex. Journal of Global Optimization, 14:357-364.
Nowak, I. (2000). Dual bounds and optimality cuts for all-quadratic programs with convex constraints. Journal of Global Optimization, 18:337-356.
Pardalos, P.M. (1991). Global optimization algorithms for linearly constrained indefinite quadratic problems. Computers & Mathematics with Applications, 21:87-97.
Pardalos, P.M. and Rosen, J.B. (1987). Constrained Global Optimization: Algorithms and Applications. Lecture Notes in Computer Science, vol. 268. Springer, Berlin.
Pardalos, P.M. and Vavasis, S.A. (1991). Quadratic programming with one negative eigenvalue is NP-hard. Journal of Global Optimization, 1:15-22.
Pardalos, P.M., Ye, Y., and Han, C.G. (1991). Algorithms for the solution of quadratic knapsack problems. Linear Algebra and its Applications, 152:69-91.
Peng, J.M. and Yuan, Y. (1997). Optimality conditions for the minimization of a quadratic with two quadratic constraints. SIAM Journal on Optimization, 7:579-594.
Pham Dinh Tao and Le Thi Hoai An. (1995). Lagrangian stability and global optimality in nonconvex quadratic minimization over Euclidean balls and spheres. Journal of Convex Analysis, 2:263-276.
Pham Dinh Tao and Le Thi Hoai An. (1998). A d.c. optimization algorithm for solving the trust region subproblem. SIAM Journal on Optimization, 8:1-30.
Phong, T.Q., Pham Dinh Tao, and Le Thi Hoai An. (1995). A method for solving d.c. programming problems. Application to fuel mixture nonconvex optimization problem. Journal of Global Optimization, 6:87-105.
Raber, U. (1998). A simplicial branch and bound method for solving nonconvex all-quadratic programs. Journal of Global Optimization, 13:417-432.
Rendl, F. and Wolkowicz, H. (1997). A semidefinite framework for trust region subproblems with applications to large scale minimization. Mathematical Programming, B77:273-299.
Sahni, S. (1974). Computationally related problems. SIAM Journal on Computing, 3:262-279.
Sherali, H.D. and Adams, W.P. (1990). A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics, 3:411-430.
Sherali, H.D. and Adams, W.P. (1999). A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. Kluwer Academic Publishers.
Sherali, H.D. and Tuncbilek, C.H. (1995). A reformulation-convexification approach for solving nonconvex quadratic programming problems. Journal of Global Optimization, 7:1-31.
Shor, N.Z. (1998). Nondifferentiable Optimization and Polynomial Problems. Kluwer Academic Publishers.
Shor, N.Z. and Stetsyuk, P.I. (2002). Lagrangian bounds in multiextremal polynomial and discrete optimization problems. Journal of Global Optimization, 23:1-41.
Stern, R. and Wolkowicz, H. (1995). Indefinite trust region subproblems and nonsymmetric eigenvalue perturbation. SIAM Journal on Optimization, 5:286-313.
Takeda, A., Dai, Y., Fukuda, M., and Kojima, M. (2000). Towards the implementation of successive convex relaxation method for nonconvex quadratic optimization problems. In: P. Pardalos (ed.), Approximation and Complexity in Numerical Optimization: Continuous and Discrete Problems. Kluwer Academic Publishers.
Takeda, A., Fujisawa, K., Fukaya, Y., and Kojima, M. (2002). Parallel implementation of successive convex relaxation methods for quadratic optimization problems. Journal of Global Optimization, 24:237-260.
Thoai, N.V. (1998). Global optimization techniques for solving the general quadratic integer programming problem. Computational Optimization and Applications, 10:149-163.
Thoai, N.V. (2000). Duality bound method for the general quadratic programming problem with quadratic constraints. Journal of Optimization Theory and Applications, 107:331-354.
Thoai, N.V. (2002). Convergence and application of a decomposition method using duality bounds for nonconvex global optimization. Journal of Optimization Theory and Applications, 113:165-193.
Vandenberghe, L. and Boyd, S. (1996). Semidefinite programming. SIAM Review, 38:49-95.
Visweswaran, V. and Floudas, C.A. (1990). A global optimization algorithm (GOP) for certain classes of nonconvex NLPs: II. Application of theory and test problems. Computers and Chemical Engineering, 14:1419-1434.
Yajima, Y. and Fujie, T. (1998). A polyhedral approach for nonconvex quadratic programming problems with box constraints. Journal of Global Optimization, 13:151-170.
Chapter 5
ON SOLVING POLYNOMIAL, FACTORABLE, AND BLACK-BOX OPTIMIZATION PROBLEMS USING THE RLT METHODOLOGY

Hanif D. Sherali
Jitamitra Desai

Abstract
This paper provides an expository discussion on using the Reformulation-Linearization/Convexification (RLT) technique as a unifying approach for solving nonconvex polynomial, factorable, and certain black-box optimization problems. The principal RLT construct applies a Reformulation phase to add valid inequalities, including polynomial and semidefinite cuts, and a Linearization phase to derive higher dimensional tight linear programming relaxations. These relaxations are embedded within a suitable branch-and-bound scheme that converges to a global optimum for polynomial or factorable programs, and results in a pseudo-global optimization method that derives approximate, near-optimal solutions for black-box optimization problems. We present the basic underlying theory, and illustrate the application of this theory to solve various problems.

1. Introduction
This paper presents an expository discussion on designing algorithms based on the Reformulation-Linearization/Convexification Technique (RLT) for solving a broad class of nonconvex optimization problems. Although the basic concept of the RLT methodology is rooted in its ability to solve polynomial programming problems involving a polynomial objective function and constraints to global optimality, this methodology, coupled with supporting approximation schemes, can be extended to solve more general classes of nonconvex optimization problems. In particular, discrete optimization problems such as mixed-integer 0-1
programming problems can be modeled as polynomial programs by representing the binary restriction on any variable x_j as a polynomial term x_j(1 − x_j) = 0, where 0 ≤ x_j ≤ 1. Furthermore, if we consider a general linear bounded discrete optimization problem, where x_j can take on several general discrete values (which may or may not even be integral) in a set S_j = {θ_jk, k = 1, ..., n_j}, then we can represent this via an equivalent polynomial constraint, ∏_{k=1}^{n_j} (x_j − θ_jk) = 0. Such a construct enables the RLT methodology developed for polynomial programming problems to be applied to a wide range of discrete optimization problems as well. While specialized developments of the RLT for discrete problems that stem from such reformulations appear in Sherali and Adams (1990, 1994, 1999) and Adams and Sherali (1986), our focus in this paper shall be on solving continuous nonconvex programming problems, including polynomial programs, more general factorable programs, and certain black-box optimization problems. The fundamental RLT approach that lies at the heart of our discussion operates in two phases. Given a nonconvex optimization problem, in the Reformulation phase, we replace nonpolynomial terms appearing in the problem with suitable polynomial approximating functions, and additionally, certain classes of implied polynomial constraints are appended to the problem. Subsequently, in the Linearization/Convexification phase, the resulting polynomial program is linearized by defining a new variable for each distinct variable-product term, and substituting these new variables into the problem. Sometimes, under special circumstances, certain classes of convex constraints are retained to relate the new and original variables in the problem. The resulting higher-dimensional representation yields a linear (or convex) representation of the original nonconvex program. Indeed, employing higher-order polynomial constraints leads to generating a hierarchy of tighter relaxations. A noteworthy feature of the RLT is that the lowest-level relaxation in this hierarchy, which generates polynomials of degree no greater than those of the polynomial terms originally present in the problem during the reformulation phase (or of degree two in the case of linear constraints), has been demonstrated to generate very tight relaxations. This has enabled the solution of computationally intractable nonconvex optimization problems to near optimality, often via a single or few linear programs (LPs). Recent advances in LP technology, capable of solving large-sized linear programs fairly efficiently, and the widespread availability of related commercial software, have provided the facility to solve practical instances of nonconvex optimization problems to a sufficient degree of accuracy with manageable computational effort.
The remainder of this paper is organized as follows. Section 2 begins with a discussion on the essence of the RLT methodology for solving polynomial programs using suitably generated bound-factor products. Certain key insights are identified that prompt partitioning strategies for inducing convergence to a global optimum in a branch-and-bound framework. The basic construct of the RLT methodology for polynomial programs is then generalized for a wider class of factorable programming problems in Section 3, and available computational experience is summarized. Section 4 then demonstrates the applicability of the RLT to solve various practical problems encountered in diverse areas. Beginning with a general survey in Section 4 of nonconvex problems that can be solved using the RLT, Sections 4.1 through 4.6 delve into some specific applications such as the pipe network design problem, pooling problems, problems encountered in control systems, hard and fuzzy clustering problems, risk management problems, and energy allocation problems in wireless communication systems. The use of semidefinite programming techniques to enhance the existing RLT methodology is addressed in Section 5. Specifically, certain cut generation schemes that could potentially tighten the linear programming relaxations obtained via the RLT are discussed, along with a practical application and some accompanying results. A new class of pseudo-global optimization methods predicated on the RLT procedure is delineated in Section 6 for approximately solving certain black-box optimization problems, along with a practical case study dealing with the design of containerships. Finally, Section 7 concludes the paper and indicates avenues for further research.
2. Polynomial programming problems
Ever since the advent of the simplex algorithm, linear programming has been extensively used with great success in many diverse fields. When the linear functions in these problems are replaced by more general polynomial functions, a formidable class of nonconvex optimization problems emerges, which can be mathematically formulated as follows:

    PP(Ω) :  Minimize { φ_0(x) : x ∈ Z ∩ Ω },

where

    Z = { x : φ_r(x) ≥ β_r, r = 1, ..., R_1;  φ_r(x) = β_r, r = R_1 + 1, ..., R },
    Ω = { x : 0 ≤ l_j ≤ x_j ≤ u_j < ∞, j = 1, ..., n },

and where

    φ_r(x) = Σ_{t ∈ T_r} α_rt ∏_{j ∈ J_rt} x_j,   r = 0, ..., R.       (5.1)

Here, T_r is an index set for the terms defining the polynomial function φ_r(·), and the α_rt are real coefficients of the multinomial terms ∏_{j ∈ J_rt} x_j, t ∈ T_r, r = 0, ..., R. Note that a repetition of indices is allowed within J_rt. For example, if J_rt = {1, 2, 2, 3}, then the corresponding multinomial term is x_1 x_2^2 x_3. Denote N = {1, ..., n}, and define N̄ = {N, ..., N} to be composed of δ replicates of N, where δ is the maximum degree of the polynomial terms appearing in PP(Ω). Then each J_rt ⊆ N̄, with 1 ≤ |J_rt| ≤ δ, for t ∈ T_r, r = 0, 1, ..., R.
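To make the index-set notation concrete, the following small helper (hypothetical, for illustration only) evaluates a polynomial function φ_r from its coefficients α_rt and the multisets J_rt, in which repeated indices encode powers:

    import numpy as np

    def multinomial_term(x, J_rt):
        """prod_{j in J_rt} x_j, with J_rt a list of 1-based indices in which
        repetition encodes powers: [1, 2, 2, 3] stands for x1 * x2**2 * x3."""
        return np.prod([x[j - 1] for j in J_rt])

    def phi_r(x, alpha, J):
        """phi_r(x) = sum_t alpha[t] * prod_{j in J[t]} x_j, as in (5.1)."""
        return sum(a * multinomial_term(x, J_t) for a, J_t in zip(alpha, J))

    # Hypothetical example: phi(x) = 3*x1*x2**2*x3 - 2*x1*x3 evaluated at (1, 2, 0.5).
    print(phi_r([1.0, 2.0, 0.5], alpha=[3.0, -2.0], J=[[1, 2, 2, 3], [1, 3]]))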
Polynomial programs have received a considerable amount of attention in the literature, and several global optimization solution approaches have been developed with varying degrees of success. Belousov and Klatte (2002) used the result that an orthogonal projection along a recession direction of a convex polynomial set is a convex polynomial set, and described an extension of the Frank-Wolfe theorem to solve this class of problems under the assumption of convexity. An approach for finding a global optimum up to a specified tolerance limit for nonconvex polynomial programs was presented by Li and Chang (1998), making use of the representation of a continuous variable as the sum of a discrete variable and a bounded perturbation, and then linearizing the corresponding terms. Shor (1998) and Floudas and Visweswaran (1990, 1995) demonstrated that polynomial optimization problems can be transformed into quadratic programs that are then amenable to sequential convexification-based algorithmic manipulations, and thereby solved several examples pertaining to chemical engineering applications, pooling and blending problems, and mechanical engineering problems. Sherali and Tuncbilek (1992, 1997) developed an RLT methodology for solving polynomial programming problems more directly, as discussed below, and showed that this yields tighter relaxations than those obtained by applying RLT to a transformed equivalent quadratic program, even if all alternative reformulations of the type used in the aforementioned papers are applied simultaneously. Other techniques (e.g., concave minimization and Lipschitzian optimization), as described in Horst (1990) and Horst and Tuy (1993), are promising alternatives for appropriate specially structured polynomial programs.
2.1 Basic RLT procedure using bound-factor product constraints
The global optimization algorithm of Sherali and Tuncbilek (1992) for solving the general polynomial program using the RLT approach operates as follows. In the reformulation phase, certain nonlinear implied constraints are generated by taking the products of the bounding terms in Ω up to a suitable order, and also possibly those of other defining constraints of the problem. The resulting problem is subsequently linearized
by defining new variables, one for each nonlinear term appearing in the problem. This constitutes the linearization phase. More specifically, given Ω, in order to construct the linear programming bounding problem LP(Ω) using RLT, implied bound-factor product constraints are generated by using distinct products of the bounding factors (x_j − l_j) ≥ 0 and (u_j − x_j) ≥ 0, j ∈ N, taken δ at a time. These constraints can be expressed as follows:

    ∏_{j ∈ J_1} (x_j − l_j)  ∏_{j ∈ J_2} (u_j − x_j) ≥ 0,              (5.2)

where (J_1 ∪ J_2) ⊆ N̄, |J_1 ∪ J_2| = δ. After including the constraints (5.2) in the problem PP(Ω), the substitution

    X_J = ∏_{j ∈ J} x_j,   J ⊆ N̄,  1 ≤ |J| ≤ δ,                        (5.3)

is applied to linearize the resulting problem, where the indices in J are assumed to be sequenced in nondecreasing order, and where X_{{j}} = x_j, ∀ j ∈ N, and X_∅ ≡ 1. Note that each distinct set J produces one distinct X_J variable. This yields the linear program LP(Ω).
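A small sketch of the bound-factor construction in (5.2)-(5.3) for δ = 2 (illustrative only; the variable names, the use of sympy, and the tiny data set are assumptions, not part of the original algorithmic description):

    import itertools
    import sympy as sp

    def rlt_bound_factor_products(x, l, u, delta=2):
        """Degree-delta bound-factor products as in (5.2): every product of
        delta bounding factors (x_j - l_j) or (u_j - x_j), repetition allowed,
        expanded into a polynomial that is nonnegative on the box."""
        factors = [xj - lj for xj, lj in zip(x, l)] + [uj - xj for xj, uj in zip(x, u)]
        return [sp.expand(sp.Mul(*combo))
                for combo in itertools.combinations_with_replacement(factors, delta)]

    # Tiny hypothetical instance with n = 2 and delta = 2.
    x1, x2 = sp.symbols('x1 x2')
    products = rlt_bound_factor_products([x1, x2], l=[0, 0], u=[1, 2], delta=2)

    # Linearization (5.3): one new variable per distinct degree-2 monomial.
    X11, X12, X22 = sp.symbols('X11 X12 X22')
    linear_cuts = [p.subs({x1**2: X11, x1*x2: X12, x2**2: X22}) for p in products]
    # Each entry of linear_cuts, required to be >= 0, is a linear RLT constraint.

Each expanded product is a valid polynomial inequality of degree at most δ, and substituting one X_J variable per distinct monomial turns the whole collection into linear cuts for LP(Ω).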
REMARK 5.1 A tighter linear programming representation can be produced by generating additional implied constraints in the form of polynomials having degree less than or equal to δ. This could be done by taking suitable products of constraint factors comprised of the original structural constraints φ_r(x) − β_r ≥ 0, r = 1, ..., R_1, or lower-order polynomial restrictions implied by them, and/or products of bounding factors with such constraint factors wherever possible. In addition, the equality constraints φ_r(x) = β_r, r = R_1 + 1, ..., R, defining Z can be multiplied by sets of products of variables of the type ∏_{j ∈ J_1} x_j, J_1 ⊆ N̄, so long as the resulting polynomial expression is of degree no greater than δ. In the context of quadratic polynomial programs, Audet et al. (2000, 2004) present various alternative classes of RLT constraints that can be generated in a branch-and-cut framework, prescribing also useful members of each class. While all these RLT constraints can quickly proliferate and encumber the solution of the problem, Sherali and Tuncbilek (1997) describe certain constraint filtering strategies that could be used to retain only those constraints that can potentially contribute to tightening the relaxation, and hence keep the size of the problem manageable. In essence, these filtering strategies examine the signs of the coefficients associated with the linearized terms X_J appearing in the original problem in order to ascertain whether each such term X_J would tend to overestimate or underestimate the nonlinear product ∏_{j ∈ J} x_j that
it represents in the underlying relaxation. Accordingly, the additional RLT constraints that would impose an inequality in the opposite sense are included. For example, viewing all the original constraints as well as the objective function as ≥ inequalities, where the latter is represented by z − φ_0(x) ≥ 0 with z to be minimized, if X_J appears with a positive (negative) coefficient in any of these constraints, then the filtering mechanism retains from among the new RLT constraints those that impose an upper (lower) bounding type of restriction on X_J. In addition, of the constraints filtered out in this process, certain members are resurrected for inclusion in the model in case they provide the aforementioned type of sign-based support for some new RLT variables that collectively appear in at least 50% of the originally retained RLT constraints. We refer the interested reader to Sherali and Tuncbilek (1997) for further details on this topic.

The fundamental results pertaining to the RLT methodology are delineated below.

LEMMA 5.1 v[LP(Ω)] ≤ v[PP(Ω)]. Moreover, if the optimal solution (x*, X*) obtained for LP(Ω) satisfies (5.3) for all J ∈ ∪_{r=0}^{R} ∪_{t∈T_r} {J_rt}, then x* solves Problem PP(Ω).
Proof. See Sherali and Tuncbilek (1992).

Notice that LP(Ω) does not explicitly contain any constraint that can be generated by constructing products of bounding factors taken fewer than δ at a time. Sherali and Tuncbilek (1992) prove that such constraints are actually implied by those already present in LP(Ω). On the other hand, RLT product constraints of order greater than δ could be generated to further tighten the relaxations, however at the expense of an increase in problem size. Additionally, we could include nonlinear convex constraints in the relaxation to further tighten the representation, but at the added expense of solving the resulting nonlinear convex programming problems in order to derive lower bounds. (Because of the relative non-robustness of solving nonlinear programs as opposed to linear programs, caution must be exercised to ensure that the solution to the relaxation is indeed optimal, or at least provides a valid lower bound.) The key result that establishes an important relationship between the original and the newly defined variables X_J, and forms the basis for the partitioning strategy of the branch-and-bound algorithm that induces global convergence, is stated below.
LEMMA 5.2 Let (x̄, X̄) be a feasible solution to LP(Ω). Suppose that x̄_p = l_p or x̄_p = u_p. Then

    X̄_{J ∪ {p}} = x̄_p X̄_J   for all J such that (J ∪ {p}) ⊆ N̄ and |J ∪ {p}| ≤ δ.
Proof. See Sherali and Tuncbilek (1992).

This result indicates that if a particular variable x_p takes on a value equal to either of its bounds in any feasible solution to LP(Ω), then all the RLT variables that contain this variable faithfully reproduce the nonlinear product that they represent in terms of this variable and the corresponding remnant RLT variable. Hence, if all variables turn out to be at their upper or lower bounds in the solution to LP(Ω), then this solution would be optimal to PP(Ω). To solve PP(Ω) to global optimality, LP(Ω) is embedded in a branch-and-bound algorithm to compute lower bounds on the underlying polynomial program. A procedure of this type proposed by Sherali and Tuncbilek (1992, 1995) essentially involves the partitioning of the original set Ω into sub-hyperrectangles, each of which is associated with a node of the branch-and-bound tree. Let Ω' ⊆ Ω be any such partition. Then, LP(Ω') gives a lower bound for the problem PP(Ω'). In particular, if (x̄, X̄) solves LP(Ω') and satisfies (5.3) for all J ∈ ∪_{r=0}^{R} ∪_{t∈T_r} {J_rt}, then by Lemma 5.1, x̄ solves PP(Ω'), and being feasible to PP(Ω), the value v[LP(Ω')] = v[PP(Ω')] provides an upper bound for the problem PP(Ω). Hence, we have a candidate for possibly updating the incumbent solution x* and the corresponding objective function value v* for PP(Ω). (Alternatively, a local search heuristic could be invoked at each node to derive feasible upper-bounding solutions and possibly update incumbent solutions.) In any case, if v[LP(Ω')] ≥ v* − ε (where ε ≥ 0 is some desired optimality tolerance), we can fathom the node associated with Ω'. Hence, at any stage s of the branch-and-bound algorithm, we have a set of non-fathomed or active nodes indexed by q ∈ Q_s, say, each associated with a corresponding partition Ω_q ⊆ Ω. For each such node, we will have computed a lower bound LB_q via the solution of the linear program LP(Ω_q). As a result, the lower bound on the overall problem PP(Ω) at any stage s is given by LB(s) = min{LB_q : q ∈ Q_s}. Whenever the lower bounding solution to any subproblem turns out to be feasible to PP(Ω), we update the incumbent solution x* and the corresponding objective function value v*, as mentioned above. Additionally, if LB_q ≥ v* − ε, we fathom node q. Hence, all the active nodes satisfy LB_q < v* − ε, ∀ q ∈ Q_s, at each stage s. We now select an active node q(s) that yields the least lower bound among the nodes q ∈ Q_s, i.e., for
which LB_{q(s)} = LB(s), and proceed by decomposing the corresponding hyperrectangle Ω_{q(s)} into two sub-hyperrectangles based on a branching variable x_p selected according to the following rule. This partitioning rule is geared towards identifying the variable that contributes the most to the discrepancy between a new RLT variable that contains it and the associated nonlinear product that this RLT variable represents. The motivation is to drive all such discrepancies to zero by creating partitions that would induce the variables to achieve their bounds (see Lemma 5.2). Accordingly, we select a branching variable x_p such that p ∈ argmax_{j∈N} {θ_j}, where

    θ_j = max { | X̄_J − ∏_{k∈J} x̄_k | : J ∋ j, J ⊆ N̄, |J| ≤ δ },  j ∈ N.        (5.5)
THEOREM 5.1 (MAIN CONVERGENCE RESULT) The above algorithm (run with ε = 0, which ensures exact optimality) either terminates finitely with the incumbent solution being optimal to PP(Ω), or else an infinite sequence of stages is generated such that, along any infinite branch of the branch-and-bound tree, any accumulation point of the x-variable part of the sequence of linear programming relaxation solutions generated for the node subproblems solves PP(Ω).

Proof. See Sherali and Tuncbilek (1992).
□
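The overall branch-and-bound scheme described above can be summarized by the following skeleton (a hedged sketch: the three callbacks standing in for LP(Ω'), the feasibility/incumbent test, and the branching rule (5.5) are hypothetical placeholders, not an implementation of the cited algorithm):

    import heapq

    def rlt_branch_and_bound(solve_lp_relaxation, check_feasible, split_box, box, eps=1e-6):
        """solve_lp_relaxation(box) -> (lower_bound, x); check_feasible(x) ->
        objective value if x is feasible to the original problem, else None;
        split_box(box, x) -> two sub-boxes chosen by the branching rule."""
        best_val, best_x = float('inf'), None
        lb0, x0 = solve_lp_relaxation(box)
        heap = [(lb0, 0, box, x0)]                  # active nodes keyed by lower bound
        counter = 1
        while heap:
            lb, _, omega, x = heapq.heappop(heap)   # least-lower-bound node selection
            if lb >= best_val - eps:                # fathom: cannot improve incumbent
                continue
            val = check_feasible(x)
            if val is not None and val < best_val:  # update incumbent
                best_val, best_x = val, x
                if lb >= best_val - eps:
                    continue
            for child in split_box(omega, x):       # branch on max-discrepancy variable
                clb, cx = solve_lp_relaxation(child)
                if clb < best_val - eps:
                    heapq.heappush(heap, (clb, counter, child, cx))
                    counter += 1
        return best_val, best_x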
As alluded to above, Sherali and Tuncbilek (1997) explore an alternative RLT approach to solve polynomial programs, and compare the tightness of the resulting relaxations. In this approach, a transformation is first used to quadrify problem PP(Ω), i.e., to transform PP(Ω) into an equivalent quadratic polynomial program as in Shor (1998) and Floudas and Visweswaran (1990, 1995). Consider the set

    A = { a ∈ Z^n_+ : a_j ≤ s_j, j = 1, ..., n },

where Z^n_+ denotes the set of nonnegative integral n-tuples, and where s_j, j = 1, ..., n, are some specified bounds on the exponents of x_j that appear in the given polynomial program. A new variable R[a] is defined for each a ∈ A. Using these variables R[a], a ∈ A, to represent the multinomial terms x^a = ∏_{j=1}^{n} x_j^{a_j},
an equivalent quadrified polynomial program is defined. In this quadrification process, each polynomial term is successively represented as a product of exactly two R[·] variables. Note that such a representation scheme is not unique. In fact, Sherali and Tuncbilek (1997) prove that applying RLT directly to the original polynomial program provides a tighter representation than applying it to an equivalent quadrified problem, even if this latter problem simultaneously includes all possible quadrifying representations of this type. However, applying RLT to some quadrified problems can yield less computationally burdensome relaxations. Sherali and Tuncbilek (1995, 1997) provide several such insights into the design of suitable RLT strategies that can be gainfully applied to solve challenging problem instances, and report related computational results. Sherali (1998) further extended the RLT to handle polynomial programming problems having rational exponents. A branch-and-bound algorithm was developed, as described above, based on a sequence of relaxations over partitioned subsets of Ω in order to find a global optimal solution. In this context, an additional initial step was introduced at each iteration of the regular RLT process that constructs an approximating polynomial program having integral exponents by using affine convex envelopes for the rational exponent terms. The idea behind this RLT methodology was subsequently further refined and extended by Sherali and Wang (2001) to solve a more general class of nonconvex problems known as factorable programming problems. This is expounded in detail in the following section.
3. Factorable programming problems
Consider the following mathematical formulation of a factorable program.

    FP(Ω) :  Minimize  f_0(x)                                            (5.6)
             subject to  f_i(x) ≤ β_i,   for i = 1, ..., m,
                         x ∈ Ω = { x : 0 ≤ l_j ≤ x_j ≤ u_j < ∞, for j ∈ N = {1, ..., n} },

where

    f_i(x) = Σ_{t∈T_i} α_it f_it(x) = Σ_{t∈T_i} α_it ∏_{j∈J_it} f_itj(x_j),   for i = 0, 1, ..., m.   (5.7)
Here, f_i(x) is a nonconvex factorable function that is stated as a sum of terms α_it f_it(x), indexed by t ∈ T_i. For each term t ∈ T_i, f_it(·) is a product of (twice continuously differentiable) univariate functions f_itj(x_j) of x_j, indexed by j ∈ J_it ⊆ N. The name factorable program
denotes this (latter) property whereby each nonlinear term in the problem can be factored into a product of univariate functions. Obviously, this subsumes the class of polynomial programming problems. While several special cases of factorable programs, such as nonconvex multiplicative programs (Konno and Kuno, 1995) and separable nonconvex programs (Falk and Soland, 1969; Shectman and Sahinidis, 1996), have been discussed in the literature, global optimization algorithms for general problems of the type FP(Ω) have not received much attention. The introduction of this class of nonconvex factorable programs is credited to McCormick (1976), who proposed a first global optimization algorithm based on an inductive convex envelope construction process. Subsequently, Sherali and Wang (2001) devised an RLT-based global optimization procedure for solving FP(Ω) by introducing an additional approximation step in the reformulation phase. This step generates a tight nonconvex lower bounding polynomial relaxation by using suitable polynomial functions to bound the nonpolynomial terms in FP(Ω). Then, in the spirit of Sherali and Tuncbilek (1992), an LP relaxation is constructed for the resulting polynomial program via the RLT methodology. More specifically, the procedure developed for this class of problems is a branch-and-bound algorithm that employs a three-phase process for generating lower bounding relaxations, and accordingly, adopts a suitable partitioning strategy that induces convergence to a global optimal solution. The initial phase of the lower bounding step involves generating a tight, nonconvex polynomial programming representation using the Mean-Value Theorem or Chebyshev interpolation (see Volkov, 1990). Subsequently, over the next two phases, an LP relaxation is constructed for the resulting polynomial program via an RLT process. Thus, there are two levels of relaxations involved at each node of the branch-and-bound tree. The motivation here is to generate tight nonconvex polynomial approximations for the nonpolynomial terms in the problem, and then to rely on the demonstrated ability of the RLT to generate a tight LP relaxation in a higher dimensional space for the resulting polynomial program. Note that the availability of effective and stable LP solvers is an added advantage that lends robustness to this process. A suitable partitioning scheme is coordinated with the lower bounding procedure so that the gaps from the above two levels of approximations approach zero in the limit. The special features of the approximating polynomials also ensure that these polynomials simultaneously converge to the original nonconvex factorable functions in the limit, thereby ensuring convergence to a global optimum.
For these conditions to be satisfied, the problem structure and univariate functions that define the nonlinear factorable terms in the problem are recast to satisfy the following basic properties or assumptions. For each term t ∈ T_i, i ∈ {0, 1, ..., m}, if this term involves a nonpolynomial function f_itj(x_j), then we assume that |J_it| = 1. Note that this can be readily achieved by substituting a new variable y_itj in place of any nonpolynomial factor f_itj(x_j), and then explicitly incorporating the constraint y_itj = f_itj(x_j). Moreover, it is assumed that such a function f_itj(x_j) is twice continuously differentiable on Ω_j = [l_j, u_j], and that there exists a polynomial function g_itj^{Ω_j}(x_j), which might depend on Ω_j, such that for all x_j ∈ Ω_j,

    g_itj^{Ω_j}(x_j) ≤ f_itj(x_j).                                        (5.8)

Observe that by isolating distinct nonpolynomial terms as above, we can conveniently adopt multiple polynomial bounding functions in order to derive potentially tighter relaxations, without generating a combinatorial number of constraints. Furthermore, in each case, for any nested sequence {Ω_j^k} → Ω_j, where k is an index for the elements of the sequence, and where Ω_j^k = [l_j^k, u_j^k] and Ω_j = [l_j, u_j], the corresponding sequence of bounding functions g_itj^{Ω_j^k}(x_j), x_j ∈ Ω_j^k, thus created must satisfy the following property. If l_j < u_j, we must have, as k → ∞,

    g_itj^{Ω_j^k}(x_j) → g_itj^{Ω_j}(x_j),

where g_itj^{Ω_j}(x_j) is a corresponding bounding function for f_itj(x_j) on Ω_j. On the other hand, if l_j = u_j, then for any sequence {x_j^k} → l_j = u_j, where x_j^k ∈ Ω_j^k, ∀ k, we must have, as k → ∞,

    g_itj^{Ω_j^k}(x_j^k) → f_itj(l_j).

For ease in presentation, let us define g_itj^{Ω_j}(x_j) ≡ f_itj(x_j) for the case of polynomial factors f_itj(x_j), for each j ∈ J_it, t ∈ T_i, i ∈ {0, 1, ..., m}. Obviously, this definition satisfies the foregoing assertions. For the case of nonpolynomial univariate functions, Sherali and Wang (2001) describe two methods for constructing such polynomial approximating functions, based on the Mean-Value Theorem and Chebyshev interpolation. In both these methods, the order of the approximating polynomial utilized
is a user-defined feature. Naturally, a higher order polynomial approximation would yield a more accurate representation but would lead to a larger sized relaxation. Generally speaking, if the original problem includes polynomial terms of a certain order, then an approximation of a similar order would prove to be useful, since in any case, the subsequent RLT process would need to be designed for polynomial terms of this order. It is instructive to note here that in some application contexts, the original problem could itself have relatively high-order polynomial terms that render the RLT process untenable due to the resulting size of the generated LP relaxations. In such a case, good, relatively lower-order approximation schemes could be utilized for even the polynomial terms in FP(Ω). Amongst the available methods for approximating high-order polynomials, the notable ones are the Mean-Value Theorem, Chebyshev interpolation, and Newton polynomials (Volkov, 1990). Of recent flavor are techniques that arise from numerical analysis, mainly pseudo-polynomial approximation methods such as spline interpolation, kriging, and other basis function approaches (see Jones, 2001). The main computational requirement in this process of generating bounding polynomial functions is to derive a bound on the highest ordered derivative term used in the approximation for the given univariate function over an interval. The classes of problems for which this method is most well suited are precisely those for which such bounds can be readily computed. Problems of this type include ones in which the nonpolynomial univariate functions are trigonometric, signomial, or various transcendental functions such as exponential or logarithmic functions, among others. Observe that for more complex cases where bounds on such derivative terms for a function f_itj(x_j) are not convenient to derive for each subinterval of Ω_j corresponding to each node of the branch-and-bound tree, it is valid to simply compute this bound at the very beginning for the original interval Ω_j and use it throughout the process. Now, having generated the bounding polynomial functions g_itj^{Ω_j}(x_j) for f_itj(x_j) on Ω_j = [l_j, u_j], for i ∈ {0, 1, ..., m}, t ∈ T_i, j ∈ J_it, let us define

    Φ_it^{Ω}(x) = α_it ∏_{j∈J_it} g_itj^{Ω_j}(x_j),  t ∈ T_i,   and   Φ_i^{Ω}(x) = Σ_{t∈T_i} Φ_it^{Ω}(x),  i = 0, 1, ..., m.

Thus, noting (5.7) and (5.8), we have

    f_i(x) ≥ Σ_{t∈T_i} α_it ∏_{j∈J_it} g_itj^{Ω_j}(x_j) = Σ_{t∈T_i} Φ_it^{Ω}(x) = Φ_i^{Ω}(x),  i = 0, 1, ..., m.
Consequently, we derive the following polynomial programming relaxation of Problem FP(Ω):

    FPR(Ω) :  Minimize { Φ_0^{Ω}(x) : Φ_i^{Ω}(x) ≤ β_i, i = 1, ..., m, x ∈ Ω }.       (5.13)
Note that v[FPR(Ω)] provides a lower bound on v[FP(Ω)]. Having constructed the polynomial lower bounding problem FPR(Ω) in (5.13), we can now apply the RLT scheme of Sherali and Tuncbilek (1992), as discussed in Section 2, to generate a suitable linear programming relaxation. Accordingly, for any (i, t), i ∈ {0, 1, ..., m}, t ∈ T_i, let Φ̄_it^{Ω}(x) denote the linearization of Φ_it^{Ω}(x) via (5.3), and let Φ̄_i^{Ω}(x) = Σ_{t∈T_i} Φ̄_it^{Ω}(x). From (5.13), this yields the following RLT linear programming relaxation:

    LP(Ω) :  Minimize { Φ̄_0^{Ω}(x) : Φ̄_i^{Ω}(x) ≤ β_i, ∀ i = 1, ..., m, (VI)_L, x ∈ Ω },       (5.14)

where (VI)_L denotes the linearization under (5.3) of additional RLT valid inequalities (including (5.2)) that might have been generated. Therefore, v[LP(Ω)] provides a lower bound on v[FPR(Ω)], and hence on v[FP(Ω)]. Similar to the approach adopted in Section 2, Sherali and Wang (2001) proposed a branch-and-bound approach based on partitioning the set Ω into sub-hyperrectangles, each associated with a node of the enumeration tree. Various branching rules that adopt a philosophy akin to that of (5.5) were developed for effectively reducing the maximum discrepancy between the RLT variables and the corresponding nonlinear terms that they represent. These rules are critical to the theoretical convergence and computational performance of the algorithm. A convergence result identical to that of Theorem 5.1 was shown to hold true for FP(Ω). Moreover, promising computational results were reported on a collection of fifteen chemical process and engineering design problems taken from various sources in the literature. For nine of these cases, the LP solution obtained at the initial node produced tight lower bounds and provided warm starts for MINOS that resulted in the optimal solution being detected (although yet to be proven) at node zero itself. With an accuracy tolerance of 1%, five of the fifteen nonconvex problems were solved to near-optimality at node zero via a single linear program (three such cases also occurred when using a tolerance of 0.0001%). For the remaining problems where the initial node produced a lower bound that still left a gap to be resolved (typically, problems having trigonometric,
logarithmic, and high-order polynomial terms that create steep valleys), this gap was quickly reduced by a few branching operations. Also, in particular, for two structural sensitivity test cases taken from Floudas and Pardalos (1990), the proposed algorithm produced solutions having significantly better values than those previously reported, while finding an optimal solution.
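As a small illustration of the bounding step used in the reformulation phase of Section 3 (a sketch only: it uses plain Chebyshev interpolation shifted down by the classical remainder bound, with the derivative bound supplied by the user, and is not the exact construction of Sherali and Wang, 2001):

    import numpy as np
    from math import factorial

    def chebyshev_lower_bounding_poly(f, a, b, degree, deriv_bound):
        """Polynomial lower bound for a univariate f on [a, b]; 'deriv_bound'
        is an assumed bound on |f^(degree+1)| over [a, b].  The polynomial
        interpolates f at Chebyshev nodes and is shifted down by the classical
        interpolation error bound, so it under-estimates f on [a, b]."""
        d = degree
        k = np.arange(d + 1)
        nodes = 0.5 * (a + b) + 0.5 * (b - a) * np.cos((2 * k + 1) * np.pi / (2 * (d + 1)))
        coeffs = np.polyfit(nodes, f(nodes), d)       # interpolating polynomial
        err = deriv_bound / factorial(d + 1) * ((b - a) / 2.0) ** (d + 1) / (2.0 ** d)
        coeffs[-1] -= err                             # shift the constant term down
        return np.poly1d(coeffs)

    # Hypothetical usage: a cubic lower bound for exp(x) on [0, 1].
    g = chebyshev_lower_bounding_poly(np.exp, 0.0, 1.0, degree=3, deriv_bound=np.e)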
4. Practical applications of nonconvex optimization problems
Nonconvex optimization problems arise in a host of applications such as pipe network design, pooling and blending problems, problems encountered in control systems, hard and fuzzy clustering problems, risk management problems, energy allocation problems in wireless communication systems, transportation engineering, airline scheduling, etc. The underlying structure of many of these nonconvex problems conforms with that of polynomial or factorable programming problems, thereby facilitating an application of the RLT methodology discussed in Sections 2 and 3. In this section, we briefly address six such application studies. Note that for each described application, we explain the basic problem structure and identify the main source of nonconvexity in the problem, providing certain key insights into solving these problems via certain tailored RLT-based approaches. For further details regarding problem formulations, algorithmic developments, and implementation strategies and results, we refer the interested reader to the main references cited for each application.
4.1 Design of water distribution systems
The design of water distribution systems has consistently received a great deal of attention in the literature because of its importance to society (see Walski, 1984, for a review). An important component of designing a cost-effective water distribution network system, or of extending a pre-existing network, is to determine the sizes of the various pipes that are capable of satisfying the flow demand, in addition to satisfying the minimum pressure head and hydraulic redundancy requirements. The potential pipe network can be modeled as a graph G(N, A), where N is comprised of a set of reservoirs or supply nodes along with a set of consumption or demand nodes, and (i, j) ∈ A denotes a new or existing pipe that connects a certain designated node pair i and j, where i, j ∈ N, i < j. Associated with each link connecting a node pair (i, j) ∈ A is the decision variable q_ij that represents the flow rate in m³/hr, and the parameter L_ij that denotes the corresponding pipe
length in meters. Moreover, each link needs to be constructed using pipe segments of lengths having standard available diameters, chosen from the set {d_k, k = 1, ..., K} (in inches). Accordingly, let the decision variable x_ijk denote the length of the pipe segment of diameter d_k that is used to design the link (i, j) ∈ A. The key source of nonconvexity, and consequent problem difficulty, arises in representing the flow equilibrium condition via the head-loss constraints. Specifically, the pressure or head loss in a pipe due to friction (in meters), for any link (i, j), can be described by the empirical Hazen-Williams equation, given by

    Δh_ij = Σ_{k=1}^{K}  ω q_ij^{1.852} x_ijk / ( C_HW(ijk)^{1.852} d_k^{4.87} ),
where ω is a units-dependent constant and C_HW(ijk) is the Hazen-Williams coefficient based on the roughness and diameter of the pipe used to construct link (i, j). Due to the nonconvex nature of this term, which appears in the constraints that essentially enforce that the net head loss over any loop in the network is zero, the design problem turns out to be a hard nonconvex optimization problem having a number of local optima. Sherali et al. (2001) propose a tight lower bounding relaxation via the RLT constructs of Section 3, and embed this in a branch-and-bound scheme based on an effective branching variable strategy that induces global convergence. Using this approach, Sherali et al. (1998, 2001) have determined global optimal solutions for several pipe network design problems in the literature that had remained unsolved for two to three decades.
4.2 Hard and fuzzy clustering problems
The field of cluster analysis is primarily concerned with the sorting of data points into different clusters so as to optimize a certain criterion. There are essentially two types of clustering problems addressed in the literature: hard clustering, where each data point is to be assigned to exactly one cluster (see Spath, 1980), and fuzzy clustering, where the data points are assigned grades of membership in [0, 1] with respect to different identified clusters (see Hoppner et al., 1999). The hard clustering problem can be defined as follows. Given a set of n data points, each having some s attributes, we are required to assign each of these points to exactly one of some c clusters (where c is given), so as to minimize the total squared Euclidean distance between the data points and the centroids of the clusters to which they are assigned. That is to say, if data point i, having a location descriptor a_i ∈ R^s, is assigned to cluster j having a to-be-determined centroid
z_j ∈ R^s, then the associated penalty is assumed to be proportional to the square of the straight-line distance between a_i and z_j in R^s. This results in an objective function given by Σ_{i=1}^{n} Σ_{j=1}^{c} w_ij ||a_i − z_j||², where w_ij is a binary variable that takes on a value of 1 if data point i is assigned to cluster j, and 0 otherwise. The products of the w- and z-variables in this function render the problem nonconvex in nature, and difficult to solve to global optimality. Sherali and Desai (2004) have designed an RLT approach for this hard clustering problem that includes the generation of additional valid inequalities based on approximations to the convex hull of the data points, along with symmetry-defeating strategies. A tight equivalent 0-1 linear mixed-integer programming representation is derived, and a specialized branch-and-bound algorithm is designed to determine a global optimal solution. Results based on computational experiments performed using standard as well as synthetically generated data sets establish the efficacy and robustness of the proposed approach, in contrast with the popular k-means algorithm (Forgy, 1966; McQueen, 1967), as well as in comparison with the global optimization package BARON (see Sahinidis, 1996). Note that in practice, cluster analysis problems can involve very large data sets, and the results in this work suggest that designing heuristic methods based on constructs that are borrowed from strong, effective exact procedures might be a prudent approach for addressing such problems.

Continuing in the same vein as in the case of hard clustering, Sherali and Desai (2004) have proposed an RLT-based optimization approach to solve the fuzzy clustering problem, where the objective function in this case is given by Σ_{i=1}^{n} Σ_{j=1}^{c} w_ij² ||a_i − z_j||², based on a quadratic degree of fuzziness (see Kamel and Selim, 1994). Sherali and Desai have shown that this problem can be equivalently reduced to a cubic nonconvex polynomial program, for which they have designed a specialization of the RLT methodology of Section 2 in concert with additional valid inequalities and symmetry-defeating constraints. Promising computational results, similar to those for the hard clustering problem, are reported.
4.3 Linear control of systems
The control of linear systems with uncertain physical parameters has been a main subject of research in the field of control engineering over the last two decades (see Ackermann, 2002). One approach to deal with this problem is to study the behavior of the characteristic equations of these systems, and in particular, to study the effects of uncertain
parameters on the location of the roots of these polynomials, and thence on the stability and performance of the system. A fundamental problem in such a robust control context is the calculation of stability margins for parameter perturbations, i.e., determining the maximum allowable perturbation in the uncertain parameters of a stable system without losing stability. Here, stability is defined with respect to an arbitrary region D in the complex plane, and is referred to as "D-stability". Given that the coefficients are polynomial functions of the uncertain parameters, the computation of these stability margins proves to be a challenging task. To compute such D-stability margins, Bozorg and Sherali (2003) have demonstrated that this problem can be equivalently reduced to the form of minimizing Σ_{k=1}^{m} c_k |q_k − q̄_k|^p, where q_k, k = 1, ..., m, are the uncertain system parameters that are to be determined, q̄_k, k = 1, ..., m, are given nominal target values of these parameters, c_k > 0, k = 1, ..., m, are prescribed priority weights, and 1 ≤ p < ∞ is a selected l_p-distance based separation measurement parameter. The constraints represent stability in the complex plane in terms of the related real and imaginary parts, and turn out to be polynomials in the system parameters q_1, ..., q_m. Reformulating the absolute-valued objective terms, Bozorg and Sherali (2003) have developed various alternative tailored global optimization procedures for solving the resulting nonconvex polynomial programming problem for different values of the parameter p, using the RLT constructs described in Sections 2 and 3. Several open test cases from the literature have been solved using this methodology to demonstrate its efficacy.
4.4 Risk management with equity considerations
Risk management is primarily concerned with the allocation of resources to attenuate the probability of a risk occurring or to mitigate the consequences of hazardous events that might have already occurred. In this context, Sherali et al. (2003a) have developed a decision support system to assist emergency planning agencies to respond quickly, effectively, and equitably to a set of potential hazardous situations. More specifically, consider a situation in which an emergency is underway and critical response decisions must be made. In order to mitigate the hazards, the emergency manager would typically call into play a variety of available resources. Let i ∈ I index such a set of resources, and let b_i (in appropriate units) denote the available level of resource i ∈ I. Also, let us suppose that the affected region has been partitioned into areas (indexed by j ∈ J), and let k ∈ K_j index the set of hazards affecting area j that the emergency manager needs to
address for each j ∈ J. Suppose further that certain ratings r_jk ∈ [1, 10], ∀ j ∈ J, k ∈ K_j, have been determined to reflect the relative importance of responding to the situation created by hazard k in the affected area j. Now, in each area j ∈ J, let the unmitigated risk associated with hazard type k ∈ K_j be denoted by α_jk, and assume that when an amount x_ijk of resource i is allotted to hazard type k in area j, it serves to attenuate the risk α_jk by an exponential factor e^{−β_ik x_ijk}, for some suitable parameter value β_ik ≥ 0, reducing it to α_jk e^{−β_ik x_ijk}. Hence, the overall attenuation of α_jk due to all resource applications is given by α_jk ∏_{i∈I} e^{−β_ik x_ijk} = α_jk e^{−Σ_{i∈I} β_ik x_ijk}. Therefore, the weighted mitigated risk in area j is given by R_j = Σ_{k∈K_j} r_jk α_jk e^{−Σ_{i∈I} β_ik x_ijk}, and thus, the overall system weighted mitigated risk turns out to be R = Σ_{j∈J} Σ_{k∈K_j} r_jk α_jk e^{−Σ_{i∈I} β_ik x_ijk}. In order to accommodate a relative degree of equity among the affected areas while allocating the limited available resources, Sherali et al. (2003a) also proposed to simultaneously minimize the sum of absolute deviations Σ_{j∈J} |γ_j − γ̄|, where γ_j = R_j / Σ_{k∈K_j} r_jk α_jk is the overall risk attenuation factor for area j, and γ̄ represents the mean weighted attenuation factor. Furthermore, note that it is possible to have more than one solution that yields the same value for the total absolute deviation Σ_{j∈J} |γ_j − γ̄|, but the maximum spread might be larger in one case than in the other. Hence, in addition to the total deviation term, it is necessary to minimize the maximum inequity given by max_{j∈J} {γ_j − γ̄} (refer to Sherali et al. (2003b) for further discussion). Accordingly, the following objective function was prescribed:
    Minimize  ρ_0 R + ρ_1 [ Σ_{j∈J} |γ_j − γ̄| + max_{j∈J} {γ_j − γ̄} ],
where the factors ρ_0 ≥ 0 and ρ_1 ≥ 0 compromise appropriately between the two objectives of reducing the overall system weighted mitigated risk, R, and achieving an acceptable degree of equity in this process. Observe that since the second term in the objective function essentially involves an absolute difference of two convex functions of x = (x_ijk, ∀ i, j, k), this is a nonconvex programming problem. Sherali et al. (2003a) equivalently transformed the above objective function into a form that is more amenable to algorithmic manipulation, and developed a specialized branch-and-bound procedure, similar in concept to the approach described in Section 3, to determine a global optimum. Various alternative branching strategies that preserve the convergence characteristics of the algorithm were also explored. Computational results demonstrated that the proposed algorithm more robustly
yields global optimal solutions in comparison with the commercial global optimizer BARON (see Sahinidis, 1996).
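To make the structure of this model concrete, the following sketch evaluates the overall weighted mitigated risk R, the per-area attenuation factors γ_j, and the two equity terms for a given allocation x. The data, the dense array shapes, and the use of the simple mean for γ̄ are illustrative assumptions and are not taken from Sherali et al. (2003a).

```python
import numpy as np

def mitigated_risk_terms(x, r, alpha, beta):
    """Evaluate the pieces of the risk-mitigation objective for an allocation x.

    x[i, j, k]  : amount of resource i allotted to hazard k in area j
    r[j, k]     : importance rating of hazard k in area j (in [1, 10])
    alpha[j, k] : unmitigated risk of hazard k in area j
    beta[i, k]  : attenuation rate of resource i against hazard type k
    (Illustrative dense arrays; in the actual model the hazard sets K_j may differ by area.)
    """
    # attenuated risk: alpha_jk * exp(-sum_i beta_ik * x_ijk)
    exponent = np.einsum('ik,ijk->jk', beta, x)
    mitigated = alpha * np.exp(-exponent)

    R_j = (r * mitigated).sum(axis=1)            # weighted mitigated risk per area
    R = R_j.sum()                                # overall system weighted mitigated risk
    gamma = R_j / (r * alpha).sum(axis=1)        # risk attenuation factor per area
    gamma_bar = gamma.mean()                     # mean weighted attenuation factor (assumed simple mean)
    total_dev = np.abs(gamma - gamma_bar).sum()  # sum of absolute deviations
    max_inequity = (gamma - gamma_bar).max()     # maximum inequity
    return R, total_dev, max_inequity

# tiny illustrative instance: 2 resources, 3 areas, 2 hazard types
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(2, 3, 2))
r = rng.uniform(1, 10, size=(3, 2))
alpha = rng.uniform(1, 5, size=(3, 2))
beta = rng.uniform(0.1, 0.5, size=(2, 2))
print(mitigated_risk_terms(x, r, alpha, beta))
```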
4.5  Energy allocation in wireless communication systems
Wireless sensor networks have received a considerable amount of attention over the last few years due to their significant contemporary impact. In particular, the two-tiered network architecture has found a number of applications, in both traditional surveillance and national security related contexts (see Akyildiz et al., 2002). This framework is comprised of a number of sensor clusters and a base-station. Each cluster is deployed around a strategic location and consists of several Application Sensor Nodes (ASN) and one Aggregation and Forwarding Node (AFN). The function of each ASN is to capture and transmit collected local information to the associated AFN, and the AFN performs the intra-network/cluster processing by aggregating all the information collected and transmitting this information to the base-station via a single or multi-hop transmission. An important benchmark to measure the performance of these networks is the network lifetime. In certain situations, when the existing network lifetime is inadequate to meet the required mission duration, it can be prolonged by provisioning additional energy to existing AFNs in the network. Hou et al. (2003) have extended this idea by considering the deployment of additional relay nodes (RNs) at certain locations in the network, along with provisioning energy to existing AFNs, in order to maximize the network lifetime. Given a pool of energy E and M RNs, the joint energy provisioning and relay node placement problem (EP-RNP) formulated in Hou et al. (2003) considers allotting the total amount of energy E into M portions that are ascribed to either AFNs or RNs such that the network lifetime is maximized, subject to flow balance equations and energy constraints for each node in the network, while ensuring that energy is provisioned to at most M RNs and that flow occurs only through established RNs. The resulting formulation turns out to be a mixed-integer nonlinear programming problem (MINLP), and the main source of nonconvexity arises from variable-product terms appearing in the above-mentioned constraints. Noting that MINLPs are NP-hard in general, Hou et al. (2003) proposed certain automatic reformulation techniques, and adopted a heuristic algorithm to solve the EP-RNP. Computational results indicate that the proposed heuristic approach yields a relatively superior performance as compared with other existing approaches.
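The kind of reformulation alluded to above can be illustrated on a single product term. If z is a binary variable indicating that a relay node is established and f is a bounded flow variable, the bilinear term f·z can be replaced by a new variable w together with four linear constraints. The sketch below only checks this standard linearization numerically; it is offered as a generic illustration and is not claimed to be the specific reformulation used by Hou et al. (2003).

```python
def linearize_binary_product(f, z, f_max):
    """Exact linearization of w = f * z for 0 <= f <= f_max and binary z.

    Returns True iff (f, z, w) with w = f*z satisfies the four linear
    constraints that replace the bilinear term:
        w <= f_max * z,   w <= f,   w >= f - f_max * (1 - z),   w >= 0.
    Substitutions of this kind turn product terms into linear constraints
    at the price of one additional variable per product.
    """
    w = f * z
    return (w <= f_max * z + 1e-9 and w <= f + 1e-9
            and w >= f - f_max * (1 - z) - 1e-9 and w >= -1e-9)

# quick sanity check on a small grid of values
assert all(linearize_binary_product(f, z, 10.0)
           for f in [0.0, 2.5, 10.0] for z in [0, 1])
```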
4.6  Pooling problem
The pooling problem is a nonlinear network flow problem that models the supply chain of petrochemicals, wherein different crude oil raw materials i = 1, . . . , I are composed to produce intermediate quality streams or pools l = 1, . . . , L, which are then blended at retail locations to produce various final products j = 1, . . . , J. Audet et al. (2000) and Tawarmalani and Sahinidis (2002a,b) discuss this problem in detail and analyze alternative model formulations and solution methods. In particular, Tawarmalani and Sahinidis (2002b) consider the q-formulation due to Ben-Tal et al. (1994), which is predicated on the principal set of variables q = (q_{il}, i = 1, . . . , I, l = 1, . . . , L), where q_{il} represents the fraction of the total output from pool l that results from the input provided by the raw material source i. Hence, denoting the flow of raw material i into pool l by x_{il}, and the flow from pool l to constitute product j by y_{lj}, we have that x_{il} = q_{il} ∑_{j=1}^{J} y_{lj}, ∀i = 1, . . . , I, l = 1, . . . , L, where

∑_{i=1}^{I} q_{il} = 1, ∀l = 1, . . . , L.        (5.15)

The foregoing product relationship between the flows gives rise to the underlying bilinear nonconvex structure in the problem. Tawarmalani and Sahinidis (2002b) have shown that applying RLT to this overall quadratic polynomial q-formulation, and in particular, including the RLT constraint obtained by multiplying (5.15) by y_{lj} and linearizing (upon substituting v_{ilj} = q_{il} y_{lj}, ∀i, l, j), results in a very tight representation. As a result, all pooling test problems from the open literature were trivially solved by a branch-and-bound algorithm based on these relaxations. Likewise, Audet et al. (2004) have reported considerable success in solving large-scale pooling problems by applying their RLT-based branch-and-cut algorithm, which is described in Audet et al. (2004, 2000), to various flow-based and proportion-based models, as well as to a hybrid of these two models.
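A small numerical check of the RLT relation just described is given below. The data are made up, and the snippet only verifies that points satisfying v_{ilj} = q_{il} y_{lj} automatically satisfy the linearized constraints ∑_i v_{ilj} = y_{lj} and 0 ≤ v_{ilj} ≤ y_{lj}; it does not construct the LP relaxation itself.

```python
import numpy as np

def rlt_pooling_constraints_hold(q, y, tol=1e-9):
    """Numerically verify the RLT relations used for the q-formulation.

    q[i, l] : fraction of pool l's output coming from raw material i
              (columns sum to one);  y[l, j] : flow from pool l to product j.
    With v[i, l, j] := q[i, l] * y[l, j], multiplying sum_i q_il = 1 by y_lj
    gives the linear RLT constraint sum_i v_ilj = y_lj, and the bound factors
    0 <= q_il <= 1 and y_lj >= 0 give 0 <= v_ilj <= y_lj.
    """
    v = np.einsum('il,lj->ilj', q, y)
    sums_ok = np.allclose(v.sum(axis=0), y, atol=tol)
    bounds_ok = np.all(v >= -tol) and np.all(v <= y[None, :, :] + tol)
    return sums_ok and bounds_ok

# illustrative data: 3 raw materials, 2 pools, 2 products
rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(3), size=2).T      # shape (3, 2); columns sum to 1
y = rng.uniform(0, 5, size=(2, 2))
print(rlt_pooling_constraints_hold(q, y))    # True
```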
5.  Enhancing RLT via semidefinite cuts
The RLT is a unifying approach for solving certain discrete as well as continuous factorable nonconvex optimization problems via a process of generating suitable valid inequalities in a higher dimensional space in order to derive tight, effective relaxations. In keeping with this underlying spirit, the RLT methodology could be further enhanced by incorporating additional valid inequalities that are obtained via semidefinite
programming constructs (see Vandenberghe and Boyd, 1996, and Helmberg, 2002). To illustrate the underlying concept here, consider the quadratic polynomial programming problem. Note that the new RLT variables in this context are represented by the n × n matrix X = [xx^T]_L, where [·]_L represents the linearization of the expression [·] under the substitutions (5.3). Observe that since xx^T is symmetric and positive semidefinite (denoted ⪰ 0), we could require that X ⪰ 0, as opposed to simply enforcing nonnegativity on this matrix. In fact, a stronger implication in this same vein is obtained by considering

M_1 ≡ [ (1, x^T) ; (x, X) ]        (5.16)

and requiring that M_1 ⪰ 0. In lieu of solving the resulting semidefinite programming relaxations, which would detract from the robustness and efficiency that accrues from relying on LP relaxations, Sherali and Fraticelli (2002) have proposed the use of a class of RLT constraints known as semidefinite cuts that are predicated on the fact that

M_1 ⪰ 0 if and only if α^T M_1 α ≥ 0 for all α ∈ R^{n+1}.        (5.17)

Accordingly, given a certain solution (x̄, X̄) to the RLT relaxation of the underlying quadratic polynomial program for which X̄ ≠ x̄x̄^T (i.e., the condition of Lemma 5.1 does not hold true), Sherali and Fraticelli (2002) invoke (5.17) to check in polynomial time, having a worst-case complexity O(n³), whether or not M̄_1 ⪰ 0, where M̄_1 evaluates M_1 at the solution (x̄, X̄). In case that M̄_1 is not positive semidefinite, they show that this process also automatically generates an α ∈ R^{n+1} such that α^T M̄_1 α < 0, which in turn yields the semidefinite cut

α^T M_1 α ≥ 0.        (5.18)

Several alternative polynomial-time schemes for generating rounds of cuts (5.18) based on suitable vectors α are described and are computationally exhibited to yield a substantial reduction (by a factor of 2-3) on the class of quadratic programs tested, in comparison with an RLT approach that does not employ such cuts. We mention here that Konno et al. (2003) have proposed using a similar cut of the type δ^T X δ ≥ 0, where δ ∈ R^n is the normalized
eigenvector corresponding to the smallest eigenvalue of the matrix X̄, given that X̄ is not positive semidefinite. However, computing δ can be relatively burdensome, and moreover, the round of cuts (5.18) generated for the augmented matrix M_1 can yield potentially tighter relaxations. As a precursor of this approach, Shor (1998) had recommended examining the minimum eigenvalue of X as a function λ_1(X), where λ_1(X) ≡ min{α^T X α : ‖α‖ = 1} by the Rayleigh-Ritz formula. Noting that λ_1(X) is a concave, but nondifferentiable, function of X that is characterized by the minimum of a family of linear functions, Shor developed a nondifferentiable optimization approach for quadratic programs in which λ_1(X) ≥ 0 is incorporated within the model formulation in lieu of X ⪰ 0. Using a similar thrust, Vanderbei and Benson (2000) proposed an alternative smooth, convex, nonlinear programming representation for the relationship X ⪰ 0, noting that X ⪰ 0 if and only if it can be factored as X = LDL^T, where L is a unit lower triangular matrix and D is a nonnegative diagonal matrix. Denoting d_j(X), j = 1, . . . , n, as the diagonal elements of D for a given n × n symmetric matrix X, Vanderbei and Benson showed that d_j(X) is a concave, twice differentiable function on the set of positive semidefinite matrices. Accordingly, they replaced X ⪰ 0 by the nonlinear, smooth (though implicitly defined) constraints d_j(X) ≥ 0, j = 1, . . . , n, and developed a specialized interior point algorithm to solve the resulting problem.

As a further generalization of the RLT procedure, Lasserre (2001, 2002) discussed the generation of tight relaxations for polynomial programming problems using linear matrix inequalities (LMIs). In the spirit of (5.16), let us define x^{(m)} as the augmentation of the vector x^{(1)} with all quadratic terms involving the x-variables, then all such cubic terms, and so on, up to all possible multinomials of order m. Accordingly, define the moment matrix

M_m ≡ [x^{(m)} x^{(m)T}]_L.        (5.19)

Then, for the case of an unconstrained polynomial program of the type Minimize {φ_0(x) : x ∈ R^n}, where φ_0(x) is a polynomial of degree δ, Lasserre (2001) considered the relaxation R_m given below, where m ≥ ⌈δ/2⌉, and where θ(x) ≡ a² − ‖x‖², with a > 0 being the radius of a ball that is known to contain an optimal solution:

R_m: Minimize {[φ_0(x)]_L : M_m ⪰ 0, [θ(x) x^{(m−1)} x^{(m−1)T}]_L ⪰ 0}.        (5.20)
Lasserre (2001) proved that R_m is asymptotically exact in that as m → ∞, the optimal value of R_m approaches that of the underlying polynomial program. In fact, for a univariate polynomial program to minimize φ_0(x) subject to a ≤ x ≤ b, where φ_0(x) is of odd degree
2m + 1 (some additional manipulation is required for even degree problems), Lasserre (2002) showed that the optimal value is recovered via the semidefinite program

Minimize {[φ_0(x)]_L : [(x − a) x^{(m)} x^{(m)T}]_L ⪰ 0, [(b − x) x^{(m)} x^{(m)T}]_L ⪰ 0}.        (5.21)

For multivariate constrained polynomial programming problems of the type minimize {φ_0(x) : φ_r(x) ≥ β_r, r = 1, . . . , R}, having a degree δ, a similar relaxation to that in (5.20) was generated as follows, where m ≥ ⌈δ/2⌉:

R̄_m: Minimize {[φ_0(x)]_L : M_m ⪰ 0, [(φ_r(x) − β_r) x^{(m−⌈δ_r/2⌉)} x^{(m−⌈δ_r/2⌉)T}]_L ⪰ 0, ∀r = 1, . . . , R},        (5.22)

where δ_r is the degree of φ_r(x), ∀r = 1, . . . , R. Under certain stringent conditions on the feasible region, Lasserre (2001, 2002) exhibited that the relaxation R̄_m becomes asymptotically exact as m → ∞. Lasserre (2002) has also discussed the pros and cons of using semidefinite programming versus LP relaxations in solving polynomial programs. He concludes that although semidefinite representations are more theoretically appealing, the LP relaxation based approaches have more practical significance due to the availability of effective software capable of handling large-scale problem instances. Further discussions on semidefinite programming relaxations versus LP relaxations are provided in Laurent and Rendl (2002). Based on the approach and experience of Sherali and Fraticelli (2002), and based on the results embodied by the relaxations (5.20), (5.21), and (5.22), note that the corresponding LMIs in these problems can be replaced by associated semidefinite cuts of the type (5.18). In cases where the dimension of the matrices in these LMIs might be too large to be computationally manageable, we could indeed generate semidefinite cuts predicated on the relationship [vv^T]_L ⪰ 0, where v is a vector of suitable multinomial terms involving certain key x-variables alone. We are presently exploring the generation of such so-called v-semidefinite cuts based on appropriately composed vectors v, and results from this investigation will be forthcoming.

In closing this section, we comment on a particular practical case study involving the location of Automatic Vehicle Identification (AVI) devices in a transportation network, where the semidefinite cuts (5.18) were shown to significantly enhance the effectiveness of the proposed algorithm. An AVI system relies on a combination of passive tags attached to vehicles and electronic interrogators (or readers) mounted on
overhead structures. The system detects the passage of a vehicle at each reading location within the network by monitoring the signal sent by the interrogator's antenna. A vehicle is detected each time an altered signal is reflected back to the antenna. Since each vehicle tag returns a unique signal, specific vehicles can then be identified. The basic problem is then to optimally locate the AVI tag readers for the purpose of estimating dynamic roadway travel times. Sherali et al. (2003b) have modeled this Reader Location problem (RL) as a 0-1 quadratic program based on binary variables that represent whether or not a reader is located at each of the potential nodes of a graph that is constructed from the underlying transportation network. The quadratic objective function seeks to maximize a measure of the total coverage benefit, and the linear constraints assert that the number of readers used and the budget consumed should not exceed the available resource limits. An equivalent zero-one mixed-integer linear programming problem is then derived using the RLT methodology (Adams and Sherali, 1986; Sherali and Adams, 1990, 1994), which is further augmented by semidefinite cuts, as prescribed by Sherali and Fraticelli (2002). The resulting model is demonstrated to yield a substantial computational advantage over formulations that employ either RLT alone or other standard linearization techniques.
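As an illustration of the cut-generation step described in this section, the following sketch forms the matrix M_1 at a given relaxation solution (x̄, X̄), tests it for positive semidefiniteness, and, if the test fails, returns the coefficients of one violated cut of the type (5.18). It is a minimal sketch only: the O(n³) check is done here with an eigenvalue computation, which is not necessarily the routine used by Sherali and Fraticelli (2002), and the numerical example is invented.

```python
import numpy as np

def semidefinite_cut(x_bar, X_bar, tol=1e-8):
    """Generate one semidefinite cut of the type (5.18), if it is violated.

    M_1 denotes the (n+1)x(n+1) matrix [[1, x^T], [x, X]] in the RLT
    variables (x, X).  Given a relaxation solution (x_bar, X_bar) for which
    this matrix is not positive semidefinite, an eigenvector alpha for its
    most negative eigenvalue yields a violated cut
        a0**2 + 2*a0*(ar @ x) + ar @ X @ ar >= 0,
    which is linear in the relaxation variables (x, X).
    """
    n = len(x_bar)
    M1 = np.empty((n + 1, n + 1))
    M1[0, 0] = 1.0
    M1[0, 1:] = M1[1:, 0] = x_bar
    M1[1:, 1:] = X_bar
    eigval, eigvec = np.linalg.eigh(M1)      # eigenvalues in ascending order
    if eigval[0] >= -tol:
        return None                          # M_1 is (numerically) PSD: no cut
    alpha = eigvec[:, 0]
    a0, ar = alpha[0], alpha[1:]
    # cut in the relaxation variables: const + linear_x @ x + <quad_X, X> >= 0
    return {"const": a0 ** 2, "linear_x": 2.0 * a0 * ar, "quad_X": np.outer(ar, ar)}

# example: a relaxation point whose "linearized squares" are too small to be PSD
x_bar = np.array([0.5, 0.5])
X_bar = np.array([[0.0, 0.25], [0.25, 0.0]])
print(semidefinite_cut(x_bar, X_bar))
```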
6.  RLT-based pseudo-global optimization procedures
Several engineering and process design problems lead to formidable optimization models that contain analytically complex objective and/or constraint functions that are either expensive to evaluate, or that might be available only as black-box functions. Such problems are also typically highly nonlinear and nonconvex in nature. One approach that has been suitably employed to solve these types of complex optimization problems is Response Surface Methodology (RSM). In its traditional form (see Myers, 1995), this technique attempts to optimize some black-box or expensive-to-evaluate objective function subject to box constraints (where other constraints might be explicitly included in the objective function via suitably determined penalty functions, whenever applicable; see Bazaraa et al., 1993). This methodology employs quadratic approximations to the objective function in order to ascertain whether a current solution is a local optimum, or else, to determine a search direction. Computing an appropriate step length along this new direction leads to a new solution, and the procedure reiterates. However, this process is essentially memoryless. Several optimization
schemes that enhance this basic RSM algorithm have been proposed in the literature, including the contributions of Joshi et al. (1998), Jones et al. (1993, 1998), Gutmann (2001), Alexandrov et al. (1998), and Sherali and Ganesan (2003). In particular, the lattermost paper discusses a new class of RLT-based pseudo-global optimization techniques for solving such formidable, constrained, black-box optimization problems that can be reasonably approximated via polynomial functions. Essentially, this is a two-stage procedure where first, over a current bounding hyperrectangle in a branch-and-bound framework, a forward step-wise regression process is applied to a feasibility-screened, full or fractional factorial experimental design to develop (up to) fifth-order (or some user-defined limit) polynomial approximations for the objective and constraint functions. Techniques described in Section 3, or other alternative methods such as spline interpolation or kriging (see Jones, 2001), can be used for this purpose. Note that a simple minimization of this approximating polynomial can easily miss the global optimum (Jones, 2001), and thus, Sherali and Ganesan prescribe two alternative approaches at the subsequent stage. In the first approach, the polynomial programming problem is solved to global optimality, and the resulting solution is refined via local search techniques. The motivation for such an approach is that the underlying polynomial approximation problem is likely to yield a solution that lies in the relative vicinity of the true underlying global optimum, and hence, applying a local search technique from such a solution might detect a good quality incumbent solution. More pertinently, the solution value obtained for the polynomial programming approximation problem is treated as a pseudo lower bound (for a minimization problem), which is likely a valid lower bound, though not theoretically guaranteed to be so; hence, this lends the name pseudo-global optimization to this process. This construct is then embedded in a branch-and-bound framework, using a least pseudo lower bound node selection strategy in concert with two suitable hyperrectangle partitioning schemes that are predicated on the current incumbent solution and a user-defined priority among the decision variables in order to induce a more rapid convergence. Note that different degree polynomial approximating problems can be generated at each node as necessary. For the second approach proposed by Sherali and Ganesan, in lieu of solving the nonconvex polynomial programming problem to global optimality, the RLT methodology is applied to generate a (tight) higher dimensional linear programming representation, and the optimal value for this LP relaxation is then used as the pseudo lower bound within an identical branch-and-bound framework. Note that the tightness of
Figure 5.1. Principal dimensions of a containership.
the LP relaxations produced by the RLT process contributes toward reducing the computational effort for computing lower bounds without impairing the quality of the resulting solution. This methodology was applied by Sherali and Ganesan (2003) for the design of containerships. Ship design has traditionally been an iterative process in which different aspects of the problem pertaining to power, strength, stability, weight, and space balance have been treated in sequence to arrive at a variety of feasible solutions. Relying on several existing modules (see Neu et al., 2000) that encapsulate the essential characteristics of the design problem that relate to the geometry, hydrostatics, water resistance, and economic considerations, Sherali and Ganesan formulated a model using the following principal decision variables (see Figure 5.1): x_1 = design draft (in meters), x_2 = depth at side (in meters), x_3 = speed (in knots), x_4 = overall length (in meters), and x_5 = maximum beam (in meters). In addition, another fundamental characteristic that determines the overall shape of the ship is the hull form. Of specific interest are the geometric properties of the hull form, which in the model formulation can be captured via a user-selected weighted average of two or three basis hull forms. The constraints imposed enforce the balance between the weight and the displacement, a required acceptable length to depth ratio, a restriction on the metacentric height to ensure that the design satisfies the Coast Guard wind heel criterion, a minimum freeboard level as governed by the code of federal regulations, and a lower bound on the rolling period to ensure sea-worthiness. The objective function seeks to minimize the required freight rate that is induced by the design in order to recover capital and operating costs, expressed in dollars per metric ton per nautical mile. Various other practical issues are also accommodated within the model
formulation, which therefore turns out to yield a highly nonlinear and nonconvex problem. The proposed pseudo-global optimization algorithm was applied to solve this problem for a typical test case from the literature, and the results were compared with those derived by implementing an available standard commercial ship design software package (see Neu et al., 2000). The RLT-based algorithm produced a significantly superior design that enhanced the profit margin by approximately $18.45 million and yielded an estimated 27% increase in the return on investment.
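The first stage of the pseudo-global idea can be illustrated in one dimension: sample the expensive function, fit a low-order polynomial by least squares, and take the surrogate's box-constrained minimum as a pseudo lower bound and as a candidate point for local refinement. The sketch below is only a one-dimensional caricature under these assumptions; the actual procedure of Sherali and Ganesan (2003) employs feasibility-screened factorial designs, stepwise regression, multivariate polynomials, RLT relaxations, and branch-and-bound.

```python
import numpy as np

def pseudo_lower_bound(black_box, lo, hi, degree=5, n_samples=25):
    """Fit a polynomial surrogate to a black-box function on [lo, hi] and
    return the surrogate's minimum value (a pseudo lower bound that is
    likely, but not guaranteed, to be valid) together with its minimizer
    (a candidate point for local refinement)."""
    t = np.linspace(lo, hi, n_samples)
    coeffs = np.polyfit(t, [black_box(v) for v in t], degree)  # least-squares fit
    poly = np.poly1d(coeffs)
    # candidate minimizers: real stationary points inside the box plus the endpoints
    roots = poly.deriv().r
    cands = [lo, hi] + [r.real for r in roots
                        if abs(r.imag) < 1e-9 and lo <= r.real <= hi]
    best = min(cands, key=poly)
    return poly(best), best

# illustrative expensive-to-evaluate function (hypothetical)
f = lambda v: np.sin(3 * v) + 0.1 * (v - 1.0) ** 2
print(pseudo_lower_bound(f, -2.0, 2.0))
```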
7.  Summary and conclusions
In this paper, we have discussed the elements revolving around designing RLT-based exact or pseudo-global approaches for solving various classes of polynomial, factorable, and black-box optimization problems. The key idea is to examine the basic structure and complexity of the given problem, and accordingly, design an algorithm that is predicated on polynomial approximations used in concert with RLT constraints such as bound-constraint-factor product relationships, convex bounding restrictions, grid-factor or Lagrange interpolating factor constraints, semidefinite cuts, or other classes of valid inequalities. Some constraint filtering strategies could be applied to weed out relatively inconsequential members of these various classes of RLT constraints in order to construct effective and manageable-sized relaxations. In addition, a combination of strategies could be devised for either generating such RLT constraints a priori, or after solving an initial relaxation by applying suitable separation routines. In composing such a bounding mechanism, there is a natural trade-off that occurs between the tightness of the bound generated and the effort required for computing this bound in an overall branch-and-bound process. Obviously, this trade-off is dependent on the software that is being used to solve the underlying relaxations, and is expected to evolve with advances in linear (or convex) programming software technology. The open challenge is to provide a mechanism for generating tight approximations or relaxations that would enable effective approaches to be developed for solving realistically sized instances of problems arising in complex practical situations. Propounding this viewpoint, we anticipate that automatic reformulation techniques, and in particular, the RLT methodology, will play a crucial role in enhancing problem solvability in the decades to come. Actually, it is through a combination of such general strategies for generating tight representations, along with specializations
for particular structures, that significant solution capability improvements can be made in the field of nonconvex optimization.
Acknowledgments This work has been partially supported by the National Science Foundation under Grant No. DMI-0094462.
References
Ackermann, J. (2002). Robust Control: The Parameter Space Approach. Springer-Verlag, London.
Adams, W.P. and Sherali, H.D. (1986). A tight linearization and an algorithm for zero-one programming problems. Management Science, 32(7):1274-1290.
Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., and Cayirci, E. (2002). Wireless sensor networks: A survey. Computer Networks, 38:393-422.
Alexandrov, N.M., Dennis, J.E. Jr., Lewis, R.M., and Torczon, V. (1998). A trust-region framework for managing the use of approximation models in optimization. Structural Optimization, 15:16-23.
Audet, C., Brimberg, J., Hansen, P., Le Digabel, S., and Mladenović, N. (2004). Pooling problem: Alternate formulations and solution methods. Management Science, 50(6):761-776.
Audet, C., Hansen, P., Jaumard, B., and Savard, G. (2000). A branch and cut algorithm for nonconvex quadratically constrained quadratic programming. Mathematical Programming, A87:131-152.
Bazaraa, M.S., Sherali, H.D., and Shetty, C.M. (1993). Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, Inc., 2nd edition, New York, N.Y.
Belousov, E.G. and Klatte, D. (2002). A Frank-Wolfe type theorem for convex polynomial programs. Computational Optimization and Applications, 22:37-48.
Ben-Tal, A., Eiger, G., and Gershovitz, V. (1994). Global minimization by reducing the duality gap. Mathematical Programming, 63:193-212.
Bozorg, M. and Sherali, H.D. (2003). Linear Control of Systems with Uncertain Physical Parameters. Working paper. The Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA.
Falk, J.E. and Soland, R.M. (1969). An algorithm for separable nonconvex programming problems. Management Science, 15:550-569.
Floudas, C.A. and Pardalos, P.M. (1990). A Collection of Test Problems for Constrained Global Optimization Algorithms. Springer-Verlag, Berlin.
Floudas, C.A. and Visweswaran, V. (1990). A global optimization algorithm for certain classes of nonconvex NLPs-I: Theory. Computers and Chemical Engineering, 14(12):1397-1417.
Floudas, C.A. and Visweswaran, V. (1995). Quadratic optimization. In: R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Forgy, E.W. (1966). Cluster analysis of multivariate data: Efficiency versus interpretability of classification. Biometric Society Meetings, Riverside, CA. Abstract in Biometrics, 21:768.
Gutmann, H.-M. (2001). A radial basis function method for global optimization. Journal of Global Optimization, 19:201-227.
Helmberg, C. (2002). Semidefinite programming. European Journal of Operational Research, 137:461-482.
Höppner, F., Klawonn, F., Kruse, R., and Runkler, T. (1999). Fuzzy Cluster Analysis. John Wiley & Sons, Inc., New York, N.Y.
Horst, R. (1990). Deterministic methods in constrained global optimization: Some recent advances and new fields of application. Naval Research Logistics Quarterly, 37:433-471.
Horst, R. and Tuy, H. (1993). Global Optimization: Deterministic Approaches. Springer-Verlag, 2nd edition, Berlin, Germany.
Hou, Y.T., Shi, Y., and Sherali, H.D. (2003). On Energy Provisioning and Relay Node Placement for Wireless Sensor Networks. Working paper. The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA.
Jones, D.R. (2001). A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4):345-383.
Jones, D.R., Perttunen, C.D., and Stuckman, B.E. (1993). Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79:157-181.
Jones, D.R., Schonlau, M., and Welch, W.J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455-492.
Joshi, S.S., Sherali, H.D., and Tew, J.D. (1998). An enhanced response surface methodology algorithm using gradient deflection and second-order search strategies. Computers and Operations Research, 25(7/8):531-541.
Kamel, M.S. and Selim, S.Z. (1994). New algorithms for solving the fuzzy clustering problem. Pattern Recognition, 27(3):421-428.
Konno, H., Kawadai, N., and Tuy, H. (2003). Cutting plane algorithms for nonlinear semidefinite programming problems with applications. Journal of Global Optimization, 25:141-155.
Konno, H. and Kuno, T. (1995). Multiplicative programming problems. In: R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization, Nonconvex Optimization and its Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Lasserre, J.B. (2001). Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796-817.
Lasserre, J.B. (2002). Semidefinite programming versus LP relaxations for polynomial programming. Mathematics of Operations Research, 27(2):347-360.
Laurent, M. and Rendl, F. (2002). Semidefinite Programming and Integer Programming. Working paper. CWI, Amsterdam, The Netherlands.
Li, H.L. and Chang, C.T. (1998). An approximate approach of global optimization for polynomial programming problems. European Journal of Operational Research, 107:625-632.
McCormick, G.P. (1976). Computability of global solutions to factorable nonconvex programs: Part I - Convex underestimating problems. Mathematical Programming, 10:147-175.
McQueen, J.B. (1967). Some methods of classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297. University of California Press, Berkeley, CA.
Myers, R.H. (1995). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley & Sons, Inc., New York, N.Y.
Neu, W.L., Hughes, O., Mason, W.H., Ni, S., Chen, Y., Ganesan, V., Lin, Z., and Tumma, S. (2000). A prototype tool for multidisciplinary design optimization of ships. Ninth Congress of the International Maritime Association of the Mediterranean, Naples, Italy, April.
Sahinidis, N.V. (1996). BARON: A general purpose global optimization software package. Journal of Global Optimization, 8(2):201-205.
Shectman, J.P. and Sahinidis, N.V. (1996). A finite algorithm for global minimization of separable concave programs. In: C.A. Floudas and P.M. Pardalos (eds.), State of the Art in Global Optimization, Computational Methods and Applications. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Sherali, H.D. (1998). Global optimization of nonconvex polynomial programming problems having rational exponents. Journal of Global Optimization, 12:267-283.
Sherali, H.D. and Adams, W.P. (1990). A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics, 3(3):411-430.
Sherali, H.D. and Adams, W.P. (1994). A hierarchy of relaxations and convex hull characterizations for mixed-integer zero-one programming problems. Discrete Applied Mathematics, 52:83-106.
Sherali, H.D. and Adams, W.P. (1999). Reformulation-linearization techniques for discrete optimization problems. In: D.-Z. Du and P.M. Pardalos (eds.), Handbook of Combinatorial Optimization, volume 1, pp. 479-532. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Sherali, H.D. and Desai, J. (2004). A global optimization RLT-based approach for solving the hard clustering problem. Journal of Global Optimization. Accepted.
Sherali, H.D. and Desai, J. (2004). A Global Optimization RLT-Based Approach for Solving the Fuzzy Clustering Problem. Working paper. The Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA.
Sherali, H.D., Desai, J., and Glickman, T.S. (2003a). Allocating Emergency Response Resources to Minimize Risk with Equity Considerations. Working paper. The Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA.
Sherali, H.D., Desai, J., and Rakha, H. (2003b). A Discrete Optimization Approach for Locating Automatic Vehicle Identification Readers for the Provision of Roadway Travel Times. Working paper. The Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA.
Sherali, H.D. and Fraticelli, B.M.P. (2002). Enhancing RLT relaxations via a new class of semidefinite cuts. Journal of Global Optimization, 22:233-261.
Sherali, H.D. and Ganesan, V. (2003). A pseudo-global optimization approach with application to the design of containerships. Journal of Global Optimization, 26:335-360.
Sherali, H.D., Staats, R.W., and Trani, A.A. (2003c). An airspace planning and collaborative decision-making model: Part I - Probabilistic conflicts, workload, and equity considerations. Transportation Science, 37(4):434-456.
Sherali, H.D., Subramanian, S., and Loganathan, G.V. (2001). Effective relaxations and partitioning schemes for solving water distribution network design problems to global optimality. Journal of Global Optimization, 19:1-26.
Sherali, H.D., Totlani, R., and Loganathan, G.V. (1998). Enhanced lower bounds for the global optimization of water distribution networks. Water Resources Research, 34(7):1831-1841.
Sherali, H.D. and Tuncbilek, C.H. (1992). A global optimization algorithm for polynomial programming problems using a Reformulation-Linearization Technique. Journal of Global Optimization, 2:101-112.
Sherali, H.D. and Tuncbilek, C.H. (1995). A reformulation-convexification approach for solving nonconvex quadratic programming problems. Journal of Global Optimization, 7:1-31.
Sherali, H.D. and Tuncbilek, C.H. (1997). Comparison of two Reformulation-Linearization Technique based linear programming relaxations for polynomial programming problems. Journal of Global Optimization, 10:381-390.
Sherali, H.D. and Wang, H. (2001). Global optimization of nonconvex factorable programming problems. Mathematical Programming, 89(3):459-478.
Shor, N.Z. (1998). Nondifferentiable Optimization and Polynomial Problems. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Späth, H. (1980). Cluster Analysis Algorithms for Data Reduction and Classification of Objects. John Wiley and Sons, New York, N.Y.
Tawarmalani, M. and Sahinidis, N.V. (2002a). Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications, Chapter 9. Nonconvex Optimization and its Applications, vol. 65, Kluwer Academic Publishers, Dordrecht, The Netherlands.
Tawarmalani, M. and Sahinidis, N.V. (2002b). Convexification and Global Optimization of the Pooling Problem. Manuscript. Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois.
Vandenberghe, L. and Boyd, S. (1996). Semidefinite programming. SIAM Review, 38(1):49-95.
Vanderbei, R.J. and Benson, H.Y. (2000). On Formulating Semidefinite Programming Problems as Smooth Convex Nonlinear Optimization Problems. Working paper. Department of Operations Research and Financial Engineering, Princeton University, Princeton, N.J.
Vanderplaats Research and Development, Inc. (1995). DOT Users Manual, Version 4.20. Colorado Springs, CO.
Volkov, E.A. (1990). Numerical Methods. Hemisphere Publishing, New York, N.Y.
Walski, T.M. (1984). Analysis of Water Distribution Systems. Van Nostrand Reinhold Company, New York, N.Y.
Chapter 6

BILEVEL PROGRAMMING

Stephan Dempe

Abstract    Bilevel programming problems are hierarchical optimization problems where an objective function is minimized over the graph of the solution set mapping of a parametric optimization problem. In this paper, a selective survey of this living research area is given. The focus is on the main recent approaches to solving such problems, on optimality conditions, as well as on essential features of this problem class.

1.  Introduction
Bilevel programming problems are hierarchical mathematical problems where the set of all variables is partitioned into vectors x and y. Given y, the vector x is to be chosen as an optimal solution x = x(y) of an optimization problem parametrized in y:

min_x {f(x, y) : g(x, y) ≤ 0}.        (6.1)

This problem is the so-called lower level problem or the follower's problem. Here, f : R^n × R^m → R and g : R^n × R^m → R^p. Equality constraints can be added without serious problems. Sometimes, integrality conditions are added to the lower level problem. The point-to-set mapping Ψ : R^m → 2^{R^n} is the solution set mapping of the problem (6.1). The solution x(y) is called the rational reaction of the follower on the leader's choice y. Knowing this reaction and using an optimistic point-of-view, the bilevel problem reads as follows:

min_y {F(x(y), y) : G(y) ≤ 0, x(y) ∈ Ψ(y)}.        (6.2)

This problem is the optimistic leader's problem or the optimistic bilevel problem. The optimistic point-of-view reflects some cooperation between both decision makers, i.e., that the follower will take a best solution x(y) ∈ Ψ(y) with respect to the leader's objective. Since this
is not necessarily true, besides the optimistic one also other formulations of the bilevel programming problem exist, which will be discussed in Section 2. At some places, if different formulations of the bilevel programming problem are discussed or the true formulation is not clear in advance, quotation marks around the minimum operator will be used to express the then unclear definition of the objective function value F(x, y) from the leader's point-of-view (who in general has control over y only). Throughout the paper it is assumed that all the functions F, G_j, f, g_i are sufficiently smooth, i.e., at least twice continuously differentiable. The bilevel programming problem is a linear one if the functions F, G are (affine) linear and the lower level problem is a perturbed linear programming problem. It should be mentioned that also more general bilevel programming problems are considered in the literature where the upper level constraint functions depend on the lower level decision, too. This means that the inequalities G(y) ≤ 0 are replaced by G(x, y) ≤ 0. These more general problems will be addressed in the last paragraph of Section 3 only. Bilevel programming has many potential applications in very different fields, such as transportation, economics, ecology, engineering and others, see Dempe (2003). One problem in chemistry from Dempe (2002) is the following. If a certain mixture y of chemical substances is put into a chemical reactor running at a certain temperature T and a given pressure p, a chemical equilibrium x arises. This equilibrium depends on y, T, p but in general it cannot be influenced directly by the engineer. One can compute this equilibrium by solving a convex optimization problem (6.3), where GN denotes the number of gases and N the total number of reacting substances. Each row of the matrix A corresponds to a chemical element, each column to a substance. Hence, a column gives the amount of the different elements in the substances; x is the vector of the masses of the substances in the resulting chemical equilibrium, whereas y denotes the initial masses of the substances put into the reactor. The value of c_i(p, T) gives the chemical potential of a substance, which depends on the pressure p and on the temperature T. Let x(p, T, y) denote the (unique) optimal solution of this problem. But, for arbitrarily chosen values of (p, T, y), this is not the desired solution of the chemical reactions. The parameters (p, T, y) should be determined such that the resulting chemical equilibrium has a desired quality. The latter means that a large amount of some needed substances or a small amount of others is
contained in the chemical equilibrium. The aim of computing best values for the parameters can be expressed as a second optimization problem whose feasible set depends on the optimal solution of the problem (6.3):

min_{p,T,y} {c^T x : (p, T, y) ∈ Y, x = x(p, T, y)}.
This latter problem is the bilevel programming problem. Bilevel programming has been the topic of a large number of papers, including master's and Ph.D. theses, at least two monographs (Bard, 1998; Dempe, 2002), and edited volumes (Anandalingam and Friesz, 1992; Migdalas et al., 1998); see the annotated bibliography Dempe (2003). These publications present both theoretical results as well as solution approaches and a large number of applications.
2.  Nonunique lower level optimal solutions
In the general definition of the bilevel programming problem, the leader has control over y only. Then, if the set of optimal solutions Ψ(y) of the lower level problem does not reduce to a singleton, the leader is in general not aware of the true value of his objective function unless he knows the follower's reaction to his choice. This leads to an ambiguity in the definition of the problem, which is illustrated by the following example:

EXAMPLE 6.1 (LUCCHETTI ET AL., 1987) Let the lower level problem be given as Ψ(y) = Argmin_x {−xy : 0 ≤ x ≤ 1} and consider the bilevel problem

Then, evaluating the lower level problem and inserting the optimal solution of this problem into the objective function of the upper level one results in

What is unclear is the value of the function y ↦ F(x(y), y) at the point y = 0. The infimal function value of F(x(y), y) is equal to zero, but this value is attained only if F(x(0), 0) = 0. This situation is called
the optimistic position in what follows. If this is not the case then the bilevel problem has no solution. Note that in the general formulation of the bilevel programming problem the leader is not allowed to force the follower to take the one or the other of his optimal solutions. Hence, if the upper level objective function depends on the follower's solution, the leader cannot predict the true value of this function until the follower has communicated his choice. To overcome this ambiguity at least two approaches have been suggested: the optimistic (or weak) (see Loridan and Morgan, 1996) and the pessimistic (or strong) (cf. Loridan and Morgan (1989)) formulations. Both formulations have been compared in Loridan and Morgan (1996). Results especially on perturbed problems can be found e.g. in Lignola and Morgan (1997).
2.1  The optimistic position
The first one is the optimistic position. The leader can use this approach if he supposes that the follower is willing to support him, i.e., that the follower will select a solution x(y) ∈ Ψ(y) which is a best one from the point-of-view of the leader. Let

φ_o(y) := min_x {F(x, y) : x ∈ Ψ(y)}        (6.4)

denote the optimistic objective function value in the upper level. Then, the optimistic position of the bilevel programming problem (or the optimistic bilevel problem) reduces to

min {φ_o(y) : G(y) ≤ 0}.        (6.5)

A pair (x(ȳ), ȳ) such that ȳ solves (6.5) and x(ȳ) solves (6.4) for y = ȳ is an optimal solution for the problem

min_{x,y} {F(x, y) : G(y) ≤ 0, x ∈ Ψ(y)},        (6.6)

too, and vice versa, provided that the latter one has an optimal solution. Most of the contributions to bilevel programming treat this problem. One of the reasons can be that this problem has an optimal solution under quite reasonable assumptions. To formulate an existence result for this problem a regularity condition is needed. For a problem
with equality and inequality constraints (α, β_i, γ_j : R^n → R, i = 1, . . . , p, j = 1, . . . , q), the Mangasarian-Fromovitz constraint qualification at a point x⁰ reads as follows:

(MFCQ) there exists a direction d with ∇β_i(x⁰)d < 0 ∀i : β_i(x⁰) = 0, ∇γ_j(x⁰)d = 0 ∀j, and the gradients {∇γ_j(x⁰) : ∀j} are linearly independent.

THEOREM 6.1 (DEMPE, 2002) If

(C) the set {(x, y) : g(x, y) ≤ 0, G(y) ≤ 0} is non-empty and compact and, for each y with G(y) ≤ 0, the (MFCQ) is satisfied for problem (6.1),

then problem (6.6) has an optimal solution.

Simple examples can be constructed verifying the necessity of the assumptions of the theorem.

EXAMPLE 6.2 Consider the convex lower level problem
with Ψ(y) = {1} if y = 0 and Ψ(y) = {0} otherwise, and the optimistic bilevel problem

min_{x,y} {x² + y² : y ≥ 0, x ∈ Ψ(y)}.

This problem has no optimistic optimal solution since the infimal function value is zero and is not attained. Related results can be found in Harker and Pang (1988).
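A few lines of computation make the non-attainment in Example 6.2 visible; only the solution set Ψ(y) stated above is used, since the lower level problem itself is not reproduced here.

```python
import numpy as np

def psi(y):
    """Lower-level solution set of Example 6.2: {1} if y == 0, {0} otherwise."""
    return {1.0} if y == 0 else {0.0}

def phi_o(y):
    """Optimistic value of the upper-level objective x**2 + y**2 over psi(y)."""
    return min(x ** 2 + y ** 2 for x in psi(y))

ys = np.concatenate(([0.0], np.logspace(-6, 0, 7)))   # y = 0 and y -> 0+
for y in ys:
    print(f"y = {y:.1e}   phi_o(y) = {phi_o(y):.6f}")
# phi_o(0) = 1 while phi_o(y) = y**2 -> 0 as y -> 0+: the infimum 0 is not attained.
```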
2.2  The pessimistic position
The optimistic position cannot be applied, at least in the cases when cooperation is not allowed or not possible, or when the follower's willingness to keep the agreement is not guaranteed. Then, one way out of this unpleasant situation for the leader is to bound the damage resulting from an undesirable selection of the follower. This leads to the pessimistic bilevel programming problem

min {φ_p(y) : G(y) ≤ 0},        (6.8)

where

φ_p(y) := max {F(x, y) : x ∈ Ψ(y)}        (6.9)
denotes the worst upper level objective function value achievable on the set of optimal solutions of the lower level problem. The point-to-set mapping Ψ(·) is lower semicontinuous at a point y = y⁰ if, for each open set A with Ψ(y⁰) ∩ A ≠ ∅, there is an open neighborhood B of y⁰ with Ψ(y) ∩ A ≠ ∅ for all y ∈ B. It is upper semicontinuous at y = y⁰ if, for each open set A with Ψ(y⁰) ⊆ A, there is an open neighborhood B of y⁰ with Ψ(y) ⊆ A for all y ∈ B.

THEOREM 6.2 (DEMPE, 2002) Let the point-to-set mapping Ψ(·) be lower semicontinuous at all points y with G(y) ≤ 0 and let assumption (C) be satisfied. Then, problem (6.8) has an optimal solution.

Similar results can be found in the paper by Lucchetti et al. (1987). Note that the assumptions in this theorem are much more restrictive than in Theorem 6.1. The bilevel programming problem has an implicitly determined set of feasible solutions and, at least looking at (6.5) or (6.8), it is a nonconvex optimization problem with a nondifferentiable objective function. Hence local optimal solutions may occur. To define the notion of a locally optimal solution or of a stationary solution, the usual tools from optimization applied to the problems (6.5) or (6.8) can be used.
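The ambiguity that the optimistic and pessimistic positions resolve can be visualized on the lower level mapping of Example 6.1. In the sketch below, the upper level objective F is a hypothetical choice (it is not the one of the original example), introduced only so that F depends on the follower's selection; φ_o and φ_p are then evaluated by enumeration over a discretization of Ψ(y).

```python
import numpy as np

def psi(y, grid=np.linspace(0.0, 1.0, 101)):
    """Solution set of Argmin_x{-x*y : 0 <= x <= 1} from Example 6.1."""
    if y > 0:
        return np.array([1.0])
    if y < 0:
        return np.array([0.0])
    return grid                      # the whole interval [0, 1] at y = 0

def F(x, y):
    """A hypothetical upper-level objective (not from the original example),
    chosen only so that it depends on the follower's choice x."""
    return (x - 0.3) ** 2 + y ** 2

def phi_o(y):  # optimistic value, cf. (6.4)
    return min(F(x, y) for x in psi(y))

def phi_p(y):  # pessimistic value, cf. (6.9)
    return max(F(x, y) for x in psi(y))

for y in (-0.5, 0.0, 0.5):
    print(f"y = {y:+.1f}:  phi_o = {phi_o(y):.3f}   phi_p = {phi_p(y):.3f}")
# The two functions coincide wherever psi(y) is a singleton and differ at y = 0,
# which is exactly where the leader cannot predict his objective function value.
```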
DEFINITION 6.1 A point (x⁰, y⁰) with G(y⁰) ≤ 0 and x⁰ ∈ Ψ(y⁰) is called a locally optimistic (pessimistic) optimal solution of the bilevel programming problem if

F(x⁰, y⁰) = φ_o(y⁰)  (respectively, F(x⁰, y⁰) = φ_p(y⁰))

and there is an ε > 0 such that

F(x⁰, y⁰) ≤ φ_o(y)  (respectively, F(x⁰, y⁰) ≤ φ_p(y))  for all y with G(y) ≤ 0 and ‖y − y⁰‖ < ε.

2.3  Weak solutions
If the lower level problem is nonconvex (for a fixed parameter value y), the computation of a globally optimal solution in the lower level problem can be computationally intractable (especially computing global optimal solutions for all parameter values). In this case it can be helpful to modify the bilevel problem such that a locally optimal solution in the lower level problem is searched for instead of a globally optimal solution. But, as is shown in an example in Vogel (2002), this can completely change the existence of an optimal solution of the bilevel programming problem.

EXAMPLE 6.3 (VOGEL, 2002) Consider the bilevel problem
with the lower level problem
Then, if Ψ_s(y) denotes the set of global optimal solutions of the last problem and if the optimistic approach is used, then an optimal solution is y* = −2, whereas an optimal solution in the pessimistic approach does not exist. But, if Ψ_s(y) denotes the set of locally optimal solutions of the lower level problem, the following results are obtained: the infimal optimistic function value equals 0, and an optimal solution of this problem does not exist; on the other hand, the infimal pessimistic function value equals 4, and all points y ∈ [−3, 1] are optimal solutions. The reason for this behavior is that the point-to-set mapping of locally optimal solutions of a parametric nonconvex optimization problem is generally not upper semicontinuous. To circumvent this unpleasant situation, Vogel (2002) has defined weaker notions of optimistic and pessimistic solutions. For this, let the point-to-set mapping Ψ̄_s := cl Ψ_s be defined via the closure of the graph of the point-to-set mapping Ψ_s: grph Ψ̄_s := cl grph Ψ_s. Consider the problem
" min "{F(x, y) : G(y) < 0, x E @s(Y))
(6.10)
Y
and define
where Q, is a point-to-set mapping defined by "solutions" of the lower level problem (e.g. local optimal solutions or global optimal solutions or stationary points).
DEFINITION 6.2 (VOGEL, 2002) Consider the problem (6.10).
1. A point ȳ with G(ȳ) ≤ 0 is a weak optimistic solution of the bilevel programming problem if there is some x̄ ∈ Ψ̄_s(ȳ) such that F(x̄, ȳ) = F_o.
2. A point ȳ with G(ȳ) ≤ 0 is a weak pessimistic solution of the bilevel programming problem if there is some x̄ ∈ Ψ̄_s(ȳ) and a sequence {y^k}_{k=1}^{∞} ⊂ dom Ψ_s such that lim_{k→∞} y^k = ȳ and lim_{k→∞} φ_p(y^k) = F_p, as well as F(x̄, ȳ) = F_p.
The definitions of weak optimistic and pessimistic solutions are different since, if a weak optimistic solution exists, it can be shown that the additional property similar to the one formulated for the pessimistic solution is satisfied.
THEOREM 6.3 (VOGEL, 2002) Let {(x, y) : x ∈ Ψ_s(y), G(y) ≤ 0} be nonempty and bounded. Then, the bilevel programming problem (6.10) has a weak optimistic and a weak pessimistic solution.

The assumption of this theorem is satisfied if Ψ_s(y) denotes the set of globally or locally optimal solutions or the set of Fritz John points, and also if it denotes the set of generalized critical points (in the sense of Guddat et al. (1990)) of the lower level problem (6.1).
3.  Relations to other problems
The bilevel programming problem is closely related to other optimization problems which are often used to solve this problem. In the papers Audet et al. (1997) and Frangioni (1995) it is shown that every mixed-discrete optimization problem can be formulated as a bilevel programming problem. This of course implies NP-hardness of bilevel programming. The latter is also shown in

THEOREM 6.4 (DENG, 1998) For any ε > 0 it is NP-hard to find a feasible solution to the linear bilevel programming problem with no more than ε times the optimal value.

Related results can also be found in Hansen et al. (1992). In bicriterial optimization problems, two objective functions are minimized simultaneously over the feasible set (Pardalos et al., 1995). To formulate them, a vector-valued objective function can be used:

"min"_x {a(x) : x ∈ X},        (6.11)
where X ⊆ R^n and a : X → R². In such problems a compromise between the two, in general competing, objective functions a_1(x) and a_2(x) is looked for. Roughly speaking, one approach for such problems is to call a point x* ∈ X a solution if it is not possible to improve both objective functions at x* simultaneously. Such points are clearly compromise points. More formally, x* ∈ X is Pareto optimal for problem (6.11) if there is no x ∈ X with a(x) ≤ a(x*) and a(x) ≠ a(x*). In this definition, the first orthant R²_+ in R² is used as an ordering cone, i.e., to establish a partial ordering in the space of objective function values of problem (6.11), which is R². In a more general formulation, another ordering cone V ⊂ R² is used. The cone V is assumed to be convex and pointed. Then, x* ∈ X is Pareto optimal for problem (6.11) with respect to the ordering cone V if there is no x ∈ X with a(x*) − a(x) ∈ V \ {0}. The relations of bilevel programming to bicriterial optimization have been investigated e.g. in the papers Fliege and Vicente (2003); Haurie et al. (1990); Marcotte and Savard (1991). On the one hand, using R²_+ as the ordering cone, it is easy to see that at least one feasible point of the bilevel programming problem (6.1), (6.6) is Pareto optimal for the corresponding bicriterial optimization problem. But this, in general, is not true for a (local) optimal solution of the bilevel problem. Hence, attempts to solve the bilevel programming problem via bicriterial optimization with the ordering cone R²_+ will in general not work. On the other hand, Fliege and Vicente (2003) show that bicriterial optimization can indeed be used to prove optimality for the bilevel programming problem. But, for doing so, another, more general ordering cone has to be used. Closely related to bilevel programming problems are also the problems of minimizing a function over the efficient set of some multicriterial optimization problem (see Fülöp, 1993; Muu, 2000). One tool often used to reformulate the optimistic bilevel programming problem as a one-level problem are the Karush-Kuhn-Tucker conditions. If a regularity condition is satisfied for the lower level problem (6.1), then the Karush-Kuhn-Tucker conditions are necessary optimality
conditions. They are also sufficient in the case when (6.1) is a convex optimization problem in the x-variables for fixed parameters y. This suggests to replace problem (6.1), (6.6) by

min_{x,y,λ} F(x, y)
subject to  G(y) ≤ 0,
            ∇_x f(x, y) + λ^T ∇_x g(x, y) = 0,
            λ ≥ 0,  g(x, y) ≤ 0,  λ^T g(x, y) = 0.        (6.12)

Problem (6.12) is called an (MPEC), i.e., a mathematical program with equilibrium constraints, in the literature (Luo et al., 1996). The relations between (6.1), (6.4), (6.5) and (6.12) are highlighted in the following theorem.

THEOREM 6.5 (DEMPE, 2002) Consider the optimistic bilevel programming problem (6.1), (6.4), (6.5) and assume that, for each fixed y, the lower level problem (6.1) is a convex optimization problem for which (MFCQ) is satisfied at all feasible points. Then, each local optimal solution for the problem (6.1), (6.4), (6.5) corresponds to a local optimal solution for problem (6.12).

This implies that it is possible to solve the optimistic bilevel programming problem via an (MPEC), but only if the lower level problem has a unique optimal solution for all values of the parameter or if it is possible to avoid false stationary points of the (MPEC). The solution of a pessimistic bilevel programming problem via an (MPEC) is not possible. Note that the opposite implication is not true in general. This can be seen in the following example.
EXAMPLE 6.4 Consider the simple optimistic linear bilevel programming problem

min_{x,y} {y : x ∈ Ψ(y), −1 ≤ y ≤ 1},  where Ψ(y) := Argmin_x {xy : 0 ≤ x ≤ 1},

at the point (x, y) = (0, 0). Then

Ψ(y) = [0, 1] if y = 0,  Ψ(y) = {1} if y < 0,  Ψ(y) = {0} if y > 0.

Take 0 < ε < 1 and set W_ε(0, 0) = (−ε, ε) × (−ε, ε). Then, within W_ε(0, 0), the feasible set of problem (6.12) consists of the points with x ≥ 0, y ≥ 0 and xy = 0, i.e., of the set ({0} × [0, ε)) ∪ ([0, ε) × {0}).
Since the infimal function value of the upper level objective function F(x, y) = y on this set is zero, the point (x, y) = (0, 0) is a local optimal solution of problem (6.12). Due to its definition,

φ_o(y) = y for all y ∈ [−1, 1].

Since this function has no local minimum at y = 0, this point is not a local optimistic optimal solution. The essential reason for the behavior in this example is the lack of lower semicontinuity of the mapping Ψ(y), which makes it reproducible in a more general setting. A first implication of these considerations is that the problems (6.1), (6.4), (6.5) and (6.1), (6.6) are not equivalent if local optimal solutions are considered, and a second one is that not all local optimal solutions of the problem (6.12) correspond in general to local optimal solutions of the problem (6.1), (6.4), (6.5). It should be noted that, under the assumptions of Theorem 6.5 and if the optimal solutions of the lower level problem are strongly stable in the sense of Kojima (1980) (cf. Theorem 6.6 below), then the optimistic bilevel programming problem (6.1), (6.2) is equivalent to the (MPEC) (6.12). The following example from Mirrlees (1999) shows that this result is no longer valid if the convexity assumption is dropped.
EXAMPLE 6.5 Consider the problem

where Ψ(y) is the set of optimal solutions of the following unconstrained optimization problem on the real axis:

Then, the necessary optimality conditions for the lower level problem are

y(x + 1) exp{−(x + 1)²} + (x − 1) exp{−(x − 1)²} = 0,

which has three solutions for 0.344 ≤ y ≤ 2.903. The global optimum of the lower level problem is uniquely determined for all y ≠ 1 and it has a jump at the point y = 1. Here the global optimum of the lower level problem can be found at the points x = ±0.957. The point (x⁰, y⁰) = (0.957, 1) is also the global optimum of the optimistic bilevel problem.
But if the lower level problem is replaced with its necessary optimality conditions and the necessary optimality conditions for the resulting problem are solved, then three solutions, (x, y) = (0.895, 1.99), (x, y) = (0.42, 2.19), and (x, y) = (−0.98, 1.98), are obtained. Surprisingly, the global optimal solution of the bilevel problem is not obtained with this approach. The reason for this is that the problem

min {(y − 2)² + (x − 1)² : y(x + 1) exp{−(x + 1)²} + (x − 1) exp{−(x − 1)²} = 0}

has a much larger feasible set than the bilevel problem. And this feasible set has no jump at the point (x, y) = (0.957, 1) but is equal to a certain connected curve in R². And on this curve the objective function has no stationary point at the optimal solution of the bilevel problem.

A surprising result is also the dependence of bilevel programming problems on irrelevant constraints, cf. Macal and Hurter (1997). This means that dropping some lower level constraints which are not active at an optimal solution (x⁰, y⁰) of the bilevel programming problem can change the problem drastically, such that (x⁰, y⁰) will not remain optimal. For the more general problems with G(y) ≤ 0 replaced by G(x, y) ≤ 0, the location of the constraints is essential. Moving one constraint from the lower to the upper level (or vice versa) generally changes the problem significantly; it is even possible that one of the problems has an optimal solution whereas the other has not. To see this, consider a lower level problem with a non-trivial linear objective function and move all the constraints to the upper level. In this more general setting, the feasible set even of linear bilevel programming problems need not be connected. This can be seen in the following example.

EXAMPLE 6.6 (DEMPE, 2002) Consider the problem
with
8 10 Then, the feasible set of the bilevel problem is equal to y E [E, ~[3,8].
4. 4.1
Optimality conditions Implicit functions approach
The formuIation of optimality conditions for bilevel programming problems usually starts with a suitable single-level reformulation of the
6 Bilevel Programming
177
problem. First conditions are based on strong stability of optimal solutions for problem (6.1) and replace the implicit constraint x E $ ( y ) by the implicitly determined function x = x ( y ) with { x ( y ) ) = \Ir(y). Let L ( x , y, A) := f ( x ,y ) X T g ( x ,y ) denote the Lagrange function for problem (6.1) and
+
denote its set of regular Lagrange multipliers.
THEOREM 6.6 (KOJIMA,1 9 8 0 ) Consider problem (6.1) and let xO be a locally optimal solution of this problem at y = yo. Assume ( M F C Q ) and
(SSOC) for all X0 E A ( x O ,yo) and for all d
#0
with
= 0 'di E J(x') := { j : A: > 0 ) V Z g i ( x OyO)d ,
the inequality d T V & ~ ( x Oyo, , XO)d > 0 holds are satisfied. Then, the solution xO is strongly stable, i.e. there exist open neighborhoods U of x%nd V of yo and a uniquely determined function x : V -+ U being the unique locally optimal solution of (6.1) i n U for all y E V. Assumption ( S S O C ) is the so-called strong sufJicient optimality condition of second order. If the more restrictive linear independence constraint qualification replaces ( M F C Q ) then the solution xO is strongly stable if and only if ( S S O C ) is satisfied, see Klatte and Kummer (2002). If the assumptions of Theorem 6.6 are satisfied for x0 E $ ( y o ) and problem (6.1) is a convex optimizakion problem for fixed y then, problem (6.1), (6.2) can locally equivalently be replaced with
Using the chain rule, necessary and sufficient optimality conditions can now be derived, provided it is possible to compute, say, a directional derivative for the function x ( y ) in the point yo.
Consider problem (6.1) THEOREM 6.7 (RALPHAND DEMPE,1 9 9 5 ) and let xO be a locally optimal solution of this problem at y = yo. Assume that ( M F C Q ) , ( S S O C ) together with (CRCQ) there exists a n open neighborhood W of ( x O yo) , such that, for each subset I C_ I ~ x O , yo) := { j : g,(xO,yo) = 0 ) the family of gradient vectors { V Z g i ( x ,y ) : i E I ) has the same rank o n W are satisfied at ( x O , Then, the by Theorem 6.6 locally uniquely determined function x ( y ) is directionally differentiable at the point y = yo
178
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
with the directional derivative in direction r being the unique optimal solution of the problem min {$dTV:,l(xO,yo, X)d d
X)r)
if i E J ( X ) ,
= 0,
subject to Vxgi ( X for all
2 + dT V,l(xO,
5 0, if i
E I(xO,
yo) \
J(X)
E ~ r ~ m a x ~ { Vyo, ~ lX)r ( x: X~ E, n ( x Oyo)}. ,
Assumption (CRCQ) is called constant rank constraint qualification. I f this assumption is not satisfied (while maintaining the other assumptions) the solution function remains directionally differentiable, see Shapiro (1988), but the nice method to compute it is lost. Moreover, the following theorem is not true without (CRCQ),see Dempe (2002). Denote the directional derivative of x ( y ) at y = yo in direction r by xl(yO; r ) . Then, necessary and sufficient optimality conditions for the bilevel programming problem can be formulated. THEOREM 6.8 ( D E M P E1992) , Consider the problem (6.1), (6.2) at a and let (MFCQ), (SSOC) and point ( x Oyo) , with G ( y O )5 0, xO E (CRCQ) be satisfied for the lower level problem. Moreover assume that (6.1) is a convex optimization problem parametrized in y. Then,
I . if yo is a locally optimal solution of this problem, the following optimization problem has the optimal objective function value zero: min a 0
0
a,r 1 0
subject to V X F ( x,y ) x ( y ;r ) V G i ( y O5 ) a,
+ V,F(x
0
, Y 0 )r 5 a ,
'v'i : Gi(yO)= 0 ,
Ilrll 5 1. 2. if the optimal function value v of the problem
is greater than zero ( v > O ) , yo is a strict local optimal solution of the problem (6.1), (6.2), i.e. for each 0 < z < v there is E > 0 such that F ( x ,Y ) Z F ( x O YO) , + zlly - Y0I1
6 Bilevel Programming
179
, If the (MFCQ) is satisfied at the point Put 3 ( y ) := F ( ~ ( y )y). for the problem (6.13) then the necessary optimality condition of first order in Theorem 6.8 means that 3'(y0; r )
> O Vr satisfying V G ~ ( Y OI) ~0,
i : ~ i ( y O= ) 0.
This property is usually called Bouligand stationarity (or B-stationarity) of the point yo.
4.2
Using the KKT conditions
If the Karush-Kuhn-Tucker conditions are applied to replace the lower level problem by a system of equations and inequalities, problem (6.12) is obtained. The Example 6.5 shows that it is possible to obtain necessary optimality conditions for the bilevel programming problem by this approach only in the case when the lower level problem is a convex parametric one and also only using the optimistic position. But even in this case this is not so easy since the familiar regularity conditions are not satisfied for this problem.
THEOREM 6.9 (SCHEELAND SCHOLTES,2000) For problem (6.12) the Mangasarian-Fromowitx constraint qualification (MFCQ) is violated at every feasible point. To circumvent the resulting difficulties for the construction of KarushKuhn-Tucker type necessary optimality conditions for the bilevel programming problem, in Scheel and Scholtes (2000) a nonsmooth version of the KKT reformulation of the optimistic bilevel programming problem is constructed: min F ( x , y)
X,YJ
subject to G(y)
< 0,
V A x , Y: 4 = 0, mini-g(x, y), A) = 0.
(6.14)
Here, for a , b E Rn,the formula min{a, b) = 0 is understood component wise. For problem (6.14) the following generalized variant of the linear independence constraint qualification can be defined (Scholtes and Stohr, 2001):
(PLICQ) The piecewise linear independence constraint qualification is satisfied for the problem (6.14) at a point (xO,yo, A') if the gradients of all
180
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
the vanishing components of the constraint functions G(y), V xL(x, y, A), g(x, y), X are linearly independent. Problem (6.14) can be investigated by considering the following patchwork of nonlinear programs for fixed sets I: min F(x, y) X>YJ
Then, the piecewise linear independence constraint qualification is valid for problem (6.14) at some point (xO,yo, XO) if and only if it is satisfied for each of the problems (6.15) for all sets J ( X O ) C I 2 I ( x O , The following theorem says that the (PLICQ) is generically satisfied. For this define the set % = { ( F , G , f , g ) E C ( R m-tn , RISSSISp): (PLICQ) is satisfied at each feasible point of (6.14) with llAllco I B) for an arbitrary constant 0 < B < oo, llXllco = max{lXil: 1 5 i I p ) is the L,-norm of a vector X E RP and 1 2 2. Roughly speaking, the zero neighborhood in the (Whitney) Ck topology in RP is indexed by a positive continuous function E : Rp -t R+ and contains all (vector-valued) functions h E Ck(Rp,Rt) such that each component function together with all its derivatives up to order b are bounded by the function E . For details the interested reader is referred to Hirsch (1994).
< <
THEOREM 6.10 (SCHOLTES AND STOHR,2001) For 2 k 1, the set 7-& is open in the Ck-topology. Moreover, for 1 > rn, the set 7 1 iis also dense in the Ck-topologyfor all 2 5 k 5 1. Now, after this excursion to regularity, the description of necessary optimality conditions for the bilevel programming problem with convex lower level problems using the optimistic position is continued. For the origin of the following theorem for mathematical programs with equilibrium constraints see Scheel and Scholtes (2000). There a relaxation of problem (6.12) is considered:
6 Bilevel Programming
min F ( x ,y) X,YJ,Y subject to V x L ( x ,y, A, p) = 0,
G ( y ) 1 0,
In the following theorem, a more restrictive regularity condition than (MFCQ) is needed: (SMFCQ) The strict Mangasarian-Fromowitx constraint qualification (SMFCQ) is satisfied at xO for problem (6.7) if there exists a Lagrange multiplier ( A , p ) ,
as well as a direction d satisfying
v P i ( x o ) d< 0,
for each i with Pi(xO)= Xi = 0,
VPi(xo)d= 0,
for each i with Xi
v y j jxo)d
for each j
and {vP~(xO): Xi dent.
-
0,
> 0,
= 1, . . . , q ) are linearly indepen-
> 0 ) )1) { v r j ( x O :) j
Note that this condition is im.plied by (PLICQ).
T H E O R E6.11 M Let ( x O , X O ) be a local minimizer of problem (6.14) and use zO = ( x0 ,y 0 ). w
If the (MFCQ) is valid for problem (6.16) at ( x Oyo, , A'), then there exist multipliers ( K , W , C , [ ) satisfying
+
+
V F ( ~ ' ) rcT (0,v y ~ ( y 0 ) )Q ( V ~ L (0 Z, X 0 )w) vxg(zO)w - ,.$ = 0, gi(~O)= & 0, 0
XiJi=O, Cili
Vi7 Vi,
2 07 i E K7
f i T ~ ( y 0=) 0,
L 0,
+ c T V g ( z O )= 0,
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
where K = { i : gi(xO, with respect to ( x ,y).
= A: = 0 ) and V denotes the gradient
If the (SMFCQ) is fulfilled for the problem (6.16), then there exist unique multipliers ( K , w , 5, E ) solving the last system of equations and inequalities with CiCi 1 0, i E K being replaced b y
For related optimality conditions see e.g. Flegel and Kanzow (2002, 2003).
5.
5.1
Solution algorithms Implicit function approach
To solve the bilevel programming problem, it is reformulated as a onelevel problem. The first approach again uses the implicit determined solution function of the convex lower level problem x ( ~provided ) this function is uniquely determined. If the assumptions ( C ) , (MFCQ),(SSOC), and (CRCQ) are satisfied for (6.1) at every point y with G ( y ) 0 , then the resulting problem
<
has an objective function being piecewise continuously differentiable (see Ralph and Dempe, 1995). The pieces of the solution function x ( y ) are obtained by replacing some of the active inequalities gi ( x ,y) 5 0, i E 5 in the lower level problem by equations gi(x,y) = 0 , i E 7, where
J(AO):= { i : A: > 0 ) G I 2 ~ ( x ( y O ) , := { j : g j ( x ( y O )yo) , =0) and XO is a Lagrange multiplier vector in the lower level problem corIf the constraints responding to the optimal solution x ( y O )for y = gi(x,y) 5 0 in problem (6.1) are locally replaced by gi(x,y) = 0, i E 5, the resulting lower level problems are
mxi n { f ( x , y ) :gi(x1y)= O,Yi E
I).
(6.17)
If the gradients { ~ , ~ ~ ( x yo) ( y :~i )E ,7 ) are moreover linearly independent (which can be guaranteed for small sets 7 > J(AO)with A0 being a vertex in A ( X ( ~ O ) ,y o ) ) , then the optimal solution function x f ( . ) of the problem (6.17) is differentiable (Fiacco, 1983). Let Z denote the family of all index sets determined by the above two demands for all vertices A0 E A ( x ( ~yo). ~ ~ ,
6 Bilevel Programming
183
THEOREM 6.12 (DEMPEAND PALLASCHKE, 1997) Consider problem (6.1) at the point x0 := (x(yO),yo) and let (MFCQ), (SSOC) as well as (CRCQ) be satisfied there. If the condition
(FRR) For each vertex X0 E A(zO)the matrix
has full row rank n
+ II (x(yO),yo) 1
is valid, then the generalized derivative of the function x(.) at the point y = yo in the sense of Clarke (1983) is ax(yo) = conv
U Vx (y ). I
0
IEZ
Using this formula, a bundle algorithm (cf. Outrata at al., 1998) can be derived to solve the problem (6.13). Since the full description of bundle algorithms is rather lengthy, the interested reader is referred e.g. to Outrata at al. (1998). Repeating the results in Schramm (1989) (cf. also Outrata at al., 1998) the following result is obtained: THEOREM6.13 (DEMPE,2002) If the assumptions (C), (MFCQ), (CRCQ), (SSOC), and (FRR) are satisfied for the convex lower level problem (6.1) at all po.ints (x, y), x E Q(y), G(y) = 0, and the sequence i n the bundle algorithm remains of iteration points { (x(yk),y k >Xk) bounded, then this algorithm computes a sequence {(x(yk),yk, A') having at least one accumulation point (x(yO),yo, XO) with
}El
}El
If assumption (FRR) is not satisfied, then the point (x(yO),yo) is pseudostationary i n the sense of Mikhalevich et al. (1987). Hence, under suitable assumptions the bundle algorithm computes a Clarke stationary point. Such points are in general not Bouligand stationary.
5.2
A smoothing method
To solve problem (6.12) several authors (e.g. Fukushima and Pang, 1999) use an NCP function approach to replace the complementarity
184
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
constraints. This results in the nondifferentiable problem min F ( x , y)
Z,YJ
where a function
@(a,
.) satisfying
is called an NCP function. Examples and properties of NCP functions can be found in the book of Geiger and Kanzow (2003). NCP functions are inherently nondifferentiable, and algorithms solving problem (6.18) use smoothed NCP functions. Fukushima and Pang (1999) use the function
and solve the resulting problems min F(x, y) X,YJ subject to G(y) 1 0, VxL(x, Y, 4 = 0)
(6.19)
for E + 0 with suitable standard algorithms. Hence selecting an arbitrary sequence { c k ) g l they compute a sequence {(xk,yk, A k ) ) g l of solutions and investigate the properties of the accumulation points of this sequence. To formulate their convergence result, the assumption of week nondegeneracy is needed. To formulate this assumption consider the Clarke derivative of the function @(- gi(x, y), &). This Clarke derivative exists and is contained in the set
Let the point (3,jj,X) be an accumulation point of the sequence {(xk,y k , A k ) } ~ , .It is then easy to see that, for each i E I ( Z , j j ) \ J ( X ) any accumulation point of the sequence
6 Bilevel Programming
belongs to Ci (3,jj, X), hence is of the form
with ( 1 - ~ ~ ) ~ + ( 15- 1. ~ ~It )is~said that the sequence {(xk,yk, X k ) ) z l is asymptotically weakly nondegenerate, if in this formula neither Ji nor Xi vanishes for any accumulation point of {(xk,yk, Xk))E1. Roughly speaking this means that both gi(zk,yk) and approach zero in the same order of magnitude (see Fukushima and Pang, 1999).
THEOREM 6.14 (FUKUSHIMA A N D PANG,1999) Let for each point (xk,yk, Xk) the necessary optimality conditions of second order for prob~ l e m (6.19) be satisfied. Suppose that the sequence {(xk,yk, x ~ ) ) Econverges to some (Z,jj, X) for k + oo. If the (PLICQ) holds at the limit is asymptotically weakly nonpoint and the sequence {(xk,yk, Xk))E1 degenerate, then (3,jj, A) is a Bouligand stationary solution for problem (6.12).
5.3
SQP methods
Recently several authors have reported (in view of the violated regularity condition rather surprisingly) a good behavior of SQP methods for solving mathematical programs with equilibrium constraints (see Anitescu, 2002; Fletcher et al., 2002; Fletcher and Leyffer, 2002). To sketch these results consider a bilevel programming problem (6.6) with a convex parametric lower level problem (6.1) and assume that a regularity assumption is satisfied for each fixed parameter value y with G ( y ) 5 0. Then, by Theorem 6.5, a locally optimal solution of the bilevel programming problem corresponds to a locally optimal solution for the problem (6.12). Consequently, in order to compute local minima of the bilevel problem, problem (6.12) can be solved. In doing this, Anitescu (2002) uses the elastic mode approach in a sequential quadratic programming algorithm solving (6.12). Th'is means that if a quadratic programming problem minimizing a quadratic approximation of the objective function of problem (6.12) subject to a linear approximation of the constraints of this problem has a feasible solution with bounded Lagrange multipliers then the solution of this problem is used as a search direction. And if not, a regularized quadratic programming problem is used to compute this search direction. For simplicity, this idea is described for problem (6.7). Then this means that the following problem is used to compute this search direction:
186
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
subject to Pi(x) %(x)
+ VP(x)d 5 0,
+ Vy(x)d = 0,
Vi = 1 , . . . , p V j = 1 , . . . ,q.
Here, W can be the Hessian matrix of the Lagrange function of the problem (6.7) or another positive definite matrix approximating this Hessian. If this problem has no feasible solution or unbounded Lagrange multipliers the solution of problem (6.7) (or accordingly the solution process for the problem (6.12)) with the sequential quadratic programming approach is replaced by the solution of the following problem by the same approach:
where c is a sufficiently large constant. This is the elastic mode SQP method. To implement t,he idea of Anitescu (2002) assume that the problem (6.16) satisfies the (SMFCQ) and that the quadratic growth condition at a point x = x 0
(QGC) There exists a > 0 satisfying
for all x in some open neighborhood of xO is valid for problem (6.12) at a locally optimal solution of this problem.
THEOREM 6.15 (ANITESCU,2002) If the above two assumptions are satisfied then the elastic mode sequential quadratic programming algorithm computes a locally optimal solution of the problem (6.12) provided it is started suficiently close to that solution and the constant c is sufficiently large. Using stronger assumptions Fletcher et al. (2002) have even been able to prove local Q-quadratic convergence of sequential quadratic programming algorithms to solutions of (6.12).
6.
Discrete bilevel programming
If integer variables appear in the lower or upper levels of a bilevel programming problem the investigation becomes more difficult and the number of references is rather small, see Dempe (2003). With respect to the existence of optimal solutions the location of the discrete variables is important Vicente et al. (1996). Most difficult is the situation when the lower level problem is a parametric discrete one and the upper level.
6 Bilevel Programming
187
problem is a continuous problem. Then the graph of the solution set mapping @(.) is in general neither closed nor open. The other cases can be treated more or less analogously to the continuous problems. One way to solve discrete optimization problems (and also bilevel programming problems) is branch-and-bound. If the integrality conditions in both levels are dropped at the beginning and are introduced via the branching procedure, then a global optimal solution of the relaxed problem, which occasionally proves to be feasible for the bilevel problem is in general not an optimal solution for the bilevel programming problem. Moreover, the usual fathoming procedure is not valid, see Moore and Bard, (1990). Fathoming is used in a branch-and-bound algorithm to decide that a node of the enumeration tree need not be explored further. This decision cannot be based on the comparison of the incumbent objective value with the optimal objective function value of the relaxed problem if an optimal solution of the latter problem proves to be feasible for the bilevel problem. Mixed-discrete linear bilevel programming problems with continuous lower level problems have been transformed into linear bilevel problems in Audet et al. (1997) which opens a second way for solving such problems. Other solution methods include one using explicitly the solution set mapping of a right-hand side parametrized Boolean knapsack problem in the lower level and another one using cutting planes in the discrete lower level problem with parameters in the objective function only (see Dempe, 2002). To describe a further approach consider a linear bilevel programming problem with integer variables in the upper level problem only:
subject to Alx 5 bl, x 2 0, integer where y solves
(6.20)
Then, an idea of White and Anandalingam (1993) can be used to transform this problem into a mixed discrete optimization problem, For this, apply the Karush-Kuhn-Tucker conditions to the lower level problem.
188
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
This transforms problem (6.20) into
subject to A 1 x 5 bl,
x 1 0 , integer,
B ~ >Ab2,
(6.21)
Now use a penalty function approach to get rid of the complementarity constraint resulting in the problem
subject to A l x
bl,
x
2 0,
integer,
(6.22)
A 2 x + B2y = b2, y 1 0 , B,TX b2. By application of the results in White and Anandalingam (1993) the following is obtained:
THEOREM 6.16 Assume that problem (6.22) has a n optimal solution for some positive KO. Then, the problem (6.22) describes a n exact penalty function approach for problem (6.20), i.e. there i s a number K* such that the optimal solutions of the problems (6.22) and (6.20) for all K 2 K* coincide. This idea has been used in Dempe and Kalashnikov (2002) to solve an application problem in gas industry. Moreover, the implications of a movement of the discreteness condition from the lower to the upper level problems has been touched there.
7.
Conclusion
In the paper a selective survey of results in bilevel programming has been given. It was not the intention of the author to give a detailed description of one or two results but rather to give an overview over different directions of research and to describe some of the challenges of this topic. Since bilevel programming is a very living area a huge number of questions remain open. Among others, these include optimality conditions as well as solution algorithms for problems with nonconvex lower level problems, discrete bilevel programming problems in every context, and many questions related to the investigation of pessimistic bilevel
6 Bilevel Programming
189
programming problems. Also, one implication from NP-hardness often used in theory is that such problems should be solved with approximation algorithms which, if possible, should be complemented by a bound on the accuracy of the computed solution. One example for such an approximation algorithm can be found in Marcotte (1986) but in general the description of such algorithms is a challenging task for future research.
References Anandalingam, G. and F'riesz, T. (eds.). (1992). Hierarchical Optimization. Annals of Operations Research, vol. 24. Anitescu, M. (2002). On solving mathematical programs with complementarity constraints as nonlinear programs. Technical Report No. ANLINCS-P864-1200, Department of Mathematics, University of Pittsburgh,. Audet, C., Hansen, P., Jaumard, B., and Savard, G. (1997). Links between linear bilevel and mixed 0-1 programming problems. Journal of Optimization Theory and Applications, 93:273-300. Bard, J.F. (1998). Practical Bilevel Optimixation: Algorithms and Applications. Kluwer Academic Publishers, Dordrecht. Clarke, F.H. (1983). Optimixation and Nonsmooth Analysis. John Wiley & Sons, New York. Dempe, S. (1992). A necessary and a sufficient optimality condition for bilevel programming problems. Optimization, 25:341-354. Dempe, S. (2002). Foundations of Bilevel Programming. Kluwer Academic Publishers, Dordrecht. Dempe, S. (2003). Annotated bibliography on bilevel programming and mathematical programs with equilibrium constraints. Optimization, 52:333-359. Dempe, S. and Kalashnikov, V. (2002). Discrete bilevel programming: Application to a gas shipper's problem. Preprint No. 2002-02, T U Bergakademie Freiberg, Fakultat fiir Mathematik und Informatik. Dempe, S. and ~allaschke,D. (1997). Quasidifferentiability of optimal solutions in parametric nonlinear optimization. Optimixation, 40:l--24. Deng, X. (1998). Complexity issues in bilevel linear programming. In: Multilevel Optimization: Algorithms and Applications (A. Migdalas,
190
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
P.M. Pardalos, and P. Varbrand, eds.), pp. 149-164, Kluwer Academic Publishers, Dordrecht . Fiacco, A.V. (1983). Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Academic Press, New York. Flegel, M.L. and Kanzow, C. (2002). Optimality conditions for mathematical programs with equilibrium constraints: Fritz John and Abadie-Type approaches. Report, Universitat Wurzburg, Germany. Flegel, M.L. and Kanzow, C. (2003). A Fritz John approach to first order optimality conditions for mathematical programs with equilibrium constraints. Optimization, 52:277-286. Fletcher, R. and Leyffer, S. (2002). Numerical experience with solving MPECs as NLPs. Numerical Analysis Report NA/210, Department sf Mathematics, University of Dundee, UK. Fletcher, R., Leyffer, S., Ralph, D., and Scholtes, S. (2002). Local Convergence of SQP Methods for Mathematical Programs with Equilibrium Constraints. Numerical Analysis Report NA/209, Department of Mathematics, University of Dundee, UK. Fliege, J. and Vicente, L.N. (2003). A Bicriteria Approach to Bilevel Optimization, Technical Report, Fachbereich Mathematik, Universitat Dortmund, Germany. Frangioni, A.(1995). ' o n a new class of bilevel programming problems and its use for reformulating mixed integer problems. European Journal of Operational Research, 82:615-646. Fukushima, M. and Pang, J.-S. (1999). Convergence of a smoothing continuation method for mathematical programs with complementarity constraints. In: Ill-posed Variational Problems and Regularization Techniques (M. Thera and R. Tichatschke, eds.). Lecture Notes in Economics and Mathematical Systems, No. 477, Springer-Verlag, Berlin. Fulop., J . (1993). On the Equivalence between a Linear Bilevel Programming Problem and Linear Optimization over the Efficient Set. Working Paper, No. W P 93-1, Laboratory of Operations Research and Decision Systems, Computer and Automation Institute, Hungarian Academy of Sciences. Geiger, C. and Kanzow, C. (2003). Theorie und Numerik restrzngierter Optimierungsaufgaben. Springer-Verlag, Berlin.
6 Bilevel Programming
191
Guddat, J., Guerra Vasquez, F., and Jongen, H.Th. (1990). Parametric Optimization: Singularities, Pathfollowing and Jumps. John Wiley & Sons, Chichester and B.G. Teubner, Stuttgart. Hansen, P., Jaumard, B., and Savard, G. (1992). New branch-and-bound rules for linear bilevel programming. SIAM Journal on Scientific and Statistical Computing, 13:1194-1217. Harker, P.T. and Pang, J.-S. (1988). Existence of optimal solutions to mathematical programs with equilibrium constraints. Operations Research Letters, 7:61-64. Haurie, A., Savard, G., and White, D. (1990). A note on: An efficient point algorithm for a linear two-stage optimization problem. Operations Research, 38:553-555. Hirsch, M.W. (1994). Differential Topology. Springer-Verlag, Berlin. Klatte, D. and Kummer, B. (2002). Nonsmooth Equations in Optimixation; Regularity, Calculus, Methods and Applications. Kluwer Academic Publishers, Dordrecht. Kojima, M. (1980). Strongly stable stationary solutions in nonlinear programs. In: Analysis and Computation of Fixed Points (S.M. Robinson, ed.), pp. 93-138, Academic Press, New York. Lignola, M.B. and Morgan, J. (1997). Stability of regularized bilevel programming problems. Journal of Optimixation Theory and Applications, 93:575-596. Loridan, P. and Morgan, J. (1989). New results on approximate solutions in two-level optimization. Optimixation, 20:819-836. Loridan, P. and Morgan, J. (1996). Weak via strong Stackelberg problem: New results. Journal of Global Optimization, 8:263-287. Lucchetti, R., Mignanego, F., and Pieri, G. (1987). Existence theorem of equilibrium points in Stackelberg games with constraints. Optimixation, 18:857-866. Luo, Z.-Q., Pang, J.-S., and Ralph, D. (1996). Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge. Macal, C.M. and Hurter, A.P. (1997). Dependence of bilevel mathematical programs on irrelevant constraints. Computers and Operations Research, 24:ll29-ll4O.
192
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Marcotte, P. (1986). Network design problem with congestion effects: A case of bilevel programming. Mathematical Programming, 34:142-162. Marcotte, P. and Savard, G. (1991). A note on the Pareto optimality of solutions to the linear bilevel programming problem. Computers and Operations Research, 18:355-359. Multilevel optimization: algorithms and applications. Nonconvex Optimixation and its Applications (Athanasios Migdalas, Panos M. Parda10s and Peter Varbrand, eds.), vol. 20, Kluwer Academic Publishers, Dordrecht . Mikhalevich, V.S., Gupal, A.M., and Norkin, V.I. (1987). Methods of Nonconvex Optimixation. Nauka, Moscow (in Russian). Mirrlees, J.A. (1999). The theory of moral hazard and unobservable bevaviour: Part I. Review of Economic Studies, 66:3-21. Moore, J. and Bard, J.F. (1990). The mixed integer linear bilevel programming problem. Operations Research, 38:911-921. Muu, L.D. (2000). On the construction of initial polyhedral convex set for optimization problems over the efficient set and bilevel linear programs. Vietnam Journal of Mathematics, 28:177-182. Outrata, J., KoEvara, M., and Zowe, J. (1998). Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. Kluwer Academic Publishers, Dordrecht. Pardalos, P.M., Siskos, U.,and Zopounidis, C., eds. (1995). Advances in Multicriteria Analysis. Kluwer Academic Publishers, Dordrecht. Ralph, D. and Dempe, S. (1995). Directional derivatives of the solution of a parametric nonlinear program. Mathematical Programming, 70:159172. Scheel, H. and Scholtes, S. (2000). Mathematical programs with equilibrium constraints: stationarity, optimality, and sensitivity. Mathematics of Operations Research, 25:l-22. Scholtes, S. and Stohr, M. (2001). How stringent is the linear independence assumption for mathematical programs with stationarity constraints? Mathematics of Operations Research, 262351-863. Schramm, H. (1989). Eine Kombznation von bundle- und trust-regionVerfahren xur Losung nichtdiflerenxierbarer Optimierungsprobleme, No. 30, Bayreuther Mathematische Schriften, Bayreuth,.
6 Bilevel Programming
193
Shapiro, A. (1988).Sensitivity analysis of nonlinear programs and differentiability properties of metric projections. SIAM Journal Control Optimization, 26:628-645. Vicente, L.N.,Savard, G., and Judice, J.J. (1996). The discrete linear bilevel programming problem. Journal of Optimization Theory and Applications, 89:597-614. Vogel, S. (2002). Zwei-Ebenen-Optimierungsaufgaben mit nichtkonvexer Zielfunktion in der unteren Ebene: Pfadverfolgung und Spriinge. Ph. D thesis, Technische Universitat Bergakademie Freiberg. White, D.J. and Anandalingam, G. (1993). A penalty function approach for solving bi-level linear programs. Journal of Global Optimization, 3:397-419.
Chapter 7
APPLICATIONS OF GLOBAL OPTIMIZATION TO PORTFOLIO ANALYSIS Hiroshi Konno .bstract
1.
We will survey some of the recent successful applications of deterministic global optimization methods to financial problems. Problems to be discussed are mean-risk models under nonconvex transaction cost, minimal transaction unit constraints and cardinality constraints. Also, we will discuss several bond portfolio optimization problems, long term portfolio optimization problems and others. Problems to be discussed are concave/d.c. minimization problems, minimization of a nonconvex fractional function and a sum of several fractional functions over a polytope, optimization over a nonconvex efficient set and so on. Readers will find that a number of difficult global optimization problems have been solved in practice and that there is a big room for applications of global optimization methods in finance.
Introduction
The purpose of this paper is to review some of the recent successful applications of global optimization methodologies in portfolio theory. Portfolio theory was originated by H. Markowitz in 1952 and has since developed into diverse field of quantitative finance including market risk analysis, credit risk analysis, pricing of derivative securities, structured finance, securitization? real options and so on. Mathema,tical programming is widely used in these areas, but applications in market risk malysis are by far the most important. Also, it is virtually the only area in finance where global optimization methodologies have been applied in a successful way. The starting point of the portfolio theory is the mean-variance (MV) model (Konno and Watanabe, 1996) in which the risk measured by the
196
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
variance of the rate of return of portfolio is minimized subject to the constraint on the level of expected return. This problem is formulated as a convex quadratic programming problem. Though mathematically simple, it took more than 30 years before a large scale mean-variance model was solved in practice, due to the computational difficultly associated with handling a completely dense variance-covariance matrix. The breakthrough occurred in 1984, when Perold (1984) solved a large scale mean-variance problem using a factor model approach and sparse matrix technologies. Twenty years later, we are now able to solve a very large scale MV model consisting of over 10,000 variables on a personal computer. If we replace variance by absolute deviation as a measure of risk, then we can solve the resulting mean-absolute deviation (MAD) model (Konno and Yamazaki, 1991) even when there are more than a million variables since the problem is reduced to a linear programming problem. Both MV model and MAD model can be formulated as a convex minimization problem, so it has little to do with "global" optimization. However, when we extend the model one step further, we need to introduce a variety of nonconvex terms. These include, among others transaction cost, tax, market impact, minimal transaction unit constraints and cardinality constraints. Then we need to apply global optimization approach to solve the resulting nonconvex problems. By global optimization methods, we mean here deterministic algorithms as discussed in the textbook of Horst and Tuy (1996). Also, we concentrate on a class of exact algorithms, i.e., those which generate an optimal solution in the limit or &-optimalsolution in finitely many step. There are still relatively few successful applications of global optimization to finance. Reasons are two-folds. First, deterministic global optimization is a rather new area. In fact, deterministic and exact algorithms are neglected in a survey paper of Rinnooy-Kan and Timmer (1989) appeared in 1989. This means that solving a non-convex problem in a deterministic way has been considered intractable until mid 1980's unless the problem has some special structure, such as concave minimization on an acyclic network (Zangwill, 1968). Heuristic and multi-start local search methods were the only practical methods for handling nonconvex problems without special structures. Therefore, most financial engineers are not aware of recent progress in global optimization and thus try to formulate the problem within the framework of convex minimization or simply apply local search or heuristic approach.
7 Applications of Global Optimization to Portfolio Analysis
197
Second, global optimizers are more interested in applications in physical problems. It appears that there is still psychological barrier for mathematical programmers to do research in dual (monetary) space. In the next two sections, we discuss applications of global optimization to mean-risk models. A variety of nonconvex problems have been solved successfully by employing mean-absolute deviation framework. Section 4 will be devoted to applications of fractional programming methods to bond portfolio analysis. Here we discuss the minimization of the sum of linear fractional functions and the ratio of two convex functions over a polytope. Section 5 will be devoted to miscellaneous applications of global optimization in finance such as minimization over an efficient set, long-term constant proportion portfolio problem, long-short portfolio and problems including integer constraints. Readers are referred to a recent survey on the applications of mathematical programming to finance by Mulvey (2001), a leading expert of both mathematical programming and financial engineering.
2.
Mean-risk models
In the following, we will present some of the basics of the mean risk models. Let there be n assets Sj, j = 1 , 2 , .. . ,n and let R j be the random variables representing the rate of return of Sj . Let x j 1 0 be the proportion of the fund to be invested into Sj. The vector x = (xl, xz, . . . ,x,) is called a portfolio, which has to satisfy the following condition. n
Let R(x) be the rate of return of the portfolio:
and let r(x) and v(x) be, respectively the mean and the variance of R(x). Then the mean-variance (MV) model is represented as follows. minimize subject to
v(x) r(x) 2 p (7.3) (MV) x E X, where X E Rn is an investable set defined by (7.1). Also, it may contain additional linear constraints. And p is a constant to be specified by an investor.
198
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Let x(p) be an optimal solution of the problem (7.3). Then the trais called an efficient frontier. jectory of r (x(p)), There are two alternative representations of the mean variance model, namely (MV2)
minimize subject to
r(x) v(x) 5 a2 x € X,
(7.4)
(MV3)
maximize subject to
r ( x ) - Xv(x) x E X.
(7.5)
All three representations are used interchangeably since they generate the same efficient frontier as we vary p in (MV), a in (MV2) and X 2 0 in (MV3). There are several measures of risk other than variance (standard deviation) such as absolute deviation, lower semi-variance, (lower-semi) partial moments, below-target risk, value-at-risk (VaR) , conditional valueat-risk (CVaR) . Most of these except VaR are convex functions of x. Mean-risk models are denoted as either one of (7.3)-(7.5), where variance v(x) is replaced by one of the risks introduced above. However, following three risk measures are by far the most important from the computational point of view when we extend the model into the direction of global optimization. w ( x ) = E [IW- E[R(x)II] Lower semi-sbsolute deviation W- (x) = E [I R(X) - E [ ~ ( x ) ] Below target risk of degree one BTl (x) = E [I R(x) - T 1-1 (T is a constant) Absolute deviation
1-1
programming problem. since the associated mean-risk model can be formulated as a linear programing problem when (R1,R2, . . . R,) is distributed over a set of finitely many points (rlt, rzt,. . . rnt),t = 1 , 2 , . . . , T and
are known. For example, the mean-absolute deviation model minimize subject to
W (x) r(x) 2 p x € X
7 Applications of Global Optimization t o Portfolio Analysis
can be represented as follows:
where r j = c= :, ftrjt. It is straightforward to see that the problem can be converted to a linear programmming problem
II
minimize
CTZlft(st+ $Jt)
subject to
~ t - $ J t = C ~ = l ( r j t - ~ j ) ~t j=, 1 , 2 ,...,T st20,
$Jt>O,
t = 1 , 2 ,...,T
(7.9)
Also, CVaR,(x) defined by the lower a quantile of R(x): (7.10) E [- R(X) IR(X) I VaR, (x)], 1-a shares the same property as the above three measures (Rockafellar and Uryasev, 2001). CVaR, (x) =
3.
Mean-risk models under market friction
Markowitz formulated the mean-variance model assuming there is no friction in the market. However, nonlinear transaction fee and tax are associated with selling and/or buying assets. Also, we experience the socalled market impact effect when we buy a large amount of assets. The unit price of the asset may increase due to the supply-demand relation and thus the actual return would be substantially smaller than those in the ideal frictionless market. Also, we often need to handle discrete variables. Among such examples are minimal transaction unit constraints and cardinality constraints. The former is associated with the existence of minimal unit one can trade in the market, usually 1000 stocks in Tokyo Stock Exchange. The latter is associated with investors who do not want to hold too many assets, when one has to impose a condition on the maximal number of assets in the portfolio.
3.1
Transaction cost
There are two common types of transaction cost, i.e., piecewise linear concave and piecewise constant as depicted by Figure 7.1.
200
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
(a) piecewise linear concave
(b) piecewise constant
Figure 7.1. Transaction cost function.
Transaction cost is usually relatively large when the amount of transaction is smaller and it increases gradually with small rate, hence concave (Figure 7.l(a)). An alternative is a piecewise constant function as denoted by (Figure 7.l(b)), which is very popular in e-trade system. It is well known that these types of transaction cost functions can be represented in a linear form by introducing 0-1 variables. The number of 0-1 variables are equal to the number of linear pieces (or steps). Therefore, we need to introduce around 8 to 10 times n zero-one variables, so that it is out of the scope of the state-of-the-are integer programming softwares when n is over 1000.
3.2
Market impact cost
The unit price of the asset will sharply increase when we purchase assets beyond some bound, which induces additional transaction cost. One typical cost subject to market impact is depicted in Figure 7.2, which is a d.c. function.
Figure 7.2. Market impact.
7 Applications of Global Optimization to Portfolio Analysis
201
Mean-absolute deviation model under concave and d.c. transaction cost c(x) : maximize r(x) - c(x) (7.11) subject to W(x) 5 w x E X, has been successfully solved by a branch and bound algorithm proposed by Phong et al. (See Phong et al., 1995, for details). The mean-absolute deviation model under transaction cost (7.10) can then be reformulated as a linearly constrained non-concave maximization problem: maximize subject to
As reported in Konno and VC7ijayanayake (1999), the problem can be solved in a few seconds on a personal computer when T 5 60 and n 5 500. In fact, the branch and bound algorithm below can generate an optimal solution much faster than the state-of-the-art integer programming software CPLEX applied to a 0-1 integer programming reformulation of the same problem (Konno and Yamamoto, 2003). Similar algorithms have been applied to a number of portfolio optimization problems under nonconvex transaction cost, including index tracking (Konno and Wijayanayake, 2001b), portfolio rebalance (Konno and Yamamoto, 2001), and long-short portfolio optimization (Konno et al., 2005). Further, this algorithm has been extended to portfolio optimization under market impact (), where the cost function becomes a d.c. function as depicted in Figure konno:fig2. Let us note that the MV model under nonconvex transaction cost still remains intractable from the computational point of view, since we need to handle a, large scale 0-1 quadratic programming problem.
3.3
Branch and bound algorithm
We will present here the branch and bound algorithm Konno and Wijayanayake (1999) used for solving linearly constrained separable concave minimization problem introduced above.
202
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Let F be the set of (x,4, $) E R~~2T satisfying the constraints of problem (7.11)except the lower and apper bound constraints on xi's.
Branch and bound algorithm.
2" If
r = 4,then goto 9;Otherwise goto 3.
E I?: 3' Choose a problem (Pk)
maximize subject to
f (x)= C>l {rjxj- cj(xi)) (x,4,$) E F Pk 5 x ak.
4" Let $(xi) be a linear understimating function of ci(xi)over the inter(j= 1,2,., . ,n)and define a linear programming val pk x ak, problem
< <
is infeasible then go to 2. Otherwise let xk be an optimal If (Qk) solution of (Qk)
If Igk(xk)- f (xk))l> E then goto 8. Otherwise let fk = f (xk). 5" If fk < f" then goto 7;Otherwise goto 6.
6" If
f" = fk;k = kk and eliminate all the subproblems (Pi) for which gt(xt)l
f".
7" If gk(xk)5 f" then goto 2. Otherwise goto 8. k ) lj = 1,2,. . . , n) , 8" Let c,(x:) - c:(x:) = max{cj (x?) - c,k (xi
and define two subproblems:
7 Applications of Global Optimization to Portfolio Analysis
r = r U {fl+l,f i + z ) ,
k =k
+ 1 and goto 3.
9" Stop: 2 is an &-optimalsolution of (Po). THEOREM7.1 2 converges to a n E-optimal solution of (Po) as k
Proof. See Thach et al. (1996).
-+
oo. 0
REMARK 2 Branching strategy using x: as a subdivision point is called w-subdevision strategy. A number of numerical experiments show that this strategy is usually superior to standard bisection, where the midpoint of the interval is chosen as a subdivision point.
3.4
Integer constraints
Associated with portfolio construction is a minimal unit we can purchase, usually 1,000 stocks in the Tokyo Stock Exchange. When the amount of fund is lagrer, then we can ignore this constraint and round the calculated portfolio to the nearest integer multiple of minimal transaction unit. The resulting portfolio exhibits almost the same risk-return structure. When however, the amount of fund is smaller, as in the case of the individual investor, simple rounding may significantly disport the portfolio, particular by when the amount of fund is small. It is reported in that we can properly handle these constraints by slightly modifying the branching strategy (Step 8) of the branch and bound algorithm. Also, the state of the art integer programming software can handle these integer constraints if the problem is formulated in the framework of mean-absolute deviation model (Konno and Yamamoto, 2003).
4.
Applications of fractional programming
Fractimal programming started in 1961, when Charnes and Cooper (1962) showed that the ratio of two nonnegative affine functions is quasiconvex and thus can be minimized over linear constraints by a variant of simplex method. Also, Dinkelbach (1967) showed that the ratio of a nonnegative convex and concave functions over a convex set can be minimized by solving a series of convex minimization problems. The sum of linear fractional functions is no longer quasi-convex, so that it cannot be minimized by convex minimization methodologies. Also, the ratio of two convex functions cannot be minimized by Dinkelbach's method. Minimizing the sum of linear ratios and minimizing general fractional functions is therefore the subject of a global optimization which is now under intestive study,
204
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Associated with bonds are several alternative measures of return and risk. Among the popular return measures are average of direct yield, terminal yield and maturity, all of which are represented as a linear fractional function of a portfolio x. A typical bond portfolio optimization problem is to maximize one of these linear fractional functions over a linear system of equalities and inequalities, which can be solved standard methods. Another problem associated with bond portfolio (Konno and Watanabe, 1996) is qf x+q1o q;x+qzo maximize P ~ , Z + P-~P:X+PZO O subject to
Alx
+ Azy 5 b
I
220, Y20 where x E Rnl, y E Rn2 are, respectively the amount of assets to be added and subtracted from the portfolio. A number of algorithms have been proposed for this problem, among which is a parametric simplex alogorithm (Konno and Watanabe, 1996) seems to be the most efficient. The first step of this algorithm is to define w = l/(pt,x +p20),
X = wz,
Y = wy
and convert the problem (7.13) as follows: tX+910w - (q:Y
maximize
+420~)
+ A2Y - bw 5 0 p i x + p20w = 1
subject to
A1X
XLO,
(7.14)
wLO.
Let (X*,Y*, w*) be an optimal solution of (7.14). Then (x*,y*) = (X*/w*,Y*/w*) is an optimal solution of (7.13). The problem (7.14) is equivalent to 1
t
maximize
?(qlX
subject to
AIX
+ plow) - (qiY + qmw)
+ A2Y - bw I 0
p i x fp20w = 1 P;X
+ piow = F
x>o, lmin
WLO
I I I Emax
7 Applications of Global Optimization to Portfolio Analysis
205
are respectivily, the maximal and minimal value of where I,, and tmin pi X plow in the feasible region. Let us note that this problem can be efficiently solved by primalldual parametric simplex algorithm. Optimization of the weighted sum of objectives leads to a maximization of sum of ratios over a polytope. An efficient branch and bound algorithm using well designed convex under estimating function can now solve the problem with the number of fractional terms up to 15 (Konno, 2001). Another fractional problem is the maximal predictability portfolio problem proposed by Lo and MacKinlay (1997) and solved by Gotoh and Konno (2001). maximize
+
subject to
x E X,
where both P and Q are positive definite and X is a polyhedral set. If P is negative semi-definite and Q is positive definite, then the problem can be solved by Dinkelbach's approach (Dinkelbach, 1967). Let us define a function
for X > 0 and let x(X) be the maximal x correspanding to g(X). Let A* be such that g(X*) = 0. Then it is easy to see (Gotoh and Konno, 2001) that x(X*) is an optimal solution of (7.16) for general P and Q. Problem defining g(X) is a convex maximization problem when P is positive semi-definite, which can be solved by a branch and bound algorithm when n is small. Also the zero point of g(X) can be found by bisection or other search methods. It has been demonstrated in Gotoh and Konno (2001) that the problem can be solved fast when n is less than 20.
5.
Miscellaneous applications
In the section, we will discuss additonal important applications of global optimization in finance.
5.1
Optimization over an efficient set
Let us consider a class of multiple objective optimization problems P j , j = 1 , 2,..., k . maximize subject to
c$x x EX
(7.17)
206
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
A feasible solution x* E X is called efficient when there exists no x E X such that t (7.18) cjx 2 cjx*, j = 1 , 2 , . . . , lc, and strict inequality holds for at least one j. The set XE of efficient solutions is called an efficient set. Let consider another objective function f o ( + )and consider maximize subject to
f0 (x) x E XE
(7.
This is a typical global optimization problem since XE is a nonconvex set. A number of algorithms have been proposed when X is polyhedral and fo is convex (Yamamoto, 2002). In particular, when fo is linear there exists a finitely convergent algorithm. Multiple objective optimization problems appear in bond portfolio analysis as explained in Section 3. Fortunately, the number of objectives is usually single digit, usually less than 5. It has been shown in Thach et al. (1996) that the problem of finding a portfolio on the efficient frontier such that the piecewise linear transaction cost associated with rebalancing a portfolio from the current portfolio xO n.
is minimal can be solved by dual reformulation of the original problem. The problem with up to k = 5 and up to 100 variables can be solved within a practical amount of computation time. Also a minimal cost rebalancing problem with the objective function
where c(.) is a piecewise linear concave function and XE is a meanabsolute deviation efficient frontier, can be a solved by a branch and bound algorithm by noting that XE consists of a number of linear pieces (Konno and Yamamoto, 2001). The problem can be reduced to a series of linearly constrained concave minimization problem which can be solved by a branch and bound algorit.hm of Phong et al. (1995).
5.2
Long term port folio opt imization by constant rebalance
Constant rebalance is one of very popular methods for long term portfolio management, where one sells those assets whose price is higher and
7 Applications of Global Optimization to Portfolio Analysis
207
purchases those assets whose price is lower and thus keep the proportion of the weight of the portfolio constant. Given the expected return in each period, the mean variance model (7.5) over the planning horizon T becomes a minimization of a highly nonconvex polynomial function over a polytope : 2
maximize
subject to
f ~ ( x= )
c:'~ fa {nLl(c:=~ (1 + r;Jxj)} 2 - (EL1 fs n L c>,cl+ r;,)xj) - CLf nL1(c:==,(l+ r;,)~,)
Cy==lx j = 1, 0
(7.20)
< x j < aj, j = 1 , 2 , .. . ,n,
where
f, the probability of the scenario s ,
rjt rate of return of asste j during period t under secnario s. Maranas et al. (1997) applied a branch and bound algorithm similar to the one explained in Section 3.3 using
as an underestimator of fx(x) over the hyper-rectangle [pk,a k ] . When y is large enough, gx(x: y) is a convex function of x. Also
where
S = max{ajk - pjk I j
= 1,2,
. . . ,n}.
Hence gx(x: y) is a good approximation of fx(x) when 6 is small enough. It is shown in Maranas et al. (1997) that this algorithm can solve problems of size up to (n, T, s) = (9,20,100).
5.3
Optimization of a long-short portfolio
Long- short portfolio where one is allowed to sell assets short is a very popular fund managernent strategy among hedge funds. The resulting optimization problem looks to be an easy concave maximization problem without sign constraints on the weights of portfolio. However, it is really not. First, the fund manager has to pay deposit in addition to the transaction cost. Also he is not supposed to leave cash unused. Then, the cash out of short sale is reserved at the the third party who lends the asset.
208
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
As a result, the investable set of the mean-variance model [6] becomes a non-convex set
Also, the objective function contains a non-convex transaction cost. Therefere the problem becames a maximization of a non-concave objective function over a non-convex region. This seemingly very difficult problem has been successfully solved (Konno et al., 2005) by extending the branch and bound algorithm of Section 3.3.
Ackowledgements This research was supported a part by the Grantin-Aid for Scientific Reseach of the Mimistry of Education, Science, Culture and Sports B(2) 15310122 and 15656025. Also, the author acknowledges the generous support of the Hitachi Corporation.
References Charnes, A. and Cooper, W.W. (1962). Programming with linear fractional functionys. Naval Reseach Logistics Quanterly, 9: 181-186. Dinkelbach, D. (1967). On nonlinear fractional programming. Management Science, 13:492-498. Gotoh, J. and Konno, H. (2001). Maximization of the ratio of two convex quadratic fructions over a polytope. Computational Optimization and Apllication, 20:43-60. Horst, R. and Tuy, H. (1996). Global Optimixation: Deterministic Approaches. 3rd edition. Springer Verlag. Konno, K. (2001). Minimization of the sum of several linear fractional functions. In: N. Hadjisavvas (ed.), Advances in Global Optimixation, pp. 3-20. Springer-Verlag. Konno, H., Koshizuka, T. , and Yamamoto, R. (2005). Optimization of a long-short portfolio under nonconvex transaction cost. Forthcoming in Dynamics of Continuous, Discrete and Im,pulsive Systems. Konno, H., Thach, P.T., and Tuy, H. (1997). Optimization on Low Rank Nonconvex Structures. Kluwer Academic Publishers.
7 Applications of Global Optimization to Portfolio Analysis
209
Konno, H., and Watanabe, H. (1996). Nonconvex bond portfolio optimization problems and their applications to index tracking. Journal of the Operetions Research Society of Japan, 39:295-306. Konno, H. and Wijayanayake, A. (1999). Mean-absolute deviation portfolio optimization model under transaction costs. Journal of the Operations Research Society of Japan, 42:422-435. Konno, H. and Wijayanayake, A. (2000). Portfolio optimization problems under d.c. transaction costs and minimal transaction unit constraints. Journal of Global Optimization, 22:137-154. Konno, H. and Wijayanayake, A. (2001a). Optimal rebalancing under concave transaction costs and minimal transaction units constraints. Mathematical Programming, 89:233-250. Konno, H. and Wijayanayake, A. (2001b). Minimal cost index tracking under concave transaction costs. International Journal of Theoretical and Applied Finance, 4:939-957. Konno, H. and Yamamoto, R. (2001). Minimal concave cost rebalance to the efficient frontier. MathematicalProgramming, B89:233-250. Konno, H. and Yamamoto, R. (2003). Global Optimixation us. Integer Programming in Portfolio Optimization Under Nonconvex Transaction Cost. Working paper, ISE 03-07, Department of Industrial and Systems Engineering, Chuo University. Konno, H. and Yamazaki, H. (1991). Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Management Science, 37:519-531. Lo, A. and MacKinlay, C. (1997). Maximizing predictablity in stock and bond markets. Microeconomic Dynamics, 1:102-134. Maranas, C., Androulakis, I., Berger, A., Floudas, C.A., and Mulvey, J.M. (1997). Solving tochastic control problems in finance via global optimization, Journal of Economic Dynamics and Control. 21:14051425. Markowitz, H. (1959). Portfolio Selection; Eficient Diversification of Investment. John Wiley & Sons. Mulvey, J.M. (2001). Introduction to financial optimization: Mathematical programming special issue. Mathmatical Programming, B89:205216.
210
ESSAYS AND SURVEYS IN GLOBAL OPTIMIZATION
Perold, A. (1984). Large scale portfolio optimization. Management Science, 30:1143-1160. Phong, T.Q., An, L.T.H., and Tao, P.D. (1995). On globally solving linearly constrained indefinite quadratic minimization problem by decomposition branch and bound method. Operations Research Letters, 17:215-220. Rinnooy-Kan, A.H. and Timmer, G.T. (1989). Global optimization, In: Nemhauser, G.L. et al. (eds.), Handbooks in Operations Research and Management Science, vol. 1, Chapter 9. Elsevier Science Publishers, B.V. Rockafellar, R.T. and Uryasev, S. (2001). Optimization of conditional value-at-risk. Journal of Risk, 2:21-41. Thach, P.T., Konno, H., and Yokota, D. (1996). A dual approach to a mimimization on the set of Pareto-optimal solutions. Journal of Optimization Theory and Applications, 88:689-707. Tuy, H. (1998). Convex Analysis and Global Optimixation. Kluwer Academic Publishers, Dordrecht. Yamamoto, Y. (2002). Optimization over the efficient set: Overview. Journal of Global Optimixation, 22:285-317. Zangwill, W. (1968). Minimun concave cost flows in certain networks. Management Science, 14:429-450.
Chapter 8
OPTIMIZATION TECHNIQUES IN MEDICINE
Panos M. Pardalos, Vladimir L. Boginski, Oleg A. Prokopyev, Wichai Suharitdamrong, Paul R. Carney, Wanpracha Chaovalitwongse, and Alkis Vazacopoulos
Abstract
We give a brief overview of the rapidly emerging interdisciplinary research area of optimization techniques in medicine. Optimization approaches have proved successful in various medical applications. We identify the main research directions and describe several important problems arising in this area, including disease diagnosis, risk prediction, and treatment planning.

1. Introduction
In recent years, there has been a dramatic increase in the application of optimization techniques to the study of medical problems and the delivery of health care. This is in large part due to contributions from three fields: the development of more efficient and effective methods for solving large-scale optimization problems (operations research), the increase in computing power (computer science), and the development of more sophisticated treatment methods (medicine). The contributions of the three fields come together because the full potential of the new treatment methods often cannot be realized without the help of quantitative models and ways to solve them. Optimization techniques have proved effective in various medical applications, including disease diagnosis, risk prediction, treatment planning, and imaging.
The success of these approaches is particularly motivated by technological advances in medical equipment, which have made it possible to obtain large datasets of various origins that can provide useful information in medical applications. Utilizing these datasets to improve medical diagnosis and treatment is a task of crucial importance, and the fundamental problems arising here are to find appropriate models and algorithms to process these datasets, extract useful information from them, and use this information in medical practice. One of the directions in this research field is the application of data mining techniques to medical data. This approach is especially useful in the diagnosis of disease cases utilizing datasets of historical observations of various characteristics of different patients. Standard mathematical programming approaches allow one to formulate diagnosis problems as optimization models. In addition to diagnosis, optimization techniques are successfully applied to treatment planning problems, which deal with the development of an optimal strategy for applying a certain therapy to a patient. An important aspect of these problems is the identification and efficient control of various risk factors arising in the treatment process. These risk management problems can be addressed using optimization methods. There are numerous other application areas of optimization techniques in medicine that are widely discussed in the literature (Pardalos and Principe, 2002; Sainfort et al., 2004; Du et al., 1999; Pardalos et al., 2004b; Cho et al., 1993). This chapter reviews the main directions of optimization research in the medical domain. The remainder of the chapter is organized as follows. In Section 2 we present several examples of applying optimization techniques to diagnosis and prediction in medical applications: diagnosis of breast cancer, risk prediction by logical analysis of data, and human brain dynamics and epileptic seizure prediction. Section 3 discusses treatment planning procedures using the example of radiotherapy planning. In the next two sections we give a brief review of optimization problems in medical imaging and health care applications. Finally, Section 6 concludes the discussion.
2. Diagnosis and prediction
Diagnosis and prediction are among the most fundamental problems in medicine and play a crucial role in the successful treatment process. In this section, we present several illustrative examples of applying optimization techniques to these problems.
2.1 Disease diagnosis and prediction as data mining applications
In a common setup of the disease diagnosis problem, one possesses a historical dataset of disease cases (corresponding to different patients) represented by several known parameters (e.g., the patient's blood pressure, temperature, size of a tumor, etc.). For all elements (patients) in this dataset, the actual disease diagnosis outcome is known. A natural way to diagnose new patients is to utilize the available dataset with known diagnosis results (the so-called training dataset) to construct a mathematical model that classifies disease cases with unknown diagnosis outcomes based on the known information. In the data mining framework, this problem is referred to as classification, which is one of the major types of problems in predictive modeling, i.e., predicting a certain attribute of an element in a dataset based on the known information about its other attributes (or features). Due to the availability of a training dataset, these problems are also associated with the term "supervised learning." To give a formal introduction to classification, suppose that we have a dataset of N elements, and each of these elements has a finite number of attributes. Denote the number of attributes by n. Then every element of the given dataset can be represented as a pair (xi, yi), i = 1, . . . , N, where xi ∈ Rn is the n-dimensional vector of attribute values

xi = (xi1, xi2, . . . , xin),
and yi is the class attribute. The value of yi defines to which class a given element belongs, and this value is known a priori for each element of the initial dataset. It should also be mentioned that yi takes integer values, and the number of these values (i.e., the number of classes) is pre-defined. Now suppose that a new element with a known attribute vector x, but an unknown class attribute y, is added to the dataset. As mentioned above, the essence of classification problems is to predict the unknown value of y. This is accomplished by identifying a criterion for placing the element into a certain class based on the information about the known attributes x of this element. The important question arising here is how to create a formal model that takes the available dataset as input and performs the classification procedure.
The main idea of the approaches developed in this field is to adjust (or "train") the parameters of the classification model using the existing information about the elements in the available training dataset, and then to apply this model to classify new elements. This task can often be reduced to solving an optimization problem (in particular, a linear program) of finding optimal values of the parameters of a classification model. One technique widely used in practice relies on a geometrical approach. Since all the data elements can be represented as n-dimensional vectors (points in the n-dimensional space), these elements can be separated geometrically by constructing surfaces that serve as "borders" between different groups of points. A common approach is to use linear surfaces (planes) for this purpose; however, different types of nonlinear (e.g., quadratic) separating surfaces can be considered in certain applications. It is also important to note that it is usually impossible to find a surface that "perfectly" separates the points according to the value of some attribute, i.e., points with different values of the given attribute may not necessarily lie on different sides of the surface; however, in general, the number of such errors should be small. According to this approach, the classification problem is represented as the problem of finding the geometrical parameters of the separating surface(s). These parameters can be found by solving the optimization problem of minimizing the misclassification error for the elements in the training dataset (the so-called "in-sample error"). After determining these parameters, every new data element is automatically assigned to a certain class according to its geometrical location in the element space. The procedure of using the existing dataset to classify new elements is often called "training the classifier": the parameters of the separating surfaces are "tuned" (or "trained") to fit the attributes of the existing elements so as to minimize the number of errors in their classification. However, a crucial issue in this procedure is not to "overtrain" the model, so that it retains enough flexibility to classify new elements, which is the primary purpose of constructing the classifier. As an illustrative example of applying optimization techniques to the classification of disease cases, we briefly describe one of the first practical applications of mathematical programming in classification problems, developed by Mangasarian et al. (1995). This study deals with the diagnosis of breast cancer cases. The essence of the breast cancer diagnosis system developed in Mangasarian et al. (1995) is as follows. The authors considered a dataset consisting of 569 30-dimensional feature vectors, one corresponding to each patient. Each case could be classified as malignant or benign, and the actual diagnosis was known for all elements in the dataset. These 569 elements were used for "training" the classifier.
The classifier was developed based on linear programming (LP) techniques. The procedure for constructing this classifier is relatively simple. The vectors corresponding to malignant and benign cases are stored in two matrices: the matrix A (m × n) contains the m malignant vectors (n is the dimension of each vector), and the matrix B (k × n) represents the k benign cases. The goal of the constructed model is to find a plane that separates all the vectors (points in the n-dimensional space) in A from the vectors in B. If a plane is defined by the standard equation

xTw = γ,
where w = (w1, . . . , wn)T is an n-dimensional vector of real numbers and γ is a scalar, then this plane separates all the elements of A from those of B if the following conditions are satisfied:

Aw ≥ eγ + e,   Bw ≤ eγ − e.   (8.1)
Here e = (1, 1, . . . , 1)T is the vector of ones of appropriate dimension (m for the matrix A and k for the matrix B). However, as was pointed out above, in practice it is usually impossible to perfectly separate two sets of elements by a plane. So, one should try to minimize the average measure of misclassification, i.e., when the constraints (8.1) are violated, the average sum of violations should be as small as possible. The violations of these constraints are modeled by introducing nonnegative variables u and v as follows:

ui = max{ (eγ + e − Aw)i, 0 },  i = 1, . . . , m,    vj = max{ (Bw − eγ + e)j, 0 },  j = 1, . . . , k.
Now we are ready to write down the optimization model that minimizes the total average measure of misclassification errors:

min over w, γ, u, v   (1/m) Σ(i=1..m) ui + (1/k) Σ(j=1..k) vj

subject to
Aw + u ≥ eγ + e,
Bw − v ≤ eγ − e,
u ≥ 0,  v ≥ 0.

As one can see, this is a linear programming problem, and the decision variables are the geometrical parameters of the separating plane, w and γ, as well as the variables u and v representing the misclassification errors.
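As a rough illustration of how such a separating-plane LP can be set up in practice, here is a minimal sketch using SciPy on synthetic two-dimensional data; the data, sizes, and solver choice are assumptions for illustration, not part of the original study.

```python
# Sketch: linear-programming separating plane (Mangasarian-style), on synthetic
# data; variable order is [w (n), gamma (1), u (m), v (k)].
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 2, 40, 50
A = rng.normal(loc=+1.0, scale=0.8, size=(m, n))   # "malignant" points
B = rng.normal(loc=-1.0, scale=0.8, size=(k, n))   # "benign" points

# Objective: (1/m) * sum(u) + (1/k) * sum(v); w and gamma have zero cost.
c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])

# Constraints rewritten as "<=" rows:
#   -A w + e*gamma - u <= -e   (from A w + u >= e*gamma + e)
#    B w - e*gamma - v <= -e   (from B w - v <= e*gamma - e)
G_top = np.hstack([-A,  np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
G_bot = np.hstack([ B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
G = np.vstack([G_top, G_bot])
h = -np.ones(m + k)

# w and gamma are free; u and v are nonnegative.
bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)
res = linprog(c, A_ub=G, b_ub=h, bounds=bounds, method="highs")

w, gamma = res.x[:n], res.x[n]
print("in-sample error:", res.fun)
print("A classified malignant:", (A @ w > gamma).mean(),
      "B classified benign:", (B @ w <= gamma).mean())
```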
Although in many cases these problems may involve high-dimensional data, they can be efficiently solved by available LP solvers, for instance Xpress-MP or CPLEX. The misclassification error that is minimized here is usually referred to as the in-sample error, since it is measured on the training dataset. Note that if the in-sample error is unacceptably high, the classification procedure can be repeated for each of the subsets of elements in the halfspaces generated by the separating plane. As a result of such a procedure, several planes dividing the element space into subspaces are created, which is illustrated by Figure 8.1. Every new element is then classified according to its location in a certain subspace. If we consider the case of only one separating plane, then after solving the above problem, each new cancer case is automatically classified as either malignant or benign as follows: if the vector x corresponding to this case satisfies the condition xTw > γ, it is considered to be malignant; otherwise, it is assumed to be benign. It is important to mention that although the approach described here is rather simple, its idea can be generalized to the case of multiple classes and multiple nonlinear separating surfaces. Another issue associated with the technique considered in this section is so-called overtraining of the classifier, which can happen if the training sample is too large. In this case, the model adjusts to the training dataset too much and loses the flexibility needed to classify unknown elements, which increases the generalization (or "out-of-sample") error. In Mangasarian et al. (1995), the authors indicate that even one separating plane can be an overtrained classifier if the number of attributes in each vector is too large. They point out that the best out-of-sample results were achieved when only three attributes of each vector were taken into account and one separating plane was used. These arguments lead to the following concepts closely related to classification: feature selection and support vector machines (SVMs). A review of these and other optimization approaches in data mining is given in Bradley et al. (1999). The main idea of feature selection is to choose a minimal number of attributes (i.e., components of the vector x corresponding to a data element) to be used in the construction of separating surfaces (Bradley et al., 1998). This procedure is often important in practice, since it may produce a better classification in the sense of the out-of-sample error. The essence of support vector machines is to construct separating surfaces that minimize an upper bound on the out-of-sample error.
Figure 8.1. An example of binary classification using linear separating surfaces
i.e., the "gap" between the elements from different classes (Burges, 1998; Vapnik, 1995). An application of support vector machines to breast cancer diagnoses is discussed in Lee et al. (2000).
2.2 Risk prediction by logical analysis
Risk stratification is very common in medical practice. It is defined as the ability to predict undesirable outcomes by assessing patients using the available data: age, gender, health history, specific measurements such as EEG, ECG, heart rate, etc. (Califf et al., 1996). The usefulness of any risk-stratification scheme arises from how it links the data to a specific outcome. Risk-stratification systems are usually based on standard statistical models (Hosmer and Lemeshow, 1989). Recently, a new methodology for risk prediction in medical applications using Logical Analysis of Data (LAD) was proposed (Alexe et al., 2003). The LAD technique was first introduced in Hammer (1986). This methodology is based on combinatorial optimization and Boolean logic. It has been successfully applied to knowledge discovery and pattern recognition not only in medicine, but also in oil exploration, seismology, finance, etc. (Boros et al., 2000). Next, we briefly describe the main idea of LAD. More detailed information about this approach can be found in Boros et al. (1997), Ekin et al. (2000), and Alexe et al. (2003). Let Ω ⊂ Rn be a set of observations. By Ω+ and Ω− we denote the subsets of positive and negative observations, respectively. We also need to define the notion of a pattern P:

P = { x ∈ Ω : xi ≥ αi for i ∈ I,  xj < βj for j ∈ J },   (8.7)
where the αi (i ∈ I) and βj (j ∈ J) are real numbers (so-called cut points), and I and J are sets of indices. A pattern P is positive (negative) if P ∩ Ω+ ≠ ∅ (P ∩ Ω− ≠ ∅) and P ∩ Ω− = ∅ (P ∩ Ω+ = ∅). Obviously, in real-life applications the number of detected patterns can be extremely large. Patterns are characterized by three parameters: degree, prevalence, and risk. The degree of a pattern is the number of inequalities that identify the pattern in (8.7). The total number of observations in the pattern P is called its absolute prevalence. The relative prevalence of a pattern P is defined as the ratio of its absolute prevalence to |Ω|. The risk ρP of a pattern P is the proportion of positive observations in the pattern:

ρP = |P ∩ Ω+| / |P ∩ Ω|.
Introducing thresholds on degree, prevalence, and risk, we can identify the high-risk and low-risk patterns in the given datasets. The set Σ = Σ+ ∪ Σ− is called a pandect, where Σ+ (Σ−) is the set of high-risk (low-risk) patterns. An application of LAD to coronary risk prediction is presented in Alexe et al. (2003), where the problem of constructing a methodology for distinguishing groups of patients at high and at low mortality risk is addressed. The size of the pandect Σ in the considered problem was about 4700 low- and high-risk patterns, which is obviously too large for practical applications. To overcome this difficulty, a nonredundant system of low- and high-risk patterns T = T+ ∪ T− was obtained, with the property that every observation covered by some pattern of Σ+ (Σ−) is still covered by at least one pattern of T+ (T−).
Using the system T defined above, the following classification tool, referred to as the Prognostic Index π(x), is defined:

π(x) = τ+(x)/T+ − τ−(x)/T−,

where T+ (T−) is the number of high-risk (low-risk) patterns in T, and τ+(x) (τ−(x)) is the number of high-risk (low-risk) patterns satisfied by the observation x.
Using π(x), an observation x is classified as low-risk or high-risk depending on the sign of π(x). The number of patients classified into the high- and low-risk groups was more than 97% of the size of the studied population. The proposed technique was shown to outperform standard methods used by cardiologists (Alexe et al., 2003).
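The prognostic index itself is straightforward to evaluate once the pattern sets are available; the sketch below uses synthetic boolean satisfaction matrices as placeholders rather than patterns from the cited study.

```python
# Sketch: evaluating the LAD prognostic index from pattern-satisfaction tables.
import numpy as np

def prognostic_index(sat_pos, sat_neg):
    """sat_pos[i, p] is True if observation i satisfies high-risk pattern p;
    sat_neg[i, q] is True if observation i satisfies low-risk pattern q."""
    tau_pos = sat_pos.sum(axis=1) / sat_pos.shape[1]   # fraction of high-risk patterns hit
    tau_neg = sat_neg.sum(axis=1) / sat_neg.shape[1]   # fraction of low-risk patterns hit
    return tau_pos - tau_neg

rng = np.random.default_rng(1)
sat_pos = rng.random((5, 12)) < 0.3      # 5 patients, 12 high-risk patterns (synthetic)
sat_neg = rng.random((5, 20)) < 0.3      # 5 patients, 20 low-risk patterns (synthetic)
pi = prognostic_index(sat_pos, sat_neg)
print(np.where(pi > 0, "high risk", "low risk"))
```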
2.3 Brain dynamics and epileptic seizure prediction
The human brain is one of the most complex systems ever studied by scientists. The enormous number of neurons and the dynamic nature of the connections between them make the analysis of brain function especially challenging. Probably the most important direction in studying the brain is the treatment of disorders of the central nervous system. For instance, epilepsy is a common form of such disorders, affecting approximately 1% of the human population. Essentially, epileptic seizures represent excessive and hypersynchronous activity of the neurons in the cerebral cortex. During the last several years, significant progress has been made in the field of epileptic seizure prediction. The advances are associated with the extensive use of electroencephalograms (EEG), which can be treated as a quantitative representation of brain functioning. Motivated by the fact that the complexity and variability of the epileptic seizure process in the human brain cannot be captured by traditional methods used to process physiological signals, in the late 1980s Iasemidis and coworkers pioneered the use of the theory of nonlinear dynamics to link neuroscience with an obscure branch of mathematics and to try to understand the collective dynamics of billions of interconnected neurons in the brain (Iasemidis and Sackellares, 1990, 1991; Iasemidis et al., 2001). In those studies, measures of the spatiotemporal dynamical properties of the EEG were shown to exhibit patterns that correspond to specific clinical states (a diagram of electrode locations is provided in Figure 8.2). Since the brain is a nonstationary system, algorithms used to estimate measures of brain dynamics should be capable of automatically identifying and appropriately weighing existing transients in the data. In a chaotic system, orbits originating from similar initial conditions (nearby points in the state space) diverge exponentially (expansion process). The rate of divergence is an important aspect of the system dynamics and is reflected in the value of the Lyapunov exponents. The method developed for estimating the short-term maximum Lyapunov exponent (STLmax), an estimate of Lmax for nonstationary data, is explained in Iasemidis et al. (2000).
Figure 8.2. (A) Inferior transverse and (B) lateral views of the brain, illustrating approximate depth and subdural electrode placement for EEG recordings. Subdural electrode strips are placed over the left orbitofrontal (AL), right orbitofrontal (AR), left subtemporal (BL), and right subtemporal (BR) cortex. Depth electrodes are placed in the left temporal depth (CL) and right temporal depth (CR) to record hippocampal activity.
Having estimated the STLmax temporal profiles at the individual cortical sites, the temporal evolution of the stability of each cortical site can be quantified as the brain proceeds towards the ictal state. However, the system under consideration (the brain) has a spatial extent and, as such, information about the transition of the system towards the ictal state should also be included in the interactions of its spatial components. The spatial dynamics of this transition are captured by considering the relations of the STLmax between different cortical sites. For example, if a similar transition occurs at different cortical sites, the STLmax of the involved sites are expected to converge to similar values prior to the transition. Such participating sites are called "critical sites," and such a convergence "dynamical entrainment." More specifically, in order for the dynamical entrainment to have statistical content, the T-index (from the well-known paired T-statistic for comparison of means) can be used as a measure of distance between the mean values of pairs of STLmax profiles over time. The T-index at time t between electrode sites i and j is defined as

Ti,j(t) = √N |E{STLmax,i − STLmax,j}| / σi,j(t),

where E{·} is the sample average of the differences STLmax,i − STLmax,j estimated over a moving window wt(λ) defined as

wt(λ) = 1 if λ ∈ [t − N − 1, t],   wt(λ) = 0 if λ ∉ [t − N − 1, t],
where N is the length of the moving window. Then σi,j(t) is the sample standard deviation of the STLmax differences between electrode sites i and j within the moving window wt(λ). The T-index thus defined follows a t-distribution with N − 1 degrees of freedom. Therefore, a two-sided t-test with N − 1 degrees of freedom at a statistical significance level α should be used to test the null hypothesis H0: "brain sites i and j acquire identical STLmax values at time t." Not surprisingly, the interictal (before), ictal (during), and immediate postictal (after the seizure) states differ with respect to the spatiotemporal dynamical properties of intracranial EEG recordings. However, the most remarkable finding was the discovery of characteristic spatiotemporal patterns among critical electrode sites during the hour preceding seizures (Iasemidis and Sackellares, 1990, 1991; Iasemidis et al., 2001; Sackellares et al., 2002; Pardalos et al., 2003a,b,c). Such critical electrode sites can be selected by applying quadratic optimization techniques, and the electrode selection problem can be formulated as a quadratically constrained quadratic 0-1 problem (Pardalos et al., 2004a):

min  xTAx                                     (8.9)
subject to  Σ(i=1..n) xi = k,
            xTBx ≥ Tc k(k − 1),
            x ∈ {0, 1}n,
where the following definitions are used: A is an n × n matrix whose element ai,j represents the T-index between electrodes i and j within a 10-minute window before the onset of a seizure; B is an n × n matrix whose element bi,j represents the T-index between electrodes i and j within a 10-minute window after the onset of a seizure; k denotes the number of selected critical electrode sites; Tc is the critical value of the T-index needed to reject H0; and x = (x1, . . . , xn) ∈ {0, 1}n, where each xi represents cortical electrode site i. If cortical site i is selected as one of the critical electrode sites, then xi = 1; otherwise, xi = 0. The quadratic constraint ensures that the selected electrode sites show dynamical resetting of the brain following seizures (Shiau et al., 2000; Pardalos et al., 2002a), that is, divergence of the STLmax profiles after seizures. The seizure prediction algorithm based on nonlinear dynamics and multi-quadratic 0-1 programming is described in more detail in Pardalos et al. (2004a). Other groups reported evidence in support of the existence of the preictal transition, which is detectable through quantitative analysis of the EEG, in Elger and Lehnertz (1998), Lehnertz and Elger (1998), Martinerie et al. (1998), Quyen et al. (1999), and Litt et al. (2001). The use of an algebra-geometric approach to the study of dynamic processes in the brain is presented in Pardalos et al. (2003d). Quantum models are discussed in Pardalos et al. (2002b) and Jibu and Yassue (1995).
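To make the ingredients concrete, the following sketch computes pairwise T-indices from synthetic STLmax profiles and then picks k critical sites by brute-force enumeration for a small n. The actual approach solves the 0-1 program above rather than enumerating subsets, and the data, window length, and threshold below are illustrative assumptions only.

```python
# Sketch: T-index matrices from (synthetic) STLmax profiles plus a brute-force
# critical-site selection for small n.
import itertools
import numpy as np

def t_index_matrix(stl, window):
    """stl: (n_sites, n_samples) STLmax profiles; T-index over the last `window` samples."""
    n = stl.shape[0]
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = stl[i, -window:] - stl[j, -window:]
            T[i, j] = T[j, i] = np.sqrt(window) * abs(d.mean()) / d.std(ddof=1)
    return T

rng = np.random.default_rng(2)
n, k, t_crit = 8, 3, 2.66                                 # illustrative values
pre = rng.normal(size=(n, 600))                           # entrained profiles before seizure
post = rng.normal(size=(n, 600)) + np.arange(n)[:, None]  # diverging profiles after seizure
A = t_index_matrix(pre, window=60)
B = t_index_matrix(post, window=60)

best, best_val = None, np.inf
for subset in itertools.combinations(range(n), k):
    x = np.zeros(n)
    x[list(subset)] = 1.0
    if x @ B @ x >= t_crit * k * (k - 1):   # resetting (divergence) after the seizure
        val = x @ A @ x                     # entrainment before the seizure
        if val < best_val:
            best, best_val = subset, val
print("critical sites:", best, "objective:", best_val)
```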
3. Treatment planning
In this section we discuss an application of optimization techniques in treatment planning. Probably the most developed and popular domain in medicine where optimization techniques are used for treatment planning is radiation therapy. Radiation therapy is a method of treating cancer with high-energy radiation that destroys the ability of cancerous cells to reproduce. There are two types of radiation therapy. The first is external beam radiation, with high-energy rays aimed at the cancerous tissues. A multileaf collimator shapes the beam by blocking out some parts of it.
To shape the beam precisely, multileaf collimators consist of a small array of metal leaves for each beam. Thus, each beam is specified by a set of evenly spaced strips (pencils), and the treatment plan is defined by a collection of beams together with the amount of radiation to be delivered along each pencil within each beam. The other radiation therapy method is called brachytherapy. In this type of treatment, radioactive sources (seeds) are placed in or near the tumors. These two types of therapy need to be planned so as to localize the radiation area with a minimum of destroyed tissue. For external beam radiation therapy, radiation planning involves the specification of the beams: the direction, intensity, and shape of each beam. It is a difficult problem because we need to optimize the dose to the tumor (cancerous area) and minimize the damage to healthy organs simultaneously. To reduce the difficulty of the treatment planning procedure, optimization techniques have been applied, and numerous optimization algorithms have been developed for treatment planning in radiation therapy. As one possible approach, one can consider multi-objective optimization techniques (Lahanas et al., 2003a,b). Linear (Lodwick et al., 1998), mixed-integer (Lee and Zaider, 2003; Lee et al., 2003b), and nonlinear programming (Billups and Kennedy, 2003; Ferris et al., 2003) techniques are also extensively used in therapy planning. The initial step in any radiotherapy planning is to obtain a set of tomography images of the patient's body around the tumor. The images are then discretized into sets of pixels: critical (the set of pixels with healthy tissue sensitive to radiotherapy), body (the set of pixels with healthy tissue not very sensitive to radiotherapy), and tumor (the set of pixels with cancer cells). The formulations of the treatment planning models (linear, quadratic, etc.) depend on the specified clinical constraints: dose homogeneity, target coverage, dose limits for different anatomical structures, etc. As an example of a problem arising in this area, consider the work presented in Billups and Kennedy (2003), where the following formulation, based on Lodwick et al. (1998), was discussed:

minimize the maximum dose to critical structures
subject to: required tumor dose ≤ tumor dose ≤ maximum tumor dose,
            normal tissue dose ≤ dose bound for normal tissue,
            dose ≥ 0.
Billups and Kennedy (2003) formulate the problem as follows:

min over y, x   y
s.t.  y − Σ(p∈P) Σ(b∈B) D(c, p, b) x(p, b) ≥ 0,   c ∈ critical,
      Tl ≤ Σ(p∈P) Σ(b∈B) D(τ, p, b) x(p, b) ≤ Tu,   τ ∈ tumor,
      x(p, b) ≥ 0,   p ∈ P, b ∈ B,
where x(p, b) is the amount of radiation delivered along the p-th pencil of the b-th beam, D(i, p, b) is the fraction of x(p, b) that is delivered to pixel i, Tl and Tu are the lower and upper bounds on the amount of radiation delivered to a tumor element, respectively, and y is a dummy variable. If we denote the feasible region by S, the problem above can be written as

min{ y : (y, x) ∈ S }.
In order to reduce the number of beams used, Billups and Kennedy (2003) penalize the objective function by some fixed penalty P for each used beam:

min{ y + P · Nbeams(x) : (y, x) ∈ S },

where Nbeams(x) denotes the number of beams b ∈ B with x(p, b) > 0 for some pencil p.
In order to reduce the radiation exposure of healthy tissue, brachytherapy was developed as an alternative to external beam radiation. Nevertheless, the right placement of seeds in tumors is a complicated problem. Lee and Zaider (2003) developed treatment planning for prostate cancer cases by using a mixed integer programming optimization model. This algorithm uses 0-1 variables to indicate the placement or non-placement of seeds in the three-dimensional grid generated by an ultrasound image. Since each seed radiates a certain amount of dose, the radiation dose at a point P can be modeled from the locations of the seeds implanted in the tumor. Using this idea, the authors formulated the total dose contribution of the seeds at point P as

Σj D(‖Xj − P‖) xj,

where D(r) is the dose contribution function, Xj is the vector corresponding to grid point j, and xj is the 0-1 seed placement variable at j. The constraints of the MIP model can be written using the lower and upper bounds on the dose at point P:

LP ≤ Σj D(‖Xj − P‖) xj ≤ UP,
where UP and LP are the upper and lower bounds on the radiation dose at point P, respectively. A review of optimization methods in radiation therapy is presented in Shepard et al. (1999). Some promising directions of future research in radiation therapy are also discussed in Lee et al. (2003a). For a description of other treatment planning problems (besides radiation therapy), the reader is referred to Sainfort et al. (2004).
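The following sketch sets up a toy seed-placement MIP of the same flavor with scipy.optimize.milp (SciPy 1.9 or later); the dose kernel, grid, and dose bounds are invented, and the objective of minimizing the number of seeds is one common choice rather than the exact objective of Lee and Zaider (2003).

```python
# Sketch: a tiny seed-placement MIP with dose-window constraints at each point.
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp
from scipy.spatial.distance import cdist

rng = np.random.default_rng(4)
grid = rng.uniform(0.0, 3.0, size=(15, 3))     # candidate seed locations X_j
points = rng.uniform(1.0, 2.0, size=(8, 3))    # dose calculation points P
dose = 1.0 / (0.25 + cdist(points, grid))      # toy D(||X_j - P||) kernel
L_P, U_P = 2.0, 6.0                            # dose bounds at every point P

c = np.ones(grid.shape[0])                     # minimize the number of seeds
dose_window = LinearConstraint(dose, lb=L_P, ub=U_P)
res = milp(c, constraints=[dose_window],
           integrality=np.ones(grid.shape[0]), bounds=Bounds(0, 1))
if res.success:
    print("seeds placed at grid indices:", np.flatnonzero(res.x > 0.5))
else:
    print("no feasible seed placement for this random instance")
```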
4. Medical imaging
Recent advances in imaging technologies, combined with marked improvements in instrumentation and the development of computer systems, have resulted in increasingly large amounts of information. New therapies place more requirements on the quality and accuracy of image information. Therefore, medical imaging plays an ever-increasing role in diagnosis, prediction, planning, and decision-making. Many problems in this field are addressed using optimization and mathematical programming techniques. In particular, specialized mathematical programming techniques have been used in a variety of domains including object recognition, modeling and retrieval, image segmentation, registration, skeletonization, reconstruction, classification, etc. (Cho et al., 1993; Kuba et al., 1999; Udupa, 1999; Rangarajan et al., 2003; Du et al., 1999). Some recent publications on imaging using optimization address the following problems: reconstruction methods in electron paramagnetic resonance imaging (Johnson et al., 2003), skeletonization of vascular images in magnetic resonance angiography (Nystrom and Smedby, 2001), image reconstruction using multi-objective optimization (Li et al., 2000), etc. Discrete tomography extensively utilizes discrete mathematics and optimization theory. A nice overview of medical applications of discrete tomography is given in Kuba et al. (1999).
5. Health care applications
Optimization and operations research methods are extensively used in a variety of problems in health care, including economic analysis (optimal pricing, demand forecasting and planning) and health care unit operations (scheduling and logistics planning, inventory management, supply chain management, quality management, facility location), etc.
Scheduling and logistics problems are among the most important and classical problems in optimization theory. One of the widely used applications of these problems in medicine is the so-called nurse scheduling problem. Nurse scheduling is a non-trivial and important task, because it affects the efficiency and the quality of health care (Giglio, 1991). The schedule has to determine the daily assignments of each nurse for a specified period of time while respecting certain constraints on hospital policy, personal preferences and qualifications, workload, etc. Because of the practical importance of these problems, many algorithms and methods for solving them have been proposed (Miller et al., 1976; Warner, 1976; Isken and Hancock, 1991; Siferd and Benton, 1992; Weil et al., 1995). There are basically two types of nurse scheduling problems: cyclical and non-cyclical scheduling. In cyclical scheduling, an individual nurse works in a pattern that repeats in a cycle of N weeks. On the other hand, non-cyclical scheduling generates a new scheduling period with the available resources and policies in an attempt to satisfy a given set of constraints. Recently, the problem of rerostering nurse schedules was addressed in Moz and Pato (2003). This problem is common in hospitals where daily work is divided into shifts. It occurs in the case of a non-scheduled absence of one of the nurses, which violates one of the constraints for the given time shift. In Moz and Pato (2003), an integer multicommodity flow model was applied to the aforementioned problem and the corresponding integer linear programming problem was formulated. Computational results were reported for real instances from a Lisbon state hospital. In general, optimization techniques, and linear programming in particular, are very powerful tools that can be used for many diverse problems in health care applications. For example, we can refer to Sewell and Jacobson (2003), where the problem of pricing combination vaccines for childhood immunization was addressed using an integer programming formulation. Other important problems in health care applications (inventory and queueing management, workforce and workload models, pricing, forecasting, etc.) are reviewed in Sainfort et al. (2004).
6. Concluding remarks
In this chapter, we have identified and briefly summarized some of the promising research directions in the exciting interdisciplinary area of optimization in medicine. Although this review is certainly not exhaustive, we have described several important practical problems arising in various medical applications, as well as methods and algorithms used for solving these problems.
As we have seen, applying optimization techniques in medicine can often significantly improve the quality of medical treatment. It is also important to note that this research area is constantly growing, since new techniques are needed to process and analyze the huge amounts of data arising in medical applications. Addressing these issues may involve a higher level of interdisciplinary effort in order to develop efficient optimization models combining mathematical theory and medical practice.
Acknowledgements
This work was partially supported by a grant from the McKnight Brain Institute of the University of Florida and by NIH.
References
Alexe, S., Blackstone, E., Hammer, P., Ishwaran, H., Lauer, M., and Snader, C.P. (2003). Coronary risk prediction by logical analysis of data. Annals of Operations Research, 119:15-42.
Billups, S. and Kennedy, J. (2003). Minimum-support solutions for radiotherapy. Annals of Operations Research, 119:229-245.
Boros, E., Hammer, P., Ibaraki, T., and Kogan, A. (1997). Logical analysis of numerical data. Mathematical Programming, 79:163-190.
Boros, E., Hammer, P., Ibaraki, T., Kogan, A., Mayoraz, E., and Muchnik, I. (2000). An implementation of logical analysis of data. IEEE Transactions on Knowledge and Data Engineering, 12:292-306.
Bradley, P., Fayyad, U., and Mangasarian, O. (1999). Mathematical programming for data mining: Formulations and challenges. INFORMS Journal on Computing, 11(3):217-238.
Bradley, P., Mangasarian, O., and Street, W. (1998). Feature selection via mathematical programming. INFORMS Journal on Computing, 10:209-217.
Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167.
Califf, R., Armstrong, P., Carver, J., D'Agostino, R., and Strauss, W. (1996). Stratification of patients into high, medium and low risk subgroups for purposes of risk factor management. Journal of the American College of Cardiology, 27(5):1007-1019.
Cho, Z., Jones, J., and Singh, M. (1993). Foundations of Medical Imaging. Wiley.
Du, D.-Z., Pardalos, P.M., and Wang, J. (eds.) (1999). Discrete Mathematical Problems with Medical Applications. DIMACS Workshop. American Mathematical Society.
Ekin, O., Hammer, P., and Kogan, A. (2000). Convexity and logical analysis of data. Theoretical Computer Science, 244:95-116.
Elger, C. and Lehnertz, K. (1998). Seizure prediction by non-linear time series analysis of brain electrical activity. The European Journal of Neuroscience, 10:786-789.
Ferris, M., Lim, J., and Shepard, D. (2003). Radiosurgery treatment planning via nonlinear programming. Annals of Operations Research, 119:247-260.
Giglio, R. (1991). Resource scheduling: From theory to practice. Journal of the Society for Health Systems, 2(2):2-6.
Hammer, P. (1986). Partially defined Boolean functions and cause-effect relationships. In: International Conference on Multi-Attribute Decision Making via OR-Based Expert Systems. University of Passau.
Hosmer, D. and Lemeshow, S. (1989). Applied Logistic Regression. Wiley.
Iasemidis, L., Pardalos, P., Sackellares, J., and Shiau, D.-S. (2001). Quadratic binary programming and dynamical system approach to determine the predictability of epileptic seizures. Journal of Combinatorial Optimization, 5:9-26.
Iasemidis, L., Principe, J., and Sackellares, J. (2000). Measurement and quantification of spatiotemporal dynamics of human epileptic seizures. In: M. Akay (ed.), Nonlinear Biomedical Signal Processing, Vol. II, pp. 294-318. IEEE Press.
Iasemidis, L. and Sackellares, J. (1990). Phase space topography of the electrocorticogram and the Lyapunov exponent in partial seizures. Brain Topography, 2:187-201.
Iasemidis, L. and Sackellares, J. (1991). The evolution with time of the spatial distribution of the largest Lyapunov exponent on the human epileptic cortex. In: D. Duke and W. Pritchard (eds.), Measuring Chaos in the Human Brain, pp. 49-82. World Scientific.
Isken, M. and Hancock, W. (1991). A heuristic approach to nurse scheduling in hospital units with non-stationary, urgent demand, and a fixed staff size. Journal of the Society for Health Systems, 2(2):24-41.
Jibu, M. and Yassue, K. (1995). Quantum Brain Dynamics and Consciousness: An Introduction. John Benjamins Publishing Company.
Johnson, C., McGarry, D., Cook, J., Devasahayam, N., Mitchell, J., Subramanian, S., and Krishna, M. (2003). Maximum entropy reconstruction methods in electron paramagnetic resonance imaging. Annals of Operations Research, 119:101-118.
Kuba, A., Herman, G., Matej, S., and Todd-Pokropek, A. (1999). Medical applications of discrete tomography. In: D.-Z. Du, P.M. Pardalos, and J. Wang (eds.), Discrete Mathematical Problems with Medical Applications, pp. 195-208. DIMACS Series, vol. 55. American Mathematical Society.
Lahanas, M., Baltas, D., and Zamboglou, N. (2003a). A hybrid evolutionary multiobjective algorithm for anatomy based dose optimization in HDR brachytherapy. Physics in Medicine and Biology, 48:399-415.
Lahanas, M., Schreibmann, E., and Baltas, D. (2003b). Multiobjective inverse planning for intensity modulated radiotherapy with constraint-free gradient-based optimization algorithms. Physics in Medicine and Biology, 48:2843-2871.
Lee, E., Deasy, J., Langer, M., Rardin, R., Deye, J., and Mahoney, F. (2003a). Final report - NCI/NSF Workshop on Operations Research Applied to Radiation Therapy. Annals of Operations Research, 119:143-146.
Lee, E., Fox, T., and Crocker, I. (2003b). Integer programming applied to intensity-modulated radiation therapy treatment planning. Annals of Operations Research, 119:165-181.
Lee, E. and Zaider, M. (2003). Mixed integer programming approaches to treatment planning for brachytherapy - Application to permanent prostate implants. Annals of Operations Research, 119:147-163.
Lee, Y.-J., Mangasarian, O., and Wolberg, W. (2000). Breast cancer survival and chemotherapy: A support vector machine analysis. In: D.-Z. Du, P.M. Pardalos, and J. Wang (eds.), Discrete Mathematical Problems with Medical Applications, pp. 1-9. DIMACS Series, vol. 55. American Mathematical Society.
Lehnertz, K. and Elger, C. (1998). Can epileptic seizures be predicted? Evidence from nonlinear time series analysis of brain electrical activity. Physical Review Letters, 80:5019-5022.
Li, X., Jiang, T., and Evans, D.J. (2000). Medical image reconstruction using a multi-objective genetic local search algorithm. International Journal of Computer Mathematics, 74:301-314.
Litt, B., Esteller, R., Echauz, J., Maryann, D., Shor, R., Henry, T., Pennell, P., Epstein, C., Bakay, R., Dichter, M., and Vachtsevanos, G. (2001). Epileptic seizures may begin hours in advance of clinical onset: A report of five patients. Neuron, 30:51-64.
Lodwick, W., McCourt, S., Newman, F., and Humphries, S. (1998). Optimization methods for radiation therapy plans. In: Computational Radiology and Imaging: Therapy and Diagnosis. IMA Series in Applied Mathematics. Springer.
Mangasarian, O., Street, W., and Wolberg, W. (1995). Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4):570-577.
Martinerie, J., Adam, C.V., and Quyen, M.L.V. (1998). Epileptic seizures can be anticipated by non-linear analysis. Nature Medicine, 4:1173-1176.
Miller, H., Pierskalla, W., and Rath, G. (1976). Nurse scheduling using mathematical programming. Operations Research, 24(8):857-870.
Moz, M. and Pato, M.V. (2003). An integer multicommodity flow model applied to the rerostering of nurse schedules. Annals of Operations Research, 119:285-301.
Nystrom, I. and Smedby, O. (2001). Skeletonization of volumetric vascular images - distance information utilized for visualization. Journal of Combinatorial Optimization, 5:27-41.
Pardalos, P., Chaovalitwongse, W., Iasemidis, L., Sackellares, J., Shiau, D.-S., Carney, P., Prokopyev, O.A., and Yatsenko, V.A. (2004a). Seizure warning algorithm based on optimization and nonlinear dynamics. Revised and resubmitted to Mathematical Programming.
Pardalos, P., Iasemidis, L., Shiau, D.-S., and Sackellares, J. (2002a). Combined application of global optimization and nonlinear dynamics to detect state resetting in human epilepsy. In: P. Pardalos and J. Principe (eds.), Biocomputing, pp. 140-158. Kluwer Academic Publishers.
Pardalos, P., Iasemidis, L., Shiau, D.-S., Sackellares, J., and Chaovalitwongse, W. (2003a). Prediction of human epileptic seizures based on optimization and phase changes of brain electrical activity. Optimization Methods and Software, 18(1):81-104.
Pardalos, P., Iasemidis, L., Shiau, D.-S., Sackellares, J., Chaovalitwongse, W., Principe, J., and Carney, P. (2003b). Adaptive epileptic seizure prediction system. IEEE Transactions on Biomedical Engineering, 50(5):616-626.
Pardalos, P., Iasemidis, L., Shiau, D.-S., Sackellares, J., Yatsenko, V., and Chaovalitwongse, W. (2003c). Analysis of EEG data using optimization, statistics, and dynamical systems techniques. Computational Statistics and Data Analysis, 44:391-408.
Pardalos, P. and Principe, J. (eds.) (2002). Biocomputing. Kluwer Academic Publishers.
Pardalos, P., Sackellares, J., Carney, P., and Iasemidis, L. (eds.) (2004b). Quantitative Neuroscience: Models, Algorithms, Diagnostics, and Therapeutic Applications. Kluwer Academic Publishers.
Pardalos, P., Sackellares, J., and Yatsenko, V. (2002b). Classical and quantum controlled lattices: Self-organization, optimization and biomedical applications. In: P. Pardalos and J. Principe (eds.), Biocomputing, pp. 199-224. Kluwer Academic Publishers.
Pardalos, P., Sackellares, J., Yatsenko, V., and Butenko, S. (2003d). Nonlinear dynamical systems and adaptive filters in biomedicine. Annals of Operations Research, 119:119-142.
Quyen, M.L.V., Martinerie, J., Baulac, M., and Varela, F. (1999). Anticipating epileptic seizures in real time by non-linear analysis of similarity between EEG recordings. Neuroreport, 10:2149-2155.
Rangarajan, A., Figueiredo, M., and Zerubia, J. (eds.) (2003). Energy Minimization Methods in Computer Vision and Pattern Recognition, 4th International Workshop, EMMCVPR 2003. Springer.
Sackellares, J., Iasemidis, L., Gilmore, R., and Roper, S. (2002). Epilepsy - when chaos fails. In: K. Lehnertz, J. Arnhold, P. Grassberger, and C. Elger (eds.), Chaos in the Brain?
Sainfort, F., Brandeau, M., and Pierskalla, W. (eds.) (2004). Handbook of Operations Research and Health Care. Kluwer Academic Publishers.
Sewell, E. and Jacobson, S. (2003). Using an integer programming model to determine the price of combination vaccines for childhood immunization. Annals of Operations Research, 119:261-284.
Shepard, D., Ferris, M., Olivera, G., and Mackie, T. (1999). Optimizing the delivery of radiation therapy to cancer patients. SIAM Review, 41(4):721-744.
Shiau, D., Luo, Q., Gilmore, S., Roper, S., Pardalos, P., Sackellares, J., and Iasemidis, L. (2000). Epileptic seizures resetting revisited. Epilepsia, 41/S7:208-209.
Siferd, S. and Benton, W. (1992). Workforce staffing and scheduling: Hospital nursing specific models. European Journal of Operational Research, 60:233-246.
Udupa, J. (1999). A study of 3D imaging approaches in medicine. In: D.-Z. Du, P.M. Pardalos, and J. Wang (eds.), Discrete Mathematical Problems with Medical Applications, pp. 209-216. DIMACS Series, vol. 55. American Mathematical Society.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer.
Warner, D. (1976). Scheduling nursing personnel according to nursing preference: A mathematical programming approach. Operations Research, 24(8):842-856.
Weil, G., Heus, K., Francois, P., and Poujade, M. (1995). Constraint programming for nurse scheduling. IEEE Engineering in Medicine and Biology, 14(4):417-422.
Chapter 9
GLOBAL OPTIMIZATION IN GEOMETRY - CIRCLE PACKING INTO THE SQUARE
Péter Gábor Szabó, Mihály Csaba Markót, Tibor Csendes
Abstract
The present review paper summarizes the research work, done mostly by the authors in recent years, on packing equal circles in the unit square.

1. Introduction
The problem of finding the densest packing of n equal objects in a bounded space is a classical one which arises in many scientific and engineering fields. For the two-dimensional case, it is a well-known problem of discrete geometry. The Hungarian mathematician Farkas Bolyai (1775-1856) published in his principal work ('Tentamen', 1832-33; Bolyai, 1904) a dense regular packing of equal circles in an equilateral triangle (see Figure 9.1). He defined an infinite packing series and investigated the limit of the vacuitas (the gap in the triangle outside the circles). It is interesting that these packings are not always optimal, in spite of the fact that they are based on hexagonal grid packings (Szabó, 2000a). Bolyai was probably the first author in the mathematical literature who studied the density of a series of circle packings in a bounded shape. Of course, the work of Bolyai was not the very first on packing circles. There were other interesting early packings in the fine arts, in religious relics, and in nature (Tarnai, 1997), too. The old Japanese sangaku problems (Fukagawa and Pedoe, 1989; Szabó, 2001) contain many nice results related to the packing of circles. Figure 9.2 shows an example of packing 6 equal circles in a rectangle.
Figure 9.1. The example of Bolyai for packing 19 equal circles in an equilateral triangle.
Figure 9.2. Packing of 6 equal circles in a rectangle on a rock from Japan.
The problem of finding the densest packing of n equal and non-overlapping circles has been studied for several shapes of the bounding region, e.g., a rectangle (Ruda, 1969), a triangle (Graham and Lubachevsky, 1995), and a circle (Graham et al., 1998). Our work focuses only on the 'Packing of Equal Circles in a Square' problem. The Hungarian mathematicians Dezső Lázár and László Fejes Tóth had already investigated the problem before 1940 (Staar, 1990; Szabó and Csendes, 2001). The problem first appeared in the literature in 1960, when Leo Moser (1960) guessed the optimal arrangement of 8 circles. Schaer and Meir (1965) proved this conjecture, and Schaer (1965) solved the n = 9 case, too. Schaer also gave a proof for n = 7 in a letter to Leo Moser in 1964, but he never published it. There is a similar unpublished result by R. Graham in a private letter for n = 6. Later, Schwartz (1970) and Melissen (1994) gave proofs for this case (up to n = 5 circles the problem is trivial). The next challenge was the n = 10 case. de Groot et al. (1990) solved it after many authors had published new and improved packings: Goldberg (1970); Milano (1987); Mollard and Payan (1990); Schaer (1971); Schlüter (1979) and Valette (1989). Some unpublished results are known in this case as well: Grünbaum (1990); Grannell (1990); Petris and Hungerbühler (1990). The proof is based on a computer-aided method, and nobody has published a proof using only pure mathematical tools. There is an interesting mathematical approach to this case in Hujter (1999). Peikert et al. (1992) found and proved optimal packings up to n = 20 using a computer-aided method. Based on theoretical tools only, G. Wengerodt solved the problem for n = 14, 16 and 25 (Wengerodt, 1983, 1987a,b), and with K. Kirchner for n = 36 (Kirchner and Wengerodt, 1987). In the last decades, several deterministic (Locatelli and Raber, 2002; Markót, 2003a; Markót and Csendes, 2004; Nurmela and Östergård, 1999a; Peikert et al., 1992) and stochastic (Boll et al., 2000; Casado et al., 2001; Graham and Lubachevsky, 1996) methods were published. Proven optimal packings are known up to n = 30 (Nurmela and Östergård, 1999a; Peikert et al., 1992; Markót, 2003a; Markót and Csendes, 2004) and for n = 36 (Kirchner and Wengerodt, 1987). Approximate packings (packings determined by computer-aided numerical computations without a rigorous proof) and candidate packings (best known arrangements with a proof of existence but without a proof of optimality) have been reported in the literature for up to n = 200: Boll et al. (2000); Casado et al. (2001); Graham and Lubachevsky (1996); Nurmela and Östergård (1997); Szabó and Specht (2005).
At the same time, some other results (e.g., repeated patterns, properties of the optimal solutions and bounds, minimal polynomials of packings) were published as well (Graham and Lubachevsky, 1996; Locatelli and Raber, 2002; Nurmela et al., 1999; Tarnai and Gáspár, 1995-96; Szabó, 2000b; Szabó et al., 2001; Szabó, 2004).
2. The packing circles in a square problem
The packing circles in a square problem can be described by the following equivalent problem settings:
PROBLEM 1 Find the value of the maximum circle radius, rn, such that n equal non-overlapping circles can be placed in a unit square.

PROBLEM 2 Locate n points in a unit square such that the minimum distance mn between any two points is maximal.

PROBLEM 3 Give the smallest square of side pn which contains n equal and non-overlapping circles, where the radius of the circles is 1.

PROBLEM 4 Determine the smallest square of side an that contains n points with mutual distances of at least 1.
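The equivalence of these settings rests on a simple rescaling argument; in particular, the optimal radius of Problem 1 and the optimal point separation of Problem 2 are usually related by rn = mn/(2(mn + 1)). The small helper below applies that commonly used relation (a sketch, assuming this standard formula).

```python
# Sketch: converting between the optimal point separation m_n (Problem 2) and
# the optimal circle radius r_n (Problem 1) via the standard rescaling.
import math

def radius_from_separation(m_n: float) -> float:
    return m_n / (2.0 * (m_n + 1.0))

def separation_from_radius(r_n: float) -> float:
    return 2.0 * r_n / (1.0 - 2.0 * r_n)

print(radius_from_separation(1.0))            # n = 4: gives r_4 = 0.25
print(radius_from_separation(math.sqrt(2)))   # n = 2: gives r_2 = (2 - sqrt(2))/2 / 2 ~ 0.2929
```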
2.1 Optimization models
The problem is on the one hand a geometrical problem and on the other hand a continuous global optimization problem. Problem 2 can be written concisely as a (2n + 1)-dimensional continuous nonlinear constrained (or max-min) global optimization problem in the following form:

max m
subject to  (xi − xj)² + (yi − yj)² ≥ m²,   1 ≤ i < j ≤ n,
            0 ≤ xi, yi ≤ 1,   i = 1, . . . , n,

where (xi, yi) denotes the position of the ith point.
This problem can be considered in the following ways:
a) as a DC programming problem (Horst and Thoai, 1999).
A DC (difference of convex functions) programming problem is a mathematical programming problem whose objective function can be described as the difference of two convex functions. The objective function of the above problem can be stated as the difference of two convex functions g and h.

b) or as an all-quadratic optimization problem.

The general form of an all-quadratic optimization problem (Raber, 1999) is

min  x^T Q^0 x + (d^0)^T x
subject to  x^T Q^l x + (d^l)^T x + c^l ≤ 0,   l = 1, . . . , p,
            x ∈ P,

where P is a polytope. For the circle packing problem we have Q^0 = 0, x^T = (x0, x1, . . . , x2n), and (d^0)^T = (−1, 0, . . . , 0), so the objective reduces to maximizing x0. Each quadratic constraint corresponds to a pair of points (l', l'') and requires that x0 not exceed the squared distance between them, so the matrix Q^l has entries +1 and −1 only at the positions of the coordinates of these two points and is zero elsewhere.
In this model, x0 is the minimal distance between the points. The coordinates of the ith point (1 ≤ i ≤ n) are (x2i−1, x2i).
These models are of interest as hard test instances for mathematical programming solvers. The investigations show that the effective approaches are those that utilize the geometrical properties of the problem.
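As an illustration of how the max-min model can be attacked with a general-purpose nonlinear programming solver, the sketch below solves Problem 2 for a small n with SLSQP. It yields locally optimal packings only (multiple random restarts are normally needed), and the instance size, starting point, and solver options are assumptions.

```python
# Sketch: the max-min point-scattering form of Problem 2 solved with SLSQP.
import numpy as np
from scipy.optimize import minimize

n = 5
rng = np.random.default_rng(5)

def neg_m(z):                       # z = [x_1, y_1, ..., x_n, y_n, m]; maximize m
    return -z[-1]

def pair_constraints(z):            # require squared pairwise distances >= m^2
    pts = z[:-1].reshape(n, 2)
    m = z[-1]
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(n, k=1)
    return d2[iu] - m ** 2

z0 = np.concatenate([rng.random(2 * n), [0.1]])
bounds = [(0, 1)] * (2 * n) + [(0, np.sqrt(2))]
res = minimize(neg_m, z0, bounds=bounds,
               constraints=[{"type": "ineq", "fun": pair_constraints}],
               method="SLSQP", options={"maxiter": 500})
# A single start may give only a local optimum; the proven m_5 is sqrt(2)/2 ~ 0.7071.
print("m_5 approx:", res.x[-1])
```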
3. Properties of optimal packings and bounds
Recently, Locatelli and Raber (2002) proved two engaging properties that must be satisfied by at least one optimal solution of Problem 2. These theorems state the intuitive fact that as many points as possible should be located along the boundary of the square.
THEOREM 9.1 (Locatelli and Raber, 2002) There always exists an optimal solution of Problem 2 such that at each vertex of the square one and only one of the following conditions holds: either at least one point of the optimal solution coincides with that vertex of the square, or two points of the optimal solution belong to the edges determined by that vertex and have a distance of mn, where mn denotes the minimal distance between the points in the optimal solution.

THEOREM 9.2 (Locatelli and Raber, 2002) There always exists an optimal solution of Problem 2 such that along each edge of the square there is no portion of the edge of length greater than or equal to twice the optimal distance mn which does not contain any point of the optimal solution.

Using two other, more general theorems we can give lower and upper bounds for mn.
THEOREM 9.3 (Hadwiger, 1944) Let X be a subset of the plane bounded by a Jordan curve, with area A(X). If Mn denotes the maximum of the minimal distance between n points in X, then Mn can be bounded from above by a quantity depending only on n and A(X).

THEOREM 9.4 (Folkman and Graham, 1969) Let X be a compact convex subset of the plane. The number of points in X with mutual distances of at least 1 can be at most

(2/√3) A(X) + (1/2) P(X) + 1,

where A(X) is the area and P(X) is the perimeter of X.
After a short calculation it can easily be shown that these inequalities yield lower and upper bounds for mn when X is the unit square. Using these bounds one may find that, as n tends to infinity,

lim (n→∞) √n · mn = √(2/√3),

thus mn behaves asymptotically like √(2/(√3 n)).
Szabó et al. (2001) provided another lower bound using regular patterns, and in Szabó et al. (2001) and Tarnai and Gáspár (1995-96) heuristic upper bounds were studied, based on the computation of the areas of the circles and the minimum gaps among the circles.
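As a quick numerical illustration of the asymptotics above, the snippet below compares the estimate √(2/(√3 n)) with a few classical proven optimal values of mn (the estimate is only asymptotic, so the agreement for small n is rough).

```python
# Sketch: comparing the asymptotic estimate sqrt(2/(sqrt(3)*n)) with some
# proven optimal values of m_n for the unit square.
import math

known = {2: math.sqrt(2), 4: 1.0, 5: math.sqrt(2) / 2, 9: 0.5}
for n, m_n in known.items():
    est = math.sqrt(2.0 / (math.sqrt(3.0) * n))
    print(f"n={n:2d}  m_n={m_n:.6f}  asymptotic estimate={est:.6f}")
```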
3.1 Computer aided approaches
In this subsection we give an overview of the most important earlier methods for finding approximate packings. Several strategies were used, e.g., a nonlinear programming solver (MINOS, Maranas et al., 1995) and the Cabri-Geometry software (Mollard and Payan, 1990). Unfortunately, these approaches were good only for small numbers of circles. Here we summarize some useful earlier approaches for finding approximate packings for higher n.
3.1.1 Energy function minimization.
By virtue of the fact that the minimum pairwise distance can be approximated by sums of inverse powers of the pairwise distances, the problem is relaxed to the minimization of

Σ (1 ≤ i < j ≤ n)  1 / dij^(2m)

for a fixed positive integer m, where dij denotes the distance between points i and j.
This objective function can be interpreted as a potential or energy function.
A physical analogon of this approach is to regard the points as electrical charges (all positive or all negative) which repulse each other. If the minimal distance between the charged particles increases, the corresponding value of the potential function decreases. Nurmela and Östergård (1997) used a similar energy function with large positive integer values of m, where λ is a scaling factor to prevent numerical overflows:

E = Σ (1 ≤ i < j ≤ n)  ( λ / dij² )^m.

Introducing xi = sin(x̃i) and yi = sin(ỹi), the problem is transformed into an unconstrained optimization problem in the variables x̃i, ỹi, while the coordinates of the centers of the circles automatically fulfill the constraints −1 ≤ xi ≤ 1, −1 ≤ yi ≤ 1. They published candidate packings up to 50 circles using a combination of Goldstein-Armijo backtracking line search and the Newton method for the optimization.
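A minimal sketch of this energy-minimization relaxation follows; the values of m and λ, the use of box bounds on the unit square instead of the sine substitution, and the choice of L-BFGS-B are assumptions made for illustration.

```python
# Sketch: minimizing E = sum_{i<j} (lambda / d_ij^2)^m over points in the unit
# square, then reading off the resulting minimum distance.
import numpy as np
from scipy.optimize import minimize

n, m, lam = 10, 20, 0.25          # illustrative parameter choices

def energy(z):
    pts = z.reshape(n, 2)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(n, k=1)
    return ((lam / d2[iu]) ** m).sum()

rng = np.random.default_rng(6)
res = minimize(energy, rng.random(2 * n), method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * (2 * n))
pts = res.x.reshape(n, 2)
d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
# The proven optimum m_10 is about 0.4213; a single run may stop at a local minimum.
print("approximate m_10:", d[np.triu_indices(n, 1)].min())
```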
Introducing the substitutions $x_i = \sin \tilde{x}_i$ and $y_i = \sin \tilde{y}_i$, the problem transforms into an unconstrained optimization problem in the variables $\tilde{x}_i$, $\tilde{y}_i$, where the coordinates of the centers of the circles fulfill the constraints $-1 \le x_i \le 1$, $-1 \le y_i \le 1$. They published candidate packings up to 50 circles using a combination of Goldstein-Armijo backtracking line search and the Newton method for the optimization.

3.1.2 Billiard simulation. The billiard simulation method is physically motivated too. Let us consider a random arrangement of the points. Draw equal circles around the points without overlapping. Each circle can be considered as a ball with an initial radius, moving direction, and speed. Start the balls and slowly increase their common radius. The swing of each ball during the process becomes less and less. The algorithm stops when the packing or a substructure of the packing becomes rigid. Using billiard simulation, Graham and Lubachevsky (1996) reported several candidate packings for up to 50 circles and for some values beyond.
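As a rough Python illustration of the energy-function relaxation described in Section 3.1.1: the concrete functional form $\sum_{i<j} (\lambda / d_{ij}^2)^m$, the parameter values, and the use of BFGS below are assumptions made only for this sketch; Nurmela and Östergård combined a Goldstein-Armijo backtracking line search with Newton's method.

import numpy as np
from scipy.optimize import minimize

def energy(u, n, lam=0.05, m=10):
    # Substitution x_i = sin(u_i), y_i = sin(v_i) keeps all centers in
    # [-1, 1]^2 without explicit bound constraints.
    pts = np.sin(u.reshape(n, 2))
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                 # squared pairwise distances
    iu = np.triu_indices(n, k=1)
    # Repulsive potential: a large exponent m heavily penalizes close pairs,
    # lam scales the terms to guard against numerical overflow.
    return ((lam / d2[iu]) ** m).sum()

def candidate_packing(n, seed=0):
    rng = np.random.default_rng(seed)
    u0 = rng.uniform(-np.pi / 2, np.pi / 2, size=2 * n)
    res = minimize(energy, u0, args=(n,), method="BFGS")
    return np.sin(res.x.reshape(n, 2))            # centers in [-1, 1]^2

print(candidate_packing(8))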
3.1.3 A perturbation method. Boll et al. (2000) used a stochastic algorithm which gave improved packings for n = 32, 37, 48, and 50. A brief outline of their method is:

Step 1: Consider n random points in the unit square.
Step 2: Define s = 0.25 as an initial value.
Step 3: For each point, (a) perturb the location of the center by s in the directions North, South, East, or West; (b) if during the movement the distance between the point and its nearest neighbor becomes greater, accept the new location of the point.
Step 4: Repeat Step 3 while movable points exist.
Step 5: Set s := s/1.5, and if s > 10^{-10} then continue with Step 3.

Using this simple algorithm, good candidate packings can be found after some millions of iterations. They have found unpublished approximate packings up to n = 200. Douglas Hanson, an 8th grade student from Texas, has recently improved some of them using Donovan's program (see http://www.packomania.com).
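A minimal Python sketch of this perturbation scheme follows; the clamping of candidate positions to the unit square and the tiny instance in the usage lines are illustrative assumptions.

import math, random

def perturbation_packing(n, seed=0):
    """Sketch of the stochastic scheme of Boll et al. (2000): a point is moved
    by +-s along the axes whenever this increases the distance to its nearest
    neighbour; the step s shrinks by a factor 1.5 down to 1e-10."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]

    def nearest(i, p):
        return min(math.dist(p, q) for j, q in enumerate(pts) if j != i)

    s = 0.25                                     # Step 2: initial step length
    while s > 1e-10:
        moved = True
        while moved:                             # Step 4: repeat while points move
            moved = False
            for i in range(n):                   # Step 3: try the four directions
                x, y = pts[i]
                for dx, dy in ((s, 0.0), (-s, 0.0), (0.0, s), (0.0, -s)):
                    cand = (min(1.0, max(0.0, x + dx)),
                            min(1.0, max(0.0, y + dy)))
                    if nearest(i, cand) > nearest(i, pts[i]):
                        pts[i] = cand
                        x, y = cand
                        moved = True
        s /= 1.5                                 # Step 5: shrink the step length
    return pts

pts = perturbation_packing(5)
print(min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]))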
3.1.4 TAMSASS-PECS. The TAMSASS-PECS (Threshold Accepting Modified Single Agent Stochastic Search for Packing Equal Circles in a Square) method is based on the Threshold Accepting global optimization technique and a modified SASS local optimization algorithm (Solis and Wets, 1981). The algorithm starts with a pseudorandom initial packing, a standard deviation and a threshold level. The algorithm improves the current solution by an iterative procedure. At every step it tries to find a better position of the current point using a local search. The stopping criterion is based on the value of the standard deviation, which is decreased at every iteration. The framework of the method is the Threshold Accepting approach. It is a close relative of the Simulated Annealing algorithms: it accepts every move that leads to a new approximate solution not much worse than the current one and rejects other moves. Using TAMSASS-PECS, Casado et al. (2001) reported approximate packings up to n = 100 and improved some earlier packings.

3.1.5 A deterministic approach based on LP-relaxation. The circle packing problem can be regarded as an all-quadratic optimization problem, i.e. an optimization problem with not necessarily convex quadratic constraints. The hardness is due to the large number of constraints. This approach provides a rectangular subdivision branch-and-bound algorithm. To give an upper bound at each node of the branch-and-bound tree, M. Locatelli and U. Raber used the special structure of the constraints and gave an LP-relaxation (Locatelli and Raber, 2002). They have found candidate packings for up to 39 circles, proving optimality within a given accuracy.

3.1.6 The MBS algorithm. The basic idea of the approach MBS (Modified Billiard Simulation) is as follows (Szabó and Specht, 2005): Distribute randomly n points inside the unit square and blow them up in a uniform manner. This can be done by incrementing the radii gradually from an initial value of $r_0$ (which is a safe
lower bound). In early stages of the process, when the distance between the small circles is much greater than their size and no collisions occur, there is no need to change their positions. As the circles grow, we have to deal with collisions (also between the circles and the boundaries). During the process, when the decrease is too small or the number of iterations is larger than a given number, the calculation stops. The efficiency of the MBS algorithm comes from a significant reduction of computational costs. The basic idea is as follows: it is not necessary to calculate and store the mutual distance between two circles if they are too far from each other and will never meet. For the numerical calculation the program uses two matrices, CCD and CED. Matrix CCD stores the adjacency between the objects themselves, and matrix CED holds the adjacency between the objects and the sides of the square. At the start, all matrix elements are set to NEAR, which means that only such pairs of circles will be checked during the calculation. When (after thousands of collisions) the mutual distance of a pair is great enough, the value is set to FAR, which means that this contact will never occur in later iterations. As the program runs, the cost of the subroutine which determines the contacts becomes less and less. It is useful to consider not only random arrangements for the initial packing but hexagonal or regular lattice packings too. Sometimes the relationship between the number of circles and the structure of the packing can provide a good initial configuration. The code and the found packings (up to n = 300) can be downloaded from the Packomania web-site: http://www.packomania.com/. In Table 9.1 we have summarized the numerical results of the known optimal packings.
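A simplified Python sketch of this NEAR/FAR bookkeeping follows. In MBS the CCD matrix starts with every pair marked NEAR and entries are switched to FAR during the run; the margin criterion used below for deciding that a pair can never collide again is an assumption made for illustration.

import itertools, math

NEAR, FAR = "NEAR", "FAR"

def init_pair_status(n):
    # At the start every pair of circles is a potential contact.
    return {pair: NEAR for pair in itertools.combinations(range(n), 2)}

def demote_far_pairs(points, radius, status, margin=3.0):
    # Pairs whose current gap exceeds margin * radius are assumed never to
    # meet in later iterations and are excluded from all further checks.
    for (i, j), flag in status.items():
        if flag == NEAR:
            gap = math.dist(points[i], points[j]) - 2.0 * radius
            if gap > margin * radius:
                status[(i, j)] = FAR

def pairs_to_check(status):
    # Only NEAR pairs enter the (expensive) collision-handling subroutine.
    return [pair for pair, flag in status.items() if flag == NEAR]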
4. Repeated patterns in packings
Sometimes, there is a connection between the structure of the packings and the number of circles. When the structure of a packing follows a kind of regularity (e.g., a lattice arrangement), the coordinates of the centers of the circles can easily be calculated; these structures are called patterns. It is easy to see the pattern when the number of circles is a square number ($n = k^2$, $1 \le k \le 6$). In this case, the circles are in a $k \times k$
Table 9.1. The numerical results for n = 2-30.

n  | exact $r_n$ | exact $m_n$ | approximate $m_n$
2  | $(2 - \sqrt{2})/2$ | $\sqrt{2}$ | 1.4142135624
3  | $(8 - 5\sqrt{2} + 4\sqrt{3} - 3\sqrt{6})/2$ | $\sqrt{6} - \sqrt{2}$ | 1.0352761804
4  | $1/4$ | $1$ | 1.0000000000
5  | $(\sqrt{2} - 1)/2$ | $\sqrt{2}/2$ | 0.7071067812
6  | $(6\sqrt{13} - 13)/46$ | $\sqrt{13}/6$ | 0.6009252126
7  | $(4 - \sqrt{3})/13$ | $4 - 2\sqrt{3}$ | 0.5358983849
8  | $(1 + \sqrt{2} - \sqrt{3})/4$ | $(\sqrt{6} - \sqrt{2})/2$ | 0.5176380902
9  | $1/6$ | $1/2$ | 0.5000000000
10 | (see separately) | (see separately) | 0.4212795440
11 | - | - | 0.3982073102
lattice arrangement (PAT1) and $m_n = 1/(k-1)$. This pattern gives the optimal solutions in the mentioned cases; however, if n = 49, then there exist denser packings (cf. Nurmela and Östergård, 1997). The patterns proposed by Nurmela and Östergård (1997) and Graham and Lubachevsky (1996) are summarized in Table 9.2. The fourth column of Table 9.2 gives those cases which ensure optimal packings for the patterns, while in the fifth column we find the ones with the best known packings. We will show examples of them in Figure 9.3.
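As a small illustration of the PAT1 pattern, the following sketch generates the $k \times k$ lattice arrangement for $n = k^2$ points together with its minimal distance $m_n = 1/(k-1)$:

def pat1(k):
    """Centers of the k x k lattice pattern PAT1 (k >= 2) for n = k**2
    points in the unit square; the minimal pairwise distance is 1/(k-1)."""
    step = 1.0 / (k - 1)
    centers = [(i * step, j * step) for i in range(k) for j in range(k)]
    return centers, step

centers, m = pat1(4)      # n = 16 points, m_16 = 1/3
print(len(centers), m)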
Table 9.2. Patterns for the optimal and for the currently best known arrangements.

Pattern | Optimal (k) | The best (k)
PAT1    | 2, 3, 4, 5, 6 |
PAT2    | 3, 4, 5 |
PAT3a   | 3, 4 |
PAT3b   | 5 |
PAT4    | 4 |
PAT5    | 2, 4, 5 |
Here $d = n\pi r_n^2$ denotes the density of the packing, c is the number of connections and f stands for the number of free circles. If $n = k^2 - 1$, then the pattern PAT2 can be recognized, and for $n = k^2 - 2$ the patterns PAT3a and PAT3b. These patterns are similar to PAT1, but in this case we remove 1 or 2 circles and press the remaining ones into their columns and rows. There exist 3 different optimal solutions for n = 24 (PAT2) and 4 different good packings for n = 34 (PAT3b), in both cases with the same radius values. PAT4 and PAT5 are patterns which represent the points (or centers of the circles) in a lattice arrangement. A generalized pattern of PAT5 is discussed by Szabó et al. (2001). After studying these patterns, we can recognize that there always exists a threshold number $k_0$ such that the patterns give the optimal or the currently best known packings up to this dimension, but beyond it these packings provide only lower bounds for the optimal values. It is an interesting question whether there is a universal pattern with an infinite packing series in which all packings are optimal. This is a natural question, originating from an analogous problem: find the densest packing of n equal circles in an equilateral triangle for $n = k(k+1)/2$, $k \ge 1$. In this case there exists an infinite series of optimal packings, see Lubachevsky et al. (1997). Here the circles are in the hexagonal arrangement (the centers of the circles are in a hexagonal grid), which is the densest packing of equal circles in the plane. A similar conjecture exists for the equal circles packing in a square problem (Nurmela et al., 1999; Szabó, 2000b). Consider the following recursive sequences ($k \ge 3$):
Figure 9.3. Examples for the repeated patterns
Figure 9.4. An example for a generalized pattern of PAT5 with 12 points.
Dividing the sides of the square into $a_k$ and $b_k$ equal parts, we obtain $a_k \times b_k$ rectangles. Put the first point into a corner of the square, then place the points in every second corner of the rectangles as in Figure 9.4. It is open for which values of $a_k$ and $b_k$ these packings are optimal.
An interesting number-theoretic statement is that when $a_k$ and $b_k$ are defined by the previous recursive series, then

a) $\lim_{k \to \infty} a_k / b_k = \sqrt{3}/3$, and

b) $\{a_k / b_k\}_{k \ge 1}$ is a subsequence of the convergents of the
$$\frac{\sqrt{3}}{3} = [0; 1, \overline{1, 2}] = \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{2 + \cdots}}}$$
continued fraction. Looking at the previous packing sequence, the number of circles is equal to
$$n = \frac{(a_k + 1)(b_k + 1)}{2}.$$
An explicit formula for n can also be given in terms of the quantities $A_k$ and $B_k$ arising from the closed form of the recursion, and the minimal distance is $m_n = \sqrt{1/a_k^2 + 1/b_k^2}$. Here the circles also approximate the hexagonal structure, but this alone, of course, does not prove optimality. An interesting packing sequence can be found for the densest packing of equal circles in a circle problem, in which $n = 3k(k+1)+1$. On the one hand, the hexagonal structure might be realized in this pattern as well, and it yields the most spread packings when n = 7, 19, 37 and 61. On the other hand, when n = 91, 127, 169, better ways of packing can be used (Lubachevsky and Graham, 1997).
5. Minimal polynomials of packings
Sometimes it is useful to have an algebraic description of a packing. The minimal polynomial is a polynomial of minimal degree whose first positive root is $m_n$. Sometimes it is easy to determine the minimal polynomial of a packing (e.g., when the packing is symmetric
Figure 9.5. The optimal packing of 10 circles/points in the unit square.
or contains optimal substructures, Szabó, 2004). But if the structure of an optimal packing is not symmetric and it does not contain an optimal substructure, then it is not trivial to calculate the minimal polynomial. In this case a possible way to determine the minimal polynomial is the following: Let us define a quadratic system of equations for the packing, where an equation reflects the fact that the distance of two points is $m_n$. To determine the minimal polynomial we have to eliminate all variables with the exception of $m_n$. Using Buchberger's algorithm (based on Gröbner bases) or another technique utilizing the resultant and a symbolic algebra system (e.g., Maple, Mathematica, CoCoA, Macaulay2, Singular, etc.) this can be done, but sometimes even this is hard. As an example, let us determine the minimal polynomial $p_{10}(m)$ for n = 10 (de Groot et al., 1990). The corresponding quadratic system of equations is the following:
The points $P_1$, $P_2$, $P_3$, $P_4$, $P_6$, $P_8$, $P_9$, and $P_{10}$ are on the sides of the square, thus $x_1 = x_4 = x_9 = y_2 = y_3 = 0$ and $x_6 = x_8 = y_9 = y_{10} = 1$. It is easy to see that $y_4 = y_1 + m$, $x_3 = x_2 + m$ and $y_8 = y_6 + m$, and $P_2P_3P_5P_6$ is a rhombus, thus $x_5 = 1 - m$ and $y_5 = y_6$. In the $P_4P_7P_9$ and $P_9P_7P_{10}$ isosceles triangles these equalities hold: $y_7 = (1 + y_1 + m)/2$ and $x_7 = x_{10}/2$. Using the previous observations, all variables are eliminated with the exception of $x_2$, $x_{10}$, $y_1$, $y_5$ and $m$. The system of equations is then reduced to the form ($y_1 \neq 0$):
Let us now determine the minimal polynomial with Maple 8 based on the Groebner package:
> with(Groebner): univpoly(m, [polynomials], {x2, y1, x10, y5, m});

The obtained minimal polynomial $p_{10}(m)$ is given in the following list, together with a list of the known minimal polynomials $p_n(m)$ ($2 \le n \le 100$).
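The same elimination step can also be sketched with an open-source computer algebra system. The system of equations below is a toy stand-in (two symmetric point pairs on the unit segment plus one structural relation), not the actual ten-point system; it only illustrates how a lexicographic Gröbner basis isolates a univariate polynomial in m.

from sympy import symbols, groebner

x2, y1, m = symbols('x2 y1 m')
# Toy quadratic system standing in for the packing equations.
eqs = [x2**2 + y1**2 - m**2,         # a pair of points at distance m
       (1 - x2)**2 + y1**2 - m**2,   # the mirrored pair at the same distance
       y1 - x2]                      # an extra structural relation
# A lexicographic basis with m ordered last eliminates x2 and y1; the basis
# element involving only m plays the role of the minimal polynomial.
gb = groebner(eqs, x2, y1, m, order='lex')
print([g for g in gb.exprs if g.free_symbols == {m}])   # -> [2*m**2 - 1] (up to scaling)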
6. A reliable computer-assisted optimization method for circle packing
The papers Markót (2000), Markót (2003a) and Markót and Csendes (2004) introduced a computer-aided technique for proving optimality of certain problem instances. In contrast to the earlier computer methods (see Section 1), the presented algorithm is based fully on interval arithmetic. Thus, our method is capable of overcoming the rounding and conversion problems occurring in finite precision floating point computations and in I/O routines.
6.1 Problem definition
We study the point packing problem, Problem 2, but with the square of the distances. Denote the set of points to be located by $\{(x_1, y_1), \dots, (x_n, y_n)\}$, all in $[0, 1]^2$. In the sequel we denote this point set briefly by $(x, y)$. Moreover, denote the square of the distance between the points $(x_i, y_i)$ and $(x_j, y_j)$ by $d_{ij}$. Then the objective function to be maximized is:
Prior to our investigation the optimal solutions of the cases n = 2, ..., 27 and 36 were known. Although a part of the optimality proofs were based on computer-assisted methods, those methods still used floating point arithmetic (with the exception of an interval based local result verification method of Nurmela and Östergård (1999a)).
6.2 Interval analysis
The description of the algorithm requires a brief survey of the basic interval definitions and properties (for more details see, e.g., Alefeld and Herzberger, 1983; Hansen, 1992; Moore, 1966): The set of compact intervals is denoted by $\mathbb{I}$, where for all $A \in \mathbb{I}$ the interval $A = [\underline{A}, \overline{A}] = \{a \in \mathbb{R} \mid \underline{A} \le a \le \overline{A}\}$. Here $\underline{A}, \overline{A} \in \mathbb{R}$ denote the lower and upper bounds of A, respectively. In the case of $\underline{A} = \overline{A}$ we call A a point interval. For a given set of reals $D \subseteq \mathbb{R}$, $\mathbb{I}(D)$ denotes the set of all intervals in D. The width of an interval is defined by $w(A) := \overline{A} - \underline{A}$. The real arithmetic operations can be extended to intervals by applying the general definition $A \circ B := \{a \circ b \mid a \in A, b \in B\}$, which can be calculated by the following formulas:
$A + B = [\underline{A} + \underline{B}, \overline{A} + \overline{B}]$, $A - B = [\underline{A} - \overline{B}, \overline{A} - \underline{B}]$, $A \cdot B = [\min\{\underline{A}\underline{B}, \underline{A}\overline{B}, \overline{A}\underline{B}, \overline{A}\overline{B}\}, \max\{\underline{A}\underline{B}, \underline{A}\overline{B}, \overline{A}\underline{B}, \overline{A}\overline{B}\}]$, and $A / B = A \cdot [1/\overline{B}, 1/\underline{B}]$ if $0 \notin B$.

Let $\varphi : D \subseteq \mathbb{R} \to \mathbb{R}$ be an elementary real function which is continuous on all intervals $A \in \mathbb{I}(D)$. Then the interval extension of the elementary function $\varphi$ is $\Phi : \mathbb{I}(D) \to \mathbb{I}$, $\Phi(A) := \{\varphi(a) \mid a \in A\}$. For a given function the corresponding interval extension can be calculated, e.g., by invoking monotonicity properties. A vector of n intervals is called an n-dimensional interval (or shortly, a box): $X = (X_1, X_2, \dots, X_n)$, $X \in \mathbb{I}^n$, and $X_i \in \mathbb{I}$ for $i = 1, 2, \dots, n$. For a given n-dimensional set $D \subseteq \mathbb{R}^n$ we denote the set of n-dimensional boxes in D by $\mathbb{I}(D)$. The extension of the basic arithmetic operations and elementary functions to multidimensional intervals is defined componentwise, similarly as for real vectors. In order to define interval extensions for compound real functions, we introduce the concept of interval inclusion functions. We call $F : \mathbb{I}(D) \to \mathbb{I}$ an inclusion function of $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$, if $f(X) = \{f(x) \mid x \in X\} \subseteq F(X)$ holds for all $X \in \mathbb{I}(D)$, where $f(X)$ denotes the range of f over X. Beyond the theoretical reliability of interval computations, the inclusion properties should be guaranteed also in the case when finite precision floating-point computer arithmetic is applied, i.e. the rounding errors should be controlled. This is usually done by the computational environment using exactly representable floating-point numbers (also called machine numbers) together with directed outward rounding procedures.
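A minimal Python sketch of these interval operations is given below. Outward (directed) rounding is deliberately omitted, so this toy class is not a rigorous implementation; a verified code would control rounding, e.g. with a library such as PROFIL mentioned in the references.

from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("0 in divisor interval")
        return self * Interval(1.0 / other.hi, 1.0 / other.lo)

    def width(self):
        return self.hi - self.lo

# Natural inclusion of f(x, y) = x*y - y over X = [0, 1], Y = [2, 3]:
X, Y = Interval(0.0, 1.0), Interval(2.0, 3.0)
print(X * Y - Y)   # Interval(lo=-3.0, hi=1.0), which encloses the true range [-3, 0]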
6.3 The optimization frame algorithm
We have applied an interval branch-and-bound optimization approach (see, e.g., Csallner et al., 2000; Csendes and Ratz, 1997; Hammer et al., 1993; Hansen, 1992; Kearfott, 1996; Markót et al., 2000; Ratschek and Rokne, 1988) designed for determining all the global maximizers of the general global optimization problem
$$\max_{z \in Z_0} f(z),$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuous objective function and $Z_0 \in \mathbb{I}^n$ is the search space. The main building blocks of the algorithm are basically the same as the steps of the classical B&B methods. We utilize the fact that interval arithmetic provides a general tool to compute guaranteed enclosures $F(Z)$ of the range of the objective function $f(z)$ over a box Z. At each iteration cycle, we choose a box Z from the list of boxes
(WorkList) waiting for further subdivision, and split it into sub-boxes $U^1, \dots, U^s$ (we used s = 2 in the present method). Then for all subintervals $U^i$ some shrinking tools, the so-called accelerating devices, are applied, which delete those parts of $U^i$ that cannot contain a global maximizer point. When the box $\hat U^i$ enclosing all the remaining parts of $U^i$ fulfills a certain termination criterion, we put $\hat U^i$ into the list of result boxes (ResultList); otherwise we store $\hat U^i$ for further processing in the WorkList. At each iteration we also try to update the best known lower bound $\tilde f$ of the global maximum value. $\tilde f$ is also called the cutoff value: we can delete all boxes $U^i$ from the WorkList for which $F(U^i) < \tilde f$ holds. The algorithm stops when the WorkList becomes empty: then the candidate boxes in the ResultList contain the enclosures of all the global maximizers, and moreover, the interval $[\tilde f, \max\{\overline{F(Z)} \mid Z \in \mathrm{ResultList}\}]$ encloses the global maximum value. In the following, we specify the algorithmic details by defining an inclusion function of (9.3) and introducing a special accelerating device. Note that already in the first phase of our study it turned out that the classical accelerating devices are not enough; we had to tune our algorithm by designing special interval-based tools utilizing the geometric properties of the problem class.
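A schematic Python sketch of this branch-and-bound frame is shown below for an unconstrained maximization problem. It reuses the toy Interval class from the previous sketch, omits the constraint handling and all accelerating devices, and assumes that F(box) is an inclusion function returning an Interval and f_mid(box) evaluates the objective at the box midpoint; all of these simplifications are assumptions of the sketch.

import heapq, itertools

def branch_and_bound(F, f_mid, box, tol=1e-6):
    """Toy interval B&B for maximization over a box (a list of Intervals)."""
    best = f_mid(box)                              # cutoff value (best known lower bound)
    counter = itertools.count()                    # tie-breaker for the heap
    work = [(-F(box).hi, next(counter), box)]      # WorkList, ordered by upper bound
    results = []                                   # ResultList
    while work:
        neg_ub, _, Z = heapq.heappop(work)
        if -neg_ub < best:                         # stale box: bound fell below the cutoff
            continue
        k = max(range(len(Z)), key=lambda i: Z[i].width())
        mid = 0.5 * (Z[k].lo + Z[k].hi)            # bisect the widest component (s = 2)
        for part in (Interval(Z[k].lo, mid), Interval(mid, Z[k].hi)):
            U = Z[:k] + [part] + Z[k + 1:]
            enc = F(U)
            if enc.hi < best:                      # cannot contain a global maximizer
                continue
            best = max(best, f_mid(U))             # try to improve the cutoff value
            if max(c.width() for c in U) < tol:
                results.append(U)
            else:
                heapq.heappush(work, (-enc.hi, next(counter), U))
    return best, results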
6.4 Introducing an interval inclusion function
Markót (2000) gives a non-trivial interval inclusion function of the objective function (9.3):
THEOREM 9.5 (MARKÓT, 2000, SLIGHTLY MODIFIED) Assume that $(X, Y) \subseteq [0, 1]^{2n}$, and let $D_{ij}$, $a \in \mathbb{R}$, and $b := \min \dots$
6.5 The method of active areas
This method played a key role in the earlier theoretical and computer-aided optimality proofs, e.g., in de Groot et al. (1992); de Groot et al. (1990); Kirchner and Wengerodt (1987); Locatelli and Raber (2002); Nurmela and Östergård (1999a,b); Peikert et al. (1992). The essence of the method is the following: assume that we have a lower bound $f_0$ for the maximum of the minimal pairwise distances. Let $C = (C_1, \dots, C_n)$,
$C_i \subseteq [0, 1]^2$, $i = 1, \dots, n$, be the currently investigated search set (with a suitable representation), where $C_i$ contains the ith point of all packing configurations in C. Then, from each component $C_i$ we can iteratively delete those points which have a distance smaller than $f_0$ to all points of the remaining region of another component. Figure 9.6 shows an example of eliminating a region (the shaded polygon) from polygon B using polygon A, assuming exact computations. In the first version of our interval approach (Markót, 2000) the remaining (active) region of each component was approximated by a rectangle (or unions of rectangles obtained after a horizontal and/or vertical quantization) during the basic elimination step. This matches the idea proposed by de Groot et al. and by Nurmela and Östergård. However, our algorithm variant using this device was only able to confirm the local optimality of the earlier found optimal packings. Instead of representing the remaining regions simply by unions of cells, Nurmela and Östergård (1999a) approximated the remaining sets by polygons. The proposed procedure raises several problems when using floating point computations. As a solution, in Markót and Csendes (2004) we developed a reliable version of this polygonal approach using interval arithmetic. This method proved to be the most efficient accelerating test of the present algorithm.
6.6 The method of handling free circles
Figure 9.6. Approximating the active regions by polygons (with exact arithmetic).

The efficient way of handling free circles in the optimal solution (or equivalently, handling free points in the corresponding point packing
problem) is crucial when circle packing problems are solved with interval algorithms, since free circles pose a positive measure, continuum set of equivalent global optimizers. The simple method below shows a suitable way to overcome this difficulty. The basic idea is that, under certain conditions, some remaining regions can temporarily be replaced by machine points, i.e., by pairs of two machine numbers, without losing any global optimizers.
1. Let $(X, Y) \in \mathbb{I}^{2n}$ enclose all the remaining boxes (stored either in the WorkList or in the ResultList) after a certain number of iteration loops when executing the B&B algorithm. Let $\tilde f$ be the current cutoff value.
2. Assume that there exist machine points $p_{k_1}, \dots, p_{k_t}$, $p_{k_s} \in (X_{k_s}, Y_{k_s})$, $s \in \{1, \dots, t\}$, within t different components of (X, Y) such that
holds for all $s \in \{1, \dots, t\}$ and for all $j \ne k_s$, $j \in \{1, \dots, n\}$. Let K denote the index set $\{k_1, \dots, k_t\}$.
3. Replace the components $(X_i, Y_i)$ with the point intervals $p_i$ for each $i \in K$. Run the B&B algorithm on the resulting box $(X', Y')$, ignoring the step of improving $\tilde f$, and stop it after a certain number of iterations.
4. Let $(X'', Y'') \in \mathbb{I}^{2n}$ include all the remaining boxes. The output box of the procedure is then given by $(X_i, Y_i)$ for $i \in K$ and by $(X_j'', Y_j'')$ for $j \notin K$.
THEOREM 9.6 (MARKÓT AND CSENDES, 2004) The above procedure is correct in the sense that all the optimal solutions in $(X, Y)$ are also contained in the output box.
6.7 Investigating subsets of tile combinations
In order to avoid (a part of) the extra amount of work caused by geometrically equivalent packing configurations and to restrict the application of the method of active areas to a local investigation, most computer methods for circle packing include a preprocessing procedure called tiling: Assume that a lower bound $\tilde f$ for the maximum value of the considered point packing problem instance is given. Split the unit square into regions (tiles) in such a way that the square of the distance between any two points within each tile is less than $\tilde f$ (or the distance between any
two points within each tile is less than the $f_0$ value of Section 6.5). Then, for a feasible solution having an objective function value greater than or equal to $\tilde f$, each tile can obviously contain at most one point of this solution. The optimal packings can then be found by running the search procedure on all possible tile combinations. Prior to the results of the present studies, the main problem when solving circle packing problem instances for n > 27, n ≠ 36, was the rapidly increasing number of initial tile combinations. For n = 28, a sequential process on those combinations would have required about 1000 times more processor time (several decades) even with non-interval computations, compared to the case of n = 27. The idea behind the newly proposed method is that we can utilize the local relations (patterns) between the tiles and eliminate groups of tile combinations together. Let us denote a generalized point packing problem instance by $P(n, X_1, \dots, X_n, Y_1, \dots, Y_n)$, where n is the number of points to be located, $(X_i, Y_i) \in \mathbb{I}^2$, $i = 1, \dots, n$, are the components of the starting box, and the objective function of the problem is given by (9.3). The theorem below shows how to apply a result achieved on a 2m-dimensional packing problem to a 2n-dimensional problem with $n \ge m \ge 2$.
THEOREM 9.7 (MARKÓT AND CSENDES, 2004) Assume that $n \ge m \ge 2$ are integers and let
$$P_m = P(m, Z_1, \dots, Z_m, W_1, \dots, W_m) = P(m, (Z, W))$$
and
$$P_n = P(n, X_1, \dots, X_n, Y_1, \dots, Y_n) = P(n, (X, Y))$$
be point packing problem instances ($X_i, Y_i, Z_i, W_i \in \mathbb{I}$; $X_i, Y_i, Z_i, W_i \subseteq [0, 1]$). Run the B&B algorithm on $P_m$ using an $\tilde f$ cutoff value in the accelerating devices but skipping the step of improving $\tilde f$. Stop after an arbitrary preset number of iteration steps. Let $(Z_1', \dots, Z_m', W_1', \dots, W_m') := (Z', W')$ be the enclosure of all the elements placed on the WorkList and on the ResultList. Assume that there exists an invertible, distance-preserving geometric transformation $\varphi$ with $\varphi(Z_i) = X_i$ and $\varphi(W_i) = Y_i$ for $i = 1, \dots, m$. Then for each point packing $(x, y)$ with $(x, y) \in (X, Y)$ and $f_n(x, y) \ge \tilde f$, the statement
$$(x, y) \in (\varphi(Z_1'), \dots, \varphi(Z_m'), X_{m+1}, \dots, X_n, \varphi(W_1'), \dots, \varphi(W_m'), Y_{m+1}, \dots, Y_n) := (X', Y')$$
also holds.
Figure 9.7. The idea behind processing tile combinations.
The meaning of Theorem 9.7 is the following: assume that we are able to reduce some search regions on a tile set S'. When processing a higher dimensional subproblem on a tile set S containing the image of the tile set of the smaller problem, it is enough to consider the image of the remaining regions of S' as the particular components of the latter problem. Figure 9.7 illustrates the application of the idea of handling sets of tile combinations: the remaining regions of the tile combinations S and S' are given by the shaded areas. The transformation $\varphi$ is a reflection with respect to the horizontal centerline of the rectangular region enclosing S'.
COROLLARY 9.1 (MARKÓT AND CSENDES, 2004) Let $\varphi$ be the identity transformation and assume that the B&B algorithm terminates with an empty WorkList and with an empty ResultList, i.e., the whole search region $(Z, W) = (Z_1, \dots, Z_m, W_1, \dots, W_m) = (X_1, \dots, X_m, Y_1, \dots, Y_m)$ is eliminated by the accelerating devices using (the same) $\tilde f$. Then $(X, Y)$ does not contain any vectors $(x, y) \in \mathbb{R}^{2n}$ for which $f_n(x, y) \ge \tilde f$ holds.
6.8 Tile algorithms used in the optimality proofs
The method of the optimality proofs is started by finding feasible tile patterns and their remaining areas on some small subsets of the whole set of tiles. Then bigger and bigger subsets are processed while using the results of the previous steps. Thus, the whole method consists of several phases. The two basic procedures are:
Grow(): add tiles from a new column to each element of a set of tile combinations.

Join(): join the elements of two sets of tile combinations pairwise.

The detailed description of Join() and Grow() and the strategy of increasing the dimensionality of the subproblems can be found in Markót and Csendes (2004).
6.9 Numerical results: optimal packings for n = 28, 29, 30

The results obtained with the multiphase interval arithmetic based optimality proofs are summarized below: Apart from symmetric cases, one initial tile combination (more precisely, the remaining areas of the particular combination) contains all the global optimal solutions of the packing problem of n points. The guaranteed enclosures of the global maximum values of Problem 2 are
$F^*_{28} = [0.2305354936426673, 0.2305354936426743]$, $w(F^*_{28}) \approx 7 \cdot 10^{-15}$,
$F^*_{29} = [0.2268829007442089, 0.2268829007442240]$, $w(F^*_{29}) \approx 2 \cdot 10^{-14}$,
$F^*_{30} = [0.2245029645310881, 0.2245029645310903]$, $w(F^*_{30}) \approx 2 \cdot 10^{-15}$.
The exact global maximum value differs from the currently best known function value by at most $w(F^*_n)$. Apart from symmetric cases, all the global optimizers of the problem of packing n points are located in an $(X, Y)^*_n$ box (see Markót and Csendes, 2004). The components of the result boxes have widths of approximately $10^{-12}$ (with the exception of the components enclosing possibly free points). The differences between the volume of the whole search space and the result boxes are more than 711, 764, and 872 orders of magnitude, respectively. The total computational time was approximately 53, 50, and 20 hours, respectively. The total time complexities are remarkably less than the forecasted execution times of the predecessor methods.
6.10 Optimality of the conjectured best structures
An optimal packing structure specifies which points are located on the sides of the square, which pairs of points are at the minimal distance, and which points of the packing can move while keeping optimality. The output of our methods serves only as a numerical approximation to the solution of the particular problems, but it says nothing about the structure of the optimal packing(s). Extending the ideas given in Nurmela and Östergård
(1999a) to an interval-based context, in a forthcoming paper we intend to prove also some structural properties of the global optimizers (for details see Markót, 2003b).
Acknowledgments. The authors are grateful for all the help given by colleagues for the underlying research. This work was supported by the Grants OTKA T 016413, T 017241, OTKA T 034350, FKFP 0739/97, and by the Grants OMFB D-30/2000 and OMFB E-24/2001.
References

Alefeld, G. and Herzberger, J. (1983). Introduction to Interval Computations. Academic Press, New York.
Althöfer, I. and Koschnick, K.U. (1991). On the convergence of threshold accepting. Applied Mathematics and Optimization, 24:183-195.
Ament, P. and Blind, G. (2000). Packing equal circles in a square. Studia Scientiarum Mathematicarum Hungarica, 36:313-316.
Boll, D.W., Donovan, J., Graham, R.L., and Lubachevsky, B.D. (2000). Improving dense packings of equal disks in a square. Electronic Journal of Combinatorics, 7:R46.
Bolyai, F. (1904). Tentamen Juventutem Studiosam in Elementa Matheseos Purae, Elementaris Ac Sublimioris, Methodo Intuitiva, Evidentiaque Huic Propria, Introducendi, Volume 2, Second edition, pp. 119-122.
Casado, L.G., García, I., and Sergeyev, Ya.D. (2000). Interval branch and bound algorithm for finding the first-zero-crossing-point in one-dimensional functions. Reliable Computing, 6:179-191.
Casado, L.G., García, I., Szabó, P.G., and Csendes, T. (2001). Packing equal circles in a square. II. New results for up to 100 circles using the TAMSASS-PECS stochastic algorithm. In: Optimization Theory: Recent Developments from Mátraháza, pp. 207-224. Kluwer, Dordrecht.
Croft, H.T., Falconer, K.J., and Guy, R.K. (1991). Unsolved Problems in Geometry, pp. 108-110. Springer, New York.
Csallner, A.E., Csendes, T., and Markót, M.Cs. (2000). Multisection in interval methods for global optimization. I. Theoretical results. Journal of Global Optimization, 16:371-392.
Csendes, T. (1988). Nonlinear parameter estimation by global optimization -- efficiency and reliability. Acta Cybernetica, 8:361-370.
Csendes, T. and Ratz, D. (1997). Subdivision direction selection in interval methods for global optimization. SIAM Journal on Numerical Analysis, 34:922-938.
Du, D.Z. and Pardalos, P.M. (1995). Minimax and Applications. Kluwer, Dordrecht.
Dueck, G. and Scheuer, T. (1990). Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealing. Journal of Computational Physics, 90:161-175.
Fejes Tóth, G. (1997). Handbook of Discrete and Computational Geometry. CRC Press, Boca Raton.
Fejes Tóth, L. (1972). Lagerungen in der Ebene, auf der Kugel und im Raum. Springer-Verlag, Berlin.
Fodor, F. (1999). The densest packing of 19 congruent circles in a circle. Geometriae Dedicata, 74:139-145.
Folkman, J.H. and Graham, R.L. (1969). A packing inequality for compact convex subsets of the plane. Canadian Mathematical Bulletin, 12:745-752.
Fukagawa, H. and Pedoe, D. (1989). Japanese Temple Geometry Problems. San Gaku. Charles Babbage Research Centre, Winnipeg.
Goldberg, M. (1970). The packing of equal circles in a square. Mathematics Magazine, 43:24-30.
Goldberg, M. (1971). Packing of 14, 16, 17 and 20 circles in a circle. Mathematics Magazine, 44:134-139.
Graham, R.L. and Lubachevsky, B.D. (1995). Dense packings of equal disks in an equilateral triangle from 22 to 34 and beyond. Electronic Journal of Combinatorics, 2:A1.
Graham, R.L. and Lubachevsky, B.D. (1996). Repeated patterns of dense packings of equal circles in a square. Electronic Journal of Combinatorics, 3:R17.
Graham, R.L., Lubachevsky, B.D., Nurmela, K.J., and Östergård, P.R.J. (1998). Dense packings of congruent circles in a circle. Discrete Mathematics, 181:139-154.
Grannell, M. (1990). An Even Better Packing of Ten Equal Circles in a Square. Manuscript.
de Groot, C., Monagan, M., Peikert, R., and Würtz, D. (1992). Packing circles in a square: review and new results. In: System Modeling and Optimization, pp. 45-54. Lecture Notes in Control and Information Sciences, vol. 180.
de Groot, C., Peikert, R., and Würtz, D. (1990). The Optimal Packing of Ten Equal Circles in a Square. IPS Research Report No. 90-12, Eidgenössische Technische Hochschule, Zürich.
Grünbaum, B. (1990). An Improved Packing of Ten Circles in a Square. Manuscript.
Hadwiger, H. (1944). Über extremale Punktverteilungen in ebenen Gebieten. Mathematische Zeitschrift, 49:370-373.
Hammer, R., Hocks, M., Kulisch, U., and Ratz, D. (1993). Numerical Toolbox for Verified Computing. I. Springer-Verlag, Berlin.
Hansen, E. (1992). Global Optimization Using Interval Analysis. Marcel Dekker, New York.
van Hentenryck, P., McAllester, D., and Kapur, D. (1997). Solving polynomial systems using a branch and prune approach. SIAM Journal on Numerical Analysis, 34:797-827.
Horst, R. and Thoai, N.V. (1999). D.C. programming: Overview. Journal of Optimization Theory and Applications, 103:1-43.
Hujter, M. (1999). Some numerical problems in discrete geometry. Computers and Mathematics with Applications, 38:175-178.
Karnopp, D.C. (1963). Random search techniques for optimization problems. Automatica, 1:111-121.
Kearfott, R.B. (1996). Test results for an interval branch and bound algorithm for equality-constrained optimization. In: Computational Methods and Applications, pp. 181-200. Kluwer, Dordrecht.
Kirchner, K. and Wengerodt, G. (1987). Die dichteste Packung von 36 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 25:147-159.
Knüppel, O. (1993a). PROFIL -- Programmer's Runtime Optimized Fast Interval Library. Bericht 93.4, Technische Universität Hamburg-Harburg.
Knüppel, O. (1993b). A Multiple Precision Arithmetic for PROFIL. Bericht 93.6, Technische Universität Hamburg-Harburg.
Kravitz, S. (1967). Packing cylinders into cylindrical containers. Mathematics Magazine, 40:65-71.
Locatelli, M. and Raber, U. (1999). A deterministic global optimization approach for solving the problem of packing equal circles in a square. In: International Workshop on Global Optimization (GO.99), Firenze.
Locatelli, M. and Raber, U. (2002). Packing equal circles in a square: A deterministic global optimization approach. Discrete Applied Mathematics, 122:139-166.
Lubachevsky, B.D. (1991). How to simulate billiards and similar systems. Journal of Computational Physics, 94:255-283.
Lubachevsky, B.D. and Graham, R.L. (1997). Curved hexagonal packings of equal disks in a circle. Discrete and Computational Geometry, 18:179-194.
Lubachevsky, B.D., Graham, R.L., and Stillinger, F.H. (1997). Patterns and structures in disk packings. Periodica Mathematica Hungarica, 34:123-142.
Lubachevsky, B.D. and Stillinger, F.H. (1990). Geometric properties of random disk packings. Journal of Statistical Physics, 60:561-583.
Maranas, C.D., Floudas, C.A., and Pardalos, P.M. (1998). New results in the packing of equal circles in a square. Discrete Mathematics, 128:187-193.
Markót, M.Cs. (2000). An interval method to validate optimal solutions of the "packing circles in a unit square" problems. Central European Journal of Operational Research, 8:63-78.
Markót, M.Cs. (2003a). Optimal packing of 28 equal circles in a unit square -- the first reliable solution. Numerical Algorithms, 37:253-261.
Markót, M.Cs. (2003b). Reliable Global Optimization Methods for Constrained Problems and Their Application for Solving Circle Packing Problems (in Hungarian). Ph.D. dissertation, Szeged. Available at http://www.inf.u-szeged.hu/~markot/phdmm.ps.gz
Markót, M.Cs. and Csendes, T. (2004). A new verified optimization technique for the "packing circles in a unit square" problems. Forthcoming in SIAM Journal on Optimization.
Markót, M.Cs., Csendes, T., and Csallner, A.E. (2000). Multisection in interval methods for global optimization. II. Numerical tests. Journal of Global Optimization, 16:219-228.
Matyas, J. (1965). Random optimization. Automation and Remote Control, 26:244-251.
McDonnell, J.R. and Waagen, D. (1994). Evolving recurrent perceptrons for time-series modeling. IEEE Transactions on Neural Networks, 5:24-38.
Melissen, J.B.M. (1993). Densest packings for congruent circles in an equilateral triangle. American Mathematical Monthly, 100:916-925.
Melissen, J.B.M. (1994a). Densest packing of six equal circles in a square. Elemente der Mathematik, 49:27-31.
Melissen, J.B.M. (1994b). Densest packing of eleven congruent circles in a circle. Geometriae Dedicata, 50:15-25.
Melissen, J.B.M. (1994c). Optimal packings of eleven equal circles in an equilateral triangle. Acta Mathematica Hungarica, 65:389-393.
Melissen, J.B.M. and Schuur, P.C. (1995). Packing 16, 17 or 18 circles in an equilateral triangle. Discrete Mathematics, 145:333-342.
Milano, R. (1987). Configurations optimales de disques dans un polygone régulier. Mémoire de licence, Université Libre de Bruxelles.
Mollard, M. and Payan, C. (1990). Some progress in the packing of equal circles in a square. Discrete Mathematics, 84:303-307.
Moore, R.E. (1966). Interval Analysis. Prentice-Hall, Englewood Cliffs.
Moser, L. (1960). Problem 24 (corrected). Canadian Mathematical Bulletin, 8:78.
Neumaier, A. (2001). Introduction to Numerical Analysis. Cambridge Univ. Press, Cambridge.
Nurmela, K.J. (1993). Constructing Combinatorial Designs by Local Search. Series A: Research Reports 27, Digital Systems Laboratory, Helsinki University of Technology.
Nurmela, K.J. and Östergård, P.R.J. (1997). Packing up to 50 equal circles in a square. Discrete and Computational Geometry, 18:111-120.
Nurmela, K.J. and Östergård, P.R.J. (1999a). More optimal packings of equal circles in a square. Discrete and Computational Geometry, 22:439-457.
Nurmela, K.J. and Östergård, P.R.J. (1999b). Optimal packings of equal circles in a square. In: Y. Alavi, D.R. Lick, and A. Schwenk (eds.), Combinatorics, Graph Theory, and Algorithms, pp. 671-680.
Nurmela, K.J., Östergård, P.R.J., and aus dem Spring, R. (1999). Asymptotic behaviour of optimal circle packings in a square. Canadian Mathematical Bulletin, 42:380-385.
Oler, N. (1961a). An inequality in the geometry of numbers. Acta Mathematica, 105:19-48.
Oler, N. (1961b). A finite packing problem. Canadian Mathematical Bulletin, 4:153-155.
Peikert, R. (1994). Dichteste Packungen von gleichen Kreisen in einem Quadrat. Elemente der Mathematik, 49:16-26.
Peikert, R., Würtz, D., Monagan, M., and de Groot, C. (1992). Packing circles in a square: A review and new results. In: P. Kall (ed.), System Modelling and Optimization, pp. 45-54. Lecture Notes in Control and Information Sciences, vol. 180. Springer-Verlag, Berlin.
Petris, J. and Hungerbühler, N. (1990). Manuscript.
Pirl, U. (1969). Der Mindestabstand von n in der Einheitskreisscheibe gelegenen Punkten. Mathematische Nachrichten, 40:111-124.
Raber, U. (1999). Nonconvex All-Quadratic Global Optimization Problems: Solution Methods, Application and Related Topics. Ph.D. thesis, University of Trier.
Rao, S.S. (1978). Optimization Theory and Applications. John Wiley and Sons, New York.
Ratschek, H. and Rokne, J. (1988). New Computer Methods for Global Optimization. Ellis Horwood, Chichester.
Reis, G.E. (1975). Dense packings of equal circles within a circle. Mathematics Magazine, 48:33-37.
Ruda, M. (1969). Packing circles in a rectangle (in Hungarian). Magyar Tudományos Akadémia Matematikai és Fizikai Tudományok Osztályának Közleményei, 19:73-87.
Schaer, J. (1965). The densest packing of nine circles in a square. Canadian Mathematical Bulletin, 8:273-277.
Schaer, J. (1971). On the densest packing of ten equal circles in a square. Mathematics Magazine, 44:139-140.
Schaer, J. and Meir, A. (1965). On a geometric extremum problem. Canadian Mathematical Bulletin, 8:21-27.
Schlüter, K. (1979). Kreispackung in Quadraten. Elemente der Mathematik, 34:12-14.
Schwartz, B.L. (1970). Separating points in a square. Journal of Recreational Mathematics, 3:195-204.
Solis, F.J. and Wets, J.B. (1981). Minimization by random search techniques. Mathematics of Operations Research, 6:19-50.
E. Specht's packing web site: http://www.packomania.com
Specht, E. and Szabó, P.G. (2004). Lattice and Near-Lattice Packings of Equal Circles in a Square. In preparation.
Staar, Gy. (1990). The Lived Mathematics (in Hungarian). Gondolat, Budapest.
Szabó, P.G. (2000a). Optimal packings of circles in a square (in Hungarian). Polygon, X:48-64.
Szabó, P.G. (2000b). Some new structures for the "equal circles packing in a square" problem. Central European Journal of Operations Research, 8:79-91.
Szabó, P.G. (2001). Sangaku -- Wooden boards of mathematics in Japanese temples (in Hungarian). KöMaL, 7:386-388.
Szabó, P.G. (2004). Optimal substructures in optimal and approximate circle packings. Forthcoming in Beiträge zur Algebra und Geometrie.
Szabó, P.G. and Csendes, T. (2001). Dezső Lázár and the densest packing of equal circles in a square problem (in Hungarian). Magyar Tudomány, 8:984-985.
Szabó, P.G., Csendes, T., Casado, L.G., and García, I. (2001). Packing equal circles in a square. I. Problem setting and bounds for optimal solutions. In: Optimization Theory: Recent Developments from Mátraháza, pp. 191-206. Kluwer, Dordrecht.
Szabó, P.G. and Specht, E. (2005). Packing up to 200 Equal Circles in a Square. Submitted for publication.
Tarnai, T. (1997). Packing of equal circles in a circle. In: Structural Morphology: Toward the New Millennium, pp. 217-224. The University of Nottingham, Nottingham.
Tarnai, T. and Gáspár, Zs. (1995-96). Packing of equal circles in a square. Acta Technica Academiae Scientiarum Hungaricae, 107(1-2):123-135.
Valette, G. (1989). A better packing of ten circles in a square. Discrete Mathematics, 76:57-59.
Wengerodt, G. (1983). Die dichteste Packung von 16 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 16:173-190.
Wengerodt, G. (1987a). Die dichteste Packung von 14 Kreisen in einem Quadrat. Beiträge zur Algebra und Geometrie, 25:25-46.
Wengerodt, G. (1987b). Die dichteste Packung von 25 Kreisen in einem Quadrat. Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae, Sectio Mathematica, 30:3-15.
Würtz, D., Monagan, M., and Peikert, R. (1994). The history of packing circles in a square. Maple Technical Newsletter, 0:35-42.
Chapter 10
A DETERMINISTIC GLOBAL OPTIMIZATION ALGORITHM FOR DESIGN PROBLEMS

Frédéric Messine

Abstract    Complete extensions of standard deterministic Branch-and-Bound algorithms based on interval analysis are presented hereafter in order to solve design problems which can be formulated as non-homogeneous mixed-constrained global optimization problems. This involves the consideration of variables of different kinds: real, integer, logical or categorical. In order to solve interesting design problems with an important number of variables, some accelerating procedures must be introduced in these extended algorithms. They are based on constraint propagation techniques and are explained in this chapter. In order to validate the design methodology, rotating machines with permanent magnets are considered. The corresponding analytical model is recalled and some global optimal design solutions are presented and discussed.

1. Introduction
Design problems are generally very hard to solve and furthermore very difficult to formulate in a rational way. For instance, the design of electro-mechanical actuators is clearly understood as an inverse problem: from some characteristic values given by the designer, find the physical structures, components and dimensions which entirely describe the resulting actuator. This inverse problem is ill-posed in the Hadamard sense because, even if the existence of a solution can be guaranteed, most often there is a large, or even an infinite, number of solutions. Hence, only some solutions can be characterized, and it then becomes natural to search for the optimal ones by considering some criteria defined a priori. As explained in Fitan et al. (2004) and Messine et al. (2001), general inverse problems must consider the dimensions but also the structure and the components of a sort of actuator. Thus, an interesting formu-
lation of the design problems of electro-mechanical actuators, or other similar design problems, consists in considering the associated non-homogeneous mixed constrained optimization problem:
$$\min_{x \in \mathbb{R}^{n_r},\; z \in \mathbb{N}^{n_e},\; b \in \mathbb{B}^{n_b},\; k \in \prod_{i=1}^{n_c} K_i} f(x, z, b, k)$$
subject to
$$g_i(x, z, b, k) \le 0 \;\; \forall i \in \{1, \dots, n_g\}, \qquad h_j(x, z, b, k) = 0 \;\; \forall j \in \{1, \dots, n_h\}, \tag{10.1}$$
where f, $g_i$ and $h_j$ are real functions, $K_i$ represents an enumerated set of categorical variables, for example a type of material, and $\mathbb{B} = \{0, 1\}$ is the logical set which is used to model different possible structures. Interval analysis was introduced by Moore (1966) in order to control the numerical errors generated by floating point representations and operations. Consequently, a real value x is enclosed by an interval whose lower and upper bounds correspond to the closest floating point numbers under and over x. The operations over the intervals are then developed, so defining the interval arithmetic. Using this tool, reliable enclosures of functions are obtained. In global optimization and more precisely in Branch-and-Bound techniques, interval analysis is used to compute reliable bounds of the global optimum for univariate or multivariate, non-linear or non-convex homogeneous analytical functions (Hansen, 1992; Kearfott, 1996; Messine, 1997; Moore, 1966; Ratschek and Rokne, 1988). In this chapter, we focus on design problems which are generally non-homogeneous and mixed (with real, integer, logical and categorical variables). Therefore, this implies some extensions of the standard interval Branch-and-Bound algorithms. Furthermore, design problems are subject to strong (equality) constraints, and the implicit relations between the variables can then be used to reduce a priori the part of the box where the constraints cannot be satisfied. These techniques are named constraint propagation or constraint pruning techniques (Hansen, 1992; Messine, 1997, 2004; Van Hentenryck et al., 1997). In Section 2, a deterministic (exact) global optimization algorithm is presented. It is an extension of an interval Branch and Bound algorithm developed in Ratschek and Rokne (1988) and Messine (1997) to deal with a problem of the form (10.1). An important part of this section is dedicated to the presentation of a propagation technique based on the computational tree. This technique, inserted in interval Branch and Bound algorithms, has permitted to improve considerably the speed of convergence of such methods. In order to validate this approach and
in order to show the efficiency of such an algorithm, only one type of electro-mechanical actuator is considered: rotating machines with permanent magnets. This choice was determined by my personal commitment in the formulation of the analytical model of such actuators, and also by the fact that they represent difficult global optimization problems. Other related work on the design of piezo-electric actuators can be found in Messine et al. (2001). In Section 3, the analytical model of rotating machines with permanent magnets is entirely presented and detailed. The physical assumptions leading to these analytical relations are not discussed here; see Fitan et al. (2003, 2004), Messine et al. (1998), Kone et al. (1993), Nogarede (2001), and Nogarede et al. (1995) for a thorough survey on this subject. Numerical optimal solutions for some machines are then subsequently discussed.
2. Exact and rigorous global optimization algorithm
These kinds of algorithms, named Branch-and-Bound, work in two phases: the computation of bounds of a given function considered over a box, and the decomposition of the initial domain into small boxes. Thus, the initial problem is bisected into smaller ones and, for each subproblem, one tries to prove that the global optimum cannot occur in it by comparing the bounds with the best solution previously found. Hence, only sub-problems which may contain the global optimum are kept and stored (a list is generated). For constrained problems, it is also possible to show, by computing the bounds, that a constraint can never be satisfied over a given box; the corresponding sub-problems are discarded. Furthermore, the constraints reveal implicit relations between the variables, and some techniques, named constraint propagation or constraint pruning or constraint deduction, are developed to reduce the domain where the problem is studied. These techniques are based on the calculus trees of the constraints (Messine, 1997, 2004; Van Hentenryck et al., 1997) or on linearizations of the constraint functions by using a first-order Taylor expansion (Hansen, 1992). The exact method developed for solving Problem (10.1) is an extension of interval Branch and Bound algorithms (Hansen, 1992; Kearfott, 1996; Messine, 1997; Ratschek and Rokne, 1988). All these algorithms are based on interval analysis (Moore, 1966), which is the tool for computing the bounds of a continuous function over a box, i.e. an interval vector. Generally, these algorithms work with homogeneous real variables according to an exclusion principle: when a constraint cannot be satisfied
in a considered box, or when it is proved that the global optimum cannot occur in the box. In our case, it is necessary to extend an interval Branch-and-Bound algorithm to deal with non-homogeneous and mixed variables: real, integer, logical and categorical, issued from different physical quantities. Furthermore, in order to improve the convergence of this kind of algorithm, the introduction of some iterations of propagation techniques became unavoidable (Hansen, 1992; Messine, 1997, 2004; Van Hentenryck et al., 1997). In the corresponding code, all the variables are represented by interval compact sets:

- real variables: one considers the interval compact set where the global solution is searched;
- integer variables: the integer discrete set is relaxed to the closest continuous interval compact set; $\{z^L, \dots, z^U\}$ becomes $[z^L, z^U]$;
- logical variables: $\{0, 1\}$ is relaxed into $[0, 1]$;
- categorical variables: one introduces intermediate univariate real functions, as explained later in this section. The categorical sets are in fact sets of numbers from one to the number of categories. Of course, these enumerated sets are not ordered.

Therefore, a distinction must be introduced between continuous and discrete variables. In the following algorithm, f denotes the function to be minimized, $\mathcal{L}$ represents the list where the sub-boxes are stored, $\tilde{x}$ and $\tilde{f}$ denote the current solution during the run of the program, $\epsilon_f$ is the given desired accuracy for the global optimum value and $\epsilon$ is a given vector of precisions for the corresponding solution points. The main steps of Algorithm 10.1 are boldfaced and are defined and detailed in later subsections.

ALGORITHM 10.1 (INTERVAL BRANCH AND BOUND ALGORITHM)
Begin
1. Let $X \subseteq \mathbb{R}^{n_r} \times \mathbb{N}^{n_e} \times \mathbb{B}^{n_b} \times \prod_{i=1}^{n_c} K_i$ be the initial domain in which the global minimum is sought.
2. Set $\tilde{f} := +\infty$.
3. Set $\mathcal{L} := \{(+\infty, X)\}$.
4. Extract from $\mathcal{L}$ the box for which the lowest lower bound has been computed.
5. Bisect the considered box, yielding $V_1$, $V_2$.
6. For j := 1 to 2 do
   6.1. Compute $v_j :=$ lower bound of f over $V_j$.
   6.2. Propagate the constraints over $V_j$ ($V_j$ can be reduced).
   6.3. Compute the lower and upper bounds of the interesting constraints over $V_j$.
   6.4. If $\tilde{f} \ge v_j$ and no constraint is unsatisfiable then
        6.4.1. insert $(v_j, V_j)$ in $\mathcal{L}$;
        6.4.2. set $\tilde{f} := \min(\tilde{f}, f(m))$, where m is the midpoint of $V_j$, if and only if m satisfies all the constraints;
        6.4.3. if $\tilde{f}$ is changed then remove from $\mathcal{L}$ all $(z, Z)$ where $z > \tilde{f}$ and set $\tilde{y} := m$.
   end if
7. If $\tilde{f} - \min_{(z, Z) \in \mathcal{L}} z \le \epsilon_f$ and the largest box in $\mathcal{L}$ is smaller than $\epsilon$, then STOP. Else GoTo Step 4.
Result: $\tilde{f}$, $\tilde{y}$, $\mathcal{L}$.
End
Because the algorithm stops when the global minimum is sufficiently accurate (within $\epsilon_f$), and also when all the sub-boxes Z are sufficiently small, all the global solutions are given by the minimizers belonging to the union of the remaining sub-boxes in $\mathcal{L}$, and the minimal value is given by the current minimum $\tilde{f}$. In practice, only $\tilde{f}$ and its corresponding solution are considered.
REMARK 10.1 In order to consider the non-homogeneous case, $\epsilon$ is in fact a nonnegative real vector which represents the desired accuracy for the boxes remaining in the list $\mathcal{L}$; $\epsilon_i > 0$ if it corresponds to a real variable, and $\epsilon_i = 0$ otherwise, for logical, integer and categorical variables. Algorithm 10.1 relies on the following four phases: the bisection of the box, the computation of bounds over a box, the exclusion of a box, and propagation techniques to reduce the considered box a priori. These techniques are detailed in the following subsections.
2.1 Bisection rules
This phase is critical because it determines the efficient way to decompose the initial problem into smaller ones. In our implementation, all the components of a box are represented by real-interval vectors. Nevertheless, attention must be paid to the components when they represent real, integer, logical or categorical variables.
The classical principle of bisection, in the continuous homogeneous case, consists in choosing a coordinate direction parallel to which the box has an edge of maximum length. The box is then bisected normal to this direction (Ratschek and Rokne, 1988). For solving Problem (10.1), the real variables are generally non-homogeneous (coming from different physical characteristics: the current density and the diameter of a machine, for example). Furthermore, Algorithm 10.1 must deal with discrete variables: logical, integer, and categorical. Hence, the accuracy given by the designer is represented in Algorithm 10.1 by a real vector $\epsilon$ with one component per variable: it is the expected precision of the solution at the end of the algorithm. $\epsilon_k$ is fixed to 0 if it corresponds to a discrete (integer, logical or categorical) component. Therefore, the bisection rule is modified to take continuous and discrete variables into account, and one uses two different ways to bisect a variable according to its type (continuous or discrete). Let us denote by $w^x_i$, $w^z_i$, $w^b_i$ and $w^k_i$ the given weights for, respectively, the real variables $x_i$, the integer variables $z_i$, the logical variables $b_i$ and the categorical variables $k_i$. First, the following real values are computed for all the variables:
where the application |·| denotes the cardinality (i.e. the number of elements) of the considered discrete sets. The largest value in this list determines the variable (k) which will be bisected, in the following way:
1. Z_1 := Z and Z_2 := Z;
2. if ε_k = 0 then (for discrete variables) the kth component is split around the integer part of its midpoint: (Z_1)_k := [z_k^L, [(z_k^L + z_k^U)/2]] and (Z_2)_k := [[(z_k^L + z_k^U)/2] + 1, z_k^U];
   else Z_k is divided at its midpoint, which directly produces Z_1 and Z_2;
where Z_k = [z_k^L, z_k^U], respectively (Z_1)_k and (Z_2)_k, denotes the kth component of Z, respectively of Z_1 and Z_2, and [x] represents the integer part of the considered real value x.
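A possible implementation of this weighted selection and bisection is sketched below; the exact scoring formula of the chapter is not reproduced above, so the scores used here (weight times interval width for real variables, weight times cardinality for discrete ones) are an assumption consistent with the surrounding text.

```python
# Hedged sketch of the weighted bisection rule (the scoring is an assumption).
from math import floor

def select_and_bisect(box, weights, eps):
    """box: list of (lo, hi) pairs; discrete components are integer index ranges.
    weights: one positive weight per component; eps: accuracy vector, eps[i] == 0
    for discrete components.  Returns the two sub-boxes Z1 and Z2."""
    scores = []
    for (lo, hi), w, e in zip(box, weights, eps):
        if e == 0.0:                          # discrete: logical, integer, categorical
            scores.append(w * (hi - lo + 1))  # cardinality of the index range
        else:                                 # real variable
            scores.append(w * (hi - lo))
    k = max(range(len(box)), key=lambda i: scores[i])

    z1, z2 = list(box), list(box)
    lo, hi = box[k]
    if eps[k] == 0.0:                         # split the discrete range around [mid]
        mid = floor((lo + hi) / 2)
        z1[k], z2[k] = (lo, mid), (mid + 1, hi)
    else:                                     # bisect the real interval at its midpoint
        mid = (lo + hi) / 2
        z1[k], z2[k] = (lo, mid), (mid, hi)
    return z1, z2

# Example: two real components and one integer component (eps = 0 for the latter).
Z1, Z2 = select_and_bisect([(0.0, 1.0), (0.0, 0.2), (1, 10)],
                           weights=[1.0, 1.0, 100.0], eps=[1e-3, 1e-3, 0.0])
print(Z1, Z2)   # the heavily weighted integer range {1,...,10} is split first
```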
REMARK 10.2 It is more efficient to emphasize the bisection of the discrete variables, because their bisection induces considerable modifications of the considered optimization problem (10.1). In the following numerical examples, and more generally, the weights for the discrete variables are fixed to w_i^z = w_i^b = w_i^k = 100, while much smaller weights w_i^x are used for the real variables.
2.2 Computation of the bounds
The computation of the bounds is the fundamental part of the algorithm, because all the exclusion and propagation techniques depend on it. An inclusion function is an interval function which encloses the range of a function over a box Y: for a given function f, a corresponding inclusion function is denoted by F, such that [min_{y∈Y} f(y), max_{y∈Y} f(y)] ⊆ F(Y); furthermore, Z ⊆ Y implies F(Z) ⊆ F(Y). The given functions must be explicitly detailed to make the construction of an inclusion function possible. Algorithm 10.1 works and converges even if f is not continuous (Kearfott, 1996; Moore, 1966; Ratschek and Rokne, 1988). The number of global minimum points can be unbounded, but f has to be bounded in order to obtain a global minimum. Lipschitz conditions, differentiability or smoothness properties are not needed; nevertheless, the numerical behaviour is easier and the convergence speed may be improved if these properties are present. The following paragraph recalls the standard interval techniques used to construct inclusion functions (Moore, 1966). Let 𝕀 be the set of real compact intervals [a, b], where a and b are real (or floating point) numbers. The arithmetic operations for intervals are defined as follows:
[a, b] + [c, d] = [a + c, b + d]
[a, b] − [c, d] = [a − d, b − c]
[a, b] × [c, d] = [min{a×c, a×d, b×c, b×d}, max{a×c, a×d, b×c, b×d}]                    (10.2)
[a, b] ÷ [c, d] = [a, b] × [1/d, 1/c], if 0 ∉ [c, d]
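The rules (10.2) are easy to implement; the following minimal sketch (in Python rather than the chapter's Fortran 90) also shows the natural extension of an expression and the overestimation it can produce.

```python
# Minimal sketch of the interval operations (10.2) and of a natural extension.
from dataclasses import dataclass

@dataclass(frozen=True)
class I:                                   # a compact interval [lo, hi]
    lo: float
    hi: float
    def __add__(self, o): return I(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o): return I(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        p = (self.lo*o.lo, self.lo*o.hi, self.hi*o.lo, self.hi*o.hi)
        return I(min(p), max(p))
    def __truediv__(self, o):              # defined only if 0 is not in o
        assert not (o.lo <= 0.0 <= o.hi)
        return self * I(1.0/o.hi, 1.0/o.lo)

def F(X: I) -> I:
    # natural extension of f(x) = x - x*x: each occurrence of x is replaced by X
    return X - X*X

print(I(1.0, 2.0) - I(1.0, 2.0))  # I(lo=-1.0, hi=1.0): subtraction is not the inverse of addition
print(F(I(0.0, 1.0)))             # I(lo=-1.0, hi=1.0) encloses the true range [0, 0.25]
```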
The operations (10.2) can be extended to mixed computations between real values and intervals, because a real value is a degenerate interval whose two bounds are equal. When a real value is not representable by a floating point number, an interval can be generated with the two closest floating point numbers enclosing it; for example, for π, two floating points must be considered, one just under π and the other just over. Definitions (10.2) show that subtraction and division in 𝕀 are not the inverse operations of addition and multiplication. Unfortunately, interval arithmetic does not conserve all the properties of standard arithmetic; for example, it is only sub-distributive: ∀(A, B, C) ∈ 𝕀³, A × (B + C) ⊆ A × B + A × C (Moore, 1966). The division by an interval containing zero is undefined, and an extended interval arithmetic has therefore been developed; refer to Ratschek and Rokne (1988) and Hansen (1992). The natural extension of an expression of f into intervals, which consists in replacing each occurrence of a variable by its corresponding interval (which encloses it) and then applying the above rules of interval arithmetic, is an inclusion function; special procedures for bounding trigonometric and transcendental functions allow the extension of this procedure to a great number of analytical functions. This is a fundamental theorem of interval analysis (Moore, 1966). The bounds so evaluated (by the natural extension of an expression of f) are not always accurate, in the sense that they may become too large and therefore inefficient. Hence, several other techniques based on Taylor expansions are classically used; refer to Messine (1997), Moore (1966), and Ratschek and Rokne (1988) for a thorough survey and discussion on this subject. For these inclusion functions, the given function must be continuous and at least once differentiable. For our design problems, the natural extension into intervals has generally been sufficient. Interval arithmetic is well defined only for continuous real functions, and inclusion functions must therefore be extended to deal with discrete variables. For logical and integer variables, one just relaxes the fact that these variables are discrete: the discrete logical sets {0, 1} become the continuous compact intervals [0, 1], and the discrete integer sets {0, …, n}, {1, …, n}, or more generally {x^L, x^L + 1, x^L + 2, …, x^U}, are relaxed by the compact intervals [0, n], [1, n] and [x^L, x^U] respectively. Hence, a new inclusion function for mixed variables (logical, integer and real) can be constructed. The categorical variables cannot directly appear in the expression of a function, because they represent varieties of an object which induce some effects. Generally, these effects bring positive real values; for example the magnetic polarization value, which depends on the kind of permanent magnets used. Therefore, each categorical variable (used to represent varieties of objects) must be associated with at least one real univariate function, denoted by a_i : K_i → ℝ.
In this work, only univariate functions are considered because they are sufficient for our practical uses. Furthermore, in our code, all categorical variables σ_k are denoted by an integer number from 1 to |K_k|; each of these numbers corresponds to a precise category which must be defined beforehand. Hence, for computing the bounds of a function f over a box X, Z, B, C depending on univariate real functions a_i, enclosures of the intervals [min_{σ_j∈K_j} a_i(σ_j), max_{σ_j∈K_j} a_i(σ_j)] must be computed. Denoting by C_j = [C_j^L, C_j^U] ⊆ [1, |K_j|] the index range of the corresponding categorical set K_j kept in the current box, and by v_i = a_i(i) the value associated with category i, the following inclusion function for the corresponding real function a_i is defined by:

A_i(C_j) := [v_1, v_1], if C_j = [1, 1],
A_i(C_j) := [v_{|K_j|}, v_{|K_j|}], if C_j = [|K_j|, |K_j|],
A_i(C_j) := [min{v_i, i ∈ {C_j^L, …, C_j^U}}, max{v_i, i ∈ {C_j^L, …, C_j^U}}], in the general case.

With this representation, a general inclusion function F(X, Z, B, C) is then constructed for mixed (discrete and continuous) expressions.
REMARK 10.3 A more efficient inclusion function for the real univariate function a_i over an enumerated subset C_j ⊆ K_j is

A_i(C_j) := [min{v_i, i ∈ C_j}, max{v_i, i ∈ C_j}].

However, this function needs an enumeration of the subset C_j for each computation. Other techniques are possible and some of them are detailed in Messine et al. (2001). Hence, lower and upper bounds can also be generated for expressions involving categorical variables. In order to produce logical, integer and categorical solutions with continuous relaxations of the corresponding discrete variables, only the particular bisection rules of Section 2.1 must be used.
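As an illustration of the enumerated inclusion function of Remark 10.3, the following Python sketch computes the bounds of a univariate categorical function; the dictionary M below is an illustrative example built from the magnet data of Section 3.3.

```python
# Sketch of the enumerated inclusion function of Remark 10.3.
def categorical_bounds(values, subset):
    """values[c] is the real effect a_i(c) of category c (categories are numbered
    from 1 to |K_j|); subset is the set of categories still present in the current
    box.  Returns an interval enclosing {a_i(c) : c in subset}."""
    vals = [values[c] for c in subset]
    return (min(vals), max(vals))

M = {1: 0.6, 2: 0.9}                  # 1 = plastic magnet, 2 = modern magnet: M(k1) in tesla
print(categorical_bounds(M, {1, 2}))  # (0.6, 0.9): both categories are still possible
print(categorical_bounds(M, {2}))     # (0.9, 0.9): the categorical variable is fixed
```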
2.3 Exclusion principle
The exclusion techniques are based on proving that the global optimum cannot occur in a given box. This leads to two main possibilities, considering a sub-box denoted by X, Z, B, C:
1. The already found solution, denoted by f̃, cannot be improved in the considered box: F^L(X, Z, B, C) > f̃, i.e. the lower bound of the given function f over the sub-box X, Z, B, C is greater than a solution already found; no point in the box can improve this solution f̃, see Step 6.4 of Algorithm 10.1.
2. It can be proved that a constraint will never be satisfied in the sub-box: G_k^L(X, Z, B, C) > 0 or 0 ∉ H_k(X, Z, B, C). Equality constraints are hard to satisfy numerically; therefore, given tolerances are introduced for each equality constraint, and one then verifies whether H_k(X, Z, B, C) ⊆ [−(ε_e)_k, (ε_e)_k] in place of H_k(X, Z, B, C) = [0, 0].
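The two tests can be written compactly; the following is a minimal sketch in Python (the function and argument names are illustrative, not taken from the chapter's code).

```python
# Tiny sketch of the two rejection tests of the exclusion phase.
# FL is the lower bound of the objective over the sub-box, G and H are interval
# enclosures of one inequality constraint g(x) <= 0 and one equality constraint
# h(x) = 0, and eps_e is the tolerance on the equality constraint.
def can_exclude(FL, f_best, G, H, eps_e):
    if FL > f_best:                        # test 1: the incumbent f_best cannot be improved
        return True
    if G[0] > 0.0:                         # test 2a: g(x) <= 0 is violated everywhere in the box
        return True
    if H[0] > eps_e or H[1] < -eps_e:      # test 2b: |h(x)| <= eps_e is violated everywhere
        return True
    return False

print(can_exclude(FL=1.2, f_best=1.0, G=(-0.5, 0.3), H=(-0.1, 0.2), eps_e=1e-3))  # True (test 1)
print(can_exclude(FL=0.7, f_best=1.0, G=(-0.5, 0.3), H=(-0.1, 0.2), eps_e=1e-3))  # False: keep the box
```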
In our case, the computation of the bounds is exact and rigorous; thus the associated global optimization algorithm is said to be exact and rigorous, and the global optimum is then perfectly enclosed with a given accuracy: x_i^U − x_i^L < ε_i, ∀i ∈ {1, …, n_x}, z_i^L = z_i^U, ∀i ∈ {1, …, n_z}, b_i^L = b_i^U, ∀i ∈ {1, …, n_b}, and k_i^L = k_i^U, ∀i ∈ {1, …, n_k}.
REMARK 10.4 It may be possible that a logical or a categorical variable generates new additional constraints and variables. In this case, particular procedures must be inserted.
2.4 Constraint propagation techniques
Constraint propagation techniques based on interval analysis make it possible to reduce the bounds of an initial hypercube (interval vector) by using the implicit relations between the variables derived from the constraints. In this subsection, the constraints are written in a general way, as follows:
c(x) ∈ [a, b], with x ∈ X ⊂ ℝⁿ,        (10.3)
where c is a real function which represents the studied constraint, [a, b] is a fixed real interval and X is a real compact interval vector. In order to consider an equality constraint, one fixes a = b. For an inequality constraint, a is fixed to −∞ (numerically, one uses the lowest representable floating point value).
REMARK 10.5 Only the continuous case is considered in this section. However, it is very simple to extend these techniques to deal with integer, logical and real variables (except the categorical case) by relaxing the discrete variables by their corresponding continuous sets, as explained previously, and by taking the integer part of the upper bound and the integer part (plus one if it differs from the real value) of the lower bound of the resulting interval.
2.4.1 Classical interval propagation techniques.

The linear case. If the given constraint is linear, c(x) = Σ_{i=1}^{n} a_i x_i, the propagation is

X_k := X_k ∩ ( ([a, b] − Σ_{i≠k} a_i X_i) / a_k ),

where k is in {1, …, n} and X_i is the ith interval component of X.
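A minimal sketch of this linear propagation (assuming non-zero coefficients a_i; intervals are (lo, hi) pairs) is given below.

```python
# Hedged sketch of the linear propagation rule above.
def i_sub(u, v): return (u[0] - v[1], u[1] - v[0])
def i_scale(c, u):
    lo, hi = c*u[0], c*u[1]
    return (min(lo, hi), max(lo, hi))

def propagate_linear(a, X, ab):
    """Reduce X (list of intervals) subject to sum_i a[i]*x_i in ab = [a, b]."""
    for k in range(len(X)):
        rhs = ab
        for i in range(len(X)):
            if i != k:
                rhs = i_sub(rhs, i_scale(a[i], X[i]))
        new_k = i_scale(1.0/a[k], rhs)
        X[k] = (max(X[k][0], new_k[0]), min(X[k][1], new_k[1]))   # intersection with X_k
    return X

# Example: x1 + 2*x2 in [5, 5] (an equality constraint) with x1, x2 in [0, 10].
print(propagate_linear([1.0, 2.0], [(0.0, 10.0), (0.0, 10.0)], (5.0, 5.0)))
# -> [(0.0, 5.0), (0.0, 2.5)]: both components are reduced
```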
The non-linear case: the Hansen method. If the constraint c is non-linear, but continuous and at least once differentiable, Hansen (1992) uses a first-order Taylor expansion to produce a linear equation with interval coefficients. A first-order Taylor expansion can be written as follows:

c(x) = c(y) + ∇c(ξ)ᵀ (x − y),

where (x, y) ∈ X² and ξ ∈ X̊ (X̊ represents the open set of the compact hypercube X: a component of X̊ has the form ]x_i^L, x_i^U[). An enclosure of ∇c(ξ) may be computed with an automatic differentiation code extended to intervals (Kearfott, 1996; Messine, 1997); one denotes it by ∇C(X). Hence, one has ∇c(ξ) ∈ ∇C(X), with ∇C(X) ∈ 𝕀ⁿ. Consequently, c(x) ∈ c(y) + ∇C(X)ᵀ (X − y). The propagation produces:

X_k := X_k ∩ ( y_k + ( [a, b] − c(y) − Σ_{i≠k} ∇C_i(X) (X_i − y_i) ) / ∇C_k(X) ),

where ∇C_k(X) represents the kth interval component of the vector ∇C(X), which is an enclosure of the gradient over the box X. When 0 ∈ ∇C_k(X), no propagation is done. This relation is satisfied for all y ∈ X; generally, one fixes y to the middle of the box X: y_k = (x_k^L + x_k^U)/2, ∀k ∈ {1, …, n}.
REMARK 10.6 Hansen (1992) proposes other propagation techniques, by solving the so-obtained linear system with interval coefficients with an interval Gauss-Seidel algorithm, and also by using second-order Taylor expansions.
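The following sketch applies the first-order propagation above to a small example; the interval gradient enclosure ∇C(X) is supplied by hand rather than by automatic differentiation, and the example constraint is illustrative only.

```python
# Hedged sketch of the first-order (Hansen) propagation; intervals are (lo, hi) pairs.
def i_add(a, b): return (a[0] + b[0], a[1] + b[1])
def i_sub(a, b): return (a[0] - b[1], a[1] - b[0])
def i_mul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]); return (min(p), max(p))
def i_div(a, b): return i_mul(a, (1.0/b[1], 1.0/b[0]))      # assumes 0 not in b

def hansen_propagate(X, c, GC, ab):
    """X: list of intervals; c: point evaluation of the constraint; GC: interval
    gradient enclosure over X; ab: the interval [a, b] of the constraint."""
    y = [(lo + hi) / 2.0 for lo, hi in X]          # expansion point: midpoint of X
    cy = c(y)
    for k, Gk in enumerate(GC):
        if Gk[0] <= 0.0 <= Gk[1]:
            continue                               # 0 in GC_k(X): no propagation
        rhs = i_sub(ab, (cy, cy))
        for i, Gi in enumerate(GC):
            if i != k:
                rhs = i_sub(rhs, i_mul(Gi, i_sub(X[i], (y[i], y[i]))))
        new_k = i_add((y[k], y[k]), i_div(rhs, Gk))
        X[k] = (max(X[k][0], new_k[0]), min(X[k][1], new_k[1]))  # empty if infeasible
    return X

# Example: c(x) = x1*x2 in [3.8, 4] over X = [1,2] x [1,2]; grad c = (x2, x1).
X = [(1.0, 2.0), (1.0, 2.0)]
GC = [(1.0, 2.0), (1.0, 2.0)]
print(hansen_propagate(X, lambda y: y[0]*y[1], GC, (3.8, 4.0)))
# -> [(1.775, 2.0), (1.775, 2.0)]: both components are reduced
```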
2.4.2 Constraint propagation technique based on calculus trees. This propagation technique is based on the construction of the calculus tree of the given constraint. All the partial computations are stored in each node of the tree and then, the constraint is propagated through the tree, from the root to the leaves.
ALGORITHM 10.2 (CONSTRAINT PROPAGATION ALGORITHM)
Begin
1. Construction of the calculus tree of the given constraint c over the box X:
   - the leaves are the variables of the problem, x_i, with their corresponding intervals X_i, or some constant real parameters;
   - at each node, the performed operation is stored together with the partial computation, for example X_1 + X_3;
   - the root is the last computation.
2. Going down through the tree from the root to the leaves. At each node, one tries by deduction to reduce the partial results computed at the first step, and then the leaves (variables) can be pruned; for example, X_1 + X_3 = [a, b] implies X_1 := ([a, b] − X_3) ∩ X_1 and X_3 := ([a, b] − X_1) ∩ X_3.
End
A detailed implementation of this algorithm is presented in Messine (1997) and Messine (2004). However, in order to fully understand how Algorithm 10.2 works, let us consider the following example.

EXAMPLE 10.1 2x_3x_2 + x_1 = 3, with x_i ∈ X_i = [1, 3], ∀i ∈ {1, 2, 3}. The propagation can be seen in Figure 10.1. The first step of Algorithm 10.2 is represented on the left of Figure 10.1, and the second step (the propagation phase) on the right. At each node of the construction phase, the partial results are computed and stored, as represented on the left part of the figure. Then the propagation phase (on the right part of the figure) goes down through the tree from the root to the leaves; this procedure makes it possible to improve the partial results until the variables (the leaves of the tree) are reached. These computations are denoted on the right of each node and each variable. For the variables, an intersection with the initial interval is performed; the results are denoted below the variables in Figure 10.1. On this Example 10.1, this propagation technique allows one to prove in one step that the unique solution which satisfies the constraint is x_1 = x_2 = x_3 = 1.
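The forward and backward phases of Algorithm 10.2 on Example 10.1 can be reproduced with a few lines of Python; this is a sketch of the two phases, not the chapter's implementation.

```python
# Forward/backward propagation for 2*x3*x2 + x1 = 3 with xi in [1, 3].
def add(a, b): return (a[0] + b[0], a[1] + b[1])
def sub(a, b): return (a[0] - b[1], a[1] - b[0])
def mul(a, b):
    p = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]); return (min(p), max(p))
def div(a, b): return mul(a, (1.0/b[1], 1.0/b[0]))        # assumes 0 not in b
def inter(a, b): return (max(a[0], b[0]), min(a[1], b[1]))

X1 = X2 = X3 = (1.0, 3.0)

# Forward phase: evaluate the calculus tree and store the partial results.
n_mul  = mul((2.0, 2.0), mul(X3, X2))     # node 2*x3*x2        -> [2, 18]
n_root = add(n_mul, X1)                   # node 2*x3*x2 + x1   -> [3, 21]

# Backward phase: intersect the root with the right-hand side of the constraint
# and push the reductions down to the leaves.
n_root = inter(n_root, (3.0, 3.0))        # [3, 3]
X1     = inter(sub(n_root, n_mul), X1)    # [1, 1]
n_mul  = inter(sub(n_root, X1), n_mul)    # [2, 2]
n_x3x2 = div(n_mul, (2.0, 2.0))           # [1, 1]
X3     = inter(div(n_x3x2, X2), X3)       # [1, 1]
X2     = inter(div(n_x3x2, X3), X2)       # [1, 1]

print(X1, X2, X3)   # (1.0, 1.0) printed three times: the unique feasible point
```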
2.5 Properties of Algorithm 10.2
For all the following properties, let us denote by Z the box resulting from the propagation of the constraint c(x) ∈ [a, b] over a given hypercube X using Algorithm 10.2, and by C the inclusion function of c defined by interval analysis (Moore, 1966). The following theorems and properties are taken from Messine (2004); they provide a theoretical background for the efficiency of Algorithm 10.2 and also of Algorithm 10.1.
THEOREM 10.1 (VALIDATION OF ALGORITHM 10.2) There is no point x in X which satisfies a given constraint c and does not belong to the hypercube Z resulting from the propagation of the constraint c over the box X using Algorithm 10.2.
PROPOSITION 10.1 This proposition presents four properties of Algorithm 10.2.
1. The propagation with C(X) ∩ [a, b] is sufficient;
2. if C(X) ∩ [a, b] = ∅ then Z = ∅;
3. if C(X) ⊆ [a, b] then Z = X;
4. if C(Z) ⊆ [a, b] then c(x) ∈ [a, b], ∀x ∈ Z.
Figure 10.1. Scheme of the use of Algorithm 10.2.

THEOREM 10.2 An overestimation, in the worst case, of the number of iterations of Algorithm 10.1 (from Step 4 to Step 7) for a homogeneous continuous problem (X ⊂ ℝⁿ) is given by ((max_{i∈{1,…,n}} (x_i^U − x_i^L))/ε)ⁿ. If the considered problem has q linearly independent linear equality constraints, and if Algorithm 10.2 is used at Step 6.2, then an overestimation of the number of loops from Step 4 to Step 7 becomes ((max_{i∈{1,…,n}} (x_i^U − x_i^L))/ε)^{n−q}. Therefore, even if some computations are added (Algorithm 10.2), the exponential complexity of Algorithm 10.1 can be strongly reduced.
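As an illustrative order of magnitude (the figures are assumed here, not taken from the chapter): for a homogeneous box with max_i (x_i^U − x_i^L) = 1, an accuracy ε = 10⁻² and n = 8 variables, the worst-case bound is (1/10⁻²)⁸ = 10¹⁶ iterations; with q = 2 linearly independent linear equality constraints propagated by Algorithm 10.2, it becomes (1/10⁻²)⁶ = 10¹², a reduction by a factor of 10⁴.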
3. Application to the design of electrical rotating machines
The purpose of the development of Algorithm 10.1 was to solve electro-mechanical actuator design problems. Such problems are clearly understood as inverse problems and are formulated as non-homogeneous mixed constrained global optimization problems (10.1) (Fitan et al., 2004). Only the design of rotating machines with permanent magnets is addressed in this section. However, Algorithm 10.1 is currently used by the members of the Laboratoire d'Électrotechnique et d'Électronique Industrielle of Toulouse (France) for the design of many different electrical actuators formulated as homogeneous or non-homogeneous, mixed, constrained or unconstrained global optimization problems (10.1), for example for the design of piezoelectric bimorphs or trimorphs, with categorical variables to take into account the different kinds of ceramic (Messine et al., 2001). As for the design of electrical rotating machines, the direct problem is very hard to solve, because it involves solving three hierarchical problems with partial differential equations (the Maxwell equations) in order to obtain some interesting characteristic values of the considered machines; for example the torque, which is "quasi-equivalent", for non-specialists, to the strength of the considered machine. Finite element methods are classically and efficiently used to solve the direct problem. Nevertheless, the inverse problem, which must be taken into account to solve the design problems rationally, becomes in this way (quasi-)unsolvable. In past years, researchers in electrotechnics have considered, for the case of rotating machines, some analytical models which approximate the Maxwell equations. Thus, the direct problem becomes very simple, and these analytical models allow researchers in electrotechnics to clearly understand which parameters are important or not. The inverse problem of the design of electrical rotating machines can then be considered. In Fitan et al. (2004), and as recalled in the introduction, this inverse problem is formulated as a non-homogeneous mixed constrained global optimization problem (10.1).
3.1 Analytical model
In order to clearly understand this analytical model, all the constitutive variables must be explained. The nomenclature used is x_i ∈ ℝ for real variables, z_i ∈ ℕ for integer variables, b_i ∈ {0, 1} for logical variables and k_i ∈ K_i for categorical variables. Another advantage of analytical models is that they permit combining many different models easily; in this study, an electro-magnetic model is associated with a thermal one.
3.1.1 Different kinds of machine. In this analytical model, only logical variables are needed to characterize the different structures of the electrical motors. b_1 represents the fact that the rotating part of the machine is inside (b_1 = 1), such as in common electrical motors, or outside (b_1 = 0), such as, for example, in the hard disk of a computer; these machines are called internal or external rotating electrical machines. b_2 represents the kind of armature: slotted (b_2 = 1) or non-slotted (b_2 = 0) machines. b_3 represents the kind of waveform used for the electrical feeding of the machine: sinusoidal (b_3 = 0) or rectangular (b_3 = 1) waveforms are classically taken into account.
3.1.2 Dimensions of the machine. For these machines, the following real positive variables characterize the thickness of some important active parts:
x_1 represents the bore diameter: the inside diameter of the machine.
x_2 represents the length of the machine.
x_3 represents the mechanical airgap; it is the space between the moving and the static parts of the machine.
x_4 represents the thickness of the magnets. By assumption, all the magnets have exactly the same size, otherwise chaotic rotating movements would be produced.
x_5 represents the winding thickness. It is an interesting part of the motor where the electrical current flows; this process generates a flux density.
x_6 represents the thickness of the yoke, that is, the most internal and the most external active parts of the machine; the same value is taken for both.
3.1.3 Permanent magnets. For these machines with permanent magnets, one real and one integer variable must be introduced:
z_1 represents the number of pole pairs, because in a machine the magnets work by pairs: one in a polar direction (towards the center of the motor) and another in the reverse polar direction.
x_7 represents the polar arc factor. This variable allows one to determine the arc dimensions (inside and outside) of the magnets.
In this model, considering the regular aspects of the rotation of the machines, all the magnets have exactly the same size.

3.1.4 The electrical current. In order to model the (areal) current density, one more real variable x_8 is introduced. Attention must be paid to this particular variable because its value is generally around 10^6 amperes per square meter, and the optimization problem therefore becomes non-homogeneous with respect to the real variables.
3.1.5 The slots. If the considered machine is a slotted one (b_2 = 1), two other real positive variables and two other integer variables are introduced:
x_9 represents the length of a slot evaluated at its middle.
x_10 represents the length of a tooth evaluated at its center.
z_2 represents the number of phases.
z_3 represents the number of slots.
When b_2 = 0, these variables are not used: a special propagation step fixes them to 0 and they are not bisected. The slots are very interesting in electrical machines because they canalize the flux density into the teeth in that part of the motor; furthermore, the slot part may serve to insert some electrical coiling. Figure 10.2 gives a graphical idea of the possible structures and of the main dimensional variables used for these kinds of rotating electrical motors.

3.1.6 Magnetic material. Nowadays, it is possible to generate many magnetic materials, for example by injecting magnetic powders into a plastic form. These new materials present many advantages, for example in terms of weight and of the way of fabrication.
Figure 10.2. Possible configurations (inverse rotor configuration, internal rotor configuration, sinusoidal waveform).
In order to take these new possibilities, due to the new technologies producing these new materials, into account, two new categorical variables are introduced in our models:
k_1 represents the magnetic material used for the permanent magnets.
k_2 represents the magnetic material used for the yoke and the teeth (if it is a slotted machine, the same material is used).

3.1.7 Thermal variables. A simpler model than the one presented in Fitan et al. (2003) is now considered. The following three real variables are introduced:
x_11 represents the temperature inside the winding part.
x_12 represents the temperature inside the mechanical airgap, between the moving and the static parts of the machine.
x_13 represents the temperature of the ambient medium.
It is important to consider the temperature inside the machine, because the machine may be damaged when the temperature exceeds a fixed threshold.

3.1.8 Magnetic model. The modeling permits establishing analytically some characteristic physical parameters. They are modeled by positive real mathematical functions depending on the variables introduced above. The way to produce these functions is not given here but can be found in several works, for example in Kone et al. (1993), Nogarede (2001), Fitan et al. (2003), and Fitan et al. (2004). The most important value for a considered machine is its magnetic torque (corresponding to its intrinsic strength). Γ_em represents this value and is a real function depending on many other real functions, as follows:
The analytical expressions defining Γ_em, K_s, B_e, B_c (Equation (10.14)), B_t and k_d are detailed in Fitan et al. (2003) and Fitan et al. (2004).
Here K_s represents the electric loading, B_e the flux density inside the mechanical airgap, B_t the flux density inside the yoke and k_d the slot ratio; the other functions are auxiliary ones and are explained in Fitan et al. (2003) and Fitan et al. (2004). The factor (2b_1 − 1) takes the value −1 for a reverse rotating machine and 1 for a classical one.
REMARK 10.7 k_r, which represents the winding fitting factor, is considered as a constant in this model; it must be defined in the schedule of conditions. However, it could be interesting to make k_r a variable in some applications. In many applications, the factor k_r is fixed to 0.7. M(k_1) and B_M(k_2) are real univariate functions defined on the categorical sets K_1 and K_2. The function M represents the magnetic polarization and depends on the magnetic material used for the permanent magnets, and B_M represents the maximal induction possible for the material used for the yoke and the slots. The enumerated values must be given for each application which considers different kinds of magnetic materials. In these equations, there are two inequality constraints, (10.16) and (10.17), and one equality constraint, (10.12), which is always satisfied if b_2 = 0 (no slots). The other equations are definitions of functions which permit formulating the optimization problem according to the wishes of the designer.

3.1.9 Thermal model. In this chapter, one considers just a simplified thermal model as in Fitan et al. (2004); for complete details, please refer to Fitan et al. (2003). There are two ways to formulate the thermal flux inside the machine, which produces one equality constraint (denoted g_Th in the numerical examples below).
In this constraint, h_a and h_e are constants which represent the characteristic coefficients of the convection, respectively inside the ambient medium and inside the mechanical airgap. They must be defined in the schedule of conditions for each considered problem. The constant parameter ρ_cu = 0.018 × 10⁻⁶ Ω·m is the resistivity of the copper.
3.1.10 Other functions and relations. Two geometrical relations must be introduced to produce mechanical constraints ensuring sufficient internal and external radii. Therefore, two constants c_r and c_R must be defined in the schedule of conditions.
The following functions come from geometrical relations and allow one to define some interesting volumes and the mass of the considered rotating machines:
Here the constants d_Al = 2700 kg/m³ and d_Cu = 8900 kg/m³ are respectively the densities of aluminum and copper. The other densities depend on the material used for the permanent magnet (d_PM) and for the yoke (d_CM); these two real univariate functions of categorical variables must be defined in the schedule of conditions of the application. The most interesting criteria are the permanent magnet volume V_m, the global volume V_g and the mass M_a. For the others, please refer to Fitan et al. (2004).
The numerical tests have been performed on a classical modern PC (2 GHz). Both Algorithms 10.1 and 10.2 have been implemented in Fortran 90.
3.2 Design of the MAPSE
MAPSE is the name of a simple electrical machine without slots, which was first modeled and studied in Kone et al. (1993). This model can easily be deduced from the above general model by fixing the following variables: b_1 = 1, b_2 = 0, b_3 = 1, k_1 = modern permanent magnet, k_2 = stamping magnetic conductor. Because it is a machine without slots (b_2 = 0), and because the thermal model was not taken into account, only the first eight real variables and the first integer variable z_1 are necessary. In this (old) model, two real variables were introduced to represent function values: x_9 = B_e(x, z, b, k) and x_10 = K_f(x, z, b, k). This design problem has been formulated as follows:
minimize, over x ∈ X ⊂ ℝ^10 and z_1 ∈ Z ⊂ ℕ, the permanent-magnet volume V_m(x, z), subject to equality constraints fixing the electromagnetic torque Γ_em(x, z) = 10 (Nm) and the electric loading K_s(x, z) x_8 = k_r x_5 x_8² = 10^11, to the defining relations for x_9 (= B_e(x, z)) and x_10 (= K_f(x, z)), to the flux-density relation for B_c(x, z), and to a last technological constraint linking the bore diameter x_1 and the number of pole pairs z_1 through the polar step Δ_p; the complete analytical formulation is given in Kone et al. (1993) and Messine et al. (1998).
The last constraint does not occur in our actual model, presented in Subsection 3.1, because it derives from a technical constraint which links the diameter of the machine with the number of magnets. Over the past ten years, the technology for making these kinds of machines has improved and this constraint has become obsolete. Nevertheless, it is kept here in order to compare the results with those obtained in previous works, Kone et al. (1993) and Messine et al. (1998). The initial domain where the global solution is sought is:
X = ([0.01, 0.5], [0.004, 0.5], [0.001, 0.005], [0.003, 0.05], [0.001, 0.05], [0.001, 0.05], [0.8, 1], [10^5, 10^7], [0.1, 1], [0.01, 0.3]) and Z = ({1, …, 10}).
Table 10.1. Optimal solution of the MAPSE.

Algorithm 10.1 + propagation         Minimum      Number of iterations   CPU time
Without propagation                  6.20 10^-4   407838                 1h35
Hand-computed propagation            6.76 10^-4   1340                   1.9s
Hansen technique                     6.79 10^-4   609                    41.5s
Algorithm 10.2                       6.81 10^-4   120                    0.5s
Algorithm 10.2 + Hansen technique    6.80 10^-4   43                     2.7s
Algorithm 10.2 with ε_e              6.21 10^-4   4641                   16.8s
The other parameters are fixed: k_r = 0.7, B_c(x, z) = B_iron = 1.5 T is the flux density in the iron, M = 0.9 T the magnetic polarization and Δ_p = 0.1 m the polar step; Δ_p is not used in the general model presented in Subsection 3.1. In Kone et al. (1993), standard optimization algorithms (Augmented Lagrangian algorithms) were used and some solutions were found for some criteria such as V_m and V_a = π x_2 (x_1 + x_5 − x_3 − x_4)(2x_6 + x_5 + x_3 + x_4); V_a represents the volume of the active parts of the motor (it is a cost function because it represents the expensive part of the motor). Messine et al. (1998) show that these solutions are only local. Furthermore, the global optimization algorithm identified problems with the model, which led to modifications of the initial domain: the value for X_3 was [0.001, 0.05] in Kone et al. (1993) and the global solutions were obtained with x_3 = 0.001, which is not technologically feasible; the gain for the optima was about 1000% (a factor 10). However, even with the reduced initial domain, a gain of about 10% was obtained for the criterion V_a. In Messine (2004), the great efficiency of the propagation techniques was shown on this example by fixing z_1 = 4, in order to have solutions using Algorithm 10.1 without the propagation step. On this real example, one can note that all the propagation techniques considerably decrease the CPU time and the number of iterations of this kind of interval branch-and-bound algorithm (Algorithm 10.1). The propagation technique based on the calculus tree (Algorithm 10.2) is the most efficient in this example, and the global solution was found in a record time: only 0.5 seconds. In comparison with the hour and a half necessary for the classical interval method, the gain is impressive. However, one can notice a difference between the obtained global solutions: the global solution given by the classical Algorithm 10.1 used directly, without propagation, seems better than those produced by performing some propagation steps. This is due to the fact that the constraints are satisfied with more accuracy when propagation techniques are used. If the designer wants to introduce a tolerance on these constraints, he just needs to replace the equality constraints h_j(x) = 0 by h_j(x) ∈ [−(ε_e)_j, (ε_e)_j] (where all the (ε_e)_j > 0 are given by the user of Algorithm 10.1). In that case, the same solutions are obtained (the row denoted "with ε_e"); one can note that the CPU time and the number of iterations increase, but the algorithm keeps its intrinsic efficiency: only 17 seconds compared to one hour and a half, with ε_e = (3×10^10, 10^-2, 10^-3, …). The Hansen method does not give an efficient result: around 30 minutes in that case. Nevertheless, the Hansen propagation technique must be associated with an interval Gauss-Seidel algorithm to become really efficient, see Hansen (1992). Considering only one constraint, the CPU time of one propagation of the Hansen method is approximately eight times that of one iteration of Algorithm 10.1.
REMARK 10.8 No solution was obtained with the classical Algorithm 10.1 without propagation if z_1 lies in {1, …, 10}. Furthermore, by using propagation techniques, the global solution published in Messine et al. (1998) was found in only a few seconds. Consequently, these propagation techniques become unavoidable in order to solve such constrained optimization problems.
3.3 More general numerical designs
In order to show that Algorithm 10.1 can solve more general problems with logical and categorical variables, the following design problems have been taken into account:
min_{x ∈ X ⊂ ℝ^13, z ∈ Z ⊂ ℕ^3, b ∈ 𝔹^3, k ∈ ∏_{j=1}^{2} K_j}  f(x, z, b, k)

subject to:
Γ_em(x, z, b, k) = 10 (Nm)
0.4 ≤ B_e(x, z, b, k) ≤ 0.9 (T)
500 ≤ K_s(x, z, b, k) ≤ 10000 (A/m)
0.1 ≤ K_f(x, z, b, k) ≤ 0.3
0.4 ≤ k_d(x, z, b, k) ≤ 0.6
0.9 ≤ B_c(x, z, b, k) ≤ B_M(k_2) (T)
0.9 ≤ B_t(x, z, b, k) ≤ B_M(k_2) (T)
R_int(x, z, b, k) ≥ 50 (mm)
R_ext(x, z, b, k) ≤ 80 (mm)
the equality constraint of the thermal model, g_Th
the equality constraint for the number of slots, N_e                    (10.28)

where the constants take the values k_r = 0.7, h_e = 12 and h_a = 4. The criteria f considered are respectively the mass (10.27), the global volume (10.26) and a compromise between both,

f_multi(x, z, b, k) = M_a(x, z, b, k)/M_a* + V_g(x, z, b, k)/V_g*,

where M_a* and V_g* are the exact global optima for the minimization of the mass and of the global volume. This last criterion defines a multicriteria formulation through weight factors; this kind of multicriteria technique produces interesting results here because the weight factors are very effective, being built from the exact global solutions previously computed. The initial domain where the global solution is searched is X = ([1, 300], …, 0.85, [3, 6], …, 130, 55, 50), Z = ({1, …, 16}, 3, …), B = 𝔹^3, K_1 = {plastic magnet, modern magnet} and K_2 = {powder, stamping}. The real univariate functions are: M(plastic magnet) = 0.6 (T), M(modern magnet) = 0.9 (T), d_PM(plastic magnet) = 6000 (kg/m³), d_PM(modern magnet) = 7900 (kg/m³), B_M(powder) = 1.2 (T), B_M(stamping) = 1.5 (T), d_CM(powder) = 6000 (kg/m³), d_CM(stamping) = 7900 (kg/m³). On this problem, some variables are fixed by the definition of the initial domain; nevertheless, all the possible configurations are taken into account.
Table 10.2. Optimal solutions for general design problems.

Variables   unit        min Mass   min Vg     min Multi
x1          mm          138.5      126.0      130.2
x2          mm          54.2       51.2       50.1
x3          mm          4.2        4.4        4.2
x4          mm          6.0        8.1        7.5
x5          mm          4.6        4.6        5.6
x6          mm          5.7        4.8        5.1
x9          mm          4.2        4.7        4.0
x10         mm          4.2        4.2        4.0
z1                      9          7          8
b1                      1          0          0
b2                      1          1          1
b3                      1          1          1
k1                      modern     modern     modern
k2                      powder     stamping   powder
Mass        kg          2.84       3.16       2.85
Vg          10^-3 m^3   1.08       0.86       0.91
Vc                      21.6       18.4       22.8
Vm                      7.4        7.2        7.3
Vu                      5.3        5.6        5.1
Va                      7.9        7.8        7.6
The results presented in Table 10.2 represent rotating machines which are optimal with respect to their mass or their global volume. These motors are drawn and discussed in Fitan et al. (2004). The main difference between the structures of the two optimal solutions comes from the rotoric configuration: internal for the first machine (b_1 = 1) and external for the second (b_1 = 0). It is interesting to notice that the magnetic materials used are also different for the two solutions. The multicriteria solution is very interesting because its value is 2.06, very close to the value 2 which would correspond to a Pareto global solution. The structure of this solution is an efficient compromise between the two machines: the rotoric configuration b_1 = 0 of the second machine and the powder magnetic conductor of the first. Furthermore, z_1 = 8, which lies between the previous results (7 and 9). These examples show that exact global optimization branch-and-bound algorithms based on interval analysis are perfectly adapted to solve such design problems. Nowadays this program, Algorithm 10.1, is used by the members of the Laboratoire d'Électrotechnique et d'Électronique Industrielle for the design of several electro-mechanical actuators which admit explicit analytical models, such as those presented in this chapter for electrical rotating machines. Furthermore, these numerical designs clearly show the great efficiency of Algorithm 10.1 associated with constraint propagation techniques such as Algorithm 10.2. In this way, exact solutions were obtained for large global optimization problems requiring no strong assumptions about the criteria and the constraints; see Equations (10.12), (10.16) and (10.17). The adaptability of such interval branch-and-bound algorithms is shown, which definitely validates the interest of these algorithms for solving such design problems.
4. Conclusion
In recent years, these approaches combining analytical models and global optimization algorithms have shown their usefulness for solving, in a rational way, design problems for electro-mechanical actuators (Messine et al., 1998; Fitan et al., 2003, 2004; Messine et al., 2001). At present, Algorithm 10.1 must be associated with an analytical model which permits formulating the considered design problem as a non-homogeneous mixed constrained global optimization problem (10.1). The purpose of this chapter was also to show the interest of such an approach combining exact algorithms and analytical models. The resolution of the design problems for rotating machines with permanent magnets takes from less than one minute to several hours on a standard computer (2 GHz), depending on the number of variables and on the criteria considered. For the second, more general problems, the solutions are obtained in approximately one hour. At present, Algorithm 10.1 has become indispensable for solving such design problems, in order to avoid or to limit the expensive phase of prototype hand-making at the Laboratoire d'Électrotechnique et d'Électronique Industrielle.
Acknowledgments

This work was also supported by the Laboratoire d'Électrotechnique et d'Électronique Industrielle (LEEI) de Toulouse, UMR-CNRS 5828, France. I would particularly like to thank Eric Fitan and Bertrand Nogarede from the LEEI for their collaboration on this subject. Special acknowledgments to Pierre Hansen for the interest he has shown in this work.
References

Fitan, E., Messine, F., and Nogarede, B. (2003). A general analytical model of electrical permanent magnet machine dedicated to optimal design. COMPEL - International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 22(4):1037-1050.

Fitan, E., Messine, F., and Nogarede, B. (2004). The electromagnetic actuator design problem: A general and rational approach. IEEE Transactions on Magnetics, 40(3):1579-1590.

Hansen, E. (1992). Global Optimization Using Interval Analysis. Marcel Dekker, New York.

Kearfott, R.B. (1996). Rigorous Global Search: Continuous Problems. Kluwer Academic Publishers, Dordrecht, Boston, London.

Kone, A.D., Nogarede, B., and Lajoie-Mazenc, M. (1993). Le dimensionnement des actionneurs électriques: un problème de programmation non-linéaire. Journal de Physique III:285-301.

Messine, F. (1997). Méthodes d'optimisation globale basées sur l'analyse d'intervalles pour la résolution de problèmes avec contraintes. Ph.D. Thesis, Institut National Polytechnique de Toulouse, France.

Messine, F. (2004). Deterministic global optimization using interval constraint propagation techniques. RAIRO Operations Research, 38(4):277-294.

Messine, F., Monturet, V., and Nogarede, B. (2001). An interval branch and bound method dedicated to the optimal design of piezoelectric actuators. In: N. Mastorakis, V. Mladenov, B. Suter, and L.J. Wang (eds.), Advances in Scientific Computing, Computational Intelligence, and Applications, pp. 174-180. Mathematics and Computers in Science and Engineering.

Messine, F., Nogarede, B., and Lagouanelle, J.L. (1998). Optimal design of electromechanical actuators: a new method based on global optimization. IEEE Transactions on Magnetics, 34(1):299-307.

Moore, R.E. (1966). Interval Analysis. Prentice Hall, Englewood Cliffs, NJ.

Neittaanmäki, P., Rudnicki, M., and Savini, A. (1996). Inverse Problems and Optimal Design in Electricity and Magnetism. Clarendon Press, Oxford.
Nogarede, B. (2001). Machines tournantes: principes et constitutions. Techniques de l'Ingénieur, D3410 and D3411, France.

Nogarede, B., Kone, A.D., and Lajoie-Mazenc, M. (1995). Optimal design of permanent-magnet machines using analytical field modeling. Electromotion, 2(1):25-34.

Ratschek, H. and Rokne, J. (1988). New Computer Methods for Global Optimization. Ellis Horwood, Chichester, England.

Van Hentenryck, P., Michel, L., and Deville, Y. (1997). Numerica: A Modeling Language for Global Optimization. MIT Press, Cambridge, MA.