Christian Küchler Stability, Approximation, and Decomposition in Two- and Multistage Stochastic Programming
VIEWEG+TEUBNER RESEARCH Stochastic Programming Editor: Prof. Dr. Rüdiger Schultz
Uncertainty is a prevailing issue in a growing number of optimization problems in science, engineering, and economics. Stochastic programming offers a flexible methodology for mathematical optimization problems involving uncertain parameters for which probabilistic information is available. This covers model formulation, model analysis, numerical solution methods, and practical implementations. The series “Stochastic Programming” presents original research from this range of topics.
Christian Küchler
Stability, Approximation, and Decomposition in Two- and Multistage Stochastic Programming
VIEWEG+TEUBNER RESEARCH
Bibliographic information published by the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet.
Dissertation Humboldt-Universität zu Berlin, 2009
1st edition 2009. All rights reserved © Vieweg+Teubner | GWV Fachverlage GmbH, Wiesbaden 2009. Editorial office: Dorothee Koch | Anita Wilke. Vieweg+Teubner is part of the specialist publishing group Springer Science+Business Media. www.viewegteubner.de. This work, including all of its parts, is protected by copyright. Any use beyond the narrow limits of copyright law without the consent of the publisher is prohibited and punishable. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic systems. The use of common names, trade names, product designations, etc. in this work does not, even without specific marking, justify the assumption that such names are to be regarded as free in the sense of trademark and brand protection legislation and may therefore be used by anyone. Cover design: KünkelLopka Medienentwicklung, Heidelberg. Printing and binding: STRAUSS GMBH, Mörlenbach. Printed on acid-free, chlorine-free bleached paper. Printed in Germany. ISBN 978-3-8348-0921-6
To my parents
Preface

The present doctoral thesis was written during my employment at the Department of Mathematics at Humboldt-Universität zu Berlin. This employment was made possible and supported by the Bundesministerium für Bildung und Forschung and the Wiener Wissenschafts-, Forschungs- und Technologiefonds. First of all, it is a great pleasure to express my gratitude to my advisor Prof. Werner Römisch, who made my time at Humboldt-Universität most pleasant and inspiring. Prof. Römisch supported me with numerous discussions and suggestions, incessantly encouraged me to follow my own ideas, and enabled me to attend various workshops and conferences that gave me many opportunities to present and discuss my work. Very special thanks also go to my colleague Stefan Vigerske from Humboldt-Universität for the intensive collaboration whose results are also part of this thesis. In particular, it was his profound knowledge of optimization and programming that made it possible to implement the recombining tree decomposition approach in its present form. I would also like to thank him for never tiring of discussing the many facets of computer manipulation as well as mathematics. Special thanks also go to Prof. René Henrion from Weierstraß-Institut in Berlin for the very pleasant and fruitful collaboration on scenario reduction, which resulted in parts of this thesis. I also want to thank him for numerous motivating and inspiring discussions. I would also like to thank my colleagues from the BMBF network for the successful cooperation, especially Alexa Epe from Ruhr-Universität Bochum and Oliver Woll from Universität Duisburg-Essen. Furthermore, I want to thank all of my friends and colleagues from Humboldt-Universität for their support, many helpful discussions, and the enjoyable coffee breaks. In particular, I thank Dr. Andreas Eichhorn, Thomas Surowiec, and, once again, Stefan Vigerske for proofreading parts of this thesis. I also want to thank Dr. Holger Heitsch for providing me with his scenred software. Finally, I want to express my deepest gratitude to my family for all their love and support, and, in particular, to Konny for having been so strong and patient during the last years.

Christian Küchler
Contents

1 Introduction
  1.1 Stochastic Programming Models
  1.2 Approximations, Stability, and Decomposition
  1.3 Contributions
2 Stability of Multistage Stochastic Programs
  2.1 Problem Formulation
  2.2 Continuity of the Recourse Function
  2.3 Approximations
  2.4 Calm Decisions
  2.5 Stability
3 Recombining Trees for Multistage Stochastic Programs
  3.1 Problem Formulation and Decomposition
    3.1.1 Nested Benders Decomposition
    3.1.2 Cut Sharing
    3.1.3 Recombining Scenario Trees
  3.2 An Enhanced Nested Benders Decomposition
    3.2.1 Cutting Plane Approximations
    3.2.2 Dynamic Recombining of Scenarios
    3.2.3 Upper Bounds
    3.2.4 Extended Solution Algorithm
  3.3 Construction of Recombining Trees
    3.3.1 A Tree Generation Algorithm
    3.3.2 Consistency of the Tree Generation Algorithm
  3.4 Case Study
    3.4.1 A Power Scheduling Problem
    3.4.2 Numerical Results
  3.5 Out-of-Sample Evaluation
    3.5.1 Problem Formulation
    3.5.2 Towards Feasible Solutions
    3.5.3 Numerical Examples
4 Scenario Reduction with Respect to Discrepancy Distances
  4.1 Discrepancy Distances
  4.2 On Stability of Two-Stage and Chance-Constrained Programs
  4.3 Scenario Reduction
  4.4 Bounds and Specific Solutions
    4.4.1 Ordered Solution and Upper Bound
    4.4.2 Lower Bound
  4.5 The Inner Problem
    4.5.1 Critical Index Sets
    4.5.2 Reduced Critical Index Sets
    4.5.3 Determining the Coefficients
    4.5.4 Optimal Redistribution Algorithm
  4.6 The Outer Problem
    4.6.1 Revising Heuristics
    4.6.2 A Glimpse on Low Discrepancy Sequences
    4.6.3 A Branch and Bound Approach
  4.7 Numerical Experience
  4.8 Further Results
    4.8.1 A Note on Extended Discrepancies
    4.8.2 Mass Transportation and Clustering
    4.8.3 An Estimation between Discrepancies
Appendix
Bibliography
List of Figures

3.1 Recombining and non-recombining scenario trees
3.2 Electricity demand and wind turbine power curve
3.3 Out-of-sample values for the power scheduling model
3.4 Out-of-sample values for the swing option model
4.1 Recourse function of Example 4.2.1
4.2 Supporting and non-supporting polyhedra
4.3 Rectangular discrepancy for different reduction techniques
4.4 Cumulative distribution functions of Example 4.6.1
4.5 Cell discrepancy and running time for the Forward Selection
4.6 Low discrepancy points with adjusted probabilities
4.7 Perturbation of the optimal value of Example 4.7.1
4.8 Recourse function of Example 4.7.2
4.9 Perturbation of the optimal value of Example 4.7.2
4.10 Probabilities adjusted with respect to $\alpha_{\mathcal{B}_{\mathrm{rect}}}$ and $\zeta_2$
4.11 $\alpha_{\mathcal{B}_{\mathrm{rect}}}$ and $\zeta_2$ for different weighting factors
4.12 Optimal mass transportation with respect to $\alpha_{\mathcal{B}_{\mathrm{rect}}}$ and $\zeta_2$
4.13 A polyhedron and the corresponding set $M_{\nparallel}$
4.14 Enlargements of a polyhedron
4.15 Polyhedral singularity of multivariate normal distributions

List of Tables

3.1 Parameters of the power scheduling model
3.2 Running times of Benders Decomposition
3.3 Running times with and without using upper bounds
3.4 Running times for different aggregation parameters
3.5 Out-of-sample values for different aggregation parameters
3.6 Out-of-sample values for different aggregation parameters
4.1 Number of supporting polyhedra and critical index sets
4.2 Running times of Algorithm 4.1
4.3 Running times of various support selection algorithms
Index of Notation

$\mathbf{1}_B(\cdot)$: the indicator function of the set $B$
$\langle \cdot, \cdot \rangle$: the standard scalar product in $\mathbb{R}^m$
$|I|$: the cardinality of the finite set $I$
$a, \bar a$: a closed polyhedron in $\mathbb{R}^s$ induced by $a, \bar a \in \bar{\mathbb{R}}^k$, see p.116
$\alpha_{\mathcal{B}}$: the discrepancy distance w.r.t. the system $\mathcal{B}$, see p.98
$\mathcal{B}$: a system of Borel subsets of $\mathbb{R}^s$, see p.98
$\mathcal{B}_{\mathrm{cell}}$: the system of closed cells in $\mathbb{R}^s$, see p.98
$\mathcal{B}_{\mathrm{cl}}$: the system of closed subsets of $\mathbb{R}^s$, see p.98
$\mathcal{B}_{\mathrm{conv}}$: the system of closed, convex subsets of $\mathbb{R}^s$, see p.98
$\mathcal{B}_{\mathrm{ph},k}$: the system of polyhedra in $\mathbb{R}^s$ having at most $k$ vertices, see p.98
$\mathcal{B}_{\mathrm{ph},W}$: the system of polyhedra in $\mathbb{R}^s$ each of whose facets parallels a facet of $[0,1]^s$ or $\operatorname{pos} W$, see p.98
$\mathcal{B}_{\mathrm{rect}}$: the system of closed, $s$-dimensional rectangles in $\mathbb{R}^s$
$\mathcal{C}_{\mathrm{feas}}$: a set of feasibility cuts, see p.43
$\mathcal{C}_{\mathrm{opt}}$: a set of optimality cuts, see p.42
$\delta_\xi$: the Dirac measure in $\xi$
$\partial B$: the topological boundary of the Borel set $B$
$d$: the Pompeiu-Hausdorff distance, see p.12
$D_{\mathcal{F}}$: a probability metric with $\zeta$-structure, see p.100
$\Delta_{\mathcal{B}}$: the optimal value of the scenario reduction problem, see p.107
$\operatorname{dist}(a, B)$: the distance between the point $a \in \mathbb{R}^n$ and the set $B \subset \mathbb{R}^n$, see p.12
$\mathbb{E}[\xi]$: the expectation of the random variable $\xi$
$\mathcal{F}$: a class of Borel measurable mappings, see p.100
$\mathcal{F}_p$: a class of locally Lipschitz continuous mappings, see p.101
$\bar{\mathcal{F}}_p$: a class of uniformly bounded, piecewise locally Lipschitz continuous mappings, see p.101
$\bar{\mathcal{F}}_{p,\mathcal{B}}$: a class of uniformly bounded, locally Lipschitz continuous mappings, see p.101
$\mathcal{I}_{\mathcal{B}}$: the system of critical index sets, see p.114
$\mathcal{I}_{\mathcal{B}}^*$: the system of reduced critical index sets, see p.115
$\operatorname{int} S$: the topological interior of the set $S$
$\Lambda_{R_j}$: the set of representative nodes at time $R_j$, see p.40
$\mathcal{M}^m_{[1,T]}$: a set of Borel measurable mappings, see p.11
$\operatorname{pos} W$: the positive cone of a $(d \times m)$-matrix $W$, i.e., $\operatorname{pos} W = \{Wy : y \in \mathbb{R}^m_+\}$
$Q_t(\cdot, \cdot)$: the recourse function at time $t$, see p.13
$P, Q$: Borel probability measures
$P_t$: the probability distribution of the random variable $\xi_t$ under the measure $P$, see p.11
$P_{[t]}$: the probability distribution of the random variable $\xi_{[t]}$ under the measure $P$, see p.11
$\mathcal{P}$: the set of supporting polyhedra, see p.117
$\mathcal{P}_p(\Xi)$: the set of all Borel probability measures on $\Xi \subset \mathbb{R}^s$ with finite absolute moments of order $p \ge 1$, see p.101
$\mathbb{R}$: the set of real numbers
$\mathbb{R}_+$: the set of non-negative real numbers, i.e., $\mathbb{R}_+ = [0, \infty)$
$\bar{\mathbb{R}}$: the set of extended real numbers, $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$
$\xi$: an $\mathbb{R}^s$-valued random variable or stochastic process
$\xi_{[t]}$: the random vector $(\xi_1, \dots, \xi_t)$
$\xi_{[s,t]}$: the random vector $(\xi_s, \dots, \xi_t)$ for $s, t \in \mathbb{N}$ with $s \le t$
$S(\xi)$: the set of decisions that are feasible w.r.t. the process $\xi$, see p.12
$S_n$: the standard simplex in $\mathbb{R}^n$, see p.105
$v(\xi), v(P)$: the optimal value of a stochastic program, see p.12
$\Xi_t$: the support of the measure $P_t$
$\Xi_{[t]}$: the support of the measure $P_{[t]}$
$\zeta_p$: the $p$-th order Fortet-Mourier metric, see p.101
$\zeta_{p,\mathcal{B}_{\mathrm{ph},k}}$: an extended polyhedral discrepancy, see p.101
$\zeta_{p,\mathcal{B}_{\mathrm{ph},W}}$: an extended polyhedral discrepancy, see p.101
Chapter 1 Introduction

1.1 Stochastic Programming Models
In modern decision theory, it is often the case that at least some components of a given model are uncertain. Such problems arise in a variety of applications, such as inventory control, financial planning and portfolio optimization, airline revenue management, scheduling and operation of power systems, and supply chain management. When dealing with such decision problems, it is reasonable (and sometimes inevitable) to take possible uncertainties into account within the optimization and decision-making process. Stochastic programming provides a framework for modeling, analyzing, and solving optimization problems in which some parameters are known only up to a probability distribution. Stochastic programming has its origin in the early work of Dantzig (1955), which was initially motivated by the need to take uncertain demand into account in an optimization model of airline scheduling. Since its beginnings, the field has grown and extended in various directions. Introductory textbooks that give an impression of the diversity of stochastic programming are due to Kall and Wallace (1994), Prékopa (1995), Birge and Louveaux (1997), and Ruszczyński and Shapiro (2003b). A variety of applications are discussed by Wallace and Ziemba (2005). In particular, Dantzig (1955) introduced the concept of two-stage linear stochastic programs, which is today regarded as the classical stochastic programming framework. Two-stage stochastic programs model the situation of a decision maker who must first make (first-stage) decisions without knowing some uncertain parameters, which may, e.g., affect the costs of, or the constraints on, future decisions. In the second stage, the unknown parameters are revealed, and the decision maker then makes a recourse decision that may depend (in a measurable way) on the realization of the stochastic parameters. In some applications, the first- and second-stage decisions stand for investment and operation decisions, respectively. One of several possible mathematical formulations of a two-stage linear stochastic program reads as follows:

\begin{align}
\inf\ & \langle b_1, x_1\rangle + \mathbb{E}\big[\langle b_2(\xi), x_2(\xi)\rangle\big] \tag{1.1}\\
\text{s.t. } & x_1 \in X_1,\quad x_2(\xi) \in X_2, \tag{1.2}\\
& A_{2,1}(\xi)\,x_1 + A_{2,0}(\xi)\,x_2(\xi) = h_2(\xi). \tag{1.3}
\end{align}
Here, $\xi$ is a random vector on a probability space $(\Omega, \mathcal{F}, P)$ that models the stochastic parameters of the optimization problem. The variables $x_1$ and $x_2$ denote the first- and second-stage decisions, respectively. For $i = 1, 2$, the decision $x_i$ has to lie in some Borel constraint set $X_i \subset \mathbb{R}^m$. The first-stage decision $x_1$ is a constant, whereas the second-stage decision $x_2 = x_2(\cdot)$ is assumed to be a measurable mapping from $\Xi := \operatorname{supp} P[\xi \in \cdot]$ to $\mathbb{R}^m$. The decision $x_i$ at stage $i$ causes linear costs $\langle b_i, x_i\rangle$ with coefficients $b_i \in \mathbb{R}^m$, where $b_2$ may depend affinely on the realization of $\xi$. The decisions $x_1$ and $x_2$ are intertwined by the time-coupling constraint (1.3). Finally, the technology matrix $A_{2,1}$ and the recourse matrix $A_{2,0}$ take values in $\mathbb{R}^{n \times m}$, the right-hand side $h_2$ takes values in $\mathbb{R}^n$, and all three may again depend affinely on $\xi$. Note that the objective of problem (1.1) is to minimize the expected total costs, and that the constraints (1.2) and (1.3) are assumed to hold $P$-almost surely.

Dantzig's framework has been extended in various directions during the last few decades. If some of the components of the decision variables in problem (1.1) are required to be integer, i.e.,

\begin{equation}
X_1, X_2 \subset \mathbb{Z}^{m_1} \times \mathbb{R}^{m_2} \tag{1.4}
\end{equation}
with $m_1, m_2 \in \mathbb{N}$, $m_1 + m_2 = m$, one arrives at mixed-integer two-stage linear stochastic programs. Such integrality constraints arise in a variety of practical situations, e.g., when modeling technical or economic systems that allow only for discrete decisions. Furthermore, integer variables can be helpful for describing discontinuities or piecewise linear functions by means of linear expressions. Under integrality constraints, the continuity and convexity properties of problem (1.1) are generally lost, and thus the structure of mixed-integer stochastic programs is more intricate. Despite their practical relevance, mixed-integer stochastic programs have received only limited attention compared to the non-integer case; see Stougie (1985) for an early reference, and Römisch and
Schultz (2001), Louveaux and Schultz (2003), Schultz (2003), and Sen and Sherali (2006) for more recent results. The constraints in problem (1.1) are required to hold $P$-almost surely. However, in several technical or economic decision problems, almost-sure constraints may be too restrictive and may lead to unacceptably expensive solutions, or even to infeasibility of the decision problem. Such problems may be modeled by a further class of stochastic programs whose constraints are required to hold (at least) with a certain probability, i.e., by so-called chance constraints. Chance constraints are also a modeling tool for regulatory requirements such as Value-at-Risk constraints in financial applications. A simple example of an optimization problem including chance constraints is the following:

\begin{align}
\inf\ & \langle b_1, x_1\rangle \tag{1.5}\\
\text{s.t. } & x_1 \in X_1,\quad P\big[A_{2,1}(\xi)\,x_1 \ge h_2(\xi)\big] \ge p, \tag{1.6}
\end{align}
where $p \in [0, 1]$ denotes some probability threshold, and $b_1$, $X_1$, $A_{2,1}(\cdot)$, and $h_2(\cdot)$ are defined as above. Further formulations and various results on chance-constrained stochastic programming, as well as numerous references, are provided by Prékopa (1995, 2003).

A natural extension of the two-stage framework (1.1) is the consideration of a multistage setting. The latter corresponds to a situation where information about the unknown parameters is revealed sequentially and decisions have to be made at certain time points. A multistage extension of (1.1) can be formulated as follows:

\begin{align}
\inf\ & \langle b_1, x_1\rangle + \sum_{t=2}^{T} \mathbb{E}\big[\langle b_t(\xi_{[t]}), x_t(\xi_{[t]})\rangle\big] \tag{1.7}\\
\text{s.t. } & x_1 \in X_1,\quad x_t(\xi_{[t]}) \in X_t,\quad t = 2, \dots, T, \notag\\
& \sum_{\tau=0}^{t-1} A_{t,\tau}(\xi_{[t]})\, x_{t-\tau}(\xi_{[t-\tau]}) = h_t(\xi_{[t]}),\quad t = 2, \dots, T, \tag{1.8}
\end{align}

where $\xi = (\xi_t)_{t=1,\dots,T}$ is a stochastic process on $(\Omega, \mathcal{F}, P)$ with time horizon $T \in \mathbb{N}$, and $\xi_{[t]}$ denotes the vector $(\xi_2, \dots, \xi_t)$. Note in particular that the decision $x_t$ at time $t$ may depend (in a measurable way) on $\xi_{[t]}$, i.e., on the information obtained by observing $\xi$ up to time $t$.
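For a finitely supported process, the measurability requirement on $x_t$ amounts to a simple bookkeeping rule: scenarios that share the same history up to time $t$ must share the same decision $x_t$, and each distinct history is one node of a scenario tree. The Python sketch below illustrates this; the function names are hypothetical, and the scenario tuples are assumed to start with a deterministic $\xi_1$, so grouping by $(\xi_1, \dots, \xi_t)$ is the same as grouping by $\xi_{[t]}$.

```python
from collections import defaultdict

def stage_nodes(scenarios, probs):
    """For each stage t, map every history (xi_1, ..., xi_t) to its probability.

    Each distinct history is one node of the scenario tree at stage t,
    i.e., one copy of the decision variable x_t."""
    horizon = len(scenarios[0])
    nodes = []
    for t in range(1, horizon + 1):
        mass = defaultdict(float)
        for s, p in zip(scenarios, probs):
            mass[s[:t]] += p
        nodes.append(dict(mass))
    return nodes

def is_nonanticipative(scenarios, decisions):
    """decisions[k][t - 1] is x_t in scenario k. Returns True iff x_t coincides
    in all scenarios that share the same history up to stage t."""
    horizon = len(scenarios[0])
    for t in range(1, horizon + 1):
        seen = {}
        for s, x in zip(scenarios, decisions):
            # the first scenario with this history fixes the decision
            if seen.setdefault(s[:t], x[t - 1]) != x[t - 1]:
                return False
    return True

scenarios = [(0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2)]  # xi_1 = 0 is deterministic
probs = [0.25] * 4
print([len(n) for n in stage_nodes(scenarios, probs)])     # [1, 2, 4]
```

In this four-scenario example, $x_1$ has one copy, $x_2$ two, and $x_3$ four; the decision list `[(5, 3, 1), (5, 3, 2), (5, 4, 7), (5, 4, 0)]` passes the check, whereas changing the second scenario's stage-2 decision to 9 violates it.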
A further extension of the classical framework is to replace (or adjust) the expectation operator $\mathbb{E}[\cdot]$ by some risk functional $F[\cdot]$, i.e., the objective of (1.1) becomes

\begin{equation}
\inf\ F\big[\langle b_1, x_1\rangle + \langle b_2(\xi), x_2(\xi)\rangle\big]. \tag{1.9}
\end{equation}

A variety of risk functionals have been proposed and studied in the literature. We refer to, e.g., the classical mean-variance approach due to Markowitz (1952), the widely applied (Average-)Value-at-Risk functionals, several (semi-)deviation measures, as well as functionals based on utility functions. Risk functionals for the multistage case have been proposed and studied intensively in recent years; we refer to the recent book of Pflug and Römisch (2007) as well as to the work of Eichhorn (2007) and the numerous references therein.
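As an illustration of one such risk functional, the following sketch evaluates the Average-Value-at-Risk of a discrete cost distribution via the well-known minimization formula of Rockafellar and Uryasev. The convention assumed here is that $\alpha \in [0, 1)$ is a confidence level and that the AVaR averages the worst $(1 - \alpha)$ share of the costs; other conventions exist, and the function name is hypothetical.

```python
def average_value_at_risk(costs, probs, alpha):
    """AVaR_alpha of a discrete cost distribution (Rockafellar-Uryasev formula).

    AVaR_alpha(Z) = min_t { t + E[(Z - t)_+] / (1 - alpha) },  0 <= alpha < 1.
    The objective is convex and piecewise linear in t with kinks at the atoms
    of Z, so its minimum is attained at one of the atoms."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")

    def objective(t):
        excess = sum(p * max(z - t, 0.0) for z, p in zip(costs, probs))
        return t + excess / (1.0 - alpha)

    return min(objective(t) for t in costs)

costs, probs = [10.0, 20.0, 30.0, 40.0], [0.25] * 4
print(average_value_at_risk(costs, probs, 0.0))  # 25.0 (the expectation)
print(average_value_at_risk(costs, probs, 0.5))  # 35.0 (mean of the worst half)
```

With $\alpha = 0$ the functional reduces to the expectation, so (1.9) then coincides with the risk-neutral objective of (1.1).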
1.2 Approximations, Stability, and Decomposition
A common feature of the stochastic programming models considered in the previous section is that analytic solutions are rarely available in practical applications. In such cases, one has to resort to numerical optimization methods to find optimal (or, at least, acceptable) solutions. While there are approaches that embed the construction of solutions into a sampling scheme, most numerical methods require the underlying stochastic entities to take only a finite number of values. Furthermore, in order to achieve acceptable solution times, the number of possible values of the stochastic variables often has to be very limited. This is the case in particular for multistage and mixed-integer stochastic programs.

Approximations

Whenever the underlying probability measure does not fulfill the aforementioned finiteness requirements, a common approach is to approximate it by a measure that is supported by a suitable number of atoms (or scenarios). For this purpose, several techniques have been developed. These techniques are based on different principles, such as random sampling (Shapiro, 2003b), Quasi-Monte Carlo sampling (Pennanen, 2005), and moment matching (Høyland et al., 2003; Høyland and Wallace, 2001). For specific techniques, convergence properties of optimal values and/or solution sets as well as bounds for statistical estimates have been established, cf. Pflug (2003), Shapiro (2003b), and the references therein.
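To illustrate the random-sampling principle on the smallest possible example, the sketch below replaces a continuous demand distribution by $N$ equally weighted Monte Carlo scenarios and solves the resulting sample average approximation of a toy newsvendor problem (order $x$ at unit cost `cost`, sell $\min(x, D)$ at unit price `price`). The problem, the uniform demand on $[0, 100]$, and all names are assumptions chosen for illustration, not an example from this book. For this demand the exact optimum is the 0.4-quantile, $x^* = 40$, since $P[D > x^*] = \mathrm{cost}/\mathrm{price} = 0.6$; the SAA solution should be close to it for large $N$.

```python
import math
import random

def saa_objective(x, cost, price, samples):
    """Sample average of cost*x - price*min(x, D) over equally weighted samples."""
    return cost * x - price * sum(min(x, d) for d in samples) / len(samples)

def saa_newsvendor(cost, price, samples):
    """Minimiser of saa_objective, assuming 0 < cost < price.

    The SAA objective is convex and piecewise linear with right derivative
    cost - price * #{d > x} / N, which becomes non-negative at the k-th
    smallest sample, k = ceil(N * (1 - cost / price))."""
    n = len(samples)
    k = max(1, math.ceil(n * (1.0 - cost / price)))
    return sorted(samples)[k - 1]

rng = random.Random(42)                       # fixed seed for reproducibility
demand = [rng.uniform(0.0, 100.0) for _ in range(1000)]
x_saa = saa_newsvendor(3.0, 5.0, demand)      # an empirical 0.4-quantile
```

Because the scenarios are drawn at random, $x_{\mathrm{saa}}$ itself is random; convergence results of the kind cited above describe how it approaches $x^*$ as $N$ grows.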
Another established approximation approach relies on specific probability metrics¹; see, e.g., Pflug (2001), Dupačová et al. (2003), Henrion et al. (2009), and Heitsch and Römisch (2008). For such methods, approximating the initial measure with respect to a specific metric is considered reasonable whenever the optimal value and the solution set of the stochastic program under consideration are known to possess some regularity with respect to that metric (e.g., Lipschitz or Hölder continuity). In order to identify distances that are suitable for specific problem classes, perturbation and stability issues become relevant.

Stability

In stochastic programming, the term stability usually refers to calmness and continuity properties of optimal values and solution sets of a stochastic program under perturbations (or approximations) of the underlying probability measure (cf. the recent survey by Römisch (2003)). To obtain such regularity properties, the probability metric must be adapted to the structure of the stochastic program under consideration. In particular, Fortet-Mourier and Wasserstein metrics are relevant for two-stage stochastic programs (cf. Römisch and Schultz (1991); Rachev and Römisch (2002)). These distances have been used for the approximation of discrete probability distributions in two-stage stochastic programs without integrality requirements (Dupačová et al., 2003; Heitsch and Römisch, 2003, 2007). For two-stage mixed-integer models, discrepancy distances are useful; see Schultz (1996), Römisch (2003), and Römisch and Vigerske (2008). Discrepancy distances are also relevant for chance-constrained problems; see Römisch and Wakolbinger (1987) and Henrion and Römisch (1999, 2004). Heitsch et al. (2006) established a general stability result for linear multistage stochastic programs involving a specific filtration distance, which measures the distance between the information flows of the initial and the perturbed stochastic process.
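To make such metrics concrete in the simplest setting, the following Python sketch works with finitely supported measures on the real line. It relies on standard one-dimensional facts rather than on anything specific to this book: the Fortet-Mourier metric of order 1 coincides with the Wasserstein-1 (Kantorovich) distance, which on $\mathbb{R}$ equals $\int |F_P - F_Q|\,dx$, and the Kolmogorov distance $\sup_x |F_P(x) - F_Q(x)|$ is the discrepancy with respect to the half-lines $(-\infty, x]$. The function `reduce_to_nearest` deletes scenarios and moves their probability to the nearest kept scenario, the redistribution rule that Dupačová et al. (2003) show to be optimal for the Kantorovich distance when the set of kept scenarios is fixed. All function names are hypothetical.

```python
def cdf_values(atoms, probs, grid):
    """Right-continuous distribution function of a discrete measure, on a grid."""
    return [sum(p for a, p in zip(atoms, probs) if a <= x) for x in grid]

def kolmogorov_and_w1(atoms_p, probs_p, atoms_q, probs_q):
    """Return (sup_x |F_P - F_Q|, integral of |F_P - F_Q|) for measures on R."""
    grid = sorted(set(atoms_p) | set(atoms_q))
    diffs = [abs(a - b) for a, b in zip(cdf_values(atoms_p, probs_p, grid),
                                        cdf_values(atoms_q, probs_q, grid))]
    # Both CDFs are constant on [grid[k], grid[k+1]) and agree outside
    # [grid[0], grid[-1]), so the integral is a finite sum of rectangles.
    w1 = sum(diffs[k] * (grid[k + 1] - grid[k]) for k in range(len(grid) - 1))
    return max(diffs), w1

def reduce_to_nearest(atoms, probs, keep):
    """Keep the atoms with indices in `keep`; move the probability of every
    deleted atom to the nearest kept atom."""
    new_probs = {j: probs[j] for j in keep}
    for i, a in enumerate(atoms):
        if i not in new_probs:
            nearest = min(keep, key=lambda j: abs(a - atoms[j]))
            new_probs[nearest] += probs[i]
    kept = sorted(new_probs)
    return [atoms[j] for j in kept], [new_probs[j] for j in kept]

atoms, probs = [1.0, 2.0, 3.0, 4.0], [0.25] * 4
red_atoms, red_probs = reduce_to_nearest(atoms, probs, keep=[1, 2])
print(red_atoms, red_probs)                                   # [2.0, 3.0] [0.5, 0.5]
print(kolmogorov_and_w1(atoms, probs, red_atoms, red_probs))  # (0.25, 0.5)
```

The computed Wasserstein-1 distance of 0.5 equals the total transported mass times distance, $0.25 \cdot 1 + 0.25 \cdot 1$, as expected for the nearest-neighbor redistribution.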
The filtration distance of Heitsch et al. (2006) is taken into account by the techniques for scenario tree generation developed by Heitsch and Römisch (2008). While consistency and stability results have turned out to be useful for approximation purposes, they usually require the optimization problems and the underlying random variables to fulfill specific boundedness and regularity properties, which may be hard to verify in cases of practical interest. Furthermore, due to the numerical complexity of solving stochastic optimization problems, it may be necessary to use approximations that are too rough to provide meaningful error bounds. In these situations, it makes sense to assess the quality of a given approximation by numerical methods. Such methods have been studied recently by, e.g., Chiralaksanakul and Morton (2004), Hilli and Pennanen (2006), and Kaut and Wallace (2007).

¹ The term probability metric refers to a distance on some space of probability measures.

Decomposition

Once the underlying probability measure has finite support, i.e.,

\begin{equation}
\operatorname{supp} P[\xi \in \cdot] = \Xi = \{\xi^1, \dots, \xi^N\}, \tag{1.10}
\end{equation}
the stochastic program can be formulated as a deterministic optimization problem and, in principle, solved by numerical optimization methods. However, since every atom of the probability measure induces a family of decision variables, these optimization problems are generally of large scale. Thus, much effort has been invested in developing techniques that exploit special structures of stochastic programs in order to decompose the problem into a family of smaller subproblems. This can be done, e.g., by considering problem (1.1) in the dynamic form

\begin{equation}
\inf_{x_1 \in X_1}\ \langle b_1, x_1\rangle + \mathbb{E}\big[Q_2(\xi, x_1)\big], \tag{1.11}
\end{equation}

with the recourse function (or cost-to-go function)

\begin{equation}
Q_2(\xi^i, x_1) := \inf\big\{\langle b_2(\xi^i), x_2\rangle : x_2 \in X_2,\ A_{2,1}(\xi^i)\,x_1 + A_{2,0}(\xi^i)\,x_2 = h_2(\xi^i)\big\} \tag{1.12}
\end{equation}

for $x_1 \in X_1$ and $\xi^i \in \Xi$. Primal decomposition methods proceed by solving the subproblem (1.12) for many values of $x_1$ and $\xi^i$ in order to construct an approximation of $Q_2$ that allows the master problem (1.11) to be solved. Arguably the most popular primal method for two- and multistage stochastic linear programs is due to Benders (1962) and is known as (Nested) Benders Decomposition. In this method, the recourse function is approximated by a cutting-plane estimate that is successively refined using dual solutions of the subproblems (1.12). In the absence of integrality constraints (and for polyhedral $X_1$, $X_2$), the finiteness of $\Xi$ ensures that the (convex) recourse function is piecewise linear. Thus, the Benders Decomposition either yields an optimal solution or reveals the infeasibility of problem (1.11) after a finite number of steps. Various variants of this and related approaches are presented by Birge and Louveaux (1997) and Ruszczyński (2003).
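The cutting-plane idea can be sketched on a toy instance of (1.11) and (1.12). The Python code below is a hypothetical single-cut Benders loop for a newsvendor problem with first-stage order $x$ and recourse $Q_2(d, x) = -\mathrm{price}\cdot\min(x, d)$, the value of the LP $\min\{-\mathrm{price}\cdot y : 0 \le y \le x,\ y \le d\}$ (which fits the form of (1.12) after adding slack variables). Two simplifications are assumptions of this sketch, not features of the method in the text: the subgradient of $Q_2$ is written down analytically instead of being read off LP duals, and the one-dimensional master problem is solved by enumerating its kinks instead of calling an LP solver.

```python
from itertools import combinations

def recourse(x, demands, probs, price):
    """E[Q_2(xi, x)] and a subgradient in x, for Q_2(d, x) = -price * min(x, d).

    The subgradient is -price while x < d and 0 otherwise; this is a dual
    value of the constraint y <= x in the recourse LP."""
    value = sum(p * (-price) * min(x, d) for d, p in zip(demands, probs))
    slope = sum(p * (-price) for d, p in zip(demands, probs) if x < d)
    return value, slope

def solve_master(cuts, cost, x_max):
    """min over 0 <= x <= x_max of cost*x + max_k (alpha_k + beta_k * x).

    A convex piecewise linear function of one variable is minimised at an
    endpoint or where two cuts intersect, so these points are enumerated."""
    def objective(x):
        return cost * x + max(a + b * x for a, b in cuts)
    candidates = [0.0, x_max]
    for (a1, b1), (a2, b2) in combinations(cuts, 2):
        if b1 != b2:
            x = (a2 - a1) / (b1 - b2)
            if 0.0 <= x <= x_max:
                candidates.append(x)
    best = min(candidates, key=objective)
    return best, objective(best)

def benders(cost, price, demands, probs, x_max, tol=1e-9, max_iter=100):
    # initial cut theta >= -price * max(d) keeps the first master bounded
    cuts = [(-price * max(demands), 0.0)]
    best_x, upper = None, float("inf")
    for _ in range(max_iter):
        x, lower = solve_master(cuts, cost, x_max)    # lower bound
        q, g = recourse(x, demands, probs, price)
        if cost * x + q < upper:                      # upper bound
            best_x, upper = x, cost * x + q
        if upper - lower <= tol:
            break
        cuts.append((q - g * x, g))                   # theta >= q + g * (y - x)
    return best_x, upper

x_opt, value = benders(3.0, 5.0, [10, 20, 30, 40], [0.25] * 4, 50.0)
print(x_opt, value)  # 20.0 -27.5
```

For this instance the loop closes the gap between the lower bound (master value) and the upper bound (evaluated first-stage decision) after five master solves. The result agrees with the closed-form newsvendor solution, which orders up to the smallest demand level at which $P[D > x]$ drops below $\mathrm{cost}/\mathrm{price} = 0.6$.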
Integrality constraints pose a serious difficulty for primal decomposition methods, since neither convexity nor continuity of the cost-to-go function is then available. There are, however, cutting-plane based approaches for mixed-integer stochastic programs in the literature, e.g., those by Carøe and Tind (1998), Sen and Higle (2005), and Sen and Sherali (2006); see also Section 5 of Schultz (2003) for a summary. However, these approaches are both technically and numerically demanding. Thus, dual decomposition methods have received more attention for mixed-integer stochastic programs. In principle, dual methods rely on the relaxation of a certain group of constraints. As an example, we write problem (1.1) as

\begin{align}
\inf\ & \sum_{i=1}^{n} P[\xi = \xi^i] \cdot \big(\langle b_1, x_1^i\rangle + \langle b_2(\xi^i), x_2^i\rangle\big) \tag{1.13}\\
\text{s.t. } & x_1^i \in X_1,\quad x_2^i \in X_2,\quad i = 1, \dots, n, \notag\\
& A_{2,1}(\xi^i)\,x_1^i + A_{2,0}(\xi^i)\,x_2^i = h_2(\xi^i),\quad i = 1, \dots, n, \notag\\
& x_1^i = x_1^j,\quad i, j = 1, \dots, n. \tag{1.14}
\end{align}
Here, the vector $(x_1^i, x_2^i)$ denotes the decisions in scenario $\xi^i$. The constraint (1.14) ensures that the first-stage decision does not depend on the outcome of $\xi$ (which cannot be observed until the second stage) and is called the nonanticipativity constraint. Relaxing (1.14) turns (1.13) into a family of $n$ decoupled scenario subproblems that are usually much easier to solve by means of (deterministic) optimization; see Ruszczyński (2003). Such a relaxation is not limited to nonanticipativity constraints. We refer to, e.g., Nowak and Römisch (2000), who study a power scheduling problem and relax the constraints that couple different power units, thus obtaining unit subproblems instead of scenario subproblems. While (scenario or unit) subproblems are generally much easier to solve than the initial stochastic program, the challenge is how to obtain a feasible (and optimal) solution of the initial problem from the solutions of the decoupled subproblems. This can be done, e.g., by Lagrangian heuristics as in Nowak and Römisch (2000) or by a branch-and-bound approach as in Heinze and Schultz (2008).
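The relaxation of (1.14) can be illustrated with a hypothetical Lagrangian dual for the same toy newsvendor instance (first-stage order $x$ at unit cost 3, revenue 5 per unit sold, four equally likely demands). Nonanticipativity is written here in the equivalent form $x^i = z$ with a free copy $z$; eliminating $z$ forces the multipliers to satisfy $\sum_i \lambda_i = 0$, and each scenario subproblem can be solved by inspection. The projected subgradient method with step sizes $\mathrm{step0}/k$ is one standard choice made for this sketch, not a method prescribed by the text, and all names are hypothetical.

```python
def scenario_subproblem(lam, p, d, cost, price, x_max):
    """min over 0 <= x <= x_max of p*(cost*x - price*min(x, d)) + lam*x.

    The objective is convex and piecewise linear with one kink at d, so the
    minimum is attained at 0, at min(d, x_max), or at x_max."""
    def h(x):
        return p * (cost * x - price * min(x, d)) + lam * x
    best = min((0.0, min(d, x_max), x_max), key=h)
    return best, h(best)

def lagrangian_dual(demands, probs, cost, price, x_max, steps=500, step0=0.02):
    """Projected subgradient ascent for the dual of x^i = z (i = 1, ..., n).

    Every dual value L(lam) with sum_i lam_i = 0 is a lower bound on the
    optimal value of the coupled problem (weak duality)."""
    n = len(demands)
    lam = [0.0] * n
    history = []
    for k in range(1, steps + 1):
        sols = [scenario_subproblem(lam_i, p, d, cost, price, x_max)
                for lam_i, p, d in zip(lam, probs, demands)]
        history.append(sum(v for _, v in sols))
        xs = [x for x, _ in sols]
        mean = sum(xs) / n
        # x^i - mean(x) is a subgradient projected onto {sum_i lam_i = 0}
        lam = [lam_i + (step0 / k) * (x - mean) for lam_i, x in zip(lam, xs)]
    return max(history), history

best, history = lagrangian_dual([10, 20, 30, 40], [0.25] * 4, 3.0, 5.0, 50.0)
print(history[0])  # -50.0: lam = 0 gives the "wait-and-see" bound
```

Every recorded dual value stays below the optimal value $-27.5$ of the coupled problem. For this linear instance there is no duality gap, whereas with integrality constraints a gap generally remains, which is why the Lagrangian heuristics and branch-and-bound schemes mentioned above are needed to recover feasible solutions.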
1.3 Contributions
In the previous section, we sketched the relevance of stability, approximations, and decomposition in stochastic programming. The purpose of the
present work is to study and extend various results on these subjects.

In Chapter 2, we establish a quantitative stability result for multistage stochastic linear programs that extends the recent result of Heitsch et al. (2006) by specifying properties that allow one to omit the filtration distance appearing in their main result. Under these properties, we verify a suitable type of Lipschitz continuity of the recourse function and establish an upper bound on the cost increase caused by considering only a specific class of calm solutions. Our result can also be seen as a generalization of the consistency proof of Mirkov and Pflug (2007), since it shows how some of their continuity and boundedness assumptions can be relaxed.

In Chapter 3, an extension of the Nested Benders Decomposition approach for multistage stochastic linear programs is developed. This approach is related to recombining scenario trees and combines the concept of cut sharing discussed by Infanger and Morton (1996) with a specific aggregation procedure that prevents an exponentially growing number of subproblem evaluations. Convergence results and stopping criteria are discussed. In order to construct appropriate recombining scenario trees, we propose an algorithm based on successive clustering (related to the methodology of Heitsch and Römisch (2008)) and apply the stability result of Chapter 2 to verify the consistency of this approximation technique. We further study an approach to the numerical evaluation of solutions of linear multistage stochastic programs.

Chapter 4 studies an approach to scenario reduction for mixed-integer two-stage and chance-constrained linear stochastic programs. Recent results (Schultz, 1996; Römisch and Vigerske, 2008) have shown that stability of such problems holds in terms of certain discrepancy distances between probability distributions. We enhance the methodology of Dupačová et al.
(2003) and show how approximations with respect to these distances can be constructed.

Parts of the present work have been published previously. Chapter 2 basically corresponds to the article by Küchler (2008), slightly extended in order to hold for distributions of stochastic processes instead of processes defined on the same probability space. Parts of Chapter 3 have been published by Küchler and Vigerske (2007, 2008) and have been applied to several case studies by Epe et al. (2007, 2008). The consistency result for the recombining tree construction is new. Parts of Chapter 4 correspond to the publications by Henrion et al. (2009, 2008); several examples and the results on extended discrepancy distances have been added.
Chapter 2 Stability of Multistage Stochastic Programs

Many stochastic optimization problems of practical interest do not allow for an analytic solution, and numerical approaches require the underlying probability measure to have finite support. Whenever the initial probability measure does not meet these demands, it has to be approximated by an auxiliary measure. It is then reasonable to choose the approximating measure such that the optimal value and the set of optimal decisions of the auxiliary problem are close to those of the original problem. Consequently, perturbation and stability analysis of stochastic programs is necessary for the development of reliable techniques for discretization and scenario reduction.

While stability properties are well understood for non-dynamic chance-constrained and two-stage problems, cf. the recent survey by Römisch (2003), the multistage case has turned out to be more intricate. Recently, the latter situation has been studied by a variety of authors, and thus the following references should not be considered exhaustive. Statistical bounds have been provided by Shapiro (2003a). Pennanen (2005) established asymptotic stability of specific approximations for a general class of convex multistage problems in terms of epi-convergence. In doing so, he noted that quantitative results, such as those discussed in this chapter, require stronger assumptions. Indeed, the restriction to models with continuous decisions allowed Mirkov and Pflug (2007) to establish such a quantitative stability result for their tree approximations. Heitsch et al. (2006) did not require regularity conditions on decisions and underlying processes. Consequently, their quantitative stability result, obtained by considering arbitrary perturbations of the underlying process, incorporates a term measuring the distance between the filtrations induced by the initial and the auxiliary process, respectively.
Vanishing in the two-stage case, this filtration distance reflects the relevance of
the information structure and of the nonanticipativity constraints for multistage decision problems. We also refer to Barty (2004) who studied the role of information in stochastic optimization problems and introduced and reviewed several concepts of distances between filtrations. The recent approach of Heitsch and Römisch (2008) aims to incorporate filtration distances into the construction of scenario trees. However, this requires some extra effort and, to the best of our understanding, these distances are not taken into account by a variety of established techniques. Thus, the main purpose of this chapter is to provide general conditions under which these somewhat delicate terms may be omitted. One of the main difficulties seems to be that without additional assumptions neither the recourse function nor an optimal decision depend continuously on the current state of the underlying process in general. Rockafellar and Wets (1974) showed that under weak conditions, the optimal value can be approximated by continuous decisions. However, while this allows one to deduce convergence results, such as those due to Pennanen (2005), it does not lead to quantitative estimates. For deriving continuity of the recourse function and bounds based on a barycentric approximation scheme, Kuhn (2005) required the underlying processes to be autoregressive. He also indicated that the key element in any scenario tree construction is the discretization of the conditional probabilities. In particular, continuous dependency of these probabilities on the current state of the underlying process is necessary for potential continuity of the recourse function and can be seen as continuity of the available information with respect to the current state. 
It is illustrated by Example 2.6 of Heitsch, Römisch, and Strugarek (2006) that the latter property is indispensable in order to omit any filtration distances and to obtain a good approximation of the initial process by usual techniques which are based on stagewise clustering. Thus, we impose Lipschitz continuity of the conditional distributions to verify the same regularity for the recourse function in Theorem 1. With this at hand, we estimate in Theorem 2 the gap between the optimal value and the costs of a decision that is locally calm. This allows us to prove the main result of this chapter, Theorem 3, which provides an upper bound for the perturbation of the optimal value.
2.1 Problem Formulation
On a probability space $(\Omega,\mathcal F,\mathbb P)$ we consider an $\mathbb R^s$-valued stochastic process $\xi=(\xi_1,\dots,\xi_T)$ with time horizon $T\in\mathbb N$ and the associated filtration $(\mathcal F_t)_{t=1}^T$ defined through $\mathcal F_t:=\sigma(\xi_{[t]})$ for $t=1,\dots,T$, where $\xi_{[t]}$ denotes the vector $(\xi_1,\dots,\xi_t)$. We assume that $\mathcal F_1=\{\Omega,\emptyset\}$. In the following, $\|\cdot\|$ denotes the maximum norm on $\mathbb R^n$ for the respective value of $n\in\mathbb N$, and we set $\|\xi_{[t]}\|:=\max_{i=1,\dots,t}\|\xi_i\|$. We further assume that $\xi_{[T]}\in L^p(\Omega,\mathcal F,\mathbb P)$ for every $p\in[1,+\infty)$, and set for $t=1,\dots,T$
$$P_t := \mathbb P[\xi_t\in\cdot\,],\qquad P_{[t]} := \mathbb P[\xi_{[t]}\in\cdot\,],\qquad \Xi_t := \operatorname{supp}P_t\subset\mathbb R^s,\qquad \Xi_{[t]} := \operatorname{supp}P_{[t]}\subset\mathbb R^{s\cdot t}.$$
Furthermore, we consider the costs $b_t(\cdot)$, the technology matrices $A_{t,1}(\cdot)$, and the right-hand sides $h_t(\cdot)$, which all are assumed to depend affinely on $\xi_t\in\Xi_t$ and map into $\mathbb R^m$, $\mathbb R^{n\cdot m}$, and $\mathbb R^n$, respectively, for some $m,n\in\mathbb N$ and $t=1,\dots,T$. Together with the nonrandom recourse matrices $A_{t,0}\in\mathbb R^{m\cdot n}$ these mappings define the following instance of the multistage program (1.7):
$$
\begin{aligned}
v(\xi) := \inf\ & \langle b_1,x_1\rangle + \sum_{t=2}^T\mathbb E\big[\langle b_t(\xi_t),x_t(\xi_{[t]})\rangle\big]\\
\text{s.t. }\ & x=(x_1,\dots,x_T)\in\mathcal M^m_{[1,T]},\\
& x_t(\xi_{[t]})\in X_t, \qquad t=1,\dots,T,\\
& A_{t,0}x_t(\xi_{[t]}) + A_{t,1}(\xi_t)x_{t-1}(\xi_{[t-1]}) = h_t(\xi_t), \qquad t=2,\dots,T,
\end{aligned}
\tag{2.1}
$$
where $X_t\subset\mathbb R^m$ are certain nonempty, closed, and polyhedral sets for $t=1,\dots,T$. The set $\mathcal M^m_{[1,T]}$ consists of all tuples $x=(x_1,\dots,x_T)$ of Borel-measurable mappings $x_t:\mathbb R^{s\cdot t}\to\mathbb R^m$. The constraints in (2.1) are assumed to be fulfilled $\mathbb P$-a.s. for $t=2,\dots,T$. The purpose of this chapter is to establish an upper bound for the perturbation of $v(\xi)$ when $\xi$ is approximated by another process $\tilde\xi$. For convenience, we introduce the set-valued mappings $M_t:X_{t-1}\times\Xi_t\rightrightarrows X_t$,
$$M_t(x_{t-1},\xi_t) := \{x_t\in X_t : A_{t,0}x_t + A_{t,1}(\xi_t)x_{t-1} = h_t(\xi_t)\}.$$
In this chapter, we assume complete recourse; i.e., $M_t(x_{t-1},\xi_t)$ is nonempty for every $x_{t-1}\in X_{t-1}$ and every $\xi_t\in\Xi_t$. Complete recourse and the polyhedral form of $M_t$ allow one to conclude (see Example 9.35 of Rockafellar and Wets (1998)) that $M_t$ is Lipschitz continuous on $X_{t-1}\times\Xi_t$ with respect to the Pompeiu-Hausdorff distance $d$ in the following sense. There exists a constant $M\ge0$ with
$$d\big(M_t(x_{t-1},\xi_t),M_t(\hat x_{t-1},\xi_t)\big) \le M\cdot\max\{1,\|\xi_t\|\}\cdot\|\hat x_{t-1}-x_{t-1}\|$$
and
$$d\big(M_t(x_{t-1},\xi_t),M_t(x_{t-1},\hat\xi_t)\big) \le M\cdot\max\{1,\|x_{t-1}\|\}\cdot\|\hat\xi_t-\xi_t\|,$$
for every $(x_{t-1},\xi_t),(\hat x_{t-1},\hat\xi_t)\in X_{t-1}\times\Xi_t$. We recall that the Pompeiu-Hausdorff distance between two sets $A,B\subset\mathbb R^m$ is defined by
$$d(A,B) := \max\Big\{\sup_{a\in A}\operatorname{dist}(a,B),\ \sup_{b\in B}\operatorname{dist}(b,A)\Big\},$$
with $\operatorname{dist}(a,B):=\inf_{b\in B}\|a-b\|$.

Remark 2.1.1. Throughout this chapter, the linearity of $M_t$ is used only to obtain the claimed Lipschitz continuity. Analogously, we assume linear costs $\langle b_t(\xi_t),x_t\rangle$ only to ensure the existence of a constant $B\ge0$ with
$$\big|\langle b_t(\xi_t),x_t\rangle - \langle b_t(\hat\xi_t),x_t\rangle\big| \le B\,\|\xi_t-\hat\xi_t\|\,\|x_t\|$$
and
$$\big|\langle b_t(\xi_t),x_t\rangle - \langle b_t(\xi_t),\hat x_t\rangle\big| \le B\max\{1,\|\xi_t\|\}\,\|x_t-\hat x_t\|.$$
The integrability condition on $\xi$ is assumed for notational simplicity. Actually, it suffices to have $\xi_{[T]}\in L^p(\Omega,\mathcal F,\mathbb P)$ for a sufficiently large $p\in\mathbb R_+$. Furthermore, all results remain valid if $M_t$, $h_t$, and $b_t$ depend on $\xi_{[t]}$ instead of $\xi_t$. Note that the assumption of non-randomness of the recourse matrices $A_{t,0}$ is necessary for the Lipschitz continuity of the mapping $M_t$.

Having in mind problem (2.1), a tuple $x=(x_1,\dots,x_T)$ of Borel-measurable mappings $x_t:\Xi_{[t]}\to X_t$, $t=1,\dots,T$, is called a feasible decision with respect to $\xi$ if the recourse condition
$$x_t(\xi_{[t]})\in M_t\big(x_{t-1}(\xi_{[t-1]}),\xi_t\big) \tag{2.2}$$
is fulfilled $\mathbb P$-a.s. for $t=1,\dots,T$. The class of feasible decisions $x$ will be denoted by $S(\xi)$ and, for the sake of notational convenience, we set $x_0:=1$. We further denote the objective function of problem (2.1) by $\varphi:\mathbb R^{m\cdot T}\times\Xi_{[T]}\to\mathbb R$,
$$\varphi(x_1,\dots,x_T,\xi_{[T]}) := \sum_{t=1}^T\langle b_t(\xi_t),x_t\rangle. \tag{2.3}$$
Using this notation, (2.1) can be written in the following short form:
$$v(\xi) = \inf_{x\in S(\xi)}\mathbb E\big[\varphi(x(\xi),\xi)\big]. \tag{2.4}$$
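The Pompeiu-Hausdorff distance $d$, used above to express the Lipschitz continuity of $M_t$, can be evaluated directly from its definition when both sets are finite, for instance finitely many sampled points of $M_t(x_{t-1},\xi_t)$ and $M_t(\hat x_{t-1},\hat\xi_t)$. The following Python sketch assumes nonempty finite point sets and the maximum norm; the function names are illustrative only. Since the polyhedral sets $M_t$ are infinite in general, such a computation can only approximate $d$ on samples.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def max_norm_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Maximum norm ||u - v||, the norm used throughout the chapter."""
    return max(abs(a - b) for a, b in zip(u, v))


def dist_to_set(a: Sequence[float], B: Sequence[Point]) -> float:
    """dist(a, B) = inf_{b in B} ||a - b|| for a finite, nonempty set B."""
    return min(max_norm_dist(a, b) for b in B)


def pompeiu_hausdorff(A: Sequence[Point], B: Sequence[Point]) -> float:
    """d(A, B) = max{ sup_{a in A} dist(a, B), sup_{b in B} dist(b, A) }."""
    if not A or not B:
        raise ValueError("both sets must be nonempty")
    return max(max(dist_to_set(a, B) for a in A),
               max(dist_to_set(b, A) for b in B))


if __name__ == "__main__":
    A = [(0.0, 0.0)]
    B = [(0.0, 0.0), (1.0, 2.0)]
    print(pompeiu_hausdorff(A, B))  # 2.0: the point (1, 2) is far from A
```

Note that both one-sided suprema are needed: here $\sup_{a\in A}\operatorname{dist}(a,B)=0$, and the distance is produced entirely by the point of $B$ that lies far from $A$.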
2.2 Continuity of the Recourse Function
Let $Q_t:\Xi_{[t]}\times X_{t-1}\to\bar{\mathbb R}$ be the recourse function at time $t$, which is defined recursively by $Q_{T+1}:\equiv0$ and the Dynamic Programming Equation
$$Q_t(\xi_{[t]},x_{t-1}) := \inf_{x_t\in M_t(x_{t-1},\xi_t)}\Big\{\langle b_t(\xi_t),x_t\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x_t)\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\Big\}$$
for $t=T,\dots,1$, where the mapping $\xi_{[t]}\mapsto\mathbb E[Q_{t+1}(\boldsymbol\xi_{[t+1]},x_t)\mid\boldsymbol\xi_{[t]}=\xi_{[t]}]$ denotes the regular conditional expectation of $Q_{t+1}(\cdot,x_t)$ relative to $\mathcal F_t$. The value $Q_t(\xi_{[t]},x_{t-1})$ represents the minimal achievable expected future costs after having chosen $\boldsymbol x_{t-1}=x_{t-1}$, having observed $\boldsymbol\xi_{[t]}=\xi_{[t]}$, and before deciding on $x_t$. In particular, we have the identity $v(\xi)=Q_1(\xi_1,x_0)$, and complete recourse implies that $Q_t<+\infty$ holds true on $\Xi_{[t]}\times X_{t-1}$. It was proved by Evstigneev (1976) that $Q_t$ is well-defined and measurable under the following

Assumption 2.2.1. (i) There exists an integrable random variable $\theta$ such that $\varphi(x,\xi)\ge\theta$ holds $\mathbb P$-a.s. for every $x\in\mathbb R^{m\cdot T}$. (ii) For each $c\in\mathbb R$ the random level set $\{x\in\mathbb R^{m\cdot T}:\varphi(x,\xi)\le c\}$ is compact $\mathbb P$-a.s.

Furthermore, a decision $x\in S(\xi)$ is optimal if and only if the equality
$$Q_t\big(\xi_{[t]},x_{t-1}(\xi_{[t-1]})\big) = \langle b_t(\xi_t),x_t(\xi_{[t]})\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big] \tag{2.5}$$
holds for $P_{[t]}$-almost every $\xi_{[t]}\in\Xi_{[t]}$ and $t=1,\dots,T$. Moreover, for every Borel measurable mapping $x_{t-1}:\Xi_{[t-1]}\to X_{t-1}$, there exists a measurable $x_t:\Xi_{[t]}\to X_t$ such that relation (2.5) holds true for $P_{[t]}$-almost every $\xi_{[t]}\in\Xi_{[t]}$. Actually, the arguments of Evstigneev (1976) allow one to show that the $P_{[t]}$-null sets on which the latter property does not hold coincide for all measurable $x_{t-1}$. Indeed, the following corollary is an immediate consequence of applying Lemma 4 of Evstigneev (1976) within the proof of his Theorem 2.

Corollary 2.2.2. For $t=1,\dots,T$, there is a Borel set $A_{[t]}\subset\Xi_{[t]}$ with $P_{[t]}[A_{[t]}]=1$ such that the following property holds. For every Borel measurable mapping $x_{t-1}:\Xi_{[t-1]}\to X_{t-1}$ there exists a measurable $x_t:\Xi_{[t]}\to X_t$ such that identity (2.5) holds true for every $\xi_{[t]}\in A_{[t]}$.

We assume that such a decision $x_t$ can be chosen to fulfill a certain growth condition:
Assumption 2.2.3. For $t=1,\dots,T$, there is a Borel set $A_{[t]}\subset\Xi_{[t]}$ with $P_{[t]}[A_{[t]}]=1$ and a constant $L\ge1$ such that the following property holds. For every Borel measurable mapping $x_{t-1}:\Xi_{[t-1]}\to X_{t-1}$ there exists a measurable $x_t:\Xi_{[t]}\to X_t$ such that identity (2.5) and the growth condition
$$\|x_t(\xi_{[t]})\| \le L\cdot\max\{1,\|x_{t-1}(\xi_{[t-1]})\|\}\cdot\max\{1,\|\xi_{[t]}\|\} \tag{2.6}$$
hold true for every $\xi_{[t]}\in A_{[t]}$.

Remark 2.2.4. Unfortunately, the existence of decisions which are bounded in the above sense may be hard to verify in general. However, (2.6) holds true for every $x_t\in M_t(x_{t-1},\xi_t)$ if $X_t$ is bounded, or, more generally, whenever the projection of $X_t$ onto the kernel of the recourse matrix $A_{t,0}$ is bounded. Furthermore, the linear growth condition (2.6) could be relaxed to polynomial growth, and then the growth rate in $\xi_{[t]}$ of the Lipschitz constant in Theorem 1 and the subsequent results would change accordingly.

Assumptions 2.2.1 and 2.2.3 imply the existence of an optimal decision $x$ satisfying
$$\|x_t(\xi_{[t]})\| \le L^t\cdot\max\{1,\|\xi_{[t]}\|\}^{t-1}\quad\mathbb P\text{-a.s. for } t=1,\dots,T. \tag{2.7}$$
Indeed, a tuple $x=(x_1,\dots,x_T)$ of mappings with (2.2) and (2.5)–(2.7) can be constructed by recursion, and from Theorem 14.37 of Rockafellar and Wets (1998) it follows that every $x_t$ can be chosen to be measurable. Consequently, $x$ is an optimal decision. Decisions fulfilling (2.6) and (2.7) will be called bounded in the following.

To establish a quantitative stability result, we will study the continuity of $Q_t$. Here, regularity properties of the mapping $x_{t-1}\mapsto Q_t(\xi_{[t]},x_{t-1})$ are well known. We refer to Birge and Louveaux (1997) and Ruszczyński and Shapiro (2003b), who derived convexity as well as piecewise linearity for the case of finite $\Xi_{[T]}$, and to Kuhn (2005), who proved continuity under compactness assumptions on $\Xi_{[T]}$ and $X_1,\dots,X_T$. Thus, the following proposition can be seen as an adaptation of these results to our Lipschitz continuous framework.

Proposition 2.2.5. The recourse function $Q_t$ is Lipschitz continuous with respect to the decision $x_{t-1}$ in the following sense. For $t=1,\dots,T$, there exist a constant $\bar M>0$ and a Borel set $A_{[t]}\subset\Xi_{[t]}$ with $P_{[t]}[A_{[t]}]=1$ such that for every $\xi_{[t]}\in A_{[t]}$ the relation
$$\big|Q_t(\xi_{[t]},x_{t-1}) - Q_t(\xi_{[t]},\hat x_{t-1})\big| \le [Q_t]^x_{\mathrm{Lip}}(\xi_{[t]})\cdot\|x_{t-1}-\hat x_{t-1}\| \tag{2.8}$$
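holds true for every $x_{t-1},\hat x_{t-1}\in X_{t-1}$ with a (random) Lipschitz constant $[Q_t]^x_{\mathrm{Lip}}(\xi_{[t]})$ satisfying
$$[Q_t]^x_{\mathrm{Lip}}(\xi_{[t]}) \le \bar M\cdot\mathbb E\big[\max\{1,\|\boldsymbol\xi_{[T]}\|\}^{2+T-t}\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]. \tag{2.9}$$

Proof. The assertion is true for $Q_{T+1}\equiv0$. Assume it is true also for $s=t+1,\dots,T$ with Lipschitz constants $[Q_s]^x_{\mathrm{Lip}}$ and that the difference $Q_t(\xi_{[t]},x_{t-1})-Q_t(\xi_{[t]},\hat x_{t-1})$ on the left side of (2.8) is negative. Then, due to (2.5), there exists an $x^*_t(\xi_{[t]})\in M_t(x_{t-1},\xi_t)$ such that the left side of (2.8) coincides for $P_{[t]}$-a.e. $\xi_{[t]}$ with
$$-\langle b_t(\xi_t),x^*_t(\xi_{[t]})\rangle - \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big] + \inf_{\hat x_t\in M_t(\hat x_{t-1},\xi_t)}\Big\{\langle b_t(\xi_t),\hat x_t\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},\hat x_t)\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\Big\}. \tag{2.10}$$
Moreover, it follows from Corollary 2.2.2 that we may assume that the $P_{[t]}(d\xi_{[t]})$-null sets on which this identity does not hold coincide for all $x_{t-1}\in X_{t-1}$. Due to Theorem 14.37 of Rockafellar and Wets (1998), we can choose a measurable $\hat x^*_t$ with
$$\hat x^*_t(\xi_{[t]}) \in \operatorname{argmin}_{z\in M_t(\hat x_{t-1},\xi_t)}\|z-x^*_t(\xi_{[t]})\|$$
to estimate (2.10) from above by
$$\langle b_t(\xi_t),\hat x^*_t(\xi_{[t]})-x^*_t(\xi_{[t]})\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},\hat x^*_t(\xi_{[t]}))-Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big].$$
From the linear growth of $b_t$ and the Lipschitz continuity of $Q_{t+1}$ with respect to $x_t$, one concludes that this term is not greater than
$$\Big(B\max\{1,\|\xi_t\|\} + \mathbb E\big[[Q_{t+1}]^x_{\mathrm{Lip}}(\boldsymbol\xi_{[t+1]})\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\Big)\cdot\|x^*_t(\xi_{[t]})-\hat x^*_t(\xi_{[t]})\|,$$
again $P_{[t]}(d\xi_{[t]})$-a.s. for every $x_{t-1},\hat x_{t-1}\in X_{t-1}$. By definition of $\hat x^*_t$ and the Lipschitz continuity of $M_t$, the latter term is bounded from above by
$$\Big(MB\max\{1,\|\xi_t\|^2\} + M\max\{1,\|\xi_t\|\}\cdot\mathbb E\big[[Q_{t+1}]^x_{\mathrm{Lip}}(\boldsymbol\xi_{[t+1]})\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\Big)\cdot\|x_{t-1}-\hat x_{t-1}\|.$$
An analogous estimate holds whenever the difference on the left side of (2.8) is positive. Hence, $[Q_t]^x_{\mathrm{Lip}}(\xi_{[t]})$ is given by the term in parentheses, from which we conclude by recursion that we can put
$$[Q_t]^x_{\mathrm{Lip}}(\xi_{[t]}) := B\sum_{i=t}^T M^{i-t+1}\,\mathbb E\Big[\max\{1,\|\boldsymbol\xi_i\|^2\}\cdot\prod_{k=t}^{i-1}\max\{1,\|\boldsymbol\xi_k\|\}\,\Big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\Big].$$
Thus, the asserted bound for $[Q_t]^x_{\mathrm{Lip}}$ results from a straightforward estimate.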
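On a finite scenario tree, the recursion from the proof of Proposition 2.2.5, $[Q_t]^x_{\mathrm{Lip}} = MB\max\{1,\|\xi_t\|^2\} + M\max\{1,\|\xi_t\|\}\,\mathbb E[[Q_{t+1}]^x_{\mathrm{Lip}}\mid\cdot\,]$ with $[Q_{T+1}]^x_{\mathrm{Lip}}=0$, can be evaluated node by node and compared with the unrolled sum. The following Python sketch is illustrative only. It assumes scalar observations ($s=1$), a hypothetical three-stage tree whose conditional probabilities sum to one at every node, and arbitrary sample values for $M$ and $B$. The `Node` class is not part of the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    xi: float  # realization xi_t at this node (scalar, s = 1)
    children: List[Tuple[float, "Node"]] = field(default_factory=list)  # (cond. prob., child)


def lip_recursive(node: Node, M: float, B: float) -> float:
    """[Q_t]_Lip = M*B*max{1,|xi_t|^2} + M*max{1,|xi_t|} * E[[Q_{t+1}]_Lip | node]."""
    cont = sum(p * lip_recursive(child, M, B) for p, child in node.children)  # 0 at leaves
    return M * B * max(1.0, node.xi ** 2) + M * max(1.0, abs(node.xi)) * cont


def paths(node: Node) -> Iterator[Tuple[float, List[float]]]:
    """All (probability, [xi_t, ..., xi_T]) paths from node to the leaves."""
    if not node.children:
        yield 1.0, [node.xi]
        return
    for p, child in node.children:
        for q, tail in paths(child):
            yield p * q, [node.xi] + tail


def lip_unrolled(node: Node, M: float, B: float) -> float:
    """B * sum_i M^(i-t+1) E[max{1,|xi_i|^2} * prod_{k=t}^{i-1} max{1,|xi_k|} | node]."""
    total = 0.0
    for prob, path in paths(node):
        prod = 1.0  # product over k = t, ..., i-1 (empty product for i = t)
        for j, xi in enumerate(path):  # j = i - t
            total += prob * B * M ** (j + 1) * max(1.0, xi ** 2) * prod
            prod *= max(1.0, abs(xi))
    return total


if __name__ == "__main__":
    # Hypothetical three-stage tree with a deterministic root (xi_1 = 1).
    tree = Node(1.0, [(0.5, Node(2.0, [(0.3, Node(-3.0)), (0.7, Node(0.5))])),
                      (0.5, Node(-0.5, [(1.0, Node(1.5))]))])
    print(lip_recursive(tree, M=1.5, B=2.0), lip_unrolled(tree, M=1.5, B=2.0))
```

Both functions agree up to floating-point rounding. The recursive form is the cheaper one on large trees, because it visits each node once instead of once per path.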
Establishing continuity of $\xi_{[t]}\mapsto Q_t(\xi_{[t]},x_{t-1})$ is more subtle since, unlike the decision variable $x_{t-1}$, the observation $\xi_{[t]}$ impacts not only the Lipschitz continuous time-coupling constraints at time $t$, but also the expectations about future realizations of $\xi$. Therefore, one can hardly expect $Q_t$ to be Lipschitz continuous with respect to $\xi_{[t]}$ without having that the conditional distribution of $(\xi_s)_{s=t+1}^T$ under $\boldsymbol\xi_{[t]}=\xi_{[t]}$ depends continuously on $\xi_{[t]}$ with respect to some appropriate measure of distance. It is illustrated by Example 2.6 of Heitsch et al. (2006) that, without such a continuous dependency, stability of optimal values in terms of an $L^p$-distance does not hold in general. Thus, for establishing recursively the continuity of $Q_t$, we need that continuity of $Q_{t+1}$ with respect to $\xi_{[t+1]}$ is passed down to the mapping $\xi_{[t]}\mapsto\mathbb E[Q_{t+1}(\boldsymbol\xi_{[t+1]},x_t)\mid\boldsymbol\xi_{[t]}=\xi_{[t]}]$. To this end, we introduce for $p\ge1$ and a given Borel set $A_{[t+1]}\subset\Xi_{[t+1]}$ with $P_{[t+1]}[A_{[t+1]}]=1$ the class of functions
$$\mathcal F_p^{A_{[t+1]}}(\Xi_{[t+1]}) := \big\{f:\Xi_{[t+1]}\to\mathbb R \,:\, \text{(2.11) holds for all } \xi_{[t+1]},\hat\xi_{[t+1]}\in A_{[t+1]}\big\}$$
with the Lipschitz condition
$$|f(\xi_{[t+1]}) - f(\hat\xi_{[t+1]})| \le \max\{1,\|\xi_{[t+1]}\|,\|\hat\xi_{[t+1]}\|\}^{p-1}\,\|\xi_{[t+1]}-\hat\xi_{[t+1]}\|. \tag{2.11}$$
We consider the following distance between Borel probability measures $P'$, $Q'$ on $\Xi_{[t+1]}$:
$$\zeta_p^{A_{[t+1]}}(P',Q') := \sup_{f\in\mathcal F_p^{A_{[t+1]}}(\Xi_{[t+1]})}\Big|\int_{\Xi_{[t+1]}}f(\xi_{[t+1]})\,P'(d\xi_{[t+1]}) - \int_{\Xi_{[t+1]}}f(\xi_{[t+1]})\,Q'(d\xi_{[t+1]})\Big|.$$
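For intuition, consider the simplest special case: a single real coordinate, with support and exceptional set both equal to all of $\mathbb R$, so that $f$ is constrained on the whole line. The distance above then becomes the classical $p$-th order Fortet-Mourier distance on $\mathbb R$. That distance is commonly written in the closed form $\zeta_p(P',Q')=\int_{\mathbb R}\max\{1,|u|\}^{p-1}\,|F_{P'}(u)-F_{Q'}(u)|\,du$, where $F$ denotes distribution functions. The following Python sketch takes this representation as a given assumption (it is not derived in this chapter) and evaluates it for two finitely supported probability measures. The sketch does not apply when $f$ is constrained only on a finite support $\Xi$, which is the linear-programming situation.

```python
from typing import List, Tuple

Measure = List[Tuple[float, float]]  # list of (atom, probability weight)


def _weight_antiderivative(u: float, q: float) -> float:
    """G(u) = int_0^u max{1,|s|}^q ds; the integrand is even, so G is odd."""
    if u < 0.0:
        return -_weight_antiderivative(-u, q)
    if u <= 1.0:
        return u
    return 1.0 + (u ** (q + 1.0) - 1.0) / (q + 1.0)


def _cdf(measure: Measure, u: float) -> float:
    """Distribution function F(u) = P(X <= u) of a discrete measure."""
    return sum(w for z, w in measure if z <= u)


def fortet_mourier_1d(P: Measure, Q: Measure, p: float) -> float:
    """int_R max{1,|u|}^(p-1) |F_P(u) - F_Q(u)| du for discrete probability measures, p >= 1."""
    if p < 1.0:
        raise ValueError("p must be at least 1")
    q = p - 1.0
    atoms = sorted({z for z, _ in P} | {z for z, _ in Q})
    total = 0.0
    for a, b in zip(atoms, atoms[1:]):
        # F_P - F_Q is constant on [a, b) and vanishes outside [min atom, max atom).
        gap = abs(_cdf(P, a) - _cdf(Q, a))
        total += gap * (_weight_antiderivative(b, q) - _weight_antiderivative(a, q))
    return total


if __name__ == "__main__":
    print(fortet_mourier_1d([(0.0, 1.0)], [(1.0, 1.0)], p=1))  # 1.0
    print(fortet_mourier_1d([(0.0, 1.0)], [(2.0, 1.0)], p=2))  # 2.5
```

For $p=1$ the weight is identically one, and the value reduces to the $L^1$-distance of the distribution functions. The second call shows how the growth factor $\max\{1,|u|\}^{p-1}$ penalizes mass that is moved far from the origin.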
Recall that, with the exception that we disregard the $P_{[t+1]}$-null set $\Xi_{[t+1]}\setminus A_{[t+1]}$ within the definition of $\mathcal F_p^{A_{[t+1]}}$, the functional $\zeta_p^{A_{[t+1]}}$ corresponds to the $p$-th order Fortet-Mourier distance; see Rachev (1991) and Römisch (2003). Using this notation, the claimed continuity of the conditional distributions is specified as follows.

Assumption 2.2.6. There exist constants $W,K>0$ and $r\ge0$ such that, with
$$m_t := 1 + (T-t)(1+r)\quad\text{for } t=1,\dots,T, \tag{2.12}$$
the following conditions are fulfilled.
(i) For every $t=1,\dots,T-1$, every Borel set $A_{[t+1]}\subset\Xi_{[t+1]}$ with $P_{[t+1]}[A_{[t+1]}]=1$, and $P_{[t]}$-a.e. $\xi_{[t]},\hat\xi_{[t]}\in\Xi_{[t]}$,
$$\zeta^{A_{[t+1]}}_{m_{t+1}+1}\Big(\mathbb P\big[\boldsymbol\xi_{[t+1]}\in\cdot\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big],\ \mathbb P\big[\boldsymbol\xi_{[t+1]}\in\cdot\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\Big) \le K\max\{1,\|\xi_{[t]}\|,\|\hat\xi_{[t]}\|\}^{m_t-1}\,\|\xi_{[t]}-\hat\xi_{[t]}\|.$$
(ii) For every $t=1,\dots,T-1$ and $P_{[t]}$-a.e. $\xi_{[t]}\in\Xi_{[t]}$,
$$\mathbb E\big[\max\{1,\|\boldsymbol\xi_{[T]}\|\}^{1+T-t}\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big] \le W\cdot\max\{1,\|\xi_{[t]}\|\}^{m_t}.$$
Since the above assumption is crucial for the following continuity and stability results, it is discussed in the following remark.

Remark 2.2.7. Condition (i) is related to concepts usually associated with Markov processes, namely the coefficient of ergodicity and the Feller property; see, e.g., Dobrushin (1956) and Dynkin (1965), respectively. A similar assumption has been made by Bally et al. (2005) to ensure stability of an optimal stopping problem in a Markovian framework and by Mirkov and Pflug (2007) for their study of consistency of tree approximations. It is also made implicitly by Kuhn (2005) by focusing on autoregressive processes. The more involved formulation of Assumption 2.2.6, allowing for polynomially growing Lipschitz constants, is due to the fact that neither $\langle b_t(\xi_t),x_t\rangle$ nor $M_{t+1}$ is uniformly Lipschitz continuous in $\xi_t$ and $x_t$, unless both the support $\Xi_{[T]}$ and the sets $X_t$, $t=1,\dots,T$, are bounded. Indeed, under such a boundedness condition (i) may be significantly simplified; see Remark 2.2.10 below. In particular, (i) and (ii) hold true if $\Xi_{[T]}$ is finite. Then $\zeta_p$ is the optimal value of a linear optimization problem that can be solved numerically to determine the constants $K$ and $r$. The following lemma provides conditions under which the conditions of Assumption 2.2.6 hold true. The proof is given in the appendix.

Lemma 2.2.8. Assume the dynamics of the process $\xi$ are given by the following scheme:
$$\xi_{t+1} = g_t(\xi_{[t]},\varepsilon_{t+1}), \tag{2.13}$$
where $\varepsilon_{t+1}$ is an $\mathbb R^n$-valued random variable that is independent of $\xi_{[t]}$, and $g_t$ are measurable mappings from $\mathbb R^{s\cdot t}\times\mathbb R^n$ to $\mathbb R^s$ which satisfy the following Lipschitz and linear growth conditions:
(i) $\|g_t(\xi_{[t]},\varepsilon)-g_t(\hat\xi_{[t]},\varepsilon)\| \le \max\{1,\|\xi_{[t]}\|,\|\hat\xi_{[t]}\|\}^r\,\|\xi_{[t]}-\hat\xi_{[t]}\|\,h(\|\varepsilon\|)$,
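(ii) $\|g_t(\xi_{[t]},\varepsilon)\| \le \max\{1,\|\xi_{[t]}\|\}\,k(\|\varepsilon\|)$,
for all $\varepsilon\in\mathbb R^n$ and $\xi_{[t]},\hat\xi_{[t]}\in\mathbb R^{s\cdot t}$, some constant $r\ge1$, and Borel-measurable mappings $h,k\ge1$ such that $h(\|\varepsilon_{t+1}\|)$ and $k(\|\varepsilon_{t+1}\|)$ are in $L^p$ for every $p\in[1,+\infty)$. Then $\xi$ fulfills both conditions of Assumption 2.2.6 with the constants
$$K := \mathbb E\big[k(\|\varepsilon_{t+1}\|)^{m_1}\,h(\|\varepsilon_{t+1}\|)\big]\quad\text{and}\quad W := \mathbb E\Big[\prod_{i=t+1}^T k(\|\varepsilon_i\|)^{1+T-t}\Big].$$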
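The constants $K$ and $W$ in Lemma 2.2.8 are moments of the innovations, so they can be estimated by simulation when no closed form is at hand. The following Python sketch is purely illustrative and rests on these assumptions, none of which is taken from the text:
- scalar, i.i.d. standard normal innovations;
- the hypothetical growth functions $h(u)=k(u)=1+u$;
- $m_1$ computed from (2.12).

```python
import random
from typing import Tuple


def k(u: float) -> float:
    """Hypothetical linear-growth function (assumption, not from the text)."""
    return 1.0 + u


h = k  # hypothetical Lipschitz modulus, chosen equal to k for simplicity


def estimate_K_W(T: int, r: float, t: int, n_samples: int = 50_000,
                 seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimates of K = E[k(|e|)^m_1 h(|e|)] and
    W = E[prod_{i=t+1}^T k(|e_i|)^(1+T-t)] for i.i.d. N(0,1) innovations e."""
    rng = random.Random(seed)
    m1 = 1 + (T - 1) * (1 + r)  # m_1 from (2.12)
    sum_K = sum_W = 0.0
    for _ in range(n_samples):
        e = abs(rng.gauss(0.0, 1.0))
        sum_K += k(e) ** m1 * h(e)
        prod = 1.0
        for _ in range(t + 1, T + 1):  # independent innovations e_{t+1}, ..., e_T
            prod *= k(abs(rng.gauss(0.0, 1.0))) ** (1 + T - t)
        sum_W += prod
    return sum_K / n_samples, sum_W / n_samples


if __name__ == "__main__":
    K_hat, W_hat = estimate_K_W(T=3, r=1.0, t=1)
    print(f"K ~ {K_hat:.1f}, W ~ {W_hat:.1f}")
```

Under these assumptions the exact values follow from the moments of $|\varepsilon|$: $K=\mathbb E[(1+|\varepsilon|)^6]\approx151.0$ and $W=(\mathbb E[(1+|\varepsilon|)^3])^2\approx63.8$. The estimates should be close to these values. The required moments grow quickly with $T$ and $r$, which reflects the polynomial growth built into Assumption 2.2.6.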
The conditions of Lemma 2.2.8 are fulfilled, e.g., by a variety of time-series models. We provide the following simple example.

Example 2.2.9. Let $\xi$ be a GARCH process defined by the following difference equations: $\xi_t=(w_t,v_t,\varepsilon_t)$ with
$$v_{t+1} := \sum_{i=0}^k\big(\beta_iv_{t-i}+\gamma_i\varepsilon_{t-i}\big)\quad\text{and}\quad w_{t+1} := \sum_{i=0}^k\alpha_iw_{t-i} + v_{t+1}\cdot\varepsilon_{t+1}$$
for certain parameters $\alpha_i,\beta_i,\gamma_i\in\mathbb R$. Here, $v$ represents the stochastic volatility process of $w$, and $(\varepsilon_t)_{t\ge0}$ is a sequence of i.i.d. random variables following a standard normal distribution. It is easy to see that $\xi$ fulfills the conditions of Lemma 2.2.8 with $r=1$ and $h(\cdot)$, $k(\cdot)$ being affine functions.

The following theorem shows that Assumption 2.2.6 indeed provides Lipschitz continuity of $Q_t$ with respect to $\xi_{[t]}$. We also refer to Proposition 2.7 of Kuhn (2005), which represents a corresponding continuity result in a slightly different framework.

Theorem 1. Suppose Assumptions 2.2.1, 2.2.3, and 2.2.6 are fulfilled. For every $t=1,\dots,T$ there is a constant $C_t>0$ and a Borel set $A_{[t]}\subset\Xi_{[t]}$ with $P_{[t]}[A_{[t]}]=1$ such that
$$\frac{1}{C_t\max\{1,\|x_{t-1}\|\}}\,Q_t(\cdot,x_{t-1}) \in \mathcal F^{A_{[t]}}_{m_t+1}(\Xi_{[t]})$$
holds true for every $x_{t-1}\in X_{t-1}$.

Proof. The assertion holds true for $Q_{T+1}\equiv0$; we show that it follows recursively for $t\le T$. To this end, we proceed as in the proof of Proposition 2.2.5
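and choose a measurable $x^*_t$ with $x^*_t(\xi_{[t]})\in M_t(x_{t-1},\xi_t)$ that fulfills (2.5) and
$$\|x^*_t(\xi_{[t]})\| \le L\cdot\max\{1,\|x_{t-1}\|\}\cdot\max\{1,\|\xi_{[t]}\|\}.$$
Thus, we obtain
$$\big|Q_t(\xi_{[t]},x_{t-1})-Q_t(\hat\xi_{[t]},x_{t-1})\big| = \Big|\langle b_t(\xi_t),x^*_t(\xi_{[t]})\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big] - \inf_{\hat x_t\in M_t(x_{t-1},\hat\xi_t)}\Big\{\langle b_t(\hat\xi_t),\hat x_t\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},\hat x_t)\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\Big\}\Big|, \tag{2.14}$$
which holds, due to Assumption 2.2.3, for every $\xi_{[t]},\hat\xi_{[t]}\in A'_{[t]}$ with $P_{[t]}[A'_{[t]}]=1$ and all $x_{t-1}\in X_{t-1}$. We consider the case when the term under the absolute value is negative and choose a measurable $\hat x^*_t$ with
$$\hat x^*_t(\hat\xi_{[t]}) \in \operatorname{argmin}_{z\in M_t(x_{t-1},\hat\xi_t)}\|z-x^*_t(\xi_{[t]})\|$$
to obtain the following upper bound for the right side of equation (2.14):
$$-\langle b_t(\xi_t),x^*_t(\xi_{[t]})\rangle - \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big] + \langle b_t(\hat\xi_t),\hat x^*_t(\hat\xi_{[t]})\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},\hat x^*_t(\hat\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]. \tag{2.15}$$
Using linearity of $b_t$ and Lipschitz continuity of $M_t$, the difference of the scalar product terms can be estimated by
$$\begin{aligned}
&\big|\langle b_t(\xi_t),x^*_t(\xi_{[t]})\rangle - \langle b_t(\hat\xi_t),x^*_t(\xi_{[t]})\rangle\big| + \big|\langle b_t(\hat\xi_t),x^*_t(\xi_{[t]})\rangle - \langle b_t(\hat\xi_t),\hat x^*_t(\hat\xi_{[t]})\rangle\big|\\
&\le B\|\xi_t-\hat\xi_t\|\cdot L\max\{1,\|x_{t-1}\|\}\max\{1,\|\xi_{[t]}\|\} + B\max\{1,\|\hat\xi_t\|\}\cdot M\max\{1,\|x_{t-1}\|\}\,\|\xi_t-\hat\xi_t\|\\
&\le B(L+M)\max\{1,\|x_{t-1}\|\}\max\{1,\|\xi_{[t]}\|,\|\hat\xi_{[t]}\|\}\,\|\xi_t-\hat\xi_t\|.
\end{aligned}\tag{2.16}$$
The difference of the conditional expectations in (2.15) is bounded by
$$\begin{aligned}
&\Big|\mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big] - \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\Big|\\
&\quad+\Big|\mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big] - \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},\hat x^*_t(\hat\xi_{[t]}))\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\Big|\\
&\le C_{t+1}\max\{1,\|x^*_t(\xi_{[t]})\|\}\cdot\zeta^{A_{[t+1]}}_{m_{t+1}+1}\Big(\mathbb P\big[\boldsymbol\xi_{[t+1]}\in\cdot\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big],\ \mathbb P\big[\boldsymbol\xi_{[t+1]}\in\cdot\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\Big)\\
&\quad+\mathbb E\big[[Q_{t+1}]^x_{\mathrm{Lip}}(\boldsymbol\xi_{[t+1]})\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\,M\max\{1,\|x_{t-1}\|\}\,\|\xi_{[t]}-\hat\xi_{[t]}\|,
\end{aligned}\tag{2.17}$$
whereby the last inequality follows from the assertion for $Q_{t+1}$, Proposition 2.2.5, and the Lipschitz continuity of $M_t$. This estimate holds true for every $\xi_{[t]},\hat\xi_{[t]}\in A''_{[t]}$ and all $x_{t-1}\in X_{t-1}$, where $A''_{[t]}$ denotes the set on which the assertions of Proposition 2.2.5 hold. Applying now condition (i) of Assumption 2.2.6 and the estimate (2.9), we see that the sum (2.17) does not exceed
$$KC_{t+1}\max\{1,\|x^*_t(\xi_{[t]})\|\}\max\{1,\|\xi_{[t]}\|,\|\hat\xi_{[t]}\|\}^{m_t-1}\|\xi_{[t]}-\hat\xi_{[t]}\| + \bar M\cdot\mathbb E\big[\max\{1,\|\boldsymbol\xi_{[T]}\|\}^{1+T-t}\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\,M\max\{1,\|x_{t-1}\|\}\cdot\|\xi_{[t]}-\hat\xi_{[t]}\|$$
for every $\xi_{[t]},\hat\xi_{[t]}\in A'''_{[t]}$. Here, $A'''_{[t]}$ denotes the set of $P_{[t]}$-probability one on which Assumption 2.2.6 holds. From condition (ii) of Assumption 2.2.6 and the boundedness of $x^*_t$, we conclude that the latter sum is again bounded from above by
$$\big(KC_{t+1}L + \bar MWM\big)\max\{1,\|x_{t-1}\|\}\max\{1,\|\xi_{[t]}\|,\|\hat\xi_{[t]}\|\}^{m_t}\cdot\|\xi_{[t]}-\hat\xi_{[t]}\|. \tag{2.18}$$
The upper bounds (2.16) and (2.18) remain valid if the term under the absolute value in (2.14) is positive. Piecing all this together, the assertion for $Q_t$ follows with $A_{[t]}:=A'_{[t]}\cap A''_{[t]}\cap A'''_{[t]}$, and the Lipschitz constant $C_t$ can be chosen by collecting the constants from (2.16) and (2.18), i.e.,
$$C_t := B(M+L) + KC_{t+1}L + \bar MWM.$$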
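The constants $C_t$ of Theorem 1 are obtained by a backward recursion that starts from $C_{T+1}=0$, since $Q_{T+1}\equiv0$. The following Python sketch evaluates this recursion. The numerical values for $B$, $M$, $L$, $K$, $\bar M$, and $W$ in the usage line are arbitrary placeholders, not values from the text.

```python
from typing import List


def theorem1_constants(T: int, B: float, M: float, L: float,
                       K: float, M_bar: float, W: float) -> List[float]:
    """Return [C_1, ..., C_T] from C_t = B(M + L) + K C_{t+1} L + M_bar W M, C_{T+1} = 0."""
    C = [0.0] * (T + 2)  # indices 0..T+1; C[T + 1] = 0
    for t in range(T, 0, -1):
        C[t] = B * (M + L) + K * C[t + 1] * L + M_bar * W * M
    return C[1:T + 1]


if __name__ == "__main__":
    print(theorem1_constants(T=2, B=1.0, M=2.0, L=1.0, K=0.5, M_bar=1.0, W=1.0))
    # [7.5, 5.0]
```

Unrolling the recursion gives $C_t=(B(M+L)+\bar MWM)\sum_{j=0}^{T-t}(KL)^j$. Hence the constants grow geometrically in the remaining horizon $T-t$ whenever $KL>1$.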
In the following remark, we discuss what can be simplified whenever the sets $X_t$ and $\Xi_{[t]}$ are bounded for $t=1,\dots,T$.

Remark 2.2.10. The constant $m_t$ is chosen to be equal to the growth rate of the term $\max\{1,\|\xi_{[t]}\|,\|\hat\xi_{[t]}\|\}$ within an upper bound of (2.17). Assuming boundedness of the sets $X_t$ for $t=1,\dots,T$ allows one to estimate the term $\max\{1,\|x^*_t(\xi_{[t]})\|\}$ in the first summand of (2.17) by some constant instead of estimating it by $\max\{1,\|\xi_{[t]}\|,\|\hat\xi_{[t]}\|\}$. Consequently, one can allow the growth rate of the $\zeta$-terms in (2.17) and in Assumption 2.2.6 to increase from $m_t-1$ to $m_t$. If the set $\Xi_{[T]}$ is bounded as well, then $[Q_{t+1}]^x_{\mathrm{Lip}}(\xi_{[t+1]})$ is bounded by a constant, condition (i) of Assumption 2.2.6 may be simplified to
$$\zeta_1^{A_{[t+1]}}\Big(\mathbb P\big[\boldsymbol\xi_{[t+1]}\in\cdot\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big],\ \mathbb P\big[\boldsymbol\xi_{[t+1]}\in\cdot\,\big|\,\boldsymbol\xi_{[t]}=\hat\xi_{[t]}\big]\Big) \le K\,\|\xi_{[t]}-\hat\xi_{[t]}\|,$$
condition (ii) of Assumption 2.2.6 may be omitted, and the assertion of Theorem 1 can be written as $(1/C_t)\,Q_t(\cdot,x_{t-1})\in\mathcal F_1^{A_{[t]}}(\Xi_{[t]})$; i.e., $(\xi_{[t]},x_{t-1})\mapsto Q_t(\xi_{[t]},x_{t-1})$ is then uniformly Lipschitz continuous.
The optimality and boundedness conditions (2.5)–(2.7), as well as the continuity properties claimed in Assumption 2.2.6 and Theorem 1, hold on some Borel set $A\subset\Xi_{[T]}$ with $P_{[T]}[A]=1$. Since an approximation $\tilde\xi$ may have its support in the set $\Xi_{[T]}\setminus A$, it is reasonable to modify the considered random variables on this $P_{[T]}$-null set to appropriate versions which fulfill the claimed properties for every $\xi_{[T]}\in\Xi_{[T]}$. To this end, we recall that $P_{[T]}[A]=1$ and $\Xi_{[T]}=\operatorname{supp}P_{[T]}$ imply that $A$ is a dense subset of $\Xi_{[T]}$. For every $\hat\xi_{[T]}\in\Xi_{[T]}\setminus A$ we then consider a sequence $(\xi^n_{[T]})_{n\in\mathbb N}\subset A$ converging to $\hat\xi_{[T]}$ as $n$ goes to infinity. The recourse function and the regular conditional distributions are modified in $\hat\xi_{[T]}$ by setting $Q_t(\hat\xi_{[t]},x_{t-1}):=\lim_{n\to\infty}Q_t(\xi^n_{[t]},x_{t-1})$ and $\mathbb E[g(\boldsymbol\xi_{[t+1]})\mid\boldsymbol\xi_{[t]}=\hat\xi_{[t]}]:=\lim_{n\to\infty}\mathbb E[g(\boldsymbol\xi_{[t+1]})\mid\boldsymbol\xi_{[t]}=\xi^n_{[t]}]$ for every Lipschitz continuous mapping $g$. A bounded optimal solution $x^*$ can be appropriately modified in $\hat\xi_{[t]}$ by considering a subsequence $(\xi^{n_k}_{[T]})_{k\in\mathbb N}$ such that $x^*_t(\xi^{n_k}_{[t]})$ converges toward some $z_t(\hat\xi_{[t]})\in X_t$ for $t=1,\dots,T$. Then we put $x^*_t(\hat\xi_{[t]}):=z_t(\hat\xi_{[t]})$, and we obtain that the above stated conditions and properties indeed hold for every $\xi_{[T]}\in\Xi_{[T]}$.
2.3 Approximations
Whenever an auxiliary process $\tilde\xi$ is expected to approximate $\xi$ with regard to the optimization problem (2.4), it is indispensable that $\tilde\xi$ is nonanticipative¹ with respect to $\xi$. This is illustrated, for the sake of completeness, by the following example.

Example 2.3.1. Consider $T=3$ and the process $\xi$ that is given by $\xi_1\equiv0$ and the two independent random variables $\xi_2$ and $\xi_3$, both uniformly distributed on $[0,1]$. For $n\in\mathbb N$ and $0<\varepsilon<1$ we introduce the grids $A^{(n)}:=\{\frac in : i=1,\dots,n\}$ and the associated (right-continuous) projections $\pi_{A^{(n)}}:[0,1]\to A^{(n)}$, defined by
$$\pi_{A^{(n)}}(z) := \max\Big\{\tfrac in\in A^{(n)} : \big|z-\tfrac in\big|\le\big|z-\tfrac jn\big| \text{ for all } \tfrac jn\in A^{(n)}\Big\}.$$
Furthermore, we define processes $\xi^{(n)}$, $n\in\mathbb N$, given by $\xi^{(n)}_1:\equiv0$, $\xi^{(n)}_3:=\pi_{A^{(n)}}\xi_3$, and
$$\xi^{(n)}_2 := \begin{cases}\pi_{A^{(n)}}\xi_2 & \text{if } \xi_3\le1/2,\\ \pi_{A^{(n)}}\xi_2+\frac{\varepsilon}{n} & \text{if } \xi_3>1/2.\end{cases}$$

¹ The term nonanticipative is widely used in Stochastic Programming; it means here that $\tilde\xi$ is adapted to the filtration $(\sigma(\xi_{[t]}))_{t=1,\dots,T}$.
The sequence $\xi^{(n)}$ can be seen as an approximation of $\xi$, since $\mathbb E\|\xi-\xi^{(n)}\|\le\frac{1+2\varepsilon}{2n}$ holds. Let us now consider the following optimization problem
$$v(\xi) := \min\Big\{\mathbb E[x_2\cdot\xi_2+x_3\cdot\xi_3] : x_t\ge0,\ x_t\in\sigma(\xi_{[t]}),\ t=2,3,\ x_2+x_3=1\Big\},$$
which is solved by $x^*_2=\mathbb 1_{\{\xi_2\le1/2\}}$ and $x^*_3=1-x^*_2$ with the optimal value $v(\xi)=12/32$. When replacing $\xi$ by $\xi^{(n)}$, we use the decisions
$$x^{(n)}_2 := \mathbb 1_{\{\xi^{(n)}_2\le1/4\}} + \mathbb 1_{\{\xi^{(n)}_2\in\,]1/4,3/4[\,\setminus A^{(n)}\}}\quad\text{and}\quad x^{(n)}_3 := 1-x^{(n)}_2$$
to obtain $\limsup_{n\to\infty}v(\xi^{(n)})\le11/32$. Hence, $v(\xi^{(n)})$ does not converge to $v(\xi)$: by observing $\xi^{(n)}_2$ one knows whether $\xi_3>1/2$ or not, i.e., $\xi^{(n)}$ is not nonanticipative with respect to $\xi$. With regard to the results of Heitsch et al. (2006) and Mirkov and Pflug (2007), convergence fails since the conditional distributions $\mathbb P[\xi^{(n)}_3\in\cdot\mid\xi^{(n)}_2=z]$ do not converge toward $\mathbb P[\xi_3\in\cdot\mid\xi_2=z]$, and the filtration distance between $\xi^{(n)}$ and $\xi$ does not converge toward 0, respectively. Nonanticipativity is ensured in the following by

Definition 2.3.2. A stochastic process $\tilde\xi$ on $(\Omega,\mathcal F,\mathbb P)$ is called an approximation of $\xi$ if there exist Borel-measurable mappings $f_t:\Xi_{[t]}\to\Xi_t$ for $t=1,\dots,T$ fulfilling the following conditions: (i) $\tilde\xi_t=f_t(\xi_{[t]})$ for $t=1,\dots,T$, (ii) $f_{[T]}(\Xi_{[T]})\subset\Xi_{[T]}$, (iii) $f_1(\xi_1)=\xi_1$ for every $\xi_1\in\Xi_1$, and (iv) $f_{[T]}(\xi_{[T]})\in L^p(\Omega,\mathcal F,\mathbb P)$ for every $p\in[1,\infty)$. Here, $f_{[t]}(\xi_{[t]})$ denotes the vector $(f_i(\xi_{[i]}))_{i=1}^t\in\mathbb R^{s\cdot t}$ for $t=1,\dots,T$. In the following, we use the notation $f$ for the mapping $f_{[T]}(\cdot)$.

Remark 2.3.3. The nonanticipativity condition (i) is equivalent to $\sigma(\xi_{[t]})$-measurability of the random variable $\tilde\xi_t$. Condition (ii) ensures that $f$ maps into the set $\Xi_{[T]}$ of realizations of the initial process and thus implies that the restriction of a decision $x(\cdot)\in S(\xi)$ to the set $f(\Xi_{[T]})$ is a $\tilde\xi$-feasible decision. The integrability condition (iv) is assumed again for the sake of simplicity. For the following results, it suffices that $f_{[T]}(\xi_{[T]})\in L^p(\Omega,\mathcal F,\mathbb P)$ for a constant $p\in\mathbb R_+$ that is sufficiently large.
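Example 2.3.1 can be reproduced numerically. The following Python sketch uses Monte Carlo to estimate the expected cost of $x^*$ under $\xi$ and the expected cost of the decisions $x^{(n)}$ under $\xi^{(n)}$. The following choices are arbitrary and serve illustration only: the sample size, the seed, $n=1000$, and $\varepsilon=1/2$. The floating-point test for membership in $A^{(n)}$ is an implementation assumption.

```python
import math
import random
from typing import Tuple


def project_to_grid(z: float, n: int) -> float:
    """pi_{A(n)}(z): nearest point of A(n) = {1/n, ..., 1}, ties to the larger point."""
    i = math.floor(z * n + 0.5)
    return min(max(i, 1), n) / n


def on_grid(x: float, n: int) -> bool:
    """Membership in A(n) up to floating-point rounding."""
    return abs(x * n - round(x * n)) < 1e-9


def example_costs(n: int = 1000, eps: float = 0.5, n_samples: int = 100_000,
                  seed: int = 1) -> Tuple[float, float]:
    rng = random.Random(seed)
    cost_xi = cost_approx = 0.0
    for _ in range(n_samples):
        xi2, xi3 = rng.random(), rng.random()
        # Original problem: x2* = 1{xi2 <= 1/2}, x3* = 1 - x2*.
        cost_xi += xi2 if xi2 <= 0.5 else xi3
        # Approximation: the shift eps/n leaks the information "xi3 > 1/2" into stage 2.
        a2 = project_to_grid(xi2, n) + (eps / n if xi3 > 0.5 else 0.0)
        a3 = project_to_grid(xi3, n)
        x2 = 1.0 if a2 <= 0.25 or (0.25 < a2 < 0.75 and not on_grid(a2, n)) else 0.0
        cost_approx += x2 * a2 + (1.0 - x2) * a3
    return cost_xi / n_samples, cost_approx / n_samples


if __name__ == "__main__":
    v_xi, v_approx = example_costs()
    print(f"{v_xi:.4f} (12/32 = 0.3750), {v_approx:.4f} (11/32 = 0.34375)")
```

The decision `x2` uses only the observed value `a2`, so it is admissible for $\xi^{(n)}$. Nevertheless, its cost stays below $v(\xi)$, because the off-grid shift reveals the future.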
The following proposition relies heavily on the continuity of the recourse function stated in Theorem 1. It is shown that, although an optimal decision $x^*(\cdot)$ is not continuous in general, its expected costs can be approximated by the decision $x^*(f(\cdot))$ (which is piecewise constant whenever $\tilde\xi$ has finite support). Although $x^*(f(\cdot))$ may fail to fulfill the time-coupling constraints (2.2) with respect to $\xi$, it can be used to construct a feasible decision. This will be carried out in the next section.

Proposition 2.3.4. Consider an optimal decision $x^*$ which is bounded in the sense of (2.7) and an approximation mapping $f$ according to Definition 2.3.2. Then there exists a constant $D>0$ such that the following estimate holds:
$$\big|\mathbb E[\varphi(x^*(\xi),\xi)] - \mathbb E[\varphi(x^*(f(\xi)),\xi)]\big| \le D\,\mathbb E\big[\max\{1,\|\xi\|,\|f(\xi)\|\}^{m_1}\cdot\|\xi-f(\xi)\|\big], \tag{2.19}$$
where the constant $m_1$ is defined by (2.12).

Proof. Due to $f_1(\xi_1)=\xi_1$, we have to estimate
$$\Big|\mathbb E\Big[\sum_{t=2}^T\langle b_t(\xi_t),x^*_t(\xi_{[t]})\rangle\Big] - \mathbb E\Big[\sum_{t=2}^T\langle b_t(\xi_t),x^*_t(f_{[t]}(\xi_{[t]}))\rangle\Big]\Big|.$$
By the optimality of $x^*$, the first expectation is equal to $\mathbb E[Q_2(\xi_{[2]},x^*_1)]$, and it follows from Theorem 1 and the boundedness of $x^*_1$ (and $x^*_0:=1$) that
$$\Big|\mathbb E\big[Q_2(\xi_{[2]},x^*_1) - Q_2(f_{[2]}(\xi_{[2]}),x^*_1)\big]\Big| \le LC_2\,\mathbb E\big[\max\{1,\|\xi_{[2]}\|,\|f_{[2]}(\xi_{[2]})\|\}^{m_2}\,\|\xi_{[2]}-f_{[2]}(\xi_{[2]})\|\big]. \tag{2.20}$$
Thus, it remains to estimate the term
$$\Big|\mathbb E\Big[Q_2(f_{[2]}(\xi_{[2]}),x^*_1) - \sum_{t=2}^T\langle b_t(\xi_t),x^*_t(f_{[t]}(\xi_{[t]}))\rangle\Big]\Big|. \tag{2.21}$$
To this end, we consider the following inequality
$$\Big|\mathbb E\Big[Q_2(f_{[2]}(\xi_{[2]}),x^*_1) - \sum_{s=2}^{t-1}\langle b_s(\xi_s),x^*_s(f_{[s]}(\xi_{[s]}))\rangle - Q_t\big(f_{[t]}(\xi_{[t]}),x^*_{t-1}(f_{[t-1]}(\xi_{[t-1]}))\big)\Big]\Big| \le D_t, \tag{2.22}$$
whose left side coincides with (2.21) for $t=T+1$. It holds trivially for $t=2$ with $D_2=0$, and we assume that it is also true for some $t\in\{2,\dots,T\}$ with
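a constant $D_t\ge0$. To prove it recursively for $t+1$, we have to find an upper bound for
$$\Big|\mathbb E\Big[Q_t\big(f_{[t]}(\xi_{[t]}),x^*_{t-1}(f_{[t-1]}(\xi_{[t-1]}))\big) - \langle b_t(\xi_t),x^*_t(f_{[t]}(\xi_{[t]}))\rangle - Q_{t+1}\big(f_{[t+1]}(\xi_{[t+1]}),x^*_t(f_{[t]}(\xi_{[t]}))\big)\Big]\Big|. \tag{2.23}$$
To this end, we again use the optimality of $x^*$ to expand the first summand:
$$\mathbb E\Big[Q_t\big(f_{[t]}(\xi_{[t]}),x^*_{t-1}(f_{[t-1]}(\xi_{[t-1]}))\big)\Big] = \int_{\Xi_{[t]}}\Big(\langle b_t(f_t(\xi_{[t]})),x^*_t(f_{[t]}(\xi_{[t]}))\rangle + \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(f_{[t]}(\xi_{[t]})))\,\big|\,\boldsymbol\xi_{[t]}=f_{[t]}(\xi_{[t]})\big]\Big)\,P_{[t]}(d\xi_{[t]}). \tag{2.24}$$
Now, to estimate (2.23), we have to replace $b_t(f_t(\xi_{[t]}))$ by $b_t(\xi_t)$. The Lipschitz continuity of $b_t(\cdot)$ implies
$$\big|\langle b_t(f_t(\xi_{[t]})),x^*_t(f_{[t]}(\xi_{[t]}))\rangle - \langle b_t(\xi_t),x^*_t(f_{[t]}(\xi_{[t]}))\rangle\big| \le B\,\|x^*_t(f_{[t]}(\xi_{[t]}))\|\cdot\|\xi_{[t]}-f_{[t]}(\xi_{[t]})\|.$$
To estimate the difference of the $Q_{t+1}$-terms in (2.23) and (2.24), we add and subtract the term $\mathbb E[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(f_{[t]}(\xi_{[t]})))\mid\boldsymbol\xi_{[t]}=\xi_{[t]}]$ and use the triangle inequality to estimate
$$\begin{aligned}
&\Big|\mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(f_{[t]}(\xi_{[t]})))\,\big|\,\boldsymbol\xi_{[t]}=f_{[t]}(\xi_{[t]})\big] - \mathbb E\big[Q_{t+1}(f_{[t+1]}(\boldsymbol\xi_{[t+1]}),x^*_t(f_{[t]}(\xi_{[t]})))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\Big|\\
&\le\Big|\mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(f_{[t]}(\xi_{[t]})))\,\big|\,\boldsymbol\xi_{[t]}=f_{[t]}(\xi_{[t]})\big] - \mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(f_{[t]}(\xi_{[t]})))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\Big|\\
&\quad+\Big|\mathbb E\big[Q_{t+1}(\boldsymbol\xi_{[t+1]},x^*_t(f_{[t]}(\xi_{[t]})))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big] - \mathbb E\big[Q_{t+1}(f_{[t+1]}(\boldsymbol\xi_{[t+1]}),x^*_t(f_{[t]}(\xi_{[t]})))\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\Big|.
\end{aligned}$$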
By applying Theorem 1 and Assumption 2.2.6, we conclude that this term is bounded for $P_{[t]}$-almost every $\xi_{[t]}$ by
$$\begin{aligned}
&KC_{t+1}\max\{1,\|x^*_t(f_{[t]}(\xi_{[t]}))\|\}\max\{1,\|\xi_{[t]}\|,\|f_{[t]}(\xi_{[t]})\|\}^{m_t-1}\|\xi_{[t]}-f_{[t]}(\xi_{[t]})\|\\
&\quad+C_{t+1}\max\{1,\|x^*_t(f_{[t]}(\xi_{[t]}))\|\}\cdot\mathbb E\big[\max\{1,\|\boldsymbol\xi_{[t+1]}\|,\|f_{[t+1]}(\boldsymbol\xi_{[t+1]})\|\}^{m_{t+1}}\|\boldsymbol\xi_{[t+1]}-f_{[t+1]}(\boldsymbol\xi_{[t+1]})\|\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big]\\
&\le(K+1)C_{t+1}L^t\,\mathbb E\big[\max\{1,\|\boldsymbol\xi_{[t+1]}\|,\|f_{[t+1]}(\boldsymbol\xi_{[t+1]})\|\}^{m_t+t-1}\|\boldsymbol\xi_{[t+1]}-f_{[t+1]}(\boldsymbol\xi_{[t+1]})\|\,\big|\,\boldsymbol\xi_{[t]}=\xi_{[t]}\big],
\end{aligned}$$
where the last inequality follows from the boundedness of $x^*_t$ and the relation $m_{t+1}\le m_t-1$. Integration with respect to $P_{[t]}(d\xi_{[t]})$ and combining these estimates with (2.24) entails that (2.23) does not exceed
$$\big(B+(K+1)C_{t+1}\big)L^t\cdot\mathbb E\big[\max\{1,\|\xi_{[t+1]}\|,\|f_{[t+1]}(\xi_{[t+1]})\|\}^{m_t+t-1}\cdot\|\xi_{[t+1]}-f_{[t+1]}(\xi_{[t+1]})\|\big]. \tag{2.25}$$
Hence, (2.22) holds for $t+1$ with $D_{t+1}$ being equal to the sum of $D_t$ and (2.25). Since both $m_t+t-1$ and $m_2$ are not greater than $m_1$, the sum of (2.20) and (2.21) does not exceed $D\,\mathbb E[\max\{1,\|\xi\|,\|f(\xi)\|\}^{m_1}\cdot\|\xi-f(\xi)\|]$ with $D:=LC_2+D_{T+1}$. This completes the proof.
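The right-hand side of (2.19) contains the unknown constant $D$. The moment term $\mathbb E[\max\{1,\|\xi\|,\|f(\xi)\|\}^{m_1}\|\xi-f(\xi)\|]$, however, can be estimated for a concrete approximation mapping. The following Python sketch rests on these assumptions:
- a hypothetical Gaussian random walk with $\xi_1=0$, so that $\Xi_t=\mathbb R$ for $t\ge2$;
- the stagewise rounding $f_t(\xi_{[t]})=h\cdot\operatorname{round}(\xi_t/h)$ for $t\ge2$ and $f_1=\mathrm{id}$, which is nonanticipative and maps into the support as required in Definition 2.3.2;
- the arbitrary choice $r=0$ in $m_1$.

```python
import random


def rounding_error_term(h: float, T: int = 3, r: float = 0.0,
                        n_samples: int = 20_000, seed: int = 2) -> float:
    """Monte Carlo estimate of E[max{1,||xi||,||f(xi)||}^m1 * ||xi - f(xi)||] (max norm)."""
    rng = random.Random(seed)
    m1 = 1 + (T - 1) * (1 + r)  # (2.12)
    total = 0.0
    for _ in range(n_samples):
        xi = [0.0]  # deterministic first stage, f_1 = identity
        for _ in range(T - 1):
            xi.append(xi[-1] + rng.gauss(0.0, 1.0))  # hypothetical random walk
        f_xi = [xi[0]] + [h * round(x / h) for x in xi[1:]]  # stagewise rounding
        norm_xi = max(abs(x) for x in xi)
        norm_f = max(abs(x) for x in f_xi)
        err = max(abs(x - y) for x, y in zip(xi, f_xi))
        total += max(1.0, norm_xi, norm_f) ** m1 * err
    return total / n_samples


if __name__ == "__main__":
    for h in (0.5, 0.1, 0.02):
        print(h, rounding_error_term(h))
```

Since $\|\xi-f(\xi)\|\le h/2$ for this $f$, the term decreases roughly linearly in the grid width $h$. Under Proposition 2.3.4, this is the rate at which the cost of $x^*(f(\cdot))$ approaches the optimal cost.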
2.4 Calm Decisions
One of the main difficulties in establishing the stability of the optimal value $v(\xi)$ with respect to perturbations of the process $\xi$ is that optimal solutions do not depend continuously on the realization of $\xi$ in general. Furthermore, the gap between $v(\xi)$ and the minimal expected costs which can be realized by, e.g., Lipschitz continuous solutions may be hard to estimate. In this section, we introduce specific calm decisions and estimate the minimal expected costs realized by those decisions. We consider an optimal decision $x^*$ which is bounded in the sense of (2.7). The calm modification of $x^*$ is defined by
$$\bar x^*_1 := x^*_1,\qquad \bar x^*_t(\xi_{[t]}) \in \operatorname{argmin}_{z\in M_t(\bar x^*_{t-1}(\xi_{[t-1]}),\xi_t)}\|x^*_t(f_{[t]}(\xi_{[t]}))-z\|\quad\text{for } t=2,\dots,T,$$
Chapter 2. Stability of Multistage Stochastic Programs
where, again due to Theorem 14.37 of Rockafellar and Wets (1998), the latter mappings can be chosen to be measurable. Observe that $x_t^*$ and $\bar x_t^*$ coincide on the set $f_{[t]}(\Xi_{[t]})$, i.e.,
$$\bar x_t^*(f_{[t]}(\xi_{[t]})) = x_t^*(f_{[t]}(\xi_{[t]})) \quad \text{for every } \xi_{[t]} \in \Xi_{[t]}. \tag{2.26}$$
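To make the construction of $\bar x^*$ concrete, here is a minimal Python sketch for scalar decisions. Every piece of model data in it is a hypothetical assumption chosen for illustration and does not come from the text: the interval-valued feasibility mapping $\mathcal{M}_t(y, z) = [0, y + z]$, the grid rounding that stands in for $f$, and the decision rule that stands in for $x^*$. The sketch computes $\bar x_t^*(\xi_{[t]})$ stage by stage as the Euclidean projection of $x_t^*(f_{[t]}(\xi_{[t]}))$ onto $\mathcal{M}_t(\bar x_{t-1}^*(\xi_{[t-1]}), \xi_t)$.

```python
# Minimal sketch of the calm modification for scalar decisions.
# All model data below are hypothetical illustrations, not taken from the text.

def feasible_interval(prev_decision, xi_t):
    """Hypothetical feasibility mapping M_t(y, z) = [0, y + z] (requires y + z >= 0)."""
    return 0.0, prev_decision + xi_t


def f_grid(xi, step=0.5):
    """Hypothetical approximation mapping: round each component xi_t to a grid."""
    return [round(z / step) * step for z in xi]


def x_star(zeta):
    """Hypothetical nonanticipative decision rule x*_1 = 1, x*_t = x*_{t-1} + zeta_t."""
    x = [1.0]
    for z in zeta[1:]:
        x.append(x[-1] + z)
    return x


def calm_modification(xi):
    """x_bar_1 = x*_1 and, for t >= 2, x_bar_t = projection of x*_t(f_[t](xi_[t]))
    onto M_t(x_bar_{t-1}(xi_[t-1]), xi_t)."""
    target = x_star(f_grid(xi))  # target[t] = x*_t(f_[t](xi_[t])) by nonanticipativity
    x_bar = [target[0]]
    for t in range(1, len(xi)):
        lo, hi = feasible_interval(x_bar[-1], xi[t])
        x_bar.append(min(max(target[t], lo), hi))  # Euclidean projection onto [lo, hi]
    return x_bar


xi = [1.0, 0.3, 0.9]             # nonnegative path, so all intervals are nonempty
print(f_grid(xi))                # [1.0, 0.5, 1.0]
print(x_star(f_grid(xi)))        # [1.0, 1.5, 2.5]
print(calm_modification(xi))     # approximately [1.0, 1.3, 2.2]
```

On grid points the projection never binds, so `calm_modification(f_grid(xi))` reproduces `x_star(f_grid(xi))`, which is the identity (2.26) for this toy model. Off the grid, the projection moves the decision by an amount driven by the deviation between `xi` and `f_grid(xi)`.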
Due to the Lipschitz continuity of $\mathcal{M}_t$, the local variability of $\bar x_t^*(\cdot)$ in $\xi_{[t]}$ can be estimated recursively, and $\bar x_t^*(\cdot)$ is indeed calm locally around $f_{[t]}(\xi_{[t]})$ for every $\xi_{[t]} \in \Xi_{[t]}$ in the following sense.

Proposition 2.4.1. For every $t = 1,\dots,T$ and every $\xi_{[T]} \in \Xi_{[T]}$ we have
$$\|\bar x_t^*(\xi_{[t]}) - \bar x_t^*(f_{[t]}(\xi_{[t]}))\| \le L(T-1)M^{T-1}\max\{1,\|\xi_{[T]}\|,\|f_{[T]}(\xi_{[T]})\|\}^{T-1}\|\xi_{[T]} - f_{[T]}(\xi_{[T]})\|, \tag{2.27}$$
with the constant $L \ge 1$ that has been introduced in Assumption 2.2.3.

Proof. For $t = 1$, the difference on the left side of (2.27) vanishes. For $t > 1$ we use the identity (2.26) and the definition of $\bar x_t^*(\xi_{[t]})$ to write
$$\|\bar x_t^*(\xi_{[t]}) - \bar x_t^*(f_{[t]}(\xi_{[t]}))\| = \inf_{z \in \mathcal{M}_t(\bar x_{t-1}^*(\xi_{[t-1]}),\, \xi_t)} \|z - \bar x_t^*(f_{[t]}(\xi_{[t]}))\|. \tag{2.28}$$
Using the inclusion $\bar x_t^*(f_{[t]}(\xi_{[t]})) \in \mathcal{M}_t(\bar x_{t-1}^*(f_{[t-1]}(\xi_{[t-1]})), f_t(\xi_{[t]}))$, we obtain that the right-hand side of (2.28) is not greater than the Pompeiu-Hausdorff distance of $\mathcal{M}_t(\bar x_{t-1}^*(\xi_{[t-1]}), \xi_t)$ and $\mathcal{M}_t(\bar x_{t-1}^*(f_{[t-1]}(\xi_{[t-1]})), f_t(\xi_{[t]}))$. We then apply the triangle inequality with respect to this metric and use the Lipschitz continuity of $\mathcal{M}_t$ to conclude that the right-hand side of (2.28) is bounded from above by
$$M\max\{1,\|\xi_{[t]}\|\}\,\|\bar x_{t-1}^*(\xi_{[t-1]}) - \bar x_{t-1}^*(f_{[t-1]}(\xi_{[t-1]}))\| + M\max\{1,\|\bar x_{t-1}^*(f_{[t-1]}(\xi_{[t-1]}))\|\}\,\|f_{[t]}(\xi_{[t]}) - \xi_{[t]}\|.$$
By (2.26), $\bar x_{t-1}^*$ coincides with $x_{t-1}^*$ at $f_{[t-1]}(\xi_{[t-1]})$, so the boundedness of $x_{t-1}^*$ implies that the latter sum does not exceed
$$M\max\{1,\|\xi_{[t]}\|\}\,\|\bar x_{t-1}^*(\xi_{[t-1]}) - \bar x_{t-1}^*(f_{[t-1]}(\xi_{[t-1]}))\| + ML\max\{1,\|f_{[t-1]}(\xi_{[t-1]})\|\}^{t-1}\|f_{[t]}(\xi_{[t]}) - \xi_{[t]}\|.$$
Recursively, we obtain that the left side of (2.27) is bounded by
$$L\sum_{i=2}^{t} M^{t+1-i}\max\{1,\|f_{[i-1]}(\xi_{[i-1]})\|\}^{i-1}\max\{1,\|\xi_{[t]}\|\}^{t-i}\|\xi_{[i]} - f_{[i]}(\xi_{[i]})\|.$$
The assertion follows by a straightforward estimate.
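The recursion and the final estimate in this proof can be spelled out as follows. Abbreviate $e_t := \|\bar x_t^*(\xi_{[t]}) - \bar x_t^*(f_{[t]}(\xi_{[t]}))\|$, $\delta_i := \|\xi_{[i]} - f_{[i]}(\xi_{[i]})\|$, $a_i := \max\{1,\|f_{[i]}(\xi_{[i]})\|\}$, and $\rho := \max\{1,\|\xi_{[T]}\|,\|f_{[T]}(\xi_{[T]})\|\}$. This worked derivation assumes $M \ge 1$ and that the norm of a history (of $\xi$ or of $f(\xi)$) does not decrease when the history is extended, as for the Euclidean norm on $\mathbb{R}^{s\cdot t}$; then $\max\{1,\|\xi_{[t]}\|\} \le \rho$, $a_{i-1} \le \rho$, and $\delta_i \le \delta_T$ for $i \le t \le T$.
$$e_1 = 0, \qquad e_t \le M\max\{1,\|\xi_{[t]}\|\}\,e_{t-1} + ML\,a_{t-1}^{\,t-1}\,\delta_t \quad (t \ge 2),$$
and induction over $t$ gives
$$e_t \le L\sum_{i=2}^{t} M^{t+1-i}\,a_{i-1}^{\,i-1}\max\{1,\|\xi_{[t]}\|\}^{t-i}\,\delta_i \le L\sum_{i=2}^{t} M^{T-1}\rho^{\,t-1}\,\delta_T \le L(T-1)M^{T-1}\rho^{\,T-1}\,\delta_T,$$
which is (2.27). The middle step uses $t+1-i \le T-1$ for $i \ge 2$, $(i-1)+(t-i) = t-1$, and $\rho \ge 1$; the last step counts the $t-1 \le T-1$ summands.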
By combining Propositions 2.3.4 and 2.4.1, one immediately concludes the following theorem, which shows that the difference of the expected costs generated by $x^*$ and $\bar x^*$ can be estimated in terms of the deviation between $\xi$ and $f(\xi)$.

Theorem 2. Suppose Assumptions 2.2.1, 2.2.3, and 2.2.6 are fulfilled. Consider an optimal decision $x^*$ which is bounded in the sense of (2.7) and its calm modification $\bar x^*$. Then there exists a constant $C > 0$ such that the following estimate holds:
$$\big|\mathbb{E}\varphi(x^*(\xi),\xi) - \mathbb{E}\varphi(\bar x^*(\xi),\xi)\big| \le C\,\mathbb{E}\big[\max\{1,\|\xi\|,\|f(\xi)\|\}^{m_1}\cdot\|\xi - f(\xi)\|\big],$$
where the constant $m_1$ is defined by (2.12).

Proof. To prove the assertion, we have to estimate the following term:
$$\Big|\mathbb{E}\Big[\sum_{t=2}^T\langle b_t(\xi_t), x_t^*(\xi_{[t]})\rangle - \sum_{t=2}^T\langle b_t(\xi_t), \bar x_t^*(\xi_{[t]})\rangle\Big]\Big|. \tag{2.29}$$
Recall that, by Proposition 2.3.4, $x^*(\xi)$ and $x^*(f(\xi))$ produce comparable costs. On the other hand, the difference between $x^*(f(\xi))$ and $\bar x^*(\xi)$ can be estimated due to the calmness of the latter decision. Thus, we add and subtract the term
$$\mathbb{E}\sum_{t=2}^T\langle b_t(\xi_t), x_t^*(f_{[t]}(\xi_{[t]}))\rangle$$
to the expectation within (2.29). Using the triangle inequality as well as Proposition 2.3.4, we conclude that (2.29) is not greater than the sum of
$$\Big|\mathbb{E}\Big[\sum_{t=2}^T\langle b_t(\xi_t), x_t^*(f_{[t]}(\xi_{[t]}))\rangle - \sum_{t=2}^T\langle b_t(\xi_t), \bar x_t^*(\xi_{[t]})\rangle\Big]\Big| \tag{2.30}$$
and the right-hand side of (2.19). It thus remains to estimate (2.30). By applying identity (2.26) as well as the calmness property of $\bar x^*$ from Proposition 2.4.1, we obtain the following upper bound:
$$\begin{aligned}
&\mathbb{E}\sum_{t=2}^T B\max\{1,\|\xi_{[t]}\|\}\,L(T-1)M^{T-1}\max\{1,\|\xi_{[T]}\|,\|f_{[T]}(\xi_{[T]})\|\}^{T-1}\|\xi_{[T]} - f_{[T]}(\xi_{[T]})\| \\
&\quad\le L(T-1)^2BM^{T-1}\,\mathbb{E}\big[\max\{1,\|\xi_{[T]}\|,\|f_{[T]}(\xi_{[T]})\|\}^{T}\|\xi_{[T]} - f_{[T]}(\xi_{[T]})\|\big].
\end{aligned}$$
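Finally, since $T \le m_1$ and the maximum is at least one, the sum of the latter term and the right-hand side of (2.19) does not exceed
$$C\,\mathbb{E}\big[\max\{1,\|\xi_{[T]}\|,\|f_{[T]}(\xi_{[T]})\|\}^{m_1}\cdot\|\xi_{[T]} - f_{[T]}(\xi_{[T]})\|\big]$$
with the constant $C := D + L(T-1)^2BM^{T-1}$.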
2.5 Stability
In order to address the question of stability, we have to consider the following issue. Although we assume the existence of bounded optimal solutions to the initial problem (2.4), the perturbed problem may be unbounded. This is illustrated by the following example.

Example 2.5.1. Consider some $\varepsilon \in (0, \tfrac14)$, $T = 2$, and the ‘stock prices’ $\xi_1 \equiv \tfrac12 + \varepsilon$ and $\xi_2$, where the latter is uniformly distributed on $[0,1]$. The optimal investment problem
$$v(\xi) = \min_{x \ge 0}\; x\cdot\xi_1 - \mathbb{E}[x\cdot\xi_2] = \min_{x \ge 0}\; x\cdot\varepsilon$$
is solved by $x = 0$. The process $\tilde\xi$, defined by $\tilde\xi_1 := \xi_1$ and
$$\tilde\xi_2 := \begin{cases} 1 & \text{if } \xi_2 \ge \tfrac12 - 2\varepsilon, \\ 0 & \text{else,} \end{cases}$$
is an approximation of $\xi$ according to Definition 2.3.2. However, we see that $\mathbb{E}[\tilde\xi_2] = \tfrac12 + 2\varepsilon$ and, consequently, $v(\tilde\xi) = \min_{x \ge 0} -x\cdot\varepsilon = -\infty$.

Heitsch et al. (2006) avoid such unfavorable cases by their Assumption (A2) of level-boundedness of the objective, locally around $\xi$. We proceed by assuming that $\tilde\xi$ fulfills Assumption 2.2.3 too; i.e., the perturbed problem $v(\tilde\xi)$ admits a bounded optimal solution. We now state the main result of this chapter.

Theorem 3. Suppose Assumptions 2.2.1, 2.2.3, and 2.2.6 are fulfilled. Let $\tilde\xi$ be an approximation of $\xi$ according to Definition 2.3.2 which fulfills Assumption 2.2.3, too, and consider the constant $m_1$ defined by (2.12). Then there exists a constant $\gamma > 0$ such that
$$|v(\xi) - v(\tilde\xi)| \le \gamma\,\mathbb{E}\big[\max\{1,\|\xi\|,\|\tilde\xi\|\}^{m_1}\cdot\|\xi - \tilde\xi\|\big]$$
holds.
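The arithmetic of Example 2.5.1 can be checked exactly with rational numbers. In the following Python sketch, $\varepsilon = 1/10$ is an arbitrary choice from $(0, \tfrac14)$; since the problem is linear in $x \ge 0$, its optimal value is $0$ when the per-unit cost $\xi_1 - \mathbb{E}[\xi_2]$ is nonnegative and $-\infty$ otherwise.

```python
from fractions import Fraction


def optimal_value(xi1, mean_xi2):
    """min_{x >= 0} x * (xi1 - E[xi2]): 0 (attained at x = 0) or -inf if unbounded."""
    slope = xi1 - mean_xi2
    return Fraction(0) if slope >= 0 else float("-inf")


eps = Fraction(1, 10)                  # arbitrary choice in (0, 1/4)
xi1 = Fraction(1, 2) + eps             # deterministic first-stage price
mean_xi2 = Fraction(1, 2)              # xi2 ~ U[0, 1]
threshold = Fraction(1, 2) - 2 * eps   # tilde xi2 = 1 iff xi2 >= threshold
mean_tilde_xi2 = 1 - threshold         # P[xi2 >= threshold] for uniform xi2

print(optimal_value(xi1, mean_xi2))        # 0
print(mean_tilde_xi2)                      # 7/10, i.e. 1/2 + 2*eps
print(optimal_value(xi1, mean_tilde_xi2))  # -inf
```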
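Proof. We denote the approximation mapping corresponding to $\tilde\xi$ by $f$ and consider a bounded optimal decision $x^* \in S(\xi)$ and the corresponding calm modification $\bar x^*$ from Section 2.4. Applying Theorem 2 yields the following inequality:
$$v(\tilde\xi) - v(\xi) = v(\tilde\xi) - \mathbb{E}\varphi(x^*(\xi),\xi) \le v(\tilde\xi) - \mathbb{E}\varphi(\bar x^*(\xi),\xi) + C\,\mathbb{E}\big[\max\{1,\|\xi\|,\|\tilde\xi\|\}^{m_1}\cdot\|\xi - \tilde\xi\|\big].$$
Since the restriction of $\bar x^*$ to $f(\Xi)$ is contained in $S(\tilde\xi)$, we can write
$$\begin{aligned}
v(\tilde\xi) - \mathbb{E}\varphi(\bar x^*(\xi),\xi) &\le \mathbb{E}\varphi(\bar x^*(\tilde\xi),\tilde\xi) - \mathbb{E}\varphi(\bar x^*(\xi),\xi) \\
&= \sum_{t=2}^T\mathbb{E}\Big[\langle b_t(\tilde\xi_t) - b_t(\xi_t), \bar x_t^*(\tilde\xi_{[t]})\rangle + \langle b_t(\xi_t), \bar x_t^*(\tilde\xi_{[t]}) - \bar x_t^*(\xi_{[t]})\rangle\Big] \\
&\le B\sum_{t=2}^T\mathbb{E}\Big[\|\tilde\xi_t - \xi_t\|\,\|\bar x_t^*(\tilde\xi_{[t]})\| + \max\{1,\|\xi_t\|\}\,\|\bar x_t^*(\tilde\xi_{[t]}) - \bar x_t^*(\xi_{[t]})\|\Big].
\end{aligned} \tag{2.31}$$
Due to the fact that $\bar x^*$ and $x^*$ coincide on the set $f(\Xi_{[T]})$, see (2.26), we obtain that $\bar x^*$ fulfills the boundedness condition (2.7) on $f(\Xi_{[T]})$. Using this boundedness as well as the calmness of $\bar x^*$, each of the $T-1$ summands in (2.31) can be estimated. Thus, (2.31) is bounded from above by
$$H\,\mathbb{E}\big[\max\{1,\|\xi\|,\|\tilde\xi\|\}^{T}\cdot\|\xi - \tilde\xi\|\big] \tag{2.32}$$
with an appropriate constant $H > 0$, and we can use the relation $T \le m_1$ to obtain
$$v(\tilde\xi) - v(\xi) \le (C+H)\,\mathbb{E}\big[\max\{1,\|\xi\|,\|\tilde\xi\|\}^{m_1}\cdot\|\xi - \tilde\xi\|\big].$$
In order to establish the reverse inequality, we consider a bounded optimal decision $\tilde x^*$ of $v(\tilde\xi)$. Following exactly the construction of Section 2.4, we obtain a decision $\bar{\tilde x}^* \in S(\xi)$ that is calm in the sense of Proposition 2.4.1 and whose restriction to $f(\Xi_{[T]})$ is optimal for $v(\tilde\xi)$. As in (2.31), it follows that
$$\begin{aligned}
v(\xi) - v(\tilde\xi) &\le \mathbb{E}\varphi(\bar{\tilde x}^*(\xi),\xi) - \mathbb{E}\varphi(\bar{\tilde x}^*(\tilde\xi),\tilde\xi) \\
&\le B\sum_{t=2}^T\mathbb{E}\Big[\max\{1,\|\xi_t\|\}\,\|\bar{\tilde x}_t^*(\tilde\xi_{[t]}) - \bar{\tilde x}_t^*(\xi_{[t]})\| + \|\tilde\xi_t - \xi_t\|\,\|\bar{\tilde x}_t^*(\tilde\xi_{[t]})\|\Big] \\
&\le H\,\mathbb{E}\big[\max\{1,\|\xi_{[T]}\|,\|\tilde\xi_{[T]}\|\}^{T}\cdot\|\xi_{[T]} - \tilde\xi_{[T]}\|\big].
\end{aligned}$$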
Applying again $T \le m_1$ and setting $\gamma := C + H$ completes the proof.

Whenever the process $\tilde\xi$ is constructed by successively clustering the conditional distributions of the process $\xi$, the discretization mapping $f$ can be chosen as the (conditional) projection mapping onto the resulting scenario tree, and the upper bound on $|v(\xi) - v(\tilde\xi)|$ from Theorem 3 reflects the discretization error. The recent approaches of Bally et al. (2005), Hochreiter and Pflug (2007), Pennanen (2009), and Heitsch and Römisch (2008) are examples of such clustering methods. Other approximation techniques, e.g., those proposed by Høyland and Wallace (2001) and Mirkov and Pflug (2007), are not based on projections of the initial process $\xi$ and yield only the distribution of the process $\tilde\xi$. Consequently, neither the joint distribution of $\xi$ and $\tilde\xi$ nor the underlying probability space(s) are necessarily specified. Thus, it seems preferable to generalize our stability result to obtain an upper bound in terms of a distance between the initial and the approximating distributions. This is done by the following corollary.

Corollary 2.5.2. Suppose Assumptions 2.2.1, 2.2.3, and 2.2.6 are fulfilled. Let $\tilde\xi$ be a stochastic process on a further probability space $(\tilde\Omega, \tilde{\mathcal F}, \tilde P)$ that also takes values in $\Xi_{[T]}$, is contained in $L^p(\tilde\Omega, \tilde{\mathcal F}, \tilde P)$ for every $p \in [1,+\infty)$, and fulfills Assumption 2.2.3, too. Assume further that $\tilde\xi_1 = \xi_1$ and consider the constant $m_1$ defined by (2.12). Then there exists a constant $\gamma > 0$ such that
$$|v(\xi) - v(\tilde\xi)| \le \gamma \inf_{f \text{ with } (2.34),\,(2.35)} \mathbb{E}\big[\max\{1,\|\xi\|,\|f(\xi)\|\}^{m_1}\cdot\|\xi - f(\xi)\|\big] \tag{2.33}$$
holds true with the conditions
$$f = (f_t)_{t=1}^T, \quad f_t: \Xi_{[t]} \to \Xi_t \text{ measurable, for } t = 1,\dots,T, \tag{2.34}$$
$$f(\xi) \overset{d}{=} \tilde\xi. \tag{2.35}$$
Proof. Without loss of generality, we can assume that a mapping $f$ with (2.34) and (2.35) exists. Otherwise, the assertion trivially holds since the infimum is then taken over an empty set and is, hence, equal to $+\infty$. Note that the optimization problem (2.1) does not depend on the choice of the underlying probability space and, hence, we can use the product space $\Omega := \Xi_{[T]} \times \Xi_{[T]}$ with the σ-algebra $\mathcal F := \mathcal B(\Xi_{[T]}) \otimes \mathcal B(\Xi_{[T]}) = \sigma(A \times B : A, B \in \mathcal B(\Xi_{[T]}))$. Thereby, $\mathcal B(\Xi_{[T]})$ denotes the Borel sets of $\Xi_{[T]}$. The probability measure $P$ on $(\Omega, \mathcal F)$ is defined by $P[A \times B] := P[\xi \in A, f(\xi) \in B]$ for $A, B \in \mathcal B(\Xi_{[T]})$. The random variables $\xi$ and $\tilde\xi$ can be defined as the projections of $\Omega$ onto the first and second coordinate, respectively. Consequently, we have $\tilde\xi = f(\xi)$, and $f$ is an approximation mapping in the sense of Definition 2.3.2. Hence, the conditions of Theorem 3 are fulfilled and it follows that
$$|v(\xi) - v(\tilde\xi)| \le \gamma\,\mathbb{E}\big[\max\{1,\|\xi\|,\|f(\xi)\|\}^{m_1}\cdot\|\xi - f(\xi)\|\big].$$
Since this estimate holds true for all $f$ with (2.34) and (2.35), the assertion of the corollary is verified.

Corollary 2.5.2 will be applied in Chapter 3 to establish the consistency of an algorithm for the generation of recombining scenario trees. Note that determining a mapping $f$ that realizes the infimum in (2.33) is related to $L^p$-minimal metrics and to mass transportation problems, see also Remark 2.3 of Heitsch et al. (2006). For results on mass transportation problems we refer to the books of Rachev (1991) and Rachev and Rüschendorf (1998), the recent thesis of Pratelli (2003), and the references therein. However, our framework differs due to the nonanticipativity condition (2.34) on $f$.
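A conditional projection mapping of the kind mentioned after Theorem 3 can be sketched as follows. The tree (every node with value $v$ has the children $v-1$ and $v+1$), the Gaussian random-walk process, and the exponent `m` standing in for $m_1$ are hypothetical choices for illustration only. The mapping is nonanticipative by construction, since $f_t$ only looks at $\xi_t$ and at the node already selected by $f_{[t-1]}$, and the Monte Carlo average estimates the expectation that appears in Theorem 3 and in (2.33) for this particular $f$.

```python
import math
import random

# Hypothetical illustration: a binary scenario tree in which every node with value v
# has the two children v - 1 and v + 1, and a Gaussian random walk xi.


def children(node_value):
    return [node_value - 1.0, node_value + 1.0]


def conditional_projection(xi):
    """Nonanticipative f: f_1 = xi_1, f_t = child of the node f_[t-1] closest to xi_t."""
    path = [xi[0]]
    for z in xi[1:]:
        path.append(min(children(path[-1]), key=lambda c: abs(c - z)))
    return path


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def error_term(xi, fxi, m):
    """max{1, ||xi||, ||f(xi)||}^m * ||xi - f(xi)|| with Euclidean norms on whole paths."""
    diff = norm([a - b for a, b in zip(xi, fxi)])
    return max(1.0, norm(xi), norm(fxi)) ** m * diff


def estimate_error(T=4, m=4, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max{1,||xi||,||f(xi)||}^m ||xi - f(xi)||]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xi = [0.0]
        for _ in range(T - 1):
            xi.append(xi[-1] + rng.gauss(0.0, 1.0))
        total += error_term(xi, conditional_projection(xi), m)
    return total / n_samples


print(conditional_projection([0.0, 0.3, -0.2]))  # [0.0, 1.0, 0.0]
print(estimate_error())
```

Finer trees (more children per node) would shrink $\|\xi - f(\xi)\|$ and hence the bound; the coarse binary tree here only serves to keep the example small.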
Chapter 3 Recombining Trees for Multistage Stochastic Programs

In order to solve multistage stochastic optimization problems by numerical methods, the underlying stochastic process is usually approximated by a process that takes only a finite number of values. Consequently, the approximating process can be represented by a scenario tree, where the nodes of the tree correspond to the possible realizations of the process and the tree structure is induced by the filtration generated by the process. Unfortunately, the number of nodes can grow exponentially as the number of time stages increases, and the corresponding optimization problem thus quickly becomes intractable. Hence, many problems of practical interest are represented by stochastic programming models that include either only a small number of time stages or only a small number of scenarios. Thereby, models with a small number of time stages¹ either take only a short time horizon into account or allow only for a very limited branching scheme of the scenario tree. Thus, such models may be too simplified to represent dynamic decision problems. In order to construct scenario trees that approximate the initial underlying stochastic process as well as possible by a small number of scenarios, certain approximation and scenario reduction techniques have been developed by Pflug (2001), Gröwe-Kuska et al. (2003), Heitsch and Römisch (2003), and Dupačová et al. (2003). Considering only a few scenarios allows one to solve problems with several thousand time stages, see, e.g., the recent case study of Eichhorn et al. (2008). However, this reduction requires some compromise with regard to the representation of the underlying stochastic process.

¹ The time points where new information is revealed to the decision maker are usually denoted as time stages.
An approach often used in practice, aiming to find acceptable decisions along a concrete observation process, is to optimize with a rolling time horizon (cf., e.g., Sethi and Sorger (1991)). Thereby, a solution is constructed by solving a sequence of subproblems on small overlapping time intervals. However, decisions made by considering only a short time horizon will be myopic and thus generally not optimal whenever the optimization problem includes time-coupling constraints. The situation can be somewhat improved by finding suitable (shadow) prices for those decision variables that affect the future costs.

A further approach to handling problems of larger dimensionality relies on recombining scenario trees. Probably the best-known example is the Binomial Model of Stock Price Behaviour due to Cox, Ross, and Rubinstein (1979), where the recombination of scenarios reduces the number of nodes of a binary scenario tree with $T$ time stages from $2^T - 1$ to $T(T+1)/2$. However, in a recombined node no information about the history of the parameter and decision processes is available. Consequently, recombining scenario trees seem at first sight inappropriate for optimization problems including time-coupling constraints.

In this chapter, we develop a decomposition approach that extends the concept of recombining trees to linear multistage stochastic programs involving time-coupling constraints. The basic idea is related to the cut sharing principle that has already been discussed by Infanger and Morton (1996). They consider a specific class of linearly autoregressive processes and show how, at a node of the scenario tree, the cutting plane approximation within the Nested Benders Decomposition has to be modified in order to be valid for other nodes as well. While the cut sharing principle improves the approximation quality and thus the convergence of the solution method, it does not reduce the number of subproblem evaluations.
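The node counts quoted above are easy to check. The following Python sketch assumes a single root at stage 1, a doubling of the number of nodes from stage to stage in the binary tree, and one additional node per stage in the recombining (binomial) lattice.

```python
def binary_tree_nodes(T):
    """Number of nodes of a binary scenario tree with T stages: 2^(t-1) nodes at stage t."""
    return sum(2 ** (t - 1) for t in range(1, T + 1))


def recombining_nodes(T):
    """Number of nodes of a recombining binomial lattice: t nodes at stage t."""
    return sum(range(1, T + 1))


for T in (5, 10, 20, 40):
    print(T, binary_tree_nodes(T), recombining_nodes(T))
# 5 31 15
# 10 1023 55
# 20 1048575 210
# 40 1099511627775 820
```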
In this chapter, the cut sharing principle is extended in several directions. On the one hand, we do not assume the underlying process to be autoregressive. Instead, we assume that its distribution is given by a scenario tree in which different nodes at the same time stage share the same subtree. This allows us to apply cut sharing directly² within a Nested Benders Decomposition, since coinciding subtrees lead to identical recourse functions. On the other hand, we introduce a dynamic aggregation scheme that allows us to handle the exponentially growing number of subproblem evaluations. Thereby, subproblems on nodes with coinciding subtrees are aggregated as long as the corresponding (best known) decisions are similar. In particular, this aggregation avoids spending much effort on approximating the recourse function in "uninteresting" regions³. The numerical experience presented in this chapter shows that this approach considerably reduces running times and makes it possible to solve problem instances with many time stages. Within the considered scenario tree framework, the aggregation of decision points and the resulting reduction of the number of recourse function evaluations correspond to a dynamic recombination of nodes within the solution algorithm.

This chapter is structured as follows. In Section 3.1 we specify the considered class of multistage stochastic problems and recall some concepts and results on the Benders decomposition, cut sharing, and recombining scenario trees. The solution algorithm as well as various convergence results and stopping criteria are detailed in Section 3.2. In Section 3.3 we develop a method for constructing recombining scenario trees and apply the stability results of Chapter 2 to verify a certain type of consistency of the construction method. A case study and some numerical results are shown in Section 3.4. Finally, in Section 3.5 we discuss an approach to numerically evaluating the quality of approximations and solutions to linear multistage stochastic programs.

² "Directly" means here that we do not have to carry out a cut transformation as done by Infanger and Morton (1996) in order to make a cut valid for other nodes.
3.1 Problem Formulation and Decomposition
Following the notation of Chapter 2, we consider an $\mathbb{R}^s$-valued discrete-time stochastic process $\xi = (\xi_1,\dots,\xi_T)$ on a probability space $(\Omega,\mathcal F,P)$, and the following linear multistage stochastic program⁴:
$$\begin{aligned}
v(\xi) := \inf\ & \mathbb{E}\,\varphi\big(x(\xi),\xi\big) \\
\text{s.t. } & x \in \mathcal{M}^m_{[1,T]}, \quad x_t(\xi_{[t]}) \in X_t, \quad t = 1,\dots,T, \\
& A_{t,0}(\xi_t)x_t(\xi_{[t]}) + A_{t,1}(\xi_t)x_{t-1}(\xi_{[t-1]}) = h_t(\xi_t), \quad t = 2,\dots,T,
\end{aligned} \tag{3.1}$$
where the objective function is again defined by
$$\varphi(x_1,\dots,x_T,\xi_{[T]}) := \sum_{t=1}^T\langle b_t(\xi_t), x_t\rangle. \tag{3.2}$$

³ During the first iterations of the Benders Decomposition, the resulting solutions are generally still far from being optimal. Thus, one is not interested in a precise approximation of the recourse function at such points.

⁴ In contrast to the problem (2.1) considered in Chapter 2, the mappings $b_t$, $h_t$, and $A_{t,1}$ are not assumed to be affine here. Furthermore, we here allow for random recourse matrices $A_{t,0}$ and do not impose complete recourse.
Thereby, the sets $X_t \subset \mathbb{R}^m$ are assumed to be closed and polyhedral, and the costs $b_t(\cdot)$ and the right-hand sides $h_t(\cdot)$ are measurable mappings from $\mathbb{R}^s$ to $\mathbb{R}^m$ and $\mathbb{R}^r$, respectively, for $t = 1,\dots,T$. Further, the recourse and technology matrices are measurable mappings from $\mathbb{R}^s$ to $\mathbb{R}^{r\times m}$. Since the purpose of this chapter is to develop a numerical solution method for problem (3.1), it is assumed in the following that $\xi$ takes only finitely many values, i.e., the support of the distribution of $\xi_{[t]}$ can be written as
$$\Xi_{[t]} := \operatorname{supp} P[\xi_{[t]} \in \cdot\,] = \{\xi^i_{[t]} : i = 1,\dots,n_t\} \subset \mathbb{R}^{s\cdot t}, \quad n_t \in \mathbb{N},$$
for $t = 1,\dots,T$. In particular, the finiteness of $\Xi_{[T]}$ allows one to represent the process $\xi$ by a scenario tree, cf., e.g., Dupačová et al. (2000). We refer to the set $\{\xi_{[t]} = \xi^i_{[t]}\}$ as node $i$ at time $t$.

In order to decompose the optimization problem (3.1), we consider certain time stages
$$0 = R_0 < R_1 < \dots < R_n < R_{n+1} = T.$$
The recourse function at time $R_j$ and state $(\xi^i_{[R_j]}, x_{R_j}) \in \Xi_{[R_j]} \times X_{R_j}$ is defined recursively by $Q_{R_{n+1}}(\cdot,\cdot) :\equiv 0$ and the Dynamic Programming Equation⁵
$$\begin{aligned}
Q_{R_j}(\xi^i_{[R_j]}, x_{R_j}) := \inf\ & \mathbb{E}\Big[\sum_{t=R_j+1}^{R_{j+1}}\langle b_t(\xi_t), x_t(\xi_{[t]})\rangle + Q_{R_{j+1}}\big(\xi_{[R_{j+1}]}, x_{R_{j+1}}(\xi_{[R_{j+1}]})\big)\,\Big|\,\xi_{[R_j]} = \xi^i_{[R_j]}\Big] \\
\text{s.t. } & x \in \mathcal{M}^m_{[R_j+1,R_{j+1}]}, \quad x_t \in X_t, \quad t = R_j+1,\dots,R_{j+1}, \\
& A_{t,0}(\xi_t)x_t(\xi_{[t]}) + A_{t,1}(\xi_t)x_{t-1}(\xi_{[t-1]}) = h_t(\xi_t), \quad t = R_j+1,\dots,R_{j+1},
\end{aligned} \tag{3.3}$$
for $j = n,\dots,1$. If the minimization problem (3.3) is infeasible for some $x_{R_j}$, we set $Q_{R_j}(\xi^i_{[R_j]}, x_{R_j}) := +\infty$. Due to the finiteness of $\Xi_{[T]}$, one can verify recursively the following result, which corresponds to Proposition 30 of Ruszczyński and Shapiro (2003a).

Proposition 3.1.1. For every $j = 1,\dots,n$ and $i = 1,\dots,n_{R_j}$, the function $x_{R_j} \mapsto Q_{R_j}(\xi^i_{[R_j]}, x_{R_j})$ is polyhedral and convex on its domain.

⁵ The definition (3.3) of $Q_{R_j}$ slightly differs from that used in Chapter 2, since here the recourse function $Q_{R_j}$ depends on $x_{R_j}$ instead of $x_{R_j-1}$. Hence, it represents the expected minimal achievable future costs after having observed $\xi_{[R_j]}$ and after having decided on $x_{R_j}$. This choice is made for notational convenience.
Problem (3.1) can be reformulated in terms of the recourse function $Q_{R_1}$ as follows:
$$\begin{aligned}
v(\xi) = Q_{R_0} := \min\ & \mathbb{E}\Big[\sum_{t=1}^{R_1}\langle b_t(\xi_t), x_t(\xi_{[t]})\rangle + Q_{R_1}\big(\xi_{[R_1]}, x_{R_1}(\xi_{[R_1]})\big)\Big] \\
\text{s.t. } & x \in \mathcal{M}^m_{[1,R_1]}, \quad x_t(\xi_{[t]}) \in X_t, \quad t = 1,\dots,R_1, \\
& A_{t,0}(\xi_t)x_t(\xi_{[t]}) + A_{t,1}(\xi_t)x_{t-1}(\xi_{[t-1]}) = h_t(\xi_t), \quad t = 2,\dots,R_1.
\end{aligned} \tag{3.4}$$
Thus, the initial problem is decomposed into the optimization problem (3.4) with time horizon $R_1$ and $n_{R_1}$ problems of determining the recourse functions $Q_{R_1}(\xi^i_{[R_1]},\cdot)$, $i = 1,\dots,n_{R_1}$. The decomposed problem can be tackled by, e.g., the Nested Benders Decomposition, which is recalled in the following section.
3.1.1 Nested Benders Decomposition
In a Nested Benders Decomposition, the unknown functions $Q_{R_{j+1}}(\xi_{[R_{j+1}]},\cdot)$ and $Q_{R_1}(\xi_{[R_1]},\cdot)$ in (3.3) and (3.4) are replaced by outer linearizations, i.e., piecewise linear underestimating functions that are based on cutting plane approximations. The resulting modifications of problems (3.3) and (3.4) are then linear programs and can be solved numerically; their solutions can be used to construct outer linearizations of $Q_{R_j}(\xi_{[R_j]},\cdot)$ and a lower bound on the optimal value $v(\xi)$, respectively. Thereby, the linearizations of the recourse functions are successively improved according to the following principle. Solving a subproblem at time $R_j$ with an approximation of $Q_{R_{j+1}}$ generates solution points $x_{R_{j+1}}$. Solving the subsequent subproblems at time $R_{j+1}$ at the points $x_{R_{j+1}}$ possibly yields, on the one hand, a subgradient of $Q_{R_{j+1}}$ that can be used to update the outer linearization at time $R_j$ and, thus, the subproblem at time $R_j$. On the other hand, the solutions of these subproblems provide new solution points that can in turn be passed as initial values to the subsequent problems. Repeatedly solving and updating the modified problems at the time stages $R_0,\dots,R_n$ improves the cutting plane approximations and lets the optimal value of the linearized problem converge to the optimal value of the initial problem (3.4). Furthermore, due to the finiteness of $\Xi_{[T]}$, every function $Q_{R_j}$ can be described by a finite number of cutting planes. Consequently, the Nested Benders Decomposition stops after a finite number of steps, either by revealing the infeasibility of problem (3.4) or by having solved it to optimality.

Being dual to the decomposition of Dantzig and Wolfe (1960), this decomposition approach was developed by Benders (1962) for deterministic two-stage linear programs. Benders' approach was extended to two-stage stochastic linear programs by Van Slyke and Wets (1969) and to multistage stochastic programs by Birge (1985). While these early approaches generate a single cut for each node by averaging over the cuts of the next-stage recourse functions, the approach described above (and studied in the following) directly inserts the cuts for the $R_{j+1}$-th time stage into (3.3) and (3.4). This corresponds to a multistage version of the multicut algorithm introduced by Birge and Louveaux (1988). A recent study of multicut approximations can be found in Ntaimo et al. (2007). For further extensions and modifications of the cutting plane method, e.g., parallel decomposition, regularization and trust regions, and estimating cuts from sample means, we refer to Gassmann (1990), Infanger (1994), Birge and Louveaux (1997), Ruszczyński (2003), and the references therein.
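To illustrate the cutting-plane principle on the smallest possible example, here is a minimal Python sketch of a single-cut (Kelley or L-shaped type) loop for a hypothetical two-stage toy problem that does not come from the text. It deliberately simplifies two steps of the scheme described above: the recourse value and a subgradient are computed in closed form instead of from the dual variables of a subproblem LP, and the one-dimensional master problem is solved by enumerating candidate points instead of calling an LP solver. As in (3.7), the master value is a lower bound, while every evaluated point yields an upper bound.

```python
from itertools import combinations

# Hypothetical toy instance (not from the text): first-stage cost C*x with x in [0, U]
# and a recourse function Q(x) = E[P_SHORT*max(d - x, 0) + H_HOLD*max(x - d, 0)]
# over finitely many demand scenarios d.
C, U = 1.0, 10.0
P_SHORT, H_HOLD = 4.0, 0.5
SCENARIOS = [(0.25, 2.0), (0.5, 5.0), (0.25, 8.0)]  # (probability, demand)


def recourse(x):
    """Return Q(x) and one subgradient of Q at x (closed form instead of LP duals)."""
    value, grad = 0.0, 0.0
    for prob, d in SCENARIOS:
        if x < d:
            value += prob * P_SHORT * (d - x)
            grad -= prob * P_SHORT
        else:
            value += prob * H_HOLD * (x - d)
            grad += prob * H_HOLD
    return value, grad


def solve_master(cuts):
    """Minimize C*x + max_k (q_k + pi_k*(x - x_k)) over [0, U].

    In one dimension the minimum of this convex piecewise linear function is attained
    at an endpoint or where two cuts intersect, so enumerating those points suffices.
    """
    def theta(x):
        return max(q + pi * (x - xk) for xk, q, pi in cuts)

    candidates = [0.0, U]
    for (x1, q1, p1), (x2, q2, p2) in combinations(cuts, 2):
        if p1 != p2:
            x = (q2 - q1 + p1 * x1 - p2 * x2) / (p1 - p2)  # where the two cuts meet
            if 0.0 <= x <= U:
                candidates.append(x)
    best = min(candidates, key=lambda x: C * x + theta(x))
    return best, C * best + theta(best)


def cutting_plane(x=0.0, tol=1e-9, max_iter=50):
    """Kelley-type loop; returns (last master solution, lower bound, upper bound)."""
    cuts, lower, upper = [], float("-inf"), float("inf")
    for _ in range(max_iter):
        q, pi = recourse(x)              # "solve the subproblem" at the current point
        upper = min(upper, C * x + q)    # every x in [0, U] is feasible here
        cuts.append((x, q, pi))          # optimality cut: theta >= q + pi*(x' - x)
        x, lower = solve_master(cuts)    # master value underestimates the optimum
        if upper - lower <= tol:
            break
    return x, lower, upper


print(cutting_plane())  # (5.0, 8.375, 8.375)
```

The multicut variant described above would keep one approximation per scenario (or per successor node) instead of the single aggregated `theta`; the bookkeeping changes, but the lower/upper bound logic stays the same.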
3.1.2 Cut Sharing
An extension of the Nested Benders Decomposition that is related to the approach studied in this chapter is the principle of cut sharing, which has been applied, e.g., by Infanger and Morton (1996) and Morton (1996). Cut sharing relies on the following observation. While the Nested Benders Decomposition decomposes the initial problem into a set of smaller, more easily solvable subproblems, the number of subproblems at time stage $R_j$ equals the number of nodes $n_{R_j}$, which may grow exponentially as $R_j$ increases. Hence, the Nested Benders Decomposition easily becomes intractable if longer time horizons are considered. However, certain stochastic processes $\xi$ are such that the recourse functions $Q_{R_j}(\xi^i_{[R_j]},\cdot)$ coincide for several values of $i$. The simplest example is the case of independent random variables $\xi_t$, $t = 1,\dots,T$. In this case, the recourse function does not depend on the realization of the random vector $\xi_{[R_j]}$. Cut sharing then refers to using a cutting plane that approximates a certain function $Q_{R_j}(\xi^i_{[R_j]},\cdot)$ to approximate all coinciding functions $Q_{R_j}(\xi^k_{[R_j]},\cdot)$ as well. To the best of our knowledge, cut sharing has mainly been used within decompositions based on Monte Carlo sampling rather than on scenario trees, see Pereira and Pinto (1991) and Infanger and Morton (1996). While cut sharing reduces the number of different recourse functions at every stage $R_j$, these functions have to be evaluated at every iteration of the decomposition at a large number (i.e., $n_{R_j}$) of decision points. Thus, in order to ensure acceptable solution times, the algorithm proposed by Pereira and Pinto (1991) evaluates the recourse function at a small number of randomly chosen points $x_{R_j}$. To the best of our understanding, the approach of Infanger and Morton (1996) does not address the exponentially growing number of control points and subproblems.
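The bookkeeping behind cut sharing amounts to storing cuts per recourse function rather than per node. A minimal Python sketch follows; the `class_of` mapping (from a node to a key identifying its recourse function) and the cut triples $(\bar x, \bar q, \bar\pi)$ are illustrative assumptions, not the data structures of the cited works. In the independent case, `class_of` can simply map every node at stage $R_j$ to the stage index.

```python
class SharedCutPool:
    """Cuts stored per recourse function instead of per node (cut sharing sketch).

    class_of maps a node identifier to a key identifying its recourse function; nodes
    with the same key share every cut. Keys and the cut format (x_bar, q_bar, pi_bar)
    are illustrative assumptions.
    """

    def __init__(self, class_of):
        self.class_of = class_of
        self.cuts = {}

    def add_cut(self, node, x_bar, q_bar, pi_bar):
        # A cut computed at one node is stored once for its whole class.
        self.cuts.setdefault(self.class_of[node], []).append((x_bar, q_bar, pi_bar))

    def lower_approx(self, node, x):
        """max_k q_k + <pi_k, x - x_k> over the shared cuts, -inf if there are none."""
        cuts = self.cuts.get(self.class_of[node], [])
        return max(
            (q + sum(p * (a - b) for p, a, b in zip(pi, x, xb)) for xb, q, pi in cuts),
            default=float("-inf"),
        )


# Stage-wise independent data: every node at a stage shares one recourse function.
pool = SharedCutPool({"n1": 2, "n2": 2, "n3": 3})
pool.add_cut("n1", [0.0], 5.0, [-2.0])
print(pool.lower_approx("n2", [1.0]))  # 3.0, reusing the cut computed at n1
print(pool.lower_approx("n3", [1.0]))  # -inf, different recourse function
```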
3.1.3 Recombining Scenario Trees
Let us now recall the principle of recombining scenario trees and introduce some notation that is used in this chapter.

By its definition (3.3), the recourse function $Q_{R_j}(\xi^i_{[R_j]},\cdot)$ is determined by the conditional distribution
$$P\big[\xi_{[R_j+1,T]} \in \cdot \,\big|\, \xi_{[R_j]} = \xi^i_{[R_j]}\big].$$
Consequently, the recourse functions $Q_{R_j}(\xi^i_{[R_j]},\cdot)$ and $Q_{R_j}(\xi^k_{[R_j]},\cdot)$ in two nodes $i$ and $k$ at time $R_j$ coincide whenever the identity
$$P\big[\xi_{[R_j+1,T]} = a \,\big|\, \xi_{[R_j]} = \xi^i_{[R_j]}\big] = P\big[\xi_{[R_j+1,T]} = a \,\big|\, \xi_{[R_j]} = \xi^k_{[R_j]}\big] \tag{3.5}$$
holds true for every $a \in \mathbb{R}^{s\cdot(T-R_j)}$, i.e., the (scenario) subtrees for the time period $[R_j+1, T]$ originating at the nodes $i$ and $k$ are equal, cf. Figure 3.1. Without time-coupling constraints (or if the decision variables $x_{R_j}(\xi^i_{[R_j]})$ and $x_{R_j}(\xi^k_{[R_j]})$ were required to be equal), it would be sufficient to consider a single subproblem for the two nodes $i$ and $k$. Consequently, it would be sufficient to associate a single subtree with the two nodes $i$ and $k$, see again Figure 3.1, and we say that the two nodes can be recombined. Note that the resulting recombining scenario tree would have far fewer nodes than the initial tree, and repeating this recombination at several time stages could prevent the number of nodes from growing exponentially with the number of time stages. This concept of recombining nodes is also used by the Binomial Model of Stock Price Behaviour by Cox, Ross, and Rubinstein (1979), where the merely linear growth of the number of nodes per stage allows one to solve option pricing problems including many time stages.

However, most stochastic programming problems of practical interest include time-coupling constraints and, in general, optimal scenario-dependent controls $x_{R_j}(\xi^i_{[R_j]})$ and $x_{R_j}(\xi^k_{[R_j]})$ will not be equal. Consequently, it is not possible to use a scenario tree with only a single subtree assigned to the nodes $i$ and $k$. Having in mind the Lipschitz continuity of $Q_{R_j}$ with respect to $x_{R_j}$, we propose to consider only one subproblem for the nodes $i$ and $k$ as long as $x_{R_j}(\xi^i_{[R_j]})$ and $x_{R_j}(\xi^k_{[R_j]})$ are close. With regard to this dynamic recombination, we will refer to scenario trees with coinciding subtrees (i.e., with property (3.5), as shown on the left side of Figure 3.1) as recombining scenario trees, although these trees are not recombining in the "classical" sense of Cox et al. (1979) shown on the right side of Figure 3.1.

Figure 3.1: Left side: Binary scenario tree with coinciding subtrees (gray). Right side: Recombining scenario tree.

The relation (3.5) divides the set of nodes $\{1,\dots,n_{R_j}\}$ at time $R_j$ into a family of equivalence classes, where all nodes in one class share the same assigned conditional distribution. These equivalence classes can be represented by a set $\Lambda_{R_j} = \{\lambda^1_{R_j},\dots,\lambda^{m_{R_j}}_{R_j}\} \subset \Xi_{[R_j]}$ of representative nodes at time $R_j$, i.e., the set $\Lambda_{R_j}$ is chosen such that for every $\xi^i_{[R_j]} \in \Xi_{[R_j]}$ there exists one and only one $\lambda^m_{R_j} \in \Lambda_{R_j}$ such that $\lambda^m_{R_j} = \xi^k_{[R_j]}$ for some $k \in \{1,\dots,n_{R_j}\}$ and (3.5) holds for $i$ and $k$. Thereby, $m_{R_j} \in \mathbb{N}$ denotes the number of different subtrees originating at time stage $R_j$, $j = 1,\dots,n$. Given the set $\Lambda_{R_j}$ of representative nodes, every node $\xi^i_{[R_j]} \in \Xi_{[R_j]}$ can be associated with a representative node by the well-defined mapping
$$\lambda_{R_j}: \Xi_{[R_j]} \to \Lambda_{R_j}, \quad \text{such that } \xi^i_{[R_j]} \text{ and } \lambda_{R_j}(\xi^i_{[R_j]}) \text{ fulfill (3.5)}.$$
We thus have $m_{R_j}$ different recourse functions associated with the $n_{R_j}$ nodes at time stage $R_j$.
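The partition into equivalence classes and the mapping $\lambda_{R_j}$ can be computed directly from a finite tree. The following Python sketch shows one possible way under assumptions of our own: each node is a hypothetical dict holding a list of (probability, value, child) triples, and two nodes are treated as satisfying (3.5) when a canonical signature of their subtrees agrees. Equal signatures imply equal conditional distributions of the future path. The converse can fail, e.g., if one tree splits a child into two identical copies, so the test is sufficient rather than necessary; exact float comparison of probabilities is also assumed.

```python
def subtree_signature(node):
    """Canonical hashable encoding of the subtree below `node`.

    A node is a dict {"children": [(prob, value, child), ...]}; children are sorted so
    that the encoding does not depend on their order.
    """
    return tuple(sorted((p, v, subtree_signature(c)) for p, v, c in node["children"]))


def representatives(nodes):
    """Map each node index to the index of the first node with the same subtree."""
    first_with_signature, lambda_map = {}, {}
    for i, node in enumerate(nodes):
        lambda_map[i] = first_with_signature.setdefault(subtree_signature(node), i)
    return lambda_map, sorted(set(lambda_map.values()))


leaf = {"children": []}
sub = {"children": [(0.5, 1.0, leaf), (0.5, -1.0, leaf)]}
a = {"children": [(0.5, 2.0, sub), (0.5, 0.0, sub)]}
b = {"children": [(0.5, 0.0, sub), (0.5, 2.0, sub)]}  # same subtree as a, other order
c = {"children": [(1.0, 2.0, sub)]}
print(representatives([a, b, c]))  # ({0: 0, 1: 0, 2: 2}, [0, 2])
```

Here $n_{R_j} = 3$ nodes share $m_{R_j} = 2$ recourse functions, and `lambda_map` plays the role of $\lambda_{R_j}$.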
3.2 An Enhanced Nested Benders Decomposition
In this section we propose an enhanced Nested Benders Decomposition for the solution of (3.1) that uses cut sharing and dynamic recombination of nodes. Section 3.2.1 recalls the details of the Nested Benders Decomposition and the construction of cutting planes. In Sections 3.2.2 and 3.2.3 it is shown how the coincidence of subproblems, the Lipschitz continuity, and the convexity of recourse functions allow one to handle the exponentially growing number of subproblem evaluations by applying a dynamic recombination scheme. The complete decomposition algorithm is shown in Section 3.2.4.
3.2.1 Cutting Plane Approximations
Let us consider the dynamic formulation of problem (3.1) given by (3.3) and (3.4). Note that, due to the recombining property (3.5), the functions $Q_{R_j}(\xi^i_{[R_j]},\cdot)$, $i = 1,\dots,n_{R_j}$, in (3.3) can be written in terms of the mappings $Q_{R_j}(\lambda^i_{R_j},\cdot)$, $i = 1,\dots,m_{R_j}$. The Nested Benders Decomposition replaces the unknown recourse functions $Q_{R_{j+1}}(\lambda^i_{R_{j+1}},\cdot)$ in (3.3) by certain piecewise linear, underestimating approximations
$$w^L_{R_{j+1}}: \Lambda_{R_{j+1}} \times X_{R_{j+1}} \to \bar{\mathbb{R}}.$$
This allows us to define, for $j = n,\dots,0$, $\bar x_{R_j} \in X_{R_j}$, and $\lambda^i_{R_j} \in \Lambda_{R_j}$, the following functions:
$$\begin{aligned}
Q^L_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) := \inf\ & \mathbb{E}\Big[\sum_{t=R_j+1}^{R_{j+1}}\langle b_t(\xi_t), x_t(\xi_{[t]})\rangle + w^L_{R_{j+1}}\big(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}), x_{R_{j+1}}(\xi_{[R_{j+1}]})\big)\,\Big|\,\xi_{[R_j]} = \lambda^i_{R_j}\Big] \\
\text{s.t. } & x \in \mathcal{M}^m_{[R_j+1,R_{j+1}]}, \quad x_t \in X_t, \quad t = R_j+1,\dots,R_{j+1}, \\
& A_{t,0}(\xi_t)x_t(\xi_{[t]}) + A_{t,1}(\xi_t)x_{t-1}(\xi_{[t-1]}) = h_t(\xi_t), \quad t = R_j+1,\dots,R_{j+1}, \\
& x_{R_j} = \bar x_{R_j},
\end{aligned} \tag{3.6}$$
where the term $w^L_{R_{j+1}}$ is omitted for $j = n$. For $j = 0$, neither a state of the process $\xi$ nor a preceding decision has to be taken into account; $Q^L_{R_0}$ is thus a constant. Note that
$$Q^L_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \le Q_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \tag{3.7}$$
for every $i = 1,\dots,m_{R_j}$, $\bar x_{R_j} \in X_{R_j}$, and $j = 1,\dots,n$. In particular, we have $Q^L_{R_0} \le Q_{R_0} = v(\xi)$. The mappings $w^L_{R_{j+1}}(\lambda^i_{R_{j+1}},\cdot)$, $i = 1,\dots,m_{R_{j+1}}$, are outer linearizations of $Q^L_{R_{j+1}}(\lambda^i_{R_{j+1}},\cdot)$ on the basis of certain optimality and feasibility cuts that are detailed below. In the following, problem (3.6) with some initial values $(\lambda^i_{R_j}, \bar x_{R_j})$ is referred to as subproblem $Q^L_{R_j}$.
Remark 3.2.1. For notational simplicity, we avoid unboundedness of the subproblems (3.3) by assuming in the following that the decision sets Xt , t = 1, . . . , T , are bounded. However, in general, unbounded sets Xt can be handled by incorporating the method6 proposed by Van Slyke and Wets (1969). Note that, in difference to the classical Nested Benders Decomposition as introduced by Birge (1985), we do not decompose the problem at every time stage, but only at the stages Rj , j = 1, . . . , n. Of course, it is possible to further decompose the subproblems (3.6). Optimality and Feasibility Cuts Let us now sketch the construction of optimality and feasibility cuts that form the mappings wRj . We mainly follow the lines of Ruszczyński (2003), who discusses the construction and the properties of cutting planes in more detail. Cuts at time Rj , j = n, . . . , 1, are constructed recursively as follows. Let us consider some point x¯Rj ∈ XRj that is feasible to problem (3.6), ¯ of the dual variables i.e., q¯ QLRj (λiRj , x¯Rj ) < +∞. Then, the vector π corresponding to the constraint xRj = x¯Rj in an optimal solution of problem (3.6) is a subgradient of the convex function QLRj (λiRj , ·) in the point x¯Rj . Thus, the affine mapping xRj → q¯ + ¯ π , xRj − x¯Rj is a lower bound on QLRj (λiRj , ·) and is denoted as an optimality cut supporting QLRj (λiRj , ·). This cut can be represented by the triple (¯ xRj+1 , q¯, π ¯ ). The set of optimality cuts in the node λiRj is denoted by Copt (λiRj ). If j = 0, no cuts are generated and the value QLR0 provides a lower bound on the optimal value v(ξ). If the subproblem at j = 0 (whose constraints result from relaxing constraints of (3.1)) is infeasible, the initial problem (3.1) is also infeasible. Whenever a point x¯Rj ∈ XRj is infeasible for (3.6) with j > 0, this point is “cut off” by the following feasibility cut. For x ∈ Mm [Rj ,Rj+1 ] we introduce the term
$$D_{R_j}(x) := \sum_{t=R_j+1}^{R_{j+1}} \Big( d_1\big(x_t(\xi_{[t]}), X_t\big) + \big\| A_{t,0}(\xi_t)\,x_t(\xi_{[t]}) + A_{t,1}(\xi_t)\,x_{t-1}(\xi_{[t-1]}) - h_t(\xi_t) \big\|_1 \Big),$$

where $d_1(y, A) := \inf_{z \in A} \|y - z\|_1$ for $y \in \mathbb{R}^r$ and $A \subseteq \mathbb{R}^r$. The term $D_{R_j}(x)$ measures the (random) constraint violation of the decision $x$ on the time interval $\{R_j + 1,\dots,R_{j+1}\}$.

⁶ Having found an unbounded subproblem, one determines a ray of unbounded decrease and solves an auxiliary problem in order to either construct additional constraints or reveal the unboundedness of the initial problem.

We further define the auxiliary function $U(\lambda^i_{R_j},\cdot)$, which measures the minimal $L^1$-distance of the point $\bar x_{R_j} \in X_{R_j}$ from the feasible set of (3.6):

$$\begin{aligned}
U(\lambda^i_{R_j}, \bar x_{R_j}) := \inf_{x,\,\hat x_{R_j}}\ & \|x_{R_j} - \hat x_{R_j}\|_1 + \mathbb{E}\big[ D_{R_j}(x) \,\big|\, \xi_{[R_j]} = \lambda^i_{R_j} \big] \\
\text{s.t.}\ & \hat x_{R_j} = \bar x_{R_j}, \quad x \in M^m_{[R_j,R_{j+1}]}, \\
& \bar u + \big\langle \bar\pi, x_{R_{j+1}}(\xi_{[R_{j+1}]}) - \bar x_{R_{j+1}} \big\rangle \le 0 \quad \forall (\bar x_{R_{j+1}}, \bar u, \bar\pi) \in C_{\mathrm{feas}}\big(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]})\big).
\end{aligned} \tag{3.8}$$

By introducing slack variables⁷, $U(\lambda^i_{R_j}, \bar x_{R_j})$ can be written as the optimal value of a linear program and can thus be determined numerically. Note that $U(\lambda^i_{R_j}, \bar x_{R_j}) > 0$ if and only if $Q^L_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) = +\infty$. In particular, the initial problem (3.1) is infeasible whenever $U(\lambda^i_{R_j}, \bar x_{R_j}) = +\infty$, i.e., if the feasibility cuts at time $R_{j+1}$ admit no feasible solution. In this case, the decomposition approach can be terminated.

Let us assume that $\bar u := U(\lambda^i_{R_j}, \bar x_{R_j}) \in (0, +\infty)$ and denote by $\bar\pi$ the vector of dual variables corresponding to the constraint $\hat x_{R_j} = \bar x_{R_j}$ in an optimal solution of (3.8). Considering the linearization $x_{R_j} \mapsto \bar u + \langle \bar\pi, x_{R_j} - \bar x_{R_j} \rangle$ of $U(\lambda^i_{R_j},\cdot)$ at $\bar x_{R_j}$, a feasibility cut is given by the constraint $\bar u + \langle \bar\pi, x_{R_j} - \bar x_{R_j} \rangle \le 0$ and can be represented by the triple $(\bar x_{R_j}, \bar u, \bar\pi)$. The set of feasibility cuts in the node $\lambda^i_{R_j}$ is denoted by $C_{\mathrm{feas}}(\lambda^i_{R_j})$. Altogether, the optimality and feasibility cuts define the mappings $w^L_{R_j}$ via the following relation:

$$w^L_{R_j}(\lambda^i_{R_j}, x_{R_j}) := \begin{cases} +\infty, & \text{if } \bar u + \langle \bar\pi, x_{R_j} - \bar x_{R_j} \rangle > 0 \text{ for some } (\bar x_{R_j}, \bar u, \bar\pi) \in C_{\mathrm{feas}}(\lambda^i_{R_j}), \\ \max\big\{ \bar q + \langle \bar\pi, x_{R_j} - \bar x_{R_j} \rangle : (\bar x_{R_j}, \bar q, \bar\pi) \in C_{\mathrm{opt}}(\lambda^i_{R_j}) \big\}, & \text{otherwise}. \end{cases} \tag{3.9}$$

Furthermore, the estimate (3.7) extends to

$$w^L_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \le Q^L_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \le Q_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \tag{3.10}$$

for every $i = 1,\dots,m_{R_j}$, $\bar x_{R_j} \in X_{R_j}$, and $j = 1,\dots,n$.
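The cut bookkeeping behind (3.9) can be illustrated with a minimal Python sketch. It is not the author's implementation: the names `optimality_cut` and `w_lower` are hypothetical, decision vectors are plain lists of floats, the default `minus_m` mimics the initialization $w^L_{R_j} \equiv -M$ of Algorithm 3.2, and the toy convex function $|x - 1|$ with an explicit subgradient stands in for $Q^L_{R_j}(\lambda^i_{R_j},\cdot)$, whose subgradient would in practice be the dual vector $\bar\pi$ of (3.6).

```python
import math
from typing import Callable, List, Sequence, Tuple

# A cut is a triple (x_bar, value, pi): value is q_bar for optimality cuts
# and u_bar for feasibility cuts (hypothetical representation).
Cut = Tuple[List[float], float, List[float]]


def affine(cut: Cut, x: Sequence[float]) -> float:
    """Evaluate value + <pi, x - x_bar> for a cut triple."""
    x_bar, value, pi = cut
    return value + sum(p * (xi - xb) for p, xi, xb in zip(pi, x, x_bar))


def optimality_cut(f: Callable[[Sequence[float]], float],
                   subgrad: Callable[[Sequence[float]], List[float]],
                   x_bar: Sequence[float]) -> Cut:
    """Supporting hyperplane of a convex f at x_bar (toy stand-in for the dual of (3.6))."""
    return (list(x_bar), f(x_bar), subgrad(x_bar))


def w_lower(x: Sequence[float], opt_cuts: List[Cut], feas_cuts: List[Cut],
            minus_m: float = -1.0e6) -> float:
    """Cutting-plane model in the spirit of (3.9)."""
    if any(affine(c, x) > 0.0 for c in feas_cuts):
        return math.inf                      # x violates some feasibility cut
    if not opt_cuts:
        return minus_m                       # initial model w^L = -M
    return max(affine(c, x) for c in opt_cuts)


# Toy data: f(x) = |x - 1| and the domain x <= 4, written as the
# feasibility cut 1 + (x - 5) <= 0.
def f(x: Sequence[float]) -> float:
    return abs(x[0] - 1.0)


def df(x: Sequence[float]) -> List[float]:
    return [1.0 if x[0] >= 1.0 else -1.0]


cuts = [optimality_cut(f, df, [-1.0]), optimality_cut(f, df, [3.0])]
feas = [([5.0], 1.0, [1.0])]
print(w_lower([0.0], cuts, feas), w_lower([5.0], cuts, feas))  # 1.0 inf
```

Every optimality cut underestimates $f$, so the model stays below $f$ wherever it is finite, which is the first inequality in (3.10).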
⁷ The auxiliary variable $\hat x_{R_j}$ in (3.8) is not a slack variable; it has been introduced in order to move the parameter $\bar x_{R_j}$ into a constraint to which a dual variable can be assigned.
Nested Benders Decomposition Algorithm

While the solution algorithm repeatedly solves the subproblems (3.6) for $i = 1,\dots,m_{R_j}$ and $j = 0,\dots,n$, the approximations $w^L_{R_j}$ are successively updated. Thereby, the subproblems at time stage $R_j$ exchange information with the subproblems at $R_{j-1}$ and $R_{j+1}$ as follows. Let us assume that the evaluation of $Q^L_{R_j}(\lambda^i_{R_j}, x_{R_j})$ for some⁸ $x_{R_j} \in X_{R_j}$ reveals that the cutting plane approximation of $Q^L_{R_j}(\lambda^i_{R_j},\cdot)$ is not exact at $x_{R_j}$, i.e., we observe

$$Q^L_{R_j}(\lambda^i_{R_j}, x_{R_j}) > w^L_{R_j}(\lambda^i_{R_j}, x_{R_j}).$$

Then, a new optimality cut (if $Q^L_{R_j}(\lambda^i_{R_j}, x_{R_j}) < +\infty$) or feasibility cut (if $Q^L_{R_j}(\lambda^i_{R_j}, x_{R_j}) = +\infty$) is generated and passed to the stage $R_{j-1}$ through the definition of $w^L_{R_j}$. On the other hand, the solution of the subproblem (3.6) generates new decision points $x_{R_{j+1}}$ that are passed down to the descendant subproblems at time $R_{j+1}$. Note that there are $m_{R_{j+1}}$ different descendant subproblems in total; all decision points $x_{R_{j+1}}$ belonging to subproblem $i$ are collected in a set denoted by $Z_{R_{j+1}}(\lambda^i_{R_{j+1}})$, $i = 1,\dots,m_{R_{j+1}}$.

As remarked by Ruszczyński (2003), there is much freedom in choosing the order in which the subproblems (3.6) are solved. In the literature, this order is sometimes referred to as the sequencing protocol. We have applied a fast-forward-fast-backward procedure, which is detailed in Algorithm 3.2: the tree is alternately traversed in a forward and a backward mode. In the forward mode, subproblems are solved and sets $Z_{R_j}(\lambda^i_{R_j})$ of decision points for the subsequent time stage are generated. The traversing direction is changed to the backward mode when the end of the tree is reached. Then, in the backward mode, further cuts are generated and used to update the ancestor problems. This and other tree traversing strategies are discussed by Wittrock (1983), Birge (1985), Gassmann (1990), and Morton (1996).

With regard to the flow of information between the different subproblems, the Nested Benders algorithm can be terminated either if all subproblems have been solved to optimality without yielding new cuts or new solution points, or if $Q^L_0 = +\infty$. In the former case, the cutting plane approximations are exact in the respective decision points at every time stage $R_j$ and, in particular, at stage $R_0$. Thus, problem (3.4) has been solved to optimality. In the latter case, problem (3.4) is infeasible. These well-known results are summarized by the following proposition, which corresponds to Theorem 18 of Ruszczyński (2003).
⁸ We here assume that $x_{R_j}$ fulfils $\bar u + \langle \bar\pi, x_{R_j} - \bar x_{R_j} \rangle \le 0$ for every $(\bar x_{R_j}, \bar u, \bar\pi) \in C_{\mathrm{feas}}(\lambda^i_{R_j})$.
Proposition 3.2.2. The Nested Benders Decomposition Algorithm stops after a finite number of steps either by revealing the infeasibility of (3.4) or with a solution that is optimal for (3.4).

The finiteness of the algorithm follows from the finiteness of $\Xi_{[T]}$, which entails that the functions $U(\lambda^i_{R_j},\cdot)$ and $Q_{R_j}(\lambda^i_{R_j},\cdot)$ are polyhedral and can hence be represented by a finite number of cutting planes.
3.2.2
Dynamic Recombining of Scenarios
Applying cut sharing within the Nested Benders Decomposition allows one to reduce the number of different subproblems at time $R_j$ from $n_{R_j}$ to $m_{R_j}$ and thus makes it possible to improve the particular cutting plane approximations. However, let us consider the sets $Z_{R_j}(\lambda^i_{R_j})$ containing the decision points $x_{R_j}$ at which the stage-$R_j$ subproblems have to be evaluated. Solving the subproblem $Q^L_{R_j}(\lambda^i_{R_j}, x_{R_j})$ for some $x_{R_j} \in Z_{R_j}(\lambda^i_{R_j})$ generates a number of points $x_{R_{j+1}}$ that are added to the sets $Z_{R_{j+1}}$. The number of generated points corresponds to the number of nodes at time $R_{j+1}$ of the $i$-th subtree originating at time $R_j$. For every time stage $R_j$, we thus have to solve

$$\sum_{i=1}^{m_{R_j}} \big|Z_{R_j}(\lambda^i_{R_j})\big| = n_{R_j}$$

subproblems, where the node number $n_{R_j}$ can grow exponentially as $R_j$ increases. In the literature, this fact is referred to as the curse of dimensionality. On the other hand, many of the points $x_{R_j} \in Z_{R_j}(\lambda^i_{R_j})$ may be close to each other, and the approximating recourse function $Q^L_{R_j}(\lambda^i_{R_j},\cdot)$ is Lipschitz continuous on its domain. Therefore, with regard to the algorithm's efficiency, it seems reasonable to aggregate each of the sets $Z_{R_j}(\lambda^i_{R_j})$ to a subset $\hat Z_{R_j}(\lambda^i_{R_j})$ of representative points. To this end, we want to construct sets $\hat Z_{R_j}(\lambda^i_{R_j})$ of minimal cardinality such that

$$Z_{R_j}(\lambda^i_{R_j}) \subset \bigcup_{\hat x_{R_j} \in \hat Z_{R_j}(\lambda^i_{R_j})} B^\infty_\rho(\hat x_{R_j}) \tag{3.11}$$

with

$$B^\infty_\rho(\hat x_{R_j}) := \big\{ x_{R_j} \in \mathbb{R}^m : \|x_{R_j} - \hat x_{R_j}\|_\infty \le \rho \big\}$$

for some aggregation parameter $\rho \ge 0$. The value of $\rho$ determines the extent of the aggregation; for $\rho = 0$ the sets $Z_{R_j}(\lambda^i_{R_j})$ remain unchanged, whereas setting $\rho = +\infty$ collapses every $Z_{R_j}(\lambda^i_{R_j})$ to a singleton.
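Algorithm 3.1. Aggregating decision points.
0. Set $I := Z_{R_j}(\lambda^i_{R_j})$ and $\hat Z_{R_j}(\lambda^i_{R_j}) := \emptyset$.
   FOR ALL $x \in Z_{R_j}(\lambda^i_{R_j})$: Set $N_\rho(x) := \{\bar x \in Z_{R_j}(\lambda^i_{R_j}) : \|x - \bar x\|_\infty \le \rho\}$.
1.
   1.1. Choose $\hat x \in \operatorname{argmax}_{x \in I} |N_\rho(x)|$.
   1.2. Update $\hat Z_{R_j}(\lambda^i_{R_j}) = \hat Z_{R_j}(\lambda^i_{R_j}) \cup \{\hat x\}$.
   1.3. Update $I = I \setminus N_\rho(\hat x)$. Update $N_\rho(x) = N_\rho(x) \setminus N_\rho(\hat x)$ for $x \in I$.
2. IF $I \neq \emptyset$: Go back to Step 1.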
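A compact Python sketch of Algorithm 3.1 follows, assuming decision points are given as tuples of floats. The function names `aggregate` and `covers` are hypothetical; ties in Step 1.1 are broken by the lowest index (the algorithm leaves this open), and the neighbourhoods are built by brute force in $O(|Z|^2)$ time. The helper `covers` checks the covering property (3.11).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def inf_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Maximum-norm distance ||x - y||_inf."""
    return max(abs(a - b) for a, b in zip(x, y))


def aggregate(points: List[Point], rho: float) -> List[Point]:
    """Greedy selection of representative points as in Algorithm 3.1."""
    remaining = set(range(len(points)))                        # index set I
    nbhd = {i: {k for k in remaining if inf_dist(points[i], points[k]) <= rho}
            for i in remaining}                                 # N_rho(x), Step 0
    reps: List[Point] = []                                      # Z_hat
    while remaining:                                            # Step 2
        best = max(sorted(remaining), key=lambda i: len(nbhd[i]))  # Step 1.1
        reps.append(points[best])                               # Step 1.2
        covered = nbhd[best]
        remaining -= covered                                    # Step 1.3
        for i in remaining:
            nbhd[i] -= covered
    return reps


def covers(points: List[Point], reps: List[Point], rho: float) -> bool:
    """Covering condition (3.11): every point lies in some rho-ball."""
    return all(any(inf_dist(p, r) <= rho for r in reps) for p in points)


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.05, 1.0)]
reps = aggregate(pts, 0.2)
print(reps, covers(pts, reps, 0.2))  # [(0.0, 0.0), (1.0, 1.0)] True
```

With `rho = 0.0` every distinct point becomes its own representative, and with `rho = float("inf")` a single representative remains, matching the two extreme cases discussed above.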
In order to determine minimal subsets of representative points, let us consider a graph $G$ with the vertices $x_{R_j} \in Z_{R_j}(\lambda^i_{R_j})$. We say that an edge exists between two vertices $x_{R_j}, \bar x_{R_j} \in Z_{R_j}(\lambda^i_{R_j})$ if and only if

$$\|x_{R_j} - \bar x_{R_j}\|_\infty \le \rho. \tag{3.12}$$
For given $\rho \in (0, +\infty)$, an optimal aggregation of $Z_{R_j}(\lambda^i_{R_j})$ would be given by a minimum dominating set in $G$, i.e., a subset of vertices $\hat Z_{R_j}(\lambda^i_{R_j})$ of minimal cardinality such that $Z_{R_j}(\lambda^i_{R_j})$ coincides with the union of $\hat Z_{R_j}(\lambda^i_{R_j})$ and its neighbourhood⁹ in $G$. Since the minimum dominating set problem is known to be NP-complete, cf. Garey and Johnson (1979), we use a heuristic that is detailed in Algorithm 3.1. This heuristic successively selects representative points by searching for a vertex with a maximum neighbourhood. After removing this vertex and its neighbourhood from $G$, the algorithm proceeds with the remaining vertices until no points are left in $G$.

Starting the Nested Benders Decomposition Algorithm with a large aggregation parameter $\rho_{\mathrm{start}}$, one obtains a rough approximation of the value functions $Q^L_{R_j}(\lambda^i_{R_j},\cdot)$. Numerical experience shows that this preprocessing may lead to a significant speed-up, cf. Table 3.4 in Section 3.4.2. This is due to the fact that the rough approximation produces solution points that are already close to an optimal solution of the problem, and, hence, the generation of too many "useless" cuts during the first iterations of the algorithm can be avoided. After having roughly solved the problem, the approximation can be improved by decreasing $\rho$ to the final value $\rho_{\mathrm{end}}$. In applications, the coefficients $x_i$, $i = 1,\dots,m$, of a decision vector $x$ may be of different scales. It is then reasonable to consider the conditions

$$|x_i - \hat x_i| \le \rho_i, \quad i = 1,\dots,m,$$

with several parameters $\rho_i \ge 0$ instead of condition (3.12).

⁹ The neighbourhood of $\hat Z_{R_j}(\lambda^i_{R_j})$ in $G$ is given by all those vertices in $G$ that are linked by an edge with some element of $\hat Z_{R_j}(\lambda^i_{R_j})$.

The main loop of our extension of the Nested Benders Decomposition is detailed in Algorithm 3.2, where the aggregation of decision points during the decomposition is done by Algorithm 3.1, and the solution of the particular subproblems and the generation of cuts are done by Algorithm 3.3. Note that the pseudo-code command "RETURN" in Algorithm 3.3 goes back to the main loop of the decomposition, while "TERMINATE" quits the decomposition algorithm.

Algorithm 3.2. Fast Forward Fast Backward - Sequencing Protocol.
0. Set $\rho = \rho_{\mathrm{start}}$ and $w^L_{R_j}(\lambda^i_{R_j},\cdot) :\equiv -M$ with some (large) constant $M \ge 0$ for every $i = 1,\dots,m_{R_j}$ and $j = 1,\dots,n$.
1. forward mode:
   1.1. Solve the subproblem at stage $R_0$ with Algorithm 3.3. IF it is infeasible: TERMINATE.
   1.2. FOR $j = 1,\dots,n$:
   1.3. Solve the subproblems at stage $R_j$ with Algorithm 3.3.
   1.4. IF a feasibility cut has been generated: Set $j = j - 1$ and continue with step 1.2.
2. backward mode:
   2.1. FOR $j = n,\dots,1$:
   2.2. IF new cuts have been generated for one of the descendant subproblems at $R_{j+1}$: Solve the subproblems at stage $R_j$ with Algorithm 3.3.
3. IF a new cut has been generated for some subproblem: Go back to step 1.
4. ELSE IF $\rho > \rho_{\mathrm{end}}$: Decrease $\rho$ and go back to step 1.
5. ELSE IF $\rho \le \rho_{\mathrm{end}}$: TERMINATE.

The following result shows that the convergence of the decomposition algorithm also holds under aggregation of decision points.

Proposition 3.2.3. We assume that relatively complete recourse holds. We denote by $v$ the optimal value of (3.4) and by $v^L$ the optimal value of the approximating problem $Q^L_0$. Then the Nested Benders Decomposition with aggregation of decision points, i.e., Algorithm 3.2 with the (sub-)algorithms 3.1 and 3.3, stops after a finite number of steps with a solution of problem $Q^L_0$ that fulfills

$$|v - v^L| \le C\rho \tag{3.13}$$

for some constant $C \ge 0$.
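The control flow of Algorithm 3.2 can be sketched in Python as follows. This is only an illustration under stated assumptions: solving the subproblems of one stage (Algorithm 3.3) is abstracted into a hypothetical oracle `solve_stage(j, mode, rho)` that reports whether new cuts were generated, whether one of them was a feasibility cut, and whether infeasibility was detected; the instruction "decrease $\rho$" is modelled by a user-supplied decreasing schedule from $\rho_{\mathrm{start}}$ to $\rho_{\mathrm{end}}$; and step 1.4 is read as re-solving the ancestor stage that received the feasibility cut. The toy oracle `budget_oracle` is likewise made up for demonstration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical oracle: (j, mode, rho) -> (new_cut, feasibility_cut, infeasible)
StageOracle = Callable[[int, str, float], Tuple[bool, bool, bool]]


def ffb_protocol(n: int, solve_stage: StageOracle,
                 rho_schedule: Sequence[float]) -> Tuple[str, float]:
    """Fast-forward-fast-backward sequencing in the spirit of Algorithm 3.2.

    rho_schedule is a non-empty decreasing sequence rho_start, ..., rho_end."""
    rho = rho_schedule[0]
    for rho in rho_schedule:                  # steps 0 and 4
        while True:
            cut_at = [False] * (n + 2)        # cut_at[j]: new cut for a stage-j subproblem
            j = 0                             # 1. forward mode (stage 0 is step 1.1)
            while j <= n:
                new_cut, feas_cut, infeasible = solve_stage(j, "forward", rho)
                if infeasible:
                    return "infeasible", rho
                cut_at[j] = cut_at[j] or new_cut
                # step 1.4: the feasibility cut was passed to stage j - 1; re-solve it
                j = j - 1 if feas_cut and j > 0 else j + 1
            for j in range(n, 0, -1):         # 2. backward mode
                if cut_at[j + 1]:             # descendants at R_{j+1} received cuts
                    new_cut, _, infeasible = solve_stage(j, "backward", rho)
                    if infeasible:
                        return "infeasible", rho
                    cut_at[j] = cut_at[j] or new_cut
            if not any(cut_at):               # step 3: no new cut in this pass
                break
    return "terminated", rho                  # step 5


def budget_oracle(budget: int, log: List[Tuple[int, str, float]]) -> StageOracle:
    """Toy oracle: stages j >= 1 report a new optimality cut until the budget is used up."""
    state = {"left": budget}

    def solve(j: int, mode: str, rho: float) -> Tuple[bool, bool, bool]:
        log.append((j, mode, rho))
        cut = j >= 1 and state["left"] > 0
        if cut:
            state["left"] -= 1
        return cut, False, False

    return solve


calls: List[Tuple[int, str, float]] = []
print(ffb_protocol(2, budget_oracle(3, calls), [1.0, 0.5]), len(calls))
```

In the toy run, the first pass at $\rho = 1$ produces cuts in both modes, the second pass at $\rho = 1$ produces none, and the final pass at $\rho = 0.5$ confirms that no further cuts arise before the protocol stops.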
Algorithm 3.3. Handling a subproblem with aggregation of decision points.
Set $\lambda := \lambda^i_{R_j} \in \Lambda_{R_j}$.
IF $j = 0$ AND forward mode:
   IF problem $Q^L_0$ has been updated: Solve problem $Q^L_0$.
   IF problem $Q^L_0$ is infeasible: TERMINATE.
   Add the points $x_{R_1}(\xi_{[R_1]})$ from the solution of $Q^L_0$ to $Z_{R_1}(\lambda_{R_1}(\xi_{[R_1]}))$.
IF $j > 0$:
   IF backward mode: Set $Z_{R_j}(\lambda) := \emptyset$.
   IF forward mode: Aggregate $Z_{R_j}(\lambda)$ to $\hat Z_{R_j}(\lambda)$.
   IF forward mode OR problem $Q^L_{R_j}(\lambda,\cdot)$ has been updated:
      FOR ALL $x_{R_j} \in \hat Z_{R_j}(\lambda)$ DO:
         Compute $Q^L_{R_j}(\lambda, x_{R_j})$.
         IF $Q^L_{R_j}(\lambda, x_{R_j}) = +\infty$:
            IF $U(\lambda, x_{R_j}) = +\infty$: TERMINATE.
            ELSE: Construct a feasibility cut, update $w^L_{R_j}(\lambda,\cdot)$. RETURN.
         ELSE IF $Q^L_{R_j}(\lambda, x_{R_j}) > w^L_{R_j}(\lambda, x_{R_j})$:
            Construct an optimality cut and update $w^L_{R_j}(\lambda,\cdot)$.
         IF forward mode AND $j < n$:
            Add the points $x_{R_{j+1}}(\xi_{[R_{j+1}]})$ from the solution of (3.6) to $Z_{R_{j+1}}(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}))$.
Proof. The algorithm stops if no new cuts have been generated. Hence, as in the proof of Theorem 18 of Ruszczyński (2003), the finiteness of the algorithm follows from the polyhedral form of the recourse functions. Let us consider the sets $Z_{R_j}(\lambda^i_{R_j})$, $\lambda^i_{R_j} \in \Lambda_{R_j}$, that have been generated in the last forward pass of the Nested Benders Algorithm. In order to verify inequality (3.13), it is sufficient to show that the estimate

$$\big|Q_{R_j}(\lambda^i_{R_j}, x) - w^L_{R_j}(\lambda^i_{R_j}, x)\big| \le C_j\,\rho \quad \forall x \in Z_{R_j}(\lambda^i_{R_j}),\ i = 1,\dots,m_{R_j}, \tag{3.14}$$

holds true for $j = 1$ with some constant $C_1 \ge 0$, where the term $w^L_{R_j}(\lambda^i_{R_j}, x)$ is defined by equation (3.9). We prove (3.14) recursively as follows. At the final stage $j = n + 1$ we have $Q_{R_{n+1}}(\cdot,\cdot) \equiv w^L_{R_{n+1}}(\cdot,\cdot) \equiv 0$ by definition; hence (3.14) holds true with $C_{n+1} = 0$. Let us now assume that (3.14) holds true at stage $R_{j+1}$ for some
$j \in \{1,\dots,n\}$. We consider some $\lambda^i_{R_j} \in \Lambda_{R_j}$ and the aggregation mapping $\varphi : Z_{R_j}(\lambda^i_{R_j}) \to \hat Z_{R_j}(\lambda^i_{R_j})$ that maps every $x_{R_j} \in Z_{R_j}(\lambda^i_{R_j})$ to a nearby representative point, i.e., $\|x - \varphi(x)\|_\infty \le \rho$ for all $x \in Z_{R_j}(\lambda^i_{R_j})$. We now consider some $x \in Z_{R_j}(\lambda^i_{R_j})$. Since $x$ and $\varphi(x)$ are decisions at time $R_j$ that have been generated during the last forward mode, they are each based on decisions at the stages $t = 1,\dots,R_j - 1$ that are feasible for the initial problem. Relatively complete recourse therefore implies that $Q_{R_j}(\lambda^i_{R_j}, x)$ and $Q_{R_j}(\lambda^i_{R_j}, \varphi(x))$ are finite. Therefore, we can bound the term on the left side of (3.14) as follows:

$$\begin{aligned}
\big|Q_{R_j}(\lambda^i_{R_j}, x) - w^L_{R_j}(\lambda^i_{R_j}, x)\big| \le{} & \big|Q_{R_j}(\lambda^i_{R_j}, x) - Q_{R_j}(\lambda^i_{R_j}, \varphi(x))\big| \\
& + \big|Q_{R_j}(\lambda^i_{R_j}, \varphi(x)) - w^L_{R_j}(\lambda^i_{R_j}, \varphi(x))\big| \\
& + \big|w^L_{R_j}(\lambda^i_{R_j}, x) - w^L_{R_j}(\lambda^i_{R_j}, \varphi(x))\big|.
\end{aligned} \tag{3.15}$$

The polyhedral functions $Q_{R_j}(\lambda^i_{R_j},\cdot)$ and $w^L_{R_j}(\lambda^i_{R_j},\cdot)$ are Lipschitz continuous on their domains. Consequently, the first and the third term on the right side of (3.15) are each bounded by $L_j \|x - \varphi(x)\|_\infty$ for some constant $L_j \ge 0$. In order to estimate the second term on the right side of (3.15), we recall that the Nested Benders Algorithm has been terminated because no optimality cuts have been generated after the last evaluation of $Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x))$. Consequently, the identity $w^L_{R_j}(\lambda^i_{R_j}, \varphi(x)) = Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x))$ holds true. Let $x^* = (x^*_{R_j+1},\dots,x^*_{R_{j+1}})$ be the optimal solution of problem (3.6) that has been obtained from the evaluation of $Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x))$. Note that $x^*_{R_{j+1}}(\xi_{[R_{j+1}]}) \in Z_{R_{j+1}}(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}))$. Consequently, we can derive the following relation:

$$\begin{aligned}
Q_{R_j}(\lambda^i_{R_j}, \varphi(x)) - w^L_{R_j}(\lambda^i_{R_j}, \varphi(x))
&= Q_{R_j}(\lambda^i_{R_j}, \varphi(x)) - Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x)) \\
&\le \mathbb{E}\Big[ Q_{R_{j+1}}\big(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}), x^*_{R_{j+1}}(\xi_{[R_{j+1}]})\big) - w^L_{R_{j+1}}\big(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}), x^*_{R_{j+1}}(\xi_{[R_{j+1}]})\big) \,\Big|\, \xi_{[R_j]} = \lambda^i_{R_j} \Big] \\
&\le C_{j+1}\,\rho.
\end{aligned} \tag{3.16}$$

Thereby, the first inequality follows from applying the solution $x^*$ to the optimization problems that define $Q_{R_j}(\lambda^i_{R_j}, \varphi(x))$ and $Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x))$. The second inequality follows from the estimate (3.14) for the stage $R_{j+1}$.
Piecing all this together, we conclude that the estimate

$$\big|Q_{R_j}(\lambda^i_{R_j}, x) - w^L_{R_j}(\lambda^i_{R_j}, x)\big| \le C_{j+1}\,\rho + 2L_j \|x - \varphi(x)\|_\infty \le C_j\,\rho$$

holds true with the constant $C_j := C_{j+1} + 2L_j$. Thus, (3.14) also holds true for $j$; the last assertion of the proposition then follows by recursion.

The relatively complete recourse assumption made in Proposition 3.2.3 is necessary to ensure that $Q_{R_j}(\lambda^i_{R_j}, x) < +\infty$ for those points $x \in Z_{R_j}(\lambda^i_{R_j})$ that are not contained in the reduced set $\hat Z_{R_j}(\lambda^i_{R_j})$. Indeed, evaluating $Q^L_{R_j}(\lambda^i_{R_j},\cdot)$ at such a point $x$ would lead to a feasibility cut that removes $x$ from $Z_{R_j}(\lambda^i_{R_j})$, but the aggregation prevents the evaluation of $Q^L_{R_j}(\lambda^i_{R_j}, x)$. Thus, relatively complete recourse is not necessary if the aggregation does not remove any point for which a feasibility cut has to be generated¹⁰. Since the feasible set of problem (3.3) is convex, this is true whenever all extremal points of the convex hull of $Z_{R_j}(\lambda^i_{R_j})$ are contained in the reduced set $\hat Z_{R_j}(\lambda^i_{R_j})$. For such a modification of the aggregation scheme, one has to determine the convex hull of the set $Z_{R_j}(\lambda^i_{R_j})$. This can be done, e.g., by the quickhull algorithm¹¹ introduced by Barber et al. (1997). In the following, an aggregation scheme that preserves the convex hulls of the sets $Z_{R_j}(\lambda^i_{R_j})$ is called a convex aggregation. The following corollary shows that convex aggregation indeed allows one to omit the assumption of relatively complete recourse.

Corollary 3.2.4. The Nested Benders Decomposition with convex aggregation of decision points stops after a finite number of steps either by reporting the infeasibility of the initial problem, or with a solution of $Q^L_0$ that fulfills the inequality (3.13).

Proof. While we can mainly adopt the proof of Proposition 3.2.3, we have to verify that $Q_{R_j}(\lambda^i_{R_j}, x)$ and $Q_{R_j}(\lambda^i_{R_j}, \varphi(x))$ are finite for all decisions $x \in Z_{R_j}(\lambda^i_{R_j})$ at time $R_j$ that have been generated during the last forward mode. This can be done recursively as follows.
For $j = n + 1$, the asserted property holds true due to $Q_{R_{n+1}} \equiv 0$. Let us consider some $j \in \{1,\dots,n\}$ and assume that the property holds for $j + 1$. For arbitrary $x \in Z_{R_j}(\lambda^i_{R_j})$, we have $Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x)) < +\infty$. Let $x^* = (x^*_{R_j+1},\dots,x^*_{R_{j+1}})$ be the optimal solution of this subproblem obtained by the last forward mode. Again, we have $x^*_{R_{j+1}}(\xi_{[R_{j+1}]}) \in Z_{R_{j+1}}(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}))$, which allows us to conclude from the assumption for $j + 1$ that

$$\mathbb{E}\Big[ Q_{R_{j+1}}\big(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}), x^*_{R_{j+1}}(\xi_{[R_{j+1}]})\big) \,\Big|\, \xi_{[R_j]} = \lambda^i_{R_j} \Big] < +\infty.$$

Since $x^*$ is a feasible solution to subproblem (3.3), we can conclude that $Q_{R_j}(\lambda^i_{R_j}, \varphi(x)) < +\infty$. Thus, $Q_{R_j}(\lambda^i_{R_j},\cdot)$ is finite at the extremal points of the convex hull of $Z_{R_j}(\lambda^i_{R_j})$, all of which belong to $\hat Z_{R_j}(\lambda^i_{R_j})$ under convex aggregation. The convexity of the domain of $Q_{R_j}(\lambda^i_{R_j},\cdot)$ allows us to conclude that $Q_{R_j}(\lambda^i_{R_j}, x) < +\infty$. To verify the assertion of the corollary, we can now continue as in the proof of Proposition 3.2.3.

Remark 3.2.5. If the sets $Z_{R_j}(\lambda^i_{R_j})$ are of high dimension and large cardinality, the determination of their convex hulls becomes numerically expensive. Note that the convexity property of the aggregation operation is needed only during the last iteration of the decomposition algorithm. Thus, it is reasonable to solve the problem with the faster "ordinary" aggregation first and then proceed with convex aggregation.

Remark 3.2.6. In order to solve the subproblem (3.6) for the elements of $Z_{R_j}(\lambda^i_{R_j})$, one has to solve many linear programs that differ only in their right-hand sides. Due to the possibility of warm starts, this can be done efficiently if the points in $Z_{R_j}(\lambda^i_{R_j})$ are arranged in a suitable order. Such an arrangement can be obtained by certain bunching methods, see, e.g., Wets (1983), Birge (1985), Haugland and Wallace (1988), Gassmann (1990).

¹⁰ Note that under relatively complete recourse, no feasibility cuts have to be generated at all.
¹¹ A recent and widely used implementation of the quickhull algorithm is available at http://www.qhull.org/.
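For two-dimensional decision points, the idea of a convex aggregation can be sketched in Python: the extreme points of the convex hull are always kept, and the remaining points are covered greedily as in Algorithm 3.1. This is an illustration only; it swaps Andrew's monotone-chain algorithm in for the quickhull algorithm mentioned above because it is short and exact in the plane, whereas in higher dimensions one would rely on a library such as Qhull. The names `hull_vertices` and `convex_aggregate` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points: List[Point]) -> List[Point]:
    """Extreme points of the convex hull (monotone chain, collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convex_aggregate(points: List[Point], rho: float) -> List[Point]:
    """Keep all hull vertices; cover the other points greedily by rho-balls
    in the maximum norm, as in Algorithm 3.1."""
    def close(p: Point, q: Point) -> bool:
        return max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= rho

    reps = hull_vertices(points)
    uncovered = [p for p in sorted(set(points)) if not any(close(p, r) for r in reps)]
    while uncovered:
        best = max(sorted(uncovered), key=lambda p: sum(close(p, q) for q in uncovered))
        reps.append(best)
        uncovered = [q for q in uncovered if not close(best, q)]
    return reps


pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5), (0.55, 0.5), (0.5, 0.45)]
print(convex_aggregate(pts, 0.1))        # [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.45)]
print(len(convex_aggregate(pts, 10.0)))  # 4
```

Unlike the ordinary aggregation, a very large $\rho$ does not collapse the set to a single point here: the four corners survive, so the convex hull of the reduced set equals that of the original set.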
3.2.3
Upper Bounds
The parameter $\rho$ allows one to control the accuracy of the aggregation; Table 3.3 in Section 3.4.2 shows that the choice of $\rho$ has a significant influence on the running time of the solution algorithm. On the one hand, Proposition 3.2.3 and Corollary 3.2.4 show that the algorithm converges to an optimal solution of (3.4) as $\rho$ tends to 0 whenever the initial problem is feasible. On the other hand, inequality (3.13) may overestimate the difference $|v - v^L|$, and the constant $C$ is in general not known. Consequently, it is not clear a priori how much the parameter $\rho$ has to be decreased to ensure a given error tolerance for the optimal value. In this section, we construct certain upper approximations of the recourse functions in order to estimate the accuracy of the supporting plane approximations. In the next section, these upper bounds are used to construct suitable adaptive stopping criteria.
The lower bounds $w^L_{R_j}$ have been constructed recursively by replacing the recourse function $Q_{R_{j+1}}$ in (3.3) by a lower approximation based on the next-stage bounds $w^L_{R_{j+1}}$, by solving the resulting problem (3.6) at some points $x_{R_j}$, and by using subgradient information from the solution of the subproblem (3.6). In order to obtain upper bounds $w^U_{R_j}$ on $Q_{R_j}$, we proceed similarly. We replace the recourse function $Q_{R_{j+1}}$ in (3.3) by some upper approximation $w^U_{R_{j+1}}$ and solve the resulting problem at some points $x_{R_j}$. The resulting optimal values $Q^U_{R_j}(\lambda^i_{R_j}, x_{R_j})$ provide upper bounds on the values $Q_{R_j}(\lambda^i_{R_j}, x_{R_j})$. Furthermore, the mapping $Q_{R_j}$ is convex in $x_{R_j}$. Thus, the mapping $w^U_{R_j}(\lambda^i_{R_j},\cdot)$ can be constructed from convex combinations of the upper bounds $Q^U_{R_j}(\lambda^i_{R_j}, x_{R_j})$. More formally, we recursively define the following overestimating functions. We set $w^U_{R_{n+1}}(\cdot,\cdot) :\equiv 0$ and, for $j = n,\dots,0$, we set
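$$\begin{aligned}
Q^U_{R_j}(\lambda^i_{R_j}, x_{R_j}) := \inf_{x}\ & \mathbb{E}\Big[ \sum_{t=R_j+1}^{R_{j+1}} \big\langle b_t(\xi_t), x_t(\xi_{[t]}) \big\rangle + w^U_{R_{j+1}}\big(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}), x_{R_{j+1}}(\xi_{[R_{j+1}]})\big) \,\Big|\, \xi_{[R_j]} = \lambda^i_{R_j} \Big] \\
\text{s.t.}\ & x_t \in X_t, \quad t = R_j + 1,\dots,R_{j+1}, \qquad x \in M^m_{[R_j+1,R_{j+1}]}, \\
& A_{t,0}(\xi_t)\,x_t(\xi_{[t]}) + A_{t,1}(\xi_t)\,x_{t-1}(\xi_{[t-1]}) = h_t(\xi_t), \quad t = R_j + 1,\dots,R_{j+1}.
\end{aligned} \tag{3.17}$$

For $j = n,\dots,1$, the mappings $w^U_{R_j}$ are defined as follows. Consider some $\lambda^i_{R_j} \in \Lambda_{R_j}$ and assume that the term $Q^U_{R_j}(\lambda^i_{R_j}, x_{R_j})$ has been evaluated for points $x_{R_j}$ in a certain finite set $Y_{R_j}(\lambda^i_{R_j}) \subset X_{R_j}$. The construction of $Y_{R_j}(\lambda^i_{R_j})$ is detailed in the following section. The upper bounding approximation $w^U_{R_j}(\lambda^i_{R_j},\cdot)$ of $Q_{R_j}(\lambda^i_{R_j},\cdot)$ is then defined by a (minimal) interpolation of the values of $Q^U_{R_j}(\lambda^i_{R_j},\cdot)$ on the set $Y_{R_j}(\lambda^i_{R_j})$:

$$\begin{aligned}
w^U_{R_j}(\lambda^i_{R_j}, x_{R_j}) := \min_{(\alpha_y)}\ & \sum_{y \in Y_{R_j}(\lambda^i_{R_j})} \alpha_y\, Q^U_{R_j}(\lambda^i_{R_j}, y) \\
\text{s.t.}\ & \alpha_y \ge 0 \quad \forall y \in Y_{R_j}(\lambda^i_{R_j}), \qquad \sum_{y \in Y_{R_j}(\lambda^i_{R_j})} \alpha_y = 1, \qquad \sum_{y \in Y_{R_j}(\lambda^i_{R_j})} \alpha_y\, y = x_{R_j},
\end{aligned}$$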
where $w^U_{R_j}(\lambda^i_{R_j}, x_{R_j})$ is set to $+\infty$ whenever the problem on the right side is infeasible. Note that the domain of $w^U_{R_j}(\lambda^i_{R_j},\cdot)$ is equal to the convex hull of $Y_{R_j}(\lambda^i_{R_j})$. Hence, the sets $Y_{R_j}(\lambda^i_{R_j})$ have to be large enough to obtain meaningful bounds. In the extended decomposition that is detailed in Algorithm 3.4, the sets $Y_{R_j}(\lambda^i_{R_j})$ are generated adaptively, like the sets $Z_{R_j}(\lambda^i_{R_j})$. Due to the convexity of $Q^U_{R_j}(\lambda^i_{R_j},\cdot)$ and the definition of $Q^U_{R_j}$, we can complete the estimate (3.10) to

$$w^L_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \le Q^L_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \le Q_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \le Q^U_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \le w^U_{R_j}(\lambda^i_{R_j}, \bar x_{R_j}) \tag{3.18}$$

for every $i = 1,\dots,m_{R_j}$, $\bar x_{R_j} \in X_{R_j}$, and $j = 1,\dots,n$. Furthermore, we have $Q^L_{R_0} \le Q_{R_0} \le Q^U_{R_0}$. Thus, the term $Q^U_{R_0} - Q^L_{R_0}$ provides an upper bound for the deviation $Q_{R_0} - Q^L_{R_0}$ of the optimal values during the decomposition. As soon as this bound is sufficiently small, it is reasonable to terminate the algorithm with some positive value of the aggregation parameter $\rho$, even if the generation of cuts has not yet ended.
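For a one-dimensional decision $x_{R_j}$, the linear program defining $w^U_{R_j}$ has a simple geometric solution: its optimal value is the lower convex envelope of the data points $(y, Q^U_{R_j}(\lambda^i_{R_j}, y))$, $y \in Y_{R_j}(\lambda^i_{R_j})$, evaluated at $x_{R_j}$, and it is $+\infty$ outside the convex hull of $Y_{R_j}(\lambda^i_{R_j})$. The Python sketch below uses this observation instead of an LP solver. The restriction to one dimension, the toy function $(x-1)^2$ in the role of $Q_{R_j}(\lambda^i_{R_j},\cdot)$ (taking $Q^U = Q$ at the sample points for simplicity), and all function names are assumptions made for illustration. It also checks the sandwich $w^L \le Q \le w^U$ from (3.18), with tangent cuts playing the role of $w^L$.

```python
import math
from typing import Dict, List, Tuple


def lower_hull(pts: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Lower convex hull of planar points, sorted by abscissa."""
    hull: List[Tuple[float, float]] = []
    for p in sorted(pts):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point unless it lies strictly below the chord
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def w_upper(x: float, values: Dict[float, float]) -> float:
    """Minimal convex interpolation of y -> Q^U(y), y in Y (1-D case of the LP)."""
    hull = lower_hull(list(values.items()))
    if not hull or x < hull[0][0] or x > hull[-1][0]:
        return math.inf                          # x outside conv(Y): LP infeasible
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    return hull[0][1]                            # Y consists of the single point x


def f(x: float) -> float:                         # toy recourse function
    return (x - 1.0) ** 2


def w_lower(x: float, anchors: List[float]) -> float:
    """Tangent (optimality) cuts of f at the anchor points."""
    return max(f(a) + 2.0 * (a - 1.0) * (x - a) for a in anchors)


values = {y: f(y) for y in (-1.0, 0.0, 2.0, 3.0)}
for x in (-0.5, 1.0, 2.5, 4.0):
    print(x, w_lower(x, [0.0, 2.0]), f(x), w_upper(x, values))
```

The gap `w_upper - w_lower` printed for each point is the one-stage analogue of the quantity $Q^U_{R_0} - Q^L_{R_0}$ used above as a termination criterion; at $x = 4$ the upper bound is $+\infty$ because $4$ lies outside the convex hull of the sample set.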
3.2.4
Extended Solution Algorithm
The upper bounds from the previous section allow further enhancements of the decomposition algorithm. Indeed, let us consider some $\varepsilon > 0$ and assume that the inequality

$$w^U_{R_j}(\lambda^i_{R_j}, x_{R_j}) - w^L_{R_j}(\lambda^i_{R_j}, x_{R_j}) \le \varepsilon \tag{3.19}$$

is true for some $i \in \{1,\dots,m_{R_j}\}$, $x_{R_j} \in X_{R_j}$, and $j \in \{1,\dots,n\}$. In view of relation (3.18), the value $Q^L_{R_j}(\lambda^i_{R_j}, x_{R_j})$ is then an $\varepsilon$-approximation of $Q_{R_j}(\lambda^i_{R_j}, x_{R_j})$. It is thus not necessary to evaluate $Q^L_{R_j}(\lambda^i_{R_j}, x_{R_j})$ in order to generate additional cuts. This idea is applied by the extended Algorithm 3.4. The following proposition extends Corollary 3.2.4 by showing that the consistency of the decomposition algorithm is preserved under this local stopping criterion for the cut generation.

Proposition 3.2.7. Consider Algorithm 3.2 with convex aggregation of decision points and assume that the subproblems are solved with Algorithm 3.4. Then the decomposition algorithm stops after a finite number of steps either by reporting the infeasibility of the initial problem, or with a solution of $Q^L_0$ that fulfills

$$|v - v^L| \le L(n + 1)\cdot\rho + n\cdot\varepsilon \tag{3.20}$$

for some constant $L \ge 0$. Thereby, $L$ is given by $L := 2 \cdot \max_{j=1,\dots,n} L_j$, where $L_j$ is the Lipschitz constant of the mappings $Q_{R_j}(\lambda^i_{R_j},\cdot)$ and $w^L_{R_j}(\lambda^i_{R_j},\cdot)$ that has been introduced in the proof of Proposition 3.2.3.

Algorithm 3.4. Handling a subproblem with aggregation of decision points and upper bounds.
Set $\lambda := \lambda^i_{R_j} \in \Lambda_{R_j}$.
IF $j = 0$ AND forward mode:
   IF problem $Q^L_0$ has been updated: Solve problem $Q^L_0$.
   IF problem $Q^L_0$ is infeasible: TERMINATE.
   IF problem $Q^U_0$ has been updated: Solve problem $Q^U_0$.
   IF $Q^U_0 > Q^L_0 + \varepsilon$: Add the points $x_{R_1}(\xi_{[R_1]})$ from the solution of $Q^L_0$ to $Z_{R_1}(\lambda_{R_1}(\xi_{[R_1]}))$.
IF $j > 0$:
   IF forward mode: Aggregate $Z_{R_j}(\lambda)$ to $\hat Z_{R_j}(\lambda)$.
   IF forward mode OR $Q^L_{R_j}(\lambda,\cdot)$ or $Q^U_{R_j}(\lambda,\cdot)$ have been updated:
      FOR ALL $x_{R_j} \in \hat Z_{R_j}(\lambda)$ DO:
         IF $w^U_{R_j}(\lambda, x_{R_j}) > w^L_{R_j}(\lambda, x_{R_j}) + \varepsilon$:
            Solve the problem $Q^L_{R_j}(\lambda, x_{R_j})$.
            IF $Q^L_{R_j}(\lambda, x_{R_j}) = +\infty$:
               IF $U(\lambda, x_{R_j}) = +\infty$: TERMINATE.
               ELSE: Construct a feasibility cut, update $w^L_{R_j}(\lambda,\cdot)$. RETURN.
            ELSE IF $Q^L_{R_j}(\lambda, x_{R_j}) > w^L_{R_j}(\lambda, x_{R_j})$:
               Construct an optimality cut and update $w^L_{R_j}(\lambda,\cdot)$.
         IF $w^U_{R_j}(\lambda, x_{R_j}) > w^L_{R_j}(\lambda, x_{R_j}) + \varepsilon$:
            Compute $Q^U_{R_j}(\lambda, x_{R_j})$.
            IF $w^U_{R_j}(\lambda, x_{R_j}) \ge Q^U_{R_j}(\lambda, x_{R_j})$: Update $w^U_{R_j}(\lambda,\cdot)$ by adding $x_{R_j}$ to $Y_{R_j}(\lambda)$.
         IF forward mode AND $j < n$ AND $w^U_{R_j}(\lambda, x_{R_j}) > w^L_{R_j}(\lambda, x_{R_j}) + \varepsilon$:
            Add the points $x_{R_{j+1}}(\xi_{[R_{j+1}]})$ from the solution of $Q^L_{R_j}(\lambda, x_{R_j})$ to $Z_{R_{j+1}}(\lambda_{R_{j+1}}(\xi_{[R_{j+1}]}))$.
   IF backward mode: Set $Z_{R_j}(\lambda) := \emptyset$.

Proof. The iteration of Algorithm 3.2 for some value of $\rho > 0$ ends as soon as no new cuts have been generated. The polyhedral form of the recourse functions thus ensures the finiteness of this iteration, cf. Theorem 18
of Ruszczyński (2003). We have to verify that (3.20) is true whenever the algorithm stops with $v^L < +\infty$. To this end, we consider the following relation:

$$\big|Q_{R_j}(\lambda^i_{R_j}, x) - w^L_{R_j}(\lambda^i_{R_j}, x)\big| \le L(n + 1 - j)\cdot\rho + (n + 1 - j)\cdot\varepsilon, \quad x \in Z_{R_j}(\lambda^i_{R_j}). \tag{3.21}$$

Obviously, (3.21) holds for $j = n + 1$. Let us now consider some arbitrary $x \in Z_{R_j}(\lambda^i_{R_j})$ and the associated point $\varphi(x) \in \hat Z_{R_j}(\lambda^i_{R_j})$. We recall the estimate (3.15) and proceed as in the proof of Proposition 3.2.3 to write

$$\big|Q_{R_j}(\lambda^i_{R_j}, x) - w^L_{R_j}(\lambda^i_{R_j}, x)\big| \le 2L_j\,\rho + \big|Q_{R_j}(\lambda^i_{R_j}, \varphi(x)) - w^L_{R_j}(\lambda^i_{R_j}, \varphi(x))\big|. \tag{3.22}$$

Note that no cut has been generated for $Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x))$, because either $w^U_{R_j}(\lambda^i_{R_j}, \varphi(x)) - w^L_{R_j}(\lambda^i_{R_j}, \varphi(x)) \le \varepsilon$, or $Q^L_{R_j}(\lambda^i_{R_j}, \varphi(x))$ has been evaluated and has been equal to $w^L_{R_j}(\lambda^i_{R_j}, \varphi(x))$. In the former case, (3.18) implies that the second summand on the right side of (3.22) is at most $\varepsilon$; since $2L_j \le L$, this proves (3.21) for $R_j$. In the latter case, we conclude, as in relation (3.16), that

$$\big|Q_{R_j}(\lambda^i_{R_j}, \varphi(x)) - w^L_{R_j}(\lambda^i_{R_j}, \varphi(x))\big| \le L\big(n + 1 - (j + 1)\big)\cdot\rho + \big(n + 1 - (j + 1)\big)\cdot\varepsilon. \tag{3.23}$$

Thus, (3.21) is also true for $R_j$ in this case. It follows by recursion that (3.21) is true for $j = 1$, from which the assertion (3.20) follows by the definition of $Q_{R_0}$.

Remark 3.2.8. As in Algorithm 3.3, the command "RETURN" in Algorithm 3.4 returns to the main loop of the decomposition. The command "TERMINATE" quits the decomposition algorithm. We further note that at the beginning of Algorithm 3.2 the upper bounds $w^U_{R_j}$ are initialized with $w^U_{R_j} :\equiv M$ for some (large) constant $M \ge 0$.

Numerical experience with the decomposition method is reported in Section 3.4. In the following section, we focus on the construction of appropriate scenario trees as input data for the decomposition, i.e., scenario trees that are recombining in the sense of relation (3.5).
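The local stopping criterion (3.19) used in Algorithm 3.4 and the resulting error bound (3.20) amount to very little code. The Python sketch below is purely illustrative: the bounding functions are passed in as callables, and the concrete functions, points, and Lipschitz constants in the example are made-up assumptions rather than data from the text.

```python
from typing import Callable, Iterable, List


def points_to_evaluate(points: Iterable[float],
                       w_low: Callable[[float], float],
                       w_up: Callable[[float], float],
                       eps: float) -> List[float]:
    """Points at which Algorithm 3.4 still evaluates Q^L, i.e. where
    w^U > w^L + eps; at all other points (3.19) holds."""
    return [x for x in points if w_up(x) > w_low(x) + eps]


def error_bound(n: int, rho: float, eps: float, lipschitz: List[float]) -> float:
    """Right-hand side of (3.20) with L = 2 * max_j L_j."""
    big_l = 2.0 * max(lipschitz)
    return big_l * (n + 1) * rho + n * eps


# Hypothetical bounds whose gap 0.05 * x**2 grows away from the origin.
def w_low(x: float) -> float:
    return abs(x)


def w_up(x: float) -> float:
    return abs(x) + 0.05 * x * x


print(points_to_evaluate([0.0, 0.5, 1.0, 2.0], w_low, w_up, eps=0.02))  # [1.0, 2.0]
print(error_bound(n=3, rho=0.1, eps=0.01, lipschitz=[1.0, 2.0, 0.5]))
```

Increasing $\varepsilon$ skips more evaluations of $Q^L_{R_j}$ but enlarges the guaranteed error bound linearly, which is the trade-off expressed by (3.20).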
3.3
Construction of Recombining Trees
In this section we drop the finiteness assumption on the underlying process $\xi$. We point out how a process $\tilde\xi$ can be constructed that approximates $\xi$, takes
only a finite number of values, and whose distribution can be represented by a recombining tree. For the sake of notational simplicity, we follow the notation of Ruszczyński and Shapiro (2003a) and denote, for values $0 \le t' \le t \le T$, the vector $(\xi_{t'},\dots,\xi_t)$ by $\xi_{[t',t]}$. For the generation of scenario trees, a variety of approaches have been developed by, e.g., Dupačová et al. (2000), Pflug (2001), Dupačová et al. (2003), Dempster (2004), Bally et al. (2005), Casey and Sen (2005), Mirkov and Pflug (2007), and Heitsch and Römisch (2008). The forward tree construction of Heitsch and Römisch (2008) generates a scenario tree approximating a given stochastic process $\xi$ by iteratively approximating the conditional distributions

$$P\big[\xi_t \in \cdot \,\big|\, \xi_{[t-1]} \in C^i_{t-1}\big], \tag{3.24}$$

given a finite partition $(C^i_{t-1})_{i=1,\dots,n_{t-1}}$ of $\operatorname{supp} P[\xi_{[t-1]} \in \cdot\,]$. For $i \in \{1,\dots,n_{t-1}\}$, this is done by choosing a number of $n^i_t$ points

$$c^{i'}_t \in \operatorname{supp} P\big[\xi_t \in \cdot \,\big|\, \xi_{[t-1]} \in C^i_{t-1}\big], \quad i' = 1,\dots,n^i_t,$$

such that the term

$$\mathbb{E}\big[ d\big(\xi_t, \pi^i_t(\xi_t)\big) \,\big|\, \xi_{[t-1]} \in C^i_{t-1} \big]$$

is small. Here, $d(\cdot,\cdot)$ is some appropriate measure of distance on $\mathbb{R}^s$ and $\pi^i_t$ is a ($d$-nearest neighbour) projection onto the set $\{c^{i'}_t : i' = 1,\dots,n^i_t\}$. The mapping $\pi^i_t$ defines a ($d$-Voronoi) partition $(V^{i'}_t)_{i'=1,\dots,n^i_t}$ of $\operatorname{supp} P[\xi_t \in \cdot \mid \xi_{[t-1]} \in C^i_{t-1}]$ by $V^{i'}_t := (\pi^i_t)^{-1}(c^{i'}_t)$. Then we define¹²

$$C^{i'}_t := C^i_{t-1} \times V^{i'}_t.$$
Recall that $n_t$ denotes the number of nodes at time $t$, i.e., $n_t = |\operatorname{supp} P[\tilde\xi_{[t]} \in \cdot\,]|$, and $n^i_t$ represents the number of nodes at time $t$ descending from node $i$ at time $t - 1$. While the values $c^i_t$ are the possible realizations of the tree process $\tilde\xi$, the corresponding transition probabilities are defined by

$$P\big[\tilde\xi_t = c^i_t \,\big|\, \tilde\xi_{[t-1]} = (c^{i_1}_1,\dots,c^{i_{t-1}}_{t-1})\big] := P\big[\xi_t \in V^i_t \,\big|\, \xi_{[t-1]} \in C^{i_{t-1}}_{t-1}\big]. \tag{3.25}$$

¹² For the sake of notational simplicity, the relation between the various indices $i$ is not detailed here. In particular, the set $C^{i_{t-1}}_{t-1}$ in (3.25) has to be of the form $V^{i_1}_1 \times \dots \times V^{i_{t-1}}_{t-1}$. We refer to Algorithms 3.5-3.7 for a more rigorous presentation of our approach.
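How the points $c^{i'}_t$ and the probabilities in (3.25) might be obtained from Monte Carlo samples can be sketched for scalar $\xi_t$ and $d(x, y) = |x - y|$. This is not the forward selection procedure of Heitsch and Römisch (2008): as a stand-in, the Python sketch below runs a Lloyd-type iteration with median updates (k-medians) on samples that are assumed to be drawn from the conditional distribution (3.24) of one parent node. Neither the assignment step nor the median update increases the empirical value of $\mathbb{E}[|\xi_t - \pi^i_t(\xi_t)|]$. The Voronoi cell frequencies then estimate the transition probabilities in (3.25). All names are hypothetical.

```python
import statistics
from typing import List, Tuple


def nearest(x: float, centers: List[float]) -> int:
    """Index of the nearest center (the projection pi); ties go to the lowest index."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def quantize(samples: List[float], k: int, iters: int = 50) -> Tuple[List[float], List[float]]:
    """Choose k points for the empirical distribution and return them together
    with the empirical masses of their Voronoi cells."""
    s = sorted(samples)
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]  # quantile start
    for _ in range(iters):
        cells: List[List[float]] = [[] for _ in range(k)]
        for x in s:
            cells[nearest(x, centers)].append(x)
        new = [statistics.median(c) if c else centers[i] for i, c in enumerate(cells)]
        if new == centers:
            break
        centers = new
    counts = [0] * k
    for x in s:
        counts[nearest(x, centers)] += 1
    return centers, [c / len(s) for c in counts]


def distortion(samples: List[float], centers: List[float]) -> float:
    """Empirical E|xi - pi(xi)|."""
    return sum(abs(x - centers[nearest(x, centers)]) for x in samples) / len(samples)


samples = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
centers, probs = quantize(samples, 2)
print(centers, probs, distortion(samples, centers))
```

For multivariate $\xi_t$ or other distances $d$, the median update would have to be replaced by the corresponding cell-wise minimizer.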
In order to approximate a Markov process by a recombining scenario tree, Bally et al. (2005) approximate the marginal distributions $P[\xi_t \in \cdot\,]$ instead of the conditional distributions in (3.24), i.e., they choose $n_t$ points $c^i_t \in \operatorname{supp} P[\xi_t \in \cdot\,]$ such that $\mathbb{E}[d(\xi_t, \pi_t(\xi_t))]$ is small, where $\pi_t$ is again a projection onto the set $\{c^i_t : i = 1,\dots,n_t\}$. Using again the Voronoi partition $(V^i_t)_{i=1,\dots,n_t}$, the tree process $\tilde\xi$ is then defined as a Markov process with the transition probabilities

$$P\big[\tilde\xi_t = c^i_t \,\big|\, \tilde\xi_{t-1} = c^{i'}_{t-1}\big] := P\big[\xi_t \in V^i_t \,\big|\, \xi_{t-1} \in V^{i'}_{t-1}\big]. \tag{3.26}$$
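Analogously, the Markov transition probabilities (3.26) can be estimated by counting transitions between Voronoi cells along simulated paths. The following Python sketch assumes scalar $\xi_t$, given stage-wise points $c^i_t$ (for instance from a quantization of the marginals), and sample paths of $\xi$; rows of cells that no path visits are left as zeros, which is an arbitrary choice of this sketch. The function names are hypothetical.

```python
from typing import List


def nearest(x: float, centers: List[float]) -> int:
    """Voronoi cell of x: index of the nearest point, ties to the lowest index."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def transition_matrices(paths: List[List[float]],
                        centers: List[List[float]]) -> List[List[List[float]]]:
    """Empirical P[xi_t in V_t^i | xi_{t-1} in V_{t-1}^{i'}] for t = 1, ..., T-1,
    returned as one row-stochastic matrix (rows i', columns i) per stage."""
    mats = []
    for t in range(1, len(centers)):
        counts = [[0] * len(centers[t]) for _ in centers[t - 1]]
        for path in paths:
            counts[nearest(path[t - 1], centers[t - 1])][nearest(path[t], centers[t])] += 1
        mats.append([[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts])
    return mats


paths = [[0.0, -1.0, -2.0], [0.0, -0.9, 0.1], [0.0, 1.1, 0.0], [0.0, 1.0, 2.2]]
centers = [[0.0], [-1.0, 1.0], [-2.0, 0.0, 2.0]]
for m in transition_matrices(paths, centers):
    print(m)
```

In this toy example, the middle point $0$ at the last stage can be reached from both points of the previous stage, which is exactly the recombining structure that the Markov construction produces.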
3.3.1
A Tree Generation Algorithm
In order to construct a recombining tree, we propose an approach that combines the methods discussed by Bally et al. (2005) and Heitsch and Römisch (2008); it is detailed in Algorithm 3.5.

Algorithm 3.5. Generation of a recombining scenario tree.
(1) Set $m_{R_0} := 1$, $\xi_0 := \Delta$, and $\bar C_{R_0} = \{\Delta\}$ for some auxiliary value $\Delta \in \mathbb{R}^s$.
FOR $j = 0,\dots,n$:
   (i) IF $j > 0$: Recombining at time stage $R_j$ with Algorithm 3.6.
   (ii) Subtree construction for the time period $[R_j + 1, R_{j+1}]$ with Algorithm 3.7.

Basically, Algorithm 3.5 proceeds by constructing non-recombining trees for every time interval $[R_{j-1} + 1, R_j]$ and then recombines scenarios at time $R_j$ by assigning the same subtree to several nodes at time $R_j$. Thereby, two nodes at time $R_j$ obtain the same subtree if the values of $\xi$ in these nodes are close during the time interval $\{R_j - \tau,\dots,R_j\}$ for some value $\tau \in [0, R_j - (R_{j-1} + 1)]$. Since the particular subtree has to approximate the distribution of $(\xi_{R_j+1},\dots,\xi_T)$, this approach is reasonable whenever, for all $t = 2,\dots,T$, the conditional distribution $P[(\xi_{t+1},\dots,\xi_T) \in \cdot \mid (\xi_1,\dots,\xi_t) = (z_1,\dots,z_t)]$ depends, in some sense, continuously on $(z_{t-\tau},\dots,z_t)$. This issue will be detailed in Section 3.3.2, where we apply the stability results of Chapter 2 to derive an upper bound on the perturbation of the optimal value under this tree approximation and establish a certain consistency of Algorithm 3.5.
Chapter 3. Recombining Trees for Multistage Stochastic Programs
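Algorithm 3.6. Recombining at time stage $R_j$.
Approximate $\Xi_{[R_j-\tau,R_j]} := \operatorname{supp} P[\xi_{[R_j-\tau,R_j]} \in \cdot\,]$ by choosing points $\bar c_{R_j}^{(k)} \in \Xi_{[R_j-\tau,R_j]}$, $k = 1,\dots,m_{R_j}$, and a partition $(\bar C_{R_j}^{(k)},\ k = 1,\dots,m_{R_j})$ of $\Xi_{[R_j-\tau,R_j]}$ such that
$$\bar\varepsilon_{R_j}^{(k)} := E\big[\|\xi_{[R_j-\tau,R_j]} - \bar c_{R_j}^{(k)}\| \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big]$$
is small, and every $\bar C_{R_j}^{(k)}$ fulfills
$$\bar C_{R_j}^{(k)} = \bigcup_{k'=1}^{m_{R_{j-1}}}\ \bigcup_{i \in \{1,\dots,n_{R_j}^{(k')}\}:\ \lambda(i,k')=k} \big\{\xi_{[R_j-\tau,R_j]} : \xi_{[R_{j-1}-\tau,R_j]} \in C_{R_j}^{i,(k')}\big\}.$$
Thereby, $\lambda(\cdot,k') : \{1,\dots,n_{R_j}^{(k')}\} \to \{1,\dots,m_{R_j}\}$ is a mapping with
$$E\big[\|\xi_{[R_j-\tau,R_j]} - \bar c_{R_j}^{(\lambda(i,k'))}\| \,\big|\, \xi_{[R_{j-1}-\tau,R_j]} \in C_{R_j}^{i,(k')}\big] = \min_{k=1,\dots,m_{R_j}} E\big[\|\xi_{[R_j-\tau,R_j]} - \bar c_{R_j}^{(k)}\| \,\big|\, \xi_{[R_{j-1}-\tau,R_j]} \in C_{R_j}^{i,(k')}\big]$$
for $k' = 1,\dots,m_{R_{j-1}}$.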
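A sample-based reading of the mapping $\lambda(\cdot,k')$ and of the errors $\bar\varepsilon_{R_j}^{(k)}$ is sketched below. It rests on three assumptions. First, each node cell $C_{R_j}^{i,(k')}$ is represented by simulated windows drawn conditionally on that cell. Second, conditional expectations are replaced by sample means. Third, pooling the samples of all nodes mapped to $k$ approximates the conditional expectation given $\bar C_{R_j}^{(k)}$ only if the sample sizes are proportional to the node probabilities. The function names are hypothetical.

```python
import numpy as np


def assign_subtrees(cell_samples, centers):
    """Sample-based sketch of lambda(., k') in Algorithm 3.6.

    cell_samples: list over nodes of arrays (N_i, d); rows are simulated windows
        xi_{[R_j - tau, R_j]} drawn conditionally on the node cell C_{R_j}^{i,(k')}.
    centers: array (m, d) holding the points c-bar_{R_j}^{(k)}.
    Returns lam[i], the center with the smallest empirical conditional mean distance.
    """
    lam = []
    for samples in cell_samples:
        # empirical E[ ||window - c_k|| | node cell ] for every candidate center k
        mean_dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2).mean(axis=0)
        lam.append(int(np.argmin(mean_dist)))
    return lam


def clustering_errors(cell_samples, centers, lam):
    """Empirical eps-bar^{(k)}: mean distance to c-bar^{(k)} over all windows of nodes mapped to k."""
    errors = []
    for k in range(len(centers)):
        pooled = [s for s, l in zip(cell_samples, lam) if l == k]
        if not pooled:
            errors.append(float("nan"))  # no node was assigned to this cluster
            continue
        stacked = np.vstack(pooled)
        errors.append(float(np.linalg.norm(stacked - centers[k], axis=1).mean()))
    return errors
```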
Transition Probabilities. The possible values $c_t^{i,(k)}$ of the process $\tilde\xi$ are defined by Algorithm 3.7. Together with the following transition probabilities, these values define the distribution of $\tilde\xi$. We set $P[\tilde\xi_1 = \xi_1] = 1$. For $t = R_j+2,\dots,R_{j+1}$, we define the transition probability for $\tilde\xi$ passing from node $i' \in \{1,\dots,n_{t-1}^{(k)}\}$ of subtree $k$ to node $i \in \{1,\dots,n_t^{(k)}\}$ of subtree $k$ by
$$P\big[\tilde\xi_t = c_t^{i,(k)} \,\big|\, \tilde\xi_{[1,R_j]} = z,\ \tilde\xi_{[R_j+1,t-1]} = c_{[R_j+1,t-1]}^{i',(k)}\big] := P\big[\xi_t \in V_t^{i,(k)} \,\big|\, \xi_{[R_j-\tau,t-1]} \in C_{t-1}^{i',(k)}\big] \tag{3.27}$$
for every $z \in \mathbb{R}^{R_j\cdot s}$ with $(z, c_{[R_j+1,t-1]}^{i',(k)}) \in \operatorname{supp} P[\tilde\xi_{[1,t-1]} \in \cdot\,]$. Note that this transition probability depends only on the values of $\tilde\xi_{[R_j+1,t-1]}$.

Consider passing from node $i' \in \{1,\dots,n_{R_j}^{(k')}\}$ of one subtree $k' \in \{1,\dots,m_{R_{j-1}}\}$ at time $R_j$ to node $i$ of another subtree $k \in \{1,\dots,m_{R_j}\}$ at time $R_j+1$. The transition probability is given by
$$P\big[\tilde\xi_{R_j+1} = c_{R_j+1}^{i,(k)} \,\big|\, \tilde\xi_{[1,R_{j-1}]} = z,\ \tilde\xi_{[R_{j-1}+1,R_j]} = c_{[R_{j-1}+1,R_j]}^{i',(k')}\big] := \begin{cases} P\big[\xi_{R_j+1} \in V_{R_j+1}^{i,(k)} \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big] & \text{if } k = \lambda(i',k'),\\ 0 & \text{else.} \end{cases} \tag{3.28}$$

Algorithm 3.7. Subtree construction for time period $[R_j+1, R_{j+1}]$.
FOR $k = 1,\dots,m_{R_j}$:
  Set $t = R_j+1$. Approximate $\Xi_t^{1,(k)} := \operatorname{supp} P[\xi_t \in \cdot \mid \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}]$ by choosing points $c_t^{i,(k)} \in \Xi_t^{1,(k)}$, $i = 1,\dots,n_t^{(k)}$, such that
  $$\varepsilon_t^{1,(k)} := E\big[\|\xi_t - \pi_t^{1,(k)}(\xi_t)\| \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big]$$
  is small, with $\pi_t^{1,(k)}(\xi_t) \in \operatorname{argmin}_{i=1,\dots,n_t^{(k)}} \|\xi_t - c_t^{i,(k)}\|$ for $\xi_t \in \Xi_t^{1,(k)}$.
  Set $V_t^{i,(k)} := (\pi_t^{1,(k)})^{-1}(c_t^{i,(k)}) \subset \Xi_t^{1,(k)}$, $C_t^{i,(k)} := \bar C_{R_j}^{(k)} \times V_t^{i,(k)}$, $c_{[R_j+1,t]}^{i,(k)} := c_t^{i,(k)}$, and define the transition probabilities according to (3.28).
  FOR $t = R_j+2,\dots,R_{j+1}$:
    FOR $i' = 1,\dots,n_{t-1}^{(k)}$:
      Set $l := \sum_{i=1}^{i'-1} n_t^{i,(k)}$. Approximate $\Xi_t^{i',(k)} := \operatorname{supp} P[\xi_t \in \cdot \mid \xi_{[R_j-\tau,t-1]} \in C_{t-1}^{i',(k)}]$ by choosing points $c_t^{i,(k)} \in \Xi_t^{i',(k)}$, $i = l+1,\dots,l+n_t^{i',(k)}$, such that
      $$\varepsilon_t^{i',(k)} := E\big[\|\xi_t - \pi_t^{i',(k)}(\xi_t)\| \,\big|\, \xi_{[R_j-\tau,t-1]} \in C_{t-1}^{i',(k)}\big]$$
      is small, with $\pi_t^{i',(k)}(\xi_t) \in \operatorname{argmin}_{i=l+1,\dots,l+n_t^{i',(k)}} \|\xi_t - c_t^{i,(k)}\|$ for $\xi_t \in \Xi_t^{i',(k)}$.
      Set $V_t^{i,(k)} := (\pi_t^{i',(k)})^{-1}(c_t^{i,(k)}) \subset \Xi_t^{i',(k)}$, $C_t^{i,(k)} := C_{t-1}^{i',(k)} \times V_t^{i,(k)}$, $c_{[R_j+1,t]}^{i,(k)} := (c_{[R_j+1,t-1]}^{i',(k)}, c_t^{i,(k)})$.
      Define the transition probabilities according to (3.27).
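For one parent node, the inner step of Algorithm 3.7 reduces to a conditional quantization. The hypothetical helper below makes three assumptions. It assumes that samples of $\xi_t$ drawn conditionally on the parent cell $C_{t-1}^{i',(k)}$ are available; how to generate them is not addressed here. It assumes that the child points have already been chosen. It uses the Euclidean norm. Under these assumptions, it estimates the Voronoi cell probabilities entering (3.27) and the error $\varepsilon_t^{i',(k)}$ by sample averages.

```python
import numpy as np


def conditional_branching(next_values, points):
    """Sample-based sketch of one inner step of Algorithm 3.7.

    next_values: array (N, s) of simulated xi_t drawn conditionally on the parent cell.
    points: array (n, s) with the child points c_t^{i,(k)} of this parent.
    Returns (probs, eps): probs[i] estimates P[xi_t in V_t^{i,(k)} | parent cell] as in (3.27),
    eps estimates the conditional quantization error eps_t^{i',(k)}.
    """
    dists = np.linalg.norm(next_values[:, None, :] - points[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)  # plays the role of the projection pi_t^{i',(k)}
    probs = np.bincount(nearest, minlength=len(points)) / len(next_values)
    eps = float(dists[np.arange(len(next_values)), nearest].mean())
    return probs, eps
```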
Remark 3.3.1. Algorithm 3.6 assigns two nodes at time $R_j$ the same subtree whenever the corresponding values of $\xi$ on the interval $[R_j-\tau, R_j]$ fall into the same cluster $\bar C_{R_j}^{(k)}$. The value $m_{R_j}$ represents the number of different subtrees originating at time $R_j$. The term $n_t^{(k)}$ stands for the number of nodes of subtree $k$ at time $t$, and $n_t^{i',(k)}$ is the number of nodes at time $t$ that descend from node $i'$ at time $t-1$ in subtree $k$. These values determine the structure of the scenario tree. They can either be predefined or, as proposed by Heitsch and Römisch (2008), determined within the tree generation procedure so as not to exceed certain local error levels. It has not yet been specified how to choose the discretization points $c_t^{i,(k)}$ and $\bar c_{R_j}^{(k)}$. For this purpose, the tree generation algorithms cited above use a stochastic gradient method (Bally et al., 2005) and a heuristic choice out of a set of simulated values (Heitsch and Römisch, 2008), respectively. Both methods can be applied in our framework, too. For our numerical experiments, we implemented a variant of the latter forward selection heuristic.
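The forward selection idea can be sketched as a greedy procedure on a set of simulated values. The Python function below is a generic version written for illustration under simple assumptions: equal sample weights, the Euclidean norm, and a fixed number of points. It is not claimed to be the specific variant used by Heitsch and Römisch (2008) or in the numerical experiments of this chapter.

```python
import numpy as np


def forward_selection(samples, n_points):
    """Greedily select n_points out of the simulated samples (hypothetical helper).

    In every step, the candidate that most reduces the mean distance of all samples
    to their nearest selected point is added. Returns (points, mean_distance).
    """
    x = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    if not 1 <= n_points <= len(x):
        raise ValueError("n_points must lie between 1 and the number of samples")
    pair = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    selected = []
    best = np.full(len(x), np.inf)  # distance of each sample to its nearest selected point
    for _ in range(n_points):
        # objective[c] = mean_j min(best[j], pair[c, j]) if candidate c were added
        objective = np.minimum(best[None, :], pair).mean(axis=1)
        if selected:
            objective[selected] = np.inf  # never select the same sample twice
        choice = int(np.argmin(objective))
        selected.append(choice)
        best = np.minimum(best, pair[choice])
    return x[selected], float(best.mean())
```

The returned mean distance is the empirical counterpart of a quantization error such as $\varepsilon_t^{i',(k)}$. It can serve as a stopping criterion if the number of points is to be determined adaptively.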
3.3.2 Consistency of the Tree Generation Algorithm
In this section we show that the recombining tree generation algorithm is consistent in the following sense: the optimal value of the approximated problem converges to the initial optimal value as the discretization errors $\bar\varepsilon_{R_j}^{(k)}$ and $\varepsilon_t^{i,(k)}$ appearing in Algorithms 3.6 and 3.7 become uniformly small. To this end, we make the following assumption, which is discussed in Remark 3.3.3 below.

Assumption 3.3.2. The following conditions are fulfilled:
(i) The values $\|\xi\|_\infty := \|\xi_{[T]}\|_\infty$ and $\|\tilde\xi\|_\infty$ are bounded by a constant.
(ii) The process $\xi$ can be written by the recursion
$$\xi_{t+1} = g(\xi_{[t-\tau,t]}, \varepsilon_{t+1}), \tag{3.29}$$
with $\varepsilon_t$, $t \ge 1$, being independent random vectors, and $g$ is a mapping satisfying
$$\|g(\xi_{[t-\tau,t]}, \varepsilon) - g(\hat\xi_{[t-\tau,t]}, \varepsilon)\| \le L \sum_{s=t-\tau}^{t} \|\xi_s - \hat\xi_s\| \tag{3.30}$$
for every $\xi_{[t-\tau,t]}, \hat\xi_{[t-\tau,t]} \in \operatorname{supp} P[\xi_{[t-\tau,t]} \in \cdot\,]$ and $\varepsilon \in \operatorname{supp} P[\varepsilon_{t+1} \in \cdot\,]$. We further assume that for $t \ge 1$ the sets
$$R_g(\xi_{[t-\tau,t]}) := \big\{g(\xi_{[t-\tau,t]}, \varepsilon) : \varepsilon \in \operatorname{supp} P[\varepsilon_{t+1} \in \cdot\,]\big\} \tag{3.31}$$
coincide for all values $\xi_{[t-\tau,t]} \in \operatorname{supp} P[\xi_{[t-\tau,t]} \in \cdot\,]$.
(iii) For $t = 2,\dots,T$, the measure $P[\xi_t \in \cdot\,]$ is absolutely continuous w.r.t. the Lebesgue measure on $\mathbb{R}^s$.

Remark 3.3.3. The boundedness condition (i) is imposed for notational simplicity. Condition (ii) is related to Assumption 2.2.6 in Chapter 2. Note that, in contrast to Chapter 2, (ii) ensures the continuity of the conditional distribution of $\xi_{t+1}$ with respect to the vector $\xi_{[t-\tau,t]}$ instead of continuity with respect to $\xi_{[1,t]}$. The explicit dynamics assumed in (ii), together with condition (iii), allow us to establish an upper bound for a specific distance between the distributions
$$P\big[\xi_{[R_j+1,T]} \in \cdot \,\big|\, \xi_{[R_{j-1}-\tau,R_j]} \in C_{R_j}^{i,(k)}\big] \quad\text{and}\quad P\big[\xi_{[R_j+1,T]} \in \cdot \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(\lambda(i,k))}\big],$$
see inequality (3.55) and Lemma 3.3.7 below. Such a distance is relevant because Algorithm 3.5 approximates the latter distribution instead of the former one in order to construct the same subtree for different nodes. The coincidence of the sets $R_g(\cdot)$ defined by (3.31) implies that the set $\operatorname{supp} P[\xi_{t+1} \in \cdot \mid \xi_{[t-\tau,t]} = \xi_{[t-\tau,t]}]$ does not depend on the value of $\xi_{[t-\tau,t]}$. In particular, this ensures that the process $\tilde\xi$ constructed by Algorithm 3.5 takes values in $\Xi_{[T]}$, which is a necessary condition for the stability results of Chapter 2. The uniform Lipschitz condition (3.30) can be relaxed to hold only locally with a polynomially growing Lipschitz factor. Furthermore, as in Lemma 2.2.8, the Lipschitz constant on the right side of (3.30) may be allowed to depend on $\varepsilon$, and the mapping $g$ may also depend on $t$.

Proposition 3.3.4. Assume complete recourse, boundedness of $X_t$ for $t = 1,\dots,T$, and non-random recourse matrices $A_{t,0}$ for $t = 2,\dots,T$. Then, under Assumption 3.3.2, there exists a constant $L \ge 0$ such that
$$|v(\xi) - v(\tilde\xi)| \le L\,\mu(\xi,\tilde\xi) \tag{3.32}$$
with
$$\mu(\xi,\tilde\xi) := \inf\Big\{\sum_{t=1}^{T} E\|\xi_t - f_t(\xi_{[t]})\| \;:\; f = (f_t)_{t=1}^{T},\ f_t : \Xi_{[t]} \to \Xi_t \text{ measurable},\ f(\xi) \overset{d}{=} \tilde\xi\Big\}. \tag{3.33}$$

Proof. Let us verify the conditions of Corollary 2.5.2. The boundedness of $\Xi_{[T]}$ and $X_t$, $t = 1,\dots,T$, yields that the conditions of Assumption 2.2.1 as well as condition (ii) of Assumption 2.2.6 hold true. Furthermore, the conditions of Assumption 2.2.3 hold true for both $\xi$ and $\tilde\xi$. In order to verify condition (i) of Assumption 2.2.6, we consider some $f \in \mathcal{F}_{m_{t+1}+1}(\Xi_{[t+1]})$ and write
$$\begin{aligned} &\big|E[f(\xi_{[t+1]}) \mid \xi_{[t]} = \xi_{[t]}] - E[f(\xi_{[t+1]}) \mid \xi_{[t]} = \hat\xi_{[t]}]\big|\\ &\quad= \big|E f\big(\xi_{[t]}, g(\xi_{[t-\tau,t]}, \varepsilon_{t+1})\big) - E f\big(\hat\xi_{[t]}, g(\hat\xi_{[t-\tau,t]}, \varepsilon_{t+1})\big)\big|\\ &\quad\le E\Big[\max\big\{1, \|(\xi_{[t]}, g(\xi_{[t-\tau,t]}, \varepsilon_{t+1}))\|, \|(\hat\xi_{[t]}, g(\hat\xi_{[t-\tau,t]}, \varepsilon_{t+1}))\|\big\}^{m_{t+1}}\\ &\qquad\quad \cdot \max\big\{\|\xi_{[t]} - \hat\xi_{[t]}\|, \|g(\xi_{[t-\tau,t]}, \varepsilon_{t+1}) - g(\hat\xi_{[t-\tau,t]}, \varepsilon_{t+1})\|\big\}\Big]. \end{aligned}$$
Due to the boundedness of $\Xi_{[T]}$, the first factor within the expectation on the right side of the inequality is bounded by a constant $K$. The second factor can be further estimated by using the Lipschitz continuity of $g$, see (3.30). Consequently, we can find a constant $\bar K$ such that the expectation is smaller than $\bar K\|\xi_{[t]} - \hat\xi_{[t]}\|$. Thus, condition (i) of Assumption 2.2.6 holds true. The coincidence of the sets $R_g(\cdot)$ assumed in condition (ii) of Assumption 3.3.2 and the constraint $f(\xi) \overset{d}{=} \tilde\xi$ ensure that $\tilde\xi$ takes values in $\Xi_{[T]}$ (whenever $\mu(\xi,\tilde\xi) < +\infty$). Having now verified all conditions of Corollary 2.5.2, we obtain inequality (2.33). Due to the boundedness assumption on $\Xi_{[T]}$, it simplifies to the asserted inequality (3.32).

In order to verify the consistency of Algorithm 3.5, we have to estimate the term $\mu(\xi,\tilde\xi)$ defined by (3.33). To this end, let us consider for $j = 1,\dots,n$ the following projection mappings that are induced by Algorithms 3.6 and 3.7, respectively:
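$$\bar\pi_{R_j} : \operatorname{supp} P[\xi_{[R_j-\tau,R_j]} \in \cdot\,] \to \{\bar c_{R_j}^{(k)} : k = 1,\dots,m_{R_j}\},\qquad \bar\pi_{R_j}(\xi_{[R_j-\tau,R_j]}) := \bar c_{R_j}^{(k)} \ \text{ if } \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}, \tag{3.34}$$
and for $j = 0,\dots,n$ and $t = R_j+1,\dots,R_{j+1}$ we set
$$\pi_t : \operatorname{supp} P[\xi_{[R_j-\tau,t]} \in \cdot\,] \to \{c_t^{i,(k)} : i = 1,\dots,n_t^{(k)},\ k = 1,\dots,m_{R_j}\},\qquad \pi_t(\xi_{[R_j-\tau,t]}) := c_t^{i,(k)} \ \text{ if } \xi_{[R_j-\tau,t]} \in C_t^{i,(k)}. \tag{3.35}$$
Note that the set $\{c_t^{i,(k)} : i = 1,\dots,n_t^{(k)},\ k = 1,\dots,m_{R_j}\}$ coincides with $\operatorname{supp} P[\tilde\xi_t \in \cdot\,]$. With this at hand, we can state the following theorem: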
Theorem 4. Under Assumption 3.3.2, there exists a constant $\bar L \ge 0$ such that the following estimate holds true:
$$\mu(\xi,\tilde\xi) \le \sum_{j=0}^{n}\sum_{t=R_j+1}^{R_{j+1}} E\|\xi_t - \pi_t(\xi_{[R_j-\tau,t]})\| + 2\bar L \sum_{j=1}^{n} E\|\xi_{[R_j-\tau,R_j]} - \bar\pi_{R_j}(\xi_{[R_j-\tau,R_j]})\|. \tag{3.36}$$
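The right side of (3.36) is a plain sum of error terms, which makes the constant in Corollary 3.3.5 easy to trace. Suppose every error is below $\varepsilon$. The first sum has $T$ terms (assuming $R_0 = 0$ and $R_{n+1} = T$) and the second one has $n$ terms, so the bound is at most $(T + 2\bar L n)\varepsilon$. The hypothetical helpers below only do this bookkeeping. The constants $\bar L$ and $L$ of Theorem 4 and Proposition 3.3.4 are inputs; they are not computed.

```python
def theorem4_bound(stage_errors, recombination_errors, L_bar):
    """Right side of (3.36) for given error terms (bookkeeping sketch).

    stage_errors: the values E||xi_t - pi_t(...)|| for t = 1, ..., T.
    recombination_errors: the values E||window - pi-bar_{R_j}(window)|| for j = 1, ..., n.
    L_bar: the constant of Theorem 4, which has to be supplied.
    """
    return sum(stage_errors) + 2.0 * L_bar * sum(recombination_errors)


def corollary_bound(eps, T, n, L_bar, L_prop):
    """If all errors are below eps: mu <= (T + 2 L_bar n) eps and |v - v~| <= L_prop * mu."""
    return L_prop * (T + 2.0 * L_bar * n) * eps
```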
Before proving the theorem, we consider the following corollary. It establishes the consistency of Algorithm 3.5 by relating the right side of inequality (3.36) to the discretization errors made by the algorithm.

Corollary 3.3.5. Under the conditions of Proposition 3.3.4, there is a constant $L \ge 0$ such that for every $\varepsilon > 0$ the following property holds. Suppose the approximation of the conditional distributions and the recombination clustering in Algorithms 3.6 and 3.7 are done such that the errors $\varepsilon_t^{i,(k)}$ and $\bar\varepsilon_{R_j}^{(k)}$ are smaller than $\varepsilon$ for $t = R_j+1,\dots,R_{j+1}$, $i = 1,\dots,n_{t-1}^{(k)}$, $k = 1,\dots,m_{R_j}$, and $j = 0,\dots,n$. Then we have
$$|v(\xi) - v(\tilde\xi)| \le L\cdot\varepsilon. \tag{3.37}$$
Proof. For $j = 0,\dots,n$ and $t = R_j+1,\dots,R_{j+1}$ we consider the probabilities
$$\bar p_{R_j}^{(k)} := P\big[\xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big], \qquad k = 1,\dots,m_{R_j}, \tag{3.38}$$
$$p_t^{(i|k)} := P\big[\xi_{[R_j-\tau,t]} \in C_t^{i,(k)} \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big], \qquad i = 1,\dots,n_t^{(k)}. \tag{3.39}$$
Note that for $j \in \{0,\dots,n\}$ and $t \in \{R_j+1,\dots,R_{j+1}\}$, the summand $E\|\xi_t - \pi_t(\xi_{[R_j-\tau,t]})\|$ from (3.36) equals the following sum¹³:
$$\sum_{k=1}^{m_{R_j}} \bar p_{R_j}^{(k)} \sum_{i'=1}^{n_{t-1}^{(k)}} p_{t-1}^{(i'|k)} \cdot E\big[\|\xi_t - \pi_t(\xi_{[R_j-\tau,t]})\| \,\big|\, \xi_{[R_j-\tau,t-1]} \in C_{t-1}^{i',(k)}\big]. \tag{3.40}$$
Due to the definition of $\pi_t$, the conditional expectation equals the value $\varepsilon_t^{i',(k)}$ defined by Algorithm 3.7. Consequently, (3.40) is smaller than $\varepsilon$ by assumption. Thus, the first sum on the right side of (3.36) is bounded from above by $\sum_{j=0}^{n}\sum_{t=R_j+1}^{R_{j+1}} \varepsilon = T\cdot\varepsilon$.

¹³ For the sake of notational simplicity, we set $n_{R_j}^{(k)} := 1$ and $p_{R_j}^{(1|k)} := 1$.
Analogously, we have for $j \in \{1,\dots,n\}$ that
$$E\|\xi_{[R_j-\tau,R_j]} - \bar\pi_{R_j}(\xi_{[R_j-\tau,R_j]})\| = \sum_{k=1}^{m_{R_j}} \bar p_{R_j}^{(k)}\, E\big[\|\xi_{[R_j-\tau,R_j]} - \bar c_{R_j}^{(k)}\| \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big],$$
where the $k$-th conditional expectation on the right side is equal to $\bar\varepsilon_{R_j}^{(k)}$. The second sum on the right side of (3.36) thus does not exceed $n\cdot\varepsilon$, where $n$ denotes the number of recombination time stages $R_j$. Consequently, $\mu(\xi,\tilde\xi) \le (T + 2\bar L n)\,\varepsilon$. The assertion of the corollary then follows from Proposition 3.3.4 by setting $L$ equal to the constant of (3.32) multiplied by $T + 2\bar L n$.

The remaining part of this section is devoted to the proof of Theorem 4.

Proof of Theorem 4. As done in the proof of Corollary 3.3.5, we write the asserted inequality (3.36) in the following form:
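$$\mu(\xi,\tilde\xi) \le \sum_{j=0}^{n}\sum_{k=1}^{m_{R_j}} \bar p_{R_j}^{(k)} \sum_{t=R_j+1}^{R_{j+1}} E\big[\|\xi_t - \pi_t(\xi_{[R_j-\tau,t]})\| \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big] + 2\bar L \sum_{j=1}^{n}\sum_{k=1}^{m_{R_j}} \bar p_{R_j}^{(k)}\, E\big[\|\xi_{[R_j-\tau,R_j]} - \bar c_{R_j}^{(k)}\| \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big].$$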
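We will prove this inequality by recursion. To this end, let us consider for $l = -1,\dots,n$ the following estimate:
$$\begin{aligned} \mu(\xi,\tilde\xi) \le{}& \sum_{j=0}^{l}\sum_{k=1}^{m_{R_j}} \bar p_{R_j}^{(k)}\, E\Big[\sum_{t=R_j+1}^{R_{j+1}} \|\xi_t - \pi_t(\xi_{[R_j-\tau,t]})\| \,\Big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\Big] && \text{(3.41)}\\ &+ 2\bar L \sum_{j=1}^{\min\{l+1,n\}}\sum_{k=1}^{m_{R_j}} \bar p_{R_j}^{(k)}\, E\big[\|\xi_{[R_j-\tau,R_j]} - \bar c_{R_j}^{(k)}\| \,\big|\, \xi_{[R_j-\tau,R_j]} \in \bar C_{R_j}^{(k)}\big] && \text{(3.42)}\\ &+ \sum_{k=1}^{m_{R_{l+1}}} \bar p_{R_{l+1}}^{(k)} \inf_{f \text{ with } (3.44),(3.45)} \sum_{t=R_{l+1}+1}^{T} E\big[\|f_t(\xi_{[R_{l+1}-\tau,t]}) - \xi_t\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\big] && \text{(3.43)} \end{aligned}$$
with the conditions
$$f = (f_t)_{t=R_{l+1}+1}^{T},\qquad f_t : \operatorname{supp} P[\xi_{[R_{l+1}-\tau,t]} \in \cdot\,] \to \operatorname{supp} P[\xi_t \in \cdot\,] \text{ measurable} \tag{3.44}$$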
and
$$P\big[(f_t(\xi_{[R_{l+1}-\tau,t]}))_{t=R_{l+1}+1}^{T} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\big] = P\big[\tilde\xi_{[R_{l+1}+1,T]} \in \cdot \,\big|\, \tilde\xi_{[R_l+1,R_{l+1}]} = c_{[R_l+1,R_{l+1}]}^{\tilde i,(\tilde k)}\big]. \tag{3.45}$$
Thereby, $c_{[R_l+1,R_{l+1}]}^{\tilde i,(\tilde k)}$ denotes an arbitrary node of the scenario tree at time $R_{l+1}$ that obtains the $k$-th subtree for the subsequent time period $[R_{l+1}+1, R_{l+2}]$, i.e., we have $\lambda(\tilde i,\tilde k) = k$.

For $l = -1$, the right-hand side of inequality (3.41) consists only of the term (3.43). By the definition of $\mu(\xi,\tilde\xi)$, (3.41) then holds true as an equality. For $l = n$, inequality (3.41) is just the assertion of Theorem 4. Let us now assume that inequality (3.41) holds true for some $l \in \{-1,\dots,n-1\}$. We have to show that it is also true for $l+1$. To this end, we consider the $k$-th infimum in (3.43) for some $k \in \{1,\dots,m_{R_{l+1}}\}$, i.e.,
$$\inf_{f \text{ with } (3.44),(3.45)} \sum_{t=R_{l+1}+1}^{T} E\big[\|f_t(\xi_{[R_{l+1}-\tau,t]}) - \xi_t\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\big]. \tag{3.46}$$
The infimum in (3.46) increases if we consider only those mappings $f = (f_{R_{l+1}+1},\dots,f_T)$ that coincide until time $R_{l+2}$ with the (conditional) projection $\pi_t$ defined by (3.35). We thus have the following upper bound for (3.46):
$$\begin{aligned} &\sum_{t=R_{l+1}+1}^{R_{l+2}} E\big[\|\xi_t - \pi_t(\xi_{[R_{l+1}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\big]\\ &+ \inf\Big\{\sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - f_t(\xi_{[R_{l+1}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\big] : f \text{ with } (3.44),(3.45),\ f_t = \pi_t \text{ for } t = R_{l+1}+1,\dots,R_{l+2}\Big\}. \end{aligned} \tag{3.47}$$
If $l = n-1$, the infimum vanishes and the assertion of the theorem is proved. Let us now assume that $l \le n-2$ and focus on the infimum in (3.47). The construction of the $k$-th subtree for the time period $[R_{l+1}+1, R_{l+2}]$ yields
$$\big\{\xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\big\} = \bigcup_{i=1}^{n_{R_{l+2}}^{(k)}} \big\{\xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big\}.$$
Therefore, we can write the infimum in (3.47) as follows:
$$\inf\Big\{\sum_{i=1}^{n_{R_{l+2}}^{(k)}} p_{R_{l+2}}^{(i|k)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - f_t(\xi_{[R_{l+1}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] : f \text{ with } (3.44),(3.45),\ f_t = \pi_t \text{ for } t = R_{l+1}+1,\dots,R_{l+2}\Big\},$$
with $p_{R_{l+2}}^{(i|k)}$ defined by (3.39). Note that the mappings $f_t$, $t \ge R_{l+2}+1$, are allowed to depend on $\xi_{[R_{l+1}-\tau,R_{l+2}]}$. The infimum increases if we consider only those mappings $f_t$, $t \ge R_{l+2}+1$, which are of the form
$$f_t(\xi_{[R_{l+1}-\tau,t]}) = \sum_{i=1}^{n_{R_{l+2}}^{(k)}} \mathbb{1}_{C_{R_{l+2}}^{i,(k)}}(\xi_{[R_{l+1}-\tau,R_{l+2}]}) \cdot f_t^i(\xi_{[R_{l+2}-\tau,t]}) \tag{3.48}$$
with auxiliary mappings¹⁴
$$f^i = (f_t^i)_{t=R_{l+2}+1}^{T},\qquad f_t^i : \operatorname{supp} P\big[\xi_{[R_{l+2}-\tau,t]} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] \to \operatorname{supp} P[\xi_t \in \cdot\,],\quad f_t^i \text{ measurable}, \tag{3.49}$$
for $i = 1,\dots,n_{R_{l+2}}^{(k)}$. More intuitively, a mapping $f_t$ of the form (3.48) does not depend on the value of $\xi_{[R_{l+1}-\tau,R_{l+2}-\tau-1]}$ as long as the random vector $\xi_{[R_{l+1}-\tau,R_{l+2}]}$ remains in the same set $C_{R_{l+2}}^{i,(k)}$. Note that a mapping $f_t$ given by (3.48) is measurable, i.e., condition (3.44) holds true.

¹⁴ Actually, the mappings $f_t^i$ also depend on the subtree index $(k)$, which is omitted here for the sake of simplicity.

Consequently, the infimum in (3.47) is bounded from above by the following expression:
$$\inf\Big\{\sum_{i=1}^{n_{R_{l+2}}^{(k)}} p_{R_{l+2}}^{(i|k)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - f_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] : f^i \text{ with } (3.49),\ f \text{ defined by (3.51) fulfills (3.45)}\Big\}, \tag{3.50}$$
with $f = (f_t)_{t=R_{l+1}+1}^{T}$,
$$f_t(\xi_{[R_{l+1}-\tau,t]}) = \begin{cases} \pi_t(\xi_{[R_{l+1}-\tau,t]}), & t \le R_{l+2},\\ \sum_{i=1}^{n_{R_{l+2}}^{(k)}} \mathbb{1}_{C_{R_{l+2}}^{i,(k)}}(\xi_{[R_{l+1}-\tau,R_{l+2}]}) \cdot f_t^i(\xi_{[R_{l+2}-\tau,t]}), & t \ge R_{l+2}+1. \end{cases} \tag{3.51}$$
Within the minimization problem (3.50), the mappings $f^i$ are coupled by the identity (3.45). This identity enforces that the compound mapping $f(\xi_{[R_{l+1}-\tau,T]})$ yields the right distribution under the measure $P[\,\cdot \mid \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}]$. However, it turns out that condition (3.45) is fulfilled whenever each random variable $f^i(\xi_{[R_{l+2}-\tau,T]})$, $i = 1,\dots,n_{R_{l+2}}^{(k)}$, has the “right” distribution under the measure $P[\,\cdot \mid \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}]$. Indeed, let us consider a tuple of mappings $(f_t^i)_{t=R_{l+2}+1}^{T}$ with (3.49) that fulfill the condition¹⁵
$$P\big[(f_t^i(\xi_{[R_{l+2}-\tau,t]}))_{t=R_{l+2}+1}^{T} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] = P\big[\tilde\xi_{[R_{l+2}+1,T]} \in \cdot \,\big|\, \tilde\xi_{[R_l+1,R_{l+1}]} = c_{[R_l+1,R_{l+1}]}^{\tilde i,(\tilde k)},\ \tilde\xi_{[R_{l+1}+1,R_{l+2}]} = c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)}\big]. \tag{3.52}$$
It is shown by Lemma 3.3.6 below that the corresponding mapping $f$ defined by (3.51) then indeed fulfills (3.45). Thus, the infimum (3.50) increases if the condition (3.45) on $f$ is replaced by the condition (3.52) on every $f^i$. Furthermore, this allows us to minimize the summands for $i = 1,\dots,n_{R_{l+2}}^{(k)}$ independently. Consequently, the term (3.50) does not exceed
$$\sum_{i=1}^{n_{R_{l+2}}^{(k)}} p_{R_{l+2}}^{(i|k)} \inf_{f^i \text{ with } (3.49),(3.52)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - f_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]. \tag{3.53}$$
Having in mind the definition (3.33) of the distance $\mu$, the $i$-th infimum in (3.53) can be seen as a measure of distance between two distributions. The first is $P[\xi_{[R_{l+2}+1,T]} \in \cdot \mid \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}]$; the second is the (subtree) distribution on the right side of (3.52). Let us denote this distance by
$$\mu_{R_{l+2}}\big((\xi, C_{R_{l+2}}^{i,(k)}), (\tilde\xi, c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)})\big) := \inf_{f^i \text{ with } (3.49),(3.52)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - f_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]. \tag{3.54}$$

¹⁵ Recall that $c_{[R_l+1,R_{l+1}]}^{\tilde i,(\tilde k)}$ denotes an arbitrary node of the tree at time $R_{l+1}$ that obtains the $k$-th subtree for the subsequent time period $[R_{l+1}+1, R_{l+2}]$, i.e., we have $\lambda(\tilde i,\tilde k) = k$.

We have estimated the infimum (3.46) by the term (3.47) by employing the projection mappings $\pi_t$. This idea does not work for estimating (3.54), since in general the random variable $\pi(\xi)$ does not have the right distribution under the measure $P[\,\cdot \mid \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}]$ appearing in (3.54). Loosely speaking, the idea is now to establish the following triangle inequality¹⁶:
$$\mu_{R_{l+2}}\big((\xi, C_{R_{l+2}}^{i,(k)}), (\tilde\xi, c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)})\big) \le \mu_{R_{l+2}}\big((\xi, C_{R_{l+2}}^{i,(k)}), (\xi, \bar C_{R_{l+2}}^{(k')})\big) + \mu_{R_{l+2}}\big((\xi, \bar C_{R_{l+2}}^{(k')}), (\tilde\xi, c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)})\big), \tag{3.55}$$
with $k' := \lambda(i,k)$. Then we will exploit the continuous dynamics of $\xi$ (condition (ii) of Assumption 3.3.2) to estimate the first term on the right side in terms of the distance of the sets $\bar C_{R_{l+2}}^{(k')}$ and $C_{R_{l+2}}^{i,(k)}$. The second term can be bounded again by applying the projection mappings $\pi_t$. In order to establish an estimate of type (3.55), let us consider mappings¹⁷
$$k^i = (k_t^i)_{t=R_{l+2}+1}^{T}, \tag{3.56}$$
$$k_t^i : \operatorname{supp} P\big[\xi_{[R_{l+2}-\tau,t]} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] \to \operatorname{supp} P[\xi_t \in \cdot\,] \text{ measurable}, \tag{3.57}$$
$$P\big[(k_t^i(\xi_{[R_{l+2}-\tau,t]}))_{t=R_{l+2}+1}^{T} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] = P\big[\xi_{[R_{l+2}+1,T]} \in \cdot \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big]. \tag{3.58}$$
The existence of such mappings $k^i$ is verified by Lemma 3.3.7.

¹⁶ The inequality (3.55) is stated only for the sake of comprehensibility; the terms on the right side are not properly defined.
¹⁷ Note that the mappings $k_t^i$ also depend on the subtree index $(k)$, which is omitted here.

Then the right side of (3.54) is bounded from above by
$$\inf_{k^i \text{ with } (3.57),(3.58)}\ \inf_{f^i \text{ with } (3.49),(3.52)} \Big(\sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - k_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + \sum_{t=R_{l+2}+1}^{T} E\big[\|k_t^i(\xi_{[R_{l+2}-\tau,t]}) - f_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\Big),$$
which can be written as
$$\inf_{k^i \text{ with } (3.57),(3.58)} \Big\{\sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - k_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + \inf_{f^i \text{ with } (3.49),(3.52)} \sum_{t=R_{l+2}+1}^{T} E\big[\|k_t^i(\xi_{[R_{l+2}-\tau,t]}) - f_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\Big\}. \tag{3.59}$$
In order to estimate the inner infimum in (3.59), we assume¹⁸ that there exist mappings $h^{(k')}$ with
$$h_t^{(k')} : \operatorname{supp} P[\xi_{[R_{l+2}+1,t]} \in \cdot\,] \to \operatorname{supp} P[\xi_t \in \cdot\,] \text{ measurable},\qquad t = R_{l+2}+1,\dots,T, \tag{3.60}$$
$$P\big[(h_t^{(k')}(\xi_{[R_{l+2}+1,t]}))_{t=R_{l+2}+1}^{T} \in \cdot \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big] = P\big[\tilde\xi_{[R_{l+2}+1,T]} \in \cdot \,\big|\, \tilde\xi_{[R_{l+1}+1,R_{l+2}]} = c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)}\big]. \tag{3.61}$$

¹⁸ If mappings $h^{(k')}$ with (3.60) and (3.61) do not exist, the infimum over these $h^{(k')}$ is set to $+\infty$ and the subsequent estimates trivially hold. Note that $h^{(k')}$ does not depend on the parameters $i, k$ appearing on the right side of (3.61), since the distribution on the right side of (3.61) is the same for all $i, k$ with $\lambda(i,k) = k'$.

Then the compound mapping $f^i := h^{(k')} \circ k^i$, given by
$$f_t^i(\xi_{[R_{l+2}-\tau,t]}) := h_t^{(k')}\big((k_{t'}^i(\xi_{[R_{l+2}-\tau,t']}))_{t'=R_{l+2}+1}^{t}\big) \tag{3.62}$$
for $t = R_{l+2}+1,\dots,T$, fulfills (3.49) and (3.52). Hence, taking into account only those mappings $f^i$ that allow a representation (3.62), we obtain that the inner infimum in (3.59) is bounded from above by
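$$\begin{aligned} &\inf_{h^{(k')} \text{ with } (3.60),(3.61)} \sum_{t=R_{l+2}+1}^{T} E\Big[\big\|k_t^i(\xi_{[R_{l+2}-\tau,t]}) - h_t^{(k')}\big((k_{t'}^i(\xi_{[R_{l+2}-\tau,t']}))_{t'=R_{l+2}+1}^{t}\big)\big\| \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\Big]\\ &\quad= \inf_{h^{(k')} \text{ with } (3.60),(3.61)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - h_t^{(k')}(\xi_{[R_{l+2}+1,t]})\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big], \end{aligned}$$
where the last identity follows from (3.58). In particular, the last term does not depend on the choice of $k^i$ in the outer minimization problem (3.59). Piecing all this together, we obtain that the term (3.54) is bounded from above by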
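$$\inf_{k^i \text{ with } (3.57),(3.58)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - k_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + \inf_{h^{(k')} \text{ with } (3.60),(3.61)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - h_t^{(k')}(\xi_{[R_{l+2}+1,t]})\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big]. \tag{3.63}$$
Recalling the conditions (3.58) and (3.61), these terms can indeed be seen as the $\mu$-distances on the right side of (3.55). Furthermore, by Lemma 3.3.7, there exist mappings $k_t^i$ such that the sum (3.63) is bounded from above by
$$\begin{aligned} &\bar L\Big(E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big]\Big)\\ &+ \inf_{h^{(k')} \text{ with } (3.60),(3.61)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - h_t^{(k')}(\xi_{[R_{l+2}+1,t]})\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big]. \end{aligned}$$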
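Collecting now the summands appearing in (3.47) and (3.53), we obtain the following upper bound for the term (3.43):
$$\begin{aligned} &\sum_{k=1}^{m_{R_{l+1}}} \bar p_{R_{l+1}}^{(k)} \sum_{t=R_{l+1}+1}^{R_{l+2}} E\big[\|\xi_t - \pi_t(\xi_{[R_{l+1}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\big] && \text{(3.64)}\\ &+ \sum_{k=1}^{m_{R_{l+1}}} \bar p_{R_{l+1}}^{(k)} \sum_{i=1}^{n_{R_{l+2}}^{(k)}} p_{R_{l+2}}^{(i|k)}\, \bar L\Big(E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(\lambda(i,k))}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\\ &\qquad\qquad + E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(\lambda(i,k))}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(\lambda(i,k))}\big]\Big) && \text{(3.65)}\\ &+ \sum_{k=1}^{m_{R_{l+1}}} \bar p_{R_{l+1}}^{(k)} \sum_{i=1}^{n_{R_{l+2}}^{(k)}} p_{R_{l+2}}^{(i|k)} \inf_{h^{(\lambda(i,k))} \text{ with } (3.60),(3.61)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - h_t^{(\lambda(i,k))}(\xi_{[R_{l+2}+1,t]})\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(\lambda(i,k))}\big] && \text{(3.66)} \end{aligned}$$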
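With this at hand, we can now establish the estimate (3.41) for $l+1$. To this end, we note that (3.64) equals the increment of the first sum in (3.41) when the upper index is increased from $l$ to $l+1$. The term (3.65) can be written as
$$\sum_{k'=1}^{m_{R_{l+2}}}\sum_{k=1}^{m_{R_{l+1}}} \bar p_{R_{l+1}}^{(k)} \sum_{\substack{i=1\\ \lambda(i,k)=k'}}^{n_{R_{l+2}}^{(k)}} p_{R_{l+2}}^{(i|k)}\, \bar L\Big(E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big]\Big).$$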
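Applying now the identities
$$\big\{\xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big\} = \bigcup_{k=1}^{m_{R_{l+1}}}\ \bigcup_{\substack{i=1\\ \lambda(i,k)=k'}}^{n_{R_{l+2}}^{(k)}} \big\{\xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big\} \quad\text{and}\quad \bar p_{R_{l+2}}^{(k')} = \sum_{k=1}^{m_{R_{l+1}}} \bar p_{R_{l+1}}^{(k)} \sum_{\substack{i=1\\ \lambda(i,k)=k'}}^{n_{R_{l+2}}^{(k)}} p_{R_{l+2}}^{(i|k)}, \tag{3.67}$$
we see that (3.65) coincides with
$$\sum_{k'=1}^{m_{R_{l+2}}} \bar p_{R_{l+2}}^{(k')}\, 2\bar L\, E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big]. \tag{3.68}$$
Thus, (3.65) is equal to the increment of (3.42) when replacing $l$ by $l+1$. Finally, we proceed as from (3.65) to (3.68) to show that the term (3.66) equals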
$$\sum_{k'=1}^{m_{R_{l+2}}} \bar p_{R_{l+2}}^{(k')} \inf_{h^{(k')} \text{ with } (3.60),(3.61)} \sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - h_t^{(k')}(\xi_{[R_{l+2}+1,t]})\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big].$$
The latter term corresponds to (3.43) with $l$ replaced by $l+1$. Consequently, the estimate (3.41) holds true for $l+1$ and, recursively, for $l = n$. This completes the proof of Theorem 4.
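The step from (3.65) to (3.68) uses that the node cells mapped to the same cluster $k'$ make up $\bar C_{R_{l+2}}^{(k')}$. As a consequence, the weighted node-level conditional expectations add up to the cluster-level one. The following hypothetical Python helper performs this aggregation for given numbers. It assumes, as in (3.67), that the probability of a node cell equals $\bar p_{R_{l+1}}^{(k)} p_{R_{l+2}}^{(i|k)}$ and that the node cells of one cluster are disjoint.

```python
from collections import defaultdict


def aggregate_clusters(nodes):
    """Aggregate node-level data into cluster-level data in the spirit of (3.67).

    nodes: iterable of tuples (p_bar_k, p_i_given_k, lam, cond_err), one per node (i, k):
        p_bar_k = P[cluster k at R_{l+1}], p_i_given_k = p^{(i|k)}, lam = lambda(i, k),
        cond_err = E[||window - c-bar^{(lam)}|| | node cell].
    Returns {k': (p_bar^{(k')}, E[||window - c-bar^{(k')}|| | cluster k'])}.
    """
    mass = defaultdict(float)
    weighted = defaultdict(float)
    for p_bar_k, p_i_given_k, lam, cond_err in nodes:
        w = p_bar_k * p_i_given_k  # probability of the node cell
        mass[lam] += w
        weighted[lam] += w * cond_err
    # law of total expectation over the disjoint node cells of each cluster
    return {k: (mass[k], weighted[k] / mass[k]) for k in mass if mass[k] > 0}
```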
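Lemma 3.3.6. Consider a tuple of mappings $f^i$, $i = 1,\dots,n_{R_{l+2}}^{(k)}$, with (3.49) and the corresponding compound mapping $f$ defined by (3.51). If $f^i$ fulfills (3.52) for $i = 1,\dots,n_{R_{l+2}}^{(k)}$, then $f$ fulfills (3.45).

Proof. Since the tree process $\tilde\xi$ follows a discrete distribution, we consider some $c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)}$, $i \in \{1,\dots,n_{R_{l+2}}^{(k)}\}$, and a vector $b \in \operatorname{supp} P[\tilde\xi_{[R_{l+2}+1,T]} \in \cdot\,]$. Using identity (3.51), we write
$$\begin{aligned} &P\Big[\big(f_t(\xi_{[R_{l+1}-\tau,t]})\big)_{t=R_{l+1}+1}^{T} = \big(c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)}, b\big) \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\Big]\\ &\quad= P\Big[\big(\pi_t(\xi_{[R_{l+1}-\tau,t]})\big)_{t=R_{l+1}+1}^{R_{l+2}} = c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)},\ \big(f_t(\xi_{[R_{l+1}-\tau,t]})\big)_{t=R_{l+2}+1}^{T} = b \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\Big]. \end{aligned}$$
By the definition (3.35) of $\pi_t$ and the definition (3.51) of $f$, the latter term equals
$$P\Big[\xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)},\ \big(f_t^i(\xi_{[R_{l+2}-\tau,t]})\big)_{t=R_{l+2}+1}^{T} = b \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\Big].$$
By the definition of conditional probabilities, this can be written in the following form:
$$P\Big[\big(f_t^i(\xi_{[R_{l+2}-\tau,t]})\big)_{t=R_{l+2}+1}^{T} = b \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)},\ \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\Big] \cdot P\Big[\xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)} \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\Big].$$
Recall that $\{\xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\} \subset \{\xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\}$. Hence, the product simplifies to
$$P\Big[\big(f_t^i(\xi_{[R_{l+2}-\tau,t]})\big)_{t=R_{l+2}+1}^{T} = b \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\Big] \cdot P\Big[\xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)} \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+1}]} \in \bar C_{R_{l+1}}^{(k)}\Big].$$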
Now, we apply the identity (3.52) to the first factor and the definition of the transition probabilities of $\tilde\xi$ to the second one. This allows us to write the product as
$$P\big[\tilde\xi_{[R_{l+2}+1,T]} = b \,\big|\, \tilde\xi_{[R_l+1,R_{l+1}]} = c_{[R_l+1,R_{l+1}]}^{\tilde i,(\tilde k)},\ \tilde\xi_{[R_{l+1}+1,R_{l+2}]} = c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)}\big] \cdot P\big[\tilde\xi_{[R_{l+1}+1,R_{l+2}]} = c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)} \,\big|\, \tilde\xi_{[R_l+1,R_{l+1}]} = c_{[R_l+1,R_{l+1}]}^{\tilde i,(\tilde k)}\big]$$
for some $\tilde i, \tilde k$ with $\lambda(\tilde i,\tilde k) = k$. Applying again the definition of the conditional probabilities, this product coincides with
$$P\big[\tilde\xi_{[R_{l+1}+1,R_{l+2}]} = c_{[R_{l+1}+1,R_{l+2}]}^{i,(k)},\ \tilde\xi_{[R_{l+2}+1,T]} = b \,\big|\, \tilde\xi_{[R_l+1,R_{l+1}]} = c_{[R_l+1,R_{l+1}]}^{\tilde i,(\tilde k)}\big].$$
Thus, the mapping $f$ that is defined by (3.51) indeed fulfills the condition (3.45).

Lemma 3.3.7. Assume that conditions (ii) and (iii) of Assumption 3.3.2 hold true. Then, for $l = 0,\dots,n-2$, $k = 1,\dots,m_{R_{l+1}}$, and $i = 1,\dots,n_{R_{l+2}}^{(k)}$, there exists a tuple of mappings $k^i = (k_t^i)_{t=R_{l+2}+1}^{T}$ with (3.57) and (3.58) such that the estimate
$$\sum_{t=R_{l+2}+1}^{T} E\big[\|\xi_t - k_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] \le \bar L\Big(E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big]\Big) \tag{3.69}$$
holds true with $k' := \lambda(i,k)$ and some constant $\bar L \ge 0$.

Proof. Let us start by establishing the existence of a measurable mapping¹⁹
$$k_{R_{l+2}}^i : \operatorname{supp} P\big[\xi_{[R_{l+2}-\tau,R_{l+2}]} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] \to \bar C_{R_{l+2}}^{(k')} \tag{3.70}$$
such that the identity
$$P\big[k_{R_{l+2}}^i(\xi_{[R_{l+2}-\tau,R_{l+2}]}) \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] = P\big[\xi_{[R_{l+2}-\tau,R_{l+2}]} \in \cdot \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big] \tag{3.71}$$
holds true (identity (3.71) can be seen as a preliminary formulation of (3.58)), and such that the term
$$E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - k_{R_{l+2}}^i(\xi_{[R_{l+2}-\tau,R_{l+2}]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]$$
can be estimated.

¹⁹ Recall that the mapping $k_{R_{l+2}}^i$ also depends on the subtree index $(k)$, which is omitted here.

To this end, we consider some $\varepsilon > 0$ and the line segment
$$I_\varepsilon := \big[\bar c_{R_{l+2}}^{(k')} - \varepsilon e_1,\ \bar c_{R_{l+2}}^{(k')} + \varepsilon e_1\big],$$
with $e_1$ denoting the first unit vector in $\mathbb{R}^{s\cdot(\tau+1)}$. Condition (iii) of Assumption 3.3.2 yields that the measure $P[\xi_{[R_{l+2}-\tau,R_{l+2}]} \in \cdot \mid \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}]$ has no atoms. Thus, this measure is isomorphic to the Lebesgue measure on the unit interval $[0,1]$, see, e.g., Theorem 9 in Chapter 14 of Royden (1963). Therefore, we can find a measurable mapping
$$k_{R_{l+2}}^{i,(1)} : \operatorname{supp} P\big[\xi_{[R_{l+2}-\tau,R_{l+2}]} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] \to I_\varepsilon$$
such that the measure $P[k_{R_{l+2}}^{i,(1)}(\xi_{[R_{l+2}-\tau,R_{l+2}]}) \in \cdot \mid \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}]$ coincides with the normalized (one-dimensional) Lebesgue measure $\lambda_\varepsilon$ on $I_\varepsilon$. Furthermore, the triangle inequality implies that
$$E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - k_{R_{l+2}}^{i,(1)}(\xi_{[R_{l+2}-\tau,R_{l+2}]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] \le E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + \varepsilon. \tag{3.72}$$
Applying again the result of Royden (1963), we can find a measurable mapping $k_{R_{l+2}}^{i,(2)} : I_\varepsilon \to \bar C_{R_{l+2}}^{(k')}$ such that the measure $\lambda_\varepsilon(\{z \in I_\varepsilon : k_{R_{l+2}}^{i,(2)}(z) \in \cdot\,\})$ on $\bar C_{R_{l+2}}^{(k')}$ coincides with $P[\xi_{[R_{l+2}-\tau,R_{l+2}]} \in \cdot \mid \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}]$. As in (3.72), we obtain that
$$\int_{I_\varepsilon} \|z - k_{R_{l+2}}^{i,(2)}(z)\|\,\lambda_\varepsilon(dz) \le E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big] + \varepsilon. \tag{3.73}$$
Setting now $k_{R_{l+2}}^i := k_{R_{l+2}}^{i,(2)} \circ k_{R_{l+2}}^{i,(1)}$ entails the identity (3.71) as well as the following estimate:
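$$\begin{aligned} &E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - k_{R_{l+2}}^i(\xi_{[R_{l+2}-\tau,R_{l+2}]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\\ &\quad\le E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - k_{R_{l+2}}^{i,(1)}(\xi_{[R_{l+2}-\tau,R_{l+2}]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\\ &\qquad+ E\big[\|k_{R_{l+2}}^{i,(1)}(\xi_{[R_{l+2}-\tau,R_{l+2}]}) - k_{R_{l+2}}^{i,(2)}\big(k_{R_{l+2}}^{i,(1)}(\xi_{[R_{l+2}-\tau,R_{l+2}]})\big)\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\\ &\quad\le E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + \varepsilon + \int_{I_\varepsilon} \|z - k_{R_{l+2}}^{i,(2)}(z)\|\,\lambda_\varepsilon(dz)\\ &\quad\le E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] + E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big] + 2\varepsilon. \end{aligned} \tag{3.74}$$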
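On the real line, the two measure-preserving maps $k^{i,(1)}_{R_{l+2}}$ and $k^{i,(2)}_{R_{l+2}}$ can be realized by distribution functions and quantile functions, so that their composition is the monotone map $F_{\text{target}}^{-1} \circ F_{\text{source}}$. The following Python sketch shows the empirical analogue for two equally sized samples. It is only a one-dimensional illustration of the coupling idea and does not replace the general isomorphism argument of Royden (1963) used in the proof. The function name is hypothetical.

```python
import numpy as np


def quantile_coupling(source, target):
    """Empirical monotone transport between two equally weighted 1-D samples.

    Maps the r-th smallest source value to the r-th smallest target value, so the image
    of the source sample carries exactly the empirical target distribution. For atomless
    distributions on the real line, the population analogue is F_target^{-1}(F_source(x)).
    """
    source = np.asarray(source, dtype=float)
    target = np.sort(np.asarray(target, dtype=float))
    if source.shape != target.shape:
        raise ValueError("both samples must have the same size")
    ranks = np.argsort(np.argsort(source, kind="stable"), kind="stable")  # rank of each source value
    return target[ranks]
```

For any point $c$, the triangle inequality bounds the mean displacement of such a coupling by the mean distance of the source sample to $c$ plus that of the target sample to $c$. This mirrors the estimates (3.72) to (3.74) with $\varepsilon = 0$.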
For $t = R_{l+2}+1,\dots,T$, we use the dynamics equation (3.29) of the process $\xi$ to define the mapping $k_t^i$ recursively by setting²⁰
$$k_t^i(\xi_{[R_{l+2}-\tau,t]}) := g\big((k_s^i(\xi_{[R_{l+2}-\tau,s]}))_{s=R_{l+2}}^{t-1}, \varepsilon_t\big). \tag{3.75}$$
Then the measurability condition (3.57) holds. To verify (3.58), it is sufficient to show that
$$P\big[(k_t^i(\xi_{[R_{l+2}-\tau,t]}))_{t=R_{l+2}}^{t'} \in \cdot \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] = P\big[\xi_{[R_{l+2}-\tau,t']} \in \cdot \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big] \tag{3.76}$$
holds true for $t' = T$. For $t' = R_{l+2}$, identity (3.76) coincides with the condition (3.71) established above. Let us assume that (3.76) is true for some $t' \in \{R_{l+2},\dots,T-1\}$. To verify the asserted property for $t'+1$, we consider an arbitrary rectangle $\prod_{t=R_{l+2}-\tau}^{t'+1}[a_t, b_t]$ (with $a_t \le b_t \in \mathbb{R}^s$) and use the definition (3.75) to obtain
$$\begin{aligned} &P\Big[(k_t^i(\xi_{[R_{l+2}-\tau,t]}))_{t=R_{l+2}}^{t'+1} \in \prod_{t=R_{l+2}-\tau}^{t'+1}[a_t,b_t] \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\Big]\\ &\quad= P\Big[(k_t^i(\xi_{[R_{l+2}-\tau,t]}))_{t=R_{l+2}}^{t'} \in \prod_{t=R_{l+2}-\tau}^{t'}[a_t,b_t],\ g\big((k_t^i(\xi_{[R_{l+2}-\tau,t]}))_{t=R_{l+2}}^{t'}, \varepsilon_{t'+1}\big) \in [a_{t'+1}, b_{t'+1}] \,\Big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\Big]. \end{aligned}$$
By assumption, (3.76) holds for $t'$, and by condition (ii) of Assumption 3.3.2, $\varepsilon_{t'+1}$ is independent of $\xi_{[1,t']}$. Hence we can write the latter conditional probability as
$$P\Big[\xi_{[R_{l+2}-\tau,t']} \in \prod_{t=R_{l+2}-\tau}^{t'}[a_t,b_t],\ g(\xi_{[R_{l+2}-\tau,t']}, \varepsilon_{t'+1}) \in [a_{t'+1}, b_{t'+1}] \,\Big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\Big].$$
Using again the definition (3.29) of $\xi_{t'+1}$, we conclude that (3.76) is also true for $t'+1$. Consequently, $k^i$ fulfills (3.58). In order to establish the estimate (3.69), we consider some $t \in \{R_{l+2}+1,\dots,T\}$ and use the dynamics equation (3.29) and the definition (3.75) of $k_t^i$ to write
$$\|\xi_t - k_t^i(\xi_{[R_{l+2}-\tau,t]})\| = \big\|g(\xi_{[t-1-\tau,t-1]}, \varepsilon_t) - g\big((k_s^i(\xi_{[R_{l+2}-\tau,s]}))_{s=t-1-\tau}^{t-1}, \varepsilon_t\big)\big\| \le L\sum_{s=t-1-\tau}^{t-1} \|\xi_s - k_s^i(\xi_{[R_{l+2}-\tau,s]})\|.$$
Then it follows by recursion that we can find some constant $\tilde L \ge 0$ such that
$$\begin{aligned} E\big[\|\xi_t - k_t^i(\xi_{[R_{l+2}-\tau,t]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big] &\le \tilde L\, E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - k_{R_{l+2}}^i(\xi_{[R_{l+2}-\tau,R_{l+2}]})\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\\ &\le \tilde L\Big(E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+1}-\tau,R_{l+2}]} \in C_{R_{l+2}}^{i,(k)}\big]\\ &\qquad + E\big[\|\xi_{[R_{l+2}-\tau,R_{l+2}]} - \bar c_{R_{l+2}}^{(k')}\| \,\big|\, \xi_{[R_{l+2}-\tau,R_{l+2}]} \in \bar C_{R_{l+2}}^{(k')}\big] + 2\varepsilon\Big), \end{aligned}$$
where the second inequality follows from the estimate (3.74). Summing over $t = R_{l+2}+1,\dots,T$ and letting $\varepsilon \to 0$ completes the proof of the lemma.

²⁰ Actually, the lower index '$s = R_{l+2}$' in (3.75) should be '$s = \max\{R_{l+2}, t-1-\tau\}$'. The present notation is used for notational simplicity.
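A concrete example of dynamics satisfying condition (ii) of Assumption 3.3.2 (with $\tau = 0$ and $s = 1$) is $g(x,\varepsilon) = \varepsilon^{1+x}$ with $x \in [0,1]$ and $\varepsilon$ uniform on $[0,1]$. This example is chosen here for illustration and is not taken from the text. Since $|\partial_x \varepsilon^{1+x}| = \varepsilon^{1+x}|\ln\varepsilon| \le \varepsilon|\ln\varepsilon| \le 1/e$, inequality (3.30) holds with $L = 1/e$, and every set in (3.31) equals $[0,1]$. The Python sketch below drives two copies of this recursion with the same noise, which is how the mappings (3.75) are built. It shows the gap between them contracting at least by the factor $L$ per stage.

```python
import numpy as np

LIPSCHITZ = 1.0 / np.e  # sup over eps in (0, 1], x in [0, 1] of |d/dx eps**(1 + x)|


def g(x, eps):
    """Hypothetical dynamics for (3.29) with tau = 0 and s = 1: xi_{t+1} = eps**(1 + xi_t).

    For x in [0, 1] and eps in [0, 1], g maps into [0, 1], the range {g(x, eps)} is [0, 1]
    for every x (so the sets in (3.31) coincide), and |g(x, e) - g(y, e)| <= |x - y| / e.
    """
    return eps ** (1.0 + x)


def coupled_paths(x0, y0, noise):
    """Drive two copies of the recursion with the same noise, as in the construction (3.75)."""
    xs, ys = [x0], [y0]
    for eps in noise:
        xs.append(g(xs[-1], eps))
        ys.append(g(ys[-1], eps))
    return np.array(xs), np.array(ys)
```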
3.4 Case Study
In order to study the numerical properties of the decomposition approach presented in Section 3.2, the approach has been applied to a power scheduling problem that has been formulated as a linear multistage stochastic program. The term power scheduling refers to the cost-optimal planning of electric power production. Various versions and extensions of this problem have been studied by, e.g., Dentcheva and Römisch (1998), Nowak and Römisch (2000), and Nürnberg and Römisch (2002). The problem is introduced in Section 3.4.1. Section 3.4.2 presents numerical results.
3.4.1 A Power Scheduling Problem
We consider a power generating system consisting of several coal- and gas-fired thermal units, a pumped hydro storage plant, and a wind power plant. The objective is to find a cost-optimal operation plan for the thermal units and the hydro unit that covers the demand, while the production of the wind power plant is uncertain. We denote the thermal power plants by indices $i \in I$ for some index set $I$ and denote the operation level of thermal unit $i \in I$ at time $t$ by $p_{i,t}$, i.e., $p_{i,t}$ indicates how many units of electrical energy are produced by unit $i$ at time stage $t$. The variable $l_t$ represents the fill level of the water storage at time $t$, whereas $v_t$ and $w_t$ denote the operation levels²¹ of the pump and the turbine of the hydro storage plant, respectively. Deterministic parameters of the problem are the lower and upper operation bounds $\underline p_i < \bar p_i$, $i \in I$, of the thermal units, the upper bounds $\bar v > 0$ of the pump and $\bar w > 0$ of the turbine, the capacity $\bar l > 0$ of the water storage, the fill levels $l_{in}$ and $l_{end}$ of the storage at the beginning and at the end of the considered time horizon, the efficiency $\eta$ of the pump, the fuel costs $b_i$ of the thermal units, the power gradient $\delta$ of the thermal units²², the reserve fraction $\beta$, and the (time-dependent) energy demand $d_t$. The wind power production $\kappa_t$ at time $t$ is a stochastic parameter. The power scheduling problem is modeled by the following linear multistage stochastic program.
21 The fill level $l_t$ and the operation levels $v_t$ and $w_t$ are also given in units of electric energy. This is made precise by the relations (3.77), (3.78), and (3.80).
22 The power gradient indicates how much the operation level can be changed over time; see constraint (3.79).
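Chapter 3. Recombining Trees for Multistage Stochastic Programs

$$
\begin{aligned}
\min\ & E\Big[\sum_{t=1}^{T} \sum_{i \in I} b_i\, p_{i,t}\Big] \\
\text{s.t.}\ & l_1 = l_{in} - (w_1 - \eta v_1), \quad l_T \ge l_{end}, \qquad \text{(3.77)} \\
& l_t = l_{t-1} - (w_t - \eta v_t), \quad t = 2, \dots, T, \qquad \text{(3.78)} \\
& |p_{i,t} - p_{i,t-1}| \le \delta(\bar p_i - \underline p_i), \quad i \in I,\ t = 2, \dots, T, \qquad \text{(3.79)} \\
& \sum_{i \in I} p_{i,t} + (w_t - v_t) + \kappa_t \ge d_t, \quad t = 1, \dots, T, \qquad \text{(3.80)} \\
& \sum_{i \in I} p_{i,t} \le \sum_{i \in I} \bar p_i - \beta d_t, \quad t = 1, \dots, T, \qquad \text{(3.81)} \\
& \underline p_i \le p_{i,t} \le \bar p_i,\ 0 \le v_t \le \bar v,\ 0 \le w_t \le \bar w,\ 0 \le l_t \le \bar l, \quad i \in I,\ t = 1, \dots, T.
\end{aligned}
$$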
The initial and final fill levels and the dynamics of the hydro storage are described by the constraints (3.77) and (3.78). The maximal difference of the operation levels of the thermal units between successive time stages is constrained by inequality (3.79). Coverage of the demand is ensured by inequality (3.80). Condition (3.81) is a reserve requirement: it keeps a spare thermal capacity of at least $\beta d_t$ available in order to handle unpredictable demand peaks. The model parameters are given in Table 3.1. For our numerical experiments, we have used an hourly discretization and have varied the considered time horizon between two days (T = 48) and one year (T = 8760).

[Figure 3.2: Left side: hourly electricity demand in MW for one week (abscissa: hours 0 to 168). Right side: wind turbine power curve, showing the electricity production in MW depending on the wind speed in m/s (abscissa: 0 to 30 m/s).]

The stochastic wind speed process has been modeled by an ARMA-GARCH time series model; see Ewing et al. (2004) for a more detailed description of this model. This model has been used to generate a number of wind speed trajectories, which are used to construct wind speed scenario trees by Algorithm 3.5. These trees follow a binary branching scheme, and branching takes place three times a day at
6, 12, and 18 o’clock. Recombination is done once a day at 6 o’clock²³. The forward selection algorithm²⁴ has been applied to approximate the conditional distributions (in Algorithm 3.7) and to carry out the recombination clustering (in Algorithm 3.6). The resulting tree has been transformed into a wind energy scenario tree by means of a wind turbine power curve. Such a power curve²⁵ shows the power output of a wind turbine or wind park at different wind speeds; see also Nørgård et al. (2004). An example is shown in Figure 3.2.
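The transformation of a wind speed tree into a wind energy tree is a nodewise map through the power curve. The sketch below illustrates this under stated assumptions: only the 1,000 MW wind capacity comes from Table 3.1, while the cut-in, rated and cut-out speeds and the cubic shape between them are hypothetical placeholders for the scaled curve of Figure 3.2. Representing the tree as a dictionary from node labels to wind speeds is likewise only for illustration.

```python
def power_curve(speed, rated_mw=1000.0, cut_in=3.0, rated_speed=12.0, cut_out=25.0):
    """Hypothetical wind-park power curve: output in MW for a wind speed in m/s.

    rated_mw matches the wind capacity of Table 3.1; cut_in, rated_speed and
    cut_out are illustrative values, not taken from the text.
    """
    if speed < cut_in or speed >= cut_out:
        return 0.0          # too little wind, or turbines shut down at cut-out speed
    if speed >= rated_speed:
        return rated_mw     # rated output
    # between cut-in and rated speed the output grows with the cube of the speed
    frac = (speed ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)
    return rated_mw * frac


def to_energy_tree(speed_tree):
    """Apply the power curve node by node; the tree structure is unchanged."""
    return {node: power_curve(speed) for node, speed in speed_tree.items()}


wind_tree = {"root": 7.5, "up": 13.0, "down": 2.0}
print(to_energy_tree(wind_tree))
```

Since the map acts on node values only, the branching and recombination structure of the tree is left untouched.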
Symbol        Parameter                      Value
b₁            fuel cost coal                 21
b₂            fuel cost gas & steam          48
b₃            fuel cost gas                  154
p̄₁            capacity coal                  1,000
p̄₂            capacity gas & steam           500
p̄₃            capacity gas                   500
k̄             capacity wind                  1,000
v̄             capacity hydro pump            2,000
w̄             capacity hydro turbine         2,000
l̄             capacity hydro storage         12,000
l_in, l_end   initial/final storage level    6,000
η             pump efficiency                0.75
δ             power gradient                 0.5
β             reserve fraction               0.1

Table 3.1: Parameters of the power scheduling model in Chapter 3.
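To make the constraints (3.77) to (3.81) concrete, the following sketch checks a given operation plan along a single wind scenario and evaluates its fuel costs with the parameters of Table 3.1. It is an illustration, not the solution method used here: the lower bounds of the thermal units are not given in Table 3.1 and are assumed to be zero, and the function names are hypothetical.

```python
# Parameters of Table 3.1. The lower bounds P_MIN are NOT listed there; 0 is an assumption.
B = [21.0, 48.0, 154.0]           # fuel costs b_i (coal, gas & steam, gas)
P_MAX = [1000.0, 500.0, 500.0]    # thermal capacities
P_MIN = [0.0, 0.0, 0.0]           # hypothetical lower operation bounds
V_MAX, W_MAX = 2000.0, 2000.0     # pump and turbine capacity
L_MAX, L_IN, L_END = 12000.0, 6000.0, 6000.0
ETA, DELTA, BETA = 0.75, 0.5, 0.1


def cost(p):
    """Objective along one scenario: sum over t and i of b_i * p[i][t]."""
    return sum(b * sum(row) for b, row in zip(B, p))


def violations(p, v, w, kappa, d, tol=1e-9):
    """Constraints (3.77)-(3.81) and bounds violated by a plan along one wind scenario.

    p[i][t], v[t], w[t], kappa[t], d[t] use 0-based t; reported stages are 1-based.
    """
    T = len(d)
    bad = []
    level = L_IN
    for t in range(T):
        level -= w[t] - ETA * v[t]                      # (3.77) for t = 1, (3.78) after
        if not -tol <= level <= L_MAX + tol:
            bad.append(("storage bounds", t + 1))
        if not (-tol <= v[t] <= V_MAX + tol and -tol <= w[t] <= W_MAX + tol):
            bad.append(("pump/turbine bounds", t + 1))
        thermal = sum(p[i][t] for i in range(len(B)))
        for i in range(len(B)):
            if not P_MIN[i] - tol <= p[i][t] <= P_MAX[i] + tol:
                bad.append(("thermal bounds", t + 1))
            if t > 0 and abs(p[i][t] - p[i][t - 1]) > DELTA * (P_MAX[i] - P_MIN[i]) + tol:
                bad.append(("ramping (3.79)", t + 1))
        if thermal + (w[t] - v[t]) + kappa[t] < d[t] - tol:
            bad.append(("demand (3.80)", t + 1))
        if thermal > sum(P_MAX) - BETA * d[t] + tol:
            bad.append(("reserve (3.81)", t + 1))
    if level < L_END - tol:
        bad.append(("final level (3.77)", T))
    return bad


plan = [[1000.0, 1000.0], [300.0, 300.0], [0.0, 0.0]]
print(violations(plan, v=[0.0, 0.0], w=[0.0, 0.0], kappa=[200.0, 200.0], d=[1500.0, 1500.0]))  # []
print(cost(plan))  # 70800.0
```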
3.4.2 Numerical Results
All algorithms have been implemented in C++, and the linear subproblems have been solved with ILOG CPLEX 10.0. The results presented in the following have been obtained on a Pentium IV with a 3 GHz CPU and 1 GB RAM.
23 The time stage t = 0 represents 6 o’clock of the first day. Thus, with a time horizon of T hours, there are $n = \lfloor (T-1)/24 \rfloor$ recombination time stages $R_j = 24 \cdot j$, $j = 1, \dots, n$.
24 We have employed Algorithm 2 of Dupačová et al. (2003), where we set the distance c(·, ·) equal to the Euclidean distance.
25 The curve shown in Figure 3.2 is a wind turbine power curve that has been scaled to the assumed size of the wind power park. However, a power curve for a complete wind park is in general smoother due to the spatial distribution of the various turbines of the wind park and the local variability of wind speed.
Cut Sharing
We have solved the above power scheduling problem for different time horizons. In order to study the effects of the extensions of the Benders decomposition discussed in Section 3.2, we have first applied a “classical” Nested Benders Decomposition that decomposes the problem into subproblems with a time horizon of 24 h each. Note that the classical Nested Benders Decomposition does not exploit the coincidence of subtrees and subproblems within the recombining tree for cut sharing. We have then applied the decomposition Algorithm 3.2 with cut sharing. For better comparability of the results, we did not use the control space aggregations proposed in Sections 3.2.2 and 3.2.3 here. The resulting running times of both solution algorithms are shown in Table 3.2. The considered time horizons are shown in the first column, where each day consists of 24 time stages. The second column contains the number of different subtrees per time period, i.e., the values of the parameters $m_{R_j}$. The third and fifth columns show the total number of different subproblems²⁶ considered by the decomposition without and with cut sharing, respectively. It can be seen that identifying coinciding subtrees avoids the exponential growth of the number of different subproblems. Consequently, the running time of the solution algorithm can be reduced drastically.

time horizon   m_Rj   w/o cut sharing           with cut sharing
                      # subproblems   time      # subproblems   time
2 days         1      9               10s       2               3s
2 days         2      9               10s       3               4s
2 days         4      9               12s       5               7s
3 days         1      73              91s       3               3s
3 days         2      73              99s       5               5s
3 days         4      73              94s       9               9s
4 days         1      585             762s      4               4s
4 days         2      585             859s      7               6s
4 days         4      585             789s      13              13s
Table 3.2: Number of different subproblems and running times of the Nested Benders Decomposition Algorithm with and without cut sharing.
26 Due to the binary branching scheme of the scenario tree, every subproblem (3.3) consists of 8 scenarios and 132 nodes.
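The subproblem counts in Table 3.2 and the node count in footnote 26 can be reproduced with a few lines of arithmetic. The sketch below assumes that each daily subtree has 2 nodes per stage for stages 1 to 6, 4 for stages 7 to 12 and 8 for stages 13 to 24 (branching at 6, 12 and 18 o'clock), that the classical decomposition needs one subproblem per leaf of each subproblem of the preceding day, and that with cut sharing every recombination stage contributes $m_{R_j}$ distinct subproblems. These closed forms are inferred from the table, not stated in the text, and the function names are hypothetical.

```python
def nodes_per_daily_subtree(segments=((6, 2), (6, 4), (12, 8))):
    """Nodes of one 24-stage subtree; segments are (stages, nodes per stage)."""
    return sum(stages * nodes for stages, nodes in segments)


def recombination_stages(T):
    """R_j = 24 j for j = 1..n with n = floor((T - 1) / 24), cf. footnote 23."""
    return [24 * j for j in range(1, (T - 1) // 24 + 1)]


def subproblems_without_cut_sharing(days, leaves_per_day=8):
    """One subproblem for day 1 plus one per leaf (2**3 = 8 per day) of the previous day."""
    return sum(leaves_per_day ** d for d in range(days))


def subproblems_with_cut_sharing(days, m):
    """One root subproblem plus m distinct subtrees after each recombination stage."""
    return 1 + (days - 1) * m


print(nodes_per_daily_subtree())  # 132
for days in (2, 3, 4):
    print(days, subproblems_without_cut_sharing(days),
          [subproblems_with_cut_sharing(days, m) for m in (1, 2, 4)])
```

The contrast between the geometric sum and the linear expression is exactly the gap between the third and fifth columns of Table 3.2.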
Aggregation of Decision Points
While the cut sharing principle reduces the number of different subproblems, the aggregation of decision points reduces the number of subproblem evaluations. We now study the numerical effects of this aggregation and of the evaluation of upper bounds discussed in Section 3.2.3. In order to ensure the same relative precision for the different components of the decision vector $x_t = (x^1_t, \dots, x^5_t) := (p_{1,t}, p_{2,t}, p_{3,t}, w_t, v_t)$, the aggregation parameter ρ on the right side of (3.12) is chosen relative to the feasible range of the respective decision variable. More precisely, we have replaced the condition (3.12) by $|x^i_t - \hat x^i_t| \le \rho \cdot (\bar x^i - \underline x^i)$, where $\underline x^i$ and $\bar x^i$ are the lower and upper bounds on the decision variable $x^i_t$. Recall that Algorithm 3.2 starts with a large aggregation parameter $\rho = \rho_{start}$ that is successively decreased to $\rho_{end}$, instead of starting directly with $\rho = \rho_{end}$. We have set $\rho_{start} = 0.1$, and $\rho_{end}$ has been varied between $10^{-4}$ and $10^{-2}$. The decrease of ρ in Step 4 of Algorithm 3.2 has been implemented by repeatedly multiplying ρ by the factor 0.3.

                                 w/o upper bounds    with upper bounds
time horizon   m_Rj   ρ_end      rough    end        rough    rough gap   end
2 days         1      10⁻⁴       2s       2s         3s       < 0.01%     3s
2 days         2      10⁻⁴       4s       4s         4s       < 0.01%     4s
2 days         4      10⁻⁴       6s       6s         6s       < 0.01%     6s
1 week         1      10⁻⁴       3s       17s        7s       0.02%       10s
1 week         2      10⁻⁴       7s       17s        13s      0.05%       15s
1 week         4      10⁻⁴       13s      26s        26s      0.03%       34s
2 weeks        1      10⁻³       5s       47s        16s      0.02%       33s
2 weeks        2      10⁻³       11s      711s       27s      0.04%       536s
2 weeks        4      10⁻³       21s      2512s      57s      0.03%       4535s
1 month        1      10⁻³       7s       47s        27s      0.01%       28s
1 month        2      10⁻³       21s      >3h        59s      0.12%       >3h
1 month        4      10⁻³       35s      >3h        151s     0.03%       >3h
3 months       1      10⁻³       19s      >3h        68s      0.05%       507s
3 months       2      10⁻³       60s      >3h        195s     0.09%       829s
3 months       4      10⁻³       60s      >3h        868s     0.09%       >3h
1 year         1      10⁻²       37s      >3h        149s     1.12%       >3h

Table 3.3: Running times for different time horizons with and without the use of upper bounds.
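The scaled aggregation test is a component-wise comparison. In the hedged sketch below, a new decision point is aggregated with the first stored point that passes the test; this first-match rule, the function names and the zero lower bounds are assumptions for illustration, and Algorithm 3.2 may organize its stored decision points differently.

```python
def within_radius(x, x_hat, lower, upper, rho):
    """Scaled condition replacing (3.12): |x^i - xhat^i| <= rho * (upper^i - lower^i) for all i."""
    return all(abs(a - b) <= rho * (hi - lo)
               for a, b, lo, hi in zip(x, x_hat, lower, upper))


def find_representative(x, stored_points, lower, upper, rho):
    """Index of the first stored decision point that x can be aggregated with, else None."""
    for k, x_hat in enumerate(stored_points):
        if within_radius(x, x_hat, lower, upper, rho):
            return k
    return None


# Components (p1, p2, p3, w, v); upper bounds from Table 3.1, lower bounds assumed 0.
LOWER = [0.0, 0.0, 0.0, 0.0, 0.0]
UPPER = [1000.0, 500.0, 500.0, 2000.0, 2000.0]
stored = [[900.0, 0.0, 0.0, 0.0, 0.0], [550.0, 250.0, 0.0, 100.0, 0.0]]
x = [500.0, 250.0, 0.0, 100.0, 0.0]
print(find_representative(x, stored, LOWER, UPPER, rho=0.1))   # 1: aggregated with stored point 1
print(find_representative(x, stored, LOWER, UPPER, rho=0.01))  # None: no stored point is close enough
```

Scaling by the range $\bar x^i - \underline x^i$ lets one ρ serve the 500 MW gas units and the 2,000 MW hydro components alike.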
The running times of the solution algorithm for various choices of parameters are shown in Table 3.3. The two columns headed “rough” show the running time of the first pass²⁷ of Algorithm 3.2 with $\rho = \rho_{start}$, whereas the “end” columns show the total time until ρ has reached its final level $\rho_{end}$. It can be seen that the rough phase is completed very quickly. Moreover, the time needed for this rough approximation appears to depend linearly both on the length of the considered time horizon and on the number $m_{R_j}$ of different subtrees. Unfortunately, decreasing the value of ρ corresponds to less recombining and, consequently, to an exponential growth of the number of control points and of the running times. However, the numerical results show that in most cases the solutions obtained in the rough phase are not far from optimal and do not change significantly when ρ is decreased to $\rho_{end}$. Thus, in practice, it may be reasonable to use moderate values for the final level $\rho_{end}$. The quality of the solution after the rough phase can be estimated by the upper bounds discussed in Section 3.2.3. The relative gap between the lower and upper bounds on the optimal value after the rough phase is shown in the column “rough gap” of Table 3.3. On the one hand, evaluating and storing the upper bounds requires resources and therefore increases the running time. On the other hand, we have used the upper bounds to avoid solving subproblems whenever the deviation of the cutting plane approximation from the recourse function has been smaller than $\rho_{end}$, which reduces the running time. Indeed, it can be seen that in many cases the time needed until $\rho = \rho_{end}$ is reduced by considering the upper bounds.
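For a convex one-dimensional recourse function, one standard way to bracket its value is a cutting-plane model from below and linear interpolation between evaluated points (chords) from above. The sketch below uses this pair of bounds to decide whether a subproblem still has to be solved. Whether Section 3.2.3 builds its upper bounds in exactly this way is not stated here, treating $\rho_{end}$ as an absolute tolerance is an assumption, and the function names are hypothetical.

```python
from bisect import bisect_left


def cut_lower_bound(x, cuts):
    """Cutting-plane model max_k q_k + g_k (x - x_k): a lower bound for convex Q."""
    return max(q + g * (x - xk) for xk, q, g in cuts)


def chord_upper_bound(x, evaluated):
    """Upper bound from points (x_k, Q(x_k)) sorted by x_k, valid by convexity of Q."""
    xs = [xk for xk, _ in evaluated]
    j = bisect_left(xs, x)
    if j < len(xs) and xs[j] == x:
        return evaluated[j][1]
    if j == 0 or j == len(xs):
        return float("inf")            # x outside the evaluated range: no finite bound
    (x0, q0), (x1, q1) = evaluated[j - 1], evaluated[j]
    lam = (x - x0) / (x1 - x0)
    return (1.0 - lam) * q0 + lam * q1


def must_solve(x, cuts, evaluated, rho_end):
    """Solve the subproblem only if the bounds are not yet within rho_end."""
    return chord_upper_bound(x, evaluated) - cut_lower_bound(x, cuts) >= rho_end


# Example with Q(y) = y**2: cuts and evaluations at y = 0 and y = 2.
cuts = [(0.0, 0.0, 0.0), (2.0, 4.0, 4.0)]
evaluated = [(0.0, 0.0), (2.0, 4.0)]
print(must_solve(1.0, cuts, evaluated, rho_end=1e-3))  # True: bounds 0 and 2 at y = 1
print(must_solve(2.0, cuts, evaluated, rho_end=1e-3))  # False: both bounds equal 4
```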
Dynamic Aggregation
As pointed out in Section 3.2.2, an extensive aggregation at the beginning of the decomposition makes it possible to obtain a rough approximation of the recourse functions quickly and thus avoids generating many cuts that may later turn out to be of little use. This argument is supported by the results shown in Table 3.4, where we have again set $\rho_{end} = 10^{-4}$. The advantage of successively decreasing ρ is obvious, especially as the problem size increases.
27 “First pass” refers to the first iterations of the algorithm with the initial (large) value of the aggregation parameter ρ. Since this involves a rough aggregation of decision points, this pass is called the rough phase.
time horizon   m_Rj   ρ ↘ ρ_end (decreasing)   ρ ≡ ρ_end (constant)
3 days         1      3s                       3s
3 days         2      5s                       5s
3 days         4      9s                       10s
5 days         1      4s                       15s
5 days         2      9s                       22s
5 days         4      20s                      31s
1 week         1      10s                      51s
1 week         2      15s                      1685s
1 week         4      34s                      1960s

Table 3.4: Running times for constant and decreasing aggregation parameter.
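The two columns of Table 3.4 correspond to two parameter schedules. The sketch below generates the aggregation parameters of successive passes when ρ starts at $\rho_{start} = 0.1$ and is multiplied by 0.3, as described above. The rule that the last pass runs exactly at $\rho_{end}$ is an assumption, and `rho_schedule` is a hypothetical name.

```python
def rho_schedule(rho_start=0.1, rho_end=1e-4, factor=0.3):
    """Aggregation parameters of successive passes: rho_start, rho_start*factor, ..., rho_end.

    Assumption: once rho would drop to or below rho_end, a final pass runs at rho_end.
    rho_start == rho_end reproduces the constant setting rho = rho_end of Table 3.4.
    """
    values = []
    rho = rho_start
    while rho > rho_end:
        values.append(rho)
        rho *= factor
    values.append(rho_end)
    return values


print([round(r, 6) for r in rho_schedule()])   # seven passes from 0.1 down to 1e-4
print(rho_schedule(1e-4, 1e-4))                # [0.0001]: constant aggregation parameter
```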
3.5 Out-of-Sample Evaluation
Algorithm 3.5 in Section 3.3.2 describes how a stochastic process can be approximated by a recombining scenario tree to which the modified Nested Benders approach can be applied. Theorem 4 ensures a certain degree of consistency of Algorithm 3.5 by showing that the optimal value of the approximate problem is close to the optimal value of the initial problem as soon as the approximation is sufficiently accurate. Unfortunately, this and other consistency and stability results require the optimization problems and the underlying random variables to fulfill specific boundedness and regularity properties; see also Römisch (2003), Kuhn (2005), Heitsch et al. (2006), and Mirkov and Pflug (2007). Such properties may be hard to verify in some cases of practical interest. Furthermore, due to the numerical complexity of many stochastic programming models, it may be necessary to use approximations that are too rough to yield meaningful error bounds. In such cases, it makes sense to measure the quality of a given approximation numerically. Since a main task of stochastic programming is to find suitable decision strategies, it is reasonable to assess an approximation by evaluating the solutions obtained from solving the approximate problem. This can be done, e.g., by evaluating these solutions along out-of-sample scenarios; cf., e.g., Kaut and Wallace (2007) for a study of one-stage problems and Chiralaksanakul and Morton (2004) and Hilli and Pennanen (2006) for multistage problems. In the literature, the term out-of-sample scenarios usually refers to scenarios that have not been used to construct the particular solution, i.e., in our framework, scenarios that are not contained in the scenario tree.
In this section, we propose a concept for out-of-sample testing in order to evaluate approximations to linear multistage stochastic programs. In particular, we want to evaluate solutions based on the recombining tree generation algorithm and the extended Nested Benders Decomposition discussed in the previous sections. The main difficulty for out-of-sample tests of a multistage stochastic program is that the solutions of an approximate problem are not necessarily feasible along the out-of-sample scenarios. Therefore, the generation of feasible solutions is an important issue, and one of interest in its own right whenever one wants to obtain practically applicable solutions from a stochastic programming model. Recently, Hilli and Pennanen (2006) and Chiralaksanakul and Morton (2004) have studied constructions of out-of-sample strategies for multistage programs with a small number of stages (i.e., fewer than 10 stages). Since we focus on longer time horizons, we choose another approach and study different projection-based methods to construct feasible solutions. In order to evaluate approximations by recombining trees and the solutions of the extended decomposition method, we consider the power scheduling problem studied in the previous section as well as a swing option problem. Furthermore, we compare the results of the above algorithms with results based on classical (i.e., nonrecombining) scenario trees that have been constructed by the Forward Tree Construction Algorithm of Heitsch and Römisch (2008).
3.5.1 Problem Formulation
We consider the multistage optimization problem (3.1) with the optimal value v(ξ) and the linear cost function ϕ(·, ·) defined by (3.2). Let us assume that the process ξ takes infinitely many values. Then, a usual approach is to replace the process ξ by a suitable process $\bar\xi$ taking only a finite number of scenarios $\bar\xi^j = (\bar\xi^j_t)_{t=1,\dots,T}$, $j \in J$, with J being some finite index set, such that the approximate problem
$$
\begin{aligned}
v(\bar\xi) := \inf\ & E\big[\varphi(\bar\xi, x(\bar\xi))\big] \\
\text{s.t.}\ & x \in \mathcal M^m_{[1,T]},\quad x_t \in X_t,\quad t = 1, \dots, T, \\
& A_{t,0}(\bar\xi_t)\, x_t(\bar\xi_{[t]}) + A_{t,1}(\bar\xi_t)\, x_{t-1}(\bar\xi_{[t-1]}) = h_t(\bar\xi_t),\quad t = 2, \dots, T,
\end{aligned}
\qquad (3.82)
$$
may be solved by numerical methods. Let us consider an optimal solution $\bar x = (\bar x_t(\cdot))_{t=1,\dots,T}$ of the approximate problem (3.82) that results from such a numerical solution method. Then, the optimal value $v(\bar\xi) = E[\varphi(\bar\xi, \bar x(\bar\xi))]$
of problem (3.82) can be considered as an approximation of the unknown value v(ξ). However, Example 2.5.1 in Chapter 2 shows that, without appropriate conditions, the value $v(\bar\xi)$ may be far from v(ξ) even if $\bar\xi$ is close (in some sense) to ξ. Being interested in a reliable approximation of the unknown value v(ξ), and having Example 2.5.1 in mind, it is reasonable to evaluate the approximate solution $\bar x$ with regard to the original data process ξ instead, i.e., to consider the value $E[\varphi(\xi, \bar x(\xi))]$. However, the approximate solution $\bar x$ may fail to be feasible along the initial process ξ. In that case, it may be appropriate to modify $\bar x$ into a (ξ-)feasible strategy $\tilde{\bar x}$. Then, the value
$$ E\big[\varphi(\xi, \tilde{\bar x}(\xi))\big] \qquad (3.83) $$
provides an upper bound on the unknown value v(ξ) and appears to be a more reliable approximation of v(ξ) than $v(\bar\xi) = E[\varphi(\bar\xi, \bar x(\bar\xi))]$, since the value (3.83) can indeed be realized (on average) by implementing the strategy $\tilde{\bar x}$. To evaluate the integral (3.83), the law of large numbers suggests drawing a number of independent samples $\xi^i$, $i \in I$, from the distribution of ξ and considering the out-of-sample value
$$ v^o(\bar\xi) := \frac{1}{|I|} \sum_{i \in I} \varphi\big(\xi^i, \tilde{\bar x}(\xi^i)\big). \qquad (3.84) $$
Then, different approximations $\bar\xi'$ and $\bar\xi''$ of ξ with varying accuracy (or constructed by different techniques) may be compared by means of their out-of-sample values $v^o(\bar\xi')$ and $v^o(\bar\xi'')$. Similarly, it is possible to compare different solution and decomposition techniques by evaluating the resulting solutions.
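The out-of-sample value (3.84) is a plain sample average, so its Monte Carlo error can be reported alongside it. The sketch below assumes user-supplied callables for sampling a scenario, for the (already feasible and nonanticipative) strategy, and for the cost ϕ; the threshold policy and the uniform prices in the usage lines are a hypothetical toy, unrelated to the models of this chapter. The ±1.96·SE interval is the usual normal approximation of a 95% confidence interval.

```python
import math
import random
import statistics


def out_of_sample_value(sample_scenario, policy, cost, n, seed=0):
    """Sample-average estimate (3.84) of E[phi(xi, x(xi))] and its standard error.

    sample_scenario(rng) draws one path of xi, policy(path) returns a feasible,
    nonanticipative decision sequence and cost(path, decisions) evaluates phi.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        path = sample_scenario(rng)
        values.append(cost(path, policy(path)))
    estimate = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
    return estimate, std_error


# Hypothetical toy: buy one unit whenever the price is below 0.5.
def threshold_policy(prices):
    return [1.0 if p < 0.5 else 0.0 for p in prices]   # x_t depends on prices[t] only


def purchase_cost(prices, decisions):
    return sum(x * p for x, p in zip(decisions, prices))


est, se = out_of_sample_value(lambda rng: [rng.random() for _ in range(24)],
                              threshold_policy, purchase_cost, n=1000)
print(f"v_o = {est:.3f} +/- {1.96 * se:.3f}")
```

Comparing two approximations then amounts to calling the function twice with the two restored strategies and the same seed, so that both are evaluated on identical out-of-sample scenarios.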
3.5.2 Towards Feasible Solutions
Let us consider a solution $\bar x(\cdot)$ to the approximate problem (3.82) and the finite set of scenarios $\{\bar\xi^j : j \in J\}$ of the approximating process $\bar\xi$. Starting from $\bar x(\cdot)$, we aim to construct a strategy $\tilde{\bar x}(\cdot)$ that is feasible along a set of out-of-sample scenarios $\{\xi^i : i \in I\} \subset \operatorname{supp} P_{[T]}$ with $P_{[T]} := P[\xi \in \cdot\,]$. In the following, this approach is referred to as feasibility restoration. Note that, in order to ensure that $\tilde{\bar x}(\cdot)$ is implementable by a non-clairvoyant decision maker, the feasibility restoration has to be nonanticipative, i.e., the value of $\tilde{\bar x}_t(\xi^i)$ has to depend measurably on $(\xi^i_1, \dots, \xi^i_t)$ only.
To this end, we consider a nonanticipative mapping $\pi : \operatorname{supp} P_{[T]} \to \{\bar\xi^j : j \in J\}$ that assigns every out-of-sample scenario $\xi^i$ to some (nearby) scenario of the approximate process. We say that π is nonanticipative if it is of the form $\pi(\xi) = (\pi_1(\xi_{[1]}), \dots, \pi_T(\xi_{[T]}))$, where the $\pi_t$ are Borel measurable mappings from $\operatorname{supp} P_{[t]}$ to $\{\bar\xi^j_t : j \in J\}$. For a decision maker who has observed the event $\{\xi_{[t]} = \xi^i_{[t]}\}$ until time t, the rule π suggests a scenario $(\pi_1(\xi^i_{[1]}), \dots, \pi_t(\xi^i_{[t]}))$ of the approximate model that is close to this observation. (In particular, the mapping π suggests the corresponding decision $\bar x_t(\pi_1(\xi^i_{[1]}), \dots, \pi_t(\xi^i_{[t]}))$ at time t to the decision maker.) As in (3.35), the mapping π may be defined as a sequence of (conditional) projections; see Remark A.3 in the appendix for a detailed construction. In order to measure the distance between the set of out-of-sample scenarios $\{\xi^i : i \in I\}$ and their associated tree scenarios $\pi(\xi^i)$, $i \in I$, we introduce the term
$$ d_\pi(I, J) := \frac{1}{|I|} \sum_{i \in I} \frac{\sum_{t=1}^{T} \|\xi^i_t - \pi_t(\xi^i_{[t]})\|}{\sum_{t=1}^{T} \|\xi^i_t\|}. $$
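One way to realize a nonanticipative mapping π as a sequence of conditional projections is to choose, stage by stage, the closest tree value among the scenarios that are still consistent with the choices made so far. The sketch below does this for scalar paths, with the tree given as a list of scenario tuples and nodes identified by their values (adequate for this toy only), and then evaluates $d_\pi(I, J)$. It is not necessarily the construction of Remark A.3, and the function names are hypothetical.

```python
def project_nonanticipative(sample, tree_scenarios):
    """Stage-wise (conditional) nearest-neighbour projection of a path onto tree scenarios.

    At stage t only tree scenarios that agree with the values already chosen
    for earlier stages are admissible, so the choice at t uses sample[:t+1] only.
    """
    candidates = list(tree_scenarios)
    chosen = []
    for t, value in enumerate(sample):
        best = min((s[t] for s in candidates), key=lambda val: abs(val - value))
        chosen.append(best)
        candidates = [s for s in candidates if s[t] == best]
    return chosen


def d_pi(samples, tree_scenarios):
    """Average relative distance between sample paths and their projections."""
    total = 0.0
    for xi in samples:
        proj = project_nonanticipative(xi, tree_scenarios)
        total += sum(abs(a - b) for a, b in zip(xi, proj)) / sum(abs(a) for a in xi)
    return total / len(samples)


tree = [(1.0, 5.0), (3.0, 0.0)]
# Stage 2 must follow the node chosen at stage 1, although 0.0 would be closer to 0.1.
print(project_nonanticipative((1.2, 0.1), tree))  # [1.0, 5.0]
print(d_pi([(1.2, 0.1)], tree))
```

The usage line shows why a nonanticipative projection can be far from the globally nearest tree scenario: once stage 1 has been matched, later stages are restricted to that subtree, which is what $d_\pi$ then measures.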
Note that $d_\pi(I, J)$ is closely related to the first summand of the upper bound (3.36) of Theorem 4. The nonanticipativity of π implies that $\bar x(\pi(\cdot)) \in \mathcal M^m_{[1,T]}$, i.e., $\bar x(\pi(\xi))$ is nonanticipative with respect to the process ξ and thus indeed a potential solution to the initial problem (3.1). Unfortunately, $\bar x(\pi(\xi^i))$ is not necessarily feasible along the scenario $\xi^i$ of the initial process ξ. In order to enforce feasibility, we study different projection-based approaches to modify $\bar x(\pi(\cdot))$. For the remainder of this section, the scenario index i is fixed. The decision $\bar x(\cdot)$ along the scenario $\pi(\xi^i)$ is denoted by $\bar x^i$ and referred to as the reference solution. The modification of $\bar x^i$ along the out-of-sample scenario $\xi^i$ is denoted by $\tilde{\bar x}^i$.

Feasibility Restoration
We aim to modify the sequence of decisions $\bar x^i = (\bar x^i_1, \dots, \bar x^i_T)$ into a sequence $\tilde{\bar x}^i$ that is feasible along the out-of-sample scenario $\xi^i$. This is done recursively as follows. We set $\tilde{\bar x}^i_1 := \bar x^i_1$. For $t = 2, \dots, T$ and a given value of $\tilde{\bar x}^i_{t-1}$, we search for a feasible point $\tilde{\bar x}^i_t$ that is close to $\bar x^i_t$ in some sense. Such a point is given by the projection of $\bar x^i_t$ onto the feasible set at time stage t, i.e., onto the set
$$ \big\{x_t \in X_t : A_{t,0} x_t + A_{t,1} \tilde{\bar x}^i_{t-1} = h_t(\xi^i_t),\ x_t \ge 0\big\}. $$
However, if the underlying model does not fulfill relatively complete recourse, the resulting projection may lead to future infeasibilities. In order to prevent
such incidents, we further restrict the feasible set at time t by taking into account some information about future constraints; see Remark 3.5.1 for a more detailed explanation. More precisely, we introduce the value
$$ \Delta^i_t := \min_{x_t}\ \|x_t - \bar x^i_t\|_\infty \qquad (3.85) $$
$$
\text{s.t.}\quad
\begin{cases}
A_{t,0} x_t + A_{t,1} \tilde{\bar x}^i_{t-1} = h_t(\xi^i_t), \\
h^{low}_\tau \le A_{\tau,0} x_\tau + A_{\tau,1} x_{\tau-1} \le h^{up}_\tau, & \tau = t+1, \dots, T, \\
x_\tau \in X_\tau, & \tau = t, \dots, T,
\end{cases}
\qquad (3.86)
$$
which is the minimal distance from $\bar x^i_t$ to the (further restricted) feasible set at time stage t.

Remark 3.5.1. The vectors $h^{low}_\tau$ and $h^{up}_\tau$ are chosen such that $h^{low}_\tau \le h_\tau(\xi^i_\tau) \le h^{up}_\tau$ holds true for all $i \in I$ and $\tau = t+1, \dots, T$.²⁸ The corresponding conditions in (3.86) are added to avoid decisions $\tilde{\bar x}^i_t$ that may lead to future infeasibilities, at least to some degree. In particular, we set $h^{low}_{\tau,j} = h^{up}_{\tau,j}$ for those components j of $h_\tau(\cdot)$ that do not depend on ξ. Observe that this simple approach to avoiding future infeasibilities relies on the assumption of non-random matrices $A_{\tau,0}$ and $A_{\tau,1}$. While this approach can be extended, e.g., by demanding the existence of feasible decisions $x_\tau$, $\tau \ge t$, along all possible future realizations of the process ξ, the present formulation allows problem (3.85) to be solved very quickly.

Basic Restoration. In order to determine a feasible point $\tilde{\bar x}^i_t$ based on the previously computed values $\tilde{\bar x}^i_1, \dots, \tilde{\bar x}^i_{t-1}$, a basic method is to choose $\tilde{\bar x}^i_t$ as close as possible to the reference solution $\bar x^i_t$, i.e., we set $\tilde{\bar x}^i_t$ to an optimal solution of problem (3.85):
$$ \tilde{\bar x}^i_t \in \operatorname*{argmin}_{x_t}\ \|x_t - \bar x^i_t\|_\infty \quad \text{s.t. } (x_\tau)_{\tau=t}^T \text{ fulfills (3.86)}. $$

Myopic Restoration. However, it may sometimes be reasonable not only to stay as close as possible to $\bar x^i_t$, but also to consider the costs induced by the out-of-sample scenario $\xi^i$. This multiobjective problem can be tackled, e.g., by minimizing the costs along $\xi^i$ while allowing the distance $\|x_t - \bar x^i_t\|_\infty$ to exceed the lower bound $\Delta^i_t$ by some amount²⁹. Doing so in a myopic way means minimizing only those costs that are realized at time t, i.e., we set
$$
\begin{aligned}
\tilde{\bar x}^i_t \in \operatorname*{argmin}_{x_t}\ & \langle c_t(\xi^i), x_t\rangle + \rho_t \|x_t - \bar x^i_t\|_\infty \\
\text{s.t.}\ & (x_\tau)_{\tau=t}^T \text{ fulfills (3.86)}, \\
& \|x_t - \bar x^i_t\|_\infty \le (1 + \varepsilon_{rel})\Delta^i_t + \varepsilon_{abs},
\end{aligned}
$$
for some small constant $\rho_t \ge 0$. The term $\rho_t \|x_t - \bar x^i_t\|_\infty$ has been added to the objective to ensure that $x_t$ is as close as possible to $\bar x^i_t$ whenever several cost-minimal values $x_t$ are available. Thus, it is reasonable to set, e.g., $\rho_t = 10^{-4} \|c_t(\xi^i)\|_\infty$.

28 Note that setting $h^{low}_\tau = h_\tau(\xi^i_\tau)$ would violate the nonanticipativity condition on $\tilde{\bar x}^i_t$.
29 We here allow for a relative fraction $\varepsilon_{rel} \cdot \Delta^i_t$ with $\varepsilon_{rel} \ge 0$ or an absolute value $\varepsilon_{abs} \ge 0$. Setting $\varepsilon_{abs} > 0$ may be useful whenever $\Delta^i_t = 0$, which is true, e.g., under relatively complete recourse; cf. the Swing Option Example in Section 3.5.3.

Farsighted Restoration. The myopic feasibility restoration allows one to consider the costs at time t when choosing the value $\tilde{\bar x}^i_t$. On the other hand, due to the time-coupling constraints in problem (3.82), the decision $x_t$ at time t affects the feasible sets at future time stages and thus the future costs. The latter aspect can be taken into account by considering certain shadow prices associated with the time-coupling constraints. To this end, let us assume for the moment that the sets $X_t$ are polyhedral cones³⁰ and consider the dual cone $X^*_t := \{x^* \in \mathbb R^s : \langle x^*, x\rangle \ge 0\ \forall x \in X_t\}$ of $X_t$. Then, the dual problem to (3.82) reads
$$
\max\ E\Big[\sum_{t=1}^T \langle h_t(\bar\xi_t), \mu_t\rangle\Big]
\quad \text{s.t.}\quad
\begin{cases}
\mu \in \mathcal M^r_{[1,T]}, \\
c_1 - \mu_1 A_{1,1} \in X^*_1, \\
c_T(\bar\xi_T) - \mu_T(\bar\xi_{[T]}) A_{T,0} \in X^*_T, \\
c_t(\bar\xi_t) - \mu_t(\bar\xi_{[t]}) A_{t,0} + E\big[\mu_{t+1}(\bar\xi_{[t+1]}) A_{t+1,1} \,\big|\, \bar\xi_{[t]}\big] \in X^*_t, & t = 2, \dots, T-1,
\end{cases}
\qquad (3.87)
$$
cf. problem (7.5) of Ruszczyński and Shapiro (2003a). Let $\bar\mu$ be an optimal solution to (3.87). The shadow price vector corresponding to the constraint that couples $x_t$ with the future decision $x_{t+1}$ is then given by
$$ \eta_{t+1} := E\big[\bar\mu_{t+1}(\bar\xi_{[t+1]}) A_{t+1,1} \,\big|\, \bar\xi_{[t]} = \pi_t(\xi^i_{[t]})\big]. $$
Note that, in particular, $-\eta_{t+1}$ is a subgradient of the recourse function $x_t \mapsto Q_t(\pi_t(\xi^i_{[t]}), x_t)$ defined by (3.3). We can thus take the future costs into account (at least to some degree) and define $\tilde{\bar x}^i_t$ in a farsighted way as
$$
\begin{aligned}
\tilde{\bar x}^i_t \in \operatorname*{argmin}_{x_t}\ & \langle c_t(\xi^i) - \eta_{t+1}, x_t\rangle + \rho_t \|x_t - \bar x^i_t\|_\infty \\
\text{s.t.}\ & (x_\tau)_{\tau=t}^T \text{ fulfills (3.86)}, \\
& \|x_t - \bar x^i_t\|_\infty \le (1 + \varepsilon_{rel})\Delta^i_t + \varepsilon_{abs}.
\end{aligned}
\qquad (3.88)
$$

30 This assumption is made only in order to simplify the form of (3.87). The dual problem is stated here only for the sake of comprehensibility; in general, it is not necessary to formulate the dual problem explicitly, since many solvers yield the solution of the dual problem as a “by-product”.
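In the special case of a scalar decision whose restricted feasible set (3.86) at stage t reduces to an interval [lo, hi], the basic, myopic and farsighted restorations can be computed in closed form. The toy below is a hypothetical illustration of that case, not a general implementation, which would require an LP solver. The objectives are convex and piecewise linear with a single kink at the reference decision, so their minimum over an interval is attained at an endpoint or at the kink; `eta` plays the role of the shadow price $\eta_{t+1}$, and all names are illustrative.

```python
def clamp(x, lo, hi):
    return min(max(x, lo), hi)


def basic_restoration(ref, lo, hi):
    """Scalar analogue of (3.85): closest feasible point to ref and the distance Delta."""
    x = clamp(ref, lo, hi)
    return x, abs(x - ref)


def myopic_restoration(ref, lo, hi, cost, eps_rel=0.0, eps_abs=0.0, rho=None):
    """Minimise cost*x + rho*|x - ref| over [lo, hi] within the enlarged distance bound."""
    _, delta = basic_restoration(ref, lo, hi)
    radius = (1.0 + eps_rel) * delta + eps_abs
    a, b = max(lo, ref - radius), min(hi, ref + radius)  # contains clamp(ref, lo, hi)
    if rho is None:
        rho = 1e-4 * abs(cost)                            # rho_t = 1e-4 * ||c_t||_inf

    def objective(x):
        return cost * x + rho * abs(x - ref)

    # convex and piecewise linear with one kink at ref: an endpoint or the kink is optimal
    return min((a, b, clamp(ref, a, b)), key=objective)


def farsighted_restoration(ref, lo, hi, cost, eta, **kwargs):
    """(3.88): the stage cost c_t is replaced by c_t - eta_{t+1}; rho still scales with c_t."""
    kwargs.setdefault("rho", 1e-4 * abs(cost))
    return myopic_restoration(ref, lo, hi, cost - eta, **kwargs)


# Reference decision 5.0, but only [0, 3] is feasible along the out-of-sample scenario.
print(basic_restoration(5.0, 0.0, 3.0))                                       # (3.0, 2.0)
print(myopic_restoration(5.0, 0.0, 3.0, cost=1.0, eps_rel=0.5))               # 2.0
print(farsighted_restoration(5.0, 0.0, 3.0, cost=1.0, eta=2.0, eps_rel=0.5))  # 3.0
```

With $\varepsilon_{rel} = 0.5$ the myopic rule moves further away from the reference decision to save stage costs, whereas a shadow price exceeding the stage cost makes the farsighted rule keep the larger decision.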
Extensive Restoration. The farsighted approach can be extended by using not only a single subgradient but several subgradients of the recourse function. Such data are available, e.g., whenever the approximate problem (3.82) is solved by a cutting plane method as discussed in Section 3.2. Once some subgradients $(x'_t, \bar q, \bar\pi)$ of $x_t \mapsto Q_t(\pi_{[t]}(\xi^i_{[t]}), x_t)$ are known, we can set
$$
\begin{aligned}
\tilde{\bar x}^i_t \in \operatorname*{argmin}_{x_t}\ & \langle c_t(\xi^i), x_t\rangle + \Theta_t + \rho_t \|x_t - \bar x^i_t\|_\infty \\
\text{s.t.}\ & (x_\tau)_{\tau=t}^T \text{ fulfills (3.86)}, \\
& \Theta_t \ge \bar q + \langle \bar\pi, x_t - x'_t\rangle \quad \text{for all known } (x'_t, \bar q, \bar\pi), \\
& \|x_t - \bar x^i_t\|_\infty \le (1 + \varepsilon_{rel})\Delta^i_t + \varepsilon_{abs}.
\end{aligned}
$$

Remark 3.5.2. The extensive method is closely related to the first approach of Chiralaksanakul and Morton (2004), which has been proposed for multistage stochastic programs with interstage independence or a weak type of interstage dependence.

Whenever the optimization model does not possess relatively complete recourse, the restoration schemes proposed above may fail due to infeasibility of problem (3.85). As pointed out by Küchler and Vigerske (2008), it is then possible to measure the “degree of infeasibility” and to construct solutions that violate the constraints “as little as possible”.
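The extensive restoration can be illustrated in the same kind of scalar setting: the cutting-plane variable $\Theta_t$ equals the maximum of the known cuts at the optimum, and the minimum of the resulting convex piecewise-linear objective lies at an interval endpoint, at the reference decision, or at an intersection of two cuts. In the sketch below, the interval [lo, hi] standing in for (3.86), the cut data and the function name are hypothetical.

```python
from itertools import combinations


def extensive_restoration(ref, lo, hi, cost, cuts, eps_rel=0.0, eps_abs=0.0, rho=1e-4):
    """Scalar toy of the extensive restoration.

    cuts is a non-empty list of (x_k, q_k, g_k) meaning Theta >= q_k + g_k * (x - x_k);
    the feasible set of (3.86) is assumed to reduce to the interval [lo, hi].
    """
    delta = abs(min(max(ref, lo), hi) - ref)        # Delta from the basic projection
    radius = (1.0 + eps_rel) * delta + eps_abs
    a, b = max(lo, ref - radius), min(hi, ref + radius)

    def objective(x):
        theta = max(q + g * (x - xk) for xk, q, g in cuts)   # cutting-plane model of Q_t
        return cost * x + theta + rho * abs(x - ref)

    # Convex and piecewise linear: the minimum over [a, b] lies at an endpoint,
    # at the kink x = ref, or where two cuts intersect.
    candidates = [a, b, min(max(ref, a), b)]
    for (x1, q1, g1), (x2, q2, g2) in combinations(cuts, 2):
        if g1 != g2:
            x = (q2 - q1 + g1 * x1 - g2 * x2) / (g1 - g2)
            if a <= x <= b:
                candidates.append(x)
    return min(candidates, key=objective)


# Recourse Q(y) = max(6 - 3y, 0) described by two cuts; the stage cost alone favours y = 0.
cuts = [(0.0, 6.0, -3.0), (2.0, 0.0, 0.0)]
print(extensive_restoration(1.0, 0.0, 4.0, cost=1.0, cuts=cuts, eps_abs=3.0))               # 2.0
print(extensive_restoration(1.0, 0.0, 4.0, cost=1.0, cuts=[(0.0, 0.0, 0.0)], eps_abs=3.0))  # 0.0
```

With a flat cut only, the decision drifts to the cheapest immediate option; the steeper cut makes the future costs of a low decision visible and moves the choice to y = 2.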
3.5.3 Numerical Examples
The proposed out-of-sample evaluation method has been applied to two stochastic programming models. For both models, we have evaluated recombining scenario trees as well as classical (i.e., non-recombining) scenario trees. The classical scenario trees have been generated with the Forward Tree Construction Algorithm of Heitsch and Römisch (2008). The optimal value $v(\bar\xi)$ of the resulting instance of the stochastic program has been computed by solving the deterministic equivalent with ILOG CPLEX 10.0. The recombining scenario trees have been constructed by Algorithm 3.5, where the subtrees for the periods $[R_j + 1, R_{j+1}]$, $j = 0, \dots, n$, have again been generated by the Forward Tree Construction.
The stochastic program involving the recombining trees has been solved by Algorithm 3.2 in Section 3.2. Recombination has taken place every four time stages, and the number of different subtrees per time period has been varied between two and eight. Since the running time of Algorithm 3.2 crucially depends on the choice of the aggregation parameter ρ, the out-of-sample evaluation approach has also been applied to study the quality of solutions obtained with different values of ρ.

Power Scheduling
We consider the power scheduling model detailed in Section 3.4.1 with a time horizon of T = 48 hours. For the bounds on the uncertain wind power production $\kappa_t$ (as required for $h^{low}_t$ and $h^{up}_t$ in (3.86)), we use $\kappa^{low}_t \equiv 0$ and $\kappa^{up}_t \equiv \max\{\|\kappa_\tau\|_\infty : \tau = 1, \dots, T\}$ for $t = 1, \dots, T$.
Some results of the out-of-sample evaluation are shown in Figure 3.3. It can be seen that the out-of-sample values $v^o(\bar\xi)$ are higher than the minimal costs $v(\bar\xi)$ of the tree-based stochastic programs. While the optimal values $v(\bar\xi)$ of the approximate problems do not differ significantly for scenario trees with different numbers of nodes, the out-of-sample values decrease significantly with a growing number of nodes. The distance $d_\pi$ between the scenario trees and the set of out-of-sample scenarios shows the same decreasing behaviour. From these observations, we conclude that the quality of the tree-based solutions indeed increases with the accuracy of the scenario tree approximation. Furthermore, it can be seen in Figure 3.3 that the out-of-sample values $v^o(\bar\xi)$ of the recombining tree based stochastic programs are significantly better than those relying on classical scenario trees with the same number of nodes. This is probably because a recombining tree includes many more scenarios than a classical tree with the same number of nodes³¹.
As one would expect, the out-of-sample values for recombining scenario trees with 4 different subtrees per time period are better than those for trees with only 2 different subtrees. This reveals the significance of the second summand of the upper bound (3.36) of Theorem 4, which represents the error due to the recombination clustering. The out-of-sample values depend not only on the particular tree approximation and solution, but also on the specific feasibility restoration method. The left side of Figure 3.3 shows the out-of-sample values resulting from different versions of the farsighted feasibility restoration (3.88), each corresponding to a certain value of $\varepsilon_{rel}$. Recall that the value of $\varepsilon_{rel}$ determines how much the modified solution is allowed to differ from the tree solution in favor of cost minimality. The case $\varepsilon_{rel} = 0$ corresponds to the basic feasibility restoration.

31 In order to count the number of nodes of a recombining tree, the nodes of coinciding subtrees are counted only once.

[Figure 3.3, three rows of panels: non-recombining scenario trees; recombining scenario trees with $m_{R_j} = 2$ subtrees per time period; recombining scenario trees with $m_{R_j} = 4$ subtrees per time period. Abscissa of all panels: number of tree nodes (250, 500, 1000, 2000, 5000).]

Figure 3.3: Results for the power scheduling model. Left side: optimal value $v(\bar\xi)$ of the tree-based problems (in bold) and corresponding out-of-sample values $v^o(\bar\xi)$ for several values of $\varepsilon_{rel}$, using a series of scenario trees with increasing number of nodes (abscissa). The values for $\varepsilon_{rel}$ are (with decreasing position of the $v^o(\bar\xi)$ curve) 0, 0.05, 0.10, 0.25, 0.5, and 1.0. Right side: the distance $d_\pi$ between the scenario trees and the set of out-of-sample scenarios.
On the one hand, the out-of-sample values decrease with increasing values of $\varepsilon_{rel}$. This is reasonable because larger values of $\varepsilon_{rel}$ allow, e.g., reducing the operation level of the thermal units whenever the wind energy input turns out to be larger than in the reference tree scenario. On the other hand, our numerical experience shows that the number of out-of-sample scenarios that lead to an infeasible problem (3.85) in the feasibility restoration phase increases³² with $\varepsilon_{rel}$. For $\varepsilon_{rel} \le 0.1$, the share of infeasible scenarios is less than 0.1%; this share increases to 0.4% for $\varepsilon_{rel} = 0.5$ and to 5% for $\varepsilon_{rel} = 1$. This is because (cost-optimal) reductions of the operation levels of the thermal units may lead to future infeasibilities, which are not perfectly anticipated by the proposed feasibility restoration. The influence of the aggregation parameter ρ on the optimal value $v(\bar\xi)$ and on the out-of-sample value $v^o(\bar\xi)$ is shown in Table 3.5. It can be seen that the error caused by terminating the solution algorithm with a larger value of ρ is almost negligible for this model. We further note that solving the deterministic equivalent of a problem based on a classical scenario tree with a comparable number of nodes took only 15s. However, Figure 3.3 shows that the recombining tree solutions perform significantly better in the out-of-sample evaluation.

ρ        v(ξ̄)        v^o(ξ̄) (ε_rel = 0)   v^o(ξ̄) (ε_rel = 0.05)   time
0.1      1.643089    1.655321             1.654294                36s
0.01     1.643660    1.655206             1.654178                81s
0.001    1.643660    1.655112             1.654080                879s

Table 3.5: Influence of the aggregation parameter ρ on the optimal values $v(\bar\xi)$, the out-of-sample values $v^o(\bar\xi)$, and the running time of the decomposition algorithm for a recombining scenario tree with $m_{R_j} = 4$ subtrees per time period and 7728 nodes in the power scheduling model.
³² Although the capacity of the thermal units is sufficient to cover the maximal load, cf. Table 3.1 and Figure 3.2, this model does not possess relatively complete recourse because of the condition (3.77) on the minimal final fill level of the water storage and the reserve requirements (3.81). The presented values of $v^o(\bar\xi)$ result from evaluating all feasible decision paths.

Swing Option Exercising
A swing option is a contract that gives the option holder the right to repeatedly purchase greater or smaller amounts of energy during the time $[1, T]$
for a fixed price of $K \ge 0$ per unit. Let us assume that the total amount of purchased energy is bounded by a constant $U \ge 0$ and that the amount of energy purchased at time $t \in [1, T]$, which we denote by $x_t$, has to lie in some interval $[l, u] \subset \mathbb{R}_+$. Assuming that the purchased energy is immediately sold on the spot market, the holder of the swing option is interested in finding a purchase strategy $x_t$, $t = 1, \dots, T$, that maximizes the expected accumulated wealth. This problem³³ can be written as follows:
$$\min\Big\{\mathbb{E}\Big[\sum_{t=1}^T \langle K - \xi_t, x_t\rangle\Big] : x \in \mathcal{M}^1_{[1,T]},\ x_t \in [l, u],\ t = 1, \dots, T,\ \sum_{t=1}^T x_t \le U\Big\}. \tag{3.89}$$
Here, the stochastic process $\xi = (\xi_t)_{t=1}^T$ describes the spot market price per energy unit and is assumed to follow a discrete time geometric Brownian motion, i.e., $\xi_1 = 1$ and
$$\xi_t = \xi_{t-1} \exp\big(\varepsilon_t - \tfrac{1}{2}\sigma^2\big), \qquad t = 2, \dots, T. \tag{3.90}$$
Thereby $\varepsilon_t$, $t = 2, \dots, T$, are independent, normally distributed random variables with expectation μ and variance $\sigma^2$. In the following, we assume for the sake of notational simplicity $l = 0$, $u = 1$, and $U \in \mathbb{N}$. The main reason for assuming this (unrealistically simple) form of the spot price process ξ is that problem (3.89) can be solved analytically whenever the drift μ is nonnegative. Indeed, using the (negative) payoff function $\psi(\xi_t) := (K - \xi_t)^-$, problem (3.89) can be written as³⁴
$$\min\Big\{\mathbb{E}\Big[\sum_{t=1}^T x_t\,\psi(\xi_t)\Big] : x \in \mathcal{M}^1_{[1,T]},\ x_t \in [0, 1],\ t = 1, \dots, T,\ \sum_{t=1}^T x_t \le U\Big\}. \tag{3.91}$$
If μ ≥ 0, the process ξ is a submartingale. Consequently, due to the monotonicity and the concavity of ψ, the process $(\psi(\xi_t))_{t=1,\dots,T}$ is a supermartingale. In particular, $\mathbb{E}[\psi(\xi_t)]$ is decreasing in t. It is thus no surprise that an early exercise of the swing option is not optimal, and the strategy $x^*$ defined by
$$x^*_t(\xi_t) := \begin{cases} 0, & \text{if } t \le T - U,\\ 1, & \text{if } t > T - U \text{ and } \xi_t > K,\\ 0, & \text{if } t > T - U \text{ and } \xi_t \le K, \end{cases} \tag{3.92}$$
for $t = 1, \dots, T$ is an optimal solution of (3.89). This is shown, for the sake of completeness, by Proposition A.1 in the appendix. Observe that the algorithms we have used for the scenario tree construction do not maintain the supermartingale property, in general. Hence, the tree-based optimal solutions are not necessarily of type (3.92).

³³ A similar problem has been studied by Hilli and Pennanen (2006).
³⁴ Note that problem (3.91) admits more optimal solutions than problem (3.89), since the term $\psi(\xi_t)$ vanishes whenever the option is out of the money.

Numerical Results
For the numerical results discussed in the following, the parameter values $T = 52$, $U = 20$, $K = 1$, $\mu = 0$, and $\sigma = 0.07$ have been used. Compared to the power scheduling model, the small number of variables and constraints makes problem (3.89) appear simple. However, while in the power scheduling model uncertainty appears only in one of several components of the system, the decisions in the swing option model are driven exclusively by the stochastic price process. Thus, the accuracy of the approximation $\bar\xi$ can be expected to have larger effects on the optimal value of the stochastic program and on the out-of-sample values. Indeed, Figure 3.4 shows that the relative deviation between the tree-based optimal values $v(\bar\xi)$ and the out-of-sample values $v^o(\bar\xi)$ is significantly larger than in the power scheduling problem, cf. Figure 3.3.

For the considered non-recombining scenario trees, the values $v(\bar\xi)$ and $v^o(\bar\xi)$ appear to be far from convergent. This suggests that the price process with T = 52 time stages is not well approximated by the considered trees with up to 5000 nodes. The optimal values $v(\bar\xi)$ and the out-of-sample values $v^o(\bar\xi)$ based on recombining scenario trees appear more stable. Furthermore, the out-of-sample value $v^o(\bar\xi)$ is apparently a better approximation of $v(\xi)$ than the tree-based values $v(\bar\xi)$, which again are far too optimistic.

Because the uncertainty appears only in the objective function coefficients, each scenario solution $\bar x^i$, $i \in I$, is feasible along every out-of-sample scenario, i.e., we have $\Delta^i_t = 0$ in (3.85).
Consequently, the basic feasibility restoration simply amounts to evaluating the scenario solution $\bar x^i$ with regard to the costs $c_t(\xi^i_t)$, $t = 1, \dots, T$. Figure 3.4 shows the results of the basic approach as well as of a myopic and a farsighted strategy, each with parameter $\varepsilon_{\mathrm{abs}} = 1$. Note that the choice $\varepsilon_{\mathrm{abs}} = 1$ yields solutions $\tilde{\bar x}^i_t$ that are independent of the tree solution $\bar x$. Furthermore, the out-of-sample values based on the myopic approach no longer depend on the scenario tree, while the farsighted method still depends on the scenario tree solution through the consideration of shadow prices. Although the latter improves on the results of the myopic approach, one can see that the best results are achieved by the basic approach.
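To make the price model (3.90) and the threshold rule (3.92) concrete, the following is a minimal Monte Carlo sketch using the parameter values of the numerical results ($T = 52$, $U = 20$, $K = 1$, $\mu = 0$, $\sigma = 0.07$). It only estimates the expected cost of $x^*$ in (3.89) by crude sampling; it is not the tree-based or out-of-sample procedure behind Figure 3.4. The function names, the number of sample paths, and the seed are our own illustrative choices.

```python
import math
import random


def simulate_prices(T, mu, sigma, rng):
    """One path xi_1, ..., xi_T of the discrete-time GBM (3.90), with xi_1 = 1."""
    xi = [1.0]
    for _ in range(2, T + 1):
        eps = rng.gauss(mu, sigma)  # eps_t ~ N(mu, sigma^2), independent over t
        xi.append(xi[-1] * math.exp(eps - 0.5 * sigma ** 2))
    return xi


def threshold_strategy(xi, T, U, K):
    """Strategy (3.92): buy one unit at t > T - U whenever xi_t > K, else nothing."""
    return [1.0 if (t > T - U and xi[t - 1] > K) else 0.0 for t in range(1, T + 1)]


def path_cost(xi, x, K):
    """Objective of (3.89) along one path: sum over t of (K - xi_t) * x_t."""
    return sum((K - price) * amount for price, amount in zip(xi, x))


def estimate_value(T=52, U=20, K=1.0, mu=0.0, sigma=0.07, n_paths=5000, seed=0):
    """Sample average of the path costs of x* (a Monte Carlo estimate, not exact)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        xi = simulate_prices(T, mu, sigma, rng)
        total += path_cost(xi, threshold_strategy(xi, T, U, K), K)
    return total / n_paths


if __name__ == "__main__":
    print(f"estimated expected cost of x*: {estimate_value():.3f}")
```

Replacing the threshold rule by one that buys in the first U periods whenever $\xi_t > K$ should give a noticeably larger (worse) estimate, in line with the supermartingale argument against early exercise.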
[Figure 3.4 panels, one row each for non-recombining scenario trees, recombining scenario trees (two subtrees / time period), and recombining scenario trees (eight subtrees / time period). Left panels: values between −4.5 and −2; right panels: distance $d_\pi$ between 0.05 and 0.25; both plotted against the number of nodes (250 to 5000).]
Figure 3.4: Results for the swing option model. Left side: Optimal value $v(\bar\xi)$ of the stochastic program (in bold) and the corresponding out-of-sample values $\tilde v(\bar\xi)$ for basic (lower straight line) and farsighted (upper straight line) feasibility restoration using a series of scenario trees with increasing number of nodes (abscissa). The myopic value and the exact value $v(\xi)$ are plotted as the upper and lower dashed lines, respectively. Right side: The distance $d_\pi$ between the scenario trees and the set of out-of-sample scenarios.
The effect of the aggregation parameter ρ on the tree value $v(\bar\xi)$ and the out-of-sample value $v^o(\bar\xi)$ is shown in Table 3.6. We can see that decreasing ρ from 0.1 to 0.01 has a significant effect on the values and the running time.
ρ | $v(\bar\xi)$ | $v^o(\bar\xi)$ (basic) | $v^o(\bar\xi)$ (farsighted) | time
0.1 | −4.483866 | −3.305790 | −2.65960 | 42s
0.01 | −4.458681 | −3.364209 | −2.65597 | 238s
0.001 | −4.458655 | −3.364509 | −2.65593 | 276s

Table 3.6: Influence of the aggregation parameter ρ on the optimal values, the out-of-sample values, and the running time of the decomposition algorithm for a recombining scenario tree with four different subtrees per time period and 13,697 nodes in the swing option model.
Chapter 4 Scenario Reduction with Respect to Discrepancy Distances

As we have discussed in the previous chapters, many stochastic optimization problems do not allow for an analytic solution, and, hence, one has to resort to numerical approaches. However, numerical approaches usually require the underlying probability measures to have finite support. While such finite measures can be obtained, e.g., by sampling or from historical data, the number of atoms (or scenarios) generally has to be sufficiently small to maintain numerical tractability. Approximating a (finite) probability measure by a measure with a smaller number of atoms is referred to as scenario reduction in the literature.

In order to approximate a given probability measure by a limited number of scenarios, a variety of approaches has been developed. These techniques follow different principles like random sampling (Shapiro, 2003b), Quasi-Monte Carlo sampling (Pennanen, 2005, 2009), moment matching (Høyland and Wallace, 2001; Høyland et al., 2003), and approximation with respect to certain probability metrics (Pflug, 2001; Dupačová et al., 2003; Heitsch and Römisch, 2007; Hochreiter and Pflug, 2007). The latter approach results from the following considerations. Whenever it is inevitable to approximate some initial probability measure by a finite scenario set, the latter set should be chosen such that the optimal value (and/or solution set) of the approximate problem remains as close as possible to that of the initial problem. By establishing smoothness properties such as calmness, Lipschitz or Hölder continuity of optimal values and solution sets with respect to certain probability metrics, quantitative stability analysis yields helpful insights into how such approximations should look.
For instance, stochastic linear two-stage recourse models without integrality constraints are stable with respect to the so-called Fortet-Mourier metrics (Römisch, 2003, Section 3.1). Scenario reduction techniques with respect to Fortet-Mourier metrics have been developed, e.g., by Dupačová et al. (2003) and Heitsch and Römisch (2003, 2007). However, stochastic programs including chance constraints or integrality constraints require other probability metrics, the so-called discrepancy distances. In this chapter, we extend the scenario reduction approach of Dupačová et al. (2003) to these discrepancy distances.

In Section 4.1 we recall the notion of discrepancy distances. In order to highlight the relevance of discrepancies in the context of scenario reduction, we summarize in Section 4.2 some recent stability results for mixed-integer and chance-constrained stochastic programs. In Section 4.3, the problem of optimal scenario reduction with respect to a discrepancy distance is formulated. It is shown that this problem can be decomposed into an outer combinatorial problem and a specific inner optimization problem. Upper and lower bounds on the optimal value of the scenario reduction problem as well as particular cases that allow for an explicit solution are discussed in Section 4.4. The inner problem, i.e., finding optimal probability weights for a given fixed (reduced) support set, is studied in Section 4.5. It is shown that this inner problem corresponds to a (huge) linear program. In order to enable a numerical solution, we show how the dimension of this linear program can be significantly reduced. However, it turns out that the determination of the coefficients of the reduced inner problem amounts to a further combinatorial optimization problem. We develop an approach that allows us to solve the latter problem (and hence the inner problem) for a certain family of polyhedral discrepancies.
In Section 4.6 we review several heuristics and develop a specific Branch and Bound approach to tackle the outer (NP-hard) combinatorial problem. Numerical experience is presented in Section 4.7, and further results about discrepancies that are relevant in the context of scenario reduction are discussed in Section 4.8.
4.1 Discrepancy Distances
For $s \in \mathbb{N}$ and a system $\mathcal{B}$ of Borel subsets of $\mathbb{R}^s$, the $\mathcal{B}$-discrepancy between two Borel probability measures P and Q on $\mathbb{R}^s$ is defined as
$$\alpha_{\mathcal{B}}(P, Q) := \sup_{B \in \mathcal{B}} \big|P[B] - Q[B]\big|, \tag{4.1}$$
see, e.g., Hlawka and Niederreiter (1969) and Hlawka (1971). Important examples are the system $\mathcal{B}_{cl}$ of all closed subsets, the system $\mathcal{B}_{conv}$ of all
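closed, convex subsets, the system $\mathcal{B}_{ph,k}$ of all polyhedra having at most k faces, the system $\mathcal{B}_{ph,W}$ of all polyhedra each of whose facets¹ is parallel to a facet of $[0,1]^s$ or to a facet of pos W (where W denotes some real-valued $(s \times m_1)$-matrix), the system $\mathcal{B}_{rect}$ of all closed, s-dimensional rectangles $\times_{i=1}^s I_i$ (where $I_i$ denotes a closed interval in $\mathbb{R}$ for $i = 1, \dots, s$), and the system $\mathcal{B}_{cell}$ of all closed cells in $\mathbb{R}^s$ (i.e., sets of the form $\xi + \mathbb{R}^s_-$ with $\xi \in \mathbb{R}^s$). It is easy to see that the estimate
$$\alpha_{\mathcal{B}_{cell}} \le \alpha_{\mathcal{B}_{rect}} \le \alpha_{\mathcal{B}_{ph,W}}\ (\alpha_{\mathcal{B}_{ph,k}}) \le \alpha_{\mathcal{B}_{conv}} \le \alpha_{\mathcal{B}_{cl}} \tag{4.2}$$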
holds true, where for the inequality $\alpha_{\mathcal{B}_{rect}} \le \alpha_{\mathcal{B}_{ph,k}}$ we have to require that k is sufficiently large to ensure that $\mathcal{B}_{rect} \subseteq \mathcal{B}_{ph,k}$. Every $\mathcal{B}$-discrepancy is non-negative, symmetric, and satisfies the triangle inequality, being thus a semimetric on the space of Borel probability measures on $\mathbb{R}^s$. Furthermore, all discrepancies in (4.2) are metrics, since $\alpha_{\mathcal{B}_{cell}}(P, Q) = 0$ implies P = Q. A sequence $(P_n)_{n \ge 0}$ of probability measures converges to P with respect to $\alpha_{\mathcal{B}}$ for some system $\mathcal{B} \subseteq \mathcal{B}_{cl}$ if and only if it converges weakly to P and $P[\partial B] = 0$ holds for each $B \in \mathcal{B}$, see Billingsley and Topsøe (1967).

In the literature, the distance $\alpha_{\mathcal{B}_{cell}}$ is also referred to as the Kolmogorov metric or the uniform metric (Rachev, 1991), because $\alpha_{\mathcal{B}_{cell}}(P, Q)$ corresponds to the uniform distance of the distribution functions of P and Q on $\mathbb{R}^s$. The distance $\alpha_{\mathcal{B}_{conv}}$ is known as the isotrope discrepancy (Mück and Philipp, 1975) and $\alpha_{\mathcal{B}_{cl}}$ as the total variation (Rachev, 1991). Accordingly, the distances $\alpha_{\mathcal{B}_{cell}}$ and $\alpha_{\mathcal{B}_{rect}}$ are called the cell discrepancy and the rectangular discrepancy, respectively. Without risk of ambiguity, we will refer in the following to each of the distances $\alpha_{\mathcal{B}_{ph,k}}$ and $\alpha_{\mathcal{B}_{ph,W}}$ as the polyhedral discrepancy. Some of these discrepancy distances have been applied in the context of uniformly distributed sequences (the so-called “low discrepancy sequences”) in the s-dimensional unit cube $U^s = [0,1]^s$ (Kuipers and Niederreiter, 1974) and, more recently, for developing Quasi-Monte Carlo methods for numerical integration (Niederreiter, 1992). Converse inequalities to (4.2), e.g., for the isotrope and rectangular discrepancies of probability measures P and Q on $\mathbb{R}^s$, have been derived by Hlawka (1971), Kuipers and Niederreiter (1974), Mück and Philipp (1975), and Niederreiter and Wills (1975), see also Section 4.8.3 in this chapter.
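For finite discrete measures, two of the discrepancies in (4.2) can be evaluated by brute force. The following sketch is our own illustration (the dictionary representation of measures and all function names are assumptions, not notation from the text). It computes the cell discrepancy by testing only cell vertices on the coordinate grid of the atoms, which suffices because $P[z + \mathbb{R}^s_-] - Q[z + \mathbb{R}^s_-]$ can only change when a coordinate of z passes a coordinate of an atom, and the closed set discrepancy, for which every subset of the finite support is closed. The cell computation grows like the product of the numbers of distinct coordinates, so it is only meant for small examples.

```python
import itertools


def _cell_mass(mu, z):
    """Mass that the discrete measure mu assigns to the cell z + R^s_-."""
    return sum(w for atom, w in mu.items() if all(a <= b for a, b in zip(atom, z)))


def cell_discrepancy(P, Q):
    """alpha_{B_cell}(P, Q) for finite discrete measures on R^s.

    P and Q map atoms (tuples of floats) to probabilities. Vertices z whose
    k-th coordinate is a k-th coordinate of some atom are sufficient.
    """
    atoms = set(P) | set(Q)
    s = len(next(iter(atoms)))
    grids = [sorted({atom[k] for atom in atoms}) for k in range(s)]
    return max(abs(_cell_mass(P, z) - _cell_mass(Q, z)) for z in itertools.product(*grids))


def closed_set_discrepancy(P, Q):
    """alpha_{B_cl}(P, Q): with finite supports every subset of atoms is closed,
    so the supremum equals half the l1-distance of the weight vectors."""
    atoms = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(a, 0.0) - Q.get(a, 0.0)) for a in atoms)
```

For the one-dimensional measures $P = 0.4\delta_1 + 0.4\delta_3 + 0.1\delta_2 + 0.1\delta_4$ and $Q = 0.4\delta_1 + 0.6\delta_3$, the sketch gives (up to rounding) 0.1 for the cell and 0.2 for the closed set discrepancy, consistent with the ordering in (4.2).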
In the next section, we will see that in the context of quantitative stability of stochastic programs the class $\mathcal{B}$ of Borel sets is chosen as small as possible (in order to obtain tight estimates), but large enough to contain all sets appearing in chance constraints of type (1.6) or all closures of the continuity regions of the recourse function in two-stage mixed-integer models. Consequently, the polyhedral, the rectangular, and the cell discrepancies are of special importance for linear chance-constrained and mixed-integer two-stage stochastic programming models.

¹ Following the usual notation, the (s − 1)-dimensional faces of a polyhedron in $\mathbb{R}^s$ are called facets.
4.2 On Stability of Two-Stage and Chance-Constrained Programs
In order to highlight the relevance of discrepancy distances in stochastic programming, we recall results of the recent survey by Römisch (2003) on stability of stochastic programs. For the sake of simplicity, we focus here on optimal values rather than on optimal solution sets. A class of probability distances that arises naturally in the context of stability of stochastic programs is given by the (semi-)metrics with ζ-structure, see Zolotarev (1983) and Rachev (1991). These distances are of the form
$$D_{\mathcal{F}}(P, Q) := \sup_{f \in \mathcal{F}} \Big|\int_\Xi f(\xi)\,P[d\xi] - \int_\Xi f(\xi)\,Q[d\xi]\Big|, \tag{4.3}$$
with $\mathcal{F}$ denoting a certain class of Borel measurable functions from $\Xi \subset \mathbb{R}^s$ to $\overline{\mathbb{R}}$. To see why such distances arise, let us recall the dynamic formulation (1.11) of the linear two-stage program (1.1). It is easy to see that the following estimate holds true for problem (1.1):
$$|v(P) - v(Q)| \le \sup_{x_1 \in X_1} \Big|\int_\Xi Q_2(x_1, \xi)\,P[d\xi] - \int_\Xi Q_2(x_1, \xi)\,Q[d\xi]\Big|. \tag{4.4}$$
Thus, stability of the optimal value holds with respect to the distance $D_{\mathcal{F}}$ with $\mathcal{F} = \{Q_2(x_1, \cdot) : x_1 \in X_1\}$. Estimate (4.4) is a special instance of the following more general result, which is detailed in Theorem 5 of Römisch (2003). Let us consider the stochastic optimization problem
$$\min\Big\{\int_\Xi f_0(x, \xi)\,P[d\xi] : x \in X,\ \int_\Xi f_j(x, \xi)\,P[d\xi] \le 0,\ j = 1, \dots, d\Big\}, \tag{4.5}$$
with $f_j$ being some random lower semicontinuous mappings from $\mathbb{R}^m \times \Xi$ to $\overline{\mathbb{R}}$. It is shown by Römisch (2003) that under specific regularity and boundedness conditions stability of the optimal value of problem (4.5) holds with respect to the distance $D_{\mathfrak{F}}$ with $\mathfrak{F} = \{f_j(x, \cdot) : x \in X,\ j = 0, \dots, d\}$. The sets $\mathcal{F}$ and $\mathfrak{F}$ and the corresponding distances $D_{\mathcal{F}}$ and $D_{\mathfrak{F}}$ are well adapted to the particular optimization problems (1.1) and (4.5). However,
they are of limited generality because they involve the detailed objective and constraint data of the underlying stochastic programs. One approach to obtain more general probability distances is to identify suitable analytical properties of the mappings $\xi \mapsto Q_2(x_1, \xi)$, $x_1 \in X_1$, and to enlarge $\mathcal{F}$ to the class of all functions sharing these properties. In the case of linear two-stage stochastic programs, one can verify (again under specific regularity and boundedness conditions) that the mappings $\xi \mapsto Q_2(x_1, \xi)$ are locally Lipschitz continuous, i.e., $\frac{1}{L} Q_2(x_1, \cdot) \in \mathcal{F}_2$ for some constant L > 0, where $\mathcal{F}_p$ denotes the set of locally Lipschitz continuous functions whose Lipschitz constants grow polynomially of order p − 1:
$$\mathcal{F}_p := \Big\{f : \Xi \to \mathbb{R} : |f(\xi) - f(\hat\xi)| \le \max\{1, \|\xi\|, \|\hat\xi\|\}^{p-1}\,\|\xi - \hat\xi\|\ \ \forall \xi, \hat\xi \in \Xi\Big\}. \tag{4.6}$$
Consequently, we obtain stability of linear two-stage programs with respect to the second order Fortet-Mourier metric $\zeta_2 := D_{\mathcal{F}_2}$, i.e.,
$$|v(P) - v(Q)| \le L \cdot \zeta_2(P, Q)$$
for $P, Q \in \mathcal{P}_2(\Xi)$ and some constant L ≥ 0, see Theorem 23 of Römisch (2003) for a more detailed discussion. Thereby, $\mathcal{P}_p(\Xi)$ denotes the set of all Borel probability measures on Ξ having finite absolute moments of order p ≥ 1.

Mixed-Integer Two-Stage Programs
Unfortunately, the continuity of the cost function $Q_2(x_1, \cdot)$ is generally lost under the integer constraints (1.4), see Example 4.2.1. However, it turns out that $Q_2(x_1, \cdot)$ is then still piecewise polyhedral, and its regions of Lipschitz continuity can be characterized, see Proposition 2 of Römisch and Vigerske (2008) and the references therein. These continuity regions are contained in a certain class $\mathcal{B}$ of convex polyhedra in Ξ with a uniformly bounded number of faces. Let us further consider the following class of uniformly bounded locally Lipschitz continuous mappings:
$$\bar{\mathcal{F}}_2 := \Big\{f : \Xi \to \mathbb{R} : |f(\xi) - f(\hat\xi)| \le \max\{1, \|\xi\|, \|\hat\xi\|\}\,\|\xi - \hat\xi\|,\ \ |f(\xi)| \le \max\{1, \|\xi\|\}^2\ \ \forall \xi, \hat\xi \in \Xi\Big\}.$$
Extending earlier results of Schultz (1996) and Römisch (2003), it is shown by Römisch and Vigerske (2008) that the class
$$\bar{\mathcal{F}}_{2,\mathcal{B}} := \big\{f\,\mathbb{1}_B : f \in \bar{\mathcal{F}}_2(\Xi),\ B \in \mathcal{B}\big\}$$
and the corresponding distance with ζ-structure
$$\zeta_{2,\mathcal{B}}(P, Q) := D_{\bar{\mathcal{F}}_{2,\mathcal{B}}}(P, Q) \tag{4.7}$$
between probability measures P and Q in $\mathcal{P}_2(\mathbb{R}^s)$ is suitable for the mixed-integer two-stage program defined by (1.1) and (1.4), i.e., under certain conditions we obtain
$$|v(P) - v(Q)| \le L \cdot \zeta_{2,\mathcal{B}}(P, Q) \tag{4.8}$$
for some constant L ≥ 0. Unfortunately, the particular system $\mathcal{B}$ may be very large. In order to obtain a more tractable system, we assume nonrandom recourse and technology matrices $A_{2,0}$ and $A_{2,1}$, and consider the following (mixed-integer) instance of the two-stage program (1.1):
$$\begin{aligned} \inf\ & \langle b_1, x_1\rangle + \mathbb{E}\big[\langle b_2(\xi), x_2(\xi)\rangle\big]\\ \text{s.t. }\ & x_1 \in X_1,\quad x_2(\xi) = (x_2^R(\xi), x_2^Z(\xi)),\quad x_2^R(\xi) \in \mathbb{R}^{m_1}_+,\quad x_2^Z(\xi) \in \mathbb{Z}^{m_2}_+,\\ & A_{2,1} x_1 + A^R_{2,0} x_2^R(\xi) + A^Z_{2,0} x_2^Z(\xi) = h_2(\xi). \end{aligned}$$
Then, it can be shown that it is sufficient if the system $\mathcal{B}$ contains all sets of the form
$$\{\xi \in \Xi : h_2(\xi) \in A_{2,1} x_1 + B\} = \{\xi \in \Xi : \xi \in h_2^{-1}(A_{2,1} x_1 + B)\}, \tag{4.9}$$
with $x_1 \in X_1$ and B being a polyhedron each of whose facets is parallel to a facet of the cone $\operatorname{pos} A^R_{2,0} := \{A^R_{2,0} x_2^R : x_2^R \in \mathbb{R}^{m_1}_+\}$ or of the unit cube $[0,1]^d$, i.e., $B \in \mathcal{B}_{ph,A^R_{2,0}}$, see Schultz (1996, Proposition 3.1) and Römisch and Vigerske (2008, Proposition 1). In the following, we further assume that d = s and $h_2(\xi) = \xi$. Then, the sets of type (4.9) are contained in $\mathcal{B}_{ph,A^R_{2,0}}$, and, hence, we can choose $\mathcal{B} = \mathcal{B}_{ph,A^R_{2,0}}$. For notational simplicity, we write $W := A^R_{2,0}$. Having in mind the inequality (4.8), it is reasonable to develop an approach to scenario reduction with respect to the probability metric $\zeta_{2,\mathcal{B}_{ph,W}}$. Unfortunately, as pointed out in Section 4.8.1, the distance $\zeta_{2,\mathcal{B}_{ph,W}}$ does not seem to be an appropriate measure of distance for scenario reduction. However, since the system $\mathcal{B}_{ph,W}$ is stable under intersection with s-dimensional rectangles, we can proceed as in Corollary 3.2 of Schultz (1996) to verify the following estimate:
$$\zeta_{2,\mathcal{B}_{ph,W}}(P, Q) \le C\,\alpha_{\mathcal{B}_{ph,W}}(P, Q)^{\frac{1}{s+1}} \tag{4.10}$$
for all probability measures P and Q on Ξ whose supports are contained in a ball with radius R ∈ R+ around the origin. Thereby, the constant C = C(R) in (4.10) only depends on the problem (1.4) and the radius R. Since our objective is to measure (and minimize) the distance of two discrete
probability measures P and Q with Q’s support being a subset of the support of P, both supports are contained in some finite ball around the origin and, thus, the estimate (4.10) applies. Combining the estimates (4.8) and (4.10), we see that the optimal values of mixed-integer two-stage programs with random right-hand sides are stable with respect to the polyhedral discrepancy $\alpha_{\mathcal{B}_{ph,W}}$. Consequently, $\alpha_{\mathcal{B}_{ph,W}}$ seems to be appropriate for scenario reduction techniques. Note that if every facet of pos W is parallel to a facet of the unit cube, the distance $\alpha_{\mathcal{B}_{ph,W}}$ coincides with the rectangular discrepancy $\alpha_{\mathcal{B}_{rect}}$. In particular, the latter discrepancy becomes suitable in case of pure integer recourse, i.e., if $x_2(\xi) = x_2^Z(\xi) \in \mathbb{Z}^{m_2}_+$. The following example illustrates the stability results discussed so far.

Example 4.2.1. Let $\Xi := [0,1]^2$, $X := \{0\} \subset \mathbb{R}^2$, and let P be some probability measure on Ξ. We consider the following mixed-integer two-stage stochastic program:
$$v(P) := \int_\Xi Q_2(\xi)\,P[d\xi] \quad\text{with}\quad Q_2(\xi) := \inf\Big\{x_2^Z(\xi) + 2x_2^R(\xi) : \begin{array}{l} 2x_2^Z(\xi) + x_2^R(\xi) \ge \xi_1,\quad x_2^R(\xi) \le \xi_2,\\ x_2^Z(\xi) \in \mathbb{Z}_+,\quad x_2^R(\xi) \in \mathbb{R}_+ \end{array}\Big\}. \tag{4.11}$$
$Q_2(\xi)$ is equal to 1 if $\xi_1 > \min\{0.5, \xi_2\}$ and equal to $2\xi_1$ otherwise. A plot of $\xi \mapsto Q_2(\xi)$ is presented in Figure 4.1. While $Q_2$ is discontinuous, introducing slack variables and writing problem (4.11) in the standard form (1.1) entails that the continuous variable $x_2^R$ (together with the slack variables) is assigned to the recourse matrix
$$W = A^R_{2,0} = \begin{pmatrix} 1 & -1 & 0\\ 1 & 0 & 1 \end{pmatrix}.$$
We thus see that the closures of the regions of continuity of $Q_2$ are indeed contained in the family $\mathcal{B}_{ph,W}$. Assuming that P follows a uniform distribution on the line segment $\{(z, z) : z \in (0.1, 0.4)\}$, we define for $\varepsilon \in (0, 0.1)$ the shifted measure $P_\varepsilon$ via
$$P_\varepsilon[A] := P\Big[A + \begin{pmatrix} -\varepsilon\\ \varepsilon \end{pmatrix}\Big]$$
for every Borel set $A \subset \Xi$. It follows that $v(P_\varepsilon) - v(P) = 0.5$ for every $\varepsilon \in (0, 0.1)$. On the other hand, as ε → 0 the measures $P_\varepsilon$ converge to P in the weak sense, as well as with respect to the rectangular discrepancy $\alpha_{\mathcal{B}_{rect}}$. Having in mind the aforementioned stability with respect to $\alpha_{\mathcal{B}_{ph,W}}$, it is no surprise that $\alpha_{\mathcal{B}_{ph,W}}(P, P_\varepsilon) = 1$ for every ε > 0.
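The claims of Example 4.2.1 can be checked numerically. The sketch below is an illustration, not part of the original analysis: it evaluates $Q_2$ by enumerating the integer variable and taking the cheapest feasible continuous part, compares the result with the stated closed form, and approximates $v(P)$ and $v(P_\varepsilon)$ by a midpoint rule on the segment. It assumes our reading of the shift, under which $P_\varepsilon$ is supported on the points $(z + \varepsilon, z - \varepsilon)$ strictly below the diagonal; the names and the quadrature size m are our own choices.

```python
import math


def Q2(xi1, xi2):
    """Recourse function of Example 4.2.1 for xi in [0, 1]^2.

    For fixed integer x2^Z the cheapest continuous part is the smallest feasible
    x2^R = max(0, xi1 - 2 * x2^Z), provided it does not exceed xi2. Since xi1 <= 1,
    x2^Z = 1 is always feasible at cost 1, so x2^Z >= 2 never helps.
    """
    best = math.inf
    for xz in (0, 1):
        xr = max(0.0, xi1 - 2 * xz)
        if xr <= xi2:
            best = min(best, xz + 2 * xr)
    return best


def closed_form(xi1, xi2):
    """Closed form stated in the text: 1 if xi1 > min{0.5, xi2}, else 2 * xi1."""
    return 1.0 if xi1 > min(0.5, xi2) else 2 * xi1


def v(eps=0.0, m=1000):
    """Midpoint rule for v(P) (eps = 0) or v(P_eps), z uniform on (0.1, 0.4)."""
    total = 0.0
    for k in range(m):
        z = 0.1 + 0.3 * (k + 0.5) / m
        total += Q2(z + eps, z - eps)
    return total / m


def mass_on_or_above_diagonal(eps=0.0, m=1000):
    """Share of the quadrature atoms in the half-plane {xi1 <= xi2}, a set in B_ph,W."""
    hits = 0
    for k in range(m):
        z = 0.1 + 0.3 * (k + 0.5) / m
        if z + eps <= z - eps:
            hits += 1
    return hits / m
```

The half-plane $\{\xi_1 \le \xi_2\}$ has a facet parallel to the column (1, 1) of W, so the difference of its masses under P and $P_\varepsilon$, which equals 1 here, is a lower bound for $\alpha_{\mathcal{B}_{ph,W}}(P, P_\varepsilon)$.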
[Surface plot of $Q_2$ over the coordinates $\Xi_1$ and $\Xi_2$.]
Figure 4.1: Recourse function $\xi \mapsto Q_2(\xi)$ of Example 4.2.1.

Chance-Constrained Programs
Chance-constrained stochastic programs of type (1.5) can also be written in the form (4.5). To this end, we introduce the notation $H(x_1) := \{\xi \in \Xi : A_{2,1}(\xi) x_1 \ge h_2(\xi)\}$ and write problem (1.5) as follows:
$$\inf\big\{\langle b_1, x_1\rangle : x_1 \in X_1,\ P[H(x_1)] \ge p\big\} = \inf\Big\{\int_\Xi \langle b_1, x_1\rangle\,P[d\xi] : x_1 \in X_1,\ \int_\Xi \big(p - \mathbb{1}_{H(x_1)}(\xi)\big)\,P[d\xi] \le 0\Big\}.$$
Under suitable technical assumptions one can proceed as in Section 3.3 of Römisch (2003) and show that the optimal value of problem (1.5) is stable with respect to the distance $D_{\mathcal{F}}$ with $\mathcal{F} = \{\mathbb{1}_{H(x_1)} : x_1 \in X_1\}$. Since $\{H(x_1) : x_1 \in X_1\}$ is a family of polyhedra with a uniformly bounded number of faces, $D_{\mathcal{F}}$ is bounded from above by the discrepancy $\alpha_{\mathcal{B}_{ph,k}}$ for some $k \in \mathbb{N}$. Sharper estimates may be obtained in special cases, e.g., when the matrix $A_{2,1}$ is non-random. In the latter case, we can pass to a discrepancy $\alpha_{\mathcal{B}}$ with $\mathcal{B}$ containing all polyhedra of the form $h_2^{-1}\big((-\infty, A_{2,1} x_1]\big)$ for $x_1 \in X_1$. If additionally $h_2(\xi) = \xi$ holds true, we obtain the cell discrepancy, see also Example 4.7.1 in Section 4.7. We further refer to Example 2 of Henrion et al. (2008) for another chance-constrained program involving the rectangular discrepancy.
4.3 Scenario Reduction
Let us now turn to the problem of optimal scenario reduction with respect to a discrepancy distance. To this end, we consider a probability measure P with finite support $\{\xi^1, \dots, \xi^N\}$ and set $p_i := P[\{\xi^i\}] > 0$ for $i = 1, \dots, N$. Denoting by $\delta_\xi$ the Dirac measure in the point ξ, the measure P has the form
$$P = \sum_{i=1}^N p_i\,\delta_{\xi^i}.$$
The problem of optimal scenario reduction consists of determining a discrete probability measure Q on $\mathbb{R}^s$ supported by a subset of $\{\xi^1, \dots, \xi^N\}$, such that Q deviates from P as little as possible with respect to some distance, in this case a certain discrepancy $\alpha_{\mathcal{B}}$. This problem can be written as
$$\begin{aligned} \text{minimize}\quad & \alpha_{\mathcal{B}}(P, Q) = \alpha_{\mathcal{B}}\Big(\sum_{i=1}^N p_i\,\delta_{\xi^i}, \sum_{j=1}^n q_j\,\delta_{\eta^j}\Big)\\ \text{s.t.}\quad & \{\eta^1, \dots, \eta^n\} \subset \{\xi^1, \dots, \xi^N\},\quad \sum_{j=1}^n q_j = 1,\quad q_j \ge 0,\ j = 1, \dots, n. \end{aligned} \tag{4.12}$$
The variables to be optimally adjusted here are the support $\eta = \{\eta^1, \dots, \eta^n\}$ and the probability weights $q = (q_1, \dots, q_n)$ of the reduced measure Q; together they define Q via
$$Q = \sum_{j=1}^n q_j\,\delta_{\eta^j}. \tag{4.13}$$
The optimization problem (4.12) may be decomposed into an outer problem for determining supp Q = η and an inner problem for choosing the probabilities q. To formalize this, we denote by $\alpha_{\mathcal{B}}(P, (\eta, q))$ the $\mathcal{B}$-discrepancy between P and Q, and by $S_n$ the standard simplex in $\mathbb{R}^n$:
$$\alpha_{\mathcal{B}}(P, (\eta, q)) := \alpha_{\mathcal{B}}\Big(\sum_{i=1}^N p_i\,\delta_{\xi^i}, \sum_{j=1}^n q_j\,\delta_{\eta^j}\Big), \qquad S_n := \Big\{q \in \mathbb{R}^n : q_j \ge 0,\ j = 1, \dots, n,\ \sum_{j=1}^n q_j = 1\Big\}.$$
Using this notation, the scenario reduction problem (4.12) can be written as
$$\inf_\eta \Big\{\inf_{q \in S_n} \alpha_{\mathcal{B}}(P, (\eta, q)) : \eta \subset \{\xi^1, \dots, \xi^N\},\ \#\eta = n\Big\}, \tag{4.14}$$
with the inner problem
$$\inf\{\alpha_{\mathcal{B}}(P, (\eta, q)) : q \in S_n\} \tag{4.15}$$
for the fixed support η. The nested problem (4.12) can be tackled by a bilevel approach: in an outer iteration, the support selection is carried out by solving
the combinatorial optimization problem (4.14), whereas in an inner iteration optimal probabilities $q_j$ are determined conditional on the fixed support by solving (4.15). The outer problem (4.14) is related to the so-called k-means problems, specific clustering problems that appear, e.g., in statistics, economics, and data mining. The inner problem (4.15) will turn out to be a (high-dimensional) linear program. Both problems will be discussed further in Sections 4.5 and 4.6. In Section 4.4, we derive upper and lower bounds on the optimal value of (4.12) and discuss some particular cases allowing for an explicit solution.
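The bilevel structure of (4.14) and (4.15) can be illustrated on tiny one-dimensional instances, where the closed convex sets are intervals. The sketch below is only a toy under explicit assumptions: it enumerates all supports η of size n (the outer problem) and, instead of solving the linear program announced for the inner problem, searches a grid of weights with resolution 1/m. It therefore returns an upper bound on the optimal value of (4.12), which is exact only if an optimal weight vector lies on the grid. All names and the grid resolution are our own.

```python
import itertools


def interval_discrepancy(P, Q):
    """alpha_{B_conv}(P, Q) for discrete measures on R (closed convex sets are intervals).

    P and Q map atoms (numbers) to probabilities; intervals whose endpoints are
    atoms of P or Q suffice, since any interval contains the same atoms as one of them.
    """
    pts = sorted(set(P) | set(Q))
    best = 0.0
    for i, a in enumerate(pts):
        for b in pts[i:]:
            dp = sum(w for x, w in P.items() if a <= x <= b)
            dq = sum(w for x, w in Q.items() if a <= x <= b)
            best = max(best, abs(dp - dq))
    return best


def simplex_grid(n, m):
    """All q in S_n whose entries are multiples of 1/m (stars and bars)."""
    for bars in itertools.combinations(range(m + n - 1), n - 1):
        parts, prev = [], -1
        for b in bars + (m + n - 1,):
            parts.append(b - prev - 1)
            prev = b
        yield [k / m for k in parts]


def reduce_scenarios(P, n, m=100):
    """Outer enumeration of supports as in (4.14); inner grid search over S_n
    as a crude stand-in for (4.15). Returns (value, reduced measure)."""
    best_val, best_Q = float("inf"), None
    for eta in itertools.combinations(sorted(P), n):
        for q in simplex_grid(n, m):
            Q = dict(zip(eta, q))
            val = interval_discrepancy(P, Q)
            if val < best_val:
                best_val, best_Q = val, Q
    return best_val, best_Q
```

For the measure of Example 4.4.3 in Section 4.4.1 and n = 2, this search returns a value of (numerically) 0.1. Both loops grow combinatorially, which is why the text develops dedicated methods for the outer and inner problems.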
4.4 Bounds and Specific Solutions
In this section, we study a specific solution for problem (4.12) in the case of the closed set discrepancy $\alpha_{\mathcal{B}} = \alpha_{\mathcal{B}_{cl}}$. Although this solution is not necessarily optimal for other discrepancies, it allows us to establish universal bounds for the optimal value of (4.12) in case of general discrepancies. We call a bound “universal” whenever it depends on the probabilities $p_i$ of the initial measure P, but not on the geometry or the dimension of supp P. Hence, in contrast to the exact solution of (4.12), these bounds are very easy to compute for a quite general class of discrepancies.
4.4.1 Ordered Solution and Upper Bound
Intuitively, approximating the initial measure P by some other measure Q with supp Q ⊂ supp P requires a good approximation of those supporting points of P that have large probability. In this section, we assume, without loss of generality, that $p_1 \ge \dots \ge p_N$. Then, a naive idea for solving (4.12) would be to put in the definition (4.13) of Q:
$$\eta^j := \xi^j,\ j = 1, \dots, n, \qquad q_j := p_j,\ j = 1, \dots, n - 1, \qquad q_n := \sum_{i=n}^N p_i. \tag{4.16}$$
Thus, supp Q consists of those atoms of P that have the largest probability weights; the assignment of probabilities is adopted from the initial measure except for the n-th atom of Q, where the new probability is modified to make all qj sum up to one. Evidently, this simple approximating probability measure Q is feasible in (4.12). In the following, we refer to Q defined by (4.16) as the ordered solution. This distribution may be a poor approximation
of P; however, it realizes a universal (with respect to any discrepancy), easy to calculate upper bound on the optimal value of (4.12), which is actually sharp in case of the closed set discrepancy. This is shown by the following proposition and the subsequent corollary.

Proposition 4.4.1. We assume, without loss of generality, that $p_1 \ge \dots \ge p_N$ and consider a system $\mathcal{B}$ of Borel subsets of $\mathbb{R}^s$. We denote by $\Delta_{\mathcal{B}}$ the optimal value of the scenario reduction problem (4.12). Then the following properties hold true:
$$\Delta_{\mathcal{B}} \le \sum_{i=n+1}^N p_i, \tag{4.17}$$
$$\Delta_{\mathcal{B}_{cl}} = \sum_{i=n+1}^N p_i. \tag{4.18}$$
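Proof. We define Q in (4.13) as the ordered solution according to (4.16). Let $B \in \mathcal{B}$ be arbitrary and put $C := \{\xi^i : i = n, n + 1, \dots, N\}$. Since P and Q coincide on $\{\xi^i : i = 1, \dots, N\} \setminus C$, we obtain
$$|P[B] - Q[B]| = \big|P[B \cap C] - Q[B \cap C] + P[B \setminus C] - Q[B \setminus C]\big| = \big|P[B \cap C] - Q[B \cap C]\big|$$
and, therefore, we may assume without loss of generality that $B \subset C$. Thus, we can write
$$|P[B] - Q[B]| = \Big|\sum_{i=n+1}^N p_i\,\delta_{\xi^i}(B) + (p_n - q_n)\,\delta_{\xi^n}(B)\Big| = \begin{cases} \Big|\sum_{i=n+1}^N p_i\,\delta_{\xi^i}(B) - \sum_{i=n+1}^N p_i\Big| & \text{if } \xi^n \in B,\\[1ex] \sum_{i=n+1}^N p_i\,\delta_{\xi^i}(B) & \text{if } \xi^n \notin B. \end{cases}$$
Due to
$$0 \le \sum_{i=n+1}^N p_i\,\delta_{\xi^i}(B) \le \sum_{i=n+1}^N p_i,$$
we get
$$|P[B] - Q[B]| \le \sum_{i=n+1}^N p_i.$$
Since $B \in \mathcal{B}$ was arbitrary, assertion (4.17) follows:
$$\Delta_{\mathcal{B}} \le \alpha_{\mathcal{B}}(P, Q) = \sup_{B \in \mathcal{B}} |P[B] - Q[B]| \le \sum_{i=n+1}^N p_i.$$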
Concerning assertion (4.18), let Q in (4.13) be any discrete measure which is feasible in problem (4.12) with respect to the discrepancy distance $\alpha_{\mathcal{B}} = \alpha_{\mathcal{B}_{cl}}$. The feasibility of Q in (4.12) implies that $\{\eta^1, \dots, \eta^n\} \subseteq \{\xi^1, \dots, \xi^N\}$. Therefore, $\eta^j = \xi^{i_j}$ for certain selections $i_j \in \{1, \dots, N\}$ and $j = 1, \dots, n$. Since $B := \{\xi^1, \dots, \xi^N\} \setminus \{\eta^1, \dots, \eta^n\}$ is a closed set, we have $B \in \mathcal{B}_{cl}$ and thus
$$\alpha_{\mathcal{B}_{cl}}(P, Q) \ge |P[B] - Q[B]| = \Big|\sum_{i \in \{i_1, \dots, i_n\}} p_i\,\delta_{\xi^i}(B) + \sum_{i \notin \{i_1, \dots, i_n\}} p_i\,\delta_{\xi^i}(B) - \sum_{j=1}^n q_j\,\delta_{\eta^j}(B)\Big| = \sum_{i \notin \{i_1, \dots, i_n\}} p_i \ge \sum_{i=n+1}^N p_i,$$
where in the last inequality the assumed decreasing order of the $p_i$ was exploited. As Q was an arbitrary feasible measure in (4.12), we obtain
$$\Delta_{\mathcal{B}_{cl}} = \inf\{\alpha_{\mathcal{B}_{cl}}(P, Q) : Q \text{ is feasible in (4.12)}\} \ge \sum_{i=n+1}^N p_i.$$
Since the reverse inequality (4.17) holds for arbitrary discrepancies, the identity (4.18) follows immediately.

Corollary 4.4.2. Every probability measure Q satisfying
$$Q = \sum_{j=1}^n q_j\,\delta_{\xi^j} \quad\text{and}\quad q_j \ge p_j,\ j = 1, \dots, n, \tag{4.19}$$
is an optimal solution of problem (4.12) for the closed set discrepancy $\alpha_{\mathcal{B}_{cl}}$. In particular, this is true for the ordered solution defined in (4.16).

Proof. On the one hand, (4.18) shows that $\sum_{i=n+1}^N p_i$ is the optimal value in (4.12) for the closed set discrepancy $\alpha_{\mathcal{B}_{cl}}$. On the other hand, we may proceed as in the proof of (4.17) to show that for any discrepancy $\alpha_{\mathcal{B}}$, each probability measure Q with (4.19) realizes an objective value in (4.12) which is not larger than $\sum_{i=n+1}^N p_i$. Indeed, we consider such a Q fulfilling (4.19) and an arbitrary $B \in \mathcal{B}_{cl}$ to obtain
$$|P[B] - Q[B]| = \Big|\sum_{i \in \{1, \dots, N\}:\ \xi^i \in B} p_i - \sum_{j \in \{1, \dots, n\}:\ \xi^j \in B} q_j\Big| = \Big|\sum_{i \in \{1, \dots, n\}:\ \xi^i \in B} (p_i - q_i) + \sum_{i \in \{n+1, \dots, N\}:\ \xi^i \in B} p_i\Big| \le \sum_{i=n+1}^N p_i.$$
The inequality holds since (4.19) implies that the two sums inside the absolute value have different signs and, furthermore, both $\big|\sum_{i \in \{1, \dots, n\}:\ \xi^i \in B} (p_i - q_i)\big|$ and $\sum_{i \in \{n+1, \dots, N\}:\ \xi^i \in B} p_i$ are not larger than $\sum_{i=n+1}^N p_i$.

The corollary shows that in case of the closed set discrepancy, an explicit solution of problem (4.12) can be found without any effort. Unfortunately, this is no longer true for the weaker discrepancies introduced in Section 4.1. Nevertheless, for those other discrepancies too, one may benefit from the estimate (4.17). For instance, from the (ordered) values of the original probabilities $p_i$, one can directly read off the number of atoms n < N required for the approximating measure Q in order to make the approximation error $\alpha_{\mathcal{B}}(P, Q)$ not exceed a prescribed tolerance ε > 0. In the special case of $p_i = N^{-1}$ ($i = 1, \dots, N$), one derives the condition
$$\frac{n}{N} \ge 1 - \varepsilon.$$
For instance, a tolerance of 10% (ε = 0.1) can then be satisfied if n is at least 90% of N. Of course, such a linear relation between tolerance and size of the distribution is not very satisfactory. Indeed, the second assertion of Proposition 4.4.1 tells us that, in the assumed equi-distributed case, one actually observes this undesirable linear relation for the closed set discrepancy. Consequently, there is some hope that a better behaviour can be observed for the weaker discrepancies, which are more appropriate for the stability of chance-constrained and mixed-integer stochastic programs (cf. Section 4.2). This, however, comes at the price that a simple solution of (4.12) is no longer available and, actually, cannot even be obtained computationally for relevant problem sizes in an exact sense.

The following example complements Corollary 4.4.2 by showing that the ordered solution is generally not optimal for a discrepancy different from $\alpha_{\mathcal{B}_{cl}}$:

Example 4.4.3. We consider the measure P on $\mathbb{R}$ defined by
$$\xi^1 = 1,\ \xi^2 = 3,\ \xi^3 = 2,\ \xi^4 = 4; \qquad p_1 = p_2 = 0.4; \qquad p_3 = p_4 = 0.1,$$
i.e., N = |supp P| = 4. We are looking for an optimally reduced measure Q in problem (4.12) with n = |supp Q| = 2. As far as this is done with respect to the closed set discrepancy $\alpha_{\mathcal{B}_{cl}}$, Corollary 4.4.2 implies the optimality of the ordered solution Q with $\eta^1 = 1$, $\eta^2 = 3$, $q_1 = 0.4$, $q_2 = 0.6$.
Chapter 4. Scenario Reduction with Respect to Discrepancy Distances

By Proposition 4.4.1, $Q$ realizes the minimal discrepancy $\Delta_{\mathcal{B}_{\mathrm{cl}}} = p_3 + p_4 = 0.2$. For the convex set discrepancy $\alpha_{\mathcal{B}_{\mathrm{conv}}}$, this ordered solution realizes the same value $\alpha_{\mathcal{B}_{\mathrm{conv}}}(P, Q) = 0.2$. However, for the reduced measure $Q^*$ defined by $\eta^1 = 1$, $\eta^2 = 3$, $q_1 = 0.5$, $q_2 = 0.5$, one obtains $\alpha_{\mathcal{B}_{\mathrm{conv}}}(P, Q^*) = 0.1$. Consequently, the ordered solution is not optimal in (4.12) with respect to $\alpha_{\mathcal{B}_{\mathrm{conv}}}$. At the same time, this is an example of a strict inequality in (4.18).
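As a quick plausibility check of Example 4.4.3, the following Python sketch evaluates the convex set discrepancy on $\mathbb{R}$ by brute force. It relies on the fact that the convex subsets of $\mathbb{R}$ are the intervals and that an interval meets a finite set of atoms in a run of consecutive atoms (in sorted order), so it suffices to scan all such runs. The helper name `interval_discrepancy` is our own, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def interval_discrepancy(atoms, p, q):
    """sup over intervals I of |P(I) - Q(I)| for two measures on a common finite set of atoms.

    atoms: distinct reals; p, q: masses of P and Q on these atoms (q may contain zeros).
    Every interval meets the atoms in a run of consecutive atoms, so we scan all runs.
    """
    order = sorted(range(len(atoms)), key=atoms.__getitem__)
    best = Fraction(0)
    for a in range(len(order)):
        diff = Fraction(0)
        for b in range(a, len(order)):
            diff += p[order[b]] - q[order[b]]   # P - Q on the run order[a..b]
            best = max(best, abs(diff))
    return best

atoms = [1, 3, 2, 4]                                        # xi^1, ..., xi^4
p = [Fraction(4, 10), Fraction(4, 10), Fraction(1, 10), Fraction(1, 10)]
ordered = [Fraction(4, 10), Fraction(6, 10), 0, 0]          # ordered solution Q
balanced = [Fraction(5, 10), Fraction(5, 10), 0, 0]         # Q*
print(interval_discrepancy(atoms, p, ordered))   # 1/5
print(interval_discrepancy(atoms, p, balanced))  # 1/10
```

The two printed values reproduce $\alpha_{\mathcal{B}_{\mathrm{conv}}}(P, Q) = 0.2$ and $\alpha_{\mathcal{B}_{\mathrm{conv}}}(P, Q^*) = 0.1$.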
4.4.2 Lower Bound
In this section, we derive a universal lower bound on the optimal value of problem (4.12). For this purpose, we will rely on the following property.

Definition 4.4.4. A system $\mathcal{B}$ of Borel subsets of $\mathbb{R}^s$ is called “isolating” if for any finite subset $\{x^1, \dots, x^p\} \subseteq \mathbb{R}^s$ there exist sets $B^i \in \mathcal{B}$ for $i = 1, \dots, p$ with $B^i \cap \{x^1, \dots, x^p\} = \{x^i\}$, $i = 1, \dots, p$.

Clearly, the systems $\mathcal{B}_{\mathrm{rect}}$, $\mathcal{B}_{\mathrm{conv}}$, $\mathcal{B}_{\mathrm{cl}}$, $\mathcal{B}_{\mathrm{ph},W}$ and $\mathcal{B}_{\mathrm{ph},k}$ (for $k \ge 2s$) considered in Section 4.1 are isolating, whereas $\mathcal{B}_{\mathrm{cell}}$, for instance, is not.

Theorem 5. Let $\mathcal{B}$ be an isolating system of Borel subsets of $\mathbb{R}^s$. We assume that the $p_i$ are decreasingly ordered and that $n < N$. Then the optimal value $\Delta_{\mathcal{B}}$ of problem (4.12) can be bounded from below by the following estimate:
$$\Delta_{\mathcal{B}} \ge \max\Big\{ p_{n+1},\ \frac{1}{n} \sum_{i=n+1}^{N} p_i \Big\}.$$
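Both the lower bound of Theorem 5 and the tail sum $\sum_{i=n+1}^N p_i$, which according to (4.17) is not exceeded by the ordered solution, depend only on the sorted probabilities. The following Python sketch (the function names are illustrative, not taken from the text) computes these two bounds and, for a tolerance $\varepsilon$, the smallest $n$ whose tail sum does not exceed $\varepsilon$.

```python
from fractions import Fraction

def reduction_bounds(p, n):
    """Return (lower, upper) bounds on the optimal value of (4.12) for n < N atoms.

    lower: Theorem 5 (valid for isolating systems B);
    upper: sum_{i=n+1}^N p_i, not exceeded by the ordered solution, cf. (4.17).
    """
    q = sorted(p, reverse=True)          # decreasing order p_1 >= ... >= p_N
    tail = sum(q[n:], Fraction(0))       # sum_{i=n+1}^N p_i
    lower = max(q[n], tail / n)          # max{p_{n+1}, tail / n}
    return lower, tail

def atoms_for_tolerance(p, eps):
    """Smallest n such that the tail sum of the ordered solution does not exceed eps."""
    q = sorted(p, reverse=True)
    tail = sum(q, Fraction(0))
    for n in range(1, len(q) + 1):
        tail -= q[n - 1]
        if tail <= eps:
            return n

p = [Fraction(4, 10), Fraction(4, 10), Fraction(1, 10), Fraction(1, 10)]  # Example 4.4.3
print(reduction_bounds(p, 2))                                        # (Fraction(1, 10), Fraction(1, 5))
print(atoms_for_tolerance([Fraction(1, 10)] * 10, Fraction(1, 10)))  # 9
```

For the data of Example 4.4.3, the lower bound is $0.1$, which is the value attained by $Q^*$ there; since $\mathcal{B}_{\mathrm{conv}}$ is isolating, this shows that $Q^*$ is optimal for the convex set discrepancy. For ten equally weighted atoms and $\varepsilon = 0.1$, the sketch returns $n = 9$, in line with the condition $n/N \ge 1 - \varepsilon$.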
Proof. Each measure $Q$ with (4.13) that is feasible in problem (4.12) induces an injective selection mapping $\sigma: \{1, \dots, n\} \to \{1, \dots, N\}$ defined by
$$\eta^i = \xi^{\sigma(i)}, \quad i = 1, \dots, n.$$
Applying the isolating property of $\mathcal{B}$ to the support $\{\xi^1, \dots, \xi^N\}$ of $P$, we derive the existence of sets $B^i \in \mathcal{B}$ for $i = 1, \dots, N$ such that
$$B^i \cap \{\xi^1, \dots, \xi^N\} = \{\xi^i\}, \quad i = 1, \dots, N.$$
Then
$$\begin{aligned} P[B^{\sigma(i)}] - Q[B^{\sigma(i)}] &= P[\{\xi^{\sigma(i)}\}] - Q[\{\eta^i\}] = p_{\sigma(i)} - q_i, && i = 1, \dots, n,\\ P[B^i] - Q[B^i] &= p_i, && i \in C_\sigma, \end{aligned}$$
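where $C_\sigma := \{1, \dots, N\} \setminus \{\sigma(1), \dots, \sigma(n)\}$. It follows for the discrepancy that
$$\alpha_{\mathcal{B}}(P, Q) \ge \max_{i=1,\dots,N} \big| P[B^i] - Q[B^i] \big| = \max\Big\{ \max_{i \in C_\sigma} p_i,\ \max_{i=1,\dots,n} \big| p_{\sigma(i)} - q_i \big| \Big\}.$$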
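Here, varying $Q$ among the feasible measures in (4.12) amounts to varying the selection mapping $\sigma$ and the coefficients $q_i \ge 0$ subject to the constraint $\sum_{i=1}^n q_i = 1$. This allows us to write
$$\Delta_{\mathcal{B}} = \inf\{\alpha_{\mathcal{B}}(P, Q) : Q \text{ feasible in (4.12)}\} \ge \inf\{\varphi(\sigma) : \sigma: \{1, \dots, n\} \to \{1, \dots, N\} \text{ injective}\}, \tag{4.20}$$
where
$$\varphi(\sigma) := \max\Big\{ \max_{i \in C_\sigma} p_i,\ \psi(\sigma) \Big\}, \qquad \psi(\sigma) := \inf\Big\{ \max_{i=1,\dots,n} \big| p_{\sigma(i)} - q_i \big| : q_i \ge 0\ (i = 1, \dots, n),\ \sum_{i=1}^n q_i = 1 \Big\}.$$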
Next, we want to derive an explicit expression for $\psi(\sigma)$. Since, by assumption, $p_i > 0$ for $i = 1, \dots, N$ and $n < N$, it follows that
$$\gamma := \sum_{i=1}^n p_{\sigma(i)} < 1.$$
Note that the infimum in the definition of $\psi(\sigma)$ is always attained as a minimum. We claim that $\hat{q} \in \mathbb{R}^n$ defined by
$$\hat{q}_i := p_{\sigma(i)} + n^{-1}(1 - \gamma) \quad (i = 1, \dots, n) \tag{4.21}$$
provides this minimum. We have $\hat{q}_i \ge 0$ for $i = 1, \dots, n$ due to $\gamma < 1$, and $\sum_{i=1}^n \hat{q}_i = 1$; hence, $\hat{q}$ is feasible in the definition of $\psi(\sigma)$. Now, let $q \in \mathbb{R}^n$ be any other feasible $n$-tuple. Then, by
$$\sum_{i=1}^n q_i = \sum_{i=1}^n \hat{q}_i = 1,$$
it is excluded that $q_i < \hat{q}_i$ holds true for all $i = 1, \dots, n$. Consequently, there exists some $k \in \{1, \dots, n\}$ with $q_k \ge \hat{q}_k$. From the relation $\hat{q}_k \ge p_{\sigma(k)}$ (see (4.21)), one derives that $|p_{\sigma(k)} - q_k| \ge |p_{\sigma(k)} - \hat{q}_k|$. Thus,
$$\max_{i=1,\dots,n} \big| p_{\sigma(i)} - q_i \big| \ge \big| p_{\sigma(k)} - \hat{q}_k \big| = n^{-1}(1 - \gamma) = \max_{i=1,\dots,n} \big| p_{\sigma(i)} - \hat{q}_i \big|.$$
This shows that $\hat{q}$ indeed realizes the infimum in the definition of $\psi(\sigma)$, and so, by (4.21) and by the definition of $C_\sigma$, one gets
$$\psi(\sigma) = n^{-1}(1 - \gamma) = n^{-1}\Big(1 - \sum_{i=1}^n p_{\sigma(i)}\Big) = n^{-1} \sum_{i \in C_\sigma} p_i.$$
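The closed-form minimizer (4.21) is easy to evaluate. The following Python sketch (hypothetical helper names `redistribute` and `phi`) computes $\hat q$ and $\psi(\sigma)$ for a given selection and, for a small instance, minimizes $\varphi(\sigma)$ over all selections by enumeration. Since $\varphi(\sigma)$ depends only on the image of $\sigma$, it suffices to enumerate subsets.

```python
from fractions import Fraction
from itertools import combinations

def redistribute(p, sel):
    """q_hat of (4.21) and psi(sigma) for the selected atoms sel = (sigma(1), ..., sigma(n))."""
    gamma = sum((p[i] for i in sel), Fraction(0))   # gamma = sum_i p_{sigma(i)} < 1
    shift = (1 - gamma) / len(sel)                  # n^{-1} (1 - gamma)
    return [p[i] + shift for i in sel], shift       # psi(sigma) equals this shift

def phi(p, sel):
    """phi(sigma) = max{ max of the unselected p_i, psi(sigma) } from the proof of Theorem 5."""
    rest = [p[i] for i in range(len(p)) if i not in sel]
    return max(max(rest), redistribute(p, sel)[1])

p = [Fraction(4, 10), Fraction(4, 10), Fraction(1, 10), Fraction(1, 10)]  # Example 4.4.3
print(redistribute(p, (0, 1)))  # ([Fraction(1, 2), Fraction(1, 2)], Fraction(1, 10))
print(min(phi(p, sel) for sel in combinations(range(4), 2)))  # 1/10
```

With the probabilities of Example 4.4.3 and the selection $\{\xi^1, \xi^2\}$, the sketch yields $\hat q = (0.5, 0.5)$, i.e., exactly the measure $Q^*$ of that example, and the minimum of $\varphi$ equals the lower bound of Theorem 5.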
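Now, we continue (4.20) as
$$\Delta_{\mathcal{B}} \ge \inf\Big\{ \max\Big\{ \max_{i \in C_\sigma} p_i,\ n^{-1} \sum_{i \in C_\sigma} p_i \Big\} : \sigma: \{1, \dots, n\} \to \{1, \dots, N\} \text{ injective} \Big\}.$$
Identifying each selection $\sigma$ in this relation with its image, a subset of $\{1, \dots, N\}$ of cardinality $n$, and hence with the complement $C_\sigma$ of cardinality $N - n$, one obtains the reformulation
$$\Delta_{\mathcal{B}} \ge \inf\Big\{ \max\Big\{ \max_{i \in A} p_i,\ n^{-1} \sum_{i \in A} p_i \Big\} : A \subseteq \{1, \dots, N\},\ \#A = N - n \Big\}.$$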
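As the $p_i$ are decreasingly ordered, both expressions
$$\max_{i \in A} p_i \quad\text{and}\quad n^{-1} \sum_{i \in A} p_i$$
are simultaneously minimized by the set $A^* := \{n+1, \dots, N\}$. Therefore,
$$\Delta_{\mathcal{B}} \ge \max\Big\{ \max_{i \in \{n+1,\dots,N\}} p_i,\ n^{-1} \sum_{i=n+1}^{N} p_i \Big\}.$$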
Now, the assertion of the theorem follows immediately from $\max\{p_i : i \in \{n+1, \dots, N\}\} = p_{n+1}$.

Remark 4.4.5. The lower bound from Theorem 5 can be interpreted as follows. Let us consider an arbitrary reduced measure $Q$. Since the system $\mathcal{B}$ is isolating, the $\mathcal{B}$-discrepancy between $P$ and $Q$ is not smaller than the maximal difference of $P$ and $Q$ on a singleton. Over all common mass points of $P$ and $Q$, this maximum is at least $n^{-1} \sum_{i=n+1}^N p_i$; over all points without $Q$-mass, it is not less than $p_{n+1}$.

Corollary 4.4.6. Under the assumptions of Theorem 5, the following holds true:
(1) If $n \ge N/2$, then the lower bound in Theorem 5 reduces to $p_{n+1}$.
(2) If $n = 1$, then $\Delta_{\mathcal{B}} = 1 - p_1$, and an optimal solution of (4.12) is given by the measure $Q = \delta_{\xi^1}$, placing unit mass on the atom with maximum probability under the original measure $P$. If $n = N - 1$, then $\Delta_{\mathcal{B}} = p_N$, and any measure $Q_j$ of the form
$$Q_j = \sum_{i=1,\, i \ne j}^{N-1} p_i \delta_{\xi^i} + (p_j + p_N)\delta_{\xi^j}, \quad j \in \{1, \dots, N-1\},$$
is an optimal solution of (4.12).
Proof. If $n \ge N/2$, the decreasing order of the $p_i$ implies
$$n p_{n+1} \ge (N - n) p_{n+1} \ge \sum_{i=n+1}^N p_i.$$
Applying this estimate to the lower bound of Theorem 5 immediately yields the asserted property (1). In both cases $n = 1$ and $n = N - 1$, Theorem 5 provides that $\Delta_{\mathcal{B}} \ge \sum_{i=n+1}^N p_i$. Now, the upper bound in Proposition 4.4.1 turns this inequality into an equality:
$$\Delta_{\mathcal{B}} = \sum_{i=n+1}^N p_i = \begin{cases} 1 - p_1 & \text{if } n = 1,\\ p_N & \text{if } n = N - 1. \end{cases}$$
From the proof of (4.17) we know that the ordered solution always realizes a discrepancy not larger than $\sum_{i=n+1}^N p_i$, and this value was just shown to be optimal for $n = 1$ and $n = N - 1$. Consequently, the ordered solution must be optimal in these cases. For $n = 1$, the ordered solution places unit mass on the atom with the highest probability under the original measure $P$. For $n = N - 1$, the ordered solution corresponds to the measure $Q_{N-1}$. Moreover, by construction, any measure $Q_j$, $j \in \{1, \dots, N-1\}$, realizes a discrepancy of $p_N$ and is therefore optimal, too.

Unfortunately, the results of Corollary 4.4.6 are lost for the cell discrepancy $\alpha_{\mathcal{B}_{\mathrm{cell}}}$, since $\mathcal{B}_{\mathrm{cell}}$ is not isolating. This is shown by the following simple example.

Example 4.4.7. Consider the probability measure $P = \sum_{i=1}^3 p_i \delta_{\xi^i}$ on $\mathbb{R}$ with ordered probabilities $p_1 \ge p_2 \ge p_3$ and $\xi^1 = 1$, $\xi^2 = 0$, $\xi^3 = 2$. For $n = 1$, we obtain $\Delta_{\mathcal{B}_{\mathrm{cell}}} = \alpha_{\mathcal{B}_{\mathrm{cell}}}(P, \delta_{\xi^1}) = p_2$, which shows that the assertion of Corollary 4.4.6 does not hold in this case.
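Example 4.4.7 can be checked numerically. In one dimension, the cell discrepancy is the supremum distance between distribution functions, and for measures on a common finite set of atoms it only needs to be evaluated at the atoms. The probabilities $p = (0.5, 0.3, 0.2)$ in the following Python sketch are an arbitrary illustrative choice satisfying $p_1 \ge p_2 \ge p_3$.

```python
from fractions import Fraction

def cell_discrepancy(atoms, p, q):
    """sup_x |F_P(x) - F_Q(x)| for two measures on a common finite set of real atoms."""
    best, fp, fq = Fraction(0), Fraction(0), Fraction(0)
    for i in sorted(range(len(atoms)), key=atoms.__getitem__):
        fp += p[i]   # F_P evaluated at the current atom
        fq += q[i]   # F_Q evaluated at the current atom
        best = max(best, abs(fp - fq))
    return best

atoms = [1, 0, 2]                                      # xi^1, xi^2, xi^3
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # illustrative p1 >= p2 >= p3
for j in range(3):
    dirac = [1 if i == j else 0 for i in range(3)]     # Q = delta_{xi^{j+1}}
    print(j + 1, cell_discrepancy(atoms, p, dirac))
# 1 3/10
# 2 7/10
# 3 4/5
```

All three one-point measures are feasible for $n = 1$, so $\Delta_{\mathcal{B}_{\mathrm{cell}}} = p_2 = 0.3$, which is strictly smaller than $1 - p_1 = 0.5$.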
4.5 The Inner Problem
In this section, we consider the inner problem (4.15) of optimizing the reduced probability distribution conditional on a fixed (reduced) support. Without loss of generality, we may assume² $\{\eta^1, \dots, \eta^n\} = \{\xi^1, \dots, \xi^n\}$. Then, problem (4.15) is of the form
$$\text{minimize } \alpha_{\mathcal{B}}\big(P, (\{\xi^1, \dots, \xi^n\}, q)\big) \quad \text{subject to} \quad q \in S_n. \tag{4.22}$$
² The assumption of ordered probabilities $p_i$ from the previous section is no longer maintained here.
4.5.1 Critical Index Sets
In the following, we reformulate (4.22) as a linear optimization problem. To this end, we define for $B \in \mathcal{B}$ the critical index set $I(B) \subseteq \{1, \dots, N\}$ by $I(B) := \{i \in \{1, \dots, N\} : \xi^i \in B\}$. Then we can write
$$|P[B] - Q[B]| = \Big| \sum_{i \in I(B)} p_i - \sum_{j \in I(B) \cap \{1, \dots, n\}} q_j \Big|. \tag{4.23}$$
Obviously, this value does not depend on the concrete structure of the set $B$ but is uniquely determined by the index set $I(B)$. Consequently, for calculating the discrepancy $\alpha_{\mathcal{B}}(P, Q)$, it suffices to know all critical index sets that may occur when $B$ varies in $\mathcal{B}$. We define the system of critical index sets as
$$\mathcal{I}_{\mathcal{B}} := \{I(B) \subseteq \{1, \dots, N\} : B \in \mathcal{B}\}. \tag{4.24}$$
It is easy to see that for the closed set discrepancy one has $\mathcal{I}_{\mathcal{B}_{\mathrm{cl}}} = 2^{\{1, \dots, N\}}$. However, for the other systems $\mathcal{B}$ considered in Section 4.1, the inclusion $\mathcal{I}_{\mathcal{B}} \subset 2^{\{1, \dots, N\}}$ will be strict, in general. As soon as the system $\mathcal{I}_{\mathcal{B}}$ of critical index sets is known for some concrete $\mathcal{B}$, the discrepancy between $P$ and $Q$ may be written according to (4.23) as
$$\alpha_{\mathcal{B}}(P, Q) = \max_{I \in \mathcal{I}_{\mathcal{B}}} \Big| \sum_{i \in I} p_i - \sum_{j \in I \cap \{1, \dots, n\}} q_j \Big|. \tag{4.25}$$
We recall the well-known fact that minimizing a function $\max_I |f_I(x)|$ in terms of the variable $x$ is equivalent to minimizing $t$ subject to the constraints $f_I(x) \le t$ and $-f_I(x) \le t$ for all $I$, in terms of the variables $(x, t)$. This allows us to write (4.22) as the following linear optimization problem:
$$\text{minimize } t \quad \text{subject to} \quad q \in S_n,\quad \left.\begin{aligned} -\sum_{j \in I \cap \{1, \dots, n\}} q_j &\le t - \sum_{i \in I} p_i\\ \sum_{j \in I \cap \{1, \dots, n\}} q_j &\le t + \sum_{i \in I} p_i \end{aligned}\right\}\ I \in \mathcal{I}_{\mathcal{B}}. \tag{4.26}$$
The variables to be optimized here are $t$ and $q_j$, $j = 1, \dots, n$. If $(q^*, t^*)$ is an optimal solution of (4.26), then $q^*$ is an optimal solution of the original problem (4.22), and $t^*$ is the optimal value attained by $q^*$ in (4.22). Unfortunately, the dimension of (4.26) is, in general, too large to allow for a numerical solution. Indeed, since $\mathcal{I}_{\mathcal{B}_{\mathrm{cl}}} = 2^{\{1, \dots, N\}}$, as observed above, the number of constraints in (4.26) may amount to $2^{N+1} + n + 1$.
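To make (4.24) and (4.25) concrete, the following Python sketch builds the system of critical index sets for the family of all intervals of $\mathbb{R}$ (an assumption chosen for simplicity; in dimension one these are exactly the convex sets) and evaluates the discrepancy through formula (4.25). For distinct atoms there are $N(N+1)/2 + 1$ such index sets instead of $2^N$. As in (4.22), the kept support is assumed to consist of the first $n$ atoms (0-based indices $0, \dots, n-1$); the function names are our own.

```python
from fractions import Fraction
from itertools import combinations

def interval_index_sets(atoms):
    """Critical index sets (4.24) for the family of all intervals of the real line."""
    order = sorted(range(len(atoms)), key=atoms.__getitem__)
    sets = {frozenset()}                          # an interval may miss all atoms
    for a, b in combinations(range(len(order) + 1), 2):
        sets.add(frozenset(order[a:b]))           # consecutive atoms order[a], ..., order[b-1]
    return sets

def discrepancy_425(index_sets, p, q):
    """Formula (4.25); the kept support has 0-based indices 0..n-1 with n = len(q)."""
    n = len(q)
    return max(abs(sum(p[i] for i in I) - sum(q[j] for j in I if j < n))
               for I in index_sets)

atoms = [1, 3, 2, 4]                              # data of Example 4.4.3
p = [Fraction(4, 10), Fraction(4, 10), Fraction(1, 10), Fraction(1, 10)]
sets = interval_index_sets(atoms)
print(len(sets), 2 ** len(atoms))                 # 11 16
print(2 * len(sets) + 2 + 1)                      # 25 constraints in (4.26) for n = 2
print(discrepancy_425(sets, p, [Fraction(4, 10), Fraction(6, 10)]))  # 1/5
print(discrepancy_425(sets, p, [Fraction(1, 2), Fraction(1, 2)]))    # 1/10
```

The constraint count uses two inequalities per index set plus the $n + 1$ constraints describing $S_n$, which reproduces $2^{N+1} + n + 1$ when all subsets occur. The two discrepancy values agree with Example 4.4.3.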
4.5.2 Reduced Critical Index Sets
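While the large number of constraints in (4.26) is a drawback for a direct numerical approach, we observe that the left-hand sides of many inequalities coincide. Indeed, many different index sets $I \in \mathcal{I}_{\mathcal{B}}$ may lead to the same intersection $I \cap \{1, \dots, n\}$, and the only term that then varies between those sets $I$ is the right-hand side of the inequalities in (4.26). Consequently, one may pass to the minimum of these right-hand sides corresponding to one and the same intersection $I \cap \{1, \dots, n\}$. To this end, we introduce the reduced system of critical index sets as
$$\mathcal{I}^*_{\mathcal{B}} := \{I \cap \{1, \dots, n\} : I \in \mathcal{I}_{\mathcal{B}}\}. \tag{4.27}$$
Each member $J \in \mathcal{I}^*_{\mathcal{B}}$ of the reduced system generates a set $\varphi(J)$ of members of the original system $\mathcal{I}_{\mathcal{B}}$, all of which share the same intersection with $\{1, \dots, n\}$:
$$\varphi(J) := \{I \in \mathcal{I}_{\mathcal{B}} : J = I \cap \{1, \dots, n\}\}, \quad J \in \mathcal{I}^*_{\mathcal{B}}. \tag{4.28}$$
Now, introducing the quantities
$$\overline{\gamma}^J := \max_{I \in \varphi(J)} \sum_{i \in I} p_i \quad\text{and}\quad \underline{\gamma}_J := \min_{I \in \varphi(J)} \sum_{i \in I} p_i \qquad (J \in \mathcal{I}^*_{\mathcal{B}}), \tag{4.29}$$
problem (4.26) may be rewritten as
$$\text{minimize } t \quad \text{subject to} \quad q \in S_n, \tag{4.30}$$
$$\left.\begin{aligned} -\sum_{j \in J} q_j &\le t - \overline{\gamma}^J\\ \sum_{j \in J} q_j &\le t + \underline{\gamma}_J \end{aligned}\right\}\quad J \in \mathcal{I}^*_{\mathcal{B}}. \tag{4.31}$$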
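The passage from (4.26) to (4.30) only requires grouping the index sets by their intersection with $\{1, \dots, n\}$. The following Python sketch (hypothetical names; the interval system on $\mathbb{R}$ and the data of Example 4.4.3 are assumptions made for illustration) computes the reduced system (4.27), the coefficients (4.29), the lower bound $\frac12 \max_J (\overline{\gamma}^J - \underline{\gamma}_J)$ from the sufficient optimality condition of this section, and, for fixed weights $q$, the smallest $t$ satisfying the constraints of (4.30), which coincides with the discrepancy (4.25).

```python
from fractions import Fraction
from itertools import combinations

def reduced_system(index_sets, p, n):
    """Reduced system (4.27) with coefficients (4.29), keyed by J = I ∩ {1, ..., n}."""
    up, lo = {}, {}
    for I in index_sets:
        J = frozenset(i for i in I if i < n)       # kept scenarios: 0-based indices 0..n-1
        mass = sum((p[i] for i in I), Fraction(0))
        up[J] = max(up.get(J, mass), mass)         # overline gamma^J
        lo[J] = min(lo.get(J, mass), mass)         # underline gamma_J
    return up, lo

def objective(up, lo, q):
    """Smallest t feasible in the constraints of (4.30) for fixed q; equals (4.25)."""
    return max(max(up[J] - sum(q[j] for j in J), sum(q[j] for j in J) - lo[J])
               for J in up)

atoms = [1, 3, 2, 4]                               # Example 4.4.3, kept: xi^1 = 1, xi^2 = 3
p = [Fraction(4, 10), Fraction(4, 10), Fraction(1, 10), Fraction(1, 10)]
order = sorted(range(len(atoms)), key=atoms.__getitem__)
index_sets = {frozenset()} | {frozenset(order[a:b])
                              for a, b in combinations(range(len(atoms) + 1), 2)}

up, lo = reduced_system(index_sets, p, n=2)
print(len(index_sets), len(up))                              # 11 4
print(max(up[J] - lo[J] for J in up) / 2)                    # 1/10
print(objective(up, lo, [Fraction(1, 2), Fraction(1, 2)]))   # 1/10
```

Here eleven index sets collapse to four members of $\mathcal{I}^*_{\mathcal{B}}$, and the lower bound $0.1$ is attained by $q = (0.5, 0.5)$, so this $q$ is optimal.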
This indeed corresponds to passing to the minimum on the right-hand sides of the inequalities in (4.26). Since every $J \in \mathcal{I}^*_{\mathcal{B}}$ is a subset of $\{1, \dots, n\}$, the number of inequalities in (4.30) is not larger than $2^{n+1} + n + 1$. Having in mind that $n \ll N$ often holds in scenario reduction problems, this results in a drastic reduction of the size of the linear optimization problem (4.26).

A Sufficient Optimality Condition. A condition on $q \in S_n$ ensuring optimality in (4.30) can be obtained by the following considerations. Adding the two linear constraints for each $J \in \mathcal{I}^*_{\mathcal{B}}$ shows that every feasible $t$ of problem (4.30) satisfies $t \ge \frac12(\overline{\gamma}^J - \underline{\gamma}_J)$; thus, one obtains the lower bound
$$\frac12 \max_{J \in \mathcal{I}^*_{\mathcal{B}}} \big(\overline{\gamma}^J - \underline{\gamma}_J\big) \le \inf_{q \in S_n} \alpha_{\mathcal{B}}\big(P, (\{\xi^1, \dots, \xi^n\}, q)\big).$$
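Equality holds if and only if there exists a $q^* \in S_n$ satisfying the inequalities in (4.30) with $t$ equal to this threshold, i.e.,
$$\overline{\gamma}^J - \frac12 \max_{J' \in \mathcal{I}^*_{\mathcal{B}}} \big(\overline{\gamma}^{J'} - \underline{\gamma}_{J'}\big) \le \sum_{j \in J} q^*_j \le \frac12 \max_{J' \in \mathcal{I}^*_{\mathcal{B}}} \big(\overline{\gamma}^{J'} - \underline{\gamma}_{J'}\big) + \underline{\gamma}_J \qquad (J \in \mathcal{I}^*_{\mathcal{B}}). \tag{4.32}$$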
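Consequently, such a $q^*$ would be optimal for (4.30). In particular, evaluating (4.32) at $J_* \in \operatorname{argmax}_{J \in \mathcal{I}^*_{\mathcal{B}}} (\overline{\gamma}^J - \underline{\gamma}_J)$ yields
$$\sum_{j \in J_*} q^*_j = \frac12\big(\overline{\gamma}^{J_*} + \underline{\gamma}_{J_*}\big).$$

4.5.3 Determining the Coefficients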
The main challenge in solving (4.30) is not the solution of the linear program itself but the computational determination of the reduced critical index set $\mathcal{I}^*_{\mathcal{B}}$ and of the coefficients $\overline{\gamma}^J$ and $\underline{\gamma}_J$ introduced in (4.29). These terms depend on the chosen system $\mathcal{B}$ of Borel subsets, and, to the best of our understanding, there is no general approach available for their determination. In the following, we show that in the case of the polyhedral discrepancy $\alpha_{\mathcal{B}_{\mathrm{ph},W}}$, the values $\mathcal{I}^*_{\mathcal{B}}$, $\overline{\gamma}^J$, and $\underline{\gamma}_J$ of problem (4.30) can be related to a class of supporting polyhedra, which is, at least to some degree, numerically accessible. In particular, our approach also covers the cell discrepancy $\alpha_{\mathcal{B}_{\mathrm{cell}}}$ and the rectangular discrepancy $\alpha_{\mathcal{B}_{\mathrm{rect}}}$. Given the recourse matrix $W$, we consider $k$ pairwise linearly independent vectors $m_1, \dots, m_k$ in $\mathbb{R}^s$ such that every facet of $\operatorname{pos} W$ and of the unit cube $[0, 1]^s$ has $m_i$ as a normal vector for one (and only one) $i \in \{1, \dots, k\}$. The $(k \times s)$-matrix whose rows are given by $m_1, \dots, m_k$ is denoted by $M$. Then, by definition of $\mathcal{B}_{\mathrm{ph},W}$, every polyhedron $B$ in $\mathcal{B}_{\mathrm{ph},W}$ can be written as
$$B = \{x \in \mathbb{R}^s : \underline{a}^B \le M x \le \overline{a}^B\} =: [\![\underline{a}^B, \overline{a}^B]\!] \tag{4.33}$$
with suitable vectors $\underline{a}^B, \overline{a}^B \in \overline{\mathbb{R}}^k$ satisfying $\underline{a}^B_i \le \overline{a}^B_i$ for $i = 1, \dots, k$. While the system $\mathcal{B}_{\mathrm{ph},W}$ is not countable, we will show that it suffices to consider the following finite system of supporting polyhedra. Loosely speaking, a polyhedron $B \in \mathcal{B}_{\mathrm{ph},W}$ is called supporting if each of its facets contains an element of $\{\xi^1, \dots, \xi^n\}$ in such a way that $B$ cannot be enlarged without changing the intersection of $B$'s interior with $\{\xi^1, \dots, \xi^n\}$, see Figure 4.2. This is formalized by the following definitions. We introduce the sets
$$R_j := \big\{\langle m_j, \xi^i \rangle : i = 1, \dots, n\big\} \cup \{\infty, -\infty\},\quad j = 1, \dots, k, \qquad\text{and}\qquad R := \times_{j=1}^k R_j. \tag{4.34}$$
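Then, every polyhedron $B = [\![\underline{a}^B, \overline{a}^B]\!] \in \mathcal{B}_{\mathrm{ph},W}$ with
$$\underline{a}^B, \overline{a}^B \in R \tag{4.35}$$
admits a further representation by two vectors $\underline{i} = (\underline{i}_1, \dots, \underline{i}_k)$, $\overline{i} = (\overline{i}_1, \dots, \overline{i}_k)$ with $\underline{i}_j, \overline{i}_j \in \{1, \dots, n, \pm\infty\}$ for $j = 1, \dots, k$:
$$\underline{a}^B_j = \langle m_j, \xi^{\underline{i}_j} \rangle \quad\text{and}\quad \overline{a}^B_j = \langle m_j, \xi^{\overline{i}_j} \rangle,$$
where, for notational convenience, we set $\langle m_j, \xi^{\pm\infty} \rangle := \pm\infty$, respectively. Condition (4.35) ensures that every hyperplane in $\mathbb{R}^s$ containing a facet of $B$ also contains an element of $\{\xi^1, \dots, \xi^n\}$.

Definition 4.5.1. A polyhedron $B = [\![\underline{a}^B, \overline{a}^B]\!] \in \mathcal{B}_{\mathrm{ph},W}$ with (4.35) is called “supporting” if it admits a representation $\underline{i}, \overline{i}$ such that for every $j, l \in \{1, \dots, k\}$, $j \ne l$, the following relations hold true: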
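$$\begin{aligned} \langle m_j, \xi^{\underline{i}_j} \rangle < \langle m_j, \xi^{\underline{i}_l} \rangle < \langle m_j, \xi^{\overline{i}_j} \rangle &\quad \text{whenever } \underline{i}_l \ne \pm\infty, \text{ and}\\ \langle m_j, \xi^{\underline{i}_j} \rangle < \langle m_j, \xi^{\overline{i}_l} \rangle < \langle m_j, \xi^{\overline{i}_j} \rangle &\quad \text{whenever } \overline{i}_l \ne \pm\infty. \end{aligned} \tag{4.36}$$
The set of all supporting polyhedra is denoted by $\mathcal{P}$.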
Figure 4.2: Left side: a non-supporting polyhedron. Right side: a supporting polyhedron. The dots represent the remaining scenarios $\xi^1, \dots, \xi^n$.

The following proposition shows that every critical index set $J$ corresponds to a supporting polyhedron $B$ whose interior does not contain any $\xi^i$ with $i \in \{1, \dots, n\} \setminus J$ and which has maximal $P$-probability.
Proposition 4.5.2. For any $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ there exists a supporting polyhedron $B$ such that $\overline{\gamma}^J = P[\operatorname{int} B]$ and
$$\bigcup_{j \in J} \{\xi^j\} = \{\xi^1, \dots, \xi^n\} \cap \operatorname{int} B. \tag{4.37}$$
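Proof. We consider an arbitrary $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$. From the definition of $\varphi(J)$ in (4.28), it follows that for any $I \in \varphi(J)$ there exists some $C \in \mathcal{B}_{\mathrm{ph},W}$ such that $I = I(C)$ and $J = I(C) \cap \{1, \dots, n\}$. By definition of $I(C)$, we have
$$\sum_{i \in I} p_i = \sum_{i \in I(C)} p_i = P[C],$$
and, hence,
$$\begin{aligned} \overline{\gamma}^J = \max_{I \in \varphi(J)} \sum_{i \in I} p_i &= \max\big\{P[C] : C \in \mathcal{B}_{\mathrm{ph},W},\ J = I(C) \cap \{1, \dots, n\}\big\}\\ &= \max\big\{P[C] : C \in \mathcal{B}_{\mathrm{ph},W},\ C \cap \{\xi^1, \dots, \xi^n\} = \textstyle\bigcup_{j \in J}\{\xi^j\}\big\}\\ &= \max\big\{P\big[[\![\underline{a}, \overline{a}]\!]\big] : \underline{a}, \overline{a} \in \overline{\mathbb{R}}^k,\ [\![\underline{a}, \overline{a}]\!] \cap \{\xi^1, \dots, \xi^n\} = \textstyle\bigcup_{j \in J}\{\xi^j\}\big\}. \end{aligned} \tag{4.38}$$
Let $[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!]$ be a polyhedron attaining the maximum, i.e., we have $\overline{\gamma}^J = P\big[[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!]\big]$ and $[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!] \cap \{\xi^1, \dots, \xi^n\} = \bigcup_{j \in J}\{\xi^j\}$. In addition, due to the finiteness of the set $\{\xi^1, \dots, \xi^N\}$, we can assume that these identities are also valid for $\operatorname{int}[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!]$, i.e.,
$$\overline{\gamma}^J = P\big[\operatorname{int}[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!]\big] \tag{4.39}$$
and
$$\operatorname{int}[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!] \cap \{\xi^1, \dots, \xi^n\} = \bigcup_{j \in J}\{\xi^j\}. \tag{4.40}$$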
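In the following, we enlarge $[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!]$ by successively shifting its facets until it becomes supporting. To this end, we put
$$\underline{a}(t) := \big(\underline{a}^{(0)}_1 - t, \underline{a}^{(0)}_2, \dots, \underline{a}^{(0)}_k\big), \quad t \ge 0,$$
and consider the polyhedral enlargement $[\![\underline{a}(t), \overline{a}^{(0)}]\!]$ of $[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!]$. We set
$$\tau := \sup\Big\{t \ge 0 : \operatorname{int}[\![\underline{a}(t), \overline{a}^{(0)}]\!] \cap \{\xi^1, \dots, \xi^n\} = \operatorname{int}[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!] \cap \{\xi^1, \dots, \xi^n\}\Big\}.$$
In particular, $\tau = \infty$ holds if $\underline{a}^{(0)}_1 = -\infty$, and whenever $\tau = \infty$ we define $\underline{i}_1 := -\infty$. If $\tau < \infty$, there exists an index $\underline{i}_1 \in \{1, \dots, n\} \setminus J$ such that
$$\langle m_1, \xi^{\underline{i}_1} \rangle = \underline{a}^{(0)}_1 - \tau \quad\text{and}\quad \underline{a}^{(0)}_j < \langle m_j, \xi^{\underline{i}_1} \rangle < \overline{a}^{(0)}_j \ \text{ for } j \ne 1.$$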
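Indeed, this is true because, for every $\varepsilon > 0$, one can find a $\xi^{\underline{i}_1}$ with $\underline{i}_1 \in \{1, \dots, n\} \setminus J$ that is contained in the interior of $[\![\underline{a}(\tau + \varepsilon), \overline{a}^{(0)}]\!]$. We put $\underline{a}^{(1)} := \underline{a}(\tau)$ and consider now, in the second step, the enlarged polyhedron $[\![\underline{a}^{(1)}, \overline{a}^{(0)}]\!]$, which still fulfils the identity (4.40). Having in mind that $[\![\underline{a}^{(0)}, \overline{a}^{(0)}]\!]$ is maximizing, we conclude that $[\![\underline{a}^{(1)}, \overline{a}^{(0)}]\!]$ fulfils the identity (4.39), too. We repeat the above construction for the coordinate $\overline{a}^{(0)}_1$ by defining
$$\overline{a}(t) := \big(\overline{a}^{(0)}_1 + t, \overline{a}^{(0)}_2, \dots, \overline{a}^{(0)}_k\big), \quad t \ge 0,$$
$$\tau := \sup\Big\{t \ge 0 : \operatorname{int}[\![\underline{a}^{(1)}, \overline{a}(t)]\!] \cap \{\xi^1, \dots, \xi^n\} = \operatorname{int}[\![\underline{a}^{(1)}, \overline{a}^{(0)}]\!] \cap \{\xi^1, \dots, \xi^n\}\Big\},$$
and $\overline{a}^{(1)} := \overline{a}(\tau)$.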
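Again, if $\tau < \infty$, there exists $\overline{i}_1 \in \{1, \dots, n\}$ such that $\langle m_1, \xi^{\overline{i}_1} \rangle = \overline{a}^{(0)}_1 + \tau$ and $\underline{a}^{(0)}_j < \langle m_j, \xi^{\overline{i}_1} \rangle < \overline{a}^{(0)}_j$ for $j \ne 1$. Otherwise, we put $\overline{i}_1 := \infty$. Continuing the construction in this way for the coordinates $2, \dots, k$, we arrive at the polyhedron $B := [\![\underline{a}^{(k)}, \overline{a}^{(k)}]\!]$ with (4.35) and (4.37), and obtain indices $\underline{i}_l, \overline{i}_l$ for $l = 1, \dots, k$ with $\underline{a}^{(k)}_j = \underline{a}^{(j)}_j = \langle m_j, \xi^{\underline{i}_j} \rangle$ and $\overline{a}^{(k)}_j = \overline{a}^{(j)}_j = \langle m_j, \xi^{\overline{i}_j} \rangle$. Furthermore, we obtain the estimates
$$\begin{aligned} \underline{a}^{(l-1)}_j < \langle m_j, \xi^{\underline{i}_l} \rangle < \overline{a}^{(l-1)}_j &\quad \text{for } j \ne l \text{ whenever } \underline{i}_l \ne \pm\infty, \text{ and}\\ \underline{a}^{(l-1)}_j < \langle m_j, \xi^{\overline{i}_l} \rangle < \overline{a}^{(l-1)}_j &\quad \text{for } j \ne l \text{ whenever } \overline{i}_l \ne \pm\infty. \end{aligned}$$
These inequalities remain valid for the final vectors $\underline{a}^{(k)}, \overline{a}^{(k)}$, due to $\underline{a}^{(l-1)}_j \ge \underline{a}^{(k)}_j$ and $\overline{a}^{(l-1)}_j \le \overline{a}^{(k)}_j$ for $j = 1, \dots, k$. Thus, (4.36) holds and $B$ is supporting. Since both (4.39) and (4.40) remain valid during the construction of $B$, it follows that $B$ possesses the asserted properties.

Corollary 4.5.3. The following identities hold: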
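$$\mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}} = \big\{J \subseteq \{1, \dots, n\} : \exists B \in \mathcal{P} \text{ such that (4.37) holds true}\big\},$$
$$\overline{\gamma}^J = \max\big\{P[\operatorname{int} B] : B \in \mathcal{P},\ \text{(4.37) holds true}\big\} \quad \forall J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}.$$
Proof. The inclusion ‘⊆’ in the first identity and the inequality ‘≤’ in the second identity follow directly from Proposition 4.5.2. For the reverse direction of the first identity, let $B = [\![\underline{a}^B, \overline{a}^B]\!] \in \mathcal{P}$ be given such that (4.37) holds true for some $J \subseteq \{1, \dots, n\}$. Due to the finiteness of $\{\xi^1, \dots, \xi^n\}$, there exists an $\varepsilon > 0$ such that
$$\{\xi^1, \dots, \xi^n\} \cap [\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!] = \{\xi^1, \dots, \xi^n\} \cap \operatorname{int}[\![\underline{a}^B, \overline{a}^B]\!] = \bigcup_{j \in J}\{\xi^j\}, \tag{4.41}$$
where each entry of the vector $\bar\varepsilon \in \mathbb{R}^k$ is equal to $\varepsilon$. Since $[\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!] \in \mathcal{B}_{\mathrm{ph},W}$, we observe
$$\begin{aligned} I\big([\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!]\big) &= \{i \in \{1, \dots, N\} : \xi^i \in [\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!]\}\\ &= J \cup \{i \in \{n+1, \dots, N\} : \xi^i \in [\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!]\}. \end{aligned}$$
Therefore,
$$I\big([\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!]\big) \cap \{1, \dots, n\} = J \cap \{1, \dots, n\} = J, \tag{4.42}$$
which provides $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ via the definition of $\mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$. Consequently, the inclusion ‘⊇’ in the first assertion of the corollary holds true as well. To verify the relation ‘≥’ in the second assertion, let $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ and $B = [\![\underline{a}^B, \overline{a}^B]\!] \in \mathcal{P}$ be arbitrary such that (4.37) holds true. We now choose an $\varepsilon > 0$ with
$$\{\xi^1, \dots, \xi^N\} \cap [\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!] = \{\xi^1, \dots, \xi^N\} \cap \operatorname{int}[\![\underline{a}^B, \overline{a}^B]\!] \tag{4.43}$$
and conclude (4.42). Therefore, $I\big([\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!]\big) \in \varphi(J)$ (see (4.28)) and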
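$$\overline{\gamma}^J \ge \sum_{i \in I([\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!])} p_i = \sum_{\xi^i \in [\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!]} P[\{\xi^i\}] = P\big[[\![\underline{a}^B + \bar\varepsilon, \overline{a}^B - \bar\varepsilon]\!]\big] = P\big[\operatorname{int}[\![\underline{a}^B, \overline{a}^B]\!]\big],$$
where the last identity follows from (4.43). This proves the inequality ‘≥’ in the second assertion of the corollary, because $[\![\underline{a}^B, \overline{a}^B]\!] \in \mathcal{P}$ was chosen arbitrarily such that (4.37) holds true.

Corollary 4.5.3 shows that the set $\mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ and the coefficients $\overline{\gamma}^J$ for $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ can be determined as soon as the system $\mathcal{P}$ of supporting polyhedra is known. The following proposition shows how the coefficients $\underline{\gamma}_J$ for $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ may be computed.

Proposition 4.5.4. For all $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$, one has $\underline{\gamma}_J = \sum_{i \in I} p_i$, with $I$ given by
$$I := \Big\{i \in \{1, \dots, N\} : \min_{j \in J} \langle m_l, \xi^j \rangle \le \langle m_l, \xi^i \rangle \le \max_{j \in J} \langle m_l, \xi^j \rangle,\ l = 1, \dots, k\Big\}.$$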
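Proof. We consider an arbitrary $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$. Completely analogously to (4.38) in the proof of Proposition 4.5.2, it follows that
$$\underline{\gamma}_J = \min\big\{P\big[[\![\underline{a}, \overline{a}]\!]\big] : [\![\underline{a}, \overline{a}]\!] \cap \{\xi^1, \dots, \xi^n\} = \textstyle\bigcup_{j \in J}\{\xi^j\}\big\}. \tag{4.44}$$
We define $\underline{a}^*, \overline{a}^* \in \mathbb{R}^k$ by
$$\underline{a}^*_l := \min_{j \in J} \langle m_l, \xi^j \rangle \quad\text{and}\quad \overline{a}^*_l := \max_{j \in J} \langle m_l, \xi^j \rangle \quad\text{for } l = 1, \dots, k,$$
to obtain $\xi^j \in [\![\underline{a}^*, \overline{a}^*]\!]$ for all $j \in J$ and, therefore, $\bigcup_{j \in J}\{\xi^j\} \subseteq [\![\underline{a}^*, \overline{a}^*]\!] \cap \{\xi^1, \dots, \xi^n\}$. If this inclusion were strict, there would be some $i \in \{1, \dots, n\} \setminus J$ such that $\xi^i \in [\![\underline{a}^*, \overline{a}^*]\!]$. From $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$, it then follows that there exists some $B \in \mathcal{B}_{\mathrm{ph},W}$ with $J = I(B) \cap \{1, \dots, n\}$. Thus, we obtain $\xi^j \in B$ for all $j \in J$, entailing $[\![\underline{a}^*, \overline{a}^*]\!] \subseteq B$ by construction of $[\![\underline{a}^*, \overline{a}^*]\!]$. We derive that $\xi^i \in B$ and, hence, $i \in I(B)$. On the other hand, $i \in \{1, \dots, n\} \setminus J$, which is a contradiction. It follows that $\bigcup_{j \in J}\{\xi^j\} = [\![\underline{a}^*, \overline{a}^*]\!] \cap \{\xi^1, \dots, \xi^n\}$ and, thus, $\underline{\gamma}_J \le P\big[[\![\underline{a}^*, \overline{a}^*]\!]\big]$. On the other hand, consider arbitrary $\underline{a}$ and $\overline{a}$ with $[\![\underline{a}, \overline{a}]\!] \cap \{\xi^1, \dots, \xi^n\} = \bigcup_{j \in J}\{\xi^j\}$. Then $\xi^j \in [\![\underline{a}, \overline{a}]\!]$ for all $j \in J$, and by construction of $[\![\underline{a}^*, \overline{a}^*]\!]$ it follows that $[\![\underline{a}^*, \overline{a}^*]\!] \subseteq [\![\underline{a}, \overline{a}]\!]$. Consequently, $P\big[[\![\underline{a}^*, \overline{a}^*]\!]\big] \le P\big[[\![\underline{a}, \overline{a}]\!]\big]$ holds. Passing to the minimum over all such $\underline{a}, \overline{a}$ and applying identity (4.44) provides $P\big[[\![\underline{a}^*, \overline{a}^*]\!]\big] \le \underline{\gamma}_J$. Finally, the assertion follows from
$$\underline{\gamma}_J = P\big[[\![\underline{a}^*, \overline{a}^*]\!]\big] = \sum_{\xi^i \in [\![\underline{a}^*, \overline{a}^*]\!]} p_i = \sum_{i \in I} p_i.$$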
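Proposition 4.5.4 reduces the computation of $\underline{\gamma}_J$ to the $P$-mass of the smallest polyhedron with the given facet directions that contains $\{\xi^j : j \in J\}$. The following Python sketch illustrates this; the scenarios, the probabilities, and the extra direction $(1, -1)$ are hypothetical and not derived from any recourse matrix $W$, and $J$ must be nonempty.

```python
from fractions import Fraction

def proj(m, x):
    """Inner product <m, x>."""
    return sum(a * b for a, b in zip(m, x))

def gamma_lower(M, xi, p, J):
    """underline gamma_J via Proposition 4.5.4 (J: nonempty set of 0-based kept indices)."""
    lo = [min(proj(m, xi[j]) for j in J) for m in M]    # underline a^*_l
    hi = [max(proj(m, xi[j]) for j in J) for m in M]    # overline a^*_l
    return sum((p[i] for i, x in enumerate(xi)
                if all(l <= proj(m, x) <= h for m, l, h in zip(M, lo, hi))),
               Fraction(0))

xi = [(0, 0), (2, 2), (1, 0), (1, 1), (3, 1)]  # hypothetical scenarios; xi^1, xi^2 are kept
p = [Fraction(1, 5)] * 5
rect = [(1, 0), (0, 1)]                        # k = s = 2: axis-parallel rectangles
print(gamma_lower(rect, xi, p, {0, 1}))                # 4/5
print(gamma_lower(rect + [(1, -1)], xi, p, {0, 1}))    # 3/5
```

Adding a direction can only shrink the bounding polyhedron, so $\underline{\gamma}_J$ can only decrease, as in the second call.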
Corollary 4.5.3 and Proposition 4.5.4 allow us to develop an algorithm to calculate the coefficients of (4.30) and, thus, to solve the inner problem (4.15). This is shown in the following section.
4.5.4 Optimal Redistribution Algorithm
In the previous section, we have seen that the coefficients of the inner problem (4.15), i.e., $\mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ and $\overline{\gamma}^J$, $\underline{\gamma}_J$ for $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$, can be determined by iterating through the set $\mathcal{P}$ of supporting polyhedra. To this end, we have to determine the set $R$ defined by (4.34). Thus, given the matrix $W$, we have to identify a normal vector for each facet of the convex cone $\operatorname{pos} W$. Determining these normal vectors corresponds to transforming a vertex-based representation of $\operatorname{pos} W$ into a representation based on halfspaces. This transformation is a well-studied problem for which efficient algorithms are available, e.g., the double description method proposed by Motzkin et al. (1953) and the reverse search algorithm due to Avis and Fukuda (1992).

The supporting polyhedra are defined by the conditions (4.35) and (4.36). While (4.35) suggests that the number of potential supporting polyhedra equals $\binom{n+1}{2}^k$, the majority of these candidates can be excluded by considering condition (4.36). In order to do so efficiently, we propose a recursive approach. More precisely, a supporting polyhedron $[\![\underline{a}, \overline{a}]\!] = [\![(\underline{a}_j)_{j=1}^k, (\overline{a}_j)_{j=1}^k]\!]$ is constructed recursively for $j = 1, \dots, k$, while ensuring at every step $j$ that condition (4.36) is still fulfilled when the $j$-th coordinate is added. This recursion is carried out by the function Iterate of Algorithm 4.1.

Our numerical experience shows that the main effort in Algorithm 4.1 is spent on the determination of the supporting polyhedra and that, compared to this, the time consumed by the solution of the linear program in Step [2] is negligible. Therefore, the complexity of the algorithm is mainly determined by the number of potential supporting polyhedra, i.e., by the value $\binom{n+1}{2}^k$. This suggests that the practical value of the algorithm is limited to small values of $k$ (and $s$) and moderate cardinality $n$ of the reduced support. Since $k \ge s$, the scenario reduction problem with respect to the polyhedral discrepancy $\alpha_{\mathcal{B}_{\mathrm{ph},W}}$ is numerically more demanding than a reduction with respect to the rectangular discrepancy $\alpha_{\mathcal{B}_{\mathrm{rect}}}$. In Section 4.8.3, we establish an estimate that may sometimes allow one to use the latter distance instead of $\alpha_{\mathcal{B}_{\mathrm{ph},W}}$. On the other hand, the computational effort of Algorithm 4.1 is basically linear in $N = |\operatorname{supp} P|$. This is easy to see, since the impact of $N$ on Algorithm 4.1 is limited to the effort of updating and evaluating the set $\bar{I}$. Hence, it seems possible to consider larger values of $N$.

We conclude this section with two remarks.

Remark 4.5.5. By setting $k = s$, Algorithm 4.1 may be used for optimal redistribution with respect to the rectangular discrepancy. Setting further $\underline{a}_l = -\infty$ for $l = 1, \dots, s$ within the function Iterate amounts to an optimal redistribution with respect to the cell discrepancy. However, numerical experience shows that in the case of the cell discrepancy, a slightly different version of Algorithm 4.1, as proposed by Henrion et al. (2009), may sometimes perform better. Indeed, the complexity of the approach of Henrion et al. (2009) mainly depends on $\binom{n+s}{s}$, whereas the running time of Algorithm 4.1, applied to $\alpha_{\mathcal{B}_{\mathrm{cell}}}$, behaves like $(n+1)^s$.
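Algorithm 4.1. Optimal redistribution.
Step [0]: Put $\mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}} = \{\emptyset\}$ and $\overline{\gamma}^J = 0$ for all $J \subseteq \{1, \dots, n\}$. Set $J = \{1, \dots, n\}$ and $\bar{I} = \{1, \dots, N\}$.
Step [1]: Call Iterate$(0, 0, 0, J, \bar{I})$.
Step [2]: With the additional data $\mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ and $\underline{\gamma}_J$, $\overline{\gamma}^J$ for all $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$, solve the linear optimization problem (4.30).

Function Iterate$(l, (\underline{i}_j)_{j=1}^l, (\overline{i}_j)_{j=1}^l, J, \bar{I})$:
  IF $l = k$ THEN call UpdateData$((\underline{i}_j)_{j=1}^l, (\overline{i}_j)_{j=1}^l, J, \bar{I})$ and RETURN.
  Set $l = l + 1$.
  FOR $\underline{i}_l = 1, \dots, n+1$ and $\overline{i}_l = 1, \dots, n+1$:
    IF (4.36) does not hold for the index pair $(j, l)$ or for the pair $(l, j)$ with some $j \in \{1, \dots, l-1\}$, THEN CONTINUE 'FOR'.
    ELSE set
      $\underline{a}_l = \langle m_l, \xi^{\underline{i}_l} \rangle$ if $\underline{i}_l \in \{1, \dots, n\}$, and $\underline{a}_l = -\infty$ if $\underline{i}_l = n+1$;
      $\overline{a}_l = \langle m_l, \xi^{\overline{i}_l} \rangle$ if $\overline{i}_l \in \{1, \dots, n\}$, and $\overline{a}_l = +\infty$ if $\overline{i}_l = n+1$.
    Set $\bar{I}_l = \bar{I} \cap \{i \in \{1, \dots, N\} : \underline{a}_l < \langle m_l, \xi^i \rangle < \overline{a}_l\}$.
    IF $\bar{I}_l = \emptyset$ THEN CONTINUE 'FOR'.
    Set $J_l = J \cap \bar{I}_l$.
    Call Iterate$(l, (\underline{i}_j)_{j=1}^l, (\overline{i}_j)_{j=1}^l, J_l, \bar{I}_l)$.
  END(FOR). RETURN.

Function UpdateData$((\underline{i}_j)_{j=1}^k, (\overline{i}_j)_{j=1}^k, J, \bar{I})$:
  IF $J \notin \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ THEN
    Update $\mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}} = \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}} \cup \{J\}$.
    Set $\underline{\gamma}_J = \sum_{i \in I} p_i$ with $I = \{i \in \bar{I} : \min_{j \in J} \langle m_l, \xi^j \rangle \le \langle m_l, \xi^i \rangle \le \max_{j \in J} \langle m_l, \xi^j \rangle,\ l = 1, \dots, k\}$.
  END(IF).
  Update $\overline{\gamma}^J = \max\{\overline{\gamma}^J, \sum_{i \in \bar{I}} p_i\}$.
  RETURN.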
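The recursion of Algorithm 4.1 can be sketched in Python as follows. This is a minimal illustration under our own conventions, not the author's implementation: scenarios are tuples, the kept scenarios are the first $n$ (0-based indices), the index $n$ encodes an infinite facet (the value $n+1$ in the algorithm), and Step [2] is replaced by directly evaluating the smallest feasible $t$ for given weights $q$. We also assume $\underline{\gamma}_\emptyset = 0$, which the algorithm does not state explicitly.

```python
from fractions import Fraction
from math import inf

def coefficients(M, xi, p, n):
    """Steps [0]-[1] of Algorithm 4.1; returns dicts up[J] (overline gamma^J), lo[J] (underline gamma_J)."""
    N, k = len(xi), len(M)
    proj = [[sum(a * b for a, b in zip(m, x)) for x in xi] for m in M]  # proj[l][i] = <m_l, xi^i>
    up = {frozenset(): Fraction(0)}            # Step [0]
    lo = {frozenset(): Fraction(0)}            # assumption: underline gamma of the empty set is 0

    def bound(l, idx, sign):                   # lower (sign=-1) or upper (sign=+1) facet value
        return sign * inf if idx == n else proj[l][idx]

    def inside(l, lo_idx, up_idx, idx):        # strict slab condition of (4.36); vacuous for +-inf
        return idx == n or bound(l, lo_idx, -1) < proj[l][idx] < bound(l, up_idx, 1)

    def supporting(l, il, iu, chosen):         # (4.36) for the pairs (j, l) and (l, j), j < l
        return all(inside(j, jl, ju, il) and inside(j, jl, ju, iu)
                   and inside(l, il, iu, jl) and inside(l, il, iu, ju)
                   for j, (jl, ju) in enumerate(chosen))

    def update_data(J, I_bar):
        if J not in up:
            box = [(min(proj[l][j] for j in J), max(proj[l][j] for j in J)) for l in range(k)]
            lo[J] = sum((p[i] for i in I_bar
                         if all(box[l][0] <= proj[l][i] <= box[l][1] for l in range(k))),
                        Fraction(0))           # Proposition 4.5.4
            up[J] = Fraction(0)
        up[J] = max(up[J], sum((p[i] for i in I_bar), Fraction(0)))

    def iterate(l, chosen, J, I_bar):
        if l == k:
            update_data(J, I_bar)
            return
        for il in range(n + 1):
            for iu in range(n + 1):
                if not supporting(l, il, iu, chosen):
                    continue
                a, b = bound(l, il, -1), bound(l, iu, 1)
                I_new = frozenset(i for i in I_bar if a < proj[l][i] < b)  # local copy of I-bar
                if I_new:
                    iterate(l + 1, chosen + [(il, iu)], J & I_new, I_new)

    iterate(0, [], frozenset(range(n)), frozenset(range(N)))
    return up, lo

def objective(up, lo, q):
    """Smallest t satisfying the constraints of (4.30) for fixed weights q."""
    return max(max(up[J] - sum(q[j] for j in J), sum(q[j] for j in J) - lo[J]) for J in up)

xi = [(1,), (3,), (2,), (4,)]                 # Example 4.4.3 in R^1; kept: xi^1, xi^2
p = [Fraction(4, 10), Fraction(4, 10), Fraction(1, 10), Fraction(1, 10)]
up, lo = coefficients([(1,)], xi, p, n=2)     # single direction m_1 = 1: intervals
for J in sorted(up, key=sorted):
    print(sorted(J), up[J], lo[J])
print(objective(up, lo, [Fraction(1, 2), Fraction(1, 2)]))  # 1/10
```

With a single direction on $\mathbb{R}$ and the data of Example 4.4.3, the sketch finds the four members $\emptyset, \{1\}, \{2\}, \{1, 2\}$ of $\mathcal{I}^*$ (in 1-based notation) and confirms the optimal value $0.1$ for $q = (0.5, 0.5)$. Note that the sets $\bar I$ and $J$ are passed as new local objects to each recursive call, so that candidates within one FOR loop do not affect each other.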
Remark 4.5.6. When Algorithm 4.1 is used repeatedly with a varying support, e.g., within one of the algorithms for the outer problem (4.14) proposed in the next section, it would be desirable to reduce the numerical effort by reusing some of the data $\underline{\gamma}_J$, $\overline{\gamma}^J$ for $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ for several support sets. While this is indeed possible within the approach of Henrion et al. (2009) for the cell discrepancy, to the best of our understanding their approach cannot be carried over to Algorithm 4.1, since $\overline{\gamma}^J$ and $\underline{\gamma}_J$ are determined within Algorithm 4.1 simultaneously with the set $J \in \mathcal{I}^*_{\mathcal{B}_{\mathrm{ph},W}}$ (by means of the set $\bar{I}$).
4.6 The Outer Problem
The outer problem (4.14), i.e., determining an optimal support for the measure $Q$, is related to k-means clustering problems, which are known to be NP-hard (cf. Garey and Johnson (1979)). Whenever we consider a Fortet-Mourier metric instead of a discrepancy, the inner problem (4.15) can be solved analytically, and the outer problem (4.14) can be formulated as a mixed-integer linear program, cf. Heitsch (2007). Then, for moderate values of $N$ and $n$, the latter problem can be solved numerically by available optimization software. In general, however, one has to resort to heuristic approaches that solve problem (4.14) only approximately. For this purpose, Forward Selection and Backward Reduction techniques were developed by Heitsch and Römisch (2003).
4.6.1 Revising Heuristics
In the context of discrepancy distances, the forward selection recursively determines, for $i = 1, \dots, n$, a set $J^{[i]}$ of $i$ scenarios such that passing from $J^{[i-1]}$ to $J^{[i]}$ decreases the discrepancy to $P$ maximally.

Algorithm 4.2. Forward Selection.
Step [0]: $J^{[0]} := \emptyset$.
Step [i]: $l_i \in \operatorname{argmin}_{l \notin J^{[i-1]}} \inf_{q \in S_i} \alpha_{\mathcal{B}}\big(P, (\{\xi^{l_1}, \dots, \xi^{l_{i-1}}, \xi^l\}, q)\big)$,
  $J^{[i]} := J^{[i-1]} \cup \{l_i\}$.
Step [n+1]: Minimize $\alpha_{\mathcal{B}}\big(P, (\{\xi^{l_1}, \dots, \xi^{l_n}\}, q)\big)$ subject to $q \in S_n$.
Analogously, the backward reduction determines sets $J^{[i]}$ of $N - i$ scenarios, $i = 1, \dots, N - n$, such that passing from $J^{[i-1]}$ to $J^{[i]}$ increases the discrepancy to $P$ minimally.

Algorithm 4.3. Backward Reduction.
Step [0]: $J^{[0]} := \{1, \dots, N\}$.
Step [i]: $u_i \in \operatorname{argmin}_{u \in J^{[i-1]}} \inf_{q \in S_{N-i}} \alpha_{\mathcal{B}}\big(P, (\{\xi^j : j \in J^{[i-1]} \setminus \{u\}\}, q)\big)$,
  $J^{[i]} := J^{[i-1]} \setminus \{u_i\}$.
Step [N−n+1]: Minimize $\alpha_{\mathcal{B}}\big(P, (\{\xi^j : j \in J^{[N-n]}\}, q)\big)$ subject to $q \in S_n$.
Although neither algorithm leads to an optimal solution of (4.12) in general, their implemented analogues for Fortet-Mourier metrics have been shown to be very fast and to provide nearly optimal results, cf. Chapter 7 of Heitsch (2007).

Figure 4.3: Rectangular discrepancies of Forward Selection (solid line), Backward Reduction (dashed line) and a complete enumeration (dots) of all possible support sets, depending on the number n of remaining scenarios. The initial measure consists of N = 20 equally weighted points in $\mathbb{R}^2$.

Dealing with discrepancy distances, we face different problems in solving the combinatorial problem (4.14). On the one hand, discrepancies do not allow, to the best of our understanding, a reformulation of (4.14) as a mixed-integer linear program that would be accessible to available solvers. On the other hand, the above-mentioned heuristics no longer produce nearly optimal results, see Figure 4.3 and Table 4.3. This can be explained by the following considerations. In general, the maximum in (4.1) may be realized by many different sets $B \in \mathcal{B}$, see also Figure 1 of Henrion et al. (2009). Even worse, the addition of any single scenario within a Forward Selection may leave the discrepancy unchanged, and, hence, the minimum in Step [i] of Algorithm 4.2 may be realized by every $l \notin J^{[i-1]}$. Consequently, the above heuristics may sometimes propose arbitrary scenarios that are obviously not optimal for further support expansion or reduction. This is illustrated by Example 4.6.1. It turns out that this effect may be attenuated by the approach detailed in Section 4.8.2.

Example 4.6.1. We set $N = 6$ and consider the scenarios $\xi^i = i$, $i = 1, \dots, N$, with $p_i = P[\{\xi^i\}] = \frac16$ for every $i$. Furthermore, we consider the cell discrepancy $\alpha_{\mathcal{B}_{\mathrm{cell}}}$, i.e., the supremum distance between the cumulative distribution functions. Let us now apply the Forward Selection Algorithm 4.2 in order to determine a measure $Q$ supported by $n = 3$ scenarios and realizing a minimal cell discrepancy to $P$. Let us assume that whenever in Step [i] the minimum is realized by several indices $l \notin J^{[i-1]}$, $l_i$ is set to the largest index among them. The distribution functions of $P$ and $Q$ and their supremum distance in the course of the Forward Selection are sketched in Figure 4.4. In Step [1], we choose scenario $\xi^4$, realizing a minimal discrepancy value of $0.5$. In Step [2], we may reduce the discrepancy to $\frac13$ by adding one of the scenarios $\xi^1$, $\xi^2$, or $\xi^3$. Note that, although we add $\xi^3$, $\xi^2$ would be a better choice. Consequently, in Step [3], the discrepancy cannot fall below the threshold of $\frac13$, whereas choosing $\xi^2$ in Step [2] would allow us to decrease the discrepancy in Step [3] to $\frac14$.

Figure 4.4: Cumulative distribution functions of the initial measure (bold) and of the approximating measure (dashed) of Example 4.6.1 after Step [1] (left) and Step [2] (center) of the Forward Selection. The graph on the right shows what happens if scenario $\xi^2$ is chosen in Step [2].

Besides their unsatisfactory results, both the Forward Selection and the Backward Reduction algorithms are significantly slower in the case of discrepancy distances. This is due to the fact that the inner problem (4.15), which has to be solved very often within these heuristics, is numerically more demanding than in the case of Fortet-Mourier metrics. Therefore, heuristics for the outer problem (4.14) that require solving the inner problem only once may be of particular interest. Such heuristics may be based, e.g., on low discrepancy sequences.
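The greedy path described in Example 4.6.1 can be reproduced with a short Python sketch of Algorithm 4.2 for the cell discrepancy on $\mathbb{R}$. As a simplifying assumption, the inner problem (4.15) is solved by exhaustive search over weight vectors whose entries are multiples of $1/12$, rather than by the linear program (4.30); for this small example the grid contains optimal weights for every candidate evaluated along the path, but in general such a search only yields an upper bound. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def cell_discrepancy(points, probs, support, weights):
    """sup_x |F_P(x) - F_Q(x)| on the real line; both CDFs only jump at atoms of P."""
    best = Fraction(0)
    for x in sorted(points):
        fp = sum((p for xi, p in zip(points, probs) if xi <= x), Fraction(0))
        fq = sum((w for s, w in zip(support, weights) if points[s] <= x), Fraction(0))
        best = max(best, abs(fp - fq))
    return best

def simplex_grid(m, denom):
    """All weight vectors of length m with entries in multiples of 1/denom summing to 1."""
    for head in product(range(denom + 1), repeat=m - 1):
        if sum(head) <= denom:
            yield [Fraction(h, denom) for h in head] + [Fraction(denom - sum(head), denom)]

def inner(points, probs, support, denom=12):
    """Inner problem (4.15), approximated by exhaustive search over a weight grid."""
    return min(cell_discrepancy(points, probs, support, w)
               for w in simplex_grid(len(support), denom))

def forward_selection(points, probs, n, denom=12):
    """Algorithm 4.2 with ties broken towards the largest index, as in Example 4.6.1."""
    chosen, history = [], []
    for _ in range(n):
        best_val, best_l = None, None
        for l in range(len(points)):
            if l in chosen:
                continue
            val = inner(points, probs, chosen + [l], denom)
            if best_val is None or val <= best_val:   # '<=' keeps the largest minimizer
                best_val, best_l = val, l
        chosen.append(best_l)
        history.append(best_val)
    return chosen, history

points = [1, 2, 3, 4, 5, 6]          # xi^i = i, stored with 0-based indices
probs = [Fraction(1, 6)] * 6
chosen, history = forward_selection(points, probs, 3)
print([l + 1 for l in chosen])       # [4, 3, 6]
print([str(v) for v in history])     # ['1/2', '1/3', '1/3']
```

The printed path $\xi^4, \xi^3, \xi^6$ with values $\frac12, \frac13, \frac13$ matches Steps [1] to [3] of Example 4.6.1; the final re-optimization in Step [n+1] solves the same inner problem and hence returns the last value.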
4.6. The Outer Problem
4.6.2 A Glimpse on Low Discrepancy Sequences
A sequence of points $(x_n)_{n\ge 1} \subset [0,1]^s$ is called a low discrepancy sequence if, for all $n \ge 1$, the empirical measures $P_n := \sum_{i=1}^n \frac{1}{n}\delta_{x_i}$ are "close" to the uniform distribution on $[0,1]^s$ with respect to some discrepancy, see, e.g., Niederreiter (1992). Most of the results on low discrepancy sequences are concerned with the distances $\alpha_{\mathcal{B}_{cell}}$ and $\alpha_{\mathcal{B}_{rect}}$. In this framework these are usually called the star discrepancy and the extreme discrepancy, respectively. Probably the best-known examples of low discrepancy sequences are the van der Corput sequence, the Halton sequence, and the Sobol sequence. In particular, low discrepancy sequences are the basis of Quasi-Monte Carlo methods for numerical integration. Estimates like the Koksma-Hlawka inequality (cf. Theorem 2.11 of Niederreiter (1992)) provide upper bounds on the approximation error in terms of discrepancies.
While low discrepancy sequences are successfully applied to numerical integration, they do not seem to be perfectly suited to scenario reduction problems. On the one hand, to the best of our knowledge, it is not completely clear how such a sequence should be transformed to approximate an arbitrary probability distribution on $\mathbb{R}^s$, since it is initially designed to be close to the uniform distribution on the unit cube. On the other hand, the number of scenarios used within stochastic optimization problems is generally much smaller than the number of (discretization) points considered in problems of numerical integration. Although they provide optimal convergence rates, low discrepancy sequences do not necessarily yield (small) point sets that realize a minimal discrepancy. However, the construction of a low discrepancy sequence does not involve repeatedly solving the inner problem (4.15). This makes these sequences attractive for scenario reduction with respect to discrepancy distances.
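To make the notion concrete, the following Python sketch generates the van der Corput sequence in base 2. Each term is the radical inverse of its index: the base-2 digits of the index are mirrored at the radix point. The sketch then evaluates the one-dimensional star discrepancy of the first N points with respect to the uniform distribution on [0, 1]. It uses the classical closed-form expression $D_N^* = \frac{1}{2N} + \max_{1\le k\le N} \big|x_{(k)} - \frac{2k-1}{2N}\big|$ for sorted points $x_{(1)} \le \dots \le x_{(N)}$. The function names are our own.

```python
from fractions import Fraction


def van_der_corput(i, base=2):
    """Radical inverse of the integer i >= 1: mirror the base-b digits of i at the radix point."""
    x, scale = Fraction(0), Fraction(1, base)
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * scale
        scale /= base
    return x


def star_discrepancy_1d(points):
    """Star discrepancy of points in [0, 1] w.r.t. the uniform distribution (closed form)."""
    xs = sorted(points)
    N = len(xs)
    return Fraction(1, 2 * N) + max(
        abs(x - Fraction(2 * k - 1, 2 * N)) for k, x in enumerate(xs, start=1)
    )


for N in (1, 2, 4, 8):
    pts = [van_der_corput(i) for i in range(1, N + 1)]
    print(N, star_discrepancy_1d(pts))   # 1/2, 1/2, 1/4, 1/8
```

For N a power of two, the van der Corput points fill [0, 1] evenly, which is why the discrepancy halves with each doubling in this small run.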
In Section 4.7, we report some numerical experience in applying a low discrepancy sequence to the scenario reduction problem.
4.6.3 A Branch and Bound Approach
While the scenario reduction problem (4.12) does not seem to allow for a reformulation as a (e.g., mixed-integer) optimization model that is tractable by available optimization algorithms, an optimal solution of (4.12) can be found by enumerating all possible supporting sets of cardinality n, at least for moderate values of n and N . To reduce the time needed by such a complete enumeration we suggest in this section an approach that is close to the broad class of Branch and Bound techniques, see Land and Doig (1960) for an early reference.
128 Chapter 4. Scenario Reduction with Respect to Discrepancy Distances
Unfortunately, the complexity of (4.14) quickly increases with the dimension of the problem, and real-world stochastic optimization problems are, in general, of higher dimension. Consequently, the following approach is of academic interest rather than of practical relevance. For an approach that may be numerically more tractable for larger problems, we refer to Example 4.7.2, where we adopted a low discrepancy sequence to tackle the combinatorial problem (4.14).
In the first step of the approach, an upper bound $\bar\alpha_n$ on the minimally achievable discrepancy $\Delta_{\mathcal{B}}$ is computed by applying some heuristic to the outer problem (4.14). Then we make the following consideration. The minimal discrepancy achievable by m points does not increase with m. Hence an optimal n-tuple cannot be contained in any m-tuple, $m > n$, whose discrepancy exceeds $\bar\alpha_n$. We can therefore pass through some m-tuples u, $m \in \{n+1, \dots, N-1\}$, out of the N points in order to detect (and reject) tuples u with a discrepancy exceeding $\bar\alpha_n$. Afterwards, to determine the optimal n-tuple, it suffices to evaluate all n-tuples that are not a subset of any of the rejected tuples u. As soon as we find some n-tuple whose discrepancy falls below the upper bound $\bar\alpha_n$, we update $\bar\alpha_n$ and postpone the enumeration of n-tuples. We then repeat the iteration over m-tuples, $m \in \{n+1, \dots, N-1\}$, to exclude further m- and n-tuples. This procedure is formalized by Algorithm 4.4.
Algorithm 4.4. Branch and Bound.
Step [1]: Determine an upper bound $\bar\alpha_n$ on $\Delta_{\mathcal{B}}$. Set $U = \emptyset$.
Step [2]: Check "some" m-tuples w with $m \in \{n+1, \dots, N-1\}$: If $w \not\subset u$ for all $u \in U$: Calculate the optimal redistribution given the support w. If the resulting discrepancy $\alpha_w$ exceeds $\bar\alpha_n$: Add w to U.
Step [3]: Iterate through all n-tuples $w \notin U$: If $w \not\subset u$ for all $u \in U$: Calculate the optimal redistribution given the support w. If the resulting discrepancy $\alpha_w$ exceeds $\bar\alpha_n$: Add w to U. If $\alpha_w < \bar\alpha_n$: Update $\bar\alpha_n \leftarrow \alpha_w$ and go to Step [2].
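A compact Python sketch of Algorithm 4.4 follows. It is a simplification under stated assumptions:

- The inner problem is replaced by a toy one-dimensional cell discrepancy with a closed-form optimal redistribution. Between consecutive support points, the best approximating distribution function is the midpoint of P's range.
- Step [2] simply scans all m-tuples with $n < m \le m_{\max}$ instead of using the heuristics of Algorithm 4.5.
- The parameter `m_max` and all function names are hypothetical.

The pruning relies on the fact stated above: an n-tuple contained in a rejected m-tuple cannot have a smaller discrepancy than that m-tuple.

```python
from fractions import Fraction
from itertools import combinations


def cell_disc(atoms, probs, support_idx):
    """Toy inner problem: optimal 1-D cell discrepancy for the support given by atom indices."""
    s = sorted(atoms[k] for k in support_idx)

    def cdf(x, strict=False):
        # P[xi < x] if strict, else P[xi <= x]
        return sum((p for a, p in zip(atoms, probs) if (a < x if strict else a <= x)), Fraction(0))

    errs = [cdf(s[0], True), 1 - cdf(s[-1])]
    errs += [(cdf(hi, True) - cdf(lo)) / 2 for lo, hi in zip(s, s[1:])]
    return max(errs)


def branch_and_bound(evaluate, N, n, initial, m_max):
    """Simplified Algorithm 4.4; evaluate(w) returns the optimal discrepancy for support w."""
    cache = {}

    def disc(w):
        # every discrepancy is computed once and reused (cf. Remark 4.6.4)
        if w not in cache:
            cache[w] = evaluate(w)
        return cache[w]

    best_w = frozenset(initial)          # Step [1]: heuristic n-tuple gives the upper bound
    best_val = disc(best_w)
    improved = True
    while improved:
        improved = False
        # Step [2]: collect m-tuples (n < m <= m_max) whose discrepancy exceeds the bound
        rejected = []
        for m in range(m_max, n, -1):
            for w in map(frozenset, combinations(range(N), m)):
                if not any(w <= u for u in rejected) and disc(w) > best_val:
                    rejected.append(w)
        # Step [3]: n-tuples inside a rejected tuple cannot beat the bound
        for w in map(frozenset, combinations(range(N), n)):
            if any(w <= u for u in rejected):
                continue
            if disc(w) < best_val:
                best_w, best_val = w, disc(w)
                improved = True
                break                    # back to Step [2] with the smaller bound
    return best_val, best_w


atoms = list(range(1, 9))
probs = [Fraction(w, 15) for w in (1, 2, 3, 1, 2, 3, 1, 2)]
value, support = branch_and_bound(lambda w: cell_disc(atoms, probs, w),
                                  N=8, n=3, initial=(0, 1, 2), m_max=5)
print(value, sorted(atoms[k] for k in support))
```

The loop terminates because the bound strictly decreases over a finite set of tuples. In the last pass, every n-tuple is either covered by a rejected tuple or evaluated, so the returned value is optimal for this toy inner problem.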
Note that for m ≥ n, m-tuples can be seen as admissible solutions to a relaxation of problem (4.14). Consequently, Algorithm 4.4 can be seen as a
Branch and Bound approach with iterated breadth-first search (Step [2]) and depth-first search (Step [3]). The choice of the tested m-tuples within Step [2] is crucial for the performance of the algorithm. In the following we address this question and suggest a heuristic approach for this breadth-first search.
Breadth-First Search Heuristics
To ensure an efficient breadth-first search (Step [2] of Algorithm 4.4), the following considerations should be taken into account. On the one hand, the number of n-tuples excluded by an m-tuple with a discrepancy exceeding $\bar\alpha_n$ increases with m. On the other hand, the smaller m is, the more likely it becomes to find m-tuples that realize a large discrepancy. However, evaluating all m-tuples quickly becomes too time-consuming as m approaches N/2. Thus, we suggest the following approach for Step [2]. Suppose a set U has been determined from the evaluation of some m-tuples with $m \in \{N-1, \dots, i+1\}$. We can then calculate the number of remaining n- and i-tuples, respectively; this is shown by Lemma 4.6.2 below. The cost of evaluating a single n- or i-tuple, i.e., the time needed for solving the inner problem (4.15), can be (roughly) estimated to be proportional to the number of potential supporting polyhedra. For an n-tuple this is the value $\binom{n+1}{2}^k$, and for an i-tuple it is $\binom{i+1}{2}^k$. We denote the time needed for evaluating all remaining n- and i-tuples by $\tau_n^U$ and $\tau_i^U$, respectively. Whenever we observe
$$\tau_i^U \le \tau_n^U, \tag{4.45}$$
we invest a certain share $\lambda \in (0,1)$ of the time $\tau_n^U$ in the evaluation of some i-tuples. That is, we evaluate a fraction $\kappa \in [0,1]$ of all remaining i-tuples such that $\kappa \cdot \tau_i^U = \lambda \cdot \tau_n^U$. This evaluation yields some i-tuples with a discrepancy larger than $\bar\alpha_n$. Adding these tuples to U yields a set $U^\kappa \supset U$. We then decide to evaluate all remaining i-tuples if and only if
$$\min\Big\{\frac{1}{\kappa}\big(\tau_n^U - \tau_n^{U^\kappa}\big),\; \tau_n^U\Big\} \ge \tau_i^U. \tag{4.46}$$
The right-hand side of (4.46) represents the cost of testing all remaining i-tuples. The left-hand side can be interpreted as an extrapolation of the benefit of such a test. Using (4.45) and the definition of κ, condition (4.46) can be
written as
$$\tau_n^{U^\kappa} \le (1-\lambda)\,\tau_n^U. \tag{4.47}$$
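The reduction of (4.46) to (4.47) can be checked numerically. The Python sketch below implements both conditions together with the choice $\kappa = \lambda\tau_n^U/\tau_i^U$. It confirms, on a grid of values that satisfy (4.45), that the two conditions agree. The helper `tuple_cost` encodes the rough per-tuple cost model $\binom{j+1}{2}^k$ from above. The total cost $\tau_j^U$ would be this value times the number of remaining j-tuples, up to a constant. All names are hypothetical, and exact rational arithmetic avoids rounding at the boundary.

```python
from fractions import Fraction
from math import comb


def tuple_cost(j, k):
    """Rough cost of one inner problem for a j-tuple: binom(j+1, 2)**k potential supporting polyhedra."""
    return comb(j + 1, 2) ** k


def sample_fraction(tau_i, tau_n, lam):
    """kappa such that kappa * tau_i = lam * tau_n."""
    return lam * tau_n / tau_i


def criterion_446(tau_i, tau_n, tau_n_kappa, kappa):
    """(4.46): min{(tau_n - tau_n^{U^kappa}) / kappa, tau_n} >= tau_i."""
    return min((tau_n - tau_n_kappa) / kappa, tau_n) >= tau_i


def criterion_447(tau_n, tau_n_kappa, lam):
    """(4.47): tau_n^{U^kappa} <= (1 - lam) * tau_n."""
    return tau_n_kappa <= (1 - lam) * tau_n


# Equivalence check on values satisfying (4.45), i.e. tau_i <= tau_n,
# with 0 <= tau_n^{U^kappa} <= tau_n^U (U^kappa only removes tuples).
tau_n = Fraction(10)
for tau_i in (Fraction(1), Fraction(3), Fraction(10)):
    for lam in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
        kappa = sample_fraction(tau_i, tau_n, lam)
        for tau_n_kappa in (Fraction(x, 4) for x in range(41)):
            assert criterion_446(tau_i, tau_n, tau_n_kappa, kappa) == criterion_447(tau_n, tau_n_kappa, lam)
```

Under (4.45), the cap $\tau_n^U$ in the minimum never decides the comparison. So only the extrapolated saving matters, and that saving is exactly what (4.47) expresses.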
In order to calculate the expected computational costs τ, we have to calculate the number of remaining n- and i-tuples, given a set U of excluding supersets. This can be done by the following formula, which is based on the inclusion-exclusion principle and can be easily proven by induction over l.
Lemma 4.6.2. Consider l = |U| finite sets $u_1, \dots, u_l \subset \{1, \dots, N\}$ and $n \in \mathbb{N}$ with $n < |u_i|$ for $i = 1, \dots, l$. Then the number of subsets of $\{1, \dots, N\}$ of cardinality n that are no subset of any $u_i$, $i = 1, \dots, l$, is given by
$$\binom{N}{n} - \sum_{k=1}^{l} (-1)^{k+1} \sum_{\substack{I \subset \{1,\dots,l\}\\ |I| = k}} \mathbb{1}_{\{|\cap_{i\in I} u_i| \ge n\}} \cdot \binom{|\cap_{i\in I} u_i|}{n}. \tag{4.48}$$
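Formula (4.48) translates directly into code. The Python sketch below uses 0-based index sets $\{0, \dots, N-1\}$ instead of $\{1, \dots, N\}$ and compares (4.48) with brute-force enumeration. The optional argument `max_k` truncates the outer sum after $k = \texttt{max\_k}$ terms. By the Bonferroni inequalities, this gives an upper bound on (4.48) for even `max_k` and a lower bound for odd `max_k`. The names are hypothetical. The sketch does not exploit the indicator to prune supersets of index sets with small intersections, as the text suggests one may.

```python
from itertools import combinations
from math import comb


def count_uncovered(N, n, U, max_k=None):
    """Formula (4.48): number of n-subsets of {0, ..., N-1} contained in none of the sets in U.
    With max_k, the outer sum over k is truncated after k = max_k."""
    U = [frozenset(u) for u in U]
    last = len(U) if max_k is None else min(max_k, len(U))
    covered = 0
    for k in range(1, last + 1):
        for I in combinations(U, k):
            size = len(frozenset.intersection(*I))
            if size >= n:                       # indicator 1{|intersection| >= n} in (4.48)
                covered += (-1) ** (k + 1) * comb(size, n)
    return comb(N, n) - covered


def count_uncovered_brute(N, n, U):
    """Direct enumeration, for checking (4.48) on small instances."""
    return sum(1 for w in combinations(range(N), n) if not any(set(w) <= set(u) for u in U))


U = [{0, 1, 2, 3}, {2, 3, 4, 5}, {0, 5, 6}]
print(count_uncovered(7, 2, U), count_uncovered_brute(7, 2, U))   # 7 7
```

In this example, 21 pairs exist in total and 14 of them lie inside one of the three sets. The pair {2, 3} is shared by the first two sets, which is exactly the $k = 2$ correction term.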
Remark 4.6.3. Since the evaluation of (4.48) requires the determination of $2^l - 1$ intersections, one could think about using an estimate of (4.48) that is easier to evaluate. Indeed, given an even integer $l' < l$, the value (4.48) can be bounded from above and below by taking the outer sum only over $k \le l'$ and $k \le l'+1$, respectively. However, these so-called Bonferroni inequalities, see, e.g., Galambos and Simonelli (1996), do not entail a sharp estimate of (4.48), since the truncated sums fluctuate strongly with varying parameter $l'$. Furthermore, such estimates do not lead to a significant speedup in the calculation of (4.48) in general, because the indicator $\mathbb{1}_{\{|\cap_{i\in I} u_i| \ge n\}}$ allows the computation to be aborted in many cases anyway. On the other hand, numerical experiments in which the term $\mathbb{1}_{\{|\cap_{i\in I} u_i| \ge n\}}$ in (4.48) was replaced by $\mathbb{1}_{\{|\cap_{i\in I} u_i| \ge n+j\}}$ were encouraging for small values of j. However, we do not pursue this approach further and evaluate (4.48) only if |U| is smaller than a certain threshold ϑ. The above considerations on the breadth-first search heuristics result in Algorithm 4.5, which details Step [2] of Algorithm 4.4. Algorithm 4.5 is further discussed in the following remark.
Remark 4.6.4. Since the time needed for evaluating an m-tuple increases with m, it is reasonable to introduce the parameter $\bar L \in \{n+1, \dots, N-1\}$ in Step [2a] whenever $n \ll N$. The vast majority of the computational time of the Branch and Bound Algorithm 4.4 is needed for the computation of discrepancies. Thus, all computed discrepancies are saved and taken into account again in Step [2c] whenever the upper bound $\bar\alpha_n$ decreases.
4.7. Numerical Experience
Algorithm 4.5. Breadth-first search heuristics.
Step [2a]: Set $i \leftarrow \bar L \in \{n+1, \dots, N-1\}$ and $U \leftarrow \emptyset$.
Step [2b]: Set $i \leftarrow i - 1$. If $i = n$, proceed with Step [3].
Step [2c]: Go through all already evaluated i-tuples: Add tuples with a discrepancy exceeding $\bar\alpha_n$ to U.
Step [2d]: If all i-tuples have already been evaluated, go to [2b].
Step [2e]: If $|U| \le \vartheta$, calculate $\tau_i^U$ and $\tau_n^U$.
Step [2f]: Evaluate a fraction $\kappa = \lambda\tau_n^U/\tau_i^U$ of all i-tuples, save the discrepancies, and determine $U^\kappa$. Update $U \leftarrow U^\kappa$.
Step [2g]: If $\tau_i^U > \tau_n^U$ or $\kappa \ge 1$, go to [2b].
Step [2h]: If $|U^\kappa| \le \vartheta$, calculate $\tau_n^{U^\kappa}$ and check if (4.47) holds. If this is the case, go to [2j].
Step [2i]: If $|U^\kappa| > \vartheta$ and $\tau_i^U < \sigma \cdot \tau_n^U$, go to [2j], else go to [2b].
Step [2j]: Evaluate all i-tuples, save their discrepancies, and update U. Go to [2b].
In Step [2f], a sample consisting of a fraction κ of the i-tuples is evaluated. Here, κ is updated in terms of the $\tau_j^U$ as long as this does not take too much time, i.e., whenever $|U| \le \vartheta$; this is verified in Step [2e]. Whenever $\kappa \ge 1$ in [2g], all i-tuples have been evaluated in [2f], so we can proceed with [2b] and $i - 1$. When the evaluation of the κ-sample entails $|U^\kappa| > \vartheta$ for the first time, we no longer estimate the value of evaluating all i-tuples via (4.47). Instead, we compare $\tau_i^U$ with $\tau_n^U$ and decide to evaluate all i-tuples if this seems to be "comparatively cheap". This is done in Step [2i].
4.7 Numerical Experience
The algorithms described in the previous sections have been implemented in C++. The halfspace representation of pos W has been determined by including the implementation cdd+ (Fukuda, 2005) of the double description method of Motzkin et al. (1953). The linear optimization problem (4.15) has been solved with ILOG CPLEX 10.0. The numerical results presented in the following have been obtained on a Pentium 4 PC with 3 GHz CPU and 1 GB RAM.
Optimal Redistribution
The number of constraints of the linear program (4.30) essentially depends on the number of critical index sets and, by Proposition 4.5.2, on the number of supporting polyhedra. In order to illustrate this dependency, we have generated N independent samples from the uniform distribution on the unit cube $[0,1]^s$. Out of these N samples, we have randomly chosen n points and applied Algorithm 4.1 to determine the corresponding critical index sets.

Table 4.1: Number of potential supporting polyhedra $\binom{n+1}{2}^k$ (column "potential"), number of supporting polyhedra $|\mathcal{P}|$, and number of reduced critical index sets $|\mathcal{I}_{\mathcal{B}}^*|$ for n points randomly chosen from N = 100 random points on $[0,1]^s$.

                  n = 5                               n = 10
      k    potential   |P|    |I_B*|        potential   |P|        |I_B*|
R^3   3    3.4·10^3    205    32            1.7·10^5    1.9·10^3   405
      4    5.1·10^4    417    32            9.2·10^6    4.7·10^3   426
      5    7.6·10^5    759    32            5.0·10^8    1.1·10^4   462
      6    1.1·10^7    1267   32            2.8·10^10   2.8·10^4   654
      7    1.7·10^8    2227   32            1.5·10^12   7.1·10^4   826
      8    2.6·10^9    3380   32            8.4·10^13   1.4·10^5   898
      9    3.8·10^10   5245   32            4.6·10^15   2.6·10^5   898
R^4   4    5.1·10^4    413    30            9.2·10^6    5.9·10^3   596
      5    7.6·10^5    721    30            5.0·10^8    1.7·10^4   781
      6    1.1·10^7    1186   30            2.8·10^10   4.1·10^4   928
      7    1.7·10^8    1888   30            1.5·10^12   8.6·10^4   928
      8    2.6·10^9    2766   30            8.4·10^13   1.5·10^5   928
      9    3.8·10^10   4548   32            4.6·10^15   2.8·10^5   1024
      10   5.8·10^11   6893   32            2.5·10^17   4.8·10^5   1024
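The "potential" column of Table 4.1 is pure arithmetic. Comparing it with the column $|\mathcal{P}|$ quantifies how small the share of actually supporting polyhedra is. The short Python check below uses a hypothetical helper name. The $|\mathcal{P}|$ values are copied from the table for n = 5 in $\mathbb{R}^3$.

```python
from math import comb


def potential_polyhedra(n, k):
    """Number of potential supporting polyhedra binom(n+1, 2)**k for n support points and k facets."""
    return comb(n + 1, 2) ** k


# |P| for n = 5 in R^3 (k = 3, ..., 9), copied from Table 4.1
supporting = {3: 205, 4: 417, 5: 759, 6: 1267, 7: 2227, 8: 3380, 9: 5245}
for k, p in supporting.items():
    pot = potential_polyhedra(5, k)
    print(k, f"{pot:.1e}", f"{p / pot:.1e}")   # k, potential polyhedra, share that is supporting
```

The share of supporting polyhedra drops from about 6% at k = 3 to about $1.4 \cdot 10^{-7}$ at k = 9. This is consistent with the observation below that the vast majority of potential supporting polyhedra can be excluded.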
Table 4.1 shows the number of potential supporting polyhedra $\binom{n+1}{2}^k$, the resulting number of supporting polyhedra $|\mathcal{P}|$, and the number of reduced critical index sets $|\mathcal{I}_{\mathcal{B}}^*|$ for different values of N, n, and k. It turns out that the vast majority of the potential supporting polyhedra are not supporting and can be excluded by successively verifying property (4.36) within Algorithm 4.1. Table 4.2 shows the corresponding running times of Algorithm 4.1 and the resulting minimal discrepancy values. The column entitled $\mathrm{time}_{N=300}$
Table 4.2: Running times (in seconds) of Algorithm 4.1 for different problem parameters.

                 n = 5                              n = 10
      k    α_B    time_N=100  time_N=300     α_B    time_N=100  time_N=300
R^3   3    0.66   0.01s       0.02s          0.48   0.15s       0.21s
      4    0.72   0.01s       0.04s          0.50   0.37s       0.35s
      5    0.83   0.02s       0.06s          0.52   0.84s       0.57s
      6    0.83   0.04s       0.08s          0.60   2.34s       1.16s
      7    0.83   0.08s       0.12s          0.60   6.69s       2.56s
      8    0.83   0.14s       0.17s          0.62   15.76s      4.69s
      9    0.83   0.25s       0.27s          0.65   33.51s      9.14s
R^4   4    0.85   0.01s       0.05s          0.53   0.46s       0.52s
      5    0.85   0.02s       0.07s          0.54   1.53s       1.16s
      6    0.85   0.04s       0.09s          0.60   4.34s       2.89s
      7    0.88   0.07s       0.13s          0.62   9.37s       5.89s
      8    0.91   0.12s       0.20s          0.65   17.95s      13.10s
      9    0.93   0.21s       0.31s          0.65   37.04s      27.69s
      10   0.93   0.35s       0.49s          0.66   70.96s      60.96s
shows the running time for N = 300 initial scenarios. It can be seen that the running time quickly increases with increasing support size n and number of facets k. This is because the number of potential supporting polyhedra equals $\binom{n+1}{2}^k$. On the other hand, the running time appears to depend linearly on the initial number of scenarios N. Note that here the values of $\mathrm{time}_{N=300}$ may be smaller than $\mathrm{time}_{N=100}$. The randomly chosen n scenarios differ from those in the case N = 100 and may thus lead to fewer supporting polyhedra. As we would expect, the resulting discrepancy values are increasing in k and decreasing in n.
Support Selection
Figure 4.5 shows the decrease of the minimal discrepancy and the increase of the running time in the course of the Forward Selection (Algorithm 4.2). Here the initial measure on $\mathbb{R}^2$ is supported by 10,000 equally weighted points sampled from a standard normal distribution. This initial measure is approximated by measures having up to 25 atoms. Table 4.3 compares the results of a complete enumeration, the Branch and Bound Algorithm 4.4, and the Forward Selection Algorithm 4.2 in the
[Figure 4.5 plot: cell discrepancy (left axis, 0 to 1) and running time in seconds (right axis, 0 to 5000) against n from 1 to 25.]
Figure 4.5: Cell discrepancy and running time in the course of the Forward Selection Algorithm.

Table 4.3: Running times (in seconds) of complete enumeration, the Branch and Bound Algorithm 4.4, and the Forward Selection Algorithm 4.2. The values expressed as a percentage show the gaps between the discrepancies obtained by enumeration and those obtained by the heuristics. The parameters used for the breadth-first search are $\bar L = \min\{N-1, n+4\}$, λ = 0.01, σ = 0.3, and ϑ = 50.

       R^2                                           R^4
n      Compl. Enum.  Branch & Bound  Forward Sel.    Compl. Enum.  Branch & Bound  Forward Sel.
5      34.11         14.82           0.18 (0%)       43.61         49.39           0.18 (50%)
7      187.50        34.12           0.24 (20%)      349.39        175.88          0.31 (11%)
9      459.34        57.68           0.31 (0%)       1490.33       385.53          0.61 (14%)
11     524.13        26.47           0.38 (33%)      3089.74       1698.63         1.04 (9%)
13     278.62        16.88           0.45 (50%)      3156.71       778.05          1.80 (25%)
15     65.80         18.61           0.50 (50%)      1342.47       223.15          3.10 (25%)
case of the cell discrepancy. The initial measures were supported by N = 20 atoms in $\mathbb{R}^2$ and $\mathbb{R}^4$, respectively. The values expressed as a percentage specify the relative excess of the minimal discrepancy achieved by Forward Selection over the optimal value. Note that the time needed by a complete enumeration can be reduced by up to 95% by the proposed Branch and Bound approach.
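The 95% figure can be traced back to Table 4.3 by computing the relative time saved, $1 - t_{BB}/t_{enum}$, for each row. Below is a small Python check. The running times are copied from the table, and the variable names are ours.

```python
# (n, complete enumeration, Branch and Bound) running times in seconds, from Table 4.3
table = {
    "R2": [(5, 34.11, 14.82), (7, 187.50, 34.12), (9, 459.34, 57.68),
           (11, 524.13, 26.47), (13, 278.62, 16.88), (15, 65.80, 18.61)],
    "R4": [(5, 43.61, 49.39), (7, 349.39, 175.88), (9, 1490.33, 385.53),
           (11, 3089.74, 1698.63), (13, 3156.71, 778.05), (15, 1342.47, 223.15)],
}

# relative time saved by Branch and Bound: 1 - t_BB / t_enum
savings = {(space, n): 1 - t_bb / t_enum
           for space, rows in table.items() for n, t_enum, t_bb in rows}

best = max(savings, key=savings.get)
print(best, round(savings[best], 3))     # ('R2', 11) 0.949
print(round(savings[("R4", 5)], 3))      # -0.133
```

The maximum saving, about 95%, occurs for n = 11 in $\mathbb{R}^2$. The same computation shows a negative saving for n = 5 in $\mathbb{R}^4$. There the Branch and Bound run (49.39 s) took longer than complete enumeration (43.61 s).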
[Figure 4.6 plot: two scatter panels (left and right), horizontal axes from −3 to 3, vertical axes from −2 to 4.]
Figure 4.6: N = 1,000 initial scenarios (black points) and n = 50 reduced scenarios (gray points) based on a low discrepancy sequence. Radii of the points are proportional to their probabilities. Left side: uniformly weighted low discrepancy points. Right side: probabilities readjusted with respect to the rectangular discrepancy.
Back to Stochastic Programming
In the following, we study a chance-constrained and a mixed-integer stochastic program in order to apply the scenario reduction techniques developed above. With regard to the complexity of the inner problem (4.15), we adopt a heuristic for the outer problem (4.14) in which problem (4.15) has to be solved only once. More precisely, we use the first n points of the Halton low discrepancy sequence with the bases 2 and 3 as the support of the approximating measure. For more details on the Halton sequence, see, e.g., Chapter 3 of Niederreiter (1992). Recall that this low discrepancy sequence has been constructed to be close to the uniform distribution on the unit cube. In order to approximate the product measures in the following examples, we apply the (right-continuous) inverses of the marginal distribution functions to the Halton points, see Hlawka and Mück (1972). The left side of Figure 4.6 shows the resulting (uniformly weighted) points for n = 50, where the initial measure consists of N = 1000 equally weighted points sampled from the two-dimensional standard normal distribution. The right side of Figure 4.6 shows the probabilities optimally readjusted with respect to the rectangular discrepancy by Algorithm 4.1. This readjustment reduces the discrepancy between the initial and the approximating measure from 0.15 to 0.10. The following chance-constrained problem includes probabilistic lower level restrictions. Such restrictions are relevant, e.g., in the form of reservoir constraints in water management (cf. Prékopa and Szántai (1979)) or in chemical engineering (cf. Henrion and Möller (2003)).
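The support construction described above takes the first n Halton points in bases 2 and 3 and maps each coordinate through an inverse marginal distribution function. It can be sketched in Python as follows. As an assumption for illustration, the marginals are taken to be standard normal, using `statistics.NormalDist().inv_cdf` from the standard library. The text instead applies the inverse marginal distribution functions of the measure to be approximated. The readjustment of the weights by Algorithm 4.1 is not shown, and the function names are hypothetical.

```python
from statistics import NormalDist


def radical_inverse(i, base):
    """Mirror the base-b digits of the integer i >= 1 at the radix point (value in (0, 1))."""
    x, scale = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * scale
        scale /= base
    return x


def halton(n, bases=(2, 3)):
    """First n points of the Halton sequence in [0, 1)^len(bases)."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


def transform(points, marginal_inv_cdfs):
    """Apply the inverse marginal distribution functions coordinate-wise."""
    return [tuple(F_inv(u) for F_inv, u in zip(marginal_inv_cdfs, p)) for p in points]


std_normal = NormalDist()                         # assumption: standard normal marginals
support = transform(halton(50), [std_normal.inv_cdf, std_normal.inv_cdf])
weights = [1 / len(support)] * len(support)       # uniform weights, before any readjustment
print(support[:3])
```

Starting the index at 1 keeps every coordinate strictly inside (0, 1), so the inverse distribution function is always finite.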
[Figure 4.7 plot: six panels titled "cell discrepancy", "p=0.5", "p=0.6", "p=0.7", "p=0.8", and "p=0.9"; horizontal axes from 0 to 50, vertical axes from 0 to 0.8.]
Figure 4.7: Relative perturbation $\frac{|v-\tilde v|}{|v|}$ of the optimal value of Example 4.7.1, depending on n, for Forward Selection (bold), random sampling (thin), the lds (dashed), and Algorithm 4.1 applied to the lds (dotted).
Example 4.7.1. Consider some constant $p \in [0,1]$, a random vector $\xi = (\xi_1, \xi_2)$ whose distribution is given by 1000 equally weighted points sampled from a standard normal distribution, and the following minimization problem:
$$\min_{x,y\in\mathbb{R}} \; x + y \quad \text{s.t.} \quad P\big[\xi \le (x,y)\big] \ge p. \tag{4.49}$$
Problem (4.49) is of type (1.5) with $A_{2,1} = I_2$ and $h_2(\xi) = \xi$. Having in mind the stability of chance-constrained stochastic programs discussed in Section 4.2, it is reasonable to approximate the distribution of ξ in terms of the cell discrepancy.
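For a discrete measure, problem (4.49) can be solved by enumeration. Shrinking x or y to the largest scenario coordinate not exceeding it does not change which scenarios satisfy $\xi \le (x,y)$. So it suffices to try the first coordinates of the scenarios as candidates for x. For fixed x, the smallest feasible y is a weighted p-quantile of the second coordinates of the scenarios with $\xi_1 \le x$. The Python sketch below is our own illustration, not the solver used for Figure 4.7. It assumes $0 < p \le 1$, runs in $O(N^2 \log N)$ time, and uses hypothetical names. Solving it once for the initial and once for the reduced measure gives v and $\tilde v$.

```python
from fractions import Fraction


def solve_chance_constrained(points, probs, p):
    """Solve min x + y s.t. P[xi <= (x, y)] >= p for P = sum_i probs[i] * delta_{points[i]}.
    Assumes 0 < p <= 1; returns (optimal value, x, y), or None if p exceeds the total mass."""
    best = None
    for x in sorted({a for a, _ in points}):
        # scenarios with xi_1 <= x, ordered by their second coordinate
        column = sorted((b, q) for (a, b), q in zip(points, probs) if a <= x)
        mass = 0
        for b, q in column:
            mass += q
            if mass >= p:                          # smallest feasible y for this x
                if best is None or x + b < best[0]:
                    best = (x + b, x, b)
                break
    return best


def relative_deviation(v, v_tilde):
    """|v - v~| / |v| as plotted in Figure 4.7."""
    return abs(v - v_tilde) / abs(v)


points = [(0, 0), (1, 2), (2, 1), (3, 3)]
probs = [Fraction(1, 4)] * 4
print(solve_chance_constrained(points, probs, Fraction(1, 2)))   # (3, 1, 2)
```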
[Figure 4.8 plot: surface of $Q_2$ over the $(\xi_1, \xi_2)$-plane, with $\xi_1$ from 0 to 10, $\xi_2$ from 0 to 0.4, and $Q_2$ from 0 to 10.]
Figure 4.8: Recourse function $\xi \mapsto Q_2(\xi)$ of Example 4.7.2.
We compare the results of different scenario reduction techniques for several values of p. Let us denote the optimal values of the initial and the approximated problem by v and $\tilde v$, respectively. In Figure 4.7, the relative deviations $\frac{|v-\tilde v|}{|v|}$ of the optimal values are shown for ξ being approximated by measures supported by up to n = 50 atoms. The bold line stands for the Forward Selection together with Algorithm 4.1; the thin line represents the average error of 10,000 random samples of size n. The dashed line results from the approximation based on the Halton low discrepancy sequence (lds). The dotted line is obtained by readjusting the probabilities of the Halton points with Algorithm 4.1. In most cases, the measures obtained by Forward Selection produce significantly smaller errors than those constructed by random sampling or by the (unadjusted) lds. In many cases, the quality of the lds approximation is significantly improved by readjusting the probabilities with a single run of Algorithm 4.1. This last approach is, of course, much faster than the Forward Selection. On the other hand, there is no clear indication of which of these two methods provides better solutions. The different qualities of the various approximation methods can also be observed in the case of the following mixed-integer stochastic program.
Example 4.7.2. We consider the optimization problem (4.11) with $\Xi := [0,10] \times [0,0.5]$ and the measure P consisting of N = 1000 uniformly weighted points, sampled from the uniform distribution on Ξ. Figure 4.8 shows a plot of the second-stage value function $\Xi \ni \xi \mapsto Q_2(\xi)$. As in Example 4.7.1,
[Figure 4.9 plot: left panel "polyhedral discrepancy" (vertical axis 0 to 0.8), right panel "relative deviation of optimal values" (vertical axis 0 to 0.20); horizontal axes n from 0 to 50.]
Figure 4.9: Polyhedral discrepancy and relative deviation of the optimal value of Example 4.7.2, depending on the value of n, for Forward Selection (bold), random sampling (thin), the lds (dashed), and Algorithm 4.1 applied to the lds (dotted).
the measure P has been approximated by up to n = 50 scenarios obtained by random sampling, a low discrepancy sequence, and the Forward Selection. The results are shown in Figure 4.9. Again, the approximation quality is significantly higher for the measures obtained by Forward Selection and by the adjusted lds. These take into account the particular discontinuities of the value function $Q_2$.
4.8 Further Results
In this section, we present some further results on the scenario reduction problem with respect to discrepancy distances. Section 4.8.1 points out why we focus on the polyhedral discrepancy $\alpha_{\mathcal{B}_{ph,W}}$ instead of the extended discrepancy $\zeta_{2,\mathcal{B}_{ph,W}}$, which appears more natural with regard to the stability results discussed in Section 4.2. In Section 4.8.2, we discuss a modification of the above scenario reduction approach in order to account for a weighted sum of a discrepancy and a Fortet-Mourier metric. Finally, in Section 4.8.3, we establish an estimate between the polyhedral and the rectangular discrepancy.
4.8.1 A Note on Extended Discrepancies
With regard to the (Lipschitz-type) stability result (4.8) and the (Hölder-type) estimate (4.10), the extended discrepancy
$$\zeta_{2,\mathcal{B}_{ph,W}}(P,Q) = \sup\Big\{ \Big| \int_B f(\xi)\,(P-Q)[d\xi] \Big| : f \in \bar{\mathcal{F}}_2(\Xi),\ B \in \mathcal{B}_{ph,W} \Big\}$$
4.8. Further Results
appears more natural for mixed-integer programs than the polyhedral discrepancy $\alpha_{\mathcal{B}_{ph,W}}$. However, an extension of our scenario reduction approach to $\zeta_{2,\mathcal{B}_{ph,W}}$ seems hardly possible due to the following considerations. Let us consider a system $\mathcal{B}$ of Borel subsets and some $r \ge 1$. Let $\mathcal{I}_{\mathcal{B}}$ be the set of critical index sets $I(B)$, $B \in \mathcal{B}$, defined by (4.24). For two discrete probability measures P and Q with
$$P = \sum_{i=1}^N p_i \delta_{\xi^i} \quad \text{and} \quad Q = \sum_{j=1}^n q_j \delta_{\xi^j},$$
we obtain that the extended $\mathcal{B}$-discrepancy $\zeta_{r,\mathcal{B}}(P,Q)$ is of the form
$$\zeta_{r,\mathcal{B}}(P,Q) = \sup\Big\{ \Big| \sum_{i\in I} p_i f(\xi^i) - \sum_{j\in I\cap\{1,\dots,n\}} q_j f(\xi^j) \Big| : f \in \bar{\mathcal{F}}_r(\Xi),\ I \in \mathcal{I}_{\mathcal{B}} \Big\}.$$
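Introducing the notation
$$c_{ij} := \max\{1, \|\xi^i\|^{r-1}, \|\xi^j\|^{r-1}\}\,\|\xi^i - \xi^j\|, \quad i,j \in \{1,\dots,N\},$$
$$U_r := \big\{ u \in \mathbb{R}^N : |u_i| \le \max\{1, \|\xi^i\|^r\},\ u_i - u_j \le c_{ij},\ i,j \in \{1,\dots,N\} \big\},$$
we can write
$$\zeta_{r,\mathcal{B}}(P,Q) = \sup_{I\in\mathcal{I}_{\mathcal{B}}}\ \sup_{u\in U_r} \Big| \sum_{i\in I} p_i u_i - \sum_{j\in I\cap\{1,\dots,n\}} q_j u_j \Big|. \tag{4.50}$$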
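Note that (4.50) corresponds to the identity (4.25) for the $\mathcal{B}$-discrepancies. Consequently, when dealing with extended discrepancies, the inner problem (4.15) has the form
$$\begin{aligned} &\text{minimize } t \text{ subject to } q \in S_n, \\ &\sup_{u\in U_r} \Big| \sum_{i\in I} p_i u_i - \sum_{j\in I\cap\{1,\dots,n\}} q_j u_j \Big| \le t \quad \text{for all } I \in \mathcal{I}_{\mathcal{B}}. \end{aligned} \tag{4.51}$$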
Recall that in Section 4.5.2 the left sides of the inequalities in problem (4.26) depend monotonically on I (if $I \cap \{1,\dots,n\}$ and q are fixed). Therefore, it was possible to pass to the reduced system of critical index sets $\mathcal{I}_{\mathcal{B}}^*$ and to the reduced problem (4.30). Unfortunately, as shown by Example 4.8.1, the monotonicity of the left sides does not hold in (4.51). Thus, to the best of our understanding, a numerical solution of the inner problem (4.51) seems to be impossible, even for moderate values of N.
Example 4.8.1. We set N = 4, n = 1, $\xi^i = i$ for i = 1, 2, 3, $\xi^4 = 1 - \varepsilon$, $p_i = \frac{1}{4}$ for $i = 1, \dots, 4$, and $q_1 = 1$. Furthermore, we consider the system $\mathcal{B}_{rect}$ of closed intervals in $\mathbb{R}$ and the growth rate r = 1. Let us calculate the suprema on the left side of (4.51) for some critical index sets:
I ∈ $\mathcal{I}_{\mathcal{B}_{rect}}$            {1, 2}     {1, 2, 3}     {1, 2, 4}
$\sup_{u\in U_1}\{\dots\}$      0.75       1             0.5 + 0.25ε
where a tuple $u^* \in U_1$ realizing the supremum for all these index sets is given by $u^* = (u_1^*, \dots, u_4^*) = (-1, 0, 1, -1+\varepsilon)$. Choosing ε sufficiently small, we see that $\sup_{u\in U_1}\{\dots\}$ does not depend monotonically on I.
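The table can be checked by plugging the maximizer $u^*$ given above into the objective. The Python sketch below uses hypothetical names and ε = 1/100, chosen for illustration. It verifies that $u^* \in U_1$ and evaluates $\sum_{i\in I} p_i u_i - \sum_{j\in I\cap\{1\}} q_j u_j$ for the three index sets. For r = 1 the constraints reduce to $|u_i| \le \max\{1, |\xi^i|\}$ and $u_i - u_j \le |\xi^i - \xi^j|$. The sketch does not solve the linear programs over $U_1$ itself; optimality of $u^*$ is taken from the text.

```python
from fractions import Fraction

eps = Fraction(1, 100)
xi = [1, 2, 3, 1 - eps]            # scenarios xi^1, ..., xi^4 (one-dimensional, r = 1)
p = [Fraction(1, 4)] * 4           # initial probabilities
q = [Fraction(1)]                  # reduced measure: all mass on xi^1 (n = 1)
u_star = [-1, 0, 1, -1 + eps]


def in_U1(u, xi):
    """Membership in U_1: |u_i| <= max{1, |xi^i|} and u_i - u_j <= |xi^i - xi^j|."""
    m = len(u)
    return (all(abs(u[i]) <= max(1, abs(xi[i])) for i in range(m))
            and all(u[i] - u[j] <= abs(xi[i] - xi[j]) for i in range(m) for j in range(m)))


def objective(I, u):
    """sum_{i in I} p_i u_i - sum_{j in I, j <= n} q_j u_j, with 1-based indices as in the text."""
    return (sum(p[i - 1] * u[i - 1] for i in I)
            - sum(q[j - 1] * u[j - 1] for j in I if j <= len(q)))


assert in_U1(u_star, xi)
for I in ({1, 2}, {1, 2, 3}, {1, 2, 4}):
    print(sorted(I), objective(I, u_star))   # 3/4, 1, 201/400 (= 1/2 + eps/4)
```

The value for {1, 2, 4} is smaller than the value for its subset {1, 2}. This is the failure of monotonicity that prevents the reduction used in Section 4.5.2.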
4.8.2 Mass Transportation and Clustering
In general, the inner problem (4.26) admits a variety of optimal solutions, and it is not a priori clear which solution is preferable. Furthermore, whenever (4.26) is solved by a simplex algorithm, the resulting solutions (i.e., the new probability weights) are extremal points of the admissible set, see also Figure 4.10. With regard to estimate (4.8), it appears reasonable to additionally consider the Fortet-Mourier metric $\zeta_2$ in order to find more "regular" probability weights solving problem (4.26). To this end, we modify the objective function of (4.26) to a weighted sum. That is, we fix some constant $\lambda \in (0,1)$ and consider the weighted inner problem
$$\min\; \lambda\, \alpha_{\mathcal{B}}\Big(P, \sum_{i=1}^n q_i \delta_{\xi^i}\Big) + (1-\lambda)\, \zeta_2\Big(P, \sum_{i=1}^n q_i \delta_{\xi^i}\Big) \quad \text{subject to } q = (q_i)_{i=1}^n \in S_n, \tag{4.52}$$
with $S_n$ denoting the standard simplex in $\mathbb{R}^n_+$. Note that the minimization of $\zeta_2(P, \sum_{i=1}^n q_i \delta_{\xi^i})$ subject to q is a linearly constrained optimization problem with quadratic objective. In order to obtain a linear program that is easier to handle, we resort to the dual representation of $\zeta_2$.
Duality and Reduced Costs
The metric $\zeta_2$ is defined by
$$\zeta_2(P,Q) := \sup_{f\in\mathcal{F}_2} \Big| \int_\Xi f(\xi)\,P[d\xi] - \int_\Xi f(\xi)\,Q[d\xi] \Big|$$
for probability measures P and Q in $\mathcal{P}_2(\Xi)$. We refer to Theorem 5.3.2 of Rachev (1991) to recall that $\zeta_2$ also admits a dual representation by the Kantorovich-Rubinstein functional $\mathring{\mu}_2$, which is defined by the following mass transshipment problem:
$$\mathring{\mu}_2(P,Q) := \inf\Big\{ \int_{\Xi\times\Xi} \max\{1, \|\xi\|, \|\tilde\xi\|\}\,\|\xi - \tilde\xi\|\;\eta[d(\xi,\tilde\xi)] : \eta \in \mathcal{P}(\Xi\times\Xi),\ \pi_1\eta - \pi_2\eta = P - Q \Big\},$$
where $\pi_i\eta$ denotes the projection of η on the i-th coordinate for i = 1, 2. For measures P, Q with support in the finite set $\{\xi^1, \dots, \xi^N\}$, we can introduce the transport cost coefficients
$$c_2^{i,j} := \max\{1, \|\xi^i\|, \|\xi^j\|\}\,\|\xi^i - \xi^j\| \quad \text{for } i,j = 1,\dots,N$$
and write
$$\mathring{\mu}_2(P,Q) = \inf\Big\{ \sum_{i,j=1}^N c_2^{i,j}\,\eta_{i,j} : \eta_{i,j} \in [0,1],\ i,j = 1,\dots,N,\ \sum_{j=1}^N \eta_{i,j} - \sum_{j=1}^N \eta_{j,i} = p_i - q_i,\ i = 1,\dots,N \Big\}.$$
Unfortunately, the latter problem may become numerically intractable for large values of N because it includes $N^2$ variables $\eta_{i,j}$. Thus, we consider the reduced cost coefficients
$$\hat c_2^{i,j} := \inf\Big\{ \sum_{l=1}^{m-1} c_2^{\pi(l),\pi(l+1)} : m \in \mathbb{N},\ \pi : \{1,\dots,m\} \to \{1,\dots,N\},\ \pi(1) = i,\ \pi(m) = j \Big\}$$
and recall from Section 4.3 of Rachev and Rüschendorf (1998) that $\mathring{\mu}_2$ equals the following Monge-Kantorovich functional:
$$\hat{\hat\mu}_2(P,Q) = \inf\Big\{ \sum_{\substack{i=1,\dots,N\\ j=1,\dots,n}} \hat c_2^{i,j}\,\eta_{i,j} : \eta_{i,j} \in [0,1],\ \sum_{j=1}^n \eta_{i,j} = p_i,\ i = 1,\dots,N,\ \sum_{i=1}^N \eta_{i,j} = q_j,\ j = 1,\dots,n \Big\}.$$
The determination of the reduced costs $\hat c_2^{i,j}$ is a shortest-path problem, which can be solved, e.g., by the Dijkstra Algorithm (cf. Dijkstra (1959)), see also Section 3.5 of Heitsch (2007). Piecing all this together, we can replace $\zeta_2$ in problem (4.52) by the distance $\hat{\hat\mu}_2$ and obtain the following linear program:
[Figure 4.10 plot: points in the unit square, both axes from 0 to 1.]
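Figure 4.10: Optimal probabilities q of problem (4.53), adjusted with respect to $\alpha_{\mathcal{B}_{rect}}$ and $\zeta_2$ with λ = 1 (gray balls) and λ = 0.9 (black circles).
$$\begin{aligned}
\text{minimize} \quad & \lambda\, t_\alpha + (1-\lambda)\, t_\mu && (4.53)\\
\text{subject to} \quad & t_\alpha, t_\mu \in \mathbb{R}_+,\ q = (q_i)_{i=1,\dots,n} \in S_n,\ \eta = (\eta_{i,j})_{i=1,\dots,N;\ j=1,\dots,n},\ \eta_{i,j} \ge 0, & \\
& t_\alpha, (q_i)_i \text{ fulfilling (4.31)}, && (4.54)\\
& t_\mu \ge \sum_{i=1,\dots,N;\ j=1,\dots,n} \hat c_2^{i,j}\, \eta_{i,j}, && (4.55)\\
& \sum_{j=1}^n \eta_{i,j} = p_i, \quad i = 1,\dots,N, && (4.56)\\
& \sum_{i=1}^N \eta_{i,j} = q_j, \quad j = 1,\dots,n, && (4.57)
\end{aligned}$$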
where the condition (4.31) in (4.54) stands for the constraints in problem (4.30) that are induced by the system of reduced critical index sets $\mathcal{I}_{\mathcal{B}}^*$. The constraints (4.55)–(4.57) belong to the minimization of $\hat{\hat\mu}_2$.
Remark 4.8.2. The scenario reduction problem with respect to Fortet-Mourier metrics (Dupačová et al., 2003; Heitsch and Römisch, 2003) leads to a partitioning of the initial scenarios into several clusters. This partitioning was used by Heitsch and Römisch (2008) to implement a scenario tree construction method based on successive conditional clustering. When considering discrepancy distances, the solution of the inner problem yields new probability weights for the remaining scenarios, but, unfortunately, no
clustering scheme that could be used by a possible multiperiod recursive extension for scenario tree construction. However, given the new probability weights adjusted with respect to a discrepancy distance, it is possible to generate a clustering by solving a certain mass transportation problem. In particular, such a transportation plan η is directly obtained by solving the weighted inner problem (4.53), see also Figure 4.12.
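The reduced costs $\hat c_2^{i,j}$ are all-pairs shortest-path distances in the complete graph with edge costs $c_2^{i,j}$. The Python sketch below computes them with Dijkstra's algorithm from every node. For brevity it uses one-dimensional scenarios, and the names are hypothetical. It also covers the case λ = 0, in which the weights q are free. There the transport part (4.55)–(4.57) is solved by moving the mass of each initial scenario to a remaining scenario with the smallest reduced cost, which produces the Voronoi-type clustering of Fortet-Mourier-based reduction. For λ > 0 the plan η would instead come from the linear program (4.53), which requires an LP solver and is not shown.

```python
import heapq
from fractions import Fraction


def transport_costs(points):
    """c_2^{i,j} = max{1, |xi^i|, |xi^j|} * |xi^i - xi^j| for one-dimensional scenarios."""
    return [[max(1, abs(a), abs(b)) * abs(a - b) for b in points] for a in points]


def reduced_costs(c):
    """hat c_2: shortest-path distances w.r.t. the costs c, via Dijkstra from every node."""
    N = len(c)
    c_hat = []
    for src in range(N):
        dist = [float("inf")] * N
        dist[src] = 0
        heap = [(0, src)]
        while heap:
            d, i = heapq.heappop(heap)
            if d > dist[i]:
                continue                      # stale heap entry
            for j in range(N):
                if d + c[i][j] < dist[j]:
                    dist[j] = d + c[i][j]
                    heapq.heappush(heap, (dist[j], j))
        c_hat.append(dist)
    return c_hat


def project(p, c_hat, remaining):
    """lambda = 0: move each scenario's mass to a remaining scenario with minimal reduced cost."""
    q = {j: 0 for j in remaining}
    cost = 0
    for i, p_i in enumerate(p):
        j = min(remaining, key=lambda r: c_hat[i][r])
        q[j] += p_i
        cost += p_i * c_hat[i][j]
    return q, cost


points = [0, 2, 4]
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
c = transport_costs(points)
c_hat = reduced_costs(c)
print(c[0][2], c_hat[0][2])          # 16 12: the detour via xi = 2 is cheaper
print(project(p, c_hat, [0, 2]))     # q = {0: 3/4, 2: 1/4}, transport cost 1
```

The growth factor $\max\{1, \|\xi^i\|, \|\xi^j\|\}$ makes long direct moves between distant scenarios expensive. This is why the reduced costs can be strictly smaller than the direct costs.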
[Figure 4.11 plot: two curves against λ from 0 to 1 on the horizontal axis, vertical axis from 0.2 to 0.6.]
Figure 4.11: Rectangular discrepancy and Fortet-Mourier distance between a set of 100 uniformly distributed points in $\mathbb{R}^2$ and 10 points chosen with Forward Selection, where the discrepancy is weighted with λ (abscissa) in the inner problem. The jump on the right side is of particular interest.
Further Numerical Experience
Figure 4.10 shows N = 1000 points sampled from a uniform distribution on $[0,1]^2$ and n = 25 chosen points based on the Halton sequence. The diameters of the gray balls represent the probabilities resulting from solving problem (4.53) with respect to $\alpha_{\mathcal{B}_{rect}}$ and $\zeta_2$ with λ = 1. The resulting values of $\alpha_{\mathcal{B}_{rect}}$ and $\zeta_2$ are 0.230 and 0.157, respectively. Taking $\zeta_2$ into account in the objective by setting λ = 0.9 leaves the value of $\alpha_{\mathcal{B}_{rect}}$ unchanged but reduces $\zeta_2$ to 0.085. The resulting probability weights are shown by the black circles and appear more evenly distributed. Figure 4.11 shows the values of $\alpha_{\mathcal{B}_{rect}}$ (solid line) and $\zeta_2$ (dashed line) resulting from solving problem (4.53) with different values of λ (abscissa) within the Forward Selection heuristic for the outer problem. As we would expect, the value of $\zeta_2$ increases as more emphasis is put on minimizing $\alpha_{\mathcal{B}_{rect}}$. However, by passing from λ = 1.0 to λ = 0.999, the value of $\alpha_{\mathcal{B}_{rect}}$ decreases from 0.48 to 0.37. It seems that the unsatisfactory behaviour of Forward
[Figure 4.12 plot: two panels (left and right) on the unit square, both axes from 0 to 1.]
Figure 4.12: Support selected from 1,000 uniformly distributed scenarios by an lds. Left side: probabilities adjusted with respect to the Fortet-Mourier distance $\zeta_2$ (λ = 0). Right side: probabilities adjusted with respect to the rectangular discrepancy $\alpha_{\mathcal{B}_{rect}}$ (λ = 1). The different gray levels show an optimal mass transportation plan from the initial to the reduced measure.
Selection addressed in Section 4.6 may be improved by taking the $\zeta_2$-term into account. This effect can be observed by modifying the Forward Selection within Example 4.6.1 so that the weighted inner problem (4.53) with λ = 0.999 is applied. With this modification, we obtain a better choice in Step [2], and thus the discrepancy can be reduced from $\frac{1}{3}$ to $\frac{1}{4}$. Finally, Figure 4.12 shows an optimal transportation plan η, i.e., how the probability weights of the initial scenarios may be transported to the remaining ones. The optimization with λ = 0 amounts to a projection onto the set of remaining scenarios and thus leads to the classical Voronoi regions (cf., e.g., Graf and Luschgy (2000)). This is no longer true for the rectangular discrepancy with λ = 1.
4.8.3 An Estimation between Discrepancies
In the previous sections we have discussed how polyhedral discrepancy distances may be applied to scenario reduction. It turned out that the numerical complexity of the scenario reduction problem grows exponentially with the number of facets of the cone $\operatorname{pos} W$. Thus, it seems preferable to apply a weaker discrepancy than $\alpha_{\mathcal{B}_{\mathrm{ph},W}}$ to the scenario reduction, for example the rectangular discrepancy $\alpha_{\mathcal{B}_{\mathrm{rect}}}$. This is reasonable whenever stability of the underlying optimization problem also holds with respect to $\alpha_{\mathcal{B}_{\mathrm{rect}}}$. In particular, the latter is true if $\alpha_{\mathcal{B}_{\mathrm{ph},W}}$ can be estimated from above in terms of $\alpha_{\mathcal{B}_{\mathrm{rect}}}$. For instance, the following result has been established by Niederreiter and Wills (1975):
$$\alpha_{\mathcal{B}_{\mathrm{conv}}}(P,Q) \le d \left(\frac{4Md}{d-1}\right)^{\frac{d-1}{d}} \alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q)^{\frac{1}{d}} \tag{4.58}$$
if $P$ has a density (with respect to the Lebesgue measure on $\mathbb{R}^d$) which is bounded by some constant $M \ge 0$. Estimate (4.58) allows us to conclude the desired inequality for $\alpha_{\mathcal{B}_{\mathrm{ph},W}}$ and $\alpha_{\mathcal{B}_{\mathrm{rect}}}$ from the relation $\mathcal{B}_{\mathrm{ph},W} \subset \mathcal{B}_{\mathrm{conv}}$. The following example shows that the assumption of a Lebesgue density of $P$ cannot be dropped in the estimate (4.58).
Example 4.8.3. We consider Example 4.2.1 and suppose that the measure $P$ is the uniform distribution on the diagonal segment $\{(x,x) \in \mathbb{R}^2 : x \in (0.1, 0.4)\}$. For $\varepsilon \in (0, 0.1]$ we define the shifted measure $P_\varepsilon$ via $P_\varepsilon[A] := P[A + (\varepsilon, -\varepsilon)^\top]$ for every Borel set $A \subset [0,1]^2$. Recall that the recourse matrix (of the continuous variables) of Example 4.2.1 has the form
$$W = \bar{A}_{\mathbb{R}^2,0} = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$
Then, on the one hand, we have $\alpha_{\mathcal{B}_{\mathrm{rect}}}(P, P_\varepsilon) \to 0$ as $\varepsilon \to 0$. On the other hand, it follows that $\alpha_{\mathcal{B}_{\mathrm{conv}}}(P, P_\varepsilon) = \alpha_{\mathcal{B}_{\mathrm{ph},W}}(P, P_\varepsilon) = 1$ and $v(P_\varepsilon) - v(P) = 0.5$ for $\varepsilon \in (0, 0.1]$, where $v(\cdot)$ denotes the optimal value of problem (4.11).
Unfortunately, absolute continuity with respect to the Lebesgue measure is hardly ever available in the context of scenario reduction, where, in general, discrete measures are considered. Thus, the aim of this section is to establish an estimate of the form
$$\alpha_{\mathcal{B}_{\mathrm{ph},W}}(P,Q) \le K_1\, \alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q) + K_2(P) \tag{4.59}$$
that also holds for discrete measures $P$, $Q$, with constants $K_1, K_2 \in \mathbb{R}_+$ that may depend on $P$. Then, it appears reasonable to reduce the initial measure $P$ with respect to the more easily accessible discrepancy $\alpha_{\mathcal{B}_{\mathrm{rect}}}$ whenever $K_2(P)$ is not too large. In view of Example 4.8.3, the value $K_2(P)$ should reflect, in some sense, the degree of singularity of $P$ with respect to the polyhedra in $\mathcal{B}_{\mathrm{ph},W}$. In order to derive an estimate of type (4.59), we mainly follow the lines of Niederreiter and Wills (1975).
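For discrete measures the rectangular discrepancy can be evaluated by plain enumeration, which illustrates why $\alpha_{\mathcal{B}_{\mathrm{rect}}}$ is the more accessible quantity. The Python sketch below assumes that $\mathcal{B}_{\mathrm{rect}}$ consists of the closed axis-parallel rectangles in $\mathbb{R}^2$; the function name is illustrative, and the $O(m^4 (N+M))$ enumeration over $m$ distinct coordinates is meant for small examples only, not as a replacement for the reduction methods of the previous sections.

```python
import itertools

import numpy as np


def rect_discrepancy(x, p, y, q):
    """Brute-force alpha_rect(P, Q) for discrete measures on R^2.

    P = sum_i p[i] * delta_{x[i]},  Q = sum_l q[l] * delta_{y[l]}.
    Any closed axis-parallel rectangle can be shrunk to the bounding box of
    the atoms it contains without changing its P- and Q-mass, so it is
    enough to scan rectangles whose sides pass through atom coordinates.
    """
    pts = np.vstack([x, y]).astype(float)
    signed = np.concatenate([np.asarray(p, float), -np.asarray(q, float)])
    xs, ys = np.unique(pts[:, 0]), np.unique(pts[:, 1])
    best = 0.0                                   # the empty rectangle gives 0
    for a1, b1 in itertools.combinations_with_replacement(xs, 2):
        in_x = (pts[:, 0] >= a1) & (pts[:, 0] <= b1)
        for a2, b2 in itertools.combinations_with_replacement(ys, 2):
            inside = in_x & (pts[:, 1] >= a2) & (pts[:, 1] <= b2)
            best = max(best, abs(signed[inside].sum()))
    return best


# P: two atoms on the diagonal, Q: two atoms on the anti-diagonal
x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0, 1.0], [1.0, 0.0]])
w = np.array([0.5, 0.5])
print(rect_discrepancy(x, w, y, w))
```

In this example every rectangle that contains both atoms of $P$ also contains both atoms of $Q$, so the maximum is attained by a degenerate rectangle around a single atom.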
We assume that $\operatorname{supp} P \subset [0,1]^d$ and consider for $r \ge 1$ the (scaled) $d$-dimensional cube $K^{(r)} := [-1/r, 1/r]^d$ with edge length $2/r$ and the lattice set $Z^{(r)} := \{z = (z_1,\dots,z_d) \in \mathbb{Z}_+^d : z_j < r \text{ for } j = 1,\dots,d\}$. For $z \in Z^{(r)}$ we define the $d$-dimensional half-open interval
$$I_z^{(r)} := \left[\frac{z_1}{r}, \frac{z_1+1}{r}\right) \times \cdots \times \left[\frac{z_d}{r}, \frac{z_d+1}{r}\right).$$
We assume that $\operatorname{pos} W$ has $k$ ($(d-1)$-dimensional) facets and, thus, every polyhedron $C \in \mathcal{B}_{\mathrm{ph},W}$, $C \subset [0,1]^d$, has $2(k+d)$ facets (some of them may be empty) with associated normal vectors $m_1, \dots, m_{k+d}$. For the sake of notational simplicity, we assume that
$$\langle m_j, e_i\rangle \in (0,1) \quad \text{for } j = 1,\dots,k \text{ and } i = 1,\dots,d, \qquad m_{k+i} = e_i \quad \text{for } i = 1,\dots,d.$$
We denote by
$$h(m_j) := h^{(r)}(m_j) := \sup_{x \in K^{(r)}} \langle x, m_j\rangle, \quad j = 1,\dots,k+d,$$
the support function of $K^{(r)}$ in direction $m_j$. Observe that $h(m_j) = (1/r)\|m_j\|_1$ for $j = 1,\dots,k+d$, and, in particular, $h(m_{k+i}) = 1/r$ for $i = 1,\dots,d$.
Upper Bound
Let us consider Borel probability measures $P$ and $Q$ on $[0,1]^d$ and a polyhedron $C \in \mathcal{B}_{\mathrm{ph},W}$, $C \subset [0,1]^d$. The set $C$ can be written as
$$C = \{x \in \mathbb{R}^d : \langle m_j, x\rangle \in [c_j^-, c_j^+],\ j = 1,\dots,k+d\}$$
for certain values $c_j^-, c_j^+ \in \overline{\mathbb{R}}$. In order to establish an upper bound on $P[C] - Q[C]$ that is independent of $C$, we consider the set
$$M_{\nparallel} := \bigcup_{j=1,\dots,k} \left\{x \in \mathbb{R}^d : \langle m_j, x\rangle \in [c_j^-, c_j^- + h(m_j)] \cup [c_j^+ - h(m_j), c_j^+]\right\}$$
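and the following polyhedral enlargement of $C$:
$$C^+ := \left\{x \in \mathbb{R}^d : \begin{array}{ll} \langle m_j, x\rangle \in [c_j^-, c_j^+], & j = 1,\dots,k, \\ \langle m_{k+i}, x\rangle \in [c_{k+i}^- - 1/r,\ c_{k+i}^+ + 1/r], & i = 1,\dots,d \end{array}\right\}.$$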
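The identity $h(m_j) = (1/r)\|m_j\|_1$ used above can be checked numerically: a linear functional attains its maximum over the cube $K^{(r)}$ at one of its $2^d$ vertices. A minimal Python sketch, with illustrative function names and a few arbitrarily chosen directions:

```python
import itertools

import numpy as np


def support_cube(m, r):
    """h^(r)(m) = sup over x in K^(r) = [-1/r, 1/r]^d of <x, m>, evaluated by
    brute force over the 2^d vertices (a linear function on a polytope attains
    its maximum at a vertex)."""
    m = np.asarray(m, dtype=float)
    vertices = itertools.product((-1.0 / r, 1.0 / r), repeat=len(m))
    return max(float(np.dot(v, m)) for v in vertices)


def support_closed_form(m, r):
    """Closed form from the text: h^(r)(m) = ||m||_1 / r."""
    return float(np.abs(np.asarray(m, dtype=float)).sum()) / r


for m in ([1.0, -1.0], [0.6, 0.8], [0.3, -0.2, 0.9]):
    for r in (1, 2, 5):
        print(m, r, support_cube(m, r), support_closed_form(m, r))
```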
Figure 4.13: A polyhedron $C \subset \mathbb{R}^2$ (left) and the corresponding set $M_{\nparallel}$ (right).
The set $C^+$ is obtained by shifting each facet of $C$ that is orthogonal to a unit vector by an amount of $1/r$ in the direction of its (outer) normal vector. The sets $M_{\nparallel}$ and $C^+$ are depicted for a given polyhedron $C$ in Figures 4.13 and 4.14, respectively. Furthermore, we define the set
$$C_1 := C_1^{(r)} := C \cap \bigcup_{z \in Z^{(r)} :\, I_z^{(r)} \subset C^+} I_z^{(r)}.$$
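As in Niederreiter and Wills (1975), one easily shows the following inclusion:
Proposition 4.8.4.
$$C \setminus C_1 \subset M_{\nparallel}. \tag{4.60}$$
Proof. We have to show that
$$C \setminus \bigcup_{z \in Z^{(r)} :\, I_z^{(r)} \subset C^+} I_z^{(r)} \subset M_{\nparallel}.$$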
We assume that this inclusion does not hold and consider some $x \in C$ with $x \notin M_{\nparallel}$ and some $\tilde z \in Z^{(r)}$ with $x \in I_{\tilde z}^{(r)}$ and $I_{\tilde z}^{(r)} \not\subset C^+$. It is easy to see that $I_{\tilde z}^{(r)}$ is contained in the Minkowski sum $x + K^{(r)}$. We will show that $x + K^{(r)}$ is contained in $C^+$, contradicting the assumption $I_{\tilde z}^{(r)} \not\subset C^+$. To this end, we observe that $x \in C \setminus M_{\nparallel}$ fulfills
$$\langle m_j, x\rangle \in [c_j^- + h(m_j),\ c_j^+ - h(m_j)] \quad \text{for } j = 1,\dots,k.$$
We consider some $y \in K^{(r)}$ and $j \in \{1,\dots,k\}$. From the definition of $h(m_j)$ and the symmetry of $K^{(r)}$, we know that $\langle y, m_j\rangle \in [-h(m_j), h(m_j)]$, and, thus, $\langle x + y, m_j\rangle \in [c_j^-, c_j^+]$. On the other hand, we have $|y_i| \le 1/r$, which implies $\langle x + y, m_{k+i}\rangle \in [c_{k+i}^- - 1/r,\ c_{k+i}^+ + 1/r]$ for $i = 1,\dots,d$. This implies $x + y \in C^+$, and, hence, $x + K^{(r)} \subset C^+$. This completes the proof.
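Proposition 4.8.4 can also be probed numerically. The Python sketch below builds a hypothetical instance in $d = 2$ with a single non-axis normal $m_1$ ($k = 1$) and $m_{1+i} = e_i$, samples points of $C$, and checks that every sampled point whose lattice cell $I_z^{(r)}$ is not contained in $C^+$ lies in $M_{\nparallel}$. The normal vector, the bounds $c_j^\pm$, the value of $r$ and the helper names are illustrative choices; the tolerance `tol` only guards against rounding.

```python
import numpy as np

# Hypothetical instance in d = 2 with k = 1 non-axis normal m_1 and m_{1+i} = e_i.
m1 = np.array([0.6, 0.8])
r = 7
h1 = np.abs(m1).sum() / r               # h(m_1) = ||m_1||_1 / r
c1 = (0.35, 1.05)                       # [c_1^-, c_1^+] for <m_1, x>
cbox = ((0.1, 0.9), (0.2, 0.85))        # [c_{1+i}^-, c_{1+i}^+] for x_i, i = 1, 2
tol = 1e-12                             # guards against rounding only


def in_C(x):
    return (c1[0] <= x @ m1 <= c1[1]
            and all(lo <= x[i] <= hi for i, (lo, hi) in enumerate(cbox)))


def in_C_plus(x):
    # C^+: same m_1-constraint, facets orthogonal to unit vectors shifted by 1/r
    return (c1[0] - tol <= x @ m1 <= c1[1] + tol
            and all(lo - 1 / r - tol <= x[i] <= hi + 1 / r + tol
                    for i, (lo, hi) in enumerate(cbox)))


def in_M(x):
    # M: the two stripes of width h(m_1) inside the facets normal to m_1
    s = x @ m1
    return (c1[0] - tol <= s <= c1[0] + h1 + tol
            or c1[1] - h1 - tol <= s <= c1[1] + tol)


def cell_in_C_plus(z):
    # I_z is half-open and C^+ is closed and convex: test the corners of its closure
    return all(in_C_plus(np.array([(z[0] + a) / r, (z[1] + b) / r]))
               for a in (0, 1) for b in (0, 1))


rng = np.random.default_rng(1)
checked = violations = 0
for x in rng.uniform(size=(20000, 2)):
    if not in_C(x):
        continue
    z = np.floor(r * x).astype(int)     # lattice index with x in I_z^(r)
    if not cell_in_C_plus(z):           # x lies in C \ C_1 ...
        checked += 1
        violations += not in_M(x)       # ... and should therefore lie in M
print(checked, violations)
```

A sampling check of this kind cannot replace the proof, but it makes the roles of $C^+$, $C_1$ and $M_{\nparallel}$ concrete.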
From relation (4.60) and $C_1 \subset C$ we can derive the inequality $P[C] - P[C_1] \le P[M_{\nparallel}]$. Having in mind that $Q[C] \ge Q[C_1]$, we conclude that
$$P[C] - Q[C] = P[C] - P[C_1] + P[C_1] - Q[C] \le P[M_{\nparallel}] + P[C_1] - Q[C_1]. \tag{4.61}$$
As in Satz 1 of Niederreiter and Wills (1975) and on p. 96 of Kuipers and Niederreiter (1974), one can show that the set $C_1$ can be partitioned into $r^{d-1}$ disjoint half-open $d$-dimensional intervals $I_i$, $i = 1,\dots,r^{d-1}$. We thus obtain
$$P[C_1] - Q[C_1] \le \sum_{i=1}^{r^{d-1}} \big|P[I_i] - Q[I_i]\big|.$$
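For every half-open interval $I_i$ we can find an increasing sequence of closed $d$-dimensional intervals $I_i^{(n)}$ with $\bigcup_{n\in\mathbb{N}} I_i^{(n)} = I_i$. Consequently, we have
$$\big|P[I_i] - Q[I_i]\big| = \lim_{n\to\infty} \big|P[I_i^{(n)}] - Q[I_i^{(n)}]\big| \le \sup_{n\in\mathbb{N}} \big|P[I_i^{(n)}] - Q[I_i^{(n)}]\big| \le \alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q).$$
Hence, we can conclude that
$$P[C_1] - Q[C_1] \le r^{d-1}\, \alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q). \tag{4.62}$$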
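To estimate the term $P[M_{\nparallel}]$ in (4.61), we introduce the quantity
$$P^j(y) := \max_{a \in \mathbb{R}} P\big[\{x \in \mathbb{R}^d : \langle m_j, x\rangle \in [a, a+y]\}\big], \quad j = 1,\dots,k,$$
measuring the maximal $P$-probability of a stripe of width $y$ with normal vector $m_j$. Since $M_{\nparallel}$ is covered by $2k$ such stripes of widths $h(m_j)$, it follows that
$$P[M_{\nparallel}] \le 2 \sum_{j=1}^{k} P^j(h(m_j)). \tag{4.63}$$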
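For a discrete measure, $P^j(y)$ can be evaluated with a sliding window over the sorted projections $\langle m_j, \xi_i\rangle$. The Python sketch below is an illustration under that assumption (the function name and the sample data are hypothetical); it returns the maximal mass of a window $[a, a+y]$ and then evaluates the bound $2P^j(h(m_j))$ from (4.63) for $k = 1$ and a few values of $r$.

```python
import numpy as np


def stripe_prob(points, probs, m, y):
    """P^j(y) = max_a P[{x : <m, x> in [a, a + y]}] for a discrete measure (y >= 0).

    An optimal stripe can be shifted until its lower boundary a hits a
    projected atom, so a two-pointer sweep over the sorted projections
    <m, xi_i> suffices; the sort dominates the O(N log N) running time.
    """
    s = np.asarray(points, dtype=float) @ np.asarray(m, dtype=float)
    order = np.argsort(s)
    s, w = s[order], np.asarray(probs, dtype=float)[order]
    best, mass, right = 0.0, 0.0, 0
    for left in range(len(s)):
        while right < len(s) and s[right] <= s[left] + y:
            mass += w[right]             # extend the window to the right
            right += 1
        best = max(best, mass)
        mass -= w[left]                  # drop the atom at the lower boundary
    return best


# Bound 2 * P^j(h(m_j)) on P[M] for k = 1 and a sample of 500 points.
rng = np.random.default_rng(0)
sample = rng.uniform(size=(500, 2))
weights = np.full(500, 1 / 500)
m = np.array([0.6, 0.8])
for r in (2, 5, 10, 20):
    print(r, 2 * stripe_prob(sample, weights, m, np.abs(m).sum() / r))
```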
By piecing together the inequalities (4.61)–(4.63) we obtain the following upper bound:
$$P[C] - Q[C] \le r^{d-1}\, \alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q) + 2 \sum_{j=1}^{k} P^j(h(m_j)). \tag{4.64}$$
Figure 4.14: Enlargements $C^+$ (left) and $C^+_{\nparallel}$ (right) of a polyhedron $C \subset \mathbb{R}^2$.
Lower Bound
In order to establish a lower bound on $P[C] - Q[C]$ we consider a further polyhedral enlargement of $C$:
$$C^+_{\nparallel} := \left\{x \in \mathbb{R}^d : \begin{array}{ll} \langle m_j, x\rangle \in [c_j^- - h(m_j),\ c_j^+ + h(m_j)], & j = 1,\dots,k, \\ \langle m_{k+i}, x\rangle \in [c_{k+i}^-, c_{k+i}^+], & i = 1,\dots,d \end{array}\right\}.$$
The set $C^+_{\nparallel}$ is obtained by shifting each facet of $C$ that is not orthogonal to a unit vector by an amount of $h(m_j)$ in the direction of its (outer) normal vector; it is sketched in Figure 4.14. Furthermore, we put
$$C_2 := C_2^{(r)} := C^+_{\nparallel} \cap \bigcup_{z \in Z^{(r)} :\, I_z^{(r)} \cap C \neq \emptyset} I_z^{(r)}.$$
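Obviously, we have $C \subset C_2 \subset C^+_{\nparallel}$, and, thus,
$$\begin{aligned} P[C_2] - P[C] &\le P[C^+_{\nparallel}] - P[C] = P[C^+_{\nparallel} \setminus C] \\ &\le P\Big[\bigcup_{j=1,\dots,k} \{x \in \mathbb{R}^d : \langle m_j, x\rangle \in [c_j^- - h(m_j), c_j^-]\}\Big] + P\Big[\bigcup_{j=1,\dots,k} \{x \in \mathbb{R}^d : \langle m_j, x\rangle \in [c_j^+, c_j^+ + h(m_j)]\}\Big] \\ &\le 2 \sum_{j=1}^{k} P^j(h(m_j)). \end{aligned}$$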
We obtain
$$P[C] - Q[C] \ge P[C] - Q[C_2] = P[C] - P[C_2] + P[C_2] - Q[C_2] \ge -2 \sum_{j=1}^{k} P^j(h(m_j)) - r^{d-1}\, \alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q), \tag{4.65}$$
where the last estimate holds since $C_2$ can be partitioned into $r^{d-1}$ disjoint (half-open) intervals, too. Combining (4.64) and (4.65), we finally get the following estimate of type (4.59):
$$\alpha_{\mathcal{B}_{\mathrm{ph},W}}(P,Q) \le r^{d-1}\, \alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q) + 2 \sum_{j=1}^{k} P^j(h^{(r)}(m_j)) \quad \text{for all } r \ge 1. \tag{4.66}$$
Determining the Polyhedral Singularity
For a discrete measure $P$ supported by $N$ atoms $\xi_i$, $i = 1,\dots,N$, the mappings $r \mapsto P^j(h^{(r)}(m_j))$ can be computed by Algorithm 4.6. The algorithm calculates two vectors $\mathbf{D}, \mathbf{P} \in \overline{\mathbb{R}}^N$. The value $D_i$, $i = 1,\dots,N$, is the minimal width of a stripe with normal vector $m_j$ having a $P$-probability greater than or equal to the value $P_i$; thus we have
$$P^j(t) = P_i \quad \text{for } t \in [D_i, D_{i+1}).$$
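Because (4.66) holds for every $r \ge 1$, the sharpest bound of this type is obtained by minimizing its right-hand side over $r$: the first term grows like $r^{d-1}$, whereas the singularity term decreases as the stripes become thinner. The Python sketch below restricts the search to integer $r = 1, \dots, r_{\max}$ for simplicity and expects the normals $m_1, \dots, m_k$ and the value $\alpha_{\mathcal{B}_{\mathrm{rect}}}(P,Q)$ from the caller; the function names, the search range and the demo data are illustrative assumptions.

```python
import numpy as np


def stripe_prob(s_sorted, w_sorted, y):
    """max_a of the mass of sorted projections lying in [a, a + y] (y >= 0)."""
    best, mass, right = 0.0, 0.0, 0
    for left in range(len(s_sorted)):
        while right < len(s_sorted) and s_sorted[right] <= s_sorted[left] + y:
            mass += w_sorted[right]
            right += 1
        best = max(best, mass)
        mass -= w_sorted[left]
    return best


def best_bound(points, probs, normals, alpha_rect, r_max=50):
    """Minimize the right-hand side of (4.66),
    r^(d-1) * alpha_rect + 2 * sum_j P^j(||m_j||_1 / r), over r = 1, ..., r_max."""
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    d = points.shape[1]
    stripes = []                          # sorted projections per normal m_j
    for m in normals:
        m = np.asarray(m, dtype=float)
        s = points @ m
        order = np.argsort(s)
        stripes.append((s[order], probs[order], np.abs(m).sum()))
    best = None
    for r in range(1, r_max + 1):
        value = r ** (d - 1) * alpha_rect + 2 * sum(
            stripe_prob(s, w, norm1 / r) for s, w, norm1 in stripes)
        if best is None or value < best[1]:
            best = (r, value)
    return best


rng = np.random.default_rng(0)
sample = rng.uniform(size=(400, 2))
print(best_bound(sample, np.full(400, 1 / 400), normals=[[0.6, 0.8]], alpha_rect=0.02))
```

For a measure concentrated in a single point every stripe carries the full mass, so the singularity term stays at $2k$ and the minimum is attained at $r = 1$.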
Remark 4.8.5. The running time of Algorithm 4.6 for 10,000 scenarios in $\mathbb{R}^2$ is less than a second. Observe that the running time depends on the dimension $d$ only through the computation of the scalar products during the initialization phase.

Algorithm 4.6. Calculation of the polyhedral singularity $P^j(h^{(r)}(m_j))$.
Initialization:
  Define vectors $\mathbf{D}, \mathbf{P} \in \overline{\mathbb{R}}^N$ and put $D_i := \infty$ and $P_i := \infty$ for $i = 1,\dots,N$.
  Consider a permutation $\pi$ of $\{1,\dots,N\}$ that sorts the scenarios by their projections, i.e., $\langle \xi_i, m_j\rangle \le \langle \xi_{i'}, m_j\rangle$ whenever $\pi(i) \le \pi(i')$, for $i, i' \in \{1,\dots,N\}$.
  Define vectors $d, p \in \overline{\mathbb{R}}^N$ with $d_i := \langle \xi_{\pi^{-1}(i)}, m_j\rangle$ and $p_i := P[\{\xi_{\pi^{-1}(i)}\}]$ for $i = 1,\dots,N$.
Main step:
  For $i = 1,\dots,N$:
    Set $p^* := 0$.
    For $l = i,\dots,N$:
      Update $p^* := p^* + p_l$ and consider $d^* := d_l - d_i$.
      Set $m := \inf\{t \ge 1 : P_t > p^*\}$.
      If $D_m > d^*$:
        Insert $p^*$ into $\mathbf{P}$ and $d^*$ into $\mathbf{D}$ at position $m$ (push back the subsequent entries having finite values).
        Set $k := 1$.
        While $m > k$ and $D_{m-k} > d^*$:
          Update $D_{m-k} := d^*$ and set $k := k + 1$.

[Figure 4.15: curves over $r$ (horizontal axis, ticks 5 to 20) with values between 0 and 1 (vertical axis).]
Figure 4.15: Mappings $r \mapsto P^j(h^{(r)}(m_j))$ for $m_j = (1,-1)$ and different measures $P_c$, each consisting of 10,000 equally weighted points obtained by sampling from a normal distribution in $\mathbb{R}^2$ with covariance matrix $\begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}$ and $c \in \{-0.99, -0.5, 0, 0.5, 0.9, 0.99\}$. The value $P^j(h^{(r)}(m_j))$ increases with the parameter $c$.
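A compact way to obtain the same step function $t \mapsto P^j(t)$ in Python is sketched below. It is not a transcription of Algorithm 4.6: it enumerates all windows of consecutive sorted projections (quadratic memory) and keeps the running maximum of the window mass as the width grows, which yields breakpoints $D_i$ and values $P_i$ with $P^j(t) = P_i$ for $t \in [D_i, D_{i+1})$. The function names are illustrative, the quadratic memory limits the sketch to a few thousand atoms, and the demo loosely imitates the setting of Figure 4.15 with fewer points.

```python
import numpy as np


def singularity_steps(points, probs, m):
    """Breakpoints of the step function t -> P^j(t) for a discrete measure.

    Returns increasing widths D and masses Pv such that P^j(t) = Pv[i] for
    D[i] <= t < D[i + 1] (and Pv[-1] for t >= D[-1]).  Simplified variant:
    all windows of consecutive sorted projections are enumerated, which needs
    O(N^2) memory, and the maximal mass is tracked as the width grows.
    """
    s = np.asarray(points, dtype=float) @ np.asarray(m, dtype=float)
    order = np.argsort(s)
    s, w = s[order], np.asarray(probs, dtype=float)[order]
    cum = np.concatenate(([0.0], np.cumsum(w)))
    lo, hi = np.triu_indices(len(s))         # all windows lo <= hi
    width = s[hi] - s[lo]
    mass = cum[hi + 1] - cum[lo]
    D, Pv, best = [], [], -np.inf
    for k in np.lexsort((-mass, width)):     # by width, larger mass first on ties
        if mass[k] > best + 1e-15:           # tolerance ignores rounding noise
            best = mass[k]
            D.append(width[k])
            Pv.append(best)
    return np.array(D), np.array(Pv)


def evaluate_steps(D, Pv, t):
    """P^j(t) for t >= 0 from the breakpoints."""
    return Pv[np.searchsorted(D, t, side="right") - 1]


rng = np.random.default_rng(0)
for c in (-0.5, 0.0, 0.9):
    sample = rng.multivariate_normal([0.0, 0.0], [[1.0, c], [c, 1.0]], size=1000)
    D, Pv = singularity_steps(sample, np.full(1000, 1e-3), [1.0, -1.0])
    # h^(r)(m_j) = ||m_j||_1 / r = 2 / r for m_j = (1, -1)
    print(c, [round(float(evaluate_steps(D, Pv, 2.0 / r)), 3) for r in (5, 10, 20)])
```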
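Appendix
Proof of Lemma 2.2.8. Consider $f \in \mathcal{F}_{1+m_{t+1}}(\Xi_{t+1})$. Then we obtain
$$\begin{aligned}
&\big|E[f(\xi_{t+1}) \,|\, \xi_{[t]} = \xi_{[t]}] - E[f(\xi_{t+1}) \,|\, \xi_{[t]} = \hat\xi_{[t]}]\big| = \big|E f(g_t(\xi_{[t]}, \varepsilon_{t+1})) - E f(g_t(\hat\xi_{[t]}, \varepsilon_{t+1}))\big| \\
&\quad \le E\Big[\max\{1, \|g_t(\xi_{[t]}, \varepsilon_{t+1})\|, \|g_t(\hat\xi_{[t]}, \varepsilon_{t+1})\|\}^{m_{t+1}} \cdot \|g_t(\xi_{[t]}, \varepsilon_{t+1}) - g_t(\hat\xi_{[t]}, \varepsilon_{t+1})\|\Big] \\
&\quad \le E\Big[\max\{1, \|g_t(\xi_{[t]}, \varepsilon_{t+1})\|, \|g_t(\hat\xi_{[t]}, \varepsilon_{t+1})\|\}^{m_{t+1}}\, h(\|\varepsilon_{t+1}\|)\Big] \cdot \max\{1, \|\xi_{[t]}\|, \|\hat\xi_{[t]}\|\}^{r}\, \|\xi_{[t]} - \hat\xi_{[t]}\| \\
&\quad \le E\Big[\max\{1, \|\xi_{[t]}\|, \|\hat\xi_{[t]}\|\}^{m_{t+1}}\, k(\|\varepsilon_{t+1}\|)^{m_{t+1}}\, h(\|\varepsilon_{t+1}\|)\Big] \cdot \max\{1, \|\xi_{[t]}\|, \|\hat\xi_{[t]}\|\}^{r}\, \|\xi_{[t]} - \hat\xi_{[t]}\| \\
&\quad = E\big[k(\|\varepsilon_{t+1}\|)^{m_{t+1}}\, h(\|\varepsilon_{t+1}\|)\big] \cdot \max\{1, \|\xi_{[t]}\|, \|\hat\xi_{[t]}\|\}^{r+m_{t+1}}\, \|\xi_{[t]} - \hat\xi_{[t]}\|.
\end{aligned}$$
Due to the identity $r + m_{t+1} = m_t - 1$, this entails condition (i) of Assumption 2.2.6. The asserted form of $K$ follows from $m_1 \ge m_t$ for $t = 1,\dots,T$. Furthermore, we apply (2.13) recursively to obtain the following estimate:
$$\|\xi_T\| \le \max\{1, \|\xi_{[t]}\|\} \prod_{i=t+1}^{T} k(\|\varepsilon_i\|).$$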
Raising both sides to the power of $1 + (T - t)$ and taking conditional expectations $E[\,\cdot\,|\,\xi_{[t]} = \xi_{[t]}]$ verifies condition (ii) of Assumption 2.2.6.
Proposition A.1. The strategy $x^*$ defined by (3.92) is an optimal solution to problem (3.89).
Proof. We introduce the value $h_t := U - \sum_{s=1}^{t-1} x_s$ that indicates how much of the swing option's initial capacity $U$ is still available at time $t$. The time-$t$ recourse function $Q_t(\cdot,\cdot)$ of problem (3.91) can be written in terms of $h_t$ by setting $Q_{T+1} \equiv 0$ and
$$Q_t(h_t, \xi_t) := \min_{x_t \in [0,1],\, x_t \le h_t} x_t \psi(\xi_t) + E\big[Q_{t+1}(h_t - x_t, \xi_{t+1}) \,|\, \xi_t = \xi_t\big]. \tag{A.2}$$
Let us denote the objective function of problem (A.2) by $\Phi_t(h_t, \xi_t, x_t) := x_t \psi(\xi_t) + E[Q_{t+1}(h_t - x_t, \xi_{t+1}) \,|\, \xi_t = \xi_t]$. The strategy $x^*$ with $h_t^* := U - \sum_{s=1}^{t-1} x_s^*$ is optimal for problem (3.91) if and only if the equality
$$Q_t(h_t^*, \xi_t) = \Phi_t(h_t^*, \xi_t, x_t^*(\xi_t)) \tag{A.3}$$
is fulfilled for $P$-a.e. $\xi_t \in \Xi_t$ for $t = 1,\dots,T$. By definition, the strategy $x^*$ given by (3.92) fulfills $x_t^* = 0$ and $h_t^* = U$ for $t = 1,\dots,T-U$. Together with the representation (A.6) of $Q_t$, this shows that identity (A.3) holds for $t = 1,\dots,T-U$. Let us now consider $t \in \{T-U+1,\dots,T\}$ and note that $h_t^* \ge T+1-t$ and $h_t^* \in \mathbb{N}$. Thus, relation (A.6) turns into
$$Q_t(h_t^*, \xi_t) = \sum_{s=t}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t]. \tag{A.4}$$
On the other hand, we have $x_t^*(\xi_t) = 1$ if $\psi(\xi_t) \neq 0$. This yields $x_t^*(\xi_t)\psi(\xi_t) = \psi(\xi_t) = E[\psi(\xi_t) \,|\, \xi_t = \xi_t]$, and, in particular,
$$\Phi_t(h_t^*, \xi_t, x_t^*(\xi_t)) = E[\psi(\xi_t) \,|\, \xi_t = \xi_t] + E[Q_{t+1}(h_t^* - x_t^*(\xi_t), \xi_{t+1}) \,|\, \xi_t = \xi_t]. \tag{A.5}$$
Applying now $h_t^* - x_t^* \ge T + 1 - (t+1)$ and $h_t^* - x_t^* \in \mathbb{N}$ together with (A.6) shows that the right-hand side of (A.5) is equal to the right-hand side of (A.4). Thus, (A.3) holds also for $t = T-U+1,\dots,T$ and, consequently, $x^*$ given by (3.92) is optimal for problem (3.91). Because $x_t^* > 0$ only if $\xi_t > K$, the strategy $x^*$ is also optimal for problem (3.89).
Proposition A.2. The cost-to-go function $Q_t(h_t, \xi_t)$ of problem (3.91), defined by (A.2), has the form
$$Q_t(h_t, \xi_t) = (h_t - \lfloor h_t\rfloor)\, E[\psi(\xi_{T-\lfloor h_t\rfloor}) \,|\, \xi_t = \xi_t]\, \mathbb{1}_{\{T - \lfloor h_t\rfloor \ge t\}} + \sum_{s=\max\{t,\, T+1-\lfloor h_t\rfloor\}}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t]. \tag{A.6}$$
In particular, for $t = 1,\dots,T$ the mapping $h_t \mapsto Q_t(h_t, \xi_t)$ is convex and nonincreasing. Furthermore, it is affine in $h_t$ on every interval $[z, z+1]$ for $z \in \mathbb{N}_+$.
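In the degenerate case of a deterministic process the conditional expectations in (A.6) reduce to numbers, and the formula can be compared with a brute-force backward recursion of (A.2). The Python sketch below assumes hypothetical values $\psi_s$ that are nonpositive and nonincreasing in $s$ (the deterministic counterpart of (A.7)) and restricts capacities and decisions to a grid of step $1/4$; the grid loses nothing here because it contains the candidate decisions $\{0, h_t - \lfloor h_t\rfloor, 1\}$ identified in the proof. All names and numbers are illustrative.

```python
import math
from functools import lru_cache

# Hypothetical deterministic data: psi[s] plays the role of E[psi(xi_s) | xi_t = xi_t];
# the values are nonpositive and nonincreasing in s (deterministic analogue of (A.7)).
T = 6
psi = {1: -0.5, 2: -0.6, 3: -0.6, 4: -1.0, 5: -1.4, 6: -2.0}
STEP = 0.25                              # capacities and decisions on a grid of 1/4


def formula(t, h):
    """Right-hand side of (A.6) in the deterministic case."""
    fl = math.floor(h)
    value = (h - fl) * psi[T - fl] if T - fl >= t else 0.0
    return value + sum(psi[s] for s in range(max(t, T + 1 - fl), T + 1))


@lru_cache(maxsize=None)
def Q(t, q):
    """Backward recursion (A.2) with capacity h = q * STEP and x = a * STEP."""
    if t == T + 1:
        return 0.0
    return min(a * STEP * psi[t] + Q(t + 1, q - a)
               for a in range(0, min(int(1 / STEP), q) + 1))   # x <= min(1, h)


err = max(abs(Q(t, q) - formula(t, q * STEP))
          for t in range(1, T + 1) for q in range(0, int((T + 1) / STEP) + 1))
print(err)
```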
Proof. Since $\psi$ is a concave mapping, the conditional version of the Jensen inequality yields, for $t = 1,\dots,T$ and $s \ge t$,
$$\psi\big(E[\xi_s \,|\, \xi_t = \xi_t]\big) \ge E[\psi(\xi_s) \,|\, \xi_t = \xi_t].$$
Since the drift coefficient $\mu$ in (3.90) is assumed to be nonnegative, the process $\xi$ is a (Markov) submartingale. Hence, we have $\xi_t \le E[\xi_s \,|\, \xi_t = \xi_t]$, and, since $\psi$ is nonincreasing, $\psi(\xi_t) \ge \psi(E[\xi_s \,|\, \xi_t = \xi_t])$. Piecing all this together, we obtain the relation
$$\psi(\xi_t) \ge E[\psi(\xi_s) \,|\, \xi_t = \xi_t]. \tag{A.7}$$
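Let us now focus on the identity (A.6), which holds true for $Q_{T+1} \equiv 0$. We assume that it is also true for $Q_{t+1}$, for some $t + 1 \in \{2,\dots,T+1\}$. To prove the assertion for $Q_t$, we consider some value $h_t \ge 0$ and observe that the value $\Phi_t(h_t, \xi_t, x_t)$ depends continuously on $x_t$. In the following, we write $\Phi(x_t) := \Phi_t(h_t, \xi_t, x_t)$ for notational simplicity. The piecewise linearity of $u \mapsto Q_{t+1}(u, \xi_{t+1})$ implies that $x_t \mapsto \Phi(x_t)$ is affine on each of the intervals $[0, h_t - \lfloor h_t\rfloor]$ and $[h_t - \lfloor h_t\rfloor, 1]$. Thus, the minimum in (A.2) is attained for some $x_t \in \{0, h_t - \lfloor h_t\rfloor, 1\}$. Hence, we just have to compare the values $\Phi(0)$, $\Phi(h_t - \lfloor h_t\rfloor)$, and $\Phi(1)$. Using the relation (A.6) for $Q_{t+1}$, we obtain
$$\begin{aligned}
\Phi(0) &= (h_t - \lfloor h_t\rfloor)\, E[\psi(\xi_{T-\lfloor h_t\rfloor}) \,|\, \xi_t = \xi_t]\, \mathbb{1}_{\{T-\lfloor h_t\rfloor \ge t+1\}} + \sum_{s=\max\{t+1,\, T+1-\lfloor h_t\rfloor\}}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(h_t - \lfloor h_t\rfloor) &= (h_t - \lfloor h_t\rfloor)\, \psi(\xi_t) + E[Q_{t+1}(\lfloor h_t\rfloor, \xi_{t+1}) \,|\, \xi_t = \xi_t] \\
&= (h_t - \lfloor h_t\rfloor)\, \psi(\xi_t) + \sum_{s=\max\{t+1,\, T+1-\lfloor h_t\rfloor\}}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(1) &= \psi(\xi_t) + E[Q_{t+1}(h_t - 1, \xi_{t+1}) \,|\, \xi_t = \xi_t] \\
&= \psi(\xi_t) + (h_t - \lfloor h_t\rfloor)\, E[\psi(\xi_{T+1-\lfloor h_t\rfloor}) \,|\, \xi_t = \xi_t]\, \mathbb{1}_{\{T-\lfloor h_t\rfloor \ge t\}} + \sum_{s=\max\{t+1,\, T+2-\lfloor h_t\rfloor\}}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t].
\end{aligned}$$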
In order to determine the smallest of these three terms and to verify the asserted form of $Q_t$, we distinguish the following three cases.
Case 1: $h_t \ge T + 1 - t$, i.e., there is enough capacity of the swing option left to fully exercise in $t,\dots,T$. Evaluating $\Phi$ at the relevant points yields
$$\begin{aligned}
\Phi(0) &= \sum_{s=t+1}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(h_t - \lfloor h_t\rfloor) &= (h_t - \lfloor h_t\rfloor)\, \psi(\xi_t) + \sum_{s=t+1}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(1) &= \psi(\xi_t) + \sum_{s=t+1}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t].
\end{aligned}$$
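We obtain $\Phi(0) \ge \Phi(h_t - \lfloor h_t\rfloor) \ge \Phi(1)$. The minimum in (A.2) is thus realized by $x_t = 1$, and we have $Q_t(h_t, \xi_t) = \Phi(1)$. The identity (A.6) follows from the estimate $h_t \ge T + 1 - t$.
Case 2: $\lfloor h_t\rfloor = T - t$, i.e., there is enough capacity of the swing option left to fully exercise in $t+1,\dots,T$ (and to purchase an amount of $h_t - \lfloor h_t\rfloor$ at time $t$). We obtain
$$\begin{aligned}
\Phi(0) &= \sum_{s=t+1}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(h_t - \lfloor h_t\rfloor) &= (h_t - \lfloor h_t\rfloor)\, \psi(\xi_t) + \sum_{s=t+1}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(1) &= \psi(\xi_t) + (h_t - \lfloor h_t\rfloor)\, E[\psi(\xi_{t+1}) \,|\, \xi_t = \xi_t] + \sum_{s=t+2}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t].
\end{aligned}$$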
Obviously, we have $\Phi(0) \ge \Phi(h_t - \lfloor h_t\rfloor)$, and the relation (A.7) implies $\Phi(1) \ge \Phi(h_t - \lfloor h_t\rfloor)$. Consequently, we have $Q_t(h_t, \xi_t) = \Phi(h_t - \lfloor h_t\rfloor)$, and we may use the relation $\lfloor h_t\rfloor = T - t$ to verify (A.6).
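Case 3: $\lfloor h_t\rfloor < T - t$, i.e., there is not enough capacity of the swing option left to fully exercise in $t+1,\dots,T$. We observe
$$\begin{aligned}
\Phi(0) &= (h_t - \lfloor h_t\rfloor)\, E[\psi(\xi_{T-\lfloor h_t\rfloor}) \,|\, \xi_t = \xi_t] + \sum_{s=T+1-\lfloor h_t\rfloor}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(h_t - \lfloor h_t\rfloor) &= (h_t - \lfloor h_t\rfloor)\, \psi(\xi_t) + \sum_{s=T+1-\lfloor h_t\rfloor}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t], \\
\Phi(1) &= \psi(\xi_t) + (h_t - \lfloor h_t\rfloor)\, E[\psi(\xi_{T+1-\lfloor h_t\rfloor}) \,|\, \xi_t = \xi_t] + \sum_{s=T+2-\lfloor h_t\rfloor}^{T} E[\psi(\xi_s) \,|\, \xi_t = \xi_t].
\end{aligned}$$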
Applying the inequality (A.7) again, we conclude $\Phi(1) \ge \Phi(h_t - \lfloor h_t\rfloor) \ge \Phi(0)$ and, thus, $Q_t(h_t, \xi_t) = \Phi(0)$. Applying $\lfloor h_t\rfloor < T - t$ establishes the asserted identity (A.6).
Remark A.3. The mapping $\pi : \{\xi^i : i \in I\} \to \{\bar\xi^j : j \in J\}$ used in Section 3.5.2 may be constructed as follows. Given a scenario tree consisting of the scenarios $\bar\xi^j$, $j \in J$, a node $n_t$ at time $t$ is a subset of $J$ such that the scenarios $\bar\xi^j$, $j \in n_t$, are indistinguishable until time $t$. The set of nodes at time $t$ is denoted by $N_t$. The set of nodes at time $t+1$ succeeding the node $n_t$ is denoted by $\mathrm{succ}(n_t) \subset N_{t+1}$, and we have $\bigcup_{n_{t+1} \in \mathrm{succ}(n_t)} n_{t+1} = n_t$. The realization of the (tree) process $\bar\xi$ on a node $n_t$ is denoted by $\bar\xi_t^{n_t}$. Recall that there is only one node at time $t = 1$, i.e., $N_1 = \{n_1\} = \{J\}$, and there are $|J|$ nodes at the time horizon $T$, each of them containing a different single scenario index. Now, the mapping $\pi$ is constructed recursively as follows. Given some out-of-sample scenario $\xi^i$, we set $\pi_1(\xi^i_{[1]}) := \bar\xi_1^{n_1}$. Assume that for some $t \in \{2,\dots,T\}$ we have $\pi_{t-1}(\xi^i_{[t-1]}) = \bar\xi_{t-1}^{n_{t-1}}$ for some node $n_{t-1}$. Then we choose
$$\pi_t(\xi^i_{[t]}) \in \operatorname*{argmin}_{\bar\xi_t^{n_t} :\, n_t \in \mathrm{succ}(n_{t-1})} \|\xi_t^i - \bar\xi_t^{n_t}\|.$$
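The recursive construction of $\pi$ can be sketched in a few lines of Python. The sketch assumes a tree stored as nodes carrying a realization $\bar\xi_t^{n_t}$ and a list of successor nodes, uses the Euclidean norm in the argmin, and breaks ties by taking the first minimizer; the class, the function names and the small example tree are illustrative.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Node:
    """Hypothetical tree node: realization bar-xi_t^{n_t} and successor nodes."""
    value: np.ndarray
    children: list = field(default_factory=list)


def map_to_tree(root: Node, scenario) -> list:
    """Greedy construction of pi from Remark A.3: start at the root (t = 1)
    and, for t = 2, ..., T, move to the successor node whose realization is
    closest in the Euclidean norm to xi_t^i (ties: first minimizer)."""
    node, path = root, [root.value]
    for xi_t in scenario[1:]:
        xi_t = np.asarray(xi_t, dtype=float)
        node = min(node.children, key=lambda c: np.linalg.norm(xi_t - c.value))
        path.append(node.value)
    return path


def leaf(v):
    return Node(np.array([v]))


# Small binary tree with T = 3 and one-dimensional realizations.
root = Node(np.array([1.0]), [
    Node(np.array([0.5]), [leaf(0.2), leaf(0.8)]),
    Node(np.array([1.5]), [leaf(1.2), leaf(1.8)]),
])
path = map_to_tree(root, [[1.0], [1.4], [1.9]])
print([float(v[0]) for v in path])
```

Because the choice at time $t$ is restricted to the successors of the node selected at $t-1$, the resulting path is always a scenario of the tree, even if a globally closer tree scenario exists.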
Bibliography
D. Avis and K. Fukuda. A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discrete and Computational Geometry, 8:295–313, 1992.
V. Bally, G. Pagès, and J. Printems. A quantization tree method for pricing and hedging multidimensional American options. Mathematical Finance, 15(1):119–168, 2005.
C. Barber, D. Dobkin, and H. Huhdanpaa. The quickhull algorithm for convex hulls. ACM Trans. on Mathematical Software, 22:469–483, 1997.
K. Barty. Contributions à la discrétisation des contraintes de mesurabilité pour les problèmes d'optimisation stochastique. PhD thesis, École Nationale des Ponts et Chaussées, Paris, 2004.
J.F. Benders. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik, 4:238–252, 1962.
P. Billingsley and F. Topsøe. Uniformity in weak convergence. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 7:1–16, 1967.
J.R. Birge. Decomposition and partitioning methods for multistage stochastic programming. Operations Research, 33(5):989–1007, 1985.
J.R. Birge and F. Louveaux. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research, 34:384–392, 1988.
J.R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer Series in Operations Research. Springer-Verlag, 1997.
C.C. Carøe and J. Tind. L-shaped decomposition of two-stage stochastic programs with integer recourse. Mathematical Programming, 83(3):451–464, 1998.
M.S. Casey and S. Sen. The scenario generation algorithm for multistage stochastic linear programming. Mathematics of Operations Research, 30:615–631, 2005.
A. Chiralaksanakul and D. Morton. Assessing policy quality in multi-stage stochastic programming. Stochastic Programming E-Print Series, 12, 2004.
J. Cox, S. Ross, and M. Rubinstein. Option pricing: a simplified approach. Journal of Financial Economics, 7:229–263, 1979.
G. Dantzig. Linear programming under uncertainty. Management Science, 1(3/4):197–206, 1955.
G. Dantzig and P. Wolfe. Decomposition principle for linear programs. Operations Research, 8:101–111, 1960.
M.A.H. Dempster. Sequential importance sampling algorithms for dynamic stochastic programs. Zapiski Nauchnykh Seminarov POMI, 312:94–129, 2004.
D. Dentcheva and W. Römisch. Optimal power generation under uncertainty via stochastic programming. In Kall and Marti (1998), pages 22–56, 1998.
E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.
R. Dobrushin. Central limit theorem for non-stationary Markov chains I. Teor. Veroyatnost. i Primenen., 1(1):72–89, 1956.
J. Dupačová, G. Consigli, and S.W. Wallace. Scenarios for multistage stochastic programming. Annals of Operations Research, 100:25–53, 2000.
J. Dupačová, N. Gröwe-Kuska, and W. Römisch. Scenario reduction in stochastic programming: An approach using probability metrics. Mathematical Programming, 95(A):493–511, 2003.
E.B. Dynkin. Markov Processes. Springer, Berlin, 1965.
A. Eichhorn. Stochastic Programming Recourse Models: Approximation, Risk Aversion, Applications in Energy. PhD thesis, Humboldt-Universität zu Berlin, 2007. Logos Verlag (Berlin, Germany).
A. Eichhorn, H. Heitsch, and W. Römisch. Scenario tree approximation and risk aversion strategies for stochastic optimization of electricity production and trading. In J. Kallrath and P. Pardalos, editors, Optimization in the Energy Industry. Springer, 2008.
A. Epe, C. Küchler, W. Römisch, S. Vigerske, H.-J. Wagner, C. Weber, and O. Woll. Stochastische Optimierung mit rekombinierenden Szenariobäumen – Analyse dezentraler Energieversorgung mit Windenergie und Speichern. In Optimierung in der Energiewirtschaft, number 2018 in VDI-Berichte, pages 3–13. VDI-Verlag, Düsseldorf, 2007.
A. Epe, C. Küchler, W. Römisch, S. Vigerske, H.-J. Wagner, C. Weber, and O. Woll. Optimization of dispersed energy supply – stochastic programming with recombining scenario trees. In J. Kallrath and P. Pardalos, editors, Optimization in the Energy Industry. Springer, 2008.
I.V. Evstigneev. Measurable selection and dynamic programming. Mathematics of Operations Research, 1(3):267–272, 1976.
B.T. Ewing, J.B. Kruse, and J.L. Schroeder. Time series analysis of wind speed with time-varying turbulence. Technical report, Center for National Hazards Research, Thomas Harriot College of Arts and Sciences, East Carolina University, 2004. Available at http://www.ecu.edu/hazards/reports.htm.
K. Fukuda. cdd and cddplus homepage. http://www.ifor.math.ethz.ch/~fukuda/cdd_home/cdd.html, 2005.
J. Galambos and I. Simonelli. Bonferroni-type Inequalities with Applications. Springer-Verlag, 1996.
M.R. Garey and D.S. Johnson. Computers and Intractability – A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979.
H.I. Gassmann. MSLiP: a computer code for the multistage stochastic linear programming problem. Mathematical Programming, 47:407–423, 1990.
S. Graf and H. Luschgy. Foundations of Quantization for Probability Distributions, volume 1730 of Lecture Notes in Mathematics. Springer, New York, 2000.
M. Grötschel, S.O. Krumke, and J. Rambau, editors. Online Optimization of Large Scale Systems. Springer, Berlin, 2001.
N. Gröwe-Kuska, H. Heitsch, and W. Römisch. Scenario reduction and scenario tree construction for power management problems. In A. Borghetti, C.A. Nucci, and M. Paolone, editors, IEEE Bologna Power Tech Proceedings, 2003.
J. Guddat, H.Th. Jongen, B. Kummer, and F. Nožička, editors. Parametric Optimization and Related Topics. Akademie-Verlag, Berlin, 1987.
D. Haugland and S.W. Wallace. Solving many linear programs that differ only in the righthand side. European Journal of Operational Research, 37(3):318–324, 1988.
T. Heinze and R. Schultz. A branch-and-bound method for multistage stochastic integer programs with risk objectives. Optimization, 57(2):277–293, 2008.
H. Heitsch. Stabilität und Approximation stochastischer Optimierungsprobleme. PhD thesis, Humboldt-Universität zu Berlin, 2007.
H. Heitsch and W. Römisch. Scenario reduction algorithms in stochastic programming. Computational Optimization and Applications, 24:187–206, 2003.
H. Heitsch and W. Römisch. A note on scenario reduction for two-stage stochastic programs. Oper. Res. Lett., 35:731–738, 2007.
H. Heitsch and W. Römisch. Scenario tree modeling for multistage stochastic programs. Mathematical Programming, to appear, 2008.
H. Heitsch, W. Römisch, and C. Strugarek. Stability of multistage stochastic programs. SIAM Journal on Optimization, 17:511–525, 2006.
R. Henrion, C. Küchler, and W. Römisch. Discrepancy distances and scenario reduction in two-stage stochastic integer programming. Journal of Industrial and Management Optimization, 4(2), 2008.
R. Henrion, C. Küchler, and W. Römisch. Scenario reduction in stochastic programming with respect to discrepancy distances. Computational Optimization and Applications, 43(1):67–93, 2009.
R. Henrion and A. Möller. Optimization of a continuous distillation process under random inflow rate. Computers & Mathematics with Applications, 45:247–262, 2003.
R. Henrion and W. Römisch. Metric regularity and quantitative stability in stochastic programs with probabilistic constraints. Mathematical Programming, 84:55–88, 1999.
R. Henrion and W. Römisch. Hölder and Lipschitz stability of solution sets in programs with probabilistic constraints. Mathematical Programming, Ser. A 100:589–611, 2004.
P. Hilli and T. Pennanen. Numerical study of discretizations of multistage stochastic programs. Working paper, http://math.tkk.fi/~teemu/publications.html, 2006.
E. Hlawka. Zur Definition der Diskrepanz. Acta Arithmetica, 18:233–241, 1971.
E. Hlawka and R. Mück. Über eine Transformation von gleichverteilten Folgen II. Computing, 9:127–138, 1972.
E. Hlawka and H. Niederreiter. Diskrepanz in kompakten abelschen Gruppen I. Manuscripta Math., 1:259–288, 1969.
R. Hochreiter and G.Ch. Pflug. Financial scenario generation for stochastic multi-stage decision processes as facility location problems. Annals of Operations Research, 152(1):257–272, 2007.
K. Høyland, M. Kaut, and S.W. Wallace. A heuristic for moment-matching scenario generation. Computational Optimization and Applications, 24:169–185, 2003.
K. Høyland and S.W. Wallace. Generating scenario trees for multistage decision problems. Management Science, 47(2):295–307, 2001.
ILOG, Inc. CPLEX 10.0. http://www.ilog.com/products/cplex.
G. Infanger. Planning under Uncertainty. The Scientific Press Series. Boyd & Fraser, New York, 1994.
G. Infanger and D.P. Morton. Cut sharing for multistage stochastic linear programs with interstage dependency. Mathematical Programming, 75:241–256, 1996.
P. Kall and K. Marti, editors. Stochastic programming methods and technical applications, volume 458 of Lecture Notes in Economics and Mathematical Systems. Springer, Berlin, 1998.
P. Kall and S.W. Wallace. Stochastic Programming. Wiley, Chichester, 1994.
M. Kaut and S.W. Wallace. Evaluation of scenario-generation methods for stochastic programming. Pacific Journal of Optimization, 3(2):257–271, 2007.
C. Küchler. On stability of multistage stochastic programs. SIAM Journal on Optimization, 19:952–968, 2008.
C. Küchler and S. Vigerske. Decomposition of multistage stochastic programs with recombining scenario trees. Stochastic Programming E-Print Series, 9, 2007. http://www.speps.org, submitted to Mathematical Programming.
C. Küchler and S. Vigerske. Numerical evaluation of approximation methods in stochastic programming. Submitted to Proceedings of the 11th Conference on Stochastic Programming (SPXI), Vienna, 2008.
D. Kuhn. Generalized bounds for convex multistage stochastic programs, volume 548 of Lecture Notes in Economics and Mathematical Systems. Springer, Berlin, 2005.
H.W. Kuhn and A.W. Tucker, editors. Contributions to the Theory of Games, volume 2. Princeton University Press, 1953.
L. Kuipers and H. Niederreiter. Uniform Distribution of Sequences. Wiley, New York, 1974.
A.H. Land and A.G. Doig. An automatic method of solving discrete programming problems. Econometrica, 28(3):497–520, 1960.
F.V. Louveaux and R. Schultz. Stochastic Integer Programming. Chapter 4 in Ruszczyński and Shapiro (2003b), pages 213–266, 2003.
H. Markowitz. Portfolio selection. Journal of Finance, 7:77–91, 1952.
R. Mirkov and G.Ch. Pflug. Tree approximations of dynamic stochastic programs. SIAM Journal on Optimization, 18(3):1082–1105, 2007.
D.P. Morton. An enhanced decomposition algorithm for multistage stochastic hydroelectric scheduling. Annals of Operations Research, 64:211–235, 1996.
T.S. Motzkin, H. Raiffa, G.L. Thompson, and R.M. Thrall. The double description method. In Kuhn and Tucker (1953), 1953.
P. Mück and W. Philipp. Distances of probability measures and uniform distribution mod 1. Math. Z., 142:195–202, 1975.
H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1992.
H. Niederreiter and J.M. Wills. Diskrepanz und Distanz von Maßen bezüglich konvexer und Jordanscher Mengen. Math. Z., 144:125–134, 1975.
P. Nørgård, G. Giebel, H. Holttinen, and A. Petterteig. Fluctuations and predictability of wind and hydropower. Technical report, WILMAR, Risø National Laboratory, 2004. http://www.wilmar.risoe.dk/Results.htm.
M.P. Nowak and W. Römisch. Stochastic Lagrangian relaxation applied to power scheduling in a hydro-thermal system under uncertainty. Annals of Operations Research, 100:251–272, 2000.
L. Ntaimo, A. Schaefer, and S. Trukhanov. On adaptive multicut aggregation for two-stage stochastic linear programs with recourse. Optimization Online, November 2007. http://www.optimization-online.org/.
R. Nürnberg and W. Römisch. A two-stage planning model for power scheduling in a hydro-thermal system under uncertainty. Optimization and Engineering, 3:355–378, 2002.
T. Pennanen. Epi-convergent discretizations of multistage stochastic programs. Mathematics of Operations Research, 30(1):245–256, 2005.
T. Pennanen. Epi-convergent discretizations of multistage stochastic programs via integration quadratures. Mathematical Programming, 116:461–479, 2009.
M.V.F. Pereira and L.M.V.G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52:359–375, 1991.
G.Ch. Pflug. Scenario tree generation for multiperiod financial optimization by optimal discretization. Mathematical Programming, 89:251–271, 2001.
G.Ch. Pflug. Stochastic Optimization and Statistical Inference. Chapter 7 in Ruszczyński and Shapiro (2003b), pages 427–482, 2003.
G.Ch. Pflug and W. Römisch. Modeling, Measuring and Managing Risk. World Scientific, Singapore, 2007.
A. Pratelli. Existence of optimal transport maps and regularity of the transport density in mass transportation problems. PhD thesis, Scuola Normale Superiore, Pisa, 2003.
A. Prékopa. Stochastic Programming. Kluwer, Dordrecht, 1995.
A. Prékopa. Probabilistic Programming. Chapter 5 in Ruszczyński and Shapiro (2003b), pages 267–351, 2003.
A. Prékopa and T. Szántai. On optimal regulation of a storage level with application to the water level regulation of a lake. European Journal of Operational Research, 3:175–189, 1979.
S.T. Rachev. Probability Metrics and the Stability of Stochastic Models. Wiley, Chichester, 1991.
S.T. Rachev and W. Römisch. Quantitative stability in stochastic programming: The method of probability metrics. Mathematics of Operations Research, 27:792–818, 2002.
S.T. Rachev and L. Rüschendorf. Mass Transportation Problems, volume I. Springer, Berlin, 1998.
R.T. Rockafellar and R. J-B Wets. Continuous versus measurable recourse in n-stage stochastic programming. Journal of Mathematical Analysis and Applications, 48(3):836–859, 1974.
R.T. Rockafellar and R. J-B Wets. Variational Analysis. Springer-Verlag, Berlin, 1998.
W. Römisch. Stability of Stochastic Programming Problems. Chapter 8 in Ruszczyński and Shapiro (2003b), pages 483–554, 2003.
W. Römisch and R. Schultz. Stability analysis for stochastic programs. Ann. Oper. Res., 30:241–266, 1991.
W. Römisch and R. Schultz. Multistage stochastic integer programs: an introduction. In Grötschel et al. (2001), pages 581–600, 2001.
W. Römisch and S. Vigerske. Quantitative stability of fully random mixed-integer two-stage stochastic programs. Optimization Letters, 2:377–388, 2008.
W. Römisch and A. Wakolbinger. Obtaining convergence rates for approximations in stochastic programming. In Guddat et al. (1987), pages 327–343, 1987.
H.L. Royden. Real Analysis. Macmillan, New York, 1963.
A. Ruszczyński. Decomposition Methods, chapter 3, pages 141–221. Volume 10 of Handbooks in Operations Research and Management Science, Ruszczyński and Shapiro (2003b), 2003.
A. Ruszczyński and A. Shapiro. Optimality and Duality in Stochastic Programming, chapter 2, pages 65–140. Volume 10 of Handbooks in Operations Research and Management Science, Ruszczyński and Shapiro (2003b), 2003a.
A. Ruszczyński and A. Shapiro, editors. Stochastic Programming, volume 10 of Handbooks in Operations Research and Management Science. Elsevier, Amsterdam, 2003b.
R. Schultz. Rates of convergence in stochastic programs with complete integer recourse. SIAM Journal on Optimization, 6:1138–1152, 1996.
R. Schultz. Stochastic programming with integer variables. Mathematical Programming, 97(1-2):285–309, 2003.
S. Sen and J.L. Higle. The C³ theorem and a D² algorithm for large scale stochastic mixed-integer programming: Set convexification. Mathematical Programming, 104(1):1–20, 2005.
S. Sen and H.D. Sherali. Decomposition with branch-and-cut approaches for two-stage stochastic mixed-integer programming. Mathematical Programming, 106(A):203–223, 2006.
S.P. Sethi and G. Sorger. A theory of rolling horizon decision making. Annals of Operations Research, 29:387–416, 1991.
A. Shapiro. Inference of statistical bounds for multistage stochastic programming problems. Mathematical Methods of Operations Research, 58(1):57–68, 2003a.
A. Shapiro. Monte Carlo Sampling Methods, chapter 6, pages 353–425. Volume 10 of Handbooks in Operations Research and Management Science, Ruszczyński and Shapiro (2003b), 2003b.
L. Stougie. Design and analysis of algorithms for stochastic integer programming. PhD thesis, Center for Mathematics and Computer Science, Amsterdam, 1985.
R.M. Van Slyke and R. Wets. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics, 17(4):638–663, 1969.
S.W. Wallace and W.T. Ziemba, editors. Applications of Stochastic Programming. MPS/SIAM Series on Optimization. SIAM, Philadelphia, 2005.
R. Wets. Solving stochastic programs with simple recourse. Stochastics, 10:219–242, 1983.
R.J. Wittrock. Advances in a nested decomposition algorithm for solving staircase linear programs. Technical Report SOL 83-2, Systems Optimization Laboratory, Department of Operations Research, Stanford University, 1983.
V.M. Zolotarev. Probability metrics. Theory of Probability and its Applications, 28:278–302, 1983.