Dmitrii Lozovanu · Stefan Pickl

Optimization and Multiobjective Control of Time-Discrete Systems
Dynamic Networks and Multilayered Structures
Prof. Dr. Dmitrii Lozovanu
Institute of Mathematics and Computer Science
Academy of Sciences of Moldova
Academystr. 5, Chisinau 2028, Moldova
[email protected]

Prof. Dr. Stefan Pickl
Institute for Theoretical Computer Science, Mathematics and Operations Research
Universität der Bundeswehr München
85577 Neubiberg-München, Germany
[email protected]

ISBN: 978-3-540-85024-3
e-ISBN: 978-3-540-85025-0
DOI 10.1007/978-3-540-85025-0
Library of Congress Control Number: 2008942716
© Springer-Verlag Berlin Heidelberg 2009
Foreword
Richard Bellman developed a theory of dynamic programming which, for many reasons, is still at the center of great interest. The authors present a new approach in the field of the optimization and multi-objective control of time-discrete systems which is closely related to the work of Richard Bellman. They develop their own concept, and their extension to the optimization and multi-objective control of time-discrete systems as well as to dynamic networks and multilayered structures is very stimulating for further research.

Different perspectives of discrete control and optimal dynamic flow problems on networks are treated and characterized. Together with the algorithmic solutions, a framework of multi-objective control problems is derived. The conclusion with a real-world example underlines the necessity and importance of their theoretic framework. As they come back to the classical Bellman concept of dynamic programming, they stress and honor his basic concept without debasing their own work. Multilayered decision processes as part of the design and analysis of complex systems and networks will be essential in many ways and fields in the future.
George Leitmann
Berkeley, June 2008
Preface
A relevant topic in modern control theory concerns multi-objective control problems and suitable extensions of methods for solving discrete problems that generalize classical ones. In this book an attempt is made to develop a mathematical framework for studying such classes of problems and to elaborate algorithms for solving them. The main focus is on multi-objective discrete control models with integral-time costs along a trajectory, where the starting and the final states of the dynamical system are fixed. Such models are formulated and studied by using game-theoretical concepts. The dynamics of the system is assumed to be controlled by several actors (players), each of whom aims to optimize his own integral-time cost along the trajectory determined by the vectors of control parameters chosen by all players together. Pareto, Nash and Stackelberg optimization principles are applied to the considered models, and new classes of dynamic cooperative, non-cooperative and hierarchical games, respectively, are defined.

The basic results concern the determination of optimal stationary and non-stationary strategies of the players in the multi-objective control problems. Necessary and sufficient conditions for the existence of optimal strategies of the players are given, and algorithms based on dynamic programming for finding such strategies are proposed.

Time-discrete systems with finite sets of states are studied. The dynamics of such systems is described by a directed graph in which each vertex corresponds to a dynamic state and the edges correspond to transitions of the system from one state to another. This fact allows us to formulate the considered control models on dynamic networks and to derive algorithms by using the so-called time-expanded network method. This method is developed for multi-objective control problems and dynamic optimal flow problems.

The book consists of five chapters.
In Chapter 1 we introduce multi-objective control problems with p players. The game-theoretical concept for classical discrete optimal control problems is studied and new classes of dynamic games are formulated. We introduce the multi-objective control problem for the non-cooperative as well as for the cooperative case. Stationary and non-stationary control parameters for these time-discrete systems are determined. Theorems on the existence of Nash equilibria, Pareto optima and Stackelberg strategies in the considered dynamic games are proved. These are based on the concept of the so-called alternate players' control condition and the concept of dynamic games in positional form, both introduced by Dmitrii Lozovanu as main theoretical concepts. The computational complexity is treated and the time-expanded network is characterized. At the end, hierarchical control problems are solved.

Chapter 2 is devoted to max-min discrete control problems and to the solution of zero-sum dynamic games on networks. Necessary and sufficient conditions for the existence of saddle points in such games are given and algorithms for determining the optimal strategies of the players are derived. The chapter begins with discrete control problems and finite antagonistic games. Max-min control problems with infinite time horizon and zero-sum games on networks are introduced. In the main part, results for an arbitrary network are derived. The most important results of this chapter concern finding the optimal stationary strategies in cyclic games. Algorithms based on dynamic programming and the dichotomy method for finding optimal strategies are proposed.

In Chapter 3 we extend and generalize the models developed in the first part of the book. We introduce discrete control problems with varying time of states' transitions. An algorithm for solving a single-objective control problem is presented. Discrete control problems with cost functions of the system's passages that depend on the transition time of states' transitions are introduced. Furthermore, the control problem with transition-time functions on the edges is solved. In the main part of this chapter, multi-objective control problems of time-discrete systems with varying time of states' transitions are considered. We present an algorithm for solving the discrete optimal control problem with infinite time horizon as well as with varying time of states' transitions. The chapter concludes with a general approach for algorithmic solutions of discrete optimal control problems. Within this game-theoretic extension the non-cooperative and the cooperative cases are treated. At the end, the new special concepts of Pareto-Nash equilibria for multi-objective control and of a Pareto-Stackelberg solution are characterized and interpreted.

Chapter 4 is devoted to optimal dynamic flow problems, which generalize discrete control problems on networks. The time-expanded network method for solving optimal dynamic multi-commodity flow problems is developed. The chapter begins with single-commodity dynamic flow problems and the presentation of the time-expanded network method for their solution.
We consider a dynamic model with flow storage at nodes and introduce integral constant demand-supply functions. We extend this approach to minimum cost flows and develop a suitable algorithm for solving the main dynamic flow problem. In the main part, multi-commodity dynamic flow problems and algorithms for their solution are treated. The chapter ends with a few generalizations, especially an algorithm for solving the maximum dynamic multi-commodity flow problem. Finally, a game-theoretic approach for dynamic flow problems is treated.

Chapter 5 refers to applications and related topics. In the first chapters the theoretical framework was treated in detail; the work of Richard Bellman is extended and several algorithms and game-theoretic concepts are developed. The authors use this framework to model a general multilayered decision process on networks (abbreviated MILAN). As an example, the so-called Technology Emission Means (TEM) Model, which was developed by Stefan Pickl, is extended and embedded into a general multilayered decision process. In the first part of this chapter, the TEM model is introduced as a time-discrete system which can be used for resource planning processes. The problem of fixed point controllability and null-controllability is introduced, as well as the determination of the optimal investment parameter. A game-theoretical extension and another interpretation of the Bellman functional equation lead directly to an applied multilayered decision process and conclude the book.

The authors want to thank Dr. Silja Meyer-Nieberg, Dr. Heiko Hahn, Arnold Dupuy, Goran Mihelcic and Marco Schuler for carefully reading the manuscript. Furthermore, we would like to thank Annemarie Fischaleck and Tino Krug for their help during the typesetting process and especially Zafer-Korcan Görgülü for his outstanding help in the layout process.
Dmitrii Lozovanu, Stefan Pickl
München, November 2008
Contents

1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games on Networks
  1.1 Problem Formulation
    1.1.1 Single-Objective Discrete Control Problem
    1.1.2 Multi-Objective Control Based on the Concept of Non-cooperative Games: Nash Equilibria
    1.1.3 Hierarchical Control and Stackelberg's Optimization Principle
    1.1.4 Multi-Objective Control Based on the Concept of Cooperative Games: Pareto Optima
    1.1.5 Stationary and Non-Stationary Control of Time-Discrete Systems
  1.2 Multi-Objective Control of Time-Discrete Systems with Infinite Time Horizon
  1.3 Alternate Players' Control Condition and Nash Equilibria for Dynamic Games in Positional Form
  1.4 Algorithms for Solving Single-Objective Control Problems on Networks
    1.4.1 Dynamic Programming Algorithms for Solving Optimal Control Problems on Networks
    1.4.2 An Extension of Dijkstra's Algorithm for Optimal Control Problems with a Free Number of Stages
  1.5 Multi-Objective Control and Non-Cooperative Games on Dynamic Networks
    1.5.1 The Problem of Determining the Optimal Stationary Strategies in a Dynamic c-Game
    1.5.2 The Problem of Determining the Optimal Non-Stationary Strategies in a Dynamic c-Game
  1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges and Determining Optimal Stationary Strategies of the Players
  1.7 Computational Complexity of the Problem of Determining Optimal Stationary Strategies in a Dynamic c-Game
  1.8 Determining the Optimal Stationary Strategies for a Dynamic c-Game with Non-Constant Cost Functions on the Edges
  1.9 Determining Nash Equilibria for Non-Stationary Dynamic c-Games
    1.9.1 Time-Expanded Networks for Non-Stationary Dynamic c-Games and Their Main Properties
    1.9.2 Determining Nash Equilibria
  1.10 Application of the Dynamic c-Game for Studying and Solving Multi-Objective Control Problems
  1.11 Multi-Objective Control and Cooperative Games on Dynamic Networks
    1.11.1 Stationary Strategies on Networks and Pareto Solutions
    1.11.2 A Pareto Solution for the Problem with Non-Stationary Strategies on Networks
  1.12 Determining Pareto Solutions for Multi-Objective Control Problems on Networks
    1.12.1 Determining Pareto Stationary Strategies
    1.12.2 Pareto Solution for the Non-Stationary Case of the Problem
    1.12.3 Computational Complexity of the Stationary Case of the Problem and an Algorithm for its Solving on Acyclic Networks
  1.13 Determining Pareto Optima for Multi-Objective Control Problems
  1.14 Determining a Stackelberg Solution for Hierarchical Control Problems
    1.14.1 A Stackelberg Solution for Static Games
    1.14.2 Hierarchical Control on Networks and Determining Stackelberg Stationary Strategies
    1.14.3 An Algorithm for Determining Stackelberg Stationary Strategies on Acyclic Networks
    1.14.4 An Algorithm for Solving Hierarchical Control Problems

2 Max-Min Control Problems and Solving Zero-Sum Games on Networks
  2.1 Discrete Control and Finite Antagonistic Dynamic Games
  2.2 Max-Min Control Problem with Infinite Time Horizon
  2.3 Zero-Sum Games on Networks and a Polynomial Time Algorithm for Max-Min Paths Problems
    2.3.1 Problem Formulation
    2.3.2 An Algorithm for Solving the Problem on Acyclic Networks
    2.3.3 Main Results for the Problem on an Arbitrary Network
    2.3.4 A Polynomial Time Algorithm for Determining Optimal Strategies of the Players in a Dynamic c-Game
    2.3.5 A Pseudo-Polynomial Time Algorithm for Solving a Dynamic c-Game
  2.4 A Polynomial Time Algorithm for Solving Acyclic l-Games on Networks
    2.4.1 Problem Formulation
    2.4.2 Main Properties of Optimal Strategies in Acyclic l-Games
    2.4.3 A Polynomial Time Algorithm for Finding the Value and the Optimal Strategies in an Acyclic l-Game
  2.5 Cyclic Games: Algorithms for Finding the Value and the Optimal Strategies of the Players
    2.5.1 Problem Formulation and Main Properties
    2.5.2 Determining the Best Response of the First Player for a Fixed Strategy of the Second Player
    2.5.3 Some Preliminary Results
    2.5.4 The Reduction of Cyclic Games to Ergodic Games
    2.5.5 A Polynomial Time Algorithm for Solving Ergodic Zero-Value Cyclic Games
    2.5.6 A Polynomial Time Algorithm for Solving Cyclic Games Based on the Reduction to Acyclic l-Games
    2.5.7 An Approach for Solving Cyclic Games Based on a Dichotomy Method and Solving Dynamic c-Games
  2.6 Cyclic Games with Random States' Transitions of the Dynamical System
  2.7 A Nash Equilibria Condition for Cyclic Games with p Players
  2.8 Determining Pareto Optima for Cyclic Games with p Players

3 Extension and Generalization of Discrete Control Problems and Algorithmic Approaches for its Solving
  3.1 Discrete Control Problems with Varying Time of States' Transitions of the Dynamical System
    3.1.1 The Single-Objective Control Problem with Varying Time of States' Transitions of the Dynamical System
    3.1.2 An Algorithm for Solving a Single-Objective Control Problem with Varying Time of States' Transitions of the Dynamical System
    3.1.3 The Discrete Control Problem with Cost Functions of System's Passages that Depend on the Transition-Time of States' Transitions
  3.2 The Control Problem on a Network with Transition-Time Functions on the Edges
    3.2.1 Problem Formulation
    3.2.2 An Algorithm for Solving the Problem on a Network with Transition-Time Functions on the Edges
  3.3 Multi-Objective Control of Time-Discrete Systems with Varying Time of States' Transitions
    3.3.1 Multi-Objective Discrete Control with Varying Time of States' Transitions of Dynamical Systems
    3.3.2 A Dynamic c-Game on Networks with Transition-Time Functions on the Edges
    3.3.3 Remark on Determining Pareto Optima for the Multi-Objective Control Problem with Varying Time of States' Transitions
  3.4 An Algorithm for Solving the Discrete Optimal Control Problem with Infinite Time Horizon and Varying Time of the States' Transitions
    3.4.1 Problem Formulation and Some Preliminary Results
    3.4.2 An Algorithm for Determining an Optimal Stationary Control for Dynamical Systems with Infinite Time Horizon
  3.5 A General Approach for Algorithmic Solutions of Discrete Optimal Control Problems and its Game-Theoretic Extension
    3.5.1 A General Optimal Control Model
    3.5.2 An Algorithm for Determining an Optimal Solution of the Problem with Fixed Starting and Final States
    3.5.3 The Discrete Optimal Control Problem on a Network
    3.5.4 The Game-Theoretic Control Model with p Players
    3.5.5 The Game-Theoretic Control Problem on Networks and an Algorithm for its Solving
    3.5.6 Multi-Criteria Discrete Control Problems: Pareto Optima
  3.6 Pareto-Nash Equilibria for Multi-Objective Games
    3.6.1 Problem Formulation
    3.6.2 Main Results
    3.6.3 Discrete and Matrix Multi-Objective Games
    3.6.4 Some Comments on and Interpretations of Multi-Objective Games
    3.6.5 Determining a Pareto-Stackelberg Solution for Multi-Objective Games

4 Discrete Control and Optimal Dynamic Flow Problems on Networks
  4.1 Single-Commodity Dynamic Flow Problems and the Time-Expanded Network Method for Their Solving
    4.1.1 The Minimum Cost Dynamic Flow Problem
    4.1.2 The Main Results
    4.1.3 The Dynamic Model with Flow Storage at Nodes
    4.1.4 The Dynamic Model with Flow Storage at Nodes and Integral Constant Demand-Supply Functions
    4.1.5 The Algorithm
    4.1.6 Constructing the Time-Expanded Network and its Size
    4.1.7 Approaches for Solving the Minimum Cost Flow Problem with Different Types of Cost Functions on the Edges
    4.1.8 Determining the Minimum Cost Flows in Dynamic Networks with Transition Time Functions that Depend on Flow and Time
    4.1.9 An Algorithm for Solving the Maximum Dynamic Flow Problem
  4.2 Multi-Commodity Dynamic Flow Problems and Algorithms for their Solving
    4.2.1 The Minimum Cost Multi-Commodity Dynamic Flow Problem
    4.2.2 The Main Results
    4.2.3 The Algorithm
    4.2.4 Examples
    4.2.5 The Dynamic Multi-Commodity Minimum Cost Flow Problem with Transition Time Functions that Depend on Flows and on Time
    4.2.6 Generalizations
    4.2.7 An Algorithm for Solving the Maximum Dynamic Multi-Commodity Flow Problem
  4.3 The Game-Theoretic Approach for Dynamic Flow Problems on Networks

5 Applications and Related Topics
  5.1 Analysis and Control of Time-Discrete Systems: Resource Planning - The TEM Model
    5.1.1 Motivation
    5.1.2 The Basic Model
    5.1.3 Control Theoretic Part
    5.1.4 Problem of Fixed Point Controllability and Null-Controllability
    5.1.5 Optimal Investment Parameter
    5.1.6 A Game-Theoretic Extension: Relation to Multilayered Decision Problems
  5.2 Algorithmic Solutions for an Emission Reduction Game: The Kyoto Game
    5.2.1 The Core in the TEM Model
    5.2.2 A Second Cooperative Treatment of the TEM Model
    5.2.3 Comments
  5.3 An Emission Reduction Process: The MILAN Model
    5.3.1 MILAN: Multilayered Games on Networks - The General Kyoto Game as a Multi-Step Process
    5.3.2 Sequencing and Dynamic Programming
    5.3.3 Generalizations of the Feasible Decision Sets: Optimal Solutions on k-Layered Graphs

Conclusion
References
Index
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games on Networks
In this chapter we formulate and study multi-objective discrete control problems by using game-theoretical concepts. We consider time-discrete systems with a finite set of states. The starting and the final states of the dynamical system are fixed. We assume that the dynamics of the system is controlled by p actors (players), each of whom intends to optimize his own integral-time cost of the system's passage along a certain trajectory. Applying the Stackelberg, Nash and Pareto optimality principles to such a model, we obtain multi-objective control problems whose solutions correspond to solutions of hierarchical, non-cooperative and cooperative dynamic games, respectively.

Necessary and sufficient conditions for the existence of Nash equilibria, Pareto optima and Stackelberg strategies for the considered game control models are derived. Such conditions are formulated for the stationary and non-stationary cases of dynamic games. Furthermore, we extend the dynamic programming techniques for determining Nash equilibria and Pareto optima for dynamic games in positional form, especially for dynamic games on networks. Efficient polynomial-time algorithms are elaborated for finding the optimal strategies of the players in dynamic games on networks. Additionally, the computational complexity of the proposed algorithms for the considered classes of dynamic problems is discussed. Some extensions and generalizations of the obtained results are suggested.
1.1 Problem Formulation
We formulate the multi-objective control problems by applying a game-theoretical concept to the following classical discrete control problem [4, 6, 62, 106].
1.1.1 Single-Objective Discrete Control Problem

Let us consider a discrete dynamical system L with a finite set of states X ⊂ R^n. At every time-step t = 0, 1, 2, ..., the state of the system L is x(t) ∈ X. Two states x0 and xf are given in X, where x0 = x(0) represents the starting point of L and xf is the state in which the system L must be brought, i.e. xf is the final state of L. We assume that the system L should reach the final state xf at the time-moment T(xf) such that T1 ≤ T(xf) ≤ T2, where T1 and T2 are given.

The dynamics of the system L is described as follows:

$$x(t+1) = g_t(x(t), u(t)), \quad t = 0, 1, 2, \dots, \tag{1.1}$$

where x(0) = x0 and

$$u(t) = (u_1(t), u_2(t), \dots, u_m(t)) \in \mathbb{R}^m \tag{1.2}$$

represents the vector of the control parameters (see [4, 6, 51, 106]). For any time-step t and an arbitrary state x(t) ∈ X the feasible set U_t(x(t)) of the vector u(t) of control parameters is given, i.e.

$$u(t) \in U_t(x(t)), \quad t = 0, 1, 2, \dots. \tag{1.3}$$

We assume that in (1.1) the vector functions

$$g_t(x(t), u(t)) = (g_t^1(x(t), u(t)), g_t^2(x(t), u(t)), \dots, g_t^n(x(t), u(t)))$$

are determined uniquely by x(t) and u(t) at every time-step t = 0, 1, 2, ...; so x(t+1) is determined uniquely by x(t) and u(t). Additionally, we assume that at each moment in time t the cost c_t(x(t), x(t+1)) = c_t(x(t), g_t(x(t), u(t))) of the system's passage from state x(t) to state x(t+1) is known.

Let x0 = x(0), x(1), x(2), ..., x(t), ... be a trajectory generated by given vectors of the control parameters u(0), u(1), ..., u(t−1), .... Then either this trajectory passes through the state xf at the time-moment T(xf) or it does not pass through xf. We denote by

$$F_{x_0x_f}(u(t)) = \sum_{t=0}^{T(x_f)-1} c_t(x(t), g_t(x(t), u(t))) \tag{1.4}$$
the integral-time cost of the system's passage from x0 to xf if T1 ≤ T(xf) ≤ T2; otherwise we put F_{x0xf}(u(t)) = ∞.

Problem 1.1. Find vectors of control parameters u(0), u(1), u(2), ..., u(t), ..., which satisfy the conditions (1.1)-(1.3) and minimize the functional (1.4).

If T1 = T2, we obtain the discrete control problem with a fixed number of stages, i.e. the problem from [6]; if T1 = 0, T2 = ∞, we have the discrete control problem with a free number of stages [4, 51, 106].

Problem 1.1 with T(xf) = T1 = T2 can be solved by using a dynamic programming method in the following way. Denote by F*_{x0x(t)} the minimal integral-time cost of the system's passage from the starting state x0 = x(0) to the state x = x(t) ∈ X by using exactly t stages, so

$$F^*_{x_0x(t)} = \sum_{l=0}^{t-1} c_l(x^*(l), g_l(x^*(l), u^*(l))),$$

where

$$x(0) = x^*(0),\ x^*(1),\ x^*(2),\ \dots,\ x^*(t) = x(t)$$

is the optimal trajectory from x0 = x(0) to x(t), generated by the optimal control u*(0), u*(1), u*(2), ..., u*(t−1); if there is no control which generates such a trajectory, then F*_{x0x(t)} = ∞. It is easy to observe that the values F*_{x0x(t)} for x(t) ∈ X, t = 0, 1, 2, ..., T, can be tabulated by using the following recursive formula:

$$F^*_{x_0x(t)} = \begin{cases} \min\limits_{x(t-1)\in X^-(x(t))} \bigl\{F^*_{x_0x(t-1)} + c_{t-1}(x(t-1), x(t))\bigr\}, & \text{if } X^-(x(t)) \ne \emptyset; \\ \infty, & \text{if } X^-(x(t)) = \emptyset, \end{cases} \qquad t = 1, 2, \dots, T,$$

where

$$F^*_{x_0x(0)} = \begin{cases} 0, & \text{if } x(0) = x_0; \\ \infty, & \text{if } x(0) \ne x_0, \end{cases}$$

and

$$X^-(x(t)) = \{x(t-1) \in X \mid x(t) = g_{t-1}(x(t-1), u(t-1)),\ u(t-1) \in U_{t-1}(x(t-1))\}.$$
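The recursion above translates directly into a tabulation procedure. The following Python sketch is only an illustration, not part of the model: the state set, the successor map succ(x, t) (which plays the role of the transitions generated by U_t(x(t)) and g_t) and the cost function are hypothetical inputs. Instead of constructing X^-(x(t)) explicitly, it scans forward over successors, which produces the same table.

```python
# Sketch of the tabulation of F*_{x0 x(t)}. succ(x, t) is assumed to return the
# states reachable from x in one step at time t, and cost(x, y, t) the passage
# cost c_t(x, y); both are hypothetical placeholders.
INF = float("inf")

def tabulate(states, succ, cost, x0, T):
    # F[t][x]: minimal integral-time cost of reaching x from x0 in exactly t stages
    F = [{x: INF for x in states} for _ in range(T + 1)]
    pred = [{x: None for x in states} for _ in range(T + 1)]  # x*(t-1), for backtracking
    F[0][x0] = 0
    for t in range(1, T + 1):
        for x_prev in states:
            if F[t - 1][x_prev] == INF:
                continue
            for x in succ(x_prev, t - 1):
                val = F[t - 1][x_prev] + cost(x_prev, x, t - 1)
                if val < F[t][x]:
                    F[t][x], pred[t][x] = val, x_prev
    return F, pred
```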
In this procedure the most important effort is concerned with the determination of the set X^-(x(t)) for a given state x(t) at the time-moment t. In order to determine this set we have to verify whether for an arbitrary state x(t−1) at the time-moment t−1 there exists an admissible vector u(t−1) ∈ U_{t−1}(x(t−1)) such that x(t) = g_{t−1}(x(t−1), u(t−1)).

If F*_{x0x(t)}, t = 0, 1, 2, ..., T, are known, then the optimal control u*(0), u*(1), u*(2), ..., u*(T−1) and the corresponding optimal trajectory x(0) = x*(0), x*(1), x*(2), ..., x*(T) = xf from x0 to xf can be found starting from the final state xf as follows. Fix the vector u*(T−1) and the state x*(T−1) ∈ X^-(x*(T)) such that

$$F^*_{x_0x^*(T)} = F^*_{x_0x^*(T-1)} + c_{T-1}(x^*(T-1), g_{T-1}(x^*(T-1), u^*(T-1))).$$

Then we find u*(T−2) and x*(T−2) such that

$$F^*_{x_0x^*(T-1)} = F^*_{x_0x^*(T-2)} + c_{T-2}(x^*(T-2), g_{T-2}(x^*(T-2), u^*(T-2))).$$

After that, we fix u*(T−3) and x*(T−3) such that

$$F^*_{x_0x^*(T-2)} = F^*_{x_0x^*(T-3)} + c_{T-3}(x^*(T-3), g_{T-3}(x^*(T-3), u^*(T-3))),$$

and so on. Finally, we find the optimal control u*(0), u*(1), u*(2), ..., u*(T−1) and the optimal trajectory x(0) = x*(0), x*(1), x*(2), ..., x*(T) = xf.

Problem 1.1 in the case T1 ≤ T(xf) ≤ T2 can be reduced to the previous case. First of all we calculate the values F*_{x0x(t)} for x(t) ∈ X, t = 0, 1, 2, ..., T2. Then we find the time t* such that

$$F^*_{x_0x(t^*)} = \min_{T_1 \le t \le T_2} \{F^*_{x_0x(t)} \mid x(t) = x_f\}.$$

After that, we fix T(xf) = t* and determine the optimal control u*(0), u*(1), u*(2), ..., u*(T(xf)−1) and the trajectory x*(0), x*(1), x*(2), ..., x*(T(xf)) from the starting state x0 to the final state xf in the same way as in the previous case. In the following we will develop a dynamic programming technique for multi-objective control problems.

1.1.2 Multi-Objective Control Based on the Concept of Non-cooperative Games: Nash Equilibria

Consider a dynamic system L with a finite set of states X, where at every time-step t the state of L is x(t) ∈ X. The dynamics of the system L is controlled by p players and it is described as follows:

$$x(t+1) = g_t(x(t), u^1(t), u^2(t), \dots, u^p(t)), \quad t = 0, 1, 2, \dots, \tag{1.5}$$
where x(0) = x0 is a starting point of the system L and u^i(t) ∈ R^{m_i} represents the vector of the control parameters of player i, i ∈ {1, 2, ..., p}. The state x(t+1) of system L at the time-step t+1 is obtained uniquely if the state x(t) at the time-step t is known and the players 1, 2, ..., p fix their vectors of the control parameters u^1(t), u^2(t), ..., u^p(t) independently. For each player i, i ∈ {1, 2, ..., p}, the admissible sets U^i_t(x(t)) for the vectors of the control parameters u^i(t) are given, i.e.

$$u^i(t) \in U^i_t(x(t)), \quad t = 0, 1, 2, \dots;\ i = 1, \dots, p. \tag{1.6}$$

We assume that U^i_t(x(t)), t = 0, 1, 2, ..., i = 1, ..., p, are non-empty finite sets and that U^i_t(x(t)) ∩ U^j_t(x(t)) = ∅ for i ≠ j, t = 0, 1, 2, ....

Let us consider that the players 1, 2, ..., p fix their vectors of the control parameters u^1(t), u^2(t), ..., u^p(t), t = 0, 1, 2, ..., respectively, and that the starting state x(0) = x0 as well as the final state xf are known. Then for the fixed vectors of the control parameters u^1(t), u^2(t), ..., u^p(t) either a unique trajectory x0 = x(0), x(1), x(2), ..., x(T(xf)) = xf from x0 to xf exists, where T(xf) represents the time-moment when the state xf is reached, or such a trajectory from x0 to xf does not exist. We denote by

$$F^i_{x_0x_f}(u^1(t), u^2(t), \dots, u^p(t)) = \sum_{t=0}^{T(x_f)-1} c^i_t(x(t), g_t(x(t), u^1(t), u^2(t), \dots, u^p(t)))$$

the integral-time cost of the system's passage from x0 to xf for player i, i ∈ {1, 2, ..., p}, if the vectors u^1(t), u^2(t), ..., u^p(t) satisfy condition (1.6) and generate a trajectory x0 = x(0), x(1), x(2), ..., x(T(xf)) = xf from x0 to xf such that

$$T_1 \le T(x_f) \le T_2;$$

otherwise we put F^i_{x_0x_f}(u^1(t), u^2(t), ..., u^p(t)) = ∞. Note that c^i_t(x(t), g_t(x(t), u^1(t), u^2(t), ..., u^p(t))) = c^i_t(x(t), x(t+1)) represents the cost of the system's passage from state x(t) to state x(t+1) at the stage [t, t+1] for player i.
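To make the notation concrete, the sketch below evaluates the integral-time cost F^i_{x0 xf} of every player along a fixed trajectory; the trajectory and the per-player stage costs are hypothetical inputs supplied by the caller.

```python
def integral_costs(trajectory, stage_cost, p):
    # trajectory: [x(0), x(1), ..., x(T)]; stage_cost(i, t, x, y) returns
    # player i's cost c^i_t(x(t), x(t+1)) of the passage at stage [t, t+1].
    return [
        sum(stage_cost(i, t, trajectory[t], trajectory[t + 1])
            for t in range(len(trajectory) - 1))
        for i in range(1, p + 1)
    ]
```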
Problem 1.2. Find vectors of control parameters u^{1*}(t), u^{2*}(t), ..., u^{(i−1)*}(t), u^{i*}(t), u^{(i+1)*}(t), ..., u^{p*}(t) which satisfy the condition

$$F^i_{x_0x_f}(u^{1*}(t), u^{2*}(t), \dots, u^{(i-1)*}(t), u^{i*}(t), u^{(i+1)*}(t), \dots, u^{p*}(t)) \le F^i_{x_0x_f}(u^{1*}(t), u^{2*}(t), \dots, u^{(i-1)*}(t), u^{i}(t), u^{(i+1)*}(t), \dots, u^{p*}(t))$$

$$\forall\, u^i(t) \in \mathbb{R}^{m_i},\ t = 0, 1, 2, \dots;\ i = 1, \dots, p.$$

So, we consider the problem of finding the solution in the sense of Nash [2, 79, 82]. The problems formulated above can be regarded as mathematical models for dynamical systems controlled by several players who do not inform each other which vectors of control parameters they use in the control process.

An important particular case of Problem 1.2 is represented by the zero-sum control problem of two players with the given costs

$$c_t(x(t), x(t+1)) = c^2_t(x(t), x(t+1)) = -c^1_t(x(t), x(t+1))$$

of the system's passage from state x(t) to state x(t+1), which determine the payoff function

$$F_{x_0x_f}(u^1(t), u^2(t)) = F^2_{x_0x_f}(u^1(t), u^2(t)) = -F^1_{x_0x_f}(u^1(t), u^2(t)).$$

In this case we seek a saddle point (u^{1*}(t), u^{2*}(t)) of the function F_{x_0x_f}(u^1(t), u^2(t)) [83], i.e. we consider the following max-min control problem:

Problem 1.3. Find vectors of control parameters u^{1*}(t), u^{2*}(t) such that

$$F_{x_0x_f}(u^{1*}(t), u^{2*}(t)) = \max_{u^1(t)} \min_{u^2(t)} F_{x_0x_f}(u^1(t), u^2(t)) = \min_{u^2(t)} \max_{u^1(t)} F_{x_0x_f}(u^1(t), u^2(t)).$$

So, for this max-min control problem we are seeking a saddle point. We will describe the classification of necessary and sufficient conditions for the existence of Nash equilibria in such dynamic games, which has been obtained in [58, 59]. Furthermore, we introduce the classification of dynamic games for which Nash equilibria exist, and algorithms for solving such kinds of problems will be proposed.
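When both players have finitely many admissible controls and all payoffs are tabulated, the saddle-point condition of Problem 1.3 can be checked directly. The sketch below does this for a payoff matrix; it illustrates the max-min/min-max condition itself, not the dynamic algorithms developed later in the book.

```python
def saddle_point(payoff):
    # payoff[i][j]: value of F_{x0 xf} when player 1 (the maximizer) picks
    # his i-th control and player 2 (the minimizer) picks his j-th control.
    maximin = max(min(row) for row in payoff)
    minimax = min(max(col) for col in zip(*payoff))
    return maximin == minimax, maximin, minimax

# Example: maximin = minimax = 3, so a saddle point in pure controls exists.
exists, lo, hi = saddle_point([[3, 5], [2, 1]])
```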
1.1.3 Hierarchical Control and Stackelberg's Optimization Principle

Now we shall use the concept of hierarchical control and assume that in (1.5), for an arbitrary state x(t) at every moment in time, the players fix their vectors of control parameters successively one after another according to their numerical order. Moreover, we assume that each player, when fixing his vectors of control parameters, informs the posterior players which vector of control parameters has been chosen at the given moment in time for the given state.

So, we consider the following hierarchical control process. Let L be a dynamical system with a finite set of states X and a fixed starting point x(0) = x0 ∈ X. The dynamics of system L is defined by the system of difference equations (1.5) and is controlled by p players by using the corresponding vectors of control parameters u^1(t), u^2(t), ..., u^p(t). For each vector of control parameters u^i(t) the feasible set (1.6) is defined for an arbitrary state x(t) at every discrete moment in time t. Additionally, we assume that for an arbitrary state x(t) ∈ X at every moment in time t the players fix their vectors of control parameters successively one after another according to a given order; for simplicity, we consider that the players fix their vectors of control parameters in the order corresponding to their numbers. Each player, after fixing his vectors of control parameters, informs the posterior players which vector of control parameters has been chosen at the given moment in time for the given state. Finally, if the vectors of control parameters u^1(t), u^2(t), ..., u^p(t) and the starting state x(0) = x0 are known, then the cost F^i_{x_0x_f}(u^1(t), u^2(t), ..., u^p(t)) of the system's passage from the starting state x0 to the final state xf for player i ∈ {1, 2, ..., p} is defined in the same way as in Section 1.1.2.

In this hierarchical control process we are looking for Stackelberg strategies [70, 102], i.e. we consider the following hierarchical control problem:

Problem 1.4. Find vectors of control parameters u^{1*}(t), u^{2*}(t), ..., u^{p*}(t) for which

$$u^{1*}(t) = \operatorname*{argmin}_{\substack{u^1(t)\in U^1 \\ (u^i(t)\in R^i(u^1,\dots,u^{i-1}))_{2\le i\le p}}} F^1_{x_0x_f}(u^1(t), u^2(t), \dots, u^p(t));$$

$$u^{2*}(t) = \operatorname*{argmin}_{\substack{u^2(t)\in R^2(u^{1*}) \\ (u^i(t)\in R^i(u^{1*},u^2,\dots,u^{i-1}))_{3\le i\le p}}} F^2_{x_0x_f}(u^{1*}(t), u^2(t), \dots, u^p(t));$$

$$u^{3*}(t) = \operatorname*{argmin}_{\substack{u^3(t)\in R^3(u^{1*},u^{2*}) \\ (u^i(t)\in R^i(u^{1*},u^{2*},u^3,\dots,u^{i-1}))_{4\le i\le p}}} F^3_{x_0x_f}(u^{1*}(t), u^{2*}(t), \dots, u^p(t));$$

$$\vdots$$

$$u^{p*}(t) = \operatorname*{argmin}_{u^p(t)\in R^p(u^{1*},u^{2*},\dots,u^{(p-1)*})} F^p_{x_0x_f}(u^{1*}(t), u^{2*}(t), \dots, u^{(p-1)*}(t), u^p(t)),$$

where R^k(u^1, u^2, ..., u^{k−1}) represents the best response of player k when the players 1, 2, ..., k−1 have already fixed their vectors u^1(t), u^2(t), ..., u^{k−1}(t), i.e.

$$R^2(u^1) = \operatorname*{argmin}_{\substack{u^2(t)\in U^2 \\ (u^i(t)\in R^i(u^1,\dots,u^{i-1}))_{3\le i\le p}}} F^2_{x_0x_f}(u^1(t), u^2(t), \dots, u^p(t));$$

$$R^3(u^1, u^2) = \operatorname*{argmin}_{\substack{u^3(t)\in U^3 \\ (u^i(t)\in R^i(u^1,\dots,u^{i-1}))_{4\le i\le p}}} F^3_{x_0x_f}(u^1(t), u^2(t), \dots, u^p(t));$$

$$\vdots$$

$$R^p(u^1, u^2, \dots, u^{p-1}) = \operatorname*{argmin}_{u^p(t)\in U^p} F^p_{x_0x_f}(u^1(t), u^2(t), \dots, u^p(t)),$$

where U^i denotes the set of admissible controls of player i,

$$U^i = \prod_{t}\ \prod_{x(t)} U^i_t(x(t)), \quad t = 0, 1, 2, \dots;\ i = 1, \dots, p.$$

It is easy to observe that if the solution u^{1*}(t), u^{2*}(t), ..., u^{p*}(t) of Problem 1.4 does not depend on the order in which the players 1, 2, ..., p fix their vectors of control parameters, then u^{1*}(t), u^{2*}(t), ..., u^{p*}(t) is a solution in the sense of Nash.

If c^2_t(x(t), x(t+1)) = −c^1_t(x(t), x(t+1)) = c_t(x(t), x(t+1)), then we obtain the max-min control problem of two players with the payoff function

$$F_{x_0x_f}(u^1(t), u^2(t)) = F^2_{x_0x_f}(u^1(t), u^2(t)) = -F^1_{x_0x_f}(u^1(t), u^2(t)).$$

In this case we are seeking vectors of control parameters u^{1*}(t), u^{2*}(t) such that

$$F_{x_0x_f}(u^{1*}(t), u^{2*}(t)) = \max_{u^1(t)} \min_{u^2(t)} F_{x_0x_f}(u^1(t), u^2(t)).$$

For the considered class of problems we will also develop an algorithm based on dynamic programming.
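Before turning to that algorithm, a small enumeration sketch may clarify how the best-response maps R^k compose in the two-player case. The strategy sets U1, U2 and the cost functions F1, F2 below are hypothetical finite stand-ins for U^1, U^2 and F^1_{x0 xf}, F^2_{x0 xf}. Note that ties in the argmin are broken here by enumeration order; the tie-breaking rule generally matters in Stackelberg games.

```python
def stackelberg_two_players(U1, U2, F1, F2):
    # Follower's best response R2(u1) to each leader choice u1; the leader
    # then minimizes his own cost, anticipating the follower's reaction.
    def R2(u1):
        return min(U2, key=lambda u2: F2(u1, u2))
    u1_star = min(U1, key=lambda u1: F1(u1, R2(u1)))
    return u1_star, R2(u1_star)
```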
1.1.4 Multi-Objective Control Based on the Concept of Cooperative Games: Pareto Optima

We consider a dynamical system L which is controlled by p players 1, 2, ..., p. Assume that the players coordinate their actions in the control processes by using common vectors of control parameters u(t) = (u^1(t), u^2(t), ..., u^p(t)) ∈ R^m (see [6, 13, 51, 80, 81, 88, 89]). So, the dynamics of the system is described according to (1.1)-(1.3), i.e.

$$x(t+1) = g_t(x(t), u(t)), \quad t = 0, 1, 2, \dots,$$
where x(0) = x0 and

$$u(t) \in U_t(x(t)), \quad t = 0, 1, 2, \dots.$$

Additionally, we assume that system L should reach the final state at the time-moment T(xf) such that T1 ≤ T(xf) ≤ T2.

Let u(0), u(1), u(2), ..., u(t−1), ... be a players' control which generates a trajectory x(0), x(1), x(2), ..., x(t), .... Then either this trajectory passes through the state xf at the finite time-moment T(xf) or it does not pass through xf. We denote by

$$F^i_{x_0x_f}(u(t)) = \sum_{t=0}^{T(x_f)-1} c^i_t(x(t), g_t(x(t), u(t))), \quad i = 1, \dots, p,$$

the integral-time cost of the system's passage from x0 to xf if T1 ≤ T(xf) ≤ T2; otherwise we put F^i_{x_0x_f}(u(t)) = ∞. Here c^i_t(x(t), g_t(x(t), u(t))) = c^i_t(x(t), x(t+1)) represents the cost of the system's passage from state x(t) to state x(t+1) at the stage [t, t+1] for player i, i ∈ {1, 2, ..., p}.

Problem 1.5. Find vectors of control parameters u*(t) such that there is no other control vector u(t) ≠ u*(t) for which

$$(F^1_{x_0x_f}(u(t)), F^2_{x_0x_f}(u(t)), \dots, F^p_{x_0x_f}(u(t))) \le (F^1_{x_0x_f}(u^*(t)), F^2_{x_0x_f}(u^*(t)), \dots, F^p_{x_0x_f}(u^*(t)))$$

and for at least one i0 ∈ {1, 2, ..., p}

$$F^{i_0}_{x_0x_f}(u(t)) < F^{i_0}_{x_0x_f}(u^*(t)).$$

So, we consider the problem of finding a Pareto solution [79, 81, 89]. Unlike Nash equilibria, Pareto optima for multi-objective discrete control always exist if there is an admissible control u(t), t = 0, 1, 2, ..., T(xf), which generates a trajectory x0 = x(0), x(1), x(2), ..., x(T(xf)) = xf from x0 to xf.
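Since every admissible control yields a vector of p integral-time costs, a Pareto optimum can be recognized by pairwise comparison of these vectors. The following sketch filters the non-dominated cost vectors from a finite list of candidates; it illustrates the dominance relation of Problem 1.5 and is not one of the algorithms developed in this book.

```python
def dominates(a, b):
    # a dominates b: componentwise a <= b with strict inequality somewhere
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(cost_vectors):
    return [c for c in cost_vectors
            if not any(dominates(d, c) for d in cost_vectors)]

# Example: (3, 6) is dominated by (2, 5); the front is [(2, 5), (4, 1)].
front = pareto_front([(2, 5), (3, 6), (4, 1)])
```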
1.1.5 Stationary and Non-Stationary Control of Time-Discrete Systems

Furthermore, we will distinguish between stationary and non-stationary control of time-discrete systems. All problems formulated in Sections 1.1.1-1.1.4 are related to the non-stationary case. The main argument for such a classification is that the functions g_t^i, t = 0, 1, 2, ..., may differ for different moments in time, and the players in the control process can change their vectors of control parameters for an arbitrary state x = x(t) at different moments in time t. Additionally, for a given state x = x(t) the admissible sets U^i_t(x(t)), i = 1, ..., p, can differ for different moments in time t. Moreover, the costs c_t(x(t), x(t+1)) of the system's passage from state x = x(t) to state y = x(t+1) vary in time for given x and y.

Stationary versions of the considered control problems correspond to the case when the functions g_t^i do not change in time, i.e. g_t^i ≡ g^i, t = 0, 1, 2, ..., and the players preserve the same vectors of control parameters in time for given states x ∈ X. Additionally, we consider that the admissible sets U^i_t(x(t)) for the vectors of control parameters do not change in time, i.e. U^i_t(x(t)) = U^i(x), t = 0, 1, 2, ..., i = 1, ..., p.

Note that, in general, for non-stationary control problems the players can use non-stationary strategies even if the functions g_t^i and the admissible sets of the control parameters U^i_t(x(t)) do not change in time, i.e. g_t^i ≡ g^i, t = 0, 1, 2, ..., and U^i_t(x(t)) = U^i(x), t = 0, 1, 2, ..., i = 1, ..., p.
1.2 Multi-Objective Control of Time-Discrete Systems with Infinite Time Horizon

The problems formulated in Section 1.1 correspond to mathematical models for the control of time-discrete systems with fixed starting and final states when the number of stages in the control process is limited. For the control problems with infinite time horizon the final state is not given and the control process continues indefinitely over the discrete moments in time t = 0, 1, 2, .... The objective function which has to be minimized in such problems for the case of a single-objective control is defined as follows:

$$F_{x_0}(u(t)) = \lim_{\tau\to\infty} \frac{1}{\tau} \sum_{t=0}^{\tau} c_t(x(t), g_t(x(t), u(t))).$$

This quantity expresses the mean integral-time cost of a trajectory x(0), x(1), ..., x(t), ... if the control u(0), u(1), ..., u(t) is applied. In [4, 98, 99] it is shown that if the set of states X is finite, then for the stationary case of the control problem with infinite time horizon and constant cost functions there exists an optimal stationary control, and the optimal value of the mean integral-time cost can be found by using finite methods.
In general, for control models with infinite time horizon one of the most important problems is concerned with determining the asymptotic behavior of the integral-time cost function ϕ(τ) for a given control, i.e. it is necessary to determine the function ϕ(τ) and a constant k such that

$$\lim_{\tau\to\infty} \frac{1}{\varphi(\tau)} \sum_{t=0}^{\tau-1} c_t(x(t), g_t(x(t), u(t))) = k. \tag{1.7}$$

Here we can also consider the problem of determining the control u(0), u(1), ..., u(t), ... which satisfies condition (1.7) for a given ϕ(t) and k.

The concept of multi-objective control from Section 1.1 can be extended in an analogous way to the control problem with infinite time horizon if we define the mean integral-time cost for the players as follows:

$$F^i_{x_0}(u^1(t), u^2(t), \dots, u^p(t)) = \lim_{\tau\to\infty} \frac{1}{\tau} \sum_{t=0}^{\tau-1} c^i_t(x(t), g_t(x(t), u^1(t), u^2(t), \dots, u^p(t))), \quad i = 1, \dots, p.$$

In such a way we define dynamic games with infinite time horizon for which Stackelberg strategies, Nash equilibria and Pareto optima must be found. If p = 2 and

$$c^2_t(x(t), g_t(x(t), u^1(t), u^2(t))) = -c^1_t(x(t), g_t(x(t), u^1(t), u^2(t))),$$

then we obtain a zero-sum control problem with infinite time horizon. For such a game we are seeking a saddle point. We will describe the most important results related to multi-objective discrete control problems with infinite time horizon in Chapter 2.
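For a stationary control with constant stage costs on a finite state set, the trajectory eventually enters a cycle, and the mean integral-time cost defined above equals the average cost on that cycle. A minimal sketch under these assumptions (a deterministic successor map succ and constant transition costs, both hypothetical inputs):

```python
def mean_cycle_cost(succ, cost, x0):
    # succ[x]: next state under the fixed stationary control;
    # cost[(x, succ[x])]: constant transition cost. The trajectory from x0
    # is eventually cyclic; the mean integral-time cost is the cycle average.
    seen, path = {}, []
    x = x0
    while x not in seen:
        seen[x] = len(path)
        path.append(x)
        x = succ[x]
    cycle = path[seen[x]:] + [x]          # close the cycle at its first repeat
    total = sum(cost[(a, b)] for a, b in zip(cycle, cycle[1:]))
    return total / (len(cycle) - 1)

# Example: 0 -> 1 -> 2 -> 1 -> 2 -> ...; the cycle 1 -> 2 -> 1 has mean cost 2.
print(mean_cycle_cost({0: 1, 1: 2, 2: 1}, {(0, 1): 5, (1, 2): 1, (2, 1): 3}, 0))
```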
1.3 Alternate Players' Control Condition and Nash Equilibria for Dynamic Games in Positional Form

In order to formulate the theorem on the existence of Nash equilibria for the multi-objective control problem from Section 1.1.2, we will apply the following condition: we assume that an arbitrary state x(t) ∈ X of the dynamic system L at the time-moment t represents a position (x, t) ∈ X × {0, 1, 2, ...} of one of the players i ∈ {1, 2, ..., p}. This means that in the control process the next state x(t+1) ∈ X is determined (chosen) by player i if the dynamic system L at the time-moment t is in the state x(t), which corresponds to the position (x, t) of player i.
This situation corresponds to the case when the expression g_t(x(t), u^1(t), u^2(t), ..., u^{i−1}(t), u^i(t), u^{i+1}(t), ..., u^p(t)) in (1.5) for a given position (x, t) of player i depends only on the control vector u^i(t), i.e.

$$g_t(x(t), u^1(t), u^2(t), \dots, u^{i-1}(t), u^i(t), u^{i+1}(t), \dots, u^p(t)) = g^i_t(x(t), u^i(t)).$$

So, the notations (x, t) and x(t) have the same meaning.

Definition 1.6. We say that the alternate players' control condition is satisfied for the multi-objective control problems if for any fixed (x, t) ∈ X × {0, 1, 2, ...} the equations (1.5) depend on only one of the vectors of control parameters. The multi-objective control problems with such an additional condition are called game control models in positional form.

The following lemma gives a necessary and sufficient condition for the alternate players' control.

Lemma 1.7. The alternate players' control condition for the multi-objective control problem holds if and only if at every time-step t = 0, 1, 2, ... there exists a partition of the set of states

$$X = X_1(t) \cup X_2(t) \cup \dots \cup X_p(t), \quad X_i(t) \cap X_j(t) = \emptyset \ \text{for } i \ne j, \tag{1.8}$$

such that the equations (1.5) can be represented as follows:

$$x(t+1) = g^i_t(x(t), u^i(t)) \quad \text{if } x(t) \in X_i(t);\ t = 0, 1, 2, \dots;\ i = 1, \dots, p, \tag{1.9}$$

i.e.

$$g_t(x(t), u^1(t), u^2(t), \dots, u^i(t), u^{i+1}(t), \dots, u^p(t)) = g^i_t(x(t), u^i(t)) \quad \text{if } x(t) \in X_i(t);\ t = 0, 1, 2, \dots;\ i = 1, \dots, p.$$

Here X_i(t) corresponds to the set of the positions of player i at the time-step t (note that some of the X_i(t) in (1.8) can be empty sets).

Proof. (⇒) Let us assume that the alternate players' control condition holds for a multi-objective control problem. Then for a fixed time-step t the equations (1.5) depend on only one of the vectors of control parameters u^i(t), i ∈ {1, 2, ..., p}. Therefore, if we denote by X_i(t) the set of the states of the dynamical system which correspond to the positions of player i at time-step t, equation (1.5) can be represented in the form (1.9).

(⇐) Let us assume that the partition (1.8) is given for any t = 0, 1, 2, ..., and the expression in (1.5) is represented in the form (1.9). This means that this equation at every time-step t depends on only one of the vectors of the control parameters.
On the basis of these results we can prove the important fact that the set of the positions can be characterized in the following way:

Corollary 1.8. If the alternate players' control condition for the multi-objective control problem holds, then the set of the positions Z_i ⊆ X × {0, 1, 2, ...} of player i can be represented as follows:

$$Z_i = \bigcup_{t} (X_i(t), t), \quad i = 1, \dots, p.$$

Let us assume that the alternate players' control condition for the problem from Section 1.1.2 holds. Then the set of possible system's transitions of the dynamical system L can be described by a directed graph G = (Z, E) with the set of vertices $Z = \bigcup_{i=1}^{p} Z_i$, where Z_i, i = 1, ..., p, represents the set of the positions of player i. An arbitrary vertex z ∈ Z in G corresponds to a position (x, t) of one of the players i ∈ {1, 2, ..., p}, and a directed edge e = (z′, z′′) reflects the possibility of the system's transition from state z′ = (x, t) to state z′′ = (y, t+1), determined by x(t) and the control vector u^i(t) ∈ U^i_t(x(t)) such that y = x(t+1) = g^i_t(x(t), u^i(t)) if x(t) ∈ Z_i. To the edges ((x, t), (y, t+1)) of graph G we associate the costs c^i((x, t), (y, t+1)) = c^i_t(x(t), g^i_t(x(t), u^i(t))), i = 1, ..., p.
Fig. 1.1. The time-expanded graph G: the layers (X, 0), (X, 1), ..., (X, T1), ..., (X, T2−1), (X, T2) of positions, with the starting position (x0, 0), the final position (xf, T2), and edges leading from each x(t) to a state x(t+1) in the next layer.
Graph G is represented in Fig. 1.1. This graph contains T2 + 1 copies of the set of states, X(t) = (X, t), where X(t) = X_1(t) ∪ X_2(t) ∪ ··· ∪ X_p(t), t = 0, 1, ..., T2. In G there are also the edges ((x, t), (xf, T2)) if T1 − 1 ≤ t ≤ T2 − 1 and if for a given position (x, t) = x(t) ∈ X_i(t) of player i there exists a control u^i(t) ∈ U^i_t(x(t)) such that xf = x(t+1) = g^i_t(x(t), u^i(t)). To these edges ((x, t), (xf, T2)) we associate the costs c^i((x, t), (xf, T2)) = c^i_t(x(t), g^i_t(x(t), u^i(t))), i = 1, ..., p.

It is easy to observe that G is an acyclic directed graph in which an arbitrary directed path from (x0, 0) to (xf, T2) contains T(xf) edges such that T1 ≤ T(xf) ≤ T2. So, a directed path from (x0, 0) to (xf, T2) corresponds to a feasible trajectory of the dynamical system from x0 to xf. This means that our multi-objective problem with the alternate players' control condition can be regarded as a dynamic non-cooperative game on a network. Taking into account this representation of the dynamics of system L, the following theorem is proved in [67, 70, 71].

Theorem 1.9. Let us assume that for the multi-objective control problem there exists a trajectory x0 = x(0), x(1), x(2), ..., x(T(xf)) = xf from the starting state x0 to the final state xf generated by vectors of control parameters u^1(t), u^2(t), ..., u^p(t), t = 0, 1, ..., T(xf)−1, where u^i(t) ∈ U^i_t(x(t)), i = 1, ..., p, and T1 ≤ T(xf) ≤ T2. Moreover, assume that the alternate players' control condition is satisfied. Then for this problem there exists an optimal solution u^{1*}(t), u^{2*}(t), ..., u^{p*}(t) in the sense of Nash.

In Sections 1.5 and 1.6 we prove this theorem in the more general case, when the dynamics of system L is determined by a directed graph of transitions which may contain cycles. As an important consequence of Theorem 1.9 we obtain the following corollary:
Corollary 1.10. Assume that for any u^1(t) ∈ U^1_t(x(t)), t = 0, 1, 2, ..., in the max-min control problem there exists a control u^2(t) ∈ U^2_t(x(t)), t = 0, 1, ..., T(xf)−1, such that u^1(t) and u^2(t) generate a trajectory x0 = x(0), x(1), x(2), ..., x(T(xf)) = xf from the starting state x0 to the final state xf, where T1 ≤ T(xf) ≤ T2. Moreover, assume that the alternate players' control condition is satisfied. Then the payoff function F_{x0xf}(u^1(t), u^2(t)) in the max-min control problem has a saddle point (u^{1*}(t), u^{2*}(t)), i.e.

$$F_{x_0x_f}(u^{1*}(t), u^{2*}(t)) = \max_{u^1(t)} \min_{u^2(t)} F_{x_0x_f}(u^1(t), u^2(t)) = \min_{u^2(t)} \max_{u^1(t)} F_{x_0x_f}(u^1(t), u^2(t)).$$

All results related to the existence theorems and algorithms for solving the problems on networks can be transferred to the problems from Sections 1.1 and 1.3.
1.4 Algorithms for Solving Single-Objective Control Problems on Networks

We describe two algorithms for solving single-objective control problems which will later be developed for multi-objective control models.

1.4.1 Dynamic Programming Algorithms for Solving Optimal Control Problems on Networks

We consider an optimal control problem for which the dynamics of system L is described by a directed graph G = (X, E), where the vertices x ∈ X correspond to the states of L and an arbitrary edge e = (x, y) ∈ E signifies the possibility of the system's passage from state x = x(t) to state y = x(t+1) at every moment in time t = 0, 1, 2, .... So, the set E(x) = {e = (x, y) | (x, y) ∈ E} of edges originating in x corresponds to an admissible set of control parameters, which determines the next possible state y = x(t+1) of L if the state x = x(t) at the moment in time t is given. Therefore, we require E(x) ≠ ∅ for all x ∈ X.

Additionally, we assume that to each edge e = (x, y) ∈ E a cost function c_e(t) is associated, which depends on time and expresses the cost of the system L to pass from state x = x(t) to state y = x(t+1) at the stage [t, t+1]. So, this graph of states' transitions contains edges which carry time-dependent cost functions. Additionally, in G two vertices x0 and xf are given, which correspond to the starting and the final states of system L. We call such a special graph a dynamic network [55, 56, 57, 67]. For a given dynamic network we regard the following problem:

Problem 1.11. Find a sequence of system's transitions (x(0), x(1)), (x(1), x(2)), ..., (x(T−1), x(T)) which transfers system L from the starting state x0 = x(0) to the final state xf = x(T) such that T satisfies the condition T1 ≤ T ≤ T2 and the integral-time cost

$$F_{x_0x_f}(T) = \sum_{t=0}^{T-1} c_{(x(t),x(t+1))}(t)$$
of system’s transitions by a trajectory x0 = x(0), x(1), x(2), . . . , x(T ) = xf is minimal. This problem generalizes the well-known shortest path problem in a weighted directed graph [15, 18] and arises as an auxiliary one when solving the minimum-cost flow problem on dynamic networks [26, 27, 29, 30, 31, 63, 76, 104]. We describe the dynamic programming algorithm for solving this problem as well as the problem from Section 1.1.1. First we describe the algorithm for solving the problem in case T1 = T2 = T . Denote by Fx∗0 xf (T ) =
min
x0 =x(0),x(1),...,x(T )=xf
T −1
c(x(t),x(t+1)) (t)
t=0
the minimal integral-time cost of the system’s transition from x0 to xf with T stages. If xf can not be reached by using T stages, then we put Fx∗0 xf (T ) = ∞. For Fx∗0 x(t) (t) the following recursive formula can be gained: Fx∗0 x(t) (t) =
min
− x(t−1) ∈ XG (x(t))
where
Fx∗0 x(t−1) (t − 1) + c(x(t−1),x(t)) (t − 1) ,
Fx∗0 x(0) (0) = 0
and
− (y) = {x ∈ X | e = (x, y) ∈ E}. XG
Using this recursive formula we can tabulate the values Fx∗0 x(t) (t), t = 1, 2, . . . , T for every x(t) ∈ X. So, if T1 = T2 = T , then the problem can be solved in time O(|X|2 T ) (here we do not take into account the number of operations for calculating the values of the functions ce (t) for given t). The tabulation process should be organized in such a way that for every vertex x = x(t) at a given moment in time t it is determined not only the cost Fx∗0 x(t) (t) but also the state x∗ (t − 1) at the previous moment in time for which Fx∗0 x(t) (t) = Fx∗0 x∗ (t−1) + c(x∗ (t−1),x(t)) (t − 1) =
min
− x(t−1)∈XG (x(t))
{Fx∗0 x(t−1) + c(x(t−1),x(t)) (t − 1)}.
So, if to each x at a given moment in time t = 0, 1, 2, . . . , T we associate a label (t, x(t), Fx∗0 x(t) , x∗ (t − 1)), then the corresponding table allows us to
1.4 Algorithms for Solving Single-Objective Control Problems on Networks
17
find the optimal trajectory successively starting from the final position, xf = x∗ (T ), x∗ (T −1), . . . , x∗ (1), x∗ (0) = x0 . In the example given below all possible labels for every x and every t are represented in Table 1. In the case of T (xf ) ∈ [T1 , T2 ] with T1 = T2 the problem can be reduced to T2 − T1 + 1 problems with T = T1 , T = T1 + 1, T = T1 + 2, . . . , T = T2 , respectively; by comparing the minimal integral-costs of these problems we find the best one and T (xf ). An important case of the considered problem is the case of T1 = 0, T2 = ∞. This case is reasonable only for positive and non-decreasing cost functions ce (t) on edges e ∈ E. Obviously, for this case we obtain 0 ≤ T (xf ) ≤ |X| and the problem can be solved in time O(|X|3 ) (the case with a free number of stages). Example. Let the dynamic network determined by graph G = (X, E) represented in Fig. 1.2 be given.
1
5
0
2
3
4 Fig. 1.2.
The cost functions are the following: c(0,1) (t) = c(0,3) (t) = c(2,5) (t) = 1; c(2,3) (t) = c(3,1) (t) = 2t; c(3,4) (t) = 2t + 2; c(1,2) (t) = c(2,4) (t) = c(1,5) (t) = t; c(4,5) (t) = 2t + 1. We consider the problem of finding a trajectory in G from x(0) = x0 = 0 to xf = 5, where T1 = T2 = T = 5. Using the recursive formula described above we find Table 1 with values Fx∗0 x(t) (t) and x∗ (t − 1). Starting from final state xf = 5 we find the optimal trajectory 5∗ ← 1∗ ← 3∗ ← 2∗ ← 1∗ ← 0∗ with integral-time cost Fx0 x(5) (5) = 16.
18
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Table 1 t x 0 Fx∗0 x(0) x∗ (0 − 1) 1 Fx∗0 x(1) x∗ (0) 2 Fx∗0 x(2) x∗ (1) 3 Fx∗0 x(3) x∗ (2) 4 Fx∗0 x(4) x∗ (3) 5 Fx∗0 x(5) x∗ (4)
0 0 ∞ ∞ ∞ ∞ ∞ -
1 ∞ 1 0∗ 3 3 ∞ 12 3∗ ∞ -
2 ∞ ∞ 2 1∗ 5 1 ∞ 16 1
3 ∞ 1 0 ∞ 6 2∗ 11 2 ∞ -
4 ∞ ∞ 5 3 4 2 8 2 21 3
5 ∞ ∞ 2 1 3 2 6 2 16 1∗
1.4.2 An Extension of Dijkstra’s Algorithm for Optimal Control Problems with a Free Number of Stages Let us assume that in the dynamic network all cost functions ce (t), e ∈ E, are positive and T1 = 0, T2 = ∞, i.e. we have the problem with a free number of stages. For this case of the problem we describe an algorithm, which extends Dijkstra’s algorithm for finding the tree of optimal paths in a weighted directed graph [15, 18]. For such an algorithm we will find the optimal paths in a dynamic network for our problem if the following additional condition is satisfied. Let us assume that the cost functions ce (t), e ∈ E, in the dynamic network have the following property: If P ∗ (x0 , x) is an arbitrary optimal path from x0 to x which can be represented as P ∗ (x0 , x) = P1∗ (x0 , y) ∪ P2∗ (y, x), where P1∗ (x0 , y) and P2∗ (y, x) have no common edges, then the leading part P1∗ (x0 , y) of the path P ∗ (x0 , x) is also an optimal path of the problem in G with given starting state x0 and final state y. If such a property holds, then we say that for the dynamic network the optimization principle is satisfied. As example, the graph in Fig. 1.3 with cost functions on the edges c(0,1) (t) ≡ c(0,2) (t) ≡ c(1,2) (t) = 1, c(2,3) (t) = 3t determines a network for which the optimization principle is satisfied. In case c(0,1) (t) ≡ c(1,2) (t) = 1; c(0,2) (t) = 3; c(2,3) (t) = 3t the network does not satisfy the optimization principle because the leading part P1∗ (0, 2) = {(0, 2)} of the optimal path P ∗ (0, 3) = {(0, 2), (2, 3)} is not optimal. In the case that on the network the cost functions ce (t), e ∈ E are positive and the optimization principle is satisfied, the following algorithm determines all optimal paths P ∗ (x0 , x) from x0 to each x ∈ X, which correspond to the optimal strategies in the problem for p = 1.
1.4 Algorithms for Solving Single-Objective Control Problems on Networks
0
2
19
3
1 Fig. 1.3.
Algorithm 1.12. Determining the Tree of Optimal Paths Preliminary step (Step 0): Set Y = {x0 }, E ∗ = ∅. Assign to every vertex x ∈ X two labels t(x) and F (x) as follows: t(x0 ) = 0; t(x) = ∞, ∀ x ∈ X \ {x0 }; F (x0 ) = 0; F (x) = ∞, ∀ x ∈ X \ {x0 }. General step (Step k, k ≥ 1): Find the set E = {(x , y ) ∈ E(Y ) | F (x )+c(x ,y ) (t(x )) = min min {F (x)+c(x,y) (t(x))}, x∈Y y∈X(x)
where E(Y ) = {(x, y) ∈ E | x ∈ Y, y ∈ X\Y },
X(x) = {y ∈ X\Y | (x, y) ∈ E(Y )}.
Find the set of vertices X = {y ∈ X \ Y | (x , y ) ∈ E }. For every y ∈ X select one edge (x , y ) ∈ E and build the union E of such edges. After that change the labels t(y ) and F (y ) for every vertex y ∈ X as follows: t(y ) = t(x ) + 1,
F (y ) = F (x ) + c(x ,y ) (t(x )),
∀(x , y ) ∈ E .
Replace set Y by Y ∪ X and E ∗ by E ∗ ∪ E . Note X k = Y, E k = E ∗ . If X k = X then fix the tree GT k = (X k , E k ) and go to next step k + 1, otherwise fix the tree GT = (X, E ∗ ) and STOP. Note, that the tree GT = (X, E ∗ ) contains optimal paths from x0 to each x ∈ X. After k steps of the algorithm the tree GT k = (X k , E k ) represents a part of GT . If it is necessary to find the optimal path from x0 to xf , then the algorithm can be interrupted after k steps as soon as the condition xf ∈ X k is satisfied, i.e. in this case the condition X k = X in the algorithm must be replaced by xf ∈ X k . The labels F (x), x ∈ X, indicate the costs of optimal paths from x0 to x ∈ X and t(x) represents the number of edges in these paths. The correctness of the algorithm is based on the following theorem:
20
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Theorem 1.13. Let (G, c(t), x0 , xf ) be a dynamic network, where the vectorfunction c(t) = (ce1 (t), ce2 (t), . . . , ce|E| (t)) has positive and bounded components for t ∈ [0, |X| − 1]. Moreover, let us assume that the optimization principle on the dynamic network is satisfied. Then the tree GT k = (X k , E k ) obtained after k steps of the algorithm gives the optimal paths from x0 to every x ∈ X k which correspond to optimal strategies in the problem for p = 1. Proof. We prove the theorem by using the induction principle on the number of steps k of the algorithm. In the case that k = 0 the assertion is evident. Let us assume that the theorem holds for any k ≤ r and let us show that it is true for k = r + 1. If GT r = (X r , E r ) is the tree obtained after r steps and GT r+1 = (X r+1 , E r+1 ) is the tree obtained after r + 1 steps of the algorithm, then X ◦ = X r+1 \ X r and E ◦ = E r+1 \ E r represent the vertex set and the edge set obtained by the algorithm at step r + 1. Let us show that if y is an arbitrary vertex of X ◦ , then in GT r+1 the unique directed path P ∗ (x0 , y ) from x0 to y is optimal. Indeed, if this is not the case, then there exists an optimal path Q(x0 , y ) from x0 to y , which does not contain the edge e = (z , y ) ∈ E ◦ . The path Q(x0 , y ) can be represented as Q(x0 , y ) = Q1 (x0 , x ) ∪ {(x , y)} ∪ Q2 (y, y ), where x is the last vertex of the path Q(x0 , y ) belonging to X r when we pass from x0 to y . It is easy to observe that if the conditions of the theorem hold then cost (Q(x0 , y )) ≥ cost (P ∗ (x0 , y )), where
cost (Q(x0 , y )) =
mQ
cet (t),
t=0
e0 , e1 , . . . , emQ are the corresponding edges of the directed path Q(x0 , y ) when we pass from x0 to y and cost (P ∗ (x0 , y )) =
mp t=0
cet (t),
where e0 , e1 , . . . , emp are the corresponding edges of the directed path P ∗ (x0 , y ) when we pass from x0 to y . According to the algorithm, we can state that F (x ) + c(x ,y ) (t(x )) > F (z ) + c(z ,y ) (t(z )) = F (y ), where e = (z , y ) is the last edge of the path P ∗ (x0 , y ). Then cost (Q1 (x0 , x ) ∪ {(x , y)}) > cost (P ∗ (x0 , y )), because
F (x ) + c(x ,y) (t(x )) = cost (Q1 (x0 , x ) ∪ {(x , y)})
and F (y ) = cost (P ∗ (x0 , y )).
1.4 Algorithms for Solving Single-Objective Control Problems on Networks
21
The cost functions ce (t), ∀ e ∈ E, are positive, therefore, cost (Q(x0 , y )) = cost (Q1 (x0 , x ) ∪ {(x , y)} ∪ Q2 (y, y )) > > cost (Q1 (x0 , x ) ∪ {(x , y)}) > cost (P ∗ (x0 , y )), i.e. Q(x0 , y ) is not an optimal path from x0 to y . This means that the tree GT r+1 = (X r+1 , E r+1 ) contains an optimal path from x0 to every y ∈ X r+1 . All results described in this section are given in [58, 59]. Example. We consider the problem of determining the tree of optimal paths from x0 to vertices x ∈ X \ {x0 } for the network from the example from Section 1.4.1 (Fig. 1.2). The optimization principle is satisfied for this network, therefore if we use Algorithm 1.12 we obtain Step 0. t(01 ) = 0; t(1) = t(2) = t(3) = t(4) = t(5) = ∞; F (01 ) = 0; F (1) = F (2) = F (3) = F (4) = F (5) = ∞; GT 0 = {{0}, ∅}; E 0 = ∅ Step 1. t(01 ) = 0; X = {1, 3}, E = {(0, 1), (0, 3)} t(11 ) = t(01 ) + 1 = 1; F (11 ) = F (01 ) + 1 = 1 t(31 ) = t(01 ) + 1 = 1; F (31 ) = F (01 ) + 1 = 1 t(2) = t(3) = t(4) = t(5) = ∞ F (2) = F (3) = F (4) = F (5) = ∞ E 1 = {(0, 1), (0, 3)} GT 1 = {{0, 1, 3}, {(0, 1), (0, 3)}} Step 2. t(01 ) = 0; t(11 ) = 1; t(31 ) = 1 X = {4, 5}, E = {(1, 2), (1, 5)} 1 1 t(2 ) = t(1 ) + 1 = 2; t(51 ) = t(11 ) + 1 = 2 t(4) = t(3) + 1 = 2 F (01 ) = 0; F (11 ) = 1; F (21 ) = 2; F (31 ) = 2; F (51 ) = 1; F (4) = 5 E 2 = {(0, 1), (0, 3), (1, 2), (1, 5)} GT 2 = {{0, 1, 2, 3, 5}, {(0, 1), (0, 3), (1, 2), (1, 5)}} Step 3. t(01 ) = 0; t(11 ) = 1; t(31 ) = 1; t(21 ) = 2; t(51 ) = 2 X = {4}, E = {(2, 4)} t(41 ) = t(21 ) + 1 = 3 F (01 ) = 0; F (11 ) = 1; F (21 ) = 2; F (31 ) = 2; F (51 ) = 1
22
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
F (41 ) = F (21 ) + 2 = 4 E 3 = {(0, 1), (0, 3), (1, 2), (1, 5), (2, 4)} GT 3 = {{0, 1, 2, 3, 4, 5}, {(0, 1), (0, 3), (1, 2), (1, 5), (2, 4)}} The tree of optimal paths is represented in Fig. 1.4.
1
5
0
2
3
4 Fig. 1.4.
1.5 Multi-Objective Control and Non-Cooperative Games on Dynamic Networks In this section we study multi-objective control problems in positional form. Moreover, we consider our problems in case that the dynamics of the system is described by a directed graph of states’ transitions which may contain cycles. We use the concept of non-cooperative games for these problems and formulate the following two multi-objective control models concerning stationary and non-stationary strategies [5, 8, 67, 71]. 1.5.1 The Problem of Determining the Optimal Stationary Strategies in a Dynamic c-Game Let G = (X, E), be the graph introduced in Section 1.3 with given starting and final states x0 , xf ∈ X. Assume thatthe vertex set X is divided into p p disjoint subsets X1 , X2 , . . . , Xp (X = i=1 Xi , Xi ∩ Xj = ∅, i = j) and regard vertices x ∈ Xi as states of player i, i = 1, p. Moreover, we assume that to each edge e = (x, y) of the graph p functions c1e (t), c2e (t), . . . , cpe (t) are assigned, where cie (t) expresses the cost of the system’s passage from state x = x(t) to state y = x(t + 1) at the stage [t, t + 1] for player i. We define the stationary strategies of the players 1, 2, . . . , p as maps:
1.5 Multi-Objective Control and Non-cooperative Games
23
s1 : x → y ∈ X(x) for x ∈ X1 \ {xf }, s2 : x → y ∈ X(x) for x ∈ X2 \ {xf }, .. . sp : x → y ∈ X(x) for x ∈ Xp \ {xf }, where X(x) = {y ∈ X | e = (x, y) ∈ E}. Taking into account that G = (X, E) is a finite graph we obtain that the set of strategies of player i Si = {si : x → y ∈ X(x) for x ∈ Xi \ {xf }},
i = 1, p
is a finite set. Let s1 , s2 , . . . , sp be an arbitrary set of strategies of the players. We denote by Gs = (X, Es ) the subgraph generated by edges e = (x, si (x)) for x ∈ Xi \ {xf } and i = 1, p. Obviously, for fixed s1 , s2 , . . . , sp either a unique directed path Ps (x0 , xf ) from x0 to xf exists in Gs or such a path does not exist in Gs . The set of edges of path Ps (x0 , xf ) is denoted by E(Ps (x0 , xf )). For fixed strategies s1 , s2 , . . . , sp and fixed states x0 and xf we define the quantities Hx10 xf (s1 , s2 , . . . , sp ), Hx20 xf (s1 , s2 , . . . , sp ), . . . , Hxp0 xf (s1 , s2 , . . . , sp ) in the following way: Let us assume that the path Ps (x0 , xf ) exists in Gs . Then it is unique and we can assign to its edges numbers 0, 1, 2, 3, . . . , ks , starting with the edge that begins in x0 . These numbers characterize the time steps te (s1 , s2 , . . . , sp ) when the system passes from one state to another, if the strategies s1 , s2 , . . . , sp are applied. We put Hxi 0 xf (s1 , s2 , . . . , sp ) =
cie (te (s1 , s2 , . . . , sp )),
e∈E(Ps (x0 ,xf ))
if T1 ≤ |E(Ps (x0 , xf ))| ≤ T2 ;
(1.10)
otherwise we put Hxi 0 xf (s1 , s2 , . . . , sp ) = ∞. We regard the problem of finding maps s∗1 , s∗2 , . . . , s∗p for which the following conditions are satisfied: Hxi 0 xf (s∗1 , s∗2 , . . . , s∗i−1 , s∗i , s∗i+1 , . . . , s∗p ) ≤ ≤ Hxi 0 xf (s∗1 , s∗2 , . . . , s∗i−1 , si , s∗i+1 , . . . , s∗p ),
∀ si ∈ Si , i = 1, p.
So, we consider the problem of finding the optimal solutions in the sense of Nash.
24
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
This problem can be regarded as dynamic game on network (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) determined by the graph G, the partition X1 , X2 , . . . , Xp , the vector-functions ci (t) = (cie1 (t), cie2 (t), . . . , cie|E| (t)), i = 1, p, the starting and final states x0 , xf and the time-span [T1 , T2 ]. Note that in the considered problem T1 and T2 satisfy the conditions: 0 ≤ T1 ≤ |X| − 1, T1 ≤ T2 ; for T1 ≥ |X| the problem has no sense. If T1 = 0, T2 = ∞ then we shall use the notation (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf ). The last version of the problem has been studied in [5]. In [71] this problem is named a dynamic c-game. Note that this problem is NP-hard even in case p = 1. Indeed, if T1 = T2 = |X| − 1 and c1e (t) = 1, ∀e ∈ E, then we obtain the problem of finding the hamiltonian path from x0 to xf in the graph G = (X, E). If T1 = 0 and T2 ≥ |X| − 1 then a polynomial-time algorithm for determining optimal stationary strategies of the players in a dynamic c-game with constant costs cie (t) on edges e ∈ E can be derived. In [66] it is considered the variant of a dynamic c-game with a backward time-step account, which is formulated as follows: Let Ps (x0 , xf ) be a directed path from x0 to xf in Gs generated by strategies s1 , s2 , . . . , sp of the players 1, 2, . . . , p. In 2.3.1 for an edge e ∈ E(Ps (x0 , xf )) the time-step te (s1 , s2 , . . . , sp ) is defined as an order of the edge in the path Ps (x0 , xf ) starting with 0 from x0 . To each edge e ∈ E(Ps (x0 , xf )) we may associate also the backward time-step account te (s1 , s2 , . . . , sp ) if we start numbering the edges with 0 from end position xf in inverse order, i.e. te (s1 , s2 , . . . , sp ) = ks − te (s1 , s2 , . . . , sp ). For fixed strategies s1 , s2 , . . . , sp we define the quantities ← − ← − ← − H 1x0 xf (s1 , s2 , . . . , sp ), H 2x0 xf (s1 , s2 , . . . , sp ), . . . , H px0 xf (s1 , s2 , . . . , sp ) in the following way: We put ← − H ix0 xf (s∗1 , s∗2 , . . . , sp ) =
te (s1 , s2 , . . . , sp ),
i = 1, p;
e∈E(Ps (x0 ,xf ))
if in Hs there exists a path Ps (x0 , xf ) from x0 to xf ; otherwise we put ← − H ix0 xf (s1 , s2 , . . . , sp ) = ∞,
i = 1, p.
So, we obtain a new game on the network. In the case that the costs cie (t) are constant this problem coincides with the problem from [66]. This game can be regarded as a dual problem for the dynamic c-game mentioned above.
1.5 Multi-Objective Control and Non-cooperative Games
25
1.5.2 The Problem of Determining the Optimal Non-Stationary Strategies in a Dynamic c-Game In the dynamical model from Section 1.5.1 we assumed that the players 1, 2, . . . , p use stationary strategies s1 , s2 , . . . , sp . This means that each player i ∈ {1, 2, . . . , p} preserves his strategy si of the system’s passage from state x = x(t) ∈ Xi to state y = si (x) at every discrete moment in time t = 0, 1, 2, . . . . In the following we formulate the dynamic c-game if the players 1, 2, . . . , p use non-stationary strategies u1 , u2 , . . . , up . In this problem an arbitrary player applying his non-stationary strategy ui for a given state x ∈ Xi at different moments in time t and t may transfer system L from state x ∈ Xi to different states y = ui (x(t )), y = ui (x(t )), where y 1 , y 2 ∈ XG (x). We define the non-stationary strategies of the players as maps: u1 : (x, t) → (y, t + 1) ∈ X(x) × {t + 1} for x ∈ X1 \ {xf }, t = 0, 1, 2, . . . ; u2 : (x, t) → (y, t + 1) ∈ X(x) × {t + 1} for x ∈ X2 \ {xf }, t = 0, 1, 2, . . . ; .. . up : (x, t) → (y, t + 1) ∈ X(x) × {t + 1} for x ∈ Xp \ {xf }, t = 0, 1, 2, . . . . Here (x, t) has the same meaning as the notation x(t), i.e. (x, t) = x(t). For any set of non-stationary strategies u1 , u2 , . . . , up we define the quantities Fx10 xf (u1 , u2 , . . . , up ), Fx20 xf (u1 , u2 , . . . , up ), . . . , Fxp0 xf (u1 , u2 , . . . , up ) in the following way. Let u1 , u2 , . . . , up be an arbitrary set of strategies. Then either u1 ,u2 ,. . . ,up generate in G a finite trajectory x0 = x(0), x(1), x(2), . . . , x(T (xf )) = xf from x0 to xf and T (xf ) represents the time moment when xf is reached, or u1 , u2 , . . . , up generate in G an infinite trajectory x0 = x(0), x(1), x(2), . . . , x(t), x(t + 1), . . . which does not pass through xf , i.e. T (xf ) = ∞. In such trajectories the next state x(t + 1) is determined uniquely by x(t) and a map uk , k ∈ {1, 2, . . . , p} as follows: x(t + 1) = uk (x(t), t), x(t) ∈ Xk . If the state xf is reached at finite moment in time T (xf ) and T1 ≤ T (xf ) ≤ T2 ,
26
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
then we set T (xf )−1
Fxi 0 xf (u1 , u2 , . . . , up )
=
t=0
ci(x(t),x(t+1)) (t),
i = 1, p;
otherwise we put Fxi 0 xf (u1 , u2 , . . . , up ) = ∞,
i = 1, p.
Thus we regard the problem of finding non-stationary strategies u∗1 , u∗2 , . . . , u∗p for which the following condition is satisfied: Fxi 0 xf (u∗1 , u∗2 , . . . , u∗i−1 , u∗i , u∗i+1 , . . . , u∗p ) ≤ ≤ Fx∗0 xf (u∗1 , u∗2 , . . . , u∗i−1 , ui , u∗i+1 , . . . , u∗p ),
∀ ui , i = 1, p.
So, we consider the problem of finding the optimal solution in the sense of Nash [2, 64, 65, 66, 67, 69, 70, 71, 72, 73]. In the following we show that for fixed T1 and T2 a polynomial-time algorithm for solving this problem can be elaborated. In general if T1 and T2 are not fixed, i.e. T1 and T2 act as input data parameters in the problem, then such a polynomial-time algorithm for solving this problem may not exist.
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges and Determining Optimal Stationary Strategies of the Players In this section we study a dynamic c-game with constant costs cie (t) = cie , i = 1, p, on edges e ∈ E for the network (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf , T1 , T2 ). First we stress our attention to the case of the problem without restriction on the number of stages for the dynamical system, i.e. T1 = 0, T2 = ∞. Namely this case is important for the elaboration of polynomial-time algorithms for determining Nash equilibria in the multi-objective control problem in positional form. On the basis of the results for this particular case we will extend algorithms for the general case of the problem. So, let us consider the dynamic c-game with constant costs of edges cie (t) = i ce , i = 1, p, e ∈ E, and without a restriction on the number of stages by a trajectory from x0 to xf . In this case the dynamic c-game is determined by the network (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf ), where G = (X, E) is a directed graph with sink vertex xf ∈ X. Note, that if G contains a vertex x ∈ X, for which there is no directed path from x to xf , then it can be deleted without changing the sense of the problem. The Nash equilibria condition and the algorithm for determining optimal stationary strategies of the players have been obtained in [5].
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
27
First of all we note that the definition of payoff functions Hxi 0 xf (s1 , s2 , . . . , sp ), i = 1, p, differs here a little from the definition in [5, 8]. In [5, 8] Hxi 0 xf (s1 , s2 , . . . , sp ) for every s1 , s2 , . . . , sp is defined in the following way: If s1 , s2 , . . . , sp generate in G a subgraph Gs ,which contains a unique directed path Ps (x0 , xf ) from x0 to xf , then
Hxi 0 xf (s1 , s2 , . . . , sp ) =
cie .
(1.11)
e∈E(Ps (x0 ,xf ))
If in Gs there is no directed path from x0 to xf then a unique directed cycle Cs with a set of edges E(Cs ) can be obtained when we pass through directed edges from x0 . Therefore, there exists a unique directed cycle Cs , which we can get from x0 and a unique directed path Ps (x0 , x ), which connects x0 and Cs (the vertex x is a unique common vertex of Ps (x0 , x ) and Cs ). In this case Hxi 0 xf (s1 , s2 , . . . , sp ) is defined as follows:
Hxi 0 xf (s1 , s2 , . . . , sp )
⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ =
+∞,
⎪ e∈E(Ps (x0 ,x )) ⎪ ⎪ ⎪ ⎪ ⎪ −∞, ⎪ ⎩
if
cie > 0;
e∈E(Cs )
cie ,
if
cie = 0;
e∈E(Cs )
if
(1.12)
cie < 0.
e∈E(Cs )
For positive costs cie on the edges e ∈ E of the network the problems from [5, 8] and Section 1.3 coincide. Therefore, the results we formulate below are related to all problems with positive and constant costs on edges. Furthermore, we need the following definitions: Definition 1.14. Let s0k and s1k be two different strategies of player k ∈ {1, 2, . . . , p} in a dynamic c-game. We say that the strategy s0k dominates the strategy s1k if for every x ∈ X the following condition holds: i Hxx (s1 , s2 , . . . , sk−1 , s0k , sk+1 , . . . , sp ) ≤ f i (s1 , s2 , . . . , sk−1 , s1k , sk+1 , . . . , sp ) ≤ Hxx f
(1.13)
∀(s1 , s2 , . . . , sk−1 , sk+1 , . . . , sp ) ∈ S1 × S2 × · · · × Sk−1 × Sk+1 × · · · × Sp ;
i = 1, p
and there exist strategies s1 , s2 , . . . , sk−1 , sk+1 , . . . , sp such that i0 (s1 , s2 , . . . , sk−1 , s0k , sk+1 , . . . , sp ) < Hxx f i0 (s1 , s2 , . . . , sk−1 , s1k , sk+1 , . . . , sp ) < Hxx f
(1.14)
for one of the players i0 ∈ {1, 2, . . . , p} and at least for a vertex x ∈ X.
28
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Definition 1.15. The strategy s1k is called a not essential strategy for player k ∈ {1, 2, . . . , p} in a dynamic c-game if there exists a strategy s0k ∈ Sk , which dominates s1k ; otherwise the strategy s1k is called an essential one. The following theorem represents one of the most important results we shall use for determining Nash equilibria in the considered multi-objective control problems on networks. Theorem 1.16. Let (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf ) be a dynamic network for which vertex xf in G is attainable from every x ∈ X. Assume that the vectors ci = (cie1 , cie2 , . . . , cie|E| ), i ∈ {1, 2, . . . , p} have positive and constant components. Then in a dynamic c-game on network (G, X1 , X2 , . . . , Xp , c1 , c2 , , . . . , cp , x0 , xf ) for the players 1, 2, . . . , p there exists an optimal solution s∗1 , s∗2 , . . . , s∗p in the sense of Nash, which satisfies the following properties: - the graph Gs∗ = (X, Es∗ ) generated by s∗1 , s∗2 , . . . , s∗p has the structure of a directed tree with sink vertex xf ; - s∗1 , s∗2 , . . . , s∗p represents the solution of a dynamic c-game on network (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x, xf ) with an arbitrary starting position x ∈ X and a given final position xf . This theorem has been formulated in [5]. Moreover, in [5] the sketch of its proof is given. Here we give the proof of this theorem in a more detailed form. In order to prove this theorem we need the following auxiliary result: Lemma 1.17. Let sk be a strategy of player k, k ∈ {1, 2, . . . , p}, in a dynamic c-game with a network satisfying the conditions of Theorem 1.16. Additionally, let Gsk = (X, E sk ) be a graph obtained from G by deleting all edges e = (x, y) ∈ E, originating in x ∈ Xk , except edges (x, sk (x)). If in Gsk vertex xf is not attainable from at least one of the vertices x ∈ X, then the strategy sk is not essential. Proof. Assume that for a given strategy sk of player k in the corresponding graph Gsk vertex xf is not attainable from vertices x ∈ X and it is attainable from the rest of the vertices x ∈ X \ X , where X = ∅. Fix a strategy s0k ∈ Sk , 0 0 for which the graph Gsk = (X, E sk ) has the property that xf is attainable from every x ∈ X and let us show that s0k dominates sk , i.e. the strategy sk is not essential. It is easy to observe that in Gsk there are no edges e = (x, y) directed i0 (s1 , s2 , . . . , sk−1 , sk , from x ∈ X to y ∈ X \ X . This means that Hxx f sk+1 , . . . , sp ) = ∞ for (s1 , s2 , . . . , sk−1 , sk+1 , . . . , sp ) ∈ S1 × S2 × · · · × Sk−1 × Sk+1 × · · · × Sp , which involves the validity of condition (1.13) for x ∈ X . 0 Moreover, if we take into account that in the graph Gsk vertex xf is attainable from every x ∈ X , then we obtain that at least for a set of stratei0 gies s1 , s2 , . . . , sk−1 , s0k , sk+1 , . . . , sp the values Hxx (s1 , s2 , . . . , sk−1 , s0k , sk+1 , f . . . , sp ) are finite and therefore condition (1.14) holds.
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
29
In the following let us show that condition (1.13) also holds for x ∈ X \X . Indeed, let s1 , s2 , . . . , sk−1 , sk+1 , . . . , sp be an arbitrary set of strategies of the players 1, 2, . . . , k − 1, k + 1, . . . , p. If s1 , s2 , . . . , sk−1 , sk , sk+1 , . . . , sp generate in Gsk a directed path Ps (x, xf ) from x ∈ X \ X to xf then this path 0 does not pass through vertices x ∈ X . This means that in Gsk the set of strategies s1 , s2 , . . . , sk−1 , sk+1 , . . . , sp generates the same path Ps (x, xf ) from x ∈ X \ X to xf . So, condition (1.13) for x ∈ X \ X is also satisfied. This proves that the strategy s0k dominates sk , i.e. sk is not essential. Corollary 1.18. Let G = (X, E) be a directed graph in which vertex xf is attainable from every x ∈ X. Then for an arbitrary essential strategy sk of player k ∈ {1, 2, . . . , p} the corresponding graph Gsk = (X, E sk ) has the property that xf is attainable from every x ∈ X. Corollary 1.19. Let a dynamic c-game with a network satisfying the conditions of Theorem 1.16 be given. Assume that in this dynamic c-game Nash equilibria exist. Then for the considered game there exists such a Nash equilibrium s∗1 , s∗2 , . . . , s∗k−1 , s∗k , s∗k+1 , . . . , s∗p that the corresponding graph Gs∗ = (X, Es∗ ) has the structure of a directed tree with sink vertex xf . Proof of Theorem 1.16. We prove this theorem by using the induction principle on the number of players in the dynamic c-game. It is easy to observe that for p = 1 our problem becomes a well-known optimal paths problem in a weighted directed graph G with sink vertex xf . For this problem there exists a tree of optimal paths Gs∗ = (X, Es∗ ) with sink vertex xf , which determines a strategy s∗ : x → y ∈ X, where s∗ (x) = y, (x, y) ∈ Es∗ . So, for p = 1 the theorem holds. Let us assume that the assertion holds for any p ≤ k, k ≥ 1 and let us show that it is true for p = k + 1. We regard the dynamic c-game on a network with p = k + 1 players. Without loss of generality we may assume that x0 ∈ X1 . We consider the two following cases: Case 1). The set X1 contains only one position, X1 = {x0 } (|X1 | = 1), and for the starting position x0 there are no entering edges (x, x0 ) ∈ E. Case 2). The set X1 may contain more than one position (|X1 | ≥ 1) and for the starting position x0 there may exist entering edges (x, x0 ) ∈ E. At first let us prove the theorem in case 1). We denote possible admissible strategies of the first player by s11 , s21 , . . . , sq1 . Each strategy sk1 : x0 → y ∈ X(x0 ), k = 1, q, corresponds to an edge esk1 = (x0 , sk1 (x0 )) ∈ E(x0 ). We call a strategy s1 of player 1 an admissible strategy, if for the rest of players 2, 3, . . . , p there exist strategies s2 , s3 , . . . , sp such that the corresponding graph Gs = (X, Es ), generated by strategies s1 , s2 , . . . , sp , contains a directed path P (x0 , xf ) from x0 to xf . It is easy to observe that for each admissible strategy sk in the graph Gsk = (X, E sk ) the vertex xf is attainable from every x ∈ X. So, an arbitrary admissible strategy s1 is an essential one.
30
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Let us state that the first player fixes his first possible strategy s1 = s11 and we consider the problem of finding optimal solutions in the sense of Nash with respect to the rest of the players 2, 3, . . . , p. Then in the positional form the obtained game can be regarded as a dynamic c-game with p − 1 players, because the position x0 of the first player can be considered as a position of any other player (we consider it as a position of the second player). So, for s1 = s11 we obtain a new dynamic c-game with 1 p − 1 players on network (Gs1 , X21 , X3 , . . . , Xp , c21 , c31 , . . . , cp1 , x0 , xf ), where 1 1 X21 = X1 ∪ X2 and Gs1 = (X, E s1 ) is the graph, obtained from G by deleting 1 edges e = (x0 , y) ∈ E, for which y = s11 (x0 ); ci1 : E s1 → R1 are the functions obtained, respectively, from the function ci as a result of the contraction 1 1 of set E to set E s1 , i.e. ci1e = cie , ∀e ∈ E s1 , i = 2, p. If we consider the game in the normal form, then it is a game with p − 1 players, determined by p − 1 payoff functions Hx20 xf (s11 , s2 , s3 , . . . , sp ), Hx30 xf (s11 , s2 , s3 , . . . , sp ), . . . , Hxp0 xf (s11 , s2 , s3 , . . . , sp ), where s2 ∈ S2 , s3 ∈ S3 , . . . , sp ∈ Sp . According to the induction principle for this game with p − 1 = k players, there ex1∗ 1∗ ists an optimal solution by Nash strategies s1∗ 2 , s3 , . . . , sp , and the graph 1 1 1 1∗ Gs∗ = (X, Es∗ ), which corresponds to this strategies s1 , s2 , . . . , s1∗ p , has the structure of a directed tree with sink xf . In an analogous way we consider the case that the first player fixes his second admissible strategy s21 . Then, according to the induction principle, 2∗ 2∗ we find the optimal solution by Nash strategies s2∗ 2 , s3 , . . . , sp of the players 2, 3, . . . , p in the dynamic c-game, which in the normal form is determined by the payoff functions Hxi 0 xf (s21 , s2 , s3 , . . . , sp ), i = 2, p. The strate2∗ 2∗ 2 2 gies s21 , s2∗ 2 , s3 , . . . , sp generate the graph Gs∗ = (X, Es∗ ), which has the structure of a tree with sink xf . Furthermore, we consider the case that the first player fixes his third admissible strategy s31 and we find the optimal solution by Nash strategies 3∗ s31 , s3∗ 2 , . . . , sp . Continuing this process we find the following set of strategies of the players 1, 2, . . . , p: 1∗ 1∗ s11 , s1∗ 2 , s3 , . . . , sp ; 2∗ 2∗ s21 , s2∗ 2 , s3 , . . . , sp ; .. . q∗ q∗ sq1 , sq∗ 2 , s3 , . . . , sp
and the corresponding directed trees G1s∗ , G2s∗ , . . . , Gqs∗ with sink vertex. Among these sets of players’ strategies in the dynamic c-game we choose j∗ j∗ j∗ the set sj∗ 1 , s2 , s3 , . . . , sp , for which j∗ j∗ i i i∗ i∗ Hx10 xf (sj∗ 1 , s2 , . . . , sp ) = min Hx0 xf (s1 , s2 , . . . , sp ). 1≤i≤q
(1.15)
j∗ j∗ Let us show that sj∗ 1 , s2 , . . . , sp are optimal by Nash strategies for the players 1, 2, . . . , p in the dynamic c-game.
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
31
Indeed, j∗ j∗ j∗ j∗ j∗ Hxi 0 xf (sj∗ 1 , s2 , . . . , si−1 , si , si+1 , . . . , sp ) ≤ j∗ j∗ j∗ j∗ ≤ Hxi 0 xf (sj∗ 1 , s2 , . . . , si−1 , si , si+1 , . . . , sp ),
∀si ∈ Si , i = 1, p,
j∗ j∗ since sj∗ 2 , s3 , . . . , sp are optimal solutions by Nash strategies in the dynamic j∗ j∗ c-game for s1 = sj∗ 1 . Taking into account that the graph Gs∗ = (X, Es∗ ), j∗ j∗ j∗ generated by the strategies s1 , s2 , s3 , . . . , sj∗ p , has the structure of a directed tree with sink vertex and j∗ is chosen according to (1.15), we have j∗ j∗ j∗ 1 j∗ Hx10 xf (sj∗ 1 , s2 , . . . , sp ) ≤ Hx0 xf (s1 , s2 , . . . , sp ),
∀s1 ∈ S1 .
So, in case 1) the theorem holds. Note that the given proof of case 1) also takes place if vertex x0 contains entering edges. In the following for the proof of the general statement of the theorem we shall use the case that x0 does not contain entering edges. Now let us prove the theorem in case 2). We assume that the set X1 may contain more than one position (|X1 | ≥ 1) and for starting position x0 there may exist entering edges (x, x0 ). Let us show that this case can be reduced to case 1): On the basis of Lemma 1.17, if in a dynamic c-game a Nash equilibrium exists, then an optimal strategy of the first player s∗1 will correspond to the ∗ ∗ case that the graph Gs1 = (X, E s1 ) has the property that xf is attainable from every x ∈ X. Therefore, we select all possible strategies s11 , s21 , . . . , sq1 , 1 1 2 2 forq which theq corresponding graphs Gs1 = (X, E s1 ), Gs1 = (X, E s1 ), . . . , Gs1 = (X, E s1 ) have the property that xf is attainable from every x ∈ X. After that we construct an auxiliary graph G = (X, E), which is obtained q 1 2 from the graphs Gs1 , Gs1 , . . . , Gs1 by using a special construction. 1
q
2
In order to describe how to obtain G from Gs1 , Gs1 , . . . , Gs1 we will disj i tinguish the vertex sets from different graphs Gs1 , Gs1 by using the notations i X i and X j , which means that X i is a vertex set of Gs1 and X j is a vertex j set of Gs1 ; for vertices of the corresponding graphs we also use the notation xi ∈ X i and xj ∈ X j . 1
2
q
The graph G is obtained from Gs1 , Gs1 , . . . , Gs1 in the following way: The sink vertices x1f , x2f , . . . , xqf of the corresponding graphs we identify in G by a common sink vertex xf (see Fig. 1.5). esi 1
After that we add a new vertex x0 , which is connected by directed edges = (x0 , xi0 ), i = 1, q, with corresponding vertices xi0 ∈ X i . We associate
costs cei = ε, where ε > 0 is a small value, to these edges esi , i = 1, q; all 1
i
costs on edges from Gs1 are preserved as in the initial graphs.
1
32
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
G s1 1
.
e s¢1
.
1
esi*
x0¢
Gsi*
. x ¢f
1
1
. e s¢q 1
. . Gsq 1
Fig. 1.5.
On G we consider a dynamic c-game with starting position x0 and final position xf . According to case 1) there exists a Nash equilibrium s∗1 , s∗2 , . . . , s∗p , for which the corresponding graph Gs∗ = (X, E s∗ ) has the structure of a directed tree with sink vertex. If we fix in Gs∗ the edge esi∗ = (x0 , xi∗ 0 ) for which i∗
1
∗ i∗ i∗ xi∗ 0 = s (x0 ), then we find the subtree Gs∗ = (X , Es∗ ) of Gs∗ , generated i∗ by set X . This tree corresponds to tree Gs∗ = (X, Es∗ ) of optimal solutions characterized by Nash strategies s∗1 , s∗2 , . . . , s∗q .
Remark 1.20. For a dynamic c-game with payoff functions Hxi 0 xf (s1 , s2 , . . . , sp ), i = 1, p, defined according to (1.11), (1.12), Theorem 1.16 holds for nonnegative costs cie , e ∈ E, i = 1, p, if e∈E(Cs ) cie = 0 for every directed cycle Cs in G. For the dynamic c-game from Section 1.5.1 Theorem 1.16 holds for arbitrary nonnegative costs cie , e ∈ E, i = 1, p. Theorem 1.21. Let (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf ) be a network for which vertex xf in G is attainable from every x ∈ X. Assume that the vectors ci = (cie1 , cie2 , . . . , cie|E| ), i ∈ {1, 2, . . . , p} have positive and constant components. Then on the vertex set X of the network game there exist p real functions ε1 : X → R1 , ε2 : X → R1 , . . . , εp : X → R1 , which satisfy the conditions:
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
33
a) εi (x) − εi (y) + ci(x,y) ≥ 0,
∀(x, y) ∈ Ei , i = 1, p,
where Ei = {e = (x, y) ∈ E | x ∈ Xi , y ∈ X}; b) min {εi (x) − εi (y) + ci(x,y) } = 0,
y∈XG (x)
∀x ∈ Xi , i = 1, p;
c) the subgraph G0 = (X, E 0 ) generated by edge set E 0 = E10 ∪ E20 ∪ · · · ∪ Ep0 , Ei0 = {e = (x, y) ∈ Ei | εi (x)−εi (y)+ci(x,y) = 0}, i = 1, p, has the property that vertex xf is attainable from any vertex x ∈ X and G0 contains a 0 0 0 subgraph G = (X, E ), E ⊂ E, which possesses the same property and besides that 0
εi (x) − εi (y) + ci(x,y) = 0, ∀(x, y) ∈ E , i = 1, p. If ε1 , ε2 , . . . , εp are arbitrary real functions, which satisfy the conditions a)c), then the optimal solution characterized by Nash strategies in the dynamic c-game the network (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf ) can be found as 0 follows: Choose in G an arbitrary directed tree GT = (X, E ∗ ) with sink vertex xf and fix in GT the following maps: s∗1 : x → y ∈ XGT (x)
for
x ∈ X1 ;
: x → y ∈ XGT (x)
for .. . for
x ∈ X2 ;
s∗2
s∗p : x → y ∈ XGT (x)
x ∈ Xp ,
where XGT (x) = {y ∈ X | (x, y) ∈ E ∗ }. Proof. According to Theorem 1.16 in the dynamic c-game with network (G, X1 , , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf ) there exists an optimal solution characterized by Nash strategies s∗1 , s∗2 , . . . , s∗p of the players 1, 2, . . . , p and these strategies generate in G a directed tree GTs∗ = (X, Es∗ ) with sink vertex xf . In this tree we find the functions ε1 : X → R1 , ε2 : X → R1 , . . . , εp : X → R1 , i where εi (x) = Hxx (s∗1 , s∗2 , . . . , s∗p ), ∀x ∈ X, i = 1, p. It is easy to verify that f ε1 , ε2 , . . . , εp satisfy the conditions a) and b). Additionally, we can see that in 0 0 G0 there exists the graph G = (X, E ), which satisfies condition c), because 0 0 GT ⊆ G . Moreover, if in G a directed tree GTs = (X, Es ), which is different from GTs∗ , with sink vertex is chosen, then GTs generates another optimal solution characterized by a Nash solution s1 , s2 , . . . , sp .
34
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Now let us show that if ε1 : X → R1 , ε2 : X → R1 , . . . , εp : X → R1 , are arbitrary functions, which verify the conditions a)-c), then an arbitrary 0 directed tree GT = (X, Es∗ ) of G generates the maps: s∗1 : x → y ∈ XGT (x) for x ∈ X1 ; s∗2 : x → y ∈ XGT (x) for x ∈ X2 ; .. . s∗p : x → y ∈ XGT (x) for x ∈ Xp , which correspond to an optimal solution characterized by a Nash solution. We use the induction on the number p of the players in the dynamic cgame. In case p = 1 the statement is true, because X1 = X and the conditions a)-c) for positive c1e provide the existence of tree GT = (X, Es∗ ) of optimal paths, which correspond to the solution s∗1 for the problem of finding the shortest paths from x ∈ X to xf in G. Assume that the statement holds for p ≤ k, k ≥ 1, and let us prove it for p = k + 1. We consider that the first player fixes his strategy s1 = s∗1 and consider the problem of finding an optimum by Nash strategies in the network game with respect to other players. The obtained game in the positional form can be interpreted as a c-game with p−1 players, since the positions of the first player can be considered as the positions of any other player. Furthermore, we consider them as the positions of the second player. Thus, if s1 = s∗1 , we obtain a new game with p − 1 players in the network game (G1 , X21 , X3 , . . . , Xp , c21 , c31 , . . . , cp1 , x0 , xf ), where X21 , G1 and the functions ci1 , i = 2, . . . , p, are defined as in the proof of Theorem 1.16. In the normal form this game is determined by the functions Hx20 xf (s∗1 , s2 , . . . , sp ), Hx30 xf (s∗1 , s2 , . . . , sp ), . . . , Hxp0 xf (s∗1 , s2 , . . . , sp ), s2 ∈ S2 , s3 ∈ S3 , . . . , sp ∈ Sp , where S2 , S3 , . . . , Sp are the respective sets of admissible strategies of the players 2, 3, . . . , p. In this new network game (G1 , X21 , X3 , . . . , Xp , c21 , c31 , . . . , cp1 , x0 , xf ) consider p − 1 functions ε2 : X → R1 ,
ε3 : X → R1 ,
...,
εp : X → R1 ,
which satisfy the conditions: a)
εi (x) − εi (y) + ci1(x,y) ≥ 0,
∀(x, y) ∈ Ei1 , i = 2, p,
where E21 = {e = (x, y) ∈ E 1 | x ∈ X21 , y ∈ X},
Ei1 = {e = (x, y) ∈ E 1 | x ∈ Xi , y ∈ X},
i = 3, p;
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
35
b) min
{ε2 (x) − ε2 (y) + c21(x,y) } = 0,
∀x ∈ X21 ,
min
{ε2 (x) − ε2 (y) + ci1(x,y) } = 0,
∀x ∈ Xi , i = 3, p;
y∈XG1 (x) y∈XG1 (x)
0
0
0
0
c) the subgraph G1 = (X, E 1 ) generated by edge set E 1 = E21 ∪ E30 ∪ 0 ∪ · · · ∪ Ep0 , E21 = {e = (x, y) ∈ E21 | ε2 (x) − ε2 (y) + c21(x,y) = 0}, Ei0 = {e = (x, y) ∈ Ei | εi (x) − εi (y) + ci1(x,y) = 0}, i = 3, p, has the property 0
that vertex xf is attainable from any vertex x ∈ X and G1 contains a 1
0
1
0
subgraph G = (X, E ), which possesses the same property and besides that 10 εi (x) − εi (y) + ci1(x,y) = 0, ∀(x, y) ∈ E , i = 2, p. According to the induction assumption, in network game (G1 , X21 , X3 , . . . , Xp , c21 , c31 , . . . , cp1 , x0 , xf ) the solution s∗2 , s∗3 , . . . , s∗p generated by the directed tree GT = (X, Es∗ ), s∗2 : x → y ∈ XGT (x) for x ∈ X21 ; s∗3 : x → y ∈ XGT (x) for x ∈ X3 ; .. . s∗p : x → y ∈ XGT (x) for x ∈ Xp , where s∗2 (x) = s∗1 (x) for x ∈ X1 and s∗2 (x) = s∗2 (x) for x ∈ X2 , is optimal in the sense of Nash. Thus i (s∗1 , s∗2 , s∗3 , . . . , s∗i−1 , s∗i , s∗i+1 , . . . , s∗p ) ≤ Hxx f i ≤ Hxx (s∗1 , s∗2 , s∗3 , . . . , s∗i−1 , si , s∗i+1 , . . . , s∗p ), f
∀si ∈ Si , 2 ≤ i ≤ p.
Also, it is easy to verify that 1 1 (s∗1 , s∗2 , . . . , s∗p ) ≤ Hxx (s1 , s∗2 , . . . , s∗p ), Hxx f f
∀s1 ∈ S1 ,
because for fixed s∗2 , s∗3 , . . . , s∗p in G the problem of finding 1 min Hxx (s1 , s∗2 , . . . , s∗p ) for x ∈ X f
s1 ∈S1
becomes the problem of finding shortest paths from x to xf in graph G = (X, E ), generated by a set E1 and edges (x, s∗i (x)), x ∈ Xi , i = 2, p, with costs c1e on the edges e ∈ E . On this graph the following condition is satisfied: ε1 (x) − ε1 (y) + c1(x,y) ≥ 0;
∀(x, y) ∈ E ,
36
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
which involves 1 1 Hxx (s∗1 , s∗2 , . . . , s∗p ) ≤ Hxx (s1 , s∗2 , . . . , s∗p ), f f
∀s1 ∈ S1 ,
1 (s∗1 , s∗2 , . . . , s∗p ) = ε1 (x), ∀x ∈ X. because Hxx f ∗ ∗ Hence s1 , s2 , . . . , s∗p is an optimal solution in the sense of Nash in the dynamic c-game.
Remark 1.22. Let ε1 : X → R1 , ε2 : X → R1 , . . . , εp : X → R1 , be arbitrary real functions on X in G and c1 , c2 , . . . , cp are p new cost functions on edges e ∈ E obtained from c1 , c2 , . . . , cp as follows: ci(x,y) = εi (x) − εi (y) + ci(x,y) ,
∀(x, y) ∈ E, i = 1, p.
(1.16)
Then the dynamic c-games determined on the networks (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf ) and (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf ), respectivei (s1 , s2 , . . . , sp ) and ly, are equivalent, because the payoff functions Hxx f i
H xxf (s1 , s2 , . . . , sp ) in such games differ only by a constant, i.e. i
i Hxx (s1 , s2 , . . . , sp ) = H xxf (s1 , s2 , . . . , sp ) + εi (x) − εi (xf ). f
In [5, 8] transformation (1.16) is named the potential transformation of the edges’ costs of the players in G. Remark 1.23. The conditions of Theorem 1.21 guarantee the existence of optimal stationary strategies s∗1 , s∗2 , . . . , s∗p of the players 1, 2, . . . , p for every starting position x ∈ X in a dynamic c-game on network (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x, xf ) with positive and constant cost functions c1 , c2 , . . . , cp . If c1 , c2 , . . . , cp are arbitrary constant functions then the conditions of Theorem 1.21 represent necessary and sufficient conditions for the existence of optimal stationary strategies s∗1 , s∗2 , . . . , s∗p in the dynamic c-game on network (G, X1 , X2 , , . . . , Xp , c1 , c2 , . . . , cp , x, xf ) for every starting position x ∈ X. On the basis of the obtained results we can propose the following algorithm for determining Nash equilibria in the considered dynamic game with constant costs on edges of networks. Algorithm 1.24. Determining Nash Equilibria for the Dynamic cGame on an Acyclic Network Let us consider a dynamic c-game for which graph G = (X, E) has the structure of an acyclic directed graph with sink vertex xf .
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
37
Preliminary step (Step 0): Fix X 0 = {xf } and put εi (xf ) = 0, ∀i = 1, p; General step (Step k, k ≥ 1): If X \ X k−1 = ∅ then STOP; otherwise find a vertex xk ∈ X \ X k−1 for which XG (xk ) ⊆ X k−1 , where XG (xk ) = {y ∈ X | (xk , y) ∈ E}. If xk ∈ Xik , ik ∈ {1, 2, . . . , p}, then find an edge (xk , y k ) for which εik (y k ) + ci(xk k ,yk ) = min {εik (y) + ci(xk k ,y) }. y∈XG (xk )
After that put
εi (xk ) = εi (y k ) + ci(xk ,yk ) ,
and
i = 1, p
X k = X k−1 ∪ {xk }.
Then go to the next step. If the functions εi , i = 1, p, are known, then the optimal strategies of the players s∗1 , s∗2 , . . . , s∗p can be found as follows: 0
0
Find a tree GTs∗ = (X, Es∗ ) in graph G = (X, E ) and fix the strategies si (x) : x → y ∈ Xi ,
(x, y) ∈ Es∗ , i = 1, p.
Example. Let a dynamic c-game be given on an acyclic network with two players represented in Fig. 1.6, i.e. the network consists of a directed graph G = (X, E) with partition X = X1 ∪ X2 , X1 = {0, 1, 4, 5}, X2 = {2, 3}, starting position x0 = 0, final position xf = 5 and costs of the players 1 and 2 given in parenthesis in Fig. 1.6. In Fig. 1.6 the positions of the first player are represented by circles and positions of the second player are represented by squares.
2 (3,4)
(2,6) (3,2)
0
(2,1)
(1,1) (2,1)
1 (4,1)
4
(3,5)
5
(2,2) (6,8)
(2,3)
3 Fig. 1.6.
We consider the problem of determining optimal stationary strategies of the players in this dynamic c-game with an arbitrary starting position x ∈ X and fixed final position xf = 5.
38
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
If we apply Algorithm 1.24 we obtain Step 0. X 0 = {5}, ε1 (5) = 0, ε2 (5) = 0. Step 1. X \ X 0 = ∅, therefore, find a vertex x1 ∈ X \ X 0 such that XG (x1 ) ⊆ X 0 , i.e. x1 = 4. Vertex 4 belongs to the set of positions of the first player and we calculate ε1 (4) = ε1 (5) + 3 = 3;
ε2 (4) = ε2 (5) + 5 = 5;
X 1 = X 0 ∪ {4} = {5, 4}. Step 2. X \ X 1 = ∅ and a find vertex x2 ∈ X \ X 1 such that XG (x2 ) ⊆ X 1 , i.e. x2 = 2. Vertex 2 belongs to the set of positions of the second player and we calculate min{ε2 (5) + c2(2,5) , ε2 (4) + c2(2,4) } = min{6, 6} = 6. So, we obtain this minimum for the edges (2, 4) and (2, 5). Here we can fix an arbitrary edge from {(2, 4), (2, 5)}. For example, we fix edge (2, 5). Then at step 2 we obtain ε2 (2) = 6;
ε1 (2) = ε1 (5) + c1(2,5) = 2;
X 2 = X 1 ∪ {2} = {2, 4, 5}. Step 3. X \ X 2 = ∅; x3 = 3. Vertex 3 belongs to the set of positions of the second player, therefore we find min{ε2 (2) + c2(3,2) , ε2 (4) + c2(3,4) , ε2 (5) + c2(3,5) } = 7. So, we obtain this minimum for e = (3, 2). We calculate ε2 (3) = ε2 (2) + c2(3,2) = 7;
ε1 (3) = ε1 (2) + c1(3,2) = 4;
X 3 = X 2 ∪ {3} = {2, 3, 4, 5}. Step 4. X \ X 3 = ∅; x4 = 1. Vertex 1 belongs to the set of positions of the first player, therefore we find min{ε1 (2) + c1(1,2) , ε1 (2) + c1(1,3) } = 5. So, we obtain this minimum for e = (1, 2). We calculate ε1 (1) = ε1 (2) + c1(1,2) = 5;
ε2 (1) = ε2 (2) + c2(1,2) = 8;
X 4 = X 3 ∪ {1} = {1, 2, 3, 4, 5}.
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
39
Step 5. X \ X 4 = ∅; x5 = 0. Vertex 1 belongs to the set of positions of the first player, therefore we find min{ε1 (2) + c1(0,2) , ε1 (1) + c1(0,1) , ε1 (3) + c1(0,3) } = 5. We determine ε1 (0) = ε1 (2) + c1(0,2) = 5;
ε2 (0) = ε2 (2) + c2(0,2) = 10;
X 5 = X 4 ∪ {0} = {0, 1, 2, 3, 4, 5}. Step 6. X \ X 5 = ∅ STOP The graph GTs∗ = (X, Es∗ ), generated by the corresponding edges that determine εi (x), is represented in Fig. 1.7.
2
0
1
4
5
3
Fig. 1.7.
So, the optimal stationary strategies of the players are the following: s∗1 : 0 → 2; s∗2 : 2 → 5;
1 → 2;
4 → 5;
3 → 2.
Note that at step 2 the minimal value ε2 (2) is not determined uniquely because ε2 (5) + c2(2,5) = ε2 (4) + c2(2,4) = 6. If we select edge (2, 4), then we obtain another solution, i.e. Step 0. ε1 (5) = 0, ε2 (5) = 0 Step 1. ε1 (4) = 3, ε2 (4) = 5 Step 2. ε1 (2) = 4, ε2 (2) = 6
40
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Step 3. ε1 (3) = 6, ε2 (3) = 7 Step 4. ε1 (1) = 7, ε2 (1) = 8 Step 5. ε1 (0) = 7, ε2 (0) = 10
2
0
1
4
5
3 Fig. 1.8.
The corresponding graph GTs∗ = (X, Es∗ ) in this case (see Fig. 1.8) generates other stationary strategies: s∗1 : 0 → 2; s∗2 : 2 → 4;
1 → 2; 3 → 2.
4 → 5;
Algorithm 1.25. Determining Nash Equilibria in a Dynamic c-Game on an Arbitrary Network Based on the Reduction to the Case with an Acyclic Network Let us have a dynamic c-game with p players and let the directed graph G has an arbitrary structure, i.e. G may contain directed cycles. Moreover, we consider that for xf there are no leaving edges (xf , x) ∈ E. We show that the problem in this case can be reduced to the problem of finding optimal strategies in an auxiliary game with a network without directed cycles. We construct an auxiliary directed graph G = (Z, E) without directed cycles, where Z and E are defined as follows: Z = Z 0 ∪ Z 1 ∪ Z 2 ∪ · · · ∪ Z |X|−1 ,
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
where
j }, Z j = {z0j , z1j , z2j , . . . , z|X|−1
41
j = 0, |X| − 1,
so, Z 0 , Z 1 , . . . , Z |X|−1 represent the copies of set X; E = E 0 ∪ E 1 ∪ E 2 ∪ · · · ∪ E |X|−2 ∪ E f , where E j = {(zkj , zlj+1 ) | (xk , xl ) ∈ E}, f
E =
|X|−1 {(zkj , zf )
j = 0, |X| − 2;
| (xk , xf ) ∈ E, j = 0, |X| − 3}. |X|−1
is attainable in this graph from any It is easy to observe that vertex zf zk0 ∈ Z 0 . If we delete in G all vertices zki , for which there is no directed path from zki to zfi , then we obtain an acyclic directed graph G = (Z , E ) with |X|−1
sink vertex zf . In the following we divide the vertex set Z into p subsets Z1 , Z2 , . . . , Zp corresponding to the position sets of the players 1, 2, . . . , p, respectively: Z1 = {zkj ∈ Z | xk ∈ X1 , j = 0, |X| − 1}; Z2 = {zkj ∈ Z | xk ∈ X2 , j = 0, |X| − 1}; .. . Zp = {zkj ∈ Z | xk ∈ Xp , j = 0, |X| − 1}.
We define on the edge set E the cost functions as follows: ci(zj ,zj+1 ) = ci(xk ,xl ) , k
l
ci(zj ,z|X|−1 ) = ci(xk ,xf ) , k
f
∀(zkj , zlj+1 ) ∈ E j , j = 0, |X| − 2, i = 1, p; |X|−1
∀(zkj , zf
) ∈ E f , j = 1, |X| − 3;
After that we consider a dynamic c-game with network (G Z1 , Z2 , |X|−1 . . . , Zp , c1 , c2 , . . . , cp , z00 , zf ), where G is an acyclic directed graph with |X|−1
sink vertex zf . If we use Algorithm 1.25, then we find the values εi (zkj ), ∀zkj ∈ Z , i = 1, p. It is easy to observe that if we put εi (xf ) = 0, i = 1, p, |X|−1 and εi (xk ) = εi (zk ), ∀xk ∈ X \ {xf }, i = 1, p, then we obtain funci tions ε : X → R, which satisfy the conditions a)-c) from Theorem 1.21. Thus, we find the tree GT = (X, Es ), which corresponds to optimal strategies s∗1 , s∗2 , . . . , s∗p of the players in our dynamic c-game. Algorithm 1.25 is inconvenient because of the great number of vertices in the auxiliary network. Furthermore, we present a simpler algorithm for finding optimal strategies of the players.
42
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Algorithm 1.26. Determining Nash Equilibria for the Dynamic cGame with an Arbitrary Network Preliminary step: Assign to every vertex x ∈ X a set of labels ε1 (x), ε2 (x), ,. . . , εp (x) as follows: εi (xf ) = 0, i
ε (x) = ∞,
∀i = 1, . . . , p, ∀x ∈ X \ {xf }, i = 1, . . . , p.
General step (step k (k ≥ 1)): For every vertex x ∈ X \ {xf } change the labels εi (x), i = 1, . . . , p, in the following way: If x ∈ Xk then find vertex x for which εk (x) + ck(x,x) = min {εk (y) + ck(x,y) }. y∈X(x)
If εk (x) > εk (x) + ck(x,x) , then replace εi (x) by εi (x) + ci(x,x) , i = 1, . . . , p. If εk (x) ≤ εk (x) + ck(x,x) , then do not change the labels. Repeat the general step n−1 times. Then labels εi (x), i = 1, . . . , p, x ∈ X, become constant. Remark 1.27. In the algorithm the labels εi (x), i = 1, 2, . . . , p, may become constant after less than n − 1 steps. So, the algorithm stops when the labels become constant. Let us note that these labels satisfy the conditions of Theorem 1.21. Hence, using labels εi (x), i = 1, . . . , p, x ∈ X, and Theorem 1.21, we construct an optimal solution characterized by Nash strategies of the players 1, 2, . . . , p. Algorithm 1.26 has the computational complexity O(p|X|2 |E|). Example. Let a dynamic c-game of two players on the network represented by Fig. 1.9 be given. This network consists of a directed graph G = (X, E) with sink vertex xf = 5, given partition X = X1 ∪ X2 , X1 = {0, 1, 4, 5}, X2 = {2, 3}, and costs for the players 1 and 2 written respectively in parenthesis in Fig. 1.9. We are seeking for optimal stationary strategies of the players in the dynamic c-game with an arbitrary starting position x ∈ X and fixed final state xf = 5. Step 0 (Preliminary step). Fix ε1 (5) = 0, ε2 (5) = 0; ε1 (0) = ε1 (1) = ε1 (2) = ε1 (3) = ε1 (4) = ∞; ε2 (0) = ε2 (1) = ε2 (2) = ε2 (3) = ε2 (4) = ∞. We repeat the general step 5 times. At each step we examine each vertex x ∈ X and update its labels ε1 (x), ε2 (x) using the condition of the algorithm; we will examine the vertices according to their numerical order. Step 1. Vertex 0 ∈ X1 , therefore, calculate ε1 (0) = ∞; this implies ε2 (0) = ∞; vertex 1 ∈ X1 , therefore calculate ε1 (0) = ∞; this implies ε2 (1) = ∞;
1.6 Main Results for Dynamic c-Games with Constant Costs of the Edges
43
2 (2,4)
(3,5) (1,4)
0
(5,2)
(6,2) (2,1)
1 (3,2)
4
(4,2)
5
(1,2) (2,4)
(8,2)
3 Fig. 1.9.
vertex 2 ∈ X2 , therefore, calculate ε2 (5) + c2(2,5) = min{ε2 (5) + c2(2,5) , ε2 (4) + c2(2,4) } = min{5, ∞} = 5; so, ε2 (2) = 5; this implies ε1 (2) = ε1 (5) + c1(2,5) = 3; 3 ∈ X2 , therefore, calculate ε2 (5) + c2(3,5) = min{ε2 (5) + c2(3,5) , ε2 (2) + c2(3,2) } = min{0 + 4, 5 + 1} = 4; so, ε2 (3) = 4; this implies ε1 (3) = ε1 (5) + c1(3,5) = 0 + 2 = 2; 4 ∈ X1 , therefore, calculate ε1 (3) + c1(4,3) = min{ε1 (5) + c1(4,5) , ε1 (3) + c1(4,3) } = min{0 + 4, 2 + 1} = 3; so, ε1 (4) = 3; this implies ε2 (4) = ε2 (3) + c2(4,3) = 4 + 2 = 6; 5 ∈ X1 ; ε1 (5) = 0, ε2 (5) = 0. Step 2. Vertex 0 ∈ X1 , therefore, calculate ε1 (0) = ∞; this implies ε2 (0) = ∞; ε1 (2) + c1(0,2) = min{ε1 (2) + c1(0,2) , ε1 (1) + c1(0,1) , ε1 (3) + c1(0,3) } = = min{3 + 2, ∞ + 5, 2 + 8} = 5; so, ε1 (0) = 5; this implies ε2 (2) = ε2 (2) + c2(0,2) = 4 + 5 = 9; vertex 1 ∈ X1 , therefore, calculate ε1 (2) + c1(1,2) = min{ε1 (2) + c1(1,2) , ε1 (3) + c1(1,3) } = min{3 + 1, 2 + 3} = 4; so, ε1 (1) = 4; this implies ε2 (1) = ε2 (2) + c2(1,2) = 5 + 4 = 9; vertex 2 ∈ X2 , therefore, calculate ε2 (5) + c2(2,5) = min{ε2 (5) + c2(2,5) , ε2 (4) + c2(2,4) } = min{5, 8} = 5; so, ε2 (2) = 5; this implies ε1 (2) = ε1 (5) + c2(2,5) = 3; vertex 3 ∈ X2 , therefore, calculate ε2 (5) + c2(3,5) = min{ε2 (5) + c2(3,5) , ε2 (2) + c2(3,2) } = min{4, 6} = 4; so, ε2 (3) = 4; this implies ε1 (3) = ε1 (5) + c1(3,5) = 2; vertex 4 ∈ X1 , therefore, calculate ε1 (3) + c1(1,3) = min{ε1 (5) + c1(4,5) , ε1 (3) + c1(4,3) } = min{4, 3} = 3; so, ε1 (4) = 3 and ε2 (4) = 6; vertex 5 ∈ X1 ; ε1 (5) = 0, ε2 (5) = 0.
44
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Step 3. Vertex 0 ∈ X1 , therefore, calculate ε1 (1) + c1(0,1) = min{ε1 (2) + c1(0,2) , ε1 (1) + c1(0,1) , ε1 (3) + c1(0,3) } = = min{5, 9, 10} = 5; so, ε1 (0) = 5 and ε2 (0) = ε2 (2) + c2(0,2) = 9; vertex 1 ∈ X1 , therefore, calculate ε1 (2) + c1(1,2) = min{ε1 (2) + c1(1,2) , ε1 (3) + c1(1,3) } = min{3 + 1, 2 + 3} = 4; so, ε1 (1) = 4 and ε2 (1) = ε2 (2) + c2(1,2) = 5 + 4 = 9; vertex 2 ∈ X2 , therefore, calculate ε2 (5) + c2(2,5) = min{ε2 (5) + c2(2,5) , ε2 (4) + c2(2,4) } = min{5, 8} = 5; so, ε2 (2) = 5 and ε1 (2) = 3; vertex 3 ∈ X2 , therefore, calculate ε2 (5) + c2(3,5) = min{ε2 (5) + c2(3,5) , ε2 (2) + c2(3,2) } = min{4, 6} = 4; so, ε2 (3) = 4 and ε1 (3) = 2; vertex 4 ∈ X1 , therefore, calculate ε1 (3) + c1(4,3) = min{ε1 (5) + c1(4,5) , ε1 (3) + c1(4,3) } = min{4, 3} = 3; so, ε1 (4) = 4 and ε2 (4) = 2. After step 3 we observe that the labels coincide with the labels after step 2. So, the labels become constant and we finish the algorithm. Finally we have obtained ε1 (0) = 5, ε1 (1) = 4, ε1 (2) = 3, ε1 (3) = 2, ε1 (4) = 3, ε1 (5) = 0; ε2 (0) = 9, ε2 (1) = 9, ε2 (2) = 5, ε2 (3) = 4, ε2 (4) = 6, ε2 (5) = 0. So, if we make a potential transformation and select the tree with zero-cost in G we obtain the following trees:
2
0
1
4
5
3 Fig. 1.10.
So, the optimal stationary strategies of the players are the following: s∗1 : 0 → 2;
1 → 2;
: 2 → 5;
3 → 5.
s∗2
4 → 3;
1.8 Determining the Optimal Stationary Strategies for a Dynamic c-Game
45
For the general case of the dynamic c-game the following theorem holds: Theorem 1.28. Let (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf , T1 , T2 ) be a dynamic network for which in G there exists a directed path Ps (x0 , xf ) from x0 to xf such that condition (1.10) holds (0 ≤ T1 ≤ |X| − 1, T1 ≤ T2 ). Additionally, assume that in network (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf , T1 , T2 ) the vectors ci = (cie1 , cie2 , . . . , cie|E| ), i ∈ {1, 2, . . . , p} have positive and constant components. Then in the dynamic c-game on network (G, X1 , X2 , . . . , Xp , c1 , c2 , . . . , cp , x0 , xf , T1 , T2 ) there exists an optimal solution in the sense of Nash s∗1 , s∗2 , . . . , s∗p . This theorem can be proved by using the constructive scheme of the proof of Theorem 1.16 with some modifications.
1.7 Computational Complexity of the Problem of Determining Optimal Stationary Strategies in a Dynamic c-Game The results from Section 1.6 allow us to describe a class of dynamic c-games for which polynomial-time algorithms for determining the optimal stationary strategies of the players can be elaborated. This class is related to dynamic c-games with constant cost functions on the edges of the network and without restrictions on the number of stages. In general, if additional condition (1.10) on the number of stages for the considered problem is given, then it is N P -hard. This problem remains N P hard even if p = 1, T1 = T2 = |X| − 1 and the costs of edges are constant, because in this case it becomes a Hamiltonian path problem in G with a given starting vertex x0 and final vertex xf . In the following we can see that if G has the structure of an acyclic directed graph then the problem of determining optimal stationary strategies in the dynamic c-game with the given restriction on the number of stages and constant costs of edges can be reduced to a similarly problem on an auxiliary time-expanded network (see Section 1.9.1).
1.8 Determining the Optimal Stationary Strategies for a Dynamic c-Game with Non-Constant Cost Functions on the Edges We consider again the problem of determining Nash equilibria for a dynamic c-game without restrictions on the number of stages by a trajectory from x0 to xf . The dynamic c-game is determined by network (G, X1 , X2 , . . . , Xp ,
46
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
c1 (t), c2 (t), . . . , cp (t), x0 , xf ) where the components ciek (t) of the vector functions ci (t) = (cie1 (t), cie2 (t), . . . , cie|E| (t)), i = 1, p, may be non-constant functions. First of all we note that Nash equilibria in such games may fail to hold even in the case of positive and non-decreasing cost functions cie (t), i = 1, p, e ∈ E, defined on the edges of this dynamic network. An example, which confirms this affirmation, is the following: We consider a dynamic network, that consists of the directed graph G = (X, E) represented below by Fig. 1.11, for which the partition of vertex set X = X1 ∪ X2 , X1 = {0, 1}, X2 = {2, 3, 4}, where x0 = 0, xf = 4, is given.
2
0
4
3
1 Fig. 1.11.
All cost functions on the edges are constantly equal to 1 except the following: c1(0,2) (t) ≡ c2(0,2) (t) ≡ 3; 1 if t ≤ 1, 2 c(2,4) (t) = M if t > 1; 1 if t ≤ 2, 1 c(3,4) (t) = M if t > 2, where M is a big number. It is easy to check that the optimal stationary strategies in the sense of Nash for a dynamic c-game on this network do not exist. Nevertheless, we will describe a class of dynamic c-games with nonconstant costs cie (t), i = 1, p, on the edges e ∈ E of the network for which Nash equilibria exist. At first we extend the optimization principle for the stationary case of the problem on dynamic networks with p players. We define the optimization principle with respect to player i, i ∈ {1, 2, . . . , p}, on dynamic networks (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf ).
1.8 Determining the Optimal Stationary Strategies for a Dynamic c-Game
47
Let E i be a subset of edges of E starting in vertices x ∈ Xi , i.e. E i = {(x, y) ∈ E | x ∈ Xi }, i = 1, p. Hereby, the set E i represents the admissible set of system’s passages from state x ∈ Xi to state y ∈ X for the player i. Furthermore, the set E i indicates the set of the edges of player i. By Esi we denote the subset of E i generated by a fixed strategy si of player i, i ∈ {1, 2, . . . , p}, i.e. Esi = {(x, y) ∈ E i | x ∈ Xi , y = si (x)}. Let s1 , s2 , . . . , si−1 , si+1 , . . . , sp be a set of strategies of the players 1, 2, . . . , i − 1, i + 1, . . . , p and let GS\si = (X, ES\si ) be a subgraph of G, where ES\si = Es1 ∪ Es2 ∪ · · · ∪ Esi −1 ∪ E i ∪ Esi +1 ∪ · · · ∪ Esp . The graph GS\si represents the subgraph of G generated by the set of edges of player i and edges of E when the players 1, 2, . . . , i − 1, i + 1, . . . , p fix their strategies s1 , s2 , . . . , si−1 , si+1 , . . . , sp , respectively. On GS\si we consider the single objective control problem with respect to cost functions cie (t) of player i, with starting vertex x0 and final vertex xf . Definition 1.29. Let us assume that for any given set of strategies s1 , s2 , . . . , si−1 , si+1 , . . . , sp the cost functions cie (t), e ∈ ES\si in GS\si have the property that if an arbitrary optimal path P ∗ (x0 , x) can be represented as P ∗ (x0 , x) = P1∗ (x0 , z)∪ P2∗ (z, x) (P1∗ (x0 , z) and P2∗ (z, x) have no common edges), then the leading part P1∗ (x0 , z) of P ∗ (x0 , x) is an optimal one. We call this property the optimization principle for dynamic networks with respect to player i . A similar definition is introduced in [66] for a dynamic c-game with backward time account. According to this definition the optimization principle is satisfied with respect to player i if in G an arbitrary optimal path P ∗ (x, xf ) in Gs\si is represented as P ∗ (x, xf ) = P1∗ (x, z) ∪ P2∗ (z, xf ) (P1∗ (x, z) and P2∗ (z, xf ) have no common edges), then the directed path P2∗ (z, xf ) is an optimal one. Theorem 1.30. Let (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf ) be a dynamic network with p players for which vertex xf in G is attainable from any vertex x ∈ X. Assume, that the vector-functions ci (t) = (cie1 (t), cie2 (t), . . . , cie|E| (t)), i = 1, p have non-negative and non-decreasing components. Moreover, let us assume that the optimization principle on the dynamic network is satisfied with respect to each player. Then, in the dynamic c-game on network (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf ) for the players 1, 2, . . . , p there exists an optimal solution in the sense of Nash s∗1 , s∗2 , . . . , s∗p . This theorem can be proved in the same way as Theorem 1.16. The proof of this theorem is given in [59].
48
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
In general, if for a dynamic c-game with positive and nondecreasing cost functions cie (t), e ∈ E, i = 1, p, a Nash equilibrium s∗1 , s∗2 , . . . , s∗p exists, then the optimal trajectory x0 , x1 , . . . , xf , generated by these strategies, corresponds to the optimal trajectory for a non-stationary dynamic c-game. In the next section we show that for the non-stationary dynamic c-game a Nash equilibrium exists if at least one directed path from x0 to xf exists. A polynomial time algorithm for determining optimal trajectories from x0 to xf is proposed in Section 1.9. Here it is important to note that the optimal strategies of the players s∗1 , s∗2 , . . . , s∗P in a dynamic c-game depend on the starting position x0 . Additionally, in [58] it is also proved the following result: Theorem 1.31. Let (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . cp (t), x0 , xf ) be a dynamic network with p players for which in G any vertex x ∈ X is attainable from x0 and the vector-functions ci (t) = (cie1 (t), cie2 (t), . . . , cie|E| (t)), i = 1, p, have non-negative and non-decreasing components. Moreover, let us consider that the optimization principle for the dynamic network is satisfied with respect to each player. Then, in G there exists a tree GT ∗ = (X, E ∗ ) for which any vertex x ∈ X is attainable from x0 , and a unique directed path PGT ∗ (x0 , x) from x0 to x in GT ∗ corresponds to optimal strategies s∗1 , s∗2 , . . . , s∗p of the players in the dynamic c-game on network (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , , cp (t), x0 , x) with starting position x0 and final position x. For different ver tices x and y the optimal paths PGT ∗ (x0 , x) and PGT ∗ (x0 , y) correspond to ∗ ∗ ∗ different strategies of the players s∗1 , s∗2 , . . . , s∗p and s1 , s2 , . . . , sp in different games with starting vertex x0 and final positions x, y respectively. A similar theorem holds for a dynamic c-game with backward time account if the optimization principle in G is satisfied with respect to each player (see [66]). If the optimization principle in the dynamic c-game is satisfied with respect to each player then the following algorithm finds the tree of optimal paths GT ∗ = (X, E ∗ ) in G, when G has no directed cycles, i.e. G is an acyclic graph. We assume that the positions of the network are numbered with 0, 1, 2, . . . , |X| − 1 according to a partial order determined by the structure of the acyclic graph G. This means that if y > x then there is no directed path P (y, x) from y to x. The algorithm consists of |X| steps and constructs a sequence of trees GT k = (X k , E k ), k = 0, |X| − 1, such that at the final step k = |X| − 1 we obtain GT |X|−1 = GT ∗ . Algorithm 1.32. Determining the Tree of Optimal Paths in an Acyclic Network Preliminary step (step 0): Set GT ◦ = (X ◦ , E ◦ ), where X ◦ = {x0 }, E ◦ = ∅. Assign to every vertex x ∈ X a set of labels H 1 (x), H 2 (x), . . . , H p (x), t(x) as follows:
1.8 Determining the Optimal Stationary Strategies for a Dynamic c-Game
H i (x0 ) = 0, H i (x) = ∞, t(x0 ) = 0, t(x) = ∞,
49
i = 1, p, ∀x ∈ X \ {x0 }, i = 1, p, ∀x ∈ X \ {x0 }.
General step (step k, k ≥ 1): Find in X \ X k−1 the least vertex xk and the set of incoming edges E − (xk ) = {(xr , xk ) ∈ E | xr ∈ X k−1 } for xk . If |E − (xk )| = 1 then go to a); otherwise go to b). a) Find a unique vertex y such that e = (y, xk ) ∈ E − (xk ) and calculate H i (xk ) = H i (y) + ci(y,xk ) (t(y)),
i = 1, p;
k
t(x ) = t(y) + 1. After that form the sets X k = X k ∪ {xk }, E k = E k−1 ∪ {(y, xk )} and put GT k = (X k , E k ). If k < |X| − 1 then go to next step k + 1; otherwise fix E ∗ = E |X|−1 , GT ∗ = (X, E ∗ ) and STOP. k−1 such that in graph GT k = b) Select the biggest vertex z ∈ X k−1 k k−1 − k X ∪{x }, E ∪E (x ) there exist at least two parallel directed paths P (z, xk ), P (z, xk ) from z to xk without common edges, i.e. E(P (z, xk )) ∩ E(P (z, xk )) = ∅. Let e = (xr , xk ) and e = (xs , xk ) be respective edges of these paths with common end vertex in xk . So, e , e ∈ E − (xk ). For vertex z determine iz such that z ∈ Xiz . If H iz (xr ) + ci(xz r ,xk ) (t(xr )) ≤ H iz (xs ) + ci(xz s ,xk ) (t(xs )) then delete the edge e = (xs , xk ) from E − (xk ) and from G; otherwise delete the edge e = (xr , xk ) from E − (xk ) and from G. After that check again if the condition |E − (xk )| = 1 holds. If |E − (xk )| = 1 then go to a) otherwise go to b). Remark 1.33. The values H i (x), i = 1, p for x ∈ X in Algorithm 1.32 express the respective costs of the players in a dynamic c-game with starting position x0 and final position x. Example. Let us consider a stationary dynamic c-game of two players. The game is determined by the network given in Fig. 1.12. This network consists of a directed graph G = (X, E) with partition X = X1 ∪ X2 , X1 = {0, 2, 5, 6}, X2 = {1, 3, 4}, starting position x0 = 0, final position xf = 6 and cost functions of the players 1 and 2 given in parenthesis in Fig. 1.12. In order to apply Algorithm 1.32 we number vertices of G in such a way that if y > x then there is no directed path from y to x. It is easy to observe that the numeration of vertices in Fig. 1.12 satisfies this condition.
50
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games (t,3t)
5
1 (1,2)
0
(1,2t)
(t,2t)
(2,3)
(1,2)
(2t,t)
3
(1,1)
(2,2t)
(1,t)
2
6
(1,1)
4
Fig. 1.12.
Note that for our dynamic network the optimization principle is satisfied with respect to each player. Therefore, if we use Algorithm 1.32 we obtain: Step 0. GT ◦ = ({0}, ∅); X ◦ = {0}; E ◦ = ∅; H 1 (0) = 0; H 2 (0) = 0; t(0) = 0; H i (x) = ∞ for x = 0, i = 1, 2; t(x) = ∞ for x = 0. Step 1. x1 = 1; E − (1) = {(0, 1)}; H 1 (0) = 0; H 2 (0) = 0; H 1 (1) = 1; H 2 (1) = 2; t(0) = 0; t(1) = 1; H i (x) = ∞ for x = 0, 1 and i = 1, 2; t(x) = ∞ for x = 0, 1. After step 1 we have GT 1 = ({0, 1}, {(0, 1)}), i.e. X 1 = {0, 1}, E 1 = {0, 1}. Step 2. x2 = 2; E − (2) = {(0, 2), (1, 2)}. Since E − (2) = 1 we have the case b): z = 0; P (0, 2) = {(0, 2)}, P (0, 2) = {(0, 1), (0, 2)}; e1 = (0, 2); e = (1, 2) and iz = 1 because 0 ∈ X1 . For e and e the following condition holds: H 1 (0) + c1(0,2) (0) ≤ H 1 (1) + c1(1,2) (1). Therefore, we delete (1,2) from E − (2) and obtain E − (2) = {(0, 2)} (case a)). We calculate H 1 (2) = H 1 (0)+c1(0,2) (0) = 1; H 2 (2) = H 2 (0)+c2(0,2) (0) = 1; t(2) = t(0) + 1 = 1. After step 2 we obtain GT 2 = ({0, 1, 2}, {(0, 1), (0, 2)}); H 1 (0) = 0; H 2 (0) = 0; H 1 (1) = 1; H 2 (1) = 2; H 1 (2) = 1; H 2 (2) = 1; H i (x) = ∞ for x = 0, 1, 2, i = 1, 2; t(0) = 0; t(1) = 1; t(2) = 1; t(x) = ∞ for x = 0, 1, 2.
1.8 Determining the Optimal Stationary Strategies for a Dynamic c-Game
51
Step 3. x3 = 3; E − (3) = {(1, 3), (2, 3)}. So, we have E − (3) = 1 (case b): z = 0; P (0, 3) = {(0, 1), (1, 3)}; P (0, 3) = {(0, 2), (2, 3)}; e1 = (1, 3); e = (2, 3); iz = 0. For e and e we have H 1 (1) + c1(1,3) (1) ≤ H 1 (2) + c1(2,3) (1) Therefore, we delete (2,3) from E − (3). So, E − (3) = {(1, 2)} and H 1 (3) = H 1 (1) + c1(1,3) (1) = 1 + 1 = 2; H 2 (3) = H 2 (1) + c2(1,3) (1) = 2 + 2 = 4; t(3) = t(1) + 1 = 2. We delete the GT 3 = ({0, 1, 2, 3}, {(0, 1), (0, 2), (1, 3)}); H 1 (0) = 0; H 2 (0) = 0; H 1 (1) = 1; H 2 (1) = 2; H 1 (2) = 1; H 2 (2) = 1; H 1 (3) = 2; H 2 (3) = 4; H i (x) = ∞ for x ∈ {4, 5, 6}, i = 1, 2; t(0) = 1; t(1) = 1; t(2) = 1; t(3) = 2; t(4) = t(5) = t(6) = ∞. Step 4. x4 = 4; E − (3) = {(3, 4)}. Therefore, we obtain GT 4 = ({0, 1, 2, 3, 4}, {(0, 1), (0, 2), (1, 3), (3, 4)}); H 1 (0) = 0; H 2 (0) = 0; H 1 (1) = 1; H 2 (1) = 2; H 1 (2) = 1; H 2 (2) = 1; H 1 (3) = 2; H 2 (3) = 4; H 1 (4) = 3; H 2 (4) = 8; H 1 (5) = ∞; H 2 (5) = ∞; H 1 (6) = ∞; H 2 (6) = ∞; t(0) = 0; t(1) = 1; t(2) = 1; t(3) = 2; t(4) = 3; t(5) = t(6) = ∞. Step 5. x5 = 5; E − (3) = {(1, 5), (3, 5), (4, 5)}. Since E − (3) = 3 we have case b): z = 3; P (3, 5) = {(3, 5)}; P (3, 5) = {(3, 4), (4, 5)}; e = (3, 5); e = (4, 5); iz = 2. For e and e we have H 2 (3) + c2(3,5) (2) ≤ H 2 (4) + c2(4,5) (3). We delete the edge (4,5) from E − (4) and obtain E − (4) = {(1, 5), (3, 5)}. For the edges e = (1, 5) and e = (3, 5) we find z = 1, iz = 2 and the paths P (1, 5), P ((1, 3), (3, 5)). Since the following condition holds: H 2 (1) + c2(1,5) (1) ≤ H 2 (3) + c2(3,5) (2) we delete the edge (3,5) from E − (4) and obtain E − (4) = {(1, 5)}. So H 1 (5) = H 1 (1) + c1(1,5) (1) = 2; H 2 (5) = H 2 (1) + c2(1,5) (1) = 5;
t(5) = t(1) + 1 = 2.
After step 5 we obtain GT 5 = ({0, 1, 2, 3, 4, 5}, {(0, 1), (0, 2), (1, 3), (3, 4), (1, 5)}); H 1 (0) = 0; H 2 (0) = 0; H 1 (1) = 1; H 2 (1) = 2; H 1 (2) = 1; H 2 (2) = 1; H 1 (3) = 2; H 2 (3) = 4; H 1 (4) = 3; H 2 (4) = 8; H 1 (5) = 2; H 2 (5) = 5; H 1 (6) = ∞; H 2 (6) = ∞; t(0) = 0; t(1) = 1; t(2) = 1; t(3) = 2; t(4) = 3; t(5) = 2.
52
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Step 6. x6 = 6; E − (6) = {(4, 6), (5, 6)}; E − (6) = 1 we have case b): z = 1; P (1, 6) = {(1, 5), (5, 6)}; P (1, 6) = {(1, 3), (3, 4), (4, 6)}; e = (5, 6); e = (4, 6); i7 = 2. For e and e the following condition holds: H 2 (5) + c2(5,6) (2) ≤ H 2 (4) + c2(4,6) (3). We delete (4,6) from E − (4). Therefore, we obtain H 1 (6) = H 1 (5) + c1(5,6) (2) = 3; H 2 (6) = H 2 (5) + c2(5,6) (2) = 7;
t(6) = t(5) + 1 = 3.
Finally we obtain GT 6 = GT ∗ = ({0, 1, 2, 3, 4, 5, 6}, {(0, 1), (0, 2), (1, 3), (3, 4), (1, 5), (5, 6)}); H 1 (0) = 0; H 2 (0) = 0; H 1 (1) = 1; H 2 (1) = 2; H 1 (2) = 1; H 2 (2) = 1; H 1 (3) = 2; H 2 (3) = 4; H 1 (4) = 3; H 2 (4) = 8; H 1 (5) = 2; H 2 (5) = 5; H 1 (6) = 3; H 2 (6) = 7; t(0) = 0; t(1) = 1; t(2) = 1; t(3) = 2; t(4) = 3; t(5) = 2; t(6) = 3. So, for the dynamic c-game on our network we obtain the tree of optimal paths given in Fig. 1.13. 5
1
0
6
3
2
4
Fig. 1.13.
For the case of a dynamic c-game with backward time-step account the tree of optimal paths is given in Fig. 1.14.
1.9 Determining Nash Equilibria for Non-Stationary Dynamic c-Games
53
5
1
0
6
3
2
4
Fig. 1.14.
1.9 Determining Nash Equilibria for Non-Stationary Dynamic c-Games Now we study the problem of finding Nash equilibria for non-stationary dynamic c-games. For this case of the problem we will use a time-expanded network utilized in [67]-[76]. We show that the problem of finding non-stationary strategies can be reduced to the problem of finding stationary strategies on an auxiliary network with constant costs on the edges. 1.9.1 Time-Expanded Networks for Non-Stationary Dynamic c-Games and Their Main Properties Let (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) be a network which determines our dynamic c-game. We assume that in G = (X, E) vertex xf ∈ X is attainable from every x ∈ X. Additionally, we construct an auxiliary network (G, Z1 , Z2 , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ) with constant costs on the edges, where the graph G = (Y, E) is obtained as follows: Y = Y 0 ∪ Y 1 ∪ Y 2 ∪ · · · ∪ Y T1 ∪ Y T1 +1 ∪ · · · ∪ Y T2
(Y t ∩ Y k = ∅, t = k);
Y t = (X, t) corresponds to the set of states at time-step t, t = 0, T2 ; E = E 0 ∪ E 1 ∪ E 2 ∪ · · · ∪ E T1 ∪ E T1 +1 ∪ · · · ∪ E T2 −1 ∪ E f , where E t = {((x, t), (y, t + 1)) | (x, t) ∈ Y t , (y, t + 1) ∈ Y t+1 , (x, y) ∈ E}, t = 0, T2 − 1; f
E = {((x, t), (y, T2 )) | (x, t) ∈ Y t , (y, T2 ) ∈ Y T2 , (x, y) ∈ E, t = T1 − 1, T2 − 2}.
54
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
In case T1 = T2 we obtain a T2 -partite network. So, the sets Y t = (X, t), t = 0, T2 , represent T2 + 1 copies of the set X where level sets (layers) Y t and Y t+1 are linked by edges of the form (x, t), (y, t + 1) if (x, y) ∈ E. Additionally, in G there exist edges (x, t), (y, T2 ) which connect the set (X, t) and (X, T2 ), t = T1 , T2 − 2. We define the cost functions ci((x,t),(y,t+1)) and ci((x,t),(y,T2 )) on the edges ((x, t), (y, t + 1)), ((x, t), (y, T2 )) in G in the following way: ci((x,t),(y,t+1)) = c(x,y) (t), ci((x,t),(y,T2 ))
= c(x,y) (t),
t = 0, T2 − 1, i = 1, p; t = T1 − 1, T2 − 2, i = 1, p.
The sets Zi of the players’ position in that auxiliary network are Zi = t (Xi , t), i = 1, p. The main properties we will use for the problem are based on the following lemma: Lemma 1.34. Let P (y0 , yf ) be an arbitrary directed path from y0 to yf in the graph G. Then the number of edges |E(P (y0 , yf ))| of the directed path satisfies the condition: T1 ≤ |E(P (y0 , yf ))| ≤ T2 . Moreover, in G there exists a directed path P (y0 , yf ) from y0 to yf if and only if in G there exists a directed path P (x0 , xf ) from x0 to xf (P (x0 , xf ) may contain directed cycles), which contains the same number of edges |E(P (x0 , xf )| = |E(P (y0 , yf ))|. Proof. Let P (y0 , yf ) be an arbitrary path from y0 to yf in G and let us show that T1 ≤ |E(P (y0 , yf ))| ≤ T2 , where E(P (y0 , yf )) is the set of edges of the path P (y0 , yf ). Indeed the path P (y0 , yf ) contains at least T1 edges because it passes through all the layers T2 Y 0 , Y 1 , Y 2 , . . . , Y T1 −1 and then goes to one of the positions y ∈ i=T Y i. 1 On the other hand the number of edges of path P (y0 , yf ) can not exceed T2 because each vertex (x, t) of the level sets Y T1 , Y T1 +1 , . . . , Y T2 is connected with (xf , T2 ) in G (if in G there exists an edge (x, xf )). Corollary 1.35. • Let G be a graph without directed cycles. Then G is an acyclic graph and in G there exists a directed path P (y0 , yf ) from y0 to yf with the property: T1 ≤ |E(P (y0 , yf ))| ≤ T2 if and only if in G there exists a path P (x0 , xf ) from x0 to xf with the property: T1 ≤ |E(P (x0 , xf ))| ≤ T2 .
1.9 Determining Nash Equilibria for Non-Stationary Dynamic c-Games
55
• Let G be a graph which may contain directed cycles. If P (y0 , yf ) is an arbitrary path from y0 to yf in G with vertex set X(P (y0 , yf )) = {y0 , y1 , y2 , . . . , yT (xf ) = yf }, where yt = (xt , t), t = 0, T (xf ), then {x0 , x1 , x2 , . . . , xT (xf ) = xf } generates in G a directed path P (x0 , xf ) from x0 to xf (P (x0 , xf ) may contain directed cycles). So, one may conclude that the auxiliary time-expanded network gives all admissible directed paths from x0 to xf for the considered problems (for the non-cooperative and the cooperative case). On the basis of this result an algorithmic solution is presented. 1.9.2 Determining Nash Equilibria Now we show that the problem of finding Nash equilibria for non-stationary dynamic c-games can be reduced to the stationary case of the game on an auxiliary time-expanded network with constant cost functions on the edges and without restrictions on the number of stages. Theorem 1.36. Let (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t)), x0 , xf , T1 , T2 ) be a network with positive and nondecreasing cost functions cie (t), i = 1, p, on edges e ∈ E. Moreover, let us assume that in G = (X, E) there exists a directed path PG (x0 , xf ) from x0 to xf such that T1 ≤ |E(PG (x0 , xf ))| ≤ T2 , i.e. PG (x0 , xf ) = {x0 , e0 , x1 , e1 , x2 , . . . , xT (xf )−1 , eT (xf )−1 , xT (xf ) }, where T1 ≤ T (xf ) ≤ T2 (here PG (x0 , xf ) may contain directed cycles). Then for the non-stationary dynamic c-game on the network there exist non-stationary strategies in the sense of Nash u∗1 , u∗2 , . . . , u∗p . Proof. Let us consider arbitrary stationary strategies s1 , s2 , . . . , sp of the players in the dynamic c-game on the auxiliary time-expanded network (G, Z1 , Z2 , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ) with constant cost functions on the edges. It is obvious that in the initial dynamic c-game on (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) we uniquely can determine the non-stationary strategies u1 , u2 , . . . , up of the players as follows: si (x, t) = ui (x, t) for (x, t) ∈ Xi × {1, 2, . . . , T }, i = 1, p. So, between the set of stationary strategies of the players on network (G, Z1 , Z2 , , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ) and the set of non-stationary strategies of the players on (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) there exists
56
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
a bijective mapping, which preserves integral-time costs on certain trajectories: If s∗1 , s∗2 , . . . , s∗p is an equilibrium solution in the sense of Nash for the stationary case of the problem on network (G, Z1 , Z2 , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ) then we observe that u∗ (x, t) = s∗ (x, t) for (x, t) ∈ Xi × {1, 2, . . . , T2 }, i = 1, p, is an equilibrium solution in the sense of Nash for the non-stationary case of the game on network (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ). Since the time t on the time expanded network for every position is determined by the level set the cost functions ci((x,t),(y,t+1) , ci((x,t),(y,T2 ) , i = 1, p, on the auxiliary network can be considered as constant. Therefore, if in G there exists a directed path PG (y0 , yf ) from y0 to yf then for the dynamic c-game on the auxiliary network a Nash equilibrium exists. According to Lemma 1.34 such a path PG (y0 , yf ) exists in G because in G there exists a directed path PG (x0 , xf ) = {x0 , e0 , x1 , e1 , x2 , . . . , xT (xf )−1 , eT (xf )−1 , xT (xf ) } where T1 ≤ T (xf ) ≤ T2 (PG (x0 , xf ) may contain directed cycles).
On the basis of this theorem now we can propose the following algorithm for determining the equilibrium non-stationary strategies of the players in such dynamic c-games. Algorithm 1.37. Determining the Optimal Non-Stationary Strategies in a Dynamic c-Game 1. Construct the auxiliary time-expanded network (G, Z1 , Z2 , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ). 2. Define the equilibrium stationary strategies s∗1 , s∗2 , . . . , s∗p in the dynamic c-game on (G, Z1 , Z2 , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ). 3. Put u∗i (x, t) = s∗i (x, t) for (x, t) ∈ Xi × {1, 2, . . . , T2 },
i = 1, p.
In the next section we extend our approach for multi-objective control problems on networks with Pareto optimality principles.
1.10 Application of the Dynamic c-Game
57
1.10 Application of the Dynamic c-Game for Studying and Solving Multi-Objective Control Problems Let us show that the results from Section 1.9 can be used for studying and solving the multi-objective control problems from Section 1.1. At first we consider the problem from Section 1.1.2 (Problem 1.2), for which the alternate players’ control condition is satisfied. We regard this prob. . . , Zp , lem as a dynamic c-game determined by an acyclic network (G, Z1 , Z2 , p c1 , c2 , . . . , cp , y0 , yf ), where the graph G = (Y, E) with partition Y = i=1 Zi and constant functions ci : E → R, i = 1, p, are defined in the following way: states corThe set of vertices Y consists of T2 + 1 copies of the set Tof 2 , i.e. Y = (X, t) with responding to moments in time t = 0, 1, 2, . . . , T 2 t=0 p the partition Y = i=1 Zi , determined by the alternate players’ condition T2 (X i (t), t). Zi = t=0 The set of edges E is also represented as E = E 0 ∪ E 1 ∪ E 2 ∪ · · · ∪ E T1 ∪ E T1 +1 ∪ · · · ∪ E T2 −1 ∪ E f ; where Y t = (X, t),
t = 0, T2 ;
t
E = {((x, t), (y, t + 1)) | (x, t) ∈ Y t , (y, t + 1) ∈ Y t+1 , (x, y) ∈ E}, t = 0, T2 − 1; E f = {((x, t), (y, T2 )) | (x, t) ∈ Y t , (y, T2 ) ∈ Y T2 , (x, y) ∈ E, t = T1 − 1, T2 − 2}. Note that (x, t) ∈ (X, t), and here, the notation (x, t) has the same meaning as x(t), i.e. (x, t) = x(t). In our network we fix y0 = (x0 , 0) ∈ (X, 0) and yf = (xf , T2 ) ∈ (X, T2 ). We define the cost functions ci , i = 1, p, as follows: ci((x,t),(y,t+1)) = ci(x(t),y(t+1)) , if y(t + 1) = gti (x(t), ui (t)) for given x(t) ∈ Zi and ui (t) ∈ Uti (x(t)), i = 1, p, t = 0, T2 − 1; ci((x,t),(y,T2 )) = ci(x(t),y(t+1)) , if y(t + 1) = gti (x(t), ui (t)) for given x(t) ∈ Zi and ui (t) ∈ Uti (x(t)), i = 1, p, t = T1 − 1, T2 − 2. It is easy to see that in this network every directed path PG (z0 , zf ) from z0 to zf contains |E(PG (z0 , zf ))| edges such that T1 ≤ |E(PG (z0 , zf ))| ≤ T2 . So, if we define the admissible solution u1 (t), u2 (t), . . . , up (t) as a set of vectors of control parameters, which satisfy the conditions (1.5), (1.6) and T1 ≤ T (xf ) ≤ T2 , then we may conclude that there exists a bijective mapping between the set of admissible solutions of the control problem in positional form and the set of admissible strategies in the dynamic c-game. This means that Theorem 1.9 holds and the following algorithm can be used:
58
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Algorithm 1.38. Determining Nash Equilibria for Multi-Objective Control Problems in Positional Form 1. Construct an auxiliary network (G, Z1 , Z2 , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ) according to the rules described above. 2. Find the optimal stationary strategy in the dynamic c-game determined by the network (G, Z1 , Z2 , . . . , Zp , c1 , c2 , . . . , cp , y0 , yf ) and the directed path Ps∗∗ = {y0 = (x∗0 , 0), (x∗1 , 1), . . . , (x∗t , t), (x∗t+1 , t + 1)}, . . . , yf = (x∗f , T2 )}. 3. Starting from the final position (yf∗ , T2 ) find recursively ui∗ (t), t = T − 1, T − 2, . . . , 1, 0, such that x∗ (t + 1) = gt (x∗ (t), u1∗ (t), u2∗ (t), . . . , up∗ (t)). Then u1∗ (t), u2∗ (t), . . . , up∗ (t) is a solution of the problem. In an analogous way the problem of determining a Pareto solution of the multi-objective control problem from Section 1.1.4 can be reduced to the c1 , c2 , . . . , cp , z0 , zf ). Here we should not take control problem on network (G, p into account the partition Z = i=1 Zi .
1.11 Multi-Objective Control and Cooperative Games on Dynamic Networks Now we shall use the concept of cooperative games to formulate multiobjective control problems on networks applying the Pareto optimality principle. In an analogous way as in the previous section we distinguish two versions of the problem concerning stationary and non-stationary strategies. 1.11.1 Stationary Strategies on Networks and Pareto Solutions Let the graph of states’ transitions G = (X, E) with non-negative cost functions cie (t) on edges e ∈ E and fixed starting and final states x0 , xf ∈ X be given. On G we consider the following cooperative game: The stationary strategies of the players 1, 2, . . . , p are defined as a map: s: x → y ∈ X(x) for x ∈ X \ {xf }. For an arbitrary stationary strategy s ∈ S = {s | s : x → y ∈ X(x) for x ∈ X \ {xf }} we denote by Gs = (X, Es ) the subgraph of G generated by the edges e = (x, s(x)) for x ∈ X \ {xf }. Then for every s ∈ S in G either a unique directed path Ps (x0 , xf ) from x0 to xf exists or such a path does
1.11 Multi-Objective Control and Cooperative Games on Dynamic Networks
59
not exists in G. For a given s and fixed x0 and xf we define the quantities 1 2 p H x0 xf (s), H x0 xf (s), . . . , H x0 xf (s) in the following way: Let us assume that the path Ps (x0 , xf ) exists in G. Then it is unique and we can assign to its edges, starting with the edge that begins in x0 , numbers 0, 1, 2, . . . , ks . These numbers determine the time steps te (s) when the system passes from one state to another if the stationary strategy s is applied. We put i H x0 xf (s) = cie (te (s)), if T1 ≤ |E(Ps (x0 , xf ))| ≤ T2 ; e∈E(Ps (x0 ,xf )) i
otherwise we put H x0 xf (s) = ∞. We consider the problem of finding the set SP∗ of Pareto solutions (or one Pareto solution) in the set of stationary strategies S. Note that s∗ , where s∗ ∈ SP∗ , is called a Pareto solution if in S \ Sp∗ there is no strategy s such that i i H x0 xf (s ) ≤ H x0 xf (s∗ ), i = 1, p, i0
i0
and H x0 xf (s ) < H x0 xf (s∗ ) for an index i0 ∈ {1, 2, . . . , p}. 1.11.2 A Pareto Solution for the Problem with Non-Stationary Strategies on Networks The non-stationary strategy for our cooperative dynamic game is defined as a map: u: (x, t) → (y, t + 1) ∈ X(x) × {t + 1} for x ∈ X \ {xf },
t = 0, 1, 2, . . .
The payoff functions 1
2
p
F x0 xf (u), F x0 xf (u), . . . , F x0 xf (u) of the game are defined in the following way: Let u be an arbitrary strategy. Then u either generates in G a finite trajectory x0 = x(0), x(1), x(2), . . . , x(T (xf )) = xf from x0 to xf and T (xf ) represents the time moment when xf is reached, or u generates in G an infinite trajectory x0 = x(0), x(1), x(2), . . . , x(t), x(t + 1), . . . which does not pass through xf , i.e. T (xf ) = ∞. In both cases the next state x(t + 1) is determined uniquely by x(t) and u(t) as follows: x(t + 1) = u(x(t), t),
t = 0, 1, 2, . . ..
60
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
If the state xf is reached at a finite moment in time T (xf ) (i.e. the attainability T1 ≤ T (xf ) ≤ T2 is guaranteed) then we set T (x0 )−1
i
F x0 xf (u) =
t=0
ci(x(t),x(t+1)) (t),
i = 1, p;
otherwise we put i
F x0 xf (u) = ∞,
i = 1, p.
1.12 Determining Pareto Solutions for Multi-Objective Control Problems on Networks Note that in the considered multi-objective control problems a Pareto solution always exists if in G there exists at least one directed path P (x0 , xf ) from x0 to xf and the costs cie , i = 1, 2, . . . , p, on the edges e ∈ E are nonnegative. We propose algorithms for solving the stationary and non-stationary cases of the problems. 1.12.1 Determining Pareto Stationary Strategies First of all, an algorithm for determining Pareto stationary solutions for multiobjective control problems on networks without restrictions on the number of stages when the costs on the edges are constant and positive functions is proposed: Algorithm 1.39. Determining Pareto Solutions for the Problem with Constant Costs on the Edges Assume that in G there exists at least a directed path from each vertex x ∈ X to xf and the costs cie , i = 1, 2, . . . , p, on the edges e ∈ E are nonnegative. i
Preliminary step (step 0): Set X 0 = {xf }; E 0 = ∅; H xf xf = 0, i = 1, p. General step (step k, k ≥ 1): If X k−1 = X then find the set of edges
E(X k−1 ) = e = (y, x) ∈ E x ∈ X k−1 , y ∈ X \ X k−1 . Then find an edge e = (y , x ) in E(X k−1 ) such that the following conditions are satisfied: a) ir
ir
H y xf = H x xf + ci(yr ,x ) = for an index ir ∈ {1, 2, . . . , p};
min
(y,x)∈E(X k−1 )
i
r r H xxf + ci(y,x)
1.12 Determining Pareto Solutions for Multi-Objective Control Problems
61
b) there is no edge (y, x) ∈ E(X k−1 ) such that i
i
i0
i0
H xxf + ci(y,x) ≤ H x xf + ci(y ,x ) , 0 H xxf + ci(y,x) < H x xf + ci(y0 ,x )
For given y fix
i
i
i = 1, p and for an index i0 ∈ {1, 2, . . . , p}.
H y xf = H x xf + ci(y ,x ) ,
i = 1, p.
After that, put X k = X k−1 ∪ {y }, E k = E k−1 ∪ {(y , x )} and go to the next step. If X k−1 = X (k = n) then find the tree GT n−1 = (X, E n−1 ) which determines the optimal Pareto strategy s∗ of the players as follows: s∗ (y) = x for y ∈ X \ {xf } if (y, x) ∈ E n−1 . Algorithm 1.39 is an extension of Dijkstra’s algorithm [15, 18] for a multiobjective version of the optimal paths problem in a weighted directed graph. The algorithm determines the Pareto stationary strategy s∗ of the players for the multi-objective control problem on network (G, X, c1 , c2 , . . . , cp , x, xf , T1 , , T2 ) with an arbitrary starting position x ∈ X and a given final position xf ∈ X, i.e. the tree GT n−1 = (X, E n−1 ) gives all Pareto optimal paths from every x ∈ X to xf . Theorem 1.40. Let a network (G, X, c1 , c2 , . . . , cp , x, xf ) for which in G = (X, E) there exists a directed path PG (x, xf ) from an arbitrary x ∈ X to xf be given. Additionally, assume that the costs cie , i = 1, 2, . . . , p, e ∈ E, are positive. Then Algorithm 1.39 finds Pareto stationary strategies of the players in the multi-objective control problem on network (G, X, c1 , c2 , . . . , cp , x, xf ) for every given starting position x and final position xf . The running-time of the algorithm is O(|X|3 p). Proof. We prove this theorem by using an induction principle on the number of players p. In case p = 1 Algorithm 1.39 becomes Dijkstra’s algorithm for determining the tree of shortest paths in a weighted directed graph; therefore, the theorem holds. Let us assume that the theorem holds for any p ≤ q, q ≥ 1, and let us show that it is true for p = q + 1. We consider an auxiliary graph Gq+1 = (X, Eq+1 (E \ E(Xp+1 ))), where Eq+1 represents the set of edges e = (y , x ) found by the iterations of Algorithm 1.39 with ir = q + 1 and Xq+1 = {y ∈ X | e = (y , x ) ∈ Eq+1 }, E(Xp+1 ) = {e ∈ E | e = (y , x) ∈ E}. Based on the conditions a) and b) of Algorithm 1.39 we may conclude that if we find a Pareto solution of the multi-objective problem on G with respect to
62
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
the players 1, 2, . . . , q, q + 1, then we obtain the same solution of the problem as the one on G. Taking into account that in G every vertex y ∈ Xq+1 has only one leaving edge, we may regard our problem on G as a multi-objective one with respect to the players 1, 2, . . . , q. According to the induction principle Algorithm 1.39 finds a Pareto solution for the multi-objective problem with respect to the players 1, 2, . . . , q. In such a way we obtain the Pareto solution s∗ of the problem on the auxiliary graph G which at the same time is the Pareto solution of the problem on G with respect to the players 1, 2, . . . , q, q + 1. It is also easy to observe that the number of elementary operations at the general step of the algorithm is O(|X|2 p). Therefore, the running-time of the algorithm is O(|X|3 p). Example. Consider the problem of determining Pareto strategies on a network with two players. The corresponding network with sink vertex xf = 5 is given in Fig. 1.15. The values of the cost functions on the edges for the players 1 and 2 are given in parenthesis alongside them.
(3,5)
0 (3,2)
3
2
(5,2)
(2,4)
(4,2)
(1,3)
(1,3)
5
(2,1)
(1,5)
(3,3)
1
4 (2,4)
Fig. 1.15.
If we apply Algorithm 1.39 then we obtain: Step 0. 1 2 X 0 = {5}; E 0 = ∅; H 5,5 = 0, H 5,5 = 0; Step 1. X 0 = X, therefore we find E(X 0 ) = {(3, 5), (4, 5)}. It is easy to check that the edge e = (3, 5) ∈ E(X 0 ) satisfies the conditions a) and b) for ir = 1. Indeed, 1
1
1
1
H 3,5 = H 5,5 + c1(3,5) = min{H 5,5 + c1(3,5) , H 5,5 + c1(4,5) } = min{0 + 2, 3 + 0} = 2 and condition b) holds. 1 2 For vertex 3 we fix H 3,5 = 2 and H 3,5 = 0 + 4 = 4. After that we put X 1 = X 0 ∪ {3}; E 1 = E 0 ∪ {(3, 5)}, i.e. X 1 = {3, 5}, E 1 = {(3, 5)}.
1.12 Determining Pareto Solutions for Multi-Objective Control Problems
63
Step 2. X 1 = X, therefore we find E(X 1 ) = {(0, 3), (2, 3), (4, 3), (4, 5)}. If we put ir = 1 then edge e = (4, 5) ∈ E(X 1 ) satisfies the conditions a) and b) of the algorithm. Indeed, 1
1
H 4,5 = H 5,5 + c1(4,5) 1
1
1
1
= min{H 3,5 + c1(0,3) , H 3,5 + c1(2,3) , H 3,5 + c1(4,3) , H 3,5 + c1(4,5) } = min{2 + 3, 2 + 4, 2 + 2, 3 + 0} = 3 and condition b) holds. 1 2 2 So, we fix H 4,5 = 3 and H 4,5 = H 5,5 + c2(4,5) = 3. After that we put X 2 = X 1 ∪ {4}; E 2 = E 1 ∪ {(4, 5)}, i.e. X 2 = {3, 4, 5}, E 2 = {(3, 5), , (4, 5)}. Step 3. X 2 = X, therefore we find E(X 2 ) = {(3, 1), (4, 2), (2, 3), (1, 4)}. If we put ir = 2 then edge e = (2, 3) ∈ E(X 2 ) satisfies the conditions a) and b), i.e. 2
2
H 2,5 = H 3,5 + c2(2,3) 2
2
2
2
= min{H 3,5 + c2(0,3) , H 3,5 + c2(2,3) , H 4,5 + c2(2,4) , H 4,5 + c2(1,4) } = min{4 + 5, 4 + 2, 3 + 5, 3 + 4} = 6 and condition b) holds. 2 1 1 So, we fix H 2,5 = 6 and H 2,5 = H 3,5 + c1(2,3) = 6. After that we put X 3 = X 2 ∪ {2}; E 3 = E 2 ∪ {(2, 3)}, i.e. X 3 = {2, 3, 4, 5}, E 3 = {(2, 3), (3, 5), , (4, 5)}. Step 4. X 3 = X, therefore we find E(X 3 ) = {(0, 3), (0, 2), (1, 2), (1, 4)}. If we put ir = 2 then edge e = (1, 4) ∈ E(X 3 ) satisfies the conditions a) and b), i.e. 2
2
H 1,5 = H 4,5 + c2(1,4) 2
2
2
2
= min{H 4,5 + c2(1,4) , H 2,5 + c2(1,2) , H 2,5 + c2(0,2) , H 3,5 + c2(0,3) } = min{3 + 4, 6 + 3, 6 + 2, 4 + 5} = 7 and condition b) holds. 2 1 1 So, we fix H 1,5 = 7 and H 2,5 = H 4,5 + c1(1,4) = 3 + 2 = 5. After that we put X 4 = X 3 ∪ {1}; E 4 = E 3 ∪ {(1, 4)}, i.e. X 4 = {1, 2, 3, 4, 5}, E 4 = {(1, 4), (2, 3), (3, 5), (4, 5)}.
64
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Step 5. X 4 = X, therefore we find E(X 4 ) = {(0, 3), (0, 2), (0, 1)}. We put ir = 1 and find edge e = (0, 3) ∈ E(X 4 ) which satisfies the conditions a) and b) of the algorithm 1
1
H 0,5 = H 3,5 + c1(0,3) 1
1
1
= min{H 3,5 + c1(0,3) , H 2,5 + c1(0,2) , H 2,5 + c1(0,1) } = min{2 + 3, 6 + 3, 5 + 5} = 5 and condition b) holds. 1 2 2 So, H 0,5 = 5 and H 0,5 = H 3,5 + c2(0,3) = 4 + 5 = 9. After that we put X 5 = X 4 ∪ {0}; E 5 = E 4 ∪ {(0, 3)}, i.e. X 5 = {0, 1, 2, 3, 4, 5}, E 5 = {(0, 3), (1, 4), (2, 3), (3, 5), (4, 5)}. Finally, we have obtained X 5 = X, therefore we fix the tree GT n−1 = (X, E n−1 ). This tree gives Pareto optimal paths from an arbitrary vertex x ∈ X to xf = 5 (see Fig. 1.16), i.e. s∗ : 0 → 3;
1 → 4;
2 → 3;
0
4 → 5.
3
2
5
1
4 Fig. 1.16.
Remark 1.41. Another approach for solving a multi-objective control problem with the Pareto optimality principle is based on reducing it to a single objective control problem with a certain convolution criterion [10, 11, 12, 23, 97]: H x0 xf (s) =
p i=1
where
p i=1
αi = 1;
i
αi H x0 xf (s),
αi > 0, i = 1, p.
1.12 Determining Pareto Solutions for Multi-Objective Control Problems
65
Note that in such a way we can find a Pareto solution for some classes of the problem with positive nondecreasing costs cie (t), i = 1, p, on the edges of the networks. But in the general case, via such an approach, not all Pareto solutions can be determined. 1.12.2 Pareto Solution for the Non-Stationary Case of the Problem In order to solve the non-stationary case of the problem we shall use timeexpanded networks. The time-expanded network for the cooperative case of the problem is defined in the same way as for the non-cooperative case. There is only one single exception: We do not take into account the partition Y = Z1 ∪ Z2 ∪ · · · ∪ Zp . The problem of determining non-stationary Pareto strategies for a multiobjective control problem on network (G, X, c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) can be reduced to the problem from Section 1.12.1 on an auxiliary time-expanded network (G, Y, c1 , c2 , . . . , cp , y0 , yf ) with constant costs on the edges. This reduction is based on the following theorem: Theorem 1.42. Let (G, Y, c1 , c2 , . . . , cp , y0 , yf ) be an auxiliary time-expanded network for a given network (G, X, c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ). If s∗ is a Pareto stationary strategy of the players for a multi-objective control problem on network (G, Y, c1 , c2 , . . . , cp , y0 , yf ) then u∗ (x, t) = s∗ (x, t) for (x, t) ∈ X × {1, 2, . . . , T2 } is a non-stationary Pareto strategy for the multi-objective control problem on network (G, X, c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ). This theorem can be proved in the analogous way as Theorem 1.36. For finding non-stationary Pareto strategies of the players for the multiobjective control problem the following algorithm can be used: Algorithm 1.43. Determining Pareto Solutions for the Non-stationary Case of the Problem 1. For a given network (G, X, c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) construct the auxiliary time-expanded network (G, Y, c1 , c2 , . . . , cp , y0 , yf ). 2. Determine Pareto stationary strategies for the multi-objective control problem on network (G, Y, c1 , c2 , . . . , cp , y0 , yf ) using Algorithm 1.39. 3. Put u∗ (x, t) = s∗ (x, t) for (x, t) ∈ X × {1, 2, . . . , T2 }. 1.12.3 Computational Complexity of the Stationary Case of the Problem and an Algorithm for its Solving on Acyclic Networks Note that our stationary multi-objective control problem on the general network is N P -hard [34] even in case p = 1, T1 = T2 = |X| − 1, because it becomes a Hamiltonian path problem in a directed graph where
66
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
all cost functions on the network are constantly equal to 1. Therefore, in the general case this problem is N P -hard. But if G has the structure of an acyclic graph then the stationary Pareto solution s∗ on network (G, X, c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) can be found by using a Pareto stationary strategy s∗ for the problem on the auxiliary network (G, Y, c1 (t), c2 (t), , . . . , cp (t), y0 , yf ) in the following way: Algorithm 1.44. Determining a Stationary Pareto Solution on Acyclic Networks Preliminary step (step 0): Fix an arbitrary Pareto solution s∗ : y → z ∈ Y (y) for y ∈ Y for the problem on network (G, Y, c1 (t), c2 (t), . . . , cp (t), y0 , yf ) by using Algorithm 1.39. Then put W 0 = {(x0 , 0), (x1 , 1), . . . , (xT (xf ) , T (xf ))}, X 0 = {x0 , x1 , . . . , xT (xf ) } and fix s∗ (xt ) = x(t + 1) for xt ∈ X 0 , t = 0, T (xf ) − 1, where (x0 , 0), (x1 , 1), . . . , (xT (xf ) , T (xf )) is a trajectory generated by a Pareto stationary strategy s∗ in the auxiliary time-expanded network (G, Y, c1 (t), c2 (t), . . . , cp (t), y0 , yf , T1 , T2 ). If in the auxiliary time-expanded network there is no directed path from y0 to yf then for the considered problem on network (G, X, c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) a Pareto stationary strategy does not exist. General step (step k, k ≥ 1): If X k−1 = X then STOP, otherwise determine the set
Ws∗ (X k−1 ) = (x, t) ∈ (X \ X k−1 ) × {1, 2, . . . , T2 } s∗ (x, t) ∈ W k−1 . If Ws∗ (X k−1 ) = ∅ then find a vertex (x , t ) ∈ Ws∗ (X k−1 ) with a minimal t for a given x and fix s∗ (x ) = z if s∗ (x , t ) = (z, t + 1). After that construct sets X k = X k−1 ∪ {x }, W k = W k−1 ∪ {(x , t )} and go to the next step. Some similar multi-objective problems on dynamic networks have been studied in [47].
1.13 Determining Pareto Optima for Multi-Objective Control Problems Based on the results from Section 1.12 we can describe the algorithm for solving the problem from Section 1.1.4. We reduce this problem to a similar problem on an auxiliary network (G, c1 , c2 , . . . , cp , y0 , yf ), where the graph G = (Y, E) is obtained in the following way: set of states correThe set of vertices Y consists of T2 + 1 copies of the T2 (X, t). sponding to a moment in time t = 0, 1, 2, . . . , T2 , i.e. Y = t=0
1.14 Determining a Stackelberg Solution for Hierarchical Control Problems
67
The set of edges E is defined as follows: E = E 0 ∪ E 1 ∪ E 2 ∪ · · · ∪ E T1 ∪ E T1 +1 ∪ · · · ∪ E T2 −1 ∪ E f ; where Y t = (X, t),
t = 0, T2 ;
E t = {((x, t), (y, t + 1)) | (x, t) ∈ Y t , (y, t + 1) ∈ Y t+1 , (x, y) ∈ E}, t = 0, T2 − 1; E f = {((x, t), (y, T2 )) | (x, t) ∈ Y t , (y, T2 ) ∈ Y T2 , (x, y) ∈ E, t = T1 − 1, T2 − 2}. Note that (x, t) ∈ (X, t), and here, the notation (x, t) has the same meaning as x(t), i.e. (x, t) = x(t). In our network we fix y0 = (x0 , 0) ∈ (X, 0) and yf = (xf , T2 ) ∈ (X, T2 ). We define the cost functions ci , i = 1, p, as follows: ci((x,t),(y,t+1)) = ci(x(t),y(t+1)) ,
if y(t + 1) = g t (x(t), u(t))
for given u(t) ∈ Ut (x(t)), i = 1, p, t = 0, T2 − 1; ci((x,t),(y,T2 )) = ci(x(t),y(t+1)) ,
if y(t + 1) = g t (x(t), u(t))
for given u(t) ∈ Ut (x(t)), i = 1, p, t = T1 − 1, T2 − 2. If the network (G, c1 , c2 , . . . , cp , y0 , yf ) is known then find Pareto stationary strategies and a directed path Ps∗∗ = {y0 = (x∗0 , 0), (x∗1 , 1), . . . , (x∗t , t), (x∗t+1 , t + 1), . . . , (x∗f , T2 ) = yf }. After that, starting from final position x∗f , T2 find recursively u∗ (t), such that
t = T − 1, T − 2, . . . , 1, 0,
x∗ (t + 1) = gt (x∗ (t), u∗ (t)).
Then u∗ (t) is a solution of the problem.
1.14 Determining a Stackelberg Solution for Hierarchical Control Problems We consider the hierarchical control problem from Section 1.1.3. In order to develop a dynamic programming technique for determining a Stackelberg solution we study the static case of the hierarchical problem and analyze the computational complexity of this problem. Additionally, we formulate the hierarchical control problem on the network and propose a dynamic programming algorithm for its solving. Based on these results we extend the dynamic programming technique for the hierarchical control problem from Section 1.1.3.
68
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
1.14.1 A Stackelberg Solution for Static Games Let a static game of p players Γ = (S1 , S2 , . . . , Sp , F1 , F2 , . . . , Fp ) be given, where Si , i = 1, p, represent nonempty finite sets of strategies of the players and Fi : S1 × S2 × Sp → R1 , i = 1, p, are the corresponding payoff functions in Γ . We consider the problem of determining a Stackelberg solution in this game, i.e. we are seeking for strategies s∗1 , s∗2 , . . . , s∗p such that s∗1 = s∗2 = s∗3 =
argmin
s1 ∈S 1 , (si ∈R2 (s1 ,...,si−1 ))2≤i≤p
argmin
F1 (s1 , s2 , . . . , sp );
s2 ∈R2 (s∗ 1 ), (si ∈Ri (s∗ 1 ,s2 ,...,si−1 ))3≤i≤p
F2 (s∗1 , s2 , . . . , sp );
argmin
∗ s3 ∈R3 (s∗ 1 ,s2 ), ∗ (si ∈Ri (s∗ ,s ,s ,...,s i−1 ))4≤i≤p 1 2 3
F3 (s∗1 , s∗2 , . . . , sp );
.. . s∗p =
argmin
∗ ∗ sp ∈Rp (s∗ 1 ,s2 ,...,sp−1 )
Fp (s∗1 , s∗2 , . . . , s∗p−1 , sp ),
where Rk (s1 , s2 , . . . , sk−1 ) is the set of best responses of player k when the players 1, 2, . . . , k − 1 have already fixed their strategies s1 , s2 , . . . , sk−1 , i.e.
R2 (s1 ) =
R3 (s1 , s2 ) =
argmin
s2 ∈S2 , (si ∈Ri (s1 ,...,si−1 ))3≤i≤p
argmin
F2 (s1 , s2 , . . . , sp );
s3 ∈S3 , (si ∈Ri (s1 ,s2 ,...si−1 ))4≤i≤p
F3 (s1 , s2 , . . . , sp );
.. . Rp (s1 , s2 , . . . , sp−1 ) = argmin Fp (s1 , s2 , . . . , sp ). sp ∈Sp
In this game the players fix their strategies successively according to their numerical order. Therefore, if the order of fixing the strategies of the players is changed then the best responses of the players will correspond to a Stackelberg solution with respect to a new order of the players.
1.14 Determining a Stackelberg Solution for Hierarchical Control Problems
69
Lemma 1.45. Let s∗1 , s∗2 , . . . , s∗p be a Stackelberg solution of the game Γ . If this solution remains the same for an arbitrary order of fixing strategies of the players, then s∗1 , s∗2 , . . . , s∗p is a Nash equilibrium. Proof. Assume that s∗1 , s∗2 , . . . , s∗i−1 , s∗i , s∗i+1 , . . . , s∗p is a Stackelberg solution for an arbitrary order of fixing strategies of the players. Then we may consider that an arbitrary player i fixes his strategy in the last order and therefore Fi (s∗1 , s∗2 , . . . , s∗i−1 , s∗i , s∗i+1 , . . . , s∗p ) ≤ ≤ Fi (s∗1 , s∗2 , . . . , s∗i−1 , si , s∗i+1 , . . . , s∗p ),
∀si ∈ Si , i = 1, p.
So, s∗1 , s∗2 , . . . , s∗i−1 , s∗i , s∗i+1 , . . . , s∗p is a Nash equilibrium.
The computational complexity of determining pure Nash equilibria in discrete games have been studied in [24]. Based on results from [24] we can conclude that finding a Stackelberg solution in the considered games is NPhard if the number of players p acts as input data parameter of the problem. In the case that p is fixed (i.e. p does not fact as input data parameter of the problem), then a Stackelberg solution can be found in polynomial time. In the case of a small number of players, especially in the case of two or three players, exhaustive search allows us to determine Stackelberg strategies for large dimensioned finite games. Indeed, if we calculate s∗1 , s∗2 , . . . , s∗p according to the condition from the definition of a Stackelberg solution we use |S1 | × |S2 | × · · · × |Sp | steps. So, in the case of two players we can determine a Stackelberg solution using O(|S1 ||S2 |) elementary operations (here we do not take into account the number of operations for calculating the values Fi (s1 , s2 ) for given (s1 , s2 ). We can use this fact for solving hierarchical control problems with two or three players. 1.14.2 Hierarchical Control on Networks and Determining Stackelberg Stationary Strategies Let G = (X, E) be the graph of states’ transitions for a time-discrete system L. So, X corresponds to the set of states of L and an arbitrary directed edge e = (x, y) ∈ E means the possibility of system L to pass from state x = x(t) to state y = x(t + 1) at every moment in time t = 0, 1, 2, . . . Assume that system L is controlled by p players and on the edge set the following p functions are defined: ci : E → R1 , i = 1, p, which assign p costs c1e , c2e , . . . , cpe to each edge e ∈ E. For player i the quantity cie expresses the cost of system L to pass through edge e = (x, y) from state x = x(t) to state y = x(t + 1) at every moment in time t = 0, 1, 2, . . . On G the players use only stationary strategies and intend to minimize their integral-time costs by a trajectory x0 = x(0), x(1), x(2), . . . , x(T (xf )) = xf from starting state x0 to final state xf , where T1 ≤ T (xf ) ≤ T2 . We define the stationary strategies of the players as p multi-value functions
70
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
s1 : x → X1j1 (x) ∈ A1 (x) for x ∈ X \ {xf }; s2 : x → X2j2 (x) ∈ A2 (x) for x ∈ X \ {xf }; .. . sp : x → Xpjp (x) ∈ Ap (x) for x ∈ X \ {xf }; which satisfy the condition |s1 (x) ∩ s2 (x) ∩ · · · ∩ sp (x)| = 1,
∀x ∈ X,
(1.17)
where Ai (x), i = 1, p, are given sets of subsets from X(x) = {y ∈ X | e = K (x) (x, y) ∈ E}, i.e. Ai (x) = {Xi1 (x), Xi2 (x), . . . , Xi i (x)}, Xij (x) ⊆ Xi (x), j = 1, Ki (x). The strategies s1 (x), s2 (x), . . . , sp (x) for a given x ∈ X\{xf } correspond to vectors of control parameters u1 (t), u2 (t), . . . , up (t) at a given state x = x(t) in the control problem 1.3 and reflect the fact that the set of control parameters at the given state x = x(t) uniquely determines the next state y = x(t+1) ∈ X at every moment in time tp = 0, 1, 2, . . . . Therefore, here we use condition (1.17) and consider that y = i=1 si (x). If | sj (x)| = 1 then the set of strategies s1 , s2 , . . . , sp is not feasible. In the following we consider that the players use only feasible strategies. An arbitrary set Xij (x) ∈ Ai (x) in our control problem represents a possible set of next states y = x(t + 1) ∈ X in which player i prefers to transfer system L if at the moment in time t the state of the dynamical system is x = x(t). This set for control problem 1.3 can be treated as a set of possible next states y = x(t+1) ∈ X when player i fixes a feasible vector of control parameters ui (t) ∈ Uti (x(t)). Therefore, if we treat Xij (x) as preferences of next possible sets of states from Ai (x) for player i ∈ {1, 2, . . . , p} then the unique next state y represents the intersection of preferences s1 (x), s2 (x), . . . , sp (x) p of the players 1, 2, . . . , p, i.e. y = i=1 si (x), where si : x → Xiji ∈ Ai (x) for x ∈ X \ {xf }, i = 1, p. Let s1 , s2 , . . . , sp be a fixed set of feasible strategies of the players 1, 2, . . . , p. We denote by G sp = (X, Es ) the subgraph generated by edges e = (x, y) ∈ E such that y = i=1 si (x) for x ∈ X \ {xf }. In an analogous way as in Section 1.5.1 we obtain that either a unique directed path Ps (x0 , xf ) in Gs exists or such a path does not exist. Therefore, for s1 , s2 , . . . , sp and fixed starting and final states x0 , xf ∈ X we can define the quantities 1 (s1 , s2 , . . . , sp ), H 2 (s1 , s2 , . . . , sp ), . . . , H p (s1 , s2 , . . . , sp ) H x0 xf x0 xf x0 xf in the following way. We put xi x (s1 , s2 , . . . , sp ) = H 0 f
e∈E(Ps (x0 ,xf ))
cie , i = 1, p
1.14 Determining a Stackelberg Solution for Hierarchical Control Problems
71
if T1 ≤ |E(Ps (x0 , xf ))| ≤ T2 ; otherwise we put
i (s1 , s2 , . . . , sp ) = +∞. H x0 xf
Note that in this control process the players fix their strategies successively one after another according to their numerical order at each moment in time t = 0, 1, 2, . . . for every state x ∈ X \ {xf }. Additionally, we assume that each player fixing his strategies informs posterior players which strategy has been chosen. In the considered hierarchical control problem we are seeking for Stackelberg stationary strategies, i.e. we are seeking for strategies s∗1 , s∗2 , . . . , s∗p , for which s∗1 =
s∗2 =
s∗3 =
argmin
s1 ∈S 1 , (si ∈Ri (s1 ,...,si−1 ))2≤i≤p
argmin
1 (s1 , s2 , . . . , sp ); H x0 xf
s2 ∈R2 (s∗ 1 ), (si ∈R3 (s∗ 1 ,s2 ,...,si−1 ))3≤i≤p
x2 x (s∗1 , s2 , . . . , sp ); H 0 f
argmin
∗ s3 ∈R3 (s∗ 1 ,s2 ), ∗ (si ∈Ri (s∗ 1 ,s2 ,s3 ,...,si−1 ))4≤i≤p
x3 x (s∗1 , s∗2 , . . . , sp ); H 0 f
.. . s∗p =
argmin
∗ ∗ sp ∈Rp (s∗ 1 ,s2 ,...,sp−1 )
p (s∗ , s∗ , . . . , s∗ , sp ), H x0 xf 1 2 p−1
where Rk (s1 , s2 , . . . , sk−1 ) is the set of best responses of player k when the players 1, 2, . . . , k − 1 have already fixed their strategies s1 , s2 , . . . , sk−1 , i.e. R2 (s1 ) =
R3 (s1 , s2 ) =
argmin
2 (s1 , s2 , . . . , sp ); H x0 xf
argmin
3 (s1 , s2 , . . . , sp ); H x0 xf
s2 ∈S2 , (si ∈Ri (s1 ,...,si−1 ))3≤i≤p
s3 ∈S3 , (si ∈Ri (s1 ,...,si−1 ))4≤i≤p
.. . p (s1 , s2 , . . . , sp ). Rp (s1 , s2 , . . . , sp−1 ) = argmin H x0 xf sp ∈Sp
where S 1 , S 2 , . . . , S p represent the corresponding admissible sets of stationary strategies of the players 1, 2, . . . , p.
72
1 Multi-Objective Control of Time-Discrete Systems and Dynamic Games
Remark 1.46. In general the stationary strategies s1 , s2 , . . . , sp of the players in the hierarchical control problem on G can be defined as arbitrary multivalue functions si : x → Xiji (x) ∈ Ai (x) for x ∈ X \ {xf },
i = 1, p.
If the conditions (1.17) for x ∈ X \ {xf } do not take place, i.e. if at least for a state x ∈ X \ {xf } the following condition holds: |s1 (x) ∩ s2 (x) ∩ · · · ∩ sp (x)| = 1, xi x (s1 , s2 , . . . , sp ) = +∞. then we put H 0 f So, the hierarchical control problem is determined by the dynamic network (G, A1 , A2 , . . . , Ap , c1 , c2 , . . . , cp , x0 , xf , T1 , T2 ), where Ai = x∈X Ai (x) and ci = (cie1 , cie2 , . . . , cie|E| ), i = 1, p. In case T2 = ∞, T1 = 0 we denote the corresponding network by (G, A1 , A2 , . . . , Ap , c1 , c2 , . . . , cp , x0 , xf ). The following theorem allows us to describe a class of multi-objective hierarchical control problems for which an arbitrary Stackelberg solution is also a Nash equilibrium. Theorem 1.47. Let be given the hierarchical control problem on network (G, A1 , A2 , . . . , Ap , c1 , c2 , . . . , cp , x0 , xf ), where G has the property that for an arbitrary vertex x ∈ X there exist a directed path from x to xf . Additionally, assume that the sets A1 (x), A2 (x), . . . , Ap (x) satisfy the following condition: for an arbitrary vertex x ∈X \ {xf } there exists ix ∈ {1, 2, . . . , p} such that Aix (x) = {y} | y ∈ XG (x) and Aix (x) = {XG (x)} if i ∈ {1, 2, . . . , p} \ {ix }. Then for the hierarchical control problem on G there exists a Nash equilibrium. Proof. First of all it is easy to observe that if the conditions of the theorem hold then in the multi-objective control problem on G there exist stationary strategies s1 , s2 , . . . , sp which generate a trajectory x0 , x1 , x2 , . . . , xp from starting state x0 to final state xf for an arbitrary given x0 = x ∈ X. This means that a Stackelberg solution for the hierarchical control problem on G exists. Additionally, we can see that the dynamic c-game from Section 1.6 (the multi-objective control problem in positional form) represents a particular case of the problem formulated above when the sets A1 (x), A2 (x), . . . , Ap (x) satisfy the condition that for an arbitrary x ∈ X \ {xf } there exists ix ∈ {1, 2, . . . , p} such that Aix (x) = {y} | y ∈ XG (x) and Ai (x) = {XG (x)} if i ∈ {1, 2, . . . , p} \ {ix }. In this case a Stackelberg solution of the hierarchical control problem does not depend on the order of fixing strategies by the players, and therefore, on the bases of Lemma 1.45 an arbitrary Stackelberg solution of the multi-objective control problem on G is a Nash equilibrium.
1.14.3 An Algorithm for Determining Stackelberg Stationary Strategies on Acyclic Networks

We consider the hierarchical control problem on an acyclic network (G, A_1, A_2, ..., A_p, c^1, c^2, ..., c^p, x_0, x_f), i.e. G = (X, E) is an acyclic directed graph and T_1 = 0, T_2 = ∞. We also assume that in G the vertex x_f is attainable from every vertex x ∈ X.

Algorithm 1.48. Determining Stackelberg Strategies on Acyclic Networks

Preliminary step (step 0): Fix X^0 = {x_f}, E^0 = ∅ and put ε^i(x_f) = 0, i = 1, p.

General step (step k, k ≥ 1): If X \ X^{k-1} = ∅ then STOP; otherwise find a vertex x_k ∈ X \ X^{k-1} for which X_G(x_k) ⊆ X^{k-1}, where X_G(x_k) = {y ∈ X | (x_k, y) ∈ E}. With respect to the vertex x_k we consider the static problem of finding Stackelberg strategies s_1^*(x_k), s_2^*(x_k), ..., s_p^*(x_k) in the game

Γ(x_k) = (S_1(x_k), S_2(x_k), ..., S_p(x_k), F_1, F_2, ..., F_p),

where the sets of strategies of the players S_1(x_k), S_2(x_k), ..., S_p(x_k) and the payoff functions F_1, F_2, ..., F_p are defined as follows:

S_i(x_k) = {s_i(x_k) | s_i: x_k → X_i^j(x_k) ∈ A_i(x_k)}, i = 1, p;

F_i(s_1, s_2, ..., s_p) = { ε^i(y) + c^i_{(x_k, y)}, if ⋂_{i=1}^p s_i(x_k) = {y};  +∞, if |⋂_{i=1}^p s_i(x_k)| ≠ 1.   (1.18)

After that find a Stackelberg solution s_1^*(x_k), s_2^*(x_k), ..., s_p^*(x_k) for the static game Γ(x_k) and determine the vertex y^* = ⋂_{i=1}^p s_i^*(x_k). Then calculate

ε^i(x_k) = ε^i(y^*) + c^i_{(x_k, y^*)}, i = 1, p,

and put Ĥ^i_{x_k x_f}(s_1^*, s_2^*, ..., s_p^*) = ε^i(x_k), i = 1, p. Fix X^k = X^{k-1} ∪ {x_k}, E^k = E^{k-1} ∪ {(x_k, y^*)}, GT^k = (X^k, E^k) and go to the next step.

This algorithm finds Stackelberg stationary strategies s_1^*, s_2^*, ..., s_p^* for an arbitrary starting position x_0 = x and the fixed final position x_f. The corresponding optimal values of the integral costs of the system's passage from the starting state x_0 = x to the final state x_f are Ĥ^i_{x x_f}(s_1^*, s_2^*, ..., s_p^*). The algorithm finds the tree GT^{|X|-1} = (X, E^{|X|-1}) of optimal strategies with sink vertex x_f, which gives Stackelberg strategies for an arbitrary starting position x_0 = x ∈ X. It is easy to observe that if for a given starting position x_0 of the considered dynamic game a Nash equilibrium exists, then the algorithm determines this equilibrium.
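The general step of Algorithm 1.48 admits a compact implementation by backward induction over the acyclic graph. The following Python sketch treats the two-player case (p = 2) under the assumption that the leader minimizes F_1 against the follower's best response; all identifiers (succ, A1, c1 and so on) are illustrative names, not notation from the book.

def stackelberg_acyclic(succ, A1, A2, c1, c2, xf):
    # succ[x]: list of successors X_G(x); A1[x], A2[x]: lists of frozensets
    # c1[(x, y)], c2[(x, y)]: edge costs of players 1 and 2
    eps1, eps2 = {xf: 0}, {xf: 0}
    s1, s2 = {}, {}
    pending = {x for x in succ if x != xf}
    while pending:
        # pick a vertex x_k all of whose successors are already solved
        x = next(v for v in pending if all(y in eps1 for y in succ[v]))
        def F(a1, a2):
            inter = a1 & a2
            if len(inter) != 1:          # condition (1.18): payoff +infinity
                return (float('inf'), float('inf'))
            (y,) = tuple(inter)
            return (eps1[y] + c1[x, y], eps2[y] + c2[x, y])
        best = None
        for a1 in A1[x]:
            a2 = min(A2[x], key=lambda b: F(a1, b)[1])   # follower minimizes F_2
            f1, f2 = F(a1, a2)
            if best is None or f1 < best[0]:             # leader minimizes F_1
                best = (f1, f2, a1, a2)
        eps1[x], eps2[x], s1[x], s2[x] = best
        pending.remove(x)
    return eps1, eps2, s1, s2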
Note that the proposed algorithm can also be adapted to the problem in which, for different moments in time and for different states, the order of fixing the stationary strategies of the players differs. We should take this order into account when calculating the values h_i^*(x_k, y) for a given state x_k(t).

Example. Consider a hierarchical control problem on a network with two players where the corresponding graph G = (X, E) is presented in Fig. 1.17.
Fig. 1.17.
This network has the structure of a directed acyclic graph with given starting vertex x_0 = 0 and final vertex x_f = 4. To each vertex there is assigned a set of subsets A_i(x) = {X_i^j}, where

A_1(0) = {{1, 2}, {1, 4}};   A_2(0) = {{2, 4}, {1, 4}};
A_1(1) = {{2, 3}, {3, 4}};   A_2(1) = {{2, 4}, {2, 3}};
A_1(2) = {{4}};              A_2(2) = {{4}};
A_1(3) = {{2}, {2, 4}};      A_2(3) = {{4}, {2, 4}}.

On the edges e ∈ E there are defined cost functions c^1: E → R and c^2: E → R, where
c^1_{(0,1)} = 2;   c^1_{(0,2)} = 1;   c^1_{(0,4)} = 5;
c^2_{(0,1)} = 3;   c^2_{(0,2)} = 2;   c^2_{(0,4)} = 6;
c^1_{(1,2)} = 3;   c^1_{(1,3)} = 4;   c^1_{(1,4)} = 3;
c^2_{(1,2)} = 3;   c^2_{(1,3)} = 1;   c^2_{(1,4)} = 5;
c^1_{(3,2)} = 1;   c^1_{(3,4)} = 2;
c^2_{(3,2)} = 1;   c^2_{(3,4)} = 4;
c^1_{(2,4)} = 1;   c^2_{(2,4)} = 2.
If we use Algorithm 1.48 then we obtain:

Step 0. Fix X^0 = {4}, ε^1(4) = 0, ε^2(4) = 0, E^0 = ∅.

Step 1. X \ X^0 ≠ ∅ and X_G(2) ⊆ X^0; therefore, fix x_1 = 2 and solve the static game Γ(2) = (S_1(2), S_2(2), F_1, F_2), where

S_1(2) = {s_1: 2 → {4}},   S_2(2) = {s_2: 2 → {4}}

and F_1(s_1, s_2) = 1; F_2(s_1, s_2) = 2. For this game we have the trivial solution s_1^*(2) = {4}; s_2^*(2) = {4}. We calculate ε^1(2) = 0 + c^1_{(2,4)} = 1; ε^2(2) = 0 + c^2_{(2,4)} = 2 and put Ĥ^1_{24}(s_1^*, s_2^*) = 1, Ĥ^2_{24}(s_1^*, s_2^*) = 2. Fix X^1 = X^0 ∪ {2} = {2, 4}; E^1 = E^0 ∪ {(2, 4)} = {(2, 4)}; GT^1 = ({2, 4}, {(2, 4)}).

Step 2. X \ X^1 ≠ ∅ and X_G(3) ⊆ X^1; therefore, fix x_2 = 3 and solve the static game Γ(3) = (S_1(3), S_2(3), F_1, F_2), where

S_1(3) = {s_1^1: 3 → {2}; s_1^2: 3 → {2, 4}},   S_2(3) = {s_2^1: 3 → {2, 4}; s_2^2: 3 → {4}}

and F_i(s_1^j, s_2^j) are defined according to (1.18), i.e.
F_1(s_1^1, s_2^1) = 2;   F_2(s_1^1, s_2^1) = 3   (s_1^1(3) ∩ s_2^1(3) = {2});
F_1(s_1^1, s_2^2) = F_2(s_1^1, s_2^2) = ∞   (s_1^1(3) ∩ s_2^2(3) = ∅);
F_1(s_1^2, s_2^1) = F_2(s_1^2, s_2^1) = ∞   (s_1^2(3) ∩ s_2^1(3) = {2, 4}, i.e. |{2, 4}| ≠ 1);
F_1(s_1^2, s_2^2) = 2;   F_2(s_1^2, s_2^2) = 4   (s_1^2(3) ∩ s_2^2(3) = {4}).

If we solve this game we find the Stackelberg solution s_1^*(3) = {2}; s_2^*(3) = {2, 4}. We calculate ε^1(3) = ε^1(2) + c^1_{(3,2)} = 2; ε^2(3) = ε^2(2) + c^2_{(3,2)} = 3 and put Ĥ^1_{34}(s_1^*, s_2^*) = 2, Ĥ^2_{34}(s_1^*, s_2^*) = 3. Fix X^2 = X^1 ∪ {3} = {2, 3, 4}; E^2 = E^1 ∪ {(3, 2)} = {(2, 4), (3, 2)}; GT^2 = ({2, 3, 4}, {(3, 2), (2, 4)}).

Step 3. X \ X^2 ≠ ∅ and X_G(1) ⊆ X^2; therefore, fix x_3 = 1 and solve the static game Γ(1) = (S_1(1), S_2(1), F_1, F_2), where

S_1(1) = {s_1^1: 1 → {2, 3}; s_1^2: 1 → {3, 4}},   S_2(1) = {s_2^1: 1 → {2, 4}; s_2^2: 1 → {3, 4}}

and F_i(s_1^j, s_2^j) are defined according to (1.18), i.e.

F_1(s_1^1, s_2^1) = 4;   F_2(s_1^1, s_2^1) = 5   (s_1^1(1) ∩ s_2^1(1) = {2});
F_1(s_1^1, s_2^2) = F_2(s_1^1, s_2^2) = ∞   (s_1^1(1) ∩ s_2^2(1) = ∅);
F_1(s_1^2, s_2^1) = 3;   F_2(s_1^2, s_2^1) = 5   (s_1^2(1) ∩ s_2^1(1) = {4});
F_1(s_1^2, s_2^2) = 5;   F_2(s_1^2, s_2^2) = 3   (s_1^2(1) ∩ s_2^2(1) = {3}).

If we solve this game we find the Stackelberg solution s_1^*(1) = s_1^1(1); s_2^*(1) = s_2^1(1), i.e. s_1^*(1) = {2, 3}; s_2^*(1) = {2, 4}. We calculate ε^1(1) = ε^1(2) + c^1_{(1,2)} = 4; ε^2(1) = ε^2(2) + c^2_{(1,2)} = 5 and put Ĥ^1_{14}(s_1^*, s_2^*) = 4, Ĥ^2_{14}(s_1^*, s_2^*) = 5. Fix X^3 = X^2 ∪ {1} = {1, 2, 3, 4}; E^3 = E^2 ∪ {(1, 2)} = {(1, 2), (2, 4), (3, 2)}; GT^3 = ({1, 2, 3, 4}, {(1, 2), (3, 2), (2, 4)}).

Step 4. X \ X^3 ≠ ∅ and X_G(0) ⊆ X^3; therefore, fix x_4 = 0 and solve the static game Γ(0) = (S_1(0), S_2(0), F_1, F_2), where
S_1(0) = {s_1^1: 0 → {1, 2}; s_1^2: 0 → {1, 4}},   S_2(0) = {s_2^1: 0 → {2, 4}; s_2^2: 0 → {1, 4}}

and F_i(s_1^j, s_2^j) are defined according to (1.18), i.e.

F_1(s_1^1, s_2^1) = 2;   F_2(s_1^1, s_2^1) = 4   (s_1^1(0) ∩ s_2^1(0) = {2});
F_1(s_1^1, s_2^2) = F_2(s_1^1, s_2^2) = ∞   (s_1^1(0) ∩ s_2^2(0) = ∅);
F_1(s_1^2, s_2^1) = 5;   F_2(s_1^2, s_2^1) = 6   (s_1^2(0) ∩ s_2^1(0) = {4});
F_1(s_1^2, s_2^2) = F_2(s_1^2, s_2^2) = ∞   (s_1^2(0) ∩ s_2^2(0) = {1, 4}, |{1, 4}| ≠ 1).

If we solve this game we find the Stackelberg solution s_1^* = s_1^1; s_2^* = s_2^1, i.e. s_1^*(0) = {1, 2}; s_2^*(0) = {2, 4}. We calculate ε^1(0) = ε^1(2) + c^1_{(0,2)} = 2; ε^2(0) = ε^2(2) + c^2_{(0,2)} = 4 and put Ĥ^1_{04}(s_1^*, s_2^*) = 2, Ĥ^2_{04}(s_1^*, s_2^*) = 4. Fix X^4 = X^3 ∪ {0} = {0, 1, 2, 3, 4}; E^4 = E^3 ∪ {(0, 2)} = {(0, 2), (1, 2), (2, 4), (3, 2)}; GT^4 = ({0, 1, 2, 3, 4}, {(0, 2), (1, 2), (3, 2), (2, 4)}).

Step 5. X \ X^4 = ∅; STOP.

So, the optimal stationary strategies of the players are the following:

s_1^*: 0 → {1, 2};   1 → {2, 3};   2 → {4};   3 → {2};
s_2^*: 0 → {2, 4};   1 → {2, 4};   2 → {4};   3 → {2, 4}.
The set of stationary strategies in G generates the tree given in Fig. 1.18. In this tree the strategies s_1^*(0), s_2^*(0) generate the passage (0, 2); s_1^*(1), s_2^*(1) generate the passage (1, 2); s_1^*(2), s_2^*(2) generate the passage (2, 4); s_1^*(3), s_2^*(3) generate the passage (3, 2). So, this tree gives the optimal stationary strategies of the players for an arbitrary starting position x_0 = x. In this example, for x_0 = 0 we obtain a Stackelberg solution which is also a Nash equilibrium. If we fix x_0 = 1, this Stackelberg solution is not a Nash equilibrium.
Fig. 1.18.
1.14.4 An Algorithm for Solving Hierarchical Control Problems

Based on the results from Section 1.14.3 we can discuss the algorithm for solving the multi-objective hierarchical control problem from Section 1.1.3. First of all we show that the hierarchical control problem from Section 1.1.3 can be reduced to a stationary hierarchical control problem on an auxiliary network (G, c^1, c^2, ..., c^p, y_0, y_f), for which Stackelberg stationary strategies should be found. The graph G = (Y, E) of the network can be constructed in the same way as the graph G from Section 1.9.1:

Y = Y^0 ∪ Y^1 ∪ Y^2 ∪ ··· ∪ Y^{T_1} ∪ Y^{T_1+1} ∪ ··· ∪ Y^{T_2}   (Y^k ∩ Y^l = ∅, k ≠ l),

where Y^t = (X, t) corresponds to the set of states x(t) ∈ X of the system L at the time moment t (t = 0, 1, ..., T_2);

E = E^0 ∪ E^1 ∪ E^2 ∪ ··· ∪ E^{T_1} ∪ E^{T_1+1} ∪ ··· ∪ E^{T_2-1} ∪ E^f,

where E^t, t = 0, 1, ..., T_2 − 1, represents the set of directed edges in G which connect vertices from Y^t with vertices from Y^{t+1}. We include an arbitrary directed edge ((x, t), (y, t + 1)) in E^t, t = 0, 1, ..., T_2 − 1, if in the control process at the time moment t for a given state x = x(t) there exist vectors of control parameters u^1(t), u^2(t), ..., u^p(t) from the corresponding feasible sets U_t^1(x(t)), U_t^2(x(t)), ..., U_t^p(x(t)) such that y(t + 1) = g_t(x(t), u^1(t), u^2(t), ..., u^p(t)).
In an analogous way we define the set E^f. We include an arbitrary edge ((x, t), (x_f, T_2)) in E^f, t = T_1 − 1, ..., T_2 − 1, if at the time moments t ∈ [T_1 − 1, T_2 − 1] for the state x(t) there exist vectors of control parameters u^1(t), u^2(t), ..., u^p(t) from the corresponding feasible sets U_t^1(x(t)), U_t^2(x(t)), ..., U_t^p(x(t)) such that x_f = x(t + 1) = g_t(x(t), u^1(t), u^2(t), ..., u^p(t)). Additionally, to each vertex (x, t) we associate a set of subsets A_i(x, t) = {X_i^j(x, t + 1), j = 1, 2, ..., K_i(x)}, where an arbitrary set X_i^j(x, t + 1) represents the set of possible next states x(t + 1) when player i fixes a feasible vector of control parameters u^i(t) ∈ U_t^i(x(t)), i.e. |A_i(x, t)| = |U_t^i(x(t))|.

In G the cost functions c^1, c^2, ..., c^p are defined as follows: to each edge e_t = ((x, t), (y, t + 1)) ∈ E^t there are associated p costs c^i_{e_t} = c_t^i(x(t), y(t + 1)), i = 1, p, t = 0, 1, ..., T_2 − 1; to an edge e_t = ((x, t), (x_f, T_2)) ∈ E^f there is associated the cost c^i_{e_t} = c_t^i(x(t), x_f), i = 1, p, t = T_1 − 1, ..., T_2 − 1.

After that we use the algorithm from Section 1.14.3 and determine Stackelberg stationary strategies on G with the fixed starting state y_0 = (x_0, 0) and final state y_f = (x_f, T_2). Taking into account that there exists a bijection between the set of Stackelberg stationary strategies of the players on G and the Stackelberg solutions of the hierarchical control problem from Section 1.1, we can find a Stackelberg solution of the problem.
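As a sketch of this construction, the following Python fragment builds the vertex and edge sets of the time-expanded graph; the reachability oracle g(t, x), which aggregates g_t over all feasible control vectors, and the other names are hypothetical placeholders.

def time_expanded_network(X, g, T1, T2, xf):
    # X: finite state set; g(t, x): set of states reachable from x at stage t
    Y = {(x, t) for t in range(T2 + 1) for x in X}
    E = set()
    for t in range(T2):
        for x in X:
            for y in g(t, x):
                E.add(((x, t), (y, t + 1)))      # the edge sets E^0, ..., E^{T2-1}
    # the edge set E^f: transitions that reach x_f at the moments T1, ..., T2
    Ef = {((x, t), (xf, T2))
          for t in range(max(T1 - 1, 0), T2)
          for x in X if xf in g(t, x)}
    return Y, E | Ef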
2 Max-Min Control Problems and Solving Zero-Sum Games on Networks
The mathematical tool we develop in this chapter allows us to derive methods and algorithms for solving max-min discrete control problems and to determine optimal stationary strategies of the players in dynamic zero-sum games on networks. We propose polynomial-time algorithms for finding max-min paths on networks and determining optimal strategies of players in antagonistic positional games. These algorithms are applied for studying and solving cyclic games. The computational complexity of the proposed algorithms is analyzed.
2.1 Discrete Control and Finite Antagonistic Dynamic Games

We consider a discrete dynamical system L with a finite set of states X ⊆ R^n. Assume that the dynamical system is controlled by two players and it is described as follows:

x(t + 1) = g_t(x(t), u^1(t), u^2(t)),   t = 0, 1, 2, ...,

where x(0) = x_0 is a given starting point of the system L and u^1(t) ∈ R^{m_1}, u^2(t) ∈ R^{m_2} represent the vectors of control parameters of the players 1 and 2, respectively. For the players, feasible sets U_t^1(x(t)) and U_t^2(x(t)) are given at every moment in time t for an arbitrary state x(t), i.e. u^1(t) ∈ U_t^1(x(t)), u^2(t) ∈ U_t^2(x(t)) for t = 0, 1, 2, ... and x(t) ∈ X. Additionally, we assume that the final state x_f is fixed and the dynamical system should reach x_f at a time moment T(x_f) such that T_1 ≤ T(x_f) ≤ T_2.
For given admissible vectors of control parameters u^1(t), u^2(t) of the players, the integral-time cost of the system's passage from the starting state x_0 to the final state x_f is defined in the following way:

F_{x_0 x_f}(u^1(t), u^2(t)) = ∑_{t=0}^{T(x_f)-1} c_t(x(t), g_t(x(t), u^1(t), u^2(t)))

if T_1 ≤ T(x_f) ≤ T_2; otherwise we put F_{x_0 x_f}(u^1(t), u^2(t)) = ∞. Here c_t(x(t), g_t(x(t), u^1(t), u^2(t))) = c_t(x(t), x(t + 1)) expresses the cost of the system L passing from the state x(t) to the state x(t + 1) at the stage [t, t + 1].

We consider the antagonistic game of the two players, i.e. the first player intends to maximize the integral-time cost along a trajectory x(0), x(1), ..., x(T(x_f)) = x_f while the second one intends to minimize it. The main results we discuss in this chapter are concerned with the existence of saddle points, i.e. we seek vectors u^{1*}(t) ∈ U^1, u^{2*}(t) ∈ U^2 for which

F_{x_0 x_f}(u^1(t), u^{2*}(t)) ≤ F_{x_0 x_f}(u^{1*}(t), u^{2*}(t)) ≤ F_{x_0 x_f}(u^{1*}(t), u^2(t)),   ∀u^1(t) ∈ U^1, ∀u^2(t) ∈ U^2,

where U^1 = Π_{t, x(t)} U_t^1(x(t)), U^2 = Π_{t, x(t)} U_t^2(x(t)). In Section 1.3 we have already formulated the main results related to dynamic antagonistic games (Corollary 1.10 of Theorem 1.9). In order to prove these results we shall use the same approach as in Chapter 1 and consider antagonistic dynamic games on networks for which the alternate players' control condition holds. Based on the time-expanded network method we can reduce this problem to antagonistic positional games on networks. The most important results we discuss in this chapter are related to determining the optimal stationary strategies of the players in the zero-sum dynamic c-game on networks which may contain directed cycles. Additionally, we will also consider antagonistic dynamic games with an infinite time horizon.
2.2 Max-Min Control Problem with Infinite Time Horizon In Chapter 1 we have noted that the concept of antagonistic games can be applied to the optimal control problem with infinite time horizon. This leads to the following max-min control problem with infinite time horizon:
Let a dynamical system L with a finite set of states X ⊆ R^n be given. Assume that the dynamics of the system L is controlled by two players and is described by a system of difference equations in the same way as in the previous section. Here we assume that the final state is not given and the control is made on the infinite interval of time [0, ∞). In the control process the first player has the aim to maximize the integral mean cost

lim_{τ→∞} (1/τ) ∑_{t=0}^{τ-1} c_t(x(t), g_t(x(t), u^1(t), u^2(t)))

along a trajectory x_0 = x(0), x(1), x(2), ..., x(τ), ..., while the second player intends to minimize this integral mean cost. If for given vectors of control parameters of the players u^1(t), u^2(t), t = 0, 1, 2, ..., this limit exists, then we consider it as the value of the payoff function F_{x_0}(u^1(t), u^2(t)); otherwise we put F_{x_0}(u^1(t), u^2(t)) = ∞.

An important particular case of the max-min control problem with infinite time horizon is represented by a cyclic game. Cyclic games correspond to the stationary case of max-min control problems when the set of states X is divided into two subsets X = X_1 ∪ X_2 (X_1 ∩ X_2 = ∅) such that the functions g_t(x(t), u^1(t), u^2(t)), t = 0, 1, 2, ..., satisfy the following condition:

g_t(x(t), u^1(t), u^2(t)) = { g^1(x(t), u^1(t)), if x(t) ∈ X_1;  g^2(x(t), u^2(t)), if x(t) ∈ X_2, }

where u^1(t) ∈ U^1(x(t)), t = 0, 1, 2, ..., for x(t) ∈ X_1, and u^2(t) ∈ U^2(x(t)), t = 0, 1, 2, ..., for x(t) ∈ X_2.
We show that the mathematical tool for studying max-min control problems described in this chapter allows us to elaborate algorithms for solving cyclic games with constant costs of the system’s passage from one state to another. Taking into account that the dynamical system in the considered problems has a finite set of states we formulate and study our max-min problems on dynamic networks.
2.3 Zero-Sum Games on Networks and a Polynomial Time Algorithm for Max-Min Paths Problems In the previous chapter we have studied dynamic c-games with positive cost functions on the edges. Therefore, we cannot use those results for zero-sum games.
In the following we study zero-sum games of two players with arbitrary cost functions on the edges and propose polynomial-time algorithms for solving them. The main results related to this problem have been obtained in [55, 56, 57, 60, 61, 77].

2.3.1 Problem Formulation

In this section we study the antagonistic dynamic c-game of two players on a network with arbitrary constant cost functions on the edges. This case of the problem corresponds to the max-min paths problem on networks, which generalizes the classical combinatorial problems of the shortest and the longest paths in weighted directed graphs. This max-min paths problem arises as an auxiliary one when searching for optimal stationary strategies of the players in cyclic games. Additionally, we shall use the considered dynamic c-game for studying and solving the zero-sum control problem from Section 1.1.2. The main results are concerned with the existence of polynomial-time algorithms for determining max-min paths in networks as well as the elaboration of such algorithms.

Let G = (X, E) be a directed graph with vertex set X and edge set E. Assume that G contains a vertex x_f ∈ X which is attainable from each vertex x ∈ X, i.e. x_f is a sink in G. On the edge set E a function c: E → R is given, which assigns a cost c_e to each edge e ∈ E. Additionally, the vertex set is divided into two disjoint subsets X_A and X_B (X = X_A ∪ X_B, X_A ∩ X_B = ∅), which we regard as the position sets of the two players.

On G we consider a game of two players. The game starts at a position x_0 ∈ X. If x_0 ∈ X_A, then the move is made by the first player, otherwise it is made by the second one. A move means the passage from a position x_0 to a neighboring position x_1 through an edge e_1 = (x_0, x_1) ∈ E. After that, if x_1 ∈ X_A, then the move is made by the first player, otherwise it is made by the second one, and so on. As soon as the final position is reached, the game is over. The game can be finite or infinite. If the final position x_f is reached in finite time, then the game is finite. In the case that the final position x_f is not reached, the game is infinite. The first player in this game has the aim to maximize ∑_i c_{e_i} while the second one has the aim to minimize ∑_i c_{e_i}.

Strictly speaking, the considered game in normal form can be defined as follows: We identify the strategies s_A and s_B of the players with the maps

s_A: x → y ∈ X(x) for x ∈ X_A;   s_B: x → y ∈ X(x) for x ∈ X_B,

where X(x) represents the set of extremities of the edges e = (x, y) ∈ E, i.e. X(x) = {y ∈ X | e = (x, y) ∈ E}. Since G is a finite graph, the sets of strategies of the players

S_A = {s_A: x → y ∈ X(x) for x ∈ X_A};   S_B = {s_B: x → y ∈ X(x) for x ∈ X_B}
are finite sets. The payoff function H_{x_0}(s_A, s_B) on S_A × S_B is defined in the following way: Let G_s = (X, E_s) be the subgraph of G generated by the edges of the form (x, s_A(x)) for x ∈ X_A and (x, s_B(x)) for x ∈ X_B. Then either a unique directed path P_s(x_0, x_f) from x_0 to x_f exists in G_s or such a path does not exist in G_s. In the second case there exists in G_s a unique directed cycle C_s which can be reached from x_0. For given s_A and s_B we set

H_{x_0}(s_A, s_B) = ∑_{e∈E(P_s(x_0, x_f))} c_e

if in G_s there exists a directed path P_s(x_0, x_f) from x_0 to x_f, where E(P_s(x_0, x_f)) is the set of edges of the directed path P_s(x_0, x_f). If in G_s there is no directed path from x_0 to x_f, then we define H_{x_0}(s_A, s_B) as follows: Let P_s(x_0, y_0) be the directed path which connects the vertex x_0 with the cycle C_s and has no common vertices with C_s except y_0. Then we put

H_{x_0}(s_A, s_B) = { +∞, if ∑_{e∈E(C_s)} c_e > 0;  ∑_{e∈E(P_s(x_0, y_0))} c_e, if ∑_{e∈E(C_s)} c_e = 0;  −∞, if ∑_{e∈E(C_s)} c_e < 0. }
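For fixed strategies this definition is easy to evaluate mechanically: follow the successor map of G_s from x_0 until either x_f or a repeated vertex is met. A minimal Python sketch, with illustrative names (succ is the combined map x → s_A(x) or s_B(x)):

def payoff(succ, cost, x0, xf):
    path, seen, x = [], {}, x0
    while x != xf and x not in seen:
        seen[x] = len(path)
        path.append(x)
        x = succ[x]
    if x == xf:                       # a directed path P_s(x0, xf) exists in G_s
        return sum(cost[u, v] for u, v in zip(path, path[1:] + [xf]))
    i = seen[x]                       # otherwise the cycle C_s is reached at y0 = path[i]
    c_cycle = sum(cost[u, v] for u, v in zip(path[i:], path[i + 1:] + [x]))
    if c_cycle > 0:
        return float('inf')
    if c_cycle < 0:
        return float('-inf')
    return sum(cost[u, v] for u, v in zip(path[:i], path[1:i + 1]))  # cost of P_s(x0, y0)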
This game is related to the zero-sum positional games of two players and it is determined by the graph G with sink vertex x_f, the partition X = X_A ∪ X_B, the cost function c: E → R and the starting position x_0. We denote the network which determines this game by (G, X_A, X_B, c, x_0, x_f). In the case that the dynamic c-game is considered for an arbitrary starting position x ∈ X we shall use the notation (G, X_A, X_B, c, x_f).

In [60, 61] it is shown that if G does not contain directed cycles, then for every x ∈ X the following equality holds:

v(x) = max_{s_A∈S_A} min_{s_B∈S_B} H_x(s_A, s_B) = min_{s_B∈S_B} max_{s_A∈S_A} H_x(s_A, s_B),   (2.1)
which means the existence of optimal strategies of the players in the considered game. Moreover, in [60, 61] it is shown that in G there exists a tree GT = (X, E ∗ ) with sink vertex xf , which gives the optimal strategies of the players in the game for an arbitrary starting position x0 ∈ X. The strategies of the players are obtained by fixing s∗A (x) = y, if (x, y) ∈ E ∗ and x ∈ XA \ {xf };
s∗B (x) = y, if (x, y) ∈ E ∗ and x ∈ XB \ {xf }.
In the general case, for an arbitrary graph G, equality (2.1) may fail to hold. Therefore, we formulate necessary and sufficient conditions for the existence of optimal strategies of the players in this game and propose a polynomial-time algorithm for determining the tree of max-min paths from every x ∈ X to x_f. Furthermore, we show that our max-min paths problem on the network can be regarded as a zero-value ergodic cyclic game. So, the proposed algorithm can be used for solving such games.

In [55, 56] the formulated game on the network (G, X_A, X_B, c, x_0, x_f) is named a dynamic c-game. Some preliminary results related to this problem have been obtained in [60, 61]. More general models of positional games on networks with p players have been studied in [5, 58, 59, 67, 69, 70, 71].

The considered max-min paths problem can be used for the zero-sum control problem with alternate players' control (see Corollary 1.10). For p = 2, on the basis of the construction from Section 1.10, we obtain the network (G, Z_1, Z_2, c, z_0, z_f), where G = (Z, E), Z = Z_1 ∪ Z_2 (Z_1 ∩ Z_2 = ∅) and c_e = c_e^1 = −c_e^2, ∀e ∈ E. This network determines the max-min paths problem, the solution of which corresponds to the solution of the zero-sum control problem.

2.3.2 An Algorithm for Solving the Problem on Acyclic Networks

The formulated problem for acyclic networks has been studied in [56, 60, 61]. Let G = (X, E) be a finite directed graph without directed cycles and with a given sink vertex x_f. The partition X = X_A ∪ X_B (X_A ∩ X_B = ∅) of the vertex set of G is given and the cost function c: E → R on the edges is defined. We consider a dynamic c-game on G with a given starting position x ∈ X.

It is easy to observe that for fixed strategies of the players s_A ∈ S_A and s_B ∈ S_B the subgraph G_s = (X, E_s) has the structure of a directed tree with sink vertex x_f ∈ X. This means that the value H_x(s_A, s_B) is determined uniquely by the sum of the edge costs of the unique directed path P_s(x, x_f) from x to x_f. In [60, 61] it is proved that for an acyclic c-game on the network (G, X_A, X_B, c, x_0, x_f) there exist strategies of the players s_A^*, s_B^* such that

v(x) = H_x(s_A^*, s_B^*) = max_{s_A∈S_A} min_{s_B∈S_B} H_x(s_A, s_B) = min_{s_B∈S_B} max_{s_A∈S_A} H_x(s_A, s_B)   (2.2)

and s_A^*, s_B^* do not depend on the starting position x ∈ X, i.e. (2.2) holds for every x ∈ X.

The equality (2.2) is evident in the case that ext(c, x) = 0, ∀x ∈ X \ {x_f}, where

ext(c, x) = { max_{y∈X(x)} c_{(x,y)}, x ∈ X_A;  min_{y∈X(x)} c_{(x,y)}, x ∈ X_B. }
In this case v(x) = 0, ∀x ∈ X, and the optimal strategies of the players can be obtained by fixing the maps s_A^*: X_A \ {x_f} → X and s_B^*: X_B \ {x_f} → X such that s_A^*(x) ∈ VEXT(c, x) for x ∈ X_A \ {x_f} and s_B^*(x) ∈ VEXT(c, x) for x ∈ X_B \ {x_f}, where VEXT(c, x) = {y ∈ X(x) | c_{(x,y)} = ext(c, x)}. If the network (G, X_A, X_B, c, x_0, x_f) has the property that ext(c, x) = 0, ∀x ∈ X \ {x_f}, then it is called a network in canonic form. So, for the acyclic c-game on a network in canonic form equality (2.2) holds and v(x) = 0, ∀x ∈ X.

In the general case equality (2.2) can be proved by using the properties of the potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges e = (x, y) of the network, where ε: X → R is an arbitrary real function on X (the potential transformation for positional games has been introduced in [8, 40, 56]). The fact is that such a transformation of the costs on the edges of an acyclic network in a c-game does not change the optimal strategies of the players, although the values v(x) of the positions x ∈ X are changed to v(x) + ε(x_f) − ε(x). It means that for an arbitrary function ε: X → R the optimal strategies of the players in the acyclic c-games on the networks (G, X_A, X_B, c, x_0, x_f) and (G, X_A, X_B, c′, x_0, x_f) are the same.

Note that the vertices x ∈ X of the acyclic graph G can be numbered with 1, 2, ..., |X| such that if x > y, then in G there is no directed path from y to x. Therefore, we can use the following recursive formula:

ε(x_f) = 0;
ε(x) = { max_{y∈X(x)} {c_{(x,y)} + ε(y)} for x ∈ X_A \ {x_f};  min_{y∈X(x)} {c_{(x,y)} + ε(y)} for x ∈ X_B \ {x_f} }   (2.3)

to tabulate the values ε(x), ∀x ∈ X. It is evident that the transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) satisfies the condition ext(c′, x) = 0, ∀x ∈ X. This means that the following theorem holds:

Theorem 2.1. For an arbitrary acyclic network (G, X_A, X_B, c, x_0, x_f) with sink vertex x_f there exists a function ε: X → R which determines a potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges e = (x, y) such that the network (G, X_A, X_B, c′, x_0, x_f) is in canonic form. The values ε(x), x ∈ X, which determine the function ε: X → R, can be found by using the recursive formula (2.3).

On the basis of this theorem the following algorithm for determining the optimal strategies of the players in a c-game is proposed in [56].
Algorithm 2.2. Determining the Optimal Strategies of the Players on an Acyclic Network

1. Find the values ε(x), x ∈ X, according to the recursive formula (2.3) and the corresponding potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E.
2. Fix arbitrary maps s_A^*(x) ∈ VEXT(c′, x) for x ∈ X_A \ {x_f} and s_B^*(x) ∈ VEXT(c′, x) for x ∈ X_B \ {x_f}.

Remark 2.3. The values ε(x), x ∈ X, represent the values of the acyclic c-game on (G, X_A, X_B, c, x_0, x_f) with starting position x, i.e. ε(x) = v(x), ∀x ∈ X.

Algorithm 2.2 needs O(|X|^2) elementary operations, because the tabulation of the values ε(x), x ∈ X, by formula (2.3) for acyclic networks needs this number of operations.
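A direct Python sketch of Algorithm 2.2, under the assumption that the network is given by a successor map; is_max marks the positions of the maximizing player. All names are illustrative.

def solve_acyclic_c_game(succ, cost, is_max, xf):
    eps = {xf: 0}
    todo = {x for x in succ if x != xf}
    while todo:
        # any vertex whose successors are all evaluated can be processed next
        x = next(v for v in todo if all(y in eps for y in succ[v]))
        pick = max if is_max[x] else min
        eps[x] = pick(cost[x, y] + eps[y] for y in succ[x])   # formula (2.3)
        todo.remove(x)
    # optimal moves: any edge whose transformed cost c'(x, y) is zero
    strategy = {x: next(y for y in succ[x]
                        if cost[x, y] + eps[y] - eps[x] == 0)
                for x in succ if x != xf}
    return eps, strategy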
2.3.3 Main Results for the Problem on an Arbitrary Network

First of all we give an example which shows that equality (2.1) may fail to hold. In Fig. 2.1 a network with starting position x_0 = 0 and final position x_f = 3 is given, where the positions of the first player are represented by circles and the positions of the second player are represented by squares; the values of the cost function on the edges are given alongside them.
Fig. 2.1.
It is easy to observe that

max_{s_A∈S_A} min_{s_B∈S_B} H_0(s_A, s_B) = 2,   min_{s_B∈S_B} max_{s_A∈S_A} H_0(s_A, s_B) = 3.
The following theorem gives conditions for the existence of a saddle point with finite v(x) for each x ∈ X in the c-game.

Theorem 2.4. Let (G, X_A, X_B, c, x_0, x_f) be an arbitrary network with sink vertex x_f ∈ X. Additionally, assume that ∑_{e∈E(C_s)} c_e ≠ 0 for every directed cycle C_s from G. Then for a c-game on (G, X_A, X_B, c, x_0, x_f) condition (2.1) with finite v(x) holds for every x ∈ X if and only if there exists a function ε: X → R which determines a potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E such that ext(c′, x) = 0, ∀x ∈ X. Moreover, if in G there exists a potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E such that ext(c′, x) = 0, ∀x ∈ X \ {x_f}, then v(x) = ε(x) − ε(x_f), ∀x ∈ X.

Proof. (⟹) Let us consider that ∑_{e∈E(C_s)} c_e ≠ 0 for every directed cycle C_s in G and that condition (2.1) holds for every x ∈ X. Moreover, we consider that v(x) is a finite value for every x ∈ X. Taking into account that the potential transformation does not change the costs of the cycles, we obtain that this transformation does not change the optimal strategies of the players, although the values v(x) of the positions x ∈ X are changed to v(x) − ε(x) + ε(x_f). It is easy to observe that if we put ε(x) = v(x) for x ∈ X, then the function ε: X → R determines a potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E such that ext(c′, x) = 0, ∀x ∈ X.

(⟸) Let us consider that there exists a potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E such that ext(c′, x) = 0, ∀x ∈ X. The value of the game after the potential transformation is zero for every x ∈ X, and the optimal strategies of the players can be found by fixing s_A^* and s_B^* such that s_A^*(x) ∈ VEXT(c′, x) for x ∈ X_A \ {x_f} and s_B^*(x) ∈ VEXT(c′, x) for x ∈ X_B \ {x_f}. Since the potential transformation does not change the optimal strategies of the players, we put v(x) = ε(x) − ε(x_f) and obtain (2.1).

Corollary 2.5. If for every directed cycle C_s in G the condition ∑_{e∈E(C_s)} c_e ≠ 0 and equality (2.1) hold, then there exists a potential transformation determined by a function ε: X → R such that ext(c′, x) = 0, ε(x_f) = 0 and v(x) = ε(x), ∀x ∈ X.

Corollary 2.6. If for every directed cycle C_s in G the condition ∑_{e∈E(C_s)} c_e ≠ 0 holds, then the existence of a potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E such that

ext(c′, x) = 0, ∀x ∈ X   (2.4)

represents a necessary and sufficient condition for the validity of equality (2.1) for every x ∈ X. In the case that in G there exists a cycle C_s with ∑_{e∈E(C_s)} c_e = 0, condition (2.4) becomes only a necessary one for the validity of equality (2.1) for every x ∈ X.

Corollary 2.7. If in a c-game there exist strategies s_A^* and s_B^* for which (2.1) holds for every x ∈ X and these strategies generate in G a tree T_{s^*} = (X, E_{s^*}) with sink vertex x_f, then there exists a potential transformation c′_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E such that the graph G^0 = (X, E^0), generated by the set of edges E^0 = {(x, y) ∈ E | c′_{(x,y)} = 0}, contains the tree T_{s^*} as a subgraph.
Taking into account the results mentioned above, we propose an algorithm for determining the optimal strategies of the players in a c-game based on constructing the tree of max-min paths. This algorithm works if such a tree exists in G.

2.3.4 A Polynomial Time Algorithm for Determining Optimal Strategies of the Players in a Dynamic c-Game

We consider the dynamic c-game determined by a network (G, X_A, X_B, c, x_0), where the graph G has a sink vertex x_f. At first we assume that for an arbitrary vertex x there exists the value v(x), which satisfies condition (2.4), and v(x) ≠ ±∞. So, we assume that in G there exists a tree of max-min paths from x ∈ X to x_f. We show that for determining the optimal strategies of the players in the considered game there exists a polynomial time algorithm. In this section we propose such an algorithm based on the reduction to an auxiliary dynamic c-game on an acyclic network (Ḡ, W_A, W_B, c̄, w_f^0), where the graph Ḡ = (W, Ē) is obtained from G = (X, E) in the following way:

The set of vertices W consists of n − 1 copies of the vertex set X and the sink vertex w_f^0, i.e.

W = {w_f^0} ∪ W^1 ∪ W^2 ∪ ··· ∪ W^{n-1},

where W^i = {w_0^i, w_1^i, ..., w_{n-1}^i}, i = 1, ..., n − 1. Here W^i ∩ W^j = ∅ for i ≠ j, and the vertices w_k^i ∈ W^i, i = 1, ..., n − 1, correspond to the vertex x_k from X = {x_0, x_1, x_2, ..., x_{n-1}}. The set of edges Ē is defined in the following way:

Ē = Ē^0 ∪ Ē^1 ∪ Ē^2 ∪ ··· ∪ Ē^{n-2};
Ē^i = {(w_k^{i+1}, w_l^i) | (x_k, x_l) ∈ E}, i = 1, ..., n − 2;
Ē^0 = {(w_k^i, w_f^0) | (x_k, x_f) ∈ E, i = 1, ..., n − 1}.

In Ḡ the edge subset Ē^i ⊆ Ē connects vertices of the set W^{i+1} with vertices of the set W^i by edges (w_k^{i+1}, w_l^i) if in G there exists a directed edge (x_k, x_l). Additionally, in Ḡ each vertex w_k^i, i = 1, ..., n − 1, is connected with the sink vertex w_f^0 by the edge (w_k^i, w_f^0) if in G there exists a directed edge (x_k, x_f). The subsets W_A, W_B and the cost function c̄: Ē → R are defined as follows:

W_A = {w_k^i ∈ W | x_k ∈ X_A},   W_B = {w_k^i ∈ W | x_k ∈ X_B};
c̄(w_k^{i+1}, w_l^i) = c(x_k, x_l) for (x_k, x_l) ∈ E and (w_k^{i+1}, w_l^i) ∈ Ē^i, i = 1, ..., n − 2;
c̄(w_k^i, w_f^0) = c(x_k, x_f) for (x_k, x_f) ∈ E and (w_k^i, w_f^0) ∈ Ē^0, i = 1, ..., n − 1.

From Ḡ we delete all vertices w_k^i for which there are no directed paths from w_k^i to w_f^0. For the obtained directed graph we preserve the same notation and keep in mind that Ḡ does not contain such vertices.
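The construction of the layered network is straightforward to program. In the sketch below a vertex w_k^i is encoded as the pair (x_k, i) and the sink w_f^0 as (x_f, 0); all names are illustrative, and the costs are inherited from the original edges as defined above.

def build_auxiliary_network(X, E, cost, xf):
    n = len(X)
    Ebar, cbar = set(), {}
    for (u, v) in E:
        for i in range(1, n - 1):             # the edge sets E-bar^1, ..., E-bar^{n-2}
            e = ((u, i + 1), (v, i))
            Ebar.add(e); cbar[e] = cost[u, v]
        if v == xf:                           # the edge set E-bar^0: jumps to the sink
            for i in range(1, n):
                e = ((u, i), (xf, 0))
                Ebar.add(e); cbar[e] = cost[u, v]
    # delete vertices from which the sink (xf, 0) is unreachable
    reach, frontier = {(xf, 0)}, [(xf, 0)]
    while frontier:
        w = frontier.pop()
        for (a, b) in Ebar:
            if b == w and a not in reach:
                reach.add(a); frontier.append(a)
    Ebar = {e for e in Ebar if e[0] in reach and e[1] in reach}
    return reach, Ebar, {e: cbar[e] for e in Ebar}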
Let us consider the dynamic c-game determined by the acyclic network (Ḡ, W_A, W_B, c̄, w_f^0) with sink vertex w_f^0, i.e. the problem of determining the values v′(w_k^i) of this game for every w_k^i ∈ W. We show that if v′(w_k^1), v′(w_k^2), ..., v′(w_k^{n-1}) are the corresponding values of the vertices w_k^1, w_k^2, ..., w_k^{n-1} in the auxiliary game, then there exists an i ∈ {1, ..., n − 1} such that v(x_k) = v′(w_k^i). The vertex w_k^i is sought among w_k^{n-1}, w_k^{n-2}, ..., w_k^2, w_k^1, starting with the highest level set W^{n-1}. We consider in Ḡ the max-min path

P_Ḡ(w_k^{n-1}, w_f^0) = {w_k^{n-1}, w_{k_1}^{n-2}, w_{k_2}^{n-3}, ..., w_{k_r}^{n-r-1}, w_f^0}

from w_k^{n-1} to w_f^0 generated by the directed edges e = (w_{k_i}^{n-i}, w_{k_{i+1}}^{n-i-1}) for which

ε′(w_{k_{i+1}}^{n-i-1}) − ε′(w_{k_i}^{n-i}) + c̄(w_{k_i}^{n-i}, w_{k_{i+1}}^{n-i-1}) = 0,

where ε′(w) = v′(w) for every vertex w of the path. The directed path P_Ḡ(w_k^{n-1}, w_f^0) corresponds in G to a directed path P_G(x_k, x_f) = {x_k, x_{k_1}, x_{k_2}, ..., x_{k_r}, x_f} from x_k to x_f. In G we consider the subgraph G_k^{n-1} = (X_k^{n-1}, E_k^{n-1}) induced by the set of vertices X_k^{n-1} = {x_k, x_{k_1}, x_{k_2}, ..., x_{k_r}, x_f}. For the vertices x_{k_i} and x_k we put v(x_{k_i}) = v′(w_{k_i}^{n-i-1}), v(x_k) = v′(w_k^{n-1}) and verify if in G_k^{n-1} the following condition holds:

ext(c′, z) = 0, ∀z ∈ X_k^{n-1},   (2.5)

where c′(z, x) = ε(x) − ε(z) + c(z, x) for e = (z, x) ∈ E_k^{n-1}.

If condition (2.5) holds and G_k^{n-1} does not contain directed cycles, then we may conclude that for the dynamic c-game on G with starting position x_k we have v(x_k) = v′(w_k^{n-1}). Note that for every vertex x_{k_i} of the directed path P_G(x_k, x_f) we obtain v(x_{k_i}) = v′(w_{k_i}^{n-i-1}). If the condition mentioned above does not take place, then v(x_k) ≠ v′(w_k^{n-1}) and we delete w_k^{n-1} from Ḡ. After that we consider the vertex w_k^{n-2}, construct the graph G_k^{n-2} = (X_k^{n-2}, E_k^{n-2}) and verify in the same way whether v(x_k) = v′(w_k^{n-2}). Finally, we obtain that at least for one vertex w_k^i the directed path P_Ḡ(w_k^i, w_f^0) does not contain a directed cycle and condition (2.5) holds, i.e. v(x_k) = v′(w_k^i). In such a way we obtain v(x_k) for every x_k ∈ X.

If v(x) is known for every x ∈ X, then we fix ε(x) = v(x) and define the potential transformation c′(z, x) = c(z, x) + ε(x) − ε(z) on the edges (z, x) ∈ E. After that we find the graph G^0 = (X, E^0), generated by the set of edges E^0 = {(z, x) ∈ E | c′(z, x) = 0}. In G^0 we fix an arbitrary tree T^* = (X, E^*), which determines the optimal strategies of the players as follows:

s_A^*(z) = x, if (z, x) ∈ E^* and z ∈ X_A \ {x_f};
s_B^*(z) = x, if (z, x) ∈ E^* and z ∈ X_B \ {x_f}.
The correctness of the algorithm is based on the following theorem:
Theorem 2.8. Let v(x_k) be the value of the vertex x_k in the dynamic c-game on G and let P_G(x_k, x_f) = {x_k, x_{k_1}, x_{k_2}, ..., x_{k_r}, x_f} be the max-min path from x_k to x_f in G. Then v′(w_k^{r+1}) = v(x_k).

Proof. The construction described above allows us to conclude that between the set of directed paths from x_k to x_f with no more than r + 1 edges in G and the set of directed paths from w_k^{r+1} to w_f^0 with no more than r + 1 edges in Ḡ there exists a bijective mapping which preserves the sum of the costs of the edges. Therefore, v′(w_k^{r+1}) = v(x_k).

Remark 2.9. If P_G(x_k, x_f) = {x_k, x_{k_1}, x_{k_2}, ..., x_{k_r}, x_f} is the max-min path from x_k to x_f in G, then in Ḡ there may exist several vertices w_k^{r+i} ∈ W for which v′(w_k^{r+i}) = v(x_k), where i ≥ 1. If v′(w_k^{r+i}) = v(x_k), then in Ḡ the max-min path P_Ḡ(w_k^{r+i}, w_f^0) = {w_k^{r+i}, w_{k_1}^{r+i-1}, w_{k_2}^{r+i-2}, ..., w_{k_r}^i, w_f^0} corresponds to the max-min path P_G(x_k, x_f) in G.

It is easy to observe that the running time of the algorithm is O(n^4). Indeed, the values of the positions of the game on an acyclic network can be calculated in time O(N^2), where N is the number of vertices of the network. Taking into account that N ≈ n^2 for our auxiliary network, we obtain that the running time of the algorithm is O(n^4).

Note that the proposed algorithm can also be applied to the c-game when the tree of max-min paths in G may not exist but there exists a max-min path from a given vertex x = x_0 to x_f, i.e. the algorithm can be applied to a c-game with starting position x_0.

An important problem for a dynamic c-game is how to determine the vertices x ∈ X for which v(x) = +∞ and the vertices x ∈ X for which v(x) = −∞. Taking into account that the final position x_f in such games cannot be reached, we may delete the vertices x of the graph G for which there exist max-min paths from x to x_f. In order to specify the algorithm for this case we need to study the infinite dynamic c-game where the graph G has no sink vertex x_f. This means that the outcome of the game is a cycle which may have a positive, negative or zero sum of edge costs. For determining the outcome of the game in this case we can use the same approach based on the reduction to an acyclic c-game. The algorithm for finding the optimal strategies of the players in infinite dynamic c-games is similar to the algorithm for finding the optimal strategies of the players in cyclic games. We describe such an algorithm in Section 2.5.6, where we can see that for an arbitrary position x ∈ X the value of the cyclic game is positive if and only if the value v(x) of the infinite dynamic c-game is positive. Additionally, we can see that an efficient polynomial time algorithm for solving cyclic games can be elaborated if a polynomial time algorithm for solving an infinite dynamic c-game exists.

In the following we give an example which illustrates the details of the algorithm proposed above.
Example. Consider a dynamic c-game determined by the network (G, X_A, X_B, c, x_f) given in Fig. 2.2. The position set X_A of the first player is represented by circles and the position set X_B of the second player by squares; x_f = 0. The costs of the edges are given alongside them.
Fig. 2.2.
The auxiliary acyclic network for our dynamic c-game is represented in Fig. 2.3.
Fig. 2.3.
Each vertex in Fig. 2.3 is labeled by a pair of numbers, where the first one represents the number of the copy in Ḡ and the second one corresponds to the number of the vertex in G. The costs are given alongside the edges, and the values of the dynamic c-game on the auxiliary network are given alongside the vertices.
Let us fix the vertex w_k^{n-1} = 33 as the starting position of the dynamic c-game on the auxiliary network. Then we obtain v′(33) = −5. In order to verify if v(3) = −5 we find the max-min path P_Ḡ(33, 00) = {33, 22, 11, 00} and the values v′(33) = −5, v′(22) = 1, v′(11) = 0, v′(00) = 0. The path P_Ḡ(33, 00) in G corresponds to the path P_G(3, 2, 1, 0). For the vertices 3, 2, 1, 0 in G we fix ε(3) = v′(33) = −5, ε(2) = v′(22) = 1, ε(1) = v′(11) = 0, ε(0) = v′(00) = 0. After that we find the graph G_3^3 = (X_3^3, E_3^3) generated by the set of vertices X_3^3 = {3, 2, 1, 0}. Then we make the potential transformation c′(x, y) = ε(y) − ε(x) + c(x, y) with the given ε(3) = −5, ε(2) = 1, ε(1) = 0, ε(0) = 0:

c′(1, 0) = ε(0) − ε(1) + c(1, 0) = 0 − 0 + 0 = 0,
c′(2, 0) = ε(0) − ε(2) + c(2, 0) = 0 − 1 + 0 = −1,
c′(3, 0) = ε(0) − ε(3) + c(3, 0) = 0 − (−5) + 0 = 5,
c′(1, 3) = ε(3) − ε(1) + c(1, 3) = −5 − 0 + 4 = −1,
c′(2, 1) = ε(1) − ε(2) + c(2, 1) = 0 − 1 + 1 = 0,
c′(3, 2) = ε(2) − ε(3) + c(3, 2) = 1 − (−5) − 6 = 0.

So, after the potential transformation c′(x, y) = ε(y) − ε(x) + c(x, y), ∀(x, y) ∈ E, we obtain the network given in Fig. 2.4 with the new costs on the edges.
Fig. 2.4.
If we select the tree with zero-cost edges, we obtain the tree of max-min paths represented in Fig. 2.5. If we start with the vertex w_k^{n-1} = 32, then we obtain the subgraph G_2^3 = (X_2^3, E_2^3), which coincides with the graph G = (X, E) and for which ε(2) = v′(32) = 5, ε(1) = v′(21) = 4, ε(3) = v′(13) = 0, ε(0) = v′(00) = 0. It is easy to see that in this case the condition ext(c′, x) = 0, ∀x ∈ X, is not satisfied.
Fig. 2.5.
2.3.5 A Pseudo-Polynomial Time Algorithm for Solving a Dynamic c-Game

In this section we describe an algorithm for solving a dynamic c-game which is based on a special recursive procedure for calculating the values v(x) of the game. From the practical point of view the proposed algorithm may be more useful than the algorithm from the previous section, although its computational complexity is O(|X|^3 ∑_{e∈E} |c_e|) (c: E → R is an integer function). We assume that in G there exists the tree of max-min paths.

Preliminary step (step 0): Set X^* = {x_f}, ε(x_f) = 0.

General step (step k): Find the set of vertices X′ = {z ∈ X \ X^* | (z, x) ∈ E, x ∈ X^*}. For each z ∈ X′ we calculate

ε(z) = { max_{x∈O_{X^*}(z)} {ε(x) + c(z, x)}, z ∈ X_A ∩ X′;  min_{x∈O_{X^*}(z)} {ε(x) + c(z, x)}, z ∈ X_B ∩ X′, }   (2.6)

where O_{X^*}(z) = {x ∈ X^* | (z, x) ∈ E}, and then do the following points a) and b):

a) Fix β(z) = ε(z) for z ∈ X′ ∪ X^* and then for every z ∈ X′ ∪ X^* calculate

β(z) = { max_{x∈O_{X^*∪X′}(z)} {ε(x) + c(z, x)}, z ∈ X_A ∩ (X′ ∪ X^*);  min_{x∈O_{X^*∪X′}(z)} {ε(x) + c(z, x)}, z ∈ X_B ∩ (X′ ∪ X^*). }   (2.7)

b) Check if β(z) = ε(z) for every z ∈ X′ ∪ X^*. If this condition is not satisfied, then fix ε(z) = β(z) for z ∈ X′ ∪ X^* and go to point a).
If β(z) = ε(z) for every z ∈ X′ ∪ X^*, then in X′ ∪ X^* we find the subset

Y^k = {z ∈ X^* ∪ X′ | ext_{x∈O_{X^*∪X′}(z)} {ε(x) − ε(z) + c(z, x)} = 0},

where

ext_{x∈O_{X^*∪X′}(z)} {ε(x) − ε(z) + c(z, x)} = { max_{x∈O_{X^*∪X′}(z)} {ε(x) − ε(z) + c(z, x)}, z ∈ (X′ ∪ X^*) ∩ X_A;  min_{x∈O_{X^*∪X′}(z)} {ε(x) − ε(z) + c(z, x)}, z ∈ (X′ ∪ X^*) ∩ X_B. }

After that we replace X^* by Y^k and check if X^* = X. If X^* ≠ X, then go to the next step. If X^* = X, then define the potential transformation c′(z, x) = c(z, x) + ε(x) − ε(z) on the edges (z, x) ∈ E and find the graph G^0 = (X, E^0), generated by the set of edges E^0 = {(z, x) ∈ E | c′(z, x) = 0}. In G^0 fix an arbitrary tree T^* = (X, E^*), which determines the optimal strategies of the players as follows:

s_A^*(z) = x, if (z, x) ∈ E^* and z ∈ X_A \ {x_f};
s_B^*(z) = x, if (z, x) ∈ E^* and z ∈ X_B \ {x_f}.
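The iterative part of this procedure (points a) and b)) can be sketched in a few lines of Python; eps holds the current values on X′ ∪ X^* and all names are illustrative. The loop repeats formula (2.7) until the values stabilize, which Theorem 2.10 below shows happens after finitely many iterations when the tree of max-min paths exists.

def iterate_points_a_b(vertices, succ, cost, is_max, eps):
    while True:
        beta = {}
        for z in vertices:
            opts = [eps[x] + cost[z, x] for x in succ[z] if x in eps]
            beta[z] = (max if is_max[z] else min)(opts) if opts else eps[z]
        if all(beta[z] == eps[z] for z in vertices):   # point b): converged
            return eps
        eps.update(beta)                               # otherwise repeat point a)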
Let us show that this algorithm finds the tree of max-min paths T^* = (X, E^*) if such a tree exists in G.

Theorem 2.10. If in G there exists the tree of max-min paths T^* = (X, E^*) with sink vertex x_f, then the algorithm finds it using O(|X|^3 ∑_{e∈E} |c_e|) elementary operations.

Proof. Consider the set Y^{k-1} obtained after k − 1 steps of the algorithm and assume that at step k, after the points a) and b), the condition β(z) = ε(z) for every z ∈ X′ holds. This condition is equivalent to the condition

ext_{x∈O_{X^*∪X′}(z)} {ε(x) − ε(z) + c(z, x)} = 0, ∀z ∈ X′,

which implies Y^{k-1} ⊂ Y^k. Therefore, we obtain that if for every step k of the algorithm the corresponding calculation procedure (2.7) is convergent, then Y^0 ⊂ Y^1 ⊂ Y^2 ⊂ ··· ⊂ Y^r = X, where r < n. This means that after r < n steps the algorithm finds the values ε(x) for x ∈ X and a potential transformation c′(y, x) = ε(x) − ε(y) + c(y, x) for the edges e = (y, x) ∈ E such that ext(c′, y) = 0, ∀y ∈ X, i.e. the algorithm constructs the tree T^* = (X, E^*). So, for a complete proof of the theorem we have to show the convergence of the calculation procedure based on formula (2.7) for an arbitrary step k of the algorithm.
Assume that at step k of the algorithm the condition ext_{x∈O_{X^*∪X′}(z)} {ε(x) − ε(z) + c(z, x)} = 0 does not yet hold for every z ∈ X′. Consider the set of edges E′ = {e = (z, x′) ∈ E | β(z) = ε(x′) + c(z, x′), z ∈ X′, x′ ∈ O_{X^*∪X′}(z)}, where x′ corresponds to the vertex z such that

ε(x′) + c(z, x′) = { max_{x∈O_{X^*∪X′}(z)} {ε(x) + c(z, x)}, z ∈ X_A ∩ (X′ ∪ X^*);  min_{x∈O_{X^*∪X′}(z)} {ε(x) + c(z, x)}, z ∈ X_B ∩ (X′ ∪ X^*). }

The calculation on the basis of a) and b) can be treated as follows: The players improve the values ε(z) of the vertices z ∈ X′ using passages from z to the corresponding vertices x ∈ O_{X^*∪X′}(z). At each iteration of this calculation procedure the players can improve their income by β(z) − ε(z) units for every position z ∈ X′.

Denote by X̃ the subset of vertices z ∈ X′ for which in G′ = (X′, E′) there exist directed paths from z ∈ X̃ to vertices from Y^{k-1}. Then the improvements mentioned above are possible for an arbitrary vertex z ∈ X̃. This means that if procedure a), b) is applied at step k, then after one iteration of this procedure we obtain β(z) = ε(z), ∀z ∈ X̃. In the following we can see that in order to achieve ε(z) = β(z) for the rest of the vertices z ∈ X′ \ X̃ it is necessary to use more than one iteration.

Let us consider in G′ the subset X̂ = X′ \ X̃. Then in G′ there are no directed edges e = (z, x′) such that z ∈ X̂ and x′ ∈ Y^{k-1}. Without loss of generality we may consider that X̂ generates in G′ a directed cycle C. Denote by n(C) the number of vertices of this cycle and assume that the sum of the costs of its edges is equal to θ (θ may be positive or negative). We can see that if we apply formula (2.7), then after each n(C) iterations of the calculation procedure the values ε(z) of the vertices z ∈ C will decrease at least by |θ| units if θ < 0; if θ > 0, then these values will increase by θ. Therefore, the first player will preserve the passages from the vertices z ∈ C to the vertices x of the cycle C if β(z) − ε(z) > 0; otherwise the first player will change the passage from one vertex z^0 ∈ C to a vertex x ∈ O_{X^*∪X′}(z) which may belong to Y^{k-1}. In an analogous way the second player will preserve the passages from the vertices z ∈ C to the vertices x of the cycle C if β(z) − ε(z) < 0; otherwise the second player will change the passage from one vertex z^0 ∈ X̂ to a vertex x which may belong to Y^{k-1}. So, if in G there exists the tree of max-min paths, then after a finite number of iterations of procedure a), b) we obtain β(z) = ε(z) for z ∈ X′.

Taking into account that the values β(z) decrease (or increase) after each n(C) iterations by integer units |θ|, we may conclude that the number of iterations of the procedure is comparable with |X̃|^2 · max_{z∈X̃} |β(z) − ε(z)|. In the worst case these quantities are limited by |X|^2 ∑_{e∈E} |c_e|. This implies that the computational complexity of the algorithm is O(|X|^3 ∑_{e∈E} |c_e|).
Remark 2.11. The algorithm for acyclic networks can be applied without the points a) and b), because the condition β(z) = ε(z), ∀z ∈ X′, holds at every step k. In general, this version of the algorithm can be used without the points a) and b) if Y^{k-1} ≠ Y^k at every step k. In this case the running time of the algorithm is O(|X|^3).

The algorithm described above can be modified for the dynamic c-game in the general form when the network contains vertices x for which v(x) = ±∞. In order to detect such vertices, in point a) it is necessary to introduce a new condition which allows us to select the vertices z ∈ X with large values β(z) (positive or negative). But in this case the algorithm becomes more complicated than the algorithm for finite games.

Below we present two examples which illustrate the details of the algorithm. The first example illustrates the work of the algorithm when it is not necessary to use the points a) and b). The second example illustrates the details of the recursive calculation procedure in the points a) and b).

Example 1. Consider the problem of determining the optimal stationary strategies on a network which may contain cycles. The corresponding network with sink vertex x_f = 5 is given in Fig. 2.6. In this network the positions of the first player are represented by circles and the positions of the second player by squares, i.e. X_A = {1, 2, 4, 5}, X_B = {0, 3}. The values of the cost functions on the edges are given in parentheses alongside them.
Fig. 2.6.
We can see that for this network there exists a tree of max-min paths which can be found by using the algorithm.

Step 0. X^* = {5}; ε(5) = 0.

Step 1. Find the set of vertices X′ = {3, 4} for which there exist directed edges (3, 5) and (4, 5) from the vertices 3 and 4 to the vertex 5. Then we calculate
according to (2.6) the values ε(3) = 3, ε(4) = 2. It is easy to check that for the vertices 3 and 4 the following condition holds:

ext_{y∈O_{X^*∪X′}(x)} {ε(y) − ε(x) + c_{(x,y)}} = 0.

So, Y^1 = {3, 4, 5}. Therefore, if we replace X^* by Y^1, after step 1 we obtain X^* = {3, 4, 5}.

Step 2. Find the set of vertices X′ = {0, 1, 2} for which there exist directed edges from the vertices x ∈ X′ to the vertices y ∈ X^*. Then according to (2.6) we calculate

ε(2) = max_{y∈O_{X^*}(2)} {ε(3) + c_{(2,3)}, ε(4) + c_{(2,4)}} = max{5, 3} = 5;
ε(1) = ε(3) + c_{(1,3)} = 9;
ε(0) = ε(4) + 6 = 8.

So, ε(0) = 8, ε(1) = 9, ε(2) = 5, ε(3) = 3, ε(4) = 2, ε(5) = 0. It is easy to check that Y^2 = {0, 2, 3, 4, 5}. Indeed,

ext_{y∈O_{X^*∪X′}(3)} {ε(y) − ε(3) + c_{(3,y)}} = min{ε(5) − ε(3) + c_{(3,5)}, ε(2) − ε(3) + c_{(3,2)}} = min{0 − 3 + 3, 5 − 3 + 1} = 0;
ext_{y∈O_{X^*∪X′}(2)} {ε(y) − ε(2) + c_{(2,y)}} = max{ε(3) − ε(2) + c_{(2,3)}, ε(4) − ε(2) + c_{(2,4)}} = max{3 − 5 + 2, 2 − 5 + 1} = 0;
ext_{y∈O_{X^*∪X′}(1)} {ε(y) − ε(1) + c_{(1,y)}} = max{ε(3) − ε(1) + c_{(1,3)}, ε(2) − ε(1) + c_{(1,2)}, ε(0) − ε(1) + c_{(1,0)}} = max{3 − 9 + 6, 5 − 9 + 3, 8 − 9 + 4} = 3;
ext_{y∈O_{X^*∪X′}(0)} {ε(y) − ε(0) + c_{(0,y)}} = min{ε(4) − ε(0) + c_{(0,4)}, ε(2) − ε(0) + c_{(0,2)}, ε(1) − ε(0) + c_{(0,1)}} = min{2 − 8 + 6, 5 − 8 + 4, 9 − 8 + 1} = 0;
ext_{y∈O_{X^*∪X′}(4)} {ε(y) − ε(4) + c_{(4,y)}} = max{ε(5) − ε(4) + c_{(4,5)}, ε(2) − ε(4) + c_{(4,2)}} = max{0 − 2 + 2, 5 − 2 − 4} = 0.

So, the set of vertices for which ext_{y∈O_{X^*∪X′}(x)} {ε(y) − ε(x) + c_{(x,y)}} = 0 consists of the vertices 0, 2, 3, 4, 5.
Step 3. Find the set of vertices X′ = {1} and calculate

ε(1) = max_{y∈O_{X^*}(1)} {ε(y) + c_{(1,y)}} = max{ε(3) + c_{(1,3)}, ε(2) + c_{(1,2)}, ε(0) + c_{(1,0)}} = max{3 + 6, 5 + 3, 8 + 4} = 12.

Now we can see that the obtained values ε(0) = 8, ε(1) = 12, ε(2) = 5, ε(3) = 3, ε(4) = 2, ε(5) = 0 satisfy the conditions

ε(y) − ε(x) + c_{(x,y)} ≤ 0 for every (x, y) ∈ E, x ∈ X_A;
ε(y) − ε(x) + c_{(x,y)} ≥ 0 for every (x, y) ∈ E, x ∈ X_B.

The directed tree GT = (X, E^*), generated by the edges (x, y) ∈ E for which ε(y) − ε(x) + c_{(x,y)} = 0, is represented in Fig. 2.7.
Fig. 2.7.
The optimal strategies of the players are:

s_A: 1 → 0;   2 → 3;   4 → 5;
s_B: 0 → 4;   3 → 5.
Example 2. Consider the problem of determining the tree of max-min paths T^* = (X, E^*) for the network given in Fig. 2.2 with the same costs of the edges as in the previous section. If we apply the algorithm described above, then we use only one step (k = 1). But at step 1 we use the points a) and b) and make the calculations on the basis of formula (2.7). In Table 2 the values β(0), β(1), β(2), β(3) at each iteration of the procedure are given.
Table 2

i    |  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
β(0) |  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
β(1) |  0   4   0   0   3   0   0   2   0   0   1   0   0   0   0
β(2) |  0   1   5   1   1   4   1   1   3   1   1   2   1   1   1
β(3) |  0  −6  −5  −1  −5  −5  −2  −5  −5  −3  −5  −5  −4  −5  −5
We can see that the convergence of the calculation procedure is obtained at iteration 14. Therefore, we conclude that ε(0) = 0, ε(1) = 0, ε(2) = 1, ε(3) = −5. If we make the potential transformation, we obtain the network in Fig. 2.4. The tree of max-min paths T^* = (X, E^*) is presented in Fig. 2.5.
2.4 A Polynomial Time Algorithm for Solving Acyclic l-Games on Networks

An acyclic l-game on networks has been introduced in [56, 57] as an auxiliary problem for studying and solving cyclic games, which we will consider in the next section.

2.4.1 Problem Formulation

Let (G, X_A, X_B, c) be a network, where G = (X, E) represents a directed acyclic graph with sink vertex x_f ∈ X. On E a function c: E → R is defined, and on X a partition X = X_A ∪ X_B (X_A ∩ X_B = ∅) is given, where X_A and X_B correspond to the position sets of the two players A and B, respectively. We consider the following acyclic game from [56]. Again we define the strategies of the players as maps

s_A: x → y ∈ X(x) for x ∈ X_A \ {x_f};   s_B: x → y ∈ X(x) for x ∈ X_B \ {x_f}.

We define a payoff function H̄_{x_0}: S_A × S_B → R in this game as follows:
Let s_A ∈ S_A and s_B ∈ S_B be fixed strategies of the players. Then the graph G_s = (X, E_s), generated by the edges (x, s_A(x)), x ∈ X_A \ {x_f}, and (x, s_B(x)), x ∈ X_B \ {x_f}, has the structure of a directed tree with sink vertex x_f. Therefore, it contains a unique directed path P_s(x_0, x_f) with n(P_s(x_0, x_f)) edges. We put

H̄_{x_0}(s_A, s_B) = (1 / n(P_s(x_0, x_f))) ∑_{e∈E(P_s(x_0, x_f))} c_e.
The payoff function H̄_{x_0}(s_A, s_B) on S_A × S_B defines a game in normal form, which is determined by the network (G, X_A, X_B, c, x_0, x_f). We consider the problem of finding strategies s_A^* and s_B^* for which

v(x_0) = H̄_{x_0}(s_A^*, s_B^*) = max_{s_A∈S_A} min_{s_B∈S_B} H̄_{x_0}(s_A, s_B).
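In contrast to the c-game payoff, H̄ averages the edge costs over the length of the path. A minimal Python sketch of its evaluation for fixed strategies follows; succ is the combined successor map of the tree G_s, and all names are illustrative.

def mean_payoff(succ, cost, x0, xf):
    # assumes x0 != xf; G_s is a tree with sink x_f, so the loop terminates
    total, n, x = 0, 0, x0
    while x != xf:
        total += cost[x, succ[x]]
        n += 1
        x = succ[x]
    return total / n               # H-bar_{x0}(s_A, s_B)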
2.4.2 Main Properties of Optimal Strategies in Acyclic l-Games

First of all let us show that for the considered max-min problem there exists a saddle point. Denote

v̄(x_0) = H̄_{x_0}(s_A^0, s_B^0) = min_{s_B∈S_B} max_{s_A∈S_A} H̄_{x_0}(s_A, s_B)

and let us show that v̄(x_0) = v(x_0).

Theorem 2.12. For an arbitrary acyclic l-game the following equality holds:

v(x_0) = H̄_{x_0}(s_A^*, s_B^*) = max_{s_A∈S_A} min_{s_B∈S_B} H̄_{x_0}(s_A, s_B) = min_{s_B∈S_B} max_{s_A∈S_A} H̄_{x_0}(s_A, s_B).
Proof. First of all let us note the following property of an acyclic l-game determined by (G, X_A, X_B, c, x_0, x_f): If the cost function c is changed to c′ = c + h (h is an arbitrary real number), then we obtain an equivalent acyclic l-game determined by (G, X_A, X_B, c′, x_0, x_f) for which v′(x_0) = v(x_0) + h and v̄′(x_0) = v̄(x_0) + h. It is easy to observe that if h = −v(x_0), then for the acyclic l-game on the network (G, X_A, X_B, c′, x_0, x_f) we obtain v′(x_0) = 0. This means that the acyclic l-game becomes an acyclic c-game for which the following property holds:

0 = v′(x_0) = max_{s_A∈S_A} min_{s_B∈S_B} H̄′_{x_0}(s_A, s_B) = min_{s_B∈S_B} max_{s_A∈S_A} H̄′_{x_0}(s_A, s_B).

Taking into account that

H̄′_{x_0}(s_A, s_B) = H̄_{x_0}(s_A, s_B) − v(x_0),

we obtain that

min_{s_B∈S_B} max_{s_A∈S_A} {H̄_{x_0}(s_A, s_B) − v(x_0)} = max_{s_A∈S_A} min_{s_B∈S_B} {H̄_{x_0}(s_A, s_B) − v(x_0)} = v̄(x_0) − v(x_0),

i.e. v̄(x_0) − v(x_0) = 0. So, v̄(x_0) = v(x_0).
Theorem 2.13. Let an acyclic l-game determined by a network (G, XA, XB, c, x0, xf) with starting position x0 be given. Then there exist a value v(x0) and a function ε: X → R which determine a potential transformation c'_{(x,y)} = c_{(x,y)} + ε(x) − ε(y) of the costs on the edges e = (x, y) ∈ E such that the following conditions hold:

a) v(x_0) = ext(c', x), ∀x ∈ X \ {x_f};
b) ε(x_0) = ε(x_f).

The optimal strategies of the players in an acyclic l-game can be found as follows: Fix arbitrary maps s*A: XA \ {xf} → X and s*B: XB \ {xf} → X such that s*A(x) ∈ VEXT(c', x) for x ∈ XA \ {xf} and s*B(x) ∈ VEXT(c', x) for x ∈ XB \ {xf}.

Proof. The proof of the theorem follows from Theorem 2.1 if we regard the acyclic l-game as an acyclic c-game on the network (G, XA, XB, c', x0, xf) with cost function c' = c − v(x0).

Corollary 2.14. The difference ε(x) − ε(x0), x ∈ X, represents the cost of a max-min path from x to xf in an acyclic c-game on the network (G, XA, XB, c', x0, xf) with c'_{(x,y)} = c_{(x,y)} − v(x_0), ∀(x, y) ∈ E.

2.4.3 A Polynomial Time Algorithm for Finding the Value and the Optimal Strategies in an Acyclic l-Game

The algorithm which we describe below is based on the results from Section 2.4.2. In this algorithm we shall use the following properties:

1. The value v(x0) of the acyclic l-game on the network (G, XA, XB, c, x0, xf) is nonnegative if and only if the value of the dynamic c-game on the same network is nonnegative; moreover, the value of the l-game equals zero if and only if the value of the c-game equals zero.

2. If M^1 = \min_{e \in E} c_e and M^2 = \max_{e \in E} c_e, then M^1 ≤ v(x_0) ≤ M^2.
3. If in the network (G, XA, XB, c, x0, xf) the cost function c: E → R is changed by the function c^h: E → R, where

c^h_e = c_e - h, \quad \forall e \in E, \qquad (2.8)

(h is an arbitrary constant), then the acyclic l-games on (G, XA, XB, c, x0, xf) and (G, XA, XB, c^h, x0, xf), respectively, have the same optimal strategies s*A, s*B. Additionally, the values v(x0) and v^h(x0) of these games differ by the constant h: v^h(x0) = v(x0) − h. So, the acyclic l-games on (G, XA, XB, c, x0, xf) and (G, XA, XB, c^h, x0, xf) are equivalent.
According to the properties mentioned above, if v(x0) is known, then the acyclic l-game can be reduced to an acyclic c-game by using transformation (2.8) with h = v(x0). After that, we can find optimal strategies in the game with network (G, XA, XB, c^h, x0, xf) by using Algorithm 2.2. The most important aspect of the proposed algorithm is the problem of finding a value h for which v^h(x0) = 0. Taking into account properties 1 and 2, we will seek this value by using a dichotomy method on the segment [M^1, M^2], such that at each step of this method we solve a dynamic c-game with network (G, XA, XB, c^k, x0, xf), where c^k = c − h_k. The main idea of the general step of the algorithm is the following: We make transformation (2.8) with h = h_k, where h_k is the midpoint of the segment [M^1_k, M^2_k] at step k. After that, we apply Algorithm 2.2 to the dynamic c-game on the network (G, XA, XB, c^{h_k}, x0, xf) and find v_{h_k}(x_0). If v_{h_k}(x_0) > 0 then we fix the segment [M^1_{k+1}, M^2_{k+1}], where M^1_{k+1} = M^1_k and M^2_{k+1} = (M^1_k + M^2_k)/2; otherwise we put M^1_{k+1} = (M^1_k + M^2_k)/2 and M^2_{k+1} = M^2_k. If v_{h_k}(x_0) = 0 then STOP. The detailed description of the algorithm is the following:
Algorithm 2.15. Determining the Value and the Optimal Strategies in an Acyclic l-Game

Let us assume that the cost function c: E → R is integer-valued and max_{e∈E} |c_e| ≠ 0.

Preliminary step (step 0): Find the value v(x0) and the optimal strategies s*A and s*B of the dynamic c-game on (G, XA, XB, c, x0, xf) by using Algorithm 2.2. If v(x0) = 0 then fix s*A and s*B as a solution of the l-game, put v(x0) = 0 and STOP; otherwise fix M^1_1 = min_{e∈E} c_e, M^2_1 = max_{e∈E} c_e, L = max_{e∈E} |c_e| + 1.
General step (step k, k ≥ 1): Find h_k = (M^1_k + M^2_k)/2 and make a transformation of the edge costs

c^k_e = c_e - h_k \quad \text{for } e \in E.
Solve the dynamic c-game on the network (G, XA, XB, c^k, x0, xf) and find the value v_k(x0) and the optimal strategies s*A, s*B. If v_k(x0) = 0 then fix the optimal strategies s*A and s*B, put v(x0) = h_k and STOP. If |v_k(x0)| ≤ 1/(4|X|^2 L) then fix s*A and s*B, find

v(x_0) = \frac{H^{x_0}(s_A^*, s_B^*)}{n(P_{s^*}(x_0, x_f))}

and STOP. If v_k(x0) > 1/(4|X|^2 L) then fix M^1_{k+1} = M^1_k, M^2_{k+1} = h_k and go to step k + 1. If v_k(x0) < −1/(4|X|^2 L) then fix M^1_{k+1} = h_k, M^2_{k+1} = M^2_k and go to step k + 1.
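The bisection scheme of Algorithm 2.15 is easy to express in code. The sketch below follows the segment updates exactly as stated in the text; the routine solve_c_game, standing in for Algorithm 2.2, is a hypothetical oracle returning the value and the optimal strategies of the dynamic c-game for given edge costs.

```python
# A sketch of Algorithm 2.15 (bisection over the shift h), assuming a
# hypothetical oracle solve_c_game(costs) -> (value, sA, sB) for Algorithm 2.2.
def acyclic_l_game(costs, n_vertices, solve_c_game):
    v0, sA, sB = solve_c_game(costs)          # preliminary step
    if v0 == 0:
        return 0.0, sA, sB
    L = max(abs(c) for c in costs.values()) + 1
    M1, M2 = min(costs.values()), max(costs.values())
    eps = 1.0 / (4 * n_vertices ** 2 * L)     # accuracy threshold of Alg. 2.15
    while True:
        h = (M1 + M2) / 2                     # transformation (2.8) with h_k
        v, sA, sB = solve_c_game({e: c - h for e, c in costs.items()})
        if abs(v) <= eps:
            return h, sA, sB                  # the text then recovers v(x0) as
                                              # the mean cost of P_{s*}(x0, xf)
        if v > eps:
            M2 = h                            # the book's update for v_k > 0
        else:
            M1 = h
```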
Theorem 2.16. Let (G, XA, XB, c, x0, xf) be a network with integer cost function c: E → R, and let L = max_{e∈E} |c_e|. Then Algorithm 2.15 correctly finds the value v(x0) and the optimal strategies s*A, s*B in the acyclic l-game. The running time of the algorithm is O(|X|^2 log L + 2|X|^2 log |X|).

Proof. Let (G, XA, XB, c^k, x0, xf) be the network after the final step k of Algorithm 2.15. Then |v_k(x_0)| ≤ 1/(4|X|^2 L) and the numbers ε_k(x), x ∈ X, determined according to Algorithm 2.2 (when we solve the acyclic c-game), represent an approximate solution of the system

\begin{cases} \varepsilon(y) - \varepsilon(x) + c^k_{(x,y)} \le 0 & \text{for } x \in X_A,\ (x, y) \in E;\\ \varepsilon(y) - \varepsilon(x) + c^k_{(x,y)} \ge 0 & \text{for } x \in X_B,\ (x, y) \in E;\\ \varepsilon(x_0) = \varepsilon(x_f). \end{cases}

This means that ε_k(x), x ∈ X, and h_k represent an approximate solution of the system

\begin{cases} \varepsilon(y) - \varepsilon(x) + c_{(x,y)} \le h & \text{for } x \in X_A,\ (x, y) \in E;\\ \varepsilon(y) - \varepsilon(x) + c_{(x,y)} \ge h & \text{for } x \in X_B,\ (x, y) \in E;\\ \varepsilon(x_0) = \varepsilon(x_f). \end{cases}

According to [45, 46], the exact solution h = v(x_0), ε(x), x ∈ X, of this system can be obtained from h_k, ε_k(x), x ∈ X, by using a special roundoff procedure in time O(log(L + 1)). Therefore, the strategies s*A, s*B after the final step k of the algorithm correspond to the optimal solution of the acyclic l-game. Taking into account that the tabulation of the values ε(x), x ∈ X, in G needs O(|X|^2) operations and the number of iterations of the algorithm is O(log L + 2 log |X|), we obtain that the running time of the algorithm is O(|X|^2 log L + 2|X|^2 log |X|).
2.5 Cyclic Games: Algorithms for Finding the Value and the Optimal Strategies of the Players

Cyclic games have been introduced in [22, 40, 79] as an extension of control models for discrete systems with infinite time horizon and mean integral-time cost along a trajectory. Here we show that the problem of finding optimal strategies of the players in such games is tightly connected with the problem of finding optimal strategies of the players in a dynamic c-game and an acyclic l-game. On the basis of these results we propose algorithms for determining the value and the optimal strategies in cyclic games.
2.5.1 Problem Formulation and Main Properties

Let G = (X, E) be a finite directed graph in which every vertex x ∈ X has at least one leaving edge e = (x, y) ∈ E. A function c: E → R which assigns a cost c_e to each edge e ∈ E is given on the edge set E. Additionally, the vertex set X is divided into two disjoint subsets XA and XB (X = XA ∪ XB, XA ∩ XB = ∅) which we regard as the position sets of the two players.

On G we consider the following two-person game from [22, 40, 105, 107]: The game starts at position x0 ∈ X. If x0 ∈ XA then the move is done by the first player, otherwise it is done by the second one. A move means the passage from position x0 to a neighboring position x1 through an edge e1 = (x0, x1) ∈ E. After that, if x1 ∈ XA then the move is done by the first player, otherwise it is done by the second player, and so on indefinitely. The first player has the aim to maximize

\liminf_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} c_{e_i},

while the second player has the aim to minimize

\limsup_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} c_{e_i}.

In [22] it is proved that for this game there exists a value v(x_0) such that the first player has a strategy of moves that ensures

\liminf_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} c_{e_i} \ge v(x_0)

and the second player has a strategy of moves that ensures

\limsup_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} c_{e_i} \le v(x_0).

Furthermore, in [22] it is shown that the players can achieve the value v(x0) by applying strategies of moves which do not depend on t. This means that the considered game can be formulated in terms of stationary strategies. Such a game is named a cyclic game in [40].

The strategies of the players in a cyclic game are defined as maps

s_A: x \to y \in X(x) \ \text{for} \ x \in X_A, \qquad s_B: x \to y \in X(x) \ \text{for} \ x \in X_B,

where X(x) = {y ∈ X | e = (x, y) ∈ E}. Since G is a finite graph, the sets of strategies of the players

S_A = \{s_A: x \to y \in X(x) \ \text{for} \ x \in X_A\}; \qquad S_B = \{s_B: x \to y \in X(x) \ \text{for} \ x \in X_B\}

are finite sets.

The payoff function H^{x_0}: S_A × S_B → R in the cyclic game is defined as follows: Let sA ∈ SA and sB ∈ SB be fixed strategies of the players. Denote by Gs = (X, Es) the subgraph of G generated by the edges of the form (x, sA(x)) for x ∈ XA and (x, sB(x)) for x ∈ XB. Then Gs contains a unique directed cycle Cs which can be reached from x0 through the edges e ∈ Es. We put the value H^{x_0}(s_A, s_B) equal to the mean edge cost of the cycle Cs, i.e.

H^{x_0}(s_A, s_B) = \frac{1}{n(C_s)} \sum_{e \in E(C_s)} c_e,

where E(Cs) represents the set of edges of the cycle Cs and n(Cs) is the number of its edges. So, the cyclic game is determined uniquely by the network
(G, XA, XB, c, x0), where x0 is a given starting position of the game. If we consider the problem of finding optimal strategies of the players for an arbitrary starting position x ∈ X, then we use the notation (G, XA, XB, c). In [22, 40] it is proved that there exist strategies s*A ∈ SA and s*B ∈ SB such that

v(x) = H^{x}(s_A^*, s_B^*) = \max_{s_A \in S_A} \min_{s_B \in S_B} H^{x}(s_A, s_B) = \min_{s_B \in S_B} \max_{s_A \in S_A} H^{x}(s_A, s_B), \quad \forall x \in X.

So, the optimal strategies s*A, s*B of the players in cyclic games do not depend on the starting position x0, although for different positions x, y ∈ X the values v(x) and v(y) may be different. This means that the position set X can be divided into several classes X = X^1 ∪ X^2 ∪ ⋯ ∪ X^k according to the values v^1, v^2, ..., v^k of the positions, i.e. x, y ∈ X^i if and only if v^i = v(x) = v(y). In the case k = 1 the network (G, XA, XB, c) is named an ergodic network [40]. In [55, 60] it is shown that every cyclic game with an arbitrary network (G, XA, XB, c, x0) and a given starting position x0 can be reduced to a cyclic game on an auxiliary ergodic network (G', X'A, X'B, c').

It is well known [44, 105, 107] that the decision problem associated to a cyclic game is in NP ∩ co-NP. Some exponential and pseudo-polynomial algorithms for finding the value and the optimal strategies of the players in a cyclic game are proposed in [107]. Our aim is to propose polynomial time algorithms for determining optimal strategies of the players in cyclic games. We discuss such algorithms on the basis of the results which have been announced in [60, 61].

2.5.2 Determining the Best Response of the First Player for a Fixed Strategy of the Second Player

In order to find the best response of the first player for a fixed strategy of the second player we shall use the model from Section 2.5.1 in the case XB = ∅, i.e. X = XA. This case of the model corresponds to the problem of finding in G the maximal mean cost cycle which can be reached from x0. An efficient polynomial time algorithm for finding a maximal mean cost cycle in a weighted directed graph is proposed in [14, 43]. In [55, 98, 99] it is shown that for a strongly connected graph this problem can be represented as the following linear programming problem: Maximize the objective function

H = \sum_{e \in E} c_e \alpha_e

subject to
\begin{cases} \sum_{e \in E^-(x)} \alpha_e - \sum_{e \in E^+(x)} \alpha_e = 0, & \forall x \in X;\\ \sum_{e \in E} \alpha_e = 1;\\ \alpha_e \ge 0, \ e \in E, \end{cases}
where E^-(x) is the set of edges e = (y, x) ∈ E which enter x, and E^+(x) is the set of edges e = (x, y) ∈ E which originate in x. A variable α_e is associated with each edge e ∈ E.

An arbitrary admissible solution α of the considered linear programming problem determines in G a flow circulation with a constant (equal to 1) sum of flow values on the edges of the directed weighted graph G. It is easy to show that any admissible solution of the linear programming problem can be represented in the form of a convex combination of flow values of elementary directed cycles with a constant (equal to 1) sum of flow values on the edges of these cycles. Thus, associating to each solution α of the polyhedral admissible set Z_α of the problem the directed subgraph G_α = (X_α, E_α) generated by the edges e ∈ E with α_e > 0, we obtain that any extreme point α of the polyhedral set Z_α corresponds to a subgraph G_α of G which has the structure of an elementary directed cycle. So, the following lemma holds:

Lemma 2.17. If α is a solution of the problem which corresponds to an extreme point of Z_α, then the graph G_α represents an elementary cycle in G.

On the basis of this lemma the following theorem is proved in [55, 98]:

Theorem 2.18. If α* is an optimal basic solution of the considered linear programming problem, then the cycle C_{α*} is the maximal mean cost cycle in G.

So, the problem of finding the maximal mean cost cycle in G can be solved by using a polynomial algorithm. Moreover, on the basis of duality theory we can find a condition for determining the value v of the maximal mean cycle and the solution. Indeed, if for our linear programming problem we define the dual problem: Minimize

z = v

subject to

\varepsilon(x) - \varepsilon(y) + v \ge c_{(x,y)}, \quad \forall (x, y) \in E,

then we obtain the following result, which is similar to the one from [8]:
Theorem 2.19. For a given strongly connected directed graph G = (X, E) there exist a value v and a function ε: X → R such that

c'_{(x,y)} = \varepsilon(y) - \varepsilon(x) + c_{(x,y)} - v \le 0, \quad \forall (x, y) \in E,

and

\max_{y \in X(x)} c'_{(x,y)} = 0, \quad \forall x \in X.
Moreover, if we fix in G an arbitrary map s*: x → y ∈ X(x) such that c'_{(x, s*(x))} = 0, ∀x ∈ X, then an arbitrary directed cycle C in G_{α*} = (X, E_{α*}) is a solution of the problem.

Let us show that if sB is an arbitrary fixed strategy of the second player, then the best response s*A of the first player can be found by using the approach described above. Indeed, if the second player fixes his strategy sB, then this means that in G the set of edges E_{s_B} = {(x, sB(x)) | x ∈ XB} is fixed. Therefore, we obtain a subgraph Ḡ = (X, Ē), where Ē = E_A ∪ E_{s_B} and E_A = {(x, y) ∈ E | x ∈ XA}, and in order to obtain the best response of the first player we have to find in this graph the maximal mean cost cycle, which corresponds to a solution s*A:

H^{x}(s_A^*, s_B) = \max_{s_A} H^{x}(s_A, s_B) \quad \text{for all } x \in X.
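The linear program above is small enough to hand to a generic LP solver. The following sketch, using scipy.optimize.linprog (an assumption: any LP solver supporting equality constraints would do), sets up the flow-circulation constraints and reads off the maximal mean cost cycle from the support of a basic optimal solution, as Lemma 2.17 and Theorem 2.18 justify; the (tail, head, cost) edge encoding is illustrative.

```python
# A sketch of the maximal mean cost cycle LP from Section 2.5.2; vertices are
# assumed numbered 0..n-1 and edges given as (tail, head, cost) triples.
import numpy as np
from scipy.optimize import linprog

def max_mean_cycle(n, edges):
    m = len(edges)
    A_eq = np.zeros((n + 1, m))
    for j, (x, y, _) in enumerate(edges):
        A_eq[y, j] += 1.0          # edge j lies in E^-(y)
        A_eq[x, j] -= 1.0          # edge j lies in E^+(x)
    A_eq[n, :] = 1.0               # normalization: sum of alpha_e equals 1
    b_eq = np.zeros(n + 1)
    b_eq[n] = 1.0
    c = np.array([cost for (_, _, cost) in edges], dtype=float)
    # linprog minimizes, so negate the costs to maximize H = sum c_e alpha_e
    res = linprog(-c, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * m, method="highs")
    # For a basic optimal solution the support {e : alpha_e > 0} is an
    # elementary cycle (Lemma 2.17), and -res.fun is its mean cost.
    cycle = [edges[j][:2] for j in range(m) if res.x[j] > 1e-9]
    return -res.fun, cycle
```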
This approach, based on the alternate best responses of the players, can of course be used for solving some classes of cyclic games. But such an approach cannot be estimated from the computational point of view. Therefore, in the following we will propose another approach for determining optimal strategies in cyclic games.

A similar continuous model can be used for determining the best response of the first player for a fixed strategy of the second player in the acyclic l-game. If we consider X = XA, then the optimal mean directed path from the starting position x0 to the final position xf can be found on the basis of the following linear programming problem: Maximize the objective function

H^{x_0} = \sum_{e \in E} c_e \alpha_e

subject to

\begin{cases} \sum_{e \in E^-(x)} \alpha_e - \sum_{e \in E^+(x)} \alpha_e = \begin{cases} -1, & x = x_0;\\ 0, & x \ne x_0, x_f;\\ 1, & x = x_f; \end{cases}\\ \sum_{e \in E} \alpha_e = 1;\\ \alpha_e \ge 0, \ e \in E. \end{cases}
2.5.3 Some Preliminary Results

First of all we need to recall some preliminary results from [40, 55, 56, 60, 61]. Let (G, XA, XB, c) be a network with the properties described in Section 2.5.1. In an analogous way as for dynamic c-games we denote here

ext(c, x) = \begin{cases} \max_{y \in X(x)} c_{(x,y)} & \text{for } x \in X_A,\\ \min_{y \in X(x)} c_{(x,y)} & \text{for } x \in X_B, \end{cases}

VEXT(c, x) = \{y \in X(x) \mid c_{(x,y)} = ext(c, x)\}.

We shall use the potential transformation c'_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) of the costs on the edges e = (x, y) ∈ E, where ε: X → R is an arbitrary function on the vertex set X. In [40] it is noted that the potential transformation changes neither the value nor the optimal strategies of the players in cyclic games.

Theorem 2.20. Let (G, XA, XB, c) be an arbitrary network with the properties described in Section 2.5.1. Then there exist values v(x), x ∈ X, and a function ε: X → R which determines a potential transformation c'_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) of the costs on the edges e = (x, y) ∈ E, such that the following properties hold:

a) v(x) = ext(c', x) for x ∈ X;
b) v(x) = v(y) for x ∈ XA ∪ XB and y ∈ VEXT(c', x);
c) v(x) ≥ v(y) for x ∈ XA and y ∈ XG(x);
d) v(x) ≤ v(y) for x ∈ XB and y ∈ XG(x);
e) max_{e∈E} |c'_e| ≤ 2|X| max_{e∈E} |c_e|.

The values v(x), x ∈ X, on the network (G, XA, XB, c) are determined uniquely, and the optimal strategies of the players can be found in the following way: Fix arbitrary strategies s*A: XA → X and s*B: XB → X such that s*A(x) ∈ VEXT(c', x) for x ∈ XA and s*B(x) ∈ VEXT(c', x) for x ∈ XB.

The proof of Theorem 2.20 is given in [40]. Another proof of Theorem 2.20 can be obtained if the conditions of Theorem 2.19 are applied with respect to each position set of the players. Furthermore, we shall use Theorem 2.20 in the case of an ergodic network, i.e. we shall use the following corollary:

Corollary 2.21. Let (G, XA, XB, c) be an ergodic network. Then there exist a value v and a function ε: X → R which determines a potential transformation c'_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) of the costs on the edges e = (x, y) ∈ E such that v = ext(c', x) for x ∈ X. The optimal strategies of the players can be found as follows: Fix arbitrary strategies s*A: XA → X and s*B: XB → X such that s*A(x) ∈ VEXT(c', x) for x ∈ XA and s*B(x) ∈ VEXT(c', x) for x ∈ XB.
2.5.4 The Reduction of Cyclic Games to Ergodic Games

Let us consider an arbitrary network (G, XA, XB, c) with a given starting position x0 ∈ X which determines a cyclic game. In [55, 60, 61] it is shown that this game can be reduced to a cyclic game on an auxiliary ergodic network (G', WA, WB, c'), G' = (W, E'), with the same value v(x0) of the game as the initial one, where x0 ∈ W = X ∪ U ∪ Z. The graph G' = (W, E') is obtained from G by replacing each edge e = (x, y) with a triple of edges e1 = (x, u), e2 = (u, z), e3 = (z, y) with the costs c_{e1} = c_{e2} = c_{e3} = c_e. Here u ∈ U, z ∈ Z and x, y ∈ X; W = X ∪ U ∪ Z. Additionally, in G' each vertex u is connected with x0 by an edge (u, x0) with the cost c_{(u,x0)} = M (where M is a suitably large value) and each vertex z is connected with x0 by an edge (z, x0) with the cost c_{(z,x0)} = −M. In (G', WA, WB, c') the sets WA and WB are defined as follows: WA = XA ∪ Z; WB = XB ∪ U. It is easy to observe that this reduction can be done in linear time.
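A minimal sketch of this reduction is given below; the dictionary encoding of the edges and the fresh intermediate labels ("u", idx) and ("z", idx) are illustrative assumptions, not the book's notation.

```python
# A sketch of the ergodic-network reduction of Section 2.5.4.
# edges: dict {(x, y): cost}; returns the new edge dict and position sets.
def reduce_to_ergodic(edges, X_A, X_B, x0, M):
    E2, W_A, W_B = {}, set(X_A), set(X_B)
    for idx, ((x, y), c) in enumerate(edges.items()):
        u, z = ("u", idx), ("z", idx)       # fresh vertices u in U, z in Z
        E2[(x, u)] = c                      # the triple e1, e2, e3, each of
        E2[(u, z)] = c                      # cost c_e, preserves mean cycle
        E2[(z, y)] = c                      # costs
        E2[(u, x0)] = M                     # u becomes a position of player B
        E2[(z, x0)] = -M                    # z becomes a position of player A
        W_B.add(u)
        W_A.add(z)
    return E2, W_A, W_B
```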
2.5.5 A Polynomial Time Algorithm for Solving Ergodic Zero-Value Cyclic Games

Let us consider an ergodic zero-value cyclic game determined by a network (G, XA, XB, c, x0), where G = (X, E). Then according to Theorem 2.20 there exists a function ε: X → R which determines a potential transformation c'_{(x,y)} = c_{(x,y)} + ε(y) − ε(x) on the edges (x, y) ∈ E such that

ext(c', x) = 0, \quad \forall x \in X. \qquad (2.9)
This means that if xf is a vertex of the cycle C_{s*} determined by the optimal strategies s*A and s*B, then the problem of finding a function ε: X → R which determines a canonical potential transformation is equivalent to the problem of finding the values ε(x), x ∈ X, in a max-min paths problem on G with sink vertex xf, where ε(xf) = 0. So, in order to solve the zero-value cyclic game we fix each time a vertex x ∈ X as a sink vertex (xf = x) and solve a max-min paths problem on G with sink vertex xf. If for a given xf = x a function ε: X → R, obtained on the basis of the algorithms from Sections 2.3.4 and 2.3.5, determines a potential transformation which satisfies (2.9), then we fix s*A and s*B such that s*A(x) ∈ VEXT(c', x) for x ∈ XA and s*B(x) ∈ VEXT(c', x) for x ∈ XB. If for a given x the function ε: X → R does not satisfy (2.9), then we select another vertex x ∈ X as the sink vertex, and so on. This means that the optimal strategies of the players in zero-value ergodic cyclic games can be determined in time O(|X|^4).

Example. Consider the ergodic zero-sum cyclic game determined by the network given in Fig. 2.8 with starting position x0 = 0. Positions of the first player are represented by circles and positions of the second player are represented by squares, i.e. XA = {1, 2, 4, 5}, XB = {0, 3}. The network in Fig. 2.8 is obtained from the network of the example in Sections 2.3.4 and 2.3.5 by adding the edge (5, 2) with
cost c(5,2) = −5. It is easy to check that the value of the cyclic game on this network for an arbitrary fixed starting position is equal to zero.

Fig. 2.8.
The max-min mean cycle, which yields mean cost zero, is 2 → 3 → 5 → 2. Therefore, if we fix a vertex of this cycle as the sink vertex (for example x = 5), then we can find a potential function ε: X → R which determines a potential transformation c'_{(x,y)} = ε(y) − ε(x) + c_{(x,y)} such that ext(c', x) = 0, ∀x ∈ X. This function ε: X → R can be found by using the algorithm from the example in Sections 2.3.4 and 2.3.5, i.e. we find the costs of min-max paths from every x ∈ X to vertex 5. So, ε(0) = 8, ε(1) = 12, ε(2) = 5, ε(3) = 3, ε(4) = 2, ε(5) = 0. After the potential transformation we obtain a network with the following costs of the edges:

c'(3,5) = ε(5) − ε(3) + c(3,5) = 0 − 3 + 3 = 0;
c'(4,5) = ε(5) − ε(4) + c(4,5) = 0 − 2 + 2 = 0;
c'(5,2) = ε(2) − ε(5) + c(5,2) = 5 − 0 − 5 = 0;
c'(2,3) = ε(3) − ε(2) + c(2,3) = 3 − 5 + 2 = 0;
c'(3,2) = ε(2) − ε(3) + c(3,2) = 5 − 3 + 1 = 3;
c'(4,2) = ε(2) − ε(4) + c(4,2) = 5 − 2 − 4 = −1;
c'(2,4) = ε(4) − ε(2) + c(2,4) = 2 − 5 + 1 = −2;
c'(0,4) = ε(4) − ε(0) + c(0,4) = 2 − 8 + 6 = 0;
c'(0,2) = ε(2) − ε(0) + c(0,2) = 5 − 8 + 4 = 1;
c'(1,3) = ε(3) − ε(1) + c(1,3) = 3 − 12 + 6 = −3;
c'(1,2) = ε(2) − ε(1) + c(1,2) = 5 − 12 + 3 = −4;
c'(1,0) = ε(0) − ε(1) + c(1,0) = 8 − 12 + 4 = 0;
c'(0,1) = ε(1) − ε(0) + c(0,1) = 12 − 8 + 1 = 5.
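These computations can be checked mechanically. The sketch below encodes the example's edges and potentials and verifies condition (2.9); the dictionary encoding is an illustrative assumption.

```python
# A small check of condition (2.9) on the example network of Fig. 2.8.
edges = {(3, 5): 3, (4, 5): 2, (5, 2): -5, (2, 3): 2, (3, 2): 1, (4, 2): -4,
         (2, 4): 1, (0, 4): 6, (0, 2): 4, (1, 3): 6, (1, 2): 3, (1, 0): 4,
         (0, 1): 1}
eps = {0: 8, 1: 12, 2: 5, 3: 3, 4: 2, 5: 0}   # min-max path costs to vertex 5
X_A, X_B = {1, 2, 4, 5}, {0, 3}

# Potential transformation c'(x,y) = eps(y) - eps(x) + c(x,y)
c1 = {(x, y): eps[y] - eps[x] + c for (x, y), c in edges.items()}

for x in X_A | X_B:
    leaving = [c1[(u, v)] for (u, v) in c1 if u == x]
    ext = max(leaving) if x in X_A else min(leaving)
    assert ext == 0, f"condition (2.9) fails at vertex {x}"
print("ext(c', x) = 0 for every x, so (2.9) holds")
```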
The network after the potential transformation is given in Fig. 2.9.

Fig. 2.9.

We can see that ext(c', x) = 0, ∀x ∈ X. Therefore, the edges with zero cost determine the optimal strategies of the players:

s*A: 1 → 0; 2 → 3; 4 → 5; 5 → 2;
s*B: 0 → 4; 3 → 5.
The graph Gs∗ = (X, Es∗ ) generated by these strategies is represented in Fig. 2.10.
Fig. 2.10.
2.5.6 A Polynomial Time Algorithm for Solving Cyclic Games Based on the Reduction to Acyclic l-Games

On the basis of the obtained results we can propose a polynomial time algorithm for solving cyclic games.
We consider a cyclic game on an ergodic network (G, XA, XB, c, x0) with a given starting position x0. The graph G = (X, E) is considered to be strongly connected and X = {x0, x1, x2, ..., x_{n−1}}. Assume that x0 belongs to the cycle C_{s*} determined by the optimal strategies s*A and s*B of the players. If in G there are several such cycles, we consider one of them with the minimum number of edges.

We construct an auxiliary acyclic graph G^{T_r} = (W^r, E^r), where

W^r = \{w_0^0\} \cup W^1 \cup W^2 \cup \dots \cup W^r,
W^i = \{w_0^i, w_1^i, \dots, w_{n-1}^i\}, \quad W^i \cap W^j = \emptyset, \ i \ne j; \quad i = 1, \dots, r;
E^r = E^0 \cup E^1 \cup E^2 \cup \dots \cup E^{r-1};
E^i = \{(w_k^{i+1}, w_l^i) \mid (x_k, x_l) \in E\}, \quad i = 1, \dots, r-1;
E^0 = \{(w_k^i, w_0^0) \mid (x_k, x_0) \in E, \ i = 1, \dots, r\}.

The vertex set W^r of G^{T_r} is obtained from X by duplicating it r times and adding a sink vertex w_0^0. The edge subset E^i ⊆ E^r in G^{T_r} connects vertices of the set W^{i+1} with vertices of the set W^i in the following way: If in G there exists an edge (x_k, x_l) ∈ E, then in G^{T_r} we add the edge (w_k^{i+1}, w_l^i). The edge subset E^0 ⊆ E^r in G^{T_r} connects the vertices w_k^i ∈ W^1 ∪ W^2 ∪ ⋯ ∪ W^r with the sink vertex w_0^0, i.e. if there exists an edge (x_k, x_0) ∈ E, then in G^{T_r} we add the edges (w_k^i, w_0^0) ∈ E^0, i = 1, ..., r.

After that, we define an acyclic network (Ḡ^{T_r}, WA, WB, c', w_0^0), Ḡ^{T_r} = (W̄^r, Ē^r), where Ḡ^{T_r} is obtained from G^{T_r} by deleting the vertices w_k^i from which the vertex w_0^0 is not attainable. The sets WA, WB and the cost function c': Ē^r → R are defined as follows:

W_A = \{w_k^i \in \bar{W}^r \mid x_k \in X_A\}, \qquad W_B = \{w_k^i \in \bar{W}^r \mid x_k \in X_B\};
c'_{(w_k^{i+1}, w_l^i)} = c_{(x_k, x_l)} \ \text{if} \ (x_k, x_l) \in E \ \text{and} \ (w_k^{i+1}, w_l^i) \in E^i, \ i = 1, \dots, r-1;
c'_{(w_k^i, w_0^0)} = c_{(x_k, x_0)} \ \text{if} \ (x_k, x_0) \in E \ \text{and} \ (w_k^i, w_0^0) \in E^0, \ i = 1, \dots, r.
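The construction is a layered unrolling of G and is straightforward to implement. A minimal sketch follows, assuming the vertices are numbered 0..n−1 with x0 = 0 and the edges given as a dictionary; the pruning of vertices from which the sink is unreachable is omitted for brevity.

```python
# A sketch of the layered graph G^{T_r} of Section 2.5.6. A vertex w_k^i is
# encoded as the pair (k, i); the sink w_0^0 is (0, 0).
def build_G_Tr(edges, r):
    E_r = {}
    for (k, l), c in edges.items():
        for i in range(1, r):            # E^i: copy i+1 of x_k -> copy i of x_l
            E_r[((k, i + 1), (l, i))] = c
        if l == 0:                       # E^0: every copy of x_k -> sink,
            for i in range(1, r + 1):    # whenever (x_k, x_0) is an edge of G
                E_r[((k, i), (0, 0))] = c
    return E_r                           # starting position of the l-game: (0, r)
```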
Now we consider an acyclic l-game on the acyclic network (Ḡ^{T_r}, WA, WB, c', w_0^r, w_0^0) with sink vertex w_0^0 and starting position w_0^r.

Lemma 2.22. Let v = v(x0) be the value of an ergodic cyclic game on G and let the number of edges of the max-min cycle C_{s*} in G be equal to r. Additionally, let v^r(w_0^r) be the value of the l-game on (Ḡ^{T_r}, WA, WB, c') with starting position w_0^r. Then v(x_0) = v^r(w_0^r).

Proof. It is evident that there exists a bijective mapping between the set of cycles in G with no more than r edges which contain the vertex x0 and the set of directed paths with no more than r edges from w_0^r to w_0^0 in Ḡ^{T_r}. Therefore, v(x_0) = v^r(w_0^r).
On the basis of this lemma we can propose the following algorithm for finding optimal strategies of the players in cyclic games:

Algorithm 2.23. Determining the Optimal Stationary Strategies of the Players in Cyclic Games with Known Vertex x0 of a Max-Min Cycle of the Network

We construct the acyclic networks (Ḡ^{T_r}, WA, WB, c'), r = 2, 3, ..., n, and for each of them we solve the l-game. In such a way we find the values v^2(w_0^2), v^3(w_0^3), ..., v^n(w_0^n) of these l-games. Then we fix consecutively v = v^2(w_0^2), v^3(w_0^3), ..., v^n(w_0^n) and each time solve the c-game on the network (G, XA, XB, c'), where c' = c − v. Fixing each time the values ε(x_k) = v(x_k) for x_k ∈ X, we check whether the condition

ext(c^r, x_k) = 0, \quad \forall x_k \in X

is satisfied, where c^r_{(x_k, x_l)} = c'_{(x_k, x_l)} + ε(x_l) − ε(x_k). We determine r for which this condition holds and fix the respective maps s*A and s*B such that s*A(x_k) ∈ VEXT(c^r, x_k) for x_k ∈ XA and s*B(x_k) ∈ VEXT(c^r, x_k) for x_k ∈ XB. So, s*A and s*B represent optimal strategies of the players in the cyclic game on G.

Remark 2.24. Algorithm 2.23 finds the value v(x0) and the optimal strategies of the players in time O(|X|^5 log L + 4|X|^3 log |X|), because Algorithm 2.15 needs O(|X|^4 log L + 4|X|^2 log |X|) elementary operations for solving the acyclic l-game on the network (Ḡ^{T_r}, WA, WB, c'), where L = max_{e∈E} |c_e| + 1.

In the general case, if it is unknown whether x0 belongs to a max-min cycle, we use the following algorithm:

Algorithm 2.25. Determining the Optimal Strategies of the Players in Ergodic Cyclic Games (General Case)

Preliminary step (step 0): Fix Y1 = X.

General step (step k): Select a vertex y ∈ Yk, fix x0 = y and apply Algorithm 2.23. If there exists r ∈ {2, 3, ..., n} such that ext(c^r, x) = 0, ∀x ∈ X, then fix s*A(x) ∈ VEXT(c^r, x) for x ∈ XA and s*B(x) ∈ VEXT(c^r, x) for x ∈ XB and STOP; otherwise put Y_{k+1} = Yk \ {y} and go to the next step k + 1.

Remark 2.26. Algorithm 2.25 finds the value v and the optimal strategies of the players in time O(|X|^6 log L + 4|X|^4 log |X|), because in the worst case Algorithm 2.23 is repeated |X| times.

The algorithm for solving cyclic games allows us to determine the sign of the value v(x0) in an infinite dynamic c-game on G with starting position x0. In order to determine sign(v(x0)) we solve on G the cyclic game with starting position x0 and find its value v̄(x0). Then sign(v(x0)) = sign(v̄(x0)).
2.5.7 An Approach for Solving Cyclic Games Based on a Dichotomy Method and Solving Dynamic c-Games

In this section we describe an approach for solving cyclic games, considering that there exist efficient algorithms for solving dynamic c-games (including infinite dynamic c-games).

Consider an ergodic cyclic game determined by an ergodic network (G, XA, XB, c, x0), where the value of the game may be different from zero. The graph G is assumed to be strongly connected. At first we show how to determine the value of the game and the optimal strategies of the players in the case that the vertex x0 belongs to a max-min cycle in G induced by the optimal strategies of the players.

To our ergodic cyclic game we associate a dynamic c-game determined by an auxiliary network (Ḡ, XA, XB ∪ {x̄0}, c, x0, x̄0), where the graph Ḡ = (X ∪ {x̄0}, Ē) is obtained from G by adding a copy x̄0 of the vertex x0 together with copies ē = (x, x̄0) of the edges e = (x, x0) ∈ E with the costs c_ē = c_e. So, for x̄0 there are no leaving edges (x̄0, x).

It is evident that if the value v = v(x0) of the ergodic cyclic game on (G, XA, XB, c, x0) is known, then the problem of finding optimal strategies of the players is equivalent to the problem of finding optimal strategies of the players in a dynamic c-game on the network (Ḡ, XA, XB ∪ {x̄0}, c', x0, x̄0) with the cost function c'_e = c_e − v(x0) for e ∈ Ē. Moreover, if s̄*A and s̄*B are optimal strategies of the players in the dynamic c-game on (Ḡ, XA, XB ∪ {x̄0}, c', x0, x̄0), then the optimal strategies s*A and s*B of the players in the ergodic cyclic game can be found as follows:

s*A(x) = s̄*A(x) for x ∈ XA if s̄*A(x) ≠ x̄0;
s*B(x) = s̄*B(x) for x ∈ XB if s̄*B(x) ≠ x̄0;

and

s*A(x) = x0 if s̄*A(x) = x̄0;
s*B(x) = x0 if s̄*B(x) = x̄0.
It is easy to observe that for the considered problems the following properties hold:

1. The value v(x0) of the ergodic cyclic game on the network (G, XA, XB, c, x0) is nonnegative if and only if the value v̄(x0) of the dynamic c-game on the network (Ḡ, XA, XB ∪ {x̄0}, c, x0, x̄0) is nonnegative; moreover, v(x0) = 0 if and only if v̄(x0) = 0.

2. If M^1 = \min_{e \in E} c_e and M^2 = \max_{e \in E} c_e, then M^1 ≤ v(x_0) ≤ M^2.
3. If in the network (G, XA, XB, c, x0) the cost function c: E → R is changed by c' = c + h, then the optimal strategies of the players in the ergodic cyclic game on the network (G, XA, XB, c', x0) do not change, although the value v(x0) is changed to v'(x0) = v(x0) + h.

On the basis of these properties we seek the unknown value v(x0) = v(x), which we denote by h, using the dichotomy method on the segment [M^1, M^2], such that at each step of this method we solve a dynamic c-game with network (Ḡ, XA, XB ∪ {x̄0}, c^h, x0, x̄0), where c^h = c − h. So, the main idea of the general step of the algorithm is the following: We make a transformation

c^k_e = c_e - h_k \quad \text{for } e \in E,

where h_k is the midpoint of the segment [M^1_k, M^2_k] at step k. After that we apply the algorithm from Section 2.3.5 to the dynamic c-game on the network (Ḡ, XA, XB ∪ {x̄0}, c^k, x0, x̄0) and find v_{h_k}(x_0). If v_{h_k}(x_0) > 0 then we fix the segment [M^1_{k+1}, M^2_{k+1}], where M^1_{k+1} = M^1_k and M^2_{k+1} = (M^1_k + M^2_k)/2; otherwise we put M^1_{k+1} = (M^1_k + M^2_k)/2 and M^2_{k+1} = M^2_k. If v_{h_k}(x_0) = 0 then STOP.

So, using a dichotomy method in an analogous way as for an acyclic l-game, we determine the value of the cyclic game. If this value of the dynamic c-game is known, then we determine the strategies of the players by using the algorithms from Section 2.3.4 or Section 2.3.5.

In the case that x0 may not belong to the max-min cycle determined by the optimal strategies of the players in the cyclic game, we solve |X| problems by fixing each time a starting position x0 = x for x ∈ X. Then at least for one position x0 = x ∈ X we obtain the value of the cyclic game and the optimal strategies of the players.
2.6 Cyclic Games with Random States' Transitions of the Dynamical System

In cyclic games with random states' transitions of a dynamical system the network, consisting of the sets of states XA and XB of the players A and B, also contains states x ∈ XD for which a distribution function on the set of transitions E^+(x) is given:

\sum_{y \in X_G(x)} \Pi(x, y) = 1, \qquad \Pi(x, y) \ge 0 \ \text{for} \ y \in X_G(x),

i.e. Π(x, y) represents the probability of the system L passing from state x to state y ∈ XG(x). We denote the network by (G, XA, XB, XD, c, x0), where XA is the set of positions of the first player, XB is the set of positions of the second player and XD represents the set of positions with random states' transitions of the dynamical system.
This stochastic game model comprises the cyclic game from Section 2.5 (XD = ∅), the Markov process with income (XA = ∅, XB = ∅) [42] and the controlled Markov chains with income (XB = ∅) [42].

The payoff function H^{x_0}: SA × SB → R in the cyclic game with random states' transitions of the dynamical system is defined as follows: Let

s_A: x \to y \in X_G(x) \ \text{for} \ x \in X_A; \qquad s_B: x \to y \in X_G(x) \ \text{for} \ x \in X_B

be arbitrary stationary strategies of the players A and B, respectively. If the players A and B fix their strategies, then we can consider that they use moves on the set of transitions E^+(x) with the probabilities

\Pi(x, y) = \begin{cases} 1, & \text{if } y = s_A(x),\\ 0, & \text{if } y \ne s_A(x) \end{cases} \quad \text{for } x \in X_A;

\Pi(x, y) = \begin{cases} 1, & \text{if } y = s_B(x),\\ 0, & \text{if } y \ne s_B(x) \end{cases} \quad \text{for } x \in X_B.

This means that we have a fully stochastic case, i.e. we have a Markov process with income, where the probabilities Π(x, y) and the costs of arbitrary states' transitions are given. It is well known that for such a process the mean income exists, and we denote this income by H^{x_0}(s_A, s_B). If for the given version of the stochastic game we denote

ext(c, x) = \sum_{y \in X_G(x)} \Pi(x, y)\, c(x, y); \qquad p(x) = \sum_{y \in X_G(x)} \Pi(x, y)\, p(y),

then Theorem 1.9 holds. This implies that

\max_{s_A \in S_A} \min_{s_B \in S_B} H^{x_0}(s_A, s_B) = \min_{s_B \in S_B} \max_{s_A \in S_A} H^{x_0}(s_A, s_B).
2.7 A Nash Equilibria Condition for Cyclic Games with p Players

A cyclic game with p players is determined by a network (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x0), where G = (X, E) is a directed graph in which every vertex x ∈ X has at least one leaving edge e = (x, y) ∈ E. A partition X = X1 ∪ X2 ∪ ⋯ ∪ Xp (Xi ∩ Xj = ∅, i ≠ j) of the vertex set X is given and p functions c^1: E → R^1, c^2: E → R^1, ..., c^p: E → R^1 on the edge set E are defined. The strategies of the players

s_i: x \to y \in X_G(x) \ \text{for} \ x \in X_i, \quad i = 1, \dots, p,
and the payoff functions H^i_{x_0}: S_1 × S_2 × ⋯ × S_p → R, i = 1, ..., p, in the cyclic game with p players are defined in an analogous way as for the zero-sum cyclic game from Section 2.5. Denote by G_s = (X, E_s) the subgraph of G generated by the fixed strategies s_1, s_2, ..., s_p of the players 1, 2, ..., p. Then G_s contains a unique directed cycle C_s, which can be reached from a given starting position x_0 through the edges e ∈ E_s. The values H^i_{x_0}(s_1, s_2, ..., s_p) are considered to be equal to the mean edge costs of the cycle C_s, i.e.

H^i_{x_0}(s_1, s_2, \dots, s_p) = \frac{1}{n(C_s)} \sum_{e \in E(C_s)} c^i_e,
where n(C_s) is the number of edges of the cycle C_s and E(C_s) is the set of edges of this cycle. In the considered game we are seeking a Nash equilibrium, i.e. it is necessary to find strategies s*_1, s*_2, ..., s*_p for which

H^i_{x_0}(s_1^*, s_2^*, \dots, s_{i-1}^*, s_i^*, s_{i+1}^*, \dots, s_p^*) \le H^i_{x_0}(s_1^*, s_2^*, \dots, s_{i-1}^*, s_i, s_{i+1}^*, \dots, s_p^*), \quad \forall s_i \in S_i, \ i = 1, \dots, p. \qquad (2.10)
Intuitively, it is clear that for cyclic games with p players Nash equilibria may not exist. An example for which Nash equilibria in a cyclic game of two players (with maximum criteria) do not exist is given in [40]. This example is related to a cyclic game on a complete bipartite graph G = (X1 ∪ X2, E) with the set of positions X1 = {x1, x2, x3} of the first player and the set of positions X2 = {y1, y2, y3} of the second player; E = {(xi, yj) | i, j = 1, 2, 3}. The cost functions of the players on the edges (in both directions) are defined by the matrices

C^1 = \begin{pmatrix} 0 & 0 & 1 \\ \varepsilon & 0 & 0 \\ 0 & \varepsilon & 0 \end{pmatrix}; \qquad C^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1-\varepsilon & 0 & 1 \end{pmatrix}.

If ε is a small value (for example ε = 0.1), then a Nash equilibrium for such a game does not exist. Here we formulate a necessary and sufficient condition for the existence of Nash equilibria in so-called ergodic cyclic games with p players, which extend zero-sum ergodic cyclic games.
Definition 2.27. Let s*_1, s*_2, ..., s*_p be a solution in the sense of Nash for a cyclic game determined by a network (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x0), where G = (X, E) is a strongly connected directed graph. We call this game an ergodic cyclic game if s*_1, s*_2, ..., s*_p represents the solution in the sense of Nash for the cyclic game on the network (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x) with an arbitrary starting position x ∈ X and

H^i_x(s_1^*, s_2^*, \dots, s_p^*) = H^i_y(s_1^*, s_2^*, \dots, s_p^*), \quad \forall x, y \in X, \ i = 1, \dots, p.
Theorem 2.28. The cyclic game determined by the network (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x0), where G = (X, E) is a strongly connected directed graph, is an ergodic one if and only if on X there exist p real functions ε^1: X → R^1, ε^2: X → R^1, ..., ε^p: X → R^1 and p values v^1, v^2, ..., v^p such that the following conditions are satisfied:

a) ε^i(x) − ε^i(y) + c^i_{(x,y)} − v^i ≥ 0, ∀(x, y) ∈ E_i, where E_i = {e = (x, y) ∈ E | x ∈ X_i}, i = 1, ..., p;

b) \min_{y \in X_G(x)} \{ε^i(x) − ε^i(y) + c^i_{(x,y)} − v^i\} = 0, ∀x ∈ X_i, i = 1, ..., p;

c) the subgraph G^0 = (X, E^0), generated by the edge set E^0 = E^0_1 ∪ E^0_2 ∪ ⋯ ∪ E^0_p, E^0_i = {e = (x, y) ∈ E_i | ε^i(x) − ε^i(y) + c^i_{(x,y)} − v^i = 0}, i = 1, ..., p, has the property that it contains a connected subgraph Ḡ^0 = (X, Ē^0) for which every vertex x ∈ X has only one leaving edge e = (x, y) ∈ Ē^0 and, besides that,

ε^i(x) − ε^i(y) + c^i_{(x,y)} − v^i = 0, \quad \forall (x, y) \in \bar{E}^0, \ i = 1, \dots, p.

The optimal solution of the problem can be determined by fixing the maps

s*_1: x → y ∈ X_{Ḡ^0}(x) for x ∈ X_1;
s*_2: x → y ∈ X_{Ḡ^0}(x) for x ∈ X_2;
⋮
s*_p: x → y ∈ X_{Ḡ^0}(x) for x ∈ X_p,

where X_{Ḡ^0}(x) = {y | (x, y) ∈ Ē^0}.
Proof. (⟹) Let (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x0) be a network which determines an ergodic cyclic game with p players, i.e. in this game there exists a Nash equilibrium s*_1, s*_2, ..., s*_p which satisfies condition (2.10). Define

q^i = H^i_{x_0}(s_1^*, s_2^*, \dots, s_p^*), \quad i = 1, \dots, p. \qquad (2.11)

It is easy to verify that if we change the cost functions c^i to c'^i = c^i − q^i, i = 1, ..., p, then the obtained network (G, X1, X2, ..., Xp, c'^1, c'^2, ..., c'^p, x0) determines a new ergodic cyclic game which is equivalent to the initial one. For the new game s*_1, s*_2, ..., s*_p is a Nash equilibrium and

H^i_{x_0}(s_1^*, s_2^*, \dots, s_p^*) = 0, \quad i = 1, \dots, p.

This game can be regarded as the dynamic c-game from [5, 56] on the network (G, X1, X2, ..., Xp, c'^1, c'^2, ..., c'^p, x0, x0) with given starting position x0 and final position x0 ∈ C_{s*}, where C_{s*} is the directed cycle generated by the strategies s*_1, s*_2, ..., s*_p, such that

\sum_{e \in E(C_{s^*})} c'^i_e = 0, \quad i = 1, \dots, p.

Taking into account that our game is ergodic, we may state, without loss of generality, that x0 belongs to the directed cycle C_{s*} generated by the strategies s*_1, s*_2, ..., s*_p. Therefore, the ergodic game with the network (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x0) can be regarded as the dynamic c-game from [5, 56] on the network (G, X1, X2, ..., Xp, c'^1, c'^2, ..., c'^p, x0, x0) with starting position x0 and final position x0. So, according to Theorem 2 from [5] there exist p real functions ε^1: X → R^1, ε^2: X → R^1, ..., ε^p: X → R^1 such that the following conditions are satisfied:

1) ε^i(x) − ε^i(y) + c'^i_{(x,y)} ≥ 0, ∀(x, y) ∈ E_i, i = 1, ..., p;

2) \min_{y \in X_G(x)} \{ε^i(x) − ε^i(y) + c'^i_{(x,y)}\} = 0, ∀x ∈ X_i, i = 1, ..., p;

3) the subgraph G^0 = (X, E^0), generated by the edge set E^0 = E^0_1 ∪ E^0_2 ∪ ⋯ ∪ E^0_p, E^0_i = {e = (x, y) ∈ E_i | ε^i(x) − ε^i(y) + c'^i_{(x,y)} = 0}, i = 1, ..., p, has the property that it contains a connected subgraph Ḡ^0 = (X, Ē^0) for which every vertex x ∈ X has only one leaving edge e = (x, y) ∈ Ē^0 and, besides that,

ε^i(x) − ε^i(y) + c'^i_{(x,y)} = 0, \quad \forall (x, y) \in \bar{E}^0, \ i = 1, \dots, p.

If in the conditions 1), 2) and 3) mentioned above we take into account that c'^i_{(x,y)} = c^i_{(x,y)} − q^i, ∀(x, y) ∈ E, i = 1, ..., p, then we obtain the conditions a), b) and c) of Theorem 2.28.
(⟸) Assume that the conditions a), b) and c) of Theorem 2.28 hold. Then for the network (G, X1, X2, ..., Xp, c'^1, c'^2, ..., c'^p, x0) the conditions 1), 2) and 3) are satisfied. It is easy to check that an arbitrary set of strategies s*_1, s*_2, ..., s*_p, where s*_i: x → y ∈ X_{Ḡ^0}(x), i = 1, ..., p, is a Nash equilibrium for the ergodic cyclic game on the network (G, X1, X2, ..., Xp, c'^1, c'^2, ..., c'^p, x0) and

H^i_{x_0}(s_1^*, s_2^*, \dots, s_p^*) = 0, \quad i = 1, \dots, p.

This implies that s*_1, s*_2, ..., s*_p determine a Nash equilibrium for the ergodic cyclic game on the network (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x0).

Remark 2.29. The value v^i, i = 1, ..., p, coincides with the value of the payoff function H^i_x(s*_1, s*_2, ..., s*_p), i = 1, ..., p. If v^i = 0, then the ergodic cyclic game coincides with the dynamic c-game on the network (G, X1, X2, ..., Xp, c^1, c^2, ..., c^p, x0, x0).

Note that for ergodic zero-sum games Theorem 2.28 becomes a necessary and sufficient condition for the existence of Nash equilibria, i.e. we obtain Theorem 2.20. Some extensions of cyclic games to stochastic cases have been considered in [16, 36, 42].
2.8 Determining Pareto Optima for Cyclic Games with p Players

To determine a Pareto solution for a cyclic game with p players we can use the continuous model from Section 2.5.2 and extend it to the multi-objective case of the problem in the following way: Minimize the vector function

H(\alpha) = \big(H^1(\alpha), H^2(\alpha), \dots, H^p(\alpha)\big)

subject to

\begin{cases} \sum_{e \in E^-(x)} \alpha_e - \sum_{e \in E^+(x)} \alpha_e = 0, & \forall x \in X;\\ \sum_{e \in E} \alpha_e = 1;\\ \alpha_e \ge 0, \ e \in E, \end{cases}

where

H^i(\alpha) = \sum_{e \in E} c^i_e \alpha_e, \quad i = 1, \dots, p;

E^-(x) = \{e = (y, x) \mid (y, x) \in E\}; \qquad E^+(x) = \{e = (x, y) \mid (x, y) \in E\}.
Pareto optima for this multi-criteria problem can be found by using the approach from [10, 11, 12, 23, 97]. Solutions of this continuous problem correspond to solutions of the discrete multi-criteria problem on a given strongly connected graph G = (X, E) with cost functions c^i: E → R, i = 1, ..., p. Note that a Pareto solution for the cyclic game with p players on G does not depend on the partition X = X1 ∪ X2 ∪ ⋯ ∪ Xp.
3 Extension and Generalization of Discrete Control Problems and Algorithmic Approaches for its Solving
The aim of this chapter is to extend methods and algorithms from the previous chapters to more general classes of problems. We describe a class of discrete control problems for which a dynamic programming technique can be used efficiently. The results from the first chapter are generalized for the case of control problems with varying time of states’ transitions of dynamical systems. Additionally, we consider a control problem with an algorithmically defined objective function. We show that the concept of multi-objective games for the considered class of control problems can be applied and we propose a new algorithm for determining optimal strategies of the players.
3.1 Discrete Control Problems with Varying Time of States’ Transitions of the Dynamical System
In the previous chapters we have studied discrete control models with fixed unit time of states’ transitions of the dynamical system by a trajectory. Now we extend these models and consider discrete control problems when the time of the system’s passage from one state to another may be different from 1 and may vary in the control process. We assume that the time of states’ transitions depends on the vectors of control parameters belonging to a feasible set defined for an arbitrary state at every discrete moment in time. In this section we show that the algorithms from Chapter 1 can be specified and generalized for the problems with varying time of states’ transitions of the dynamical system. First of all we extend these results for the single-objective control problem from Section 1.1.1.
3.1.1 The Single-Objective Control Problem with Varying Time of States' Transitions of the Dynamical System

Consider a dynamical system L with a finite set of states X, where at every discrete moment in time t = 0, 1, 2, ... the state of L is x(t) ∈ X. The starting state x0 = x(0) and the final state xf are fixed. Assume that the dynamical system should reach the final state xf at a time moment T(xf) such that T1 ≤ T(xf) ≤ T2, where T1 and T2 are given.

The control of the time-discrete system L at each time moment t = 0, 1, 2, ... for an arbitrary state x(t) is made by applying a vector of control parameters u(t), for which a feasible set Ut(x(t)) is given, i.e. u(t) ∈ Ut(x(t)). Additionally, we assume that for arbitrary t and x(t) an integer function τ_{x(t)}: Ut(x(t)) → N is defined on Ut(x(t)), which associates with each control u(t) ∈ Ut(x(t)) an integer value τ_{x(t)}(u(t)). This value represents the time of the system's passage from state x(t) to state x(t + τ_{x(t)}(u(t))) if the control u(t) ∈ Ut(x(t)) has been applied at the moment t for the given state x(t).

Assume that the dynamics of the system is described by the following system of difference equations:

\begin{cases} t_{j+1} = t_j + \tau_{x(t_j)}(u(t_j));\\ x(t_{j+1}) = g_{t_j}(x(t_j), u(t_j));\\ u(t_j) \in U_{t_j}(x(t_j));\\ j = 0, 1, 2, \dots, \end{cases} \qquad (3.1)

where

t_0 = 0, \quad x(t_0) = x_0 \qquad (3.2)

is the starting representation of the dynamical system L. We suppose that the functions g_{t_j} and τ_{x(t_j)} in (3.1) are known and that t_{j+1} and x(t_{j+1}) are determined uniquely by x(t_j) and u(t_j) at every step j = 0, 1, 2, ....

Let u(t_j), j = 0, 1, 2, ..., be a control which generates a trajectory

x(0), x(t_1), x(t_2), \dots, x(t_k), \dots

Then either this trajectory passes through the final state x_f, and T(x_f) = t_k represents the time moment when this final state x_f is reached, or this trajectory does not pass through x_f. For an arbitrary control we define the quantity
F_{x_0 x_f}(u(t)) = \sum_{j=0}^{k-1} c_{t_j}\big(x(t_j), g_{t_j}(x(t_j), u(t_j))\big) \qquad (3.3)
if the trajectory x(0), x(t_1), x(t_2), ..., x(t_k), ... passes through the final state x_f at the time moment t_k = T(x_f) such that T_1 ≤ T(x_f) ≤ T_2; otherwise we put F_{x_0 x_f}(u(t)) = ∞. Here

c_{t_j}\big(x(t_j), g_{t_j}(x(t_j), u(t_j))\big) = c_{t_j}(x(t_j), x(t_{j+1}))

represents the cost of the system L to pass from state x(t_j) to state x(t_{j+1}) at the stage [j, j + 1]. We consider the following control problem:

Problem 3.1. Find time moments t_0 = 0, t_1, t_2, ..., t_{k−1} and vectors of control parameters u(t_0), u(t_1), u(t_2), ..., u(t_{k−1}) that satisfy the conditions (3.1), (3.2) and minimize the functional (3.3).

In the following we develop a mathematical tool for solving this problem. We show that a simple modification of the time-expanded network method from Chapter 1 allows us to elaborate an efficient algorithm for solving the considered problem.

3.1.2 An Algorithm for Solving a Single-Objective Control Problem with Varying Time of States' Transitions of the Dynamical System

Here we develop a dynamic programming algorithm for solving Problem 3.1 from Section 3.1.1 in the case that T is fixed, i.e. T1 = T2 = T. The proposed algorithm can be discussed in the same way as the algorithm from Section 3.1.1. We denote by F*_{x_0 x(t_k)} the minimal integral-time cost of the system's passage from the starting state x_0 = x(0) to the state x = x(t_k) ∈ X by using exactly t_k units of time. So,

F^*_{x_0 x(t_k)} = \sum_{j=0}^{k-1} c_{t_j}\big(x^*(t_j), g_{t_j}(x^*(t_j), u^*(t_j))\big),

where

x(0) = x^*(0), x^*(t_1), x^*(t_2), \dots, x^*(t_{k-1}), x(t_k)

is the optimal trajectory from x_0 = x^*(0) to x(t_k), generated by an optimal control u^*(0), u^*(t_1), u^*(t_2), ..., u^*(t_{k−1}), where

t_0 = 0; \quad t_{j+1} = t_j + \tau_{x^*(t_j)}(u^*(t_j)), \quad j = 0, 1, 2, \dots, k-1.
If for a given x ∈ X there exists no trajectory from x_0 to x = x(t_k) such that x is reached by using exactly t_k units of time, then we put F^*_{x_0 x(t_k)} = ∞. For F^*_{x_0 x(t_j)} the following recursive formula can be derived:

F^*_{x_0 x(t_j)} = \begin{cases} \min_{x(t_{j-1}) \in X^-(x(t_j))} \big\{ F^*_{x_0 x(t_{j-1})} + c_{t_{j-1}}(x(t_{j-1}), x(t_j)) \big\}, & \text{if } X^-(x(t_j)) \ne \emptyset;\\ \infty, & \text{if } X^-(x(t_j)) = \emptyset, \end{cases} \quad j = 1, 2, \dots,

where t_0 = 0,

F^*_{x_0 x(0)} = \begin{cases} 0, & \text{if } x(0) = x_0;\\ \infty, & \text{if } x(0) \ne x_0, \end{cases}

and

X^-(x(t_j)) = \{x(t_{j-1}) \in X \mid x(t_j) = g_{t_{j-1}}(x(t_{j-1}), u(t_{j-1})), \ t_j = t_{j-1} + \tau_{x(t_{j-1})}(u(t_{j-1})), \ u(t_{j-1}) \in U_{t_{j-1}}(x(t_{j-1}))\}.

In this procedure the most complicated aspect concerns the determination of the set X^-(x(t_j)) for a given state x(t_j) ∈ X at the time moment t_j. In order to determine X^-(x(t_j)) we have to verify whether for an arbitrary state x(t_{j−1}) at the time moment t_{j−1} there exists an admissible vector u(t_{j−1}) ∈ U_{t_{j−1}}(x(t_{j−1})) such that x(t_j) = g_{t_{j−1}}(x(t_{j−1}), u(t_{j−1})).

If the values F^*_{x_0 x(t)}, t = 0, 1, 2, ..., T, are known, then the optimal control u^*(0), u^*(t_1), u^*(t_2), ..., u^*(t_{k−1}) and the optimal trajectory x(0) = x^*(0), x^*(t_1), x^*(t_2), ..., x^*(t_{k−1}), x(t_k) = x(T) from x_0 to x_f can be obtained in the following way: Find t_{k−1}, u^*(t_{k−1}) and x^*(t_{k−1}) ∈ X^-(x(t_k)) such that

F^*_{x_0 x^*(t_k)} = F^*_{x_0 x^*(t_{k-1})} + c_{t_{k-1}}\big(x^*(t_{k-1}), g_{t_{k-1}}(x^*(t_{k-1}), u^*(t_{k-1}))\big),

where t_k = t_{k−1} + τ_{x^*(t_{k−1})}(u^*(t_{k−1})). After that, find t_{k−2}, u^*(t_{k−2}) and x^*(t_{k−2}) ∈ X^-(x(t_{k−1})) such that

F^*_{x_0 x^*(t_{k-1})} = F^*_{x_0 x^*(t_{k-2})} + c_{t_{k-2}}\big(x^*(t_{k-2}), g_{t_{k-2}}(x^*(t_{k-2}), u^*(t_{k-2}))\big),

where t_{k−1} = t_{k−2} + τ_{x^*(t_{k−2})}(u^*(t_{k−2})). After k − 1 such steps we find the optimal control u^*(0), u^*(t_1), u^*(t_2), ..., u^*(t_{k−1}) and the trajectory x(0), x^*(t_1), x^*(t_2), ..., x^*(t_{k−1}), x(t_k) = x(T).
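A forward reading of this recursion is easy to implement: instead of enumerating the predecessor sets X^-(x(t_j)), one relaxes the successors of each reachable state layer by layer over the time axis. In the sketch below, the routine transitions(t, x), enumerating triples (y, τ, cost), is an assumed encoding of g_t, τ_{x(t)} and c_t, not the book's notation.

```python
# A sketch of the dynamic programming recursion for Problem 3.1 with fixed T.
import math

def optimal_costs(states, transitions, x0, T):
    # F[t][x] = minimal integral-time cost of reaching x at exactly time t
    F = [{x: math.inf for x in states} for _ in range(T + 1)]
    parent = [{x: None for x in states} for _ in range(T + 1)]
    F[0][x0] = 0.0
    for t in range(T):
        for x in states:
            if F[t][x] == math.inf:
                continue
            for (y, tau, cost) in transitions(t, x):
                if t + tau <= T and F[t][x] + cost < F[t + tau][y]:
                    F[t + tau][y] = F[t][x] + cost
                    parent[t + tau][y] = (t, x)   # for trajectory recovery
    return F, parent
```

Backtracking through parent from (x_f, T) reproduces the k − 1 recovery steps described above.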
In order to develop an algorithm we shall use the time-expanded network from Section 1.9.1 with a simple modification. At the beginning we present the algorithm for the case T2 = T1 = T, and then we show that the general case of the problem with T2 > T1 can be reduced to the case with a fixed T.

Assume that T2 = T1 = T and construct a time-expanded network with the structure of an acyclic directed graph G = (Y, E), where Y consists of T + 1 copies of the set of states X corresponding to the time moments t = 0, 1, 2, ..., T. So, Y = Y^0 ∪ Y^1 ∪ Y^2 ∪ ⋯ ∪ Y^T (Y^t ∩ Y^l = ∅, t ≠ l), where Y^t = (X, t) corresponds to the set of states of the dynamical system at the time moment t = 0, 1, 2, ..., T. This means that Y^t = {(x, t) | x ∈ X}, t = 0, 1, 2, ..., T. The graph G is represented in Fig. 3.1, where at each moment in time t = 0, 1, 2, ..., T we can see all copies of the vertex set X.

The set of edges E of the graph G is defined in the following way: If at a given moment in time t_j ∈ [0, T] for a given state x = x(t_j) of the dynamical system there exists a vector of control parameters u(t_j) ∈ U_{t_j}(x(t_j)) such that

z = x(t_{j+1}) = g_{t_j}(x(t_j), u(t_j)), \quad \text{where} \quad t_{j+1} = t_j + \tau_{x(t_j)}(u(t_j)),

then ((x, t_j), (z, t_{j+1})) ∈ E, i.e. in G we connect the vertex y_j = (x, t_j) ∈ Y^{t_j} with the vertex y_{j+1} = (z, t_{j+1}) (see Fig. 3.1). With the edge e = ((x, t_j), (z, t_{j+1})) we associate in G the cost c_e = c_{t_j}(x(t_j), x(t_{j+1})).
Fig. 3.1.
The following lemma holds:

Lemma 3.2. Let u(t_0), u(t_1), u(t_2), ..., u(t_{k−1}) be a control of the dynamical system in Problem 3.1 which generates a trajectory

x_0 = x(t_0), x(t_1), x(t_2), \dots, x(t_k) = x_f

from x_0 to x_f, where

t_0 = 0, \quad t_{j+1} = t_j + \tau_{x(t_j)}(u(t_j)), \quad j = 0, 1, 2, \dots, k-1;
u(t_j) \in U_{t_j}(x(t_j)), \quad j = 0, 1, 2, \dots, k-1;
t_k = T.

Then in G there exists a directed path

P_G(y_0, y_f) = \{y_0 = (x_0, 0), (x_1, t_1), (x_2, t_2), \dots, (x_k, T) = y_f\}

from y_0 to y_f, where x_j = x(t_j), j = 0, 1, 2, ..., k, and x(t_k) = x_f, i.e. T(x_f) = t_k = T. So, between the set of states of the trajectory x_0 = x(t_0), x(t_1), x(t_2), ..., x(t_k) = x_f and the set of vertices of the directed path P_G(y_0, y_f) there exists a bijective mapping

(x_j, t_j) \Leftrightarrow x(t_j), \quad j = 0, 1, 2, \dots, k,

such that x_j = x(t_j), j = 0, 1, 2, ..., k, and

\sum_{j=0}^{k-1} c_{t_j}(x(t_j), x(t_{j+1})) = \sum_{j=0}^{k-1} c_{((x_j, t_j), (x_{j+1}, t_{j+1}))},

where t_0 = 0, x_0 = x(t_0), and x_f = x(t_k), t_k = T.

Proof. In Problem 3.1 an arbitrary control u(t_j) at a given moment in time t_j for a given state x(t_j), with u(t_j) ∈ U_{t_j}(x(t_j)), uniquely determines the next state x(t_{j+1}). So, u(t_j) can be identified with a unique passage (x(t_j), x(t_{j+1})) from state x(t_j) to state x(t_{j+1}). In G = (Y, E) this passage corresponds to a unique directed edge ((x_j, t_j), (x_{j+1}, t_{j+1})) which connects the vertices (x_j, t_j) and (x_{j+1}, t_{j+1}); the cost of this edge is c_{((x_j, t_j), (x_{j+1}, t_{j+1}))} = c_{t_j}(x(t_j), x(t_{j+1})). This one-to-one correspondence between a control u(t_j) at a given moment in time and a directed edge e = ((x_j, t_j), (x_{j+1}, t_{j+1})) ∈ E implies the existence of a bijective mapping between the set of trajectories from the starting state x_0 to the final state x_f in Problem 3.1 and the set of directed paths from y_0 to y_f in G, which preserves the integral-time costs.
Corollary 3.3. If u^*(t_j), j = 0, 1, 2, ..., k − 1, is an optimal control of the dynamical system in Problem 3.1 which generates a trajectory x_0 = x^*(0), x^*(t_1), x^*(t_2), ..., x^*(t_k) = x_f from x_0 to x_f, then in G the corresponding directed path

P^*_G(y_0, y_f) = \{y_0 = (x_0, 0), (x_1^*, t_1), (x_2^*, t_2), \dots, (x_k^*, t_k) = y_f\}

is the minimal integral cost directed path from y_0 to y_f, and vice versa.

On the basis of the results mentioned above we can propose the following algorithm for solving Problem 3.1:

Algorithm 3.4. Determining the Optimal Solution of Problem 3.1 Based on the Time-Expanded Network Method

1. We construct an auxiliary time-expanded network consisting of a directed acyclic graph G = (Y, E), a cost function c: E → R^1 and given starting and final vertices y_0 and y_f.
2. We find in G the directed path P^*_G(y_0, y_f) from the starting vertex y_0 to the final vertex y_f with the minimal sum of the edge costs.
3. We determine the control u^*(t_j), j = 0, 1, 2, ..., k − 1, which corresponds to the directed path P^*_G(y_0, y_f) from y_0 to y_f. Then u^*(t_j), j = 0, 1, 2, ..., k − 1, is a solution of Problem 3.1.

Remark 3.5. Algorithm 3.4 finds a solution of the control problem with fixed time T(x_f) = T of the system's passage from the starting state to the final state. In the case T1 ≤ T(x_f) ≤ T2 (T2 > T1), Problem 3.1 can be solved by the reduction to T2 − T1 + 1 problems with T = T1, T1 + 1, ..., T2, respectively, and finding the best solution of these problems. This means that the problem with T1 ≠ T2 can be solved on another graph Ḡ = (Y̅, Ē) which is obtained from G = (Y, E) by adding a new vertex z_f and the directed edges ((x_f, t), z_f), t = T1, T1 + 1, ..., T2, where c_{((x_f, t), z_f)} = 0. An arbitrary directed path from the starting vertex (x_0, 0) to the final vertex z_f in Ḡ corresponds to a trajectory of the control problem from the starting state x_0 to the final state x_f.

In general, if we construct an auxiliary acyclic directed graph G = (Y, E) with T = T2, then in G the tree of optimal directed paths from the starting vertex y_0 = (x_0, 0) to an arbitrary vertex y = (x, t) ∈ Y can be found. This tree allows us to find a solution of the control problem with a given starting state and an arbitrary state x = x(t) with t = 0, 1, 2, ..., T2; in particular, the solution of Problem 3.1 with T1 ≤ T(x_f) ≤ T2 can be obtained.

Denote by GT^*_{y_0} = (Y^*, E^*_{y_0}) the tree of optimal directed paths with root vertex y_0 = (x_0, 0), which gives all optimal directed paths from y_0 to an arbitrary attainable vertex y = (x, t) ∈ Y.
As we have noted, this tree allows us to find in the control problem all optimal trajectories from the starting state x_0 = x(0) to an arbitrary reachable state x = x(t) at a given moment in time t ∈ [0, T]. In G we can also find the tree of optimal directed paths GT^0_{y_f} = (Y^0, E^0_{y_f}) with sink vertex y_f = (x_f, T), which gives all possible optimal directed paths from an arbitrary vertex y = (x, t) ∈ Y to the sink vertex y_f = (x_f, T). This means that in the control problem we can find all possible optimal trajectories with starting state x = x(t) at a given moment in time t ∈ [0, T] to the final state x_f = x(T).

If the trees GT^*_{y_0} = (Y^*, E^*_{y_0}) and GT^0_{y_f} = (Y^0, E^0_{y_f}) are known, then we can solve the following control problem: Find an optimal trajectory from the starting state x_0 = x(0) to the final state x_f = x(T) such that the trajectory passes at a given moment in time t ∈ [0, T] through the state x = x(t).

Finally, we note that Algorithm 3.4 can be simplified if we delete from G all vertices y ∈ Y which are not attainable from y_0 and all vertices y ∈ Y for which there is no directed path from y to y_f. So, we should solve the auxiliary problem on a new graph G^0 = (Y^0, E^0) which is a subgraph of G = (Y, E).
3.2 The Control Problem on a Network with Transition-Time Functions
133
3.2 The Control Problem on a Network with Transition-Time Functions on the Edges We extend the control problem on the network from Section 1.4.1 by introducing the transition-time functions of states’ transition on the edges. 3.2.1 Problem Formulation Let be given a dynamical system L with a finite set of states X and a given starting point x0 = x(0). Assume that system L should be transferred into state xf at time moment T such that T1 ≤ T (xf ) ≤ T2 , where T1 and T2 are given. We consider the control problem for which the dynamics of the system is described by a directed graph G = (X, E), where the vertices x ∈ X correspond to the states and an arbitrary edge e = (x, y) ∈ E means the possibility of the system to pass from state x to state y at every moment in time t. Each edge e = (x, y) ∈ E is associated with a transition function τe (t) of the system’s passage from state x = x(t) to state y. This means that if at time-moment t system L starts to pass from state x = x(t) through an edge e = (x, y) then the state y is reached at time-moment t + τe (t), i.e. y = x (t + τe (t)). Additionally, each edge e(x, y) ∈ E is associated with a cost function ce (t) that depends on time and expresses the cost of system L to pass from state x = x(t) to state y = x (t + τe (t)). The control on G with given transition-time functions τe on the edges e ∈ E is made in the following way: For a given starting state x0 we fix t0 = 0. Then we select a directed edge e0 = (x0 , x1 ) through which we transfer the system L from state x0 = x(t0 ) to state x1 = x(t1 ) at the moment in time t1 , where t1 = t0 + τe0 (0). If x1 = xf then STOP; otherwise we select an edge e1 = (x1 , x2 ) and transfer system L from state x1 = x(t1 ) at the moment in time t1 to state x2 = x(t2 ) at time moment t2 = t1 + τe1 (t1 ). If x2 = xf then STOP; otherwise select an edge e2 = (x2 , x3 ) and so on. In general, at the time moment tk−1 we select an edge ek−1 = (xk−1 , xk ) and transfer system L from state xk−1 = x(tk−1 ) to state xk = x(tk ) at time-moment tk = tk−1 + τek . If xk = xf then the integral-time cost of the system’s passage from x0 to xf is Fx0 xk (tk ) =
k−1
c(x(tj ),x(tj+1 )) (tj ).
j=0
So, at time moment tk system L is transferred in state xk = xf with the integral-time cost Fx0 xf (tk ). If T ≤ tk ≤ T2 , we obtain an admissible control with tk = T (xf ) and integral-time cost Fx0 xf (T (xf )). We consider the following problem:
134
3 Extension and Generalization of Discrete Control Problems
Problem 3.6. Find a sequence of system’s transitions (xj , xj+1 ) = (x(tj ), x(tj+1 )) ,
tj+1 = tj + τ(xj ,xj+1 ) (tj ), j = 0, 1, 2, . . . , k − 1,
which transfer system L from starting vertex (state) x0 = x(t0 ), t0 = 0, to final vertex (state) xf = xk = x(tk ), such that T ≤ tk ≤ T 2 and the integral-time cost Fx0 xf (tk ) =
k−1
c(xj ,xj+1 ) (tj )
j=0
of system’s transitions by a trajectory x0 = x(t0 ), x(t1 ), x(t2 ), . . . , x(tk ) = xf is minimal. 3.2.2 An Algorithm for Solving the Problem on a Network with Transition-Time Functions on the Edges The algorithm from Section 3.1.2 can be specified for solving the control problem on a network with transition-time functions on the edges. Assume that T2 = T1 = T and describe the details of the algorithm for the control problem on G. The dynamic programming algorithm for solving the control problem on a network with transition-time functions on edges can be characterized in the same way as the algorithm from Section 1.1.4. We denote by Fx∗0 x (tk ) the minimal integral-time cost of system transitions from starting state x0 to final state x = x∗ (tk ) by using tk units of time, i.e. Fx∗0 x (tk ) where
k−1
=
c(x(tj ),x(tj+1 )) (tj ),
j=0
x0 = x∗ (0), x∗ (t1 ), x∗ (t2 ), . . . , x∗ (tk ) = x
is an optimal trajectory from x0 to xf , where tj+1 = tj + τ(x(tj ),x(tj+1 )) (tj ),
j = 0, 1, 2, . . . , k − 1.
It is easy to observe that the following recursive formula for Fx∗0 x (tk ) holds: Fx∗0 x(tj ) (tj ) =
min
{Fx∗0 x(tj−1 ) (tj−1 ) + c(x(tj−1 ),x(tj )) (tj−1 )},
− x(tj−1 )∈XG (x(tj ))
3.2 The Control Problem on a Network with Transition-Time Functions
where
135
− (tj ) = {x = x(tj−1 ) (x(tj−1 ), x(tj )) ∈ E, XG tj = tj−1 + τ(x(tj−1 ),x(tj )) (tj−1 )}.
This means that if we start with Fx∗0 x(0) (0) = 0, Fx∗0 x(t) (t) = ∞, t = 1, 2, . . . , tk , then on the basis of the recursive formula given above we can find Fx∗0 x(t) (t) for t = 0, 1, 2, . . . , tk for an arbitrary vertex x = x(t). After that the optimal trajectory from x0 = x∗ (0), x∗ (t1 ), x∗ (t2 ), . . . , x∗ (tk ) = xf from x0 to xf can be obtained in the following way: Fix vertex x∗k−1 = x∗ (tk−1 ) for which Fx∗0 x∗ (tk−1 ) (tk−1 ) + c(x(tk−1 ),x∗ (tk )) (tk−1 ) = =
min
{Fx∗0 x(tk−1 ) (tk−1 ) + c(x(tk−1 ),x∗ (tk )) (tk−1 )}.
− x(tk−1 )∈XG (x∗ (tk ))
Then we find vertex x∗ (tk−2 ) for which Fx∗0 x∗ (tk−2 ) (tk−2 ) + c(x(tk−2 ),x∗ (tk−1 )) (tk−2 ) = =
min
− x(tk−2 )∈XG (x∗ (tk−1 ))
{Fx∗0 x(tk−2 ) (tk−2 ) + c(x(tk−2 ),x∗ (tk−1 )) (tk−2 )}.
After that, we fix vertex x∗ (tk−3 ) for which Fx∗0 x∗ (tk−3 ) (tk−3 ) + c(x(tk−3 ),x∗ (tk−2 )) (tk−3 ) = =
min
− x(tk−3 )∈XG (x∗ (tk−2 ))
{Fx∗0 x(tk−3 ) (tk−3 ) + c(x(tk−3 ),x∗ (tk−2 )) (tk−3 )}
and so on. Finally, we find the optimal trajectory x0 = x∗ (0), x∗ (t1 ), x∗ (t2 ), . . . , x∗ (tk ) = xf . This algorithm can also be grounded on the basis of the time-expanded network method. We give the construction which allows us to reduce our problem to an auxiliary one on a time-expanded network. The structure of this time-expanded network corresponds to a directed graph G = (Y, E) without directed cycles. The set of vertices Y of G consists of T + 1 copies of the set of vertices (states) X of the graph G corresponding to time-moments t = 0, 1, 2, . . . , T , i.e. Y = Y 0 ∪ Y 1 ∪ Y 2 ∪ ··· ∪ Y T
(Y t ∩ Y l = ∅, t = l),
where Y t = (X, t). So, Y t = {(x, t) | x ∈ X}, t = 0, 1, 2, . . . , T .
136
3 Extension and Generalization of Discrete Control Problems
We define the set of edges E of graph G as follows: e = ((x, tj ), (z, tj+1 )) ∈ E if and only if in G there exists a directed edge e = (x, y) ∈ E, where x = x(tj ), z = x(tj+1 , tj+1 = tj + τe (tj ). So, in G we connect vertices (x, tj ) and (z, tj+1 ) with a directed edge ((x, tj ), (z, tj+1 )) if (x, z) ∈ E and tj+1 = tj + τe (tj ). Each edge e = ((x, tj ), (z, tj+1 )) ∈ G we associate with the cost ce = cx,z (tj ), i.e. c((x,tj ),(z,tj+1 )) = c(x,z) (tj ). On G we consider the problem of finding a directed path from y0 = (x0 , 0) to yf (xf , T ) with minimum sum of the edges’ costs. Based on the results from Section 3.1.2 we obtain the following result: Lemma 3.7. Let (xj , xj+1 ) = (x(tj ), x(tj+1 )) ,
tj+1 = tj + τ(xj ,xj+1 ) (tj ), j = 0, 1, 2, . . . , k − 1
be a sequence of system’s transactions from state x0 = x(t0 ), t0 = 0, to state xf = xk = x(tk ), tk = T . Then in G = (Y, E) there exists a directed path PG (y0 , yf ) = {y0 = (x0 , 0), (x1 , t1 ), (x2 , t2 ), . . . , (xk , T ) = yf }, from y0 , to yf , where xj = x(tj ),
j = 0, 1, 2, . . . , k
(tk = T ).
So, between the set of vertices {x0 = x(t0 ), x(t1 ), x(t2 ), . . . , x(tk ) = xf } and the set of vertices of a directed path PG (y0 , yf ) there exists a bijective mapping (xj , tj ) ⇔ x(tj ) = xj ,
j = 0, 1, 2, . . . , k
such that xj = x(tj ), j = 0, 1, 2, . . . , k, and k−1 j=0
c(xj ,xj+1 ) (tj ) =
k−1
c((xj ,tj ),(xj+1 ,tj+1 )) ,
j=0
where t0 = 0, tj+1 = tj + τ(xj ,xj+1 ) (tj ), j = 0, 1, 2, . . . , k − 1. This lemma follows as a corollary from Lemma 3.2. The algorithm for solving the control problem on G is similar to Algorithm 3.4 from Section 3.1.2. So, the control problem on G can be solved in the following way: Algorithm 3.8. An Algorithm for Solving the Control Problem on a Network 1. We construct a network consisting of an auxiliary graph G = (Y, E), a cost function c : E → R and given starting and final states y0 = (x0 , 0), yf = (xf , t). ∗ 2. We find in G the directed path PG (y0 , yf ) from y0 to yf with minimal sum of the edges’ costs.
3.2 The Control Problem on a Network with Transition-Time Functions
137
3. We determine vertices xj = x(tj ), j = 0, 1, 2, . . . , k which correspond to ∗ vertices (xj , tj ) of a directed path PG (y0 , yf ) from y0 to yf . Then x0 = x(0), x1 = x(t1 ), x2 = x(t2 ), . . . , xk = x(tk ) = xf represents an optimal trajectory from x0 to xf in the control problem G. Remark 3.9. Algorithm 3.8 can be modified for solving an optimal control problem on a network when the cost functions on the edges e ∈ E depend not only on time t but also depend on the transition-time τe (t) of the system’s passage through edge e = (x(t), x(t + τe (t)). So, Algorithm 3.8 can be used for solving an control problem when for each edge e = (x, z) ∈ E it is given a cost function c(x,z) t, τ(x,z) (t) that depends on time t and on the transition-time τ(x,z) (t). The modification of the algorithm for solving a control problem on a network in such general form can be made in the same way as the modification of Algorithm 3.4 for the problem from Section 3.1.3. This means that the cost functions ce on the edges e = ((x, tj ), (z, tj+1 )) of graph G should be defined as follows: c((x,tj ),(z,tj+1 )) = c(x,z) tj , τ(x,z) (tj ) . Remark 3.10. Algorithm 3.8 can be simplified if we delete from G all vertices y ∈ Y , which are not attainable from y0 and all vertices for which there is no directed path from y to xf . So, the problem may be solved on a simplified 0 0 0 graph G = (Y , E ). Example. Consider the control problem on a network with the structure of a directed graph G = (X, E) given in Fig. 3.2.
1 0
3
2 Fig. 3.2.
The starting state corresponds to x0 = 0, the final state corresponds to xf = 3 and T = 3. The transition-time functions τe (t) and the cost functions ce (t) on the edges e ∈ E are defined in the following way: τ(0,1) (t) = τ(0,2) (t) = τ(1,2) (t) = 1;
τ(2,1) (t) = 2t2 − t;
138
3 Extension and Generalization of Discrete Control Problems
τ(1,3) (t) = τ(2,3) (t) =
2, t ≤ 1 ; 1, t > 1
c(0,1) (t) = 2;
c(0,2) (t) = 3; c(1,2) (t) = t + 1; 6, t ≤ 1 c(2,1) (t) = 2t; c(1,3) (t) = ; c(2,3) (t) = 2t. 2, t > 1 We are seeking for an optimal trajectory on G = (X, E) from starting state x0 = 0 to final state xf = 3 with T2 = T1 = T . The solution of the problem may be obtained by using Algorithm 3.8. 1. We construct an auxiliary network. So, we construct the graph G = (Y, E) (see Fig. 3.3)). This graph contains T + 1 = 3 + 1 = 4 copies of the vertex set X of Graph G, which correspond to the moments in time t = 0, t = 1, t = 2, t = 3. In G we fix the starting vertex (state) y0 = (x0 , 0) = (0, 0) and the final vertex (state) yf = (xf , T ) = (3, 3).
y0 = (0,0)
t =0
t =1
0
0
0
0
1
1
1
1
2
2
2
2
3
3
3
3
t=2
t =3
Fig. 3.3.
At time-moment t = 0 we obtain all leaving edges for the set Y 0 = (X, 0) in the following way: ((0, 0), (1, 1)) ∈ E corresponds to edge (0, 1) ∈ E with τ(0,1) (0) = 1; ((0, 0), (2, 1)) ∈ E corresponds to edge (0, 2) ∈ E with τ(0,2) (0) = 1; ((1, 0), (2, 1)) ∈ E corresponds to edge (1, 2) ∈ E with τ(1,2) (0) = 1; ((1, 0), (3, 2)) ∈ E corresponds to edge (1, 3) ∈ E with τ(1,3) (0) = 2; ((2, 0), (3, 2)) ∈ E corresponds to edge (2, 3) ∈ E with τ(2,3) (0) = 2.
3.2 The Control Problem on a Network with Transition-Time Functions
139
It is easy to check that for the rest of the edges e ∈ E it is valid τe (0) ∈ / [1, 3] and therefore, there are no other leaving edges for Y 0 = (X, 0). In an analogous way we determine all leaving edges for the set Y 1 = (X, 1), which correspond to t = 1: ((0, 1), (1, 2)) ∈ E corresponds to edge (0, 1) ∈ E with τ(0,1) (1) = 1; ((0, 1), (2, 2)) ∈ E corresponds to edge (0, 2) ∈ E with τ(0,2) (1) = 1; ((1, 1), (2, 2)) ∈ E corresponds to edge (0, 2) ∈ E with τ(1,2) (1) = 1; ((1, 1), (1, 3)) ∈ E corresponds to edge (1, 3) ∈ E with τ(1,3) (1) = 2; ((2, 1), (1, 2)) ∈ E corresponds to edge (2, 1) ∈ E with τ(2,1) (0) = 1; ((2, 1), (3, 3)) ∈ E corresponds to edge (2, 3) ∈ E with τ(2,3) (1) = 2. / [1, 2] and therefore, there For the rest of the edges e ∈ E it is valid τe (1) ∈ are no other leaving edges for Y 1 = (X, 1). Finally, we determine all leaving edges for the set Y 2 = (X, 2), which correspond to t = 2. ((0, 2), (1, 3)) ((0, 2), (2, 3)) ((1, 2), (2, 3)) ((1, 2), (3, 3)) ((2, 2), (3, 3))
corresponds corresponds corresponds corresponds corresponds
to to to to to
edge edge edge edge edge
(0,1) (0,2) (1,2) (1,3) (2,3)
with with with with with
τ(0,1) (2) = 1; τ(0,2) (2) = 1; τ(0,2) (2) = 1; τ(1,3) (2) = 1; τ(2,3) (2) = 1.
Each edge ((x, tj )(z, tj+1 )) ∈ E is associated with the cost c((x,tj ),(z,tj+1 )) = c(x,z) (tj ), i.e. c((0,0),(1,1)) = c((0,1),(1,2)) = c((0,2),(1,3)) = 2; c((0,0),(2,1)) = c((0,1),(2,2)) = c((0,2),(2,3)) = 3; c((1,0),(2,1)) = 1;
c((1,1),(2,2)) = 2;
c((1,2),(2,3)) = 3;
c((1,0),(3,2)) = 6;
c((1,1),(3,3)) = 6;
c((1,2),(3,3)) = 2;
c((2,1),(1,2)) = 2;
c((2,0),(3,2)) = 4;
c((2,2),(3,3)) = 4;
c((2,1),(3,3)) = 2.
2. We find a directed path PG (y0 , yf ) from vertex y0 = (0, 0) to vertex yf = (3, 3) with minimal sum of the edges’ costs in G. It is easy to observe that this is the directed path P G (y0 , yf ) = {y0 = (0, 0), (2, 1), (3, 3) = yf }. 3. We determine the vertices x0 = 0, 2, 3 = xf which give an optimal trajectory from starting vertex x0 = 0, to final vertex xf = 3, which use 3 units of times.
140
3 Extension and Generalization of Discrete Control Problems
Taking into account Remark 3.10, our control problem can be solved on a 0 0 0 graph G = (Y , E ), which is obtained from G by deleting vertices which are not attainable from (0,0) and vertices for which there are no directed paths from (x, t) to (3,3). The graph G0 is represented in Fig. 3.4.
t=0
t=1
t=2
t=3
0 2
1
1
2
3
2
2
2
2 2
4
3 Fig. 3.4.
In general, our problem can be solved without constructing the auxiliary network in Fig. 3.4. We obtain such a solution if we tabulate the values Fx∗0 x (tk ) by using the following recursive formula: Fx∗0 x (tk ) =
min
{Fx∗0 x (tk−1 ) + c(x(tk−1 ),x(tk )) (tk−1 )},
x(tk−1 )∈X − (x(tk ))
− where Fx∗0 x0 = 0 and XG (x(tk )) = {x(tk−1 ) ∈ X | (x(tk−1 ), x(tk )) ∈ E}. In these calculations we have to memorize the values Fx∗0 x (tk ) and the corresponding states x∗ (tk−1 ) such that
Fx∗0 x (tk ) = Fx∗0 x∗ (tk−1 ) (tk−1 ) + c(x(tk−1 ),x(tk )) (tk−1 )}.
These dates for our example are given in Table 3.
3.3 Multi-Objective Control of Time-Discrete Systems
141
Table 3 t 0 1 2 3
x Fx∗0 x(0) (0) x∗ (tk−1 ) ∗ Fx0 x(1) (1) x∗ (tk−2 ) Fx∗0 x(2) (2) x∗ (tk−1 ) Fx∗0 x(3) (3) x∗ (tk−1 )
0 0 0∗ ∞ ∞ ∞ -
1 ∞ 2 0 5 2 2 0
2 ∞ 3 0∗ 4 1 5 1
3 ∞ ∞ ∞ 5 2∗
Using this table we find an optimal path starting from final state xf = 3∗ at time moment tk = 3. The previous optimal state is x∗ (tk−1 ) = 2 and we determine tk−1 = tk − τ (2, 3) = 3 − 2 = 1. So, the previous state x∗ (tk−2 ) = 2 at time moment tk−1 = 1. After that we find the state x∗ (tk−2 ) = 0, where tk−2 = tk−1 − τ (1, 3) = 1 − 1 = 0. The optimal path is 0→2→3 ∗ (3) = 5. and the optimal value of the objective function is F03
3.3 Multi-Objective Control of Time-Discrete Systems with Varying Time of States’ Transitions Now we extend the multi-objective control problems from Section 1.1 of Chapter 1 by using hypotheses of varying time of states’ transition for dynamical systems. We see that the main results from Chapter 1 may be developed for multi-objective control problems with varying time of states’ transitions using a modified time-expanded network method from the previous sections. 3.3.1 Multi-Objective Discrete Control with Varying Time of States’ Transitions of Dynamical Systems First of all we formulate a multi-objective control problem using the concept of non-cooperative games. In order to formulate the problem in a general case we introduce the time-transition functions τx(t) = τx(t) u1 (t), u2 (t), . . . , up (t) ,
142
3 Extension and Generalization of Discrete Control Problems
on the set Ut (x(t)) =
Uti (x(t))
i
for an arbitrary state x = x(t) ∈ X at every discrete moment in time t = 0, 1, 2, . . . , where ui (t) ∈ Rmi represents the vector of control parameters of player i ∈ {1, 2, . . . , m}. The dynamics of system L is described by the following system of difference equations: ⎧ tj+1 = tj + τx(tj ) u1 (tj ), u2 (tj ), . . . , up (tj ) ; ⎪ ⎪ ⎪ ⎨ x(tj+1 ) = gtj x(tj ), u1 (tj ), u2 (tj ), . . . , up (tj ) ; (3.4) ⎪ ⎪ ⎪ ui (tj ) ∈ Utij (x(tj )) , i = 1, p; ⎩ j = 0, 1, 2, . . . , where t0 = 0, x(t0 ) = 0
(3.5)
is a starting representation of dynamical system L. The state x(tj+1 ) of system L at time-moment tj+1 is obtained uniquely if in (3.4) state x(tj ) at timemoment tj is known and the players 1, 2, . . . , p independently fix their vectors of control parameters u1 (tj ), u2 (tj ), . . . , up (tj ), respectively, where ui (tj ) ∈ Utij (x(tj )) ,
i = 1, 2, . . . , p; j = 0, 1, 2, . . .
Let be given a control ⎧ u1 (t0 ), u2 (t0 ), . . . , up (t0 ), ⎪ ⎪ ⎪ ⎪ ⎪ u1 (t1 ), u2 (t1 ), . . . , up (t1 ), ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ u1 (t2 ), u2 (t2 ), . . . , up (t2 ), .. ⎪ . ⎪ ⎪ ⎪ 1 2 p ⎪ u (t ), u (t ⎪ j j ), . . . , u (tj ), ⎪ ⎪ ⎪ ⎪ .. ⎩ .
(3.6)
where tj+1 = tj + τx(tj ) u1 (tj ), u2 (tj ), . . . , up (tj ) ,
j = 0, 1, 2, . . . ;
t0 = 0. Then for a given control either an unique trajectory x0 = x(0), x(t1 ), x(t2 ), . . . , x(tk ) = x (T (xf )) = xf from x0 to xf exists and tk = T (xf ) represents the time-moment when the state xf is reached or such a trajectory does not pass through xf .
3.3 Multi-Objective Control of Time-Discrete Systems
143
We denote Fxi 0 xf u1 (t), u2 (t), . . . , up (t) = =
k−1 j=1
citj x(tj ), gtj x(tj ), u1 (tj ), u2 (tj ), . . . , up (tj )
the integral-time cost of the system’s passage from x0 to xf for player i, i ∈ {1, 2, . . . , p} if the vectors u1 (tj ), u2 (tj ), . . . , up (tj ) satisfy the conditions (3.4), (3.5) and generate a trajectory x0 = x(0), x(t1 ), x(t2 ), . . . , x(tk ) = x (T (xf )) = xf from x0 to xf such that
T1 ≤ T (xf ) ≤ T2 ;
otherwise we put Fxi 0 xf u1 (t), u2 (t), . . . , up (t) = ∞. In an analogous way as T2 are given. in Section 1.1.2 we consider that T1 and = citj (x(tj ), x(tj+1 )) Here citj x(tj ), gtj x(tj ), u1 (tj ), u2 (tj ), . . . , up (tj ) represents the cost of the system’s passage from state x(tj ) to state x(tj+1 ) at step j. We consider the problem of finding a vector of control parameters ∗
∗
∗
∗
∗
∗
u1 (t), u2 (t), . . . , ui−1 (t), ui (t), ui+1 (t), . . . , up (t) which satisfies the condition ∗ ∗ ∗ ∗ ∗ Fxi 0 xf u1∗ (t), u2 (t), . . . , ui−1 (t), ui (t), ui+1 (t), . . . , up (t) ≤ ∗ ∗ ∗ ∗ ≤ Fxi 0 xf u1∗ (t), u2 (t), . . . , ui−1 (t), ui (t), ui+1 (t), . . . , up (t) , ∀ u(t) ∈ Rmi ,
t = 0, 1, 2, . . . ; i = 1, 2, . . . p.
So, we are seeking for a Nash equilibrium. In order to formulate the theorem of the existence of a Nash equilibrium for the considered multi-objective control problem we can use the same alternate players’ control condition as in Section 1.3 of Chapter 1. We assume that an arbitrary state x(tj ) ∈ X at time-moment tj represents a position (x, tj ) ∈ X × {0, 1, 2, . . . } for one of the players i ∈ {1, 2, . . . , p} and the passage of dynamical system L from state x(tj ) to next state y = x(tj+1 ) is made by player i. So, the expression gtj (x(tj ), u1 (tj ), u2 (tj ), . . . , ui−1 (tj ), ui (tj ), ui+1 (tj ), . . . , up (tj )) in (3.4) for a given tj only depends on control vector ui (tj ), i.e.
144
3 Extension and Generalization of Discrete Control Problems
gtj (x(tj ), u1 (tj ), u2 (tj ), . . . , ui−1 (tj ), ui (tj ), ui+1 (tj ), . . . , up (tj )) = = g tj (x(tj ), ui (tj )). Moreover, we assume that for a given state x(tj ) at a given moment in time tj the transition-time function τx(tj ) = τx(tj ) (u1 (tj ), u2 (tj ), . . . , ui−1 (tj ), ui (tj ), ui+1 (tj ), . . . , up (tj )) also only depends on the vector of control parameters ui (t), i.e. τx(tj ) (u1 (tj ), u2 (tj ), . . . , ui−1 (tj ), ui (tj ), ui+1 (tj ), . . . , up (tj )) = τ x(tj ) (ui (tj )). The following theorem holds: Theorem 3.11. Let us assume that for a multi-objective control problem with varying time of states’ transitions there exists a trajectory x0 = x(0), x(t1 ), x(t2 ), . . . , x(tk ) = x (T (xf )) = xf from starting state x0 to final state xf , generated by vectors of control pa rameters (3.6), where t0 = 0, tj+1 = tj + τx(tj ) u1 (tj ), u2 (tj ), . . . , up (tj ) , j = 0, 1, 2, . . . , ui (tj ) ∈ Utj (x(tj )) , i = 1, 2, . . . , p and T1 ≤ T (xf ) ≤ T2 . Moreover, we assume that the alternate players’ control condition is satisfied. Then, for this problem there exists an optimal solution in the sense of Nash. Proof. We prove this theorem by using the idea and the constructive scheme of the main theorem from [70] concerning the existence of Nash equilibria for a multi-objective control problem with unit time of states’ transitions. So, we give a construction which allows us to reduce our problem to a problem of finding optimal stationary strategies of the players in a dynamic c-game on network (G, Y1 , Y2 , . . . , Yp , c1 , c2 , . . . , cp , z0 , z f ), where G = (Y ∪{z f }, E) is an acyclic directed graph with sink vertex z f . The set of vertices Y consists of T2 + 1 copies of the set of states X, i.e. Y = X0 ∪ X1 ∪ · · · ∪ XT2 , where Xt = (X, t) is a set of states which correspond to the moment in time t = 0, 1, 2, . . . , T2 (see Fig. 3.5). In G two vertices, (x, tj ) and (y, tj+1 ), are connected with a directed edge ((x, tj ), (y, tj+1 )) ∈ E if for a given position z = (x, tj ) = x(tj ) ∈ Yi there exists a control ui (tj ) ∈ Utij (x(tj )) such that y = x(tj+1 ) = gti (x(tj ), ui (tj )). Additionally, in G for t = T1 , T1 + 1, . . . , T2 we add edges ((x, t), z f ). Each edge ((x, tj ), (y, tj+1 )) ∈ E is associated with p costs
3.3 Multi-Objective Control of Time-Discrete Systems (X,tj)
(X,0)
...
(X,tj+1)
(X,T1)
(X,T1+1)
145
(X,T2)
...
...
...
...
...
...
(x,tj)
(x0 , 0)
... . . .
. . .
... . . .
. . .
...
. . .
. . .
...
zf=(x,T2)
...
(y,tj+1)
. . .
...
. . .
. . .
...
. . .
. . .
...
. . .
zf
...
Fig. 3.5.
ci ((x, tj ), (y, tj+1 )) = citj (x(tj ), y(tj+1 )),
i = 1, p,
and for each edge ((x, t), z f ), t = T1 , T1 + 1, . . . , T2 , we put ci ((x, t), z f ) = 0,
i = 1, p.
It is easy to observe that between the set of feasible trajectories of the dynamical system L from starting state x0 = x(0) to final state xf in a multiobjective control problem and the set of directed paths from (x0 , 0) to z f in G there exists a bijective mapping such that the corresponding costs of the payoff functions of the players in the multi-objective control problem and in the dynamic c-game on the network are the same. Taking into account that for the dynamic c-game on an acyclic network there exists a Nash equilibrium for arbitrary costs on the edges (see [5, 71]) we may conclude that for our multi-objective control problem there exists a Nash equilibrium. Based on the constructive proof of Theorem 1 we can propose the following algorithm for determining a Nash equilibrium in the multi-objective control problem when the alternate players’ control condition holds: 1. Construct an auxiliary network (G, Y1 , Y2 , . . . , Yp , c1 , c2 , . . . , cp , z0 , z f ); 2. Find optimal stationary strategies of the players in a dynamic c-game on the auxiliary network; 3. Find an optimal solution in the sense of Nash for the multi-objective control problem using the one to one correspondence between the set of strategies of the players in the dynamic c-game on G and the set of feasible control of the players in the multi-objective control problem.
146
3 Extension and Generalization of Discrete Control Problems
The problem of determining a Stackelberg solution in the case of varying the time of states’ transition of the dynamical system can also be formulated and solved in an analogous way as the problem from Section 1.14. 3.3.2 A Dynamic c-Game on Networks with Transition-Time Functions on the Edges In the following we formulate the dynamic c-game in the general form, i.e. we consider the c-game on a network which may contain cycles and transitiontime functions on the edges which depend on time. Let G = (X, E) be the graph of states’ transitions for dynamical system L with a finite set of states X. So, an arbitrary edge e = (x, y) ∈ E means the possibility of system L to pass from state x = x(t) to state y ∈ XG (x) at every moment in time t = 0, 1, 2, . . . . Two states x0 , xf are given in X, where x0 = x(0) is starting state of dynamical system L and xf is final state of L. The final state xf should be reached at time moment T (xf ) such that T1 ≤ T (xf ) ≤ T2 , where T1 and T2 are known. Assume that system L is controlled by p players and the control on G is made in the following way: The vertex set X is divided into p subsets X = X 1 ∪ X2 ∪ · · · ∪ Xp
(Xi ∩ Xj = ∅, i = j),
where vertices x ∈ Xi are regarded as positions of the player i ∈ {1, 2, . . . , p}. The control starts at position x0 = x(t0 ), where t0 = 0. If x(t0 ) ∈ Xi1 , then player i1 transfers system L from state x0 to state x1 = x(t1 ), where t1 = t0 + τe0 (t0 ), e0 = (x0 , x1 ) ∈ E. If x(t1 ) ∈ Xi2 , then player i2 transfers system L from state x1 to state x2 = x(t2 ), where t2 = t1 + τe1 (t1 ), e1 = (x1 , x2 ) ∈ E and so on. If at time moment tk the final state xf is reached, i.e. x(tk ) = xf , then STOP. After that, the players calculate their integral-time costs ci(x(tj ),x(tj+1 )) (tj ), i = 1, p, Hxi 0 xf = if T1 ≤ tk ≤ T2 ; otherwise put Hxi 0 xf = ∞. Here cie (tj ) represents the cost function on the edge e = (x, y) ∈ E, which expresses the cost of the system’s passage from state x = x(tj ) to state y = x(tj + τe (tj )). In the control process the players intend to minimize their integral-time costs by a trajectory x(t0 ), x(t1 ), . . . , x(tk ). In this dynamic game we assume that the players use only stationary strategies. We define stationary strategies of the players 1, 2, . . . , p as maps: s1 : x → y ∈ X(x) for x ∈ X1 \ {xf }; s2 : x → y ∈ X(x) for x ∈ X2 \ {xf }; .. . sp : x → y ∈ X(x) for x ∈ Xp \ {xf }.
3.3 Multi-Objective Control of Time-Discrete Systems
147
Let s1 , s2 , . . . , sp be an arbitrary set of strategies of the players and Gs = (X, E) represents the subgraph of G generated by edges e = (x, si (x)) for x ∈ X \ {xf }, i = 1, p. Then for fixed s1 , s2 , . . . , sp either a unique directed path Ps (x0 , xf ) from x0 to xf exists in Gs or such a path does not exist in Gs . For fixed strategies s1 , s2 , . . . , sp and given x0 and xf we define the quantities Hx10 xf (s1 , s2 , . . . , sp ), Hx20 xf (s1 , s2 , . . . , sp ), . . . , Hxp0 xf (s1 , s2 , . . . , sp ) in the following way: Let us assume that the path Ps (x0 , xf ) exists in G and we assign to its edges numbers 0, 1, 2, . . . , ks starting with the edge that begins in x0 . Then we can calculate the time tek = tek (s1 , s2 , . . . , sp ), where te0 = 0, tej = tej−1 + τej (tej−1 ), j = 1, ks . We put Hxi 0 xf (s1 , s2 , . . . , sp ) =
ks
cej (tek (s1 , s2 , . . . , sp )),
j=0
if T1 ≤ |E(Ps (x0 , xf ))| ≤ T2 ; otherwise we put Hxi 0 xf (s1 , s2 , . . . , sp ) = ∞. We consider the problem of finding maps s∗1 , s∗2 , . . . , s∗p for which the following conditions are satisfied: Hxi 0 xf (s∗1 , s∗2 , . . . , s∗i−1 , s∗i , s∗i+1 , . . . , s∗p ) ≤ ≤ Hxi 0 xf (s∗1 , s∗2 , . . . , s∗i−1 , si , s∗i+1 , . . . , s∗p ),
∀si ∈ Si , i = 1, p.
The network for the dynamic c-game with transition time functions we denote by (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), τ (t), x0 , xf , T1 , T2 ), where τ (t) = (τe1 (t), τe2 (t), . . . , τe|E| (t)). In an analogous way as for the stationary dynamic c-game from Section 1.5 here T1 and T2 satisfy the conditions: 0 ≤ T1 ≤ |X|−1, T1 ≤ T2 . If T1 = 0, T2 = ∞, then we will use the notation (G, X1 , X2 , . . . , Xp , , c1 (t), c2 (t), . . . , cp (t), τ (t), x0 , xf ). The following theorem holds: Theorem 3.12. Let (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), . . . , cp (t), τ (t), x0 , xf , T1 , , T2 ) be a dynamic network for which in G there exists a directed path Ps (x0 , xf ) with integral time T (xf ) such that T1 ≤ T (xf ) ≤ T2 (0 ≤ T1 ≤ |X| − 1, T1 ≤ T2 ). Additionally, vectors ci = (cie1 , cie2 , . . . , cie|E| ), i ∈ {1, 2, . . . , p}, and τ = (τe1 , τe2 , . . . , τe|E| ) have positive and constant components. Then in the dynamic c-game on network (G, X1 , X2 , . . . , Xp , c1 (t), c2 (t), , . . . , cp (t), τ (t), x0 , xf , T1 , T2 ) there exists an optimal solution in the sense of Nash. This theorem in the case of integer functions τe follows as corollary of Theorem 1.28. Indeed, the dynamic c-game on network (G, X1 , X2 , . . . , Xp ,
148
3 Extension and Generalization of Discrete Control Problems
c1 (t), c 2 (t), . . . , c p (t), τ (t), x 0 , xf , T 1 , T 2 ) can be reduced to an auxiliary dynamic c-game with unit transition time functions on the edges on an auxiliary network (G, X 1 , X 2 , . . . , X p , c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ), where the graph G is obtained from G if each directed edge e = (x, y) ∈ E is changed by a sequence of |τe | edges e1 = (x, x1 ), e2 = (x1 , x2 ), . . . , e|τe | = (x|τe | , y) with the costs c(x,x1 ) = ce ,
c(x1 ,x2 ) = c(x2 ,x3 ) = · · · = c(x|τe | ,y) = 0.
We define a partition X 1 , X 2 , . . . , X p in G in the following way: The corresponding sets X1 , X2 , . . . , Xp are associated with X 1 , X 2 , . . . , X p respectively and all new vertices xi , i = 1, |τe | are associated with X 1 . According to Theorem 1.28 in the dynamic c-game on (G, X 1 , X 2 , . . . , X p , , c1 (t), c2 (t), . . . , cp (t), x0 , xf , T1 , T2 ) there exists a Nash equilibrium if T1 ≤ |E(Ps (x0 , xf ))| ≤ T2 . Taking into account that between the set of strategies in the auxiliary dynamic c-game and the set of strategies in the initial dynamic c-game there exists a bijective mapping which preserves the integral-time costs of strategies, we obtain the proof of Theorem 3.12. So, for finding optimal stationary strategies in the dynamic c-game with transition-time functions on the edges the algorithms from Chapter 1 on an auxiliary network can be used. For the dynamic c-game with transition-time functions we can define nonstationary strategies of the players also as maps: u1 : (x, t) → (y, t + τ(x,y) (t)) for x ∈ X1 \ {xf } u2 : (x, t) → (y, t + τ(x,y) (t)) for x ∈ X2 \ {xf } .. . up : (x, t) → (y, t + τ(x,y) (t))
for
x ∈ Xp \ {xf }
where y ∈ X(x) and t = 0, 1, 2, . . . . Using the same concept from Chapter 1 for an arbitrary set of nonstationary strategies u1 , u2 , . . . , up , which generates a trajectory x(t0 ), x(t1 ), x(t2 ), . . . , x(tk ), . . . , we can define the integral-time costs Fxi 0 xf (u1 , u2 , . . . , up )
tk−1
=
j=0
ci(x(tj ),x(tj+1 )) (tj ),
i = 1, p
if T1 ≤ tk ≤ T2 ; otherwise we put Fxi 0 xf (u1 , u2 , . . . , up ) = ∞. Here t0 = 0, tj+1 = tj +τ(x(tj ),x(tj+1 )) (tj ), tk = T (xf ) and x(tj+1 ) = ui (x(tj )) if x(tj ) ∈ Xi , i = 1, p. The same technique of the time-expanded network from the Sections 1.9.1 and 3.1.2 can be developed for reducing the non-stationary problem to the stationary case with constant costs and constant transition times on the edges in the acyclic auxiliary network.
3.3 Multi-Objective Control of Time-Discrete Systems
149
3.3.3 Remark on Determining Pareto Optima for the Multi-Objective Control Problem with Varying Time of States’ Transitions The concept of cooperative games for the multi-objective control problem with varying time of states’ transition of the dynamical system can be used in an analogous way as in Section 1.1.4. Here we should take into account that the dynamics of the system is described by the following system of difference equations: tj+1 = tj + τx(tj ) (u(t)) , x(tj+1 ) = gtj (x(tj ), u(tj )) , u(tj ) ∈ Ut (u(tj )) , j = 0, 1, 2, . . . where x0 = 0,
x(t0 ) = x0
is a starting representation of dynamical system L. In the control process p players participate, which coordinate their actions by using a common vector of control parameters u(t) ∈ Rm . Let u(tj ), j = 0, 1, 2, . . . , be a cooperative control of the dynamical system, which generates a trajectory x(t0 ), x(t1 ), x(t2 ), . . . , where t0 = 0, tj+1 = tj + τx(tj ) (u(tj )) , j = 0, 1, 2, . . . We define Fxi 0 ,xf (u(t)) =
k−1 j=0
citj x(tj ), gtj (x(tj ), u(tj )) ,
i = 0, 1, 2, . . . , p
the integral-time cost of the system’s passage from x0 = x(0) to xf = x(tk ) = x (T (xf )) if T1 ≤ T (xf ) ≤ T2 ; otherwise we put Fxi 0 ,xf (u(t)) = ∞. A Pareto solution for this cooperative game can be obtained on the basis of the modified time-expanded network method by using algorithms from Chapter 1. The auxiliary network for the problem in this case is obtained in an analogous way as in Section 3.3.1. Here we should not take into account the partition X = X1 ∪ X2 ∪ · · · ∪ Xp , i.e. graph G is obtained in the same way, and two vertices (x, tj ), (y, tj+1 ) are connected if there exists a control u(t)
150
3 Extension and Generalization of Discrete Control Problems
such that y = gt (x(t), u(t)). Additionally, in G we add also an edge ((x, t), z) for t = T1 , T1 + 1, . . . , T2 if for a given state x = x(t) there exists a control u(t) such that xf = gt (x(t), u(t)).
3.4 An Algorithm for Solving the Discrete Optimal Control Problem with Infinite Time Horizon and Varying Time of the States’ Transitions In this section we formulate and study the discrete optimal control problem with infinite time horizon and varying time of states’ transitions of the dynamical system. We show that this problem can be reduced to the problem of determining the optimal mean cost cycle in the directed graph of states’ transitions of the dynamical system where with each edge the cost and the transition time of the system’s passage through the edge are associated. Polynomial-time algorithms for solving the considered problems are described. 3.4.1 Problem Formulation and Some Preliminary Results Consider the dynamical system L with a finite set of states X ⊆ Rn , where at every discrete moment in time t = 0, 1, 2, . . . the state of L is x(t) ∈ X. Assume that the control of system L at each time-moment t = 0, 1, 2, . . . for an arbitrary state x(t) is made by using the vector of control parameters u(t) ∈ Rm for which a feasible set Ut (x(t)) is given, i.e. u(t) ∈ Ut (x(t)). For arbitrary t and x(t) on Ut (x(t)) it is defined an integer function τx(t) : Ut (x(t)) → N which gives to each control u(t) ∈ Ut (x(t)) an integer value τx(t) (u(t)). This value expresses the time of the system’s passage from state x(t) to state x t + τx(t) (u(t)) if the control u(t) ∈ Ut (x(t)) has been applied at moment t for a given state x(t). The dynamics of system L is described by the following system of difference equations: ⎧ tj+1 = tj + τx(tj ) (u(tj )) ; ⎪ ⎪ ⎪ ⎨ x(tj+1 ) = gtj (x(tj ), u(tj )) ; ⎪ ⎪ ⎪ u(tj ) ∈ Utj (x(tj )) ; ⎩ j = 0, 1, 2, . . . , where x(t0 ) = 0,
t0 = 0
3.4 An Algorithm for Solving the Discrete Optimal Control Problem
151
is a given starting representation of dynamical system L. Here we suppose that the functions gt and τx(t) are known and tj+1 and x(tj+1 ) are determined uniquely by x(tj ) and u(tj ) at every step j. Let u(tj ), j = 0, 1, 2, . . . , be a control, which generates a trajectory x(0), x(t1 ), x(t2 ), . . . , x(tk ), . . . . For this control we define the mean integral-time cost by a trajectory k−1
Fx0 (u(t)) = lim
k→∞
ctj x(tj ), gtj (x(tj ), u(tj ))
j=0 k−1
τx(tj ) (u(tj ))
j=0
where ctj x(tj ), gtj (x(tj ), u(tj )) = ctj (x(tj ), x(tj+1 )) represents the cost of system L to pass from state x(tj ) to state x(tj+1 ) at stage [j, j + 1]. We consider the problem of finding time-moments t = 0, t1 , t2 , . . . , tk−1 , . . . and vectors of control parameters u(0), u(t1 ), u(t2 ), . . . , u(tk−1 ), . . . which satisfy the conditions mentioned above, and minimize functional Fx0 (u(t)). In case τx(t) ≡ 1 for every t and x(t) this problem becomes a control problem with unit time of states’ transitions from [3, 4, 55]. In [43, 55, 98] the problem of determining the stationary control with unit time of states’ transitions has been studied. In the papers mentioned, it is assumed that Ut (x(t)), gt and ct do not depend on t, i.e. gt = g, ct = c and Ut (x) = U (x) for t = 0, 1, 2, . . . . R. Bellman showed in [3] that for the stationary case of the problem with unit time of states’ transitions there exists an optimal stationary control u∗ (0), u∗ (1), u∗ (2), . . . , u∗ (t), . . . , such that k−1 1 c x(t), g (x(t), u∗ (t)) = k→∞ k t=0
lim
= inf lim
u(t) k→∞
k−1 1 c x(t), g (x(t), u(t)) = λ < ∞. k t=0
Furthermore in [55, 98] it is shown that the stationary case of the problem can be reduced to the problem of finding the optimal mean cost cycle in a graph of states’ transitions of the dynamical system. Based on these results in [43, 55, 98] polynomial-time algorithms for finding optimal stationary control are proposed. Here we develop the results mentioned above for the general stationary case of the problem with arbitrary transition-time functions τx . We show that this problem can be formulated as a problem of determining optimal mean cost cycles in the graph of states’ transitions of the dynamical system for an arbitrary transition-time function on the edges.
152
3 Extension and Generalization of Discrete Control Problems
3.4.2 An Algorithm for Determining an Optimal Stationary Control for Dynamical Systems with Infinite Time Horizon We consider the stationary case of the control problem from Section 3.4.1 where gt , ct , Ut (x(t)), τx(t) do not depend on time and the control u(t) for an arbitrary state x ∈ X is the same at different moments in time t = 0, 1, 2, . . . . So, gt = g, ct = c, Ut (x) = U (x), τx(t) = τ for t = 0, 1, 2, . . . and the control only depends on the state. In this case it is convenient for us to study the problem on a directed graph G = (X, E) of states’ transitions of the dynamical system L. An arbitrary vertex x of G corresponds to a state x ∈ X and an arbitrary directed edge e = (x, y) ∈ E expresses the possibility of system L to pass from state x(t) to state x(t + τ (e)), where τ (e) is the time of the system’s passage from state x to state y through edge e = (x, y). So, on the edge set E it is defined a function τ : E → R+ which gives to each edge a positive number τ (e) which means that if system L at the moment in time t has state x = x(t) then the system can reach state y at the moment in time t + τ (e) if it passes through the edge e = (x, y), i.e. y = x(t + τ (e)). Additionally, on the edge set E it is defined a cost function c : E → R, which gives to each edge a cost c(e) of the system’s passage from state x = x(t) to state y = x(t + τ (e)) for an arbitrary discrete moment in time t. So, finally we have that with each edge two numbers, c(e) and τ (e), are associated. In G an arbitrary edge e = (x, y) corresponds to a control in our problem and the set of edges E(x) = {e = (x, y) | (x, y) ∈ E}, originated in vertex x corresponds to the feasible set U (x) of the vectors of control parameters at state x. The transition time function τ in G is induced by transition time functions τx for the stationary case of our control problem. It is easy to observe that the problem of determining the optimal stationary control of a time-discrete system L with infinite time horizon and varying time of states’ transitions can be regarded as the problem of finding in G the minimal mean cost cycle C ∗ which can be reached from vertex x0 (vertex x0 corresponds to the starting state x0 = x(0) of the dynamical system L). Indeed, a stationary control in G means fixing for every x ∈ X the system’s passage through an edge e = (x, y) to a neighbor vertex y. Such a strategy of system’s passage in G generates a trajectory which leads to a directed cycle C with the set of edges E(C). Therefore, our stationary case of the problem is reduced to the problem of finding the minimal mean cost cycle which can be reached from x0 , where with each directed edge e = (x, y) ∈ E there are associated the cost c(e) and the transition time τ (e) of the system’s passage from state x = x(t) to state y = x(t + τ (e)). If the minimal mean cost cycle C ∗ in G is known then the stationary optimal control for our problem can be found in the following way: In G we fix an arbitrary simple directed path P (x0 , xk ) with the set of edges E(P (x0 , xk )) which connect the vertex x0 with the cycle C ∗ . After that, for an arbitrary state x ∈ X we choose a stationary control which corresponds to a unique
3.4 An Algorithm for Solving the Discrete Optimal Control Problem
153
directed edge e = (x, y) ∈ E(P (x0 , xk ))∪E(C ∗ ). For such a stationary control the following equality holds: k−1
inf lim
c x(tj ), g (x(tj ), u(tj ))
j=0 k−1
u(t) k→∞
= τx (u(tj ))
c(e)
e∈E(C ∗ )
. τ (e)
e∈E(C ∗ )
j=0
Note that the condition U (x) = ∅, ∀x ∈ X, for the stationary case of the control problem means that in G each vertex x contains a leaving directed edge e = (x, y). We consider that in G every vertex x ∈ X is attainable from x0 ; otherwise we can delete vertices from X for which there are no directed paths P (x0 , x) from x0 to x. Moreover, without loss of generality, we may consider that G is a strongly connected graph. Then the problem of finding a stationary control for the problem from Section 3.4.1 can be formulated as the following optimization problem on G: Find a directed cycle C ∗ such that c(e) c(e) e∈E(C ∗ )
e∈E(C)
τ (e)
= min
e∈E(C ∗ )
{C}
. τ (e)
e∈E(C)
In the following, we show that this problem can be reduced to the following linear fractional problem: Minimize c(e)α(e) e∈E
z=
(3.7) τ (e)α(e)
e∈E
subject to ⎧ ⎪ α(e) − α(e) = 0, ⎪ ⎪ ⎪ ⎪ e∈E + (x) e∈E − (x) ⎪ ⎨ α(e) = 1; ⎪ ⎪ ⎪ e∈E ⎪ ⎪ ⎪ ⎩ α(e) ≥ 0, e ∈ E,
∀x ∈ X; (3.8)
where E − (x) = {e = (y, x) ∈ E | y ∈ X}; E + (x) = {e = (x, y) ∈ E | y ∈ X}. In order to prove this fact we should use the results from [55]. Let α = (α(e1 ), α(e2 ), . . . , α(e|E| )) be an arbitrary feasible solution of system (3.8) and denote by Gα = (Xα , Eα ) the subgraph of G generated by the set of edges Eα = {e ∈ E | α(e) > 0}. In [55] it is shown that an arbitrary extreme point α0 = (α0 (e1 ), α0 (e2 ), . . . , α0 (e|E| )) of the set of solutions of system
154
3 Extension and Generalization of Discrete Control Problems
(3.8) corresponds to a subgraph Gα0 = (Xα0 , Eα0 ) which has the structure of an elementary directed cycle. Taking into account that for problem (3.7), (3.8) there exists an optimal solution α∗ = (α∗ (e1 ), α∗ (e2 ), . . . , α∗ (e|E| )) which corresponds to an extreme point of the set of solutions (3.8) we obtain that c(e)α∗ (e) e∈E
max z = α
∗
τ (e)α∗ (e)
.
e∈Eα∗
and the set of edges Eα∗ generates a directed cycle Gα∗ for which α∗ (e) = 1 ∗ |Eα∗ | , ∀e ∈ Eα . Therefore,
c(e)
e∈Eα∗
max z =
. τ (e)
e∈Eα∗
So, the optimal solutions of problem (3.7), (3.8) correspond to the minimal mean cost cycle in the directed graph of states’ transitions of the dynamical system. This means that the fractional linear programming problem (3.7), (3.8) can be used for determining an optimal stationary control for our problem.
3.5 A General Approach for Algorithmic Solutions of Discrete Optimal Control Problems and its Game-Theoretic Extension In this section we study the control models for which the objective function is defined algorithmically [68]. We show that such an aspect of the control models allows us to extend the algorithms from Sections 1.1-1.8 for a more large class of problems. 3.5.1 A General Optimal Control Model Let L be a dynamical system with a set of states X ⊆ Rn where at every 1, 2, . . . the state of L is x(t) ∈ X, x(t) = moment in time t = 0, x1 (t), x2 (t), . . . , xn (t) ∈ Rn . The dynamics of system L is described as follows: (3.9) x(t + 1) = gt x(t), u(t) , t = 0, 1, 2, . . . , where x(0) = x0
(3.10)
3.5 A General Approach for Algorithmic Solutions
155
is the starting point of system L and u(t) = u1 (t), u2 (t), . . . , um (t) ∈ Rm represents the vector of control parameters. For vectors of control parameters u(t), t = 0, 1, 2, . . . the admissible sets Ut x(t) are given, i. e. u(t) ∈ Ut x(t) , t = 0, 1, 2, . . . . (3.11) We assume that the vector functions gt x(t), u(t) = gt1 x(t), u(t) , gt2 x(t), u(t) , . . . , gtn x(t), u(t) are determined uniquely by x(t) and u(t) at every moment in time t = 0, 1, 2, . . . . So, x(t + 1) is determined uniquely by x(t) and u(t). Let x(0), x(1), . . . , x(t), . . . (3.12) be a process, generated according to (3.9)-(3.11) with given vectors of control parameters u(t), t = 0, 1, 2, . . . For each state x(t), t = 0, 1, 2, . . . of process (3.12) we define a numerical determination Ft x(t) by using the following recursive formula: Ft+1 x(t + 1) = ft x(t), u(t), Ft x(t) , t = 0, 1, 2 . . . where
F0 x(0) = F0
is a given representation of the starting state x(0) of system L; ft (·, ·, ·), t = 0, 1, 2, . . . are arbitrary functions. In this model Ft (x(t)) expresses the cost of the system’s passage from x0 to x(t). In the following, we distinguish two optimization problems: Problem 3.13. For a given T determine vectors of control parameters u(0), u(1), . . . , u(T − 1), which satisfy the conditions x(t + 1) = gt x(t), u(t) , t = 0, 1, 2, . . . , T − 1; x(0) = x0 , x(T ) = xf , (3.13) u(t) ∈ Ut x(t) , t = 0, 1, 2, . . . , T − 1; Ft+1 x(t + 1) = ft x(t), u(t), Ft x(t) , t = 0, 1, 2, . . . , T − 1; F0 x(0) = F0 and minimize the objective function Ix0 x(T ) u(t) = FT x(T )
(3.14)
Problem 3.14. For given T1 and T2 determine T ∈ [T1 , T2 ] and a control sequence u(0), u(1), . . . , u(T − 1), which satisfy condition (3.13) and minimize objective function (3.14).
156
3 Extension and Generalization of Discrete Control Problems
Remark 3.15. It is obvious that an optimal solution of Problem 3.14 can be obtained by reducing it to Problem 3.13, fixing the parameter T = T1 , T = T1 + 1, . . . , T = T2 . By choosing the optimal value of the solutions of the problems of type 3.13 with T = T1 , T = T1 + 1, . . . , T = T2 we obtain the solution of Problem 3.14 with T ∈ [T1 , T2 ]. It is easy to observe that a large class of dynamic optimization problems can be represented as a problem mentioned above. For example, if ft x(t), u(t), Ft x(t) = Ft x(t) + ct x(t), u(t) , where F0 (x0 ) = 0 and ct x(t), u(t) represents the cost of the system’s passage from state x(t) to state x(t + 1), then we obtain discrete control problems with integral-time which are introduced and treated in [4, 6, 37, 64, 67, 71, 72]. Some classes of control problems from [4, 6] may be obtained if ft x(t), u(t), Ft x(t) = Ft x(t) · ct x(t), u(t) , t = 1, 2, . . . ,
where F0 (x0 ) = 1 or
ft x(t), u(t), Ft x(t) = max{Ft x(t) , ct x(t), u(t) },
where F0 (x0 ) = 0. We propose a general scheme based on dynamic programming for solving these problems. 3.5.2 An Algorithm for Determining an Optimal Solution of the Problem with Fixed Starting and Final States We propose a general procedure for determining optimal solutions of the formulated problems in the case that ft (x, u, F ), t = 0, 1, 2, . . . , are nondecreasing functions with respect to the third argument, i.e. with respect to F . So, we shall consider that for fixed x and u the functions ft (x, u, F ), t = 0, 1, 2, . . . satisfy the condition ft (x, u, F ) ≤ ft (x, u, F )
if F ≤ F .
(3.15)
Then the following algorithm determines an optimal solution of Problem 3.13:
3.5 A General Approach for Algorithmic Solutions
157
Algorithm 3.16. Determining the Solution of the General Optimal Control Problem 1. Set F0∗ x(0) = F0 ; Ft∗ x(t) = ∞; x(t) ∈ X, t = 1, 2, . . . ; X0 = {x0 }. 2. For t = 0, 1, 2, . . . , T − 1 determine: Xt+1 = {x(t + 1) ∈ X | x(t + 1) = gt x(t), u(t) , , x(t) ∈ Xt , u(t) ∈ Ut x(t) } and for every x(t + 1) ∈ Xt+1 determine ∗ x(t + 1) = min ft x(t), u(t), Ft∗ x(t) | x(t + 1) = gt (x(t), u(t)), Ft+1
, x(t) ∈ Xt , u(t) ∈ Ut x(t) ; 3. Find a sequence xT = x∗ (T ), x∗ (T − 1), x∗ (T − 2), . . . , x∗ (1), x∗ (0) = x0 and
u∗ (T − 1), u∗ (T − 2), . . . , u∗ (1), u∗ (0),
which satisfy the conditions FT∗ −τ x∗ (T − 1) = fT −τ −1 x∗ (T − τ − 1), u∗ (T − τ − 1), , FT∗ −τ −1 x(T − τ − 1) ,
τ = 0, 1, 2, . . . , T.
Then u∗ (0), u∗ (1), u∗ (2), . . . , u∗ (T − 1) represents the optimal solution of Problem 3.13. Theorem 3.17. If ft (x, u, F ), t = 0, 1, 2, . . . , T are non-decreasing functions with respect to the third argument of F , i.e. the functions ft (x, u, F ), t = 0, 1, 2, . . . , T satisfy condition (3.15), then the algorithm determines the optimal solution of Problem 3.13. Moreover, an arbitrary leading part x∗ (0),x∗ (1), . . . , x∗ (k) of an optimal trajectory x∗ (0), x∗ (1), . . . , x∗ (k), . . . , x∗ (T ) is again an optimal one. Proof. We prove this theorem by using the induction principle on the number of stages T . In case T ≤ 1 the theorem is evident. We consider that the theorem holds for T ≤ k and prove it for T = k + 1. Assume via contradiction that u∗ (0), u∗ (1), . . . , u∗ (T − 2), u∗ (T − 1) is not an optimal solution of Problem 3.13 and u (0), u (1), . . . , u (T −2), u (T −1) is an optimal solution of Problem 3.13, which differs from u∗ (0), u∗ (1), . . . , u∗ (T − 2), u∗ (T − 1).
158
3 Extension and Generalization of Discrete Control Problems
Then u (0), u (1), . . . , u (T − 2), u (T − 1) generates a trajectory x0 = x (0), x (1), . . . , x (T ) = xT with corresponding numerical evaluations of the states x (t + 1) = ft x (t), u (t), Ft x (t) , t = 0, 1, 2 . . . , T − 1; Ft+1
where F0 x (0) = F0 and FT x (T ) < FT∗ x (T ) ,
(3.16)
because x (T ) = x∗ (T ). According to the induction principle for Problem 3.13 with T − 1 stages the algorithm finds the optimal solution. So, for arbitrary x(T −1) ∈ X we obtain the optimal evaluations FT∗ −1 x(T −1) for x(T −1) ∈ X. Therefore, FT∗ −1 x (T − 1) ≤ FT −1 x (T − 1) . According to the algorithm fT −1 x∗ (T − 1), u∗ (T − 1), FT∗ −1 x∗ (T − 1) ≤ ≤ fT −1 x (T − 1), u (T − 1), FT∗ −1 x (T − 1) .
(3.17)
Since ft (F, x, u), t = 0, 1, 2, . . . are non-decreasing functions with respect to F then fT −1 x (T − 1), u (T − 1), FT∗ −1 x (T − 1) ≤ (3.18) ≤ fT −1 x (T − 1), u (T − 1), FT −1 x (T − 1) . Using (3.17) and (3.18) we obtain FT∗ x(T ) = fT −1 x∗ (T − 1), u∗ (T − 1), FT∗ −1 x∗ (T − 1) ≤ fT −1 x (T − 1), u (T − 1), FT∗ −1 x (T − 1) ≤ fT −1 x (T − 1), u (T − 1), FT −1 x (T − 1) = FT x(T ) , i.e.
FT∗ x(T ) ≤ FT x(T ) ,
which is contrary to (3.16). So the algorithm finds the optimal solution of Problem 3.13 with T = k + 1.
3.5 A General Approach for Algorithmic Solutions
159
Theorem 3.18. Let X and Ut (x), x ∈ X, t = 0, 1, 2, . . . , T − 1, be finite sets. Then, the algorithm uses at most M · |X| · T elementary operations (excluding the operations for calculating values of the functions ft (F, x, u) for given F, x and u), where M=
max
x∈X, t=0,1,2,...,T −1
|Ut (x)|.
Proof. It is sufficient to prove that at step t the algorithm uses no more than finding the value Ft+1 x(t + M · |X| elementary operations. Indeed, for 1) for x(t + 1) ∈ X it is necessary to use x∈X |Ut (x)| operations. Since |U (x)| ≤ |X| · M then at step t the algorithm uses no more than t x∈X |X| · M elementary operations. So, in general the algorithm uses no more than M · |X| · T elementary operations. 3.5.3 The Discrete Optimal Control Problem on a Network Let L be a dynamical system with a finite set of states X, and at every discrete moment in time t = 0, 1, 2, . . . the state of system L is x(t) ∈ X. Note that here we associate x(t) with an abstract element (in Sections 3.5.1 and 3.5.2 x(t) represents a vector from Rn ). Two states x0 and xf are chosen in X, where x0 is a starting state of the system L, x0 = x(0), and xf is the final state of the system, i.e. xf is the state in which the system must be brought. We consider the optimal control problem, when the dynamics of the system is described by a directed graph of transitions G = (X, E) with given costs ce (t) on the edges e ∈ E, i.e. we consider the control problem from Section 1.4.1. So, we are seeking for a sequence of system’s passages (x(0), x(1)), (x(1), x(2)), . . . , (x(T − 1), x(T )) ∈ E (which transfers the system L from state x0 = x(0) to state xf = x(T ) with minimal integral-time cost) by a trajectory x0 = x(0), x(1), x(2), . . . , x(T ) = xf . We will discuss two variants of this problem: 1) The number of the stages (time T ) is fixed; 2) T is unknown and it must be determined. It is easy to observe that for solving these problems we can use the algorithm from Section 3.5.2. We put F0 x(0) = 0 and Ft+1 x(t + 1) = Ft x(t) + c
x(t),x(t+1)
(t)
for x(t), x(t + 1) ∈ E.
So, we obtain an algorithm which is based on dynamic programming techniques. The running time for solving this problem in case 1 by using Algorithm 3.16 is O(n2 T ).
160
3 Extension and Generalization of Discrete Control Problems
A more general optimal control model on the network is obtained if with each edge e ∈ E at a given moment in time t we associate a function fet (x(t), Ft (x(t))) which depends on state x(t) and on the numerical evaluation Ft (x(t)) of this state. Here, fet (x(t), Ft (x(t))) has the same meaning as ft (x(t), u(t), Ft (x(t))) in the previous model, where u(t) = et , i. e. fet (x(t), Ft (x(t))) = ft (x(t), u(t), Ft (x(t))). For a given trajectory of system passages x(0), x(1), . . . , x(t), x(t + 1) the following recursive formula Ft+1 (x(t + 1)) = fet (x(t), Ft (x(t))),
t = 0, 1, 2, . . .
for determining numerical evaluations of the states are given, where F0 (x(0)) = F0 is considered to be known. In this model we seek for a trajectory x(0), x(1), . . . , x(T − 1), x(T ) = xf which transfers system L from starting state x0 to final state xf with minimal FT (xf ). If fet (x(t), Ft (x(t))), ∀e ∈ E, t = 1, 2, . . . , are increasing functions then the control problem on the network can be solved by using Algorithm 3.16. 3.5.4 The Game-Theoretic Control Model with p Players Now we extend the control model using the concept of non-cooperative games. We assume that the dynamics of system L is controlled by p players: x(t + 1) = gt x(t), u1 (t), u2 (t), . . . , up (t) , t = 0, 1, 2, . . . , (3.19) where x(0) = x0 is a given starting point of L and ui (t) is a vector of control parameters of player i. For each player i ∈ {1, 2, . . . , p} the admissible sets Uti (x(t)), t = 1, 2, . . . , for the vectors of control parameters ui (t), are given. Additionally, a numerical determination Fti (x(t)) of state x(t) at time moment t for player i is defined according to the following recursive formula: i (x(t + 1)) = fti (x(t), u1 (t), u2 (t), . . . , up (t), Fti (x(t))), Ft+1
t = 0, 1, 2, . . . ,
(3.20)
3.5 A General Approach for Algorithmic Solutions
where
F0i (x(0)) = F0i
161
(3.21)
are given representations of the starting state x0 of system L for player i; ft (·, ·, . . . , ·), t = 0, 1, 2, . . . , i = 1, 2, . . . , p, are arbitrary functions. Here Fti (x(t)) expresses the cost of the system’s passage from starting state x0 to state x(t) for player i by a trajectory x0 , x(1), x(2), . . . , x(t), determined by a fixed set of vectors of control parameters u1 (t), u2 (t), . . . , up (t), t = 0, 1, 2, . . . In this model we assume that the players choose vectors of control parameters in order to achieve the final state xf from the starting state at moment in time T (xf ), where T1 ≤ T (xf ) ≤ T2 . Moreover, each player has to minimize his own cost of the system’s passage to xf Ixi 0 xf (u1 , u2 , . . . , up ) = FTi (xf ) (xf ). Note that for a given u1 (t), u2 (t), . . . , up (t) the cost of the system’s passage from x0 to xf can be calculated on the basis of (3.20)-(3.21) if the corresponding trajectory x0 , x(1), x(2), . . . , x(t), . . . passes through final state xf . If for a given u1 (t), u2 (t), . . . , ur (t) the trajectory x0 , x(1), x(2), . . . , x(t), . . . does not pass through the state xf then we put Ixi 0 xf (u1 , u2 , . . . , up ) = ∞. In this model we are seeking for a Nash equilibrium. So, we consider the problem of finding u1∗ (t), u2∗ (t), . . . , up∗ (t), for which the following condition is satisfied: Ixi 0 xf (u1∗ (t), u2∗ (t), . . . , ui−1∗ (t), ui∗ (t), ui+1∗ (t), . . . , up∗ (t)) ≤ ≤ Ixi 0 xf (u1∗ (t), u2∗ (t), . . . , ui−1∗ (t), ui (t), ui+1∗ (t), . . . , up∗ (t)), i = 1, p. If in this game a Nash equilibrium does not exist then we may seek for a Stackelberg solution. In the case of a Stackelberg solution we should define the order of the fixing of vectors of control parameters for each state x = x(t) at every moment in time t = 0, 1, 2, . . . In the following, we will assume that for the considered control problem the alternate players’ control condition (see Section 1.3) is satisfied. This will allow us to regard our problem as the game control problem on networks and will guarantee the existence of Nash equilibria. 3.5.5 The Game-Theoretic Control Problem on Networks and an Algorithm for its Solving In this section we consider the game-theoretic control problem on networks and propose an algorithm for determining optimal strategies of the players in the case that the structure of the network corresponds to a T -partite directed graph.
General Statement of the Game-Theoretic Control Problem on Networks

Let G = (X, E) be a finite directed graph which describes the dynamics of system L. So, an arbitrary directed edge e = (x, y) ∈ E expresses the possibility of the dynamical system to pass from state x = x(t) to state y = x(t+1) at every moment in time t = 0, 1, 2, ... Two states, x_0 = x(0) and x_f, which correspond to the starting and the final states of L, respectively, are distinguished in G. It is known that system L has to reach the final state at a time moment T(x_f) such that T_1 ≤ T(x_f) ≤ T_2; if T_2 = T_1 = T then the system will reach the final state at time T.

Assume that the vertex set X of G is divided into p subsets X = X_1 ∪ X_2 ∪ ... ∪ X_p (X_i ∩ X_j = ∅, i ≠ j), where vertices x ∈ X_i are regarded as the positions of player i, i = 1, 2, ..., p. On G we consider the following dynamic game: The game starts at position x_0 = x(0), for which the starting numerical representations F_0^1(x_0) = F_0^1, F_0^2(x_0) = F_0^2, ..., F_0^p(x_0) = F_0^p of the players 1, 2, ..., p are given. These quantities express the values of the payoff functions of the players 1, 2, ..., p at time moment t = 0. If x_0 ∈ X_{i_0} then the move is done by player i_0. This means that system L is transferred from state x_0 = x(0) to a state x_1 = x(1) such that e_0 = (x(0), x(1)) ∈ E. After that, at time moment t = 1 the values F_1^i(x(1)), i = \overline{1, p}, are calculated according to the following formula:

$$F_1^i(x(1)) = f_{e_0}^i\bigl(x(0), F_0^i(x(0))\bigr), \quad i = \overline{1, p},$$

where f_{e_0}^i, i = \overline{1, p}, are arbitrary given functions associated with the edge e_0. Note that in G for each edge e_t ∈ E at every time step the functions f_{e_t}^1(·, ·), f_{e_t}^2(·, ·), ..., f_{e_t}^p(·, ·) are considered to be given. If at time step 1 the position x(1) ∈ X_{i_1}, then the move is done by player i_1. This means that player i_1 transfers system L from state x(1) to another state x(2) such that e_1 = (x(1), x(2)) ∈ E. After that the values

$$F_2^i(x(2)) = f_{e_1}^i\bigl(x(1), F_1^i(x(1))\bigr), \quad i = \overline{1, p},$$

are calculated, and so on. In general, if x(t) ∈ X_{i_t} then the move is done by player i_t, i.e. system L is transferred from state x(t) to a state x(t+1) such that e_t = (x(t), x(t+1)) ∈ E. At time step t+1 the quantities

$$F_{t+1}^i(x(t+1)) = f_{e_t}^i\bigl(x(t), F_t^i(x(t))\bigr), \quad i = \overline{1, p},$$

are determined.
As soon as the final state is reached, i.e. x(t+1) = x_f, the game is over and the values of the payoff functions of the players are equal to F_{T(x_f)}^1(x_f), F_{T(x_f)}^2(x_f), ..., F_{T(x_f)}^p(x_f), respectively, where T(x_f) = t+1. In this dynamic game each player i has the aim to minimize his own payoff function F_{T(x_f)}^i(x_f) such that T_1 ≤ T(x_f) ≤ T_2.

More strictly, the game-theoretic control problem on G can be formulated as follows: We define the non-stationary strategies of the players as maps

$$u_i : (x, t) \to (y, t+1) \in X(x) \times \{t+1\} \quad \text{for } x \in X_i \setminus \{x_f\}, \ t = 0, 1, 2, \dots, \ i = \overline{1, p},$$
where X(x) = {y ∈ X | (x, y) ∈ E}. Here (x, t) has the same meaning as the notation x(t), i.e. (x, t) = x(t). For any set of non-stationary strategies u_1, u_2, ..., u_p of the players we define the quantities I_{x_0 x_f}^1(u_1, u_2, ..., u_p), I_{x_0 x_f}^2(u_1, u_2, ..., u_p), ..., I_{x_0 x_f}^p(u_1, u_2, ..., u_p) in the following way: Let u_1, u_2, ..., u_p be an arbitrary set of strategies. Then either u_1, u_2, ..., u_p generate in G a finite trajectory x_0 = x(0), x(1), x(2), ..., x(T(x_f)) = x_f from x_0 to x_f, where T(x_f) represents the time moment when x_f is reached, or u_1, u_2, ..., u_p generate an infinite trajectory x_0 = x(0), x(1), x(2), ..., x(t), ..., which does not pass through x_f, i.e. T(x_f) = ∞. If state x_f is reached at a finite moment in time T(x_f) and T_1 ≤ T(x_f) ≤ T_2, then we put

$$I_{x_0 x_f}^i(u_1, u_2, \dots, u_p) = F_{T(x_f)}^i(x_f), \quad i = \overline{1, p},$$

where F_{T(x_f)}^i(x_f) is calculated recursively by using the following formula:

$$F_{t+1}^i(x(t+1)) = f_{(x(t), x(t+1))}^i\bigl(x(t), F_t^i(x(t))\bigr), \quad t = \overline{0, T(x_f) - 1};$$
$$F_0^i(x(0)) = F_0^i; \quad x(t+1) = u_i(x(t)), \quad t = \overline{0, T(x_f) - 1}.$$

If state x_f cannot be reached at a finite moment in time, then we set

$$I_{x_0 x_f}^i(u_1, u_2, \dots, u_p) = \infty, \quad i = \overline{1, p}.$$
Thus we regard the problem of finding non-stationary strategies u_1^*, u_2^*, ..., u_p^* for which the following condition is satisfied:

$$I_{x_0 x_f}^i(u_1^*, u_2^*, \dots, u_{i-1}^*, u_i^*, u_{i+1}^*, \dots, u_p^*) \le I_{x_0 x_f}^i(u_1^*, u_2^*, \dots, u_{i-1}^*, u_i, u_{i+1}^*, \dots, u_p^*), \quad \forall u_i, \ i = \overline{1, p}.$$

So, we consider the problem of finding a solution in the sense of Nash. It is easy to observe that if

$$F_{t+1}^i(x(t+1)) = F_t^i(x(t)) + c_{(x(t), x(t+1))}^i(t)$$
for (x(t), x(t+1)) ∈ E, i = \overline{1, p}, then we obtain the problem from [6, 59].

The Game-Theoretic Control Problem on T-Partite Networks and an Algorithm for its Solving

Now we show that if G has the structure of a (T+1)-partite graph, then a Nash equilibrium for the game-theoretic control problem exists. Moreover, we propose an algorithm for finding optimal strategies of the players.

So, we assume that G has the structure of a (T+1)-partite graph: the vertex set X of G is divided into T+1 nonempty sets X = Z_0 ∪ Z_1 ∪ ... ∪ Z_T, Z_i ∩ Z_j = ∅, i ≠ j; the edge set E of G is divided into T nonempty sets E = E_0 ∪ E_1 ∪ ... ∪ E_{T-1}, E_i ∩ E_j = ∅, i ≠ j, such that each edge e = (x, y) ∈ E_t starts in x ∈ Z_t and enters y ∈ Z_{t+1}, t = \overline{0, T-1}. On G we consider the problem with T_1 = T_2 = T. So, x_0 ∈ Z_0 and x_f ∈ Z_T. Moreover, we assume that each set Z_t represents a position set for one of the players i ∈ {1, 2, ..., p}. So, for each Z_t there exists i_t ∈ {1, 2, ..., p} such that Z_t ⊆ X_{i_t}, where X = X_1 ∪ X_2 ∪ ... ∪ X_p and X_i is the set of positions of player i.

In this case it is possible to extend Algorithm 3.16 to the game-theoretic control problem if for every e_t ∈ E, t = 0, 1, 2, ..., the functions f_{e_t}^i(x, F^i) are non-decreasing with respect to F^i. The values of the payoff functions I_{x_0 x_f}^i(u_1, u_2, ..., u_p) = F_t^i(x(t)) can be found by using the following procedure:

Preliminary step (Step 0): For the starting position x(0) = x_0 set F_0^i(x(0)) = F_0^i, i = \overline{1, p}.

General step (Step t, t ≥ 0): Assume that at time moment t the position set Z_t is controlled by player i_t ∈ {1, 2, ..., p}, i.e. Z_t ⊆ X_{i_t}. Then for an arbitrary state x(t+1) ∈ Z_{t+1} find a vertex x'(t) ∈ Z_t such that

$$f_{(x'(t), x(t+1))}^{i_t}\bigl(x'(t), F_t^{i_t}(x'(t))\bigr) = \min_{x(t) \in X^-(x(t+1))} f_{(x(t), x(t+1))}^{i_t}\bigl(x(t), F_t^{i_t}(x(t))\bigr),$$

where X^-(x(t+1)) = {x(t) ∈ Z_t | (x(t), x(t+1)) ∈ E_t}. Then calculate

$$F_{t+1}^i(x(t+1)) = f_{(x'(t), x(t+1))}^i\bigl(x'(t), F_t^i(x'(t))\bigr), \quad i = \overline{1, p}.$$
If t < T-1 then go to the next step; otherwise STOP.

If F_t^i(x(t)) are known for every x(t) ∈ X, then u_1, u_2, ..., u_p can be found starting from the end position x_f by fixing each time u_{i_k}(x(t)) = x(t+1), for which

$$F_{t+1}^{i_k}(x(t+1)) = f_{(x(t), x(t+1))}^{i_k}\bigl(x(t), F_t^{i_k}(x(t))\bigr) \qquad (3.22)$$

if x(t) ∈ X^-(x(t+1)) ∩ X_{i_k}. For fixed u_1, u_2, ..., u_{i-1}, u_{i+1}, ..., u_p the proposed procedure becomes Algorithm 3.16 for the control problem with respect to u_i. Therefore, on the basis of the results from Section 3.5.2 we obtain the following theorem:

Theorem 3.19. If in a (T+1)-partite graph G = (X, E) there exists a directed path from x_0 ∈ Z_0 to x_f ∈ Z_T and for every e ∈ E the functions f_e^i(x, F^i) are non-decreasing with respect to F^i, then for the game-theoretic control problem on G there exists a Nash equilibrium.

Proof. We prove the theorem by induction on the number of players p. In the case p = 1 the theorem is evident. Assume that the theorem is true for any p ≤ r and let us show that it is true for p = r + 1.

We consider the network with p = r + 1 players and fix successively all possible admissible strategies u_1^1, u_1^2, ..., u_1^q of the first player. A strategy u_1^j of player 1 is called admissible if for the rest of the players there exist strategies u_2, u_3, ..., u_p such that u_1^j, u_2, ..., u_p generate a trajectory x_0 = x(0), x(1), ..., x(T) = x_f from x_0 to x_f. Note that the set of admissible strategies of the first player is finite because G has the structure of a (T+1)-partite graph and T is finite.

It is easy to observe that if in G we fix the first possible strategy u_1^1 of the first player, then our game becomes a game of p - 1 = r players with respect to the players 2, 3, ..., p on G^1 = (X, E^1), where E^1 is obtained from E by deleting the edges (x, y) originating in x ∈ X_1 such that y = y(t) ≠ u_1^1(x(t)). So, for every vertex x ∈ X_1 in G^1 there exists only one leaving edge (x, u_1^1(x(t))).

On G^1 = (X, E^1) we consider the game-theoretic control problem with respect to the players 2, 3, ..., p. In this game we can regard the vertices x ∈ X_1 as positions of an arbitrary other player; in the following we regard them as positions of the second player. Then, according to the induction principle, for this game-theoretic control problem there exists a Nash equilibrium u_2^1, u_3^1, ..., u_p^1 and we can calculate the corresponding costs F_T^{21}(x_f), F_T^{31}(x_f), ..., F_T^{p1}(x_f) of the system's passages from x_0 to x_f, where F_T^{i1}(x_f) = I_{x_0 x_f}^i(u_2^1, u_3^1, ..., u_p^1), i = \overline{2, p}.
In an analogous way we can fix the second possible strategy u_1^2 of the first player. If we solve the newly obtained game-theoretic problem, we find a Nash equilibrium u_2^2, ..., u_p^2, and so on. After q steps we obtain a set of solutions

u_1^1, u_2^1, ..., u_p^1
u_1^2, u_2^2, ..., u_p^2
...
u_1^q, u_2^q, ..., u_p^q.

Among these solutions we select the one u_1^{k*}, u_2^{k*}, ..., u_p^{k*} for which

$$F_T^{1k^*}(x_f) = \min_{1 \le j \le q} \{F_T^{1j}(x_f)\}. \qquad (3.23)$$
Let us show that u_1^{k*}, u_2^{k*}, ..., u_p^{k*} is a Nash equilibrium for the initial game-theoretic control problem. First of all it is evident that

$$I_{x_0 x_f}^i(u_1^{k^*}, u_2^{k^*}, \dots, u_{i-1}^{k^*}, u_i^{k^*}, u_{i+1}^{k^*}, \dots, u_p^{k^*}) \le I_{x_0 x_f}^i(u_1^{k^*}, u_2^{k^*}, \dots, u_{i-1}^{k^*}, u_i, u_{i+1}^{k^*}, \dots, u_p^{k^*}), \quad \forall u_i, \ i = \overline{2, p},$$

because for fixed u_1^{k*} we have a Nash equilibrium u_2^{k*}, u_3^{k*}, ..., u_p^{k*} on G^{k*} with respect to the players 2, 3, ..., p. Taking into account that G is a (T+1)-partite graph and each level Z_i belongs to one of the sets X_1, X_2, ..., X_p, condition (3.23) implies

$$I_{x_0 x_f}^1(u_1^{k^*}, u_2^{k^*}, u_3^{k^*}, \dots, u_p^{k^*}) \le I_{x_0 x_f}^1(u_1, u_2^{k^*}, \dots, u_p^{k^*}), \quad \forall u_1.$$
In order to illustrate the details of the proposed algorithm we shall use the following example:

Example. We consider the game-theoretic control problem on a network with two players. This network has the structure of a 4-partite graph G = (X, E) given in Fig. 3.6. So, X = Z_0 ∪ Z_1 ∪ Z_2 ∪ Z_3, where Z_0 = {0}, Z_1 = {1, 2}, Z_2 = {3, 4, 5}, Z_3 = {6}; T = 3, x_0 = 0 ∈ Z_0, x_f = 6 ∈ Z_3. The position sets of the players 1 and 2 are determined by X_1 = {0, 3, 4, 5} and X_2 = {1, 2, 6}, respectively.
Fig. 3.6. (the 4-partite network with vertex layers {0}, {1, 2}, {3, 4, 5}, {6})

With each edge e ∈ E two functions f_e^1(x(t), F_t^1(x(t))) and f_e^2(x(t), F_t^2(x(t))) are associated, i.e.

$$\begin{array}{ll}
f_{(0,1)}^1(0, F_t^1(0)) = F_t^1(0) + 2t; & f_{(0,1)}^2(0, F_t^2(0)) = F_t^2(0) + 3;\\
f_{(0,2)}^1(0, F_t^1(0)) = F_t^1(0) + 1; & f_{(0,2)}^2(0, F_t^2(0)) = F_t^2(0) + t;\\
f_{(1,3)}^1(1, F_t^1(1)) = F_t^1(1) + 3t; & f_{(1,3)}^2(1, F_t^2(1)) = F_t^2(1) + 1;\\
f_{(1,4)}^1(1, F_t^1(1)) = F_t^1(1) + t; & f_{(1,4)}^2(1, F_t^2(1)) = F_t^2(1) + 3;\\
f_{(2,4)}^1(2, F_t^1(2)) = F_t^1(2) + 6t; & f_{(2,4)}^2(2, F_t^2(2)) = F_t^2(2) + 2;\\
f_{(2,5)}^1(2, F_t^1(2)) = F_t^1(2) + 2; & f_{(2,5)}^2(2, F_t^2(2)) = F_t^2(2) + 2t;\\
f_{(3,6)}^1(3, F_t^1(3)) = F_t^1(3) + 1; & f_{(3,6)}^2(3, F_t^2(3)) = F_t^2(3) + 2;\\
f_{(4,6)}^1(4, F_t^1(4)) = F_t^1(4) + 3; & f_{(4,6)}^2(4, F_t^2(4)) = F_t^2(4) + 1;\\
f_{(5,6)}^1(5, F_t^1(5)) = F_t^1(5) + t; & f_{(5,6)}^2(5, F_t^2(5)) = F_t^2(5) + 2t.
\end{array}$$
F_0^1(0) = 0 and F_0^2(0) = 5 are given for the starting position x_0 = 0. According to the algorithm we can recursively calculate:

Step 0. Fix F_0^1(0) = 0, F_0^2(0) = 5.

Step 1.
$$F_1^1(1) = f_{(0,1)}^1(0, F_0^1(0)) = F_0^1(0) + 2 \cdot 0 = 0;$$
$$F_1^2(1) = f_{(0,1)}^2(0, F_0^2(0)) = F_0^2(0) + 3 = 5 + 3 = 8;$$
$$F_1^1(2) = f_{(0,2)}^1(0, F_0^1(0)) = F_0^1(0) + 1 = 0 + 1 = 1;$$
$$F_1^2(2) = f_{(0,2)}^2(0, F_0^2(0)) = F_0^2(0) + 0 = 5 + 0 = 5.$$

Step 2.
$$F_2^1(3) = f_{(1,3)}^1(1, F_1^1(1)) = F_1^1(1) + 3 \cdot 1 = 0 + 3 = 3;$$
$$F_2^2(3) = f_{(1,3)}^2(1, F_1^2(1)) = F_1^2(1) + 1 = 8 + 1 = 9.$$
In vertex 4 the two edges (1,4) and (2,4) enter, and 1, 2 ∈ X_2. Therefore, we calculate

$$F_2^2(4) = \min\{f_{(1,4)}^2(1, F_1^2(1)),\ f_{(2,4)}^2(2, F_1^2(2))\} = \min\{F_1^2(1) + 3,\ F_1^2(2) + 2\} = \min\{8 + 3,\ 5 + 2\} = 7.$$

Taking into account that this minimum is obtained for edge (2,4), we calculate

$$F_2^1(4) = f_{(2,4)}^1(2, F_1^1(2)) = F_1^1(2) + 6 \cdot 1 = 1 + 6 = 7.$$

Then we calculate

$$F_2^1(5) = f_{(2,5)}^1(2, F_1^1(2)) = F_1^1(2) + 2 = 1 + 2 = 3;$$
$$F_2^2(5) = f_{(2,5)}^2(2, F_1^2(2)) = F_1^2(2) + 2 \cdot 2 = 5 + 4 = 9.$$

Step 3. In vertex 6 the three edges (3,6), (4,6) and (5,6) enter, and 3, 4, 5 ∈ X_1. Therefore, we calculate

$$F_3^1(6) = \min\{f_{(3,6)}^1(3, F_2^1(3)),\ f_{(4,6)}^1(4, F_2^1(4)),\ f_{(5,6)}^1(5, F_2^1(5))\} = \min\{F_2^1(3) + 1,\ F_2^1(4) + 3,\ F_2^1(5) + 2\} = \min\{3 + 1,\ 7 + 3,\ 3 + 2\} = 4.$$

This minimum is obtained for edge (4,6); therefore, we calculate

$$F_3^2(6) = f_{(4,6)}^2(4, F_2^2(4)) = F_2^2(4) + 1 = 7 + 1 = 8.$$
The optimal trajectory from x_0 = 0 to x_f = 6 is determined starting from the final position. On the basis of the algorithm we obtain 6 ← 4 ← 2 ← 0, because F_3^1(6), F_2^2(4), F_1^1(2) satisfy condition (3.22). In such a way we obtain the optimal strategies of the players:

$$u_1^*: (0, 0) \to (2, 1); \quad (4, 2) \to (6, 3);$$
$$u_2^*: (2, 1) \to (4, 2),$$

which generate the trajectory 0, 2, 4, 6 from x_0 = 0 to x_f = 6. The proposed algorithm finds only one Nash equilibrium for the considered control problem. It is evident that a Nash equilibrium in this game-theoretic problem may not be unique.
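The forward procedure of this example can be stated compactly in code. The following Python sketch runs it on the example's data under two assumptions that are ours, not the book's: the additive cost increments are always evaluated at the departure time t of a move (the computation above evaluates some increments at other time points, so the sketch need not reproduce its numbers exactly), and all identifiers are illustrative.

```python
# Forward procedure on a (T+1)-partite network with two players and
# additive costs F + c_e^i(t); conventions assumed as stated above.
edges = {  # (tail, head) -> (c_e^1, c_e^2), functions of departure time t
    (0, 1): (lambda t: 2 * t, lambda t: 3),
    (0, 2): (lambda t: 1,     lambda t: t),
    (1, 3): (lambda t: 3 * t, lambda t: 1),
    (1, 4): (lambda t: t,     lambda t: 3),
    (2, 4): (lambda t: 6 * t, lambda t: 2),
    (2, 5): (lambda t: 2,     lambda t: 2 * t),
    (3, 6): (lambda t: 1,     lambda t: 2),
    (4, 6): (lambda t: 3,     lambda t: 1),
    (5, 6): (lambda t: t,     lambda t: 2 * t),
}
layers = [[0], [1, 2], [3, 4, 5], [6]]      # Z_0, Z_1, Z_2, Z_3
mover = {1: 0, 2: 1, 3: 0}                  # 0-based player controlling Z_{t-1}
F = {0: (0, 5)}                             # (F_0^1(x_0), F_0^2(x_0))
pred = {}

for t in range(1, len(layers)):
    i = mover[t]                            # player choosing the move into Z_t
    for y in layers[t]:
        ins = [x for x in layers[t - 1] if (x, y) in edges]
        # the controlling player picks the predecessor minimizing HIS value
        best = min(ins, key=lambda x: F[x][i] + edges[(x, y)][i](t - 1))
        pred[y] = best
        F[y] = tuple(F[best][k] + edges[(best, y)][k](t - 1) for k in range(2))

path, v = [6], 6                            # backtrack as in condition (3.22)
while v in pred:
    v = pred[v]
    path.append(v)
print(path[::-1], F[6])
```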
The proposed approach allows us to conclude that for determining a Nash equilibrium the player at a given stage must know which minimizing points the other players have used at the previous stages. The proposed approach allows us to determine optimal non-stationary strategies of the players in the dynamic games from [67], but it does not allow us to determine optimal strategies of the players for the dynamic games from [5].

A similar multi-criteria control problem with a Pareto optimal solution can be formulated, and dynamic programming techniques for solving it can be developed.

3.5.6 Multi-Criteria Discrete Control Problems: Pareto Optima

In this section we extend the control model from Section 3.5.1 using the concept of cooperative games.

General Statement of the Problem

We assume that the dynamics of a system L is controlled by p players, who coordinate their actions using a common vector of control parameters u(t). So, the dynamics of system L is described according to (3.9)-(3.11). Let x(0), x(1), ..., x(t), ... be a process generated according to (3.9)-(3.11) with a given vector of control parameters u(t), t = 0, 1, 2, ... For each state we define the quantities F_t^i(x(t)), i = 1, 2, ..., p, in the following way:

$$F_{t+1}^i(x(t+1)) = f_t^i\bigl(x(t), u(t), F_t^i(x(t))\bigr), \qquad (3.24)$$

where

$$F_0^i(x(0)) = F_0^i, \quad i = 1, 2, \dots, p, \qquad (3.25)$$
are given representations of the starting state x(0) of system L; f_t^i(x(t), u(t), F_t^i(x(t))), t = 0, 1, 2, ..., are arbitrary functions. So, F_t^i(x(t)) expresses the cost of the system's passage from state x(0) to state x(t) for player i.

In this model we assume that the players choose vectors of control parameters in order to achieve the final state x_f from a starting state x_0 at a moment in time T(x_f), where T_1 ≤ T(x_f) ≤ T_2. For a given u(t) the cost of the system's passage from x_0 to x_f for player i is calculated on the basis of (3.9)-(3.11), (3.24), (3.25), and we put I_{x_0 x_f}^i(u(t)) = F_{T(x_f)}^i(x_f) if the trajectory passes through x_f at a time moment T(x_f) such that T_1 ≤ T(x_f) ≤ T_2; otherwise we put I_{x_0 x_f}^i(u(t)) = ∞.
We consider the problem of finding a Pareto solution u^*(t), i.e. a control for which there is no other vector u(t) such that

$$\bigl(I_{x_0 x_f}^1(u(t)), I_{x_0 x_f}^2(u(t)), \dots, I_{x_0 x_f}^p(u(t))\bigr) \le \bigl(I_{x_0 x_f}^1(u^*(t)), I_{x_0 x_f}^2(u^*(t)), \dots, I_{x_0 x_f}^p(u^*(t))\bigr)$$

and for at least one i_0 ∈ {1, 2, ..., p}

$$I_{x_0 x_f}^{i_0}(u(t)) < I_{x_0 x_f}^{i_0}(u^*(t)).$$

The Multi-Criteria Problem on the Network and an Algorithm for its Solving on T-Partite Networks

We formulate the multi-criteria control model on the network in general form on the basis of the control model from Section 1.4. Let G = (X, E) be a directed graph of transitions for a dynamical system L with a given starting state x_0 ∈ X and a final state x_f ∈ X. Additionally, for state x_0 the starting representations F_0^1(x_0) = F_0^1, F_0^2(x_0) = F_0^2, ..., F_0^p(x_0) = F_0^p are given, which express the payoff functions of the players at time moment t = 0. We define a control u on G as a map

$$u : (x, t) \to (y, t+1) \in X_G(x) \times \{t+1\} \quad \text{for } x \in X \setminus \{x_f\}, \ t = 0, 1, 2, \dots$$

For an arbitrary control u we define the quantities I_{x_0 x_f}^1(u), I_{x_0 x_f}^2(u), ..., I_{x_0 x_f}^p(u) in the following way: Let x_0 = x(0), x(1), x(2), ..., x(T(x_f)) = x_f be a trajectory from x_0 to x_f generated by the control u, where T(x_f) is the time moment when state x_f is reached. Then we put

$$I_{x_0 x_f}^i(u) = F_{T(x_f)}^i(x_f) \quad \text{if } T_1 \le T(x_f) \le T_2, \ i = \overline{1, p},$$

where F_t^i(x(t)) are calculated recursively by using the following formula:

$$F_{t+1}^i(x(t+1)) = f_{(x(t), x(t+1))}^i\bigl(x(t), F_t^i(x(t))\bigr), \quad t = \overline{0, T(x_f) - 1}; \quad F_0^i(x(0)) = F_0^i,$$

where f_e^1(·, ·), f_e^2(·, ·), ..., f_e^p(·, ·) are arbitrary functions. If T(x_f) ∉ [T_1, T_2], then we put I_{x_0 x_f}^i(u) = ∞, i = \overline{1, p}. We regard the problem of finding a Pareto solution u^*.
In the following, let us show that if the graph G has the structure of a (T+1)-partite graph and T_1 = T_2 = T, then the algorithm from Section 3.5.2 can be extended to the multi-criteria control problem on a network. So, assume that the vertex set X is represented as X = Z_0 ∪ Z_1 ∪ ... ∪ Z_T, Z_i ∩ Z_j = ∅, i ≠ j, and the edge set E is divided into T non-empty subsets E = E_0 ∪ E_1 ∪ ... ∪ E_{T-1} such that an arbitrary edge e = (y, z) ∈ E_t begins in y ∈ Z_t and enters z ∈ Z_{t+1}, t = \overline{0, T-1}. In this case, for functions f_e^i(·, ·) which are non-decreasing with respect to the second argument, the values I^i(u) = F_t^i(x(t)) can be calculated by using the following procedure:

Preliminary step (Step 0): For the starting position x(0) = x_0 set F_0^i(x(0)) = F_0^i, i = \overline{1, p}; for any x ∈ X \ {x_0} put F_t^i(x(t)) = ∞, i = \overline{1, p}, t = \overline{1, T}.

General step (Step t, t ≥ 0): For an arbitrary state x(t+1) ∈ Z_{t+1} find a vertex x'(t) ∈ Z_t such that there is no other vertex x(t) ∈ Z_t \ {x_f} for which

$$\bigl(f_{(x(t), x(t+1))}^1(x(t), F_t^1(x(t))), \dots, f_{(x(t), x(t+1))}^p(x(t), F_t^p(x(t)))\bigr) \le \bigl(f_{(x'(t), x(t+1))}^1(x'(t), F_t^1(x'(t))), \dots, f_{(x'(t), x(t+1))}^p(x'(t), F_t^p(x'(t)))\bigr)$$

and

$$f_{(x(t), x(t+1))}^{i_0}\bigl(x(t), F_t^{i_0}(x(t))\bigr) < f_{(x'(t), x(t+1))}^{i_0}\bigl(x'(t), F_t^{i_0}(x'(t))\bigr)$$

for at least one i_0 ∈ {1, 2, ..., p}. Then calculate

$$F_{t+1}^i(x(t+1)) = f_{(x'(t), x(t+1))}^i\bigl(x'(t), F_t^i(x'(t))\bigr), \quad i = \overline{1, p}.$$

If t < T-1 then go to the next step; otherwise STOP.

If F_t^i(x(t)) are known for every vertex x(t) ∈ X, then a Pareto optimum u^* can be found starting from the end position x_f by fixing each time u^*(x(t)) = x(t+1), for which

$$F_{t+1}^i(x(t+1)) = f_{(x(t), x(t+1))}^i\bigl(x(t), F_t^i(x(t))\bigr), \quad i = \overline{1, p}.$$
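Stated as code, the general step selects, for every vertex of the next layer, an incoming edge whose vector of tentative values is not Pareto-dominated by that of any other incoming edge. A minimal Python sketch with p = 2 additive criteria; the network and costs are illustrative placeholders, not taken from the text.

```python
# Pareto forward procedure on a layered network with two additive criteria.
layers = [["s"], ["a", "b"], ["f"]]
cost = {("s", "a"): (1, 4), ("s", "b"): (3, 1),
        ("a", "f"): (2, 2), ("b", "f"): (2, 2)}

def dominates(u, v):
    # u Pareto-dominates v: u <= v componentwise and u < v in some component
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))

F = {"s": (0, 0)}
pred = {}
for t in range(1, len(layers)):
    for y in layers[t]:
        # tentative vector value of y via every incoming edge
        cands = {x: tuple(F[x][k] + cost[(x, y)][k] for k in range(2))
                 for x in layers[t - 1] if (x, y) in cost}
        # keep a predecessor whose value vector no other candidate dominates
        x_opt = next(x for x, v in cands.items()
                     if not any(dominates(w, v) for w in cands.values()))
        pred[y], F[y] = x_opt, cands[x_opt]

print(F["f"], pred)   # vector cost of reaching x_f and chosen predecessors
```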
3.6 Pareto-Nash Equilibria for Multi-Objective Games

In this section we consider multi-objective games, which generalize non-cooperative games [48, 82, 83] and Pareto multi-criteria problems [89, 91, 97]. The payoff functions of the players in such games are presented as vector functions, which the players intend to optimize in the sense of Pareto
on their sets of strategies. At the same time, in our game-theoretic model it is assumed that the players are interested in preserving a Nash optimality principle when they interact on the set of situations. This aspect of the game leads to a new equilibrium notion, which we call Pareto-Nash equilibrium [9, 74, 85]. Such a concept can be used for multi-objective control problems, and algorithms for their solving can be derived.

3.6.1 Problem Formulation

The multi-objective game with p players is denoted by Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p), where X_i is the set of strategies of player i, i = \overline{1, p}, and F^i = (F_{i1}, F_{i2}, ..., F_{i r_i}) is the vector payoff function of player i, defined on the set of situations X = X_1 × X_2 × ... × X_p:

$$F^i : X_1 \times X_2 \times \dots \times X_p \to \mathbb{R}^{r_i}, \quad i = \overline{1, p}.$$
Each component F_{ik} of F^i corresponds to a partial criterion of player i and represents a real function defined on the set of situations X = X_1 × X_2 × ... × X_p:

$$F_{ik} : X_1 \times X_2 \times \dots \times X_p \to \mathbb{R}^1, \quad k = \overline{1, r_i}, \ i = \overline{1, p}.$$

We call a solution of the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) a Pareto-Nash equilibrium and define it in the following way:

Definition 3.20. The situation x^* = (x_1^*, x_2^*, ..., x_p^*) ∈ X is called a Pareto-Nash equilibrium for the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) if for every i ∈ {1, 2, ..., p} the strategy x_i^* represents a Pareto solution for the following multi-criteria problem:

$$\max_{x_i \in X_i} \to \overline{f}^{\,i}_{x^*}(x_i) = \bigl(f_{x^*}^{i1}(x_i), f_{x^*}^{i2}(x_i), \dots, f_{x^*}^{i r_i}(x_i)\bigr),$$

where

$$f_{x^*}^{ik}(x_i) = F_{ik}(x_1^*, x_2^*, \dots, x_{i-1}^*, x_i, x_{i+1}^*, \dots, x_p^*), \quad k = \overline{1, r_i}, \ i = \overline{1, p}.$$
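For finite strategy sets, Definition 3.20 can be verified by direct enumeration of unilateral deviations. A minimal Python sketch; all names and payoff data are illustrative, not from the text.

```python
from itertools import product

def is_pareto_nash(x_star, strategy_sets, payoff):
    # payoff[i][x] is player i's payoff VECTOR in situation x (maximization)
    for i, X_i in enumerate(strategy_sets):
        base = payoff[i][x_star]
        for s in X_i:                          # unilateral deviations of player i
            v = payoff[i][x_star[:i] + (s,) + x_star[i + 1:]]
            # x_i^* is Pareto optimal iff no deviation strictly dominates it
            if all(a >= b for a, b in zip(v, base)) and \
               any(a > b for a, b in zip(v, base)):
                return False
    return True

S = [(0, 1), (0, 1)]                           # two players, two strategies each
payoff = [{x: (x[0] + x[1], 1 - x[0]) for x in product(*S)},
          {x: (x[1], x[0] * x[1]) for x in product(*S)}]
print([x for x in product(*S) if is_pareto_nash(x, S, payoff)])
```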
This definition generalizes the well-known notion of a Nash equilibrium for classical non-cooperative games (single-objective games) and of a Pareto optimum for multi-criteria problems. If r_i = 1, i = \overline{1, p}, then Γ becomes a classical non-cooperative game, where x^* represents a Nash equilibrium solution; in the case p = 1 the game Γ becomes a Pareto multi-criteria problem, where x^* is a Pareto solution.

An important special class of multi-objective games is the class of zero-sum games of two players. This class is obtained from the general case of a multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) when p = 2, r_1 = r_2 = r and F^2(x_1, x_2) = −F^1(x_1, x_2). The zero-sum multi-objective game is denoted by Γ = (X_1, X_2, F), where F(x_1, x_2) = F^2(x_1, x_2) = −F^1(x_1, x_2).
A Pareto-Nash equilibrium for this game corresponds to a saddle point x^* = (x_1^*, x_2^*) ∈ X_1 × X_2 for the following max-min multi-objective problem:

$$\max_{x_1 \in X_1} \min_{x_2 \in X_2} \to F(x_1, x_2) = \bigl(F^1(x_1, x_2), F^2(x_1, x_2), \dots, F^r(x_1, x_2)\bigr). \qquad (3.26)$$

Strictly, we define a saddle point x^* = (x_1^*, x_2^*) ∈ X_1 × X_2 for the zero-sum multi-objective problem (3.26) in the following way:

Definition 3.21. The situation (x_1^*, x_2^*) ∈ X_1 × X_2 is called a saddle point for the max-min multi-objective problem (3.26) (i.e. for the zero-sum multi-objective game Γ = (X_1, X_2, F)) if x_1^* is a Pareto solution for the multi-criteria problem

$$\max_{x_1 \in X_1} \to F(x_1, x_2^*) = \bigl(F^1(x_1, x_2^*), F^2(x_1, x_2^*), \dots, F^r(x_1, x_2^*)\bigr),$$

and x_2^* is a Pareto solution for the multi-criteria problem

$$\min_{x_2 \in X_2} \to F(x_1^*, x_2) = \bigl(F^1(x_1^*, x_2), F^2(x_1^*, x_2), \dots, F^r(x_1^*, x_2)\bigr).$$
If r = 1 this notion corresponds to the classical saddle point notion for max-min problems, i.e. we obtain the saddle point notion for classical zero-sum games of two players.

In this section we show that the theorems of J. Nash [82] and J. von Neumann [80, 83] related to classical non-cooperative games can be extended to our multi-objective case of games. Moreover, we show that all results related to discrete multi-objective games, especially matrix games, can be developed in an analogous way as for classical ones. Algorithms for determining optimal strategies of the players in the considered games will be developed.

3.6.2 Main Results

First we formulate the main theorem, which represents an extension of the Nash theorem to our multi-objective version of the game.

Theorem 3.22. Let Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) be a multi-objective game, where X_1, X_2, ..., X_p are convex compact sets and F^1, F^2, ..., F^p represent continuous vector payoff functions. Moreover, let us assume that for every i ∈ {1, 2, ..., p} each component F_{ik}(x_1, x_2, ..., x_{i-1}, x_i, x_{i+1}, ..., x_p), k ∈ {1, 2, ..., r_i}, of the vector function F^i(x_1, x_2, ..., x_{i-1}, x_i, x_{i+1}, ..., x_p) represents a concave function with respect to x_i on X_i for fixed x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_p. Then for the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) there exists a Pareto-Nash equilibrium situation x^* = (x_1^*, x_2^*, ..., x_p^*) ∈ X_1 × X_2 × ... × X_p.
Proof. Let α_{11}, α_{12}, ..., α_{1r_1}, α_{21}, α_{22}, ..., α_{2r_2}, ..., α_{p1}, α_{p2}, ..., α_{pr_p} be an arbitrary set of real numbers which satisfy the following condition:

$$\sum_{k=1}^{r_i} \alpha_{ik} = 1, \quad i = \overline{1, p}; \qquad \alpha_{ik} > 0, \quad k = \overline{1, r_i}, \ i = \overline{1, p}. \qquad (3.27)$$

We consider an auxiliary non-cooperative game (single-objective game) Γ' = (X_1, X_2, ..., X_p, f_1, f_2, ..., f_p), where

$$f_i(x_1, x_2, \dots, x_p) = \sum_{k=1}^{r_i} \alpha_{ik} F_{ik}(x_1, x_2, \dots, x_p), \quad i = \overline{1, p}.$$
It is evident that f_i(x_1, x_2, ..., x_{i-1}, x_i, x_{i+1}, ..., x_p) for every i ∈ {1, 2, ..., p} represents a continuous and concave function with respect to x_i on X_i for fixed x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_p, because α_{11}, α_{12}, ..., α_{pr_p} satisfy condition (3.27) and F_{ik}(x_1, x_2, ..., x_{i-1}, x_i, x_{i+1}, ..., x_p) is a continuous and concave function with respect to x_i on X_i for fixed x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_p, k = \overline{1, r_i}, i = \overline{1, p}. According to the Nash theorem [82], for the non-cooperative game Γ' = (X_1, X_2, ..., X_p, f_1, f_2, ..., f_p) there exists a Nash equilibrium x^* = (x_1^*, x_2^*, ..., x_p^*), i.e.

$$f_i(x_1^*, x_2^*, \dots, x_{i-1}^*, x_i, x_{i+1}^*, \dots, x_p^*) \le f_i(x_1^*, x_2^*, \dots, x_{i-1}^*, x_i^*, x_{i+1}^*, \dots, x_p^*), \quad \forall x_i \in X_i, \ i = \overline{1, p}.$$
Let us show that x^* = (x_1^*, x_2^*, ..., x_p^*) is a Pareto-Nash equilibrium solution for the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p). Indeed, for every x_i ∈ X_i we have

$$\sum_{k=1}^{r_i} \alpha_{ik} F_{ik}(x_1^*, \dots, x_{i-1}^*, x_i, x_{i+1}^*, \dots, x_p^*) = f_i(x_1^*, \dots, x_{i-1}^*, x_i, x_{i+1}^*, \dots, x_p^*)$$
$$\le f_i(x_1^*, \dots, x_{i-1}^*, x_i^*, x_{i+1}^*, \dots, x_p^*) = \sum_{k=1}^{r_i} \alpha_{ik} F_{ik}(x_1^*, \dots, x_{i-1}^*, x_i^*, x_{i+1}^*, \dots, x_p^*), \quad \forall x_i \in X_i, \ i = \overline{1, p}.$$
So,

$$\sum_{k=1}^{r_i} \alpha_{ik} F_{ik}(x_1^*, \dots, x_{i-1}^*, x_i, x_{i+1}^*, \dots, x_p^*) \le \sum_{k=1}^{r_i} \alpha_{ik} F_{ik}(x_1^*, \dots, x_{i-1}^*, x_i^*, x_{i+1}^*, \dots, x_p^*), \quad \forall x_i \in X_i, \ i = \overline{1, p}, \qquad (3.28)$$

for given α_{11}, α_{12}, ..., α_{1r_1}, α_{21}, α_{22}, ..., α_{2r_2}, ..., α_{p1}, α_{p2}, ..., α_{pr_p} which satisfy (3.27). Taking into account that the functions f_{x^*}^{ik}(x_i) = F_{ik}(x_1^*, x_2^*, ..., x_{i-1}^*, x_i, x_{i+1}^*, ..., x_p^*), k = \overline{1, r_i}, are concave with respect to x_i on the convex set X_i and that α_{i1}, α_{i2}, ..., α_{ir_i} satisfy the condition $\sum_{k=1}^{r_i} \alpha_{ik} = 1$, α_{ik} > 0, k = \overline{1, r_i}, then according to the theorem from [23] (see also [10, 11, 12]) condition (3.28) implies that x_i^* is a Pareto solution for the following multi-criteria problem:

$$\max_{x_i \in X_i} \to \overline{f}^{\,i}_{x^*}(x_i) = \bigl(f_{x^*}^{i1}(x_i), f_{x^*}^{i2}(x_i), \dots, f_{x^*}^{i r_i}(x_i)\bigr), \quad i \in \{1, 2, \dots, p\}.$$
This means that x^* = (x_1^*, x_2^*, ..., x_p^*) is a Pareto-Nash equilibrium solution for the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p).

So, if the conditions of Theorem 3.22 are satisfied, then a Pareto-Nash equilibrium solution for the multi-objective game can be found by using the following algorithm:

Algorithm 3.23. Determining Pareto-Nash Equilibria of a Multi-Objective Game

1. Fix an arbitrary set of real numbers α_{11}, α_{12}, ..., α_{1r_1}, α_{21}, α_{22}, ..., α_{2r_2}, ..., α_{p1}, α_{p2}, ..., α_{pr_p} which satisfy condition (3.27);

2. Form the single-objective game Γ' = (X_1, X_2, ..., X_p, f_1, f_2, ..., f_p), where

$$f_i(x_1, x_2, \dots, x_p) = \sum_{k=1}^{r_i} \alpha_{ik} F_{ik}(x_1, x_2, \dots, x_p), \quad i = \overline{1, p};$$

3. Find a Nash equilibrium x^* = (x_1^*, x_2^*, ..., x_p^*) for the non-cooperative game Γ' = (X_1, X_2, ..., X_p, f_1, f_2, ..., f_p) and fix x^* as a Pareto-Nash equilibrium solution for the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p).

Remark 3.24. Algorithm 3.23 finds only one of the solutions of the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p). In order to find all solutions in the sense of Pareto-Nash it is necessary to apply Algorithm 3.23 for every α_{11}, α_{12}, ..., α_{1r_1}, α_{21}, α_{22}, ..., α_{2r_2}, ..., α_{p1}, α_{p2}, ..., α_{pr_p} which satisfy (3.27) and to form the union of all obtained solutions.
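When the strategy sets are finite, steps 2 and 3 of Algorithm 3.23 can be carried out by enumeration. A minimal Python sketch with illustrative data; note that for finite sets a pure Nash equilibrium of the scalarized game need not exist (Theorem 3.22 assumes convex compact strategy sets), so the search may return an empty list.

```python
from itertools import product

S = [(0, 1), (0, 1)]                       # strategy sets X_1, X_2
F = [  # vector payoffs F^i, two criteria per player, maximization
    {x: (x[0] * x[1], 2 - x[0]) for x in product(*S)},
    {x: (x[1] - x[0], x[0] + x[1]) for x in product(*S)},
]
alpha = [(0.5, 0.5), (0.3, 0.7)]           # weights satisfying condition (3.27)

def f(i, x):                               # step 2: scalarized payoff of player i
    return sum(a * v for a, v in zip(alpha[i], F[i][x]))

def is_nash(x):                            # step 3: no profitable unilateral deviation
    return all(f(i, x) >= f(i, x[:i] + (s,) + x[i + 1:])
               for i, X_i in enumerate(S) for s in X_i)

print([x for x in product(*S) if is_nash(x)])
```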
Note that the proof of Theorem 3.22 is based on a reduction of the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) to an auxiliary game Γ' = (X_1, X_2, ..., X_p, f_1, f_2, ..., f_p) to which the Nash theorem from [82] can be applied. In order to reduce the multi-objective game Γ to the auxiliary game Γ', a linear convolution criterion for the vector payoff functions has been used in the proof of Theorem 3.22. A similar reduction of the multi-objective game to a classical one can perhaps also be obtained by applying other convolution procedures to the vector payoff functions of the players, for example, the standard procedure for the multi-criteria problem from [23, 97].

For the zero-sum multi-objective game of two players the following theorem holds:

Theorem 3.25. Let Γ = (X_1, X_2, F) be a zero-sum multi-objective game of two players, where X_1, X_2 are convex compact sets and F(x_1, x_2) is a continuous vector function on X_1 × X_2. Moreover, let us assume that each component F^k(x_1, x_2), k ∈ {1, 2, ..., r}, of F(x_1, x_2) for fixed x_1 ∈ X_1 represents a convex function with respect to x_2 on X_2, and for every fixed x_2 ∈ X_2 it is a concave function with respect to x_1 on X_1. Then for the zero-sum multi-objective game Γ = (X_1, X_2, F) there exists a saddle point x^* = (x_1^*, x_2^*) ∈ X_1 × X_2, i.e. x_1^* is a Pareto solution for the multi-criteria problem

$$\max_{x_1 \in X_1} \to F(x_1, x_2^*) = \bigl(F^1(x_1, x_2^*), F^2(x_1, x_2^*), \dots, F^r(x_1, x_2^*)\bigr)$$

and x_2^* is a Pareto solution for the multi-criteria problem

$$\min_{x_2 \in X_2} \to F(x_1^*, x_2) = \bigl(F^1(x_1^*, x_2), F^2(x_1^*, x_2), \dots, F^r(x_1^*, x_2)\bigr).$$
Proof. The proof of Theorem 3.25 can be obtained as a corollary of Theorem 3.22 if we regard our zero-sum game as a game of two players of the form Γ = (X_1, X_2, F^1(x_1, x_2), F^2(x_1, x_2)), where F^2(x_1, x_2) = −F^1(x_1, x_2) = F(x_1, x_2).

The proof of Theorem 3.25 can also be obtained by reducing our zero-sum multi-objective game Γ = (X_1, X_2, F) to the classical single-objective case Γ' = (X_1, X_2, f) and applying von Neumann's theorem from [83], where

$$f(x_1, x_2) = \sum_{k=1}^{r} \alpha_k F^k(x_1, x_2)$$

and α_1, α_2, ..., α_r are arbitrary real numbers such that

$$\sum_{k=1}^{r} \alpha_k = 1; \quad \alpha_k > 0, \ k = \overline{1, r}.$$

It is easy to show that if x^* = (x_1^*, x_2^*) is a saddle point for the zero-sum game Γ' = (X_1, X_2, f), then x^* = (x_1^*, x_2^*) represents a saddle point for the zero-sum multi-objective game Γ = (X_1, X_2, F).
So, if the conditions of Theorem 3.25 are satisfied, then a solution of the zero-sum multi-objective game Γ = (X_1, X_2, F) can be found by using the following algorithm:

Algorithm 3.26. Determining the Saddle Point of Payoff Functions in a Zero-Sum Multi-Objective Game

1. Fix an arbitrary set of real numbers α_1, α_2, ..., α_r such that

$$\sum_{k=1}^{r} \alpha_k = 1; \quad \alpha_k > 0, \ k = \overline{1, r};$$

2. Form the zero-sum game Γ' = (X_1, X_2, f), where

$$f(x_1, x_2) = \sum_{k=1}^{r} \alpha_k F^k(x_1, x_2);$$

3. Find a saddle point x^* = (x_1^*, x_2^*) for the single-objective zero-sum game Γ' = (X_1, X_2, f). Then fix x^* = (x_1^*, x_2^*) as a saddle point for the zero-sum multi-objective game Γ = (X_1, X_2, F).
Remark 3.27. Algorithm 3.26 finds only one solution of the given zero-sum multi-objective game Γ = (X_1, X_2, F). In order to find all saddle points it is necessary to apply Algorithm 3.26 for every α_1, α_2, ..., α_r satisfying the conditions $\sum_{k=1}^{r} \alpha_k = 1$, α_k > 0, k = \overline{1, r}, and then to form the union of the obtained solutions.

Note that for reducing zero-sum multi-objective games to classical ones, other convolution criteria for the vector payoff functions can be used as well, e.g. the standard procedure from [23, 97].

3.6.3 Discrete and Matrix Multi-Objective Games

Discrete multi-objective games are determined by the discrete structure of the sets of strategies X_1, X_2, ..., X_p. If X_1, X_2, ..., X_p are finite sets, then we may consider X_i = J_i, J_i = {1, 2, ..., q_i}, i = \overline{1, p}. In this case the multi-objective game is determined by the vectors

$$F^i = (F_{i1}, F_{i2}, \dots, F_{i r_i}), \quad i = \overline{1, p},$$

where each component F_{ik}, k = \overline{1, r_i}, represents a p-dimensional matrix of size q_1 × q_2 × ... × q_p. If p = 2 then we have a bi-matrix multi-objective game, and if F^2 = −F^1 then we obtain a matrix multi-objective one. In an analogous way as for single-objective matrix games, we can interpret the strategies j_i ∈ J_i, i = \overline{1, p}, of the players here as pure strategies.
It is evident that for such matrix multi-objective games Pareto-Nash equilibria may not exist, because Nash equilibria may not exist for bi-matrix and matrix games in pure strategies. However, with each finite discrete multi-objective game we can associate a continuous multi-objective game Γ = (Y_1, Y_2, ..., Y_p, f^1, f^2, ..., f^p) by introducing mixed strategies y_i = (y_{i1}, y_{i2}, ..., y_{iq_i}) ∈ Y_i of player i and vector payoff functions f^1, f^2, ..., f^p, which we define in the following way:

$$Y_i = \Bigl\{ y_i = (y_{i1}, y_{i2}, \dots, y_{iq_i}) \in \mathbb{R}^{q_i} \ \Big|\ \sum_{j=1}^{q_i} y_{ij} = 1, \ y_{ij} \ge 0, \ j = \overline{1, q_i} \Bigr\};$$

$$f^i = (f_{i1}, f_{i2}, \dots, f_{i r_i}),$$

where

$$f_{ik}(y_{11}, \dots, y_{1q_1}, y_{21}, \dots, y_{2q_2}, \dots, y_{p1}, \dots, y_{pq_p}) = \sum_{j_1=1}^{q_1} \sum_{j_2=1}^{q_2} \cdots \sum_{j_p=1}^{q_p} F_{ik}(j_1, j_2, \dots, j_p)\, y_{1j_1} y_{2j_2} \cdots y_{pj_p}, \quad k = \overline{1, r_i}, \ i = \overline{1, p}.$$
It is easy to observe that for the auxiliary multi-objective game Γ = (Y_1, Y_2, ..., Y_p, f^1, f^2, ..., f^p) the conditions of Theorem 3.22 are satisfied and, therefore, a Pareto-Nash equilibrium y^* = (y_{11}^*, ..., y_{1q_1}^*, y_{21}^*, ..., y_{2q_2}^*, ..., y_{p1}^*, ..., y_{pq_p}^*) exists.

In the case of matrix games, the auxiliary zero-sum multi-objective game of two players is defined as follows: Γ = (Y_1, Y_2, f), where

$$Y_1 = \Bigl\{ y_1 = (y_{11}, y_{12}, \dots, y_{1q_1}) \in \mathbb{R}^{q_1} \ \Big|\ \sum_{j=1}^{q_1} y_{1j} = 1, \ y_{1j} \ge 0, \ j = \overline{1, q_1} \Bigr\};$$

$$Y_2 = \Bigl\{ y_2 = (y_{21}, y_{22}, \dots, y_{2q_2}) \in \mathbb{R}^{q_2} \ \Big|\ \sum_{j=1}^{q_2} y_{2j} = 1, \ y_{2j} \ge 0, \ j = \overline{1, q_2} \Bigr\};$$

$$f = (f^1, f^2, \dots, f^r), \qquad f^k(y_{11}, \dots, y_{1q_1}, y_{21}, \dots, y_{2q_2}) = \sum_{j_1=1}^{q_1} \sum_{j_2=1}^{q_2} F^k(j_1, j_2)\, y_{1j_1} y_{2j_2}, \quad k = \overline{1, r}.$$

The game Γ = (Y_1, Y_2, f) satisfies the conditions of Theorem 3.25 and, therefore, a saddle point y^* = (y_1^*, y_2^*) ∈ Y_1 × Y_2 exists.
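For matrix multi-objective games, step 3 of Algorithm 3.26 can be carried out with the standard linear programming formulation of a zero-sum matrix game. A minimal Python sketch, assuming numpy and scipy are available; the matrices and weights are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

F = [np.array([[1.0, -1.0], [-1.0, 1.0]]),   # criterion F^1
     np.array([[0.0, 2.0], [1.0, 0.0]])]     # criterion F^2
alpha = [0.5, 0.5]                           # positive weights summing to 1
A = sum(a * Fk for a, Fk in zip(alpha, F))   # scalarized payoff matrix
q1, q2 = A.shape

# variables (y_1, ..., y_q1, v); player 1 maximizes the game value v
c = np.zeros(q1 + 1)
c[-1] = -1.0                                 # linprog minimizes, so minimize -v
A_ub = np.hstack([-A.T, np.ones((q2, 1))])   # v - (A^T y)_j <= 0 for every column j
b_ub = np.zeros(q2)
A_eq = np.hstack([np.ones((1, q1)), [[0.0]]])  # y is a probability vector
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * q1 + [(None, None)])
print("mixed strategy of player 1:", res.x[:q1], "value:", res.x[-1])
```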
So, the results related to discrete and matrix games can be extended to the multi-objective case of the game and can be interpreted in an analogous way as for single-objective games. In order to solve these associated multi-objective games, Algorithms 3.23 and 3.26 can be applied.

3.6.4 Some Comments on and Interpretations of Multi-Objective Games

The considered multi-objective games extend the classical ones and represent a combination of cooperative and non-cooperative games. Indeed, player i in a multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) can be regarded as a union of r_i sub-players with payoff functions F_{i1}, F_{i2}, ..., F_{i r_i}, respectively. So, the game Γ represents a game with p coalitions 1, 2, ..., p which interact with each other on the set of situations X_1 × X_2 × ... × X_p. The introduced Pareto-Nash equilibrium notion uses the concept of cooperative games because, according to this notion, the sub-players of the same coalition i should optimize their vector function F^i in the sense of Pareto on the set of strategies X_i. On the other hand, the Pareto-Nash equilibrium notion also takes into account the concept of non-cooperative games, because the coalitions interact with each other on the set of situations X_1 × X_2 × ... × X_p and are interested in maintaining a Nash equilibrium between coalitions.

The obtained results allow us to describe a class of multi-objective games for which a Pareto-Nash equilibrium exists. Moreover, a suitable algorithm for finding Pareto-Nash equilibria is proposed.

3.6.5 Determining a Pareto-Stackelberg Solution for Multi-Objective Games

For the multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) a hierarchical optimization principle can be applied in an analogous way as for the classical static game from Section 1.12.2. This allows us to define a Pareto-Stackelberg solution x_1^*, x_2^*, ..., x_p^* for the multi-objective game Γ if we assume that the players fix their strategies successively, one after another, according to their numerical order. Each player optimizes his vector payoff function in the sense of Pareto and, fixing his optimal strategy x_i^*, informs the posterior players which strategy has been applied. So, a Pareto-Stackelberg solution for the considered multi-objective game Γ = (X_1, X_2, ..., X_p, F^1, F^2, ..., F^p) can be defined in the same way as the Stackelberg solution for the classical static game from Section 1.12.2 if at the corresponding level player i optimizes his vector payoff function in the sense of Pareto, taking into account that the previous players have already fixed their strategies and that the posterior players will act optimally.

It is easy to show that if a set of strategies x_1^*, x_2^*, ..., x_p^* is a Pareto-Stackelberg solution of the multi-objective game for an arbitrary order of fixing the strategies of the players, then x_1^*, x_2^*, ..., x_p^* is a Pareto-Nash solution.
4 Discrete Control and Optimal Dynamic Flow Problems on Networks
In this chapter we consider another important generalization of discrete control problems which leads to optimal dynamic flow problems on networks. We show that the time-expanded network method can be extended to such kinds of problems and that efficient algorithms for determining optimal flows in dynamic networks can be derived. For the considered class of problems the game-theoretical concept can be applied in an analogous way as for the problems from the previous chapters, and new classes of games on dynamic networks will be obtained.
4.1 Single-Commodity Dynamic Flow Problems and the Time-Expanded Network Method for Their Solving

We consider the dynamic nonlinear minimum cost flow problem and the dynamic maximum flow problem on networks [26, 27, 29, 30, 31, 32, 33, 41, 63]. These problems generalize the corresponding classical flow problems on static networks [33, 87, 88] and extend some dynamic models from the previous chapters. We consider the minimum cost flow problem on dynamic networks with nonlinear cost functions, defined on the edges, that depend on time and on flow. Moreover, we assume that the demand-supply function and the capacity function depend on time. The maximum flow problem is considered on dynamic networks with time-varying capacities of the edges. We propose algorithms for solving these problems, which are based on the reduction of the dynamic problems to classical static problems on a time-expanded network. We also propose algorithms for constructing reduced time-expanded networks for acyclic networks and estimate their size. We consider dynamic networks with different types of cost functions; in the case of incapacitated dynamic networks with cost functions which are concave with regard to the flow value and do not change over time, we reduce the problem to the minimum cost flow problem on a static network of equal size rather than on the time-expanded network.
A more general model of the minimum cost flow problem on dynamic networks, with transition time functions that depend on the amount of flow and the moment in time, is studied and an algorithm for solving such a problem is proposed.

4.1.1 The Minimum Cost Dynamic Flow Problem

A dynamic network N = (X, E, f, τ, d, ϕ) consists of a directed graph G = (X, E) with the set of vertices X, |X| = n, and the set of edges E, |E| = m, the capacity function f : E × T → R_+, the transition time function τ : E → R_+, the demand-supply function d : X × T → R and the cost function ϕ : E × R_+ × T → R_+, where T = {0, 1, 2, ..., T}. We consider the discrete time model, in which all times are integral and bounded by the horizon T. Time is measured in discrete steps, so that if one unit of flow leaves vertex z at time t on edge e = (z, x), one unit of flow arrives at vertex x at time t + τ_e, where τ_e is the transition time of edge e. The time horizon (finite or infinite) is the time until which the flow can pass in the network and defines the makespan T = {0, 1, ..., T} of time moments which we consider.

For a flow to exist we require that $\sum_{t \in T} \sum_{x \in X} d_x(t) = 0$. It is evident that this condition is necessary but not sufficient. If d_x(t) > 0 for a node x ∈ X at a given moment in time t ∈ T, then we treat this node x at time moment t as a source. If at a moment in time t ∈ T the condition d_x(t) < 0 holds, then we regard the node x at time moment t as a sink. In the case d_x(t) = 0 at a moment in time t ∈ T, we consider the vertex x at the given time moment t as an intermediate node. In such a way, the same node x ∈ X at different moments in time can serve as a source, a sink or an intermediate node. Without loss of generality, we consider that the set of vertices X is divided into three disjoint subsets X_+, X_-, X_*, such that:

X_+ consists of nodes x ∈ X for which d_x(t) ≥ 0 for all t ∈ T and there exists at least one moment in time t_0 ∈ T such that d_x(t_0) > 0;

X_- consists of nodes x ∈ X for which d_x(t) ≤ 0 for all t ∈ T and there exists at least one moment in time t_0 ∈ T such that d_x(t_0) < 0;

X_* consists of nodes x ∈ X for which d_x(t) = 0 for every t ∈ T.

So, X_+ is the set of sources, X_- is the set of sinks and X_* is the set of intermediate nodes of the network N. A feasible dynamic flow on N is a function α : E × T → R_+ that satisfies the following conditions:

$$\sum_{e \in E^-(x)} \alpha_e(t) - \sum_{\substack{e \in E^+(x)\\ t - \tau_e \ge 0}} \alpha_e(t - \tau_e) = d_x(t), \quad \forall t \in T, \ \forall x \in X; \qquad (4.1)$$

$$0 \le \alpha_e(t) \le f_e(t), \quad \forall t \in T, \ \forall e \in E; \qquad (4.2)$$

$$\alpha_e(t) = 0, \quad \forall e \in E, \ t = \overline{T - \tau_e + 1, T}. \qquad (4.3)$$
Here the function α defines the value α_e(t) of the flow entering edge e at time t. It is easy to observe that the flow does not enter edge e at time t if it has to leave the edge after time T; this is ensured by condition (4.3). The capacity constraints (4.2) mean that in a feasible dynamic flow at most f_e(t) units of flow can enter the edge e within each integral time step. Constraint (4.1) represents the flow conservation condition.

To model transition costs, which may change over time, we define the cost function ϕ_e(α_e(t), t) with the meaning that a flow of value α_e(t) entering edge e at time t will incur a transition cost of ϕ_e(α_e(t), t). We assume that ϕ_e(0, t) = 0 for all e ∈ E and t ∈ T. The integral cost F(α) of the dynamic flow α on N is defined as follows:

$$F(\alpha) = \sum_{e \in E} \sum_{t \in T} \varphi_e(\alpha_e(t), t). \qquad (4.4)$$
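Conditions (4.1)-(4.3) and the cost (4.4) translate directly into code. A minimal sketch that checks the feasibility of a given discrete flow and evaluates its integral cost; the dictionary-based data layout is an assumption of the sketch, not the book's notation.

```python
def is_feasible(alpha, edges, tau, cap, d, nodes, T):
    for e in edges:
        for t in range(T + 1):
            if not 0 <= alpha[e][t] <= cap[e][t]:      # condition (4.2)
                return False
            if t > T - tau[e] and alpha[e][t] != 0:    # condition (4.3)
                return False
    for x in nodes:                                    # condition (4.1)
        for t in range(T + 1):
            out_flow = sum(alpha[e][t] for e in edges if e[0] == x)
            in_flow = sum(alpha[e][t - tau[e]] for e in edges
                          if e[1] == x and t - tau[e] >= 0)
            if out_flow - in_flow != d[x][t]:
                return False
    return True

def integral_cost(alpha, edges, phi, T):               # objective (4.4)
    return sum(phi[e](alpha[e][t], t) for e in edges for t in range(T + 1))
```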
The dynamic minimum cost flow problem consists of finding a feasible flow that minimizes the objective function (4.4). It is easy to observe that if τ_e = 0, ∀e ∈ E, and T = 0, then the formulated problem becomes the classical minimum cost flow problem on a static network.

4.1.2 The Main Results

We propose an approach for solving the formulated problem which is based on its reduction to a static minimum cost flow problem. We show that the dynamic problem on the network N = (X, E, f, τ, d, ϕ) can be reduced to a static problem on an auxiliary network N^T = (X^T, E^T, f^T, d^T, ϕ^T), which is named the time-expanded network. The advantage of this approach is that it transforms the problem of determining an optimal flow over time into a classical static network flow problem. The time-expanded network contains a copy of the vertex set of the underlying network for each discrete moment of time, building a time layer. For each edge with transition time τ_e in the given network, there is a copy between each pair of time layers whose distance equals τ_e. We define the network N^T as follows:

1. X^T := {x(t) | x ∈ X, t ∈ T};
2. E^T := {(x(t), z(t + τ_e)) | e = (x, z) ∈ E, 0 ≤ t ≤ T − τ_e};
3. f^T_{e(t)} := f_e(t) for e(t) ∈ E^T;
4. ϕ^T_{e(t)}(α^T_{e(t)}) := ϕ_e(α_e(t), t) for e(t) ∈ E^T;
5. d^T_{x(t)} := d_x(t) for x(t) ∈ X^T.
An example which shows how to obtain the auxiliary time-expanded network N T for the given dynamic network N is presented below.
Consider the dynamic network N in Fig. 4.1 with the set of discrete time moments T = {0, 1, 2, 3} and the transition times on the edges τe1 = 1, τe2 = 1, τe3 = 2.
Fig. 4.1. (the dynamic network N with vertices x_1, x_2, x_3 and edges e_1, e_2, e_3)
Using the construction described above we obtain the time-expanded network presented in Fig. 4.2.

Fig. 4.2. (the time-expanded network N^T with time layers t = 0, 1, 2, 3, each containing the node copies x_1(t), x_2(t), x_3(t))
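The construction of N^T is mechanical. The following Python sketch builds the node and edge sets of items 1 and 2 for the network of Fig. 4.1; the edge endpoints e_1 = (x_1, x_2), e_2 = (x_1, x_3), e_3 = (x_2, x_3) are an assumption read off the figure.

```python
T = 3
tau = {("x1", "x2"): 1, ("x1", "x3"): 1, ("x2", "x3"): 2}  # transition times

# item 1: one copy of every node for every time moment
X_T = [(x, t) for x in ("x1", "x2", "x3") for t in range(T + 1)]
# item 2: one copy of every edge for every feasible departure time
E_T = [((x, t), (z, t + tau[(x, z)]))
       for (x, z) in tau for t in range(T + 1 - tau[(x, z)])]

for e in E_T:
    print(e)
# items 3-5 carry the data over unchanged:
#   f^T[e(t)] = f_e(t),  phi^T[e(t)](.) = phi_e(., t),  d^T[x(t)] = d_x(t)
```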
In the following we show that the construction described above allows us to establish a correspondence between feasible dynamic flows in the dynamic network N and feasible static flows in the time-expanded network N T . We define this correspondence as follows:
Let e(t) = (x(t), z(t + τ_e)) ∈ E^T and let α_e(t) be a flow in the dynamic network N. Then we put

$$\alpha^T_{e(t)} = \alpha_e(t), \quad \forall e(t) \in E^T. \qquad (4.5)$$
The following lemma holds:

Lemma 4.1. The correspondence (4.5) is a bijection from the set of feasible flows in the dynamic network N onto the set of feasible flows in the time-expanded network N^T.

Proof. It is obvious that the correspondence (4.5) is a bijection from the set of T-horizon functions in the dynamic network N onto the set of functions in the time-expanded network N^T. It is also easy to observe that a feasible flow in the dynamic network N satisfies the capacity constraints of the time-expanded network N^T and vice versa. Indeed,

$$0 \le \alpha^T_{e(t)} = \alpha_e(t) \le f_e(t) = f^T_{e(t)}, \quad \forall e \in E, \ 0 \le t < T.$$
Therefore, it is enough to show that each dynamic flow in the dynamic network N is put into correspondence with a static flow in the time-expanded network N^T and vice versa. Let α_e(t) be a dynamic flow on N and let α^T_{e(t)} be the corresponding function on N^T. Let us prove that α^T_{e(t)} satisfies the conservation constraints in the static network. Let x ∈ X be an arbitrary vertex in N and t, 0 ≤ t < T, an arbitrary moment in time:

$$d_x(t) \overset{(i)}{=} \sum_{e \in E^-(x)} \alpha_e(t) - \sum_{\substack{e \in E^+(x)\\ t - \tau_e \ge 0}} \alpha_e(t - \tau_e) \overset{(ii)}{=} \sum_{e(t) \in E^-(x(t))} \alpha^T_{e(t)} - \sum_{e(t - \tau_e) \in E^+(x(t))} \alpha^T_{e(t - \tau_e)} = d^T_{x(t)}. \qquad (4.6)$$

Note that according to the definition of the time-expanded network the set of edges {e(t − τ_e) | e(t − τ_e) ∈ E^+(x(t))} consists of all edges that enter x(t), while the set of edges {e(t) | e(t) ∈ E^-(x(t))} consists of all edges that originate from x(t). Therefore, all necessary conditions are satisfied for each vertex x(t) ∈ X^T. Hence, α^T_{e(t)} is a flow in the time-expanded network N^T.

Now let α^T_{e(t)} be a static flow in the time-expanded network N^T and let α_e(t) be the corresponding function in the dynamic network N. Let x(t) ∈ X^T be an arbitrary vertex in N^T. The conservation constraints for this vertex in the static network are expressed by equality (ii) from (4.6), which holds for all x(t) ∈ X^T at all times t, 0 ≤ t < T. Therefore, equality (i) holds for all x ∈ X at all times t, 0 ≤ t < T. In such a way α_e(t) is a flow in the dynamic network N. The lemma is proved.
Lemma 4.2. If α is a flow in the dynamic network N and α^T is the corresponding flow in the time-expanded network N^T, then F(α) = F^T(α^T), where

$$F^T(\alpha^T) = \sum_{t \in T} \sum_{e(t) \in E^T} \varphi^T_{e(t)}(\alpha^T_{e(t)})$$

is the total cost of the static flow in the time-expanded network N^T.

Proof. The proof is straightforward:

$$F(\alpha) = \sum_{e \in E} \sum_{t \in T} \varphi_e(\alpha_e(t), t) = \sum_{t \in T} \sum_{e(t) \in E^T} \varphi^T_{e(t)}(\alpha^T_{e(t)}) = F^T(\alpha^T).$$
The lemmas above imply the validity of the following theorem:

Theorem 4.3. For each minimum cost flow in the dynamic network there is a corresponding minimum cost flow in the static network and vice versa.

The results described above allow us to conclude that the dynamic minimum cost flow problem can be solved by reducing it to the static minimum cost flow problem on the auxiliary time-expanded network N^T, for which classical optimization methods and algorithms can be used [28, 39, 54, 75, 86].

4.1.3 The Dynamic Model with Flow Storage at Nodes

In the mathematical model from the previous section it is assumed that flow cannot be stored at nodes. This model can be extended to the case with flow storage at nodes if we associate with each node x ∈ X a transition time τ_x, which means that the passage of flow through this node takes τ_x units of time. If, additionally, we associate with each node x the capacity function f_x(t) and the cost function ϕ_x(α, t), we obtain a more general mathematical model. It is easy to observe that in this case the problem can be reduced to the previous one by a simple transformation of the network, where each node x is replaced by a couple of vertices x' and x'' connected by a directed edge e_x = (x', x''). Here, x' preserves all entering edges and x'' preserves all leaving edges of the previous network. With the edge e_x we associate the transition time τ_{e_x} = τ_x, the cost function ϕ_{e_x}(α, t) = ϕ_x(α, t) and the capacity function f_{e_x}(t) = f_x(t) (see Fig. 4.3).

Another mathematical model with unlimited flow storage at nodes can be obtained by introducing loops at those nodes at which flow storage is allowed. The flow which is stored at a node passes through these loops.
Fig. 4.3. (the node x replaced by the vertices x' and x'' joined by the edge e_x = (x', x''))
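The node-splitting transformation can be sketched in a few lines; the data layout and the naming of the split vertices are illustrative choices.

```python
def split_nodes(edges, tau_edge, node_tau):
    """edges: list of (u, v); node_tau: {node with a transit time: tau_x}."""
    new_edges, new_tau = [], {}
    for (u, v) in edges:
        uu = u + "''" if u in node_tau else u   # leaving edges start at x''
        vv = v + "'" if v in node_tau else v    # entering edges end at x'
        new_edges.append((uu, vv))
        new_tau[(uu, vv)] = tau_edge[(u, v)]
    for x, tx in node_tau.items():              # internal edge e_x = (x', x'')
        new_edges.append((x + "'", x + "''"))
        new_tau[(x + "'", x + "''")] = tx
    return new_edges, new_tau

edges = [("x1", "x2"), ("x2", "x3")]
print(split_nodes(edges, {("x1", "x2"): 1, ("x2", "x3"): 2}, {"x2": 1}))
```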
Moreover, by introducing transition times and costs for the loops, we can formulate the problem with flow storage at nodes and storage costs at the nodes. For example, if we assume that the flow can be stored for two units of time at node x_2 and for one unit of time at node x_3 of the network in Fig. 4.1, then we add the loop (x_2, x_2) with τ_{(x_2, x_2)} = 2 and the loop (x_3, x_3) with τ_{(x_3, x_3)} = 1 (see Fig. 4.4). Additionally, with these loops we can associate the costs ϕ_{(x_2, x_2)}(α_{(x_2, x_2)}(t), t) and ϕ_{(x_3, x_3)}(α_{(x_3, x_3)}(t), t), which reflect the cost of flow storage at these nodes. The time-expanded network for the dynamic network in Fig. 4.4 is presented in Fig. 4.5; it is constructed in the same way as in the previous section.
Fig. 4.4. (the network of Fig. 4.1 with the loops (x_2, x_2) and (x_3, x_3) added)
An important particular case of the considered problem arises when the whole amount of flow is dumped into the network from the sources x ∈ X_+ at the time moment t = 0 and arrives at the sinks x ∈ X_- at the time moment t = T. This means that the supply-demand function d : X × T → R satisfies the conditions:

a) d_x(0) > 0, d_x(t) = 0, t = 1, 2, ..., T, for x ∈ X_+;

b) d_x(T) < 0, d_x(t) = 0, t = 0, 1, 2, ..., T − 1, for x ∈ X_-.

In the following we show that in this case we can use another dynamic mathematical model.
Fig. 4.5. (the time-expanded network for the network of Fig. 4.4, with time layers t = 0, 1, 2, 3 and node copies x_1(t), x_2(t), x_3(t))
4.1.4 The Dynamic Model with Flow Storage at Nodes and Integral Constant Demand-Supply Functions

In the previous section we have formulated the dynamic model with flow storage at nodes by introducing loops in the dynamic network. In this section we consider the minimum cost flow problem on a dynamic network with flow storage at nodes and integral constant demand-supply functions.

Let a dynamic network N = (X, E, τ, f, d, ϕ) be given, determined by a directed graph G = (X, E) with a set of vertices X and a set of edges E, a transition time function τ : E → R_+, a capacity function f : E × T → R_+, a demand-supply function d : X → R, and a cost function ϕ : E × R_+ × T → R_+. Without loss of generality, we will assume that no edges enter sources or exit sinks. We note that in order for a flow to exist, the supply must be equal to the demand: $\sum_{x \in X} d_x = 0$.

The mathematical model of the minimum cost flow problem on this dynamic network is the following:

$$\sum_{e \in E^-(x)} \sum_{t=0}^{T} \alpha_e(t) - \sum_{e \in E^+(x)} \sum_{t=\tau_e}^{T} \alpha_e(t - \tau_e) = d_x, \quad \forall x \in X; \qquad (4.7)$$

$$\sum_{e \in E^-(x)} \sum_{t=0}^{\theta} \alpha_e(t) - \sum_{e \in E^+(x)} \sum_{t=\tau_e}^{\theta} \alpha_e(t - \tau_e) \le 0, \quad \forall x \in X_*, \ \forall \theta \in T; \qquad (4.8)$$

$$0 \le \alpha_e(t) \le f_e(t), \quad \forall t \in T, \ \forall e \in E; \qquad (4.9)$$

$$\alpha_e(t) = 0, \quad \forall e \in E, \ t = \overline{T - \tau_e + 1, T}. \qquad (4.10)$$
Condition (4.10) ensures that there is no flow in the network after the time horizon T. Condition (4.9) is a capacity constraint. As the flow passes through the network, we allow unlimited flow storage at the nodes but prohibit any deficit by constraint (4.8). Finally, all demands must be met, flow must not remain in the network after time T, and each source must not exceed its supply; this is ensured by constraint (4.7). The cost of the dynamic flow α is defined as follows:

$$F(\alpha) = \sum_{t \in T} \sum_{e \in E} \varphi_e(\alpha_e(t), t).$$
The dynamic minimum cost flow problem is to find a feasible flow that minimizes this objective function. According to this dynamic model, all flow is dumped into the network at the zero time moment and arrives in whole at the final moment in time T.

To solve the formulated problem we use the time-expanded network method. We construct an auxiliary static network N^T as follows:

1. X^T := {x(t) | x ∈ X, t ∈ T};
2. X^T_+ := {x(0) | x ∈ X_+} and X^T_- := {x(T) | x ∈ X_-};
3. E^T := {(x(t), z(t + τ_e)) | e = (x, z) ∈ E, 0 ≤ t ≤ T − τ_e} ∪ {(x(t), x(t+1)) | x ∈ X, 0 ≤ t < T};
4. d^T_{x(t)} := d_x for x(t) ∈ X^T_+ ∪ X^T_-; otherwise d^T_{x(t)} := 0;
5. f^T_{(x(t), z(t + τ_{(x,z)}))} := f_{(x,z)}(t) for (x(t), z(t + τ_{(x,z)})) ∈ E^T;
   f^T_{(x(t), x(t+1))} := ∞ for (x(t), x(t+1)) ∈ E^T;
6. ϕ^T_{(x(t), z(t + τ_{(x,z)}))}(α^T_{(x(t), z(t + τ_{(x,z)}))}) := ϕ_{(x,z)}(α_{(x,z)}(t), t) for (x(t), z(t + τ_{(x,z)})) ∈ E^T;
   ϕ^T_{(x(t), x(t+1))}(α^T_{(x(t), x(t+1))}) := 0 for (x(t), x(t+1)) ∈ E^T.

If we define the flow correspondence to be α^T_{e(t)} := α_e(t), where α^T_{(x(t), x(t+1))} in N^T corresponds to the flow in N stored at node x in the period of time from t to t+1, then the minimum cost flow problem on dynamic networks can be solved by solving the static minimum cost flow problem on the time-expanded network.

4.1.5 The Algorithm

Let the dynamic network N be given, on which the minimum cost dynamic flow problem has to be solved. We proceed as follows:

1. Build the time-expanded network N^T for the given dynamic network N.
2. Solve the classical minimum cost flow problem on the static network N^T.
3. Reconstruct the solution of the static problem in N^T to the dynamic problem on N.

A sketch of these three steps for the linear-cost case is given below.
Now let us examine the complexity of this algorithm, including the time necessary to solve the resulting problem on the static time-expanded network. Building the time-expanded network and reconstructing the solution of the static minimum cost flow problem to the dynamic one has a complexity of O(nT + mT), where n is the number of vertices in the dynamic network and m is the number of edges in this network. The complexity of step (2) depends on the complexity of the algorithm used for the minimum cost flow problem on static networks. If such an algorithm has a complexity of O(f(n', m')), where n' and m' are the numbers of vertices and edges of the network, then the complexity of solving the minimum cost flow problem on the time-expanded network employing the same algorithm is O(f(nT, mT)).

4.1.6 Constructing the Time-Expanded Network and its Size

In this section we will consider the minimum cost flow problem on a dynamic network N = (X, E, f, τ, d, ϕ) that satisfies the following conditions:

1) The graph G = (X, E) does not contain directed cycles;
2) No edges enter sources or exit sinks;
3) The transition time τ_e of each edge e is constant and equal to one unit of time, i.e. τ_e = 1, ∀e ∈ E;
4) The edge capacities are unlimited: f_e = +∞, ∀e ∈ E;
5) The time horizon is T = +∞;
6) All flow is dumped into the network at time 0;
7) Flow storage is not allowed at intermediate nodes, i.e. the following condition holds:

$$\sum_{e \in E^-(x)} \alpha_e(t) - \sum_{\substack{e \in E^+(x)\\ t - \tau_e \ge 0}} \alpha_e(t - \tau_e) = 0, \quad \forall t \in T, \ \forall x \in X_*. \qquad (4.11)$$
Because $f_e \equiv +\infty$ and $\tau_e \equiv 1$, we use the shorthand notation $N = (X, E, d, \varphi)$ for the minimum cost flow problem satisfying conditions 1)–7). Furthermore, we frequently refer to the number of edges of a path $L$ as its length, denoted $|L|$. Let us study the influence of the properties above on flows in the network $N$.

Lemma 4.4. Let $N = (X, E, d, \varphi)$ be a given dynamic network. Let $T^* = |L^*|$, where $L^*$ is a directed path of maximum length in $(X, E)$. Consider an arbitrary flow $\alpha_e(t)$ in $N$. Then $\alpha_e(t) = 0$ for all $e \in E$ and all $t \ge T^*$.

Proof. If $T^* = 0$ the dynamic network does not contain any edges. Suppose $t_0 \ge T^* > 0$ and $e_0 = (z_0, x_0) \in E$ with $\alpha_{e_0}(t_0) > 0$. Since $t_0 > 0$, $z_0$ must be an intermediate node, and therefore, by the conservation constraints, it follows
that there is flow entering node $z_0$ from one of the sources or from another intermediate node. If the flow comes into $z_0$ from an intermediate node $z_1$, we repeat the same reasoning for node $z_1$. This process constructs a directed path and therefore concludes in a finite number of steps, because the network contains only a finite number of nodes and no directed cycles. When the process ends, the path has the form $L_0 = (z_k, z_{k-1}, \ldots, z_1, z_0, x_0)$, with $e_j = (z_j, z_{j-1}) \in E$ and $\alpha_{e_j}(t_0 - j) > 0$. Since the process stops at node $z_k$, the conservation constraints (4.11) apply to all intermediate nodes, and there are no edges exiting sinks, it follows that $z_k \in X_+$. The iterations of the process guarantee that $\alpha_{e_k}(t_0 - k) > 0$. On the other hand, the condition that all flow is injected into the network at time 0 implies $t_0 - k = 0$, hence $t_0 = k$, because time 0 is the only time an edge originating from a source can carry positive flow. Therefore, the acyclic path $L_0$ consists of $t_0 + 1 > T^*$ edges. This contradicts the fact that the longest path has length $T^*$; therefore, the supposition is false. The lemma is proved.

This lemma allows us to replace the infinite time horizon by a finite one, substituting $T^* = |L^*|$ for positive infinity. Taking into account the assumptions on the dynamic network $N$, we slightly modify the definition of the time-expanded network $N^{T^*}$ for the acyclic unit-time network $N = (X, E, X_+, X_-, d, \varphi)$, where $T^* = |L^*|$ and $L^*$ is a path of maximum length in $N$. The new static network $N^{T^*} = (X^{T^*}, E^{T^*}, X_+^{T^*}, X_-^{T^*}, d^{T^*}, \varphi^{T^*})$ is defined as follows:

1. For each node $x \in X_*$ there are $T^* + 1$ corresponding nodes in $X^{T^*}$: $x(0), x(1), \ldots, x(T^* - 1), x(T^*)$;
2. For each edge $e = (z, x)$, $z \in X_*$, $x \in X_*$, from $E$ there are $T^*$ corresponding edges in $E^{T^*}$: $e(0), e(1), \ldots, e(T^* - 1)$, where $e(t) = (z(t), x(t+1))$;
3. The terminal sets are $X_+^{T^*} = \{x(0) \mid x \in X_+\}$ and $X_-^{T^*} = \{x(0) \mid x \in X_-\}$;
4. For each edge $e = (z, x)$, $z \in X_+$, there is exactly one corresponding edge $e(0) = (z(0), x(1))$ in $E^{T^*}$, and for each edge $e = (z, x)$, $x \in X_-$, there are $T^*$ corresponding edges $e(t) = (z(t), x(0))$, $t = \overline{0, T^* - 1}$, in $E^{T^*}$;
5. The demand-supply function remains virtually the same: $d^{T^*}_{x(t)} = d_x(t)$, $\forall x(t) \in X^{T^*}$;
6. The cost function $\varphi^{T^*}_{e(t)}(\alpha^{T^*}_{e(t)})$ is an expansion over time of the dynamic network function: $\varphi^{T^*}_{e(t)}(\alpha^{T^*}_{e(t)}) = \varphi_e(\alpha_e(t), t)$ for $e(t) \in E^{T^*}$.

The time-expanded network of a dynamic network $N$ can be built directly, according to the definition. In such a case, the network will have fewer than $n(1 + T^*)$ nodes and fewer than $mT^*$ edges. Since the maximum number of edges a directed path can have in an acyclic network is $n - 1$, it immediately follows that the time-expanded network has fewer than $n^2$ nodes and fewer than
$m(n-1)$ edges. In particular, a network with one source and one sink may have $(n-2)(T^*+1) + 2$ nodes and $(m-1)T^* + 1$ edges. On the other hand, it is easy to note that in many cases the large majority of intermediate nodes are not connected by a directed path both to a sink and to a source. Removing such nodes from the network $N$ does not influence the set of flows in that network. We call these nodes irrelevant to the minimum cost flow problem; intermediate nodes that are not irrelevant will be called relevant. The network in Fig. 4.6 is a dynamic network that conforms to the definition of the acyclic unit-time dynamic network, with $X_+ = \{x_1\}$ and $X_- = \{x_5\}$. Note that the maximum number of edges in a directed path is 3.
Fig. 4.6. (An acyclic unit-time dynamic network with source $x_1$, sink $x_5$ and intermediate nodes $x_2$, $x_3$, $x_4$.)
Therefore, built according to the definition, the time-expanded network would have $(n-2)(1+T^*) + 2 = 3 \times 4 + 2 = 14$ nodes and $(m-1)T^* + 1 = 5 \times 3 + 1 = 16$ edges. However, if we exclude the irrelevant nodes we obtain the much smaller static network depicted in Fig. 4.7, with only 4 intermediate nodes and 7 edges.
Fig. 4.7. (The reduced time-expanded network, with nodes $x_1(0)$, $x_2(1)$, $x_3(1)$, $x_4(1)$, $x_4(2)$, $x_5(0)$.)
We call the static network obtained from the time-expanded network by eliminating the irrelevant nodes and all edges adjacent to them the reduced time-expanded network. When the context allows no confusion, we often abbreviate this to reduced network. Consider the following algorithm for constructing the reduced network, based on the elimination of irrelevant nodes from the time-expanded network:

An Algorithm for Constructing the Reduced Network
Given the dynamic network $N$, we build the reduced network $\tilde N^{T^*}$:

1. Build the time-expanded network $N^{T^*}$, according to the definition.
2. For each source of the time-expanded network perform a breadth-first traversal of the nodes. The result of this step is the set $X^-(X_+^{T^*})$ of the nodes that can be reached from at least one source in $X_+^{T^*}$.
3. For each sink, perform a breadth-first traversal of the nodes, beginning with the sink and traversing the edges in the direction opposite to their orientation. The result of this step is the set $X^+(X_-^{T^*})$ of nodes from which at least one sink in $X_-^{T^*}$ can be reached.
4. The reduced network consists of a subset of nodes from $X^{T^*}$ and edges from $E^{T^*}$, determined in the following way:
$$\tilde X^{T^*} = X^{T^*} \cap X^-(X_+^{T^*}) \cap X^+(X_-^{T^*}), \qquad \tilde E^{T^*} = E^{T^*} \cap (\tilde X^{T^*} \times \tilde X^{T^*}).$$
5. $\tilde d^{T^*}_{x(t)} = d_x(t)$, $\forall x(t) \in \tilde X^{T^*}$.
6. $\tilde\varphi^{T^*}_{e(t)}(\alpha^{T^*}_{e(t)}) = \varphi_e(\alpha_e(t), t)$ for $e(t) \in \tilde E^{T^*}$.

The complexity of building the time-expanded network $N^{T^*}$ is $O(mT^* + nT^*)$; this construction is a very simple process directly dependent on the number of nodes and edges in the network. Let us examine the complexity of the other steps of the algorithm above. A breadth-first traversal in a graph with $m'$ edges and $n'$ nodes beginning from a single node has complexity $O(m' + n')$ ([76]), but this does not mean that the operation must be repeated for every node, increasing the complexity $n$-fold. In fact, it is easy to see that we can do a breadth-first traversal beginning with all the sources simultaneously, still with complexity $O(mT^* + nT^*)$. Breadth-first traversal from the sinks has the same complexity. Using the data structures and graph representation employed at steps (1)–(3), we can execute steps (5), (6) with the same complexity of $O(mT^* + nT^*) = O((m + n)T^*)$. Therefore, the complexity of constructing the reduced network is the same as the complexity of constructing the time-expanded network. A sketch of this pruning step is given below; afterwards we prove that the reduced network can indeed be used in place of the time-expanded network.
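The following is a minimal sketch of steps (2)–(4) (illustrative, not from the text), assuming the time-expanded network is held as a networkx DiGraph with demands and costs stored as node and edge attributes:

    import networkx as nx

    def reduce_time_expanded(G, sources, sinks):
        # forward pass: nodes reachable from at least one source copy
        forward = set(sources)
        for s in sources:
            forward |= nx.descendants(G, s)
        # backward pass: nodes from which at least one sink copy is reachable
        backward = set(sinks)
        for s in sinks:
            backward |= nx.ancestors(G, s)
        # keep only relevant nodes; the induced subgraph keeps the surviving
        # edges together with their attributes (demands, capacities, costs)
        return G.subgraph(forward & backward).copy()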
Lemma 4.5. Let $N = (X, E, d, \varphi)$ be a static acyclic network and $\alpha_e$ an arbitrary flow in this network. If flow enters a node $x' \in X_*$, then there is a directed path $L_0 = (x_0, x_1, \ldots, x')$ linking a source to the node, and a directed path $L_1 = (x', x_{i+1}, \ldots, x_f)$ linking the node to a sink.

Proof. We first note that an intermediate node $x'$ having flow entering it implies that there is an edge $e_0 \in E(x') = E^+(x') \cup E^-(x')$ such that $\alpha_{e_0} > 0$; the same is implied by flow exiting $x'$. From the conservation constraints, and since $x' \in X_*$ and $\alpha_e \ge 0$, $\forall e \in E$, we have:
$$e_0 \in E^+(x') \Rightarrow \exists\, e_1 \in E^-(x') : \alpha_{e_1} > 0, \qquad e_0 \in E^-(x') \Rightarrow \exists\, e_1 \in E^+(x') : \alpha_{e_1} > 0.$$
Without loss of generality we examine the case $e_0 \in E^+(x')$ and $e_1 \in E^-(x')$. Let $e_0 = (z_1, x')$; if $z_1 \in X_+$, the lemma is proved for $L_0$. Otherwise we proceed in the same fashion with $z_1$ and obtain an edge $(z_2, z_1)$. Continuing the process as long as we can find a satisfactory edge, we obtain a path, since the network is finite and acyclic: $L_0 = (z_k, z_{k-1}, \ldots, z_1, x')$. Note that the construction process only stops at a node $z_k$ that does not satisfy the conservation constraint of an intermediate node. Since the structure of the network shows that no edge originates from a sink, we obtain $z_k \in X_+$. Analogously, we build a path $L_1$ starting with edge $e_1$, obtaining $x_l \in X_-$: $L_1 = (x', x_1, \ldots, x_{l-1}, x_l)$. The lemma is proved.
Therefore, for any flow in an acyclic static network, any node "present" in the flow is connected to at least one sink and at least one source. Since the time-expanded network is acyclic, the lemma implies, as already noted, that the set of flows in the time-expanded network $N^{T^*}$ is equal to the set of flows in the reduced network $\tilde N^{T^*}$.

We note that the proposed algorithm begins with the dynamic network containing a small number of nodes, builds the time-expanded network with the largest number of nodes, and then selects from it the reduced network with a smaller number of nodes. While in some cases the reduced network can be close in size to the time-expanded network, building the time-expanded network always consumes additional time and memory. Let us examine an algorithm for constructing the reduced network $\tilde N^{T^*}$ directly from the dynamic network $N$.
An Algorithm for Constructing the Reduced Network Directly from the Dynamic Network

Given the dynamic network $N$, we build the reduced network $\tilde N^{T^*}$:

1. Build the dynamic network $\tilde N$ which contains all the nodes of $N$ except those that are not connected by a directed path with at least one sink and at least one source, employing the same method as used in the algorithm from the previous section for the static network.
2. Create the queue $C = (x_1(0), x_2(0), \ldots, x_l(0))$, where $\{x_1, x_2, \ldots, x_l\} = X_+$.
3. Initialize the sets:
$$\tilde X_+^{T^*} \leftarrow \emptyset, \qquad \tilde X_-^{T^*} \leftarrow \{x_i(0) \mid x_i \in X_-\}, \qquad \tilde X^{T^*} \leftarrow \tilde X_+^{T^*} \cup \tilde X_-^{T^*}.$$
4. While queue $C$ is not empty, execute for the node $x_1(t_1)$ at the head of the queue:
   a) If node $x_1(t_1)$ is already in $\tilde X^{T^*}$, then jump to step (4d).
   b) For each $(x_1, x_i) \in E^-(x_1)$ in the dynamic network execute:
      i. If $x_i \in X_*$ and node $x_i(t_1+1)$ is not already in $\tilde X^{T^*}$, then add node $x_i(t_1+1)$ to queue $C$, and add edge $(x_1(t_1), x_i(t_1+1))$ to $\tilde E^{T^*}$.
      ii. If $x_i \in X_-$ and $(x_1(t_1), x_i(0))$ is not already in $\tilde E^{T^*}$, then add edge $(x_1(t_1), x_i(0))$ to $\tilde E^{T^*}$.
   c) Add node $x_1(t_1)$ to $\tilde X^{T^*}$.
   d) Remove node $x_1(t_1)$ from queue $C$; all nodes move one step closer to the head of the queue: $x_i(t_i) \leftarrow x_{i+1}(t_{i+1})$.
5. $\tilde d^{T^*}_{x(t)} = d_x(t)$, $\forall x(t) \in \tilde X^{T^*}$.
6. $\tilde\varphi^{T^*}_{e(t)}(\alpha^{T^*}_{e(t)}) = \varphi_e(\alpha_e(t), t)$ for $e(t) \in \tilde E^{T^*}$.

A sketch of this construction is given below; we then prove that the network $\tilde N^{T^*}$ produced by the algorithm above is the reduced network of $N^{T^*}$.
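The queue-driven construction translates almost line for line into code; the following minimal sketch works under the unit-time, acyclic assumptions (helper names are illustrative, not from the text):

    from collections import deque

    def build_reduced_directly(succ, sources, sinks, intermediates):
        # succ[x]: direct successors of x in the pruned dynamic network
        nodes = {(z, 0) for z in sinks}         # sink copies z(0), step 3
        edges = set()
        queue = deque((x, 0) for x in sources)  # step 2
        while queue:                            # step 4
            x, t = queue.popleft()
            if (x, t) in nodes:                 # step 4a
                continue
            for z in succ.get(x, []):           # step 4b
                if z in intermediates and (z, t + 1) not in nodes:
                    queue.append((z, t + 1))
                    edges.add(((x, t), (z, t + 1)))
                elif z in sinks:
                    edges.add(((x, t), (z, 0)))  # all sink images collapse to z(0)
            nodes.add((x, t))                   # step 4c
        return nodes, edges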
Lemma 4.6. The network $\tilde N^{T^*}$ built by the algorithm described above contains only intermediate nodes of the time-expanded network $N^{T^*}$ that are relevant. Furthermore, it contains all intermediate nodes of $N^{T^*}$ with this property.

Proof. It is easy to note that if at a certain step node $x_0(t_0)$ is present in $\tilde X^{T^*}$ and there is a path from $x_0$ to $x_k$ in the dynamic network $N$: $L_0 = (x_0, x_1, \ldots, x_k)$, then the algorithm will place in the network all nodes and edges of the path:
$$L_0^{T^*} = (x_0(t_0), x_1(t_0+1), \ldots, x_k(t_0+k)).$$
Let us now prove that if node $x'(t)$ is relevant, then it is present in the built network, together with the paths connecting it to a source and to a sink. Indeed, if a node is so connected by two paths in $N^{T^*}$:
$$L_0^{T^*} = (x_0(0), x_1(1), \ldots, x_{t-1}(t-1), x'(t)), \qquad x_0(0) \in X_+^{T^*},$$
$$L_1^{T^*} = (x'(t), x_{t+1}(t+1), \ldots, x_{l-1}(l-1), x_l(0)), \qquad x_l(0) \in X_-^{T^*},$$
then node $x'$ is present in $\tilde N$ together with the nodes and edges of the paths:
$$L_0 = (x_0, x_1, \ldots, x_{t-1}, x'), \quad x_0 \in X_+, \qquad L_1 = (x', x_{t+1}, \ldots, x_{l-1}, x_l), \quad x_l \in X_-.$$
Since there is a path $L_0$ from a source to node $x'$, the algorithm will place path $L_0^{T^*}$ in the built network, and node $x'(t)$ will be present in the queue $C$. Therefore, the path $L_1^{T^*}$ will also be placed in the built network $\tilde N^{T^*}$.

Let us now prove the converse: if node $x'(t)$ is present in $\tilde N^{T^*}$, then it is relevant. Indeed, since $x'(t) \in \tilde N^{T^*} \Rightarrow x' \in \tilde N$, we obtain that $\tilde N$ contains the paths:
$$L_0 = (x_0, x_1, \ldots, x_{t-1}, x'), \quad x_0 \in X_+, \qquad L_1 = (x', x_{t+1}, \ldots, x_l), \quad x_l \in X_-.$$
It then follows that the paths
$$L_0^{T^*} = (x_0(0), x_1(1), \ldots, x_{t-1}(t-1), x'(t)), \qquad x_0(0) \in X_+^{T^*},$$
$$L_1^{T^*} = (x'(t), x_{t+1}(t+1), \ldots, x_{l-1}(l-1), x_l(0)), \qquad x_l(0) \in X_-^{T^*},$$
are present in $N^{T^*}$ and hence in $\tilde N^{T^*}$. The lemma is proved.
Therefore, the network built by the proposed algorithm consists of the same intermediate nodes as the reduced network, with the same sources and sinks. Furthermore, we note that if the network $\tilde N^{T^*}$ built by the algorithm contains a node $x_0(t_0)$, then it contains all the edges originating from that node, $E^-(x_0(t_0))$, and hence its set of edges coincides with that of the reduced network. Finally, the functions $d$, $\varphi$ are obviously the same as those of the reduced network. Therefore, we can conclude that the built network is the reduced network.

As we have noted, reduced networks have the advantage over time-expanded networks of containing fewer nodes. This is explained by the fact that node $x(t)$, $0 \le t \le T^*$, is present in the reduced network only if it is relevant, while the same node is always present in the time-expanded network.
Lemma 4.7. Node $x'(t)$ is present in the reduced network $\tilde N^{T^*}$ if and only if in the dynamic network $N$ there is a directed path from a source to $x'$ of $t$ edges: $(x_0, x_1, \ldots, x_{t-1}, x')$.
Proof. From the algorithm it is obvious that node $x'(t)$ being included in $\tilde X^{T^*}$ implies that there is a node $x_{t-1}(t-1)$ in $\tilde X^{T^*}$ and a connecting edge in $\tilde E^{T^*}$. By repeating the reasoning for $x_{t-1}(t-1)$ we obtain a directed path present in $\tilde N^{T^*}$:
$$L^{T^*} = (x_0(0), x_1(1), \ldots, x_{t-1}(t-1), x'(t)), \qquad x_0(0) \in X_+^{T^*},$$
which immediately implies that a path consisting of $t$ edges,
$$L = (x_0, x_1, \ldots, x_{t-1}, x'), \qquad x_0 \in X_+,$$
is present in $N$. Sufficiency is obvious, as noted in Lemma 4.6. The lemma is proved.

Lemma 4.8. A path of maximum length in the reduced dynamic network $\tilde N$ connects a source to a sink.

Proof. Suppose the statement of the lemma is not true, and let $L^* = (x_i, x_{i+1}, \ldots, x_{j-1}, x_j)$ be one of the longest paths. By the property of the reduced dynamic network, there exist two directed paths (at least one of them non-empty), one connecting a source to the first node $x_i$ of $L^*$ and the other connecting the last node $x_j$ of $L^*$ to a sink:
$$L_1 = (x_0, x_1, \ldots, x_{i-1}, x_i), \qquad L_2 = (x_j, x_{j+1}, \ldots, x_{k-1}, x_k).$$
The directed paths $L_1$, $L_2$ and $L^*$ share no nodes other than $x_i$ and $x_j$, and hence no edges; otherwise there would be directed cycles in $\tilde N$. Therefore, we can form a new path $L_1 L^* L_2$, longer than $L^*$ by at least 1. This is a contradiction; therefore, the supposition was false. The lemma is proved.

Lemma 4.9. If there is a node $x_T$ in the reduced acyclic dynamic network $\tilde N$ such that a path of maximum length from a source to it has length $T$, then for each $t = \overline{1, T}$ there is a node $x_t$ in $\tilde N$ such that the longest path from a source to it has length $t$.

Proof. Let $L_T$ be a path of maximum length in $\tilde N$ from a source to the given node: $L_T = (x_0, x_1, \ldots, x_{T-1}, x_T)$. Then there is at least one path from the source node to each node of the path, node $x_t$ being connected by a path of length $t$: $L_t = (x_0, x_1, \ldots, x_{t-1}, x_t)$.
Furthermore, for each $x_t$ there is no path longer than $t$ from a source to $x_t$. Suppose such a path $L'_t = (x'_0, x'_1, \ldots, x'_{t+k-1}, x_t)$ of length $t + k$ exists. Then we can form a new path $L_1^* = (x'_0, x'_1, \ldots, x'_{t+k-1}, x_t, x_{t+1}, \ldots, x_{T-1}, x_T)$ of length $k + T$, without cycles, since the graph under study is acyclic. This implies that there is a path from a source to $x_T$ longer than $T$. We have obtained a contradiction; therefore, the supposition is false, and the nodes $x_t$ are the sought nodes with longest distance $t$ from a source. The lemma is proved.

We further examine only the number of intermediate nodes in the time-expanded and reduced networks, since the number of sources and sinks remains the same after expansion in time and reduction. Based on the lemma above, it is easy to note that the number of intermediate nodes in the reduced network is at most $n_*(T^* - 1)$, where $n_* = |X_*|$ is the number of intermediate nodes in the dynamic network. Indeed, a path ending in a non-sink node has length at most $T^* - 1$, and there are no intermediate nodes $x(0)$ in the reduced network. On the other hand, we note that for an intermediate node $x \in X_*$ to correspond to $T'$ nodes in the reduced network, a directed path of length $T' + 1 < T^*$ connecting a source to node $x$ must exist. Therefore, if the maximum length of a path from a source to $x$ is $T' + 1$, then node $x$ cannot have more than $T'$ images $x(t)$ in the static network. Since there exists a path of length $T^*$ from a source to a sink, there is at least one intermediate node $x_t$ with maximum distance $T^* - t$ from the sources, for $t = \overline{1, T^* - 1}$. Let $n_i$, $i = \overline{1, n_*}$, be the number of images of intermediate node $x_i$ in the reduced network. Then there is a permutation $\tau$ of the numbers 1 to $n_*$ such that, componentwise,
$$(n_{\tau_i}) \le (1, 2, \ldots, T^* - 1, T^* - 1, \ldots, T^* - 1).$$
Therefore, the number of intermediate nodes is bounded above by $|\tilde X_*^{T^*}| \le Z(n_*, T^*)$, where
$$Z(n_*, T^*) = n_*(T^* - 1) - \frac{1}{2}(T^* - 2)(T^* - 1). \qquad (4.12)$$
The decrease in the number of nodes is substantial compared to the total number of nodes, especially for $T^*$ close to $n$. Let us examine the upper bound on the number of intermediate nodes as a function of $n_*$ only. Since the maximum length of a directed path in an acyclic network is $n_* + 1$, it follows that the time-expanded network has fewer than $n_*(n_* + 1)$ intermediate nodes.
On the other hand, if we treat the upper bound $Z$ in (4.12) as a function of a continuous variable $T^*$ and differentiate with respect to $T^*$, we find that $Z$ reaches a global maximum at $T^* = n_* + \frac{3}{2}$, and that the bound is monotone both to the left and to the right of this point. Therefore, the discrete-variable upper bound that we are interested in reaches its maximum for either $T^* = \lfloor n_* + \frac{3}{2} \rfloor = n_* + 1$ or $T^* = \lceil n_* + \frac{3}{2} \rceil = n_* + 2$. Substituting both values of the argument for $T^*$, we obtain that the corresponding values of $Z$ are equal:
$$Z(n_*, n_* + 1) = n_*\Big(n_* - \frac{n_* - 1}{2}\Big) = \frac{n_*(n_* + 1)}{2} = (n_* + 2 - 1)\Big(n_* - \frac{n_*}{2}\Big) = Z(n_*, n_* + 2),$$
and therefore:
$$|\tilde X_*^{T^*}| \le \frac{n_*(n_* + 1)}{2}. \qquad (4.13)$$
Now let us show that for each $n_* > 0$ there is a dynamic network for which bound (4.13) is attained, so that it cannot be lowered. Let us examine the network in Fig. 4.8 with 4 intermediate nodes $x_i$, $i = \overline{2, 5}$, one source $x_1$ and one sink $x_6$. It is easy to see that the corresponding
Fig. 4.8. (A network with source $x_1$, sink $x_6$ and intermediate nodes $x_2, \ldots, x_5$.)
reduced network will contain one image of node $x_2$, two images of $x_3$, three images of $x_4$, and four images of $x_5$. Therefore, the number of intermediate nodes in the reduced network is $1 + 2 + 3 + 4 = 10$. By induction, for such a network with $n_*$ intermediate nodes, the reduced network will contain $\frac{n_*(n_*+1)}{2}$ intermediate nodes.

So far we have considered networks with infinite capacities on the edges, $f_e = +\infty$, $\forall e \in E$. However, all the results carry over to networks with finite capacities. We can even consider the case when edge capacities vary over time and are defined by $f: E \times \mathbb{T} \to R_+$. To solve such a problem, we take the capacities in the static network as $f^{T^*}_{e(t)} = f_e(t)$. It is worth noting, however, that the set of feasible flows in a capacitated network is a subset of the set of all flows in that network; therefore, in some cases where the problem with infinite capacities admits a solution, the problem with finite capacities does not.
We can easily generalize unit-time networks to networks with arbitrary transition times $\tau_e \ge 0$ on the edges. For this purpose, we redefine the time length of a path $L$ in $N$ to be the sum of the transition times of all its edges:
$$|L| = \sum_{e \in L} \tau_e.$$
Hence, if we take $T^* = |L^*|$ to be the maximum time length of a directed path in $N$, we can still conclude that there is no flow in the network after time $T^*$, and therefore examine the network only up to time $T^*$. When defining the time-expanded network, we need to take into account that edges have varying time length. Therefore, if an edge $e = (z, x) \in E$ has length $\tau_e$, it will have $T^* - \tau_e + 1$ images in the time-expanded network $N^{T^*}$:
$$e(t) = (z(t), x(t + \tau_e)), \qquad t = \overline{0, T^* - \tau_e}.$$
With these adjustments we can easily prove that the time-expanded network can be used to find a solution of the minimum cost flow problem in the dynamic network. It is also easy to prove that the reduced network can be used instead of the time-expanded network. Furthermore, we can employ the algorithms for constructing the time-expanded network, as well as the reduced network. Note that when $\tau_e = 0$, $\forall e \in E$, and $\varphi_e(\alpha_e(t), t)$ is constant over time, there are simpler approaches to the problem. It is easy to see that the number of nodes in the time-expanded network will be at most $n(T^* + 1)$, and the number of edges at most $mT^*$. However, the estimate of the number of nodes in the reduced network is no longer accurate for this generalization. The time-expanded network of a network with arbitrary transition times is still an acyclic static network, and the transition times do not show up in it explicitly, but rather in the structure of the network. Therefore, we can apply the usual static-network algorithms to solve the minimum cost flow problem in it.

4.1.7 Approaches for Solving the Minimum Cost Flow Problem with Different Types of Cost Functions on the Edges

In the following we analyze several cases of the minimum cost flow problem on a dynamic network.

Linear Cost Functions on the Edges

If the cost function $\varphi_e(\alpha_e(t), t)$ is linear with respect to $\alpha_e(t)$, then the cost function of the time-expanded network is linear. In this case we can apply well-established methods for minimum cost flow problems, including linear programming algorithms [32, 33, 41], combinatorial algorithms [33], as well as other developments, like [26, 27].
Convex Cost Functions on the Edges

If the cost function $\varphi_e(\alpha_e(t), t)$ is convex with respect to $\alpha_e(t)$, then the cost function of the time-expanded network is convex. An algorithm for solving the dynamic version of the minimum cost flow problem with cost functions convex in the flow can be obtained on the basis of the results from Section 4.1.2: we construct the auxiliary time-expanded network, solve the minimum cost flow problem in the static network with convex cost functions on the edges, and then reconstruct the obtained solution. For the static problem we can apply methods from convex programming and their specializations to the minimum cost flow problem.

Concave Cost Functions on the Edges

If there is exactly one source and the cost function $\varphi_e(\alpha_e(t), t)$ is concave with respect to $\alpha_e(t)$, then the cost function in the time-expanded network is concave. If the dynamic network is acyclic, then the time-expanded network is acyclic and finite. Therefore, we can solve the static problem using classical algorithms for minimum cost flow problems in acyclic networks with concave cost functions [76, 88].

In the following we present an approach for uncapacitated dynamic networks with cost functions that are concave with respect to the flow value and do not change over time. Relying on concavity, we reduce the problem to the minimum cost flow problem in a static network of equal size, rather than in the time-expanded network. Most previous work involving flow costs in dynamic networks considers cost functions that are linear or convex with respect to the flow value. This implies that the cost of transporting one unit of flow is the same regardless of how many units are transported at a time, or that the cost per unit rises with the total number of units. However, in many practical cases transports of larger quantities enable discounts on the price per unit. This behavior is best modelled by cost functions that are concave with respect to the flow value.

Flow Properties in Dynamic Networks

Let a dynamic network $N = (X, E, f, \tau, \varphi, d)$ with constant demands-supplies of the nodes and the possibility of flow storage at the nodes be given. As above, we assume that no edges enter sources or exit sinks. The corresponding static network $N^0$ of $N$ is obtained by discarding all time-related information: $N^0 = (X, E, f, \varphi^0, d)$, where $\varphi^0_e(\cdot) = \varphi_e(\cdot, 0)$.

Lemma 4.10. Let $N$ be an uncapacitated dynamic network with cost functions concave with respect to the flow and constant in time. If $\alpha$ is a flow in $N$, then $y_e = \sum_{t \in \mathbb{T}} \alpha_e(t)$ is a flow in the corresponding static network $N^0$ and $F^0(y) \le F(\alpha)$.
Proof. Note that if $\phi: R_+ \to R_+$ is a concave function, then $\phi(\alpha + \beta) \le \phi(\alpha) + \phi(\beta)$ for all $\alpha, \beta \in R_+$. Since $F$ and $F^0$ are concave with respect to the flow value, we obtain:
$$F^0(y) = \sum_{e \in E} \varphi^0_e(y_e) = \sum_{e \in E} \varphi^0_e\Big(\sum_{t \in \mathbb{T}} \alpha_e(t)\Big) \le \sum_{e \in E} \sum_{t \in \mathbb{T}} \varphi^0_e(\alpha_e(t)) = \sum_{e \in E} \sum_{t \in \mathbb{T}} \varphi_e(\alpha_e(t), t) = F(\alpha).$$
Moreover, $y_e = \sum_{t=0}^{T} \alpha_e(t) = \sum_{t=\tau_e}^{T} \alpha_e(t - \tau_e)$, since the flow $\alpha$ obeys constraint (4.10). Hence, by substituting $y_e$ into the dynamic conservation constraint (4.7), we obtain the corresponding static conservation constraint. Therefore, $y$ is a flow in $N^0$. The lemma is proved.

Definition 4.11. The graph $G_\alpha = (X_\alpha, E_\alpha)$ consisting of the edge set $E_\alpha = \{e \mid \alpha_e > 0\}$ and the node set $X_\alpha = \{x \mid \exists z \ \text{such that} \ (x,z) \in E_\alpha \ \text{or} \ (z,x) \in E_\alpha\}$ is called the base graph of the flow $\alpha$ in the network $N$.

Definition 4.12. The graph $G_\alpha = (X_\alpha, E_\alpha)$ consisting of the edge set $E_\alpha = \{e \mid \sum_{t \in \mathbb{T}} \alpha_e(t) > 0\}$ and the node set $X_\alpha = \{x \mid \exists z \ \text{such that} \ (x,z) \in E_\alpha \ \text{or} \ (z,x) \in E_\alpha\}$ is called the base graph of the dynamic flow $\alpha$ in $N$.

Lemma 4.13. Let $N$ be an infinite-horizon dynamic network with cost functions constant in time. If $y$ is a static flow in $N^0$ such that its base graph $G_y$ is a forest, then there exists a dynamic flow $\alpha$ in $N$ such that $F(\alpha) = F^0(y)$.

Proof. Let $\alpha_{(x,z)}(t) = y_{(x,z)}$ if $t = t_x$, and $\alpha_{(x,z)}(t) = 0$ otherwise, where:
$$t_x = \begin{cases} 0, & \text{if } x \in X_+, \\ \max\{t_z + \tau_{(z,x)} \mid (z,x) \in E_y\}, & \text{otherwise}. \end{cases} \qquad (4.14)$$
Since $G_y = (X_y, E_y)$ is a forest, the constants $t_x$ are well-defined and finite. To prove that $\alpha$ is a flow in $N$, we have to show that it satisfies the constraints (4.7), (4.8) and (4.10). Because $T = +\infty$, it follows that $T \ge t_x$, $\forall x \in X_y$. Therefore, for any $e = (x, z) \in E_y$, we obtain $T \ge t_z \ge t_x + \tau_e$, hence $T - \tau_e \ge t_x$. Since $\alpha_e(t) = 0$ for $t \ne t_x$ and $t_x \le T - \tau_e$, it follows that $\alpha_e(t) = 0$ for all $t > T - \tau_e$, hence constraint (4.10) is obeyed.

Condition (4.14) means that the flow starts leaving any $x \in X_* \cap X_y$ only after all inbound flow has arrived. Thus, for $\theta < t_x$ we have $\sum_{e \in E^-(x)} \sum_{t=0}^{\theta} \alpha_e(t) = 0$, hence constraint (4.8) holds for $\theta < t_x$. For $\theta \ge t_x$ the flow summed over time on any edge is the same as the flow on that edge in the static network:
$$\sum_{e \in E^+(x)} \sum_{t=\tau_e}^{\theta} \alpha_e(t - \tau_e) = \sum_{e \in E^+(x)} y_e = \sum_{e \in E^-(x)} y_e = \sum_{e \in E^-(x)} \sum_{t=0}^{\theta} \alpha_e(t).$$
Therefore, constraint (4.8) holds for $\theta \ge t_x$. We have established that constraint (4.8) is obeyed. By taking $\theta = T \ge t_x$ in the previous argument, we obtain that constraint (4.7) holds for all $x \in X_* \cap X_y$. For all sources $x \in X_+$ the incoming flow is zero, $\sum_{e \in E^+(x)} \sum_{t \in \mathbb{T}} \alpha_e(t) = 0$, since no edges enter a source. On the other hand, the outgoing flow equals the supply:
$$\sum_{e \in E^-(x)} \sum_{t \in \mathbb{T}} \alpha_e(t) = \sum_{e \in E^-(x)} y_e = d_x.$$
Therefore, constraint (4.7) holds for all sources. The proof for sinks is similar, taking into account that no edges exit sinks. Therefore, constraint (4.7) is obeyed. Having proved that $\alpha$ is a flow, it is easy to see that it is feasible, since $0 \le \alpha_e(t) \le y_e \le f_e$. Finally,
$$F(\alpha) = \sum_{e \in E} \sum_{t \in \mathbb{T}} \varphi_e(\alpha_e(t), t) = \sum_{e \in E_y} \varphi_e(\alpha_e(t_x), t_x) = \sum_{e \in E} \varphi^0_e(y_e) = F^0(y).$$
The lemma is proved.
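The construction used in the proof is directly computable. Here is a minimal sketch (illustrative names, not the authors' code) that derives the starting times $t_x$ of (4.14) from a forest-like static flow and expands it into a dynamic flow:

    def expand_static_flow(y, tau, sources):
        # y: dict edge (x, z) -> static flow; tau: dict edge -> transit time
        base = [e for e, v in y.items() if v > 0]   # edge set of G_y
        t = {}

        def t_x(x):                                 # formula (4.14)
            if x in sources:
                return 0
            if x not in t:
                # latest arrival over inbound base-graph edges (z, x);
                # well-defined because G_y is a forest (acyclic)
                t[x] = max(t_x(z) + tau[(z, w)]
                           for (z, w) in base if w == x)
            return t[x]

        # alpha_e(t) = y_e exactly at t = t_x for e = (x, z), 0 otherwise
        return {((x, z), t_x(x)): y[(x, z)] for (x, z) in base}

The recursion terminates because $G_y$ is a forest, so the constants $t_x$ are well defined, exactly as in the proof.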
In the proof above we employ the fact that $T = +\infty$ only to ensure that $t_x \le T$, $\forall x \in X_y$. However, if we denote by $|L| = \sum_{e \in L} \tau_e$ the time length of a path in $N$, then we immediately obtain that $t_x \le \max_{L \in N}\{|L|\}$. Hence $\max_{L \in N}\{|L|\}$ is an upper bound for the makespan of the flow $\alpha$ constructed in the lemma above, and we can broaden the class of networks we examine.

Lemma 4.14. Let $N$ be a dynamic network with cost functions constant in time such that $T \ge \max_{L \in N}\{|L|\}$. If $y$ is a static flow in $N^0$ such that its base graph $G_y$ is a forest, then there exists a dynamic flow $\alpha$ in $N$ such that $F(\alpha) = F^0(y)$.

To make the connection between Lemma 4.10, Lemma 4.14 and minimum cost flows in dynamic networks, we employ the following property of minimum cost flows in static networks with concave cost functions [55].

Lemma 4.15. Let $N^0$ be an uncapacitated static network with concave non-decreasing cost functions. If there exists a flow in $N^0$, then there exists a minimum cost flow $y$ in $N^0$ such that its base graph $G_y$ is a forest.

We are now able to prove the main result of this subsection. Denote by $y^T$ the dynamic flow in $N$ obtained from a forest-like flow $y$ in $N^0$ such that $y^T_{(x,z)}(t) = y_{(x,z)}$ if $t = t_x$, and $y^T_{(x,z)}(t) = 0$ otherwise, where the $t_x$ are defined as in (4.14).

Theorem 4.16. Let $N$ be an uncapacitated dynamic network with cost functions concave with respect to the flow and constant in time such that $T \ge \max_{L \in N}\{|L|\}$. If there exists a flow in $N$, then there exists a minimum cost forest-like flow $\xi$ in $N^0$, and the flow $\xi^T$ is a minimum cost flow in $N$.
Proof. Since there exists a flow in $N$, a flow can be constructed in $N^0$ according to Lemma 4.10. Hence, according to Lemma 4.15, there exists a minimum cost forest-like flow in $N^0$; denote this flow by $\xi$. The flow $\xi^T$ is a minimum cost flow in $N$. Indeed, for any flow $\alpha$ in $N$ we have:
$$F(\xi^T) = F^0(\xi) \le F^0(y) \le F(\alpha),$$
where $y$ is the static flow in $N^0$ with $y_e = \sum_{t \in \mathbb{T}} \alpha_e(t)$. The equality $F(\xi^T) = F^0(\xi)$ follows from Lemma 4.14, the inequality $F^0(\xi) \le F^0(y)$ from the fact that $\xi$ is a minimum cost flow in $N^0$, and the inequality $F^0(y) \le F(\alpha)$ from Lemma 4.10. The theorem is proved.

Therefore, a dynamic minimum cost flow can be computed using the following procedure:
(i) Find a forest-like minimum cost flow $\xi$ in the corresponding static network;
(ii) Compute the constants $t_x$ and construct the dynamic minimum cost flow $\xi^T$.

An Algorithmic Approach

We present a combinatorial algorithm, based on the approach in [55], for the minimum cost flow problem on dynamic networks that meet the conditions of Theorem 4.16 and have exactly one source. We begin by examining networks with 1, 2, or 3 sinks, and then present an algorithm for the general case with $k$ sinks. Without loss of generality, we label the source node 0 and the $k$ sinks $1, \ldots, k$. We seek a minimum cost flow whose base graph is a forest. Since there is exactly one source and $\sum_{x \in X} d_x = 0$, the base graph is connected, and thus it is a tree. Furthermore, since there are no edges entering sources or exiting sinks, the base graph rooted at the source has the sinks as leaves; the problem becomes related to network design problems and general Steiner trees.

One sink

We seek a rooted tree $G_y$ with one leaf, in other words a simple path from the source 0 to sink 1. Obviously, the flow on any edge of the path is equal to the total demand $-d_1$, and we need to minimize $\sum_{e \in G_y} \varphi^0_e(-d_1)$. This can be achieved by finding a shortest path from 0 to 1 with respect to the edge lengths $\varphi^0_e(-d_1)$. The constants $t_x$ for reconstructing the dynamic flow can be computed trivially by traversing the path from source to sink.

Two sinks

We seek a tree with a fixed root 0 (the source) and two fixed leaves 1 and 2 (the sinks). Any such tree has the structure of one of the trees in Fig. 4.9, where the arrows denote either simple paths or edges. Moreover, tree structure (a) contains the other tree structures as degenerate cases; hence, we can assume all trees have this structure.
Fig. 4.9. All structures for the trees with one source and 2 sinks. Structure (b) is a degenerate case obtained from (a) by setting w = 0; structure (c) is obtained from (a) by setting w = 1; structure (d) is obtained from (a) by setting w = 2.

Fig. 4.10. All non-degenerate structures for trees with one source and 3 sinks. All other structures can be obtained as degenerate cases by using various combinations of setting w1 and w2 to 0, 1, 2, 3, and each other.
Denote by $-d_W$ the demand $\sum_{x \in W}(-d_x)$; then the flow on the paths from 0 to $w$, from $w$ to 1, and from $w$ to 2 equals $-d_{12}$, $-d_1$, and $-d_2$, respectively. Consequently, we need to minimize over all $w \in X$ the flow cost
$$F^0(y) = \sum_{e \in L_{0w}} \varphi^0_e(-d_{12}) + \sum_{e \in L_{w1}} \varphi^0_e(-d_1) + \sum_{e \in L_{w2}} \varphi^0_e(-d_2).$$
This can be achieved with 3 one-to-all shortest path computations. Denote by $c^W_{xw}$ the minimum distance from $x$ to $w$ with respect to the edge lengths $\varphi^0_e(-d_W)$; we compute $c^{12}_{0w}$ from 0 to all $w$, $c^1_{w1}$ backwards from 1 to all $w$, and $c^2_{w2}$ backwards from 2 to all $w$. Then it is straightforward to find $\min_{w \in X}\{c^{12}_{0w} + c^1_{w1} + c^2_{w2}\}$ and then to compute the $t_x$ by traversing the resulting tree. Note that path overlapping does not lead to invalid results, due to concavity; we address this in more detail in the proofs for the general case.

Three sinks

For $k = 3$ we need to minimize the flow cost of a tree with fixed root 0 and three fixed sink leaves 1, 2, and 3. There are more structures such trees can take, but all are degenerate or proper cases of the 3 structures depicted in Fig. 4.10. We need to compute the minimum cost flow tree for each structure, and then select the optimum among them.
Consider the minimum cost flow computation for the trees that fit structure (a) from Fig. 4.10. We compute $c^1_{x1}$, $c^2_{x2}$, $c^3_{x3}$, $c^{123}_{0x}$ for all $x \in X$, as well as $c^{12}_{xw}$ for all $x \in X$ and $w \in X$; overall $O(n)$ one-to-all shortest path computations. Then we compute $\min_{x,w \in X}\{c^{123}_{0x} + c^{12}_{xw} + c^3_{x3} + c^1_{w1} + c^2_{w2}\}$ and finally the $t_x$.

The general case

We use dynamic programming to generalize, and present the algorithm MCDF for computing the minimum cost of a dynamic flow in a network $N$ that satisfies the conditions of Theorem 4.16. We assume that (i) the dynamic network $N$ with $n$ nodes, $m$ edges, one source, and $k$ sinks, and (ii) the set $\mathcal{A}$ of all tree structures with $k$ leaves, are given.

MCDF(N, $\mathcal{A}$):
    c_min := +∞;
    for each A ∈ $\mathcal{A}$ do:
        c := MCTF(N, A);
        if c < c_min then c_min := c;
    end for;
    return c_min.

MINLEN(N, x, l[]):
    return the minimum distances d[] from x to all nodes of the network N with respect to the edge lengths l[].

DESC(A, x):
    return the set A⁺(x) of all direct descendants of node x in tree A.

READY(A, b[]):
    return the set {x ∈ A | b[x] = 0 ∧ b[w] = 1, ∀w ∈ DESC(A, x)} of all unprocessed nodes ready to be processed.

MCTF(N, A):
    for each x ∈ A do: b[x] := 0; c[x][·] := +∞; end for;
    while ∃x ∈ READY(A, b[]) do:
        b[x] := 1;
        if x ∉ X₋ then d[x] := Σ_{w ∈ DESC(A,x)} d[w];
        for each e ∈ E do l[x][e] := ϕₑ(d[x], 0);
        for each w ∈ X do:
            if x ∉ X₋ or x = w then c[x, w] := 0;
            for each α ∈ DESC(A, x) do:
                dist[·] := MINLEN(N, w, l[α][·]);
                c[x, w] := c[x, w] + min_{β ∈ X} {dist[β] + c[α, β]};
            end for;
        end for;
    end while;
    return c[0, 0].

The algorithm can be modified to return not just the minimum cost, but also the flow for which it is attained.

Lemma 4.17. The cost computed by sub-algorithm MCTF(N, A) equals or exceeds the cost of at least one flow in $N^0$.

Proof. First we note that for nodes having no descendants in $A$ (the sinks), the algorithm computes $c[x, w] = 0$ if $x = w$ and $c[x, w] = +\infty$ otherwise. This corresponds to the fact that the flow has either reached its sink or missed it. Assume that the algorithm has begun computing $c[x][\cdot]$ for a node $x \in A$ that is not a sink. It follows from the definition of READY and the use of the vector $b$ that the algorithm has already computed $c[w][\cdot]$ for all descendants $w$ of $x$. We note that $d[x]$ is computed to equal the total demand of all sinks under $x$, and that the sum of the costs of sending $\ell_1$ and $\ell_2$ units of flow across an edge $e$ equals or exceeds the cost of sending $\ell_1 + \ell_2$ units together, since $\varphi^0_e$ is concave. It follows by induction up the tree $A$ that $c[x, w]$ equals or exceeds the cost of at least one way of sending $d[x]$ units of flow from node $w$ to the sinks in the subtree rooted at $x$ in $A$ while obeying the flow conservation constraints and meeting the sink demands. Therefore, $c[0][0]$ equals or exceeds the cost of at least one way of sending $d_0$ units of flow from source 0 to all the sinks while meeting their demands and obeying the flow conservation constraints. The lemma is proved.

Lemma 4.18. If there exists a minimum cost tree-like flow $y$ with structure $A$ in $N^0$, the cost computed by MCTF(N, A) equals the minimum cost of a flow in $N^0$.

Proof. If flow $y$ has structure $A$, then we can associate with each node $x \in A$ a node $x_y \in X$. We refer to the cost of the flow transported on the edges of a subtree of the base graph $G_y$ as the cost of that subtree. We note that MCTF computes the shortest path from a node to its descendants, and selects each descendant so as to minimize the total cost. It follows by induction up the tree $A$ that $c[x, x_y]$ is less than or equal to the cost of the subtree rooted at $x_y$ with respect to the flow $y$. Therefore, $c[0][0] \le F^0(y)$. But according to Lemma 4.17, $c[0][0]$ equals or exceeds the cost of a flow in $N^0$. Since $y$ is a minimum cost flow in $N^0$, we obtain $c[0][0] = F^0(y)$. The lemma is proved.
Theorem 4.19. Algorithm MCDF(N, $\mathcal{A}$) computes the minimum cost of a flow in the dynamic network $N$.

Proof. According to Lemma 4.18, if there is a minimum cost flow with structure $A$ in $N^0$, then MCTF(N, A) returns its cost. However, for each tree-like flow in $N^0$ there is a structure in $\mathcal{A}$. Since MCTF is called for each structure in $\mathcal{A}$, at least one call returns the minimum cost of a flow in $N^0$. Moreover, since according to Lemma 4.17 any cost returned by MCTF equals or exceeds the cost of a flow, all other calls return costs equal to or higher than the cost of the minimum flow. Since MCDF selects the minimum over all calls, it returns the minimum cost of a flow in $N^0$, which according to Theorem 4.16 equals the cost of a minimum cost flow in $N$. The theorem is proved.

The Complexity of the Algorithm

We first examine the complexity of the sub-algorithm MCTF(N, A). Note that with an appropriate choice of tree representation, the iteration "while ∃x ∈ READY(A, b[])" can be executed in $O(|A| \cdot N')$ operations, where $N'$ is the number of operations required to execute the body of the cycle in each iteration. Similarly, the iteration "for each α ∈ DESC(A, x)" can be executed in $O(|A| \cdot N'')$ operations, where $N''$ is the number of operations required to execute the body of the cycle in each iteration. Moreover, since each descendant is processed only once, and this cycle is embedded in the while loop, we can consider its contribution to be $O(N'')$. In this evaluation we assume that Dijkstra's algorithm is used for the MINLEN computations. From the loop structure of MCTF it follows that its execution has complexity $O(|A| + |A|(m + n(n \log n + m + n))) = O(|A|(mn + n^2 \log n))$. Therefore algorithm MCDF has complexity $O(|\mathcal{A}|(mn + n^2 \log n))$. Obviously, for any $k$, $\mathcal{A}$ contains a number of trees $N(k)$ that is exponential in $k$. However, if we consider networks with $k$ fixed (not part of the input), then the algorithm is polynomial, with complexity $O(mn + n^2 \log n)$. This approach represents an extension of the method for solving the static minimum cost flow problem proposed in [55]. In general, this algorithm can be developed for the case where the cost functions on the edges are concave with respect to the flow for every fixed moment in time $t \in \mathbb{T}$.

4.1.8 Determining the Minimum Cost Flows in Dynamic Networks with Transition Time Functions that Depend on Flow and Time

In the previous dynamic mathematical models the transition time functions are assumed to be constant at every moment in time for each edge of the network. In reality, the transition time for shipping flow along an edge depends on the amount of flow and the moment in time. If we take
into account this assumption, we obtain a more general dynamic model for the minimum cost flow problem. We also consider two-sided restrictions on the edge capacities. In order to formulate the new general model rigorously, we describe some properties of the transition time functions. We assume that the transition time function $\tau_e = \tau_e(\alpha_e(t), t)$ is a non-negative, non-decreasing, left-continuous step function with respect to the amount of flow $\alpha_e(t)$ for every fixed time-moment $t \in \mathbb{T}$ and every given edge $e \in E$. Consider, for example, the transition time function $\tau_e = \tau_e(\alpha_e(t), t)$ whose graph, for a fixed moment in time $t$ and a given edge $e$, is presented in Fig. 4.11. We denote by $P_{e,t}$ the set of indices of the steps of the transition time function for the fixed moment in time $t$ and the given edge $e$; here $P_{e,t} = \{1, 2, 3\}$. So, for the fixed moment in time $t$ on the given edge $e$, the transition time equals 3 if the value of the flow belongs to the interval $[\underline{f}_e(t), 2]$; it equals 5 if the value of the flow belongs to the interval $(2, 4]$; and it equals 8 if the value of the flow belongs to the interval $(4, \overline{f}_e(t)]$.
Fig. 4.11. (The step function $\tau_e(\alpha_e(t), t)$ with breakpoints $\alpha^0_e(t) = \underline{f}_e(t)$, $\alpha^1_e(t) = 2$, $\alpha^2_e(t) = 4$, $\alpha^3_e(t) = \overline{f}_e(t)$ and step values $\tau^1_e = 3$, $\tau^2_e = 5$, $\tau^3_e = 8$.)
Now let us formulate the minimum cost flow problem in dynamic networks with transition time functions that depend on flow and time. Let a network $N = (X, E, \underline{f}, \overline{f}, \tau, d, \varphi)$ be given, with a set of vertices $X$, a set of edges $E$, lower and upper capacity functions $\underline{f}, \overline{f}: E \times \mathbb{T} \to R_+$, a transition time function $\tau: E \times \mathbb{T} \times R_+ \to R_+$, a demand-supply function $d: X \times \mathbb{T} \to R$ and a cost function $\varphi: E \times R_+ \times \mathbb{T} \to R_+$. As mentioned above, we consider the discrete time model, in which all times are integral and bounded by a time horizon $T$, which defines the set $\mathbb{T} = \{0, 1, \ldots, T\}$ of time moments we consider. The total supply equals the total demand, i.e. $\sum_{t \in \mathbb{T}} \sum_{x \in X} d_x(t) = 0$.
A dynamic flow in $N$ is represented by a function $\alpha: E \times \mathbb{T} \to R_+$, which defines the value $\alpha_e(t)$ of the flow entering edge $e$ at time $t$. Since we require that all edges be empty after the time horizon $T$, the following implication must hold for all $e \in E$ and $t \in \mathbb{T}$: if $\alpha_e(t) > 0$, then $t + \tau_e(\alpha_e(t), t) \le T$. The dynamic flow $\alpha$ must satisfy the flow conservation constraints, which state that at any time moment $t \in \mathbb{T}$ and for every vertex $x \in X$, the difference between the total amount of flow that leaves node $x$ and the total amount of flow that enters node $x$ is equal to $d_x(t)$. The dynamic flow $\alpha$ is called feasible if it satisfies the following capacity constraints:
$$\underline{f}_e(t) \le \alpha_e(t) \le \overline{f}_e(t), \qquad \forall t \in \mathbb{T},\ \forall e \in E.$$
As mentioned above, the cost function $\varphi_e(\alpha_e(t), t)$ indicates the cost of shipping the flow entering edge $e$ at time $t$. The total cost of the dynamic flow $\alpha$ is defined as follows:
$$F(\alpha) = \sum_{t \in \mathbb{T}} \sum_{e \in E} \varphi_e(\alpha_e(t), t).$$
The aim of the minimum cost dynamic flow problem is to find a feasible flow that minimizes this objective function.
The Method for Solving the Problem

We propose an approach for solving the formulated problem based on its reduction to a static problem on a special auxiliary time-expanded network $N^T$. We define the network $N^T = (X^T, E^T, d^T, \underline{f}^T, \overline{f}^T, \varphi^T)$ as follows:

1. $\overline{X}^T := \{x(t) \mid x \in X,\ t \in \mathbb{T}\}$;
2. $\widetilde{X}^T := \{e(x(t)) \mid x(t) \in \overline{X}^T,\ e \in E^-(x),\ t \in \mathbb{T} \setminus \{T\}\}$;
3. $X^T := \overline{X}^T \cup \widetilde{X}^T$;
4. $\widetilde{E}^T := \{\tilde e(t) = (x(t), e(x(t))) \mid x(t) \in \overline{X}^T$ and the corresponding $e(x(t)) \in \widetilde{X}^T,\ t \in \mathbb{T} \setminus \{T\}\}$;
5. $\overline{E}^T := \{e^p(t) = (e(x(t)), z(t + \tau^p_e(\alpha_e(t), t))) \mid e(x(t)) \in \widetilde{X}^T,\ z(t + \tau^p_e(\alpha_e(t), t)) \in \overline{X}^T,\ e = (x, z) \in E,\ 0 \le t \le T - \tau^p_e(\alpha_e(t), t),\ p \in P_{e,t}\}$;
6. $E^T := \widetilde{E}^T \cup \overline{E}^T$;
7. $d^T_{x(t)} := d_x(t)$ for $x(t) \in \overline{X}^T$; $d^T_{e(x(t))} := 0$ for $e(x(t)) \in \widetilde{X}^T$;
8. $\underline{f}^T_{\tilde e(t)} := \underline{f}_e(t)$ and $\overline{f}^T_{\tilde e(t)} := \overline{f}_e(t)$ for $\tilde e(t) \in \widetilde{E}^T$; $\underline{f}^T_{e^p(t)} := \alpha^{p-1}_e(t)$ for $e^p(t) \in \overline{E}^T$, where $\alpha^0_e(t) = \underline{f}_e(t)$; $\overline{f}^T_{e^p(t)} := \alpha^p_e(t)$ for $e^p(t) \in \overline{E}^T$;
9. $\varphi^T_{\tilde e(t)}(\alpha^T_{\tilde e(t)}) := \varphi_e(\alpha_e(t), t)$ for $\tilde e(t) \in \widetilde{E}^T$; $\varphi^T_{e^p(t)}(\alpha^T_{e^p(t)}) := \varepsilon_p$ for $e^p(t) \in \overline{E}^T$, where $\varepsilon_1 < \varepsilon_2 < \cdots < \varepsilon_{|P_{e,t}|}$ are small numbers.
To make the notation clearer, we construct the part of the time-expanded network corresponding to a fixed moment in time $t$ and a given edge $e = (x, z)$ with the transition time function of Fig. 4.11. The constructed part of the time-expanded network is given in Fig. 4.12, where the lower and upper capacities are written above each edge and the cost is written below each edge.
Fig. 4.12. (The fragment of the time-expanded network for edge $e = (x, z)$ at time $t$: the edge $(x(t), e(x(t)))$ carries bounds $[\underline{f}_e(t), \overline{f}_e(t)]$ and cost $\varphi_e(\alpha_e(t), t)$; the three parallel edges from $e(x(t))$ to $z(t+3)$, $z(t+5)$, $z(t+8)$ carry bounds $[\underline{f}_e(t), 2]$, $[2, 4]$, $[4, \overline{f}_e(t)]$ and costs $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$, respectively.)
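The expansion of Fig. 4.12 is mechanical; the following minimal sketch (illustrative names, not the authors' code) emits, for one edge and one time moment, the auxiliary node and the parallel edges with their capacity intervals and tie-breaking costs:

    def expand_edge(x, z, t, f_lo, f_hi, cost, steps, eps=1e-6):
        # steps: list of (breakpoint a_p, transit tau_p); tau equals tau_p on
        # the interval (a_{p-1}, a_p], with a_0 = f_lo and a_P = f_hi
        mid = ('e', x, z, t)                        # auxiliary node e(x(t))
        edges = [((x, t), mid, f_lo, f_hi, cost)]   # (tail, head, lo, hi, cost)
        prev = f_lo
        for p, (a_p, tau_p) in enumerate(steps, start=1):
            edges.append((mid, (z, t + tau_p), prev, a_p, p * eps))
            prev = a_p
        return mid, edges

    # the edge of Fig. 4.11: breakpoints 2, 4, f''_e(t) with transit 3, 5, 8
    mid, edges = expand_edge('x', 'z', 0, 0.0, 6.0, 1.0, [(2, 3), (4, 5), (6.0, 8)])

The strictly increasing tie-breaking costs $\varepsilon_p$ steer the static solver towards the cheapest (slowest admissible) parallel edge only when the flow value actually falls in that step's interval.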
The following lemma gives the correspondence between the flows in the dynamic network and in the time-expanded network.

Lemma 4.20. Let $\alpha^T: E^T \to R_+$ be a flow in the static network $N^T$. Then the function $\alpha: E \times \mathbb{T} \to R_+$ defined as follows:
$$\alpha_e(t) = \alpha^T_{\tilde e(t)} = \alpha^T_{e^p(t)} \quad \text{for } e = (x,z) \in E,\ \tilde e(t) = (x(t), e(x(t))) \in \widetilde E^T,\ e^p(t) = (e(x(t)), z(t + \tau^p_e(\alpha_e(t), t))) \in \overline E^T,$$
where $p \in P_{e,t}$ is such that $\alpha^T_{\tilde e(t)} \in (\alpha^{p-1}_e(t), \alpha^p_e(t)]$, $t \in \mathbb{T}$, represents a flow in the dynamic network $N$.

Let $\alpha: E \times \mathbb{T} \to R_+$ be a flow in the dynamic network $N$. Then the function $\alpha^T: E^T \to R_+$ defined as follows:
$$\alpha^T_{\tilde e(t)} = \alpha_e(t) \quad \text{for } \tilde e(t) = (x(t), e(x(t))) \in \widetilde E^T,\ e = (x,z) \in E,\ t \in \mathbb{T};$$
$$\alpha^T_{e^p(t)} = \alpha_e(t) \ \text{for the } p \in P_{e,t} \text{ with } \alpha_e(t) \in (\alpha^{p-1}_e(t), \alpha^p_e(t)], \qquad \alpha^T_{e^p(t)} = 0 \ \text{for all other } p \in P_{e,t},$$
for $e^p(t) = (e(x(t)), z(t + \tau^p_e(\alpha_e(t), t))) \in \overline E^T$, $e = (x,z) \in E$, $t \in \mathbb{T}$, represents a flow in the static network $N^T$.

The proof of this lemma is similar to the proof of Lemma 4.1. The following theorem holds:

Theorem 4.21. If $\alpha^{*T}$ is a minimum cost flow in the static network $N^T$, then the corresponding dynamic flow $\alpha^*$ in the dynamic network $N$ according to Lemma 4.20 is a minimum cost dynamic flow, and vice versa.

In such a way, to solve the minimum cost flow problem in dynamic networks with transition time functions that depend on flow and time, we construct the time-expanded network, solve the static minimum cost flow problem on it, and then reconstruct the solution of the static problem to the dynamic one.

4.1.9 An Algorithm for Solving the Maximum Dynamic Flow Problem

Let a network $N$ with a set of vertices $X$ and a set of edges $E$ be given. As mentioned above, we consider the discrete time model, in which all times are integral and bounded by the horizon $T$. The aim is to find a maximum flow over time in the network $N$ within the makespan $\mathbb{T} = \{0, 1, 2, \ldots, T\}$, subject to the following restrictions: each edge $e \in E$ has a non-negative time-varying capacity $f_e(t)$, which bounds the amount of flow allowed on the edge at every moment in time; moreover, each edge $e$ has an associated non-negative transition time $\tau_e$, which determines the amount of time it takes for the flow to pass from the tail to the head of that edge. A feasible dynamic flow in $N$ is a function $\alpha: E \times \mathbb{T} \to R_+$ that satisfies the conditions (4.2), (4.3), as well as the following conditions:
$$\sum_{e \in E^-(x)} \alpha_e(t) - \sum_{\substack{e \in E^+(x)\\ t - \tau_e \ge 0}} \alpha_e(t - \tau_e) = \begin{cases} y_x(t), & x \in X_+,\\ 0, & x \in X_*,\\ -y_x(t), & x \in X_-, \end{cases} \qquad \forall t \in \mathbb{T},\ \forall x \in X;$$
$$y_x(t) \ge 0, \qquad \forall t \in \mathbb{T},\ \forall x \in X.$$
The value of the flow $\alpha$ is defined as follows:
$$|\alpha| = \sum_{t=0}^{T} \sum_{x \in X_+} y_x(t).$$
The maximum dynamic flow problem consists in finding a feasible flow that maximizes this objective function. It is easy to observe that if $\tau_e = 0$, $\forall e \in E$, and $T = 0$, then the formulated problem becomes the classical maximum flow problem in a static network. To solve the formulated problem we propose an approach based on the reduction of the dynamic maximum flow problem to a static maximum flow problem, which is a well-studied problem in operations research and other fields. We show that our problem in the network $N$ can be reduced to a static problem in the time-expanded network $N^T$. The essence of the time-expanded network is that it contains a copy of the vertices of the dynamic network for each time $t \in \mathbb{T}$, and the transition times and the flows are implicit in the edges linking those copies. This network is a static representation of the dynamic network, and it can be defined in a similar way as for the minimum cost flow problem. Let $e(t) = (x(t), z(t + \tau_e)) \in E^T$ and let $\alpha_e(t)$ be a flow in the dynamic network $N$. The corresponding function $\alpha^T_{e(t)}$ in the time-expanded network $N^T$ is defined by relation (4.5). As mentioned above, it can be shown that the following theorem is true:

Theorem 4.22. For each maximum flow in the dynamic network there is a corresponding maximum flow in the static time-expanded network, and vice versa.

We easily obtain the following corollary:

Corollary 4.23. The following condition holds:
$$\sum_{t \in \mathbb{T}} \sum_{x \in X_-} y_x(t) = \sum_{t \in \mathbb{T}} \sum_{x \in X_+} y_x(t).$$
In such a way, to solve the maximum flow problem in dynamic networks we construct the time-expanded network, solve the classical maximum flow problem in the static network, and then reconstruct the solution of the static problem to the dynamic problem.
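For intuition, here is a minimal sketch of this reduction (not the authors' code) for constant transit times, using networkx and a super-source and super-sink to aggregate the terminals of the time-expanded network:

    import networkx as nx

    def max_dynamic_flow(nodes, edges, sources, sinks, cap, T):
        # edges: dict (x, z) -> tau; cap: function ((x, z), t) -> f_e(t)
        G = nx.DiGraph()
        for (x, z), tau in edges.items():
            for t in range(T - tau + 1):
                G.add_edge((x, t), (z, t + tau), capacity=cap((x, z), t))
        for x in nodes:                       # free storage at every node
            for t in range(T):
                G.add_edge((x, t), (x, t + 1))   # missing 'capacity' = infinite
        for s in sources:                     # storage edges let supply enter later
            G.add_edge('SRC', (s, 0))
        for z in sinks:                       # every sink copy drains to 'SNK'
            for t in range(T + 1):
                G.add_edge((z, t), 'SNK')
        return nx.maximum_flow(G, 'SRC', 'SNK')   # (value, flow dict)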
4.2 Multi-Commodity Dynamic Flow Problems and Algorithms for their Solving

We now study dynamic versions of the nonlinear minimum cost multi-commodity flow problem and of the maximum multi-commodity flow problem on networks, which generalize classical static flow problems and extend some dynamic problems from [26, 27, 32, 33, 41] as well as control models on networks from Chapter 1. To solve these dynamic problems, we propose algorithms based on the reduction of the problems to static ones on an auxiliary time-expanded network.

The multi-commodity flow problem consists of shipping several different commodities from their respective sources to their sinks through a given network so that the total flow going through each edge does not exceed its capacity. No commodity ever transforms into another commodity, so each one has its own flow conservation constraints, but they compete for the resources of the common network. We consider the minimum cost and the maximum multi-commodity flow problems on dynamic networks with time-varying capacities of the edges and with transition times of the edges that depend on the commodity entering them; that is, the transition time functions on the set of edges may differ between commodities. For the minimum cost multi-commodity dynamic flow problem we assume that the cost functions defined on the edges are nonlinear and depend on time and flow. Moreover, we assume that the demand-supply function also depends on time.

4.2.1 The Minimum Cost Multi-Commodity Dynamic Flow Problem

The minimum cost multi-commodity dynamic flow problem asks for a flow of a set of commodities through a network with a given time horizon, satisfying all supplies and demands with minimum cost, such that the link capacities are not exceeded. We consider the discrete time model, in which all times are integral and bounded by the horizon $T$, which defines the makespan $\mathbb{T} = \{0, 1, \ldots, T\}$ of time moments we consider. Time is measured in discrete steps, so that if one unit of flow of commodity $k$ leaves node $z$ at time $t$ on edge $e = (z, x)$, one unit of flow arrives at node $x$ at time $t + \tau^k_e$, where $\tau^k_e$ is the transition time of edge $e$ for commodity $k$.

Without loss of generality, we assume that no edges enter sources or exit sinks. In order for a flow to exist it is required that $\sum_{t \in \mathbb{T}} \sum_{x \in X} d^k_x(t) = 0$, $\forall k \in K$. If $d^k_x(t) > 0$ for a node $x \in X$ at a moment in time $t \in \mathbb{T}$, then we treat this node $x$ at the time-moment $t$ as a source for commodity $k \in K$. If at a moment in time $t \in \mathbb{T}$ the condition $d^k_x(t) < 0$ holds, then we regard the node $x$ at time-moment $t$ as a sink for commodity $k \in K$. In case $d^k_x(t) = 0$ at a moment in time $t \in \mathbb{T}$, we consider the vertex $x$ at time-moment $t$ as an intermediate node for commodity $k \in K$. In such a way, the same node
$x \in X$ at different moments in time can serve as a source, as a sink, or as an intermediate node for commodity $k \in K$. Without loss of generality, we consider that for every commodity $k \in K$ the set of vertices $X$ is divided into three disjoint subsets $X^k_+$, $X^k_-$, $X^k_*$, such that:

$X^k_+$ consists of the nodes $x \in X$ for which $d^k_x(t) \ge 0$ for all $t \in \mathbb{T}$ and there exists at least one moment in time $t_0 \in \mathbb{T}$ such that $d^k_x(t_0) > 0$;

$X^k_-$ consists of the nodes $x \in X$ for which $d^k_x(t) \le 0$ for all $t \in \mathbb{T}$ and there exists at least one moment in time $t_0 \in \mathbb{T}$ such that $d^k_x(t_0) < 0$;

$X^k_*$ consists of the nodes $x \in X$ for which $d^k_x(t) = 0$ for every $t \in \mathbb{T}$.

So, $X^k_+$ is the set of sources, $X^k_-$ is the set of sinks and $X^k_*$ is the set of intermediate nodes for commodity $k \in K$ in the network $N$. A multi-commodity dynamic flow in $N$ is a function $\alpha: E \times K \times \mathbb{T} \to R_+$ that satisfies the following conditions:
$$\sum_{e \in E^-(x)} \alpha^k_e(t) - \sum_{\substack{e \in E^+(x)\\ t - \tau^k_e \ge 0}} \alpha^k_e(t - \tau^k_e) = d^k_x(t), \quad \forall t \in \mathbb{T},\ \forall x \in X,\ \forall k \in K; \qquad (4.15)$$
$$\sum_{k \in K} \alpha^k_e(t) \le f_e(t), \quad \forall t \in \mathbb{T},\ \forall e \in E; \qquad (4.16)$$
$$0 \le \alpha^k_e(t) \le r^k_e(t), \quad \forall t \in \mathbb{T},\ \forall e \in E,\ \forall k \in K; \qquad (4.17)$$
$$\alpha^k_e(t) = 0, \quad \forall e \in E,\ t = \overline{T - \tau^k_e + 1, T},\ \forall k \in K. \qquad (4.18)$$
Here the function $\alpha$ defines the value $\alpha^k_e(t)$ of the flow of commodity $k$ entering edge $e$ at time $t$. The flow of commodity $k$ does not enter edge $e$ at time $t$ if it would have to leave the edge after time $T$; this is ensured by condition (4.18). The individual capacity constraints (4.17) mean that at most $r^k_e(t)$ units of flow of commodity $k$ can enter edge $e$ at time $t$. The mutual capacity constraints (4.16) mean that at most $f_e(t)$ units of flow in total can enter edge $e$ at time $t$. The constraints (4.17) and (4.16) are called weak and strong forcing constraints, respectively. The conditions (4.15) represent the flow conservation constraints.

To model transition costs, which may change over time, we define the cost function $\varphi_e(\alpha^1_e(t), \alpha^2_e(t), \ldots, \alpha^q_e(t), t)$, which indicates the cost of shipping the flows entering edge $e$ at time $t$. The total cost of the dynamic multi-commodity flow $\alpha$ is defined as follows:
$$F(\alpha) = \sum_{t \in \mathbb{T}} \sum_{e \in E} \varphi_e(\alpha^1_e(t), \alpha^2_e(t), \ldots, \alpha^q_e(t), t).$$
The aim of the minimum cost multi-commodity dynamic flow problem is to find a flow that minimizes this objective function.
It is important to note that in many practical cases the cost functions take the separable form:
$$\varphi_e(\alpha^1_e(t), \alpha^2_e(t), \ldots, \alpha^q_e(t), t) = \sum_{k \in K} \varphi^k_e(\alpha^k_e(t), t). \qquad (4.19)$$
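For the separable linear case of (4.19), the resulting static (or time-expanded) problem is an ordinary linear program. The following illustrative sketch (not from the text) solves a tiny two-commodity static instance with scipy, with per-commodity conservation rows in the spirit of (4.15) and the mutual capacity rows (4.16):

    import numpy as np
    from scipy.optimize import linprog

    nodes = ['s1', 's2', 'v', 't1', 't2']
    edges = [('s1', 'v'), ('s2', 'v'), ('v', 't1'), ('v', 't2')]
    cost = {e: 1.0 for e in edges}               # phi^k_e: unit costs
    f = {e: 3.0 for e in edges}                  # mutual capacities f_e
    d = [{'s1': 2, 't1': -2}, {'s2': 2, 't2': -2}]   # d^k_x: + supply, - demand
    K, m = len(d), len(edges)

    c = np.array([cost[e] for _ in range(K) for e in edges])
    A_eq, b_eq = [], []                          # conservation per (k, x)
    for k in range(K):
        for x in nodes:
            row = np.zeros(K * m)
            for j, (u, w) in enumerate(edges):
                if u == x: row[k * m + j] = 1.0   # flow leaving x
                if w == x: row[k * m + j] = -1.0  # flow entering x
            A_eq.append(row); b_eq.append(d[k].get(x, 0))
    A_ub = []                                    # sum_k alpha^k_e <= f_e
    for j in range(m):
        row = np.zeros(K * m)
        for k in range(K):
            row[k * m + j] = 1.0
        A_ub.append(row)
    res = linprog(c, A_ub=A_ub, b_ub=[f[e] for e in edges],
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (K * m))
    print(res.fun)                               # minimum total cost (here 8.0)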
The case $\tau^k_e = 0$, $\forall e \in E$, $\forall k \in K$, and $T = 0$ can be regarded as the static minimum cost multi-commodity flow problem.

4.2.2 The Main Results

We propose an approach for solving the formulated problem based on its reduction to a static problem. We show that the minimum cost multi-commodity flow problem in the network $N$ can be reduced to a static problem in an auxiliary time-expanded network $N^T$. We define the time-expanded network $N^T = (X^T, E^T, K, d^T, r^T, f^T, \varphi^T)$ as follows:
1. $\overline{X}^T := \{x(t) \mid x \in X,\ t \in \mathbb{T}\}$;
2. $\widetilde{X}^T := \{e(x(t)) \mid x(t) \in \overline{X}^T,\ e \in E^-(x),\ t \in \mathbb{T} \setminus \{T\}\}$;
3. $X^T := \overline{X}^T \cup \widetilde{X}^T$;
4. $\widetilde{E}^T := \{\tilde e(t) = (x(t), e(x(t))) \mid x(t) \in \overline{X}^T$ and the corresponding $e(x(t)) \in \widetilde{X}^T,\ t \in \mathbb{T} \setminus \{T\}\}$;
5. $\overline{E}^T := \{e^k(t) = (e(x(t)), z(t + \tau^k_e)) \mid e(x(t)) \in \widetilde{X}^T,\ z(t + \tau^k_e) \in \overline{X}^T,\ e = (x, z) \in E,\ 0 \le t \le T - \tau^k_e,\ k \in K\}$;
6. $E^T := \widetilde{E}^T \cup \overline{E}^T$;
7. $d^{kT}_{x(t)} := d^k_x(t)$ for $x(t) \in \overline{X}^T$, $k \in K$; $d^{kT}_{e(x(t))} := 0$ for $e(x(t)) \in \widetilde{X}^T$, $k \in K$;
8. $r^{lT}_{e^k(t)} := \begin{cases} r^k_e(t), & \text{if } l = k,\\ 0, & \text{if } l \ne k, \end{cases}$ for $e^k(t) \in \overline{E}^T$, $k \in K$, and $r^{lT}_{\tilde e(t)} := \infty$ for $\tilde e(t) \in \widetilde{E}^T$, $l \in K$;
9. $f^T_{\tilde e(t)} := f_e(t)$ for $\tilde e(t) = (x(t), e(x(t))) \in \widetilde{E}^T$; $f^T_{e^k(t)} := \infty$ for $e^k(t) \in \overline{E}^T$, $k \in K$;
10. $\varphi^T_{\tilde e(t)}(\alpha^{1T}_{\tilde e(t)}, \alpha^{2T}_{\tilde e(t)}, \ldots, \alpha^{qT}_{\tilde e(t)}) := \varphi_e(\alpha^1_e(t), \alpha^2_e(t), \ldots, \alpha^q_e(t), t)$ for $\tilde e(t) = (x(t), e(x(t))) \in \widetilde{E}^T$; $\varphi^T_{e^k(t)}(\alpha^{1T}_{e^k(t)}, \alpha^{2T}_{e^k(t)}, \ldots, \alpha^{qT}_{e^k(t)}) := 0$ for $e^k(t) \in \overline{E}^T$, $k \in K$.

The correspondence between the flows in the dynamic network and in the static time-expanded network is given by the following lemma:

Lemma 4.24. Let $\alpha^T: E^T \times K \to R_+$ be a multi-commodity flow in the static network $N^T$. Then the function $\alpha: E \times K \times \mathbb{T} \to R_+$ defined in the following way:
$$\alpha^k_e(t) = \alpha^{kT}_{e^k(t)} = \alpha^{kT}_{\tilde e(t)} \quad \text{for } e = (x,z) \in E,\ e^k(t) = (e(x(t)), z(t + \tau^k_e)) \in \overline{E}^T,\ \tilde e(t) = (x(t), e(x(t))) \in \widetilde{E}^T,\ k \in K,\ t \in \mathbb{T},$$
represents a multi-commodity flow in the dynamic network $N$.

Let $\alpha: E \times K \times \mathbb{T} \to R_+$ be a multi-commodity flow in the dynamic network $N$. Then the function $\alpha^T: E^T \times K \to R_+$ defined in the following way:
$$\alpha^{kT}_{\tilde e(t)} = \alpha^k_e(t) \quad \text{for } \tilde e(t) = (x(t), e(x(t))) \in \widetilde{E}^T,\ e = (x,z) \in E,\ k \in K,\ t \in \mathbb{T};$$
$$\alpha^{kT}_{e^k(t)} = \alpha^k_e(t), \qquad \alpha^{lT}_{e^k(t)} = 0 \ \text{for } l \ne k, \quad \text{for } e^k(t) = (e(x(t)), z(t + \tau^k_e)) \in \overline{E}^T,\ e = (x,z) \in E,\ l, k \in K,\ t \in \mathbb{T},$$
represents a multi-commodity flow in the static network $N^T$.

Proof. To prove the first part of the lemma, we have to show that the conditions (4.15)–(4.18) hold for the $\alpha$ defined above in the dynamic network $N$. These conditions evidently result from the following definition of multi-commodity flows in the static network $N^T$:
$$\sum_{e(t) \in E^-(x(t))} \alpha^{kT}_{e(t)} - \sum_{e(t - \tau^k_e) \in E^+(x(t))} \alpha^{kT}_{e(t - \tau^k_e)} = d^{kT}_{x(t)}, \quad \forall x(t) \in X^T,\ \forall t \in \mathbb{T},\ \forall k \in K; \qquad (4.20)$$
$$\sum_{k \in K} \alpha^{kT}_{e(t)} \le f^T_{e(t)}, \quad \forall e(t) \in E^T,\ \forall t \in \mathbb{T}; \qquad (4.21)$$
$$0 \le \alpha^{kT}_{e(t)} \le r^{kT}_{e(t)}, \quad \forall e(t) \in E^T,\ \forall t \in \mathbb{T},\ \forall k \in K; \qquad (4.22)$$
$$\alpha^{kT}_{e(t)} = 0, \quad \forall e(t) \in E^T,\ t = \overline{T - \tau^k_e + 1, T},\ \forall k \in K. \qquad (4.23)$$
In order to prove the second part of the lemma it is sufficient to show that the conditions (4.20)-(4.23) hold. The correctness of these conditions results from the procedure of constructing the time-expanded network, as well as the correspondence between the flows in static and dynamic networks and the satisfied conditions (4.15)-(4.18). The lemma is proved. The following theorem holds: Theorem 4.25. If α∗ T is a static minimum cost multi-commodity flow in the static network N T , then the corresponding dynamic multi-commodity flow α∗ in the dynamic network N (according to Lemma 4.24) is also a minimum cost flow and vice-versa. Proof. Taking into account the correspondence between static and dynamic multi-commodity flows on the basis of Lemma 4.24, we obtain that the costs of the static multi-commodity flow in the time-expanded network N T and the corresponding dynamic multi-commodity flow in the dynamic network N are equal. Indeed, to solve the minimum cost multi-commodity flow problem in the static time-expanded network N T , we have to solve the following problem: q 1 2 F T (αT ) = ϕTe(t) (αe(t) , αe(t) , . . . , αe(t) ) → min t∈T e(t)∈E T
subject to (4.20)-(4.23).
The Dynamic Network with Separable Cost Functions and without Mutual Capacity of the Edges In the case of a minimum cost flow problem with separable cost functions (4.19) and without mutual capacity constraints for the edges we can simplify the procedure of constructing the time-expanded network. In this case we do T . In that T and a new set of edges E not have to add a new set of vertices X T way the time-expanded network N is defined as follows: 1.
X T := {x(t) | x ∈ X, t ∈ T};
2.
E T := {ek (t) = (x(t), z(t + τek )) | e = (x, z) ∈ E, 0 ≤ t ≤ T − τek , k ∈ K};
3. 4.
5.
T
dkx(t) := dkx (t) for x(t) ∈ X T , k ∈ K; k re (t), if l = k for ek (t) ∈ E T , k ∈ K; T l rek (t) := 0, if l = k for ek (t) ∈ E T , k ∈ K; k k ϕe (αe (t), t), if l = k for ek (t) ∈ E T , k ∈ K; T T l l ϕek (t) (αek (t) ) := 0, if l = k for ek (t) ∈ E T , k ∈ K.
4.2 Multi-Commodity Dynamic Flow Problems and Algorithms
219
The correspondence between flows in the dynamic network N and the static network N T is defined as follows: Let αT : E T × K → R+ be a multicommodity flow in the static network N T . Then the function α: E × K × T → T R+ , defined as follows: αek (t) = αkek (t) for e ∈ E, ek (t) ∈ E T , k ∈ K, t ∈ T, represents the multi-commodity flow in the dynamic network N . Let α: E × K × T → R+ be a multi-commodity flow in the dynamic network N . Then the function αT : E T × K → R+ , defined as follows: T T αkek (t) = αek (t); αlek (t) = 0, l = k for ek (t) ∈ E T , e ∈ E, l, k ∈ K, t ∈ T, represents the multi-commodity flow in the static network N T . As above, it can be proved that if α∗ T is a static minimum cost multicommodity flow in the static network N T , then the corresponding dynamic multi-commodity flow α∗ in the dynamic network N is also a minimum cost flow and vice-versa. The Dynamic Network with Common Transition Times for Each Commodity In the case of the minimum cost flow problem with common transition times for each commodity the time-expanded network can also be constructed in a T and a new set of more simply way without adding a new set of vertices X T T edges E . Thus the time-expanded network N is defined as follows: 1.
X T := {x(t) | x ∈ X, t ∈ T};
2.
E T := {e(t) = (x(t), z(t + τe )) | e = (x, z) ∈ E, 0 ≤ t ≤ T − τe }; T
3.
dkx(t) := dkx (t)
for x(t) ∈ X T , k ∈ K;
4.
f e(t) T := fe (t)
for e(t) ∈ E T ;
5.
rke(t) := rek (t)
6.
T
T
T
for e(t) ∈ E T , k ∈ K;
T
ϕTe(t) (α1e(t) , α2e(t) , . . . , αqe(t) ) := ϕe (αe1 (t), αe2 (t), . . . , αeq (t), t) for e(t) ∈ E T .
In this case the correspondence between flows in the dynamic network N and the static network N T is defined in the following way: Let αT : E T × K → R+ be a multi-commodity flow in the static network N T . Then the function T α: E × K × T → R+ defined as follows: αek (t) = αke(t) for e ∈ E, e(t) ∈ E T , k ∈ K, t ∈ T, represents the multi-commodity flow in the dynamic network N . Let α: E × K × T → R+ be a multi-commodity flow in the dynamic network N . Then the function αT : E T × K → R+ defined as follows:
220
4 Discrete Control and Optimal Dynamic Flow Problems on Networks T
αke(t) = αek (t) for e(t) ∈ E T , e ∈ E, k ∈ K, t ∈ T, represents the multicommodity flow in the static network N T . As above, it can be proved that if α∗ T is a static minimum cost multicommodity flow in the static network N T , then the corresponding dynamic multi-commodity flow α∗ in the dynamic network N is also a minimum cost flow and vice-versa. 4.2.3 The Algorithm Let be given a dynamic network N . The minimum cost multi-commodity flow problem has to be solved on N . The procedure is as follows: 1. Build the time-expanded network N T for the given dynamic network N . 2. Solve the classical minimum cost multi-commodity flow problem in the static network N T using the known algorithms. 3. Reconstruct the solution of the static problem in N T to the dynamic problem in N . 4.2.4 Examples In this section we show how to construct the time-expanded network N T in different cases for the dynamic network N given in Fig. 4.1 with two commodities. According to the introduced notations we have: A set of vertices X = {x1 , x2 , x3 }; a set of edges E = {e1 = (x1 , x2 ), e2 = (x1 , x3 ), e3 = (x2 , x3 )}; a set of commodities K = {1, 2}. The set of time moments we consider is T = {0, 1, 2, 3}. The transition times for each edge for each commodity are defined in the following way: τe11 = 2, τe21 = 1, τe12 = 1, τe22 = 3, τe13 = 1, τe23 = 2. The mutual capacity, individual capacity, demand-supply, and cost functions are considered to be known. Example. Let us construct the time-expanded network N T for the dynamic network N in the general case. In accordance with the definition of the time-expanded network N T we obtain:
X
T
= x1 (t), x2 (t), x3 (t) t ∈ T ;
T = e1 (x1 (t)), e2 (x1 (t)), e3 (x2 (t)) t ∈ T \ T ; X T T ; XT = X ∪ X T = e1 (t) = (x1 (t), e1 (x1 (t))), t ∈ T \ T ; E
e2 (t) = (x1 (t), e2 (x1 (t))), t ∈ T \ T ;
4.2 Multi-Commodity Dynamic Flow Problems and Algorithms
e3 (t) = (x2 (t), e3 (x2 (t))), t ∈ T \ T ; T E = ek1 (t) = (e1 (x1 (t)), x2 (t + τek1 )), 0 ≤ t ≤ T − τek1 , k ∈ K; ek2 (t) = (e2 (x1 (t)), x3 (t + τek2 )), 0 ≤ t ≤ T − τek2 , k ∈ K;
ek3 (t) = (e3 (x2 (t)), x3 (t + τek3 )), 0 ≤ t ≤ T − τek3 , k ∈ K = e11 (t) = (e1 (x1 (t)), x2 (t + 2)), t = 0; 1; e21 (t) = (e1 (x1 (t)), x2 (t + 1)), t = 0; 1; 2; e12 (t) = (e2 (x1 (t)), x3 (t + 1)), t = 0; 1; 2; e22 (t) = (e2 (x1 (t)), x3 (t + 3)), t = 0; e13 (t) = (e3 (x2 (t)), x3 (t + 1)), t = 0; 1; 2;
e23 (t) = (e3 (x2 (t)), x3 (t + 2)), t = 0; 1 ; T T ; ET = E ∪ E T
T
T
dkx1 (t) = dkx1 (t), dkx2 (t) = dkx2 (t), dkx3 (t) = dkx3 (t), t ∈ T, k ∈ K; T
T
T
dke1 (x1 (t)) = 0, dke2 (x1 (t)) = 0, dke3 (x2 (t)) = 0, t ∈ T \ T, k ∈ K; T
T
T
rkek (t) = rek1 (t), rkek (t) = rek2 (t), rkek (t) = rek3 (t), t ∈ T, k ∈ K; 1
2
T
3
T
T
rlek (t) = 0, rlek (t) = 0, rlek (t) = 0, t ∈ T, l = k, k ∈ K; 1
2
T
3
T
T
rkee1 (t) = ∞, rkee2 (t) = ∞, rkee3 (t) = ∞, t ∈ T \ T, k ∈ K; f ee1 (t) T = fe1 (t), f ee2 (t) T = fe2 (t), f ee3 (t) T = fe3 (t), t ∈ T \ T ; f ek1 (t) T = ∞, f ek2 (t) T = ∞, f ek3 (t) T = ∞, t ∈ T, k ∈ K; T
T
T
T
T
T
ϕTee1 (t) (α1ee1 (t) , . . . , αqee1 (t) ) = ϕe1 (αe11 (t), . . . , αeq1 (t), t), ϕTee2 (t) (α1ee2 (t) , . . . , αqee2 (t) ) = ϕe2 (αe12 (t), . . . , αeq2 (t), t), ϕTee3 (t) (α1ee3 (t) , . . . , αqee3 (t) ) = ϕe3 (αe13 (t), . . . , αeq3 (t), t), t ∈ T \ T ; T
T
T
T
ϕTek (t) (α1ek (t) , . . . , αqek (t) ) = 0, ϕTek (t) (α1ek (t) , . . . , αqek (t) ) = 0, 1
1
1
2
2
2
221
222
4 Discrete Control and Optimal Dynamic Flow Problems on Networks T
T
ϕTek (t) (α1ek (t) , . . . , αqek (t) ) = 0, t ∈ T, k ∈ K. 3
3
3
The constructed time-expanded network N T is represented in Fig. 4.13.
t
0
t 1 e1x10
x11
x10
x20
2
x12
e2x12
e3x2 1
x3 1
3
x13
x23
x22
x2 1 e3 x2 0
t e1x12
e2x11
e2x10
x30
t e1x11
e3x22
x3 2
x3 3
Fig. 4.13.
Example. Let us construct the time-expanded network N T for the dynamic network N in case of separable cost functions and without mutual capacity of the edges. In accordance with the definition of the time-expanded network N T in this case we obtain:
X T = x1 (t), x2 (t), x3 (t) t ∈ T ; E T = ek1 (t) = (x1 (t), x2 (t + τek1 )), 0 ≤ t ≤ T − τek1 , k ∈ K; ek2 (t) = (x1 (t), x3 (t + τek2 )), 0 ≤ t ≤ T − τek2 , k ∈ K;
ek3 (t) = (x2 (t), x3 (t + τek3 )), 0 ≤ t ≤ T − τek3 , k ∈ K = e11 (t) = (x1 (t), x2 (t + 2)), t = 0; 1; e21 (t) = (x1 (t), x2 (t + 1)), t = 0; 1; 2; e12 (t) = (x1 (t), x3 (t + 1)), t = 0; 1; 2;
4.2 Multi-Commodity Dynamic Flow Problems and Algorithms
223
e22 (t) = (x1 (t), x3 (t + 3)), t = 0; e13 (t) = (x2 (t), x3 (t + 1)), t = 0; 1; 2;
e23 (t) = (x2 (t), x3 (t + 2)), t = 0; 1 ; T
T
T
dkx1 (t) = dkx1 (t), dkx2 (t) = dkx2 (t), dkx3 (t) = dkx3 (t), t ∈ T, k ∈ K; T
T
T
rkek (t) = rek1 (t), rkek (t) = rek2 (t), rkek (t) = rek3 (t), t ∈ T, k ∈ K; 1
2
T
3
T
T
rlek (t) = 0, rlek (t) = 0, rlek (t) = 0, t ∈ T, l = k, k ∈ K; 1
T
2
3
T
ϕkek (t) (αkek (t) ) = ϕke1 (αek1 (t), t), 1
1
T
T
ϕkek (t) (αkek (t) ) = ϕke2 (αek2 (t), t), 2
2
T
T
ϕkek (t) (αkek (t) ) = ϕke3 (αek3 (t), t), t ∈ T, k ∈ K; 3
3
T
T
ϕlek (t) (αlek (t) ) = 0, 1
1
T
T
ϕlek (t) (αlek (t) ) = 0, 2
2
T
T
ϕlek (t) (αlek (t) ) = 0, t ∈ T, l = k, k ∈ K. 3
3
The constructed time-expanded network N T is represented in Fig. 4.14. Example. Let us construct the time-expanded network N T for the dynamic network N in case of common transition times for each commodity. We consider that τe1 = 1, τe2 = 1, τe3 = 2. In accordance with the definition of the time-expanded network N T in this case we obtain:
X T = x1 (t), x2 (t), x3 (t) t ∈ T ; E T = e1 (t) = (x1 (t), x2 (t + τe1 )), 0 ≤ t ≤ T − τe1 ; e2 (t) = (x1 (t), x3 (t + τe2 )), 0 ≤ t ≤ T − τe2 ;
e3 (t) = (x2 (t), x3 (t + τe3 )), 0 ≤ t ≤ T − τe3 = e1 (t) = (x1 (t), x2 (t + 1)), t = 0; 1; 2; e2 (t) = (x1 (t), x3 (t + 1)), t = 0; 1; 2;
224
4 Discrete Control and Optimal Dynamic Flow Problems on Networks
t
0
t 1
x1 0
x 1 1
x 2 0
x2 1
x3 0
x 3 1
t
2 x1 2
x2 2
x3 2
t
3 x1 3
x 2 3
x 3 3
Fig. 4.14.
e3 (t) = (x2 (t), x3 (t + 2)), t = 0; 1 ; T
T
T
dkx1 (t) = dkx1 (t), dkx2 (t) = dkx2 (t), dkx3 (t) = dkx3 (t), t ∈ T, k ∈ K; f e1 (t) T = fe1 (t), f e2 (t) T = fe2 (t), f e3 (t) T = fe3 (t), t ∈ T; T
T
T
rke1 (t) = rek1 (t), rke2 (t) = rek2 (t), rke3 (t) = rek3 (t), t ∈ T, k ∈ K;
T
T
T
T
T
T
ϕTe1 (t) (α1e1 (t) , . . . , αqe1 (t) ) := ϕe1 (αe11 (t), . . . , αeq1 (t), t), ϕTe2 (t) (α1e2 (t) , . . . , αqe2 (t) ) := ϕe2 (αe12 (t), . . . , αeq2 (t), t), ϕTe3 (t) (α1e3 (t) , . . . , αqe3 (t) ) := ϕe3 (αe13 (t), . . . , αeq3 (t), t), t ∈ T. The constructed time-expanded network is represented in Fig. 4.15. 4.2.5 The Dynamic Multi-Commodity Minimum Cost Flow Problem with Transition Time Functions that Depend on Flows and on Time In the following, we propose an approach for solving the minimum cost multicommodity dynamic flow problem with transition time functions that depend on flows and on time. We consider this problem on dynamic networks with
4.2 Multi-Commodity Dynamic Flow Problems and Algorithms
t
0
t 1
t
x 1 1
x1 0
x 2 0
x 2 1
x 3 0
x 3 1
2
t
3
x 1 2
x1 3
x 2 2
x 2 3
x 3 2
225
x 3 3
Fig. 4.15.
time-varying lower and upper capacity functions, time-varying mutual capacity function, and time-varying demand-supply function. We assume that the cost functions, defined on edges, are nonlinear and depend on flow and on time. We consider that the transition time function τek (αek (t), t) is a nonnegative non-decreasing left-continuous step function for each commodity k ∈ K.
τ e1 (α e1 ( t ) , t ) τ e1,3 (α e1 ( t ) , t ) = 8 τ e1,2 (α e1 ( t ) , t ) = 5 τ e1,1 (α e1 ( t ) , t ) = 3 α
1,0 e
(t ) = r (t ) '1 e
α
1,1 e
(t )
Fig. 4.16.
α
1,2 e
(t )
α e1 ( t ) α e1,3 ( t ) = re''1 ( t )
226
4 Discrete Control and Optimal Dynamic Flow Problems on Networks
τ e2 (α e2 ( t ) , t ) τ e2,3 (α e2 ( t ) , t ) = 7 τ e2,2 (α e2 ( t ) , t ) = 4 τ e2,1 (α e2 ( t ) , t ) = 2 α
2,0 e
(t ) = r (t ) '2 e
α
2,1 e
α
(t )
2,2 e
(t )
α e2 ( t ) α e2,3 ( t ) = re''2 ( t )
Fig. 4.17.
For example, let us consider the transition time functions for an edge e = (x, y) at the moment in time t presented in the Figures 4.16 and 4.17, which correspond to commodities 1 and 2, respectively. The method for solving the minimum cost multi-commodity dynamic flow problem with transition time functions that depend on flows and time is based on the reduction of the dynamic problem to a static problem on a special auxiliary time-expanded network N T . We define the network N T = (X T , E T , dT , f T , rT , rT , ϕT ) as follows: T
:= {x(t) | x ∈ X, t ∈ T};
1.
X
2.
T := {e(x(t)) | x(t) ∈ X T , e ∈ E − (x), t ∈ T \ {T }}; X
3.
T ; X T := X ∪ X
4.
T T := { E e(t) = (x(t), e(x(t))) | x(t) ∈ X and corresponding
T
T , t ∈ T \ {T }}; e(x(t)) ∈ X 5.
T
T , E := {ek,p (t) = (e(x(t)), z(t + τek,p (αek (t), t))) | e(x(t)) ∈ X T
z(t + τek,p (αek (t), t)) ∈ X , e = (x, z) ∈ E, k 0 ≤ t ≤ T − τek,p (αek (t), t), p ∈ Pe,t − set of numbers of
steps of the transition time function τek (αek (t), t), k ∈ K};
4.2 Multi-Commodity Dynamic Flow Problems and Algorithms
227
T T ; E T := E ∪ E
6.
T
T
dkx(t) := dkx (t) for x(t) ∈ X , k ∈ K;
7.
T
T , k ∈ K; dke(x(t)) := 0 for e(x(t)) ∈ X f ee(t) T := fe (t)
8.
f ek,p (t) T := ∞
9.
rlek,p (t)
T
rlek,p (t)
T
T ; for e(t) ∈ E T
for ek,p (t) ∈ E ; ⎧ T ⎪ αek,p−1 (t), if l = k for ek,p (t) ∈ E , l ∈ K, where ⎪ ⎪ ⎨ := αek,0 (t) = rek (t); ⎪ ⎪ ⎪ ⎩ T if l = k for ek,p (t) ∈ E , l ∈ K; ⎧ 0, ⎨ αek,p (t), if l = k for ek,p (t) ∈ E T , l ∈ K; := T ⎩ 0, if l = k for ek,p (t) ∈ E , l ∈ K;
T T T , l ∈ K; rlee(t) = −∞; rlee(t) = +∞ for e(t) ∈ E
10.
T
T
T
ϕee(t) T (α1ee(t) , α2ee(t) , . . . , αqee(t) ) := ϕe (αe1 (t), αe2 (t), . . . , αeq (t), t) T ; for e(t) ∈ E T
T
T
T
ϕek,p (t) T (α1ek,p (t) , α2ek,p (t) , . . . , αqek,p (t) ) := εk,p for ek,p (t) ∈ E , k
where εk,1 < εk,2 < · · · < εk,|Pe,t | are small numbers. The constructed part of the time-expanded network for the fixed moment in time t for the edge e = (x, z) is presented in Fig. 4.18. Lemma 4.26. Let αT : E T ×K → R+ be a multi-commodity flow in the static network N T . Then the function α: E × K × T → R+ defined in the following way: T
αek (t) = αkee(t) = αkek,p (t)
T
for
T , e = (x, z) ∈ E, e(t) = (x(t), e(x(t))) ∈ E T
ek,p (t) = (e(x(t)), z(t + τek,p (αek (t), t))) ∈ E , T
k p ∈ Pe,t is such that αkee(t) ∈ (αek,p−1 (t), αek,p (t)],
t ∈ T, k ∈ K, represents a multi-commodity flow in the dynamic network N .
228
4 Discrete Control and Optimal Dynamic Flow Problems on Networks
Let α: E ×K ×T → R+ be a multi-commodity flow in the dynamic network N . Then the function αT : E T × K → R+ defined in the following way: T
αkee(t) = αek (t)
for
T , e = (x, z) ∈ E, k ∈ K, t ∈ T; e(t) = (x(t), e(x(t))) ∈ E T
αlek,p (t) = 0,
l = k;
T
αkek,p (t) = αek (t)
k for such p ∈ Pe,t that αek (t) ∈ (αek,p−1 (t), αek,p (t)], T
k αkek,p (t) = 0 for all other p ∈ Pe,t T
for ek,p (t) = (e(x(t)), z(t + τek,p (αek (t), t))) ∈ E , e = (x, z) ∈ E, l, k ∈ K, t ∈ T, represents a multi-commodity flow in the static network N T . Proof. This lemma can be proved in a similar way as Lemma 4.24.
z(t+2)
e2,1 ( t )
z(t+3)
e1,1 ( t )
x(t)
e2,2 ( t )
e ( t ) e(x(t))
e1,2 ( t )
z(t+4)
z(t+5)
e2,3 ( t ) e1,3 ( t )
z(t+7)
z(t+8)
Fig. 4.18.
On the basis of this lemma and the definition of the time-expanded network we can prove the following theorem:
4.2 Multi-Commodity Dynamic Flow Problems and Algorithms
229
Theorem 4.27. If α∗ T is a minimum cost multi-commodity flow in the static network N T , then the corresponding multi-commodity flow α∗ in the dynamic network N (according to Lemma 4.26) is also a minimum cost one and viceversa. Solving the minimum cost multi-commodity dynamic flow problem with transition time functions that depend on flow and time in such a way we construct the time-expanded network. Afterwards, we solve the multi-commodity static flow problem and reconstruct the solution of the static problem to the dynamic problem. 4.2.6 Generalizations In the following we study some more general cases of the minimum cost dynamic multi-commodity flow problems. The model of the dynamic network with flow storage at nodes can be reduced to the initial one by introducing loops in those nodes in which there is flow storage. The flow which was stored at the nodes passes through these loops. We assume that the introduced loops have corresponding capacities, specified limited transition times and zero-cost functions. The dynamic network, in which the cost functions also depend on the flow at nodes, can be reduced to the initial one in a similar way. The same reasoning to solve the minimum cost flow problem in the dynamic networks and its generalization holds in the case that, instead of condition (4.17) in the definition of the multi-commodity dynamic flow, the following condition holds: rek (t) ≤ αek (t) ≤ rek (t),
∀ t ∈ T, ∀ e ∈ E, ∀ k ∈ K
where rek (t) and rek (t) are lower and upper boundaries of the capacity of edge e, respectively. The approach of reducing the problem with two-side restrictions on the edge capacities to the problem with one-side restrictions on the edge capacities can be found in [15]. 4.2.7 An Algorithm for Solving the Maximum Dynamic Multi-Commodity Flow Problem In this section we show that the proposed time-expanded network method can be also used for the maximum dynamic multi-commodity flow problem. This problem requires to find the maximum flow of a set of commodities within a given time bound through a network without violating capacity constraints of any edge. As mentioned above, we consider the discrete time model, in which all times are integral and bounded by horizon T . Time is measured in discrete steps, the set of time moments we consider is T = {0, 1, . . . , T }.
230
4 Discrete Control and Optimal Dynamic Flow Problems on Networks
We consider a network N that contains a directed graph G = (X, E) and a set of commodities K that must be routed through the same network. Each edge e ∈ E has a nonnegative time-varying capacity rek (t) which bounds the amount of a flow of each commodity k ∈ K allowed on edge e at every moment in time t ∈ T. We also consider that every edge e ∈ E has a nonnegative timevarying capacity for all commodities, which is known as the mutual capacity fe (t). Moreover, each edge e ∈ E has an associated nonnegative transition time τek which determines the amount of time it takes for a flow of commodity k to travel from the tail to the head of that edge. A dynamic multi-commodity flow in N is a function α: E × K × T → R+ that satisfies the conditions (4.16) – (4.18) and the following conditions: ⎧ k y k (t), x ∈ X+ , ⎪ ⎪ ⎨ x k k k k 0, x ∈ X∗ , αe (t) − αe (t − τe ) = ⎪ ⎪ ⎩ k e∈E − (x) e∈E + (x) k −yx (t), x ∈ X− , k t−τe ≥0
∀ t ∈ T, ∀ x ∈ X, ∀k ∈ K; yxk (t) ≥ 0, ∀ x ∈ X, ∀t ∈ T, ∀k ∈ K. The value of the dynamic flow α is defined as follows: |α| =
yxk (t).
k k∈K t∈T x∈X+
The aim of the maximum multi-commodity dynamic flow problem is to find a flow that maximizes this objective function. The case τek = 0, ∀ e ∈ E, ∀ k ∈ K and T = 0 can be regarded as the static maximum multi-commodity flow problem. To solve the maximum multi-commodity dynamic flow problem by its reduction to a static one, we define the time-expanded network N T as in Section 4.2.2 but without demand-supply and cost functions. The correspondence between flows in the dynamic network N and the static network N T is also defined as mentioned above. The time-expanded network in the case that there are no mutual capacity constraints for the edges or that the transition time functions are the same for different commodities is defined in an analogous mode as in Section 4.2.2. Using the same reasoning as mentioned above we obtain the following theorem:
4.3 The Game-Theoretic Approach for Dynamic Flow Problems
231
Theorem 4.28. If α∗ T is a static maximum multi-commodity flow in the static network N T , then the corresponding dynamic multi-commodity flow α∗ in the dynamic network N is also a maximum flow and vice-versa. The maximum multi-commodity flow problem in dynamic networks can be solved in such a way by applying network flow optimization techniques for static flows directly to the expanded network. To solve the maximum multicommodity flow problem on N we have to build the time-expanded network N T for the given dynamic network N , after what we have to solve the classical maximum multi-commodity flow problem in the static network N T , using the known algorithms and then it is necessary to reconstruct the solution of the static problem in N T to the dynamic problem in N .
4.3 The Game-Theoretic Approach for Dynamic Flow Problems on Networks The game-theoretic approach we have used in Chapters 1–3 can be developed for more general dynamic models such as minimum cost dynamic flow problems on networks. In the following, we can see that the optimal control problem on the network from Section 1.4 represents the particular case of the considered minimum cost flow problem on dynamic networks studied in [1, 9, 26, 27, 29, 30, 31, 41, 63, 76, 87]. This particulary case is obtained for single source - single sink incapacitated problems when the cost functions on edges do not depend on the amount of flow but depend only on time. We consider the discrete time model, in which all times are integral and bounded by horizon T . Time is measured in discrete steps, so that if one unit of flow leaves node x at time t on edge e = (x, y), one unit of flow arrives at node y at time t + τe , where τe is the transition time of edge e. In order to describe the game-theoretic approach for the considered problem we shall use the multi-commodity version of the optimal dynamic flow problem. So, we consider a network N = (X, E, K, r, f, τ, d, ϕ) with set of vertices X, set of edges E and set of commodities K that must be routed through the same network. Each edge e ∈ E has a nonnegative time-varying capacity rek (t) which bounds the amount of flow of each commodity k ∈ K allowed on each edge e ∈ E at every moment in time t ∈ T. We also consider that every edge e ∈ E has a nonnegative time-varying capacity for all commodities, which is known as the mutual capacity fe (t). Moreover, each edge e ∈ E has an associated nonnegative transition time τe which determines the amount of time it takes for a flow to travel from the tail to the head of that edge. The underlying network also consists of a demand function d: X × K × T → R and a cost function ϕ: E × R+ × K × T → R+ , where T = {0, 1, 2, . . . , T }. All assumptions made above hold in this case of the problem.
232
4 Discrete Control and Optimal Dynamic Flow Problems on Networks
A feasible multi-commodity dynamic flow on N is a function α: E × K × T → R+ that satisfies the following conditions:
αek (t − τe ) −
e∈E + (x) t−τe ≥0
αek (t) = dkx (t),
∀ t ∈ T, ∀ x ∈ X, ∀k ∈ K;
αek (t) ≤ fe (t),
∀ t ∈ T, ∀e ∈ E;
e∈E − (x)
k∈K
0 ≤ αek (t) ≤ rek (t), αek (t) = 0,
∀ t ∈ T, ∀ e ∈ E, ∀k ∈ K; ∀ e ∈ E, t = T − τe + 1, T , ∀k ∈ K.
If we associate with each commodity a player for each k ∈ K and define the playoff function F k (α) = ϕke (αe1 (t), αe2 (t), . . . , αe|K| (t), t), t∈T e∈E
then we can regard this problem as a game-theoretic problem, where players interact with each other and the choices of one player influence the others’ choices. Control decisions are made by each player according to its own individual performance objectives and depending on the choices of the other players. The game-theoretic approach fits perfectly in the realm of such a problem, and an equilibrium or stable operating point of the system has to be found. Game-theoretic models are widely employed in the context of flow control, routing, virtual path bandwidth allocation and pricing in modem networking. Flow problems in multimedia applications (teleconferencing, digital libraries) over high-speed broadband networks can serve as good examples of this references. The problem of providing bandwidth which will be shared by many users is also a very important problem. As it is typical for games in such a problem class the interaction among the users on their individual strategies has to be imposed. The game-theoretic approach can also be applied in a problem of power control in radio systems.
5 Applications and Related Topics
In this chapter, we describe some practical discrete control problems for which the game-theoretical approach can be applied. The most important results are related to the so called Technology-Emissions-Means-(TEM-) model and especially Kyoto Games. We show that the proposed conceptual framework of dynamic games can be used for the analysis and optimization of the considered practical problems.
5.1 Analysis and Control of Time-Discrete Systems: Resource Planning - The TEM Model This section is concerned with time-discrete dynamical systems whose dynamics are described by a system of vector-difference equations involving state and control vector functions. For example, the Technology-Emissions-Means(TEM-) model is treated. The problem of null-controllability is described and solved exploiting the Kalman condition. Sufficient conditions for the solvability of the null-controllability problem are summarized. They are described in detail in [50]. There, the problem is solved using the solution of a suitable approximation problem. A special case is treated in [35, 49]. This case exploits the fact that the system matrix is inverse monotone. For the TEM model it is possible to prove the economic fact that if every player is an opponent of every other and if his own contribution to achieve his goal is greater than the negative sum of the contributions of his opponents, then everybody can reach the absolute minimum of his costs. This result is generalized. The solution is characterized in the context of the dual problem. At the end of the section, a theoretical and algorithmic determination of a Nash equilibrium is presented. The theoretical and algorithmic results concerning the TEM model and general time-discrete systems are embedded in [51] into the theory of La Salle exploiting limit sets and the general theory of invariants.
234
5 Applications and Related Topics
5.1.1 Motivation The conferences of Rio de Janeiro in 1992 and Kyoto in 1997 mandated new economic instruments which focus on environmental protection in macro and micro economic situations. An important economic tool for the Kyoto Treaty is Joint-Implementation. It is a program which intends to strenghten international cooperation between enterprises to reduce CO2 -emissions [96]. A sustainable development can only be guaranteed if the program is embedded into an optimal energy management program. For that reason, the TEM model was developed, providing the possibility to simulate such an extraordinary market situation. The realization of Joint-Implementation (JI) is restricted by technical and financial constraints. In a JI program, the reduced emissions resulting from technical cooperations are recorded at the Clearing House. The TEM model integrates the effects of simulation of both the technical and the financial parameters. In [90] the TEM model is treated as a time-discrete control problem. The analysis of the feasible set of solutions is examined in [91]. In the following, a short introduction to the TEM model is given. Furthermore, we present a new bargaining approach based on game theory which leads to a procedure for international emissions trading within the so-called Kyoto game (see Chapter 3). 5.1.2 The Basic Model The TEM model describes the economic interaction between several players (sometimes called actors) who intend to maximize their reduction of emissions (Ei ) caused by technologies (Ti ) through expenditures of money or financial means (Mi ). The index i stands for the i-th player, i ∈ {1, . . . , n}. The players are linked by technology and by the market. The effectiveness measure parameter emij measures the effect on the emissions of the i-th player when the j-th player invests money for his technologies. We can say that it expresses how effective technology cooperations are (such as an innovation factor), which are the central elements of a JI program. The variables ϕi can be regarded as memory parameters of the financial investments, and the value λi acts as an economical growth parameter. A detailed description is contained in [90]. The TEM model is represented by the following two equations: Ei (t + 1) = Ei (t) +
n
emij (t)Mj (t),
(5.1)
Mi (t + 1) = Mi (t) − λi Mi (t)[Mi∗ − Mi (t)]{Ei (t) + ϕi ΔEi (t)}.
(5.2)
j=1
The great advantage of the TEM model is that we are able to determine the emij -parameter empirically. (see [38]) In the first equation, the level of the reduced emissions at the t + 1-th time-step depends on the previous value plus a market effect.
5.1 Analysis and Control of Time-Discrete Systems
235
This effect is represented by the additive terms +, which might be negative or positive. In general, Ei > 0 implies that the actors have already reached the demanded value Ei = 0 (normalized Kyoto-level). A value Ei < 0 expresses that the emissions are less than the requirements of the treaty. In the second equation we see that for such a situation the financial means will increase, whereas Ei > 0 leads to a reduction of Mi (t + 1): Mi (t + 1) = Mi (t) − λi Mi (t)[Mi∗ − Mi (t)]{Ei (t) + ϕi ΔEi (t) }. The second equation contains a logistic functional dependence on Mi and the memory parameter ϕi which describes the effect of the preceding investment of financial means. The dynamics does not guarantee that the parameter Mi (t) is less than Mi∗ , which can be regarded as the budget for the i-th actor. To ensure against bankruptcy we have additionally to impose the following restrictions on the dynamical representation: 0 ≤ Mi (t) ≤ Mi∗ ,
i = 1, . . . , n
and t = 0, . . . , N.
These restrictions ensure that the financial investments can neither be negative nor exceed the budget of each actor. Now, it is easy to see that for Mi (t) −λi Mi (t)[Mi∗ − Mi (t)] ≤ 0
for
i = 1, . . . , n
and t = 0, . . . , N.
We have guaranteed that Mi (t + 1) increases if Ei (t) + ϕi ΔEi (t) ≤ 0 and it decreases if Ei (t) + ϕi ΔEi (t) ≥ 0. By incorporating the memory parameter ϕi , we have developed a reasonable model for the money expenditure emission interaction, where the influence of the technologies is integrated in the emmatrix of the system. We can use the TEM model as a time-discrete ecological model where we start with a given parameter set and observe the resulting trajectories. Usually, the actors start with negative values of Ei , i.e., they are below the baseline mentioned in the Kyoto Protocol. They try to reach a positive value of Ei . In order to reach steady states, which are determined in [90], an independent institution may coordinate trade relations between the actors (clearing house mechanism). For simplicity, it is assumed that ϕi = 0, i = {1, ..., n} 1 . Then we get the following system: Ei (t + 1) = Ei (t) +
r
emij Mj (t),
j=1
Mi (t + 1) = Mi (t) − λi Mi (t)(Mi∗ − Mi (t))Ei (t)
(5.3)
for i = 1, . . . , r and t ∈ N0 . 1
This assumption is only feasible if ΔEi Ei which is valid at the beginning of a JI program.
236
5 Applications and Related Topics
For t = 0 we assume the system to be in the state E0i , M0i , i = 1, . . . , r which leads to the initial conditions Ei (0) = E0i
and Mi (0) = M0i for i = 1, . . . , r.
In [90, 93] the economic background is described in detail. Let us concentrate here on the analytical background before we introduce the control theoretic aspects and the optimization part. If we define x = (E T , M T )T , E, M ∈ Rr , and functions fi : Rn → Rn , i = 1, . . . , n = 2r by fi (x) = Ei +
r
emij Mj ,
i = 1, . . . , r,
(5.4)
i = r + 1, . . . , n,
(5.5)
j=1
fi (x) = Mi − λi Mi (Mi∗ − Mi )Ei , then we can write (5.3) in the form x(t + 1) = f (x(t)),
t ∈ N0 ,
(5.6)
ˆ T , ΘT )T , E ˆ∈ where f (x) = (f1 (x), . . . , fn (x))T . For every fixed point x ˆ = (E r r R , we have x ˆ = f (ˆ x). Let x ˆ be any such fixed point of f . Then we replace the system (5.6) by the linear system x)x(t), t ∈ N0 , (5.7) x(t + 1) = Jf (ˆ where the Jacobian matrix Jf (ˆ x) is given by % & Ir C Jf (x) = Or D where Ir and Or is the r × r − unit and −zero matrix, respectively, and ⎞ em11 · · · em1r ⎟ ⎜ C = ⎝ ... . . . ... ⎠ emr1 · · · emrr ⎛
with
⎛ and
ˆi , dii = 1 − λi Mi∗ E
⎜ D=⎝
0
d11 .. 0
.
⎞ ⎟ ⎠
drr
i = 1, . . . , r.
In [51] the behavior of the eigenvalues of Jf (x) is treated in detail. If we consider the eigenvalues and corresponding eigenvectors of the Jacobi matrix, it is proved in [51] that the zero-sequence (x(t) = Θn )t∈N0 which satisfies (5.7) is stable. The question now arises, how can one influence (control) such models presented so far. Control means here, how one can calculate additional investments which guarantee that after a certain time-step the system tends to a
5.1 Analysis and Control of Time-Discrete Systems
237
certain behavior (or fixed point). In the following, although we concentrate only on the TEM model, the control theoretic issues are described in a more general form in [50, 51]: Firstly, in [50] we are given sufficient conditions for the solvability of this problem of controllability . Then we develop a stepwise game-theoretical method for its solution. In the cooperative case this method can be combined with the solution of a suitable approximation problem which thereby leads to a solution of the problem of controllability within the smallest number of time-steps, if the problem is solvable. Furthermore, [50] contains a stepwise non-cooperative game theoretical solution. In the following, we present some results concerning the TEM model. 5.1.3 Control Theoretic Part The emission reduction model was introduced as a controlled model in [50], [90] and [91] in the following way: We add the conditions 0 ≤ Mi (t) ≤ Mi∗
for all t ∈ N0 and i = 1, . . . , r
and the initial conditions Ei (0) = E0i
and Mi (0) = M0i for i = 1, . . . , r
where E0i ∈ R and M0i ∈ R with 0 ≤ M0i ≤ Mi∗ for i = 1, . . . , r are given. The corresponding general controlled system reads in this case Ei (t + 1) = Ei (t) +
r
emij (Mj (t) + uj (t)),
j=1
Mi (t + 1) = Mi (t) + ui (t) − λi (Mi (t) + ui (t))(Mi∗ − Mi (t) − ui (t))Ei (t) for i = 1, . . . , r and t ∈ N0 . The control functions ui : N0 → R, i = 1, . . . , r, must satisfy the conditions 0 ≤ Mi (t) + ui (t) ≤ Mi∗ for i = 1, . . . , r and t ∈ N0 . ˆ ∈ Rr ˆ T , ΘrT )T with E Fixed points of the system (5.3) are of the form (E r arbitrary. Now we have to find a vector function v : N0 → R , vi (t) := Mi (t) + ui (t) with Θr ≤ v(t) ≤ M ∗ for t = 0, . . . , N − 1, v(t) = Θr for t ≥ N such that the solution E : N0 → Rr of E(t + 1) = E(t) + Cv(t),
t ∈ N0 , (C = (emij )i,j=1,...,r )
238
5 Applications and Related Topics
E(0) = E0 , satisfies ˆ E(t) = E
for all t ≥ N
where N ∈ N is to be determined. An algorithmic procedure and numerical solution are described in detail in [51]. If we regard the linearized version, we can solve the underlying control problem in terms of the problem of null-controllability in a very elegant way to be described in the next section. 5.1.4 Problem of Fixed Point Controllability and Null-Controllability In order to motivate the approach let us consider at first an abstract linear system of the emission reduction with x(t + 1) = Ax(t) + Bu(t),
t ∈ N0 ,
(5.8)
where A is a real n × n − matrix and B a real n × m − matrix and where u : N0 → Rm is a given control function. The corresponding uncontrolled system reads now (5.9) x(t + 1) = Ax(t), t ∈ N0 , and admits x ˆ = Θn as a fixed point. The problem of fixed point controllability is then equivalent to the problem of null-controllability . Given x0 ∈ Rn , find some N ∈ N0 and a control function u ∈ U with U = {u : N → Rm | u(t) ∈ Ω for all t ∈ N0 } and Ω ⊆ Rm be a subset with θm ∈ Ω and u(t) = θm for all t ≥ N such that the solution x : N0 → Rn of (5.8) with x(0) = 0 satisfies the end condition x(N ) = Θn
(5.10)
(which implies x(t) = Θn for all t ≥ N ). An algorithmic method for solving the general problem of null-controllability is presented in [50, 51, 92]. The method can be applied to the linearized TEM-model in the following way: ˆ ∈ Rr , of the ˆ T , θrT )T , E If we linearize our system at a fixed point (E uncontrolled system, we obtain a linear control system of the following form % x(t + 1) = Ax(t) + Bu(t),
t ∈ N0 ,
with
A=
& % & Ir C C , B= , D 0r D
where Ir and 0r is the r × r-unit and zero-matrix, respectively, and
5.1 Analysis and Control of Time-Discrete Systems
239
⎞ ⎞ ⎛ ⎛ 0 em11 . . . em1r 1 − λ1 M1∗ Eˆ1 ⎟ ⎜ .. ⎟ , D = ⎜ .. C = ⎝ ... ⎠. ⎝ . . ⎠ ∗ ˆ emr1 . . . emrr 0 1 − λ r M r Er This implies k
A B=
%
C(Ir + D + . . . + Dk ) Dk+1
& for all k ∈ N0 .
Let us assume that C and D are non-singular. Then it follows that the matrices A and % & C C(Ir + D) D D2 are non-singular which implies that the Kalman condition, i.e. there exists some N0 ∈ N such that rank (B|AB| . . . |AN0−1 B) = n , is satisfied for N0 = 2. Let d1 , . . . , dr be the diagonal elements of D. Thus non-singularity of D is equivalent to di = 0 for all i = 1, . . . , r. If all di = 1 for i = 1, . . . , r, then it follows that the eigenvectors corresponding to the eigenvalues μi = 1 for i = 1, . . . , r and μi+r = di for i = 1, . . . , r of A are linearly independent which also holds true for AT . If |di | ≤ 1 for all i = 1, . . . , r and Ω = {u ∈ Rr | u ≤ γ} for some γ > 0 where · is any norm in Rr , then it may be proven that the problem of null-controllability has a solution for every choice of x0 = T T (x10 , x20 ) ∈ R2r . A solution can be obtained solving the following optimization problem (the transformation is described in detail in [51]: For a given N ∈ N solve min
u:{0,...,N −1}→Rr
s.t.
ϕN (u) =
max u(k − 1)2
k=1,...,N
& N % C(Ir + D + . . . + Dk ) k=1
%
Dk+1
u(k − 1) =
I C(Ir + D + . . . + DN −1 ) =− r DN 0r
&%
x10 x20
&
(where ·2 denotes the Euclidean norm in Rr ). Numerical examples are contained in [52]. In [51] the general problem of controllability is discussed. The problem of reachability and stability for such systems is treated from a theoretical and algorithmic point of view. In [95] the control theoretic approaches are extended to game-theoretic issues considering the economic constraints for optimal investment planning.
240
5 Applications and Related Topics
5.1.5 Optimal Investment Parameter In the following an approach is presented which determines in a direct way optimal investment parameters. In a first, step we separate the control problem. We begin with an uncontrolled system of the form x1 (t + 1) = g1 (x1 (t), x2 (t)) x2 (t + 1) = g2 (x1 (t), x2 (t)),
t ∈ N0 ,
(5.11)
where gi : Rn1 × Rn2 → Rni , i = 1, 2, are given continuous mappings and xi : N0 → Rni , i = 1, 2, are considered as state functions. For t = 0 we assume initial conditions x1 (0) = x10 ,
x2 (0) = x20
(5.12)
where x10 ∈ Rn1 and x20 ∈ Rn2 are given vectors with Θn2 ≤ x20 ≤ x2∗ for some x2∗ ≥ Θn2 which is also given. We further assume that the system T T ˆ2 )T ∈ Rn1 × Rn2 with above admits fixed points (ˆ x1 , x Θn2 ≤ x ˆ2 ≤ x2∗ which are then solutions of system x ˆ1 = g1 (ˆ x1 , x ˆ2 ),
x ˆ2 = g2 (ˆ x1 , x ˆ2 ).
Now we consider again the following Control Problem: Find vector functions x1 : N0 → Rn1 and x2 : N0 → Rn2 with Θn2 ≤ x2 (t) ≤ x2∗
for all t ∈ N0
(5.13)
which satisfy the system equations (5.11) (later on, these equations will be separated), the initial conditions as well as x1 (t) = x ˆ1 ,
x2 (t) = x ˆ2
for all t ≥ N
where N ∈ N0 has to be determined. In general, this problem will not have a solution. Therefore, we replace the uncontrolled system by the following controlled system: x1 (t + 1) = g1 (x1 (t), x2 (t) + u(t)), x2 (t + 1) = g2 (x1 (t), x2 (t) + u(t)),
t ∈ N0 ,
(5.14)
5.1 Analysis and Control of Time-Discrete Systems
241
where u : N0 → Rn2 is a control function. Then we consider the problem of finding a control function u : N0 → Rn2 such that the solutions x1 : N0 → Rn1 and x2 : N0 → Rn2 of (5.13) and (5.12) satisfy the conditions Θn2 ≤ x2 (t) + u(t) ≤ x2∗ and
ˆ1 , x1 (t) = x
for all t ∈ N0
x2 (t) + u(t) = x ˆ2
for all t ≥ N
where N ∈ N0 has to be determined. In the following, we decompose the problem. Decomposition of the Control Problem In the following the control problem of (5.13) is decomposed into two parts. It will be shown that by such a decomposition a solution can be gained. Problem A: Let us assume that we can find a vector function v : N0 → Rn with Θn2 ≤ v(t) ≤ x2∗
for t = 0, . . . , N − 1 and v(t) = x ˆ2 for t ≥ N (5.15)
such that the solution x1 : N0 → Rn1 of x1 (t + 1) = g1 (x1 (t), v(t)),
t ∈ N0 ,
x1 (0) = x10 satisfies
ˆ1 x1 (t) = x
for all t ≥ N
where N ∈ N has to be determined. Remark 5.1. The assumption indicates that for t ≥ N the parameter v(t) has reached the fixed point x ˆ2 . If such a parameter exists, the control parameter u(t) can be determined. Now it remains to solve Problem B: After we have solved Problem A we put x2 (0) = x20 , x2 (t + 1) = g2 (x1 (t), v(t))
for all t ∈ N0
and define now u(t) = v(t) − x2 (t)
for t ∈ N0 .
(5.16)
242
5 Applications and Related Topics
Remark 5.2 (Decomposition Principle). The parameter v(t) was characterized in Problem A. Via (5.16) u(t) can be calculated. With these definitions we obtain a solution of the control problem above in (5.13). Thus, in order to find such a solution we have to find a vector function v : N0 → Rn2 with Θn2 ≤ v(t) ≤ x2∗ for t = 0, . . . , N − 1, v(t) = x ˆ2 for t ≥ N such that the solution x1 : N0 → Rn1 of x1 (t + 1) = g1 (x1 (t), v(t)),
t ∈ N0 ,
x1 (0) = x10 satisfies
x1 (t) = x ˆ1
for all t ≥ N
where N ∈ N has to be determined. Example. Let us demonstrate all this by the emission reduction model to which we add the conditions 0 ≤ Mi (t) ≤ Mi∗
for all t ∈ N0 and i = 1, . . . , r
and the initial conditions Ei (0) = E0i
and
Mi (0) = M0i
for i = 1, . . . , r
where E0i ∈ R and M0i ∈ R with 0 ≤ M0i ≤ Mi∗ for i = 1, . . . , r are given. The corresponding controlled system (5.13) reads in this case Ei (t + 1) = Ei (t) +
r
emij (Mj (t) + uj (t)),
j=1
Mi (t + 1) = Mi (t) + ui (t) − λi (Mi (t) + ui (t))(Mi∗ − Mi (t) − ui (t))Ei (t) for i = 1, . . . , r and t ∈ N0 . The control functions ui : N0 → R, i = 1, . . . , r, must satisfy the conditions 0 ≤ Mi (t) + ui (t) ≤ Mi∗ for i = 1, . . . , r and t ∈ N0 . ˆ T , ΘrT )T with E ˆ ∈ Rr Fixed points of the system (5.3) are of the form (E r arbitrary. We have to find a vector function v : N0 → R with Θr ≤ v(t) ≤ M ∗ for t = 0, . . . , N − 1,
5.1 Analysis and Control of Time-Discrete Systems
243
v(t) = Θr for t ≥ N such that the solution E : N0 → Rr of E(t + 1) = E(t) + Cv(t),
t ∈ N0 , (C = (emij )i,j=1,...,r )
E(0) = E0 , satisfies ˆ E(t) = E
for all t ≥ N
where N ∈ N has to be determined. (This is again the decomposition A.) Within this part of the problem it can be observed that for every N ∈ N "N −1 # E(N ) = E0 + C v(t) t=0
Let us assume that C = (cij )i,j=1,...r is inverse monotone, i.e., C is invertible ˆ ≥ E0 . (This is feasible from and C −1 is positive2 . Further we assume that E an economic point of view [53].) ˆ if and only if Item: Then E(N ) = E, If we define
N −1 t=0
ˆ − E0 ) ≥ Θr . v(t) = C −1 (E
v(t) = Θr
for all t ≥ N,
ˆ E(t) = E
for all t ≥ N.
then Let us put vN =
N −1
ˆ − E0 ). v(t) = C −1 (E
t=0
If we define v(t) = then
1 vN N N −1
for t = 0, . . . , N − 1
(5.17)
ˆ − E0 ) v(t) = C −1 (E
t=0 2
The matrix C is inverse monotone if cii > 0, i = 1, . . . , r, cij < 0, i = j, i = P 1, . . . , r and cii ≥ i=j,i=1,...,r −cij . For that reason the inverse monotone property represents the economic scenario where every actor can be interpreted as an opponent of every other (cij < 0, i = j, i = 1, . . . , r). Furthermore, his own effect P is greater than the negative sum of the contributions of his opponents (cii ≥ i=j,i=1,...,r −cij ).
244
5 Applications and Related Topics
and
Θr ≤ v(t) ≤ M ∗
for sufficiently large N , if
Mi∗
for t = 0, . . . , N − 1
> 0 for i = 1, . . . , r.
Remark 5.3. According to the item mentioned above we obtain only information about the sum of the control parameters. The optimal investment parameters v(t) are determined according to (5.17). The parameter N can be interpreted as an economic restriction. It is only possible to guarantee that θr ≤ v(t) ≤ M ∗ if N is sufficiently large. The parameters v(t) are feasible and ˆ for all t ≥ N . E(N ) = E 5.1.6 A Game-Theoretic Extension Relation to Multilayered Decision Problems Now we consider the variables as actors who control the system as players. We use this expression because now each actor (player) is interested in minimizing his own costs and we get a game-theoretic situation. We summarize that approach and present a characterization of the solution: The players have to find now a cost vector function v : N0 → Rr with Θr ≤ v(t) ≤ M ∗ v(t) = Θr such that C
"N −1
for t = 0, . . . , N − 1, for t ≥ N for some N ∈ N
(5.18)
# v(t)
ˆ − E0 , =E
C = (emij )i,j=1,...,r .
t=0
Let us replace this condition by # "N −1 ˆ − E0 v(t) ≥ E C
(5.19)
t=0
and neglect the requirement v(t) ≤ M ∗ for t = 0, . . . , N − 1 which in case Mi∗ > 0 for i = 1, . . . , r can always be satisfied, if we can find v : N0 → Rr with (5.18), (5.19). Remark 5.4 (Feasibility). This neglection is feasible according to the decomposition principle of the control problem into the two Problems A and B in the last paragraph. In contrast to that approach we present now an algorithmic ˆ t ≥ N , i.e., the actors reduce more CO2 solution to reach a value E(t) ≥ E, than it is demanded. If we put cij = emij , i, j = 1, . . . , r,
x=
N −1 t=0
v(t),
ˆ − E0 , b=E
5.1 Analysis and Control of Time-Discrete Systems
245
then (5.19) can be written in the form r
cij xj ≥ bi ,
i = 1, . . . , r,
(5.20)
j=1
and we have to find a vector x ∈ Rr with xi ≥ 0 for i = 1, . . . , r
(5.21)
such that the inequalities (5.20) are satisfied. N −1 Now each player is interested in minimizing his costs xi = t=0 vi (t), i = 1, . . . , r. Let us assume that the players try to minimize the total cost s(x) =
r
xj
(5.22)
j=1
under the constraints (5.20), (5.21). Then we get the following typical problem of linear programming 3 : r
min s(x) =
xj
j=1
s.t.
r
cij xj ≥ bi ,
i = 1, . . . , r and xi ≥ 0
for i = 1, . . . , r.
j=1
Let us assume that x ˆ ∈ Rr is a solution of this problem. If we choose, for any i ∈ {1, . . . , r}, some xi ≥ 0 such that r
ckj x ˆj + cki xi ≥ bk
j = 1 j = i
then it follows that
r
j=1
x ˆj ≤
r
j=1,j=i
for k = 1, . . . , r, x ˆj + xi and therefore x ˆi ≤ xi .
Remark 5.5 (Optimal Solution). Every solution (x1 , x2 , . . . , xr ) of (5.20), (5.21) which minimizes (5.22) has the following property: If the i-th player declines from his choice of costs whereas all the others stick to theirs, he can at most do it not worse or, equivalently, there is no advantage in changing his costs. Every actor tries to minimize his own contribution xi , i = 1, . . . , r. Furthermore, he has to fulfill (5.22). If xi declines from his choice and (5.22) has to be fulfilled, then there is at least one player who has to increase his amount. Hence, the solution (x1 , x2 , . . . , xr ) is optimal according to (5.22), i.e., according to the total cost of the players. 3
If the efficiency parameters are assumed to be non-negative then there always exists a solution of the linear program. Unfortunately, there might occur cases where some parameters are negative.
246
5 Applications and Related Topics
It is possible to characterize this solution regarding the dual problem. The dual problem to the problem of minimizing (5.22) subject to (5.20), (5.21) consists of max t(y) =
r
bi y i ,
y ∈ Rr
i=1 r
s.t.
(5.23)
cij yi ≤ 1,
j = 1, . . . , r
(5.24)
and yi ≥ 0,
i = 1, . . . , r.
(5.25)
i=1
For y1 = y2 = . . . = yr = 0 the side conditions (5.24), (5.25) are satisfied. If we assume that there exists some x ∈ Rr which satisfies (5.20), (5.21), then there exists a solution x = x ˆ ∈ Rr of (5.20), (5.21) which minimizes (5.22) r and a solution y = yˆ ∈ R of (5.24), (5.25) which maximizes (5.23) and we obtain s(ˆ x) = t(ˆ y ) which is equivalent to the two implications ⎧ r ⎪ ⎪ ˆ > 0 =⇒ cij yˆi = 1, ⎪x ⎪ ⎨ j i=1
(CSL)
r ⎪ ⎪ ⎪ ⎪ y ˆ > 0 =⇒ cij x ˆ j = bi . ⎩ i i=1
On introducing slack variables zj ≥ 0 for j = 1, . . . , r,
(5.26)
condition (5.24) can be rewritten in the form zj +
r
cij yi = 1,
j = 1, . . . , r,
(5.27)
i=1
and the dual problem is equivalent to maximizing r
0 · zj +
j=1
r
bi y i
(5.28)
i=1
subject to (5.25), (5.26), (5.27). This problem can be immediately solved with aid of the simplex method starting with the feasible basis solution zj = 1, j = 1, . . . , r
and yi = 0, i = 1, . . . , r.
A special case is treated in [35, 49]:
5.1 Analysis and Control of Time-Discrete Systems
247
Special case: Inverse Monotone Matrix Let bj ≥ 0 and cjj > 0 for all j = 1, . . . , r. If we assume that, for some j ∈ {1, . . . , r}, cji ≤ 0
for all i = 1, . . . , r
with
i = j,
i.e., the player j can be considered as an opponent of all the others, then it follows for the solution x ˆ ∈ Rr of (5.20), (5.21) which minimizes (5.22) that r
cjk x ˆ k = bj .
k=1
For, otherwise (ˆ x1 , . . . , x ˆj−1 , x∗j , x ˆj+1 , . . . , x ˆr ) with ⎛ ⎞ r 1 ⎜ ⎟ x∗j = cjk x ˆk ⎠ < x ˆj ⎝bj − cjj k = 1 k = j
also solves (5.20), (5.21) and it follows x∗j + r dicting the minimality of k=1 x ˆk . Now we assume that
r
k=1,k=j
x ˆk <
r
k=1
x ˆk contra-
cji ≤ 0 for all j = i, i.e., every player can be considered as an opponent of every other. Then it follows that r cjk x ˆk = bj for all j = 1, . . . , r . k=1
If in addition we assume that r
cij > 0
for all i = 1, . . . , r,
j=1
then cjj > 0 for all j = 1, . . . , r and the matrix C = (cij )i,j=1,...,r is inverse monotone, i.e., C −1 exists and C −1 is positive. This implies x ˆ = C −1 b ≥ Θr . Item: If x ∈ Rr is any solution of (5.20), (5.21), then it follows that x ≥ C −1 b = x ˆ,
i.e.,
xi ≥ x ˆi
In other words, this means the following:
for i = 1, . . . , r.
248
5 Applications and Related Topics
Remark 5.6. If every player is an opponent of every other and if his own contribution to achieve his goal is greater than the negative sum of the contributions of his opponents, then everybody can reach the absolute minimum of his costs. This is a realistic economic situation. Actors are only interested in such emission reduction programs if the positive effect in such situations is higher than the negative one. Nevertheless, there is a need for optimal investment planning. We conclude this section with an algorithmic approach to determine an optimal solution in the sense of Remark 5.6 in a direct way. Algorithmic Solution for the Determination of an Optimal Solution Let us conclude this section with a direct method for the determination of an optimal solution in the sense of Remark 5.5, i.e., of an x ˆ ∈ Rr with x ˆ ≥ Θr and r cij x ˆj ≥ bi , i = 1, . . . , r (5.29) j=1
such that the following is true: If for an arbitrary i ∈ {1, . . . , r} there exists some xi ≥ 0 with r
ckj x ˆj + cki xi ≥ bk ,
k = 1, . . . , r,
(5.30)
j = 1 j = i
then it follows that x ˆi ≤ xi . In order to determine such an optimal solution we apply an iterative method as follows: Starting with a vector x0 ≥ Θr which satisfies (5.29) with ˆ, we construct a sequence (xL )L∈N0 with L = l · r + i , l ∈ N0 , x0 instead of x i = 1, . . . , r − 1 in the following manner: If xL ≥ Θr with (5.29) for xL instead of x ˆ is given, then we minimize xi ∈ R, i = 1, . . . , r subject to xi ≥ 0 and get the following problem:
min
xi ∈R
r
ckj xL j + cki xi ≥ bk ,
k = 1, . . . , r
(5.31)
j = 1 j = i
s.t.
xi ≥ 0.
(5.32)
This problem has a solution x∗i ≥ 0 which can be explicitly calculated if cii > 0 for all i = 1, . . . , r, as we shall see later and for which x∗i ≤ xL i holds true.
5.1 Analysis and Control of Time-Discrete Systems
If we define xL for L+1 j = xj x∗j for
j= i, j=i
where
(l + 1)r , L+1= l · r + i + 1,
249
if i = r − 1, if i < r − 1,
then xL+1 ≥ Θr satisfies (5.29) with xL+1 instead of x ˆ and xL+1 ≤ xL . The latter implies the existence of x ˆ = lim xL+1 ≤ xL L→∞
for all L ∈ N0
which satisfies (5.29).
Assertion: x ˆ is an optimal solution which fulfills (5.29) and (5.30), if cii > 0 for all i ∈ {1, . . . , r}.
(5.33)
Proof. Assume that x ˆ is not such an optimal solution. Then there is some i ∈ r {1, . . . , r} and some xi ≥ 0 such that j=1,j=i ckj x ˆj +cki xi ≥ bk , k = 1, . . . , r ˆi . This implies and xi < x r
cij x ˆj + cii x ˆi > bi .
(5.34)
j = 1 j = i
If we define a subsequence (Ll )l∈N0 by Ll = l · r + i, then we obtain r
Ll +1 l ckj xL ≥ bk j + cki x
for all
k = 1, . . . , r
and all l ∈ N0 .
j = 1 j = i
In particular it follows that r
Ll +1 l cij xL = bi j + cii xi
for all l ∈ N0
j = 1 j = i l +1 (for, otherwise xL could be chosen smaller) which implies i
r
cij x ˆj + cii x ˆ i = bi
j = 1 j = i
contradicting (5.34). Hence, the assumption is false and x ˆ is an optimal solution which fulfills (5.29) and (5.30). Remark 5.7. The assumption that cii > 0 for all i ∈ {1, . . . , r} indicates that financial investments of one actor have no negative effect on his own reductions. From an economic point of view this is plausible.
250
5 Applications and Related Topics
5.2 Algorithmic Solutions for an Emission Reduction Game: The Kyoto Game This section describes a cooperative approach to the CO2 -problem. The Kyoto Protocol goal is for every actor to achieve certain states [92, 96]. This demand leads to a special structure of the feasible sets which define a control problem. In the previous section, there were no (game-theoretic) restrictions on the feasible sets (except they should be not empty). Now they shall be characterized from an economic point of view. This leads to the concept of a core of optimal coalitions between actors for which there is no advantage in forming additional coalitions. If we regard the core concept as an obvious concept, necessary and sufficient conditions for the existence are essential [52]. The main part consists in solving the control problem regarding only feasible sets which are represented by the core. A special bargaining solution is then taken as the control parameter and the underlying equivalence problem is solved [91]. At the end of the section, general transformation principles for such structures are summarized [51]. 5.2.1 The Core in the TEM Model Let us assume that we have found a cost vector function v : N0 → Rr which satisfies (5.18) and (5.19) of the last section. Then the controlled costs are given by Mi (t + 1) = vi (t) − λi vi (t)(Mi∗ − vi (t))Ei (t) = vi (t) − λi vi (t)(Mi∗ − vi (t))(Ei (t − 1) +
r
emij vj (t − 1))
j=1
for i = 1, . . . , r and t ∈ N. Now let K be any subset of N = {1, . . . , r} and, for any t ∈ {1, . . . , N −1}, let cK (t − 1) = (cK ij (t − 1))i,j=1,...,r be a non-negative r × r-matrix with cK ii (t − 1) = 0 for i = 1, . . . , r, cK ij (t − 1) > 0 for i, j ∈ K(i = j). If we define, for every t ∈ {1, . . . , N − 1}: K c˜K ij (t − 1) = emij + cij (t − 1),
c˜K ij (N − 1) = emij then
N t=1
C˜ K (t − 1)v(t − 1) ≥ C
for i, j = 1, . . . , r, "N −1 t=0
# v(t)
ˆ − E0 . ≥E
5.2 Algorithmic Solutions for An Emission Reduction Game
251
Hence, condition (5.19) is also satisfied, if we replace, for every t ∈ {0, . . . , N − cK 1}, the matrix C by C˜ K (t) = (˜ ij (t))i,j=1,...,r . Then, the controlled costs are given by MiK (t + 1) = vi (t) − λi vi (t)(Mi∗ − vi (t))(Ei (t − 1) +
r
c˜K ij (t − 1)vj (t − 1))
j=1
for i = 1, . . . , r and t = 1, . . . , N − 1, and it follows that MiK (t + 1) ≤ Mi (t + 1)
for all i = 1, . . . , r and t = 1, . . . , N − 1.
If we define, for every K ⊆ N and every t ∈ {1, . . . , N − 1}: vt (K) =
r
(Mi (t + 1) − MiK (t + 1))
i=1
=
r
λi vi (t)(Mi∗ − vi (t))
i=1
r
cK ij (t − 1)vj (t − 1),
j=1
then vt (φ) = 0. Now the function vt : 2r → R+ can therefore be interpreted as the payoff function of a cooperative r-persons game. The subsets K of N can be interpreted as coalitions which are built by the players by changing the matrix C of mutual influence to the matrix C˜ K (t − 1) for t = 1, . . . , N − 1 whereby they guarantee that the controlled costs are diminished. If i ∈ K, K then cK ij (t − 1) = 0 for all j ∈ N and therefore, Mi (t + 1) = Mi (t + 1) so that vt (K) =
(Mi (t + 1) − MiK (t + 1)).
i∈K
In particular we have vt (N ) =
r
(Mi (t + 1) − MiN (t + 1)).
i=1
If we denote the gain of the i-th player, if he joins the coalition K ⊆ N , by vti (K) = Mi (t + 1) − MiK (t + 1) r = λi vi (t)(Mi∗ − vi (t)) cK ij (t − 1)vj (t − 1) j=1
then vt (K) = i∈K vti (K). The Kyoto Protocol intends to advance the grand coalition. For that reason let us assume that vt (N ) ≥ vt (K)
for all K ⊆ N .
252
5 Applications and Related Topics
Then the grand coalition N leads to the largest joint gain vt (N ).4 The quesvt (N ), i.e., tion now is whether there exists a imputation r (x1 , . . . , xr ) of x such that xi ≥ 0 for all i = 1, . . . , r and vt (N ) = i i∈K xi ≥ i=1 vt (K) for all K ⊆ N . This means that there is no incentive to build coalitions which differ from the grand coalition. The set of all such economically important imputations of vt (N ) is called the core of the game. Conditions for the Core to be Non-Empty The existence of a non-empty core is guaranteed, if r
cK ij (t − 1)vj (t − 1) ≤
j=1
r
cN ij (t − 1)vj (t − 1)
j=1
for all i = 1, . . . , r and K ⊆ N . ¿From this condition it follows that vti (K) ≤ vti (N )
for i = 1, . . . , r and all K ⊆ N .
If we, therefore, set xi = vti (N ) for i = 1, . . . , r, then we can conclude that xi ≥ 0 for i = 1, . . . , r,
r
xi = vt (N ) and
i=1
xi ≥ vt (K) for all K ⊆ N ,
i∈K
i.e., (x1 , . . . , xr )T is in the core of vt . In general, the calculation time increases exponentially with the number of the players so it is not easy to show that the core c(vt ) of vt is not empty. For this reason necessary and sufficient conditions for the existence of the core are indispensable. Theoretical results and sufficiency criteria are derived in [51]. In the following, we consider a special bargaining concept as a suitable control parameter. First of all we motivate the fact that such a solution will be preferred within the TEM model. Solving the Semi-Smooth Equivalent Problem A special situation is obtained if we now regard the τ -value [103] as a possible bargaining parameter. In the following, its concept is summarized (for simplicity, we neglect time dependence of the characteristic function). This section answers the question whether it is possible to regard the τ -value as a control parameter of our time-discrete system. The τ -value depends linearly on an upper vector, which can be regarded as a maximal preferred value for each player, and the gap-function which expresses the difference between the preferred and realizable contributions of 4
Later on, criteria are presented which justify such an approach.
5.2 Algorithmic Solutions for An Emission Reduction Game
253
a coalition. This concession vector acts as a resulting individual bargaining interval for each player in such a determined coalition. We consider a general n-players game with the characteristic function v ∈ Gn . Then, we define Definition 5.8. Let v ∈ Gn be a n-persons game, N = {1, . . . , n}. The upper vector bv ∈ Rn is defined by bvi := v(N ) − v(N \{i}) for all i ∈ N . Definition 5.9. Let v ∈ Gn . Then define the gap-function g v : P ot(N ) → we v v R of the game v by g (S) := j∈S bj − v(S) for all S ∈ P ot(N ). The concession vector λv ∈ Rn is defined by λvi := min{g v (S) | S ∈ P ot(N ), i ∈ S}, i = 1, . . . , n. Definition 5.10. Let v ∈ Gn be a n-persons game. The τ -value is defined by τi := bvi − where
λv ({N }) =
i∈N
λvi g v ({N }), λv ({N })
i ∈ N,
λvi is assumed to be positive.
In the following, the class of quasi-balanced games and the class of 1-convex games are introduced. According to emission reduction programs as Joint Implementation the grand coalition of all actors has great importance for the ratification procedure of the Kyoto Protocol. If the concession vector is characterized merely by the grand coalition then small coalitions are protected. The assumption that λi ≥ 0 for all i = 1, . . . , n indicates that every actor has an incentive to join the grand coalition. According to [25] there might occur cases where this might not be observable in reality. This criticism indicates the necessity of a control theoretic approach and monitoring mechanisms. This aspect is described in the following where the τ -value is taken as a control parameter. As the τ -value is a certain allocation method an overview over another cost-allocation method within general cost-games is presented in the following subparagraph: Cost-Games and General Cost Allocation Methods - New Filter Technologies of Power Plants Although there are a lot of allocation models in game-theory literature there is a lack of economic case studies for emission trading scenarios. For that reason a similar economic example should be discussed and compared. Airport cost allocation problems are quite similar to our emission reduction program: If we presume that new filter technologies are necessary to reduce emissions
254
5 Applications and Related Topics
within Joint Implementation programs it is possible to compare the situation with classical airport landing cost allocation problems: Building a new runway and sharing the costs among the users are very similar to the cost allocation problem within the development of new filter technologies of power plants. In [20] a game-theoretic approach to the cost allocation problem of setting airport landing charges for different types of aircraft is considered. The biggest plane has the highest impact on the joint costs. Let us summarize in the following this attempt: As the τ -value is a certain allocation method an overview over another cost-allocation, namely the Shapley value, is presented: Definition 5.11 (General Cost Game). Let N = {1, 2, . . . , n} be a set of users who cooperate in the undertaking of a joint project. For any nonempty subset S of N , let c(S) represents the least cost of that coalition. If we put c(∅) := 0, the resulting cost function c : 2N → R should be subadditive, i.e. c(S) + c(T ) ≥ c(S ∪ T ), S ∩ T = {∅}, S, T ∈ N . This cost function can be interpreted as the characteristic function of the cost game (N ; c). Remark 5.12. We call a vector y ∈ Rn a cost-allocation if it satisfies the efficiency principle according to the amount of the grand coalition, i.e., y = v(N ). i i∈N A very interesting cost allocation problem is the setting of airport landing charges for different types of aircraft. We introduce the airport cost game (c(∅) = 0) by c(S) = max (Cj | 1 ≤ j ≤ m, S ∩ Nj = ∅),
S⊂N
(5.35)
where 0 = C0 < C1 < C2 < . . . < Cm and Nj is the set of landings by planes of type j (j = 1, . . . , m) and N := ∪m j=1 Nj is the set of all landings at the airport. In the following the τ -value and the Shapley-value are determined for this special allocation problem. An economic interpretation is given and a small case study is presented. The reader may judge on his own which principle is more realistic or intuitive. Theorem 5.13 (Driessen86). Let (N ; c) be the airport cost game with c(S) = max(Cj | 1 ≤ j ≤ m, S ∩ Nj = ∅), S ⊂ N where 0 = C0 < C1 < C2 < . . . < Cm . If we set nj := |Nj | | for all 1 ≤ j ≤ m, then we get τi (c) = m k=1
Cj
Cm
for
i ∈ Nj
in case
Cm−1
for
nm ≥ 2.
nk Ck
If nm = 1 and m ≥ 2 then τi (c) =
m−1 k=1
Cj
i ∈ Nj , j = m, and
nk Ck + Cm−1
τˆi (c) = τi (c) + Cm − Cm−1
for ˆi ∈ Nm and i ∈ Nm−1 .
5.2 Algorithmic Solutions for An Emission Reduction Game
255
This recursive scheme can be interpreted as follows [19]: If there are at least two landings by planes of the largest type, the joint costs Cm are allocated in proportion to the runway costs Cj , 1 ≤ j ≤ m. Whenever the planes of the largest type use the runway only once, the incremental cost Cm − Cm−1 is charged to the largest plane. Furthermore, the remaining joint costs Cm−1 are allocated among all landings according to the allocation principle. In that iteration step the largest plane of type m is now regarded as a plane of type m − 1. In comparison to that we consider the Shapley-value which is defined as follows: Definition 5.14 (Shapley-Value, [101]). Let v ∈ Gn be a n-persons game with N = {1, . . . , n}. Then for all i ∈ N the Shapley value is defined by
φi (v) :=
S⊂N \{i}
|S|!(n − |S| − 1)! (v(S ∪ {i}) − v(S)), (n!)
i = 1, . . . , n.
−1 Remark 5.15. The fraction can be written as γn (T ) = n−1 n−1 and |T | then γ (S) = 1 for all i ∈ N . This indicates that the expresS⊂N −{i} n sion {γn (S) | S ⊂ N − {i}} can be considered as a probability distribution over the collection of subsets of N not containing player i. This was one reason for not considering the Shapley value in the emission reduction example. Nevertheless, both values shall be compared in the following: Theorem 5.16. Let (N , c) be the airport cost game of (5.35). If we define m |Nk | for all 1 ≤ j ≤ m then we have for the Shapley value mj := k=j
φi (c) =
j Ck − Ck−1 k=1
mk
for
i ∈ Nj .
This allocation principle indicates that the costs are divided in a successive way: Starting with the cost C1 of a runway adequate for planes of the smallest type: All players have to contribute. In the next phase the incremental costs C2 − C1 of a runway (adequate to receive all landings by planes of the second smallest type) are distributed among all players but the smallest type. Continue thus until finitely the incremental cost Cm −Cm−1 of the largest type is divided equally among the number of landings made by the largest aircraft type. Due to [21] an enlarged class of such airport cost allocation methods is considered for the Shapley value.
256
5 Applications and Related Topics
Remark 5.17 (Airport Cost-Games - Joint Implementation Programs). In airport cost allocation problems game-theoretic principles are used to determine fair cost allocations for the landing fees. In our example the amount τi indicates the cost of a landing of plane i (i ∈ Nj ) on runway j. Each runway stands for a certain class of costs. A similar situation occurs if we consider certain emission reduction programs: It is cheaper to reduce emissions in the installation A than in B. For that reason these technical opportunities can be classified in the same way as in the cost allocation game. For each reduction program the relevant costs can be determined. Concerning the Kyoto Protocol the total amount of emissions has to be reduced. For example, joint costs can be shared according to the recursive scheme which is described in Theorem 5.13. In the following the τ -value is considered and its properties in the class of quasi-balanced games are described. We will see that the quasi-balance property can be assumed to be valid in the beginning of a Joint Implementation program. Quasi-Balanced Games The last paragraph demonstrated the necessity that actors cooperate within such joint projects. The control theoretic approaches ask for stable coalitions. We extend that understanding in the following dynamic way: Player i promises the other members of any coalition S containing player i their utopia payoffs whenever they cooperate with him. Player i will keep the remaining part of the associated worth v(S) and hence, player i himself gets the amount v(S). This amount should be maximized. This amount will be maximized if the concession vector can be determined for each player. In the following, we restrict our attention to cases where these concession amounts λvi , i ∈ N , are nonnegative, i.e., g v (S) ≥ 0 for all S ⊂ N . Furthermore, these maximal concession amounts should total at least as much as the joint concession amount. Remark 5.18. In the beginning of Joint-Implementation programs the actors are seeking for partners and stable cooperations. Even if there is not enough data available at the moment to compare the game-theoretic approach with real-world data, the assumption of quasi-balance and the underlying procedure might be possible. We will see that even for a further restriction of 1-convex games we get a suitable solution. Definition 5.19. The class QB n of quasi-balanced n-persons games is given by QB n := {v ∈ Gn | λv (N ) ≥ g v (N ) n
= {v ∈ G |
λvi
≥0 v
for all v
and i∈N
g v (S) ≥ 0 and
(b − λ )(N ) ≤ v(N ) ≤ bv (N )}.
for all S ∈ P ot(N )}
5.2 Algorithmic Solutions for An Emission Reduction Game
257
Definition 5.20. The class of 1-convex games is characterized by the fact that all actors have the same concession vector λvi with λvi ≥ 0: λvi = g v (N ),
i = 1, . . . , n.
(5.36)
By this definition, the concession vector is characterized merely by the grand coalition, i.e., the gap of the grand coalition N is minimal among the gaps of non-empty coalitions. The concepts of 1-convexity and quasi-balance have important properties according to the non-emptiness of the core. Solution of the Equivalence Problem For quasi-balanced games it has been shown that in the 3-players case the τ value is always contained in the core [19]. If we consider now the existing core as feasible control set we get a modified control problem. As the approach is motivated by the fact that the τ -value is equivalent to the control parameter we call this problem now the equivalence problem. For a derivation of the resulting system, we refer to [91]. The result can be extended for 1-convex games such that the τ -value is always an element of the core and, therefore, also a suitable control vector. For the 3-players case the structure of the equivalence problem is similar to a linear program and can be easily solved. For the 1-convex case the structure of the equivalence problem has the following form: ˜ (˜ ˜ (x, y, z, w)(x, y, z, w)T = M x)˜ xT . u ˜=M
(5.37)
The vector u ˜ may be interpreted as a set of possible control parameters of a time-discrete system, whereas the vector x ˜ can be seen as a bargaining solution of a cooperative game. If x, y, z are the amounts of the 2-players coalitions and ω the amount of the grand coalition, we get the following representation: u1 = w − z −
f (x, y, z, w) (2w − x − y − z), f (x, y, z, w) + g(x, y, z, w) + h(x, y, z, w)
u2 = w − y −
f (x, y, z, w) (2w − x − y − z), f (x, y, z, w) + g(x, y, z, w) + h(x, y, z, w)
u3 = w − x −
f (x, y, z, w) (2w − x − y − z), f (x, y, z, w) + g(x, y, z, w) + h(x, y, z, w)
2w − x − y − z = ,
for some > 0.
(5.38)
according to the 1-convexity condition, f, g and h have the structure [94]: f (x, y, z, w) = min{w − z, 2w − x − y − z} = − max{z − w, x + y + z − 2w}, g(x, y, z, w) = min{w − y, 2w − x − y − z} = − max{y − w, x + y + z − 2w}, h(x, y, z, w) = min{w − x, 2w − x − y − z} = − max{x − w, x + y + z − 2w}.
258
5 Applications and Related Topics
Remark 5.21. If we neglect the 1-convexity condition, (5.37) can be solved via linear programming techniques. As the τ -value should lie in the core we can not omit this property and need another solution strategy. By exploiting the combinatorial structure of max-type functions we will be prepared to apply Newton methods for solving (5.37). Index-Sets and Max-Type Functions Now, our aim is to show that system (5.37) is Lipschitzian and regard the following: Preconsideration (For details see [94]) Let Y ⊆ Rq be a compact index set and a set M ⊆ Rn , where q, n ∈ N, and x → maxy∈Y ζ(x, y) (x ∈ M ) be given. Then, we state that for all x, x ˜ ∈ Rn we have: max ζ(x, y) − max ζ(˜ x, y) ≤ max |ζ(x, y) − ζ(˜ x, y)| . (5.39) y∈Y
y∈Y
y∈Y
(For details see [94].) Applying this preconsideration, we can prove that |f (x, y, z, w) − f (˜ x, y˜, z˜, w)| ˜ ≤ max{|w − w ˜ − (z − z˜)|, |2w − 2w ˜ − ((x − x ˜) + (y − y˜) + (z − z˜))| ≤ 5 max{|x − x ˜|, |y − y˜|, |z − z˜|, |w − w|} ˜ (5.40) and that the same holds true for g and h. This implies for ϕ(x, y, z, w) = f (x, y, z, w) + g(x, y, z, w) + h(x, y, z, w) that |ϕ(x, y, z, w) − ϕ(˜ x, y˜, z˜, w)| ˜ ≤ 15 max{|x − x ˜|, |y − y˜|, |z − z˜|, |w − w|}. ˜ In order to show that ϕ1 is Lipschitzian we can make use of the fact that |ϕ(x, y, z, w) − ϕ(˜ x, y˜, z˜, w)| ˜ 1 1 = − , ϕ(x, y, z, w) ϕ(˜ x, y˜, z˜, w) ˜ |ϕ(x, y, z, w|| ϕ(˜ x, y˜, z˜, w)| ˜ if |ϕ(x, y, z, w)| > 0 and |ϕ(˜ x, y˜, z˜, w)| ˜ > 0. Let M = (x, y, z, w) ∈ R4 | |ϕ(x, y, z, w)| > α for some α > 0 . Then 1 1 ≤ 15 max{|x − x ˜|, |y − y˜|, |z − z˜|, |w − w|} ˜ ϕ(x, y, z, w) − ϕ(˜ x, y˜, z˜, w) ˜ α2 for all (x, y, z, w), (˜ x, y˜, z˜, w) ˜ ∈ M. For the rest of the proof one can make use of the fact that the product of two Lipschitz-continuous functions is also Lipschitz-continuous, if the functions are bounded.
5.2 Algorithmic Solutions for An Emission Reduction Game
259
Remark 5.22. Note that the equivalence problem does not (only) determine the τ -value within the core, but calculates also an optimal control parameter which is feasible (i.e. solves the problem and lies in the core) and is identical to the τ -value. Our aim was to stimulate cooperative behavior. Solving the equivalence problem we get a suitable solution. As the derivation is very theoretical we will seek in the following section for another characterization of the grand coalition. We will see that within the TEM model a second cooperative treatment is possible. 5.2.2 A Second Cooperative Treatment of the TEM Model Let us come back to the problem of minimizing (5.22) subject to (5.20), (5.21). Let S of N = {1, . . . , r}, cj (S) = us define, for every non-empty subset c for j = 1, . . . , r and b(S) = b ij i∈S i∈S i . We assume that, for all nonempty S ⊆ N , ⎧ ⎪ ⎨cij ≥ 0 for all i, j = 1, . . . , r, (5.41) cii > 0 and ⎪ ⎩ bi > 0 for i = 1, . . . , r. For every non-empty S ⊆ N we now consider the problem of xj min j∈N
subject to xj ≥ 0 for j = 1, . . . , r
(5.42)
and r
cj (S)xj ≥ b(S).
(5.43)
j=1
According to the assumption (5.41) the set V (S) of all vectors x ∈ Rr which satisfy (5.42) and (5.43) is non-empty and, since j∈S xj ≥ 0 for all x ∈ V (S), there exists some x(S) ∈ V (S) with ⎫ ⎧ ⎬ ⎨ xj (S) = min xj x ∈ V (S) =: v(S). ⎭ ⎩ j∈S
j∈S
This function v : 2N → R+ can be interpreted as the payoff function of a cooperative r-persons game, if we define v(φ) = 0. The subsets S ⊆ N can be interpreted as coalitions which are built by the players by adding their
260
5 Applications and Related Topics
constraints in (5.20) and minimizing j∈N xj . The value v(S), for every nonempty S ⊆ N , can be determined explicitly. For that purpose we consider the dual problem which consists of max b(S)y cj (S)y ≤ 1 for all j ∈ N and y ≥ 0.
1 This problem has a solution, namely y(S) = max cj (S) j ∈ N , cj (S) > 0 and so v(S) = b(S)y(S). The question now arises what, if any, could be an incentive for the players to join in a grand coalition. For that purpose we divide the minimum cost v(N ) of the grand coalition N into r shares xi ≥ 0, i = 1, . . . , r, i.e., s.t.
r
xi = v(N )
(5.44)
i=1
If this can be done in such a way that for every coalition S ⊆ N we have xi ≤ v(S) (5.45) i∈S
then there is no incentive for them to form coalitions other than the grand one. In such a case we refer to the grand coalition as stable. Now we can prove the following sufficient conditions for cooperative behavior: Item: If the condition (5.41) is satisfied, then the grand coalition is stable. Remark 5.23. The equations (5.44) and (5.45) are identical with the definition of the core of a cooperative game. Here, the relationship to general production games can be observed. Linear production games were introduced in [84]. They are summarized in the following paragraph. The Core of a Linear Production Game We consider a linear production game with n players. Each player has at his disposal a vector bi = (bi1 , bi2 , . . . , bim ), i = 1, . . . , n, of resources bik > 0, k = 1, . . . , m, which he can use to produce goods that can be sold at a given market price. We assume that a unit of the j-th good (j = 1, . . . , p) requires akj ≥ 0 units of the k-th resource (k = 1, . . . , m) and can be sold at a price cj > 0. Let S ⊆ N = {1, . . . , n}, S = ∅, be a coalition. This coalition then has a total of bk (S) = bik i∈S
5.2 Algorithmic Solutions for An Emission Reduction Game
261
units of the k-th resource. Using all of their resources, the members of S can produce vectors (x1 , x2 , . . . , xp ) of goods which satisfy p
akj xj ≤ bk (S) for k = 1, . . . , m
(5.46)
j=1
xj ≥ 0 for j = 1, . . . , p. Under these conditions they want to maximize their profit p
cj xj .
j=1
If we define ⎧ p ⎨
⎫ ⎬ v(S) = max cj xj x ∈ Rp satisfies (5.46) , ⎩ ⎭ j=1 if S is non-empty (in which case the problem of maximizing p
cj xj
j=1
subject to (5.46) has a solution, if for every k ∈ {1, . . . , m} there is at least one j ∈ {1, . . . , p} such that akj > 0), and v(∅) = 0, then v : 2N → R+ is the characteristic function of a cooperative n-persons game. Item: The core of this game is non-empty. Remark 5.24. Owen makes use of the duality theory of linear programming to obtain equilibrium price vectors and to prove the non-emptiness of the core. He shows that in linear production games, a part of the core elements can be found by using shadow prices for the resources. When each player receives a payoff that equals the value of his/her resource bundle under the shadow price, this forms a core element in the production game. Since Owen(1975) defined linear production games arising from situations in which the production process is linear, several generalizations of the general model have been proposed. Generalizations of Owen’s model are given in [17] and [21]. The non-emptiness of the core is a necessary and sufficient condition for the balance of the game. The proof of this statement can be based on the duality theorem for linear programs. Let us introduce the balance of a game before we generalize that approach and present further necessary and sufficient conditions for the non-emptiness of the core.
262
5 Applications and Related Topics
Balanced Games Balanced collections were introduced in [101]. Before we define balance, we introduce the indicator function lS as follows: Definition 5.25 (Indicator Function). Let N = {1, 2, . . . , n} and S ∈ N , S = ∅. For any coalition S ∈ N we define the indicator function lS : N → {0, 1} by lS (i) :=
1 0
if if
i∈S i ∈ N \ S.
(5.47)
Now we are able to define the balance of a collection of distinct nonempty subsets of a coalition. Definition 5.26. Let N = {1, 2, . . . , n} and S ⊂ N , S = ∅. A collection B = {S1 , S2 , ..., Sm } of distinct non-empty subsets of the coalition S is said to be balanced over S if there exist positive numbers w1 , ..., wm such that for all i ∈ S
wj = 1
or equivalently,
m
wj lSj (i) = 1.
(5.48)
j=1
j; i∈Sj
The associated positive numbers are called weights for the balanced collection B. Remark 5.27. The sum of the weights is equal to one if and only if the balanced collection B over S is a partition of the set S. Definition 5.28. A game v ∈ Gn is said to be balanced if for any balanced collection B = {S1 , S2 , . . . , Sm } over the player set N with corresponding weights w1 , w2 , . . . , wm we have m
wj v(Sj ) ≤ v(N ).
j=1
The balance condition for a game requires that it is not advantageous with respect to the earnings in the game to divide the grand coalition N into sets of any balanced collection over N on the understanding that the earnings of the involved sets are weighted according to the corresponding weights. We can exploit that property for the following results:
5.2 Algorithmic Solutions for An Emission Reduction Game
263
Necessary and Sufficient Conditions for the Grand Coalition to be Stable In order to derive further sufficient%and & necessary conditions for the grand r coalition to be stable we denote the subsets of N which have k elements k % & r by Skl , l = 1, . . . , . Then we have the conditions k
xi ≤ v(Skl )
(5.49)
i∈Skl
to be satisfied for l = 1, . . . , r
% & r and k = 1, . . . , r − 1 together with k
xi = v(N ).
i=1
Now let us assume that k v(N ) ≤ min„ « v(Skl ) r l=1,..., r
for k = 1, . . . , r − 1.
k
If we then define xi =
1 v(N ) r
for i = 1, . . . , r
we obtain i∈Skl
% & k r l xi = v(N ) ≤ v(Sk ) for all l = 1, . . . , and k = 1, . . . , r − 1 k r
and
r
xi = v(N ),
i=1
i.e. , {x1 , . . . , xr } is a division of v(N ) of the kind we are looking for. On using the fact that “ ” r
k
l=1 i∈Skl
one can see that
xi =
% & r k r · xi k r i=1
for all k = 1, . . . , r
264
5 Applications and Related Topics “ ” r
1 k v(N ) ≤ % & r r k
k
v(Skl )
for all
k = 1, . . . , r − 1
l=1
are necessary conditions for the existence of a division {x1 , . . . , xr } of v(N ) with xi ≤ v(S) for all S ⊆ N . i∈S
Generalization: Non-Cooperative Treatment Given n players who persue n goals which are given by an n-vector b = (b1 , . . . , bn ). In order to achieve these goals every player has to spend a certain amount of money, say xi ≥ 0 for the i − th player. Every player Pi , i = 1, . . . , n, can be assigned a goal value fi which depends on the cost values of all players and can be described as a function fi : Rn → R for i = 1, . . . , n. The requirement that all players reach their goal is assumed to be given by a system of inequalities of the form fi (x1 , . . . , xn ) ≥ bi for i = 1, . . . , n (5.50) where xi ≥ 0
for
i = 1, . . . , n.
(5.51)
Every player is, of course, interested in minimizing his own costs subject to (5.50), (5.51). This, however, is in general simultaneously impossible. Therefore, we assume as a first step that the players minimize s(x) =
n
xi
(5.52)
i=1
subject to (5.50), (5.51). Let us assume that x ˆ ∈ Rn is a solution of this problem. If we then choose, for any i ∈ {1, . . . , n}, some xi ≥ 0 such that fi (ˆ x1 , . . . , x ˆi−1 , xi , x ˆi+1 , . . . , x ˆ n ) ≥ bi , it follows that
n j=1
and, therefore, x ˆi ≤ xi .
x ˆj ≤
n j = 1 j = i
x ˆj + xi
5.2 Algorithmic Solutions for An Emission Reduction Game
265
Remark 5.29. Thus every solution of (5.50), (5.51) which minimizes (5.52) is an optimal solution in the following sense: If the i-th player declines from his choice of costs whereas all the others stick to theirs, he can at most do worse and at least do no better. Then we can prove: Lemma 5.30. Let us assume that, for every i ∈ {1, . . . , n}, for every finite sequence of vectors x1 , . . . , xm ∈ Rn and numbers λ1 ≥ 0, . . . , λm ≥ 0, it is true that "m # m k λk x ≥ λk fi (xk ). fi k=1
k=1
Further let us assume that fN (x) ≥ fS (x) and fS (x) = i∈S fi (x) for all non-empty S ⊆ N and all x ∈ R with xi ≥ 0 for i = 1, . . . , n. B = 2N \{θ} and for Then v(N ) ≤ S∈B γS v(S) if for every collection every S ∈ B there exists a weight γS ≥ 0 with S∈B,i∈S γS = 1. Generalization: Cooperative Treatment We consider again n players Pi , i = 1, . . . , n, n ≥ 2, who play a game in which every player Pi has at his disposal a (non-empty) set Ui ⊆ Rmi of strategies. They can, however, not necessarily choose their strategies independently of each other. If player Pi chooses ui ∈ Ui for i = 1, . . . , n, then )n the n-tuple (u1 , . . . , un ) is required to lie in a non-empty subset U of i=1 Ui which is assumed to be of the form U=
n
Vi
where Vi ⊆
i=1
n
Uj
for i = 1, . . . , n.
j=1
)n Further every player Pi is assigned a cost function ϕi : j=1 Uj → R+ which he wants to minimize on U . This, however, is in general impossible simultaneously. nTherefore, the players could minimize in a first step the function ϕ(u) = i=1 ϕi (u) for u ∈ U . Remark 5.31. If u ˆ ∈ U is such that ϕ(ˆ u) ≤ ϕ(u) for all u ∈ U , then it is easy to see that u ˆ is a so called Pareto optimum, i.e., if there is any u ∈ U such that ϕi (u) ≤ ϕi (ˆ u) for all i = 1, . . . , n, then it necessarily follows that ϕi (u) = ϕi (ˆ u) for all i = 1, . . . , n.
266
5 Applications and Related Topics
Example: Grand Coalition Let be mi = 1 and Ui = R+ for every i = 1, . . . , n. Further let ϕi : Rn+ → R+ be given by ϕi (u1 , . . . , un ) = ui and let Vi be given by ⎧ ⎨ Vi = u ∈ Rn+ ⎩ Then
⎧ ⎨ U = u ∈ Rn+ ⎩
for i = 1, . . . , n
⎫ ⎬ n c u ≥ b ij j i ⎭ j=1
(5.53)
for i = 1, . . . , n.
⎫ n ⎬ n c u ≥ b for all i = 1, . . . , n = Vi . ij j i ⎭ j=1 i=1
Remark 5.32. The definition of the sets Vi (i = 1, ..., n) and U is strongly related with the equation (5.20). (Please note that in (5.20) r is the number of players.) Condition (5.20) guarantees that "N −1 # ˆ − E0 , C v(t) ≥ E i.e., t=0
the desired fixed point can be reached. Therefore, Vi and U contain the set of feasible control vectors which lead to an optimal solution. In this case the minimization of ϕ(u) =
n
ui
on U
i=1
also leads to a Nash equilibrium u ˆ ∈ U (see for details [51]). Further Sufficient Conditions for a Stable Grand Coalition For every non-empty subset S of N = {1, . . . , n} we choose now a non empty set US ⊆ i∈S Vi (with a property to be specified later) and define inf{ϕ(u) | u ∈ US }, if S is non-empty, v(S) = 0, if S is empty. Then v : 2N → R+ is the payoff function of a cooperative n-persons game. In the special case above we define, for every non-empty S ⊆ N , cj (S) = cij for j = 1, . . . , n and b(S) = bi i∈S
i∈S
5.2 Algorithmic Solutions for An Emission Reduction Game
and set
⎧ ⎨ US = u ∈ Rn+ ⎩
267
⎫ ⎬ n cj (S)uj ≥ b(S) . ⎭ j=1
Then it follows that
US ⊆
Vi .
i∈S
Let us assume that U is non-empty. Then every US is non-empty and, since ϕ(u) =
n
ui ≥ 0
u ∈ US ,
for all
i=1
there exists some u ˆS ∈ US such that ϕ(ˆ uS ) = v(S) = inf{ϕ(u) | u ∈ US }. Theorem 5.33. If the following assumptions are satisfied 1) For every non-empty set S ⊆ N let γS ≥ 0 be a weight such that v(N ) ≤ S∈2N \{∅} γS v(S) is satisfied. Then for every uS ∈ US it follows that
γS uS ∈ UN .
S∈2N \{∅}
2) For every non-empty set S ⊆ N there is some u ˆS ∈ US with ϕ(ˆ uS ) = v(S). n P
1
3) For every i ∈ N and every finite sequence u , . . . , u bers λ1 ≥ 0, . . . , λm ≥ 0 it is true that " ϕi
m
k=1
# λk u
k
≤
m
m
mi
∈ Ri=1
and num-
λk ϕi (uk ).
k=1
r then there exists a vector x ∈ Rn with i=1 xi = v(N ) and i∈S xi ≤ v(S) which means that the grand coalition N is stable. This theorem is proved in [7]. The result can be used to prove the following fact: In the example above Grand Coalition of the strategy sets U assumption 2) is satisfied, if U is non-empty. Assumption 3) is obviously satisfied. Concerning assumption 1) we can prove [51]:
268
5 Applications and Related Topics
Lemma 5.34. Let us assume that the assumptions of the example Grand Coalition are valid, i.e., the control set is defined by ⎧ ⎨ U = u ∈ Rn+ ⎩
n cij uj ≥ bi j=1
for all
⎫ ⎬ i = 1, ..., n . ⎭
For every non-empty S ⊆ N we define cj (S) =
cij
for
j = 1, . . . , n
and
i∈S
b(S) =
bi .
i∈S
The parameters cij are introduced in (5.20) where the terminal condition C
"N −1
# v(t)
ˆ − E0 ≥E
t=0
is expressed by n
cij xj ≥ bi ,
i = 1, . . . , n.
j=1
Then we define ⎧ ⎨ US = u ∈ Rn+ ⎩
⎫ n ⎬ c (S)u ≥ b(S) . j j ⎭ j=1
If cii > 0 for i = 1, . . . , n and cij ≥ 0 for i, j = 1, . . . , n, i = j, (which implies that U is non-empty), then assumption 1) is also satisfied, i.e., the grand coalition is stable.
5.2.3 Comments This section summarizes the results of a cooperative treatment of the TEM model. The results are related to a control problem and its equivalence problem is solved. Furthermore, the structure of the grand coalition is analyzed in detail. Necessary and sufficient conditions for the grand coalition to be stable are presented. In [51] suitable transformation principles are developed which lead to an algorithmic determination of Pareto optima. In the following, Nash equilibria are introduced and their existence within k-layered graphs is shown.
5.3 An Emission Reduction Process - The MILAN Model
269
5.3 An Emission Reduction Process The MILAN Model We present a new bargaining approach which gives rise to a procedure for an international emissions trading system within the so-called Kyoto game [78]. We have introduced the Kyoto game in the last section. Its generalization can be described by a time-discrete process [35]. An algorithmic solution which is based on the Bellmann functional equation is presented in [90]. Using a suitable bijection we can transform the problem of determining Nash equilibria within the time-discrete system to an optimization problem on a k-layered graph [65]. Existence results and algorithmic solution principles are presented. 5.3.1 MILAN: Multilayered Games on Networks The General Kyoto Game as a Multi-Step Process In the following we combine the time-discrete with the cooperative approach. To this end let us assume that the core is a feasible set and we seek an optimal strategy. We present the time-discrete model and prove an existence theorem. Furthermore, we present a solution approach based on dynamic programming. In a second step we disregard the cooperative behavior and ask for the existence of Nash-equilibria. In the following, we assume that the core exists at each time-step (for details see [91]). We obtain the following problem formulation if we consider the general multistep Kyoto game (˜ xi (t) indicates the state of the i-th player at time-step t; this state defines the core): x ˜i (t) ∈ Xi ⊆ Rni
(i = 1, . . . , n,
t = 0, . . . , N )
u ˜i (t) ∈ Core (˜ x(t)) ⊆ Rmi
(i = 1, . . . , n,
t = 0, . . . , N − 1)
x ˜i (t + 1) = x ˜i (t) + fi (˜ x(t), u ˜(t)).
(5.54)
(5.55)
In vector notation, we write (5.54) as follows: xt+1 = Tt (xt , ut ),
Tt : Xt × Ut
(t = 0, . . . , N − 1),
(5.56)
where xt := (˜ x1 (t), x ˜2 (t), . . . , x ˜n (t)), ut := (˜ u1 (t), x ˜2 (t), . . . , u ˜n (t)). We call xt+1 = Tt (xt , ut ) a general multi-step process with x0 as start vector and where Tt is a suitable vector transformation. (1) (2) (α) The states of the Kyoto game xt = (xt , xt , . . . xt )T for t = 1, . . . , N α are elements of the non-empty set Xt , xt ∈ R . The parameter α describes the dimension of the state vector. A state is called feasible if it can be realized.
270
5 Applications and Related Topics
The process is restricted to a finite time-period [t0 , T ]. The starting point is t0 = 0. We introduce intervals Ip = [tp−1 , pi ] of length Δp with tP = T . Each interval Ii is associated with the i−th step of the generalized Kyoto Game with N steps. Now we introduce the following objective function: Z(x, u) =
P −1
Vp (xp , up ) + VP (xP ), with the following stipulations: (5.57)
p=0
Vp
is the objective function of the p-th step (it depends on the state xp and the decision parameter up ).
VP
is the objective function of the P -th step (it depends upon the input value xP assuming that the decision value on the last step is zero).
We note that Z(x, u) depends on the feasible multi-step decision process P R, where P R = (x, u) := (x0 , x1 , . . . , xP , u0 , u1 , . . . , uP −1 )
(5.58)
and where xt ∈ Xt , ut ∈ Ut (xt ) := Core (xt ), xt+1 = Tt (xt , ut ) ∈ Xt+1 (t = 0, . . . , P − 1). We exactly obtain an algorithmic solution of the problem by introducing subprocesses P Rj taken just at the (j + 1)-th step of the entire process (0 ≤ j ≤ P − 1) P − j: P Rj := (x, u)j := (xj , xj+1 , . . . , xP , uj , uj+1 , . . . , uP ),
(5.59)
where xt ∈ Xt , ut ∈ Ut (Xt ) := Core (xt ), ut ∈ Rβ (β dimension of the control vector), xt+1 = Tt (xt , ut ) ∈ Xt+1
(t = j, j + 1, . . . , P − 1).
Based on the sequence P Rj , we obtain a sequence of objective functions Zj (P Rj ) = Zj ((x, u)j ) =
P −1
Vi (xi , ui ) + VP (xP ).
i=j
Similarly, we easily obtain the entire process P R for j = 0:
Z(P ) = Z(x, u) =
N −1 i=0
Vi (xi , ui ) + VP (xP ).
5.3 An Emission Reduction Process - The MILAN Model
271
5.3.2 Sequencing and Dynamic Programming In order to achieve optimality, we vary the process. To this end we consider different alternatives or different paths in the Kyoto game. We introduce the following decision function, which signifies that the players can decide between several strategies. We assume that this decision function depends on the state of the system. This gives the following representation: sj := [sj (xj ), sj+1 (xj+1 ), . . . , sP −1 (xP −1 )] decision function 0 ≤ j ≤ P − 1.
(5.60)
In the following, we present a solution method which can be derived using the well-known Bellmann Principle of Dynamic Programming. For the objective function we obtain the following representation:
Zj =
P −1
Vt (xt , ut ) + VP (xP )
(5.61)
t=j
= Zj (xj , xj+1 , . . . , xP , sj (xj ), sj+1 (xj+1 ), . . . , sP −1 (xP −1 )) =
P −1
Vt (xt , st (xt )) + VP (xP ).
t=j
Remark 5.35. Note that the function Zj only depends on the state vector and the decision strategy. However, the method is independent of the control parameter which was not part of the Kyoto game. Additionally, in the next section, we introduce the concept of optimality with respect to strategy sj . To make this precise, we call a decision (set) function feasible, if st (xt ) ∈ Ut (xt ) ⊆ Rβ for all xt ∈ Xt . The expression Ut (xt ) represents the set of feasible control parameters for each state xt . Instead of xt = Tt−1 (xt−1 , ut−1 ), we then obtain xt = Tt−1 (xt−1 , st−1 (xt−1 )). Introducing the following notation ∗ ZP,s P (xP ) = VP (xP ), ∗ ZP −1,sP −1 (xP −1 ) = VP −1 (xP −1 , sP −1 (xP −1 )) + Z∗
P,sP
∗ Zt,s t (xt )
= Vt (xt , st (xt )) +
VP (xP ) , * +, -
(TP −1 (xP −1 ,sP −1 (xP −1 )))
∗ Zt+1,s t+1 (Tt (xt , st (xt ))),
(5.62)
we obtain the following definition for optimality according to [100]: Definition 5.36. A feasible decision strategy sj (xj ), s˜j+1 (xj+1 ), . . . , s˜P −1 (xP −1 )] s˜j = [˜
(5.63)
of the process P is called optimal if the following inequality is valid: ∗ ∗ Zj,˜ sj (xj ) ≥ Zj,sj (xj )
for all xj ∈ Xj , for each feasible decision strategy sj .
272
5 Applications and Related Topics
Due to [100], we call the functions which realize an optimal decision strategy, optimal decision functions. If we take t = 0, we apply the same terminology for the entire process. These functions depend on the start vector xj ∈ Xj only. Also, they describe the maximum of the characteristic function Zj for the process j, j + 1, . . . , N . Assuming that the decision strategy sj is feasible, it is obvious that each optimal strategy s˜t represents an optimal process P˜ Rt . Consider the functions s˜t , t = j, j + 1, . . . , N − 1, which represent an optimal decision policy. These are called optimal decision functions. Similarly, we also call the states of a partial process Pt optimal. If t = 0, then we have the same situation for the entire process. Rather than varying over all possible strategies we vary over all feasible processes: We introduce fP −j (xj ) =
max
ut ∈Ut (xt ) t=j,...,P −1
Zj ((x, u)j ) =
max
ut ∈Ut (xt ) t=j,...,P −1
Zj∗ (xj , uj ) (j = 0, 1, . . . , P − 1),
where xt+1 = Tt (xt , ut ) and f0 (xP ) = VP (xP ). If we use Zj ((x, u)j ) =
P −1
Vt (xt , ut ) + VP (xP )
(5.64)
t=j
then we get: ⎡ fP −j (xj ) =
max
ut ∈Ut (xt ) t=j,...,P −1
P −1
⎣
⎤ Vt (xt , ut ) + VP (xP )⎦
(j = 0, 1, . . . , P − 1).
t=j
Applying the following technical lemma, we obtain a method for a successive and algorithmic solution principle. Lemma 5.37. Let be Y1 ⊂ R and Y2 ⊂ R. The functions g1 : Y1 → R and g2 : Y1 ×Y2 → R are assumed to be continuous, the sets Y1 and Y2 are compact. Then, this yields max [g1 (y1 ) + g2 (y1 , y2 )] = max [g1 (y1 ) + max g2 (y1 , y2 )]. y1 ∈Y1
y1 ∈Y1 y2 ∈Y2
y2 ∈Y2
This means
⎧ ⎨ fP −j (xj ) = max Vj (xj , uj ) + uj ∈Uj (xj ) ⎩ =
max {Vj (xj , uj ) +
uj ∈Uj (xj )
⎡ max
ut ∈Ut (xt ) t=j+1,...,P −1
max
ut ∈Ut (xt ) t=j+1,...,P −1
⎣
P −1
t=j+1
(5.65)
⎤⎫ ⎬ Vt (xt , ut ) + VP (xP )⎦ ⎭
∗ Zj+1 (xj+1 , uj+1 )}
5.3 An Emission Reduction Process - The MILAN Model
=
max
uj ∈Uj (xj )
Vj (xj , uj ) + fP −(j+1) (xj+1 )
273
(j = 0, 1, . . . , P − 1).
Replacing xt+1 = Tt (xt , ut ), we obtain the Bellmann functional equation: f0 (xP ) = VP (xP ) fP −t (xt ) =
max
ut ∈Ut (xt )
Vt (xt , ut ) + fP −(t+1) (Tt (xt , ut )) .
(5.66)
Due to [6], the functional equations are necessary and sufficient conditions for an optimal decision parameter u ˜t . Each solution of the Bellmann equation is an optimal solution of the process and each optimal solution of the process is a solution of (5.66). We note that this representation gives an algorithmic solution principle. This is a consequence of the following theorems: Theorem 5.38. Given a time-discrete process under the Bellmann functional equations of (5.66). Then, there exists an optimal strategy [˜ sj (xj ), . . . , s˜N −1 (xN −1 )] of the process P Rj , which starts at step j +1. The strategy only depends on input parameter xj ∈ Xj . Theorem 5.39. Given a time-discrete process under the Bellmann functional equation of (5.66). Let s˜j (xj ) = [˜ sj (xj ), . . . , s˜N −1 (xj )] (j = 0, 1, . . . , N − 1) be an optimal decision strategy for the process P Rj with xj ∈ Xj . The process sj+1 (xj ), s˜j (xj ), . . . , s˜N −1 (xj )] which results from (5.60) is an optimal s˜j+1 = [˜ decision policy for the process P Rj+1 . Theorem 5.40. Given a time-discrete process under the Bellmann functional equation of (5.66). We assume that our problem (5.56) has at least one feasible solution. The state regions Xt ∈ Rα , t = 0, . . . , P , are bounded and closed. The decision regions Ut (xt ) ⊆ Rβ with xt ⊆ Xt , t = 0, . . . , P − 1, will be represented by the core of the game. Additionally, we assume that the functions Vt and the state transformations Tt are continuous and restricted to the following regions: {(xt , ut ) | xt ∈ Xt , ut ∈ Ut (xt )}, {xP | xP ∈ XP }.
t = 0, 1, . . . , P − 1,
(5.67) (5.68)
Let Ut (xt ) ⊆ Rβ , t = 0, 1, . . . , P − 1, be continuous set functions. Furthermore, we assume the core to be bounded for each xt ∈ Xt . The assumptions mentioned above are valid for our process which is described by (5.57) and (5.58). Then we state that (5.66) has one solution. See for details [90] where this has been derived.
274
5 Applications and Related Topics
Remark 5.41. The proof is based on the application of the theorem of Krein Milman: Each general ε-core (in the Kyoto game we have ε = 0) is the convex hull of the extremal points (see [19]). For each state xt , the region Ut (xt ) is a polyhedron and a compact set. Both regions Xt ∈ Rα , t = 0, . . . , P and Ut (xt ) ⊆ Rβ , t = 0, . . . , P −1 are compact. As the functions Vt are continuous the theorem of Weierstraß can be applied. These assumptions are very strong. In the next part, a general approach is presented. The generalizations include also the core as feasible solution set. 5.3.3 Generalizations of the Feasible Decision Sets: Optimal Solutions on k-Layered Graphs If we generalize the Kyoto game presented here, we obtain a multi-objective control problem of a time-discrete systems with given starting and final states. The dynamics of the system is controlled by p actors (players). Each of the players intends to minimize his own integral-time cost of the system’s passages using a certain admissible trajectory. Nash Equilibria conditions are derived and algorithms for solving dynamic games in positional form are proposed in the following. The existence theorem for Nash equilibria is related with the introduction of an auxiliary dynamic c-game. The stationary and nonstationary case is described.
Conclusion
The considered control models generalize classical ones and comprise a large class of practical and theoretical dynamic problems. A new mathematical tool for studying and solving these classes of problems has been elaborated and a general concept of the game-theoretical approach for control problems with integral-time cost criterion by a trajectory with given starting and final states has been developed. The classification of necessary and sufficient conditions for the existence of Nash equilibria, Pareto optima and Stackelberg strategies in the considered game control models has been obtained. The dynamic programming techniques for such class of problems has been developed and new polynomial time algorithms for determining Nash equilibria and Pareto optima have been elaborated. Efficient algorithms have been derived for determining optimal strategies of players in cyclic games, dynamic c-games on network and game control problems in positional form. Additionally, the time-expanded network method for solving dynamic optimal flow problems has been elaborated. Applications of the considered multi-objective discrete control problems for the analysis of Technology-Emission-Means models and Kyoto games are described. The obtained results can be used in general decision making systems.
References
1. Altman, E., Basar, T., Srikant, R.: Nash equilibria for combined flow control and routing in network: asymptotic behavior for a large number of users. IEEE Transactions on automatic control, 47(6), 917–929 (2002) 2. Basar, T., Olsder, G.: Dynamic Noncooperative Game Theory. SIAM (1999) 3. Bellman, R.: Functional equations in the theory of dynamic programming. XILimit theorems, Rand. Circolo Math. Palermo 8(3), 343–345 (1959). 4. Bellman, R., Kalaba, R.: Dynamic Programming and Modern Control Theory. Academic Press, New York and London (1965) 5. Boliac, R., Lozovanu, D., Solomon, D.: Optimal paths in network games with p players. Discrete Applied Mathematics 99(13), 339–348 (2000) 6. Boltjanski, W.G.: Optimale Steuerung diskreter Systeme. Leipzig Akademische Verlagsgesellschaft Geest & Portig K.-G., Leipzig (1976) 7. Bondareva, O.N.: Some applications of linear programming methods to the theory of cooperative games. Problemy Kibernet. 10, 119–139 (1963) 8. Boros, E., Gurvich, V.: On Nash-solvability in pure stationary strategies of finite games with perfect information which may have cycles. DIMACS Technical Report, 18, 1–32 (2002) 9. Boulogne, T., Altman, E., Kameda, H., Pourtallier, O.: Mixed equilibrium for multiclass routing games. IEEE Transactions on automatic control. Special issue on control issues in telecommunication networks, 47(6), 903–916 (2002) 10. Brucher, P.: Discrete parameter optimization problem and essential efficient points. Operat. Res., 16(5), 189–197 (1972) 11. Burkard, R., Krarup, J., Pruzan, P.: A relationship between optimality and efficiency in multicriteria 0-1 programming problems. Comp. Oper. Res., 8(2), 241–247 (1981) 12. Burkard, R., Krarup, J., Pruzan, P.: Efficiency and optimality in minisum, minimax 0-1 programming problems. J. Oper. Res. Soc., 33(2), 137–151 (1982) 13. Butenko, S., Murphey, R., Pardalos, P.M. (co-editors): Cooperative Control: Models, Applications, and Algorithms. Kluwer Academic Publishers (2003) 14. Butkovic, P., Cuninghame-Green, R.A.: An O(n2 ) algorithm for the maximum cycle mean of an n × n bivalent matrix. Discrete Applied Mathematics, 35, 157–162 (1992) 15. Christofides, N.: Graph theory. An agorithmic approach. Academic Press, New York, London, San Francisco (1975)
278
References
16. Condon, A.: The complexity of stochastic games. Informations and Computation. 96(2), 203–224 (1992) 17. Curiel, I., Derks, J. and Tijs, S.: On balanced games and games with committee control. Operations Research Spektrum, 11, 83–88 (1989) 18. Dijkstra, E.: A note on two problems in connexion with graphs. Numerische Mathematik, 269–271 (1959) 19. Driessen, T.S.H.: Cooperative Games, Solutions and Applications. Cambridge University Press, Cambridge, New York, (1986) 20. Driessen, Th.: Cooperative Games, Solutions and Applications. Kluwer Academic Publishers, Dordrecht - Boston - London, (1988) 21. Dubey, P., Shapley, L.S.: Totally balanced games arising from controlled programming problems. Mathematical Programming, 29, 245–267 (1984). 22. Ehrenfeucht, A., Mycielski, J.: Positional strategies for mean payoff games. International Journal of Game Theory, 8, 109–113 (1979) 23. Emelichev, V., Perepelitsa, V.: The complexity of discrete multicriterion problems. Discrete mathematics and applications, 6(1), 5–33 (1994) 24. Fabrikant, A., Papadimitriu, C., Talwar, K.: The complexity of pure Nash equilibria. In Proc. of the 36th Annual ACM Symposium on Theory of Computing (STOC’04), IL, USA, 604–612 (2005) 25. Faigle, U., Kern, W., Paulusma, D.: Note on the computational complexity of least core concepts for min-cost spanning tree games. Math. Meth. Oper. Res., 52, 23–38 (2000) 26. Fleisher, L.: Approximating multicommodity flow independent of the number of commodities. Siam J. Discrete Math., 13(4), 505-520 (2000) 27. Fleisher, L., Skutella, M.: The quickest multicommodity flow problem. Integer programming and combinatorial optimization. Springer, Berlin, 36–53 (2002) 28. Florian, M.: Nonlinear cost network models in transportation analysis. Math. Programming Study, 26, 167–196 (1996) 29. Fonoberova, M., Lozovanu, D.: Optimal multicommodity flows in dynamic networks and algorithms for their finding. The Bulletin of Academy of Sciences of Moldova, Mathematics, 1(47), 19–34 (2005) 30. Fonoberova, M., Lozovanu, D.: Game-theoretic approach for solving multiobjective flow problems on networks. Computer Science Journal of Moldova, 13(2(38)), 168–176 (2005) 31. Fonoberova, M., Lozovanu, D.: Minimum cost multicommodity flows in dynamic networks and algorithms for their finding. The Bulletin of Academy of Sciences of Moldova, Mathematics, 1(53), 37–49 (2007) 32. Ford, L., Fulkerson, D.: Constructing maximal dynamic flows from static flows. Operation Res., 6, 419–433 (1958) 33. Ford, L., Fulkerson, D.: Flows in Networks. Princeton University Press, Princeton, NJ, (1962) 34. Garey, M., Johnson, D.: Computers and Intractability. San Francisco (1979) 35. Gebert, J., L¨ atsch, M., Pickl, S., Weber, G.W., W¨ unschiers, R.: Genetic networks and anticipation of gene expression patterns. Computing Anticipatory Systems: CASYS 2003 - Sixth Int. Conference AIP Conference Proceedings, edited by D. Dubois, accepted, (2004) 36. Gottlob, G., Greco, G., Scarcello, F.: Pure Nash Equilibria: Hard and Easy Games. Journal of Artificial Intelligence Research, 24, 357–406 (2005) 37. Granot, D., Hamers, H., Tijs, S.: Spanning network games. International Journal of Game Theory, 27, 467–500 (1998)
References
279
38. Grimm, B., Pickl, S. and Reed, A.: Management and Optimization of Environmental Data within Emission Trading Markets, VEREGISTER and TEMPI. in: B. Hansj¨ urgens, P. Letmathe (Hrsg.) Business and Emissions Trading, Eduard Elgar Verlag. 39. Guisewite, G.M., Pardalos, P.M.: Minimum concave-cost network flow problems: Applications, complexity and algorithms. Annals of Operations Research, 25(1), 75–99 (1990) 40. Gurvich, V.A., Karzanov, A.V., Khachiyan, L.G.: Cyclic games and an algorithm to find minmax cycle means in directed graphs. USSR, Computational Mathematics and Mathematical Physics, 28, 85–91 (1988) 41. Hoppe, B., Tardos, E.: The quickest transshipment problem. Mathematics of Operations Research, 25, 36–62 (2000) 42. Howard, R.A.: Dynamic Programming and Markov Processes. Wiley (1960) 43. Karp, R.M.: A characterization of the minimum cycle mean in a digraph. Discrete Mathematics, 23(3), 309–311 (1978) 44. Karzanov, A.V., Lebedev, V.N.: Cyclical games with prohibitions. Mathematical Programming, 60, 277–293 (1993) 45. Khachian, L.G.: Polynomial time algorithm in linear programming. USSR, Computational Mathematics and Mathematical Physics, 20, 51–58 (1980) 46. Khachian, L.G.: On exact solution of the system of linear inequalities and linear programming problem. USSR, Computational Mathematics and Mathematical Physics, 22, 999–1002 (1982) 47. Kostreva, M., Wiecek, M.: Time dependency in multiple objective dynamic programming. Journal of Mathematical Analysis and Applications, 17(1), 289–307 (1993) 48. Koutsoupias, E., Papadimitriu, C.: Worst-case equilibria. Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computing. Science Lecture Notes in Computer Science, 1563, 404–413 (1999) 49. Krabs, W., Pickl, S., and Scheffran, J.: Optimization of an n-Person Game Under Linear Side Conditions. In: Optimization, Dynamics and Economic Analysis, edited by E. J. Dockner, R. F. Hartl, M. Luptacik, and G. Sorger. PhysicaVerlag: Heidelberg, New York, 76–85 (2000) 50. Krabs, W., Pickl, S.: Controllability of a time-discrete dynamical system with the aid of the solution of an approximation problem. International Journal of Control Theory and Cybernetics, 32(1), 57–74 (2003) 51. Krabs, W., Pickl, S.: Analysis, Controllability and Optimization of TimeDiscrete Systems and Dynamical Games. Lecture Notes in Economics and Mathematical Systems. Springer Verlag, Berlin - Heidelberg (2003) 52. Krabs, W., Pickl, S.: A game-theoretic treatment of a time-discrete emission reduction model. International Game Theory Review, 6(1) (2004) 53. Krabs, W.: Personal communication 54. Lozovanu, D.: Properties of optimal solutions of a grid transport problem with concave functions of the flows on the arcs. Engineering Cybernetics, 20, 34–38 (1983) 55. Lozovanu, D.: Extremal-Combinatorial problems and algorithms for its solving. Kishinev, Stiinta (1991) 56. Lozovanu, D.: Algorithms to solve some classes of network minmax problems and their applications. Cybernetics, 29, 93–100 (1991)
280
References
57. Lozovanu, D.: Strongly polynomial algorithms for finding minimax paths in networks and solution of cyclic games. Cybernetics and Systems Analysis, 29, 754–759 (1993) 58. Lozovanu, D.: Dynamic games with p players on networks. The Bulletin of Academy of Sciences of Moldova. Mathematics, 1(32), 41–54 (2000) 59. Lozovanu, D.: Networks models of discrete optimal control and dynamic games with p players. Discrete Mathematics and Applications, 13(4), 126–143 (2001) 60. Lozovanu, D.: Polynomial time algorithm for determining optimal strategies in cyclic games. 10th International IPCO Conference, New York, NY, Proceedings, 74–85 (2004) 61. Lozovanu, D.: Polynomial time algorithm for determining max-min paths in networks and solving zero value cyclic games. Computer Science Journal of Moldova, 14(2(38)), 18–33 (2005) 62. Lozovanu, D.: Multiobjective Control of Time-Discrete Systems and Dynamic Games on Networks. Chapter in book Pareto Optimality, Game Theory and Equilibria” (edited by A. Chinchulun, A. Migdalas, P. Pardalos, L. Pitsoulis), Springer, 665–757 (2008) 63. Lozovanu, D., Fonoberova, M.: Optimal dynamic multicommodity flows in networks. Electronic Notes in Discrete Mathematics, 25, 93-100 (2006) 64. Lozovanu, D., Pickl, S.: Polynomial time algorithms for determining optimal strategies. Electronic Notes in Discrete Mathematics, 13, 154–158 (2003) 65. Lozovanu, D., Pickl, S.: A special dynamic programming technique for multiobjective discrete control and for dynamic games on graph-based networks. Electronic Notes in Discrete Mathematics, 17, 183–188 (2004) 66. Lozovanu, D., Pickl, S.: Discrete Optimal Control problems on networks and dynamic games with p players. Bulletin of Academy of Sciences of Moldova. ser. Mathematics, 2(45), 67–88 (2004) 67. Lozovanu, D., Pickl, S.: Nash Equilibria for Multiobjective Control of TimeDiscrete Systems and Polynomial-Time Algorithm for k-partite Networks. Central European Journal of Operation Research, 13(2), 127–146 (2005) 68. Lozovanu, D., Pickl, S.: An approach for an algorithmic solution of discrete optimal control problems and their game-theoretical extension. Central European Journal of Operation Research, 14(4), 357-376 (2006) 69. Lozovanu, D., Pickl, S.: Nash Equilibria Condition for Cyclic Games with p players. Electronic Notes in Discrete Mathematics, 25, 117–124 (2006) 70. Lozovanu, D., Pickl, S.: Algorithms and the calculation of Nash Equilibria for multi-objective control of time-discrete systems and polynomial-time algorithms for dynamic c-games on networks. European Journal of Operational Research, 181(3), 1214–1232 (2007) 71. Lozovanu, D., Pickl, S.: Algorithms for solving multiobjective discrete control problems and dynamic c-games on networks. Discrete Applied Mathematics, 155(14), 158–180 (2007) 72. Lozovanu, D., Pickl, S.: Multiobjective Hierarchical Control of Time-Discrete Systems and Determining Stackelberg Strategies. CTW 2007 Proceedings, 111114 (2007) 73. Lozovanu, D., Pickl, S., Weber, G.W.: Optimization, monotonicity and the determination of Nash-equilibria. An algorithmic analysis computing anticipatory systems. Computing Anticipatory Systems - Proc. Sixth International Conference, CASYS 2003, 351–362 (2004) ”
74. Lozovanu, D., Solomon, D., Zelikovsky, A.: Multiobjective Games and Determining Pareto-Nash Equilibria. Bulletin of the Academy of Sciences of Moldova, Mathematics, 3(49), 115–122 (2005)
75. Lozovanu, D., Stratila, D.: The minimum cost flow problem on dynamic networks and an algorithm for its solving. Bul. Acad. Ştiinţe Repub. Mold., Mat., 3, 38–56 (2001)
76. Lozovanu, D., Stratila, D.: Optimal flow in dynamic networks with nonlinear cost functions on edges. In: Barbu, V., Lasiecka, I. (eds.) Analysis and Optimization of Differential Systems. Kluwer Academic Publishers, 247–258 (2003)
77. Lozovanu, D., Trubin, V.A.: Min-max path problem on network and an algorithm for its solving. Discrete Mathematics and Applications, 6, 138–144 (1994)
78. Meyer-Nieberg, S., Pickl, S.: Simulation eines CO2-Zertifikatenhandels und algorithmische Optimierung von Investitionen. Operations Research Proceedings 2002 (Selected Papers). Springer-Verlag, Berlin, Heidelberg, 471–474 (2003)
79. Moulin, H.: Prolongement des jeux à deux joueurs de somme nulle. Bull. Soc. Math. France, Mém. 45 (1976)
80. Moulin, H.: Théorie des Jeux pour l'Économie et la Politique. Hermann, Paris (1981)
81. Murphey, R., Pardalos, P.M. (eds.): Cooperative Control and Optimization. Kluwer Academic Publishers (2002)
82. Nash, J.F.: Non-Cooperative Games. Annals of Mathematics, 54(2), 286–295 (1951)
83. von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton University Press, Princeton (1953)
84. Owen, G.: On the Core of Linear Production Games. Mathematical Programming, 9, 358–370 (1975)
85. Papadimitriou, C.: Algorithms, games, and the Internet. Proceedings of the 33rd Annual ACM Symposium on the Theory of Computing. ACM, New York, 749–753 (2001)
86. Pardalos, P.M.: Enumerative techniques for solving some nonconvex global optimization problems. OR Spectrum, 10, 29–35 (1988)
87. Pardalos, P.M., Guisewite, G.: Algorithms for the single-source uncapacitated minimum concave-cost network flow problem. Journal of Global Optimization, 1(3), 245–265 (1991)
88. Pardalos, P.M., Guisewite, G.: Global search algorithms for minimum concave cost network flow problem. Journal of Global Optimization, 1(4), 309–330 (1991)
89. Pareto, V.: Manuel d'économie politique. Giard, Paris (1904)
90. Pickl, S.: Der τ-value als Kontrollparameter – Modellierung und Analyse eines Joint-Implementation Programmes mithilfe der dynamischen kooperativen Spieltheorie und der diskreten Optimierung. Doctoral thesis, Darmstadt University of Technology, Department of Mathematics (1998)
91. Pickl, S.: Convex Games and Feasible Sets in Control Theory. Mathematical Methods of Operations Research, 53(1), 51–66 (2001)
92. Pickl, S.: On norm-minimal local controllability of time-discrete dynamical systems applying the Kalman condition: an algorithmic approach. Journal of Computational Technologies, 7, 68–77 (2002)
93. Pickl, S.: Investitionsoptimierung mithilfe von TEMPI. In: Fichtner, W., Geldermann, J. (eds.) Einsatz von OR-Verfahren zur techno-ökonomischen Analyse von Produktionssystemen. Peter Lang, Frankfurt am Main, 95–109 (2002)
94. Pickl, S.: Solving the semi-smooth equivalence problem. European Journal of Operational Research, 157, 68–73 (2004)
95. Pickl, S.: Optimization under linear side conditions using inverse monotone matrices. Annales d'Économie et de Statistique (accepted) (2004)
96. Pickl, S., Scheffran, J.: Control and game theoretic assessment of climate change: options for Joint Implementation. Annals of Operations Research, 97, 203–212 (2000)
97. Podinovsky, V., Noghin, V.: Pareto Optimal Solutions for Multicriteria Problems. Nauka, Moscow (in Russian) (1982)
98. Romanovski, I.V.: Optimization of stationary control of discrete deterministic processes. Cybernetics, 2, 66–78 (1967)
99. Romanovski, I.V.: Algorithms for Solving Extremal Problems. Nauka, Moscow (in Russian) (1977)
100. Sebastian, H.J., Sieber, N.: Diskrete dynamische Optimierung. Akademische Verlagsgesellschaft, Leipzig (1980)
101. Shapley, L.S.: On balanced sets and cores. Naval Research Logistics Quarterly, 14, 453–460 (1967)
102. Stackelberg, H. von: Marktform und Gleichgewicht. Springer-Verlag (1934); English translation: The Theory of the Market Economy. Oxford University Press (1952)
103. Tijs, S.H.: Bounds for the core and the τ-value. Technical Report, North-Holland Publishing Company, The Netherlands (1981)
104. van den Nouweland, A., Maschler, M., Tijs, S.: Monotonic Games are Spanning Network Games. International Journal of Game Theory, 21, 419–427 (1993)
105. Vöge, J., Jurdziński, M.: A discrete strategy improvement algorithm for solving parity games. In: Emerson, E.A., Sistla, A.P. (eds.) Computer Aided Verification, 12th International Conference. Lecture Notes in Computer Science, 1855, Springer, 202–215 (2000)
106. Weber, G.W.: Optimal control theory: On the global structure and connection with optimization. Journal of Computational Technologies, 4(2), 3–25 (1999)
107. Zwick, U., Paterson, M.: The complexity of mean payoff games on graphs. Theoretical Computer Science, 158, 344–359 (1996)
Index
A General optimal control model, 154
a Stackelberg solution, 67
Acyclic c-game, 87
Acyclic l-game, 101, 102
Acyclic l-games on networks, 101
Acyclic network, 36, 65, 73
Alternate players' control, 11
Alternate players' control condition, 11
Antagonistic dynamic c-game, 84
Antagonistic dynamic game, 81
Asymptotic behavior of the integral-time cost function, 11
Bellman functional equation, 273
c-game, 24, 25, 45
Coalition, 251
Complex system, v
Computational complexity, 45
Control function, 237
Control Markov chains with income, 118
Control models with fixed unit time of states' transitions, 125
Control parameter, 2
Control problem with infinite time horizon, 10
Control problems with varying time of states' transitions, 125
Controllability, 237, 238
Controlled costs, 250
Cooperative r-person game, 251
Cooperative game, 8, 266
Cooperative games on dynamic networks, 58
Core, 250, 261
Cost, 2
Cost function, 132, 265
Cost vector function, 250
Cyclic game, 105
Cyclic game with p players, 118
Cyclic games, 111
Cyclic games with random states' transitions, 117
Decision set, 274
Decomposition, 241, 242
Demand supply function, 188
Dichotomy method, 104, 116
Dijkstra's Algorithm, 18
Discrete control, 81
Discrete dynamical system, 2
Discrete multi-objective game, 177
Discrete optimal control, 154
Discrete system, 233
Dominated strategy, 27
Dynamic c-game, 24, 25
Dynamic c-game with backward time-step account, 24
Dynamic c-game with constant costs, 26
Dynamic flow problem, 181, 182, 212, 214, 231
Dynamic game, 1
Dynamic game in positional form, 11
Dynamic maximum flow problem, 181
Dynamic network, 15
Dynamic nonlinear minimum cost flow problem, 181
Dynamic programming, 15, 271
Dynamic programming algorithms, 15
Dynamic system, 4, 11, 83
Dynamical programming method, 16
Dynamical system, 117
Dynamics of the system, 2
Emission reduction game, 250
Emission Reduction Model, 233
Equivalence problem, 250, 257
Ergodic cyclic game, 116
Ergodic cyclic games with p players, 119
Ergodic games, 111
Ergodic network, 107
Ergodic zero-value cyclic games, 111
Essential strategy, 28
Feasible, 274
Feasible decision set, 274
Feasible dynamic flow, 182
Final state, 2
Fixed point, 236, 238
Fixed point controllability, 238
Flow problem, 181, 182, 200, 212, 214, 224, 229, 231
Flow storage, 186, 188
Flow storage at nodes, 186, 188
Game control model in positional form, 12
Game-theoretic control model, 160
Grand coalition, 251
Grand coalition stable, 260
Graph of states' transitions, 15
Graph of transitions, 14
Hamiltonian path problem, 65
Hierarchical control, 7, 69
Hierarchical control problem, 7
Hierarchical control problem on an acyclic network, 73
Hierarchical control problem on network, 67
Index set, 258
Indicator function, 262
Infinite time, 152
Infinite time horizon, 10
Integral constant demand supply function, 188
Integral-time cost, 3
Investment, 240
Investment parameter, 240
Kalman condition, 239
Kyoto game, 250, 269
Linear control system, 238
Linear production game, 260
Linear system, 236
Markov process with income, 118
Matrix game, 177–179
Matrix multi-objective game, 177
Max-min, 81
Max-min control, 81
Max-min control problem, 6, 81
Max-min paths, 83
Max-min paths problem, 83
Max-type function, 258
Maximal mean cost cycle, 107
Maximum flow problem, 181, 213
Maximum multi-commodity flow problem on networks, 214
MILAN, 269
Multi-commodity flow, 214, 229
Multi-objective, 1, 4
Multi-objective control, 4
Multi-objective discrete control, 9
Multi-objective games, 171
Multi-step process, 269
Multilayered decision problem, 244
Multilayered Games on Networks, 269
Nash equilibria, 4, 11
Nash equilibria for non-stationary dynamic c-games, 53
Nash equilibrium, 266
Network, 1, 15, 159
Network in canonic form, 87
Node, 186, 188
Non-cooperative game, 4
Non-cooperative games on dynamic networks, 22
Non-stationary control, 10
Non-stationary dynamic c-game, 48
Non-stationary strategies of the players, 25
Non-stationary strategies on networks, 59
Nonlinear minimum cost multicommodity flow problem, 214
Not essential strategy, 28
Null-controllability, 233, 238, 239
Optimal control problem, 15
Optimal dynamic flow problems on networks, 181
Optimal non-stationary strategy, 25
Optimal paths in a dynamic network, 18
Optimal stationary strategy, 22
Optimal strategies of players in cyclic game, 105
Optimal trajectory, 48
Optimal value of the mean integral-time cost, 11
Optimization principle, 18
Optimization principle for dynamic networks, 47
Pareto optima, 8
Pareto optimum, 265
Pareto solution, 9
Pareto solution for a cyclic game, 122
Pareto stationary strategies, 60
Pareto-Nash equilibria, 171
Pareto-Stackelberg solution, 179
Partite networks, 164
Path, 19, 48, 83
Path problem, 16, 45, 65
Polynomial time algorithm, 83, 116
Positional form, 11
Positions with random states' transitions, 117
Potential transformation, 36, 87
Quasi-balanced, 253, 256
Quasi-balanced game, 253
Random states, 117
Random states' transitions, 117
Reduction game, 250
Richard Bellman, v
Saddle point, 173
Semi-smooth, 252
Sequence, 236
Sequence of system's transitions, 15
Sequencing, 271
Set of positions of player, 12
Set of states, 2
Shapley value, 254
Single-commodity dynamic flow problem, 181
Single-objective, 2
Single-objective control, 15
Sink vertex, 36
Solution in the sense of Nash, 6
Stackelberg, 7
Stackelberg solution, 67
Stackelberg stationary strategies, 71
Stackelberg strategies, 7
Starting state, 2
State function, 240
State of the system, 2
States' transition, 125
States' transitions, 117, 126, 127, 132, 141, 149, 150
Static game of p players, 68
Stationary control, 10, 152
Stationary Pareto solution, 66
Stationary strategies of players, 22
Stationary strategies on networks, 58
Supply function, 188
TEM model, 233
Time-discrete system, 1, 233
Time-expanded network, 53, 181, 190
Trajectory, 2
Transformation, 250, 268
Transitions, 117, 125, 127, 132, 141, 149, 150
Tree of optimal paths, 18
Uncontrolled system, 238
Value of a cyclic game, 105
Vector of control parameters, 2
Vector transformation, 269
Weierstraß, 274
Zero-sum control problem, 6
Zero-sum control problem with infinite time horizon, 10
Zero-sum game, 81
Zero-sum game on network, 83
Zero-sum multi-objective game, 172
Zero-value cyclic game, 111