Optimal control, expectations and uncertainty

Sean Holly
Centre for Economic Forecasting, London Business School

Andrew Hughes Hallett
University of Newcastle
The right of the University of Cambridge to print and sell all manner of books was granted by Henry VIII in 1534. The University has printed and published continuously since 1584.
CAMBRIDGE UNIVERSITY PRESS
Cambridge New York New Rochelle Melbourne Sydney
Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
32 East 57th Street, New York, NY 10022, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1989

First published 1989

British Library cataloguing in publication data
Holly, Sean
Optimal control, expectations and uncertainty.
1. Economics. Applications of optimal control theory
I. Title II. Hughes Hallett, Andrew
330'.01'51

Library of Congress cataloguing in publication data
Holly, Sean.
Optimal control, expectations and uncertainty.
Bibliography.
1. Economics - Mathematical models. 2. Control theory.
3. Rational expectations (Economic theory) 4. Uncertainty.
I. Hughes Hallett, Andrew. II. Title.
HB141.H62 1989 330'.01'51 88-9523

ISBN 0 521 26444 8

Transferred to digital printing 2004
Contents

Preface page xi
1 Introduction 1

2 The theory of economic policy and the linear model 12
2.1 The linear model 12
2.2 State-space forms 13
2.2.1 Dynamic equivalences 18
2.3 The final form 19
2.4 Controllability 21
2.4.1 Static controllability 21
2.4.2 Dynamic (state) controllability 23
2.4.3 Dynamic (target) controllability 23
2.4.4 Stochastic controllability 25
2.4.5 Path controllability 25
2.4.6 Local path controllability 28
2.5 An economic example 28

3 Optimal-policy design 31
3.1 Classical control 31
3.1.1 Transfer functions 31
3.1.2 Multivariate feedback control 33
3.1.3 Stabilisability 34
3.1.4 The distinction between stabilisation and control 36
3.1.5 An economic example 37
3.2 Deterministic optimal control by dynamic programming 39
3.2.1 A scalar example 43
3.3 The minimum principle 45
3.4 Sequential open-loop optimisation 48
3.5 Uncertainty and policy revisions 50
3.5.1 First-period certainty equivalence 50
3.5.2 Certainty equivalence in dynamic programming 51
3.5.3 Restrictions on the use of certainty equivalence 52
3.5.4 Optimal-policy revisions 52
3.5.5 Feedback-policy revisions 55
3.6 Sequential optimisation and feedback-policy rules 56
3.7 Open-loop, closed-loop and feedback-policy rules 58

4 Uncertainty and risk 61
4.1 Introduction 61
4.2 Policy design with information and model uncertainty 64
4.2.1 Additive uncertainty (information uncertainty) 64
4.2.2 Multiplicative uncertainty (model uncertainty) 65
4.2.3 Optimal decisions under multiplicative uncertainty 66
4.2.4 A dynamic-programming solution under multiplicative uncertainty 67
4.3 The effects of model uncertainty on policy design 68
4.3.1 Parameter variances 69
4.3.2 Parameter covariances 71
4.3.3 Increasing parameter uncertainty 72
4.4 Policy specification under uncertainty 73
4.5 Risk-averse (variance-minimising) policies 75
4.6 Risk-sensitive decisions 77
4.6.1 Optimal mean-variance decisions 77
4.6.2 Trading security for performance level 78
4.6.3 The certainty-equivalent reparameterisation 78
4.6.4 Risk-sensitive policy revisions 79
4.6.5 The stochastic inseparability of dynamic risk-sensitive decisions 80
4.7 The Kalman Filter and observation errors 82

5 Risk aversion, priorities and achievements 87
5.1 Introduction 87
5.2 Approximate priorities and approximate-decision rules 88
5.3 Respecifying the preference function 90
5.4 Risk-sensitive priorities: computable von Neumann-Morgenstern utility functions 92
5.5 Risk-bearing in economic theory 95
5.6 The utility-association approach to risk-bearing 97
5.7 Risk management in statistical decisions 99
5.7.1 The regression analogy 99
5.7.2 The optimal risk-sensitivity matrix 101
5.7.3 A simplification 102
5.8 Achievement indices 102
5.9 Conclusions: fixed or flexible decision rules? 103

6 Non-linear optimal control 105
6.1 Introduction 105
6.2 Non-linearities and certainty equivalence 106
6.3 Non-linear control by linearisation 108
6.3.1 Linearisation by stochastic perturbation 109
6.3.2 A linearisation in stacked form 110
6.4 Non-linear control as a non-linear programming problem 111
6.4.1 A modified gradient algorithm 114
6.4.2 Repeated linearisation 115

7 The linear rational-expectations model 117
7.1 Introduction 117
7.2 The rational-expectations hypothesis 118
7.3 The general linear rational-expectations model 120
7.4 The analytic solution of rational-expectations models 121
7.4.1 A forward-looking solution procedure 122
7.4.2 The final-form rational-expectations model 123
7.4.3 A penalty-function solution 124
7.4.4 Multiperiod solutions 126
7.4.5 The stochastic properties of rational-expectations solutions 127
7.5 Terminal conditions 128
7.6 The numerical solution of rational-expectations models 131
7.6.1 Jacobian methods 132
7.6.2 General first-order iterative methods 133
7.6.3 Non-linearities and the evaluation of multipliers 136
7.6.4 Shooting methods 138
7.7 Stochastic solutions of non-linear rational-expectations models 139
7.8 Controllability and the non-neutrality of policy under rational expectations 141
7.8.1 Dynamic controllability 142
7.8.2 An example: the non-neutrality of money 144
7.8.3 Conclusion: the stochastic neutrality of policy 147

8 Policy design for rational-expectations models 148
8.1 Introduction 148
8.2 The dynamic-programming solution 148
8.3 Non-recursive methods 150
8.3.1 A direct method 150
8.3.2 A penalty-function approach 151
8.4 Time inconsistency 152
8.4.1 A scalar demonstration 152
8.4.2 Suboptimal versus time-inconsistent policies 153
8.4.3 A simple example 153
8.4.4 The case of zero policy-adjustment costs 156
8.5 Microeconomic instances of inconsistency 156
8.6 Precommitment, consistent planning and contracts 159
8.7 Rules versus discretion 161
8.8 Sequential optimisation and closed-loop policies 163
8.8.1 Open-loop decisions 164
8.8.2 Optimal revisions 165
8.9 Sources of suboptimal decisions 167
8.10 An interpretation of time consistency 168

9 Non-cooperative, full-information dynamic games 169
9.1 Introduction 169
9.2 Types of non-cooperative strategy 171
9.2.1 The Nash solution 171
9.2.2 Simplified non-cooperative solutions 173
9.2.3 Feedback-policy rules 174
9.3 Nash and Stackelberg feedback solutions 175
9.3.1 The Nash case 176
9.3.2 The Stackelberg case 179
9.4 The optimal open-loop Nash solution 180
9.5 Conjectural variations 181
9.5.1 The conjectural-variations equilibrium 182
9.6 Uncertainty and expectations 184
9.7 Anticipations and the Lucas critique 187
9.7.1 A hierarchy of solutions 187
9.8 Stackelberg games: the full-information case 189
9.9 Dynamic games with rational observers 192
9.9.1 A direct method 192
9.9.2 A penalty-function method 194

10 Incomplete information, bargaining and social optima 197
10.1 Introduction 197
10.2 Incomplete information, time inconsistency and cheating 198
10.3 Cooperation and policy coordination 202
10.3.1 Cooperative decision-making 203
10.3.2 Socially optimal decisions 204
10.3.3 Some examples 206
10.4 Bargaining strategies 209
10.4.1 Feasible policy bargains 212
10.4.2 Rational policy bargains 212
10.4.3 Relations between bargaining solutions 214
10.5 Sustaining cooperation 215
10.5.1 An example 215
10.5.2 Cheating solutions 216
10.5.3 The risk of cheating 217
10.5.4 Punishment schemes 218
10.6 Reputational equilibria and time inconsistency 219
10.6.1 Asymmetric information and uncertainty 221
10.6.2 Generalisations of reputational effects 224
10.6.3 Conclusions 225

Notes 227
References 231
Index 241
Preface

This book was started while the first author was at Imperial College, London, and the second author at the University of Rotterdam. It began with the objective of first covering standard optimal-control theory - largely imported from mathematical control theory - and the parallel developments in economics associated with Tinbergen and Theil, and secondly of relating this work to recent developments in rational-expectations modelling. However, the rate of progress has been such that the original objectives have been revised in the light of many interesting developments in the theory of policy formulation under rational expectations, particularly those concerned with the issues of time inconsistency and international policy coordination.

The first six chapters are devoted to various aspects of conventional control theory, but with an emphasis on uncertainty. The remaining four chapters are devoted to the implications of rational, or forward-looking, expectations for optimal-control theory in economics. We also examine the increasing use which has been made of game-theoretic concepts in the policy-formulation literature and how this is relevant to understanding the role that expectations play in policy.

We owe many debts to colleagues past and present. A number of people have read and commented upon previous drafts of this book. In particular we are grateful to Andreas Bransma, Bob Corker, Berc Rustem, John Driffill, Sheri Markose, Mark Salmon and Paul Levine. We also owe a particular debt of gratitude to Francis Brooke, whose editorial patience was inexhaustible. We are also indebted to first Bunty King and then Sheila Klein, who shouldered the enormous burden of typing, retyping and typing again many, sometimes incomprehensible, drafts.
1 Introduction
Control methods have been used in economics - at least at the academic level - for about 35 years. Tustin (1953) was the first to spot a possible analogy between the control of industrial and engineering processes and post-war macroeconomic policy-making. Phillips (1954, 1957) took up the ideas of Tustin and attempted to make them more accessible to the economics profession by developing the use of feedback in the stabilisation of economic models. However, despite the interest of economists in the contribution of Phillips, the use of dynamics in applied and theoretical work was still relatively rudimentary. Developments in econometric modelling and the use of computer technology were still in their very early stages, so that there were practically no applications in the 1950s.

However, matters began to change in the 1960s. Large-scale macroeconometric models were being developed and widely used for forecasting and policy analysis. Powerful computers were also becoming available. These developments, plus important theoretical developments in control theory, began to make the application of control concepts to economic systems a much more attractive proposition. Economists, with the help of a number of control engineers, began to encourage the use of control techniques.

Independently of the development of control techniques in engineering and the attempts by Tustin and Phillips to find applications in economics, there were also a number of significant developments in economics, and in particular in the theory of economic policy. Tinbergen, Meade and Theil developed, quite separately from control theory, the theory of the relationship between the instruments that a government has at its disposal and the policy targets at which it wishes to aim. Theil (1964), in particular, developed an independent line of analysis for calculating optimal economic policies which was quite distinct from the methods emerging in optimal-control theory.
Initially the work of Theil was overshadowed by modern control theory but, as will be shown in this book, his methods are capable of providing solutions to problems that are not directly amenable to the methods of modern control theory.
The main development on the engineering-control side was the emergence of modern control theory, which had a much firmer basis in applied mathematics than traditional concepts, many of which had developed as rules of thumb which could be used by engineers to design control processes for engineering systems. Modern methods provided the tools needed to design guidance systems to send rockets to the moon and seemed ripe for application to economic models. They also provided a precise criterion for the performance of a system, in contrast to the more ad hoc classical methods. A further attraction of modern control methods was that they offered ways of analysing economic policy under different kinds of uncertainty. In principle, optimal-policy rules could be designed which took account of both additive and multiplicative uncertainty.

Furthermore, the insights provided by classical control theory concerning the importance of feedback were regarded as particularly relevant to the way in which economic policy had been conducted since the Second World War. If proper allowance was not made for the dynamic behaviour of the economy, stabilisation policy could actually make matters worse compared with a policy of non-intervention. While this observation was not new to economists, since a similar point was made by Dow (1964) in a study of the post-war performance of the UK economy and by Baumol (1961) in the context of the simple multiplier-accelerator model, optimal-control theory offered a practical way of designing policy which would prevent policy-makers from destabilising the economy and indeed would make a useful contribution to the overall stability of economic activity. According to this view economic policy was simply not being designed properly. This was not to deny that there were a number of difficulties associated with the detailed application of control methods to economic policy-making.1 However, the feeling was that these difficulties were amenable to technical solution and that the piecemeal introduction of control methods, which could prove themselves against existing methods of policy design, was the best way to proceed.

This optimism proved to be misplaced. This was not because policy-makers failed to recognise the benefits but because a major revolution was occurring in mainstream macroeconomics which was to pull the rug out from under optimal control as it was then being applied to economic systems. The introduction of the rational-expectations hypothesis into macroeconomics produced some radical changes in the way in which economic policy was perceived. The orthodox theory of economic policy was turned on its head. In the most extreme and arresting formulation of the hypothesis there was no role for macroeconomic policy; indeed there was no need for stabilisation policy. Although it was recognised later that this analytical result also required perfectly flexible prices as well as rational or forward-looking
expectations, it was of considerable political importance since it provided some intellectual justification for widespread disillusionment with conventional economic policies in a world suffering from rising inflation and rising unemployment.

With the benefit of hindsight, the effect that the rational-expectations revolution has had on optimal-control theory - as it was being applied to economics - was salutary. It helped to highlight the basic fallacy of treating economic systems as closely analogous to physical systems. Human agents do not respond mechanically to stimuli but anticipate and act in response to expectations of what is going to happen. For a long time economists - while recognising this point in theoretical analysis - usually fell back in empirical work upon various devices which enabled them to make expectations a function only of the past behaviour of the economy.

But there are other circumstances where the analogy between economic and physical systems breaks down. Decision-making by economic agents is a very complex affair, especially when a number of different groups or individuals are involved, who may all want different or competing things. They may also have insufficient information both about how the economy works and about what other individuals and groups are up to. They may also have great difficulty in giving a precise expression to their preferences and objectives in a form which can be easily characterised by an objective, utility or loss function. These difficulties may extend as far as specifying how they would respond to uncertainty and how they would rank uncertain outcomes.

In this book we try to take greater account of the difficulties and special characteristics of using optimal-control techniques in economics. The area has developed sufficiently to allow us to address the particular problems found in economic systems, whether they concern forward-looking expectations, risk, game theory or the specification of objective functions.
This does not mean that we are able to provide complete answers to all the problems that could be examined. Many difficulties remain. Nevertheless we can be a lot more confident that, in comparison with the position that optimal control found itself in straight after the rational-expectations revolution, it is now possible to analyse models in which expectations are forward-looking under a variety of assumptions about the amount of information which is available both to policy-makers and to economic agents. The book is divided into two main parts. The orthodox optimal-control framework is covered in chapters 2 to 6. Different types of policy under different degrees of uncertainty are examined. At the same time we try to extend the standard analysis and to bring out some of the elements which we feel to be of particular relevance to economics. We also examine the effect of multiplicative uncertainty, the question of risk, and non-linearities in the econometric model. In the second part of the book we take up the question of expectations and examine how the standard optimal-control problem is
transformed by forward-looking expectations. We provide methods by which the formation of forward-looking expectations can be treated jointly with the determination of the optimal policy. However, even this task is considerably complicated by the possibility that the principle of optimality may break down. This raises the possibility that there may be an incentive for policy-makers to renege on their commitments and to indulge in reversals of policy. There is a temptation to seek a solution to this problem of time inconsistency by recasting the problem of economic-policy formulation in terms of a dynamic game between the government and the private sector. However, we have to be careful because there may be no strategic interdependencies between the government and the private sector if the private sector consists of atomistic agents. Nevertheless, there are similarities, which we consider, between some of the problems of game theory and those of rational expectations. Moreover, policy formulation in an open economy does raise game-theoretic considerations.

In chapter 2 the basic framework and notation are set out. Since we want to remain as flexible as possible we employ the standard economic linear model, the state-space model commonly used in control theory and also the stacked-vector notation introduced by Theil, in which a set of dynamic equations for T periods is rewritten in a static form. We also show how these are related to each other and how both minimal and non-minimal realisations of state-space models are related to the standard form of the econometric model. Once the basic linear framework has been set out we then examine the nature of the policy problem as one between a set of policy instruments and a set of targets or objectives at which the policy-maker is assumed to aim.
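As an illustration of the stacked-vector notation - the code and the two-equation model below are ours, with invented coefficients, not taken from the text - the T dynamic equations can be collapsed into one static system and checked against a direct simulation:

```python
import numpy as np

# Hypothetical two-target model x_{t+1} = A x_t + B u_t; all numbers invented.
A = np.array([[0.7, 0.2],
              [0.1, 0.5]])
B = np.array([[1.0, 0.0],
              [0.3, 1.0]])
x0 = np.array([1.0, -0.5])
T = 4
n, m = A.shape[0], B.shape[1]

# Theil's stacked form: write X = (x_1', ..., x_T')' and u = (u_0', ..., u_{T-1}')',
# so the T dynamic equations collapse to the static system X = R u + s.
R = np.zeros((n * T, m * T))
s = np.zeros(n * T)
for t in range(T):
    s[t*n:(t+1)*n] = np.linalg.matrix_power(A, t + 1) @ x0
    for k in range(t + 1):
        R[t*n:(t+1)*n, k*m:(k+1)*m] = np.linalg.matrix_power(A, t - k) @ B

# Sanity check: the stacked form reproduces a direct simulation
# for an arbitrary instrument path.
rng = np.random.default_rng(0)
u = rng.standard_normal(m * T)
x, X_sim = x0.copy(), []
for t in range(T):
    x = A @ x + B @ u[t*m:(t+1)*m]
    X_sim.append(x.copy())
X_sim = np.concatenate(X_sim)
```

Once in this form the T-period dynamic problem can be treated exactly like a static one, which is the property the rest of the book exploits.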
The question of 'controllability', or the existence of a policy, is something which comes before the question of what is feasible and desirable; it simply establishes the limits within which a policy choice can be made. In dynamic models the original assignment problem solved by Tinbergen has to be modified to allow for the possibility, if not the requirement, that a policy-maker will have objectives which refer not just to some future date or 'state' but also to the path by which the future state is arrived at. In comparison with static controllability, dynamic controllability is a more rigorous requirement for a policy to fulfil, especially if the policy-maker is constrained by the frequency with which instruments can be changed.

In chapter 3 the question of how optimal policies can be designed is taken up. A policy is described as optimal if it minimises an objective function which, mainly for reasons of tractability, is assumed to be quadratic in deviations of the set of targets and instruments from desired or 'ideal' values. The objective function is also intertemporal in that it involves a number of time periods. This objective function is minimised subject to a dynamic linear model (the subject of non-linear constraints is taken up in
chapter 6). Three main kinds of method are described. The first is the classical method of control theory, which does not necessarily provide an 'optimal' solution in the sense of minimising an explicit objective function. Instead greater emphasis is placed upon the dynamic properties of the system under control and upon the need to obtain, often by means of trial and error, a controller or feedback rule which is robust when things go wrong, when the model of the system is poorly specified or when the system is itself subject to change over time.

The second method we consider - that of dynamic programming - exploits the additively recursive structure of the objective function and breaks the problem down into a number of subproblems. The optimal policy is then obtained by solving backwards from the terminal period to the initial period, so that for each subperiod the optimal solution is obtained subject to the as-yet-to-be-solved separable part of the whole problem. In the third method, which uses the stacked approach of Theil, the dynamic problem is recast as a large static problem. The optimal policy is then obtained in one go rather than by backwards substitution.

All three methods have their advantages and disadvantages and, although classical techniques have a number of things to be said for them, especially when, as is often the case in economics, the system under control is poorly understood or subject to 'structural' change, we devote most of the book to the study and use of the last two methods. It is worth highlighting some of the differences between the solutions provided by dynamic programming and the stacked approach. The most noticeable feature is that the recursive method, at least for the case we consider in chapter 3 of a quadratic cost, a linear model and additive disturbances, produces an explicit feedback solution.
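The backward recursion of dynamic programming for the linear-quadratic case can be sketched in a few lines; the model, weights and horizon below are invented, and the iteration shown is the standard discrete-time Riccati recursion rather than the book's own notation:

```python
import numpy as np

# Finite-horizon linear-quadratic problem (invented numbers):
# minimise sum_t (x_t'Q x_t + u_t'R u_t) + x_T'Qf x_T
# subject to x_{t+1} = A x_t + B u_t.
def lqr_gains(A, B, Q, R, Qf, T):
    P, gains = Qf, []
    for _ in range(T):                    # solve backwards from the terminal period
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()                       # K_0, ..., K_{T-1}
    return gains                          # explicit feedback rule: u_t = -K_t x_t

A = np.array([[1.1, 0.2],
              [0.0, 0.9]])                # one unstable root, so control matters
B = np.array([[0.5],
              [1.0]])
Q, Rm, Qf, T = np.eye(2), np.array([[1.0]]), np.eye(2), 20
Ks = lqr_gains(A, B, Q, Rm, Qf, T)

def realised_cost(gains, x0):
    x, cost = np.array(x0, dtype=float), 0.0
    for K in gains:
        u = -K @ x
        cost += x @ Q @ x + u @ Rm @ u
        x = A @ x + B @ u
    return cost + x @ Qf @ x

cost_fb  = realised_cost(Ks, [1.0, 1.0])
cost_off = realised_cost([np.zeros((1, 2))] * T, [1.0, 1.0])   # non-intervention
```

The output of the recursion is not a path of instrument values but a sequence of gain matrices, which is exactly the sense in which the dynamic-programming solution is a feedback solution.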
It is easy to see why, for a wide variety of physical, biological and engineering systems, the dynamic-programming approach should be preferred to a stacked approach. As the number of time intervals rises the dimension of the stacked form becomes impossible to handle. The stacked form is also only suitable for discrete-time models. But more significantly the stacked form only provides a solution in the shape of an open-loop policy. By open-loop is meant a policy which gives precise values for the policy instruments or control variables over the period of the plan. A feedback solution, on the other hand, does not provide explicit values2 but instead provides a function which makes the control or policy instrument dependent upon the performance of the system under control. Among other things this means that the feedback controller does not have to carry a large amount of information around: all it needs, apart from the feedback rule, is a solution, since for a convex objective function and linear constraints there is only one policy which is optimal. However, the failure of the method of Theil to provide a policy which is responsive to the evolving behaviour of the economy is more apparent than real. It has to be conceded that the stacked approach is inappropriate when the need is to design simple, cheap and
economical feedback mechanisms which have to function without human intervention. But when, as is normally the case with macroeconomic policy, the period of gestation between one policy action and another is quite long - measured in months rather than microseconds - it is perfectly straightforward to update the stacked 'open-loop' policy in each time period as new information becomes available, so that this sequential multiperiod approach actually mimics the feedback solution. Even when economic policy is conducted over quite short intervals - as with money-market or exchange-market day-to-day intervention by a monetary authority - a formal econometric model is rarely available, so that automatic intervention in the form of a feedback rule is neither possible nor, probably, desirable.

More significantly, it is useful to draw a distinction between a feedback rule and a 'closed-loop' rule. Although these are treated as synonymous in much of the literature,3 we shall use them in distinct senses. In the usual econometric model policy instruments and exogenous variables are treated as the same. But we need to distinguish between those variables which are at the discretion of the policy-maker and those - such as events external to a single country - which are not. The values that exogenous variables take are an important input to most econometric models, especially when forecasts are being made. Values for the exogenous variables are produced by a variety of methods. Some of these may involve submodels such as time-series models, while others will be produced by much less formal methods. This implies that assumptions which are made about exogenous variables could change in subsequent periods. This will require the recalculation of at least the intercept term in the feedback law, so that there are not necessarily any computational benefits to be gained from using a feedback rule rather than a sequential updating of the stacked solution.
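The sequential updating of the stacked 'open-loop' policy can be sketched with an invented linear-quadratic example (the code is illustrative, not from the text): re-solving the remaining-horizon problem from the realised state reproduces the tail of the original plan when nothing unexpected happens, and revises the policy when a disturbance arrives.

```python
import numpy as np

def stacked_plan(A, B, Q, Rm, T, x0):
    """One-shot ('open-loop') solution of min sum_t (x_t'Q x_t + u_t'R u_t)
    s.t. x_{t+1} = A x_t + B u_t, via the stacked form X = G u + g."""
    n, m = A.shape[0], B.shape[1]
    G, g = np.zeros((n*T, m*T)), np.zeros(n*T)
    for t in range(T):
        g[t*n:(t+1)*n] = np.linalg.matrix_power(A, t+1) @ x0
        for k in range(t+1):
            G[t*n:(t+1)*n, k*m:(k+1)*m] = np.linalg.matrix_power(A, t-k) @ B
    Qb, Rb = np.kron(np.eye(T), Q), np.kron(np.eye(T), Rm)
    u = np.linalg.solve(G.T @ Qb @ G + Rb, -G.T @ Qb @ g)   # first-order condition
    return u.reshape(T, m)

A = np.array([[0.9, 0.3],
              [0.0, 0.7]])
B = np.array([[1.0],
              [0.5]])
Q, Rm, T = np.eye(2), np.array([[0.5]]), 6
x0 = np.array([2.0, -1.0])

plan = stacked_plan(A, B, Q, Rm, T, x0)        # original open-loop path
u0 = plan[0]

# No surprises: re-solving from the realised state reproduces the tail of
# the original plan (the principle of optimality).
x1 = A @ x0 + B @ u0
tail = stacked_plan(A, B, Q, Rm, T - 1, x1)

# An unanticipated disturbance: the re-solved plan revises the policy,
# which is how sequential open-loop updating mimics a feedback rule.
shock = np.array([0.5, 0.5])
revised = stacked_plan(A, B, Q, Rm, T - 1, x1 + shock)
```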
Henceforth we will use the term 'closed-loop' to denote a full sequential policy in which current policy is revised in the light of not only stochastic and unanticipated disturbances but also anticipated shifts in expected outcomes for the exogenous variables. It has sometimes been argued that an explicit feedback rule has the advantage of being simple and easy for economic agents to understand. In practice this may not prove to be so since the optimal-feedback rule in the most general intertemporal case is a very complex, time-varying function of the structure of preferences and the parameters of the model, though we do examine some simple cases in chapter 3 where the feedback rule does become reasonably simple. It is important not to confuse what is after all only a question about the computational advantages of one method over another with matters of real economic substance. All that matters for the policy-maker in the end is what kind of economic policy is pursued, not how it is arrived at. As long as both approaches yield the same outcome it does not really matter which is used. Where it does matter is when there is something that one method can produce
that the other method cannot, and we will examine a number of situations in this book where one method is decidedly superior to the other. Nevertheless, the preoccupation with feedback rules and with a supposed desire for an endogenous response to random disturbances masks an important aspect of economic policy-making: how the policy-maker responds to changes in assumptions about the future behaviour of exogenous variables. Feedback rules are essentially backward-looking and, while very sensible for engineering systems, they are not sufficiently general for economic policy to encompass situations in which policy-makers want to respond in anticipation of what they expect to happen in the future. If policy-makers confined themselves to responding only to unanticipated disturbances coming through the error term in the econometric model they would be introducing an inefficient restriction upon the freedom with which policy could be carried out. In essence policy-makers must also have regard to the feedforward aspect of policy. Feedforward refers to the forward-looking, anticipatory aspect of economic policy whereby policy-makers take account of expected events and revise current policy. A combination of feedback and feedforward is the closed-loop policy referred to above.

In linear models with quadratic cost functions an important simplification results from what is known as the certainty-equivalence principle. This means that a stochastic control problem can be solved by ignoring the disturbances and simply solving the corresponding deterministic problem. Although feedback is not very meaningful in the deterministic case, since there is nothing on which to feed back, this separation result does mean that the task of obtaining an explicit solution is much easier. Put another way, certainty equivalence means that in the stochastic problem we only need to know the first moment of the probability distribution of stochastic disturbances.
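Certainty equivalence can be sketched in a one-period scalar example with invented numbers (and, anticipating chapter 4, contrasted with the multiplicative case where it fails): with purely additive uncertainty only the mean of the disturbance matters for the decision, however large its variance.

```python
import numpy as np

# One-period scalar setting (invented numbers): target y = b*u + e,
# loss (y - ystar)**2 + r*u**2, disturbance e with mean e_mean.
b, r, ystar, e_mean = 2.0, 0.5, 1.0, 0.3

def mc_optimum(e_var, n_draws=50_000, seed=1):
    """Grid-search minimiser of the Monte-Carlo expected loss."""
    rng = np.random.default_rng(seed)
    e = rng.normal(e_mean, np.sqrt(e_var), n_draws)
    grid = np.linspace(0.0, 1.0, 1001)
    return grid[int(np.argmin([np.mean((b*u + e - ystar)**2) + r*u**2
                               for u in grid]))]

# Certainty equivalence: the optimum depends on the mean of e only.
u_certain  = b * (ystar - e_mean) / (b**2 + r)   # deterministic problem at e = e_mean
u_low_var  = mc_optimum(e_var=0.01)
u_high_var = mc_optimum(e_var=1.0)               # 100 times more uncertain, same u

# By contrast, MULTIPLICATIVE uncertainty - a random coefficient b with
# variance s2 - does change the decision: u* = b*(ystar - e_mean)/(b**2 + r + s2),
# which shrinks towards zero as s2 grows (Brainard's attenuation result).
u_b_uncertain = b * (ystar - e_mean) / (b**2 + r + 1.0)
```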
However large the uncertainty of the outcome - in the sense that there is a large variance attached to the forecast error - the optimal-feedback rule will be the same. It is important to stress that this does not mean that the actual policy which is implemented will be the same, only that the structure of the optimal-feedback rule is the same. If the variance of stochastic disturbances alters exogenously then the closed-loop response will also alter. In chapter 4 we take up a number of cases for which certainty equivalence does not hold. For example, if there are multiplicative disturbances so that the parameters of the econometric model now have a probability distribution, we find that the structure of the optimal-feedback and feedforward rule will depend upon the covariance matrix of the parameters, but that the second moment of the additive disturbances will have an effect only in so far as they co-vary with the model parameters. The effect on economic policy of parameter uncertainty was first established by Brainard, who was able to show, at least in a static model, that, as the variance of the parameter on the policy variable increased, the intensity with which the policy was pursued
tended to diminish; that is, the strength of the feedback response to random disturbances tended to weaken. One consequence of this is that the greater the degree of uncertainty attached to parameters the less interventionary a policy will tend to be. The generality of this result for dynamic models and intertemporal objective functions is not quite so clear-cut, as we will show in chapter 4.

Another important issue taken up in both chapter 4 and chapter 5 is the role of risk in economic policy. Certainty equivalence seems to permit no role for risk aversion in the formulation of economic policy, in the sense that there is no obvious way in which policy-makers' dislike of high variance can be taken account of. Risk has long been recognised as an important influence on economic decision-making at the microeconomic level, though the effect that it can have at the level of macroeconomic policy has only been considered more recently. There are a number of problems with the incorporation of risk considerations into the optimal-control framework. The quadratic objective function, though it has many attractions because of its tractability, is not ideally suited to the treatment of risk, since the curvature of the preference function does not have the properties normally associated with the analysis of risk in economic theory. One way of overcoming this problem is to reconsider the quadratic function as a local approximation to a more general preference function which does have the desired properties. We show that approaching the problem from this perspective permits us to reparameterise the problem as one in which the policy-maker, when specifying this objective function, must have regard for the variability of the targets and instruments about which he is risk-averse.
This means that in practice the prior or interactive specification of the objective function cannot be achieved without some knowledge of the stochastic behaviour of the model of the economy, and in particular of how the model behaves stochastically under closed-loop control. What this point also highlights is that there is already a strong element of sensitivity to risk implicit in a quadratic formulation. The risk premium can be interpreted as approximately equal to the variance times the coefficient of risk aversion. With a quadratic objective function we are minimising the squared deviation of both targets and instruments about a desired path; so the closed-loop policy can be seen as variance-minimising behaviour on the part of the economic policy-maker. We examine in chapter 4 one further question concerned with uncertainty, and that is how to handle the very common problem of what form policy should take when there is uncertainty about the initial state; or, in other words, when the policy-maker is uncertain where the economy actually is when the appropriate policy decision is made. There may be incomplete information available because of delays in collecting data on the economy or initial
measurements may be subject to subsequent revision. What is required is a filter which generates optimal estimates of the initial state, and this is provided by what is known as the Kalman Filter, which uses information on both the expected value of the estimate and the variance of the measurement or data-revision error to calculate the optimal estimate of the initial state. The certainty-equivalence principle still applies, but it is worth noting that, while the structure of the optimal-decision rule is independent of the variance of the measurement error, the actual policy change which occurs in response to some disturbance to the economy will be affected by the reliability, i.e. the variance, of measurements of economic activity. In chapter 6 we examine the effects of having non-linear models. The vast majority of this book is concerned with linear models. This is because of the desire to obtain as far as possible explicit analytical results. Yet practically all the econometric models which are used for forecasting and policy-making are non-linear. We consider two main approaches to the problem of non-linear models. The first adopts the strategy of linearising the model about some reference path and then applying the linear theory to the linear approximation. The second approach treats the problem of minimising a quadratic function subject to a non-linear model as a non-linear optimisation problem and applies a number of algorithms which have emerged in the last few years. Whichever approach is used, the difficulties attached to linearisation and problems of approximation, as well as the well-known problems of possible multiple solutions and convergence difficulties in non-linear optimisation, have to be recognised. The second part of this book starts with chapter 7. Here we switch from the standard linear econometric model of chapter 2 to the linear rational-expectations model in which expectations of private agents are assumed to be forward-looking.
This change has major effects both upon the kind of techniques which can be used to solve models of this type and upon the whole conduct of economic policy in a world in which private agents cannot be assumed to respond passively to changes in current and future expected government actions. Chapter 7 is mainly devoted to aspects of the analytical and numerical solutions of linear and non-linear rational-expectations models, and the role of transversality or terminal conditions. Casting the rational-expectations model into a version of the stacked form of Theil proves to be a particularly useful approach and we then use this form to examine the controllability conditions for rational-expectations models. Controllability is of special importance because one of the factors which gave a boost to the interest in rational-expectations models was the result of Sargent and Wallace (1975) that in a special class of model there was no role for economic policy. This controllability result led to a large amount of literature which provided counter-examples or extensions to more elaborate models. In chapter 7 we
provide straightforward ways of establishing the degree of controllability of any linear rational-expectations model and also apply them to the original Sargent and Wallace model. In chapter 8 we take up the question of the design of optimal policies in rational-expectations models. In comparison with the standard model this is still a relatively unexplored area and it produces a number of intriguing problems. We first show that the recursive method of dynamic programming is not capable of generating an optimal policy for the initial time period, let alone for subsequent periods. The reason for this lies in the breakdown of the additive separability of the objective function in rational-expectations models. We show that this difficulty can be easily overcome if the stacked form is adopted, so that an open-loop optimal policy can be obtained for the current period. A sequential approach can then be used to update policy as new information arrives. However, there is still a serious difficulty with the implementation of the optimal policy because, as the policy-maker moves forward in time, it may not be optimal for him to implement the previously optimal policy, even when there are no disturbances. This difficulty, referred to as 'time inconsistency', was first identified by Kydland and Prescott in the context of rational-expectations models, though Strotz had much earlier demonstrated the nature of the difficulty in intertemporal consumption theory. Strotz proposed two ways in which the consumer could respond. He could enter into binding commitments so that subsequently he would be committed to a particular strategy. Or he might resign himself to the fact that the optimal plan he formulates today cannot be implemented, so he might as well select the current decision which is best given that he will go back on his plans in the future. He will seek the best plan among those he will follow. One such consistent plan is that provided by dynamic programming.
For the macroeconomic policy-maker the problem is not quite the same, since the cost of reneging will be borne by the public. Moreover, if the government can enter into binding precommitments - perhaps in the form of a constitutional amendment - it would be better if the commitment were made to the ex-ante, open-loop policy rather than the suboptimal dynamic-programming solution - with suitable sequential revisions to policy in a stochastic environment. In practice binding constraints on governments are difficult to implement, to police and even to design, given the problems attached to reversing the rule if circumstances change. One other area in which dynamic inconsistency has also been observed to arise is in game theory, and this is the subject of the remaining two chapters. We show that the question of optimal-policy design in rational-expectations models can be given a much clearer motivation if the implicit game aspects of rational-expectations models are brought to the fore. This is despite the absence of strategic interdependencies. Two questions are then important:
first, how much information each player in a 'game' has about the intentions and constraints of the other players and, secondly, whether players cooperate or collude. Typically the government is in the position of facing a large number of firms, organisations and individuals about whom it would be next to impossible to have full information. On the other hand the public is likely to have a lot of information about what the government is up to and what its preferences are, although the public will not know everything. Non-cooperative solutions have the characteristic of not being Pareto-optimal, in the sense that if cooperation were possible an outcome beneficial to everyone could be found. The fundamental problem is how to calculate policies which are Pareto-efficient, how to choose among the efficient set - perhaps by bargaining - and then how to ensure that players keep to their side of the agreement. The reason that methods of enforcement can be important is that there may well be incentives for players to renege on the cooperative solution. Even in non-cooperative games cheating can be a problem if there is incomplete information about the intentions of, for example, the government, so that a policy announcement is made to elicit a favourable response on the part of economic agents but a different policy is actually pursued. The questions of dynamic inconsistency, cheating and reneging are, then, interwoven aspects of the same problem. We consider one way in which the possible indeterminacy of macroeconomic - and presumably microeconomic - policy can be resolved, and that is in terms of reputation. One of the costs a government might bear if it reneged is a loss of its reputation, so that in subsequent periods the public might attach no credence whatsoever to its announcements.
This might result in a suboptimal outcome for the government and make it desirable for the government to balance the advantages at the margin of reneging or pursuing the time-inconsistent policy against the costs of a loss of reputation in the future. The theory of economic policy has come a long way from the simple target-instrument assignment framework of Tinbergen, and from the early applications of optimal control to economics. If anything, what the introduction of expectations and game theory into the theory of economic policy does is to re-establish the political-economy features of policy-making. The design of an economic policy is not simply a matter of estimating the appropriate econometric model and of applying an algorithm to minimise some objective function - even when there are complications with uncertainty in the model and in the specification of the objective function. We need a clearer understanding of how people's views of the political climate interact with the formation of expectations, and how policies can be designed in such a way as to ensure that there are not serious inconsistencies in the announcement and implementation of policies.
2 The theory of economic policy and the linear model

2.1 THE LINEAR MODEL
The general linear constant-coefficient dynamic stochastic model common in the econometric literature is written in structural form:

A(L)y_t = B(L)x_t + C(L)e_t + ε_t    (2.1.1)

where y_t is a vector of g endogenous variables, x_t a vector of n policy instruments and e_t a vector of k exogenous variables. A(L), B(L) and C(L) are g x g, g x n and g x k matrices of polynomials in the lag operator. Thus:

A(L) = A_0 + A_1 L + ... + A_r L^r
B(L) = B_0 + B_1 L + ... + B_s L^s
C(L) = C_0 + C_1 L + ... + C_q L^q

r, s and q determine the maximum lag at which endogenous variables, policy instruments and exogenous variables enter into the structural form. A_0 is subject to the normalisation rule that each of its diagonal elements is unity. The vector ε_t of random disturbances follows a fixed distribution with zero mean and known covariances. Contrary to the custom in econometrics, we do not subsume the set of policy instruments in the exogenous variables, since our primary interest is in how the values of these policy instruments are to be determined. The exogenous variables are assumed to be completely outside the influence of policy-makers. It is a characteristic of the structural form that the matrix A_0 is not diagonal (i.e. A_0 ≠ I). Assuming that A_0 is invertible, the instantaneous relationship between the endogenous variables is removed in the reduced form, so that each endogenous variable is expressed in terms of lagged endogenous variables and current and lagged policy instruments and exogenous variables:

y_t = -A_0^{-1}(A_1 y_{t-1} + ... + A_r y_{t-r}) + A_0^{-1}[B(L)x_t + C(L)e_t] + A_0^{-1}ε_t    (2.1.2)
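As a quick numerical check of the passage from the structural form to the reduced form, consider a minimal sketch with two endogenous variables, one lag, and no exogenous variables or disturbances; all coefficient values below are invented for the illustration.

```python
import numpy as np

# Structural form: A0 y_t + A1 y_{t-1} = B0 x_t (r = 1, s = 0, no exogenous
# variables or disturbances). All coefficient values are invented.

A0 = np.array([[1.0, 0.4],     # diagonal elements normalised to unity
               [0.2, 1.0]])
A1 = np.array([[-0.5, 0.1],
               [0.0, -0.3]])
B0 = np.eye(2)

A0_inv = np.linalg.inv(A0)

# Reduced form (2.1.2): y_t = -A0^{-1} A1 y_{t-1} + A0^{-1} B0 x_t
A_red = -A0_inv @ A1
B_red = A0_inv @ B0

y_prev = np.array([1.0, 1.0])
x_t = np.array([0.5, 0.0])
y_t = A_red @ y_prev + B_red @ x_t

# The reduced-form solution satisfies the structural equations exactly:
assert np.allclose(A0 @ y_t + A1 @ y_prev, B0 @ x_t)
print(y_t)
```

The assertion verifies that inverting A_0 removes the instantaneous relationship between the endogenous variables without changing what the model says.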
Since a change in both policy instruments and exogenous variables can influence the endogenous variables immediately (A_0^{-1}B_0 ≠ 0, A_0^{-1}C_0 ≠ 0), the system represented by (2.1.1) would be described by control theorists as 'improper'. Most engineering systems, either naturally or by design, have controls (or policy instruments) which can affect the system only after a lag of one or more periods. Provided that the reduced form is stable in the sense that the roots of the polynomial matrix A(L) in (2.1.1) lie within the unit circle, A(L) can be inverted to give the final form:

y_t = [A(L)]^{-1}B(L)x_t + [A(L)]^{-1}C(L)e_t + [A(L)]^{-1}ε_t    (2.1.3)
The final form is identical to the transfer-function or input-output (black-box) systems developed in engineering. The rational matrix [A(L)]^{-1}[B(L)], each element of which is the ratio of two polynomials (but with identical denominator polynomials, unless common factors cancel), expresses each endogenous variable as an infinite distributed lag function of the policy instruments, exogenous variables and disturbance terms. An alternative way of writing the reduced form, and one which will be of use later, is to stack the equations as:

y_t = A y_{t-1} + B x_t + C e_t + ε_t    (2.1.4)

where the vectors and matrices are redefined as follows: y_t now stacks the current and lagged endogenous variables together with the lagged policy instruments and exogenous variables, and

A = [ -A_0^{-1}A_1 ... -A_0^{-1}A_r   A_0^{-1}B_1 ... A_0^{-1}B_s   A_0^{-1}C_1 ... A_0^{-1}C_q ]
    [        I               0                      0                             0              ]
    [                ...                           ...                           ...             ]

B = [ A_0^{-1}B_0 ]        C = [ A_0^{-1}C_0 ]
    [      0      ]            [      0      ]
    [      I      ]            [      0      ]
    [      0      ]            [      I      ]
    [     ...     ]            [     ...     ]

with identity blocks shifting the lagged variables down the stacked vector and zeros elsewhere.
2.2 STATE-SPACE FORMS
In the previous section we described the structural-, reduced- and final-form representations of the linear econometric model. However, a major part of modern control theory deals with models of systems in state-space form. An essential feature (because of its mathematical tractability) is that the dynamics
of the system are described by a set of first-order difference equations. The state of a dynamic system is essentially a set of numbers such that knowledge of these numbers (together with a knowledge of the equations describing the system), plus the future sequence of policy instruments and exogenous variables, is sufficient to determine the future evolution of the system. The f-dimensional space containing the state variables is the state space - hence the term state-space form to denote models represented in this way. The problem of translating (or realising) the input-output- or final-form representations of equation (2.1.3) into a representation involving first-order difference equations was first tackled by Kalman (1961, 1963). There is nothing, however, which makes the final form the only stage at which realisation occurs. The two best-known approaches to optimal control in economics, those of Pindyck (1973) and Chow (1975), used the reduced form as the basis for realisation; realisations of the structural form have also been explored (Preston and Wall, 1973). Indeed Chow has managed to develop his approach with hardly a mention of state space and its associated concepts (see also Aoki, 1976). The state-space form of a system of linear equations is written:

z_t = F z_{t-1} + G_x x_{t-1} + G_e e_{t-1} + G_ε ε_{t-1}    (2.2.1)

y_t = H z_t + D_x x_t + D_e e_t + D_ε ε_t    (2.2.2)
The vector z_t is the state vector with state variables as elements. The dimension, f, of the state vector is the dimension of the state space. The task of getting from the structural, reduced or final form is denoted a realisation, and amounts to determining the set of constant matrices (F, G_x, G_e, G_ε, H, D_x, D_e, D_ε) such that, for example in the final form (2.1.3), the mapping (x_t, e_t, ε_t) → y_t is determined by equations (2.2.1) and (2.2.2). Equation (2.2.1) is the state equation and (2.2.2) the observation or measurement equation. State-space realisations are not unique since there are a number of ways of factorising the final or reduced forms. Kalman (1966) has proved that all minimal state-space realisations are those which have the minimum number of states. For the reduced form (2.1.2) the minimum state dimension is g, the number of endogenous variables, times the maximum lag given by r, s or q. Thus if policy or exogenous variables appear with a lag greater than r there will be explicit state variables associated with policy or exogenous variables. One particular reduced-form realisation (Aoki, 1976) defines the state vector (assuming r ≥ s, q and no noise term) as:

z_t^1 = y_t - A_0^{-1}B_0 x_t - A_0^{-1}C_0 e_t

and

z_{t-1}^2 = z_t^1 + A_0^{-1}A_1 y_{t-1} - A_0^{-1}B_1 x_{t-1} - A_0^{-1}C_1 e_{t-1}
Lagging z_t^1 and substituting into the equation for z_t^2 and reorganising gives an equivalent expression for z_t^2; repeating this procedure with z_t^3, and so on, we obtain equivalent expressions for the remaining states. The state-space form is then:

      [ -A_0^{-1}A_1   I   0  ...  0 ]
z_t = [ -A_0^{-1}A_2   0   I  ...  0 ] z_{t-1} + G_x x_{t-1} + G_e e_{t-1}    (2.2.3a)
      [      ...                     ]
      [ -A_0^{-1}A_r   0   0  ...  0 ]

y_t = (I_g : 0) z_t + D_x x_t + D_e e_t    (2.2.3b)

where

G_x = [ A_0^{-1}(B_1 - A_1 A_0^{-1}B_0) ]        G_e = [ A_0^{-1}(C_1 - A_1 A_0^{-1}C_0) ]
      [              ...                ]  and         [              ...                ]
      [ A_0^{-1}(B_r - A_r A_0^{-1}B_0) ]              [ A_0^{-1}(C_r - A_r A_0^{-1}C_0) ]

with D_x = A_0^{-1}B_0 and D_e = A_0^{-1}C_0.
Each state is a linear combination of endogenous and exogenous variables and policy instruments and therefore has no direct economic interpretation. The convoluted structure of the G_x and G_e matrices is due entirely to the improper nature of the model. If the model is strictly proper (B_0 = C_0 = 0) the companion matrix F retains its structure but G_x and G_e can be interpreted more easily. Moreover, the first g elements of the state vector z_t refer directly to endogenous variables since the observation matrix takes on a trivial form. For s or q greater than r we simply define extra states:

z_t^{r+j} = z_{t-1}^{r+j+1} + A_0^{-1}B_{r+j} x_{t-1} + A_0^{-1}C_{r+j} e_{t-1}

for j = 1, ..., max(s, q). The treatment of the noise term is exactly parallel to that for exogenous variables. If the error process is actually more general than (2.1.1), with autoregressive or moving-average disturbances, then the procedure is straightforward. If the error process is:

V(L)ε_t = D(L)v_t
where now v_t is a sequence of independent random processes with zero mean and known covariances, we can transform y_t, x_t and e_t by multiplying through by V(L) and then treating the moving-average process D(L) as an extra set of exogenous variables. The main difficulty is not at the realisation stage but in obtaining estimates of the state, given that states will now be defined as random variables. To initiate control at time t we require an estimate of the
state at time t - 1, and usually this will be provided by the Kalman Filter, which generates recursive estimates of the state. But because of the particular characteristics of econometric models the noise terms in (2.2.1) and (2.2.2) are usually identical (a case when this is not so is examined in chapter 4). Thus we can solve (2.2.2) for ε_t and substitute into (2.2.1) to give:

z_t = F* z_{t-1} + G_x* x_{t-1} + G_e* e_{t-1} + K y_{t-1}    (2.2.4)

where

F* = F - KH,  G_x* = G_x - KD_x,  G_e* = G_e - KD_e,  K = G_ε D_ε^{-1}

As long as F* is asymptotically stable then, as Zarrop et al. (1979a) argue, the initial state z_{t-1} can be calculated to a high degree of accuracy by running (2.2.4) from some date, k, in the past (using historical values on y, x and e) and assuming, for example, that the starting state z_{t-k} is zero or equal to z_{t-k+1} for the first period.
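A minimal scalar sketch of this initial-state calculation, with F - KH collapsed to a single made-up coefficient and the instrument and exogenous terms dropped for brevity:

```python
# Run the recursion (2.2.4) forward over historical data from two different
# guesses for the starting state z_{t-k}; if F - KH is stable the two paths
# converge, so the choice of starting value hardly matters.
# All numbers below are illustrative.

f_star = 0.6           # plays the role of F - KH, assumed stable (|.| < 1)
k_gain = 0.4           # plays the role of K
y_hist = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05]  # hypothetical data

def run_state(z0, ys):
    z = z0
    for y in ys:
        z = f_star * z + k_gain * y   # G_x*, G_e* terms dropped for brevity
    return z

z_from_zero = run_state(0.0, y_hist)
z_from_five = run_state(5.0, y_hist)

# The influence of the starting guess decays like f_star**len(y_hist):
print(abs(z_from_zero - z_from_five))  # = 5 * 0.6**8, about 0.084
```

Because the influence of the starting guess decays geometrically, any reasonable choice of z_{t-k} delivers nearly the same estimate of the initial state.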
Structural realisations

Some of the loss of transparency of minimal state realisations can be removed if we adopt the structural-form realisation proposed by Preston and Wall (1973). Rewriting (2.1.2) as:

ȳ_t = -Σ_{j=1}^{r} A_j A_0^{-1} ȳ_{t-j} + Σ_{j=0}^{s} B_j x_{t-j} + Σ_{j=0}^{q} C_j e_{t-j}

where the endogenous variables have been redefined as ȳ_t = A_0 y_t, we can then treat this as a reduced form and obtain an equivalent realisation to (2.2.3), but where now the observation equation is:

y_t = A_0^{-1}(I_g : 0) z_t + D_x x_t + D_e e_t

This observable canonical structural-form realisation also provides explicit information about the contemporaneous matrix A_0 as well as the impact multipliers D_x and D_e. So far we have described realisations of both the reduced and the structural forms with little attention paid to final-form or transfer-function realisations. But there may be circumstances in which a final form is worthwhile. For example, if the structural model is very large - which most econometric models used for policy analysis are (the question of non-linearity is taken up in chapter 6) - then the target variables may be quite a small subset of the total number of endogenous variables. Rather than provide a full realisation for all endogenous variables, which we will have to do for the reduced form, the final form can be used to strip out all but the target variables, with possibly a
considerable saving in the dimension of the state. The number of exogenous variables might still be large, but this could be reduced by defining new 'exogenous' variables as linear combinations of existing exogenous variables, with the weights provided by the elements in the final-form matrices, i.e. if the exogenous-variables part of the final form is:

y_t = Λ(L)e_t

where y_t is a vector of m (≤ g) target variables, and Λ(L) is a matrix of polynomials, Λ(L) = Λ_0 + Λ_1 L + ..., then for the ith target in period t = 0 we can define a new 'exogenous' variable as:

ẽ_t^i = Σ_j λ_{ij}(L) e_t^j

where e_t^j is the jth exogenous variable.

A scalar example of a state-space model

A scalar example will help to highlight some of the basic features of state-space representations. Consider the second-order, constant-coefficient, scalar difference equation:

y_t = a_1 y_{t-1} + a_2 y_{t-2} + b_0 x_t + b_1 x_{t-1} + b_2 x_{t-2}    (2.2.5)
The realisation employed by Chow (1975) would be:

z_t = F z_{t-1} + G_x x_t    (2.2.6)

where

F = [ a_1  a_2  b_1  b_2 ]        G_x = [ b_0 ]
    [  1    0    0    0  ]              [  0  ]
    [  0    0    0    0  ]   and        [  1  ]    (2.2.7)
    [  0    0    1    0  ]              [  0  ]

with each element of the state vector:

z_t^1 = y_t,  z_t^2 = y_{t-1},  z_t^3 = x_t,  z_t^4 = x_{t-1}

Each state has a direct interpretation in terms of current and lagged values of the endogenous variable and the policy instrument. Note that x_{t-1} appears in the state equation. An equivalent minimal realisation can be written:

z_t = [ a_1  1 ] z_{t-1} + G_x x_{t-1}    (2.2.8)
      [ a_2  0 ]

y_t = (1 : 0) z_t + b_0 x_t    (2.2.9)
with

G_x = [ a_1 b_0 + b_1 ]
      [ a_2 b_0 + b_2 ]
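A quick numerical check, using arbitrary illustrative coefficients, confirms that this minimal realisation reproduces the scalar difference equation (2.2.5) exactly:

```python
import numpy as np

# Verify that the minimal realisation (2.2.8)-(2.2.9) reproduces the scalar
# difference equation (2.2.5). Coefficient values are invented for the test.

a1, a2 = 0.5, -0.2
b0, b1, b2 = 1.0, 0.3, 0.1

F = np.array([[a1, 1.0],
              [a2, 0.0]])
Gx = np.array([a1 * b0 + b1,     # the state loadings absorb the
               a2 * b0 + b2])    # contemporaneous coefficient b0
H = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
x = rng.standard_normal(50)      # an arbitrary instrument path

# State-space simulation: z_t = F z_{t-1} + Gx x_{t-1}, y_t = H z_t + b0 x_t
z = np.zeros(2)
y_ss = []
for t in range(50):
    if t > 0:
        z = F @ z + Gx * x[t - 1]
    y_ss.append(float(H @ z) + b0 * x[t])

# Direct simulation of the difference equation (zero initial conditions)
y_de = []
for t in range(50):
    y1 = y_de[t - 1] if t >= 1 else 0.0
    y2 = y_de[t - 2] if t >= 2 else 0.0
    x1 = x[t - 1] if t >= 1 else 0.0
    x2 = x[t - 2] if t >= 2 else 0.0
    y_de.append(a1 * y1 + a2 * y2 + b0 * x[t] + b1 * x1 + b2 * x2)

assert np.allclose(y_ss, y_de)
print("minimal realisation matches the difference equation")
```

The two simulated paths coincide to machine precision, confirming that the two-dimensional state carries exactly the information needed from the past.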
2.2.1 Dynamic equivalences
The equivalence between the various forms of econometric model representation can be seen from the dynamic structure. If we rewrite the state equation (2.2.1) using the matrix lag operator L as:

(I - FL)z_t = G_x L x_t + G_e L e_t + G_ε L ε_t    (2.2.10)

and then substitute into (2.2.2):

y_t = [H(I - FL)^{-1}G_x L + D_x]x_t + [H(I - FL)^{-1}G_e L + D_e]e_t + [H(I - FL)^{-1}G_ε L + D_ε]ε_t    (2.2.11)

Equating terms between (2.1.2) and (2.2.11), we have:

[A(L)]^{-1}B(L) = Π(L) = H(I - FL)^{-1}G_x L + D_x
[A(L)]^{-1}C(L) = Λ(L) = H(I - FL)^{-1}G_e L + D_e
[A(L)]^{-1} = Γ(L) = H(I - FL)^{-1}G_ε L + D_ε

Taking power-series expansions:

Π(L) = Σ_{i=0}^{∞} Π_i L^i    (2.2.12)
Λ(L) = Σ_{i=0}^{∞} Λ_i L^i    (2.2.13)
Γ(L) = Σ_{i=0}^{∞} Γ_i L^i    (2.2.14)

we have:

Π_i = H F^{i-1} G_x    (2.2.15)
Λ_i = H F^{i-1} G_e    (2.2.16)
Γ_i = H F^{i-1} G_ε    (2.2.17)

for i ≥ 1. The final-form matrices, Π_i, Λ_i, Γ_i, i = 1, 2, ..., are the matrix multipliers of the econometric model. Thus Π_0 = D_x, Λ_0 = D_e are the impact multipliers of the policy instruments and exogenous variables respectively. Similarly the
matrices Π_i, Λ_i, Γ_i, i = 1, 2, ..., are the dynamic matrix multipliers at lag i. So each element describes the impulse response of the endogenous variables to a one-off shock to the policy instruments, exogenous variables and random disturbances. On the assumption that F is stable (all roots lie within the unit circle) the steady-state matrix multipliers (step responses) representing the equilibrium effects on the endogenous variables of a sustained unit change in the policy instruments or exogenous variables are given by:

Π^∞ = D_x + H(I - F)^{-1}G_x    (2.2.18)

Λ^∞ = D_e + H(I - F)^{-1}G_e    (2.2.19)
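Using the illustrative coefficients of the scalar example above, these multiplier formulae are easy to verify numerically: the impulse responses should sum to the steady-state multiplier, which for the scalar model equals (b_0 + b_1 + b_2)/(1 - a_1 - a_2).

```python
import numpy as np

# Multipliers of the scalar example via (2.2.15) and (2.2.18): the impact
# multiplier is Pi_0 = D_x, the dynamic multipliers are Pi_i = H F^{i-1} G_x,
# and the steady-state multiplier is Pi_inf = D_x + H (I - F)^{-1} G_x.
# Coefficient values are the same illustrative ones as before.

a1, a2 = 0.5, -0.2
b0, b1, b2 = 1.0, 0.3, 0.1

F = np.array([[a1, 1.0], [a2, 0.0]])
Gx = np.array([a1 * b0 + b1, a2 * b0 + b2])
H = np.array([1.0, 0.0])
Dx = b0

multipliers = [Dx] + [float(H @ np.linalg.matrix_power(F, i - 1) @ Gx)
                      for i in range(1, 200)]
pi_inf = float(Dx + H @ np.linalg.inv(np.eye(2) - F) @ Gx)

# F is stable, so the impulse responses sum to the steady-state multiplier,
# which for the scalar model is (b0 + b1 + b2) / (1 - a1 - a2) = 1.4/0.7 = 2.
print(round(pi_inf, 6))            # 2.0
print(round(sum(multipliers), 6))  # 2.0
```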
2.3 THE FINAL FORM
Another way to represent the linear model, in a form which will prove useful for a number of policy-optimisation problems, is to use the stacked final forms of Theil (1964). Consider the linear reduced-form model (2.1.4). By backward substitution as far as some known initial condition y_0, we can write:

y_t = A^t y_0 + Σ_{j=0}^{t-1} A^j B x_{t-j} + Σ_{j=0}^{t-1} A^j C e_{t-j} + Σ_{j=0}^{t-1} A^j ε_{t-j}    (2.3.1)

Consequently all the policy-impact multipliers in this system and the dynamic multipliers:

∂y_t/∂x_t  and  ∂y_{t+j}/∂x_t  for T - t ≥ j ≥ 1

can be evaluated directly from the coefficient matrices of the reduced form:

∂y_t/∂x_t = B  and  ∂y_{t+j}/∂x_t = A^j B

The same holds for those multipliers of the exogenous (uncontrollable) variables, e_t, and those of the random disturbances. Equation (2.3.1) expresses the model in its 'final' form for given initial values, y_0. For the planning interval t = 1, ..., T as a whole we can now describe the behaviour of the endogenous variables in terms of the realisations of the instruments, the uncontrollable exogenous variables and random errors as follows. If we define the stacked vectors:

Y' = (y_1', ..., y_T')  and  X' = (x_1', ..., x_T')

then the final-form model can be written (Theil, 1964):

Y = RX + s    (2.3.2)
where

R = [ R_1      0    ...   0  ]
    [ R_2     R_1   ...   0  ]
    [ ...                ... ]
    [ R_T  R_{T-1}  ...  R_1 ]

and

s' = (s_1', ..., s_T')  with  s_t = A^t y_0 + Σ_{j=0}^{t-1} A^j (C e_{t-j} + ε_{t-j})
and of course R_t = A^{t-1}B. Thus R contains all the impact and dynamic multipliers relevant to the period of study, while s collects together all the external influences on the system. X, and through it Y, is subject to the policy-maker's choice. The vector s is not subject to his choice and so the value of X which is selected must be conditional on some assumed values for s. Once these are fixed the policy-maker is free to select the whole of X in one operation, or alternatively to select each component x_t separately. A difficulty which arises is that, since s is stochastic, it will be unknown to the policy-maker in advance. In this case it does matter whether X is actually calculated as a whole, or whether each component x_t is calculated sequentially and therefore revised at each period. If there are sequential revisions then it will be possible to exploit any information which becomes available during the period in which policy is being implemented on the actual realisations s_{t-j}, j ≥ 1, and/or on revisions to the assumed values s_{t+j}, j ≥ 0. The relationship between the stacked final forms (2.3.2) and the first-order difference equation (2.1.4) can be seen more clearly if we take the y_{t-1} term to the left-hand side and then stack the difference equations as:

[  I    0  ...  0 ] [ y_1 ]   [ B  0 ... 0 ] [ x_1 ]   [ C e_1 + ε_1 + A y_0 ]
[ -A    I  ...  0 ] [ y_2 ] = [ 0  B ... 0 ] [ x_2 ] + [ C e_2 + ε_2        ]    (2.3.3)
[       ...       ] [ ... ]   [     ...    ] [ ... ]   [        ...         ]
[  0  ... -A    I ] [ y_T ]   [ 0  0 ... B ] [ x_T ]   [ C e_T + ε_T        ]
2.4 Controllability
21
and then premultiplying by: / -A
0 I -A
0
I
-l
A
/
2
A
A -A
I
A7 - 1
AT-2
gives us equation (2.3.2). It will prove useful analytically to be able to solve for the endogenous variables in this manner. Computationally, however, there is no advantage. 2.4
CONTROLLABILITY
Before we consider the relative merits of different policy-design techniques, it is important to check that sufficient intervention instruments actually exist. Given a model of an economy, in any one of the forms examined in previous sections, the fundamental question is whether, given the range of policy instruments at the disposal of the policy-maker, it is possible to reach any desired policy objectives. If this is possible, the model is said to be controllable. Thus controllability is an essential property guaranteeing the existence of a sequence of decisions which can achieve arbitrary pre-assigned values for the target variables within some finite planning interval (Preston, 1974; Preston and Sieper, 1977). And, as we shall see later in this section, and in subsequent chapters, the concept lies at the heart of most discussions of economic policy. Indeed many of the conflicts which can be observed in discussions of economic policy arise because of differing views of the degree of controllability of the economy, the effectiveness of particular instruments, and the desire to invent new instruments to improve the controllability of the economy, as well as differing views of the short-term and long-term effects of policy. Controllability, however, only establishes the necessary conditions for the existence of policy. Sufficient conditions involve much broader considerations than just the technical properties of a given model. There may, for example, be severe institutional and political constraints (apart from technical constraints such as non-negative interest rates) on the range over which policy instruments can vary, as well as on their rates of change. Moreover, there is a class of macroeconomic models which, under certain assumptions about the formation of expectations by private agents, is not always controllable. This class of model is examined later, in chapter 7. 2.4.1
Static controllability
Suppose there are m targets yt (a subset of all the endogenous variables in yt), and n instruments xt9 in each decision period t = 1 , . . . , T. A naive strategy is
22
2 THEORY OF ECONOMIC POLICY AND THE LINEAR MODEL
to pick xt to reach the desired yt values period by period, each time taking all lagged variables and expected values of other exogenous variables as temporarily fixed. To illustrate this, suppose the targets and instruments are linked through the model (2.1.4), so that: yt = Ao M I
AjK-j + Bo*t + s + X Bjxt-j) + ut
(2.4.1)
where ut contains all the non-controllable (random) variables. Let yf, for f = l , . . . , r , be the ideal target values. Then, according to this myopic strategy, the interventions required are: xf = D~l{~/t - at) for t = 1 , . . . , T
(2.4.2)
where D contains just the rows corresponding to the target variables from A^BQ, and where dt contains the same rows from:
The vector at will be a known quantity at the start of period t if Et(ut) represents the expected value of the non-controllable elements. Obviously xf only exists if n = m and D is non-singular. (If n > m, multiple solutions exist; but (2.4.2) can still be used by transferring n — m surplus instruments into s.) Under these conditions yf will be achieved exactly in every period, provided the realised values of ut actually coincide with their expected values. The model is then said to be statically controllable. This proposition is due to Tinbergen (1952). It is often convenient to distinguish the simple necessary condition, that there should be at least as many instruments as targets (n > m), from the more complicated necessary and sufficient condition that those instruments should also be linearly independent (D non-singular). There is nothing in this framework to ensure that the interventions xr* are 'realistic', or politically and administratively feasible; the static-controllability conditions do no more than guarantee the existence of a policy package to achieve arbitrary target values. There are a number of other difficulties: (i) There is no indication of how to pick x* if n < m. (ii) It is obviously myopic because the consequences of xf will determine, in part, the values which will be necessary for x f * +1 ,.. .,x£. Policymakers are seldom indifferent to the size of the interventions, and a strategy which takes account of the implications of xf for the values needed for x* + 1 ,.. .,x% will almost certainly reduce the size of the interventions required. (iii) On what grounds is the use of Et(ut) justified? Given that we can deal with no more than expected target values here, a risk-averse strategy to reduce potential variability in the actual yt values might be preferable.
2.4 Controllability 2.4.2
23
Dynamic {state) controllability
It is possible to avoid some of these difficulties if we try to reach the ideal target values at some pre-assigned date in the not-too-distant future rather than in every decision period. The targets are allowed to take whatever values they need along the way. In the control-theory literature, controllability describes the general property of being able to transfer a dynamic system from any given state to any other by means of a suitable choice of values for the policy instruments. Ignoring the exogenous variables and the random terms in equation (2.2.1), the difference equation: z^Fz^.+G^.,
(2.4.3)
can by backward substitution be written solely in terms of the initial state and past policy-instrument values: zt = Fz0 + PtX0
(2.4.4)
where Xo is a m-dimensional stacked vector of past policy instruments X'o = (x' 0 ,xi,.. .,x,'_i)
(2.4.5)
and where Pt = (G'-1Gx9F'-2Gx9...9FGx9Gx)
(2.4.6)
is of dimension / x tn. Thus to transfer any given state to some alternative state, z°, requires the solution of the equation: z*-Ftzo-PtXo = 0
(2.4.7)
Thus the dynamic system (2.4.3) is said to be completely state-controllable if the matrix Pt in (2.4.7) is of rank / , denoted r(P) = f.
2.4.3 Dynamic (target) controllability
The property of state controllability is very important, but, as we have seen in section (2.2), a state vector in the minimal-form realisation is some combination of economic variables. This is not always easily interpreted and it is not always possible to relate the necessary conditions for controllability directly to the characteristics of an economic model. A more tractable formulation can be derived if we consider the ancillary problem of transferring the value of an endogenous variable from any initial point to any other point. For the case of no exogenous variables or random disturbances the observation equation is: yt = Hzt + Dxxt
(2.4.8)
Substituting (2.4.4) into (2.4.8):

yt = HF^t z0 + HPt X0 + Dx xt
(2.4.9)
the system is said to be completely target-controllable if the matrix:

P*t = (HF^(t−1)Gx, ..., HGx, Dx)
(2.4.10)
is of rank g (the number of endogenous variables), where t ≥ g. But r(P*t) cannot exceed g, so (2.4.10) reduces to the requirement r(P*t) = g. This condition for controllability can be more readily understood, since each of the terms in (2.4.10) can be seen to be constructed from the impact and dynamic multipliers of the econometric model (recall section 2.3). Indeed this controllability condition for the non-minimal realisation advocated by Chow (1975) makes explicit exactly this point. By backward substitution, (2.1.4) can be written as:

yt = A^t y0 + Pt X0
(2.4.11)
Pt = (A^(t−1)B, A^(t−2)B, ..., AB, B)
(2.4.12)
where now Pt is given by (2.4.12) and (without loss of generality) all uncontrollables have been set to zero again. The general economic system represented by (2.1.4) is therefore dynamically controllable if and only if the matrix Pt has rank g, where t ≥ g (for a direct proof, see Turnovsky, 1977, p. 333). Hence there must be as many independent policy choices which can be made between now and t as there are targets in yt. If we have just one instrument, we could in principle assign it to a different target in each period and give that target just the impulse required for the dynamics of the system to carry it along to its ideal value at t (t ≥ g). If we had more instruments, we could reduce the lead time needed to get all targets to their ideal states, because multiple assignments are possible in each period. Ultimately, if n ≥ g, it could all be done in a single period. There is therefore a trade-off between using more instruments to reach the desired target values earlier, and allowing a longer lead time to compensate for a lack of, or a smaller number of, independent instrument choices. The lead time (or degree of policy anticipation) required to reach ydt from an arbitrary state t periods earlier, y0, is of course t − 1 periods. On the other hand, there is no point in considering t > g, since the final policy impulse (A^g B) will, by the Cayley-Hamilton theorem, be a combination of the earlier impulses (A^i B, for 0 ≤ i ≤ g − 1), so no independent instrument choices are added by extending the lead time (Preston and Pagan, 1982). Thus the necessary and sufficient
condition for dynamic controllability over t periods is r(A^(t−1)B, ..., AB, B) = g. The dimensions of this matrix show that nt ≥ g is necessary for dynamic controllability, which implies that the choice of lead time is bounded by (g − n)/n ≤ t ≤ g. The same argument applies to controlling just the m target variables (premultiplying by the selection matrix S and replacing g by m), since r(Rl) = r(SA^(l−1)B) ≤ min[r(S), r(A^(l−1)B)] = m. The obvious necessary condition here is nt ≥ m, which implies that the choice of t will now be bounded by (m − n)/n ≤ t ≤ m.
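The Cayley-Hamilton point — that nothing is gained by extending the lead time beyond g — can be illustrated numerically (the model matrices below are assumptions for illustration):

```python
import numpy as np

# Illustrative 2-variable, 1-instrument reduced form
A = np.array([[0.8, 0.1],
              [0.2, 0.5]])
B = np.array([[1.0],
              [0.0]])
g = A.shape[0]

def dyn_ctrl_matrix(A, B, t):
    """Stack (B, AB, ..., A^(t-1)B), whose rank governs controllability."""
    return np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(t)])

# By Cayley-Hamilton, the rank stops growing once t reaches g
r_g = np.linalg.matrix_rank(dyn_ctrl_matrix(A, B, g))
r_more = np.linalg.matrix_rank(dyn_ctrl_matrix(A, B, g + 3))
assert r_g == r_more == g
```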
2.4.4 Stochastic controllability
One limitation of the controllability analysis so far is that it is restricted to the control of the expected values of the joint distribution of the target variables. We can extend dynamic controllability to cover the distribution of the targets, as opposed to their deterministic expectations Et(yt), only in the sense that the variance-covariance matrix V(yt) will remain bounded above and below under certain conditions. In terms of (2.4.1), these conditions are satisfied if V(ut) is finite when r(∂yt/∂xt) = m and r(R1, ..., Rt) = m. Such a system is said to be stochastically controllable. Thus (2.4.1) is stochastically controllable if and only if it is dynamically controllable (see Aoki, 1976, for this extension).
2.4.5 Path controllability
Two potential difficulties with dynamic controllability in practice are: (i) Unless ydt happens to be on an equilibrium path, the targets are only sure to pass through the pre-assigned values; they will not necessarily stay there or follow any specified trajectory thereafter.
(ii) It will only be possible to steer the expected target values to their ideal values; nothing has been said about holding the targets to a pre-assigned path in the face of unanticipated shocks. The first point can be dealt with by path controllability, which is a condition ensuring the existence of a policy sequence which drives the (expected) targets along some pre-assigned ideal path ydt. This reflects the fact that the purpose of policy-making - and more particularly stabilisation - is generally to control the anticipated path of the economy according to some preconceived notion of the ideal trajectory. This cannot be set by one set of ideal values, ydt, for one particular date. Except perhaps for the politician who is desperate for re-election at any price, there is no obvious terminal date by which time the economy should be in a given state and after which events are of secondary importance and may be ignored for the present. Path controllability, or the existence of an intervention strategy which will move the endogenous variables (in expectation) along a pre-assigned path for a pre-assigned period (t ≥ g), is assured if and only if the block matrix:

   [ B   AB   ...   A^(2g−2)B ]
   [ 0   B    ...   A^(2g−3)B ]
   [ .   .          .         ]        (2.4.13)
   [ 0   0    ...   A^(g−1)B  ]

has rank at least g^2 (see Preston and Pagan, 1982). However, we are hardly very interested in controlling all the endogenous variables at once. Once again this condition simplifies for the path controllability of just the target variables, after premultiplying the system by the selection matrix S in order to pick out the targets. In this case path controllability implies that:

   [ SB   SAB   ...   SA^(2m−2)B ]
   [ 0    SB    ...   SA^(2m−3)B ]
   [ .    .           .          ]     (2.4.14)
   [ 0    0     ...   SA^(m−1)B  ]

must have rank at least m^2 (where t ≥ m). In large systems, the path-controllability condition - especially that for the complete system - rapidly becomes unwieldy and difficult to compute in practice. Fortunately Aoki and Canzoneri (1979) have provided a set of sufficient conditions which are easier to verify. These conditions, applied to the complete system, are formulated in terms of the g × gr matrix E = (Ig : 0); A from (2.1.4); and D' = (D1', ..., Dr') where Dt = AtB0 + Bt. Then (2.1.4) is path-controllable if one of the following is true: (a) The first non-zero matrix in the sequence B0, EAD, ..., EA^(g−1)D has rank g. (b) The matrix

   [ A   D ]
   [ E   0 ]

has rank 2g.
(c) A is non-singular and (B0 − EA^(−1)D) has rank g. Conditions (d) and (e) involve further rank comparisons between B0 and a partitioned form of D under a suitable permutation matrix; see Aoki and Canzoneri (1979) for the details. Note that the only function of E is as a selection matrix to pick out the leading g rows from A or D. These sufficient conditions may obviously be reformulated for the path controllability of just the target variables by defining E to annihilate the non-target rows of Yt, A and D, in addition to their lower g(r − 1) rows which correspond to the lags in Yt of (2.1.4). That is done by setting the unit elements in non-target rows of Ig to zero and then replacing g by m throughout the rank conditions (a) to (e) above. Thus in performing these calculations E will actually be replaced by the appropriate target-row selection matrix SE, B0 will be replaced by SB0, and the rank conditions will involve the number of target variables m. To these simple sufficient conditions, we should add one simple necessary condition derived directly from (2.4.14); namely, n ≥ m, or static controllability. Together these various conditions provide a practical (if approximate) way of testing for path controllability. The path-controllability conditions have been presented here directly in terms of the parameters of the econometric model. To do this we have used the discrete-time version of the Aoki-Canzoneri analysis due to Wohltmann and Kromer (1984), and have further translated their conditions from a state-space model to the parameters of the underlying econometric model using the relationships between the two systems set out in Hughes Hallett and Rees (1983, p. 178). This has allowed the conditions to be condensed on just the target variables of a large econometric system.
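As a rough computational sketch of the brute-force rank test, the block-Toeplitz matrix of (2.4.13) can be built directly (the system matrices below are illustrative assumptions; with two instruments the system is statically controllable, so the condition holds):

```python
import numpy as np

def path_ctrl_matrix(A, B, g):
    """Block-Toeplitz matrix whose rank (at least g*g) gives path
    controllability, as in (2.4.13): block row i holds
    (0, ..., 0, B, AB, ..., A^(2g-2-i)B)."""
    n = B.shape[1]
    rows = []
    for i in range(g):
        blocks = []
        for j in range(2 * g - 1):
            if j < i:
                blocks.append(np.zeros((g, n)))
            else:
                blocks.append(np.linalg.matrix_power(A, j - i) @ B)
        rows.append(np.hstack(blocks))
    return np.vstack(rows)

# Illustrative system: g = 2 endogenous variables, n = 2 instruments
A = np.array([[0.0, 1.0],
              [-0.5, 1.0]])
B = np.eye(2)
g = 2
M = path_ctrl_matrix(A, B, g)
assert np.linalg.matrix_rank(M) >= g * g
```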
Further discussion of the conditions for path controllability, including corrections to several of the early statements of those conditions, will be found in Preston and Pagan (1982), Wohltmann (1984), Wohltmann and Kromer (1984) and Buiter and Gersovitz (1981, 1984). Finally, path-controllability conditions may also be expressed in terms of the state-space model parameters of (2.2.1) and (2.2.2). The matrix in (2.4.13) becomes:

   [ D   HG   ...   HF^(2f−2)G ]
   [ 0   D    ...   HF^(2f−3)G ]        (2.4.15)
   [ .   .          .          ]

with rank at least (f + 1)g. Similarly the sufficient conditions (a) to (e) are as stated above, but with D of (2.2.2) for B0, H for E, F for A, and G for D. Then f replaces g throughout the rank conditions, except in (b), where 2g becomes f + g.
2.4.6 Local path controllability

A less stringent requirement would be local path controllability. In this case we seek a policy sequence which drives the (expected) targets along a pre-assigned path from ydt to ydt+T over the interval (t, t + T) inclusive, but not necessarily beyond. Given an arbitrary initial state y0, the necessary condition for local path controllability is (Preston and Pagan, 1982):

m(T + 1) ≤ n(t + T + 1)   (2.4.16)
The minimum lead time here is t ≥ (m − n)(T + 1)/n, and this reduces to zero only under static controllability. As before, the maximum lead time is m periods. Similarly the period of local controllability is bounded by:

T + 1 ≤ mn/(m − n)   if m > n   (2.4.17)
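These lead-time and control-period bounds can be checked with simple arithmetic (the counts m, n and T below are illustrative assumptions):

```python
# Illustrative counts: m = 3 targets, n = 2 instruments,
# control period of T + 1 = 3 periods
m, n, T = 3, 2, 2

t_min = (m - n) * (T + 1) / n        # minimum lead time, from (2.4.16)
assert t_min == 1.5                  # i.e. at least 2 whole periods of lead

max_period = m * n / (m - n)         # longest controllable period when m > n
assert max_period == 6.0             # so T + 1 can be at most 6 here
```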
Hence the controlled period can be no more than mn/(m − n) if m > n, but it may be unbounded if m = n (or m < n). One could therefore regard dynamic controllability as a special case of path controllability in which the control period reduces to a point (T = 0); dynamic controllability is therefore also necessary for path controllability.

2.5 AN ECONOMIC EXAMPLE
Some of the concepts of the previous sections can be reinforced with an economic example. Black (1975) has developed a simple continuous-time version of the quantity theory with a Phillips curve in a closed economy. The model is written in logarithms. The log of the ratio of actual real output to its normal level (y) changes in response to deviations of actual (ms) from desired money balances (md):

y' = a(ms − md),   a > 0   (2.5.1)
Normal output grows at a constant, exogenously determined, rate h. The demand for money (md) is a function of real incomes, prices and the expected rate of inflation (pe):

md = k + ht + y + p − c·pe,   c > 0   (2.5.2)
The actual rate of inflation (p) is described by the expectations-augmented Phillips curve:

p = pe + vy,   v > 0   (2.5.3)
Expectations of the rate of inflation are determined adaptively:

pe' = b(p − pe),   b > 0   (2.5.4)
By substitution and differentiation we finally obtain the third-order differential equation for y (i.e. deviations of output from normal):

y''' + ay'' + av(1 − bc)y' + abvy = am'   (2.5.5)
Although this is a differential equation, conversion into state-space form is precisely analogous to that for difference equations. Given the nth-order differential equation:

y^(n) + α1y^(n−1) + ... + αny = β0x^(n) + β1x^(n−1) + ... + βnx   (2.5.6)

(where y^(n) denotes the nth derivative of y), it can be written in first-order matrix form as:

z' = Az + bx   (2.5.7)

y = (1  0  ...  0)z   (2.5.8)
where, since β0 = 0,

    [ −a            1   0 ]        [ a ]
A = [ −av(1 − bc)   0   1 ]    b = [ 0 ]
    [ −abv          0   0 ]        [ 0 ]
and each state is defined as:

z1 = y
z2 = y' + ay − am
z3 = y'' + ay' + av(1 − bc)y − am'
The stability properties (in the absence of feedback) of the system (2.5.7) can be determined from the state-transition matrix A. The system (2.5.7) is asymptotically stable if and only if A is a stability matrix, i.e. all the characteristic roots λk of A have negative real parts. The characteristic roots of A are the roots of the characteristic polynomial (Gantmacher, 1959):

det(λI − A) = λ³ + aλ² + av(1 − bc)λ + abv   (2.5.9)
The Routh-Hurwitz criterion can be employed to test for the stability of A using the coefficients of the characteristic polynomial, without evaluating the roots explicitly. The criterion can be expressed with reference to determinants formed from the coefficients of (2.5.9):

V1 = a

V2 = | a   abv        |  = a²v(1 − bc) − abv
     | 1   av(1 − bc) |

V3 = | a   abv          0   |
     | 1   av(1 − bc)   0   |  = abv[a²v(1 − bc) − abv]
     | 0   a            abv |
The roots of the characteristic polynomial all have negative real parts if and only if the determinants V1, V2, V3 are all greater than zero. Since a, b, c and v are all positive, the necessary and sufficient conditions reduce to:

a²v(1 − bc) > abv
bc < 1

The stability of the model is thus dependent upon whatever values the parameters that appear in the differential equation take. Stability is more likely if real expenditure responds quickly to excess money holdings and if expectations about the rate of inflation respond slowly to the actual rate of inflation. Stability is also more likely if c, the effect of expectations of inflation in decreasing the demand for money, has a low value. For the differential-equation system (2.5.7), the controllability criterion is that the g × gn controllability matrix has rank g, where g is the order of the state vector and n is the number of policy instruments. In our case, with g = 3 and n = 1, the controllability matrix is:

                   [ a   −a²            a³ − a²v(1 − bc)   ]
P = (b, Ab, A²b) = [ 0   −a²v(1 − bc)   a³v(1 − bc) − a²bv ]
                   [ 0   −a²bv          a³bv               ]

Searching for linear dependency among the rows and columns of P, it is clear that only the second and third columns are possible candidates. The determinant of P is −a⁵(bv)², which is non-zero whenever a, b and v are non-zero. So, given a plausible range of values for the coefficients a, b and v, Black's (1975) model is dynamically controllable.
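Both the Routh-Hurwitz conditions and the rank condition can be checked numerically. The parameter values below are illustrative assumptions, and the companion-form matrices are a reconstruction consistent with the characteristic polynomial (2.5.9):

```python
import numpy as np

# Illustrative (assumed) parameter values for Black's model
a, b, c, v = 0.5, 0.2, 1.0, 0.3

# Companion form consistent with (2.5.9); b vector assumed for illustration
A = np.array([[-a,                   1.0, 0.0],
              [-a * v * (1 - b * c), 0.0, 1.0],
              [-a * b * v,           0.0, 0.0]])
bvec = np.array([[a], [0.0], [0.0]])

# Routh-Hurwitz: all roots in the left half-plane iff
# a > 0, abv > 0 and a^2 v(1 - bc) > abv
stable = (a > 0) and (a * b * v > 0) and (a**2 * v * (1 - b * c) > a * b * v)
roots = np.linalg.eigvals(A)
assert stable == all(r.real < 0 for r in roots)

# Dynamic controllability: (b, Ab, A^2 b) must have rank 3
P = np.hstack([bvec, A @ bvec, A @ A @ bvec])
assert np.linalg.matrix_rank(P) == 3
```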
3 Optimal-policy design

3.1 CLASSICAL CONTROL
A natural starting-point for any examination of the design of policy is the classical theory of control. Classical control theory was initially concerned with dynamic systems with one input (control, policy instrument) and one output (target variable), though we shall consider some more recent extensions to the multivariate case below.

3.1.1 Transfer functions
Consider the scalar difference equation:

yt + a1yt−1 + a2yt−2 = b1xt−1 + b2xt−2   (3.1.1)
which is identical to (2.2.5) but with b0 = 0, so that the system is strictly proper. In polynomial form we have the open-loop system: a(L)yt = b(L)xt
(3.1.2)
with a final or transfer-function form as in section 2.1: yt = [b(L)/a(L)]xt = n(L)xt
(3.1.3)
The zeros (roots, or eigenvalues) of the characteristic polynomial a(L) are referred to as the poles of the transfer function n(L), and the zeros of b(L) are the zeros of n(L). If xt is set to zero, and there are known initial conditions, the solution for yt, the complementary function of (3.1.1), tends to a steady state as t tends to infinity only if all the poles of n(L), denoted λ1, λ2, ..., λn, lie outside the unit circle. Suppose we were to add to (3.1.2) a feedback rule, so that the modified policy instrument x̃t is now a function of yt:

g(L)x̃t = −h(L)yt   (3.1.4)

or, as a transfer function:

x̃t = −[h(L)/g(L)]yt   (3.1.5)

The difference between what was previously the value of the policy instrument and what is now fed back in response to the values of the target is:

ηt = xt − x̃t   (3.1.6)

so that the behaviour of yt is now described by:

yt = n(L)x̃t + n(L)ηt   (3.1.7)

Eliminating x̃t we have:

yt = nc(L)ηt   (3.1.8)

where nc is called the feedback (closed-loop) transfer function:

nc(L) = n(L)/[1 + n(L)h(L)/g(L)]   (3.1.9)
compared with the open-loop transfer function in (3.1.3). Since the dynamic behaviour of yt in (3.1.8) is completely determined by the poles and zeros of nc(L), a number of methods have been devised to determine the form of (3.1.4). Some of these are particularly useful when the parameters of (3.1.1) are not well determined. The Nyquist criterion (Dorf, 1970), for example, can be used to establish over what range of parameter values the closed-loop system is stable. Suppose, furthermore, that the open-loop transfer function n(L) is multiplied by a parameter k (referred to as the gain), so that the closed-loop transfer function becomes:

nc(L) = kn(L)/[1 + kn(L)h(L)]   (3.1.10)
(3.1.10)
The root-locus method is concerned with establishing how the poles of nc(L) vary as the gain k is varied from zero to infinity. The various approaches to feedback in the classical scalar case are covered extensively in Truxal (1972) and Dorf (1970). An application in economics which emphasises the insights classical feedback theory gives into questions of stability and control is Livesey (1979). Supporters of the development and use of classical control methods have always had reservations about modern control theory because of the sometimes highly conditional nature of optimality and the very narrow class of problems for which exact solutions can be obtained (MacFarlane, 1972). The optimal can sometimes be the enemy of the good. Robustness and stability are often more important criteria than optimality, especially when the system being controlled is poorly understood. Salmon and Young (1979) have pointed out the disadvantages of strict optimality for misspecified systems. More recently Maciejowski and Vines (1982) have explored the applicability of frequency-domain techniques to the design of decoupled controls for the assignment policies of Meade (1982) (see also Vines, Maciejowski and Meade, 1983).
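The gain-versus-poles idea can be illustrated with a small scalar sketch (the difference-equation coefficients and the proportional feedback rule are assumptions for illustration; note that the code works with the roots of the characteristic polynomial in λ = 1/L, which must lie inside the unit circle for stability — equivalently, the poles in L lie outside it, as in the text):

```python
import numpy as np

# Open-loop law of motion (illustrative):
#   y_t = 1.2 y_{t-1} - 0.2 y_{t-2} + 0.5 x_{t-1}
def closed_loop_poles(k):
    # proportional feedback x_t = -k y_t shifts the first AR coefficient
    phi1 = 1.2 - 0.5 * k
    phi2 = -0.2
    return np.roots([1.0, -phi1, -phi2])   # roots in lambda = 1/L

open_poles = closed_loop_poles(0.0)
assert max(abs(open_poles)) > 0.99         # a unit root: not asymptotically stable

fb_poles = closed_loop_poles(0.8)          # gain k = 0.8
assert max(abs(fb_poles)) < 1.0            # feedback moves the poles: stable
```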
3.1.2 Multivariate feedback control
The extension of classical feedback theory to the multivariate case is associated mainly with Rosenbrock (1970), and there are interesting connections with the econometric and state-space representations discussed in sections 2.1 and 2.2. Consider the special case of (2.2.1) and (2.2.2), a strictly proper form, ignoring the noise term and the exogenous variables:

zt+1 = Fzt + Gx xt
(3.1.11)
yt = Hzt
(3.1.12)
where for convenience the state equation is led by one discrete interval. Suppose we apply linear state feedback, so that each of our control instruments is a linear combination of state variables such that: xt = Kzt
(3.1.13)
where K is a constant n × m feedback matrix. In fact, feedback is usually written in terms of outputs (endogenous variables in our usage) rather than states but, since the mapping between states and outputs in the observation equation (3.1.12) is straightforward, it is easy to move from one to the other. Substituting (3.1.13) into (3.1.11), we have the modified state equation: zt+1 = (F + GxK)zt
(3.1.14)
For a given xt, t = 1, ..., T, and an initial state z0, the state-transition matrix F in (3.1.11) completely determines the future dynamic evolution of the state vector. But in (3.1.14) it is the new systems matrix (F + GxK) with feedback which determines the future behaviour of the state. Feedback can thus be seen to change dynamic behaviour fundamentally. Generally, as in the scalar case, equation (3.1.14) is described as being under closed-loop control, while the system (3.1.11) is described as open-loop. Thus under certain conditions K can be chosen so as to tailor the dynamic response of the system to whatever is required. Feedback can be described in a more classical manner in terms of the outputs if we obtain the final-form or transfer-function representation of the state-space model (3.1.11) and (3.1.12):

yt = H(L^(−1)I − F)^(−1)Gx xt = n(L)xt   (3.1.15)

where now L^(−1) acts as a forward-shift operator, L^(−1)xt = xt+1. We can write:

(L^(−1)I − F)^(−1) = adj(L^(−1)I − F)/det(L^(−1)I − F)   (3.1.16)
The elements of the multivariate transfer function in (3.1.15) are rational functions with a common denominator det(L^(−1)I − F), which is also the characteristic polynomial of the state equation (3.1.11). If we apply the feedback
rule or law (3.1.13), then the characteristic equation of the system under closed-loop control is:

det(L^(−1)I − F − GxK) = 0
(3.1.17)
Thus, as with the scalar case, the purpose of the feedback rule is to change the poles and the zeros of the open-loop multivariate transfer function. The degree to which the poles of the transfer function (the eigenvalues of the state characteristic equation) can be moved to more desirable locations in the complex plane is directly related to the controllability criterion given in section 2.4. Assuming, as will always be the case in econometric models, that F and Gx in (3.1.11) are real, and letting Λn = (μ1, μ2, ..., μn) be an arbitrary set of n complex numbers such that any which are not purely real occur in conjugate pairs, then if the system (3.1.11) is completely controllable there exists a real feedback matrix K such that the roots of the matrix (F + GxK) are the set Λn. This is known as the pole-assignment theorem (see section 3.1.3 below). As a special case of this property, the model (3.1.11) is said to be stabilisable, because K can, of course, be chosen such that (F + GxK) is stable, i.e. p(F + GxK) < 1, where p(.) denotes the spectral radius. Stabilisability, perhaps, should be regarded as the extension of the concept of controllability to the stochastic case, since it implies that the system can be controlled in such a way that it will tend to return to the desired yt if any unexpected shock should perturb the system between now and period t. In fact the condition that the eigenvalues of the system under feedback can be assigned to any values implies something stronger than just stabilisability. It implies that we can return the system to the desired path at a predetermined, arbitrarily fast, rate by the appropriate choice of eigenvalues, but not according to an arbitrary path; e.g. it may not be possible to prevent rapid oscillations or uncomfortable phase relations between certain variables. It is also possible to extend this stabilisability concept to models with parameter uncertainty - that is, uncertainty about the 'true' values of a1 and b1 in (3.1.1).
This is known as stochastic stability (Turnovsky, 1977).

3.1.3 Stabilisability
The distinction between dynamic and path controllability is important for the general problem of policy design and not only because there is no obvious terminal date or final state which we should aim at. Consider the problem of stabilising an exchange rate by intervention from a stock of foreign-currency reserves. There are two targets: the exchange rate itself, and the level of reserves (which cannot be endlessly run down, or accumulated beyond the available financing either by directly borrowing abroad or by borrowing from foreign governments and international agencies). Dynamic controllability may be satisfied, but path controllability can never hold, because the targets
are not statically controllable and can only be held on track for two periods. This means that the exchange rate will always tend to drift away after achieving its pre-assigned ideal level, and will therefore need repeated intervention just to bring it back to the 'stabilised' path. This stabilisation exercise will be much more expensive than if a second instrument could be introduced to ensure path controllability. There are a variety of other intervention instruments among the usual monetary and fiscal policy measures which might be used. Path controllability may nevertheless seem irrelevant because it continues to treat the market as deterministic. There is never anything we can do to prevent current shocks pushing the actual targets away from their ideal values. But we can attempt to damp out the consequences of those shocks as fast as possible thereafter, in order to maximise the speed at which future target realisations return to their pre-assigned path. If the best we can do is to guarantee that targets hit their ideal paths in expectation, then we should aim also for guaranteed stability about those paths. Then, whatever shocks strike the system, the economy will tend to return to that ideal with fixed policies (i.e. those computed to steer towards the expected target values). This can always be done if the system is dynamically controllable. Hence a dynamically controllable system is always stabilisable. It may help to think in terms of a servo mechanism, or feedback-control rule, where instruments are set in response to past successes or failures in the endogenous variables. We have already used this idea with the model in state-space form: the control rule (3.1.13) was applied to (3.1.11) and resulted in (3.1.14). We can do the same for the conventional reduced-form econometric model (2.1.4). Consider applying the feedback-control rule: xt = Kt yt−1 + kt
(3.1.18)
where Kt and kt are to be chosen. The controlled system will then behave as:

yt = (A + BKt)yt−1 + Bkt + ut   (3.1.19)
The intervention has therefore altered the growth-path characteristics from those defined by A to those defined by (A + BKt). If the system is dynamically controllable we can choose Kt to achieve any growth characteristics - including any suitable degree of stability - that we like. The question of the 'best' choice of Kt is a matter of picking optimal policies, and hence a matter for optimal-control techniques. In picking Kt, what freedom do we have to impose the characteristics we wish on the economy? The system's dynamic structure cannot be arbitrarily changed or annihilated by choice of Kt, because constructing an arbitrary matrix (A + BKt) in (3.1.19) requires sufficient instruments (i.e. n ≥ g and B non-singular). (But if we are concerned with only the m targets in yt, premultiplying (3.1.19) by the selection matrix S implies we would actually
need n ≥ m and static controllability again.) However, given dynamic controllability, we can pick Kt so that (A + BKt) has an arbitrary set of eigenvalues. This follows from the pole-assignment theorem (Wonham, 1974): for any real matrices C (of order g × g) and D (of order g × n), and any set of arbitrary complex numbers μj, j = 1, ..., g, closed under conjugation, a real matrix F exists such that (C + DF) has eigenvalues μj, j = 1, ..., g, provided r(D, CD, ..., C^(g−1)D) = g, where r(.) denotes rank. In this case we can impose any desired cyclical and growth characteristics on the model, but not an arbitrary dynamic structure. This result is independent of the values ydt, and of the numbers of targets and instruments. In particular we can choose Kt such that the spectral radius p(A + BKt) < 1 for all t. That means we can always impose stability on the economy.
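For a single instrument, the pole-assignment theorem can be made concrete with Ackermann's formula (the system matrices below are illustrative assumptions; the sign convention used is u = −kz, so the closed-loop matrix is A − bk):

```python
import numpy as np

def ackermann(A, b, desired_poles):
    """Single-input pole placement (Ackermann's formula): returns the row
    vector k such that the eigenvalues of A - b k are the desired poles."""
    g = A.shape[0]
    P = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(g)])
    coeffs = np.poly(desired_poles)            # desired characteristic poly
    phiA = sum(c * np.linalg.matrix_power(A, g - i)
               for i, c in enumerate(coeffs))  # poly evaluated at A
    e_last = np.zeros((1, g))
    e_last[0, -1] = 1.0
    return e_last @ np.linalg.inv(P) @ phiA

A = np.array([[1.1, 0.3],
              [0.4, 0.9]])       # unstable open loop (illustrative)
b = np.array([[1.0], [0.0]])

k = ackermann(A, b, [0.5, 0.4])  # move the poles inside the unit circle
closed = A - b @ k
assert np.allclose(np.sort(np.linalg.eigvals(closed).real), [0.4, 0.5])
```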
3.1.4 The distinction between stabilisation and control
It is perhaps useful to emphasise the difference between stabilisability and the imposition of an arbitrary dynamic structure. The former implies only that the eigenvalues of A + BKt can be selected at will, whereas the latter implies that every element in that matrix may be picked as well. Now the canonical form of that matrix is:

(A + BKt) = WMW^(−1) = Σj μj wj w^j   (3.1.20)

where W is the matrix of eigenvectors and M the diagonal matrix of eigenvalues μj. We have written the jth column of W as wj, and the jth row of W^(−1) as w^j. The amplitude, cycle length and phase relations (i.e. the leads or lags) between the component cycles of (3.1.20) are all determined by the real and complex parts of μj (Theil and Boot, 1962). But the dynamic characteristics of the realisations of each yt element are dictated by how those component cycles combine; and the way in which the cycles combine is determined by the real and complex parts of the elements of W. As we have just seen, stabilisability allows us the pick of the component cycles via M. Consequently these component cycles combine to affect any given target in a way which depends on the elements of W as well as M. Hence, through stabilisability, we can arbitrarily control the amplitude, length and stability of economic cycles. But without control over W as well we cannot complete the choice of dynamics by determining how the component cycles combine within any particular target. That means we cannot force targets to hit pre-arranged values in each period; but we can ensure that they necessarily tend to such values over a longer horizon. Moreover, it is not possible to extend the pole-assignment theorem, and hence stabilisability, to a 'decoupled'-policy regime in an arbitrary economy. A 'decoupled'-policy regime is one in which one policy instrument is assigned exclusively to the achievement of each target (and vice versa), so that K
becomes a diagonal matrix. We have used feedback-policy rules throughout this section since the optimal interventions can usually be written in this form (see below). Stabilisability is therefore a stochastic extension of dynamic controllability. But, because the complete dynamic structure cannot be selected at will, the stabilisation of a stochastic system is effectively the same thing as steering it. For example, we might pick policies (and Kt) by specifying suitably smooth, desired target and instrument values, and getting as close to them as possible. This is the standard control-theory approach. But it fails to guarantee that the consequences of a stochastic shock are damped out as rapidly as possible, or that the variance of each target is reduced, or that the way that cycles combine in the controlled system is necessarily satisfactory. For example, altering the cycles in the system's dynamics may lead to shocks setting up cycles with low amplitude but high frequency when the targets were intended to follow a smooth trajectory. If the short cycles become relatively more important for some targets as a result of the control rule, then the variance may be increased. There are well-known examples in Baumol (1961) and Howrey (1967,1971) where 'stabilisation', despite reducing the modulus of the roots of the system, actually increases the amplitude of the cycles for certain variables and also increases the variances of the endogenous variables. Policy design can hardly be said to be successful if the target variables are the ones with the increased variances or amplitudes.
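The decomposition in (3.1.20) can be sketched numerically (the system matrix below is an illustrative assumption with a complex eigenvalue pair, i.e. a damped cycle):

```python
import numpy as np

# Illustrative system matrix with a complex conjugate eigenvalue pair
A = np.array([[0.6, -0.4],
              [0.4,  0.6]])

mu, W = np.linalg.eig(A)     # component cycles mu_j, eigenvectors W
M = np.diag(mu)

# The canonical form: A = W M W^(-1)
assert np.allclose(W @ M @ np.linalg.inv(W), A)

# Damping/amplitude of the component cycle comes from |mu_j|; here
# |0.6 +/- 0.4i| = sqrt(0.52) < 1, so the cycle dies away
assert np.allclose(abs(mu), np.sqrt(0.52))
```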
3.1.5 An economic example
The economic example of section 2.5 can be extended to examine the role of feedback. In a discussion of the policy aspects of his model, Black (1975) considered various kinds of feedback rules for the monetary instrument, ms, designed to stabilise the economy. One possibility is a feedback rule which makes the money supply depend on output deviations from trend, reflecting the endogeneity of the money stock that results from cyclical and autonomous movements in the fiscal stance of the public sector: ms = g − wy
(3.1.21)
Substituting (3.1.21) into (2.5.1) and re-deriving the differential equation for y, we have: y''' + ay'' + a[w + v(1 − bc)]y' + abvy = 0
(3.1.22)
for which the necessary and sufficient conditions for stability are then:

a²[w + v(1 − bc)] > abv
w/v + 1 > bc
Thus some sufficiently large w can ensure that both these conditions are satisfied even if the original open-loop system is unstable, i.e. if some roots of the characteristic polynomial of equation (2.5.5) lie in the right half of the complex plane. Questions of stabilisation policy, however, are not concerned only with whether an economic system is stable in this sense. An economy may be asymptotically stable in the sense that, for the model described above, the response of y to an exogenous shock eventually dies away, so that output returns to its normal level. If, however, the roots of the open-loop system are close to the imaginary axis, the amount of time which elapses before convergence may be considerable. Complex roots, moreover, will mean that the economy oscillates about its equilibrium path. The feedback rule (3.1.21), although it can be used to ensure the asymptotic stability of the closed-loop system, is too simple to give adequate influence over dynamic behaviour. Consider instead the state-feedback rule for the money supply:

ms = kz   (3.1.23)
where k is a 1 x 3 vector of feedback coefficients. Substituting (3.1.23) into (2.5.7) we have the closed-loop system:

z' = (A + bk)z   (3.1.24)

where

           [ a   av(1 - bc)   abv ]   [ -ak_1   -ak_2   -ak_3 ]
(A + bk) = [ 1       0          0 ] + [   0       0       0   ]   (3.1.25)
           [ 0       1          0 ]   [   0       0       0   ]
So the money supply is a linear function of all three state variables. In contrast the feedback rule (3.1.21) describes the money supply as a linear function of a single state only (the first state, z_1). The problem now is to choose the elements of the vector k so that the coefficients of the closed-loop characteristic polynomial take whatever values are desired. The closed-loop system matrix (3.1.24) has the characteristic polynomial:

lambda^3 + s_1 lambda^2 + s_2 lambda + s_3 = 0   (3.1.26)

with

s_1 = a(ak_1 - 1)   (3.1.27)
s_2 = a^2bvk_3 + av(1 - bc)(ak_2 - 1) - 1   (3.1.28)
s_3 = abv(ak_2 - 1)   (3.1.29)
Suppose the decision was to set s_1 = s_2 = s_3 = 0, equivalent to locating the closed-loop poles at the origin in the complex plane. This would ensure, due to the feedback rule for the money supply, that output - even when there is an exogenous disturbance - will never deviate from its normal level. Solving for k_1 from (3.1.27), k_2 from (3.1.29) and then k_3 from (3.1.28) we have the feedback coefficients:
k_1 = 1/a;   k_2 = 1/a;   k_3 = 1/(a^2bv)
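Pole placement of this kind is easy to verify numerically. The sketch below uses a generic third-order companion-form system with hypothetical coefficients (it is an illustration of the technique, not the matrix (3.1.25) itself): cancelling every coefficient of the characteristic polynomial drives all closed-loop poles to the origin.

```python
import numpy as np

# Generic pole-placement illustration (hypothetical coefficients, not
# Black's model). For zdot = A z + b m in companion form with m = k z,
# the feedback k that cancels the first row of A cancels every
# characteristic coefficient, placing all closed-loop poles at the origin.
c1, c2, c3 = 0.5, 0.8, 0.3                  # open-loop characteristic coefficients
A = np.array([[-c1, -c2, -c3],
              [1.0,  0.0,  0.0],
              [0.0,  1.0,  0.0]])
b = np.array([[1.0], [0.0], [0.0]])
k = np.array([[c1, c2, c3]])                # cancels the first row of A

closed = A + b @ k
print(np.max(np.abs(np.linalg.eigvals(closed))))   # ~0: all poles at the origin
```

The closed-loop matrix is nilpotent, so every eigenvalue is (numerically) zero - the deadbeat situation discussed in the text.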
Thus the first two feedback coefficients are equal to the reciprocal of the responsiveness of output to excess money balances, while the third coefficient depends also on the elasticity of inflation with respect to excess demand (the slope of the Phillips curve) and the speed of adjustment of inflationary expectations. The comparatively straightforward way in which we were able to choose the feedback coefficients is entirely a consequence of the fact that there is only one policy instrument which enters the state-space form in a simple way. With more complex systems matters are less easy (Elgerd, 1967; Barnett, 1975). The application of the state-feedback rule (3.1.23) to (2.5.7) means that, with the closed-loop poles located at the origin, the ability of output to recover from some initial disturbance is considerably enhanced. This is only achieved, however, by requiring that the money supply adjust very rapidly to any disturbance. The further to the left in the complex plane the closed-loop poles are set, the faster convergence to the zero state can be achieved. To make the system move fast, however, may require the amplitude of instrument changes to increase considerably. Since, in practice, the instrument amplitude will be bounded, if only by institutional and political constraints, there is an implicit trade-off between the speed with which an economy can be moved to a desired state and the freedom with which policy instruments can be varied. It may well be realistic to bound the permissible fluctuations in the instruments (or at least penalise them) in order that they shall remain within what is 'acceptable', even if this entails some loss in the degree of effective control. One response to these perceived shortcomings of classical control methods is to devise explicit criteria by which the performance of a system can be judged, and for this we turn to optimal-control theory.

3.2 DETERMINISTIC OPTIMAL CONTROL BY DYNAMIC PROGRAMMING
An 'optimal' economic policy means one which is 'best' in some sense. But before the term can be given a more precise meaning we need to be able to define what is preferred in order to provide a measure of what 'best' means, and to be able to spell out the range of possible alternatives among which a choice can be made. A mathematical statement of the optimal deterministic economic-policy problem consists, first, of a description of the economy provided by equation (2.1.4):

y_t = Ay_{t-1} + Bx_t + Ce_t   (3.2.1)
Secondly, we need a description of the task which is to be accomplished in terms of the targets of economic policy and the possible constraints and, thirdly, we need a statement of the criterion by which performance is to be judged. The criterion most commonly used to embody the structure of preferences is an additively separable quadratic function:

J = 1/2 SUM_{t=1}^{T} (dy_t'Q_t dy_t + dx_t'N_t dx_t)   (3.2.2)
The matrices N_t, t = 1, ..., T are assumed to be symmetric and positive definite while the matrices Q_t, t = 1, ..., T are assumed to be symmetric and positive semi-definite. The deviations dy and dx are defined by:

dy_t = y_t - y_t^d,   dx_t = x_t - x_t^d
where the superscript d denotes the desired (or preferred, or ideal) value for the target or instrument. The diagonal elements of Q and N penalise these deviations. The problem is then to determine the vector of instruments (x_t: 1 <= t <= T) from the first time period to some (unspecified) horizon at time period T, such that the objective function (3.2.2) is minimised subject to the constraints of the econometric model described by (3.2.1). The basic characteristic exhibited by dynamic programming is that a T-period decision process is simplified to a sequence of T single-period decision processes. This reduction is made possible by what is called the principle of optimality (Bellman, 1957): 'An optimal policy has the property that, whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.' If an economic policy (x_t*: 1 <= t <= T) is optimal over the interval t = 1 to t = T, then it must also be optimal over any subinterval t = tau to t = T, where 1 < tau < T. Take a simple example. Suppose someone wanted to travel from London to Edinburgh by the shortest and least expensive route, and this was found to be to go via York and Newcastle upon Tyne. Now suppose the traveller had reached York and nothing had changed, in the sense that no new information had been obtained about some alternative route; then it must be the case that the optimal route from York to Edinburgh is still to go via Newcastle. For the principle of optimality to hold, the minimum of (3.2.2) must be decomposable in the following way:

min[J(1, T)] = min{J(1, T - 1) + min[J(T, T)]}   (3.2.3)

where the more explicit notation is designed to make it clear to which period of time the cost applies. Thus the cost of an optimal policy from periods t = 1
to T must be equal to the sum of the cost of an optimal policy from period t = 1 to T - 1 and the cost of the optimal policy for the remaining period T. Formally, the necessary condition for the principle of optimality to hold is that the constrained-objective function must satisfy the Markov property: 'After any number of decisions, t, we wish the effect of the remaining T - t stages of the decision process upon the total return (the constrained objective function value) to depend only on the state of the system at the end of the t-th decision and the subsequent decisions' (Bellman, 1961). As can be seen from (3.2.3) - or from general considerations - this amounts to requiring the constrained-objective function to be additively recursive with respect to time, so that the decision variables themselves are nested with respect to time intervals. If that is not the case, obviously it will not be possible to optimise the sum of objective-function components by optimising component by component sequentially backwards (as in (3.2.3), or in (3.2.5), (3.2.8) and (3.2.12) below). Thus if the unconstrained objective can be written J = SUM_{t=1}^{T} J_t(y_t, x_t), then the constrained objective must be of the form J = SUM_{t=1}^{T} J~_t(x_t, ..., x_1). This necessary condition for the optimality of dynamic programming is important because dynamic games and models with rational expectations introduce mutual dependence between time periods, rather than the recursive dependence found in conventional dynamic models. In those cases this necessary condition will be violated. Consider the optimisation problem for period T, the last period of the planning horizon, where x_T has to be chosen to minimise:

J(T, T) = 1/2(dy_T'Q_T dy_T + dx_T'N_T dx_T)   (3.2.4)
Expanding (3.2.4) we have:

J(T, T) = 1/2(y_T'Q_Ty_T + x_T'N_Tx_T - 2y_T'Q_Ty_T^d - 2x_T'N_Tx_T^d + y_T^d'Q_Ty_T^d + x_T^d'N_Tx_T^d)   (3.2.5)
After substituting (3.2.1) into (3.2.5) and differentiating with respect to x_T we have the optimal-feedback rule:

x_T* = K_Ty_{T-1} + k_T   (3.2.6)

where

K_T = -(N_T + B'Q_TB)^{-1}B'Q_TA
k_T = (N_T + B'Q_TB)^{-1}[N_Tx_T^d + B'Q_T(y_T^d - Ce_T)]
Equation (3.2.6) is arranged in the form of a feedback rule or law. The first term on the right-hand side is referred to as the feedback matrix. The second term is the tracking gain comprised of desired values for the policy instruments, and terminal weights on the targets and instruments. The tracking gain is thus a constant independent of the state of the system.
Substituting (3.2.1) and (3.2.6) into (3.2.5) we have:

J(T, T) = 1/2(y_{T-1}'P_Ty_{T-1} - 2y_{T-1}'h_T + a_T)   (3.2.7)

where

P_T = (A + BK_T)'Q_T(A + BK_T) + K_T'N_TK_T   (3.2.7a)

h_T = (A + BK_T)'Q_T[y_T^d - (Bk_T + Ce_T)] - K_T'N_T(k_T - x_T^d)   (3.2.7b)

a_T = k_T'N_Tk_T - 2y_T^d'Q_T(Bk_T + Ce_T) - 2k_T'N_Tx_T^d + y_T^d'Q_Ty_T^d + x_T^d'N_Tx_T^d + (Bk_T + Ce_T)'Q_T(Bk_T + Ce_T)   (3.2.7c)
The quadratic form of (3.2.7) provides an expression for the minimum cost at the terminal period. For some intermediate time period tau, which can be taken to be period T - 1, we can proceed by assuming that the solution will also be of quadratic form like (3.2.7). Thus:

J(tau, T) = min[1/2(dy_tau'Q_tau dy_tau + dx_tau'N_tau dx_tau) + J(tau + 1, T)]   (3.2.8)
This is a difference equation with the boundary condition provided by (3.2.7). The differential of (3.2.8) with respect to x_tau is:

dJ(tau, T)/dx_tau = @J(tau, T)/@x_tau + (@y_tau/@x_tau)' @J(tau + 1, T)/@y_tau   (3.2.9)

where the second term on the right-hand side follows because J(tau + 1, T) is an implicit function of x_tau, through the effect of x_tau on y_tau and of y_tau on J(tau + 1, T). Rewriting (3.2.8) in the expanded form we have:

J(tau, T) = 1/2(y_tau'Q_tau y_tau + x_tau'N_tau x_tau - 2y_tau'Q_tau y_tau^d - 2x_tau'N_tau x_tau^d + y_tau^d'Q_tau y_tau^d + x_tau^d'N_tau x_tau^d) + J(tau + 1, T)   (3.2.10)
which, when a quadratic expression for J(tau + 1, T) like (3.2.7) is substituted into (3.2.10), gives:

J(tau, T) = 1/2[y_tau'(Q_tau + P_{tau+1})y_tau + x_tau'N_tau x_tau - 2y_tau'(Q_tau y_tau^d + h_{tau+1}) - 2x_tau'N_tau x_tau^d + y_tau^d'Q_tau y_tau^d + x_tau^d'N_tau x_tau^d + a_{tau+1}]   (3.2.11)

where note that time period tau + 1 is assumed to be the terminal period. The structure of (3.2.11) is very similar to that of (3.2.5). The expression only contains terms in y_tau and x_tau and current and future constant terms embodying system parameters, desired values, exogenous assumptions and preferences. So if we substitute y_tau out, using (3.2.1), and differentiate with respect to x_tau, we have:

x_tau* = K_tau y_{tau-1} + k_tau   (3.2.12)
We now find that the optimal feedback and tracking gains depend upon future as well as current system parameters, exogenous assumptions and preferences. Substituting (3.2.1) and (3.2.12) into (3.2.11) and rearranging again we have:

J(tau, T) = 1/2(y_{tau-1}'P_tau y_{tau-1} - 2y_{tau-1}'h_tau + a_tau)   (3.2.13)

where now

P_tau = (A + BK_tau)'(Q_tau + P_{tau+1})(A + BK_tau) + K_tau'N_tau K_tau   (3.2.13a)

h_tau = (A + BK_tau)'[(Q_tau y_tau^d + h_{tau+1}) - (Q_tau + P_{tau+1})(Bk_tau + Ce_tau)] - K_tau'N_tau(k_tau - x_tau^d)   (3.2.13b)

a_tau = a_{tau+1} + k_tau'N_tau k_tau - 2(Q_tau y_tau^d + h_{tau+1})'(Bk_tau + Ce_tau) - 2k_tau'N_tau x_tau^d + y_tau^d'Q_tau y_tau^d + x_tau^d'N_tau x_tau^d + (Bk_tau + Ce_tau)'(Q_tau + P_{tau+1})(Bk_tau + Ce_tau)   (3.2.13c)
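The backward recursion (3.2.6)-(3.2.13a) can be sketched in a few lines. The following minimal illustration keeps only the feedback part (zero desired values and no exogenous terms, so the tracking gains vanish); the matrices A, B, Q, N are assumptions for the example, not values from the text.

```python
import numpy as np

# Minimal sketch of the backward recursion (3.2.6)-(3.2.13a), feedback part
# only: y^d = x^d = 0 and e_t = 0, so the tracking gains k_t are all zero.
def feedback_gains(A, B, Q, N, T):
    P = np.zeros_like(Q)                   # boundary condition P_{T+1} = 0
    gains = []
    for _ in range(T):                     # recurse backwards from t = T to t = 1
        M = Q + P                          # effective state weight Q_t + P_{t+1}
        K = -np.linalg.solve(N + B.T @ M @ B, B.T @ M @ A)
        AK = A + B @ K                     # closed-loop transition matrix
        P = AK.T @ M @ AK + K.T @ N @ K    # recurrence (3.2.13a)
        gains.append(K)
    return gains[::-1]                     # reorder as K_1, ..., K_T

# Scalar check against (3.2.6): K_T = -(N + B'QB)^(-1) B'QA = -0.6 here.
A = np.array([[0.9]]); B = np.array([[0.5]])
Q = np.array([[2.0]]); N = np.array([[1.0]])
K = feedback_gains(A, B, Q, N, T=5)
print(K[-1][0, 0])                         # -0.6
```

The terminal gain agrees with the closed form (3.2.6); the earlier gains fold future preferences in through the Riccati variable P.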
We thus have the optimal-feedback solution for two time periods. The solution for further periods can now be obtained by using repetitions of the backward recursion described by equations (3.2.3) to (3.2.13). The recurrence equations (3.2.13a) to (3.2.13c) are matrix difference equations, also known as Riccati equations in continuous-time problems. They can be solved from the boundary conditions provided by the terminal-period equations (3.2.7a) to (3.2.7c). The optimal-feedback rule for time period tau is, for the deterministic case, dependent on all future feedback rules, desired values and therefore exogenous variables up to the terminal period. Note also that the feedback matrix K_tau, although time-varying, is only a function of system parameters and the weights embodied in the preference structure of the objective function. On the other hand the tracking gains take account of the desired trajectories as well as of current and future exogenous variables. In the stochastic case future exogenous variables are only known in expectation. If assumptions about future exogenous influences were to change, this would only shift the time-varying optimal tracking gain but would not alter the feedback response. The feedback matrix is essentially backward-looking since, even though it is affected by future preferences, it responds only to unanticipated disturbances after the event. The tracking gain is forward-looking and takes account of anticipated exogenous influences. For the stochastic case these tracking gains would therefore have to be 'kept up to date' by re-evaluating all k_tau for t <= tau <= T, if one current expectation of a future exogenous variable, say E_t(e_T), is revised at t. In principle these re-evaluations will be repeated at each t = 1, ..., T.

3.2.1 A scalar example

Some aspects of the use of a quadratic objective function in the optimal-control solution can be clarified with a simple scalar example:

y_t = ay_{t-1} + bx_t + e_t   (3.2.14)
where exogenous influences are excluded. The objective function for a two-period problem is:

J = 1/2 SUM_{tau=t}^{t+1} (q_tau dy_tau^2 + n dx_tau^2)

where the weight n on the instrument is assumed to be unity and the same for both periods. The feedback rule for period t + 1 is:

x_{t+1} = K_{t+1}y_t + k_{t+1}   (3.2.15)

with

K_{t+1} = -q_{t+1}ab/(q_{t+1}b^2 + n),   k_{t+1} = q_{t+1}b y_{t+1}^d/(q_{t+1}b^2 + n)
There are a number of features of this (single-period) optimal result worth noting. If we take the limiting case where there are no costs of adjustment attached to variations in the policy instrument, then the optimal-feedback rule for the single-period problem is:

x_{t+1} = (1/b)(y_{t+1}^d - ay_t)   (3.2.16)

which when substituted into (3.2.14) gives:

y_{t+1} = y_{t+1}^d + e_{t+1}

So with zero adjustment costs the autoregressive response of y_t to an innovation in e_t is completely eliminated. The target fluctuates randomly about its desired value with variance sigma^2, as compared with a larger variance of (1 + a^2)sigma^2 in the open-loop case. With policy-adjustment costs the autoregressive coefficient on y_{t-1} is no longer zero but is:

lambda = a - qab^2/(qb^2 + n)

and given n >= 0, |lambda| is always less than |a|.
For the two-period problem the optimal policy for period t takes the same feedback form:

x_t = K_ty_{t-1} + k_t

where

K_t = -p_{t+1}ab/(p_{t+1}b^2 + n)

with p_{t+1} the scalar counterpart of the Riccati variable in (3.2.13a), which carries the cost of the remaining period into the period-t decision.

A considerable simplification to these expressions can be achieved if we consider again the limiting case of zero adjustment costs attached to policy instruments. The feedback and tracking gains then collapse to the simpler forms given in (3.2.17).
It can be seen that the first-period optimal policy in this two-period problem depends not only upon the lagged value of the target variable and the current-period desired target but also upon the desired value of the target in the next time period and the relative weighting attached to each period. Consider further the case in which the policy-maker attaches equal importance to attaining target values in each period (q_t = q_{t+1}). In this case the feedback rules described by (3.2.17) contract to:

x_{t+1} = -(a/b)y_t + (1/b)y_{t+1}^d

Thus, although the optimal-feedback rule is time-varying, the feedback part is a constant with only the tracking gain varying with time. Only if desired targets take a constant value over time does the optimal-feedback rule reduce to the constant (steady-state) form:

x_t = -(a/b)y_{t-1} + (1/b)y^d
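The claims of the scalar example can be checked by simulation. A minimal sketch, with illustrative parameter values: under the zero-adjustment-cost rule (3.2.16) the controlled target fluctuates about its desired value with variance sigma^2, as stated above.

```python
import numpy as np

# Simulation of the scalar model (3.2.14) under the zero-adjustment-cost
# rule (3.2.16). The controlled target should satisfy y_t = y^d + e_t,
# so its variance is sigma^2 = 1. Parameter values are illustrative.
rng = np.random.default_rng(0)
a, b, y_d, T = 0.8, 0.5, 1.0, 200_000
eps = rng.normal(0.0, 1.0, T)

y = 0.0
controlled = np.empty(T)
for t in range(T):
    x = (y_d - a * y) / b            # feedback rule (3.2.16)
    y = a * y + b * x + eps[t]       # model (3.2.14): y becomes y_d + eps
    controlled[t] = y

print(controlled.var())              # ~1.0 = sigma^2
```

The sample variance is close to sigma^2 = 1, against (1 + a^2)sigma^2 = 1.64 for the two-period-ahead open-loop response with these numbers.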
3.3 THE MINIMUM PRINCIPLE
The original solution of the discrete-time, optimal-control problem in economics using the minimum principle was provided by Pindyck (1973). But his derivation is for a strictly 'proper' model, i.e. policy instruments and exogenous variables only affect endogenous variables with a minimum lag of one period. Since this is an unnecessary restriction, we derive the solution for an improper version of the model, and for time-varying penalties in the objective function. The particular form of the solution will depend upon whether we use a minimal or non-minimal realisation of the model. The non-minimal realisation was used in the previous section for dynamic programming. Although we could have used a minimal realisation, it is important to note that, before this can be done, there are a number of transformations which we have to carry out to be able to work with an objective function which has been specified in terms of targets and instruments rather than states. With the non-minimal realisation of the previous section this is not necessary, since there is a one-to-one correspondence between states and targets and instruments.
In the case of Pindyck's original derivation, although he worked with a minimal realisation, transformations to the objective function were unnecessary because the model was strictly proper. Thus it is only when improper models are realised that transformation problems arise, because of the non-trivial nature of the observation equation (2.2.2). The solution of this section will be for a non-minimal realisation of an improper model, but it is worth observing the transformations which would be needed if a minimal form were used and we wished to write the objective function in terms of states. To solve the problem as one of state-tracking we rewrite (3.2.2), using (2.2.1) and (2.2.2):

J = 1/2 SUM_{t=1}^{T} (dz_t'J_t dz_t + dx_t'L_t dx_t)

where

J_t = H'Q_tH
L_t = N_t + D_x'Q_tD_x

and with H^+, the Penrose pseudo-inverse of H, used to construct desired values for the states. The values for the actual target variables are then easily recoverable from the observation equation (2.2.2). If the system description is strictly proper, i.e. B_0 = C_0, implying D_x = D_e = 0, then the transformations are trivial. The Hamiltonian for the non-minimal realisation is a function defined as:

H(y_t, x_t, lambda_t) = 1/2 dy_t'Q_t dy_t + 1/2 dx_t'N_t dx_t + lambda_t'(A~y_{t-1} + Bx_t + Ce_t)   (3.3.1)
where lambda_t is a vector of Lagrangean multipliers (co-state variables). The necessary conditions, as set down by the minimum principle, are evaluated about the optimal paths y_t*, x_t*, lambda_t*, for t = 1, ..., T:

@H/@lambda_t = y_t* - y_{t-1}* = A~y_{t-1}* + Bx_t* + Ce_t   (3.3.2)

-@H/@y_t = lambda_t* - lambda_{t-1}* = -Q_t dy_t* - A~'lambda_t*   (3.3.3)
The particular first-difference form of the model is provided by defining A~ in (3.3.1) as A~ = (A - I) (Pindyck, 1973, ch. 5). Initial conditions are provided by the given y_0. Terminal or transversality conditions are not as straightforward when the model is improper. In strictly proper models, since the terminal state is unaffected by the terminal value of policy instruments, the terminal cost is
simply a function of the terminal state. The determination of the transversality conditions in improper models is examined later. The minimum of the Hamiltonian is provided when @H/@x_t* = 0, i.e.:

x_t* = -N_t^{-1}B'lambda_t* + x_t^d   (3.3.4)

The optimal co-states can be determined by assuming that they are linearly related to the optimal states, of the form:

lambda_t* = P_{t+1}y_t* + h_{t+1}   (3.3.5)
Substituting (3.3.4) into (3.3.2) and (3.3.3), and then substituting (3.3.5) into the result and rearranging, we have:

y_t* = D_t^{-1}[(I + A~)y_{t-1}* + Bx_t^d - BN_t^{-1}B'h_{t+1} + Ce_t]   (3.3.6)

lambda_{t-1}* = [(I + A~)'P_{t+1} + Q_t]y_t* + (I + A~)'h_{t+1} - Q_ty_t^d   (3.3.7)

where

D_t = I + BN_t^{-1}B'P_{t+1}

Substituting (3.3.6) into (3.3.7) and rearranging we have:

lambda_{t-1}* = [(I + A~)'P_{t+1} + Q_t]D_t^{-1}(I + A~)y_{t-1}* + [(I + A~)'P_{t+1} + Q_t]D_t^{-1}(Bx_t^d - BN_t^{-1}B'h_{t+1} + Ce_t) + (I + A~)'h_{t+1} - Q_ty_t^d = P_ty_{t-1}* + h_t   (3.3.8)
Equating coefficients on each side of (3.3.8) provides expressions for P_t and h_t. Substituting (3.3.5) into (3.3.4) and substituting (3.3.6) into the result, we have the feedback equation:

x_t* = K_ty_{t-1}* + k_t   (3.3.9)

where

K_t = -N_t^{-1}B'P_{t+1}D_t^{-1}(I + A~)
k_t = [N_t^{-1}B'P_{t+1}D_t^{-1}BN_t^{-1}B' - N_t^{-1}B']h_{t+1} - N_t^{-1}B'P_{t+1}D_t^{-1}(Bx_t^d + Ce_t) + x_t^d

Terminal conditions for the recurrence equations for P_t and h_t are provided by the solution of the terminal-period problem:

P_T = Q_T(I + A~)   (3.3.10a)
h_T = Q_T(Bx_T^d + Ce_T - y_T^d)   (3.3.10b)
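The backward sweep defined by (3.3.6)-(3.3.10) can be sketched numerically. In this minimal illustration x^d = 0 and e_t = 0, so the tracking terms h_t stay at zero; the matrices A, B, Q, N are assumptions for the example, not values from the text.

```python
import numpy as np

# Sketch of the backward sweep (3.3.6)-(3.3.10) of the minimum principle,
# with x^d = 0 and e_t = 0. Abar = A - I is the first-difference form,
# so (I + Abar) = A. Illustrative matrices only.
A = np.array([[0.8, 0.1],
              [0.0, 0.9]])
B = np.array([[0.5],
              [0.2]])
Q = np.eye(2)
N = np.array([[1.0]])
Abar = A - np.eye(2)
Ninv = np.linalg.inv(N)

T = 10
P = np.zeros((2, 2))                       # boundary condition P_{T+1} = 0
Ps, gains = [], []
for t in range(T, 0, -1):
    D = np.eye(2) + B @ Ninv @ B.T @ P     # D_t = I + B N^{-1} B' P_{t+1}
    Dinv = np.linalg.inv(D)
    gains.append(-Ninv @ B.T @ P @ Dinv @ (np.eye(2) + Abar))   # K_t, (3.3.9)
    P = ((np.eye(2) + Abar).T @ P + Q) @ Dinv @ (np.eye(2) + Abar)  # (3.3.8)
    Ps.append(P.copy())

# The first backward step reproduces the terminal condition (3.3.10a):
# P_T = Q(I + Abar) = QA, since D_T = I when P_{T+1} = 0.
print(np.allclose(Ps[0], Q @ A))           # True
```

The first step of the sweep recovers the terminal condition (3.3.10a) automatically, which is why the recursion can simply be started from P_{T+1} = 0 and h_{T+1} = 0.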
Once again the policies are determined by means of a feedback relationship (3.3.9), comprising a feedback matrix and tracking gains determined from a prior backwards recursion from K_T and k_T to K_1 and k_1. We start from (3.3.10) with P_{T+1} = 0 and h_{T+1} = 0, determine K_T and k_T, then D_{T-1}, hence P_{T-1} and h_{T-1} via (3.3.8), and so K_{T-1} and k_{T-1} by (3.3.9). And so on.

3.4 SEQUENTIAL OPEN-LOOP OPTIMISATION
The third method of choosing optimal policies uses the stacked form of the econometric model (2.3.2). We shall continue to seek the minimum of a quadratic objective function, so we first transform the stacked form into a model in deviations from desired values, i.e.:

dY = R dX + b   (3.4.1)

where b = s - Y^d + RX^d. We also need to rewrite the additively separable quadratic objective function (3.2.2) in the stacked form:

J = 1/2(dZ)'Q(dZ)   (3.4.2)
where

Q = [ diag(Q_1, ..., Q_T)     diag(M_1, ..., M_T) ]
    [ diag(M_1', ..., M_T')   diag(N_1, ..., N_T) ]

and

dZ' = (dY' : dX')
j = \/2(dxys'QS(5X) +
+
(3.4.3)
where Sf = (R':InT). Differentiating (3.4.3) with respect to the policy instruments yields the optimal-policy choice directly:
=
Xd-(S'QSr1S'Q\^\=Xd-M-1m
(3.4.4)
3.4 Sequential open-loop optimisation
49
where M=(S'QS)
and m = S'(
This implies the optimal target choice: Y* = RX*+s
(3.4.5)
Alternatively we can write the target choice as: Y* = Yd + [ / - RiS'QSyiS'Qiyi
b
(3.4.6)
There are three points to be made about this solution. (i) If the word is genuinely deterministic, so that s is known, then (3.4.4) provides the full set of optimal values for the policy instruments for all time periods up to T, the horizon of the planning interval. (ii) A sufficient condition so as to guarantee a minimum to (3.4.3) and the uniqueness of the policy choice (3.4.4) is that S'QS should be positive definite. This of course would hold if Q itself were positive definite. (iii) It is not, however, necessary for Q to be strictly positive definite to ensure the existence (or uniqueness) of the policy (3.4.3). Positive semidefiniteness will also provide a unique minimum so long as r(Q) ^ nT, since r(S) = nTby construction. This rank condition permits an extra degree of flexibility in specifying the policy problem. For example, sections of any Qt and/or of Nt91 = 1 , . . . , T, can be set to zero (as well as corresponding rows and columns of Mt) while the rank condition r(Q)^nT is satisfied. Thus objectives may be incompletely specified in this approach. Even when m > n (there are more targets than instruments) it is possible to solve problems with no penalties on the instruments (which is not possible with the dynamic-programming approach) 2 , or no penalties on the targets, as well as any combination in between. It is not necessary, therefore, to penalise the use of all instruments in every period or of all target variables in every period (or, indeed, to provide penalties for every period from t= 1 to T). In this sense the intermittent use of instruments and the intermittent pursuit of targets would present no difficulties. This extra flexibility is not available with the alternative optimisation procedures examined earlier although the practical value of this in economic policy-making has not attracted much attention.
50 3.5
3 OPTIMAL-POLICY DESIGN UNCERTAINTY AND POLICY REVISIONS
The policy-section procedures derived in previous sections contain three possible sources of uncertainty: (i) Uncertainty in the variables - the exogenous influences, et, and the random disturbances, st - denoted additive uncertainty, (ii) Uncertainty in the parameters - in the reduced form (2.1.4) this is reflected in A9 B and C, and in the stacked final form it is reflected in R and within the model components of s - denoted multiplicative uncertainty, (iii) The parameters of the probability-density functions of the uncertain components vary over time. In this chapter we restrict ourselves to additive uncertainty. Multiplicative uncertainty is examined in chapter 4. 3.5.1
First-period certainty equivalence
Consider the stacked final-form representation of section 3.4. The optimising value of 5Z depends explicitly on b - and through that on et and et for all of t = 1 , . . . , T, so that its choice must be conditioned on some values for these variables. To that end we replace b by its expectation, conditional on all the information available when (3.4.4) is calculated. This may be justified by the certainty-equivalence theorem: min[£(J)|<57 - RdX - b = 0] = min[J|<5Y - RdX - E(b) = 0] X
(3.5.1)
X
where E(.) is the expectation operator. The minimum of the expected value of the quadratic objective function is equal to the minimum of the objective function under no uncertainty, where the random elements in et and et are replaced by their expected values. Proof If the objective function is quadratic, i.e.: E[J(dY, SX)~] = l/2E(5Z'QSZ) then, from the definition of 3Z:
and also:
(3.5.2)
3.5 Uncertainty and policy revisions
51
Now expanding the quadratic using (3.4.1): E(dZ'QdZ) = E(R6X + b)'Q*(RSX + b) + 2(R6X + b)fM*'5X + SX'N*SX (3.5.4) Hence: E\7(6Y, dX)~\ - J[E(SY), 8X] = l/2[E(b'Q*b) - E(b)'Q*E(b)~] = 0
(3.5.5)
Since (3.5.5) is independent of 5Z, certainty equivalence holds. Hence, if we wish to maximise the expectation of J subject to (3.4.1), we have a strategy to deal with additively uncertain variables. We would get the same optimal values for the policy instruments, X, if, instead, we were to maximise J subject to (3.4.1) but with b replaced by E(b). 3.5.2
Certainty equivalence in dynamic programming
The certainty equivalence principle can also be seen to hold in the dynamic programming solution of section 3.2 if (3.2.1) and (3.2.2) are written: yt = Ayt_1+ Bxt + Eet\t + et
(3.5.6)
where et\t is the expectation of the exogenous influences held at time period t and referring to t. It is assumed that:
et = E[eA J + wt where wt is distributed with zero mean and variance oaw. The random term wt is assumed to be independent of et9 yt and er. For the terminal period the objective function is: E[J(T, T)] = E{y'TQTyT + x'TNTxT - 2y'TQTydT - 2x'TNTxdT /T + x(NTxdT)
(3.5.7)
If we substitute (3.5.6) into (3.5.7), noting that the expectations for the exogenous variables is conditioned at t (i.e. eTt) and differentiating with respect to xT, we produce the feedback law identical to (3.2.6). Since xT is independent of wT and eT (conditional on information available at t) the optimal-feedback law for the terminal period is, under uncertainty, identical to the optimal-feedback law when there is no uncertainty. That certainty equivalence holds for all time periods can now be established by repeating the inductive process of the dynamic programming (solution of (3.2.7) to (3.2.13) but substituting (3.5.6) for (3.2.1). However, the policy rule for period 1, xt = Kx y0 + kx, depends on the backwards recursions of (Pt9 ht) and {Kt, kt) for t = T,..., 1; and hence onet,..., ex which appears in b o t h ht,...,h1 a n d kt,...,kl. I n fact each kt d e p e n d s o n et9...,el. Thus certainty equivalence implies that, in practice, kx will be computed using
52
3 OPTIMAL-POLICY DESIGN
eT\u • • •> e\\\ m those backward recursions. If our expectations are adjusted at period t, then the whole sequence kT,...,kt will have to be computed over again. 3.5.3 Restrictions on the use of certainty equivalence The arbitrary substitution of E(J) for J as a way of capturing the uncertainties in et and et may well be thought questionable. Even so, the certainty-equivalence result would not hold if any one of the following happens to be true, since extra terms involving 6X would appear in (3.5.5): (i) If the non-linearity of the objective function is other than quadratic. (ii) If there is an uncertainty in the parameters. For the stacked final form, stochastic parameters would introduce the extra term: E(5X'R'Q*RdX) * dX'E(R)'Q*E(R)5X
(3.5.8)
into (3.5.5), where this extra term depends on 5X. (iii) If there are non-linearities in the linkages between targets and instruments. Note, however, that non-linearity in any of the components of b does not vitiate the certainty-equivalence result. Looking back at (3.4.4) we find that the optimal policy SX is a linear function of b (which itself is linear in et and et) so SX* is fixed if the expectations of et and et remain unchanged. This property holds for et because E(et) = 0 by assumption. There is the possibility, however, that E(et) is obtained from an auxiliary model or alternative forecasting methods (ARIMA models, for example). Alternatively st may be known to be serially correlated. In these cases the certainty equivalents may be stochastic so that SX* is stochastic. One of the strengths of the certainty-equivalence theorem is that we can avoid having to specify probability-density functions (PDFs) f(et) and f(et) for et and ev Obtaining information on the PDFs can be difficult and what can be obtained may be arbitrary and of uncertain validity. In fact, all we are assuming is that J etf(st)det and j etf(et)det both converge to some fixed non-stochastic values. These values (or our estimates of them) may well be time-varying, and the density functions (subjective or not) can alter as we progress through time. 3.5.4
Optimal-policy revisions
At each t we can only implement the policies in the subvector x* of (3.4.4), that is, the policies calculated for the first time period of the T-period optimisation. As new information becomes available, it is rational for policy-making to adopt the strategy of reoptimising (3.4.4) at each t to obtain the remaining T—t+l periods (where possibly even the horizon is constantly being rolled
3.5 Uncertainty and policy revisions
53
forward). That is for each t = 1 , . . . , T we reoptimise:
where the subscript (t) is being used to indicate the time period in which the optimisation occurs. But again only the leading subvector dx? is actually implemented. The remaining future policies dx*+ x to Sx$ may be thought of as forecasts of future plans. Given uncertainty, we shall be concerned of course in reoptimising expected values, conditioned on currently available information: Et(J) = E(J\Qt) where Qt is the information set available at the start of time period t. This means that we have the opportunity to include all known past values of x r _, and 5 r _ f , for / > 1, in addition to the latest revisions to our expectations of future uncertainty exogenous variables Et(st+j), for j ^ O , making up Et(b), which is used in the reapplication of certainty equivalence. Consequently, we shall have a sequence of T static optimisation with decreasing horizons and changing Et(b). This procedure is obviously easily extended to the case of rolling horizons (Athans et al., 1976). For each reoptimisation, X(t) is subject to choice. The remaining instruments are fixed at their historical realisations and act as additional constraints as parts of Q r The implication for this for the reoptimised policies is most easily seen by working in two stages: first in the objective function and then in the constraint set. For this purpose we need extra notation. Let <5Y' = [(5Yg':(5Y;r)]
and
5Xf = [5X*f :SX{t)]
be the partitioning of policy variables with respect to the past and the future. Thus
SZ*' = (SY*':5Xt') are the already implemented policies and the actual outcomes for the targets, while dZ{t)=[SY{t):SX{t)'} remain to be chosen (assuming a fixed horizon). We can likewise partition Q in terms of its submatrices associated with the past and the future:
and
54
3 OPTIMAL-POLICY DESIGN
Now, collecting terms in the objective function (3.4.2), we find:
The first term on the right of (3.5.9) is a constant with respect to the reoptimisation and can be ignored. The objective function for policy revisions at t^ 2 is of the same form as the original but with reduced dimensions. Naturally the presence of uncertainty implies that the revised policies would be computed to minimise Et(J), of which only l/2Z{t)Q{t)SZit) remains subject to choice. Of the constraints, the first m(t — 1) have been satisfied exactly by the past sequence of optimal policies. Thus:
o
R
)
d X
) fyoJ
because the lower subvector only contains constraints on the policy instruments which remain to be chosen at t. R and b have been partitioned:
U* «J *nd LJ where
(_R r ... K T _, + 1 J and R
-
conforming to the partitioning in Y and Z , with respect to the past and the future. Both bit) and Rt+5X$ are invariant to the choice of X(t). Therefore the constraints in the reoptimisation problem are of the same form as those in the original optimisation but with bit) replaced by b{t) + Rt+SX$. Policy revisions for t = 2,..., T are the same form as (3.4.4) for t= 1. Inserting the indicated modifications we find that:
$$X_{(t)}^* = X_{(t)}^d - M_{(t)}^{-1}m_{(t)} \qquad (3.5.11)$$

where M_(t) and m_(t) are constructed exactly as in (3.4.4), but from Q_(t), R_(t) and b_(t) + R_+δX* in place of Q, R and b. This sequence of calculations will be multiperiod certainty-equivalent if b_(t) is
replaced by E_t(b_(t)) at the calculation stage. This follows from the application of the certainty-equivalence theorem at each stage of the sequential process, i.e.:

$$\min_{X_{(t)}}\left[E_t(J)\,\middle|\,\delta Y_{(t)} - R_{(t)}\delta X_{(t)} - b_{(t)} = 0,\ \Omega_t\right] = \min_{X_{(t)}}\left[J\,\middle|\,\delta Y_{(t)} - R_{(t)}\delta X_{(t)} - E_t(b_{(t)}) = 0\right] \qquad (3.5.12)$$

provided the known values Y* and X* have been inserted. Notice that E_t[b_(t)] ≠ E_{t-1}[b_(t)] for two possible reasons. First, actual realisations for s_{t-1} can now be inserted. Secondly, expectations of future events will probably be revised, so E_t(s_{t+i}) ≠ E_{t-1}(s_{t+i}) for i ≥ 0. Recall from equation (2.3.2) that s is linear in both the non-controllable exogenous variables, e_t, and the model's additive random errors, ε_t. Rewriting the expectations in s, and replacing them where possible by actual outcomes, applies as much to the non-controllable variables as to the additive equation errors. Under the conditions stated above, each leading subvector x*_t of X*_(t) from (3.5.11) is said to be first-period certainty-equivalent at the moment of its implementation, as seen from the perspective of time period t, because it satisfies (3.5.12). If this requirement is satisfied at each t, then the complete sequence of such policies x*_t is multiperiod certainty-equivalent.

3.5.5 Feedback-policy revisions

The sequential nature of the method of section 3.4 has been stressed. The optimal policy which is calculated at the beginning of the planning period provides a set of values for the policy instruments for all T periods, but only the first-period policy is actually implemented. Subsequent policies are recalculated as new information becomes available. In contrast, the feedback solution provided by the dynamic-programming algorithm is supposed to be designed to make reoptimisation unnecessary. Some of the reasons for this lie in the historical development of control theory. Automatic controllers which remove the need for continuous human intervention are obviously necessary for most engineering processes. The feedback rule:

$$x_t = K_ty_{t-1} + k_t \qquad (3.5.13)$$

is designed specifically to attenuate the effects of unanticipated disturbances coming through ε_t on the target path. So the behaviour of the targets under feedback is governed by:

$$y_t = (A + BK_t)y_{t-1} + Bk_t + Ce_t + \varepsilon_t \qquad (3.5.14)$$
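The attenuation delivered by a rule of the form (3.5.13)-(3.5.14) can be seen in a small simulation. The following is a minimal sketch for a hypothetical scalar model y_t = a·y_{t-1} + b·x_t + ε_t; the parameter values and the zero-target assumption are illustrative, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar version of (3.5.13)-(3.5.14): y_t = a*y_{t-1} + b*x_t + eps_t,
# with the feedback rule x_t = K*y_{t-1} (tracking gain zero, target path at zero).
a, b = 0.9, 0.5
K = -a / b          # places the closed-loop coefficient a + b*K at zero

def simulate(K, T=2000):
    """Average squared target outcome under the feedback gain K."""
    y, total = 0.0, 0.0
    eps = rng.normal(0.0, 1.0, T)
    for t in range(T):
        x = K * y                    # feedback response to the lagged target
        y = a * y + b * x + eps[t]   # closed-loop dynamics (a + b*K)*y_{t-1} + eps_t
        total += y * y
    return total / T

var_open = simulate(0.0)   # no feedback: shocks accumulate through the lag
var_fb = simulate(K)       # feedback in place: each shock lasts one period

print(var_open, var_fb)
```

Feedback cannot remove the current disturbance, but it stops it propagating: the closed-loop target variance falls to roughly the innovation variance, while without feedback it is inflated by the autoregressive lag.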
In the majority of engineering applications there would not be a separate role for the non-controlled exogenous variables. The predictable component of e_t would be extracted and included as a constant, and the remaining noise component added to ε_t. This procedure could be followed in economic applications. But to do so might mean losing valuable information, especially
as the relative size of the random components in economic variables tends to be conspicuously large, whereas often this is not the case in engineering systems. Also, large random disturbances necessarily imply large gains from revising policy.³ So at time period t we have a set of expected outcomes for e_{t+i}, for i ≥ 0, with which we can calculate the optimal feedback solution:

$$x_t^* = K_ty_{t-1} + E_t(k_t) \qquad (3.5.15)$$
where now we are recognising explicitly that the optimal feedback rule is conditioned by information at t about the future path of the exogenous variables. As will be recalled from section 3.2.1, K_t, the feedback matrix, depends only upon model parameters and preferences. Only k_t, the tracking gain, is a function of current and future exogenous variables. Suppose that at t+1 new information became available which resulted in the revision of forecasts for the exogenous variables. Through its effect on the tracking gain, current policy would be changed even if the random disturbance were zero. The policy revision is forward-looking, anticipating some future shift in exogenous influences. One way of describing this anticipatory response is to call it feedforward (Holly and Corker, 1984). Feedback, in contrast, is solely backward-looking. Shifts in expectations of future exogenous influences require that the recursive Riccati equations be rerun backwards from the terminal period so as to compute the current-period policy revision. Optimal policy in this framework can be seen to involve a feedback provision, which takes into account unanticipated disturbances coming through ε_t and possibly e_t via the prediction error (E_{t-1}(e_t) - e_t), and a feedforward provision, which responds to anticipated disturbances in e_{t+j} for j ≥ 0. Only when the parameters of the feedback rule (only k_t in the case of additive uncertainties) are continuously recomputed for T, ..., t at each t can we speak of a closed-loop control being applied. Optimal closed-loop control combines, therefore, two features. First, there are the feedback adjustments which respond to variations in past outcomes. Secondly, there are the feedforward adjustments which anticipate changes in assumptions about future outcomes. The multiperiod certainty-equivalent procedure of (3.5.11) above automatically has both the feedback and the feedforward elements built into it.

3.6 SEQUENTIAL OPTIMISATION AND FEEDBACK-POLICY RULES
We have looked at techniques for optimising a dynamic economy under uncertainty. To reconcile the policy choices obtained by each technique and to show them all to be identical is rather complicated and involves extensive, not to say tedious, algebra. This is a topic covered elsewhere (formally by Norman, 1974, and somewhat heuristically in Hughes Hallett and Rees, 1983). It does not lead to much further insight into the use of the techniques themselves, so we shall not go into the details here. However, it should be
obvious that all three techniques should give identical solutions, since in each case we have demonstrated the uniqueness of the solution obtained. It will of course be understood that the same objective function, the same model and the same set of conditioning information, Ω_t, are in use at each and every stage. In fact, the main difficulty in demonstrating the identity between the three techniques arises because the sequential-optimisation approach does not yield optimal policies in the form of a feedback-policy rule. It is therefore not straightforward to see that, although the same information has been introduced into the policy-selection problem, this information is being used in the same way. We shall therefore next rewrite (3.5.11) as a feedback rule. It is, however, important to remember two conditions which must be satisfied if the three techniques are to yield the same unique optimal-policy selection. First, the objective function must be additively separable with respect to time; and, secondly, the model must be causal, i.e. ∂y_t/∂x_{t+j} = 0 for any j ≥ 1. These two conditions are clearly satisfied here. We now set out the optimal-policy calculations at (3.5.11) as a sequential process. Recall that the introduction of the latest information in E_t(b) may include innovations in the form of realisations of previously uncertain variables, e.g. E_{t-1}(s_{t-i}) replaced by s_{t-i} for i ≥ 1, and in the form of revised expectations of the future, e.g. E_t(s_{t+j}) ≠ E_{t-1}(s_{t+j}) for j ≥ 0. The first type of innovation will be derived from the computed structural errors:

$$\varepsilon_i = A_0(y_i - \hat y_i) \qquad (3.6.1)$$
where y_i is the actual realisation of the endogenous variables of the model for period i ≤ t - 1.⁴ Meanwhile ŷ_i is their expectation at i with the benefit of hindsight; this is conditioned on the model's specification and parameter estimates, the actual values of the instruments and non-controllables up to period i, plus the actual values of the stochastic errors up to period i. Thus:
$$\begin{bmatrix} I & & & 0 \\ A & I & & \\ \vdots & & \ddots & \\ A^{t-2} & \cdots & A & I \end{bmatrix}\begin{bmatrix}\varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_{t-1}\end{bmatrix} = \delta Y_0^* - R_0\,\delta X_0^* + Y_0^d - C_0 - R_0X_0^d \qquad (3.6.2)$$

where Y^d_0 and X^d_0 are the subvectors of Y^d and X^d corresponding to Y_0 and X_0 from Y and X, ⊗ is the Kronecker product, and

$$C_0 = \begin{bmatrix} A \\ A^2 \\ \vdots \\ A^{t-1}\end{bmatrix}y_0 + \begin{bmatrix} I & & 0 \\ A & \ddots & \\ A^{t-2} & \cdots & I\end{bmatrix}\left(I_{t-1}\otimes C\right)\begin{bmatrix}e_1 \\ e_2 \\ \vdots \\ e_{t-1}\end{bmatrix}$$
and R_0 is rows 1 to m(t-1) of R. Now define the non-singular matrix:

$$N = \begin{bmatrix} I & & & 0 \\ A & I & & \\ \vdots & & \ddots & \\ A^{t-2} & \cdots & A & I \end{bmatrix}$$

Then (3.6.2) becomes:

$$\begin{bmatrix}\varepsilon_1 \\ \vdots \\ \varepsilon_{t-1}\end{bmatrix} = N^{-1}\left(\delta Y_0^* - R_0\,\delta X_0^* + Y_0^d - C_0 - R_0X_0^d\right) \qquad (3.6.3)$$
Hence, returning to (3.5.11), we can write:

$$X_{(t)}^* = d_{(t)} + G_{(t)}\left(\delta Y_0^* - R_0\,\delta X_0^*\right) \qquad (3.6.4)$$

where d_(t) contains all terms not involving either δY*_0 or δX*_0 and includes all the expectations variables, while G_(t) is a fixed matrix of coefficients formed from M_(t)^{-1}, N^{-1}, the blocks A, A², ..., A^{T-t} corresponding to R_+, and the partitioning of Q.
Despite the rather complex nature of the expression in (3.6.4), we can clearly see the sequential or feedback nature of the optimal-policy calculations. It will be useful to keep this characteristic in mind, although of course there is no suggestion that (3.6.4) should be used for calculation purposes.
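The sequence of shrinking-horizon static optimisations described above can be illustrated numerically. The following is a hypothetical scalar sketch: the model, the parameter values and the helper `solve_open_loop` are all illustrative assumptions rather than the book's notation, but the structure - one static problem per period in the stacked final form Y = RX + c, with only the first element implemented - is the one in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sequential open-loop reoptimisation for a hypothetical scalar model
# y_t = a*y_{t-1} + b_coef*x_t + shock_t, minimising sum q*y_t^2 + n*x_t^2.
a, b_coef, q, n, T = 0.8, 1.0, 1.0, 0.1, 5
y0 = 2.0

def solve_open_loop(y_init, shocks_expected, horizon):
    """One static optimisation over the remaining horizon (stacked final form)."""
    # R[i, j] = a^(i-j) * b_coef for j <= i : dynamic multipliers of x on y.
    R = np.zeros((horizon, horizon))
    for i in range(horizon):
        for j in range(i + 1):
            R[i, j] = a ** (i - j) * b_coef
    # c stacks the uncontrolled part: a^(i+1)*y_init plus cumulated expected shocks.
    c = np.array([a ** (i + 1) * y_init for i in range(horizon)])
    for i in range(horizon):
        for j in range(i + 1):
            c[i] += a ** (i - j) * shocks_expected[j]
    # Normal equations for min q*|R X + c|^2 + n*|X|^2.
    M = q * R.T @ R + n * np.eye(horizon)
    return np.linalg.solve(M, -q * R.T @ c)

# Certainty-equivalent sequence: reoptimise each period, implement the first element.
y, implemented = y0, []
for t in range(T):
    X = solve_open_loop(y, np.zeros(T - t), T - t)   # E_t(shock) = 0
    implemented.append(X[0])
    y = a * y + b_coef * X[0] + rng.normal(0.0, 0.2)  # realised shock enters next round

print(implemented)
```

Each pass re-solves a smaller problem conditioned on the realised state, which is exactly the feedback character that (3.6.4) makes explicit.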
3.7
OPEN-LOOP, CLOSED-LOOP AND FEEDBACK-POLICY RULES
At this point it is important to distinguish three kinds of policy rule which can be applied to an uncertain and dynamic economic system. The differences between them reflect how uncertainty is handled in a dynamic framework. An open-loop policy rule is one where the values the policy instruments will take can be specified completely at the start of the planning period. An open-loop policy is a function only of the information available at time period t = 1. For a given calculation procedure, represented by the general function g(.), to be applied at each t but calculated in some previous period j, where j ≤ t, an
open-loop policy rule is:

$$x_t = g_{tj}(\Omega_j) \qquad (3.7.1)$$

When there is uncertainty, the information set available at t may differ from what it was in the previous period j. In this case a feedback rule is one where the value of the policy instruments at time t can also depend on the past evolution of the system:

$$x_t = g_{tj}(\Omega_j, y_{t-1}) \qquad (3.7.2)$$

A more general formulation is:

$$x_t = g_{tt}(\Omega_t) \qquad (3.7.3)$$
and we will refer to (3.7.3) as a closed-loop rule. It is common in the control and economics literature to treat the feedback and closed-loop rules as synonymous. If there are no revisions to the exogenous assumptions, the two rules will coincide. Otherwise there will be a need either to reoptimise fully, as in the sequential approach, or to rerun the tracking-gain recursions. It is also conventional to reserve the term 'open-loop' exclusively for policies based on Ω_1 and computed at period 1, i.e. only for (3.7.1) where j = 1. We shall do the same. Then (3.7.1) for some 1 < j < t is a sequentially updated open-loop rule (see Athans et al., 1976). If we examine the policy rules which have been generated in sections 3.1 to 3.4, we can identify examples of open-loop, feedback and closed-loop rules. Consider first the sequential-optimisation procedure of section 3.4. The initially calculated X* in (3.4.4) (from which x*_1 will be implemented) is an open-loop calculation. The subsequent revisions X*_(t) in (3.5.11), from which the multiperiod certainty-equivalent sequential policies x*_t will be implemented, are evidently closed-loop policies, since Ω_t is used directly to evaluate x*_t and is also used in the reoptimisation procedure which produces the function g_tt(.) in (3.7.3). As we noted in an earlier section, the importance of the distinction between a feedback rule and a closed-loop rule is that the feedback solution is essentially backward-looking and is specifically designed to take account of innovations (i.e. unanticipated disturbances) in ε_t, and possibly in e_t, which have already occurred. Future anticipated exogenous influences have to be allowed for in the tracking gain. In a pure feedback solution the tracking gain is conditional on the information set Ω_1, even though the feedback component responds to Ω_t. The closed-loop solution can be written:

$$x_t = K_{t1}y_{t-1} + k_{tt} \qquad (3.7.4)$$
where the double subscripts indicate that the optimal feedback gain at t is that calculated at the beginning of the planning period, while the optimal tracking gain is redetermined at each t. Thus the closed-loop rule can be seen to
combine both a feedback response to unanticipated disturbances and a feedforward response to anticipated disturbances. In all the discussion we have assumed that there is no possibility of revising the model, or indeed the preference structure, during the planning exercise. In principle this is an undesirable restriction, given the great degree of uncertainty associated with the estimated parameters in econometric models, and in view of the lack of knowledge (and hence uncertainty) over the 'correct' specification of the objective function. One would therefore expect adjustments to the model or objective function to be an important aspect of economic-policy choice - although it does not appear to have been done in practice. In this case the complete sequence of feedback and tracking gains, which are functions only of these parts of the policy problem, would also have to be recomputed over all j = T, ..., t for each t = 1, ..., T. Then we could have closed-loop policies only when x_t is derived from K_tt and k_tt in:

$$x_t = K_{tt}y_{t-1} + k_{tt} \qquad (3.7.5)$$

In the sequential-optimisation calculations of (3.5.11), the equivalence requires that the components R_(t), Q_(t) and b_(t) be made up from the new model and objective-function components before x*_t is computed.
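The practical difference between a path fixed at t = 1 and one recomputed from realised outcomes can be illustrated with a simulation. This is a hypothetical scalar sketch (the model and parameter values are illustrative); the closed-loop rule responds to the realised y_{t-1}, while the open-loop path commits to the period-1 forecast.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scalar illustration of (3.7.1) versus (3.7.3):
# y_t = a*y_{t-1} + b*x_t + eps_t, target path y = 0, y initial = 1.
a, b, T = 0.9, 1.0, 200
K = -a / b   # myopic gain: sets the expected next-period target to zero

eps = rng.normal(0.0, 1.0, T)

def cost(policy):   # policy(t, y_lag) -> x_t; returns the realised squared loss
    y, c = 1.0, 0.0
    for t in range(T):
        x = policy(t, y)
        y = a * y + b * x + eps[t]
        c += y * y
    return c

# Open loop: the whole instrument path is fixed at t = 1 using Omega_1 only
# (expected shocks are zero, so the path works off the initial condition alone).
x_open = np.empty(T)
y_pred = 1.0
for t in range(T):
    x_open[t] = K * y_pred
    y_pred = a * y_pred + b * x_open[t]   # predicted path: zero after period 1

c_open = cost(lambda t, y: x_open[t])   # ignores realised shocks
c_closed = cost(lambda t, y: K * y)     # responds to realised y_{t-1}

print(c_open, c_closed)
```

Both runs face the same shock sequence; the closed-loop rule offsets each shock after one period, so its realised loss is far smaller.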
4 Uncertainty and risk

4.1 INTRODUCTION
Reliability is obviously an important property for economic policies. Policy-makers can understand that economic policy-making requires a model, historical data, projections of exogenous events, and some kind of preference function - even if these quantities are specified informally rather than introduced into an explicit optimisation framework. But the extent of the uncertainty about future events, and about the accuracy of one's model as a representation of the economy, means that economic-policy formulation is seldom as straightforward as dynamic-optimisation procedures would suggest. The main difficulty with econometrically based policy recommendations is that they are not robust to shocks and that they seldom incorporate any real information about the form and extent of the risks faced. Indeed, the usual certainty-equivalent decision rules are invariant to risk. Thus, knowing that the above-mentioned four features of policy formulation are in fact uncertain, policy-makers often prefer to follow their own judgement. Moreover, policy-makers particularly dislike committing themselves to policies which they suspect may later need frequent and substantial revisions. If they can be convinced of the robustness of some particular alternative strategy, they may very well prefer to accept that strategy even if it promises fewer of the benefits offered potentially by other policies. There is, therefore, a need to investigate whether the trade-off between a robust (or risk-sensitive) policy and one which is more ambitious is likely to be significant. If so, what does that trade-off look like in practice? Although risk-sensitive decision rules have been available for some time, the risk-ambition trade-off appears not to have been studied.¹ The certainty-equivalence principle allows a very tractable and convenient separation between the stage at which an econometric model is specified and estimated and the policy-formulation stage.
As long as the uncertainty in a linear model is additive, the full closed-loop response can be determined
sequentially by only taking account of the first moment of the probability distribution of outcomes. However uncertain economic activity is - where uncertainty in this case is measured by the variance of forecast errors - the structure of the optimal policy will be the same function of model parameters and the objective function. This does not mean, of course, that the actual policy pursued in terms of the tax rate or the level of interest rates will be the same whether the error variance is low or high, just that the coefficients in the feedback and tracking gains will be the same. If the economy is subjected to larger shocks, the policy responses will be greater. In particular, the feedforward response to some change in the expected future behaviour of an exogenous variable will be independent of the degree of additive uncertainty there is about it. If, for example, the dispersion about future expected oil prices were to increase because of changes in the political climate in the Middle East, this would have no effect on the optimal response to anticipated and unanticipated changes in oil prices. There are difficulties in reconciling the certainty-equivalence principle with the widespread role uncertainty plays in economic theory. Risk aversion and caution are recognised to be important influences on economic behaviour. Mean-variance analysis in portfolio theory places great stress upon the effects of the expected return on an asset relative to the risk attached. An asset which yields the same expected return as another asset but with much smaller risk as measured by the variance of returns is generally preferred, at least by investors who are averse to risk. Intuitively one would think that an economic policy-maker would be more cautious in the face of risks attached to the outcome of his policy than in their absence. 
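The invariance claims above are easy to check numerically. The following is a minimal sketch for a hypothetical one-period scalar problem: additive noise enters the expected loss only as a constant, so the minimising decision is unchanged whatever its variance.

```python
import numpy as np

# Hypothetical scalar check of certainty equivalence under additive uncertainty:
# minimise E[q*(a*y + b*x + eps)^2 + n*x^2] with eps ~ N(0, sigma^2).
# The expectation equals q*((a*y + b*x)^2 + sigma^2) + n*x^2, so sigma^2
# shifts the whole loss surface up without moving its minimiser.
a, b, q, n, y = 0.9, 0.5, 1.0, 0.2, 2.0

grid = np.linspace(-10, 10, 200001)   # brute-force search over the instrument

def best_x(sigma):
    expected_cost = q * ((a * y + b * grid) ** 2 + sigma ** 2) + n * grid ** 2
    return grid[np.argmin(expected_cost)]

x_low, x_high = best_x(0.1), best_x(10.0)
x_ce = -q * a * b * y / (q * b ** 2 + n)   # closed-form certainty-equivalent rule

print(x_low, x_high, x_ce)
```

The brute-force minimiser is identical for small and large noise, and both agree with the certainty-equivalent formula: the decision is risk-neutral with respect to additive uncertainty.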
But it is important to identify what kind of uncertainty it is we are dealing with and whether the uncertainty lies in the constraints which the policy-maker faces or whether the objective function itself explicitly penalises higher moments of the probability distribution of outcomes. In the first instance, we examine the case in which there is multiplicative uncertainty in the linear constraints. Previously we have assumed that the coefficients of the econometric model were fixed and known with certainty. There is, of course, a well-known problem of the interpretation of parameter uncertainty both in statistical terms and in terms of economic theory. The 'true' parameters may be constant but because we have to estimate them from samples there may be uncertainty about whether they are correctly estimated. Alternatively we can assume that the 'true' structural parameters are random variables but with known means. Aggregation of individual behavioural relationships will often produce random structural parameters at the macroeconomic level. For example, if the optimal adjustment to disturbances varies across individuals and firms because of varying costs or preferences, then the aggregate adjustment function will be a random variable. To assume that our knowledge actually extends as far as the moments of
unknown structural parameters overstates the true position. In reality there is considerable dispute among economic theorists about the correct theoretical structure. In principle there is a branch of optimal-control theory which can be used to handle this kind of uncertainty. Adaptive control, or dual control as it is also called, is concerned with the possibility that we may start off with insufficient information about structural parameters, in the sense that the mean is unknown, but that, as time goes by, we shall learn more. So in determining what is best to do now we ought also to take into account that we shall know more in the future. Obviously this is far from straightforward, since learning can take a variety of forms, very few of which can be predicted in any useful sense. Passive learning occurs as a natural by-product of the constant formulation and testing of hypotheses and the clash of competing theories. But if it were indeed possible to predict the direction of this learning, then the need for it would not exist. Active learning, on the other hand, which is what the proponents of adaptive control actually mean, is concerned with gaining more knowledge of the structure of the economy by actively stimulating economic activity in a variety of ways, while at the same time also controlling it. This approach combines experimental design with optimal control by stimulating the economy early in the planning period so as to enhance the quality of control later on. The practical or political feasibility of adaptive control is open to question. But there is no question that governments do conduct experiments in economic policy, even though the need to increase understanding of the economy by structured experimentation is never a consideration. Adaptive control involves probing the economy to provide sufficient information to allow key parameters to be estimated more accurately.
To take an example, suppose that a government wanted to try out a new kind of policy instrument but had no direct information about how this new instrument would affect economic activity. Prior theorising or the similarity of the instrument to existing instruments might provide some initial estimate. But a more precise measure of its effect can only be gauged from actual observations on the behaviour of the economy. To do this efficiently it would be necessary to vary the new instrument over time to provide sufficient variance for the effects to be measured. The precise way of varying the instrument will depend upon the nature of the policy instrument, the policy objective function and the structure of the economy (Zarrop, 1981). However, it is clear that there are two main disadvantages to this approach. First, there may be powerful political objections to using policy instruments in this way and, secondly, the amount of time necessary to provide sufficient data may be prohibitive. An additional source of uncertainty which can have an important bearing upon the conduct of economic policy is uncertainty about the initial state. So far we have assumed that in time period t the information set includes accurate measures of y_{t-1} and e_t. In practice, there is a delay in collecting
data, and data come in much more rapidly for some time series than for others. Moreover, some important series are subject to substantial revision. Measurement error can modify the particular feedback rule which is adopted, because of uncertainty about where the economy is at the particular moment when policy instruments are changed. The effects on policy of different types of uncertainty about the initial state will be examined later.

4.2 POLICY DESIGN WITH INFORMATION AND MODEL UNCERTAINTY
In chapter 3, uncertainty was confined to the non-controllable variables, e_t and ε_t, and was handled in the optimised decision rules by means of the certainty-equivalence theorem. In a more general analysis, there are three problems: (i) uncertainty in the non-controllable variables (i.e. additive uncertainty); (ii) uncertainty in the parameters (i.e. in the numerical evaluation of the policy multipliers R, denoted multiplicative uncertainty); and (iii) the fact that the parameters of the probability-density functions of those uncertain components may vary over time.

4.2.1 Additive uncertainty (information uncertainty)
The optimal decision, X*, in (3.4.4) depends explicitly on b - and through that on e_t and ε_t. Certainty equivalence implies that b should be replaced by its conditional expectation E_t(b). The reasoning was as follows. Let E_t(.) = E(.|Ω_t) denote an expectation conditional on the information set Ω_t available at the start of period t. Then J is replaced by E_t(J) as the criterion at each t. Now, if tr(.) denotes the trace:

$$E_t(J) = E_t(\delta Z)'Q\,E_t(\delta Z) + q'E_t(\delta Z) + \operatorname{tr} Q\,V_t(Z) \qquad (4.2.1)$$
where, under additive uncertainty:

$$V_t(Z) = E_t[Z - E_t(Z)][Z - E_t(Z)]' = (I:0)'V_t(s)(I:0)$$

is the conditional variance of Z for any choice of X. Evidently V_t(Z) is independent of the choice of X. Hence:

$$\min_X\left[E_t(J)\,\middle|\,\delta Y = R\delta X + b,\ \Omega_t\right] = \min_X\left[J\,\middle|\,\delta Y = R\delta X + E_t(b)\right] \qquad (4.2.2)$$

and this shows that, under additive uncertainty, the optimal, certainty-equivalent decisions are just (3.4.4) with E_t(b) replacing b. Starting with E_1(b), we can reoptimise X conditional on Ω_t for t ≥ 2. The form of the optimal-policy revision, made in subsequent time periods and conditional on new information, was specified in chapter 3. Therefore, certainty-equivalent decisions require no knowledge of the distribution functions of the random variables in s, beyond the conditional
mean E_t(s). At the same time, certainty equivalence also implies that the term in V_t(Z) has been dropped from the decision criterion. Yet the larger the dispersion of Z, for a given mean, the greater the risks about the realised target values (see, for example, Newberry and Stiglitz, 1981). Hence certainty-equivalent decisions are risk-neutral. They are unaffected by risk aversion, or indeed by any term reflecting the degree of uncertainty, V_t(Z). This raises the question of whether the first moment of the objective function's distribution can be justified as the sole optimisation criterion under uncertainty. If this distribution is affected by whatever decisions are taken, then higher-order moments may need to be considered as well.

4.2.2 Multiplicative uncertainty (model uncertainty)

We now investigate the consequences of uncertainty about the model's coefficients. The value of R might be uncertain for several reasons. The target responses to instrument impulses may be genuinely random. Or it may be that they are fixed but impossible to estimate precisely, owing to the sampling variability in a finite data set. Alternatively it may be that they vary according to some well-defined but imperfectly known scheme because, for example, it is recognised that the model is a linearisation evaluated about a trajectory of uncertain exogenous variables, e_t. The probability-density function characterising any uncertainty in A and B in equation (2.1.4) is not generally known when the latter consist of restricted reduced-form estimates. And even if some features of these density functions (such as the means and variances) are known, the associated characteristics of R will not be known, because of the non-linear transformations and convolutions necessary to derive the PDF of R from the structural form. Since it is impossible to characterise this type of uncertainty precisely, we must be content with some approximations. The approximation used here has two parts.
First, we use an approximation to the mean, E_t(R). Secondly, we use an approximation to the correct decision, given some prior estimate of E_t(R). In fact, E_t(R) is usually approximated by taking the estimated means of the structural-parameter matrices, and then creating R via (2.3.2) or via a mean stochastic simulation of the parameters. We must accept the resulting approximation as adequate, since further checks are impossible without knowledge of the exact distribution of R. Now let dR = R - E_t(R) and consider:

$$\begin{aligned}\delta X'E_t(R'Q^*R)\delta X &= \delta X'E_t(R)'Q^*E_t(R)\delta X + 2\delta X'E_t(R)'Q^*E_t(dR)\delta X + \delta X'E_t(dR'Q^*dR)\delta X\\ &= \delta X'E_t(R)'Q^*E_t(R)\delta X + \delta X'E_t(dR'Q^*dR)\delta X\end{aligned} \qquad (4.2.3)$$

which holds when E_t(dR) = 0, i.e. when we have unbiased estimates.
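The decomposition in (4.2.3) can be verified by Monte Carlo for a small hypothetical system; the dimensions, the choice Q* = I and the parameter-noise scale below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of (4.2.3): with R = Rbar + dR and E(dR) = 0, the expected
# quadratic form splits into a mean part plus a pure parameter-variance part,
# because the cross term has zero expectation.
m = 3
Rbar = rng.normal(size=(m, m))
Qstar = np.eye(m)
dX = rng.normal(size=m)

draws = 50000
total, var_part = 0.0, 0.0
for _ in range(draws):
    dR = 0.1 * rng.normal(size=(m, m))   # zero-mean parameter noise
    R = Rbar + dR
    total += dX @ R.T @ Qstar @ R @ dX
    var_part += dX @ dR.T @ Qstar @ dR @ dX

lhs = total / draws
rhs = dX @ Rbar.T @ Qstar @ Rbar @ dX + var_part / draws

print(lhs, rhs)
```

The two sides agree up to Monte Carlo error, confirming that only the mean of R and the dR'Q*dR term matter when the estimates are unbiased.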
Since A and B will be consistent estimates at best, (4.2.3) is strictly true only asymptotically. But the error in this third approximation is probably very small compared with that from ignoring the last term of (4.2.3), which will be the next step. Repeating the certainty-equivalence argument for this case and using (4.2.3) yields:

$$\operatorname{tr} Q\,V_t(Z) = E_t(b'Q^*b) - E_t(b)'Q^*E_t(b) + \delta X'E_t(dR'Q^*dR)\delta X \qquad (4.2.4)$$
This expression depends on δX. Furthermore, if R and b are correlated, we must add:

$$\delta X'\left[E_t(R'Q^*b) - E_t(R)'Q^*E_t(b)\right] = \delta X'E_t(dR'Q^*db) \qquad (4.2.5)$$

to (4.2.4). So, if, in addition to any other approximations, we neglect the last term of (4.2.3) because it is of an order smaller than the rest, then (4.2.4) is approximately independent of δX. This requires that the sum of variances and covariances of R, weighted by the scale of the target preferences, is small. Similarly, we need small correlations between R and b if (4.2.5) is also to vanish. In this case, the certainty-equivalence result continues to hold approximately, with E_t(R) inserted for R on the right of (3.4.4). Notice that the neglect of the last term of (4.2.3) and of (4.2.5) would be implied by taking first-order approximations to δX'E_t(R'Q*R)δX and δX'E_t(R'Q*b) about E_t(R) and E_t(b). Hence the strategy of treating the parameters in the optimal decision rules as if they were fixed at their expected values can be justified as correct up to a linear approximation - a strategy termed first-order certainty equivalence.

4.2.3 Optimal decisions under multiplicative uncertainty
The decisions which minimise E_t(J) exactly, when the parameter matrix R is stochastic, can be obtained directly from (3.4.3):

$$X^* = X^d - [E_t(M)]^{-1}E_t(m_1) \qquad (4.2.6)$$

where

$$E_t(M) = N^* + M^*E_t(R) + E_t(R)'M^{*\prime} + E_t(R)'Q^*E_t(R) + E_t(dR'Q^*dR)$$

and

$$E_t(m_1) = E_t(R)'q_1 + q_2 + M^*E_t(b) + E_t(R)'Q^*E_t(b) + E_t(dR'Q^*db)$$

The difficulty with (4.2.6) will be the computation of an exact distribution, or at least of the means and variances, for the dynamic multipliers in R. But, once the mean and covariance matrix of R have been computed, X* is easily obtained. The literature contains a variety of other techniques for treating multiplicative uncertainty - for example, the passive-learning methods of
Chow (1976) and the active-learning methods in Kendrick (1981). Unlike (4.2.6), these latter techniques impose stochastic independence between decision periods on the constrained objective and may not always be valid. To illustrate one approach, a dynamic-programming solution is given next.

4.2.4 A dynamic-programming solution under multiplicative uncertainty

As in chapter 3, the objective function is:

$$J = \tfrac{1}{2}E\sum_{t=1}^{T}\left(\delta y_t'Q_t\,\delta y_t + \delta x_t'N_t\,\delta x_t\right) \qquad (4.2.7)$$
but now it has to be minimised with respect to the constraints provided by a linear model with multiplicative uncertainty: yt = Atyt_1+Btxt
+ Cet + et
(4.2.8)
The coefficient matrices are indexed with respect to time because it is assumed that the coefficients are stochastic, with known means, variances and covariances, but independent over time. The stochastic characteristics of the coefficients are assumed not to vary with respect to time, and there is no opportunity to learn about the coefficients. Note that if the coefficients vary over time in a known, deterministic way - following, perhaps, an autoregressive process - then all one needs to do is substitute their values into the backwards recursions of chapter 3 to derive the optimal policy. For the terminal period, (4.2.7) can be rewritten:

$$J(T,T) = \tfrac{1}{2}E\left(y_T'Q_Ty_T + x_T'N_Tx_T - 2y_T'Q_Ty_T^d - 2x_T'N_Tx_T^d + y_T^{d\prime}Q_Ty_T^d + x_T^{d\prime}N_Tx_T^d\right) \qquad (4.2.9)$$
The main difficulty arises with the evaluation of the first term on the right-hand side. From (4.2.1) we have:

$$E(y_T'Q_Ty_T) = (Ey_T)'Q_T(Ey_T) + \operatorname{tr}(Q_TV_y) \qquad (4.2.10)$$

where the ijth element of the matrix V_y can be obtained from:

$$V_y^{ij} = y_{T-1}'V_A^{ij}y_{T-1} + x_T'V_B^{ij}x_T + e_T'V_C^{ij}e_T + 2y_{T-1}'V_{AB}^{ij}x_T + 2y_{T-1}'V_{AC}^{ij}e_T + 2x_T'V_{BC}^{ij}e_T + V_\varepsilon^{ij} \qquad (4.2.11)$$

Here V_{AB}^{ij} is the covariance matrix of the ith row of A with the jth row of B, and V_ε is the error variance-covariance matrix. It is assumed that there is no covariance between A and ε or between C and ε, since these terms would have no effect on the solution. Substituting (4.2.10) into (4.2.9) and differentiating with respect to x_T, the optimal feedback rule for the terminal period is:

$$x_T = K_Ty_{T-1} + k_T \qquad (4.2.12)$$
where

$$K_T = -\left(B'Q_TB + N_T + \operatorname{tr} Q_TV_B\right)^{-1}\left(B'Q_TA + \operatorname{tr} Q_TV_{AB}\right) \qquad (4.2.13)$$

and

$$k_T = -\left(B'Q_TB + N_T + \operatorname{tr} Q_TV_B\right)^{-1}\left[\left(B'Q_TC + \operatorname{tr} Q_TV_{BC}\right)e_T - B'Q_Ty_T^d - N_Tx_T^d + \operatorname{tr} Q_TV_{B\varepsilon}\right] \qquad (4.2.14)$$

Here tr Q_TV_B denotes the matrix whose jkth element is tr(Q_TV_B^{jk}), and similarly for the other trace terms.
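A numerical sketch of the terminal-period gain in (4.2.13) follows. The expectations over the random coefficients are evaluated here by Monte Carlo rather than through the elementwise-trace terms, and all matrices and noise scales are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Terminal-period feedback gain under multiplicative uncertainty, in the form
# K_T = -[E(B'Q B) + N]^{-1} E(B'Q A), with the expectations over the random B
# computed by simulation.  Compare with the certainty-equivalent gain that
# fixes B at its mean.
n_y, n_x = 2, 2
A = np.array([[0.8, 0.1], [0.0, 0.7]])
Bbar = np.array([[1.0, 0.2], [0.1, 0.5]])
Q = np.eye(n_y)
N = 0.1 * np.eye(n_x)
sigma_B = 0.5     # std dev of each coefficient of B (illustrative)

draws = 20000
EBtQB = np.zeros((n_x, n_x))
EBtQA = np.zeros((n_x, n_y))
for _ in range(draws):
    B = Bbar + sigma_B * rng.normal(size=(n_y, n_x))
    EBtQB += B.T @ Q @ B
    EBtQA += B.T @ Q @ A
EBtQB /= draws
EBtQA /= draws

K_uncertain = -np.linalg.solve(EBtQB + N, EBtQA)
K_certain = -np.linalg.solve(Bbar.T @ Q @ Bbar + N, Bbar.T @ Q @ A)

# The E(dB'Q dB) term inflates the inverted matrix, so the gain under
# coefficient uncertainty is more cautious (smaller in norm).
print(np.linalg.norm(K_uncertain), np.linalg.norm(K_certain))
```

The extra variance term acts exactly like an additional penalty on the instruments, previewing the Brainard attenuation result of section 4.3.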
Following the procedure employed in section 3.2, the feedback rule for any intermediate period, t, is:

$$x_t = K_ty_{t-1} + k_t \qquad (4.2.15)$$

where now
$$K_t = -\left[B'(Q_t + P_{t+1})B + N_t + \operatorname{tr}(Q_t + P_{t+1})V_B\right]^{-1}\left[B'(Q_t + P_{t+1})A + \operatorname{tr}(Q_t + P_{t+1})V_{AB}\right] \qquad (4.2.16)$$

and

$$k_t = -\left[B'(Q_t + P_{t+1})B + N_t + \operatorname{tr}(Q_t + P_{t+1})V_B\right]^{-1}\left\{\left[B'(Q_t + P_{t+1})C + \operatorname{tr}(Q_t + P_{t+1})V_{BC}\right]e_t - N_tx_t^d - B'\left(Q_ty_t^d - h_{t+1}\right) + \operatorname{tr}(Q_t + P_{t+1})V_{B\varepsilon}\right\} \qquad (4.2.17)$$

with

$$P_t = (A + BK_t)'(Q_t + P_{t+1})(A + BK_t) + K_t'N_tK_t + \operatorname{tr}(Q_t + P_{t+1})\left[V_A + 2V_{AB}K_t + K_t'V_BK_t\right] \qquad (4.2.18)$$

and

$$h_t = (A + BK_t)'\left[(Q_t + P_{t+1})(Bk_t + Ce_t) + h_{t+1}\right] - Q_ty_t^d + K_t'N_t(k_t - x_t^d) + \operatorname{tr}(Q_t + P_{t+1})\left[(V_{AB} + K_t'V_B)k_t + (V_{AC} + K_t'V_{BC})e_t + K_t'V_{B\varepsilon}\right] \qquad (4.2.19)$$

These difference equations can be solved backwards from the boundary conditions provided by (4.2.9) and (4.2.12).
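A scalar stand-in for these backward recursions can be computed directly. The recursion below is re-derived for the scalar random-coefficient model y_t = a·y_{t-1} + b_t·x_t + ε_t - it is not a transcription of (4.2.16)-(4.2.19) - and all the numbers are hypothetical.

```python
# Backward dynamic programming for a scalar model with a random coefficient b_t
# (mean b, variance s2), minimising sum of q*y_t^2 + n*x_t^2.  The value
# function is p*y^2; each step shrinks the gain relative to the certain case.
a, b, q, n, s2, T = 0.9, 1.0, 1.0, 0.1, 0.5, 20

p = 0.0                    # value-function coefficient beyond the horizon
gains = []
for _ in range(T):         # backwards from the terminal period
    h = q + p              # effective state penalty at this stage
    K = -h * a * b / (h * (b * b + s2) + n)
    p = h * a * a + h * a * b * K      # = h*a^2 - (h*a*b)^2 / (h*(b^2+s2)+n)
    gains.append(K)

K_stationary = gains[-1]   # first-period gain, after many backward steps
K_certain = -q * a * b / (q * b * b + n)   # one-period gain with s2 = 0

print(K_stationary, K_certain)
```

The gains converge quickly to a stationary value, and coefficient uncertainty (s2 > 0) leaves the converged gain smaller in magnitude than the certainty-equivalent one - the intertemporal counterpart of the single-period caution discussed next.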
4.3
THE EFFECTS OF MODEL UNCERTAINTY ON POLICY DESIGN
The effects that multiplicative uncertainty can have on the conduct of economic policy can be seen most easily with a scalar example. Although the matrix case was examined in the last section, it is not easy to establish clear-cut conclusions in the general case.
4.3.1 Parameter variances
Consider the scalar example:

y_t = a_t y_{t-1} + b_t x_t + c_t e_t + ε_t   (4.3.1)

where, in contrast to equation (3.2.1), the coefficients are random variables with means E(a_t) = a, E(b_t) = b, E(c_t) = c, and variances σ_a², σ_b², σ_c². Possible covariances between the coefficients themselves and the error term are considered later. The objective function is as before, so that for the static, single-period problem the optimal closed-loop rule is:

x_t = K_t y_{t-1} + k_t   (4.3.2)

K_t = -q_t ab/(q_t b² + n_t + q_t σ_b²)   (4.3.3)

k_t = -(q_t bce_t - n_t x_t^d - q_t by_t^d)/(q_t b² + n_t + q_t σ_b²)   (4.3.4)
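A quick numerical look at the feedback gain (4.3.3) shows the attenuation at work; all parameter values here are purely illustrative.

```python
# Feedback gain (4.3.3) for the scalar model: K_t = -q a b / (q b^2 + n + q var_b).
# As the variance of the policy multiplier b grows, the gain is pulled towards
# zero. Parameter values are illustrative only.
def feedback_gain(q, n, a, b, var_b):
    return -q * a * b / (q * b**2 + n + q * var_b)

q, n, a, b = 1.0, 0.0, 0.8, 1.0
gains = [feedback_gain(q, n, a, b, v) for v in (0.0, 1.0, 10.0, 100.0)]
print(gains)   # -0.8 (the dead-beat gain -a/b) shrinking towards 0 as var_b rises
```

With n = 0 and no uncertainty the rule is dead-beat (K = -a/b); a non-zero σ_b² plays the same damping role as an instrument penalty n_t.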
The following observations can be made. Only the variance of the impact effect of policy, measured by σ_b², enters into the closed-loop rule. Moreover, as uncertainty about the impact of policy increases, where uncertainty is measured by σ_b², both the feedback and the tracking-gain parts of the optimal closed-loop response to innovations in e_t and ε_t tend to zero. This result, first observed by Brainard (1967), gives an analytical bite to the intuitive feeling that, as one becomes less certain of the effects of policy, there will tend to be more caution in the exercise of policy and a less vigorous response to disturbances. In the limit, the optimal response is to do nothing. The observation that uncertainty about the effects of policy can make the optimal responses to disturbances tend towards passivity has been used to lend credence to the views of, for example, Friedman (1968) that the presence of uncertainty provides a strong argument for the abandonment of a stabilisation role for macroeconomic policy and the adoption of 'fixed' rules, the most well known of which is a fixed rate of growth for the money supply. It must be emphasised, however, that Friedman envisaged uncertainty of a much broader class than that encompassed by uncertainty about the impact effect of policy. The effect of σ_b² on the closed-loop rule in the scalar case is very like the effect of the penalty n_t. Even where there are no costs attached to varying the policy instrument (n_t = 0), we find that with a non-zero σ_b² it is no longer possible to eliminate completely the autoregressive response of y_t to disturbances, such as is achieved by equation (3.2.16). The effects of multiplicative uncertainty (in the absence of covariances) are only felt in the static, single-period case through the impact effect of policy. We need to know what the consequences are if we consider the intertemporal case. For example, consider the two-period problem of minimising:
J = 1/2 E Σ_{τ=1}^{2} [q_τ(y_τ - y_τ^d)² + n_τ(x_τ - x_τ^d)²]   (4.3.5)
subject to (4.3.1). As with the deterministic case in chapter 3, we first solve the second-period (terminal) problem, the solution to which is provided by the single-period solution above. The (intertemporal) optimal closed-loop solution for the first period is:

x_t = K_t y_{t-1} + k_t   (4.3.6)

where

K_t = -p_{t+1}ab/(p_{t+1}b² + n_t + p_{t+1}σ_b²)   (4.3.7)

k_t = -(p_{t+1}bce_t - n_t x_t^d - q_t by_t^d - bh_{t+1})/(p_{t+1}b² + n_t + p_{t+1}σ_b²)   (4.3.8)

p_{t+1} = q_t + q_{t+1}(a + bK_{t+1})² + q_{t+1}σ_a² + n_{t+1}K_{t+1}²   (4.3.9)

h_{t+1} = -(a + bK_{t+1})q_{t+1}(bk_{t+1} + ce_{t+1} - y_{t+1}^d) - n_{t+1}K_{t+1}(k_{t+1} - x_{t+1}^d)   (4.3.10)
We now find that, for the intertemporal case, σ_a² also enters into the optimal closed-loop rule. But now the effect of changes in uncertainty about a is to increase the strength of any feedback and feedforward response, i.e.:

∂|K_t|/∂σ_a² > 0   (4.3.11)

and

∂|k_t|/∂σ_a² > 0   (4.3.12)

Thus the more uncertainty about the speed of adjustment of the economy (for given σ_b²), as measured by σ_a², the more vigorous a closed-loop policy will tend to be (Craine, 1979). As in the single-period case, increases in σ_b² still reduce the intensity of policy, i.e. for the feedback part:

∂K_t/∂σ_b² = p_{t+1}²ab/(p_{t+1}b² + n_t + p_{t+1}σ_b²)²   (4.3.13)

which is opposite in sign to K_t, and for the tracking gain:

∂k_t/∂σ_b² = -p_{t+1}k_t/(p_{t+1}b² + n_t + p_{t+1}σ_b²)   (4.3.14)
It is interesting to note that nowhere does σ_c² enter into the closed-loop rule. Thus, no matter how much uncertainty there is about the impact effect that exogenous variables have on targets, the optimal closed-loop rule is the same. Even through the feedforward term, k_t, there is no tendency for the anticipatory response of current policy to innovations in expectations about future exogenous processes to be influenced by increased multiplicative uncertainty about how these exogenous processes will affect targets in the future.
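The contrast between the two variances can be checked numerically; the scalar recursion below is an assumed form consistent with (4.3.7) and (4.3.9), and all parameter values are illustrative.

```python
# A scalar two-period sketch: terminal gain (4.3.3), a cost-to-go weight of the
# form (4.3.9), and the first-period gain (4.3.7). Illustrative values only.
def first_period_gain(a, b, q1, q2, n1, n2, var_a, var_b):
    K2 = -q2 * a * b / (q2 * b**2 + n2 + q2 * var_b)           # terminal gain
    p2 = q1 + q2 * (a + b * K2)**2 + q2 * var_a + n2 * K2**2   # cost-to-go weight
    return -p2 * a * b / (p2 * b**2 + n1 + p2 * var_b)         # first-period gain

a, b, q1, q2, n1, n2 = 0.8, 1.0, 1.0, 1.0, 0.5, 0.5
gains_a = [first_period_gain(a, b, q1, q2, n1, n2, v, 0.0) for v in (0.0, 1.0, 5.0)]
print(gains_a)    # |K_t| rises with var_a when the instrument penalty n1 > 0
```

For n_t > 0 the magnitude of the feedback gain rises with σ_a², while raising σ_b² instead damps it, in line with (4.3.11)-(4.3.14).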
In circumstances in which there are no adjustment costs attached to using policy instruments (n_t = n_{t+1} = 0), the optimal closed-loop rule collapses to:

x_t = K_t y_{t-1} + k_t   (4.3.15)

K_t = -ab/(b² + σ_b²)   (4.3.16)

k_t = -(p_{t+1}bce_t - q_t by_t^d - bh_{t+1})/(p_{t+1}(b² + σ_b²))   (4.3.17)

so that now σ_a² enters only into the feedforward component of the closed-loop rule. In the even more special case in which there are no policy-adjustment costs and multiplicative uncertainty is confined to a (σ_b² = 0), the closed-loop rule for period t is:

x_t = -(a/b)y_{t-1} - [(q_t + q_{t+1}σ_a²)ce_t - q_t y_t^d]/[b(q_t + q_{t+1}σ_a²)]   (4.3.18)

Thus the feedback term reduces to a constant independent of any multiplicative uncertainty, with σ_a² affecting policy through the feedforward term.
4.3.2 Parameter covariances

The analytical results obtained above are complicated by allowing covariation between coefficients, and it is no longer necessarily the case that changes in σ_b² and σ_a² respectively weaken or strengthen a closed-loop policy. The single-period optimal closed-loop rule is now:

x_t = K_t y_{t-1} + k_t   (4.3.19)

K_t = -(q_t ab + q_t ρ_ab σ_a σ_b)/(q_t b² + n_t + q_t σ_b²)   (4.3.20)

k_t = -[(q_t bc + q_t ρ_bc σ_b σ_c)e_t - n_t x_t^d - q_t by_t^d + q_t ρ_bε σ_b σ_ε]/(q_t b² + n_t + q_t σ_b²)   (4.3.21)

where ρ_ij is the correlation coefficient between i and j. The covariance between a and b enters into the feedback gain, and the covariances between b and c and between b and ε appear in the feedforward or tracking gain. In this situation the direction of effect of changes in uncertainty about the impact of policy will no longer be unambiguously negative. Relative to the certainty-equivalent policy where there is no multiplicative uncertainty, whether policy is more or less vigorous in response to unanticipated disturbances will depend upon the strength of the covariation between a and b and upon its sign. If ρ_ab is negative and the covariation strong enough, it may well be that the policy is more vigorous than if there were no uncertainty, in so far as changes in σ_b² are associated with changes in the covariation between a and b. The covariances between b and c and between b and the error term ε appear in the tracking gain. If the correlation coefficients are positive, increases in uncertainty about the impact effect of exogenous variables and the additive
disturbance will produce a more vigorous response than in the certainty-equivalent case. Overall the effect of covariances is to make the effect of multiplicative uncertainty ambiguous in the single-period problem. In the two-period problem, the first-period optimal closed-loop rule with coefficient covariances is:

x_t = K_t y_{t-1} + k_t

K_t = -(p_{t+1}ab + p_{t+1}ρ_ab σ_a σ_b)/(p_{t+1}b² + n_t + p_{t+1}σ_b²)

k_t = -[(p_{t+1}bc + p_{t+1}ρ_bc σ_b σ_c)e_t - n_t x_t^d - q_t by_t^d - bh_{t+1} + p_{t+1}ρ_bε σ_b σ_ε]/(p_{t+1}b² + n_t + p_{t+1}σ_b²)

h_{t+1} = -(a + bK_{t+1})q_{t+1}(bk_{t+1} + ce_{t+1} - y_{t+1}^d) - n_{t+1}K_{t+1}(k_{t+1} - x_{t+1}^d) + q_{t+1}(K_{t+1}ρ_aε σ_a σ_ε + k_{t+1}ρ_ab σ_a σ_b + ρ_ac σ_a σ_c e_{t+1})   (4.3.22)
For the intertemporal problem the first-period optimal decision will be affected by the correlation between a and the additive error, ε, as well as the correlation between a and c. But again these only show up in the tracking gain. Once we allow for correlation among the coefficients and the additive error, the question of how multiplicative uncertainty will affect the conduct of economic policy becomes an empirical one. Even when the multiplicative uncertainty is confined to the impact effect of policy and b is correlated with ε, whether policy is more or less vigorous than under no multiplicative uncertainty will crucially depend on ρ_bε.
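The ambiguity introduced by the a-b covariance can be seen by evaluating (4.3.20) for different correlations; all parameter values are illustrative.

```python
# Single-period feedback gain with an a-b covariance, as in (4.3.20):
# K_t = -(q a b + q rho_ab sd_a sd_b) / (q b^2 + n + q sd_b^2).
# Parameter values are illustrative only.
def gain_with_cov(q, n, a, b, sd_a, sd_b, rho_ab):
    return -(q * a * b + q * rho_ab * sd_a * sd_b) / (q * b**2 + n + q * sd_b**2)

q, n, a, b, sd_a, sd_b = 1.0, 0.5, 0.8, 1.0, 0.5, 0.6
K_certain = -q * a * b / (q * b**2 + n)       # no multiplicative uncertainty
K_neg = gain_with_cov(q, n, a, b, sd_a, sd_b, -0.9)
K_pos = gain_with_cov(q, n, a, b, sd_a, sd_b, 0.9)
print(K_certain, K_neg, K_pos)
```

For these particular numbers a strong positive correlation makes the rule more vigorous than the certainty-equivalent gain, while a strong negative correlation damps it further; with other signs of a and b the pattern reverses, which is exactly the ambiguity described above.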
4.3.3 Increasing parameter uncertainty
It would be useful to be able to give some general results about the effect of uncertainty about policy responses on policy design. Using the 'stacked' approach of section 4.2.3, we can write (from (4.2.6)):

x* = x^d - (M + F)^{-1}(m₁ + f)   (4.3.23)

where M = N* + M*E_t(R) + E_t(R)'M*' + E_t(R)'Q*E_t(R)
F = E_t(dR'Q*dR)
m₁ = E_t(R)'q₁ + q₂ + M*E_t(b) + E_t(R)'Q*E_t(b)
f = E_t(dR'Q*db)

An increase in any element V(R_ij) of the covariance matrix V(R), for a fixed value of i = 1, ..., mT and all j = 1, ..., nT, implies dF_jj > 0 since Q* > 0. But:

dx* = -(M + F)^{-1}dF(x* - x^d)   (4.3.24)

where dF is a null matrix except where dF_jj > 0. Consequently dx_j* < 0 if x_j* > x_j^d but dx_j* > 0 if x_j* < x_j^d. Either way the effect of an increase in the uncertainty about the R_ij values (for fixed i and any j) implies that |x_j* - x_j^d| is reduced. Therefore increasing uncertainty about the policy responses usually forces the optimal decisions to follow their ideal values more closely. This is only an approximate result in that the covariances between R and b have been assumed to stay fixed, whereas in practice increases in V(R) may well increase those covariances too. This means that policies would become less vigorous relative to the x^d values. But whether that would imply less vigorous intervention is quite another question, and depends on whether the ideal values themselves would imply vigorous policy changes or not. If they do, then increasing uncertainty of this kind should lead to greater intervention. But, in the more likely case that they do not, increasing parameter uncertainty would (as in the example of section 4.3.1) lead to less vigorous interventions.
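A scalar version of (4.3.23)-(4.3.24) makes the shrinkage towards the ideal value transparent; the values of M, m₁ and x^d are illustrative, and f is set to zero for simplicity.

```python
# As the uncertainty term F in (4.3.23) grows, the optimal decision x* is
# pulled towards its ideal value x_d. All numbers are illustrative.
x_d, M, m1 = 1.0, 2.0, 1.5
gaps = []
for F in (0.0, 1.0, 10.0, 100.0):
    x_star = x_d - m1 / (M + F)
    gaps.append(abs(x_star - x_d))
print(gaps)   # the gap |x* - x_d| shrinks monotonically as F rises
```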
4.4
POLICY SPECIFICATION UNDER UNCERTAINTY
The broad implication of the argument in section 4.2.1 with respect to certainty equivalence is that, if policy-makers do care about the variability of outcomes, then it will not be sufficient to consider the deterministic solution as an equivalent solution to the stochastic problem. The policy-maker will need to know the likely degree of variability implied by his policies. Thus, even in the unlikely case in which the policy-maker could write down his relative priorities, the problem would not be fully stated until he also knew what the closed-loop variances were. This might entail a further iteration in the policy-formulation process as relative weights are refined in the light of what is revealed by an analysis of the closed-loop variances. For the linear model (3.2.1) we can write analytical expressions for the variances of both the targets and the policy instruments. The one-step-ahead open-loop covariance of the vector of endogenous variables is:

V_{y,t+1} = AR_tA' + V_ε   (4.4.1)
where

R_{t+1} = AR_tA' + V_ε   (4.4.2)

with R_0 = V_ε. The covariances with feedback are:

V_{y,t+1} = (A + BK_t)R_t(A + BK_t)' + V_ε   (4.4.3)
If we also allow for disturbances to be transmitted through the exogenous variables, which we assume have covariance V_{e,t} (so that we are permitting the covariance of the exogenous variables to vary over time), then the open-loop covariances are:

V_{y,t+1} = AR_tA' + CV_eC' + V_ε   (4.4.4)

The closed-loop expressions are more complex because we must also allow for the feedforward term k_t, which we assume is able to respond to current realisations of the exogenous variables. This means:

V_{y,t+1} = (A + BK_t)R_t(A + BK_t)' + C̄V_eC̄' + V_ε   (4.4.5)

where C̄ = C - B[N_t + B'(Q_t + P_{t+1})B]^{-1}B'(Q_t + P_{t+1})C. By inspection of these covariances the policy-maker can gain an idea of the likely variability of target variables at any point over the policy horizon and can establish what the choice of particular penalties in the objective function implies for the variability of outcomes. Moreover, the distribution of disturbances from e_t and ε_t also implies a corresponding covariance in the behaviour of policy variables, i.e.:

V_{x,t} = K_tV_{y,t-1}K_t' + (C - C̄)V_e(C - C̄)'   (4.4.6)
Thus far we have assumed that the policy-maker is only concerned about how target variables vary, but there is no reason why there should not also be disutility attached to large variations in policy instruments. For example, if interest rates are used as policy instruments there may be large political costs associated with large variances in interest rates because of the effects on the residential loan repayments of households. Moreover, there may be economic costs attached to increases in the variance of policy instruments which would have to be weighed against the benefits arising from the smaller variability of target variables. The linear model we have been examining does not explicitly recognise the effects of variances on behavioural relationships. There may well be some implicit effects through structural parameters, even though these, ordinarily, would have constant variances. The multiplicative-uncertainty case is unlikely to capture these kinds of effects since we do not have the parameter variances dependent upon the variances of y_t, x_t and e_t. If active policy stabilisation is going to alter the variability of both targets and instruments, this is likely to
undermine the assumption that the underlying structural parameters are relatively invariant. This raises issues and problems which we shall not attempt to address here, though it should be noted that these problems can be interpreted as a lack of structural invariance due to second-moment effects, in contrast to the first-moment effects identified by Lucas (1976) and considered in chapter 7 onwards.
4.5
RISK-AVERSE (VARIANCE-MINIMISING) POLICIES
In section 4.2 we showed that certainty-equivalent decisions are invariant to risk. The same policies will always be recommended whatever the potential variability of the target variables. This situation might appear very unsatisfactory to a policy-maker. It has, in fact, been heavily criticised in economic theory, albeit in a particular context. Poole (1970), for example, showed that the proper choice of monetary instrument (the growth of the money stock, or the interest rate) depends on knowing the relative variability of money demand and national income. If the variance of national income is sufficiently large (small) then the money stock (interest rate) is the appropriate instrument, although a certain combination of the two instruments (depending on those variances) is superior to either individually. In making this argument, Poole clearly distinguishes between certainty equivalence 'in the decisions' and 'in the utility sense'. But we have just seen that certainty equivalence in (3.4.4) follows from minimising the first two terms on the right of (4.2.1). So it is the target variances (or risk term) in (4.2.1) which define whether the policy instruments have been selected correctly. If the decision rule is independent of policy target variances, then it will not be possible to pick out the best instruments with respect to the inherent risks, let alone to select optimal values for them. The rest of this chapter will be devoted to designing risk-sensitive decision procedures which react to the variability (and asymmetries) of the target variables. The same distinction between certainty equivalence 'in the decisions' and 'in the utilities' emerges. Our risk-sensitive decisions optimise the expected utilities associated with the performance level, J, actually achieved. These utilities approximate the von Neumann-Morgenstern theory of decision-making under risk. The problem arises because the decision criterion, J, is stochastic and risk-neutral.
It has been conventional to replace J by its expectation as the decision criterion. If this is unsatisfactory, the obvious remedy is to include the second- and higher-order moments of J. That is, to pick policies which produce not only the best performance in expectation, E_t(J), but also one with the smallest variance and the power to ensure that any disturbances are more likely to appear on the more desirable side of the target's distribution than on the less
favourable side. An appropriate combination of these characteristics of J would define a new decision criterion such that optimising its expectation produces policies consistent with a risk-sensitive decision-maker's tastes. The more moments included, the more are the stochastic characteristics accounted for. Indeed these moments are themselves expectations of increasing powers of the stochastic-optimisation criterion: μ_j = E_t[(J - E_t(J))^j] for j = 2, 3, .... Hence the linear combination E_t(J) + Σ_{j≥2} α_jμ_j constitutes an approximation to the expectation of a standard concave utility function if appropriate weights are used for the α_j. To introduce such a combination as an objective is to specify that risk-sensitive decisions must follow from a von Neumann-Morgenstern utility function. We first examine decision rules which would reduce the variability of economic activity. Consider the case where the objective function (3.4.1) is separable between targets and instruments and contains linear penalties, i.e.:

J = 1/2(δY'Q*δY + δX'N*δX) + q₁'δY + q₂'δX   (4.5.1)
If optimising the first moment of the distribution of J provides the best expected performance, then minimising its second moment reduces the probability of given variations about E_t(J) and hence provides more security and less risk about the actual performance realised. Since risks are reflected in variability, a purely risk-averse decision-maker would aim to minimise the conditional variance V₁(J) = E{[J - E(J)]²|Ω₁}. Let G = b - E₁(b). Inserting (3.4.2) into (4.5.1) implies, given Ω₁, that:

J - E₁(J) = δX'R'Q*G + E₁(b)'Q*G + 1/2[G'Q*G - E₁(G'Q*G)] + q₁'G   (4.5.2)
which does depend on X. The decision which minimises V₁(J) is therefore:

X** = X^d - (R'Q*ψ₁Q*R)^{-1}R'Q*[ψ₁(Q*E₁(b) + q₁) + s*]   (4.5.3)

where ψ₁ = E₁(GG') and s* = 1/2E₁(GG'Q*G), given that δX'R'Q*E₁(G)E₁(G'Q*G) = 0. This decision depends on the first, second and third moments of the density functions of the random ('final-form') variables in b. In contrast, the certainty-equivalent decisions (equation (3.4.4)), which depend on only the first moment of the density function of b, are purely ambitious since they minimise E₁(J) without regard to risk. For calculation purposes:
s*_i = 1/2 Σ_j Σ_k Q*_jk E₁(G_iG_jG_k)   (4.5.4)

which is a vector of weighted sums of all the marginal and joint third moments of the multivariate density function of b. Note that s* = 0 holds for normally or symmetrically distributed random errors, but even then X** will generally differ from the certainty-equivalent decision X*.
4.6
RISK-SENSITIVE DECISIONS
The literature on risk-bearing contains a number of suggestions for modifying a stochastic objective function so as to retain an effective measure of risk aversion, while also yielding a tractable optimisation procedure and relatively simple decision rules. These proposals may be separated into three groups:
(i) The risk-premium approach (Pratt, 1964; Arrow, 1970).
(ii) The generalised-utility (von Neumann-Morgenstern) approach, revived recently by Machina (1982).
(iii) The complete density-function approach (Whittle, 1982).
4.6.1 Optimal mean-variance decisions
In this framework, the simplest approach to generating risk-sensitive decisions is to combine risk-averse (variance-minimising) and ambitious (certainty-equivalent) decisions. To do this, we set up a multivariate, dynamic mean-variance framework which yields an explicit and computable approximation to the decision-maker's underlying (von Neumann-Morgenstern) utility function. This mean-variance framework has the advantage of producing decision rules which are simple analytically and computationally, while at the same time combining many of the properties of all three approaches mentioned above into one mechanism. Sections 5.4-5.6 show that these rules generalise to risk-premium decisions when (3.4.1) represents a second-order approximation to the generalised utility functions advocated by Machina (1982) and others. Finally, by way of comparison, the mean-variance framework is a special case of the complete density-function approach in that only the first two moments of the relevant distribution are considered. More importantly, it generalises the complete density-function approach in that: (a) it is distribution-free, and (b) it explicitly recognises that risk-sensitive decisions of different periods will be stochastically interdependent (Hughes Hallett, 1984a). Thus the determination of optimal decisions here depends neither on having normally and serially independently distributed random variables nor on the prior imposition of intertemporal stochastic separability, which would cut out any possibility of anticipatory behaviour. Asymmetric risks can also be accommodated in the mean-variance approach. Optimal mean-variance decisions are now defined by:

min_X [1/2 αV₁(J) + (1 - α)E₁(J)]   (4.6.1)

where α is a risk-aversion parameter which trades security, indexed by V₁(J),
against ambition, indexed by E₁(J). The solution to (4.6.1) is:

X° = -[αK + (1 - α)M]^{-1}[αk + (1 - α)m]   (4.6.2)

where (K, k) and (M, m) are as defined in (4.5.3) and (3.4.4) respectively.

4.6.2 Trading security for performance level
Optimal mean-variance decisions can also be obtained from:

min_X [αV₀ + (1 - α)J₀]   (4.6.3)

where V₀ is the increase in V₁ if X ≠ X** is chosen, and J₀ is the increase in E₁(J) when X ≠ X* is used. This formulation is convenient since setting α explicitly trades a loss of ambition, J₀, against a loss in security, V₀. Indeed, J₀ = (X - X*)'M(X - X*) if X ≠ X* is implemented (Theil, 1964). In addition, the transformation of (Q*, q*) to (Q₀, q₀), given in detail in section 4.6.3 below, permits X** to be treated as if it were a certainty-equivalent decision under this reformulated penalty scheme. This implies that V₀ = (X - X**)'K(X - X**) if X ≠ X**. Solving (4.6.3) with these quantities inserted yields:

X° = [αK + (1 - α)M]^{-1}[αKX** + (1 - α)MX*]   (4.6.4)

which is identical to (4.6.2) when X* and X** are replaced by (3.4.4) and (4.5.3) respectively. One obvious simplification is to construct the linear combination:

X = γX* + (1 - γ)X**   (4.6.5)

where 0 ≤ γ ≤ 1 is a scalar. Since this implies X* - X = (1 - γ)(X* - X**), we have, as above, J₀ = (1 - γ)²(X* - X**)'M(X* - X**) and, similarly, V₀ = γ²(X* - X**)'K(X* - X**). Therefore the value of γ which minimises (4.6.3), subject to (4.6.5), is:

γ* = (1 - α)J₁/[(1 - α)J₁ + αV₁]   (4.6.6)

where J₁ = (X* - X**)'M(X* - X**) and V₁ = (X* - X**)'K(X* - X**) are known quantities.
4.6.3 The certainty-equivalent reparameterisation

It is important to notice that these risk-sensitive decision rules actually follow from applying reparameterised priorities within a certainty-equivalence framework. For example, the risk-sensitive decision values (4.6.2) are identical to the ordinary certainty-equivalent values (3.4.4) when Q is replaced by αQ₀ + (1 - α)Q, where

Q₀ = [ Q*ψ₁Q*   0 ]
     [ 0         0 ]

and q is replaced by αq₀ + (1 - α)q, where

q₀ = [ Q*(ψ₁q₁ + s*) ]
     [ 0              ]

Hence risk-sensitive decisions, (4.5.3), (4.6.2) or (4.6.6), accommodate high-risk targets by raising the relative penalties on deviations from their ideal values, while lowering the penalties on low-risk targets and removing them altogether from the instrument deviations. An interpretation of these changes will be given as part of our discussion of the setting of relative priorities in sections 5.4-5.6. The point here is that all of the risk-sensitive decision rules described above can be computed and, more importantly, analysed, simply by applying standard certainty-equivalent computer routines and analytical results to the decision rules which follow from optimising a quadratic objective function containing the transformed priorities structure.
4.6.4 Risk-sensitive policy revisions

The simplest way to introduce optimal-policy revisions to risk-sensitive decisions is to apply the revisions detailed in chapter 3 for the certainty-equivalent rules directly to (4.5.3) or (4.6.2), written as certainty-equivalent decisions with the transformed priority parameters. (Naturally the revisions for (4.6.6) follow those for X* and X** separately.) However, to give some insight into what is involved here, partition the decision variables between past and future in the same way as in chapter 3. Thus Y' = [Y*', Y'_(t)] and X' = [X*', X'_(t)], where Y*' = (y₁*', ..., y*'_{t-1}) and X*' = (x₁*', ..., x*'_{t-1}) are past realisations at t ≥ 2, but Y_(t) and X_(t) have still to be reoptimised. Partitioning R, Q*, ψ₁ and the other quantities conformably:

V_t(J) = δX'R'P_tRδX + 2δX'R'[P_tE_t(b) + Q*ψ_tq₁] + 2δX'R'Q*s*_t   (4.6.7)

where P_t = Q*ψ_tQ*. The first two terms correspond to the transformed quadratic objective function, i.e. (4.6.1) with Q* = P_t and q₂ = 0. Inserting the partitions into this transformed objective function, and collecting the terms which are respectively quadratic and linear in the variables still to be reoptimised, it can be shown that Q*_(t)ψ_(t)Q*_(t) replaces P_t in the quadratic term, while the corresponding partitions of q₁ and E₁(b), conditioned on the realisations of the first t - 1 periods, replace q₁ and E₁(b) in the linear term. The conditional variance of b_(t) in these replacements is ψ_(t) = ψ₁₁ - ψ₁₂ψ₂₂^{-1}ψ₁₂'. Inserting these partitions into the final term of (4.6.7) implies that the third moments s* are replaced by:

s*_(t) = 1/2 E_t[G_(t)G_(t)'Q*_(t)G_(t)]   (4.6.8)

The optimally revised decisions are therefore (4.5.3) or (4.6.2) with Q*_(t) for Q*, ψ_(t) for ψ₁, the replacements for q₁ and E₁(b) as above, and (4.6.8) for s*.
4.6.5 The stochastic inseparability of dynamic risk-sensitive decisions

In chapter 3, we pointed out that the constrained objective function must have an additively recursive structure (at least under a monotonic transformation), so that the decision variables are nested backwards with respect to time, if dynamic programming is to generate optimal decisions. That is a consequence of the necessary conditions for the validity of the principle of optimality (Bellman, 1961, pp. 54-7). The underlying problem may have such a structure, since inserting (4.2.8) or (3.2.1) into:

J = Σ_t J_t = 1/2 Σ_t (δy_t'Q_tδy_t + δx_t'N_tδx_t)

implies that:

J_t = J_t(x_t, ..., x₁; ε_t, ..., ε₁)   (4.6.9)

which has the required time-nested decision structure. The linearity of the expectations operator, moreover, preserves this decision structure within E_t(J) for the certainty-equivalent decisions. But the quadratic nature of variances destroys this nested structure when V_t(J) enters the objective. Indeed:

V₁(J) = Σ_t [V₁(J_t) + 2 Σ_{j>t} cov(J_t, J_j)]   (4.6.10)

where the covariances, cov(J_t, J_j), are non-zero since J_t and J_j are both quadratic functions of x_t, ..., x₁ and ε_t, ..., ε₁ (for j > t). These covariance terms cannot be ignored, in the sense of optimising V₁(J_t) + 2Σ_{j>t} cov(J_t, J_j) term by term, as would happen in deriving a decision rule for x_t (t < T) by dynamic programming. But optimising (4.6.10) with respect to each x_t in turn will yield
decision rules directly dependent on earlier decisions as well as on the rules for x_j, j > t, already determined. Thus:

x_t = f(x_{t-1}, ..., x₁; K_{t+1}, k_{t+1}, ..., K_T, k_T)   (4.6.11)

where the previously determined rules are of the form x_{t+j} = K_{t+j}y_{t+j-1} + k_{t+j} for j = 1, ..., T - t. Thus to optimise x₁, say, it will be necessary to solve for x_t, and vice versa; that is, it will be necessary to solve for x₁, ..., x_T simultaneously, which is exactly what (4.5.3) does. The joint dependence between x₁, ..., x_T here is a result of the stochastic inseparability of the risk-averse objectives over time. This inseparability manifests itself in the fact that ψ₁, and hence the transformed priorities in Q₀, are not block diagonal with respect to time. In fact the (t, j)-th submatrix ψ_tj, where t ≠ j, contains E(G_tG_j') ≠ 0, since, for j > t:

b_j = (∂y_j/∂y_t)b_t + Σ_{i=t+1}^{j} (∂y_j/∂u_i)u_i

but:

b_t = Σ_{i=1}^{t} (∂y_t/∂u_i)u_i + (∂y_t/∂y₀)y₀
Thus the certainty-equivalent decision, X*, was defined as solving:

min Σ_{t=1}^{T} E₁[1/2(δy_t'Q_tδy_t + δx_t'N_tδx_t) + μ_t'(y_t - Ay_{t-1} - Bx_t - Ce_t)]

where μ_t is a vector of Lagrange multipliers. But the risk-averse strategy, X**, results from solving:

min Σ_{t=1}^{T} [1/2 Σ_{j=1}^{T} δy_t'(Q₀)_tjδy_j + q₀t'δy_t + μ_t'(y_t - Ay_{t-1} - Bx_t - Ce_t)]   (4.6.13)

which does not display the recursive (time-nested) structure necessary for the principle of optimality to ensure the optimality of dynamic-programming techniques. As a specific example, consider the cumulant-generating function of J = Σ_tJ_t, i.e. φ = ρ^{-1}ln E₁(e^{ρJ}), where ρ is a risk-aversion parameter. This is the risk-averse decision criterion proposed by Whittle (1981, 1982). In a deterministic framework the constrained objective is (setting ρ = 1 for convenience):

φ = ln[e^{Σ_tJ_t}] = Σ_t J_t(x_t, ..., x₁; ε_t, ..., ε₁)   (4.6.14)

which may be optimised by dynamic programming since it displays the required additive recursive structure. But in stochastic problems the
constrained objective is:

φ = ln E₁[e^{Σ_tJ_t}] ≠ ln Π_{t=1}^{T} E₁(e^{J_t}) = Σ_{t=1}^{T} ln E₁(e^{J_t})   (4.6.15)
Hence φ no longer displays the required structure and cannot be optimised by optimising each J_t(.) separately in the stages of dynamic programming unless one imposes further restrictions on the decision rule. Whittle imposes non-anticipation on the decision-maker's conditional density functions for each y_t and x_t. This obliges the decision-maker to ignore the stochastic dependence between each future state and each expected decision value future to that state - although that dependence is a general feature of his problem. The decision-maker then acts as if each J_t(.) were statistically independent. But he would not accept that restriction if another optimisation technique were available.

4.7
THE KALMAN FILTER AND OBSERVATION ERRORS
In practical applications of control methods a potentially significant cause of uncertainty concerns the informational basis upon which feedback and feedforward occur. The application of the feedback rule:

x_t = K_t y_{t-1} + k_t   (4.7.1)
requires measurements on y_{t-1} as well as on the exogenous assumptions for current and future time periods embodied in k_t. The problem is, as any forecaster knows, that one rarely has complete information on the initial state, y_{t-1}, from which the current feedback value for x_t can be determined. The problem becomes the more serious the shorter the intervals over which policy responds to innovations in economic activity. But not only is there a delay in collating data; a number of important macroeconomic time series will be subject to subsequent revision as more accurate estimates become available. Moreover, estimates often become available for aggregated series, such as consumption, before data for the components, such as durable and non-durable consumption. Data may also be collected at a more frequent interval than that of the macroeconometric model. For quarterly models monthly data may provide a partial guide to the outcome for the quarter as a whole, while for yearly models both monthly and quarterly data will provide a good guide to the outcome for the whole year, well before complete published data for the year are available. The forecaster or policy-maker is also aided in obtaining an estimate for the initial state by auxiliary information provided by sources external to the model. Any macroeconometric model used for policy analysis is going to involve a level of abstraction. The practicalities and costs of handling large models place a constraint on the degree of detail they can contain. Rather than try to model every piece of possible systematic information, the model-user
will attempt to combine auxiliary information with the results of the model. For example, estimates of the current level of consumption and forecasts of future consumption can be improved by data which are available on retail sales, by recent surveys of consumer intentions or even by anecdotal evidence provided by informal contacts with the retail trade. In this section we examine formal ways of 'filtering' these various kinds of information in order to obtain the best estimate of the current state of the economy, and the way in which the quality of the estimate affects the type of policy which is implemented. In chapter 3 the relative merits of a minimal versus a non-minimal state-space realisation were discussed only in terms of the direct economic interpretation of states and the numerical advantages of a minimal realisation. For the non-minimal realisation the observation equation was redundant. But when there are uncertainties about the current state the observation equation can be used as a way of modelling data inadequacies and generating the optimal estimate of the state. Consider the linear model written as:

y_t = A y_{t-1} + B x_t + C e_t + ε_t   (4.7.2a)
z_t = H y_t + η_t   (4.7.2b)

where E(η_t) = 0 and E(η_t η_t') = Γ_t.
This formulation can be used as a flexible device for incorporating various kinds of uncertainty. Note that if the lagged state of the economy is observed without error, η_t = 0 and H is the identity matrix, and equations (4.7.2a) and (4.7.2b) are identical to the original system. The representation described by (4.7.2a) and (4.7.2b) has been used to pool information provided by different models and to combine monthly and quarterly forecasts, to combine auxiliary non-model-based information with model forecasts, to take account of measurement error in initial data estimates, and even to obtain estimates of unobservable expectational variables. Consider, as an example, a situation in which for a quarterly model there are monthly data available for only the first two months of the previous quarter. If we define three elements of the vector z to be the monthly observations, the relevant part of the observation equation relating the jth element of the vector of quarterly observations, y_{t-1}, to the jth, (j+1)th and (j+2)th elements of the vector of latest measurements provided by z consists of the three rows of H which carry a unit entry in the jth column and zeros elsewhere:

[1 1 1]'   (4.7.3)

with the corresponding diagonal elements of the observation-error covariance matrix being zero for the monthly observations which are available and very large for the monthly observations not yet available.
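The device of using a very large observation variance to represent a missing measurement can be illustrated with a single updating calculation. The sketch below is purely illustrative: the prior, the two published monthly readings and the variances are invented numbers, and a huge diagonal element of Γ stands in for the unpublished third month.

```python
import numpy as np

lam = 25.0                        # prior variance of the quarterly estimate
y_prior = 100.0                   # model-based prior estimate of y_{t-1}
H = np.ones((3, 1))               # three monthly readings on one quarterly state
z = np.array([101.0, 103.0, 0.0]) # third month not yet published (placeholder)
Gamma = np.diag([1.0, 1.0, 1e12]) # huge variance stands in for the missing month

# One updating step: the gain attaches negligible weight to the missing month
S = lam * (H @ H.T) + Gamma
F = lam * H.T @ np.linalg.inv(S)
y_post = y_prior + (F @ (z - H.ravel() * y_prior)).item()
```

The posterior estimate is, to within rounding, the precision-weighted average of the prior and the two published months only; the placeholder value for the missing month is effectively ignored.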
If there are no data of a higher frequency than the time interval of the model, it is possible to fill out any missing observations by using the model itself to solve for an estimate of y_{t-1}. The appropriate elements in Γ would then be the forecast error variance-covariance matrix generated by the model. Feedback occurs in response to innovations in ε_t (ignoring e_t); these are not observed immediately but are reflected in unanticipated changes in y_{t-1}, in the form of actual changes in, for example, output, employment and inflation. But if we do not have precise measurements we must rely on more or less accurate estimates in the form of z_{t-1}. These estimates will change from period to period as new information becomes available. This information is filtered to provide the optimal estimate to the extent that, if certain sources of information, such as preliminary data estimates, provide poor guides to the actual outcome, a relatively low weighting will be attached to them. Filtered estimates can be generated recursively so that previous estimates are updated in the light of the most recent observations. This has the advantage that recursive estimates economise on both the number of calculations which have to be done and the amount of information which needs to be carried from period to period. This method of recursive estimation of the state is referred to as the Kalman Filter. Because of its convenience it has been of particular use in aerospace engineering. But it has also seen many applications in economics. It has been most widely used in the recursive estimation of parameters, in tests for structural change (Rustem and Velupillai, 1979), for misspecification and for the recursive generation of residuals (Harvey, 1981), as well as for a variety of other estimation and hypothesis-testing problems.
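The recursive structure just described can be sketched as a single predict-and-update step. This is a minimal illustration under invented numbers: the matrices A, B, H and the covariances are not taken from any model in the book, the exogenous-variable term C e_t is omitted for brevity, and the gain takes the standard linear-Gaussian form.

```python
import numpy as np

def kalman_step(y_est, Lam, z, A, B, H, Psi, Gamma, x):
    """One predict-and-update step for y_t = A y_{t-1} + B x_t + eps_t and
    z_t = H y_t + eta_t, with cov(eps) = Psi and cov(eta) = Gamma."""
    # Prediction from the state equation
    y_pred = A @ y_est + B @ x
    Lam_pred = A @ Lam @ A.T + Psi
    # Update: the gain weights the new measurement by its reliability
    S = H @ Lam_pred @ H.T + Gamma
    F = Lam_pred @ H.T @ np.linalg.inv(S)
    y_new = y_pred + F @ (z - H @ y_pred)
    Lam_new = Lam_pred - F @ H @ Lam_pred
    return y_new, Lam_new

# Illustrative numbers: with a near-noiseless observation of the full state
# (H = I, Gamma ~ 0) the filtered estimate collapses onto the measurement
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.5], [0.2]])
H = np.eye(2)
Psi = 0.1 * np.eye(2)
Gamma = 1e-12 * np.eye(2)
y_new, Lam_new = kalman_step(np.array([1.0, 0.5]), np.eye(2),
                             np.array([1.6, 0.7]), A, B, H, Psi, Gamma,
                             np.array([1.0]))
```

Only the current estimate and its covariance need be carried forward, which is the economy in calculation and information storage referred to above.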
It has also been applied to problems arising from the day-to-day use of econometric models in forecasting and policy analysis, the pooling of short-term, medium-term and long-term model forecasts and the filtering of information (Kalchbrenner and Tinsley, 1980). A decision to alter policy at time period t is assumed to be based on information up to that moment. This information is assumed to be incomplete. We thus want the best estimate of y_{t-1} conditional on information available up to period t, denoted by ŷ_{t-1,t}, such that the estimate is optimal, i.e. we want to minimise:

E[(y_{t-1} - ŷ_{t-1,t})'(y_{t-1} - ŷ_{t-1,t})|Ω_t]   (4.7.4)

The unconventional dating of the conditioning information set relative to the period for which an estimate is required is intended to reflect the disparate nature and uneven character of the information set available. While there may be accurate measurements of current interest and exchange rates, reliable measures of inventory behaviour may only be available for six to nine months before. However, there is no problem in changing the dating of the conditioning information set to t - 1.
Given our prior knowledge of the conditional probability density of y_{t-1} based on the information set Ω_t, p(y_{t-1}|Ω_t), we have:

E(y_{t-1}|Ω_t) = ŷ_{t-1,t}   (4.7.5)

so that

E(y_t|Ω_t) = A ŷ_{t-1,t} + B x_t + C ê_{t,t}   (4.7.6)

In this case we assume that the generation of the expectation for e_t (and for exogenous variables in future periods) is a completely separate issue, though in principle we should also treat the problem of updating the exogenous assumptions as a filtering problem. The conditional covariance of y_{t-1} is defined as:

cov(y_{t-1}|Ω_t) = Λ_{t-1}   (4.7.7)

so that

cov(y_t|Ω_t) = A Λ_{t-1} A' + Ψ_t = Λ_t   (4.7.8)
The likelihood function, determined by (4.7.4), together with the normal probability-density function (PDF) of the observation error η_t, is:

p(Ω_t|y_{t-1}) = const × exp(-1/2 ||z_{t-1} - H y_{t-1}||^2_{Γ_t^{-1}})   (4.7.9)

The observation-error variance-covariance matrix will in general be time-varying because of possible fluctuations in the type and quality of information available to policy-makers. The prior PDF for the random variable y_{t-1} before observing z_{t-1} is:

p(y_{t-1}|Ω_{t-1}) = const × exp(-1/2 ||y_{t-1} - ŷ_{t-1,t-1}||^2_{Λ_{t-1}^{-1}})   (4.7.10)

so the conditional PDF for y_{t-1} given z_{t-1} is, by Bayes' law:

p(y_{t-1}|z_{t-1}) = const × exp(-1/2{||z_{t-1} - H y_{t-1}||^2_{Γ_t^{-1}} + ||y_{t-1} - ŷ_{t-1,t-1}||^2_{Λ_{t-1}^{-1}}})   (4.7.11)

Differentiating (4.7.11) with respect to y_{t-1} we have:

(H'Γ_t^{-1}H + Λ_{t-1}^{-1}) ŷ_{t-1,t} = Λ_{t-1}^{-1} ŷ_{t-1,t-1} + H'Γ_t^{-1} z_{t-1}   (4.7.12)

Using the matrix lemmas:

(H'Γ_t^{-1}H + Λ_{t-1}^{-1})^{-1} = Λ_{t-1} - Λ_{t-1}H'(HΛ_{t-1}H' + Γ_t)^{-1}HΛ_{t-1}   (4.7.13)

and

(H'Γ_t^{-1}H + Λ_{t-1}^{-1})^{-1}H'Γ_t^{-1} = Λ_{t-1}H'(HΛ_{t-1}H' + Γ_t)^{-1}   (4.7.14)

equation (4.7.12) can be rewritten:

ŷ_{t-1,t} = ŷ_{t-1,t-1} + F_t(z_{t-1} - Hŷ_{t-1,t-1})   (4.7.15)
where

F_t = Λ_{t-1}H'(HΛ_{t-1}H' + Γ_t)^{-1}

The best current estimate of y_{t-1} is therefore a linear combination of the previous best estimate and a correction for the difference between the previous estimate and the latest observations. The filter gain, F_t, is independent of the structure of the optimal closed-loop rule, which now is:

x_t = K_t ŷ_{t-1,t} + k_t   (4.7.16)
where the precisely observable y_{t-1} used in the formulation of chapter 3 is replaced by the optimal estimate ŷ_{t-1,t} provided by (4.7.15). The separation between the structure of the closed-loop rule, represented by the feedback and feedforward terms K_t and k_t, and the measurements made of innovations to ε_t does not mean that actual policy innovations will be independent of the quality of information available. Consider the scalar case:

y_t = a y_{t-1} + b x_t + ε_t   (4.7.17)
where observations of y_{t-1} are provided by:

z_{t-1} = y_{t-1} + η_t   (4.7.18)
where E(η_t) = 0 and E(η_t^2) = σ^2_t. The optimal estimate of y_{t-1} at time period t is:

ŷ_{t-1,t} = ŷ_{t-1,t-1} + [λ_{t-1}/(λ_{t-1} + σ^2_t)](z_{t-1} - ŷ_{t-1,t-1})   (4.7.19)
where λ_{t-1} is the variance of the prior estimate and σ^2_t is a time-varying measure of the quality of observations of y_{t-1}. When σ^2_t is zero there are precise observations of y_{t-1} available. But, if for some reason there are doubts about the reliability of recent observations, policy revisions at time period t, relative to what they otherwise would have been on the basis of information available at t - 1, may be more cautious. In some circumstances it is possible to imagine the policy-maker continuing with the previously optimal 'open-loop' policy because of the complete unreliability of the latest measurements. The practical pursuit of a closed-loop policy will, therefore, be affected by the uncertainty attached to current estimates of the state of the economy. The more uncertain these estimates, the less the policy-maker will depart from the policy he originally settled upon on the basis of information available in previous periods.
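The scalar case can be worked through numerically. In this sketch the feedback coefficients, the prior variance and the readings are all invented: it simply traces how, as the observation variance σ^2 grows, the gain in (4.7.19) falls towards zero and the implied policy setting collapses back to the previously optimal open-loop value.

```python
def scalar_gain(lam, sigma2):
    # Weight placed on the new observation z_{t-1} in the scalar filter (4.7.19)
    return lam / (lam + sigma2)

K, k = -0.5, 2.0                  # illustrative feedback rule x_t = K*yhat + k
y_prev, z = 1.0, 3.0              # previous estimate and a new, noisy reading
policies = []
for sigma2 in (0.0, 4.0, 1e9):    # increasingly unreliable measurements
    yhat = y_prev + scalar_gain(4.0, sigma2) * (z - y_prev)
    policies.append(K * yhat + k)
# As sigma2 grows the estimate, and hence the policy, stays at its
# previous ('open-loop') value K*y_prev + k
```

With a precise observation the policy responds fully to the new reading; with a worthless one it barely moves, which is the caution described in the text.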
5 Risk aversion, priorities and achievements

5.1 INTRODUCTION
The selection of one preferred policy package from a set of candidates involves the use of at least an implicit criterion. In policy studies, specifically those using optimal-control techniques, this criterion is made explicit. Yet a precise and realistic specification of the parameters defining the derivatives of an explicit-criterion function is extremely difficult. This difficulty is compounded by the fact that policy-makers have been reluctant, and usually unable, to specify their relative priorities in advance of a planning exercise. Typically they need to get a feel for how much they can or cannot expect to achieve; they need to establish the trade-offs available between policy goals, and the likelihood of not being able to reach them, in order to set the limits to policymaking and acceptable compromise. If it is hard for policy-makers to set relative priorities in explicit numerical form, it is certainly no easier for their advisory staff. As we have pointed out, the specified objectives will be expressed in the form of an approximation to some generally unknown, or incompletely known, collective-preference function. This preference function is unknown partly because policy-makers are unable to give a full description of their relative priorities for all possible events, but also because they cannot know in advance the preferences, bargaining strengths or solutions acceptable to the other decision-makers who influence economic decisions. If other decision-makers are involved, then to ignore their influences on the final decision might introduce serious errors. An approximate-criterion function has to be evaluated, at least implicitly, at some point of the policy space. In this chapter we shall work with the multiperiod certainty-equivalent decision rules developed in sections 3.4 and 3.5. 
This is for the convenience of being able to examine the preference structure over all planning periods simultaneously, conditioned on arbitrary information available at any one (and hence all) of the decision points.
5.2 APPROXIMATE PRIORITIES AND APPROXIMATE-DECISION RULES
A quadratic objective function can generally be regarded as no more than a local approximation to the true preferences, evaluated about some particular policy values, just as any model approximates the behaviour of the economy about the observed levels of its variables. If policy values δZ are rationally chosen, then the value of some index of economic performance can be associated with each δZ value. The ability to do no more than rank δZ values in terms of this index (which is the direct consequence of presuming rational choices as a behavioural theory for policy-makers) implies that the index is, at most, an ordinal preference function of δZ, say:

J = J(δZ)   (5.2.1)

which is convex and written as a loss function so that the optimal policy is the minimiser of (5.2.1). Preferences are expressed as relative penalties on failing to reach the desired or ideal values Z^d. This immediately introduces a restriction on the preference function since it implies that the unconstrained minimum of (5.2.1) lies at Z^d and so satiation is implied. One way of ensuring that policy choices are always made on the falling portion of the loss function is to set the desired values sufficiently large/small to ensure that the marginal loss is always falling. A second-order expansion of (5.2.1) about some feasible point Z* - ignoring the constant, to which minimisation is invariant, and terms other than second order - gives:

J ≃ 1/2 δZ'QδZ + q'δZ   (5.2.2)

where Q = ∂^2J/∂Z∂Z' and q = ∂J/∂Z, both evaluated at Z*. Thus Q is symmetric and positive definite if (5.2.1) is strictly convex and twice differentiable, at least over the feasible region of the model. Now (5.2.2) makes clear that the relative penalties on the 'failures' δZ, appropriate for the specification of J(Z), must always be dependent on the actual δZ values chosen. These penalties may well be sensitive to small changes in δZ values. Indeed there are strong reasons for arguing that this is the typical case in economic policy-making.1 Thus an objective function of the form (5.2.2) cannot normally be expected to be valid for more than a rather small area of the feasible policy space. Moreover, (5.2.2) specifies implicit ideal values Z^d - Q^{-1}q, so that q = 0 if Z^d represents a genuinely ideal path for the economy. However, (5.2.2) could also be specified with q non-zero and with Z^d replaced by the artificial values Z^a = Z^d + Q^{-1}q. Finally q could be specified as non-zero to
give preferences which are asymmetric about the true Z^d values (because ∂J/∂Z = q is evaluated at Z^d). Presumably q would be chosen so that the constrained optimum δZ* has the more desirable sign in each element. The first two options are equivalent, whereas the last biases (5.2.2) to give results for the optimal policies over a wider part of the policy space than is available from a second-order approximation to (5.2.1). Whether the final option is used will depend on the amount of prior information available on Z^d or on J(δZ).

The marginal measure of local relative risk aversion2 for Z_i, holding other variables constant, would be:

-Z_i(∂^2J/∂Z_i^2)(∂J/∂Z_i)^{-1}   (5.2.3)

so that the quadratic approximation already embodies a measure of risk aversion through the choice of the desired values Z^d, even though this might conflict with the need to ensure that choices are always being made on the falling portion of the loss function. Taking the quadratic objective function separately, relative risk aversion is:

-Z_i(δZ_i)^{-1}   (5.2.4)

so that relative risk aversion increases as we approach the unconstrained optimum from one side and diminishes as we approach it from the other. Thus procedures for incorporating risk aversion into policy-making are closely tied up with the specification of the objective function. If we introduce a risk premium, which for many cases (Newbery and Stiglitz, 1981) is approximately equal to one half the variance times the coefficient of absolute risk aversion, then we can link the variance-minimising policies of chapter 4 with standard measures of risk aversion. Thus the size of the risk premium depends on the shape of the preference function and the variance of outcomes. Note that in this section we have included both targets and policy instruments, so the question of the variance of the policy-instrument settings is also relevant to the analysis of risk. The risk premium is a measure of the cost of risk in terms of a fall in expected utility.

This brings us to our second point about the analysis of risk in macroeconomic policy-making. The standard analysis of risk in decision-making involves questions about how individuals and firms respond to risk. Macroeconomic policy may involve risk for policy-makers because of the chances of being re-elected but, if the policy-maker's objective function is identified with a social-welfare function, then the policy-maker may be averse to overall non-diversifiable market risk.
For example, in the capital-asset pricing model (Sharpe, 1964; Lintner, 1965; Mossin, 1966) the cost of capital used to discount a stream of expected cash flows for a particular firm will reflect both the unique risk associated with this stream and overall market risk. For the case where the expected returns of an individual firm are perfectly
correlated with expected market returns, the cost of capital is the same as the average market cost of capital, and in turn this cost of capital will reflect non-diversifiable market risk relative to assets with a safe return. Individual lenders and borrowers can reduce the risk attached to investing in fixed and financial assets by diversifying. But the minimum risk is the average market risk, and this in turn is determined by macroeconomic factors inducing unpredictable and stochastic cyclical movements in the economy. If economic policy-makers can moderate cyclical movements then it is possible that market risk will be reduced and the average cost of capital will fall. This may have beneficial supply-side effects by increasing aggregate investment. Individuals can reduce overall risk by diversification both between safe and risky assets and between alternative risky assets which have negative covariances. In most cases macroeconomic policy-makers cannot reduce policy risks by diversification, although it is possible to have circumstances in which small countries might adopt strategies - such as overseas fixed and portfolio investment - to spread the risks of an internal stabilisation policy.
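The approximation noted above - a risk premium of roughly one half the variance times the coefficient of absolute risk aversion - can be checked in a case where it holds exactly: constant absolute risk aversion combined with a normally distributed payoff. The utility function and all the numbers below are invented for illustration.

```python
import math

# CARA utility u(w) = -exp(-a*w); payoff w ~ N(mu, sigma2). Expected utility
# follows from the normal moment-generating function, so the certainty
# equivalent and the risk premium can be computed in closed form.
a, mu, sigma2 = 0.5, 10.0, 4.0
Eu = -math.exp(-a * mu + 0.5 * a**2 * sigma2)
ce = -math.log(-Eu) / a          # sure payoff yielding the same utility
premium = mu - ce                # the cost of risk in units of the payoff
```

Here the premium equals exactly one half the variance times the absolute risk-aversion coefficient; for other utility functions or distributions it is only the leading-order approximation cited in the text.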
5.3 RESPECIFYING THE PREFERENCE FUNCTION
We first review the question of how to specify relative priorities for the targets and instruments in a multiperiod problem, when the policy-makers' 'true' collective preferences are not known. This will be done for a fixed information set, but later we shall consider the respecification induced by the policy-makers' risk sensitivity. Respecifications, for given information, can be carried out using an algorithm due to Rustem et al. (1979) and later extended by Ancot et al. (1982) and Hughes Hallett and Rees (1983). It is based on a variable metric algorithm which uses a 'downhill' property, i.e. J[δZ^{(s+1)}] < J[δZ^{(s)}], to generate an improved policy vector Z^{(s+1)} from a trial solution Z^{(s)}, where s is an iteration counter. At convergence, the true minimiser Z*, and a quadratic approximation to J(δZ) at Z*, will be obtained without needing an explicit specification for J(δZ). Let

d^{(s)} = Z^{(s+1)} - Z^{(s)}   (5.3.1)

and

γ^{(s)} = (∂J/∂Z)_{Z^{(s+1)}} - (∂J/∂Z)_{Z^{(s)}}   (5.3.2)

Then a symmetric rank-one algorithm computes

Q^{(s+1)} = Q^{(s)} + (γ^{(s)} - Q^{(s)}d^{(s)})(γ^{(s)} - Q^{(s)}d^{(s)})'/[(γ^{(s)} - Q^{(s)}d^{(s)})'d^{(s)}]   (5.3.3)
Consider each member of the sequence {Z^{(s)}} to be the constrained optimum of (5.2.1) corresponding to each member of the possibility set {Q^{(s)}} for Q. Substituting for Q^{(s+1)} by (5.3.3) into the expression for the optimal values of Z,3 we obtain the revised optimum δZ^{(s+1)} as a function of δZ^{(s)} and the rank-one correction (5.3.4), where P^{(s)} = I - Q^{(s)}H'[HQ^{(s)}H']^{-1}H and H = (I : -R). The difficulty with this is that it assumes that the policy-makers know J(δZ), at least as far as the downhill property and gradient evaluations are concerned. We can remove this difficulty if policy-makers can supply a feasible or infeasible policy vector which they prefer (in the scale of the true preferences J(δZ)) to some arbitrary feasible choice Z^{(s)}. We may assume that this will be the case, for it only implies that, given enough time, experimentation and introspection, policy-makers will eventually be able to make a choice. To deny this assumption is to deny the existence of a collective-preference ordering, the existence of J(δZ) or the possibility that some defensible choice (with respect to Z^d) can emerge. Thus let Z^p (≠ Z^d) be any set of points preferred to every feasible choice considered so far: J(Z^p - Z^d) < J[Z^{(i)} - Z^d] for i = 1, ..., s. Now suppose that policy-makers pick a sequence of Z^p vectors in order to sample the Pareto-optimal frontier of the feasible set. Each Z^p is one in a series of experiments to locate which feasible paths are preferable to those evaluated so far. The definition of the permissible Z^p values for successive experiments implies new candidates Z^{(s+1)} which move steadily downhill in terms of J(δZ) and, as a consequence of convexity, systematically towards the constrained optimum of J(δZ) - i.e. towards policies which, after sampling the entire frontier of the feasible set, the policy-makers would have judged to be the most acceptable. Thus Z^{(s+1)} must satisfy4

||Z^{(s+1)} - Z^p|| < ||Z^{(s)} - Z^p||   (5.3.5)
in order to ensure the 'downhill' property. The gradient-evaluation difficulty is overcome by post-multiplying (5.3.3) by d^{(s)} and setting:

γ^{(s)} = φ_s Q^{(s)}d^{(s)}   (5.3.6)

where φ_s is a scalar to be chosen. Inserting φ_s into (5.3.3) and (5.3.4) produces:

Q^{(s+1)} = Q^{(s)} + (φ_s - 1)[Q^{(s)}d^{(s)}d^{(s)'}Q^{(s)}]/[d^{(s)'}Q^{(s)}d^{(s)}]   (5.3.7)

and

δZ^{(s+1)} = δZ^{(s)} + α_s[I - P^{(s)}]d^{(s)}   (5.3.8)
where α_s is a scalar step-length determined by φ_s and [I - P^{(s)}]d^{(s)} is the component of d^{(s)} consistent with the model constraint.
The downhill property and gradient evaluations can therefore be introduced without explicit knowledge of the contours of J(δZ). The step length, φ_s, is chosen subject only to (5.3.5). It is therefore convenient, and efficient, to use (5.3.7) and (5.3.8) to cover an arbitrarily chosen fraction of the distance to Z^p itself. Thus (5.3.1) is replaced by:

d^{(s)} = Z^p - Z^{(s)}   (5.3.9)

which avoids solving (5.3.7) and (5.3.8) simultaneously for Z^{(s+1)}. The fact that d^{(s)} contains Z^p in place of Z^{(s+1)} introduces an approximation to the true symmetric rank-one algorithm. However, the resulting differences between the direction vectors (and hence the iterates) in (5.3.8) and the true algorithm vanish at convergence. And convergence is guaranteed provided φ_s is chosen within a suitable non-empty interval (Hughes Hallett and Ancot, 1982). Hence the algorithm now reconstructs the steps of a variable metric optimisation algorithm endowed with the 'downhill' property but without accurate line searches at each step. The policy search proceeds from an arbitrary Q^{(1)} and the optimal Z^{(1)} associated with it, and sets suitable φ_s values to control the step lengths.
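The rank-one respecification of the priorities can be sketched in a few lines. The trial matrix Q, the direction d and the value of φ below are invented for illustration; the point of the sketch is that the update in (5.3.7) preserves symmetry and reproduces the surrogate gradient condition γ = φQd of (5.3.6) without any derivative of J being evaluated.

```python
import numpy as np

def respecify_Q(Q, d, phi):
    # Rank-one update (5.3.7): gradient differences are replaced by
    # gamma = phi * Q d, so no derivatives of J are ever computed
    Qd = Q @ d
    return Q + (phi - 1.0) * np.outer(Qd, Qd) / (d @ Qd)

Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative trial priorities
d = np.array([1.0, -1.0])                # step towards a preferred point Z^p
Qn = respecify_Q(Q, d, phi=0.8)
```

The updated matrix satisfies Qn d = φ Q d exactly, which is the secant-type condition the algorithm relies on at each step.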
5.4 RISK-SENSITIVE PRIORITIES: COMPUTABLE VON NEUMANN-MORGENSTERN UTILITY FUNCTIONS
In section 5.3 we considered ways in which approximate priorities concerning the decision variables could be respecified systematically to give a more accurate evaluation of the underlying preference function. This was necessary because we are using quadratic approximations to the 'true' preference structure, and because that preference structure is generally imperfectly known. In chapter 4 we were dealing with exactly such a case when deriving the risk-sensitive decisions. We argued there that, even if the specified quadratic objective function represents the policy-maker's utilities in a deterministic setting, some modifications would be necessary in a stochastic case to construct the von Neumann-Morgenstern utility function of a risk-averse decision-maker since the original specification would be risk-neutral
under the usual expected-utility criterion. Although one might try to respecify the original priorities to achieve a von Neumann-Morgenstern utility function by means of the algorithm in section 5.3, it would be hard to do so since it is much easier to trade off expected values (and thus set Z^p) than it is to trade off ranges of values with associated probabilities for each of the variables. Fortunately this would not be necessary. The rest of this chapter is concerned with some important cases in which respecification is necessary but does not require (5.3.7) and (5.3.8) or explicit Z^p values.

The von Neumann-Morgenstern axioms of choice ensure the existence of an objective function which combines risk-aversion and preference elements, such that optimising its expectation produces decisions consistent with the decision-maker's true tastes. But this objective function is not generally identical to the expectation of preference components on their own, specified independently of the distribution functions of the underlying random variables (Machina, 1982). Both the mean-variance criterion (4.6.1) and the explicit trade-off of security against ambition in (4.6.3) lead to the risk-sensitive decision which replaces E_t(J) by:

E_t(J_0) = E_t{1/2 δZ'[αQ_0 + (1 - α)Q]δZ + [αq_0' + (1 - α)q']δZ}   (5.4.1)
as the criterion to be minimised subject to the model (3.4.1). Moreover, since applying the certainty-equivalence theorem to E_t(J) alone cannot produce risk-sensitive decisions, whereas optimising a combination of the first two moments of the same performance index can do so, J_0 must represent some utility function consistent with a risk-averse decision-maker's tastes.5 The resulting decision reacts to the form as well as to the degree of uncertainty involved.

In the last chapter (section 4.6.3) we pointed out that (5.4.1) implies that the risk-sensitive decision can be computed as reparameterised certainty-equivalent decisions (3.4.4) in which Q is replaced by αQ_0 + (1 - α)Q, where

Q_0 = [Q* 0; 0 0]

and q by αq_0 + (1 - α)q, where q_0 is the corresponding vector with zeros in the instrument positions. The respecification needed to produce this approximation to the von Neumann-Morgenstern utility function is automatic (without need of the specification algorithm), once the value of α is set.

Thus, risk-sensitive decisions handle high-risk targets (those with large variances in ψ) by raising the relative weight on deviations from their ideal values, while lowering the penalties on the low-risk targets, and by removing the penalties on instruments altogether. The increases in target penalties are further raised for important targets (i.e. where Q* elements
exceed unity, taking unity as the normalisation in Q) but lowered for unimportant targets (i.e. where Q* elements are less than unity). Thus, unless a target was relatively unimportant in the original objective (5.2.2), the larger the uncertainty about a target's realisation the greater the effort made to track its ideal value by the risk-sensitive decisions. This concentration on high-risk variables constitutes a simplification of the optimal strategy, in which low-risk targets and the instruments are permitted to fluctuate more freely in order to secure smaller expected deviations in the high-risk targets.6 This result is, of course, precisely opposite to the optimal response to increasing uncertainty over policy impacts (of which there are none here), which calls for smaller rather than larger interventions in terms of x* - x^d (see Brainard, 1967; Hughes Hallett and Rees, 1983, p. 308; and sections 4.2 and 4.3 above).

These risk-sensitive decisions also allow for the possibility that random shocks are more likely to cause target realisations to be on one side of their planned values than on the other; the unconstrained optimum is adjusted in the opposite direction. For simplicity, suppose q_1 = 0. The transformed preferences then show that risk-averse decisions aim for 'ideal' values7 Y^d - (ψ_b + s*), which implies preferences asymmetric with respect to Y^d itself. Roughly, if some y_i is positively skewed then its implicit ideal value is shifted negatively. Each positive 'failure' y_i - y_i^d (i = 1, ..., pT) therefore attracts larger unit penalties, while those on negative failures are reduced. If α = 1, ∂J_0/∂Y = Q*(ψ_b + s*) at Y^d. Where ∂J_0/∂Y has positive elements, the preference surface has become sharply curved for the associated targets and, falling short of the ideal, will be penalised at a lower rate than overshooting. But where ∂J_0/∂Y has negative elements, the preference surface has a shallow curvature and undershooting is penalised more than overshooting.
Thus the asymmetries bias the decisions on the safe side. But their impacts are also scaled down for high-priority targets and for those where the skew is small relative to the variance. Risk-sensitive decisions therefore respond cautiously to both high and asymmetric risks. They embody both types of asymmetry which Waud (1976) demonstrated to be an intrinsic part of risk-averse decision-making - namely that asymmetric penalties should be assigned to positive and negative deviations from the ideal target values, and that decisions should also allow for any asymmetrically distributed probabilities of those deviations.

The classic difficulty of introducing formal methods into economic decision-making is the question of how to specify a utility function to reflect the policy-maker's aspirations and needs, and this of course includes the specification of suitable risk-aversion parameter(s). The advantage of the approach in this book is that specification can proceed in two stages. We first set relative priorities on reaching each variable's ideal or desired value for the case of no uncertainty. This establishes trade-offs among targets and between targets and instruments. Then we can specify how those priorities should be
modified to allow for the varying probabilities that the target realisations will actually differ from their ideal values by different amounts than were originally planned. Thus the two functions of the optimisation criterion (to trade off predictable policy 'failures' due to infeasibility and to trade off the random components of policy 'failures') are parameterised separately. Thereafter the criterion may be reduced to a single function of the decision variables, and elaborated or simplified as required. Indeed, one would expect that a correctly constructed certainty-equivalence criterion should reduce to reformulating the parameters of the underlying objective function in order to retain a measure of aversion to variance, and this is exactly what this two-stage approach does.
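The automatic reparameterisation described in section 5.4 can be sketched numerically. The priority matrices below are invented: Q carries the original penalties on two targets and one instrument, while Q_0 penalises targets only, with a heavier weight on the (hypothetically) high-variance first target and none on the instrument, following the block structure given in the text.

```python
import numpy as np

alpha = 0.6                      # risk-sensitivity weight (illustrative)
# Original priorities: two targets plus one instrument (last element)
Q = np.diag([1.0, 2.0, 0.5])
# Risk-sensitive component Q_0: penalties on targets only, raised for the
# high-variance first target, zero on the instrument
Q0 = np.diag([3.0, 2.0, 0.0])
Q_blend = alpha * Q0 + (1.0 - alpha) * Q
```

The blended matrix raises the relative weight on the high-risk target and scales down the instrument penalty, which is exactly the pattern of simplification discussed above.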
5.5 RISK-BEARING IN ECONOMIC THEORY
The von Neumann-Morgenstern axioms of choice are the conventional starting-point for studying the management of risk in economic theory. These axioms define the conditions under which decisions in the face of risk can be characterised as maximising the expected utility of the possible outcomes. The possible outcomes here are indicated by the variability of the decision criterion. An expected-utility approach therefore involves modifying J before taking expectations, in order to associate utilities with the risky outcomes (Machina, 1982). The standard modification has been to adjust the targets for a risk 'premium' until the decision-maker is indifferent between accepting the known mean of the adjusted variables and minimising E_t(J) directly. In fact explicit von Neumann-Morgenstern utility functions can be avoided altogether by introducing risk 'premiums' in the multivariate generalisation of the standard Arrow-Pratt analysis of decisions under risk aversion, due to Duncan (1977).8 We now show that this standard analysis produces decisions which are a special case of the risk-sensitive decisions studied in chapter 4.

For our problem, the risk-premium vector π is defined by J[E(Y) - π, X] = E(J). Taking second-order approximations about E(δZ) on both sides implies:

J[E(δZ)] - ω'π = J[E(δZ)] + 1/2 Σ_i Σ_j J_ij(.)ψ_ij   (5.5.1)

where J_i(.) = ∂J/∂Z_i, J_ij(.) = ∂^2J/∂Z_i∂Z_j, and ψ_ij is a typical element of ψ. Neglecting the third moments and using ∂^2J/∂Y∂Y' = Q*, a solution to (5.5.1) is π ≈ -1/2(ω)^+ tr(Q*ψ), where ω = (J_1, ..., J_pT)' and '+' denotes a generalised inverse. Duncan's 'natural' estimate of π is π = -1/2 dg(Pψ) for P = [diag(ω)]^{-1} ∂^2J/∂Y∂Y'. The operator dg(.) is the inverse of
diagonalisation - it places the main diagonal of a matrix into a column vector. The risk-aversion matrix is P = D^{-1}Q*, with D = diag[(I : 0)(QδZ + q)] when (5.2.2) is used, or D = diag(Q*δY + q_1) when the objective function (4.5.1) is used. Thus the risk-premium vector depends on the decision values, while the decision values in turn depend on the risk-premium vector.

Suppose we wish to make some certainty-equivalent decision (say X_0, with expected outcomes of Y_0) sensitive to risk. Evaluating the multivariate risk-premium measure at (Y_0, X_0) yields:

π_0 = -1/2 dg(P_0 ψ)   (5.5.2)

because inserting δZ_0 into the formula given for P defines dg(D_0) = Q*δY_0 for the situations where (4.5.1) applies. But (5.5.2) can then be rewritten as π_0 = -1/2{[diag(Q*δY_0)]^{-1} dg(Q*ψ)}, which implies that the optimal decision corresponding to the degree of risk aversion π_0 should in fact have used

q_1^π = -Q*δY_0 - 1/2(diag π_0)^{-1} dg(Q*ψ)   (5.5.3)

in place of the value which was actually used to generate δX_0 and δY_0. Now (5.5.3) has implicit ideal values Z^d - Q^{-1}q, so it may be written, for any Z^{(s)}, as:

q_1^{(s+1)} = q_1^{(s)} - [Q*δY^{(s)} + 1/2(diag π_s)^{-1} dg(Q*ψ)]   (5.5.4)
where π_s is given by (5.5.2) evaluated at Z^{(s)}. The convergence point of the two-part iteration consisting of (5.5.4), together with

δY^{(s+1)} = RδX^{(s+1)} + E(b), with δX^{(s+1)} the certainty-equivalent decision computed from (3.4.4) using q_1^{(s+1)}   (5.5.5)

would, if attained, supply optimal Arrow-Pratt-type decisions. The modifications to the preference structure here are restricted to the target components since the risk premium applies only to the targets. These Arrow-Pratt decisions are special cases of the risk-sensitive decisions X** and X° of chapter 4 in that only the linear, and not the quadratic, part of the preference structure in (5.2.2) is modified. Moreover, the third-order moments, which appear in s* of (4.5.3) and (4.6.2), are missing altogether from (5.5.5).

In order to clarify the role of the risk premium in optimal decisions, suppose the iteration (5.5.5) converges to a value q*' = (q_1*', q_2*'). The optimal Arrow-Pratt decisions then correspond to solving:

min_X (1/2 δZ'QδZ + q*'δZ | δY = RδX + b, Ω_t)   (5.5.6)

which has implicit ideal values Z^d - Q^{-1}q*. Once again let Z^d be genuine ideal values, so that q = 0. The risk premium then has the effect of shifting the 'point
of aim' of the optimisation process by −Q*⁻¹q₁*, while the relative priorities remain unchanged. The penalties assigned to unit target failures of either sign about the elements of Z^d are no longer equal. In fact ∂J/∂Y = q₁* at Y^d, so once again the signs in q₁* determine the form of asymmetry about Y^d. But these asymmetries apply to the target variables only, and lead to only one of the two types of risk asymmetry described by Waud (1976). That is to say, the risk-premium approach implies an optimisation problem in which penalties on potential outcomes on the undesirable side of Y^d are larger than those which apply to equal deviations on the other side. But no allowance is made for the second type of asymmetry, i.e. that a given deviation may be more probable on one side than on the other. Rather, equal deviations of either sign are assumed to have equal probability.
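The effect described here is easy to verify numerically. The sketch below is illustrative only (the matrices Q*, q₁ and the ideal values are made up, not taken from the text): it minimises J = (1/2)(Y − Y^d)'Q*(Y − Y^d) + q₁'(Y − Y^d) and confirms that the point of aim shifts by −Q*⁻¹q₁ while the quadratic priorities are untouched, and that equal deviations of opposite sign about Y^d are no longer penalised equally.

```python
import numpy as np

# Illustrative (made-up) priorities and risk-premium linear term
Qstar = np.array([[2.0, 0.3],
                  [0.3, 1.0]])        # target priorities (Hessian of J)
q1 = np.array([0.5, -0.2])           # linear term induced by the risk premium
Yd = np.array([1.0, 2.0])            # genuine ideal values

def J(Y):
    d = Y - Yd
    return 0.5 * d @ Qstar @ d + q1 @ d

# Gradient condition Q*(Y - Yd) + q1 = 0 gives the minimiser Yd - Q*^{-1} q1:
Y_min = Yd - np.linalg.solve(Qstar, q1)
shift = Y_min - Yd                   # the 'point of aim' shift, -Q*^{-1} q1

# Penalties about Yd are now asymmetric: dJ/dY = q1 at Yd, so equal
# deviations h and -h about Yd no longer cost the same (difference = 2 q1'h).
h = np.array([0.1, 0.1])
asym = J(Yd + h) - J(Yd - h)
```

The quadratic part of J, and hence the relative priorities, is the same whatever q₁ is; only the location of the minimum and the sign pattern of the marginal penalties at Y^d change.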
5.6 THE UTILITY-ASSOCIATION APPROACH TO RISK-BEARING
Several authors have favoured an alternative approach to risk-bearing. Given (5.2.2), they use a standard utility function u(J), defined over the range of J, to associate utility values with the stochastic outcomes of the variables whose values are to be optimised (Malinvaud, 1972; Peleg and Yaari, 1975; Johansen, 1980). Taking a second-order approximation about E(J) yields the conventional criterion (Newberry and Stiglitz, 1981, p. 73). Setting β = α/(1 − α) leads to our mean-variance criterion (4.6.1) when J is also normally distributed (Newberry and Stiglitz, 1981). Another special case is Johansen's (1980) suggestion of an exponential utility function of J. These two suggestions are too specialised to be useful, because of the particular distribution necessary for the quadratic form of the random variables to be distributed normally. However, Whittle (1982) and Van der Ploeg (1984) note that (4.6.1) also follows from u = β⁻¹(e^{βJ} − 1) or from u = β⁻¹ ln E(e^{βJ}), when βV(J) is small, since in either case E(u) ≈ αE(J) + (β/2)V(J). But whether βV(J) will take on small values is a property dependent on the parameters of the particular problem and on the choice of X (since a risk-sensitive decision-maker will not generally pick small values of β). Hence this interpretation cannot be invoked in advance of selecting the value of X. And even then the minimisation of E(u) still requires normally and independently distributed random errors, plus the imposition of stochastic independence on components of the constrained objective function partitioned with respect to t (i.e. no intertemporal risk components). Of those two assumptions, normality may not be plausible in many economic applications; and stochastic separability is certainly not valid in finite-horizon problems (see Hughes Hallett, 1984a) without imposing the additional restriction of non-anticipation on the decision-maker, and therefore clashing with recent economic theory, which stresses the leading role of endogenous expectations.
5 RISK AVERSION, PRIORITIES AND ACHIEVEMENTS
On the other hand, the expected values of these two exponential utility functions represent the moment-generating and cumulant-generating functions of J respectively. To maximise those expectations is therefore to optimise a particular combination of all moments of the distribution of J. Thus the exponential approach and the general utility-function approach of this chapter are in some respects complementary. The former allows all the stochastic characteristics of J to be optimised under restrictive assumptions on the form of utility function, on the distribution of the underlying random variables, and on the information content of the decisions. The latter, meanwhile, has

E[λu(J)] ≈ λ{u[E(J)] + (1/2)(d²u/dJ²)V(J)}    (5.6.1)
where λ is an arbitrary normalisation constant. This approximation is a special case of the mean-variance criterion αE(J) + (β/2)V(J) advocated in chapter 4, when E(J) = λu[E(J)]. The latter condition is achieved (without loss of generality) by setting λ = E(J)/u[E(J)]. Moreover, maximising (5.6.1) is equivalent to minimising the mean-variance criterion at (4.6.1), provided that λ < 0 and that λ(d²u/dJ²) = α/(1 − α), when evaluated at E(J), defines the value of α. The coefficient of relative risk aversion is ρ = −αV(J)/[(1 − α)E(J)²], so the degree of risk aversion is controlled by the value set for α. The unusual step of introducing λ < 0, so that the utility index is always negative, preserves the notion of utility being desirable and therefore to be maximised, while J represents a loss function to be minimised. A sign change is necessary to associate higher utilities with increasing welfare (−J) and lower values of J. The usual concavity property will then hold, together with marginal utilities which increase with welfare, only if 0 < α < 1. Such restrictions on α might have been inferred from (4.6.1) and (4.6.2), and they ensure positive first but negative second derivatives (with respect to −J) of the underlying utility function. The replacement of u(.) by a linear function in the first component of (5.6.1) defines it to be risk-neutral with respect to the variability in the possible outcomes of J. Risk neutrality holds if the stochastic objective J satisfies E[u(J)] = u[E(J)]; in other words, where the risk premium tends to zero, so that the certainty-equivalence theorem may be applied. Therefore the replacement of u[E(J)] by E(J)/λ is neither arbitrary nor just to force (4.6.1) to have a utility interpretation. In fact (5.6.1) represents a second-order expansion of the expectation of a standard utility function about its risk-neutral component.
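The expansion behind (5.6.1) is easy to check by simulation. The sketch below is illustrative (an exponential utility index and a normal J; none of these numbers come from the text): it compares a Monte Carlo estimate of E[u(J)] with the second-order approximation u[E(J)] + (1/2)u''[E(J)]V(J), which is accurate when the variance of J is small.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, mu, sigma = 0.1, 2.0, 0.3      # assumed curvature, and J ~ N(mu, sigma^2)

u = lambda j: -np.exp(beta * j)      # concave index of the loss J (u'' < 0)
u2 = -beta**2 * np.exp(beta * mu)    # u''(J) evaluated at E(J)

J = rng.normal(mu, sigma, 200_000)
mc = u(J).mean()                                  # Monte Carlo E[u(J)]
approx = u(mu) + 0.5 * u2 * sigma**2              # second-order expansion about E(J)
```

For this choice of u and normal J the exact value is E[u(J)] = −exp(βμ + β²σ²/2), so the quality of the approximation can be checked directly.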
Certain specialised forms of utility function, combined with the assumption of normally distributed random variables, will produce similar risk-sensitive decision rules directly - for example, the exponential utility function u = −λe^{−βJ}, with a constant coefficient of absolute risk aversion. The approach of this chapter, by contrast, employs a distribution-free method and incorporates the intertemporal risk-aversion
components of an implicit utility function to optimise a combination of the first two moments of J. Until more powerful optimisation techniques can combine the two, the question of which of these two generalisations is more important is a matter of particular cases. The point here is that the risk-sensitive decisions of (4.6.2) are consistent with the conventional measures of risk aversion; and they can also be derived in a distribution-free form with due allowance for the anticipations generated by intertemporal risks. Finally, the utility interpretation for u₀ defined in (5.4.1) is established by the fact that E_t(u₀) approximates the expectation of a second-order expansion of any standard utility function, defined on J and thus on Z, about its risk-neutral component. In other words, u₀ constitutes a second-order approximation to a von Neumann-Morgenstern utility function, and E_t(u₀) is a particular 'generalised expected-utility' function of the type analysed by Machina (1982). The advantages of using a second-order approximation to von Neumann-Morgenstern utility functions are: (i) linear and distribution-free risk-sensitive decision rules; (ii) the ability to accommodate anticipations via intertemporal risk aversion in the decision structure; (iii) the ability to allow for asymmetric risks; and (iv) the analytical and computational convenience of being able to treat the problem as one of certainty equivalence applied to a reparameterised version of the original preference function.
5.7 RISK MANAGEMENT IN STATISTICAL DECISIONS
In the statistical literature the problem of risk is handled in a more straightforward manner. It is conventional to set up a risk function which measures the losses to be expected from implementing decisions computed with incomplete information, relative to those decisions computed with perfect information. If we take a quadratic loss function measured in the metric of C, where C is any positive definite matrix, then the appropriate risk function would be E(X* − X°)'C(X* − X°). For the present problem we have taken X* = −M⁻¹E₁(m₁) from (3.4.4) and X° = −M⁻¹m₁. Here X* represents the (ex-ante) decision rule to be implemented and X° the (ex-post) optimal decisions which would have been implemented under perfect information. The objective then is to pick X* in such a way as to minimise the expected losses, given C, due to uncertainty.

5.7.1 The regression analogy
Consider the objective (5.2.2) with q = 0. The certainty-equivalent decisions (3.4.4) can then be written as:

δX* = (R'Q*R + N*)⁻¹R'Q*E₁(b)    (5.7.1)
where S⁺ = (S'QS)⁻¹S'Q is a generalised left inverse of S' = (R' : I), with M = S'QS. Then

δZ* = (I − SM⁻¹S'Q)(E₁(b)' : 0')'    (5.7.2)

so that S⁺δZ* = 0. Since S is of full column rank, with (m + n)T > nT, this particular choice of generalised inverse minimises the norm ||Z − Z^d||² in the metric Q. This choice of inverse therefore defines what is meant by 'suitable' instrument values. More interestingly, S⁺ yields the generalised least-squares solution to fitting SδX to E₁(b) in the 'regression model':

(E₁(b)' : 0')' = SδX + δZ    (5.7.3)
(see Havenner and Craine, 1981). Now (5.2.2) specifies that Q = diag(Q* : N*),
and the instrument-penalty matrix N* gives (R'Q*R + N*)⁻¹R'Q*E₁(b), in (5.7.1), the character of a generalised ridge regression. This implies that N* can be used to reduce the mean-square errors, and also the expected losses from uncertainty, when it turns out that the realised values of b do not coincide with their previously expected values. Hence N* reduces the risks imposed by having to implement X* = X^d − M⁻¹E₁(m₁) rather than the unknown but true (ex-post) optimal values X° = X^d − M⁻¹m₁. The second-moment or mean-square-error matrix E₁(X* − X°)(X* − X°)' is smaller when certain non-zero values of N* are chosen than when N* = 0. Smaller here means that the difference between these two second-moment matrices is negative semidefinite. The decisions using such values of N* are then also superior to those using N* = 0, in that the implied quadratic loss or risk function E(φ) = E(X* − X°)'C(X* − X°) is smaller for any symmetric and positive (semi)definite matrix C (see Theobald, 1974). However, we already noted (section 4.6.2 above) that the loss due to uncertainty in period 1 is given by φ = (1/2)(X* − X°)'(R'BR + A)(X* − X°). Hence, to reduce the general risk function E₁(φ) is in particular to reduce the expected losses due to uncertainty. The risk function E(φ) has some practical advantages over the risk-aversion criterion found in economic theory. The latter is constructed from some implicit utility function, which leads to difficulties in specifying an explicit risk-averse criterion when it comes to calculating optimal decisions numerically. The advantage of the present approach is that, for an arbitrary risk function, it yields a straightforward way of determining the required objective function, and consequently the optimal decision values, explicitly. This is achieved because picking a strategy which reduces the mean-square
errors associated with the instrument values (lowering the risk of instruments being 'wrong') simultaneously reduces the expected losses associated with the corresponding random target realisations, for any utility scheme defined by the matrix C.

5.7.2 The optimal risk-sensitivity matrix
At this point it is helpful to introduce a canonical form of the model:

δY = Pξ + b    (5.7.4)

where P = RG and ξ = G'δX. Here G is a matrix of eigenvectors of R'Q*R, and Λ is a diagonal matrix of its eigenvalues, such that G'G = GG' = I and R'Q*R = GΛG'. The benchmark optimal decisions - against which reductions in mean-square errors and expected losses would be measured - minimise (5.1.2) with N* = 0, subject to (5.7.4):

ξ* = Λ⁻¹G'R'Q*E₁(b)    (5.7.5)

This rule yields the lowest achievable expected value for ||δZ||²_Q and therefore represents the most ambitious strategy available, given a free hand with the instruments and no allowance for risk. The corresponding ex-post optimal decision would be:

ξ° = Λ⁻¹G'R'Q*b    (5.7.6)

which has mean E₁(ξ°) = ξ* and covariance matrix V₁(ξ°) = Λ⁻¹θΛ⁻¹, where θ = G'R'Q*ψQ*RG is a known positive-definite and symmetric matrix, and where ψ is the conditional covariance of the targets. Introducing ridge factors into (5.7.5) and (5.7.6) implies:

ξ* = (Λ + D)⁻¹G'R'Q*E₁(b)    (5.7.7)
and

ξ° = (Λ + D)⁻¹G'R'Q*b    (5.7.8)

respectively, where D is a diagonal matrix. The 'bias' in (5.7.7) is:

ξ* − E₁(ξ°) = [(Λ + D)⁻¹ − Λ⁻¹]β    (5.7.9)

where β = G'R'Q*E₁(b) is a known vector. The covariance matrix of ξ° about ξ* is:

(Λ + D)⁻¹θ(Λ + D)⁻¹    (5.7.10)

The 'ridge-regression' decisions in (5.7.7) are identical to those at (5.7.1) above when N* = GDG'. In this context, 'unbiased' means that the implemented decisions will, on average, coincide exactly with what will appear ex post to have been the optimal decisions. And minimum variance
implies that the implemented decisions are such that, given the information available at implementation, the corresponding ex-post decisions are distributed most tightly about them. Combining these two properties, to minimise the mean-square error of ξ* about its ex-post optimal value ξ° requires D* = diag(d_i*), where

d_i* = λ_i θ_ii / β_i² > 0,  i = 1, ..., nT    (5.7.11)
The corresponding value of N* = GD*G' minimises the mean-square error of X* about X°, and also minimises the risk function E₁(φ) implied by having to implement X* before the realisations of the stochastic variables can be known. Hence this value of N* minimises E₁(φ), the expected losses due to uncertainty, at the same time.

5.7.3 A simplification

One obvious way of choosing a 'good', if suboptimal, value for N* is to set d_i = λ_i in (5.7.11), or equivalently:

N* = R'Q*R    (5.7.12)
This value has some practical advantages (see Hughes Hallett, 1985).
(i) It supplies good (if suboptimal) values for the decisions X_i which suffer significant, but not very large, risks - that is, where the coefficients of variation are around unity. It also specifies a significant degree of risk reduction (albeit at a lower level of ambition) for the high-risk decisions in X.
(ii) It is very simple, and implies X** = X^d − (1/2)(R'Q*R)⁻¹R'Q*E₁(b), which is half the 'no risk-reduction' decisions.
(iii) The variance and mean-square error of X** fall to one quarter of their 'no risk-reduction' counterparts.
(iv) However, (5.7.12) implies a relative 'bias' of (1/2)δX_i*/X_i* in the ith instrument setting. It also implies a relative loss in the expected objective-function value of (1/4)R²/(1 − R²), where R² is given by (5.8.1) below. As might be expected, this relative loss is small where the 'no risk-reduction' decisions would anyway have been unsuccessful. But the losses will be larger where a feasible (risk-neutral) strategy exists which can satisfy the objectives successfully.
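The ridge construction of section 5.7.2, and the halving property of the simplification N* = R'Q*R, can be illustrated with a minimal numerical sketch. Everything below is made up for illustration (R, Q*, ψ and E₁(b) are arbitrary), and the sign convention follows the reconstructed (5.7.1) above.

```python
import numpy as np

rng = np.random.default_rng(1)

nT, mT = 4, 3                        # made-up dimensions (stacked targets, instruments)
R = rng.normal(size=(nT, mT))        # made-up multiplier matrix
Qstar = np.eye(nT)                   # target priorities
Eb = rng.normal(size=nT)             # expected disturbance path E1(b)
psi = np.diag([0.5, 1.0, 2.0, 1.5])  # assumed conditional covariance of the targets

# Canonical decomposition: R'Q*R = G Lambda G'
lam, G = np.linalg.eigh(R.T @ Qstar @ R)
beta = G.T @ R.T @ Qstar @ Eb                    # the known vector beta
theta = G.T @ R.T @ Qstar @ psi @ Qstar @ R @ G  # theta = G'R'Q* psi Q*RG

# Optimal ridge factors (5.7.11) and the implied risk-sensitivity matrix
d_star = lam * np.diag(theta) / beta**2
N_star = G @ np.diag(d_star) @ G.T

# Benchmark (N* = 0) decisions versus the simplification N* = R'Q*R,
# which should halve the 'no risk-reduction' decisions (point (ii))
M0 = R.T @ Qstar @ R
X_bench = np.linalg.solve(M0, R.T @ Qstar @ Eb)
X_simpl = np.linalg.solve(M0 + M0, R.T @ Qstar @ Eb)
```

Because X** is exactly half of the benchmark decision, its variance and mean-square error are one quarter of the benchmark's, as claimed in (iii).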
5.8 ACHIEVEMENT INDICES
This regression analogy may also be used to provide an index of a strategy's achievements. This involves the R² analogy: from (5.7.2), S'QδZ* = 0, so that E₁(b)'Q*E₁(b) = δZ*'QδZ* + δX*'(S'QS)δX*, or:

R² = 1 − 2J*/E₁(b)'Q*E₁(b)    (5.8.1)
The aim of policy, in this analogy, is to pick X so as to annihilate E₁(b) in (5.7.3). This task can be divided between the controlled sum of squares (δX*'S'QSδX*) and the sum of squared failures (δZ*'QδZ* = 2J*). Thus R² gives a standardised and dimensionless index of relative success, defined on the interval (0,1). It is a useful complement to J* because it allows different strategies to be compared when the lists of targets, instruments or their priorities are also varied. Notice, however, that increasing the number of targets, or the planning horizon T, is equivalent to increasing the sample size in a regression, so R² is only certain to rise with an additional instrument (or one activated from X^d).
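The decomposition behind (5.8.1) can be checked directly. In the sketch below (made-up R, Q*, N* and E₁(b); notation as in section 5.7), the residual δZ* is orthogonal to S in the metric Q, so the 'total sum of squares' E₁(b)'Q*E₁(b) splits exactly into controlled and failed components, and R² lies between 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(2)

nT, mT = 4, 2
R = rng.normal(size=(nT, mT))
Qstar = np.eye(nT)
Nstar = 0.5 * np.eye(mT)             # made-up instrument penalties
Eb = rng.normal(size=nT)

S = np.vstack([R, np.eye(mT)])       # S' = (R' : I)
Q = np.block([[Qstar, np.zeros((nT, mT))],
              [np.zeros((mT, nT)), Nstar]])
c = np.concatenate([Eb, np.zeros(mT)])   # the 'data' vector (E1(b)' : 0')'

# Generalised least-squares fit of S dX to c, as in (5.7.1)-(5.7.3)
M = S.T @ Q @ S                          # = R'Q*R + N*
dX = np.linalg.solve(M, S.T @ Q @ c)
dZ = c - S @ dX                          # residual target/instrument failures

total = Eb @ Qstar @ Eb                  # total sum of squares E1(b)'Q*E1(b)
failed = dZ @ Q @ dZ                     # = 2J*
controlled = dX @ M @ dX                 # controlled sum of squares
R2 = 1.0 - failed / total                # the achievement index (5.8.1)
```

The orthogonality S'QδZ* = 0 is what makes the cross term vanish, exactly as in an ordinary regression ANOVA decomposition.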
5.9 CONCLUSIONS: FIXED OR FLEXIBLE DECISION RULES?
If the probability distribution of the target variables is symmetric, then the economist's approach of optimising a combination of the means and variances of the target variables yields a decision rule of the form (3.4.4), where Q is replaced by αQ₀ + (1 + α)Q and Q₀ is a square matrix of order (n + m)T with top-left submatrix Q*ψQ* and zeros elsewhere. In the extreme case of α = 1, the effect of risk sensitivity is to change the target penalties. If N* = 0, this change will have exactly the same effect as the 'ridge-regression' decisions whenever Q*(ψQ* − I) is negative definite. The net effect is in both cases a relative reduction in target penalties, provided the scaled target variances are not too large. But that similarity ends when ψQ* becomes large. Thus, where the important targets are also high-risk variables, the statistician's approach, which picks robust decision rules to limit the impacts of random shocks on the objective function, will differ from the mean-variance approach, which is to pick decision rules which hold the high-risk variables closer to their ideal paths than would otherwise be the case. Shocks of a given size then cause smaller variations about those paths. These two approaches to deriving decisions which aim to reduce the impact of uncertainty in certainty-equivalent policies have quite distinct characteristics. The results show that the economist's approach is to hold high-risk variables closer to their ideal values, so that given shocks cause smaller deviations from those paths, whereas its statistical counterpart is to pick rules which are robust to deviations from the ex-post optimal path. The rules derived from economic theory tend to give stable target variables but flexible instruments, while the statistician's approach implies more flexible targets and sticky instruments for the same problem.
As a consequence the economist's rule will be more successful in terms of suppressing the effects of unanticipated shocks; it will tend to produce more vigorous (and successful) dynamic revisions to the decision sequence. The statistician's decision rule will be more successful in reducing the sensitivity of
the decisions themselves to the potential uncertainties from outside - thus reducing the need for extensive revisions. In addition to this contrast, the economist's risk sensitivity is evidently more wide-reaching and utilises more information about the stochastic processes underlying the decision problem. Not only are the priorities on targets versus instruments shifted, but the relative priorities on different targets are also adjusted according to their potential risks, and asymmetric penalties are introduced to capture the higher probability of getting random shocks on one side of the expected target values than on the other. The important conclusion here is that one must be careful to identify what the policy-maker really wants when he claims to be risk-averse. Should he aim to be in the best position to counteract random shocks as they strike the system (the economist's approach)? Or should he aim to preserve continuity in the system by reducing the sensitivity of decision-making to those shocks, and hence the expected losses due to uncertainty (the statistician's approach)? This is of course none other than the old fixed-versus-flexible-rules debate in the context of the uncertainties faced by economic planners. Conventional economic theory would evidently suggest a more flexible 'fine-tuning' approach to risk, whereas the statistician's approach stresses the importance of applying policies which are insensitive to shocks.
6 Non-linear optimal control

6.1 INTRODUCTION
So far we have been concerned with the economic-policy aspects of linear models. But the majority of econometric models used for forecasting, policy simulation and optimisation in the economic-policy process are actually non-linear. In principle this creates a number of difficulties, because the analytical results for linear systems are no longer available. Solutions to the problem of designing economic policy when the model is non-linear have taken two basic forms:
(i) a linearisation approach, by which the apparatus of linear theory is applied to a linear approximation of the model;
(ii) a direct approach, by which the problem of finding the minimum of an objective function subject to the non-linear model is treated as a non-linear constrained optimisation, or non-linear programming, problem.1
For the first approach the main problems are how to generate the linearisation and how good an approximation the linearisation is. For the second approach the main difficulties concern the type of algorithm used to find the minimum of the objective function and the kind of derivative information needed. The examination of non-linear models does not throw any fresh light upon the problems of economic policy. But since most of the models in practical use are non-linear, it is worth considering how they can be handled and how the results obtained can be related to those for linear systems. While explicit analytical solutions can be obtained for many linear optimal-control problems (with quadratic loss functions), in practice we still have to use numerical techniques to invert matrices. Thus many of the numerical techniques devised for non-linear systems will also be relevant to the solution of linear systems, especially when the linear system is large.
6.2 NON-LINEARITIES AND CERTAINTY EQUIVALENCE
In chapter 3 we were able to determine the optimal policy, both in the closed-loop dynamic-programming form and in the final form, by using certainty equivalence and replacing conditional expectations of exogenous variables by their means, since random disturbances always entered additively. Expectations could then be updated at each t to provide sequentially the closed-loop (multiperiod certainty-equivalent) solution. Any knowledge of the distribution functions of the random terms beyond the conditional mean was unnecessary.2 Now consider the stochastic non-linear model:

y_it = f_it(y_jt, y_{t−1}, ..., y_{t−s}, x_t, ..., x_{t−r}) + ε_it    (6.2.1)

for i = 1, ..., g and j ≠ i. There are a maximum of s lagged values of the endogenous variables and a maximum of r lagged values of the policy instruments; ε_it is an additive random disturbance. The exogenous variables are ignored for simplicity, but it can be supposed that uncertainty about them is subsumed in ε_t. Although the disturbances enter additively in (6.2.1), the actual effect of uncertainty on y_it is multiplicative, since the impact of ε_jt in other equations, via y_jt, is transformed by the non-linearity of f_it(.). We can no longer invoke certainty equivalence. If we write the cost or objective function as J(Y, X), where Y and X are stacked vectors, as in the notation used for the stacked final form in previous chapters, then it is easily verified that E_t[J(Y, X)] − J[E_t(Y), X] is a non-vanishing function of X, since for all i ≠ j and t ≤ k ≤ T:

E_t[f_ik(y_jk, y_{k−1}, ..., y_{k−s}, x_k, ..., x_{k−r})] ≠ f_ik[E_t(y_jk), E_t(y_{k−1}), ..., E_t(y_{k−s}), x_k, ..., x_{k−r}]
In principle, then, non-linear models such as (6.2.1) present difficulties for optimal control. If it is not legitimate to minimise E_t[J(Y, X)] by inserting conditional means for all the random variables in (6.2.1) before minimising J(Y, X) subject to (6.2.1), then we ought to compute the minimiser of E_t[J(Y, X)] directly. But the difficulty with this procedure is that any attempt to replace these means by 'certainty equivalents' faces the problem that it is impossible to know a priori what values for the random components of y_t, y_{t−1}, ..., should be inserted on the right-hand side of (6.2.1) so that the resulting constrained minimum of J(Y, X) is also the minimum of E_t[J(Y, X)]. It is therefore also impossible to know, a priori, about which values of Y and X the model should be linearised, if a local certainty equivalent is to be used to partially circumvent this difficulty. Any direct search to locate the minimum of E_t[J(Y, X)] would involve searching through all convolutions of the density
functions implied by J(Y, X) and (6.2.1) for each possible choice of X. In practice this would not be a feasible scheme, because of the non-linear transformations and integrations of higher dimensionality involved in determining the relevant density functions. The problems raised for optimal control by stochastic non-linear models have a parallel in the use of non-linear models for simulation and forecasting. A majority of model-users generate forecasts from non-linear models deterministically; that is, the random terms are ignored. This implies that forecasts or predictions will be biased. One way in which this potential bias can be overcome is to carry out Monte Carlo-type stochastic simulations in which the structural disturbances are replaced by stochastic proxies. These stochastic simulations are usually of two kinds. Either they are parametric, with stochastic proxies obtained as random draws from a (usually multivariate normal) distribution seeded by the estimated or approximated variance-covariance matrix of the model's residuals, possibly using antithetic variates to improve the efficiency of the estimates of the moments of the model's error distribution. Or, alternatively, non-parametric simulations are used (McCarthy, 1972; Brown and Mariano, 1981), in which the residuals obtained from the estimation sample period serve as stochastic proxies for the structural-disturbance terms. But despite the potential prediction bias, the majority of non-linear macroeconometric models are used in a deterministic way. There are a number of reasons for this. Repeated stochastic replications are expensive.
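The bias from solving a non-linear model deterministically is easy to demonstrate. The toy model below is invented purely for illustration (it is not from the text): a single quadratic equation y_t = 0.5y_{t−1} + 0.2y_{t−1}² + x_t + ε_t. Because E[y²] = (E[y])² + V(y), the mean of the stochastic replications drifts above the deterministic path by roughly 0.2·V(y_{t−1}) each period.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 4, 100_000
sigma = 0.5                          # assumed disturbance standard deviation

# Stochastic (Monte Carlo) replications with parametric proxies
eps = rng.normal(0.0, sigma, size=(N, T))
y = np.zeros((N, T + 1))
for t in range(1, T + 1):
    y[:, t] = 0.5 * y[:, t - 1] + 0.2 * y[:, t - 1] ** 2 + 1.0 + eps[:, t - 1]

# Deterministic solution: the random terms are simply ignored
det = np.zeros(T + 1)
for t in range(1, T + 1):
    det[t] = 0.5 * det[t - 1] + 0.2 * det[t - 1] ** 2 + 1.0

bias = y.mean(axis=0) - det          # grows as the variance accumulates
```

In period 1 the disturbance enters additively, so there is no bias; from period 2 onwards the quadratic term converts accumulated variance into a growing gap between the deterministic path and the mean of the replications.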
Moreover, a majority of the tests which have been carried out have actually found little (significant) difference between the deterministic solution and the mean of the stochastic replications.3 But a number of commentators (Brown and Mariano, 1981; Salmon and Wallis, 1982) have argued that this may not provide sufficient justification for confining model usage to deterministic solutions. First, the estimation techniques used for most large non-linear models may bias the models towards linearity. Models are very commonly estimated by ordinary least squares (OLS), and this failure to allow for simultaneity and possible non-linearities in parameters may mask non-linear characteristics. While this undoubtedly poses a problem for the specification of a model, and suggests that greater attention should be paid to it, it does not raise direct problems for non-linear optimal control, where we are obliged, at least in the first analysis, to take the model on trust. Existing approaches to non-linear optimal control either approximate the required values of the random variables and then optimise the resulting deterministic model directly - usually by pretending that certainty equivalence is approximately valid, so that conditional means can be inserted - or make a guess of the path about which the model is to be linearised and use the linear approximation as a model for which an optimal closed-loop rule is obtained. The closed-loop rule both provides a feedback control for the non-
linear model and counters possible linearisation error, including any errors in using the wrong random-variable values.

6.3 NON-LINEAR CONTROL BY LINEARISATION
The linearisation of (6.2.1) can take a variety of forms. Expanding (6.2.1) to first order around some chosen trajectory - this can be either some base forecast path or the optimal path derived by the direct methods discussed in section 6.4 - yields:

ỹ_it = Σ_{j≠i} (∂f_it/∂y_jt)° ỹ_jt + Σ_{k=1..s} (∂f_it/∂y'_{t−k})° ỹ_{t−k} + Σ_{k=0..r} (∂f_it/∂x'_{t−k})° x̃_{t−k}    (6.3.1)

for i = 1, ..., g, t = 1, ..., T. The superscript o denotes the trajectory about which the non-linear model is being linearised, and the perturbations are:

ỹ_t = y_t − y_t°,  x̃_t = x_t − x_t°

Equation (6.3.1) can be written in the time-varying vector-polynomial form:

A°(L)ỹ_t + B°(L)x̃_t = 0    (6.3.2)

for t = 1, ..., T. This provides a time-varying linear approximation to a constant-parameter, non-linear econometric model. L is the lag operator:

A°(L) = Σ_{j=0..s} A°_j L^j;  B°(L) = Σ_{j=0..r} B°_j L^j

and the (p, k) elements of A°_j and B°_j are given by the partial derivatives of f_pt with respect to y_{k,t−j} and x_{k,t−j} respectively, evaluated along the nominal trajectory, for p = 1, ..., g and k = 1, ..., m. Approximating (6.3.1) further by a constant-coefficient model:

A(L)ỹ_t + B(L)x̃_t = 0    (6.3.3)
we can write the approximate final polynomial form of the non-linear model, for A(L) invertible:

ỹ_t = −[A(L)]⁻¹B(L)x̃_t = Γ(L)x̃_t    (6.3.4)

If there are no common-factor cancellations, each element of Γ(L) is a rational function with the common denominator polynomial det[A(L)]. Linearisations of non-linear macroeconomic models are primarily concerned with obtaining Γ(L) from (6.2.1) (Holly et al., 1979).
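A linear approximation of this kind can also be generated numerically. The sketch below uses an invented scalar model (not from the text), y_t = 0.5y_{t−1} − 0.05y_{t−1}² + x_t, and differentiates the model solution with respect to each instrument setting by finite differences about a base trajectory. The resulting multiplier matrix is lower-triangular, as in the stacked forms of section 6.3.2, and its leading elements match the analytic derivatives.

```python
import numpy as np

a1, a2 = 0.5, -0.05                 # invented model: y_t = a1*y_{t-1} + a2*y_{t-1}^2 + x_t
T = 4

def solve(x, y0=0.0):
    """Deterministic model solution for a given instrument path x."""
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(1, T + 1):
        y[t] = a1 * y[t - 1] + a2 * y[t - 1] ** 2 + x[t - 1]
    return y[1:]

xbase = np.ones(T)                  # nominal instrument trajectory
ybase = solve(xbase)

# Dynamic multipliers dy_t/dx_tau by finite differences about the base path
h = 1e-6
Rmat = np.zeros((T, T))
for tau in range(T):
    xp = xbase.copy()
    xp[tau] += h
    Rmat[:, tau] = (solve(xp) - ybase) / h
```

The impact multiplier is 1 by construction, the next-period multiplier is the linearised autoregressive coefficient a1 + 2·a2·y evaluated on the base path, and entries above the diagonal are zero because future instrument settings cannot affect current outcomes.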
6.3.1 Linearisation by stochastic perturbation

One way by which the rational functions in Γ(L) can be identified - providing mappings from each instrument to each target - is to set the random disturbances to zero and then to perturb each instrument by a white-noise sequence. White noise has the desirable property that it meets a basic identifiability condition (Hannan, 1971), which is that the perturbations to each of the instruments should have an absolutely continuous spectrum with spectral density non-zero on a set of positive measure in (−π, π). With white noise there is a better chance of exciting all of the dynamic modes of the model, which may be an important consideration in non-linear models (Holly, Zarrop and Westcott, 1978; Zarrop et al., 1979a; Zarrop, 1981). In the control literature inputs of this kind are termed 'persistently exciting', a property which is the precise analogue of the absence of multicollinearity in econometrics. Formally:

E(x̃_ik) = 0;  E(x̃_ik x̃_jt) = λ_i δ_kt δ_ij    (6.3.5)
where E(.), as before, denotes an expectation, δ_ij is the Kronecker delta and λ_i determines the size of the perturbation of the ith policy instrument. The form of (6.3.5) ensures (Cooper and Fischer, 1972) that there is no collinearity between the policy instruments. Moreover, this orthogonality condition simplifies the estimation of the relationship between each target and all the policy instruments. As is well known, orthogonality among the explanatory variables in a linear model makes multiple-regression analysis unnecessary. Instead of having to estimate simultaneously m potentially very complex distributed lags with often relatively few observations, each distributed lag can be estimated separately and the principle of superposition used to construct a multiple linear equation for each target variable. We estimate a rational function for each perturbed instrument on each perturbed target, of the form:

ỹ_it = [α_ij(L)/β_ij(L)] x̃_jt + [c_ij(L)/d_ij(L)] e_it,  i = 1, ..., g; j = 1, ..., m    (6.3.6)
where α_ij, β_ij, c_ij and d_ij are polynomial functions of the lag operator L, and the first term on the right-hand side of (6.3.6) is used to approximate the corresponding rational element in Γ(L). The transfer-function form of (6.3.6) would have to be estimated by a full-information estimator, though it would be much faster and cheaper if (assuming c_ij(L) = 1) least squares could be used instead. An equation for each target on each of the policy instruments can then be constructed using superposition, i.e.:

ỹ_it = Σ_{j=1..m} [α_ij(L)/β_ij(L)] x̃_jt    (6.3.7)
for i = 1, ..., g. A model for the residuals of (6.3.6) can be constructed from (6.3.7), and then an error model:

d_i(L)w_it = c_i(L)e_it    (6.3.8)

for i = 1, ..., g, where e_it is a white-noise sequence, can be constructed for each target variable. The error model may help to counter problems arising from possible non-stationarities in the non-linear model, as well as from modelling errors in the linearisation. As it stands, the perturbation model written in vector form:

ỹ_t = V(L)x̃_t + S(L)e_t    (6.3.9)
is expressed in deviations about some nominal trajectory. But, because the objective function is usually expressed in the same levels as the nominal trajectory, we need to reintroduce the nominal trajectory into the model by defining a set of g + m 'exogenous' variables and m identities, which gives the model:

y_t = V(L)z_t + (I_g : 0_gm)d_t + S(L)e_t    (6.3.10)

where

z_t = x_t − (0_mg : I_m)d_t    (6.3.11)

Here z_t is an m-vector of identities and d_t' = (y_t°', x_t°'). Equation (6.3.10) can be treated in the normal way to obtain first-order state-space forms suitable for optimal control. A feedback rule:

x_t = K_t y_{t−1} + k_t    (6.3.12)
derived using the methods of chapter 3 can now be used in conjunction with the non-linear model.
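The perturbation-and-superposition logic of (6.3.5)-(6.3.7) can be illustrated numerically. The following sketch uses a deliberately simple hypothetical non-linear model (the equations and parameter values are invented for illustration, not taken from the text): each instrument is pulsed separately about the baseline, the implied distributed lag is recovered one instrument at a time, and superposition is then used to predict the response to a joint policy package.

```python
import numpy as np

def model(x1, x2, T):
    """Hypothetical non-linear model: one target, two instruments."""
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = 0.7 * y[t - 1] - 0.05 * y[t - 1] ** 2 + 0.5 * x1[t] + 0.2 * x2[t]
    return y[1:]

T = 20
zeros = np.zeros(T + 1)
base = model(zeros, zeros, T)

# Perturb each instrument separately, as in (6.3.5): non-overlapping pulses
d = 0.1
x1p = zeros.copy(); x1p[1] = d
x2p = zeros.copy(); x2p[1] = d
lag1 = (model(x1p, zeros, T) - base) / d     # distributed lag of target on x1
lag2 = (model(zeros, x2p, T) - base) / d     # distributed lag of target on x2

# Superposition, as in (6.3.7): predict the response to a joint policy package
x1new = zeros.copy(); x1new[1] = 0.3
x2new = zeros.copy(); x2new[1] = -0.2
pred = base + 0.3 * lag1 - 0.2 * lag2
actual = model(x1new, x2new, T)
print(np.max(np.abs(pred - actual)))         # small when non-linearity is mild
```

Because each distributed lag is estimated from its own perturbation run, no multiple regression is needed; the superposed prediction is exact for a linear model and a close approximation for mildly non-linear ones.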
6.3.2 A linearisation in stacked form
The power-series expansion of T(L) provides the matrix dynamic multipliers of the linearisation:

T(L) = Σ_{i=0}^{∞} T_i L^i    (6.3.13)

so an alternative way of generating the final form (6.3.4) is to evaluate the dynamic multipliers R_ts = ∂y_t/∂x_s′ numerically about the nominal trajectory, for each t and s. This
provides a time-varying approximation which can be stacked up to provide, in the notation of chapters 2 and 3:

Y = R*X + s°    (6.3.14)

but where now

R* =
[ R_11
  R_21  R_22
   ⋮           ⋱
  R_T1   …    R_TT ]

In a non-linear model the dynamic multipliers will vary with respect to both time and the choice of trajectory about which the multipliers are generated. If only one model solution is used to obtain the multipliers then a time-invariant approximate model is obtained, so that:

R =
[ R_1
  R_2  R_1
   ⋮        ⋱
  R_T  …   R_1 ]

which is similar to R in equation (2.3.2). Given (6.3.14) and the objective function, we can compute the optimal instrument values X_j conditional on the approximate dynamic multipliers. Substituting X_j into the constraints we can relinearise and generate a new approximation Y_j = R_j*X_j + s° and recompute a new optimum X_{j+1}. If this generates a lower cost than X_j then the model can be relinearised again and the steps repeated. Otherwise the procedure is terminated. The consequent properties of this type of repeated linearisation are examined further in section 6.4.2. An alternative procedure would involve attempting to gain a better initial approximation to the time-varying R*. Rather than carry out T(T − 1)/2 period solutions per policy instrument to determine (6.3.14), one could assume instead that the dynamic multipliers are reasonably constant over short intervals of time. Divide the optimisation interval (1, T) into shorter intervals, say two or three. The dynamic multipliers need only be computed for these intervals and then we must assume that they are constant between intervals.
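The construction of the stacked multiplier matrix by numerical differentiation about a nominal trajectory can be sketched as follows. The model is again a hypothetical scalar example invented for illustration; each column of R is obtained from one perturbed model solution:

```python
import numpy as np

def solve(x):
    """Hypothetical non-linear model solved over t = 1..T for a scalar target."""
    T = len(x)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = 0.6 * y[t - 1] + 0.4 * x[t - 1] + 0.1 * x[t - 1] ** 2
    return y[1:]

T = 6
x0 = 0.5 * np.ones(T)            # nominal instrument path
y0 = solve(x0)                   # nominal trajectory
h = 1e-4

# R[t, s] = dy_t/dx_s evaluated about the nominal trajectory
R = np.zeros((T, T))
for s in range(T):
    xp = x0.copy(); xp[s] += h
    R[:, s] = (solve(xp) - y0) / h

s0 = y0 - R @ x0                 # intercept of the stacked linear approximation
print(np.allclose(np.triu(R, 1), 0, atol=1e-6))     # lower triangular: True

xnew = x0 + 0.05 * np.random.default_rng(0).standard_normal(T)
print(np.max(np.abs(R @ xnew + s0 - solve(xnew))))  # small linearisation error
```

For a backward-looking model the computed R is lower triangular, as in (2.3.2); the linear approximation Y ≈ RX + s° is then accurate for policies near the nominal path and is the natural candidate for relinearisation.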
6.4 NON-LINEAR CONTROL AS A NON-LINEAR PROGRAMMING PROBLEM
Rather than generate a linear approximation to the non-linear model and then use established techniques for linear systems on the approximation, an alternative approach is to treat the problem of minimising a policy objective function subject to a non-linear model as a constrained optimisation or nonlinear programming problem. The literature on non-linear optimisation has boomed in the last decade. Problems of non-linear optimisation arise in a wide variety of fields. The growth in computer technology has created a need for numerical algorithms which are able to solve problems very rapidly. Mainly because of their size,
non-linear econometric models pose particular problems for non-linear optimisation. This means that a premium is attached to algorithms which economise on the number of times the model has to be solved. A majority of the algorithms used are some variant of the original algorithm of Newton. The dynamic non-linear constraints (6.2.1) are transcribed into the static (i.e. stacked) non-linear form, in the notation of section 2.3: F(Y,X) = 0
(6.4.1)
where we are approximating the correct values for the random variables and solving a deterministic non-linear problem. As elsewhere we assume that the objective function is the quadratic:

J(Y, X) = ½δY′Q*δY + ½δX′N*δX    (6.4.2)

where δY = Y − Y^d, δX = X − X^d, and Q* and N* are defined as in section 3.4. In this case the symmetric matrices, Q* and N*, are assumed to be positive semidefinite and positive definite respectively. The problem is to choose values of Y and X such that (6.4.2) is minimised subject to (6.4.1), i.e.:

min[J(Y, X) | F(Y, X) = 0]
(6.4.3)
We want the minimum of the quadratic objective function subject to the non-linear equality constraints (6.4.1). The constrained-optimisation problem (6.4.3) can be recast as an unconstrained-optimisation problem once it is recognised that the particular structure of the non-linear equality constraints provided by the econometric model provides a mapping Y = g(X) (assumed to be unique) between the targets and policy instruments. Thus:

G(X) = J[g(X), X] = ½[g(X) − Y^d]′Q*[g(X) − Y^d] + ½δX′N*δX
(6.4.4)
(6.4.4) is assumed to be approximately quadratic (only if the equality constraints were linear would this hold exactly). The gradient along some feasible trajectory, k, is:

g^k = R^k′Q*[g(X^k) − Y^d] + N*(X^k − X^d)
(6.4.5)
where R^k has the time-varying structure of (6.3.13). The assumption that (6.4.4) is approximately quadratic implies that the second-order conditions for the minimum are:

g^k − g^{k+1} = G^k(X^k − X^{k+1})
(6.4.6)
where G^k, the Hessian, is defined as:

G^k = R^k′Q*R^k + N* + B^k
(6.4.7)
where B^k is

B^k = Σ_i ∇²g_i(X^k){Q*[g(X^k) − Y^d]}_i

i denotes the ith element of the corresponding vector and ∇²g_i is the Hessian of g_i with respect to X evaluated at X^k. A Newton-type descent direction is obtained by assuming that X^{k+1} is the unconstrained optimum of G(X), a necessary condition for which is, of course, that g^{k+1} = 0, so that (6.4.6) can be rearranged to provide the variable-metric iterative procedure:

X^{k+1} = X^k − α_k(G^k)^{−1}g^k
(6.4.8)
This scheme is usually referred to as a damped Newton algorithm because of the inclusion of the scalar α_k. Circumstances may arise in which G(X) is not sufficiently approximated by a quadratic, so α_k can be set to ensure that G(X^{k+1}) < G(X^k). If the second-order term B^k is neglected and the time-invariant multipliers R^0 are used, the approximate Hessian is:

G = R^0′Q*R^0 + N*    (6.4.9)

and the gradient, computed at each iteration, is:

g^k = R^0′Q*[g(X^k) − Y^d] + N*(X^k − X^d)    (6.4.10)

and the Gauss-Newton variable-metric iteration (Rustem and Zarrop, 1979) is:

X^{k+1} = X^k − α_k G^{−1}g^k
(6.4.11)
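A minimal sketch of the damped Gauss-Newton scheme for a hypothetical two-target, two-instrument problem follows. The mapping g, the weights and the targets are all invented for illustration; the multipliers R are obtained by numerical differentiation, and the damping factor is halved until the objective falls, as described above.

```python
import numpy as np

def g(x):
    """Hypothetical non-linear mapping from instruments to targets, Y = g(X)."""
    return np.array([x[0] + 0.3 * x[0] ** 2 + 0.5 * x[1],
                     0.8 * x[1] - 0.1 * x[0] * x[1]])

def multipliers(x, h=1e-6):
    """Numerical multipliers R = dg/dX (central differences)."""
    R = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2); e[j] = h
        R[:, j] = (g(x + e) - g(x - e)) / (2 * h)
    return R

Q = np.eye(2)
N = 0.1 * np.eye(2)
yd = np.array([1.0, 0.5])      # desired target values
xd = np.zeros(2)               # desired instrument values

def cost(x):
    dy, dx = g(x) - yd, x - xd
    return 0.5 * (dy @ Q @ dy + dx @ N @ dx)

x = xd.copy()
for _ in range(50):
    R = multipliers(x)
    grad = R.T @ Q @ (g(x) - yd) + N @ (x - xd)   # gradient, as in (6.4.5)
    H = R.T @ Q @ R + N                           # Hessian with B^k neglected
    step = np.linalg.solve(H, grad)
    a = 1.0                                       # damping: halve until cost falls
    while cost(x - a * step) > cost(x) and a > 1e-8:
        a *= 0.5
    x = x - a * step
print(x)
```

Each pass requires only first-derivative information, which is why neglecting B^k is attractive when model solutions are expensive; the price is a loss of quadratic convergence away from the optimum.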
6.4.1 A modified gradient algorithm
Preston et al. (1976) have proposed that an even simpler iteration scheme than (6.4.11) be based on the direct solution of the gradient, for, since at the minimum g^k = 0, we have from (6.4.10):

X^{k+1} = X^d − (N*)^{−1}R^0′Q*[g(X^k) − Y^d]    (6.4.12)

This iteration is locally convergent to the optimum, say X^+, only if ρ[(N*)^{−1}R^0′Q*R^+] < 1, where R^+ denotes the multipliers evaluated at X^+. A damped version is:

X^{k+1} = X^d − λ(N*)^{−1}R^0′Q*[g(X^k) − Y^d] + (1 − λ)δX^k    (6.4.13)
for some λ > 0. This algorithm is locally convergent if −(N*)^{−1}R^0′Q*R^+ has all roots less than unity in real part (Hughes Hallett, 1981, lemma 1). Hence global convergence can be guaranteed for (6.4.13) if R^+ and F(Y, X) are replaced by R^0 and R^0X^k + s°, since the iteration matrix then becomes negative semidefinite and symmetric and therefore automatically satisfies the condition for convergence. The damping factor, λ, could be chosen so as to secure convergence according to the condition: 0 < λ < [ρ{(N*)^{−1}R^0′Q*R^0}]^{−1}. However, retaining the time-varying R^+ does not ensure global convergence. Alternatively, λ can be chosen automatically within the iterative algorithm (see Hughes Hallett, 1985). Both the modified Preston et al. algorithm (6.4.13) and the Rustem-Zarrop algorithm (6.4.11) use only an approximation to the gradient (6.4.5); so if the first-order condition, g^+ = 0, is used to establish convergence then the resulting X^+ may not actually be the optimum. In this case we need an outer iteration at which R^k is re-evaluated and the iterative procedure repeated. The outer iteration is then:

X_{j+1} = X^d − (H_j′H_j)^{−1}H_j′L(b_j′, 0′)′    (6.4.14)

where H_j = L(R_j′ : I)′, L is the triangularisation of Q (defined in chapter 3), i.e.
Q = L′L. X_j, for j = 1, 2, …, are the successive 'optimum' values provided by the convergence point of (6.4.13). Each step of (6.4.14) alternates between evaluating R_j = (∂Y/∂X) at X_j from F(Y_j, X_j) and computing X_{j+1} conditional on R_j. Hence we have a two-stage iteration consisting of alternating least-squares 'estimation' and 'prediction' phases. This reproduces exactly the well-known fixed-point estimator, where R_j plays the role of the fitted endogenous variables and X_j the model parameters, in the analogy. The convergence of such estimators, and thus (6.4.14), is determined by reaching a fixed point in the 'prediction' phase; which in this case follows if ρ(I − ∂vec R_{j+1}/∂vec R_j) < 1, for all j. Now:

vec R_{j+1} = vec R_j + (∂vec R/∂X′)|_{X_j}(X_{j+1} − X_j)    (6.4.15)

so

∂vec R_{j+1}/∂vec R_j = I + (∂vec R/∂X′)(∂X_{j+1}/∂vec R_j)    (6.4.16)
Therefore the convergence of (6.4.14) is guaranteed neither globally nor locally. It is problem-dependent, but only to the extent of second-order effects. Convergence, when observed, will be extremely rapid near the convergence point. The first derivative (multiplier sensitivities) on the right of (6.4.16) depends only on the model. But the second derivative, the optimal-policy sensitivities, depends on both the model and the preferences. For the latter, the larger Q* is relative to N*, the larger are the alterations to X_j in response to given multiplier changes. But if N* is relatively large then the second derivative is small, so, for example, X^0 = X^d would lead to convergence because the X^d simulation lies near the convergence point. On the other hand, linear models give instant convergence since ∂vec R/∂X = 0 in (6.4.15) and (6.4.16) at every X_j. Consequently, convergence will be very rapid where multipliers change very slowly with X_j - i.e. for models nearly linear over that subset of feasible policies which policy-makers can accept. This experience is frequently reported, but its generalisation must be treated with caution. Multiplier sensitivities are usually investigated only for a restricted policy set, and for individual, rather than packages of, policy variations. If the sum of second-order multiplier effects in (6.4.16) is significant the iteration can easily leave the neighbourhood of linearity. Thus, in general, algorithms of this kind can locate optimal policies only if the multipliers are sufficiently insensitive to policy changes and the optimal policies themselves are also insensitive to alterations in the model's responses. It is the latter component which has been overlooked.
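The damped fixed-point iteration (6.4.13), with λ chosen to satisfy the spectral-radius condition quoted above, can be illustrated on a small linear model, for which convergence to the exact optimum can be verified directly (all numbers are illustrative):

```python
import numpy as np

R = np.array([[0.8, 0.2], [0.1, 0.9]])   # multipliers of a small linear model
s = np.array([0.2, -0.1])
g = lambda x: R @ x + s                   # the 'model' here is exactly linear

Q = np.eye(2)
N = np.diag([0.2, 0.3])
yd = np.array([1.0, 1.0])
xd = np.zeros(2)

# 0 < lam < 1/rho(N^{-1} R'Q R): the damping condition for convergence
M = np.linalg.solve(N, R.T @ Q @ R)
lam = 0.9 / max(np.real(np.linalg.eigvals(M)))

dx = np.zeros(2)
for _ in range(2000):
    x = xd + dx
    dx = -lam * np.linalg.solve(N, R.T @ Q @ (g(x) - yd)) + (1 - lam) * dx

x = xd + dx
xstar = np.linalg.solve(R.T @ Q @ R + N, R.T @ Q @ (yd - s))  # direct optimum
print(x, xstar)   # the fixed point reproduces the optimum
```

At the fixed point the gradient condition R′Q(g(x) − Y^d) + N(x − X^d) = 0 holds exactly; with a non-linear model the same scheme applies but, as discussed above, convergence then depends on the sensitivity of the multipliers to the policy path.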
6.4.2 Repeated linearisation
The procedure of repeatedly relinearising the model considered at the end of section 6.3.2 can be given a firmer basis in the light of the convergence
conditions examined above. Equation (6.3.14), obtained by linearising the model about some initial path, is used to generate an 'optimal' policy X^1. This is used to provide a new path about which the model is relinearised, for which a further 'optimal' X^2 is calculated, and so on iteratively, given that, from chapter 3:

δY = RδX + b,   where b = s − Y^d + RX^d

and the objective function is:

J(δZ) = δZ′QδZ    (6.4.17)

where δZ′ = (δY′, δX′), so that:

δX* = −(H′QH)^{−1}H′Q(b′, 0′)′,   H = (R′ : I)′
(6.4.18)
This suggests that each iteration of the repeated linearisation procedure is:

X_{j+1} = X^d − (H_j′H_j)^{−1}H_j′L(b_j′, 0′)′    (6.4.19)
Rearranging terms in (6.4.19) gives:

δX_{j+1} = δX_j + (H_j′H_j)^{−1}H_j′LδZ_j
(6.4.20)
which is exactly step j of Holbrook's (1974) algorithm. Moreover (6.4.19) has an identical iterative structure to (6.4.14). Hence the convergence conditions (and the remarks following) which apply to (6.4.14) apply without modification to algorithms of this type, and to (6.4.20) in particular. In case convergence difficulties emerge, the iterations (6.4.19) or (6.4.14) may be accelerated as:

δX_{j+1} = −α_j(H_j′H_j)^{−1}H_j′L(b_j′, 0′)′ + (1 − α_j)δX_j    (6.4.21)

or with the corresponding term from (6.4.14) replacing (b_j′, 0′)′ in the case of (6.4.14).
Again (6.4.21) converges for some α_j > 0 provided only that the original iteration matrix (the last term of (6.4.16)) has all roots less than unity in real part.
7 The linear rational-expectations model

7.1 INTRODUCTION
In the econometric model used in the first part of this book it was assumed that economic agents did not base their behaviour upon anticipations of the future; or, if they did, the process by which they revised their expectations was largely mechanical. This means that economic systems could be treated in principle in an analogous fashion to physical and engineering systems. Economic agents could be treated more or less as automata responding in a regular and reasonably predictable way to stimuli. Recently the basic theory relating to the behaviour of the economy, especially at the macroeconomic level, has changed. The emphasis has shifted from an essentially backward-looking to a forward-looking perspective. This shift is associated with the rational-expectations hypothesis. Initially the introduction of forward-looking expectations into macroeconomic models appeared to have serious implications for optimal-control theory. In particular, the neutrality proposition of Sargent and Wallace (1975) suggested that there is no systematic role for a stabilisation policy when expectations are rational. The generality of this proposition has been challenged and it is now recognised that even when expectations are forward-looking it is still possible for a stabilisation policy to have a systematic effect on economic activity. However, this does not leave optimal-control theory unaffected, because the forward-looking 'acausal' nature of the economy means that the standard policy-design techniques brought over from engineering are no longer applicable. In this chapter we examine methods for analysing linear rational-expectations models and for writing them in a form suitable for policy formulation. We also show how the controllability results of chapter 2 can be extended to rational-expectations models. Optimal-policy formulation is taken up in chapter 8.
7.2 THE RATIONAL-EXPECTATIONS HYPOTHESIS
The rational-expectations hypothesis, originally advanced by Muth (1961), states that expectations are essentially the same as the predictions of the relevant economic theory. A corollary of this hypothesis is that economic agents do not waste information but use it to their best advantage. Expectations are not governed by some external process but will depend upon the complete structure of the system of which they are an important part. The kind of information which is used by economic agents and the way in which it is organised to provide a prediction of the future course of events lies at the centre of an explanation of dynamic behaviour in economic systems. Expectations, since they are informed predictions of future events, are essentially the same as the predictions of the relevant economic theory... The hypothesis can be rephrased a little more precisely as follows: that expectations of firms (or, more generally, the subjective probability distribution of outcomes) tend to be distributed, for the same information set, about the prediction of the theory (or the 'objective' probability distribution of outcomes). (Muth, 1961)
Muth was careful to stress that the hypothesis does not assert that agents have in mind some system of equations when they formulate their expectations; any more than a billiards player has in mind a set of partial differential equations when he decides how he is to make a shot - even though the behaviour and the motions of the cue and the billiard ball can be described by differential equations. Nevertheless, since Muth formulated the rational-expectations hypothesis there has been the emergence of a large industry concerned with forecasting at both the macroeconomic and microeconomic level. Explicit mathematical and statistical techniques are used, and information considered relevant to a forecast is often organised in the framework of the relevant economic theory. Formally, the hypothesis states that expectations are forecasts conditional upon the available information set:

y^e_{t+1|t} = E(y_{t+1}|Ω_t)    (7.2.1)
The expectations for y formed at time period t are conditioned on information available at t and refer to t + 1. This implies that the linear prediction error:

e_{t+1|t} = y_{t+1} − y^e_{t+1|t}

has certain statistical properties. It is orthogonal to the information set Ω_t, so the expectations are optimal, unbiased, minimum-variance, linear forecasts. The contents of the information set will vary from circumstance to circumstance. The costs of obtaining and using the information set will also be important. This kind of formulation of the way in which an expectation is formed is contrasted favourably with the more traditional adaptive-expectations hypothesis:

y^e_{t+1|t} = λy_t/[1 − (1 − λ)L]    (7.2.2)
where, as before, L is the lag operator and λ is the coefficient of adjustment (0 < λ < 1). The forecast provided by (7.2.2) will be biased except in the special case where the actual evolution of y_t is a first-order autoregressive process with λ as its coefficient. Of course, it is possible to consider more general representations than (7.2.2), such as the autoregressive moving-average form:

y^e_{t+1|t} = [a(L)/b(L)]y_t    (7.2.3)
which, for sufficiently high orders in the denominator and numerator polynomials, will provide unbiased forecasts of y_{t+1}. If, however, there are other systematic influences on y_t in addition to its own past behaviour, the forecast provided by (7.2.3) will not have the smallest variance. Consider the simple two-equation model:

p_t = β_1p_{t−1} + β_2m_t + ε^p_t
m_t = α_1m_{t−1} + ε^m_t    (7.2.4)

where the covariance of ε^m_t and ε^p_t is zero and ε^m_t ~ N(0, σ_m), ε^p_t ~ N(0, σ_p). A second-order autoregressive equation for p_t could be estimated, of the form:

p_t = φ_1p_{t−1} + φ_2p_{t−2} + v_t

where φ_1 = β_1 + α_1, φ_2 = −α_1β_1, and v_t is the composite disturbance ε^p_t − α_1ε^p_{t−1} + β_2ε^m_t. Although forecasts for p_t generated from the second-order autoregressive equation will be unbiased conditional expectations, they will not have the smallest variance relative to the structural relationships, since σ_v > σ_p. Knowledge of the way in which m_t evolves will provide better forecasts.

The revolution in economic theory and policy associated with the rational-expectations hypothesis did not really get under way until the work of Lucas, Sargent, Wallace and others in the early seventies. Lucas (1976) made a serious attack on the relevance of most macroeconomic models used for policy analysis and design because of their failure to ensure that the structure of the model was invariant to policy. Of more importance was the finding of Sargent and Wallace (1975) that, in a deceptively simple model, anticipated systematic monetary policy could have no effect on either the mean or the distribution of output. Since then there has been a huge literature on the conditions under which this proposition does or does not hold (Begg, 1981). A variety of models have been constructed and estimated, based for example on overlapping contracts (Taylor, 1979) or on the absence of sufficient future or
contingent markets (Buiter, 1981a) in which anticipated systematic monetary policy does have effects on output. The so-called neutrality proposition of Sargent and Wallace is essentially concerned with controllability in the sense in which the term has been used in chapter 2. The question of policy existence is therefore best answered by reference to the general controllability properties of particular rational-expectations models. The relevant controllability criteria are developed in section 7.8.
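The forecasting point made by the two-equation example in section 7.2 is easy to check by simulation: a univariate autoregression in p_t gives unbiased but higher-variance forecasts than a predictor which exploits knowledge of how m_t evolves. A sketch, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
a1, b1, b2 = 0.8, 0.5, 1.0        # illustrative coefficients
sig_m, sig_p = 1.0, 0.5           # innovation standard deviations
T = 200_000

em = rng.normal(0.0, sig_m, T)
ep = rng.normal(0.0, sig_p, T)
m = np.zeros(T); p = np.zeros(T)
for t in range(1, T):
    m[t] = a1 * m[t - 1] + em[t]
    p[t] = b1 * p[t - 1] + b2 * m[t] + ep[t]

# One-step forecasts from a univariate AR(2) in p (OLS on own lags only)
Y = p[2:]
X = np.column_stack([p[1:-1], p[:-2]])
phi = np.linalg.lstsq(X, Y, rcond=None)[0]
mse_ar = np.mean((Y - X @ phi) ** 2)

# Structural forecast exploiting knowledge of how m evolves
pred = b1 * p[1:-1] + b2 * a1 * m[1:-1]
mse_struct = np.mean((Y - pred) ** 2)

print(mse_ar, mse_struct)    # the structural forecast has the smaller variance
```

The structural one-step error is b_2 ε^m + ε^p, with variance σ_p² + b_2²σ_m²; the univariate forecast cannot recover m_t exactly from the history of p alone, so its error variance is strictly larger.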
7.3 THE GENERAL LINEAR RATIONAL-EXPECTATIONS MODEL
The general linear, constant-coefficient, rational-expectations model can be written:

y_t = Ay_{t−1} + Bx_t + Ce_t + D_0y^e_{t|t} + D_1y^e_{t+1|t} + ε_t    (7.3.1)

where y_t is a vector of g endogenous variables, x_t is a vector of n policy instruments, e_t is a vector of k exogenous variables, and y^e_{t|t} and y^e_{t+1|t} are vectors of rational expectations of the endogenous variables.¹ We have explicitly excluded terms of the form y^e_{t|t−1}, that is, expectations formed in the previous period about the current period. This does cut us off from a large part of the literature on rational-expectations models, but we feel that the interesting problems for optimal-stabilisation policy arise from forward-looking expectations. In principle, expectations may refer to periods further ahead than t + 1 (or expectations may be conditioned by information other than that at t, Ω_t). In this case we assume that a model with expectations of endogenous variables further than one period ahead is transformed into the form of (7.3.1) by redefinition of variables. For example, the scalar difference equation:

z_t = a_1z_{t−1} + a_2z_{t−2} + γ_1z^e_{t+1|t} + γ_2z^e_{t+2|t} + x_t    (7.3.2)
can be rewritten in companion form, in terms of the augmented vector y_t = (y_t^{(1)}, y_t^{(2)}, y_t^{(3)})′, where y_t^{(1)} = z_{t+1}, y_t^{(2)} = z_t and y_t^{(3)} = z_{t−1}, so that only one-period-ahead expectations of the elements of y_t appear and the system takes the form of (7.3.1).    (7.3.3)

An alternative way of rewriting (7.3.2) is:

Ey^e_{t+1|t} = Fy_t + Gx_t    (7.3.4)
where E and F are constant coefficient matrices built from unity, γ_1, γ_2, a_1 and a_2,
and the fourth element of the augmented vector is y_t^{(4)} = z_{t+2}. For γ_1, γ_2 not equal to zero it is trivially true that (7.3.2) can be written as an equation in lagged values of z_t only. Equally, if E^{−1} exists, (7.3.4) can be written in conventional state-space form. In practice, however, for higher-dimensional models it is not obvious that E will always be invertible. It is unlikely that there will be as many expectations as endogenous variables. There are methods available for analysing and solving (7.3.4). Luenberger (1977, 1978), extending the frequency-domain work of Rosenbrock (1970) to the time domain, has described (7.3.4) as a time-invariant descriptor form. Descriptor systems (with state-space forms as special cases) provide a way of representing dynamic processes (and, in particular, acausal systems). Wall (1977) first recognised the relevance of descriptor forms to rational-expectations models but did not pursue the point. The first application of descriptor forms was by Preston and Pagan (1982), who used them as a way of establishing controllability criteria for models in which expectations are forward-looking. However, the policy-design aspects of descriptor systems still remain relatively undeveloped, though Cobb (1981) has examined the role of linear feedback in continuous-time descriptor systems and Pandolfi (1981) has examined optimal policies. Given the present state of descriptor-system theory we intend to work directly with (7.3.1) and variants of it, and to develop the appropriate framework from which controllability criteria and policy-design methods can be derived. Indeed this may have computational advantages because it may be possible to avoid the high dimensionality of the large equation systems of descriptor space (being further generalisations of state-space systems).
7.4 THE ANALYTIC SOLUTION OF RATIONAL-EXPECTATIONS MODELS
Any difference equation can be solved as a forward-looking or backward-looking equation (Sargent, 1979). Consider the scalar first-order difference equation:

(1 − λL)y_t = bx_t    (7.4.1)

where L is the lag operator. The backward-looking solution is:

y_t = b Σ_{i=0}^{∞} λ^i x_{t−i} + c_1λ^t    (7.4.2)
where the last term is an arbitrary constant of integration. The forward-looking solution of (7.4.1) is:

y_t = −b Σ_{i=0}^{∞} (1/λ)^{i+1} x_{t+i+1} + c_2λ^t    (7.4.3)

provided by using the alternative expansion:

(1 − λL)^{−1} = −(1/λ)L^{−1} − (1/λ)²L^{−2} − …
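The two expansions can be checked numerically. In the sketch below (illustrative numbers; the infinite forward sum is truncated) the root satisfies λ > 1, so it is the forward-looking solution which remains bounded while still satisfying the difference equation:

```python
import numpy as np

lam, b = 1.25, 1.0                 # unstable root, so |1/lam| < 1
T, K = 60, 40                      # horizon and truncation of the forward sum
x = np.ones(T + K + 2)             # a well-behaved (constant) forcing path

# Forward-looking solution (7.4.3), truncated after K terms
def forward(t):
    return -b * sum((1.0 / lam) ** (i + 1) * x[t + i + 1] for i in range(K))

y = np.array([forward(t) for t in range(T)])

# It satisfies (1 - lam*L)y_t = b*x_t, i.e. y_t = lam*y_{t-1} + b*x_t ...
resid = y[1:] - (lam * y[:-1] + b * x[1:T])
print(np.max(np.abs(resid)))       # ~0 up to truncation error

# ... and it is bounded (here constant, with |y_t| close to b/(lam - 1)),
# whereas the backward solution would grow like lam^t
print(np.max(np.abs(y)))
```

Reversing the experiment (taking 0 < λ < 1) would make the backward sum the convergent one and the forward sum explosive, which is the point made in the text.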
Both (7.4.2) and (7.4.3) are general solutions to (7.4.1). Indeed any linear combination of (7.4.2) and (7.4.3) is also a general solution to (7.4.1) (Blanchard, 1979; Blanchard and Kahn, 1980). The particular solution of (7.4.1) requires that values be assigned to the constants of integration. For the backward-looking solution, an initial y_0 provides the extra information which ties the solution down; for the forward-looking solution it is assumed that there is some terminal condition, θ. If (7.4.1) is stable (1 > λ > 0) then (7.4.2) will provide a finite y_t (assuming, of course, that x_t is well behaved, i.e. that it is finite for finite t) and the contribution of y_0 to y_t will vanish as t tends to infinity. But, if (7.4.1) is stable and the forward-looking solution is used, y_t will be unbounded as T tends to infinity. Equally, if (7.4.1) is unstable, the forward-looking solution will be bounded, since as T tends to infinity (1/λ)^T tends to zero, so that the contribution of the boundary condition will vanish. It has usually been the practice in the analysis of economic models to adopt the backward-looking form, even when the model is considered unstable. This is because it reflects the normal ideal of 'causality'. But rational-expectations models must also take into account the forward-looking part of the solutions, since this is the part which reflects the forward-looking component of the rational-expectations hypothesis.

7.4.1 A forward-looking solution procedure

A forward-looking solution to (7.3.1) (with D_0 = 0, for convenience) can be determined for time period t, conditional on information Ω_t and assuming that a set of terminal or transversality conditions, θ, is provided at some finite period, T, in the future. Taking expectations so that all e_{t+j} have an expectation of zero for j greater than zero, we can write for period T:
Y_T = A_TY_{T−1} + B_Tx_T + C_Te_T + D_Tθ    (7.4.4)

where θ = Y^e_{T+1|T} and

A_T = A, B_T = B, C_T = C, D_T = D

For period T − 1, we have after substitution:

Y_{T−1} = A_{T−1}Y_{T−2} + B_{T−1}x_{T−1} + C_{T−1}e_{T−1} + D_{T−1}θ    (7.4.5)

where now

A_{T−1} = H_{T−1}A, B_{T−1} = H_{T−1}B, C_{T−1} = H_{T−1}C, D_{T−1} = H_{T−1}D

with H_{T−1} = (I − D_1A_T)^{−1}.
The general solution for period t is of the form:

Y_t = A_tY_{t−1} + B_tx_t + C_te_t + Σ_{j=1}^{T−t} (Π_{s=0}^{j−1} H_{t+s})(B_{t+j}x^e_{t+j|t} + C_{t+j}e^e_{t+j|t}) + D_T(Π_{s=0}^{T−t−1} H_{t+s})θ    (7.4.6)

where Π is the product operator, i.e.:

Π_{s=0}^{j−1} H_{t+s} = H_tH_{t+1} … H_{t+j−1}
Here we have a forward-looking reduced form which is solved recursively backwards from period T to period 1. That is to say, each y_t is a function of its own lagged value and of the current and expected future values of the policy instruments and exogenous variables, conditioned by the information set Ω_t. Thus, as Ω_t becomes Ω_{t+1} and then Ω_{t+2}, and so on, we will have a different solution and possibly a different terminal condition, or even a horizon which rolls forward. For (7.4.6) to generate a finite y_t we require (apart from the processes for x_t and e_t, for t = 1, …, T, being well behaved) the last term on the right-hand side of (7.4.6) to go to zero as T goes to infinity. Since the last term can be written:
D_T(Π_{s=0}^{T−t−1} H_{t+s})θ    (7.4.7)
a sufficient condition for this to go to zero as T goes to infinity is that D should be a stable matrix.

7.4.2 The final-form rational-expectations model
Another way of solving rational-expectations models is to obtain the final form of (7.3.1). We can write it as:

−D_1y_{t+1|t} + y_t − D_0y_{t|t} − Ay_{t−1} = Bx_t + u_t    (7.4.8)
where y_{t+j|t} = E(y_{t+j}|Ω_t), for j ≥ 0, is an expectation conditioned on the information available at the start of period t, and u_t = Ce_t + ε_t. A, B, C, D_0 and D_1 are matrices of known coefficients. Each information set, Ω_t, contains certain information on y_{t−j}, x_{t−j}, u_{t−j} and u_{t+j−1|t} for j ≥ 1, in addition to the coefficients in (7.4.8). Each expectation in (7.4.8) is the same as the forecast obtained by solving the model for period t + 1, conditional on the information set Ω_t. But these expectations are linked in time; so to solve (7.4.8) also requires solutions for y_{t+j|t} where j = 1, 2, … Thus, in period 1, the expectation of each y_t, up to a horizon
T, is obtained by conditioning (7.4.8) on Ω_1 and stacking up the T sets of equations just as in section 2.3. This produces a block-tridiagonal system, with I − D_0 in each diagonal block, −D_1 in each superdiagonal block and −A in each subdiagonal block, multiplying the stacked vector (y_{1|1}′, …, y_{T|1}′)′; the right-hand side consists of a block-diagonal matrix in B times the stacked instruments, plus the stacked disturbances (with Ay_0 added in the first block and D_1θ in the last). Inverting this system gives the final form:

Y = RX + s    (7.4.9)

where u_{t|1} is taken from Ω_1, and the x_{t|1} are instrument values to be determined. We can regard x_{1|1} as a decision intended for immediate implementation, and the x_{t|1} as the currently expected future decisions (t ≥ 2). This final-form solution has exactly the same form as the conventional final form of chapter 2. The only material difference is that R is not block lower-triangular, and a terminal condition has to be included in s.
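The structure of (7.4.9) can be seen in a small numerical sketch, using a hypothetical scalar model y_t = ay_{t−1} + dy^e_{t+1|t} + bx_t (so that the blocks of the tridiagonal matrix are scalars; all numbers are illustrative):

```python
import numpy as np

# Hypothetical scalar model y_t = a*y_{t-1} + d*y^e_{t+1|t} + b*x_t, stacked
# over t = 1..T as in (7.4.9); the 'blocks' of the tridiagonal matrix are scalars.
a, d, b = 0.5, 0.4, 1.0
T = 8

G = np.eye(T) - d * np.eye(T, k=1) - a * np.eye(T, k=-1)
R = np.linalg.solve(G, b * np.eye(T))        # final-form multipliers: Y = RX + s

# R is NOT lower triangular: expected future instruments move today's outcome
print(np.round(R[0, :4], 3))                 # dy_1/dx_{t|1} for t = 1..4
print(bool(np.any(np.abs(np.triu(R, 1)) > 1e-8)))   # True
```

The non-zero entries above the diagonal are the 'non-causal' multipliers: announcing a future instrument change alters the solution from period 1 onwards, which is exactly why the backward-looking policy-design machinery of chapter 2 cannot be applied unchanged.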
7.4.3 A penalty-function solution
Yet another way of solving linear rational-expectations models is provided by the penalty-function method (Holly and Zarrop, 1983). The problem of achieving a perfect-foresight or consistent-expectations solution can be considered as a quadratic minimisation problem. This approach adopts a loss function which is expressed in terms of the deviation of the model solution
from the convergent saddlepath solution. The stacked form is rewritten as (7.4.10), with the forward-looking terms transferred to the right-hand side, so that the matrix on the left-hand side is block lower-triangular, with I − D_0 in the diagonal blocks and −A in the subdiagonal blocks, while the stacked D_1 blocks multiply the vector of expectations on the right. This is the more conventional Theil notation, with a lower triangular matrix on the left-hand side, which we have used extensively in previous chapters. The final form of this model can be written:

Y = RX + s + KY^e    (7.4.11)

where R is the conventional lower triangular matrix of dynamic multipliers and K is a lower triangular matrix of the dynamic multipliers between Y and Y^e, where Y^e is defined as the stacked vector of expectations (y^e_{2|1}′, …, y^e_{T+1|1}′)′.
A loss function can then be written as:

J_p = ½[(Y^e − Y^s)′P(Y^e − Y^s) + Y^e′GY^e]
(7.4.12)
where P is a semipositive definite weighting matrix. Y^s is defined as:

Y^s = VY + v    (7.4.14)

where V is the shift matrix which selects from Y the model solutions corresponding to the elements of Y^e.
The unconstrained minimum of (7.4.12) is the perfect-foresight path, since then Y^e = Y^s. Because, in general, we require that G is positive definite, a standard exterior penalty-function method is used (Avriel, 1976) such that a
sequence of solutions to (7.4.12), corresponding to increases in the minimum eigenvalues of P, tends towards the convergent-saddlepoint consistent-expectations path. However, the exterior approach may not be necessary for linear models. For a suitable choice of the penalties in P relative to G, the rational-expectations solution corresponding to the convergent-saddlepoint path can be obtained in a single iteration. This is referred to as an exact-penalty method (Zangwill, 1967). The value of Y^e lying on the saddlepoint path is given by minimising (7.4.12) subject to (7.4.11). This gives:

Y^e = [(I − VK)′P(I − VK) + G]^{−1}(I − VK)′P[V(RX + s) + v]    (7.4.15)
7.4.4 Multiperiod solutions

For each change in the conditioning information, there will be a change in the numerical solution of the model - but the form of the solution (7.4.9) will be retained. In order to update the conditioning information in (7.4.9) we must partition its elements between the past and future variables. Thus, for some t ≥ 2, we have:

[y^{(1)}; y^{(2)}] = [R^{(1)} R^{(2)}; R^{(3)} R^{(4)}][x^{(1)}; x^{(2)}] + [S^{(1)} S^{(2)}; S^{(3)} S^{(4)}][u^{(1)}; u^{(2)}]    (7.4.16)

where y^{(1)} = (y_{1|1}′, …, y_{t−1|1}′)′ collects the pre-t solutions, y^{(2)} = (y_{t|1}′, …, y_{T|1}′)′ the remaining ones, and x and u are partitioned conformably. The submatrices R^{(1)}, …, R^{(4)} are the corresponding partitions of R, and S^{(1)}, …, S^{(4)} correspond to an analogous partitioning of the inverse matrix of (7.4.9). Hence, given Ω_t:

(y_{t|t}′, …, y_{T|t}′)′ = P_1^{(2)}(x_1′, …, x_{t−1}′, x_{t|t}′, …, x_{T|t}′)′ + P_2^{(2)}(u_1′, …, u_{t−1}′, u_{t|t}′, …, u_{T|t}′)′    (7.4.17)

where P_1 and P_2 are the coefficient matrices of the re-solved system corresponding to R and S, and P_1^{(2)} and P_2^{(2)} are the lower m(T − t + 1) rows of P_1 and P_2. The merit of
(7.4.17) is that, for each t, it involves no more than a different partitioning of the coefficient matrices already obtained for (7.4.9). It is therefore very simple to generate the revised solutions. In summary:

(a) Both model solutions (7.4.9) and (7.4.16) include 'non-causal' influences, since ∂y_t/∂x_{t+j|t} ≠ 0 for j ≥ 1; hence y_t, as well as y_{t|t}, is subject to 'non-causal' effects. The associated multipliers are given by the elements of R^{(2)} for the different partitions generated by t = 2, …, T. Current endogenous variables therefore depend not only on current and lagged policy variables and exogenous variables, but also on the expected future values of the policy and exogenous variables.

(b) The rational expectations, y_{t|t}, and their prediction errors, e_{t|t}, respond to a moving average of past disturbances. Hence E_t(e_{t|t}) = 0, but E_t(e_{t|t}e_{j|t}′) ≠ 0 when j ≠ t.

(c) The fact that y_{t|t} ≠ y_{t|t−1} when u_t ≠ u_{t|t−1} implies that, if x_{t|t−1} has been selected in order to reach specific values of the endogenous variables y_{t|t−1}, then the instruments should also be set to adjust by some function of those same unanticipated disturbances. This implies that an innovations-dependent rather than a fixed-policy rule will be needed to steer this type of economy.

(d) Obviously whether the final form (7.4.9) exists will depend upon whether the block-tridiagonal matrix G is invertible. If D_0 = 0, a necessary condition is that AD_1 has no unit roots.²

It is worth while examining the structure of R in (7.4.9) as compared with the standard linear model of section 2.3. R in (2.3.2) has a convenient lower triangular form, so that it can be built up by numerical differentiation, that is, by evaluating the dynamic multipliers. The same kind of procedure can be applied here, but it is a bit more complicated because, in order to capture the net effects of both lead and lag variables, it is no longer possible to build these numerical derivatives up recursively.
7.4.5
The stochastic properties of rational-expectations solutions
Rational-expectations models imply the initial 'prediction' error:

e_{t|1} = Σ_{j=1}^{t} [R_{tj}(x_j − x_{j|1}) + S_{tj}(u_j − u_{j|1})] + Σ_{j=t+1}^{T} [R_{tj}(x_{j|t} − x_{j|1}) + S_{tj}(u_{j|t} − u_{j|1})]   (7.4.18)

where S_{tj} is the (t, j)-th submatrix of the inverted (block tridiagonal) matrix in (7.4.9), when the terminal condition remains unchanged. These expectations are unbiased predictors in the sense that the expectations turn out to be correct on average, i.e. E_1(e_{t|1}) = 0 for all t. However, unlike the single-period case (see Muth, 1961), the prediction errors of rational expectations are serially correlated even when all shocks to the system are independent and serially uncorrelated. For example, if under these conditions the authorities hold to one pre-announced sequence of decisions throughout (i.e. x_t = x_{t|1} for all t), then:³

E_1(e_{t|1} e'_{k|1}) = Σ_{j=1}^{t} S_{tj} V(u_j) S'_{kj} ≠ 0   (7.4.19)

for k > t and t = 1, …, T, where V(·) denotes a variance-covariance matrix. Thus, although these expectations do not contain any systematic errors with respect to the first information set Ω₁, they certainly will do so with respect to the subsequent information sets Ω_t which supply values for the unanticipated disturbances ('innovations') (u_2 − u_{2|1}), …, (u_t − u_{t|1}). Therefore, at t ≥ 2, the expectations generated by (7.4.9) will only remain 'rational' in the usual sense if the model's solution is revised using each information set Ω_t in turn. The rational expectations will then be adjusted by an amount which is a linear function of the unanticipated disturbances experienced so far.
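The serial correlation of rational-expectations prediction errors can be illustrated with a minimal scalar example. This is a hedged sketch, not the book's full model: for y_t = a·y_{t−1} + u_t with expectations conditioned on period-1 information, the prediction error e_{t|1} = y_t − y_{t|1} satisfies e_{t|1} = a·e_{t−1|1} + u_t, a moving average of past disturbances, so covariances between errors at different dates are non-zero even though the u_t are i.i.d. (the parameter values are arbitrary):

```python
import numpy as np

# Scalar illustration (assumed model, not the book's): y_t = a*y_{t-1} + u_t,
# expectations conditioned on period-1 information. The prediction error
# e_{t|1} obeys e_{t|1} = a*e_{t-1|1} + u_t, i.e. a moving average of past
# disturbances, so the errors are serially correlated although u_t is i.i.d.
rng = np.random.default_rng(0)
a, T, reps = 0.7, 5, 200_000

u = rng.standard_normal((reps, T))
errs = np.zeros((reps, T))
e = np.zeros(reps)
for t in range(T):
    e = a * e + u[:, t]        # e_{t|1} accumulates past innovations
    errs[:, t] = e

# Analytically E_1(e_{2|1} e_{4|1}) = a^2*(1 + a^2), clearly non-zero
sample = np.mean(errs[:, 1] * errs[:, 3])
analytic = a**2 * (1 + a**2)
print(sample, analytic)
```

The Monte Carlo covariance between e_{2|1} and e_{4|1} matches the analytic moving-average value, confirming point (b) above for this toy case.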
7.5
TERMINAL CONDITIONS
So far we have said nothing about how the terminal conditions are determined, even though they are an integral part of the solution. The role of terminal, or transversality, conditions in the solution of rational-expectations models has been a subject of controversy in the literature (Shiller, 1977). The main reasons for this revolve around questions of uniqueness and stability. Any model should have the characteristic that, for given inputs, only one set of outputs, that is, a unique solution, should exist. If we choose to represent economic activity by means of differential or difference equations and a set of boundary conditions, the representation can be described as well posed if two criteria are satisfied. First, the solution should be unique, since experience suggests that a given set of economic circumstances leads to just one outcome. Secondly, the solution should be stable, in the sense that an arbitrarily small change in the boundary conditions should not result in explosive behaviour. If the economic process under examination were dynamically unstable, then some trivial disturbance in the past would have caused the process to have exploded by now. If the mathematical representation we have employed is non-unique or unstable, then we must conclude that the problem is not well posed and could not be associated with economic processes.
The argument that a problem which exhibits instability is wrongly posed does not mean that there is a hidden bias in favour of economic models which indicate that the economy is well behaved in the sense that it adjusts rapidly to shocks. The term instability is sometimes regarded as interchangeable with the term volatility. We are using the term stability to refer to the property that the eigenvalues of the system are of modulus less than one. This does not mean that a system which is mathematically stable (not explosive) cannot exhibit highly volatile behaviour or fluctuate wildly. There can be eigenvalues close to the inside edge of the unit circle. Such a system could be highly volatile and take a considerable period of time to settle down after a disturbance. Of itself, the use of terminal conditions will not ensure a stable solution. It may be possible that the matrix A has unstable roots. We are assuming that there are as many terminal conditions as unstable roots, but this need not always be the case (Blanchard, 1979; Buiter, 1981b). There still remains the problem of how the terminal conditions are actually to be chosen. A number of proposals have been made. One particular proposal (Minford et al., 1980) is that the terminal conditions should be set at the equilibrium solution of the model. The justification for this is that non-convergent behaviour would provoke a response on the part of economic agents, who would then act so as to eliminate the non-convergence. This line of argument specifically rules out speculative bubbles or self-fulfilling expectations which put the economy on to an unstable path. Another argument is that rational-expectations models must have underlying them some implicit intertemporal optimisation, and the solution of this optimisation does presuppose some transversality conditions (Sargent, 1979). The question of the intertemporal optimality underlying the rational-expectations hypothesis is taken up further in chapter 8.
For the purpose of this chapter we adopt a more pragmatic approach. One of the difficulties of using the equilibrium solution of the model as terminal conditions is the problem of determining it, especially when the model is large and (possibly) non-linear. An alternative is to take advantage of the fact that the values of the variables to which expectations refer in time period T + 1 are likely to be in the neighbourhood of their values at T (Holly and Beenstock, 1980). For example, consider the scalar difference equation:

y_t = a y_{t−1} + b x_t + d y_{t+1|t}   (7.5.1)
The solution procedure of (7.4.6), for this equation, produces a terminal condition of the form:

y_T = h_T [ a y_{T−1} + Σ_{s=0}^{n} (d h_T)^s b x_{T+s|1} ] + (d h_T)^{n+1} θ

where h_t = (1 − d a_{t+1})^{−1}. So long as |d| is less than one, as T goes to infinity
the contribution which θ makes will disappear. Alternatively, we could assume y_{T+1} = y_T. The equation for period T is then:

y_T = (1 − d)^{−1} a y_{T−1} + (1 − d)^{−1} b x_T   (7.5.2)

or:

(1 − d) y_T = a y_{T−1} + b x_T

As long as |d| is less than one, the contribution which the extra factor (1 − d)^{−1} makes will die away as we solve backwards from T to t. If instead we suppose:

y_{T+1} = S y_T

where S is a matrix with non-zero terms only on the diagonal, then (7.4.4) will have:

A_T = (I − DS)^{−1}A,  B_T = (I − DS)^{−1}B,  C_T = (I − DS)^{−1}C  and  D_T = 0
The extent to which the assumed terminal conditions 'corrupt' the solution will depend both upon the eigenvalues of D and upon the size of T. In practical applications we are always constrained by how large a value of T may be used. For this reason it is usual to require the user to supply the expected terminal condition y_{T+1|1} as part of the model solution specification (see Chow, 1980). This does mean that the results for the later part of the solution period will be affected by the numerical values chosen for y_{T+1|1}. However, there are two important cases where they will not. First, if, for a date sufficiently far in the future, the expected values change at a fixed rate, then y_{T+1|1} = G y_{T|1} where G is a diagonal matrix of growth rates. The y_{T+1|1} term on the right of (7.4.9) now vanishes and the bottom right submatrix of the inverse matrix becomes (I − D₀ − D₁G). It seems quite plausible that agents lacking adequate conditioning information about the distant future may form expectations which eventually follow a 'steady-state' path, even if those expectations are not particularly reliable. Examples would be expectations which no longer change (G = I), or expectations which change according to one's assessment of the equilibrium rates of growth, for example, the core rate of inflation, the natural rate of unemployment or a zero balance of payments. The second possibility is that, for large enough values of T, the solutions of interest, y_{1|1}, …, y_{t|1}, say, will turn out to be insensitive to the terminal condition. This is easily checked by evaluating those forecasts in (7.4.9) for successive values of T (and a plausible terminal condition) until their values stabilise, as proposed by Fair and Taylor (1983). It is also possible to establish whether such a value of T exists by checking that the first t submatrices of the final column of the inverse matrix vanish as T → ∞.
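The check of extending the horizon until the early-period solutions stabilise can be sketched on a scalar perfect-foresight model. This is a hedged illustration with assumed parameter values (a model satisfying the saddlepoint property), not the book's algorithm: the stacked system for y_t = a·y_{t−1} + d·y_{t+1} + b·x_t is solved for increasing T with an arbitrary terminal condition θ, and the first-period solution becomes insensitive to T and θ:

```python
import numpy as np

# Hedged sketch of the horizon check described above: solve the scalar
# perfect-foresight model y_t = a*y_{t-1} + d*y_{t+1} + b*x_t for increasing
# horizons T, with an arbitrary terminal condition y_{T+1} = theta, and watch
# the early-period solution stabilise. Parameter values are assumed, chosen so
# the saddlepoint property holds (one root inside, one outside the unit circle).
a, d, b, theta, y0 = 0.5, 0.4, 1.0, 3.0, 1.0

def solve(T, theta):
    # Stacked tridiagonal system: -a*y_{t-1} + y_t - d*y_{t+1} = b*x_t
    A = np.eye(T) - a * np.eye(T, k=-1) - d * np.eye(T, k=1)
    rhs = b * np.ones(T)          # take x_t = 1 for all t
    rhs[0] += a * y0
    rhs[-1] += d * theta          # terminal condition enters the last equation
    return np.linalg.solve(A, rhs)

y1 = [solve(T, theta)[0] for T in (5, 10, 20, 40)]
print(y1)   # successive values of y_1 settle down as T grows
```

The influence of θ on y₁ shrinks geometrically at the inverse of the unstable root, so doubling T quickly renders the terminal guess irrelevant for the early periods.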
Using Theil's technique for inverting an infinite-band matrix (Theil, 1964), this amounts to checking that the roots of the matrix difference equations:

A ξ_t − (I − D₀) ξ_{t−1} + D₁ ξ_{t−2} = 0   (7.5.3)

and

D₁ ξ_t − (I − D₀) ξ_{t−1} + A ξ_{t−2} = 0   (7.5.4)

all lie within the unit circle; i.e. that

ρ[ D₁^{−1}(I − D₀) ] < 1  and  ρ( [ D₁^{−1}(I − D₀)   −D₁^{−1}A ;  I   0 ] ) < 1

where ρ(·) denotes the spectral radius and D₁^{−1} may be replaced by a generalised left-inverse if, as usually happens in econometric models, D₁ is not of full rank. These spectral-radius conditions in fact imply two constraints on the parameters of the structural model which ensure that the usual 'saddlepoint' property guaranteeing a unique solution to (7.3.1) will hold. The saddlepoint property requires that there shall be as many unstable roots to the reduced-form system as there are rational expectations of future variables, the system's remaining roots all being stable in the conventional sense (Shiller, 1977). To illustrate, take the case where D₀ = 0 in (7.3.1). Then the spectral-radius conditions boil down to:

ρ(D₁^{−1}) < 1  and  ρ( [ D₁^{−1}   −D₁^{−1}A ;  I   0 ] ) < 1   (7.5.5)

Thus D₁ must have roots outside the unit circle, and also ρ( [ A   0 ;  I   0 ] ) = ρ(A) < 1, since the matrix involving A here defines the roots of just the lag structure of the model and hence supplies the other part of the 'saddlepoint'.
7.6
THE NUMERICAL SOLUTION OF RATIONAL-EXPECTATIONS MODELS
The analytical techniques discussed in previous sections may not always be the most convenient, and are certainly not - with the possible exception of the penalty-function method - the most efficient, way of obtaining numerical solutions to rational-expectations models. Furthermore, when the model is non-linear, analytical solutions will not, in general, exist. In this section we consider some numerical techniques which essentially treat the solution of non-linear (or possibly linear) rational-expectations models as a numerical two-point boundary-value problem. The initial values, y₀, provide one set of boundary conditions and the terminal conditions, θ, provide the remaining or end-point boundary conditions.
Assume that the non-linear stochastic rational-expectations model can be written:

f(y_t, y_{t−1}, …, y_{t−s}, x_t, x_{t−1}, …, x_{t−r}, y^e_{t+1|t}, ε_t) = 0  for  t = 1, …, T   (7.6.1)

where there are a maximum of s lagged values of the endogenous variables and a maximum of r lagged values of the exogenous variables. ε_t is, in general, a vector of multiplicative structural disturbances with a multivariate normal distribution with zero mean and a finite variance-covariance matrix. Taking 'expectations' as before, at time t the deterministic solution of (7.6.1), with ε_{t+i} = 0 for i ≥ 0, implies that along the solution path:

y^e_{t+τ|t} = y_{t+τ},  τ = 1, …, T − t   (7.6.2)

for given initial and terminal conditions. Thus a solution to (7.6.1) is sought such that each y^e_{t+1|t} is actually equal to the calculated value of y_{t+1}. We assume that the values of the sequence of policy instruments, x_t, and the sequence of exogenous variables, ε_t, for t = 1, …, T are either known or have been projected. Thus they are given as part of each information set Ω_t.
7.6.1 Jacobian methods

For convenience we define a new vector of exogenous variables:

v_t = y^e_{t+1|t}

We thus seek values of v_t such that:

v_t = y_{t+1}  for  t = 1, …, T − 1   (7.6.3)

If we start with an initial guess at v_t, t = 1, …, T, we can solve (7.6.1) forwards for given x_t, ε_t, v_t, t = 1, …, T, from the initial condition y₀ until the terminal period, where the boundary condition provides the value of v_T. Running equation (7.6.1) forward will generate a sequence of values for y_{t+1}. If the perfect-foresight condition (7.6.2) is not satisfied, then we seek an improved approximation v_t^{s+1}, t = 1, …, T, from which we can generate a further y_{t+1}^{s+1}, t = 1, …, T. Note that, for each forward-running solution of the model, the expectational proxy, v_t, is exogenous. Now let e_t^s, e_t^{s+1} be the respective errors in the approximation to the perfect-foresight solution, so that:

v_t^{s+1} = y_{t+1}^{s+1} + e_{t+1}^{s+1}   (7.6.4)

for t = 1, …, T. A Jacobian method for generating successive solutions (Anderson, 1979; Fair, 1979; Minford et al., 1980) uses variations on the formula:

v_t^{s+1} = v_t^s + Λ₀(y_{t+1}^s − v_t^s)

for t = 1, …, T, where Λ₀ is a constant determined on a trial-and-error basis to minimise the number of iterations necessary to achieve:

max_t |v_t^{s+1} − v_t^s| < τ   (7.6.5)
where τ is some predetermined tolerance factor. This convergence criterion is equivalent to defining convergence by:

max_t |y_{t+1}^{s+1} − v_t^{s+1}| < τ   (7.6.6)

for small values of τ (e.g. τ = 10^{−3}).

7.6.2 General first-order iterative methods

As always with raw first-order iterative solution techniques, convergence is not guaranteed and in many cases it may not be very rapid. Extrapolating those iterates, however, can often prevent problems of non-convergence, and it will usually reduce the computational burden substantially by accelerating the rate of convergence. But the results we need here have all been derived for conventional equation systems (i.e. those which can be solved recursively forwards, without the complications of forward-looking expectations terms). We therefore have to fit our rational-expectations model into a general framework to which these results can be applied, and within which the convergence of the various Jacobian methods can be systematically analysed and compared (Fisher, Holly and Hughes Hallett, 1986). Consider first an arbitrary linear equation system:

Ay = b   (7.6.7)
where A ∈ R^{n×n} is a known real matrix of order n, normalised to have unit diagonal elements, and where y and b are real vectors containing the unknown and predetermined parts of each equation respectively. Stationary first-order iterative techniques:

y^{(s+1)} = G y^{(s)} + c,  s = 0, 1, 2, …   (7.6.8)

with an arbitrary start y^{(0)}, are used routinely to construct the numerical solution to (7.6.7). They are also widely used for solving non-linear equation systems, in which case A represents the system's Jacobian matrix. First-order methods are based on the splitting A = P − Q where P is non-singular; and they are completely consistent with (7.6.8) when G = P^{−1}Q and c = P^{−1}b define the iteration matrix and forcing function. The Jacobi, Gauss-Seidel and successive over-relaxation (SOR) iterative methods are special cases of (7.6.8) where P = I, P = (I − L) and P = α^{−1}(I − αL) respectively, where L is the strictly lower triangular matrix with L_{ij} = −A_{ij} for i > j (and zeros elsewhere) and where α is an acceleration parameter to be chosen. Thus the Jacobi method would be:

y_i^{(s+1)} = Σ_{j=1}^{i−1} L_{ij} y_j^{(s)} + Σ_{j=i+1}^{n} U_{ij} y_j^{(s)} + b_i   (7.6.9)

while the SOR method sets:

y_i^{(s+1)} = α[ Σ_{j=1}^{i−1} L_{ij} y_j^{(s+1)} + Σ_{j=i+1}^{n} U_{ij} y_j^{(s)} + b_i ] + (1 − α) y_i^{(s)}   (7.6.10)

where A = I − L − U. The rate of convergence of any method represented by (7.6.8) can be increased by the extrapolation:

y^{(s+1)} = γ[G y^{(s)} + c] + (1 − γ) y^{(s)}   (7.6.11)
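The generic scheme (7.6.8) and its Jacobi and Gauss-Seidel splittings can be sketched numerically. This is a minimal illustration on an invented, diagonally dominant toy system (the matrix and right-hand side are assumptions, not from the text), using the unit-diagonal normalisation A = I − L − U:

```python
import numpy as np

# Generic first-order scheme y^{s+1} = G y^s + c from the splitting A = P - Q,
# with the extrapolated variant y^{s+1} = gamma*(G y^s + c) + (1-gamma)*y^s.
# The toy system below is an assumption for illustration; it is diagonally
# dominant, so both splittings converge.
A = np.array([[1.0, 0.3, 0.2],
              [0.1, 1.0, 0.3],
              [0.2, 0.1, 1.0]])
b = np.array([1.0, 2.0, 3.0])

def iterate(P, Q, b, gamma=1.0, iters=100):
    G = np.linalg.solve(P, Q)          # G = P^{-1} Q
    c = np.linalg.solve(P, b)          # c = P^{-1} b (the forcing function)
    y = np.zeros_like(b)
    for _ in range(iters):
        y = gamma * (G @ y + c) + (1.0 - gamma) * y
    return y

L = -np.tril(A, k=-1)                  # A = I - L - U, as in the text
U = -np.triu(A, k=1)

y_jacobi = iterate(np.eye(3), L + U, b)        # P = I (Jacobi)
y_gs     = iterate(np.eye(3) - L, U, b)        # P = I - L (Gauss-Seidel)
y_exact  = np.linalg.solve(A, b)
print(y_jacobi, y_gs, y_exact)
```

Both iterations reproduce the direct solution; setting gamma ≠ 1 gives the extrapolated (JOR/FGS-style) variants of (7.6.11).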
Two particular versions of (7.6.11) are widely used: the Jacobi over-relaxation (JOR) method with G = I − A, and the fast Gauss-Seidel (FGS) method with G = (I − αL)^{−1}[αU + (1 − α)I]. The convergence conditions, in terms of A or G, and the value of γ which maximises the rate of convergence are given in an appendix to this chapter.

Consider now the rational-expectations model at (7.3.1). Conditioning the expectations on Ω₁, and writing I − D₀ = A₁, the solution at (7.4.9) was obtained from:

[ A₁   −D₁    0   ⋯    0  ] [ y_{1|1} ]   [ B x_{1|1} + u_{1|1} + A y₀          ]
[ −A    A₁   −D₁  ⋯    0  ] [ y_{2|1} ]   [ B x_{2|1} + u_{2|1}                 ]
[  ⋮          ⋱        ⋮  ] [   ⋮     ] = [   ⋮                                 ]
[  0    ⋯    −A    A₁     ] [ y_{T|1} ]   [ B x_{T|1} + u_{T|1} + D₁ y_{T+1|1}  ]
                                                                        (7.6.12)

i.e. by solving:

Ā ȳ = b̄   (7.6.13)

where Ā is the block tridiagonal matrix in (7.6.12); ȳ is the stacked vector of the endogenous variables conditioned on Ω₁; and b̄ is the stacked vector of terms on the right of (7.6.12). The solution of a dynamic rational-expectations model therefore has exactly the same form as that for a conventional equation system. The differences are only that: (i) the unknowns of different time periods have to be determined simultaneously rather than recursively; (ii) the Jacobian matrix A has been replaced by the block tridiagonal matrix Ā; and (iii) the variables in b̄ are augmented by the terminal conditions y_{T+1|1} (if their contribution is not negligible for large values of T).
Therefore, we can continue to use simple first-order iterative techniques as a cheap way of constructing numerical solutions to rational-expectations models. However, whereas the ordering of the elements in y of (7.6.7) had no special significance, the equations in (7.6.12) are ordered by time periods. In conventional models, multiperiod solutions are generated sequentially forwards because that exploits the block recursive structure of (7.6.12) when D₁ = 0. Consequently the only relevant splittings of Ā are those defined within A₁ itself. But when D₁ ≠ 0 the block recursive structure is lost and it is important to consider splittings of Ā over time periods as well as over the equations within a given period. The three main possibilities are:

(a) Splittings of Ā, element by element, without regard to its block structure. These splittings define a family of simple first-order iterations on (7.6.12) treated as one large equation system.

(b) Splittings of Ā, element by element within each diagonal block, to define equation-by-equation iterations for each time period (solved sequentially forwards) with expectations temporarily fixed; and a separate block-by-block splitting for the above-diagonal submatrices to define the iterative steps which update those expectations terms. These splittings define a two-part iteration: an inner iteration which solves for the current variables of each period, and an outer iteration which updates the forward-expectations terms.

(c) Type (b) splittings in which the inner iterations are not taken to convergence at each outer-loop step. This can be done by setting a substantially weaker convergence criterion on the inner loop. Computational savings are made because computation is not wasted in getting a full inner-loop solution which is then going to be changed again by the values generated in the next outer-loop step.
Any of these iterations may be extrapolated in exactly the same way as (7.6.8), but type (b) and (c) methods allow different extrapolations to be applied to the inner and outer loops. In practice, given the sparseness of the block structure of Ā, and in order to exploit the sparseness in the A, A₁ and D₁ submatrices which is a typical feature of econometric models, it will be efficient computationally to perform these iterations equation by equation. Thus a type (a) method using FGS would be (all variables being conditioned on Ω₁):

y_{ti}^{(s−1/2)} = [ L y_t^{(s−1/2)} + U y_t^{(s−1)} + A y_{t−1}^{(s−1/2)} + D₁ y_{t+1}^{(s−1)} + B x_t + u_t ]_i

and

y_{ti}^{(s)} = γ y_{ti}^{(s−1/2)} + (1 − γ) y_{ti}^{(s−1)}   (7.6.14)
for i = 1, …, n, and each t = 1, …, T in turn.⁴ We have written L and U for the lower and upper triangular submatrices of D₀ in (7.6.14). Similarly a type (b) or (c) method using FGS would be:

y_{ti}^{(s−1/2)} = [ L y_t^{(s−1/2)} + U y_t^{(s−1)} + A y_{t−1} + D₁ ŷ_{t+1}^{(k)} + B x_t + u_t ]_i

and

y_{ti}^{(s)} = γ₀ y_{ti}^{(s−1/2)} + (1 − γ₀) y_{ti}^{(s−1)}   (7.6.15)

where the outer iteration is defined by:

ŷ_t^{(k+1)} = γ₁[ A y_{t−1}^{(k)} + D₀ y_t^{(k)} + D₁ ŷ_{t+1}^{(k)} + B x_t + u_t ] + (1 − γ₁) ŷ_t^{(k)}   (7.6.16)
given y^{(k)} as the 'converged' values from the last round of inner iterations by (7.6.15). Three special cases of these schemes have appeared elsewhere in the literature. The Anderson-Fair-Taylor method (Fair and Taylor, 1983) is a type (b) scheme, but it can be extended to include both incomplete inner iterations and extrapolated inner- and outer-loop iterations. The method suggested by Hall (1985) can be classified as either a type (a) scheme, or a type (c) scheme with a single inner-loop step per outer-loop step. Finally the multiple-shooting method (Lipton et al., 1982) can be written as an expanded version of a type (a) scheme (see section 7.6.4). There are obviously a great many other possibilities, and a range of these algorithms is investigated in Fisher, Holly and Hughes Hallett (1986). That study shows that the most efficient solution methods are the two-part iterative schemes, where the inner-loop iterations are not completed (i.e. they are terminated early by a weak convergence criterion applied only to those variables having forward expectations in the model), and where the inner- and outer-loop iterations are accelerated by using γ₀ and γ₁. However, the weaker inner-loop convergence criterion plays the major role because the logarithmic relation between the speed of convergence and the iteration matrix's spectral radius (see the appendix) means that more than proportional gains can be made by only loosely approximating convergence before invalidating it again with the next outer-loop step.
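The two-part inner/outer structure can be sketched on a scalar model. This is a hedged illustration with assumed parameters, not the book's algorithm: for y_t = a·y_{t−1} + d·v_t + b·x_t with expectations proxies v_t, the inner step solves the model recursively forwards with v fixed, and the outer step updates v towards the consistency condition v_t = y_{t+1}, with an extrapolation factor standing in for γ₁:

```python
import numpy as np

# Hedged sketch of a type (b) two-part scheme on the assumed scalar model
# y_t = a*y_{t-1} + d*v_t + b*x_t with v_t = y_{t+1}: the inner step solves the
# model forwards with the expectations proxy v fixed; the outer step updates v
# from the newly computed path (gamma1 plays the role of the outer extrapolation).
a, d, b, y0, T = 0.5, 0.4, 1.0, 1.0, 30
x = np.ones(T)
theta = 10.0                      # terminal condition y_{T+1} (the steady state)

v = np.zeros(T)                   # initial guess at the expectations terms
gamma1 = 1.0
for outer in range(200):
    # inner loop: with v fixed, the model is recursive and solves forwards
    y = np.zeros(T)
    ylag = y0
    for t in range(T):
        y[t] = a * ylag + d * v[t] + b * x[t]
        ylag = y[t]
    v_new = np.append(y[1:], theta)        # consistency target: v_t = y_{t+1}
    if np.max(np.abs(v_new - v)) < 1e-10:  # outer-loop convergence criterion
        break
    v = gamma1 * v_new + (1.0 - gamma1) * v

# check against the stacked simultaneous solution of the same model
A = np.eye(T) - a * np.eye(T, k=-1) - d * np.eye(T, k=1)
rhs = b * x
rhs[0] += a * y0
rhs[-1] += d * theta
print(np.max(np.abs(y - np.linalg.solve(A, rhs))))
```

The iterated path agrees with the stacked solve; because the model is solved with v exogenous, each inner pass works with a stable recursive system, exactly as the text describes.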
7.6.3 Non-linearities and the evaluation of multipliers

Non-linear rational-expectations models can be solved numerically by exactly the same first-order iterative methods, to yield R (the multiplier matrix)
directly. For example, the non-linear counterpart to (7.6.14) - the type (a) FGS method - would be:

f_i( y_{t1}^{(s−1/2)}, …, y_{ti}^{(s−1/2)}, y_{t,i+1}^{(s−1)}, …, y_{tn}^{(s−1)}, y_{t−1}^{(s−1/2)}, x_t, y_{t+1}^{(s−1)} ) = 0

and

y_{ti}^{(s)} = γ y_{ti}^{(s−1/2)} + (1 − γ) y_{ti}^{(s−1)}   (7.6.17)

for i = 1, …, n, and each t = 1, …, T in turn. Similarly (7.6.15)-(7.6.16) becomes:

f_i( y_{t1}^{(s−1/2)}, …, y_{ti}^{(s−1/2)}, y_{t,i+1}^{(s−1)}, …, y_{tn}^{(s−1)}, y_{t−1}, x_t, ŷ_{t+1}^{(k)} ) = 0

and

y_{ti}^{(s)} = γ₀ y_{ti}^{(s−1/2)} + (1 − γ₀) y_{ti}^{(s−1)}   (7.6.18)
The multipliers in R can now be evaluated numerically by simulating a series of unit shocks to each instrument in each period, δx_{ti}, i = 1, …, n and t = 1, …, T, around the forecast path just obtained. The target changes (δy_{τj}, j = 1, …, m, and τ = 1, …, T) then supply the (j, i)-th element of the submatrix R_{τt}.
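The perturbation approach just described can be sketched on a scalar-per-period linear model (an assumed toy model, so that the numerically evaluated multipliers can be checked against the analytic inverse; in a non-linear model only the perturbation route is available):

```python
import numpy as np

# Hedged sketch of multiplier evaluation by unit shocks, on an assumed stacked
# linear model A_bar y = b(x): shock the instrument in each period t around a
# baseline path and record the induced target changes, which supply the columns
# of R. For this linear case the result equals the analytic b * A_bar^{-1}.
a, d, b, y0, T = 0.5, 0.4, 1.0, 1.0, 8
A_bar = np.eye(T) - a * np.eye(T, k=-1) - d * np.eye(T, k=1)

def solve_path(x):
    rhs = b * x
    rhs[0] += a * y0
    return np.linalg.solve(A_bar, rhs)

x_base = np.ones(T)
y_base = solve_path(x_base)

R = np.zeros((T, T))
for t in range(T):                     # unit shock to the instrument in period t
    x = x_base.copy()
    x[t] += 1.0
    R[:, t] = solve_path(x) - y_base   # target changes give column t of R

print(np.max(np.abs(R - b * np.linalg.inv(A_bar))))
```

Note that R is not lower triangular here: a shock to a future instrument moves earlier periods too, which is exactly the 'non-causal' effect that prevents building these derivatives up recursively.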
Finally, although iterative methods such as JOR and FGS can be written as specialisations of the standard Newton solution method (Hughes Hallett, 1985), Newton methods themselves are not considered here because repeatedly evaluating and inverting matrices of partial derivatives is extremely expensive. To reach the sth step requires arithmetic operations of the order of sn²(n + 1) for the Newton method, whereas first-order iterations applied to (7.6.13) will require s Σ_j n_j operations (where n_j is the number of endogenous variables in the jth equation). Since n_j is typically no more than about 4 or 5 in econometric models, whereas n may be 100 or more, the advantage of first-order iterations is that they trade slower convergence for less computation per step. In fact, in conventional econometric models, first-order iterative solutions usually involve a fraction of the calculations required for even one step of (modified) Newton techniques (Fisher, Holly and Hughes Hallett, 1986).
7.6.4 Shooting methods

The third method which has been used for the numerical solution of rational-expectations models is 'shooting' (Keller, 1968), applied in economics by Lipton et al. (1982). In this method we assume that the terminal conditions are known, possibly as the equilibrium or steady-state solution implied by the known sequence of x_{t+j}, j = 0, …, T, where y_{t−j}, j = 1, …, s, and x_{t−j}, j = 0, …, r, are predetermined. We then seek the value of y₁ which, when the model is solved forward deterministically as:

f(y_{t+1}, y_t, φ_t) = 0  for  t = 1, …, T   (7.6.19)

implies the terminal condition y_{T+1}. φ_t summarises the predetermined variables and the x_t sequence. Equation (7.6.19) is written in this form to indicate that the model is solved with y_{t+1} on the left-hand side. Because of the saddlepoint property of rational-expectations models, this means that we are solving unstable equations forward. This contrasts with the previous two methods where, because the term y_{t+1} is exogenous for each model solution, stable equations are solved forward.

Thus, in its simplest form, the shooting method starts with an initial guess y₁^k, and solves the model forward ('shoots') to obtain an implied terminal value y_{T+1}^k. If y_{T+1}^k is equal to y*_{T+1}, then we have a solution to the two-point boundary-value problem. Otherwise we need a revision, y₁^{k+1}, to the initial guess, conditional on the difference between y_{T+1}^k and y*_{T+1}. The Newton-Raphson iteration would set:

y₁^{k+1} = y₁^k + (∂y_{T+1}/∂y₁)^{−1} (y*_{T+1} − y_{T+1}^k)   (7.6.20)

Given the saddlepoint property of the model (i.e. that, with increasing T values, expected future states have a vanishing impact on the current state), the larger T is, the smaller is the inverse Jacobian (∂y_{T+1}/∂y₁)^{−1}. This means that ultimately the algorithm will break down. In practice, even with quite small T values, updating by (7.6.20) often gives results which are wild enough to prevent convergence overall.

One way around this problem with the simple shooting method is 'parallel' or 'multiple' shooting. The full period of solution is divided into subintervals. For each subinterval a 'terminal' value is set, and an initial guess at the starting value of the subinterval is used to solve over the subinterval. The model is solved forward, using each subinterval initial guess to start the next subinterval. The information which this provides is used to update both the guess at the start of each subinterval and the 'terminal' guess at the end of each subinterval.
For the purposes of convergence analysis, the simple shooting method for the linear model (7.3.1) can be represented as follows:

A₁ y_t^k = A y_{t−1}^k + D₁ y_{t+1|t}^k + B x_t + u_t   (7.6.21a)

y_{t+1|t}^k = y_{t+1}^k   (7.6.21b)

y₁^{k+1} = y₁^k + N (y*_{T+1} − y_{T+1}^k)   (7.6.21c)
where N = (∂y_{T+1}/∂y₁)^{−1} contains a numerical evaluation of the transfer function (obtained, say, from the previous step), and A₁ = I − D₀ as before. To complete this specification some (iterative) method of solving (7.6.21a) to obtain y_{t+1|t}^k, given y_t^k and y_{t−1}^k, etc., must be included. Whatever this subsidiary iteration is, the complete shooting method corresponds to some splitting of the system:

[ A₁   −D₁    0   ⋯    0  ] [ y₁^k ]   [ B x₁ + u₁ + A y₀ ]
[ −A    A₁   −D₁  ⋯    0  ] [ y₂^k ]   [ B x₂ + u₂        ]
[  ⋮          ⋱        ⋮  ] [  ⋮   ] = [  ⋮               ]
[  0    ⋯    −A    A₁     ] [ y_T^k ]  [ B x_T + u_T      ]
                                                   (7.6.22)

the y_{t+1|t}^k terms having been eliminated from (7.6.21).⁵ Notice that the roots of the left-hand matrix in (7.6.22) are given by:

|Ā − λI| · |N − λI| = 0   (7.6.23)

where Ā is the same matrix as appeared in (7.6.13). Consequently, whatever the shooting method (splitting of (7.6.22)) chosen, it cannot converge any faster than the corresponding method applied to (7.6.13) as a single large equation system. Since we can, in general, reduce the computational burden by employing a type (b) or (c) method rather than a type (a) method for (7.6.13), and since (7.6.22) is anyway a longer system, the shooting method must actually be more expensive. In addition, rational-expectations models cannot always be solved forwards in the form (7.6.21a), even in the linear case (Preston and Pagan, 1982). In the non-linear case, the non-linear functions of the expectations variables must be invertible, and that cannot be guaranteed either. The shooting method may break down for these reasons. It has nevertheless been widely used for the solution of two-point boundary problems, and it can be used to provide open-loop solutions to policy-design problems approached via the minimum principle (Brogan, 1974; Driffill, 1982).
7.7
STOCHASTIC SOLUTIONS OF NON-LINEAR RATIONAL-EXPECTATIONS MODELS
The algorithms of the previous section are suitable for solving deterministic, zero-noise, non-linear models. One, though not necessarily convincing, justification for this is that it mirrors the normal practice among macroeconomic forecasters. Given that the true stochastic non-linear model is
equation (7.6.1), and leaving aside, for one moment, the role of y^e_{t+1|t}, consider the standard non-linear model used for forecasting. It is well known that the expected value of a non-linear function of a random variable is not equal to the non-linear function of the expected value of the random variable. What this implies is that a deterministic solution of the endogenous variables in a non-linear model, in contrast to that of a linear model, generally will not be equal to the conditional expectation of the endogenous variables. The conditional expectation can be approximated by stochastic simulation, and the difference between the deterministic path and the mean of the stochastic replications is one measure of non-linearity. In stochastic simulations, equation error terms are represented by pseudo-random shocks whose properties in general are determined by the corresponding error distribution obtained when the model is estimated. For each stochastic simulation, disturbances are input in each time period up to the horizon and, to provide efficient estimates of both the mean and the standard errors of the model, repeated stochastic simulations must be carried out, and this can be very expensive. A majority of the tests which have been carried out have actually found little difference between the deterministic solution and the mean of the stochastic replications. But a number of commentators (Brown and Mariano, 1981; Salmon and Wallis, 1980) have argued that this may be insufficient justification for confining forecasting to deterministic simulations. This leaves aside the issue of whether the forecaster wishes to generate model-based confidence bands around his forecast, which may be desirable whether the non-linearity biases are serious or not. The potential problems arising from stochastic disturbances in non-linear models are enormously compounded when we admit forward-looking expectations.
The first problem concerns whether we wish to make the rational expectation of a variable equivalent to the conditional or mathematical expectation. And the second problem concerns the way in which stochastic simulations are carried out. It is obvious that, if non-linear biases are found to be important, it is not sufficient to interpret the consistent expectation path provided by the algorithms of this section as the correct expectation, or certainty equivalent, in a stochastic model. This implies that the current consistent expectations may have to be obtained by stochastic simulation. And this brings us to the second difficulty. Normally, in stochastic simulations, a single replication of the solution is achieved in one sweep (though perhaps iterating to obtain the solution for each time period). Random disturbances are input one period after another by moving forward from the first to the last period. The results are stored and then there is a return to the first period to repeat the process. Unfortunately forward-looking expectations add an additional dimension to stochastic simulations because of a need to draw a distinction between anticipated and unanticipated disturbances.
Unless the endogenous variables in the consistent-expectations model possess no autoregressive structure (a highly unlikely occurrence), a stochastic shock at time period t must lead to a revision in expectations for periods t + τ (τ > 0) subsequent to t. The need to set stochastic disturbances equal to their expected values of zero in period t + τ, for τ ≥ 1, arises because these random disturbances are strictly unanticipated by agents at time period t. Because of this, just to be able to generate one stochastic replication for T time periods we need to solve the outer loop as well, but with each successive outer-loop solution one period shorter than the previous one. This means that one replication requires approximately K[T(T + 1)/2] periods to be solved, where K is the average number of iterations each solution requires. The total number of solution periods required to generate efficient estimates of the mean and standard-error bands may be prohibitively expensive. If it could be done, though, it would also provide error bands for expectations. We could also consider the effect of anticipations of exogenous variables on the error bands of the model, but here we would need to draw a distinction between anticipated transitory stochastic shocks to exogenous variables and anticipated permanent stochastic shifts in the mean behaviour of exogenous variables.
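One replication of the kind just described can be sketched for a scalar linear model (a hedged illustration with assumed parameters; the linearity keeps the example compact, whereas the book's concern is the non-linear case): because each period's shock is unanticipated, the consistent-expectations path must be re-solved from t onwards at every t, with future disturbances set to their zero means, which is the source of the T(T + 1)/2 solution-period count:

```python
import numpy as np

# Hedged sketch of one stochastic replication for the assumed scalar model
# y_t = a*y_{t-1} + d*y_{t+1|t} + b*x_t + u_t: the shock u_t is unanticipated,
# so after it is drawn the perfect-foresight path must be re-solved from t
# onwards with future disturbances at their zero means -- roughly T*(T+1)/2
# solution periods per replication.
a, d, b, y0, T = 0.5, 0.4, 1.0, 1.0, 12
x = np.ones(T)
theta = b / (1.0 - a - d)          # terminal condition at the steady state

def resolve(t0, ylag, u0):
    # consistent-expectations solve for periods t0..T-1, given y_{t0-1} = ylag
    # and a current-period disturbance u0; future disturbances expected zero
    n = T - t0
    A = np.eye(n) - a * np.eye(n, k=-1) - d * np.eye(n, k=1)
    rhs = b * x[t0:]
    rhs[0] += a * ylag + u0
    rhs[-1] += d * theta
    return np.linalg.solve(A, rhs)

rng = np.random.default_rng(1)
path = np.zeros(T)
ylag = y0
for t in range(T):
    u = 0.1 * rng.standard_normal()   # innovation, strictly unanticipated at t
    path[t] = resolve(t, ylag, u)[0]  # only the current period is realised
    ylag = path[t]
print(path)
```

Only the first element of each re-solve is realised; the rest are that period's (soon to be revised) expectations, which is why error bands for the expectations themselves drop out of the same computation.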
7.8 CONTROLLABILITY AND THE NON-NEUTRALITY OF POLICY UNDER RATIONAL EXPECTATIONS
One of the most arresting and provocative findings of the early rational-expectations literature was the so-called 'neutrality proposition' of Sargent and Wallace (1975). This proposition, perhaps more than any other single contribution, was responsible for establishing the notion of rational expectations in discussions of macroeconomic policy. What Sargent and Wallace were able to show was that, if one postulated some simple feedback rule with the money supply a function of lagged information, the distribution of output was independent of the feedback rule. This went much further than the claim of Friedman (1968) that in the long run output was independent of monetary policy, to the assertion that output was independent even in the short run. What is at issue is a question of controllability, and the extent to which rational-expectations models are controllable. The need to establish the conditions under which a model is controllable is logically prior to the design stage, so we must establish controllability criteria analogous to those of chapter 2 for the linear rational-expectations model. For the purposes of this section we shall be working with: (7.8.1)
7 THE LINEAR RATIONAL-EXPECTATIONS MODEL
This section extends the conventional controllability analysis to models containing rational expectations. However, it will be essential to do so in a way which lays bare the (dynamic) structure of the interactions between the expectations terms and the causal terms in the model. For otherwise it will not be possible to determine which particular features of a given model, or what general properties of the rational-expectations mechanism, are responsible for policy neutrality. Earlier studies have already established that policy ineffectiveness is not in fact a consequence of rational expectations in a number of particular cases. For example, Shiller (1977) and Persson (1979) show that certain departures from a log-linear specification of the expectations imply a role for monetary policy. So too do rigidities in the underlying goods or factor markets (see, for example, Fischer, 1979), or cases where each decision depends on the information of different dates or agents (e.g. Buiter, 1981a). Multiplicative uncertainties can also give policy a role. But, in each case, the point has been established by counter-example; a particular alternative model is used to demonstrate policy non-neutrality. That does not provide a framework for analysing the controllability of rational-expectations models, nor for determining what conditions are necessary for policy ineffectiveness.
7.8.1 Dynamic controllability

It will be helpful to keep in mind the derivation of the dynamic-controllability criterion for a model with endogenously generated expectations. Consider (7.8.1) with D_0 = D_1 = 0. Chapter 2 established that this system is dynamically controllable over the interval (1, t), for any t ≥ g, if and only if r(R_g, …, R_1) = g, where r(.) denotes rank and R_i = A^{i-1}B for i = 1, …, t. Dynamic controllability implies that desired values can be achieved for the endogenous variables y_t in period t, given any arbitrary starting position y_0. This relaxes the traditional theory of economic policy (Tinbergen, 1952). The dynamic-controllability conditions can be established as follows.6 Back-substituting in (7.8.1), when D_0 = D_1 = 0, yields:

y_t = Σ_{i=1}^{t} R_i x_{t-i+1} + Σ_{i=0}^{t-1} A^i u_{t-i} + A^t y_0    (7.8.2)
Now (7.8.2) is controllable over (1, t) if and only if it is possible to find a sequence of values x_1, …, x_t to reach any value y_t, given any y_0. But the random nature of u_{t-i}, as seen from the start of period 1, means that we can only guarantee a pre-assigned expected value y_{t|1}. Hence the expectation rather than the realisation of y_t is controllable. Inserting u_{t-i|1} on the right of (7.8.2) gives y_{t|1}. Dynamic controllability follows if we can solve for x_1, …, x_t,
given the ideal value for y_{t|1} and values for y_0 and u_{t|1}. Such a solution always exists provided r(R_t, …, R_1) = g. But if t ≥ g, then r(R_t, …, R_1) = r(R_g, …, R_1), which provides the result. Finally, if we are concerned only with a subset of m targets, we can delete the g − m non-target rows from each R_i and have dynamic controllability for those targets if r(R_m, …, R_1) = m.

Now the same arguments may be repeated to establish the controllability of (7.8.1). We saw earlier that such a model could be written in the form (7.4.9) when conditioned on Ω_1. In this context x_{1|1} was regarded as a policy instrument chosen for immediate execution, and x_{t|1} (t ≥ 2) as values chosen as expected future policies (in the absence of further information changes). We can then repeat the same controllability argument, but applied to:

y_{t|1} = Σ_{j=1}^{T} R_{tj} x_{j|1} + b_t    (7.8.3)

from (7.4.9), where b_t collects the terms in y_0, the expected disturbances and the terminal conditions. Consequently, (7.8.1) is dynamically controllable over (1, t), for t ≥ g, if and only if:

r(R_{tg}, …, R_{t1}) = g    (7.8.4)

and again a subset of m targets is dynamically controllable if and only if r(R_{tm}, …, R_{t1}) = m. The only effective difference is that each y_{t|1} is now influenced by x_{t+1|1}, …, x_{T|1} as well. The impacts of pre-announced or expected future instrument values are included through the reactions of y_{t|1} to expected future states of the economy and hence to future expected instruments. This means that the policy-maker can exploit pre-announcements of instrument values (or the agents' anticipations of them), in addition to causal reactions to current and past instruments, in order to help influence the economy. In general, therefore, expectations will increase the degree of economic control, unless the reactions to expectations happen to neutralise the causal and dynamic instrument impacts. A small change to the usual interpretation of dynamic controllability should be mentioned here: r(R_{1g}, …, R_{11}) = g implies dynamic control of y_{1|1} by means of setting x_{1|1} and pre-announcements or expectations for x_{2|1}, …, x_{T|1}. This may be distinguished from r(R_{Tg}, …, R_{T1}) = g, which implies the control of y_{T|1} through the causal impacts of x_{1|1}, …, x_{T|1}. The former criterion may or may not lead to the pre-announcements being realised (i.e. x_t = x_{t|1} for t ≥ 2); although alterations to the expectations themselves would almost certainly be introduced if systematic differences between announcements and out-turns had been observed in the past, and especially if x_t were rationally chosen (e.g. by optimal-control methods) but the pre-announcements were dynamically inconsistent or demonstrably suboptimal. The latter criterion, however, does not rely on pre-announcements, although the operation of expectations is
incorporated in it, and R_{1j} may in practice turn out to be rather different numerically from R_{Tj} for any j = 1, …, g. Hence the former case describes the controllability of the system as a longer-run proposition, and it may involve elements of control by exhortation, if not by tactical manipulation. The latter case indicates whether the policy-maker can expect to have controllability by a specific horizon T which, at the same time, is consistent with expecting to leave the economy in some given terminal state y_{T+1|1} that matches the current anticipations of other agents.
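Both rank criteria can be checked numerically. The sketch below uses toy matrices of our own; the stacked representation of the expectations model is an assumption in the spirit of (7.4.9), with the terminal expectation set to zero:

```python
import numpy as np

g, T = 2, 4
A = np.array([[0.5, 0.0], [0.4, 0.5]])
B = np.array([[1.0], [0.0]])
D1 = np.array([[0.0, 0.1], [0.1, 0.0]])   # weight on y_{t+1|1} (assumed)

# causal criterion: r(R_g, ..., R_1) = g with R_i = A^{i-1} B
R_causal = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(g)])
assert np.linalg.matrix_rank(R_causal) == g

# stacked RE system over t = 1..T: y_t = A y_{t-1} + B x_t + D1 y_{t+1|1}
M = np.eye(T * g)
for t in range(1, T):
    M[t*g:(t+1)*g, (t-1)*g:t*g] -= A       # dependence on y_{t-1}
    M[(t-1)*g:t*g, t*g:(t+1)*g] -= D1      # dependence on y_{t+1|1}
R = np.linalg.solve(M, np.kron(np.eye(T), B))   # all multipliers R_tj, incl. t < j

R_T = R[(T-1)*g:, :]    # impacts of x_{1|1}, ..., x_{T|1} on y_{T|1}
assert np.linalg.matrix_rank(R_T) == g
```

The matrix R contains nonzero blocks R_{tj} with t < j whenever D1 ≠ 0: the 'non-causal' channels through which pre-announced instruments act.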
7.8.2 An example: the non-neutrality of money
The value of this extended dynamic-controllability criterion is well illustrated by applying it to the small macroeconomic model of Sargent and Wallace (1975). The Sargent-Wallace model was the first of many used to demonstrate that the impact of expectations of future inflation may be such as to exactly offset the causal effects of the supply of money as an instrument for controlling real output. This has led to the proposition that a simple pre-assigned rule for the money supply is the most advisable policy to pursue. Investigating the dynamic controllability of this model shows this proposition to be invalid, because it applies only to one-period controllability when the potentially time-varying nature of the submatrices R_{tj} in (7.8.3) is also ignored. In short, real output can be controlled at t = 2 through the causal and dynamic effects in the system as a whole, although first-period instruments and pre-announced instruments are both neutralised by the expectations mechanism. The Sargent-Wallace model may be written as:
z_t = a_1 k_{t-1} + a_2 p_t − a_2 p_{t|t-1},  for a_i < 0, i = 1, 2    (7.8.5)

k_t = d_1 k_{t-1} + d_2 (r_t − p_{t+1|t} + p_t)    (7.8.6)

p_t = m_t − c_1 z_t − c_2 r_t,  for c_1 > 0, c_2 < 0    (7.8.7)

(7.8.8)
where all non-instrument exogenous factors have been set to zero, without loss of generality for controllability, and (7.8.6) has been renormalised. Here z_t = real output, k_t = productive capacity, p_t = the price level, r_t = the rate of interest, and m_t = the money supply. All variables are expressed in logarithms. The endogenous and instrument coefficient matrices follow directly on stacking (7.8.5)-(7.8.8),
which yields the corresponding evaluation of the parameter matrices A and B of (7.8.1), whose entries are combinations of the structural parameters a_i, c_i, d_i and β_i scaled by a composite coefficient γ > 0.
Now, direct evaluation of (7.8.3) for the case of T = 2 yields:

R_{11} = [I + (I − D_0)^{-1} D_1 F A](I − D_0)^{-1} B  and  R_{12} = (I − D_0)^{-1} D_1 F B

with

R_{21} = F A (I − D_0)^{-1} B  and  R_{22} = F B

which defines F. Thus the dynamic controllability of y_{2|1} depends on r(R_{22}, R_{21}). In particular, control of output, z_{2|1}, alone requires r(R*_{22}, R*_{21}) = 1, where R*_{2i} is the top row of R_{2i} (for i = 1, 2). The latter matrices are:
R_{22} = [−e_1 f/h, (2e_2 f − 1)/h, −1/h, −e_3 f/h]′,  where h = (1 − e_2 f)(c_2 − 1),

and

R_{21} = [0, 0, 0, 0]′
and

f = 2γ d_2 c_2 a_2 β_4
Notice that the inequalities of (7.8.5)-(7.8.8) ensure that h ≠ 0, and that −e_2 f < 0. Consequently all the policy impacts ∂y_{2|1}/∂x_t are finite for t = 1, 2. In particular, expected real output in period 2 is controllable by the money supply in period 2, although the first-period money supply can influence neither output in period 2 nor r_{2|1}, p_{2|1} and k_{2|1}. Real output, z_{2|1}, can actually remain uninfluenced by the money supply, m_2, in this model if f = 0 or if e_1 = 0. Now f = 0 follows if d_2 = 0, so that productive capacity becomes independent of anticipated changes in real interest rates (r_t − p_{t+1|t} + p_t); or if c_2 = 0, so that prices are independent of interest rates; or if a_2 = 0, so that not even unanticipated price movements influence output, which is then predetermined; or if β_4 = 0, so that interest rates are independent of output levels.7 Similarly, e_1 = 0 follows only if a_1 = 0 (since γβ_1(c_1 − c_2) = −1 generally occurs with probability zero). Hence, if policy ineffectiveness is observed or postulated, it could be the result of any one of several different factors, only one of which is due to the special form of the 'only surprises matter' supply function specified by (7.8.5). It may be true (as is often argued) that the absence of all anticipated price effects in (7.8.5) is a necessary condition for the independence of first-period output, z_{1|1}, from money, m_1. But the same does not hold for subsequent periods, because the absence of all price effects (both anticipated and unanticipated) is the necessary condition for z_{2|1} and m_2 to be independent. Nevertheless, the specification of this supply function (and a_2 in particular) continues to play a special role here, in the sense that the impact of m_2 on z_{2|1} is weakened more rapidly if a_2 decreases than if the other parameters in (7.8.5)-(7.8.8) decrease. As a_2 is reduced, e_1 f/h tends to a function of second order in a_2 but of first order in the other parameters.
The dynamic controllability of first-period targets in y_{1|1}, on the other hand, depends on r(R_{12}, R_{11}). In this case the top (output) elements of both R_{11} and R_{12} are zero, the remaining elements being functions of c_2, f and h.
So real output is not controllable in period 1, either statically or by making pre-announcements of the money supply. Also, y_{1|1} can only be influenced by the current money supply. Hence the control of real output in period 2 is achieved indirectly, that is, through the dynamics of the model and the imposition of a fixed horizon or terminal state for the economy. In this framework money is certainly not neutral with respect to its impact on real output. To extend this example slightly, suppose we assume that the relation y_{T+1|1} = δy_{T|1} holds. Then repeating the above calculations indicates that the leading element of R_{22} becomes (−e_1 f/h + δγ²a_2 c_2). It is easily checked that this new value is non-zero with probability one (i.e. except by absolute chance). Moreover, (I − D_0)^{-1} D_1 has eigenvalues of zero and c_2/(c_2 − 1). But c_2 < 0, so ρ((I − D_0)^{-1} D_1) < 1 is guaranteed. Consequently the controllability of the model in (7.8.5)-(7.8.8) does not depend crucially on the assumption of some terminal condition.

7.8.3 Conclusion: the stochastic neutrality of policy

The proposition that money is neutral with respect to its impact on real output is sometimes said to be a stochastic-neutrality theorem: the probability distribution of real output is independent of any decision rule for the money supply. Dynamic controllability has allowed us to examine a particular version of stochastic neutrality: namely, that the first moment of the distribution of real output is independent of money. This is a necessary (but not sufficient) condition for any stochastic-neutrality theorem. We have demonstrated that stochastic neutrality depends on the assumption either that only one decision is implemented or that output is independent of anticipated prices.
The neutrality of a single decision plus policy pre-announcements (identified correctly in the usual money-neutrality proposition), when contrasted with the non-neutrality of the same target to decisions in a two-period problem (with any further pre-announcements implicit in y_{T+1|1}), emphasises that a pre-announced policy and the same policy when implemented can prove to have different effects. So it is the single-decision assumption, rather than just rational expectations in a strictly neoclassical model, which is necessary for stochastic neutrality. Concentrating on static controllability in a world where multiperiod planning is possible, and where anticipations generally play a role, may give an inadequate account of the opportunities for policy-making.
8 Policy design for rational-expectations models

8.1 INTRODUCTION
In this chapter we consider the problem of designing optimal economic policies for models in which expectations are forward-looking. This area throws up a number of intriguing problems which do not appear in the standard causal model. The backwards inductive process of dynamic programming, which plays such an important role in standard optimal-control theory, does not work in non-causal models because the constrained objective function is no longer recursively separable with respect to time, and the decision variables are no longer nested with respect to each time period. Therefore, the backwards recursion technique of dynamic programming is unable to satisfy all of the first-order conditions necessary for optimality. The stacked form of the rational-expectations model developed in chapter 7 and the penalty-function method are capable of being used to generate policies which satisfy all the first-order conditions for optimality. But there still remains a difficulty if these policies, examined ex post, appear not to be optimal because of the so-called dynamic-inconsistency problem. We examine various aspects of this problem and attempt to draw out the possible analogy with game theory. However, we postpone a full discussion of game theory until chapters 9 and 10.
8.2 THE DYNAMIC-PROGRAMMING SOLUTION
As before (chapter 3), the basic problem is to determine the vector of instruments (x_t: 1 ≤ t ≤ T) to minimise the quadratic objective function:

J = 1/2 Σ_{t=1}^{T} (δy'_t Q_t δy_t + δx'_t N_t δx_t)    (8.2.1)

but subject now to the linear rational-expectations model:

y_t = A y_{t-1} + B x_t + C e_t + D y^e_{t+1|t} + ε_t    (8.2.2)
Consider the optimisation problem for the terminal period. As in chapter 7, we assume that the transversality or terminal conditions θ have already been specified for y_{T+1|T}. Then we aim to minimise the terminal-period objective function:

J(T, T) = 1/2(δy'_T Q_T δy_T + δx'_T N_T δx_T)    (8.2.3)

subject to:

y_T = A_T y_{T-1} + B_T x_T + C_T e_T + Dθ + ε_T    (8.2.4)
where, for period T, A_T = A, B_T = B, C_T = C. This yields the feedback solution:

x_T = K_T y_{T-1} + k_T    (8.2.5)

where

K_T = −(N_T + B'Q_T B)^{-1} B'Q_T A
k_T = −(N_T + B'Q_T B)^{-1}(B'Q_T C e_T + B'Q_T Dθ − B'Q_T y^d_T − N_T x^d_T)    (8.2.6)
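The terminal-period rule (8.2.5)-(8.2.6) can be checked numerically; the sketch below uses arbitrary toy matrices of our own, and verifies that the rule zeroes the first-order condition of (8.2.3) subject to (8.2.4):

```python
import numpy as np

rng = np.random.default_rng(0)
g, n = 3, 2                                   # endogenous variables, instruments
A = rng.standard_normal((g, g))
B = rng.standard_normal((g, n))
C = rng.standard_normal((g, g))
D = rng.standard_normal((g, g))
Q = np.eye(g)
N = 0.5 * np.eye(n)
y_prev = rng.standard_normal(g)               # y_{T-1}
e = rng.standard_normal(g)                    # e_T
theta = rng.standard_normal(g)                # terminal conditions
y_d = rng.standard_normal(g)                  # y_T^d
x_d = rng.standard_normal(n)                  # x_T^d

S = np.linalg.inv(N + B.T @ Q @ B)
K_T = -S @ (B.T @ Q @ A)
k_T = -S @ (B.T @ Q @ (C @ e) + B.T @ Q @ (D @ theta) - B.T @ Q @ y_d - N @ x_d)

x_T = K_T @ y_prev + k_T
# first-order condition: N(x_T - x_T^d) + B'Q(A y_{T-1} + B x_T + Ce_T + D theta - y_T^d) = 0
foc = N @ (x_T - x_d) + B.T @ Q @ (A @ y_prev + B @ x_T + C @ e + D @ theta - y_d)
assert np.allclose(foc, 0)
```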
Using Ω_{T-1}, we have at the previous period, T − 1:

y_{T-1} = A y_{T-2} + B x_{T-1} + C e_{T-1} + D y^e_{T|T-1} + ε_{T-1}    (8.2.7)

Substituting (8.2.5) into (8.2.4) and the result into (8.2.7), we have:

y_{T-1} = Ā_{T-1} y_{T-2} + B̄_{T-1} x_{T-1} + C̄_{T-1} e_{T-1} + H_{T-1} D C e_{T|T-1} + H_{T-1} D B k_T + H_{T-1} D²θ + H_{T-1} ε_{T-1}    (8.2.8)

where

Ā_{T-1} = H_{T-1} A_T,  B̄_{T-1} = H_{T-1} B_T,  C̄_{T-1} = H_{T-1} C_T

and H_{T-1} = [I − D(A_T + B_T K_T)]^{-1}.
After substitution, an expression for the minimum cost at time period T can be written in the quadratic form, just as in section 3.2:

J(T, T) = y'_{T-1} P_T y_{T-1} − 2 y'_{T-1} h_T + a_T    (8.2.9)
where P_T and h_T are identical to equations (3.2.7a) and (3.2.7b), and a_T is identical to (3.2.7c) except for an additional term in Dθ, where now the coefficient matrices are Ā_{T-1}, B̄_{T-1}, C̄_{T-1}, etc. The two-period problem is to minimise:

J(T − 1) = δy'_{T-1} Q_{T-1} δy_{T-1} + δx'_{T-1} N_{T-1} δx_{T-1} + J(T, T)    (8.2.10)
The minimisation of (8.2.10), constrained by (8.2.8), has exactly the same form as minimising (8.2.3) subject to (8.2.4). When the rule for x_{T-1} based on Ω_{T-1}
is constructed, the decision-maker must account for the public's reactions today to what will be done tomorrow. But a backwards recursive procedure for determining x_{T-1|T-1} cannot incorporate the reaction of y_{T-1|T-1} to the values anticipated for x_{T|T-1} while determining the decision rule for x_{T|T-1} en route to determining the rule for x_{T-1|T-1}. To account for these effects we need to optimise y_{T-1|T-1} and x_{T|T-1} simultaneously, whereas the use of a recursive procedure requires y_{T-1|T-1} to be invariant to x_{T|T-1} at the first backwards step.
8.3 NON-RECURSIVE METHODS

8.3.1 A direct method
The failure of the recursive relationship of section 8.2 to provide a convenient method of obtaining an optimal-feedback solution (Chow, 1981) does not mean that there are no alternative methods which can be used to obtain optimal policies. Consider the final form of section 7.4, in which we employ a stacked form of the linear rational-expectations model subject to the boundary values y_0 and θ. The quadratic objective function in stacked form is:

J = (δZ)'Q̄(δZ)    (8.3.1)
where Q̄ is the stacked block matrix of weights, with diagonal blocks Q_1, …, Q_T and N_1, …, N_T and off-diagonal blocks M_1, …, M_T, and δZ = (δY' : δX')'. Rewriting the stacked rational-expectations model (7.4.9):

δY = R δX + b    (8.3.2)
where b = s − Y^d + R X^d. Note that R is no longer block triangular. Hence ∂y_t/∂x_j = R_{tj} ≠ 0 even if t < j, which defines non-causality. The constrained objective function now is:

J = 1/2(δX)'S'Q̄S(δX) + (δX)'S'Q̄b + constants    (8.3.3)

where as usual S' = (R' : I_{nT}), so that the minimisation of (8.3.1) gives:

X* = X^d − (S'Q̄S)^{-1} W    (8.3.4)
where W = S'Q̄b. Thus non-causality causes neither theoretical nor computational complications here. All the 'non-causal' multipliers R_{tj} for t < j are explicitly included in the evaluation of (8.3.4).
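A minimal numerical sketch of the direct method (toy numbers of our own, with Q̄ taken block diagonal, i.e. no M_t cross terms, so that S'Q̄S = R'QR + N and W = R'Qb):

```python
import numpy as np

rng = np.random.default_rng(1)
nT = 6                                  # stacked dimension (e.g. T periods, one variable)
R = rng.standard_normal((nT, nT))       # dense: includes non-causal blocks R_tj, t < j
Q = np.eye(nT)
N = 0.1 * np.eye(nT)
Y_d = rng.standard_normal(nT)
X_d = rng.standard_normal(nT)
s = rng.standard_normal(nT)             # constant terms of the stacked model
b = s - Y_d + R @ X_d                   # as defined below (8.3.2)

# X* = X^d - (S'QS)^{-1} W with the block-diagonal weighting assumed above
X_star = X_d - np.linalg.solve(R.T @ Q @ R + N, R.T @ Q @ b)

# the gradient of J = dY'Q dY + dX'N dX, with dY = R dX + b, vanishes at X*
dX = X_star - X_d
assert np.allclose(R.T @ Q @ (R @ dX + b) + N @ dX, 0)
```

Nothing in the computation cares whether R is block triangular or not, which is the point made in the text.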
8.3.2 A penalty-function approach
Another way of handling the 'non-causal' aspects of forward-looking rational-expectations models is by an extension of the penalty-function solution procedure (Holly and Zarrop, 1983) described in section 7.4.3. There, obtaining a solution to a two-point boundary-value problem was posed as a quadratic minimisation problem.1 We write the quadratic objective function in a slightly different stacked form from (8.3.1), one which is more closely related to (3.2.2), and augment it with (7.4.12):

J* = 1/2[δY'QδY + δX'NδX + (Y^e − Y^s)'P(Y^e − Y^s) + Y^e'G Y^e]    (8.3.5)
The optimal X consistent with rational expectations, together with the expectations Y^e associated with it, is given by a partitioned linear system whose (symmetric) coefficient matrix contains the blocks

R'QR + (VR)'P(VR) + N,   R'QK + (VR)'P(I + VK)
K'QR + (I + VK)'P(VR),   K'QK + (I + VK)'P(I + VK) + G    (8.3.6)

and whose right-hand side involves Y^d, X^d, s and v.
k, γ, v and s were defined in equations (7.4.11) and (7.4.14). This expression has been kept complex deliberately, in contrast to the more compact form of (8.3.4). It shows how the optimal solution lies on the convergent rational-expectations path, and how the terminal conditions captured by v are an integral part of the solution. Despite the complexity, the numerical solution is straightforward, since in practice the vector of expectational variables, Y^e, will simply be added to the vector of policy instruments, and the vector Y^s, which shifts the endogenous variables forward in time, will be added to the vector of target variables. The only additional features which need to be added to a standard optimal-control procedure are a shift function and a means of setting the terminal conditions according to the methods described in section 7.5.
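The mechanics of the penalty-function idea can be sketched on an assumed scalar toy model (all names and numbers below are our own, not the book's): the expectations vector Y^e is appended to the instruments, a penalty ρ‖Y^e − shift(Y)‖² is added, and one large quadratic problem is solved; as ρ grows the solution is driven onto the consistent-expectations path.

```python
import numpy as np

a, b_c, d = 0.6, 1.0, 0.3      # assumed model: y_t = a y_{t-1} + b_c x_t + d y^e_{t+1}
T, y0, rho = 5, 1.0, 1e8
Y_d = np.ones(T)               # target path
N = 0.1 * np.eye(T)            # instrument penalties

M = np.eye(T) - a * np.eye(T, k=-1)               # M Y = b_c X + d Ye + a*y0*e1
c0 = np.zeros(T); c0[0] = a * y0
Minv = np.linalg.inv(M)
G_x, G_e, base = b_c * Minv, d * Minv, Minv @ c0  # Y = G_x X + G_e Ye + base

shift = np.eye(T, k=1)         # (shift @ Y)_t = Y_{t+1}; terminal expectation is zero

# one least-squares problem in Z = [X; Ye]:
# minimise ||Y - Y_d||^2 + X'NX + rho*||Ye - shift Y||^2
A_ls = np.vstack([
    np.hstack([G_x, G_e]),
    np.hstack([np.sqrt(N), np.zeros((T, T))]),
    np.hstack([np.sqrt(rho) * (-shift @ G_x), np.sqrt(rho) * (np.eye(T) - shift @ G_e)]),
])
r_ls = np.concatenate([Y_d - base, np.zeros(T), np.sqrt(rho) * (shift @ base)])
Z = np.linalg.lstsq(A_ls, r_ls, rcond=None)[0]
X_opt, Ye_opt = Z[:T], Z[T:]
Y_opt = G_x @ X_opt + G_e @ Ye_opt + base

# a large rho drives the expectations onto the consistent path Ye = shift(Y)
assert np.max(np.abs(Ye_opt - shift @ Y_opt)) < 1e-3
```

In effect Y^e behaves exactly like an extra block of instruments, and the shifted endogenous variables like an extra block of targets, as the text describes.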
8.4 TIME INCONSISTENCY
It has been shown that the recursive method of dynamic programming fails to work in models in which expectations are forward-looking, because the constrained objective function is no longer additively recursive with respect to time. Non-recursive methods, however, are capable of generating optimal (open-loop) policies because they first substitute out the interdependencies between time periods created by the expectations terms. By analogy with the non-recursive methods of chapter 3, it would appear that the need for policy adjustments in response to unanticipated and anticipated disturbances could be satisfied by the use of sequential reoptimisations, so that the loss of an explicit optimal-feedback solution for rational-expectations models is not serious at the practical level. Unfortunately the use of a sequential procedure is not quite straightforward, because of a particular complication known as time inconsistency, identified by Kydland and Prescott (1977) and Calvo (1978).

8.4.1 A scalar demonstration
It is easier to explain the nature of time inconsistency in rational-expectations models if we employ simple scalar examples. The generalisation to more realistic models is considered later. The aim is to minimise some general preference function:

J(y_t, y_{t+1}, x_t, x_{t+1})    (8.4.1)

where, following Kydland and Prescott's notation, y_t, y_{t+1} are target variables or states of the world in time periods t and t + 1, and x_t and x_{t+1} are linearly independent policy instruments:

y_t = f_t(x_t, x_{t+1})    (8.4.2a)
y_{t+1} = f_{t+1}(y_t, x_{t+1})    (8.4.2b)
These equations can be related to the functional forms of chapter 7 if we assume that the initial and terminal conditions are zero. If we assume differentiability and an interior solution, the first-order conditions for the optimality of (8.4.1) are:

∂J/∂x_t + (∂f_t/∂x_t)[∂J/∂y_t + (∂J/∂y_{t+1})(∂f_{t+1}/∂y_t)] = 0    (8.4.3)

∂J/∂x_{t+1} + (∂J/∂y_{t+1})(∂f_{t+1}/∂x_{t+1}) + (∂f_t/∂x_{t+1})[∂J/∂y_t + (∂J/∂y_{t+1})(∂f_{t+1}/∂y_t)] = 0    (8.4.4)
These first-order conditions correspond to the optimal solution provided by the non-recursive method of the previous section. But the solution provided by
dynamic programming for period t + 1 is:

∂J/∂x_{t+1} + (∂J/∂y_{t+1})(∂f_{t+1}/∂x_{t+1}) = 0    (8.4.5)
which is a function only of y_{t+1} and x_{t+1}. For time period t the solution is:

∂J/∂x_t + (∂f_t/∂x_t)[∂J/∂y_t + (∂J/∂y_{t+1})(∂f_{t+1}/∂y_t)] = 0    (8.4.6)
which of course corresponds to (8.4.3). The dynamic-programming algorithm, using the backwards recursion, will not correspond to the global optimum unless ∂f_t/∂x_{t+1} = 0, that is, unless the model is causal; or unless the direct and indirect effects of y_t on J are exactly offsetting, that is, if ∂J/∂y_t + (∂J/∂y_{t+1})(∂f_{t+1}/∂y_t) = 0.

8.4.2 Suboptimal versus time-inconsistent policies

Kydland and Prescott refer to the policy sequence (x_t, x_{t+1}) provided by (8.4.5) and (8.4.6) as consistent.
Definition: A policy sequence (x_t, x_{t+1}, …, x_T) is consistent if, for each time period τ, x_τ maximises J(y_t, y_{t+1}, …, y_T, x_t, x_{t+1}, …, x_T) taking as given previous decisions x_t, …, x_{τ-1}, and given that future policy decisions (x_s, s > τ) are similarly selected.

It should be clear from the above that the dynamic-programming method does not even provide an optimal-policy sequence, unless there are no forward-looking expectations, since it cannot satisfy the first-order conditions for the maximisation of (8.4.1). However, the dynamic-programming method provides an explicit feedback solution, and there may be other reasons, such as the ease with which the public can understand rules rather than a full sequential reoptimisation, which may make suboptimal rules attractive. This possibility will be considered later, but for the moment it can be seen that the non-recursive method of the previous section appears (ex ante) to provide a consistent optimal solution. However, the possibility of inconsistency emerges when we examine policy from the vantage point of a subsequent period. It is found that it is no longer optimal to continue with the ex-ante optimal policy. In the two-period example, if policy is re-examined ex post, taking x_t and y_t as bygones, the first-order condition for minimising J(y_{t+1}, x_{t+1}) is now (8.4.5). Ex-ante optimal decisions are therefore time-inconsistent.
8.4.3 A simple example

Some of the issues raised by time inconsistency can be drawn out more clearly with a numerical example. Consider the following three-period problem. Minimise:

(8.4.7)
subject to:

y_t = x_t + x_{t+1}    (8.4.8)
y_{t+1} = x_t − x_{t+1} + x_{t+2}    (8.4.9)
y_{t+2} = x_t + x_{t+1} + x_{t+2}    (8.4.10)
where J_t refers to the objective function from the vantage point of time period t. Substituting the constraints (8.4.8) to (8.4.10) into (8.4.7), the ex-ante optimal solution is:

x°_t = 2/7,  x°_{t+1|t} = −1/7,  x°_{t+2|t} = 1/3    (8.4.11)
In this example, x_{t+i|t} denotes a value for x_{t+i} as computed in period t. The minimum ex-ante cost is J° = 0.2381. An alternative, consistent, solution provided by the backward recursions of dynamic programming is:

x^c_{t|t} = 3/10,  x^c_{t+1|t} = −1/10,  x^c_{t+2|t} = 1/4    (8.4.12)
the cost of which is J^c_t = 0.255. Clearly the dynamic-programming solution yields a higher cost than the ex-ante, time-inconsistent, optimal solution. The inconsistency can be observed by re-examining the policy problem from the vantage point of period t + 1. It is assumed, for the moment, that in the first instance (time period t) x°_t is implemented and it is believed that x°_{t+1} and x°_{t+2} will be implemented. The two-period problem is now to minimise:

(8.4.13)

subject to:

y_{t+1} = x°_t − x_{t+1} + x_{t+2}    (8.4.14)
y_{t+2} = x°_t + x_{t+1} + x_{t+2}    (8.4.15)
This yields the ex-post optimal solution:

x°_{t+1|t+1} = −3/28,  x°_{t+2|t+1} = 1/3    (8.4.16)
If the public once again believe that the policy-maker will implement x°_{t+2|t+1}, the policy-maker can then recalculate x_{t+2} in period t + 2. This gives the result x°_{t+2|t+2} = 11/56. So the ex-post, perfect-cheating, cost is J_t = 0.1955. On the face of it, we have three different policies. But it is important to observe that the perfect-cheating solution is a feasible option only if it is actually believed in the first period that the ex-ante optimal policy will be implemented in subsequent periods. The scope for achieving a lower cost by reneging in subsequent periods is critically dependent upon this belief. Now suppose that the policy-maker was not believed when the ex-ante optimal
policy was announced, and instead it was assumed that he would renege in the future whenever he could improve the (current) evaluation of J by doing so. Consequently the actual first-period state would be different. Thus, evaluating the cost of the cheating policy on an ex-ante, rather than ex-post, basis yields the cost J_{t+1|t} = 0.2445, which must of course be greater than or equal to the cost of the ex-ante optimal policy, since by definition (8.4.11) is the minimising policy for (8.4.7) and there cannot be another policy sequence which yields a lower cost, given a quadratic loss function and the linear constraints (8.4.8)-(8.4.10). On this basis the policy-maker would prefer to stick with his original solution of (8.4.11) and J = 0.2381. What has happened here is that, even in the absence of information changes of any kind, the private sector (whose responses are represented by the values taken by y_t, y_{t+1} and y_{t+2}) perceives the government's temptation to renege on its earlier promise of x_{t+1|t} = −1/7 and to execute instead x_{t+1|t+1} = −3/28. What is more, the private sector can predict in period t that this temptation will arise in period t + 1 simply by going through the same set of calculations, to yield the results in (8.4.11) and (8.4.16), before period t + 1 even arrives, just as we have done in this example, because there is no uncertainty or informational change involved. In this case the private sector would, in period t, anticipate x_{t+1} = −3/28, despite the announced value of −1/7, and would alter their responses for y_t (in period t) accordingly. The government, however, is just as capable of predicting the private sector's behaviour as the private sector is of predicting the actions of the government (since there is no uncertainty about the model).
Consequently the government will realise that abandoning its promise will actually lead to worse results because of private-sector retaliation, so it ought to retain the ex-ante optimal policy provided by (8.4.11) in preference. Thus reneging by the government, and the private sector's responses, can be conceived of as mutually offsetting threats. This example is important for showing one reason why decision-makers might resist the temptation to renege, despite the effects of forward-looking expectations. We shall attempt a more general explanation later in this chapter, and in the next two chapters we shall draw out the apparent connection between rational-expectations models and the underlying game between agents in the economy. Despite the superiority of the ex-ante optimal policy, we cannot assume that it will be carried out ex post, especially in period t + 2, when there are clear incentives for the policy-maker to renege with no come-backs in subsequent periods. To try to eliminate the indeterminacy implicit in the ex-ante optimal solution we need to appeal to other considerations, such as binding agreements or questions of reputation. These issues are taken up later.
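Since (8.4.7) is not reproduced above, the following sketch substitutes an assumed quadratic objective, J = y_t² + (y_{t+1} − 1)² + y_{t+2}² + x_t² + x_{t+1}² + x_{t+2}² (the targets and unit instrument penalties are our own choice), while keeping the constraints (8.4.8)-(8.4.10). The mechanism is the same: once x_t is sunk and y_t is bygone, the period-(t+1) re-optimisation moves x_{t+1} away from its announced value.

```python
import numpy as np

G = np.array([[1.0,  1.0, 0.0],    # y_t     = x_t + x_{t+1}
              [1.0, -1.0, 1.0],    # y_{t+1} = x_t - x_{t+1} + x_{t+2}
              [1.0,  1.0, 1.0]])   # y_{t+2} = x_t + x_{t+1} + x_{t+2}
y_target = np.array([0.0, 1.0, 0.0])

# ex-ante optimum of the assumed J: solve (G'G + I) x = G' y_target
x_exante = np.linalg.solve(G.T @ G + np.eye(3), G.T @ y_target)

# period t+1 re-optimisation: x_t is sunk, the y_t term is bygone
G2 = G[1:, 1:]
r2 = y_target[1:] - G[1:, 0] * x_exante[0]
x_replan = np.linalg.solve(G2.T @ G2 + np.eye(2), G2.T @ r2)

# the re-planned x_{t+1} deviates from its announced value: time inconsistency
assert abs(x_replan[0] - x_exante[1]) > 1e-6
```

The deviation arises exactly as in (8.4.5) versus (8.4.4): the ex-post first-order condition for x_{t+1} drops the term in y_t, which is non-zero at the ex-ante optimum.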
8.4.4 The case of zero policy-adjustment costs
There is a special case in which the possibility of time inconsistency does not arise, and where the ex-ante optimal policy also coincides with the dynamic-programming solution. If there are no costs attached to varying policy instruments, and there are at least as many instruments as targets (this is the condition for static, or Tinbergen, controllability), the dynamic optimisation separates into T one-period static optimisations. The future (or non-causality) does not matter, because the targets y^d_t can be achieved by penaltyless changes to the instruments at each t. The objective function for the three-period example is then:

J_t = y_t² + (y_{t+1} − 1)² + (y_{t+2} − 1/2)²    (8.4.20)

the optimal instrument values for which are x°_t = 1/4, x°_{t+1} = −1/4 and x°_{t+2} = 1/2. Intuitively, the reason why time inconsistency does not arise in this particular case is that these policy values are the unconstrained minimisers of (8.4.20), the optimal cost of which is zero. So, by definition, there is no policy, either now or in the future, which will yield a lower cost. An interesting situation arises if there are fewer instruments than targets and no costs are attached to varying policy instruments. Consider the problem of minimising:
subject to:

y_t = x_{t+1}
y_{t+1} = -x_{t+1} + x_{t+2}
y_{t+2} = x_{t+1} + x_{t+2}
where now there are three targets (for three periods) but only two instruments, x_{t+1} and x_{t+2}. The optimal instrument values at time t are x°_{t+1|t} = -1/3 and x°_{t+2|t} = 1/2. But, when we move forward to period t + 1 and re-evaluate the optimal policy, we find that x°_{t+1|t+1} = -1/2 and x°_{t+2|t+1} = 1/2. Thus, even when there are zero instrument penalties, the ex-ante optimal policy is inconsistent. In a sense, the non-causal structure of the model actually enhances the degree of controllability, since the policy-maker can influence the target y_t indirectly through the 'expectational' influence of x_{t+1}. This is a particular case of the general nature of dynamic controllability in the rational-expectations models discussed in section 7.8.
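The arithmetic of this two-instrument example can be checked directly. The sketch below assumes the constraints y_t = x_{t+1}, y_{t+1} = -x_{t+1} + x_{t+2} and y_{t+2} = x_{t+1} + x_{t+2} with ideal values (0, 1, 0); this is an assumed stacking consistent with the optimal values quoted above, not a quotation of the chapter's own equations. The plan made at t minimises all three squared deviations, while at t + 1 the y_t term is history and drops out:

```python
import numpy as np

# Stacked constraints: rows are y_t, y_{t+1}, y_{t+2}; columns x_{t+1}, x_{t+2}.
A = np.array([[1.0, 0.0],
              [-1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, 1.0, 0.0])   # ideal target values

# Ex-ante plan at t: least squares over all three targets.
x_at_t, *_ = np.linalg.lstsq(A, b, rcond=None)

# Re-optimisation at t+1: y_t can no longer be influenced, so drop its row.
x_at_t1 = np.linalg.solve(A[1:], b[1:])

print(x_at_t)    # [-1/3, 1/2]
print(x_at_t1)   # [-1/2, 1/2]
```

The planned x_{t+1} changes from -1/3 to -1/2 purely because the vantage point has moved, which is the inconsistency described in the text.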
8.5 MICROECONOMIC INSTANCES OF INCONSISTENCY
In this section we consider some of the microeconomic cases of dynamic
inconsistency because of the light they help to throw on the nature, and possible resolution, of inconsistency problems in macroeconomic policy-making. The problem of dynamic inconsistency was first examined in a classic paper by Strotz (1956). The problem arose, not in a model in which expectations were rational, but from time-varying preferences. Strotz (1956) considered the problem of a consumer with an initial endowment of consumer goods who must choose among a number of alternative time paths for consumption.2 The initial endowment, K_0, of consumer goods has to be allocated over the finite interval (0, T). At time period t the consumer has the preference function, indexed on utility and satisfying the basic axioms of choice:

J_0 = ∫_0^T λ(t − 0) U[c(t), t] dt   (8.5.1)

where c(t) is the instantaneous rate of consumption at time period t and λ(t − 0) is a discount factor, the value of which depends upon the lapse of time between a past or future date and the present. The maximisation of (8.5.1) with respect to consumption over the interval (0, T), denoted by [0, c(t), T], subject to the budget constraint:
∫_0^T c(t) dt = K_0   (8.5.2)

so that there are no bequests, implies that the discounted marginal utility of consumption should be the same for all time periods, i.e.:

λ̇(t − 0)/λ(t − 0) = − U̇_c(t)/U_c(t)   (8.5.3)
The resulting path for consumption is optimal from the point of view of τ = 0. But, at some later date, τ > 0, the consumer may - or must if J_τ is actually to be maximised - reconsider his consumption plan. The problem is then to maximise:

J_τ = ∫_τ^T λ(t − τ) U[c(t), t] dt   (8.5.4)

subject to [0, c(t), τ], already given, and:

∫_τ^T c(t) dt = K_τ = K_0 − ∫_0^τ c(t) dt

Again the optimal path for consumption will have the property that:

λ̇(t − τ)/λ(t − τ) = − U̇_c(t)/U_c(t)   (8.5.5)

This solution may be entirely different from (8.5.3) because the discount function has been shifted.
To persist in a pattern of consumption simply because it was optimal from the vantage point of a previous period will not be rational if this pattern of consumption is not optimal as perceived from the vantage point of the present period. The optimal pattern of consumption will change with changes in τ. Nevertheless, for the consumer to be fully rational he must also be aware that there will be these continuous reassessments of the pattern of consumption. An important question concerns the conditions under which the optimal plan at time τ > 0 will be the continuation of the plan formulated at τ = 0. For this we require that if [0, c*(t), τ′] and [τ′, c(t), T] maximise:

∫_0^{τ′} λ(t − 0) U[c(t), t] dt + ∫_{τ′}^T λ(t − 0) U[c(t), t] dt   (8.5.6)

subject to:

∫_0^{τ′} c(t) dt + ∫_{τ′}^T c(t) dt = K_0

then [τ′, c(t), T] should maximise:

∫_{τ′}^T λ(t − τ′) U[c(t), t] dt

subject to:

∫_{τ′}^T c(t) dt = K_0 − ∫_0^{τ′} c*(t) dt

or that [τ′, c*(t), T] should imply both:

λ̇(t − 0)/λ(t − 0) = − U̇_c(t)/U_c(t)   (8.5.7)

and:

λ̇(t − τ′)/λ(t − τ′) = − U̇_c(t)/U_c(t)   (8.5.8)
A necessary and sufficient condition for (8.5.7) and (8.5.8) to be equal is that all future dates are discounted at a constant rate, i.e. λ(t) = e^{−kt}. This condition can be seen more clearly in the following example. Suppose, in discrete time, we have a preference function:

J_t = y_t^2 + α_t(y_{t+1} − 1)^2 + β_t y_{t+2}^2 + α_t x_{t+1}^2 + β_t x_{t+2}^2   (8.5.9)
which is identical to (8.4.7) except that now periods in the future are being discounted by α_t and β_t. Equation (8.5.9) is optimised subject to the causal constraints:

y_t = x_{t+1}
y_{t+1} = −x_{t+1} + x_{t+2}
y_{t+2} = x_{t+1} + x_{t+2}
The first-order condition for x°_{t+1|t} is:

α_t(x_{t+1} − x_{t+2} + 1) + β_t(x_{t+1} + x_{t+2}) + 2α_t x_{t+1} = 0   (8.5.10)
In period t + 1 we assume that, now, the policy-maker discounts the future at the discount rate α_{t+1}, so:

J_{t+1} = (y_{t+1} − 1)^2 + α_{t+1} y_{t+2}^2 + x_{t+1}^2 + α_{t+1} x_{t+2}^2   (8.5.11)
The first-order condition for x°_{t+1|t+1} is then:

(x_{t+1} − x_{t+2} + 1) + α_{t+1}(x_{t+1} + x_{t+2}) + 2x_{t+1} = 0   (8.5.12)
Thus, while in period t time period t + 2 is discounted at the rate β_t, in period t + 1 time period t + 2 is discounted at the rate α_{t+1}. Because of the shift in the discount factor, there is no necessary reason why x°_{t+1|t} should be equal to x°_{t+1|t+1}. As should be clear from Strotz's example, x°_{t+1|t+1} will be equal to x°_{t+1|t} if and only if α_t = α_{t+1} and β_t = α_t^2.

8.6 PRECOMMITMENT, CONSISTENT PLANNING AND CONTRACTS
Faced with a constant tendency to renege on former plans, Strotz suggested that an individual might decide to do something about it. An individual might find, for example, that, because of a strong preference for immediate satisfaction, he saves little, if any, of current income. But at the same time he regrets not having saved more in the past. In an attempt to combat this myopic behaviour he could adopt two alternative strategies. First, he could enter into precommitments by which he was obliged to carry out his plan. Entering into a contractual savings scheme might be one way of enforcing savings.3 The second course of action which might be adopted, if precommitment is not feasible, is what Strotz calls a strategy of consistent planning. The individual may resign himself to the fact that the optimal plan he formulates today cannot actually be implemented, so he might as well select the current decision which is best, given that he will go back on his plans in the future. He seeks the best plan among those he will actually follow. One consistent plan is that provided by the dynamic-programming approach of section 8.2. At time period T, for given transversality conditions
and y_{T+1} predetermined, the value x_T is calculated. Then for the previous period, y_T is given and y_{T−2} predetermined, so we can calculate x_{T−1}. At each stage of the recurrence relationship, bygones are treated as bygones and the principle of optimality has been imposed. The question is whether a macroeconomic policy-maker would or should pursue the consistent policy. In the case of an individual, aware of the possible inconsistency of his planning, there may be a desire actually to reject any plan which is not followed through. The problem is then to find the best plan among those that will actually be followed. Is the consistent policy the best among those which will actually be followed by a macroeconomic policy-maker? Consider again the simple scalar example of section 8.4. In a model in which expectations are forward-looking, it is not simply a matter of isolating the policy which is best among those which will actually be followed; we also need to determine whether the policies we study are believed or not. Let us first examine the polar case of believed and carried-through policies, leaving to one side, for the moment, the question whether these are optimal or not. What is the status of the consistent - dynamic-programming - policy? Clearly it is inferior to the ex-ante optimal policy if the ex-ante optimal policy is actually implemented. Nevertheless, we have reason to believe that the ex-ante optimal policy will not be pursued because the government can renege in future periods. But suppose - as seems likely - the private sector is aware that the government will renege. That is, for the problem of section 8.4, the inconsistent sequence x°_{t|t}, x°_{t+1|t+1}, x°_{t+2|t+2} is pursued. We have already established that, if this happens - if the private sector is aware that it will be cheated - the actual outcome will yield a cost, ex post, greater than (or, at most, equal to) the cost of pursuing the ex-ante optimal policy in the first place.
The next question is how the consistent dynamic-programming solution ranks as compared with the 'cheating' solution, of which the private sector is aware. Again, it is clear from section 8.4 that even if the government were to sequentially revise its policies as it went along - and the private sector was aware of what was going on - it would still yield an outcome superior to the consistent solution of dynamic programming. We can conclude, therefore, that, at least for the example of section 8.4, a process of sequential reoptimisations - of which the private sector is aware - which at each time period treats bygones as bygones, will be superior to the policy suggested by the recurrence method of dynamic programming, but will be inferior to the ex-ante optimal policy. We can conclude, therefore, that the consistent policy produced by dynamic programming is not the best among those which are believed and could be followed. The ex-ante superiority of the time-inconsistent policy depends upon the policy-maker being able to fool the private sector so that implicit costs, in the
form of errors, (x°_{t+1|t} − x°_{t+1|t+1}), (x°_{t+2|t+1} − x°_{t+2|t+2}), are being imposed. The two- and three-period examples we have considered tend to give undue emphasis to the scope for reneging. But, if we consider problems with much longer horizons, the possibility that a government could repeatedly renege without the private sector cottoning on becomes more and more implausible. It may well be that, early on, the government was able to fool the private sector. But with the passage of time there would be a loss of confidence in the announcements that the government made, and a consequent deterioration in performance, as economic agents became more unsure about what the government was up to. Once its reputation has been called into question, it might be very difficult to rebuild and very costly in terms of what could have been achieved if confidence in the government had not been undermined. The loss of reputation that a government would suffer suggests that the best option is for it to pursue the ex-ante optimal policy. But how is the government to convince the private sector that it is this policy which will be pursued in the future, given that, come the future, it will be in the interests of the government - and in some instances in the 'interests' of the private sector - for policy to be switched? Among private individuals contracts can provide a legal mechanism by which two parties - when the benefits due to each party accrue in different time periods - can be bound to honour an agreement. Mutually advantageous trades which occur simultaneously do not suffer from the problem that one of the parties to the trade might subsequently renege on his part of the bargain. Barters which involve intertemporal transactions might be particularly prone to problems of dynamic inconsistency, which may be a partial explanation for the use of money and other negotiable instruments in transactions.
Governments can be legally bound if there is an independent constitution and an independent judiciary which can arbitrate upon possible violations of the constitution. But a more likely constraint upon a possible course of dynamically inconsistent policy-making by the government would be the loss of reputation. A government might embark upon a term of office with the realisation that in the early stages it might be able to get away with a measure of cheating. However, with time the private sector would realise what it was doing, modify its behaviour accordingly and no longer trust the government. In this case the best course may be to stick to the ex-ante optimal policy in the first place. Whether such a policy was optimal, ex post, would depend upon how quickly the private sector learned, the length of the planning horizon and the rate at which the reputation of the government deteriorated.

8.7 RULES VERSUS DISCRETION
The prospect of dynamic inconsistency in rational expectations models has been brought to bear on the argument about the merits of fixed rules for
economic policy relative to discretion. Indeed, the findings of Kydland and Prescott (1977) have been used as a theoretical basis for the case against discretionary macroeconomic policy, a line of reasoning which is distinct from the argument of Friedman (1968), which is based on the presence of uncertainty. When they identified the possibility of ex-post inconsistency of optimal plans in models in which expectations were forward-looking, Kydland and Prescott (1977) went a step further and argued that the problems with the dynamic-programming algorithm, which we have considered already, suggested that there was 'no way control theory can be made applicable to economic planning when expectations are rational'. If one identified control theory solely with the methods of dynamic programming this might be true. But this book demonstrates that we can use non-recursive methods - which actually draw on an older tradition than modern control theory, based on the theory of economic policy developed by Tinbergen and Theil. The maximum principle can also be used (see Driffill, 1980). But Kydland and Prescott's conclusion that simple (fixed) rules were superior to discretion in forward-looking models was not in fact a direct inference from the dynamic inconsistency of optimal plans but was suggested by some numerical experiments with an investment-tax-credit model. What they did was to examine what happens when an investment function is estimated and then the tax rate and tax credit determined on the basis of this function to minimise the value of a specified social-welfare function. But the rule for the tax rate and the tax credit is determined under the incorrect assumption that the structure of the investment function is invariant to the policy rule. With a change in policy rule there is a shift in the investment function. This is interpreted as a structural change and the function is re-estimated and the policy rule determined anew.
This causes a further shift in the investment function, re-estimation and a new policy rule. They found that for most sets of parameter values this iterative process did not converge and actually destabilised economic activity relative to some passive rule. As an analogue of the conduct of economic policy over time their examples point to an important lesson about the conduct of discretionary policy, whether the policy is determined by optimal control or not. To conduct policy on the assumption that expectations are not forward-looking, when in fact they are, can be very hazardous. Some additional evidence for this has been provided in Holly and Zarrop (1983). But it is not clear that this point has anything to do with the time-inconsistency issue per se. What it tells us is that the misspecification of the proper dynamics can lead to instability. That is, if equation (8.2.3) is the true structure, but by mistake the policy-maker assumes D = 0, then the application of a feedback rule:

x_t = K_t y_{t−1} + k_t   (8.7.1)
to (8.2.3) produces:

y_t = (A + BK_t)y_{t−1} + Ce_t + D y_{t+1|t} + Bk_t   (8.7.2)

which can be written in the first-order form of chapter 7:

y_t = Ā y_{t+1|t} + C̄ e_t + m_t   (8.7.3)

where now Ā, C̄ and m_t are assembled from (A + BK_t), D, C and Bk_t as in chapter 7. It is quite possible that the application of the feedback rule (8.7.1) to (8.2.3), on the assumption that D = 0, will increase the absolute size of the eigenvalues associated with the forward terms in (8.7.3), and could even push the eigenvalues outside the unit circle. The dynamic-programming solution of section 8.2, by contrast, is not based on a misspecification of the dynamics, since it takes account of the forward-looking terms in the backward induction. It may not be optimal, relative to the ex-ante optimal solution, but it provides a fairer benchmark against which to evaluate simple rules than that used by Kydland and Prescott.
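The hazard can be seen in a scalar sketch. The numbers below are illustrative assumptions, not taken from the text: suppose the true structure is y_t = a y_{t−1} + b x_t + d y_{t+1|t}, and the policy-maker, believing d = 0, picks the deadbeat rule K = −a/b. Writing the closed-loop equation forward as y_{t+1|t} = (1/d)y_t − ((a + bK)/d)y_{t−1}, the eigenvalues of the forward recursion can be checked directly:

```python
import numpy as np

a, b, d = 0.7, 1.0, 0.5      # illustrative parameters (assumed)
K = -a / b                   # 'deadbeat' feedback designed as if d = 0

# Companion matrix of y_{t+1|t} = (1/d) y_t - ((a + b*K)/d) y_{t-1}
companion = np.array([[1.0 / d, -(a + b * K) / d],
                      [1.0, 0.0]])
eigenvalues = np.linalg.eigvals(companion)
print(max(abs(eigenvalues)))   # 2.0
```

With |d| < 1 and the causal part deadbeat, the surviving root is 1/d, outside the unit circle: ignoring D when designing the rule leaves an explosive root in the forward dynamics, which is the instability described above.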
8.8 SEQUENTIAL OPTIMISATION AND CLOSED-LOOP POLICIES
The discussion of dynamic inconsistency so far has been concerned almost exclusively with non-stochastic models and thus with various kinds of open-loop policy, since the basic issues would only have been obscured by the addition of stochastic elements. But in the real world we are concerned with uncertainty and responses to both anticipated and unanticipated disturbances. For this reason we need to examine the role of a sequential reoptimisation procedure which uses the non-recursive method of section 8.3, and the possible effects on it of dynamic inconsistency. It should now be clear that the use of a sequential approach could generate a time-inconsistent solution. More precisely, as new information about the unanticipated innovations, and anticipated and unanticipated changes in exogenous assumptions, becomes available, the sequential revision to policy will be made up of two different elements. First there will be the normal innovation-dependent policy responses, but secondly there can be a policy change which will reflect the fact that past expectations of current events are no longer a binding constraint. So, even if there were no stochastic disturbances or shifts in exogenous assumptions, there would still be a change in policy which reflects the ex-post inconsistency of optimal plans.
We have already argued that, because of the need to retain a good reputation, a policy-maker will have a strong incentive to conform to the ex-ante optimal policy. In section 8.4 we argued that even though it might appear that he could do better by changing his announced course, this would only work to the policy-maker's advantage if everyone were fooled, i.e. only if the possibility of inconsistent behaviour were not perceived and the private sector did not change its responses to these predictable policy changes.
8.8.1 Open-loop decisions
Consider again a dynamic econometric model augmented with rational expectations of future outcomes:

y_t = Ay_{t−1} + Bx_t + D_0 y_{t|t} + D_1 y_{t+1|t} + u_t   (8.8.1)
Recall that, from chapter 7, in period 1 the expectations for each y_t are obtained from equation (7.4.9). Now stack the policy variables as Z′ = (y′_{1|1}, ..., y′_{T|1}, x′_{1|1}, ..., x′_{T|1}), where each y_{t|1} may be understood to be a subset of targets from the full set of endogenous variables of (8.8.1). Let Z^d, partitioned conformably, be the ideal (but infeasible) values. Consider the quadratic objective function:

(1/2)δZ′QδZ + q′δZ   (8.8.2)
where Q is symmetric and positive definite and δZ = Z − Z^d. Deleting any equations in (8.8.1) which correspond to non-targets, we can rewrite the constraints as:

HδZ = b   (8.8.3)
where H = [I : −R] and b = s − HZ^d. Hence the optimal policies are obtained as:

δZ* = arg min_{δZ} [(1/2)δZ′QδZ + q′δZ − μ′(HδZ − b) | Ω_1]   (8.8.4)

where μ is a vector of Lagrange multipliers. We can apply the conventional certainty-equivalence theorem directly in (8.8.4) and, therefore, minimise instead:
(1/2)δZ′QδZ + q′δZ − μ′[HδZ − E_1(b)]   (8.8.5)

The first-order conditions imply:

[Q  −H′] [δZ*]   [ −q    ]
[H   0 ] [μ* ] = [ E_1(b)]   (8.8.6)
8S Sequential optimisation, closed-loop policies
165
So:

δZ* = Q^{−1}{H′(HQ^{−1}H′)^{−1}E_1(b) − [I − H′(HQ^{−1}H′)^{−1}HQ^{−1}]q}   (8.8.7)

and:

μ* = (HQ^{−1}H′)^{−1}[E_1(b) + HQ^{−1}q]   (8.8.8)
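As a numerical check on this saddle-point algebra, the sketch below builds a small random problem (illustrative data, not from the text) and confirms that solving the stacked first-order conditions reproduces the closed-form expression for δZ*:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite
H = rng.standard_normal((m, n))      # constraint matrix (full row rank)
q = rng.standard_normal(n)
b = rng.standard_normal(m)           # stands in for E_1(b)

# Stacked first-order conditions: Q dZ - H'mu = -q,  H dZ = b.
kkt = np.block([[Q, -H.T],
                [H, np.zeros((m, m))]])
sol = np.linalg.solve(kkt, np.concatenate([-q, b]))
dZ_kkt = sol[:n]

# Closed-form solution of the same problem.
Qi = np.linalg.inv(Q)
S = np.linalg.inv(H @ Qi @ H.T)
dZ_formula = Qi @ (H.T @ S @ b - (np.eye(n) - H.T @ S @ H @ Qi) @ q)

print(np.allclose(dZ_kkt, dZ_formula))   # True
```

The two routes agree because the closed form is just the stacked system solved by block elimination.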
If uncertainty is indeed confined to u_t (for t = 1, ..., T), then δZ* provides the current expectation of the ex-ante optimal-policy choice and its expected outcome.
8.8.2 Optimal revisions
The remaining problem is to determine how to revise each policy selection optimally at each t ≥ 2, given unanticipated shocks (s_{t−j} ≠ s_{t−j|t−j} for j ≥ 1), past achievements (realisations for y_{t−j}, j ≥ 1) and changed expectations of the future (s_{t+j|t} ≠ s_{t+j|t−1} for j ≥ 0). This may be done by inserting updated values from Ω_t into (8.8.5) and reoptimising the policy variables still to be implemented. These variables can be partitioned between past realisations and future choices:

δY′ = [δY*′_t, δY′_{(t)}] and δX′ = [δX*′_t, δX′_{(t)}]   (8.8.9)

where:

δY*′_t = (δy*′_1, ..., δy*′_{t−1}) and δX*′_t = (δx*′_1, ..., δx*′_{t−1})   (8.8.10)
so the realisations for past targets and instruments are δZ*′_t = (δY*′_t, δX*′_t). Meanwhile δZ′_{(t)} = (δy′_{t|t}, ..., δy′_{T|t}, δx′_{t|t}, ..., δx′_{T|t}) may be reoptimised at t. First we must partition the parameters of the objective function with respect to the past and the future. If

Q = [Q*  M]
    [M′  N]

conforms to the partitioning in δY and δX defined by (8.8.9), then expanding (1/2)δZ′QδZ and collecting terms:

(1/2)δZ′QδZ = (1/2)δZ*′_t Q_0 δZ*_t + δZ*′_t Q_1 δZ_{(t)} + (1/2)δZ′_{(t)} Q_{(t)} δZ_{(t)}   (8.8.11)

where Q_0, Q_1 and Q_{(t)} are the past-past, past-future and future-future blocks of Q formed from Q*, M and N under this partitioning.
Similarly q′δZ = q′_0 δZ*_t + q′_{(t)}δZ_{(t)}, the first term of which may be ignored in the reoptimisations; but the linear term has become [q′_{(t)} + δZ*′_t Q_1] in place of the original q′_{(t)}, and the quadratic term involves only Q_{(t)}. In the same way the components of the model (8.8.1) can be partitioned between the past and the future. This was done in section 7.4.3, equations (7.4.16) and (7.4.17), when multiperiod solutions were derived for (8.8.1). Hence, at t ≥ 2, the constrained objective, (8.8.4), becomes (ignoring constants):
(1/2)δZ′_{(t)}Q_{(t)}δZ_{(t)} + [q_{(t)} + Q′_1 δZ*_t]′δZ_{(t)} − μ′_{(t)}[H_{(t)}δZ_{(t)} + H^{(1)}δZ*_t − b_{(t)}] − μ*′_t H^{(2)}δZ_{(t)}   (8.8.12)

where μ′ = [μ*′_t, μ′_{(t)}] is also partitioned with respect to the past and the future; and H_{(t)} = [I : −R_{(t)}], H^{(2)} = [0 : −R^{(2)}] and H^{(1)} = [0 : −R^{(1)}] are all derived from the conformable partitioning of:

H = [I : −R]   (8.8.13)

between the periods before t and from t onwards. Finally b_{(t)} contains the lower m(T − t + 1) elements of b. Notice that no policy variables of the past have been deleted until after the constrained-objective function has been reconstructed on the basis of Ω_t. Thus we have an optimisation problem of precisely the same form as (8.8.5) but with Q replaced by Q_{(t)} and H by H_{(t)}. In addition, since −H^{(1)}δZ*_t = R^{(1)}δX*_t, we must replace:

q by {Q′_1 δZ*_t + [0 : R^{(2)}]′μ*_t + q_{(t)}}   (8.8.14)

and b by {R^{(1)}δX*_t + E_t[b_{(t)}]}   (8.8.15)
where certainty equivalence has been reapplied to (8.8.13). Therefore optimally revised policies are given by (8.8.8) with Q, H, q and E_1(b) replaced, respectively, by Q_{(t)}, H_{(t)} and the quantities in (8.8.14) and (8.8.15). Of course only x*_{t|t} itself will be implemented (with the expectation of achieving y*_{t|t}) before the next revision. This sequence of revisions ensures that, at implementation, each subvector x*_{t|t} is multiperiod certainty-equivalent and innovations-dependent. Moreover, x*_{t|t} and y*_{t|t} are constructed using x*_{t−1|t−1}, ..., x*_{1|1} and y*_{t−1|t−1}, ..., y*_{1|1}, etc. Thus each x*_{t|t} depends on expectations based on all Ω_t, ..., Ω_1. Finally, (8.8.14) and (8.8.15) guarantee that there is no reneging because they ensure that each x*_{t|t} would remain unchanged from its initial value x*_{t|1} in the absence of any information changes (Ω_1 = ... = Ω_T). Indeed, the terms not involving R^{(2)} in (8.8.14) and (8.8.15) are standard for sequential
optimisations using a causal model (Theil, 1964). And the H^{(2)}′μ*_t terms are simply an extension for non-causal models. Taken together, (8.8.14) and (8.8.15) constrain each reoptimisation to the 'initial conditions' created by the previous optimal choices. By forcing the continued satisfaction of (8.8.8) at every t, as the first-order conditions from (8.8.7), they necessarily ensure that, for any fixed value of s, each reoptimisation reproduces the same values for the decisions outstanding.
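The no-reneging property can be illustrated with a stylised check (illustrative data; a conditioned version of the stacked quadratic problem rather than the full partitioning of (8.8.9)-(8.8.15)): once the first-period decisions are implemented, re-optimising the remainder with the inherited cross-term in the linear part, and the constraint residual left by the past, reproduces the original plan exactly when no new information arrives:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, past = 6, 2, 2                     # 6 decisions, 2 constraints, 2 implemented
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
H = rng.standard_normal((m, n))
q = rng.standard_normal(n)
b = rng.standard_normal(m)

def solve_qp(Q, q, H, b):
    """Minimise (1/2) z'Qz + q'z subject to Hz = b via the stacked FOCs."""
    k = H.shape[0]
    kkt = np.block([[Q, -H.T], [H, np.zeros((k, k))]])
    return np.linalg.solve(kkt, np.concatenate([-q, b]))[:Q.shape[0]]

z_plan = solve_qp(Q, q, H, b)            # ex-ante optimal plan
z_past = z_plan[:past]                   # decisions already carried out

# Reoptimise the outstanding decisions: quadratic block of the future,
# linear term augmented by the past cross-term, and b adjusted for the
# share of the constraints already absorbed by the past.
q_adj = q[past:] + Q[past:, :past] @ z_past
b_adj = b - H[:, :past] @ z_past
z_revised = solve_qp(Q[past:, past:], q_adj, H[:, past:], b_adj)

print(np.allclose(z_revised, z_plan[past:]))   # True
```

Because the restriction of a constrained quadratic optimum remains optimal for the conditioned problem, the revision changes nothing in the absence of news; only genuine innovations move the plan.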
8.9 SOURCES OF SUBOPTIMAL DECISIONS
In summary, we can now identify two possible sources of suboptimal decisions in rational-expectations models. First, recursive 'optimisation' techniques, such as dynamic programming, which ignore the non-causal elements represented by R^{(2)}, will yield suboptimal coefficients in the (feedback) decision rules generated by any one information set, as pointed out by Kydland and Prescott (1977). To apply the principle of optimality requires the constrained-objective function to have a time-nested structure. Because (8.7.6) is not of this form when R^{(2)} ≠ 0, the decisions must be optimised simultaneously and not recursively, as in dynamic programming. Secondly, including the non-causal multipliers in order to generate optimal open-loop decisions is necessary but not sufficient to ensure that they remain optimal decisions from the vantage-point of periods t = 2, ..., T. The decisions made in period t are actually implemented period by period, and to maintain their optimality it is necessary to incorporate all the information in Ω_t at each step - including the cost, to the policy-maker, of being constrained by expectations held in the past. Differentiating (8.8.6) with respect to δZ yields QδZ + q = H′μ at the optimum. Hence the subvectors of μ*_t can be written as functions of past expectations: each μ*_k is determined by Q_{(k)}, δZ*_{(k)} and q_{(k)}, for k = 1, ..., t − 1. The purpose of the term involving μ*_t in (8.8.14) is therefore to ensure that each decision remains an optimal function of the economy's current state, including the impact of expectations held in the past - or, more precisely, of the shadow prices which can be imputed to a rational decision-maker whose targets were affected by those expectations. Those shadow prices represent the penalties imposed on current decisions by having accommodated the effects of previously held expectations on earlier decisions.
One may deny that past expectations represent actual commitments, but one cannot ignore any impact which they may have had on past decisions because the physical consequences are now locked into the system's dynamics. Just as past decisions cannot be undone, allowances made in past decisions for the then expected effects of future decisions cannot be undone. If these allowances are ignored, the constrained objective will depend on more than the current state and remaining decisions and this will violate the necessary condition for making optimal revisions on a sequential basis.
8.10 AN INTERPRETATION OF TIME CONSISTENCY
The intuition behind this time-consistency property is rather different from the traditional view of time-consistent decisions. It is usually argued that time consistency should be imposed by solving the optimisation problem sequentially backwards, so that agents are bound to pick the best decisions as seen at the point of decision until the terminal date and irrespective of any decisions made in the past. 4 In other words, it is assumed that agents will always ignore past announcements and will sequentially revise current decisions in their own interest. That causes no difficulties if the components of the constrained objective are nested from t to T at each t, so that, at each t, the outstanding elements of that objective are additively separable from elements with earlier dates and the Markov property (looking forward) applies. But we have seen that rational-expectations models imply no such nesting and that inferior (open-loop) decisions would emerge. In this case, given the emphasis which is given in the rational-expectations literature to optimising behaviour, why should a policy-maker accept being restricted to a recursive strategy of consistent planning, and why should the public believe that the policy-maker will act in this suboptimal manner? The solutions proposed in this chapter, based on non-recursive methods, will provide the ex-ante optimal solution. Moreover, subsequent reoptimisations can be constrained to reproduce the ex-ante optimal solution, ex post, in the absence of stochastic innovations. If there are innovations then the solution produces an innovations-dependent adjustment, contingent on disturbances. Ex ante, then, there are powerful reasons to suggest that the policy-maker should pursue the ex-ante optimal policy and, moreover, that attempts should be made to convince the public that this policy will be carried through. 
Convincing the public of this will not be easy since there will still be an ex-post incentive to renege, if, ex ante, the policy-maker can convince the public that no reneging will occur. However, the public will also be aware of the options facing the policy-maker. Ways in which the policy-maker can be persuaded not to renege, or persuaded to keep the degree of reneging to a minimum, will be discussed in subsequent chapters.
9 Non-cooperative, full-information dynamic games

9.1 INTRODUCTION
One of the most important features revealed by the analysis of the previous two chapters was the extent to which the policy-maker needs to take account of how the private sector will react to expectations of what the policy-maker will do in the future. This raised a number of problems - of a technical nature because of the inadequacy of recursive techniques based on dynamic programming, and also problems of a more general kind arising from attempting to ensure the 'consistency' of policy over time. In this chapter and the next we examine policy as a dynamic game. Typically this will take the form of a game between competing groups who contribute to the process of policy-making - and this has been widely analysed in bargaining theory - but it could also be a game between different countries. The two essential features of a dynamic game are the degree of cooperation between participants and the amount of information that any particular participant has about the tastes, intentions and preferences of others. In this chapter we assume that there is no cooperation between players while each player has full information about the tastes, preferences and intentions of other players. The following chapter will then deal with cases of incomplete information, bargaining models and cooperation and social optima. The literature on games has three main strands: the duopoly-oligopoly, static models of Cournot, Edgeworth and Stackelberg; the game-theoretic framework of von Neumann and Morgenstern (1944), Kuhn and Tucker (1953), Selten (1975); and the differential game models of optimal-control theory in which the single-controller case was extended to handle problems in which more than one controller was involved (Isaacs, 1965; Starr and Ho, 1969). 
Although some of the early applications in optimal control were meant to be for pursuit-evasion problems in the use of anti-missile missiles, the most general and successful applications have proved to be in economics (see, for example, Pindyck, 1977).1 Competition between economic agents can be characterised as a non-
cooperative game (Basar and Olsder, 1982). The competitive-equilibrium model of atomistic markets is a special kind of game in which the individual player has such a small effect on the actions of others that he can effectively treat the vector of prices as given. The advantage of this approach is that the information that is needed for an individual to act optimally is clearly specified. In the perfectly competitive equilibrium model the information required is particularly sparse and is contained in the vector of prices. But, as soon as the player is large enough to affect prices, it is necessary for him to know more. In general, game theory can be applied to the analysis of a number of different situations in which one player has to allow for the repercussions of what he does on other players and what the other players can be expected to do in response. Some specific cases which have been studied are:

(i) Decentralised macroeconomic policy-making, where a constitution dictates that responsibility for monetary and fiscal policy is vested in legally distinct bodies, as in the USA and Germany (e.g. Pierce, 1974; Pindyck, 1977).2

(ii) Interdependencies between national economies which are large enough to affect one another through trade and capital flows, which can make the coordination of national macroeconomic policies advantageous (e.g. Hamada, 1976; Hamada and Sakurai, 1978; Brandsma and Hughes Hallett, 1984a; Oudiz and Sachs, 1984, 1985; Canzoneri and Gray, 1985; Taylor, 1985; Hughes Hallett, 1986a).3

(iii) Problems in corporatist systems in which macroeconomic policy is made by strategic behaviour, or by negotiation and agreement among employers, trade unions, interest groups and governments (e.g. Rustem and Zarrop, 1979; Brandsma and Hughes Hallett, 1984a; Neese and Pindyck, 1984; Van der Ploeg, 1982).

Also there have been a number of recent attempts to apply the framework of game theory to the analysis of rational-expectations models.
In this analysis the policy-maker or government is treated as one player and the private sector or public as the other. However, for this situation to be regarded as a game between duopolistic players there must be 'strategic interdependencies' (Friedman, 1977). But unless the public is assumed to collude it is difficult to see how individual members of the public, even though they might form rational expectations of what the government will do, can act strategically with respect to the government. Indeed, the position is more like that of a monopolist who faces an atomistic market of buyers represented by a downward-sloping demand curve. Nevertheless, some of the concepts of game theory are helpful for clarifying aspects of policy under rational expectations. Furthermore, there is a class of dynamic games
for which a public forming rational expectations is crucial. If countries try to coordinate monetary and fiscal policies between themselves, or at least agree to coordinate economic policies, the view that the public takes, especially in international capital markets, of the likelihood of success, or of the likelihood that some countries will renege on any agreement, will have an important bearing on how exchange rates and interest rates behave.4 Methods which take account of this novel aspect of dynamic games are described later in this chapter.

9.2 TYPES OF NON-COOPERATIVE STRATEGY
Consider a dynamic, non-cooperative, two-player game in which each player pursues his own interests. We assume that there are n target variables, y_t; that player 1 has m_1 instruments, x_{1t}, and player 2 has m_2 instruments, x_{2t}; and that each player has an objective function, J_1 and J_2 respectively, which (to prevent the problem collapsing to a single-player game) contain conflicting objectives.

9.2.1 The Nash solution
The most straightforward kind of non-cooperative strategy is a Nash game, which, for two players, is defined such that

J_1(x_{1t}^*, x_{2t}^*) \le J_1(x_{1t}, x_{2t}^*)   (9.2.1a)

and

J_2(x_{1t}^*, x_{2t}^*) \le J_2(x_{1t}^*, x_{2t})   (9.2.1b)

for all feasible x_{1t} \ne x_{1t}^*, x_{2t} \ne x_{2t}^*, and for all t = 1, ..., T. This Nash solution has the property that, in the absence of cooperation, there is no unilateral alternative choice for either player which can make him better off. (The logical possibility that each player might make reasonable conjectures about how the other player might react is considered in section 9.5 below.) Both players are on an equal footing and neither can have a solution imposed against his will. Another branch of the game-theoretic literature refers to the Nash equilibrium as a perfect equilibrium (see, for example, Selten, 1973). An additional refinement is the subgame-perfect equilibrium, which is essentially the principle of optimality applied to games. The relevance of this will be brought out later when we discuss multiperiod games. The Nash solution can be shown diagrammatically for the two-player, single-instrument, single-period case, as in figure 9.1. We assume that player 1 controls the instrument x_1, measured on the vertical axis, and player 2 controls the instrument x_2, measured on the horizontal axis. The objective functions
J_1 = J_1(x_1, x_2) and J_2 = J_2(x_1, x_2), after substituting out the states, are assumed to be quadratic in x_1 and x_2, and are represented in figure 9.1 by indifference contours for each player. The line a_1a_1 is the locus of choices which provide the smallest cost for player 1 for each possible choice of x_2; thus there are at most two different values of x_2 which yield the same cost for player 1. The line b_2b_2 is the equivalent locus for player 2. The slope of each reaction curve is determined both by the responses of the economy (the strength of the effect of each x on the state of the underlying system) and by the tastes and preferences of each player. It is assumed that the underlying constraints are linear, so that with quadratic cost functions the reaction curves are straight lines.
Figure 9.1 Nash and Stackelberg equilibrium solutions
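The construction behind figure 9.1 can be reproduced numerically. The sketch below uses a pair of hypothetical quadratic cost functions (invented for illustration; they are not taken from the text): each reaction line comes from one player's first-order condition, and the Nash point N is their intersection, which satisfies property (9.2.1).

```python
import numpy as np

# Hypothetical quadratic costs (illustrative only): both players influence a
# common outcome y = x1 + x2; player 1's ideal y is 2, player 2's is 1, and
# each is penalised for using his own instrument.
def J1(x1, x2):
    return x1**2 + (x1 + x2 - 2.0)**2

def J2(x1, x2):
    return x2**2 + (x1 + x2 - 1.0)**2

# First-order conditions give linear reaction functions:
#   dJ1/dx1 = 0  ->  2*x1 + x2 = 2   (line a1a1)
#   dJ2/dx2 = 0  ->  x1 + 2*x2 = 1   (line b2b2)
# The Nash point N is their intersection.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([2.0, 1.0])
x1_star, x2_star = np.linalg.solve(A, b)
print(x1_star, x2_star)   # N = (1.0, 0.0)

# Property (9.2.1): no unilateral deviation makes either player better off.
for d in (-0.5, -0.1, 0.1, 0.5):
    assert J1(x1_star + d, x2_star) > J1(x1_star, x2_star)
    assert J2(x1_star, x2_star + d) > J2(x1_star, x2_star)
```

With linear constraints and quadratic costs the reaction functions are always linear, so the Nash point is found by a single linear solve, as here.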
The intersection of a_1a_1 and b_2b_2 is the Nash optimal solution, marked N. The resulting choices are the only ones which lie on both reaction functions, and so the only choices which satisfy (9.2.1).

9.2.2 Simplified non-cooperative solutions

A simpler form of non-cooperative equilibrium is a Cournot game, in which each player assumes, rightly or wrongly, that his opponent will choose his instruments irrespective of what he chooses himself. In the two-player case, the optimal Cournot decisions are a feasible pair (x_{1t}^c, x_{2t}^c) such that:

J_1(x_{1t}^c, x_{2t}^f) \le J_1(x_{1t}, x_{2t}^f)   (9.2.2a)

J_2(x_{1t}^f, x_{2t}^c) \le J_2(x_{1t}^f, x_{2t})   (9.2.2b)

for all feasible x_{1t} \ne x_{1t}^c, x_{2t} \ne x_{2t}^c, and t = 1, ..., T. Here x_{2t}^f and x_{1t}^f are the particular values assumed by player 1 and player 2 respectively for the actions of their opponents.1 The third kind of solution is a simple Stackelberg game, where one player (the leader) follows a Cournot strategy, ignoring how his opponent might react, while the other player (the follower) conditions his own decision on the Cournot-type action of the leader. Thus a Stackelberg solution is a feasible pair (x_{1t}^s, x_{2t}^s) such that:

J_1(x_{1t}^s, x_{2t}^f) \le J_1(x_{1t}, x_{2t}^f)   (9.2.3a)

J_2(x_{1t}^s, x_{2t}^s) \le J_2(x_{1t}^s, x_{2t})   (9.2.3b)
Both the Cournot and the Stackelberg solutions are shown in figure 9.1. If player 1 assumes that player 2 will employ x_2^f, his optimal decision is x_1^c, while, given x_1^f, player 2 pursues x_2^c. For the simple kind of Stackelberg solution, player 1 (the leader) assumes x_2^f and pursues x_1^s (= x_1^c), while player 2, the follower, conditions his decision on x_1^s and pursues x_2^{s*}. This last remark points to an inconsistency inherent in these simple Stackelberg or Cournot games; an inconsistency which will in fact be observable and which, at least in a multiperiod problem,5 would destroy the equilibrium nature of the solution for rational decision-makers. Only in exceptional circumstances will the assumptions made by each player about the actions of the other actually turn out to be correct. In other words, player 1 will observe that x_{2t}^f \ne x_{2t}^c and player 2 will note, likewise, that x_{1t}^f \ne x_{1t}^c. Similarly, in the Stackelberg case, the leader (player 1) will observe x_2^f = x_2^{s*} only by chance. In the face of these systematic errors, both players will realise that they are not making decisions based on 'rational' expectations of their opponent's actions, and that they are therefore not computing the best decisions for themselves. They will perceive that some other solution procedure (involving another mechanism for
arriving at expectations of the opponent's actions) will bring better results, and the equilibrium nature of the solution will then be lost. However, both solutions become more plausible if learning is permitted. A particularly simple form of learning over time is to revise the assumptions about what one's opponent is going to do in the light of what he actually did in the previous period. For example, player 1 in a Cournot game uses:

x_{2t}^f = x_{2,t-1}^f + \lambda_1 (x_{2,t-1} - x_{2,t-1}^f)

and player 2 uses:

x_{1t}^f = x_{1,t-1}^f + \lambda_2 (x_{1,t-1} - x_{1,t-1}^f)

For 0 < \lambda_1, \lambda_2 < 1, and given the convexity of the preferences and the linearity of the constraints, this Cournot strategy with learning converges to a fixed point at the Nash equilibrium solution. The difficulty is to determine how the values of \lambda_1 and \lambda_2 should best be chosen.

9.2.3 Feedback-policy rules

We have pointed out that one of the drawbacks of the Stackelberg game in which the leader adopts a Cournot strategy is that the leader's assumption about what the follower will do is arbitrary and will be violated. Suppose instead that the leader were aware of how the follower would react. It is obvious from figure 9.1 that the leader, if player 1, does best if he can move to S_1; if player 2 is the leader, point S_2 is best. Note that, if player 1 is the leader, the follower is actually better off at S_1 compared with the Nash solution at N, while, if player 2 is the leader, player 1 as the follower is worse off compared with N.6 The obvious way of avoiding arbitrary assumptions about the follower's behaviour, which are then proved wrong, is to assume that the follower will follow a particular reaction function (in terms of the leader's decisions). The leader can then determine his own optimal reaction function (given that of the follower) and solve the two reaction functions together to obtain mutually consistent decisions. This removes the difficulty that the (arbitrary) initial values x_2^f will be inconsistent with the subsequently computed value x_2^s; and it might be that the follower's reaction function is known (or can be imposed) a priori. Point S_1 illustrates this more sophisticated Stackelberg strategy, where the follower is assumed to use his reaction function to determine x_2 given x_1. For the leader to be able to maintain S_1 or S_2, he must be able to act first so as to force the follower to take up the required position.
In particular he needs to know, or must learn over time, the reaction function of the follower. It is possible to envisage a number of sophisticated
methods by which the leader can learn, even though he does not know the follower's preference function. If the leader can obtain the slope of the follower's reaction function by making a small change to his own decision, it is comparatively straightforward for him to calculate his best strategy. We can obviously use the same device to eliminate the arbitrary assumptions (x_1^f, x_2^f of (9.2.2)) of the Cournot solution, and the subsequent inconsistencies when x_1^c \ne x_1^f and x_2^c \ne x_2^f. In this case each player assumes some reaction function for his opponent, chooses his own optimal reaction function in the light of his opponent's, and then solves the two jointly to obtain the decision values themselves. This ensures that each player gets a mutually consistent pair of values (x_1, x_2), but it does not generally mean that player 1's evaluation of the pair (x_1, x_2) coincides with player 2's evaluation of that pair. In other words, just as the leader found in the Stackelberg case, each player will find that the reaction function assumed for his opponent is inconsistent with the function which a rational opponent would actually end up using. So, as before, this type of Cournot inconsistency problem has not been fully resolved. What characterises a game, then, is the amount of information available to each player, the strategy he chooses to follow (i.e. how much use he makes of that information) and the extent to which one player can impose a solution by acting first or by using coercion. The Cournot solution uses the least possible amount of information, while the Nash solution uses the most. In principle, solutions (9.2.1), (9.2.2) and (9.2.3) involve more than a single period. But to provide a fully intertemporal dynamic game we have to be more explicit about the objective function of each player and the constraints they all face, as well as the particular form their strategies take, whether open-loop or closed-loop.
One difficulty with the literature on dynamic games is that the distinction between open-loop and closed-loop drawn in chapter 3 can be obscured, because the terms are often used in conflicting senses. In some of the literature a solution is described as open-loop if a player does not take into account the reactions and counter-reactions of other players, and as closed-loop if he does. This terminology can be misleading, but it is so widely used that we shall follow it.
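The Cournot learning rule of section 9.2.2 can be sketched with a small simulation. The reaction functions below come from a pair of hypothetical quadratic cost functions (invented for illustration, not from the text) whose Nash point is (x_1, x_2) = (1, 0); each player best-responds to an adaptively revised forecast of his opponent's action.

```python
# Hypothetical linear reaction functions (from quadratic costs invented for
# illustration); their intersection -- the Nash point -- is (1, 0).
def r1(x2):            # player 1's best response to x2
    return (2.0 - x2) / 2.0

def r2(x1):            # player 2's best response to x1
    return (1.0 - x1) / 2.0

lam1 = lam2 = 0.5      # learning speeds, 0 < lambda < 1
x2f, x1f = 0.7, 0.3    # arbitrary initial forecasts of the opponent's action

for _ in range(200):
    x1 = r1(x2f)                 # each player optimises against his forecast
    x2 = r2(x1f)
    x2f += lam1 * (x2 - x2f)     # player 1 revises his forecast of x2
    x1f += lam2 * (x1 - x1f)     # player 2 revises his forecast of x1

print(round(x1, 6), round(x2, 6))   # 1.0 0.0 -- the Nash equilibrium
```

With convex preferences and linear constraints the forecast-revision map is a contraction here, so the iteration settles on the Nash point whatever the (admissible) learning speeds; only the speed of convergence depends on lambda.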
9.3 NASH AND STACKELBERG FEEDBACK SOLUTIONS
Consider the linear model:

y_t = A y_{t-1} + B_1 x_{1t} + B_2 x_{2t} + C e_t   (9.3.1)
where the vector of policy instruments has been partitioned into two distinct
sets of instruments, each of which is under the direction of a separate policy-maker or player. The differing preferences of each player are reflected in differing objective functions, in terms of either relative penalties or desired (ideal) values:

J_1 = 1/2 \sum_{t=1}^{T} (\delta y_{1t}' Q_{1t} \delta y_{1t} + \delta x_{1t}' N_{1t} \delta x_{1t})   (9.3.2a)

J_2 = 1/2 \sum_{t=1}^{T} (\delta y_{2t}' Q_{2t} \delta y_{2t} + \delta x_{2t}' N_{2t} \delta x_{2t})   (9.3.2b)

The matrices N_{1t} and N_{2t}, t = 1, ..., T, are symmetric and positive definite, while the matrices Q_{1t} and Q_{2t}, t = 1, ..., T, are symmetric and positive semidefinite. It is assumed that neither player has any direct preferences with regard to the policy instruments of the other player. The deviations from desired or ideal values are defined by:

\delta y_{it} = y_t - y_{it}^d,   \delta x_{it} = x_{it} - x_{it}^d,   for i = 1, 2
The generalisation to n players, though notationally messier, involves no more complications (leaving aside the possibility of coalitions between groups of players) than the two-player game. In contrast to the single-player 'game' implicit in chapter 3, the first player must minimise his cost function subject to equation (9.3.1) but without being able to regard x_{2t} as either under his own control or exogenous, while the second player faces a similar problem with regard to x_{1t}.

9.3.1 The Nash case
An explicit feedback solution, in the sense that a contingent rule is provided for each player, can be obtained using dynamic programming (Pindyck, 1977; Plasmans, 1978). The problem in the non-cooperative game is to determine x_{1t} and x_{2t}, for t = 1, ..., T, such that the objective functions (9.3.2) are jointly minimised subject to (9.3.1). Consider the problem for the terminal period. The first-order conditions for the minimum of (9.3.2), subject to (9.3.1), provide the optimal feedback rule for each player:

x_{iT}^* = K_{iT} y_{T-1} + k_{iT},   i = 1, 2   (9.3.3)

where, for i, j = 1, 2 and i \ne j,

S_{iT} = N_{iT} + B_i' Q_{iT} B_i - B_i' Q_{iT} B_j (N_{jT} + B_j' Q_{jT} B_j)^{-1} B_j' Q_{jT} B_i

K_{iT} = -S_{iT}^{-1} B_i' Q_{iT} [I - B_j (N_{jT} + B_j' Q_{jT} B_j)^{-1} B_j' Q_{jT}] A

k_{iT} = -S_{iT}^{-1} {B_i' Q_{iT} [I - B_j (N_{jT} + B_j' Q_{jT} B_j)^{-1} B_j' Q_{jT}] C e_T - B_i' Q_{iT} y_{iT}^d - N_{iT} x_{iT}^d + B_i' Q_{iT} B_j (N_{jT} + B_j' Q_{jT} B_j)^{-1} (B_j' Q_{jT} y_{jT}^d + N_{jT} x_{jT}^d)}
Substituting (9.3.1) and (9.3.3) into (9.3.2) and rearranging, we have the quadratic form for the terminal-period cost:

J_i(T, T) = 1/2 (y_{T-1}' P_{iT} y_{T-1} - 2 y_{T-1}' h_{iT} + a_{iT}),   i = 1, 2   (9.3.4)

where

P_{iT} = (A + B_i K_{iT} + B_j K_{jT})' Q_{iT} (A + B_i K_{iT} + B_j K_{jT}) + K_{iT}' N_{iT} K_{iT}

h_{iT} = (A + B_i K_{iT} + B_j K_{jT})' Q_{iT} [y_{iT}^d - (B_i k_{iT} + B_j k_{jT} + C e_T)] + K_{iT}' N_{iT} (x_{iT}^d - k_{iT})

a_{iT} = (B_i k_{iT} + B_j k_{jT} + C e_T - y_{iT}^d)' Q_{iT} (B_i k_{iT} + B_j k_{jT} + C e_T - y_{iT}^d) + (k_{iT} - x_{iT}^d)' N_{iT} (k_{iT} - x_{iT}^d)

for i, j = 1, 2 and i \ne j.
-2klTNiTxfT + y&QtTyfT + xfTNiTxfT for ij= 1,2 and 1V7. For some intermediate period, t, the optimal-feedback rule for each player is: xt = Kityt^+kit
(9.3.5)
where
j t +
^
for ij = 1,2 and 1 # 7 and P£t = (A + ^ X f t + BjKjt)f(Qit + Pft + 1 )M + B{Kit + B
hit = (A + BtKu + BfayiQu + Pft + 1) x [(^Kft + B;.K,, + Cet) - Q f t ^ - hu + 1] + K'itNitxi for i, j = 1, 2 and
i^j
where we have ignored the term ar, since this has no further role to play in the determination of the feedback matrix and tracking gains. As in the single-player case, the feedback matrices are a function only of system parameters and the tastes and preferences reflected in the objective function. But there is an important difference in that the feedback matrix of
each player is also a function of the tastes and preferences of the other player. Moreover, each player's tracking gain also includes terms in the desired values of the other player. This means that if one player changes the path he is pursuing (appearing as a shift in y^d or x^d) this will also change the optimal feedback rule of the other player. Thus the closed-loop responses of each player to both anticipated and unanticipated disturbances are closely bound up with one another. This Nash feedback solution has many desirable features since, for fixed exogenous assumptions, it provides a reasonably economical way for each player to respond to stochastic disturbances, even though the initial information requirements are very onerous. However, there is a difficulty in that the Nash feedback solution is still not the best that each player can achieve, even when all players agree to conduct the game in terms of a feedback rule and agree that the gains from iterating on the reaction functions are not worth while and can be ignored. This is a different source of suboptimality from that mentioned in previous sections, where each player fails to account for all his opponent's reactions and counter-reactions. Consider the following two-player, two-period dynamic game to illustrate the point:

J_{1t} = y_t^2 + y_{t+1}^2 + (x_{1,t+1})^2   (9.3.6a)

J_{2t} = (y_t - 1)^2 + (y_{t+1} - 1)^2 + x_{2t}^2 + (x_{2,t+1})^2   (9.3.6b)

where J_{1t} and J_{2t} are the objective functions of each player, which are minimised subject to:

y_t = x_{2t}   (9.3.7a)

y_{t+1} = y_t + x_{1,t+1} + x_{2,t+1}   (9.3.7b)
The feedback solution is obtained by first minimising the period-(t+1) components

J_{1,t+1} = y_{t+1}^2 + (x_{1,t+1})^2,   J_{2,t+1} = (y_{t+1} - 1)^2 + (x_{2,t+1})^2

subject to (9.3.7b), and then minimising J_{1t} and J_{2t} subject to the minimisers of J_{1,t+1} and J_{2,t+1}. This provides the dynamic-programming or recursive solution:

x_{1,t+1} = 37/60,   x_{2t} = 17/20,   x_{2,t+1} = 23/60   (9.3.8)
with objective-function values J_{1t} = 4.52 and J_{2t} = 1.61 respectively. However, if we optimise (9.3.6) directly subject to (9.3.7) we obtain:

x_{1,t+1} = -4/7,   x_{2t} = 5/7,   x_{2,t+1} = 3/7   (9.3.9)

which is the Nash optimal solution satisfying (9.2.1). It yields objective-function values of J_{1t} = 1.16 and J_{2t} = 0.96, which dominate those of the
previous solution by a wide margin. The suboptimality in (9.3.8) is therefore due to the period-by-period nature of the recursive solution. The recursive solution in fact ignores some of the intertemporal components in the constrained objective function facing each player, and therefore delivers an inferior solution. These intertemporal components are the same 'non-causal' elements we identified in the forward-looking models of chapter 8.

9.3.2 The Stackelberg case
In a Stackelberg feedback game the leader moves first, aware that the follower will make his response in terms of the follower's reaction function. The first-order conditions in this case are, for the follower:

\partial J_2 / \partial x_2 + (\partial y / \partial x_2)' \partial J_2 / \partial y = 0   (9.3.10)

and for the leader, who internalises the follower's reaction \partial x_2 / \partial x_1:

\partial J_1 / \partial x_1 + [(\partial y / \partial x_2)(\partial x_2 / \partial x_1) + \partial y / \partial x_1]' \partial J_1 / \partial y = 0   (9.3.11)
Strictly speaking, we should write the first-order conditions so as to make it clear that we cannot solve them simultaneously. For one player to act as leader, he has to be able to move first. But the structure of the constraints described by equation (9.3.1) implies that x_{1t} and x_{2t} are chosen at the same time. One way of setting up the problem so that decisions are not made simultaneously is to rewrite (9.3.1) as:

y_t = A y_{t-1} + B_1 x_{1t} + B_2 x_{2,t-1} + C e_t   (9.3.12)

In this case the second player only affects the economy with a delay.7 In the terminal period, the leader is free to act independently of the follower, since the follower cannot implement a decision in the terminal period. Thus the solution for the leader is given by a variant of equation (3.2.12) from chapter 3. This is:

x_{1T}^* = K_{1T} y_{T-1} + k_{1T}   (9.3.13)

K_{1T} = -(N_{1T} + B_1' Q_{1T} B_1)^{-1} B_1' Q_{1T} A   (9.3.13a)

k_{1T} = -(N_{1T} + B_1' Q_{1T} B_1)^{-1} [B_1' Q_{1T} (B_2 x_{2,T-1} + C e_T - y_{1T}^d) - N_{1T} x_{1T}^d]   (9.3.13b)

Substituting (9.3.12) and (9.3.13) into the expression for the terminal-period loss gives:

J_1(T, T) = 1/2 (y_{T-1}' P_{1T} y_{T-1} - 2 y_{T-1}' h_{1T} + a_{1T})   (9.3.14)

where

P_{1T} = (A + B_1 K_{1T})' Q_{1T} (A + B_1 K_{1T}) + K_{1T}' N_{1T} K_{1T}   (9.3.15a)

h_{1T} = (A + B_1 K_{1T})' Q_{1T} [y_{1T}^d - (B_1 k_{1T} + B_2 x_{2,T-1} + C e_T)] + K_{1T}' N_{1T} (x_{1T}^d - k_{1T})   (9.3.15b)
a_{1T} = (B_1 k_{1T} + B_2 x_{2,T-1} + C e_T - y_{1T}^d)' Q_{1T} (B_1 k_{1T} + B_2 x_{2,T-1} + C e_T - y_{1T}^d) + (k_{1T} - x_{1T}^d)' N_{1T} (k_{1T} - x_{1T}^d)   (9.3.15c)
To obtain the full feedback solution for the Stackelberg leader we follow the procedure of chapter 3, section 3.2, and proceed backwards period by period to the initial period. The follower in period T-1 now knows what the leader is doing and faces the set of constraints:

y_t = \bar{A} y_{t-1} + B_2 x_{2,t-1} + B_1 \bar{k}_t + C e_t   (9.3.16)

where \bar{A} = A + B_1 K_{1t}, and \bar{k}_t is (9.3.13b) minus \Pi x_{2,t-1}, with

\Pi = [N_{1t} + B_1' (Q_{1t} + P_{1,t+1}) B_1]^{-1} B_1' (Q_{1t} + P_{1,t+1}) B_2

The follower can now calculate his feedback rule, given his knowledge of the feedback rule of the leader. Note that what the leader assumes about the follower's behaviour need not coincide with what the follower actually does.
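The open-loop Nash values in (9.3.9) can be checked directly. Differentiating each player's objective in (9.3.6) with respect to his own instruments, after substituting the constraints (9.3.7), gives three linear first-order conditions (the derivation below is ours, not spelled out in the text); solving them in exact arithmetic reproduces (9.3.9) and, with the objectives written free of any common scale factor, the reported objective values.

```python
from fractions import Fraction as F

# First-order conditions of the open-loop Nash game (9.3.6)-(9.3.7), with
# a = x_{1,t+1}, b = x_{2t}, c = x_{2,t+1}, so y_t = b and y_{t+1} = a + b + c:
#   player 1 (d/da):  2a + b + c = 0
#   player 2 (d/db):  a + 3b + c = 2
#   player 2 (d/dc):  a + b + 2c = 1
M = [[F(2), F(1), F(1), F(0)],
     [F(1), F(3), F(1), F(2)],
     [F(1), F(1), F(2), F(1)]]

# Gauss-Jordan elimination in exact rational arithmetic
for i in range(3):
    M[i] = [v / M[i][i] for v in M[i]]
    for j in range(3):
        if j != i:
            M[j] = [vj - M[j][i] * vi for vj, vi in zip(M[j], M[i])]

a, b, c = M[0][3], M[1][3], M[2][3]
print(a, b, c)                       # -4/7 5/7 3/7, as in (9.3.9)

y_t, y_t1 = b, a + b + c
J1 = y_t**2 + y_t1**2 + a**2                      # = 57/49, approx 1.16
J2 = (y_t - 1)**2 + (y_t1 - 1)**2 + b**2 + c**2   # = 47/49, approx 0.96
print(float(J1), float(J2))
```

The decisions are invariant to rescaling either objective, so the same system of conditions applies however the scale factor in (9.3.6) is carried.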
9.4 THE OPTIMAL OPEN-LOOP NASH SOLUTION
The failure of recursive methods to provide an optimal solution to the Nash game echoes the problems which we encountered with rational-expectations models. But, before we examine the exact relationship between the optimal solution and that obtained by recursive programming, we show how the optimal solution to a Nash game may be obtained by means of the stacked final-form model approach. The linear reduced-form model with two players is given by (9.3.1) and the objective functions for each player by (9.3.2). For our purposes the constraints on minimising each objective can be condensed to:

Y^{(i)} = R^{(i,1)} X^{(1)} + R^{(i,2)} X^{(2)} + s^{(i)},   i = 1, 2   (9.4.1)

where the R^{(i,j)}, i, j = 1, 2, are (n_i T x m_j T) matrices containing the dynamic multipliers R_{tk}^{(i,j)} = \partial y_t^{(i)} / \partial x_k^{(j)'} if t \ge k, and zeros elsewhere; and where s^{(i)} is the subvector of s associated with the subset Y^{(i)} of Y. The complete model is therefore:

[Y^{(1)}]   [R^{(1,1)}  R^{(1,2)}] [X^{(1)}]   [s^{(1)}]
[Y^{(2)}] = [R^{(2,1)}  R^{(2,2)}] [X^{(2)}] + [s^{(2)}]   (9.4.2)

The causal nature of this model is reflected in the block triangularity of the multiplier matrices R^{(i,j)}, which would in practice be evaluated numerically along suitable reference paths for x_t^{(1)}, x_t^{(2)} and e_t. Thus R^{(i,i)} describes the responses of player i's targets to his own instruments, and R^{(i,j)} the responses to player j's decisions.
Now each decision-maker will find his own actions constrained by:

\delta Y^{(i)} = R^{(i,i)} \delta X^{(i)} + b^{(i)},   i = 1, 2   (9.4.3)
The vector b^{(i)} = R^{(i,j)} \delta X^{(j)} + c^{(i)}, where c^{(i)} = s^{(i)} - Y^{(i)d} + R^{(i,i)} X^{(i)d} + R^{(i,j)} X^{(j)d}, defines the information required by player i in order to make his decision. Evidently each player's optimal strategy depends on, and must be determined simultaneously with, the optimal decisions to be expected from the other player. In the absence of cooperation, the optimal decisions [X^{(1)*}, X^{(2)*}] will satisfy:

J^{(i)}[X^{(i)*}, X^{(j)*}] \le J^{(i)}[X^{(i)}, X^{(j)*}],   for i = 1 and 2   (9.4.4)

for all feasible X^{(i)} \ne X^{(i)*}. This is the Nash equilibrium solution described by (9.2.1). If (9.4.4) is satisfied, then neither player has any incentive to deviate from his choice. Now if player i can take X^{(j)} to be invariant with respect to his own decision, the necessary conditions for an optimal value of X^{(i)} are:

\partial J^{(i)} / \partial X^{(i)} + [\partial Y^{(i)} / \partial X^{(i)}]' \partial J^{(i)} / \partial Y^{(i)} = 0   (9.4.5)

yielding:

X^{(i)*} - X^{(i)d} = -[R^{(i,i)'} Q^{(i)} R^{(i,i)} + N^{(i)}]^{-1} R^{(i,i)'} Q^{(i)} {c^{(i)} + R^{(i,j)} [X^{(j)*} - X^{(j)d}]}   (9.4.6)
Player j, meanwhile, will be deriving a similar set of necessary conditions for an optimal choice of X^{(j)}, on the assumption that X^{(i)} will remain fixed with respect to any changes made in X^{(j)}. His decision rule, corresponding to (9.4.6), will therefore be:

X^{(j)*} - X^{(j)d} = -[R^{(j,j)'} Q^{(j)} R^{(j,j)} + N^{(j)}]^{-1} R^{(j,j)'} Q^{(j)} {c^{(j)} + R^{(j,i)} [X^{(i)*} - X^{(i)d}]}   (9.4.7)

The Nash solution, point N in figure 9.1, is then given by solving (9.4.6) and (9.4.7) simultaneously:

[X^{(1)*} - X^{(1)d}]     [R^{(1,1)'}Q^{(1)}R^{(1,1)} + N^{(1)}   R^{(1,1)'}Q^{(1)}R^{(1,2)}        ]^{-1} [R^{(1,1)'}Q^{(1)}c^{(1)}]
[X^{(2)*} - X^{(2)d}] = - [R^{(2,2)'}Q^{(2)}R^{(2,1)}        R^{(2,2)'}Q^{(2)}R^{(2,2)} + N^{(2)}]      [R^{(2,2)'}Q^{(2)}c^{(2)}]   (9.4.8)

9.5 CONJECTURAL VARIATIONS
An important feature of the Nash equilibrium solution is the assumption that each player takes his decision independently of the other, in the sense that, in minimising J_1, player 1 ignores the effect \partial x_2 / \partial x_1, and player 2 ignores the effect \partial x_1 / \partial x_2. These terms are referred to as the conjectural variations.
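The stacked-form Nash calculation (9.4.8) of the previous section can be sketched numerically before conjectures are introduced. Everything below (dimensions, multipliers, weights, shocks) is invented for illustration; desired values are normalised to zero so that the decisions are in deviation form, with Q^(i) = I and N^(i) = I.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m1, m2 = 2, 1, 1   # hypothetical dimensions, for illustration only

# Stacked multipliers R^(i,j): responses of player i's targets to player j's
# instruments, plus the uncontrollable components c^(i) (all invented numbers).
R11, R12 = rng.normal(size=(n, m1)), rng.normal(size=(n, m2))
R21, R22 = rng.normal(size=(n, m1)), rng.normal(size=(n, m2))
Q1 = Q2 = np.eye(n)
N1, N2 = np.eye(m1), np.eye(m2)
c1, c2 = rng.normal(size=n), rng.normal(size=n)

# Block system (9.4.8): stack the two first-order conditions (9.4.6)-(9.4.7).
top = np.hstack([R11.T @ Q1 @ R11 + N1, R11.T @ Q1 @ R12])
bot = np.hstack([R22.T @ Q2 @ R21, R22.T @ Q2 @ R22 + N2])
rhs = -np.concatenate([R11.T @ Q1 @ c1, R22.T @ Q2 @ c2])
dX = np.linalg.solve(np.vstack([top, bot]), rhs)
dX1, dX2 = dX[:m1], dX[m1:]

# Nash check: each player's own first-order condition holds at the solution.
foc1 = R11.T @ Q1 @ (R11 @ dX1 + R12 @ dX2 + c1) + N1 @ dX1
foc2 = R22.T @ Q2 @ (R21 @ dX1 + R22 @ dX2 + c2) + N2 @ dX2
assert np.allclose(foc1, 0) and np.allclose(foc2, 0)
```

Because each block row of the system is exactly one player's first-order condition taken with the other's decision fixed, the single linear solve delivers the mutually consistent pair required by (9.4.4).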
But both players will notice that they can benefit by altering their reactions to their opponent's predictable reaction function. (These functions are predictable in the sense that both parties can compute either optimal reaction function in advance, under full information about the feasible responses of the economy and the relative priorities being adopted by both parties.) Indeed it is consistent for each player to assume that he could introduce his optimal reaction-function behaviour, and then to go on to use a similar reaction function for his opponent (which does change the opponent's decisions as a function of the first player's decisions) when solving for the actual policy values. One point which dominates N is marked as S_1, already identified in figure 9.1 as the Stackelberg solution when player 1 is the leader. It would be in the interests of both players to move to S_1. Thus the Nash solution is only optimal so long as each player ignores the predictable reaction of the other player, who is making similar calculations to determine his own optimal responses. But once each player realises he may want to relax that restriction, the Nash solution will appear suboptimal. Some further aspects of the conjectural variations are examined below.

9.5.1 The conjectural-variations equilibrium

Given a suitable starting-point, each player can use conjectural variations8 to deduce the appropriate reaction terms within the optimisation process. Each player now recognises that his decisions affect all targets and hence the policy choices of his opponent, and that variations in the latter then influence the targets again and hence what the first player should decide to do. Instead of optimising conditional on fixed reactions by the opponent ('given he does that, what should I do?'), each player must now anticipate his opponent's reactions ('if I do this, how will he react, and how should I best accommodate that?'). These conjectural variations are captured by the \partial X^{(i)}/\partial X^{(j)} and \partial X^{(j)}/\partial X^{(i)} terms. By optimising the decisions associated with each variation in the conjectured responses to the currently envisaged choices, and then constraining those variations so that the new pair of decisions represents an improvement over the old pair, we reach the conjectural-variations Nash equilibrium. That equilibrium holds only when both players perceive that no further gains can be made by varying their reactions to the decisions currently expected from their opponent, because to do so would trigger counter-reactions (in the opponent's interest) which more than offset the gains of unilateral action. But, whenever net gains can be made despite the opponent's responses, a further round of policy adjustments must be expected. The necessary conditions for optimal decisions with conjectural variations are therefore:

\partial J^{(i)} / \partial X^{(i)} + {[\partial Y^{(i)} / \partial X^{(j)}] \partial X^{(j)} / \partial X^{(i)} + \partial Y^{(i)} / \partial X^{(i)}}' \partial J^{(i)} / \partial Y^{(i)} = 0   (9.5.1)
for i = 1 and 2 (i \ne j). Excepting the reaction matrices \partial X^{(j)}/\partial X^{(i)}, all the partial derivatives required to evaluate (9.5.1) are given by (9.4.1) and (9.4.3). If \partial X^{(j)}/\partial X^{(i)} = D^{(j)} were known for j = 1 and 2, we could solve (9.5.1) for i = 1 and 2 simultaneously to obtain:

[X^{(1)*} - X^{(1)d}]     [G^{(1)'}Q^{(1)}R^{(1,1)} + N^{(1)}   G^{(1)'}Q^{(1)}R^{(1,2)}        ]^{-1} [G^{(1)'}Q^{(1)}c^{(1)}]
[X^{(2)*} - X^{(2)d}] = - [G^{(2)'}Q^{(2)}R^{(2,1)}        G^{(2)'}Q^{(2)}R^{(2,2)} + N^{(2)}]      [G^{(2)'}Q^{(2)}c^{(2)}]   (9.5.2)

where G^{(i)} = R^{(i,i)} + R^{(i,j)}D^{(j)}. Notice that setting D^{(1)} = 0, D^{(2)} = 0 reproduces the open-loop Nash solution (9.4.8); hence the optimal open-loop Nash solution is just a restricted case of (9.5.2), as claimed. One obvious way of evaluating (9.5.1), when D^{(1)} and D^{(2)} are unknown, is to construct a fixed point between [D^{(1)}, D^{(2)}] and [X^{(1)}, X^{(2)}] satisfying (9.5.1). Indeed an iterative procedure is already implied: inserting trial values [D_s^{(1)}, D_s^{(2)}] into (9.5.1) automatically generates new values [X_{s+1}^{(1)}, X_{s+1}^{(2)}] via (9.5.2), i.e.:

X_{s+1}^{(i)} - X^{(i)d} = F_{s+1}^{(i)} {c^{(i)} + R^{(i,j)} [X_{s+1}^{(j)} - X^{(j)d}]}   (9.5.3)

where F_{s+1}^{(i)} = -[G_s^{(i)'} Q^{(i)} R^{(i,i)} + N^{(i)}]^{-1} G_s^{(i)'} Q^{(i)}, with G_s^{(i)} = R^{(i,i)} + R^{(i,j)} D_s^{(j)}, and where

D_{s+1}^{(i)} = F_{s+1}^{(i)} R^{(i,j)} \ne D_s^{(i)},   i = 1 and 2, and j \ne i   (9.5.4)

Moreover, a pair (D_\infty^{(1)}, D_\infty^{(2)}) and (X_\infty^{(1)}, X_\infty^{(2)}) satisfying (9.5.1) exists, since an open-loop Nash equilibrium exists for every non-zero-sum game with convex objectives and strategy sets.9 If a solution exists for the open-loop Nash equilibrium, then one exists for the conjectural-variations equilibrium, since the latter derives from objective-function evaluations which are better for at least one of the players, and probably for both. (These objective-function values are of course bounded below by zero.) Hence a sensible way to search for this conjectural-variations equilibrium is to start from the open-loop solution (i.e. (9.5.2) with D^{(i)} = 0, i = 1 and 2) and constrain the iterations in (9.5.3) and (9.5.4) to generate improvements in one or both objective-function evaluations at each step. This ensures that the search will terminate. But, because of its non-linearity, (9.5.3) is not guaranteed to converge. We therefore modify (9.5.3) by replacing D_{s+1}^{(i)} with \gamma_i D_{s+1}^{(i)} + (1 - \gamma_i) D_s^{(i)}, where 0 < \gamma_i \le 1 is a scalar chosen to force X_{s+1}^{(1)} and X_{s+1}^{(2)} 'downhill' (i.e. such that J^{(i)}[X_{s+1}^{(1)}, X_{s+1}^{(2)}] \le J^{(i)}[X_s^{(1)}, X_s^{(2)}] for i = 1 and 2). Thus we search among the improvements available from the reaction matrices which are generated by re-solving (9.5.1) at each step. An exhaustive search will ultimately lead to the conjectural-variations equilibrium.
Thus this modification of (9.5.3) has several advantages. First, it helps overcome any convergence and uniqueness problems in (9.5.4); in any case, to establish an equilibrium position, we must pick from the multiple solutions which (9.5.2) may generate that solution which yields the best objective-function values for both players simultaneously. Secondly, it avoids problems of complex-valued reactions in the asymmetric case. Thirdly, it recognises that it is only sensible for a player to alter his conjectures in a way which is Pareto-improving for both players, since otherwise the opponent simply will not react in the way conjectured. One cannot expect any player to react contrary to his own interests, because that would amount to imposing interpersonal comparisons without any compensating bargain or side-payments.
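The fixed-point iteration (9.5.3)-(9.5.4) can be sketched for a scalar version of the stacked model. All numbers below are invented for illustration; Q^(i) = N^(i) = 1, desired values are zero, and the damping factor gamma is omitted because this small example already converges on its own.

```python
# Conjectural-variations iteration (9.5.4) for a scalar two-player game in
# stacked form; multipliers are invented numbers, Q = N = 1 for each player.
R11, R12 = 1.0, 0.5   # player 1's multipliers w.r.t. x1 and x2
R21, R22 = 0.4, 1.0   # player 2's multipliers w.r.t. x1 and x2

D1 = D2 = 0.0         # D^(i) = dX^(i)/dX^(j); D = 0 is the open-loop Nash case
for _ in range(100):
    G1 = R11 + R12 * D2                 # G^(i) = R^(i,i) + R^(i,j) D^(j)
    G2 = R22 + R21 * D1
    F1 = -G1 / (G1 * R11 + 1.0)         # F^(i) = -[G'QR + N]^(-1) G'Q
    F2 = -G2 / (G2 * R22 + 1.0)
    D1, D2 = F1 * R12, F2 * R21         # D^(i)_{s+1} = F^(i)_{s+1} R^(i,j)

# At the fixed point the conjectured reactions are mutually consistent:
G1, G2 = R11 + R12 * D2, R22 + R21 * D1
assert abs(D1 + G1 / (G1 * R11 + 1.0) * R12) < 1e-12
assert abs(D2 + G2 / (G2 * R22 + 1.0) * R21) < 1e-12
```

Starting from D = 0 means starting from the open-loop Nash conjectures, exactly as the text recommends; in larger systems the relaxation step with gamma would be reinstated to keep each revision of the conjectures 'downhill' for both players.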
9.6 UNCERTAINTY AND EXPECTATIONS
In this section we examine in more detail how conjectures about what one's opponent will do imply anticipation of future actions. An expression for each player's decision, on its own, can be obtained by inverting the matrix in (9.5.3). Multiplying out and rearranging terms yields:

X_{s+1}^{(i)} = X^{(i)d} - [G_s^{(i)'} Q^{(i)} G_s^{(i)} + N^{(i)}]^{-1} G_s^{(i)'} Q^{(i)} [c^{(i)} + R^{(i,j)} F_{s+1}^{(j)} c^{(j)}]   (9.6.1)

for i = 1 and 2 (i \ne j). At the end of the downhill search, D_s^{(i)} = D_{s+1}^{(i)} (either because (9.5.3) converges with consistent conjectures, or because the search terminates when \gamma_1 = \gamma_2 = 0 is forced). The optimal decisions therefore actually follow from (i = 1, 2):

X^{(i)*} = X^{(i)d} - [G_\infty^{(i)'} Q^{(i)} G_\infty^{(i)} + N^{(i)}]^{-1} G_\infty^{(i)'} Q^{(i)} E_{ti}[d^{(i)}]   (9.6.2)

for G_\infty^{(i)} = R^{(i,i)} + R^{(i,j)} D_\infty^{(j)} and d^{(i)} = c^{(i)} + R^{(i,j)} F_\infty^{(j)} c^{(j)}, where F_\infty^{(j)} is defined following (9.5.3) but evaluated at the converged terminal values D_\infty^{(j)}. Finally E_{ti}(.) = E(. | \Omega_{it}) denotes an expectation conditional on player i's information at t. These decisions would subsequently be revised by reoptimising them conditional on later information sets (for t = 2, ..., T, and i = 1 or 2).10 The decisions are therefore computed using a single information set representing either one or the other decision-maker's view of the past and future at each moment in turn. The result is a dynamic decision rule for each player (closed-loop with respect to time). Hence conjectural variations have decomposed the standard two-player problem into two separate but equivalent single-player problems: namely, minimise J^{(i)} subject to:

\delta Y^{(i)} = [R^{(i,i)} + R^{(i,j)} D_\infty^{(j)}] \delta X^{(i)} + d^{(i)}   (9.6.3)
In terms of the original system, the 'domestic' multipliers R^{(i,i)} (if we assume the game is between different countries) have been transformed to include the induced spillovers from elsewhere, R^{(i,i)} + R^{(i,j)} D_\infty^{(j)}. In addition, the additive 'shock' term s^{(i)} has been transformed into a combination of both 'shocks', c^{(i)} and c^{(j)}. But since d^{(i)} is additive, uncertainty can still be treated by certainty equivalence, as is done in (9.6.2). In practice one seldom has accurate knowledge of the contents of the information set employed by one's rivals, and player i's perception of both optimal decisions may not coincide with player j's evaluation. To allow for heterogeneous information, we have to insert E_{ti}(s) into d^{(j)} for j \ne i, as well as into d^{(i)}, to determine player i's decision. Thus uncertainty has just two components: d^{(i)} contains both the direct uncertainty over s^{(i)} within c^{(i)} and the indirect uncertainty over the evaluation of the other player's decisions (i.e. over s^{(j)} within c^{(j)}). Different estimates of the preferences in Q^{(i)} or N^{(i)}, and of the multipliers in the R^{(i,i)} and R^{(i,j)} matrices, can be handled similarly. Of course, the differences between the ex-ante expectation E_{ti}(d^{(i)}) and the ex-post realisation will depend on errors in player i's perception of player j's information (\Omega_{jt} for j \ne i), as well as on past information errors (in \Omega_{i,t-k} for k \ge 1). But any policy revisions will depend solely on \Omega_{it}, and hence on the innovations in the policy-maker's expectations of the non-controllable variables which determine his anticipations of the future behaviour of the other participant. Finally we come to the difficult question of the time consistency of these decisions. Notice that (9.5.2) allows for the 'non-causal' effects of anticipating the opponent's decisions and their outcomes, \partial y_t^{(i)} / \partial x_{t+k}^{(j)} \ne 0 for k \ge 1, since D_\infty^{(j)} is not block triangular.
Hence the decisions represented by (9.5.2) react to a set of intertemporal (as well as current) conjectured responses, including any policy threats which lie in future periods. These anticipation effects appear even in solutions without conjectures (the open-loop Nash solution), since D_1^(i), for i = 1, 2, is not block lower-triangular even when D_0^(i) = 0. That means, just as in the rational-expectations framework of chapter 8, there is the potential for time-inconsistent behaviour by each player individually. Having announced certain policy values for period 2 (say), and having persuaded his opponent to take certain actions in period 1 on the basis of that information, each player left to himself will predictably wish not to carry out his announcement. Reoptimisation will produce a new and better set of policies for the second period (thereby invalidating his opponent's proposed first-period actions). The difference between this and the rational-expectations case is that both players will have incentives to be time-inconsistent. So the first player will not be able to contemplate reneging on previous announcements because, if he can invalidate his opponent's first-period decisions by reneging, then so can his opponent invalidate his. Each player's currently expected objective function is therefore at risk - whether he likes it or not - from time-inconsistent behaviour by his
9 NON-COOPERATIVE, FULL-INFORMATION DYNAMIC GAMES
opponent, and vice versa. If the gains from reneging and getting away with it are small compared to the losses caused when someone reneges against you, and/or when both players renege simultaneously, then time-inconsistent behaviour would not be observed. Since this is all predictable in advance (as in the rational-expectations case), each player's currently expected outcomes will be better if they remain with their originally announced policy sequences than they would be under a sequence of time-inconsistent revisions of those policies.11 The difference between the expected outcomes under these two strategies is the 'bond' which each player implicitly puts up to guarantee his own time-consistent behaviour; it is what he would lose by triggering time-inconsistent responses from his opponent through his own time-inconsistent behaviour. In that case, the decisions will be time-consistent for exactly the same reason as in the rational-expectations case. If these 'bond' values are positive, it will be in each player's own interest to follow his announced policies, and he will then realise the gains from using this (open-loop) type of jointly determined multiperiod decision rule, rather than the period-by-period backwards-recursive decisions of the subgame-perfect approach. The latter is, of course, also time-consistent, but it is inferior to the multiperiod decision rule, and represents the decisions which each player will force his opponent to revert to if the opponent's bluff is called. Either the multiperiod decision rule will be consistently followed (if the bond value is positive) or a subgame-perfect strategy will be used (if it is not).
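The 'bond' logic above can be sketched with a toy loss table. All of the numerical values below are illustrative assumptions, not taken from the text: each player compares the small gain from unilateral reneging with the larger loss suffered once reneging triggers mutual time-inconsistent revisions.

```python
# Illustrative sketch of the 'bond' argument with hypothetical losses.
# Lower loss = better. All numbers are assumptions for illustration only.
LOSS = {
    # (player 1 action, player 2 action): (loss to 1, loss to 2)
    ("keep", "keep"):     (10.0, 10.0),  # both follow announced policies
    ("renege", "keep"):   (8.0, 14.0),   # unilateral reneging pays...
    ("keep", "renege"):   (14.0, 8.0),
    ("renege", "renege"): (13.0, 13.0),  # ...but mutual reneging hurts both
}

def bond(player: int) -> float:
    """What a player stands to lose once his own reneging triggers
    time-inconsistent behaviour by the opponent."""
    mutual = LOSS[("renege", "renege")][player]
    keep_both = LOSS[("keep", "keep")][player]
    return mutual - keep_both

def unilateral_gain(player: int) -> float:
    """Gain from reneging while the opponent naively keeps his word."""
    if player == 0:
        return LOSS[("keep", "keep")][0] - LOSS[("renege", "keep")][0]
    return LOSS[("keep", "keep")][1] - LOSS[("keep", "renege")][1]

for i in (0, 1):
    # Announced policies are sustained when the posted 'bond' exceeds
    # the one-shot gain from reneging.
    print(f"player {i+1}: gain {unilateral_gain(i)}, bond {bond(i)},"
          f" keeps word: {bond(i) > unilateral_gain(i)}")
```

With these hypothetical numbers the bond (3.0) exceeds the one-shot gain (2.0), so both players follow the announced multiperiod rule.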
Finally, we can check that the bond values are positive by explicitly reoptimising E_ti(J_i), for t ≥ 2 and a fixed information set inclusive of the effects that future expected decisions induce in current behaviour, to find that the optimally revised decisions are precisely those expected in earlier calculations. In summary, time consistency holds in this type of game because the optimisation process does not depend on having a time-recursive objective-function structure, and because the reoptimisation preserves the necessary Markov property for the forward steps of the revision process of a set of decisions which cannot be separated into a time-nested sequence of actions. There are several other ways of specifying time-consistent behaviour which depend on memoryless rules for re-establishing equilibrium at each t (Oudiz and Sachs, 1985). But such rules are inferior because they ignore the intertemporal elements of the constrained objective. In contrast, time consistency holds here because each decision-maker recognises that, if he were to renege, then so would the other. Thus, although each player might gain from time-inconsistent behaviour, he can only do so provided that the other player (despite being made worse off) decides to stick with his original policy sequence. But if both players are time-inconsistent (either in an attempt to realise those gains for themselves, or to minimise the losses caused by anticipated time inconsistency elsewhere), then both will end up worse off.
Once this is recognised, the incentive for time-inconsistent behaviour vanishes. Essentially we have a Nash, or subgame-perfect, equilibrium, where there are no incentives to renege (Holly, 1986). A numerical demonstration of this is given in Brandsma and Hughes Hallett (1984b). Therefore, in games between countries, the need to predict future 'foreign' actions restricts each government's freedom of choice and ensures that the revised decisions are consistent with previous announcements.

9.7 ANTICIPATIONS AND THE LUCAS CRITIQUE
A common objection to standard game-theory solutions is that the conjectures which agents are presumed to hold are usually wrong. The open-loop Nash solution, for instance, requires each player to base his actions on the assumption that his rival will not react to any policy adjustments, although both players will be affected. This assumption nevertheless generates non-zero reactions (i.e. D_1^(1), D_1^(2) in (9.5.3), given D_0^(1) = 0, D_0^(2) = 0) which falsify the original conjectures. This is true for all conjectures except those which turn out to equal the optimal reactions. The inconsistencies at earlier steps of (9.5.4) are therefore just a demonstration of the Lucas critique. The policies of one player, based on certain conjectured responses elsewhere, will actually generate reactions different from those conjectured. This discrepancy will then invalidate the original policy choice, because the spillovers from these new reactions alter the parameters controlling the responses of the first player's targets to his own instruments. For example, the open-loop Nash solution generates policies for player 1 assuming that R^(1,1) describes the responses to domestic policy changes, whereas in fact, by (9.5.3), R^(1,1) + R^(1,2)D_1^(2) will determine these responses. Player 1 is then obliged to modify his proposed action to account for the changed dynamic responses induced by other agents' reactions. More generally, the responses R^(i,i) + R^(i,j)D_s^(j) change to R^(i,i) + R^(i,j)D_{s+1}^(j). Only at convergence is the Lucas critique satisfied, because, if one player then tries to gain by altering his reactions again, the other will certainly be made worse off and will retreat to a position which makes them both worse off.
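The convergence of conjectured to actual reactions can be illustrated in a scalar static analogue of the game. The model y = r1·x1 + r2·x2 + s and the quadratic costs below are illustrative assumptions, not the book's equations; the point is only the fixed-point structure of the iteration:

```python
# Iterating conjectural variations in a scalar two-player game (a sketch):
#   y = r1*x1 + r2*x2 + s,   J_i = (y - yi*)^2 + n_i * x_i^2.
# Given a conjecture d_j about the rival's response dx_j/dx_i, player i's
# first-order condition implies an *actual* reaction slope
#   D_i(d_j) = -(r_i + r_j*d_j) * r_j / [(r_i + r_j*d_j) * r_i + n_i].
# Starting from d = 0 (the open-loop Nash conjecture), each step of the
# iteration falsifies the previous conjecture -- the Lucas critique --
# until conjectured and actual reactions coincide at the fixed point.

r1 = r2 = 1.0
n1 = n2 = 1.0

def actual_reaction(d_rival: float, r_own: float, r_riv: float, n: float) -> float:
    g = r_own + r_riv * d_rival          # perceived own-instrument multiplier
    return -g * r_riv / (g * r_own + n)  # slope of the optimal reaction

d1 = d2 = 0.0                            # open-loop Nash conjectures
for step in range(200):
    d1_new = actual_reaction(d2, r1, r2, n1)
    d2_new = actual_reaction(d1, r2, r1, n2)
    gap = max(abs(d1_new - d1), abs(d2_new - d2))
    d1, d2 = d1_new, d2_new
    if gap < 1e-12:
        break

# At convergence the conjecture is self-fulfilling: d solves d = D(d),
# i.e. d^2 + 3d + 1 = 0 for these parameters, d = (-3 + 5**0.5)/2.
print(step, d1, d2)
```

Only the converged conjectures are immune to the critique: at every earlier step the computed reaction differs from the conjectured one.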
9.7.1 A hierarchy of solutions
We noted above that the open-loop Nash solution is given by (9.5.2) with D^(1) = 0 and D^(2) = 0. That solution is a special case of (9.5.3), generated by setting s = 0 and D_0^(i) = 0. An open-loop Stackelberg equilibrium assumes that one player (the leader, player 1 here) takes his opponent's responses as given and anticipates no further reactions, while the follower optimises relative to the leader's optimal decision; i.e. D^(1) = 0 but D^(2) ≠ 0. Thus the leader expects δX^(2)* = D^(2)δX^(1) + e^(2), where:

D^(2) = -[R^(2,2)′Q^(2)R^(2,2) + N^(2)]^{-1} R^(2,2)′Q^(2)R^(2,1) = D_1^(2)

and e^(2) = F^(2)E_t1[c^(2)]. The leader's targets would then behave as:

Y^(1) = [R^(1,1) + R^(1,2)D^(2)]X^(1) + c^(1) + R^(1,2)e^(2)    (9.7.2)
implying the optimal decisions:

X^(1)* = X^(1)d - [G^(1)′Q^(1)G^(1) + N^(1)]^{-1} G^(1)′Q^(1)(G^(1)X^(1)d + c^(1) - Y^(1)d + R^(1,2)e^(2))    (9.7.3)

where G^(1) = R^(1,1) + R^(1,2)D^(2). Using this equation to determine X^(2)*, we get:

X^(2)* = D_1^(2)X^(1)* + F^(2)E_t1[c^(2)]    (9.7.4)
which is a special case of (9.5.3) at s = 1, with D_0^(1) = D_1^(1) = 0 and D_0^(2) = 0. On the other hand, the leader might take the follower's policy rule as given, say x_t^(2) = K_t y_{t-1} + k_t. The follower's reaction matrix can then be obtained by stacking these rules and substituting out the y_{t-1} terms using the model (9.3.1):

X^(2) = K[y_0′, y_1′, ..., y_{T-1}′]′ + k    (9.7.5)

where K is the stacked feedback matrix formed from K_1, ..., K_T. Substituting Y = R^(1)X^(1) + R^(2)X^(2) + s, for the stacked multiplier matrices R^(j), j = 1, 2, then expresses X^(2) as a function of X^(1) alone. Hence D^(1) = 0 again, but now D^(2) = (I - KR^(2))^{-1}KR^(1). This feedback Stackelberg equilibrium is one step of (9.5.3) from D_0^(1) = 0 and D_0^(2) = (I - KR^(2))^{-1}KR^(1). Finally, a feedback Nash equilibrium holds when each player takes his opponent's policy rule, rather than his policy values, to be fixed. Player 2's reaction matrix, D^(2), is the same as in (9.7.5). Player 1, who uses x_t^(1) = K_t y_{t-1} + k_t, will have the corresponding reaction matrix D^(1) = (I - KR^(1))^{-1}KR^(2). Thus the feedback Nash solution is the first step of (9.5.3) from D_0^(1) = (I - KR^(1))^{-1}KR^(2) and D_0^(2) = (I - KR^(2))^{-1}KR^(1).
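The structural difference between feedback reactions and unrestricted conjectural variations can be checked numerically in a small stacked system. The scalar dynamics and parameter values below are illustrative assumptions:

```python
import numpy as np

# Scalar dynamics y_t = a*y_{t-1} + b1*x1_t + b2*x2_t stacked over T = 3
# periods: Y = R1*X1 + R2*X2 + s, with lower-triangular multipliers
# R_i[t, s] = b_i * a**(t - s) for t >= s.
T, a, b1, b2, k = 3, 0.8, 0.5, 0.4, 0.3

R1 = np.array([[b1 * a**(t - s) if t >= s else 0.0 for s in range(T)]
               for t in range(T)])
R2 = np.array([[b2 * a**(t - s) if t >= s else 0.0 for s in range(T)]
               for t in range(T)])

# Player 2's feedback rule x2_t = k*y_{t-1} + const, stacked: X2 = K Y + k~,
# where K has k on the subdiagonal (x2_t reacts only to last period's state).
K = np.diag([k] * (T - 1), -1)

# Substituting Y = R1 X1 + R2 X2 + s gives the implied reaction matrix
#   D2 = (I - K R2)^{-1} K R1,
# the feedback-rule counterpart of the reaction matrices in (9.7.5).
D2 = np.linalg.solve(np.eye(T) - K @ R2, K @ R1)

print(np.round(D2, 4))
# D2 is strictly lower-triangular: feedback rules react only to the past,
# unlike the unrestricted conjectural-variations matrices D^(i).
assert np.allclose(np.triu(D2), 0.0)
```

The zero upper triangle confirms the text's point: feedback Nash/Stackelberg reactions embody implicit zero restrictions across time, whereas conjectural variations leave those blocks free.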
Conventional solutions are therefore all special cases of the general solution (9.5.2), in which certain prior restrictions are placed on the conjectural-variations or reaction matrices D^(i). The generalisations offered by a conjectural-variations solution are twofold: (i) the reaction matrices are not restricted to any pre-assigned values, and (ii) the reaction matrices contain no implicit zero restrictions across time intervals. The latter allows current decisions to react fully to anticipated future decisions, and in that respect they represent more sophisticated behaviour than the Nash strategies, which feed back only from past economic states or past policies. Endogenised anticipations of that kind are necessary if policy-makers are to be able to react to rational expectations of their opponents and to avoid the Lucas critique.

9.8 STACKELBERG GAMES: THE FULL-INFORMATION CASE
In this chapter we have distinguished two types of Stackelberg game. In the first, the 'leader' acts as a Cournot player and the 'follower' then determines his strategy knowing what the leader has done. In the second, the leader determines his strategy on the assumption that the follower will subsequently act rationally. There are a number of alternative definitions of Stackelberg behaviour in the literature, so, to avoid confusion, in this chapter we have been using the second type of definition. Note that the essential feature is that the leader and follower have full information: each knows the tastes and intentions of the other. In the next chapter we shall relax this very strong assumption, in which case it may sometimes be optimal for the Stackelberg leader to act as a Cournot player, since he may not know what the follower will do. As with most of the material in this book, this kind of analysis can provide both positive and normative descriptions of behaviour. On the positive level, Stackelberg solutions can be used to characterise inter-firm behaviour within oligopolistic industries in which a firm or small group of firms dominates output, with the remainder accounted for by a large number of small firms. The dominant firms act as price-fixers and the small firms follow. Alternatively, Stackelberg games can be used to provide normative solutions to the planning problems of centrally planned economies in which there is a measure of hierarchical decentralisation. The central planning authority lays down broad guidelines as to targets, and decision-makers at the level of the individual firm or plant are then at liberty to decide how the target is to be achieved, what mix of labour and capital to use, and from where to buy raw materials.
Our primary concern is with the normative and positive aspects of macropolicy in mixed economies, but it is not immediately obvious that national macroeconomic policy-making ought to be formulated as a Stackelberg game, even though international economic policy may be another matter (see, for example, Canzoneri and Gray, 1985). International trade is dominated by a number of large economies or blocks of countries acting in
unison on tariff questions or acting as cartels in the supply of commodities. There are examples in which attempts have been made to achieve cooperative schemes beneficial to all countries - such as the General Agreement on Tariffs and Trade. But in most instances large single countries have tended to act as leaders, or have cooperated with other large countries to fix tariff or non-tariff barriers to trade or to impose import quotas. For it to be worthwhile for a player in a Stackelberg game to act as the leader, there must be some incentives over and above those arising from the corresponding Nash solution. The leader must be able to gain by acting first. Given the dynamic constraints provided by the econometric model, the leader's advantage can be of two kinds. Either his policy instruments impact on the economy before the instruments of followers, or the follower is restricted in the freedom with which he can vary his strategy. If the leader does not have any advantage, then, if he determines his strategy on the assumption that the follower will act in his own best interests conditional on what he does, the resulting outcome is bound to be the Nash solution. Largeness, then, is not necessarily the natural criterion by which the leader could be selected. Since there are advantages to being able to act first - and in some instances it may be better, relative to the Nash solution, for followers to allow for a leader - a measure of cooperation, and possibly side-payments, may be necessary. To illustrate these points, consider the simple, two-player, non-cooperative game described by:

J_1 = 1/2(y_t^2 + x_{1,t+1}^2)    (9.8.1a)
J_2 = 1/2(y_{t+1}^2 + x_{2t}^2)    (9.8.1b)
where J_1 is the objective function of player 1 and J_2 is the objective function of player 2, y is the state, and x_1 and x_2 are the instruments of players 1 and 2 respectively. It is assumed that the state of the world is described by the two-period model:

y_t = y_0 + x_{2t}    (9.8.2a)
y_{t+1} = y_t + x_{1,t+1}    (9.8.2b)
If we assume that player 1 is the leader, and we solve for the optimal response of the follower, player 2, we obtain the reaction function (9.8.3): the optimal response of the follower depends upon what the leader is expected to do in the next period, where, as before, the second subscript indicates the period in which the strategy is being determined. The
leader is, therefore, faced with the state of the world described by:

y_t = (2/3)y_0 + (2/3)x_{1,t+1,t}    (9.8.4a)
y_{t+1} = y_t + x_{1,t+1}    (9.8.4b)
We now have a non-causal system of constraints. If the leader determines his optimal strategy on the basis of these constraints, we have x_{1,t+1,t} = (2/10)y_0. Given the announced strategy of the leader, we assume that the follower actually implements x_{2t,t}. In period t+1, however, if the leader re-evaluates his strategy by minimising only:

J_{1,t+1} = 1/2 x_{1,t+1}^2    (9.8.5)
his second-period optimal strategy becomes x_{1,t+1,t+1} = 0. Suppose now y_0 = 1. The first period's solution then implies that J_1 = 0.581 and J_2 = 0.638. When the respective strategies of both leader and follower are viewed from period t, the leader in this problem is concerned with the state of the world at time t, but can only influence the state at t+1; while the follower is only concerned with period t+1, but can only influence it indirectly, via y_t. The game at time t consists of the follower and the leader agreeing to a strategy which the leader reneges on in period t+1. But notice that in period t+1 the follower has no instrument left with which to retaliate against the leader's time-inconsistent behaviour - nor, therefore, does he have the opportunity to be time-inconsistent himself, so there are no gains to be lost through the leader reneging. The time-consistency arguments of section 9.6 cannot be applied here. This example has been deliberately contrived to show how, in a Stackelberg game, it can be in the interests of the leader to act other than he was expected to act by the follower, and to gain by it. But consider the slightly more general game described by equations (9.3.6) and (9.3.7), in which the follower has instruments which affect y_{t+1} directly and both leader and follower are concerned with the state of the world at both t and t+1. Minimising (9.3.6) subject to (9.3.7), we obtain the optimal strategies for the leader and the follower as x_{1,t+1,t} = -4/7, x_{2t,t} = 5/7 and x_{2,t+1,t} = 3/7, which is identical to the Nash solution given by (9.3.8). If the leader now re-evaluates his strategy in period t+1, allowing that the follower is permitted to react too, we find that x_{1,t+1,t+1} = -4/7. Hence, if there are any gains from the leader reneging, the follower can offset them by also reoptimising in his own interest.
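The reneging mechanism in the contrived first example can be sketched numerically. Since the printed objective functions in (9.8.1) are only partially legible here, the simple symmetric quadratics below are assumptions chosen to match the described structure (the leader cares about y_t and x_{1,t+1}; the follower about y_{t+1} and x_{2t}); the resulting numbers are therefore illustrative and differ from the text's values:

```python
# Sketch of the reneging mechanism in the two-period Stackelberg game
# (9.8.1)-(9.8.2). The quadratic objectives are assumptions:
#   J1 = 0.5*(y_t^2 + x1_{t+1}^2),  J2 = 0.5*(y_{t+1}^2 + x2_t^2),
# with y_t = y0 + x2_t and y_{t+1} = y_t + x1_{t+1}.
y0 = 1.0

def follower_response(x1_announced: float) -> float:
    # min_x2 0.5*[(y0 + x2 + x1)^2 + x2^2]  =>  x2 = -(y0 + x1)/2
    return -(y0 + x1_announced) / 2.0

def leader_announced_optimum() -> float:
    # The leader internalises the follower's response: y_t = (y0 - x1)/2,
    # so min_x1 0.5*[y_t^2 + x1^2]  =>  x1 = y0/5.
    return y0 / 5.0

x1_ann = leader_announced_optimum()
x2 = follower_response(x1_ann)       # implemented in period t
y_t = y0 + x2

# Period t+1: y_t is bygone, so the leader minimises only 0.5*x1^2,
# the analogue of (9.8.5), and sets his instrument to zero.
x1_actual = 0.0

J1_honest = 0.5 * (y_t**2 + x1_ann**2)
J1_renege = 0.5 * (y_t**2 + x1_actual**2)
print(J1_renege < J1_honest)   # reneging is a costless gain for the leader
# The follower has no instrument left in t+1 with which to retaliate.
```

Whatever the exact parameterisation, the structure is the same: the announcement does all the work in period t, and discarding it in period t+1 is free to the leader.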
In this case, although we have described the game as Stackelberg, with a leader and a follower, the optimal solution is in fact a Nash equilibrium solution to a non-cooperative game. It is not in the interests of either the leader or the follower to depart from this solution. Thus it takes more than just an agreement to transpose a dynamic game into the kind of full-information Stackelberg game considered here. Note that a Nash solution to (9.8.1) and (9.8.2) does not exist even if there is no leader: player 1 always has an incentive in period t+1 to act other than he agreed to act in period t. Whether the solution to a dynamic game is Nash or Stackelberg will depend upon the structure of the constraints. In the first game, the inability of the follower, player 2, to have any direct influence over y_{t+1} is the primary reason for the emergence of a Stackelberg solution. Of course, this would not matter if player 2 attached no weight to y_{t+1} in his objective function; but in that case the problem would be degenerate anyway. To have a Stackelberg solution with full information available to both players requires, in the absence of any natural advantages obtaining in the constraints (such as were found for (9.8.1) and (9.8.2)), that the leader can impose a solution.
9.9 DYNAMIC GAMES WITH RATIONAL OBSERVERS
In this section we examine dynamic games in which the players have to take into account how non-participating agents will react to the outcome of the game (Rogoff, 1985; Holly, 1988). In particular, the non-participants form rational expectations, so that the set of constraints which the players face is not in the form of equation (9.3.1) but is instead represented by:

y_t = Ay_{t-1} + Dy^e_{t+1|t} + B_1x_{1t} + B_2x_{2t} + Ce_t    (9.9.1)
This is identical to (8.2.2) except that now the set of policy instruments has been partitioned. This equation can be taken to represent the constraints of a variety of economic-policy problems. The formulation of coordinated international policies between individual countries requires that the (forward-looking) reactions of the private sector - especially the foreign-exchange markets - are taken into account. Within single countries, attempts to arrive at agreements between competing groups within society, such as employers' representatives and employees' representatives, require, in addition to the will to agree, that those who are not party to the agreement believe that it will work. As in section 8.3, two non-recursive methods are described in this section: the direct method, as a generalisation of the Theil procedure used extensively elsewhere in this book, and the penalty-function method of section 8.3.

9.9.1 A direct method
Stacking up the T equations, for T periods, as in the previous two chapters, we have:

[(I-D_0)   -D_1                 ]
[  -A    (I-D_0)   -D_1         ] Y = B_1X_1 + B_2X_2 + Cē + a
[           ...      ...        ]
[            -A    (I-D_0)      ]

where Y = [y′_{1|1}, ..., y′_{T|1}]′ and X_i = [x′_{i1|1}, ..., x′_{iT|1}]′, and where a collects the predetermined terms: Ay_0 in the first block and D_1y^e_{T+1|1} in the last. Hence

Y = R_1X_1 + R_2X_2 + s    (9.9.2)

where the multiplier matrices R_i = [R_i^{11} ... R_i^{1T}; ... ; R_i^{T1} ... R_i^{TT}] are full. This final-form solution for a dynamic game is not block lower-triangular, so the calculation of the dynamic multipliers is more complex. Moreover, an additional dimension is introduced into the determination of the terminal conditions, since different players might have different views about those conditions. Nevertheless, we assume here that common views of the terminal conditions are held by all players. Producing the open-loop Nash solution for this form of the model is then simply a straightforward matter of rewriting the model as (9.4.4) and then computing (9.4.10). The resulting solution has the property that it is a Nash solution for two players which lies on the convergent saddlepath for the economy described by (9.9.1).
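The non-lower-triangular structure of the stacked multipliers can be illustrated directly. The scalar model and parameter values below are illustrative assumptions:

```python
import numpy as np

# Stack the scalar RE model y_t = a*y_{t-1} + d*y^e_{t+1|t} + b1*x1_t + b2*x2_t
# over T periods (terminal expectation y^e_{T+1|1} = 0), with model-
# consistent expectations y^e_{t+1|1} = y_{t+1|1}. Parameters are illustrative.
T, a, d, b1, b2 = 4, 0.5, 0.4, 1.0, 0.8

M = np.eye(T)                    # the stacked (I - D) matrix of (9.9.2)
for t in range(T - 1):
    M[t, t + 1] = -d             # -D1 blocks: expectations of period t+1
    M[t + 1, t] = -a             # -A blocks: lagged dynamics

R1 = np.linalg.solve(M, b1 * np.eye(T))   # multipliers of player 1
R2 = np.linalg.solve(M, b2 * np.eye(T))   # multipliers of player 2

# Unlike a purely backward-looking model, R1 is NOT lower-triangular: an
# instrument set for a future period moves today's target through
# anticipations, e.g. dy_1/dx1_3 = R1[0, 2] != 0.
print(np.round(R1, 4))
```

The nonzero upper-triangular entries are exactly the anticipation effects that make the dynamic-multiplier calculation more complex than in the backward-looking case.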
The solution is:

[(R_1′Q_1R_1 + N_1)    (R_1′Q_1R_2)    ] [X_1*]   [R_1′Q_1(Y_1^d - s) + N_1X_1^d]
[(R_2′Q_2R_1)    (R_2′Q_2R_2 + N_2)    ] [X_2*] = [R_2′Q_2(Y_2^d - s) + N_2X_2^d]    (9.9.3)

9.9.2 A penalty-function method
Rewriting the stacked form so that the expectational terms appear separately:

[  I              ]
[ -A    I         ] Y = B_1X_1 + B_2X_2 + D̄Y^e + Cē + a    (9.9.4)
[      ...  ...   ]
[       -A    I   ]

where D̄ collects the D_0 and D_1 blocks and Y^e = [y^e′_{2|1}, ..., y^e′_{T+1|1}]′, and rewriting (9.4.4) as:

Y = R_1X_1 + R_2X_2 + SY^e + s    (9.9.5)
where now we have:

J_1 = 1/2[(Y - Y_1^d)′Q_1(Y - Y_1^d) + (X_1 - X_1^d)′N_1(X_1 - X_1^d)]    (9.9.6)
J_2 = 1/2[(Y - Y_2^d)′Q_2(Y - Y_2^d) + (X_2 - X_2^d)′N_2(X_2 - X_2^d)]    (9.9.7)
J_e = 1/2(Y^e - Y^s)′P(Y^e - Y^s) + 1/2(Y^e′GY^e)    (9.9.8)
we can then use the same penalty-function approach as in chapter 8 to compute the non-cooperative Nash solution to this two-player game with
rational observers as:

[(R_1′Q_1R_1 + N_1)      (R_1′Q_1R_2)          R_1′Q_1S              ] [X_1*]
[(R_2′Q_2R_1)       (R_2′Q_2R_2 + N_2)         R_2′Q_2S              ] [X_2*]
[[-(I-VK)′PVR_1]     [-(I-VK)′PVR_2]     [(I-VK)′P(I-VK) + G]        ] [Y^e ]

    [R_1′Q_1(Y_1^d - s) + N_1X_1^d]
  = [R_2′Q_2(Y_2^d - s) + N_2X_2^d]    (9.9.9)
    [(I-VK)′P(Y^s - Vs)           ]
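A minimal numerical sketch of a Nash game with rational observers follows. The two-period constraint mirrors (9.9.10) with θ = 0, but the quadratic objectives, and hence all the numbers, are illustrative assumptions (the book's own objectives in (9.9.11) differ): the players' first-order conditions are solved jointly with the observers' consistency condition y^e = y_{t+1}.

```python
import numpy as np

# Two-period model with a rational observer:
#   y_t     = x2_t + 0.5*ye             (ye = observers' forecast of y_{t+1})
#   y_{t+1} = y_t + x1_{t+1} + x2_{t+1}
# Assumed (illustrative) objectives:
#   J1 = (y_t - 1)^2 + (y_{t+1} - 1)^2 + x1_{t+1}^2               (player 1)
#   J2 = (y_t - 1)^2 + (y_{t+1} - 1)^2 + x2_t^2 + x2_{t+1}^2      (player 2)
# Each player's FOC treats ye and the rival's instruments as given; the
# fourth equation imposes the observers' rationality, ye = y_{t+1}.
# Unknowns u = [x2_t, x1_{t+1}, x2_{t+1}, ye]:
A = np.array([[3.0, 1.0, 1.0, 1.0],      # dJ2/dx2_t     = 0
              [1.0, 2.0, 1.0, 0.5],      # dJ1/dx1_{t+1} = 0
              [1.0, 1.0, 2.0, 0.5],      # dJ2/dx2_{t+1} = 0
              [-1.0, -1.0, -1.0, 0.5]])  # ye - y_{t+1}  = 0
b = np.array([2.0, 1.0, 1.0, 0.0])
x2t, x1, x22, ye = np.linalg.solve(A, b)

y_t = x2t + 0.5 * ye
y_t1 = y_t + x1 + x22
print(x2t, x1, x22, ye)          # -> 4/13, 1/13, 1/13, 12/13
assert abs(ye - y_t1) < 1e-12    # the observers' forecast is self-fulfilling
```

At this ex-ante solution the non-participants' expectation is confirmed by the players' actions; the text's point is that period-t+1 recalculation can break exactly this consistency.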
Both solutions to the multiplayer game assume that the public acts as an observer, taking account of the consequences of the game for the behaviour of the economy. As long as the game is Nash, there are no incentives, in the absence of cooperation, for one of the players to depart from the Nash solution, provided we are willing to go along with the assumption that the public is fully aware of what is going on. The combination of a Nash equilibrium between the participants in the game, and rationally formed expectations by those not part of the game but affected by it and able - in aggregate though not individually - to affect its result, produces a particularly interesting class of problems which mirror those we have discussed earlier. Now, instead of a policy-maker facing a public which forms rational expectations, we have a number of players who have to take into account how non-participants will react to the game. As in the single-player game, in a many-player game we have to consider the possibility of there being incentives, ex post, for players to act differently from what they decided upon earlier, even in the absence of stochastic disturbances. In the terminology of Selten, the subgame-perfect equilibrium provided by the Nash solution may no longer be subgame-perfect when non-participants form rational expectations. As an illustration, consider a variant of the stylised game described by equations (9.3.6) and (9.3.7), where now (9.3.7) is:

y_t = x_{2t} + 0.5y^e_{t+1|t}    (9.9.10a)
y_{t+1} = y_t + x_{1,t+1} + x_{2,t+1} + 0.5y^e_{t+2|t}    (9.9.10b)

y^e_{t+1|t} is the rational expectation, formed at time period t by those not actually participating, about the outcome of the game in time period t+1. The terminal condition y^e_{t+2|t} is given by θ; we assume that θ = 0. The objective functions for the two players are as before ((9.9.11a) and (9.9.11b)).
If each player minimises his objective function subject to (9.9.10), then what we can call the ex-ante Nash solution, satisfying (9.2.1), gives the result x_{2t|t} = 70/11, x_{2,t+1|t} = 114/11 and x_{1,t+1|t} = -15. This yields the ex-ante costs J_{1t|t} = 302.4 and J_{2t|t} = 204.2. The solution is a perfect equilibrium as long as the non-participants believe that the two players will go through with the game. For this to be a reasonable conjecture, we require that the solution is also subgame-perfect. In other words, there must not be any incentives in time period t+1 for the players to depart from x_{1,t+1|t} and x_{2,t+1|t}, once x_{2t|t} has been implemented. However, if each player recalculates his strategy in period t+1, we find that x_{1,t+1|t+1} = 64/33 and x_{2,t+1|t+1} = 130/33. Thus, just as in the single-'player' game of chapter 8, when the private sector forms forward-looking expectations of what the policy-maker does, there can be ex-post incentives for time-inconsistent behaviour. The difference here is that now it is in the interests of both players in the second period to play a Nash game between themselves. But this solution for time period t+1 is time-inconsistent from the point of view of the non-participants in the game.
10 Incomplete information, bargaining and social optima

10.1 INTRODUCTION
In chapter 9 we considered non-cooperative, full-information games; that is, competitive decision-making in which privately optimal solutions were sought by private agents. The macroeconomic policy-maker, who was also assumed to have full information, pursued his own optimal policy. The objective function lying behind this policy may have elements of Bergsonian social welfare, with policy-makers maximising the utility of the representative individual. But other considerations may also be important here. The information needed to make an optimal choice can sometimes be very onerous. In competitive, atomistic markets, because agents can make their optimal choices solely on the basis of relative prices, the information needed to make decisions is comparatively small. At the other extreme of competitiveness, the monopolist needs to know only the slope of the demand curve facing him, as well as the vector of relative prices, to be able to make optimal decisions. The difficulties lie in between these two extreme cases, where conditions of oligopoly obtain. There the analysis of decision-making is much less straightforward, not because the usual marginal conditions for an optimal choice do not apply, but because an agent has to know how other agents will respond to what he does himself, and then how to respond in turn to this. The information requirements here are overwhelming. There may be incomplete information not only about the performance of the economy but also about the intentions of both private agents and the government. The problem of how to act when there is uncertainty about the underlying economic process is one which has been studied extensively. In chapter 4 we examined the role of the Kalman filter when the observations available on the state were corrupted by noise.
We also referred to the adaptive-control literature (Bar-Shalom and Tse, 1976; Kendrick, 1979), which is specifically concerned with the problem of learning, in the statistical sense, about an incompletely known system. Some attempts have been made to analyse dynamic games with incomplete information about the state,1 and
Aoki (1977) has examined a number of economic examples of this type. While we believe that this approach may yield some important results for the analysis of economic processes under imperfect information, we do not feel that it is capable of handling the sort of problems which can arise when the issue is uncertainty about the intentions of other players. The question of incomplete information about the intentions of other players in a dynamic game is at the heart of the difficulties we have found in chapters 8 and 9, with the wider issues of a government's reputation, reneging and predictability in economic affairs. The public is uncertain about the behaviour of the government but this uncertainty is not necessarily statistical in the sense that a frequency approach can be taken to determine the likelihood of the government taking a particular decision. The basic problem is to establish the conditions under which this uncertainty can be resolved so as to provide an equilibrium which is sustainable. Because non-cooperative imperfect-information solutions may cause problems for the conduct of economic policy, there may be major benefits from cooperation. The Nash non-cooperative solution, even under perfect information, may not be Pareto-optimal. Players can therefore make themselves better off by agreeing to adopt a cooperative solution. Cooperation itself involves bargaining over the setting of the individual policy instruments under the control of different decision-makers. Therefore it necessarily dispels some of the uncertainty in that the 'reactions' by others will become known, and this may avoid the need to collect information on (or to estimate) the priorities, expectations and economic responses of other players. But there are a number of problems which then arise. First, we need to be able to calculate outcomes which lie in the Pareto-efficient region. 
Secondly, we need a bargaining strategy which will lead to the choice of one of these outcomes, and, finally, we need mechanisms which will ensure that partners to the bargain keep to their part of the agreement. The sustainability of a bargaining solution will have a strong bearing upon the conduct of the bargaining process because, if some players feel that others are not to be trusted, cooperation is much less likely to result.

10.2 INCOMPLETE INFORMATION, TIME INCONSISTENCY AND CHEATING
In the introduction we identified two kinds of incomplete information. First, there was the problem of the participants in a dynamic game not knowing the precise state and, secondly, there was the situation in which they did not know each other's precise intentions. In chapter 9 we observed that, under the assumption of full information, a Stackelberg solution did not necessarily yield benefits to the leader superior to the Nash non-cooperative solution. However, when there are asymmetries in information the leader can exploit the follower's ignorance of the leader's true intentions by cheating.
The possibility of cheating (Hamalainen, 1979) arises when the leader knows his own objective function and that of the follower, while the follower does not know the objective function of the leader. In this situation the leader announces one policy, so as to elicit a particular response from the follower, but actually implements another. The basic idea is illustrated in figure 10.1 for the static two-player case. As in chapter 9, the quadratic functions in x_1 and x_2 are represented by indifference contours, with a_1a_1 the locus of strategies of player 1 which gives the smallest cost to player 1 for fixed values of x_2, and with b_2b_2 representing the equivalent locus of strategies for player 2. When player 1 is the leader, the tangency S_1 denotes the Stackelberg solution without cheating, where the leader adopts the policy x_1^s. But the leader can do better if he announces x_1^{s*}, which elicits the response x_2^s from the follower, but

Figure 10.1 Cheating and Stackelberg solutions
actually implements x_1^*, since this places the leader on a lower indifference contour. By cheating in this way the leader does better than S_1, but he can do even better. Perfect cheating arises if the leader can persuade the follower to adopt a policy which enables the leader to achieve the unconstrained minimum of his objective function, denoted by O_1. He can do this by announcing x_1^c, to elicit the response x_2^c, but actually implementing x_1^{**}, which achieves the unconstrained minimum for player 1. On the face of it, extending this analysis to a dynamic setting should be reasonably straightforward (though, for some difficulties with the perfect-cheating case, see Hamalainen, 1979). The real problem lies with the reasonableness of cheating over time, i.e. in a repeated game, when the follower will be able to observe the discrepancy between the leader's announcements and what he actually does. A sustainable strategy of cheating requires a large degree of gullibility on the part of the follower. Cheating in the Stackelberg game has many similarities to time inconsistency.2 And the same issues arise about how the macroeconomic policy-maker can get away with reneging or cheating, and for how long. Consider the macroeconomic policy model used by Fischer (1980). A consumer faces a two-period problem: to save in the first period, and to supply labour and consume in the second period. The government acts only in the second period, by taxing capital and the income earned by supplying labour, and by using the revenue to finance its expenditure. The utility function of a representative individual is assumed to be the logarithmic function:

J(c_t, c_{t+1}, g_{t+1}) = ln c_t + γ[ln c_{t+1} + α ln(n̄ - n_{t+1}) + β ln g_{t+1}]    (10.2.1)
where c_t is consumption in period t, n_{t+1} is the amount of labour supplied in period t+1 and g_{t+1} is government expenditure. Individuals are assumed to obtain utility from government expenditure. The production function is linear, with a and b the marginal products of labour and capital respectively. There is an initial endowment of capital k_1. The consumer faces the constraints:

c_t + k_{t+1} = (1 + b)k_t = R_t k_t   (10.2.2)

c_{t+1} = R_{t+1} k_{t+1} + (1 - \tau_{t+1}) a n_{t+1}   (10.2.3)

where R_{t+1} is the after-tax rate of return on capital and \tau_{t+1} is the tax rate on labour income. The consumer treats taxes and government expenditure as exogenous, and he chooses his consumption and labour supply to maximise (10.2.1). Then:

c_t = [1 + \gamma(1 + \alpha)]^{-1} \{a\bar{n}[(1 - \tau_{t+1})/R_{t+1}] + R_t k_t\}   (10.2.4)

c_{t+1} = \gamma R_{t+1} c_t   (10.2.5)

\bar{n} - n_{t+1} = \alpha c_{t+1}/[a(1 - \tau_{t+1})]   (10.2.6)
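The allocation rules (10.2.4)-(10.2.6) can be checked numerically. The sketch below uses purely illustrative parameter values (assumptions, not taken from the text) and maximises (10.2.1) by brute force over the consumer's budget set, recovering the closed-form plans:

```python
import math

# Illustrative parameter values (assumptions, not from the text)
gamma, alpha = 0.9, 0.5        # discounting and leisure weights in (10.2.1)
a, b = 1.0, 0.1                # marginal products of labour and capital
n_bar, k_t = 1.0, 1.0          # time endowment and initial capital
R_t = 1.0 + b                  # gross pre-tax return, R_t = 1 + b
R_next, tau_next = 1.05, 0.2   # announced after-tax return and labour-tax rate

# Closed-form plans (10.2.4)-(10.2.6)
c_t = (a * n_bar * (1 - tau_next) / R_next + R_t * k_t) / (1 + gamma * (1 + alpha))
c_next = gamma * R_next * c_t
n_next = n_bar - alpha * c_next / (a * (1 - tau_next))

# Brute-force check: maximise (10.2.1) over a grid, substituting the budget
# constraints (10.2.2)-(10.2.3) for k_{t+1} and c_{t+1}.  Government spending
# enters (10.2.1) additively, so it drops out of the consumer's problem.
def utility(ct, nn):
    k_nxt = R_t * k_t - ct                          # from (10.2.2)
    cn = R_next * k_nxt + (1 - tau_next) * a * nn   # from (10.2.3)
    if min(ct, cn, n_bar - nn) <= 0:
        return -math.inf
    return math.log(ct) + gamma * (math.log(cn) + alpha * math.log(n_bar - nn))

steps = [i / 400 for i in range(1, 400)]
best = max((utility(s1 * R_t * k_t, s2), s1 * R_t * k_t, s2)
           for s1 in steps for s2 in steps)
print(abs(best[1] - c_t), abs(best[2] - n_next))  # both differences are small
```

The grid optimum agrees with the closed forms to within the grid spacing, confirming that (10.2.4)-(10.2.6) describe the consumer's best plan given the announced taxes.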
Thus, the higher the expected second-period tax on capital, the more that is consumed in period t and the less that is available for capital investment. The government also chooses its tax rates, \tau_{t+1} and R_{t+1}, to maximise (10.2.1) subject to its own budget constraint:

g_{t+1} = (R_t - R_{t+1}) k_{t+1} + \tau_{t+1} a n_{t+1}   (10.2.7)

It is assumed that the optimal tax rates coincide with those used by economic agents to calculate their consumption and labour supply plans. The optimal tax rates R^*_{t+1} and \tau^*_{t+1} are complicated non-linear functions, but the first-order conditions imply, as is to be expected, that the utility of government spending at the margin is equal to the marginal utility of private spending, i.e.:

g_{t+1} = \beta c_{t+1}   (10.2.8)
Dynamic inconsistency or cheating occurs when we re-examine government policy from the vantage point of period t+1. Bygones are bygones, and the representative consumer has already implemented his first-period consumption plan, and so the capital stock, k_{t+1}, for period t+1 is predetermined. The representative consumer now maximises:

J_{t+1}(c_{t+1}, n_{t+1}, g_{t+1}) = \ln c_{t+1} + \alpha \ln(\bar{n} - n_{t+1}) + \beta \ln g_{t+1}   (10.2.9)

subject to:

c_{t+1} = R_{t+1} k_{t+1} + (1 - \tau_{t+1}) a n_{t+1}   (10.2.10)

and again, for given values of R_{t+1}, \tau_{t+1} and g_{t+1}, the optimal consumption and labour-supply plans are:

c_{t+1} = (1 + \alpha)^{-1}[(1 - \tau_{t+1}) a \bar{n} + R_{t+1} k_{t+1}]   (10.2.11)

\bar{n} - n_{t+1} = \alpha c_{t+1}/[a(1 - \tau_{t+1})]   (10.2.12)
The government also maximises (10.2.9) subject to:

g_{t+1} = \tau_{t+1} a n_{t+1} + (R_t - R_{t+1}) k_{t+1}   (10.2.13)

which implies:

\tau_{t+1} = 0   (10.2.14)

R_{t+1} = (1 + \alpha + \beta)^{-1}[(1 + \alpha) R_t - \beta a \bar{n}/k_{t+1}]   (10.2.15)
It is now optimal not to tax income generated by work, but to tax capital, since the capital has already been built up by first-period investment. In this example there is dynamic inconsistency even when the government maximises the welfare of the representative individual. The reason for this lies in the difference between ex-ante and ex-post welfare. Ex ante, the government encourages capital build-up by indicating that in the next period taxes on capital will be low and taxes on income high, so investment is undertaken. Ex
post, bygones are bygones, the capital stock has increased, so it is now optimal for the government to encourage as much second-period consumption as possible by not taxing labour income, and to finance its expenditure by taxing only capital. Ex post, individuals are better off if the government acts inconsistently, but only if, in the first period, they originally believed that the government would actually implement R^*_{t+1} and \tau^*_{t+1}, although in fact it does not. The view that it is in the interests of individuals that they be fooled by what Fischer (1980) has called a 'benevolent dissembling' government was used by Pigou (1936) to justify government intervention in determining the mix between investment and consumption over time. It could also be used to justify accelerating development in underdeveloped countries by forced saving, or the rapid industrialisation of Russia by Stalin in the 1930s by depriving the agricultural sector of resources. The difference in Fischer's model is that the government does not use coercion to enforce the socially optimal solution. Instead it cheats.

One difficulty with this example is that the two-period horizon exaggerates the freedom with which the government can renege. The representative consumer has no recollection of the government reneging in the past, and therefore takes the government at its word in period one. In period two the consumer finds his first-period expectations to be wrong, but has no opportunity to use this information to condition the expectations he will form for subsequent periods. In a many-period game, the public would have to take a view of how the government will tax capital in the next period in the light of what the government has done in the past and what it has announced in the past. There is then the question whether, over many periods, the game will settle down to some sustainable equilibrium.
To be able to provide an answer we need some broader considerations which until recently lay outside the normal scope of the theory of economic policy. One possibility which is examined in section 10.6 below is that the reputation of the government may be an important constraint on the discretion with which it can act since the loss of reputation may make it impossible for the government to achieve even the basic Nash non-cooperative policy.
10.3 COOPERATION AND POLICY COORDINATION

A key feature of economic policy is the impact of expectations (generated within the game itself) on the decision process. The realisation that 'foreign' reactions can offset 'domestic' action has persuaded many politicians to call for internationally coordinated policy-making. This has perhaps been most obvious in the context of international policy design, starting with the communiques issued at the annual economic summit meetings since 1975,
through the discussion of concerted action to reflate the world economy in the early 1980s, to the coordinated monetary arrangements in 1985/6. Indeed the 1986 annual IMF-World Bank meetings proclaimed that 'International coordination of policies [is] increasingly viewed as [the] key to expansion' (IMF Survey, 3 November 1986).

10.3.1 Cooperative decision-making

It is a well-known result that, in the absence of side-payments, the outcomes of a non-cooperative game are socially inefficient, in that alternative decisions exist which would make all parties better off. The following section, for example, shows that a necessary condition for any non-cooperative decision rule to be Pareto-efficient is that each player's reaction matrix, \partial X^{(j)}/\partial X^{(i)}, must vanish, and that the aims and priorities for the union of targets in the game must be identical for both players. If these conditions are not fulfilled - and the fact that Y^{(1)d} \neq Y^{(2)d} (i.e. to the extent governments pursue private and national rather than public or international goals) means that they can never be fulfilled - then another decision rule will dominate, because at least one player can be made better off without the other being worse off. Da Cunha and Polak (1967) have shown that the set of non-dominated decisions can be generated by minimising the 'collective' objective function:

J = \alpha J_1 + (1 - \alpha) J_2,  where 0 < \alpha < 1   (10.3.1)

subject to:

Y^{(i)} = R^{(i,1)} X^{(1)} + R^{(i,2)} X^{(2)} + s^{(i)},  for i = 1, 2   (10.3.2)

with respect to X^{(1)} and X^{(2)} together. Once a collective objective (or a value of \alpha) is agreed, all parties can gain by following a Pareto-efficient cooperative solution, although a redistribution of benefits may be implied. But, by the same token, the cooperative solution is not optimal for player 1 in terms of J_1 alone, given that he has now acquired a measure of control over X^{(2)}. There is therefore an incentive to cheat while the other countries maintain their cooperatively optimal decisions. The existence of this kind of risk may explain why politically sovereign policy-makers have not been persuaded of the advantages of cooperative solutions in practice. To achieve cooperation it is necessary to establish that the losses risked, if the other country does cheat, are acceptably small. If those losses are not small, we shall need to design suitable punishment schemes. But perhaps the more important question is the prior one: will the coordination gains be worth while, compared with the loss of individual sovereignty? This is an empirical issue, to be judged on a case-by-case basis, which theoretical models can do rather little to resolve. It is also important to realise that cooperation may entail redistributing the policy gains in a way
which may correspond poorly with national priorities or bargaining power. It is therefore important to estimate the distribution as well as the size of the gains from cooperation, compared with those of a non-cooperative strategy, in order to be able to say anything about the chances of sustaining cooperation in practice.

10.3.2 Socially optimal decisions

Under what conditions will optimal non-cooperative decisions also be socially (Pareto) optimal? Consider the simple open-loop Nash solution for the non-cooperative problem, that is, equation (9.4.10), which corresponds to the zero-conjectures solution (9.5.2) with D^{(1)} = 0 and D^{(2)} = 0, and take the private objective functions which were used in chapter 9. Then (9.5.2) with D^{(1)} = 0 and D^{(2)} = 0 can be rearranged to give:

(R^{(i,i)'}Q_i^* R^{(i,i)} + R^{(i,i)'}M_i + M_i'R^{(i,i)} + N_{ii})(X^{(i)*} - X^{(i)d}) + (R^{(i,i)'}Q_i^* R^{(i,j)} + M_i'R^{(i,j)} + N_{ij})(X^{(j)*} - X^{(j)d}) = -(R^{(i,i)'}Q_i^* + M_i')b_i,  for i \neq j = 1, 2   (10.3.3)

where b_i = s^{(i)} - Y^{(i)d} + R^{(i,1)}X^{(1)d} + R^{(i,2)}X^{(2)d}. On the other hand the optimal cooperative solution is an extended single-player problem with the global objective J = \alpha_1 J_1 + \alpha_2 J_2, and hence (with full generality) relative priorities given by:

Q = \begin{bmatrix} \alpha_1 Q_1^* + \alpha_2 Q_2^* & \alpha_1 M_1 & \alpha_2 M_2 \\ \alpha_1 M_1' & N_{11} & N_{12} \\ \alpha_2 M_2' & N_{12}' & N_{22} \end{bmatrix},  where z = \begin{bmatrix} y \\ x^{(1)} \\ x^{(2)} \end{bmatrix}  and  q = \begin{bmatrix} q_1 \\ q_{21} \\ q_{22} \end{bmatrix}

and where y represents the union of the two target sets. It is therefore understood that Q_1^*, Q_2^*, M_1 and M_2 have been augmented by rows (columns) of zeros where the corresponding element of y is a target of the opponent but not of the player whose preferences are being modelled. The optimal cooperative instrument values are then given by the single-player solution of chapter 3, where the combined model is:

Y = R^{(1)} X^{(1)} + R^{(2)} X^{(2)} + s,  where R^{(i)} = \begin{bmatrix} R^{(1,i)} \\ R^{(2,i)} \end{bmatrix}
as before. Hence:

\begin{bmatrix} R^{(1)'}BR^{(1)} + \alpha_1 R^{(1)'}M_1 + \alpha_1 M_1'R^{(1)} + N_{11} & R^{(1)'}BR^{(2)} + \alpha_2 R^{(1)'}M_2 + \alpha_1 M_1'R^{(2)} + N_{12} \\ R^{(2)'}BR^{(1)} + \alpha_1 R^{(2)'}M_1 + \alpha_2 M_2'R^{(1)} + N_{12}' & R^{(2)'}BR^{(2)} + \alpha_2 R^{(2)'}M_2 + \alpha_2 M_2'R^{(2)} + N_{22} \end{bmatrix} \begin{bmatrix} X^{(1)*} - X^{(1)d} \\ X^{(2)*} - X^{(2)d} \end{bmatrix} = -\begin{bmatrix} (R^{(1)'}B + \alpha_1 M_1')b + R^{(1)'}q_1 + q_{21} \\ (R^{(2)'}B + \alpha_2 M_2')b + R^{(2)'}q_1 + q_{22} \end{bmatrix}   (10.3.4)

where B = \alpha_1 Q_1^* + \alpha_2 Q_2^* and b = s - Y^d + R^{(1)}X^{(1)d} + R^{(2)}X^{(2)d}.

Notice that the Pareto-optimal property holds (by Da Cunha and Polak's results) in (10.3.4) if \alpha_1, \alpha_2 > 0 and \alpha_1 + \alpha_2 = 1. Obviously (10.3.3) and (10.3.4) can coincide generally only if:
(a) B = Q_1^* = Q_2^*, i.e. Q_1^* = [\alpha_2/(1 - \alpha_1)]Q_2^*, or Q_1^* = Q_2^*. Similarly we also require that Y^{(1)d} = Y^{(2)d}. Notice that this condition applied to B automatically implies that R^{(1)} = R^{(1,1)} and R^{(2)} = R^{(2,2)}, since there cannot then be any zero rows/columns in Q_1^* and Q_2^*. In other words all targets are in common: no player pursues a target which is not pursued by other players.
(b) M_1 = 0 and M_2 = 0.
(c) N_{12} = 0.

The generalisation of this proposition to multiple players is straightforward. Let J = \sum_i \alpha_i J_i, such that \sum_i \alpha_i = 1 and \alpha_i \geq 0. We use the fact that we can treat X^{(2)} above as if it were p - 1 collected opposition players behaving as one 'individual'. Therefore we can compare (10.3.3) and (10.3.4) in the p versions obtained by rotating i = 1 to be each player and i = 2 to be his opponent. We subdivide R^{(2)} as [R^{(2)}, \ldots, R^{(p)}] and X^{(2)} conformably each time. Parts (a) and (c) immediately generalise to (a) Q_i^* = \sum_{j \neq i} [\alpha_j/(1 - \alpha_i)]Q_j^*, and (c) N_{ij} = 0 for all i \neq j. Part (b) remains as M_i = 0 for all i. All three parts are for i, j = 1, \ldots, p.

This proposition makes it clear that the conditions on preferences for the private policy choices to be socially optimal are extremely stringent:
(i) Individual preferences over the target priorities, the list of targets and their ideal values must all be identical or identically linearly dependent. Different goals are ruled out. The time profile of target penalties must also be identical across individuals, despite the different electoral timetables, contract periods, or other periods of account which they will face in practice. Finally, no targets which are specific to one player alone can be entertained.
(ii) M_i = 0 states that there can be no envy (emulation, demonstration effects, etc.).
No disutility is felt by any player as a result of the joint values of his own instruments and the union of the targets. There is no extra penalty on any individual preferences (beyond that on an
individual variable's failure) if a target is achieved by a disproportionate sacrifice by one player: as, for example, when organised labour has to forgo increases in wages in order to achieve higher investment, or when entrepreneurs accept higher taxation in the interests of an equitable distribution of the social wage.
(iii) N_{ij} = 0 states that there can be no externalities; the use of one player's instrument imposes no costs on the preference function of any other player.

To identify the conditions under which privately optimal choices provide a socially optimal solution is an obvious as well as a traditional concern. Indeed, much economic theory, in the shape of Walrasian general-equilibrium analysis, is designed to investigate that very question, and to show that private rational decisions in an exchange economy under perfect competition will also be socially optimal. However, in terms of the private decisions investigated here, ruling out conflicts in tastes and all externalities to ensure a social optimum means that a game (competition) hardly seems necessary. But it is obvious that for genuinely optimal private decisions by (9.5.2) with D^{(i)} \neq 0 there are no general conditions under which they produce a socially optimal outcome. On the other hand, if the open-loop Nash solution is socially optimal, then the conjectural-variations solution must also be socially optimal, since it is necessarily as good as, or better than, the Nash equilibrium. Notice, in addition, that perfect competition is a special case in which D^{(i)} = 0, due to the large number of equally insignificant agents with very fast decision speeds. The results of this subsection therefore serve to point out the very specialised nature of the perfect-competition assumptions, in addition to the restrictive conditions on preferences to ensure socially optimal decisions.
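That private Nash decisions are dominated when tastes conflict can be seen in a minimal quadratic game. The model and numbers below are purely illustrative assumptions, not taken from the text: two players share a single target y = x_1 + x_2 but want different values of it, and minimising a collective objective of the form (10.3.1) with \alpha = 1/2 makes both better off than the zero-conjectures Nash solution.

```python
# J1 = (y - 1)^2 + x1^2 and J2 = (y + 1)^2 + x2^2, with y = x1 + x2.
def losses(x1, x2):
    y = x1 + x2
    return (y - 1) ** 2 + x1 ** 2, (y + 1) ** 2 + x2 ** 2

# Nash (zero-conjectures) solution by best-response iteration:
#   dJ1/dx1 = 2(y - 1) + 2*x1 = 0  =>  x1 = (1 - x2)/2
#   dJ2/dx2 = 2(y + 1) + 2*x2 = 0  =>  x2 = (-1 - x1)/2
x1 = x2 = 0.0
for _ in range(200):
    x1 = (1 - x2) / 2
    x2 = (-1 - x1) / 2
J1_nash, J2_nash = losses(x1, x2)          # converges to x1 = 1, x2 = -1

# Cooperative solution, minimising J = 0.5*J1 + 0.5*J2 over both
# instruments: the first-order conditions 2y + x1 = 0 and 2y + x2 = 0
# give x1 = x2 = 0, so y = 0 splits the difference at no instrument cost.
J1_coop, J2_coop = losses(0.0, 0.0)

print((J1_nash, J2_nash), (J1_coop, J2_coop))  # (2, 2) versus (1, 1)
```

In the Nash solution each player pulls the common target towards his own ideal, and the instrument costs of this tug-of-war leave both with a loss of 2; the cooperative solution halves both losses, exactly the kind of dominance the Da Cunha-Polak frontier guarantees.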
In the more interesting cases, where competition is genuine but not 'perfect', private (Nash equilibrium) decisions cannot be socially optimal.

10.3.3 Some examples
A number of examples of the coordination of economic policies have appeared in recent years - although it is a feature of work in this area that few of the examples are based on estimated models. In fact, most previous work has been based on small analytic models of identically symmetric economies and/or static decision rules. For example, Hamada (1974, 1976) and Johansen (1982) study identically symmetric economies under static decision procedures. Currie and Levine (1985), Miller and Salmon (1985), Oudiz and Sachs (1985) and Taylor (1985) consider identically symmetric economies but dynamic decisions - although the analysis is conducted for the case when the system has reached its steady state. Finally, Canzoneri and Gray (1985) allow asymmetric spillovers (in sign, not size) under static decisions. From these
studies it is rather hard to deduce what the gains from cooperation may depend on, how large they might be, or how asymmetrically they might be distributed. In particular, the importance of sequencing the interventions correctly, of using instruments with different response speeds, and of the cost of resolving disequilibria, is not considered. Among empirical studies are Oudiz and Sachs (1984), Hughes Hallett (1986a,b, 1987) and Canzoneri and Minford (1986). The results of these studies suggest that policy cooperation has the following practical advantages:
(i) Coordination restores policy effectiveness which is weakened by interdependence and by expectations of foreign reactions. This happens because cooperation involves trading reductions in externalities, imposed on foreigners by domestic action, against reduced domestic externalities from foreign reactions. As a result, smaller interventions can achieve better expected outcomes on average.
(ii) Coordination speeds up an economy's responses, and dampens the transitory effects of external shocks more rapidly.
(iii) Coordination enables governments to extend the range of comparative policy advantage, and hence the policy specialisations, which they can exploit. Thus, paradoxically, coordination helps restore a degree of autonomy in policy choice to the policy-makers, and the loss of sovereignty may be more apparent than real.
The first two arguments are confirmation of the theoretical results which Cooper (1969) derived from the comparative statics of a small analytic model. In his terminology, coordination cuts the cost of intervention, since one needs to intervene less and for shorter periods. But the third argument shows that coordination brings additional benefits to countries which have asymmetric policy responses. These additional gains cannot be detected in Cooper's (or indeed most other people's) analysis since that was based on a model of two identically symmetric economies.
Asymmetries between countries play a crucial role in determining the gains from cooperation. Three kinds of asymmetry are of interest: asymmetric domestic-policy responses (which determine policy specialisations); asymmetric spillover effects (which determine bargaining strengths); and asymmetric adjustment speeds (which determine the cost of policy responses). An exercise contrasting US and EEC policies for the 1970s (Hughes Hallett, 1986a) showed, for example, that the important asymmetries were larger and faster domestic-policy responses in the US, and the dominance of US monetary policy among the spillover effects. Thus the US should specialise in monetary policy and the EEC in fiscal policy. For the same reasons the US can make greater gains from competitive policies, while the EEC, despite its low bargaining power, gains more from any sustainable cooperative arrangement.
However, the incentives to cooperate were not in themselves large, which probably explains why the Americans officially favour competitive decision-making with a full exchange of information (Feldstein, 1983). The gains from cooperation were nevertheless somewhat larger in Hughes Hallett's (1986a) study than Oudiz and Sachs (1984) report using static decision procedures in a dynamic model. This suggests that an important part of the gain comes from coordinating the timing of policy changes.

One striking feature of these empirical results is the increasing degree of policy specialisation found in cooperative solutions. In Hughes Hallett (1986a) both the US and EEC concentrate on policy instruments which have the more powerful domestic multipliers: monetary instruments for the US and fiscal instruments for the EEC. This specialisation can be explained as a theory of comparative advantage among policy instruments. In the cooperative solution each country uses its instruments to 'produce' several target values in the global objective (9.3.1) where J_i > 0. Therefore it pays for each country to put most effort into those instruments with the lowest opportunity costs, i.e. those with the largest (absolute) multipliers. Specialisation according to comparative advantage always makes it possible to reallocate the intervention effort between countries so as to get improved outcomes for both components in a global objective, since both countries are directly or indirectly 'producing' some of the changes in both J_1 and J_2. In non-cooperative problems there is no global objective, so comparative advantage is restricted to the different instruments in the home country. Hence non-cooperative solutions show specialisation in those instruments with domestic comparative advantage, but cooperative solutions show a greater specialisation by focussing on those instruments with international comparative advantage, with those having domestic but not international comparative advantage being 'phased out'. Hence the opportunity to reallocate instruments according to international (in addition to national) comparative advantage is one important reason why a cooperative solution can always produce a better outcome than the optimal non-cooperative solution. Policy-makers can then exploit the comparative advantage of the same instrument in different countries, as well as that of different instruments within one country. This comparative-advantage argument is closely related to Mundell's 'principle of effective market classification'. But we have had to distinguish the cooperative from the non-cooperative case here, and also to provide for policy specialisation rather than for a strict allocation of instruments.

A final point to make is that the gains to cooperation may appear in a better performance for different variables in different countries. In Hughes Hallett (1986a), for example, the gains to cooperation take the form of both smaller instrument interventions and better expected target values in the US; but in the EEC those gains arise from a reduction in intervention effort rather than from improved target values. Thus the EEC would exploit the US economy as
a 'locomotive'; improvements in the US would enable the EEC to reach the same target values with less effort, both because the favourable spillovers are enhanced and because the unfavourable spillovers are blocked. Similarly, an improved EEC economic performance means that the US instruments need to intervene less strongly to start a recovery following the oil-price shock, with the result that the US instruments switch to achieving better outcomes for their own targets. Cooperation therefore reduces certain policy externalities; in this case the cost (to the EEC) of blocking spillovers from US policy and the cost (to the US) of generating a recovery are both reduced.

There is, of course, a serious question of how robust these results are to changes in model specification and in the particular information requirements of a given policy problem. It is rather difficult to make any definitive judgements on this issue since there are so few empirical studies available. However, the question must be taken seriously. If there is one thing we have learnt from the theoretical models, it is how sensitive they are to small shifts in their underlying assumptions and restrictions. For example, Corden and Turnovsky (1983) can reverse Hamada and Sakurai's (1978) conclusions on the impact of the terms of trade on policy interactions by simply considering two productive sectors per economy rather than one. Similarly Mundell's (1963) model of the effectiveness of fiscal and monetary policy for interdependent economies with capital mobility yields opposite conclusions depending on whether exchange rates are fixed or flexible. Malinvaud (1981) warns that the untested restrictions assumed in theoretical models are often more extreme in their consequences than those in empirical models. The problem in this case is that there are always multiple spillover channels between economies and we have to account for the net spillover effects.

It is very hard to construct a model which manages to treat all channels satisfactorily, because it is difficult to evaluate the combined effect of all the partial derivatives of different policy changes, activated in different periods, on each player's policy target in each period. Some empirical work has been done to assess the robustness of coordinated policies to model and information errors: Frankel and Rockett (1986) or Holtham and Hughes Hallett (1987) on model errors, and Hughes Hallett (1987) on information errors. It seems that cooperation may imply greater robustness, because it shares risks between participants and because it implies an extra degree of flexibility in revising one's policies in the light of past 'mistakes'. However, the real problem appears to arise when players cannot agree on the right model to use. Here the gains to cooperation may vanish entirely when they try to coordinate while each maintains his favoured model.

10.4 BARGAINING STRATEGIES
Bargaining requires that the participants in a game recognise the advantages to be gained by cooperating and that there exists a recognised strategy for arriving at a solution. For some cases the disadvantages of not bargaining
may be very great. For example, in wage-bargaining between an employer and a trade union, disagreement may entail no nominal wage rise and, with a rising price level, a fall in real wages.³ For an employer there may be a total loss of production because of a strike. For the kinds of cases we are considering, the fall-back, 'threat-point' or non-cooperative position is rarely of this extreme kind. For many cases the Nash non-cooperative solution will be the outcome which results when agreement cannot be reached. If the bargain is to be struck between a government and representatives of employers and employees in a corporatist framework, then at worst the fall-back position is the resumption of what prevailed before. For bargaining between countries on questions of mutual concern, such as coordinated fiscal and monetary policies or tariff barriers, the fall-back position in the event of a failure to agree may well be the Nash non-cooperative solution, though it may also involve the threat to introduce new protectionist measures.

The first analysis of bargaining, by Edgeworth (1881) in a bilateral-monopoly model, identified the possible scope, or the contract curve, along which players could bargain, but was unable to determine on which particular point, if any, they would finally settle. The contract curve is shown in figure 10.1 as the curved line joining the unconstrained minima (O_1 and O_2) for each player's loss function. Bargains occur on the segment of the line within the shaded area, with the Pareto-efficient set superior to the non-cooperative Nash solution at N. Zeuthen (1930), in the form that Harsanyi (1956) also used, established a procedure for negotiation which, for reasonable utility functions, would converge on a unique point.

Assume that the fall-back, non-cooperative position is the static, two-person, Nash, full-information solution (x_1^n, x_2^n):

J_1(x_1^n, x_2^n) \leq J_1(x_1, x_2^n)   (10.4.1a)

J_2(x_1^n, x_2^n) \leq J_2(x_1^n, x_2)   (10.4.1b)

for all feasible x_1 \neq x_1^n, x_2 \neq x_2^n. The outcome of the bargain is the pair x_1^b, x_2^b that maximises:

[J_1(x_1^n, x_2^n) - J_1(x_1, x_2)][J_2(x_1^n, x_2^n) - J_2(x_1, x_2)]

Suppose, in the opening round of negotiation, player 1 demands that he be allowed to play x_1^1, which implies a response from player 2 of x_2^1, and therefore a cost J_1(x_1^1, x_2^1) for player 1. Player 2, on the other hand, wishes to play x_2^2, with the response from player 1 of x_1^2 and cost for player 2 of J_2(x_1^2, x_2^2). Both alternatives satisfy:

J_1(x_1^1, x_2^1) \leq J_1(x_1, x_2^1)   (10.4.2a)

J_2(x_1^2, x_2^2) \leq J_2(x_1^2, x_2)   (10.4.2b)
for all feasible x_1 \neq x_1^1, x_2 \neq x_2^2. If each player has some idea, p_i, of the risk of disagreement, then it would be optimal for player 1 to take that risk if:

p_1 J_1(x_1^n, x_2^n) + (1 - p_1) J_1(x_1^1, x_2^1) \leq J_1(x_1^2, x_2^2)   (10.4.3)

and for player 2 to take the risk if:

p_2 J_2(x_1^n, x_2^n) + (1 - p_2) J_2(x_1^2, x_2^2) \leq J_2(x_1^1, x_2^1)   (10.4.4)

Then, assuming that the first concession would be made by the player who can least afford disagreement, player 1 would make a concession if:

\frac{J_1(x_1^2, x_2^2) - J_1(x_1^1, x_2^1)}{J_1(x_1^n, x_2^n) - J_1(x_1^1, x_2^1)} \leq \frac{J_2(x_1^1, x_2^1) - J_2(x_1^2, x_2^2)}{J_2(x_1^n, x_2^n) - J_2(x_1^2, x_2^2)}   (10.4.5)

and player 1 would continue to make concessions until the inequality was reversed, at which point player 2 would start to make concessions (Bishop, 1964; Cross, 1965; Shackle, 1964). If this process converges, then at the agreement point we have:

\partial J_1(x_1^b, x_2^b)/\partial x_1 + \partial J_2(x_1^b, x_2^b)/\partial x_2 = 0   (10.4.6)
which is Pareto-optimal. Nash (1950), in his treatment of bargaining, arrives at an identical solution but from a different perspective. Nash shows what conditions the solution should satisfy rather than how the solution is arrived at. In principle, the extension of the above analysis to the dynamic setting should be straightforward (Bishop, 1964; Foldes, 1964; Rustem and Velupillai, 1979; Van der Ploeg, 1982); the difficulty is that players may have to implement their parts of the bargain at different points in time - for example, if the government agrees to pass laws favourable to trade unions in exchange for subsequent wage restraint. The fundamental difficulty with bargaining models is that the outcome produced by agreement may not be sustainable subsequently. This can happen for a number of reasons. The hierarchical structure of corporatist systems means that those who act as representatives must be able to ensure that those they represent will keep to their part of the bargain. Governments can alter, and the new government may not feel bound by the actions of its predecessors. In these circumstances representatives may have to fall back upon persuasion, or even direct coercion. Governments can sign agreements and treaties which may or may not be enforceable by international bodies or legal mechanisms. But in the end it is not always possible to have binding constraints which will always guarantee that agreements are honoured. Legal systems are the most common mechanisms for enforcing and sustaining cooperative solutions but if there is widespread dissatisfaction with the law and if the law does not accord with what people regard as fair and
reasonable, then even the legal system may not work. Governments can be restrained by a written constitution as long as there are countervailing forces outside the executive. But the processes involved in achieving amendments to a written constitution, so as to embody rules to govern economic policy, are long and drawn-out. Difficulties would be raised by trying to define a watertight set of rules, for example to restrain budget deficits or to fix the rate of growth of some measure of the money supply. These rules would need to be relevant over long periods of time, and may in practice create more problems for the conduct of economic policy than they solve. We shall now look at some of these questions in more detail.

10.4.1 Feasible policy bargains

The first requirement of a negotiated cooperative solution is that it should dominate the optimal non-cooperative solution, since the latter represents the best outcome either player could expect from unilateral action. In section 10.3 it was pointed out that the set of non-dominated (Pareto-efficient) solutions can be generated by minimising J = \alpha J_1 + (1 - \alpha) J_2 for \alpha \in (0, 1). The private objective function values, J_1^* and J_2^*, implied by this set of solutions have been plotted in figure 10.2, together with the threat-point N (the best non-cooperative outcome). The shaded segment of LL represents the set of cooperative solutions which dominate that threat-point. This corresponds to the shaded area in figure 10.1. Although other points on LL offer one player an even better outcome, they require the other player to accept an outcome which is worse than that available under unilateral action. The losing player would therefore revert to his non-cooperative solution, forcing a return to N.
In that case the vertical and horizontal distances, AN and BN, represent the incentives to cooperate, since they represent the maximum gains which either player could secure for himself without triggering a non-cooperative response from the other. The ratio of BN to AN tells us something about the opportunity each player has for (re)distributing the cooperative gains in his direction.

10.4.2 Rational policy bargains

The potential gains from cooperation must be analysed in a way which does not depend on comparisons with one particular cooperative solution picked from the feasible set AB. Any model of rational bargaining behaviour would supply a valid (non-arbitrary) point of comparison, and from there we can consider whether the policy bargains are sustainable. Indeed, to determine the value of \alpha most likely to prevail in a cooperative agreement requires a bargaining theory which recognises the relative power of the participants. Economic theory contains several such models of bargaining behaviour, each
with rather different theoretical properties. This does not mean, however, that they will all give very different estimates of α. On the other hand, once a particular α value has been selected, one can assess the distribution of the cooperative gains from the associated position on the arc AB. The standard bargaining theory is represented by Nash's cooperative equilibrium model (Nash, 1953). This model is based on the premise that the relevant measure of 'relative power' which determines the outcome of the bargaining process is given by the relative utilities at the threat-point. This is plausible since each player will be willing to bargain only in so far as he expects to get a pay-off better than that attained at the threat-point, and since both players should be willing to accept a division of the net incremental gains in a proportion directly related to the losses incurred by not making an agreement. Nash demonstrated his solution to be the only solution, (J_1*, J_2*), that satisfies axioms of rationality, feasibility, Pareto-optimality, independence of irrelevant alternatives, symmetry, and independence with respect to linear transformations of the set of pay-offs. Furthermore, this solution is obtained where the product of the cooperative gains from the threat-point, (J_1 − J_1^N)(J_2 − J_2^N), is maximised. The Nash solution has been criticised on the grounds that the independence-of-irrelevant-alternatives axiom is inappropriate in a bargaining problem
Figure 10.2 The gains from cooperation
10 INCOMPLETE INFORMATION AND SOCIAL OPTIMA
(Kalai and Smorodinsky, 1975). No individual would agree to ignore the opportunities to improve his position should new options become available to him. If the independence-of-irrelevant-alternatives axiom is replaced by monotonicity of the individual utility functions over possible outcomes, then a unique bargaining solution exists in which the distribution of utility gains is proportional to the maximum individual gains when the other player is no worse off than at the threat-point. The maximum individual gains which either player could make while the other maintains his threat-point outcome (these are the team solutions) are indicated by the positions of points A and B in figure 10.2. The Kalai-Smorodinsky solution is given by the intersection of LL and the line NI, and it also corresponds to minimising the net advantage which either player can gain over his opponent through cooperation (Cao, 1982). Another possibility is to pick α as the point on the shaded segment of LL in proportion to the weighted incentives: i.e. α = BN/(AN + BN). This is the proportional solution proposed by Kalai (1977), but expressed in terms of utility gains from I (rather than from the origin) in order to maintain independence of the scale of the utility functions. This solution follows from the axiom of 'step-by-step' negotiations, in which the point of agreement at each step is used as a threat-point in a subsequent round of negotiations to increase the utility of one or both players. Lastly, there is the possibility of picking α to maximise the sum of cooperative gains from N. Since we are minimising loss functions, this corresponds to Harsanyi's (1956) suggestion, considered once again as utility gains from I rather than from the origin. It is equivalent to the usual economic-surplus welfare measure used in the theory of market stabilisation for evaluating the net benefits of policy interventions.
10.4.3 Relations between bargaining solutions
Some of the bargaining solutions mentioned above have also been proposed by other authors. Cao (1982) shows that the Kalai-Smorodinsky approach is in fact the same as a solution due to Raiffa (1953), and that a modified version of a solution proposed by Thomson (1981) is the same as that of Harsanyi. It is possible to treat these and the Nash solution as special cases in one general system of bargaining solutions defined by max(v_1 v_2) subject to (10.3.2), where

v_1 = (J_1 − J_1^N) + β[−(J_2 − J_2^N)],   v_2 = (J_2 − J_2^N) + β[−(J_1 − J_1^N)]

and (J_1^N, J_2^N) is the threat-point. The Nash bargaining solution is defined by β = 0, the Kalai-Smorodinsky solution by β = 1, and Harsanyi's solution by β = −1. Moreover, the Nash solution always lies between the other two. This provides a framework in which the competing bargaining models can be analysed together.
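This one-parameter family can be explored with a grid-search sketch. The frontier and threat-point below are hypothetical illustrative numbers (quadratic losses again), not values taken from the text's models:

```python
# Sketch of the general system of bargaining solutions: maximise v1*v2 over
# the Pareto frontier, where, with losses (J1, J2) and threat-point (J1N, J2N),
#   v1 = (J1 - J1N) - beta*(J2 - J2N),  v2 = (J2 - J2N) - beta*(J1 - J1N).
# All numbers below are hypothetical and purely illustrative.

def frontier(a):
    """Efficient loss pair from minimising a*J1 + (1-a)*J2 for quadratic
    losses with ideal policies +1 and -1 (so the minimiser is x = 2a - 1)."""
    x = 2.0 * a - 1.0
    return (x - 1.0) ** 2, (x + 1.0) ** 2

J1N, J2N = 2.5, 2.5  # hypothetical non-cooperative (threat-point) losses

def bargain(beta, grid=2000):
    """Return the weight a selecting the bargaining solution for a given beta.
    beta = 0 maximises the Nash product of the gains; beta = -1 maximises the
    sum of the gains (Harsanyi); beta = 1 equalises the two players' gains."""
    best_a, best_val = None, -float("inf")
    for i in range(1, grid):
        a = i / grid
        j1, j2 = frontier(a)
        if j1 > J1N or j2 > J2N:
            continue  # candidate must dominate the threat-point
        v1 = (j1 - J1N) - beta * (j2 - J2N)
        v2 = (j2 - J2N) - beta * (j1 - J1N)
        if v1 * v2 > best_val:
            best_a, best_val = a, v1 * v2
    return best_a
```

In this deliberately symmetric example all three values of β select α = 0.5; with asymmetric losses or threat-points the three solutions generally differ.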
10.5 SUSTAINABLE COOPERATION
10.5.1 An example

The problem in designing cooperative strategies is to establish that the potential gains to each participant are worth while, and that an acceptable distribution of those gains will result. This distribution depends on the value of α. To determine what value of α is most likely to prevail in a cooperative agreement requires some bargaining theory which recognises the relative power of the participants. This is necessarily a matter of particular cases. In order to illustrate what is involved here we take an example from international policy coordination (Hughes Hallett, 1986b). This exercise extends the policy problem, mentioned earlier, of selecting a coordinated strategy between the US and the EEC to generate an end to the world recession of the mid-1970s. In this case, all four bargaining models of section 10.4 gave an 'optimal' α value of 0.67. In that case, whatever their theoretical differences, there is no problem of selecting one of the bargaining models ahead of the others. Bargaining power is therefore distributed 2 to 1 in favour of the US. The proportional gains from cooperation, (J^N_US − J*_US)/J^N_US compared with (J^N_EEC − J*_EEC)/J^N_EEC for this solution, however, turned out to be distributed roughly 2 to 1 in favour of the EEC. Moreover, the policies and objective-function values were insensitive to variations in α. Hence varying α cannot alter the small size of the gains from cooperation (compared with larger gains to 'playing the game' rather than following isolationist strategies). But larger shifts in α will change the distribution of these gains, and perhaps make them reflect bargaining powers more closely. The possibilities for cooperation in this example are summarised in figure 10.3. Cooperation in fact carries few advantages for the US, both because its gains from cooperation are small and because its share of the gains is smaller than the EEC's share unless US interests are given an implausibly high weight (0.75

Figure 10.3 The gains from coordinated policies between the US and the EEC
10.5.2 Cheating solutions

The outcomes when the US cheats and when the EEC cheats on the rational-bargaining (α = 2/3) cooperative solution are noted in table 10.1 and marked on figure 10.3. In the first case, cheating brings quite small gains to the US (3.0 units) but larger losses to the EEC (12.2 units). Meanwhile, if the EEC cheats, it gains 2.5 units while the US would lose 23.1 units. Finally, if both countries should cheat simultaneously (either to try and secure the gains, or to strike pre-emptively, since this evidently minimises their potential losses), they both lose out: the US by 8.4 units and the EEC by 6.7 units. The results based on cheating from the α = 1/2 solution are not very different and are also marked on figure 10.3. One interesting feature is that, in the latter case, both remain better off than under non-cooperation if the US cheats - the US benefiting by more than it does under cooperation, although obviously this is not the case for the EEC.
Table 10.1 The performance indicators under cheating on the rational-bargaining (α = 2/3) solution

                Cooperative             Gains/losses if:
                (α = 2/3)     US cheats   EEC cheats   Both cheat
J*_US              44.7         +3.0        -23.1        -8.4
J*_EEC             29.1        -12.2         +2.5        -6.7
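The incentive structure recorded in table 10.1 can be stated compactly in code. The numbers below are the table's own gains (+) and losses (−) relative to the cooperative (α = 2/3) outcome:

```python
# Gains (+) and losses (-) relative to the cooperative (alpha = 2/3) outcome,
# taken from table 10.1: entries are (effect on US, effect on EEC).
gains = {
    "US cheats":  (+3.0, -12.2),
    "EEC cheats": (-23.1, +2.5),
    "both cheat": (-8.4, -6.7),
}

# Unilateral cheating brings the cheat only a modest gain...
assert gains["US cheats"][0] > 0 and gains["EEC cheats"][1] > 0
# ...while imposing a far larger loss on the other player...
assert -gains["US cheats"][1] > gains["US cheats"][0]
assert -gains["EEC cheats"][0] > gains["EEC cheats"][1]
# ...and simultaneous cheating leaves both worse off than cooperating:
assert all(g < 0 for g in gains["both cheat"])
```

This is the familiar prisoner's-dilemma structure: the temptation pay-offs are small relative to the losses risked.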
10.5.3 The risk of cheating
In the light of these results it seems extremely unlikely that either the US or the EEC would attempt to break any reasonable policy bargain. The potential gains from doing so are small relative to the gains from cooperation, and they are very small relative to the losses risked if the other country is provoked into retaliating in kind or into reverting to the non-cooperative solution. In fact the US risks the complete elimination of the gains available from cooperation, while the EEC risks halving its gains from cooperation. And, for either country, the losses risked are over two and a half times greater than the maximum gain achieved by either country cheating successfully. No doubt both sides would therefore regard the chance of cheating on an agreed policy programme as very low,⁴ in which case cooperative policy-making could be maintained. However, the same arguments might also persuade policy-makers that the probability of being able to reach an agreement in the first place is equally low. The incentives to cooperate are small. Each country risks considerably more than it could gain, and with α = 2/3 no cheating solution is sustainable in the sense of dominating the non-cooperative solution. Also the fact that each country can limit its potential losses by pre-emptive cheating is likely to increase the policy-makers' subjective probability of that happening. Interestingly, it is the US which is more vulnerable here. Its losses, if the EEC cheats, are substantially greater than those of the EEC if the US cheats; and, unlike the EEC, the US (if sinned against) is always worse off than under no cooperation. Risk aversion may therefore be a powerful reason for the US not to enter a cooperative agreement.
The fact that these risks are weighted against the US is an inevitable consequence of possessing the power to make greater gains from competitive policy-making, since one then depends more on the honesty of others at the subsequent cooperation stage while risking more, for a smaller pay-off, than the other players.
10.5.4 Punishment schemes

These cheating solutions all assume that the country sinned against sticks (initially at least) to its part of the policy bargain. They are therefore simplified solutions designed to show the maximum gains and losses, and hence the scale of the risks which each country accepts when entering into a policy bargain. In practice, each country could alter its strategy in anticipation of the other cheating; and the country cheated against will certainly change its strategy in the following period to make the best of the new (non-cooperative) situation. Thus the outcomes marked on figure 10.3 will not be maintained in subsequent periods; it is likely that they will actually move towards the threat-point N. For the same reason a situation in which both countries cheat, anticipating that the other will also do so, will lead to the non-cooperative outcome. The possibility of cheating may also explain why politically sovereign (and sophisticated) policy-makers are not easily persuaded of the superiority of cooperative solutions. In practice one needs to establish that the losses risked, if another country does cheat, are acceptably small and/or that the gains made by that other country can be wiped out by the possibility that all countries may cheat simultaneously. One could also try to develop punishment schemes; but, to be credible, the cheating country must be made to lose more than it gains from cheating, while the punishing country must lose less than by accepting the better of the outcomes under cheating or the non-cooperative strategies. Since administering a punishment scheme is a unilateral act, and since the other country will still have full freedom to react as it pleases (so that the punishing country cannot expect a better outcome by unilateral action than from the competitive solution), it seems extremely improbable that the punishing country could find a punishment strategy which would be superior to its optimal non-cooperative outcome.
This would no longer hold in a three- (or more) country game, because two countries might cooperate in punishing the third in order to secure better outcomes for themselves than under no cooperation, while still inflicting damage on the third. The third country is then free to react fully to the combination of the other two, but not to them separately. Economic theory has established that, for a wide range of situations, punishment schemes do exist which can sustain a given cooperative equilibrium (in this case a solution generated from an agreed set of α_i values in a global objective function). However, this seems to be a case of putting the cart before the horse, because it does not specify how that equilibrium solution should be chosen. Presumably an agreed equilibrium depends on the expected benefits inclusive of the likelihood and size of losses under punished attempts at cheating, and this means the selection of the equilibrium and of the punishment scheme must be treated as jointly dependent problems. A more
practical problem is one of monitoring: what happens if there is disagreement over whether a country cheated, especially where it is claimed that there has been a measurement or implementation mistake, or an interim response to an unexpected shock? In the final section we turn back to some of the issues raised by single-player policy-making when private agents form rational expectations.
10.6 REPUTATIONAL EQUILIBRIA AND TIME INCONSISTENCY
In chapter 8 we considered the merits of the ex-ante optimal policy - that is, the 'believed and carried through' policy calculated at time t. Ex post, the policy-maker can do better, even in the absence of any disturbances, by reneging on the ex-ante policy. One of the strategies which Strotz (1956) recommended for the consumer facing a similar problem was precommitment. The problem for the macroeconomic policy-maker is that, while he may announce a precommitment, the private sector may attach little, if any, credence to the announcement. One reason may be that the public recalls being fooled in the past by similar announcements. If a particular government has a reputation for reneging, then it is going to face difficulties in convincing people of its 'true' intentions. The idea that reputational considerations are relevant both to the problem of cheating in dynamic games and to the problem of time inconsistency in rational-expectations models has recently been taken up by Kreps and Wilson (1982), Barro and Gordon (1983), Backus and Driffill (1985a,b) and Barro (1986). Essentially, because the public expects the government to act inconsistently, this produces an outcome which is inferior to the ex-ante solution (Holly and Zarrop, 1983). Barro and Gordon (1983) examine whether reputational considerations can improve the position of policy-makers by making it easier to sustain the credibility of a policy. If policy-makers depart from the announced policy, they suffer a loss of reputation which alters the view the public takes of subsequent announcements. Barro and Gordon use a simple inflation-output problem in a natural-rate model. Output is determined by a Lucas supply function:

y_t = y_n + (π_t − π_t^e)

where y_t is output, y_n the natural rate of output, π_t the rate of inflation and π_t^e the expected rate of inflation. The policy-maker's objective is to minimise the expected present value of:

J_1 = Σ_{t=0}^∞ β^t [(a/2)π_t^2 − b(π_t − π_t^e)]   (10.6.2)
where the policy-maker dislikes inflation but likes the output benefits from inflation surprises; β = 1/(1 + r) is the discount factor. The public, on the other hand, dislikes being fooled, so it maximises:

J_2 = Σ_{t=0}^∞ β^t [−(π_t − π_t^e)^2]   (10.6.3)

In this simple model, the government sets the inflation rate by setting the rate of growth of the money supply. The Nash non-cooperative solution to this game, for period t = 0, is:
π_0 = π_0^e = b/a

where the current period is independent of periods t + i, for i = 1, 2, …. Both the government and the public act simultaneously, taking into account how the other will act. Note that there is a superior rule that the government could adopt, which is to set the inflation rate to zero. This is the Pareto-optimal solution, but there is the basic problem of how it is to be achieved. If the government acted first, so that it could commit itself to a particular rate of inflation, then π_t* = 0 is the optimal solution. On the other hand, if the public acted first, with for example π_0^e = 0, then the government would have an incentive to pursue a policy which raised y_t above y_n. The time-inconsistent solution⁵ is then π_t = b/a. Barro and Gordon then consider the time-inconsistent case and what happens if the government suffers a loss of reputation for cheating. They assume that, if the government reneges on its previously announced policy, the public expects inflation to continue for one more period whatever the government announces. If in the next period the government keeps to its announcement, then the public is assumed to believe the subsequent announcement. This 'binary' mechanism takes the form:

π_{t+1}^e = π_{t+1}^a          if π_t = π_t^e
π_{t+1}^e = π_{t+1}* = b/a     if π_t ≠ π_t^e
where π_{t+1}^a is the announced policy. If the government cheats in period t, then it knows that it will be 'punished' in period t + 1, when the public will assume that the government will pursue the same policy as in period t. The government faces the temptation to cheat in period t when the public assumes π_t^e = 0, because the pay-off is (1/2)b²/a; but it knows that this will result in a loss of reputation in period t + 1, the expected cost of which is β(1/2)b²/a. Since the discount factor is strictly less than 1, the temptation to cheat will be greater than the expected cost of cheating. Barro and Gordon seek policy rules for which the cost of reneging in the form of a loss of reputation is at least as
great as the pay-off from cheating. They find that there is a range of announced inflation rates that qualify as equilibria, in the sense that the government can announce one of them and it will be believed. These lie in the interval:

(1 − β)b/[(1 + β)a] ≤ π ≤ b/a
They attempt to fix on a particular outcome by choosing the value of π which minimises the present expected value of (10.6.2); this is the lowest rate in that interval, π = (1 − β)b/[(1 + β)a].
This is still a non-cooperative solution, but it yields a lower inflation rate than would be the outcome if there were no punishment, although a higher inflation rate than in the cooperative solution.

10.6.1 Asymmetric information and uncertainty

One of the drawbacks of Barro and Gordon's model is that the binary expectational mechanism is very restricted: the public is assumed to have a very short memory. An alternative procedure might be to have a 'forget' factor on past misdemeanours, so that the public partly recalls cheating in periods prior to that immediately before. Another possibility has been examined by Backus and Driffill (1985a). They use the sequential-equilibria notions of Kreps and Wilson (1982) to provide a number of insights into how, over a finite horizon - for example, the life of a government - a sequence of games is played in which the public forms probabilities of what the government will do, given what it has done previously. This allows us to analyse the effects of both one-sided (Backus and Driffill, 1985a) and two-sided (Backus and Driffill, 1985b) uncertainty. With one-sided uncertainty the private sector does not know for sure what kind of government it faces. Two types of government are assumed to exist. The first is 'hard-nosed' and always follows a policy of zero inflation. In terms of the kinds of model we have considered in previous chapters, this type of government is one which will always adhere to the ex-ante optimal policy. The second type of government is 'weak' and will be tempted to inflate - in terms of previous chapters, it will be tempted to pursue the ex-post, 'cheating', optimal policy. Because the private sector does not know for sure whether the government is weak or hard-nosed, there is an incentive for the weak government to pretend to be hard-nosed in order to keep the private sector guessing. The government, then, has an incentive to manipulate its reputation.
Suppose that the government's reputation is measured by α_t, the private sector's subjective probability in period t that the government is hard-nosed. Let p_t be the probability, in period t, that a weak government will
pretend to be strong, conditional on having pretended to be strong in the past; for if the weak government has cheated in the past then p_t must be zero. In the model of Barro (1986) there are two possible outcomes for inflation in period t: π_t = 0 or π_t = π̂_t (the time-inconsistent, cheating policy). The first occurs if the government is hard-nosed (probability α_t), or if the government is weak (with probability 1 − α_t) but pretending to be hard-nosed (with conditional probability p_t); the second occurs if the government is weak and not pretending (with conditional probability 1 − p_t). The weak government sets the probability, p_t, that it will not cheat, while the private sector has a perception, denoted by p̄_t, of what p_t is. Expected inflation for period t is given by:

π_t^e = π̂_t(1 − α_t)(1 − p_t)   (10.6.4)

where this is the best forecast of inflation, given α_t and p_t. If the government behaves in period t, the private sector revises upwards the probability that the policy-maker is hard-nosed. This revision follows Bayes' law:

α_{t+1} = Prob(hard-nosed | π_t, π_{t−1}, … = 0)
        = Prob(hard-nosed | π_{t−1}, … = 0) Prob(π_t = 0 | hard-nosed) / Prob(π_t = 0 | π_{t−1}, … = 0)
        = α_t / [α_t + (1 − α_t)p_t]   (10.6.5)
Thus the probability that the government is hard-nosed is equal to the probability in the previous period that the government is hard-nosed, given that it has not cheated previously, multiplied by the probability that it will not cheat given that it is hard-nosed (which is unity), divided by the probability that it will not cheat given that it has not cheated in the past - which is the probability that it is hard-nosed (α_t) plus the probability that it is weak (1 − α_t) multiplied by the subjective probability that the government will pretend to be hard-nosed, given that it is weak. As long as α_t is greater than zero (and less than one) and p_t lies between zero and one, then α_{t+1} > α_t. Therefore, if the government does not cheat, whether it is weak or hard-nosed, the probability that the government is hard-nosed will increase. Thus the weak government can establish a reputation as hard-nosed by not cheating. The hard-nosed government never cheats or reneges and always adheres to the ex-ante optimal policy. The weak government is tempted to renege, so the question is: how do the incentives to cheat interact with the evolution of a weak government's reputation for toughness to produce an actual policy sequence, and is this time-consistent? Or, in other words, under what circumstances will a weak government have no incentive to cheat? Barro (1986) considers a more general loss function than (10.6.2), which is:
J_t = J(π_t, π_t − π_t^e)   (10.6.6)

with a = b = 1. Consider the final period of a finite-horizon game. Because
there are no losses to be taken into account after the terminal period, a weak government will renege, while the hard-nosed government will still adhere to the ex-ante optimal policy. Let V_t(α_t) be the minimised expected present value of the losses from period t to period T, E[J_t + J_{t+1}/(1 + r) + … + J_T/(1 + r)^{T−t}], conditional on the government not having cheated in the past. The current reputation for being hard-nosed is measured by α_t. The expected cost V_t(α_t) can be decomposed into the expected cost for period t, E(J_t), plus the expected present discounted value of costs from period t + 1. With probability p_t, the government does not cheat, which generates a reputation for the next period of α_{t+1}; the minimised present discounted cost at period t + 1 is V_{t+1}(α_{t+1}). With probability (1 − p_t), the government cheats, in which case in all subsequent periods its reputation falls to zero, so that the cost is that associated with the ex-post optimal policy; denote this as J̄. The present value of this cost for periods t + 1 to T is (J̄/r)[1 − 1/(1 + r)^{T−t}]. We can therefore write an expression for V_t(α_t) as:

V_t(α_t) = E(J_t) + [1/(1 + r)]p_t V_{t+1}(α_{t+1}) + (1 − p_t)(J̄/r)[1 − 1/(1 + r)^{T−t}]   (10.6.7)

where α_{t+1} is given by the updating expression (10.6.5) for the probability that the government is hard-nosed. The expected loss for period t, E(J_t), is the weighted sum of the cost of not cheating (π_t = 0) and the cost of reneging:

E(J_t) = p_t J(0, −π_t^e) + (1 − p_t)J(π̂_t, π̂_t − π_t^e)   (10.6.8)
where π_t^e = π̂_t(1 − α_t)(1 − p̄_t). The government which is tempted to renege has to choose p_t, the probability that it will continue to pretend to be hard-nosed in period t, in order to minimise the expected present discounted value of costs. Note that the perception the private sector has of the probability that the government will pretend to be hard-nosed, even though it is weak, need not be the same as that which the government sets, i.e. p_t ≠ p̄_t. Given that p̄_t is independently determined by the private sector, V_{t+1}(α_{t+1}) is independent of p_t, so V_t(α_t) is a linear function of p_t. Differentiating V_t with respect to p_t, conditional on p̄_t and α_t, gives:

∂V_t(α_t)/∂p_t = J(0, −π_t^e) − J(π̂_t, π̂_t − π_t^e) + [1/(1 + r)]V_{t+1}(α_{t+1}) − (J̄/r)[1 − 1/(1 + r)^{T−t}]   (10.6.9)

Using this expression and equation (10.6.7) to substitute for V_t and V_{t+1} gives:

J(0, −π_t^e) − J(π̂_t, π̂_t − π_t^e) = [1/(1 + r)][J̄ − J(π̂_{t+1}, π̂_{t+1} − π_{t+1}^e)]   (10.6.10)
This equilibrium condition demonstrates that in any period the temptation to cheat, measured by the terms on the left-hand side, must be equal to the present discounted costs of cheating, which are measured by the terms on the right-hand side. As long as the costs of cheating are higher than the benefits of
cheating, the weak government will continue to pretend to be strong. Once it becomes worth while to cheat, the weak government no longer masquerades and is then revealed as weak. The upshot of their analysis is that the addition of reputation to rational-expectations models makes it possible to design credible, time-consistent policies, where the government will not be tempted to renege because the benefits of reneging are finely balanced against the costs of a loss of reputation.
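The reputation dynamics driving this result - the Bayes' law updating of α_t in (10.6.5) - can be simulated in a few lines. The starting values below are hypothetical:

```python
# Sketch of the reputation dynamics of equation (10.6.5): after observing
# zero inflation (no cheating), the private sector revises upwards its
# probability alpha that the government is hard-nosed. p is the probability
# that a weak government keeps pretending; both starting values are hypothetical.

def update(alpha, p):
    """Bayesian revision of alpha after a period without cheating."""
    return alpha / (alpha + (1.0 - alpha) * p)

alpha, p = 0.3, 0.8
path = [alpha]
for _ in range(5):          # five successive periods of good behaviour
    alpha = update(alpha, p)
    path.append(alpha)

# alpha rises monotonically towards 1: a weak government can build a
# reputation for toughness simply by not cheating, while a single observed
# cheat would send alpha to zero for the rest of this finite-horizon game.
```

The monotone rise of α_t is exactly the mechanism by which a weak government can masquerade as hard-nosed before deciding whether, and when, to spend its reputation.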
10.6.2 Generalisations of reputational effects

One of the drawbacks of the analysis of the previous section is that the model is very simple. Nevertheless, the simplicity of the model means that fairly strong conclusions can be drawn. It would be useful if the analysis could be extended to more general and realistic models. Consider the linear rational-expectations model employed in previous chapters (ignoring exogenous influences). Previously we considered two types of policy sequence. First, there is the ex-ante optimal policy, which is equivalent to the policy a hard-nosed government should follow. Secondly, there is the ex-post optimal policy, which is the policy the weak government wishes to pursue but can only do so if it can achieve the perfect-cheating solution. Once the weak government cheats, the question is: what policy does the private sector 'expect' will be pursued in the future? In the Barro model this is a 'discretionary' policy, implying a higher rate of inflation. In a more general framework we can assume that the private sector regards the 'discretionary' policy as that which results when attempts by the 'weak' government to pursue the ex-post policy are thwarted by the private sector. Consider, first, the situation before the 'weak' government is revealed. If the private sector is uncertain about what kind of government it faces, then in time period t the expectation it forms is:

y_{t+i|t}^e = α_t y_{t+i|t}^0 + (1 − α_t)(1 − p_t) y_{t+i|t}^c   (10.6.11)

where y_{t+i|t}^0 is the ex-ante optimal policy, and y_{t+i|t}^c is the ex-post, time-inconsistent policy. The first difficulty is that y_{t+i|t}^0 is the policy which is optimal only if α_t = 1. In the Barro model it is assumed that the hard-nosed government is only interested in zero inflation and ignores the possibility that it may not be believed. Strictly speaking, if the hard-nosed government wishes to act optimally, then it should take into account the possibility that the private sector does not believe it will stick to the ex-ante policy with probability one.
If we ignore this complication for the moment, we can envisage a situation in which the hard-nosed government announces a policy sequence y_{t+i|t}^0, for i = 1, …, T − 1, and sticks to it. Over time, as y_{t+i|t}^0 is implemented, and as y_{t+i|t}^e evolves according to the Bayesian learning procedure, eventually y_{t+i|t}^e will converge on y_{t+i|t}^0, for some value of i. So eventually the private sector will be certain that it faces a hard-nosed government, and the full benefits of the ex-ante optimal policy will be available. The weak government faces a more difficult calculation if it wishes to masquerade as a 'hard-nosed' government in order to build up a reputation, but then to spend the reputation on a (one-period) surprise switch to the ex-post policy. One way in which this can be done is to calculate a sequence of policy actions for t = τ, at each τ determining whether the gains from reneging (achievable only in one period) outweigh the losses associated with not being believed in subsequent periods. The weak government will renege in any period τ if and only if:

J_τ(y_{τ|t}^c − y_{τ|t}^e) > J_{τ+1}(ỹ_{τ+1|t} − y_{τ+1|t}^e)   (10.6.12)
where ỹ_{τ+1|t} is the (ex-ante, suboptimal) policy which the private sector assumes a weak government will pursue once it is revealed as such. J_τ(·) refers to the reduction in losses associated with single-period surprise cheating, while J_{τ+1}(·) is the increase in losses for t = τ + 1 to T as the weak government is no longer able to masquerade as a hard-nosed government. As is to be expected, when τ = T the weak government always cheats, since there are no losses in periods beyond T to be considered. (For further discussion, see Levine and Holly, 1988.)

10.6.3 Conclusions

Introducing reputation into rational-expectations models, when there is uncertainty about the kind of government which the private sector faces, does help to overcome some of the potential indeterminacies that can arise when there are ex-post incentives to renege on previous commitments. However, there are drawbacks to the kind of analysis which the Kreps and Wilson sequential-equilibrium model suggests. Rarely is the distinction between 'weak' and 'hard-nosed' governments as clear-cut as is assumed; and the composition of a government can change over time as key members are replaced or as new policy initiatives are followed. The permanent loss of reputation which attaches to a weak government that reneges is also unrealistic. There should be the opportunity for a weak government to rebuild its lost reputation, especially as the analysis of the previous section suggests that the weak government will be better off if it does behave subsequently. There is also the point that there may be circumstances in which a strong government may want to waver from the straight and narrow because of
exceptional conditions which make the ex-ante optimal policy no longer appropriate. The private sector may understand this and not penalise the government by assuming that it will act in the same way in more normal circumstances. This suggests that there are bounds within which the private sector expects the government to act consistently, while recognising that in certain circumstances inconsistent behaviour may be appropriate. Empirical studies may help to throw more light on the extent to which time-inconsistent behaviour is worth while from the point of view of the weak government, given the costs associated with a loss of reputation. It may be, in practice, that there will be strong incentives for the weak government to stick to the ex-ante policy.
Notes

1 INTRODUCTION
1 Many of these are discussed in detail in the Report of the Committee on Policy Optimisation (1978).
2 Though values for policy instruments can be inferred by substitution.
3 Or else used to distinguish adaptive control from ordinary control, as in Bar-Shalom and Tse (1976).

2 THE THEORY OF ECONOMIC POLICY AND THE LINEAR MODEL
1 We have assumed that r > s in (2.1.4) so that B_t = 0 for t > s.
3 OPTIMAL-POLICY DESIGN
1 In this context any 'open-loop' system is one in which x is not a function of y. When x is a function of y we have a closed-loop system. A more general definition than this will be employed later.
2 N_t + Q_tB would then not be of full rank.
3 For example, the world oil price is an important exogenous influence on the economic activity of national economies. Reducing predictions of its level to, say, an ARIMA process may throw away more accurate predictions which can be obtained from information about the intentions of OPEC and the state of world demand.
4 A_0 is the endogenous-variable coefficient matrix. For purposes of notational simplicity only, the analysis is done throughout for the case where all endogenous variables are potential targets. In practice this is seldom the case.

4 UNCERTAINTY AND RISK
1 E.g. Arrow (1970), Machina (1982), Whittle (1982), Hughes Hallett (1984a,b), Van der Ploeg (1984). 2 The definition of o implies that the additive uncertainties here are based on the model's composite forecasting errors reflecting specification, estimation and
information errors. Extensions to the usual choice criterion reduce the risks from each source. However, making the multiplicative uncertainties explicit is an obvious way to extend this analysis.
3 Hughes Hallett (1984b, c). These utility associations have coefficients of risk aversion increasing in the risk-aversion parameters, a in (4.6.1).
4 A possible disadvantage of this optimisation procedure is that the decisions do not follow from an explicit feedback rule. Instead dynamic decisions are obtained numerically, by reoptimising (4.4.3) or (4.6.2), conditional on each subsequent information set Ω_t, for t ≥ 2. Of course this does not permit us to show how the eigenvalues of the controlled system vary as a function of a. But in chapter 5 we show how the implicit penalties on targets increase with a, and how those penalties allow for asymmetrically distributed shocks. The latter hold in general, whereas strictly speaking the eigenvalues analysis holds only for normally distributed shocks and decision procedures which ignore intertemporal stochastic inseparabilities.

5 RISK AVERSION, PRIORITIES AND ACHIEVEMENTS
1 This point is discussed in some detail in Hughes Hallett (1979).
2 Pratt, 1964; Arrow, 1965.
3 From (5.2.2) and (3.4.1), the optimal decisions can be derived from min [J − μ′(HZ − b)], where H = (I : −R) and μ is a vector of Lagrange multipliers. The solution is Z* = Z^d − Q^{−1}H′(HQ^{−1}H′)^{−1}(HZ^d − b).
4 ‖·‖ denotes the ordinary Euclidean norm, indicating distance.
5 In addition to this heuristic argument, the utility interpretation of J_0 will be established formally in section 5.6.
6 However, there may be difficulties associated with responding to risk by loosening up on the penalties on instruments because of the possibility of instrument instability.
7 The unconstrained optimum of (5.4.1) - in this case for a = 1 (pure risk aversion).
8 Based on Pratt (1964) and Arrow (1970).
9 Subject to the restriction d_t > 0 to maintain the objective function as convex over the decision space (see Hughes Hallett, 1984a).

6 NON-LINEAR OPTIMAL CONTROL
1 In a variant which combines both (i) and (ii), Kim, Goreaux and Kendrick (1975) first optimise the non-linear model deterministically and then linearise the model about the derived path and calculate a closed-loop rule for the linearisation.
2 See chapter 4 for the case of multiplicative disturbances.
3 See Corker, Ellis and Holly (1986) and Fisher and Salmon (1986).
4 If reductions in cost are used as a criterion, the optimum may well be located.

7 THE LINEAR RATIONAL-EXPECTATIONS MODEL
1 Where expectations of policy instruments or exogenous variables appear directly in structural equations, it is assumed that identities are defined so that all expectations refer to endogenous variables. We treat 'rational expectations' as synonymous with an unbiased conditional expectation. Since (7.3.1) is essentially a second-order difference equation, ordinary expectations will not generally produce minimum-variance forecasts, etc. (see section 7.4.4).
2 If D_0 ≠ 0, neither I − D_0 nor A(I − D_0)^{−1} may have unit roots.
3 Assuming, but only for simplicity, that expectations of future exogenous events are not subsequently revised: u_{j|k} = u_{j|k+1} for j = k + 1, . . ., T.
4 It is understood that y_{0|s} = y_0 and y_{t+1|s} = y_{t+1|t} for all s (or s = k) values.
5 A similar expression, but with additional elements, is obtained in the multiple-shooting case.
6 Turnovsky, 1977, p. 333.
7 Naturally infinite coefficients in (7.8.5)-(7.8.8) are ruled out so that y → 0 is guaranteed.

8 POLICY DESIGN FOR RATIONAL-EXPECTATIONS MODELS
1 The penalty-function method was originally applied to non-linear rational-expectations models, which involves the application of the iterative techniques discussed in chapter 6 (see Holly and Zarrop, 1983).
2 Any problem of intertemporal optimisation, involving investment, factor supplies, sales, inventory management, etc., could have been considered.
3 Many civil contracts can be interpreted as means by which people enter into commitments for a future date. When the future date actually arrives, the commitment may be regretted, and there may be a temptation to renege. The contract prevents this, or at least makes it a costly choice.
4 This is a 'strategy of consistent planning' originally suggested by Strotz (1956) and discussed in section 8.6 above.

9 NON-COOPERATIVE, FULL-INFORMATION DYNAMIC GAMES
1 Further examples of the dynamic game approach to economic-policy analysis are contained in Plasmans and de Zeeuw (1978), Johansen (1982), Hughes Hallett (1984b), Miller and Salmon (1985), Currie and Levine (1985) and Holly (1986). 2 Another important example of this would be decentralised industrial planning within a centrally planned economy. Naturally one would hope for coordinated plans here, but oligarchic behaviour is quite common and one needs to know the costs of uncoordinated actions. 3 Once again we would be interested in the consequences of strategic behaviour and of different information patterns just as much as in the possibilities for coordination. 4 If there is bargaining within a single country, perhaps between the representatives of employers and employees and the government, the success of any agreement may depend critically upon how the rest of the private sector - especially in financial markets - perceives the outcome of the bargain. 5 This is also referred to as a repeated game. 6 For a further discussion of these Stackelberg strategies, see De Zeeuw (1984). 7 Alternatively, we can attribute the delayed impact of x2t to informational lags, so the second player cannot act until the leader has acted. 8 The conjectural-variations method is due to Frisch (1951), although it was first suggested by Bowley (1924). In common with other authors (e.g. Bresnahan,
1981; Ulph, 1983; Holt, 1985), we consider only first-order (linear) conjectures here. Brandsma and Hughes Hallett (1984a) extend the method for any number of non-overlapping targets and decision periods, and/or asymmetric decision-makers under uncertainty. In his technical summary, Basar (1986) argues that higher-order conjectures should be included in Frisch (1951); but as yet no solution technique exists for that case. However, the Pareto-improving property, introduced below, overcomes the shortcoming of using just first-order conjectures.
9 See Aubin (1979).
10 Bresnahan (1981) shows that a 'rational' conjectural equilibrium exists in linear-quadratic problems, although, if the players' policy responses are asymmetric, there is no proof that the corresponding reaction functions are real-valued. If that should happen, it seems reasonable to stop the iteration (9.5.3) at the real-valued solution nearest to the consistent-conjectures equilibrium such that no player is worse off than he would be with any other conjecture (including zero conjectures). This problem apart, the purpose of (9.5.3) is to ensure that the computed policy responses dX(i)/dX(j) actually equal the values conjectured for them.
11 Uniqueness is more difficult. The open-loop Nash equilibrium is unique in the linear-quadratic case (Aubin, 1979); and it seems that the conjectural-variations (Nash) equilibrium may be unique for the same reasons since there is just one information set, subsuming all elements in c(1) and c(2). However, the 'downhill'-directed search, introduced next, will in any case locate that joint equilibrium. This is not to suggest that the convergence of (9.5.3), on its own, will be unique.
12 The necessary steps, which are quite complicated and involve various adjustments to G_i, Q_i, N_i and d^(i), are summarised in Hughes Hallett (1984d).
13 Reneging on the assumption that the other player is also going to do the same may lead to an infinite regression.
14 The notation of Canzoneri and Gray is used; the only change is that the objectives have been simplified by setting μ = 0.
15 Conjectural variations are used only when they can improve on the open-loop Nash solution. Hence the expression in (9.7.3) will be zero if each player can block the other (p_1p* = p_2p*). The same happens when p_2 ≥ p_1 (so domestic action increases unfavourable reactions from others - an 'own goal') or when player 1 has no policy problem (i.e. p* = 0).

10 INCOMPLETE INFORMATION, BARGAINING AND SOCIAL OPTIMA
1 Rhodes and Luenberger (1969); Basar (1977). 2 However, the analogy cannot be pushed too far since, as we pointed out in chapter 9, the relationship between a macroeconomic policy-maker and the private sector cannot be treated as a 'game' if there are no strategic interdependencies. 3 The backdating of pay awards in many collective agreements means that only if there is a strike will employees lose wages. 4 This certainly seems to have been the conclusion of the policy-makers responsible for drafting the cooperative programme agreed at the 1978 Bonn Summit (Putnam and Henning, 1986). 5 The 'perfect-cheating' solution of Hamalainen (1979) does not exist for this particular formulation.
References

Ancot, J. P., A. J. Hughes Hallett and J. Paelinck, 1982. The determination of implicit preferences: two possible approaches compared. European Economic Review 18: 267-89 Anderson, P. A., 1979. Rational expectations forecasts from nonrational models. Journal of Monetary Economics 5: 67-80 Aoki, M., 1976. Optimal Control and System Theory in Dynamic Economic Analysis. Amsterdam: North Holland Aoki, M., 1977. Interaction among economic agents under imperfect information: an example. Presented at the CISM-UNESCO Seminar on New Trends in Dynamic System Theory and Economics, Udine (September) Aoki, M. and M. Canzoneri, 1979. Reduced forms of rational expectations models. Quarterly Journal of Economics 93 (Feb.): 59-71 Arrow, K. J., 1965. The theory of risk aversion. Lecture 2 in Aspects of the Theory of Risk-bearing, Yrjö Jahnsson Lectures, Helsinki Arrow, K. J., 1970. Essays in the Theory of Risk Bearing. Amsterdam: North Holland Athans, M., E. Kuh, L. Papademos, R. Pindyck, T. Ozan and K. Wall, 1976. Sequential open loop optimal control of a nonlinear macroeconomic model. In M. D. Intriligator (ed.), Frontiers of Quantitative Economics, vol. IIIA. Amsterdam: North Holland Aubin, J. P., 1979. Mathematical Methods of Game and Economic Theory. Amsterdam: North Holland Avriel, M., 1976. Nonlinear Programming: Analysis and Methods. Englewood Cliffs, NJ: Prentice-Hall Backus, D. and J. Driffill, 1985a. Inflation and reputation. American Economic Review 75: 530-8 Backus, D. and J. Driffill, 1985b. Rational expectations and policy credibility following a change in regime. Review of Economic Studies 52: 211-21 Barnett, S., 1975. Introduction to Mathematical Control Theory. Oxford: Clarendon Press Barro, R., 1986. Reputation in a model of monetary policy with incomplete information. Journal of Monetary Economics 17: 101-22 Barro, R. and D. Gordon, 1983. Rules, discretion and reputation in a model of monetary policy.
Journal of Monetary Economics 12: 101-21 Bar-Shalom, Y. and E. Tse, 1976. Caution, probing and the value of information in the control of uncertain systems. Annals of Economic and Social Measurement 5(2): 327-37
Basar, T., 1986. Tutorial on dynamic and differential games. In T. Basar (ed.), Dynamic Games and Applications in Economics. Berlin and New York: Springer Verlag Basar, T., 1977. Informationally nonunique equilibrium solutions in differential games. Journal of Control and Optimisation 18: 636-60 Basar, T. and G. J. Olsder, 1982. Dynamic Noncooperative Game Theory. New York: Academic Press Baumol, W. J., 1961. Pitfalls in contracyclical policies: some tools and results. Review of Economics and Statistics 43: 21-6 Begg, D. K. H., 1981. Rational expectations and macrodynamics: some recent results. Mimeo, Oxford University Bellman, R., 1957. Dynamic Programming. Princeton University Press Bellman, R., 1961. Adaptive Control Processes: a Guided Tour. Princeton University Press Bishop, R. L., 1964. Zeuthen-Hicks' theory of bargaining. Econometrica 32: 410-17 Black, J., 1975. A dynamic model of the quantity theory. In M. Parkin and A. Nobay (eds.), Current Economic Problems: Proceedings of the AUTE Annual Conference. Cambridge University Press Blanchard, O. J., 1979. Backward and forward solutions for economies with rational expectations. American Economic Review 69(2) (May): 114-18 Blanchard, O. J. and C. M. Kahn, 1980. The solution of linear difference models under rational expectations. Econometrica 48(5): 1305-11 Bowley, A. L., 1924. The Mathematical Groundwork of Economics. Oxford University Press Brainard, W., 1967. Uncertainty and the effectiveness of policy. American Economic Review 57 (May): 411-25 Brandsma, A. S. and A. J. Hughes Hallett, 1984a. Economic conflict and the solution of dynamic games. European Economic Review 13: 309-32 Brandsma, A. S. and A. J. Hughes Hallett, 1984b. Noncausalities and time inconsistency in dynamic noncooperative games: the problem revisited. Economic Letters 14(2/3): 123-30 Bresnahan, T. F., 1981. Duopoly models with consistent conjectures. American Economic Review 71: 934-45 Brogan, W. L., 1974. Modern Control Theory.
New York: Quantum Publishers Inc. Brown, B. W. and R. J. Mariano, 1981. Ex-ante measures of non-stochastic prediction bias in nonlinear simultaneous systems. Presented at the European Econometric Society Meeting, Pisa (September) Buiter, W. H., 1981a. The superiority of contingent rules over fixed rules in models with rational expectations. Economic Journal 91 (Sept.): 647-70 Buiter, W. H., 1981b. Saddlepoint problems in the rational expectations models when there may be 'too many' stable roots: a general method and some macroeconomic examples. Discussion Paper no. 105/81, Bristol University (June) Buiter, W. H. and M. Gersovitz, 1981. Issues in controllability and the theory of economic policy. Journal of Public Economics 15: 33-43 Buiter, W. H. and M. Gersovitz, 1984. Controllability and the theory of economic policy: a further note. Journal of Public Economics 24: 127-9 Calvo, G. A., 1978. On the time inconsistency of optimal policy in a monetary economy. Econometrica 46: 1411-28
Canzoneri, M. and J. A. Gray, 1985. Monetary policy games and the consequences of noncooperative behaviour. International Economic Review 26: 547-64 Canzoneri, M. B. and P. Minford, 1986. When international policy coordination matters: an empirical analysis. Discussion Paper no. 119, Centre for Economic Policy Research, London Cao, X., 1982. Preference functions and bargaining solutions. Proceedings of the 21st IEEE Conference on Decision and Control Chow, G. C., 1975. Analysis and Control of Dynamic Economic Systems. New York: John Wiley and Sons, Inc. 1976. The control of non-linear econometric systems with unknown parameters. Econometrica 44: 685-95 1980. Econometric policy evaluation and optimisation under rational expectations. Journal of Economic Dynamics and Control (March): 47-59 1981. Econometric Analysis by Control Methods. New York: John Wiley and Sons Cobb, D., 1981. Feedback and pole placement in descriptor variable systems. International Journal of Control 33(6): 1135-46 Cooper, J. P. and S. Fischer, 1972. Stochastic simulations of monetary rules in two macroeconomic models. Journal of the American Statistical Association 67: 750-60 Corden, W. N. and S. J. Turnovsky, 1983. Negative transmission of economic expansion. European Economic Review 20: 289-310 Corker, R. J., R. G. Ellis and S. Holly, 1986. Uncertainty and forecast precision. International Journal of Forecasting 2: 53-69 Craine, R., 1979. Optimal monetary policy with uncertainty. Journal of Economic Dynamics and Control Cross, J. G., 1965. A theory of the bargaining process. American Economic Review: 67-94 Currie, D. and P. L. Levine, 1985. Macroeconomic policy design in an interdependent world. In W. H. Buiter and R. C. Marston (eds.), International Economic Policy Coordination. Cambridge University Press Da Cunha, N. and E. Polak, 1967. Constrained minimisation of vector-valued criteria in finite dimensional spaces. Journal of Mathematical Analysis and Applications 19: 103-24 De Zeeuw, 1984.
Difference games and linked econometric policy models. Ph.D. dissertation, University of Tilburg Dorf, R. C., 1970. Modern Control Systems. New York: Addison-Wesley Publishing Co. Dow, J. C. R., 1964. The Management of the British Economy, 1945-1960. Cambridge University Press Driffill, E. J., 1980. Time inconsistency and 'rule vs. discretion' in macroeconomic models with rational expectations. Mimeo, Southampton University (December) Driffill, E. J., 1982. Optimal money and exchange rate policies. Mimeo, Department of Economics, Southampton University Duncan, G. T., 1977. A matrix measure of multivariate local risk aversion. Econometrica 45: 895-903 Edgeworth, F. Y., 1881. Mathematical Psychics. London: C. Kegan Paul Elgerd, D. I., 1967. Control Systems Theory. New York: McGraw Hill Book Company Fair, R. C., 1979. An analysis of a macroeconomic model with rational expectations in the bond and stock markets. American Economic Review (Feb.)
Fair, R. C. and J. B. Taylor, 1983. Solution and maximum likelihood estimation of dynamic nonlinear rational expectations models. Econometrica 51: 1169-86 Feldstein, M., 1983. The world economy today. The Economist, 11 June Fischer, S., 1979. Anticipation and the non-neutrality of money. Journal of Political Economy 87(2): 225-52 Fischer, S., 1980. Dynamic inconsistency, cooperation and the benevolent dissembling government. Journal of Economic Dynamics and Control 2: 93-107 Fisher, P. G., S. Holly and A. J. Hughes Hallett, 1986. Efficient solution techniques for dynamic rational expectations models. Journal of Economic Dynamics and Control Fisher, P. G. and M. Salmon, 1986. On evaluating the importance of nonlinearity in large macro-economic models. International Economic Review 27: 625-46 Foldes, L., 1964. A determinate model of bilateral monopoly. Economica 122: 117-31 Frankel, J. A. and K. Rockett, 1986. International macroeconomic policy coordination when policy makers disagree on the model. Mimeo, University of California, Berkeley (revised September) Friedman, J. W., 1977. Oligopoly and the Theory of Games. Amsterdam: North Holland Friedman, M., 1968. The role of monetary policy. American Economic Review 58 (March): 1-17 Frisch, R., 1951. Monopoly - polypoly - the concept of force in the economy. International Economic Papers 1: 23-36 Gantmacher, F. R., 1959. The Theory of Matrices. London: Chelsea Publishing Company Gill, P. E. and W. Murray, 1976. Quasi-Newton methods for unconstrained optimisation. Mathematics Report No. 97, National Physical Laboratory (April) Hall, S., 1985. On the solution of large economic models with consistent expectations. Bulletin of Economic Research (May): 157-61 Hamada, K., 1974. Alternative exchange rate systems and the interdependence of monetary policies. In R. Z. Aliber (ed.), National Monetary Policies and the International Financial System. University of Chicago Press Hamada, K., 1976.
A strategic analysis of monetary interdependencies. Journal of Political Economy 84: Hamada, K. and M. Sakurai, 1978. International transmission of stagflation under fixed and flexible exchange rates. Journal of Political Economy 86: 877-95 Hamalainen, R. P., 1979. On the cheating problem in Stackelberg games. Systems Theory Laboratory Report B.49, Helsinki University of Technology Hannan, E. J., 1971. The identification problem for multiple equation systems with moving average errors. Econometrica 39: 751-65 Harsanyi, J. C., 1956. Approaches to the bargaining problem before and after the theory of games: a critical discussion of Zeuthen's, Hicks' and Nash's theories. Econometrica 24: 144-57 Harvey, A. C., 1981. The Econometric Analysis of Time Series. London: Philip Allan Havenner, A. and R. Craine, 1981. Estimation analogies in control. Journal of the American Statistical Association 76: 850-9 Holbrook, R. S., 1974. A practical method for controlling a large nonlinear stochastic system. Annals of Economic and Social Measurement 3(1): 155-75 Holly, S., 1986. Games, expectations and optimal policy for open economies. Journal of Economic Dynamics and Control 10: 45-9
Holly, S. and M. Beenstock, 1980. The implications of rational expectations for the forecasting and simulation of econometric models. Centre for Economic Forecasting Discussion Paper no. 72, London Business School (March) Holly, S. and B. Corker, 1984. Optimal feedback and feedforward stabilisation of exchange rates, money, prices and output under rational expectations. In A. J. Hughes Hallett (ed.), Applied Decision Analysis. Dordrecht: Kluwer Holly, S. and M. B. Zarrop, 1983. On optimality and time consistency when expectations are rational. European Economic Review, 1, February Holly, S., B. Rustem, J. H. Westcott, M. B. Zarrop and R. Becker, 1979. Control exercises with a small linear model of the UK economy. In S. Holly, B. Rustem and M. B. Zarrop (eds.), Optimal Control for Econometric Models: an Approach to Economic Policy Formulation. London: Macmillan Holly, S., M. B. Zarrop and J. H. Westcott, 1978. Near-optimal stabilisation policies with large nonlinear economic models: an application to the LBS econometric model of the UK economy. Programme for Research on Economic Methods, Discussion Paper no. 23, Imperial College, London Holt, C. A., 1985. An experimental test of the consistent conjectures hypothesis. American Economic Review 75: 314-25 Holtham, G. H. and A. J. Hughes Hallett, 1987. International policy cooperation and model uncertainty. In R. Bryant and R. Portes (eds.), Global Macroeconomics: Policy Conflict and Cooperation. London: Macmillan Howrey, E. P., 1967. Stabilisation policy in a linear stochastic system. Review of Economics and Statistics 49: 404-11 Howrey, E. P., 1971. Stochastic properties of the Klein-Goldberger model. Econometrica 39: 73-87 Hughes Hallett, A. J., 1979. The sensitivity of optimal policies to parametric and stochastic changes. In S. Holly, B. Rustem and M. B. Zarrop (eds.), Optimal Control for Econometric Models: an Approach to Economic Policy Formulation. London: Macmillan 1981.
Measuring the importance of stochastic control in the design of economic policies. Economic Letters 8: 341-7 1984a. The stochastic interdependence of intertemporal risk sensitive decisions. International Journal of Systems Science 15: 1301-10 1984b. Optimal stockpiling in a high risk commodity market: the case of copper. Journal of Economic Dynamics and Control 8: 211-38 1984c. On alternative methods of generating risk sensitive decision rules. Economic Letters 16: 37-44 1985. Techniques which accelerate the convergence of first order iterations automatically. Linear Algebra and Applications 15: 309-18 1986a. Autonomy and the choice of policy in asymmetrically dependent economies. Oxford Economic Papers 38: 516-44 1986b. International policy design and the sustainability of policy bargains. Journal of Economic Dynamics and Control 1987. The impact of interdependence on economic policy design: the case of the US, EEC and Japan. Economic Modelling Hughes Hallett, A. J. and J. P. Ancot, 1982. Estimating revealed preferences in models of planning behaviour. In J. H. P. Paelinck (ed.), Quantitative and Qualitative Mathematical Economics. Boston and The Hague: Martinus Nijhoff Hughes Hallett, A. J. and H. J. B. Rees, 1983. Quantitative Economic Policies and Interactive Planning. Cambridge University Press
Isaacs, R., 1965. Differential Games. New York: Wiley Johansen, L., 1980. Parametric certainty equivalence procedures in decision making under uncertainty. Zeitschrift für Nationalökonomie 40: 257-80 Johansen, L., 1982. The possibility of international equilibrium at low levels of activity. Journal of International Economics 13: 257-65 Kalai, E., 1977. Proportional solutions to bargaining situations: interpersonal utility comparisons. Econometrica 45: 1623-30 Kalai, E. and M. Smorodinsky, 1975. Other solutions to Nash's bargaining problem. Econometrica 43: 513-18 Kalchbrenner, J. W. and P. A. Tinsley, 1980. On filtering auxiliary information in short-run monetary policy. In K. Brunner and A. H. Meltzer (eds.), Optimal Policies, Control Theory and Technology Exports. New York: North Holland Kalman, R. E., 1961. On the general theory of control systems. In Proceedings of the 1st International Congress on Automatic Control, Moscow, 1960, vol. 1. Butterworth Scientific Publications, p. 481 Kalman, R. E., 1963. Mathematical description of linear dynamical systems. Journal of Control and Optimisation 1: 152-92 Kalman, R. E., 1966. On structural properties of linear, constant, multivariate systems. Paper 6A, Proceedings of 3rd IFAC Congress, London Keller, H. B., 1968. Numerical Methods for Two-Point Boundary Value Problems. New York: Blaisdell Publishing Co. Kendall, M. and A. Stuart, 1977. The Advanced Theory of Statistics, vols. 1-3. London: Charles Griffin & Co. Kendrick, D., 1979. Adaptive control of macroeconomic models with measurement error. In S. Holly, B. Rustem and M. B. Zarrop (eds.), Optimal Control for Econometric Models: an Approach to Economic Policy Formulation. London: Macmillan Kendrick, D., 1981. Stochastic Control for Economic Models. New York: McGraw Hill Kim, H. K., L. M. Goreaux and D. A. Kendrick, 1975. Feedback control rules for cocoa market stabilisation. In W. C. Labys (ed.), Quantitative Models for Commodity Markets.
Cambridge, Mass.: Ballinger Kreps, D. and R. Wilson, 1982. Reputation and imperfect information. Journal of Economic Theory 27: 253-79 Kuhn, H. W. and A. W. Tucker, 1953. Contributions to the Theory of Games. Princeton: Princeton University Press Kydland, F. E. and E. C. Prescott, 1977. Rules rather than discretion: the inconsistency of optimal plans. Journal of Political Economy 85: 473-92 Levine, P. L. and S. Holly, 1988. The time inconsistency issue in macroeconomics: a survey. Economic Perspectives Lintner, J., 1965. Security price, risk and maximal gains from diversification. Journal of Finance 20 (Dec.): 587-615 Lipton, D., J. Poterba, J. Sachs and L. Summers, 1982. Multiple shooting in rational expectations models. Econometrica 50: 1329-33 Livesey, D. A., 1979. The role of feedback in economic policy. In S. Holly, B. Rustem and M. B. Zarrop (eds.), Optimal Control for Econometric Models: an Approach to Economic Policy Formulation. London: Macmillan Lucas, R. E., 1976. Econometric policy evaluation: a critique. In K. Brunner and A. H. Meltzer (eds.), The Phillips Curve and Labour Market. Amsterdam: North Holland
Luenberger, D. H., 1977. Dynamic systems in descriptor form. IEEE Transactions on Automatic Control AC-22: 312-21 Luenberger, D. H., 1978. Time invariant descriptor systems. Automatica 14: 473-80 McCarthy, M. D., 1972. Appendix. In B. G. Hickman (ed.), Economic Models of Cyclical Behaviour. New York: NBER Studies in Income and Wealth MacFarlane, A. G. J., 1972. A survey of some recent results in linear multi-variate feedback theory. Automatica 8: 455-92 Machina, M. J., 1982. Expected utility without the independence axiom. Econometrica 50: 277-323 Maciejowski, J. M. and D. Vines, 1982. The design and performance of a multivariate macroeconomic regulator. Proceedings of the IEEE Conference on Applications and Multivariate Control, Hull, UK (July) Malinvaud, E., 1972. Lectures in Microeconomic Theory. Amsterdam: North Holland Malinvaud, E., 1981. Econometrics faced with needs of macroeconomic policy. Econometrica 49: 1363-75 Meade, J. E., 1982. Wage-fixing. George Allen and Unwin Miller, M. and M. Salmon, 1985. Policy coordination and the time inconsistency of optimal policy in open economies. Economic Journal Supplement 124-35 Minford, P., K. Matthews and S. Marwaha, 1980. Terminal conditions, uniqueness and the solution of rational expectations models. Mimeo, University of Liverpool Mossin, J., 1966. Equilibrium in a capital asset market. Econometrica 34 (Oct.): 4 Mundell, R. A., 1963. Capital mobility and stabilisation policy under fixed and flexible exchange rates. Canadian Journal of Economics 29: 475-85 Muth, J., 1961. Rational expectations and the theory of price movements. Econometrica 29 (July) Nash, J. F., 1950. The bargaining problem. Econometrica 18: 155-62 Nash, J. F., 1953. Two person cooperative games. Econometrica 21: 128-40 Neese, J. W. and R. S. Pindyck, 1984. Behavioural assumptions in decentralised stabilisation policies. In A. J. Hughes Hallett (ed.), Applied Decision Analysis and Economic Behaviour.
Boston and Dordrecht: Kluwer-Nijhoff Newbery, D. M. G. and J. E. Stiglitz, 1981. The Theory of Commodity Price Stabilisation. Oxford University Press Norman, A. L., 1974. On the relationship between linear feedback control and the first period certainty equivalence. International Economic Review 15 (Feb.): 209-15 Ortega, J. M. and W. C. Rheinboldt, 1970. Iterative Solution of Nonlinear Equations in Several Variables. New York: Academic Press Oudiz, G. and J. Sachs, 1984. Policy coordination in industrial countries. Brookings Papers on Economic Activity 1: 1-64 Oudiz, G. and J. Sachs, 1985. International policy coordination in dynamic macroeconomic models. In W. H. Buiter and R. C. Marston (eds.), International Economic Policy Coordination. Cambridge University Press Pandolfi, L., 1981. On the regulator problem for linear degenerate control systems. Journal of Optimization Theory and Applications 33(2) (Feb.): 241-54 Peleg, B. and M. E. Yaari, 1975. A price characterisation of efficient random variables. Econometrica 43: 283-92 Persson, M., 1979. Certainty equivalence and the rationale for rational expectations. Mimeo, Stockholm School of Economics
Phillips, A. W., 1954. Stabilisation policy in a closed economy. Economic Journal 64 (June): 290-323 Phillips, A. W., 1957. Stabilisation policy in the time-form of lagged responses. Economic Journal 67: 265-77
Pierce, J. L., 1974. Quantitative analysis of decisions at the Federal Reserve. Annals of Economic and Social Measurement 3: 11-19 Pigou, A. C., 1936. The Economics of Welfare. London: Macmillan Pindyck, R. S., 1973. Optimal Planning for Economic Stabilisation. Amsterdam: North Holland Pindyck, R. S., 1977. Optimal economic stabilisation under decentralised control and conflicting objectives. IEEE Transactions on Automatic Control AC-22: 517-30 Plasmans, J., 1978. Linked econometric models as a differential game; Nash-optimality-I. Department of Econometrics, Tilburg University. Paper presented at the European Meeting of the Econometric Society, Geneva (September) Plasmans, J. E. J. and A. J. de Zeeuw, 1978. Pareto optimality and incentives to cooperate in linear quadratic difference games. Mimeo, Department of Economics, Tilburg University (September) Poole, W., 1970. Optimal choice of monetary policy instruments in a simple stochastic macro model. Quarterly Journal of Economics 84: 197-216 Pratt, J. W., 1964. Risk aversion in the small and in the large. Econometrica 32 (Jan.-April): 122-36 Preston, A. J., 1974. A dynamic generalisation of Tinbergen's theory of policy. Review of Economic Studies 41 (Jan.): 65-74 Preston, A. J. and A. R. Pagan, 1982. The Theory of Economic Policy. Cambridge University Press Preston, A. J. and E. Sieper, 1977. Policy objectives and instrument requirements for a dynamic theory of policy. In J. D. Pitchford and S. J. Turnovsky (eds.), Applications of Control Theory to Economic Analysis. Amsterdam: North Holland Preston, R. S., L. R. Klein, Y. C. O'Brien and B. W. Brown, 1976. Control theory simulations using the Wharton Long Term Annual and Industry Forecasting Model. Mimeo, University of Pennsylvania Preston, A. J. and K. D. Wall, 1973. Some aspects of the use of state space models in econometrics. Discussion Paper no. 5, Australian National University (November) Putnam, R. D. and C. R. Henning, 1986.
The Bonn Summit of 1978: how does international economic policy coordination actually work? Brookings Discussion Papers in International Economics 53 (Oct.) Raiffa, H., 1953. Arbitration schemes for generalised two person games. In H. W. Kuhn and A.W. Tucker (eds.), Contributions to the Theory of Games II. Princeton, N.J.: Princeton University Press Report of the Committee on Policy Optimisation 1978. CMND 7148 (March). HMSO Rhodes, I. B. and D. G. Luenberger, 1969. Differential games with imperfect state information. IEEE Transactions on Automatic Control AC-14(1) (Feb.) Rosenbrock, I., 1970. State Space and Multivariate Theory. London: Nelson Rustem, B., 1981. Projection Methods in Constrained Optimisation and Applications to Optimal Policy Decisions. Berlin: Springer-Verlag Rustem, B. and Velupillai, K., 1979. A new approach to the bargaining problem. In M. Aoki, A. Marzoll and W. Vogel (eds.), New Trends in Dynamic System Theory and Economics. London and New York: Academic Press
Rustem, B. and M. B. Zarrop, 1979. A Newton-type method for the optimisation and control of nonlinear econometric models. Journal of Economic Dynamics and Control 1(3)
Rustem, B., J. H. Westcott, M. B. Zarrop, S. Holly and R. Becker, 1979. Iterative respecification of the quadratic objective function. In S. Holly, B. Rustem and M. B. Zarrop (eds.), Optimal Control for Econometric Models: an Approach to Economic Policy Formulation. London: Macmillan
Salmon, M. and K. F. Wallis, 1982. Model validation and forecast comparisons: theoretical and practical considerations. In G. C. Chow and P. Corsi (eds.), Evaluating the Reliability of Macroeconomic Models. University of Warwick
Salmon, M. and P. Young, 1979. Control methods and quantitative economic policy. In S. Holly, B. Rustem and M. B. Zarrop (eds.), Optimal Control for Econometric Models: an Approach to Economic Policy Formulation. London: Macmillan
Sargent, T. J., 1979. Macroeconomic Theory. New York: Academic Press
Sargent, T. J. and N. Wallace, 1975. Rational expectations, the optimal monetary instrument and the optimal money supply rule. Journal of Political Economy 83: 241-54
Selten, R., 1973. Re-examination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory 4: 25-65
Shackle, G. L. S., 1964. The nature of the bargaining process. In J. T. Dunlop (ed.), The Theory of Wage Determination. London: Macmillan
Sharpe, W. F., 1964. Capital asset prices: a theory of market equilibrium under conditions of risk. Journal of Finance 19: 425-42
Shiller, R. J., 1977. Rational expectations and the dynamic structure of macroeconomic models. Econometrica 45 (Sept.)
Starr, A. W. and Y. C. Ho, 1969. Nonzero-sum differential games. Journal of Optimization Theory and Applications 3(4)
Strotz, R. H., 1956. Myopia and inconsistency in dynamic utility maximisation. Review of Economic Studies 23: 165-80
Taylor, J. B., 1979. An econometric business cycle model with rational expectations: some estimation results (June). Mimeo, Princeton University
Taylor, J. B., 1985. International coordination in the design of macroeconomic policy rules. European Economic Review 28: 53-82
Theil, H., 1964. Optimal Decision Rules for Government and Industry. Amsterdam: North Holland
Theil, H. and J. C. G. Boot, 1962. The final form of econometric equations. In A. Zellner (ed.), Readings in Economic Statistics and Econometrics. Boston, Mass.: MIT Press
Theobald, C. M., 1974. Generalisations of mean square error applied to ridge regression. Journal of the Royal Statistical Society, Series B, 36: 103-6
Thomson, W., 1981. Nash's bargaining solution and utilitarian choice rules. Econometrica 49
Tinbergen, J., 1952. On the Theory of Economic Policy. Amsterdam: North Holland
Truxal, J. G., 1972. Control Systems Synthesis. New York: McGraw-Hill
Turnovsky, S. J., 1977. Macroeconomic Analysis and Stabilization Policy. Cambridge University Press
Tustin, A., 1953. The Mechanism of Economic Systems: an Approach to the Problem of Economic Stabilisation from the Point of View of Control-system Engineering. Cambridge, Mass.: Harvard University Press
Ulph, D., 1983. Rational conjectures in the theory of oligopoly. International Journal of Industrial Organisation 1: 131-54
Van der Ploeg, F., 1982. Government policy, real wage resistance and the resolution of conflict. European Economic Review 19: 181-212
Van der Ploeg, F., 1984. Economic policy rules for risk-sensitive decision making. Zeitschrift für Nationalökonomie 44: 207-35
Vines, D., J. M. Maciejowski and J. E. Meade, 1983. Stagflation Vol. 2: Demand Management. London: George Allen and Unwin
von Neumann, J. and O. Morgenstern, 1944. Theory of Games and Economic Behaviour. Princeton: Princeton University Press
Wall, K. D., 1977. Rational expectations and the control of the economy. IEEE Conference on Decision and Control, New Orleans, December, pp. 1020-3
Waud, R. M., 1976. Asymmetric policymaker utility functions and optimal policy under uncertainty. Econometrica 44: 53-66
Whittle, P., 1981. Risk-sensitive linear-quadratic-Gaussian control. Advances in Applied Probability 13: 764-77
Whittle, P., 1982. Optimisation over Time. New York: Wiley
Wohltmann, H. W., 1984. A note on Aoki's conditions for path controllability of continuous time dynamic economic systems. Review of Economic Studies 51: 343-9
Wohltmann, H. W. and W. Kromer, 1984. Sufficient conditions for dynamic path controllability of economic systems. Journal of Economic Dynamics and Control 7: 315-30
Wonham, W. M., 1974. Linear Multivariable Control: a Geometric Approach. Lecture Notes in Economics and Mathematical Systems. Berlin: Springer-Verlag
Zangwill, W. I., 1967. Nonlinear programming via penalty functions. Management Science 13: 344-58
Zarrop, M. B., 1981. Optimal Experimental Design for Dynamic System Identification, no. 21. Berlin: Springer-Verlag
Zarrop, M. B., S. Holly, B. Rustem, J. H. Westcott and M. O'Connell, 1979a. Control of the LBS econometric model via a control model. In S. Holly, B. Rustem and M. B. Zarrop (eds.), Optimal Control for Econometric Models: an Approach to Economic Policy Formulation. London: Macmillan
Zarrop, M. B., S. Holly, B. Rustem and J. H. Westcott, 1979b. The design of economic stabilisation policies with large nonlinear econometric models: two possible approaches. In P. Ormerod (ed.), Economic Modelling. London: Heinemann
Zeuthen, F., 1930. Problems of Monopoly and Economic Warfare. London: Routledge and Sons
Index
adaptive control, 63, 197
additive separability, 10
Ancot, J. P., 90, 92
Anderson, P. A., 133, 136
antithetic variates, 107
Aoki, M., 14, 26, 197
approximate priorities, 88-90
Arrow, K. J., 77
assignment problem, 4
Avriel, M., 125
Backus, D., 219, 221
bargaining, 198, 209-16
Barnett, S., 39
Barro, R., 219, 222
Bar-Shalom, Y., 197
Basar, T., 170
Baumol, W., 2, 37
Beenstock, M., 129
Begg, D. K. H., 119
Bellman, R., 40, 80
Bishop, R. L., 211
Black, J., 28, 37
Blanchard, O., 122, 129
Boot, J. C. G., 36
Brainard, W., 7, 69, 94
Brandsma, A. S., 170, 187
Brogan, W. L., 139
Brown, B. W., 107, 140
Buiter, W., 27, 120, 129, 142
Calvo, G. A., 152
Canzoneri, M., 26, 170, 206, 207
capital asset pricing model, 89
certainty equivalence, 7, 50, 96, 99, 107
  first order, 66
  and non-linearities, 106-8
cheating, and time inconsistency, 198-200, 220
cheating solutions, 216-17
Chow, G., 14, 17, 67, 130
classical control methods, 2, 31-2
closed-loop rule, 6, 58-60, 107
Cobb, D., 121
conjectural variations, 181-7
consistent planning, 159-61
controllability, 4, 21-8
  static controllability, 21-2
  dynamic controllability, 23-5, 34, 35
  stochastic controllability, 25, 36
  path controllability, 25-8, 34
  in rational expectations models, 117, 141-7
Cooper, J. P., 109, 207
cooperative solutions, 198
Corden, M., 209
Cournot games, 169, 173-5
Craine, R., 70, 100
credibility, 219
Cross, J. G., 211
Currie, D., 206
descriptor forms, 121
desired values, 40
Dorf, R. C., 32
Dow, C., 4
Driffill, J., 139, 162, 219, 221
Duncan, G. T., 95
dynamic games, 4, 169-95
dynamic inconsistency, see time inconsistency
dynamic programming, 39-45
  under multiplicative uncertainty, 67-73
  solution for rational expectations models, 148-50
Edgeworth, F. Y., 210
Elgard, D. I., 39
engineering control systems, 2
ex-ante, open-loop policy, 10
ex-ante optimal decisions, 153
ex-post optimal decisions, 155
exogenous variables, 7
expected utility, 75, 89
Fair, R., 130, 133, 136
feedback, 1
  rule, 58-60, 110
  matrix, 41, 43
  optimal rule, 41-3
  revisions to, 55-6
feedforward, 7, 62, 70
final form, 19-20
Fischer, S., 109, 142, 200
Fisher, P., 133, 136, 137
Foldes, L., 211
Friedman, J. W., 170
Friedman, M., 69, 141, 162
game theory, 3
Gauss-Newton algorithms, 113-14
  variable metric iterations, 113
Gersovitz, M., 27
Gill, P. E., 113
Gordon, D., 219
Gray, J. A., 170, 206
Hall, S., 136
Hamada, K., 170, 206
Hamalainen, R. P., 200
Hannan, E. J., 109
Harsanyi, J. C., 210
Harvey, A., 84
Havenner, A., 100
Ho, Y. C., 169
Holbrook, R. S., 116
Holly, S., 108, 109, 124, 129, 133, 136, 137, 162, 187, 192, 219, 225
Holtham, G., 209
Howrey, E. P., 37
Hughes Hallett, A. J., 27, 77, 90, 92, 94, 97, 114, 133, 136, 137, 170, 187, 207, 208, 209
Isaacs, R., 169
Johansen, L., 97, 206
Kahn, C. M., 122
Kalchbrenner, J. W., 84
Kalman filter, 9, 16, 82-6, 197
Kalman, R., 14
Keller, H. B., 138
Kendrick, D., 67, 197
Kreps, D., 219, 221
Kromer, W., 27
Kuhn, H. W., 169
Kydland, F., 10, 152, 162
Levine, P., 206, 225
linear model, 12-13
  rational expectations, 117-47
Lintner, J., 89
Lipton, D., 136, 138
Livesey, D., 32
loss functions, see objective functions
Lucas, R., 75, 119
Luenberger, D. G., 121
McCarthy, M. D., 107
MacFarlane, A. G. J., 32
Machina, M. J., 77, 93, 95, 99
Maciejowski, J. M., 32
macroeconomic models, 1
Malinvaud, E., 97, 209
Mariano, R. J., 107, 140
Meade, J., 1, 32
mean-variance analysis, 62
  optimal decisions, 77-80
Miller, M., 206
Minford, P., 129, 133, 207
minimum principle, 45-8
modern control theory, 2
Morgenstern, O., 169
Mossin, J., 89
multicollinearity, 109
multivariate feedback control, 33-4
multivariate risk premium, 96
Mundell, R., 209
Murray, W., 113
Muth, J., 118, 128
Nash, J. F., 2, 13
Nash solutions, 171-3
  feedback solution, 176-9
Newberry, D., 65, 89
Newton algorithm, 112
  damped, 113
Newton-type descent direction, 113
non-causal models, 148
non-cooperative solution, 11
non-cooperative dynamic games, 169-95
non-linearity, 3
non-linear optimal control, 105-16
  linear approximations, 105, 108-11
  linearisation in stacked form, 110-11
  non-linear programming, 111-15
  repeated linearisations, 115-16
non-recursive methods, 150-2
numerical solution of rational expectations models, 131-9
  Jacobian methods, 132-7
  shooting methods, 138-9
Nyquist criterion, 32
objective functions, 3
  quadratic, 40, 88
  in dynamic games, 176
Olsder, G. J., 170
open-loop policy, 6
open-loop rules, 58
optimal policy revisions, 52-5, 165-7
Ortega, J. M., 113
Oudiz, G., 170, 186, 206
Pagan, A., 21, 26, 121, 139
Pandolfi, L., 121
Pareto-optimal, 11, 198, 204-6
Pareto-efficient, 203
penalty function approach, 151
  for dynamic games, 194-5
perfect cheating, 200
perfect equilibrium, 171
Persson, M., 142
Phillips, A. W., 1
Pierce, J. L., 170
Pigou, A. C., 202
Pindyck, R., 14, 45, 169, 170, 176
Plasmans, J., 176
pole assignment, 33, 36
policy coordination, 202-9, 206-9
Poole, W., 75
Pratt, J. W., 77
precommitment, 159-61, 219
preference function respecification, 90-2
Preston, A. J., 14, 16, 26, 121, 139
Preston, R. S., 114
Prescott, E., 10, 21, 152, 162
principle of optimality, 4, 40-1
punishment schemes, 218-19
quadratic functions, see objective functions
rank-one update, 90
rational-expectations models, 2, 3, 117
  general linear model, 120-1
  analytic solutions, 121-7
  stochastic properties, 127-8
  final form, 123-4
  penalty function solution, 124-6
  multiperiod solution, 126-7
  terminal conditions, 128-31
  numerical solutions, 131-6
  dynamic programming solution, 148-50
rational observers, 192-5
Rees, H. J. B., 27, 90, 94
reneging, 11
reputation, 12, 198, 202, 220
reputational equilibria, 219-24
Rheinboldt, W. C., 113
ridge factors, 101
risk, 3
  non-diversifiable, 89-90
  neutral, 92, 97
  management, 99-101
  function, 100
risk aversion, 8, 75-82, 87-104
  local relative, 88
risk-sensitive policies, 61, 79-82, 97
robustness, 61
Rosenbrock, H. H., 33, 121
Rustem, B., 84, 90, 113, 170, 211
Sachs, J., 170, 186, 206
Sakuri, M., 170
Salmon, M., 32, 107, 140, 206
Sargent, T., 9, 117, 121, 129, 141, 144
Selten, R., 169
sequential open-loop optimisation, 48-9, 164-5
  and closed-loop policies, 163-7
sequential updating, 6
Sharpe, W. F., 89
Shiller, R. J., 128, 131, 142
shooting methods, 138-9
Sieper, E., 21
stabilisability, 34-7
Stackelberg games, 169, 173, 174
  feedback solutions, 179-80
  full information, 189-92
  and cheating, 198-200
Starr, A. W., 169
state space, 13-19
  minimal realisations, 14
  non-minimal realisations, 14-15
  structural realisations, 16-17
Stiglitz, J., 65, 89
stochastic simulation, 107
strategic interdependences, 4, 11, 170
Strotz, R. H., 156, 219
sub-game perfect equilibrium, 171, 195
superposition, 109
sustainable cooperation, 215-16
Taylor, J., 119, 130, 170, 206
terminal conditions, 9, 128-31, 149
Theil, H., 1, 4, 19, 36, 130
time inconsistency, 4, 10, 152-5, 198, 220
  Stackelberg games and cheating, 200
Tinbergen, J., 1, 4, 22, 142
Tinsley, P., 84
tracking gain, 41, 43
transfer functions, 31-2, 109
  open loop, 32
  closed loop, 32
transversality conditions, see terminal conditions
Truxal, J. G., 32
Tucker, A. W., 169
Turnovsky, S., 34, 209
Tustin, A., 1
uncertainty, 3
  additive, 50, 64-5
  multiplicative, 3, 50, 65-7
utility functions, 3
  von Neumann-Morgenstern, 92-3, 95, 99
  exponential, 97-8
Van der Ploeg, F., 97, 170, 211
variable metric algorithm, 90
Velupillai, K., 84, 211
Vines, D., 32
von Neumann, J., 169
Wall, K., 14, 16, 121
Wohltmann, H. W., 27
Wallace, N., 9, 117, 141, 144
Waud, R. M., 94, 97
Westcott, J., 109
white noise, 109
Whittle, P., 77, 81, 97
Wilson, R., 219, 221
Yaari, M. E., 97
Young, P., 32
Zangwill, W. I., 126
Zarrop, M., 16, 63, 109, 113, 124, 162, 170, 219
Zeuthen, F., 210