OPTIMAL CONTROL and FORECASTING of COMPLEX DYNAMICAL SYSTEMS
Ilya Grigorenko World Scientific
Ilya Grigorenko University of Southern California
USA
World Scientific NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
OPTIMAL CONTROL AND FORECASTING OF COMPLEX DYNAMICAL SYSTEMS Copyright © 2006 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-256-660-0
Printed in Singapore by World Scientific Printers (S) Pte Ltd
To my beautiful wife Elena
Preface
Chance, however, is the governess of life.
Palladas, 5th Century A.D., Anthologia Palatina 10.65

This book has appeared by choice but also by some chance. In the summer of 2003 I gave a talk at the Max Planck Institute for the Physics of Complex Systems in Dresden, Germany, where I was kindly invited by Prof. Dr. Jan-Michael Rost. It happened that among the people who attended my talk was a representative of World Scientific. One month later I received an invitation to write this book.

The purpose of this text is to summarize and share with the reader the author's experience in the field of complex systems. The title of this book was constructed to cover the different topics of the author's recent research as fully as possible in a few words. The main goal of this book is to show the variety of problems in modern physics which can be formulated in terms of optimization and optimal control. This idea is not new. Since the 18th century it has been known that almost any physical (or mechanical) problem can be formulated as an extremum problem. Such an approach is called the Lagrangian formalism, after the great French mathematician Lagrange. The text is written in such a way that all the chapters are logically coupled to each other, so the reader should be ready to be referred to different parts of the book.

In this book the author has tried to adopt a naive division of the complexity hierarchy. The simplest case is control and forecasting of systems which one can describe with the help of linear differential equations, where the control fields enter these equations as additive inhomogeneous terms. The situation becomes more complicated when the control appears not additively but multiplicatively. This leads to a nonlinear problem for the search of control fields. A typical example is control of a quantum system, where a control field enters the Schrodinger equation as a product with the system's wavefunction. The next level of complexity appears when the nonlinearity of the controlled system itself is taken into account. Such problems are still tractable. As an example one can consider control of a Bose-Einstein condensate (BEC), whose dynamics is described by the Gross-Pitaevsky equation. Note that it is still assumed that we know the explicit form of the mathematical equations governing the system's dynamics; however, the dynamics of the controlled system can be very complicated (chaotic). Additional complexity arises if the dynamics of the system becomes non-deterministic, with the addition of some stochastic component. And the most difficult situation occurs when we need to control a black-box system (such as a biological, financial or social system), for which ab initio evolution equations are unknown.

Chapter 1 provides an introduction to the long mathematical history of the calculus of variations, starting from Fermat's variational principle, the famous Bernoulli brachistochrone problem and the beginning of the calculus of variations. Despite the limited applicability of analytical methods, the calculus of variations remains a vital instrument for solving various variational problems. The author could go deeper into ancient times and start his story with princess Dido's problem, but he has the feeling that the brachistochrone problem belongs to scientific history, while Dido's problem is just a beautiful ancient legend based on Virgil's Aeneid, without clear proof of Dido's priority.

In chapter 2 we discuss different aspects of numerical optimization, including the effectiveness of optimization algorithms and multiobjective optimization. We make a brief review of some popular numerical methods which can be useful for the solution of various problems in optimization, control and forecasting. We give a broader review of the so-called "Quantum Genetic Algorithm", which operates on smooth differentiable functions with a limited absolute value of their gradient. As an example, we demonstrate its ability to solve few-body quantum statistical problems in 1D and 2D as a problem of minimization of the ground state energy, or maximization of the partition function. We also discuss different scenarios of the formation and melting of a "Wigner molecule" in a quantum dot.
Chapter 3 outlines some elements of chaos theory and a deep connection between nonlinearity and complexity in different systems. In this chapter we give a generalization of the Lorenz system using fractional derivatives. We show how the "effective dimension" of the system controls its dynamical behavior, including a transition from chaos to regular motion.

In chapter 4 we discuss a problem of optimal control in application to nanoscale quantum systems. We introduce a novel approach which permits us to obtain new analytical solutions of different optimal control problems. We also solve a problem of optimal control of the induced photo-current between two quantum dots using a genetic algorithm. We analyze how decoherence processes, which result in non-unitary evolution of a quantum system, change the optimal control fields. This question is very significant for the future design of nanoscale devices, since decoherence, in general, significantly limits optimal control.

In chapter 5 we continue to consider control of quantum systems, with particular application to quantum computing. We show that an optimal design of artificial quantum bits can decrease by an order of magnitude the number of errors due to quantum decoherence processes, and leads to a faster performance of basic quantum logical operations.

In chapter 6 we briefly discuss different aspects of forecasting and its connection with optimization and chaos theory, which we discuss in the previous chapters.

I would like to conclude this introduction with an acknowledgment of my teachers and colleagues, who helped and guided my research over the last years. Most of the results presented or mentioned in this book were obtained in close collaboration with these nice people. I would like to thank my scientific supervisors and colleagues: Prof. Dr. B. G. Matisov and Dr. I. E. Mazets, Prof. Dr. K. H. Bennemann, Prof. Dr. M. E. Garcia, Prof. Dr. D. V. Khveshchenko, Prof. Dr. S. Haas and Prof. Dr. A. F. J. Levi.

Ilya A. Grigorenko
Contents

Preface

1. Analytical methods in control and optimization
   1.1 Calculus of variations
       1.1.1 The beginning: Fermat's variational principle
       1.1.2 The "beautiful" Brachistochrone Problem
       1.1.3 Euler-Lagrange equation
       1.1.4 A word about distance between two functions
       1.1.5 The Brachistochrone problem revisited
       1.1.6 Generalizations of the Euler-Lagrange equation
       1.1.7 Transversality conditions
       1.1.8 Conditional extremum: Lagrange multipliers method
       1.1.9 Mixed Optimal problem
       1.1.10 Approximate methods of solution: Ritz's method
   1.2 Optimal control theory
       1.2.1 Sensitivity analysis
       1.2.2 Null controllability
       1.2.3 Problems with constrained control
   1.3 Summary

2. Numerical optimization
   2.1 The halting problem and No Free Lunch Theorem
   2.2 Global Optimization: searching for the deepest hole on a golf field in the darkness using a cheap laser pointer
       2.2.1 Sensitivity to numerical errors
   2.3 Multiobjective optimization
       2.3.1 Pareto front
       2.3.2 The weighted-sum method
   2.4 Simplex method
   2.5 Simulated annealing: "crystallizing" solutions
   2.6 Introduction to genetic algorithms
   2.7 GA for a class of smooth (differentiable) functions
   2.8 Application of the GA to the eigenproblem
       2.8.1 The ground state problem in one and two dimensions
       2.8.2 Extension of the QGA to quantum statistical problems
       2.8.3 Formation of a "Wigner molecule" and its "melting"
   2.9 Evolutionary gradient search and Lamarckianism
   2.10 Summary

3. Chaos in complex systems
   3.1 Lorenz attractor
   3.2 Control of chaotic dynamics of the fractional Lorenz system
   3.3 Summary

4. Optimal control of quantum systems
   4.1 Density matrix formalism
   4.2 Liouville equation for the reduced density matrix
   4.3 Modern variational approach to optimal control of quantum systems
       4.3.1 An alternative analytical theory
   4.4 An approximate analytical solution for the case of a two level system
   4.5 Optimal control of a time averaged occupation of the excited level in a two-level system
       4.5.1 Analytical solution for optimal control field
       4.5.2 Optimal control at a given time
       4.5.3 Estimation of the absolute bound for the control due to decoherence
   4.6 Optimal control of nanostructures: double quantum dot
       4.6.1 The optimal field for the control of the photon assisted tunnelling between quantum dots
   4.7 Analytical theory for control of multi-photon transitions
   4.8 Summary

5. Optimal control and quantum computing
   5.1 Robust two-qubit quantum registers
   5.2 Optimal design of universal two-qubit gates
   5.3 Entanglement of a pair of qubits
   5.4 Summary

6. Forecasting of complex dynamical systems
   6.1 Forecasting of financial markets
   6.2 Autoregressive models
   6.3 Chaos theory and embedding dimensions
   6.4 Modelling of economic "agents" and the El Farol bar problem
   6.5 Forecasting of the solar activity
   6.6 Noise reduction and Wavelets
   6.7 Finance and random matrix theory
   6.8 Neural Networks
   6.9 Summary

Bibliography

Index
Chapter 1
Analytical methods in control and optimization
1.1 Calculus of variations
Physics, Chemistry, Engineering and Finance often pose problems which have in common that one has to choose the best (in a certain sense) solution among a huge set of possible ones. Such problems involve optimization, optimal control, optimal design, optimal decisions, long-term and short-term forecasting, etc. These optimization problems, in particular those which involve nonlinearity and multi-dimensionality, can rarely be solved analytically, and some numerical methods must be used. However, one usually cannot learn much from a particular numerical solution of a complex optimization problem: it can be too complicated for analysis. Fortunately, one can get some insight from studying relatively simple problems that can be solved analytically. That is why this book begins with an introduction to the analytical techniques which were developed by many outstanding mathematicians during the last 300 years. One of these techniques, the calculus of variations, provides us a method, based on the Euler-Lagrange theory, to obtain analytical solutions of optimization or optimal control problems. This method can be considered as a generalization of the condition for a local extremum of a function of a real variable, f'(x) = 0, to problems of functional analysis. A functional is a correspondence which assigns a definite real number to each function belonging to some class. Thus, one might say that a functional is a kind of function where the independent variable is itself a function. In this chapter we are going to discuss methods of analytical solution of optimal control problems based on the variational approach. Although useful only in some simple cases, it is very well suited to demonstrate the complexity and richness of optimal control theory.
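The notion of a functional, a rule assigning a single number to each input function, can be made concrete with a short numerical sketch. The arc-length functional below is a standard illustrative example (not one of this chapter's derivations); the grid size is an arbitrary choice:

```python
import math

def arc_length(y, a=0.0, b=1.0, n=1000):
    """Functional J[y]: assigns to each function y(x) the length of its
    graph over [a, b], approximated by summing straight segments."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [y(x) for x in xs]
    return sum(math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) for i in range(n))

# The same functional assigns different numbers to different functions:
straight = arc_length(lambda x: x)       # graph of y = x has length sqrt(2)
curved = arc_length(lambda x: x * x)     # the parabola is strictly longer
print(straight, curved)
```

Here the "independent variable" handed to `arc_length` is itself a function, which is exactly the sense in which a functional generalizes an ordinary function.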
Optimal control and forecasting of complex dynamical
systems
In the following section we give a short historical survey of the most significant discoveries in the theory of the variational calculus, which gave rise to many branches of functional analysis and optimal control theory.
1.1.1 The beginning: Fermat's variational principle
The name of the great French mathematician Pierre Fermat (1601-1665) is connected to many famous and even intriguing pages of mathematics. Perhaps everyone has heard about the great "Fermat's Last Theorem" that challenged mathematicians for more than 300 years. Many mathematicians questioned Fermat's claim that he knew a proof of this theorem. The theorem was finally proved by Andrew Wiles, a professor of mathematics at Princeton, in 1993. It is less known that Fermat made the first formulation of a variational principle for a physical problem. It is known as Fermat's variational principle in geometrical optics. In a letter to a colleague, Marin Cureau de la Chambre, dated 1 January 1662, Fermat attached two papers: "The analysis of refractions" and "The synthesis of refractions", where he considered the problem of light propagation in an optical medium. Fermat's basic idea was that a ray of light goes along a trajectory (among all possible trajectories joining the same points) such that the time of travel is minimal. In a homogeneous medium (for example, in air), where the speed of light is constant at all points and in all directions, the time of travel along a trajectory is proportional to its length. Therefore, the minimum-time trajectory connecting two points A and B is simply the straight line. By means of his principle Fermat was able to give a completely clear derivation of Snell's law of light refraction, which was originally established experimentally by the Dutch mathematician Willebrord van Roijen Snell (1580-1626). We would also like to mention here that Snell's law had probably been discovered first by the 10th-century Islamic scholar Abu Said al-Ala Ibn Sahl in his work "On the Burning Instruments", and then rediscovered independently by several European scientists, including another great French mathematician and philosopher Rene Descartes (1596-1650).
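Fermat's idea can be checked directly: choose the crossing point on the interface that minimizes the total travel time, and the resulting angles obey Snell's law. A minimal numerical sketch (the speeds, endpoints, and grid resolution are arbitrary illustrative choices):

```python
import math

# Light travels from A (above the interface y = 0) to B (below it), with
# speed v1 above and v2 below.  By Fermat's principle it crosses the
# interface at the point x that minimizes the total travel time.
v1, v2 = 1.0, 0.7
A, B = (0.0, 1.0), (1.0, -1.0)

def travel_time(x):
    t_above = math.hypot(x - A[0], A[1]) / v1
    t_below = math.hypot(B[0] - x, B[1]) / v2
    return t_above + t_below

# Crude minimization over a fine grid (sufficient for the check).
x_star = min((i / 100000 for i in range(100001)), key=travel_time)

# Sines of the angles of incidence and refraction, reckoned from the normal.
sin_a = (x_star - A[0]) / math.hypot(x_star - A[0], A[1])
sin_b = (B[0] - x_star) / math.hypot(B[0] - x_star, B[1])
print(sin_a / v1, sin_b / v2)   # the two ratios agree: Snell's law
```

The minimizing crossing point makes sin(α)/v₁ equal to sin(β)/v₂, which is the content of the law derived below.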
Let us give a brief description of Snell's law, since we are going to use it in the next subsection. The law can be formulated as follows. Let a parallel beam of rays of light be incident on a horizontal plane
Fig. 1.1 Snell's law.
interface S between two homogeneous optical media (see Fig. 1.1). Snell's law says:

sin(α)/v_1 = sin(β)/v_2,  (1.1)

where α and β are the angle of incidence and the angle of refraction (reckoned from the normal N to S), and v_1 and v_2 are the speed of light above and below S. Quite remarkably, in one of his above-mentioned letters Fermat cited the great Italian scientist Galileo Galilei (1564-1642) and his work published in 1638. In this work, titled "Two New Sciences", Galileo apparently first considered the problem of finding the path of fastest descent under the action of gravity. For Fermat it was very significant that Galileo's solution (which was incorrect) was not just a straight line representing the shortest way. After Fermat's discovery of the first variational principle, many
others were proposed by different scientists. Variational formulations became known in Mechanics, Electrodynamics, Quantum Mechanics, Quantum Field Theory, etc. Nowadays it is common knowledge in science that any natural law can be formulated as an extremal problem. Here we can quote the great Swiss mathematician Leonhard Euler (1707-1783): "In everything that occurs in the world the meaning of a maximum or a minimum can be seen". The variational principle inspired metaphysically inclined thinkers, who understood it as a concrete mathematical expression of the idea of the great German philosopher and scientist Gottfried Wilhelm Leibnitz (1646-1716) that the actual world is the best of all possible worlds.

1.1.2 The "beautiful" Brachistochrone Problem
After Fermat's papers of 1662 there was not much progress on the subject until June 1696, when an article by the great Swiss mathematician Johann Bernoulli (1667-1748) was published with an intriguing title: "A New Problem to Whose Solution Mathematicians are Invited". There he stated the following problem: "Suppose that we are given two points A and B in a vertical plane. Determine the path ACB along which a body C which starts moving from the point A under the action of its own gravity reaches the point B in the shortest time". The statement of the problem was followed by a paragraph in which
Fig. 1.2 The line of the shortest descent, the brachistochrone.
Johann Bernoulli explained to his readers that the problem is very useful in mechanics, that the solution is not the straight line AB, and that the curve is very well known to geometers (Fig. 1.2). One of the most influential mathematicians of the time, G. W. Leibnitz, recognized this problem as "so beautiful and until recently unknown". Apparently, neither Leibnitz nor Johann Bernoulli was aware of Galileo's work of 1638, which we have mentioned above. Some time after the publication of his article Johann Bernoulli gave the solution of the problem himself. Other solutions were provided independently by Leibnitz, Jacob Bernoulli (brother of Johann Bernoulli), and an anonymous author. Experts immediately guessed "Ex Ungue Leonem..." ("to judge a lion from its claw"), and now we know for certain that it was the great British mathematician and physicist Sir Isaac Newton (1642-1727). The curve of the shortest descent, or the brachistochrone ("brachistochrone" means "shortest time" in Greek), turned out to be a cycloid. Leibnitz's solution was based on the approximation of curves with broken lines. A very interesting solution was given by Jacob Bernoulli; it was based on an adaptation of the Huygens principle and the concept of a "wave front". However, the most frequently mentioned solution belongs to Johann Bernoulli himself.

Bernoulli's solution is as follows. First of all let us set a coordinate system (x, y) in the vertical plane so that the x axis is horizontal and the y axis is directed downward (see Fig. 1.2). The velocity of the body C does not depend on the shape of the curve y(x), because the body moves without friction. Its velocity depends only on the current height y(x): according to Galileo's law, v = √(2gy(x)), where g is the acceleration of gravity. The total time of descent along the path y(x) from the point A to the point B is equal to:

T = ∫_AB ds/v = ∫_AB ds/√(2gy(x)),  (1.2)
where ds is the differential of the arc length. The problem is to find the optimal path y(x) in order to minimize the total time T. Bernoulli's original idea was to apply Fermat's principle to the brachistochrone problem. He noted that one can formulate the original problem as a problem of light propagation in a nonhomogeneous medium, where the speed of light at a point (x, y(x)) is assumed to be equal to √(2gy(x)). In order to obtain the solution in analytical form, following Johann Bernoulli, we can split the medium into many thin parallel layers with local
speed of light v_i, i = 1, 2, ..., N. Applying Snell's law, we obtain

sin(α_1)/v_1 = sin(α_2)/v_2 = ... = sin(α_N)/v_N = const,  (1.3)

where α_i are the angles of incidence of the ray. Going to the limit of infinitely thin optical layers, we conclude that

sin(α(x))/v(x) = const,  (1.4)

where v(x) = √(2gy(x)) and α(x) is the angle between the tangent to the curve y(x) at the point (x, y(x)) and the y axis:

sin(α(x)) = 1/√(1 + (dy(x)/dx)²).  (1.5)
Thus, the equation of the brachistochrone y(x) can be rewritten in the following form:

√(1 + (dy(x)/dx)²) √(2gy(x)) = const.  (1.6)

This equation can be easily integrated, and we obtain the solution as the equation of a cycloid:

x = x_0 + x_1(t − sin(t))/2,
y = y_0(1 − cos(t))/2.  (1.7)

1.1.3 Euler-Lagrange equation
It was already clear at that time that the elegant method proposed by Bernoulli cannot be applied to every problem of this type, and some general method for solving such problems was highly desirable. In 1732, when Euler was only 25, he published his first work on variational calculus. And in 1744 Euler published a manuscript under the title "A method of finding curves possessing the properties of a maximum or a minimum, or the solution of the isoperimetric problem taken in the broadest sense". In this systematic study Euler treated 100(!) special problems, and not only solved them but also set up the beginnings of a real general theory. Nowadays we can say that in this work he established the theoretical foundations of a new branch of mathematical analysis.
In his work, Euler considered problems similar to the brachistochrone problem, which can be formulated as the problem of finding an unknown function x(t) such that the functional J(x) achieves its extremum:

J(x) = ∫_{t_0}^{t_1} F(t, x(t), ẋ(t)) dt → extr.  (1.8)

The function F is problem-specific, and it is usually called the functional density. Euler's method was based on the approximation of curves with broken lines. With this method Euler derived a second-order differential equation for the extremals, which Lagrange later called the Euler equation. Eleven years later, in 1755, the 19-year-old Joseph Louis Lagrange (1736-1813) sent a letter to Euler where he described his "method of variations". In 1759 Lagrange published the work in which he had elaborated his new methods of the calculus of variations. His main idea was to consider a "variation" of the curve which is assumed to be an extremal. Later on, the Lagrange method became generally accepted by other mathematicians. This is the method that we shall use to derive the famous Euler equation. But before we start to derive the Euler-Lagrange equation, let us first prove the so-called Fundamental Lemma, which plays an essential role in the variational calculus.

The Fundamental Lemma. Let a continuous function a(t) have the property:

∫_0^T a(t)x(t) dt = 0  (1.9)

for any continuously differentiable function x(t) satisfying the condition x(0) = x(T) = 0. Then a(t) ≡ 0.

Proof. Let us suppose that a(t) ≠ 0 at a point τ ∈ [0, T]. Then the continuity of a(t) implies that there is a closed interval Δ = [τ_1, τ_2] on which a(t) does not vanish. For definiteness, let a(t) > α > 0, t ∈ Δ (see Fig. 1.3). Let us construct a function x(t) = (t − τ_1)²(t − τ_2)² for t ∈ Δ, and x(t) = 0 otherwise. It can easily be verified that x(t) is a continuously differentiable function with x(0) = x(T) = 0; thus x(t) satisfies the conditions of the Fundamental Lemma. On the other hand, by the Mean Value Theorem one can show that there exists η ∈ Δ such that ∫_0^T a(t)x(t) dt = a(η) ∫_{τ_1}^{τ_2} x(t) dt > 0, and we arrive at a contradiction, which proves the Lemma. □
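The construction in the proof can be illustrated numerically: if a(t) is positive on some subinterval, the bump function x(t) = (t − τ_1)²(t − τ_2)² makes the integral strictly positive, so such an a(t) cannot satisfy Eq.(1.9). A sketch, with an arbitrary illustrative choice of a(t), interval, and quadrature grid:

```python
import math

T = 1.0
a = lambda t: math.sin(math.pi * t)   # a(t) > 0 on (0, 1): hypothesis of the lemma must fail
t1, t2 = 0.25, 0.75                   # a closed interval on which a(t) does not vanish

def bump(t):
    """Continuously differentiable test function from the proof:
    (t - t1)^2 (t - t2)^2 inside [t1, t2], zero outside."""
    return (t - t1) ** 2 * (t - t2) ** 2 if t1 <= t <= t2 else 0.0

# Riemann-sum approximation of the integral of a(t) * x(t) over [0, T].
n = 2000
h = T / n
integral = sum(a(i * h) * bump(i * h) for i in range(n + 1)) * h
print(integral)   # strictly positive, so this a(t) violates Eq.(1.9)
```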
Fig. 1.3 The Fundamental Lemma: the function a(t) (see text).
Now let us start the derivation of the Euler-Lagrange equation. We suppose x* = x*(t) is an optimal solution of the variational problem given by Eq.(1.8), and let μ(t) be a function which satisfies the boundary conditions μ(t_0) = μ(t_1) = 0. Let us modestly restrict ourselves to the class of functions that satisfy the boundary conditions and have continuous first and second derivatives on t ∈ (t_0, t_1). These functions we will call admissible functions. For each real number a, let us define a new function x(t) by

x(t) = x*(t) + aμ(t),  (1.10)

see Fig. 1.4. Note that if a is small, the function x(t) is "near" the function x*(t) in some sense. The more precise meaning of "nearness" of functions we shall consider later. Clearly, for optimal x*, J(x*) ≥ J(x* + aμ) for all a, if we assume x* to be a global maximum. If we keep the function μ(t) fixed, the integral J(x* + aμ) becomes a function of a alone. Putting I(a) = J(x* + aμ), we have

I(a) = ∫_{t_0}^{t_1} F(t, x* + aμ, ẋ* + aμ̇) dt.  (1.11)

Fig. 1.4 "Weak" variation of the optimal solution x*.

Fig. 1.5 "Strong" variation of the optimal solution x*.

Here I(0) = J(x*), so I(a) ≤ I(0) for all a. This condition can be formulated as

dI/da|_{a=0} = 0.  (1.12)
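The stationarity condition (1.12) is easy to verify numerically for a case where the extremal is known. For the arc-length density F = √(1 + ẋ²), the extremal between two fixed points is the straight line; perturbing it as in Eq.(1.10) and differencing I(a) should give a vanishing derivative at a = 0. (A sketch; the endpoints, the perturbation μ(t), and the quadrature grid are arbitrary illustrative choices.)

```python
import math

t0, t1 = 0.0, 1.0
mu = lambda t: math.sin(math.pi * t)   # admissible perturbation: mu(t0) = mu(t1) = 0

def I(a, n=4000):
    """I(a) = integral of sqrt(1 + xdot^2) dt for x = x* + a*mu,
    with the straight-line extremal x*(t) = 2t (midpoint rule)."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        xdot = 2.0 + a * math.pi * math.cos(math.pi * t)   # d/dt (x* + a*mu)
        total += math.sqrt(1.0 + xdot * xdot) * h
    return total

dIda = (I(1e-4) - I(-1e-4)) / 2e-4   # central difference at a = 0
print(dIda)                           # approximately zero: x* is stationary
```

The central difference vanishes to numerical precision, exactly as Eq.(1.12) demands for an extremal.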
This is the condition we shall use to derive the Euler equation. Now, looking at Eq.(1.11), we see that to calculate I'(0) we must differentiate the integral with respect to the parameter a appearing in the integrand. The result is as follows:

dI/da = ∫_{t_0}^{t_1} [ (∂F/∂x) μ(t) + (∂F/∂ẋ) μ̇(t) ] dt.  (1.13)

Let us now integrate the second term in Eq.(1.13) by parts:

∫_{t_0}^{t_1} (∂F/∂ẋ) μ̇(t) dt = [ (∂F/∂ẋ) μ(t) ]_{t_0}^{t_1} − ∫_{t_0}^{t_1} (d/dt)(∂F/∂ẋ) μ(t) dt.  (1.14)

Using Eq.(1.13) and rearranging, we get:

dI/da|_{a=0} = ∫_{t_0}^{t_1} [ ∂F/∂x − (d/dt)(∂F/∂ẋ) ] μ(t) dt + ((∂F/∂ẋ) μ(t))|_{t_1} − ((∂F/∂ẋ) μ(t))|_{t_0}.  (1.15)

Now recall that μ(t_0) = μ(t_1) = 0, so the condition Eq.(1.12) reduces to

∫_{t_0}^{t_1} [ ∂F/∂x − (d/dt)(∂F/∂ẋ) ] μ(t) dt = 0.  (1.16)

In the argument leading to this result, μ(t) was a fixed function. However, Eq.(1.16) holds for all functions μ(t) which are 0 at t_0 and t_1. According to the Fundamental Lemma, it follows that the term in the brackets must vanish for all t ∈ [t_0, t_1]:

∂F/∂x − (d/dt)(∂F/∂ẋ) = 0.  (1.17)
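The variational condition can also be seen "in action" numerically: discretize a functional and push its interior values downhill, and the minimizer settles onto the solution of Eq.(1.17). For the arc-length density F = √(1 + ẋ²), whose Euler-Lagrange equation ẍ = 0 is solved by the straight line, a crude coordinate-wise gradient descent recovers that line (a sketch only; the grid size, step length, and starting guess are arbitrary choices):

```python
import math

# Discretize J[x] = integral of sqrt(1 + xdot^2) dt on [0, 1] with fixed
# endpoints x(0) = 0, x(1) = 1; the Euler-Lagrange solution is x(t) = t.
n = 20
h = 1.0 / n
x = [(i * h) ** 3 for i in range(n + 1)]   # deliberately non-optimal start
x[0], x[n] = 0.0, 1.0

def J(v):
    return sum(math.sqrt(1.0 + ((v[i + 1] - v[i]) / h) ** 2) * h for i in range(n))

eps, lr = 1e-6, 0.05
for _ in range(2000):                      # gradient descent on interior points
    for i in range(1, n):
        xp = x[:]
        xp[i] += eps
        x[i] -= lr * (J(xp) - J(x)) / eps  # forward-difference gradient

max_dev = max(abs(x[i] - i * h) for i in range(n + 1))
print(max_dev)   # the polyline has collapsed onto the straight line
```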
It is worth noting here that the Euler-Lagrange equation is a first-order condition for a "local" optimum, analogous to the condition that the partial derivatives are 0 at a local extreme point in "static" optimization. Let us mention two special (simple) forms of the Euler-Lagrange equation:

1) If the Lagrangian F does not depend on x explicitly, the Euler equation becomes:

∂F/∂ẋ = p(t) = const.  (1.18)
It is called the momentum integral.

2) If the Lagrangian F does not depend on t, the Euler equation possesses the first integral:

ẋ (∂F/∂ẋ) − F = E(t) = const.  (1.19)
It is called the energy integral (both names originate from classical mechanics).

Legendre's necessary condition: Now let us suppose that the condition of Eq.(1.17) is satisfied. A necessary condition for J, given by Eq.(1.8), to have a maximum at x* = x*(t) was first obtained by the great French mathematician Adrien-Marie Legendre (1752-1833):

F_ẋẋ(t, x*(t), ẋ*(t)) ≤ 0,  (1.20)

for all t ∈ [t_0, t_1]. The corresponding condition for a minimum is obtained by reversing the inequality in Eq.(1.20).

1.1.4 A word about distance between two functions
In seeking the extremum we must compare values of a functional for two "nearby" solutions. In the derivation of the Euler-Lagrange equation we just assumed that the perturbed solution x in Eq.(1.10) is somewhere "near" the optimal one x*. Let us hence make precise the concept of "nearness" or "closeness" of two functions, since it will be useful further on. Let two functions be given by the equations:

y = y(x),   y_1 = y_1(x).  (1.21)

Let the maximum of the absolute value of the difference, |y_1(x) − y(x)| ≤ ε, designate the distance between these functions. However, the functional

J = ∫_{x_0}^{x_1} F(x, y, dy/dx) dx  (1.22)

depends not only on the function itself, but also on its derivative dy/dx. Hence, the values of the functional for functions between which the distance
is small (less than any previously assigned number ε) may differ radically (see, for example, such a "strong" variation in Fig. 1.5). Let us present an example: take the functional

J = ∫_0^π (dy/dx)² dx,  (1.23)

and consider the function

y = (1/n) sin(nx),  (1.24)

whose distance from the horizontal axis (i.e., from the function y = 0) is 1/n and tends to zero as n increases without limit. Meanwhile, the value of the functional Eq.(1.23) for the function Eq.(1.24) is independent of n and equals π/2, but the value of the functional is zero for y = 0. Hence, for "nearby" functions the values of the functional may differ substantially. In order to avoid this, it is necessary to generalize the definition of "nearness" of functions by bounding the distance between the derivatives of the corresponding functions. Let us bound the greatest of the maximums of the differences:

|y_1 − y| ≤ ε_0,  |dy_1/dx − dy/dx| ≤ ε_1,  |d²y_1/dx² − d²y/dx²| ≤ ε_2,  ...,  |d^n y_1/dx^n − d^n y/dx^n| ≤ ε_n.  (1.25)

Conditions of this type we will call the nth-order distance between two functions.
1.1.5 The Brachistochrone problem revisited
In subsection 1.1.3 we derived the Euler-Lagrange equation for the extremals. Now we can use it in order to solve Bernoulli's brachistochrone problem. Let us choose the y-axis to be vertical and the x-axis horizontal. According to Galileo's hypothesis, the velocity at the height y is given by the relation:

v² = 2gy(x),  (1.26)
where g is the constant of gravity. The time T to fall from point A to B is given in parametric form:

T = ∫ √(ẋ² + ẏ²) / √(2gy) dt,  (1.27)

where ẋ = dx/dt, ẏ = dy/dt. Notice that the integrand function in Eq.(1.27) does not contain x explicitly. One of the Euler equations for this problem is therefore

(d/dt) [ ẋ / √(2gy (ẋ² + ẏ²)) ] = 0,  (1.28)

or, equivalently, F_ẋ = const = 1/√(2a) is the first integral. After some manipulation this condition reduces to the nonparametric equation:

y (1 + (dy/dx)²) = 2a.  (1.29)

Experience has shown it is convenient to introduce as a parameter the angle τ that the tangent to the curve makes with the x-axis; thus

ẋ² / (ẋ² + ẏ²) = cos²(τ),  (1.30)

hence the relation given by Eq.(1.29) is equivalent to:

y = 2a cos²(τ) = a(1 + cos(2τ)),  (1.31)

and from Eq.(1.30) we obtain

x + c = ±a(2τ + sin(2τ)).  (1.32)

We have again obtained the cycloid solution, given by Eq.(1.31) and Eq.(1.32). Note that this curve has been known and of interest since about 1500. One of its most remarkable properties, that of tautochronism, was discovered in 1673 by the famous Dutch mathematician and physicist Christiaan Huygens (1629-1695). It is known that for very small oscillations the period of a simple pendulum moving on an arc of a circle is nearly independent of its amplitude, but for a finite-sized oscillation this is no longer so. However, Huygens showed that when the pendulum bob moves along an arc of a cycloid, its period is precisely independent of the amplitude of its excursion (i.e. the motion is exactly harmonic). Johann Bernoulli remarked in his paper that "...One will be astounded when I say that the Cycloid, the tautochrone of Huygens, is the sought Brachistochrone."
1.1.6 Generalizations of the Euler-Lagrange equation
In subsection 1.1.3 the Euler-Lagrange equation was derived using the following assumptions: 1) the integrand depends only on one function; 2) the function depends only on one variable; 3) the functional consists of only one term having the integral representation (see, for example, Eq.(1.8)). In practice, many applications have a more complicated formulation, where one cannot meet these simplifying assumptions. Now we show some possible generalizations of Euler's approach for more complicated problems.

The integrand depends on higher-order derivatives: Consider the problem of finding a function x(t) that maximizes or minimizes the functional J,

$$J = \int_{t_0}^{t_1} F(t, x(t), \dot{x}(t), \ddot{x}(t), \ldots, x^{(n)}(t))\,dt \to \mathrm{extr},$$ (1.33)

where x(t) and its first n - 1 derivatives have preassigned values at t_0 and t_1 and F is a given function. One can show that a necessary condition for x(t) to solve this problem is that the following generalized (Euler-Poisson) equation is satisfied:

$$\frac{\partial F}{\partial x} - \frac{d}{dt}\frac{\partial F}{\partial \dot{x}} + \frac{d^2}{dt^2}\frac{\partial F}{\partial \ddot{x}} - \cdots + (-1)^n\frac{d^n}{dt^n}\frac{\partial F}{\partial x^{(n)}} = 0.$$ (1.34)
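A minimal worked example (added for illustration; the integrand is our own choice, not from the text): for an integrand depending only on the second derivative, $F = \ddot{x}^2/2$, every term of the Euler-Poisson equation vanishes except the third, giving

```latex
F = \tfrac{1}{2}\ddot{x}^2
\quad\Longrightarrow\quad
\frac{d^2}{dt^2}\frac{\partial F}{\partial \ddot{x}}
= \frac{d^2}{dt^2}\,\ddot{x} = x^{(4)} = 0 ,
\qquad
x(t) = c_0 + c_1 t + c_2 t^2 + c_3 t^3 ,
```

with the four constants fixed by the preassigned values of x and its first derivative at t_0 and t_1.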
See Gelfand and Fomin [Gelfand (2000)] for more details. We are going to use this equation in application to optimal control of quantum systems in chapter 4.

The integrand depends on two variables: Another type of generalization that we shall briefly mention is the one in which the variable function depends on more than one variable. With suitable restrictions on J, we pose the problem of maximizing or minimizing the functional J,

$$J = \int_{x_0}^{x_1}\int_{y_0}^{y_1} F\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) dx\,dy \to \mathrm{extr},$$ (1.35)
Analytical methods in control and optimization
defined on a surface z = z(x, y). The necessary condition for the extremum of the functional Eq.(1.35) is given by the Euler-Ostrogradskii equation:

$$\frac{\partial F}{\partial z} - \frac{\partial}{\partial x}\frac{\partial F}{\partial z'_x} - \frac{\partial}{\partial y}\frac{\partial F}{\partial z'_y} = 0.$$ (1.36)
For completeness the solution of the equation Eq.(1.36) should satisfy properly defined boundary conditions on the 2D domain.

Several unknown functions: In the variational problems discussed so far, there has been only one unknown function. Of course, there are many situations in which several unknown functions are involved. Let us briefly consider the following problem:

$$J = \int_{t_0}^{t_1} F(t, \mathbf{x}(t), \dot{\mathbf{x}}(t))\,dt \to \mathrm{extr}, \quad \mathbf{x}(t_0) = \mathbf{x}^0,$$ (1.37)
where $\mathbf{x}(t) = (x_1(t), \ldots, x_n(t))$, $\dot{\mathbf{x}}(t) = (\dot{x}_1(t), \ldots, \dot{x}_n(t))$. In order to solve the optimization problem, the unknown functions must satisfy a system of Euler-Lagrange equations:

$$\frac{\partial F}{\partial x_i} - \frac{d}{dt}\frac{\partial F}{\partial \dot{x}_i} = 0, \quad i = 1, \ldots, n,$$ (1.38)
with proper boundary conditions. Obviously, it is very difficult to obtain an analytical solution of such a system in the general case.

Degenerate functionals: Let us now consider the case when $F_{y'y'} = 0$ identically (compare also with Legendre's condition described above). Such functionals are called degenerate functionals. In this case we can write the functional as linearly dependent on dy/dt:

$$J = \int_{t_0}^{t_1}\left[A(t, y) + B(t, y)\frac{dy}{dt}\right] dt.$$ (1.39)

In this special case the Euler equation is not a differential equation. It represents a finite (algebraic) equation which has to be satisfied by the unknown function:

$$\frac{\partial A}{\partial y} - \frac{\partial B}{\partial t} = 0.$$ (1.40)

There are two possible cases. The first one is when Eq.(1.40) is satisfied identically for any function y. In this case the integrand can be written as a total differential. The functional has a "potential" form and is independent of the path of integration. Its value is fully determined by the difference between the final and initial points. Thus, J is independent of y. The second case is when Eq.(1.40) is not satisfied identically and one can determine one or several extremals y to optimize Eq.(1.39).
1.1.7 Transversality conditions
Sometimes it is necessary to solve a more complicated variational problem, when the endpoints of the solution are not fixed. The situation becomes even more complicated if we would like to impose the condition that the free endpoints should lie on certain curves, described by two different functions y = φ(x) and y = ψ(x). Without proof (which we leave as an exercise), the variation of the functional with free endpoints is given by:

$$\delta J = \int_{x_0}^{x_1}\left(F_y - \frac{d}{dx}F_{y'}\right)\delta y\,dx + \left[F_{y'}\,\delta y\right]_{x_0}^{x_1}.$$ (1.41)
Suppose that the right endpoint $(x_1, y_1)$ must lie on a certain curve given by the equation:

$$y_1 = \varphi(x_1).$$ (1.42)

It follows then that the transversality condition is:

$$\left[F + (\varphi' - y')F_{y'}\right]_{x=x_1} = 0.$$ (1.43)
In a similar way one can impose the transversality condition for the other endpoint.

1.1.8 Conditional extremum: Lagrange multipliers method
Now we turn to consider a class of variational problems when the functions yielding the extremum of the functional are themselves subject to some
additional conditions (coupling equations). Such problems appear very frequently in different applications and are called problems with a conditional extremum. These constraints can be given in algebraic or integral form, or by a differential equation. Let us consider as an example the problem of finding the shortest distance between two points, i.e. the minimum of the integral
$$S = \int_{x_0}^{x_1}\sqrt{1 + y'^2 + z'^2}\,dx,$$ (1.44)

under the condition that the curve connecting these points lies on a certain surface, for example, the sphere with radius R:

$$x^2 + y^2 + z^2 = R^2.$$ (1.45)
In principle, the variable z may be expressed in terms of y and x from the coupling equation Eq.(1.45), substituted into the integral Eq.(1.44), and the minimum of the usual functional of one variable might be sought. Such an approach will be demonstrated in chapter 4. However, such elimination of variables often leads to very complex computations, and it is more convenient to utilize another method of solution, such as the Lagrange method of undetermined multipliers. The following simple mnemonic rule exists: In order to find the extremum of the functional

$$J = \int_{x_0}^{x_1} F(x, y, y', z, z')\,dx$$ (1.46)
under the condition

$$\varphi(x, y, z) = 0,$$ (1.47)

it is necessary to introduce an intermediate function

$$H = F + \lambda(x)\,\varphi(x, y, z),$$ (1.48)
where λ(x) is an as yet unknown function of x, and to seek the extremum of the functional

$$J = \int_{x_0}^{x_1} H\,dx,$$ (1.49)

by the methods described in the previous sections. In all we must determine three unknown functions: y(x), z(x), and λ(x).
We have three equations for their determination: the two Euler-Lagrange equations for the functional Eq.(1.49),

$$H_y - \frac{d}{dx}H_{y'} = F_y - \frac{d}{dx}F_{y'} + \varphi_y\,\lambda(x) = 0,$$
$$H_z - \frac{d}{dx}H_{z'} = F_z - \frac{d}{dx}F_{z'} + \varphi_z\,\lambda(x) = 0,$$ (1.50)
and the coupling equation Eq.(1.47). Here we denote partial derivatives by the corresponding subscripts. From these equations we can indeed find the desired functions. The function λ(x) is called the Lagrange multiplier. This rule can be extended to the case when several conditions of the type Eq.(1.47) are given. In this case the intermediate function has the form:

$$H = F(x, y_i, y'_i) + \sum_{j=1}^{n}\lambda_j(x)\,\varphi_j(x, y_i).$$ (1.51)
The same rule is retained if the coupling equation contains derivatives of the desired functions, i.e., is a differential equation:

$$\varphi(x, y, y', z, z') = 0.$$ (1.52)
This problem (when derivatives enter into the coupling equation) is called the general Lagrange problem. The mnemonic rule extends also to seeking the transversality conditions. They are written exactly as for the simplest problem, except that the intermediate function H = F + λ(x)φ plays the part of the function F. Let us now mention such a significant optimization task as the isoperimetric problem. It was already known in ancient times that the circle is the shape that encloses the maximum area for a given length of perimeter. In fact, this is an example of a constrained problem. One can express the area and the perimeter in integral form:

$$J = \int_{t_0}^{t_1} F(x, y, y')\,dx \to \max,$$
$$I = \int_{t_0}^{t_1} G(x, y, y')\,dx = L,$$ (1.53)

where L is fixed. Such a class of problems is usually called isoperimetric problems (in Greek iso = equal, peri = around). For such problems one should use the Lagrange multipliers formalism. The isoperimetric problem
one can reduce to the general Lagrange problem. Let us introduce a new function Ψ(t):

$$\Psi(t) = \int_{t_0}^{t} G(x, y, y')\,dx.$$ (1.54)
The problem Eq.(1.53) is then equivalent to the problem of finding the functions y and Ψ. The Euler-Lagrange equations in this case are:

$$H_y - \frac{d}{dt}H_{y'} = 0,$$
$$H_\Psi - \frac{d}{dt}H_{\Psi'} = -\frac{d}{dt}\lambda = 0,$$ (1.55)

where $H = F + \lambda(x)(\Psi' - G)$.
1.1.9 Mixed optimal problem
In many variational problems it is natural to include in the maximization problem a function which measures the value assigned to the terminal state. This problem can be formulated as:

$$J = \int_{t_0}^{t_1} F(t, x(t), \dot{x}(t))\,dt + E(x(t_1)) \to \mathrm{extr}, \quad x(t_0) = x_0,$$ (1.56)

where E is a given function, while x(t_1) is to be determined by the maximization. For example, in a quantum control problem (see chapter 4), E(x(t_1)) can represent the occupation of a certain state of a quantum system at a given time. We start from the assumption that we know a solution of the problem Eq.(1.56): x* = x*(t). We also know that x* has an extremal value compared with all admissible functions whose graphs join (t_0, x_0) and (t_1, x*(t_1)). Obviously, all these functions contribute the same value to the second term of Eq.(1.56). Thus, x* must satisfy the Euler-Lagrange equation Eq.(1.17) in [t_0, t_1]. The first constraint for the solution of the above formulated problem is determined by the initial condition x*(t_0) = x_0. The second constraint can be obtained from the appropriate transversality condition, an additional boundary condition that must be satisfied by the optimal solution when the ends are not fixed, for the right endpoint in this case. If we assume differentiability of the function E, the problem can be
transformed into:

$$E(x(t_1)) - E(x(t_0)) = \int_{t_0}^{t_1}\frac{d}{dt}E(x(t))\,dt = \int_{t_0}^{t_1}E'(x(t))\,\dot{x}(t)\,dt.$$ (1.57)
Thus, we can reformulate the problem Eq.(1.56) as:

$$\max \int_{t_0}^{t_1}\left[F(t, x, \dot{x}) + E'(x(t))\,\dot{x}(t)\right] dt,$$ (1.58)

with the boundary condition x(t_0) = x_0 and x(t_1) free. Note that Eq.(1.58) is a standard variational problem with a "free" right-hand side. Then we can conclude that the transversality condition for the problem Eq.(1.58) is given by $\frac{\partial}{\partial \dot{x}}\left(F(t, x, \dot{x}) + E'(x)\dot{x}\right)\big|_{t=t_1} = 0$, or

$$\left.\frac{\partial F}{\partial \dot{x}}\right|_{t=t_1} + E'(x(t_1)) = 0.$$ (1.59)

Thus we have the following result: If x*(t) solves problem Eq.(1.56), then x*(t) satisfies the Euler-Lagrange equation Eq.(1.17) and the transversality condition Eq.(1.59).
1.1.10 Approximate methods of solution: Ritz's method
Usually determination of the extremals is a hard task, because one has to solve nonlinear differential equations of high order. Fortunately, it is often enough to know only an approximate solution of the extremum problem. One of the best approximate methods is Ritz's method. It is based on reduction of the original problem of the extremum of a function of an infinite number of variables to the solution of a finite number of equations with a finite number of unknowns. To be specific, let us consider the problem of the extremum of the functional:

$$J = \int_{x_0}^{x_1} F(x, y, y')\,dx.$$ (1.60)

The main idea of Ritz's method is to represent the solution y(x) as a series:

$$y^*(x) = \sum_{i=1}^{N} a_i\,\psi_i(x),$$ (1.61)
Analytical methods in control and optimization
21
where ψ_i(x) are certain chosen functions (a basis). The reduced problem now is to find the optimal parameters {a_i} which lead to the extremum of Eq.(1.60). In order to do this one has to solve a system of N equations in N variables:

$$\frac{\partial J(y^*)}{\partial a_i} = 0, \quad i = 1, \ldots, N.$$ (1.62)

In general, Ritz's method becomes more exact the greater the number of terms N taken into account, and one can even make an estimate of the error.
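As a numerical sketch of Ritz's method (an illustration added here; the functional, the sine basis, and all names are our own choices, not from the text): minimize $J[y]=\int_0^1(\tfrac12 y'^2 - y)\,dx$ with y(0) = y(1) = 0, whose exact extremal is y(x) = x(1 - x)/2. With the basis $\psi_i(x)=\sin(i\pi x)$, J is quadratic in the coefficients, so the system of Eq.(1.62) is linear:

```python
import numpy as np

# Ritz's method for J[y] = int_0^1 (0.5*y'^2 - y) dx, with y(0) = y(1) = 0.
# Trial solution (Eq. (1.61)): y*(x) = sum_i a_i * sin(i*pi*x).
# J is quadratic in the a_i, so the conditions dJ/da_i = 0 (Eq. (1.62))
# reduce to the linear system K a = f.

N = 5                                   # number of basis functions
M = 200_000                             # quadrature points (midpoint rule)
x = (np.arange(M) + 0.5) / M
dx = 1.0 / M
i = np.arange(1, N + 1)[:, None]
psi = np.sin(i * np.pi * x)             # basis functions psi_i(x)
dpsi = i * np.pi * np.cos(i * np.pi * x)

K = (dpsi @ dpsi.T) * dx                # K_ij = int psi_i' psi_j' dx
f = psi.sum(axis=1) * dx                # f_i  = int psi_i dx
a = np.linalg.solve(K, f)               # optimal Ritz parameters

y_ritz = a @ psi
y_exact = 0.5 * x * (1.0 - x)           # exact extremal of y'' = -1
err = np.max(np.abs(y_ritz - y_exact))
print(f"max error with N = {N}: {err:.1e}")
```

Increasing N shrinks the error, in line with the remark above about the accuracy of the method.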
1.2 Optimal control theory
Now we are going to discuss some aspects of optimal control theory. The general theory applies to problems where the dynamics of the controlled object have a state space realization in the form of a finite set of simultaneous first-order dynamic equations (state equations) of the form:

$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}, t),$$ (1.63)

where f can be any vector function of suitable dimensionality. An initial state is assumed to be x(t_0) = x^0, and a terminal state reached at some time t_1 is given by one of the following: a) x(t_1) = x^1; b) x(t_1) ≥ x^1; c) x(t_1) free. "Performance" of the optimal control is measured by an integral criterion I (a cost functional) of a general form:

$$I = \int_{t_0}^{t_1} h(\mathbf{x}, \mathbf{u}, t)\,dt \to \max.$$ (1.64)
Let us assume that there is a set of possible solutions of the optimal control problem that transfer the system from the initial state x^0 to the final one x^1, and we are looking for a solution that maximizes the integral cost functional Eq.(1.64). This solution will be called an optimal control. In general, there is no reason to suppose that the optimal control always exists. However, let us not dwell on existence problems and simply suppose there is such a control. We are going to establish a condition in order to distinguish the optimal control from the other controls that simply transfer the system from x^0 to x^1 and do not maximize the functional I.
Thus, it will be useful to find a necessary condition that is analogous to the Euler-Lagrange equation. If it is possible to find such a condition, any control u that fails this condition cannot be optimal. And in the opposite case, any control that satisfies this condition could be optimal and needs to be checked carefully. Such a useful necessary condition was found by the great Russian mathematician Lev Pontryagin (1908 - 1988), published in a series of papers that appeared between 1955 and 1961, and is known as the Pontryagin Maximum Principle (PMP). The maximum principle is particularly convenient for the investigation of systems where the desired cost functionals and coupling equations are linear and the constraints are imposed only on the controls, for example: max|u| ≤ 1. Before we formulate the PMP let us recall the Lagrange problem of maximizing a function subject to a constraint in the form of an equation. In optimal control the problem is to maximize an integral subject to a constraint in the form of the differential equation given by Eq.(1.63). Associated with the constraint in the Lagrangian problem there is a constant, a Lagrange multiplier. As to the problem above, the constraint is a differential equation on the interval [t_0, t_1], and it can therefore be regarded as an infinite number of ordinary equations, one for each t. Hence, to the equation in Eq.(1.63) we associate a number p(t) for each t ∈ [t_0, t_1]. The function p(t) is usually called an adjoint function (or variable) associated with the differential equation. In addition, we associate a number p_0 with the function h in Eq.(1.64). In the Lagrange multiplier theorem the Lagrangian function plays an important role. The comparable function in the present case is the Hamiltonian function, defined by (we assume for simplicity that the dimension of the problem is N = 1):

$$H(x, u, p, t) = p_0\,h(x, u, t) + p\,f(x, u, t).$$ (1.65)

The Maximum Principle transfers the problem of finding a u(t) which maximizes the integral in Eq.(1.64) subject to the given constraints to the problem of maximizing the Hamiltonian function with respect to u. In addition it tells us how to determine the p-function. Precisely formulated, the PMP is as follows: Let u* be a piecewise continuous control defined on [t_0, t_1] which solves
problem Eq.(1.63), Eq.(1.64) with terminal state either a), b) or c), and let x*(t) be the associated optimal path. Then there exists a constant p_0 and a continuous and piecewise continuously differentiable function p(t) such that for all t ∈ [t_0, t_1], (p_0, p(t)) ≠ (0, 0), and u* maximizes H(x*(t), u, p(t), t), (1.66) that is,

$$H(x^*(t), u^*(t), p(t), t) \geq H(x^*(t), u, p(t), t)$$ (1.67)

for all admissible controls u. Except at the points of discontinuity of u*(t),

$$\dot{p}(t) = -\frac{\partial H}{\partial x},$$ (1.68)

where $\frac{\partial H}{\partial x} = H_x(x^*(t), u^*(t), p(t), t)$. The constant p_0 is equal to either p_0 = 0 or p_0 = 1. To each of the terminal conditions a), b), c) there corresponds a transversality condition: a) no condition on p(t_1); b) p(t_1) ≥ 0; c) p(t_1) = 0. Note, the inequality in Eq.(1.67) is valid for all t ∈ [t_0, t_1]. Now we have to take a small break. As we will see later, the manner in which the PMP can be used differs significantly from one problem to another. In general, no standard procedure guarantees a solution. Let us make a further note: Pontryagin's equations can be extremely difficult to solve analytically, because the boundary conditions on the adjoint equations Eq.(1.68) are usually specified at the terminal time t_1, whereas the boundary conditions on the state equations are usually specified at the initial time t_0. Such two-point boundary value problems are notoriously intractable. Let us now consider two examples in which the Maximum Principle in each case produces a unique candidate for optimality. Our first example is a purely mathematical problem and is almost as easy as can be:

$$\max \int_0^1 x(t)\,dt$$ (1.69)

subject to

$$\dot{x}(t) = x(t) + u(t),$$ (1.70)
where x(0) = 0, x(1) is free, and there is a restriction on the control variable u(t):

$$|u(t)| \leq 1.$$ (1.71)
Let us now make use of the PMP. The Hamiltonian of this problem reads:

$$H = H(x, u, p, t) = p_0 x + p(x + u) = (p_0 + p)x + pu.$$ (1.72)
Let us now assume that the pair (x*(t), u*(t)) solves the problem Eq.(1.69)-Eq.(1.71). According to the PMP there must exist a constant p_0 and a continuous function p(t) such that (p_0, p(t)) ≠ (0, 0) for all t ∈ [0, 1]. Moreover, for each t ∈ [0, 1], u*(t) is that value of u ∈ [-1, 1] which maximizes

$$H(x^*(t), u, p(t), t) = (p_0 + p(t))\,x^*(t) + p(t)\,u.$$ (1.73)
The adjoint function p(t) satisfies, according to Eq.(1.68), (except at the points of discontinuity of u*(t)), the equation QU*
P(«) = — ^ - = - P o - P ( * ) -
(1-74)
Since x(1) is free, we apply the transversality condition given by c):

$$p(1) = 0.$$ (1.75)

From Eq.(1.66) we obtain in particular that p_0 and p(1) cannot both be 0. Since p(1) = 0, p_0 ≠ 0, and thus by Eq.(1.66), p_0 = 1. We now have to put down all the information supplied by the PMP. To proceed, notice that the differential equation for p(t) in Eq.(1.74) is particularly simple. In fact, it is a linear equation with constant coefficients, so the solution is

$$p(t) = Ae^{-t} - p_0 = Ae^{-t} - 1.$$ (1.76)
To determine the constant A, we see that by Eq.(1.75) and Eq.(1.76), 0 = p(1) = Ae^{-1} - 1, so A = e and therefore

$$p(t) = e^{1-t} - 1.$$ (1.77)

Thus the adjoint function is fully determined. From Eq.(1.77) we see moreover that $\dot{p}(t) = -e^{1-t} < 0$ for all t. Since p(1) = 0, we conclude that p(t) > 0 in the interval [0, 1). Now let us turn to condition Eq.(1.73). Since only the term p(t)u in the Hamiltonian depends on u, u*(t) is the value of u ∈ [-1, 1] which maximizes p(t)u. When t ∈ [0, 1), p(t) > 0, so in this case the maximum value of p(t)u is attained for u = 1. For t = 1, p(t) = 0 and Eq.(1.73) does not determine u*(1). However, we have previously decided to choose u(t) as a continuous
function at the endpoints of its domain of definition. Hence we must put u*(1) = 1, so our proposal for an optimal control is:

$$u^*(t) = 1, \quad t \in [0, 1].$$ (1.78)
The associated path must satisfy Eq.(1.70). Solving this linear differential equation we get x*(t) = Be^t - 1. By the initial condition x*(0) = 0, so B = 1. Hence,

$$x^*(t) = e^t - 1.$$ (1.79)
We have now proved that if the given problem has a solution, the optimal control is given by Eq.(1.78) and the associated optimal path is given by Eq.(1.79). The corresponding value of the criterion function is

$$I = \int_0^1 (e^t - 1)\,dt = e - 2.$$ (1.80)

We have illustrated above how the PMP works in a simple case. Actually, our effort was not necessary for solving this problem. From Eq.(1.70) and Eq.(1.71) we see that u*(t) = 1 produces the highest value of x(t) for any t ∈ [0, 1], and hence u*(t) = 1 must maximize $\int_0^1 x(t)\,dt$. Note: The control law that requires every control variable to take one or the other of two limiting values is called bang-bang control. However, most control problems, unfortunately, are much more complicated and have no simple analytical solutions. That is why we need to develop numerical methods to find optimal control.
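The analytical result of this example is easy to check numerically (a sketch added here, not from the book): integrating ẋ = x + u with x(0) = 0 for several admissible constant controls and comparing the criterion ∫₀¹ x dt shows that u = 1 is best and reproduces I = e - 2 ≈ 0.718:

```python
import math

def criterion(u, n=200_000):
    """Euler-integrate xdot = x + u with x(0) = 0; return J = int_0^1 x dt."""
    dt = 1.0 / n
    x, J = 0.0, 0.0
    for _ in range(n):
        J += x * dt          # accumulate the cost functional
        x += (x + u) * dt    # explicit Euler step for the state equation
    return J

values = {u: criterion(u) for u in (-1.0, 0.0, 0.5, 1.0)}
best = max(values, key=values.get)
print(best, values[best])    # u = 1 gives the largest criterion
print(math.e - 2)            # analytical value of the criterion for u = 1
```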
1.2.1 Sensitivity analysis
Naturally, a very important question is to what degree the existence of an error ε in the control affects the cost functional. In order to quantify the effect of a small perturbation of the optimal control field, one has to do a sensitivity analysis. If the value of the cost functional changes substantially for small ε, then it is necessary to utilize a very accurate control assuring optimality of the cost functional. If small deviations from optimality are not dangerous, then an essentially simpler control may be used. It is possible to estimate approximately the change in the functional for deviations of the desired function from the extremal by means of the magnitude of the second variation.
1.2.2 Null controllability
Let us consider a linear control process, described by a system of differential equations (A, B are matrices):

$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}) = A\mathbf{x} + B\mathbf{u}.$$ (1.82)
It is useful to introduce a definition of null controllability: the domain C ⊆ R^n is defined as the set of points, each of which can be steered to x_1 = 0 with some bounded control |u(t)| < +∞ in finite time. It is possible to prove a theorem that such a system is null controllable if and only if

$$\mathrm{rank}\left[B, AB, A^2B, \ldots, A^{n-1}B\right] = n,$$ (1.83)
where n is the dimension of the system. Note that the above condition is not necessary when nonlinear phenomena are involved! To illustrate this, let us consider the nonlinear system:

$$\dot{x} = -x + u(t), \quad \dot{y} = -y - x^3,$$ (1.84)

where we assume that the control is bounded,

$$|u(t)| < +\infty.$$ (1.85)

The linear approximation of the system near the point (0, 0) is $\dot{\mathbf{x}}(t) = A\mathbf{x} + Bu(t)$, with A = [[-1, 0], [0, -1]] and B = [1, 0]^T. Note that the linear approximation is non-degenerate (A is nonsingular), but the algebraic criterion of controllability Eq.(1.83) is not fulfilled, since:

$$\mathrm{rank}\left[B, AB\right] = \mathrm{rank}\begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix} = 1 < 2.$$ (1.86)
However, it is possible to demonstrate that the complete nonlinear system Eq.(1.84) is null controllable.
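The rank computation of Eq.(1.86) is easy to reproduce (a sketch added here using numpy; the helper function simply builds the controllability matrix of Eq.(1.83)):

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] as in Eq. (1.83)."""
    n = A.shape[0]
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = A @ M
    return np.hstack(blocks)

# Linearization of the example system (1.84) near the origin.
A = np.array([[-1.0, 0.0],
              [0.0, -1.0]])
B = np.array([[1.0],
              [0.0]])

C = controllability_matrix(A, B)   # [[1, -1], [0, 0]]
print(np.linalg.matrix_rank(C))    # prints 1: rank 1 < n = 2, linear test fails
```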
1.2.3 Problems with constrained control
One example of the use of the PMP in control theory has been in connection with continuous-time problems having realistic constraints on the range of values accessible to the controls u. One possible representation of such a constraint is to introduce a cost-of-control criterion having quadratic form:

$$\int_0^T u^2\,dt \leq E.$$ (1.87)
Another, and sometimes more realistic, representation is to introduce a saturation inequality:

$$u_{min} \leq u \leq u_{max}.$$ (1.88)

In the case of symmetric constraints the condition max|u| ≤ 1 is equivalent to the condition:

$$\int_0^T u^{2m}\,dt \leq T,$$ (1.89)

where m → ∞. Indeed, if u < u_{min} or u > u_{max} on at least a small section, then u^{2m} → ∞ as m → ∞ and the integral in Eq.(1.89) tends to infinity; conversely, if the condition Eq.(1.88) is satisfied, then u^{2m} → 0 and the value of the integral will be less than T. Hence, the problem of the extremum of the functional Eq.(1.64) under a constraint like Eq.(1.88) can be formulated as an isoperimetric problem: find the extremum of the functional under the condition that the integral Eq.(1.89) takes a given value T.
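The limiting argument can be illustrated numerically (a sketch added here; the two trial controls are arbitrary choices of ours): for a large exponent m, the integral ∫₀ᵀ u²ᵐ dt stays below T exactly for controls respecting |u| ≤ 1 and blows up otherwise:

```python
import numpy as np

T, m = 1.0, 50                       # horizon and (large) exponent
t = np.linspace(0.0, T, 10_001)
dt = t[1] - t[0]

def penalty(u):
    """Approximate int_0^T u(t)^(2m) dt by a Riemann sum."""
    return np.sum(u ** (2 * m)) * dt

u_ok = 0.9 * np.sin(2 * np.pi * t)   # respects |u| <= 1 everywhere
u_bad = 1.2 * np.ones_like(t)        # violates the constraint everywhere

print(penalty(u_ok))                 # much smaller than T
print(penalty(u_bad))                # enormous compared with T
```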
1.3 Summary
In this chapter we have briefly reviewed the history of the calculus of variations, which we shall use in order to derive analytical solutions for some optimal control problems in the next chapters. As a first example we considered the "Brachistochrone problem" that stimulated the birth of the calculus of variations at the end of the XVII century. Then we demonstrated the derivation of the Euler equation, which is actually a cornerstone of the calculus of variations. Further generalizations and developments, including the Lagrange multipliers technique, were discussed. Then we made a jump to the XX century and formulated Pontryagin's variational principle for optimal control, with some examples. However, we have to note that not many control problems have analytical solutions, and that stimulates us to open the next chapter, where some numerical techniques will be discussed.
Chapter 2
Numerical optimization
With my two algorithms, one can solve all problems without error, if God will! Al-Khorezmi, (780-850), (in Science Focus [IBM], no. 1, (1981)). In mathematics and computer science an algorithm (the word is a derivation from the name of the Persian mathematician Al-Khorezmi (or, alternatively, Al-Khwarizmi)) is a finite set of well-defined instructions for accomplishing some task which, given an initial state, will terminate in a corresponding recognizable end-state. In this chapter we are going to discuss some issues of Global Optimization and Multi-objective Optimization, and some useful numerical optimization algorithms. We are also going to mention some related problems such as the general effectiveness of algorithms (No Free Lunch Theorem), and the complexity and scaling of the most common optimization problems.
2.1 The halting problem and No Free Lunch Theorem
In computability theory the halting problem is a decision problem which can be informally stated as follows. Given a description of an algorithm and its initial input, determine whether the algorithm, when executed on this input, ever halts (completes). The alternative is that it runs forever without halting. The great British mathematician Alan Turing (1912 - 1954) proved in 1936 that a general algorithm to solve the halting problem for all possible inputs cannot exist. We say that the halting problem is undecidable. The situation with a general optimization problem is quite similar. Since there is no recipe to determine whether any numerical optimization algorithm will find a global extremum, or to prove that it does not exist, we have simply to run the algorithm on our computer for a reasonable time, wait, and hope that the algorithm will find a solution and stop before a deadline. Another interesting issue is the effectiveness of different optimization algorithms. There were many attempts in the past few decades to answer the question: if one considers the whole universe of optimization problems, can one find an algorithm which is superior to another? Finally, in 1997 Wolpert and Macready published a work presenting a formulation and a proof of the so-called No Free Lunch Theorem (NFT) [Wolpert (1997)]. This theorem caused a wide response in the scientific computation community. The main result of the NFT is that if one takes an average over the space of all possible optimization problems, all optimization procedures are equally efficient! Of course, some algorithms may outperform others on a certain set of problems, but this superiority must be compensated by poorer performance on other problems. We are not going to consider the NFT here in detail; for further information one can read a nice overview [Schumacher (2001)].
2.2 Global Optimization: searching for the deepest hole on a golf field in the darkness using a cheap laser pointer
In a very general form, the main goal of the global optimization problem is summarized in the following definition: Let us consider a region A in N-dimensional space. For a given function f(x), the value f* = f(x*) > -∞ is called a global minimum if for any x ∈ A: f(x*) ≤ f(x). Then x* is a global minimum point, f is called the objective function, and the set A is called the feasible region. The problem of determining a global minimum point is called the global optimization problem. In the same way one can formulate the problem of the search for a global maximum. The definition of a local minimum of an objective function f(x): the value f* is called a local minimum if there exists an ε-environment U(x_0): |x_0 - x| < ε, such that f* is the smallest feasible objective function value within this environment. In practice, the number of local minima is quite large, see Fig.2.1. If one can prove that a given objective function has the property of convexity, then the optimization problem becomes an easier task. Convexity is a geometrical property of a curve. A real-valued function f defined on an interval [a, b] is convex if for any two points x and y lying in [a, b] and any α ∈ [0, 1] the following inequality holds: f(αx + (1 - α)y) ≤ αf(x) + (1 - α)f(y). It can be proved that any local minimum of a convex function is also a global minimum. A strictly convex function will have at most one global minimum. Taking this fact into account, we will call an objective function f convex if it has exactly one local minimum (which is then also the global minimum); otherwise we will call it non-convex. Most real-life optimization problems are non-convex. Of course, one is usually interested in finding a global
Fig. 2.1 Example of a 1D objective function; note that it has many local minima and one global minimum.
optimum of a given optimization problem instead of only a local one. Unfortunately, in global optimization no general criterion exists for identification of the global minimum. Another piece of bad news is that almost all practically interesting optimization problems contain a huge number of local minima. As an example of a frequent optimization problem we can mention the global optimization problem caused by the least square estimation of model parameters. Let us consider an observed data sequence (x_i, y_i), i = 1, ..., N. Here y_i are dependent variables ("effect") and x_i are independent variables ("cause"). Assuming the existence of a model, or at least a hypothesis y(a, x_i), which describes the dependence between "effect" y_i and "cause" x_i, the model parameter vector a must be estimated in order to minimize the sum of squared differences between measured reality and model predictions:

$$\sum_{i=1}^{N}\left(y_i - y(\mathbf{a}, x_i)\right)^2 \to \min.$$ (2.1)
Note that the "effect" y_i could, in principle, depend on all x_i, which makes the problem strongly non-local. Let us imagine an optimization problem where a solution can be represented as a bit (1 or 0) string of length N. One can easily calculate that there are 2^N possible variants of the unknown solution. Another example is from multidimensional optimization; let f(x_1, ..., x_n) be defined as

$$f(x_1, \ldots, x_n) = 10\sin^2(\pi x_1) + (x_n - 1)^2 + \sum_{i=1}^{n-1}(x_i - 1)^2\left(1 + 10\sin^2(\pi x_{i+1})\right).$$ (2.2)

This function has an incredible number of local minima. For instance, it has 10^10 local minima when n = 10, but only a single global minimum. These two examples illustrate how fast the complexity of global optimization grows with the dimension of the problem. Clearly, to find the global optimum by full (exhaustive) search in these examples is unrealistic. As a general rule, the search space grows according to the laws of combinatorics, which can often be nicely approximated by some exponential function of N. In a very natural way the discussion of global optimization turns from continuous variables to discrete ones by introducing the idea of a grid search technique. Generally, optimization problems with discrete object variables are called combinatorial optimization problems. One example of such a problem is the travelling salesman problem:
Find a path through a weighted graph which starts and ends at the same vertex, includes every other vertex exactly once, and minimizes the total cost of edges. Given the exponential growth of the search space mentioned above, it is not surprising that even the decision problem of checking whether a given feasible solution of a smooth, non-convex nonlinear optimization problem is not a local minimum can be extremely hard.
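For tiny instances the travelling salesman problem can still be solved by exhaustive search, which also makes the factorial growth of the search space tangible (an illustrative sketch added here; the four-city example and all names are our own):

```python
import itertools
import math

# Four cities on a unit square; the optimal tour walks the perimeter (cost 4).
cities = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]

def tour_cost(order):
    """Total length of the closed tour 0 -> order -> 0."""
    path = [0, *order, 0]
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(path, path[1:]))

# Fix city 0 as the start and enumerate the (n-1)! permutations of the rest;
# this exhaustive search becomes hopeless very quickly as n grows.
best = min(itertools.permutations(range(1, len(cities))), key=tour_cost)
print(best, tour_cost(best))   # the perimeter tour, of length 4.0
```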
2.2.1 Sensitivity to numerical errors
Today's computers and computers of the near future can manipulate and store only a finite amount of information. Since the solution of a nonlinear problem may be a real number, that cannot be represented in finite space or displayed on a screen in finite time, the best we can hope for in general is a point close to a solution (preferably with some guarantee on its proximity to the solution) or an interval enclosing a solution. Computer methods for solving nonlinear problems typically use floating-point numbers to approximate real numbers. Since there are only finitely many floating point numbers, these methods are bound to make numerical errors. These errors, although probably small considered in isolation, may have fundamental implications on the results. Consider, for instance, Wilkinson's problem, which consists in finding all solutions to the equation
\prod_{i=1}^{20} (x + i) + p x^{19} = 0,   (2.3)

in the interval [-20.4, -9.4]. When p = 0, the equation obviously has 11 solutions. When p = 2^{-23} ≈ 10^{-7}, it has no solution! Wilkinson's problem clearly indicates that a small numerical error (e.g., assume that p is the output of some numerical computation) can have fundamental implications for the results of an application. These numerical issues require users of numerical software to exercise great care when interpreting their results.
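This sensitivity is easy to reproduce numerically. The sketch below evaluates the polynomial in its stable product form and counts strict sign changes on a fine grid (the grid resolution and the slight endpoint offsets are illustrative choices that avoid sampling the roots exactly):

```python
import numpy as np

def sign_changes(eps, lo=-20.4001, hi=-9.4001, n=4001):
    # evaluate p(x) = prod_{i=1}^{20} (x + i) + eps * x^19 in product
    # form (numerically stable) and count strict sign changes
    xs = np.linspace(lo, hi, n)
    p = np.ones_like(xs)
    for i in range(1, 21):
        p *= xs + i
    p += eps * xs**19
    return int(np.sum(p[:-1] * p[1:] < 0))

# unperturbed: 11 real roots (-20, ..., -10) in the interval;
# perturbed by 2**-23: no real roots remain there
n_roots_0 = sign_changes(0.0)
n_roots_p = sign_changes(2.0**-23)
```

The perturbation of roughly 10^-7 in a single coefficient pushes all eleven roots off the real axis or out of the interval.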
2.3 Multiobjective optimization
Most real-world engineering and scientific optimization problems (including optimal control, forecasting, least-square estimation of model parameters, etc.) are multi-objective: they usually have several objectives that may conflict with each other and yet must all be satisfied. For example, in the optimal control of a complex chemical reaction with several products, we usually want first to maximize the yield of some final products, but at the same time, as a second task, to minimize the yield of some other products, which may contradict the first goal. Another example: we want to find the best fit to the data using the simplest possible hypothesis y(a, x_i) in Eq. (2.1). Obviously, these two objectives (quality of fit and simplicity of the hypothesis) conflict with each other. As an illustration let us consider a classical example, Schaffer's function [Schaffer (2001)]:

F(x) = (f_1(x), f_2(x)) = (-x^2, -(x - 1)^2).   (2.4)
Obviously, there is no single point that maximizes all the components of F, see Fig. 2.2. Multi-objective optimization (also called multi-criteria optimization) can be defined as the problem of finding an n-dimensional vector of variables, called optimization parameters, x = (x_1, ..., x_n), which satisfies all imposed constraints and optimizes an m-dimensional vector objective function f(x) = (f_1, ..., f_m). Each element of f(x) corresponds to a particular performance criterion, and it is no surprise that these criteria are usually in conflict with each other. To solve the multi-objective optimization problem means to find an optimal solution x* which would minimize the values of all the components of the objective function.

Fig. 2.2 Example of Schaffer's function.

As we have mentioned in the previous chapter, in many optimization problems there are restrictions imposed by the particular characteristics of the environment or the resources available. For example, in the optimal control of molecules using laser pulses, the energy of the laser pulse or the minimum possible pulse duration is bounded, and as we will see, the optimal solution will depend on this bound. Such restrictions must be satisfied in order to consider a certain solution acceptable. These constraints can be expressed in the form of mathematical inequalities,

G_i(x) \le 0, i = 1, ..., k,   (2.5)

or equalities,

H_i(x) = 0, i = 1, ..., k.   (2.6)

Note that k, the number of imposed constraints, must be less than n, the number of decision variables, because if k > n the problem is said to be over-constrained, since no degrees of freedom are left for optimization.
2.3.1 Pareto front
As we have seen, a multi-objective optimization problem usually has no unique, perfect (or "Utopian") solution. However, one can introduce a set of non-dominated, alternative solutions, known as the Pareto-optimal set, named after the brilliant Italian economist, sociologist and philosopher Vilfredo Pareto (1848-1923). Pareto-optimal solutions are also called efficient, non-dominated, or non-inferior solutions. We say that x* is Pareto optimal if there exists no feasible vector x which would decrease some criterion without causing a simultaneous increase in at least one other criterion. The Pareto optimum almost always yields multiple solutions, called non-inferior or non-dominated solutions. In other words, for problems having more than one objective function (say F_j, j = 1, 2, ..., M, with M > 1), any two solutions x_1 and x_2 (having P decision variables each) can relate in one of two ways: one dominates the other, or neither dominates. A solution x_1 is said to dominate the other solution x_2 if both of the following conditions are true [Deb (1999)]:

1. The solution x_1 is no worse than x_2 in all objectives.
2. The solution x_1 is strictly better than x_2 in at least one objective.

The set of all such non-dominated solutions constitutes the Pareto front. These solutions lie on the boundary of the design region, or on the locus of the tangent points of the objective functions. In general, it is not easy to find an analytical expression for the line or surface that contains these points; the normal procedure is to compute the points belonging to the Pareto-optimal front and their corresponding function values for each of the objectives. When we have a sufficient number of these, we may proceed to take the final decision. More formally, one can give the following definition of Pareto optimality. For an arbitrary minimization problem, dominance is defined as follows:

Pareto dominance. A vector u = (u_1, ..., u_n) is said to dominate v = (v_1, ..., v_n) if and only if u is partially less than v, i.e., u_i \le v_i for all i \in {1, ..., n}, and u_i < v_i for at least one i \in {1, ..., n}.
Pareto optimality. A solution x_u \in U is said to be Pareto-optimal if and only if there is no x_v \in U for which v = f(x_v) = (v_1, ..., v_n) dominates u = f(x_u) = (u_1, ..., u_n). Pareto optimality can be illustrated graphically (see Fig. 2.3) by considering the set of all feasible objective values, i.e., the set of all points in the objective space corresponding to feasible values of the decision variables.

Fig. 2.3 Example of the Pareto front.
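The dominance test and the extraction of the Pareto front can be sketched for the Schaffer objectives of Eq. (2.4), rewritten here as a minimization of (x^2, (x - 1)^2); the sampling grid is an illustrative choice:

```python
import numpy as np

def dominates(u, v):
    # u dominates v (minimization): no worse in all objectives,
    # strictly better in at least one
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

# sample Schaffer's problem, Eq. (2.4), in minimization form
xs = np.linspace(-1.0, 2.0, 301)
objs = [(x**2, (x - 1.0)**2) for x in xs]

# keep every sampled point that no other sampled point dominates
pareto = [x for x, fu in zip(xs, objs)
          if not any(dominates(fv, fu) for fv in objs)]
```

The surviving points fill the interval [0, 1], the known Pareto-optimal set of this problem: any x outside it is worse in both objectives than the nearest endpoint.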
2.3.2 The weighted-sum method
In order to characterize the "quality" of a certain solution, we need some criteria to evaluate it. These criteria are expressed as functions of the decision variables, called objective functions. In our case, some of them will be in conflict with others, and some will have to be minimized while others are maximized. The objective functions may be commensurable (measured in the same units) or noncommensurable (measured in different units). In general, the objective functions we deal with in engineering optimization are noncommensurable. We will designate the objective functions as f_1(x), f_2(x), ..., f_n(x). Together they form a vector objective function f(x):
f(x) = (f_1(x), f_2(x), ..., f_n(x)).   (2.7)
The easiest and perhaps most widely used method to handle a multiobjective optimization problem is the weighted-sum approach. Weighting coefficients r_i are real values which express the relative "importance" of the objectives and control their involvement in the cost functional. In this approach the cost function is formulated as a weighted sum of the objectives:

F(x) = \sum_{i=1}^{N} r_i f_i(x).   (2.8)
However, this method has its disadvantages: for example, the weighted-sum approach can be particularly sensitive to the setting of the weights, depending on the problem. An alternative way to determine the Pareto front and solve the multiobjective optimization problem is to use a multiobjective genetic algorithm [Zitzler (1999)]. In conclusion, multi-objective optimization can be identified as one of the most challenging classes of optimization problems.
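As a sketch, scalarizing the two Schaffer objectives of Eq. (2.4) (in minimization form) with different weights recovers different points of the Pareto front; the weights and the search grid are illustrative:

```python
import numpy as np

xs = np.linspace(-1.0, 2.0, 3001)

def weighted_best(r1, r2):
    # minimize F(x) = r1*f1 + r2*f2 with f1 = x^2, f2 = (x - 1)^2
    F = r1 * xs**2 + r2 * (xs - 1.0)**2
    return xs[np.argmin(F)]

# sliding the weights from (1, 0) to (0, 1) moves the minimizer
# across the Pareto-optimal interval [0, 1]
solutions = [weighted_best(1.0 - w, w) for w in np.linspace(0.0, 1.0, 5)]
```

For this convex problem the minimizer of the scalarized cost is r2/(r1 + r2), so each weight setting selects one Pareto-optimal point; this is exactly the sensitivity to the weights mentioned above.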
2.4 Simplex method
In 1965 Nelder and Mead [Nelder (1965)] developed a simplex method for function minimization, which rapidly became one of the most popular methods of optimization. The simplex method was based on an earlier numerical optimization algorithm by Spendley, Hext, and Himsworth [Spendley (1962)]. A simplex is the convex hull of N + 1 points in the N-dimensional search space. It is assumed that the points satisfy the non-degeneracy condition (i.e. the volume of the hull is not zero). In every iteration of the algorithm, the vertices x_1, x_2, ..., x_{N+1} of the current simplex are arranged in ascending order of their objective function values (we consider a minimization problem):

f(x_1) \le f(x_2) \le ... \le f(x_{N+1}).   (2.9)
Fig. 2.4 Reflection scheme for the Simplex method.
We refer to x_1 as the best point and to x_{N+1} as the worst point. Let us introduce the mean of all points except the worst one:

⟨x⟩ = (1/N) \sum_{i=1}^{N} x_i.   (2.10)

The simplex method attempts to replace the current worst vertex by a new one that is generated by one of the following three operations: 1) "reflection", 2) "expansion", 3) "contraction". Only if these fail is a 4) "shrink" step carried out. Here we describe the operations in more detail. Let us first construct a reflection point as follows:

x_reflect = ⟨x⟩ + α(⟨x⟩ - x_{N+1}).   (2.11)
If f(x_1) \le f(x_reflect) < f(x_{N+1}), i.e. if the reflected point improves on the worst point but is not better than the best point so far, then we replace x_{N+1} by x_reflect and the iteration is terminated.

Fig. 2.5 Expansion scheme for the Simplex method.

If f(x_reflect) < f(x_1), i.e. if the reflected point is better than the best point so far, then we create an expansion point

x_expand = ⟨x⟩ + γ(x_reflect - ⟨x⟩).   (2.12)

Then we replace the worst point x_{N+1} by the better of x_reflect and x_expand, and the iteration is terminated.
In the case f(x_reflect) \ge f(x_N), i.e. if the reflected point would still be the worst if it replaced the worst point so far, a contraction is attempted. Depending on whether the reflected point is better or worse than the worst point so far, two types of contraction are possible:

(1) If f(x_reflect) < f(x_{N+1}), then an outside contraction point is constructed:

x_contract = ⟨x⟩ + β(x_reflect - ⟨x⟩).   (2.13)

(2) If f(x_reflect) \ge f(x_{N+1}), then an inside contraction point is constructed:

x_contract = ⟨x⟩ + β(x_{N+1} - ⟨x⟩).   (2.14)
Fig. 2.6 Contraction scheme for the Simplex method.

Fig. 2.7 Shrink scheme for the Simplex method.
In either case, if f(x_contract) < min(f(x_{N+1}), f(x_reflect)), then the worst point x_{N+1} is replaced by x_contract and the iteration is terminated. If all of the above have failed to generate a point better than the second worst, then all the vertices x_i but the best are replaced by new points:

x_i' = x_1 + δ(x_i - x_1), i = 2, ..., N + 1.   (2.15)
According to Nelder and Mead, the purpose of these operations is that "the simplex adapts itself to the local landscape, elongating down inclined planes, changing direction on encountering a valley at an angle, and contracting in the neighborhood of a minimum". Clearly, depending on the quality of the
new points that are generated, the method requires either 1, 2, or N + 2 objective function evaluations per iteration. The almost universally recommended values for the parameters are α = 1, β = δ = 0.5, and γ = 2. We illustrate the "reflection", "expansion", "contraction" and "shrink" steps in two dimensions (see Figs. 2.4-2.7). While still a rather popular optimization method, the simplex search is known to be frequently ineffective; in particular, when searching over multi-modal landscapes it can easily be trapped in a local minimum. There are other methods, which we are going to discuss in the next sections, that are in general more suitable for global nonlinear optimization. They are also more robust than the simplex method. The methods we are going to use further in this book are simulated annealing and the genetic algorithm.
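The reflection, expansion, contraction and shrink steps can be collected into a short routine. The sketch below is a minimal illustration, not the authors' code: it uses the recommended parameter values and, in the reflection acceptance test, compares against the second-worst vertex, as is standard for Nelder-Mead; the quadratic test function at the end is an arbitrary example.

```python
import numpy as np

def nelder_mead(f, x0, alpha=1.0, beta=0.5, gamma=2.0, delta=0.5,
                max_iter=500):
    """Minimize f via reflection/expansion/contraction/shrink steps."""
    n = len(x0)
    simplex = [np.asarray(x0, dtype=float)]
    for i in range(n):                      # initial non-degenerate simplex
        v = simplex[0].copy()
        v[i] += 1.0
        simplex.append(v)
    for _ in range(max_iter):
        simplex.sort(key=f)                 # ascending objective values
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        centroid = np.mean(simplex[:-1], axis=0)        # <x>, Eq. (2.10)
        xr = centroid + alpha * (centroid - worst)      # reflection (2.11)
        if f(best) <= f(xr) < f(second_worst):
            simplex[-1] = xr
        elif f(xr) < f(best):                           # expansion (2.12)
            xe = centroid + gamma * (xr - centroid)
            simplex[-1] = xe if f(xe) < f(xr) else xr
        else:                                           # contraction
            if f(xr) < f(worst):
                xc = centroid + beta * (xr - centroid)  # outside (2.13)
            else:
                xc = centroid + beta * (worst - centroid)  # inside (2.14)
            if f(xc) < min(f(worst), f(xr)):
                simplex[-1] = xc
            else:                                       # shrink (2.15)
                simplex = [best] + [best + delta * (v - best)
                                    for v in simplex[1:]]
    simplex.sort(key=f)
    return simplex[0]

# example: minimize a 2-D quadratic with minimum at (1, 2)
x_min = nelder_mead(lambda v: (v[0] - 1.0)**2 + (v[1] - 2.0)**2, [0.0, 0.0])
```

On a smooth unimodal function like this the simplex contracts quickly around the minimum; on multi-modal landscapes the same routine can stall in a local minimum, as noted above.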
2.5 Simulated annealing: "crystallizing" solutions
Stochastic optimization techniques are optimization algorithms that utilize some random elements. Unlike deterministic search algorithms, which locate optima by systematically searching the solution space using gradient information (like Newton's gradient method), stochastic methods apply a degree of randomness to the decision-making process. The random nature of stochastic algorithms has discouraged their use in some applications, especially in the area of trajectory optimization, like the Travelling Salesman Problem (TSP), where deterministic algorithms work better. However, the probabilistic elements in stochastic algorithms provide them with a capability not possessed by deterministic methods: the ability to escape from a local minimum, and stability in the presence of noise. Real engineering and scientific optimization problems are often characterized by a highly multimodal search space containing numerous local optima. Under these conditions, deterministic rule-based algorithms have difficulty while stochastic methods persevere. In the next pages we are going to discuss some popular stochastic methods. These methods, like simulated annealing and the genetic algorithm, are inspired by natural systems in physics and biology. In simulated annealing, the process by which atoms in molten metal cool and crystallize into their solid state is modelled. Genetic algorithms implement the Darwinian concept of natural selection to evolve a randomly generated set of mediocre initial guesses into a population of
optimal solutions. Simulated annealing is a technique first developed by Scott Kirkpatrick et al. [Kirkpatrick (1983)]. It can be seen as a generalization of a Monte Carlo method for examining the equations of state and frozen states of n-body systems. At the heart of the simulation lies the Metropolis algorithm [Metropolis (1953)], which models the behavior of large systems of particles in equilibrium at a given temperature. This component is fully responsible for the stochastic nature of the algorithm and gives the algorithm its capability to avoid local minima. The simulated annealing algorithm combines the Metropolis algorithm with a temperature schedule in an attempt to simulate the physical behavior of atoms as they cool from their liquid state into a solid state with a minimum energy. The algorithm derives its name from the metallurgical process of annealing, in which the metal reaches its most stable, minimum-energy crystalline structure. The objective of the annealing process is therefore to cool the metal so slowly that all of its atoms align in the same direction, thus achieving a perfectly ordered crystal and the lowest possible energy state. The crucial link between this simulation and the optimization process is the analogy drawn between the energy of a given configuration and the value of the objective function. The simulation of annealing is conducted by extending the Metropolis algorithm, which is essentially a sequence of random perturbations of the system's parameters: if a perturbation lowers the total energy it is accepted, and if the change in the total energy dE (or objective function) is positive, it is accepted with a probability given by the Boltzmann factor exp(-dE/T). The atomic structure, i.e. the set of problem parameters, is first initialized to a randomly determined state.
The Metropolis algorithm is then run at the initial user-supplied simulated temperature for a number of iterations large enough to consider the system near thermal equilibrium. The system temperature is then decreased following a user-defined temperature schedule, and the Metropolis simulation is conducted at the new temperature. These temperature reductions continue until the simulated temperature of the system has reached absolute zero. Nowadays simulated annealing is one of the most highly regarded, well-understood and widely applied stochastic algorithms. Simulated annealing is able to search for a global optimum and, under certain constraints,
it converges to a globally optimal solution with probability 1. However, this requires that an exponential number of trials be performed, even to approximate an optimal solution arbitrarily closely, and for some problems (e.g. the TSP) it requires more computation than a complete enumeration of the search space! As can be seen from the previous discussion, the conditions which must be in place for the simulated annealing algorithm to ensure an optimal solution greatly constrain the number of problems for which convergence can be guaranteed. While convergence of the algorithm to an optimal solution can statistically be guaranteed in the theoretical realm, application of the algorithm to real-world problems brings to light the impracticality of many of the algorithm's criteria for convergence. Although claims of guaranteed convergence might be somewhat inflated, simulated annealing remains a powerful optimization algorithm for multi-dimensional and multi-modal problems due to the probabilistic hill-climbing capability it possesses. Simulated annealing is thus not a panacea, but rather a powerful alternative in the family of stochastic optimization algorithms.
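The loop described above, Metropolis sweeps at a slowly decreasing temperature, can be sketched as follows. The geometric cooling schedule, the step size and the one-dimensional Rastrigin test function are illustrative choices, not prescriptions from this book:

```python
import math
import random

def simulated_annealing(f, x0, t0=20.0, cooling=0.95, steps_per_t=100,
                        t_min=1e-3, step=0.5, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    while t > t_min:
        # Metropolis sweep at fixed temperature
        for _ in range(steps_per_t):
            xn = x + rng.uniform(-step, step)     # random perturbation
            fn = f(xn)
            dE = fn - fx
            # downhill moves always accepted; uphill moves accepted
            # with Boltzmann probability exp(-dE/T)
            if dE <= 0 or rng.random() < math.exp(-dE / t):
                x, fx = xn, fn
                if fx < fbest:
                    best, fbest = x, fx
        t *= cooling                              # geometric cooling schedule
    return best, fbest

# multimodal 1-D test function with global minimum f(0) = 0
rastrigin = lambda x: x * x - 10.0 * math.cos(2.0 * math.pi * x) + 10.0
x_best, f_best = simulated_annealing(rastrigin, 4.0)
```

At high temperature the chain wanders freely across basins; as T drops, uphill moves become rare and the chain settles into a deep minimum, which is exactly the hill-climbing behavior discussed above.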
2.6 Introduction to genetic algorithms
Now let us discuss the genetic algorithm, which we are going to use as a tool in the following chapters. Genetic algorithms became widely known after they were formally introduced in the 1970s by John Holland at the University of Michigan [Holland (1975); Holland (1978)]. Genetic algorithms belong to an interdisciplinary research field with strong connections to biology, artificial intelligence, and decision support in almost any engineering discipline. The genetic algorithm is based on a model of natural, biological evolution, formulated for the first time by the great British naturalist Charles Darwin (1809-1882) in his 1859 book "The Origin of Species". The Darwinian theory of evolution explains the adaptive change of species by the principle of natural selection, which favors for survival and further evolution those species that are best adapted to their environmental conditions. The genetic algorithm is a subset of the class of stochastic search methods (in the previous section we discussed another stochastic search method, simulated annealing). Whereas most stochastic search methods operate on a single solution to the problem at hand, the genetic algorithm operates on a
population of solutions. In addition, the genetic algorithm exhibits a large degree of parallelism, making it possible to effectively exploit the computing power made available through parallel processing. To use genetic algorithms, one must represent a solution to the problem as a genome (or chromosome). The genetic algorithm then creates an initial population of solutions and applies genetic operators such as mutation and crossover [Sutton (1994)] to evolve the solutions in order to find the best one(s). Let us outline some of the basics of genetic algorithms. The three
Fig. 2.8 A general scheme for the genetic algorithm.
most important aspects of using genetic algorithms are: 1. definition of the objective (fitness) function, 2. definition and implementation of the genetic representation, 3. definition and implementation of the genetic operators.
Once these three steps have been performed, the genetic algorithm should work well. Beyond that one can try many different variations to improve performance, find multiple optima, or parallelize the algorithms.
Fig. 2.9 Mutation operation on a bit string.

Fig. 2.10 Crossover operation between two bit strings.
The genetic algorithm is very simple, yet it performs well on many different types of problems, because of its flexibility. There are many ways
to modify the basic algorithm, and many parameters that can be adjusted. Basically, if one gets the objective function right, the representation right and the operators right, then variations on the genetic algorithm and its parameters will result in only minor improvements. Since the algorithm is separated from the representation of the problem, searches of mixed continuous/discrete variables are just as easy as searches of entirely discrete or entirely continuous variables. One can use different representations for the individual genomes in the genetic algorithm. Holland worked primarily with strings of bits [Holland (1975)], but one can use arrays, trees, lists, or any other object. However, one must define genetic operators (initialization, mutation, crossover, copy (reproduction)) for any representation that one decides to use. One has to remember that each individual must represent a complete solution to the problem one is trying to optimize. Let us now discuss the main genetic operators: copy, crossover and mutation. Since the seminal work of Holland the most common representation for the individual genomes in the genetic algorithm has been a string of bits, because the definition of the genetic operators in this case is very simple. That is why we first explain the definition of the genetic operators using the bit representation. The disadvantage of the bit-string representation is that some of the bits have exponentially bigger weights than others. As a result, a random flip of a high-order bit can change the solution dramatically and place the "offspring" far away from its "parent", with a poor probability of improving the fitness value. A possible alternative is to use a bit string but employ a Gray code, an ordering of the 2^n binary numbers such that only one bit changes from one entry to the next. In this case a small perturbation (mutation) of the higher-order bits will not change the initial number dramatically.
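The binary-reflected Gray code mentioned here can be computed with two short functions (a standard construction, not specific to this book):

```python
def to_gray(n: int) -> int:
    # binary-reflected Gray code of a non-negative integer
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    # invert the encoding by repeatedly XOR-ing in the shifted value
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# adjacent integers differ in exactly one bit of their Gray codes
codes = [to_gray(i) for i in range(8)]   # [0, 1, 3, 2, 6, 7, 5, 4]
```

A single-bit mutation of a Gray-coded genome therefore always moves the decoded value to a "neighboring" code word rather than making an arbitrarily large jump.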
However, the Gray codes for 4 or more bits are not unique, even allowing for permutation or inversion of bits, and they also require algorithms for coding and decoding. The copy or reproduction operator merely transfers the information of the "parent" to an individual of the next generation without any changes. The mutation operator introduces a certain amount of randomness to the search. It can help the search find solutions that crossover alone might not encounter. Usually the mutation represents the application of the logical "NOT" operation to a single bit of a "gene" at a random position, see Fig. 2.9. Typically the crossover operation is defined so that two individuals (the
parents) combine to produce two more individuals (the "children"). The primary purpose of the crossover operator is to pass "genetic material" from the previous generation to the subsequent generation. In a simple crossover, a random position is chosen at which each partner in a particular pair is divided into two pieces. Each "parent" then exchanges a subsection of itself with its partner (see Fig. 2.10). Note that application of the crossover operation between identical "parents" leads to the same "children". There are different implementations of the schedule of the genetic operations. Two of the most common genetic algorithm implementations are "simple" and "steady state". The simple genetic algorithm is described by Goldberg [Goldberg (1989)]. It is a generational algorithm in which the entire population is replaced each generation. In the steady-state genetic algorithm only a few individuals are replaced each "generation". This type of replacement is often referred to as "overlapping populations". Often the objective scores must be transformed in order to help the genetic algorithm maintain diversity or differentiate between very similar individuals. The transformation from raw objective scores to scaled fitness scores is called scaling. There are many different scaling algorithms. Some of the most common are linear (fitness-proportionate) scaling, sigma truncation scaling, and sharing. Linear scaling transforms the objective score based on a linear relationship using the maximum and minimum scores in the population as the transformation metric. Sigma truncation scaling uses the population's standard deviation to do a similar transformation, but it truncates the poor performers to zero. Sharing derates the score of individuals that are similar to other individuals in the population. For a complete description of each of these methods, see Goldberg's book [Goldberg (1989)]. The selection method determines how individuals are chosen for mating.
If one uses a selection method that picks only the best individual, then the population will quickly converge to that individual. So the selector should be biased toward better individuals, but should also pick some that are not quite as good (but hopefully contain some good "genetic material"). Some of the most common methods include "roulette wheel selection" (the likelihood of picking an individual is proportional to the individual's score), "tournament selection" (a number of individuals are picked using roulette wheel selection, then the best of these are chosen for mating), and "rank selection" (pick the best individuals every time). Sometimes the crossover operator and selection method lead to a fast convergence to a population of individuals that are almost exactly the
same. When the population consists of similar individuals, the likelihood of finding new solutions typically decreases. On one hand, it is desirable that the genetic algorithm finds "good" individuals, but on the other hand it needs to maintain diversity. In general, the genetic algorithm is much more robust than other search methods in the case of a noisy environment, and/or if the search space has many local optima. The genetic search method has recently been applied to a wide variety of optimization problems in science, for example, to optimize the atomic structures of small clusters [Judson (1992); Deaven (1995); Michaelian (1998); Garzon (1998)]. In these works the global minimum of the cohesive energy was obtained for different cluster species using Lennard-Jones potentials [Judson (1992)], ionic potentials [Michaelian (1998)], or interaction potentials derived from the tight-binding Hamiltonian [Deaven (1995); Garzon (1998)]. Especially successful applications of the genetic algorithm were performed in optimal control theory [Vajda (2001)]. As we have mentioned, the genetic algorithm can be used for both discrete and continuous optimization. However, if one is going to apply a "classical" version of the genetic algorithm to search for an optimal continuous and differentiable function, one has to take additional care about the smoothness and differentiability of the obtained solutions. The direct application of the mutation and crossover rules generates "children" with discontinuities at the positions of the crossover or mutation operations, and therefore such solutions do not belong to the class of our interest. In the next section we present an extension of the genetic algorithm that searches for optimal solutions in the class of smooth functions. As applications of this new technique we will consider the search for the ground state function of a given system's Hamiltonian, and for an optimal shape of the electric field to control a nanoscale device.
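A minimal generational GA with bit-string genomes, tournament selection, single-point crossover and bit-flip mutation, as described in this section, might look like the following sketch; the "one-max" objective (count the 1-bits) and all parameter values are illustrative choices, and one elite individual is carried over unchanged each generation:

```python
import random

rng = random.Random(1)
GENOME_LEN, POP_SIZE, GENERATIONS = 20, 40, 100
P_MUT, P_CROSS = 0.05, 0.7

def fitness(g):
    # toy objective ("one-max"): number of 1-bits in the genome
    return sum(g)

def tournament(pop):
    # pick the better of two randomly chosen individuals
    a, b = rng.choice(pop), rng.choice(pop)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # single-point crossover: exchange tails at a random cut position
    cut = rng.randrange(1, GENOME_LEN)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(g):
    # flip each bit independently with probability P_MUT
    return [bit ^ 1 if rng.random() < P_MUT else bit for bit in g]

pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    elite = max(pop, key=fitness)            # elitism: keep the best as-is
    nxt = [elite]
    while len(nxt) < POP_SIZE:
        p1, p2 = tournament(pop), tournament(pop)
        c1, c2 = crossover(p1, p2) if rng.random() < P_CROSS else (p1[:], p2[:])
        nxt += [mutate(c1), mutate(c2)]
    pop = nxt[:POP_SIZE]

best = max(pop, key=fitness)
```

With elitism the best fitness is non-decreasing over generations, so on this toy problem the population converges to (nearly) all-ones genomes.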
2.7 GA for a class of smooth (differentiable) functions
The following extension of the GA was originally developed to demonstrate how to obtain ground state wave-functions of a quantum system confined in an external potential. Although this type of optimization problem belongs to the class of quadratic optimization, and the GA is definitely not the fastest way to solve such problems, it can easily be applied to optimal control problems to obtain realistic solutions (continuous and with
limited absolute value of the local gradient, which corresponds to a finite time resolution of the control field). In the case of a few-body interacting quantum system treated in the mean-field approximation (using Hartree-Fock or Density Functional theory), the corresponding minimization problem becomes nonlinear and the GA can serve better, because it avoids local minima more easily than gradient-based methods. Since this method was originally applied to quantum systems, we called it the Quantum Genetic Algorithm (QGA). Let us start with the description of a quantum system. We assume H to be the Hermitian Hamiltonian operator of an N-body quantum mechanical system:

H = H_kin + H_pot + H_int,   (2.16)

where (throughout the section we use atomic units ℏ = m = e = 1)

H_kin = -(1/2) \sum_{i=1}^{N} \nabla_i^2,
H_pot = \sum_{i=1}^{N} U(x_i),   (2.17)
H_int = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} V(x_i - x_j).

The operators H_kin, H_pot, H_int refer to the kinetic, potential and interaction energy. Let us first consider the quantum mechanical ground state problem for a system described by the Hamiltonian Eq. (2.16). Let Ψ(x_1, x_2, ..., x_N) be an arbitrary N-body wavefunction. We assume that Ψ is normalized, ⟨Ψ|Ψ⟩ = 1. One can then write an inequality for the ground state energy E_0:

E_0 \le ⟨Ψ|H|Ψ⟩.   (2.18)
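For a single particle in one dimension on a real-space grid, the fitness ⟨Ψ|H|Ψ⟩ of Eq. (2.18) can be evaluated by finite differences. The sketch below is an illustration (grid size and the infinite-square-well test case are arbitrary choices):

```python
import numpy as np

a, b, L_pts = 0.0, 1.0, 400          # box boundaries and number of mesh points
x = np.linspace(a, b, L_pts)
h = x[1] - x[0]

def inner(f, g):
    # grid approximation of the overlap integral <f|g>
    return h * np.sum(f * g)

def energy(psi, U):
    # Rayleigh quotient <psi|H|psi> for one particle in one dimension;
    # kinetic term via a second-order finite-difference Laplacian,
    # psi is assumed to vanish at the box walls
    psi = psi / np.sqrt(inner(psi, psi))
    lap = np.zeros_like(psi)
    lap[1:-1] = (psi[2:] - 2.0 * psi[1:-1] + psi[:-2]) / h**2
    return inner(psi, -0.5 * lap) + inner(psi, U * psi)

# sanity check: for the infinite square well (U = 0) the exact ground
# state is sin(pi x), with E0 = pi^2/2 in atomic units
E = energy(np.sin(np.pi * x), np.zeros_like(x))
```

Any other trial function satisfying the boundary conditions gives a larger value, which is exactly the variational inequality (2.18) that the GA exploits as a fitness.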
Starting with a population of trial wavefunctions one can run the evolutionary procedure (GA) until the global minimum of the energy functional given by Eq. (2.18) is attained. For simplicity let us consider first the ground state problem for one particle in one dimension. In our approach a wavefunction Ψ(x) is discretized
on the mesh {x_i} in real space, i = 1, ..., L, where L is the number of discretization points, and represented by the "genetic code" vector Ψ_i = Ψ(x_i) (see Fig. 2.11).
Fig. 2.11 Representation of the discretized in real space wavefunction Ψ(x) as a genetic code vector.
As we have mentioned before, there are different ways to describe the evolution of the population and the creation of the offspring. The genetic algorithm we propose to obtain the ground state of a quantum system can be described as follows:

(i) We create a random initial population {Ψ_j^(0)(x)} consisting of N_pop trial wave functions.

(ii) The fitness E[Ψ_j^(0)] of all individuals is determined.

(iii) A new population {Ψ_j^(1)(x)} is created through application of the genetic operators.

(iv) The fitness of the new generation is evaluated; the new generation replaces the old one.

(v) Steps (iii) and (iv) are repeated for the successive generations {Ψ_j^(n)(x)} until convergence is achieved and the ground-state wave function is found.
Fig. 2.12 Two randomly chosen wavefunctions for the crossover operation. The vertical dashed line shows the position of the crossover.
Fig. 2.13 Result of the direct application of the "classical" crossover operation. Note the discontinuity of the function Ψ at the position of the crossover operation, which leads to extremely high kinetic energy of the "offspring".
Usually, the real space calculations deal with boundary conditions on a
box. Therefore, in order to describe a wave function within a given one-dimensional interval a ≤ x ≤ b we have to choose boundary conditions for Ψ(a) and Ψ(b). For simplicity we set Ψ(a) = Ψ(b) = 0, i.e., we consider a well with infinite walls at x = a and x = b. However, one can also employ, for example, periodic boundary conditions for a "ring" system. Inside the box one can simulate different kinds of external potentials, and if the size of the box is large enough, boundary effects on the results of our calculations can be minimized. As the initial population of wave functions satisfying the boundary conditions Ψ_j(a) = Ψ_j(b) = 0 we choose Gaussian-like functions of the form

Ψ_j(x) = A_j exp(−(x − x_j)²/σ_j)(x − a)(b − x),  j = 1, …, N_pop,  (2.19)

with random values for the peak position x_j ∈ [a, b] and the width σ_j ∈ (0, b − a], whereas the amplitude A_j is determined from the normalization condition ∫|Ψ_j(x)|² dx = 1 for given values of x_j and σ_j. One
Fig. 2.14 Example of a smooth step function St(x) used in the "uncertain" crossover operation (see text).
can significantly reduce computational costs by choosing a wise guess for the initial wavefunctions. If any approximate form of the solution is known, one can generate the random initial population "near" this solution. After a few iterations successful offspring will converge to an improved solution. It is also useful to generate an initial population with symmetry properties
reflecting the symmetry of the Hamiltonian. However, we shall show that the QGA successfully finds solutions starting from a random population defined by Eq. (2.19).
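On a discretized interval, Eq. (2.19) translates directly into code. The short sketch below (the grid size and the random seed are arbitrary choices, not from the text) builds a normalized random population satisfying Ψ_j(a) = Ψ_j(b) = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, n = 0.0, 1.0, 201
x = np.linspace(a, b, n)
dx = x[1] - x[0]

def initial_population(n_pop):
    """Gaussian-like trial wavefunctions of Eq. (2.19) on a grid.

    The factor (x - a)(b - x) enforces Psi(a) = Psi(b) = 0; the
    amplitude is fixed afterwards by the normalization integral.
    """
    pop = []
    for _ in range(n_pop):
        xj = rng.uniform(a, b)            # random peak position in [a, b]
        sj = rng.uniform(1e-3, b - a)     # random width in (0, b - a]
        psi = np.exp(-(x - xj) ** 2 / sj) * (x - a) * (b - x)
        psi /= np.sqrt((psi ** 2).sum() * dx)   # normalize on the grid
        pop.append(psi)
    return np.array(pop)

pop = initial_population(10)
```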
Fig. 2.15 Result of the application of the "smooth" crossover operation. The vertical dashed line shows the position of the crossover operation.
As we mentioned above, we should define three kinds of operations on the individuals: the copy, the mutation of a wavefunction, and the crossover between two wavefunctions (see Fig. 2.12). While the copy operation has the same meaning as in previous applications of the GA, the crossover and mutation operations have to be redefined for the quantum mechanical case. The reason is that after a straightforward application of the crossover operation between two "parents" one unavoidably obtains "children" with a discontinuity at the position of the crossover. This means that the "offspring" have infinite (in practice, very large) kinetic energy and therefore cannot be considered good candidates for the ground state wavefunction (see Fig. 2.13). To avoid this problem we suggested a new modification of the genetic operations applicable to smooth and differentiable wavefunctions. The smooth or "uncertain" crossover is defined as follows. Let us take two randomly chosen "parent" functions Ψ₁^(s)(x) and Ψ₂^(s)(x) (see Fig. 2.12). We can construct two new functions Ψ₁^(s+1)(x), Ψ₂^(s+1)(x) as

Ψ₁^(s+1)(x) = Ψ₁^(s)(x) St(x) + Ψ₂^(s)(x) (1 − St(x)),
Ψ₂^(s+1)(x) = Ψ₂^(s)(x) St(x) + Ψ₁^(s)(x) (1 − St(x)),  (2.20)
Numerical optimization
55
where St(x) is a smooth step function involved in the crossover operation. We consider St(x) = (1 + tanh((x − x₀)/k_c))/2, where x₀ is chosen randomly (x₀ ∈ (a, b)) and k_c is a parameter which allows one to control the sharpness of the crossover operation. An example of the function St(x) is shown in Fig. 2.14. The result of the smooth crossover is presented in Fig. 2.15. In the limit k_c → 0 one obtains the usual Heaviside step function St(x) = θ(x − x₀), and the transformation Eq. (2.20) becomes the "classical" crossover operation. Note that the crossover operation does not violate the boundary conditions, and application of the crossover between identical wavefunctions generates the same wavefunctions. The mutation operation in the quantum case must also take care of the smoothness of the generated "offspring". In the "classical" GA it is not possible to change randomly the value of the wave function at a given point without producing dramatic changes in the kinetic energy of the state. To avoid this problem we define a new mutation operation as

Ψ^(s+1)(x) = Ψ^(s)(x) + Ψ_r(x),  (2.21)

where Ψ_r(x) is a random mutation function. For simplicity we choose Ψ_r(x) as a Gaussian-like function Ψ_r(x) = B exp(−(x_r − x)²/k_m)(x − a)(b − x) with a random center x_r ∈ (a, b), width k_m ∈ (0, b − a), and a small amplitude B that can be either positive or negative. Note that the mutation so defined also does not violate the boundary conditions. In order to find the ground state, for each step of the QGA iteration we randomly perform copy, crossover and mutation operations. After each application of a genetic operation (except copying) the newly created functions are normalized. It is very easy to extend the quantum genetic algorithm to treat quantum systems of a few interacting particles in two dimensions. We perform calculations on a finite region Ω = {(x, y), 0 ≤ x ≤ d, 0 ≤ y ≤ d}, where we discretize the real space. We assume again that the external potential outside Ω is infinitely high. For a simplified study let us consider a two-particle system described by a wave function Ψ_HF(r₁, r₂), with r = (x, y), which is the Slater determinant consisting of orthogonal and normalized one-particle wave functions ψ_ν(r), ν = 1, 2. This means that the optimized Ψ_HF will represent the exact ground-state wave function for the case of noninteracting particles, whereas for the interacting case Ψ_HF will correspond to the Hartree-Fock
approximation to the ground-state wave function. As in the one-dimensional case, an initial population of trial two-body wave functions {Ψ_i}, i = 1, …, N_pop is chosen randomly. For this purpose, we construct each Ψ_i using Gaussian-like one-particle wave functions of the form

ψ_ν(x, y) = A_ν exp(−(x − x_ν)²/σ²_X,ν − (y − y_ν)²/σ²_Y,ν) x(d − x) y(d − y),  (2.22)

with ν = 1, 2 and random values of x_ν, y_ν and of σ_X,ν, σ_Y,ν for each wave function. The amplitude A_ν is calculated from the normalization condition ∫∫|ψ_ν(x, y)|² dx dy = 1, and its sign is chosen randomly. Note that the wave functions ψ_ν(x, y) defined in this way fulfill the zero condition on the boundary ∂Ω,

ψ_ν(x, y)|_∂Ω = 0.  (2.23)
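A discretized version of Eq. (2.22) is straightforward; the sketch below (the grid resolution and seed are illustrative, not the text's 100 × 100 lattice) produces one normalized random one-particle wave function obeying the zero boundary condition (2.23):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 1.0, 80
x = np.linspace(0.0, d, n)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")

def trial_2d():
    """One-particle trial wavefunction of Eq. (2.22): a random Gaussian
    times x(d-x)y(d-y), which enforces the zero boundary condition (2.23)."""
    xv, yv = rng.uniform(0, d, size=2)       # random center
    sx, sy = rng.uniform(0.05, d, size=2)    # random widths
    psi = np.exp(-(X - xv) ** 2 / sx ** 2 - (Y - yv) ** 2 / sy ** 2)
    psi *= X * (d - X) * Y * (d - Y)         # zero on the boundary
    psi *= rng.choice([-1.0, 1.0])           # random sign of the amplitude
    return psi / np.sqrt((psi ** 2).sum() * dx * dx)   # normalize

psi = trial_2d()
```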
The initial population {Ψ_i} so constructed corresponds to the initial generation. Now, the fitness of each individual Ψ_i of the population is determined by evaluating the energy functional

E_i = E[Ψ_i] = ∫_Ω Ψ_i*(r₁, r₂) H(r₁, r₂) Ψ_i(r₁, r₂) dr₁ dr₂,  (2.24)

where H is the Hamiltonian of the corresponding problem. This means that the expectation value of the energy for a given individual is a measure of its fitness, and we apply the QGA to minimize the energy. By virtue of the variational principle, when the QGA finds the global minimum, it corresponds to the ground state of H. Now we define the smooth crossover in two dimensions. Given two randomly chosen single-particle "parent" functions ψ_{i,μ}^(old)(x, y) and ψ_{l,ν}^(old)(x, y) (i, l = 1, …, N_pop; μ, ν = 1, 2), one can construct two new functions ψ₁^(new)(x, y), ψ₂^(new)(x, y) as

ψ₁^(new)(x, y) = ψ₁^(old)(x, y) St(x, y) + ψ₂^(old)(x, y) (1 − St(x, y)),
ψ₂^(new)(x, y) = ψ₂^(old)(x, y) St(x, y) + ψ₁^(old)(x, y) (1 − St(x, y)),  (2.25)

where St(x, y) is a 2D smooth step function which produces the crossover operation. We define St(x, y) = (1 + tanh((ax + by + c)/k_c))/2, where a, b, c are chosen randomly, so that the line ax + by + c = 0 cuts Ω into two pieces, and k_c is a parameter which allows one to control the sharpness of the crossover operation.
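The 2D crossover of Eq. (2.25) can be sketched as follows (grid size, k_c, and the way the random line is drawn through the interior are illustrative assumptions). Two properties stated in the text are easy to check: crossover between identical parents reproduces them, and the zero boundary condition is inherited by the children.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 1.0, 64
x = np.linspace(0.0, d, n)
X, Y = np.meshgrid(x, x, indexing="ij")

def crossover_2d(p1, p2, kc=0.05):
    """Smooth 2D crossover of Eq. (2.25): mix two parents across a random
    line a x + b y + c = 0 via St(x,y) = (1 + tanh((a x + b y + c)/kc))/2."""
    a, b = rng.normal(size=2)
    x0, y0 = rng.uniform(0.2 * d, 0.8 * d, size=2)  # line through interior
    c = -(a * x0 + b * y0)
    St = 0.5 * (1.0 + np.tanh((a * X + b * Y + c) / kc))
    return p1 * St + p2 * (1 - St), p2 * St + p1 * (1 - St)

# Crossover between identical parents must reproduce the parent exactly,
# since p * St + p * (1 - St) = p at every grid point.
parent = X * (d - X) * Y * (d - Y)
child1, child2 = crossover_2d(parent, parent)
```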
In the same manner we modify the mutation operation for a random "parent" ψ^(old)(x, y) as

ψ^(new)(x, y) = ψ^(old)(x, y) + ψ_r(x, y),  (2.26)

where ψ_r(x, y) is a random mutation function. We choose ψ_r(x, y) as a Gaussian-like function ψ_r(x, y) = A_r exp(−(x_r − x)²/R²_x − (y_r − y)²/R²_y) x(d − x) y(d − y) with random values of x_r, y_r, R_x, R_y and A_r. Note that application of the mutation operation defined above does not violate the boundary conditions. As discussed before, for each iteration of the QGA procedure we randomly perform copy, crossover and mutation operations. After each application of a genetic operation the newly created functions should be normalized and orthogonalized. Then the fitness of the individuals is evaluated and the fittest individuals are selected. The procedure is repeated until convergence of the fitness function (the energy of the system) to a minimal value is reached. Inside the box Ω one can simulate different kinds of external potentials. If the size of the box is large enough, boundary effects become negligible. Concerning our choice of the GA parameters, for the following examples we have used P_m = 0.015 for the probability of a mutation and P_c = 0.485 for the probability of a crossover operation. In the remaining cases we perform the copy operation. During our calculations we set different sizes of the population, up to N_pop = 1000. However, a population size of only 200 "parents" usually guarantees good convergence of the algorithm.
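The three 1D genetic operators and the probabilistic choice between them can be collected in a short runnable sketch (the grid, k_c, mutation amplitude, and the seed are illustrative; P_m and P_c follow the values quoted above):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, n = 0.0, 1.0, 201
x = np.linspace(a, b, n)
dx = x[1] - x[0]

def normalize(psi):
    return psi / np.sqrt((psi ** 2).sum() * dx)

def crossover(p1, p2, kc=0.02):
    """Smooth crossover, Eq. (2.20), with St(x) = (1 + tanh((x - x0)/kc))/2."""
    x0 = rng.uniform(a, b)
    St = 0.5 * (1.0 + np.tanh((x - x0) / kc))
    return p1 * St + p2 * (1 - St), p2 * St + p1 * (1 - St)

def mutate(psi, B=0.05):
    """Smooth mutation, Eq. (2.21): add a small Gaussian-like bump that
    vanishes on the boundary thanks to the (x - a)(b - x) factor."""
    xr = rng.uniform(a, b)
    km = rng.uniform(1e-3, b - a)
    bump = B * rng.choice([-1.0, 1.0]) * np.exp(-(xr - x) ** 2 / km)
    return psi + bump * (x - a) * (b - x)

def step(population, Pm=0.015, Pc=0.485):
    """One generation: choose copy / crossover / mutation at random."""
    out = []
    while len(out) < len(population):
        r = rng.uniform()
        i, j = rng.integers(0, len(population), size=2)
        if r < Pm:
            out.append(normalize(mutate(population[i])))
        elif r < Pm + Pc:
            c1, c2 = crossover(population[i], population[j])
            out.extend([normalize(c1), normalize(c2)])
        else:
            out.append(population[i])          # copy
    return out[: len(population)]

psi = normalize(np.sin(np.pi * x))
same1, same2 = crossover(psi, psi)             # identity on equal parents
```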
2.8 Application of the GA to the eigenproblem
In this section we present results of ground- and excited-state calculations for interacting particles in a confined quantum system (such as a quantum dot) using the Quantum Genetic Algorithm. First we perform calculations of the ground state for different simple one- and two-dimensional systems and compare the obtained results with known analytical solutions. This serves as a good test for the method developed in this work. Then we compute the partition function and the excitation spectra of strongly interacting few-body systems. With the help of the QGA we investigate the formation of the "Wigner molecule" in systems of a few confined electrons. We also investigate two different mechanisms for the so-called "melting of the Wigner molecule", namely melting due to thermal and due to quantum fluctuations.
2.8.1 The ground state problem in one and two dimensions

In order to test the QGA, we first apply it to calculate the ground state wave function Ψ(x) in the case of different external potentials in one and two dimensions. For each iteration of the QGA we evaluate the fitness function for the different individuals of the population, E_j = E[Ψ_j] = ⟨Ψ_j|H|Ψ_j⟩, and then follow the steps described above. This process is repeated until the values of the fitness function converge to the minimal value of the energy. In the figures presented below we show the results for the probability density of the ground state and the behavior of the fitness function during the iterative GA procedure. Let us start with the ground state problem for one particle captured in the region [0, L] of an infinite square well. The analytical solution gives the lowest energy state with energy E = π²/(2L²), corresponding to the ground state wavefunction

Ψ(x) = √(2/L) sin(πx/L).  (2.27)
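To make the procedure concrete, here is a compact, self-contained QGA for the infinite-well ground state. This is a simplified toy version, not the author's code: a smaller population, a fixed crossover sharpness, and a higher mutation rate are assumed for speed, and the kinetic energy uses a first-order finite difference. The continuum answer is E = π²/2 ≈ 4.9348.

```python
import numpy as np

rng = np.random.default_rng(4)
L, n = 1.0, 101
x = np.linspace(0.0, L, n)
dx = x[1] - x[0]

def energy(psi):
    """<psi|H|psi> for V = 0 inside the well (atomic units):
    kinetic energy via a finite-difference gradient."""
    psi = psi / np.sqrt((psi ** 2).sum() * dx)
    grad = np.diff(psi) / dx
    return 0.5 * (grad ** 2).sum() * dx

def trial():
    """Random Gaussian-like trial of the Eq. (2.19) form."""
    xj, sj = rng.uniform(0, L), rng.uniform(0.05, L)
    return np.exp(-(x - xj) ** 2 / sj) * x * (L - x)

def qga(n_pop=40, n_gen=150):
    pop = [trial() for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=energy)                        # selection with elitism
        parents = pop[: n_pop // 2]
        children = []
        for _k in range(n_pop - len(parents)):
            p1, p2 = (parents[i] for i in rng.integers(0, len(parents), 2))
            x0 = rng.uniform(0, L)                  # smooth crossover
            St = 0.5 * (1 + np.tanh((x - x0) / 0.02))
            c = p1 * St + p2 * (1 - St)
            if rng.uniform() < 0.3:                 # smooth mutation
                xr, km = rng.uniform(0, L), rng.uniform(0.01, L)
                c = c + 0.1 * rng.normal() * np.exp(-(xr - x) ** 2 / km) * x * (L - x)
            children.append(c)
        pop = parents + children
    return min(energy(p) for p in pop)

E0 = qga()   # exact continuum value: pi**2 / 2 = 4.9348...
```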
In Fig. 2.16 we show the calculated ground state particle density |Ψ(x)|² for a potential well with infinite walls at x = 0 and x = 1 (throughout this section we use atomic units ħ = e = m = 1). In the inset of Fig. 2.16 we show the evolution of the mean energy of the population, defined using the calculated energies of all population members. It is clear that the QGA converges rapidly to the ground state. The ground-state energy calculated using our method is very close to the exact value E = π²/2 = 4.9348…, up to an error of 10⁻⁵% already after 20 iterations. We also performed calculations for another analytically solvable problem, namely the harmonic potential U(x) = ½ω²(x − 0.5)². In this case the ground state energy is E = ω/2, and the ground state wavefunction is given by

Ψ(x) = (ω/π)^{1/4} exp(−ω(x − 0.5)²/2).  (2.28)

In Fig. 2.17 the calculated ground state density is shown for ω = 2√10 · 10². The value of ω is chosen to be rather large, so that one can neglect the influence of the walls on the final result. In the inset of Fig. 2.17 we show the evolution of the mean energy of the population. It converges more slowly than in the case of the infinite well, because the QGA has to find a rather localized solution. However, it converges after only 30 iterations. For the ground-state energy the QGA yields E^QGA = 316.29, while the analytical
Fig. 2.16 Ground state spatial density distribution of an electron |Ψ(x)|² in a one-dimensional infinite well (defined on the interval [0,1]) calculated using the QGA. Inset figure: evolution of the fitness as a function of the number of iterations.
result is E = 316.22, which represents a discrepancy of less than 0.02% after 30 iterations. In order to check whether the QGA finds a solution in more complicated cases, for example when the spatial distribution of the electron density has not one but two maxima, we performed calculations for the case of an anharmonic potential with two distinct minima. In Fig. 2.18 we present the calculated ground state density for an anharmonic potential of the form

U(x) = k₀ − k₂x² + k₃x³ + k₄x⁴,  (2.29)

with k₀ = −137.7074997, k₂ = 7, k₃ = 0.5, and k₄ = 1. We use these values of the parameters in order to compare with existing calculations performed using the spectral method [Feit (1982)]. In the inset of Fig. 2.18 we also show the evolution of the mean energy of the population. It converges more slowly than in the previous two cases. The reason is that the QGA operates in a more complicated space of possible solutions. Nevertheless, the algorithm converges after 200 iterations. Our calculated ground-state energy is E^QGA = −144.87, whereas the value obtained by using the spectral
Fig. 2.17 Calculated spatial distribution of the electron density |Ψ(x)|² (solid line) for an electron in a 1D harmonic potential (dotted line). The potential is re-scaled. The inset shows the evolution of the fitness as a function of the number of iterations.
method is E = −144.96, i.e., the discrepancy is less than 0.06% after 200 iterations. Our next example deals with the ground state of an electron subject to a 1D potential produced by a chain of 5 positively charged ions, which is given by

U(x) = −Σ_{i=1}^{5} Q/√((x − X_i)² + δ),  (2.30)

where Q is the charge of each ion and X_i its position; here δ is a cutoff parameter for the Coulomb potential in 1D. This smooth 1D ionic potential has been used, for instance, in the context of the Coulomb explosion of small clusters induced by intense femtosecond laser pulses [Garcia (2000)]. In the QGA calculations for this potential, and in order to speed up the convergence process, we use for the initial populations trial functions of the form

Ψ_j(x) = Σ_{i=1}^{5} A_j^i exp(−(x − x_j^i)²/σ_j^i)(x − a)(b − x),  (2.31)
Fig. 2.18 Calculated spatial distribution of the electron density |Ψ(x)|² (solid line) for an electron in an anharmonic potential of the fourth order (dotted line). The inset shows the evolution of the fitness as a function of the number of iterations.
having 5 peaks, where the amplitudes A_j^i, the widths 0 < σ_j^i < b − a and the peak positions x_j^i ∈ (a, b) are random numbers. In our calculations we have used Q = 5, a = 0, b = 50, X_i = 13, 19, 25, 31, 37, and δ = 0.25. Note that the calculated probability distribution, shown in Fig. 2.19, has the same symmetry properties as the external potential U(x). In the inset of Fig. 2.19 we plot the evolution of the average fitness of the population. It is clear that using an appropriate initial guess for the form of the wave function can considerably accelerate the convergence of the QGA. In this example the genetic process converges after only 20 iterations. Now we study the simplest case of the few-body problem. For simplicity we consider first a system of two noninteracting fermions trapped in a harmonic potential V(x) = ½ω²(x − 0.5)². We assume that the wavefunction of the electrons has a symmetric spin part, and therefore we search for antisymmetric spatial solutions. The triplet-state wave function of two noninteracting electrons having the lowest energy can be written as Ψ(x, x′) = [φ₁(x)φ₂(x′) − φ₂(x)φ₁(x′)]/√2, where φ₁(x) and φ₂(x) are the ground state and the first excited state of the single-particle Hamiltonian, respectively.
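As a quick numerical check of this construction (ω and the grid below are stand-in values, not the text's ω = √20 · 10²), one can build the antisymmetric combination explicitly from the two lowest harmonic-oscillator orbitals and verify Ψ(x, x′) = −Ψ(x′, x):

```python
import numpy as np

# Triplet (spatially antisymmetric) two-electron state built from the two
# lowest orbitals of a 1D harmonic trap centered at x = 0.5.
# omega and the grid are stand-in values chosen for a quick check.
w = 2000.0
x = np.linspace(0.3, 0.7, 201)
dx = x[1] - x[0]

phi1 = (w / np.pi) ** 0.25 * np.exp(-w * (x - 0.5) ** 2 / 2)   # ground state
phi2 = np.sqrt(2 * w) * (x - 0.5) * phi1                       # first excited

# Psi(x, x') = [phi1(x) phi2(x') - phi2(x) phi1(x')] / sqrt(2)
Psi = (np.outer(phi1, phi2) - np.outer(phi2, phi1)) / np.sqrt(2)
```

The diagonal Ψ(x, x) vanishes identically, i.e. the two fermions are never found at the same point, as required by the Pauli principle.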
Fig. 2.19 Calculated spatial distribution of the electron density |Ψ(x)|² (solid line) for an electron in a potential produced by a chain of positive ions (dotted line). The inset shows the convergence behavior of the fitness function.
With the help of the QGA we have determined φ₁(x) and φ₂(x). In Fig. 2.20 the densities |φ₁(x)|² and |φ₂(x)|² for this case are shown, using ω = √20 · 10². Note that this procedure yields both the two-particle triplet state with the lowest energy and the first two single-particle states of the single-particle Hamiltonian. For the ground-state energy the QGA yields E^QGA = 894.90, while the analytical result is E = 2ω = 894.43. Let us now test the QGA by determining the ground state wavefunction in two dimensions under different external potentials. The region of discretization Ω is chosen as follows. For practical purposes we consider a rectangular box Ω = {(x, y), 0 ≤ x ≤ d, 0 ≤ y ≤ d} in two dimensions. The value of the length d is 1 atomic unit. We also assume that the external potential outside Ω is infinitely high. In all 2D examples presented here we use a lattice with 100 by 100 grid points. To calculate the kinetic energy term in the Hamiltonian given by Eq. (2.16) we use a high-order finite-difference formula [Bickley (1941)].
Fig. 2.20 Calculated densities |φ₁(x)|² (solid line) and |φ₂(x)|² (dotted line) of the two orbitals which build the first triplet-state wave function for two noninteracting electrons in a 1D harmonic potential (dashed line). The convergence behavior of the fitness function is shown in the inset.
In our first example we perform the evaluation of the ground state for an electron confined in the infinite well. The ground state is given by Ψ(x, y) = 2 sin(πx) sin(πy) (see Fig. 2.21). We found excellent agreement with the analytical solution. In our second example we perform the evaluation of the ground state for two noninteracting particles (in a triplet state) confined in the infinite well. The ground state of this system is degenerate, and the wave functions corresponding to the different degenerate states are antisymmetric. One possible solution is given by

Ψ(x₁, y₁, x₂, y₂) = 4[sin(πx₁) sin(πy₁) sin(2πx₂) sin(πy₂) − sin(πx₂) sin(πy₂) sin(2πx₁) sin(πy₁)].  (2.32)
The QGA procedure converges rapidly to a solution having the same symmetry as the function given by Eq. (2.32). In Fig. 2.22 we show the ground state spatial density ρ_QGA(r) = ∫|Ψ_QGA(r, r′)|² dr′ obtained from the QGA. The overall shape of the solution and its symmetry are in good agreement with the exact result. The calculated value of the ground energy (E^QGA = 34.543619) is also in very good agreement with the analytical result (E = 7π²/2 = 34.543615), the relative error being less than 10⁻⁵%.

Fig. 2.21 Ground state spatial density distribution of an electron |Ψ(x, y)|² in an infinite well calculated using the QGA.

In the next example we determine the ground state of two noninteracting particles (in a triplet state) in a 2D harmonic potential described by

U(x, y) = ½ω²((x − 0.5)² + (y − 0.5)²),  (2.33)
Fig. 2.22 Density distribution ρ_QGA(x, y) for the ground state of two noninteracting fermionic particles (triplet state) in a square infinite well.
using ω = 10². The analytical solution for one of the degenerate triplet states reads

Ψ(x₁, y₁, x₂, y₂) = (ω^{3/2}/π) exp(−(ω/2) Σ_{i=1,2} ((x_i − 0.5)² + (y_i − 0.5)²)) (x₂ − x₁).  (2.34)

In Fig. 2.23 we present the ground state density ρ_QGA(x, y) for this problem. In this case there is also good agreement between the result obtained from the QGA and the exact analytical solution. The calculated value of the ground-state energy is E^QGA = 300.0024, which compares well with the exact one (E = 3ω = 300).
Fig. 2.23 Density distribution ρ_QGA(x, y) for the ground state of two noninteracting fermionic particles (triplet state) in an external harmonic potential.
2.8.2 Extension of the QGA to quantum statistical problems

We now discuss a possible generalization of the QGA to quantum statistical problems. In order to compute not only the ground state but also excited states of a few-body quantum system, one needs only a small modification of the QGA. For this purpose we use a variational formulation for the partition function Z of a many-body quantum system. Let {Ψ_k} be an arbitrary orthonormal set of N-body wave functions (Ψ_k = Ψ_k(x₁, …, x_N)). Note that we count the eigenstates in such a way that Ψ₁ corresponds to the ground state. It can be shown that the partition function Z of the quantum system satisfies the following inequality [Peierls (1936)]:

Z ≥ Z′ = Σ_k exp(−β⟨Ψ_k|H|Ψ_k⟩),  (2.35)
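Inequality (2.35) is easy to verify numerically for a toy Hamiltonian. In the sketch below (the 3 × 3 matrix and β are arbitrary stand-ins), Z′ computed from a random complete orthonormal set never exceeds Z, and equals Z when {Ψ_k} is the eigenbasis:

```python
import numpy as np

rng = np.random.default_rng(6)
beta = 0.7

# A small Hermitian stand-in "Hamiltonian" and two complete orthonormal
# sets: a random one (columns of Q) and the true eigenbasis (columns of V).
H = np.array([[0.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 2.5]])
E, V = np.linalg.eigh(H)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

Z = np.exp(-beta * E).sum()                            # exact partition function
Z_trial = np.exp(-beta * np.diag(Q.T @ H @ Q)).sum()   # Z' for the random set
Z_eig = np.exp(-beta * np.diag(V.T @ H @ V)).sum()     # Z' for the eigenbasis
```

Maximizing Z′ over orthonormal sets therefore drives the trial set toward the eigenbasis, which is exactly how the QGA uses Eq. (2.35) as a fitness function below.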
where the Hamiltonian H is defined by Eq. (2.16) and the dimensionless parameter β is proportional to the inverse temperature, β = 1/(k_B T), where k_B is the Boltzmann constant. The equality holds only in the case when {Ψ_k} is the complete set of eigenfunctions of the Hamiltonian H. In practice, for finite temperature calculations using Eq. (2.35), one can take into account the lowest M levels of the system, neglecting the occupation of the levels with higher energies. The number of considered levels M can be chosen in such a way that the occupation of the neglected levels does not exceed a certain value at a given temperature T. In the limit when the temperature goes to zero (T → 0, or β → +∞) Eq. (2.35) becomes equivalent to the variational principle for the ground state energy E₀:

E₀ ≤ ⟨Ψ₁|H|Ψ₁⟩.  (2.36)

Following the spirit of the QGA, we assume that each "parent's" gene represents a set of M orthonormal wavefunctions {Ψ_k}. According to Eq. (2.35) one can obtain a good approximation of the eigenfunctions and eigenvalues of the Hamiltonian H, and also of the partition function Z, by running an evolutionary procedure until the sum in Eq. (2.35) attains its maximum possible value. In practice, full ab initio quantum mechanical calculations of the excitation spectrum are quite difficult even for the case of very few interacting particles. The reason is that the amount of data rapidly increases with the number of particles considered in the system [Bruce (2000)]. Therefore, quantum mechanical calculations with exact many-body wave functions are limited to 3-4 particles in 2D, which corresponds to 6- or 8-dimensional wavefunctions in the configuration space. For a simplified study of the problem, we reduce the dimension of the many-particle wavefunction Ψ using the Hartree-Fock approximation. This is the simplest way to account for electron-electron interactions within the quantum system [Yannouleas (1999)]. The Hartree-Fock level is implemented across nuclear and atomic physics as a first step towards the solution of the quantum many-body problem and suffices for a qualitative discussion of phase transitions and thermal behavior [Staroverov (2002)]. Let us consider a system of N spinless particles occupying K single particle states (N < K). Each N-body wave function Ψ_k can then be written as a
Slater determinant:

Ψ_k(x₁, …, x_N) = (1/√(N!)) det‖ψ_{i_m}(x_n)‖,  (2.37)
where the indices 1 ≤ i₁, i₂, …, i_N ≤ K count the single particle states. This means that the set {Ψ_k} will represent the exact excitation spectrum for the case of noninteracting particles, while for the interacting case {Ψ_k} will correspond to the Hartree-Fock approximation. For simplicity let us consider one-dimensional problems. As in the case of the search for the ground state, we perform calculations on a finite region [a, b] where we discretize the real space. However, as we have seen, the generalization of the presented algorithm to higher dimensions is straightforward. The algorithm is implemented as follows. An initial population of N_pop trial many-body states {Ψ_k^j}, j = 1, …, N_pop is chosen randomly. For this purpose we construct K × N_pop single particle wavefunctions ψ_i^j using a Gaussian-like form

ψ_i^j(x) = A_i^j exp(−(x − x_i^j)²/(σ_i^j)²)(x − a)(b − x),  (2.38)

where 1 ≤ i ≤ K, and the index j denotes that the single particle wavefunction ψ_i^j(x) belongs to the "parent" with index j. We generate random values x_i^j ∈ (a, b) and σ_i^j ∈ (0, b − a] for each single particle wavefunction. The amplitude A_i^j is calculated from the normalization condition ∫_a^b |ψ_i^j(x)|² dx = 1. The initial population {ψ_i^j} is orthogonalized for each "parent". Note that the wavefunctions ψ_i^j(x) defined in this way fulfill the zero conditions on the boundaries. As in the case of the search for the ground state, "offspring" of the initial generation are formed through application of genetic operators on the genetic codes. As in the previous calculations, we define "quantum" or "uncertain" analogues of the genetic operations on the individuals, using the "smooth" mutation and crossover. For each iteration of the QGA procedure we randomly perform copy, crossover and mutation operations, applied to the single particle wavefunctions ψ_i^j(x). Note that crossover operations are applied between randomly chosen single particle wave functions ψ_i^{j₁}(x) and ψ_i^{j₂}(x) ("parents" j₁ and j₂, respectively) corresponding to the same single-particle excitation state i. The fitness function, i.e. the functional to be maximized by the QGA, is the sum Z′ defined in Eq. (2.35). After each application of a genetic
operation the newly created single particle wavefunctions are normalized and orthogonalized. Then the fitness of the individuals is evaluated and the fittest individuals are selected. The procedure is repeated until convergence of the fitness function to a maximal value is reached. Maximizing Z′, one obtains the "best" set {Ψ_k} approximating the excitation spectrum of the given system. One can then use this set in order to compute any kind of quantum statistical quantity, for example the density of particles ρ(x) for any value of the temperature parameter β (see [Militzer (2000)]):

ρ(x) = ρ(β, x_N) = ∫_a^b ρ(β, x) dx₁ … dx_{N−1},  (2.39)

where ρ(β, x) is given by

ρ(β, x) = Σ_k exp(−β⟨Ψ_k(x)|H|Ψ_k(x)⟩) |Ψ_k(x)|² / Σ_k exp(−β⟨Ψ_k(x)|H|Ψ_k(x)⟩),  (2.40)

where we use the notation x = {x₁, …, x_N}, and the summation is carried out over the many-body states.

2.8.3 Formation of a "Wigner molecule" and its "melting"
The fact that for noninteracting quantum problems the QGA yields good convergence to the exact results motivated us to test its efficiency for interacting systems. First of all, we study two interacting particles (in a triplet state) in 1D, at T = 0, using an artificially smoothed Coulomb-like interaction potential between them:

V(x₁, x₂) = Q/√((x₁ − x₂)² + δ).  (2.41)

Here δ is a smoothing coefficient and Q is the strength of the inter-particle interaction. In our calculations we set δ = 3. As we mentioned above, we seek the Hartree-Fock approximation of the ground state. Therefore, we construct an initial population {Ψ_k(x₁, x₂)} consisting of antisymmetric functions of the form Ψ(x₁, x₂) = [ψ₁(x₁)ψ₂(x₂) − ψ₂(x₁)ψ₁(x₂)] for particles in a triplet state. After convergence of the QGA we determine the spatial density of the particles, given by ρ_QGA(x) = ∫|Ψ_QGA(x, x′)|² dx′. In Fig. 2.24 we show the calculated ρ_QGA(x) corresponding to the Hartree-Fock approximation to the ground state for different values of the interaction strength Q. For Q = 0 (noninteracting case) the solution is built
Fig. 2.24 Density distribution ρ_QGA(x) of two interacting particles in a triplet state in a one-dimensional infinite well. The different curves refer to different strengths Q of the interaction V(x₁, x₂) = Q/√((x₁ − x₂)² + 3) between the particles.
up from the ground state and the first excited state of the single particle problem. As a consequence, the particles are well delocalized and minimize the kinetic energy. In contrast, in the interacting case the repulsive term of the Hamiltonian forces the particles to stay far away from each other. This effect increases with increasing interaction strength and leads to a localization near the opposite walls. For Q = 100 the overlap between the one-particle wavefunctions becomes negligible. Now let us determine the ground state of two particles strongly interacting via the repulsive Coulomb potential in 2D. This is a common problem in the physics of quantum dots [Serra (1999); Creffield (1999); Akman (1999); Yannouleas (1999)]. In this case the repulsive interaction potential is given by

U(r₁, r₂) = Q/|r₁ − r₂|.  (2.42)

Instead of scaling the strength of the inter-particle interaction Q, for this example we have re-scaled the size of the box by setting d = 1000. This corresponds to the low-density case. In Fig. 2.25 we show the calculated and
Fig. 2.25 Electron density distribution ρ_QGA(x, y) of two interacting electrons in a triplet state at zero temperature T = 0, in a square infinite well in the low density limit (large box size d = 1000). The spatial coordinates are shown in units of d. Note the localization of the electron density that corresponds to the formation of the "Wigner molecule".
symmetrized density distribution ρ_QGA(x, y). For low electronic densities the contribution of the Coulomb interaction, ∝ d⁻¹, is more important than the contribution of the kinetic energy of the particles, ∝ d⁻². This leads to a strong localization of the particles in opposite corners of the square box, which minimizes the energy of the system. This effect can be attributed to the well known Wigner crystallization in solids. In his pioneering work [Wigner (1934)], Wigner pointed out that the electron gas would crystallize at sufficiently low temperature and density. When thermal fluctuations (in energy units) become comparable with the energy of the inter-particle interaction, the transition opposite to Wigner crystallization occurs, i.e. melting of the Wigner crystal. Note that for the two-particle 1D system discussed above this effect also takes place upon increase of the interaction strength Q (see Fig. 2.24). Wigner crystallization in quantum dots, which is also described as formation of the "Wigner molecule", has been obtained recently using other theoretical approaches [Creffield (1999); Jauregui (1993); Yannouleas (1999)]. In fact, our calculated density shown in Fig. 2.25 is in good agreement with other methods [Creffield (1999)]. Next,
Fig. 2.26 Particle density ρ(x) of two interacting particles in an infinite well with the width L = 100, using β = 4·10⁷ (dotted line), 2·10⁶ (dash-dotted line), 1·10⁶ (dashed line), 1·10⁵ (solid line).
we present the results of calculations of the partition function Z together with the excitation spectrum. Once calculated, the excitation spectrum can be used to derive any kind of quantum statistical quantity. For instance, we can compute the particle density at different temperatures. We consider particles in a triplet state in a 1D infinite well of width L, interacting via the repulsive Coulomb potential. In order to eliminate the logarithmic singularity which is typical of the Coulomb interaction in one dimension, we again use the smoothed interaction potential given by Eq. (2.41). In these calculations we use L = 100, which guarantees a low electron density, and set Q = 1, δ = 1/10. First we present the results of our calculations of the melting of a "Wigner molecule" in one dimension due to increasing thermal fluctuations. In Fig. 2.26 we show the electron density for the case of a two
Numerical
73
optimization
particle system. In our calculations we set the parameter (3 = 4 • 10 7 ,2 • 10 6 ,1 • 10 6 ,1 • 105 and we choose size of the well L = 100. The density of particles p(x) clearly becomes more delocalized with increasing of the temperature (decreasing of the parameter /?) and for /3 = 1 • 105 it is almost uniform, the Wigner crystal has melted! In Fig.2.27 we plot the particle density for three particles with parallel spins, using parameter (3 = 6-107,3• 10 6 ,1.5• 10 6 ,1.5• 105 and for the same size of the well. As in Fig.2.26, initially localized particle density is smeared out with increasing of the temperature. Because of the small amount of the particles in the system, this transition is quite smooth.
Fig. 2.27 Particle density ρ(x) of three interacting particles in an infinite well with the width L = 100, using β = 6·10^7 (dotted line), 3·10^6 (dash-dotted line), 1.5·10^6 (dashed line), 1.5·10^5 (solid line).
Now let us investigate the behavior of the finite quantum system at low temperatures, i.e. in the quantum regime. In contrast to the thermal effect demonstrated above, a new "Wigner molecule melting" scenario, which is caused by quantum fluctuations and exists even at zero temperature, has been studied recently [Filinov (2001)]. It was shown that delocalization of the electron density occurs when the ratio of the kinetic energy to the Coulomb energy exceeds a certain threshold [Filinov (2001)]. As we mentioned above, the kinetic energy E_kin of a quantum system with a fixed number of particles and variable size L scales as E_kin ~ L^-2, while the Coulomb energy E_C scales only as E_C ~ L^-1. Therefore, with decreasing size of the system quantum fluctuations become more and more significant (due to Heisenberg's uncertainty principle) and a quantum phase transition occurs. In Fig.2.28 we present results for the calculation of
Fig. 2.28 Rescaled particle density ρ(x) of four interacting particles in an infinite well at zero temperature, using width of the well L = 500 (dotted line), L = 100 (dashed line), L = 1 (solid line).
the particle density ρ(x) for different sizes of the well, L = 500, 100, 1, at zero temperature. In order to visualize these results we rescale the calculated ρ(x) on the interval [0,1]. From Fig.2.28 it becomes clear that with decreasing size of the system the electron density tends to be more delocalized, i.e. a kind of "Wigner molecule melting" takes place in the system. Note that it looks quite different from the "melting" due to thermal fluctuations.

2.9 Evolutionary gradient search and Lamarckianism
Now let us discuss a possible combination of global GA optimization and local search techniques. The evolutionary gradient search algorithm of Salomon [Salomon (1998)] is a search method that employs yet another, "evolutionarily inspired" estimation of the local gradient of the objective function. The state of the evolutionary gradient search algorithm at time t is described by a base point x and a step length h. In every iteration, λ offspring candidate solutions x + h z_i, i = 1, ..., λ, are generated, where the z_i are random vectors with independent components distributed according to the Gauss normal distribution. Rather than performing selection the way an evolution strategy does, the evolutionary gradient search method computes

d_h(x) = (1/λ) Σ_{i=1}^{λ} [f(x + h z_i) − f(x)] z_i    (2.43)
as an estimation of the local gradient of the function. The motivation for this step is the wish not to discard the information carried by the offspring that are rejected, but to interpret an offspring candidate solution with a negative fitness advantage over the base point as evidence that a step should be taken in the opposite direction. The method then proceeds by taking two test steps from the base point in the negative direction of d_h(x). One test step has length h√N·k, the other one has length h√N/k, where usually k = 1.8. The base point is then updated by performing the test step with the higher (measured) fitness advantage, and the step length h is multiplied by k if the longer of the two test steps was more successful and is divided by k if the shorter of the two test steps prevailed. Clearly, an iteration of the evolutionary gradient search procedure requires λ + 2 evaluations of the objective function. It is interesting to compare the direction given by the negative of Eq.(2.43), in which the evolutionary gradient search method proceeds, with that of other strategies. Salomon has shown that for small h and sufficiently large λ, the direction given by Eq.(2.43) agrees closely with the local gradient direction at x. The evolutionary gradient search method assigns weights that are proportional to the fitness advantages of all offspring, with offspring with a negative fitness advantage receiving negative weights. While the evolutionary gradient search method has not yet been studied in great detail, it seems conceivable that the "genetic repair" effect may be present in evolutionary gradient searches. Another way to improve the search of a genetic algorithm is to move from Darwin's to Lamarck's formulation of evolution. Jean-Baptiste Pierre Antoine de Monet, Chevalier de Lamarck (1744-1829), was a famous French naturalist.
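The iteration described above can be sketched in a few lines of code. The following is a minimal reading of Salomon's scheme written for function minimization (so a "fitness advantage" means a decrease of f); the function and parameter names are our own, and the sphere test function is chosen only for illustration.

```python
import math
import random

def egs_minimize(f, x0, h=0.1, lam=20, k=1.8, iters=300, seed=0):
    """Evolutionary gradient search (after Salomon): estimate the local
    gradient from lam random offspring via Eq. (2.43), then take the better
    of two test steps of different lengths, adapting the step length h."""
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    for _ in range(iters):
        fx = f(x)
        # lam offspring x + h*z_i with standard-normal perturbation vectors z_i
        zs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
        # gradient estimate, Eq. (2.43): (1/lam) * sum [f(x + h z_i) - f(x)] z_i
        g = [0.0] * n
        for z in zs:
            df = f([xi + h * zi for xi, zi in zip(x, z)]) - fx
            for j in range(n):
                g[j] += df * z[j] / lam
        norm = math.sqrt(sum(gj * gj for gj in g)) or 1e-12
        e = [gj / norm for gj in g]                 # unit vector along the estimate
        # two test steps in the negative direction of the gradient estimate
        long_x = [xi - h * math.sqrt(n) * k * ei for xi, ei in zip(x, e)]
        short_x = [xi - h * math.sqrt(n) / k * ei for xi, ei in zip(x, e)]
        if f(long_x) < f(short_x):
            x, h = long_x, h * k                    # longer step won: enlarge h
        else:
            x, h = short_x, h / k                   # shorter step won: shrink h
    return x

# usage: minimize the sphere function from a distant starting point
sphere = lambda v: sum(t * t for t in v)
xmin = egs_minimize(sphere, [5.0, -3.0])
```

Note how the step-length adaptation implements the multiply-by-k / divide-by-k rule described in the text.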
"Lamarckianism" is implemented via a hybrid strategy in which, for some small percentage of the individuals, a local gradient search is executed and the individual's fitness and genotype are replaced by its result. This has the effect of slowly moving the search to beneficial areas of the search space. Such a gradual movement may provide faster algorithm execution, since an increasing number of solutions in the genetic population will represent good initial guesses, limiting the amount of inefficient search by the localized search technique in poor areas of the search space. In particular, local optimization can help in the later stages of the whole optimization procedure, when the area of the global optimum has already been localized by the genetic algorithm. In this case it could be a good idea to introduce an effective rate, which characterizes the ratio between the Darwinian and Lamarckian methodologies, and which decreases during the search.

2.10 Summary
In this chapter we have discussed some problems which arise in the context of global optimization. In particular, we learned a bit about the No Free Lunch theorem and its main consequences, and we also discussed multi-objective optimization. We made a brief review of some popular numerical methods of optimization, including the Simplex method and Simulated Annealing. We gave a more detailed introduction to genetic algorithms, and we gave some examples of how to use a GA to solve stationary few-body quantum mechanical eigenproblems, including quantum statistical calculations for strongly interacting systems. As an example, we considered the formation and melting of a "Wigner molecule" in a system of a few electrons confined in a quantum dot.
Chapter 3
Chaos in complex systems
The Universe presents us with an infinite variety of complex dynamical systems. Their spatial and temporal scales vary from intergalactic distances and millions of years to atomic and electronic motion on femto- and even attosecond scales. All these physical systems can be described within the framework of well-established and experimentally proven theories, like celestial mechanics or quantum theory. A much higher level of complexity is associated with the description of dynamics in human society or financial networks (the stock market), since there is no rigorous mathematical description of the rational (or irrational) behavior of a human being. However, one can try to develop interdisciplinary ideas which are common to different systems and may provide some insight into complex phenomena in general. One such idea is the concept of chaos. Perhaps the ultimate test for chaos is the accuracy of short-term predictions. With truly random data, prediction is impossible. Conversely, with chaotic data there is absolute determinism, and prediction is possible in principle, at least on short time scales. Predictability thus precludes complete randomness and signifies determinism, although randomness and determinism often coexist. Chaotic behavior can be observed in most fields of science: subatomic and molecular physics, fluid dynamics, plasma physics, chemistry, molecular biology, and most environmental, social and economic phenomena. Thus we can hope that the knowledge obtained from the studies of chaos in the natural sciences can then be applied to more complex examples, like social systems. In general, all chaotic systems are nonlinear; however, not every nonlinear system is chaotic. The story of multiple attempts to understand chaos
begins 300 years ago, when Sir Isaac Newton derived analytically Kepler's elliptic orbits and set up the equations of motion of the two-body problem, which can be integrated exactly. However, if one starts to consider the motion of the Moon in more detail, one immediately realizes that it cannot be well approximated by a two-body problem, because at least the gravitation of the Sun has to be taken into account. So one should solve the much more complicated three-body problem: the motion of the Moon, the Earth, and the Sun. Although a few analytical solutions for some particular cases of the problem were given in the XIX century, the failure to solve the three-body problem analytically motivated the most brilliant mathematicians of that time to search for a general solution. The surprising result was obtained by the great French mathematician Henri Poincare (1854-1912), who rigorously proved the impossibility of integrating the three-body problem in the general case. The impossibility of integrating the equations of motion of a physical system is tightly connected with the unpredictability of its time evolution. In a deterministic time evolution, if we know the initial state x(0) of the system at time t = 0 exactly, we can predict its state x(t) at time t = T with complete precision. But if there is an imprecision Δx(0) in the initial definition of x(0), this (even extremely small) error Δx(0) may grow exponentially with t, at least for a while, which will virtually destroy our predictions! Technically, even if one is able to obtain a partial analytical solution under special conditions (for example, for planar motion), the series will converge extremely slowly. The way to characterize this exponential growth is to calculate the Lyapunov exponents. In practice, to obtain the Lyapunov spectrum, imagine an infinitesimally small sphere (in the case of three dimensions), with radius dr, sitting on the initial state of a trajectory. The flow will deform this sphere into an ellipsoid.
That is, after a finite time t all orbits which have started in that sphere will be in the ellipsoid. The i-th Lyapunov exponent is defined by
λ_i = lim_{t→∞} (1/t) ln( dl_i(t) / dr ),    (3.1)
where dl_i(t) is the radius of the ellipsoid along its i-th principal axis. Sensitivity to initial conditions means that nearby points in phase space typically "repel" each other. Note that we are not interested in the case of unbounded dynamics, when the distance between two initially nearby points diverges exponentially for all times, because they would eventually move apart to the opposite ends of the Universe. Actually, we are looking for the opposite case, when the motion in phase space is bounded and any two points will eventually reach a maximum separation and then begin to approach each other again. As a result, in some cases, the volume of the accessible phase space can monotonically decrease until it becomes a set of measure zero, and the trajectory of the system will be captured in a so-called strange attractor. In fact, this strange attractor will have a fractional dimension! Such sensitive dependence on initial conditions often appears in our life. Imagine, for example, someone playing dice and attempting to reproduce a certain combination, who fails because of the reason mentioned above. This situation is usually referred to as chaotic behavior. Obviously, the evolution in time of a chaotic system has limited predictability, which is of the order of t ∝ 1/λ, where λ is the largest positive Lyapunov exponent, and the system undergoes irregular oscillations (if they were regular they would be predictable). How to make predictions for a chaotic system one can learn from mathematics, using the so-called Takens delay embedding theorem, which is a result of Floris Takens on the embedding dimension of nonlinear (chaotic) systems. The theorem states that a dynamical system can be reconstructed from a sequence of observations of its state. We will talk about it in more detail in Chapter 6. The mathematical possibility of chaos was well understood 100 years ago by Hadamard and Poincare, but theory and applications were developed only six decades later.
The reason is that only in 1950-1960 did the growing power of computers permit scientists to carry out numerical integration of systems of nonlinear equations and perform numerical experiments. Now, in the era of digital computers, most of what we know about the properties of nonlinear differential equations is based on numerical integration techniques. Any numerical method is based on approximations, and chaotic systems are characterized by high sensitivity to approximations. This problem is known as the problem of efficient shadowing of the system. Shadowing is a branch of chaotic systems theory that tries to show that, even in the face of the exponential magnification of small errors, numerical solutions have some validity. It does this by trying to show that, for any particular computed solution (the "noisy" solution), there exists a
true solution with slightly different initial conditions that stays uniformly close to the computed solution. If such a solution exists, it is called a true shadow of the computed solution. An approximation to true shadowing is numerical shadowing, whereby an iterative refinement algorithm is applied to a noisy solution to produce a nearby solution with less noise. If this iterative process converges to a solution with noise close to the machine precision, the resulting solution is called a numerical shadow; for more details see, for example, [Maddox (1989)]. There are some important consequences of the idea of chaos. One of them was pointed out by Poincare, who believed that the uncertainty of our predictions justifies a probabilistic description of the world. After the discovery of the chaos phenomenon, it became widely accepted that chaos is a generic property of all multidimensional nonlinear systems. By projecting the concept of chaos onto the natural world of earthquakes or solar activity, biological evolution, and even human society, many scientists started to believe that the concept of chaos would overturn the naive determinism of Newton's mechanics. There is another issue worth mentioning. One has to distinguish between chaos and randomness. Chaos usually assumes that there is always a finite horizon for a reliable forecast of the system's dynamics, while randomness usually means the dynamics is completely unpredictable. Up to a certain precision one can simulate randomness with some deterministic pseudorandom procedure, or with a strongly chaotic process with a large positive Lyapunov exponent that makes any (even short-term) forecast very difficult. However, it seems that true randomness exists only in quantum mechanics, where there is a breakdown of determinism, of cause and effect, given the initial (and final) states and trajectories. If we accept the postulates of quantum mechanics, this randomness is true, pure and unavoidable.
It is not the poor pseudorandomness of dice and other gambling devices. One should notice that the Schrodinger equation, which describes the quantum evolution of a very specific dynamical variable, the wavefunction, is deterministic, just like the common heat transfer equation.
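The limited prediction horizon t ∝ 1/λ can be made concrete on the simplest textbook example of chaos, the logistic map x_{n+1} = 4x_n(1 − x_n) (mentioned again later in this chapter), whose largest Lyapunov exponent is known to equal ln 2 ≈ 0.693. The sketch below is our own illustration, not taken from the text: it estimates λ as the long-time average of ln|f'(x_n)| along a trajectory.

```python
import math

def logistic_lyapunov(x0=0.3, n=200_000, transient=1_000):
    """Estimate the Lyapunov exponent of x_{n+1} = 4 x (1 - x) as the
    trajectory average of ln|f'(x_n)| = ln|4 - 8 x_n|."""
    x = x0
    for _ in range(transient):            # let the orbit settle onto the attractor
        x = 4.0 * x * (1.0 - x)
    total, count = 0.0, 0
    for _ in range(n):
        d = abs(4.0 - 8.0 * x)
        if d > 1e-12:                     # skip the (measure-zero) point x = 1/2
            total += math.log(d)
            count += 1
        x = 4.0 * x * (1.0 - x)
    return total / count

lam = logistic_lyapunov()                 # close to ln 2
horizon = 1.0 / lam                       # prediction-horizon scale, t ~ 1/lambda
```

With λ ≈ 0.69 per iteration, an initial uncertainty grows roughly e-fold every 1/λ ≈ 1.4 iterations, which is why even a tiny measurement error destroys the forecast after a few dozen steps.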
3.1 Lorenz attractor
The Lorenz attractor dates from 1963, when the meteorologist Edward Lorenz, working at MIT, published an analysis of a simple system of three differential equations. He developed the equations as a simple model for the so-called Rayleigh-Benard convection problem, which governs the formation of convection rolls between two parallel surfaces at different temperatures. His interest was in predicting atmospheric dynamics, with the ultimate goal of developing long-term weather prediction tools. What he found through obtaining numerical solutions of these seemingly simple equations had a deep impact on our understanding of the possible behaviors of nonlinear equations. In general, convection is described using the Navier-Stokes partial differential equations, which makes the analysis quite complicated. Fortunately, partial differential equations may often be thought of as infinite systems of ordinary differential equations. It is frequently possible to expand the dependent variables of a partial differential equation in an infinite discrete set of coupled ordinary differential equations for the time dependence of the Fourier coefficients. The Lorenz model was derived from the Navier-Stokes equations by making similar simplifications and results in a system of three coupled nonlinear first-order ordinary differential equations:
dx/dt = σ(y − x),
dy/dt = ρx − y − xz,
dz/dt = xy − bz,    (3.2)
where σ is the Prandtl number, i.e. the ratio of the kinematic viscosity to the thermal diffusivity, b is related to the shape of the convection rolls, and ρ is related to the Rayleigh number (or the temperature difference between the surfaces). The dependent variables x(t), y(t) and z(t) essentially contain the time dependence of the stream-function and temperature distributions expressed as Fourier expansions. Note that the equations are nonlinear due to the xz and xy terms in the second and third equations, respectively. For a derivation of the Lorenz equations, see [Schuster (1988)]. Lorenz pointed out that Eqs.(3.2) possess some surprising features. In particular, the equations are "sensitive to initial conditions", meaning that tiny differences at the start become amplified exponentially as time passes. This type of unpredictability is a characteristic feature of chaos. Conversely, there is also some magic "order" in the system: numerical solutions of the equations, plotted in three dimensions, consist of curves which approach a curious two-sheeted surface, later named the Lorenz attractor.

Fig. 3.1 The famous Lorenz attractor, which symbolizes order in chaos.

The geometry of the attractor is closely related to the "flow" of the equations, i.e. the curves corresponding to solutions of the differential equations. There is an unstable equilibrium, a saddle point, at the origin. Trajectories repeatedly pass this point and are pushed away to the left or right, only to circle round and pass back by the saddle. As they loop back, adjacent curves are pulled apart (this is how the unpredictability is created) and can end up on either side of the saddle. The result is an apparently random sequence of loops to the left and right. More than 40 years ago the Lorenz attractor became one of the well-recognized symbols of modern nonlinear dynamics and chaos theory. One can find it in virtually every textbook on the theory of nonlinear dynamical systems, where it is associated with the appearance of order within chaos (see Fig.3.1). At first its existence was established only on the basis of numerical integration of the Lorenz equations on a computer. However, mathematicians lacked a rigorous proof that exact solutions of the Lorenz equations resemble the shape generated on a computer by numerical approximations, and they
could not even prove that its dynamics are genuinely chaotic. Only in 1998 did Warwick Tucker prove that the Lorenz system indeed defines a robust chaotic attractor [Tucker (1999)]. This result is of great importance, because it builds a bridge between empirical laws from numerical experiments and mathematically rigorous proof. It is easily accepted by our intuition that if a system consists of many interacting subsystems, having many degrees of freedom and hence governed by a set of complex multidimensional differential equations, then its behavior might in practice be impossible to predict. What is really hard to imagine is that sometimes even very simple systems, described by simple nonlinear equations, can have very complicated chaotic solutions. Another striking observation is that increasing the dimensionality of some types of dynamical systems can lead to a decrease in the probability that the system is chaotic! [Sprott (2003)].
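The sensitivity to initial conditions described above is easy to reproduce numerically. The following sketch (our own illustration, using the standard parameter values σ = 10, ρ = 28, b = 8/3) integrates the Lorenz equations (3.2) with a classical fourth-order Runge-Kutta scheme and follows two trajectories that initially differ by only 10⁻⁸:

```python
def lorenz_rhs(s, sigma=10.0, rho=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (3.2)."""
    x, y, z = s
    return (sigma * (y - x), rho * x - y - x * z, x * y - b * z)

def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz_rhs(s)
    k2 = lorenz_rhs(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)))
    k3 = lorenz_rhs(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)))
    k4 = lorenz_rhs(tuple(si + dt * ki for si, ki in zip(s, k3)))
    return tuple(si + dt * (a + 2.0 * b2 + 2.0 * c + d) / 6.0
                 for si, a, b2, c, d in zip(s, k1, k2, k3, k4))

# two trajectories whose initial conditions differ by 1e-8 in x
s1, s2, dt = (10.0, 0.0, 10.0), (10.0 + 1e-8, 0.0, 10.0), 0.01
for _ in range(2500):                         # integrate up to t = 25
    s1, s2 = rk4_step(s1, dt), rk4_step(s2, dt)
sep = sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5
# the separation has grown by many orders of magnitude,
# yet both orbits remain bounded on the attractor
```

With the largest Lyapunov exponent λ ≈ 0.906, the 10⁻⁸ separation saturates at the size of the attractor well before t = 25, while each orbit individually stays bounded: exactly the combination of local divergence and global confinement discussed above.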
3.2 Control of chaotic dynamics of the fractional Lorenz system
Complex systems can consist of an enormous number of simple units whose interactions can lead to unexpected collective behavior. The dream of complexity theory is to discover the laws governing such complex systems, whether they be ecosystems, the weather or corporations in the marketplace. The "power law" is a distinctive experimental signature seen in a wide variety of complex systems. In Econometrics it goes under the name of distributions with "fat tails", in Physics it is referred to as "critical fluctuations" or "universality", in Computer Science and Biology it is "the edge of chaos", and in Demographics and Linguistics it is called "Zipf's law". As was shown recently [Hilfer (2000)], many of these general "power law" dependencies can be naturally introduced as solutions of differential equations with fractional derivatives. Another example is the success in the modelling of financial series using Autoregressive Fractionally Integrated Moving Average (ARFIMA) models [Doornik (1994)]. Thus, fractional derivatives could be a significant ingredient in the modelling of complex dynamical systems. In this section we introduce a generalization of the Lorenz dynamical system using fractional derivatives. Such a system can have an effective non-integer dimension Σ, defined as the sum of the orders of all involved derivatives. We found that the system with Σ < 3 can exhibit chaotic behavior. An interesting finding is that there is a critical value of the effective dimension, Σ_cr, below which the system undergoes a transition from chaotic dynamics to regular dynamics. Although fractional derivatives have a long mathematical history, for many years they were not used in physics. One possible explanation of such unpopularity could be that there are multiple nonequivalent definitions of fractional derivatives [Hilfer (2000)]. Another difficulty is that fractional derivatives have no evident geometrical interpretation because of their nonlocal character [Podlubny (2001)]. However, during the last ten years fractional calculus has started to attract much more attention from physicists and engineers. It was found that various, especially interdisciplinary, applications can be elegantly described with the help of fractional derivatives. As examples, one can mention studies of viscoelastic bodies, polymer physics, phase transitions of fractional order, anomalous diffusion and the description of the fractional kinetics of chaotic systems (for a review see [Hilfer (2000)]). The usefulness of fractional derivatives in quantitative finance [Scalas (2000)] and in the quantum evolution of complex systems [Kusnezov (1999)] was demonstrated recently. One should also mention recent attempts to introduce a local formulation of fractional derivatives [Kolwankar (1998)] and to give some geometrical interpretations [Podlubny (2001)]. However, most of the studies mentioned above were performed on the basis of linear differential equations containing fractional derivatives. The main consequence of this limitation is that the dynamics of such systems cannot be chaotic. According to the Poincare-Bendixson theorem (see, for example, [Hirsch (1965)]), chaos cannot occur in two-dimensional systems of autonomous ordinary differential equations.
One has to stress that this theorem is applicable to continuous-time dynamical systems and not to discrete maps, which can be represented as X_{n+1} = f(X_n), where X_n is a discrete state variable. Discrete-time dynamical systems can exhibit chaotic behavior even in one dimension (see, for example, the famous logistic map [Ott (1994)]). As we mentioned before, the most famous example of a continuous-time three-dimensional system which exhibits chaos is the Lorenz model [Lorenz (1963)]. The dimension Σ of such a system can be defined as the sum of the orders of all involved derivatives; however, one should remember that this definition is not rigorous. Therefore, by using fractional derivatives of orders 0 < α, β, γ ≤ 1 it is possible to obtain a system with an effective non-integer dimension Σ < 3. A natural question arises whether such a system can exhibit chaotic behavior. In connection with this question one should
also mention the work of Hartley et al., where the authors studied the chaotic motion of the Chua-Hartley system of fractional order [Hartley (1995)]. In this section we are going to investigate the dynamics of the fractional Lorenz system, and we find that it can be chaotic with Σ < 3. We estimate the largest Lyapunov exponent in this case. Moreover, we determine a critical value Σ_cr such that for Σ < Σ_cr the dynamics of the considered system becomes regular. There are several definitions of fractional derivatives [Hilfer (2000)]. Probably the best known is the Riemann-Liouville formulation. The Riemann-Liouville derivative of order α and with the lower limit a is defined as

aD_t^α f(t) = (1/Γ(n − α)) (d^n/dt^n) ∫_a^t f(τ) (t − τ)^{n−α−1} dτ,    (3.3)

where Γ(α) is the gamma function and n is an integer number chosen in such a way that n − 1 < α < n. An alternative definition of fractional derivatives was introduced by Caputo [Caputo (1967)]. The Caputo derivative of order α is a sort of regularization of the Riemann-Liouville derivative and is defined through

aD_t^α f(t) = (1/Γ(n − α)) ∫_a^t f^{(n)}(τ) (t − τ)^{n−α−1} dτ.    (3.4)
The main advantage of the definition Eq.(3.4) is that the Caputo derivative of a constant is equal to zero, which is not the case for the Riemann-Liouville derivative. Substantially, the Caputo fractional derivative is a formal generalization of the integer derivative under the Laplace transformation [Scalas (2000)]. Now let us introduce a fractional generalization of the Lorenz system:

d^α x/dt^α = σ(y − x),
d^β y/dt^β = ρx − y − xz,
d^γ z/dt^γ = xy − bz.    (3.5)
Here we assume 0 < α, β, γ ≤ 1, r ≥ 1, and the time derivatives are in the Caputo sense. The effective dimension Σ of the system Eq.(3.5) we define as the sum of the orders, α + β + γ = Σ. In our calculations we use the following values of the parameters: σ = 10, ρ = 28, b = 8/3, so that in the case α = β = γ = r = 1 the system Eq.(3.5) reduces to the common Lorenz dynamical system exhibiting chaotic behavior. Generalization of dynamical equations using fractional derivatives could be useful in the phenomenological description of viscoelastic liquids, like, for example, human blood [Hilfer (2000); Thurston (1972)]. The system Eq.(3.5) is in fact a system of coupled nonlinear integro-differential equations with a weakly singular kernel. This is a computationally expensive problem, since numerical integration requires O(n²) operations, where n is the number of sampling points [Diethelm (1997)]. Let us start from the analytical solution of a linear fractional differential equation:

d^α x/dt^α = Ax + f(t),  x(0) = x_0.    (3.6)
With the help of the Laplace transformation [Podlubny (1997)] one can easily obtain the solution of Eq.(3.6) in the form:

x(t) = x_0 E_α(A t^α) + ∫_0^t (t − τ)^{α−1} E_{α,α}(A(t − τ)^α) f(τ) dτ,    (3.7)
where E_α is the one-parameter Mittag-Leffler function [Ederlyi (1955)], defined by

E_α(z) = Σ_{k=0}^∞ z^k / Γ(αk + 1),  α > 0,    (3.8)

and E_{α,β} is the two-parameter Mittag-Leffler function [Ederlyi (1955)], defined by

E_{α,β}(z) = Σ_{k=0}^∞ z^k / Γ(αk + β).    (3.9)
For α = 1, E_α and E_{α,α} both reduce to the usual exponential function. The numerical scheme we implemented in our calculations is based on the linearization of the system Eq.(3.5) on each step of integration and the iterative application of Eq.(3.7). We have checked our numerical scheme by comparing results for Eq.(3.5), obtained using Eq.(3.7), with the standard fourth-order Runge-Kutta method for the case α = β = γ = r = 1. We integrated Eq.(3.5) for different values of the parameters α, β, γ, r and different initial conditions. The first finding is that the fractional Lorenz system can exhibit chaotic behavior with the effective dimension Σ < 3. In Fig.3.2 we show the dynamical portrait of the system {x(t), y(t), z(t)} using the parameters r = 1, α = β = γ = 0.99. Thus, the effective dimension of the system is Σ = 2.97 < 3. We set the initial conditions at t = 0 as {x_0, y_0, z_0} = {10, 0, 10}. Note that the system exhibits chaotic dynamics similar to that of the common Lorenz system. Moreover, one can probably also define the set of points which could be characterized as a strange attractor. However, this set is slightly deformed compared to the "classical" Lorenz attractor. We have to stress that it is rather time consuming to compute Lyapunov exponents for a nonlocal system like Eq.(3.5). In order to resolve this difficulty we define the largest positive Lyapunov exponent λ using an implicit procedure developed for time-series data. With the help of the freeware package TISEAN [Hegger (1998)] we estimated the largest Lyapunov exponent λ for the case shown in Fig.3.2. We found λ ≈ 0.85, which corresponds to the chaotic regime. Note that for the common Lorenz system λ ≈ 0.906 [Sparrow (1982)]. We conclude that decreasing the effective dimension Σ induces some effective damping in the system. By decreasing the parameters α, β, γ one obtains a further decrease of the largest Lyapunov exponent.
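The scheme actually used for the figures is the exact-linearization method based on Eq.(3.7) described above. For a reader who wants to experiment, a much simpler (and cruder) alternative is an explicit Grünwald-Letnikov-type discretization, in which the Caputo derivative is approximated by fractional differences of x(t) − x(0). The sketch below is our own illustration of that idea, not the author's scheme; note the O(n²) cost of the memory sum mentioned earlier, and that for α = β = γ = 1 it reduces to the forward Euler method.

```python
def gl_weights(alpha, n):
    """Grunwald-Letnikov weights c_j = (-1)^j * binom(alpha, j),
    via the standard recurrence c_j = c_{j-1} * (1 - (alpha + 1)/j)."""
    c = [1.0]
    for j in range(1, n + 1):
        c.append(c[-1] * (1.0 - (alpha + 1.0) / j))
    return c

def fractional_lorenz(alpha, beta, gamma, x0, n_steps, h=0.005,
                      sigma=10.0, rho=28.0, b=8.0 / 3.0):
    """Explicit Grunwald-Letnikov-type integration of the fractional
    Lorenz system Eq.(3.5).  The Caputo derivative is approximated by
    fractional differences of x(t) - x(0); adequate only for short,
    qualitative runs."""
    orders = (alpha, beta, gamma)
    w = [gl_weights(a, n_steps) for a in orders]
    hist = [list(x0)]
    for n in range(1, n_steps + 1):
        x, y, z = hist[-1]
        f = (sigma * (y - x), rho * x - y - x * z, x * y - b * z)
        new = []
        for i in range(3):
            # memory sum over the whole history: the source of the O(n^2) cost
            mem = sum(w[i][j] * (hist[n - j][i] - x0[i])
                      for j in range(1, n + 1))
            new.append(x0[i] + h ** orders[i] * f[i] - mem)
        hist.append(new)
    return hist
```

For α = 1 the weights collapse to c_0 = 1, c_1 = −1, c_j = 0 (j ≥ 2), and each step becomes x_n = x_{n−1} + h f(x_{n−1}), i.e. plain Euler, which gives a convenient consistency check.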
Fig. 3.2 Dynamical portrait of the fractional Lorenz system using parameters α = β = γ = 0.99 and having the effective dimension Σ = 2.97. Note the formation of an attractor similar to the Lorenz strange attractor.
At a certain critical dimension Σ_cr the dynamics of the system undergoes qualitative changes and becomes regular for any initial condition. This is a new and interesting result which, to our knowledge, was not described before.
Fig. 3.3 Dynamical portrait of the fractional Lorenz system using parameters α = β = γ = 0.96 and having the effective dimension Σ = 2.94. Note that the strange attractor does not exist and the system is attracted by one of the two foci: (3√8, 3√8, 26) and (−3√8, −3√8, 26).
We obtain the lowest value of the system's effective dimension for which the chaotic regime is still possible, Σ_cr ≈ 2.91. This corresponds to the case α ≈ 0.91, β = γ = 1. The obtained critical values of the parameters α, β, γ reflect the fact that the first, linear differential equation in the system Eq.(3.5) seems to be "less sensitive" to the damping introduced by the fractional derivative than the other, nonlinear equations. If we restrict ourselves to the case of equal derivative orders α = β = γ, the effective critical dimension for this symmetric case is even higher: Σ_cr^sym ≈ 2.94. In Fig.3.3 we show the dynamical portrait of the system setting the parameters r = 1, α = β = γ = 0.97, with the corresponding effective dimension Σ = 2.91 < Σ_cr^sym. We use the same initial conditions as for the previous examples. Note that in this case the system exhibits a strong damping of the oscillations. Depending on the initial conditions, the trajectory of the system is attracted by one of the two centers given by
(x, y, z) = (±√(b(r − 1)), ±√(b(r − 1)), r − 1).   (3.10)
These points can be easily found from the stationarity condition:

d^α x/dt^α = d^β y/dt^β = d^γ z/dt^γ = 0.   (3.11)
Note, that the stationarity condition Eq.(3.11) has the usual form because
we use Caputo fractional derivatives; it is not applicable if one uses the Riemann–Liouville formulation Eq.(3.3). However, the obtained critical value Σ_cr ≈ 2.91 is not a "universal threshold" for any continuous-time chaotic system of fractional dimension. We found that it is rather a value that characterizes a particular dynamical system. In order to illustrate this we repeat the simulations shown in Fig.3.3 with the same initial conditions, but with the changed parameter r = 3. In Fig.3.4 we show the dynamics of the variable z(t). The system Eq.(3.5) in this case exhibits a "stronger" nonlinearity, which possibly compensates the damping effect described above, and one again obtains chaotic behavior.
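The fixed points Eq.(3.10) are easy to verify directly: the Caputo derivative of a constant vanishes, so the stationary points of the fractional system coincide with the zeros of the ordinary Lorenz right-hand side. A quick check, assuming the classical parameters σ = 10, r = 28, b = 8/3 (illustrative values; the book's scaling may differ):

```python
import math

def lorenz_rhs(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    # Right-hand side of the Lorenz system; stationary points satisfy rhs = 0
    x, y, z = state
    return (sigma * (y - x), r * x - y - x * z, x * y - b * z)

# Eq.(3.10): (x, y, z) = (+-sqrt(b(r-1)), +-sqrt(b(r-1)), r-1)
c = math.sqrt(8.0 / 3.0 * 27.0)
fixed_points = [(c, c, 27.0), (-c, -c, 27.0), (0.0, 0.0, 0.0)]
```

All three candidate points annihilate the right-hand side, confirming the stationarity condition Eq.(3.11).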
Fig. 3.4 Evolution of the variable z(t) of the fractional Lorenz system. We use parameters α = β = γ = 0.96, r = 3. The effective dimension Σ = 2.94. Note, that unlike in Fig.3.3, the dynamics of the system is chaotic.
We also found that under certain conditions the system Eq.(3.5) can exhibit quasi-periodic oscillations with stable periodic orbits. In Fig.3.5 we show an example of such quasi-periodic dynamics of the variable z(t) using parameter values r = 1, α = β = 1 and γ = 0.98, which corresponds to the effective dimension Σ = 2.98. We used the same initial conditions as for the previous Figs.3.2–3.4. Note, that after some time of transient behavior
Fig. 3.5 A quasi-periodic evolution of the variable z(t) of the fractional Lorenz system. We use parameters α = β = 1, γ = 0.98, r = 1.
the system evolves quasi-periodically. For different initial conditions the dynamics of the system shows different limit cycles having the same two symmetric centers (fixed points) given by Eq.(3.10). One can understand the obtained results shown in Figs.3.2–3.5 in the following way. Any chaotic system is characterized by a strong sensitivity to its initial conditions, and the "memory" time of the system can be estimated as t_mem ≈ λ⁻¹, where λ is the largest positive Lyapunov exponent. On the other hand, the introduction of fractional derivatives leads to a non-locality in the time domain (see the definition Eq.(3.4)), which can be interpreted as the presence of a long "memory". The competition between these two tendencies was the subject of the presented investigations. Now we discuss the question of whether the introduction of fractional derivatives should always lead to stabilization and damping of chaos in a dynamical system. Let us consider Eqs.(3.6),(3.7) in more detail. In the case of λ < 0 and in the limit t → +∞, E_α(λt^α) ∝ t^(−α) [Scalas (2000)]. Therefore, one obtains only a power-law convergence of two close trajectories instead of the exponential one for α = 1. Thus, even small changes of
the orders of derivatives could lead to dramatic changers of the Lyapunov spectrum and of the whole dynamics. One can imagine a situation, when a small decreasing of the order of the derivative a could lead to a stronger sensitivity of the whole nonlinear system to the initial conditions. If A > 0, in the limit t —* +00, Ea(Ata) oc exp(A1^at) and one obtains, as in the case of a = 1, the exponential divergence of two close trajectories.
3.3 Summary
In this section we discussed an example of chaotic phenomena in Nature. We have introduced a fractional generalization of the Lorenz model which could be useful in the phenomenological description of viscoelastic liquids and other dynamical systems with a long "memory". We studied how the dynamics of the system depends on the effective dimension Σ, which now can take non-integer values. We found that the fractional Lorenz system exhibits rich dynamical properties and can be chaotic with effective dimension Σ less than 3. To discriminate between chaotic and ordered orbits, we also estimated the largest Lyapunov exponent in particular cases. We demonstrated that the dynamics of the system is strongly sensitive to the values of the orders of the involved derivatives α, β, γ and, as a result, to the effective dimension Σ. In general, decreasing the parameters α, β, γ leads to a damping in the system. We discovered that below a certain critical dimension Σ_cr ≈ 2.91 chaotic motion of the system is not possible. There are some interesting questions which are still open. Does a universal lower bound Σ_cr^univ exist, below which no nonlinear system can be chaotic? And how far could it be from the value predicted by the Poincaré–Bendixson theorem (Σ_cr^univ = 2)? One should be aware, though, that fractional dynamics does not form a semigroup f(t + s) = f(t) ∗ f(s) in the time domain, so one cannot apply the Poincaré–Bendixson theorem rigorously. Another interesting question is whether the introduction of fractional derivatives of distributed order [Chechkin (2002)] in nonlinear systems could help in the description of the "edge of chaos", which is characterized by a power-law divergence of close trajectories.
Chapter 4
Optimal control of quantum systems
In the last decade the development of laser systems opened the way to the creation of ultrashort femtosecond pulses with controlled shape, spectrum and polarization (see, for example, [Brixner (2001)]). Ultrashort laser pulses can be used as an ideal tool to manipulate quantum objects. For instance, with the help of an optimal control field one can induce chemical reactions which are otherwise impossible or very difficult to carry out [Judson (1992)]. After the seminal work of Judson and Rabitz [Judson (1992)], where the authors suggested a variational formulation of optimal control in quantum systems and a procedure to solve this problem, many theoretical and experimental investigations were devoted to the problem of how "to teach" lasers to drive molecular reactions in real time [Bardeen (1997); de Vivie-Riedle (2000); Apalategui (2001); Vajda (2001); Hornung (2000)]. The idea consists in using pulse-shaping techniques to design pulses or sequences of pulses having a given optimal shape (and phase) so that the desired atomic wave packet dynamics is induced. Optimal control of the internal motions of a molecule is achieved by exploiting a variety of interference effects associated with the quantum mechanics of molecular or electron motion. Thus, the population of a certain vibrational state, which may be responsible for the yield of a chemical reaction, can be controlled. Using a variational formulation of the control problem it was shown how to construct optimal external fields (e.g. laser pulses) to drive a certain physical quantity, like the population of a given state, to reach a desired value at a given time [Ohtsuki (1999); Peirce (1988); de Araujo (1998)]. However, even for the simplest control problems the obtained fields usually have a rather complex nature and cannot be easily interpreted [Zhu (1998)]. Furthermore, since the optimal field arises in the formalism [Judson (1992)] as a solution of a system of coupled nonlinear differential equations, which
are treated numerically by application of iterative methods, there is also a problem related to the multiplicity of the obtained optimal solutions, which are local extrema of the control problem. Therefore, it is necessary to develop a new theory which permits one to derive analytical solutions for the optimal fields, to guarantee their uniqueness at least for simple problems. In this chapter we present a new alternative approach which permits one to obtain analytical solutions for some simple problems. There is another point worth noting. Although the maximization of a given objective at a certain moment (as was considered in the earlier works mentioned above) is relevant for many purposes, a more detailed manipulation of real systems may require the control of physical quantities during a finite time interval. An interesting example of such optimal control was recently performed on a system of shallow donors in semiconductors [Cole (2001)]. Using pulses of various shapes and durations one can control the photo-current in the system, while the total transferred charge is proportional to a time integral over the occupation of a certain excited state. The search for optimal fields able to perform such control of quantum systems is a vital problem for which no analytical description has been given so far. Using our theory we are going to consider and solve such kinds of optimal control problems. In many situations the controlled system cannot be treated as isolated; therefore dissipative and relaxation processes, due to coupling to the environment (thermal bath) or due to contact with measuring devices, could play a significant role. In this case some limits of the optimal control of the system should exist [Schirmer (2000)]. The question is to estimate these limits quantitatively.
An interesting problem which is usually not mentioned is: "If the optimal control field for a quantum system without relaxation is known, how should it be modified in the presence of relaxation in the system?" In the following chapter we develop a new formulation of the optimal control problem in quantum systems. Using a new analytical approach we derive a differential equation for the optimal control field, which we solve analytically in some limiting cases. This approach also permits us to investigate optimal control of simple quantum systems with relaxation. First we will give a short introduction into the density matrix formalism, which is useful in the description of realistic quantum mechanical systems in contact with an environment. Then we give a brief overview of the modern variational formulation of the optimal control problem, which was developed, for example, in [Ohtsuki (1999); Peirce (1988)]. After that, we give an alternative approach that allows us to describe optimal control of
a quantum system over a finite [0, T] time interval. Optimal control of the system at a given time T is only a special case of our general theory. We also introduce a new type of constraint on the control field which limits the minimal width of the envelope of the resulting field. This constraint naturally arises if one tries to find the optimal pulses with experimentally achievable modulation of the control fields.
4.1 Density matrix formalism
Here we would like to outline the main ideas behind the density matrix formalism, which permits a quantum mechanical description of a system embedded in some environment, usually described as a thermal bath. Let us consider a mixture of independently prepared quantum states |ψ_i⟩, (i = 1, ..., n) with the statistical weights w_i. These states are not necessarily orthonormal to each other. The statistical operator ρ, or density matrix operator, is defined as:

ρ = Σ_{i=1}^{n} w_i |ψ_i⟩⟨ψ_i|.   (4.1)
In order to present Eq.(4.1) in a matrix form, we choose a convenient orthonormal basis |φ_1⟩, ..., |φ_n⟩, which is connected to |ψ_i⟩ through the relationship:

|ψ_i⟩ = Σ_{j=1}^{n} a_{ij} |φ_j⟩,   ⟨ψ_i| = Σ_{k=1}^{n} a*_{ik} ⟨φ_k|.   (4.2)
Then the expression given by Eq.(4.1) can be rewritten as

ρ = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{n} w_i a_{ij} a*_{ik} |φ_j⟩⟨φ_k|,   (4.3)
or, equivalently,

⟨φ_j|ρ|φ_k⟩ = Σ_{i=1}^{n} w_i a_{ij} a*_{ik}.   (4.4)
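Definition (4.1) is easy to exercise in code. A small illustrative sketch (numpy; the function name is mine, not from the text) that builds ρ for a mixture of two non-orthogonal qubit states and checks the expected properties:

```python
import numpy as np

def density_matrix(states, weights):
    # rho = sum_i w_i |psi_i><psi_i| for normalized state vectors |psi_i>
    dim = len(states[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for w, psi in zip(weights, states):
        psi = np.asarray(psi, dtype=complex)
        psi = psi / np.linalg.norm(psi)
        rho += w * np.outer(psi, psi.conj())
    return rho

# 50/50 mixture of |0> and (|0> + |1>)/sqrt(2): the states need not be orthogonal
rho = density_matrix([[1.0, 0.0], [1.0, 1.0]], [0.5, 0.5])
```

The resulting ρ is Hermitian, positive semi-definite, and has unit trace, as any density matrix must.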
The last expression Eq.(4.4) can be considered as a definition of the matrix ρ. Thus, the density matrix contains all significant information about the quantum mechanical system, and we will use it in our further calculations. The density matrix of a quantum system satisfies the quantum Liouville equation. In order to derive it, we should recall the following expression for the density matrix operator:
∂ρ(t)/∂t = ∂/∂t Σ_i w_i |ψ_i(t)⟩⟨ψ_i(t)| = Σ_i w_i (∂/∂t |ψ_i(t)⟩)⟨ψ_i(t)| + Σ_i w_i |ψ_i(t)⟩(∂/∂t ⟨ψ_i(t)|).   (4.5)

Using the time-dependent Schrödinger equation and its Hermitian conjugate,

∂/∂t |ψ_i(t)⟩ = (1/iħ) H |ψ_i(t)⟩,   ∂/∂t ⟨ψ_i(t)| = −(1/iħ) ⟨ψ_i(t)| H,   (4.6)
one easily obtains the quantum Liouville equation, which is very useful to describe the evolution of quantum systems interacting with a decohering environment:

iħ ∂ρ/∂t = Hρ − ρH = [H, ρ].   (4.7)
Note, that the Hamiltonian of the system H can explicitly contain an external control field.
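Equation (4.7) can be integrated numerically as a sanity check. The sketch below (illustrative, ħ = 1, a fourth-order Runge–Kutta step on the commutator) verifies that unitary Liouville evolution preserves both the trace and the purity of ρ:

```python
import numpy as np

def liouville_step(H, rho, dt):
    # One RK4 step of d(rho)/dt = -i [H, rho]   (hbar = 1)
    f = lambda r: -1j * (H @ r - r @ H)
    k1 = f(rho)
    k2 = f(rho + 0.5 * dt * k1)
    k3 = f(rho + 0.5 * dt * k2)
    k4 = f(rho + dt * k3)
    return rho + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# resonant two-level coupling (sigma_x); ground state initially occupied
H = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
rho = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
for _ in range(1000):
    rho = liouville_step(H, rho, 0.001)
```

For this H the excited-state occupation follows the Rabi law ρ₂₂(t) = sin²(t), another easy check on the integrator.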
4.2 Liouville equation for the reduced density matrix
In this section we are going to derive, using the projector technique, the Liouville equation for the reduced density matrix, which describes a quantum system in contact with an environment. Let us consider a quantum system A with the Hamiltonian H_A in contact with another system B (heat bath) with the Hamiltonian H_B. One can think of systems A and B as two subsystems interacting with each other. We assume a weak interaction between the subsystems in order to justify the use of perturbation theory. As we have shown in the previous section, the density matrix of the
whole system ρ obeys the quantum Liouville equation:

iħ ∂ρ/∂t = [H_A + H_B + H_AB, ρ],   (4.8)

where H_AB is the interaction between the systems A and B. In order to make further progress, we assume that the density matrix of the system B is at thermal equilibrium at temperature T and is given by

ρ_B = exp(−βH_B)/Tr_B[exp(−βH_B)].   (4.9)
The trace operation Tr_B means that we take the diagonal sum of the operator over the Hilbert space of the system B [Toda (1983)]. Let us introduce the reduced density matrix σ that describes the system A and is given by

σ(t) = Tr_B[ρ(t)].   (4.10)
Then our task is to derive from Eq.(4.8) the corresponding equation of motion for the reduced density matrix σ(t). Let us introduce a projector operator:

Pρ(t) = ρ_B Tr_B[ρ(t)] = ρ_B σ(t).   (4.11)
One can immediately check that from the definition Eq.(4.11) it follows that

P² = P,   (4.12)
thus, P is indeed a projector. Let us also introduce, for brevity of notation, the corresponding Liouville operators:

L = L_A + L_B + L_AB.   (4.13)
For a Liouville operator L and an operator F we can always write:

exp(−iLt)F = exp(−iHt/ħ) F exp(iHt/ħ).   (4.14)
We start from the separation of the Liouville equation into two equations:

iħ ∂(Pρ)/∂t = PLPρ + PL(1 − P)ρ,
iħ ∂((1 − P)ρ)/∂t = (1 − P)LPρ + (1 − P)L(1 − P)ρ.   (4.15)
It is easy, by summation of both equations, to obtain again the Liouville equation Eq.(4.8). We can write

iħ ∂σ/∂t = L_A σ + Tr_B[L_AB θ(t)],
iħ ∂θ/∂t = L_AB ρ_B σ + (L_A + L_B + P′L_AB) θ,   (4.16)

where we use the notation

(1 − P)ρ(t) = θ(t),   (1 − P) = P′.   (4.17)
Now we just formally solve the second equation in (4.16) and obtain

θ(t) = Z(t)θ(0) − (i/ħ) ∫₀^t dτ Z(t − τ) L_AB ρ_B σ(τ),   (4.18)

where Z(t) = exp(−i(L_A + L_B + P′L_AB)t/ħ). Now let us substitute the solution Eq.(4.18) into the first equation in Eq.(4.16), and we arrive at
where Z(t) = exp(—1(CA + CB + P'CABY/^)Now let us substitute the solution Eq.(4.18) into the first equation in Eq.(4.16), and we arrive to iK-£ = CA<J + TrB[CAB(Z(t)a)
-
%
-J dTCABpBa{T)Z{t
- r)].(4.19)
Note, that this equation is still exact. In order to deal with it, let us consider the second-order perturbation, which gives

iħ ∂σ/∂t = L_A σ − (i/ħ) ∫₀^t dτ Tr_B[L_AB exp(−i(L_A + L_B)(t − τ)) L_AB ρ_B σ(τ)].   (4.20)
In many situations it is enough to consider the interaction between the systems A and B in the bilinear form:

H_AB = A(t)B(t),   (4.21)

where the operator A(t) belongs to the system A, and the operator B(t) corresponds to the system B. Then we can rewrite Eq.(4.19) in the interaction representation as

iħ ∂σ/∂t = −(1/ħ²) ∫₀^t dτ Tr_B[[A(t)B(0), [A(τ)B(τ − t), ρ_B σ(t)]]].   (4.22)
Note that while the original equation Eq.(4.8) for the full density matrix represents unitary evolution, Eq.(4.22) for the reduced density matrix σ is non-unitary. This is due to the irreversibility introduced by the trace operation over subsystem B and the partial loss of information. In our further calculations we traditionally denote the reduced density matrix by ρ, instead of the σ used in this section.
4.3 Modern variational approach to optimal control of quantum systems
Using the Lagrangian formalism for the control problem, it was shown [Ohtsuki (1999); Peirce (1988)] how to construct optimal external fields to drive a certain physical quantity, like the population of a given quantum state, to reach a desired value at a given time. Let us briefly describe this method. Following [Peirce (1988); Ohtsuki (1999)], let us consider the problem of finding an optimal control pulse shape that transfers a quantum system with dissipation from the initial into some "target" state. The optimal control we are going to implement should be performed in the shortest possible time for a given pulse energy. Let the Hamiltonian of the controlled system interacting with an external laser field E(t) be given by

H = H₀ + μE(t),   (4.23)
where H₀ is the Hamiltonian of the unperturbed system (for example, a molecular Hamiltonian), μ is the dipole moment operator and E(t) is a control field. Since the system is in contact with the environment, it can be characterized in terms of the reduced density matrix ρ(t), and the time evolution of the system is described by the Liouville equation:

iħ ∂ρ/∂t = L_tot ρ = [H, ρ] + Γ̂ρ.   (4.24)
Here the operator Γ̂ describes relaxation processes in the system. Let us assume that the initial density matrix is given by ρ(0) = ρ₀, and the target state is given by ρ(T) = ρ_T. The main idea is that the optimal control problem can be formulated in terms of the maximization of the following functional

J = Tr[ρ_T ρ(T)] + λ ∫₀^T E²(t) dt + ∫₀^T Ξ(t)(iħ ∂/∂t − L_tot)ρ(t) dt,   (4.25)

where λ is a Lagrange multiplier and Ξ(t) is a Lagrange multiplier density.
The first term in Eq.(4.25) maximizes the overlap with the target state, the second term bounds the energy of the control pulse, and the third term ensures that the evolution of the system obeys the Liouville equation. The variation with respect to the density matrix ρ(t) gives the Euler–Lagrange equation for the multiplier density:

iħ ∂Ξ/∂t = L_tot† Ξ,   (4.26)
where † denotes Hermitian conjugation. The terminal condition is Ξ(T) = ρ_T. After some manipulations one obtains for the control field

E(t) = ⟨Ξ|M|ρ⟩ / (2λ),   (4.27)

where the operator M is given by the commutator with the dipole moment operator, Mf = [μ, f]. By substituting Eq.(4.27) into Eqs.(4.24),(4.26) we finally obtain a closed system of equations:

iħ ∂ρ/∂t = Lρ − (1/2λ) ⟨Ξ|M|ρ⟩ Mρ,
iħ ∂Ξ/∂t = L†Ξ − (1/2λ) ⟨Ξ|M|ρ⟩ M†Ξ,   (4.28)

with the initial state ρ(0) = ρ₀ and the terminal state Ξ(T) = ρ_T. Here we use the notation Lf = [H₀, f] + Γ̂f. In order to solve these equations, sophisticated numerical procedures are necessary.

4.3.1 An alternative analytical theory
Unfortunately, by using the above described procedure it is not possible to obtain an analytical solution even for the simplest control problems. Here we develop an alternative new approach that permits us to obtain solutions in analytical form in some simple cases. The obtained solutions represent global extrema of the considered control problems within certain approximations. In order to solve a problem analytically one has to make some reasonable assumptions. Let us limit ourselves to the case of monochromatic control fields only, E(t) = V(t) sin(ωt). This means that we fix the carrier frequency ω and search for the optimal shape (envelope) of the field V(t). The modulation of the field amplitude should be slow enough, so that it does not affect the assumption of monochromaticity. In the next pages we are going to discuss when such an assumption is valid. Thus, we are not
going to consider any frequency modulation in our theory. For simplicity we consider in this section only the case of one-photon resonance; however, as we show later, this theory can also be applied when a multi-photon resonance takes place in the system. The method consists of two steps: (1) Under certain conditions we are going to derive an approximate analytical solution for the density matrix ρ(V(t), t) of a controlled quantum system that satisfies the Liouville equation with dissipative terms. This solution has a relatively simple functional dependence on the shape of the control field. (2) With the help of this solution we derive an explicit ordinary differential equation, which is in fact the Euler–Lagrange equation, for the optimal control field. Note, that one can guess the minimum order of the Euler–Lagrange equation from some general physical arguments. Since the memory effects are expected to be important, one should search for a differential equation containing both the pulse area θ [Shore (1989)], with θ defined as
(4.29)
and its time derivative θ̇ = μV(t), where V(t) is the external field envelope and μ is the dipole matrix element of the system. Therefore, for the case of optimal control of dynamical quantities at a given time t₀, the differential equation satisfied by θ(t) must be at least of second order to fulfill the initial conditions θ(t₀) and θ̇(t₀) = μV(t₀). In the same way, the control of time averaged quantities over a finite time interval [t₀, t₀ + T] with boundary conditions requires a differential equation of at least fourth order for θ(t), due to the boundary conditions for θ(t) and θ̇(t) at t₀ and t₀ + T. It will be shown below that for certain quantum systems with decoherence a fourth-order differential equation for the control fields arises naturally from the variational approach as the Euler–Lagrange equation. Let us consider a quantum-mechanical system which is in contact with an environment and interacts with an external field E(t) = V(t) cos(ωt). Here V(t) is a pulse shape (or "envelope") and ω is a carrier frequency. The evolution of such a system obeys the quantum Liouville equation for the
density matrix ρ(t) with dissipative terms. The control of a time averaged dynamical quantity requires the search for an optimal shape V(t) of the external field. Thus, in order to obtain the optimal shape V(t) on a finite control time interval [0, T] we propose the following Lagrangian (which is different from the Lagrangian proposed, for example, in [Judson (1992)]):

L = ∫₀^T Q(t)[i ∂ρ(t)/∂t − Z(t)ρ(t)] dt + β ∫₀^T C₁(ρ(t), V(t), dV(t)/dt) dt,   (4.30)

where β is a Lagrange multiplier and Q(t) is a Lagrange multiplier density. Note, that atomic units ħ = m = e = 1 are used. The first term in Eq.(4.30) ensures that the density matrix satisfies the quantum Liouville equation with the corresponding Liouville operator Z(t):

i ∂ρ(t)/∂t = Z(t)ρ(t).   (4.31)

We assume that ρ(t = 0) = ρ₀ is the density matrix corresponding to the initial conditions. The second term in Eq.(4.30) explicitly includes the description of the optimal control problem. The functional density C₁ is given by

C₁(ρ(t), V(t), dV(t)/dt) = C_ob(ρ(t)) + λV²(t) + λ₁(dV(t)/dt)²,   (4.32)
where λ and λ₁ are Lagrange multipliers. C_ob(ρ(t)) refers to a physical quantity to be maximized during the control time interval. Control at a given time T can be obtained as a special case:

∫₀^T C_ob(ρ(t)) δ(t − T) dt = C_ob(ρ(T)),   (4.33)
where δ(t) is the Dirac delta function. The second term in Eq.(4.32) imposes a constraint on the total energy E₀ of the control field:

2 ∫₀^T E²(t) dt ≈ ∫₀^T V²(t) dt = ∫₀^T (θ̇²(t)/μ²) dt = E₀.   (4.34)

The third term in Eq.(4.32) represents a further constraint on the properties of the pulse envelope. The requirement

∫₀^T (dV(t)/dt)² dt = R,   (4.35)
where R is a positive constant, bounds the absolute value of the time derivative of the pulse envelope dV(t)/dt, and therefore excludes infinitely narrow or very abrupt, step-like solutions, which cannot be achieved experimentally. Let Δ be the minimal experimentally achievable duration of the pulse; then Δ⁻¹ ∝ R (see Fig.4.1). The above formulated control problem is
Fig. 4.1 Constraint on the minimal width Δ of the envelope V(t) of the optimal pulse (see Eq.(4.35)).
rather complicated due to the nonlinearity of the functional C_ob(ρ(t)) and the time dependence of the operator Z(t) in the Liouville equation Eq.(4.31), which leads to a nontrivial dependence of the density matrix ρ(t) on the field shape V(t). However, we shall show that under certain conditions it is possible to obtain an analytical solution for V(t). A formal solution of the Liouville equation Eq.(4.31) can be written in the time-ordered form:
ρ(t) = T̂ exp(−i ∫₀^t Z(t′) dt′) ρ₀,   (4.36)
where T̂ is the time ordering operator. If the applied field is in resonance with the controlled system, one can eliminate the fast-oscillating terms like exp(iωt) and exp(−iωt) in the operator Z(t). This approximation is usually called the Rotating Wave Approximation (RWA), and one can learn more about its validity in [Shore (1989)]. So, in the new variables (the so-called rotating basis) the evolution operator Z(V(t)) depends explicitly on time solely through the field envelope V(t).
Then, using the adiabatic approximation, one can neglect the time ordering in Eq.(4.36). The essence of the adiabatic approximation is as follows [Akulin (1992)]. Let us suppose that we need to solve a system of linear differential equations:

dρ(t)/dt = Z(t)ρ(t),   (4.37)

where ρ(t) is an n-dimensional vector, while Z(t) is a time-dependent n × n matrix. We choose initial conditions ρ(t)|_{t=t₀} = ρ(t₀). If the eigenvectors a_n(t) and eigenfrequencies ω_n(t) of the matrix Z(t) vary slowly with t, i.e.

dω_n(t)/dt ≪ (ω_n(t))²,   d|a_n(t)|/dt ≪ |ω_n(t)| |a_n(t)|,   (4.38)
then the adiabatic approximation of the solution of the set of linear equations Eq.(4.37) may be represented in the form:

ρ_n = a_n exp(∫_{t₀}^{t} ω_n(t′) dt′).   (4.39)
Indeed, let us substitute Eq.(4.39) into Eq.(4.37) and get

ȧ_n(t) exp(∫_{t₀}^{t} ω_n(t′) dt′) + ω_n(t) a_n(t) exp(∫_{t₀}^{t} ω_n(t′) dt′) = Z(t) a_n exp(∫_{t₀}^{t} ω_n(t′) dt′).   (4.40)
Given the slowness condition Eq.(4.38), the first term on the left-hand side of Eq.(4.40) is small compared to the second one and may be omitted. Then Eq.(4.40) only keeps the terms yielding the identity that determines the eigenfrequency ω_n. Under this approximation the density matrix ρ(t) depends only on the pulse area θ(t) and time t, so one can obtain an explicit expression for the functional Eq.(4.32):

C₁ = C₁(θ, θ̇, θ̈, t).   (4.41)
Now, if we assume that we know at least an approximate analytical solution of the Liouville equation, given in terms of the explicit functional
dependence ρ = ρ(θ, t), then the first term in Eq.(4.30) will be satisfied automatically, and δL ∝ δC₁. The corresponding extremum condition δC₁ = 0 yields the high-order Euler–Lagrange equation

d²/dt² (∂C₁/∂θ̈) − d/dt (∂C₁/∂θ̇) + ∂C₁/∂θ = 0,   (4.42)

or, using Eq.(4.32),

λ₁ d⁴θ/dt⁴ − λ d²θ/dt² + (μ²/2) ∂C_ob(ρ)/∂θ = 0.   (4.43)
In order to solve Eq.(4.43) one can assume natural boundary conditions θ(0) = θ̇(0) = θ̇(T) = 0, θ(T) = θ_T, which also ensure that the control field is zero at the beginning and at the end of the control interval: V(0) = V(T) = 0. The choice of the constant θ_T depends on the control problem. In general, the constants θ_T, R and E₀ can also be subjects of the optimization. Note, that Eq.(4.43) is highly nonlinear with respect to the pulse area θ(t) and usually can be solved only numerically. However, in the next paragraphs we are going to demonstrate how to obtain analytical solutions for some simple optimal control problems. Equation (4.43) is the central result of the presented theory and provides an explicit ordinary differential equation for the pulse area θ(t). Note, that this equation is only applicable if one is able to determine an explicit expression for the density matrix ρ = ρ(θ(t), t). In order to show that Eq.(4.43) can describe optimal control in real physical situations, we apply our theory to a two level quantum system with relaxation (see Fig.4.2).
4.4 An approximate analytical solution for the case of a two level system
To illustrate the above described method let us consider optimal control of a two level quantum system. We will treat the quantum evolution of the system under the RWA. We also include decoherence effects phenomenologically using relaxation and dephasing rates γ₁, γ₂. This description is equivalent to the Markovian approximation of the decoherence dynamics [Weiss (1981)]. One can represent the Hamiltonian of the system, which interacts with an
Fig. 4.2 A two level system with energy levels ε₁, ε₂ and with relaxation constants γ₁, γ₂ (see text) interacting with a resonant external field E(t). The occupations of the levels are described by the diagonal density matrix elements ρ₁₁ and ρ₂₂.
external electric field E(t) = V(t) cos(ωt) in the dipole approximation:

H = H₀ + H_int = [[ε₁, 0], [0, ε₂]] + [[0, −μ₂₁E*(t)], [−μ₁₂E(t), 0]],   (4.44)
where ε₁ and ε₂ are the energy levels of the quantum system, μ₁₂ = μ₂₁* is the dipole matrix element, and the sign * denotes complex conjugation. Without loss of generality one can set μ₂₁ = μ₁₂ = 1. This means that we measure the field amplitude in energy units. The evolution of the system is described by the Liouville equation for the density matrix ρ(t) [Allen (1975)]:

i ∂ρ₁₁/∂t = E*(t)ρ₂₁ − E(t)ρ₁₂ + iγ₁ρ₂₂,
i ∂ρ₂₂/∂t = E(t)ρ₁₂ − E*(t)ρ₂₁ − iγ₁ρ₂₂,
i ∂ρ₁₂/∂t = ω₀ρ₁₂ + E(t)(ρ₂₂ − ρ₁₁) − iγ₂ρ₁₂,
ρ₂₁ = ρ₁₂*,   (4.45)

where ω₀ = ε₂ − ε₁, and γ₁ and γ₂ are phenomenologically introduced relaxation and dephasing rates. Eqs.(4.45) are widely used for the description of different quantum effects, like, for instance, the interaction of atoms or molecules with a resonant laser field, the response of donor impurities in semiconductors
to Terahertz radiation [Cole (2001)], or the excitation of surface states into image-charge states at noble metal surfaces [Hertel (1996)]. Let us assume that the carrier frequency of the control field ω is chosen to be the resonant frequency, ω = ω₀. Using the Rotating Wave Approximation one can eliminate the fast oscillating terms exp(iωt) and exp(−iωt) and derive the following equations:

i ∂ρ₁₁/∂t = V(t)(ρ̃₂₁ − ρ̃₁₂) + iγ₁ρ₂₂,
i ∂ρ₂₂/∂t = V(t)(ρ̃₁₂ − ρ̃₂₁) − iγ₁ρ₂₂,
i ∂ρ̃₁₂/∂t = V(t)(ρ₂₂ − ρ₁₁) − iγ₂ρ̃₁₂,   (4.46)

where we use the following notation:

ρ̃₁₂ = ρ₁₂ exp(iωt),   ρ̃₂₁ = ρ₂₁ exp(−iωt).   (4.47)
Note, that ρ₁₁ + ρ₂₂ = 1 and ρ̃₂₁ = ρ̃₁₂*. We set the initial conditions as ρ₁₁ = 1, ρ₂₂ = ρ̃₁₂ = ρ̃₂₁ = 0. This corresponds to the case when the level with the lowest energy (the ground level) is initially occupied, and the excited level is empty. After the application of the RWA, Eqs.(4.46) have the form i dρ(t)/dt = Z(t)ρ(t) and are still difficult to integrate analytically, since the operator Z(t) explicitly depends on time, and for γ₁, γ₂ ≠ 0 the commutators [Z(t), Z(t′)] ≠ 0, t ≠ t′. However, one can note that if γ₁, γ₂ = 0, then [Z(t), Z(t′)] = 0, t ≠ t′, and the system Eqs.(4.46) is integrable analytically. To make further progress in the integration of Eqs.(4.46) let us use the Magnus series [Burrage (1998)]. We write the time ordered product Eq.(4.36) for the Liouville operator Z(t) as:

ρ(t) = exp(−iΩ(t)) ρ₀,   (4.48)

where

Ω(t) = ∫₀^t Z(s₁) ds₁ + (1/2) ∫₀^t [Z(s₁), ∫₀^{s₁} Z(s₂) ds₂] ds₁ + (1/4) ∫₀^t [Z(s₁), ∫₀^{s₁} [Z(s₂), ∫₀^{s₂} Z(s₃) ds₃] ds₂] ds₁ + ...   (4.49)
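The truncation used below, keeping only the first term of Eq.(4.49), can be tested numerically: it is exact whenever the Z(t) commute at different times, and only then. A small check with illustrative 2×2 generators (Taylor-series matrix exponential; not the author's code):

```python
import numpy as np

def expm_taylor(A, nterms=60):
    # matrix exponential via its Taylor series (fine for small-norm matrices)
    out = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, nterms):
        term = term @ A / k
        out = out + term
    return out

def propagate(Zfun, p0, t_end, n=2000):
    # RK4 integration of the time-ordered problem  dp/dt = -i Z(t) p
    dt = t_end / n
    p, t = p0.astype(complex), 0.0
    f = lambda tt, pp: -1j * (Zfun(tt) @ pp)
    for _ in range(n):
        k1 = f(t, p)
        k2 = f(t + dt / 2, p + dt / 2 * k1)
        k3 = f(t + dt / 2, p + dt / 2 * k2)
        k4 = f(t + dt, p + dt * k3)
        p = p + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return p

Z0 = np.array([[1.0, 1.0], [1.0, -1.0]], dtype=complex)
p0 = np.array([1.0, 0.0])

# commuting family Z(t) = cos(t) Z0: the first Magnus term alone is exact
exact = propagate(lambda t: np.cos(t) * Z0, p0, 1.0)
magnus1 = expm_taylor(-1j * np.sin(1.0) * Z0) @ p0.astype(complex)
```

For a non-commuting family, e.g. Z(t) = σ_x + t σ_z, the same truncation leaves a visible error, which is exactly the regime governed by condition (4.50) below.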
Thus, our assumption about the special form ρ = ρ(θ, t) is equivalent to neglecting all the terms in the series Eq.(4.49) except the first one. The condition for this to be a good approximation is

‖Z(s₁)‖ ≫ ‖[Z(s₁), ∫₀^{s₁} Z(s₂) ds₂]‖,   (4.50)
with s₁ ∈ [0, T]. Now we use the Mean Value Theorem: for a function f(x), if one can define f′(x) on the interval (a, b), then

f(b) − f(a) = f′(c)(b − a),   (4.51)

where c ∈ (a, b).
where c € (a, b). Applying the theorem to the term JQSl Z{s2)ds2 in Eq.(4.50) we obtain: |Z(Sl)||»
(4.52)
Z(Sl),Z(c)Sl
where c ∈ (0, s₁). Let us now apply the Mean Value Theorem to the term Z(c) again, which gives:

‖Z(s₁)‖ ≫ ‖[Z(s₁), (Z(s₁) + dZ(c₁)/dt (s₁ − c)) s₁]‖,   (4.53)

where c₁ ∈ (c, s₁).
where c\ G (c, s{). Replacing the terms s\ — c and si by their maximum possible value T and taking into account that Z(s{), Z(s\) = 0 we rewrite Eq.(4.53) as
^ 1 )|»THfz( Sl) ,%))l dt J\
(4.54)
Than, using the explicit form of the operator Z(t) we get : |^)|
> "^ " ^ 7 , ! , . = 1,2. dt
(4.55)
Here t, t_1 ∈ (0, T). This is the condition under which it is possible to integrate Eqs.(4.46) and use the solution for the subsequent substitution into the Euler-Lagrange equation Eq.(4.43). For a two level system under the RWA, the Liouville operator Z reads (see Eq.(4.46)) as
Z(t) = \begin{pmatrix} 0 & i\gamma_1 & -V(t) & V(t) \\ 0 & -i\gamma_1 & V(t) & -V(t) \\ -V(t) & V(t) & -i\gamma_2 & 0 \\ V(t) & -V(t) & 0 & -i\gamma_2 \end{pmatrix}.   (4.56)
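The statement that the operators Z(t) at different times commute only in the absence of relaxation can be verified directly from Eq.(4.56). The following sketch (an added illustration; the field values 0.3 and 0.9 are arbitrary stand-ins for V(t) and V(t')) evaluates the commutator with and without γ1, γ2:

```python
import numpy as np

def Z(V, g1, g2):
    """Liouville operator of Eq.(4.56) for a given field value V = V(t)."""
    return np.array([[ 0.0,  1j*g1, -V,      V     ],
                     [ 0.0, -1j*g1,  V,     -V     ],
                     [-V,    V,     -1j*g2,  0.0   ],
                     [ V,   -V,      0.0,   -1j*g2 ]], dtype=complex)

def comm(A, B):
    return A @ B - B @ A

# Two different field values standing in for V(t) and V(t'):
print(np.abs(comm(Z(0.3, 0.0, 0.0), Z(0.9, 0.0, 0.0))).max())    # -> 0
print(np.abs(comm(Z(0.3, 0.1, 0.05), Z(0.9, 0.1, 0.05))).max())  # nonzero
```

Without relaxation Z(t) = V(t)Z_v with a fixed matrix Z_v, so the commutator vanishes; the γ-dependent diagonal part destroys this property.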
Optimal control of quantum systems
The initial conditions for the density matrix ρ(t_0) are set as

\rho(t_0) = \begin{pmatrix} \rho_{11}(t_0) \\ \rho_{22}(t_0) \\ \tilde{\rho}_{12}(t_0) \\ \tilde{\rho}_{21}(t_0) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}.   (4.57)
Using Eq.(4.56) and truncating the series Eq.(4.49) after the first term, it is easy to obtain an approximate analytical solution for the occupation ρ22(t):

\rho_{22}(t) = 2\theta^2(t)\,G^{-1}\left(1 - \cosh(F)\exp(-(\gamma_1+\gamma_2)t/2) + (\gamma_1+\gamma_2)\,t\,\sinh(F)\exp(-(\gamma_1+\gamma_2)t/2)\,F^{-1}\right),   (4.58)

where F = \sqrt{(\gamma_1-\gamma_2)^2 t^2 - 16\theta^2(t)}/2 and G = \gamma_1\gamma_2 t^2 + 4\theta^2(t). Note that the approximate solution Eq.(4.58) becomes exact when γ1 = γ2 = 0 or for a constant amplitude of the control field V(t) = V_0, which is reflected by Eq.(4.55). The expression Eq.(4.58) has the form ρ = ρ(θ(t), t), and therefore we can apply Eq.(4.43) to solve the optimal control problem.
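As a consistency check of Eq.(4.58), the following sketch (illustrative Python, not from the book) evaluates the expression with a complex square root and confirms that for γ1 = γ2 = 0 it reduces to the exact RWA result sin²θ(t):

```python
import numpy as np

def rho22_approx(theta, t, g1, g2):
    """Approximate occupation of the excited level, Eq.(4.58)."""
    F = np.sqrt(complex((g1 - g2)**2 * t**2 - 16.0 * theta**2)) / 2.0
    G = g1 * g2 * t**2 + 4.0 * theta**2
    damp = np.exp(-(g1 + g2) * t / 2.0)
    # F is purely imaginary for g1 = g2, and nonzero whenever theta != 0
    val = 2.0 * theta**2 / G * (1.0 - np.cosh(F) * damp
                                + (g1 + g2) * t * np.sinh(F) * damp / F)
    return val.real

theta, t = 0.9, 1.0
print(rho22_approx(theta, t, 0.0, 0.0))   # -> sin(0.9)**2, the lossless limit
print(rho22_approx(theta, t, 0.2, 0.1))   # with relaxation gamma1 = 2*gamma2
```

For γ1 = γ2 = 0 one has F = 2iθ and G = 4θ², so cosh(F) = cos(2θ) and the bracket collapses to (1 − cos 2θ)/2 = sin²θ.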
4.5 Optimal control of a time averaged occupation of the excited level in a two-level system
In this section we study the influence of relaxation in the controlled system on the shape of the optimal field. First we perform numerical integration of the Euler-Lagrange equation (see Eq.(4.43)) for various relaxation parameters. The influence of boundary conditions on the shape of the control field is also investigated. Then we present both numerical and analytical studies of the control problem using a simplified Lagrangian. In order to make a systematic study of the problem of control over a finite time interval, let us consider optimal control of a two level system, as we have already mentioned above. We assume that initially the ground state of this system is occupied and the excited level is empty, ρ11 = 1 and ρ22 = 0. We are looking for the optimal shape of the field V(t) that maximizes the
mean (time-averaged) occupation of the excited state ρ22 over the time interval t ∈ [0, T], or in other words, that maximizes the functional:

n_2 = \frac{1}{T}\int_0^T \rho_{22}(t)\,dt.   (4.59)
Note that since ρ22(t) is proportional to the observed photo-current in Terahertz experiments on semiconductors [Cole (2001)], the value n_2 is proportional to the measured photo-charge. The resonant tunnelling photo-charge through an array of coupled quantum dots is also proportional to such an expression [Stoof (1996)]. Therefore, in Eq.(4.32) we consider the objective functional density:

\mathcal{L}_{ob}(\rho(t)) = \rho_{22}(t).   (4.60)
In the limit of weak relaxation in the system, the expression for the occupation of the excited state ρ22 takes the very simple form

\rho_{22} = \sin^2(\theta(t)),   (4.61)

within the Rotating Wave Approximation. In this limit the corresponding Euler-Lagrange equation for the optimal pulse envelope θ(t) (see Eq.(4.43)) reads

-\lambda_1\frac{d^4\theta}{dt^4} + \lambda_2\frac{d^2\theta}{dt^2} - \frac{\mu^2}{T}\sin(2\theta) = 0.   (4.62)
We use the boundary conditions θ(0) = θ'(0) = θ'(T) = 0 and θ(T) = π/2, which guarantee that the field V(t) is zero at the beginning and at the end of the control interval, and that the pulse area is π/2, so as to maximize the population ρ22. Eq.(4.62) is known as a limiting case of the dispersive sine-Gordon equation and is exactly integrable [Bogdan (2001)]. In the presence of decoherence (γ1T, γ2T ≠ 0), instead of the solution Eq.(4.61) we use the explicit formula for ρ22(t) given by Eq.(4.58), and then solve the corresponding Euler-Lagrange equation by standard numerical techniques, using the fourth order Runge-Kutta method and the "shooting" method for the boundary value problem. We have calculated the optimal pulse area θ(t) and, correspondingly, the optimal pulse shape V(t) for different values of the relaxation constants γ1, γ2. We also considered different values of the pulse energy E_0 and the pulse curvature R of the control fields. For simplicity we choose the duration of the control interval T = 1.
Fig. 4.3 The optimal control field V(t) which maximizes the value of n_2 (see text) over the control time interval [0,1]. Solid line: solution of the Euler-Lagrange Eq.(4.62) for a system without relaxation (γ1T = γ2T = 0). The pulse energy is E_0 = 4.8 and the pulse curvature R = 182.2. Dashed line: optimal field for a system with relaxation (γ1T = 2γ2T = 0.2) with the same pulse energy and curvature R = 134.32.
In Fig.4.3 we show the optimal field V(t) for a two level system without relaxation (γ1T = γ2T = 0) and with relaxation (γ1T = 2γ2T = 0.2) for the same value of the pulse energy E_0. Note that in both cases the pulse maximum occurs near the beginning of the control interval. This solution leads to a rapid increase of the population ρ22(t) and therefore to a maximization of the time averaged occupation n_2. We found that the optimal pulse obtained for the system with relaxation improves the value of n_2 by 50% with respect to a Gaussian-like pulse and by 25% compared to a square pulse of the same energy. We also find that in the case of a system without relaxation the pulse vanishes faster once the population inversion has been achieved, whereas for a system with relaxation the optimal pulse amplitude decreases more slowly. One can suggest a simple physical interpretation of this result. Using Eq.(4.58) we can estimate that in the presence of the external field the occupation of the upper level ρ22(t) decays exponentially with the rate (γ1 + γ2)/2 = 3γ1/4, which is slower than the free decay of the system with the rate γ1. Therefore, relaxation effects can be partially compensated by a
Fig. 4.4 Dynamics of the population ρ22(t) corresponding to the fields shown in Fig.4.3 for a system without relaxation (thick solid line) and with relaxation (dash-dotted line: approximate formula given by Eq.(4.58); thin solid line: numerical solution of the Liouville equation Eq.(4.46)).
longer application of the field during the control interval. In Fig.4.4 we show the corresponding dynamics of the population of the excited level ρ22(t) for both cases. As we mentioned before, the expression given by Eq.(4.58) is exact for a system without relaxation within the RWA. However, for a system with relaxation it also agrees well with the numerical solution of the Liouville equation Eq.(4.46). This indicates that V(t) fulfills the condition given by Eq.(4.55) on the control interval [0, T], and therefore our approach is self-consistent. Let us now examine the case of strong decoherence, for which the controlled system is close to the "saturation" regime ρ11 = ρ22 = 1/2. In Fig.4.5 we compare the optimal field V(t) for a two level system without relaxation (γ1T = γ2T = 0) and with relaxation (γ1T = 2γ2T = 5). We choose a relatively large pulse energy E_0 of the control pulse in the case with relaxation in order to compensate the rapid decay of the excited level. Indeed, in this case control is not coherent over the whole control interval: Tγ1,2 > 1. Since the optimal pulse amplitude does not change significantly over the time interval, one can conclude that in the strong relaxation regime a pulse with a constant envelope V(t) = V_0 will be a good approximation for the optimal
Fig. 4.5 Optimal control field V(t) for a two level system without relaxation (γ1T = γ2T = 0, solid line). The pulse energy is E_0 = 4.57 and the pulse curvature R = 128.4. Dashed line shows the optimal pulse for a system with relaxation (γ1T = 2γ2T = 5) with the energy E_0 = 53.54 and the curvature R = 808.8.
one. In Fig.4.6 we show the corresponding dynamics of the population of the excited level ρ22(t). Surprisingly, the analytical expression for ρ22(t) works well even for relatively large relaxation rates γ1, γ2. The explanation of this result is that the analytical expression for ρ22 given by Eq.(4.58) is exact for a pulse with a constant envelope V(t) = V_0. Because the optimal field envelope is almost constant (except for intervals near the boundaries), this expression works well here too. We found that the value of the averaged occupation n_2 increases monotonically with the energy E_0 of the optimal control field, both for a system with and without relaxation. In order to illustrate this, we plot in Fig.4.7 the averaged occupation n_2 as a function of the energy E_0 and the curvature R of the optimal fields for a system without relaxation (γ1T = γ2T = 0). Note that pulses of fixed shape (for instance Gaussian) would show an oscillating behavior with increasing energy due to Rabi oscillations [Speer (2000)]. The monotonic increase of n_2 with E_0 is a feature which characterizes the optimal pulses.
Fig. 4.6 Dynamics of the population ρ22(t) for a system without relaxation (thick solid line) and with relaxation (dash-dotted line: formula (4.58); thin solid line: numerical solution of the Liouville equation).
4.5.1 Analytical solution for the optimal control field
In order to obtain an analytical solution for the optimal control field, we analyze the problem in certain limiting cases. For instance, if γ1,2T ≪ 1, relaxation can be neglected and one can consider the simplified Lagrangian density

\mathcal{L} = \sin^2(\theta(t))/T + \lambda\dot\theta^2(t)/\mu^2,   (4.63)

while the corresponding Euler-Lagrange equation is given by

2\lambda\ddot\theta(t) - \mu^2\sin(2\theta(t))/T = 0.   (4.64)
The second order differential equation Eq.(4.64) requires two boundary conditions, for which we choose θ(0) = 0 and θ(T) = π/2 (which ensure the population inversion). Equation (4.64) is similar to the equation for a mathematical pendulum and can be solved analytically in terms of special functions.
Fig. 4.7 Dependence of the time-averaged occupation n_2 on the pulse energy E_0 and the pulse curvature R (see Eq.(4.35)) of the optimal pulses.
The first integral of Eq.(4.64) reads:

-\lambda\dot\theta^2(t)/\mu^2 + \sin^2(\theta(t))/T = C,   (4.65)

where C = -\lambda V^2(0) is a constant of integration and V(0) corresponds to the initial amplitude of the control field at t = 0. After the second integration we obtain:

\int_0^{\theta}\frac{d\theta'}{\sqrt{(\mu^2/\lambda)(\sin^2(\theta')/T - C)}} = t,   (4.66)
and, finally,

\theta = \mathrm{am}(\mu V(0)t, k).   (4.67)

Here am is the amplitude of the Jacobian elliptic function [Abramovitz (1972)]. The shape of the optimal control field V(t) is obtained through differentiation:

V(t) = \dot\theta(t)/\mu = V(0)\,\mathrm{dn}(\mu V(0)t, k),   (4.68)
where dn is the Jacobian elliptic function [Abramovitz (1972)] and the parameter k is determined by k^2 = -1/(\lambda T V^2(0)). With the help of the condition on the pulse energy E_0 (see Eq.(4.34)) we can determine the Lagrange multiplier λ. Note that the control field must be nonzero at the beginning: V(0) ≠ 0. If we consider the limit C → 1, then we find that V(T) → 0. In this case, Eq.(4.68) can be significantly simplified to

V(t) = \frac{1}{\mu}\frac{d}{dt}\arccos\left[\frac{2\exp(\mu V(0)t)}{1 + \exp(2\mu V(0)t)}\right].   (4.69)
Let us now consider a control problem with the opposite goal, namely the problem of minimizing the time-averaged occupation of the excited level ρ22. For the sake of an analytical solution we consider an infinite control

Fig. 4.8 Solid line: optimal control field V(t) which minimizes the value of n_2 (see text). Dashed line: corresponding dynamics of the occupation of the upper level ρ22(t).

time interval t ∈ (−∞, ∞), with natural boundary conditions V(−∞) = V(+∞) = 0 and θ(+∞) = π, which means that the system returns to the ground state after the interaction with the control field. Integrating the corresponding Euler-Lagrange equation, one can easily obtain that the optimal field in this case is described by a soliton solution
(see Fig.4.8):

V(t) = \frac{1}{\sqrt{\lambda}\cosh\!\left(t/(\sqrt{\lambda}/\mu)\right)}.   (4.70)

Using the normalization condition for the pulse energy E_0 given by Eq.(4.34), we obtain that

\lambda = \frac{4}{\mu^2 E_0^2}.   (4.71)
This result means that the soliton Eq.(4.70) is not only a solution that propagates without losses (since Eq.(4.64) does not contain spatial variables, we can treat it as the sine-Gordon equation in the limit of an optically thin medium), but it also minimizes the possible energy losses, which are proportional to n_2 = \int_{-\infty}^{\infty}\rho_{22}(t)\,dt, in the limit of weak relaxation. Using other asymptotic values of the pulse area θ(+∞) = Nπ, where N is an integer, N = 2, 3, ..., one can immediately reproduce the 2π, 3π, ... soliton solutions. All of them are optimal pulses corresponding to different values of the applied pulse energy E_0.
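The quoted properties of the soliton are easy to confirm numerically: its pulse area μ∫V dt equals π, and its energy evaluates to E_0 = 2/(μ√λ), consistent with the normalization used in the text. A sketch with illustrative values of μ and λ:

```python
import numpy as np

mu, lam = 1.0, 0.25                    # illustrative values of mu and lambda
t = np.linspace(-40.0, 40.0, 400001)
dt = t[1] - t[0]
V = 1.0 / (np.sqrt(lam) * np.cosh(mu * t / np.sqrt(lam)))   # soliton, Eq.(4.70)

area = mu * np.sum(V) * dt             # pulse area theta(+inf)
E0 = np.sum(V**2) * dt                 # pulse energy, Eq.(4.34)
print(area)                            # -> pi (a pi-pulse)
print(E0, 2.0 / (mu * np.sqrt(lam)))   # both -> 4.0
```

The area π reflects the boundary condition θ(+∞) = π, and inverting E_0 = 2/(μ√λ) reproduces λ = 4/(μ²E_0²).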
4.5.2 Optimal control at a given time
Now we demonstrate that there is an essential difference between the optimal control problem with the goal of maximizing an objective at a given time T and that of maximizing the same objective averaged over the control interval [0, T]. For this purpose, instead of Eq.(4.63), we consider the Lagrangian density:

\mathcal{L}_1 = \rho_{22}(T) + \lambda\dot\theta^2(t)/\mu^2,   (4.72)

which corresponds to the problem of maximizing the occupation of the upper level ρ22(T) at a given time T. In order to make a comparison with the solution given by Eq.(4.68), we set γ1 = γ2 = 0. Using the same procedure as for the derivation of Eq.(4.64), we obtain the corresponding Euler-Lagrange equation

2\lambda\ddot\theta(t) - \mu^2\delta(T - t)\sin(2\theta(t)) = 0,   (4.73)

where δ(t) is the Dirac delta function. One can check that the solution of Eq.(4.73) is an envelope V(t) with a constant amplitude:

V(t) = \frac{\pi}{2\mu T},   (4.74)
which reflects the fact that there is only one optimal pulse with energy E_0 = π²/(4Tμ²) which drives the occupation ρ22 to 1 at the given time T. This analytical result can be compared with a numerical solution of a similar problem obtained by Wusheng Zhu, Jair Botina, and Herschel Rabitz using an iterative numerical technique [Zhu (1998)]. From the comparison between the solutions Eq.(4.68) and Eq.(4.74) one can conclude that the formulation of control over a time interval is a nontrivial generalization of optimal control theory, and permits more detailed coherent control of quantum systems. It is important to point out that a Lagrangian density of the form of Eq.(4.63) or Eq.(4.72) always leads to a second order differential equation for the control field as long as the condition ρ = ρ(θ(t), t) is satisfied. Therefore, one cannot demand extra boundary conditions for the control field, for instance V(0) = V(T) = 0; otherwise one would obtain the trivial solution V(t) = 0, which is not consistent with the condition on the pulse energy (see Eq.(4.34)). Therefore, if conditions on V(0) and V(T) have to be imposed, a Lagrangian leading to at least a fourth order differential equation is necessary, as we have shown before. Now let us investigate the question of how decoherence affects the shape of the optimal control fields. In Fig.4.9 we plot the optimal control field V(t) which maximizes the simplified Lagrangian given by Eq.(4.63) for a two level system with (γ1T = 2γ2T = 0.2) and without (γ1 = γ2 = 0) relaxation. For the system with relaxation we use the analytical solution for the optimal control field given by Eq.(4.68). In both cases the field has its maximum value at t = 0 and exhibits a monotonic decay. As in the case of the solutions of Eq.(4.43), the control field is "broader" for the system with relaxation, in order to compensate the decay of the excited state.
In Fig.4.10 we plot the corresponding dynamics of the population ρ22(t). The overall behavior of ρ22(t) is similar to that of the populations shown in Fig.4.4. This means that the essential physics of the optimal control is already contained in the second order differential equation. The boundary conditions V(0) = V(T) = 0 change dramatically the shape of the optimal fields, but they do not affect significantly the dynamics (e.g. ρ22(t)) of the controlled system. In Fig.4.11 we also show results for the case of large decoherence, γ1T = 2γ2T = 5. As in Fig.4.9, the field envelope V(t) is maximal at t = 0 and exhibits a monotonic decay. As in Fig.4.5, the control field is close to constant in the strong decoherence regime. In Fig.4.12 we plot the corresponding
Fig. 4.9 Optimal control field V(t) which maximizes the time averaged occupation n_2 in a two level system. Solid line: system without decoherence (γ1T = γ2T = 0). The pulse energy is E_0 = 4.6 in both cases. Dashed line shows the optimal pulse for the system with relaxation (γ1T = 2γ2T = 0.2) with the same energy.
population ρ22(t) dynamics. As in Fig.4.6, the system is close to the saturation regime.
4.5.3 Estimation of the absolute bound for the control due to decoherence
It is known that decoherence in a system creates obstacles for optimal control [Ohtsuki (1999)]. In order to estimate quantitatively how relaxation and dephasing processes limit control of the time average n_2 of the occupation of the upper level, we analyze the occupation ρ22(t) (see Eq.(4.58)) in more detail. For a strong control field satisfying

\gamma_{1,2}\,t/\theta(t) \ll 1, \quad t \in [0, T],   (4.75)

the occupation ρ22(t) always lies under the curve

\rho_{22}^{max}(t) = (1 + \exp(-(\gamma_1+\gamma_2)t/2))/2.   (4.76)
This means that it exhibits an absolute upper bound. In order to illustrate this bound, we plot in Fig.4.13 the dynamics of the occupation ρ22(t) for
Fig. 4.10 Dynamics of the occupation ρ22(t) for a system without decoherence (thick solid line) and with relaxation (dash-dotted line: approximate analytical solution given by Eq.(4.58); thin solid line: numerical solution of the Liouville equation Eq.(4.46)).
40 randomly generated control pulses applied to a two level system with relaxation parameters γ1T = 2γ2T = 1, setting T = 1. Therefore, due to dissipative processes, the following inequality holds for the controlled averaged value of ρ22:

n_2 = \frac{1}{T}\int_0^T \rho_{22}(t)\,dt < \frac{1}{2} + \frac{1 - \exp(-(\gamma_1+\gamma_2)T/2)}{(\gamma_1+\gamma_2)T},   (4.77)
which is the absolute limit for the optimally controlled averaged upper-level occupation in a two level system with relaxation. Using the expression Eq.(4.77), one can investigate two limiting cases. For a system in the weak decoherence regime (γ1,2T ≪ 1), the maximum possible value of the controlled objective is achieved: n_2 = 1. In the strong relaxation limit γ1,2T ≫ 1, the system is in the saturation regime, so that the ground and excited levels are approximately equally occupied within the control interval, ρ11 ≈ ρ22 ≈ 0.5 and n_2 ≈ 0.5. It is not possible to overcome the limit given by Eq.(4.77) within the considered model (the Markovian approximation for the decoherence effects, described only by γ1, γ2, the RWA, and the adiabatic approximation).

Fig. 4.11 The optimal control field V(t) for a two level system without decoherence (γ1 = γ2 = 0, solid line). The pulse energy is E_0 = 20.50. Dashed line: optimal field for a system with decoherence (γ1 = 2γ2 = 5) with a pulse energy E_0 = 89.72.

However, the optimal pulse that satisfies Eq.(4.43) provides the highest possible value of the objective for a given pulse energy. For example, in comparison with pulses having a square amplitude or Gaussian pulses, the optimal fields obtained in our calculations lead to a value of n_2 that is up to 50% higher. With Eq.(4.77) we can, for example, determine the maximal possible lifetime for an image state at a Cu(111) surface which can be achieved by pulse shaping. According to Hertel et al. [Hertel (1996)], those states are characterized by γ1 = 5·10^13 s^-1 and γ2 = γ1/2. Thus, under the described assumptions, the theory predicts the minimum possible limit for the effective decay rate γ_eff ≥ (γ1 + γ2)/2 = 3.75·10^13 s^-1.
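The bound of Eq.(4.77) and its two limits can be evaluated directly. The sketch below (an added illustration) prints the weak and strong decoherence limits, together with the effective decay rate for the Cu(111) image-state rates quoted from [Hertel (1996)]:

```python
import numpy as np

def n2_bound(g1, g2, T):
    """Upper bound of Eq.(4.77) on the time-averaged occupation n2."""
    x = (g1 + g2) * T
    if x == 0.0:
        return 1.0                    # no decoherence: full control possible
    return 0.5 + (1.0 - np.exp(-x / 2.0)) / x

print(n2_bound(0.005, 0.005, 1.0))    # weak decoherence: close to 1
print(n2_bound(50.0, 50.0, 1.0))      # strong decoherence: close to 1/2

# Cu(111) image-state example [Hertel (1996)]:
g1 = 5e13                             # 1/s
g2 = g1 / 2
gamma_eff = (g1 + g2) / 2             # minimum possible effective decay rate
print(gamma_eff)                      # -> 3.75e13 1/s
```

Expanding the exponential for small x = (γ1+γ2)T reproduces the limit n_2 → 1, while for large x the second term vanishes and n_2 → 1/2.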
4.6 Optimal control of nanostructures: double quantum dot
During the same time period in which the design and control of femtosecond laser fields exhibited great success (the 1980s and 1990s), mesoscopic science
Fig. 4.12 Dynamics of the occupation ρ22(t) for a system without decoherence (thick solid line) and with decoherence (dash-dotted line: Eq.(4.58); thin solid line: numerical solution of the Liouville equation Eq.(4.46)).
advanced as a new area in physics and technology. Fabrication of mesoscopic systems and nanostructures (such as clusters, nanowires, quantum dots, etc.) and further investigation of their properties promise a technological breakthrough in many areas. For example, the further development of quantum dots (QD), which are often described as artificial solid-state atoms, provides the possibility of developing a new generation of semiconductor lasers. In these studies the semiconductor dots and rings are made from indium arsenide embedded in gallium arsenide [Groom (2002)]. They were grown using techniques developed within the past decade that allow much smaller nanostructures to be created. Such studies provide new perspectives on the internal quantum-mechanical workings of quantum dots. The ultimate goal of mesoscopic science is to create useful electronic and optical nano-materials that have been quantum-mechanically engineered by tailoring the shape, size, composition and position of various quantum dots and other nanostructures. Excited electronic states in quantum dots usually persist for a relatively long time because they interact in a very restricted way with their environment. Normally, such interactions lead to decoherence or destruction
Fig. 4.13 Dynamics of the occupation ρ22(t) for 40 randomly generated control pulses using γ1T = 2γ2T = 1 (thin solid lines). The thick solid line represents the bound for the possible values, ρ22^max(t) = (1 + exp(−(γ1+γ2)t/2))/2.
of the quantum state. As a result, quantum dots may provide an excellent solid-state system for exploring advanced technologies which are based on quantum coherence. For example, it may be possible to create and control superimposed or even entangled quantum states using highly coherent laser stimulation [Oliver (2002)]. External control of the full quantum wavefunction in a semiconductor nanostructure may even lead to revolutionary new applications, such as those involving quantum computing, making computation orders of magnitude faster than is possible today. Because the underlying principles of optimal control of quantum dynamics are broadly applicable, it is a very attractive idea to combine femtosecond dynamics and mesoscopic physics and to perform coherent control on mesoscopic objects. For such systems the electronic degrees of freedom might offer the possibility of control by pulse shaping. An important requirement for optimal control of quantum dynamics is the existence of phase coherence over a time range comparable to the duration of the pulse sequence; in terms of the decoherence rates and the control interval T:

\gamma_{1,2}T \ll 1.   (4.78)
This condition can be fulfilled by mesoscopic systems like quantum dots. Many of the quantum-dot systems currently being studied have the potential to be combined into molecular complexes with one- or two-dimensional structures. One can imagine growing these solid-state "atoms" or "molecules" within structures containing electronic or magnetic gates and optical cavities, perhaps all connected together by quantum wires. One of the primary interests of scientists is the investigation of the currents in such systems resulting from the interaction with external coherent fields. Coherent control of carrier dynamics in mesoscopic systems using external ultrashort time-dependent fields has become a subject of active research in recent years [Cole (2001); Bonadeo (1998); Heberle (1999); Dupont (1995)]. In particular, the study of photon induced and photon suppressed quantum dynamical tunnelling [Grifoni (1998)] has attracted much attention due to potential applications of these effects in quantum computing. Sequences of control pulses of different durations affect the systems in different ways and permit some restricted manipulation of physical quantities, like the photon-induced current [Cole (2001); Heberle (1999)]. In the next section we study the possibility of optimal control in nanostructures. Using a double quantum dot system as a model device, we determine optimal control fields which maximize the transferred charge in the system. The optimal solutions were obtained numerically using a genetic algorithm, as described in chapter 2. We also investigate the influence of decoherence in the system on the shape of the optimal fields.
4.6.1 The optimal field for the control of the photon assisted tunnelling between quantum dots
Now let us investigate optimal control of the carrier dynamics in nanostructures. As a model device we consider an electron pump based on resonant photon-assisted tunnelling through a double quantum dot [Stafford (1996)], [Grigorenko (2002)]. We search for the shape of the optimal control field that maximizes the transferred charge in the system. First, we give a physical picture and equations of motion in terms of the reduced density matrix. Then we formulate the control problem, which is reduced to the control of a time averaged current between two quantum dots. We also briefly describe the method that we employed for the numerical solution. We consider a device that is a double quantum dot coupled to two metallic contacts (reservoirs) and configured as an electron pump as described in [Stafford (1996)]. This device is illustrated in Fig.4.14. The double quantum dot can be modeled by only two non-degenerate and weakly coupled electron levels with energies ε1 and ε2. By sweeping the gate voltages one can vary Δε = ε2 − ε1. The bonding and antibonding states, which are superpositions of the wavefunctions corresponding to an electron in the left or in the right dot, have an energy splitting of
\Delta E = E_{antibonding} - E_{bonding} = \sqrt{\Delta\varepsilon^2 + 4d^2},   (4.79)
where d is the tunnel coupling between the two dots. The left quantum dot ("1") is connected to the reservoir on the left, and the right quantum dot ("2") is coupled to the right reservoir. The applied voltage is biased in such a way that the chemical potential of the right reservoir μ_R is lower than that of the left reservoir μ_L. Therefore, in the absence of external perturbation, level "1" is occupied whereas level "2" is empty. Since we also assume that the coupling between the quantum dots is very weak, no current flows in the absence of external fields. If an external resonant gate voltage is applied to the system, then the double quantum dot works as an "electron pump": Rabi oscillations of the electron occupations occur between levels 1 and 2, and electrons can tunnel from the left to the right reservoir [Stafford (1996); Speer (2000)]. The Hamiltonian of the double quantum dot coupled to the external field F(t) = V(t)cos(ωt), which causes the on-site energies to oscillate against each other, can be expressed as
H_{DQD} = \sum_{i=1}^{2}\varepsilon_i(t)\,c_i^+ c_i + d\,(c_1^+ c_2 + c_2^+ c_1),   (4.80)
where c_i^+ (c_i) is the creation (annihilation) operator for an electron on dot i, and the diagonal matrix elements are given by ε_i(t) = (−1)^i (Δε + F(t))/2. The intradot interactions are absorbed in the on-site energies, which are assumed to be non-degenerate. We are going to find the time-dependent optimal amplitude V(t) which maximizes the transferred charge in the system. The Hamiltonian for the metallic reservoirs and the tunnel barriers is
Fig. 4.14 Illustration of the "electron pump" device: a double quantum dot coupled to contacts. An electron can tunnel from the left contact to the left quantum dot. If the resonant control field is applied, the electron can jump to the right quantum dot and can further tunnel to the right contact.
given by [Stafford (1996)]:

H_{RT} = \sum_{k,\ell=L,R}\varepsilon_{k\ell}\,c_{k\ell}^+ c_{k\ell} + \sum_k W_{kL}(c_{kL}^+ c_1 + c_1^+ c_{kL}) + \sum_k W_{kR}(c_{kR}^+ c_2 + c_2^+ c_{kR}) + U n_1 n_2.   (4.81)
Here c_{kℓ}^+, with ℓ = L, R, creates an electron of momentum k in reservoir ℓ. The quantities W_{kℓ} represent the tunnel matrix elements between the reservoirs and the quantum dots, U is the magnitude of the interdot electron-electron repulsion, and the occupation operators n_1 and n_2 are given by n_1 = c_1^+ c_1 and n_2 = c_2^+ c_2. For simplicity, spin is neglected. In the derivation of the equations of motion we have used the following approximation. We assume that the reservoir on the right has a broad band of unoccupied states, so that once an electron has jumped from the second quantum dot to the reservoir it cannot jump back any more. Thus, the time scale for the tunnelling process between the second dot and the reservoir on the right is determined by a transfer rate \Gamma_2 = 2\pi\rho_R(\varepsilon)|W_{kR}|^2, where ρ_R is the density of states in the right reservoir. Similarly, the transfer rate
to the left contact, \Gamma_1, is given by \Gamma_1 = 2\pi\rho_L(\varepsilon)|W_{kL}|^2. In order to describe the electron dynamics we use a density matrix approach similar to that derived in [Stoof (1996)]. For the Hamiltonian given above, the master equation for the density matrix ρ(t), which describes the evolution of the system, reads:

i\hbar\,d\rho_{11}/dt = i\Gamma_1\rho_0 + d(\rho_{12} - \rho_{21}),
i\hbar\,d\rho_{22}/dt = -i\Gamma_2\rho_{22} + d(\rho_{21} - \rho_{12}),
i\hbar\,d\rho_{12}/dt = -i\frac{\Gamma_2}{2}\rho_{12} + 2\varepsilon_1(t)\rho_{12} + d(\rho_{22} - \rho_{11}),
i\hbar\,d\rho_{21}/dt = -i\frac{\Gamma_2}{2}\rho_{21} + 2\varepsilon_2(t)\rho_{21} + d(\rho_{11} - \rho_{22}).   (4.82)
Equations (4.82) allow us to investigate the cases of zero and infinite interdot Coulomb repulsion U by choosing the proper expression for the quantity ρ_0. For U = 0 we write ρ_0 = 1 − ρ11, whereas the case U → ∞ requires ρ_0 = 1 − ρ11 − ρ22, which projects out double occupancies [Stoof (1996)]. The initial conditions are set as ρ11 = 1, ρ22 = 0, as can be inferred from Fig.4.14. We are going to consider photon assisted tunnelling when the one-photon resonance condition

\hbar\omega = \sqrt{\Delta\varepsilon^2 + 4d^2}   (4.83)
is satisfied. For simplicity we assume symmetric coupling to the reservoirs: \Gamma_2 = \Gamma_1. Equations (4.82) can be solved analytically in some limiting cases. For instance, considering only an isolated system of two quantum dots (\Gamma_1 = \Gamma_2 = 0) in an electric field periodic in time and with a constant amplitude V(t) = V_0, an electron placed on one of the dots will oscillate back and forth between the dots with the Rabi frequency

\hbar\Omega_R = 2d\,J_N\!\left(\frac{V_0}{\hbar\omega}\right),   (4.84)
where J_N is the Bessel function of order N, and N refers to the number of photons absorbed by the system in order to fulfill the resonance condition

N\hbar\omega = \sqrt{\Delta\varepsilon^2 + 4d^2}.   (4.85)
From the numerical integration of Eqs.(4.82) one can obtain the charge QT transferred from the left to the right reservoir due to the action of the
128
Optimal control and forecasting of complex dynamical systems
external field over a finite time interval [0, T]. For that purpose we write the current operator J = i(d/\hbar)(c_1^+ c_2 - c_2^+ c_1), which leads, in combination with Eqs.(4.82), to the time dependent average current

\langle I(t)\rangle = e\,\mathrm{Tr}\{\rho J\} = e\left(\frac{d\rho_{22}(t)}{dt} + \frac{\Gamma_2}{\hbar}\rho_{22}(t)\right),   (4.86)
where e is the electron charge and Tr is the trace operation. The net transferred charge from the left to the right quantum dot, Q_T, is obtained as

Q_T = \int_0^T dt\,\langle I(t)\rangle = \frac{e\Gamma_2}{\hbar}\int_0^T dt\,\rho_{22}(t) + e\rho_{22}(T).   (4.87)
Obviously, Q_T only represents the transferred charge to the right reservoir when Γ2 ≠ 0; otherwise it represents the charge in the right quantum dot. In Eq.(4.87), the second term indicates that, after the field is switched off (t > T), the charge remaining in the second quantum dot, eρ22(T), will be completely transferred to the right reservoir. It is important to point out that Q_T = Q_T[V(t)] is a nonlinear functional of the field amplitude V(t), and can exhibit different types of behavior depending on the form of V(t). For instance, if the external field has a Gaussian shape V(t) = V_0 exp(−t²/2τ²) of duration τ, then Q_T shows Stückelberg-like oscillations as a function of τ [Speer (2000)]. However, the Gaussian shape of V(t) does not necessarily maximize the transferred charge. Our goal is to find the optimal pulse shape V_opt(t) which maximizes Q_T, i.e., which satisfies Q_T^max = Q_T[V_opt(t)]. The problem of finding V_opt(t) is complicated because of the nonlinearity contained in ρ22(t). Therefore, we determine the shape of the optimal control fields numerically, using the genetic algorithm as a global search method. In our present approach a vector representing the "genetic code" is just the pulse shape V(t) discretized on the time interval t ∈ [0, T]. The fitness function, i.e. the functional to be maximized by the successive generations, is the transferred charge Q_T[V(t)] (see Eq.(4.87)). Note that the fitness function is in fact a combination of two parts: one is the value of a function at a given time, the second is an integral (time average). Thus we have a hybrid control problem. We assume that the control field is active (non-zero) within the time interval [0, T] and that it satisfies the boundary conditions V(0) = V(T) = 0. As initial population of field amplitudes satisfying the boundary conditions
Optimal control of quantum systems
we choose Gaussian-like functions of the form

V_j⁽⁰⁾(t) = I_j⁰ exp(−(t − t_j)²/τ_j²) t(t − T),   (4.88)

with random values for the position of the maximum t_j ∈ [0, T] and the duration τ_j ∈ (0, T]. The peak amplitude I_j⁰ for each pulse is calculated from the condition that all pulses must carry the same energy
E = ∫₀ᵀ E²(t) dt ≈ (1/2) ∫₀ᵀ V²(t) dt.   (4.89)
Equation (4.89) represents an integral isoperimetric constraint on a possible solution. The value of the pulse energy E can also be considered as a parameter to be optimized. Although the above formulated control problem (Eqs.(4.82), (4.87), (4.89)) applies to a rather simple quantum system, as we shall see, it is a rich problem: the optimal control fields have a nontrivial shape and induce complicated dynamics of the electron occupations in the system. The parameters used in the calculations are given in terms of the tunnelling matrix element d. The energy difference Δε = ε₂ − ε₁ must be much larger than d to ensure that the ground state of the double quantum dot is localized on the left side and the excited state is localized on the right side. This also leads to a sharper resonance behavior. Therefore we set Δε = 24d. In the calculations we use symmetric coupling of the quantum dots to the reservoirs, Γ₁ = Γ₂ = Γ, and compute the optimal field shape for different values of the coupling constant Γ = 0, 0.01d, 0.05d. It is important to point out that Γ must be smaller than d, so that the Rabi oscillations do not become over-damped. If Γ is large, the system saturates very rapidly to ρ₁₁ = ρ₂₂ = 1/2, and no interesting transient dynamics can be observed. Finally, we choose the control interval T = 100ħ/d, which is large enough to allow back and forth motion of the electrons between the quantum dots and is of the order of ħ/Γ. In these calculations we also set an implicit constraint on the minimal width of the pulse, in order to describe pulses which can be achieved experimentally. In our calculations this minimal width is naturally determined by the discretization of the time interval and by the smoothness parameters k_c and k_m of the crossover and mutation operations (see Eqs.(2.20),(2.21)). Let us first discuss the results for U = 0, i.e., neglecting the inter-dot Coulomb repulsion.
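The genetic-algorithm machinery described above can be sketched in code. Since Eqs.(4.82) are not reproduced in this section, the fitness below is only a toy stand-in for the transferred-charge functional (it merely rewards field energy deposited early in the interval); the population size, generation count and mutation scale are likewise illustrative assumptions, not the author's actual settings. What the sketch does implement faithfully is the discretized pulse as the "genetic code", Gaussian-like parents obeying V(0) = V(T) = 0 (cf. Eq.(4.88)), and renormalization of every candidate to a fixed energy (the isoperimetric constraint of Eq.(4.89)):

```python
import numpy as np

rng = np.random.default_rng(0)
T, NPTS, E0 = 100.0, 128, 1.0        # control interval (hbar/d), grid points, fixed pulse energy
t = np.linspace(0.0, T, NPTS)
dt = t[1] - t[0]

def renormalize(v):
    """Rescale so the energy (1/2) * sum V^2 dt equals E0 (cf. Eq. (4.89))."""
    e = 0.5 * np.sum(v**2) * dt
    return v * np.sqrt(E0 / e)

def gaussian_parent():
    """Gaussian-like seed obeying V(0) = V(T) = 0, cf. Eq. (4.88)."""
    tj, tauj = rng.uniform(0, T), rng.uniform(T / 20, T)
    return renormalize(np.exp(-((t - tj) / tauj) ** 2) * t * (T - t))

def fitness(v):
    # Toy stand-in for Q_T[V(t)]: reward field energy deposited early in the
    # interval; the real fitness propagates Eq. (4.82) and evaluates Eq. (4.87).
    return np.sum(np.exp(-t / (0.25 * T)) * v**2) * dt

def crossover(a, b):
    cut = rng.integers(1, NPTS - 1)      # one-point crossover on the discretized shape
    return renormalize(np.concatenate([a[:cut], b[cut:]]))

def mutate(v):
    bump = rng.normal(0.0, 0.05, NPTS) * t * (T - t) / (T / 2) ** 2  # endpoints stay zero
    return renormalize(v + bump * np.abs(v).max())

pop = [gaussian_parent() for _ in range(24)]
best = []
for gen in range(40):
    pop.sort(key=fitness, reverse=True)  # elitist truncation selection
    best.append(fitness(pop[0]))
    elite = pop[:8]
    pop = elite + [mutate(crossover(elite[rng.integers(8)], elite[rng.integers(8)]))
                   for _ in range(16)]
print(f"best fitness: gen 0 = {best[0]:.3f} -> gen 39 = {best[-1]:.3f}")
```

Because the best candidate is always carried over (elitism) and the fitness is deterministic, the best fitness is non-decreasing from generation to generation, mirroring the monotone convergence seen in Fig.4.20.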
From the elementary analysis of Eq.(4.82) it is clear that the optimal pulse first should be able to transfer an electron from the
Fig. 4.15 Optimal control field for the isolated double quantum dot (Γ = 0). (a) Solid line: reference square pulse of duration τ = πΩ⁻¹_max, intensity V₀ yielding the first maximum of J₁(V₀/ħω) (see text), and energy E₀. Dashed line: optimal pulse shape for the maximization of the charge transferred from the left to the right quantum dot. The pulse energy is E₀. (b) Corresponding time dependence of the occupation ρ₂₂(t) on the second dot for the pulses shown in (a).
left to the right quantum dot (inversion of the occupations) and then to keep this situation as long as possible. However, there are many different pulse shapes able to achieve this, and it is a priori not clear which one maximizes Q_T. There are three basic time scales involved in the problem: the control time interval T, the period of the Rabi oscillations t_R = ω_R⁻¹, and the decoherence time t_Γ = Γ⁻¹. For Γ → 0, for example, Eqs.(4.82) can be solved analytically in some limiting cases. If, for example, the external field is periodic in time, V(t) = V₀ cos ωt, with a constant amplitude V₀, an electron placed on one of the dots will oscillate back and forth between the
Fig. 4.16 Optimal pulse shape for the maximization of the charge transferred from the left to the right quantum dot. The pulse energy is E = 0.57E₀, Γ = 10⁻⁶d. Inset: the corresponding time dependence of the occupation ρ₂₂(t) on the second dot.
dots with the Rabi frequency [Zeldovich (1967); Tien (1963)]:

Ω = (2d/ħ) J₁(V₀/ħω),   (4.90)
where J₁ is the Bessel function of order 1, if the system absorbs one photon. Note that ω must fulfill the resonance condition ħω = √(Δε² + 4d²). The description of the tunnelling dynamics for pulses of varying intensity is more complicated, because the Rabi frequency changes in time. Thus, for pulses of constant amplitude there is an upper limit Ω_max for the Rabi frequency, which is obtained when the ratio x = V₀/ħω is the first maximum of the function J₁(x). Using this property we construct a reference pulse of square shape (V(t) = V₀ for 0 ≤ t ≤ τ and V(t) = 0 otherwise) with intensity V₀ as defined above and duration τ = πΩ⁻¹_max. In the following we will use the energy E₀ of such a reference pulse as the unit of pulse energies. In principle one would expect that the reference pulse defined above achieves exactly an inversion of the occupation in the double quantum dot within the shortest time (assuming only one-photon absorption). However, as we show below, such a pulse shape is not absolutely the best one, and one can do slightly better beyond the adiabatic approximation, which was assumed valid to derive the expression Eq.(4.90). In Fig.4.15 we compare the
Fig. 4.17 (a) Optimal pulse shape which induces maximal current for Γ = 0.01d. The pulse energy is E = 4.26E₀. (b) Corresponding behavior of ρ₂₂(t).
effect induced on the isolated double quantum dot (Γ = 0) by the reference square pulse with that induced by the optimal one calculated using the genetic algorithm and having the same energy E₀. As one can see in Fig.4.15(b), the genetic algorithm finds a pulse shape which induces a slightly faster transfer of the charge. This result inspires us to perform calculations for more complicated problems. In the following examples we also search for the minimal pulse energy E. Note that, if no constraints are imposed on the width of the pulses, a pulse of infinitely small width (delta-like pulse) should yield the maximal Q_T. Such a pulse would produce ρ₂₂(t) = 1 over the whole control time interval, leading to the maximum possible value Q_T^max = eΓT/ħ + e. However, the energy of such a pulse would diverge. Pulses with zero width and infinite energy cannot describe a realistic situation. Moreover, for such pulses the whole model would break down, since a very narrow pulse has a very large
Fig. 4.18 (a) Optimal pulse shape for Γ = 0.05d. The pulse energy is E = 1.21E₀. (b) The corresponding behavior of ρ₂₂(t).
spectral width and would excite many levels in each quantum dot. In Fig.4.16 we show the optimized pulse shape for the maximization of the charge transfer in the almost isolated double quantum dot (Γ = 10⁻⁶d). The optimal pulse excites the system at the beginning of the control time interval, inducing the inversion of the occupations. ρ₂₂(t) reaches the value 1 when the pulse goes to zero. Since ΓT/ħ is very small, this occupation remains constant in time. As a consequence, Q_T is maximized. From the comparison between Fig.4.15 and Fig.4.16 we see that the limitation of the minimal pulse width that we employed for these calculations leads to more symmetric and smooth optimal solutions. The corresponding evolution of the occupation of the second quantum dot is shown in the inset of Fig.4.16. In Fig.4.17 we show the optimal field envelope and the induced occupation ρ₂₂(t) dynamics in the case of weak coupling to the reservoirs, with coupling constant Γ = 0.01d. Note that the optimal field is structured as a sequence
Fig. 4.19 Illustration of the optimization process using genetic algorithms. Evolution of the "fittest" pulse shape for the maximization of the current for Γ = 0.05d.
of two pulses (see Fig.4.17(a)). The first one acts at the beginning and has the proper shape to bring the occupation of the second quantum dot to a value close to 1. However, since ΓT/ħ = 1, and according to Eqs.(4.82), ρ₂₂(t) starts to decrease as soon as the first pulse goes to zero. Shortly before the end of the control time interval the second pulse brings the occupation ρ₂₂(t) again to a high value (Fig.4.17(b)). The structure of the optimal pulse can be easily interpreted with the help of the expression of Q_T as a functional of ρ₂₂(t) (see Eq.(4.87)). The first pulse tends to keep the term (eΓ₂/ħ) ∫₀ᵀ dt ρ₂₂(t) as large as possible, whereas the second pulse acts to increase eρ₂₂(T). As a consequence, Q_T is maximized. Figure 4.18 shows the results for the same system, but with a larger coupling constant, namely Γ = 0.05d. As can be seen in Fig.4.18(a), in this case the optimal solution also exhibits pulses at the beginning and at the end of the control interval, but also a complicated sequence of pulses between them,
Fig. 4.20 Evolution of the transferred charge with increasing number of iterations of the GA for Γ = 0.05d.
which prevents ρ₂₂(t) from decaying to zero and "stabilizes" it around the saturation value 1/2, i.e., at the state where both dots are equally occupied (Fig.4.18(b)). If the coupling constant Γ is increased further, the field structure in the middle of the time interval becomes more important. In the limit of large Γ we expect a square pulse to maximize Q_T, since the system is overdamped and goes into the saturation regime; in this case the pulse only needs to transfer charge at some constant rate, with a value of the order of Γ. In order to illustrate the progress achieved by the genetic algorithm during the optimization process, we show in Fig.4.19 the shapes (envelopes) of the fittest pulses at different stages of the genetic evolution, for the case Γ = 0.05d. In Fig.4.19 (1st iteration) one of the pulses of the initial population ("parents") is plotted. Like all other "parents", this pulse is Gaussian-like (see Eq.(2.19)). The "best representative", shown in Fig.4.19, induces an excitation of the system at the very beginning of the control time interval. The successive application of the genetic operations improves the pulse shape and transforms the initial Gaussians into more complicated pulse sequences. As a result, the envelope of the fittest pulse of the tenth generation in Fig.4.19, for instance, exhibits several peaks. After just 30 iterations the pulse shape already exhibits most of the features of the optimal pulse, and after 50 iterations it converges to the optimal one. To illustrate this convergence we show in Fig.4.20 the evolution of the
Fig. 4.21 Optimized pulse shapes for Γ = 0.01d (solid line, pulse energy E = 4.84E₀) and Γ = 0.05d (dashed line, pulse energy E = 1.3E₀) for the maximization of the average occupation ρ̄₂₂ of the second quantum dot (see text).
transferred charge Q_T[V(t)] as a function of the number of iterations of the genetic algorithm. It is clear that after 50 iterations the pulse induces a transferred charge very close to the optimal one. For the sake of comparison, in Fig.4.21 we show the optimal pulse shape which maximizes solely the mean occupation of the second quantum dot, using as a fitness function the time average of the occupation of the second quantum dot:

ρ̄₂₂ = (1/T) ∫₀ᵀ dt ρ₂₂(t),   (4.91)
using Γ = 0.01d, 0.05d. Interestingly, these optimal pulses also have a two-peak structure. Indeed, the first pulse pumps one electron from the left to the right quantum dot. The reason for the position of the second pulse is that its efficiency in increasing ρ̄₂₂ is maximal when ρ₂₂(t) reaches its minimum, i.e. at the end of the control interval. This result is a direct consequence of the nonlinear character of the considered control problem. One should also mention that, as for the case of optimal control of a two-level quantum system, and for the above discussed problem as well, the optimal control fields have a tendency to be "broader" in the case of larger decoherence in the system; namely, there is some sequence of pulses not only at
Fig. 4.22 Dependence of the total transferred charge Q_T (diamonds) induced by the optimal pulse as a function of the coupling Γ to the leads for U = 0. The circles represent the value of Q_T for U → ∞.
Pulse shape          Q_T
optimal pulse        1.29
rectangular pulse    0.85
Gaussian pulse       0.74
constant pulse       0.77
the beginning and at the end of the control interval, but also some structure in between. From this example one can learn that this is a general property of control fields for systems with decoherence, and one finds this property for various functionals to be optimized. In order to investigate the influence of the inter-dot Coulomb repulsion U, we perform calculations similar to those described above, but for the case U → ∞, using the same set of coupling parameters Γ. We found that the repulsion between the quantum dots leads to a relatively smaller net transferred charge (see Fig.4.22). This is due to the fact that U → ∞ prevents double occupancies in the system. Therefore an electron from the left reservoir can jump into the double quantum dot only when the previous electron has already left the system and was transferred to the right reservoir.
Finally, to emphasize our results and to show that pulse shaping can indeed lead to a remarkable enhancement of the photon-assisted current through double quantum dots, we list in the Table the values of the transferred charge Q_T for the coupling constant Γ = 0.01d and pulses having different shapes V(t) but carrying the same energy E. As expected, the optimal pulse found by the genetic algorithm (already shown in Fig.4.17) induces clearly more transferred charge than pulses having other shapes. It is important to point out that the rectangular and Gaussian pulses mentioned in the Table are the fittest ones among rectangular and Gaussian pulses, respectively. Thus, the optimal pulse induces 1.74 times more charge than the best Gaussian pulse, and 1.5 times more charge than the best rectangular pulse.
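Before leaving this section, the reference square pulse that served above as the energy unit E₀ can be reconstructed numerically. The sketch below assumes ħ = d = 1 and evaluates J₁ from its integral representation J₁(x) = (1/π)∫₀^π cos(θ − x sin θ)dθ (to stay dependency-free, rather than using a special-function library), locates the first maximum of J₁, and obtains from Eq.(4.90) the maximal Rabi frequency Ω_max and the reference duration τ = πΩ⁻¹_max:

```python
import numpy as np

# J_1 from its integral representation: J_1(x) = (1/pi) int_0^pi cos(theta - x sin theta) dtheta
theta = np.linspace(0.0, np.pi, 2001)
dtheta = theta[1] - theta[0]
x = np.linspace(0.0, 3.0, 1501)
integrand = np.cos(theta[None, :] - x[:, None] * np.sin(theta)[None, :])
j1_vals = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]), axis=1) * dtheta / np.pi

i = int(np.argmax(j1_vals))           # first maximum of J_1 (the only one on [0, 3])
x_max, j1_max = x[i], j1_vals[i]

d = 1.0                               # tunnelling matrix element, hbar = 1 (assumed units)
omega_max = 2.0 * d * j1_max          # Omega_max = (2d/hbar) J_1(x_max), Eq. (4.90)
tau_ref = np.pi / omega_max           # reference square-pulse duration tau = pi / Omega_max
print(f"x_max = {x_max:.4f}, J1(x_max) = {j1_max:.4f}, tau_ref = {tau_ref:.3f} hbar/d")
```

The grid search recovers the well-known first maximum of J₁ near x ≈ 1.84; the reference pulse intensity is then V₀ = x_max·ħω, and its energy E₀ = V₀²τ/2 serves as the unit used throughout the figures above.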
4.7 Analytical theory for control of multi-photon transitions
Using the Floquet formalism and the adiabatic approximation, we develop a theory for optimal control of a quantum system interacting with an external electric field under multi-photon resonance. Optimal solutions for the case of control over a time interval are presented. We investigate how the order of the photon resonance affects the shape of the optimal control field.

In this chapter we have already demonstrated that some analytical solutions of optimal control problems can be obtained using the adiabatic approximation together with the variational formalism. As was shown in the previous sections, the effect of weak decoherence on the shape of optimal control fields can also be taken into account. All these analytical and semi-analytical results were obtained under the assumption of a weak control field and one-photon resonance. However, the weak response limit is not restricted to single-photon processes. For example, one can consider optical control of an excited state which is not accessible via a direct electric dipole transition from the initial ground state. In this case a two- or even multi-photon process will be responsible for the population transfer. Since multi-photon interaction usually implies significant nonlinearity of the corresponding system's dynamics, the integration is usually done numerically, because the overall mechanism is quite complex due to the multi-photon nature of the laser-driven dynamics. For example, in [Speer (2000)], using direct numerical integration of the equations of motion for a double quantum dot system, it was shown that under two-photon
resonance the population transfer can be more effective than in the one-photon case, even if one restricts oneself to simple Gaussian control pulses. That is why it is significant and highly desirable to develop an analytical theory which will give more insight into optimal control of quantum systems under multi-photon resonance. Our approach to the optimal control problem is based on the derivation of analytical solutions for the occupation of a quantum level using the adiabatic approximation, followed by the application of the variational principle. The corresponding Euler-Lagrange equations determining the optimal control field are derived in closed and integrable form, so we can determine qualitatively and quantitatively how the order of the multi-photon resonance changes the shape of the optimal control field. We consider an N+1 level quantum system, described by the Hamiltonian H₀ with eigenstates |i⟩ (i = 1, ..., N+1), which interacts with a classical laser pulse. The total Hamiltonian of the system is then of the form:

H(t) = H₀ + μ V(t) cos(ωt),   (4.92)

where μ is the dipole matrix and V(t) describes the envelope of the control field. Let us first focus on the case of a three-level system and two-photon resonance. Let E₁, E₂, E₃ be the corresponding energy levels, described by the amplitudes a₁, a₂, a₃, interacting with an external control field E(t) = V(t) cos(ωt). We assume the so-called "cascade" scheme of the levels; thus we take into account transitions between levels |1⟩ and |2⟩ and between levels |2⟩ and |3⟩, with the corresponding dipole matrix elements μ₁₂ and μ₂₃, and neglect the direct transition between |1⟩ and |3⟩ (μ₁₃ ≈ 0). The dynamics of the three-level quantum system interacting with the external field can then be described as:

iħ ȧ₁ = E₁ a₁ + μ₂₁ V(t) a₂ cos(ωt),
iħ ȧ₂ = E₂ a₂ + μ₂₁ V(t) a₁ cos(ωt) + μ₂₃ V(t) a₃ cos(ωt),   (4.93)
iħ ȧ₃ = E₃ a₃ + μ₃₂ V(t) a₂ cos(ωt).

We choose the initial condition a₁(0) = 1, a₂(0) = a₃(0) = 0. In order to obtain analytical solutions in closed form it is useful to assume that the field's envelope V(t) changes adiabatically slowly with time. Thus we restrict ourselves to optimal control fields for which modulation of the
field's envelope does not affect the carrier frequency ω. We also assume the case of two-photon resonance:

2ω = ω₁ + ω₂.   (4.94)
Here we use the notation ω₁ = E₂ − E₁, ω₂ = E₃ − E₂ for the transition frequencies. Because the Hamiltonian of the system is almost periodic in time (for a field envelope V(t) changing slowly on the time scale of ω⁻¹), we can use the Floquet approach (see, for example, the nice review [Grifoni (1998)]) in order to get analytical solutions for the amplitudes a₁, a₂, a₃. We represent the amplitude a_i (i = 1, 2, 3) as an infinite series of quasi-energetic functions a_{i,l}:

a_i(t) = Σ_{l=−∞}^{∞} a_{i,l}(t) exp(ilωt).   (4.95)
By substituting this expression into Eq.(4.93) and equating the terms with the same frequencies, one obtains an infinite system of coupled equations. By neglecting rapidly oscillating quantities and keeping only the equations containing the resonant terms a₁,₀, a₂,₋₁, a₃,₋₂, we obtain:

iħ ȧ₁,₀ = E₁ a₁,₀ + μ₂₁ V(t) a₂,₋₁/2,
iħ ȧ₂,₋₁ = (E₂ − ħω) a₂,₋₁ + μ₂₁ V(t) a₁,₀/2 + μ₂₃ V(t) a₃,₋₂/2,   (4.96)
iħ ȧ₃,₋₂ = (E₃ − 2ħω) a₃,₋₂ + μ₃₂ V(t) a₂,₋₁/2.
For a slowly changing field amplitude V(t) and under the condition of exact two-photon resonance, 2ω = E₃ − E₁, one can apply the adiabatic approximation and obtain an analytical expression for the occupation of the third level, ρ₃₃(t):

ρ₃₃(t) = |a₃(t)|² ≈ |a₃,₋₂(t)|² = sin²( (μ₂₁μ₂₃ / 8ħ²Δ₂) ∫₀ᵗ V²(t′) dt′ ),   (4.97)

where Δ₂ = ω₁ − ω is the (non-zero) detuning of the field with respect to the intermediate level. Note that the occupation of the system begins to oscillate between levels 1 and 3 (similarly to Rabi oscillations) with a frequency which is proportional to the square of the field amplitude; this is characteristic of the two-photon process. Within our approximations the occupation of the intermediate state, ρ₂₂ ≈ |a₂,₋₁|², remains approximately zero.
Let us now analyze the obtained result. Using the expression for ρ₃₃(t), Eq.(4.97), the first observation is that the final value of the occupation ρ₃₃ depends only on the pulse energy E ∝ ∫₀ᵀ V²(t′)dt′ and does not depend on the pulse area θ ∝ ∫₀ᵀ V(t′)dt′ of the control field, in contrast to the case of one-photon resonance. Fixing the final state of the system (for example, ρ₃₃(T) = 1) restricts the pulse energy E to certain values. But at the same time one can find an infinite set of field envelopes V(t) with the same energy E but different final pulse areas θ_T ∝ ∫₀ᵀ V(t′)dt′. This result is in dramatic contrast with the case of one-photon resonance, which we considered earlier [Grigorenko (2002)], where by fixing the final state and the pulse energy one can uniquely determine the shape of the driving field V(t). From Eq.(4.97), it is easy to see that the population of the third level becomes maximal if the energy of the pulse is

E_tot = 2πħ²Δ₂/(μ₂₁μ₂₃) + 4πħ²Δ₂ n/(μ₂₁μ₂₃),   n = 0, 1, ....   (4.98)
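Equations (4.97) and (4.98) can be checked against each other numerically: rescaling any envelope so that its energy equals the n = 0 value of Eq.(4.98) must, according to Eq.(4.97), produce full inversion ρ₃₃(T) = 1 regardless of the pulse area. A minimal sketch (the values ħ = μ₂₁ = μ₂₃ = 1 and Δ₂ = 5 are illustrative choices, not parameters from the text):

```python
import numpy as np

hbar, mu21, mu23, delta2 = 1.0, 1.0, 1.0, 5.0   # illustrative constants
Tc = 10.0                                        # control interval (arb. units)
t = np.linspace(0.0, Tc, 4001)
dt = t[1] - t[0]

def rho33(v):
    """Adiabatic two-photon occupation, Eq. (4.97)."""
    phase = mu21 * mu23 / (8.0 * hbar**2 * delta2) * np.cumsum(v**2) * dt
    return np.sin(phase) ** 2

def with_energy(v, e):
    """Rescale an envelope so that E = (1/2) int V^2 dt equals e (cf. Eq. (4.89))."""
    return v * np.sqrt(e / (0.5 * np.sum(v**2) * dt))

e_opt = 2.0 * np.pi * hbar**2 * delta2 / (mu21 * mu23)   # Eq. (4.98) with n = 0

# two envelopes of equal energy but different pulse area
v_gauss = with_energy(np.exp(-((t - Tc / 2) / (Tc / 8)) ** 2), e_opt)
v_sin2 = with_energy(np.sin(np.pi * t / Tc) ** 2, e_opt)

p_g, p_s = rho33(v_gauss)[-1], rho33(v_sin2)[-1]
area_g, area_s = np.sum(v_gauss) * dt, np.sum(v_sin2) * dt
print(f"rho33(T): {p_g:.6f} (gaussian), {p_s:.6f} (sin^2); areas {area_g:.2f} vs {area_s:.2f}")
```

Both envelopes reach ρ₃₃(T) = 1 while carrying visibly different pulse areas, illustrating the degeneracy of the two-photon control problem discussed above.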
It is possible to generalize the solution given by Eq.(4.97) to the case of an N+1 level system with the cascade scheme of transitions (assuming transitions only between the neighboring states |l⟩–|l+1⟩, l = 1, ..., N), and assuming N-photon resonance: ħNω = E_{N+1} − E₁. By using the same quasi-energy method in the resonant approximation, together with the adiabatic approximation, we obtain the solution for the upper-level occupation ρ_{N+1,N+1}(t) in closed form:

ρ_{N+1,N+1}(t) = sin²( ∫₀ᵗ Ω_N(t′) dt′ ),   Ω_N(t) = (1/2) [∏_{l=1}^{N} μ_{l,l+1} V(t)/(2ħ)] / [∏_{l=2}^{N} Δ_l].   (4.99)
Here we use the notation Δ_{l+1} = ω_{l+1} − ω for the detunings and ω_l = E_{l+1} − E_l for the transition frequencies; μ_{l,l+1} is the dipole matrix element between levels l and l+1. Note that Eq.(4.99) is applicable only if all the Δ_l are non-zero. Using the formalism based on the variational principle that we presented in the previous sections of this chapter, we can set up the Lagrangian of the optimal control problem and derive the corresponding Euler-Lagrange equation for the optimal control field amplitude V(t). Again, the description of optimal control can be determined by a functional density L:

L = L_ob + λ V²(t) + λ₁ (∂V(t)/∂t)²,   (4.100)
where λ, λ₁ are the Lagrange multipliers and L_ob refers to a physical quantity P to be maximized during the control time interval [0, T]:

P = ∫₀ᵀ L_ob dt.   (4.101)
The control at a given time T can be obtained as a special case, as:

∫₀ᵀ L_ob(t) δ(t − T) dt = L_ob(T),   (4.102)
where δ(t) is the Dirac delta function. The second term in Eq.(4.100) is a constraint on the total energy E₀ of the control field:

∫₀ᵀ E²(t) dt ≈ (1/2) ∫₀ᵀ V²(t) dt = E₀.   (4.103)
The third term is a constraint on the minimal experimentally achievable duration (time resolution) Δ of the control pulse; it penalizes fast variations of V(t) and thus excludes infinitely narrow or abrupt step-like solutions.
As an example, we can consider optimal control, over a finite time interval t ∈ [0, T], of the time-averaged occupation of the upper level, (1/T) ∫₀ᵀ ρ_{N+1,N+1}(t′)dt′, of the N+1-level quantum system under the condition of N-photon resonance (E_{N+1} − E₁ = ħNω). One can substitute the analytical solution Eq.(4.99) into the Lagrangian Eq.(4.100). For simplicity of the analysis, let us neglect the constraint on the time resolution of the optimal pulse in Eq.(4.100). Then the optimal control problem is equivalent to finding an extremum of the quantity S:

S = ∫₀ᵀ L dt = λ ∫₀ᵀ Θ̇^{2/N} dt + ∫₀ᵀ sin²(αΘ) dt,   (4.106)
where we introduce the notation Θ(t) = ∫₀ᵗ V^N(t′) dt′ and denote α = (1/2) [∏_{l=1}^{N} μ_{l,l+1}/(2ħ)] / [∏_{l=2}^{N} Δ_l], so that ρ_{N+1,N+1} = sin²(αΘ). The corresponding Euler-Lagrange equation reads:

(2λ/N)((2/N) − 1) Θ̇^{(2/N)−2} Θ̈ − α sin(2αΘ) = 0.   (4.107)
Note that, since the Lagrangian does not depend explicitly on time, we can immediately write down the first integral:

L − Θ̇ (∂L/∂Θ̇) = C,   (4.108)
where the constant C is determined by the boundary conditions. Integrating, we obtain:

t + C₁ = (λ(1 − 2/N))^{N/2} ∫₀^Θ dΘ′ / (C − sin²(αΘ′))^{N/2}.   (4.109)
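The first integral provides a convenient numerical check of Eq.(4.107): along any of its trajectories, the quantity C = λ(1 − 2/N)Θ̇^{2/N} + sin²(αΘ) from Eq.(4.108) must remain constant. A sketch with illustrative values N = 3, λ = α = 1 (not the parameters of Fig.4.23):

```python
import numpy as np

N, lam, alpha = 3, 1.0, 1.0       # photon order and multipliers (illustrative values)

def accel(th, thdot):
    # Eq. (4.107) solved for the second derivative:
    # (2 lam/N)((2/N) - 1) thdot^(2/N - 2) thdd - alpha sin(2 alpha th) = 0
    c = (2.0 * lam / N) * (2.0 / N - 1.0)
    return alpha * np.sin(2.0 * alpha * th) * thdot ** (2.0 - 2.0 / N) / c

def first_integral(th, thdot):
    # Beltrami identity, Eq. (4.108): C = L - thdot * dL/dthdot
    return lam * (1.0 - 2.0 / N) * thdot ** (2.0 / N) + np.sin(alpha * th) ** 2

def rhs(y):
    return np.array([y[1], accel(y[0], y[1])])

y, dt = np.array([0.1, 1.0]), 1e-4     # initial (Theta, dTheta/dt) and RK4 step
c0 = first_integral(*y)
for _ in range(5000):                  # integrate Eq. (4.107) up to t = 0.5
    k1 = rhs(y); k2 = rhs(y + dt / 2 * k1)
    k3 = rhs(y + dt / 2 * k2); k4 = rhs(y + dt * k3)
    y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
drift = abs(first_integral(*y) - c0)
print(f"first-integral drift after t = 0.5: {drift:.2e}")
```

The drift stays at the level of the integrator's round-off, confirming that Eq.(4.108) is conserved along solutions of Eq.(4.107); the optimal field is then recovered from Θ via V(t) = Θ̇^{1/N}.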
Fig. 4.23 Optimal control fields maximizing the time-averaged occupation of the highest level in the case of N = 1 (solid line), N = 3 (thin solid line), N = 5 (dashed line), and N = 10-photon resonance (dotted line).
Fig. 4.24 The corresponding population dynamics of the occupation of the highest level, ρ_{N+1,N+1}, driven by the optimal fields shown in the previous figure, in the case of N = 1 (solid line), N = 3 (thin solid line), N = 5 (dashed line), and N = 10-photon resonance (dotted line).
This solution can be expressed in terms of the hypergeometric function in some limiting cases. In Fig.4.23 we show the solution V(t) of the optimal control problem of maximizing the time-averaged occupation of the N+1 level (the upper one), in the case of N = 1, 3, 5, and 10-photon resonances. As we said, the two-photon control is degenerate under the assumption of adiabaticity of the control field. The control pulses have the same pulse energy E = 3.32. One observes that the optimal control pulse has a tendency to be more localized in time for larger N. In Fig.4.24 we plot the corresponding population dynamics ρ_{N+1,N+1}(t) of the controlled N+1 level. It is quite interesting that the overall behavior of ρ_{N+1,N+1} is very similar for the different cases of N = 1, 3, 5, and 10-photon resonances. The main advantage of our approach is that one can obtain some insights
regarding shape optimization, and compare them with numerical results. When the approximations do a good job, this approach yields the global extremum; thus, our approach avoids the multiplicity of solutions. One can also consider the problem not of maximizing the occupation of a certain state but, for example, of inducing some specific coherence between certain states, which is the goal in some applications.

4.8 Summary
In this chapter we reviewed some modern applications of optimal control theory to quantum mechanical systems and nanoscale devices. We discussed what kinds of algorithms (both numerical and analytical) can be used to calculate the optimal control fields. We obtained the shapes of the control fields for some simple optimal control problems in analytical form. In particular, we investigated the effect of uncontrolled interaction with the environment, which causes decoherence in quantum systems, on the shapes of the optimal control fields. We found that in the presence of decoherence the energy stored in the control field is distributed more uniformly, compared to the case without relaxation. We generalized our theory to the case of multi-photon resonance, and we found that the two-photon resonance should be treated as a special case because of its sensitivity to the adiabatic approximation.
Chapter 5
Optimal control and quantum computing
In this chapter we continue our discussion of the optimal control of quantum systems, with emphasis on applications to quantum computing. We carry out a systematic analysis of a system of two coupled qubits (quantum bits), each of which is subject to its own dissipative environment. Applying an optimization method based on a genetic algorithm, we find a combination of the inter-qubit couplings that provides the lowest possible decoherence rates of the two-qubit register. Our method is then further generalized to the case of time-dependent processes (gates), sequences of which constitute various quantum computing protocols.
5.1 Robust two-qubit quantum registers
The rapidly advancing field of quantum computing continues to bring about a deeper understanding of basic quantum physics, alongside prospects of ground-breaking technological applications. However, one of the main obstacles in the way of implementing quantum protocols has so far been, and still remains, the virtually unavoidable environmentally induced decoherence. In the last decade, however, there has been considerable progress in resolving the decoherence problem. The list of discussed decoherence suppression/avoidance techniques includes such proposals as error correction (including automatic correction through dissipative evolution) [DiVincenzo, Zurek (1996)], encoding into decoherence-free subspaces [Palma (1996); Duan (1997); Lidar (2000); Deutsch (1997)], dynamical decoupling/recoupling schemes (like "bang-bang" pulse sequences) [Viola (1998)], and the use of the quantum Zeno effect [Hwang (2000)]. Unfortunately, none of these recipes is entirely universal, and their
actual implementation may be hindered by such factors as a significant encoding overhead that puts a strain on the available quantum computing resources (error correction), a rather stringent requirement of completely symmetrical coupling of the qubits to the dissipative environment (decoherence-free subspaces), or the need for the frequency of the control pulses to be well in excess of the environment's bandwidth (dynamical decoupling/recoupling). Albeit more feasible in some (most notably, liquid-state NMR and trapped-ion) than in other proposed implementations of quantum computing protocols, overcoming the above hurdles might be particularly challenging in solid-state architectures. It is for this reason that a physics-conscious engineering of robust multi-qubit systems and a systematic approach to choosing the optimal (coherence-wise) values of their microscopic parameters appears highly desirable. To this end, we explore yet another possibility of thwarting decoherence: by virtue of permanent (yet tunable) inter-qubit couplings. We find that, despite often being thought of as a nuisance to be rid of, the qubits' "cross-talk" may indeed provide an additional layer of protection against decoherence. In what follows, we first focus on the problem of preserving an arbitrary initial state ("quantum memory") of a basic two-qubit register. Then, in the next section, we show how our optimization method can be further extended to the case of arbitrary time-dependent two-qubit gates, which can alone suffice to perform universal quantum operations (including those involving single-qubit rotations). Previous research on the problem in question [Governale (2001); Thorwart (2001)] has been largely limited to the range of parameter values corresponding to the available experimental setups [Mooij (1999); Nakamura (2003); Vion (2002)].
Therefore, it has not systematically addressed issues related to a possible role of the inter-qubit couplings in reducing decoherence. In contrast, our discussion pertains to the generic two-qubit Hamiltonian

H = Σ_{i=1,2} (B_i + h_i(t))·σ_i + Σ_{a=x,y,z} J_a σ₁^a σ₂^a,   (5.1)

where each qubit is exposed to a local magnetic field comprised of a constant component B_i = (Δ_i, 0, ε_i) and a fluctuating component h_i(t) = (0, 0, h_i(t)), the latter representing two independent dissipative reservoirs. First, we switch to the conventional singlet/triplet basis formed by the states:
|↑↑⟩ = (1,0,0,0)ᵀ, (1/√2)(|↑↓⟩ + |↓↑⟩) = (0,1,0,0)ᵀ, |↓↓⟩ = (0,0,1,0)ᵀ, and (1/√2)(|↑↓⟩ − |↓↑⟩) = (0,0,0,1)ᵀ. In this basis, the non-dissipative part of the Hamiltonian of two identical qubits (Δ₁ = Δ₂ and ε₁ = ε₂) reads:

H₀ = ( Jz + ε      Δ              Jx − Jy     0
       Δ           Jx + Jy − Jz   Δ           0
       Jx − Jy     Δ              Jz − ε      0
       0           0              0           −Jx − Jy − Jz ),   (5.2)
where A = (Ai + &i)l*J2~ and e = ei + £2- In the following discussion, we concentrate on the vicinity of the "co-resonance" point e = 0 which, in hindsight, appears to be most coherence-friendly, in agreement with the experimental data for the Josephson charge-phase qubits [Vion (2002)]. The spectrum of Eq.(5.2) and the corresponding eigenvectors are given by the expressions:
E1 = JX + sJ(Jy-Jz)2 + 2A2, E2 = —(Jx + Jy + Jz), Ez = —Jx + Jy + JZ, £ 4 = - Jx + ]/{Jy-Jz)2+2A2,
(5.3)
and (l,-(-Jy
+ Jz - y / ( J y - J 2 ) 2 + 2A2)/A,l,0), (0,0,0,1), (-1,0,1,0),
(1, -(-Jv
+ JZ + yJ(Jy - Jz)2 + 2A2)/A, 1,0),
(5.4)
respectively. As in the earlier work [Governale (2001)], we treat the effect of the reservoirs in the standard Bloch-Redfield (hence, weak-coupling and Markovian) approximation. Our resorting to an approximation which is known to become unreliable in the short-time limit (short compared to the pertinent decoherence times, yet arguably most important for quantum computing-related applications) is largely due to the simplicity of the ensuing optimization procedure. Nevertheless, our main conclusions do not critically depend on the approximation employed, and we expect any more sophisticated analysis (e.g., akin to that of Refs. [Loss (1998)]) to deliver qualitatively similar results. Under the assumption of the Born approximation, the standard Bloch-Redfield equations for the reduced density matrix \rho read [Weiss (1981)]:

\dot{\rho}_{nm}(t) = -i\omega_{nm}\,\rho_{nm}(t) - \sum_{k,l} R_{nmkl}\,\rho_{kl}(t),    (5.5)
where \omega_{nm} = (E_n - E_m)/\hbar are the transition frequencies, and the elements of the relaxation tensor are given by the linear combinations

R_{nmkl} = \delta_{lm}\sum_r \Gamma^{(+)}_{nrrk} + \delta_{nk}\sum_r \Gamma^{(-)}_{lrrm} - \Gamma^{(+)}_{lmnk} - \Gamma^{(-)}_{lmnk}    (5.6)

of the partial decoherence rates [Weiss (1981)]

\Gamma_{lmnk} = \frac{1}{8}\,S(\omega_{nk})\left[\sigma^z_{1,lm}\sigma^z_{1,nk} + \sigma^z_{2,lm}\sigma^z_{2,nk}\right],    (5.7)

where \delta_{nm} is the Kronecker symbol,

S(\omega) = \alpha\,\omega\,\coth\frac{\omega}{2T}\,\theta(\omega_c - \omega)    (5.8)

is the spectral density of the reservoirs of bandwidth \omega_c (\theta(\omega) is the step function), and \sigma^z_{1,nk}, \sigma^z_{2,nk} are the matrix elements of the Pauli matrices.
The overall degree of decoherence is quantified by the average purity

P(t) = \frac{1}{16}\sum_{j=1}^{16}\mathrm{Tr}\{\rho_j^2(t)\},    (5.9)

where the density matrix \rho_j = \rho_j(t) is evaluated for all 16 initial conditions \rho_j(0) = |\Psi_{in}\rangle\langle\Psi_{in}| constructed from the product states \Psi_{in} = |\Psi_a\rangle_1 |\Psi_b\rangle_2 (a, b = 1,...,4): |\Psi_1\rangle = |↓⟩, |\Psi_2\rangle = |↑⟩, |\Psi_3\rangle = (|↓⟩ + |↑⟩)/\sqrt{2}, |\Psi_4\rangle = (|↓⟩ + i|↑⟩)/\sqrt{2}, which evolve in time in the presence of decoherence. We have also checked that the alternate use of the fidelity

F = \frac{1}{16}\sum_{j=1}^{16}\mathrm{Tr}\{\rho_j(t)\rho_0(t)\},    (5.10)
where \rho_0(t) represents the unitary evolution of an initial state, gives rise to identical results for the optimal inter-qubit couplings (see below). Computing the matrix elements of Eq.(5.7), one finds that the decohering effect of the environment tends to decrease with an increasing length

J = \sqrt{J_x^2 + J_y^2 + J_z^2}    (5.11)

of the vector J = (J_x, J_y, J_z). However, in all the practically important cases where the qubits emerge as effective (as opposed to genuine) two-level systems, an unlimited increase of J is not possible without leaving the designated "qubit subspace" of the full Hilbert space. Besides, the coupling strength errors that are likely to be present in any realistic setup tend to increase with the overall magnitude J of the inter-qubit coupling. Therefore, in what follows we fix the length of the vector J and look for an optimal configuration of its components, allowing for either sign of the latter. The optimal interaction configuration is found by applying a genetic algorithm, discussed extensively in the previous chapters, which belongs to the class of so-called intelligent global optimization techniques [Holland (1975)]. The search for the optimal set of parameters begins with a number of random sets of parameters {J_\alpha} that, in the language of [Holland (1975)], constitute an initial "population". The latter then undergoes a generation cycle, after which the worst configurations are discarded and certain recombination procedures ("mutations") are performed on the rest of the population. The new solutions ("offspring") are selected on the basis of the corresponding purity or another chosen fitness function. The iterations continue until the set of parameters J converges to a stable solution representing the optimal configuration J^{opt}. In Fig.5.2 we contrast the purity Eq.(5.9) computed for this optimal configuration against the results pertaining to the Ising (J_x = J_y = 0, J_z = J and J_x = J_z = 0, J_y = J), XY (J_x = J_y = J/\sqrt{2}, J_z = 0), Heisenberg (J_{x,y,z} = J/\sqrt{3}), and non-interacting (J_{x,y,z} = 0) cases.
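The genetic search described above can be sketched in a few lines. The toy version below is not the author's original code: instead of evaluating the purity from the Bloch-Redfield equations, it uses the much cheaper degeneracy measure Q = |E_1 − E_3| + |E_2 − E_4| built from the spectrum Eq.(5.3) as the fitness function (the analysis below shows the purity optimum to coincide with the double-degeneracy point). Population size, mutation schedule, and generation count are illustrative choices:

```python
import numpy as np

def spectrum(J, A):
    # Analytic spectrum Eq. (5.3) at the co-resonance point eps = 0.
    Jx, Jy, Jz = J
    R = np.sqrt((Jy - Jz)**2 + 2*A**2)
    return np.array([Jx + R, -(Jx + Jy + Jz), -Jx + Jy + Jz, Jx - R])

def fitness(J, A):
    # Degeneracy measure Q = |E1-E3| + |E2-E4|; Q -> 0 at the double
    # degeneracy that suppresses the relaxation rates (see text).
    E = spectrum(J, A)
    return abs(E[0] - E[2]) + abs(E[1] - E[3])

def ga_optimize(J_len, A, pop=80, gens=300, seed=1):
    rng = np.random.default_rng(seed)
    # Random initial "population" on the sphere |J| = J_len.
    P = rng.normal(size=(pop, 3))
    P *= J_len / np.linalg.norm(P, axis=1, keepdims=True)
    for g in range(gens):
        fit = np.array([fitness(j, A) for j in P])
        P = P[np.argsort(fit)][:pop // 2]       # selection: keep the best half
        sigma = 0.2 * J_len * 0.97**g           # annealed mutation strength
        kids = P + rng.normal(scale=sigma, size=P.shape)   # "mutations"
        kids *= J_len / np.linalg.norm(kids, axis=1, keepdims=True)
        P = np.vstack([P, kids])                # parents survive (elitism)
    fit = np.array([fitness(j, A) for j in P])
    return P[np.argmin(fit)]
```

For J = 2A the search converges to the configuration of Eq.(5.12), i.e., J^{opt} ≈ (0, 0.25A, 1.98A) up to the swap of J_y and J_z.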
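The purity and fidelity quantifiers, Eqs.(5.9)-(5.10), entering these comparisons are straightforward to sketch. In the snippet below (not from the original text), the assignment of |↓⟩, |↑⟩ to basis vectors is an illustrative convention:

```python
import numpy as np

def purity(rhos):
    # Eq. (5.9): average of Tr[rho_j^2] over the 16 initial states.
    return sum(np.trace(r @ r).real for r in rhos) / len(rhos)

def fidelity(rhos, rhos0):
    # Eq. (5.10): average of Tr[rho_j(t) rho_0(t)] over the initial states.
    return sum(np.trace(r @ r0).real for r, r0 in zip(rhos, rhos0)) / len(rhos)

def product_initial_states():
    # The 16 two-qubit product states built from |Psi_1..4> (see text);
    # the mapping of |down>, |up> to basis vectors is a convention.
    s = [np.array([0, 1], complex),                  # |down>
         np.array([1, 0], complex),                  # |up>
         np.array([1, 1], complex) / np.sqrt(2),     # (|down> + |up>)/sqrt2
         np.array([1, 1j], complex) / np.sqrt(2)]    # (|down> + i|up>)/sqrt2
    return [np.outer(np.kron(a, b), np.kron(a, b).conj()) for a in s for b in s]
```

Under purely unitary evolution both quantifiers stay equal to one; decoherence makes them decay.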
A closer analysis of the optimal configuration reveals that it corresponds to an incidence of a double degeneracy amongst the eigenvalues of the Hamiltonian Eq.(5.1): E_1 = E_3, E_2 = E_4 (or, for the mirror solution with both couplings negative, E_1 = E_2, E_3 = E_4). These conditions are satisfied provided that J_x^{opt} = 0 and 2J_y^{opt}J_z^{opt} = A^2, thus restricting the possible optimal solutions to a hyperbola in the J_y-J_z plane. This result can be read off from the two-dimensional plot of Fig.5.1 representing the rate of the purity decay |dP/dt| as a function of J_y and J_z for J_x = 0. Taking into account the constraint imposed on the overall length J, we then obtain the optimal inter-qubit couplings

J_x^{opt} = 0, \quad J_y^{opt} = \pm\frac{1}{2}\Big(\sqrt{J^2+A^2} - \sqrt{J^2-A^2}\Big), \quad J_z^{opt} = \pm\frac{1}{2}\Big(\sqrt{J^2+A^2} + \sqrt{J^2-A^2}\Big).    (5.12)
Notably, the double degeneracy condition can only be fulfilled for a sufficiently strong (non-perturbative) inter-qubit coupling, J > A. In Fig.5.3 we show some non-zero elements of the tensor R_{nmkl} as a function of a single parameter w introduced through the relations J_x = 0, J_y = 0.25Aw, J_z = 1.98Aw. This choice of the parameters leads to the double degeneracy for w = 1, which corresponds to the overall length J = 2A. It can be readily seen that under the double degeneracy condition many of the partial decoherence rates R_{nmkl} drop to their lowest possible values ∼ \alpha T, provided that T is relatively low. This observation suggests a systematic way of constructing a class of input states that undergo a slow decay governed by pure dephasing (no relaxation), contrary to the case of general encoding. One example of such a robust state is offered by the initial density matrix with the only nonzero entries \rho_{22} = \rho_{24} = \rho_{42} = \rho_{44} = 1/2, which correspond to the pair of degenerate states with the lower energy. In Fig.5.4 we plot the absolute value of the purity decay rate |dP(t)/dt| as a function of the parameter w for the above initial state. Near the double degeneracy point (w = 1) the rate is reduced by orders of magnitude. Rationalizing these findings, we surmise that the high stability of the doubly degenerate ground state of the two-qubit system is reminiscent of the notion of "supercoherent" decoherence-free subsystems introduced in [Bacon (2001)]. Conversely, on the basis of the results depicted in Fig.5.4 one can also identify those initial states whose decay can only be made worse by increasing J.

Fig. 5.1 Purity decay rate |dP/dt| as a function of J_y and J_z for J_x = J_x^{opt} = 0 and T = 10^{-5}A.

One example of such a fragile state is given by the initial density matrix \rho_{11} = \rho_{13} = \rho_{31} = \rho_{33} = 1/2, which represents the upper pair of degenerate levels. A word of caution is in order. In principle, any degeneracy may invalidate the use of the Golden rule expressions Eq.(5.7), whose (sufficient) condition of applicability reads

\max|R_{nmkl}| \ll \min_{n\neq m}|\omega_{nm}|.    (5.13)

It turns out, however, that in the problem at hand the solution of the master equations Eq.(5.5) remains stable even close to the double degeneracy points, for the optimal configuration shows no abrupt changes upon varying the strength of the coupling \alpha from its lowest values that do comply with the above condition all the way up to the regime

\min|\omega_{nm}| < \max|R_{nmkl}| < \max|\omega_{nm}|.    (5.14)

Fig. 5.2 Purity as a function of time for J = 2A, T = 10^{-7}A and different inter-qubit couplings: non-interacting J_\alpha = 0 (thin solid line), Ising J_z = J, J_x = J_y = 0 (dotted line) and J_y = J, J_x = J_z = 0 (dashed line), XY J_x = J_y = J/\sqrt{2}, J_z = 0 (dash-double-dotted line), Heisenberg J_\alpha = J/\sqrt{3} (dash-dotted line), and the optimal configuration J = J^{opt} (solid line).
It is also worthwhile mentioning that beyond the lowest order in the qubit-bath couplings all the degeneracies will generally appear to be lifted, thereby resulting in the standard avoided level crossing situation, unless they are protected by the symmetries of the full Hamiltonian Eq.(5.1). We also would like to stress that the optimization of the inter-qubit couplings can be carried out for arbitrary values of the parameters \epsilon and A. Notably, for A = 0 and \epsilon \neq 0 the optimal configuration of the couplings, J^{opt} = (0, 0, J), yields a Hamiltonian where all the terms commute with each other and there is again no relaxation.

Fig. 5.3 Upper figure: non-zero matrix elements of the relaxation tensor R_{nmkl} computed for T = 10^{-5}A, \epsilon = 0, J_x = 0, J_y = 0.25Aw, and J_z = 1.98Aw as a function of the parameter w. Note a dramatic drop of some of the matrix elements at w = 1. Bottom figure: for J_x = J_y = 0, J_z = 2Aw the matrix elements vary monotonically with w.

To summarize, in this section we demonstrated that permanent inter-qubit couplings might serve as a new way of protecting quantum registers against decoherence. While still remaining relatively unexplored in the mainstream quantum information proposals, this possibility has already been discussed in the context of constructing logical qubits and performing fault-tolerant computations in various strongly correlated spin systems [Kitaev (2003)]. In the existing proposals the prototype logical qubits are envisioned as topological ground and/or quasi-particle bulk/edge states of rather exotic
156
Optimal control and forecasting of complex dynamical
U.UU1
\ \ 0.003 - \
.
\
« 0*0.002 T3 1
\
\
\
|
|
i
|
\ \ \
\
N
'
N
N.
0.001
i
^V
>/ S
\
s/'
Nw \ n
systems
~
*T
,\N.,-SX,
0.5
1
1.5
W
Fig. 5.4 Purity decay rate \dP/dt\ as a function of the parameter w (see Fig.5.3) for T = 1 0 - 7 A (solid line) and T = A (dashed line).
chiral spin liquids [Kitaev (2003)]. Thus, albeit enjoying an exceptionally high degree of coherence, the (topo)logical qubits require an enormous overhead in encoding, since creating only a handful of such qubits takes a macroscopically large number of interacting physical ones. Besides, the nearly perfect isolation of (topo)logical qubits from the environment can also make initialization of and read-out from such qubits rather challenging. Nonetheless, the results we have presented in this section, suggest that a much more modest idea of augmenting the other decoherence-suppression techniques with a properly tailored permanent inter-qubit couplings might still result in a substantial improvement of the quantum register's performance. Moreover, the optimization method employed in this section can be rather straightforwardly extended to the case of time-dependent gates and other quality quantifiers (quantum degree, entanglement capability, etc.) as well as beyond the Bloch-Redfield approximation and the assumption of the Ohmic dissipative environments. The rapid pace of a technological progress in solid-state quantum com-
Robust qubits
157
puting (particularly, the phase [Mooij (1999)], charge [Nakamura (2003)] and charge-phase [Vion (2002)] superconducting qubit architectures) provides one with a strong hope that the specific prescriptions towards building robust qubits and their assemblies discussed in this section could be implemented before long.
5.2 Optimal design of universal two-qubit gates
Now let us construct optimized implementations of some universal two-qubit gates that, unlike most of the previously proposed protocols, are carried out in a single step. The new protocols require tunable inter-qubit couplings but, in return, show significant improvements in the quality of the gate operations. Our optimization procedure can be further extended to combinations of elementary two-qubit as well as irreducible many-qubit gates [Grigorenko (2005)]. According to one of the central results of quantum information theory, an arbitrarily complex quantum protocol can be decomposed into a sequence of single-qubit rotations and two-qubit gates [DiVincenzo (1995)]. However, despite providing a convenient means of designing logical circuits, in practice such a decomposition may not necessarily achieve the shortest possible times of operation and, concomitantly, the lowest possible decoherence rates. In practical implementations, the latter are crucially important, since any realistic qubit system would always suffer the detrimental effect of its dissipative environment. Recently, there have been various attempts to improve on the performance of universal quantum gates by searching for their optimal implementations among the entire class of two-qubit Hamiltonians with the most general time-dependent coefficients. However, the outcome of a typical tour-de-force variational search such as that of [Niskanen (2003)] appears to be a complicated sequence of highly irregular pulses whose physical content might still remain largely obscure. In search of a more sophisticated analytical approach, a number of authors applied optimal control theory with the goal of implementing an arbitrary unitary transformation independently of the initial state [Ramakrishna (2002)]. The resulting complex system of nonlinear integro-differential equations can be solved numerically with the help of Krotov's or similar iterative algorithms [Krotov (1996)]. Conceivably, a significantly simpler alternative to the above approaches
would be a straightforward implementation of a desired unitary transformation in the smallest possible number of steps, during which the parameters of the qubit Hamiltonian remain constant. One such example is given by the two-qubit SWAP gate, which can be readily implemented (up to a global phase) with the use of a Heisenberg-type inter-qubit coupling that remains constant for the duration of the gate operation (see, e.g., [Zhang (2003)]). We are going to construct one-step implementations of some universal gates (e.g., the CNOT gate). In contrast with the previous works, where a constant decoherence rate was assumed and, therefore, the overall loss of coherence accumulated during a gate operation was evaluated on the basis of its total time, we quantify the adverse effect of the environment by actually solving the corresponding master equation for the density matrix of the coupled qubits. In this way, we account for the fact that the decoherence rates depend on (and vary with) the changing (in a step-wise manner) parameters of the time-dependent two-qubit Hamiltonian. On the basis of our general conclusions, we also make specific predictions for such a viable candidate for the role of a robust two-qubit gate as a pair of charge-phase Josephson junction qubits (dubbed "quantronium" in [Vion (2002)]) which are tuned to their optimal points and coupled (both inductively and capacitively) to each other. The problem of implementing a given unitary transformation in the course of the quantum mechanical evolution of a generic two-qubit system can be formulated as the condition of the minimum deviation

\left\| X - T\exp\Big(-\frac{i}{\hbar}\int_0^{t_0} H(x_i(t'))\,dt'\Big)\right\| \to \min    (5.15)

between the time-ordered evolution operator governed by the Hamiltonian H and the target unitary transformation X. Here |x_i(t)| \leq a_i represent the tunable control parameters whose physically attainable values are generally bounded, and the Frobenius trace norm is defined as \|Y\| = \sqrt{\mathrm{Tr}[Y^\dagger Y]}.
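The deviation of Eq.(5.15) is straightforward to evaluate for a quasi-stationary Hamiltonian. The sketch below (not from the original text; the helper name and parameter values are ours) checks the SWAP example just mentioned: a constant Heisenberg coupling acting for a time such that J t_0/\hbar = \pi/4 reproduces SWAP up to a global phase, so the phase-minimized Frobenius distance vanishes (\hbar = 1):

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def frobenius_distance(X, U):
    # Deviation ||X - U|| of Eq. (5.15), minimized over a global phase of U.
    t = np.trace(X.conj().T @ U)
    phase = t / abs(t) if abs(t) > 1e-12 else 1.0
    return np.linalg.norm(X - U / phase)

# Heisenberg coupling H = J (sx sx + sy sy + sz sz); with J*t0 = pi/4 the
# evolution operator reproduces SWAP up to a global phase (see text).
H_heis = sum(np.kron(s, s) for s in (sx, sy, sz))
U = expm(-1j * (np.pi / 4) * H_heis)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)
```

Minimizing over the global phase is what makes the functional insensitive to the physically irrelevant overall phase of the evolution operator.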
In the standard basis where \sigma^z_{1,2} are diagonal, the noiseless part of Eq.(5.1) takes the form

H_0 = \begin{pmatrix} J_z+\epsilon_1+\epsilon_2 & \Delta_2 & \Delta_1 & J_x-J_y \\ \Delta_2 & \epsilon_1-\epsilon_2-J_z & J_x+J_y & \Delta_1 \\ \Delta_1 & J_x+J_y & \epsilon_2-\epsilon_1-J_z & \Delta_2 \\ J_x-J_y & \Delta_1 & \Delta_2 & -\epsilon_1-\epsilon_2+J_z \end{pmatrix}.    (5.16)
In order to facilitate our analysis of the decohering effect of the environment, in what follows we again, as in the previous section, assume the Ohmic nature of the dissipative reservoirs \langle h_i(t)h_i(t')\rangle, described by the spectral function S(\omega) = \alpha\omega\coth(\omega/2T)\,\theta(\omega_c - \omega) with the bandwidth \omega_c (\theta(\omega) is the step function). This assumption allows one to treat the environment in the standard Bloch-Redfield (i.e., weak-coupling and Markovian) approximation. Below, we restrict our analysis to the Hamiltonians that remain constant for the entire duration of the gate operation (H(t) = H_0\,\theta(t)\theta(t_{end} - t)) and, therefore, commute at different times ([H(t), H(t')] = 0). We then demonstrate that within such a class of quasi-stationary Hamiltonians, the problem of constructing an optimized (coherence-wise) implementation of a given universal gate allows for a rather simple and physically transparent solution. To that end, we invoke a direct relationship between the possibility of decoherence suppression and a spectral degeneracy, as revealed by the analysis, carried out in the previous section, of the problem of preserving an initial state ("quantum memory") of an idling pair of coupled qubits (see also the later works [You (2004)] for similar results). Namely, as we have just shown above, the contribution to the overall decoherence factor stemming from the relaxation processes can be significantly reduced by tuning the Hamiltonian parameters towards the point where a pair of the lowest eigenvalues of the quasi-stationary Hamiltonian Eq.(5.16) becomes degenerate. The underlying (energy exchange-based) mechanism of the suppression of relaxation can be explained by the fact that near a degeneracy point and at low temperatures the partial relaxation rates (5.7) vary with the transition frequencies \omega_{ij} as a sum \sum_{i,j=1}^{4} c_{ij}|\omega_{ij}|, where the coefficients c_{ij} are essentially independent of \omega_{ij}. Therefore, the relaxation rates attain their minimum values at those points in the parameter space where the largest possible number of the transition frequencies vanish as a result of the onset of degeneracy. Notably, the contribution due to the other, pure dephasing, processes is generally unavoidable and can only be suppressed by lowering the temperature of the reservoirs. In order to further illustrate the above point, in Fig.5.5 we plot the decoherence rate |dP/dt| (note that in the Markovian approximation and in the short-time limit the purity decays as a linear function of time) as a function of J_y and J_z. In Fig.5.6 we plot the function which quantifies the degree of the degeneracy, Q = |E_1 - E_3| + |E_2 - E_4|. Obviously, its minima correspond to the double degeneracy, which helps us to identify the above-described effect. Again, we kept the length of the vector J constant, since in the case of realistic Josephson qubits an unlimited increase of J would have resulted in an unwanted leakage from the designated two-qubit Hilbert subspace. Besides, for the sake of physical clarity, in obtaining Figs.5.5, 5.6 we put the parameters of both (chosen to be identical, \Delta_1 = \Delta_2 = \Delta) qubits into the coherence-friendly "quantronium regime" of Ref.[Vion (2002)]: \epsilon_1 = \epsilon_2 = 0. Figure 5.5 demonstrates that |dP/dt| has its absolute minimum at the point characterized by the incidence of a double degeneracy (E_1 = E_3, E_2 = E_4) between the eigenvalues given by the expressions

E_{1,2} = J_x \mp \sqrt{(\Delta_1+\Delta_2)^2 + (J_y-J_z)^2}, \qquad E_{3,4} = -J_x \mp \sqrt{(\Delta_1-\Delta_2)^2 + (J_y+J_z)^2}.    (5.17)
We have already shown in the previous section that the latter occurs for the inter-qubit coupling J^{opt} satisfying the conditions

J_x^{opt} = 0, \qquad 2J_y^{opt}J_z^{opt} = A^2,    (5.18)

where, as before, A = (\Delta_1 + \Delta_2)/\sqrt{2}.
Note that in Fig.5.6 there are two points corresponding to the double degeneracy. However, the second one, where J_y > J_z, is not coherence-friendly, since we assume the coupling to the reservoirs to be along the z axis, which breaks the symmetry between these two points. The point with J_z > J_y is coherence-friendly because the resulting Hamiltonian approximately commutes with the coupling operator, which leads to additional stability of the two-qubit system. Furthermore, albeit being less effective than the onset of a double degeneracy, the emergence of even a single degenerate pair of the two lowest eigenvalues appears to provide a relative improvement as compared to the non-degenerate case. In this case, the two-dimensional degenerate subspace is (partially) protected by the energy gap separating it from the rest of the spectrum, thereby giving rise to the exponential suppression of some relaxation rates at low temperatures. This more relaxed constraint on the Hamiltonian parameters may provide extra freedom in improved implementations of the logic gates. Having identified the conditions providing a suppression of decoherence, we can now try to satisfy them in the case of various universal gates. For a start, we impose the less stringent condition of a single degeneracy between the lowest pair of energy levels, while attempting to find a one-step implementation of the standard CNOT gate

X_{CNOT} = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix}.    (5.19)

Fig. 5.5 The purity decay rate |dP/dt| as a function of the interaction coefficients J_x and J_y, with J_z = \sqrt{J^2 - J_x^2 - J_y^2}, J = 2A, T = 0.001A.
In the noiseless case, this goal can be accomplished by computing a matrix logarithm of X_{CNOT} directly. However, the latter features a substantial degree of ambiguity:

H_{CNOT} = \frac{i\hbar}{t_0}\log(X_{CNOT}) = A(C+B)A^\dagger,    (5.20)
Fig. 5.6 The measure of the double degeneracy |E_1 - E_3| + |E_2 - E_4| as a function of the interaction coefficients J_x and J_y, with J_z = \sqrt{J^2 - J_x^2 - J_y^2}, J = 2A (compare with the previous figure).
where t_0 is the protocol duration, and we use the notation

C = \begin{pmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&\pi/2&-\pi/2 \\ 0&0&-\pi/2&\pi/2 \end{pmatrix} + \pi n_0 \begin{pmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&1&-1 \\ 0&0&-1&1 \end{pmatrix},    (5.21)

B = 2\pi n_1 \begin{pmatrix} 0&0&1&1 \\ 0&0&1&1 \\ 1&1&0&0 \\ 1&1&0&0 \end{pmatrix} + 2\pi n_2 \begin{pmatrix} 1&1&0&0 \\ 1&1&0&0 \\ 0&0&1&1 \\ 0&0&1&1 \end{pmatrix} + 2\pi n_3 E,    (5.22)

where E is the unit matrix, and A is the block-diagonal matrix

A = \begin{pmatrix} e^{i\phi_3}&0&0&0 \\ 0&e^{i\phi_4}&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix},    (5.23)
parameterized by the arbitrary integers n_i (observe that [B, C] = 0) and the continuous phase variables \phi_3 and \phi_4. The high dimension of the invariant subspace of the CNOT's equivalence class (see below) is a rather unique property of this particular gate. A straightforward analysis reveals that one does reproduce the CNOT gate up to a global phase \phi_0 = 1/4 (so that X_{CNOT} = \exp(-i\pi\phi_0)\exp(-it_0 H_0/\hbar)) while achieving a single spectral degeneracy (E_1 = 1.125, E_2 = E_3 = -0.875, E_4 = 0.625) with the following choice of the parameters in Eq.(5.16): \epsilon_1 = -0.25, \epsilon_2 \approx -0.66, \Delta_1 = 0, \Delta_2 \approx 1.5, J_x = 0, J_y = 0, J_z = \epsilon_2, where all the energies are measured in units of \hbar/t_0. In Fig.5.7 we contrast the resulting optimized gate against the standard (multi-step) CNOT protocol (see, e.g., [Zhang (2003)])

X_{CNOT} \propto \exp\Big(-i\frac{\pi}{2}\frac{\sigma_x^2+\sigma_z^2}{\sqrt{2}}\Big)\exp\Big(-i\frac{\pi}{4}\sigma_z^1\sigma_z^2\Big)\exp\Big(i\frac{\pi}{4}\sigma_z^1\Big)\exp\Big(i\frac{\pi}{4}\sigma_z^2\Big)\exp\Big(-i\frac{\pi}{2}\frac{\sigma_x^2+\sigma_z^2}{\sqrt{2}}\Big).    (5.24)
Notably, the new one-step implementation takes only about 15% of the time required for implementing the standard one, and it results in approximately 10 times better performance. As the next step, one might attempt to impose the double degeneracy condition Eq.(5.18), which is expected to result in an even better performance. It is, however, conceivable that this stringent condition may not allow for the construction of an arbitrary two-qubit gate. In such a case, one could then settle for the more readily attainable goal of constructing a gate X' which is only equivalent (X' = U_1 \otimes U_2\, X\, U'_1 \otimes U'_2, where U_{1,2} and U'_{1,2} are single-qubit rotations), albeit not necessarily identical, to a target gate X. It should be noted, however, that such a substitute solution would only be acceptable if any one-qubit rotation can be performed quickly and, therefore, contributes negligibly towards the overall loss of coherence. Formally, the equivalence between different gates is established on the basis of their Makhlin invariants (see, e.g., [Zhang (2003)])

G_1 = \frac{\mathrm{tr}^2[m(X)]}{16\,\mathrm{Det}[X]}, \qquad G_2 = \frac{\mathrm{tr}^2[m(X)] - \mathrm{tr}[m^2(X)]}{4\,\mathrm{Det}[X]},    (5.25)
where m(X) = X_B^T X_B, X_B = Q^\dagger X Q, and

Q = \frac{1}{\sqrt{2}}\begin{pmatrix} 1&0&0&i \\ 0&i&1&0 \\ 0&i&-1&0 \\ 1&0&0&-i \end{pmatrix}.    (5.26)
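The invariants Eq.(5.25) can be evaluated directly. The sketch below (not from the original text) reproduces the CNOT values G_1 = 0, G_2 = 1 quoted in the following discussion:

```python
import numpy as np

# Magic-basis transformation Q of Eq. (5.26), including the 1/sqrt(2)
# normalization that makes it unitary.
Q = np.array([[1, 0, 0, 1j],
              [0, 1j, 1, 0],
              [0, 1j, -1, 0],
              [1, 0, 0, -1j]]) / np.sqrt(2)

def makhlin_invariants(X):
    # Local invariants G1, G2 of a two-qubit gate X, Eq. (5.25).
    XB = Q.conj().T @ X @ Q       # the gate in the magic (Bell) basis
    m = XB.T @ XB
    det = np.linalg.det(X)
    G1 = np.trace(m)**2 / (16 * det)
    G2 = (np.trace(m)**2 - np.trace(m @ m)) / (4 * det)
    return G1, G2

X_CNOT = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=complex)
```

Two gates are locally equivalent precisely when their pairs (G_1, G_2) coincide, which is what makes these two numbers a convenient fingerprint of an equivalence class.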
It is worth mentioning that in the case of the double degeneracy condition Eq.(5.18) G_1 appears to be a real number, a fact that severely restricts the set of logic gates that can be constructed this way. An example of a gate incompatible with the double degeneracy condition is given by the \sqrt{SWAP} gate, which has G_1 = -i/4. By contrast, the CNOT equivalence class, which has the invariants G_1 = 0, G_2 = 1, can be attained with the use of the Ising coupling along the z axis. This result is consistent with the CNOT implementation given by Eq.(5.24). We found that this protocol outperforms the CNOT-equivalent gate obtained by using the Heisenberg inter-qubit coupling [Romero (2004)]. Our optimized protocol appears to be \approx 2.3 times shorter and suffers \approx 25 times less decoherence, as compared to that of [Romero (2004)]. For a large part, the CNOT gate owes its popularity to the fact that a generic two-qubit gate requires no more than three applications of the CNOT complemented by local one-qubit rotations [Zhou (2004)]. With the aim of a further improvement, an alternative gate (dubbed by the authors the "B" gate) with the invariants G_1 = G_2 = 0 was recently proposed [Zhang (2003)]. Unlike CNOT, the B-gate only needs to be used twice in order to implement an arbitrary two-qubit gate (up to single-qubit rotations). We find that our approach, which takes full advantage of the double degeneracy condition, offers a superior realization of the B-gate's universality class. Namely, by choosing \Delta_1 = \Delta_2 = \Delta, J = 2\Delta, J_x = 0, J_y = 0.52\Delta, J_z = 1.90\Delta we arrive at a one-step implementation of an equivalent of the B-gate that takes only \approx 42% of the duration of the protocol of [Zhang (2003)] and results in a much improved (by a factor of 6) gate purity. As far as practical recommendations are concerned, the above theoretical predictions can be applied to coupled Josephson junction qubits, where the typical value of \Delta is of order 1 GHz [Vion (2002)].
Assuming the strength of the inter-qubit coupling J = 2A \approx 2 GHz, we find that in the case of the equivalence class of CNOT the double degeneracy condition can be met with the following choice of the coupling parameters:

J^{opt} = \{0,\ (\sqrt{J^2+A^2}-\sqrt{J^2-A^2})/2,\ (\sqrt{J^2+A^2}+\sqrt{J^2-A^2})/2\} = (0, 0.25, 1.98) GHz.

For temperatures T \sim 0.1 K the purity degradation is of order 1 - P \approx 4 \times 10^{-4}, which corresponds to a characteristic relaxation time of \approx 2500 ns. These estimates have to be contrasted with the experimentally measured single-qubit relaxation and dephasing times, which were found in Ref. [Vion (2002)] to be 1000 ns and 20 ns, respectively. As regards the largest acceptable error in the gate's parameters, we find that in the case of an overall purity decay \delta P(T) \sim 10^{-4} the duration of the control pulse may not deviate by more than 0.3%, and the same allowed deviation holds for its amplitude. Using the described approach, we can also perform a one-step implementation of the RNOT and FFT2 gates. As we have mentioned above, the one-step implementation makes it easy to construct any fractional power X^a of a gate. However, we were able to find control parameters which correspond only to a single degeneracy of the unperturbed Hamiltonian.

Fig. 5.7 Comparison of the 5-step implementation of the CNOT gate with the optimal one-step implementation. Note that the maximal allowed amplitudes of J_\alpha in both cases are limited to 2A. The time is normalized to the longer of the two protocol durations.
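The quoted numbers follow from Eq.(5.12) by simple arithmetic; with A = 1 GHz and J = 2A one has:

```python
import math

# Optimal couplings from Eq. (5.12) for J = 2A, A = 1 GHz.
J, A = 2.0, 1.0
Jy_opt = (math.sqrt(J**2 + A**2) - math.sqrt(J**2 - A**2)) / 2
Jz_opt = (math.sqrt(J**2 + A**2) + math.sqrt(J**2 - A**2)) / 2
```

The product 2 J_y^{opt} J_z^{opt} = A^2 holds identically, since the square roots cancel pairwise.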
To summarize, in this section we presented a systematic approach to a one-step implementation of highly robust two-qubit universal gates. The increased stability against decoherence is achieved thanks to a special choice of the Hamiltonian parameters which provides for an incidence of degeneracies in the instantaneous energy spectrum, thus suppressing the effect of the relaxation processes. It should be possible to further generalize our results to the case of a sequence of several two-qubit as well as irreducible multi-qubit gates. Also, this approach differs from the previous proposals for constructing "supercoherent" qubits which, albeit capable of providing an exponential suppression of decoherence, require at least four physical qubits governed by a Hamiltonian that conserves the total spin [Bacon (2001)]. The latter condition, however, is unlikely to be fulfilled in the case of effective (e.g., Josephson junction based) pseudo-spin Hamiltonians, due to the almost immanent presence of permanent (non-spin-rotationally-invariant) local terms (in our notation, the B-fields) in any many-qubit counterparts of Eq.(5.1).
5.3 Entanglement of a pair of qubits
Besides fidelity and purity there is another interesting quantity that characterizes a quantum system: entanglement. The entanglement of a pair of qubits characterizes the amount of information one can learn about the first qubit by measuring the second one. For example, if a two-qubit system is in the singlet state (like a spin singlet), then the measurement of one qubit is enough to completely determine the state of the second qubit in the pair. Quite the opposite, any product state has zero entanglement; in other words, the two qubits are uncorrelated. Entanglement may be an important resource for quantum computing. Let us consider a qubit which is in contact with a primitive bath: another two-level quantum system. Although our model is quite simple, we can show that it exhibits rich behavior in time that is strongly dependent on the type of coupling between the two subsystems. One can systematically investigate different regimes and even obtain some analytical solutions. This situation might be realized experimentally as a one-qubit device coupled to one (the nearest) trapped charge, which can also be treated as a two-level system. For a system consisting of two interacting qubits the von Neumann
Fig. 5.8 Plot of the entropy of the first qubit after t = 30.17T of unitary evolution of the whole two-qubit system, as a function of the parameters \phi and \theta (see text).
entropy S_1 of the first qubit is defined as:

S_1 = -k\,\mathrm{Tr}(\rho_1\log\rho_1),    (5.27)

where k is Boltzmann's constant, and the marginal density matrix \rho_1 is obtained by partial tracing over the second qubit: \rho_1 = \mathrm{Tr}_2(\rho). In the case when the whole system is in a pure state |\psi\rangle\langle\psi|, the entropy of the subsystem can be used as a measure of entanglement. Note that in this case \rho = \rho^2. The entropy of any pure state is zero, which is unsurprising since there is no uncertainty about the state of the system. It can also be shown that unitary operators acting on a state of the whole system (such as the time evolution operator obtained from the Schrödinger equation) leave the entropy unchanged. This associates the reversibility of a process with its resulting entropy change, which is a deep result linking quantum mechanics to information theory and thermodynamics. In the case when the observed subsystem is initially in a pure state, the rate of increase of the subsystem's entropy reflects the leakage of the
information due to the coupling to the unobservable subsystem (the second qubit in our language). Explicitly, the marginal density matrix for the first qubit can be written as:

\rho_1 = \begin{pmatrix} \rho_{11}+\rho_{33} & \rho_{12}+\rho_{34} \\ \rho_{21}+\rho_{43} & \rho_{22}+\rho_{44} \end{pmatrix},    (5.28)
where ρ_{ij}, i, j = 1, ..., 4 are the density matrix elements of the total system. As we already did before, let us consider a quantum system of two interacting qubits with the Hamiltonian H:

H = \begin{pmatrix}
\epsilon_1+\epsilon_2+J_z & \Delta_2 & \Delta_1 & J_x-J_y \\
\Delta_2 & \epsilon_1-\epsilon_2-J_z & J_x+J_y & \Delta_1 \\
\Delta_1 & J_x+J_y & \epsilon_2-\epsilon_1-J_z & \Delta_2 \\
J_x-J_y & \Delta_1 & \Delta_2 & -\epsilon_1-\epsilon_2+J_z
\end{pmatrix}.    (5.29)
The system's density matrix ρ(t) satisfies the quantum Liouville equation:

i\hbar\,\dot{\rho} = L\rho = [H, \rho].    (5.30)
By parameterizing the interaction vector (J_x, J_y, J_z) as J_x = J\cos(\phi)\sin(\theta), J_y = J\sin(\phi)\sin(\theta), J_z = J\cos(\theta), we can plot the entropy of a subsystem (which also characterizes the entanglement in the whole system), S_1 = S_1(t, \phi, \theta). In Fig. 5.8 we plot the entropy S_1 as a function of the parameters φ, θ at time t = 30.17T, for a particular choice of the Hamiltonian's parameters: ε_1 = ε_2 = 0, Δ_1 = Δ_2 = d = 0.0001, and interaction strength J = 1. We set the initial pure state as ρ(0) with ρ_{11} = 1 and all other elements zero. We have restricted φ ∈ [0, π] and θ ∈ [0, π/2] because of the symmetry of the map. In Fig. 5.8 we can see that the entropy map S_1(φ, θ) becomes rather complicated. For even larger times it looks more fractal-like. Note that the lighter areas in the plot correspond to lower values of entropy, and the darker areas to higher values. This pattern reflects the sensitivity of the system to small perturbations of the coupling parameters on long time scales.
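The quantities above are straightforward to evaluate numerically. The following sketch is an illustration under assumed parameters (isotropic coupling J_x = J_y = J_z = J = 1, zero on-site energies ε_i and tunneling amplitudes Δ_i, initial product state |01⟩, and units with k = ħ = 1), not the exact setup of Fig. 5.8:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -Tr(rho log rho), in units of Boltzmann's constant k."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                         # 0*log(0) = 0 by convention
    return float(-np.sum(w * np.log(w)))

def trace_out_second_qubit(rho):
    """Marginal density matrix rho_1 = Tr_2(rho) of a 4x4 two-qubit rho."""
    r = rho.reshape(2, 2, 2, 2)              # indices (i1, i2, j1, j2)
    return np.einsum('ikjk->ij', r)

# Pauli matrices and an isotropic exchange Hamiltonian (phi, theta such
# that Jx = Jy = Jz = J; assumed illustrative choice)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
J = 1.0
H = J * (np.kron(sx, sx) + np.kron(sy, sy) + np.kron(sz, sz))

# pure initial product state |01><01| (zero initial entanglement)
rho0 = np.zeros((4, 4), dtype=complex)
rho0[1, 1] = 1.0

# unitary evolution rho(t) = U rho(0) U^dagger via diagonalization of H
t = 0.3
E, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * E * t)) @ V.conj().T
rho_t = U @ rho0 @ U.conj().T

S_total = von_neumann_entropy(rho_t)                    # stays ~0 (pure state)
S1 = von_neumann_entropy(trace_out_second_qubit(rho_t)) # entanglement entropy
```

The total entropy remains zero under the unitary evolution, while the subsystem entropy S_1 grows from zero toward its maximum ln 2, exactly as described in the text.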
5.4
Summary
In this chapter we considered an application of the optimal control and optimal design approach to the problem of quantum computing. It was shown that one can use the tunable parameters of the basic elements of a quantum computer in order to optimize its performance and stability against decoherence effects.
We have found that even in the case of a simple two-qubit system one can improve the performance and stability of the system by approximately a factor of 10. Since for large subsystems a "plain" design has even less chance of being the optimal realization, the significance of optimal control becomes obvious. One could also expect that the expected difference in performance between optimal and non-optimal realizations should increase with the size of the object. But one has to take into account that for larger systems the sensitivity to the optimal design implementation could also increase, because the spectrum of interacting three- and multi-qubit systems comes closer and closer to the continuous spectrum of a macroscopic system. We have also calculated the entropy of a pair of interacting qubits, and have shown its complex dependence on the interaction parameters.
Chapter 6
Forecasting of complex dynamical systems
There is a time for fairy tales,
For the magic tree,
For the golden-haired girl,
For the honey-eyed dream.
I.G.

As we have discussed in the previous chapters, complex systems are usually associated with situations in which it is extremely difficult to deduce the system's behavior from first principles. At the current level of our knowledge, it is simply impossible to predict, for example, the behavior of a human being purely from a priori principles. Unlike in quantum mechanics, we cannot formulate "behavioral" variational principles (the universal principle that we mentioned in the first chapter). This means that without such a "behavioral Lagrangian" one cannot derive the corresponding Euler-Lagrange equations. In addition, complex systems are characterized by multiple temporal and spatial scales, which make forecasting a very nontrivial problem. The approximation and forecasting of dynamical systems seems to be a field in which many open problems must be studied in the near future. In particular, more needs to be said on the actual relationship between the required accuracy and the computational effort.
6.1
Forecasting of financial markets
Financial markets can be regarded as model complex systems. In fact, they are systems composed of many agents which interact with each other in a highly complicated way. Financial markets are continuously monitored; the data exist down to the scale of each single communication of a bid and ask of a financial asset (quotes) and at the level of each transaction (trade). The availability of this enormous amount of data allows a detailed statistical description of several aspects of the dynamics of asset prices in a financial market. The results of these studies show the existence of several levels of complexity in the price dynamics of financial assets. Financial markets provide us with high-frequency time series, which are usually multivariate (multi-dimensional), highly correlated, highly volatile, have interesting scaling properties and no simple statistics (like Gaussian random numbers) and, finally, are non-stationary, i.e. their statistical properties change in time! One cannot treat such systems simply as deterministic; the stochastic component is an essential part of them. It was shown (see, for example, [Hegger (1998)]) that an implicit assumption of noise-free input for the time series can lead to systematically wrong results of the model estimation. As a consequence, one obtains biased, incorrect estimates of the model parameters. The higher the actual level of the noise, the greater this effect. Thus, one has to use an approach that can tackle noisy time series. All this makes the forecasting of financial markets an extremely complicated and challenging problem. Besides that, there is, of course, the philosophical question: is it possible at all to predict the future, and if the answer is positive, how does one quantify predictability? Let me leave this question open. Let us mention some of the most common and frequently used forecasting methods, like multi-agent forecasting programs and packages, neural networks, and different autoregressive models, including fractionally integrated ones (we gave a short introduction to fractional derivatives in Chapter 3): AR, ARMA, ARIMA, GARCH, VAR, ARFIMA, FGARCH [Hendry (2001)].
There are also forecasting techniques based on Chaos Theory and econometric models. In the following paragraphs we are going to talk about some of these methods.
6.2
Autoregressive models
A category of models that is commonly used for forecasting is the class of autoregressive models. The main idea is to predict future values of the time series with the help of the past (lagged) values. One can write an
autoregressive model as:

Y_t = \sum_{i=1}^{N} a_i Y_{t-i} + Z_t,    (6.1)

where Z_t is some stochastic process, for example a white noise sequence. Intuitively, one can understand white noise as a completely random sequence without systematic structure. In terms of autocorrelation it is "delta" correlated in time,

\langle Z_t Z_{t'} \rangle = \sigma^2 \delta(t - t'),    (6.2)

with variance σ². The process can equivalently be represented as a moving average over the noise:

Y_t = \sum_{i=1}^{\infty} b_i Z_{t-i}.    (6.3)
Let us assume that we have observed a time series {Y_1, ..., Y_n}, and we are interested in forecasting a future value Y_{n+m}, m = 1, 2, .... It will be useful to distinguish between two types of forecast, the ex post and the ex ante forecast. The ex post forecast is made for a period in which the "future" observations are known for certain. It is used as a means to check against known data, so that the forecasting model can be evaluated. On the other hand, the ex ante forecast predicts observations beyond the present, and in this case the future observations are not available for checking. Suppose that we observe {Y_1, ..., Y_n}: we may use {Y_1, ..., Y_T} (T < n) to estimate a model and use the estimated model to forecast {Y_{T+1}, ..., Y_n}. These are ex post forecasts, since we can use them to compare against the observed {Y_{T+1}, ..., Y_n}. The estimation period in this case is T. On the other hand, when we forecast {Y_{n+1}, ..., Y_{n+m}} for m > 0, we are doing ex ante forecasts. After fitting a model, we estimate a future value Y_{n+m} at time n based on the fitted model, while the actual value of Y_{n+m} is unknown. The forecast is h steps ahead; h is known as the lead time or horizon.
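As a toy illustration of these ideas (the coefficients and noise model are made up), one can simulate an AR(2) process of the form (6.1), estimate its coefficients by least squares over an estimation period, and then issue a one-step ex ante forecast:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a stationary AR(2) process: Y_t = a1*Y_{t-1} + a2*Y_{t-2} + Z_t
a_true = (0.6, -0.3)                         # illustrative coefficients
n = 5000
Y = np.zeros(n)
for t in range(2, n):
    Y[t] = a_true[0] * Y[t - 1] + a_true[1] * Y[t - 2] + rng.normal()

# least-squares estimate of (a1, a2) from the lagged values
X = np.column_stack([Y[1:-1], Y[:-2]])       # columns: Y_{t-1}, Y_{t-2}
a_hat, *_ = np.linalg.lstsq(X, Y[2:], rcond=None)

# one-step ex ante forecast of Y_{n+1} from the fitted model
forecast = a_hat[0] * Y[-1] + a_hat[1] * Y[-2]
```

An ex post check would instead fit on {Y_1, ..., Y_T} and compare the forecasts with the held-out observations {Y_{T+1}, ..., Y_n}.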
6.3
Chaos theory embedding dimensions
If we construct a forecasting model, we have to take care that the model is not overcomplicated. For example, we would like to identify the order of the autoregressive model. One can do this with the help of chaos theory and the so-called embedding theorem. Suppose we have a scalar time series x_1, x_2, ..., x_N. We make a time-delay reconstruction of the phase space with the reconstructed vectors:

V_n = (x_n, x_{n-t}, \ldots, x_{n-(d-1)t}),    (6.4)

where t is the time delay, d is the so-called embedding dimension, and n = (d-1)t+1, ..., N. Here d represents the dimension of the state space of the underlying system, and the time delay (time lag) t represents the time interval between sampled observations used in constructing the d-dimensional embedding vectors. If the time series is generated by a deterministic system, then by the embedding theorems [Takens (1981)] there generically exists a function F such that

V_{n+1} = F(V_n),    (6.5)
if the observation function of the time series is smooth and d is sufficiently large. This mapping has the same dynamical behavior as that of the original unknown system, in the sense of topological equivalence. The remaining problem is then how to choose t and d, i.e. the time delay and the embedding dimension, such that the above mapping exists. By the Takens theorem, it does not matter what time delay is selected, in a "generic" sense. But in practice, because we have only a finite number of data points available with finite measurement precision, a good choice of t is deemed to be important in phase space reconstructions. In addition, determining a good embedding dimension d depends on the actual choice of t. Another interesting issue is the choice of the embedding dimension from a time series. Generally there are three basic methods used in the published literature, which include computing some invariant (e.g., correlation dimension, Lyapunov exponents) on the attractor (e.g., [Grassberger (1983)]), singular value decomposition [Broomhead (1986)], and the method of false neighborhoods [Kennel (1992)]. For more discussion on this topic see, e.g., [Ott (1984)].
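The delay reconstruction (6.4) is only a few lines of code. The sketch below uses the chaotic logistic map as a stand-in for an observed deterministic series (an assumed example; any scalar series would do):

```python
import numpy as np

def delay_embed(x, d, t):
    """Delay vectors V_n = (x_n, x_{n-t}, ..., x_{n-(d-1)t}),
    for n = (d-1)t, ..., N-1 (0-based indexing)."""
    start = (d - 1) * t
    return np.array([x[n - np.arange(d) * t] for n in range(start, len(x))])

# stand-in deterministic series: iterates of the chaotic logistic map
x = np.empty(1000)
x[0] = 0.4
for i in range(1, 1000):
    x[i] = 4.0 * x[i - 1] * (1.0 - x[i - 1])

V = delay_embed(x, d=3, t=1)   # embedding dimension 3, unit time lag
```

Each row of V is one reconstructed phase-space point; a forecasting model then approximates the map F of (6.5) from consecutive rows.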
6.4
Modelling of economic "agents" and the El Farol bar problem
As we have mentioned, economic modelling is one of the most challenging problems. Some interesting insights can be gained from the modelling of economic "agents" with bounded computational skills and/or resources. One can use artificial agents to simulate individual investors. The ultimate goal of each agent is to maximize its individual wealth. To accomplish this goal, each agent must choose between investing in either a risky or a risk-free asset by developing a forecast of the price of the risky security and then determining the position in that security or in the risk-free asset to be taken. One of the most efficient ways to generate a forecast is to use a genetic algorithm to attempt to learn about the behavior of the market. As we have described in the previous chapters, a genetic algorithm is a heuristic search mechanism based on the notion of biological evolution. A pool of potential solutions, in our case market forecasts, is evaluated against some objective function, and those solutions that produce the best results are kept. Potential solutions that produce inferior results are discarded from the pool. Genetic operators are then applied to the remaining potential solutions to replenish the pool. One of the first, very simple, but rich multi-agent models was developed to solve the El Farol bar problem. The El Farol bar problem is a problem in game theory that was created in 1994 by Brian Arthur [Arthur (1992)]. It got its name from a bar in Santa Fe, New Mexico, which offers Irish music on Thursday nights. The problem is as follows: there is a finite population of people (let's say 100) in a small town. On Thursday night, all of these people want to go to the popular El Farol bar to hear lovely Irish music. However, the El Farol is quite small (a limited resource), and it's no fun to go there if it's too crowded (or fully occupied).
So, we have a "game" with the following rules: if less than 60% of the population decide to go to the bar this evening (in our case 60 people), they'll all have a better time than if they stayed at home. If more than 60 people make this decision, they'll all have a worse time than if they stayed at home. The rules of the game require that everyone decides at the same time whether they will go to the bar or not. They cannot wait and see how many others go before deciding to go themselves. In some variants of the problem, people are allowed to communicate with each other before deciding to go to the bar. However, they are not required
to tell the truth. Now you can appreciate the significance and non-triviality of this problem. The problem is constructed in such a way that, no matter what strategy each person uses to decide whether to go to the bar, if everyone uses the same strategy it is guaranteed to fail. If everyone uses the same strategy, then if that strategy suggests that the bar will not be crowded, everyone will go, and thus it will be crowded; likewise, if that strategy suggests that the bar will be crowded, nobody will go, and thus it will not be crowded. The extension of theoretical results to economic multi-agent models seems promising and certainly deserves further work. These applications can be particularly interesting because this kind of language allows us to simulate effects of collective behavior, cooperation, clustering, etc. With these models one can capture the workings of the processes stage by stage as they are observed, and reproduce the known outcomes. However, one should be aware of recent works which stress that investors are not fully rational, or have at most bounded rationality, and that behavioral and psychological mechanisms, such as herding, may be important in the shaping of market prices.
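A minimal simulation of the El Farol game can be sketched as follows. The predictor pool given to each agent (last week's attendance, a short moving average, and a linear extrapolation) is an illustrative assumption, not Arthur's original set; each agent acts on whichever of its predictors has the best track record so far:

```python
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, THRESHOLD, WEEKS = 100, 60, 300

def make_predictor(kind, k):
    # three illustrative predictor families (assumed, not from the text)
    if kind == 0:
        return lambda h: h[-1]                  # last week's attendance
    if kind == 1:
        return lambda h: np.mean(h[-k:])        # mean of the last k weeks
    return lambda h: 2 * h[-1] - h[-2]          # linear extrapolation

agents = [[make_predictor(int(rng.integers(3)), int(rng.integers(1, 6)))
           for _ in range(3)] for _ in range(N_AGENTS)]
scores = np.zeros((N_AGENTS, 3))                # cumulative |error| per predictor

history = [60, 60]                              # seed attendance history
for week in range(WEEKS):
    preds = np.array([[p(history) for p in pool] for pool in agents])
    best = scores.argmin(axis=1)                # each agent trusts its best predictor
    go = preds[np.arange(N_AGENTS), best] < THRESHOLD
    attendance = int(go.sum())
    scores += np.abs(preds - attendance)        # update predictor track records
    history.append(attendance)

mean_attendance = float(np.mean(history[2:]))
```

Because agents constantly switch to whichever predictor has worked recently, no single strategy dominates, and the attendance fluctuates around the capacity threshold rather than settling at 0 or 100.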
6.5
Forecasting of the solar activity
Solar activity forecasting is another important topic for various scientific and technological areas, like space activities related to the operation of low-Earth orbiting satellites, electric power transmission lines, geophysical applications, and high-frequency radio communications. The particles and electromagnetic radiation flowing from solar activity outbursts are also important for long-term climate variations, and thus it is very important to know in advance the phase and amplitude of the next solar and geomagnetic cycles. Nevertheless, the solar cycle is very difficult to predict on the basis of time series of various proposed indicators, due to high frequency content, noise contamination, a high dispersion level, and high variability in phase and amplitude. This topic is also complicated by the lack of a quantitative theoretical model of the Sun's magnetic cycle. Many attempts to predict the future behavior of solar activity are well documented in the literature. Numerous forecasting techniques have been developed to accurately predict the phase and amplitude of future solar cycles, but with limited success.
6.6
Noise reduction and Wavelets
As we have already mentioned, a real-life forecast is usually made on the basis of multi-frequency, noisy data, for example some currency exchange rate during the next month. In this case, before building any forecasting model, it can be useful to filter the high-frequency components from the original time series. There are many ways to do this, and one of the most effective is based on wavelet analysis. Wavelet analysis is a relatively new development in the area of applied mathematics that is just now receiving the attention of many scientists. In particular, one can find many useful applications of wavelets in noise reduction. By design, the usefulness of wavelets rests in their ability to localize a process in time-frequency space, unlike the Fourier transform, which is only localized in frequency, never giving any information about where in space or time the frequency happens. At high frequencies the wavelet is tight in shape (a small time interval) and is able to focus on short-lived phenomena like singularity points, while at low frequencies the wavelet is stretched out in shape, making it well suited to identifying long periodic processes. By moving from high to low frequencies the wavelet is able to zoom in on a process's behavior at a point in time and identify singularities, or alternatively zoom out and reveal the long and smooth features of a signal. Wavelets can be thought of as the derivative of any order k of a smoothing kernel, under the assumption that the smoothing kernel has at least k derivatives. Like any smoothing kernel, the kernel from which wavelets are formed is well localized in time space. But unlike normal unimodular smoothing kernels, the smoothing kernel used in deriving a wavelet can take on negative values.
This feature of the smoothing kernel enables the wavelet to be well localized in frequency space, improving the decorrelation between the wavelet coefficients, and enabling the wavelet's bandwidths to be increased (decreased) to capture the long and smooth (short and discontinuous) characteristics of a time series. A continuous wavelet is defined as follows:

\psi_{a,b}(t) = |a|^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right),    (6.6)

where a > 0 and b is any real number. The function ψ_{a,b} is simply the dilation (by a) and translation (by b) of the function ψ. If a > 1, ψ flattens out horizontally, while 0 < a < 1 tightens ψ. For this reason a is referred
to as the scaling parameter. The |a|^{-1/2} term is a normalizing constant that ensures that ψ_{a,b} has an inner product equal to one. When a > 1 the |a|^{-1/2} term causes the vertical height of ψ to be scaled down, while when 0 < a < 1 the vertical height is increased. If ψ is well localized around zero, changing the value of b shifts the time argument, allowing ψ_{a,b} to be well localized around the translation point b. b is referred to as the translation parameter. The function ψ(t) is commonly referred to as the "mother" wavelet. In order for a function to qualify as a "mother" wavelet it must satisfy the admissibility condition ∫ψ(t)dt = 0. This is a necessary condition ensuring smoothness and localization in frequency and time space. The admissibility condition can also be interpreted as requiring ψ(t) to be non-unimodular; hence the name wavelets.
The wavelet coefficients are given by the inner product

\langle x(t), \psi_{a,b}\rangle = |a|^{-1/2} \int x(t)\, \psi\!\left(\frac{t-b}{a}\right) dt.    (6.7)

It can be shown that the wavelet coefficients ⟨x(t), ψ_{a,b}⟩ represent the details of the signal x(t) at the scale a.
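A minimal numerical sketch of the coefficient (6.7), using the Mexican-hat wavelet (the second derivative of a Gaussian smoothing kernel) as an assumed example of a mother wavelet, shows both the admissibility condition and the localization of short-lived features:

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat mother wavelet: second derivative of a Gaussian kernel."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt_coeff(x, dt, a, b, psi=mexican_hat):
    """Wavelet coefficient <x, psi_{a,b}> of a sampled signal (Riemann sum)."""
    t = np.arange(len(x)) * dt
    psi_ab = np.abs(a)**(-0.5) * psi((t - b) / a)
    return np.sum(x * psi_ab) * dt

# admissibility: the mother wavelet must integrate to zero
grid = np.arange(-20.0, 20.0, 0.01)
area = np.sum(mexican_hat(grid)) * 0.01

# a short-lived feature produces a large coefficient only near its location
dt = 0.01
x = np.zeros(2000)
x[1000] = 1.0                                 # a "singularity" at t = 10
c_near = abs(cwt_coeff(x, dt, a=0.5, b=10.0))
c_far = abs(cwt_coeff(x, dt, a=0.5, b=15.0))
```

The coefficient is large when the translated, dilated wavelet overlaps the singularity and essentially zero far away, which is exactly the time localization that the Fourier transform lacks.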
The transform preserves the energy of the signal,

\frac{1}{C_\psi}\int\!\!\int |a|^{-2}\, |\langle x(t), \psi_{a,b}\rangle|^2\, da\, db = \int |x(t)|^2\, dt,    (6.8)

and the coefficients can be used to reconstruct x by

x(t) = \frac{1}{C_\psi}\int\!\!\int |a|^{-2}\, \langle x(t), \psi_{a,b}\rangle\, \psi_{a,b}(t)\, da\, db,    (6.9)

where C_\psi = 2\pi\int |\hat{\psi}(\xi)|^2/|\xi|\, d\xi < \infty. Note that the admissibility condition, ∫ψ(t)dt = 0, is implied by C_ψ < ∞ when ψ(t) has sufficient decay. The wavelet transform represents an efficient technique for the processing of signals with time-varying spectra. It can be viewed as a decomposition of a signal in the time-scale plane [Daubechies (1992)], [Chui (1992)], [Mallat (1989)]. There are many application areas of the wavelet transform, such as subband coding, data compression and noise reduction. For reducing the noise of a measured signal many techniques are available, such as filtering, adaptive methods and the wavelet transform. Major interests of the recent papers on noise reduction using the wavelet transform are the construction of the transform and the choice of the thresholding parameters. For noise reduction, a wavelet transform that employs thresholding in the wavelet domain has been proposed by Donoho as a powerful method [Donoho (1994)]. It has been proved that the method
for noise reduction works well for a wide class of one-dimensional and two-dimensional signals. Thresholding in the wavelet domain means smoothing or removing some coefficients of the wavelet transform of the measured signal. Through the thresholding operation, the noise content of the signal is reduced effectively even under non-stationary conditions. The two well-known thresholding techniques are soft thresholding and hard thresholding [Donoho (1994)].
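The soft-thresholding scheme can be sketched with a single level of the Haar wavelet transform (the simplest discrete wavelet, used here as an assumed stand-in for the transforms discussed above), together with Donoho's universal threshold:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar wavelet transform: (approximation, detail)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Inverse of one Haar level (perfect reconstruction)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft_threshold(c, thr):
    """Donoho soft thresholding: shrink coefficients toward zero."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

rng = np.random.default_rng(2)
n = 1024
t = np.linspace(0.0, 1.0, n)
clean = np.sin(2 * np.pi * 4 * t)              # smooth "signal"
noisy = clean + 0.3 * rng.normal(size=n)       # signal plus white noise

a, dcoef = haar_dwt(noisy)
sigma = np.median(np.abs(dcoef)) / 0.6745      # robust noise-level estimate
thr = sigma * np.sqrt(2 * np.log(n))           # universal threshold
denoised = haar_idwt(a, soft_threshold(dcoef, thr))

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
```

For a smooth signal the detail coefficients carry mostly noise, so shrinking them reduces the mean squared error; hard thresholding would instead set sub-threshold coefficients to zero while leaving the rest untouched.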
6.7
Finance and random matrix theory
It is often necessary to describe the behavior of several (as many as 1000) firms in different sectors of the economy. Common sense suggests that their time evolution can be correlated (or anti-correlated). A possible measure of these correlations can be derived from the correlations of their stock prices. As we discussed before, there are fundamental difficulties in quantifying any kind of correlation between two stocks. In economic systems, unlike physical systems, there is no formalism or theory to calculate the "interaction" between two companies i, j, and it is not clear in which units this "interaction" could be measured. The problem is that although every pair of companies should interact either directly or indirectly, the precise nature of the interaction is unknown or extremely complicated. Of course, in physics there are examples of "indirect" interactions, like "superexchange" or the RKKY interaction. The RKKY interaction is the dominant exchange interaction in metals where there is little or no direct overlap between neighboring magnetic electrons; it therefore acts through an intermediary, which in metals are the conduction electrons (itinerant electrons) [Ruderman (1954)]. However, unlike in physical systems, correlations need not be just pairwise but may involve clusters of stocks. The correlations C_{ij} between any two stocks i, j change with time (they are non-stationary). The correlation matrix C has elements

C_{ij} = \frac{\langle Y_i Y_j \rangle - \langle Y_i \rangle\langle Y_j \rangle}{\sigma_i \sigma_j},    (6.10)

where \sigma_i = \sqrt{\langle Y_i^2\rangle - \langle Y_i\rangle^2} is the standard deviation of the price changes of company i, and ⟨..⟩ denotes a time average over the period studied. The matrix C can be studied using Random Matrix Theory (RMT) [Wigner (1951)], which predicts the eigenspectrum of perfectly uncorrelated Gaussian random matrices. The deviations of the eigenspectrum of the matrix C from the universal predictions of RMT identify correlations and non-random properties of the specific system. There are generalizations of Random Matrix Theory for non-Gaussian (Lévy statistics) random variables, which exhibit so-called "heavy tails" and are more suitable for the description of econometric variables. For further details see [Plerou (2000)].
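The RMT null hypothesis is easy to reproduce numerically. For N uncorrelated Gaussian time series of length T, the eigenvalues of the empirical correlation matrix fall (up to finite-size effects) inside the Marchenko-Pastur band λ± = (1 ± √(N/T))²; the parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 100, 1000                    # N "stocks", T observations
Q = T / N

# null model: perfectly uncorrelated Gaussian "price changes"
R = rng.normal(size=(N, T))
R = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
C = R @ R.T / T                     # empirical correlation matrix, cf. (6.10)

eig = np.linalg.eigvalsh(C)

# Marchenko-Pastur bounds for the eigenvalues of such a random matrix
lam_min = (1 - np.sqrt(1 / Q)) ** 2
lam_max = (1 + np.sqrt(1 / Q)) ** 2
# fraction of eigenvalues inside the RMT band (small margin for finite N)
frac_inside = float(np.mean((eig >= lam_min - 0.1) & (eig <= lam_max + 0.1)))
```

For real market data, a few large eigenvalues (the "market mode" and sector modes) escape this band; those outliers are precisely the non-random correlations the text refers to.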
6.8
Neural Networks
Artificial neural networks (ANNs), originally developed to mimic basic biological neural systems, particularly the human brain, are composed of a number of interconnected simple processing elements called neurons or nodes. Each node receives an input signal, which is the total information from other nodes or external stimuli, processes it locally through an activation or transfer function, and produces a transformed output signal to other nodes or external outputs. Many different ANN models have been proposed since the 1980s. Perhaps the most influential models are the multi-layer perceptrons (MLP), Hopfield networks, and Kohonen's self-organizing networks. One can also mention other types of ANNs, such as radial-basis-function networks, ridge polynomial networks, and wavelet networks. For more examples and possible applications of neural networks in the identification, modelling and control of dynamic systems, one can read the nice book [Nrgaard (2000)]. For an explanatory or causal forecasting problem, the inputs to an ANN are usually the past variables.
Y_{n+1} = f(Y_n, Y_{n-1}, \ldots, Y_{n-d}).    (6.11)
Thus the ANN is equivalent to a nonlinear autoregressive model for time series forecasting problems. Before an ANN can be used to perform any desired task, it must be trained to do so. Basically, training is the process of determining the arc weights, which are the key elements of an ANN. The training algorithm is used to find the weights that minimize some overall error measure, such as the sum of squared errors (SSE) or the mean squared error (MSE), and we again arrive at a multidimensional optimization problem!
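A toy version of this training problem, a one-hidden-layer perceptron fitted to the nonlinear autoregression (6.11) by full-batch gradient descent on the MSE, can be sketched as follows (the architecture, hyperparameters, and test series are all illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)

# series: noisy sinusoid; inputs are the d lagged values, target is the next value
y = np.sin(0.1 * np.arange(400)) + 0.05 * rng.normal(size=400)
d = 5
X = np.array([y[i:i + d] for i in range(len(y) - d)])
target = y[d:]
baseline = np.mean((target - target.mean()) ** 2)   # variance of the target

# one-hidden-layer perceptron trained by full-batch gradient descent on MSE
H = 8
W1 = 0.5 * rng.normal(size=(d, H)); b1 = np.zeros(H)
W2 = 0.5 * rng.normal(size=H);      b2 = 0.0
lr = 0.05
for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)             # hidden-layer activations
    pred = h @ W2 + b2                   # network output
    err = pred - target
    mse = np.mean(err ** 2)
    g = 2 * err / len(err)               # dMSE/dpred
    gW2 = h.T @ g; gb2 = g.sum()
    gh = np.outer(g, W2) * (1 - h ** 2)  # backpropagate through tanh
    gW1 = X.T @ gh; gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

final_mse = mse
```

After training, the residual error drops well below the variance of the series, i.e. the network has learned a useful nonlinear autoregressive map; production code would of course use a validation set and a more careful optimizer.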
6.9
Summary
In this chapter we made a brief review of the general formulation of different forecasting problems and of different types of approaches to obtaining a satisfactory solution. We outlined the main ideas behind the wavelet transform, which can be useful for filtering noisy data before making a forecast. We described how chaos theory can help us effectively reduce the model's dimension. We also mentioned some philosophical aspects of forecasting.
Bibliography
M. Abramowitz, I. A. Stegun, "Handbook of Mathematical Functions", Dover Publications, Inc., New York, (1972).
N. Akman and M. Tomak, "The Wigner molecule in a 2D quantum dot", Physica E 4, 277 (1999).
V. M. Akulin, N. V. Karlov, "Intense Resonant Interactions in Quantum Electronics", Springer-Verlag, (1992).
L. Allen, J. H. Eberly, "Optical Resonance and Two-Level Atoms", A Wiley-Interscience Publication, (1975).
B. L. Altshuler, P. A. Lee and R. Webb, Eds., "Mesoscopic Phenomena in Solids", Elsevier, Amsterdam, (1991).
L. E. E. de Araujo, I. A. Walmsley, and C. R. Stroud, Jr., "Analytic Solution for Strong-Field Quantum Control of Atomic Wave Packets", Phys. Rev. Lett. 81, 955 (1998).
W. B. Arthur, "On Learning and Adaptation in the Economy", Santa Fe Institute Paper 92-07-038, (1992).
R. Ashoori, "Electrons in artificial atoms", Nature 379, 413 (1996).
D. Bacon, K. R. Brown, and K. B. Whaley, "Coherence-Preserving Quantum Bits", Phys. Rev. Lett. 87, 247902 (2001).
C. J. Bardeen, V. V. Yakovlev, K. R. Wilson, S. D. Carpenter, P. M. Weber, and W. S. Warren, "Feedback quantum control of molecular electronic population transfer", Chem. Phys. Lett. 280, 151 (1997).
R. Bartels, S. Backus, E. Zeek, L. Misoguti, G. Vdovin, I. P. Christov, M. M. Murnane, H. C. Kapteyn, "Shaped-pulse optimization of coherent emission of high-harmonic soft X-rays", Nature 406, 164 (2000).
W. G. Bickley, "Formulae for numerical differentiation", Math. Gaz. 25, 19 (1941).
V. Blanchet, M. A. Bouchene and B. Girard, "Temporal coherent control in the photoionization of Cs2: Theory and experiment", J. Chem. Phys. 108, 4862 (1998).
R. H. Blick, R. J. Haug, J. Weis, D. Pfannkuche, K. v. Klitzing and K. Eberl, "Single-electron tunneling through a double quantum dot: The artificial molecule", Phys. Rev. B 53, 7899 (1996).
K. Blum, "Density matrix theory and applications", Plenum, New York, (1981).
D. S. Broomhead, G. P. King, "Extracting qualitative dynamics from experimental data", Physica D 20, 217 (1986).
K. Burrage and P. M. Burrage, "High Strong Order Methods for Noncommutative Stochastic Ordinary Differential Equation Systems and the Magnus Formula", Conference on Uncertainty, Physica D, (1998).
M. M. Bogdan, A. M. Kosevich, G. A. Maugin, "Soliton complex dynamics in strongly dispersive medium", Wave Motion 34, 1 (2001).
N. H. Bonadeo, J. Erland, D. Gammon, D. Park, D. S. Katzer and D. G. Steel, "Coherent Optical Control of the Quantum State of a Single Quantum Dot", Science 282, 1473 (1998).
C. Brif, H. Rabitz, S. Wallentowitz, I. A. Walmsley, "Decoherence of molecular vibrational wave packets: Observable manifestations and control criteria", Phys. Rev. A 63, 063404 (2001).
T. Brixner and G. Gerber, "Femtosecond polarization pulse shaping", Opt. Lett. 26, 557 (2001).
N. A. Bruce and P. A. Maksym, "Quantum states of interacting electrons in a real quantum dot", Phys. Rev. B 61, 4718 (2000).
A. Bulatov, B. E. Vugmeister, and H. Rabitz, "Nonadiabatic control of Bose-Einstein condensation in optical traps", Phys. Rev. A 60, 4875 (1999).
M. Caputo, "Linear models of dissipation whose Q is almost frequency independent", Geophys. J. Roy. Astron. Soc. 13, 529 (1967).
D. M. Ceperley, "Quantum Monte Carlo simulations, Microscopic simulations in physics", Rev. Mod. Phys. 71, 438 (1999).
A. V. Chechkin, R. Gorenflo, and I. M. Sokolov, "Retarding Sub- and Accelerating Super-Diffusion Governed by Distributed Order Fractional Diffusion Equations", Phys. Rev. E 66, 046129 (2002).
C. K. Chui, "An Introduction to Wavelets", Academic Press Inc., (1992).
B. E. Cole, J. B. Williams, B. T. King, M. S. Sherwin and C. R. Stanley, "Coherent Manipulation of Semiconductor Quantum Bits with Terahertz Radiation", Nature 410, 60 (2001).
C. E. Creffield, W. Hausler, J. H. Jefferson and S. Sarkar, "Interacting electrons in polygonal quantum dots", Phys. Rev. B 59, 10719 (1999).
I. Daubechies, "Ten Lectures on Wavelets", SIAM, (1992).
K. Deb, "Multi-objective genetic algorithms: Problem difficulties and construction of test problems", Evolutionary Computation Journal 7, 205 (1999).
D. M. Deaven and K. M. Ho, "Molecular geometry optimization with a genetic algorithm", Phys. Rev. Lett. 75, 288 (1995).
D. P. DiVincenzo, "Two-bit gates are universal for quantum computation", Phys. Rev. A 51, 1015 (1995).
D. P. DiVincenzo and P. W. Shor, "Fault-Tolerant Error Correction with Efficient Quantum Codes", Phys. Rev. Lett. 77, 3260 (1996); R. Laflamme, C. Miquel, J. P. Paz, and W. H. Zurek, "Perfect Quantum Error Correcting Code", Phys. Rev. Lett. 77, 198 (1996).
J. A. Doornik, M. Ooms, "Inference and Forecasting for ARFIMA Models With an Application to US and UK Inflation", Studies in Nonlinear Dynamics and Econometrics 8, art. 14, (2004).
D. L. Donoho, "De-Noising by Soft Thresholding", IEEE Trans. Inform. Theory, (1994).
E. Dupont, P. B. Corkum, H. C. Liu, M. Buchanan and Z. R. Wasilewski, "Phase-controlled currents in semiconductors", Phys. Rev. Lett. 74, 3596 (1995).
K. Diethelm, "An Algorithm for the Numerical Solution of Differential Equations of Fractional Order", Elect. Trans. Num. Anal. 5, 1 (1997).
A. Erdélyi, "Higher Transcendental Functions", McGraw-Hill, New York, (1955).
J. Fassbender and M. Bauer, "Numerical investigations on the switching behavior of magnetic tunnel junctions in the quasi-static and dynamic regime", Europhys. Lett. 55, 119 (2001).
M. D. Feit, J. A. Fleck, Jr., and S. Steiger, "Solution of the Schrodinger equation by a spectral method", J. Comput. Phys. 47, 412 (1982).
B. Fornberg and D. Sloan, in Acta Numerica 1994, edited by A. Iserles, Cambridge University Press, Cambridge, 203 (1994).
A. V. Filinov, M. Bonitz, and Yu. E. Lozovik, "Wigner crystallization in mesoscopic 2D electron systems", Phys. Rev. Lett. 86, 3851 (2001).
A. Fuhrer, S. Luscher, T. Ihn, T. Heinzel, K. Ensslin, W. Wegscheider and M. Bichler, "Energy spectra of quantum rings", Nature 413, 822 (2001).
I. L. Garzon, K. Michaelian, M. R. Beltran, A. Posada-Amarillas, P. Ordejon, E. Artacho, D. Sanchez-Portal and J. M. Soler, "Lowest Energy Structures of Gold Nanoclusters", Phys. Rev. Lett. 81, 1600 (1998).
I. M. Gelfand and S. V. Fomin, "Calculus of Variations", Dover Publications, (2000).
M. Governale, M. Grifoni, and G. Schon, "Decoherence and dephasing in coupled Josephson-junction qubits", Chem. Phys. 268, 273 (2001).
I. Grigorenko and M. E. Garcia, "An evolutionary algorithm to calculate the ground state of a quantum system", Physica A 284, 131 (2000).
I. Grigorenko, M. E. Garcia and K. H. Bennemann, "Theory for the Optimal Control of Time-Averaged Quantities in Quantum Systems", Phys. Rev. Lett. 89, 233003 (2002).
I. Grigorenko and M. E. Garcia, "Two-Particle Systems Determined Using Quantum Genetic Algorithms", Physica A 291, 439 (2001).
I. Grigorenko, O. Speer, and M. E. Garcia, "Coherent control of photon-assisted tunneling between quantum dots: A theoretical approach using genetic algorithms", Phys. Rev. B 65, 235309 (2002).
I. A. Grigorenko, D. V. Khveshchenko, "Robust Two-Qubit Quantum Registers", Phys. Rev. Lett. 94, 040506 (2005).
I. A. Grigorenko and D. V. Khveshchenko, "Single-Step Implementation of Universal Quantum Gates", Phys. Rev. Lett. 95, 110501 (2005).
M. Grifoni, M. Winterstetter and U. Weiss, "Coherences and populations in the driven damped two-state system", Phys. Rev. E 56, 334 (1997).
M. Grifoni, P. Hanggi, "Driven quantum tunneling", Phys. Rep. 304, 232 (1998).
M. E. Garcia, H. O. Jeschke, I. Grigorenko and K. H. Bennemann, "Theory for the ultrafast dynamics of excited clusters: interplay between elementary excitations and atomic structure", Appl. Phys. B 71, 361 (2000).
D. E. Goldberg, "Genetic Algorithms in search, optimization, and machine learning", Addison-Wesley, (1989).
P. Grassberger, I. Procaccia, "Measuring the strangeness of strange attractors", Physica D 9, 189 (1983).
K. Groom, A. I. Tartakovskii, D. J. Mowbray, M. S. Skolnick, P. M. Smowton, M. Hopkinson, and G. Hill, "Comparative study of InGaAs quantum dot lasers with different degrees of dot layer confinement", Appl. Phys. Lett. 81, 1 (2002).
J. R. Guest, T. H. Stievater, Gang Chen, E. A. Tabak, B. G. Orr, D. G. Steel, D. Gammon, and D. S. Katzer, "Near Field Coherent Spectroscopy and Microscopy of a Quantum Dot system", Science 293, 2224 (2001).
T. T. Hartley, C. F. Lorenzo and H. K. Qammer, IEEE Trans. on Circuits and Systems-I, Fundamental Theory and Applications 42, 485 (1995).
A. K. Hartmann, "Calculation of ground states of four-dimensional ± J Ising spin glasses", Phys. Rev. E 60, 5135 (1999).
A. P. Heberle, J. J. Baumberg and K. Kohler, "Ultrafast coherent control and destruction of excitons in quantum wells", Phys. Rev. Lett. 75, 2598 (1995); A. P. Heberle, J. J. Baumberg, T. Kuhn and K. Kohler, "Femtosecond pulse shaping for coherent carrier control", Physica B 272, 360 (1999).
R. Hegger, H. Kantz, T. Schreiber, "Practical implementation of nonlinear time series methods: The TISEAN package", e-print arXiv:chao-dyn/9810005.
D. F. Hendry and J. A. Doornik, "Empirical Econometric Modelling Using PcGive", Timberlake Consultants Press (London), (2001).
T. Hertel, E.
Knoesel, M. Wolf, and G. Ertl, "Ultrafast electron dynamics at C u ( l l l ) : Response of an electron gas to optical excitation", Phys. Rev. Lett. 76, 535 (1996).
188
Optimal control and forecasting of complex dynamical systems
R. Hilfer, ed., Applications of fractional calculus in physics, World Scientific, River Edge, New Jersey (2000). S. Hirata, S. Ivanov, I. Grabowski, R. J. Bartlett, "Time-dependent density functional theory employing optimized effective potentials", J. Chem. Phys., 116, 6468 (2002). M. W. Hirsch, S. Smale, Differential Equations, Dynamical Systems and Linear Algebra, Academic Press, New York (1965). J. H. Holland, in Adaptation in Natural and Artificial Systems, University of Michigan, Ann Arbor, MI, (1975). J. H. Holland and J. S. Reitman, in Pattern-Directed Inference systems, edited by D. A. Waterman and F. Hayes-Roth, Academic Press, NY (1978). T. Hornung, R. Meier, D. Zeidler, K. L. Kompa, D. Proch, M. Motzkus, "Optimal control of one-and two-photon transitions with shaped femtosecond pulses and feedback", Appl. Phys. B 7 1 , 277 (2000). W. Y. Hwang, H. Lee, D. D. Ahn, and S. W. Hwang, "Efficient schemes for reducing imperfect collective decoherences", Phys. Rev. A 62, 062305 (2000). L. Ingber, "Very Fast Simulated Reannealing", Mathematical and Computer Modeling, 12, 967 (1989); L. Ingber, B. Rosen, "Genetic algorithms and very fast simulated re-annealing: A comparison", Mathematical and Computer Modelling, 16, 87 (1992). K. Jauregui, W. Hausler and B. Kramer, "Wigner Molecules in Nanostructures", Europhys. Lett. 24, 581 (1993). T. F. Jiang, X.-M. Tong, and S.-I Chu, "Self-interaction-free density-functional theoretical study of the electronic structure of spherical and vertical quantum dots", Phys. Rev. B 6 3 , 045317 (2001); M. A. Omary, M. A. Rawashdeh-Omary, C. C. Chusuei, J. P. Fackler, P. S. Bagus, "Electronic structure studies of six-atom gold clusters", J. Chem. Phys., 114, 10695 (2001). R. S. Judson, M. E. Colvin, J. C. Meza, A. Huffer and D. Gutierrez, "Do intelligent configuration search techniques outperform random search for large molecules?", Int. J. Quantum Chem. 44, 277 (1992). R. S. Judson and H. 
Rabitz, "Teaching lasers to control molecules", Phys. Rev. Lett. 68, 1500 (1992).
Bibliography
189
J. Kainz, S. A. Mikhailov, A. Wensauer, and U. Roessler, "Quantum dots in high magnetic fields: Calculation of ground-state properties", Phys. Rev. B 65, 115305 (2002). S. Kirkpatrick and C. D. Gelatt and M. P. Vecchi, "Optimization by Simulated Annealing", Science, 220, 671, (1983). M. Kennel, R. Brown, and H. Abarbanel, "Determining embedding dimension for phase-space reconstruction using a geometrical construction", Physical Review A 45, 3403 (1992). A. Y. Kitaev, "Foult-tolerant quatum computation by anyons", Ann. Phys. 302, 2 (2003). K. M. Kolwankar and A. D. Gangal, "Local Fractional Fokker-Planck Equation", Phys. Rev. Lett. 80, 214 (1998). LP. Kornfeld, S.V. Fomin, Ya.G. Sinai, "Ergodic theory", Springer, (1982). M. Koskinen, M. Manninen, B. Mottelson, and S. M. Reimann, "Rotational and vibrational spectra of quantum rings", Phys. Rev. B 63, 205323 (2001). D. Kouri and D. Hoffmann, "Time-dependent integral equation approach to quantum dynamics of systems with time-dependent potentials", Chem. Phys. Lett., 186, 91, (1991). L. P. Kouwenhoven, T. H. Oosterkamp, M. W. S. Danoesastro, M. Eto, D. G. Austing, T. Honda, and S. Tarucha, "Excitation spectra of circular fewelectron quantum dots", Science, 278, 1788 (1997). J. R. Koza, "Genetic programming", A Bradford Book, The MIT Press, (1992). V. F. Krotov "Global methods in optimal control theory", Marcel Dekker, Inc., New York, (1996). M. Toda, R. Kubo, and N. Saito, "Statistical Physics I: Equilibrium Mechanics", Springer-Verlag (1983).
Statistical
D. Kusnezov, A. Bulgac, and G. D. Dang, Phys. Rev. Lett. 82, 1136 (1999). D. A. Lidar, D. Bacon, J. Kempe, and K. B. Whaley, "Decoherence-free subspaces for multiple-qubit errors", Phys. Rev. A 61, 052307 (2000); A. Barenco, D. Deutsch, and A. Ekert, "Conditional Quantum Dynamics and Logic Gates", Phys. Rev. Lett. 74, 4083 (1995). E. N. Lorenz, "Deterministic nonperiodic flow", J. Atmosph. Sc. 20, 130 (1963).
190
Optimal control and forecasting of complex dynamical
systems
D. Loss and D. P. DiVincenzo, "Quantum computation with quantum dots", Phys. Rev. A 57, 120 (1998). F.B. Luczak, and J.T. Devrees, L.F. Lemmens, "Many Body Diffusion and Interacting Electrons in a Harmonic Confinement", Arxiv preprint condmat/0002343, (2000). J. Maddox, "How to shadow noisy chaos", Nature, 347, 613 (1989). Y. Makhlin, G. Schoen, and A. Shnirman, "Quantum-state engineering with Josephson-junction devices", Rev. Mod. Phys. 73, 357 (2001). C. H. Mak, R. Egger, and H. Weber-Gottschick, "Multilevel Blocking Approach to the Fermion Sign Problem in Path-Integral Monte Carlo Simulations", Phys. Rev. Lett. 81, 4533 (1998). S. Mallat, "A Theory for multiresolutional signal decomposition: The Wavelet Representation", IEEE Trans. Pattn Anal. Mach. Intell. 11, 674 (1989). N. Metropolis, A. W. Rosenbluth, M. Rosenbluth, A. H. Teller, and E. Teller, "Equation of State Calculations by Fast Computing Machines." J. Chem. Phys. 21, 1087 (1953). K. Michaelian, "Evolving few-ion clusters of Na and CI", Am. J. Phys. 66, 231 (1998); K. Michaelian, Chem. Phys. Lett. 293, 202 (1998). B. Militzer and E. L. Pollock, "Variational Density Matrix Method for Warm Condensed Matter and Application to Dense Hydrogen", Phys. Rev. E 6 1 , 3470, (2000). J. E. Mooij, T. P. Orlando, L. Levitov, L. Tian, C. H. van der Wal, S. Lloyd , "Josephson persistent-current qubit", Science 285, 1036 (1999). Y. Nakamura, Yu.A. Pashkin and J.S. Tsai, "Coherent control of macroscopic quantum states in a single-Cooper-pair box", Nature 398, 786 (1999). Yu. A. Pashkin, T. Yamamoto, O. Astafiev, Y. Nakamura, D. V. Averin4 and J. S. Tsai, "Quantum oscillations in two coupled charge qubits" Nature 421, 823 (2003). A. O. Niskanen, M. Nakahara, and M. M. Salomaa, "Realization of arbitrary gates in holonomic quantum computation", Phys. Rev. A 67, 012319 (2003); A. O. Niskanen, J. J. Vartiainen, and M. M. Salomaa, "Optimal multiqubit operations for Josephson charge qubits", Phys. Rev. 
Lett., 90, 197901 (2003); J. J. Vartiainen, A. O. Niskanen, M. Nakahara and M. M. Salomaa, "Accel-
Bibliography
191
eration of quantum algorithms using three-qubit gates", Int. J.Quant. Inf. 2, 1 (2004). J. A. Nelder and R. Mead, "A Simplex Method for Function Minimization", Computer Journal, 7, 308 (1965). M. Norgaard, O. Ravn, N. K. Poulsen and L. K. Hansen, "Neural Networks for Modelling and Control of Dynamic Systems", Springer, (2000). Y. Ohtsuki, W. Zhu and H. Rabitz, "Monotonically convergent algorithm for quantum optimal control with dissipation", J. Chem. Phys. 110, 9825 (1999). Y. Ohtsuki, K. Nakagami, Y. Fujimura, W. Zhu, and H. Rabitz, "Quantum optimal control of multiple targets: Development of a monotonically convergent algorithm and application to intramolecular vibrational energy redistribution control", J. Chem. Phys. 114, 8867 (2001). Y. Ohtsuki, W. Zhu and H. Rabitz, "Monotonically convergent algorithm for quantum optimal control with dissipation", J. Chem. Phys. 110, 9825 (1999). W. D. Oliver, F. Yamaguchi, and Y. Yamamoto /'Electron Entanglement via a Quantum Dot", Phys. Rev. Lett. 88, 037901 (2002). T. H. Oosterkamp, T. Fujisawa, W. G. van der Wiel, K. Ishibashi, R. V. Hijman, S. Tarucha and L. P. Kouwenhoven, "Microwave spectroscopy of a quantum-dot molecule", Nature, 395, 874 (1998). E. Ott, W. Withers, and J. Yorke, "Is the dimension of chaotic attractors invariant under coordinate changes?" J. Stat. Phys., 36, 687 (1984). E. Ott, "Chaos in Dynamical Systems", Cambridge University Press, Cambridge (1994). G. M. Palma, K. A. Suominen, and A. K. Ekert, "Quantum Computation and Dissipation", Proc. R. Soc. London A452, 567 (1996); L. M. Duan, and G. C. Guo, "Preserving Coherence in Quantum Computation by Pairing Quantum Bits", Phys.Rev. Lett. 79, 1953 (1997). I. Deutsch, A. Barenco, and A. Ekert, "Universality in quantum computation", Proc. R. Soc. Lond. A 449, 669 (1995). F. Pederiva, C. J. Umrigar, and E. Lipparini, "Diffusion Monte Carlo study of circular quantum dots", Phys. Rev. B 62, 8120 (2000).
192
Optimal control and forecasting of complex dynamical systems
R.E. Peierls, "On a Minimum Property of the Free Energy", Phys. Rev., 54, 918 (1938). A. P. Peirce, M. A. Dahleh, and H- Rabitz, "Optimal control of quantummechanical systems: Existence, numerical approximation, and applications", Phys. Rev. A 37, 4950 (1988). I. Podlubny, "The Laplace Transform Method for Linear Differential Equations of the Fractional Order", Slovac Academy of Sciences, (1997); I. Podlubny, "Fractional Calculus and Its Applications", Academic press, San Diego, (1999). I. Podlubny, "Geometric and physical interpretation of fractional integration and fractional differentiation", e-print arXiv: math/0110241. S. Poetting, M. Cramer, C. H. Schwalb, H. Pu, and P. Meystre, "Coherent acceleration of Bose-Einstein condensates", Phys. Rev. A 64, 023604 (2001). J.F. Poyatos, J.I. Cirac, P. ZoUer, "Complete Characterization of a Quantum Process: The Two-Bit Quantum Gate", Phys. Rev. Lett. 78, 390 (1997). V. Plerou, P. Gopikrishnan, B. Rosenow, L.A.N. Amaral, H.E. Stanley, "A random matrix theory approach to financial cross-correlations" Physica A 287, 374 (2000). A. Puente and L. Serra, "Oscillation Modes of Two-Dimensional Nanostructures within the Time-Dependent Local-Spin-Density Approximation", Phys. Rev. Lett. 83, 3266 (1999). V. Ramakrishna, R. Ober, X. Sun, O. Steuernagel, J. Botina, and H. Rabitz, "Explicit generation of unitary transformations in a single atom or molecule", Phys. Rev. A 6 1 , 032106 (2000). V. Ramakrishna, R. J. Ober, K. L. Flores, H. Rabitz, "Control of a coupled twospin system without hard pulses", Phys. Rev A 65, 063405 (2002); J. P. Paolo and R. Kosloff, "Optimal control theory for unitary transformations", 68, 062308 (2003). L.H Haroutyunyan and G. Nienhuis, "Coherent control of atom dynamics in an optical lattice", Phys. Rev. A 64, 033424 (2001). K. M. Romero, S. Kohler, and P. Hanggi, "Coherence stabilization of a two-qubit gate by AC fields", cond-mat/0409774. M.A. Ruderman and C. 
Kittel, "Indirect Exchange Coupling of Nuclear Magnetic Moments by Conduction Electrons", Phys. Rev., 96, 99 (1954). R. Salomon. "The Evolutionary-Gradient-Search Procedure." J.R. Koza et al.
Bibliography
193
(Eds.), Proceedings of the Third Annual Genetic Programming Conference GP-98, Morgan Kaufmann, San Francisco, CA, 852, (1998). E. Scalas, R. Gorenflo, F. Mainardi, "Fractional calculus and continuous-time finance", Physica A 284, 376 (2000); F. Mainardi, M. Raberto, R. Gorenflo , E. Scalas, "Waiting-times and returns in high-frequency financial data: an empirical study", Physica A 287, 468 (2000); N. Laskin, "Fractional Market Dynamics", Physica A 287, 482 (2000). C. Search and P.R. Berman, "Suppression of Magnetic State Decoherence Using Ultrafast Optical Pulses", Phys. Rev. Lett., 85, 2272 (2000). L. Serra, A. Puente and E. Lipparini, "Orbital current mode in elliptical quantum dots", Phys. Rev. B 60, R13966 (1999). S. G. Schirmer, M. D. Girardeau, J. V. Leahy, "Efficient algorithm for optimal control of mixed-state quantum systems", Phys. Rev. A 6 1 , 012101 (2000). J. D. Schaffer, "Multiple Objective Optimization with Vector Evaluated Genetic Algorithms". In Genetic Algorithms and their Applications: Proceedings of the First International Conference on Genetic Algorithms, Lawrence Erlbaum, 93 (1985). C. Schumacher, M. D. Vose, and L. D. Whitley, "The No Free Lunch and problem description length", in L. Spector et al. (Eds.): Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2001), 565, Morgan Kaufmann, San Francisco, (2001). H.W. Schumacher, C. Chappert, P. Crozat, R.C. Sousa, P.P. Freitas, J. Miltat, J. Fassbender, and B. Hillebrands, "Phase Coherent Precessional Magnetization Reversal in Microscopic Spin Valve Elements", Phys. Rev. Lett. 90, 017201 (2003). H. G. Schuster, "Deterministic Wiley-VCH, (1988).
Chaos: An Introduction",
Weinheim, Germany:
B. W. Shore, "The theory of coherent atomic excitation", John Wiley and Sons, (1989). C. Sparrow, The Lorenz Equations: Bifurcations, Chaos, and Strange Springer-Verlag, New York, (1982).
Attractors,
W. Spendley, G. R. Hext, F. R. Himsworth, "Sequential application of simplex designs in optimisation and evolutionary operation." Technometrics 4, 441 (1962).
194
Optimal control and forecasting of complex dynamical systems
O. Speer, M. E. Garcia and K. H. Bennemann, "Photon-assisted Stiickelberg-like oscillations in a double quantum dot", Phys. Rev. B 62, 2630 (2000). J. C. Sprott, "Chaos and Time-Series Analysis", Oxford University Press: Oxford, (2003). C. A. Stafford and N. S. Wingreen, "Resonant photon-assisted tunneling through a double quantum dot: An electron pump from spatial Rabi oscillations", Phys. Rev. Lett. 76, 1916 (1996). V. N. Staxoverov and G. E. Scuseria, "Assessment of simple exchange-correlation energy functionals of the one-particle density matrix", J. Chem. Phys. 117, 2489 (2002). T. H. Stoof and Y. V. Nazarov, "Time-dependent resonant tunneling via two discrete states", Phys. Rev. B 53, 1050 (1996). P. Sutton and S. Boyden, "Genetic algorithms: A general search procedure", Am. J. Phys. 62, 549 (1994). F. Takens, "Detecting strange attractors in fluid turbulence". In: Rand, D. A., and Young, L. S. (Eds.), "Dynamical Systems and Turbulence, Lecture Notes in Mathematics", 898, Springer-Verlag, Berlin (1981). M. Taut, "Special analytical solutions for two and three electrons in a magnetic field", J. Phys.: Condens. Matter 12, 3689 (2000). M. Thorwart, P. Hanggi, "Decoherence and dissipation during a quantum XOR gate operation", Phys Rev A, 6 5 , 012309, (2001). G. B. Thurston, "Viscoelasticity of Human Blood", Biophysical Journal, 12, 1205 (1972). P. K. Tien and J. P. Gordon, "Multiphoton Process Observed in the Interaction of Microwave Fields with the Tunneling between Superconductor Films", Phys. Rev. 129, 647 (1963). W. Tucker, "The Lorenz attractor exists", C. R. Acad. Sci. 328, 1197 (1999). S. Vajda, A. Bartelt, E. V. Kaposta, T. Leisner, C. Lupulescu, S. Minemoto, P. Rosendo-Francisco, and L. Woste, "Feedback optimization of shaped femtosecond laserpulses forcontrolling the wavepacket dynamics and reactivity of mixed alkaline clusters", Chem. Phys. 267, 231 (2001). R. V. V. 
Vidal, ed., "Applied simulated annealing", Lecture notes in economics and mathematical systems, 396, Springer-Verlag, Berlin, Heidelberg, New York, (1993).
Bibliography
195
L. Viola and S. Lloyd, "Dynamical suppression of decoherence in two-state quantum systems", Phys. Rev. A 58, 2733 (1998); L.Viola, E. Knill, and S. Lloyd, "Universal Control of Decoupled Quantum Systems", Phys. Rev. Lett. 82, 2417 (1999). D. Vion, A. Aassime, A. Cottet, P. Joyez, H. Pothier, C. Urbina, D. Esteve, M. H. Devoret, "Manipulating the Quantum State of an Electrical Circuit", Science 296, 886 (2002). R. de Vivie-Riedle, K. Sundermann, "Design and interpretation of laser pulses for the control of quantum systems", Appl. Phys. B. 7 1 , 285, (2000); A. Apalategui, A. Saenz, and P. Lambropoulos, "Ab Initio Investigation of the Phase Lag in Coherent Control of H2" Phys. Rev. Lett. 86, 5454 (2001). U. Weiss, Quantum dissipative systems, (World Scientific, Singapore, 1999). E. Wigner, "On the Interaction of Electrons in Metals", Phys. Rev. 46, 1002 (1934). E.P. Wigner, "On a class of analytic function from the quantum theory of collisions", Ann. Math. 53, 36 (1951). D. Wolpert and W. Macready, "No Free Lunch Theorems for optimization", IEEE Transactions on Evolutionary Computation 1, 67 (1997). K. Yabana and G. Bertsch, "Application of the time-dependent local density approximation to optical activity", Phys. Rev. A 60, 1271 (1999). C. Yannouleas and U. Landman, "Spontaneous Symmetry Breaking in Single and Molecular Quantum Dots", Phys. Rev. Lett. 82, 5325 (1999). J. Q. You, X. Hu, and F. Nori, "Correlation-induced suppression of decoherence in capacitively coupled Cooper-pair boxes", cond-mat/0407423. G. M. Zaslavsky, M. Edelman, "Pseudochaos", e-print arXiv: nlin/0112033. Y. B. Zeldovich, Soviet Physics J E T P 24, 1006 (1967). E. Zitzler and L. Thiele, "Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach," IEEE Trans. Evol. Comput., 3, 257 (1999). J. Zhang, J. Vala, S. Sastry, K. B. Whaley, "Optimal quantum circuit synthesis from Controlled-U gates", quant-ph/0308167; J. Zhang, J. Vala, K. B. Whaley, S. 
Sastry, "A geometric theory of non-local two-qubit operations",
196
Optimal control and forecasting of complex dynamical systems Phys. Rev. A67, 042313 (2003); J. Zhang, J. Vala, S. Sastry, K. B. Whaley, "Exact two-qubit universal quantum circuit", Phys. Rev. Lett.91, 027903 (2003); J. Zhang, J. Vala, S. Sastry, K. B. Whaley "Minimum construction of two-qubit quantum operations", quant-ph/0312193.
Z.-W. Zhou, B. Yu, X. Zhou, M. J. Feldman, "Scalable Fault-Tolerant Quantum Computation in Decoherence-Free Subspaces", Phys. Rev. Lett., 93, 010501 (2004). W. Zhu, J. Botina, and H. Rabitz, "Rapidly convergent iteration methods for quantum optimal control of population", J. Chem. Phys., 108, 1953 (1998).
Index

adiabatic approximation, 104
algorithm, 29
artificial agents, 169
autoregressive models, 166
B-gate, 162
Bloch-Redfield equations, 148
brachistochrone problem, 5
Caputo derivative, 85
chaos theory, 77
chaotic behavior, 79
CNOT gate, 156
complex systems, 165
convexity, 31
crossover operation, 47
decoherence problem, 145
decoherence-free subspaces, 146
degenerate functionals, 15
density matrix, 95
Dirac delta function, 117
distance between functions, 11
double quantum dot, 124
economical modelling, 169
El Farol bar problem, 169
electron pump, 125
Euler-Lagrange equation, 7
Euler-Ostrogradskii equation, 15
evolutionary gradient search, 74
Fermat's variational principle, 2
fidelity, 149
financial markets, 165
Floquet formalism, 138
forecast, 165
fractional derivative, 83
Frobenius trace norm, 156
Fundamental Lemma, 7
genetic algorithm, 44
genetic operators, 47
global optimization, 30
Gray code, 47
ground state problem, 58
halting problem, 29
Heisenberg coupling, 149
high-frequency time series, 166
inter-qubit coupling, 150
Ising coupling, 149
isoperimetric problem, 6, 18
Jacobian elliptic function, 115
Josephson qubits, 147
Lagrange multiplier, 18
Lagrangian, 10
least square estimation, 32
Lorenz attractor, 82
Lyapunov exponents, 78
Magnus series, 107
Makhlin's invariants, 163
Markovian approximation, 105
mathematical pendulum, 114
Mittag-Leffler function, 86
multi-agent models, 176
multi-photon resonance, 138
multiobjective optimization, 34
mutation operation, 47
nanostructures, 122
neural networks, 180
No Free Lunch Theorem, 29
noise reduction, 177
null controllability, 26
objective function, 30
optimal control, 94
optimal control theory, 21
Pareto front, 35
Pareto optimality, 37
partition function, 66
photon-assisted tunnelling, 124
Poincaré-Bendixson theorem, 84
Pontryagin Maximum Principle, 22
purity, 150
quantronium, 158
quantum fluctuations, 74
quantum gate, 147
Quantum Genetic Algorithm, 50
quantum Liouville equation, 96
qubit, 147
Rabi oscillations, 125
random matrix theory, 179
realistic constraints, 26
relaxation, 94, 110
Riemann-Liouville derivative, 85
Ritz's method, 20
rotating wave approximation, 107
sensitivity analysis, 25
short-term predictions, 77
simplex method, 38
simulated annealing, 43
sine-Gordon equation, 117
Snell's law, 2
solar activity, 176
soliton, 116
stochastic optimization, 42
strange attractor, 79
Stückelberg oscillations, 128
supercoherent, 152
SWAP gate, 164
Takens's embedding theorem, 174
tautochronism, 13
thermal fluctuations, 74
three-body problem, 78
time ordering operator, 103
time-delay reconstruction, 174
topological qubits, 156
transversality condition, 16
travelling salesman problem, 32
Tucker's theorem, 83
two level system, 105
two-qubit register, 148
universal gates, 157
variational calculus, 1
wavelet analysis, 177
weighted-sum method, 38
Wigner molecule, 71
Wilkinson's problem, 33
OPTIMAL CONTROL and FORECASTING of COMPLEX DYNAMICAL SYSTEMS

This important book reviews applications of optimization and optimal control theory to modern problems in physics, nano-science and finance. The theory presented here can be efficiently applied to various problems, such as the determination of the optimal shape of a laser pulse to induce certain excitations in quantum systems, the optimal design of nanostructured materials and devices, or the control of chaotic systems and the minimization of the forecast error for a given forecasting model (for example, artificial neural networks).

Starting from a brief review of the history of variational calculus, the book discusses optimal control theory and global optimization using modern numerical techniques. Key elements of chaos theory and the basics of fractional derivatives, which are useful for the control and forecasting of complex dynamical systems, are presented. The coverage includes several interdisciplinary problems that demonstrate the efficiency of the presented algorithms, and different methods of forecasting complex dynamics are discussed.
ISBN 981-256-660-0
World Scientific
www.worldscientific.com