Nonlinear Science
The Challenge of Complex Systems

Zensho Yoshida
Professor Zensho Yoshida University of Tokyo, Graduate School of Frontier Sciences Kashiwa, Chiba 277-8561, Japan
[email protected]
ISBN 978-3-642-03405-3    e-ISBN 978-3-642-03406-0    DOI 10.1007/978-3-642-03406-0
The original Japanese edition was published by Iwanami Shoten Publishers, Tokyo, 2008.
© Springer-Verlag Berlin Heidelberg 2010
Preface
Modern science, as the price of establishing rigor, has abstracted away the complexity of the real world and inclined toward oversimplified, fictitious narratives; as a result, a disjunction has emerged between the wisdom of science and reality. Reflecting on this, we see the need for science to recover reality; can it reveal new avenues for thought and for the investigation of complexity?

The study of science is the pursuit of clarity and distinctness. Physics, after Galilei placed it in the realm of mathematics, has been trying to establish clearness by mathematical logic. While physics and mathematics have different intellectual incentives, they have intersected in history on countless occasions and have woven a flawless system of wisdom. The core of rigorous science is always made of mathematical logic; the laws of science cannot be represented without the language of mathematics. Conversely, it is undoubtedly difficult to stimulate mathematical intellect without reference to the interests of science that are directed to the real world.

However, various criticisms have been raised against the discourses of sciences that explain the events of the real world as if they were "governed" by mathematical laws. Sciences, combined with technologies, have permeated, in the form of technical rationalism, the domain of life, politics, and even the psychological world. The criticisms accuse seemingly logical scientific narratives of being responsible for widespread destruction, the emergence of crises, and unprecedented human suffering. Such arguments are based on an objection to the oversimplified perspective that deems the real world a "machine" and sees a common "mechanism" behind all phenomena. Here we notice that many of the theories of physics and mathematics have diffused as very improper "metaphors"; what have become the targets of criticism as mechanistic barbarism are these ghosts.

In the history going back to Galilei and Newton, physics gained triumphs in describing the "cosmos"—the periodic movements of planets and similar regular motions in various systems. However, we have yet to write the theory of the other form of motion—a more general actuality of events in nature and society, the so-called "chaos." We must speak of what the theories of the sciences have understood; we must speak of the limits of their legitimacy; we shall have to speak of what these theories are leaving in abeyance. Then, and only then, can we determine the realm in which the cosmos and chaos are not disjunct, where complexity is not eliminated from
the scope of studies. We will also have to dispute the validity of the contemporary rhetoric of the "complex system" that has already begun to pervade scientific dialogue.

Beneath the complexity of actual phenomena, there is a mathematical structure called nonlinearity—this is the main theme of this book. No mathematical law can be expected to hold unrestrictedly, even if it is exact within a certain range. The law changes with respect to the scale of the variables (parameters)—this is the simple meaning of nonlinearity. For instance, how much is the price of 5 apples when one apple is 70 cents? Children learn to solve this problem by the proportionality relation and calculate the price to be $3.50. If the price for 50,000 apples is asked, however, $35,000 is not necessarily correct in economics, even if exact as the arithmetic answer. We have to change the rule of pricing according to the scale of the variable (the number of apples); the complexity of reality stems from this metamorphosis. The category of mathematical laws represented by the proportionality relation is called linear, because the graph of a proportionality relation is given by a "line"—nonlinearity is the distortion of the linear graph, the proportionality relation.

To study the complexity of the real world, we have to renounce the convenience of assuming the simplest linear relationship between parameters. In the previous example of price calculation, the metamorphosis of the linear relation is not simply formulated by varying the unit price; the problem reverts to the question of how the unit price is determined, and this question reconnects the problem to the surrounding "complex system" that consists of the producer, the market, consumers, and so on.

The term nonlinear is worded as a negation—it is not a descriptive (deictic) word characterizing a particular property, but a distinctive word indicating opposition to linear. The meaning of nonlinear is infinitely wide, a vague area not bound to a concrete frame. Therefore, when we say "mathematical structure that is nonlinear," we do not mean that there is a prescribed structure giving a framework of the theory; rather, we are paying attention to the unboundedly developing "differences" from linearity. We will critically analyze the structure of linear theory and reveal its limitations. By this process, the meaning of nonlinear (and, simultaneously, linear) will become clearer and more precise. It is hoped that partly through these arguments, the complexity that linear theory has abandoned might be revived on the horizon of science.

This book is written for readers who have a wide interest in science. It aims to provide an explicit explanation of what nonlinear science is. Because nonlinearity is primarily a mathematical concept, a simple list of so-called "nonlinear phenomena" will not suffice for a proper understanding. We will not evade mathematical considerations; we are going to analyze the "mathematical structure" of the theory. This will be a deliberate work of deciphering the manipulations that develop in the world of mathematical symbols. It is not possible, however, to touch upon technically complicated subjects in this short book. In order to give hints about the methods to approach the problems of contemporary sciences, appendices are given as Notes at the end of each chapter.
Some related materials are also given as Problems. For extended studies, the reader is referred to the textbooks listed at the end of each chapter.

The basic plan of this book evolved from discussions with Uichi Yoshida, the editor of Iwanami-Shoten. We wanted to publish a pedagogical book on nonlinear science, which would be sufficiently elementary but with a solid mathematical backbone. It took about five years before the first version of this book was published. I had to totally rewrite the manuscript three times, because I myself struggled with the question, "What is nonlinear?"—the very title of the planned book. I think I have been able to give an answer to this question, with the help of my friends and collaborators. I wish to record my special thanks to Swadesh Mahajan, Yoshikazu Giga, Vazha Berezhiani, Nana Shatashvili, Robert L. Dewar, Hamid Saleem, Vinod Krishan, and Akira Hasegawa.

I had an opportunity to organize a research project entitled "Creation and Sustenance of Diversity" at the International Institute for Advanced Studies in Kyoto, where an extremely interdisciplinary group (consisting of researchers in mathematics, physics, meteorology, medicine, sociology, economics, and philosophy) was built up around the common theme of "diversity". This book draws heavily on the products of that project. I have especially benefited from discussions with Kamon Nitagai and Mitsuhiro Toriumi. I am also very grateful to all members as well as the directors and staff of the Institute.

Tokyo, Japan
September 2009
Zensho Yoshida
Contents
1 What Is NONLINEAR?
  1.1 Nature and Science
    1.1.1 Natura Vexata
    1.1.2 Syndrome
    1.1.3 Déconstruction of Linear Theory
  1.2 The Scale of Phenomenon / Theory with Scale
    1.2.1 The Role of Scale in Scientific Revolutions
    1.2.2 The Mathematical Recognition of Scale
  1.3 The Territory of Linear Theory
    1.3.1 Linear Space—The Horizon of Mathematical Science
    1.3.2 The Mathematical Definition of Vectors
    1.3.3 Graphs—Geometric Representation of Laws
    1.3.4 Exponential Law
  1.4 Nonlinearity—Phenomenology and Structures
    1.4.1 Nonlinear Phenomena
    1.4.2 The Typology of Distortion
    1.4.3 Nonlinearity Emerging in Small Scale—Singularity
    1.4.4 Nonlinearity Escaping from Linearity—Criticality
    1.4.5 Bifurcation (Polyvalency) and Discontinuity
  Notes · Problems · Solutions · References

2 From Cosmos to Chaos
  2.1 The Order of Nature—A Geometric View
    2.1.1 Galileo's Natural Philosophy
    2.1.2 Geometric Description of Events
    2.1.3 Universality Discovered by Newton
  2.2 Function—The Mathematical Representation of Order
    2.2.1 Motion and Function
    2.2.2 Nonlinear Regime
    2.2.3 Beyond the Functional Representation of Motion
  2.3 Decomposition—Elucidation of Order
    2.3.1 The Mathematical Representation of Causality
    2.3.2 Exponential Law—A Basic Form of Group
    2.3.3 Resonance—Undecomposable Motion
    2.3.4 Nonlinear Dynamics—An Infinite Chain of Interacting Modes
    2.3.5 Chaos—Motion in the Infinite Period
    2.3.6 Separability/Inseparability
  2.4 Invariance in Dynamics
    2.4.1 Constants of Motion
    2.4.2 Chaos—True Evolution
    2.4.3 Collective Order
    2.4.4 Complete Solution—The Frame of Space Embodying Order
    2.4.5 The Difficulty of Infinity
  2.5 Symmetry and Conservation Law
    2.5.1 Symmetry in Dynamical System
    2.5.2 The Deep Structure of Dynamical System
    2.5.3 The Translation of Motion and Non-motion
    2.5.4 Chaos—The Impossibility of Decomposition
  Notes · Problems · Solutions · References

3 The Challenge of Macro-Systems
  3.1 The Difficulty of Prediction
    3.1.1 Chaos in Phenomenological Recognition
    3.1.2 Stability
    3.1.3 Attractors
    3.1.4 Stability and Integrability
  3.2 Randomness as Hypothetical Simplicity
    3.2.1 Stochastic Process
    3.2.2 Representation of Motion by Transition Probability
    3.2.3 H-Theorem
    3.2.4 Statistical Equilibrium
    3.2.5 Statistically Plausible Particular Solutions
  3.3 Collective Phenomena
    3.3.1 Nonequilibrium and Macroscopic Dynamics
    3.3.2 A Model of Collective Motion
    3.3.3 A Statistical Model of Collisions
  Notes · Problems · Solutions · References

4 Interactions of Micro and Macro Hierarchies
  4.1 Structure and Scale Hierarchy
    4.1.1 Crossing-Over Hierarchies
    4.1.2 Connection of Scale Hierarchies—Structure
  4.2 Topology—A System of Differences
    4.2.1 The Topology of Geometry
    4.2.2 Scale Hierarchy and Topology
    4.2.3 Fractals—Aggregates of Scales
  4.3 The Scale of Event / The Scale of Law
    4.3.1 Scaling and Representation
    4.3.2 Scale Separation
    4.3.3 Spontaneous Selection of Scale by Nonlinearity
    4.3.4 Singularity—Ideal Limit of Scale-Invariant Structure
  4.4 Connections of Scale Hierarchies
    4.4.1 Complexity—Structures with Multiple Aspects
    4.4.2 Singular Perturbation
    4.4.3 Collaborations of Nonlinearity and Singular Perturbation
    4.4.4 Localized Structures in Space–Time
    4.4.5 Irreducible Couplings of Multi-Scales
  Notes · Problems · Solutions · References

Index
Chapter 1
What Is NONLINEAR?
Abstract The word nonlinear indicates something that conflicts with linear. It is not positively defined, but is rather the "antithesis" of linear. Despite its lack of concrete content, nonlinear is a powerful and productive key word indicating the direction of contemporary sciences. This is because growing criticism of linear theory is pushing various fields toward the study of nonlinearity. In this introductory chapter, we shall give an overview of the meaning of nonlinear as a mathematical structure, as well as of its impact on the sciences.
1.1 Nature and Science

1.1.1 Natura Vexata

Nature is, originally, what is great, profound, and capricious for human beings. Ancient people developed various idols to represent diverse aspects of nature—gods or monsters who tossed people back and forth. What was frightening was their tremendous energy and unpredictable behavior. In modern ages, however, our view of nature has been fundamentally altered. Scientists have probed the anatomy of nature to reveal the "harmony" in its essence—one may express the "logic" of nature in the words of mathematics; one may harness various parts of nature by making machines. We began to dissect predictable (or reproducible) parts of nature and to cultivate them. Predictability is not only a forecast of the future—its general meaning is the knowledge of "causal relations." For instance, pharmacology starts from firsthand knowledge (experimental wisdom such that this medicinal herb is effective in that case), proceeds with componential analyses of the herb to find the effective chemicals, and pins down the mechanism of the medical effect at the molecular level. In this way, scientific research is directed toward universalities or principles.

However, can the sciences really tame nature within a predictable area? A thing in the real world is a system that is a complex composition of plenty of elements. It is often difficult to study it directly. So we divide the system under examination into elements as small as we deem necessary and start by studying the simplest problem. In Discourse on the Method [6], René Descartes (1596–1650)
Fig. 1.1 R. Descartes proposed a method of science that can develop a clear and distinct theory. His reductionism, described in his famous book Discourse on the Method, gave a guiding principle of modern science. Portrait by Frans Hals (unsigned); Louvre Museum, Paris
proposed a method of giving a solid platform from which a clear and distinct theory could be developed; he formulated "four precepts" describing the idea of dividing a system into elements (Fig. 1.1). Because of these statements, reductionism in the sciences is often attributed to Descartes (though its origin actually dates back to ancient Greek philosophy).

Reduction is a basic notion of philosophy, which generally means the replacement of a thing by something that we can directly manipulate. Reductionism is a strategy of science, which teaches us to start by separating elements from the complex real world and to manipulate each of them by experimental or theoretical methods. The operation of separating an element from the real world means, in the case of an experiment, the construction of an "experimental device," and, in the case of a theory, the formulation of a "model." The first work of an experimentalist is to cut out an element (object) from nature and to place it in the isolated space of his device.1 The device has some active manipulators by which the experimentalist can control the element and observe its responses. The element, effectively disconnected from nature, is under the perfect control of the device. The researcher tries to establish the
1 Francis Bacon (1561–1626) proposed a method of developing philosophy, the inductive reasoning from fact to law, which can free one's mind from "idols" (idola). He pointed out that the description of nature in its free condition (natura libera) is not sufficient and claimed the necessity of an experimental method to observe afflicted nature (natura vexata) [1].
possibility of experimental observations by achieving perfect control of the object, i.e., by eliminating the uncontrollable elements of freedom from the object.

A theorist also starts from the epistemological operation of separating an object from nature. For example, when a physicist discusses the motion of a "particle," such an object is an abstract model that has eliminated all complicated relations with other elements as well as internal properties. A particle, called a "point mass," is represented only by its mass, the position of its center of inertia, and time. The famous theory of Galileo Galilei (1564–1642) asserts: "A heavy object and a light object fall at the same speed, in the ideal condition." Here, the ideal condition means that the interaction of the objects with the air can be ignored. Experimentally, this condition can be satisfied when we put the objects in a vacuum vessel.2 Objects in the real world, however, fall in totally different ways; a bird's wing, for instance, moves on an extremely complicated trajectory. It is still impossible to calculate such an orbit, affected by interactions with the air, even with the help of a top-level computer. The notion of the ideal condition eliminates such tremendous complexity and allows a theorist to assume an abstract model of an object.

We perceive here a sense of apprehension about scientific statements. There is a large gap between the problems in the real world and the understanding of the decomposed elements, the anatomized nature. We can control only slight movements of small elements; we are describing a small part of the universe—a falling apple, one planet, one group of animals, the weather in a narrow region, etc.—on which we arbitrarily focus, as with the lens of a camera. The object of interest is, then, isolated from the rest of the universe. Science has described only such fragments of nature. And our naïve images of the unpredictable and diverse aspects of nature have been hidden (or moved to the periphery) by careful and argumentative logic in the process of dissecting (rupturing) it. To restore our original view of nature, it is now necessary to polish our sense of science concerning composition/synthesis, as opposed to decomposition/reduction.

Linear theory explores the "composition" and "decomposition" of objects in their ultimate simplicity. The notion of linear is the generalization of the proportionality relation to higher dimensions (a larger number of variables)—the term "linear" is used because the graph of a proportionality relation is represented by a "line." The proportionality relation is the simplest law that one may assume between two variables (see Sect. 1.2). For example, if the price of an apple is 50 cents, five apples cost $2.50. If an electric voltage of 1 V can drive a current of 0.1 A through a resistor, one may guess that the current will become 1.0 A when the voltage is increased to 10 V. Like these examples, it is quite natural to assume a proportionality relation between a pair of variables. The fundamental mathematical structure of the proportionality relation—that is, as we shall see in Sect. 1.3, the axioms of composition/decomposition—can be generalized for relations among
2 Galileo, of course, did not do the experiment in a vacuum vessel. Recognizing that the difference in falling velocity originates from the friction of the air, he tried to reduce the difference by making samples of the same size and shape from wood and metal.
many variables. A variable consisting of multiple components is called a vector. Students may first encounter the word vector when they learn mechanics: a "force" is a vector that is represented by an "arrow," and it may be composed/decomposed by geometrical manipulation using a parallelogram. Here, what we mean by drawing arrows and parallelograms to decompose a vector is the proportional distribution into the directions of the two sides of the parallelogram; composition is the inverse operation. Repeating this method, we can compose/decompose a vector in spaces of three or more dimensions (a numerical sketch of this parallelogram rule follows at the end of this subsection).

The eyes of scientists try to "reduce" an object to a vector by parameterization (i.e., measurement of some parameters that may represent the characteristics of the object). The real object is converted into a geometrical object and is projected onto the vector space (which is also called a linear space). For this "space," where the object = vector is to be placed, the "linear structure," i.e., an axiomatic system that enables the composition/decomposition of a vector (see Sect. 1.3.2), is prepared. Hence the notion of linear law = generalized proportionality relation coincides with the "frame of the space"; the straight graph of a linear law is, in itself, a linear (sub-)space.3

However, there is a fundamental gap between the "theoretical space" (embodying a "freely composable/decomposable object" or a "space structured by a linear law") and our naïve recognition of the "real world" (involving unpredictable phenomena and infinite diversity that are impossible to compose/decompose). Where does this unbridgeable gap start?
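To make the parallelogram rule concrete, here is a minimal numerical sketch (in Python, with made-up directions and coefficients; an illustration added here, not part of the original text). Composing two force components gives the diagonal of the parallelogram; decomposition recovers the coefficients by solving a small linear system.

```python
import numpy as np

# Two (non-orthogonal) unit directions, 60 degrees apart -- the sides of the parallelogram.
u = np.array([1.0, 0.0])
v = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])

# Composition: the resultant force is the diagonal of the parallelogram.
F = 2.0 * u + 3.0 * v

# Decomposition: find (a, b) with F = a*u + b*v, i.e., solve the 2x2 system [u v] (a, b)^T = F.
a, b = np.linalg.solve(np.column_stack([u, v]), F)
print(a, b)   # recovers 2.0, 3.0 (up to rounding)
```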
1.1.2 Syndrome

Syndrome is originally a technical term of medical science. The human body is a complex system in which many organs (each of which has its own particular function) cooperate. A slight sickness, caused by a small slump of one organ, may be cured by correcting the function of the organ or by a surgical operation removing a small part. When a serious problem occurs in some part, however, two or more organs are involved in a chain reaction and recovery becomes more difficult. Such a phenomenon is called a syndrome. This medical word is often used rhetorically when a system loses control and starts unpredictable behavior.

Our modern world is flooded with complex artifacts that human beings have added to nature. For example, a nuclear plant is a huge, complex system that is composed of a control system for nuclear reactions, a circulation system of core-cooling water, steam generators, turbine power generators, power transmission lines, the electric power network system, as well as the system operators. A "perturbation" in such a system can be amplified by operators' errors,
3 The expression of the linear law can be simplified—decomposed into independent proportionality relations—by choosing a frame (set of the basis vectors) of the linear space to be amenable to the linear law. The best choice is found by solving the so-called eigenvalue problem; this central strategy of linear theory—structuring the space by the law itself—will be explained in Sect. 2.3.
and, in the most serious case, the trouble may reach core meltdown. Such a nuclear reactor accident is called a "China syndrome." This word expresses an absurd image in which a nuclear accident at an American power plant ejects reacting hot nuclear fuel that melts the rock and continues sinking until it reaches China, on the other side of the earth. Unfortunately, a severe core-melting accident, which gave (partial) reality to this rhetoric, occurred at the Chernobyl nuclear plant in the former Soviet Union and caused an unprecedented disaster in 1986. The behavior of a system with some complexity, even if it was designed and made by human beings, can be uncontrollable—Chernobyl's tragic episode tells of the fundamental difficulty of predictions about a complex system.

"Environmental problems" are causing fears of syndromes that may destroy our modern civilized society. Does our consumption do great damage to the earth? Will the influences of our activities cause an unforeseen phenomenon? Will nature avenge our abuse with its massive power? Nature is still a threat to us due to its tremendous energy and unpredictable behavior. Here, scales are the central problem; our concerns are the "scale" of the environmental impacts that can bring about a syndrome and the "scale" of the resultant environmental perturbations. The scientific content expressed by the word "environment" is the dynamics of a huge, complex system that combines many kinds of elements (the atmosphere, the ocean, and the ecosystem consisting of many kinds of species). The connections of these elements change variously, depending on the scales (magnitudes) of the perturbations, and make prediction of a syndrome very difficult.

In a system such as the human body, a complex "plant", or the global environment, the connections relating many elements are modulated depending on the magnitude of the movement of each element. A system works as a united body, and its dynamics cannot be understood if it is decomposed into separate elements. Then, how can we analyze the composition of elements? How can we gain a perspective concerning scale? Carelessness about decompositions (or ruptures) and unconcern about scales have been fostered by linear theory. Thus, nonlinear is crucial for contemporary science to regain an understanding of the scale of events that makes the connections of elements irreducible; it is a challenge to the fictitious vision of abstracted events. Therefore we should not restrict our discussions to a matter of "dichotomy" of mathematical structures. The relation between linear and nonlinear should be analyzed as a problem retroactive to the genesis of science.
1.1.3 Déconstruction of Linear Theory

The scope of nonlinear science (the science devoted to nonlinearity) is not merely a "complement" of linearity. Motivated by criticisms of the narrowness and persistence of linear science, it has developed into a wider realm subsuming linearity. Borrowing the term coined by Jacques Derrida (1930–2004), we can say that the aim of nonlinear science is the déconstruction of linear science. Here, déconstruction means a strategy of philosophy that critically analyzes a "central vs. peripheral" relation (created in an implicit manner) within a dichotomy and aims at reversing the
relation.4 It does not intend to bring about the "destruction" (collapse) of the system that has been structured by the "center"—the previous center will maintain its effectiveness, though its validity will be rather limited. Using this sophisticated strategy, we are going to include within nonlinear science the solid mathematical structure of linear science.

The framework of linear theory is embodied by mathematical laws that generalize the proportionality relation. There is no room for doubting the importance of linear theory as the starting point of all rational considerations. However, one should not expect that any law or principle holds unrestrictedly. A proportionality relation, indeed, distorts when the magnitudes of the variables become sufficiently large. Here, note that the concept of scale (or magnitude) intervenes in our discussion. Linear theory excludes the notion of scale from its arguments; this exclusion poses a fundamental, but hidden, limitation on its legitimacy. The déconstruction of linear theory (or "linearized" theory) starts by rehabilitating the notion of scale—focusing attention on the world of large scales where the proportionality relation distorts and linear theory ceases to apply.

The déconstruction of linear theory expands to the déconstruction of "Science." Here, "complexity" is the key word for the déconstruction of the "central science" based on Cartesian reductionism. The reductionism that was the methodology to search for the "order" = "simplicity" of nature was, simultaneously, a strategy to bundle off the "disorder" = "complexity" to the periphery of science. Now, the exiled "complexity" tries to reverse its position by shaking the structure of the traditional science that holds "order" in the center—radical problems of contemporary sciences are throwing fundamental doubts on the validity of a science that shuts itself up in the area of order. The dichotomy between the cosmos and the chaos parallels the dichotomy between linearity and nonlinearity. Linear theory is, in essence, an exploration of "order"; it has constructed a microcosm of logic that is not violated by complexity. The déconstruction of linear theory—the clearest expression of order—will bring about the déconstruction of science, allowing the invasion of complexity.
4 The word déconstruction is a coinage of Derrida [4, 5]. He perceives a fundamental inequality in various dichotomies (such as actuality/potentiality or parole/écriture) and criticizes the politics that utilizes the implicitly structured conflicts. He criticizes the Hegelian dialectic process that intends to reconcile conflicts by sublation (aufheben) as an abstraction of actual problems.

1.2 The Scale of Phenomenon / Theory with Scale

1.2.1 The Role of Scale in Scientific Revolutions

The great joy of a scientist is to find truth that no one noticed before. New knowledge of science emerges to overturn old common sense (idols). A revolution of science is
the discovery of an unknown world of new scale—or, it may be more appropriate to say that a new scale is discovered to indicate a new world.

We, educated by modern science, know that the ground surface is not a plane and terra is not firm—the earth is a spherical object, inside of which is a hot fluid that convects actively on large space–time scales, and the continents are just thin and fragile husks. The earth is a tiny planet circulating around the sun. Our sun is an average star in an ordinary galaxy in the universe. Our universe emerged 10–20 billion years ago and is still expanding. There may exist other universes. The notion of our "ancestors" can no longer assume the invariance of our species—humans (Homo sapiens) evolved between 0.4 and 0.25 million years ago (the genus Homo diverged from Australopithecus about 2 million years ago). About 100 million years ago, huge reptiles ruled the earth. When fossils of huge bones were found in an old stratum, the forgotten history was revived with amazement. How long can the human genus survive? According to paleontology, the average longevity of a genus is several million years. These are the statements of science narrating large space–time scales.

Modern science has also widened our view to small-scale worlds. On the micro-scale, matter is reduced to "particles." The diversity of materials is explained by the variety of compounds or array structures of particles. From the viewpoint of materialism, biodiversity is explained as the degree of freedom of the particle array permitted to the polymer called DNA. In the micro-scale realm of quantum theory, a "particle" exhibits characteristics of a wave (it causes diffraction and interference), and the "existence" of a particle can be interpreted only in a "probabilistic" sense.

The revolution of science may completely deny preceding theories, as Darwin's theory of evolution did. However, in many cases (especially in physics), a new theory expands the horizon of knowledge by subsuming an old theory; the old theory keeps a narrow position as an "approximate theory" that holds in a special limit of scale.5 For example, Newton's classical mechanics is still valid as the macroscopic (large energy) limit of quantum mechanics, as well as the microscopic (small energy) limit of relativity. The "turf" that Newton's classical mechanics still reserves is the world of humans' firsthand scale; from there, physics began its development. Even after our scope of interest has expanded far beyond this turf, the old theory does not turn inside out, but instead maintains a limited value as a local truth. In order to subsume an old theory, a new theory should involve a scale; the old theory is, then, localized in a limited domain of this scale. Because the old theory did not recognize such a scale, it was unaware of its "limitation."
5 When the world of an old theory is reviewed from the world of a quite different scale (that was the "periphery" for the old authority), the old theory maintains a limited truth, moving backward to the seat of "approximate theory." In the previous section, this strategy for reversing the central/peripheral perspective structured by classical theories was called déconstruction, borrowing the term from Derrida.
We can say that the aim of nonlinear science is to recover the scale of which linear theory lost sight. A more careful explanation of this point will serve as an introduction to nonlinear science.
1.2.2 The Mathematical Recognition of Scale

The literal meaning of scale is the unit by which we quantify an observed quantity. A unit is something we may choose arbitrarily. However, we are going to reveal that the choice of scale is not free from the "object"—the notion of scale includes a particularity of a phenomenon (or of the theory describing the phenomenon). What does this mean?

Let us consider a pair of variables (a parameterization of an object) x and y, and denote by δx and δy their respective variations. When we find a certain relation between δx and δy, we may claim that there is a law governing x and y. If |δx| and |δy| are "small," this relation is normally a proportionality relation, i.e.,

δy = a δx   (a = constant).   (1.1)
Here, "smallness" is a relative notion—we can say "small" (or "large") only when we compare a parameter with a certain reference (or scale). However, we do not know a priori the actual reference that determines the "smallness" of the variation. Instead, we may say the variations |δx| and |δy| are small as long as relation (1.1) holds, i.e., we infer the scale from the limitation of the proportionality relation.

There are many examples of proportionality relations: the elastic law of a spring, Ohm's law of electric resistance, etc.—many elementary physics laws taught to children are proportionality relations. Newton's equation of motion is also a proportionality relation between force and acceleration. The proportionality coefficients (the elastic constant, resistance, mass, etc.) may be assumed to be constant numbers for some magnitudes of variations of the parameters. However, when the magnitudes of the extension, the electric current, or the velocity become large, the coefficients are no longer constant. The scale that determines "smallness" is discovered when the proportionality relation is destroyed.6

Complexity in the real world may develop in the regime where the proportionality relation—the law of simplicity—is violated. In various problems of contemporary sciences, such as climate change, change in ecosystems, and economic fluctuations, finding the scale (the basis for knowing whether a variation is small or large) is the key issue. Linear theory keeps silent about the scale—it is "unaware" of the scale.

6 The scale is not an a priori number prescribed in a law; it is discovered when a new theory emerges to subsume the old theory, and, then, it measures the range where the old theory applies as the approximation of the new theory. As we discussed in Sect. 1.1.3, the déconstruction of an old framework switches the viewpoint from the center to the periphery; the periphery determines the scale of the landscape. The new theory, when it is constructed, pretends to have known the scale all along, however.
By finding a deviation from a linear theory (a proportionality relation), we can estimate the scale that "localizes" the linear theory.

Let us formalize the foregoing arguments, invoking the theory of Taylor expansion. We assume that a function f(x) is smooth (analytic) in a certain neighborhood of a point x = x0. Then we can write f(x) as

f(x) = a0 + a1 δx + · · · + an δx^n + · · · ,   (1.2)
where δx = x − x0, a0 = f(x0), an = f^(n)(x0)/n!, and f^(n) = d^n f/dx^n. If the Taylor expansion (1.2) has a non-zero radius of convergence,7 we can find a certain finite number r such that (see Problem 1.1) sup_n |an r^n| < 1. By this r, we can characterize the scale of the function f(x)—using r as the unit, we rescale x (simultaneously, we shift the origin of x to x0):

x̌ = (x − x0)/r.   (1.3)
We call this process normalization. Using the normalized variable x̌, we can rewrite (1.2) as

f(x̌) = ǎ0 + ǎ1 x̌ + · · · + ǎn x̌^n + · · · ,   (1.4)
where ǎn = an r^n (because |ǎn| < 1, the radius of convergence of the power series (1.4) is greater than or equal to 1; see Problem 1.1). In the normalized Taylor series (1.4), we observe that |x̌^n| ≪ |x̌| for |x̌| < 1 and n > 1 (thus, |ǎn x̌^n| ≪ 1). Hence, within the range of |x̌| < 1 (i.e., |δx| < r), we may neglect the higher-order terms and approximate f(x) by the linear function f(x̌) ≈ ǎ0 + ǎ1 x̌. Note that r is the scale that determines the range of δx where f(x) can be approximated by a linear function. We can say that δx is "small" if |δx| < r (i.e., |x̌| < 1); the "smallness" is judged by the proximity of f(x) to a linear function.

As seen in this formal argument, the mathematical model of a phenomenon, if it is not linear, has a scale that characterizes the deviation from a proportionality relation (the distortion from a linear graph). The measure of the scale is often a number of absolute importance. In the law of motion, for example, the mass (the coefficient relating force and acceleration) may be assumed to be a constant number (i.e., Newton's law of motion holds) within the range of velocities much smaller than the speed of light c—the c is the absolute measure of the scale in the theory of motion. Einstein's relativity theory, correcting Newton's law, claims that the mass of a particle moving with a velocity v must be
m = m0 / √(1 − (v/c)^2),   (1.5)
where m0 is the mass of the particle at rest. Defining v̌ = v/c, and Taylor-expanding γ(v̌) = m(v̌)/m0 in the neighborhood of v̌ = 0, we observe

γ = 1 + (1/2) v̌^2 + (3/8) v̌^4 + · · · + [Γ(ν + 1/2)/(√π ν!)] v̌^(2ν) + · · · .   (1.6)
The radius of convergence of (1.6) is shown to be 1 (see Problem 1.1). Hence, if v̌ (the velocity normalized by c) is sufficiently small (now the "smallness" is quantitative, in the sense of |v̌| ≪ 1), we may approximate m ≈ m0 and use Newton's law as an "approximate law." Because the speed of light c ≈ 3 × 10^8 m/s is a huge number in the scope of the physics of Newton's age, the constancy of m did not come under question. The limitation of Newton's theory—the nonlinearity in the relation between force and acceleration—became apparent after the scope of physics extended to scales comparable to c, and the theory found c as an absolute number by which to normalize (scale) the velocity.8

7 Let us consider a power series Σn an x^n. If R⁻¹ = lim sup_{n→∞} |an|^(1/n) < ∞, the power series converges for |x| < R. We call R the radius of convergence. If the Taylor expansion (1.2) has a non-zero radius of convergence, f(x) is said to be analytic in the neighborhood of x0.

8 The c is the largest possible speed relating two events in space–time. Light (an electromagnetic wave propagating in vacuum) propagates at this maximum speed because it is a massless particle.
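The following minimal sketch (in Python; an illustration added here, not part of the original text) makes the normalization quantitative: it compares the exact γ(v̌) of (1.5) with Newton's approximation γ ≈ 1 and with a four-term partial sum of the series (1.6). The errors are tiny for |v̌| ≪ 1 and grow without bound as v̌ approaches the radius of convergence 1.

```python
import math

def gamma_exact(v_hat):
    """Lorentz factor gamma = 1/sqrt(1 - v_hat**2), where v_hat = v/c."""
    return 1.0 / math.sqrt(1.0 - v_hat**2)

def gamma_series(v_hat, terms):
    """Partial sum of (1.6): sum over nu of Gamma(nu + 1/2)/(sqrt(pi) nu!) * v_hat^(2 nu)."""
    return sum(
        math.gamma(nu + 0.5) / (math.sqrt(math.pi) * math.factorial(nu)) * v_hat**(2 * nu)
        for nu in range(terms)
    )

for v_hat in (0.01, 0.1, 0.5, 0.9, 0.99):
    exact = gamma_exact(v_hat)
    err_newton = abs(1.0 - exact) / exact            # Newton: m ~ m0, i.e., gamma ~ 1
    err_series = abs(gamma_series(v_hat, 4) - exact) / exact
    print(f"v/c = {v_hat:4.2f}   gamma = {exact:8.4f}   "
          f"Newton rel. err = {err_newton:8.2e}   4-term rel. err = {err_series:8.2e}")
```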
1.3 The Territory of Linear Theory

1.3.1 Linear Space—The Horizon of Mathematical Science

Mathematical science attends to the deep structure of various phenomena; events are projected onto the horizon of mathematics by abstraction into parameters, i.e., by measuring the object and expressing it by numbers = parameters; the "structure" is, then, the relations among the measured parameters. The measurement of an object and its subsequent representation by a set of parameters is called parameterization. When we parameterize an object x with a number, we denote the number by x̌ and distinguish it from the x itself. The evaluation of the number x̌, which means the measurement of x, is mathematically written as

x = x̌ e,   (1.7)
where e is the unit of the number x̌. To put it another way, the unit defines the basis for the parameterization. We can freely choose e, and, if we change e, the value x̌ changes. Generally, we need multiple parameters to parameterize an object. For the time being, we call a set of multiple variables a "vector." The number of the variables is
called the dimension or the degree of freedom. The basis of a vector is the set of units (the mathematical definitions of these notions will be refined in the next subsection). For example, to analyze the motion of a "particle" (or the center of inertia of a body), we have to measure its position, which we denote by x. For the measurement, we first define coordinates. In our space, we need three independent coordinates. In the so-called Cartesian coordinates, we invoke three mutually orthogonal unit vectors (the unit length is, for example, 1 m). Using this set of unit vectors as the basis, the position is parameterized as

x = x̌ ex + y̌ ey + ž ez.   (1.8)
Here, the basis is equivalent to the coordinate system. On a more abstract level, any object is identified as a vector; parameterization (measurement) of the object = vector is done by specifying a set of units = basis and quantifying the set of parameters. For example, a basket of fruit is, from a mathematician's perspective, a vector—let us denote it by the symbol x. Suppose that x contains one apple, two lemons, and three pears. Representing one apple by A, one lemon by L, and one pear by P, we can write x = 1A + 2L + 3P. In business, one may be concerned with more precise amounts. So, one can measure the weight of each component. Let us represent the apple, lemon, and pear of 1 g by e1, e2, and e3, respectively. Using the set of these symbols as the basis, we can write x = x1 e1 + x2 e2 + x3 e3, where x1, x2, and x3 are, respectively, the weights of the components. A dietitian may see the fruit basket from a quite different viewpoint; defining the basis by the vitamin A, vitamin C, fructose, · · · of 1 g and denoting them by g1, g2, g3, · · · , the fruit basket x is parameterized as x = ξ1 g1 + ξ2 g2 + ξ3 g3 + · · · .

We note that the parameterization of an object = vector is to decompose (resolve) it in terms of the basis = set of units; the choice of the basis is made in accordance with the observer's "subject" in seeing the object—we emphasize this point here to strengthen the argument of Sect. 1.1.1. We have to use the proportionality relation in order to compose or decompose fruit baskets. Calculation of the price, too, is done by the proportionality relation. This algebra is the so-called law of vector composition. Before giving a precise and general definition of these concepts, we first see the naïve connection between the notion of the vector space (linear space) and the proportionality relation. Let us
consider two fruit baskets x = x1 e1 + x2 e2 + x3 e3 and y = y1 e1 + y2 e2 + y3 e3. If α of x and β of y are added, we obtain a combined fruit basket

αx + βy = (αx1 + βy1) e1 + (αx2 + βy2) e2 + (αx3 + βy3) e3.   (1.9)
Here, we have applied the proportionality relation to each component (element). The calculation of the price has to be based on the proportionality relation, too. Suppose that the unit price (for 1 g) of each fruit is p1, p2, and p3, respectively. The price of the basket x, then, is given by p1 x1 + p2 x2 + p3 x3, which expresses the proportionality relation for each component. This calculation is the so-called inner product of the vector x (= fruit basket) and an adjoint vector p = p1 ε1 + p2 ε2 + p3 ε3 that is the "price list." Here, each εj (j = 1, 2, 3) is a unit vector representing the unit of the price of the corresponding fruit of 1 g. We have the orthogonality relation εj · ek = δjk.9 The price is, using the conventional notation of the inner product, p · x.

Generalizing the foregoing arguments, we can define what we call the "measurement of an object" (or the "description of an event")—it is a sequence of processes: (1) select n variables that can represent the object, (2) give a unit to each variable, and (3) evaluate the number of each variable based on the unit. An object in nature (or society) is, by measuring it in an appropriate coordinate system, projected into a vector space of dimension n, and it becomes a mathematical object that can be studied with geometric methods.10

9 δjk is the Kronecker delta, defined as δjk = 1 for j = k and δjk = 0 for j ≠ k.

10 Galileo said that nature is a book written in the language of mathematics. He thought that phenomena in nature could be studied using geometric methods; prior to this, the object must be mapped to a geometric object—a vector—by the "measurement." We note that the measurement is based on the choice of a certain scale (unit), and here, "subjectivity" concerning the scale influences the theory. Because of this problem, a deeper analysis is needed for the discussion of scale, which will be given in Chap. 4.
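As a playful check of this fruit-basket arithmetic, here is a minimal sketch (in Python, with made-up weights and unit prices; an illustration, not part of the original text): the law of composition (1.9) acts component-wise, and the price is the inner product p · x.

```python
# Components: grams of (apple, lemon, pear), measured against the basis
# {e1, e2, e3} = (1 g of apple, 1 g of lemon, 1 g of pear).
x = [120.0, 150.0, 300.0]    # basket x (made-up weights)
y = [200.0, 0.0, 100.0]      # basket y

# Law of vector composition (1.9): component-wise proportionality.
alpha, beta = 2.0, 0.5
combined = [alpha * xj + beta * yj for xj, yj in zip(x, y)]

# Price list as an adjoint vector p (made-up unit prices, dollars per gram);
# the total price is the inner product p . x.
p = [0.005, 0.008, 0.004]
price = sum(pj * xj for pj, xj in zip(p, x))

print("combined basket:", combined)    # [340.0, 300.0, 650.0]
print("price of x: $%.2f" % price)     # 0.60 + 1.20 + 1.20 = $3.00
```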
1.3.2 The Mathematical Definition of Vectors

Up to now, a set of multiple variables has been called a "vector." However, the concept of a vector should be defined independently of (prior to) its "measurement" (or its description in terms of a set of parameters). The measurement (description) belongs to the side of "subjectivity"; one can select an arbitrary way of measurement. For example, (1.8) gives one possible description of the position vector x—we are not defining the left-hand side (the position vector) by the right-hand side (its description). The right-hand side changes when we change the basis (coordinate system). Therefore, prior to describing a vector by its components (elements), we need to define what a vector is. We shall define a vector as a member of a linear space (or, synonymously, a vector space) where the law of vector composition applies. This law is basically the generalization (to higher dimensions) of the proportionality law, by which we can manipulate vectors; the measurement of a vector will be done by such manipulations. By axiomatizing the algebra based on the proportionality
(linear-graph) relation, the "linear structure" is marked to the "space" where we describe/analyze/manipulate the object = vector; the space where we develop theory is, thus, called a linear space.

We denote by R the real number field (the totality of real numbers) and by C the complex number field (the totality of complex numbers). A set X is called a linear space if it is endowed with the following law of vector composition: Let K = R or C (which we call the field of scalars). For arbitrary members x and y of X, and an arbitrary number α ∈ K, we define the sum x + y and the scalar multiple αx that satisfy

(a) if x, y ∈ X and α ∈ K, then x + y ∈ X and αx ∈ X;
(b) x + y = y + x (x, y ∈ X);
(c) (x + y) + z = x + (y + z) (x, y, z ∈ X);
(d) for every x and y, there is a unique z such that x + z = y;
(e) 1x = x (x ∈ X);
(f) α(βx) = (αβ)x (x ∈ X, α, β ∈ K);
(g) (α + β)x = αx + βx (x ∈ X, α, β ∈ K);
(h) α(x + y) = αx + αy (x, y ∈ X, α ∈ K).   (1.10)
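These axioms can be spot-checked concretely for the component-wise rule (1.9). Below is a minimal sketch (in Python, for R³ with floating-point components; an illustration added here, not part of the original text). The comparisons use a tolerance because floating-point addition is only approximately associative.

```python
import math
import random

# The naive rule (1.9): component-wise sum and scalar multiple on R^3.
def vsum(x, y):
    return [xj + yj for xj, yj in zip(x, y)]

def smul(a, x):
    return [a * xj for xj in x]

def close(u, v):
    return all(math.isclose(uj, vj, rel_tol=1e-12, abs_tol=1e-12) for uj, vj in zip(u, v))

random.seed(0)
x, y, z = ([random.uniform(-1, 1) for _ in range(3)] for _ in range(3))
a, b = 2.5, -0.75

assert close(vsum(x, y), vsum(y, x))                             # (b) commutativity
assert close(vsum(vsum(x, y), z), vsum(x, vsum(y, z)))           # (c) associativity
assert close(smul(1, x), x)                                      # (e) unit scalar
assert close(smul(a, smul(b, x)), smul(a * b, x))                # (f)
assert close(vsum(smul(a, x), smul(b, x)), smul(a + b, x))       # (g)
assert close(vsum(smul(a, x), smul(a, y)), smul(a, vsum(x, y)))  # (h) distributivity
```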
A member of X is called a vector, and a number in K is called a scalar (or a coefficient). A linear space is said to be real or complex according as K is R or C. As has already been said, we have to define the law of vector composition without invoking the components of a vector, so the axiomatic definition (1.10) is rather complicated in comparison with the aforementioned naïve rule (1.9), in which the sum and the scalar multiple are defined by component-wise proportional calculations (evidently, (1.9) is consistent with the general definition (1.10)).11

11 However, the abstractness enables us to consider various kinds of vectors. A function may be regarded as a vector of a linear space (which is called a function space); for functions f(x) and g(x), we can define f(x) + g(x) and αf(x) appropriately, and compose/decompose functions as vectors (see Note 1.1).

The law of vector composition enables us to compose/decompose vectors, and by these manipulations, we can parameterize a vector and represent it in terms of its components (elements)—in fact, the parameterization is just the decomposition of a vector into a set of elementary vectors, each of which is directed parallel to one of the basis vectors. Let us formulate the process of parameterization explicitly. Let X be a linear space. We choose vectors e1, · · · , en ∈ X (n is a certain integer) and define a system B = {e1, · · · , en}. This B is said to be the basis of X if every x ∈ X can be written as

x = Σ_{j=1}^{n} xj ej   (x1, · · · , xn ∈ K),   (1.11)
with a unique set of scalars (coefficients) x1, · · · , xn. In (1.11), the right-hand side is the parameterization of the left-hand-side vector x (cf. (1.8)). Displaying only the coefficients of (1.11), we may write12

x = t(x1, · · · , xn).   (1.12)

The number n of the elements of the basis is the dimension (or the degree of freedom) of the linear space X. A member of an n-dimensional linear space can be represented by n components (coefficients) xj ∈ K (j = 1, · · · , n). So we denote by K^n the n-dimensional linear space over the field K.

The parameterization of a vector is simple if the components are mutually independent. Here "independence" means, geometrically, the orthogonality of the basis vectors; to axiomatize these concepts, we first have to define the inner product. Let X be a real (or complex) linear space. We define a map from an arbitrary pair x, y ∈ X to a real (or complex) number, which we denote by (x, y). This (x, y) is called the inner product if the following conditions are satisfied:

(x, x) ≥ 0, and (x, x) = 0 is equivalent to x = 0;   (1.13)
(x, y) = (y, x)*, the asterisk denoting complex conjugation;   (1.14)
(a1 x1 + a2 x2, y) = a1 (x1, y) + a2 (x2, y).   (1.15)
The inner product of real vectors x and y ∈ R^n is often written as x · y. Two vectors x and y are said to be orthogonal if (x, y) = 0. An orthonormal basis is a system of basis vectors e1, · · · , en such that ei · ej = δij.13 When we choose an orthonormal basis, the components of a vector can be easily evaluated:

xj = (x, ej)   (j = 1, · · · , n).   (1.16)
Recalling (1.8), this calculation of the components (parameterization) may be regarded as the “measurement” of an object = vector; the basis means the system of units to measure the object.14 We may choose a general basis B = {e_1, · · · , e_n} consisting of vectors which are not necessarily orthogonal to each other. Then we have to generalize the relation

12 In this book, we normally represent a vector as a column vector (displaying the coefficients in a vertical array). When we need to save space, we write it as the transpose of a row vector, i.e., {}^t(x_1, · · · , x_n).
13 The word “normal” means that the length of each vector (given by (e_j, e_j)^{1/2}) is 1, and the word “ortho” means the orthogonality (e_j, e_k) = 0 (j ≠ k).
14 Endowing R^n with an orthonormal basis and the inner product x · y = Σ_{j=1}^{n} x_j y_j, we can identify it as an n-dimensional Euclidean space.
between the basis and the coefficients of a vector by introducing the notion of dual space. Given the B, its dual basis B* = {e^1, · · · , e^n} is the system of vectors that satisfy the orthogonality condition: (e^j, e_k) = δ_{jk}. The linear space spanned by B* is called the dual (or adjoint) space of X and denoted by X*. An arbitrary vector x ∈ X can be decomposed as

x = Σ_{j=1}^{n} x^j e_j    (x^j = (x, e^j)).    (1.17)

On the other hand, y ∈ X* is decomposed as

y = Σ_{j=1}^{n} y_j e^j    (y_j = (y, e_j)).    (1.18)
Following Einstein’s rule of summing over one upper and one lower index, we abbreviate Σ_j a^j b_j to a^j b_j. Then we may write x = x^j e_j = x_j e^j. In this book, bases are chosen to be orthonormal unless otherwise specified. We use lower indexes both for the basis vectors and for the coefficients (to avoid confusion with powers).
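To make the dual-basis construction concrete, here is a minimal numerical sketch in Python (assuming NumPy; the chosen basis vectors are illustrative): for a non-orthogonal basis of R², the dual basis can be computed from the inverse of the Gram matrix G_{jk} = (e_j, e_k), and the decomposition (1.17) can then be verified.

    import numpy as np

    # A non-orthogonal basis {e_1, e_2} of R^2, stored as rows.
    e = np.array([[1.0, 0.0],
                  [1.0, 1.0]])

    # Gram matrix G_jk = (e_j, e_k); the dual basis is e^j = sum_k (G^-1)_jk e_k.
    G = e @ e.T
    e_dual = np.linalg.inv(G) @ e

    # Orthogonality condition (e^j, e_k) = delta_jk:
    print(np.round(e_dual @ e.T, 12))        # identity matrix

    # Decomposition (1.17): x = sum_j x^j e_j with x^j = (x, e^j).
    x = np.array([3.0, -2.0])
    coeff = e_dual @ x                       # components x^j
    print(np.allclose(coeff @ e, x))         # True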
1.3.3 Graphs—Geometric Representation of Laws

We are now going to compare nonlinear with linear in a linear space. As pointed out in the preceding subsection, the “linear structure” is marked to a linear space by axiom (1.10) of the law of vector composition. Therefore nonlinearity has already been discriminated against as “distortion” in the linear space. How is “linearity” integrated with space? And what is “distortion” of nonlinearity? For the clarification, we study the structure of a graph immersed in space. Here, a graph is a geometric object representing a relation among parameters. Let us consider n variables x_1, · · · , x_n (assumed to be real-valued) which are related to each other by an equation:

F(x_1, · · · , x_n) = 0.    (1.19)
The set of variables satisfying relation (1.19) appears as a graph that is a hypersurface in the linear space R^n (see Fig. 1.2).15

15 A hypersurface is an (n − 1)-dimensional manifold that is immersed in the n-dimensional Euclidean space. Here an m-dimensional manifold (topological manifold) is a geometric object that is locally (i.e., in the neighborhood of an arbitrary point on the manifold) identical (homeomorphic) to a Euclidean space and is a Hausdorff space (i.e., the second axiom of separation holds so that distinct points have disjoint neighborhoods).
Fig. 1.2 A mathematical law of science can be represented by a graph—a manifold immersed in a linear space. This figure shows a graph that has a “pleat.” When the graph is folded in some direction (x_1 in this figure), relation (1.19) is multi-valued in determining x_1 as the function of the other parameters
If we can solve (1.19) for some variable (for example x_1) to obtain a relation

x_1 = f(x_2, · · · , x_n),    (1.20)

law (1.19) is translated into a map from (x_2, · · · , x_n) ∈ R^{n−1} to x_1 ∈ R. The function f(x_2, · · · , x_n) which is determined by relation (1.19) is called the implicit function. In general, however, the relation between x_1 and x_2, · · · , x_n may not be single-valued (see Fig. 1.2).16 Therefore the expression of law in the form of (1.19), which can describe multi-valued relations, is more general than the form of (1.20) which is restricted to single-valued relations. We may consider a more general case where n variables x_1, · · · , x_n are related by simultaneous ν (< n) relations:

F_k(x_1, · · · , x_n) = 0    (k = 1, · · · , ν).    (1.21)

If we can solve (1.21) for x_1, · · · , x_ν, we obtain a system of implicit functions:

x_k = f_k(x_{ν+1}, · · · , x_n)    (k = 1, · · · , ν),    (1.22)
16 The implicit function theorem gives the condition under which the implicit function is uniquely determined; it basically says that a critical point becomes the obstacle in defining the implicit function (cf. Sect. 1.4.4).
Fig. 1.3 The graph of a set of simultaneous relations is the intersection of hypersurfaces (each of them is the graph of one equation). If we give two relations for three variables (x_1, x_2, x_3) ∈ R^3, the graph is a (3 − 2 = 1)-dimensional manifold, that is a curve in R^3
which determines a map from (x_{ν+1}, · · · , x_n) ∈ R^{n−ν} to (x_1, · · · , x_ν) ∈ R^ν (see Fig. 1.3). When we represent a graph by a relation y = f(x), we call x the independent variable and y the dependent variable. When the linear space X of independent variables and the linear space Y of dependent variables are defined separately, the graph is a subset (manifold) in the product space X × Y. Here a product space is the linear space of the combined variables {x, y} (x ∈ X, y ∈ Y), where the law of vector composition (defining the sum and the scalar multiple) is induced by those of X and Y:

{x_1, y_1} + {x_2, y_2} = {x_1 + x_2, y_1 + y_2},    α{x, y} = {αx, αy}.    (1.23)
We may identify R^n × R^m = R^{n+m}. The map f(x) is not necessarily defined everywhere on X. The subset of X where f is defined is called the domain of f. The totality of the values of f(x) (i.e., the image of the domain of f) is called the range of f, which is a subset of Y. Up to now, we have discussed the general relations between graphs and maps (functions). Hereafter, we shall study the special features of graphs in linear theory. First of all, we need to define what linear law is. As mentioned in Sect. 1.1.1, linear law is a generalization (to higher dimensions) of the proportionality relation; this generalization can be formally done as follows. Let f be a map from a domain U to a range V. We assume U ⊂ X and V ⊂ Y, where both X and Y are real (or complex) linear spaces. This f is said to be a linear map (or linear operator), if, for every x, x′ ∈ U and arbitrary scalars a, b ∈ K = R (or C), a relation
f(ax + bx′) = a f(x) + b f(x′)    (1.24)
holds. To demand this relation for all x, x′, a and b, both U and V must be linear spaces. In what follows, we assume U = X and V = Y. We denote by G the set of variables {x, y} ∈ Z = X × Y satisfying the relation y = f(x) for a linear map f. We can easily verify that G (the graph of f) satisfies axiom (1.10) of the linear space (see Problem 1.2). On the other hand, for the graph G to be a linear (sub-)space, the condition (a) of (1.10) demands that relation (1.24) holds. Therefore the fact that f is a linear map and the fact that its graph G is a linear space (represented by a “plane” immersed in the linear space Z = X × Y) are equivalent.17 As will hereinafter be described, the geometric representation of a plane graph G gives the explicit form of a linear map f. We first define a basis of Z = X × Y (here we assume that both X and Y are real, for simplicity). Let {e_1, · · · , e_n} and {ε_1, · · · , ε_m} be the orthonormal bases of X and Y, respectively. We represent z ∈ Z in terms of the components: z = {}^t(x_1, · · · , x_n, y_1, · · · , y_m). A plane graph G is either a hyperplane (when m = 1) or the intersection of m (> 1) hyperplanes. A hyperplane that includes the origin of Z can be defined as the totality of the vectors that are perpendicular to a certain vector a (see Fig. 1.4). The condition for z and a = {}^t(a_1, · · · , a_n, b_1, · · · , b_m) to be perpendicular is that
Fig. 1.4 The graph of a linear law is represented by a linear subspace immersed in the space X × Y. The graph is given as a hyperplane or the intersection of hyperplanes; each hyperplane is characterized by its normal vector a
17 The word “linear” originates from the fact that the graph of a proportionality relation is given by a “line.” In a higher-dimensional space, an arbitrary “cross-section” of the graph of a linear law is a straight line (see Fig. 1.4).
a · z = Σ_{j=1}^{n} a_j x_j + Σ_{k=1}^{m} b_k y_k = 0.    (1.25)
Let us first assume m = 1. If the basis vector ε_1 is not perpendicular to a, i.e., if b_1 ≠ 0, we can solve (1.25) for y_1 and define the implicit function f : (x_1, · · · , x_n) → y_1. Introducing a 1 × n matrix L = (α_1, · · · , α_n) (α_j = −a_j/b_1; j = 1, · · · , n), we may write f as

f(x) = Lx.    (1.26)
When m vectors a^{(1)}, · · · , a^{(m)} are given, and they are mutually independent (i.e., none of them are parallel to each other), we can define m hyperplanes G_1, · · · , G_m such that G_j is perpendicular to a^{(j)} (j = 1, · · · , m). The intersection of G_1, · · · , G_m is the totality of the points satisfying

a^{(ℓ)} · z = Σ_{j=1}^{n} a_j^{(ℓ)} x_j + Σ_{k=1}^{m} b_k^{(ℓ)} y_k = 0    (ℓ = 1, · · · , m).    (1.27)
Solving (1.27) for y_1, · · · , y_m, we obtain the representation of the implicit function f : (x_1, · · · , x_n) → (y_1, · · · , y_m). Introducing matrices

A = ( a_j^{(ℓ)} )    (ℓ = 1, · · · , m; j = 1, · · · , n),    B = ( b_k^{(ℓ)} )    (ℓ, k = 1, · · · , m),

and assuming that det B ≠ 0, we define an m × n matrix L = −B^{−1}A. With this L, the implicit function f(x) is written in the form of (1.26). Thus we find that every linear map f(x) can be associated with a matrix L to be written as (1.26).18 The linearity condition (1.24) demands that the graph G must include the origin, i.e., f(0) = 0. Parallel shift of the graph, allowing it to deviate from the origin, is a trivial transformation. So, a law that is represented by such a shifted graph is also called a linear law. When we have to distinguish such a generalized linear law from the previously defined one, the generalized one is called an inhomogeneous linear law. The map representing a generalized linear law is given by adding an inhomogeneous term (a constant vector c ∈ Y) to the right-hand side of (1.26). We have clarified that a linear law is represented by a graph that, in itself, is a linear subspace; by decomposing a subspace from the space, we obtain a graph of

18 If the bases of X and Y are not orthonormal, we have to first apply a linear transformation to orthogonalize the bases. Then, the linear map can be cast into the form of (1.26).
a linear law. In this sense, the graph of a linear law can be regarded as a part of the “structure” of the “space,” or, to put it in the opposite way, we can structure the space by the linear law—this will give us a strong strategy of linear theory in elucidating the order implied by the law (see Sect. 2.3). In comparison with the “plane graph” of a linear law, the graph of a nonlinear law is distorted (the plane tangent to the curved graph is the “linear approximation” of the nonlinear law). The theme of nonlinear science is the phenomena that only distorted graphs can describe. In the next section, we shall study the basic “patterns” of nonlinearity. Before that, though, we have to learn the exponential law which, together with the proportionality law, constitutes the core of linear theory.
1.3.4 Exponential Law

The exponential law (in its higher-dimensional generalization) plays the central role in the linear theory of dynamics (study of temporal evolution). First, let us explain the connection between the proportionality relation and the exponential law by recalling the calculation of “compound interest.” Suppose that, for a principal x_0, an interest αx_0 is given every year (α is a constant). The point is that the interest (increment) is proportional to the principal at every moment when it is evaluated. Adding the interest to the principal, the balance becomes (1 + α)x_0 after 1 year and becomes the principal for the next 1 year. After n years, the balance will become (1 + α)^n x_0. This geometric progression—also known as Malthusian law in the theory of population dynamics—is a “discrete” exponential law where the increment occurs stepwise. In the limit of continuous time, i.e., if the interest is added to the principal at each infinitesimal passage of time, the evolution of the balance is described by the exponential function e^{ta} x_0, where t is the continuous time and x_0 is the initial (t = 0) value. The coefficient a (or, sometimes, its reciprocal a^{−1}) is called the time constant, which determines the time scale of the exponential function. We may generalize the exponential function allowing both a and x_0 to be complex numbers. When a is a pure imaginary number, we find that the exponential law describes “oscillation”; writing a = iω (ω ∈ R), e^{ta} = cos(ωt) + i sin(ωt). For a general complex a, the real part of a (denoted by Re a) yields exponential growth (when Re a > 0) or damping (when Re a < 0). We say that e^{ta} is stable when Re a ≤ 0 and unstable when Re a > 0. The imaginary part of a (denoted by Im a) represents the angular frequency of oscillation. The exponential function is derived from a linear differential equation. The rate of change of a certain quantity x(t) is given by the differential coefficient dx(t)/dt. If this rate of change is proportional to x(t), the evolution of x(t) is described by

dx/dt = ax,    (1.28)
where a is a constant (proportionality coefficient). Given an initial condition x(0) = x_0, we integrate (1.28) to obtain

x(t) = e^{ta} x_0.    (1.29)
A differential equation with respect to time t is called an equation of motion or an evolution equation. In mathematics, a differential equation governing functions depending on only one independent variable (often taken as time) is called an ordinary differential equation (abbreviated as ODE). The problem of solving an equation of motion with a specified initial condition is called an initial-value problem. We use the word “motion” not only to mean the spatial movement of a body but also to mean a general temporal evolution. In fact, we often encounter differential equations like (1.28) in various problems. For example, x(t) may denote the number of certain bacteria, or the number of patients infected by the bacteria. If their increment is proportional to x(t), the “motion” of x(t) obeys (1.28). In another example, x(t) may denote the speed of a body that is moving in a viscous environment. If friction force, which is proportional to speed, dominates the motion, x(t) obeys (1.28) with a negative a (because friction force works to diminish speed); the solution x(t) = e^{ta} x_0 describes the exponential damping of the speed (x_0 is the initial speed). As shown in these examples, the exponential function is expected to give the archetypal expressions of various “motions.” However, we must admit that such considerations are rather unnatural (in fact, shortsighted) if compared with common sense. The linear equation of motion asserts that bacteria (or patients) will keep increasing exponentially (Malthusian explosion) or that friction damping does not stop the body (the speed x(t) = e^{ta} x_0 cannot become zero within a finite time). These conclusions conflict with our understanding of actual phenomena, revealing the limitations of linear theory. How can the proliferation of certain species maintain a balance in an ecological system? Why can brakes stop a car? In Sect. 1.4, we shall show somewhat better analyses—for such improvements, we will have to consider “nonlinearity”. Let us return to the subject of this section; we are reviewing the range of linear theory. Apropos of the exponential law, we generalize it to higher dimensions. In the foregoing discussions, we have seen the relation between the exponential function and the differential equation (1.28)—the exponential function is “generated” by the linear differential equation. In (1.28), the dependent variable x has just one dimension. Here, we consider an n-dimensional dependent variable x. The proportionality coefficient a of (1.28), then, is generalized to an n × n matrix operator A; see the general form of a linear map (1.26). The evolution equation is now generalized to a system of simultaneous differential equations:

dx/dt = Ax.    (1.30)
Providing an initial value x(0) = x_0 ∈ C^n, we solve the initial-value problem of (1.30) to generate the exponential function of the matrix operator A—let us write the solution “formally” as
Fig. 1.5 The motion generated by a system of linear ODEs (1.30). Three orbits that start from different initial conditions are shown. In this example, the “generator” A is an antisymmetric matrix (i.e., the adjoint matrix A* satisfies A* = −A). Then we can show that e^{tA} is a unitary matrix (i.e., (e^{tA})* = (e^{tA})^{−1} = e^{−tA}) and that the motion is periodic (represented by a closed orbit); see Sect. 2.3
x(t) = e^{tA} x_0.    (1.31)
The problem is what the exponential function (operator) e^{tA} is. Knowing this has much deeper meaning than solving (1.30) for a particular initial value (i.e., finding a particular solution). The operator e^{tA} gives a map from an arbitrary initial value x_0 to a state x(t) at any time t (see Fig. 1.5). There are several different ways to define the exponential function e^{tA}.19 Here, we adopt the simplest method: invoking the power series of the usual exponential function, we “define” e^{tA} by

e^{tA} = Σ_{n=0}^{∞} (tA)^n / n!.    (1.32)
Note that we are defining the left-hand side by the right-hand side, not Taylor-expanding the left-hand side into the right-hand side. Since this e^{tA} is an “analytic function” of t (i.e., the power series converges uniformly for any t; see Problem 1.3), we can prove the following exponential law by plugging the power series expressions into the exponential functions and comparing each of the coefficients of the power series:

e^{sA} e^{tA} = e^{(s+t)A}.    (1.33)
It is obvious that e^{0A} = I (identity). Hence e^{tA} x_0|_{t=0} = x_0 (the initial value). Differentiating both sides of (1.32) with respect to t (since the right-hand side is analytic, we can differentiate it by each individual term), we obtain
19 The general theory of the exponential law (linear theory of motion) will be described in Sect. 2.3.
(d/dt) e^{tA} = A e^{tA}.    (1.34)
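As a quick numerical check of the definition (1.32) and of the properties (1.33) and (1.34), one may truncate the power series; the following Python sketch (NumPy assumed; the generator and the initial value are our choices) also illustrates the situation of Fig. 1.5, where an antisymmetric generator produces norm-preserving (periodic) motion.

    import numpy as np

    def expm_series(M, terms=60):
        """Truncated power series (1.32): sum over n of M^n / n!."""
        out, term = np.eye(len(M)), np.eye(len(M))
        for n in range(1, terms):
            term = term @ M / n
            out = out + term
        return out

    A = np.array([[0.0, -1.0, 0.0],   # antisymmetric generator: A* = -A
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 0.0]])

    s, t = 0.3, 1.1
    # Exponential law (1.33): e^{sA} e^{tA} = e^{(s+t)A}.
    print(np.allclose(expm_series(s * A) @ expm_series(t * A),
                      expm_series((s + t) * A)))        # True

    # e^{tA} is unitary here, so |e^{tA} x0| = |x0| (a closed orbit).
    x0 = np.array([1.0, 0.0, 0.5])
    print(np.linalg.norm(expm_series(t * A) @ x0), np.linalg.norm(x0))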
It is now shown that e^{tA} x_0 solves the ODE (1.30) with an arbitrary initial value x_0. We conclude this section with an outline of the direction of contemporary linear theory; it is expanding its territory into the field of “infinite-dimensional spaces.” For example, a function u(x) can be viewed as an infinite-dimensional vector; since a function can be smoothly deformed into any shape, its parameterization needs an infinite degree of freedom (cf. Note 1.1). Mathematical manipulations of u(x) can be regarded as operators applying to an infinite-dimensional vector, generalizing the notion of matrix operators. For example, the Laplacian

Δ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²    (1.35)
is a “differential operator” applying to some class of functions.20 If we can generate the exponential function (operator) e^{tΔ}, the function u(x, y, z, t) = e^{tΔ} u_0(x, y, z) satisfies the diffusion equation

∂u/∂t = Δu,    (1.36)
with the initial condition u(x, y, z, 0) = u_0(x, y, z). Similarly, Schrödinger’s equation of quantum mechanics

i ∂ψ/∂t = −Δψ    (1.37)

can be solved as ψ(x, y, z, t) = e^{itΔ} ψ_0(x, y, z). To embody the operators e^{tΔ} and e^{itΔ}, we need the theory of linear operators in function spaces. Since a function space is of infinite dimensions, a general linear operator cannot be represented by a finite-dimensional matrix. We cannot invoke the power series (1.32) to define e^{tΔ} and e^{itΔ}, because the differential operator Δ is not a continuous map. However, constructing an appropriate theory by which we can compose/decompose functions (infinite-dimensional vectors), we can generate exponential (and even other analytic) functions of operators (and solve the differential equations generating these functions) (cf. Note 2.1).

20 To define an appropriate domain of a differential operator, we have to specify the boundary condition (see Problem 1.4). For example, we assume that u(x) = 0 on the boundary of the domain (the so-called Dirichlet boundary condition). When u(x) is a temperature governed by the heat diffusion equation (1.36), this boundary condition means that the domain is surrounded by a heat bath of temperature = 0. When u(x) is the wave function governed by Schrödinger’s equation (1.37), it means that the domain is surrounded by an infinitely high potential wall. We also have to assume the differentiability of u(x); cf. Note 4.1. For systematic study of the boundary values of functions and regularity, the reader is referred to Lions & Magenes [9].
1.4 Nonlinearity—Phenomenology and Structures

1.4.1 Nonlinear Phenomena

As we have seen in the preceding section, the linear theory of motion can describe only exponential changes. For example, the proliferation of some living organisms may certainly be exponential for a short time. However, this cannot continue permanently. When the number of individuals increases too much, the environment changes and the reproductive rate decreases. As a result, the population will saturate. The populations of certain kinds of insects (whose generations do not overlap but change all together) are observed to exhibit complex oscillations—Robert M. May (1936–) studied such a phenomenon mathematically in detail as a typical example of “chaos” [11] (cf. Sect. 2.3.5). There are more complicated phenomena in the real world; various ecosystems show both stable and unstable phases. Sometimes they are able to continue this way, but in other cases they eventually cease to exist. In the study of actual ecosystems, therefore, the key issue is to clarify the reason why the change of the populations deviates from linear theory (the exponential law), i.e., how nonlinearity emerges. We can say that nonlinearity is a mathematical representation of the autonomy of a system. In a “nonlinear ecosystem,” the coefficient of reproductive rate (the “time constant”) is not an a priori constant, but it changes in response to the condition of the ecosystem (i.e., as a function of the population). By this nonlinearity (autonomous change of the reproductive rate), the ecosystem may achieve stability or produce complex (unpredictable) oscillations. The range of autonomy is not necessarily consolidated in a single parameter (variable). An actual ecosystem is a system of high dimensions, in which many species, as well as resources, are tightly connected. The autonomy of such a system means the cooperation of many elements. In a linear system, however, elements are basically decomposable, and the motion of each element is described by the exponential law (to be discussed in Sect. 2.3). The possibility of such “decomposition” is indeed the essence of linear theory, which visualizes the “order” of motion from the perspective of reductionism. In a nonlinear system, the elements (components) involved are not decomposable; they interact variously, producing extremely complex “entanglements.” Figure 1.6 gives a geometric image of such entanglements. The complex curve in this figure was produced by numerically integrating a system of nonlinear ODEs (known as Lorenz’s equations) that was formulated by the meteorologist E. N. Lorenz (1917–2008) to model the complex behavior of the atmosphere [10]. Starting from a certain initial condition, the movement of the state vector (consisting of three parameters) draws an extremely complicated orbit (compare with Fig. 1.5, which shows a typical simple motion of a linear system). The orbit irregularly alternates between two foliaceous areas like butterfly wings. Lorenz’s equation is often used as a simple example of a nonlinear differential equation that produces chaos. The linkage of undecomposable multiple degrees of freedom can often produce sophisticated “functions” in nonlinear systems. It is understood that organisms
Fig. 1.6 A complex curve produced by solving Lorenz’s equation, which models the complexity (chaos) of atmospheric motion (after Lanford [7])
skillfully apply nonlinearity in various domains of life activities. For example, the transmission of information in the nervous system is, in a microscopic view, a propagation of signal pulses by electrochemical reactions, which can be modeled by a circuit characterized by strong nonlinearity. In the field of brain science, researchers are trying to understand the functions of the brain, such as perception, memory, and recognition, from the nonlinear properties of nervous signal transmissions. Other researchers are designing computers simulating brain mechanisms. We may say that almost all of the contemporary problems of science and technology are dealt with in nonlinear science. Besides the aforementioned examples, there are many other nonlinear phenomena—the organization of the beautiful spirals of galaxies, explosions (flares) on the sun, chaotic oscillations in the atmosphere or geosphere, complex movement of eddies in fluids, creation of biological bodies, evolution of species, and so on. The central problem to be studied is how they self-organize their structures or self-regulate their dynamics.
1.4.2 The Typology of Distortion

The word nonlinear emphasizes the confrontation with linear, but it does not specify any concrete feature; in terms of the graph, nonlinearity means only that the graph is “not flat.” Thus we have to begin by categorizing how a graph can be distorted and what can happen as a result.21 There are two types of distortion in a graph: one is the accelerating-type and the other is the decelerating-type (see Fig. 1.7).

21 The reader is referred to Note 1.3 for a different angle of categorization highlighting the analytical aspect of nonlinearity.
Fig. 1.7 Graphical images of (a) accelerating-type and (b) decelerating-type nonlinearities
The accelerating-type nonlinearity in a certain effect yields a positive feedback to the effect, resulting in “explosive behavior.” Conversely, decelerating-type nonlinearity yields a negative feedback and brings about “saturation.” Both of them may coexist in a high-dimensional system. For example, let us consider a model of proliferation of some living organism. We denote the population by x(t) (for the simplicity of the latter arguments, we do not restrict x to being an integer number but allow it to be a real number). We assume that the rate of the change of x(t) for a unit of time is given by ax(t). If the coefficient a is a constant, the proliferation is a linear process; see (1.28). The linear evolution equation, then, is

dx/dt = ax,    (1.38)
which, under the initial condition x(0) = x_0 (> 0), gives the solution x(t) = e^{at} x_0, the exponential law. When the coefficient a is modified depending on x, the process becomes nonlinear. For instance, let us assume a(x) = b(1 + εx) with real constants b (> 0) and ε. The evolution equation (1.38) is modified as

dx/dt = b(1 + εx)x.    (1.39)

An ε > 0 yields an accelerating-type nonlinearity, while an ε < 0 yields a decelerating nonlinearity. When ε = 0, (1.39) reduces to the linear equation. Solving (1.39) with the initial condition x(0) = x_0 (> 0), we obtain22

x(t) = 1 / [e^{−bt}(x_0^{−1} + ε) − ε].    (1.40)
22 By transforming the variable as y = ε + x^{−1}, (1.39) is rewritten as a linear differential equation dy/dt = −by. Solving this for y(t) and transforming back to x(t), we obtain the solution.
Fig. 1.8 Two types of nonlinear proliferation processes: (a) Accelerating-type nonlinearity causes “blow-up” (explosion) at a finite time t = t*. (b) Decelerating-type nonlinearity suppresses the instability and brings about “saturation”
Figure 1.8 shows the typical behavior of the solution. When ε = 0 (linear), the solution describes exponential growth x(t) = e^{bt} x_0. The accelerating-type nonlinearity (ε > 0) yields a faster growth, and the solution “blows up” at a finite time t = t* = b^{−1} log[1 + (εx_0)^{−1}]. On the other hand, the decelerating-type nonlinearity (ε < 0) brings about “saturation,” i.e., x(t) → −ε^{−1} (as t → +∞). This is because the decelerating-type nonlinearity suppresses the instability.
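The two regimes of Fig. 1.8 can be reproduced directly from the closed-form solution (1.40); a short Python sketch (NumPy assumed, with arbitrarily chosen parameters):

    import numpy as np

    def x_exact(t, b, eps, x0):
        """Closed-form solution (1.40) of dx/dt = b(1 + eps*x)x."""
        return 1.0 / (np.exp(-b * t) * (1.0 / x0 + eps) - eps)

    b, x0 = 1.0, 1.0

    # Accelerating type (eps > 0): blow-up at t* = b^{-1} log[1 + (eps*x0)^{-1}].
    eps = 0.5
    t_star = np.log(1.0 + 1.0 / (eps * x0)) / b
    print("t* =", t_star, " x near t*:", x_exact(0.999 * t_star, b, eps, x0))

    # Decelerating type (eps < 0): saturation at -1/eps.
    eps = -0.5
    print("x(20) =", x_exact(20.0, b, eps, x0), " limit:", -1.0 / eps)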
1.4.3 Nonlinearity Emerging in Small Scale—Singularity

The basic meaning of nonlinearity is “deviation from the proportionality relation.” In a “regular” phenomenon, nonlinearity emerges when the variation of variables becomes large; distortion of the graph becomes apparent when it is viewed on a large scale. Here, regular means, basically, that “we can start from a commonsensical proportionality relation, while at large variation, some distortion becomes apparent.” The mathematical explanation is that the Taylor expansion has a non-zero radius of convergence; see Sect. 1.2.2. In some cases, however, we have to think about “irregularity” that is a nonlinearity arising abruptly by small changes of variables—we cannot assume the proportionality relation even on a small scale. In other words, “irregularity” is a nonlinearity in which the scale is reduced to zero. The scale of the region where the linear approximation applies is evaluated by the radius of convergence of the Taylor expansion; see Sect. 1.2.2. Hence “irregularity” means the impossibility of the Taylor expansion, i.e., the convergence radius = 0. In a graph, irregularity appears as a break or discontinuity. The place where irregularity occurs is called a singularity (it is not necessarily a “point”; it may be a certain set of higher dimensions). For example, let us consider a function

f(x) = |x|^p    (p ≤ 1).    (1.41)
The graph of f(x) has a sharp peak at x = 0 (see Fig. 1.9).
Fig. 1.9 A graph with a singularity. Here we plot y = |x|^{1/2}
If we try to Taylor-expand f(x) around x = 0, the coefficient of the first-order term f^{(1)}(0) (i.e., the proportionality coefficient) is not determined uniquely. Therefore we may not approximate f(x) by a linear function in any small neighborhood of x = 0 (the singularity). Especially if p < 1, f(x) approaches f(0) abruptly with x → 0, and, conversely, it changes violently if x deviates slightly from 0. Let us observe how a singularity affects an ecosystem. We consider a decreasing process governed by

dx/dt = −√|x|.    (1.42)
Given an initial condition x(0) = x_0 (> 0), we solve (1.42) to obtain

x(t) = (t_0 − t)²/4    (t < t_0 = 2√x_0),
x(t) = 0               (t ≥ t_0).    (1.43)
We observe that x(t) goes to 0 at a finite time t_0, i.e., “extinction” occurs (see Fig. 1.10). Recall that the exponential law of linear theory does not explain extinction (Sect. 1.3.4). To modify linear theory in the small-scale range (near x = 0), we need a nonlinearity that is strong near x = 0, which is the singularity at x = 0. A singularity causes a stranger phenomenon. In the model (1.42) of an ecosystem, we assumed that x(t) cannot take a negative value, because it represents the number of living organisms. But, we may consider that this equation is a model
Fig. 1.10 Nonlinear (irregular) behavior caused by a singularity. Extinction occurs in a finite time. Or the uniqueness of the route of evolution can be destroyed
of another system where x(t) may also take negative values; for example, let us assume that x(t) is the speed of a body, and the right-hand side of (1.42) describes some nonlinear force. After x(t) becomes 0, it can make a departure from 0 at an arbitrary time t_1 ≥ t_0 and continue to decrease into the negative range, leading to the following solutions:

x(t) = (t_0 − t)²/4    (t < t_0 = 2√x_0),
x(t) = 0               (t_0 ≤ t ≤ t_1),
x(t) = −(t − t_1)²/4   (t > t_1 ≥ t_0).    (1.44)
Here, t_1 is an arbitrary number (not smaller than t_0). Hence the solution of (1.42) loses its uniqueness when it touches the singularity (see Fig. 1.10). If the solution of an initial-value problem of an evolution equation is not unique, the motion is unpredictable, because there are multiple possible states, in a future time, which originate from a common initial state.23 For “regular” dynamics, however, we can verify the uniqueness of the solution of the initial-value problem (for the mathematical theory of the existence and uniqueness of the solutions of initial-value problems, see Note 1.2).
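The finite-time extinction predicted by (1.43) is easy to observe numerically; the sketch below (Python with NumPy; the step size is our choice) integrates (1.42) by the forward-Euler method until the solution reaches zero. Past t_0, a numerical scheme has no way to select among the continuations (1.44)—this is the loss of uniqueness.

    import numpy as np

    # Forward-Euler integration of dx/dt = -sqrt(|x|), cf. (1.42).
    x0 = 1.0
    t0 = 2.0 * np.sqrt(x0)        # extinction time predicted by (1.43)

    dt, x, t = 1e-4, x0, 0.0
    while x > 0.0:
        x -= dt * np.sqrt(abs(x))
        t += dt
    print("numerical extinction near t =", t, "; prediction t0 =", t0)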
1.4.4 Nonlinearity Escaping from Linearity—Criticality

There is another possibility of abrupt nonlinearity, in addition to the aforementioned singularity (irregularity), that prevents any linear approximation. It is the case where the law is regular (Taylor-expandable) but its proportionality coefficient (the first derivative) vanishes. The point x_0 where the first-order term of the Taylor expansion vanishes is called a critical point. In the neighborhood of a critical point, we may first approximate y ≈ f(x_0) and consider that y is independent of x. However, if we increase the accuracy of the observation, we find a nonlinear (higher-order) relation

y − f(x_0) = (f^{(2)}(x_0)/2)(x − x_0)² + · · · .
Here, what we mean by “increasing the accuracy” is, in mathematical terms, to magnify the variation of y by multiplying a factor δ^{−1} (|δ| ≪ 1), i.e., it is a transformation y̆ = [y − f(x_0)]/δ. This δ is the scale that we choose to observe the variation of y. When we choose δ such that |f^{(2)}/δ| ≈ 1, we can see the nonlinear variation of y in the range of |x − x_0| < 1. If we apply the same transformation to a linear relation (or a general relation in which the linear approximation does not degenerate), it changes the magnitude of
23 The present meaning of “unpredictability” differs from the “difficulty of prediction” which we shall discuss in the theory of chaos (see Sect. 3.1).
the proportionality coefficient, but the basic nature of the relation is conserved. At the critical point, however, the scale transformation causes a metamorphosis of the local profile of the relation. A critical point is the place where the “tendency” changes—increasing turns to decreasing, or vice versa. In a higher-dimensional space of variables, the notion of criticality becomes richer. Let x ∈ X = R^n be the independent variables, and f(x) be a smooth function whose range is Y = R.24 We denote by G the graph of f, which is a hypersurface immersed in Z = X × Y = R^{n+1}. Let G′ be the tangential plane with respect to G contacting at (x, y) = (x_0, f(x_0)). If G′ is parallel to all basis vectors e_j ∈ X, we call x_0 the critical point of f(x). The corresponding f(x_0) is called the “stationary value.” The maximum value or minimum value (both are commonly called extremal values) is a stationary value, but they are not the only ones (see Fig. 1.11). For a more general map f(x) whose range Y is a multi-dimensional set, the criticality of each component of the map can be similarly defined. The case of X = Y = R^n is most important; the map f : R^n → R^n can be viewed as the coordinate transformation. The n × n matrix function J with elements defined by ∂f_j/∂x_k is called Jacobi’s matrix. In the neighborhood of x_0, the coordinate transformation can be approximated by the linear transformation given by the matrix J|_{x=x_0}. We denote by D(f_1, · · · , f_n)/D(x_1, · · · , x_n) the determinant of Jacobi’s matrix and call it the Jacobian. The point where the Jacobian becomes zero is the critical point, where the coordinate transformation is degenerate (not bijective). Critical points seem to be “special.” But, interestingly, nature (or society) tends to generate them spontaneously.
Fig. 1.11 Two types of critical points (left: minimum point, right: stationary point). The tangent plane (linear approximation) degenerates at critical points
24 We say that a function is “smooth” if it has continuous derivatives of any order (C^∞-class). Regularity (which means that the convergence radius of the Taylor expansion is greater than 0) is a stronger condition. To define a critical point, we need only that the function has a continuous first derivative (C^1-class).
An example is a dune in an hourglass. When sands pile up and the inclination of the dune steepens, the slope becomes unstable and a small collapse occurs; the collapse flattens the slope a little bit, resulting in self-stabilization; then sands start piling up again. The repetition of these processes causes oscillations between the stable phase and the unstable phase; the “boundary” is the critical point where the two opposite tendencies (stable/unstable) switch. There is a fundamental difference between the oscillation of a pendulum and the above-mentioned oscillation around the critical point. The former is a motion around a stable equilibrium (the stationary state without motion). But, in the latter, the critical point is not a stable equilibrium; it is a “marginal” point. The oscillations are repetitive (but irregular) changes of the structure caused by the competition of the external drive (supply of sands) and the spontaneous collapse. The collapse has a self-stabilizing effect—this decelerating-type nonlinearity plays an essential role in the self-organization of the criticality. Because linearity loses power in the neighborhood of a critical point, a variety of interesting nonlinear phenomena can stem from there. In the next subsection, we shall explain how a criticality produces an essential difference compared with the linearity-dominated world.
1.4.5 Bifurcation (Polyvalency) and Discontinuity

Coexistence of multiple solutions is one of the most important issues that only nonlinear theory can fully describe. The change of the number of solutions (so-called branches) is called bifurcation. As we shall see, bifurcation can occur at a critical point. The solution of a linear equation is either “unique” or “indeterminate,” if it exists (“nonexistence” is the other possibility). Here, “indeterminate” means that the equation has an (uncountably) infinite number of solutions. Therefore, it is impossible for a linear equation to have a finite number of multiple solutions. First, let us confirm this fact. Let f be a linear map whose domain X and range Y are subspaces of real (or complex) linear spaces. For a given y ∈ Y, we solve a linear equation

y = f(x)    (1.45)
to find x ∈ X. Suppose that this equation has two different solutions, i.e., for a single y, there exist two different elements x_1 and x_2 such that y = f(x_1) and y = f(x_2). Subtracting both sides of these equations, and using the linearity of f, we obtain 0 = f(x_1) − f(x_2) = f(x_1 − x_2). Choosing an arbitrary real (or complex) number α, we define x_α = αx_1 + (1 − α)x_2.
By our assumption x_1 ≠ x_2, x_α ≠ x_1 as long as α ≠ 1, and x_α ≠ x_2 as long as α ≠ 0. By the linearity of f, we obtain f(x_α) = f(x_2) + α f(x_1 − x_2) = f(x_2) = y, which implies that x_α (for any α) solves (1.45). Therefore, if a linear equation has two different solutions, it has an (uncountably) infinite number of solutions, i.e., the equation is indeterminate. The “indeterminate” situation for a linear equation (1.45) means that the value of the map f(x) is “stationary” at y. For a linear map f(x) to be stationary, there must be a certain linear subspace of X where f(x) takes the same value y. In contrast, a nonlinear equation may have a finite number of multiple solutions. Now we consider an equation of the form of (1.45) with a nonlinear map f : R → R. The intersection of the graph of f(x) with the horizontal line at y gives the solution (see Fig. 1.12). When the graph has overlap in the horizontal direction, multiple branches appear. A “pleated graph” such as Fig. 1.12 appears, for example, in a model of a nonlinear electric circuit. Let x be the electric current flowing through a certain device. Suppose that a voltage y is applied to this device. Relating x and y as y = Rx, we can define the impedance (resistance) R. If R changes as a function of x, “Ohm’s Law” becomes nonlinear. Let us assume

R(x) = a(x − c)² + b,    (1.46)
where a, b, and c are positive constants. As x increases, R(x) decreases in the range of x < c, but it increases in the range of x > c. Such a phenomenon occurs when two different mechanisms compete to determine the electric conduction in a device. When a voltage y is given (i.e., if the device is connected to a constant-voltage power supply), the current x is determined by solving y = [a(x − c)² + b]x for x.
Fig. 1.12 A pleated graph and bifurcated solutions. When we solve an equation y = f(x) for x, the number of solutions changes depending on the value of y; in the range (1) there is one solution and in (3) there are three solutions. The change of the number of solutions occurs at y where f(x) is stationary, i.e., the critical points of f(x) are the bifurcation points of the branches
For some choices of parameters a, b, c, and y, there may exist three branches of solutions (however, the intermediate-value solution is unstable; see Problem 1.5). From Fig. 1.12, it is clear that the overlap (in the horizontal direction) of the pleats of the graph starts (or finishes) at the critical point. As we discussed in the previous subsection, a regular function f(x) is approximated by a linear function in the neighborhood of an “ordinary” (non-critical) point. The solution of the corresponding linear equation (which is uniquely solvable) approximates the solution of the exact nonlinear equation in the neighborhood of the ordinary point. Therefore, the bifurcation point (where multiple solutions exist in the vicinity) must not be an ordinary point; hence it must be a critical point (where the linearized equation becomes indeterminate). When we vary y as a parameter, each solution x of the equation y = f(x) shifts on the corresponding branch. Normally, each x responds “continuously” to the small variation of y. However, the change in x becomes violent when it approaches the critical point; the branch may disappear, in which case the solution must then jump to another branch; see Fig. 1.12. Such a discontinuous change of the state is often compared with a “catastrophe” that occurs unexpectedly. We normally construct an understanding of a phenomenon as a continuous extension of infinitesimal changes; we usually do not have a grand perspective of bifurcated branches. If the graph were discontinuous, abrupt change of the solution could be expected. However, the foregoing example shows that the solution can jump even if the graph is smooth. The dangerous point is where linear theory becomes invalid, that is, the critical point. Global understanding of the parameter space is the ultimate (but often difficult) goal of nonlinear science. Competing factors often generate critical points. In a system where multiple factors are interacting, the graph of the governing law is a complex manifold immersed in a high-dimensional space, which is accompanied by various deformations, plenty of pleats, and multiple overlaps. Such a complicated manifold gives a geometric representation of a complex system. For example, the system “earth”—being an extremely high-dimensional manifold connecting materials in many phases, various types of energies, and many species creating ecological chains—should have many branches of possible states. Actuality is understood as a diachrony (a historical path that passed through many critical points) on the manifold.
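The branch structure of the nonlinear “Ohm’s law” can be explored by solving the cubic y = [a(x − c)² + b]x numerically; a small Python sketch (NumPy assumed; the parameter values are ours, chosen so that a pleat exists):

    import numpy as np

    # Branches of y = [a(x - c)^2 + b] x, with R(x) as in (1.46).
    a, b, c = 1.0, 0.2, 1.0

    for y in (0.1, 0.2, 0.5):
        # Roots of a x^3 - 2ac x^2 + (a c^2 + b) x - y = 0.
        roots = np.roots([a, -2.0 * a * c, a * c**2 + b, -y])
        real = roots[np.abs(roots.imag) < 1e-9].real
        print("y =", y, "->", len(real), "branch(es):", np.sort(real))

    # Output: one branch for y = 0.1 and y = 0.5, but three for y = 0.2,
    # which lies between the two stationary values of f(x).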
Notes

Note 1.1 (Function Space)

The study of mathematical science starts by measuring and parameterizing an object. The first question we encounter is how many parameters (variables) we need to describe the object, that is, in order to identify an object as a vector, what the dimension of the vector is.
For example, classical mechanics describes the mechanical “state” of a particle by a six-dimensional vector consisting of the position and momentum (each of them is a three-dimensional vector). In the view of quantum mechanics, however, the state of a particle is represented by a “wave function.” Quantum mechanics was not the first to study an object to be described by a function. In many different areas of physics, functions (or fields in physics terminology) are the “objects” of analysis—for example, water surface waves, electromagnetic fields, distributions of temperature, density, pressure, and so on. Also when we study an ecosystem, for instance, we may have to analyze not only total populations but also distributions of various species, which are functions of space–time. In intuitive understanding, a function can have an infinite variety of shapes, and hence, its dimension (degree of freedom) must be infinite. That is, an object that is represented by a function is identified with an “infinite-dimensional vector.” A theoretical justification for this identification is given by functional analysis. To consider a certain set of functions as a “function space,” we have to endow it with the basic structure of linear spaces. Let us recall how we defined linear spaces of vectors in Sect. 1.3.2. First we have to introduce the law of vector composition; see axiom (1.10). The sum and scalar multiple of continuous functions on a domain Ω ⊆ R^n, for example, can be defined by, for every point x ∈ Ω,

(f + g)(x) = f(x) + g(x),    (αf)(x) = α f(x).    (1.47)
Sometimes, however, we have to deal with a more general class of functions that are not necessarily defined everywhere (for example, so-called measurable functions are defined only “almost everywhere”). In such a case, the foregoing “point-wise” definition of vector composition does not apply. To consider such general functions, we have to introduce a “criterion” by which we can measure “differences” of functions, and under this criterion, we examine equality and evaluate the computations. For continuous functions, we can observe differences of functions at every point in the domain, as has already been said. Another method of measurement is to observe “integrals” of differences. For example, let f(x) and g(x) be functions defined on an interval (a, b) ⊂ R. We measure their difference by

‖f − g‖_p = ( ∫_a^b |f(x) − g(x)|^p dx )^{1/p},    (1.48)

where 1 ≤ p < ∞ (dx denotes the Lebesgue measure). This quantity can be evaluated for functions that are not necessarily defined everywhere in (a, b), while we have to assume that functions are measurable and integrable on (a, b). We denote by L^p(a, b) the function space in which the differences of elements are measured by the integral (1.48), and call it the Lebesgue space. A criterion to define differences is called a topology (some examples of different topologies will be given in Note 4.1). When we introduce the law of vector composition (1.47) to a set of functions, we need, simultaneously, a “criterion” to examine
the equality of both sides, i.e., we must define a topology, too. To emphasize this point, a function space (or, more generally, a linear space of abstract vectors) is often called a “topological linear (or vector) space.” The measure of differences such as (1.48) is called a norm. A general norm ‖·‖ is a map from a linear space X to R such that, for every x ∈ X and α ∈ K,

(1) ‖x‖ ≥ 0, and ‖x‖ = 0 is equivalent to x = 0,
(2) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangular inequality),
(3) ‖αx‖ = |α| · ‖x‖.

A function space that is endowed with a norm is called a Banach space. To be precise, we also demand completeness for a Banach space, so that every Cauchy sequence (convergence test is evaluated by the given norm) has a convergent point in the space. The completeness helps us develop analytical theory by invoking “limits” (for example, constructing a sequence of functions that are expected to converge to a solution to an equation). Parameterization of a function—representation of a function by “components”—may reveal an explicit image of the function as a “vector.” In Sect. 1.3.2, we introduced the notion of inner product to parameterize (decompose) a vector; we shall take the same route starting by defining an appropriate inner product of functions. A Banach space whose norm is induced by an inner product is called a Hilbert space. For example, for (complex-valued) functions f(x) and g(x) which are defined on (−π, π), we define an inner product:

(f, g) = ∫_{−π}^{π} f(x) \overline{g(x)} dx.    (1.49)
We have the relation ‖f‖_2 = (f, f)^{1/2} which is similar to the relation of Euclidean spaces. We denote by L^2(−π, π) the Hilbert space endowed with the foregoing inner product (f, g) and the norm ‖f‖_2. Generalizing these relations, we may consider functions f(x) and g(x) on Ω ⊆ R^n and define an inner product:

(f, g) = ∫_Ω f(x) \overline{g(x)} dx,    (1.50)

and a norm ‖f‖_2 = (f, f)^{1/2}. The corresponding Hilbert space is denoted by L^2(Ω). A general inner product ⟨·, ·⟩ is a bilinear map from X × X (X is a linear space over K) to K such that, for arbitrary x, x_1, x_2, y ∈ X and α_1, α_2 ∈ K,

(1) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 is equivalent to x = 0,
(2) ⟨x, y⟩ = \overline{⟨y, x⟩},
(3) ⟨α_1 x_1 + α_2 x_2, y⟩ = α_1⟨x_1, y⟩ + α_2⟨x_2, y⟩.
In a Hilbert space, the notion of “components” of a function makes sense. The Fourier expansion gives one possible parameterization on the basis that is the total system of sinusoidal functions: Let Z be the totality of integers, and ϕ_k(x) = e^{ikx}/√(2π) (k ∈ Z). The Fourier expansion of a function f(x) ∈ L^2(−π, π) is

f(x) = Σ_{k=−∞}^{+∞} a_k ϕ_k(x).    (1.51)
The exact meaning of (1.51) is that there exists a sequence {a_j; j ∈ Z} such that

lim_{n,m→∞} ‖f(x) − Σ_{k=−n}^{m} a_k ϕ_k(x)‖_2 = 0.    (1.52)
Here the coefficients a_k (k ∈ Z) are given by calculating the inner product:

a_k = (f, ϕ_k) = ∫_{−π}^{π} f(x) \overline{ϕ_k(x)} dx    (k ∈ Z).    (1.53)
We may regard {ϕ_k(x); k ∈ Z} as a set of basis vectors. Then (1.51) gives the representation of an “infinite-dimensional vector” f(x) by its components (cf. (1.16)). Evidently, this basis is an orthonormal system, i.e., (ϕ_j, ϕ_k) = δ_{jk}. Moreover, by Parseval’s equality, we have

‖f‖_2² = Σ_{k=−∞}^{+∞} |(f, ϕ_k)|².
Therefore this basis is complete to span the function space L^2(−π, π). It is worthwhile noting that the Hilbert space L^2(−π, π) is of countably infinite dimensions. For systematic study of the theory of function spaces, the reader is referred to the textbooks [8, 9, 15].
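As a numerical illustration (Python with NumPy; f(x) = x is our test function), the coefficients (1.53) can be computed by quadrature and Parseval’s equality checked on a truncated sum:

    import numpy as np

    # Fourier coefficients a_k = (f, phi_k), phi_k(x) = e^{ikx}/sqrt(2 pi),
    # computed by quadrature on (-pi, pi) for the test function f(x) = x.
    xs = np.linspace(-np.pi, np.pi, 20001)
    f = xs

    def coeff(k):
        phi = np.exp(1j * k * xs) / np.sqrt(2.0 * np.pi)
        return np.trapz(f * np.conj(phi), xs)      # (f, phi_k), cf. (1.53)

    a = np.array([coeff(k) for k in range(-200, 201)])

    # Parseval: ||f||_2^2 = sum_k |a_k|^2 (= 2 pi^3 / 3 for f(x) = x);
    # the truncated sum approaches it from below.
    print(np.trapz(f * f, xs), np.sum(np.abs(a) ** 2))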
Note 1.2 (Initial-Value Problem of Ordinary Differential Equation)

The purpose of an initial-value problem is to find a solution of an ordinary differential equation (ODE) satisfying a given initial condition. Let t ∈ R (meaning time) be an independent variable (t = 0 being the “initial” time), x ∈ C^n be a dependent variable, and ϕ(x, t) be a map from C^n × R to C^n. We consider an ODE governing the evolution of x(t):

dx/dt = ϕ(x, t).    (1.54)
If ϕ(x, t) is a continuous bounded function of (x, t), the initial-value problem of (1.54), with an initial condition x(0) = x_0, has solutions (Cauchy–Peano’s existence theorem). However, there is no guarantee for uniqueness of the solution; in fact, we have seen an example of bifurcated solutions in Fig. 1.10. To guarantee the uniqueness of a solution, we need a stronger condition on the map ϕ(x, t). Let D ⊆ C^n × R. If there exists a finite number L such that

|ϕ(x_1, t) − ϕ(x_2, t)| ≤ L|x_1 − x_2|    (∀(x_1, t), (x_2, t) ∈ D),    (1.55)
we say that ϕ(x, t) is Lipschitz continuous in D, and there, the solution of the initial-value problem of (1.54) is unique (the Cauchy–Lipschitz uniqueness theorem); cf. Note 1.3-(2). A linear map defined on the whole C^n is always Lipschitz continuous. However, a nonlinear map may not even be continuous. The map (1.41) is an example of a continuous but not Lipschitz continuous function. For the basic theory of ordinary differential equations, the reader is referred to Coddington & Levinson [3].
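The failure of the Lipschitz condition for the map (1.41) can be seen directly: near x = 0 the difference quotients of |x|^{1/2} grow without bound, so no finite L satisfies (1.55). A short check in Python (NumPy assumed):

    import numpy as np

    # Difference quotients of f(x) = |x|^{1/2} at the singularity x = 0:
    # they equal x^{-1/2} and diverge, so f is not Lipschitz continuous there.
    f = lambda x: np.sqrt(np.abs(x))
    for x1 in (1e-2, 1e-4, 1e-6):
        print(x1, abs(f(x1) - f(0.0)) / x1)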
Note 1.3 (Categorization of Nonlinearity)

This note gives a bird’s-eye view of the mathematical methods that apply in the realm of nonlinearity. We delineate the basic concepts considering graphs on two-dimensional space; they can be generalized to higher (even infinite) dimensions (as the proportionality relation is generalized to infinite-dimensional spaces). Nonlinearity is represented by a distorted graph. Of course, we may not develop a meaningful theory for “arbitrary” graphs; so we consider some categories of graphs and devise effective methods of analysis utilizing the specific properties characterizing the category (the smallest category is defined by linearity). Here we start from a rather general category and add stronger conditions to define smaller categories of nonlinearity.

(1) Continuity. First we consider a general continuous map and see what we can deduce only from continuity. Let f(x) be a continuous map from a closed interval [a, b] to R. For a given number ŷ ∈ R, we consider an equation ŷ = f(x) and try to solve it for x. We put

α = max[f(a), f(b)],    β = min[f(a), f(b)].
If β ≤ ŷ ≤ α, the equation ŷ = f(x) has at least one solution (see Fig. 1.13). This fact implies (regardless of the shape of the graph) that the boundary values of the graph teach the solvability of the equation. However, what we can conclude is weakened when the “distortion” of the graph becomes stronger: if the monotonicity is broken (i.e., critical points are created; see Sect. 1.4.4), the uniqueness of the solution is lost, and if max f(x) or min f(x) is achieved inside the domain, the
Fig. 1.13 The degree of a map: by the deformation of the map f(x), the number of the solutions to ŷ = f(x) changes. Let us give each intersection the number sgn f′. Then their sum—the degree of the map f(x) with respect to the value ŷ—is invariant against the continuous deformation of f(x) if the boundary values are fixed
foregoing condition is only a sufficient condition but not a necessary condition. Then a question arises: What is invariant against the continuous deformation of f(x) if the boundary values are fixed? Although the number of the solutions to ŷ = f(x) changes, the following “topological number” is invariant (and is called a homotopy invariant). The solution is found as the intersection of the graph of y = f(x) and the line of y = ŷ (see Fig. 1.13). If the graph of y = f(x) has “pleats” (cf. Sect. 1.4.4), the number of intersections changes. We give each intersection a number, either +1 or −1, where the sign is selected depending on whether f(x) crosses ŷ from below (+1) or from above (−1). Formally, we define as follows: First we assume that f(x) is a smooth function. Suppose that x_1, · · · , x_m are the solutions of ŷ = f(x). We define

deg(f, ŷ) = Σ_{j=1}^{m} sgn f′(x_j).    (1.56)
If there is no solution, we put deg(f, ŷ) = 0. If ŷ is a critical value of f(x), then we define deg(f, ŷ) = lim_{y′→ŷ} deg(f, y′). Given a general continuous (not necessarily smooth) function f(x), we consider a sequence f_1(x), f_2(x), · · · of smooth functions and define deg(f, ŷ) = lim_{k→∞} deg(f_k, ŷ). This deg(f, ŷ) is called the degree of the map f(x). It is obvious that the degree is invariant against continuous changes of f(x), as far as the boundary values f(a) and f(b) are fixed. The definition (1.56) is generalized for a multi-dimensional map f : R^n → R^n with replacing the differential coefficient f′ by the Jacobian of f. Moreover, it can be generalized for a certain class of maps (such that f = I − K, where K is a compact operator and I is the identity) in an infinite-dimensional space (we consider a finite-dimensional approximation of f and show that the approximation converges to a unique limit). The generalized degree is shown to be homotopy invariant [12–14].
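For smooth maps of one variable, the definition (1.56) can be implemented directly: locate the crossings of f with the level ŷ on a fine grid and sum the signs of f′ there. A Python sketch (NumPy assumed; the grid resolution and test function are our choices):

    import numpy as np

    def degree(f, df, a, b, y_hat, num=100000):
        """Degree (1.56) of a smooth map f on [a, b] with respect to y_hat."""
        x = np.linspace(a, b, num)
        g = f(x) - y_hat
        crossings = np.nonzero(g[:-1] * g[1:] < 0.0)[0]
        return sum(int(np.sign(df(0.5 * (x[i] + x[i + 1]))))
                   for i in crossings)

    # A "pleated" cubic: three intersections with signs +1, -1, +1
    # still give degree 1, just as a single crossing would.
    f = lambda x: x**3 - x
    df = lambda x: 3.0 * x**2 - 1.0
    print(degree(f, df, -2.0, 2.0, 0.0))     # 1  (solutions x = -1, 0, 1)
    print(degree(f, df, -2.0, 2.0, 10.0))    # 0  (no solution in [-2, 2])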
We may recast the foregoing problem ŷ = f(x) (here we put ŷ = 0 for simplicity) into the problem of finding the fixed point of the map g(x) = x − f(x), i.e., finding x such that g(x) = x. Let g(x) be a continuous function defined on Ω = [a, b]. We denote its range by g(Ω). If g(Ω) ⊆ Ω, g has at least one fixed point in Ω. In fact, if g(Ω) ⊆ Ω, f(a) ≤ 0 ≤ f(b), thus deg(f, 0) = 1, which implies that the graph of y = f(x) intersects with the line of y = 0 (equivalently, the graph of y = g(x) intersects with the line of y = x, thus a fixed point exists). Generalizing this argument to a multiple-dimensional space, we can show that a continuous map g(x) on a bounded closed convex set Ω ⊂ R^n with its range g(Ω) ⊆ Ω has a fixed point in Ω (Brouwer’s fixed-point theorem). We can extend g(x) further to compact operators in an infinite-dimensional space (Schauder’s fixed-point theorem). These theorems are applied to show the existence of solutions for some nonlinear equations [12–14].

(2) Lipschitz continuity and contraction property. Lipschitz continuity

|f(x_1) − f(x_2)| ≤ L|x_1 − x_2|    (1.57)
is a stronger condition than continuity. If L 1, the map f (x) is said to be contractive, and if L < 1 it is said to be strictly contractive. For a Lipschitz continuous map, the method of “successive approximation” is often effective. For example, the solution to the initial-value problem of an ODE d x = f (x), dt
x(0) = xˆ
(1.58)
t can be constructed by first rewriting it as an integral equation x(t) = xˆ + 0 f (x(s))ds and then successively approximating it as x0 (t) = xˆ , x1 (t) = xˆ +
t
t
f (x0 (s))ds, x2 (t) = xˆ +
0
f (x1 (s))ds, · · · .
0
By the Lipschitz continuity of f (x), we can show that, for every finite t, x j (t) converges to the unique solution of (1.58). This method—Picard’s successive approximation—applies also to higher (even infinite) dimensional ODEs [3, 15]. The fixed point of a strictly contractive map is unique, which can be obtained by successive approximation. Let us assume that f (x) is strictly contractive on a closed subset Ω and f (Ω) ⊂ Ω. Choosing an arbitrary x0 ∈ Ω, we put x1 = f (x0 ), x2 = f (x1 ), · · · . Because |x j+1 − x j | = | f (x j ) − f (x j−1 )| L|x j − x j−1 |, we find |x0 | +
∞ j=0
|x j+1 − x j | |x0 | + |x1 − x0 |
∞
L j.
j=0
Since L < 1, the right-hand side is finite. Thus x0 + ∞ j=0 (x j+1 − x j ) = lim j→∞ x j is an absolutely convergent series. We denote the limit by x∞ . By the continuity, we find x∞ = f (x∞ ). The uniqueness of the fixed point is shown as follows: suppose that x and x are two fixed points. Then, |x − x | = | f (x) − f (x )| L|x − x |.
40
1 What Is NONLINEAR? y x′
f (x) y′
π/4 x
Fig. 1.14 The graph of a monotone function. Gaps (discontinuities) of the graph should be filled to connect the graph. By titling the axes, it can be viewed as the graph of a Lipschitz continuous (contracting) map
Hence, (1 − L)|x − x | 0. Because L < 1, |x − x | must be zero. Evidently, the foregoing arguments apply in higher (even infinite) dimensional spaces.25 (3) Monotonicity. For the category of monotone functions, many of the skills of linear theory apply with appropriate generalizations. To put it another way, through the study of such generalizations, we may elucidate the deep structures that the linear theory describes only in their superficial simplicity. If a continuous function f (x) is monotone on its domain, we can always define its unique inverse f −1 (y). If f (x) has discontinuities, we fill the gaps to connect the graph (see Fig. 1.14). Then we can define the unique inverse. By “titling” the axes, the graph of a monotone function can be viewed as the graph of a Lipschitz continuous (contraction) map (see Fig. 1.14). This is formally the following transformation: To tilt the graph by π/4, we put 1 x = √ (x + y), 2
1 y = √ (x − y) 2
(y ∈ f (x)).
Let us formulate the expression of y = f (x ). For the convenience of notation, we write f (x) = Ax (A is a nonlinear operator). By the definition, x + f (x) = (I + A)x =
√ 2x .
√ Solving this equation for x, we obtain x = (I +A)−1 2x . The map f : x → y is √ 1 y = √ (I − A)(I + A)−1 2 x 2 The essential part A(I + A)−1 is called the Yosida approximation of A, that is a Lipschitz continuous monotone map [15].
25
In a Banach space (see Note 4.1), it is shown that an absolutely convergent series (measured by a certain norm) converges to an element in the space [15].
Solutions
41
In a Hilbert space, an operator A is said to be monotone if (y1 − y2 , x1 − x2 ) 0 (∀y j ∈ Ax j , x j ∈ D(A); j = 1, 2),
(1.59)
where D(A) is the domain of the operator A. If A has discontinuities, we have to “maximize” the graph with filling the gaps. Then we can show that A(I + A)−1 is a a Lipschitz continuous monotone operator, thus the aforementioned methods apply (see Brezis [2]).
Problems 1.1. Suppose that a power series R −1 = lim supn→∞ |an |1/n .
an x n has a finite radius R of convergence, where
1. Show that there is a finite number r such that supn |an r n | < 1. 2. Show that the radius of convergence of the power series (1.4) is greater than or equal to 1. 3. Show that the radius of convergence of (1.6) is 1 and that the power series actually diverges when |ˇν | = 1. 1.2. Show that the set of variables {x, f (x)} ∈ X × Y (X and Y are linear spaces) defined by a linear map f : X → Y is a linear subspace of X × Y , i.e., show that the condition (1.10) is satisfied. 1.3. Prove that the power series (1.32) has an infinite radius of convergence. Show that relations (1.33) and (1.34) hold. 1.4. Prove that the Laplacian with the Dirichlet boundary condition (u = 0 on the boundary) is a linear operator. 1.5. For a given set of three parameters a, b, and c of the nonlinear impedance (1.46) and a given voltage y, solve y = [a(x − c)2 + b]x for x. Give the range of y where three different solutions exist. Show that the intermediate-branch solution is unstable.
Solutions 1.1 1. For every ε > 0, there exists a sufficiently large N , and we can assume |an |1/n < R −1 + ε for n > N . Hence, with a sufficiently large E, we can set |an |1/n < R −1 + E for all n. Then, for r < R/(1 + E), we have |an |1/n r < 1. 2. Because |aˇ n | < 1, we find, for |xˇ | < 1, n j=1
|aˇ j xˇ n | <
|xˇ | , 1 − |xˇ |
42
1 What Is NONLINEAR?
implying that the left-hand side is a monotone-increasing bounded sequence with respect to n. Therefore, the absolute convergence of (1.4) is proven for |xˇ | < 1. 3. The odd-degree terms are 0. For even-degree terms, we have aˇ n = aˇ 2ν =
(ν − 1/2)(ν − 3/2) · · · (1/2) Γ(ν + 1/2) (ν = 1, 2, · · · ). = √ ν! π ν!
Hence, log aˇ n1/n =
1 2ν
ν − 1/2 ν − 3/2 1/2 log + log + · · · + log . ν ν−1 1 1/n
which yields lim supn→∞ log aˇ n = 0. Evidently, the series (1.6) diverges at |ˇv | = 1, which implies that the speed of a matter of finite mass cannot reach c. 1.2 Only condition (a) is essential; other conditions are naturally induced, by (1.23), from the linearity of the spaces X and Y . Let us denote by G (⊂ X × Y ) the totality of the points z = {x, f (x)} (x ∈ X ). By (1.24), we find z + z = {x + x , f (x) + f (x )} = {x + x , f (x + x )} ∈ G, and for α ∈ K, αz = {αx, α f (x)} = {αx, f (αx)} ∈ G. Hence condition (a) is satisfied. j 1.3 The following relation is well known: Let a power series ∞ j=0 a j x have a , n) be the eigenvalues of a matrix A. If radius r of convergence. Let λ j ( j = 1, · · · j and only if max|λ j | < r , the power series ∞ j=0 a j A converges. The power series ∞ −1 j j=0 ( j!) x has an infinite radius of convergence, thus (1.32) converges for an arbitrary matrix. Therefore the exponential function is an analytic function of t. The relations (1.33) and (1.34) are evident if we observe then term-wise. 1.4 The linearity of the Laplacian Δ= nj=1 ∂ 2 /∂ x 2j is superficially obvious because of the linearity of the differentiations ∂/∂ x j ( j = 1, · · · , n). However, at a more rigorous level, the following problem must be carefully examined: To define an operator, we need to specify the domain on which the operator applies. For a “differential operator,” we need to specify the regularity of functions (to guarantee the differentiations in an appropriate sense), and moreover, we often provide a boundary condition. Without the latter, the operator has a kernel (non-zero u such that Δu = 0), thus we cannot consider the inverse of the operator (when we “solve” a differential equation including the operator). The regularity requirement to define a linear differential operator forces us to consider a smaller (but dense) subspace for its domain (like the Sobolev space; see Note 4.1). A boundary condition may introduce “inhomogeneity” to the operator. For instance, if we assume an inhomogeneous boundary condition u = g = 0 on the boundary Γ and try to define the domain of Δ in the set of functions V = {u; u = g on Γ}, then the domain is not a linear subspace (hence, such an operator is not a linear operator). When the homogeneous Dirichlet boundary condition is given, the domain is a linear subspace, and then, Δ is a linear operator. For the rigorous theory of boundary conditions for differential operators, see Lions & Magenes [9].
References
43
1.5 Suppose that the current x is changed by δx. For the intermediate-value solution, dy/d x < 0. Hence, if δx > 0 (< 0), δy < 0 (> 0). The constant voltage power supply drives a larger (smaller) current, implying instability. A finite reactance (inertia effect) in the circuit balances the δy and determines the time constant of the instability.
References 1. Bacon, F.: The Novum Organum (The New Organon) (1620); In: Rees, G., Wakely, M. (eds.) The Oxford Francis Bacon: The Instauratio Magna Part II: Novum Organum and Associated Texts, Oxford Univ. Press, New York (2004) 2. Brezis, H.: Op´erateurs maximaux monotones et semi-groupes de contractions dans les espace de Hilbert, North-Holland, Amsterdam (1973) 3. Coddington, E.A., Levinson, N.: Theory of Ordinary Differential Equations, McGraw-Hill, New York (1955) 4. Derrida, J.: L’´ecriture et la diff´erence, Seuil, Paris (1967); Writing and Difference (Trans. by Bass, A.) Routledge, London (1978) 5. Derrida, J.: Deconstruction Engaged – The Sydney Seminars, (Ed. by Patton, P., Smith, T.), Univ. Illinois Press, Champaign, IL (2001) 6. Descartes, R.: Discours de la m´ethode pour bien conduire sa raison, et chercher la verit´e dans les sciences (1637); Discourse on Method and Other Writings (Trans. by Sutcliffe, F.E.) Penguin, Harmondsworth (1968) 7. Lanford, O.E.: Computer picture of the Lorenz attractor (Appendix to Lecture VII), In: Bernard, P., Ratiu, T. (eds.) Turbulence Seminar, Springer Lecture Notes in Mathematics 615, pp. 113–116, Springer-Verlag, Berlin-Heidelberg (1977) 8. Lax, P.D.: Functional Analysis, John Wiley & Sons, New York (2002) 9. Lions, J.L., Magenes, E.: Non-Homogeneous Boundary Value Problems and Applications, Vol. I, Springer-Verlag, Berlin-Heidelberg (1972) 10. Lorenz, E.N.: Deterministic nonperiodic flow, J. Atmos. Sci. 20, 130–141 (1963) 11. May, R.M.: Simple mathematical models with very complicated dynamics, Nature 262, 459–467 (1976) 12. Milnor, J.: Topology from the Differentiable Viewpoint, University Press Virginia, Charlottesville, VA (1965) 13. Nirenberg, L.: Topics in Nonlinear Functional Analysis (Lecture Note), Courant Institute of Mathematical Sciences, New York (1974) 14. Schwartz, J.T.: Nonlinear Functional Analysis, Gordon Breach, New York (1969) 15. Yosida, K.: Functional Analysis (6th ed.), Springer-Verlag, Berlin-Heidelberg (1980)
Chapter 2
From Cosmos to Chaos
Abstract In a na¨ıve view, nature is an unpredictable complex system that exhibits diverse aspects. However, there might be an “order” beneath the superficial complexity of various phenomena. If so, how can we discover the order hidden under all the complexities? If we can find an order, we will be able to foresee various phenomena and utilize the power of nature. For a long time, such an expectation (or an optimistic speculation) has been giving incentives to explore science. The universe is called “cosmos.” This word, being synonymous with order, symbolizes our basic posture of seeing nature. The purpose of science is to read and understand the order that is written, in a logical language, in the grand book of nature. However, contemporary science sees the universe as “chaos,” which is the condition before order is imposed. In this chapter, we shall discuss why our view of nature has metamorphosed from order to chaos.
2.1 The Order of Nature—A Geometric View 2.1.1 Galileo’s Natural Philosophy As legend goes, Galileo Galilei observed a swinging lamp in a cathedral in Pisa, and discovered isochronism, the constancy of the period of oscillation, independent of the amplitude. The ultimate aim of science is to reveal the basic rule of nature or society, which might be hidden under the seemingly chaotic phenomena of the actual world. Galileo took a step forward in search for principles that rule the universe through the careful observation of a small swinging lamp. Generally, the search for order (or regularity) is not easy—it is concealed under complex surfaces. However, there was a great triumph in the history of science; the planets’ seemingly complicated orbits were untangled by a fundamental and universal principle of motion. A planet is an appreciably bright star that describes a peculiar curve drifting among other ordinary stars. However, this peculiar curve is subsumed in a universal system of ellipses, by allowing our view point to move as part of the universal system; this is Copernicus’ space system. The conversion from Ptolemaic to Copernican theory is, in mathematical language, a coordinate transformation. Modern physics or any other theory of motion Z. Yoshida, Nonlinear Science, DOI 10.1007/978-3-642-03406-0 2, C Springer-Verlag Berlin Heidelberg 2010
45
46
2 From Cosmos to Chaos
follows this idea: how can we transform a seemingly complicated motion of an object into a simple and ordered motion by changing the perspective? The word “motion” addresses not only the spatial movement of an object but also the general temporal evolution; i.e., diachronic events such as the change in electric current flowing in a circuit, the progress of chemical reactions, the generation of organisms, the evolution of species, the changes in ecosystems, the economic phenomena (like the volatility of stock prices or the circulation of resources, etc.), sociological movement, and so on. We measure such an event and parameterize it. The motion then is identified as the movement of a vector in a linear space (cf. Sect. 1.3.1), which is represented as an orbit; we can analyze it by mathematical methods. This framework of the natural philosophy, envisaged by Galileo (Fig. 2.1), teaches how to project nature to the mathematical domain. Coordinate transformation is the basic method devised therein.
Fig. 2.1 Galileo Galilei said “Philosophy is written in this grand book, the universe—It is written in the language of mathematics.” He placed natural science on the horizon of mathematics and developed theoretical framework of mathematical precision. However, his theoretical attitude was criticized by phenomenologists as a pursuit of “fiction” that identifies nature with mathematical artifacts (cf. Husserl [8]). Portrait by Justus Sustermans
2.1.2 Geometric Description of Events Our mathematical analysis of the order (or the regularity) of motion starts by representing our object of interest by a vector; this process is a reduction of the actual object in the real world (cf. Sect. 1.1). Here we encounter a na¨ıve but essential question: “What is the most appropriate vector space (linear space) where we may project our object of interest?” Or, in other words: “Which basis (the set of units with which we measure an object) should we choose?” If we change the basis (our
2.1
The Order of Nature—A Geometric View Γ1
Γ1
φ
Γ2
φ
47
Γ2
φ′ t
(a)
t
(b)
Fig. 2.2 Graphical representation of a harmonic oscillator; (a) the curves in the space of time t and angle φ. (b) The curves in the space of time t, angle φ, and angular velocity φ . The projection of the curves of figure (b) onto the t − φ space yields the curves of figure (a). The graphs of the two different motions Γ1 and Γ2 drawn in the t − φ space cannot be separated from one another
perspective of observation), the parameterization of the vector (the image of the object) also changes (cf. Sect. 1.3.1). Let us see a simple example. Figure 2.2a shows the relation between the oscillation angle φ and the time t of a harmonic oscillator. Two curves Γ1 and Γ2 describe the oscillations with two different amplitudes. We notice that these two curves intersect one another. Thus, this representation of oscillation is “incomplete”—at an intersection, we cannot determine which branch the motion should go. We then consider that there must be some other parameters that are involved in this phenomenon. In Fig. 2.2b, we extend the space by adding another coordinate for plotting the angular velocity φ = dφ/dt and draw two orbits of oscillations with different amplitudes. In this representation, different motions with different initial conditions do not intersect. Hence, we may conclude that this parameterization is “complete” for the description of the motion of a harmonic oscillator, i.e., a harmonic oscillator may be identified by a vector in the space spanned by φ and φ . The vector that is a mathematical equivalent of an object is called a state vector. The linear space of state vectors is called state space, whose dimension, the number of basis vectors, is the degree of freedom. The product space of the state space and the time coordinate is called space–time.1 Figure 2.2b plots two different motions in space–time. An orbit is the projection of a graph of motion in space–time onto the state space (see Fig. 2.3). If one tries to describe the firsthand observation of a pendulum’s swing in the Cartesian coordinates, it might be a rather involved task, and its order might not be
1 Conventionally, “space–time” is the four-dimensional space composed of the three-dimensional coordinates and the time coordinate. In this book, we extend the notion to mean the general product of state space and time.
48
2 From Cosmos to Chaos
φ Γ1 Γ2
ωt
φ′
Fig. 2.3 Orbits of harmonic oscillations in the state space spanned by φ (angle) and φ (angular velocity). These ellipses are the projections of the graphs Γ1 and Γ2 in space–time onto the state space. Each orbit may be written as (φ(t), φ (t)) = (φ0 sin (ωt), ωφ0 cos (ωt))
obvious. Galileo’s keen insight discovered the separability of the amplitude and the rate of change of the angle of oscillation; the angle changes regularly and its rate of change (angular frequency) is independent of the amplitude. The ellipse of Fig. 2.3 is the geometric representation of this order. In Sect. 2.4, this motion will be further simplified into the “ultimate representation of order.” If a system is autonomous, the same motion (evolution) must occur for the same initial condition, independent of the selection of the initial time, i.e., the motion is unchanged with respect to the transformation of the origin of time (t → t + c, for any constant c). The negation of autonomous means that the system maintains a connection with the exterior world (environment). With a change in the external conditions, the motion will be different even for the same initial condition, when the start time is shifted. Such a system is often called an open system. In the case of an autonomous system, orbits (curves described in state space) never intersect. In fact, if two orbits intersected, the motion starting from the intersection point would have two different ways of evolution. In contrast, if a system is not autonomous, orbits may intersect, because they may have different initial time. Therefore, the description of motion with an orbit falls short for an open system, and graphs in space–time should be invoked.
2.1.3 Universality Discovered by Newton Isaac Newton (1642–1727) introduced an extremely innovative method to analyze motion; his method of describing a law of motion is now applied in almost all fields of sciences that are concerned with “dynamics” (Fig. 2.4). The motion of an object, when described by an orbit, exhibits unlimited diversity and complexity. Therefore, we seek universality not in an orbit itself but rather in the “mechanism” that produces the orbit. Newton discovered it in the infinitesimal passage of time, inventing the notion of zero as the limit. Present-day mathematics denotes it by limt→0 , but he expressed it by “little o” in his private notation. He succeeded in justifying tricky calculation of “dividing zero by zero” and provided it with the meaning of rate of change. Using this method—the calculus—he was able
2.1
The Order of Nature—A Geometric View
49
Fig. 2.4 I. Newton created calculus, and using it, formulated a universal principle of motion—the equation of motion. Portrait by Sir Godfrey Kneller
to define velocity (rate of change of position) and acceleration (rate of change of velocity).2 Under the superficial diversity of various orbits, there is a “deep structure” that rules the generation of every motion: the equation of motion formulated by Newton means that, if forces acting on an object do not balance, they are equivalent to a virtual force (inertial force) which is the acceleration of the object multiplied by its mass. However, there is a big difference between formulating an equation of motion and knowing or predicting an actual motion (i.e., finding the solution of an equation of motion).3 An equation of motion describes a balanced relation that must hold at every infinitesimal time. But our original incentive is to understand or predict motion in a finite time (or long term), which is yet to be investigated. Generally, it is very difficult to “solve” an equation of motion and construct (theoretically) the actual motion (orbit).4
We have already seen, in Fig. 2.2, that we need the derivative φ = dφ/dt to construct a complete description of the motion of an oscillator.
2 3
An equation of motion may be called a synchronic structure that appears on the cross-section of space–time sliced at arbitrary time. On the contrary, an orbit describes a diachronic structure which is the property of motion observed in the history of the event.
4
According to science history, Leonhard Euler (1707–178) was the first to formulate the equation of motion as a differential equation. Newton was primarily interested in knowing what “force”
50
2 From Cosmos to Chaos
Let us start by examining the geometric meaning of solving an equation of motion or what is called an ordinary differential equation (ODE). We consider a first-order ODE governing an n-dimensional state vector x = t (x1 , · · · , xn ) ∈ Rn : d x = V (x, t), dt
(2.1)
which is called an n-dimensional dynamical system (n is called the degree of freedom; see Sect. 1.3.3). The right-hand side V (x, t) is an Rn -valued function of x and t. If V does not depend on t explicitly,5 then a transformation t → t + c does not change (2.1), which implies that the system is autonomous. In this case, only the “time difference” (the time measured from an arbitrary initial time) has meaning. If V is an explicit function of t, the system is not autonomous (i.e., it is open), implying that some “external force” that depends on time is driving the system. Figure 2.5 explains the geometric meaning of solving an initial-value problem of an equation of motion (ODE). Recall the definition of the derivative of an orbit {x(t); t ∈ R}: x(t + δ) − x(t) d x = lim , δ→0 dt δ which gives the tangent vector with respect to the graph. Therefore, (2.1) demands that the orbit must be a curve whose tangent vector coincides with the prescribed
x(t)
^ x
Fig. 2.5 An orbit defined by an equation of motion (ODE) parallels the streamline of a “flow” which is the vector field on the right-hand side of the ODE
is. Solving the equation of motion and analyzing the actuality of motion became the theme of mechanics theory after the pioneering researches of Euler. 5 Even if V = V (x) (i.e., V does not depend on t explicitly), the value of V may change with time by its implicit dependence on t through x(t).
2.1
The Order of Nature—A Geometric View
51
vector field V (x, t). We may associate the abstract vector field V (x, t) with the physical image of flow. The orbit {x(t)}, then, is a streamline of the flow, which is the trajectory of a virtual “particle” as moved by the flow. We shall often use the word streamline synonymously with orbit. A particular orbit is identified by its initial condition; setting an initial time t = t0 and an initial point x(t0 ) = xˆ , we solve (2.1) to find an orbit that starts from the initial point. The solution gives the state x(t) in a future time t > t0 . For an autonomous system, the origin of time may be arbitrary, so we usually set t0 = 0. The mathematical problem of finding an orbit that satisfies an ODE and an initial condition is called the initial-value problem of ODE.6 While we have understood the geometric meaning of the initial-value problem of first-order ODE (2.1), Newton’s equation of motion, governing the motion of a point mass particle, is a second-order ODE: m
d2 q = F, dt 2
(2.2)
where q(t) is the vector position (or the coordinates) of the particle, m is the mass (assumed to be a positive constant), and F is the force acting on the particle. Denoting q = dq/dt (which means the velocity of the particle), and assuming that q (t) is a new unknown vector, we may cast the second-order ODE (2.2) into a system of first-order ODEs: d q q = , (2.3) F/m q dt which constitutes a special class of dynamical systems; the state vector of a Newtonian dynamical system is a compound of the position vector q and the velocity vector q (recall that we had to invoke the angle φ and the angular velocity φ to describe the motion of a pendulum; cf. Fig. 2.2). Generally, when a vth order ODE dv q = F(q, dq/dt, · · · , d v−1 q/dt v−1 , t) dt v
(2.4)
is given, we set q (0) = q, q (1) =
dq d υ−1 q , · · · , q (v−1) = υ−1 dt dt
6 For the existence and uniqueness of the solution of an initial-value problem of ODE, see Note 1.2. The ODE (2.1) is temporally reversible, because the transformation t → −t changes only the sign of the vector field V → −V and, hence, we may solve (2.1) to find the “past” state x(t0 − s) (s 0). However, a certain kind of initial-value problem of partial differential equations, such as a heat diffusion equation (see Sect. 1.3.4), can be solved only for t t0 ; such dynamics is said to be an irreversible process.
52
2 From Cosmos to Chaos
and define a vector x = t (q (0) , · · · , q (v−1) ) of extended unknown variables. Then the higher-order ODE (2.4) may be rewritten as a (higher-dimensional) system of first-order ODEs: ⎞ ⎛ ⎞ q (1) q (0) d ⎜ . ⎟ ⎜ ⎟ .. ⎝ .. ⎠ = ⎝ ⎠. . dt (v−1) (0) (v−1) , t) q F(q , · · · , q ⎛
(2.5)
Therefore, the study of first-order ODE (2.1) suffices for the general theory of ODE of any order. After the great success of Newtonian mechanics, formulating an equation of motion—a balanced relation dominating an infinitesimal evolution—is now a basic method of studying any kind of dynamics. An equation of motion (more precisely, the flow V on the right-hand side) is the generator of motion, which is “universal” for every motion; the “particularity” of an actual motion is attributed to its initial condition. So, the syntax of dynamics theory relates universality to infinitesimal time and particularity to initial time. This “separation” dodges our original (and ultimate) interest in the actual phenomenon, that is the particularity of the individual experience or future state. Particularity is pushed to the periphery (or surface), and is called an “arbitrary initial condition,” while the equation of motion is a universal “machine” that can generate motion when an “arbitrary initial condition” is given. It is great, but we have yet to solve the equation of motion. And, we still do not know any general relation between the initial condition and the motion itself, which is indeed the “order” of our interest. In principle, we may solve the equation of motion for each initial condition. To “understand” motion, however, we have to reveal a general relation between the motion (orbit) and the initial condition (the root of the individuality). Unfortunately, nonlinearity makes it extremely difficult. We shall start by explaining more clearly what the order of motion is.
2.2 Function—The Mathematical Representation of Order 2.2.1 Motion and Function An equation of motion (mathematically, an ODE) is a device that produces a motion. A motion, in our mathematical perspective, is an orbit {x(t); t ∈ R} in state space X , which may be viewed as an X -valued function of time t. However, when we use the term “function,” we mean something more; it should be an expression of certain “order,” a systematic relation between t and x, or between x(0) (initial condition) and x(t) (t > 0). Using the example of a pendulum’s motion, let us see how a “function” is generated by an equation of motion, and how it expresses the “order” of the corresponding phenomenon. We consider a pendulum of mass m that hangs by a string of length L (see Fig. 2.6). We denote by φ the angle of oscillation (measured from the
2.2
Function—The Mathematical Representation of Order
53
φ L
v = Ldφ /dt F=mg
Fig. 2.6 Motion of a pendulum (a weight of mass m is hung by a string of length L)
perpendicular) and by g the gravitational acceleration. The acceleration of the weight in the tangential direction of the orbit is given by d 2 (Lφ)/dt 2 , while the force in the tangential direction is mg sin φ. Hence, Newton’s equation of motion, which requires that force be equal to the product of m and acceleration, reads as d2 φ = −ω2 sin φ dt 2
ω = g/L .
(2.6)
This ODE is called the pendulum equation. Providing initial values for the angle (φ(0)) and the angular velocity (φ (0)), we solve (2.6) to determine the motion of the pendulum. However, a harmonic oscillation that shows isochronism, as Galileo found, is not derived from this equation but rather a solution to the following “linear approximation” of (2.6). The function sin φ on the right-hand side of (2.6) can be expanded in Taylor’s series in the neighborhood of φ = 0 (the equilibrium point) as 1 5 1 φ + ··· . sin φ = φ + φ 3 + 6 120 For |φ| 1, we estimate |φ| |φ 3 | |φ 5 | · · · . Hence, in small-amplitude oscillations, we may keep only the first-order term: sin φ ≈ φ,
(2.7)
which is the “linear approximation” of the sine function (see Fig. 2.7). Every force given by a regular function may be approximated by a linear function in some small range of motion (cf. Sect. 1.4). So, it is a legitimate strategy to start the analysis from the linear approximation of the equation of motion (cf. Sect. 1.2.2).
54
2 From Cosmos to Chaos y 1 –π π
x
–1
Fig. 2.7 Linear approximation of a trigonometric function
Now we linear approximate (2.6) and consider d2 φ = −ω2 φ, dt 2
(2.8)
whose general solution may be written as φ(t) = φ0 cos (ωt + δ),
(2.9)
where a real constant φ0 represents the amplitude of oscillation, and the other real constant δ specifies the phase of oscillation. Both parameters are determined by the initial condition (it may be an easy exercise to relate the constants φ0 and δ to the initial values φ(0) and φ (0)).7
2.2.2 Nonlinear Regime As the amplitude of oscillation gets larger, the linear approximation (2.7) becomes improper, and we have to solve the nonlinear pendulum equation (2.6) exactly. In reaching this goal, however, the knowledge of elementary functions falls short; we have to invoke elliptic functions. Here we describe only the essential part of calculations and leave the details for Problem 2.1. Our purpose in this section is to see how the spirit of the theory is metamorphosed in the study of nonlinearity. Let us start with a one-dimensional Newton’s equation of motion such that m
d d2 x = − U (x), dt 2 dx
(2.10)
7 As mentioned in Sect. 1.3.4, the motion of a linear system is represented by an exponential law. The trigonometric function that we derived for the linear-approximated pendulum is the special form of an exponential function (eiθ = sin θ +i cos θ). The solution (2.9) can be rewritten, invoking a complex amplitude Φ0 = φ0 eiδ , in the form of φ(t) = Φ0 eiωt = (Φ0 eiωt + Φ0 eiωt )/2. In Sect. 2.1.3, we have shown that a higher-order ODE can be reduced to a system of first-order ODEs. If we follow this method, we should be able to generate the exponential function of a matrix operator (cf. Sect. 1.3.4). A more detailed discussion on the exponential function of matrices will be made in Sect. 2.3.
2.2
Function—The Mathematical Representation of Order
55
where the positive constant m is the mass of the object and the real-valued function U (x) is the potential energy that produces the force −dU (x)/d x. We may “integrate” the differential equation (2.10) by the following steps: First we multiply both sides of (2.10) by d x/dt (velocity). After some manipulations, we obtain
d dt
m 2
dx dt
2
+ U (x) = 0.
Integrating this on [0, t], we obtain m 2
dx dt
2 + U (x) = H
(a constant),
(2.11)
which implies conservation of energy; the first term on the left-hand side is the kinetic energy. Defining W (x; H ) = (2/m)[H − U (x)], where the parameter H represents the energy, we can rewrite (2.11) as a first-order ODE: d x = ± W (x; H ). dt
(2.12)
This is a so-called separable equation, which we can integrate formally as:
dx =± √ W (x; H )
dt.
(2.13)
Let us denote by f (x; H ) the function that is given by the integral of the left-hand side of (2.13). From the right-hand side, we obtain c ± t, where c is a constant of integration. Defining the inverse map of x → y = f (x; H ), we may write x = f −1 (y; H ). The abstract solution of the equation of motion (2.10) is written as x(t) = f −1 (c ± t; H ). Here, H and c are constants of integration that may be determined by the initial condition. We may consider the left-hand side of (2.13) as a “definition” of a function, embodying the preceding statement: “a function is generated by an equation of motion.” For some specific potential energies, we can calculate the concrete forms of the function f (x; H ) or f −1 (y; H ). When U (x) is a second-order polynomial, the force −dU/d x is a linear function of x, so (2.10) becomes a linear ODE, whose solution has been given in the preceding subsection. Needless to say, the solution constructed by integrating (2.13) gives the same result. If U (x) (thus W (x; H )) is a polynomial whose order is at most four, the function f (x; H ), defined by the left-hand side of (2.13), is an elliptic integral. The pendulum equation may be transformed into an equation of this class (see Problem 2.1). The inverse function f −1 (y; H ) with respect to the elliptic integral f (x; H ) is the elliptic function.
56
2 From Cosmos to Chaos
In the development of the theory of elliptic functions, Carl F. Gauss (1777–1855) made an epoch-making contribution by finding their double periodicity. Afterward, Niels H. Abel (1802–1829) and Carl G. Jacobi (1804–1851) strengthened the theory and established the present-day style. Formally, an elliptic function is a doubly periodic (periodic in two directions), rational function on the complex plane (a trigonometric function is a periodic function of only one real variable, having only one period).8 The nonlinear theory of pendulum generalizes and refines the linear theory of oscillations. In parallel to this relation, elliptic functions generalize the exponential functions. Our question then is, how far we can extend the theory by elliptic functions. Because elliptic functions are generated if the potential U (x) is at most a fourth-order polynomial function, we must admit that they are still extremely special.
2.2.3 Beyond the Functional Representation of Motion We have seen that a linear equation of motion generates an exponential function, and the nonlinear pendulum equation generates an elliptic function. Generalizing these examples, we may say that an equation of motion (or an ODE) “generates” a certain function. Then the function embodies the mathematical structure of motion; the regularity of the function represents the order of orbits. The extension of the theory from trigonometric functions to elliptic functions was an enormous progress and as a result, the understanding of periodicity—a simple expression of order—was generalized. Recall, however, how arduous the path into the nonlinear world was, even though it was only a matter of pendulum. For each equation of motion, we define a new “function” and analyze its properties—does this mean exerting a hopelessly huge effort? But it is not only a problem of effort. There is a fundamental limitation in representing motion by a “function.” We have to explain further the meaning of this “limitation.” Representing motion by a function means not only representing it with mathematical symbols but also discerning a certain order of motion. If some motion does not have any order, it is meaningless to represent it with specific symbols. Of course, one may show the appearance of an “individual” motion as a graph by
8 For the theory and applications of elliptic functions, see Whittaker & Watson [19]. The most significant properties of elliptic functions are that they satisfy an algebraic addition theorem and that the differentiation of an elliptical function is expressed by other elliptical functions. Needless to say, trigonometric functions enjoy these properties. However, they are much simpler because differentiation does not change their shapes (it brings about only similarity transformations). Differentiation does change the shape of an elliptic function. Yet, an interesting point is that the deformation by differentiation may be equilibrated with certain algebraic operations. This fact implies that the elliptic function solves a certain differential equation including some algebraic nonlinear terms, such as a nonlinear wave equation that describes so-called solitons (see Sect. 4.4.4, Problem 4.3, and, for example, Brazen & Johnson [3]).
2.2
Function—The Mathematical Representation of Order
57
numerically solving the initial-value problem of an equation of motion; although numerically analysis gives just an approximate solution, recent computers can solve it rather accurately. Because a graph is a geometrical representation of a certain map, the graph of a solution of an “XX equation” may be named, for example, as “xx function.” The motion changes when we change the initial condition; an individual solution to each initial-value problem is a particular solution of the ODE. Therefore, we have to survey the whole shape of the “xx function” by changing the initial condition; to define a function (map), we need the general solution, not a particular solution for a special initial condition. What we meant by “motion does not have any order” is the case when the “xx function” changes at random in response to a small change in the initial condition. Then, the “xx function” does not embody any structure or order, even if it is endowed with a name or a symbol. A function that is defined to represent a motion has a meaning only if the motion describes a certain order. The motion of a pendulum may be represented as 2 sin−1 [k sn (ωt + δ, k)] (see Problem 2.1). This solemnious symbolic representation does not display the actuality of a pendulum’s motion. We may recognize the reality of the motion only after the function is plotted as a graph. However, the advantage of this mathematical expression is not in displaying a particular form of motion, but rather in representing the order or universality of all possible motions. We may claim that we completely understand the motion, when the solution of the equation of motion is given as “function” and its mathematical structure is fully analyzed. A pendulum’s motion is now perfectly understood through the theory of elliptic functions, and the theory propagates to a wider, general realm of the theory of periodicity. Immortal wisdom is built by determined analysis of individual mathematical structures. The theoretical method needed to uncover the order of motion, which we shall study in this chapter, is how to find the so-called integrals of motion. When we obtain a sufficient number of integrals that corresponds to its number of degrees of freedom, the solution can be represented by a certain “function”—such a motion is said to be integrable. The “function” may not be a textbook function, but is still defined by an integral, such as (2.13). Remembering the preceding example, we found, for the equation of motion (2.10), the energy conservation law (2.11). The “integral constant” H , which means the constancy of the energy, is one example of the integral of motion.9 After this first integration, the second-order ODE (2.10) is reduced to a first-order ODE (2.12), which is integrable and defines a function by (2.13). However, it is known that integrable equations are rather special; an equation of motion is generally non-integrable. Mechanical theory considers non-integrability as chaos, i.e., chaos (terms like complexity or irregularity may be used synonymously) is define as a motion that cannot be represented by a “function.” 9 It is noteworthy that an integral of motion is related with a certain conservation law; in the foregoing example, the energy conservation law yields the integral H . This fundamental relation will be used in the later sections, where we will look for the order of motion in a more systematic way.
58
2 From Cosmos to Chaos
Non-integrability (the difficulty in finding integrals of motion) occurs primarily because of the high dimensionality of state space. When multiple parameters create links through their motions, it becomes difficult to decompose them into separate motions and reduce each of them into a simple (ordered) motion. At the root of such “non-decomposability” there is nonlinearity. To make the relation between non-decomposability and nonlinearity clear, we shall first come back to linear theory to clarify what decomposition is, and we shall study why nonlinearity makes decomposition difficult.
2.3 Decomposition—Elucidation of Order 2.3.1 The Mathematical Representation of Causality An orbit is a graph in state space, describing the history of an individual event; a different initial condition yields a different orbit. Our aim is to elucidate universal properties common to all orbits. However, this exploration is generally impossible because of the unlimited diversity of motion that nonlinearity may produce. In this section, we start from an abstract representation of “motion”, and then discuss how we may represent motion in a concrete form—this is where decomposition becomes essential. We consider a state space X = Cn where n is a finite integer.10,11 We denote by x(t) the state at time t. We assume that x(s) and x(t), for arbitrary s and t ( s), are related by a certain map, and write x(t) = T (t, s)x(s),
(2.14)
where T (t, s) is a map (or operator) including two real parameters s and t. Interpreting the meaning of x(s) as the “cause” and x(t) as the “result”, the map T (t, s) gives an abstract representation of causality. If T (t, s) is defined for every t s ∈ R and everywhere on X , the state space X is a deterministic world ruled by the causality T (t, s). For logical consistency, we demand the following properties (axioms) for T (t, s): For every t s r ∈ R, 10
In this section, we discuss eigenvalue problems. For the algebraic equations of eigenvalue problems to be always solvable, we assume that K = C; by the fundamental theorem of algebra, the field C is algebraically closed in the sense that every non-zero single-variable polynomial with complex coefficients has as many complex roots (including multiplicity) as its degree.
11
The abstract dynamical system to be described in this subsection may be nonlinear. Then the set of states (so-called ensemble) may be restricted in a certain subset of some linear space, i.e., it is not necessarily a linear space (even in linear theory, it is often convenient to immerse the ensemble, a linear space, in a larger linear space, if we consider an infinite-dimensional space). For simplicity, however, we assume that the dynamics is defined everywhere on X = Cn .
2.3
Decomposition—Elucidation of Order
59
T (t, t) = I (identity), T (t, s) · T (s, r ) = T (t, r ).
(2.15) (2.16)
The streamline shown in Fig. 2.5 gives a graphical image of the map T (t, s). Here we consider an autonomous system (cf. Sect. 2.1.2). Then the dynamics is invariant with respect to the shift of the origin of time, so we may write T (r, s) = T (r − s, 0)
(∀r ∀s).
Thus it suffices to represent the causality using only one parameter t = r − s ( 0) (meaning “time span”); we write: T (t) = T (t, 0). For the one-parameter map T (t), axioms (2.15) and (2.16) read as T (0) = I (identity), T (s) · T (t) = T (s + t) (∀s, t 0).
(2.17) (2.18)
The condition (2.18) is called the associative law, which induces the commutative law : T (t) · T (s) = T (s) · T (t) (∀s, t 0).
(2.19)
If we allow t to take a negative value and assume that the map of backtracking the orbit satisfies (2.17) and (2.18), T (t) has its inverse T (t)−1 determined by T (t)−1 = T (−t) (t ∈ R).
(2.20)
Mathematically, an autonomous dynamical system is identified with the set of one-parameter operators G = {T (t); t ∈ R}. The operator T (t) is generally a nonlinear map in state space X . Here we assume that T (t) (∀t) is defined everywhere on X , for simplicity. For the set G, the axiom (2.17) guarantees the existence of the unit element, (2.18) applies the associative law, and (2.20) ensures the existence of the inverse element. Such a set G is called a group. Relation (2.18) means the commutative law. So G is said to be a commutative group (or Abelian group). For an irreversible process, condition (2.20) must be removed. An irreversible dynamical system is identified with S = {T (t); t 0},
60
2 From Cosmos to Chaos
which is called a semigroup. The theory of semigroups is important in the study of infinite-dimensional dissipative systems. In this chapter, however, we assume that dynamics is temporally reversible unless otherwise remarked. Calculating the temporal derivative of the operator T (t), we obtain an abstract equation of motion that governs the dynamical system G; we define (T (δ) − I )x δ→0 δ
Ax = lim
(x ∈ X ).
(2.21)
The operator A (generally a nonlinear operator) is called the generator (or infinitesimal generator) of the dynamical system G.12 Given an operator A, we may consider the initial-value problem of the equation d x = Ax. dt
(2.22)
In comparison with the general form (2.1) of the equation of motion, the “flow” V (x) is rewritten as Ax in this operator-theoretic equation of motion; the generator A defines the flow at every point x of state space (cf. Fig. 2.5). The right-hand side of (2.22) does not depend on t explicitly, because we are considering an autonomous system. If we can solve the initial-value problem of (2.22) for all t ∈ R and every initial condition x 0 ∈ X , we may write the solution as x(t) = T (t)x 0 to generate a group {T (t); t ∈ R}.
2.3.2 Exponential Law—A Basic Form of Group In the narrative of Newtonian mechanics, the rule of motion is defined by a generator A. In Sect. 2.2, we discussed how we can integrate an equation of motion and generate a function (mathematical representation of all orbits). In this section, this argument may be rephrased as “how can we generate a group {T (t)} by a generator A?.” As previously mentioned, the one-parameter map T (t) must be defined for all t ∈ R and for every initial value x 0 ∈ X , which is equivalent to knowing all orbits. This is practically possible if we can decompose the generator. Let us explain the meaning of decomposition, first within the framework of linear theory. Conditions (2.17), (2.18), and (2.20), characterizing the one-parameter map T (t), remind us of the exponential function eta (a ∈ C).13 The generator of this function is the multiplication of the scalar a (evident from definition (2.21) of the generator). 12 For an irreversible dynamical system (semigroup) S, however, the parameter t can take only non-negative values, so we have to replace, in (2.21), the limit δ → 0 by the right-limit δ → +0, and define the generator by the right derivative. 13
Recall the arguments of Sect. 1.3.4, where we have shown that the motion of a linear system obeys the exponential law.
2.3
Decomposition—Elucidation of Order
61
Hence, the corresponding equation of motion is nothing but the linear ODE (1.28); solving the initial-value problem yields the exponential function eta multiplied by an arbitrary initial value x0 . A linear operator in X = Cn is represented by a matrix A (see Sect. 1.3.3). The one-parameter map T (t), generated by a linear generator A, is the exponential function of A, which we have already defined in (1.32). However, this definition based on the infinite series does not reveal the structure or behavior of T (t). So we shall first transform the generator (matrix) into a simpler representation. We may say that linearity is the generalization of proportionality, thus, we put Ax = ax
(a ∈ C).
(2.23)
The left-hand side of (2.23) is the linear map, and the right-hand side is the proportionality relation; we are going to reduce, or decompose, the linear map into proportionality relations. The problem of finding a and x that satisfies (2.23) is called the eigenvalue problem of A, and a and x are called the eigenvalue and the eigenvector, respectively. First we assume that the generator A is a normal matrix.14 Then we obtain, by solving (2.23), the complete set of eigenvalues a j and the corresponding eigenvectors ϕ j ( j = 1, · · · , n) which are orthogonal to each other. Normalizing every eigenvector, we may define an orthonormal basis U = {ϕ 1 , · · · , ϕ n } that spans X = Cn . The linear transformation of the basis from the original one to this U is represented by the unitary matrix in which the columns are the eigenvectors: U = (ϕ 1 · · · ϕ n ) . Using this transformation, the generator A is diagonalized: ⎞ ⎛ 0 a1 ⎟ ˜ = U −1 AU = ⎜ A ⎝ ... ⎠ . 0
˜ is, by definition (1.32), The exponential function of the diagonal matrix A ⎞ ⎛ ta 0 e 1 ˜ ⎟ ⎜ .. et A = ⎝ ⎠. . 0
e
(2.24)
an
(2.25)
tan
In this representation, we find that each eigenvalue a j determines the time constant ˜ of the term eta j included in the exponential function et A . Returning to the original basis, we obtain
Let A∗ denote the adjoint matrix of A, i.e., (Ax, y) = (x, A∗ y) for every x, y ∈ X . We write the commutation of A and B as [A, B] = AB − B A. If [A, A∗ ] = 0, A is said to be normal. If A = A∗ , A is said to be self-adjoint (or hermitian). And, if A∗ = A−1 , A is called a unitary matrix.
14
62
2 From Cosmos to Chaos
et A = U et A U −1 . ˜
(2.26)
An alternative method of generating the exponential function et A is to solve directly the equation of motion: d x = Ax. dt
(2.27)
Let us give an initial value x 0 and solve (2.27) to obtain the solution x(t). Writing x(t) = T (t)x 0 , we can define the one-parameter map T (t) = et A . Let us put x = U x˜ . Multiplying both sides of (2.27) by U −1 , we obtain d ˜ x˜ . x˜ = U −1 AU x˜ = A dt
(2.28)
The initial condition is transformed as x˜ (0) = U −1 x 0 . Because the transformed ˜ is a diagonal matrix, the system (2.28) consists of mutually indepengenerator A dent separate equations; each component of the decomposed system reads as (1.28), which is easily integrated to generate the exponential function a ta j ( j = 1, · · · , n). Composing them, we obtain (2.25), and transforming back to the original variables x(t), we obtain (2.26). As has been noted, the rule of dynamics is written in the generator. The process of decoding (or decomposing) the rule is formulated as the eigenvalue problem (2.23). In linear theory, the only rule is that the proportionality coefficient determines the rate of change or the time constant.15 As we have seen, if the generator A is a normal matrix, the dynamics of the system is decomposed into independent motions of the elements. Here the elements mean the eigenvectors; the motion of each element is described by an exponential function in which the time constant is given by the corresponding eigenvalue. The linear dynamics generated by a normal matrix is the most successful example of a “complete” reduction into separate (or orthogonal) elements. A decomposed element (eigenvector) is often called a mode of the dynamical system.
2.3.3 Resonance—Undecomposable Motion In a general dynamical system, it is often difficult to decompose the dynamics into independent motions of separate elements. In this subsection, we study a rather simple example of an undecomposable (irreducible) motion in a linear system. 15
Contemporary linear theory is devoted to the studies of phenomena in infinite-dimensional state space (function space) (see Note 1.1). The generator is an operator (such as a differential operator) defined in a function space; recall the example of (1.35). Then we cannot consider the power series (1.32) to define the exponential function of a general operator. Note 2.1 describes an overview of the theory of exponential functions (semigroups) in function spaces.
2.3
Decomposition—Elucidation of Order
63
Different modes (eigenvectors) that have the same time constant (eigenvalue) may interact through resonance. As hereinafter described, a Jordan block that is contained in the generator explains the mechanism of resonance. In the preceding subsection, we assumed that the generator A is a normal matrix. So we could decompose A in terms of n eigenvectors which are mutually orthogonal (n is the degrees of freedom). However, when the generator A is a general matrix, we have to extend the notion of eigenvectors to the so-called extended eigenvectors to construct a complete basis, which are not necessarily orthogonal to each other. Suppose that we find m ( n) different eigenvalues a1 , · · · , am by solving (2.23). The extended eigenvectors corresponding to an eigenvalue a j are the vectors (excluding the trivial null vector) that satisfy (A − a j I )v φ j,v = 0 (v = 1, · · · , k j , j = 1, · · · , m),
(2.29)
where k j is a certain integer (not greater than n). If v = 1, (2.29) is nothing but the usual eigenvalue problem (2.23). As previously mentioned, if A is a normal matrix, v = 1 suffices in order to find the complete set of eigenvectors. However, for a general matrix, (2.29) with v = 1 may not yield a sufficient number (= n) of eigenvectors, and then, we would need to scout for solutions by increasing v. Such a case occurs when some eigenvalues degenerate into multiple roots (i.e., the number m of different eigenvalues is smaller than the dimension n).16 By the system of extended eigenvectors B = {φ j,v }, we may span the space X = Cn . We note that B is not necessarily an orthogonal system. The canonical form of the matrix J j that satisfies the extended (k j > 1) eigenvalue equation (J j − a j I )k j = 0 is given by the Jordan block ⎛
⎞ aj 1 0 ⎜ .. .. ⎟ ⎜ . . ⎟ ⎜ ⎟. Jj = ⎜ ⎟ .. ⎝ . 1⎠ 0 aj
(2.30)
If k j = 1, we set J j = a j . We define a regular linear map P = (φ 1,1 , · · · , φ m,km ) with the columns φ j,i that are the extended eigenvectors. Using this P, we may transform A into the Jordan canonical form: ⎛ ˜ = P −1 A P = ⎜ A ⎝
0
J1 .. 0
.
⎞ ⎟ ⎠.
(2.31)
Jm
Even if m < n, there is still a possibility that n independent eigenvectors are obtained with v = 1, i.e., some different eigenvectors may share the same eigenvalue. Then we do not need to invoke extended eigenvectors.
16
64
2 From Cosmos to Chaos
The Jordan block, which poses an impediment to the decomposition (diagonalization), represents indelible couplings (interactions) among different modes (extended eigenvectors). Interestingly, as obvious in the Jordan canonical form, such couplings can be reduced into a chain of bilateral couplings; the components in the next diagonal line, the 1s, represent these chain couplings. The irreducible (undecomposable) couplings of different modes, which are represented by a Jordan block, bring about strange motions that cannot be expressed by standard exponential functions. A Jordan block bundles multiple modes (extended eigenvectors) that have a common (degenerate) eigenvalue which represents the time constant (frequency, if the mode is an oscillator). Physically, the coupling of the multiple modes of the Jordan block is the resonance of motions. Let us analyze the behavior of the resonant modes. Using the power series expression (1.32) of the exponential function of a matrix, we may write, for an arbitrary λ ∈ C, et A = etλ et(A−λI ) t2 = etλ I + t(A − λI ) + (A − λI )2 + · · · . 2
(2.32)
We set λ = a j (an eigenvalue) and operate this et A to an extended eigenvector φ j,k j . By (2.29), we find e φ j,k j = e tA
ta j
φ j,k j
t k j −1 φ j,1 , + tφ j,k j −1 + · · · + (k j − 1)!
(2.33)
when we use the relation ( A − a j I ) p φ j,k j = φ j,k j − p . From this calculation, we see that the exponential function of a Jordan block includes terms that behave as t p eta j (0 p k j − 1). Such a term, which is an exponential function multiplied by t p , is called a secular term. It is obvious that a secular term by itself does not satisfy the associative law T (t) · T (s) = T (t + s). However, it is involved appropriately in the matrix operator et A so that the associative law holds as a whole (see Problem 2.3). When the generator is a normal matrix, the motion of the system is decomposed into independent exponential functions, and hence, the stability is judged by surveying the real part of the eigenvalues (see Sect. 2.3.1). For a general generator, however, its canonical form may include Jordan blocks, and then, algebraic instability that grows in proportion to t p for some p (a natural number) occurs, even if the corresponding eigenvalue a j is purely imaginary. Physically, such increase of the amplitudes of oscillators φ j,k j− p is due to the “resonant excitation” by the master mode φ j,k j .
2.3.4 Nonlinear Dynamics—An Infinite Chain of Interacting Modes

We have seen that the motion of a finite-dimensional linear system can be decomposed into extended exponential functions t^p e^{ta} (a ∈ C, p < n). Such a reduction into elements is not generally possible for nonlinear dynamics or infinite-dimensional dynamics (see Note 1.1). In other words, truly diverse, complex, or irreducible phenomena can occur only in nonlinear or infinite-dimensional systems. What we have to note, in this context, is that infinite dimensionality and nonlinearity are closely related: nonlinear dynamics yields an infinite chain of interacting modes. Let us study an example. We consider

$$\frac{dx}{dt} = x - x^2, \tag{2.34}$$
which is a deceleration-type nonlinear proliferation model (1.39). Here, we have rescaled x and t to simplify the coefficients. In (1.40), we already solved the initial-value problem of this equation; for x(0) = x_0, we obtain

$$x(t) = T(t)x_0 = \frac{1}{e^{-t}(x_0^{-1} - 1) + 1}. \tag{2.35}$$
If we try to describe this nonlinear dynamics within the framework of linear theory, i.e., if we dare to express the motion in terms of exponential functions, the following "infinite dimensionality" of modes emerges. Denoting y_2 = x^2, we regard y_2 as a new unknown variable.17 Then the evolution equation (2.34) includes only first-order terms of the unknown variables x and y_2, and thus it may be regarded as a linear equation. However, we need an equation by which we can calculate the evolution of the new variable y_2. By definition, we obtain

$$\frac{dy_2}{dt} = 2x \frac{dx}{dt} = 2y_2 - 2x y_2.$$

Similarly, we may consider y_3 = x y_2 = x^3 as another unknown variable and regard this equation as a linear equation. Continuing this process of deferring the evaluation of nonlinear terms, we obtain an infinite chain of linear equations. Let us formally arrange the system of equations. We write

$$y_1 = x, \quad y_2 = x^2, \quad y_3 = x^3, \quad \cdots. \tag{2.36}$$
17 Recall the process by which we converted a higher-order ODE into a system of first-order ODEs (see Sect. 2.1.3).
By definition, dy_n/dt = n x^{n−1} (dx/dt). Using (2.34), we obtain

$$\frac{dy_n}{dt} = n y_n - n y_{n+1} \quad (n = 1, 2, \cdots). \tag{2.37}$$
Regarding the infinite number of parameters y_1, y_2, ⋯ as unknown variables, (2.37) constitutes an infinite-dimensional system of ODEs:

$$\frac{d}{dt} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \end{pmatrix} = \begin{pmatrix} 1 & -1 & & & \\ & 2 & -2 & & \\ & & 3 & -3 & \\ & & & \ddots & \ddots \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \end{pmatrix}. \tag{2.38}$$
Looking at the generator (the matrix on the right-hand side), we find couplings of two adjacent modes. These couplings continue infinitely. While the chain of bilateral couplings in (2.38) seems similar to the resonant interactions of modes in (2.30), there is a fundamental difference. In the latter linear system, couplings occur only among modes that have a common time constant (eigenvalue). However, in the former nonlinear system, the chain brings about couplings (interactions) of modes with different time constants (see the diagonal components). The solution of the initial-value problem of (2.34) is constructed from (2.38) with initial values such that

$$y_n(0) = x_0^n \quad (n = 1, 2, \cdots). \tag{2.39}$$
The first component y_1(t) = x(t) of the vector ᵗ(y_1, y_2, ⋯) gives the solution of (2.34). We say that the solution of the nonlinear ODE (2.34) is embedded as the first component in the solution of the infinite-dimensional ODE (2.38), if the initial condition satisfies (2.39).18 Such a transformation of a nonlinear ODE into an infinite-dimensional system of linear ODEs, called Carleman embedding, is possible as far as the nonlinear terms are given by polynomials. Let x = ᵗ(x_1, ⋯, x_n) ∈ C^n be governed by an equation of motion

$$\frac{d\boldsymbol{x}}{dt} = V(\boldsymbol{x}), \tag{2.40}$$
where V(x) is a polynomial function of the variables x_1, ⋯, x_n. Defining an infinite-dimensional vector space (called a Fock space) by collecting all powers of x_1, ⋯, x_n, we can define a system of infinite-dimensional linear ODEs in which (2.40) is embedded.

18 However, it is known that the solution of (2.38) is not unique. We have to select the solution that satisfies y_j = y_1^j (j = 2, 3, ⋯); see Kowalski & Steeb [11].
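The embedding can be tested numerically by truncating (2.38) at a finite number N of modes. The following is a rough sketch (our own construction, assuming numpy and scipy); the closure y_{N+1} ≈ 0 is an uncontrolled truncation, so the approximation is only reliable for small x_0 and times t ≲ log(1/x_0):

```python
# Sketch (assuming numpy/scipy): truncate the Carleman system (2.38) at N
# modes and compare its first component with the exact solution (2.35).
import numpy as np
from scipy.integrate import solve_ivp

N, x0 = 20, 0.2
A = np.zeros((N, N))
for n in range(1, N + 1):                 # dy_n/dt = n y_n - n y_{n+1}
    A[n - 1, n - 1] = n
    if n < N:
        A[n - 1, n] = -n

y0 = x0 ** np.arange(1, N + 1)            # initial condition (2.39)
sol = solve_ivp(lambda t, y: A @ y, (0.0, 1.0), y0,
                rtol=1e-10, atol=1e-12, dense_output=True)

t = np.linspace(0.0, 1.0, 11)
exact = 1.0 / (np.exp(-t) * (1.0 / x0 - 1.0) + 1.0)   # solution (2.35)
print(np.max(np.abs(sol.sol(t)[0] - exact)))          # small truncation error
```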
2.3.5 Chaos—Motion in the Infinite Period

We may view a general nonlinear dynamical system as a complex of an infinite number of modes, each of which has a different time constant. In the example of the preceding subsection, the interactions of different modes have an obvious order, as expressed in the matrix of the evolution equation (2.38), and the motion is indeed represented by a rather simple "function" (2.35). However, in a general nonlinear system, complex couplings of modes result in extremely disordered motion, which is beyond a functional representation—we call such a complex phenomenon "chaos." In the next section, we shall try to give a clear and distinct meaning to the notion of chaos, but before that, we are going to study an elementary example of disordered evolution. The example discussed here is a modification of the deceleration-type nonlinear proliferation model (2.34); here we consider discrete-time evolution. Certain kinds of insects (seed beetles, for example) reproduce in synchronized generations, each new generation succeeding the previous one within a common time frame. Thus, it is more appropriate to describe the population change as a function of a discrete time n counting the generation alternations, instead of a continuous time t. In this case, we replace the differential equation (2.34) by a finite-difference equation:

$$x_{n+1} - x_n = b(1 - c x_n) x_n, \tag{2.41}$$
where b and c are positive constants.19 Writing A = b + 1, we rescale the variable by u_n = (bc/A) x_n and rewrite (2.41) as

$$u_{n+1} = A(1 - u_n) u_n. \tag{2.42}$$
Let us denote by f_A(u_n) the function on the right-hand side, which is called the logistic map. We assume that u_n ∈ [0, 1]. For the range of the map f_A to be included in [0, 1], we assume that 0 ≤ A ≤ 4. In the foregoing discussion of the differential equation (1.39), we could rescale x and t to normalize all coefficients to unity and cast the equation into the simple form (2.34). In the present case of a finite-difference equation, however, the unit of time is fixed, so we cannot remove the coefficient A—this A plays an essential role in generating complexity. In other words, the discreteness of time is the origin of the "chaos" to be discussed below. Here we explain a graphical method of constructing the solution. The map f_A from u_n to u_{n+1} is represented by a parabolic curve in the plane with u_n on the horizontal axis and u_{n+1} on the vertical axis (see Fig. 2.8). Starting from the initial value u_0 on the horizontal axis, we move vertically to find the intersection with the parabolic graph; the vertical position of the intersection gives u_1 = f_A(u_0). Then we
19 R.M. May [14] studied this model and introduced the notion of "chaos" to the field of ecology (cf. Sect. 1.4.1). For detailed analysis of this model, see, for example, Jackson [10].
Fig. 2.8 The logistic map f_A(u_n) = A(1 − u_n)u_n; (a) shows the graphical method of constructing the solution (A = 4), and (b) shows an example of the time series solution. If A is greater than approximately 3.57, the time series becomes "chaotic"
move horizontally to find the intersection with the linear graph u_{n+1} = u_n, which maps u_{n+1} (the image of f_A) back to the horizontal axis to set the "start point" for the next step. Continuing this process, we obtain a time series by observing the intersection points. The behavior of the time series changes drastically depending on the parameter A, which expands or contracts the parabolic graph of f_A. If 0 < A ≤ 1, the parabolic curve does not intersect the linear graph. By the aforementioned graphical procedure, it is evident that lim_{n→∞} u_n = 0, implying "extinction." In contrast, if 1 < A ≤ 4, the parabolic curve and the linear graph intersect at u_n = u* = 1 − A^{−1}. The intersection, where u* = f_A(u*), is the fixed point of the map f_A. In the equation of motion (2.42), the fixed point gives a stationary (or equilibrium) point. By the graphical procedure, we see that, if 1 < A ≤ 2, {u_n} is a monotonic sequence converging to u*. However, if 2 < A ≤ 4, the fixed point u* exceeds the critical point u_n = 1/2 of the logistic map f_A, and then the time series {u_n} becomes oscillatory. When A is close to 2, the oscillation is periodic, but it becomes more complex with further increase of A (see Fig. 2.8). If A is greater than approximately 3.57, the period of the oscillation diverges, i.e., the time series is no longer repetitive. This disordered motion is called chaos. Recall that the continuous-time model, i.e., the differential equation (2.34), never produces such complex phenomena. We could solve (2.34) rather easily by transforming it into a simple linear ODE (see the footnote to the solution (1.40)). In the present discrete-time model (2.42), although the nonlinear term is the same, such a transformation does not apply (the nonlinearity is not reducible). We may deem the complexity in this example as being produced by the collaboration of the nonlinearity and the discreteness of time. It is noteworthy that continuous time and
discrete time do not have such an essential difference in linear theory. As we have seen in Sect. 1.3.4, discrete-time proliferation, in linear theory, is represented by (1 + α)^n u_0, while in continuous time it is naturally extended to e^{at} u_0, which subsumes the geometric progression by putting t = τn and aτ = log(1 + α). We can easily judge whether an "equation" is linear or nonlinear, because it is only a matter of scrutinizing the mathematical forms of the included terms. However, as we have learned from the foregoing examples, it is extremely difficult to guess how nonlinearity influences the phenomenon (or the solution). For instance, one might imagine that a nonlinear system is intrinsically complex or disordered. Certainly, this is not right. For example, the regular motion of a planet is governed by a nonlinear equation of motion. The proliferation model (1.39) is also a nonlinear equation. With the deceleration nonlinearity (ε < 0), the behavior of the solution becomes more regular in comparison with the linear model (exponential growth), in the sense that we can predict the future more correctly; the population converges to a universal value that is independent of the initial condition:20

$$\lim_{t \to \infty} x(t) = x_\infty = -\varepsilon^{-1}.$$
However, in the solution x(t) = e^{bt} x_0 of the linear model, a small difference in the initial condition expands exponentially with time, so that exact prediction far into the future is difficult.
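The behaviors described above are easy to reproduce. A small sketch (our own illustration, plain Python) iterates the logistic map (2.42): for A = 1.5 the orbit converges to u* = 1 − A^{−1}, while for A = 4 it wanders chaotically and two nearby initial conditions separate rapidly:

```python
# Sketch (standard library only): iterate u_{n+1} = A (1 - u_n) u_n.
def orbit(A, u0, steps):
    u, out = u0, [u0]
    for _ in range(steps):
        u = A * (1.0 - u) * u
        out.append(u)
    return out

print(orbit(1.5, 0.1, 60)[-3:])   # settles near u* = 1 - 1/1.5 = 0.333...
print(orbit(4.0, 0.1, 60)[-3:])   # no convergence: "chaotic" regime

# sensitivity to the initial condition in the chaotic regime
a = orbit(4.0, 0.1, 50)
b = orbit(4.0, 0.1 + 1e-8, 50)
print(max(abs(x - y) for x, y in zip(a, b)))   # order-one separation
```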
2.3.6 Separability/Inseparability

The theory of motion described in the foregoing discussion is based on an a priori definition of a state space. However, when we begin the study of a certain phenomenon, we have to start by considering what the state space is. Here, we perform a "virtual experiment" and discuss how we can determine a state space. Suppose that we observe a certain phenomenon and record the temporal variation of a measured parameter. We denote this data by u(t). But we do not know the law (for example, the equation of motion) that governs u(t). We have to examine whether the data set we have measured suffices to describe the object—if not, we cannot derive any law from the data. The data u(t) is merely one parameter on which we are focusing our attention. We have no a priori justification to assert that this parameter can describe the phenomenon well; we do not know how other parameters or external conditions influence the object. Generally, our research starts from that stage.
20 Note that the convergent value x_∞ is determined by the parameter ε characterizing the nonlinear term. This x_∞ is the stationary (or equilibrium) point, i.e., the value of x where the generator (the right-hand side of the equation of motion) vanishes. Equation (1.39) has two stationary points, x = 0 and x = x_∞. We easily find that if x_0 ≠ 0, x(t) converges to x_∞.
If the behavior of u(t) is "simple," we may assume that there is a closed law that governs u(t). For instance, if u(t) is found to be constant, we may claim a "u-conservation law." If the rate of change du(t)/dt is constant, we may claim a "law of inertia"; or, if the ratio (du(t)/dt)/u is found to be constant, we may derive an "exponential law." Of course, such a finding may be a "minimum law" associated with u(t)—we should defer asserting that the phenomenon is completely understood by the observation of u(t). For instance, suppose that we measure the total mass of a system that consists of many interacting elements. The constancy of this quantity means a "mass conservation law." Certainly, this is an exact law concerning the total mass, but it has nothing to do with the micro-dynamics of the elements, the mechanisms of interactions, or the structures created in the system. The space of the variable u is separated from the other potential parameters of the system. Yet, such a separation does not necessarily mean absolute independence of the parameter u from the surrounding world. For example, suppose that u(t) is the population of insects and satisfies an exponential (geometric-progression) law (du(t)/dt)/u = c. The constant c, which determines the growth rate of the population, may change if the temperature, the amount of food, or other environmental conditions change. Then, this c becomes a parameter that represents the connection between the group of insects and the environment. The separated variable u may recover its connection with the surrounding world when the parameter c becomes dynamical. In general, the behavior of the data u is not necessarily simple, in the sense that it may exhibit no order (law). In such a case, we guess that some other variables, in addition to u, are involved in the phenomenon. If we fail to observe (or control) the changes of these potential parameters, u(t) is not deterministic all by itself. Mathematically, we may formulate the problem of "measurement" as follows: Suppose that the state of a system is equivalent to a certain vector x(t) ∈ X = R^n.21 Let e be a unit vector that represents a "unit" of a certain quantity. The "measurement" of the quantity e with respect to the state vector x(t) is defined by

$$u(t) = \boldsymbol{x}(t) \cdot \boldsymbol{e}. \tag{2.43}$$
In order to find a closed (self-contained) law of motion, we have to look for a space that subsumes all related state variables. As previously noted, such a space is not necessarily the entire space X. If some variables are decoupled (uncorrelated) from the other group of parameters, the former and latter subspaces may be studied separately. Let us proceed to formulate a method to find a closed state space, including u, in which we can formulate an equation of motion. Following the idea of Newtonian mechanics, we examine the temporal derivatives of u(t). We put d^ν u/dt^ν = u^{(ν)}.
21 The state vector x(t) may be a member of a certain infinite-dimensional Hilbert space X. In quantum mechanics, a physical quantity is a certain linear operator A in X, and the "measurement" of A is represented by the inner product ⟨x, Ax⟩.
Calculating the differential coefficients u^{(1)}(t), ⋯, u^{(n)}(t) from the data u(t) = u^{(0)}(t), we plot the movement of the vector ᵗ(u^{(0)}(t), ⋯, u^{(n)}(t)) in an (n + 1)-dimensional space. If there is a certain relation, independent of ad hoc choices of initial conditions, a graph representing this relation will emerge in the (n + 1)-dimensional space. If we find a graph in an (n + 1)-dimensional space, the degree of freedom of the motion is n. The obtained relation, which may be formally written as F(u, u^{(1)}, ⋯, u^{(n)}) = 0, implies an nth-order differential equation. The aforementioned strategy of searching for an appropriate state space envisages that the complexity of motion (couplings of multiple parameters) is integrated into the "history" of the one parameter u(t), and that it is then revealed by calculating the time derivatives. In fact, this method certainly works well if the system is an autonomous finite-dimensional linear system where the motion is represented as a composition of exponential (or, generally, extended exponential) functions (see Sects. 2.3.2 and 2.3.3). Let us see how the equation of motion can be deduced from the data of one parameter, u(t), alone. We consider an n-dimensional state vector that includes the singled-out variable u as a component: x = ᵗ(u, u_1, ⋯, u_{n−1}) ∈ C^n. Suppose that x(t) obeys an autonomous linear equation of motion:

$$\frac{d\boldsymbol{x}}{dt} = A\boldsymbol{x}. \tag{2.44}$$
In our gedankenexperiment, we do not know the governing equation (2.44), so we try to derive it from the analysis of the single data u(t). By the hidden assumption of obedience to the aforementioned equation of motion, x(t) must be composed of extended exponential functions whose time constants are the solutions of the eigenvalue problem

$$\det(\lambda I - A) = 0. \tag{2.45}$$
The left-hand side of (2.45) is an nth-degree polynomial in the variable λ, which can be written as

$$P(\lambda) = c_n \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_0. \tag{2.46}$$
Replacing λ by d/dt in P(λ), we obtain an ODE of order n, which is satisfied by u(t):

$$P(d/dt)\, u = c_n \frac{d^n u}{dt^n} + c_{n-1} \frac{d^{n-1} u}{dt^{n-1}} + \cdots + c_0 u = 0. \tag{2.47}$$
In fact, every term included in u(t) is a constant multiple of e^{tλ} (or t^p e^{tλ}) which satisfies (2.47). Therefore, if we plot the vector ᵗ(u^{(0)}(t), ⋯, u^{(n)}(t)) in the (n + 1)-dimensional state space, it moves on the hyperplane

$$c_n u^{(n)} + c_{n-1} u^{(n-1)} + \cdots + c_0 u^{(0)} = 0.$$
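As a quick numerical illustration of this hyperplane property (our own sketch, assuming numpy; the signal and coefficients are hypothetical): data built from e^{−t} and e^{−2t} must satisfy u″ + 3u′ + 2u = 0, since P(λ) = (λ + 1)(λ + 2):

```python
# Sketch (assuming numpy): finite-difference derivatives of
# u(t) = e^{-t} + e^{-2t} lie on the hyperplane u'' + 3u' + 2u = 0.
import numpy as np

u = lambda s: np.exp(-s) + np.exp(-2 * s)

t = np.linspace(0.0, 2.0, 9)
h = 1e-4
u1 = (u(t + h) - u(t - h)) / (2 * h)              # u'
u2 = (u(t + h) - 2 * u(t) + u(t - h)) / h**2      # u''

print(np.max(np.abs(u2 + 3 * u1 + 2 * u(t))))     # ~ 0 (finite-difference error)
```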
This gedankenexperiment reveals a remarkable property of a linear system: if we observe one parameter u(t), the influence of all the other parameters that interact with u(t) can be analyzed by studying the derivatives of u(t).22 In nonlinear systems, the graphical structures produced by similar plots are far richer. For example, suppose that u(t) is a record of the population of insects. Calculating the derivatives, we may plot the vector ᵗ(u^{(0)}(t), ⋯, u^{(n)}(t)) in the (n + 1)-dimensional space. If we find a graph such that u^{(1)} = u^{(0)} − (u^{(0)})^2, we obtain the governing equation (2.34). Similarly, we may analyze finite differences or, equivalently, study the relation among neighboring data points separated by a given time interval δ. For u(t), we define u^{(ν)}(t) = u(t + νδ) (ν = 0, 1, 2, ⋯). We plot ᵗ(u^{(0)}(t), ⋯, u^{(n)}(t)) in the (n + 1)-dimensional space (we may plot for discrete values of t) and seek a graph. Such a plot is called a return map. The method of the return map is effective for a discrete-time dynamical system because it has an a priori time interval δ.
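A sketch of this procedure (our own illustration, assuming numpy): generate a chaotic time series with the logistic map and recover the generating parabola by fitting the return-map pairs (u_n, u_{n+1}):

```python
# Sketch (assuming numpy): the return map of a chaotic logistic time series
# reveals the simple quadratic law behind the disordered data.
import numpy as np

A = 4.0
u = np.empty(2000)
u[0] = 0.123
for n in range(len(u) - 1):
    u[n + 1] = A * (1.0 - u[n]) * u[n]

# fit u_{n+1} = c2 u_n^2 + c1 u_n + c0 to the pairs (u_n, u_{n+1})
c2, c1, c0 = np.polyfit(u[:-1], u[1:], 2)
print(c2, c1, c0)    # approximately -4, 4, 0, i.e., u_{n+1} = 4 (1 - u_n) u_n
```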
Fig. 2.9 Beneath the complex (chaotic) behavior of a time series, a simple law governs its motion. (a) The time series was generated by the logistic map (see Fig. 2.8). (b) Plotting the return map, we can elucidate the rule for generating the time series

22 As we have shown in (2.4) and (2.5), an nth-order ODE can be reduced to a system of n simultaneous first-order ODEs. The inverse transformation, i.e., rewriting a system of n simultaneous first-order ODEs as a single nth-order ODE, is in general impossible. However, if the ODEs are autonomous (constant-coefficient) linear equations, it is possible; this is what we have shown here. To be exact, the set of solutions of (2.47) is larger than that of (2.44). As we mentioned in Sect. 2.3.3, if an eigenvalue of A degenerates, it may produce either a Jordan block or a diagonal block, depending on the structure of A. However, if the characteristic equation (2.45) has a multiple root, the solution of (2.47) shows secular behavior (see Problem 2.5).
Figure 2.9 shows the return map of the time series that was generated by the logistic map (2.42). The process of making the return map is the reversal of what we did in making Fig. 2.8. A simple relation (the parabolic curve) between u_n and u_{n+1} is delineated by this plot, while the time series {u_n} itself is totally disordered, or chaotic. However, a general nonlinear system cannot be analyzed as easily as the foregoing examples; it is generally impossible to separate a single observable parameter from the other potential parameters, or to elucidate the governing law by the analysis of the singled-out parameter alone. The reason why we succeeded in finding the governing laws in the preceding examples of logistic equations (continuous-time or discrete-time) is that the relevant parameter is only u (and its derivatives or neighbors). In a nonlinear system of multiple parameters, their interactions bring about unlimited diversity in the motions of all parameters, from which we cannot derive a law that governs a separated single parameter. Let us see an example where the separation of a single parameter becomes difficult. For a gedankenexperiment, we generate data by a three-dimensional nonlinear equation of motion:

$$\frac{d}{dt} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} A \sin \lambda x_3 + C \cos \lambda x_2 \\ B \sin \lambda x_1 + A \cos \lambda x_3 \\ C \sin \lambda x_2 + B \cos \lambda x_1 \end{pmatrix}, \tag{2.48}$$
where A, B, C, and λ are real constants. The vector field (flow) on the right-hand side is called the Arnold–Beltrami–Childress (ABC) flow (or map). This function is periodic in every coordinate with a common period 2π/λ. Here, we choose λ = 2π and consider a periodic cubic domain whose sides have length 1. The orbits calculated by (2.48) are superimposed on this cube. In Fig. 2.10, we plot the passing points of orbits through the plane x_3 = 0. Such a graph is called a Poincaré plot. If one of the coefficients A, B, and C is 0, the motion is regular, and the Poincaré plot visualizes patterns with simple structure (see Fig. 2.10a). However, if none of the coefficients is 0, the orbit becomes chaotic (see Fig. 2.10b). Here, our question is whether we can understand the mechanism of the dynamics by analyzing one parameter alone. So we observe one component x_1 and record its evolution as a data set u(t). We apply the aforementioned method: we calculate the derivatives u^{(ν)}(t) and search for a graph in the space of the vectors ᵗ(u(t), ⋯, u^{(n)}(t)). When A = 0, we find, in the four-dimensional space of u^{(0)}, ⋯, u^{(3)}, the graph of

$$u^{(3)} \sin \lambda u^{(0)} - u^{(2)} u^{(1)} \lambda \cos \lambda u^{(0)} + u^{(1)} B^2 \lambda^2 \sin^3 \lambda u^{(0)} = 0, \tag{2.49}$$
where λ = 2π. In fact, writing u^{(0)} = x_1 and eliminating x_2 from the first and second components of (2.48), we can derive (2.49). However, this method does not work if none of the coefficients A, B, and C is zero. In that case, the determining equation (2.48) behind the data does not allow the reduction that we could apply in the case A = 0. We cannot eliminate (separate) the third parameter x_3, which brings about an infinite chain of correlated u^{(0)}, u^{(1)}, ⋯. Therefore, we cannot find a closed relation among the differential
Fig. 2.10 Poincaré plots of the ABC flow at the plane x_3 = 0. (a) A = 0, B = 1, C = 0.3: the ABC flow does not depend on x_3. (b) A = 0.2, B = 1, C = 0.3: the ABC flow depends on all three variables. The streamlines (orbits) are chaotic
coefficients, implying that the system (2.48) cannot be rewritten as a (higher-order) single ODE governing u(t) = x_1(t). This example shows that nonlinearity may defy separation of variables and make the motion of each component infinitely complicated; we cannot find a closed relation (differential equation) in a finite-dimensional space of derivatives u^{(0)}, u^{(1)}, ⋯, u^{(n)}. Such "infinite dimensionality" reflects the unlimited diversity of motion of a nonlinear system. The disordered motion shown in Fig. 2.10b is a typical example of chaos,23 which we shall define more precisely in the next section.
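A Poincaré plot like Fig. 2.10b can be generated with a few lines of code. The following sketch (our own construction, assuming numpy and scipy) integrates (2.48) and records crossings of the plane x_3 = 0 (mod 1):

```python
# Sketch (assuming numpy/scipy): Poincare section of the ABC flow (2.48).
import numpy as np
from scipy.integrate import solve_ivp

A, B, C, lam = 0.2, 1.0, 0.3, 2 * np.pi   # chaotic case of Fig. 2.10(b)

def abc(t, x):
    x1, x2, x3 = x
    return [A * np.sin(lam * x3) + C * np.cos(lam * x2),
            B * np.sin(lam * x1) + A * np.cos(lam * x3),
            C * np.sin(lam * x2) + B * np.cos(lam * x1)]

# sin(pi*x3) vanishes whenever x3 is an integer, i.e., on the plane x3 = 0
# of the periodic unit cube
plane = lambda t, x: np.sin(np.pi * x[2])

sol = solve_ivp(abc, (0.0, 500.0), [0.1, 0.2, 0.05],
                events=plane, rtol=1e-9, atol=1e-9)
pts = sol.y_events[0][:, :2] % 1.0        # (x1, x2) points of the Poincare plot
print(len(pts), pts[:3])
```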
2.4 Invariance in Dynamics

2.4.1 Constants of Motion

In the light of mathematics, the order (rule) of motion is revealed by decomposing, resolving, or separating (for our purposes, these terms are synonymous) the multiple parameters involved in the system, and by describing the change of each parameter by a "function" that is a mathematical representation of an order. For a linear system, this decomposition is done by optimizing the basis of the linear space; the optimum
23 In a nonlinear system, the coupling among different variables is generally very complicated, so we measure "statistical properties" of the system (cf. Chap. 3). Then the same question arises: can we estimate a statistical property of the "system" by measuring a single observable? Some numerical experiments give optimistic results; see Grassberger & Procaccia [7].
basis is found by solving the eigenvalue problem (see Note 2.1). The fundamental idea of linear theory is, thus, the unification of a law (graph) and space to elucidate the order of the phenomena as a structure (basis) of space (cf. Sect. 1.1.1). Nonlinearity is the antithesis to such a harmony of law and space—by definition, it is a discord from the structure of space. How, then, should we analyze order in a nonlinear system (if it ever exists)? Or, how can we decompose parameters that weave nonlinear structures? Generalizing the method of linear theory (which invokes eigenvectors as the basis that is most amenable to the law), we plan to span the space by constants of motion. Let us explain the meaning and method of this attempt. To understand temporal changes of state, we look for quantities that do not change—though it may sound paradoxical, this trick is the central strategy of theoretical physics. A certain quantity is called a constant of motion if it does not change with time. If the number of constants of motion amounts to the degrees of freedom, it means that nothing really changes. However, it does not necessarily mean that the actual phenomenon is stationary. Remember that the superficial behavior of an object depends on our perspective of observation, i.e., the choice of parameters; recall the example of the transformation from Ptolemaic theory to Copernican theory (cf. Sect. 2.1.1). To put it another way, even if an object is observed to be moving, it may be possible to transform the parameters to a set of parameters that consists of just constants of motion. First we delineate the meaning of constants of motion in the simplest example, the constant-velocity movement of a free particle. When a particle moves in ν-dimensional coordinate space, the mechanics is described by a state vector x = ᵗ(q, q′), where q ∈ R^ν and q′ ∈ R^ν are, respectively, the position (or coordinates) and the velocity of the particle. The state space is R^ν × R^ν. Equation (2.3) of motion becomes, for a free particle,

$$\frac{d}{dt} \begin{pmatrix} \boldsymbol{q} \\ \boldsymbol{q}' \end{pmatrix} = \begin{pmatrix} \boldsymbol{q}' \\ 0 \end{pmatrix}. \tag{2.50}$$
The solution (constant-velocity motion) is written as

$$\boldsymbol{q}'(t) = \boldsymbol{q}'_0, \quad \boldsymbol{q}(t) = \boldsymbol{q}_0 + t\, \boldsymbol{q}'_0, \tag{2.51}$$
where q_0 and q′_0 are the constant vectors representing the initial values of the position and the velocity, respectively. We immediately find ν constants of motion q′_1, ⋯, q′_ν. Because q′ is a constant vector, we may transform the coordinates to align q′ parallel to the q_ν axis, i.e., we can set q′_j = 0 (j = 1, ⋯, ν − 1). So we obtain another ν − 1 constants of motion: q_j (j = 1, ⋯, ν − 1). Finally, putting

$$\hat{q}_\nu = q_\nu - t\, q'_{\nu,0} \;\; (= q_{\nu,0}), \tag{2.52}$$
we obtain the 2νth constant of motion q̂_ν. We note that the quantity q̂_ν includes t, but it is constant when evaluated along the moving particle. The transformation of the variable q_ν to q̂_ν is nothing but the so-called Galilean transformation into
the frame that moves together with the particle. Now, in the coordinate system {q_1, ⋯, q_{ν−1}, q̂_ν, q′_1, ⋯, q′_ν}, which consists only of constants of motion, the particle is completely motionless. We note that all these constants of motion are nothing but the initial conditions; in fact, q̂_ν is the initial position on the νth axis. The initial condition is invariant as the "identity" of each orbit. This fundamental fact will be revisited in Sect. 2.4.5. Non-motion represents complete invariance (or ultimate order). In the foregoing example, we could easily transform the motion to non-motion, i.e., we easily found all constants of motion. We may consider a more nontrivial class of motions that can be transformed to non-motion: a dynamical system is said to be integrable if there is an appropriate transformation that converts every orbit into a straight line. A straight-line motion can be further transformed to non-motion by shifting to a moving frame.24 Therefore, integrability is equivalent to complete invariance. For example, let us transform the motion of a harmonic oscillator into a straight-line motion. The equation of motion is

$$\frac{d^2 q}{dt^2} = -\omega^2 q, \tag{2.53}$$
where ω is a real constant determining the angular frequency of the oscillation. Putting q′ = dq/dt, we may cast (2.53) into the form of a first-order system:

$$\frac{d}{dt} \begin{pmatrix} q \\ q' \end{pmatrix} = \begin{pmatrix} q' \\ -\omega^2 q \end{pmatrix}. \tag{2.54}$$
The general solution is given by

$$q(t) = A \sin[\omega(t - t_0)], \quad q'(t) = A\omega \cos[\omega(t - t_0)], \tag{2.55}$$
where A and t_0 are real constants representing the amplitude and the initial phase of the oscillation, respectively. We transform the variables from ᵗ(q, q′) to

$$H(q, q') = \frac{1}{2}\left( q'^2 + \omega^2 q^2 \right), \qquad \tau(q, q') = \omega^{-1} \tan^{-1}(\omega q / q'); \tag{2.56}$$

H corresponds to the energy of the oscillator (the first term is the kinetic energy, and the second is the potential energy) (see Fig. 2.11).25
24 Generally, a particle may not move at a constant velocity. However, when the velocity along the straight line is given by q′_ν(q_ν), we can integrate dq_ν/q′_ν(q_ν) = dt to obtain q̂_ν. This is exactly the "integrability" that we defined in Sect. 2.2.3. See also (2.61).

25 In the theory of mechanics, we often use, instead of the pair H and τ, the almost equivalent pair I = H/ω and φ = ωτ. I is called the action variable, and φ the angle variable. Even if the coefficient ω (and thus H) changes slowly in comparison with the period ω⁻¹ of the oscillation, we can show that I = H/ω is approximately invariant. A quantity that is almost invariant against slow changes of coefficients (external conditions) is called an adiabatic invariant. In quantum mechanics, the angular frequency ω is considered to represent the energy of one "quantum." Since H is the macroscopic (classical) energy of an oscillating system, the action I = H/ω means the number of quanta.
Fig. 2.11 (a) An elliptic orbit in coordinate-velocity space, representing a harmonic oscillator. (b) After transforming the variables into the action and the angle variables, the harmonic oscillator is represented by a straight line
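The straightening shown in Fig. 2.11 can be verified numerically. In the following sketch (our own check, assuming numpy; arctan2 is used as a branch-safe stand-in for the tan⁻¹ of (2.56)), H stays constant along the exact orbit (2.55) and τ advances linearly, anticipating the relations derived next:

```python
# Sketch (assuming numpy): along the orbit (2.55), H of (2.56) is constant
# and tau(q, q') reproduces t - t0, i.e., the straight line of Fig. 2.11(b).
import numpy as np

omega, amp, t0 = 3.0, 1.5, 0.2             # amp plays the role of A in (2.55)
t = np.linspace(t0 + 0.01, t0 + 0.5, 7)    # stay within one branch of tan^{-1}

q = amp * np.sin(omega * (t - t0))
qp = amp * omega * np.cos(omega * (t - t0))

H = 0.5 * (qp**2 + omega**2 * q**2)
tau = np.arctan2(omega * q, qp) / omega    # branch-safe form of (2.56)

print(np.allclose(H, 0.5 * (amp * omega)**2))  # True: energy is conserved
print(np.allclose(tau, t - t0))                # True: straight-line motion
```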
Plugging (2.55) into H, we obtain

$$H(q, q') = \frac{(A\omega)^2}{2}, \tag{2.57}$$
which implies the conservation of energy. On the other hand, evaluating τ(q, q′) for the solution (2.55), we obtain

$$\tau(q, q') = t - t_0. \tag{2.58}$$
Hence, τ + t_0 is t. In H–τ space, the motion of the harmonic oscillator appears as a constant-velocity "particle" moving along the straight line H = constant. The key to the success of this simplification was the finding of the energy conservation law (2.57). Let us explain the geometric meaning of integrability. We represent by a vector x ∈ R^n the state of an autonomous dynamical system (when we consider the motion of a particle, x is an (n = 2ν)-dimensional vector composed of the position q and the velocity q′). We write the equation of motion as

$$\frac{d\boldsymbol{x}}{dt} = V(\boldsymbol{x}). \tag{2.59}$$
Given an initial condition, we solve (2.59) to obtain the orbit {x(t)}.
Fig. 2.12 An orbit (a curve in state space) is given as an intersection of hypersurfaces. If we change the parameter c_j that determines the position of each hypersurface, the orbit moves according to the rule of the function that defines the hypersurface; this function is the constant of motion
A smooth curve in the space R^n may be defined as an intersection of n − 1 hypersurfaces (see Fig. 2.12).26 A hypersurface can be identified as a set of points that satisfies some equation φ(x) = c, where φ: R^n → R. For the convenience of later discussions, we introduce a parameter c ∈ R and let the equation determine a level-set of the function φ(x) at a value c. An orbit (a curve immersed in R^n) is given as a set of points that satisfy n − 1 simultaneous equations:

$$\varphi_j(\boldsymbol{x}) = c_j \quad (j = 1, 2, \cdots, n-1), \tag{2.60}$$
where each φ_j(x) is a smooth function (R^n → R) defined in the neighborhood of some point P on the orbit. Because the orbit is included in every hypersurface, each being a level-set of the function φ_j(x), each φ_j(x) is a constant of motion. Hence, (2.60) implies conservation laws. If we know a priori n − 1 constants of motion φ_1(x), ⋯, φ_{n−1}(x) that are independent of each other (i.e., their level-sets do not contain parallel pairs), we obtain an orbit as the intersection of their level-sets. This means that the dynamical system is integrable. In fact, choosing such functions (constants of motion) as the new variables, and adding one independent variable τ, we obtain the hoped-for transformation; in the new coordinates, the n − 1 variables φ_1, ⋯, φ_{n−1} are constant, so that the orbit is a straight line along the coordinate τ. At the beginning of this subsection, we said "generalizing the method of linear theory (which invokes eigenvectors as the basis that is most amenable to the law), we plan to span the space by constants of motion." Now its meaning should be clear. To explore a nonlinear system, the basic strategy of linear theory, which persists in straight axes defined by eigenvectors, is no longer applicable; we should invoke appropriate "curved axes." A constant of motion fits as the right choice for such an axis. If we have a sufficient number (n − 1) of constants of motion, we can transform the variables to them (i.e., we can change the way of parameterizing the
26 If the system is not autonomous (i.e., V depends on t explicitly), we have to describe the motion in space–time (see Sect. 2.1.2).
system); then, complicatedly curved/twisted/tangled orbits are straightened, and the involved parameters are decomposed into constants. Remember that we already used the word "integrable" in Sect. 2.2.3. There, it meant that we can integrate the equation of motion and define a "function" that describes the motion. Let us show the equivalence of that meaning and the present definition of integrability. Suppose that we have n − 1 constants of motion represented by

$$\varphi_j(x_1, \cdots, x_n) = c_j \quad (j = 1, \cdots, n-1).$$
Solving this for x_1, ⋯, x_{n−1}, we may write x_j(t) = ξ_j(x_n(t); c_1, ⋯, c_{n−1}), where every ξ_j does not include t as an explicit parameter. Plugging these representations into V(x_1, ⋯, x_n) on the right-hand side of (2.59), the nth component of (2.59) reads

$$\frac{dx_n}{dt} = V_n(x_n). \tag{2.61}$$
This one-dimensional ODE is a separable equation, so we can "integrate" it as

$$\int \frac{dx_n}{V_n(x_n)} = t - t_0 \tag{2.62}$$
to obtain x_n(t), where t_0 is a constant of integration specifying the initial time. Plugging this x_n(t) into ξ_j (j = 1, ⋯, n − 1), all the other parameters are written as functions of time. The invariance of a constant of motion must hold universally on every orbit starting from an arbitrary initial condition, not just on an ad hoc orbit. A change in the initial condition is equivalent to varying the constants c_j in the conservation laws (2.60). With a change of c_j (the parameter specifying the level of a level-set), the hypersurface moves, resulting in a shift of the intersection, or orbit (see Fig. 2.12). The initial condition determines the future state—this is the fundamental philosophy of mechanics, which is called determinism. However, the process of evolution, in general, is extremely complex and disordered. The notion of integrability is a theoretical challenge to this naïve understanding. By finding a sufficient number of constants of motion, "evolution" may be transformed to constancy. The data characterizing the initial state are decomposed into a complete set of parameters which are invariant throughout the evolution, giving the identity of each orbit. The constants of motion are nothing but the identity of the initial condition. Therefore, integrability is a much stronger assertion than the mere existence of a unique solution (implying determinism) of the initial-value problem of the equation of motion.27 Even if a solution is found for each initial-value problem, the rule that connects an arbitrary initial condition and the corresponding future state has not
27 For the existence and uniqueness of solutions of the initial-value problem of an ODE, see Note 1.2 as well as the textbooks Coddington & Levinson [4] and Cronin [6].
yet been described explicitly (in Sect. 2.3.1, we denoted this rule by an abstract map T(t)). The constants of motion delineate the relation between every initial condition and the corresponding future state.
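A small numerical illustration of the integration step (2.62) (our own sketch, assuming numpy and scipy): for the logistic generator V(x) = x − x² of (2.34), the quadrature ∫dx/V(x) = t − t₀ reproduces the explicit solution (2.35):

```python
# Sketch (assuming numpy/scipy): invert the quadrature (2.62) for the
# one-dimensional generator V(x) = x - x^2 and recover solution (2.35).
import numpy as np
from scipy.integrate import quad

x0 = 0.2
V = lambda s: s - s * s

for x in (0.3, 0.5, 0.8):
    t, _ = quad(lambda s: 1.0 / V(s), x0, x)               # t - t0, cf. (2.62)
    x_of_t = 1.0 / (np.exp(-t) * (1.0 / x0 - 1.0) + 1.0)   # formula (2.35)
    print(x, x_of_t)   # the columns agree: the orbit passes x at time t
```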
2.4.2 Chaos—True Evolution

As we defined in the preceding subsection, an integrable system may be decomposed into a complete set of elements (parameters) which are kept constant throughout the evolution; its superficial variations may be transformed into invariance. So, knowing the way to describe the system by constants of motion, and evaluating their initial values, we may describe the "invariant" identity of the system—the identity is determined by the initial values of the constants of motion. Here we call the negation of integrability chaos. So, chaos is "true evolution (movement)" that cannot be transformed to invariance or non-motion. Let us see how it can happen. Constants of motion are the functions whose level-sets include the orbit of the motion (see (2.60)). Inverting the argument, given an orbit (curve) in R^n, we may conceive, geometrically, of n − 1 independent hypersurfaces that include the curve (see Fig. 2.12). Each of these hypersurfaces is represented as a level-set of some function. Thus, there should be n − 1 independent constants of motion (though we suspend the question of how to represent them in concrete form). If so, is every motion (orbit) integrable? No: non-integrability, or chaos, can really occur. The key to resolving this paradox is submerged in the previous awkward statement; when we wrote (2.60), we said that these functions (constants of motion) are "defined in the neighborhood of some point P on the orbit." The aforementioned geometric intuition is also limited to a certain finite-length segment of the orbit. But a constant of motion must be invariant throughout the whole length of the orbit. If the orbit is infinitely long, the representation of a constant of motion by a function becomes questionable. Let us imagine a group of orbits that move around, curving, twisting, and tangling endlessly in a certain domain (see Figs. 1.6 and 2.10). The function (constant of motion) representing each hypersurface must be constant along each orbit. Then each level-set must have an extremely complicated structure. Moreover, level-sets for different values pile up in the domain. Such an "infinite" structure cannot be represented by a single-valued function; this complexity is chaos. From these arguments, it is obvious that a system that has only finite-length orbits—a periodic system—is integrable. Of course, it is generally very difficult to calculate the motion concretely, even if it is periodic; cf. Sect. 2.2.2. However, if an original state recovers and the same history repeats, such a system of eternal recurrence is not "chaotic," even if the process is complicated. Non-recurrence (or an infinite period) is the necessary condition for chaos (true evolution). Yet, infinity of orbit length is not a sufficient condition for chaos. Orbits that extend simply in infinite space, such as the orbits of free particles (see Sect. 2.4.1), are integrable. Chaos occurs in a process where infinite diversity is created in a limited domain.
2.4.3 Collective Order

In the scope of determinism, the "individuality" of a phenomenon is attributed to its initial condition. It is possible that motion is simple (motionless or periodic, say) only when some particular initial conditions are given. Needless to say, we cannot claim that we know the order of the dynamics from such particular evidence; "order" should be a universal property. We have to study a group of motions with various initial conditions and pry the universal properties out of the bundle of orbits. Let us consider a group of "particles" with various initial conditions. Here we assume that the particles are independent, i.e., we neglect interactions among particles.28 We start by formulating the equation that describes the motion of such a group of particles. Instead of pursuing the individual particles one by one, we want to describe the collective motion of the group of particles. So, we focus our attention on the distributions of physical quantities (parameters characterizing various aspects of the system, such as the number, momentum, and energy). We assume that the state of a single particle is represented by a vector x ∈ R^n. Because there is no interaction among particles, the law of dynamics of each particle is closed in its own state space. Every particle is contained in a similar state space; thus, we can put all particles into the state space of a single particle. The motion of every particle is governed by the common equation of motion (2.59). Solving (2.59) for each initial condition, we obtain the individual orbit x(t). The state space is filled with many orbits starting from different initial conditions. Now we denote by u(x, t) the distribution of a certain physical quantity (for example, the number density of the particles). The independent variables are x ∈ R^n (state vector) and t ∈ R (time). Plugging an orbit x(t) (t ∈ R) into the independent variable x of u(x, t), we define u(x(t), t), which means the value of u measured at the position of the corresponding particle. Differentiating u(x(t), t) by t yields the rate of change of u measured along with the motion of the particle; it is to be compared with the partial derivative ∂u(x, t)/∂t, which evaluates the temporal change of u at a fixed spatial point. Using the equation of motion (2.59), we obtain

$$\frac{d}{dt} u(\boldsymbol{x}(t), t) = \frac{\partial}{\partial t} u + \sum_{j=1}^{n} \frac{dx_j}{dt} \frac{\partial}{\partial x_j} u = \frac{\partial}{\partial t} u + \sum_{j=1}^{n} V_j \frac{\partial}{\partial x_j} u = \frac{\partial}{\partial t} u + \boldsymbol{V} \cdot \nabla u. \tag{2.63}$$
Here ∇u is evaluated at the position x(t) on the orbit. However, the final expression does not refer to any particular orbit (because dx/dt has been replaced by V, the generator of all orbits). So the space–time derivative (∂/∂t + V · ∇) can be regarded as a differential operator applicable everywhere in space–time. This operator is called the Lagrange derivative.
28 Here, a "particle" is an abstract representation of one possible state that the system can take. When we consider a particle as an element of a certain macro-system, however, the interactions among particles become an essential subject of analysis. This problem will be discussed in Sect. 3.3.
As we have defined previously, the physical quantity u(x, t) is called a constant of motion if it is constant on every orbit starting from an arbitrary initial condition. To put it formally, a constant of motion is a solution to a partial differential equation (abbreviated as PDE):

$$\frac{\partial}{\partial t} u + \boldsymbol{V} \cdot \nabla u = 0. \tag{2.64}$$
In general, u includes t explicitly, so u may change if measured at a fixed position. However, if measured along with the motion of an arbitrary particle, u does not change—this is the meaning of a constant of motion. We have given a simple example in (2.52). In the discussion of Sect. 2.4.1, we considered mainly constants of motion that do not include t as an independent variable (recall (2.60)). They are the stationary (∂u/∂t = 0) solutions of (2.64). The PDE (2.64) is called an equation of collective motion; it determines the constants of motion and is distinguished from the ODE (2.59), the equation of particle motion, which describes the motion of an individual particle. We choose an arbitrary "test particle" from the group of particles and trace its motion by (2.59), the equation of particle motion. If a physical quantity u(x, t) is a constant of motion, i.e., if it is a solution of (2.64), u is constant on the orbit of the test particle. Using many test particles with different initial conditions, and tracing their orbits, we can silhouette the shape of the function u(x, t) in space–time. Hence, solving the ODE (2.59) with every initial condition is equivalent to solving the PDE (2.64). Let us see the explicit relation between the constant of motion u(x, t) and the orbit x(t) by considering a group of free particles. The particles move in one-dimensional space (the coordinate is denoted by x) at a common speed c. The equation of particle motion is

$$\frac{dx}{dt} = c \quad \text{(a constant)}. \tag{2.65}$$
Solving (2.65) with an initial condition x(0) = x̂, we obtain x(t) = x̂ + ct. Using this solution, we can backtrack along the orbit; if a particle is located at position x at time t, we estimate that its initial position was x̂ = x − ct. Because the equation of collective motion (2.64) means that u is constant on every orbit of the individual particles, we obtain u(x, t) = u(x − ct, 0). Denoting the initial distribution of u by f(x), the right-hand side is f(x − ct). Hence, the solution of the initial-value problem of the PDE (2.64) is given by

$$u(x, t) = f(x - ct). \tag{2.66}$$
This solution implies that the initial distribution f (x) moves at a constant velocity c preserving its shape. This is the simplest example of wave propagation (see Fig. 2.13). The PDE (2.64) is one of the simplest wave equations.
Fig. 2.13 A simple example of wave propagation. The wave function f(x − ct) moves with a constant velocity c
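A direct check of (2.66) (our own sketch, assuming numpy): the backtracked profile f(x − ct) satisfies the collective-motion equation (2.64) up to finite-difference error:

```python
# Sketch (assuming numpy): u(x, t) = f(x - ct) solves u_t + c u_x = 0.
import numpy as np

c = 0.7
f = lambda x: np.exp(-x**2)            # an arbitrary smooth initial profile
u = lambda x, t: f(x - c * t)          # backtracking solution (2.66)

x, t, h = 0.3, 1.2, 1e-5
u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
print(u_t + c * u_x)                   # ~ 0 up to O(h^2)
```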
Let us generalize the aforementioned method to a general dynamical system whose generator (the right-hand side of the equation of particle motion) is a multi-dimensional inhomogeneous flow V(x).29 Suppose that x(x̂; t) is the solution of the equation of particle motion (2.59) subject to an initial condition x̂. Here we have included the parameter x̂ in the expression of the solution to specify the initial position of the orbit. The solution x(x̂; t) may be regarded as a one-parameter (t ∈ R) map relating x̂ to x (recall the one-parameter group of operators {T(t); t ∈ R} discussed in Sect. 2.3.1). We denote its inverse map by x̂(x, t).30 The particle that is found at position x at time t came from x̂(x, t). Therefore, the solution of the equation of collective motion (2.64), subject to an initial condition u(x, 0) = f(x), is given by (see Problem 2.6)

$$u(\boldsymbol{x}, t) = f(\hat{\boldsymbol{x}}(\boldsymbol{x}, t)). \tag{2.67}$$
The function f (initial condition) may be arbitrarily chosen as long as it is differentiable. We call (2.67) the general solution of the first-order PDE (2.64). A specific initial condition f yields a particular solution.
2.4.4 Complete Solution—The Frame of Space Embodying Order

In the preceding subsection, we constructed the solution of the PDE (2.64) of collective motion by calculating the orbits using the ODE (2.59) of particle motion (cf. Note 2.2). We may take the opposite route—we solve an ODE (the equation of particle motion) by means of a PDE (the equation of collective motion). It may seem a roundabout way to use a PDE to solve an ODE (because a PDE is a
29 We can also consider a non-autonomous system, i.e., the case where V depends on t explicitly. Then we have to describe the motion in space–time, because the orbits can intersect when projected onto the space of x alone (see Sect. 2.1.2).

30 In the theory of "ideal" mechanics, the flow V is incompressible (∇ · V = 0) (see Note 2.3). Then the streamlines (orbits) of such a flow can never dissipate or emerge, warranting the unique existence of the inverse map. In mathematics, such a map x(x̂; t) is called a diffeomorphism.
"higher-class" problem). But this route leads to an understanding of the order of motion, since the study of order means the search for constants of motion, which are given by solving the equation of collective motion. The constants of motion give the optimum frame of state space (or parameterization of the object), in which events appear completely motionless (cf. Sect. 2.4.1). Let a function φ(x, t) be a constant of motion, i.e., a solution of (2.64). The set of points (x, t) that satisfy the equation

$$\varphi(\boldsymbol{x}, t) = c \quad \text{(a constant)} \tag{2.68}$$
defines a graph in space–time. A particle starting from a point included in this graph moves on this graph; the constant c is determined by the initial condition. Suppose that there are m constants of motion and that they are independent, i.e., their graphs do not include a parallel pair. We denote them by φ_j(x_1, ⋯, x_n, t) (j = 1, ⋯, m). We assume that at least one of these functions includes t as an independent variable. Let us assume that φ_m does, for instance, and solve

$$\varphi_m(x_1, \cdots, x_n, t) = c_m \tag{2.69}$$
for t. The solution is not necessarily a single-valued function. However, we can describe it as a multiple-valued function, at least for a finite interval of time. Using this expression of t, we can eliminate every t possibly included in the other functions. So we obtain m − 1 constants of motion that do not include t, which are the "stationary" solutions of (2.64). As we discussed in Sect. 2.4.1, if m (the number of independent constants of motion) is equal to n (the degree of freedom), the system is integrable; note that we are allowing t to be included. The intersection of the graphs of the constants of motion, i.e., the level-sets of φ_1, ⋯, φ_m in space–time, or the level-sets of the "stationary" φ_1, ⋯, φ_{m−1} in state space, describes the motion of a particle. We call such a set of n constants of motion a "complete set of constants of motion." Suppose that φ(x, t) is a constant of motion, i.e., a solution of (2.64). For an arbitrary smooth function f, we observe

$$\frac{\partial}{\partial t} f(\varphi(\boldsymbol{x}, t)) + (\boldsymbol{V} \cdot \nabla) f(\varphi(\boldsymbol{x}, t)) = f'(\varphi) \left[ \frac{\partial}{\partial t} \varphi + (\boldsymbol{V} \cdot \nabla) \varphi \right] = 0.$$

Therefore, f(φ) is also a constant of motion. Similarly, if there are m constants of motion φ_j(x, t), then, using an arbitrary smooth function f of m independent variables, we can construct f(φ_1, ⋯, φ_m), which solves (2.64). The complete solution of (2.64) is a function that is composed of n + 1 (the dimension of space–time) independent constants of motion:

$$u = f(\varphi_0, \cdots, \varphi_n). \tag{2.70}$$
Obviously, u = a (an arbitrary constant) satisfies (2.64), so we may put φ_0 = a. Thus, we need n additional nontrivial constants of motion to construct a complete solution.
If we find a solution of (2.64) that includes n + 1 "parameters," it is a complete solution. In fact, such a solution bears n + 1 constants of motion. Suppose that a function σ(t, x_1, ⋯, x_n; k_0, ⋯, k_n) with arbitrary real parameters k_0, ⋯, k_n solves (2.64). Then we find that every φ_j = ∂σ/∂k_j (j = 0, ⋯, n) also solves (2.64). Here, the parameters k_0, ⋯, k_n may be arbitrarily chosen in φ_j. By this process, we can generate n + 1 constants of motion from σ. The simplest example of a complete solution is the linear combination σ = k_0 φ_0 + ⋯ + k_n φ_n. A solution of (2.64) that includes a smaller number of constants of motion than the degree of freedom should be called an "incomplete solution." In later discussions (Sect. 3.2), a certain kind of incomplete solution (i.e., a solution based on a small number of constants of motion) plays a rather important role.
2.4.5 The Difficulty of Infinity

Let us recall the general solution (2.67) of the equation of collective motion. The "backtracking map" x̂(x, t), defining the solution f(x̂(x, t)), consists of n independent components x̂_j(x, t) (j = 1, ⋯, n), each of which is constant on every orbit (representing one coordinate of the initial point). The initial condition is invariant as the identity of each orbit. A trivial constant of motion x̂_0 = constant may be included in f. Remembering the aforementioned definition (2.70), the general solution f(x̂_0, x̂_1, ⋯, x̂_n) then seems to be a complete solution. If we can solve the equation of particle motion (2.59) uniquely, we can define the backtracking map x̂(x, t). So, do we always have n constants of motion (and thus the complete solution) whenever (2.59) has a unique solution? Does this mean that solvability is the same as integrability? This inference is not true; as we noted at the end of Sect. 2.4.1, integrability is a much stronger assertion than the mere existence of a unique solution. The former means that we know the order of the motions, while the latter claims only that the evolution equation can produce model motions. This paradox stems from overlooking the difficulty of "infinity." A function φ(x) which we can call a constant of motion must satisfy φ(x(t)) = constant for every orbit x(t) and all t; its invariance must be warranted everywhere on every orbit. However, how can we define the backtracking map x̂(x, t)? We solve an initial-value problem of (2.59) up to a certain time t. Next we return along the orbit back to time 0. By this process, x̂(x, t) is defined up to the time t. What we can calculate and define is a map within a finite range of space–time which we can actually reach by solving the initial-value problem; the function x̂(x, t) is definable for an arbitrary "finite" t. Yet, we do not generally know the function in the limit of infinite t. The aforementioned construction of the function, which is based on the "experience" of solving the initial-value problem, cannot "predict" the function beyond the experience. We can deduce x̂(x, t) at infinity only when we know the "order" of the motion. Recall the example of (2.66), where we had a concrete universal form of x(x̂; t) for all t; so we could define its inverse map x̂(x, t) for all t.
Thus, a constant of motion must be a transcendental quantity, a piece of a priori knowledge; the invariance should be proved, not by induction from the result (observation) of motion, but by deduction from the "structure" of the system. Here, what we call structure is, more explicitly, a symmetry. In fact, the method of deriving invariance from symmetry is the essence of mechanics. This is explained in the next section.
2.5 Symmetry and Conservation Law

2.5.1 Symmetry in Dynamical System

We are trying to elucidate the order of a phenomenon, not by observing individual motions, but by studying the structures of the governing law of dynamics. We start with a simple example. A free particle makes the simplest ordered (integrable) movement. Let us recall the argument of Sect. 2.4.1. Equation (2.50) of the motion of a free particle may be generalized to the form of (2.59). We assume that the flow V is a constant vector defined in the n-dimensional state space. Choosing appropriate coordinates, we may set V_1, ⋯, V_{n−1} = 0 and V_n = V (a constant). Following the procedure explained in the preceding section, let us find the constants of motion using the equation of collective motion (2.64). For the constant vector V, we may write

$$\frac{\partial}{\partial t} u + \boldsymbol{V} \cdot \nabla u = \frac{\partial}{\partial t} u + V \frac{\partial}{\partial x_n} u = 0. \tag{2.71}$$
The independent variables involved in (2.71) are only x_n and t. The other variables, x_1, ⋯, x_{n−1}, do not appear; hence, (2.71) is invariant with respect to these variables. If an equation of motion (or the motion governed by the equation) is invariant against the change of a certain parameter, the equation (or the motion) is said to be symmetrical with respect to the parameter. A symmetry yields a constant of motion; evidently, every u = x_j (j = 1, ⋯, n − 1) satisfies (2.71). This rather trivial observation suggests a simple but powerful strategy: a variable that does not enter into the law of mechanics (the equation of motion, for example) yields a constant of motion. Therefore, a constant of motion can be found by seeking a symmetry. In the foregoing example, the symmetries are apparent in the expression of V(x, t) (though we did a minimal amount of arranging by choosing the coordinates so that V is parallel to one axis). In general, a symmetry is not that obvious. However, there is a possibility that an appropriate transformation of the coordinates (i.e., of the parameters used to represent the system) lets a symmetry emerge. Our quest for the order of motion has uncovered a chain of notions:

"frame of space that elucidates order of motion" = "constants of motion" = "symmetries of law". (2.72)
Although these equalities paraphrase the same problem, they suggest a strategy to go from the clue of symmetry into the depth of phenomena. An overarching point is that the appearance of motion changes when the frame of space (the coordinate system for parameterization) is transformed, as we learned in the transformation from the Ptolemaic to the Copernican theory. Hence, we may expect that a seemingly complex motion can be simplified by an appropriate transformation. However, this theoretical donnée—the arbitrariness in the choice of a coordinate system—relativizes the notion of "symmetry"; a symmetry that appears in a certain coordinate system disappears in another coordinate system. To make progress, therefore, we have to elucidate a "structure" that exists in the further depths of the equations of motion and is no longer dependent upon a specific choice of the coordinate system.

The "deep structure" of dynamical systems was investigated by Leonhard Euler (1707–1783), Joseph-Louis Lagrange (1736–1813), William Rowan Hamilton (1805–1865), and others. Their theories are now brought together into the system of analytical mechanics. According to this theory, the blueprint of all possible motions is written into "energy." Let us study the method to elucidate the symmetry of energy.
2.5.2 The Deep Structure of Dynamical System

The fundamental law of mechanics must have a universal structure that does not depend on the arbitrary selection of a coordinate system. Equation (2.59) of motion, which describes the relation between the tangent vector of an orbit and the force in some coordinate system, does change its form when we go to a different frame. A fundamental principle of motion that is free from the selection of a coordinate system must exist at a depth beyond the equation of motion. We should write the theory so that an explicit form of the equation of motion is embodied, for each specific choice of a coordinate system, by the abstract fundamental principle.

The motion of an object is represented by a curve, the so-called orbit, in state space (cf. Sect. 2.1). The aim of the theory is to find a law that determines the "actual" curve. So far, we have used the method of Newtonian mechanics that describes laws in terms of differential equations. Here we invoke another method (or syntax) called the variational principle. In comparison with the expression of laws by differential equations, variational principles are more implicit and abstract. The arbitrariness of coordinate systems is subsumed in a "freedom of representation" caused by such abstraction. Let us see how variational principles describe the laws of physics.

The variational principle interprets an event as follows: the phenomenon that gains the "actuality" is the one, among various potential possibilities, that achieves the best "purpose."³¹ For example, light chooses the path (or the ray) between point A and point B that minimizes the time of travel:

\[ T = \int n\, dq, \tag{2.73} \]

³¹ A stretched interpretation of a variational principle has the danger of leading us to teleology, distorting or trivializing various problems such as the evolution and movements of ecosystems or social systems. What we describe here is a formal logic of mathematical science; thus, the word "purpose" is merely rhetorical.
where n(q) is the refractive index at position q and dq is the line element along the path. We may interpret this law, known as Fermat's principle, as if light selects the best route according to the "purpose" of minimizing the travel time.

Generally, the variational principle for describing an event (or a phenomenon that occurs in space–time) is formulated as the problem of finding the minimum of a certain integral that is called an action. This is the so-called least-action principle or Hamilton's principle. Let us denote a four-vector of four-dimensional space–time by X = ᵗ(t, x_1, x_2, x_3) = ᵗ(x_0, x_1, x_2, x_3) and a four-dimensional volume element by d⁴X. The action is an integral such that

\[ S = \int \sigma(X)\, d^4X. \tag{2.74} \]
Generally, an event changes its appearance depending on our observation, i.e., on the parameterization by a certain coordinate system. However, the value of the integral is independent of the choice of the coordinate system. The variational principle is written in terms of this invariant quantity. Here we are explaining only the "syntax" for describing a law, so, leaving the meaning and the explicit form of S or σ untouched, we proceed with abstract calculus.

When we consider the motion of a "particle," S is given by a line integral along the curve in space–time on which the particle moves. The position of the particle is represented by a vector q ∈ R³. We consider a curve {q(t); t_0 ≤ t ≤ t_1} as a candidate of the actual orbit (t_0 and t_1 being the initial and final times).³² The four-dimensional volume integral of (2.74) is reduced to a line integral (a 1-form) such that

\[ \sigma(X)\,d^4X = -H\,dt + \sum_{j=1}^{3} p_j\,dq_j = \Bigl( -H + \sum_{j=1}^{3} p_j\,\frac{dq_j}{dt} \Bigr) dt, \tag{2.75} \]

where p = ᵗ(p_1, p_2, p_3) and H are functions defined on the curve {q(t); t_0 ≤ t ≤ t_1}. The former three-vector is called momentum, and the latter is called energy. They determine how the infinitesimal temporal and spatial movements contribute to the action. The integrand of the final expression is called a Lagrangian, which will be denoted by L. Thus, the action is the temporal integral of the Lagrangian. The functions p and H are generally not independent; through their correlations, the motion becomes nontrivial. In the standard theory of mechanics, we relegate
³² Here, we consider the motion of a single particle. However, if there are n particles, we have to consider n curves {q_j(t); j = 1, ..., n, t_0 ≤ t ≤ t_1} that contribute to the integral.
the complexity (the expression of the relation between space and time) to H, bearing in mind that the temporal perspective is often more difficult than the spatial perspective. So, we consider that p, like q, is an independent vector variable, while we assume that the energy H is a function of q and t as well as p. The function H(q, p, t) is called a Hamiltonian. In this formulation, there are seven independent variables: q, p, and t. Combining q and p, we define a six-dimensional vector ξ = ᵗ(q_1, q_2, q_3, p_1, p_2, p_3). A particle motion is represented by an "orbit" {ξ(t); t_0 ≤ t ≤ t_1}, and the action is regarded as a line integral in seven-dimensional space:

\[ S = \int_{t_0}^{t_1} L\,dt = \int_{t_0}^{t_1} \Bigl( -H(\xi, t) + \sum_j a^j(\xi)\,\frac{d\xi_j}{dt} \Bigr) dt, \tag{2.76} \]

where

\[ a^j = \begin{cases} p_j & (j = 1, 2, 3),\\ 0 & (j = 4, 5, 6). \end{cases} \tag{2.77} \]
The variational principle demands that the action S is minimized by the true orbit ξ(t). An infinitesimal displacement of the orbit (the path of the line integral) causes a small variation of S, which we denote by δS. If the S evaluated for a certain curve is at an extremal value, the variation δS must be zero (just as a critical point of a function is characterized as a point where the derivatives vanish). Hence, the true orbit—the minimizer of S—is found by examining the variation δS.

Let us calculate the variation of the action (2.76). For a small displacement ξ_k(t) → ξ_k(t) + δξ_k(t), the variation is given by

\[ \delta S = \int_{t_0}^{t_1} \sum_k \Bigl[ -\frac{\partial H}{\partial\xi_k}\,\delta\xi_k + \sum_j \Bigl( \frac{\partial a^j}{\partial\xi_k}\,\frac{d\xi_j}{dt}\,\delta\xi_k + a^j\,\delta_{jk}\,\frac{d\,\delta\xi_k}{dt} \Bigr) \Bigr] dt. \]
Here, the displacement is allowed only for the path between the fixed ends ξ(t_0) and ξ(t_1), so we must impose the boundary conditions δξ_k(t_0) = δξ_k(t_1) = 0.³³ Integrating by parts, we obtain

\[ \delta S = \int_{t_0}^{t_1} \sum_k \Bigl[ -\frac{\partial H}{\partial\xi_k} + \sum_j \Bigl( \frac{\partial a^j}{\partial\xi_k} - \frac{\partial a^k}{\partial\xi_j} \Bigr) \frac{d\xi_j}{dt} \Bigr] \delta\xi_k\, dt = \int_{t_0}^{t_1} \sum_k \Bigl[ -\frac{\partial H}{\partial\xi_k} + \sum_j f_{kj}\,\frac{d\xi_j}{dt} \Bigr] \delta\xi_k\, dt, \]
where

³³ Under the assumption (2.77), what we really need is δq(t_0) = δq(t_1) = 0.
\[ f_{kj} = \frac{\partial a^j}{\partial \xi_k} - \frac{\partial a^k}{\partial \xi_j}. \tag{2.78} \]
In what follows, we denote by F the matrix that has the components f_{kj}. Because δξ_k is an arbitrary function on the path of integration, the condition δS = 0 demands

\[ \sum_j f_{kj}\,\frac{d\xi_j}{dt} = \frac{\partial H}{\partial\xi_k} \qquad (k = 1,\cdots,6). \tag{2.79} \]
This equation is called Hamilton's equation. Using the definition (2.77) of a^j(ξ), (2.78) reads as³⁴

\[ F = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}. \tag{2.80} \]
Since this F is a unitary matrix, its inverse is uniquely determined; denoting the components of F⁻¹ by f^{kj}, Hamilton's equation (2.79) may be rewritten as

\[ \frac{d}{dt}\xi_k = \sum_j f^{kj}\,\frac{\partial H}{\partial\xi_j} \quad \Longleftrightarrow \quad \frac{d}{dt}\begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} \partial_p H \\ -\partial_q H \end{pmatrix}, \tag{2.81} \]

where ∂_q and ∂_p are the gradients with respect to q and p, i.e., assuming that the dimension of the coordinate space is v,

\[ \partial_q u = \begin{pmatrix} \partial u/\partial q_1 \\ \vdots \\ \partial u/\partial q_v \end{pmatrix}, \qquad \partial_p u = \begin{pmatrix} \partial u/\partial p_1 \\ \vdots \\ \partial u/\partial p_v \end{pmatrix}. \]
Equation (2.81) is called Hamilton's canonical equation (cf. Note 2.2). We have prepared the general syntax of the variational principle for describing the law of dynamics. Now, we have to embody the law by formulating an appropriate Hamiltonian H (and thus the Lagrangian L) which makes Hamilton's canonical equation (2.81) equivalent to Newton's equation (2.2). Depending on the force that drives the dynamics, the concrete form of H changes. For a "potential force" that is given by F = −∂_q U, where U(q) is a potential energy, the Hamiltonian H for a particle of mass m is

\[ H = \frac{1}{2m}\,|p|^2 + U(q). \tag{2.82} \]

³⁴ We may consider a general a^j(ξ) and the corresponding antisymmetric operator F (see Note 2.3).
The first term of H is the kinetic energy, so the Hamiltonian is the total energy. Because (2.81) yields dq/dt = p/m, the momentum p is, in this case, the mass multiple of the velocity. Note that we do not know the Hamiltonian (or the Lagrangian) by intuition; it is guessed from first-hand knowledge of the equation of motion. However, once we find this "deep structure," we have a clue to investigate the symmetry of the law.

The variational principle is a general syntax to construct a concrete form of the law in an arbitrary coordinate system.³⁵ As we easily notice, the Hamiltonian is not uniquely determined and leaves some freedom of transformation. This freedom helps us find symmetries. Let us see how.
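Before proceeding, a minimal numerical sketch (added here; the quartic potential is an arbitrary assumption, not an example from the text) shows the canonical equation (2.81) at work for a Hamiltonian of the form (2.82), confirming that H is conserved along the orbit.

```python
# Minimal sketch: integrating Hamilton's canonical equation (2.81) for
# H = |p|^2/(2m) + U(q) and checking conservation of the Hamiltonian.
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0
U  = lambda q: 0.5 * q**2 + 0.25 * q**4    # an illustrative potential energy
dU = lambda q: q + q**3                    # its derivative; the force is -dU

def flow(t, xi):
    q, p = xi
    return [p / m, -dU(q)]                 # dq/dt = ∂H/∂p, dp/dt = -∂H/∂q

H = lambda q, p: p**2 / (2 * m) + U(q)
sol = solve_ivp(flow, (0.0, 50.0), [1.0, 0.0],
                rtol=1e-10, atol=1e-12, dense_output=True)
t = np.linspace(0.0, 50.0, 500)
q, p = sol.sol(t)
print(np.ptp(H(q, p)))   # ≈ 0 (up to tolerance): energy is a constant of motion
```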
2.5.3 The Translation of Motion and Non-motion

The canonical equation (2.81) displays clearly the relation between symmetries and constants of motion: if the Hamiltonian H is symmetric with respect to a variable q_ℓ (i.e., ∂H/∂q_ℓ = 0), we obtain dp_ℓ/dt = 0. Similarly, if H is symmetric with respect to p_ℓ, then q_ℓ is a constant of motion. We call the paired q_ℓ and p_ℓ conjugate variables. This relation between symmetries and conservation laws, which is projected on the Hamiltonian H, is just a formalization of the fact that we have already noticed by observing the naïve form of the equation of motion, (2.59). Here again, it is easy to find a constant of motion if a symmetry is obvious in H. However, H may contain a latent symmetry which can be revealed by rewriting H in another coordinate system. This expectation motivates us to rewrite H by transforming variables. But what is the guiding principle in rewriting it?

The ultimate goal of our search for the order of motion is to transform motion into non-motion (see Sect. 2.4.1). When the object is at rest, the Hamiltonian (the energy) is zero (note that a coordinate transformation changes the energy). Thus, our target is the coordinate system where the Hamiltonian becomes zero. To proceed with this strategy, we need a skill for calculating the change of H under a transformation of the coordinates. We focus on the non-uniqueness of H (or of S), which we may utilize; the idea is to relate transformations of H to transformations of the coordinates.

The variational principle δS = 0 is invariant against adding a constant C to S. Introducing a certain smooth function W(q, t), we put

\[ C = W(q^0, t_0) - W(q^1, t_1), \]
³⁵ Physicists believe that a fundamental law of mechanics must be expressible in the form of a variational principle. However, more general dynamical systems, such as ecosystems, economic systems, chemical reactions, and even physical systems that are governed by phenomenological forces like friction, are not necessarily described by a variational principle.
where q⁰ = q(t_0) and q¹ = q(t_1) are the fixed end points. We put S̃ = S + C. Remembering (2.75), we may write

\[ \tilde{S} = \int_\Gamma \Bigl( -H\,dt + \sum_j p_j\,dq_j - dW \Bigr), \tag{2.83} \]
where dW is the total differential of W. The path Γ of the integral is a curve {ξ(t); t_0 ≤ t ≤ t_1} with both ends fixed. A vector q̃ that has the same dimension as q may sneak into W. Then the total differential of W(q, q̃, t) is given by

\[ dW = \sum_j \frac{\partial W}{\partial q_j}\,dq_j + \sum_j \frac{\partial W}{\partial \tilde{q}_j}\,d\tilde{q}_j + \frac{\partial W}{\partial t}\,dt. \]
Now (2.83) reads as

\[ \tilde{S} = \int_\Gamma \Bigl[ -\Bigl( H + \frac{\partial W}{\partial t} \Bigr) dt + \sum_j \Bigl( p_j - \frac{\partial W}{\partial q_j} \Bigr) dq_j - \sum_j \frac{\partial W}{\partial \tilde{q}_j}\,d\tilde{q}_j \Bigr]. \tag{2.84} \]
Here we choose W such that

\[ p_j = \frac{\partial W}{\partial q_j}. \tag{2.85} \]

We put

\[ \tilde{p}_j = -\frac{\partial W}{\partial \tilde{q}_j}, \tag{2.86} \]

\[ \tilde{H} = H + \frac{\partial W}{\partial t}. \tag{2.87} \]
Then, (2.84) reads as

\[ \tilde{S} = \int_\Gamma \Bigl( -\tilde{H}\,dt + \sum_j \tilde{p}_j\,d\tilde{q}_j \Bigr). \tag{2.88} \]
Because (2.88) is similar to (2.76), the corresponding canonical equation of motion has a form similar to (2.81), while the variables are transformed as

\[ q \to \tilde{q}, \qquad p \to \tilde{p}, \qquad H \to \tilde{H}. \tag{2.89} \]

This transformation is called the canonical transformation of the first kind.
Table 2.1 Relations among variables in canonical transformations. The Hamiltonian transforms as H̃ = H + ∂W/∂t = H + ∂Φ/∂t

Kind | Generating function W              | Original variables  | New variables
1st  | W = Φ(q, q̃, t)                    | p_j = ∂Φ/∂q_j       | p̃_j = −∂Φ/∂q̃_j
2nd  | W = Φ(q, p̃, t) − p̃·q̃             | p_j = ∂Φ/∂q_j       | q̃_j = ∂Φ/∂p̃_j
3rd  | W = Φ(p, q̃, t) + p·q              | q_j = −∂Φ/∂p_j      | p̃_j = −∂Φ/∂q̃_j
4th  | W = Φ(p, p̃, t) − p̃·q̃ + p·q       | q_j = −∂Φ/∂p_j      | q̃_j = ∂Φ/∂p̃_j
The essence of the foregoing calculation is the use of the freedom to add a constant C to the action S. Considering that this C is a constant of integration of a certain total differential dW(q, q̃, t), we may introduce new parameters q̃ = ᵗ(q̃_1, ..., q̃_v). The function W is called a generating function. In the canonical transformation (2.85), (2.86), and (2.87), the original momentum variables p impose condition (2.85) on the generating function W, while the new coordinates q̃ are free parameters. Instead of these settings, we may consider similar canonical transformations where the new momentum variables p̃ are the free parameters, or the original coordinates q restrict the generating function. These relations of transformations are summarized in Table 2.1.

Our aim is to find a transformation that yields H̃ = 0; then (2.81) reads as the conservation laws dq̃/dt = 0 and dp̃/dt = 0. Now the central problem is to find the generating function that yields this ultimate transformation.

We invoke the second-kind transformation of Table 2.1. To obtain H̃ = 0, we need Φ such that

\[ \frac{\partial\Phi}{\partial t} + H(q_1,\cdots,q_v,\, p_1,\cdots,p_v,\, t) = 0. \tag{2.90} \]

By the rule of the transformation, p_j = ∂Φ/∂q_j (j = 1, ..., v). Plugging these relations into the p_j of H, (2.90) reads as a PDE determining the function Φ, the essential part of the generating function. The PDE (2.90) is called the Hamilton–Jacobi equation (cf. Note 2.2). An important point is that we have to find the complete solution of (2.90), i.e., we need a function Φ that includes v parameters p̃_1, ..., p̃_v. This mission is much more difficult than solving the initial-value problem of the PDE (2.90). The complete solution of (2.90) completes the decipherment of the order of motion. In the coordinate system defined by

\[ \tilde{q}_j = \frac{\partial\Phi}{\partial\tilde{p}_j}, \tag{2.91} \]

we observe H̃ = 0, implying complete symmetry or non-motion (all parameters q̃ and p̃ are constants of motion).

We also demand that ∂H̃/∂t = 0. In an autonomous system, we have ∂H/∂t = 0. Then, by (2.87), we require ∂²W/∂t² = 0. So, we may assume Φ(q, p̃, t) = Ψ(q, p̃) − Et, where E is a constant. The Hamilton–Jacobi equation (2.90) is reduced to
\[ H(q_1,\cdots,q_v,\, \partial\Psi/\partial q_1,\cdots,\partial\Psi/\partial q_v) = E. \tag{2.92} \]
Let us summarize the foregoing discussion. Our quest for the order of events is guided by a fundamental fact that vision changes when the view is transformed. Here “view” means the choice of the coordinate system or the process of the parameterization. The chain of the notions of order (2.72) is now embodied by the mathematical theory of transformations of variables. We have formulated the method of elucidating symmetries included in the Hamiltonian, that is, the Hamilton–Jacobi equation (2.90) to determine the generating function. The analysis of this equation is a process of decomposition, as we shall see in the next subsection.
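A small symbolic check (an added illustration; the one-dimensional harmonic oscillator is chosen only as the simplest example) verifies that Ψ(q; E) of the form (2.97) solves the reduced Hamilton–Jacobi equation (2.92) for H = (p² + ω²q²)/2.

```python
# Illustrative sympy check of the reduced Hamilton-Jacobi equation (2.92)
# for the 1-D harmonic oscillator H = (p^2 + omega^2 q^2)/2.
import sympy as sp

q, w, E = sp.symbols('q omega E', positive=True)
# Psi(q; E), with E playing the role of the new momentum p̃ (cf. Table 2.1, 2nd kind)
Psi = q/2*sp.sqrt(2*E - w**2*q**2) + (E/w)*sp.asin(w*q/sp.sqrt(2*E))
p = sp.diff(Psi, q)                   # p = ∂Ψ/∂q
H = (p**2 + w**2*q**2) / 2
print(sp.simplify(H - E))             # prints 0 on the domain omega^2 q^2 < 2E
```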
2.5.4 Chaos—The Impossibility of Decomposition

The order of motion is obvious even in a high-dimensional system if we can decompose the system into a set of independent elements. In linear theory, we have a strong method of decomposition, namely the eigenvalue problem of the generator. Decomposing the state vector in terms of the eigenvectors, the motion of the system is separated into a set of exponential (or, generally, extended exponential) functions (cf. Sect. 2.3.2).

In nonlinear theory, the essence is still "decomposition." In principle, the method of finding a symmetry (Sect. 2.5.2) means separating a variable from the others. In fact, the Hamilton–Jacobi equation, which formulates this procedure, is actually solvable if the involvement of the variables is not too complicated and they are "decomposable." Here we study an example where each canonical pair (q_j, p_j) forms a block in the Hamiltonian H, as described hereinafter (we assume that the system is autonomous). Let the first pair (q_1, p_1) define a function h_1(q_1, p_1), which is the first block. The second pair appears in the second block, which is written as h_2(q_2, p_2; h_1). Inductively, we define the jth block h_j(q_j, p_j; h_1, ..., h_{j−1}). We assume that the Hamiltonian is given by

\[ H = h_v(q_v, p_v;\, h_1,\cdots,h_{v-1}), \tag{2.93} \]
which is said to be of separable type. For the Hamiltonian (2.93), the complete solution of the Hamilton–Jacobi equation (2.90) is given in the form

\[ \Phi = \sum_{j=1}^{v} \Psi_j(q_j;\, \varepsilon_1,\cdots,\varepsilon_j) - E(\varepsilon_1,\cdots,\varepsilon_v)\,t, \tag{2.94} \]

where each Ψ_j is determined by the decomposed Hamilton–Jacobi equation

\[ h_j(q_j,\, d\Psi_j/dq_j;\, \varepsilon_1,\cdots,\varepsilon_{j-1}) = \varepsilon_j \qquad (j = 1,\cdots,v). \tag{2.95} \]
Each separate equation of (2.95) is an ODE which we can solve by quadrature. The complete solution (2.94) consists of terms including the q_j (as well as t) separately. The constants of motion are ε_j and ∂Φ/∂ε_j (j = 1, ..., v), as well as the total energy E(ε_1, ..., ε_v).³⁶

We give a simple example. Let us consider

\[ H = \sum_{j=1}^{v} h_j(q_j, p_j) = \sum_{j=1}^{v} \frac{1}{2}\bigl( \omega_j^2 q_j^2 + p_j^2 \bigr), \tag{2.96} \]

where each ω_j is a real constant. The canonical equation of motion (2.81) is a linear ODE with a matrix generator:

\[ \frac{d}{dt} \begin{pmatrix} q_1 \\ p_1 \\ \vdots \\ q_v \\ p_v \end{pmatrix} = \begin{pmatrix} 0 & 1 & & & \\ -\omega_1^2 & 0 & & & \\ & & \ddots & & \\ & & & 0 & 1 \\ & & & -\omega_v^2 & 0 \end{pmatrix} \begin{pmatrix} q_1 \\ p_1 \\ \vdots \\ q_v \\ p_v \end{pmatrix}. \]

The generator is made up of the on-diagonal 2×2 blocks, so we can easily diagonalize it. The eigenvalues are ±iω_j (j = 1, ..., v), and the corresponding eigenvectors are given by the modes x_j^± = ω_j q_j ± i p_j. In terms of these modes, the motion is decomposed into independent harmonic oscillators.

Let us deal with the same problem by invoking the Hamilton–Jacobi equation. Plugging the h_j of (2.96) into the decomposed Hamilton–Jacobi equation (2.95), we obtain

\[ \frac{d\Psi_j}{dq_j} = \sqrt{2\varepsilon_j - \omega_j^2 q_j^2} \qquad (j = 1,\cdots,v), \]

which yields

\[ \Psi_j(q_j;\, \varepsilon_j) = \frac{q_j}{2}\sqrt{2\varepsilon_j - \omega_j^2 q_j^2} + \frac{\varepsilon_j}{\omega_j}\,\sin^{-1}\!\bigl( \omega_j q_j/\sqrt{2\varepsilon_j} \bigr) \qquad (j = 1,\cdots,v). \tag{2.97} \]

Thus, we have 2v constants of motion

\[ \varepsilon_j, \qquad s_j = \frac{\partial\Phi}{\partial\varepsilon_j} = \omega_j^{-1}\,\sin^{-1}\!\bigl( \omega_j q_j/\sqrt{2\varepsilon_j} \bigr) - t \qquad (j = 1,\cdots,v), \]

which represent, respectively, the energy and the initial phase of each oscillator (cf. (2.55)).
³⁶ If only M (< v) pairs form blocks, we can construct an "incomplete solution" by combining the solutions of (2.95) for j = 1, ..., M, by which we obtain 2M constants of motion.
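A direct numerical check of this example (an added illustration with arbitrarily chosen frequencies and initial data) confirms that the ε_j of (2.97) are invariant along an orbit of (2.96).

```python
# Illustrative check: the energies ε_j of (2.97) are constant along an orbit
# of the separable Hamiltonian (2.96), here with v = 2.  The phase constants
# s_j can be checked likewise on a time window where sin^{-1} stays on its
# principal branch.
import numpy as np
from scipy.integrate import solve_ivp

w = np.array([1.0, np.sqrt(2.0)])                # omega_1, omega_2 (arbitrary)

def flow(t, xi):                                 # canonical equation (2.81) for (2.96)
    q, p = xi[:2], xi[2:]
    return np.concatenate([p, -w**2 * q])

sol = solve_ivp(flow, (0.0, 30.0), [1.0, 0.5, 0.0, 0.2],
                rtol=1e-11, atol=1e-12, dense_output=True)
t = np.linspace(0.0, 30.0, 300)
q, p = sol.sol(t)[:2], sol.sol(t)[2:]
eps = 0.5 * (w[:, None]**2 * q**2 + p**2)        # epsilon_j along the orbit
print(np.ptp(eps, axis=1))                       # ≈ [0, 0]: both invariant
```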
Because the foregoing example is a linear equation, the method of the Hamilton–Jacobi equation merely looks more complicated than the eigenvalue-problem method. Needless to say, however, the latter method does not apply to nonlinear problems. For example, suppose that the coefficient ω_j depends on h_k (k < j). Such a Hamiltonian describes a nonlinear oscillator whose angular frequency is modulated under the influence of the energy of another oscillator (while influence in the opposite direction is not allowed in a separable-type system). In this case, the decomposed Hamilton–Jacobi equation applies and helps to integrate the equation of motion (see Problem 2.7). However, in a more general nonlinear problem (for example, coupled nonlinear oscillators whose frequencies are modulated through mutual influences), it is not possible to find the complete solution of the Hamilton–Jacobi equation if the couplings of the multiple variables are not separable. Unresolvable tangling of variables makes it impossible to transform the motion of the system into non-motion—such true evolution is what we call chaos.

In this chapter, we started the discussion of the dichotomy between order and chaos from its phenomenological aspects, such as the possibility/impossibility of functional representation (Sect. 2.2.3) or recurrence/non-recurrence (Sect. 2.3.5). Then we defined order/chaos for dynamical systems by integrability/non-integrability (Sect. 2.4.2). And we found that, at the core of integrability, it is the separability (or decomposability) of variables that enables the elucidation of symmetry or order. In linear theory, decomposition is thoroughly done by solving eigenvalue problems. In nonlinear theory, constants of motion are supposed to play the role of eigenvectors. But we often fail to find a sufficient number of constants of motion, i.e., we encounter non-integrability or chaos.

The emergence of chaos may be explained by the following geometrical image. Learning from the success in the discovery of the order of the solar system by the transformation from the Ptolemaic theory to the Copernican theory, we try to simplify the description of motion by geometric transformations. The various orbits traced by particles starting from different initial conditions look like an intertwined braid. The ultimate goal is to transform all orbits into a group of straight lines; integrating the equation of motion along the straight orbits, motion is expressed by a "function." The simplest frame we may set for state space is the Cartesian coordinate system. However, nonlinear dynamics does not fit in it and appears complex. To simplify the description of the dynamics, we have to fit the state space to the phenomena: we have to restructure the frame of the state space. Generally, linear transformations fall short; it is necessary to bend the coordinate axes inhomogeneously. If we can set a coordinate axis that parallels all orbits, we obtain the desired frame of the state space. Chaos is the impossibility of this attempt. A bundle of curves—an infinitely intertwined braid that cannot be resolved into a group of straight lines: this is the geometric image of chaos.
Notes

Note 2.1 (Linear Dynamics in Function Space) We briefly review the theory of linear evolution equations in function spaces (cf. Note 1.1). Let us consider a first-order differential equation in a Hilbert space X:

\[ \frac{d}{dt}u = Au. \tag{2.98} \]
The generator A is a linear operator defined in X. If A involves spatial derivatives ∂/∂x_j, we usually denote the left-hand-side temporal derivative by the partial derivative ∂/∂t. However, when we interpret (2.98) as an equation of motion (an ODE) in X, it is more appropriate to denote the temporal derivative by d/dt (the limit in calculating the differential is defined by the norm of X).³⁷ The ODE (2.98) is called an evolution equation. When A = Δ (the Laplacian), for example, (2.98) reads as the diffusion equation, or, when A = −iH (H being Hamilton's operator), it is the Schrödinger equation.

Our aim is to generate the exponential operator e^{tA} from A and solve the initial-value problem of (2.98) as u(t) = e^{tA}u_0 for an arbitrary initial condition u(0) = u_0. There are several abstract theories that prove the existence of the exponential operator generated by a certain class of linear operators. Here we explain the theory due to von Neumann [18]. As we have studied in Sect. 2.3.2, the structure of the exponential function of a matrix (a linear operator in a finite-dimensional vector space) can be elucidated by studying the eigenvalue problem of the generator. We follow this strategy.

First, let us assume that the eigenvalue problem of the generator A yields a complete orthonormal basis {ϕ_j; j = 1, 2, ...} that spans the Hilbert space X (cf. Note 1.1). Then an arbitrary u ∈ X can be decomposed as

\[ u = \sum_{j=1}^{\infty} (u, \varphi_j)\,\varphi_j, \tag{2.99} \]
where (u, v) is the inner product of u and v. By Aϕ_j = a_j ϕ_j (a_j being the eigenvalue), we may write formally

\[ Au = \sum_{j=1}^{\infty} a_j\,(u, \varphi_j)\,\varphi_j. \tag{2.100} \]
³⁷ Note that d/dt in (2.98) is not the Lagrange derivative (2.63), though some authors denote it by the same symbol.
Putting u_j = (u, ϕ_j), the evolution equation (2.98) is rewritten as an infinite-dimensional system of "independent" ODEs:

\[ \frac{d}{dt} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \end{pmatrix} = \begin{pmatrix} a_1 & & 0 \\ & a_2 & \\ 0 & & \ddots \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \end{pmatrix}. \tag{2.101} \]

Solving one by one, we obtain a formal solution of (2.101):

\[ e^{tA}u = \sum_{j=1}^{\infty} e^{t a_j}\,(u, \varphi_j)\,\varphi_j. \tag{2.102} \]
Since the right-hand side is an infinite series, its convergence is a subtle issue; the distribution of the eigenvalues a_j (which are not necessarily bounded) must be carefully considered. Moreover, there is the problem of constructing the complete orthonormal basis by the eigenvalue problem of a general generator. In finite-dimensional linear theory, it is well known that the eigenvalue problem of a normal matrix yields a complete orthonormal system of eigenvectors (for a general matrix, a complete system is given by extended eigenvectors). However, the theory of linear operators in an infinite-dimensional Hilbert space is not that simple. A general theory is established only for self-adjoint operators. A linear operator A in a Hilbert space X is said to be self-adjoint if its adjoint operator A* (which is defined by (Au, v) = (u, A*v)) is identical to A (i.e., A and A* are defined on the same domain in X, and there, A = A*).

To span X, the totality of the eigenfunctions may not be enough; then we have to generalize the notion of eigenvalues (synonymously, point spectra) and eigenfunctions by adding the continuous spectra and the corresponding singular eigenfunctions. Generalizing the eigenfunction expansion (2.99), we consider the resolution of functions in an integral form:

\[ u = \int (u, \phi_\mu)\,\phi_\mu\, dm(\mu), \tag{2.103} \]
where m(μ) is a monotonically increasing function of μ ∈ R, dm(μ) is the Riemann–Stieltjes measure, and {φ_μ; μ ∈ R} is a system of generalized eigenfunctions that consists of conventional eigenfunctions and singular eigenfunctions (as defined hereinafter). At μ = a_j (an eigenvalue (point spectrum) of A), we give a "step" of unit width in m(μ) and set φ_{a_j} = ϕ_j (the corresponding eigenfunction). We may formally write dm(μ) = δ(μ − a_j)dμ, where δ(x) is Dirac's delta function. If the spectra consist of only eigenvalues, (2.103) reduces to (2.99). However, the eigenfunctions of a general self-adjoint operator are not necessarily complete enough to span the Hilbert space; then, we have to invoke continuous spectra, which contribute to the integral (2.103) by continuous increases in m(μ). The corresponding φ_μ is a singular eigenfunction, which is not a member of the Hilbert space. Therefore, the "inner product" (u, φ_μ) is not defined in the standard sense—it is
only a formal representation of the "projection" property of the decomposition. We call (2.103) the resolution of the identity. The spectral resolution of the operator A is given by

\[ Au = \int \mu\,(u, \phi_\mu)\,\phi_\mu\, dm(\mu), \tag{2.104} \]

and the exponential function of the operator A may be written as

\[ e^{tA}u = \int e^{t\mu}\,(u, \phi_\mu)\,\phi_\mu\, dm(\mu). \tag{2.105} \]
These formal expressions are justified by von Neumann's theorem. The purpose of this short Note is not to describe the rigorous mathematical theory (for the mathematical theory of linear operators in function spaces, the reader is referred to the textbooks listed at the end of this Note). Instead, we give an example and explain the basic idea of continuous spectra.

We denote by X the operator that maps a function u(x) ∈ L²(0, 1) to xu(x), i.e., it multiplies u(x) by the coordinate x. In quantum mechanics, X is called the coordinate operator. The eigenvalue problem Xφ_μ = μφ_μ has "formal" solutions: for every μ ∈ [0, 1], φ_μ = δ(x − μ) satisfies the equation. However, δ(x − μ) (Dirac's delta function) is not a member of the Hilbert space L²(0, 1). Therefore, it is not a solution of the eigenvalue problem in the rigorous sense—it is a singular eigenfunction. Let us admit a formal expression

\[ (u(x), \delta(x-\mu)) = \int u(x)\,\delta(x-\mu)\,dx = u(\mu). \]

Then we may write

\[ u(x) = \int (u, \delta(x-\mu))\,\delta(x-\mu)\,d\mu = \int u(\mu)\,\delta(x-\mu)\,d\mu, \]

which reads as (2.103) with m(μ) = μ. By (2.105), the exponential function of −iX is

\[ e^{-itX}u = \int e^{-it\mu}\,u(\mu)\,\delta(x-\mu)\,d\mu = e^{-itx}\,u(x). \]

In the study of the order of an autonomous linear system, we solve the eigenvalue problem of the generator to obtain the "optimum" representation (parameterization) of state vectors (functions). In a finite-dimensional space, we can always find a sufficient number (= the degree of freedom) of independent eigenvectors (including extended eigenvectors), so this mission is, in principle, achievable (see Sects. 2.3.2 and 2.3.3). Also in an infinite-dimensional space, if we can decompose the generator by the eigenfunctions, we may simplify the description of motion as (2.101),
i.e., the summation over an infinite number of independent exponential functions. However, continuous spectra may cause considerably complex behavior. We shall revisit this issue in Note 3.3, where we shall discuss chaos in infinite-dimensional linear systems. For systematic study of the theory of spectral resolution of self-adjoint operators, the reader is referred to Reed & Simon [17]. For the mathematical theory of the integration of evolution equations (including some class of nonlinear equations) in function spaces, see Chap. XIV of Yosida [20].
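As a concrete, finite illustration of the expansion (2.102) (added here; the initial profile and the truncation are arbitrary choices), consider the diffusion equation ∂u/∂t = ∂²u/∂x² on (0, 1) with u = 0 at the boundary: the generator has eigenfunctions φ_j = √2 sin(jπx) and eigenvalues a_j = −(jπ)², so the series can be summed directly.

```python
# Illustrative truncation of (2.102) for A = d^2/dx^2 on (0, 1) with Dirichlet
# boundary conditions: e^{tA}u0 = sum_j e^{-(j*pi)^2 t} (u0, phi_j) phi_j.
import numpy as np

x = np.linspace(0.0, 1.0, 401)
u0 = np.where(np.abs(x - 0.5) < 0.25, 1.0, 0.0)    # an arbitrary initial condition

def heat(t, n_terms=50):
    u = np.zeros_like(x)
    for j in range(1, n_terms + 1):
        phi = np.sqrt(2.0) * np.sin(j * np.pi * x)  # eigenfunction phi_j
        c = np.trapz(u0 * phi, x)                   # coefficient (u0, phi_j)
        u += np.exp(-(j * np.pi)**2 * t) * c * phi
    return u

for t in (0.001, 0.01, 0.1):
    print(t, heat(t).max())   # the maximum decays: the profile flattens
```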
Note 2.2 (Partial Differential Equations and Characteristics) This Note is a short introduction to the theory of hyperbolic PDEs. The method of the characteristic ODE constitutes the most basic part of the theory. In physical intuition, a wave (governed by a hyperbolic PDE) may be thought of as a macroscopic view of the motion of many "particles" (governed by a characteristic ODE). Here a "particle" is a theoretical artifact (not a physical molecule) whose velocity coincides with the local phase velocity of the wave. Each "particle" describes a curve in space–time, which we call a characteristic. In the discussion of Sect. 2.4, the equation of motion (2.59) is the characteristic ODE of the hyperbolic PDE (2.64), which we call the equation of collective motion.

Equation (2.64) is a first-order homogeneous linear hyperbolic PDE, which is formulated as a linear combination of the first-order derivatives of the dependent (unknown) variable. Here we consider a more general class of first-order hyperbolic PDEs (including nonlinear equations). Let us see how the method of characteristics applies, with an appropriate generalization, to reduce such PDEs to a system of ODEs.

Let x_1, ..., x_n and t be the independent variables, and u be the dependent variable. We put p_j = ∂u/∂x_j (j = 1, ..., n). We consider a PDE such that

\[ \frac{\partial u}{\partial t} + G(t, x_1,\cdots,x_n,\, u,\, p_1,\cdots,p_n) = 0, \tag{2.106} \]
where G is a sufficiently smooth function. If G is just a linear combination of the p_j and is independent of u, (2.106) reduces to (2.64). Here G may be a general nonlinear function. We want to integrate (2.106) along an appropriate "orbit" {x(t)}. The temporal variation of u measured along a given orbit is evaluated by differentiating u(x(t), t):

\[ \frac{d}{dt}u = \frac{\partial u}{\partial t} + \sum_{j=1}^{n} \frac{dx_j}{dt}\,\frac{\partial u}{\partial x_j}. \tag{2.107} \]
To find the appropriate orbit {x(t)}, we recall the relation between the characteristic ODE (2.59) and the linear PDE (2.64) (where G = V · p); we generalize (2.59) as
\[ \frac{d}{dt}x_j = \frac{\partial G}{\partial p_j} \qquad (j = 1,\cdots,n). \tag{2.108} \]
If G is nonlinear, the right-hand side may include u and the p_j, which must be evaluated simultaneously when we calculate x(t) by (2.108). This point brings about an essential complexity from which the linear problem (2.64) is free.

First we derive the equation that determines u on the "orbit." Plugging (2.106) and (2.108) into (2.107), we obtain

\[ \frac{d}{dt}u = -G + \sum_{j=1}^{n} \frac{\partial G}{\partial p_j}\,p_j. \tag{2.109} \]
(2.110)
Using (2.108), (2.107), and ∂ p /∂ x j = ∂ p j /∂ x (which is evident by the definition of p j ), we may rewrite (2.110) as d ∂G ∂G pj = − pj − dt ∂x j ∂u
( j = 1, · · · , n).
(2.111)
We have derived a system of ODEs (2.108), (2.109), and (2.111) by which we can determine the characteristics. Evidently, when (2.106) is linear and homogeneous, (2.108) is decoupled from the other equations and reduces to (2.59). If G does not include u explicitly,³⁸ (2.106) reads as (denoting such G by H)

\[ \frac{\partial u}{\partial t} + H(x_1,\cdots,x_n,\, p_1,\cdots,p_n,\, t) = 0, \tag{2.112} \]

which is called the Hamilton–Jacobi equation. In this case, (2.108) and (2.111) are separated from (2.109), and read as

\[ \frac{d}{dt}x_j = \frac{\partial H}{\partial p_j} \qquad (j = 1,\cdots,n), \tag{2.113} \]

\[ \frac{d}{dt}p_j = -\frac{\partial H}{\partial x_j} \qquad (j = 1,\cdots,n), \tag{2.114} \]

which are called Hamilton's canonical equations.

³⁸ If G includes u, we may rewrite (2.106) in the form (2.112) by extending the space of independent variables, i.e., we regard u = x_{n+1} as an independent variable and introduce Φ(t, x_1, ..., x_{n+1}) as the dependent variable. From the solution, u is obtained as the implicit function of Φ(t, x_1, ..., x_{n+1}) = c (a constant).
In Sect. 2.5, we discussed the Hamilton–Jacobi equation (2.112) and its generalized characteristic ODE, Hamilton’s canonical equations (2.113) and (2.114), from a different angle (see also Note 2.3). The reader is referred to Courant & Hilbert [5] for the classical theory of general PDEs.
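A minimal numerical sketch of the characteristic method (added here; the flow V(x) = x and the Gaussian data are arbitrary choices) for the linear case ∂u/∂t + V(x)∂u/∂x = 0: tracing the characteristic ODE dx/dt = V(x) backward from (x, t) to time 0 gives the backtracking point x̂, and u(x, t) = f(x̂). For a nonlinear G one would instead integrate (2.108), (2.109), and (2.111) simultaneously.

```python
# Illustrative characteristic method for u_t + x u_x = 0: the characteristics
# solve dx/dt = x, so x̂ = x e^{-t} and u(x, t) = f(x e^{-t}).
import numpy as np
from scipy.integrate import solve_ivp

f = lambda x: np.exp(-x**2)          # initial data u(x, 0) = f(x)

def u(x, t):
    # integrate the characteristic ODE backward from (x, t) to time 0
    sol = solve_ivp(lambda s, y: y, (t, 0.0), [x], rtol=1e-10, atol=1e-12)
    return f(sol.y[0, -1])

x, t = 1.3, 0.8
print(u(x, t), f(x * np.exp(-t)))    # numerical and exact values agree
```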
Note 2.3 (Canonical Structure of Mechanics) The canonical equation (2.81), which is derived by a variational principle, constitutes a special class of equations of motion (the most general form of an equation of motion is written as a first-order ODE (2.59) in some state space). First of all, we notice that the state vector must be a combination of q and p, which are called canonically conjugate variables. And the flow (or generator) V, the right-hand side of (2.81), is an antisymmetric combination of gradients (with respect to q and p) of a Hamiltonian. Then, the flow V of a canonical equation (which is called a Hamiltonian flow) is incompressible; denoting ξ = ᵗ(q_1, ..., q_v, p_1, ..., p_v), the divergence of V(ξ) is calculated as

\[ \nabla\cdot V = \sum_{j=1}^{v} \Bigl( \frac{\partial^2 H}{\partial q_j\,\partial p_j} - \frac{\partial^2 H}{\partial p_j\,\partial q_j} \Bigr) = 0. \tag{2.115} \]
This fact—the incompressibility of the Hamiltonian flow in the space of the canonically conjugate variables—is known as Liouville's theorem.

Equation (2.64) of collective motion with a Hamiltonian flow may be cast in the form

\[ \frac{\partial u}{\partial t} + \{H, u\} = 0, \tag{2.116} \]

where

\[ \{a, b\} = \sum_j \Bigl( \frac{\partial a}{\partial p_j}\,\frac{\partial b}{\partial q_j} - \frac{\partial a}{\partial q_j}\,\frac{\partial b}{\partial p_j} \Bigr) \tag{2.117} \]

is the Poisson bracket. The PDE (2.116) is called Liouville's equation. The solution of this equation is a constant of motion. Obviously, {H, H} = 0. Hence, if H does not depend on t explicitly (i.e., if the system is autonomous), H is a constant of motion (implying the conservation of energy). Hereafter, we consider an autonomous system.

If a function ϕ satisfies {H, ϕ} = 0, we say that ϕ commutes with H. It is evident that f(H) (f being an arbitrary smooth function) is a trivial example of a function that commutes with H. There may exist a nontrivial (i.e., non-constant and independent
of H) function ϕ that commutes with H. If such a ϕ does not depend on t explicitly, ϕ is a nontrivial constant of motion.

Definition (2.117) of the Poisson bracket is based on the antisymmetric matrix (2.80), which is derived from the special form of the action given by (2.76) and (2.77). For wider applications, we may generalize (2.77). Let us consider a general a^j(ξ) in the definition of the action:

\[ S = \int_\Gamma \Bigl( -H(\xi, t)\,dt + \sum_j a^j(\xi)\,d\xi_j \Bigr). \]
The derivation of Hamilton's equation (2.79) is the same, though the matrix F is no longer the constant matrix (2.80); it is, in general, a nonlinear operator. By (2.78), we find that F is an antisymmetric operator:

\[ (Fu, v) = \sum_k \Bigl( \sum_j f_{kj}\,u_j,\ v_k \Bigr) = -\sum_j \Bigl( u_j,\ \sum_k f_{jk}\,v_k \Bigr) = -(u, Fv). \tag{2.118} \]

Here the inner product is defined by (cf. Note 1.1)

\[ (u, v) = \sum_j (u_j, v_j) = \sum_j \int u_j(\xi)\,v_j(\xi)\,d\xi. \]
Assuming that F is a bijection, we define F⁻¹ = A and denote its components by f^{kj};³⁹ we then obtain a generalized canonical equation (Hamilton's equation)

\[ \frac{d}{dt}\xi_k = \sum_j f^{kj}\,\frac{\partial H}{\partial\xi_j} \quad \Longleftrightarrow \quad \frac{d}{dt}\xi = A\,\partial_\xi H. \tag{2.119} \]
We define a generalized Poisson bracket by

\[ \{u(\xi), v(\xi)\} = (A\,\partial_\xi u)\cdot(\partial_\xi v) = \sum_{k,j} f^{kj}\,\frac{\partial u(\xi)}{\partial\xi_j}\,\frac{\partial v(\xi)}{\partial\xi_k}, \tag{2.120} \]
which satisfies the relation {ξ_j, ξ_k} = f^{kj}. Liouville's equation (2.116) is now formulated with the generalized Poisson bracket (2.120). Reflecting the antisymmetry (2.118) of F, the operator A = F⁻¹ is also antisymmetric:

\[ (Au, v) = -(u, Av). \tag{2.121} \]

³⁹ In a more general framework, we do not assume that F is a bijection. If F⁻¹ = A has a "kernel" (i.e., if there exists some η ≠ 0 such that Aη = 0), we say that the generalized Hamilton equation (2.119) is non-canonical. The kernel yields a "topological defect" (or Casimir invariant) of the dynamical system (see, for example, Arnold [2], Marsden & Morrison [15], and Morrison [16]).
In fact, writing U = F⁻¹u and V = F⁻¹v, we observe (F⁻¹u, v) = (U, FV) = −(FU, V) = −(u, F⁻¹v). By (2.121), we find (Au, u) = 0 (∀u), which means that the operator A rotates a vector to some perpendicular direction. Thus, the motion governed by the canonical equation (2.119) occurs in the direction perpendicular to the gradient of H; therefore the energy (Hamiltonian) H is conserved. In fact, for any generalized Poisson bracket, we have {H, H} = 0. We note that the direction perpendicular to the gradient of H is not unique in a high-dimensional space; the antisymmetric operator A determines the direction to go.

In contrast, a dissipative system tends to diminish the energy. The motion that dissipates the energy through the fastest route is described by the evolution equation

\[ \frac{d}{dt}\xi = -\partial_\xi H. \tag{2.122} \]

The change of the energy H is estimated by (here H does not depend on t explicitly)

\[ \frac{d}{dt}H(\xi) = \Bigl( \frac{d\xi}{dt},\ \partial_\xi H \Bigr) = -\bigl|\partial_\xi H\bigr|^2. \tag{2.123} \]

Therefore H decreases until its minimum is attained. Figure 2.14 illustrates the images of the aforementioned two different types of dynamics. A general dissipative dynamics is some intermediate of these extremes; the orbit draws a spiral downward curve.
Fig. 2.14 (a) Hamiltonian dynamics: Motion occurs in the direction perpendicular to the gradient G = ∂_ξ H of the Hamiltonian (energy) H (i.e., the velocity V is perpendicular to G). The orbit is, thus, included in a level-set of H. (b) Pure dissipative dynamics: Motion occurs in the opposite direction of the gradient of H, so that the energy decreases. (c) General dissipative dynamics is some intermediate of (a) and (b); the orbit draws a spiral downward curve
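The two extremes of Fig. 2.14 can be reproduced numerically. The following sketch (added here; H = |ξ|²/2 and the mixing parameter are arbitrary choices) integrates the Hamiltonian flow (2.119), the pure gradient flow (2.122), and a mixture of the two.

```python
# Illustrative comparison of Hamiltonian, dissipative, and mixed dynamics for
# H = (xi1^2 + xi2^2)/2: H is conserved, decays, or spirals downward.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-1.0, 0.0]])     # an antisymmetric operator
gradH = lambda xi: xi                       # grad of H = |xi|^2 / 2
H = lambda xi: 0.5 * np.sum(xi**2, axis=0)

def flow(t, xi, c):
    # c = 0: Hamiltonian (2.119); c = 1: pure dissipation (2.122); else mixed
    return (1.0 - c) * (A @ xi) - c * gradH(xi)

for c in (0.0, 1.0, 0.1):
    sol = solve_ivp(flow, (0.0, 20.0), [1.0, 0.0], args=(c,), rtol=1e-10)
    print(c, H(sol.y[:, 0]), H(sol.y[:, -1]))   # initial vs. final energy
```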
The reader is referred to the classical textbook of Landau & Lifshitz [12] for a compact introduction to the theory of classical mechanics. For a thorough discussion of complex motions (in both Hamiltonian and dissipative systems), see Lichtenberg & Lieberman [13]. Arnold [1] describes the differential-geometric implications of the theory of mechanics. For the extension of geometric mechanics theory to infinite-dimensional systems (field theories), see references [2, 9, 16].
Problems 2.1. Solve the nonlinear pendulum equation (2.6) following the process explained in Sect. 2.2.2. 2.2. Let us consider (1.39) as an example of nonlinear evolution equation. By its solution (1.40), we define the semigroup of one-parameter maps: T (t)x0 =
1 e−bt (x0−1
+ ε) − ε
(t 0),
where b > 0, ε < 0, and x0 > 0. Show that this T (t) satisfies the associative law (2.18). 2.3. Let J be a Jordan block with a degenerate eigenvalue a. Calculate the exponential function et J . 2.4. Let us consider the logistic map (2.42) with A = 4. Putting u n = sin2 (θn /2), we transform the variable from u n to θn . 1. Show that the logistic map (2.42) is equivalent to θn+1 = 2θn
(modulo 2π ).
2. Show that there are an infinite number of periodic motions generated by this logistic map. 2.5. Consider a system of autonomous (constant-coefficient) linear ODEs of the form (2.44) with a generator A=
ωα 0ω
.
1. Solve the eigenvalue problem (2.45) to determine the time constants, and show that they are independent of α. 2. Assuming α = 0, rewrite the system (2.44) of first-order ODEs as a second-order ODE and compare it with (2.46). 3. Assuming α = 0, show that the general solution of (2.46) does not satisfy (2.44).
2.6. Show that the function f(x̂(x, t)) of (2.67) satisfies the equation of collective motion (2.64) by directly plugging it into the equation.

2.7. Let us consider a system of coupled nonlinear oscillators governed by

\[ \begin{cases} \dfrac{d^2}{dt^2}q_1 = -\omega_1^2\, q_1, \\[2mm] \dfrac{d^2}{dt^2}q_2 = -\omega_2^2\, q_2, \end{cases} \tag{2.124} \]

where ω_1 = ω (a real constant) and

\[ \omega_2^2 = \frac{1}{2}\bigl( \omega^2 q_1^2 + (dq_1/dt)^2 \bigr). \]

1. Find the canonical conjugates and the Hamiltonian of the system (2.124).
2. Formulate the Hamilton–Jacobi equation and find the complete solution.
3. We extend the coupling of these oscillators to be bidirectional by assuming

\[ \omega_1^2 = \frac{1}{2}\bigl( \omega^2 q_2^2 + (dq_2/dt)^2 \bigr). \]

This system is no longer integrable. Observe the chaotic behavior of the coupled oscillators by numerical experiments (a sample sketch follows).
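As a possible starting point for the numerical experiment in part 3 (a sketch added here, not the book's code; the parameters and initial data are arbitrary), one may integrate the bidirectionally coupled system and watch how a tiny initial difference is amplified.

```python
# Sketch for Problem 2.7(3): bidirectionally coupled nonlinear oscillators.
import numpy as np
from scipy.integrate import solve_ivp

omega = 1.0

def flow(t, y):
    q1, p1, q2, p2 = y
    w2sq = 0.5 * (omega**2 * q1**2 + p1**2)   # omega_2^2 driven by oscillator 1
    w1sq = 0.5 * (omega**2 * q2**2 + p2**2)   # omega_1^2 driven by oscillator 2
    return [p1, -w1sq * q1, p2, -w2sq * q2]

y0 = np.array([1.0, 0.0, 0.5, 0.0])
a = solve_ivp(flow, (0.0, 200.0), y0,        rtol=1e-10, atol=1e-12, dense_output=True)
b = solve_ivp(flow, (0.0, 200.0), y0 + 1e-8, rtol=1e-10, atol=1e-12, dense_output=True)
t = np.linspace(0.0, 200.0, 2000)
d = np.linalg.norm(a.sol(t) - b.sol(t), axis=0)
print(d[0], d.max())   # a 1e-8 initial difference typically grows by many orders
```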
Solutions

2.1 Multiplying both sides of (2.6) by dφ/dt, we obtain

\[ \frac{d}{dt}\Bigl[ \frac{1}{2}\Bigl(\frac{d\phi}{dt}\Bigr)^2 - \omega^2\cos\phi \Bigr] = 0. \tag{2.125} \]

Integrating (2.125), we obtain

\[ \frac{1}{2}\Bigl(\frac{d\phi}{dt}\Bigr)^2 - \omega^2\cos\phi = h \quad (\text{a constant}), \tag{2.126} \]

which implies the conservation of energy; see (2.11). We denote by φ_0 (< π/2) the amplitude of φ(t). When φ(t) = φ_0, the angular velocity dφ/dt is zero. Thus, we can write h = −ω²cos φ_0. Let us rewrite (2.126) as

\[ \frac{d\phi}{dt} = \pm 2\omega\sqrt{ \sin^2\frac{\phi_0}{2} - \sin^2\frac{\phi}{2} }. \tag{2.127} \]
When φ is increasing (decreasing), we choose the positive (negative) sign on the right-hand side of (2.127). Putting k = sin(φ_0/2), we write sin(φ/2) = k sin ϕ. Using the relation dφ/dϕ = 2k cos ϕ/√(1 − k²sin²ϕ), we may rewrite (2.127) as

\[ \frac{d\varphi}{dt} = \pm\omega\sqrt{1 - k^2\sin^2\varphi}. \tag{2.128} \]

Next we put x = sin ϕ. By dx/dt = cos ϕ · dϕ/dt = √(1 − x²) dϕ/dt, we may transform (2.128) as

\[ \frac{dx}{dt} = \pm\omega\sqrt{(1-x^2)(1-k^2x^2)}. \tag{2.129} \]

Integrating (2.129), we obtain

\[ \pm\omega(t - t_0) = \int_0^x \frac{dx}{\sqrt{(1-x^2)(1-k^2x^2)}}, \]

where t_0 is the constant of integration. The right-hand side defines the elliptic integral of the first kind, which is denoted by F(k, x). The inverse function of F(k, x) is Jacobi's elliptic function. Putting τ = ±ω(t − t_0), we write x = sn(τ, k). The exact solution of the nonlinear pendulum equation (2.6) is now given by

\[ \phi(t) = 2\sin^{-1}\bigl[ k\,\mathrm{sn}(\pm\omega t + \delta,\, k) \bigr]. \]
T (t)T (s)x0 = =
1 e−bt [e−bs (x0−1
+ ε) − ε + ε] − ε
1 e−b(t+s) (x0−1 + ε) − ε
= T (t + s)x0 .
2.3 Remembering (2.33) and writing it in a matrix form, we obtain

\[ e^{tJ} = \begin{pmatrix} e^{ta} & t\,e^{ta} & \cdots & \dfrac{t^{k-1}}{(k-1)!}\,e^{ta} \\ 0 & e^{ta} & \cdots & \dfrac{t^{k-2}}{(k-2)!}\,e^{ta} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & e^{ta} \end{pmatrix}. \]
2.4 Plugging u_n = sin²(θ_n/2) into the logistic map u_{n+1} = 4u_n(1 − u_n), we obtain, using trigonometric identities, sin²(θ_{n+1}/2) = 4 sin²(θ_n/2)[1 − sin²(θ_n/2)] = sin²(θ_n), which implies that θ_{n+1} ≡ 2θ_n (modulo 2π). Hence the logistic map generates the sequence θ_n = 2ⁿθ_0 (modulo 2π). If we start from an initial value θ_0 = 2π/(2^m − 1) with a certain integer m, we have θ_m = 2^m θ_0 ≡ θ_0 (modulo 2π), i.e., a periodic motion of period m.

2.5
1. The time constants are determined by

\[ \det(\lambda I - A) = \lambda^2 - 2\omega\lambda + \omega^2 = 0, \tag{2.130} \]

which gives a multiple root λ = ω, independent of α.
2. System (2.44) reads as the simultaneous equations (denoting d/dt by ˙) ẋ = ωx + αy and ẏ = ωy. Differentiating the first equation and using the second one, we obtain ẍ = ωẋ + αẏ = ωẋ + αωy. By the first equation, we can write y = α⁻¹(ẋ − ωx); here we assume α ≠ 0. Using this, we obtain ẍ − 2ωẋ + ω²x = 0. This second-order ODE is equivalent to (2.46).
3. The "characteristic equation" of the second-order ODE (2.46), i.e., (2.130), is degenerate and has the multiple root ω. So we have to invoke the "variation of parameters." The general solution is x = ae^{ωt} + bte^{ωt}. The "secular part" bte^{ωt} does not solve (2.44) when α = 0.

2.6 Plugging (2.67) into (2.64), we observe

\[ \frac{\partial}{\partial t}u(x,t) = \frac{\partial}{\partial t}f(\hat{x}(x,t)) = \sum_j \frac{\partial f}{\partial\hat{x}_j}\,\frac{\partial\hat{x}_j}{\partial t}. \]

Since x̂(x(x̂; t), t) ≡ x̂ (the initial point is the invariant identity of each orbit), differentiating this identity with respect to t yields

\[ \frac{\partial\hat{x}_j}{\partial t} = -\sum_k \frac{\partial\hat{x}_j}{\partial x_k}\,\frac{\partial x_k(\hat{x};\,t)}{\partial t} = -\sum_k V_k\,\frac{\partial\hat{x}_j}{\partial x_k}. \]
On the other hand, we have

\[ V\cdot\nabla u(x,t) = V\cdot\nabla f(\hat{x}(x,t)) = \sum_k V_k \sum_j \frac{\partial f}{\partial\hat{x}_j}\,\frac{\partial\hat{x}_j}{\partial x_k}. \]

Thus, (2.67) satisfies (2.64).

2.7
1. Defining p_1 = dq_1/dt, p_2 = dq_2/dt, and h_1(q_1, p_1) = (ω²q_1² + p_1²)/2, the Hamiltonian is given in the separable form H = h_1(q_1, p_1) + [h_1(q_1, p_1)q_2² + p_2²]/2.
2. The Hamilton–Jacobi equation is separated into the independent ODEs

\[ \frac{1}{2}\bigl( \omega^2 q_1^2 + (d\Psi_1/dq_1)^2 \bigr) = \varepsilon_1, \qquad \frac{1}{2}\bigl( \varepsilon_1 q_2^2 + (d\Psi_2/dq_2)^2 \bigr) = \varepsilon_2 - \varepsilon_1, \]

which can be solved as in (2.97).
References

1. Arnold, V.I.: Geometrical Methods in the Theory of Ordinary Differential Equations (2nd ed.), Springer-Verlag, New York (1988)
2. Arnold, V.I.: Topological Methods in Hydrodynamics, Springer-Verlag, New York (1999)
3. Drazin, P.G., Johnson, R.S.: Solitons—An Introduction, Cambridge Univ. Press, Cambridge (1989)
4. Coddington, E.A., Levinson, N.: Theory of Ordinary Differential Equations, McGraw-Hill, New York (1955)
5. Courant, R., Hilbert, D.: Methods of Mathematical Physics, Vol. I & II, Interscience, New York (1962)
6. Cronin, J.: Differential Equations—Introduction and Qualitative Theory (2nd ed.), Marcel Dekker, New York (1994)
7. Grassberger, P., Procaccia, I.: Characterization of strange attractors, Phys. Rev. Lett. 50, 346–349 (1983)
8. Husserl, E.G.A.: Die Krisis der europäischen Wissenschaften und die transzendentale Phänomenologie (1936); The Crisis of European Sciences and Transcendental Phenomenology (Trans. by Carr, D.), Northwestern Univ. Press, Evanston (1970)
9. Jackiw, R.: Lectures on Fluid Dynamics—A Particle Theorist's View of Supersymmetric, Non-Abelian, Noncommutative Fluid Mechanics and d-Branes, Springer-Verlag, New York (2002)
10. Jackson, E.A.: Perspectives of Nonlinear Dynamics 1, Cambridge Univ. Press, Cambridge (1989)
11. Kowalski, K., Steeb, W.H.: Nonlinear Dynamical Systems and Carleman Linearization, World Scientific, Singapore (1991)
12. Landau, L.D., Lifshitz, E.M.: Mechanics (3rd ed.), Course of Theoretical Physics Vol. 1 (English Trans. by Sykes, J.B., Bell, J.S.), Pergamon Press, London (1976)
13. Lichtenberg, A.J., Lieberman, M.A.: Regular and Chaotic Dynamics (2nd ed.), Springer-Verlag, New York (1991)
14. May, R.M.: Simple mathematical models with very complicated dynamics, Nature 261, 459–467 (1976)
15. Marsden, J.E., Morrison, P.J.: Noncanonical Hamiltonian field theory and reduced MHD, Contemp. Math. 28, 133–150 (1984)
16. Morrison, P.J.: Hamiltonian description of the ideal fluid, Rev. Mod. Phys. 70, 467–521 (1998)
17. Reed, M., Simon, B.: Methods of Modern Mathematical Physics II: Fourier Analysis, Self-Adjointness, Academic Press, San Diego (1975)
18. von Neumann, J.: Mathematische Grundlagen der Quantenmechanik, Springer-Verlag, Berlin (1932); Mathematical Foundations of Quantum Mechanics (Trans. by Beyer, R.T.), Princeton Univ. Press, Princeton (1996)
19. Whittaker, E.T., Watson, G.N.: A Course of Modern Analysis (4th ed.), Cambridge Univ. Press, Cambridge (1927)
20. Yosida, K.: Functional Analysis (6th ed.), Springer-Verlag, Berlin-Heidelberg (1980)
Chapter 3
The Challenge of Macro-Systems
Abstract Chaos is a word of rich and looming content, implying unpredictable events producing unlimited variety that is not tameable in the territory of order. The grand narrative of mechanics casts chaos as the antagonist of integrability. In the preceding chapter, we demonstrated the order in integrability, showing that motion can be transformed into non-motion, that is, a superficially dynamic event can be reduced to a complete set of invariant parameters (constants of motion). Symmetry is the key to finding the invariance. The negation of order, then, means irreducibility or true movement—these properties define chaos in the realm of mechanics. However, more general aspects of chaos should be recognized from a phenomenological viewpoint. In a "macro-system," which includes a tremendous number of interacting parameters, non-integrability (lack of constants of motion) is simply obvious, so it cannot be a determining concept characterizing something as chaotic. In this chapter, we focus on the "difficulty of prediction" as an aspect characterizing chaos and discuss complex systems from a macroscopic viewpoint.
3.1 The Difficulty of Prediction

3.1.1 Chaos in Phenomenological Recognition

Chaos is the mother of order or structures: this is a rather banal theme in mythical worlds. For example, ancient Egyptian myth tells us that the sun god Re was born from the primeval water of chaos symbolized by the goddess Nun. Theogony (Hesiod) describes the birth of the gods of the pantheon like phase transitions of time, space, and matter; the origin was "Chaos," which arose spontaneously. At the opposite pole to this fertility of chaos, physics relates the end of evolution, invoking the notion of entropy, and calls it "heat death." Birth versus death: what is implied by these dual natures of chaos?

In the preceding chapter, we gave a mathematical definition of "chaos." A skillful transformation of variables may uncover the potential order in a seemingly complex system; this expectation is the basic motivation of mechanical theory. The structural analysis of the system, which is to reveal a priori the order of motion, is the core
of the theory. In this framework, "chaos" means the impossibility of revealing the order, that is, the irreducibility of the coupled parameters. However, "chaos," in its original meaning, is a feature of phenomena to be recognized through experience or observation. The recognition of chaos belongs to the area of experiences, while the order of mechanics is a concept in the transcendental realm. Because of this gap, the notion of chaos defined in mechanical theory is not sufficient. In fact, if we do not know (or cannot describe) an equation of motion (or a Hamiltonian), we cannot define "order"; thus, we cannot define "chaos" as the antithesis of order—and for a really "chaotic" phenomenon, it is often difficult even to formulate an equation of motion.

In the realm of experiences, how can "chaos" be recognized as a phenomenological feature? And how can phenomenological chaos be compared to mathematical chaos? First we need to devise a method of measuring (or quantifying) the phenomenological features of chaos. Mathematical chaos, or non-integrability, is a qualitative notion by which we classify the structure of dynamical systems. In contrast, we are aiming at a quantitative theory by which we can evaluate the chaotic properties of phenomena. The method of quantification, however, will not be unique. In fact, it is a quantification of an "image" which alters when we change the viewpoint or perspective. Thus, it is difficult to give a universal definition of "chaos." Here, we will not take an axiomatic approach to defining chaos, but will try to understand chaos through empiricism, by characterizing the objectivity of complex phenomena.
3.1.2 Stability

Difficulty of prediction is the primary characteristic of "chaos." If we know something about the order (for example, the periodicity) of motion, we can predict future states. To put it the other way around, difficulty of prediction is caused by a lack of understanding of the order (regularity). Therefore, the difficulty of prediction is related to the irregularity or disorder of motion, which may be called chaos in a phenomenological sense. However, it is not easy to give a clear definition to the word "difficulty" which we want to use to characterize "chaos."¹

Notwithstanding this conceptual problem, our practical interest is not in giving an axiomatic definition to "chaos," but in finding an effective method to "quantify" the difficulty of prediction in actual phenomena. Here "prediction" means the estimate of a future state with respect to a given initial condition. In practice, however, it is not possible to determine an exact initial condition. Thus, a better-posed problem of prediction is to estimate a set of future states with respect to a set of initial conditions, a neighborhood of a certain initial

¹ While "difficulty" is an obscure notion, we may give a clear definition to "impossibility"; the impossibility of finding a unique solution to an equation of motion is due to the irregularity of the generator (recall the example of Fig. 1.10). In such a case, we should abandon "determinism" and turn to "probabilistic descriptions" of dynamics (see Sect. 3.2). However, chaos does occur in the realm of determinism, too.
state; the measure of the set of states quantifies the uncertainty or error margin of the prediction. If the uncertainty greatly amplifies with time, the prediction becomes "difficult" as a "problem of accuracy." Hence we may quantify the difficulty of prediction by measuring the growth of uncertainty (as we shall see later, however, difficulty in accuracy does not directly mean "chaos").

To quantify how differences in initial conditions amplify with time, we define the Lyapunov exponent. This notion is related to the stability of motion. Suppose that a motion x(t) occurs from a certain initial state x̂. From an arbitrary adjacent initial state x̂′, a different motion x′(t) emerges. If the distance of every x′(t) from x(t) does not increase with time, the motion x(t) is said to be stable. To put it more precisely, a motion x(t) is stable if the following condition holds: For an arbitrary ε (> 0), there exists a finite δ (> 0) such that |x′(t) − x(t)| < ε (∀t ≥ 0) holds if |x′(0) − x(0)| < δ.² On the contrary, a motion is said to be unstable if a small difference in initial states expands with time. The Lyapunov exponent is an index representing the time constant of the increase of differences.

Let us denote an orbit by {x(x̂; t); t ≥ 0}. Here, x̂ is the label of the initial condition. Shifting the initial state by δ, we consider another orbit starting from x̂ + δ, which we denote by {x(x̂ + δ; t); t ≥ 0}. We define

\[ \lambda(\hat{x}) = \sup_{\delta}\Bigl[ \limsup_{t\to\infty,\ |\delta|\to 0} \frac{1}{t}\,\log\frac{|x(\hat{x}+\delta;\,t) - x(\hat{x};\,t)|}{|\delta|} \Bigr], \tag{3.1} \]
and call λ(x̂) the maximum Lyapunov exponent (sometimes, simply the Lyapunov exponent).

Let us see the meaning of the maximum Lyapunov exponent using sample orbits produced by known equations of motion.³ First we consider the simplest one-dimensional autonomous linear system, which is governed by

\[ \frac{d}{dt}x = ax, \qquad x(0) = \hat{x}, \tag{3.2} \]
² To be precise, this condition is called the Lyapunov stability. As a cruder stability, one may consider the condition that all adjacent orbits stay in the ε-neighborhood of the orbit (without measuring the distance of the states at each time)—a motion satisfying this weaker condition is said to be orbitally stable.

³ We note that the Lyapunov exponent is defined by a set of orbits, not by an equation of motion; we may evaluate it if we have sufficiently detailed data of orbits, even without knowing the governing equation of motion. Usually, an equation of motion is thought to be most conclusive, while measurements of orbits, like Lyapunov exponents, are superficial. Indeed, in the simple examples shown here, we can deduce the Lyapunov exponent from the generator of the equation of motion. However, it is generally difficult to predict actual motions generated by nonlinear equations. Then Lyapunov exponents provide us with a quantitative understanding of motion. One may produce data of orbits by numerical integration of equations of motion and evaluate Lyapunov exponents.
where a is a real constant. Integrating (3.2), we obtain x(x̂; t) = e^{at} x̂. By the linearity of the system, we easily obtain an estimate |x(x̂ + δ; t) − x(x̂; t)| = e^{at}|δ|. By definition (3.1), we find λ(x̂) = a. If a > 0, the orbit is unstable, i.e., the distance of adjacent orbits (or uncertainty in the initial condition) increases exponentially. The Lyapunov exponent is the time constant of this exponential growth.⁴

Generalizing (3.2) slightly, let us consider an n-dimensional autonomous linear system

$$\frac{dx}{dt} = Ax, \qquad x(0) = \hat{x}, \tag{3.3}$$
where A is a matrix of constant elements. By generating the exponential function e^{tA}, the solution of (3.3) is written as x(t) = e^{tA} x̂ (see Sect. 2.3.2). The behavior of the solution is characterized by the eigenvalues of the generator A. By definition (3.1), it is evident that the maximum Lyapunov exponent is the maximum of the real parts of the eigenvalues (see Problem 3.1). Now, the meaning of "maximum" of Lyapunov exponents is clear; when the displacement δ is parallel to the eigenvector corresponding to the maximum real part eigenvalue, the growth of the distance becomes fastest.

In a general nonlinear dynamical system, the distance between different orbits may change as a complicated function of time and position. Let us consider an n-dimensional nonlinear dynamical system generated by a smooth real-valued vector field (flow) V(x, t):

$$\frac{dx}{dt} = V(x, t), \qquad x(0) = \hat{x}. \tag{3.4}$$
We denote by {x(x̂; t); t ≥ 0} the orbit starting from an initial state x̂. The behavior of orbits in the vicinity of one reference orbit can be analyzed by the linear approximation (in the neighborhood of the reference orbit) of the equation of motion, whose generator is a linear operator (matrix)

$$A(\hat{x};t) = \left.\frac{\partial(V_1,\cdots,V_n)}{\partial(x_1,\cdots,x_n)}\right|_{t,\ x=x(\hat{x};t)}. \tag{3.5}$$
We note that the generator A depends on t. However, the dependence on x (i.e., nonlinearity) of the generator has been removed by the linear approximation. Solving

$$\frac{d}{dt}x_\delta = A(\hat{x};t)x_\delta, \qquad x_\delta(0) = \delta, \tag{3.6}$$

we can study the stability in the neighborhood of the reference orbit x(x̂; t). Denoting the solution of (3.6) by x_δ(t), the maximum Lyapunov exponent (3.1) is given by

$$\lambda(\hat{x}) = \sup_{\delta}\left[\limsup_{t\to\infty}\frac{1}{t}\log\frac{|x_\delta(t)|}{|\delta|}\right]. \tag{3.7}$$

⁴ Because the Lyapunov exponent is a measure of the time constant of exponential behavior, it does not work well for slower instabilities or more violent instabilities. For example, for an "algebraic instability" x(t) = t^p x̂ (p > 0), we obtain λ(x̂) = 0. And for x(t) = e^{at²} x̂, λ(x̂) = ∞.
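For a constant generator this prescription reduces to linear algebra: the exponent is the largest real part among the eigenvalues of A, as stated after (3.3). A minimal sketch (matrix and step sizes illustrative) comparing the eigenvalue answer with a direct integration of (3.6):

```python
import numpy as np

# Constant generator A of the linear system dx/dt = A x, Eq. (3.3).
A = np.array([[0.1, -1.0],
              [1.0,  0.1]])          # eigenvalues 0.1 +/- 1i

lam_eig = max(np.linalg.eigvals(A).real)   # analytic maximum Lyapunov exponent

# Numerical value: integrate Eq. (3.6) by small Euler steps and
# measure the exponential growth rate of |x_delta(t)|.
dt, T = 1e-3, 50.0
x = np.array([1.0, 0.0])             # the initial displacement delta
for _ in range(int(T / dt)):
    x = x + dt * (A @ x)
lam_num = np.log(np.linalg.norm(x)) / T

print(lam_eig, lam_num)              # both ~0.1
```

The supremum over δ in (3.7) is harmless here: for this normal matrix every displacement grows at the same rate e^{0.1t}.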
While a positive Lyapunov exponent scales the time constant of the increase of the distance between adjacent orbits, it does not necessarily mean that orbits actually separate unlimitedly. In fact, a positive Lyapunov exponent can occur even in a system where the motion is restricted in a bounded domain—this situation is indeed typical for chaos; see Figs. 1.6, 2.8, and 2.10. We may say that chaos occurs by the conflicts of two opposite tendencies—instability driving orbits to separate and boundedness confining orbits in limited space.⁵ The correct interpretation of the Lyapunov exponent is, thus, the other way around—the error margin permitted to the initial value becomes smaller with time (see Fig. 3.1). If we want to predict, with an accuracy ε, the future state at time t, the precision of the initial condition must be e^{−λt}ε. To put it more formally, for the actual state to be included in the ε-neighborhood of x(x̂, t), the initial state must be in the e^{−λt}ε-neighborhood of x̂.

Figure 3.2 shows the mixing effect of a chaotic flow that is given by the ABC map (2.48). This calculation is a model of the generation of a cosmic magnetic field.
Fig. 3.1 If the maximum Lyapunov exponent λ is positive, we need increasingly high precision for the initial condition; for two orbits to stay within a distance ε at time t, the initial states of these two orbits must be within a distance δ = e^{−λt}ε
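To put numbers to the caption's formula, a short illustrative computation (λ and ε hypothetical) of how quickly the admissible initial error shrinks:

```python
import math

lam, eps = 0.7, 1e-3          # hypothetical exponent and target accuracy
for t in (5, 10, 20, 40):
    delta = math.exp(-lam * t) * eps   # admissible initial error (Fig. 3.1)
    print(t, delta)
# Each unit of time multiplies the required precision by e**(-lam):
# prediction over long horizons demands absurdly accurate initial data.
```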
⁵ Let us recall Problem 2.4, where we derived a more transparent representation of the logistic map with A = 4, i.e., putting u_n = sin²(θ_n/2), we obtain θ_{n+1} ≡ 2θ_n (modulo 2π). In this representation, the Lyapunov exponent (for discrete time) is log 2, so the motion is unstable. While this exponential law is simple, complexity is introduced by the modulo arithmetic that interfolds the orbit into a bounded domain [0, 2π).
Fig. 3.2 Mixing effect of a chaotic flow. When a physical quantity is transported by a chaotic flow, the length scale of the distribution reduces exponentially with time (after Gilbert & Childress [8])
In space, atoms are ionized because of high energy, producing coupled dynamics of matter (flow) and electromagnetic fields—such a system is called a plasma. Magnetic flux and ionized matter move together. The strength of the magnetic field is given by the spatial derivative of the magnetic flux, thus it is amplified when the length scale of inhomogeneity of distribution is reduced. Because of a positive Lyapunov exponent of the ABC map, the length scale diminishes exponentially with time, resulting in exponential growth of the strength of the magnetic field; the chaotic flow acts as a cradle (or chora) for cosmic fields.
3.1.3 Attractors

In (3.7), we have shown the relation between the maximum Lyapunov exponent and the generator of dynamics. The local value of the linear-approximated generator A determines the stability of orbits in the vicinity of the reference orbit; the eigenvalues of A estimate local time constants. The Lyapunov exponent, however, measures the effective time constant that is averaged over time and space along actual orbits. It is, therefore, different also from the simple (homogeneous) space–time average of the eigenvalues of A. If a certain domain in state space is rarely accessed, the domain does not have importance for actual phenomena. On the contrary, if almost all orbits
are attracted to a certain domain and they stay there for a long time, the domain determines the property of dynamics.

Let us choose an initial value x̂ from a subset Ω₀ of state space and observe its evolution x(x̂; t). We survey all x̂ ∈ Ω₀ and all t ≥ T (T being a positive parameter) and define Ω_T = {x(x̂; t); x̂ ∈ Ω₀, t ≥ T}. If the domain in which orbits move around is shrinking, we obtain Ω_T ⊂ Ω₀ (see Fig. 3.3). The smallest of Ω_T in the limit T → ∞ is called the attractor.⁶ If the attractor is a "small" set, the complexity of motion reduces with time. The dimension of the attractor means the effective degree of freedom that characterizes the long-term behavior of a dynamical system. Lost dimensions reflect dissipation, loss of the information of the initial state. In a dynamical system that is governed by Hamilton's canonical equation, the whole information of the initial state is conserved; by Liouville's theorem (Note 2.3), the flow in state space is incompressible, implying that the volume of an arbitrary set in state space can never change with time (while the shape of domain may change). In contrast, a dynamical system where sets in state space can "shrink" is called a dissipative system.⁷
Fig. 3.3 An attractor is a set in state space which absorbs orbits. This figure shows the orbits of damping oscillators (small-amplitude pendulums damped by friction) in the q–p plane. The attractor is the single stationary point. In a general nonlinear system, more complicated attractors may appear (see Fig. 1.6)
⁶ There are some varieties of detailed definitions of attractors, depending on authors' purpose. In Note 3.1, we give the minimum (simplest) definition of attractors in a more mathematical language.

⁷ Of course, shrinkage (loss of the information of the past) is the natural direction of evolution. On the contrary, expansion, that is increase of information, cannot be an autonomous process of deterministic evolution, but may happen in an open system.
Let us study a simple example of a dissipative system. First we consider a harmonic oscillator that has a Hamiltonian H = (q² + p²)/2. Let us add a "friction force" to the equation of motion. Then the equation of motion is

$$\frac{d}{dt}\begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} p \\ -q \end{pmatrix} - c\begin{pmatrix} 0 \\ p \end{pmatrix}, \tag{3.8}$$

where c is a positive constant representing the coefficient of friction. Calculating the rate of change of the energy by (3.8), we obtain

$$\frac{d}{dt}\left(\frac{q^2+p^2}{2}\right) = -cp^2. \tag{3.9}$$
The right-hand side is negative as far as p² > 0. The length (Euclidean norm) of the state vector is given by (q² + p²)^{1/2}. Equation (3.9) shows that this length shrinks monotonically. Thus, every orbit (or any set in state space) is absorbed by the origin ᵗ(q, p) = ᵗ(0, 0), which is the attractor of this system (see Fig. 3.3).⁸

In a general nonlinear dissipative system, passages toward an attractor are generally very complicated. Dissipation drives orbits to converge into a stationary point, but nonlinearity defies this damping mechanism by generating instabilities (producing a positive Lyapunov exponent) and also bifurcating the stationary points (see Sect. 1.4.5). By the conflicts of these effects, an attractor (if it exists) is generally a set of complex shapes (see Fig. 1.6), which may have a non-integer "dimension." The "dimension" of a geometric object (often referred to as a measure of the degree of freedom) may be defined in some different ways (see Note 3.2). A strange geometric object that has a non-integer dimension is called a fractal (cf. Sect. 4.2.3). An example of a fractal is a curve that is infinite in length but is zero in square measure (cf. Fig. 4.2). To quantify the size of such an object, the one-dimensional measure (length) is too small, but the two-dimensional measure (area) is too large. Thus, this object has a certain intermediate dimension between 1 and 2. As previously mentioned, nonlinearity tends to extend an attractor, against shrinkage due to dissipation, to yield a finite dimension (note that the point attractor of the aforementioned linear system is a set of zero dimension). An attractor is called a strange attractor if it is a fractal. An orbit starting from inside an attractor stays within the attractor forever (in this sense, the attractor is an invariant set; see Note 3.1). The motion in a strange attractor is generally complex and the maximum Lyapunov exponent is positive. While a dissipative system reduces the degree of freedom and tends to simplify the motion, the complexity survives in a narrow but vital "gap"—that is a strange attractor.

⁸ Recall the abstract dissipative dynamics discussed in Note 2.3. Equation (2.122) describes the process of the "fastest" convergence into the stationary point (minimum of the energy), which is in marked contrast with the Hamiltonian dynamics that evolves to a direction perpendicular to the gradient of the energy (Hamiltonian) (see Fig. 2.14). The aforementioned example of damping oscillation is an intermediate between these extremes; the orbit describes a spiral downward curve in the landscape of the energy.
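Returning to the damped oscillator (3.8): a direct numerical check of its convergence to the point attractor is straightforward. The sketch below (friction coefficient and step size illustrative) integrates (3.8) with simple Euler steps and prints the norm (q² + p²)^{1/2}, which decreases monotonically toward the origin:

```python
import numpy as np

c, dt = 0.2, 1e-3          # friction coefficient and time step (illustrative)
q, p = 1.0, 0.0
for n in range(200001):    # Euler integration of Eq. (3.8)
    if n % 50000 == 0:
        print(n * dt, np.hypot(q, p))      # shrinking norm of the state
    q, p = q + dt * p, p + dt * (-q - c * p)
# The orbit spirals down the energy landscape into the attractor (0, 0),
# as sketched in Fig. 3.3.
```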
3.1.4 Stability and Integrability

We have been trying to return, in the discussion of "chaos", to a viewpoint of phenomenology; we have been describing the "superficial (or quantitative) aspect" which, however, relegates "structural characterization" to a deeper layer beneath actual phenomena. The aim of this section is to examine a basic relation, as well as a gap, between the phenomenological and structural understandings of chaos—the former is "instability" and the latter is "non-integrability."

A positive Lyapunov exponent makes prediction difficult as a problem of accuracy. It would be easy if the positivity of a Lyapunov exponent (i.e., difficulty of prediction) could be a sensible definition (or structural characterization) of chaos. However, it causes contradictions and confusion. While a Lyapunov exponent is a measure (average growth rate) of instability, it is not an index of the level of complexity (if such a thing were possible). In fact, we know "simple" motions that have positive maximum Lyapunov exponents; the simplest example is the one-dimensional constant-coefficient ODE (3.2). Positivity of a Lyapunov exponent (that is the coefficient a) indeed makes prediction of the future state difficult. This linear system, however, does not have a characteristic pertaining to "chaos"—it is easily integrated, and the motion is described by the well-known function e^{at}.⁹

An unstable motion whose range freely expands is rather simple; complexity develops when there is a certain restriction. As we have remarked in preceding sections, the conflict of the opposite tendencies—instability driving orbits to separate and boundedness confining orbits in narrow space—yields "chaos."

The Lyapunov exponent and integrability have the following basic relation: If an autonomous system is integrable, and moreover, all orbits are included within a certain bounded domain, the maximum Lyapunov exponent cannot be positive. Therefore, if all orbits of an autonomous system are included within a certain bounded domain, while the maximum Lyapunov exponent is positive, the motion is not integrable.

The basic idea of the proof is simple. Let n be the dimension of state space. If the motion is integrable, there are n − 1 constants of motion that are independent of t (see Sect. 2.4.1). Therefore, only one parameter, say x_n, moves. The equation governing x_n can be written as (cf. (2.61))

$$\frac{dx_n}{dt} = V_n(x_n). \tag{3.10}$$
⁹ If state space is of infinite dimensions, however, it is not so easy to distinguish order and chaos; considerable complexity may develop even in linear autonomous dynamics; see Note 3.3.
Because the system is autonomous, V_n does not include t as an independent variable. By the assumption that the orbits are included within a bounded domain, x_n(t) must be either convergent to a certain stationary (fixed) point or periodic (i.e., x_n(t) is an angular variable that is a modulus of a certain period; see Fig. 2.11). In the former case, the convergent point may change depending on the initial condition (constants of motion depend on the initial values of x₁, ⋯, x_{n−1}, so V_n(x_n) changes as a function of the initial values), and then, the maximum Lyapunov exponent is zero (if all orbits converge to a single fixed point, the maximum Lyapunov exponent can be negative). In the case of periodic motion, |x_n(t) − x_n′(t)| can increase at most in proportion to t (because of different frequencies determined by different x₁, ⋯, x_{n−1}). Thus, the maximum Lyapunov exponent is zero.

If the system is not autonomous, the relation between the Lyapunov exponent and integrability becomes rather weak. In fact, an integrable motion may have a positive Lyapunov exponent even if it is confined within a bounded domain. Let us see an example. We consider a Hamiltonian

$$H = e^{t(q^2+p^2)/2}. \tag{3.11}$$
Note that this H includes t, so the system is not autonomous. The equation of motion (2.81) reads as

$$\frac{d}{dt}\begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} t\,p\,e^{t(q^2+p^2)/2} \\ -t\,q\,e^{t(q^2+p^2)/2} \end{pmatrix}.$$
Let us transform variables to polar coordinates:

$$x = \sqrt{q^2+p^2}\,\tan^{-1}(q/p), \qquad y = \sqrt{q^2+p^2}.$$
Then the equation of motion simplifies as

$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} t\,y\,e^{ty^2/2} \\ 0 \end{pmatrix}, \tag{3.12}$$
which is an integrable equation; we obtain

$$x(t) = \hat{x} + \frac{4}{\hat{y}^3} + \left(\frac{2}{\hat{y}}t - \frac{4}{\hat{y}^3}\right)e^{t\hat{y}^2/2}, \qquad y(t) = \hat{y}, \tag{3.13}$$
where x̂ and ŷ are the initial values of x and y. Let us estimate the maximum Lyapunov exponent. Because the above-mentioned coordinate transformation is an orthogonal transformation (Jacobian = 1), the Lyapunov exponents evaluated in x–y space and in q–p space are the same. Plugging the solution (3.13) into the definition (3.1), we obtain
$$\lambda(\hat{x},\hat{y}) = \frac{\hat{y}^2}{2}.$$
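This value can be checked numerically from the closed-form solution (3.13): shift the initial radius ŷ by a small δ (a shift of x̂ alone produces no separation) and measure the rate at which the two orbits separate. A minimal sketch (all numbers illustrative):

```python
import math

def x_exact(xhat, yhat, t):    # the integrated orbit, Eq. (3.13)
    return xhat + 4/yhat**3 + (2*t/yhat - 4/yhat**3) * math.exp(t*yhat**2/2)

xhat, yhat, d = 0.0, 1.0, 1e-8
for t in (50, 200, 1000):
    sep = abs(x_exact(xhat, yhat + d, t) - x_exact(xhat, yhat, t))
    print(t, math.log(sep / d) / t)   # slowly approaches yhat**2/2 = 0.5
```

The convergence is slow because the polynomial prefactors in (3.13) contribute a correction of order (log t)/t to the measured exponent.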
While this system has a positive Lyapunov exponent, the orbits do not extend infinitely. In fact, in q–p space, the set of orbits forms a rotating shear flow, that is, each streamline (orbit) is a circle with a constant radius y and the angular velocity changes as a function of y as well as t.¹⁰

Meanwhile, a non-integrable motion may have a negative maximum Lyapunov exponent in a non-autonomous system. Let x(t) be a motion of a non-integrable system that has a maximum Lyapunov exponent λ > 0. We apply a transformation y(t) = e^{−Λt}x(t), where Λ is a certain real constant. We assume Λ > λ. Then the maximum Lyapunov exponent λ′ of y(t) is negative. In fact, by (3.1), we obtain

$$\lambda' = \sup_{\delta}\left[\limsup_{t\to\infty}\frac{1}{t}\log\frac{|e^{-\Lambda t}[x(\hat{x}+\delta;t)-x(\hat{x};t)]|}{|\delta|}\right] = \lambda - \Lambda. \tag{3.14}$$
The transformation we have applied simply amounts to "zooming-out." In the limit of t → ∞, every volume element shrinks to y = 0. This transformation changes only the scale of view; it does not modify the "structure" of the system. Hence the non-integrability cannot be changed—integrability is a property that is invariant under transformations of variables. Superficial complexity of a phenomenon may be reduced by changing our viewpoint (the way of describing an object). In the preceding chapter, we examined the possibility of coordinate transformations in search of symmetries; here, we have found that the notion of "scale" also intervenes in the discussion—complexity may become actual or potential depending on the scale of observation. When appearances of an object vary depending on scales of our observation (i.e., representative values of length, time, energy, etc.), we must study hierarchies of scales. This issue will be discussed in the next chapter.
3.2 Randomness as Hypothetical Simplicity

3.2.1 Stochastic Process

Theory of mechanics is based on "determinism," even if it discusses the problem of chaos as the "difficulty" of prediction. Here, determinism is the idea that an initial condition decides a unique future state. The relation between an initial condition and a future state is described by a differential equation or a variational principle.
¹⁰ This shear flow can yield a mixing effect that proceeds exponentially (that is, the Lyapunov exponent is positive). We note that the positive Lyapunov exponent in this ordered structure is brought about by the inhomogeneity (shear) of the flow that increases exponentially as a function of t. In a stationary shear flow that has an ordered structure, the mixing can occur only in proportion to t; see (3.10) and also Casti, Chirikov & Ford [4] and Gilmore & Lefranc [9]. In Fig. 3.2, the flow is stationary, but the mixing is exponential because the streamlines are chaotic (non-integrable) and have a positive Lyapunov exponent.
The initial and corresponding future states are connected by a curve, or orbit, in state space. Thus, it is thought that prediction of future states is possible in principle, even though there is a problem of accuracy if the maximum Lyapunov exponent is positive. However, when a system is not integrable, an individual orbit (or a particular solution of the equation of motion for a specific initial condition) cannot be the "representative" of other orbits starting from different initial conditions—solutions for slightly different initial conditions behave in a totally uncorrelated way. This is the problem of chaos in mechanics, posing a fundamental difficulty in studying "universality" of motion.¹¹

¹¹ We note that this argument assumes the unique solvability of initial-value problems of ODEs, which requires the Lipschitz continuity of the generator (see Note 1.2).

The chaos problem in this setting, based on determinism, envisages a small "element," reduced from the real world, as the object of description and analysis. When the dimension of state space is a modest number, it is not a far-reaching program to investigate integrability; finding non-integrability, or chaos, is still non-trivial. But non-integrability is only trivial when we consider a macro-system, composed of many interacting elements, which has a huge degree of freedom. Of course it is impossible, and even irrelevant, to describe or analyze motions of all elements. How, then, can we determine a "representative" of the system and make use of it as a "model" to study universal properties of the system?

As previously mentioned, even in a low-dimensional system, a single motion cannot be a representative of a non-integrable system. When the degree of freedom is high, motion is extremely non-integrable (i.e., the number of the constants of motion is much lower than the degree of freedom), so the notion of representative is meaningless in the framework of determinism; thus, statistical theory is needed.

Let us choose a "test particle" from a set of a large number of particles and observe its motion. The test particle moves interacting with other particles. Recall the equation of collective motion discussed in Sect. 2.4.3, where we considered a set of independent (non-interacting) particles. In contrast, the test particle considered here is a part of many inseparable (interacting) bodies. If we try to evaluate the force acting on the test particle precisely, we eventually have to analyze the motion of all interacting particles simultaneously. State space is the product of every state space of each particle, which is called Γ-space. Needless to say, we cannot observe/describe/analyze the motion of an enormous number of particles. Thus, we assume that the test particle receives random force from other particles. The motion of the test particle under random force is described as a curve in the state space of a single particle, which is called μ-space. The degree of freedom is now reduced to that of only one test particle. The influence of other particles is abstracted under the hypothesis of randomness. The equation of motion of the test particle can be written formally as

$$m\frac{d\dot{q}}{dt} = F + \alpha G, \tag{3.15}$$
where q̇ = dq/dt is the velocity of the test particle, F is the force that is independent of other particles, and G is the random force modeling the interactions with other particles (α is a coefficient measuring the strength of the interactions) (see Note 3.4). An ODE like (3.15) that includes a random force is called Langevin's equation.
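A Langevin equation is easy to simulate. The sketch below (all parameters illustrative; the concrete choices of a linear friction force for F and Gaussian white noise for G are assumptions of the sketch, not of the text) integrates (3.15) by the Euler–Maruyama method and compares the long-time statistical mean of q̇² with its known stationary value:

```python
import numpy as np

rng = np.random.default_rng(0)
m, gamma, alpha = 1.0, 0.5, 0.3   # mass, friction, noise strength
dt, n_steps = 1e-3, 200000

v = 0.0                            # v stands for the velocity dq/dt
vs = np.empty(n_steps)
for i in range(n_steps):
    # Euler-Maruyama step for m dv/dt = -gamma*v + alpha*G,
    # with G modeled as Gaussian white noise.
    v += (-gamma * v * dt + alpha * np.sqrt(dt) * rng.standard_normal()) / m
    vs[i] = v

# Time average of v^2 versus the stationary value alpha^2/(2*m*gamma).
print(np.mean(vs[n_steps // 2:]**2), alpha**2 / (2 * m * gamma))
```

Each rerun with a different random seed gives a different trajectory but (nearly) the same statistics, which is exactly the point of the probabilistic description developed next.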
3.2.2 Representation of Motion by Transition Probability

When we consider the motion of a "test particle" that moves in a random field, its orbit is ruled by chance, thus its particular trajectory does not have a definite meaning. The equation of motion (3.15), including a random force, will give a different solution, for the same initial condition, whenever it is recalculated, because the random force will be a different time series at each trial. Therefore we have to do trials many times and study tendencies of motion by statistical averages. In theory, "trials" are virtual processes; we consider a "set of data," or ensemble, which may be obtained by thought-experiments. Such an ensemble is endowed with some axiomatic properties, like the principle of maximum entropy (cf. Sect. 3.2.3). What we mean by "probability" or "statistical mean" is a quantity defined by the set of such virtual data.

We shall no longer try to describe the test particle by a deterministic orbit, but are going to explore its statistical properties based on a probabilistic description. For this purpose, we introduce the notion of transition probability that replaces (in fact, generalizes) the notion of orbit. We denote by x the state vector in state space X. Suppose that a test particle is located at x_s when time is s. We denote by P(s, x_s; t, B) the probability that the particle appears in a domain B (⊂ X) at time t ≥ s, and call it a transition probability. A transition probability density p(s, x_s; t, x) is defined by

$$P(s, x_s; t, B) = \int_B p(s, x_s; t, x)\,dx,$$
where dx is the volume element (Lebesgue measure) of state space X.

A stochastic process that is described by a transition probability is called a Markov process, which is a process not depending on "history" (past state) in the following sense: Suppose, for instance, that there is a rule that a particle cannot re-enter a certain domain D of space. Then the transition after a certain time s must depend on the history of whether the particle has ever entered D or not before time s. The probability distribution at a future time t > s cannot be represented by a function P(s, x_s; t, B) that does not contain the memory of history. The Markov process excludes such history-dependent transitions.

We may consider an autonomous statistical system, where the transition probability does not depend on a specific choice of the origin of time, thus only the time interval t − s is an independent variable. Setting s = 0, we write

$$P(t, x_0, B) = P(0, x_0; t, B), \qquad p(t, x_0, x) = p(0, x_0; t, x).$$
A transition probability is a stochastic distribution for which we assume the following axioms: For arbitrary s ≤ t and x_s ∈ X, we demand that 0 ≤ P(s, x_s; t, B) ≤ 1 (∀B ⊆ X), and

$$P(s, x_s; t, B) = \begin{cases} 1 & (B = X), \\ 0 & (B = \emptyset). \end{cases} \tag{3.16}$$

And for arbitrary s and B ⊆ X,

$$P(s, x_s; s, B) = \begin{cases} 1 & (x_s \in B), \\ 0 & (x_s \notin B). \end{cases} \tag{3.17}$$
Moreover, we demand the Chapman–Kolmogorov equality: for arbitrary s < τ < t, x_s ∈ X, and B ⊂ X,

$$P(s, x_s; t, B) = \int_X P(s, x_s; \tau, dy)\,P(\tau, y; t, B) = \int_X p(s, x_s; \tau, y)\,P(\tau, y; t, B)\,dy. \tag{3.18}$$
This relation implies an associative law in the following sense: For a test particle to start from a position x_s at time s and to arrive at a domain B at time t, there are many possible intermediate routes. The probability P(s, x_s; t, B) must be equal to the total of all probabilities of intermediate transitions; the right-hand side of (3.18) represents the sum over all intermediate paths. The relations (3.17) and (3.18) generalize the causality axioms (2.15) and (2.16) that we demanded for deterministic processes. If the movement of a test particle is definitely given by an orbit {x(t) = T(t, s)x_s}, we may formally write

$$p(s, x_s; t, x) = \delta(x - T(t,s)x_s), \tag{3.19}$$
where δ(x) is Dirac’s δ function. Then (3.17) and (3.18) translate as (2.15) and (2.16). An orbit—a deterministic description of motion—is the limit of infinite “accuracy” of a transition probability. The opposite extreme is the limit of perfect randomness, which means that we have no idea where the test particle will be located after a certain time. In this case, we should admit, for a certain t > s, p(s, x s ; t, x) =
1 |Σ|
(∀x s , x ∈ Σ),
(3.20)
where |Σ| denotes the volume of the set Σ of possible states. This is the so-called principle of equal weight, meaning that all possible states have the same probability. However, there is a profound gap in logic; it is fundamentally incoherent to say that we cannot predict the position, but that the probability is distributed equally. It is an
important problem of theory to prove (3.20) as a consequence of some properties of stochastic processes. In the next section, we shall give a proof of this “hypothesis” under some conditions.
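In a finite-state setting the foregoing axioms become simple matrix statements, which the next section exploits. A minimal sketch (matrix entries illustrative): rows of a transition matrix sum to one (cf. (3.16)), the zero-time matrix is the identity (cf. (3.17)), and the Chapman–Kolmogorov equality (3.18) is matrix multiplication:

```python
import numpy as np

# P1[k, l]: probability of moving from state k to state l in one time unit.
P1 = np.array([[0.8, 0.1, 0.1],
               [0.2, 0.6, 0.2],
               [0.3, 0.3, 0.4]])

P0 = np.eye(3)              # zero-time transition: stay put, cf. Eq. (3.17)
P2 = P1 @ P1                # Chapman-Kolmogorov: compose over intermediates
P3 = P2 @ P1                # equivalently P1 @ P2, cf. Eq. (3.18)

print(P2.sum(axis=1))       # each row still sums to 1, cf. Eq. (3.16)
print(np.allclose(P3, P1 @ (P1 @ P1)))   # composition order is immaterial
```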
3.2.3 H-Theorem

The randomness of stochastic processes is explained by the so-called H-theorem, which gives a proof for the principle of equal weight (3.20) under some appropriate assumptions about the transition probability. Let Σ be the total set of possible states that a test particle can choose. Since there is a technical hurdle in considering continuous variables, we study a discrete model; we divide Σ into small cells σ_k (k = 1, ⋯, M) and observe stochastic transitions of a test particle moving among the cells. We describe transitions at discrete times τ = 0, 1, 2, ⋯. Assuming that the process is autonomous, we define a transition probability P(τ, k, ℓ) that evaluates the probability of the test particle moving from σ_k to σ_ℓ after time τ. The axioms (3.16) and (3.17) can be naturally reformed for the P(τ, k, ℓ):

$$\sum_{\ell=1}^{M} P(\tau, k, \ell) = 1 \quad (k = 1, \cdots, M), \tag{3.21}$$

$$P(0, k, \ell) = \begin{cases} 1 & (k = \ell), \\ 0 & (k \neq \ell). \end{cases} \tag{3.22}$$
The Chapman–Kolmogorov equality (3.18) now reads as

$$P(\tau, k, \ell) = \sum_{j=1}^{M} P(\nu, k, j)\,P(\tau-\nu, j, \ell) \quad (\tau = 2, 3, \cdots;\ 0 < \nu < \tau). \tag{3.23}$$
In addition to these basic axioms, we assume the following additional conditions:

$$\sum_{k=1}^{M} P(\tau, k, \ell) = 1 \quad (\ell = 1, \cdots, M) \tag{3.24}$$

and

$$P(\nu, k, \ell) > 0 \quad (\forall k, \ell,\ \exists\nu). \tag{3.25}$$
If we assume the "symmetry" P(τ, k, ℓ) = P(τ, ℓ, k) of the transition, condition (3.24) is equivalent to axiom (3.21), which implies the conservation of the total probability. If such a symmetric relation does not hold, a general stochastic process may violate condition (3.24). If the sum on the left-hand side of (3.24) is larger than 1 for a certain ℓ, the sum for some other ℓ′ must be smaller than 1, because of the
conservation of the total probability. Then σ_ℓ is attracting and σ_{ℓ′} repelling. Condition (3.24) inhibits such an inequality of transitions. On the other hand, condition (3.25) demands that all places have a finite chance to be visited by the test particle (which may start from an arbitrary initial condition) at some time. Violation of this condition means that the total domain Σ is divided into some subdomains where the test particle is confined, i.e., if the test particle starts from a certain initial condition, there is a place where it can never visit. If condition (3.25) is satisfied, the stochastic process is said to be indecomposable or metrically transitive.

Concerning a stochastic process that satisfies the aforementioned assumptions, the H-theorem claims that the so-called H-function is non-increasing as a function of time τ. Here, the H-function is defined as follows: Putting

$$h(p) = \begin{cases} p\log p & (p > 0), \\ 0 & (p = 0), \end{cases} \tag{3.26}$$

the H-function is

$$H(\tau; k) = \sum_{\ell=1}^{M} h(P(\tau, k, \ell)). \tag{3.27}$$
Holding to custom, we denote the H-function by H, but it should not be confounded with a Hamiltonian. The H-theorem asserts that, for every k,

$$H(\tau; k) \ge H(\tau+1; k) \quad (\tau = 1, 2, \cdots). \tag{3.28}$$
We can prove this "theorem" under the aforementioned assumptions on the transition probability. Here we describe the proof due to Yosida [22]. First we shall show

$$\sum_{j=1}^{M} h(P(\tau, k, j))\,P(1, j, \ell) \ge h(P(\tau+1, k, \ell)). \tag{3.29}$$
Summing both sides of (3.29) from ℓ = 1 to M, we obtain the desired relation (3.28). The inequality (3.29) can be deduced from a geometric property (convexity) of the graph of the function h(p). Let us plot the points

$$q_j = (P(\tau, k, j),\ h(P(\tau, k, j))) \quad (j = 1, \cdots, M)$$

on the graph of h(p) (see Fig. 3.4). Because d²h(p)/dp² = 1/p, the graph is convex downward in the range p ≥ 0. Hence the polygon that has {q₁, ⋯, q_M} as its vertices is convex. We give each point q_j a weight of P(1, j, ℓ). Because of (3.24), the linear combination of the vectors P(1, j, ℓ)q_j (j = 1, ⋯, M) gives the "center of mass," whose coordinates are given by, using (3.23),
Fig. 3.4 The relation between the H-function and the statistical average
$$Q = \left(\sum_{j=1}^{M} P(1, j, \ell)\,P(\tau, k, j),\ \sum_{j=1}^{M} P(1, j, \ell)\,h(P(\tau, k, j))\right) = \left(P(\tau+1, k, \ell),\ \sum_{j=1}^{M} P(1, j, \ell)\,h(P(\tau, k, j))\right).$$
Because the polygon is convex, the center of mass Q must be located inside the polygon. On the other hand, the point Q* = (P(τ+1, k, ℓ), h(P(τ+1, k, ℓ))) is on the graph of h, which must lie below Q because the graph is convex downward (see Fig. 3.4). Thus we have proved (3.29).

Equilibrium is defined by

$$H(\tau; k) = H(\tau+\nu; k) \quad (\forall\nu > 0). \tag{3.30}$$
Under the indecomposability condition (3.25), equilibrium can be achieved if and only if all points q_j join together, i.e.,

$$P(\tau, k, \ell) = \frac{1}{M} \quad (\forall \ell). \tag{3.31}$$
This is exactly the principle of equal weight.
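The monotone decrease of H and its equal-weight limit are easy to observe numerically. In the sketch below (matrix entries illustrative), the transition matrix is doubly stochastic (rows and columns sum to one, so (3.21) and (3.24) hold) and has all entries positive, so (3.25) holds as well:

```python
import numpy as np

def h(p):                           # Eq. (3.26)
    return p * np.log(p) if p > 0 else 0.0

P1 = np.array([[0.5, 0.3, 0.2],     # doubly stochastic and indecomposable
               [0.3, 0.4, 0.3],
               [0.2, 0.3, 0.5]])

P, k = P1.copy(), 0
for tau in range(1, 9):
    print(tau, sum(h(p) for p in P[k]))   # H(tau; k), Eq. (3.27): non-increasing
    P = P @ P1

# Limit: equal weight 1/M in every cell, so H -> M*(1/M)*log(1/M) = -log 3.
print(-np.log(3.0))
```

The printed sequence decreases monotonically toward −log 3, the value of H at the equal-weight distribution (3.31).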
3.2.4 Statistical Equilibrium

As we have shown in the preceding subsection, the H-theorem claims that the transition probability of a test particle achieves a statistical equilibrium by arriving at a uniform distribution. This is seemingly a trivial conclusion, basically saying that we have no idea where the test particle is. However, a nontrivial aspect of statistics
arises from an argument pertaining to the ensemble (the set of possible states) that is given "restrictions" by the governing laws of mechanics.

Here, we consider a physical system of many particles. We denote by x = ᵗ(q, p) the state vector of a test particle, where q = ᵗ(q₁, q₂, q₃) ∈ Ω (⊂ R³) and p = ᵗ(p₁, p₂, p₃) ∈ R³ are the vector position and momentum, respectively. The state space of a test particle is μ-space Ω × R³. As we shall see later, the magnitude of momentum will be limited by a physical constraint, so the domain Σ of possible states in μ-space is bounded. To avoid the technical hurdle of dealing with continuous variables, we divide Σ into small cells σ_k (k = 1, ⋯, M). We use the previous notation P(τ, k, ℓ) for the transition probability. Assuming a certain initial statistical distribution f₀(k), we define
$$f(\ell, \tau) = N\sum_{k} f_0(k)\,P(\tau, k, \ell), \tag{3.32}$$
where N is the total number of particles. This function f(ℓ, τ) is called a distribution function, which represents the expected number of particles in a cell σ_ℓ at time τ. The equilibrium distribution, which is independent of τ, will be denoted by f(ℓ).

The H-theorem claims that the transition probability converges to the equal-weight distribution (over the domain Σ) which is the minimizer of the H-function. Here, however, the setting of the problem is somewhat complex; the ensemble Σ is not prescribed, but is to be defined by certain constraints due to the governing mechanics laws. We postulate that the statistical distribution still tends to minimize the H-function under such constraints. As we shall find, this hypothesis is consistent with the equal-weight distribution on a set of possible states in Γ-space.¹²

We assume that the system is isolated from the external world, i.e., particles as well as the total energy are confined in the system. It is also assumed that neither particles nor energy are created or annihilated inside the system. Summing f(ℓ) over all cells gives the total particle number N, i.e.,

$$\sum_{\ell=1}^{M} f(\ell) = N. \tag{3.33}$$
Next we formulate the energy conservation law. We assume that a particle has energy E_ℓ when it is located in a cell σ_ℓ. Then,

$$\sum_{\ell=1}^{M} f(\ell)E_\ell = E, \tag{3.34}$$
¹² In the previous argument (cf. Sect. 3.2.1), the motion in Γ-space is considered to be deterministic. However, it is extremely non-integrable (chaotic) because of the huge degree of freedom. Thus, we construct a statistical ensemble by trials with slightly different initial conditions in each initial cell σ_j and define a transition probability.
where E is the total energy of the system. We assume that E_ℓ (ℓ = 1, ⋯, M) and E are positive constants. A system in which the total particle number and total energy are given as constants is called an isolated system or a micro-canonical ensemble. The minimizer of the H-function under the constraints (3.33) and (3.34) is found by a variational principle:

$$\delta\left[\sum_{\ell=1}^{M} f(\ell)\log f(\ell) + \alpha\sum_{\ell=1}^{M} f(\ell) + \beta\sum_{\ell=1}^{M} f(\ell)E_\ell\right] = 0, \tag{3.35}$$
where α and β are the Lagrange multipliers. Solving (3.35), we obtain

$$f(\ell) = \frac{1}{Z}e^{-\beta E_\ell}, \tag{3.36}$$
where we have put Z = e^{1+α}. This distribution function is called the Gibbs distribution. The parameters α and β are determined by plugging (3.36) into (3.33) and (3.34), and are written in terms of N and E (see Problem 3.2). By physical intuition, we assume that β is positive, because the probability of a higher energy state must be smaller. Writing

$$\beta = \frac{1}{k_B T}, \tag{3.37}$$
we define the temperature T, where the coefficient k_B (relating temperature and energy) is the Boltzmann constant (see Note 3.5).

The Gibbs distribution (3.36) shows that the equilibrium distribution is a function of the energy E_ℓ. This result is seemingly contradictory to the principle of equal weight which implies a homogeneous distribution over an ensemble. We may understand the reason why an inhomogeneous distribution occurs by considering that the "constraints" pose obstacles for the H-function to arrive at the minimum. While the mass conservation law (3.33) is implemented in the axiom of the transition probability, the energy conservation law (3.34) is an additional condition. Because of this constraint, the motion (transition) of a test particle is not free; if it moves to a place of higher energy, some other particles must lose energy to compensate for the energy gain of the test particle. Thus, the ensemble (set of possible states) is not well specified in μ-space; we should describe it in Γ-space.

The state vector ξ in Γ-space represents the state of all the particles. Denoting the state of each particle by ℓ^(j) (j = 1, ⋯, N), which indicates the location of the cell in the state space of a single particle, we may write ξ = ᵗ(ℓ^(1), ⋯, ℓ^(N)). The energy conservation law can be formally written as

$$E(\ell^{(1)}, \cdots, \ell^{(N)}) = E, \tag{3.38}$$
where E is the Hamiltonian (energy) of the total system. The relation (3.38) determines a hypersurface (a level-set of the Hamiltonian) in Γ-space, which is normally
a closed sphere-like surface. Let us denote it by G. The state vector ξ moves on G, which may be considered as a stochastic process. We now apply the H-theorem on the set G and claim that the transition probability converges to the homogeneous (equal weight) distribution over G. By projecting this equally weighted hypersurface onto μ-space, we obtain the equilibrium distribution that indeed agrees with the aforementioned Gibbs distribution.

Let us prove this fact. We denote by f(ℓ) the number of particles projected into a cell σ_ℓ of μ-space. The number of different states (in Γ-space) corresponding to a certain partition f(1), ⋯, f(M) is given by

$$W(f(1), \cdots, f(M)) = \frac{N!}{f(1)!\,f(2)!\cdots f(M)!}.$$
If every possible state on G is counted (with the same weight = 1), the equilibrium corresponds to the partition that maximizes W under the conditions (3.33) and (3.34) that determine G. Hence the equilibrium distribution f(ℓ) (ℓ = 1, ⋯, M) corresponding to the homogeneous (equal-weight) probability on G is determined by a variational principle¹³:

$$\delta\left[\log W(f(1), \cdots, f(M)) + \lambda\sum_{\ell=1}^{M} f(\ell) + \mu\sum_{\ell=1}^{M} f(\ell)E_\ell\right] = 0. \tag{3.39}$$
The reason why we maximize log W, instead of W, is explained in Note 3.5. The function log W is called entropy. Because N and all f(ℓ) are large numbers, we can apply Stirling's formula, i.e., for a large m,

$$\log m! = m\log m - m + \frac{1}{2}\log(2\pi m) + O(1/m).$$
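(A quick numerical check of Stirling's formula; math.lgamma computes log m! exactly:

```python
import math

for m in (10, 100, 1000):
    exact = math.lgamma(m + 1)    # log(m!)
    stirling = m * math.log(m) - m + 0.5 * math.log(2 * math.pi * m)
    print(m, exact, stirling)     # the O(1/m) remainder shrinks with m
```

Already at m = 100 the two values agree to about one part in 10⁶.)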
Using this formula, we find

$$\log W(f(1), \cdots, f(M)) = -\sum_{\ell=1}^{M} f(\ell)\log f(\ell),$$
which shows that the entropy is equivalent to −H (i.e., minus the H-function). The variational principle (3.39) now reads as

$$\delta\left[-\sum_{\ell=1}^{M} f(\ell)\log f(\ell) + \lambda\sum_{\ell=1}^{M} f(\ell) + \mu\sum_{\ell=1}^{M} f(\ell)E_\ell\right] = 0, \tag{3.40}$$
which is equivalent to (3.35).
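In practice the multipliers are fixed by the constraints themselves: β (equivalently the temperature) is whatever value makes the Gibbs weights reproduce the prescribed totals N and E, as Problem 3.2 asks. A minimal sketch (cell energies and totals illustrative), solving for β by bisection:

```python
import numpy as np

E_l = np.linspace(0.1, 2.0, 20)   # illustrative cell energies E_l
N, E = 1000.0, 600.0              # prescribed totals, Eqs. (3.33)-(3.34)

def mean_energy(beta):            # <E> under the Gibbs weights e^(-beta*E_l)
    w = np.exp(-beta * E_l)
    return (w @ E_l) / w.sum()

lo, hi = 1e-6, 50.0               # mean_energy decreases as beta grows
for _ in range(100):              # bisection for mean_energy(beta) = E/N
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > E / N:
        lo = mid                  # distribution still too "hot": raise beta
    else:
        hi = mid
beta = 0.5 * (lo + hi)

f_l = N * np.exp(-beta * E_l) / np.exp(-beta * E_l).sum()   # Eq. (3.36)
print(beta, f_l.sum(), f_l @ E_l)   # f_l reproduces N and E
```

The prescribed E/N must lie between the smallest cell energy and the unweighted mean of the E_ℓ for a positive β to exist, consistent with the physical assumption β > 0 made after (3.36).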
¹³ In (3.34), we assume that the energy of each particle is determined independently of other particles; compare it with the general form of the energy (3.38). Thus, the interaction of particles is not precisely considered. We shall reconsider the problem of particle interactions more carefully in Sect. 3.3.
3.2.5 Statistically Plausible Particular Solutions

The Gibbs distribution (3.36) is derived from the combination of the H-theorem (or the principle of maximum entropy) and the mechanical constraints (particle and energy conservation laws). If we do not have the "knowledge" of mechanical constraints, the statistical theory predicts smaller H (or larger entropy), trivializing the result. Here, we get into a fundamental question—the Gibbs distribution is derived by imposing the constraints on N and E, but is there any other constraint? If there is another "macroscopic" conservation law (and we know it), it imposes another constraint, making an additional obstacle for the minimization of H. Here, a "macroscopic" constant of motion is an integral (or a sum) over the ensemble such as

$$F = \int f(x)F(x)\,dx \quad \text{or} \quad F = \sum_{\ell} f(\ell)F(\ell), \tag{3.41}$$
where F(x) is a certain physical quantity (F = 1 gives the particle number and F = E gives the energy). Here, we return to continuous variables, using the integral forms in the places of the discrete forms of the particle and energy conservation laws (3.33) and (3.34). If N, E, and F are constrained, the variational principle becomes

$$\delta\left[\int f(x)\log f(x)\,dx - \alpha\int f(x)\,dx - \beta_1\int f(x)E(x)\,dx - \beta_2\int f(x)F(x)\,dx\right] = 0, \tag{3.42}$$

which yields

$$f(x) = \frac{1}{Z}e^{-\beta_1 E(x) - \beta_2 F(x)}. \tag{3.43}$$
This distribution corresponds to the equal-weight distribution on the manifold defined by the constraints of N, E, and F in Γ-space. Yet, our fundamental question is not removed—are there really no other constants of motion? It is much more difficult to prove the "absence" of another constant of motion than to find a new one (if it exists). A conservation law is "knowledge" gained through the study of mechanics. However, we have already abandoned the attempt to study the equation of motion of a high-dimensional system. Therefore, trying to prove the absence of the constant of motion is an absurdity; we have to reserve the conviction that there is no other conservation law as a "hypothesis"—this is the so-called ergodic hypothesis. In fact, we admitted this hypothesis when we derived the Gibbs distribution.

Let us review the derivation carefully, and see how the ergodic hypothesis braces the theory. We assume that a large number of particles contained in an isolated system
collide frequently and exchange their energies. Though the energy of each particle changes, the total energy E is kept constant. Thus, the state vector of Γ-space moves on the hypersurface defined by the macroscopic energy conservation law, which is the so-called micro-canonical ensemble. We postulate that the state vector can move freely on this hypersurface; there are no attracting/repelling places, no separated subdomains, i.e., we accept assumptions (3.24) and (3.25) for the transition probability. These assumptions are equivalent to the ergodic hypothesis. The H-theorem, then, asserts the equal-weight (homogeneous) probability distribution on the ensemble.

The ergodic hypothesis bridges the gap between the "conservation laws that we know" and the "conservation laws that really exist." The "complexity" of a high-dimensional system gives us confidence in believing that there is no symmetry, or conservation law, except for that of the total energy.¹⁴ The randomness—homogeneous distribution over an ensemble—is the simplest connotation of "complexity" in a statistical context.

The Gibbs distribution is justified phenomenologically in many "equilibrium" systems. Such an equilibrium is called heat death; there exist no structures or collective motions such as flows, waves, vortices, and patterns. Of course, the real world is not in the mode of heat death. We are living in dynamic nature, where diverse structures emerge and evolve. In the next section, we shall come back from axiomatic discussions on ultimate randomness to the study of dynamics, and consider how the collective motion of many elements can organize structures.
3.3 Collective Phenomena

3.3.1 Nonequilibrium and Macroscopic Dynamics

The modes of motion of a macro-system are far richer than what we can imagine on the basis of knowledge about the mechanics of a single element. We call them collective phenomena. Educated by modern science, we know that various materials (fluids, solids, or bodies of living things) consist of a huge number of micro-particles (molecules). In the realm of micro-scales, there are detailed, elegant theories about the physical and chemical properties of materials. Diverse and complex phenomena in the macroscopic world, which are actually observable, are composed of motions of microscopic particles. However, it is impossible in reality to study macroscopic phenomena by calculating motions of a huge number of elements (a single gram of a given material is composed of about 1 mole, or 6 × 10²³, molecules, and each of these is composed of even smaller units); one cannot see the forest for the trees.

¹⁴ The energy conservation law is due to the symmetry of the Hamiltonian H (of the total system) with respect to time, i.e., ∂H/∂t = 0 yields dH/dt = ∂H/∂t + {H, H} = 0. The constancy of the total particle number is a more basic conservation law that is not due to a "symmetry," but is due to a topological constraint in Hamiltonian mechanics.
To study the diversity of the macroscopic world, which is abstracted away by reduction into elements, we need a theory that describes collective phenomena in macro-scale realms. The theory of statistical equilibria, described in the preceding section, describes macro-scale systems in their "simplest" form; there is neither structure nor motion in the equilibrium, heat death. We have yet to develop the theory of non-equilibrium phenomena to describe and understand the unlimited diversity of structures and complex behavior of macro-systems in the real world.

In Sect. 2.4.3, we introduced the notion of collective order; we studied the order that is shared in a group of particles (or abstract elements constituting a general system) that have different initial conditions but move independently without interactions. In this section, we consider a group of "interacting particles." We can use the equation of collective motion (2.64), or its canonical form (2.116), as the frame of theory. However, instead of considering particles that obey an a priori rule (i.e., a prescribed Hamiltonian H or flow field V), we have to formulate the autonomous dynamics generated by the particles' interactions, where the Hamiltonian is no longer a given function, but a part of the dynamical variables.

In the statistical model discussed in the preceding section, the interactions among particles are assumed to be random. That is, a test particle—a representative, in a statistical sense, of the group of interacting particles—is thought to move under random force. The stochastic process leads the transition probability to the homogeneous (equal-weight) distribution. However, it is a rough assumption to consider that all interactions are just random. Collective phenomena, such as flows and waves, occur because the particles composing a system do not move quite at random but have a certain order, or coherence. Isn't it possible, then, to separate particle interactions into a random part and an ordered (structured) part? On the microscopic level, the particles are not moving under a complete order; they are almost disordered. But on the macroscopic level, the random part of microscopic motion is averaged out to zero, so that a coherence, even a slight tendency, of motion emerges as a collective motion. The mechanics of such collective motion may be described by a macroscopic interaction, the commensurable part of microscopic interactions.
3.3.2 A Model of Collective Motion

In the framework of classical mechanics, the state of a single particle is represented by a six-dimensional vector x = ᵗ(q, p), where q and p are the vector position (coordinates) and the momentum of a particle. The state space of a system of N particles (i.e., Γ-space) is the product space composed of each particle's state space, with a dimension, thus, of 6 × N. An exact state of the system is represented by a single vector of Γ-space and its evolution is described by an orbit (curve) there. The Hamiltonian governing the motion in Γ-space is an "encyclopedia" containing a huge amount of information about the system, which is represented by a function of all the state variables (and time, if the system is not autonomous).
Though the description of mechanics in Γ-space is perfect as an "idea," it is practically impossible to describe and analyze events because of its tremendous degree of freedom. Therefore we invoke the notion of a test particle and describe its motion in μ-space. In the preceding section, we assumed that the interactions of a test particle and other (myriads of) particles are random. But here, we are trying to take out the collective component from the complex interactions of particles through a more careful analysis of macroscopic dynamics.

Because the state variables are common to all particles, we may put all particles into μ-space. First we assume that the motions of all particles are already known (though it is of course an unfounded assumption)—these particles moving in μ-space are called field particles. We put a virtual particle as a test particle in μ-space, which moves under the influence of the field particles; the field particles are the causes of the motion of the test particle (the field particles must be a priori objects). In the Chap. 2 discussion, the "field" was assumed to be a known function included in the Hamiltonian. In the preceding section, the "field" was also a priori—it was assumed to exert random force on the test particle. Neither of these assumptions applies here; the motion of the field particles, which was assumed to be transcendental, is indeed what we want to determine (in some appropriate sense). Thus, our aim is to establish "consistency" such that

$$\text{set of test particles} = \text{set of field particles}. \tag{3.44}$$
We have to interpret this relation in a "statistical" sense, because knowing the exact motion of all field particles is a matter of impossibility—let us formulate the problem explicitly. We assign a number to each field particle and denote the state vector of the jth particle by x_j(t) = ᵗ(q_j(t), p_j(t)) (j = 1, ⋯, N). We denote by x(t) = ᵗ(q(t), p(t)) the state vector of a test particle whose motion is determined by the forces given by the field particles (the field particles are not influenced by the test particle, because they are given a priori). The Hamiltonian of the test particle may be written as H_μ(q, p; x₁, ⋯, x_N). The equation of motion governing the test particle is

$$\frac{d}{dt}\begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} \partial_p H_\mu \\ -\partial_q H_\mu \end{pmatrix}. \tag{3.45}$$
Providing a variety of initial conditions, we obtain a group of test particles. Different test particles obey the same equation of motion (the difference is only in the initial condition), which means that they do not interact; recall the discussions in Sect. 2.4.3. The collective equation of motion (2.116), determining a constant of motion u(x, t), now reads as

$$\frac{\partial u}{\partial t} + \{H_\mu, u\} = 0. \tag{3.46}$$
Let us see an example of a μ-space (i.e., test particle's) Hamiltonian H_μ. We consider a galactic system consisting of many "particles" that have a common mass m (a positive constant) for simplicity. The gravitational interactions of particles dominate the evolution of the system. We formulate the equation of motion within a nonrelativistic framework. The gravitational potential (Newtonian potential) produced by the jth particle is

$$U_j(q, t) = \frac{-mG}{|q - q_j(t)|},$$
where G is the gravitational constant. This U_j(q, t) is the solution of Poisson's equation (potential equation) with a point mass:

$$\Delta U_j = mG\,\delta(q - q_j), \tag{3.47}$$

where Δ is the Laplacian with respect to the coordinates (q₁, q₂, q₃). Because we are considering a point mass, the potential U_j(q, t) diverges at the position of the particle q = q_j(t). While we cannot evaluate −∂_q U_j at the position of the particle, we assume that the gravitational force is zero at that point. Because the potential equation (3.47) is linear with respect to the distribution of the mass, the total potential energy produced by all particles is given by U(q, t) = Σ_j U_j(q, t). Adding the kinetic energy of the test particle, we obtain the Hamiltonian:
$$H_\mu = \frac{|p|^2}{2m} - m^2 G\sum_{k}\frac{1}{|q - q_k(t)|} = \frac{|p|^2}{2m} + mU(q, t). \tag{3.48}$$
If we knew the orbits of the field particles, we would be able to define a "deterministic" distribution function (recall the deterministic limit (3.19) of the transition probability):

$$f_K(x, t) = \sum_{j=1}^{N}\delta(x - x_j(t)). \tag{3.49}$$
This formal expression of the distribution of the field particles is called the Klimontovich distribution function. We easily verify that f_K(x, t) is one possible solution of the equation of collective motion (3.46) (see Problem 3.3). Remember that (3.46) is the equation determining the constants of motion of the "test particle"; the density distribution u(x, t) of test particles is certainly a constant of motion (see Note 2.3). The fact that f_K(x, t)—the distribution of the field particles—satisfies (3.46) means there is exact "consistency" (3.44). As we have noted, however, it is impossible to know the orbits of all particles. Equation (3.46) of collective motion is only a virtual form extending the microscopic law to a macroscopic one, which cannot be solved in a practical sense. The Klimontovich distribution function is also just a conceptual representation.
To construct a model that is really solvable, we appeal to a statistical framework. Instead of the exact data of the field particles, we use the probability density (distribution function) of the field particles and estimate the averaged (in a statistical sense) field. The Hamiltonian is, then, defined by this averaged field, and we shall denote it by H̄_μ. Suppose that the "statistical distribution" (not the exact distribution f_K(x, t)) of the field particles is given by f(x, t) = f(q, p, t). The coordinate-space density is, then, a statistical distribution given by

$$\rho(q, t) = \int f(q, p, t)\,dp. \tag{3.50}$$
The averaged field (gravitational potential) is determined by Poisson's equation

$$\Delta\bar{U} = mG\rho(q, t). \tag{3.51}$$
We note that the distribution of the point mass in (3.47) has been replaced by a smooth function ρ(q, t). Accordingly, the potential field Ū(q, t) is also regularized (see Fig. 3.5).¹⁵ Using Ū for the potential energy in the Hamiltonian of a test particle, we define

$$\bar{H}_\mu = \frac{|p|^2}{2m} + m\bar{U}(q, t). \tag{3.52}$$
The corresponding equation of collective motion becomes

$$\frac{\partial u}{\partial t} + \{\bar{H}_\mu, u\} = 0. \tag{3.53}$$
The statistical distribution f of the field particles must be a constant of motion. Hence, "consistency" (3.44) now demands that u = f satisfy the equation of collective motion (3.53) and the field equations (3.50) and (3.51) simultaneously. The equation of collective motion (3.53) that uses the averaged field is called Vlasov's equation.

Let us compare the Hamiltonian H_μ of the particle picture and H̄_μ of the statistical formulation. The irregular potential field of the former model is replaced by a smooth (averaged) field in the latter one (see Fig. 3.5). Detailed information about the exact positions of individual particles is lost in the statistical model. By this reduction (coarse graining), the statistical model becomes a practical problem.
¹⁵ The left-hand side of Poisson's equation (3.51) is the second-order derivatives of Ū, which must balance with the right-hand side mGρ. After differentiating twice, mGρ is more irregular than Ū. To put it another way, Ū is smoothed by being integrated twice. For systematic study of the theory of potentials, the reader is referred to Adams & Hedberg [2] and Gilbarg & Trudinger [7].
Fig. 3.5 (a) The irregular potential produced by many “particles” and (b) the smooth potential produced by a smooth statistical distribution of “density.” These potentials are given by solving Poisson’s equation (here the dimension of space is two, and the sign of the right-hand side term is chosen to give a convex potential)
The Vlasov–Poisson system (3.53), (3.50), and (3.51) cannot describe microscopic (particle-to-particle) interactions (in the next subsection, we shall try to recover the effect of microscopic interactions in a statistical sense), but can model the "autonomous" nature of a macroscopic system. The Hamiltonian H̄_μ is not an a priori blueprint of dynamics, but a "harmonious rule" that is self-determined by the collective motion itself.
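The smoothing step that takes Fig. 3.5(a) into Fig. 3.5(b) can be imitated in one dimension. The sketch below (all values illustrative; a 1D caricature, not the 2D computation of the figure) solves U'' = mGρ by integrating twice, once for delta-like point masses and once for a coarse-grained density, and compares the roughness of the two potentials (cf. footnote 15):

```python
import numpy as np

mG = 1.0
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

rho_points = np.zeros_like(x)
for xp in (0.31, 0.52, 0.68):             # three point "particles"
    rho_points[int(xp / dx)] += 1.0 / dx  # delta-like density spikes

# Coarse graining: convolve the same mass with a narrow Gaussian kernel.
kernel = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)
kernel /= kernel.sum() * dx
rho_smooth = np.convolve(rho_points, kernel, mode="same") * dx

def solve_poisson(rho):          # U'' = mG*rho: integrate twice
    U1 = np.cumsum(mG * rho) * dx
    return np.cumsum(U1) * dx    # U, up to an irrelevant linear gauge

U_irr = solve_poisson(rho_points)
U_reg = solve_poisson(rho_smooth)
# Roughness = largest second difference: huge for point masses, small for
# the averaged field, though integrating twice left both U's continuous.
print(np.max(np.abs(np.diff(U_irr, 2))), np.max(np.abs(np.diff(U_reg, 2))))
```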
3.3.3 A Statistical Model of Collisions

In the preceding subsection, we formulated Vlasov's equation (3.53) as a model of the collective motion of a many-particle system. This model cannot describe the particle-to-particle direct interactions, i.e., scattering by the irregular potential field near particles (see Fig. 3.5). In a system with almost no collisions, Vlasov's equation is useful,¹⁶ but when collisions are not negligible, we need an appropriate extension. Needless to say, we should not plan to solve the particle interactions directly. A practical and plausible method is to model the effect of collisions in terms of a random force (see Sect. 3.2.1). The effect of a random force can be represented by adding a diffusion term to Vlasov's equation. This type of equation appears universally when we formulate an evolution equation governing a transition probability
¹⁶ A collisionless model works well as a good approximation for standard galactic systems or high-temperature plasmas (see, for example, Hazeltine & Waelbroeck [12], Nishikawa & Wakatani [16], and Van Kampen & Felderhof [20] for theoretical modeling of plasmas and applications).
(cf. Sect. 3.2.2). Leaving the derivation of the equation for Note 3.6, we show here the form of the equation and explain the effect of collisions.

Let us denote by p(t, x₀, x) the probability density of transition from x₀ to x after time t. We put

$$a(x_0, t) = \int (x - x_0)\,p(t, x_0, x)\,dx, \qquad b(x_0, t) = \int |x - x_0|^2\,p(t, x_0, x)\,dx$$

and define (rewriting x₀ as x)

$$V(x) = \lim_{\delta_t\to 0}\frac{1}{\delta_t}a(x, \delta_t), \tag{3.54}$$

$$D(x) = \lim_{\delta_t\to 0}\frac{1}{2\delta_t}b(x, \delta_t). \tag{3.55}$$
The evolution of the transition probability density p(t, x₀, x) is described by Kolmogorov's equation
$$\frac{\partial p}{\partial t} + \nabla \cdot (V p) = \Delta(D p), \qquad (3.56)$$
where the differential operators are the derivatives with respect to $x = {}^t(q, p)$. The transition probability density p(t, x₀, x) contains the initial position x₀ as a parameter, which, however, is not essential for the later discussion. We consider a certain initial distribution f₀(x₀) of test particles and define a distribution function f(x, t) by
$$f(x, t) = \int p(t, x_0, x)\, f_0(x_0)\, dx_0. \qquad (3.57)$$
By (3.16), $\int f_0(x_0)\, dx_0 = \int f(x, t)\, dx = N$ (the number of particles). The governing equation of f(x, t) is derived by multiplying both sides of (3.56) by f₀(x₀) and integrating over x₀. Because V and D are independent of x₀, we easily obtain
$$\frac{\partial f}{\partial t} + \nabla \cdot (V f) = \Delta(D f), \qquad (3.58)$$
which is called the Fokker–Planck equation. A unique property of (3.58), in comparison with (2.64) and other "wave equations" introduced in Note 2.2, is that it includes a second-order derivative term. This type of PDE is called a diffusion equation. The coefficient D is a diffusion coefficient, which is related to the mean-square displacement (by its definition (3.55), D ≥ 0).
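To make the character of (3.58) concrete, here is a minimal numerical sketch (my illustration, not from the text; the grid, coefficients, and boundary treatment are arbitrary choices): an explicit finite-difference integration of the one-dimensional Fokker–Planck equation with constant V and D. The mean of f drifts with velocity V while the variance grows like 2Dt—the diffusive spreading that the following paragraphs identify with irreversibility.

```python
import numpy as np

# Minimal explicit finite-difference sketch of the 1D Fokker-Planck
# equation  df/dt + d(V f)/dx = d^2(D f)/dx^2  with constant V, D.
# All parameters are illustrative; periodic boundaries via np.roll.
V, D = 0.5, 0.1
L_box, nx = 40.0, 400
dx = L_box / nx
dt = 0.25 * dx**2 / D          # explicit stability: dt <= dx^2/(2 D)
x = np.linspace(-L_box/2, L_box/2, nx, endpoint=False)

f = np.exp(-x**2 / 0.5)
f /= np.sum(f) * dx            # normalize: integral of f dx = 1

def moments(f):
    m = np.sum(x * f) * dx
    return m, np.sum((x - m)**2 * f) * dx

for step in range(1, 801):
    adv  = (np.roll(V*f, -1) - np.roll(V*f, 1)) / (2*dx)       # d(Vf)/dx
    diff = (np.roll(D*f, -1) - 2*D*f + np.roll(D*f, 1)) / dx**2
    f = f + dt * (diff - adv)
    if step % 400 == 0:
        mean, var = moments(f)
        print(f"t={step*dt:5.1f}: mean={mean:5.2f} (= V t), "
              f"var={var:5.2f} (= var0 + 2 D t)")
```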
The inclusion of the second-order derivative term Δ(D f) changes the mathematical properties of the PDE drastically; an evident change is the violation of temporal reversibility. If a PDE consists of only first-order differentials ∂/∂t and ∂/∂x_j (like a wave equation), temporal reversal (t → −t) can be neutralized by spatial reversal (x_j → −x_j), i.e., the equation is invariant under the combined transformations. Physically, this means that the phenomenon described by such a PDE is temporally reversible. The diffusion equation, however, is temporally irreversible. In fact, the aforementioned transformation changes the sign of the diffusion term (D → −D), violating the intrinsic non-negativity of D imposed by the definition (3.55). Mathematically, if the diffusion coefficient is negative, the diffusion equation (3.58) cannot be solved. This fact is intuitively evident—diffusion is a stochastic process dissipating "information" (cf. Sects. 3.2.3 and 3.2.4). Tracking back time requires recovery of the lost information, which is impossible.¹⁷

Finally, let us see the relation between the Fokker–Planck equation (3.58) and Vlasov's equation (3.53). As we noted in Sect. 3.2.2, the "deterministic limit" of the transition probability recovers the notion of orbits; see (3.19). We consider a test particle whose motion is determined by the averaged Hamiltonian $\bar{H}_\mu$. Suppose that a test particle starts from x₀ and moves to x(t) at time t. The transition probability then localizes on the orbit, so we may write p(t, x₀, x) = δ(x − x(t)). Expressing the deterministic equation of motion (with the Hamiltonian $\bar{H}_\mu$) as dx/dt = V, we may write, for an infinitesimal δt, p(δt, x₀, x) = δ(x − x₀ − V δt). Using this "probability" in (3.54) and (3.55), we find that the drift coefficient (3.54) coincides with the Hamiltonian flow velocity V, and D = 0. Hence the Fokker–Planck equation (3.58) reduces to
$$\frac{\partial f}{\partial t} + \nabla \cdot (V f) = 0.$$
By Liouville's theorem (Note 2.3), ∇ · V = 0, hence we obtain
$$\frac{\partial f}{\partial t} + (V \cdot \nabla) f = 0, \qquad (3.59)$$
which is equivalent to Vlasov's equation (3.53). We note that Vlasov's equation is not purely "deterministic," because it uses the statistically averaged Hamiltonian $\bar{H}_\mu$.
¹⁷ The initial-value problem of a temporally reversed diffusion equation is often called an inverse problem, which is a typical ill-posed problem. However, it is an interesting and practically important problem if viewed as a problem of guessing missing information (see for example Kirsch [13]).
Notes

Note 3.1 (Attractor)

An "attractor" is literally a domain (set), in state space, to which the state vector is attracted. In this note we give a formal definition of an attractor. Let us consider a dynamical system represented by a one-parameter (t ∈ R) continuous map T(t) on a state space X. The orbit that passes a point x̂ ∈ X at time t = 0 is given by x(x̂, t) = T(t)x̂. Here, we assume that T(t) is defined also for t < 0. Let Ω ⊂ X be an open set that contains initial points. At time t, T(t) maps Ω to T(t)Ω = {T(t)x̂; x̂ ∈ Ω}. A closed set A ⊂ X is called an invariant set if T(t)A ⊆ A for all t ∈ R (i.e., every orbit rooted in A never leaves A). An invariant set A is called an attractor if A is included in an open set B that satisfies the following two conditions: (1) for every t ≥ 0, T(t)B ⊆ B; (2) for every open set Ω that includes A, there exists τ such that T(t)B ⊆ Ω for all t ≥ τ. The union of all such sets B (the sets that satisfy the above conditions) is called a basin. Because an attractor A is an invariant set, A ⊆ T(t)B (∀t ≥ 0), i.e., T(t)B cannot shrink smaller than A. By condition (2), we find that
$$\bigcap_{t \ge 0} T(t)B = A.$$
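As a concrete illustration (not from the text), the following sketch iterates the Hénon map—a standard dissipative, discrete-time system—from several initial points of its basin; every orbit lands on the same bounded invariant set, i.e., the attractor.

```python
import numpy as np

# Iterating the Henon map (a, b = 1.4, 0.3, the classical values):
# orbits started from different points of the basin converge onto the
# same bounded invariant set -- the attractor A of Note 3.1.
a, b = 1.4, 0.3

def henon(x, y):
    return 1.0 - a * x**2 + b * y, x

rng = np.random.default_rng(0)
for trial in range(3):
    x, y = rng.uniform(-0.5, 0.5, size=2)   # initial point in the basin
    for _ in range(1000):                   # transient: fall onto A
        x, y = henon(x, y)
    pts = []
    for _ in range(5):                      # a few points on the attractor
        x, y = henon(x, y)
        pts.append((round(x, 3), round(y, 3)))
    print(f"orbit {trial}: late-time points {pts}")
# All late-time points lie on the same folded set: T(t)B shrinks onto A.
```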
Note 3.2 (Hausdorff (Fractal) Dimension)

The notion of "dimension" is related to the "measures" of the theory of integration. For example, length is evaluated by the one-dimensional measure, and area (square measure) is evaluated by the two-dimensional measure. The dimension of a measure can be generalized to an arbitrary real number. Suppose that we measure the size of an object (set) A ⊂ Rⁿ. For instance, its size may be infinite if measured by the one-dimensional measure, while it is zero if measured by the two-dimensional measure. Then there must be a certain "dimension" d, between 1 and 2, such that a finite size is evaluated by the d-dimensional measure. A rigorous mathematical definition of such an intermediate dimension of measure is given by the following Hausdorff dimension.

Let A ⊂ Rⁿ be a bounded (non-empty) closed set. We denote by B_j an n-dimensional ball and by r(B_j) its radius. We say that a set {B_j; j = 1, ···, k} covers A if
$$A \subseteq \bigcup_{j=1}^{k} B_j.$$
Restricting r(B_j) < δ (∀j), we try to cover A by a countable number of balls. We define, for a real number d (≤ n),
$$\mu_\delta^d(A) = \inf_{\substack{A \subseteq \cup B_j \\ r(B_j) < \delta}} \sum_j \alpha_d\, r(B_j)^d,$$
where $\alpha_d = \pi^{d/2}/\Gamma(d/2 + 1)$ is the volume of the d-dimensional unit ball. With a larger δ, the coverage has a larger freedom, so $\mu_\delta^d(A)$ may have a smaller value. Hence $\mu_\delta^d(A)$ is a monotone decreasing function of δ. Therefore we can define
$$\mu^d(A) = \lim_{\delta \to 0} \mu_\delta^d(A),$$
which is called the Hausdorff measure of A. The Hausdorff dimension of A is the infimum of d such that μᵈ(A) = 0, i.e.,
$$\dim_H(A) = \inf\{d \ge 0;\ \mu^d(A) = 0\} = \inf\{d \ge 0;\ \mu^d(A) < \infty\} = \sup\{d \ge 0;\ \mu^d(A) > 0\} = \sup\{d \ge 0;\ \mu^d(A) = \infty\}.$$
There are other definitions of intermediate dimensions; for example, one may use the scaling of the number of balls needed for coverage vs. the size of the balls (see Sect. 4.2.3). For systematic study of geometric measure theory, the reader is referred to Morgan [15].
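The covering in the definition is awkward to compute directly, but the ball-counting variant mentioned above is easy to try numerically. The following sketch (my illustration, not from the text) estimates the dimension of the middle-thirds Cantor set, whose Hausdorff dimension is ln 2/ln 3 ≈ 0.631.

```python
import numpy as np

# Box-counting estimate for the middle-thirds Cantor set. Points are
# sampled via random ternary expansions with digits in {0, 2}; the
# expected exponent is ln 2 / ln 3 ~ 0.6309.
rng = np.random.default_rng(1)
digits = rng.choice([0, 2], size=(200_000, 20))          # 20 ternary digits
points = (digits * 3.0 ** -np.arange(1, 21)).sum(axis=1)

for k in range(2, 9):
    eps = 3.0 ** -k                                      # box size
    n_boxes = len(np.unique(np.floor(points / eps)))     # occupied boxes
    print(f"eps=3^-{k}: N={n_boxes:4d}, "
          f"ln N / ln(1/eps) = {np.log(n_boxes) / np.log(1/eps):.3f}")
```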
Note 3.3 (Order and Chaos in Infinite-Dimensional Linear Systems)

If a system has a very high degree of freedom, its motion may be considerably complex even when the governing equation of motion is linear. In an infinite-dimensional state space (cf. Notes 1.1 and 2.1), events may have unlimited diversity. In Sect. 2.4, we defined "order" by integrability, which means that the system has as many constants of motion as its degree of freedom. Even if superficial phenomena are complex, the system is integrable if it can be decomposed (reduced) into independent modes. In a linear system, the mathematical processes of (1) the decomposition of the system into modes, (2) the integration of the equation of motion of each mode, and (3) the reconstruction or composition of the modes can be done by (1) the spectral resolution of the generator by solving the eigenvalue problem, (2) generating exponential functions, and (3) the vector composition of all the modes, respectively. For a finite-dimensional system, these processes are straightforward (see Sect. 2.3.2). However, if the degree of freedom is infinite, they are not simple. Even if the system is decomposable and the motion of each mode is a simple exponential function, it is questionable whether the sum (superposition) of an infinite number of the simple motions is actually simple. In fact, in an infinite-dimensional system, integrability does not necessarily imply simplicity in the phenomenological realm. Complex phenomena of infinite-dimensional linear systems are called "quantum chaos" or, more generally, the "chaos of waves."

In an infinite-dimensional space, the problem of complexity pertains to the following questions. An abstract theory may assert the "existence" of a complete set of constants of motion; however, it is generally rather difficult to find their explicit forms. The composition, then, of such an infinite number of modes is purely virtual. How can we deduce concrete knowledge about the behavior or predict the future state from such a theoretical construction? For example, an abstract theory may be able to prove the existence of the sum of infinitely many modes by Cauchy's convergence test. However, we cannot calculate the limit of the sequence of the sum if we do not know an explicit "regularity" of the sequence; we cannot understand the "limit" without concrete knowledge of an order (structure) of the infinite modes.

Due to von Neumann's theorem, Schrödinger's equation can be solved by the spectral resolution of the generator (the quantized Hamiltonian) (see Note 2.1). If the spectra of the generator consist of only point spectra (eigenvalues), this infinite-dimensional linear dynamical system can be decomposed into an infinite number of independent harmonic oscillators. The energy and the initial phase of each oscillator are the constants of motion (cf. the example of (2.96)). This system is, thus, integrable. However, it is generally difficult to execute the spectral resolution. The abstract theorem just guarantees the possibility of integrating the motion by an infinite number of eigenvalues and eigenfunctions. The "limit" (the infinite sum (2.102) or the integral (2.105)) can actually be understood, without executing the infinitely many calculations, only if the distribution of the eigenvalues and the structures of the eigenfunctions have some clear order (regularity) like an arithmetic or geometric progression. In general, eigenvalues do not distribute regularly.

The "Sinai billiard" example gives an intuitive understanding of chaos in a quantum-mechanical system. We consider a stadium-like domain Ω that is surrounded by a high fence (see Fig. 3.6). We put a ball in Ω and observe its motion. In a classical-mechanical picture, we can imagine chaotic motion of the billiard ball. If this system is so small that quantum mechanics applies, we must study the wave function representing the trapped particle. In such a system, the Hamiltonian has only point spectra, and hence the wave function is given by the composition of an infinite number of eigenfunctions, each of which oscillates with a given eigenfrequency.
Fig. 3.6 An example of classical-mechanical chaotic motion in a so-called “Sinai billiard.” In a quantum-mechanical microscopic situation, the wave function of an electron trapped in a tablelike domain generates complex structure by interferences of reflected waves. A similar structure appears in macroscopic electromagnetic waves captured in a cavity
The distribution of the eigenfrequencies (the eigenvalues of the Hamiltonian) is shown to be disordered; thus the wave function is a composition of an infinite number of oscillations with irregular frequencies. Many physicists are seeking the origin of "quantum chaos" in the complexity of the spectral distribution of the Hamiltonian. It is expected that the irregularity in the distribution of the spectra is related to the chaos of the orbits (rays) in the classical limit (or eikonal approximation) [5, 10, 11, 14, 19]. In mathematical spectral theory, a wave that can be decomposed by point spectra (eigenvalues) is "integrable" in a sense (cf. Note 2.1 and Reed & Simon [17]); thus, the use of the word "chaos" in that territory may meet with some disapproval. If the generator has continuous spectra, however, the discussion of chaos is not so constrained. If the eigenspace of the point spectra does not cover the entire space, so that the complementary space belonging to the continuous spectra still contains an infinite degree of freedom, then chaos can freely develop in this subspace. Physically, a point spectrum represents the energy (frequency) of a wave function that is trapped by a potential well (the localized wave function behaves as a standing wave that may be viewed as a harmonic oscillator). On the other hand, a continuous spectrum describes a non-stationary (transient) wave that moves in unbounded space; chaos—true movement—occurs in this range of energy.

A generator (Hamiltonian) of a quantum-mechanical system is a self-adjoint operator (cf. von Neumann [21]). In a more general wave system, a non-self-adjoint generator may appear. A general matrix, a linear operator in a finite-dimensional space, can be reduced to a Jordan canonical form, and the structure of its exponential function is well understood (see Sect. 2.3.3). However, there is no general theory that enables reduction of an arbitrary operator in an infinite-dimensional space. The analysis of a general non-self-adjoint generator in an infinite-dimensional space remains undeveloped. There are a variety of interesting phenomena produced by non-self-adjoint generators.¹⁸

¹⁸ Nonlinear waves are generally disordered, but some nonlinear equations are integrable in the sense that we may formulate a process for calculating the complete data to construct the wave forms (see for example Ablowitz & Segur [1]).
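A standard numerical probe of such spectral irregularity (a generic illustration, not from the text) is the nearest-neighbor spacing statistics of random-matrix eigenvalues (cf. Mehta [14]). The sketch below contrasts a random symmetric matrix—a crude caricature of a "chaotic" Hamiltonian—with uncorrelated levels; the former exhibits level repulsion (a deficit of small spacings).

```python
import numpy as np

# Nearest-neighbor level spacings: a random symmetric matrix (crude
# "chaotic" Hamiltonian) versus independent uniform levels (Poisson).
rng = np.random.default_rng(2)
n = 1000

A = rng.normal(size=(n, n))
levels = np.sort(np.linalg.eigvalsh((A + A.T) / 2))
bulk = np.diff(levels[n//4 : 3*n//4])        # spacings in the bulk
s_rmt = bulk / bulk.mean()                   # normalized to mean 1

s_poi = np.diff(np.sort(rng.uniform(0, 1, size=n)))
s_poi = s_poi / s_poi.mean()

for s, name in [(s_rmt, "random matrix"), (s_poi, "Poisson levels")]:
    print(f"{name:14s}: fraction of spacings < 0.25 -> {(s < 0.25).mean():.3f}")
# Level repulsion: the matrix spectrum has far fewer near-degeneracies.
```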
Note 3.4 (Wiener Process)

A motion described as a probabilistic (stochastic) process is called a Brownian motion, commemorating the botanist R. Brown (1773–1858), who studied the irregular motion of pollen floating on water. In mathematics, such a random process is formulated as a Wiener process. Let Rⁿ be the state space. A Wiener process is a set of motions (we denote by x(t) a sample of motion) that satisfy the following conditions:

1. For all t₀ < t₁ < ··· < t_N, every increment x(t_ℓ) − x(t_{ℓ−1}) (ℓ = 1, ···, N) is an independent statistical variable.
2. Each component x_j(t) (j = 1, ···, n) of the vector x(t) is an independent statistical variable, and the probability density of x_j(t) − x_j(s) is given by the normal distribution $\propto e^{-|x_j(t) - x_j(s)|^2 / (2|t - s|)}$.

A sample motion x(t) is continuous, but (almost surely) nowhere differentiable; an increment starting at time t has no correlation with the past history. If we define the force that generates such a motion, it appears as a time series of random impulses, which is the term G in (3.15). The statistical equation (3.15) has another term representing a conventional force F. By the combination of F and G, a variety of random motions may be generated. The Brownian motion is explained by a model that includes a friction force F = −cq (c is a positive constant representing the coefficient of friction); the corresponding statistical motion is called the Ornstein–Uhlenbeck process (see Note 3.7). For systematic study of statistical evolution equations, the reader is referred to Feller [6] and Soong [18].
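The two conditions are easy to realize numerically by accumulating independent Gaussian increments. The following sketch (illustrative parameters, not from the text) generates sample paths and checks that the variance of x(t) grows linearly in t; the diverging ratio |dx|/dt hints at the non-differentiability of the paths.

```python
import numpy as np

# Sample paths of a 1D Wiener process: x(t_k) is a cumulative sum of
# independent Gaussian increments of variance dt (conditions 1 and 2).
rng = np.random.default_rng(3)
n_paths, n_steps, dt = 5_000, 1_000, 1e-3

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
x = np.cumsum(dW, axis=1)

for k in (100, 400, 900):
    print(f"t={k*dt:.1f}: var[x(t)] = {x[:, k-1].var():.4f} "
          f"(theory: {k*dt:.1f})")

# Roughness of the paths: the difference quotient |dx|/dt scales like
# 1/sqrt(dt), diverging as dt -> 0 (non-differentiability).
print(f"mean |dx|/dt = {np.abs(dW).mean() / dt:.1f}  (~ sqrt(2/(pi dt)))")
```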
Note 3.5 (Entropy)

In Sect. 3.2.3, we gave a justification for the principle of equal weight by the H-theorem. In the proof, we need only the convexity of h(p) for the H-function. If we employ a different convex function to define another H-function, the principle of equal weight still holds, but the variational principle (3.35) yields a different distribution function. This ambiguity of the H-function is eliminated when it is related to entropy in order to consider the equal-weight probability distribution on a micro-canonical ensemble (see Sect. 3.2.4). We have yet to discuss the reason why we take the logarithm of W to define entropy.

Let C be a macroscopic system. Suppose that a "macroscopic" quantity F_C is defined by summing a certain microscopic physical quantity over every particle; cf. (3.41). The system C may be divided into subsystems: A ∪ B = C (A ∩ B = ∅). Then the summations over the subsystems A and B (which we denote by F_A and F_B, respectively) must be related to F_C by
$$F_C = F_A + F_B. \qquad (3.60)$$
A macroscopic quantity is called an extensive variable if it satisfies the additivity relation (3.60). The number of combinations, though, is decomposed into a product over the subsystems; denoting the total numbers of possible states of the sets A, B, and C by W_A, W_B, and W_C, respectively, we have
$$W_C = W_A \cdot W_B. \qquad (3.61)$$
The terms contributing to the variational principle (3.39) must be on an equal footing, i.e., all terms must be extensive variables. Hence we have to map W to an extensive variable. This is done by taking the logarithm of W; we define
$$S = k_B \log W, \qquad (3.62)$$
which is called entropy. The coefficient k_B has the dimension relating temperature and energy (in standard units, k_B = 1.38 × 10⁻²³ J/K). The reader is referred to Beck and Schlögl [3] for the axiomatic characterization of entropy functions (and generalizations), their information-theoretic implications, and the application to statistical studies of chaotic dynamics.
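The role of the logarithm can be verified with a toy count of states (an illustration of (3.60)–(3.62), not from the text): for independent subsystems the state counts multiply, so log W—unlike W itself—adds.

```python
import math

# Toy check of extensivity: state counts multiply over independent
# subsystems (3.61), so S = k_B log W adds (3.60), while W does not.
k_B = 1.380649e-23  # J/K (SI value)

def count_states(n_particles, states_per_particle=4):
    # crude count for independent particles; purely illustrative
    return states_per_particle ** n_particles

W_A, W_B = count_states(10), count_states(15)
W_C = W_A * W_B                       # (3.61) for C = A u B

S_A = k_B * math.log(W_A)
S_B = k_B * math.log(W_B)
S_C = k_B * math.log(W_C)
print(f"S_A + S_B = {S_A + S_B:.4e} J/K")
print(f"S_C       = {S_C:.4e} J/K  (additive, per (3.60))")
print(f"W_A + W_B = {W_A + W_B:.3e}  !=  W_C = {W_C:.3e}")
```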
Note 3.6 (Kolmogorov (Fokker–Planck) Equation)

Kolmogorov's equation (or the Fokker–Planck equation) is derived from the Chapman–Kolmogorov equality (the causality relation of transition probabilities). For simplicity, we assume that the dimension of the state space is one. We denote the state variable by x (∈ R) and the transition probability density by p(t, x₀, x). Let ϕ(x) be an arbitrary smooth function which has a compact support {x; ϕ(x) ≠ 0}. The expectation of ϕ(x) is given by
$$\bar{\varphi}(t) = \int \varphi(x)\, p(t, x_0, x)\, dx.$$
Let us calculate the temporal derivative of φ̄(t):
$$\frac{d}{dt}\bar{\varphi}(t) = \int \varphi(x)\, \frac{\partial}{\partial t} p(t, x_0, x)\, dx = \lim_{\delta t \to 0} \frac{1}{\delta t} \int \varphi(x)\, \left[\, p(t + \delta t, x_0, x) - p(t, x_0, x) \,\right] dx. \qquad (3.63)$$
Using the Chapman–Kolmogorov equality (3.18), we rewrite the right-hand side as
$$\lim_{\delta t \to 0} \frac{1}{\delta t} \left[ \int\!\!\int \varphi(x)\, p(t, x_0, y)\, p(\delta t, y, x)\, dy\, dx - \int \varphi(x)\, p(t, x_0, x)\, dx \right]. \qquad (3.64)$$
Taylor-expanding ϕ(x) in terms of (x − y), and approximating to order (x − y)², we may further rewrite (3.64) as
$$\lim_{\delta t \to 0} \frac{1}{\delta t} \int \left[ \varphi'(y)\, a(y, \delta t) + \frac{\varphi''(y)}{2}\, b(y, \delta t) \right] p(t, x_0, y)\, dy \qquad (3.65)$$
with
$$a(y, \delta t) = \int (x - y)\, p(\delta t, y, x)\, dx, \qquad b(y, \delta t) = \int (x - y)^2\, p(\delta t, y, x)\, dx.$$
We define (cf. (3.54) and (3.55))
$$V(x) = \lim_{\delta t \to 0} \frac{1}{\delta t}\, a(x, \delta t), \qquad D(x) = \lim_{\delta t \to 0} \frac{1}{2\,\delta t}\, b(x, \delta t).$$
Using these functions, (3.65) reads as
$$\int \left[ \varphi'(y)\, V(y) + \varphi''(y)\, D(y) \right] p(t, x_0, y)\, dy. \qquad (3.66)$$
Integrating by parts, we obtain
$$\int \left\{ -\frac{\partial}{\partial y}\left[ V(y)\, p(t, x_0, y) \right] + \frac{\partial^2}{\partial y^2}\left[ D(y)\, p(t, x_0, y) \right] \right\} \varphi(y)\, dy.$$
Changing the symbol y to x, we may compare this expression with the right-hand side of (3.63): because ϕ(x) is an arbitrary function, the transition probability density must satisfy
$$\frac{\partial}{\partial t} p + \frac{\partial}{\partial x}\left[ V(x)\, p \right] = \frac{\partial^2}{\partial x^2}\left[ D(x)\, p \right]. \qquad (3.67)$$
In multi-dimensional state space, we do the same calculation for each component and obtain Kolmogorov's equation (3.56). In the derivation of Kolmogorov's equation, we approximated the Taylor expansion of ϕ(x) to second order. This approximation puts limitations on the validity of Kolmogorov's equation as the determining equation of a transition probability; the probabilistic properties of transitions are characterized only by the mean a(y, δt) and the mean-square displacement b(y, δt). The model of the Wiener process (cf. Note 3.4) gives the theoretical foundation of the probabilistic description of statistical dynamics (see Feller [6] and Soong [18]).
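Conversely, the coefficients can be read off from sampled transitions. The following sketch (my illustration, not from the text) draws one-step displacements of a process with known drift and diffusion and recovers V and D from the moments a and b, per (3.54) and (3.55).

```python
import numpy as np

# Recover the Kolmogorov coefficients V and D from sampled transitions.
# The process dx = V_true dt + sqrt(2 D_true) dW is stepped once by dt;
# the moments a, b of x - x0 then give V ~ a/dt and D ~ b/(2 dt).
rng = np.random.default_rng(4)
V_true, D_true = 0.7, 0.2
dt, n_samples = 1e-3, 200_000

x0 = 0.0
x = x0 + V_true * dt + np.sqrt(2 * D_true * dt) * rng.normal(size=n_samples)

a = np.mean(x - x0)            # first moment of the displacement
b = np.mean((x - x0) ** 2)     # second moment of the displacement

print(f"V estimate: {a / dt:.3f}  (true {V_true})")
print(f"D estimate: {b / (2 * dt):.3f}  (true {D_true})")
```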
Note 3.7 (Gibbs Distribution and Fokker–Planck Equation)

In Sect. 3.2.4, we derived the Gibbs distribution (3.36) as the minimizer of the H-function (maximizer of entropy). Therefore, the Gibbs distribution is expected to be an "equilibrium" of some stochastic process. Here we demonstrate more explicitly that it is a stationary solution of the Fokker–Planck equation with a simple force term. We consider a test particle that obeys the one-dimensional Langevin equation
$$m\frac{dv}{dt} = -Cv + G(t), \qquad (3.68)$$
where C (> 0) is a coefficient of friction and G(t) is the random force modeling the collision of the test particle with field particles; cf. (3.15). We assume
$$\langle G(t) \rangle = 0, \qquad \langle G(t_1)\, G(t_2) \rangle = 2 m^2 D\, \delta(t_1 - t_2),$$
where ⟨ ⟩ denotes the ensemble average. The friction force drags the test particle toward zero velocity, while the random force causes diffusion in velocity space. The Fokker–Planck equation corresponding to (3.68) is
$$\frac{\partial f}{\partial t} = \frac{\partial}{\partial v}(a v f) + \frac{\partial}{\partial v}\left( D\, \frac{\partial f}{\partial v} \right), \qquad (3.69)$$
where a = C/m. The stationary solution of (3.69) is given by
$$f(v) = \sqrt{\frac{a}{2\pi D}}\, e^{-a v^2 / (2D)}. \qquad (3.70)$$
Comparing (3.70) with the Gibbs distribution (3.36), we find that β⁻¹ = k_B T = mD/a, which relates the strength of the random force to the temperature T. The model (3.69) includes only the friction force as a collective effect; it describes a simple relaxation process toward the ultimate equilibrium, heat death. In the more general model of collective motion (Sect. 3.3.2), internal collective fields (such as the self-gravity in a galactic system or the self-electromagnetic field in a plasma) may produce diverse structures and complex evolution.
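The relaxation to (3.70) can be checked by direct simulation. The sketch below (illustrative parameters, not from the text) integrates the Langevin equation (3.68) by the Euler–Maruyama method and compares the empirical velocity statistics with the Gaussian (3.70).

```python
import numpy as np

# Euler-Maruyama integration of m dv/dt = -C v + G(t), with
# <G(t1) G(t2)> = 2 m^2 D delta(t1 - t2); cf. (3.68). Parameters are
# illustrative. Per step, G/m contributes a kick of std sqrt(2 D dt).
rng = np.random.default_rng(5)
m, C, D = 1.0, 2.0, 0.5
a = C / m
dt, n_steps, n_particles = 1e-3, 5_000, 50_000

v = np.zeros(n_particles)
for _ in range(n_steps):
    v += -a * v * dt + np.sqrt(2 * D * dt) * rng.normal(size=n_particles)

# Compare with the stationary Gaussian (3.70), whose variance is D/a.
print(f"empirical var(v) = {v.var():.4f}, theory D/a = {D/a:.4f}")
print(f"empirical <v^4>/<v^2>^2 = {np.mean(v**4) / v.var()**2:.3f} "
      f"(Gaussian: 3)")
```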
Problems

3.1. Consider an autonomous linear dynamical system (3.3) with a generator A that has a degenerate eigenvalue producing a secular term $t^p e^{t\lambda}$ (we assume that λ has the largest real part (> 0) among all eigenvalues) (see Sect. 2.3.3). Evaluate the maximum Lyapunov exponent of this system.

3.2. Determine the parameters Z (= $e^{1+\alpha}$) and β of the Gibbs distribution (3.36) in terms of the particle number N and the energy E. Show that β can formally take both positive and negative values.

3.3. Show that the Klimontovich distribution function (3.49) solves the equation of collective motion (3.46).

3.4. We consider Liouville's equation with a Hamiltonian ψ(x, y, t):
$$\frac{\partial u}{\partial t} + \{\psi, u\} = 0, \qquad (3.71)$$
where x and y are conjugate variables, and
$$\{a, b\} = \frac{\partial a}{\partial y}\,\frac{\partial b}{\partial x} - \frac{\partial b}{\partial y}\,\frac{\partial a}{\partial x}$$
is a Poisson bracket (cf. Note 2.3). We assume that ψ and u are periodic functions of x and y with period 2π. Fourier-expanding, we write (cf. Note 1.1)
$$u(x, y, t) = \sum_{k,\ell} \hat{u}_{k,\ell}(t)\, e^{i(kx + \ell y)}, \qquad \hat{u}_{k,\ell}(t) = q_{k,\ell}(t) + i\, p_{k,\ell}(t),$$
where both $q_{k,\ell}(t)$ and $p_{k,\ell}(t)$ are real-valued functions.

1. Derive the equation of motion that governs $q_{k,\ell}(t)$ and $p_{k,\ell}(t)$.
2. Show that the volume element (Lebesgue measure) $\prod_{k,\ell} dq_{k,\ell}\, dp_{k,\ell}$ of the infinite-dimensional space of the variables $\{q_{k,\ell}, p_{k,\ell};\ k, \ell \in \mathbb{Z}\}$ is invariant. Such a measure is said to be an invariant measure.

By solving these problems, we find that (1) the original formulation (3.71) of the dynamics, which is represented by a PDE, is rewritten as an infinite-dimensional system of ODEs (indeed, Hamilton's canonical equations), and (2) the latter dynamics is also incompressible in the infinite-dimensional state space of the variables $\{q_{k,\ell}, p_{k,\ell}\}$ (like the former is incompressible in the state space of (x, y)), i.e., Liouville's theorem holds in the infinite-dimensional space; see (2.115). By this fact, we may develop a statistical theory in the state space of $\{q_{k,\ell}, p_{k,\ell}\}$, that is, the "Fourier-space representation" of dynamics (see also Sect. 4.4.3).
Solutions

3.1 We find that
$$\frac{1}{t} \log\left( t^p e^{t\lambda} \right) = \frac{p \log t}{t} + \lambda.$$
Thus, the corresponding Lyapunov exponent is just λ. Therefore the Lyapunov exponent of a linear autonomous system is always given by the maximum of the real parts of the eigenvalues.

3.2 By (3.33), we obtain $Z = N^{-1} \int e^{-\beta E}\, dx$, and by (3.34), $\int E\, e^{-\beta E}\, dx = E Z$. Combining these relations, we obtain
$$\frac{\partial}{\partial \beta} \log\left( \int e^{-\beta E}\, dx \right) = -E/N.$$
Solving this, we obtain β. Then Z is determined.

3.3 For the convenience of notation, we rewrite (3.45) and (3.46), respectively, in abstract forms:
$$\frac{dx}{dt} = V, \qquad (3.72)$$
$$\frac{\partial u}{\partial t} + V \cdot \nabla u = 0, \qquad (3.73)$$
where $V = {}^t(\partial_p H, -\partial_q H)$ is the Hamiltonian flow, and $\nabla = \partial_x = {}^t(\partial_q, \partial_p)$. Let ϕ(x) be a compactly supported smooth function. We define the expectation by
$$\bar{\varphi}(t) = \int \varphi(x)\, f_K(x, t)\, dx = \sum_{j=1}^{N} \varphi(x_j(t)). \qquad (3.74)$$
The rate of change of the expectation is given by
$$\frac{d}{dt}\bar{\varphi}(t) = \sum_{j=1}^{N} \frac{dx_j(t)}{dt} \cdot (\nabla\varphi)\big|_{x = x_j(t)} = \sum_{j=1}^{N} (V \cdot \nabla\varphi)\big|_{x = x_j(t)}, \qquad (3.75)$$
where we have used (3.72). We may write
$$(V \cdot \nabla\varphi)\big|_{x = x_j(t)} = \int (V \cdot \nabla\varphi)\, \delta(x - x_j(t))\, dx. \qquad (3.76)$$
Remembering that f_K(x, t) consists of the terms δ(x − x_j(t)) (j = 1, ···, N), we may rewrite (3.75) as
$$\frac{d}{dt}\bar{\varphi}(t) = \int (V \cdot \nabla\varphi)\, f_K(x, t)\, dx. \qquad (3.77)$$
The left-hand side of (3.77) may be written as $\int \varphi\, (\partial f_K/\partial t)\, dx$. Using Liouville's theorem (2.115), we integrate the right-hand side of (3.77) by parts to obtain $-\int \varphi\, V \cdot \nabla f_K\, dx$. Hence, (3.77) reads as
$$\int \varphi \left( \frac{\partial f_K}{\partial t} + V \cdot \nabla f_K \right) dx = 0. \qquad (3.78)$$
Since ϕ is arbitrary, (3.78) is equivalent to (3.73). We thus find that the Klimontovich distribution function f_K satisfies (3.46).

3.4 Fourier-expanding the Hamiltonian,
$$\psi = \sum_{k,\ell} \hat{\psi}_{k,\ell}(t)\, e^{i(kx + \ell y)},$$
we may write Liouville’s equation (3.71) as a system of ODEs d uˆ k, = ( k − k )ψˆ k , uˆ k−k , − dt k ,
(k, ∈ Z).
The real and imaginary parts of each $\hat{u}_{k,\ell}$ constitute a canonical pair: the above equation reads as Hamilton's canonical equations
$$\frac{d}{dt} q_{k,\ell} = \frac{\partial H}{\partial p_{k,\ell}}, \qquad \frac{d}{dt} p_{k,\ell} = -\frac{\partial H}{\partial q_{k,\ell}},$$
$$H = \sum_{k,\ell} \sum_{k',\ell'} (\ell' k - k' \ell)\, \hat{u}_{k,\ell}\, i\hat{\psi}_{k',\ell'}\, \hat{u}_{k-k',\ell-\ell'}.$$
Thus, the corresponding "Hamiltonian flow," in the state space of the canonical variables $\{q_{k,\ell}, p_{k,\ell};\ k, \ell \in \mathbb{Z}\}$, is incompressible; see (2.115).
References

1. Ablowitz, M.J., Segur, H.: Solitons and the Inverse Scattering Transform, SIAM, Philadelphia (1981)
2. Adams, D.R., Hedberg, L.I.: Function Spaces and Potential Theory, Springer-Verlag, Berlin-Heidelberg (1996)
3. Beck, C., Schlögl, F.: Thermodynamics of Chaotic Systems—An Introduction (Cambridge Nonlinear Science Series 4), Cambridge Univ. Press, Cambridge (1993)
4. Casati, G., Chirikov, B.V., Ford, J.: Marginal local instability of quasi-periodic motion, Phys. Lett. A 77, 91–94 (1980)
5. Dewar, R.L., Cuthbert, P., Ball, R.: Strong "quantum" chaos in the global ballooning mode spectrum of three-dimensional plasmas, Phys. Rev. Lett. 86, 2321–2324 (2001)
6. Feller, W.: An Introduction to Probability Theory and Its Applications (3rd ed.), Vol. I, John Wiley & Sons, New York (1968)
7. Gilbarg, D., Trudinger, N.S.: Elliptic Partial Differential Equations of Second Order, Springer-Verlag, Berlin-Heidelberg (1977)
8. Gilbert, A.D., Childress, S.: Evidence for fast dynamo action in a chaotic web, Phys. Rev. Lett. 65, 2133–2136 (1990)
9. Gilmore, R.G., Lefranc, M.: The Topology of Chaos—Alice in Stretch and Squeezeland, John Wiley & Sons, New York (2002)
10. Gutzwiller, M.C.: Chaos in Classical and Quantum Mechanics, Springer-Verlag, New York (1990)
11. Haake, F.: Quantum Signatures of Chaos (2nd ed.), Springer-Verlag, Berlin-Heidelberg (2000)
12. Hazeltine, R., Waelbroeck, F.L.: The Framework of Plasma Physics, Perseus Books, New York, NY (1998)
13. Kirsch, A.: An Introduction to the Mathematical Theory of Inverse Problems, Springer-Verlag, New York (1996)
14. Mehta, M.L.: Random Matrices (3rd ed.), Elsevier, San Diego (1991)
15. Morgan, F.: Geometric Measure Theory—A Beginner's Guide, Academic Press, London (1988)
16. Nishikawa, K., Wakatani, M.: Plasma Physics (3rd ed.), Springer-Verlag, Berlin-Heidelberg (2000)
17. Reed, M., Simon, B.: Methods of Modern Mathematical Physics II: Fourier Analysis, Self-Adjointness, Academic Press, San Diego (1975)
18. Soong, T.T.: Random Differential Equations in Science and Engineering, Academic Press, San Diego (1973)
19. Stöckmann, H.J.: Quantum Chaos—An Introduction, Cambridge University Press, Cambridge (1999)
20. Van Kampen, N.G., Felderhof, B.U.: Theoretical Methods in Plasma Physics, North-Holland, Amsterdam (1967)
21. von Neumann, J.: Mathematische Grundlagen der Quantenmechanik, Springer-Verlag, Berlin (1932); Mathematical Foundations of Quantum Mechanics (trans. Beyer, R.T.), Princeton University Press, Princeton, NJ (1996)
22. Yosida, K.: Markoff process with a stable distribution, Proc. Imp. Acad. Tokyo 16, 43–48 (1940)
Chapter 4
Interactions of Micro and Macro Hierarchies
Abstract Facing a macro-system, we encounter a fundamental problem: even a hopelessly massive effort to describe a huge amount of detailed data does not lead us to a deep (or universal) understanding of the system. Modern science has inclined toward micro-systems—reducing a macro-system into elements, it seeks universality in the decomposed virtual world. However, it is extremely difficult to connect the elements again, to compose (reconstruct) the macro-system, and to understand the diversity of the system from the fragmented parts. How can a macro-system, the "reality" of our world, become the subject of science? As an observer, a reporter, or an analyst, we have a certain subject (or perspective) when we see the reality. We can choose various subjects. Depending on the subject, the appearance of an event changes. We can say that such diversity of appearance is the essence of "complexity." Here, we propose the notion of scale hierarchy, ranging from macro to micro scales, which defines our horizon of the vision of an event. What we mean by macro or micro is the "norm" of the observation, description, or analysis; it is not merely the composition or decomposition of the object. A micro image and a macro image are not necessarily independent. The relation among different scale hierarchies is produced by nonlinearity. The aim of this chapter is to analyze the mechanisms of the conjunction of the scale hierarchies.
4.1 Structure and Scale Hierarchy

4.1.1 Crossing-Over Hierarchies

A faint wind caused by the flapping of butterfly wings changes the motion of air slightly, but it may possibly result in a tornado—such a complex chain of unstable motion is called the butterfly effect. It is a rhetorical invention to explain the effect of chaos, i.e., that a small difference of conditions can produce an unexpectedly large difference in the result (see Sect. 3.1). Needless to say, it is absurd, but there is an interesting point worth analyzing critically. Since the atmosphere cannot be dissociated into parts, the influence of a tiny perturbation in a small part on the macroscopic motion is not a mathematical zero. However, comparing the power of butterfly wings with that of a tornado (there
might be a difference of more than ten orders of magnitude), it is evident that a butterfly cannot constitute the "main cause" of a tornado. What we should consider as the primary cause of a tornado is the origin of its energy: the process starting from a temperature fluctuation in the atmosphere on a very large scale, creating a huge cumulonimbus involving a large vortex motion, and concentrating its energy into a localized strong whirlpool. The dynamics of a tornado is a bundle of various phenomena encompassing extremely wide scale hierarchies—the transport of energy from a huge-scale inhomogeneity to a localized whirlpool, turbulent fluid motion pervading the vortex, production of small-scale eddies and raindrops, and so on. A simple narrative assuming a monogenetic causality does not suffice to describe and understand a tornado.

The theory of chaos is, indeed, pointing out the fundamental difficulty in connecting cause and result when multiple parameters create a bundle of orbits linking in complex ways; we may imagine a braid lacking any kind of order (see Sect. 2.4). However, if a rhetoric of chaos tells only of the unpredictability of the route tying a butterfly to a tornado, it overlooks an important theme, that is, the consideration of the possibility of crossing over scale hierarchies.

In general, it is not easy to overcome the difference of scales. What can enable this is structure. A chain of phenomena triggered by a small perturbation but causing a serious result is called a syndrome (see Sect. 1.1.2). For such a thing to occur, the system must have a structure that involves an efficient passage crossing over the scale hierarchies. Since we said the chain of phenomena starts from a "trigger," let us consider the example of a gun. The power exerted by a finger squeezing a trigger is negligible. A gun is equipped with a clever structure such that a small motion of the trigger causes a steep motion of the percussion hammer. The bullet's cartridge is filled with gunpowder, and when it is struck by the hammer, it explodes to launch the bullet. When this series of mechanisms functions, minimal power results in major damage. Like this example, an artifact is equipped with a structure that achieves its designer's intention, and it works to connect the distances of scales of energy, time, or space.

Although such mechanisms should operate according to their designs, a complex artifact sometimes behaves in an unexpected way, in contradiction to the designer's or user's plan—for example, the "China Syndrome." A singular structure—a line of escape (ligne de fuite) [6] crossing over hierarchies—causing a syndrome is not made by someone's intention but is rather contingent and potential. It is a complex passage concealed within a chaotic braid, so to speak. In addition, various faults that do not necessarily compose causality links cross over the scale hierarchies—a tornado may be represented by such an image.
4.1.2 Connection of Scale Hierarchies—Structure

A thing in the real world is regarded as a system that is composed of many elements. Usually, a system is viewed as a macro-object, while an element is viewed as a
micro-object. We may also consider a variety of scales to characterize subsystems and can regard smaller-scale subsystems as elements of a relatively larger-scale subsystem. Thus, a scale hierarchy is envisaged. As the word "hierarchy" hints, the notion of scale hierarchy assumes an abstract horizon of events, which is separated (distinct) from different scales. The model of a phenomenon on a certain hierarchical level can be established by abstracting, by some parameters, the relations of the highlighted phenomenon with other phenomena of larger or smaller scales. For instance, the planet earth is modeled as a point-mass particle—the mass of which is the sum of all of the microscopic particles, and the position and momentum are the place and momentum of the center of gravity—when we are interested in its motion in the solar system. Various phenomena of the planetary interior, then, are totally abstracted. And the large-scale structure of space that surrounds the planet—for instance, gravitation from the sun—is parameterized as the gravitational field to be included in the Hamiltonian of the particle. When such hierarchization (or parameterization) is successful, we can establish a self-contained model on a certain hierarchical level.

However, separating a certain hierarchical level from others is not possible in general; the relations with different hierarchical levels are not necessarily represented by appropriate parameters. Although the motion of a planet can be described by the point mass—the coarse-grained model of a real planet definable only by doing summation—the motion of a living organism, for instance, is not that simple; it is not determined by a force acting on its center of gravity. A living organism moves through the synthesis of multi-scale phenomena, ranging from molecular reactions and cellular activities to organs' actions. The activity of a cell cannot be described by studying the physical quantities that are evaluated by summation, like the particle number or the total energy, of the molecules that compose the cell. On the other hand, a cell is always influenced by larger-scale hierarchies of organs, individuals, and environment, which are not necessarily parameterized as instantaneous effects, i.e., forces. To understand the motion of a living organism, we have to analyze how large-scale phenomena influence small-scale phenomena, as well as how small-scale elements compose large-scale systems.

We encounter a fundamental difficulty in relating micro-objects and macro-objects when a simple summation (or integral) does not work well—it is the difficulty of nonlinearity. For instance, we say "1 + 1 is not 2" when two persons collaborate to reap a big reward. Generally, we cannot understand the characteristics of a system by the simple summation of the properties of fragmented elements. The function of a system (or a group) is not proportional to the number of its elements. As we discussed in Sect. 3.3, it is because of structure (nonuniformity) that the synthesis of elements cannot be done by simple summation. By organizing structures, a system maintains diverse relations among the hierarchies.

For example, let us consider physical matter composed of a lot of molecules. If it is a gas confined in a static container, we imagine that random motions of molecules occur on the molecular-scale hierarchical level. When we shift to a
larger scale, only the total mass and the gross energy of the molecules—the simple summations of micro-quantities—are handed over to the macro-scale hierarchical level, the realm of thermodynamics. All other micro-information (the particle motion or quantum state of every molecule) disappears in the macro-scale view. Suppose that a molecule is moving in a certain direction. Among a huge number of other molecules, there are some that move in other directions so that the total momentum cancels out. Therefore, the specific individuality of one molecule cannot emerge on the macro-scale hierarchical level. Here the large number and the disorder of the microscopic elements are the two factors that eliminate the individuality of each element in the macroscopic vision. In such a case, we can formulate a self-contained (or closed) relation among macro-parameters (such as the gross weight and the gross energy) that are evaluated as the summations (or averages) of micro-parameters (cf. Sect. 3.2).

However, such summations (or averages) do not work well when structures are organized. The aforementioned living organism is a typical example; structures are organized on many different hierarchical levels—cells, organs, individuals, groups. Owing to structures (inhomogeneities), a macro-object is not a mere summation of the micro-objects. While summation averages out individual differences, a structure maintains differences, allowing various functions of elements to emerge.

The structure of a living organism is generated and inherited by a precise program, the gene. At the opposite pole to this mechanism of biosis, the structure of a physical system, like various vortexes or crystals, emerges and evolves by an a posteriori mechanism, which may be similar to the order formation in society, and also to the evolution of species. Such a structure is determined by interactions, under a rather simple rule, of a large number of elements, not by a given program like the blueprint coded on DNA. The motion of one element (a particle, in physics terminology) is dominated by a field—a function of space-time producing force—which is a common rule and, at the same time, a common product of the group of elements. In other words, the system organizes a recurrent relation such that the motion creates the field and the field rules the motion (remember the formulations in Sect. 3.3). If so, what structural order does this self-consistency relation involve? One may think that complete disorder or chaos is the only natural state. In fact, we have shown in Sect. 3.2 that the combination of random motion and a homogeneous (structureless) field is a trivial equilibrium solution, heat death. However, there are a variety of structures and examples of evolution in the universe—for instance, the beautiful spiral structures of galaxies: they display self-organized order in the collective motion of many stars and gases interacting by gravity. Such a system is able to find a self-consistent "solution" without any program or a priori aim.

The structure (or inhomogeneity) in collective phenomena is the main theme of this chapter. Nonlinearity, invalidating the trivial linear composition (summation, averaging, scale transformation, etc.), is at the root of the diversity of structures and phenomena.
4.2 Topology—A System of Differences

4.2.1 The Topology of Geometry

Phenomena of the real world have infinite diversity; the more accurately we observe them, the more individualities and complexities come to the surface. Therefore, a scientist tries to eliminate detailed individualities and extract universality. The framework of such an abstraction or reduction is embodied by the notion of topology.

In geometry, topology defines a norm to categorize or relativize the shapes of geometrical objects. For example, let us imagine the shape of a supple string. It can be a "straight line" or a "circle" only in the ideal limit; when a real string is thrown out on a table, its actual shape is neither a true straight line nor a circle. The more precisely an actual curve is described, the more individual and complex it is. Even if each exact shape is specified and named, a meaningful theory does not develop. The properties that can be the subjects of our common attention are, for example, the connectedness or knots of the string (see Fig. 4.1). If the string is a loop with no knots, it may be identified as a circle. No matter how the string is thrown out on the table, these properties are invariant; to change them, we have to cut the string once. The system of differences, with a certain level of universality, is made by studying the invariant properties and relativizing individual varieties—this is the study of topology.
Fig. 4.1 The topology of knots: a simple example of mathematical theory discussing a system of differences
The classification of the shapes of strings is the most intuitive example of the notion of topology. We may generalize it as the fundamental concept of describing relative properties of curves (for example, orbits in state space) or surfaces (for example, graphs of certain laws) of arbitrary dimensions. A conservation law (cf. Chap. 2), being an invariant property against evolution or the transformations of state variables, can be considered as a norm that characterizes topology. If a function φ(x) is a constant of motion (cf. Sect. 2.4.1), the constancy of φ(x) poses a topological constraint on the orbits, just like the constancy of the number of knots of a string.

A continuum like a string has an infinite degree of freedom. A macroscopic system composed of many elements also has a very high degree of freedom. Knowing a small number of conservation laws for such a high-dimensional system is often more important than investigating the detailed properties of the individual history of evolution. A constant, like the number of knots, describes a universal property, separating the particularities and complexities of individual events. As we discussed in Sect. 2.5, a conservation law is a priori knowledge that is derived not by observing actual motions but by investigating the structure (symmetry) of the system. When we study a complex system, we need to understand a universal property, that is, the topology of orbits characterized by (a rather small number of) constants of motion.
4.2.2 Scale Hierarchy and Topology

In the preceding subsection, we studied the concept of topology by citing an example of geometric objects. Here we discuss a wider, deeper meaning of the notion of topology. The literal meaning of topology is the "study of topos," where topos is a Greek word for "place" or "position." Scientific studies start by observing and describing an event in a certain "place." In other words, the event to which we direct our interest is "placed" as the subject of study in our perspective of the world. The setting of topos—the choice of the perspective from which to see an object—depends on our interest, or subject. For example, when we observe and describe the shape of a string, our recognition is based on a reference system of differences (such as connection, linkage, or knots). The topos where a string is placed is this abstract field of recognition. When we observe or describe an object of the real world, we have a subject of cognition, by which we abstract clear and distinct properties of the complex object; the object is converted to a subject placed in the topos, the system of differences.

In mathematical analysis, topos is defined by scales, i.e., the units used to measure the object. We start by projecting an object of the real world onto a linear space by "measuring" a set of parameters. Our subject of analysis then becomes the vector placed in linear space. In the "measurement"—evaluating numbers to sketch the object—we need units. The choice of units is not simply a matter of arbitrarily assigning a basis of linear space (cf. Sect. 1.3.1), but a choice of the scales of our interest in observing or describing the object. Here the problem of "accuracy" intervenes in the notion of scale. The scales determine the accuracy of the measurement; in other words, they define whether two states are distinguished or identified, just as the geometric topology defines the classification of the shapes of a string. Thus, what we call a scale defines the topology in mathematical analysis, which is the reference system of differences for the measurement. An object of the real world changes its appearance when measured by different scales. Only lines or planes in linear space—mathematical artifacts representing linear relations—are independent of the choice of scales. Nonlinear relations, as well as laws in infinite-dimensional spaces, transform their representations drastically
when the scales are changed.1 A high-accuracy topology, distinguishing small-scale structures, and a low-accuracy topology, observing only coarse-grained structures, define different scale-hierarchical levels. And on different levels, events generally obey different laws. For example, the macroscopic fluid motion of water is described by the fluid-mechanics model, while, in the microscopic world, the motions of molecules are governed by a particle’s equation of motion. In the next subsection, we shall study strange (though in nature rather common) nonlinear objects that are very difficult to measure.
4.2.3 Fractals—Aggregates of Scales

Photographs of natural history, for example, present illusions; it is often difficult to estimate the scale of the objects. What is thought to be a photograph of a huge fault turns out to be a section of a small stone, or what is thought to be a satellite photograph of a coastline turns out to be puddles on a street. Such images of parts, which have been cut out from nature, are not self-explanatory about their scales. We need the help of juxtaposed scales or captions to recognize their scales. The reason why we mistake the size is not because we do not know the object, but because similar structures appear on various scales.

We can produce an artificial object that contains similar structures on many different scales. From such an exercise, we may learn the basic mechanism of nature, which produces a variety of similar structures. Let us introduce a simple example. First we consider a straight line of length L₀. Divide this line into three pieces (the length of each piece is L₀/3), double the central piece, and fold it into a wedge shape (see Fig. 4.2). We repeat this manipulation to make a wedge on each piece. Let us denote by n the number of manipulations.
Fig. 4.2 Production of the Koch curve
¹ In Chap. 2, we discussed the possibility of finding order (symmetry) by transforming the combinations of the variables. Here, our interest is in the change of the scales of the variables. The problem of infinite dimensionality is explained in Note 4.1.
By one manipulation, the line length is multiplied by 4/3, and the minimum unit length (the length of a straight piece) is divided by 3. After manipulating n times, the total length of the line ($L_n$) and the unit length ($\ell_n$) are
$$L_n = (4/3)^n L_0, \qquad \ell_n = (1/3)^n L_0. \qquad (4.1)$$
Repeating this process, we obtain an infinitely folded curve, which is called the Koch curve. The Koch curve has a similar structure on every scale of unit length. If a part of it is shown, we cannot specify the scale in which it is displayed. Such an aggregate of multi-scale similar structures is called a fractal.²

The reason why the process of making the Koch curve produces similar structures on all hierarchical scales is that each manipulation creates a new (small) structure while almost preserving the old (large) structures. If the old structure were totally destroyed, similarity could not exist on different scales simultaneously. The complexity of the fractal is produced by a subtle balance of transformation and preservation.

Using the Koch curve as a sample, let us see how we can separate (extract) universality from complexity. By the second equation of (4.1), we may write $n = \log_3(L_0/\ell_n)$. Using this in the first equation, we obtain
$$L_n = L_0^p\, \ell_n^{1-p} \qquad (p = \ln 4/\ln 3 \approx 1.262). \qquad (4.2)$$
Here, the first factor $L_0^p$ is independent of n (i.e., the invariant part), which is separated from the divergent factor $\ell_n^{1-p}$. By measuring the speed of growth of the divergent part, we obtain an index that scales the complexity of the fractal. Let us rewrite (4.2) as
$$L_n = N \ell_n, \qquad N = L_0^p\, \ell_n^{-p}. \qquad (4.3)$$
A connotation of this equation is that the "size" of $L_n$, measured by a unit $\ell_n$, is N; see (1.7). In the limit $\ell_n \to 0$, the size N diverges with the exponent p = ln 4/ln 3.

The exponent p has an interesting meaning. Suppose that we have a gauge of length ε. Using this gauge as a yardstick, we measure the "size" of the Koch curve (n → ∞) by counting the number against this gauge. By this method of measurement, fluctuations on the curve smaller than the unit ε are ignored, i.e., our topos is set to see the structures that are larger than ε. If we choose a gauge of ε = $\ell_n$, the measurement is equivalent to the estimate (4.3), so we obtain the "size"
$$N = L_0^p\, \varepsilon^{-p}. \qquad (4.4)$$
² See Mandelbrot [26] and Peitgen, Jürgens & Saupe [30] for rich examples of fractal geometrical objects.
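The scaling (4.1)–(4.3) can be reproduced by constructing the curve explicitly. The following sketch (my illustration, not from the text) implements the wedge manipulation and confirms $L_n = (4/3)^n L_0$.

```python
import numpy as np

# Build the Koch curve by the wedge manipulation and verify (4.1):
# L_n = (4/3)^n L_0, with minimum unit length (1/3)^n L_0.
ROT60 = np.array([[0.5, -np.sqrt(3)/2], [np.sqrt(3)/2, 0.5]])  # +60 deg

def koch_step(points):
    new = []
    for p, q in zip(points[:-1], points[1:]):
        a, b = p + (q - p)/3, p + 2*(q - p)/3   # trisection points
        new += [p, a, a + ROT60 @ (b - a), b]   # wedge apex in the middle
    new.append(points[-1])
    return np.array(new)

pts = np.array([[0.0, 0.0], [1.0, 0.0]])        # L_0 = 1
for n in range(1, 6):
    pts = koch_step(pts)
    L_n = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    print(f"n={n}: segments={len(pts)-1:4d}, L_n={L_n:.4f}, "
          f"(4/3)^n={(4/3)**n:.4f}")
```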
Thus, the Koch curve has different sizes depending on how it is measured. The exponent p indicates how the size changes. The exponent p is called the fractal dimension.

Here, we define the "dimension" of a geometric object in its relation to the measurement (cf. Note 3.2). For example, let us think about a piece of paper (we ignore the thickness). We consider a d-dimensional rectangular box (if d = 1, it is a bar; if d = 2, a square; and if d = 3, a cube). To cover the paper, we need d ≥ 2. This minimum d is the dimension of the paper. For a general geometric object, the minimum dimension d of the boxes that can cover the object is called the embedding dimension. Next we calculate the number of boxes that can cover the object. For example, we cover a piece of paper with d = 3 boxes. Let N denote the number of necessary boxes. If the box size ε is sufficiently small, we have to arrange boxes two-dimensionally. Therefore, we estimate N ∝ ε⁻². Here, the dimension of the paper appears as the exponent. Generalizing this observation, we may define, by the number N and the size ε of the boxes that can cover an object,
$$p = -\lim_{\varepsilon \to 0} \frac{\ln N}{\ln \varepsilon}, \qquad (4.5)$$
and call this p the fractal dimension of the object. Comparing (4.5) with (4.4), it is clear that the fractal dimension is a generalization of the dimension of conventional (mathematical) objects.³
³ However, the existence of the limit (4.5) is not always guaranteed. A more rigorous definition of an intermediate dimension of singular objects may be given by generalizing the foregoing argument of embedding (see Note 3.2).

4.3 The Scale of Event / The Scale of Law

4.3.1 Scaling and Representation

It has become clear that we have to select a scale when we recognize, describe, or analyze an event; the view of the object is subject to the selection of the scale. As the premise for what we consider objectivity, the subjective operation of scale selection is necessary. Scientists try to establish objectivity by transforming an object of the real world into a mathematical object, a vector, which is measured by a set of units (basis). The measurement depends on the selection of the units, the topos defined by the scale of our interest.⁴

⁴ In the preceding section, we have seen that the measurement of the Koch curve gives different lengths depending on the selection of the scale.
The reason for the false impression caused by a fractal is that we do not notice the difference between the real scale and the premised (expected) scale. Creating an illusion, a fractal exposes the prejudgment of the scale, which intervenes between the object and our recognition or description. The main purpose of this section is to critically analyze the fundamental structure of recognition, common to all mathematical models, which requires the selection of a certain scale.

Excepting the special case of fractals, the view of an object generally changes drastically when we alter the scale of observation.⁵ Therefore, the model (or the governing law) of the phenomenon changes depending on the selection of the scale; in the mathematical representation of the model, the magnitudes (and possibly the forms) of the terms constituting the equation change, because all variables and coefficients involved in the equation are measured by a certain system of units (reference values). The setting of the units means the selection of the scale to which we pay attention. One model should be thought of as describing one scale-hierarchical level that is defined by the system of selected units.

In formulating a model equation, the selection of a hierarchical level is formally done by the normalization of variables and coefficients. Let us consider a simple example. Let t and x denote time and spatial position, respectively. Selecting certain units T of time and L of length, we put
$$\check{t} = \frac{t}{T}, \qquad \check{x} = \frac{x}{L}, \qquad (4.6)$$
where the variables marked by ˇ are dimensionless numbers.⁶ The "dimensions" of time and length are borne by the units T and L. We say that the numbers ť and x̌ are normalized by the units T and L, respectively. In this sense, the notion of normalization is merely the numerical evaluation of a quantity by giving a unit (see Sect. 1.3.1). However, we are going to give a more profound meaning to normalization; we select the unit so as to make the evaluated numbers have magnitudes typically of order unity. This is equivalent to choosing the unit to be the "representative." The normalization, then, implies the selection of the scale of our interest (or the definition of the topos).

For example, the normalization by the MKS system of units recognizes objects based on human scales. The length, weight, time, or speed of human activity is normally of order unity when evaluated in MKS units. It may be more convenient
⁵ The unique nature of a fractal—the self-similar appearance that is independent of the scale of observation—implies that the governing law (or the mathematical model) has a certain scale invariance. Recall the procedure by which we produced the Koch curve: we repeated the same manipulation on all scales. The scale invariance of governing laws will be discussed in Sect. 4.3.3.

⁶ Here, we mean by "dimension" the unit of a physical quantity (for example, [sec] for time or [m] for length). It is different from the dimension of a vector space or the dimension of a geometric object discussed in Sect. 4.2.3. Every physical quantity has its proper dimension, which can be decomposed into the units of fundamental quantities such as time, length, mass, and charge. For example, the dimension of velocity is equivalent to that of length/time.
to use different units when we measure an object in the universe. For example, in the study of the solar system, the astronomical unit (AU; the average distance between the sun and the earth ≈ 1.5 × 10¹¹ m) is used to measure length. In condensed matter physics, however, the atomic scale (nanometer) becomes the representative. Such selections of units are done to make the evaluated parameters typically of order unity; we apply the normalization in accordance with the "subject" of our interest.

As we revealed using the Koch curve as an example, the measurement of an object cannot be carried out without prescribing the scale on which we focus our attention. This fact is often forgotten because we are used to measuring simple artifacts like a circle, triangle, or cone, not the objects of the real world. But when we measure the distance between two places, we use different maps depending on the subject; to measure the distance between two houses, we use a map of meter scale, and between two cities, we use a map of kilometer scale. When we consider the distance between cities, such a distance is not defined on the scale of meters. The "scale" of a map represents the topos of our interest or subject.

Let us observe how the description of a phenomenon changes depending on the normalization, the selection of scales. For example, consider a wave that is represented by a function
$$u(x, t) = u_0(x - Vt), \qquad (4.7)$$
which obeys a wave equation
$$\frac{\partial u}{\partial t} + V\frac{\partial u}{\partial x} = 0. \qquad (4.8)$$
Here V (assumed to be a positive constant) gives the speed of wave propagation. The function u_0(x) determines the wave form at t = 0, which propagates while keeping its shape. First we choose arbitrary scales L and T as the units of the independent variables x and t, respectively. For the wave function u, we choose its representative value U as the unit and put uˇ = u/U. Using these normalized variables, (4.7) and (4.8) read as (with the normalized initial wave form $\check{u}_0(\check{x}) = u_0(x/L)/U$)

$$\check{u}(\check{x},\check{t}) = \check{u}_0(\check{x} - \check{V}\check{t}) \tag{4.9}$$

and

$$\frac{\partial \check{u}}{\partial \check{t}} + \check{V}\frac{\partial \check{u}}{\partial \check{x}} = 0. \tag{4.10}$$

Here, the wave velocity V is normalized by the unit L/T:

$$\check{V} = \frac{T}{L}V. \tag{4.11}$$
Now, let L and T represent, respectively, the scales of space and time of our interest; for them to be the representative scales of the object u(x,t), they should be the scales of space–time on which the wave u(x,t) exhibits an appreciable variation. For example, let u_0(x − Vt) = U sin[2π(x − Vt)/λ] in (4.7), which is a sinusoidal wave with a wavelength λ. Then, L should be of order λ, and T should be of order λ/V, the period of oscillation. With these units, we find that

$$L = VT, \tag{4.12}$$

and both ∂uˇ/∂xˇ and ∂uˇ/∂tˇ are of order unity. From (4.11), we also find that Vˇ is of order unity. In the normalized wave equation (4.10), the two terms have the same order of magnitude—the balance of these terms describes the law of wave propagation. If our scales of observation do not match the intrinsic scales of the object, we fail to describe the event properly (see Problem 4.1).
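As a numerical illustration of this point (a sketch of our own, not taken from the text), the snippet below normalizes the sinusoidal wave by the units L = λ and T = λ/V and confirms that the normalized derivative and the normalized speed Vˇ = TV/L come out of order unity.

```python
import numpy as np

# Normalize u = U*sin(2*pi*(x - V*t)/lam) by the representative scales L = lam,
# T = lam/V (so that L = V*T as in (4.12)) and check the order-unity property.
U, V, lam = 3.0, 2.0, 0.5
L, T = lam, lam / V
x = np.linspace(0.0, lam, 1001)

u = U * np.sin(2 * np.pi * x / lam)          # snapshot at t = 0
du_dx = np.gradient(u, x)                    # physical derivative ~ 2*pi*U/lam

# normalized derivative: d(u/U)/d(x/L) = (L/U) * du/dx
print(np.max(np.abs(du_dx)) * L / U)         # ~ 2*pi: of order unity
print(T * V / L)                             # V_check = TV/L = 1
```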
4.3.2 Scale Separation

In the foregoing example of (4.8), the temporal variation (first term) and the spatial variation (second term) are directly related to derive the unique relation (4.12) between the scales of space and time. However, in a more general system where more than two terms apply, the terms constitute complex relations in which the balances may vary when shifted to different hierarchical levels of scales. For example, let us add to the wave equation (4.8) a term including a higher-order derivative (physically implying a diffusion effect):

$$\frac{\partial u}{\partial t} + V\frac{\partial u}{\partial x} - D\frac{\partial^2 u}{\partial x^2} = 0, \tag{4.13}$$

where D is a positive constant (the diffusion coefficient). In the theory of PDEs, the terms of the highest-order derivatives determine the essential property of the equation. Equation (4.13) includes a second-order derivative with respect to x, while the temporal derivative is first order. This type of PDE is called a diffusion equation (or a parabolic PDE).7 Applying the aforementioned normalization, we obtain

$$\frac{\partial \check{u}}{\partial \check{t}} + \check{V}\frac{\partial \check{u}}{\partial \check{x}} - \check{D}\frac{\partial^2 \check{u}}{\partial \check{x}^2} = 0, \tag{4.14}$$
7 For the wave equation (4.8), we have the general solution (4.7). We cannot write explicitly the general solution of the diffusion equation, but we have a particular solution u(x,t) = a exp[ik(x − Vt) − Dk²t] (a and k are real constants). We observe that the diffusion term −D∂²u/∂x² yields the damping factor exp(−Dk²t). The general solution is composed by integrating such particular solutions over k.
where

$$\check{V} = \frac{T}{L}V, \qquad \check{D} = \frac{T}{L^2}D. \tag{4.15}$$
We should set L and T, respectively, to match the spatial and temporal variations of the object u(x,t). Then all factors ∂uˇ/∂tˇ, ∂uˇ/∂xˇ, and ∂²uˇ/∂xˇ² have magnitudes of order unity. Unlike the previous example, there are two different physical effects (the second and third terms) producing the temporal variation (the first term). They are scaled, respectively, by the coefficients V and D, which we call scale parameters. They control the balances among the three terms. When V ≫ D/L, the third term is negligible with respect to the second term on the left-hand side of (4.14), for any choice of T. Hence the behavior of u(x,t) is determined by the balance of the first and second terms, both of which are normalized to be of order unity by setting T = L/V. However, when V ≪ D/L, the third term is superior to the second term for any choice of T. Then the first term and the third term are normalized to order unity by T = L²/D.

Here, the relative strengths of the second and third terms are determined not only by the corresponding scale parameters (V and D) but also by the length scale L of the object u(x,t); note that the normalized scale parameters Vˇ and Dˇ include L, as shown in (4.15). By this fact, the mutual relations among the three terms have somewhat complex implications.8 In the foregoing discussion, we first fixed the length scale L and compared the magnitudes of the scale parameters V and D. But now, let us first fix V and D (by prescribing the physical parameters). Then the second term dominates if L ≫ D/V, and the third term dominates if L ≪ D/V. Here, L is the length scale on which we focus our attention. Therefore, depending on the scale of our interest (subject), the governing term (physical effect) switches. This example reveals that the same phenomenon may exhibit different aspects depending on the scale of observation. Different hierarchical levels are dominated by independent laws; in this sense, the hierarchical levels are separated. As we shall see in the later discussions, however, this separability does not necessarily hold in general nonlinear systems, and the interacting hierarchical levels will be analyzed as an essential complexity of nonlinear systems.

The foregoing arguments may be generalized in an abstract form. Let us consider an equation that describes the balance among three different terms A, B, and C pertaining to an event:

$$A + B + C = 0. \tag{4.16}$$
8 The complexity is due to the fact that the equation consists of terms including different-order derivatives. The diffusion term (including the second-order derivative) works as a singular perturbation, whose meaning and effect will be discussed in Sect. 4.4.
For these three terms to play their own roles in a given topos, or hierarchical level of scales, every term must have the same order of magnitude, which we normalize to order unity. In fact, if C, for instance, has a much smaller magnitude, (4.16) degenerates into A + B = 0. An interesting change may occur when we observe the same event on a different hierarchical level of scales. When we normalize variables to a larger scale, the term A may disappear, and when we normalize them to a smaller scale, B may disappear. So the representation of a law (the parameterization of an object) may change depending on the normalization.

This statement might cause confusion if one misses the deeper meaning of normalization and understands it as merely the numerical evaluation of quantities by an arbitrary set of units (see Sect. 4.3.1). When we write a law in the form of (4.16), all terms A, B, and C must have the same dimension. Therefore, just changing the units of parameters (for example, shifting from the MKS system to the CGS system) cannot change the balance of the terms—indeed, a well-posed law must be "invariant" with respect to the choice of units. Let us look at (4.13), for example: if u is a quantity that has a dimension [μ], every term has the dimension [μ/T] (T is a unit of time). If we change the units as T → T′ = αT, L → L′ = βL, and μ → μ′ = γμ with arbitrary factors α, β, and γ, all terms are multiplied by the same factor α/γ. Hence the balance of the terms does not change. Then, why do we claim that the balance of terms changes by varying the normalization? It is because, as we emphasized at the beginning of this section, normalization is not merely the choice of units; it is the determination of the scales of our interest. The variables of our interest (in the foregoing example, the independent variables tˇ, xˇ and the dependent variable uˇ) are set to be of order unity—this means the determination of the representative value of each variable. When we change the scales of interest (the representative values of the variables), the newly normalized variables must again be of order unity. This requirement may cause changes in the scale parameters (in the previous example, Vˇ and Dˇ), because each term consists of the variables that we want to normalize to order unity and the scale parameter (a different scale parameter bears a different dimension). In the example of (4.15), the scale parameters transform as Vˇ → Vˇ/(β/α) and Dˇ → Dˇ/(β²/α). It is these scale parameters that bring about changes in the balance of the terms. Here, the normalization of parameters is done by choosing the scales on which we focus.

When we select certain scales by which we observe, describe, or analyze an object, we are ignoring the events or structures that may exist on different scales. When influences from different scales cause problems, exposing a discrepancy between objectivity and subjectivity, more careful consideration is needed in defining scale hierarchies. Since the example discussed in the present subsection is a linear problem, the separation of the hierarchical levels of scales is done rather easily; the scaling units are evaluated by an a priori choice of the scales of our interest. In nonlinear systems, however, the problem is not that simple. In the next subsection, we shall study the connections of different scales created by nonlinearity.
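The dominance switch described above is easy to make tangible with a small numerical sketch (our own construction; the parameter values are arbitrary): fixing the scale parameters V and D and renormalizing on different length scales L, the normalized diffusion coefficient Dˇ = D/(VL) crosses unity at L = D/V.

```python
# Scale separation in the advection-diffusion law (4.13): for fixed V and D,
# compare the normalized coefficients of (4.14) on different length scales L,
# choosing the advective time unit T = L/V.
V, D = 1.0, 1e-3                     # scale parameters; crossover at L = D/V = 1e-3
for L in [1.0, 1e-2, 1e-3, 1e-4]:
    T = L / V
    V_check = T * V / L              # = 1 by construction, cf. (4.15)
    D_check = T * D / L**2           # = D/(V*L): small for L >> D/V, large for L << D/V
    print(f"L = {L:6.0e}:  V_check = {V_check:.1f},  D_check = {D_check:.3g}")
```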
4.3.3 Spontaneous Selection of Scale by Nonlinearity

In the foregoing examples of linear systems, we were able to determine the scales of interest easily. This is primarily because the estimates of the scale parameters depend only on the independent variables—the normalizations (4.11) and (4.15) include L and T, while they are independent of U. In a nonlinear system, however, the scales of the phenomenon depend above all on the magnitudes of the dependent (unknown) variables—or, to put it another way, the phenomenon itself determines (or changes) the scales. Let us examine a simple example. We replace the coefficient V in (4.8) by the unknown variable u to formulate a nonlinear wave equation:

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = 0, \tag{4.17}$$

where the unknown variable u itself is the propagation speed. If the representative values T of time, L of length, and U of the dependent variable u (having the dimension of velocity) satisfy the relation

$$L = UT, \tag{4.18}$$

we can normalize (4.17) as

$$\frac{\partial \check{u}}{\partial \check{t}} + \check{u}\frac{\partial \check{u}}{\partial \check{x}} = 0. \tag{4.19}$$
We note that relation (4.18) includes the scale U of the unknown variable u. Hence, if we fix T , for instance, the appropriate length scale L for observation must be tuned to follow the change of the unknown U . This point is in sharp contrast to the previous linear case (4.12). Another interesting difference from the linear problem (4.10) is that the normalized equation (4.19) includes no scale parameter. Thus, it is formally equivalent to the original equation (4.17). This fact implies that the nonlinear equation (4.17) is scale invariant with respect to the transformations t → tˇ, x → xˇ , and u → uˇ that satisfy relation (4.18). Such an invariance (the self-similarity of the equation) cannot occur in the linear equation (4.8), where the scale transformations of T and L cause a change in the scale parameter Vˇ . The scale invariance of the governing equation implies the nonexistence of a particular scale in the system. In a linear system, the governing equation must include a scale parameter corresponding to a proportionality coefficient, which determines an a priori scale (to be deduced directly from the equation); see (4.11) and (4.15). In the nonlinear equation (4.17), the scale parameter V is replaced by u; in the absence of the scale parameter, the scale invariance emerges. A nonlinear equation that exhibits scale invariance pertains to the creation of some self-similar structures of scale hierarchies. In the foregoing example, if a
certain event occurs in the hierarchical level of T, L, and U, an exactly similar event can occur in the level of αT, βL, and (β/α)U (∀α, β > 0), because the governing equation is the same. As we have noted in Sect. 4.3.1, a fractal has such a self-similarity, implying that the process that produces the fractal must have scale invariance. In fact, the Koch curve—a mathematical model of a fractal—is produced by a scale-invariant manipulation (though it is not a continuous-time process). The basic equation of fluid mechanics has a structure similar to the nonlinear equation (4.17), and its scale invariance is considered to yield the self-similarity of turbulent flows. This problem will be discussed in Sect. 4.4.
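A quick finite-difference check of this scale invariance (our own sketch, using the inner branch of the exact solution (4.22) introduced below) confirms that the rescaled field (β/α)u(x/β, t/α) satisfies the same equation (4.17):

```python
# Verify that if u(x, t) solves u_t + u*u_x = 0, so does (beta/alpha)*u(x/beta, t/alpha).
# Test field: the inner branch u = a*x/(1 + a*t) of the exact solution (4.22).
a, alpha, beta = 0.5, 3.0, 2.0

def u(x, t):
    return a * x / (1 + a * t)

def u_scaled(x, t):                  # rescaled field on the level (alpha*T, beta*L)
    return (beta / alpha) * u(x / beta, t / alpha)

x, t, h = 0.3, 0.7, 1e-6
for f in (u, u_scaled):
    f_t = (f(x, t + h) - f(x, t - h)) / (2 * h)
    f_x = (f(x + h, t) - f(x - h, t)) / (2 * h)
    print(f_t + f(x, t) * f_x)       # ~ 0 for both: the governing law is the same
```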
4.3.4 Singularity—Ideal Limit of Scale-Invariant Structure

Nature determines the scale with the help of nonlinearity. On the other hand, the scale is the subjective prerequisite that we need in order to observe, describe, or analyze an object (see Sect. 4.3.1). Such ambivalence of the scale sometimes causes a serious discrepancy between the subjectivity of the observer and the object. The mathematical actualization of such a discrepancy is a singularity. A singularity is a place in space–time where variations or derivatives of fields (functions) diverge (or become undefinable). For example, a step function such that

$$f(x) = \begin{cases} 0 & (x \le 0), \\ 1 & (x > 0) \end{cases}$$

has a singularity at x = 0, where the derivative df(x)/dx is not definable. Or, a complex function f(z) = z^{-1} (z ∈ C) has a singularity at z = 0. For a function f(x), the scale L of the independent variable x is the magnitude of the change in x which is enough to yield an appreciable variation of f(x); it is estimated by

$$L = \frac{[[\, f(x) \,]]}{[[\, df(x)/dx \,]]}. \tag{4.20}$$
Here, we denote by [[ f ]] the representative magnitude of f. In the neighborhood of a singularity, the denominator [[ df(x)/dx ]] diverges, thus the scale L shrinks to zero. If L is zero, no magnification can turn it into a finite number; in this sense, a singularity is a special realization of scale invariance. Hence it is expected to be created by some scale-invariant mechanism. Let us see how a singularity can be created. We invoke the foregoing model (4.17). As we noted in Sect. 4.3.3, the nonlinear equation (4.17) is scale invariant; it has no particular length scale that might pose an obstacle to the creation of an unlimitedly small-scale structure (a singularity). We assume an initial distribution
$$u_0(x) = \begin{cases} -a & (x \le -1), \\ ax & (-1 < x < 1), \\ a & (1 \le x), \end{cases} \tag{4.21}$$
where a is a real number of order unity. The solution of (4.17) is given by

$$u(x,t) = \begin{cases} -a & (x \le -1 - at), \\ ax/(1 + at) & (-1 - at < x < 1 + at), \\ a & (1 + at \le x). \end{cases} \tag{4.22}$$
If a < 0, we must restrict t < −1/a. In Fig. 4.3, we show the behavior of the solution (4.22). Depending on the sign of a (the slope of the initial distribution), two different phenomena occur: If a > 0, the distribution of u(x, t) becomes flatter with time, while if a < 0, it sharpens, and at t = −1/a, the gradient becomes infinite, i.e., a singularity appears at x = 0, and the differential equation (4.17) breaks down because u(x, t) is no longer differentiable with respect to x. This singularity represents a shock wave (see Note 4.3). By (4.22), we find that the length scale of the variation of u(x, t) is given by 1 + at, which changes with time. The rate of change is determined by a. If a < 0, the length scale reaches zero at a finite time t = −1/a. We note that the singularity is created within a finite time, i.e., it really appears. This is possible only in a nonlinear system—the dynamics of a linear system basically obeys the exponential law, thus it takes infinite time for any finite quantity to become infinity or zero.
Fig. 4.3 The expanding solution (a > 0) and the contracting solution (a < 0): this example shows that the scale of the object changes spontaneously in a nonlinear system. If a < 0, the scale (at which the object shows an appreciable variation) decreases to zero at a finite time t = −1/a, creating a singularity (shock)
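The contraction of the scale can be traced numerically; the following sketch (ours) evaluates the exact solution (4.22) for a < 0 and watches the length scale 1 + at collapse as t approaches −1/a.

```python
import numpy as np

# Evaluate the exact solution (4.22) for a < 0 and watch the length scale
# 1 + a*t of the inner region collapse: the gradient blows up at t = -1/a.
a = -0.5                                   # contracting case; shock at t = -1/a = 2

def u(x, t):
    inner = np.abs(x) < 1 + a * t
    return np.where(inner, a * x / (1 + a * t), a * np.sign(x))

x = np.linspace(-2, 2, 4001)
for t in [0.0, 1.0, 1.9, 1.99]:
    slope = np.max(np.abs(np.gradient(u(x, t), x)))
    print(f"t = {t:5.2f}:  length scale = {1 + a*t:.3f},  max |du/dx| = {slope:.1f}")
```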
The creation of a singularity ruins the governing law that is represented by a differential equation—a field involving a singularity is not differentiable, thus it is not a "solution" of the differential equation. Then we should not persist in the original selection of the scale; we have to look at smaller-scale structures. By changing the perspective, we will be able to find an actual (finite-scale) structure which is, in the crude view, abstracted as a zero-scale singularity.9 Yet we have to contrive means to include such a microscopic effect—a term that rules the small-scale hierarchy and determines the small-scale structure—into the model of nonlinear phenomena. It must be a term that disappears in a macroscopic description but appears in a microscopic description. Such a term is called a singular perturbation. In the next section we shall describe the connection of scale hierarchies as the collaboration of nonlinearity and singular perturbation.

9 The other way of consideration is that we generalize (weaken) the meaning of differentials to define weak solutions (see Sect. 4.4.5 and Note 4.5).

4.4 Connections of Scale Hierarchies

4.4.1 Complexity—Structures with Multiple Aspects

The word "complexity" addresses pleiotropic properties in which appearance (objectivity) changes depending on the vision (selection of the subject). The coexistence of various and conflicting aspects—order vs. disorder, statics vs. dynamics, rhythm vs. chaos, or, in a more emotional area, beauty vs. ugliness, good vs. evil, confidence vs. doubt, etc.—constitutes complexity, producing a variety of images. Here our interest is directed to the coexistence of "order (structure)" and "disorder (randomness)." Complete order and complete disorder are the two extremes of simplicity (randomness is never complex; the coexistence of order and disorder is what we call complex). Let us examine Fig. 4.4, which shows a result of numerical simulation based on a model of nonlinear vortex dynamics. Here we see a typical order–disorder complex. The left figure displays an ordered structure, while the right figure reveals complicated small-scale perturbations filling the gaps of the structure. The left figure is the plot of the Hamiltonian, in which the contours describe the streamlines. Large-scale vortexes organize an ordered pattern. The right figure plots the vorticity, which is the Laplacian of the Hamiltonian. Since the Laplacian evaluates the second derivatives, the vorticity highlights smaller-scale perturbations. Thus, the plot of the Hamiltonian gives the view of the large-scale hierarchical level and that of the vorticity gives the view of the small-scale level; together they constitute a synchrony. As shown in this figure, order and disorder tend to coexist, divided into different levels of the scale hierarchy. However, the separation of the scales is often incomplete because of nonlinearity (see Sect. 4.3.3). Structures mediate the connection of different scales. What creates a structure is nonlinearity, and what determines the detail of the structure (which often pertains to disorder on small scales) is singular perturbation. We are going to analyze the order–disorder complex by a model of interacting scale hierarchies; the collaboration of nonlinearity and singular perturbation enables such interactions. As we have studied in Sect. 4.3.4, nonlinearity can create a localized structure in space–time, which looks like a crevice or a gap. In the vision of a scale-invariant model (i.e., without singular perturbation), the width of the crevice reduces to zero and becomes a singularity. However, singular perturbation introduces a small but finite scale and prevents the creation of a singularity by producing small-scale structures in the crevice. Figure 4.4 clearly shows the creation of such a crevice and the internal disordered motions. The crevice is a fault line crossing the hierarchies, a juncture of different scales. The crevice is not a given (a priori or transcendental) structure, but is "self-organized" through the nonlinear amplification of variations of some variables. We have yet to give a concrete expression of the singular perturbation which sustains the crevice as a finite "structure" (preventing it from becoming a singularity). In the next subsection, we shall give a mathematical representation of singular perturbation and study its effects.

Fig. 4.4 Order–disorder complex: a nonlinear vortex system (plasma) organizes an ordered array of vortexes (left: streamlines of plasma flow). In the boundary of ordered domains, however, complexity develops in small-scale vortexes (right: contours of the vorticity, the curl derivative of the flow) (after Biskamp & Welter [2])
4.4.2 Singular Perturbation

In Sect. 4.3, we discussed the basic relation between scale hierarchies and the representation of a governing law, using the abstract equation (4.16). Each term constituting the equation may change its magnitude when we change the scales (the representative values of the variables) of our interest. A singular perturbation must be a term that is small in a large-scale representation and becomes large (to be
comparable with the other terms) in a small-scale representation. Let us give a concrete example and study the effects of the singular perturbation. First, consider a linear differential equation that consists of three terms like the abstract equation (4.16):

$$\varepsilon\frac{d^2 u}{dt^2} - i\frac{du}{dt} + \omega u = 0. \tag{4.23}$$
Suppose that this equation is written on a large scale (to simplify the notation, we omit the "ˇ" mark on the normalized variables). We assume that ω is a real constant of order unity and ε is a small positive constant. If ε = 0, (4.23) reduces into a first-order differential equation:

$$-i\frac{du}{dt} + \omega u = 0. \tag{4.24}$$
We can interpret the term εd²u/dt² in (4.23) as a perturbation to (4.24), which, being multiplied by a small coefficient (scale parameter) ε, is seemingly small. By this perturbation, however, the mathematical structure of the equation undergoes a fundamental alteration: the first-order differential equation (4.24) becomes the second-order differential equation (4.23). Such a perturbation that brings about a structural change, raising the order of the derivatives, is called a singular perturbation. Although we can solve (4.23) easily, we perform a mathematical experiment to highlight the influences of the singular perturbation. We start by putting ε = 0; solving (4.24), we obtain a harmonic oscillation:

$$u(t) = u_0(t) = ae^{-i\omega t}, \tag{4.25}$$
where a is a complex constant representing the initial value. We marked this solution with the subscript 0 in order to indicate that we set ε = 0. When the change of t becomes of order ω⁻¹ ≈ 1, an appreciable variation occurs in u_0(t). Hence we can say that u_0(t) is in the hierarchical level of the time scale of order unity. When ε is given a small but finite value, another component of u appears in a different hierarchical level. To observe this new element, we have to change the time scale. Let us renormalize time as τ = t/ε. Since ε is a small parameter, τ is an expanded time (measured by the small unit time ε). Using this τ, (4.23) reads as

$$\frac{d^2 u}{d\tau^2} - i\frac{du}{d\tau} + \varepsilon\omega u = 0. \tag{4.26}$$
We observe that the small parameter ε has been shifted to the third term on the left-hand side. Neglecting the small term εωu in (4.26), we obtain

$$u(t) \approx u_\varepsilon(\tau) = be^{i\tau} + c = be^{it/\varepsilon} + c, \tag{4.27}$$
where b and c are constants. The solution u_ε(τ) describes the phenomenon observed in the hierarchical level of the time scale ε, which is newly produced by the singular perturbation. Here, the constant c is the image of the slow component u_0(t) on the present fast time scale. Because we can easily find the exact solution of the linear equation (4.23), we may compare it with the approximate solutions u_0(t) and u_ε(τ) that are split into the two separate hierarchical levels. The general solution of the constant-coefficient ODE (4.23) is given by a linear combination of exponential functions; putting u(t) = αe^{-iλt} and plugging it into (4.23), we obtain the determining equation ελ² + λ − ω = 0, which gives two time constants:

$$\lambda_\pm = \frac{1}{2\varepsilon}\left(-1 \pm \sqrt{1 + 4\varepsilon\omega}\right). \tag{4.28}$$
The exact solution of (4.23) is written as

$$u(t) = \alpha_+ e^{-i\lambda_+ t} + \alpha_- e^{-i\lambda_- t}, \tag{4.29}$$
where α± are the constants representing the initial conditions of the two components of oscillations. Since ε ≪ 1, we may approximate λ± by Taylor expansion in terms of ε:

$$\lambda_+ \approx \omega, \qquad \lambda_- \approx -(\omega + \varepsilon^{-1}). \tag{4.30}$$
These approximate time constants agree with those of the approximate solutions u_0(t) and u_ε(τ). In the foregoing simple (linear) example, the two hierarchical levels are basically independent. In fact, in the analytical solution (4.29), the two modes of oscillations (with frequencies λ₊ and λ₋) are independently determined by the two constants of motion (initial values) α₊ and α₋. As previously mentioned, in the vision of the scale of ε (see (4.27)), the motion of the scale of unity appears as a constant c, which might be thought to be due to the influence of the other scale. However, c is independent of the other constant b, thus both modes are independent. On the other hand, on the scale of order unity, the small-scale motion does not appear. This is because the assumption (the setting of the topos) of the time scale eliminates the motion of short time scale from the vision. To justify the neglect of the singular perturbation term εd²u/dt², we need to assume that ε is small and, as well, that the magnitude of d²u/dt² is not large in comparison with du/dt and ωu. This requirement means that the time scale must not be smaller than unity. Therefore, the solution of (4.24) excludes the motion of short time scale. By the foregoing calculations, we have seen how a singular perturbation produces (or describes) an event in a micro-scale hierarchical level; the particular scale
is determined by the scale parameter ε. Here, we have to mention that a "reciprocal effect" is also produced by the singular perturbation. Comparing the unperturbed solution (4.25) with the exact perturbed solution α₊e^{-iλ₊t}, we find a small shift of the time constant (frequency). This discrepancy is negligible if we observe the event for a period of time of order unity, but it becomes visible if we observe it for a longer time scale. This means that the singular perturbation influences larger scales, too. While the term |εd²u/dt²| is much smaller than the other terms in the longer-time hierarchical level, the integration over a long time yields a finite influence. If we do not know the exact solution, we need a higher-level technique to estimate the long-term behavior of the approximate solution—that is, a method to overcome the hierarchical gap. We give a simple example in Note 4.2.
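A quick numerical check (our own) compares the exact time constants (4.28) with the approximations (4.30), and shows how the fast frequency escapes any fixed time scale as ε → 0.

```python
import numpy as np

# Compare the exact roots of eps*lam**2 + lam - omega = 0, cf. (4.28), with the
# approximations (4.30): lam_+ ~ omega (slow) and lam_- ~ -(omega + 1/eps) (fast).
omega = 1.0
for eps in [1e-1, 1e-2, 1e-3]:
    disc = np.sqrt(1 + 4 * eps * omega)
    lam_plus = (-1 + disc) / (2 * eps)
    lam_minus = (-1 - disc) / (2 * eps)
    print(f"eps = {eps:g}:  lam_+ = {lam_plus:.6f} (~ {omega}),  "
          f"lam_- = {lam_minus:.3f} (~ {-(omega + 1/eps):.3f})")
# As eps -> 0 the fast frequency |lam_-| ~ 1/eps diverges: the small-scale motion
# disappears from any description premised on the time scale of order unity.
```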
4.4.3 Collaborations of Nonlinearity and Singular Perturbation

Fourier transform makes it clear that nonlinearity bridges different hierarchical levels of scales. For example, let us consider a dynamical system of variables u(x,t) and v(x,t). Fourier-transforming them with respect to x, we write

$$u(x,t) = \sum_k \hat{u}(k,t)e^{ikx}, \qquad v(x,t) = \sum_k \hat{v}(k,t)e^{ikx}.$$

Here, we assume $\overline{\hat{u}(k,t)} = \hat{u}(-k,t)$ and $\overline{\hat{v}(k,t)} = \hat{v}(-k,t)$, so that u(x,t) and v(x,t) are real-valued functions. Suppose that the evolution equation includes a nonlinear term u·v. Then the Fourier components û(k,t)e^{ikx} and v̂(k′,t)e^{ik′x} (and their complex conjugates) yield

$$\hat{u}(k,t)\hat{v}(k',t)e^{i(k+k')x} + \hat{u}(k,t)\overline{\hat{v}(k',t)}e^{i(k-k')x}. \tag{4.31}$$

For a Fourier component û(k,t)e^{ikx}, 1/k represents the length scale (2π/k is the wavelength). As (4.31) shows, the nonlinear term produces Fourier components of new length scales 1/(k ± k′), and the new ones make further products through nonlinear couplings, yielding an infinite number of length scales (wavelengths). It is singular perturbation that anchors the unlimited scale transformation driven by nonlinear terms. As shown in Sect. 4.4.2, a singular perturbation is represented by a higher-order derivative multiplied by a small coefficient (the scale parameter ε). Mathematically, the fundamental structure of a differential equation is determined by the highest-order derivatives, i.e., the singular perturbation does dominate the law. Physically (or phenomenologically), however, a term multiplied by a small coefficient is "normally" assumed to be negligible. But an "abnormal" situation where the singular perturbation cannot be neglected occurs when the premised scale is altered by the nonlinear effect. Let us consider more carefully how this happens. Even if the singular perturbation is negligible on the premised scale, it may become appreciably large when nonlinearity produces small-scale fluctuations and
the higher-order derivatives amplify to overcome the small magnitude of the coefficient ε. This means that the small-scale hierarchy that is spontaneously created by nonlinearity must be included in our vision. As an example, let us consider a nonlinear evolution equation that is composed by adding a singular perturbation to (4.17):

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = D\frac{\partial^2 u}{\partial x^2}, \tag{4.32}$$

where D is a small positive constant (representing a viscosity when u is a fluid velocity).10 Choosing the representative values (scales) of time T, length L, and velocity U that satisfy relation (4.18), we normalize the variables to rewrite (4.32) in a dimensionless (or normalized) form

$$\frac{\partial \check{u}}{\partial \check{t}} + \check{u}\frac{\partial \check{u}}{\partial \check{x}} = \varepsilon\frac{\partial^2 \check{u}}{\partial \check{x}^2}. \tag{4.33}$$
Here, the scale parameter is

$$\varepsilon = \frac{D}{UL}. \tag{4.34}$$
The singular perturbation introduces a particular length scale that makes ε = 1 (i.e., L = D/U). In the range of L ≪ D/U, the viscosity (or diffusion) term dominates, and the system becomes almost linear, destroying the scale invariance of the unperturbed equation (see Sect. 4.3.3). This is why we said that "singular perturbation anchors the unlimited scale transformation driven by nonlinear terms." We note that relation (4.34) includes U, the magnitude of the unknown variable u. This fact is caused by the nonlinearity of (4.32). Remembering the discussion in the preceding subsection, a singular perturbation in a linear system determines a unique scale where it dominates the system. However, in a nonlinear system, the micro-scale where the singular perturbation collaborates with the nonlinearity is determined by the balance of both terms. Finding this balance is not simple; while in the preceding paragraph we formally used L = D/U to specify the length scale of the singular perturbation, this estimate is based on the subjective choice of T, L, and U, not on the spontaneous scales. We have to estimate how large the magnitude of u really is on the scale that we have yet to determine. Here, we introduce a heuristic method to estimate the balance of nonlinearity and singular perturbation.
10 The nonlinear evolution equation (4.32), which is called Burgers' equation, describes the motion of a one-dimensional fluid. The singular perturbation term represents the viscous force that yields diffusion of the fluid momentum. In one-dimensional fluid mechanics, the viscosity effect is "stronger" than the nonlinear (convection) effect, so that (4.32) can be transformed to a linear diffusion equation (see Sect. 4.4.4).
We consider a model of fluid motion generalizing (4.32) to a three-dimensional system of equations; the Navier–Stokes equation11 governs an incompressible (∇·u = 0) viscous fluid:

$$\frac{\partial u}{\partial t} + (u\cdot\nabla)u = \varepsilon\Delta u - \nabla p + F, \tag{4.35}$$
where ε is the normalized viscosity defined by (4.34), p is the pressure (the mass density of the fluid is normalized to unity), and F is a certain external force. We have applied the same normalization as in (4.33) with the macroscopic representatives T, L, and U. Here we omit the "ˇ" mark on the normalized variables to simplify the notation. To evaluate the balance of the terms, we define the Reynolds number, which scales the ratio of the magnitudes of the nonlinear term (u·∇)u and the viscous damping term εΔu:

$$Re = \frac{[[\,(u\cdot\nabla)u\,]]}{[[\,\varepsilon\Delta u\,]]}. \tag{4.36}$$

Here, we denote by [[ f ]] the "representative" magnitude of f. If u(x,t) varies only on the macroscopic length scale, which is normalized to unity, Re is simply the reciprocal of ε. However, this is not an appropriate assumption. The nonlinearity of (4.35) produces u on every length scale. Then the meaning of "representative" is ambiguous, and the definition (4.36) is rather obscure; it may vary when evaluated on different scales. We should define Re on each scale (conventionally, however, the Reynolds number is just the reciprocal of ε that is defined by (4.34) with the typical macroscopic velocity U and the system size L). We denote by f̂(k) the Fourier transform of a field f(x) and by f_k the magnitude of the Fourier component f̂(k) in the range of |k| ≈ k, i.e., f_k evaluates the magnitude of f on the length scale 1/k. Spatial derivatives may be evaluated as |∇f|_k ≈ k f_k. To estimate the Fourier component of a nonlinear term, we need some appropriate modeling. Here we invoke Kolmogorov's ansatz of local interactions in the wave-number space, which may be formulated as follows: Let us consider a quadratic term f·g. All f̂(k₁) and ĝ(k₂) contribute to (f·g)_k if |k₁ ± k₂| ≈ k. However, the locality ansatz claims that only f_k and g_k dominate (f·g)_k. Similarly, we estimate [(f·∇)g]_k ≈ k f_k g_k. Using these estimates, we define the Reynolds number in each hierarchical level of k:

$$Re(k) = \frac{u_k}{\varepsilon k}. \tag{4.37}$$

11 The following discussion is devoted to fully developed turbulence of fluid. The space dimension is an essential parameter in the theory of turbulence. The one-dimensional system (4.32) does not produce turbulence. In two-dimensional fluid, the nonlinearity is still too weak to produce the fully developed turbulence to which the following model applies (see Note 4.6 as well as the monographs Frisch [8] and Ladyzhenskaya [19]).
If u_k is not an increasing function of k (we assume that F exists only on a macroscopic scale k ≈ 1), a smaller scale (larger k) has a smaller Re(k). The scale at which Re(k) = 1 is called the Kolmogorov scale, where the viscosity term (singular perturbation) starts working to dampen the energy. On larger scales, where Re(k) ≫ 1, the viscosity term is negligibly small and the nonlinear term dominates the dynamics. We call such a range of scales the inertial range. By this definition, the Kolmogorov scale 1/k_D is given by

$$k_D = \frac{u_{k_D}}{\varepsilon}. \tag{4.38}$$
To estimate (eliminate) u_{k_D}, we invoke the energy injection rate P (by the external force F on the scale k ≈ 1), which balances, under the stationarity of the energy, with the energy damping rate (cf. (4.62)):

$$P = \varepsilon k^2 u_k^2\big|_{k=k_D} = \frac{1}{\varepsilon}u_{k_D}^4. \tag{4.39}$$
Solving (4.39) for u_{k_D}, we obtain u_{k_D} = P^{1/4}ε^{1/4}. Substituting this estimate of u_{k_D} into (4.38), we obtain

$$k_D = P^{1/4}\varepsilon^{-3/4}. \tag{4.40}$$
This relation shows that the Kolmogorov scale—the scale where the singular perturbation and the nonlinearity balance—is determined not only by the scale parameter ε but also by the energy injection rate P (the latter scaling the magnitude of the nonlinear term). In the inertial range, the dynamics is dominated by the nonlinear term alone. Since the energy is injected on the scale 1/k ≈ 1 and is dissipated on the Kolmogorov scale 1/k_D, the energy "cascades" in the inertial range toward the smaller scales. Let us estimate the distribution of the energy in the scale hierarchy. In the steady state, the energy injection rate balances with the energy transfer rate that is determined by the nonlinear term. The energy transfer due to the nonlinear term (inertial force) is given by multiplying it by the flow velocity u (cf. Sect. 4.4.5). By Kolmogorov's locality ansatz, we estimate [(u·∇)u·u]_k ≈ u_k³k, which must be independent of k and balance with the energy injection rate P. Thus, we obtain u_k ≈ P^{1/3}k^{-1/3}. The energy in the range of scales (k, k+δ) is written as

$$E(k, k+\delta) = Cu_k^2\delta = CP^{2/3}k^{-2/3}\delta,$$
where C is a constant adjusting the dimensions of the parameters (in the dimensionless formulation, we may set C = 1). We define the energy spectrum function E(k) by

$$E(k, k+\delta) = \int_k^{k+\delta} E(k)\,dk.$$
We then find

$$E(k) = C'P^{2/3}k^{-5/3} \tag{4.41}$$

with C′ = (9/8)C, which is called the Kolmogorov spectrum. See Problem 4.2 for Kolmogorov's original derivation of (4.41) based on dimensional analysis.
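The scalings just derived are easy to restate numerically; the sketch below (our own, in the dimensionless units of the text) evaluates the Kolmogorov wave number (4.40) and recovers the −5/3 slope from the cascade estimate u_k ≈ P^{1/3}k^{−1/3}.

```python
import numpy as np

# Kolmogorov scale (4.40) for several viscosities, with P normalized to unity:
P = 1.0
for eps in [1e-3, 1e-6, 1e-9]:
    print(f"eps = {eps:g}:  k_D = {P**0.25 * eps**-0.75:.3g}")

# Spectrum in the inertial range: u_k from the cascade balance u_k**3 * k ~ P,
# and E(k) ~ u_k**2 / k, whose log-log slope reproduces -5/3, cf. (4.41).
k = np.logspace(0, 3, 50)
u_k = P**(1/3) * k**(-1/3)
E = u_k**2 / k
print(f"spectral slope = {np.polyfit(np.log(k), np.log(E), 1)[0]:.3f}")
```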
4.4.4 Localized Structures in Space–Time

As we noted in Sect. 4.3.4, nonlinearity often produces a localized (small-scale) structure; in the limit of zero scale, it becomes a singularity, while a singular perturbation may prevent the collapse of the scale. In the preceding subsection, we have shown, using Fourier transform, that nonlinearity produces small scales, but the argument falls short of explaining the "localization" of small-scale structures; we have to pay attention to the phases of the Fourier components. A localized structure (or singularity) can be created when the phases of many Fourier components align. Figure 4.5 shows a numerical experiment; Figs. (a) and (b) plot, respectively,

$$f_1(x) = \sum_{k=1}^{n}\cos(kx), \tag{4.42}$$

$$f_2(x) = \sum_{k=1}^{n}\cos(kx + \phi_k) \tag{4.43}$$
in the domain of (−π, π) (we chose n = 15). In f₂(x), the phases φ_k (k = 1, ···, n) are random, ranging over [0, 2π). On the other hand, f₁(x) is composed of coherent-phase Fourier components, and has a spatially localized structure; indeed, (4.42) is just the Fourier expansion of the delta function δ(x).

Fig. 4.5 Compositions of (a) waves with coherent phases and (b) waves with random phases
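The experiment of Fig. 4.5 is simple to reproduce; a minimal sketch (ours) follows. The coherent sum peaks at the level n, while the random-phase sum stays near the level √n, with no localization.

```python
import numpy as np

# Reproduce the numerical experiment of Fig. 4.5: cosines with aligned phases,
# (4.42), versus cosines with random phases, (4.43).
rng = np.random.default_rng(0)
n = 15
x = np.linspace(-np.pi, np.pi, 2001)
phases = rng.uniform(0, 2 * np.pi, n)

f1 = sum(np.cos(k * x) for k in range(1, n + 1))                  # coherent (4.42)
f2 = sum(np.cos(k * x + phases[k - 1]) for k in range(1, n + 1))  # random (4.43)

print(np.max(np.abs(f1)))   # ~ n = 15: a tall, localized peak at x = 0
print(np.max(np.abs(f2)))   # ~ sqrt(n): only random ripples, no localization
```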
The localized structures in space–time (see Fig. 4.4) are better analyzed in coordinate space, rather than in wave number space. First, we show that the singularity of the shock discontinuity, discussed in Sect. 4.3.4, is smoothed by the effect of a singular perturbation and a finite-scale structure is created. Adding a small viscosity, we modify the ideal model (4.19) to (4.33). We normalize all variables by macroscopic scales. To simplify the notation, we omit the "ˇ" mark on the normalized variables, and rewrite (4.33) as

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \varepsilon\frac{\partial^2 u}{\partial x^2}. \tag{4.44}$$
Because of the singular perturbation of the viscosity, (4.44) is a nonlinear diffusion equation. The diffusion effect causes a flattening of the distribution of u. Thus, the singularity of the ideal model is modified to a finite-width crevice. For example, we find a stationary solution of (4.44) such as

$$u(x,t) = -2\tanh(x/\varepsilon), \tag{4.45}$$
which describes a standing shock (see Fig. 4.6(a)). General properties of the solutions will be discussed in the next subsection (see also Note 4.4). In (4.44), the singular perturbation (ε∂²u/∂x²) is relatively stronger than the nonlinearity (u∂u/∂x), so that the former (linear term) "linearizes" the equation. Here, what we mean by "linearize" is not a linear approximation; the nonlinear equation can be rewritten as a linear equation by an appropriate transformation of the variables, which is called the Cole–Hopf transformation. First we write u(x,t) = ∂ϕ(x,t)/∂x, introducing a potential function ϕ(x,t). Plugging this representation into (4.44), we obtain

$$\frac{\partial}{\partial x}\left[\frac{\partial \phi}{\partial t} + \frac{1}{2}\left(\frac{\partial \phi}{\partial x}\right)^2 - \varepsilon\frac{\partial^2 \phi}{\partial x^2}\right] = 0. \tag{4.46}$$
Fig. 4.6 Localized structures created by the collaborations of the convective nonlinearity (u∂u/∂x) and the singular perturbations of (a) dissipation (viscosity) and (b) dispersion. Figure (a) shows the smoothed standing shock. Figure (b) shows a propagating single-pulse wave (soliton)

Integrating (4.46) over x, we obtain (with an integration constant C)

$$\frac{\partial \phi}{\partial t} + \frac{1}{2}\left(\frac{\partial \phi}{\partial x}\right)^2 - \varepsilon\frac{\partial^2 \phi}{\partial x^2} = C. \tag{4.47}$$
Putting ϕ = −2ε log ψ, (4.47) transforms into a linear diffusion equation

$$\frac{\partial \psi}{\partial t} - \varepsilon\frac{\partial^2 \psi}{\partial x^2} = -\frac{C}{2\varepsilon}\psi. \tag{4.48}$$
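As a quick consistency check (our own finite-difference sketch), one can verify that the standing shock (4.45) is indeed stationary for Burgers' equation (4.44): the nonlinear steepening and the viscous smoothing cancel exactly.

```python
import numpy as np

# Check that u = -2*tanh(x/eps) of (4.45) is a stationary solution of (4.44):
# the residual u*u_x - eps*u_xx (which equals -u_t) should vanish.
eps = 0.1
x = np.linspace(-1.0, 1.0, 20001)
u = -2.0 * np.tanh(x / eps)

u_x = np.gradient(u, x)
u_xx = np.gradient(u_x, x)
residual = u * u_x - eps * u_xx
print(np.max(np.abs(residual[10:-10])))  # small (finite-difference error only)
```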
Note that the nonlinear term was eliminated by a term that originated from the singular perturbation ε∂²u/∂x². The aforementioned "linearization" does not apply when the space dimension is higher than one. The relative strengths of the nonlinear term and the diffusion term (singular perturbation) change depending on the space dimension—the nonlinear effect strengthens in higher dimensions.12 In the example of a two-dimensional vortex structure (Fig. 4.4), we have seen that the crevice is filled by small-scale complex structures. A three-dimensional vortex system is generally much more complex. In the preceding subsection, we analyzed the three-dimensional Navier–Stokes equation in the wave number space assuming homogeneous turbulent dynamics. However, it is known that the system tends to self-organize localized strong vortexes (like tornadoes), producing "intermittent" fluctuations in space–time. Such structures bring about a slight modification of the Kolmogorov spectrum (4.41).

Let us continue to study one-dimensional problems, where analytical methods provide truly profound understanding of the localized structures created in nonlinear systems. As we have been considering many different subjects from many different points of view, the description of nonlinear phenomena depends strongly on the perspective of the observer, and here the choice of the scales of both independent and dependent variables is a matter of great interest. We want to formulate a method to find the "optimal scales" that simplify the description of the dynamics or structures of a nonlinear system. For this purpose, we consider a problem where the collaboration of nonlinearity and singular perturbation is more complex than in the previous example of a shock-like structure. We turn to an example from plasma physics, which describes acoustic waves propagating by electric interactions of particles. The governing equations are, in normalized units (omitting the "ˇ" mark on normalized variables),

$$\frac{\partial \rho}{\partial t} + \frac{\partial}{\partial x}(u\rho) = 0, \tag{4.49}$$

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + \frac{\partial \phi}{\partial x} = 0, \tag{4.50}$$

$$\frac{\partial^2 \phi}{\partial x^2} = e^{\phi} - \rho, \tag{4.51}$$

12 Of course, the problem of the relative "strengths" of the nonlinear term and the singular perturbation is not equivalent to the question of whether the "linearization" succeeds or not, but an important implication might be contained there.
where ρ is the density of charged particles (ions), u is the flow velocity (one-dimensional), and φ is the electric potential. Equation (4.49) is the mass conservation law, (4.50) is the momentum equation with the electric force ∂φ/∂x, and (4.51) is Poisson's equation; the charge density giving the source of the potential consists of the ion density (positive charge) and the electron density (negative charge), the latter obeying the Maxwell distribution e^φ. Setting φ = 0 and ρ = constant trivializes (4.49) and (4.51), and then (4.50) reduces into the simple nonlinear equation (4.17). It is due to the nonlinear couplings with (4.49) and (4.51) that the system receives a singular perturbation. However, the characteristic scale of the singular perturbation is not evident in the system of equations. Our aim is to elucidate how the nonlinearity and the (potential) singular perturbation collaborate to create a localized structure. The technique we invoke is called the reductive perturbation method. We start with a stationary solution ρ = 1, u = 0, and φ = 0. Our interest is a small (but nonlinear) perturbation on this stationary state. Introducing a small parameter ε, we expand the dependent variables:

$$\begin{cases} \rho = 1 + \varepsilon\rho^{(1)} + \varepsilon^2\rho^{(2)} + \cdots, \\ u = \varepsilon u^{(1)} + \varepsilon^2 u^{(2)} + \cdots, \\ \phi = \varepsilon\phi^{(1)} + \varepsilon^2\phi^{(2)} + \cdots. \end{cases} \tag{4.52}$$
All new variables ρ^{(k)}, u^{(k)}, φ^{(k)} (k = 1, 2, ···) are assumed to be of order unity. Plugging (4.52) into (4.49), (4.50), and (4.51) yields equations governing the new variables, in which the number of unknown variables has been increased infinitely, so they are not useful at this stage (note that nonlinearity causes couplings of different scales, producing an inseparable hierarchy of equations). However, we may find new scales of independent variables which simplify the system of equations. Let us transform the independent variables x and t as

$$\xi = \varepsilon^{1/2}(x - t), \qquad \tau = \varepsilon^{3/2}t. \tag{4.53}$$

The variable ξ is boosted (by shifting x to x − t) to find a propagating solution and expanded (by multiplying by the small coefficient ε^{1/2}) to observe a small structure. Simultaneously, time is expanded in τ. The selection of the exponents of the expansion will be explained later. The differential operators are written as

$$\frac{\partial}{\partial t} = \frac{\partial \xi}{\partial t}\frac{\partial}{\partial \xi} + \frac{\partial \tau}{\partial t}\frac{\partial}{\partial \tau} = -\varepsilon^{1/2}\frac{\partial}{\partial \xi} + \varepsilon^{3/2}\frac{\partial}{\partial \tau},$$

$$\frac{\partial}{\partial x} = \frac{\partial \xi}{\partial x}\frac{\partial}{\partial \xi} + \frac{\partial \tau}{\partial x}\frac{\partial}{\partial \tau} = \varepsilon^{1/2}\frac{\partial}{\partial \xi}.$$

Using these new variables in (4.49), (4.50), and (4.51), we obtain
$$\varepsilon^{3/2}\frac{\partial}{\partial \xi}\left(-\rho^{(1)} + u^{(1)}\right) + \varepsilon^{5/2}\left[\frac{\partial \rho^{(1)}}{\partial \tau} + \frac{\partial}{\partial \xi}\left(\rho^{(1)}u^{(1)}\right) + \frac{\partial}{\partial \xi}\left(u^{(2)} - \rho^{(2)}\right)\right] + \cdots = 0,$$

$$\varepsilon^{3/2}\frac{\partial}{\partial \xi}\left(-u^{(1)} + \phi^{(1)}\right) + \varepsilon^{5/2}\left[\frac{\partial u^{(1)}}{\partial \tau} + u^{(1)}\frac{\partial u^{(1)}}{\partial \xi} + \frac{\partial}{\partial \xi}\left(\phi^{(2)} - u^{(2)}\right)\right] + \cdots = 0,$$

$$\varepsilon^{3/2}\frac{\partial}{\partial \xi}\left(-\phi^{(1)} + \rho^{(1)}\right) + \varepsilon^{5/2}\left[\frac{\partial^3 \phi^{(1)}}{\partial \xi^3} - \phi^{(1)}\frac{\partial \phi^{(1)}}{\partial \xi} + \frac{\partial}{\partial \xi}\left(\rho^{(2)} - \phi^{(2)}\right)\right] + \cdots = 0.$$
Summing up the terms multiplied by ε^{3/2}, we obtain

$$\rho^{(1)} = u^{(1)} = \phi^{(1)}. \tag{4.54}$$
To order ε^{5/2}, we obtain three equations. Summing them and using (4.54), we obtain

$$\frac{\partial \phi^{(1)}}{\partial \tau} + \phi^{(1)}\frac{\partial \phi^{(1)}}{\partial \xi} + \frac{1}{2}\frac{\partial^3 \phi^{(1)}}{\partial \xi^3} = 0. \tag{4.55}$$
The first-order variable φ^{(1)}(ξ,τ) is determined, separately from the higher-order terms, by (4.55), which is called the KdV equation. Comparing (4.55) with (4.17), we find that the term including the third-order derivative works as a singular perturbation (in the present normalization, the scaling parameter is set to unity). Unlike the diffusion term in (4.44), the third-order derivative brings about a dispersive effect which, preventing the concentration of wave energy into a singularity, produces so-called solitons. For example, we easily verify that a solitary wave

$$\phi^{(1)}(\xi,\tau) = 3c\,\mathrm{sech}^2\!\left[\sqrt{c/2}\,(\xi - c\tau)\right] \tag{4.56}$$
is a solution of (4.55), where c is an arbitrary constant representing the propagation speed (see Fig. 4.6(b) and Problem 4.3).13

Let us analyze what was essential in the derivation of the simplified equation (4.55). In the system (4.49), (4.50), and (4.51), Poisson's equation (including the higher-order derivative ∂²/∂x²) is the origin of the dispersion (singular perturbation). The point of the ordering (4.53) is that the second-order spatial derivative ∂²/∂x² scales as ε∂²/∂ξ² + ··· so as to make a balance with the second-order nonlinearity that appears as products of εφ^{(1)} + ···. Therefore, on the length scale ε^{1/2}x, the second-order nonlinearity produced by perturbations of the scale ε can make a balance with the dispersion term (see Problem 4.4).

13 For further study of the theory of solitons, the reader is referred to, for example, Drazin & Johnson [3] and Infeld & Rowlands [14].
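Before moving on, here is a finite-difference verification (our own sketch) that the solitary wave (4.56) solves the KdV equation (4.55); for a traveling wave φ(ξ − cτ), we have ∂φ/∂τ = −c∂φ/∂ξ, so the residual below should vanish.

```python
import numpy as np

# Check that phi = 3c * sech(sqrt(c/2)*(xi - c*tau))**2, cf. (4.56), satisfies
# phi_tau + phi*phi_xi + (1/2)*phi_xixixi = 0, cf. (4.55), at tau = 0.
c = 1.0
xi = np.linspace(-20.0, 20.0, 40001)
phi = 3 * c / np.cosh(np.sqrt(c / 2) * xi) ** 2

d = lambda f: np.gradient(f, xi)
residual = -c * d(phi) + phi * d(phi) + 0.5 * d(d(d(phi)))
print(np.max(np.abs(residual)))   # ~ 0 up to finite-difference error
```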
4.4.5 Irreducible Couplings of Multi-Scales

To study the general properties of the collaboration of nonlinearity and singular perturbation, we start by considering an abstract equation that includes both terms. Suppose that the singular perturbation is scaled by a small coefficient ε. Putting ε = 0, the equation reduces into the unperturbed (or ideal) equation; we call it the "0-model." And we call the precise equation (perturbed by a finite ε) the "ε-model." We denote the solution of the ε-model by u_ε. If there exists u_0 such that

$$\lim_{\varepsilon\to 0} u_\varepsilon = u_0,$$

and this u_0 satisfies (in some sense) the 0-model, we may use the 0-model as an approximate macroscopic model that is separated from the smaller-scale hierarchical level. Even if that is the case (here we call it the class-1 singular perturbation), however, the 0-model may have some solutions that cannot be constructed as the limit of u_ε. Such solutions should be excluded as improper solutions (we shall see a concrete example later). The influence of the singular perturbation is very serious when u_ε diverges in the limit ε → 0, or, even if it converges, the limit does not satisfy the 0-model. In this case (here we call it the class-2 singular perturbation), we would overlook important effects originating from the smaller-scale hierarchical level if we were to use the 0-model. In the example of Sect. 4.4.2, the small-scale oscillation (4.27) becomes singular (the frequency diverges) in the limit ε → 0. The large-scale component (4.25) alone is regular in this limit, and it gives the solution of the 0-model. Because this example is a linear system, the two different hierarchical levels are independent; omitting the small-scale phenomena does not influence the 0-model solution u_0. In this sense, the 0-model applies as a coarse-grained macroscopic model of the system, while it overlooks the small-scale phenomenon. In nonlinear systems, hierarchical levels are generally connected; thus neglecting the singularity (small-scale effects) produced by a class-2 singular perturbation leads to a serious misunderstanding of the phenomenon. We shall look at a specific example later. First we give a typical example of class-1 singular perturbation. In the preceding subsection, we have seen that the shock discontinuity metamorphoses into a finite-width crevice by the effect of the singular perturbation. In this problem, (4.19) is the 0-model and (4.33) is the ε-model. All variables are normalized by macroscopic scales (we are omitting the "ˇ" mark on the normalized variables). Here we give an initial condition with a shock:
$$u_0(x,0) = \begin{cases} -a & (x \le 0), \\ a & (0 < x), \end{cases} \tag{4.57}$$
where we assume that a is a positive constant of order unity. The 0-model has the following standing-shock stationary solution:

$$u_0(x,t) = \begin{cases} -a & (x \le 0), \\ a & (0 < x). \end{cases} \tag{4.58}$$
The 0-model also has an expansion-wave solution in t ≥ 0:

$$u_0(x,t) = \begin{cases} -a & (x \le -at), \\ x/t & (-at < x < at), \\ a & (at \le x); \end{cases} \tag{4.59}$$
see (4.22) and Fig. 4.3. Moreover, shifting time t by an arbitrary amount τ (≥ 0) in (4.59), and connecting it to (4.58) at t = τ, we can produce another solution. Hence the initial-value problem of the 0-model (4.19) with the initial condition (4.57)—the so-called Riemann problem—has an infinite number of solutions. We note that the aforementioned "solutions" with the singularity (shock discontinuity) satisfy the differential equation (4.19) in a certain "generalized sense." Of course, a function cannot be differentiated at discontinuous points. Thus the shock solutions satisfy the differential equation except at the singularity. Although the discontinuous point must be excluded in evaluating the governing equation, it is not "omitted" in the sense that the values of u_0 on both sides of the discontinuous point are controlled by the equation. In fact, the standing-shock solution (4.58) implies that, if u_0(x,t) = −a in x < 0, its value in x > 0 must be +a; it cannot be an arbitrary number. To interpret the governing equation of such singular solutions—so-called weak solutions—we have to generalize the meaning of the derivatives (see Note 4.5). As a result of this "generalization," the uniqueness of the solution is lost. In order to judge which of (4.58) and (4.59) is the "plausible solution" representing the macroscopic image of the phenomenon, we have to invoke the ε-model. As we have shown in the preceding subsection, the ε-model (4.33) is a diffusion equation; thus the singularity of the standing shock (4.58) cannot be sustained.14 Once the discontinuity is smoothed into a finite-width crevice, the slope of u_ε tends to flatten by the diffusion effect, and the width of the crevice also expands because the upstream velocity is larger than the downstream velocity. Therefore, the solution of the ε-model can never converge to the standing-shock solution in the limit ε → 0, while it may converge to the expansion solution (the flattening solution). By invoking the ε-model—a more precise model that is sensible on a smaller scale—the unphysical solutions of the crude (macroscopic) 0-model can be eliminated.

14 When a < 0, the fluid elements move toward the discontinuity (see Fig. 4.3). The balance of such flow and diffusion can sustain an inhomogeneous distribution (see Fig. 4.6(a) and Note 4.4).
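The selection of the expansion wave by the ε-model can be watched directly in a small simulation (our own explicit discretization; the parameters are arbitrary): starting from the shock data (4.57), the viscous solution relaxes toward the fan (4.59) rather than the standing shock (4.58).

```python
import numpy as np

# Explicit scheme for the eps-model (4.33) with the Riemann data (4.57): the
# solution flattens toward the expansion wave (4.59) with inner slope ~ 1/t.
eps, a = 0.05, 1.0
x = np.linspace(-5, 5, 501); dx = x[1] - x[0]
dt = 0.2 * min(dx / a, dx**2 / (2 * eps))      # a conservatively stable step size
u = np.where(x <= 0, -a, a)                    # initial shock data (4.57)

for _ in range(int(3.0 / dt)):                 # advance to t ~ 3
    ux = np.gradient(u, dx)
    uxx = np.gradient(ux, dx)
    u = u + dt * (-u * ux + eps * uxx)
    u[0], u[-1] = -a, a                        # hold the far-field values

slope = np.gradient(u, dx)[len(x) // 2]
print(f"central slope = {slope:.3f} (expansion fan predicts ~ {1/3.0:.3f})")
```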
Next we discuss turbulence as a typical example of class-2 singular perturbation. Here, the ε-model is the Navier–Stokes equation (4.35) of incompressible fluid mechanics. The corresponding 0-model is given by putting the viscosity coefficient ε = 0; it is called Euler's equation.15 As hereinafter described, this "ideal" model of fluid does not apply in any sense to "turbulence," although it is useful in the study of regular motions of fluids. Here we put the external driving force F = 0, for simplicity. We assume that the fluid is confined in a bounded domain Ω. On the boundary Γ, the velocity must vanish:16

$$u = 0 \quad (\text{on } \Gamma). \tag{4.60}$$
Let us calculate the energy of the fluid motion. Using the ε-model (4.35), we obtain

$$\int \frac{\partial u}{\partial t}\cdot u\,dx = \int \left[-(u\cdot\nabla)u + \varepsilon\Delta u - \nabla p\right]\cdot u\,dx. \tag{4.61}$$

Assuming that u is a sufficiently smooth function, the left-hand side of (4.61) can be rewritten as

$$\frac{d}{dt}\left(\int \frac{1}{2}|u|^2\,dx\right).$$

The parenthetic term, which we shall denote by E, represents the total (volume-integrated) kinetic energy of the fluid (the mass density is normalized to unity). Integrating by parts, the right-hand side of (4.61) can be rewritten as

$$-\varepsilon\int |\nabla\times u|^2\,dx + \int \left(p + |u|^2/2\right)(\nabla\cdot u)\,dx = -\varepsilon\int |\nabla\times u|^2\,dx,$$

where we have used ∇·u = 0 and (4.60). Now (4.61) reads as
15 In the one-dimensional model (4.17), the nonlinearity due to the compressional motion makes the characteristic curves collide and produce the singularity of shock discontinuity (see Sect. 4.3.4 and Note 4.3). For the present 0-model of three-dimensional incompressible flow, it is still unknown whether singularities can be created within a finite time. In the two-dimensional case, however, it is known that singularities do not appear if the initial state is sufficiently regular (see, for example, Foias et al. [7]).
16 While the scale parameter ε is not explicitly visible in the boundary condition (4.60), we have to reduce it to the weaker condition n·u = 0 (n is the unit normal vector onto Γ) for the 0-model (inviscid fluid model). Physically, this is because the inviscid fluid can slip along the boundary (while it cannot penetrate). From a mathematical perspective, the reduction of the boundary condition is appropriate because the second-order derivatives are removed and the PDE reduces into a first-order equation.
$$\frac{d}{dt}\int \frac{1}{2}|u|^2\,dx = -\varepsilon\int |\nabla\times u|^2\,dx. \tag{4.62}$$
The left-hand side is the rate of change of the energy E. The right-hand side cannot be positive, implying that the energy is always dissipated by the viscosity as far as the vorticity ∇×u is finite at some place. The 0-model assumes ε = 0 in (4.35). Then we have to put ε = 0 in the energy estimate (4.62). We thus find the conservation of the energy E. To estimate the energy E of a general fluid motion, we have to plug the actual solution u(x,t) into the right-hand side of (4.62). Needless to say, we cannot hope for an explicit expression of the function u(x,t) of general turbulent flows. As we discussed in Sects. 2.4 and 2.5, an estimate such as (4.62) should be considered as an a priori law, i.e., a relation that is deduced from the equation, not from the solution; in fact, we derived (4.62) by manipulating the equation itself. As long as the matter of interest is the bound of the energy, for instance, the "inequality" dE/dt ≤ 0 (or the equality dE/dt = 0 for the 0-model) suffices. Our present interest, however, is the actual evolution of E(t), so we need the solution u(x,t) to evaluate the right-hand side of (4.62). General properties of E(t), especially its dependence on ε, are beyond the understanding of theory. There has been a great deal of progress in numerical simulations, though, which suggest the behavior sketched in Fig. 4.7. Even if we reduce ε close to zero, E(t) does not tend to be conserved. This implies that the solution of the ε-model (the detailed model including the small-scale effect) does not approach the solution of the 0-model (the macroscopic model that excludes the small-scale effect) in the limit ε → 0. How does such a discrepancy occur? When the viscosity coefficient ε is reduced, the general fluid motion becomes more turbulent, so that smaller vortexes are generated. As we have shown in (4.62), the energy dissipation is due to the vorticity ∇×u, whose magnitude amplifies with the reduction of the size of the vortexes. On the right-hand side of the energy estimate (4.62), a reduction of the coefficient ε yields an increase of the square integral of the vorticity, resulting in a finite bound
Fig. 4.7 Conceptual graphs describing the change of energy E of fluid motion (curves for ε = 0, 1, 10⁻³, 10⁻⁶, 10 versus time t). The 0-model conserves energy. However, the limit ε → 0 of the solution of the ε-model does not conserve energy
Hence the 0-model is irrelevant as a model of turbulence (if we are concerned about the energy of the fluid).¹⁷

¹⁷ The foregoing example of the 0-model of the standing shock (4.58), which we categorized as class 1, also fails to give an energy estimate (to estimate the energy dissipation, we need to evaluate the derivative). The energy dissipation of the solution of the ε-model is shown to be independent of ε (see Problem 4.5).

A nonlinear system often connects different hierarchical levels by self-organizing a structure. Thus, we may not simply assume that a 0-model—an a priori neglect of the small-scale effect represented by a singular perturbation—gives a coarse-grained description of the large-scale hierarchy. We have to analyze carefully what influence comes up to the larger-scale level from the smaller-scale level and, on the other hand, how large-scale events influence small-scale fluctuations. As discussed in the preceding subsection, the connection of different scales tends to occur in a localized place, a crevice in space–time; it is generally difficult to predict where and when it is created. In Fig. 4.4, we have shown an example of a numerical simulation of a plasma (an electrically conducting high-temperature fluid). Although large-scale vortexes self-organize an ordered array of opposite-polarity vortexes (left figure), the order provokes disorder—opposite-polarity subdomains remain separated by crevices which are filled by microscopic disordered fluctuations (right figure). The understanding of such hierarchical connections is central to deciphering the complexity that develops in a nonlinear system.
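The three-dimensional simulations behind Fig. 4.7 are beyond a short program, but the same anomalous dissipation can be observed in the one-dimensional ε-model (4.44). The following is a minimal numerical sketch—the scheme, grid, initial profile, and time horizon are our own choices, not taken from the text—which integrates a periodic version of (4.44) for several values of ε and prints the total energy lost up to a fixed time. The losses cluster near a common positive value instead of shrinking with ε (at the smallest ε part of the dissipation is numerical, which only reinforces the point that the limit is not dissipation-free).

```python
import numpy as np

def burgers_energy_loss(eps, N=2048, T=2.0):
    """Integrate u_t + (u^2/2)_x = eps*u_xx on [0, 2pi) (periodic)
    with a Rusanov flux + explicit diffusion, from u(x,0) = sin(x)."""
    x = np.linspace(0.0, 2*np.pi, N, endpoint=False)
    dx = x[1] - x[0]
    u = np.sin(x)
    E0 = 0.5*np.sum(u**2)*dx
    t = 0.0
    while t < T:
        a = np.max(np.abs(u)) + 1e-12
        dt = 0.4*min(dx/a, dx*dx/(2*eps))   # CFL for advection and diffusion
        dt = min(dt, T - t)
        up = np.roll(u, -1)                 # u_{i+1} (periodic)
        F = 0.25*(u**2 + up**2) - 0.5*a*(up - u)   # Rusanov flux F_{i+1/2}
        u = (u - dt*(F - np.roll(F, 1))/dx
               + dt*eps*(up - 2*u + np.roll(u, 1))/dx**2)
        t += dt
    E = 0.5*np.sum(u**2)*dx
    return E0 - E

for eps in (5e-2, 2e-2, 1e-2, 5e-3):
    print(f"eps = {eps:.0e}:  energy dissipated by t = 2  ≈ {burgers_energy_loss(eps):.4f}")
```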
Notes

Note 4.1 (Topology of Function Spaces) In mathematical analysis, topology defines the framework for judging whether or not two elements of a set are close. Usually, the notion of distance is introduced to define the topology. Here, the distance dis(a, b) of a, b ∈ X is a map from X × X to R such that (1) dis(a, b) ≥ 0, and dis(a, b) = 0 is equivalent to a = b, (2) dis(a, b) = dis(b, a), and (3) dis(a, c) ≤ dis(a, b) + dis(b, c). A linear space that is endowed with a definition of distance is called a metric space. A norm ‖a − b‖ defines a distance (cf. Note 1.1). A metric space whose distance is defined by a norm is called a normed space. If it is complete, it is called a Banach space.
Axiomatically, defining a topology by a certain distance and defining a criterion of "convergence" are equivalent. For example, if we say that a sequence {e⁻ⁿ; n = 1, 2, ···} converges to 0, it means that the distance dis(e⁻ⁿ, 0) = |e⁻ⁿ − 0| becomes 0 in the limit n → ∞. The set of points whose distance from 0 is less than ε is called the ε-neighborhood of 0. Every point e⁻ⁿ with n > −log ε is included in the ε-neighborhood of 0. The judgment of inclusion/exclusion defines the system of difference, that is, the topology (see Sect. 4.2).
A finite-dimensional vector space is given the same topology independent of the specific definition of the distance, i.e., if a sequence {uₙ} satisfies limₙ→∞ dis(uₙ, u) = 0, then, by any other definition dis′ of the distance, limₙ→∞ dis′(uₙ, u) = 0. This is because the convergence uₙ → u is equivalent to the convergence, in R or C (the field of scalars), of every component of the vector uₙ. However, the topology of a function space (cf. Note 1.1) changes depending on the specific definition of the distance. The same sequence of functions may be tested to be either converging or not converging depending on how the distance of functions is measured. In this sense, there are scale hierarchies of function spaces. Here we give simple examples.
Let us consider complex-valued functions defined on (a, b) ⊂ R. We measure the distance of two functions f(x) and g(x) by a norm (cf. (1.48))

$$\|f(x)-g(x)\|_{L^2} = \left(\int_a^b |f(x)-g(x)|^2\,dx\right)^{1/2}. \qquad(4.63)$$
We denote by L²(a, b) the function space whose topology is defined by this distance. When derivatives of functions are the matter of concern, we may consider a different distance (norm) that takes the derivatives into account, too:

$$\|f(x)-g(x)\|_{H^n} = \sum_{j=0}^{n}\left\|\frac{d^j[f(x)-g(x)]}{dx^j}\right\|_{L^2} \qquad(n = 1, 2, \cdots), \qquad(4.64)$$

where we put d⁰/dx⁰ = 1. The function space whose topology is defined by the norm (4.64) is called the Sobolev space, which is denoted by Hⁿ(a, b). We may identify H⁰(a, b) = L²(a, b). It is evident from the definition that a convergent sequence in Hⁿ(a, b) is convergent in Hᵐ(a, b) if m < n. However, the converse is not necessarily true. Let us see an example. We put (a, b) = (0, 1) and

$$u_m(x) = \sum_{n=1}^{m} a_n\varphi_n(x), \qquad \varphi_n(x) = \sqrt{2}\sin(k_n x), \quad a_n = 2\sqrt{2}\,k_n^{-1}, \quad k_n = (2n-1)\pi. \qquad(4.65)$$
Setting u(x) ≡ 1 (x ∈ (0, 1)), we measure the distance between u_m(x) and u(x). We observe

$$\|u_m - u\|_{L^2} = \left[\int_0^1\left(\sum_{n=1}^{m} 4k_n^{-1}\sin(k_n x) - 1\right)^2 dx\right]^{1/2} = \left(1 - 8\sum_{n=1}^{m} k_n^{-2}\right)^{1/2},$$

thus limₘ→∞ ‖u_m − u‖_{L²} = 0, i.e., the sequence {u_m(x); m = 1, 2, ···} converges to u(x) in the space L²(0, 1). On the other hand, we observe
$$\|u_m - u\|_{H^1} = \|u_m - u\|_{L^2} + \left\|\frac{d(u_m - u)}{dx}\right\|_{L^2} = \left(1 - 8\sum_{n=1}^{m} k_n^{-2}\right)^{1/2} + (8m)^{1/2}.$$
The right-hand side diverges as m → ∞. Thus, the sequence {u_m(x)} does not converge in the space H¹(0, 1).
We note that (4.65) is the Fourier expansion of u(x). The foregoing calculation is an experiment to decompose the function u(x) into Fourier modes and reconstruct it by vector composition (in an infinite-dimensional linear space). The reconstruction (Fourier synthesis) may succeed or fail depending on the definition of the topology, the criterion of convergence. Such richness (or ambiguity) of the vision (topos) pertains to the infinity of the degrees of freedom, i.e., the infinite dimensionality of the state space. For the theory of topological vector spaces, the reader is referred to the textbooks of Lax [22] and Yosida [37].
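The contrast between the two topologies is easy to check numerically. The sketch below is our own construction (only the expansion (4.65) is taken from the text); it evaluates ‖u_m − u‖ in L² and in H¹, using the exact analytic derivative of the partial sums, and shows the former decaying while the latter grows like (8m)^{1/2}.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20001)     # fine grid on (0, 1)
u = np.ones_like(x)                  # target function u(x) = 1

def partial_sum(m):
    """Fourier partial sum (4.65) and its exact derivative."""
    um, dum = np.zeros_like(x), np.zeros_like(x)
    for n in range(1, m + 1):
        k = (2*n - 1)*np.pi
        um  += 4.0/k*np.sin(k*x)     # a_n * phi_n = 4 k_n^{-1} sin(k_n x)
        dum += 4.0*np.cos(k*x)
    return um, dum

def l2norm(f):
    return np.sqrt(np.trapz(f**2, x))

for m in (1, 10, 100, 400):
    um, dum = partial_sum(m)
    print(f"m = {m:4d}:  L2 distance = {l2norm(um - u):.4f},  "
          f"H1 distance ≈ {l2norm(um - u) + l2norm(dum):.2f},  "
          f"sqrt(8m) = {np.sqrt(8*m):.2f}")
```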
Note 4.2 (Simple Example of Renormalization—Correction of Scale) A singular perturbation dominates primarily the small-scale hierarchical level. However, it also influences larger-scale phenomena. Here we analyze the linear equation (4.23) as an example.¹⁸

¹⁸ This example is due to Oono [29].

To estimate the effect of the singular perturbation on a "larger" scale, we need to improve the approximation used in Sect. 4.4.2. We put

$$u(t) = u_0(t) + \varepsilon\tilde{u}(t), \qquad(4.66)$$

where u₀(t) = ae^{−iωt} is the approximate (unperturbed) solution given by (4.25). We assume that both ũ(t) and u₀(t) are of order unity; we are going to correct the unperturbed solution u₀(t) by adding a term of order ε to take the perturbation term into account. Plugging (4.66) into (4.23), we obtain, to order unity,

$$-i\frac{d\tilde{u}}{dt} + \omega\tilde{u} = \omega^2 u_0. \qquad(4.67)$$

Solving (4.67), we obtain

$$\tilde{u}(t) = e^{-i\omega t}\left(a' + i\omega^2\int_0^t e^{i\omega t'}u_0(t')\,dt'\right) = a'e^{-i\omega t} + i\omega^2 a\,t\,e^{-i\omega t}, \qquad(4.68)$$
where a′ is a constant corresponding to the initial value of ũ. As far as t is of order unity, this solution is valid. But when t becomes as large as ε⁻¹, the approximation breaks down; because |ũ(t)/u₀(t)| = |(a′/a) + iω²t|, the correction term εũ becomes larger than u₀, and then the higher-order terms omitted in (4.67) are no longer negligible. A correction term that grows with t is called a secular perturbation. We have to remove the difficulty of the divergence of the secular term.
Because the solution (4.68) is valid for a sufficiently short time, we try to extend the time of validity by connecting short-period solutions with renewed initial conditions; given a corrected solution u(t) up to time T, we solve (4.67) for t > T with the initial condition u(T). Putting s = t − T (redefined time) and u(T) = A(T)e^{−iωT} (renewed initial condition), the unperturbed solution is written as u₀(T + s) = A(T)e^{−iω(T+s)} = A(T)e^{−iωt}. Correcting this by the perturbation, we obtain

$$u(T+s) = A(T)e^{-i\omega t} + \varepsilon i\omega^2 A(T)\,s\,e^{-i\omega t} = A(T)e^{-i\omega t} + \varepsilon i\omega^2 A(T)(t-T)e^{-i\omega t}. \qquad(4.69)$$

For an arbitrary t, we put t = T + (t − T) = T + s (with an appropriate T) to make s < 1. Then the representation (4.69) does not break down. Yet we have to determine the unknown function A(T). The equation to determine A(T) is derived through the following consideration. Since T is an artificial parameter introduced for the convenience of calculations, it must not remain in the final expression of the solution. Hence we demand ∂u/∂T = 0; plugging (4.69) into this equation, and putting t = T, we obtain

$$\frac{dA}{dT} - \varepsilon i\omega^2 A = 0.$$

Solving this equation, we obtain A(T) = A₀e^{iεω²T}. The value of the corrected solution at time T is, then, A(T)e^{−iωT}. The corrected solution (4.69) now reads as

$$u(t) = A_0 e^{-i\omega(1-\varepsilon\omega)t}. \qquad(4.70)$$

Comparing (4.70) with the unperturbed solution u₀(t) = ae^{−iωt}, we find that the time constant is corrected as ω(1 − εω), a slight modification which may be noticed when we observe the event on the long time scale t > ε⁻¹.
Because we have the exact solution (4.28) of this simple problem, we may compare it with (4.70) to examine how the foregoing perturbation method improves the approximation. The approximation to order ε² of the Taylor expansion of ελ₊ gives λ₊ ≈ ω(1 − εω), which agrees with the time constant of (4.70); recall that u₀(t) has the time constant that is the approximation to order ε of ελ₊.
The present calculation is a simple example of the so-called renormalization. The divergence of the secular perturbation is removed by renormalizing the time constant, the unit defining the scale.
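The original equation (4.23) and its exact solution (4.28) appear earlier in the chapter and are not restated here; as a stand-in we may use the toy equation −i du/dt + ωu = εω²u, which reproduces (4.67) at order ε and has the exact solution u = a e^{−iω(1−εω)t}. The sketch below (entirely our own construction, under that assumption) compares the naive perturbative solution u₀ + εũ, which carries the secular term, with the renormalized solution (4.70).

```python
import numpy as np

omega, eps, a = 1.0, 0.05, 1.0           # assumed parameters
t = np.linspace(0.0, 3.0/eps, 2000)      # observe out to t ~ O(1/eps)

exact  = a*np.exp(-1j*omega*(1 - eps*omega)*t)           # toy model, exact
naive  = a*np.exp(-1j*omega*t)*(1 + 1j*eps*omega**2*t)   # u0 + eps*u~, secular
renorm = a*np.exp(-1j*omega*(1 - eps*omega)*t)           # renormalized (4.70)

for tt in (1.0, 1.0/eps, 3.0/eps):
    i = np.argmin(np.abs(t - tt))
    print(f"t = {t[i]:6.1f}: |naive - exact| = {abs(naive[i]-exact[i]):.3f},  "
          f"|renorm - exact| = {abs(renorm[i]-exact[i]):.1e}")
```

For this toy model the renormalized formula is exact—the secular term is nothing but the first Taylor term of the frequency shift—while the naive expansion drifts off once εω²t becomes of order unity.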
Note 4.3 (Colliding Characteristic Curves and Shocks) The mechanism of the creation of the shock singularity in the nonlinear model (4.17) of fluid motion is elucidated by the method of the characteristic ODE (see Note 2.2). The nonlinear PDE (4.17) implies that the function u(x, t) is constant along the orbit of each "particle" that moves with the velocity u(x, t) (see Sect. 2.4.3). The orbit (characteristic curve) is determined by the equation of motion (characteristic ODE)

$$\frac{dx}{dt} = u. \qquad(4.71)$$

The initial condition x(0) = x̂ specifies the start point of each "particle." Although u on the right-hand side of (4.71) is an unknown variable, it is determined, by (4.17), to be simply constant along the orbit. Hence the solution of (4.71) is given by x(t) = x̂ + u₀(x̂)t. Given the initial distribution (4.21), we obtain

$$\hat{x} = \frac{x}{1+at} \qquad(-1 < \hat{x} < 1). \qquad(4.72)$$
Since u is constant along the orbit, we have u(x, t) = u(x̂, 0). Plugging (4.72) into this relation, we obtain the solution (4.22).
If a < 0, the orbits determined by the equation of motion (4.71) collide at x = 0, t = −1/a. On each orbit, the value of u is conserved; thus u(x, t) becomes multivalued when different orbits collide—this is the mechanism of the creation of the shock singularity.
Because the foregoing example is a problem in one-dimensional space, we could find the solution of the nonlinear PDE rather easily. In higher-dimensional space, much more complicated phenomena occur, and their behavior has not yet been well understood. What is still very difficult to understand is the creation and evolution of vortexes. As we have seen, the creation of the shock singularity is understood as the collision of characteristic curves (the compressing motion of the fluid). However, the mechanism of the creation of a vortex singularity is different from this process; the "shearing motion" brings about a scale reduction that may possibly yield a singularity in an incompressible fluid. When the space dimension is two, it is known that singularities cannot be created within a finite time (see Kato [17] for ideal flow with boundary, and McGrath [27] for ideal and viscous flow without boundary). However, the three-dimensional problem is unsettled (see, for example, Ladyzhenskaya [19] and Temam [32, 33]). For further studies on shocks and other nonlinear wave phenomena, the reader is referred to Courant & Friedrichs [5] and Jeffrey & Taniuti [16].
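The collision of characteristics is easy to reproduce. Consistent with (4.72), we take the linear initial profile u₀(x̂) = ax̂ on (−1, 1) (an assumption of ours; the chapter's (4.21) is not restated here) and follow the straight orbits x(t) = x̂ + u₀(x̂)t; for a < 0, the first crossing time of neighboring orbits comes out as t = −1/a.

```python
import numpy as np

a = -0.5                                  # compressive case (a < 0)
xhat = np.linspace(-1.0, 1.0, 201)        # launch points of the "particles"
u0 = a*xhat                               # assumed initial profile u0(x) = a*x

# Neighboring characteristics x = xhat + u0(xhat)*t cross when
# (xhat2 - xhat1) + (u0(xhat2) - u0(xhat1))*t = 0.
dxh = np.diff(xhat)
du0 = np.diff(u0)
t_cross = np.where(du0 < 0, -dxh/du0, np.inf)   # only converging pairs collide
print("first collision at t =", t_cross.min(), " (theory: -1/a =", -1.0/a, ")")
```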
Note 4.4 (Rankine–Hugoniot Relation and Entropy Condition) Let us derive the condition for a shock-like localized structure (see Fig. 4.6) to be sustained against the smoothing effect of viscosity. First we rewrite (4.44) as

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}f(u) = \varepsilon\frac{\partial^2 u}{\partial x^2}, \qquad(4.73)$$

where f(u) = u²/2. We focus on a solution that propagates, keeping its shape, at a constant speed V, so we put

$$u_\varepsilon(x,t) = \varphi((x-Vt)/\varepsilon). \qquad(4.74)$$

We assume (see Fig. 4.8)

$$\varphi(+\infty) = u_+, \qquad \varphi(-\infty) = u_-, \qquad(4.75)$$

where u₊ and u₋ are constants. In the limit ε → 0, u_ε tends to a function with a jump u₊ − u₋ at x = Vt.

Fig. 4.8 Propagating shock solution: (a) This profile violates the entropy condition. The shock cannot be sustained and an expansion wave is generated. (b) This profile satisfies the entropy condition. The shock can propagate (keeping its shape) at the speed given by the Rankine–Hugoniot relation

Plugging (4.74) into (4.73), we obtain

$$-V\varphi' + f(\varphi)' = \varphi'', \qquad(4.76)$$

where ′ denotes differentiation. Integrating (4.76), we obtain

$$A - V\varphi + f(\varphi) = \varphi', \qquad(4.77)$$

where A is the integration constant. By (4.75), we find

$$A = Vu_+ - f(u_+) = Vu_- - f(u_-). \qquad(4.78)$$

We may rewrite (4.78) as

$$V(u_+ - u_-) = f(u_+) - f(u_-), \qquad(4.79)$$
which is called the Rankine–Hugoniot relation. By (4.79), we can relate the jump of the shock to the propagation speed.
As has been mentioned (in heuristic words) in Sect. 4.4.5, a finite ε destroys the standing-shock solution and converts it into an expansion wave when u₊ = a > u₋ = −a and V = 0. Let us prove this fact more formally. We shall also show that the shock is sustained when u₋ > u₊.
First we assume that u₊ > u₋. To connect the up- and down-stream regions smoothly, as the diffusion equation demands, we need φ′ > 0 (see Fig. 4.8). Using this condition in (4.77) and eliminating A by (4.78), we obtain

$$V = \frac{f(u_+)-f(u_-)}{u_+-u_-} < \frac{f(u)-f(u_-)}{u-u_-} \qquad(u_- \le \forall u \le u_+). \qquad(4.80)$$

When u₊ < u₋, the same argument (now demanding φ′ < 0) yields

$$\frac{f(u_+)-f(u)}{u_+-u} < V = \frac{f(u_+)-f(u_-)}{u_+-u_-} \qquad(u_+ \le \forall u \le u_-). \qquad(4.81)$$

Relation (4.80) or (4.81) is called the entropy condition.
Let us examine the entropy condition for the Riemann problem that we discussed in Sect. 4.4.5. The standing-shock solution (4.58) is the limit ε → 0 of (4.74) with u₊ = a, u₋ = −a, and V = 0. If a > 0, (4.80) must hold, which means that the graph of f(u) must be above the line connecting f(u₋) and f(u₊). But f(u) = u²/2 is a convex function, so this condition cannot be satisfied (while the Rankine–Hugoniot condition is satisfied). On the other hand, if a < 0, the entropy condition (4.81) is satisfied by the standing-shock solution. Thus, the narrow structure is sustained (cf. the analytic solution (4.45) given in Sect. 4.4.4). The entropy condition is generalized to more complicated relations in general shock problems (see Jeffrey & Taniuti [16]).
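A quick numerical experiment makes the Rankine–Hugoniot relation concrete. For f(u) = u²/2 the predicted speed is V = (u₊ + u₋)/2. The sketch below is our own construction (scheme, parameters, and boundary treatment are all assumptions); it evolves (4.73) from an entropy-satisfying step u₋ = 1.5 > u₊ = 0.5 and estimates the front speed by tracking the mid-level of the jump.

```python
import numpy as np

eps, N, L, T = 0.01, 2000, 20.0, 6.0
x = np.linspace(-L/2, L/2, N)
dx = x[1] - x[0]
um, up = 1.5, 0.5                          # u- > u+: entropy-satisfying shock
u = np.where(x < 0, um, up)

def front(u):
    """Position where u crosses (u- + u+)/2, by linear interpolation."""
    level = 0.5*(um + up)
    i = np.argmax(u < level)
    return x[i-1] + dx*(u[i-1] - level)/(u[i-1] - u[i])

t, times, positions = 0.0, [], []
while t < T:
    a = np.max(np.abs(u))
    dt = 0.4*min(dx/a, dx*dx/(2*eps))
    upl = np.roll(u, -1); upl[-1] = up     # crude outflow/inflow edges
    umn = np.roll(u, 1);  umn[0] = um
    F = 0.25*(u**2 + upl**2) - 0.5*a*(upl - u)     # Rusanov flux F_{i+1/2}
    Fm = np.roll(F, 1); Fm[0] = 0.5*um**2
    u = u - dt*(F - Fm)/dx + dt*eps*(upl - 2*u + umn)/dx**2
    t += dt
    times.append(t); positions.append(front(u))

V = np.polyfit(times[len(times)//2:], positions[len(positions)//2:], 1)[0]
print(f"measured speed ≈ {V:.3f};  Rankine–Hugoniot (u+ + u-)/2 = {(um+up)/2:.3f}")
```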
Note 4.5 (Weak Solution) It is often convenient to extend the notion of the solution of a differential equation; a weak solution is a function which may include singularities, so it is not differentiable in the conventional sense, but it satisfies a certain relation that is some integral form of the differential equation.
We start with the example of the nonlinear differential equation (4.17) which creates shock singularities. We first "rewrite" it as (cf. (4.73))

$$\frac{\partial u}{\partial t} + \frac{1}{2}\frac{\partial}{\partial x}u^2 = 0. \qquad(4.82)$$

Here we have assumed that we can calculate ∂u²/∂x = 2u∂u/∂x. A singular (discontinuous) function (modeling a standing shock)
$$u(x,t) = \begin{cases} -a & (x \le 0), \\ +a & (0 < x) \end{cases} \qquad(4.83)$$

satisfies (4.82), because u(x, t)² ≡ a². When a < 0, we have the contracting solution (4.22), which approaches the singular function (4.83) as t → −1/a, and we may connect (4.22) to (4.83) for t ≥ −1/a. Of course, such a singular function does not satisfy (4.17), but it satisfies the "rewritten" equation (4.82). Conversely, when a > 0, we may connect (4.83) to the expansion solution (4.22) at an arbitrary time t₀ (with t replaced by t − t₀), which also satisfies (4.82). We may consider a still more irregular solution of (4.82); given an arbitrary series of intervals I₀ = (−∞, x₁], I₁ = (x₁, x₂], ···, Iₘ = (xₘ, +∞) of R (m is some natural number), we put

$$u(x,t) = (-1)^n a \qquad(x \in I_n,\; n = 0, 1, \cdots, m). \qquad(4.84)$$
This singular function is also a solution of (4.82). Thus, the "rewritten" equation (4.82) cannot determine a unique solution.
Equation (4.82) is called the weak form of the differential equation (4.17). If u(x, t) is a solution of (4.17), it is "differentiable" with respect to x, so we may calculate 2u∂u/∂x = ∂u²/∂x. Hence u(x, t) satisfies the weak form (4.82), too. However, the converse is not always true. The singular solution (4.83) or (4.84) of the weak form (4.82) is not differentiable at the discontinuous points; thus we cannot calculate ∂u²/∂x = 2u∂u/∂x. Yet, we may give the derivatives of such singular functions a new meaning by the theory of distributions; the solutions of a weak-form equation—we call them weak solutions—are generalized functions (distributions) which are "differentiable" in a generalized sense.¹⁹
Let us give the formal definition of the weak form and the weak solution. There is some variety of "weakness" depending on the definition. Before general arguments, we continue to discuss the example (4.17). The weak form of the initial-value problem (with an initial condition u(x, 0) = u₀(x)) is the problem of finding u(x, t) that satisfies

$$\int_0^{+\infty}\!\!\int_{-\infty}^{+\infty}\left(u\frac{\partial\varphi}{\partial t} + \frac{u^2}{2}\frac{\partial\varphi}{\partial x}\right)dx\,dt + \int_{-\infty}^{+\infty}u_0(x)\varphi(x,0)\,dx = 0 \qquad(4.85)$$

for every "smooth" function φ(x, t). The support of φ(x, t) (the set of points (x, t) where φ(x, t) ≠ 0) is assumed to be a compact set, i.e., φ(x, t) becomes zero as t → +∞ or x → ±∞.
¹⁹ In Sect. 4.4.5, we introduced the notion of "class-1 singular perturbation" to distinguish the case where we may give some generalized meaning to the singular solutions. We note that the aforementioned weak solutions satisfy the Rankine–Hugoniot relation (4.79), which is derived from the weak form of (4.73). The singular perturbation yields the entropy condition (4.80) or (4.81), which determines the unique solution that is physically plausible (i.e., amenable to the singular perturbation).
The degree of "smoothness" of φ(x, t) determines the "weakness" of the solution. In the weak form (4.85), u(x, t) is not differentiated, so it is not required to be a smooth function. On the other hand, the function φ(x, t) is differentiated, so it must be smooth. If u(x, t) is sufficiently smooth, we can integrate the left-hand side of (4.85) by parts to transform it as

$$\int\!\!\int\left(u\frac{\partial\varphi}{\partial t} + \frac{u^2}{2}\frac{\partial\varphi}{\partial x}\right)dx\,dt + \int u_0(x)\varphi(x,0)\,dx$$
$$= -\int\!\!\int\left(\frac{\partial u}{\partial t} + \frac{1}{2}\frac{\partial u^2}{\partial x}\right)\varphi\,dx\,dt - \int\left[u(x,0) - u_0(x)\right]\varphi(x,0)\,dx \qquad(4.86)$$
$$= -\int\!\!\int\left(\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x}\right)\varphi\,dx\,dt - \int\left[u(x,0) - u_0(x)\right]\varphi(x,0)\,dx. \qquad(4.87)$$
Let us look at (4.87): if this is zero for every φ(x, t), both integrands (the two parenthetic terms) must be zero, implying that the differential equation (4.17), as well as the initial condition, must be satisfied. In the intermediate expression (4.86), we find the "rewritten" form (4.82) of the equation. Along with the manipulations transforming (4.85) to (4.86) and further to (4.87), the requirement for the smoothness (regularity) of u(x, t) is strengthened. In (4.87), the "solution" recovers its original (classical) meaning.
Changing our viewpoint, we may consider that (4.85) is the "definition" of (4.87)—what is defined in (4.87) is the "derivatives" included there. As far as we can evaluate (4.85) (i.e., if φ(x, t) is sufficiently smooth), the formal derivatives included in the expression (4.87) can be defined even if u(x, t) is not differentiable in the classical sense; this is the so-called derivative in the distribution sense. A weak solution satisfies the determining differential equation with the derivatives interpreted in this wider (weaker) sense.
Finally, we give the general definition of the weak form. Let us consider an abstract functional equation

$$F(u) = 0, \qquad(4.88)$$

where u is a member of a function space X and F is a map from X to another function space Y. We denote the norm of Y by ‖·‖_Y. The equality (4.88) means that ‖F(u)‖_Y = 0. This condition may be weakened to formulate a weak form of (4.88). The set Y* of all continuous linear maps on Y is called the dual space of Y. We denote by ⟨φ, y⟩ the value of φ ∈ Y* at y ∈ Y. Giving a certain subset W ⊆ Y*, we define a weak form of (4.88) as

$$\langle\varphi, F(u)\rangle = 0 \qquad(\forall\varphi\in W). \qquad(4.89)$$
A smaller W imposes a weaker requirement; thus, the set of solutions becomes larger. Normally W is chosen to be a dense subset of Y*. Changing the smoothness criterion of φ, we may control the size of W. If we choose W = Y*, the weak form (4.89) is equivalent to the original (strong-form) equation (4.88), because y = 0 is equivalent to ⟨φ, y⟩ = 0 (∀φ ∈ Y*).²⁰
The following remarks should be mentioned. In an infinite-dimensional vector space Y, lim yⱼ = 0 is not equivalent to lim⟨φ, yⱼ⟩ = 0 (∀φ ∈ Y*); the latter is called weak convergence, while the former (conventional) one is called strong convergence. Weak convergence occurs under a weaker criterion in comparison with strong convergence. Therefore we have to be careful when we construct solutions through limits such as

$$\lim_{j\to\infty}\langle\varphi, F_j(u)\rangle = 0 \qquad(\forall\varphi\in Y^*), \qquad(4.90)$$

or

$$\lim_{j\to\infty}\langle\varphi, F(u_j)\rangle = 0 \qquad(\forall\varphi\in Y^*). \qquad(4.91)$$

²⁰ If Y is a Hilbert space, this is evident, because ⟨φ, y⟩ can be regarded as the inner product of φ ∈ Y and y (Riesz's representation theorem). If Y is a general Banach space, it is generally not easy to represent Y* explicitly, but this relation is deduced from the Hahn–Banach theorem (see, for example, Lax [22] and Yosida [37]).
These conditions are weaker than the strong convergence condition; thus, the solutions given by these processes are also called weak solutions.
For a careful study of the weak-solution representation of shocks (nonlinear hyperbolic PDEs), the reader is referred to Lax [21]. In the theory of the Navier–Stokes equation (a nonlinear parabolic PDE), the existence of weak solutions has been established by Leray [23, 24] and Hopf [13], but their relation to the classical (smooth) solutions is unknown. It is still unsettled whether or not the Navier–Stokes equation has a strong solution globally in time (cf. Note 4.6).
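The defining relation (4.85) can be tested directly by quadrature. The sketch below is our own construction: it takes the standing shock (4.83), a smooth rapidly decaying test function φ (a Gaussian, whose tails we treat as numerically compact support), and evaluates the left-hand side of (4.85) on a grid. The result is zero to quadrature accuracy, while a jump violating the Rankine–Hugoniot relation leaves a finite residual.

```python
import numpy as np

a = 1.0
x = np.linspace(-8.0, 8.0, 801)
t = np.linspace(0.0, 8.0, 801)
X, Tm = np.meshgrid(x, t, indexing="ij")

# Smooth test function with (numerically) compact support
phi   = np.exp(-X**2 - (Tm - 2.0)**2)
phi_t = -2.0*(Tm - 2.0)*phi
phi_x = -2.0*X*phi

def weak_residual(u_of_x):
    """Left-hand side of (4.85) for a time-independent profile u(x)."""
    U = u_of_x[:, None]*np.ones_like(Tm)
    integrand = U*phi_t + 0.5*U**2*phi_x
    inner = np.trapz(integrand, t, axis=1)        # integrate over t
    return np.trapz(inner, x) + np.trapz(u_of_x*phi[:, 0], x)

shock = np.where(x <= 0, -a, a)          # standing shock (4.83)
bad   = np.where(x <= 0, -a, 0.5*a)      # jump violating Rankine–Hugoniot
print("weak residual, shock (4.83):", f"{weak_residual(shock):.2e}")
print("weak residual, non-R-H jump:", f"{weak_residual(bad):.2e}")
```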
Note 4.6 (Fluid-Mechanics Equations) This Note is a short summary of a variety of fluid-mechanical equations, ranging from one-dimensional to higher-dimensional systems, from incompressible to compressible models, and from neutral to electromagnetic fluids. We also review basic mathematical theories related to fluid mechanics.
We start with a general transport equation in space–time:

$$\frac{\partial u}{\partial t} + v\cdot\nabla u = f, \qquad(4.92)$$

where u is a certain unknown variable, v is the fluid velocity (an n-dimensional vector field), and f is some (internal or external) driving or damping term. We have
encountered this type of equation in general mechanics theory (Chap. 2), i.e., the equation of collective motion (2.64) (where f = 0, so that u is a constant of motion).
(1) One-dimensional problems. In one-dimensional space, the flow v has only one component. In Chap. 4, we have considered a variety of examples: the linear wave equation (4.8) and the diffusion equation (4.13) are nonlinearized by setting v = u to yield, respectively, the nonlinear wave equation (4.17) and the nonlinear diffusion equation (4.44). In these examples f ∝ ∂²u/∂x² is the dissipation (diffusion or viscosity) term. The KdV equation (4.55) assumes f ∝ ∂³u/∂x³, which brings about a dispersion effect. If f is zero, (4.92) may be viewed as a Hamilton–Jacobi equation, which is integrated by the method of characteristics (see Note 2.2); if it is nonlinear (v = u), the characteristics collide to produce singularities (see Note 4.3).
(2) Incompressible two-dimensional flow. In two-dimensional space,²¹ an incompressible flow (∇·v = 0) may be represented in the form of a Hamiltonian flow:

$$v = \begin{pmatrix} \partial\psi/\partial y \\ -\partial\psi/\partial x \end{pmatrix}. \qquad(4.93)$$

²¹ A two-dimensional fluid model applies when the system has a distinct scale in a certain direction (for example, a thin depth) or an external force (for example, the Lorentz force or the Coriolis force) drives the fluid two-dimensionally (cf. Hasegawa [12]).
Then (4.92), with f = 0, reads as Liouville's equation (see (2.116)):

$$\frac{\partial u}{\partial t} + \{\psi, u\} = 0, \qquad(4.94)$$

where x and y are the canonical conjugates. The streamlines of v are given by the level-sets of the Hamiltonian ψ. As we shall see, the vorticity of an incompressible two-dimensional flow obeys (4.94). A finite f (for example, a viscosity) destroys the conservation of u.
(3) General two-dimensional flow and level-set dynamics. In a general compressible flow, streamlines can collide to produce singularities. Let us focus on the movement of the level-set (contour), on the x–y plane, of the function u(x, y, t). If u(x, y, t) has a "cliff" (jump discontinuity), its motion may be viewed as the movement of, for instance, the interface of phases. We may write v = V∇u/|∇u|, where V is a certain function representing the speed of the propagation of u. Then (4.92) may be written as

$$\frac{\partial u}{\partial t} + V|\nabla u| = 0. \qquad(4.95)$$
If V = V(x, y, t, ∇u), (4.95) is a Hamilton–Jacobi equation. Let us consider the motion of a surface that is written as y = φ(x, t). Introducing a function u(x, y, t) = φ(x, t) − y, the surface is given by the level-set u(x, y, t) = 0. Plugging this u(x, y, t) into (4.95), we obtain ∂φ/∂t + V√(1 + (∂φ/∂x)²) = 0.
If V includes only ∂φ/∂x, we may write V√(1 + (∂φ/∂x)²) = F(∂φ/∂x). Denoting U = ∂φ/∂x, we obtain

$$\frac{\partial U}{\partial t} + \frac{\partial}{\partial x}F(U) = 0. \qquad(4.96)$$
Comparing (4.96) with (4.73), we find that a singularity (a kink of the surface) is created by the nonlinearity. And a finite diffusion effect works as a singular perturbation that mollifies the singularity. For further studies on surface dynamics, the reader is referred to Giga [10].
(4) Multi-dimensional fluid dynamics. In the equation of motion (momentum equation) of a multi-dimensional fluid, the unknown variable u of (4.92) is the vector function v itself:

$$\frac{\partial v}{\partial t} + (v\cdot\nabla)v = -\nabla p, \qquad(4.97)$$

where p is the pressure (more precisely, the enthalpy under a barotropic relation). Here we are omitting the viscosity (εΔv) and the external force (F); adding them on the right-hand side, we obtain the Navier–Stokes equation (4.35). To confine the fluid in a domain Ω, we impose a boundary condition

$$n\cdot v = 0, \qquad(4.98)$$

where n is the unit normal vector onto the boundary ∂Ω.
In a compressible fluid, p must be determined by solving the mass conservation equation (u = ρ, the mass density, and f = −ρ∇·v in (4.92)) and relating p and ρ by an equation of state like p = p(ρ) (a barotropic relation). We may also consider an incompressible fluid (∇·v = 0). Then p is determined so that (4.92) does not violate the condition ∇·v = 0 (the formulation will be explained later). In one-dimensional geometry (cf. Sect. 4.4.3), the incompressibility condition (∂v/∂x = 0) trivializes the problem (allowing only homogeneous v). However, in higher dimensions, incompressible fluid mechanics is still of great interest, because a different type of inhomogeneity of the flow—the vorticity, characterized by the curl derivative (∇×v)—may develop.
(5) Vortex dynamics (two- and three-dimensional). The vortex dynamics in an incompressible fluid is described by the vortex equation that is derived by operating with curl on both sides of (4.97). In two-dimensional space, the vorticity is given by ∇×v = ∂v_y/∂x − ∂v_x/∂y, which we denote by ω. Using (4.93), we find that ω satisfies Liouville's equation (4.94). In three-dimensional space, we obtain

$$\frac{\partial\omega}{\partial t} - \nabla\times(v\times\omega) = 0, \qquad(4.99)$$

where ω = ∇×v is a three-dimensional vector. Rewriting (4.99) as
$$\frac{\partial\omega}{\partial t} + (v\cdot\nabla)\omega = (\omega\cdot\nabla)v, \qquad(4.100)$$
we may compare the vorticity equation with the general transport equation (4.92). The term on the right-hand side of (4.100) is called the vortex-tube stretching term, which works as a drive (or a damp) of the vorticity. We note that the two-dimensional vortex equation does not have such a term; in two-dimensional space, vortexes are "conserved." Two-dimensional fluids tend to self-organize ordered structures (see Fig. 4.4 and Hasegawa [12]). In three-dimensional space, however, vortexes are created (or annihilated) by the inhomogeneity of v, which makes three-dimensional flow strongly turbulent.
(6) The mathematical formulation of incompressible fluid dynamics. Let us show how the pressure p is determined in an incompressible flow. Calculating the divergence of both sides of (4.97), and using ∇·v = 0, we obtain

$$\Delta p = -\nabla\cdot[(v\cdot\nabla)v]. \qquad(4.101)$$

And multiplying both sides of (4.97) by n· and using the boundary condition (4.98), we obtain

$$n\cdot\nabla p = -n\cdot[(v\cdot\nabla)v]. \qquad(4.102)$$
As will be shown hereinafter, we can determine v(x, t) without calculating p, i.e., we can eliminate p from the governing equation (4.97). Thus we may solve Poisson's equation (4.101) for p with the right-hand side, as well as the boundary condition (4.102), given as known functions.
Let us eliminate the term ∇p from (4.97). Generalizing (1.50) for vector functions (here we consider real-valued functions), we define the inner product by

$$(u, v) = \int_\Omega u(x)\cdot v(x)\,dx.$$

We denote the corresponding Hilbert space of vector functions also by L²(Ω). For every v such that ∇·v = 0 (in Ω) and n·v = 0 (on ∂Ω) and every p, we find (assuming that v and p are sufficiently smooth)

$$(v, -\nabla p) = \int_\Omega(\nabla\cdot v)\,p\,dx - \int_{\partial\Omega}(n\cdot v)\,p\,ds = 0, \qquad(4.103)$$

where ds is the surface element on ∂Ω. Let us define L²_g(Ω) = {−∇p; p ∈ H¹(Ω)}, which is a closed subspace of L²(Ω). The orthogonal complement of L²_g(Ω) in L²(Ω) is shown to be [31, 32]

$$L^2_\sigma(\Omega) = \{v \in L^2(\Omega);\ \nabla\cdot v = 0 \text{ in } \Omega,\ n\cdot v = 0 \text{ on } \partial\Omega\},$$
i.e., we have the orthogonal decomposition

$$L^2(\Omega) = L^2_\sigma(\Omega)\oplus L^2_g(\Omega). \qquad(4.104)$$

Let us denote by P_σ the orthogonal projection onto the subspace L²_σ(Ω). By (4.104), we may write P_σu = u + ∇φ (∀u, ∃φ). Applying this P_σ to both sides of (4.97), we obtain

$$\frac{\partial v}{\partial t} = -P_\sigma(v\cdot\nabla)v, \qquad(4.105)$$

which reads as an evolution equation in the Hilbert space L²_σ(Ω) [1, 9, 19, 32].
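On a periodic box the projection P_σ has an explicit Fourier representation: for each wave vector k ≠ 0, the projected coefficient is v̂(k) − k(k·v̂(k))/|k|². The sketch below is our own construction for the periodic, boundary-free case (which avoids the boundary condition (4.98)); it projects a random two-dimensional field and verifies that the result is divergence-free to round-off.

```python
import numpy as np

N = 64
k1d = 2*np.pi*np.fft.fftfreq(N, d=1.0/N)        # wavenumbers of a unit box
kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
k2 = kx**2 + ky**2
k2[0, 0] = 1.0                                   # avoid 0/0; k = 0 mode untouched

rng = np.random.default_rng(0)
vx, vy = rng.standard_normal((2, N, N))          # arbitrary (compressible) field

vxh, vyh = np.fft.fft2(vx), np.fft.fft2(vy)
kdotv = kx*vxh + ky*vyh
wxh = vxh - kx*kdotv/k2                          # P_sigma in Fourier space
wyh = vyh - ky*kdotv/k2
wx, wy = np.fft.ifft2(wxh).real, np.fft.ifft2(wyh).real

div = np.fft.ifft2(1j*(kx*np.fft.fft2(wx) + ky*np.fft.fft2(wy))).real
print("max |div v| before:", np.abs(np.fft.ifft2(1j*kdotv).real).max())
print("max |div v| after :", np.abs(div).max())
```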
(7) Fourier-space dynamics and invariant measure. Remembering Problem 3.4, we may transform the two-dimensional vorticity equation (4.94) into an infinite-dimensional system of canonical equations by Fourier decomposition. The state variables are then the real and imaginary parts of the Fourier coefficients, which span the function space of the vorticity ω.
In the three-dimensional vortex dynamics, the vorticity ω is a member of L²_σ(Ω). To parameterize ω, we invoke the Beltrami fields, which are the eigenfunctions of the curl operator [4, 35]: we consider the eigenvalue problem

$$\nabla\times\varphi = \lambda\varphi, \qquad(4.106)$$

with the boundary condition n·φ = 0. By (4.106), it is obvious that ∇·φ = 0. Separating the kernel of the curl operator (which, under ∇·φ = 0, constitutes the cohomology on Ω), we can define a self-adjoint curl operator, whose eigenfunctions φⱼ (together with the cohomology h) span the function space L²_σ(Ω) (see Yoshida & Giga [35]). So we may write ω(x, t) = Σⱼ cⱼ(t)φⱼ(x) + c_h h(x). Using this parameterization, (4.99) reads as an infinite-dimensional system of ODEs:

$$\frac{dc_j}{dt} = (\nabla\times(v\times\omega), \varphi_j) = (v\times\omega, \nabla\times\varphi_j) = \lambda_j(v\times\omega, \varphi_j) = \lambda_j\left(\sum_k c_k(v\times\varphi_k, \varphi_j) + c_h(v\times h, \varphi_j)\right).$$

By (v×φⱼ)·φⱼ ≡ 0, we find that ∂(dcⱼ/dt)/∂cⱼ = 0 (∀j), which means that the "flow" in the infinite-dimensional state space is incompressible. Thus, ∏ⱼ dcⱼ is an invariant measure [15, 18, 28, 34].
(8) Matter–field couplings. We have hitherto considered "neutral fluids" where the unknown variables are only the fluid's parameters (i.e., the velocity, the density, and the pressure). In a plasma, however, the matter's motion couples with the electromagnetic field, so we have to analyze the dynamics of the matter and the field together. In Sect. 4.4.4, we studied the model of a one-dimensional electrostatic plasma. A more general model is formulated by extending the flow v to the "canonical momentum" P = mv + qA (m is the mass of a particle, q is the charge, and A
is the vector potential of the magnetic field). A usual plasma consists of negatively charged electrons as well as positively charged ions; then the motions of both fluids and the electromagnetic field constitute a nonlinear coupled system [11, 20, 25, 36].
Problems

4.1. See what happens if we apply the following normalizations in (4.7) and (4.8):
1. Set L/T ≫ |V|.
2. Set L/T ≪ |V|.

4.2. Let us derive the Kolmogorov spectrum (4.41) by dimensional analysis. We start by reviewing the general method of dimensional analysis. Let us denote by μ₁, μ₂, ···, μₘ the dimensions of basic physical quantities (like time, length, mass). We consider n (> m) quantities which are written as

$$a_1 = c_1\mu_1^{p_{1,1}}\mu_2^{p_{2,1}}\cdots\mu_m^{p_{m,1}}, \quad\cdots,\quad a_n = c_n\mu_1^{p_{1,n}}\mu_2^{p_{2,n}}\cdots\mu_m^{p_{m,n}},$$

where c₁, ···, cₙ are dimensionless coefficients. A negative power may represent either a quotient or a differential (for example, we write dx/dt as [L]¹[T]⁻¹, denoting the dimensions of x and t by [L] and [T], respectively).
1. Since n > m, not all of a₁, ..., aₙ are dimensionally independent. Suppose that a product of powers of a₁, ···, aₘ has the same dimension as aₘ₊₁, i.e., we may write

$$a_1^{q_1}a_2^{q_2}\cdots a_m^{q_m} = \pi_{m+1}a_{m+1}, \qquad(4.107)$$

with a dimensionless parameter π_{m+1}. Formulate the relation between the exponents p_{i,j} and q_k (1 ≤ i, j, k ≤ m).
2. By (4.107) we can define a dimensionless quantity π_{m+1} = a₁^{q₁}a₂^{q₂}···aₘ^{qₘ}a_{m+1}⁻¹. How many dimensionless quantities can we derive from a₁, ···, aₙ? A relation among the dimensionless quantities π_{m+1}, ···, π_ν, F(π_{m+1}, ···, π_ν) = 0, is called a complete equation.
3. Let us consider a simple example. The mechanical law of a pendulum's motion involves the following physical quantities: the length ℓ of the string, the mass m of the weight, the period τ of the oscillation, the amplitude θ of the oscillation angle, and the gravitational acceleration g. The dimensions of these quantities consist of three basic dimensions: length [L], mass [M], and time [T]. Derive dimensionless quantities to formulate a complete equation.
4. From the complete equation, estimate the period τ as a function of the other quantities.
Now we apply dimensional analysis to the study of the turbulence energy spectrum. We assume that a model of turbulence can be described in terms of the following set of physical quantities: the wave number k, the energy spectrum function E(k), the energy injection (= damping) rate P, and the viscosity coefficient ε.
5. Show the dimensions of the aforementioned quantities.
6. Find dimensionless quantities, and formulate a complete equation.
7. Solving the complete equation for E, find its dependence on the aforementioned quantities.
8. In the inertial range, E is thought to be independent of the viscosity ε. Eliminating the dependence of E on ε, derive the Kolmogorov spectrum.

4.3.
1. Putting φ⁽¹⁾(ξ, τ) = V + u(ξ, τ) (V is a constant of order unity and |u| ≪ 1), linearize the KdV equation (4.55). Find a sinusoidal wave solution (such that u(ξ, τ) = u₀e^{i(kξ−ωτ)}) of the linearized equation. Show that the propagation speed of the sinusoidal wave changes as a function of the wave number k (this effect is called dispersion).
2. Putting φ⁽¹⁾(ξ, τ) = Ψ(ξ − cτ), solve the nonlinear KdV equation (4.55) to find the propagating solution (4.56) (such a solution is called a soliton).

4.4. For the nonlinear plasma wave (ion acoustic wave) equations (4.52), try a different scaling such that ξ = ε(x − t), τ = ε²t, and show how the equations reduce.

4.5. For the one-dimensional fluid equation (4.44), we define the energy dissipation (due to the viscosity) by

$$\int_{-\infty}^{+\infty}\varepsilon\left(\frac{\partial u}{\partial x}\right)^2 dx.$$
For the standing-shock solution (4.45) of (4.44), calculate the energy dissipation (entropy production) and show that it is independent of ε.
Solutions

4.1 If we choose scales such that L/T ≫ |V|, we obtain the normalized coefficient V̌ ≈ 0. Then we do not observe the temporal evolution of ǔ(x̌, ť) in the predicted range of time scale: the evolution equation (4.10) degenerates into the trivial equation ∂ǔ/∂ť ≈ 0. On the other hand, if we set L/T ≪ |V|, we miss the spatial variation of the event; (4.10) degenerates into ∂ǔ/∂x̌ ≈ 0.
4.2
1. Comparing the exponents, we obtain a linear equation

$$\begin{pmatrix} p_{1,1} & \cdots & p_{1,m} \\ \vdots & \ddots & \vdots \\ p_{m,1} & \cdots & p_{m,m} \end{pmatrix}\begin{pmatrix} q_1 \\ \vdots \\ q_m \end{pmatrix} = \begin{pmatrix} p_{1,m+1} \\ \vdots \\ p_{m,m+1} \end{pmatrix}. \qquad(4.108)$$

2. We define the matrix of exponents such that

$$M = \begin{pmatrix} p_{1,1} & \cdots & p_{1,n} \\ \vdots & \ddots & \vdots \\ p_{m,1} & \cdots & p_{m,n} \end{pmatrix}.$$

We denote the rank of M by r. From the construction of (4.108), it is evident that we can define n − r dimensionless products π_{m+1}, ···, π_{m+n−r}.
3. The matrix of exponents reads as

$$M = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & -2 \end{pmatrix},$$

whose rank is three. Thus, we can define two dimensionless quantities. The amplitude θ (an angle variable) is already dimensionless. The other one is π₅ = τg^{1/2}ℓ^{−1/2}.
4. Using the aforementioned dimensionless quantities, the complete equation is written as F(θ, π₅) = 0, by which we may define the period τ as an implicit function: τ = f(θ)(ℓ/g)^{1/2}. To determine f(θ)—the dependence of the period of oscillation on the amplitude—we need to solve the equation of motion (see Sect. 2.1). However, the dependence of the period on the other parameters is already revealed by the dimensional analysis.
5. We denote by [L] and [T] the dimensions of length and time, respectively. The wave number k has dimension [L]⁻¹, and the fluid velocity u has dimension [L][T]⁻¹. We define the energy of the fluid element of unit mass, E = ½∫ρ|u|²d³x/(ρ|Ω|), where ρ is the mass density and |Ω| is the volume of Ω, the domain of the fluid. By the Fourier transform f̂ = ∫f e^{ik·x}d³x/|Ω|, we may rewrite (assuming a homogeneous ρ) E = ½∫|v̂|²d³k·|Ω|. Assuming an isotropic distribution, we write d³k = 4πk²dk. The energy spectrum function is defined by

$$E = \frac{1}{2}\int_0^\infty |\hat{v}|^2\,4\pi k^2\,dk\,|\Omega| \equiv \frac{1}{2}\int_0^\infty E(k,t)\,dk.$$

By these definitions, v̂ has dimension
[L][T]⁻¹, E has dimension [L]²[T]⁻², and E(k) has dimension [L]³[T]⁻². The dimension of the energy injection rate P is [L]²[T]⁻³ (because it is the rate of change of E), and that of the kinematic viscosity ε is [L]²[T]⁻¹ (because εΔu has the same dimension as ∂u/∂t). In summary, we have [P] = [L]²[T]⁻³, [ε] = [L]²[T]⁻¹, [E(k)] = [L]³[T]⁻², [k] = [L]⁻¹.
6. From the foregoing dimensional relations, the matrix of exponents is

$$M = \begin{pmatrix} 2 & 2 & 3 & -1 \\ -3 & -1 & -2 & 0 \end{pmatrix}.$$

The rank of M is two, so we can compose two dimensionless quantities. First we match E with a combination of P and ε, i.e., we solve [P]^p[ε]^q = [E(k)], which reads as

$$\begin{pmatrix} 2 & 2 \\ -3 & -1 \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} 3 \\ -2 \end{pmatrix}.$$

Solving this equation yields p = 1/4 and q = 5/4. Thus, we obtain π₁ = E/(P^{1/4}ε^{5/4}). Similarly, solving [P]^p[ε]^q = [k], we obtain π₂ = k/(P^{1/4}ε^{−3/4}). Here the denominator, which has the dimension of a wave number, is Kolmogorov's wave number k_D; see (4.40). Thus, the complete equation reads as

$$F(\pi_1, \pi_2) = F\!\left(E/(P^{1/4}\varepsilon^{5/4}),\ k/k_D\right) = 0. \qquad(4.109)$$
7. The implicit function of (4.109) determines E as

$$E = P^{1/4}\varepsilon^{5/4}f(\pi_2) = P^{1/4}\varepsilon^{5/4}f(k/k_D). \qquad(4.110)$$

8. To eliminate ε from E of (4.110), we demand f(π₂) = π₂^{−5/3}. Then, (4.110) reads as

$$E = C_k P^{2/3}k^{-5/3}, \qquad(4.111)$$

where C_k is a dimensionless number.
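The exponent matching above is just linear algebra, so it can be delegated to a few lines of code. The following sketch (our own construction) solves the two 2 × 2 systems for (p, q) and then finds the exponent s in f(π₂) = π₂ˢ that removes ε from (4.110).

```python
import numpy as np

A = np.array([[2.0, 2.0],        # exponents of [L] in [P], [eps]
              [-3.0, -1.0]])     # exponents of [T] in [P], [eps]

p1 = np.linalg.solve(A, [3.0, -2.0])   # match [E(k)] = [L]^3 [T]^-2
p2 = np.linalg.solve(A, [-1.0, 0.0])   # match [k]    = [L]^-1
print("pi_1 exponents (p, q):", p1)    # -> [0.25, 1.25]
print("pi_2 exponents (p, q):", p2)    # -> [0.25, -0.75]

# E = P^(1/4) eps^(5/4) (k/k_D)^s with k_D = P^(1/4) eps^(-3/4);
# the eps exponent is 5/4 + s*(3/4) = 0  =>  s = -5/3.
s = -(5.0/4.0)/(3.0/4.0)
print("inertial-range exponent s =", s)   # -> -1.666... = -5/3
```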
4.3
1. Neglecting the higher-order terms, we obtain a linear equation

$$\frac{\partial u}{\partial\tau} + V\frac{\partial u}{\partial\xi} + \frac{1}{2}\frac{\partial^3 u}{\partial\xi^3} = 0. \qquad(4.112)$$
Plugging u(ξ, τ) = u₀e^{i(kξ−ωτ)} into (4.112) yields (−iω + iVk − ik³/2)u = 0. For this to be satisfied, we demand ω = Vk − k³/2. The propagation speed (phase velocity) of the sinusoidal wave is given by ω/k = V − k²/2, which varies as a function of k.
2. Plugging φ⁽¹⁾(ξ, τ) = Ψ(ξ − cτ) into (4.55), we obtain

$$-c\frac{\partial\Psi}{\partial\xi} + \Psi\frac{\partial\Psi}{\partial\xi} + \frac{1}{2}\frac{\partial^3\Psi}{\partial\xi^3} = 0.$$

Integrating yields

$$\frac{d^2\Psi}{d\xi^2} = -U'(\Psi), \qquad U(\Psi) = -a\Psi - c\Psi^2 + \frac{1}{3}\Psi^3, \qquad(4.113)$$

where a is the integration constant. This ODE is integrable by invoking the method of Sect. 2.2.2. Multiplying both sides of (4.113) by dΨ/dξ, we obtain

$$\frac{1}{2}\frac{d}{d\xi}\left(\frac{d\Psi}{d\xi}\right)^2 = -\frac{d}{d\xi}U(\Psi).$$

Integrating, we obtain

$$\frac{d\Psi}{\sqrt{a' - U(\Psi)}} = \sqrt{2}\,d\xi.$$

Since U(Ψ) is a third-order polynomial, the integral on the left-hand side yields an elliptic integral. The simplest solution is (putting a = a′ = 0)

$$\Psi = 3c\,\mathrm{sech}^2\!\left(\sqrt{c/2}\,(\xi - \xi_0)\right),$$

which is a single-pulse wave, a "soliton."
∂ ∂ =ε . ∂x ∂ξ
Using these relations and (4.52) in (4.49), (4.50), and (4.51), we obtain to the lowest order: n (1) = u (1) = φ (1) , and to the next order: ∂ (1) ∂ φ + φ (1) φ (1) = 0. ∂τ ∂ξ
Comparing with (4.55), we find that the dispersion term is lost. 4.5 For u(x) = −2tanh(x/ε), we observe
+∞
−∞
∂u ε ∂x
2
dx =
+∞ −∞
4 16 sech4 (x/ε)d x = . ε 3
References

1. Berger, M.: Nonlinearity and Functional Analysis—Lectures on Nonlinear Problems in Mathematical Analysis, Academic Press, New York (1977)
2. Biskamp, D., Welter, H.: Dynamics of decaying two-dimensional magnetohydrodynamic turbulence, Phys. Fluids B 1, 1964–1979 (1989)
3. Drazin, P.G., Johnson, R.S.: Solitons—An Introduction, Cambridge Univ. Press, Cambridge (1989)
4. Chandrasekhar, S., Kendall, P.C.: On force-free magnetic fields, Astrophys. J. 126, 457–460 (1957)
5. Courant, R., Friedrichs, K.O.: Supersonic Flow and Shock Waves, Interscience, New York (1948)
6. Deleuze, G., Guattari, F.: Mille Plateaux—Capitalisme et Schizophrénie, Minuit, Paris (1980)
7. Foias, C., Manley, O., Rosa, R., Temam, R.: Navier–Stokes Equations and Turbulence, Cambridge Univ. Press, Cambridge (2001)
8. Frisch, U.: Turbulence—The Legacy of A.N. Kolmogorov, Cambridge Univ. Press, Cambridge (1995)
9. Fujita, H., Kato, T.: On the Navier–Stokes initial value problem I, Arch. Rational Mech. Anal. 16, 269–315 (1964)
10. Giga, Y.: Surface Evolution Equations—A Level Set Approach, Birkhäuser, Basel–Boston–Berlin (2006)
11. Grad, H.: Mathematical problems arising in plasma physics, In: Proc. Internat. Congress Math., Nice, 1970, pp. 105–113, Gauthier-Villars (1971)
12. Hasegawa, A.: Self-organization processes in continuous media, Adv. Phys. 34, 1–42 (1985)
13. Hopf, E.: Über die Anfangswertaufgabe für die hydrodynamischen Grundgleichungen, Math. Nachr. 4, 213–231 (1951)
14. Infeld, E., Rowlands, G.: Nonlinear Waves, Solitons and Chaos (2nd ed.), Cambridge Univ. Press, Cambridge (2000)
15. Ito, N., Yoshida, Z.: Statistical mechanics of magnetohydrodynamics, Phys. Rev. E 53, 5200–5206 (1996)
16. Jeffrey, A., Taniuti, T.: Non-Linear Wave Propagation with Applications to Physics and Magnetohydrodynamics, Academic Press, New York (1964)
17. Kato, T.: On classical solutions of the two-dimensional non-stationary Euler equation, Arch. Rational Mech. Anal. 25, 188–200 (1967)
18. Kraichnan, R.H.: Irreversible statistical mechanics of incompressible hydromagnetic turbulence, Phys. Rev. 109, 1407–1422 (1958)
19. Ladyzhenskaya, O.A.: The Mathematical Theory of Viscous Incompressible Flow, Gordon & Breach, New York (1969)
20. Ladyzhenskaya, O.A., Solonnikov, V.A.: The linearization principle and invariant manifolds for problems of magnetohydrodynamics, J. Soviet Math. 8, 384–422 (1977)
21. Lax, P.D.: Hyperbolic systems of conservation laws II, Comm. Pure Appl. Math. X, 537–566 (1957)
22. Lax, P.D.: Functional Analysis, John Wiley & Sons, New York (2002)
23. Leray, J.: Étude de diverses équations intégrales non linéaires et quelques problèmes que pose l'hydrodynamique, J. Math. Pures Appl. 12, 1–82 (1933)
24. Leray, J.: Sur le mouvement d'un liquide visqueux emplissant l'espace, Acta Math. 63, 193–248 (1934)
25. Mahajan, S.M., Yoshida, Z.: Double curl Beltrami flow—diamagnetic structures, Phys. Rev. Lett. 81, 4863–4866 (1998)
26. Mandelbrot, B.B.: The Fractal Geometry of Nature (updated and augmented), W.H. Freeman and Company, New York (1983)
27. McGrath, F.J.: Nonstationary plane flow of viscous and ideal fluids, Arch. Rational Mech. Anal. 27, 329–355 (1968)
28. Montgomery, D., Turner, L., Vahala, G.: Three-dimensional magnetohydrodynamic turbulence in cylindrical geometry, Phys. Fluids 21, 757–764 (1978)
29. Oono, Y.: Nonlinearity and renormalization, Butsuri 52, 501–507 (1997) [in Japanese]
30. Peitgen, H.-O., Jürgens, H., Saupe, D.: Chaos and Fractals—New Frontiers of Science, Springer-Verlag, New York (1992)
31. Schwarz, G.: Hodge Decomposition—A Method for Solving Boundary Value Problems, Springer-Verlag, Berlin–Heidelberg (1995)
32. Temam, R.: Navier–Stokes Equations, North-Holland, Amsterdam (1984)
33. Temam, R.: Navier–Stokes Equations and Nonlinear Functional Analysis (2nd ed.), SIAM, Philadelphia (1995)
34. Turner, L.: Statistical magnetohydrodynamics of a reversed-field pinch, Phys. Rev. A 24, 2839–2842 (1981)
35. Yoshida, Z., Giga, Y.: Remarks on spectra of operator rot, Math. Z. 204, 235–245 (1990)
36. Yoshida, Z., Mahajan, S.M.: Simultaneous Beltrami conditions in coupled vortex dynamics, J. Math. Phys. 40, 5080–5091 (1999)
37. Yosida, K.: Functional Analysis (6th ed.), Springer-Verlag, Berlin–Heidelberg (1980)