GEOMETRIC MECHANICS
RICHARD TALMAN
Wiley-VCH Verlag GmbH & Co. KGaA
All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.
Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.

Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at
WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form, nor transmitted or translated into machine language, without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Printed in the Federal Republic of Germany
Printed on acid-free paper

Printing and Bookbinding: buch bücher dd ag, Birkach

ISBN-13: 978-0-471-15738-0
ISBN-10: 0-471-15738-4
CONTENTS
Preface / xvii

Introduction / xxix
PART I SOLVABLE SYSTEMS / 1

1 Review of Solvable Systems / 3
1.1 Introduction and Rationale / 3
1.2 Worked Examples / 6
  1.2.1 Particle in One-Dimensional Potential / 6
  1.2.2 Particle in Space / 12
  1.2.3 Weakly Coupled Oscillators / 12
  1.2.4 Conservation of Momentum and Energy / 24
  1.2.5 Effective Potential / 25
  1.2.6 Multiparticle Systems / 28
  1.2.7 Dimensional/Scaling Considerations / 31
Bibliography / 31
PART II THE GEOMETRY OF MECHANICS / 33

2 Geometry of Mechanics I: Linear / 35
2.1 Introduction / 35
2.2 Pairs of Planes as Covariant Vectors / 37
2.3 Differential Forms / 42
  2.3.1 Geometric Interpretation / 42
  2.3.2 Examples Illustrating the Calculus of Differential Forms / 46
  2.3.3 Connections between Differential Forms and Vector Calculus / 50
2.4 Algebraic Tensors / 54
  2.4.1 Vectors and Their Duals / 54
  2.4.2 Transformation of Coordinates / 56
  2.4.3 Transformation of Distributions / 60
  2.4.4 Multi-Index Tensors and Their Contraction / 61
  2.4.5 Overlap of Tensor Algebra and Tensor Calculus / 64
2.5 (Possibly Complex) Cartesian Vectors in Metric Geometry / 66
  2.5.1 Euclidean Vectors / 66
  2.5.2 Skew Coordinate Frames / 68
  2.5.3 Reduction of a Quadratic Form to a Sum or Difference of Squares / 68
  2.5.4 Introduction of Covariant Components / 70
  2.5.5 The Reciprocal Basis / 71
  2.5.6 Wavefronts, Lattice Planes, and Bragg Reflection / 74
2.6 Unitary Geometry / 80
  2.6.1 Solution of Singular Sets of Linear Equations / 84
Bibliography / 88

3 Geometry of Mechanics II: Curvilinear / 89
3.1 (Real) Curvilinear Coordinates in N-Dimensions / 89
  3.1.1 Introduction / 89
  3.1.2 The Metric Tensor / 90
  3.1.3 Relating Coordinate Systems at Different Points in Space / 92
  3.1.4 The Covariant (or Absolute) Differential / 97
  3.1.5 Derivation of the Lagrange Equations of Mechanics from the Absolute Differential / 101
  3.1.6 Practical Evaluation of the Christoffel Symbols / 106
  3.1.7 Evaluation of the Christoffel Symbols Using Maple / 107
3.2 Absolute Derivatives and the Bilinear Covariant / 109
3.3 Lie Derivative - Coordinate Approach / 111
  3.3.1 Lie-Dragged Coordinate Systems / 111
  3.3.2 Lie Derivatives of Scalars and Vectors / 115
3.4 Lie Derivative - Lie Algebraic Approach / 120
  3.4.1 Exponential Representation of Parameterized Curves / 120
  3.4.2 Identification of Vector Fields with Differential Operators / 121
  3.4.3 Loop Defect / 122
  3.4.4 Coordinate Congruences / 123
  3.4.5 Commutators of Quasi-Basis-Vectors / 125
  3.4.6 Lie-Dragged Congruences and the Lie Derivative / 126
Bibliography / 130

4 Geometry of Mechanics III: Multilinear / 132
4.1 Generalized Euclidean Rotations and Reflections / 132
  4.1.1 Introduction / 132
  4.1.2 Reflections / 133
  4.1.3 Expressing a Rotation as a Product of Reflections / 134
  4.1.4 The Lie Group of Rotations / 136
4.2 Multivectors / 137
  4.2.1 "Volume" Determined by Three and by n Vectors / 137
  4.2.2 Bivectors / 139
  4.2.3 Multivectors and Generalization to Higher Dimensionality / 140
  4.2.4 "Supplementary" Multivectors / 142
  4.2.5 Sums of p-Vectors / 143
  4.2.6 Bivectors and Infinitesimal Rotations / 143
4.3 Curvilinear Coordinates (Continued) / 146
  4.3.1 Local Radius of Curvature of a Particle Orbit / 146
  4.3.2 Generalized Divergence and Gauss's Theorem / 147
4.4 Integration and Exterior Differentiation of Forms / 150
  4.4.1 One-Dimensional Integrals / 150
  4.4.2 Two-Dimensional Integrals / 151
  4.4.3 Metric-Free Definition of the "Divergence" of a Vector / 153
4.5 Spinors in Three-Dimensional Space / 155
  4.5.1 Definition of Spinors / 155
  4.5.2 Demonstration That a Spinor Is a Euclidean Tensor / 156
  4.5.3 Associating a 2 x 2 Reflection (Rotation) Matrix with a Vector (Bivector) / 156
  4.5.4 Associating a Matrix with a Trivector (Triple Product) / 157
  4.5.5 Representations of Reflections / 158
  4.5.6 Representations of Rotations / 158
  4.5.7 Operations on Spinors / 159
  4.5.8 Real Euclidean Space / 160
  4.5.9 Real Pseudo-Euclidean Space / 161
Bibliography / 161
PART III LAGRANGIAN MECHANICS / 163

5 Lagrange-Poincaré Description of Mechanics / 165
5.1 Review of the Lagrange Equations / 165
5.2 The Poincaré Equation / 166
  5.2.1 Some Features of the Poincaré Equations / 175
  5.2.2 Invariance of the Poincaré Equation / 176
  5.2.3 Translation into the Language of Forms and Vector Fields / 177
  5.2.4 Example: Free Motion of a Rigid Body with One Point Fixed / 178
5.3 Vector Field Derivation of the Poincaré Equation / 181
  5.3.1 The Basic Problem of the Calculus of Variations / 181
  5.3.2 The Euler-Lagrange Equations / 182
  5.3.3 Calculus of Variations Using the Algebra of Vector Fields / 183
Bibliography / 187

6 Simplifying the Poincaré Equation With Group Theory / 188
6.1 Continuous Transformation Groups / 188
6.2 Use of Infinitesimal Group Parameters as Quasi-coordinates / 191
6.3 Infinitesimal Group Operators / 193
6.4 Commutation Relations and Structure Constants of the Group / 200
6.5 Qualitative Aspects of Infinitesimal Generators / 202
6.6 The Poincaré Equation in Terms of Group Generators / 205
6.7 The Rigid Body Subject to Force and Torque / 207
  6.7.1 Group Parameters as Quasi-Coordinates / 207
  6.7.2 Description Using Body Axes / 209
6.8 Example: Rolling Sphere / 212
  6.8.1 Commutation Relations Appropriate for Rigid Body Motion / 212
  6.8.2 Bowling Ball Rolling without Slipping / 214
Bibliography / 220

7 Conservation Laws and Symmetry / 221
7.1 Multiparticle Conservation Laws / 221
  7.1.1 Conservation of Linear Momentum / 221
  7.1.2 Rate of Change of Angular Momentum: Poincaré Approach / 223
  7.1.3 Conservation of Angular Momentum: Lagrangian Approach / 224
  7.1.4 Conservation of Energy / 225
7.2 Cyclic Coordinates and Routhian Reduction / 226
  7.2.1 Integrability; Generalization of Cyclic Variables / 229
7.3 Noether's Theorem / 230
Bibliography / 233
PART IV NEWTONIAN MECHANICS / 235

8 Gauge-Invariant Mechanics / 237
8.1 Vector Mechanics / 237
  8.1.1 Vector Description in Curvilinear Coordinates / 237
  8.1.2 The Frenet-Serret Formulas / 240
  8.1.3 Vector Description in an Accelerating Coordinate Frame / 244
  8.1.4 Exploiting the Fictitious Force Description / 249
8.2 Single-Particle Equations in Gauge-Invariant Form / 255
  8.2.1 Newton's Force Equation in Gauge-Invariant Form / 256
  8.2.2 Active and Passive Interpretations of Time Evolution / 259
  8.2.3 Continued Discussion of the Cartan Matrix / 261
  8.2.4 Reconciling Fictitious Force and Gauge-Invariant Descriptions / 262
  8.2.5 Newton's Torque Equation / 265
  8.2.6 The Plumb Bob / 266
8.3 Lie Algebraic Description of Rigid Body Motion / 271
  8.3.1 Space and Body Frames of Reference / 272
  8.3.2 Review of the "Association" of 2 x 2 Matrices to Vectors / 274
  8.3.3 "Association" of 3 x 3 Matrices to Vectors / 276
  8.3.4 Some Interpretive Comments Concerning Similarity Transformations / 277
  8.3.5 Rigid Body Equations in Rotating Frame / 279
  8.3.6 The Euler Equations for a Rigid Body / 281
Bibliography / 282

9 Geometric Phases / 284
9.1 The Foucault Pendulum / 284
  9.1.1 Fictitious Force Solution / 286
  9.1.2 Gauge-Invariant Solution / 287
9.2 "Parallel" Displacement of Coordinate Axes / 290
9.3 Tumblers, Divers, Falling Cats, etc. / 295
  9.3.1 Miscellaneous Issues / 301
Bibliography / 304
PART V HAMILTONIAN MECHANICS / 305

10 Hamiltonian Treatment of Geometric Optics / 307
10.1 Motivation / 307
  10.1.1 The Scalar Wave Equation / 309
  10.1.2 The Eikonal Equation / 310
  10.1.3 Determination of Rays from Wavefronts / 311
  10.1.4 The Ray Equation in Geometric Optics / 312
  10.1.5 Variation of Light Intensity along a Ray / 314
10.2 Variational Principles / 316
  10.2.1 The Lagrange Integral Invariant and Snell's Law / 316
  10.2.2 The Principle of Least Time / 317
10.3 Paraxial Optics, Gaussian Optics, Matrix Optics / 319
10.4 Huygens' Principle / 322
Bibliography / 324
11 Hamilton-Jacobi Theory / 325
11.1 The Hamilton-Jacobi Equation / 325
  11.1.1 Derivation / 325
  11.1.2 The Geometric Picture / 327
  11.1.3 Constant S Wavefronts / 328
11.2 Trajectory Determination Using the Hamilton-Jacobi Equation / 329
  11.2.1 Complete Integral / 329
  11.2.2 Finding a Complete Integral by Separation of Variables / 329
  11.2.3 Hamilton-Jacobi Analysis of Projectile Motion / 330
  11.2.4 The Jacobi Method for Exploiting a Complete Integral / 332
  11.2.5 Completion of Projectile Example / 334
  11.2.6 The Time-Independent Hamilton-Jacobi Equation / 335
  11.2.7 Hamilton-Jacobi Treatment of 1-D Simple Harmonic Motion / 335
  11.2.8 The Kepler Problem / 337
11.3 Analogies Between Optics and Quantum Mechanics / 343
  11.3.1 General Discussion / 343
  11.3.2 Classical Limit of the Schrodinger Equation / 343
  11.3.3 Condition for Validity of Semiclassical Treatment / 345

12 Relativistic Mechanics / 348
12.1 Review of Special Relativity Theory / 348
  12.1.1 Form Invariance / 348
  12.1.2 World Points / 349
  12.1.3 World Intervals / 349
  12.1.4 Proper Time / 350
  12.1.5 The Lorentz Transformation / 351
  12.1.6 Transformation of Velocities / 352
  12.1.7 Four-Vectors / 353
  12.1.8 Antisymmetric 4-Tensors / 355
  12.1.9 The 4-Gradient of a 4-Scalar Function / 355
  12.1.10 The 4-Velocity and 4-Acceleration / 355
12.2 The Relativistic Principle of Least Action / 356
12.3 Energy and Momentum / 357
  12.3.1 4-Vector Notation / 358
12.4 Relativistic Hamilton-Jacobi Theory / 359
12.5 Forced Motion / 360
12.6 Generalization of the Action to Include Electromagnetic Forces / 360
12.7 Derivation of the Lorentz Force Law / 362
12.8 Gauge Invariance / 363
12.9 Trajectory Determination / 364
  12.9.1 Motion in a Constant Uniform Electric Field / 364
  12.9.2 Motion in a Constant Uniform Magnetic Field / 365
12.10 The Longitudinal Coordinate as Independent Variable / 366
Bibliography / 368
13 Symplectic Mechanics / 369
13.1 Derivation of Hamilton's Equations / 369
  13.1.1 Charged Particle in Electromagnetic Field / 371
13.2 Recapitulation / 372
13.3 The Symplectic Properties of Phase Space / 373
  13.3.1 The Canonical Momentum One-Form / 373
  13.3.2 The Symplectic Two-Form G / 376
  13.3.3 Invariance of the Symplectic Two-Form / 379
  13.3.4 Use of G to Associate Vectors and One-Forms / 381
  13.3.5 Explicit Evaluation of Some Inner Products / 381
  13.3.6 The Vector Field Associated with dH / 382
  13.3.7 Hamilton's Equations in Matrix Form / 383
13.4 Symplectic Geometry / 385
  13.4.1 Symplectic Products and Symplectic Bases / 386
  13.4.2 Symplectic Transformations / 387
  13.4.3 Properties of Symplectic Matrices / 388
  13.4.4 Alternate Coordinate Ordering / 395
13.5 Poisson Brackets of Scalar Functions / 396
  13.5.1 The Poisson Bracket of Two Scalar Functions / 396
  13.5.2 Properties of Poisson Brackets / 397
  13.5.3 The Poisson Bracket and Quantum Mechanics / 398
13.6 Integral Invariants / 399
  13.6.1 Integral Invariants in Electricity and Magnetism / 399
  13.6.2 The Poincaré-Cartan Integral Invariant / 402
13.7 Invariance of the Poincaré-Cartan Integral Invariant I.I. / 405
  13.7.1 The Extended Phase Space Two-Form and Its Special Eigenvector / 405
  13.7.2 Proof of Invariance of the Poincaré Relative Integral Invariant / 407
13.8 Symplectic System Evolution / 409
  13.8.1 Liouville's Theorem and Generalizations / 410
Bibliography / 412
PART VI APPROXIMATE METHODS / 413

14 Analytic Basis for Approximation / 415
14.1 Canonical Transformations / 415
  14.1.1 The Action as a Generator of Canonical Transformations / 415
14.2 Time-Independent Canonical Transformation / 419
14.3 Action-Angle Variables / 421
  14.3.1 The Action Variable of a Simple Harmonic Oscillator / 421
  14.3.2 Adiabatic Invariance of the Action I / 422
  14.3.3 Action/Angle Conjugate Variables / 426
  14.3.4 Parametrically Driven Simple Harmonic Motion / 428
14.4 Examples of Adiabatic Invariance / 431
  14.4.1 Variable-Length Pendulum / 431
  14.4.2 Charged Particle in Magnetic Field / 432
  14.4.3 Charged Particle in a Magnetic Trap / 434
14.5 Accuracy of Conservation of Adiabatic Invariants / 439
14.6 Conditionally Periodic Motion / 442
  14.6.1 Stäckel's Theorem / 443
  14.6.2 Angle Variables / 444
  14.6.3 Action/Angle Coordinates for Keplerian Satellites / 447
Bibliography / 448

15 Linear Systems / 449
15.1 Linear Equations With Constant Coefficients / 449
15.2 Treatment of Velocity Terms / 452
15.3 Linear Hamiltonian Systems / 456
  15.3.1 Inhomogeneous Equations / 457
  15.3.2 Exponentiation, Diagonalization, and Logarithm Formation of Matrices / 457
  15.3.3 Eigensolutions / 460
  15.3.4 Eigenvectors of a Linear Hamiltonian System / 461
15.4 A Lagrangian Set of Solutions / 465
15.5 Periodic Linear Systems / 467
  15.5.1 Floquet's Theorem / 468
  15.5.2 Lyapunov's Theorem / 469
  15.5.3 Characteristic Multipliers, Characteristic Exponents / 470
  15.5.4 The Variational Equations / 471
  15.5.5 Periodic Solutions of Inhomogeneous Equation / 473
Bibliography / 478
16 Perturbation Theory / 479
16.1 The Lagrange Planetary Equations / 480
  16.1.1 Derivation of the Equations / 480
  16.1.2 Relation Between Lagrange and Poisson Brackets / 484
  16.1.3 Advance of Perihelion of Mercury / 485
16.2 Iterative Analysis of Anharmonic Oscillations / 489
16.3 The Method of Krylov and Bogoliubov / 495
  16.3.1 First Approximation / 495
  16.3.2 Examples / 497
  16.3.3 Equivalent Linearization / 499
  16.3.4 Power Balance, Harmonic Balance / 501
  16.3.5 Qualitative Analysis of Autonomous Oscillators / 502
  16.3.6 Higher Approximation / 507
16.4 Multidimensional, Near-Symplectic Perturbation Theory / 510
  16.4.1 Pure Damping / 518
  16.4.2 Time-Dependent Perturbations / 521
  16.4.3 Resonance Induced by Time-Dependent Perturbation / 523
  16.4.4 Threshold of Instability / 527
16.5 Superconvergent Perturbation Theory / 528
  16.5.1 Canonical Perturbation Theory / 528
  16.5.2 Application to Gravity Pendulum / 530
  16.5.3 Superconvergence / 532
Bibliography / 533
Index / 535
PREFACE

This text is designed to accompany a mechanics course for students who have already encountered Lagrange's equations. Its purpose is to introduce students to a somewhat more "modern," but not intentionally more difficult, treatment than Goldstein without requiring the mathematical sophistication assumed by, say, Arnold. It has been used at Cornell in physics courses taken by advanced undergraduates and beginning graduate students. Though the systems analyzed come primarily from the domain customarily called mechanics, the methods employed are intended to be introductory to more advanced topics such as quantum mechanics, elementary particle physics, and general relativity. With this emphasis, the geometric chapters of the text have been used as the text for a course "Geometric Concepts of Physics," taken by advanced undergraduates at Cornell.

Communications pointing out errors, making comments, or suggesting improvements will be appreciated. E-mail address: talman@lns62.lns.cornell.edu.

Coining the phrase "it takes an institution" to foster scholarship, the institutions contributing (in equal parts) to this text have been the public schools of London, Ontario, and universities: U.W.O., Caltech, and Cornell. I have profited, initially as a student, and later from students, at these institutions, and from my colleagues there and at accelerator laboratories worldwide. I have also been fortunate of family: parents, brother, children and, especially, wife.
RICHARD TALMAN
Ithaca, New York
June 1999
INTRODUCTION

GENERALITIES

A more descriptive title for this text might have been Geometric Mechanics for Pedestrians, where "pedestrian" anticipates that the reader is untrained or, more precisely, since a fairly high degree of training in physics or engineering is assumed, less than optimally trained in mathematics. The most important prerequisite is an understanding of Lagrangian mechanics at the level of texts such as Marion's or Symon's. No background in Hamiltonian mechanics is assumed.

The word "geometric" in the title is used in a narrow, technical, mathematical sense. For a physicist the phrase "geometric mechanics" is redundant, since mechanics is to geometry what swimming is to water - one occurs in the other. By contrast, a mathematician defines geometry (the kind under discussion anyway) to be the algebraic or analytical representation of spatial objects and relationships. For better or for worse, modern work in mechanics, and much of theoretical physics for that matter, tends to be geometric in this sense. The pedagogy of physics at the level of this text is only beginning to evolve in this direction, however.

Since this book's approach is supposed to be modern, not classical, and because relativistic mechanics is included, describing the subject matter as modern, nonquantum mechanics would also be accurate. On the other hand, dynamical system theory, often used for material like this, would not be, as it would imply a mathematical emphasis and rigor not strived for, and certainly not attained. If a purely analytic approach to mechanics is called formal, then the approach to be emphasized here might be called informal mechanics if that did not have misleading connotations - there will be no shortage of formulas - so this statement needs elaboration. Any successful physical theory yields analytic predictions. This fact might seem to force geometry into a subsidiary role and, in fact, most physics can be expressed formally - that is, by formulas.
The role of geometry is to provide qualitative structure to this quantitative formalism. From one point of view, this qualitative content may appear to be mere scaffolding, to be discarded once formulas have been obtained, and much of physics and engineering is usually taught in this way. But a broader view shows that qualitative understanding has value in structuring results, in avoiding blunders, and in suggesting new directions. Paradoxically, these things become progressively more true as the evaluation of formulas is more and more taken over by computers.

Why, then, the stress on geometry? It may be only an accident of history, but the subject of mechanics is far less pictorial than is electricity and magnetism. In electrostatics, though it would be possible to describe everything in terms of the Coulomb force F12 between pairs of charges 1 and 2, introduction of the (artificial) electric field E yields a more picturesque and ultimately more powerful theory. Density of, and directions of, field lines give force magnitudes and directions; surfaces normal to field lines give equipotentials. In mechanics one also introduces artificial functions, such as the Lagrangian, but, commonly, little attention is paid to their geometric properties, such as surfaces on which they are constant. Rather, their main use is formal - differentiating them appropriately yields equations of motion. Here we try to ascribe geometric properties to the functions of mechanics. The most elementary - and already well-understood - example of this kind is the potential energy function V. Surfaces on which V is constant have a well-known significance, and work done can be obtained by counting contours of V crossed.

There is an even more important reason for "geometricizing" mechanics, but it is hard to appreciate at this point. It has to do with a restriction of Lagrangian mechanics to systems for which "generalized coordinates" can be found. Because of this, an important thread guiding the choice of subject matter for this text is the Poincaré equation. This will seem curious to those educated traditionally in classical mechanics, many of whom may not even have heard of this particular equation - this only makes it more challenging to convince the reader that the Poincaré equation is "better than" the Lagrange equation. (It is easier to show it is "as good," since in all cases where the Lagrange equation is valid, the two equations are identical.)
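The contour-counting remark about V can be made quantitative. In standard notation (this display is a gloss, not an equation from the text):

```latex
% For a conservative force derived from the potential energy V,
\[
  \mathbf{F} = -\nabla V , \qquad
  W_{a \to b} = \int_a^b \mathbf{F}\cdot d\mathbf{s} = V(a) - V(b) .
\]
% If equipotential contours are drawn at equal increments \Delta V, then
% crossing n contours (counted with sign) gives W_{a \to b} \approx n\,\Delta V.
```

Each contour crossed contributes one increment of potential drop, which is exactly what "counting contours of V crossed" means.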
Anyway, by "better than" I include the recommendation that "it is what undergraduates should be learning." Whether or not the reader is eventually convinced, the attempt to prove the point is valuable in itself because it provides guidance and motivation. A certain amount of circular reasoning creeps into physics naturally (and not necessarily unproductively) as follows. Suppose that by making a special assumption a certain difficult issue can be finessed. Then, by the simple expedient of defining "physics," or "fundamental physics," as being limited to systems satisfying the special assumption, one is relieved by definition of worrying further about the difficult issue. In the present context here is how it goes. Once one has found the generalized coordinates of a system, the Lagrangian method proceeds reliably and in a purely mechanical way, with no need to be troubled by annoying mathematical concepts such as tangent spaces. The stratagem then is to define mechanics to be the theory of systems for which generalized coordinates can be found and presto, one has a tidy, self-contained, and powerful tool - Lagrange's equations - for studying it. (It must be acknowledged that even if this approach is judged cowardly as applied mathematics, it may be "high principle" as physics - the principle being that to be Lagrangian is to be fundamental.) Being unwilling to employ this stratagem, we will face up to the Poincaré equation, which is the tool of choice for studying systems that are Lagrangian except for not being describable by generalized coordinates. This, in turn, requires studying
the geometric structure of mechanics. At that point it becomes almost an advantage that the Poincaré equation is novel, since it does not carry with it the baggage of less-than-general truth that already-assimilated physics necessarily carries. It is also pedagogically attractive to investigate a brand new subject rather than simply to rehash Lagrangian mechanics. The greatest virtue of the Lagrange method is that it provides a foolproof scheme for obtaining correct equations of motion. With computers available to solve these equations, the mere writing down of a correct Lagrangian can almost be regarded as the solution of the problem. The Poincaré equation, valid in far less restrictive circumstances (commonly involving constraints or rotational motion), has the same virtue. Its extra terms, compared to the Lagrange equation's, can also be calculated in a computer using symbolic algebra. The resulting differential equations of motion can then be solved numerically, as in the Lagrangian procedure. This contributes to the validity of the earlier statement that the Poincaré equation is better than the Lagrange equation. Purely Newtonian mechanics is also subject to geometrization. In this text, this takes the form primarily of a "gauge-invariant" formulation of mechanics. Both to provide mechanical systems on which to exercise the methods and because of its inherent interest, the subject of geometric phases is emphasized. It is not uncommon for Hamiltonian mechanics to be first encountered in the waning days of a course emphasizing Lagrangian mechanics. This has made it natural for a next course to start with and emphasize Hamiltonian mechanics. Here, though standard Hamiltonian arguments enter, they do so initially via a geometric, Hamilton-Jacobi route. Then the full geometric artillery developed earlier in the text is rolled out under the name of symplectic mechanics, which is just another name for Hamiltonian mechanics with its geometric structure emphasized.
Especially important are Liouville's theorem and its generalizations. Because of their fundamental importance, for example in accelerator physics, and because of their connection with quantum mechanics, adiabatic invariants and Poisson brackets, central to both traditional "canonical" formalism and the more geometric treatment, are also stressed. Though Lagrangian methods have mainly been emphasized in this brief survey, Newtonian and Hamiltonian mechanics get comparable billing in the body of the text. Furthermore, an effort has been made to organize the material in such a way that a path can be followed through any one of the three major approaches independent of the other two. This is not to say that this is recommended, but if there is too much material for any particular curriculum, as there almost surely is, this may help in paring down the material. For some years at Cornell, the curriculum for beginning graduate students in physics has treated the study of mathematical methods of physics per se as best contained in a course on electromagnetism. This has probably reflected the assignment of highest priority to differential equations, special functions, and Fourier methods. But recently algebra and geometry seem to be "making a comeback," and mechanics is a natural vehicle for exercising those methods. During the half century after the discovery of quantum mechanics, the pedagogy of teaching classical mechanics to physicists and engineers remained static. This
was perhaps natural while the exciting research developments occurred in quantum physics. Still, because of its obvious technological importance, and because of the (admittedly hard to justify) belief that its understanding is essential to the understanding of quantum physics, classical mechanics has remained an essential part of the curriculum. Recently though, there has been renewed interest in mechanics as a fruitful research area. Unfortunately, with this research field having been left to mathematicians for two generations, the language of modern advanced dynamics bears little resemblance to the subject taught in traditional physics courses. One response to this situation has been the appearance of numerous texts explaining highly specialized topics of abstract mathematics and directed toward their subsequent application in physics. While I understand the concept, and to some extent even agree with it, this text takes a less ambitious approach, staying as close as possible to familiar methods. The reason for this is that a fully satisfactory mathematical preparation for geometrical mechanics would consist of a formidable sequence of subjects: differential equations, calculus of variations, vector and tensor analysis, multilinear algebra, differential forms, smooth manifolds (including a smattering of topology), differential geometry (including Riemannian geometry), Lie groups and Lie algebra, and dynamical systems. Most physicists setting out on that path would never reach its end. They would be stalled by insufficient motivation, be diverted into side issues (valuable or otherwise), or switch to mathematics. Another goal of the text is to demonstrate how natural are the generalizations by which special relativity and quantum mechanics spring from classical mechanics. Also, although the subject of general relativity is not broached here, many of its analytical and conceptual difficulties are faced.
Everything contained in this book is explained with more rigor, or more depth, or more detail, or (especially) more sophistication, in at least one of the books listed at the end of this introduction. Were it not for the fact that most of those books are fat, intimidating, abstract, formal, mathematical and (for many) unintelligible, the reader's time would be better spent reading them (in the right order) than studying this book. But if this text renders those books both accessible and admirable, it will have achieved its main purpose. It has been said that bridge is a simple game; dealt thirteen cards, one has only to play them in the correct order. In the same sense, mechanics is easy to learn; one simply has to study readily available books in a sensible order. I have tried to chart such a path, extracting material from various sources in an order that I have found appropriate. At each stage I indicate (at the end of the chapter) the reference my approach most closely resembles. In some cases what I provide is a kind of Reader's Digest condensation of a more general treatment, and this may amount to my having systematically specialized and made concrete descriptions that the original author may have systematically labored to generalize and make abstract. The texts to which these statements are most applicable are listed at the end of each chapter, and keyed to the particular section to which they relate. It is not suggested that these texts should be systematically referred to, as they tend to be advanced and contain much unrelated material. But if particular material in this text is obscure, or seems to stop short of some desirable goal, these texts should provide authoritative help.
The mathematical level strived for is only high enough to support a persuasive (to a nonmathematician) trip through the physics. Still, "it can be shown that" almost never appears, though the standards of what constitutes "proof" may be low, and the range of generality narrow. I believe that much mathematics is made difficult for the nonmathematically inclined reader by the absence of concrete instances of the abstract objects under discussion. This text tries to provide essentially correct instances of otherwise hard-to-grasp mathematical abstractions. I hope and believe that this will provide a broad base of general understanding from which deeper, more specialized, more mathematical texts can be approached with a respectable general comprehension. This statement is most applicable to the excellent books by Arnold, who tries hard, but not necessarily successfully, to provide physical lines of reasoning. Much of this book was written with the goal of making one or another of his discussions comprehensible. In the early days of our weekly Laboratory of Nuclear Studies Journal Club our former esteemed leader, Robert Wilson, imposed a rule - though honored as much in the breach as in the observance, it was not entirely a joke - that the Dirac gamma matrices never appear. The (largely unsuccessful) purpose of this rule was to force the lectures to be intelligible to us theory-challenged experimentalists. In this text there is a similar rule. It is that hieroglyphics such as
φ : {x ∈ R² : |x| = 1} → R
not appear. The justification for this rule is that a "physicist" is likely to skip such a statement altogether or, once having understood it, regard it as obvious. Like the jest that the French "don't care what they say as long as they pronounce it properly," one can joke that mathematicians don't care what their mappings do, as long as the spaces they connect are clear. Physicists, on the other hand, care primarily what their functions represent physically and are not fussy about what spaces they relate. Another "rule" has just been followed: the word function will be used in preference to the (synonymous) word mapping. Other terrifying mathematical words such as flow, symplectomorphism, and manifold will also be avoided, except that, to avoid long-winded phrases such as "configuration space described by generalized coordinates," the word manifold will occasionally be used. Of course, one cannot alter the essence of a subject by denying the existence of mathematics that is manifestly at its core but, in spite of the loss of precision, I hope that sugar-coating the material in this way will make it more easily swallowed by nonmathematicians.
Notation: “Notation isn’t everything, it’s the only thing.” Grammatically speaking, this statement, like the American football slogan it paraphrases, makes no sense. But its clearly intended meaning is only a mild exaggeration. After the need to evaluate some quantity has been expressed, a few straightforward mathematical operations are typically all that is required to obtain the quantity. But specifying quantities is far from simple. The conceptual depth of the subject is substantial and ordinary language is scarcely capable of defining the symbols, much less expressing the relations among them. A way of assessing the seriousness of the problem is to note that the index of symbols in Goldstein’s Classical Mechanics has about 400 entries. This makes
the introduction of sophisticated symbols essential. Discussion of notation and the motivation behind its introduction is scattered throughout this text, probably to the point of irritation for some readers. Here we limit discussion to the few most important, most likely to be confusing, and most deviant from other sources: the qualified equality, the vector, the preferred reference system, the active/passive interpretation of transformations, and the terminology of differential forms. A fairly common occurrence in this subject is that two quantities A and B are equal or equivalent from one point of view but not from another. This circumstance will be indicated by the "qualified equality" A =q B (an equals sign adorned with a "q"). This notation is intentionally vague (the "q" stands for qualified, or questionable, or query, as appropriate) and may have different meanings in different contexts; it only warns the reader to be wary of the risk of jumping to unjustified conclusions. Normally the qualification will be clarified in the subsequent text. Next, vectors. Consider the following three symbols or collections of symbols: →, x, and (x, y, z)^T. The first, →, will be called an arrow (because it is one), and this word will be far more prevalent in this text than in any other of which I am aware. This particular arrow happens to be pointing in a horizontal direction (for convenience of typesetting), but in general an arrow can point in any direction, including out of the page. The second, boldface quantity, x, is an intrinsic or true vector; this means that it is a symbol that "stands for" an arrow. The word "intrinsic" means "it doesn't depend on choice of coordinate system." The third quantity, (x, y, z)^T, is a column matrix (the T stands for transpose) containing the "components" of x relative to some preestablished coordinate system.
From the point of view of elementary physics, these three are equivalent quantities, differing only in the ways they are to be manipulated; "addition" of arrows is by ruler and compass, addition of intrinsic vectors is by vector algebra, and addition of coordinate vectors is component-wise. Because of this multiplicity of meanings, the word "vector" is ambiguous in some contexts. For this reason, we will often use the word "arrow" in situations where independence of choice of coordinates is being emphasized (even in dimensionality higher than 3 if necessary). According to its definition above, the phrase intrinsic vector could usually replace it, but some would complain of the redundancy, and the word arrow more faithfully conveys the intended geometric sense. Comments similar to these could be made concerning higher-order tensors, but they would be largely repetitive. A virtue of arrows is that they can be plotted in figures. This goes a long way toward making their meaning unambiguous, but the conditions defining the figure must still be made clear. In classical mechanics, "inertial frames" have a fundamental significance, and we will almost always suppose that there is a "preferred" reference system, its rectangular axes fixed in an inertial system. Unless otherwise stated, figures in this text are to be regarded as "snapshots" taken in that frame. In particular, a plotted arrow connects two points fixed in the inertial frame at the instant illustrated. As mentioned previously, such an arrow is symbolized by a true vector such as x. It is, of course, essential that these vectors satisfy the algebraic properties defining a vector space. In such spaces, "transformations" are important; a "linear" transformation can be represented by a matrix symbolized, for example, by M, with elements
M^i_j. The result of applying this transformation to vector x can be represented symbolically as the "matrix product" y =q Mx of "intrinsic" quantities, or spelled out explicitly in components as y^i = M^i_j x^j. Frequently both forms will be given. This leads to a notational difficulty in distinguishing between the "active" and "passive" interpretations of the transformation. The new components y^i can belong to a new arrow in the old frame (active interpretation) or to the old arrow in a new frame (passive interpretation). On the other hand, the intrinsic form y =q Mx seems to support only an active interpretation, according to which M "operates" on vector x to yield a different vector y. To avoid this problem, when we wish to express a passive interpretation we will ordinarily use the form x̄ =q Mx and will insist that x and x̄ stand for the same arrow. The significance of the overhead bar, then, is that x̄ is simply an abbreviation for the array of barred-frame coordinates x̄^i. When the active interpretation is intended, the notation will usually be expanded to clarify the situation. For example, consider a moving point located initially at r(0) and at r(t) at later time t. These vectors can be related by r(t) = O(t)r(0), where O(t) is a time-dependent operator. This is an active transformation. The beauty and power of vector analysis as it is applied to physics is that a boldface symbol such as V both indicates that the quantity is intrinsic and abbreviates its multiple components V^i into one symbol. Though these are both valuable purposes, they are not the same. The abbreviation works in vector analysis only because vectors are the only multicomponent objects occurring.
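In the spirit of this text's stated aim of providing concrete instances of abstractions, the active/passive distinction can be illustrated with a small computational sketch. This is the editor's own construction, not part of the text; the function names are hypothetical, and a two-dimensional rotation stands in for a general linear transformation.

```python
import math

def rot(phi):
    """2x2 rotation matrix for angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def apply(M, x):
    """Matrix-vector product: components y^i = M^i_j x^j."""
    return [sum(M[i][j] * x[j] for j in range(2)) for i in range(2)]

phi = 0.3
x = [2.0, 1.0]

# Active interpretation: M rotates the arrow by +phi; the result y is a
# NEW arrow, with components taken in the OLD frame.
y = apply(rot(phi), x)

# Passive interpretation: the arrow is untouched, but the frame is rotated
# by +phi.  The barred components xbar^i are the projections of the OLD
# arrow onto the new (rotated) basis vectors.
e1bar = [math.cos(phi), math.sin(phi)]
e2bar = [-math.sin(phi), math.cos(phi)]
xbar = [sum(e * xi for e, xi in zip(e1bar, x)),
        sum(e * xi for e, xi in zip(e2bar, x))]
```

The same column of numbers xbar is produced by multiplying x by rot(-phi): the passive partner of an active rotation is its inverse, which is one reason the two interpretations must be kept verbally distinct.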
That this will no longer be the case in this book will cause considerable notational difficulty, because the reader, from experience with vector analysis, is likely to jump to unjustified conclusions concerning boldface quantities.¹ We will not be able to avoid this problem, however, since we wish to retain familiar notation. Sometimes we will be using boldface symbols both to indicate intrinsicality and as abbreviations for multicomponent objects. Sometimes the (redundant) arrow notation x⃗ will be used to emphasize the intrinsic aspect. Though it may not be obvious at this point, this was the source of the above-mentioned need to differentiate verbally between active and passive transformations. In stressing this distinction, the text differs from a text such as Goldstein's, which, perhaps wisely, deemphasizes the issue. According to Arnold, "it is impossible to understand mechanics without the use of differential forms." Accepting the validity of this statement only grudgingly (and trying to corroborate it), but knowing from experience that typical physics students are innocent of any such knowledge, a considerable portion of the text is devoted to this subject. Briefly, the symbol dx will stand for an old-fashioned differential displacement of the sort familiar to every student of physics. But a new quantity d̃x, to be known as a differential form, will also be used. This symbol is distinguished from dx both by being boldface and by having an overhead tilde. Displacements dx¹, dx², ... in spaces of higher dimension will have matching forms d̃x¹, d̃x², ... . This notation is mentioned at this point only because it is unconventional. In most treatments, one
¹Any computer programmer knows that, when two logically distinct quantities have initially been given the same symbol because they are expected to remain equal, it is hard to unscramble the code when it later becomes necessary to distinguish between them.
or the other form of differential is used, but not both at the same time. I have found it impossible to cause classical formulations to metamorphose into modern formulations without using this distinction (and others to be faced when the time comes). It is hard to avoid using terms whose meanings are vague. (See the previous paragraph, for example.) I have attempted to acknowledge such vagueness, at least in extreme cases, by placing such terms in quotation marks when they are first used. Since quotation marks are also used when the term is actually being defined, a certain amount of hunting through the surrounding sentences may be necessary to determine whether a definition is actually present. (If it is not clear whether or not there is a definition, then the term is without any doubt vague.) Italics are used to emphasize key phrases, or pairs of phrases in opposition, that are central to the discussion. Parenthesized sentences or sentence fragments contain material that is clarifying only if it is included right there, but that should not be allowed to interrupt the logical flow of the surrounding sentences. Footnotes, though sometimes similar in intent, are likely to be real digressions, or technical qualifications or clarifications.
Organization of the Book: The text is divided into major parts (see Figure 1). Part I describes various systems that are subject to exact analytic solution in closed form. With much of modern mechanics dealing with systems that are close to solvable, some of these examples will later serve as the bases from which perturbative treatments can proceed. More important, this material is intended to provide a brief overview of elementary methods that the reader may wish to review. Since the formalism (primarily Lagrangian) is assumed to be familiar, this review consists primarily of examples, many worked out partially or completely. Part II contains geometry that will be used throughout the text. This constitutes a rather heavy initial dose of mathematics for a physics text. This material could instead have been interspersed with the physics throughout the text, for use on a "just in time" basis. Such an organization has sometimes been called "spiral," since all mathematical material is well motivated by the requirements of the immediate physics and the climb toward the final goal is continuous. This approach provides a desirable integration of the mathematics and physics. The spiral approach has not been followed here, however, since the resultant fragmentation would impede subsequent study and review. For this reason it is recommended (almost required) that the reader (or course syllabus) restrict coverage of the geometric sections, perhaps covering initial sections in detail and deferring later sections until needed. On the other hand, Chapters 2 through 9 by themselves have served as the text for a one-term course with the title "Geometric Concepts in Physics" for advanced undergraduates at Cornell. The flowchart shown in Figure 1 is intended to aid this planning by showing where the various geometric methods are primarily employed. Parts III, IV, and V are devoted to the major distinct formalisms of mechanics.
Even with the emphasis being geometric, the traditional division of mechanics as Newtonian, Lagrangian, or Hamiltonian continues to be appropriate. In fact, it is primarily the nature of the geometries that motivates this division. The Poincaré procedure emphasized previously fits naturally into the Lagrangian approach. Numerous examples intended to illustrate the essence of each of the approaches are included.
FIGURE 1. Major dependencies in the text starting from the assumed prerequisite of elementary Lagrangian mechanics. Proceeding sequentially from chapter to chapter is satisfactory but not particularly recommended. Note that Lagrangian, Newtonian, and Hamiltonian topics are largely restricted to threads on the left, center, and right, respectively, so that mutually independent routes can be taken through each of these major approaches. Broken lines indicate minor dependencies that can perhaps be overlooked and made up heuristically. The most traditional, most elementary, route to practical calculational methods is the Hamiltonian route on the right.
With close-to-integrable systems being so important, it is appropriate to develop approximate methods, and that is the subject of Part VI. Rather than using "the same method" to solve numerous problems, the approach taken is more nearly to use different methods to solve similar problems. Partly this is a reaction to what seems to me to be an overemphasis on the method of canonical transformations in existing textbooks, and partly it is intended to emphasize other geometric aspects.
GENERAL MECHANICS TEXTS V. I. Arnold, Mathematical Methods of Classical Mechanics, Springer-Verlag, New York, 1978. N. G. Chetaev, Theoretical Mechanics, Springer-Verlag, Berlin, 1989. H. Goldstein, Classical Mechanics, Addison-Wesley, Reading, MA, 1980.
L. D. Landau and E. M. Lifshitz, Mechanics, Pergamon, Oxford, 1976. L. A. Pars, Analytical Dynamics, Ox Bow Press, Woodbridge, CT, 1979. F. Scheck, Mechanics, Springer-Verlag, Berlin, 1990. K. R. Symon, Mechanics, 3rd ed., Addison-Wesley, Reading, MA, 1971. D. ter Haar, Elements of Hamiltonian Mechanics, 2nd ed., Pergamon, Oxford, 1971. E. T. Whittaker, A Treatise on the Analytical Dynamics of Particles and Rigid Bodies, Cambridge University Press, Cambridge, UK, 1989.
SPECIALIZED MATHEMATICAL BOOKS ON MECHANICS R. Abraham and J. E. Marsden, Foundations of Mechanics, Addison-Wesley, Reading, MA, 1985. V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt, Dynamical Systems III, Springer-Verlag, Berlin, 1980. J. E. Marsden, Lectures on Mechanics, Cambridge University Press, Cambridge, UK, 1992. K. R. Meyer and R. Hall, Introduction to Hamiltonian Dynamical Systems and the N-Body Problem, Springer-Verlag, New York, 1992.
RELEVANT MATHEMATICS E. Cartan, Leçons sur la géométrie des espaces de Riemann, Gauthier-Villars, Paris, 1951. (English translation available.) E. Cartan, The Theory of Spinors, Dover, New York, 1981. B. A. Dubrovin, A. T. Fomenko, and S. P. Novikov, Modern Geometry: Methods and Applications, Part I, Springer-Verlag, New York, 1984. H. Flanders, Differential Forms with Applications to the Physical Sciences, Dover, New York, 1989. D. H. Sattinger and O. L. Weaver, Lie Groups and Algebras, Applications to Physics, Geometry, and Mechanics, Springer-Verlag, New York, 1993. B. F. Schutz, Geometrical Methods of Mathematical Physics, Cambridge University Press, Cambridge, UK, 1980. V. A. Yakubovitch and V. M. Starzhinskii, Linear Differential Equations with Periodic Coefficients, Wiley, New York, 1975.
SOLVABLE SYSTEMS
1 REVIEW OF SOLVABLE SYSTEMS
1.1. INTRODUCTION AND RATIONALE

An introductory textbook on Lagrangian mechanics (which this is not) might be expected to begin by announcing that the reader is assumed to be familiar with Newtonian mechanics: kinematics, force, momentum and energy and their conservation, simple harmonic motion, moments of inertia, and so on. In all likelihood such a text would then proceed to review these very same topics before advancing to its main topic of Lagrangian mechanics. This would not, of course, contradict the original assumption since, apart from the simple pedagogical value of review, it makes no sense to study Lagrangian mechanics without anchoring it firmly in a Newtonian foundation. The student who had not learned this material previously would be well advised to start by studying a less advanced, purely Newtonian mechanics textbook. Because so many of the most important problems of physics can be solved cleanly without the power of Lagrangian mechanics, it would be uneconomical to begin with an abstract formulation of mechanics before developing intuition better acquired from a concrete treatment. One might say that Newtonian methods give better "value" than Lagrangian mechanics because, though ultimately less powerful, Newtonian methods can solve the most important problems and are easier to learn. Of course this would only be true in the sort of foolish system of accounting that might attempt to rate the relative contributions of Newton and Einstein. One (but not the only) purpose of this textbook is to go beyond Lagrange's equations. By the same foolish system of accounting just mentioned, these methods would be rated less valuable than Lagrangian methods because, though more powerful, they are more abstract and harder to learn.

It is assumed the reader has had some (not necessarily much) experience with Lagrangian mechanics.¹ Naturally this presupposes familiarity with the above-mentioned elementary concepts of Newtonian mechanics. Nevertheless, for the reasons described in the previous paragraph, we start by reviewing material that is, in principle, already known. It is assumed the reader can define a Lagrangian, can write it down for a simple mechanical system, can write down (or copy knowledgeably) the Euler-Lagrange equations and from them derive the equations of motion of the system, and finally (and most important of all) trust these equations to the same extent that she or he trusts Newton's law itself. A certain (even if grudging) acknowledgment of the method's power to make complicated systems appear simple is also helpful. Any reader unfamiliar with these ideas would be well advised to begin by repairing the defect with the aid of one of the numerous excellent textbooks explaining Lagrangian mechanics.

Because a systematic review of Newtonian and Lagrangian mechanics would be too lengthy, this chapter organizes the review in the form of worked examples that illustrate the important concepts. Though these examples are intended for solution "by hand," many of them can also be worked using a mathematical manipulation computer language such as Maple, Mathematica, or Matlab, and a few such solutions are given in the text. They are given in Maple since, of the languages mentioned, that is the only one the author "speaks," but there is no reason to suppose this choice is optimal. No attempt is made to explain the Maple programs on a line-by-line basis. Such listings are only given in cases where whole classes of problems can be solved by making minor alterations in the given code.
By glancing at these solutions you should be able to assess roughly the effort that would be involved in making such alterations and also to guess the likelihood of success for solving a particular problem of the class. If this problem is extremely close to the solved problem it might be appropriate to port it into your language of choice. Students wishing to use these solutions to confirm their own pencil-and-paper solutions should simply guess the correspondences between variable names and symbols and skip to the results. (In these cases the name puzzle might be more appropriate than problem.) It can be argued (persuasively I believe) that training in the use of mathematical manipulation programs should be part of any scientific curriculum, and (perhaps less persuasively) that classical mechanics is the best training ground. However, appropriate material for a course explaining those methods would go well beyond the scope of this text, which is already uncomfortably broad. This is unfortunate because
¹Though "prerequisites" have been mentioned, this text still attempts to be "not too advanced." Though the subject matter deviates greatly from the traditional curriculum at this level (as represented, say, by Goldstein, Classical Mechanics), it is my intention that the level of difficulty and the anticipated level of preparation be much the same as is appropriate for Goldstein.
the results that come effortlessly from the computer (after appreciable initial effort²) lend a concreteness to analytic formulas that can otherwise be achieved only by laborious hand calculation. By permitting one to follow effortlessly in the footsteps of Lagrange, Euler, Gauss, Christoffel, and so on, the computer puts flesh on their methods, makes it thinkable to experiment with variants, and generally enhances one's admiration for these "parents of the field" who had only pencil and paper to work with. Generally speaking, one should attempt to solve problems in purely analytic form to the extent possible; it is invariably easier to solve problems in purely numerical terms, but such solutions are typically of limited value for gaining insight into the nature of the system under study. In this text much greater emphasis is placed on obtaining correct equations than on solving them. In the present computer age, this might be called "modern" because computers are very good at solving equations. Going further by using the computer to derive the equations might then be called "hypermodern." Part of our task is to develop methods by which computers can do just that. In spite of these considerations, there is simply too much analytical material, and the level of abstraction needed to appreciate geometric mechanics is too deep, to leave much time for problem solving by computer. Ultimately, though, students with a serious interest in pursuing the methods of this text must plan eventually to become adept in some computer problem-solving environment. Since the process of becoming adept is "fun," the best approach would probably be for each individual to take the initiative in achieving this goal. If time spent in this pursuit is charged to "hobby" time, it will leave more time for the deeper thought required to assimilate the "theory" of the subject.
The following examples and problems are initially stated in such a way as to be appropriate for solution using the traditional tools of a physics course, pencil and paper. Furthermore, their level of difficulty is designed to match the expected level of preparation of the reader. The sufficiently motivated and adequately prepared reader with enough time should be able to work out the answers with pencil and paper, even ignoring the solutions. The given solutions would have to be regarded as extremely generous hints for some of these problems to qualify as homework assignments. But the examples should be appropriate for "self-assignment" by which the reader confirms her or his level of preparation as adequate for the text. To the extent possible, examples in later chapters are based on these examples. The use of examples is especially appropriate for describing the evolution of systems that are close to solvable systems. ²In my experience it takes at least three times as long to solve a problem of the sort studied here using Maple as it would to solve the same problem by hand. But this comparison is about as meaningful as comparing the cost of setting up a production line with the cost of building one automobile. Once properly set up, a program can solve not only the problem at hand but also related problems. Furthermore, it can serve as a springboard to the solution of more challenging problems, possibly even qualifying as "research." Unfortunately, a student in the context of a university course typically has insufficient time for solving related problems, and hence may not be in a position to amortize the expense of writing the program. This practical consideration contributes to my belief that the time-challenged student should not be steered too strenuously away from analytical work and into programming.
1.2. WORKED EXAMPLES

1.2.1. Particle in One-Dimensional Potential

The first problem is prototypical of all problems in simple harmonic motion or, in other words, of the linearized description of motion near equilibrium. One locates a point of equilibrium where the force vanishes or, equivalently, the potential function has a minimum, and approximates the potential locally by the best-fit upright parabola. For convenience the origin can be moved to the minimum point. Application of energy conservation yields the "turning points" immediately, in addition to providing a formula relating position and velocity. Probably something close to the computer program listing that serves as a solution can be used to linearize any potential that can be written explicitly as a differentiable function. In higher than one dimension the linearization process is essentially similar but, of course, there are more equations and they are harder to solve.
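The linearization procedure just described can also be carried out purely numerically. The following sketch is the editor's own construction (not one of the text's Maple listings); it assumes U is any smooth one-dimensional potential with a minimum near the supplied guess, locates the equilibrium by Newton iteration on U'(x) = 0, and reads off the best-fit parabola's spring constant.

```python
import math

def linearize(U, x_guess, m=1.0, h=1e-5):
    """Locate a minimum of U by Newton iteration on U'(x) = 0, using
    central-difference derivatives, then return (x0, k, omega) where
    k = U''(x0) is the best-fit parabola's spring constant and
    omega = sqrt(k/m) is the small-oscillation angular frequency."""
    d1 = lambda x: (U(x + h) - U(x - h)) / (2 * h)
    d2 = lambda x: (U(x + h) - 2 * U(x) + U(x - h)) / h ** 2
    x = x_guess
    for _ in range(100):
        step = d1(x) / d2(x)
        x -= step
        if abs(step) < 1e-10:
            break
    k = d2(x)
    return x, k, math.sqrt(k / m)

# Morse potential with A = alpha = 1 (Problem 1.2.1 below):
# equilibrium x0 = 0, spring constant k = 2*A*alpha^2 = 2.
x0, k, omega = linearize(lambda x: math.exp(-2 * x) - 2 * math.exp(-x), 0.3)
```

The numerical answers agree with the symbolic results x_0 = 0 and k = 2Aα² obtained by Maple in the program below.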
Problem 1.2.1: A particle of mass m and total energy E is subject to the one-dimensional Morse potential

U(x) = A (e^(-2αx) - 2 e^(-αx)).
(a) Assuming the particle is bound, find its turning points and its point of stable equilibrium.
(b) Find the frequency of small oscillation.
(c) Find the frequency of large oscillation.
(d) Solve for the general motion by "reduction to quadrature," i.e., as an integral remaining to be evaluated.
(e) On a "phase space" plot with x as abscissa and p = mẋ as ordinate, sketch typical trajectories.
Maple Program 1.2.1

# Particle in 1D Potential
> with(plots): with(linalg): with(student):
Warning: new definition for norm
Warning: new definition for trace
> U := A*( exp(-2*alpha*x) - 2*exp(-alpha*x) );
        U := A (exp(-2 alpha x) - 2 exp(-alpha x))
> F := -diff(U,x);
        F := -A (-2 alpha exp(-2 alpha x) + 2 alpha exp(-alpha x))
> x_0 := solve(F,x);
        x_0 := 0
> # Substitute for x to avoid setting x=x_0 while obtaining k
> F := subs( x=xx, F );
> k := -diff(F,xx);
> F := subs( xx=x, F ):
> xx := x_0: k := eval(k);
        k := 2 A alpha^2
> small_amp_period := 2*Pi*sqrt(m/k);
> # Must not use "E" as total energy; Maple interprets it as 2.718...
> tpts := solve( U=EE, x );
> delta := 0.0000001:  # fudge factor to help integration at end of range
> tpt1 := tpts[1]+delta:  tpt2 := tpts[2]-delta:
> Int( 1/sqrt(EE-U), x = tpt1..tpt2 );
> period := sqrt(2*m)*changevar( exp(-alpha*x)=y, ", y );
> A := 1: alpha := 1: period;
> T0 := evalf(small_amp_period);
        T0 := 4.442882938 sqrt(m)
> EE := -0.9: period: T1 := evalf("): p1 := sqrt(2*m*(EE-U)):
        T1 := 4.680867535
> EE := -0.7: period: T2 := evalf("): p2 := sqrt(2*m*(EE-U)):
> EE := -0.5: period: T3 := evalf("): p3 := sqrt(2*m*(EE-U)):
> EE := -0.3: period: T4 := evalf("): p4 := sqrt(2*m*(EE-U)):
> EE := -0.1: period: T5 := evalf("): p5 := sqrt(2*m*(EE-U)):
> plotsetup(ps,plotoptions=`noborder`);
> interface(plotoutput=`Shape.ps`);
> plot(U, x=-1..5, title=`Morse Potential, U(x), A=alpha=1`);
> interface(plotoutput=`Force.ps`);
> plot(F, x=-1..5, title=`Force, F(x), A=alpha=1`);
> m := 1:
> interface(plotoutput=`Period.ps`);
> plotpoints := [ [-1,T0], [-.9,T1], [-.7,T2], [-.5,T3], [-.3,T4], [-.1,T5] ];
        plotpoints := [[-1, 4.442882938], [-.9, 4.680867535], [-.7, 5.308304409],
                       [-.5, 6.281216813], [-.3, 8.109331364], [-.1, 14.046287321]]
> plot( plotpoints, style=line, title=`Period vs. Energy`);
> interface(plotoutput=`PhaseSpace.ps`);
> plot( {p1,p2,p3,p4,p5}, x=-1..3, title=`Momentum vs position` );
> interface(plotoutput=`Flow.ps`);
> fieldplot( [p/m, F], x=-0.5..3, p=0..1.3 );
Maple Results 1.2.1
FIGURE 1.2.1. Graph of the Morse potential, U(x), A = α = 1.

FIGURE 1.2.2. The force resulting from the Morse potential.

FIGURE 1.2.3. The period of oscillation about the equilibrium point of the Morse potential, as a function of amplitude.

FIGURE 1.2.4. Phase space plot, momentum p versus position x for the Morse potential, for E = -.9, -.7, -.5, -.3, -.1. The smooth curves are contours of equal total energy. "Turning points" are the (extrapolated) intersections of these curves with the x-axis (about which the plot is symmetric).

FIGURE 1.2.5. Arrows represent the flow (p/m, F) plotted with (x, p) axes; the Morse potential.
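The text invites porting such a solution into one's language of choice. As an illustration, here is a Python sketch (the editor's own, with hypothetical function names) that reproduces the period calculation of Maple Program 1.2.1; Gauss-Chebyshev quadrature replaces Maple's small-delta "fudge factor" as a way of taming the turning-point singularity.

```python
import math

def morse_period(E, A=1.0, alpha=1.0, m=1.0, n=200):
    """Oscillation period in U(x) = A*(exp(-2*alpha*x) - 2*exp(-alpha*x))
    for a bound energy -A < E < 0.

    Turning points solve E = A*(y**2 - 2*y) with y = exp(-alpha*x), giving
    y = 1 +/- sqrt(1 + E/A).  The period is T = sqrt(2*m)*Int(dx/sqrt(E-U));
    Gauss-Chebyshev nodes absorb the 1/sqrt((x-a)*(b-x)) endpoint behavior,
    so no endpoint fudge factor is needed.
    """
    y_plus = 1 + math.sqrt(1 + E / A)
    y_minus = 1 - math.sqrt(1 + E / A)
    a, b = -math.log(y_plus) / alpha, -math.log(y_minus) / alpha  # a < b
    U = lambda x: A * (math.exp(-2 * alpha * x) - 2 * math.exp(-alpha * x))
    total = 0.0
    for k in range(1, n + 1):
        x = 0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k - 1) * math.pi / (2 * n))
        # weightless part of the integrand; finite at both turning points
        total += math.sqrt((x - a) * (b - x) / (E - U(x)))
    return math.sqrt(2 * m) * math.pi / n * total
```

For A = α = m = 1 and E = -0.9 this returns approximately 4.683, close to (and slightly above) the Maple value 4.680867535, whose fudge factor δ clips the ends of the integration range. One can check analytically (carrying out the same y-substitution Maple uses) that the exact Morse period is (π/α)·sqrt(2m/|E|).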
Problem 1.2.2: Two potential energy functions expressible in terms of elementary functions and leading to bounded motion are as follows:
(a) U(x) = -U_0 / cosh²(ax).
(b) U(x) = U_0 tan²(ax).
Sketch these functions, find the ranges of energy E for which bounded motion occurs, give a formula determining the turning points, and find the frequency of small oscillations.
Problem 1.2.3: Motion always "back and forth" between two limits, say a and b, in one dimension, due to a force derivable from a potential energy function V(x), is known as "libration." Conservation of energy then implies

ẋ² = (x - a)(b - x) Φ(x),   or   ẋ = ±√((x - a)(b - x) Φ(x)),

where Φ(x) > 0 throughout the range a ≤ x ≤ b but is otherwise an arbitrary function of x. It is necessary to toggle between the two ± choices depending on whether the particle is moving to the right or to the left. Consider the change of variable x → θ defined by

x = α - β cos θ,   where α - β = a,  α + β = b.

Show that (x - a)(b - x) = β² sin² θ and that energy conservation is expressed by

θ̇ = √(Φ(α - β cos θ)),

where there is no longer a sign ambiguity because θ̇ is always positive. The variable θ is known as an "angle variable." One-dimensional libration motion can always be expressed in terms of an angle variable in this way, and then can be "reduced to quadrature" as

t = ∫ dθ / √(Φ(α - β cos θ)).

This type of motion is especially important in the conditionally periodic motion of multidimensional oscillatory systems. This topic is studied in Section 14.6.
1.2.2. Particle in Space

Problem 1.2.4: The acceleration of a point particle with velocity v(t) is given by

v̇ = A × v.

Show that |v| is constant. For the case that A is independent of time and position, give the motion of the particle in terms of its initial position r_0 and velocity v_0.

1.2.3. Weakly Coupled Oscillators
Already in two dimensions there is a rich variety of possible system motions. No attempt will be made here to survey this variety, but some of it can be inferred by studying the pictures accompanying the following problem. In particular the final part exhibits the ubiquitous phenomenon of “avoided line-crossing” where “line” refers to the graph of a normal mode frequency plotted as a function of a parameter of the problem. A theory of nonlinear perturbations to systems like this is described in the final chapter.
Problem 1.2.5: The Lagrangian

L = (m/2)(ẋ² + ẏ²) - (1/2) ω₁² x² - (1/2) ω₂² y² + α x y,

with |α| ≪ ω₁² and |α| ≪ ω₂², describes two oscillators that are weakly coupled.
(a) Find normal coordinates and normal mode frequencies Ω₁ and Ω₂.
(b) For the case ω₁ = ω₂, describe the free motion of the oscillator.
(c) Holding α and ω₂ fixed, make a plot of Ω versus ω₁ showing a branch for each of Ω₁ and Ω₂. Do it numerically if you wish. Describe the essential qualitative features exhibited.
Maple Program 1.2.5

> # Two weakly-coupled oscillators
> with(plots): with(linalg):
Warning: new definition for norm
Warning: new definition for trace
> plotsetup(ps, plotoptions=`noborder`):
plotsetup: warning: plotoutput filename set to postscript.out
> Digits := 5:
> alias( x=x(t) ): alias( y=y(t) ):
> V := 1/2*(omega_1^2*x^2 + omega_2^2*y^2) - alpha*x*y;

        V := 1/2 omega_1^2 x^2 + 1/2 omega_2^2 y^2 - alpha x y

> T := m/2*(diff(x,t)^2 + diff(y,t)^2);
> L := T - V;
> # Substitute for x and y since they have been aliased
> L := subs( { x=xx, y=yy, diff(x,t)=xp, diff(y,t)=yp }, L );

        L := 1/2 m (xp^2 + yp^2) - 1/2 omega_1^2 xx^2
                - 1/2 omega_2^2 yy^2 + alpha xx yy

> dL_dxp := diff(L,xp);         # m xp
> dL_dyp := diff(L,yp);         # m yp
> dL_dx  := diff(L,xx);         # -omega_1^2 xx + alpha yy
> dL_dy  := diff(L,yy);         # -omega_2^2 yy + alpha xx
> dL_dxp := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dxp );
> dL_dyp := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dyp );
> dL_dx  := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dx );
> dL_dy  := subs( {xx=x, yy=y, xp=diff(x,t), yp=diff(y,t)}, dL_dy );
> alias( x=x ): alias( y=y ):
> # The only remaining alias is I = sqrt(-1)
> eqnx := diff(dL_dxp,t) - dL_dx = 0;

        eqnx := m diff(x(t),t,t) + omega_1^2 x(t) - alpha y(t) = 0

> eqny := diff(dL_dyp,t) - dL_dy = 0;
> m := 1:  omega_1 := 1:  omega_2 := 1:  alpha := 0.1:
> eqnx1 := subs( {x(t)=xt(t), y(t)=yt(t)}, eqnx );
> eqny1 := subs( {x(t)=xt(t), y(t)=yt(t)}, eqny );
> ODE := eqnx1, eqny1;
> initvals := xt(0)=1, yt(0)=0, D(xt)(0)=0, D(yt)(0)=0:
> funcs := {xt(t), yt(t)};

        funcs := { xt(t), yt(t) }

> sols := dsolve( { ODE, initvals }, funcs, 'laplace' );

        sols := { xt(t) = .50000 cos(.94869 t) + .50000 cos(1.0488 t),
                  yt(t) = .50000 cos(.94869 t) - .50000 cos(1.0488 t) }

> assign(sols);
> interface(plotoutput=`xyCpldOsc.ps`):
> plot1 := plot( xt(t), t=0..100 ):
> plot2 := plot( yt(t), t=0..100, thickness=1 ):
> display( {plot1, plot2}, title=`x(t) and y(t)` );
> omega_1 := 'omega';

        omega_1 := omega

> ODE := eqnx, eqny;

        ODE := diff(x(t),t,t) + omega^2 x(t) - .1 y(t) = 0,
               diff(y(t),t,t) + y(t) - .1 x(t) = 0

> initvals := x(0)=1, y(0)=0, D(x)(0)=0, D(y)(0)=0:
> funcs := {x(t), y(t)};

        funcs := { x(t), y(t) }

> sols := dsolve( { ODE, initvals }, funcs, 'laplace' );

        (Maple returns the solution in terms of
         %1 := RootOf( 100 _Z^4 + (100 + 100 omega^2) _Z^2 - 1 + 100 omega^2 ).)

> assign(sols);
> interface(plotoutput=`CpldOsc-xVsOmega.ps`);
> plot3d( x(t), omega=0.8..1.2, t=0..40, numpoints=4000, style=hidden,
>         orientation=[-10,45], axes=BOXED, title=`x(t)` );
> interface(plotoutput=`CpldOsc-yVsOmega.ps`);
> plot3d( y(t), omega=0.8..1.2, t=0..40, numpoints=4000, style=hidden,
>         orientation=[-10,45], axes=BOXED, title=`y(t)` );
> # Unassign alpha and omega_2, i.e. let them be variables again---they
> # were assigned numerical values above.
> alpha := 'alpha';  omega_2 := 'omega_2';

        alpha := alpha
        omega_2 := omega_2

> M := matrix( 2, 2, [omega^2, alpha, alpha, omega_2^2] );

        M := [ omega^2    alpha     ]
             [ alpha      omega_2^2 ]

> eigs := eigenvals(M);

        eigs := 1/2 omega^2 + 1/2 omega_2^2
                  + 1/2 sqrt(omega^4 - 2 omega^2 omega_2^2 + omega_2^4 + 4 alpha^2),
                1/2 omega^2 + 1/2 omega_2^2
                  - 1/2 sqrt(omega^4 - 2 omega^2 omega_2^2 + omega_2^4 + 4 alpha^2)

> whattype(eigs);

        exprseq

> omplus := sqrt( eigs[1] );
> omminus := sqrt( eigs[2] );
> alpha := 0.1:  omega_2 := 1:
> interface(plotoutput=`ModeFreqs.ps`);
> plot( {omplus, omminus}, omega=0..1.5, title=`Normal Mode Frequencies`,
>       numpoints=400, axes=BOXED );
Maple Results 1.2.5

FIGURE 1.2.6. Sloshing of energy between two weakly coupled oscillators of equal natural frequency, ω₁ = ω₂ = 1, α = 0.1.

FIGURE 1.2.7. The displacement x of the originally displaced mass is shown; holding the coupling strength and the other natural frequency constant, α = 0.1, ω₂ = 1, its natural frequency ω₁ is allowed to vary over the range ω₂ − 0.2 < ω₁ < ω₂ + 0.2.

FIGURE 1.2.8. Holding the coupling strength and one natural frequency constant at ω₂ = 1, α = 0.1, the other natural frequency ω₁ is allowed to vary over the range 0 < ω₁ < 1.5. The eigenfrequencies Ω₁ and Ω₂ lie on separate branches. For low ω₁ the pair (Ω₁, Ω₂) ≈ (ω₂, ω₁), but for high ω₁, (Ω₁, Ω₂) ≈ (ω₁, ω₂). That is, the identification reverses as one passes from ω₁ < ω₂ to ω₂ < ω₁. As a result the two branches never intersect.
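The branch structure of Fig. 1.2.8 is easy to reproduce numerically without Maple; a minimal sketch in Python, using the same 2 × 2 matrix M whose eigenvalues are the squared mode frequencies:

```python
import numpy as np

def mode_freqs(omega_1, omega_2=1.0, alpha=0.1):
    # Omega^2 are the eigenvalues of the 2x2 matrix M built at the end of
    # the Maple session; eigvalsh returns them in ascending order.
    M = np.array([[omega_1**2, alpha], [alpha, omega_2**2]])
    return np.sqrt(np.linalg.eigvalsh(M))

# At omega_1 = omega_2 = 1, alpha = 0.1 the two normal mode frequencies
# straddle 1, reproducing the .94869 and 1.0488 of the dsolve output:
lo, hi = mode_freqs(1.0)
print(lo, hi)

# Scanning omega_1 through omega_2 exhibits the avoided crossing of
# Fig. 1.2.8: the gap between the branches never closes.
gaps = [np.diff(mode_freqs(w))[0] for w in np.linspace(0.8, 1.2, 81)]
assert min(gaps) > 0.09
```

The gap is smallest near ω₁ = ω₂, where the two squared frequencies differ by 2α, which is why weak coupling produces a narrow but never-vanishing separation.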
Problem 1.2.6: The approximate Lagrangian for an n-dimensional system with coordinates (q₁, q₂, …, q_n), valid in the vicinity of an equilibrium point (that can be taken to be (0, 0, …, 0)), has the form

    L = T − V = (1/2) Σ_{r,s=1}^n m_(rs) q̇_r q̇_s − (1/2) Σ_{r,s=1}^n k_(rs) q_r q_s,    (1.2.1)

with T positive definite. It is common to use the summation convention for summations like this, but in this text the summation convention is reserved for tensor summations. When subscripts are placed in parentheses (as here) it indicates that they refer to different variables or parameters rather than different components of the same vector or tensor. However, for the rest of the problem the parentheses will be left off, while the summation signs will be left explicit. It is shown in Section 2.5.3 that a linear transformation q_i → y_r can be found such that T takes the form

    T = (1/2) Σ_{r=1}^n m_r ẏ_r²,

where, in this case, each "mass" m_r is necessarily positive because T is positive definite. By judicious choice of the scale of the y_r each "mass" can be adjusted to 1:

    T = (1/2) Σ_{r=1}^n ẏ_r².    (1.2.2)

For these coordinates y_r the equation

    Σ_{r=1}^n y_r² = 1    (1.2.3)

defines a surface (to be called a hypersphere). From now on we will consider only points y = (y₁, …, y_n) lying on this sphere. Also, two points u and v will be said to be "orthogonal" if the "quadratic form" T(u, v) defined by

    T(u, v) = Σ_{r=1}^n u_r v_r

vanishes. Being linear in both arguments, T(u, v) is described as being "bilinear." We also define a bilinear form V(u, v) by

    V(u, v) = Σ_{r,s=1}^n k_rs u_r v_s,

where the coefficients k_rs have been redefined from the values given above to correspond to the new coordinates y_r, so that V(y) = ½ V(y, y). The following series of problems (adapted from Courant and Hilbert [1]) will lead to the conclusion that a further linear transformation y_i → z_j can be found that, on the one hand, enables the equation for the sphere in Eq. (1.2.3) to retain the same form,

    Σ_{r=1}^n z_r² = 1,

and, on the other, enables V to be expressible as a sum of squares with positive coefficients:

    V = (1/2) Σ_{r=1}^n κ_r z_r².

Pictorially the strategy is, having deformed the scales so that surfaces of constant T are spherical and surfaces of constant V ellipsoidal, to orient the axes to make these ellipsoids erect. In the jargon of mechanics this process is known as "normal mode" analysis. The "minimax" properties of the "eigenvalues" to be found have important physical implications, but we will not go into them here. (1) Argue that, for small oscillations to be stable, V must also be positive definite. (2) Let z₁ be the point on "sphere" (1.2.3) for which V(z₁) ≝ ½κ₁ is maximum. (If there is more than one such point, pick any one arbitrarily.) Then argue that κ₁ > 0.
(3) Among all points that are both on sphere (1.2.3) and orthogonal to z₁, let z₂ be the one for which V(z₂) ≝ ½κ₂ is maximum. Continuing in this way, show that a series of points z₁, z₂, …, z_n, each maximizing V consistent with being orthogonal to its predecessors, is determined, and that the sequence of values V(z_r) = ½κ_r, r = 1, 2, …, n, is monotonically nonincreasing:

    κ₁ ≥ κ₂ ≥ ⋯ ≥ κ_n.    (1.2.4)

(4) Consider a point z₁ + εζ which is assumed to lie on surface (1.2.3) but with ζ otherwise arbitrary. Next assume this point is "close to" z₁ in the sense that ε is arbitrarily small (and not necessarily positive). Since z₁ maximizes V it follows that

    V(z₁ + εζ, z₁ + εζ) ≤ V(z₁, z₁).

Show therefore that

    V(z₁, ζ) = 0.
This implies that

    V(z₁, z_r) = 0    for r > 1,

because, other than being almost orthogonal to z₁, ζ is arbitrary. Finally, extend the argument to show that

    T(z_r, z_s) = δ_rs,    V(z_r, z_s) = κ_r δ_rs,

where the coefficients κ_r have been shown to satisfy the monotonic conditions of Eq. (1.2.4) and δ_rs is the usual Kronecker-δ symbol. Taking these z_r as basis vectors, an arbitrary vector z can be expressed as

    z = Σ_{r=1}^n z_r z_r.

In these new coordinates show that Eqs. (1.2.1) become

    L(z, ż) = T − V,    T = (1/2) Σ_{r=1}^n ż_r²,    V = (1/2) Σ_{r=1}^n κ_r z_r².    (1.2.5)

Write and solve the Lagrange equations for coordinates z_r.
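The two-step reduction described above (first bring T to a sum of squares, then rotate, which preserves the sphere (1.2.3), to make V diagonal) is exactly what a numerical simultaneous diagonalization does. A Python sketch with illustrative random matrices (not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)); Tmat = A @ A.T + n * np.eye(n)   # positive-definite "T"
B = rng.normal(size=(n, n)); Vmat = B @ B.T + n * np.eye(n)   # positive-definite "V"

# Step 1: Cholesky Tmat = L L^T; the coordinates y = L^T q bring T to
# (1/2) sum ydot_r^2, i.e. the identity matrix.
L = np.linalg.cholesky(Tmat)
Linv = np.linalg.inv(L)

# Step 2: in the y coordinates V becomes the symmetric matrix below; its
# eigen-rotation R is orthogonal, so it preserves the unit sphere while
# making V a sum of squares with coefficients kappa_r.
Vy = Linv @ Vmat @ Linv.T
kappa, R = np.linalg.eigh(Vy)

# Combined transformation z = R^T L^T q diagonalizes both quadratic forms:
S = Linv.T @ R                                   # q = S z
assert np.allclose(S.T @ Tmat @ S, np.eye(n))    # T -> identity
assert np.allclose(S.T @ Vmat @ S, np.diag(kappa))
assert np.all(kappa > 0)                         # V positive definite => kappa_r > 0
```

The κ_r produced here are the positive coefficients of the sum of squares in Eq. (1.2.5), and positivity of V guarantees that all of them are positive, as part (1) requires.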
Problem 1.2.7: Continuing with the previous problem, using a more formal approach, the Lagrange equations resulting from Eq. (1.2.1) are

    Σ_{s=1}^n m_rs q̈_s + Σ_{s=1}^n k_rs q_s = 0,    r = 1, 2, …, n.    (1.2.6)

These equations can be expressed compactly in matrix form as

    M q̈ + K q = 0,    (1.2.7)

or, assuming the existence of M⁻¹, as

    q̈ + M⁻¹K q = 0.    (1.2.8)

Seeking a solution of the form

    q_r = A_r e^{iωt},    r = 1, 2, …, n,

the result of substitution into Eq. (1.2.6) is

    (M⁻¹K − ω²𝟙) A = 0.    (1.2.9)
These equations have nontrivial solutions for values of ω that cause the determinant of the coefficients to vanish:

    |M⁻¹K − ω²𝟙| = 0.    (1.2.10)

Correlate these "eigenvalues" with the constants κ_r defined in the previous problem.
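In practice Eq. (1.2.10) is solved as an eigenvalue problem for M⁻¹K. A Python sketch with illustrative matrices (not from the text):

```python
import numpy as np

# Illustrative positive-definite M and K (not from the text):
M = np.diag([1.0, 2.0, 1.0])
K = np.array([[ 2., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  2.]])

MK = np.linalg.inv(M) @ K
evals, A = np.linalg.eig(MK)            # the omega^2 of Eq. (1.2.9)
omegas = np.sqrt(np.sort(evals.real))
print(omegas)

# Each eigenpair satisfies (M^{-1}K - omega^2 1)A = 0, Eq. (1.2.9) ...
for j in range(3):
    residual = (MK - evals[j] * np.eye(3)) @ A[:, j]
    assert np.allclose(residual, 0.0)

# ... and each omega^2 makes the determinant of Eq. (1.2.10) vanish.
assert all(abs(np.linalg.det(MK - w2 * np.eye(3))) < 1e-9 for w2 in evals.real)
```

For positive-definite M and K the eigenvalues ω² are all real and positive, and they coincide with the κ_r of the previous problem.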
Problem 1.2.8: Particles of mass 3m, 4m, and 3m are spaced at uniform intervals h along a light string of total length 4h stretched with tension τ and rigidly fixed at both ends. To legitimize ignoring gravity the system is assumed to lie on a smooth horizontal table, so the masses can oscillate only horizontally. Let the horizontal displacements be x₁, x₂, and x₃. Find the normal mode frequencies and the corresponding normal mode oscillation "shapes." Discuss the "symmetry" of the shapes, their "wavelengths," and the (monotonic) relation between frequency and number of nodes. See Fig. 1.2.9.

Already with just 3 degrees of freedom the eigenmode calculations are sufficiently tedious to make some effort at simplifying the work worthwhile. In this problem, with the system symmetric about its midpoint, it is clear that the modes will be either symmetric or antisymmetric and, since the antisymmetric mode vanishes at the center point, it is characterized by a single amplitude, say y = x₁ = −x₃. Introducing an "effective mass" and an "effective strength coefficient," the kinetic energy of the mode, necessarily proportional to ẏ², can be written as T₂ = ½m_eff ẏ², and the potential energy can be written as V₂ = ½k_eff y². The frequency of this mode is then given by ω₂ = √(k_eff/m_eff), which, by dimensional analysis, has to be proportional to η = √(τ/(mh)). (The quantities T₂, V₂, and ω₂ have been given subscript 2 because this mode has the second highest frequency.) Factoring this expression out of Eq. (1.2.10), the dimensionless eigenvalues are the eigenfrequencies in units of η. Complete the analysis to show that the normal mode frequencies are (ω₁, ω₂, ω₃) = (1, √(2/3), √(1/6)) η and find the corresponding normal mode "shapes."
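The three-bead eigenproblem can be checked numerically; a Python sketch (units chosen so that η = √(τ/(mh)) = 1):

```python
import numpy as np

m = tau = h = 1.0                        # units with eta = sqrt(tau/(m h)) = 1
M = m * np.diag([3.0, 4.0, 3.0])         # masses 3m, 4m, 3m
K = (tau / h) * np.array([[ 2., -1.,  0.],
                          [-1.,  2., -1.],
                          [ 0., -1.,  2.]])

evals, vecs = np.linalg.eig(np.linalg.inv(M) @ K)
order = np.argsort(evals.real)[::-1]     # highest frequency first
omegas = np.sqrt(evals.real[order])
print(omegas)                            # (1, sqrt(2/3), 1/sqrt(6)) in units of eta

# Mode shapes (up to overall sign and normalization): expect the ratios
# (1 : -1 : 1), (1 : 0 : -1), (2 : 3 : 2), highest frequency first.
for j in order:
    v = vecs[:, j].real
    print(np.round(v / np.max(np.abs(v)), 3))
```

Note that the antisymmetric shape (1 : 0 : −1) indeed carries the second highest frequency √(2/3) η, confirming the effective-mass shortcut above.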
Problem 1.2.9: Though the eigenmode/eigenvalue solution method employed in solving the previous problem is the traditional method used in classical mechanics, equations of the same form, when they arise in circuit analysis and other engineering fields, are traditionally solved using Laplace transforms (a more robust method, it seems). Let us continue the solution of the previous problem using this method. Individuals already familiar with this method, or not wishing to become so, should

FIGURE 1.2.9. Three beads on a stretched string. The transverse displacements are much exaggerated. Gravity and string mass are negligible.
skip this section. Here we use the notation

    x̃(s) = ∫₀^∞ e^{−st} x(t) dt    (1.2.11)

as the formula giving the Laplace transform x̃(s) of the function of time x(t). x̃(s) is a function of the "transform variable" s (which is a complex number with positive real part). With this definition the Laplace transform satisfies many formulas, but for present purposes we use only

    (dx/dt)~ = s x̃ − x(0),    (1.2.12)

which is easily demonstrated. Repeated application of this formula converts time derivatives into functions of s and therefore converts (linear) differential equations into (linear) algebraic equations. This will now be applied to the system described in the previous problem. The Lagrange equations for the beaded string shown in Fig. 1.2.9 are

    3m ẍ₁ = (τ/h)(x₂ − 2x₁),
    4m ẍ₂ = (τ/h)(x₁ − 2x₂ + x₃),    (1.2.13)
    3m ẍ₃ = (τ/h)(x₂ − 2x₃).

Suppose the string is initially at rest but that a transverse impulse I is administered to the first mass at t = 0; as a consequence it acquires initial velocity v₁₀ = ẋ₁(0) = I/(3m). Transforming all three equations and applying the initial conditions (the only nonvanishing initial quantity, v₁₀, enters via Eq. (1.2.12)) yields

    (3ms² + 2τ/h) x̃₁ − (τ/h) x̃₂ = I,
    −(τ/h) x̃₁ + (4ms² + 2τ/h) x̃₂ − (τ/h) x̃₃ = 0,    (1.2.14)
    −(τ/h) x̃₂ + (3ms² + 2τ/h) x̃₃ = 0.
Solving these equations yields

    x̃₁ = (I/m) [ (1/15)/(s² + η²/6) + (1/6)/(s² + 2η²/3) + (1/10)/(s² + η²) ],
    x̃₂ = (I/m) [ (1/10)/(s² + η²/6) − (1/10)/(s² + η²) ],    (1.2.15)
    x̃₃ = (I/m) [ (1/15)/(s² + η²/6) − (1/6)/(s² + 2η²/3) + (1/10)/(s² + η²) ].
It can be seen, except for constant factors, that the poles (as a function of s) of the transforms of the variables correspond to the normal mode frequencies. This is not surprising, as the determinant of the coefficients in Eq. (1.2.14) is the same as the determinant entering the normal mode solution, but with ω² replaced by −s². Recall from Cramer's rule for the solution of linear equations that this determinant appears in the denominators of the solutions. For "inverting" Eq. (1.2.15) it is sufficient to know just one inverse Laplace transformation,

    x̃(s) = 1/(s² + ω²)  ⟹  x(t) = sin(ωt)/ω,    (1.2.16)

but it is easier to look in a table of inverse transforms to find that the terms in Eq. (1.2.15) yield sinusoids that oscillate with the normal mode frequencies. Furthermore, the "shapes" asked for in the previous problem can be read off directly from (1.2.15) to be (2 : 3 : 2), (1 : 0 : −1), and (1 : −1 : 1). When the first mass is struck at t = 0 all three modes are excited and they proceed to oscillate at their own natural frequencies, so the motion of each individual particle is a superposition of these frequencies. Since there is no damping, the system will continue to oscillate in this superficially complicated way forever. In practice there is always some damping and, in general, it is different for the different modes; commonly, damping increases with frequency. In this case, after a while, the motion will be primarily in the lowest frequency mode; if the vibrating string emits audible sound, an increasingly pure tone will be heard as time goes on.
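The single transform pair (1.2.16) can be spot-checked against the defining integral (1.2.11); a Python sketch (the values of s and ω are illustrative):

```python
import numpy as np

def laplace(x, s, T=200.0, dt=1e-4):
    # Brute-force midpoint evaluation of the defining integral (1.2.11)
    # on [0, T]; e^{-sT} is negligible for the values used here.
    t = np.arange(0.0, T, dt) + dt / 2
    return np.sum(np.exp(-s * t) * x(t)) * dt

omega, s = 2.0, 0.5
numeric = laplace(lambda t: np.sin(omega * t) / omega, s)
exact = 1.0 / (s**2 + omega**2)
print(numeric, exact)
assert abs(numeric - exact) < 1e-4
```

The same brute-force check works for any entry of a transform table, which makes it a useful sanity test when signs and factors are in doubt.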
Problem 1.2.10: Damped and Driven Simple Harmonic Motion. The equation of motion of mass m, subject to restoring force −ω₀²mx, damping force −2λmẋ, and external drive force f cos γt, is

    ẍ + 2λẋ + ω₀²x = (f/m) cos γt.    (1.2.17)

(a) Show that the general solution of this equation when f = 0 is

    x(t) = a e^{−λt} cos(ωt + ψ),    (1.2.18)

where a and ψ depend on initial conditions and ω = √(ω₀² − λ²). This "solution of the homogeneous equation" is also known as a "transient" because, when it is superimposed on the "driven" or "steady-state" motion caused by f, it will eventually become negligible. (b) Correlate the stability or instability of the transient solution with the sign of λ. Equivalently, after writing the solution (1.2.18) as the sum of two complex exponential terms, Laplace transform them, and correlate the stability or instability of the transient with the locations in the complex s-plane of the poles of the Laplace transform. (c) Assuming x(0) = ẋ(0) = 0, show that Laplace transforming Eq. (1.2.17) yields
    x̃(s) = (f/m) · s / [(s² + γ²)(s² + 2λs + ω₀²)].    (1.2.19)

This expression has four poles, each of which leads to a complex exponential term in the time response. To neglect transients we need only drop the terms for which the poles are off the imaginary axis. (By part (b) they must be in the left half-plane for stability.) To "drop" these terms it is necessary first to isolate them by partial fraction decomposition of Eq. (1.2.19). Performing these operations, show that the steady-state solution of Eq. (1.2.17) is

    x(t) = (f/m) cos(γt + δ) / √((ω₀² − γ²)² + 4λ²γ²),    (1.2.20)

where

    ω₀² − γ² − 2iλγ = √((ω₀² − γ²)² + 4λ²γ²) e^{iδ}.    (1.2.21)

(d) The response is large only for γ close to ω₀. To exploit this, defining the "small" "frequency deviation from the natural frequency"

    ε = γ − ω₀,

show that

    γ² − ω₀² ≈ 2εω₀,    (1.2.22)

and that the approximate response is

    x(t) ≈ (f/(2mω₀)) cos(γt + δ) / √(ε² + λ²).    (1.2.23)

Find the value of ε for which the amplitude of the response is reduced from its maximum value by the factor 1/√2.
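The steady-state amplitude formula can be spot-checked by direct integration of Eq. (1.2.17) until the transient, which decays like e^{−λt}, has died out; a Python sketch with illustrative parameter values:

```python
import numpy as np

m, f, omega0, lam, gamma = 1.0, 1.0, 1.0, 0.05, 0.9

def deriv(t, y):
    x, v = y
    return np.array([v, (f / m) * np.cos(gamma * t) - 2 * lam * v - omega0**2 * x])

# RK4 from rest; by t ~ 250 the transient is suppressed by e^{-12.5}.
y, t, dt = np.array([0.0, 0.0]), 0.0, 0.005
amps = []
while t < 300.0:
    k1 = deriv(t, y); k2 = deriv(t + dt / 2, y + dt / 2 * k1)
    k3 = deriv(t + dt / 2, y + dt / 2 * k2); k4 = deriv(t + dt, y + dt * k3)
    y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += dt
    if t > 250.0:                      # record only after transients decay
        amps.append(abs(y[0]))

predicted = (f / m) / np.sqrt((omega0**2 - gamma**2)**2 + 4 * lam**2 * gamma**2)
print(max(amps), predicted)
assert abs(max(amps) - predicted) < 0.01 * predicted
```

The recorded maximum displacement over several late drive periods matches the amplitude of Eq. (1.2.20) to well under one percent.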
1.2.4. Conservation of Momentum and Energy

It has been shown previously that the application of energy conservation in one-dimensional problems permits the system evolution to be expressed in terms of a single integral; this is "reduction to quadrature." The following problem exhibits the use of momentum conservation to reduce a two-dimensional problem to quadratures, or rather, because of the simplicity of the configuration in this case, to a closed-form solution.
Problem 1.2.11: A point mass m with total energy E, starting in the left half-plane, moves in the (x, y) plane subject to the potential energy function

    U(x, y) = 0 for x < 0,    U(x, y) = U₀ for x > 0.

The "angle of incidence" to the interface at x = 0 is θᵢ, and the outgoing angle is θ. Specify the qualitatively different cases that are possible, depending on the relative
values of the energies, and in each case find θ in terms of θᵢ. Show that all results can be cast in the form of "Snell's law" of geometric optics if one introduces a factor √(1 − U₀/E), analogous to an index of refraction.
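The mechanism is momentum conservation parallel to the interface: p_y is unchanged while p_x is altered by the potential step. A small Python sketch (numerical values illustrative):

```python
import numpy as np

def outgoing_angle(theta_i, E, U0):
    """Angle after crossing the step at x = 0 from U = 0 into U = U0.

    p_y is conserved while |p| changes from sqrt(2mE) to sqrt(2m(E - U0)),
    so sin(theta) = sin(theta_i) * sqrt(E / (E - U0)): Snell's law with
    "index of refraction" n = sqrt(1 - U0/E).
    """
    s = np.sin(theta_i) * np.sqrt(E / (E - U0))
    if s > 1.0:
        return None          # reflection: the particle cannot enter x > 0
    return np.arcsin(s)

theta_i, E, U0 = np.radians(30.0), 2.0, 1.0
theta = outgoing_angle(theta_i, E, U0)
# check p_y conservation: sqrt(2mE) sin(theta_i) = sqrt(2m(E-U0)) sin(theta)
assert np.isclose(np.sqrt(E) * np.sin(theta_i), np.sqrt(E - U0) * np.sin(theta))
print(np.degrees(theta))
```

With E = 2U₀ and θᵢ = 30° the particle emerges at 45°, bending away from the normal because it slows down, opposite to light entering a denser medium.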
1.2.5. Effective Potential

Since one-dimensional motion is subject to such simple and satisfactory analysis, anything that can reduce the dimensionality from two to one has great value. The "effective potential" is one such device.
Problem 1.2.12: The Kepler Problem. No physics problem has received more attention over the centuries than the problem of planetary orbits. In later chapters of this text the analytic solution of this so-called "Kepler problem" will be the foundation on which perturbative solution of more complicated problems will be based. Though this problem is now regarded as "elementary," one is well-advised to stick to traditional manipulations, as the problem can otherwise get seriously messy. The problem of two masses m₁ and m₂ moving in each other's gravitational field is easily converted into the problem of a single particle of mass m moving in the gravitational field of a mass m₀ assumed very large compared to m; that is, F = −K r̂/r², where K = Gm₀m and r is the distance to m from m₀. Anticipating that the orbit lies in a plane (as it must), let χ be the angle of the radius vector from a line through the center of m₀; this line will later be taken as the major axis of the elliptical orbit. The potential energy function is given by

    U(r) = −K/r,    (1.2.24)

and the orbit geometry is illustrated in Fig. 1.2.10.

FIGURE 1.2.10. Geometric construction defining the "true anomaly" χ and the "eccentric anomaly" u in terms of other orbit parameters.
Two conserved quantities can be identified immediately: energy E and angular momentum M. Show that they are given by

    E = ½m(ṙ² + r²χ̇²) − K/r,    M = mr²χ̇.    (1.2.25)

One can exploit the constancy of M by eliminating χ̇ from the expression for E,

    E = ½mṙ² + U_eff(r),    where U_eff(r) = M²/(2mr²) − K/r.    (1.2.26)

The function U_eff(r), known as the "effective potential," is plotted in Fig. 1.2.11. Solving both the expression for E and the expression for M for the differential dt,

    dt = dr / √((2/m)(E − U_eff(r))),    dt = (mr²/M) dχ,    (1.2.27)

and equating the two expressions yields a differential equation that can be solved by "separation of variables." This has permitted the problem to be "reduced to quadratures,"

    χ(r) = ∫^r (M dr′/r′²) / √(2m(E + K/r′) − M²/r′²).    (1.2.28)

Note that this procedure yields only an "orbit equation," the dependence of χ on r (which is equivalent to, if less convenient than, the dependence of r on χ). Though a priori one should have had the more ambitious goal of finding a solution in the

FIGURE 1.2.11. The effective potential U_eff for the Kepler problem.
form r(t) and χ(t), no information whatsoever is given yet about time dependence by Eq. (1.2.28). (a) Show that all computations so far can be carried out for any central force that is radially directed with magnitude dependent only on r. At worst the integral analogous to (1.2.28) can be performed numerically. (b) Returning to the Kepler problem, perform the integral (1.2.28) and show that the orbit equation can be written

    ε cos χ = M²/(mK) · (1/r) − 1,    where ε = √(1 + 2EM²/(mK²)).    (1.2.29)

(c) Show that (1.2.29) is the equation of an ellipse if ε < 1 and that this condition is equivalent to E < 0. (d) It is traditional to write the orbit equation purely in terms of "orbit elements" ε and a, which can be identified as the "eccentricity" and the "semimajor axis":

    a = ½(r_min + r_max) = M² / (mK(1 − ε²)).    (1.2.30)

The reason a and ε are special is that they are intrinsic properties of the orbit, unlike, for example, the orientation of the semimajor axis and the direction of the perpendicular to the orbit plane, both of which can be altered at will and still leave a "congruent" system. Derive the relations

    E = −K/(2a),    M² = (1 − ε²)mKa,    (1.2.31)

so the orbit equation is

    a/r = (1 + ε cos χ) / (1 − ε²).    (1.2.32)
(e) Finally, derive the relation between r and t:

    t = ∫^r dr′ / √((2/m)(E + K/r′) − M²/(m²r′²)).    (1.2.33)

An "intermediate" variable u that leads to worthwhile simplification is defined by

    r = a(1 − ε cos u).    (1.2.34)

The geometric interpretation of u is indicated in Fig. 1.2.10. If (x, z) are Cartesian coordinates of the planet along the major axis and an axis parallel to the minor axis
through the central mass, they are given in terms of u by

    x = a cos u − aε,    z = a√(1 − ε²) sin u,    (1.2.35)

since the semiminor axis is a√(1 − ε²) and the circumscribed circle is related to the ellipse by a z-axis scale factor √(1 − ε²). The coordinate u, known as the "eccentric anomaly," is a kind of distorted angular coordinate of the planet, and is related fairly simply to t:

    t = √(ma³/K) (u − ε sin u).    (1.2.36)

This is especially useful for nearly circular orbits, since then u is nearly proportional to t. Analysis of this Keplerian system is continued using Hamilton-Jacobi theory in Section 12.8, then again in Section 14.6.3 to illustrate action/angle variables, and then again as a system subject to perturbation and analyzed by "variation of constants" in Section 16.1.1.

Problem 1.2.13: The effective potential formalism has reduced the dimensionality of the Kepler problem from two to one. In one dimension, the linearization (to simple harmonic motion) procedure, illustrated above, for example in Problem 1.2.1, can then be used to describe motion that remains close to the minimum of the effective potential (see Fig. 1.2.11). The radius r₀ = M²/(mK) is the radius of the circular orbit with angular momentum M. Consider an initial situation for which M has this same value and ṙ(0) = 0, but r(0) ≠ r₀, though r(0) is in the region of good parabolic fit to U_eff. Find the frequency of small oscillations and express r(t) by its appropriate simple harmonic motion. Then find the orbit elements a and ε, as defined in Problem 1.2.12, that give the matching two-dimensional orbit.
1.2.6. Multiparticle Systems

Solving multiparticle problems in mechanics is notoriously difficult; for more than two particles it is usually impossible to get solutions in closed form. But the equations of motion can be made simpler by the appropriate choice of coordinates, as the next problem illustrates. Such coordinate choices exploit exact relations such as momentum conservation and thereby simplify subsequent approximate solutions. For example, this is a good pre-quantum starting point for molecular spectroscopy.
Problem 1.2.14: The position vectors of three point masses, m₁, m₂, and m₃, are r₁, r₂, and r₃. Express these vectors in terms of the alternative configuration vectors s_C, s′₃, and s₁₂ shown in the figure. Define "reduced masses" by

    m₁₂ = m₁ + m₂,    M = m₁ + m₂ + m₃,    μ₁₂ = m₁m₂/m₁₂,    μ₃ = m₃m₁₂/M.

Calculate the total kinetic energy in terms of ṡ_C, ṡ′₃, and ṡ₁₂ and interpret the result.
FIGURE 1.2.12. Coordinates describing three particles. C is the center of mass and s_C its position vector relative to origin O. C₁₂ is the center of mass of m₁ and m₂, and s′₃ is the position of m₃ relative to C₁₂.

Defining corresponding partial angular momenta l_C, l′₃, and l₁₂, show that the total angular momentum of the system is the sum of three analogous terms.

SOLUTION: In Fig. 1.2.12 the origin O is at an arbitrary location relative to which the center of mass C is located by radius vector s_C. Relative to particle 1, particle 2 is located by vector s₁₂. Relative to the center of mass at C₁₂, mass 3 is located by vector s′₃. In terms of these quantities the position vectors of the three masses are

    r₁ = s_C − (m₃/M) s′₃ − (m₂/m₁₂) s₁₂,
    r₂ = s_C − (m₃/M) s′₃ + (m₁/m₁₂) s₁₂,
    r₃ = s_C + (m₁₂/M) s′₃.

Substituting these into the kinetic energy of the system

    T = ½m₁ṙ₁² + ½m₂ṙ₂² + ½m₃ṙ₃²,

the "cross terms" proportional to ṡ_C · ṡ′₃, ṡ_C · ṡ₁₂, and ṡ′₃ · ṡ₁₂ all cancel out, leaving the result

    T = ½M v_C² + ½μ₃ v′₃² + ½μ₁₂ v₁₂²,

where v_C = |ṡ_C|, v′₃ = |ṡ′₃|, and v₁₂ = |ṡ₁₂|. The angular momentum (about O) is given by

    L = r₁ × (m₁ṙ₁) + r₂ × (m₂ṙ₂) + r₃ × (m₃ṙ₃).

Upon expansion the same simplifications occur, yielding

    L = M s_C × ṡ_C + μ₃ s′₃ × ṡ′₃ + μ₁₂ s₁₂ × ṡ₁₂.
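Both decompositions can be verified with random data; a Python sketch (masses and vectors are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(1)
m1, m2, m3 = 1.0, 2.0, 3.0
m12, Mtot = m1 + m2, m1 + m2 + m3
mu12, mu3 = m1 * m2 / m12, m3 * m12 / Mtot

# random configuration vectors and their time derivatives
sC, s3, s12 = rng.normal(size=(3, 3))
sC_d, s3_d, s12_d = rng.normal(size=(3, 3))

def particles(s, sp, spp):
    # positions (or velocities) of the three particles from the
    # configuration vectors, per the formulas in the solution above
    r1 = s - (m3 / Mtot) * sp - (m2 / m12) * spp
    r2 = s - (m3 / Mtot) * sp + (m1 / m12) * spp
    r3 = s + (m12 / Mtot) * sp
    return r1, r2, r3

r1, r2, r3 = particles(sC, s3, s12)
v1, v2, v3 = particles(sC_d, s3_d, s12_d)

T_direct = 0.5 * (m1 * v1 @ v1 + m2 * v2 @ v2 + m3 * v3 @ v3)
T_split = 0.5 * (Mtot * sC_d @ sC_d + mu3 * s3_d @ s3_d + mu12 * s12_d @ s12_d)
assert np.isclose(T_direct, T_split)        # cross terms cancel

L_direct = (np.cross(r1, m1 * v1) + np.cross(r2, m2 * v2) + np.cross(r3, m3 * v3))
L_split = (Mtot * np.cross(sC, sC_d) + mu3 * np.cross(s3, s3_d)
           + mu12 * np.cross(s12, s12_d))
assert np.allclose(L_direct, L_split)       # same cancellation for L
```

The cancellation happens because the coefficients of s′₃ and s₁₂ in the particle positions are mass-weighted to zero, the same property that makes C and C₁₂ centers of mass.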
Problem 1.2.15: Determine the moment of inertia tensor about the center of mass C for the system described in the previous problem. Choose axes to simplify the problem initially, and give a formula for transforming from these axes to arbitrary (orthonormal) axes. For the case m₃ = 0, find the principal axes and the principal moments of inertia.

SOLUTION: Setting s_C = 0, the particle positions are given by

    r₁ = −(m₃/M) s′₃ − (m₂/m₁₂) s₁₂,
    r₂ = −(m₃/M) s′₃ + (m₁/m₁₂) s₁₂,
    r₃ = (m₁₂/M) s′₃.

Since the masses lie in a single plane it is convenient to take the z-axis normal to that plane. Let us orient the axes such that

    s′₃ = s′₃ x̂,    s₁₂ = a x̂ + b ŷ

(and hence a = ŝ′₃ · s₁₂), so the particle coordinates are

    x₁ = −(m₃/M) s′₃ − (m₂/m₁₂) a,    y₁ = −(m₂/m₁₂) b,
    x₂ = −(m₃/M) s′₃ + (m₁/m₁₂) a,    y₂ = (m₁/m₁₂) b,
    x₃ = (m₁₂/M) s′₃,    y₃ = 0.

In terms of these, the moment of inertia tensor I is given by

    I = [ Σᵢ mᵢyᵢ²      −Σᵢ mᵢxᵢyᵢ     0
          −Σᵢ mᵢxᵢyᵢ    Σᵢ mᵢxᵢ²       0
          0             0              Σᵢ mᵢ(xᵢ² + yᵢ²) ].

For the special case m₃ = 0 these formulas reduce to

    I = μ₁₂ [ b²     −ab    0
              −ab    a²     0
              0      0      a² + b² ].

The formulas derived in this solution will be used again in Section 9.3.

Problem 1.2.16: A uniform solid cube can be supported by a thread from the center of a face, from the center of an edge, or from a corner. In each of the three cases the system acts as a torsional pendulum, with the thread providing all the restoring torque and the cube providing all the inertia. In which configuration is the oscillation period the longest? (If your answer involves complicated integrals you are not utilizing properties of the inertia tensor in the intended way.)
1.2.7. Dimensional/Scaling Considerations

Problem 1.2.17: Suppose the potential energy of a central field is a homogeneous function of degree ν: U(αr) = α^ν U(r), for any α > 0. (a) Starting with a valid solution of Newton's law, making the replacements r → αr and t → βt, and choosing β = α^{1−ν/2}, show that the total energy is modified by the factor α^ν and the equation of motion is still satisfied. (b) For the case ν = 2, derive from this the isochronicity of simple harmonic motion. (c) For the case ν = −1, derive Kepler's third law.
This sort of argument is introduced from a Lagrangian point of view in Landau and Lifshitz [2], where, among other applications, the virial theorem is proved. Arnold introduces the argument from a Newtonian point of view and gives other interesting applications of similarity.
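The scaling β = α^{1−ν/2} can be checked numerically for a one-dimensional homogeneous potential; a Python sketch with U(x) = x⁴ (so ν = 4, chosen because the period then varies with amplitude, scaling as α^{1−ν/2} = 1/α):

```python
def period(amplitude, dt=1e-4):
    # Integrate x'' = -dU/dx = -4 x^3, starting from rest at x = amplitude.
    # By symmetry the time to reach x = 0 is a quarter period.
    x, v, t = amplitude, 0.0, 0.0
    while x > 0.0:
        # semi-implicit (Euler-Cromer) step, good enough at this dt
        v += -4 * x**3 * dt
        x += v * dt
        t += dt
    return 4 * t

T1, T2 = period(1.0), period(2.0)
print(T1, T2)
# nu = 4: doubling the amplitude (alpha = 2) halves the period
assert abs(T2 - T1 / 2) < 0.01 * T1
```

For ν = 2 the same experiment would show the period independent of amplitude (isochronicity), and for ν = −1 it reproduces the 3/2-power law of Kepler's third law.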
BIBLIOGRAPHY
References

1. R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. 1, Interscience, New York, 1953, p. 37.
2. L. D. Landau and E. M. Lifshitz, Mechanics, Pergamon, Oxford, 1976, p. 22.
THE GEOMETRY OF MECHANICS
Since the subject of mechanics deals with the motion of particles in space, the phrase "geometric mechanics" might seem redundant. In this text the phrase is intended mainly to imply that more consideration will be paid to geometric ideas than was once considered appropriate. Close to a century ago the subject of general relativity sprang almost entirely from a deep contemplation of geometry, and yet the appreciation of geometry as physics has accelerated only recently. Until recently an intuitive grasp of high school geometry has been considered adequate as a pedagogical basis for classical mechanics, with the result that high school geometry is pretty firmly fixed in the intuition of most physicists. Generally this is good, or at least satisfactory, but it can impede the learning of physics that employs more abstract geometric methods. As an example of the way restricted intuition can retard assimilation of new physics, one need only recall the mental contortion that was required before the (counterintuitive) formula for the composition of velocities in special relativity could be accepted. The strategy of this part of the text is to contemplate geometry with mechanics kept in the background, to better understand geometric ideas on their own before they are folded back into the physics. Of course, the geometric ideas important in mechanics will be the ones to be emphasized.
GEOMETRY OF MECHANICS I: LINEAR
2.1. INTRODUCTION

Even before considering geometry as physics, one can try to distinguish between geometry and algebra, starting, for example, with the concept of "vector." The question "What is a vector?" does not receive a unique answer. Rather, two answers are perhaps equally likely: "an arrow" or "a triplet of three numbers (x, y, z)." The former answer could legitimately be called geometric, the latter algebraic. Yet the distinction between algebra and geometry is rarely unambiguous. For example, experience with the triplet (x, y, z) was probably gained in a course with a title such as "Analytic Geometry" or "Coordinate Geometry." For our purposes it will not be necessary to have an ironclad postulational basis for the mathematics to be employed, but it is important to have a general appreciation of the ideas. Again, that is a purpose of this chapter. Since the immediate goal is unlearning almost as much as learning, the reader should not expect to find a completely self-contained, unambiguous development from first principles. To make progress in physics, it is usually sufficient to have only an intuitive grasp of mathematical foundations. For example, the Pythagorean property of right-angle triangles is remembered even if its derivation from Euclidean axioms is not. Still, some mulling over of "well-established" ideas is appropriate, as they usually contain implicit understandings and definitions, possibly different for different individuals. Some of the meanings have to be discarded or modified as an "elementary" treatment metamorphoses into a more "abstract" formulation. Faced with this problem, a mathematician might prefer to "start from scratch," discard all preconceived notions, define everything unambiguously, and proceed on a
firm postulational basis.¹ The physicist, on the other hand, is likely to find the mathematician's approach too formal and poorly motivated. Unwilling to discard ideas that have served well, and too impatient or too inexperienced to follow abstract argument, when taking on new baggage he or she prefers to rearrange the baggage already loaded, in an effort to make it all fit. The purpose of this chapter is to help with this rearrangement. Elaborating on the metaphor: some bags are to be removed from the trunk with the expectation they will fit better later, some fit as is, some have to be reoriented; only at the end does it become clear which fit and which must be left behind. While unloading bags it is not necessary to be fussy; when putting them back one has to be more careful.

The analysis of spatial rotations has played a historically important part in the development of mechanics. In classical (both with the meaning nonrelativistic and the meaning "old-fashioned") mechanics courses this has largely manifested itself in the analysis of rigid body motion. Problems in this area are among the most complicated for which the equations can be "integrated" in closed analytic form in spite of being inherently "nonlinear," a fact that gives them a historical importance. But since these calculations are rather complicated, and since most people rapidly lose interest in, say, the eccentric motion of an asymmetric top, it has been fairly common, in the teaching of mechanics courses, to skim over this material. A "modern" presentation of mechanics has a much more qualitative and geometric flavor than the "old-fashioned" approach just mentioned. From this point of view, rather than being just a necessary evil encountered in the solution of hard problems, rotations are the easiest-to-understand prototype for the analysis of motion using abstract group theoretical methods.

The connection between rotational symmetry and conservation of angular momentum, both because of its importance in quantum mechanics and again as a prototype, provides another motivation for studying rotations. It might be said that classical mechanics has been left mainly in the hands of mathematicians (physicists were otherwise occupied with quantum questions) for so long that the language has become nearly unintelligible to a physicist. Possibly unfamiliar words in the mathematician's vocabulary include bivectors, multivectors, differential forms, dual spaces, Lie groups, irreducible representations, pseudo-Euclidean metrics, and so on. Fortunately all physicists are handy with vector analysis, including the algebra of dot and cross products and the calculus of gradients, divergences, and curls, and in the area of tensor analysis they are familiar with covariant (contravariant) tensors as quantities with lower (upper) indices that (for example) conveniently keep track of the minus sign in the Minkowski metric of special relativity. Tools like these are much to be valued in that they permit a very compact, very satisfactory formulation of classical and relativistic mechanics, of electricity and magnetism, and of quantum mechanics. But they also leave a physicist's mind unwilling to jettison certain "self-evident" truths that stand in the way of deeper levels of abstraction. Perhaps the simplest example of this is that, having treated

¹Perhaps the first treatment from first principles, and surely the most comprehensive text to base mechanics on the formal mathematical theory of smooth manifolds, was Abraham and Marsden [1]. Other editions with new authors have followed.
vector cross products as ordinary vectors for many years, one's mind has difficulty adopting a mathematician's view of cross products as being quite dissimilar to, and certainly incommensurable with, ordinary vectors. Considerable effort will be devoted to motivating and explaining ideas like these in ways that are intended to appeal to a physicist's intuition. Much of this material will be drawn from the work of Élie Cartan, which, though old, caters to a physicist's intuition.² To begin with, covariant vectors will be introduced from various points of view and contrasted with the more familiar contravariant vectors.
2.2. PAIRS OF PLANES AS COVARIANT VECTORS

The use of coordinates (x, y, z), shortly to be switched to (x¹, x², x³), for locating a point in space is illustrated in Fig. 2.2.1. Either orthonormal (Euclidean) or skew (Cartesian)³ axes can be used. It is rarely required that skew axes be used rather than the simpler rectangular axes but, in the presence of continuous deformations, skew axes may be unavoidable. Next consider Fig. 2.2.2, which shows the intersections of a plane with the same axes as in Fig. 2.2.1. The equation of the plane on the left, in
FIGURE 2.2.1. Attaching coordinates to a point with Euclidean (or orthogonal) axes (on the left) and Cartesian (or possibly skew) axes (on the right). One of several possible interpretations of the figure is that the figure on the right has been obtained by elastic deformation of the figure on the left. In that case the primes on the right are superfluous since the coordinates of any particular point (such as the point P) are the same in both figures, namely (1, 1, 1).

²Cartan is usually credited as being the "father" (though I think not the inventor) of differential forms as well as the discoverer of spinors (long before, and in greater generality than, Pauli or Dirac). That these early chapters draw so much from Cartan simply reflects the lucidity of his approach. Don't be intimidated by the appearance of spinors; only elementary aspects of them will be required.
³Many authors use the term "Cartesian" to imply orthogonal axes, but we use "Euclidean" in that case and use "Cartesian" to imply (possibly) skew axes.
FIGURE 2.2.2. Intersection of a plane with orthogonal axes on the left and a "similar" figure with skew axes on the right. The equations of the planes are "the same," though expressed with unprimed and primed coordinates.
terms of generic point (x, y, z) on the plane, is

ax + by + cz = d,    (2.2.1)

and, because the coordinates of the intersections with the axes are the same, the equation of the plane on the right in terms of generic point (x', y', z') is also linear, with the same coefficients (a, b, c, d),

ax' + by' + cz' = d.    (2.2.2)
The figures are "similar," not in the conventional sense of Euclidean geometry, but in a newly defined sense of lines corresponding to lines, planes to planes, and intersections to intersections, and of the coordinates of the intersections of the plane and the axes being numerically the same. The unit measuring sticks along the Euclidean axes are e_x, e_y, e_z, and along the skew axes e_x', e_y', e_z'. The coordinates (d/a, d/b, d/c) of the intersection points are determined by laying out the measuring sticks along the respective axes. Much as (x, y, z) "determines" a point, the values (a, b, c), along with d, "determine" a plane. Commonly the values (x, y, z) are regarded as projections onto the axes of an arrow that is allowed to slide around the plane with length and direction held fixed. Similarly, any two planes sharing the same triplet (a, b, c) are parallel. (It would be wrong though to say that such planes have the same normals since, with the notion of orthogonality not yet having been introduced, there is no such thing as a vector normal to the plane. Saying that two planes have the same "direction" can only mean that they are parallel; that is, their intercepts are proportional.) The analogy between plane coordinates (a, b, c) and point coordinates (x, y, z) is not quite perfect. For example, it takes the specification of a definite value d in
FIGURE 2.2.3. Parallel planes. How many plane-pairs are crossed by vector x? (The figure shows the line-pairs ax + by = 0, ax + by = d, and ax + by = d + 1; the vector x drawn there crosses 3 line-pairs, the number of line-pairs crossed being ax + by.)
the equation for the plane to pick out a definite plane, while it takes three values, say the (x₀, y₀, z₀) coordinates of the tail, to pick out a particular vector. Just as one regards (x, y, z) as specifying a sliding vector, it is possible to define a "plane-related" geometric structure specified by (a, b, c) with no reference to d. To suppress the dependence on parameter d, observe first that the shift from d to d + 1 corresponds to the shift from plane ax + by + cz = d to plane ax + by + cz = d + 1. Each member of this pair of unit-separated planes is parallel to any plane with the same (a, b, c) values. The pair of planes is said to have an "orientation,"⁴ with positive orientation corresponding to increasing d. This is illustrated in Fig. 2.2.3. Since it is hard to draw planes, only lines are shown there, but the correspondence with Fig. 2.2.2 should be clear, and the ideas can be extended to higher dimensionality as well. In this way the triplet (a, b, c), or (a₁, a₂, a₃), a notation we will switch to shortly, stands for any oriented, unity-spaced pair of planes, both parallel to the plane through the origin ax + by + cz = 0. Without yet justifying the terminology we will call x a contravariant vector, even though this only makes sense if we regard x as an abbreviation for the three numbers (x¹, x², x³); it would be more precise to call x a true vector with contravariant components (x, y, z) ≡ (x¹, x², x³), so that x ≡ x¹e₁ + x²e₂ + x³e₃. In some cases, if it appears to be helpful, we will use the symbol x⃗, instead of just x, to emphasize
⁴The orientation of a pair of planes is said to be "outer." The meaning of outer orientation is that two points related to each other by this orientation must be in separate planes. This can be contrasted to the inner orientation of a vector, by which two points can be related only if they lie in the same line parallel to x. An inner orientation for a plane would be a clockwise or counterclockwise orientation of circulation within the plane. An outer orientation of a vector is a left- or right-handed screw-sense about the vector's arrow.
its contravariant vector nature. Also we symbolize by ã an object with components (a, b, c) = (a₁, a₂, a₃) that will be called covariant.⁵ In Fig. 2.2.3 the (outer) orientation of two planes is indicated by an arrow (wavy to indicate that no definite vector is implied by it). It is meaningful to say that contravariant vector x and covariant vector ã have the same orientation; it means that the arrow x points from the negative side of the plane toward the positive side. Other than being able to compare their orientations, is there any other meaningful geometric question that can be asked of x and ã? The answer is yes; the question is, "How many plane-pairs ã does x cross?" In Fig. 2.2.3 the answer is "3." Is there any physics in this question and answer? The answer is yes again. Visualize the right plot of Fig. 2.2.3 as a topographical map, with the parallel lines being contours of equal elevation. (One is looking on a fine enough scale that the ground is essentially plane and the contours are straight and parallel.) The "trip" x entails a change of elevation of three units. This permits us to anticipate/demand that the following expressions (all equivalent)
ax + by + cz = aᵢxⁱ = ⟨ã, x⟩ = ã(x)    (2.2.3)

have an intrinsic, invariant significance, unaltered by deformations and transformations (if and when they are introduced). This has defined a kind of "invariant product" ⟨ã, x⟩,⁶ ⁷ of a covariant and a contravariant vector. It has also introduced the repeated-index summation convention. This product is clearly related to the "dot product" of elementary vector analysis a · x, but that notation would be inappropriate at this point because nothing resembling an angle between vectors, or the cosine thereof, has been introduced. The geometric interpretation of the sum of two vectors as the arrow obtained by attaching the tail of one of the arrows to the tip of the other is well known. The geometric interpretation of the addition of covariant vectors is illustrated in Fig. 2.2.4. As usual, the natural application of a covariant vector is the determination of the number of plane-pairs crossed by a general contravariant vector. Notice that the lines

⁵New notation is to be discouraged but, where there appears to be no universally agreed-to notation, our policy will be to choose symbols that cause formulas to look like elementary physics, even if their meanings are more general. The most important convention is that multicomponent objects are denoted by boldface symbols, as for example the vector x. This is more compact, though less expressive, than x⃗. Following Schutz [2], we use an overhead tilde to distinguish a one-form (or covariant vector, such as ã) from a (contravariant) vector, but we retain also the boldface symbol. The use of tildes to distinguish between covariant and contravariant quantities will break down when mixed quantities enter. Many mathematicians use no notational device at all to distinguish these quantities, and we will be forced to that when encountering mixed tensors. When it matters, the array of contravariant components will be regarded as a column vector and the array of covariant components as a row vector, but consistency in this regard is not guaranteed.
⁶The left-hand side of Eq. (2.2.3), being homogeneous in (x, y, z), is known as a "form" and, being first-order in the xⁱ, as a "one-form." The notations aᵢxⁱ, ⟨ã, x⟩, and ã(x) are interchangeable.
⁷In this elementary context, the invariant significance of ã(x) is utterly trivial, and yet when the same concept is introduced in the abstract context of tangent spaces and cotangent spaces, it can seem obscure (see, for example, Arnold [1, p. 203, Fig. 166]).
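The invariant product is trivially computable. As a minimal numerical sketch (the function name and the sample components are my own, not the text's), the number of plane-pairs crossed is just the index-wise sum aᵢxⁱ:

```python
import numpy as np

def pairing(a_tilde, x):
    """Invariant product <a~, x> = a_i x^i: the number of unit-spaced
    plane-pairs of the covariant vector a~ crossed by the vector x."""
    return float(np.dot(a_tilde, x))

# Covariant components (a, b, c) of a plane-pair family, and a vector x:
a_tilde = np.array([1.0, 2.0, 0.0])
x = np.array([1.0, 1.0, 5.0])

print(pairing(a_tilde, x))  # 3.0: x crosses three plane-pairs
```

Note that no lengths or angles enter; the count uses only the components themselves.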
FIGURE 2.2.4. Geometric interpretation of the addition of covariant vectors, ã + b̃. The solid arrow crosses two of the ã plane-pairs, one of the b̃ plane-pairs, and hence three of the ã + b̃ plane-pairs. Tips of the two dotted arrows necessarily lie on the same ã + b̃ plane.
of ã + b̃ are more closely spaced than the lines of either ã or b̃. The geometry of
this figure encapsulates the property that a general vector x crosses a number of planes belonging to ã + b̃ equal to the number of planes it crosses belonging to ã plus the number belonging to b̃. The reader should also pause to grasp the geometric interpretations of ã − b̃ and b̃ − ã. There are various ways of interpreting figures like Fig. 2.2.2. The way intended so far can be illustrated by an example. Suppose you have two maps of Colorado, one having lines of longitude and latitude plotted on a square grid, the other using some other scheme, say a globe. To get from one of these maps to the other, some distortion would be required, but one would not necessarily say there had been a coordinate transformation, as the latitude and longitude coordinates of any particular feature would be the same on both maps; call them (x, y). One can consider the right figure of Fig. 2.2.2 to be the result of a deformation of the figure on the left: both the physical object (the plane or planes) and the reference axes have been deformed, preserving the coordinates of every particular feature. The map analog of the planes in Fig. 2.2.2 are the equal-elevation contours of the maps. By counting elevation contours one can, say, find the elevation of Pike's Peak relative to Denver. (It would be necessary to break this particular trip into many small segments in order that the ground could be regarded as a perfect plane in each segment.) With local contours represented by ã(i) and local transverse displacement vectors by x(i), the overall change in elevation is obtained by summing the contributions ⟨ã(i), x(i)⟩ from each segment.⁸ Clearly one will obtain the same result from both maps. This is a virtue of the form ⟨ã, x⟩. As stated previously, no coordinate transformation has yet occurred, but when one does, we will wish to preserve this feature: if x → x′ we will insist that ã → ã′ such that the value of the form is preserved. That's an invariant. Fig.
2.2.2 can also be interpreted in terms of "transformations," either active or passive. The elements of this figure are redrawn in Fig. 2.2.5 but with origins superimposed. In part (a) the plane is "actively" shifted. Of course its intersections with the coordinate axes will now be different. The coefficients (covariant components) in the equation of the shifted plane expressed in terms of the original axes are altered from their original values. The new coefficients are said to be the result of an active transformation in this case. Part (b) of Fig. 2.2.5 presents an alternative view of Fig. 2.2.2 as a "passive" change of coordinates. The plane is now unshifted but its covariant components are still transformed because of the different axes. Similar comments apply to the transformation properties of contravariant components.⁹ From what has been stated previously, we must require the form ⟨ã, x⟩ to be invariant under transformation. This is true whether the transformation is viewed actively or passively.

⁸As always in this text, subscripts (i) are enclosed in parentheses to protect against their being interpreted as vector indices. There is no implied summation over repeated parenthesized indices.

FIGURE 2.2.5. "Active" (a) and "passive" (b) interpretations of the relations between elements of Fig. 2.2.2 as transformations. In each case, though they were plotted separately in Fig. 2.2.2, the plots are here superimposed with common origin O.
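The invariance requirement is easy to check numerically. In the following sketch (a construction of mine, not the author's), contravariant components are taken to transform as x′ = Mx while covariant components transform with the inverse matrix, ã′ = ãM⁻¹, so the form ⟨ã, x⟩ is unchanged:

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],   # an invertible (possibly skew) change of axes
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 3.0]])

rng = np.random.default_rng(0)
x = rng.standard_normal(3)       # contravariant components (column-like)
a = rng.standard_normal(3)       # covariant components (row-like)

x_new = M @ x                    # contravariant components transform with M
a_new = a @ np.linalg.inv(M)     # covariant components transform with M^-1

# The pairing <a~, x> is preserved, whether viewed actively or passively.
print(np.allclose(a @ x, a_new @ x_new))  # True
```

The opposite transformation rules for the two kinds of components are exactly what makes the plane-pair count coordinate independent.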
2.3. DIFFERENTIAL FORMS

2.3.1. Geometric Interpretation
There is a formalism which, though it seems curious at first, is in common use in modern mechanics. These so-called differential forms will not be used in this chapter, but they are introduced at this point, after only a minimal amount of geometry has been introduced, in order to emphasize that the concepts involved are very general, independent of any geometry yet to be introduced. In particular there is no dependence on lengths, angles, or orthogonality. Since the new ideas can be adequately illustrated by considering functions of two variables, x and y, we simplify accordingly and define elementary differential forms d̃x and d̃y as functions (of a vector)

⁹While it is always clear that two possible interpretations exist, it is often difficult to understand which view is intended. A certain fuzziness as to whether an active or a passive view is intended is traditional, a tradition this text will regrettably continue to respect. In many cases the issue is inessential, and in any case it has nothing to do with the contravariant/covariant distinction.
satisfying

d̃x(Δx) = Δx,   d̃y(Δx) = Δy;    (2.3.1)

these functions take displacement vector Δx = x − x₀ as argument and produce components Δx = x − x₀ and Δy = y − y₀ as values.¹⁰ A linear superposition of d̃x and d̃y with coefficients a and b is defined by¹¹

(a d̃x + b d̃y)(Δx) = a Δx + b Δy.    (2.3.2)
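Equations (2.3.1) and (2.3.2) say that d̃x and d̃y are nothing more than linear functions acting on displacement vectors. A direct transliteration into code (the helper names are hypothetical, chosen for this sketch):

```python
def dx_tilde(delta):
    """The one-form d~x: picks out the x-component of a displacement."""
    return delta[0]

def dy_tilde(delta):
    """The one-form d~y: picks out the y-component of a displacement."""
    return delta[1]

def combine(a, b):
    """The linear superposition a d~x + b d~y of Eq. (2.3.2)."""
    return lambda delta: a * dx_tilde(delta) + b * dy_tilde(delta)

form = combine(2.0, 3.0)
print(form((1.0, 1.0)))  # 2*1 + 3*1 = 5.0
```

The value produced is a plain number; the "form" itself is the function, not the number.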
In practice Δx and Δy will always be infinitesimal quantities and the differentials will be part of a "linearized" or "first term in Taylor series" procedure. Consider a scalar function h(x, y); for concreteness let us take h(x, y) to be the elevation above sea level at location (x, y). By restricting oneself to a small enough region about some reference location (x₀, y₀), h(x, y) can be linearized, i.e., approximated by a linear expansion

h(x, y) ≈ h(x₀, y₀) + a (x − x₀) + b (y − y₀).    (2.3.3)

In the language of differential forms this same equation is written as

d̃h = a d̃x + b d̃y,    (2.3.4)

where, evidently,

a = ∂h/∂x |_(x₀,y₀),   b = ∂h/∂y |_(x₀,y₀).    (2.3.5)
This shows that d̃h is closely connected to the gradient of ordinary vector analysis. It is not the same thing, though, since the ordinary gradient is orthogonal to contours of constant h and the concept of orthogonality has not yet been introduced. Note that d̃h is independent of h(x₀, y₀). (Neglecting availability of oxygen and dependence of g on geographic location, the difficulty of climbing a hill is independent of the elevation at its base.) Returning to the map of Colorado, imagine a trip made up of numerous path intervals x(i). The change in elevation h(i) during the incremental path interval x(i) is given by
h(i) = a(i) x(i) + b(i) y(i) = (a(i) d̃x + b(i) d̃y)(x(i)) = d̃h(i)(x(i)).    (2.3.6)
¹⁰Though the value of a differential form acting on a vector is a real number, in general it is not a scalar. A possibly helpful mnemonic feature of the notation is that to produce a regular-face quantity from a boldface quantity requires the use of another boldface quantity.
¹¹That the symbols d̃x and d̃y are not ordinary differentials is indicated by the boldface type and the overhead tildes. They are being newly defined here. Unfortunately, a more common notation is to represent a differential form simply as dx; with this notation it is necessary to distinguish by context between differential forms and ordinary differentials. A converse ambiguity in our terminology is that it may not be clear whether the term differential form means a d̃x + b d̃y or d̃h.
Since this equation resembles Eq. (2.2.3), it can also be written as

h(i) = ⟨d̃h(i), x(i)⟩.    (2.3.7)
The total change of elevation, h, can be obtained by summing over the incremental paths:

h = Σᵢ ⟨d̃h(i), x(i)⟩.    (2.3.8)

As usual, such a summation becomes an integral in the limit of small steps,

h = ∫_B^E ⟨d̃h, dx⟩;    (2.3.9)
the lower and upper limits of integration correspond to the beginning B and end E of the trip. Though the notation is highly abbreviated, it has an unambiguous, coordinate-free meaning that makes it clear that the result is invariant, in the sense discussed above. The formula has a seeming excess of differentials but, when expanded in components, it takes on a more customary appearance:

h = ∫_B^E (a(x) dx + b(x) dy).    (2.3.10)
Example 2.3.1: Three points Pᵢ, i = 1, 2, 3, with coordinates (x(i), y(i), z(i)), are fixed in ordinary space. (1) Dividing Eq. (2.2.1) by d, find the coefficients in the equation

x (a/d) + y (b/d) + z (c/d) = 1    (2.3.11)

of the plane passing through the points. (2) Defining h(x, y) as the elevation z at point (x, y), evaluate d̃h at the point P₁. (3) For a general point P whose horizontal displacements relative to P₁ are given by (Δx = x − x(1), Δy = y − y(1)), find its elevation Δh = z − z(1) relative to P₁.

SOLUTION: (a) Ratios of the coefficients (a, b, c, d) are obtained by substituting the known points into Eq. (2.3.11) and inverting:
( a' )   ( a/d )   ( x(1)  y(1)  z(1) )⁻¹ ( 1 )
( b' ) ≡ ( b/d ) = ( x(2)  y(2)  z(2) )   ( 1 ) .    (2.3.12)
( c' )   ( c/d )   ( x(3)  y(3)  z(3) )   ( 1 )
(b) Replacing z by h and defining h₁ = d/c − (a/c) x(1) − (b/c) y(1), Eq. (2.3.3) becomes

h(x, y) = h₁ − (a/c)(x − x(1)) − (b/c)(y − y(1)).
Since the ratios a′/c′ and b′/c′ are available from Eq. (2.3.12), the required differential form, Eq. (2.3.4), is given by

d̃h = −(a′/c′) d̃x − (b′/c′) d̃y.

(c) The elevation of P relative to P₁ is

Δh = −(a′/c′) Δx − (b′/c′) Δy.
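The linear algebra of this solution can be verified in a few lines. The sketch below (helper names and the three sample points are my own choices, not the text's) carries out Eq. (2.3.12) and checks the resulting Δh formula against the known elevations of the points themselves:

```python
import numpy as np

# Three non-collinear points, rows are P1, P2, P3 (illustrative values):
P = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])

# Eq. (2.3.12): (a', b', c') = (a/d, b/d, c/d) from the point coordinates.
a_p, b_p, c_p = np.linalg.inv(P) @ np.ones(3)

def delta_h(dx, dy):
    """Part (3): elevation change relative to P1,
    delta_h = -(a'/c') dx - (b'/c') dy."""
    return -(a_p / c_p) * dx - (b_p / c_p) * dy

# Consistency check: the formula reproduces z(2) - z(1) and z(3) - z(1).
print(delta_h(0.0 - 1.0, 1.0 - 0.0))  # 1.0  (= 3 - 2)
print(delta_h(0.0 - 1.0, 0.0 - 0.0))  # -1.0 (= 1 - 2)
```

For these points the plane is z = 1 + x + 2y, so d̃h has coefficients (1, 2), in agreement with the printed values.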
Problem 2.3.1: (a) For points P₁, P₂, P₃ given by (1, 0, 0), (0, 1, 0), (0, 0, 1), check the formula just derived for Δh by applying it to each of P₁, P₂, and P₃. (b) The coordinates of three well-known locations in Colorado, Denver, Pike's Peak, and Colorado Springs, are, respectively, W. longitudes 105.1°, 105.1°, and 104.8°; N. latitudes 39.7°, 38.8°, and 38.8°; and elevations 5280 feet, 14,100 feet, and 5280 feet. Making the (thoroughly unwarranted) assumption that the town of Golden, situated at 105.2°W, 39.7°N, lies on the plane defined by the previous three locations, find its elevation.

At this point we can anticipate one implication of these results for mechanics. Recall the connection between elevation h and potential energy U = mgh in the earth's gravitational field. Also recall the connection between work W and potential energy U. To make the equation traditionally expressed as ΔU = ΔW = F · Δx meaningful, the vectorial character of force F has to differ from that of the displacement Δx. In particular, since Δx is a contravariant vector, the force F should be a covariant vector (meaning its symbol should be F̃) for the work to be coordinate independent. In the traditional pedagogy of physics, covariant and contravariant vectors are usually differentiated on the basis of the behavior of their components under coordinate transformations. Note, though, that in our discussion the quantities ã and x have been introduced and distinguished, and meaning was assigned to the form ⟨ã, x⟩, before any change of coordinates has even been contemplated. This, so far, is the essential relationship between covariant and contravariant vectors.¹² Since the ideas expressed so far, though not difficult, may seem unfamiliar, recapitulation in slightly different terms may be helpful. It has been found useful to associate contravariant vectors with independent variables, like x and y, and covariant vectors (or one-forms) with dependent variables, like h. Knowing h(x, y), one can prepare a series of contours of constant h, separated from each other by one unit

¹²Because the components of vectors vary in such a way as to preserve scalar invariants, a common though somewhat archaic terminology refers to vectors as invariants or as invariant vectors, in spite of the facts that (1) their components vary and (2) the expression is redundant anyway. (Note especially that invariant here does not mean constant.) Nowadays the term tensor automatically carries this connotation of invariance. In special relativity the phrase manifestly covariant (or simply covariant) means the same thing, but this is a different (though related) meaning of our word covariant. Our policy, whenever the invariant aspect is to be specially emphasized, is to use the term true vector, even though it is redundant.
of "elevation," and plot them on the (x, y) plane. For a change (Δx, Δy) in independent variables (defined by a contravariant vector), by counting the number of contours (defined by a covariant vector) crossed, the change in dependent variable can be determined. We have been led to a somewhat unconventional and cumbersome notation (with d̃x being the function that picks out the x-component of an arbitrary vector) so that the symbol dx can retain its traditional physics meaning as an infinitesimal deviation of x. In mathematical literature the symbol dx all by itself typically stands for a differential one-form. Furthermore, we have so far only mentioned one-forms. When a two-form such as dx dy is introduced in mathematical literature, it is taken implicitly (roughly speaking) to pick out the area defined by dx and dy rather than the product of two infinitesimals. We will return to these definitions shortly. There is an important potential source of ambiguity in traditional discussion of mechanics by physicists, and it is one of the reasons mathematicians prefer different terminology for differentials: a symbol such as x is used to stand both for where a particle is and where it could conceivably be.¹³ This is arguably made clearer by mathematicians' notation. Since we wish to maintain physics usage (not to defend it, only to make formulas look familiar) we will use differential forms as much to demystify them as to exploit their power.
2.3.2. Examples Illustrating the Calculus of Differential Forms

Even more than the previous section, since the material in this section will not be required for some time, the reader might be well-advised only to skim over it, planning to address it more carefully later, not because the material is difficult but because its motivation may be unclear. Furthermore the notation here will be far from standard as we attempt to metamorphose from old-fashioned notation to more modern notation. (In any case, since there is no universally accepted notation for this material, it is impossible to use "standard" notation.) For the same reason, there may seem to be inconsistencies even internal to this section. All this is a consequence mainly of our insistence on maintaining a distinction between two types of "differential," dx and d̃x. Eventually, once the important points have been made, it will be possible to shed some of the notational complexity. A notation we will use temporarily for a differential form such as the one defined in Eq. (2.3.4) is

ω̃[d] = f(x, y) d̃x + g(x, y) d̃y.    (2.3.13)
The only purpose of the "argument" d in square brackets here is to correlate ω̃ with the particular coordinate differentials d̃x and d̃y, as contrasted, say, with two independent differentials δ̃x and δ̃y:

ω̃[δ] = f(x, y) δ̃x + g(x, y) δ̃y.    (2.3.14)
¹³If constraints are present, x can also stand for a location where the mass could not conceivably be.
The δ symbol does not signify some kind of differential operator other than d; it simply allows notationally for the later assignment of independent values to the differently named differentials. Square brackets are used to protect against interpretation of d or δ as an ordinary argument of ω̃. One can develop a calculus of such differential forms. Initially we proceed to do this by treating the differentials as if they were the "old-fashioned" type familiar from physics and freshman calculus. Notationally we indicate this by leaving off the overhead tildes and not using boldface symbols; hence

ω[d] = f(x, y) dx + g(x, y) dy.    (2.3.15)
The differential δω[d] = δ(ω[d]) is defined by

δω[d] = δf dx + δg dy = (∂f/∂x δx + ∂f/∂y δy) dx + (∂g/∂x δx + ∂g/∂y δy) dy.    (2.3.16)

Since these are ordinary differentials, if f and g were force components, δω[d] would be the answer to the question, "How much more work is done in displacement (dx, dy) from displaced location P + δP = (x + δx, y + δy) than is done in displacement (dx, dy) from point P = (x, y)?" δω[d] is not the same as dω[δ] but, from the two, the combination

B.C.[ω] ≡ δω[d] − dω[δ]    (2.3.17)
can be formed; it is to be known as the "bilinear covariant" of ω. After further manipulation it will yield the "exterior derivative" of ω.
Example 2.3.2: Consider the example

ω[d] = y dx + x dy.    (2.3.18)
Substituting into Eq. (2.3.16), we obtain

δω[d] = δy dx + δx dy,   dω[δ] = dy δx + dx δy.    (2.3.19)
Notice, in this case, that the bilinear covariant vanishes, δω[d] − dω[δ] = 0.
This is not always true, however; operating with d and operating with δ do not "commute"; that is, δω[d] and dω[δ] are different. But products such as dx δy and δy dx are the same; they are simply the products of two (a physicist might say tiny) independently assignable coordinate increments. When its bilinear covariant does, in fact, vanish, ω is said to be "closed." In the case of Eq. (2.3.18), ω[d] is "derivable from" a function of position h(x, y) = xy according to

ω[d] = dh(x, y) = y dx + x dy.    (2.3.20)
In this circumstance (of being derivable from a single-valued function), ω is said to be "an exact differential."

Problem 2.3.2: Show that the bilinear covariant B.C.[ω] of the differential one-form ω[d] = dh(x, y) vanishes for arbitrary function h(x, y).
Example 2.3.3: For the differential form

ω[d] = y dx,    (2.3.21)

one sees that

δω[d] − dω[δ] = δy dx − dy δx,

which does not vanish. But if we differentiate once again (introducing D as yet another symbol to indicate a differential operator) we obtain

D(δy dx − dy δx) = 0,

since the coefficients of the differentials being differentiated are now simply constants.
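Both examples can be checked symbolically. For ω[d] = f dx + g dy, applying Eq. (2.3.16) (and its d-δ mirror image) and subtracting gives the bilinear covariant of Eq. (2.3.17); the sketch below (my own construction, using sympy, with the increments treated as independent symbols) evaluates it:

```python
import sympy as sp

x, y = sp.symbols('x y')
dx, dy, sx, sy = sp.symbols('dx dy deltax deltay')  # independent increments

def bilinear_covariant(f, g):
    """B.C.[omega] = delta omega[d] - d omega[delta] for omega = f dx + g dy,
    per Eqs. (2.3.16)-(2.3.17)."""
    delta_omega_of_d = (sp.diff(f, x)*sx + sp.diff(f, y)*sy)*dx \
                     + (sp.diff(g, x)*sx + sp.diff(g, y)*sy)*dy
    d_omega_of_delta = (sp.diff(f, x)*dx + sp.diff(f, y)*dy)*sx \
                     + (sp.diff(g, x)*dx + sp.diff(g, y)*dy)*sy
    return sp.expand(delta_omega_of_d - d_omega_of_delta)

print(bilinear_covariant(y, x))              # Example 2.3.2: closed, prints 0
print(bilinear_covariant(y, sp.Integer(0)))  # Example 2.3.3: equals
                                             # deltay*dx - deltax*dy, not closed
```

Only the antisymmetric combination ∂g/∂x − ∂f/∂y survives the subtraction, which is the point of the construction.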
Problem 2.3.3: For ω[d] given by Eq. (2.3.15), with f(x, y) and g(x, y) being general functions, show that its bilinear covariant does not vanish in general.

We have been proceeding as if our differentials were "ordinary," but to be consistent with our "new" notation Eq. (2.3.16) should have been written

δ̃ω̃[d̃] = (∂f/∂x δ̃x + ∂f/∂y δ̃y) d̃x + (∂g/∂x δ̃x + ∂g/∂y δ̃y) d̃y,    (2.3.22)

with the result being a two-form, a function of two vectors, say Δx(1) and Δx(2). Applying Eq. (2.3.1), this equation leads to

δ̃ω̃[d̃](Δx(1), Δx(2)) = (∂f/∂x Δx(1) + ∂f/∂y Δy(1)) Δx(2) + (∂g/∂x Δx(1) + ∂g/∂y Δy(1)) Δy(2).    (2.3.23)

Except for the renaming of symbols, δx → Δx(1), dx → Δx(2), δy → Δy(1), and dy → Δy(2), this is the same as Eq. (2.3.16). Hence, though the original symbols had different meanings, Eqs. (2.3.16) and (2.3.22) have equivalent content. For this to be true we have implicitly assumed that the first of the two arguments Δx(1) and Δx(2) is acted on by the δ̃ differential form and the second by the d̃ form. Since these appear in the same order in every term we could as well say that the first form acts on Δx(1) and the second on Δx(2). Furthermore, since Eq. (2.3.13) made no distinction
between δ and d forms, we might as well have written Eq. (2.3.22) as

ω̃[d][d] = (∂f/∂x d̃x + ∂f/∂y d̃y) d̃x + (∂g/∂x d̃x + ∂g/∂y d̃y) d̃y,    (2.3.24)
as long as it were understood that in a product of two differential forms the first acts on the first argument and the second on the second. Note though that in spite of the fact that it is legitimate to reverse the order in a product of actual displacements like Δx(1) Δy(2), it is illegitimate to reverse the order of the terms in a product like d̃x d̃y; that is,

d̃x d̃y ≠ d̃y d̃x.    (2.3.25)
The failure to commute of our quantities, which will play such an important role in the sequel, has entered here as a simple consequence of our notational convention specifying the meaning of the differential of a differential. How then to express the bilinear covariant, without using the distinction between d and δ? Instead of antisymmetrizing with respect to d and δ, we can antisymmetrize with respect to the arguments. A "new notation" version of Eq. (2.3.17), with ω̃ still given by Eq. (2.3.13), can be written

B.C.[ω̃](Δx(1), Δx(2)) = (−∂f/∂y + ∂g/∂x)(Δx(1) Δy(2) − Δy(1) Δx(2)).    (2.3.26)

This can be reexpressed by defining the "wedge product"

d̃x ∧ d̃y = d̃x d̃y − d̃y d̃x.    (2.3.27)

Note, from its definition, that

d̃x ∧ d̃y = −d̃y ∧ d̃x,   and   d̃x ∧ d̃x = 0.    (2.3.28)
We obtain

d̃x ∧ d̃y (Δx(1), Δx(2)) = Δx(1) Δy(2) − Δy(1) Δx(2),    (2.3.29)

which can be substituted into Eq. (2.3.26). Since the arbitrary increments Δx(1) and Δx(2) then appear as common arguments on both sides of Eq. (2.3.26) they can be suppressed as we define a two-form E.D.[ω̃], which is B.C.[ω̃](Δx(1), Δx(2)) with its arguments unevaluated:

E.D.[ω̃] = (−∂f/∂y + ∂g/∂x) d̃x ∧ d̃y.    (2.3.30)
When operating on any two vector increments, E.D.[ω̃] generates the bilinear covariant of ω̃ evaluated for the two vectors. This newly defined differential two-form is known as the "exterior differential" of the differential one-form ω̃. From here on this will be written

d̃ω̃ = (−∂f/∂y + ∂g/∂x) d̃x ∧ d̃y.    (2.3.31)

Note that this relation is consistent with the rule d̃(f d̃x) = d̃f ∧ d̃x. The vectors Δx(1) and Δx(2) can be said to have played only a "catalytic" role in the definition of the exterior derivative since they no longer appear in Eq. (2.3.31). From its appearance, one might guess that the exterior derivative is related to the curl operator of vector analysis. This is to be pursued next.
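The guess can be checked symbolically in this two-dimensional setting: the coefficient (−∂f/∂y + ∂g/∂x) appearing in Eq. (2.3.31) is the z-component of ∇ × (f, g, 0) from vector analysis. A short sympy sketch (the sample coefficient functions are arbitrary choices of mine):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y           # sample one-form coefficients: omega~ = f d~x + g d~y
g = sp.sin(x) + y

# Coefficient of d~x ^ d~y in the exterior derivative, Eq. (2.3.31):
ext_deriv_coeff = -sp.diff(f, y) + sp.diff(g, x)

# z-component of curl(f, g, 0) from elementary vector analysis:
curl_z = sp.diff(g, x) - sp.diff(f, y)

print(sp.simplify(ext_deriv_coeff - curl_z))  # 0
```

The agreement is an algebraic identity here; the conceptual distinction (no metric, no orthogonality) is what the text is emphasizing.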
2.3.3. Connections between Differential Forms and Vector Calculus

Like nails and screws, the calculus of vectors and the calculus of differential forms can be regarded as essentially similar or as essentially different depending on one's point of view. Both can be used to hold physical theories together. A skillful carpenter can hammer together much of a house while the cabinet maker is still drilling the holes in the kitchen cabinets. Similarly the physicist can derive and solve Maxwell's equations using vector analysis while the mathematician is still tooling up the differential form machinery. The fact is, though, that some structures cannot be held together with nails and some mechanical systems cannot be analyzed without differential forms. There is a spectrum of levels of ability in the use of vectors, starting from no knowledge whatsoever, advancing through vector algebra, to an understanding of gradients, curls, and divergences, to a skillful facility with the methods. The corresponding spectrum is even broader for differential forms, which can be used to solve all the problems that vectors can solve plus others as well. In spite of this, most physicists remain at the "no knowledge whatsoever" end of the spectrum. This is perhaps partly due to some inherent advantage of simplicity that vectors have for solving the most commonly encountered problems of physics, but the accidents of pedagogical fashion probably also play a role. According to Arnold, in Mathematical Methods of Classical Mechanics (1989), "Hamiltonian mechanics cannot be understood without differential forms."¹⁵ It behooves us therefore to make a start on this subject. But in this text only a fairly superficial treatment will be included (the rationale being that the important and hard thing is to get the general idea, but that following specialized texts is not so difficult once one has the general idea).
The whole of advanced calculus can be formulated in terms of differential forms, as can more advanced topics, and there are several texts

¹⁴The notation of Eq. (2.3.31) is still considerably bulkier than is standard in the literature of differential forms. There, the quantity (exterior derivative) that we have called E.D.$[\widetilde\omega]$ is often expressed simply as $d\omega$, and Eq. (2.3.31) becomes $d\omega = (-\partial f/\partial y + \partial g/\partial x)\,dx\wedge dy$, or even $d\omega = (-\partial f/\partial y + \partial g/\partial x)\,dx\,dy$.

¹⁵It might be more accurate to say that "without differential forms one cannot understand Hamiltonian mechanics as well as Arnold," but this statement would be true with or without differential forms.
concentrating narrowly yet accessibly on these subjects. Here we are more interested in giving the general ideas than in either rigorous mathematical proof or practice with the combinatorics that are needed to make the method compete with vector analysis in compactness. The purpose of this section is to show how formulas that are (assumed to be) already known from vector calculus can be expressed using differential forms. Since these results are known, it will not be necessary to prove them in the context of differential forms. This will permit the following discussion to be entirely formal, its only purpose being to show that the definitions and relations being introduced are consistent with results already known. We work only with ordinary, three-dimensional, Euclidean geometry, using rectangular coordinates. It is far from true that the validity of differential forms is restricted to this domain, but our purpose is only to motivate the basic definitions. One way that Eq. (2.3.13) can be generalized is to go from two to three dimensions:

$$\widetilde\omega^{(1)} = f(x,y,z)\,\widetilde{dx} + g(x,y,z)\,\widetilde{dy} + h(x,y,z)\,\widetilde{dz},\qquad(2.3.32)$$

where the superscript (1) indicates that $\widetilde\omega^{(1)}$ is a one-form. Calculations like those leading to Eq. (2.3.31) yield

$$\widetilde{d}\,\widetilde\omega^{(1)} = \Big(\frac{\partial h}{\partial y}-\frac{\partial g}{\partial z}\Big)\widetilde{dy}\wedge\widetilde{dz} + \Big(\frac{\partial f}{\partial z}-\frac{\partial h}{\partial x}\Big)\widetilde{dz}\wedge\widetilde{dx} + \Big(\frac{\partial g}{\partial x}-\frac{\partial f}{\partial y}\Big)\widetilde{dx}\wedge\widetilde{dy}.\qquad(2.3.33)$$

Next let us generalize Eq. (2.3.27) by defining

$$\widetilde\omega^{(2)} = f(x,y,z)\,\widetilde{dy}\wedge\widetilde{dz} + g(x,y,z)\,\widetilde{dz}\wedge\widetilde{dx} + h(x,y,z)\,\widetilde{dx}\wedge\widetilde{dy},\qquad(2.3.34)$$

and

$$\widetilde{dx}\wedge\widetilde{dy}\wedge\widetilde{dz}\,\big(\Delta\mathbf{x}_{(1)},\Delta\mathbf{x}_{(2)},\Delta\mathbf{x}_{(3)}\big) = \det\begin{pmatrix}\Delta x_{(1)} & \Delta x_{(2)} & \Delta x_{(3)}\\ \Delta y_{(1)} & \Delta y_{(2)} & \Delta y_{(3)}\\ \Delta z_{(1)} & \Delta z_{(2)} & \Delta z_{(3)}\end{pmatrix},\qquad(2.3.35)$$

where the superscript (2) indicates that $\widetilde\omega^{(2)}$ is a two-form. At first glance this may seem to be a rather ad hoc and special form, but any two-form that is antisymmetric in its two arguments can be expressed this way.¹⁶ We then define the exterior

¹⁶In most treatments of differential forms the phrase "antisymmetric two-form" would be considered redundant, since "two-forms" would have already been defined to be antisymmetric.
differential,

$$\widetilde{d}\,\widetilde\omega^{(1)} \equiv \widetilde{df}\wedge\widetilde{dx} + \widetilde{dg}\wedge\widetilde{dy} + \widetilde{dh}\wedge\widetilde{dz},\qquad(2.3.36)$$

$$\widetilde{d}\,\widetilde\omega^{(2)} \equiv \widetilde{df}\wedge\widetilde{dy}\wedge\widetilde{dz} + \widetilde{dg}\wedge\widetilde{dz}\wedge\widetilde{dx} + \widetilde{dh}\wedge\widetilde{dx}\wedge\widetilde{dy}.\qquad(2.3.37)$$

These definitions are only special cases of more general definitions, but they are all we require for now. From Eq. (2.3.37), using Eqs. (2.3.28), we obtain
$$\widetilde{d}\,\widetilde\omega^{(2)} = \Big(\frac{\partial f}{\partial x}+\frac{\partial g}{\partial y}+\frac{\partial h}{\partial z}\Big)\,\widetilde{dx}\wedge\widetilde{dy}\wedge\widetilde{dz}.\qquad(2.3.38)$$

Let us recapitulate the formulas that have been derived, but using notation for the coefficients that is more suggestive than the functions $f(x,y,z)$, $g(x,y,z)$, and $h(x,y,z)$ used so far:

$$\widetilde\omega^{(0)} = \phi,\qquad \widetilde\omega^{(1)} = E_x\,\widetilde{dx}+E_y\,\widetilde{dy}+E_z\,\widetilde{dz},\qquad \widetilde\omega^{(2)} = B_x\,\widetilde{dy}\wedge\widetilde{dz}+B_y\,\widetilde{dz}\wedge\widetilde{dx}+B_z\,\widetilde{dx}\wedge\widetilde{dy}.\qquad(2.3.39)$$

Then Eqs. (2.3.4), (2.3.33), and (2.3.38) become

$$\widetilde{d}\phi = \frac{\partial\phi}{\partial x}\widetilde{dx}+\frac{\partial\phi}{\partial y}\widetilde{dy}+\frac{\partial\phi}{\partial z}\widetilde{dz},$$
$$\widetilde{d}\,\widetilde\omega^{(1)} = \Big(\frac{\partial E_z}{\partial y}-\frac{\partial E_y}{\partial z}\Big)\widetilde{dy}\wedge\widetilde{dz}+\Big(\frac{\partial E_x}{\partial z}-\frac{\partial E_z}{\partial x}\Big)\widetilde{dz}\wedge\widetilde{dx}+\Big(\frac{\partial E_y}{\partial x}-\frac{\partial E_x}{\partial y}\Big)\widetilde{dx}\wedge\widetilde{dy},$$
$$\widetilde{d}\,\widetilde\omega^{(2)} = \Big(\frac{\partial B_x}{\partial x}+\frac{\partial B_y}{\partial y}+\frac{\partial B_z}{\partial z}\Big)\widetilde{dx}\wedge\widetilde{dy}\wedge\widetilde{dz}.\qquad(2.3.40)$$

We can now write certain familiar equations as equations satisfied by differential forms. For example,

$$\widetilde{d}\,\widetilde\omega^{(2)} = 0\quad\text{is equivalent to}\quad \nabla\cdot\mathbf{B} = 0.\qquad(2.3.41)$$
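As a consistency check on Eqs. (2.3.40), the identity $\widetilde d(\widetilde d\,\widetilde\omega) = 0$, specialized to these formulas, reproduces the familiar identities $\nabla\times(\nabla\phi) = 0$ and $\nabla\cdot(\nabla\times\mathbf{E}) = 0$. A small symbolic verification (Python/sympy is used here for illustration; it is not part of the text):

```python
# Symbolic check: curl(grad phi) = 0 and div(curl E) = 0, the vector-calculus
# content of the identity d(d omega) = 0 applied to Eqs. (2.3.40).
import sympy as sp

x, y, z = sp.symbols('x y z')
phi = sp.Function('phi')(x, y, z)
E = [sp.Function(n)(x, y, z) for n in ('E_x', 'E_y', 'E_z')]

grad = lambda f: [sp.diff(f, v) for v in (x, y, z)]
curl = lambda F: [sp.diff(F[2], y) - sp.diff(F[1], z),
                  sp.diff(F[0], z) - sp.diff(F[2], x),
                  sp.diff(F[1], x) - sp.diff(F[0], y)]
div = lambda F: sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

print(curl(grad(phi)))             # [0, 0, 0]
print(sp.simplify(div(curl(E))))   # 0
```

Mixed partial derivatives commute, which is all that either cancellation uses.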
The three-form $\widetilde{d}\,\widetilde\omega^{(2)}$ is "waiting to be evaluated" on coordinate increments as in Eq. (2.3.32); this includes the "Jacobian factor" in a volume integration of $\nabla\cdot\mathbf{B}$. The equation $\widetilde{d}\,\widetilde\omega^{(2)} = 0$ therefore represents the "divergence-free" nature of the vector $\mathbf{B}$. While $\nabla\cdot\mathbf{B}$ is the integrand in the integral form of this law, $\widetilde{d}\,\widetilde\omega^{(2)}$ includes also the Jacobian factor in the same integral. When, as here, orthonormal coordinates are used as the variables of integration, this extra factor is trivially equal to 1, but in other coordinates the distinction is more substantial. However, since the Jacobian factor cannot vanish, it cannot influence the vanishing of the integrand. This discussion of integrands is expanded upon in Section 4.2. Here are some other examples of familiar equations expressed using differential forms:
$$\widetilde\omega^{(1)} = -\widetilde{d}\,\widetilde\omega^{(0)},\quad\text{equivalent to}\quad \mathbf{E} = -\nabla\phi,\qquad(2.3.42)$$

yields the "electric field" as the (negative) gradient of the potential. Also

$$\widetilde{d}\,\widetilde\omega^{(1)} = 0,\quad\text{equivalent to}\quad \nabla\times\mathbf{E} = 0,\qquad(2.3.43)$$
states that $\mathbf{E}$ is "irrotational" (that is, the curl of $\mathbf{E}$ vanishes). The examples given so far have been applicable only to time-independent problems such as electrostatics. But let us define

$$\widetilde\omega^{(3)} = \big[J_x(x,y,z,t)\,\widetilde{dy}\wedge\widetilde{dz} + J_y(x,y,z,t)\,\widetilde{dz}\wedge\widetilde{dx} + J_z(x,y,z,t)\,\widetilde{dx}\wedge\widetilde{dy}\big]\wedge\widetilde{dt} - \rho(x,y,z,t)\,\widetilde{dx}\wedge\widetilde{dy}\wedge\widetilde{dz}.\qquad(2.3.44)$$

Then

$$\widetilde{d}\,\widetilde\omega^{(3)} = 0\quad\text{is equivalent to}\quad \nabla\cdot\mathbf{J} + \frac{\partial\rho}{\partial t} = 0,\qquad(2.3.45)$$
which is known as the "continuity equation." In physics such relations relate "fluxes" and "densities." This is developed further in Section 4.4.2. Another familiar equation can be obtained by defining

$$\widetilde A^{(1)} = A_x\,\widetilde{dx} + A_y\,\widetilde{dy} + A_z\,\widetilde{dz} - \phi\,\widetilde{dt}.\qquad(2.3.46)$$

Then the equation

$$\widetilde{d}\,\widetilde A^{(1)} = \widetilde\omega^{(2)} + \widetilde\omega^{(1)}\wedge\widetilde{dt}\qquad(2.3.47)$$

is equivalent to the pair of equations

$$\mathbf{B} = \nabla\times\mathbf{A},\qquad \mathbf{E} = -\frac{\partial\mathbf{A}}{\partial t} - \nabla\phi.\qquad(2.3.48)$$
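Equations (2.3.48) can be checked symbolically: fields derived from potentials $\mathbf{A}$ and $\phi$ in this way automatically satisfy the homogeneous Maxwell equations $\nabla\cdot\mathbf{B} = 0$ and $\nabla\times\mathbf{E} = -\partial\mathbf{B}/\partial t$. A sketch (Python/sympy, for illustration only):

```python
# Symbolic check: the potentials of Eq. (2.3.48) automatically satisfy the
# homogeneous Maxwell equations div B = 0 and curl E = -dB/dt.
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
A = [sp.Function(n)(x, y, z, t) for n in ('A_x', 'A_y', 'A_z')]
phi = sp.Function('phi')(x, y, z, t)

curl = lambda F: [sp.diff(F[2], y) - sp.diff(F[1], z),
                  sp.diff(F[0], z) - sp.diff(F[2], x),
                  sp.diff(F[1], x) - sp.diff(F[0], y)]
div = lambda F: sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

B = curl(A)                                              # B = curl A
E = [-sp.diff(A[i], t) - sp.diff(phi, v)                 # E = -dA/dt - grad phi
     for i, v in enumerate((x, y, z))]

print(sp.simplify(div(B)))                                          # 0
print([sp.simplify(c + sp.diff(b, t)) for c, b in zip(curl(E), B)]) # [0, 0, 0]
```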
These examples show that familiar vector equations can be reexpressed as equations satisfied by differential forms. All these equations are developed further in Chapter 12. The full analogy between forms and vectors, in particular including cross products, requires the introduction of "supplementary" multivectors, also known as "the star (*) operation." This theory is developed in Section 4.2.4. What are the features of these newly introduced differential forms derived by exterior differentiation? We state some of them, without proof for now.

• The forms derived in this way inevitably find themselves acting as the "differential elements" of multidimensional integrals. When one recalls two of the important difficulties in formulating multidimensional integrals, evaluating the appropriate Jacobian and keeping track of sign reversals, one will be happy to know that these exterior derivatives "take care of both problems." The true power of the exterior derivatives is that this formalism works for spaces of arbitrary dimension, though formidable combinatorial calculations may be necessary. We will return to this subject in Sections 4.3.2 and 4.4.

• The differential forms "factor out" the arbitrary incremental displacements, such as $\Delta\mathbf{x}_{(1)}$ and $\Delta\mathbf{x}_{(2)}$ in the above discussion, leaving them implicit rather than explicit. This overcomes the inelegant need for introducing different differential symbols such as $d$ and $\delta$. Though this feature is not particularly hard to grasp (it has been thoroughly expounded upon here), not being part of the traditional curriculum encountered by scientists, it is what causes the equations to have an unfamiliar appearance.

• The quantities entering the equations of physics such as Maxwell's equations are, traditionally, physically measurable vectors, such as the electric field $\mathbf{E}$, that are naturally visualized as arrows. When written in terms of forms, invariant combinations of forms and vectors, such as $\langle\widetilde{\mathbf{E}}, \Delta\mathbf{x}\rangle$, more naturally occur.

• Something resembling this observation has no doubt already been encountered in sophomore electricity and magnetism. Traditionally, after first encountering Maxwell's equations in integral form, one uses vector analysis to transform them into relations among curls and divergences relating space and time derivatives of the electric and magnetic quantities. Though the differential versions of Maxwell's equations fit more neatly on tee shirts, the integral versions are just as fundamental; their integrands are invariant products like $\langle\widetilde{\mathbf{E}}, \Delta\mathbf{x}\rangle$. It is only when the differential versions of these equations are expressed in terms of exterior derivatives instead of curls and divergences that they acquire an unfamiliar appearance.

• By far the most fundamental property of the calculus of differential forms is that they make the equations manifestly invariant, that is, independent of coordinates. Of course this is also the chief merit of the vector operators, gradient, divergence, and curl.
Remembering the obscurity surrounding these operators when they were first encountered (some of which perhaps still lingers in the case of curl), one has to anticipate a considerable degree of difficulty in generalizing these concepts, which is what the differential forms do. In this section only a small start has been made toward establishing this invariance; the operations of vector differentiation, known within vector analysis to have invariant character, have been expressed by differential forms. Having said all this, it should also be recognized that the differential forms really amount to just a sophisticated form of advanced calculus.
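The claim that the forms "take care of" the Jacobian can be illustrated in the simplest nontrivial case, the change from Cartesian to polar coordinates, where $\widetilde{dx}\wedge\widetilde{dy} = r\,\widetilde{dr}\wedge\widetilde{d\theta}$. A small symbolic check (this example, and the use of Python/sympy, are illustrative and not from the text):

```python
# Illustration: under x = r cos(theta), y = r sin(theta), the antisymmetrized
# product dx ^ dy acquires the Jacobian factor r automatically:
# dx ^ dy = r dr ^ dtheta.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = r * sp.cos(th)
y = r * sp.sin(th)

# Coefficient of dr ^ dtheta in dx ^ dy, using dr^dr = dtheta^dtheta = 0
# and the sign reversal dtheta ^ dr = -dr ^ dtheta:
coeff = sp.simplify(sp.diff(x, r) * sp.diff(y, th)
                    - sp.diff(x, th) * sp.diff(y, r))
print(coeff)  # r
```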
2.4. ALGEBRAIC TENSORS

2.4.1. Vectors and Their Duals

In traditional physics (unless one includes graphical design) there is little need for geometry without algebra ("synthetic geometry"), but algebra without geometry is both possible and important. Though vector and tensor analysis were both motivated initially by geometry, it is useful to isolate their purely algebraic aspects. Everything
that has been discussed so far can be distilled into pure algebra. That will be done in this section, though in far less generality than in the references listed at the end of the chapter. Van der Waerden [4] allows numbers more general than the real numbers we need; Arnold [3] pushes further into differential forms. Most of the algebraic properties of vector spaces are "obvious" to most physicists. Vectors $\mathbf{x}$, $\mathbf{y}$, etc., are quantities for which superposition is valid: for scalars $a$ and $b$, $a\mathbf{x} + b\mathbf{y}$ is also a vector. The dimensionality $n$ of the vector space containing $\mathbf{x}$, $\mathbf{y}$, etc., is the largest number of independent vectors that can be selected. Any vector can be expanded uniquely in terms of $n$ independent basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$:

$$\mathbf{x} = \mathbf{e}_i x^i.\qquad(2.4.1)$$

This provides a one-to-one relationship between vectors $\mathbf{x}$ and $n$-component multiplets $(x^1, x^2, \ldots, x^n)$; for now at least, we will say they are the same thing.¹⁷ In particular, the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ correspond to $(1,0,\ldots,0)$, $(0,1,\ldots,0)$, \ldots, $(0,0,\ldots,1)$. Component-wise addition of vectors and multiplication by a scalar is standard. Important new content is introduced when one defines a real-valued linear function $\widetilde f(\mathbf{x})$ of a vector $\mathbf{x}$; such a function, by definition, satisfies the relations

$$\widetilde f(a\mathbf{x} + b\mathbf{y}) = a\,\widetilde f(\mathbf{x}) + b\,\widetilde f(\mathbf{y}).\qquad(2.4.2)$$

Expanding $\mathbf{x}$ in basis vectors $\mathbf{e}_i$, this yields

$$\widetilde f(\mathbf{x}) = f_i\,x^i = \langle\widetilde f, \mathbf{x}\rangle,\quad\text{where}\quad f_i = \widetilde f(\mathbf{e}_i).\qquad(2.4.3)$$

This exhibits the value of $\widetilde f(\mathbf{x})$ as a linear form in the components $x^i$ with coefficients $f_i$. Now we have a one-to-one correspondence between linear functions $\widetilde f$ and $n$-component multiplets $(f_1, f_2, \ldots, f_n)$. Using language in the loose fashion often applied to vectors, we can say that a linear function of a vector and a linear form in the vector's components are the same thing though, unlike $\widetilde f$, the $f_i$ depend on the choice of basis vectors. This space of linear functions of vectors in the original space is called dual to the original space. With vectors in the original space called contravariant, vectors in the dual space are called covariant. Corresponding to basis vectors $\mathbf{e}_i$ in the original space there is a natural choice of basis vectors $\widetilde{\mathbf{e}}^i$ in the dual space. When acting on $\mathbf{e}_i$, $\widetilde{\mathbf{e}}^i$ yields 1; when acting on any other of the $\mathbf{e}_j$ it yields 0. Just as the components of $\mathbf{e}_1$ are $(1,0,\ldots,0)$, the components of $\widetilde{\mathbf{e}}^1$ are $(1,0,\ldots,0)$, and so on. More concisely,¹⁸

$$\widetilde{\mathbf{e}}^i(\mathbf{e}_j) = \delta^i_{\ j}.\qquad(2.4.4)$$
¹⁷As long as possible we will stick to the colloquial elementary physics usage of refusing to distinguish between a vector and its collection of components, even though the latter depends on the choice of basis vectors while the former does not.

¹⁸There is no immediate significance to the fact that one of the indices of $\delta^i_{\ j}$ is written as a subscript and one as a superscript. Equal to $\delta_{ij}$, $\delta^i_{\ j}$ is also a Kronecker-$\delta$.
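In components, the natural dual basis of Eq. (2.4.4) can be computed explicitly: if the $\mathbf{e}_j$ are arrayed as the columns of a matrix, the $\widetilde{\mathbf{e}}^i$ are the rows of its inverse. A numerical sketch (numpy; the particular basis is an arbitrary illustrative choice):

```python
# Numerical illustration: the dual basis covectors are the rows of the inverse
# of the matrix whose columns are the e_j, so that e~^i(e_j) = delta^i_j
# as in Eq. (2.4.4).
import numpy as np

# Columns of E are the basis vectors e_1, e_2, e_3 (an arbitrary invertible choice).
E = np.column_stack([(1, 0, 0), (1, 1, 0), (0, 1, 1)]).astype(float)

# Rows of E^{-1} are the dual basis covectors e~^1, e~^2, e~^3.
dual = np.linalg.inv(E)

# e~^i(e_j) = delta^i_j:
print(np.allclose(dual @ E, np.eye(3)))  # True
```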
56
GEOMETRY OF MECHANICSI: LINEAR
By taking all linear combinations of a subset of the basis vectors, say the first $m$ of them, where $0 < m < n$, one forms a sub-vector-space $S$ of the original space. Any vector $\mathbf{x}$ in the whole space can be decomposed uniquely into a vector $\mathbf{y} = \sum_{i=1}^{m}\mathbf{e}_i x^i$ in this space and a vector $\mathbf{z} = \sum_{i=m+1}^{n}\mathbf{e}_i x^i$. A "projection operator" $P$ onto the subspace can then be defined by $\mathbf{y} = P\mathbf{x}$. It has the property that $P^2 = P$. Since $\mathbf{x} = P\mathbf{x} + (1-P)\mathbf{x}$, $\mathbf{z} = (1-P)\mathbf{x}$ and $1-P$ projects onto the space formed from the last $n-m$ basis vectors. There is a subspace $S^0$ in the dual space, known as the "annihilator" of $S$; it is the vector space made up of all linear combinations of the $n-m$ forms $\widetilde{\mathbf{e}}^{m+1}, \widetilde{\mathbf{e}}^{m+2}, \ldots, \widetilde{\mathbf{e}}^{n}$. These are the last $n-m$ of the natural basis forms in the dual space, as listed in Eq. (2.4.4). Any form in $S^0$ "annihilates" any vector in $S$, which is to say yields zero when acting on the vector. This relationship is reciprocal in that $S$ annihilates $S^0$. Certainly there are particular forms not in $S^0$ that annihilate certain vectors in $S$, but $S^0$ contains all forms, and only those forms, that annihilate all vectors in $S$. This concept of annihilation is reminiscent of the concept of the orthogonality of two vectors in ordinary vector geometry. It is a very different concept, however, since annihilation relates a vector in the original space and a form in the dual space; an arrow in $S$ such as $\mathbf{e}_1$ crosses no planes corresponding to a form in $S^0$ such as $\widetilde{\mathbf{e}}^{m+1}$. Only if there is a rule associating vectors and forms can annihilation be used to define orthogonality of two vectors in the same space. By introducing linear functions of more than one vector variable we will shortly arrive at the definition of tensors. However, since all other tensors are introduced in the same way as was the dual space, there is no point in proceeding to this definition without first having grasped the concept of the dual space. Toward that end we should eliminate an apparent asymmetry between contravariant vectors and covectors.
This asymmetry has resulted from the fact that we started with contravariant vectors, and hence might be inclined to think of them as more basic. But consider the space of linear functions of covariant vectors, that is, the space that is dual to the space that is dual to the original space. (As an exercise) it can be seen that the dual of the dual is the same thing as the original space. Hence, algebraically at least, which is which between contravariant and covariant vectors is entirely artificial, just like the choice of which is to be designated by superscripts and which by subscripts.
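The decomposition and the annihilator can be realized concretely in components. A numerical sketch (numpy; the standard basis and the choices $n = 3$, $m = 2$ are illustrative, not from the text):

```python
# Numerical sketch (standard basis, n = 3, m = 2): P projects onto the span of
# e_1, e_2; the annihilator S^0 is spanned by the single covector e~^3.
import numpy as np

n, m = 3, 2
P = np.diag([1.0] * m + [0.0] * (n - m))   # projection onto S = span{e_1, e_2}

x = np.array([2.0, -1.0, 5.0])
y, z = P @ x, (np.eye(n) - P) @ x          # unique decomposition x = y + z

print(np.allclose(P @ P, P))               # True: P^2 = P
print(np.allclose(y + z, x))               # True

e3_dual = np.array([0.0, 0.0, 1.0])        # basis covector e~^3 spanning S^0
print(e3_dual @ y)                         # 0.0: forms in S^0 annihilate S
```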
2.4.2. Transformation of Coordinates

When covariant and contravariant vectors are introduced in physics, the distinction between them is usually expressed in terms of the matrices accompanying a change of basis vectors. Suppose a new set of basis vectors $\mathbf{e}'_j$ is related to the original set $\mathbf{e}_j$ by

$$\mathbf{e}'_j = \mathbf{e}_i\,\Lambda^i_{\ j}.\qquad(2.4.5)$$

(If one insists on interpreting this relation as a matrix multiplication, it is necessary to regard the $\mathbf{e}'_j$ and $\mathbf{e}_j$ as being the elements of row vectors, even though the row elements are vectors rather than numbers, and to ignore the distinction between upper and lower indices.)¹⁹ Multiplying on the right by the inverse matrix, the inverse relation is

$$\mathbf{e}_j = \mathbf{e}'_i\,(\Lambda^{-1})^i_{\ j}.\qquad(2.4.6)$$
For formal manipulation of formulas, the index conventions of tensor analysis are simple and reliable, but for numerical calculations it is sometimes convenient to use matrix notation in which multicomponent objects are introduced so that the indices can be suppressed. This is especially useful when using a computer language that can work with matrices as supported types that satisfy their own algebra of addition, multiplication, and scalar multiplication. To begin the attempt to represent the formulas of mechanics in matrix form, some recommended usage conventions will now be formulated, and some of the difficulties in maintaining consistency will be explicitly addressed. Already, in defining the symbols used in Eq. (2.4.5), a conventional choice was made. The new basis vectors were called $\mathbf{e}'_j$ when they could have been called $\mathbf{e}_{j'}$; that is, the prime was placed on the vector symbol rather than on the index. It is a common, and quite powerful, notation to introduce both of these symbols and to use them to express two distinct meanings. (See for example Schutz.) In this notation, even as one "instantiates" an index, say replacing $i$ by 1, one must replace $i'$ by $1'$, thereby distinguishing between $\mathbf{e}_1$ and $\mathbf{e}_{1'}$. In this way, at the cost of further abstraction, one can distinguish change of axes with fixed vector from change of vector with fixed axes. At this point this may seem like pedantry, but confusion attending this distinction between active and passive interpretations of transformations will dog us throughout this text and the subject in general. One will always attempt to define quantities and operations unambiguously in English, but everyday language is by no means optimal for avoiding ambiguity. Mathematical language, such as the distinction between $\mathbf{e}_1$ and $\mathbf{e}_{1'}$ just mentioned, can be much more precise. But, sophisticated as it is, we will not use this notation, because it seems too compact, too mathematical, too cryptic.
Another limitation of matrix notation is that, though it works well for tensors of one or two indices, it is not easily adapted to tensors with more than two indices. Yet another complication follows from the traditional row and column index-order conventions of matrix formalism. It is hard to maintain these features while preserving other desirable features such as lower and upper indices to distinguish between covariant and contravariant quantities, which, with the repeated-index summation convention, yield very compact formulas.²⁰ Often, though, one can restrict calculations to a single frame of reference and, in that case, there is no need to distinguish between lower and upper indices. When the subject of vector fields is introduced, an even more serious notational complication will arise because a new kind of "multiplication" of one vector by another will be noncommutative. As a result, the validity of an equation such as $(A\mathbf{x})^T = \mathbf{x}^T A^T$ is called into question. One is already accustomed to matrix multiplication being noncommutative, but the failure of vector fields to commute will seriously compromise the power of matrix notation and the usefulness of distinguishing between row and column vectors. In spite of all these problems, matrix formulas will still often be used, and when they are, the following conventions will be adhered to:

• As is traditional, contravariant components $x^1, x^2, \ldots, x^n$ are arrayed as a column vector. This leads to the remaining conventions in this list.

• (Covariant) components $f_i$ of a form $\widetilde f$ are to be arrayed in a row.

• The basis vectors $\mathbf{e}_i$, though not components of an intrinsic quantity, will be arrayed as a row for purposes of matrix multiplication.

• Basis covectors $\widetilde{\mathbf{e}}^i$ will be arrayed in a column.

• Notations such as $x^{1'}$ will not be used; the indices on components are necessarily $1, 2, 3, \ldots$. Symbolic indices with primes, as in $x^{i'}$, are, however, legitimate.

• The indices on a quantity like $\Lambda^i_{\ j}$ are spaced apart to make it unambiguous which is to be taken as the row index, in this case $i$, and which as the column index. The up/down location is to be ignored when matrix multiplication is being employed.

In terms of the new basis vectors introduced by Eq. (2.4.5), using Eq. (2.4.6), a general vector $\mathbf{x}$ is reexpressed as

¹⁹Since our convention is that the up/down location of indices on matrices is irrelevant, Eq. (2.4.5) is the same as $\mathbf{e}'_j = \mathbf{e}_i\,\Lambda_i^{\ j}$. This in turn is the same as $\mathbf{e}'_i = (\Lambda^T)_i^{\ j}\,\mathbf{e}_j$, which may seem like a more natural ordering. But one sees that whether it is the matrix or its transpose that is said to be the transformation matrix depends on whether it multiplies on the left or on the right and is not otherwise significant.

²⁰The repeated-index convention is itself used fairly loosely. For example, if the summation convention is used as in Eq. (2.4.5) to express a vector as a superposition of basis vectors, the usage amounts to a simple abbreviation without deeper significance. But when used (as it was by Einstein originally) to form a scalar from a contravariant and a covariant vector, the notation includes a deeper implication of invariance. In this text both of these conventions will be used, but for other summations, such as over particles in a system, the summation symbol will be shown explicitly.
$$\mathbf{e}'_j\,x'^j = \mathbf{x} = \mathbf{e}_k\,x^k = \mathbf{e}'_j\,(\Lambda^{-1})^j_{\ k}\,x^k,\qquad(2.4.7)$$

from which it follows that

$$x'^j = (\Lambda^{-1})^j_{\ k}\,x^k.\qquad(2.4.8)$$

Because the matrix giving $x^i \to x'^i$ is inverse to the matrix giving $\mathbf{e}_i \to \mathbf{e}'_i$, this is known conventionally as contravariant transformation. If the columns of elements $x'^j$ and $x^k$ are symbolized by $\mathbf{x}'$ and $\mathbf{x}$ and the matrix by $\Lambda^{-1}$, then Eq. (2.4.8) becomes

$$\mathbf{x}' = \Lambda^{-1}\mathbf{x}.\qquad(2.4.9)$$
When boldface symbols are used to represent vectors in vector analysis, the notation implies that the boldface quantities have an invariant geometric character, and in this context an equation like (2.4.9) might by analogy be expected to relate two different "arrows" $\mathbf{x}$ and $\mathbf{x}'$. The present boldface quantities have not been shown to have this geometric character and, in fact, they do not. As they have been introduced, since $\mathbf{x}$ and $\mathbf{x}'$ stand for the same geometric quantity, it is redundant to give them different symbols. This is an instance of the above-mentioned ambiguity in specifying
transformations. Our notation is simply not powerful enough to distinguish between active and passive transformations in the same context. For now we ignore this redundancy and regard Eq. (2.4.9) as simply an abbreviated notation for the algebraic relation between the components. Since this notation is standard in linear algebra, it should be acceptable here once the potential for misinterpretation has been understood. Transformation of covariant components $f_i$ has to be arranged to secure the invariance of the form $\langle\widetilde f, \mathbf{x}\rangle$ defined in Eq. (2.4.3). Using Eq. (2.4.8),

$$\langle\widetilde f, \mathbf{x}\rangle = f_k\,x^k = f'_j\,x'^j = f'_j\,(\Lambda^{-1})^j_{\ k}\,x^k,\qquad(2.4.10)$$

and from this

$$f_k = f'_j\,(\Lambda^{-1})^j_{\ k},\quad\text{or}\quad f'_i = f_j\,\Lambda^j_{\ i}.\qquad(2.4.11)$$

This is known as covariant transformation because the matrix is the same as the matrix $\Lambda$ with which basis vectors transform. The only remaining case to be considered is the transformation of basis one-forms; clearly they transform with $\Lambda^{-1}$. Consider next the effect of following one transformation by another. The matrix representing this "composition" of two transformations is known as the "concatenation" of the individual matrices. Calling these matrices $\Lambda_1$ and $\Lambda_2$, the concatenated matrix $\Lambda$ can be obtained by successive applications of Eq. (2.4.9):

$$\mathbf{x}'' = \Lambda_2^{-1}\mathbf{x}' = \Lambda_2^{-1}\Lambda_1^{-1}\mathbf{x} = (\Lambda_1\Lambda_2)^{-1}\mathbf{x},\quad\text{or}\quad \Lambda^{-1} = \Lambda_2^{-1}\Lambda_1^{-1}.\qquad(2.4.12)$$

This result has used the fact that the contravariant components are arrayed as a column vector. On the other hand, with $\widetilde f$ regarded as a row vector of covariant components, Eq. (2.4.11) yields

$$\widetilde f'' = \widetilde f\,\Lambda_1\Lambda_2,\quad\text{or}\quad \Lambda = \Lambda_1\Lambda_2.\qquad(2.4.13)$$
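These transformation rules are easy to check numerically: components transform with $\Lambda^{-1}$, form coefficients with $\Lambda$, and the pairing $\langle\widetilde f, \mathbf{x}\rangle$ is unchanged. A sketch (numpy; $\Lambda$, $\mathbf{x}$, and $\widetilde f$ are arbitrary illustrative values):

```python
# Numerical check: contravariant components transform with Lambda^{-1}
# (Eq. 2.4.8), covariant ones with Lambda (Eq. 2.4.11), and the scalar
# <f, x> = f_i x^i is invariant.
import numpy as np

Lam = np.array([[2.0, 1.0], [1.0, 1.0]])   # change-of-basis matrix (made up)
x = np.array([3.0, -2.0])                  # contravariant components (column)
f = np.array([1.0, 4.0])                   # covariant components (row)

x_new = np.linalg.inv(Lam) @ x             # x' = Lambda^{-1} x
f_new = f @ Lam                            # f' = f Lambda

print(np.isclose(f @ x, f_new @ x_new))    # True: <f, x> is invariant
```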
It may seem curious that the order of matrix multiplications can be opposite for "the same" sequence of transformations, but the result simply reflects the distinction between covariant and contravariant quantities. Since general matrices $A$ and $B$ satisfy $(AB)^{-1} = B^{-1}A^{-1}$, the simultaneous validity of Eq. (2.4.12) and Eq. (2.4.13) can be regarded as mere self-consistency of the requirement that $\langle\widetilde f, \mathbf{x}\rangle$ be invariant. The transformations just considered have been passive, in that basis vectors were changed but the physical quantities were not. Commonly in mechanics, and even more so in optics, one encounters active linear transformations that instead describe honest-to-goodness evolution of a physical system. If the configuration at time $t_1$ is described by $\mathbf{x}(t_1)$ and at a later time $t_2$ by $\mathbf{x}(t_2)$, linear evolution is described by a matrix relation of the form $\mathbf{x}(t_2) = M\,\mathbf{x}(t_1)$, and the equations of this section have to be reinterpreted appropriately.
2.4.3. Transformation of Distributions
Often one wishes to evolve not just one particle in the way just mentioned, but rather an entire ensemble or distribution of particles. Suppose that the distribution, call it $\rho(\mathbf{x})$, has the property that all particles lie in the same plane at time $t_1$. Such a distribution could be expressed as $\hat\rho(\mathbf{x})\,\delta(ax + by + cz - d)$, where $\delta$ is the Dirac $\delta$-"function" with argument which, when set to zero, gives the equation of the plane. (For simplicity set $d = 0$.) Let us ignore the distribution within the plane (described by $\hat\rho(\mathbf{x})$) and pay attention only to the most noteworthy feature of this ensemble of points, namely the plane itself and how it evolves. If $\mathbf{x}_{(1)}$ is the displacement vector of a generic particle at an initial time $t_1$, then initially the plane is described by an equation

$$f_{(1)i}\,x^i = 0.\qquad(2.4.15)$$

For each of the particles, setting $x^i = x^i_{(1)}$ in Eq. (2.4.15) results in an equality. Let us call the coefficients $f_{(1)i}$ "distribution parameters" at time $t_1$ since they characterize the region containing the particles at that time. Suppose that the system evolves in such a way that the individual particle coordinates are transformed (linearly) to $x^i_{(2)}$, and then to $x^i_{(3)}$, according to

$$x^i_{(2)} = A^i_{\ j}\,x^j_{(1)},\qquad x^i_{(3)} = B^i_{\ j}\,x^j_{(2)}.\qquad(2.4.16)$$

With each particle having been subjected to this transformation, the question is, what is the final distribution of particles? Since the particles began on the same plane initially and the transformations have been linear, it is clear they will lie on the same plane finally. We wish to find that plane, which is to say to find the coefficients $f_{(3)k}$ in the equation

$$f_{(3)k}\,x^k = 0.\qquad(2.4.17)$$

This equation must be satisfied by $x^k_{(3)}$ as given by Eq. (2.4.16), and this yields

$$f_{(3)k}\,(BA)^k_{\ i}\,x^i_{(1)} = 0.\qquad(2.4.18)$$

It follows that

$$f_{(3)k} = f_{(1)i}\,\big((BA)^{-1}\big)^i_{\ k} = f_{(1)i}\,\big(A^{-1}B^{-1}\big)^i_{\ k}.\qquad(2.4.19)$$
This shows that the coefficients $f_i$ describing a distribution of particles transform covariantly when the individual particle coordinates $x^i$ transform contravariantly. We have seen that the composition of successive linear transformations represented by matrices $A$ and $B$ can be either $BA$ or $A^{-1}B^{-1}$, depending on the nature of the quantity being transformed, and it is necessary to determine from the context which one is appropriate. If contravariant components compose with matrix $BA$, then covariant components compose with matrix $A^{-1}B^{-1}$.
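This covariant transformation of the distribution parameters can be checked numerically: transporting the particles with $BA$ and the plane coefficients with $(BA)^{-1}$ keeps every particle on the transported plane. A sketch (numpy; $A$, $B$, and the initial plane are arbitrary illustrative choices):

```python
# Numerical check: if particles on the plane f.x = 0 evolve by x -> B A x,
# the plane coefficients evolve by f -> f (BA)^{-1} = f A^{-1} B^{-1}
# (Eq. 2.4.19), so the transported particles still satisfy the plane equation.
import numpy as np

A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
B = np.array([[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])

f1 = np.array([1.0, -2.0, 3.0])            # initial plane: f1 . x = 0
x1 = np.array([2.0, 1.0, 0.0])             # a particle on that plane
print(np.isclose(f1 @ x1, 0.0))            # True

x3 = B @ (A @ x1)                          # particle after both transformations
f3 = f1 @ np.linalg.inv(B @ A)             # covariant transformation of the plane
print(np.isclose(f3 @ x3, 0.0))            # True: particle still lies on the plane
```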
Though these concatenation relations have been derived for linear transformations, there is a sense in which they are the only possibilities for (sufficiently smooth) nonlinear transformations as well. If the origin maps to the origin, as we have assumed implicitly, then there is a "linearized transformation" that is approximately valid for "small-amplitude" (close to the origin) particles, and the above concatenation properties must apply to that transformation. The same distinction between the transformation properties of particle coordinates and distribution coefficients must therefore apply also to nonlinear transformations, though the equations can be expected to become much more complicated at large amplitudes. It is only linear transformations that can be concatenated in closed form using matrix multiplication, but the opposite concatenation order of covariant and contravariant quantities also applies in the nonlinear regime. There is an interesting discussion in Schutz [2, Sec. 2.18], expanding on the interpretation of the Dirac delta function as a distribution in the sense in which the word is being used here. If the argument of the delta function is said to be transformed contravariantly, then the "value" of the delta function transforms covariantly.
*2.4.4. Multi-index Tensors and Their Contraction²¹

We turn now to tensors with more than one index. Two-index covariant tensors are defined by considering real-valued bilinear functions of two vectors, say $\mathbf{x}$ and $\mathbf{y}$. Such a function $\widetilde f(\mathbf{x},\mathbf{y})$ is called bilinear because it is linear in each of its two arguments separately. When the arguments $\mathbf{x}$ and $\mathbf{y}$ are expanded in terms of the basis introduced in Eq. (2.4.3), one has²²

$$\widetilde f(\mathbf{x},\mathbf{y}) = f_{ij}\,x^i y^j,\quad\text{where}\quad f_{ij} = \widetilde f(\mathbf{e}_i, \mathbf{e}_j).\qquad(2.4.20)$$

As usual, we will say that the function $\widetilde f$ and the array of coefficients $f_{ij}$ are the same thing and that $\widetilde f(\mathbf{x},\mathbf{y})$ is the same thing as the bilinear form $f_{ij}\,x^i y^j$. The coefficients $f_{ij}$ are called covariant components of $\widetilde f$. Pedantically it is only $\widetilde f(\mathbf{x},\mathbf{y})$, with arguments inserted, that deserves to be called a form, but common usage seems to be to call $\widetilde f$ a form all by itself. An expressive notation that will often be used is $\widetilde f(\cdot\,,\cdot)$, which indicates that $\widetilde f$ is "waiting for" two (as yet nameless, or "anonymous") vector arguments. Especially important are the anti-symmetric bilinear functions $\widetilde f(\mathbf{x},\mathbf{y})$ that change sign when $\mathbf{x}$ and $\mathbf{y}$ are interchanged:

$$\widetilde f(\mathbf{x},\mathbf{y}) = -\widetilde f(\mathbf{y},\mathbf{x}),\quad\text{or}\quad f_{ij} = -f_{ji}.\qquad(2.4.21)$$

²¹This section is rather abstract. The reader willing to accept that the contraction of the upper and lower index of a tensor is invariant can skip it. Footnote 23 hints at how this result can be obtained more quickly.

²²Eq. (2.4.20) is actually unnecessarily restrictive, since $\mathbf{x}$ and $\mathbf{y}$ could be permitted to come from different spaces.
These alternating or antisymmetric tensors are the only multi-index quantities that represent important geometric objects. The theory of determinants can be based on them as well. (See Van der Waerden, Section 4.7.) To produce a contravariant two-index tensor requires the definition of a bilinear function of two covariant vectors $\widetilde{\mathbf{u}}$ and $\widetilde{\mathbf{v}}$. One way of constructing such a bilinear function is to start with two fixed contravariant vectors $\mathbf{x}$ and $\mathbf{y}$ and to define

$$f(\widetilde{\mathbf{u}}, \widetilde{\mathbf{v}}) = \langle\widetilde{\mathbf{u}}, \mathbf{x}\rangle\,\langle\widetilde{\mathbf{v}}, \mathbf{y}\rangle.\qquad(2.4.22)$$

This tensor is called the tensor product $\mathbf{x}\otimes\mathbf{y}$ of vectors $\mathbf{x}$ and $\mathbf{y}$. Its arguments are $\widetilde{\mathbf{u}}$ and $\widetilde{\mathbf{v}}$. (The somewhat old-fashioned physics terminology is to call $f$ the dyadic product of $\mathbf{x}$ and $\mathbf{y}$.) In more expressive notation,

$$\mathbf{x}\otimes\mathbf{y}\,(\cdot\,,\cdot) = \langle\cdot\,,\mathbf{x}\rangle\,\langle\cdot\,,\mathbf{y}\rangle.\qquad(2.4.23)$$

The vectors $\mathbf{x}$ and $\mathbf{y}$ can in general belong to different spaces with different dimensionalities, but for simplicity in the following few paragraphs we assume they belong to the same space having dimension $n$. The components of $\mathbf{x}\otimes\mathbf{y}$ are

$$f^{ij} = (\mathbf{x}\otimes\mathbf{y})(\widetilde{\mathbf{e}}^i, \widetilde{\mathbf{e}}^j) = \langle\widetilde{\mathbf{e}}^i, \mathbf{x}\rangle\,\langle\widetilde{\mathbf{e}}^j, \mathbf{y}\rangle = x^i y^j.\qquad(2.4.24)$$

Though the linear superposition of any two such tensors is certainly a tensor, call it $t = (t^{ij})$, it does not follow in general that two vectors $\mathbf{x}$ and $\mathbf{y}$ can be found for which $t$ is their tensor product. However, all such superpositions can be expanded in terms of the tensor products $\mathbf{e}_i\otimes\mathbf{e}_j$ of the basis vectors introduced previously. These products form a natural basis for such tensors $t$. In the next paragraph the $n^2$-dimensional vector space of two-contravariant-index tensors $t$ will be called $\mathcal{T}$. At the cost of greater abstraction, we next prove a result needed to go from one index to two indices. The motivation is less than obvious, but the result will prove to be useful straightaway; it is what a mathematician might call a lemma:²³
THEOREM 2.4.1: For any function $B(\mathbf{x},\mathbf{y})$ linear in each of its two arguments $\mathbf{x}$ and $\mathbf{y}$, there exists an intrinsic linear function of the single argument $\mathbf{x}\otimes\mathbf{y}$, call it $S(\mathbf{x}\otimes\mathbf{y})$, such that

$$S(\mathbf{x}\otimes\mathbf{y}) = B(\mathbf{x},\mathbf{y}).\qquad(2.4.25)$$

The vectors $\mathbf{x}$ and $\mathbf{y}$ can come from different vector spaces.

Proof: In terms of contravariant components $x^i$ and $y^j$, the given bilinear function has the form

$$B(\mathbf{x},\mathbf{y}) = s_{ij}\,x^i y^j.\qquad(2.4.26)$$

²³The reader impatient with abstract argumentation may consider it adequate to base the invariance of the trace of a mixed tensor on the inverse transformation properties of covariant and contravariant indices.
This makes it natural, for arbitrary tensor t drawn from space I,to define a corresponding function S(t) that is linear in the components r i j oft: ..
S(t) = sij r’J.
(2.4.27)
When this function is applied to x @ y, the result is S(x @ y) = sij x i y j = B(x, y),
(2.4.28)
which is the required result. Since components were used only in an intermediate stage the theorem assures the relation to be intrinsic (coordinate-free). (As an aside, note that the values of the functions S and B could have been allowed to have other (matching) vector or tensor indices themselves without affecting the proof. This increased generality is required to validate contraction of tensors with more than two rn indices.) Other tensor products can be made from contravariant and covariant vectors. Holding i? and V fixed while x and y vary, an equation like (2.4.22) can also be regarded as defining a covector product i? @V. A mixed vector product f = u @ y can be similarly defined by holding i? and y constant:24
f(x, ṽ) = ⟨ũ, x⟩⟨ṽ, y⟩.  (2.4.29)

The components of this tensor are

f_i^j = (ũ ⊗ y)(e_i, ē^j) = ⟨ũ, e_i⟩⟨ē^j, y⟩ = u_i y^j.  (2.4.30)
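The correspondence of Theorem 2.4.1 can also be checked numerically. The following sketch (using NumPy; the names and random test data are illustrative, not from the text) builds S from the coefficients s_ij and verifies that it reproduces B on tensor products and extends linearly to superpositions that are not themselves tensor products:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
s = rng.normal(size=(n, n))          # coefficients s_ij of a bilinear function B

def B(x, y):
    """B(x, y) = s_ij x^i y^j, linear in each argument (Eq. (2.4.26))."""
    return x @ s @ y

def S(t):
    """S(t) = s_ij t^ij, linear in the tensor components (Eq. (2.4.27))."""
    return np.sum(s * t)

x, y = rng.normal(size=(2, n))
# On a tensor product t = x ⊗ y, with components x^i y^j, S reproduces B (Eq. (2.4.28)).
assert np.isclose(S(np.outer(x, y)), B(x, y))

# S is defined on all of 𝒯, including superpositions that are not tensor products.
u, v = rng.normal(size=(2, n))
t = 2.0 * np.outer(x, y) - np.outer(u, v)
assert np.isclose(S(t), 2.0 * B(x, y) - B(u, v))
```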
Later on we will also require antisymmetrized tensor products, or "wedge products," defined by

x ∧ y(ũ, ṽ) = ⟨x, ũ⟩⟨y, ṽ⟩ − ⟨x, ṽ⟩⟨y, ũ⟩,
ũ ∧ ṽ(x, y) = ⟨ũ, x⟩⟨ṽ, y⟩ − ⟨ũ, y⟩⟨ṽ, x⟩.  (2.4.31)
The generation of a new tensor by "index contraction" can now be considered. Consider the tensor product t = ũ ⊗ x, where ũ and x belong to dual vector spaces. The theorem proved above can be applied to the function

B(ũ, x) = ⟨ũ, x⟩ = u_i x^i,  (2.4.32)

bilinear in ũ and x, to prove the existence of an intrinsic linear function S such that

S(ũ ⊗ x) = u_i x^i = tr(ũ ⊗ x),  (2.4.33)
where tr(t) is the sum of the diagonal elements of the tensor ũ ⊗ x in the particular coordinate system shown (or any other, since ⟨ũ, x⟩ is invariant). Since any mixed two-

²⁴A deficiency of our notation appears at this point since it is ambiguous whether or not the symbol f in f = ũ ⊗ y should carry a tilde.
component tensor can be written as a superposition of such covector/contravector products, since the trace operation is distributive over such superposition, and since S(ũ ⊗ x) is an intrinsic function, it follows that tr(t) = t^i_i is an invariant for any mixed tensor. tr(t) is called the contraction of t.

2.4.5. Overlap of Tensor Algebra and Tensor Calculus

Before leaving the topic of tensor algebra, we review the differential form d̃h obtained from a function of position x called h(x). We saw a close connection between this quantity and the familiar gradient of vector calculus, ∇h. There is little to add now except to call attention to a potentially confusing issue of terminology. A physicist thinking of vector calculus thinks of gradients, divergences, and curls (the operators needed for electromagnetism) as being on the same footing in some sense; they are all "vector derivatives." On the other hand, in mathematics books discussing tensors, gradients are normally considered to be "tensor algebra," and only the divergence and curl are the subject matter of "tensor calculus." It is probably adequate for a physicist to file this away as yet another curiosity not to be distracted by, but contemplation of the source of the terminology may be instructive. One obvious distinction among the operators in question is that gradients act on scalars whereas divergences and curls operate on vectors, but this is too formal to account satisfactorily for the difference of terminology. Recall from the earlier discussion of differential forms, in particular Eqs. (2.3.3) and (2.3.4), that for a linear function h = ax + by the coefficients of d̃h are a and b. In this case selecting the coefficient a or b, an algebraic operation, and differentiating h with respect to x or y, a calculus operation, amount to the same thing.
Even for nonlinear functions, the gradient operator can be regarded as extracting the coefficients of the linear terms in a Taylor expansion about the point under study. In this linear "tangent space" the coefficients in question are the components of a covariant vector, as has been discussed. What is calculus in the original space is algebra in the tangent space. Such conundrums are not unknown in "unambiguously physics" contexts. For example, both in Hamilton–Jacobi theory and in quantum mechanics there is a close connection between the x-component of a momentum vector and a partial-with-respect-to-x derivative.

Yet one more notational variant will be mentioned before leaving this topic. There is a notational convention popular with mathematicians but not commonly used by physicists (though it should be, since it is both clear and powerful). We introduce it now, only in a highly specialized sense, intending to expand the discussion later. Consider a standard plot having x as abscissa and y as ordinate, with axes rectangular and having the same scales; in other words, ordinary analytic geometry. A function h(x, y) can be expressed by equal-h-value contours on such a plot. For describing arrows on this plot it is customary to introduce "unit vectors," usually denoted by (i, j) or (x̂, ŷ). Let us now introduce the recommended new notation:
∂/∂x ≡ i,  ∂/∂y ≡ j.  (2.4.34)
Being equal to i and j, these quantities are represented by boldface symbols.²⁵ i is that arrow that points along the axis on which x varies and y does not, and if the tail of i is at x = x₀, its tip is at x = x₀ + 1. The same italicized sentence serves just as well to define ∂/∂x; the symbol in the denominator signifies the coordinate being varied (with the other coordinates held fixed). This same definition will also hold if the axes are skew, or if their scales are different, and even if the coordinate grid is curvilinear. (Discontinuous scales should not be allowed, however.) Note that, though the notation does not exhibit it, the basis vector ∂/∂x also depends on the coordinates other than x because it points in the direction in which the other coordinates are constant.

One still wonders why this notation for unit vectors deserves a partial derivative symbol. What is to be differentiated? The answer is h(x, y) (or any other function of x and y). The result, ∂h/∂x, yields the answer to the question of how much h changes when x varies by one unit with y held fixed. Though stretching or twisting the axes would change the appearance of equal-h contours, it would not affect these questions and answers, since they specify only the dependence of the function h(x, y) on its arguments and do not depend on how it is plotted. One might say that the notation has removed the geometry from the description. One consequence of this is that the application of vector operations such as divergence and curl will have to be rethought, since they make implicit assumptions about the geometry of the space in which the arguments x and y are coordinates. But the gradient requires no further analysis.

From a one-form ũ and a vector x one can form the scalar ⟨ũ, x⟩. What is the quantity formed when the one-form d̃h, defined in Eq. (2.3.4), operates on the vector ∂/∂x just defined? By Eq. (2.3.1) and the defined meaning of ∂/∂x we have d̃x(∂/∂x) = 1. Combining this with Eqs. (2.3.4) and (2.3.5) yields
d̃h(∂/∂x) = ∂h/∂x,  (2.4.35)

where the final term is the traditional notation for the x-component of the gradient of h. In this case the new notation can be thought of simply as a roundabout way of expressing the gradient. Some modern authors, Schutz for example, (confusingly in my opinion) simply call d̃h "the gradient of h." This raises another question: Should the symbol be dh, which we have been using, or should it be d̃h? Already the symbol d̃ has been used to indicate "exterior differentiation," and a priori the independently defined quantities dh and d̃h are distinct. But we will show that they are in fact equal, so it is immaterial which notation is used. From these considerations one infers that for contravariant basis vectors e_x = ∂/∂x and e_y = ∂/∂y the corresponding covariant basis vectors are ē^x = d̃x and
²⁵Whether or not they are true vectors depends on whether or not i and j are defined to be true vectors. The answer to this question can be regarded as a matter of convention; if the axes are regarded as fixed once and for all, then i and j are true vectors; if the axes are transformed, they are not.
ē^y = d̃y. Why is this so? For example, because d̃x(∂/∂x) = 1. To recapitulate:

e₁ = ∂/∂x¹,  e₂ = ∂/∂x²,  …,  e_n = ∂/∂x^n  (2.4.36)
are the natural contravariant basis vectors, and the corresponding covariant basis vectors are

ē¹ = d̃x¹,  ē² = d̃x²,  …,  ē^n = d̃x^n.  (2.4.37)
The association of ∂/∂x¹, ∂/∂x², …, ∂/∂x^n with vectors will be shown to be of far more than formal significance in Section 3.4.3, where vectors are associated with directional derivatives.
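The statement that d̃h acting on ∂/∂x yields ∂h/∂x, Eq. (2.4.35), can be illustrated numerically. The sketch below (plain Python; the sample function h and the evaluation point are illustrative) extracts the linear Taylor coefficient by a central finite difference and compares it with the analytic partial derivative:

```python
import math

# A sample nonlinear function h(x, y); its differential d̃h has components (∂h/∂x, ∂h/∂y).
def h(x, y):
    return x**2 * math.sin(y) + 3.0 * y

def d_along(f, i, p, eps=1e-6):
    """Action of d̃f on the basis vector ∂/∂x^i at point p, computed as a
    central finite difference: vary coordinate i, hold the others fixed."""
    plus, minus = list(p), list(p)
    plus[i] += eps
    minus[i] -= eps
    return (f(*plus) - f(*minus)) / (2 * eps)

p = (1.5, 0.7)
dh_dx = d_along(h, 0, p)          # d̃h(∂/∂x)
dh_dy = d_along(h, 1, p)          # d̃h(∂/∂y)

# Compare against the analytic partial derivatives, per Eq. (2.4.35).
assert abs(dh_dx - 2 * p[0] * math.sin(p[1])) < 1e-5
assert abs(dh_dy - (p[0]**2 * math.cos(p[1]) + 3.0)) < 1e-5
```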
2.5. (POSSIBLY COMPLEX) CARTESIAN VECTORS IN METRIC GEOMETRY

2.5.1. Euclidean Vectors
Now, for the first time, we hypothesize the presence of a "metric" (whose existence can, from a physicist's point of view, be taken to be a "physical law," for example the Pythagorean "law" or the Einstein–Minkowski "law"). We will use this metric to "associate" covariant and contravariant vectors. Such associations are being made constantly and without a second thought by physicists. Here we spell the process out explicitly. The current task can also be expressed as one of assigning covariant components to a true vector that is defined initially by its contravariant components. A point in three-dimensional Euclidean space can be located by a vector

x = e₁x¹ + e₂x² + e₃x³ ≡ e_i x^i,  (2.5.1)
where e₁, e₂, and e₃ form an orthonormal (defined below) triplet of basis vectors. Such a basis will be called "Euclidean." The final form again employs the repeated-index summation convention, even though the two factors have different tensor character in this case. In this expansion, the components have upper indices and are called "contravariant" though, as it happens, because the basis is Euclidean, the covariant components x_i to be introduced shortly will have the same values. For skew bases (axes not necessarily orthogonal, and to be called "Cartesian"), the contravariant and covariant components will be distinct. Unless stated otherwise, x¹, x², and x³ are allowed to be complex numbers; we defer concerning ourselves with the geometric implications of this. We are restricting the discussion to n = 3 here only to avoid inessential abstraction; in The Theory of Spinors by Cartan, most of the results are derived for general n, using arguments like the ones to be used here.

The reader may be beginning to sense a certain repetition of discussion of concepts already understood, such as covariant and contravariant vectors. This can best be defended by observing that, even though these concepts are essentially the same
in different contexts, they can also differ in subtle ways, depending upon the implicit assumptions that accompany them. All vectors start at the origin in this discussion. According to the Pythagorean relation, the distance from the origin to the tip of the arrow can be expressed by a "fundamental form" or "scalar square"

Φ(x) = x · x = (x¹)² + (x²)² + (x³)².  (2.5.2)
Three distinct cases will be of special importance:

• The components x¹, x², and x³ are required to be real. In this case Φ(x), conventionally denoted also by |x|², is necessarily positive, and it is natural to divide any vector by |x| to convert it into a "unit vector." This metric describes ordinary geometry in three dimensions, and constitutes the Pythagorean law referred to above.

• The components x¹, x², and x³ are complex, with fundamental form given by Eq. (2.5.2). Note that Φ(x) is not defined to be x̄¹x¹ + x̄²x² + x̄³x³ and that it has the possibilities of being complex or of vanishing even though x does not. If it vanishes, the vector is said to be "isotropic." If it does not, it can be normalized, converting it into a "unit vector."

• In the "pseudo-Euclidean" case the components x¹, x², and x³ are required to be real, but the fundamental form is given not by Eq. (2.5.2) but by

Φ(x) = (x¹)² + (x²)² − (x³)².  (2.5.3)
Since this has the possibility of vanishing, a vector can be "isotropic," or "on the light cone," in this case also. For Φ > 0 the vector is "space-like"; for Φ < 0 it is "time-like." In these cases a "unit vector" can be defined as having fundamental form of magnitude 1. In this pseudo-Euclidean case, "ordinary" space-time requires n = 1 + 3. This metric could legitimately be called "Einstein's metric," but it is usually called "Minkowski's." In any case, its existence can be regarded as a physical law, not just a mathematical construct.

To the extent possible these cases will be treated "in parallel," in a unified fashion, with most theorems and proofs applicable in all cases. Special properties of one or the other of the cases will be interjected as required. The "scalar" or "invariant product" of vectors x and y is defined in terms of their Euclidean components by

x · y ≡ x¹y¹ + x²y² + x³y³.  (2.5.4)
Though similar-looking expressions have appeared previously, this is the first one deserving of the name "dot product." If x · y vanishes, x and y are said to be orthogonal. An "isotropic" vector is orthogonal to itself. The vectors orthogonal to a given vector span a plane. (In n-dimensional space this is called a "hyperplane" of n − 1
dimensions.) In the pseudo-Euclidean case there is one minus sign in the definition of the scalar product, as in Eq. (2.5.3). The very existence of a metric permits the introduction of a "natural" association of a form x̃ to a vector x such that, for arbitrary vector y, x̃(y) = x · y.
Problem 2.5.1: Show that definition (2.5.4) follows from definition (2.5.2) if one assumes "natural" algebraic properties for "lengths" in the evaluation of (x + λy) · (x + λy), where x and y are two different vectors and λ is an arbitrary scalar.
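The three cases can be summarized in a small numeric sketch (plain Python; the signature argument and sample vectors are illustrative only). Note that, following Eq. (2.5.2), no complex conjugation is performed, which is what makes nonzero isotropic vectors possible:

```python
def phi(x, signature=(1, 1, 1)):
    """Fundamental form Φ(x) = Σ_i s_i (x^i)^2; no complex conjugation, per Eq. (2.5.2)."""
    return sum(s * xi * xi for s, xi in zip(signature, x))

# Euclidean case, real components: Φ(x) = |x|^2 is positive.
assert phi([3.0, 4.0, 0.0]) == 25.0

# Complex components: Φ can vanish for x ≠ 0 (an "isotropic" vector).
assert phi([1.0, 1j, 0.0]) == 0.0

# Pseudo-Euclidean signature (+, +, -), Eq. (2.5.3):
assert phi([1.0, 0.0, 0.0], (1, 1, -1)) > 0    # space-like
assert phi([0.0, 0.0, 1.0], (1, 1, -1)) < 0    # time-like
assert phi([1.0, 0.0, 1.0], (1, 1, -1)) == 0   # isotropic ("on the light cone")
```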
2.5.2. Skew Coordinate Frames

The basis vectors η₁, η₂, and η₃ in a skew, or "Cartesian," frame are not orthonormal in general. They must however be "independent"; geometrically this requires that they not lie in a single plane; algebraically it requires that no vanishing linear combination can be formed from them. As a result, a general vector x can be expanded in terms of η₁, η₂, and η₃:

x = η_i x^i,  (2.5.5)

and its scalar square is then given by

Φ(x) = η_i · η_j x^i x^j ≡ g_ij x^i x^j.  (2.5.6)
Here "metric coefficients," and the matrix G they form, have been defined by

g_ij = g_ji ≡ η_i · η_j,

G = ( η₁·η₁  η₁·η₂  η₁·η₃
      η₂·η₁  η₂·η₂  η₂·η₃
      η₃·η₁  η₃·η₂  η₃·η₃ ).  (2.5.7)
As in Section 2.4, the coefficients x^i are known as "contravariant components" of x. When expressed in terms of them, the formula for length is more complicated than the Pythagorean formula because the basis vectors are skew. Nevertheless it has been straightforward, starting from a Euclidean basis, to find the components of the metric tensor. It is less straightforward, and not even necessarily possible in general, given a metric tensor, to find a basis in which the length formula is Pythagorean.

*2.5.3. Reduction of a Quadratic Form to a Sum or Difference of Squares²⁶

For describing scalar products, defined in the first place in terms of orthonormal axes, but now using skew coordinates, a quadratic form has been introduced. Conversely, given an arbitrary quadratic form Φ = g_ij u^i u^j, can we find a coordinate transformation x_i = a_ij u^j to variables for which Φ takes the form of Eq. (2.5.2)?²⁷ In general, the components can be complex. If the components are required to be real, then the coefficients a_ij will also be required to be real; otherwise they also can be complex. The reader has no doubt been subjected to such an analysis before, though perhaps not with complex variables allowed.

²⁶The material in this and the next section is reasonably standard in courses in algebra. It is nevertheless spelled out here in some detail since, like some of the other material in this chapter, analogous procedures will be used when "symplectic geometry" is discussed.
THEOREM 2.5.1: Every quadratic form can be reduced to a sum of (positive or negative) squares by a linear transformation of the variables.

Proof: Suppose one of the diagonal elements, say g₁₁, is non-zero. With a view toward eliminating all terms linear in u¹, define

Φ₁ = Φ − (1/g₁₁)(g₁ᵢ u^i)²,  (2.5.8)

which no longer contains u¹. Hence, defining

y₁ = g₁ᵢ u^i,  (2.5.9)

the fundamental form can be written as

Φ = (1/g₁₁) y₁² + Φ₁;  (2.5.10)

the second term has one fewer variable than previously. If all diagonal elements vanish, one of the off-diagonal elements, say g₁₂, does not. In this case define

Φ₂ = g_ij u^i u^j − (2/g₁₂)(g₂₁u¹ + g₂₃u³)(g₁₂u² + g₁₃u³),  (2.5.11)

which contains neither u¹ nor u². Defining

y₁ + y₂ = g₂₁u¹ + g₂₃u³,  y₁ − y₂ = g₁₂u² + g₁₃u³,  (2.5.12)

we obtain

Φ = (2/g₁₂)(y₁² − y₂²) + Φ₂,  (2.5.13)
again reducing the dimensionality.

²⁷For purposes of this proof, which is entirely algebraic, we ignore the traditional connection between upper/lower index location and contravariant/covariant nature. Hence the components x_i given by x_i = a_ij u^j are not to be regarded as covariant, or contravariant either, for that matter.
The form can be reduced to a sum of squares step-by-step in this way. In the real domain, no complex coefficients are introduced, but some of the coefficients may be negative. In all cases, normalizations can be chosen to make all coefficients +1 or −1.
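The step-by-step procedure of the proof translates directly into an algorithm. The following sketch (NumPy; the function name, tolerances, and test matrix are illustrative) implements both branches, the diagonal pivot of Eqs. (2.5.8)–(2.5.10) and the off-diagonal pivot of Eqs. (2.5.11)–(2.5.13), and checks the resulting signature against Sylvester's law of inertia:

```python
import numpy as np

def lagrange_reduce(G):
    """Step-by-step reduction of Φ = g_ij u^i u^j to a sum of ± squares,
    following the proof of Theorem 2.5.1. Returns the coefficients of the
    squared terms (normalizing them to ±1 is then trivial)."""
    G = np.array(G, dtype=float)
    coeffs = []
    while np.any(np.abs(G) > 1e-12):
        d = np.abs(np.diag(G))
        if d.max() > 1e-12:
            # Nonzero diagonal element g_kk: split off (1/g_kk) y^2 with
            # y = g_ki u^i, as in Eqs. (2.5.8)-(2.5.10).
            k = int(d.argmax())
            coeffs.append(1.0 / G[k, k])
            G = G - np.outer(G[k], G[k]) / G[k, k]
        else:
            # All diagonal elements vanish; pivot on an off-diagonal g_kl as in
            # Eqs. (2.5.11)-(2.5.13): Φ gains (2/g_kl)(y1^2 - y2^2).
            k, l = np.unravel_index(int(np.abs(G).argmax()), G.shape)
            coeffs += [2.0 / G[k, l], -2.0 / G[k, l]]
            G = G - (np.outer(G[k], G[l]) + np.outer(G[l], G[k])) / G[k, l]
    return coeffs

# Example with a vanishing-diagonal block: the signature of the reduced form must
# match the signs of the eigenvalues of G (Sylvester's law, Problem 2.5.2 below).
G = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
coeffs = lagrange_reduce(G)
assert sorted(np.sign(coeffs).tolist()) == sorted(np.sign(np.linalg.eigvalsh(G)).tolist())
```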
Problem 2.5.2: Sylvester's Law of Inertia. The preceding substitutions are not unique but, in the domain of reals, the relative number of negative and positive coefficients in the final form is unique. Prove this, for example by showing a contradiction resulting from assuming a relation

Φ = y² − z₁² − z₂² = u₁² + u₂² − w₁²,  (2.5.14)

between variables y, z₁, and z₂ with two negative signs on the one hand, and u₁, u₂, and w₁ with only one negative sign on the other. In "nondegenerate" (that is, det|g_ij| ≠ 0) ordinary geometry of real numbers, the number of positive square terms is necessarily 3.

2.5.4. Introduction of Covariant Components

The contravariant components x^i are seen in Eq. (2.5.5) to be the coefficients in the expansion of x in terms of the η_i. In ordinary vector analysis one is accustomed to identifying each of these coefficients as the "component of x" along a particular coordinate axis and being able to evaluate it as |x| multiplied by the cosine of the corresponding angle. Here we define lowered-index components x_i, to be called "covariant" (terminology to be justified later), as the "invariant products" of x with the η_i:

x_i = x · η_i = g_ik x^k,  or as a matrix equation  x̃ = (G xᵀ)ᵀ = x Gᵀ,  (2.5.15)

where x̃ stands for the array (x₁, …, x_n). Now the scalar product defined in Eq. (2.5.4) can be written

x · y = x_i y^i = y_j x^j.  (2.5.16)
By inverting Eq. (2.5.15), contravariant components can be obtained from covariant ones,

x = x̃ (Gᵀ)⁻¹,  or as components,  x^i = g^ik x_k,  where g^ik ≡ (G⁻¹)_ik.  (2.5.17)

For orthonormal bases, G = 1 and, as mentioned previously, covariant and contravariant components are identical. Introduction of covariant components can be regarded as a simple algebraic convenience with no geometric significance. However, if the angle θ between vectors x and y is defined by

cos θ = x · y / (√Φ(x) √Φ(y)),  (2.5.18)
x:(x’= 2, x*= 1)
92
+
FIGURE 2.5.1. The true vector 2111 7)2 expressed in terms of contravariant and, using Eq. (2.5.19), its covariant components related to direction cosines. For Euclidean geometry is normally symbolized by 1x1.
then a general vector x is related to the basis vectors q l , q2,and q3 by direction cosines cos 01, cos 02, cos 03, and its covariant components are xi = X . qi =
Jm
COSO~.
(2.5.19)
This definition is illustrated in Fig. 2.5.1, The angles (or rather their cosines) introduced in these definitions would be just redundant symbols except for the fact that all of trigometry is imported into the formalism in this way.
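These relations are easy to exercise numerically. In the sketch below (NumPy; the skew basis matrix and vectors are arbitrary illustrative choices), the rows of `eta` hold the basis vectors η_i expressed in an underlying Euclidean frame, so the metric, the covariant components, and the invariance of the scalar product can all be checked directly:

```python
import numpy as np

# Rows of `eta` are skew basis vectors η_1, η_2, η_3 in an underlying Euclidean frame.
eta = np.array([[1.0, 0.0, 0.0],
                [0.5, 1.0, 0.0],
                [0.3, 0.2, 1.0]])

G = eta @ eta.T                    # metric coefficients g_ij = η_i · η_j, Eq. (2.5.7)

x_up = np.array([2.0, 1.0, 0.0])   # contravariant components x^i
y_up = np.array([1.0, -1.0, 3.0])

x_down = G @ x_up                  # covariant components x_i = g_ik x^k, Eq. (2.5.15)

# The invariant product computed three ways must agree: the Euclidean dot product
# of the true vectors, g_ij x^i y^j, and the mixed form x_i y^i of Eq. (2.5.16).
x_vec, y_vec = x_up @ eta, y_up @ eta
assert np.isclose(x_vec @ y_vec, x_up @ G @ y_up)
assert np.isclose(x_vec @ y_vec, x_down @ y_up)

# Raising the index with g^ik = (G^{-1})_ik recovers x^i, Eq. (2.5.17).
assert np.allclose(np.linalg.inv(G) @ x_down, x_up)

# The angle of Eq. (2.5.18), computed purely from components and the metric.
cos_theta = (x_down @ y_up) / np.sqrt((x_up @ G @ x_up) * (y_up @ G @ y_up))
assert np.isclose(cos_theta,
                  (x_vec @ y_vec) / (np.linalg.norm(x_vec) * np.linalg.norm(y_vec)))
```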
2.5.5. The Reciprocal Basis

Even in Euclidean geometry there are situations in which skew axes yield simplified descriptions, which makes the introduction of covariant components especially useful. The most important example is in the description of a crystal for which displacements by integer multiples of a right-handed triad of "unit cell" vectors η₁, η₂, and η₃ leave the lattice invariant. Let these unit cell vectors form the basis of a skew frame as in Section 2.5.2. For any vector x in the original space we can associate a particular form x̃ in the dual space by the following rule, giving its value x̃(y) when acting on general vector y: x̃(y) = x · y. In particular, "reciprocal basis vectors" η^i and basis forms ẽ^i are defined to satisfy

ẽ^i(η_j) = η^i · η_j = δ^i_j,  (2.5.20)

and ẽ¹, ẽ², and ẽ³ are the basis dual to η₁, η₂, and η₃ as in Eq. (2.4.4). The vectors η^i in this equation need to be determined to satisfy the final equality. This can be accomplished explicitly:

η¹ = (η₂ × η₃)/√g,  η² = (η₃ × η₁)/√g,  η³ = (η₁ × η₂)/√g,  where √g ≡ η₁ · (η₂ × η₃),  (2.5.21)

where the orientation of the basis vectors is assumed to be such that √g is real and nonzero. (From vector analysis one recognizes √g to be the volume of the unit
cell.) One can confirm that Eqs. (2.5.20) are then satisfied. The vectors η¹, η², and η³ are said to form the "reciprocal basis." The "reciprocal lattice" consists of all superpositions of these vectors with integer coefficients.
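A numeric sketch of Eqs. (2.5.20) and (2.5.21) (NumPy; the triclinic cell chosen is illustrative only):

```python
import numpy as np

# Unit cell vectors of a skew (triclinic) lattice; rows are η_1, η_2, η_3.
eta = np.array([[1.0, 0.0, 0.0],
                [0.3, 1.2, 0.0],
                [0.1, 0.2, 0.9]])

sqrt_g = eta[0] @ np.cross(eta[1], eta[2])   # cell volume √g = η_1 · (η_2 × η_3)

# Reciprocal basis vectors per Eq. (2.5.21): η^1 = (η_2 × η_3)/√g, cyclically.
eta_rec = np.array([np.cross(eta[1], eta[2]),
                    np.cross(eta[2], eta[0]),
                    np.cross(eta[0], eta[1])]) / sqrt_g

# Defining property, Eq. (2.5.20): η^i · η_j = δ^i_j.
assert np.allclose(eta_rec @ eta.T, np.eye(3))

# g = det G, with G the metric matrix of Eq. (2.5.7) (cf. Problem 2.5.5 below).
assert np.isclose(sqrt_g**2, np.linalg.det(eta @ eta.T))
```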
Problem 2.5.3: In terms of skew basis vectors η₁, η₂, and η₃ in three-dimensional Euclidean space, a vector x = η_i x^i has covariant components x_i = g_ij x^j. Show that

x = x₁η¹ + x₂η² + x₃η³,

where the η^i are given by Eq. (2.5.21).

By inspection one sees that reciprocal base vector η¹ is normal to the plane containing η₃ and η₂. This is illustrated in Fig. 2.5.2, which shows the unit cell vectors superimposed on a crystal lattice. (η₃ points normally out of the paper.) Similarly, η² is normal to the plane containing η₃ and η₁. Consider the plane passing through the origin and containing both η₃ and the vector η₁ + Nη₂, where N is an integer. Since there is an atom situated at the tip of this vector, this plane contains this atom as well as the atom at the origin and the atom at 2(η₁ + Nη₂), and so on. For the case N = 1, these atoms are joined by a line in the figure and several other lines, all parallel and passing through other atoms
FIGURE 2.5.2. The crystal lattice shown has unit cell vectors η₁ and η₂ as shown (with lengths a = 0.69 and b = 0.57, and sin φ = 0.87), as well as η₃ pointing normally out of the paper. Reciprocal basis vectors η¹ and η² are shown, with lengths 1/(a sin φ) = 1.67 and 1/(b sin φ) = 2.02. The particular lattice planes indicated by parallel lines correspond to the reciprocal lattice vector η¹ − η². It is coincidental that η² appears to lie in a crystal plane. The area of the two-dimensional unit cell is ab sin φ.
that are shown as well. The vector

Nη¹ − η²

is perpendicular to this set of planes. (Again for N = 1) the figure confirms that η¹ − η² is normal to the crystal planes shown.
Problem 2.5.4: Show that for any two atoms in the crystal, the plane containing them and the origin is normal to a vector expressible as a superposition of reciprocal basis vectors with integer coefficients, and that any superposition of reciprocal basis vectors with integer coefficients is normal to a set of planes containing atoms. [Hint: For practice at the sort of calculation that is useful, evaluate (η₁ + η₂) · (η₁ + η₂).]
It was only because the dot product is meaningful that Eq. (2.5.20) results in the association of an ordinary vector η^i with the form ẽ^i. But once that identification is made, all computations can be made using straightforward vector analysis. A general vector x can be expanded either in terms of the original or the reciprocal basis,

x = η_i x^i = x_i η^i.  (2.5.22)

(The components x_i can be thought of either as covariant components of x or as components of x̃ such that x̃(y) = x_i y^i.) In conjunction with Eqs. (2.5.17) and (2.5.20) this yields

(g^ik) = G⁻¹ = ( η¹·η¹  η¹·η²  η¹·η³
                 η²·η¹  η²·η²  η²·η³
                 η³·η¹  η³·η²  η³·η³ ).  (2.5.23)
Problem 2.5.5: Confirm the Lagrange identity of vector analysis,

(A × B) · (C × D) = det | A·C  A·D |
                        | B·C  B·D |.  (2.5.24)

This is most simply done by expressing the cross products with the three-index antisymmetric symbol ε_ijk. With the vectors A, B, C, and D drawn from η₁, η₂, and η₃, each of these determinants can be identified as a cofactor in Eq. (2.5.7). From this show that g = det|G|.
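A quick numeric check of Eq. (2.5.24) with randomly chosen vectors (NumPy; illustrative only, and no substitute for the ε_ijk proof asked for above):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C, D = rng.normal(size=(4, 3))

# Lagrange identity, Eq. (2.5.24): (A × B)·(C × D) = det[[A·C, A·D], [B·C, B·D]].
lhs = np.cross(A, B) @ np.cross(C, D)
rhs = np.linalg.det(np.array([[A @ C, A @ D],
                              [B @ C, B @ D]]))
assert np.isclose(lhs, rhs)
```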
Problem 2.5.6: Show that the original basis vectors η_i are themselves reciprocal to the reciprocal basis vectors η^i.
2.5.6. Wavefronts, Lattice Planes, and Bragg Reflection

In this section, to illustrate the physical distinction between contravariant and covariant vectors, we digress into the subjects of wave propagation, wave–particle duality, coherent interaction, energy and momentum conservation, and, ultimately, the condition for Bragg scattering of X-rays from crystals. The main purpose of this section is to provide examples of covariant vectors. Though these calculations may appear, superficially, to have little to do with mechanics, in the end a surprising connection with mechanics, especially quantum mechanics, will emerge.

A two-dimensional slice through a crystal lattice is shown in Fig. 2.5.2. The unit cell vectors η₁ and η₂ lie in the plane shown. Their lengths are a and b. The remaining basis vector η₃ is normal to both of them and has length c. Applying Eqs. (2.5.21), √g = abc sin φ, η¹ has length 1/(a sin φ) and is normal to η₂, and η² has length 1/(b sin φ) and is normal to η₁.

In preparation for describing the interaction of X-rays with a lattice we review some properties of waves. We use standard Euclidean geometry. The wave vector k of a monofrequency plane wave points in the direction of propagation of the wave and has magnitude 2π/λ, where λ is the wavelength. A "wavefront" is, on the one hand, a plane orthogonal to k and, on the other hand, a plane on which the "phase" φ is constant. Analytically, φ appears in the description of the wave by a wave function Ψ(x, t):
Ψ(x, t) ∼ e^{iφ(x)} e^{−iωt},  where  φ(x) = k · x = k_i x^i.  (2.5.25)
Note that the final expression, trivially valid for Euclidean axes, is also valid for skew axes; in this case it is seen to be economical to describe the wave vector by its covariant components, k_i. To emphasize this point we can associate with k a form k̃ such that

φ(x) = k̃(x).  (2.5.26)
This is entirely equivalent to the second part of (2.5.25). We wish now to point out the natural connection between pairs of adjacent wavefronts and the pairs of planes of a covariant vector. Though previously the spacing between pairs of planes representing a covariant vector has been taken to be "unity," it is more natural, in a physics context, for the spacing to have the dimensions of length, and the natural choice is the wavelength λ or perhaps λ/(2π). Even though waves are being described, it is customary from geometric optics to define "rays" as being normal to wavefronts. One can keep track of the value of φ by "stepping off" wavelengths along a ray and multiplying the number of wavelengths by 2π. But it is more in the intended spirit of the present discussion to observe that the advance of φ (in radians) along any path is obtained simply by counting the number of wavefronts crossed. This eliminates the need for introducing rays at all, since it works for any path.

Fig. 2.5.3 illustrates Bragg scattering from a particular set of parallel lattice planes in a two-dimensional lattice with unit cell vectors η₁ and η₂. Though shown as single
FIGURE 2.5.3. Bragg scattering from a crystal. The candidate scattering planes are indicated by straight lines with spacing d. It is only coincidental that the incident beam is directed more or less normal to the lattice plane containing η₁. The construction hinted at by the dashed wavefronts of the incident and scattered waves demonstrates that coherence from all scattering sites in the same plane requires the (candidate) angle of reflection to be equal to the angle of incidence θ.
arrows, the incident X-rays actually form a wave-packet or beam, extended both longitudinally and transversely. We must assume that each wave packet is coherent (i.e., plane-wave-like) over a volume containing many atoms. Then the essence of the present calculation is to seek a situation in which there is constructive interference of the scattering amplitudes from numerous, say N, scattering sites so that the scattering amplitude, proportional to N, dominates all other scattering processes by a relative factor of order N. Offhand, one might imagine that N could be of the order of Avogadro's number, which would yield a truly astronomical enhancement factor. Even apart from the fact that such a large scattering probability might imply more scattered intensity than incident intensity, there are many effects that reduce the effective coherence volume, thereby restricting the value of N to smaller numbers. (Some such effects are microstructure or "mosaic spread" due to imperfect crystals, thermal effects, and limited beam coherence.) For our purposes we will only assume that a situation yielding coherence from numerous sites dominates the scattering process. The result in this case is called "Bragg scattering." The construction of Fig. 2.5.3 demonstrates that coherence from all scattering sites in a single plane requires the (candidate) angle of Bragg reflection to be equal to the angle of incidence θ. In this case the incident beam phase lag for the scattering on the right relative to the scattering on the left is exactly compensated for by its phase advance after scattering. This argument relies on the assumed equality of scattered and incident wavelengths.
With the rest energy of a single scattering site being large compared to the energy of a single photon, this would tend to be approximately true even for scattering from a single atom, but in Bragg scattering the crystal recoils as a whole (or as N sites anyway), validating the assumption by a further large factor N.
For simplicity the following discussion is restricted to the particular configuration of planes shown in Fig. 2.5.3; the origin and η₁ + η₂ lie in the same plane, and the normal to the scattering planes (emphasized by parallel lines) is given by η¹ − η². (At the cost of introducing a few more integer coefficients, the final results could be proved in the far greater generality for which they are valid.) Denoting the incident and scattered wave vectors as p and q, the corresponding forms are p̃ and q̃. Define also r̃ = q̃ − p̃. Motivated by earlier discussion, these quantities are defined as forms (with overhead tildes). (This comment is primarily pedantic in any case since the presence of a tilde, say on p̃, distinguishes it from a corresponding vector p only by the feature that a dot product such as p · x can be expressed as p̃(x).) The condition ensuring coherence of the atom at the origin and the next atom on the same plane (and hence all atoms in the plane) is

r̃(η₁ + η₂) = 0.  (2.5.27)

This condition fixes the direction, but not the length, of r. For general scattering planes, this condition would be r̃(m₁η₁ + m₂η₂) = 0, where m₁ and m₂ are integers; the natural argument of r̃ is a vector from the origin to any lattice site.

The enhancement factor coming from the many coherent sites in any one plane may be appreciable, but it will still lead to negligible scattering unless there is also coherence from different planes. One way of expressing this extra coherence condition is that the phase lag caused by the extra path in scattering from the next plane must be equal to 2π (for simplicity we skip the further possibility of this being an integral multiple of 2π), or

λ = 2d sin(π/2 − θ),  (2.5.28)

which is traditionally known as "Bragg's law." This is illustrated in Fig. 2.5.4. This condition can also be expressed by an equation more nearly resembling Eq. (2.5.27). Starting from the origin, the vector η₁ leads to an atom on the preceding scattering plane, and the vector η₂ leads to the one on the next plane. Recalling Eq. (2.5.26), and applying the condition for constructive interference from these
FIGURE 2.5.4. Geometric construction leading to Bragg's law; each leg of the extra path traversed in scattering from the next plane has length d cos θ.
atoms as well, leads to the equations

r̃(η₁) = −2π,  and  r̃(η₂) = 2π.  (2.5.29)
These are (a rather specialized version of) the so-called “Laue equations.’’ We now use these equations in order to write r as a superposition of reciprocal basis vectors of the form r = q 1 + & 2.
(2.5.30)
From Eq. (2.5.20) one has q̃ⁱ(qⱼ) = δⁱⱼ which, combined with Eqs. (2.5.29), yields α = −β = −2π, and hence

r̃/(2π) = q̃² − q̃¹.
(2.5.31)
Referring to Fig. 2.5.2, this shows that the Bragg condition can be expressed by the statement that (except for a factor of 2π) the difference of incident and scattered wave vectors is equal to the reciprocal lattice vector describing the planes from which the scattering occurs. This form of the condition for coherence from successive planes is illustrated graphically in Fig. 2.5.5. The incident beam is represented by wavefronts that correspond to the incident covariant wave vector p̃, and the scattered beam by q̃ (shown as −q̃ in the figure). As drawn, equality of angles of incidence and reflection is satisfied, but the spacing of planes is not quite right for coherence.

FIGURE 2.5.5. Bragg scattering from a crystal. Dashed lines are wavefronts of incident wave covariant vector p̃ and (negative) reflected wave covariant vector −q̃. Dotted lines represent covariant vector r̃ = q̃ − p̃. As drawn the Bragg condition is almost, but not quite, satisfied.

For convenience in correlating with Eq. (2.5.31), let us suppose the contour lines of p̃ and q̃ in Fig. 2.5.5 are spaced by 2π rather than by unity. Then, to make the dotted lines match the solid lines exactly, it would be necessary to decrease the incident and scattered line spacings slightly; this would require a slight increase in the light frequency or a slight decrease in the angle of incidence (relative to the normal).

Elegant as it is, Eq. (2.5.31) becomes yet more marvelous when conservation of momentum is invoked along with the de Broglie relation between momentum and wavelength. By introducing Planck's constant we can state that if the momentum of an incident photon is given by p̃_inc = ℏp̃ and of a reflected photon by p̃_ref = ℏq̃, then conservation of momentum yields for the recoil momentum of the lattice as a whole, Δp̃_lat = −ℏr̃. (Apologies for the dimensionally inconsistent notation.) Then Eq. (2.5.31) becomes

−Δp̃_lat/h = q̃² − q̃¹.   (2.5.32)

(Naturally, for scattering from another set of planes, the right-hand side of this equation contains the appropriately different reciprocal lattice vector.) One of the justifications for calling this equation “marvelous” is that both sides of the equation refer to the lattice, leaving no reference to the incident and scattered photon beams. This suggests, and the suggestion is confirmed experimentally, that other elementary processes, such as bremsstrahlung and electron-positron pair production, will acquire coherent enhancement when the crystal's recoil momentum matches one of its reciprocal lattice vectors as in Eq. (2.5.32). Some of these considerations may well have been behind de Broglie's original conjecture relating momentum and inverse wavelength. For our purposes it is perhaps more appropriate to regard these results as telling us that the natural interpretation of momentum is as a covariant vector. This statement is especially evident when the condition is written out in covariant components. Then Eq. (2.5.32) becomes

−(Δp_lat)ᵢ/h = (q̃²)ᵢ − (q̃¹)ᵢ.
(This is dimensionally consistent because the reciprocal basis vectors have the dimensions of inverse length.) The generalization of this equation to arbitrary crystal planes is that the lattice recoil momentum (divided by h) has to be a superposition of reciprocal basis vectors with integer coefficients. The following comments, though not germane to classical mechanics, may be of interest nonetheless. We have here stressed a connection of Planck's constant h with momentum that can be investigated experimentally using Bragg scattering from a crystal. There is an independent line of investigation, for example using the photoelectric effect, in which h is related to the energy of a photon. Comparison of the values of h obtained from these entirely independent types of experiment provides a serious test of the overall consistency of the quantum theory. By this point the reader should be convinced that the natural geometric representation of a covariant vector is as contour lines (in spite of any previously held prejudice to the contrary based on thinking of a gradient vector as an arrow pointing in the direction of maximum rate of ascent). It may take longer to become convinced
that in the context of smooth manifolds (such as the set of configurations of a system described by generalized coordinates in Lagrangian mechanics) this is the only possible interpretation of a covariant vector. This is because, in that case, there is no such thing as a direction of maximum rate of ascent. Numerous important topics concerning the use of skew frames (i.e., frames linearly related to Euclidean frames) remain to be covered. They are characterized by the property that components of the metric tensor are constant, independent of position. We will be returning to these topics starting in Section 4.1.1. The reader might prefer to proceed there directly, skipping the discussion of curvilinear (nonlinear) reference frames in the next chapter. As mentioned before, even in nonlinear situations, a limited region can always be defined in which linear analysis is approximately valid.
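The chain of Eqs. (2.5.29)–(2.5.31) can be checked numerically. The sketch below (plain Python; the two-dimensional lattice basis vectors are an arbitrary choice of my own, not those of Fig. 2.5.3) builds the reciprocal basis from the biorthogonality condition q̃ⁱ(qⱼ) = δⁱⱼ of Eq. (2.5.20) and verifies that r = 2π(q² − q¹) satisfies both Laue equations as well as the in-plane coherence condition r̃(q₁ + q₂) = 0.

```python
import math

# Hypothetical 2-D lattice basis vectors (illustrative choice only)
q1 = (3.0, 0.0)
q2 = (1.0, 3.0)

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

# Reciprocal basis q^1, q^2 satisfying q^i . q_j = delta^i_j:
# the rows of the inverse of the matrix whose columns are q1, q2.
det = q1[0]*q2[1] - q2[0]*q1[1]
qr1 = ( q2[1]/det, -q2[0]/det)   # q^1
qr2 = (-q1[1]/det,  q1[0]/det)   # q^2

# Biorthogonality check (Eq. 2.5.20)
assert abs(dot(qr1, q1) - 1) < 1e-12 and abs(dot(qr1, q2)) < 1e-12
assert abs(dot(qr2, q2) - 1) < 1e-12 and abs(dot(qr2, q1)) < 1e-12

# Bragg condition in reciprocal-lattice form: r/(2 pi) = q^2 - q^1
r = (2*math.pi*(qr2[0] - qr1[0]), 2*math.pi*(qr2[1] - qr1[1]))

# Laue equations: r(q1) = -2 pi and r(q2) = +2 pi
assert abs(dot(r, q1) + 2*math.pi) < 1e-9
assert abs(dot(r, q2) - 2*math.pi) < 1e-9

# Same-plane coherence: r(q1 + q2) = 0
assert abs(dot(r, (q1[0] + q2[0], q1[1] + q2[1]))) < 1e-9
```

The same construction works for any pair of independent lattice vectors; only the integer coefficients in the Laue equations change.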
Problem 2.5.7: With patterns photocopied onto transparency material from Fig. 2.5.6, two sets of parallel lines, representing incident and reflected wavefronts, are
FIGURE 2.5.6. Patterns for investigating covariant vectors and the Bragg reflection condition. This page is to be copied (twice) onto transparency material in order to perform Problem 2.5.7. The purpose of the two patterns on the left is to give a preview of the sort of pattern being looked for. The virtue of a pattern of circles is that all possible orientations of parallel (if only tangent) lines are represented. The problem uses only the patterns on the right.
to be laid over the “lattice” of tiny circles in a configuration for which the Bragg condition is satisfied. Cell atoms are located at (0,0), (3,0), (1,3), (4,3), etc., and, in the same units, the spacing between lines is 21/22. There is not a unique solution. If managing three arrays proves to be too frustrating, a fairly good solution can be obtained using the planes in the lower left corner of the figure.
Problem 2.5.8: For Bragg scattering from the plane, mentioned below Eq. (2.5.27), containing the origin and the point m₁q₁ + m₂q₂, find the recoil momentum r corresponding to Bragg scattering with the longest wavelength possible and obtain a formula for that wavelength.

2.6. UNITARY GEOMETRY²
Yet another kind of geometry, namely unitary geometry, can be introduced. This is the “geometry” of quantum mechanics. In physics one prefers to avoid using complex numbers because physical quantities are always real. But complex numbers inevitably result when the equations of the theory are solved. In quantum mechanics one accepts this inevitability and allows complex numbers from the start. However, since the eigenvalues of physical operators are interpreted as the values of physical measurements, it is necessary to limit the operators to those that have real eigenvalues, namely Hermitean operators. This can be regarded as one of the main principles of quantum mechanics. This is fortunate because the resulting simplification is great. In classical mechanics complex quantities also necessarily intrude, and so also do eigenvalue problems, and with them complex eigenvalues. But these eigenvalues do not have to be real, and there is no justification for restricting operators to those that are Hermitean. This is unfortunate, because the resultant formalism is “heavy.” As a result, this mathematics has not been much used in classical mechanics and cannot be said to be essential to the subject. Nevertheless the unitary formalism is consistent with the ideas that have been encountered in this chapter. It may also be interesting to see quantum methods applied to classical problems. However, since our operators will in general not be Hermitean, results cannot be carried from one field to the other as easily as we might wish. The main application of the theory in this section will be to analyze the solution of singular linear equations. This is more nearly algebra than geometry, but the results can be interpreted geometrically. They will be used while discussing linearized Hamiltonians and will be essential for the development of perturbation theory in the last chapter of the text. Other than this, the material will not be used. 
²This section and the rest of the chapter can be skipped until needed. The results are used only in discussing Hamiltonian eigenvectors starting in Section 15.3.4 and in developing near-symplectic perturbation theory in Chapter 16. On the other hand, its connection with quantum mechanical formalism may be of interest; it also serves as a fairly elementary generalization of ordinary Euclidean geometry and illustrates the association of a form with a vector.

Because complex eigenvalues and eigenvectors are unavoidable in analyzing linearized Hamiltonian equations, it provides major simplification to introduce “adjoint” equations and solutions. It is for similar reasons that Dirac's “bras” and “kets” are used in quantum mechanics. For any two (possibly complex) vectors w and z, a “Hermitean scalar product” is defined by the sum
(w, z) = wⁱ*zⁱ;
(2.6.1)
for the rest of this section two vectors placed in parentheses like this will have this meaning. One can say therefore that the elements wⁱ* are components of a covariant vector or form in a space dual to the space containing z. In this way an association is established between the original space and the dual space; the association is implemented by complex conjugation. This definition has the feature that
(z, z) = zⁱ*zⁱ = |z|²
(2.6.2)
is necessarily real and positive (unless z = 0). This is the “metric” of unitary geometry. The vanishing of this product can be expressed as “w is orthogonal to z.” More explicitly, this means that the form associated with w vanishes when evaluated on vector z. The adjoint matrix A† is obtained from A by the combined operation of transposition and complex conjugation (in either order):
A† = A*ᵀ.
(2.6.3)
(In the case that A is real, which will normally be true in our applications, the adjoint is simply the transpose.) The motivation for this definition is provided by the following equation:
(w, Az) = (A†w, z).
(2.6.4)
Definition (2.6.3) is applicable to vectors as well; that is, z† is the row vector whose elements are the complex conjugates of the elements of column vector z. Hence Eq. (2.6.1) can also be written as
(w, z) = w†z.
(2.6.5)
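As a concrete illustration, the following sketch (plain Python; the complex 2 × 2 matrix and the vectors are arbitrary choices of my own) spells out the Hermitean product (2.6.1) and the adjoint (2.6.3), and checks the defining property (2.6.4) numerically.

```python
# Sketch of Eqs. (2.6.1)-(2.6.4) for a 2x2 example.

def herm(w, z):
    """(w, z) = sum_i w^i* z^i, Eq. (2.6.1)."""
    return sum(wi.conjugate() * zi for wi, zi in zip(w, z))

def matvec(A, z):
    return [sum(A[i][j] * z[j] for j in range(len(z))) for i in range(len(A))]

def adjoint(A):
    """A† = A*T: conjugate transpose, Eq. (2.6.3)."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

# Arbitrary complex matrix and vectors (illustrative values only)
A = [[1+2j, 3-1j], [0+1j, 2+0j]]
w = [1-1j, 2+3j]
z = [4+0j, -1+2j]

# Defining property of the adjoint, Eq. (2.6.4): (w, Az) = (A†w, z)
lhs = herm(w, matvec(A, z))
rhs = herm(matvec(adjoint(A), w), z)
assert abs(lhs - rhs) < 1e-12

# (z, z) = |z|^2 is real and positive, Eq. (2.6.2)
assert herm(z, z).imag == 0 and herm(z, z).real > 0
```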
Much of mechanics can be reduced to the solution of the equation
dz/dt = Az,
(2.6.6)
where z is a column vector of unknowns and A a matrix whose elements characterize the (linearized) dependence of “velocities” dz/dt on position z. In principle all these quantities (except t) could be complex, but it will not hurt to think of them all as real. The equation that is said to be adjoint to (2.6.6) is

dz/dt = −A†z.   (2.6.7)
This is not the same as “taking the adjoint” of Eq. (2.6.6), which yields

dz†/dt = z†A†.   (2.6.8)
An important feature making the adjoint solution useful can be seen in the following.
Let x(t) be a solution of Eq. (2.6.7) and y(t) a solution of Eq. (2.6.6). We evaluate
d/dt (x, y) = (−A†x, y) + (x, Ay) = 0.
(2.6.9)
That is, the scalar product of any solution of the direct equation with any solution of the adjoint equation is a constant. Perhaps the solutions of Eq. (2.6.6) that are easiest to visualize are those whose initial values are z₁(0) = (1, 0, 0, ..., 0)ᵀ, z₂(0) = (0, 1, 0, ..., 0)ᵀ, and so on. One can form the “transfer matrix” M(t) of Eq. (2.6.6) by assembling these solutions as the columns
M(t) = (z₁(t)  z₂(t)  ···).
(2.6.10)
Note that M(0) = 1, the identity matrix. A transfer matrix Mᵃᵈʲ(t) of the adjoint equation can be defined similarly. The equations
dMᵃᵈʲ/dt = −A†Mᵃᵈʲ  and  dM/dt = AM   (2.6.11)
can be used to show that the transfer matrices are related by
Mᵃᵈʲ(t) = (M†)⁻¹(t).
(2.6.12)
This relation is true at t = 0, since both sides are equal to 1, and it continues to be true for all t since, using Eqs. (2.6.11),

d/dt (M†Mᵃᵈʲ) = (AM)†Mᵃᵈʲ − M†A†Mᵃᵈʲ = 0.   (2.6.13)

In discussing perturbation theory we will frequently need the transfer matrix of the adjoint equation; we will always express Mᵃᵈʲ as it appears on the right-hand side of Eq. (2.6.12), to reduce by one the number of confusingly related functions. The solutions of Eqs. (2.6.6) and (2.6.7),

y(t) = M(t) y(0)  and  x(t) = (M†)⁻¹(t) x(0),   (2.6.14)

are not particularly simply related. The best one can do is Eq. (2.6.9) and Eq. (2.6.12), along with the following converse statement. If a function φ(x, t) = (z(t), x(t)) (obtained from test function z(t) and arbitrary solution x(t) of the first equation) is a constant of the motion of the first of these equations, then z(t) satisfies the
second equation. To confirm this, evaluate the rate of change of φ (which vanishes by hypothesis):

0 = dφ/dt = (dz/dt, x) + (z, Ax) = (dz/dt + A†z, x).   (2.6.15)
Assuming an arbitrary function can be formed by superposition of solutions x(t), the desired result follows. In solving Eq. (2.6.6), the eigenvectors of A play an important role. We can exploit Dirac-like notation to provide a “box” to contain the eigenvalue of a vector w = (w¹, w², ..., w²ⁿ)ᵀ that is known to be an eigenvector of a matrix (operator) A with eigenvalue λ. This provides a convenient labeling mechanism. For simplicity we assume λ is nondegenerate. Symbolizing the eigenvector by w = |λ⟩, it therefore satisfies
A|λ⟩ = λ|λ⟩.
(2.6.16)
Its adjoint is |λ⟩† = (w¹*, w²*, ..., w²ⁿ*). If A has μ as an eigenvalue, then A† has μ* as an eigenvalue. Define ⟨μ|† to be the eigenvector of A† with eigenvalue μ*; it therefore satisfies

A†⟨μ|† = μ*⟨μ|†.   (2.6.17)
The adjoint of this relation is

⟨μ|A = μ⟨μ|.
(2.6.18)
⟨μ| can therefore also be said to be the “left eigenvector” of A with eigenvalue μ. The scalar product of row vector ⟨u| = (u₁, u₂, ..., u₂ₙ) with |λ⟩ = w is defined by
⟨u|λ⟩ = uᵢwⁱ,   (2.6.19)

which is the same as straight matrix multiplication. In general ⟨λ| and |λ⟩, being left and right eigenvectors of the same matrix with the same eigenvalue, are not simply related, and neither are ⟨λ*| and |λ⟩. (Confirm this for yourself on a simple matrix, such as a 2 × 2 triangular matrix.)
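A minimal numeric confirmation follows (the 2 × 2 matrix here is an illustrative choice of my own, since the printed example is not legible in this copy): for a non-Hermitean matrix, the left and right eigenvectors belonging to the same eigenvalue are not proportional.

```python
# Left vs. right eigenvectors of a non-Hermitean (here, triangular) matrix.

A = [[1.0, 1.0], [0.0, 2.0]]   # illustrative non-Hermitean matrix
lam = 1.0                      # one of its eigenvalues

v = [1.0, 0.0]                 # right eigenvector: A v = lam v
u = [1.0, -1.0]                # left eigenvector:  u A = lam u

Av = [A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1]]
uA = [u[0]*A[0][0] + u[1]*A[1][0], u[0]*A[0][1] + u[1]*A[1][1]]

assert Av == [lam*v[0], lam*v[1]]   # v is a right eigenvector
assert uA == [lam*u[0], lam*u[1]]   # u is a left eigenvector
# u and v are not proportional: their 2-D cross product is nonzero
assert u[0]*v[1] - u[1]*v[0] != 0
```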
But in the special case that A is “self-adjoint” or “Hermitean,”

A† = A,
(2.6.20)
the situation is simpler. In quantum mechanics it is taken as a matter of principle that operators representing observable quantities satisfy (2.6.20), since this condition ensures that the eigenvalues are real. (You should reconstruct the proof that this is true.) We will not be able to make this assumption universally, but if A is Hermitean, Eq. (2.6.18) becomes
A⟨μ|† = μ⟨μ|†,  or  ⟨μ|† = |μ⟩.
(2.6.21)
Under the same circumstance (A† = A), we have
⟨λ|λ⟩ = Σᵢ wⁱ*wⁱ = (w, w) ≡ |w|².
(2.6.22)
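The constancy asserted in Eq. (2.6.9) is easy to confirm numerically. The sketch below (plain Python with a fixed-step Runge–Kutta integrator; the real 2 × 2 matrix A is an arbitrary choice of my own, so that A† = Aᵀ) integrates the direct equation dy/dt = Ay and the adjoint equation dx/dt = −A†x side by side and checks that the scalar product (x, y) does not drift.

```python
# Numeric sketch of Eqs. (2.6.6), (2.6.7), and (2.6.9).

A = [[0.0, 1.0], [-2.0, -0.3]]                         # illustrative real matrix
At = [[A[j][i] for j in range(2)] for i in range(2)]   # A† = A^T for real A

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def rk4(f, v, h):
    """One classical Runge-Kutta step for dv/dt = f(v)."""
    k1 = f(v)
    k2 = f([v[i] + 0.5*h*k1[i] for i in range(2)])
    k3 = f([v[i] + 0.5*h*k2[i] for i in range(2)])
    k4 = f([v[i] + h*k3[i] for i in range(2)])
    return [v[i] + h*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i])/6 for i in range(2)]

y = [1.0, 0.5]    # solution of the direct equation dy/dt = Ay
x = [0.3, -1.0]   # solution of the adjoint equation dx/dt = -A†x
dot0 = x[0]*y[0] + x[1]*y[1]

h, n = 0.01, 500
for _ in range(n):
    y = rk4(lambda v: matvec(A, v), y, h)
    x = rk4(lambda v: [-c for c in matvec(At, v)], x, h)

dot1 = x[0]*y[0] + x[1]*y[1]
assert abs(dot1 - dot0) < 1e-7   # (x, y) is a constant of the motion
```

The same pair of integrations, started from unit initial vectors, would assemble the transfer matrices M and Mᵃᵈʲ of Eqs. (2.6.10)–(2.6.12).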
2.6.1. Solution of Singular Sets of Linear Equations

Adjoint matrices can be usefully applied to the problem of solving linear algebraic equations of the form

Ax = b,
(2.6.23)
where A is a square 2n × 2n matrix and b is a 2n-element column vector. (The number of equations being even has no significance, other than that this is always the case in Hamiltonian mechanics.) All elements are (possibly complex) constants. In the trivial case that det |A| ≠ 0, the solution is
x = A⁻¹b,
(2.6.24)
but the situation is more complicated if det |A| = 0, since Eq. (2.6.23) may or may not be solvable. In ordinary geometry, Eq. (2.6.23) could be the formulation of a problem “find vector x whose dot products with a set of vectors a₍₁₎, a₍₂₎, ... are equal to the set of values b = (b₁, b₂, ...).”²⁹ The matrix A would have a₍₁₎ as its upper row of elements, a₍₂₎ as its second row, and so on. The determinant of A vanishes if any combination of the a's is mutually dependent, say for simplicity a₍₁₎ = a₍₂₎. In this case, Eq. (2.6.23) would be solvable, if at all, only if b₁ = b₂. In general, dependency among the rows of A has to be reflected in corresponding dependency among the elements of b. If det |A| = 0, there is a nonvanishing vector z satisfying the equation Aᵀz = 0. (For the simple example mentioned in the previous paragraph this would be z = (1, −1, 0, 0, ...).) A necessary condition that has to be satisfied by b for the equations to be solvable can be obtained from the matrix product zᵀb = zᵀAx = (Aᵀz)ᵀx = 0. One would therefore conclude that Eq. (2.6.23) could be solvable only if b is orthogonal to z. For the simple example this implies b₁ = b₂, as already stated.

The initial statement of the problem just discussed made it appear that the set of equations under discussion was specific to metric geometry, but this is misleading. With no metric defined, no product of vector a₍ᵢ₎ with vector x can be defined. But the rows of the original matrix A can always be regarded as the components a₍ᵢ₎ⱼ of one-forms ã₍ᵢ₎. Then the individual equations are a₍ᵢ₎ⱼxʲ = bᵢ. It is then no longer appropriate to regard b as a vector. These reinterpretations of the meaning of quantities entering Eq. (2.6.23) have no effect on the necessary condition for its solvability,

²⁹All entries in A and b would be real in ordinary geometry, and solutions for which entries in x are complex would be declared unacceptable. Such solutions would, however, be judged acceptable from the point of view of this section.
since it was derived algebraically. The solvability condition is (b, z) = 0, which involves only the invariant product defined for any linear vector space. In the remainder of this section, conditions like this one for the solvability of Eq. (2.6.23) will be based on the Hermitean product (2.6.1). In this case it is a priori natural to think of b and the rows of A as forms. Though the overhead tildes will not be used, the temptation to think of b and x as belonging to the same space should be resisted. Since things become seriously messy if 0 is a multiple root of A, we will initially assume this not to be the case. Let y and z then be the solutions (unique up to multiplicative factors) of the “homogeneous” and “adjoint homogeneous” equations:
Ay = 0  and  A†z = 0.
(2.6.25)
Conjecturing that x satisfies Eq. (2.6.23), we form the scalar product

(z, b) = (z, Ax) = (A†z, x) = 0.
(2.6.26)
Certainly then, Eq. (2.6.23) can be solved only if b is “orthogonal” to z in the sense that (z, b) = 0. Conversely, suppose

(z, b) = 0.
(2.6.27)
The vector z can be chosen as a basis vector in the dual space, say the first, and 2n - 1 more basis vectors can be chosen, arbitrary except for being independent. We assemble these column vectors into a 2n x 2n matrix Z. In a similar way we form a square matrix Y whose first column is y (defined above) and whose other columns are arbitrary but independent. Next form the matrix
Z†AY = ( 0    0
         0   A₀ ),
(2.6.28)
which has been partitioned into a 1 × 1 element in the upper left, a (2n − 1) × (2n − 1) matrix A₀ in the lower right, and so on. The elements shown to vanish do so because of the conditions previously placed on z and y. Also, because of the assumed absence of multiple roots, the matrix A₀ has to be nonsingular. We return to the original Eq. (2.6.23), changing variables according to
x′ = Y⁻¹x,  and defining  b′ = Z†b.
(2.6.29)
The equations become

( 0    0
  0   A₀ ) x′ = b′ ≡ ( (z, b)
                        b_N ),
(2.6.30)
The uppermost element of b’ is (z, b). If this vanishes, then Eq. (2.6.30) is solvable; otherwise it is not. Supposing that (z, b) = 0, we can find a “particular solution” of
Eq. (2.6.23), (0, x₀ᵀ)ᵀ, whose uppermost element is 0 and whose remaining elements satisfy A₀x₀ = b_N, or x₀ = A₀⁻¹b_N.
(2.6.31)
The matrix inversion is necessarily possible. From here on, the vector x₀ will be augmented by another uppermost component with value 0. We have found therefore that the condition (z, b) = 0, with z satisfying the second of Eqs. (2.6.25), is necessary and sufficient for the solvability of the original equation in the event that det |A| = 0. Assured of the existence of a solution x₀, we ask whether it is unique. Obviously not, since x₀ can be augmented by any multiple of y and still remain a solution. Since one prefers to have unique solutions, it is worth establishing a disciplined procedure for placing a minimal further requirement or requirements that pick out, from the set of solutions, the particular one that satisfies the extra requirements. In this spirit we insist that x also satisfy
(y, x) = 0,  which is to say  y†x = 0.
(2.6.32)
Consider then the equation

(A − zy†)x = b,
(2.6.33)
which is the same as the original equation except for the second term. (Note that the second term, zy†, is a column vector multiplying a row vector, which yields a square matrix of the same size as A.) But our condition (2.6.32) makes the second term vanish. The unique solution to our augmented problem is therefore given by

x = (A − zy†)⁻¹b,
(2.6.34)
provided this matrix is not itself singular. Since the properties of z have not been used, the factor z could be replaced by (almost) any other, and it is unthinkable that the matrix would always be singular. But for the particular choice of z (the solution to the homogeneous adjoint equation), the matrix in Eq. (2.6.33) can be shown to be nonsingular. For the proof, it is enough to show that the only solution of the equation (A − zy†)y₀ = 0 is y₀ = 0. This is left as a not particularly easy exercise.

Commonly the zero root of A is multiple, say double, in which case the homogeneous equation has two independent solutions, say y₁ and y₂. Since any linear combination of these is also a solution, these are not uniquely determined but, other than being independent, they can be chosen arbitrarily. Similarly the adjoint homogeneous equation has two independent solutions z₁ and z₂. All of the above arguments can be generalized to cover this situation. Eq. (2.6.27) is generalized by accumulating the adjoint homogeneous solutions just mentioned as the columns of a matrix
V = (z₁  z₂).
(2.6.35)
The necessary and sufficient condition for the solvability of Eq. (2.6.23) is then
V†b = 0.
(2.6.36)
In words, b must be orthogonal to all solutions of the adjoint homogeneous equation. The main increase in complexity comes in generalizing condition (2.6.32). We also accumulate the solutions of the direct homogeneous equation as the columns of a matrix
U = (y₁  y₂),
(2.6.37)
and place the extra condition on solution x that
U†x = 0.
(2.6.38)
Replacing Eq. (2.6.33) by

(A − VU†)x = b,   (2.6.39)
the unique solution of the original equation, augmented by condition (2.6.38) is given by
x = (A − VU†)⁻¹b.
(2.6.40)
Roots of even higher multiplicity can be handled in the same way. Later in the text, operators like the one appearing on the right-hand side of this equation, which generate a unique solution consistent with subsidiary conditions, will be symbolized by S, as in
x = Sb,
where S = (A − VU†)⁻¹.
(2.6.41)

Problem 2.6.1: For the matrix A = ( i ), check all the formulas appearing in this section.
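The single-root recipe of Eqs. (2.6.25)–(2.6.34) can be exercised on a small example. The sketch below (plain Python; the singular matrix, the null vectors, and the right-hand side are all illustrative choices of my own, real so that † is simply transpose) verifies the solvability condition and shows that the solution returned by (A − zy†)⁻¹b also satisfies the subsidiary condition (2.6.32).

```python
# Numeric sketch of the singular-system recipe, 2x2 real case.

A = [[1.0, 1.0], [2.0, 2.0]]    # singular: the rows are dependent
y = [1.0, -1.0]                 # A y  = 0  (homogeneous solution)
z = [2.0, -1.0]                 # A† z = 0  (adjoint homogeneous solution)

b = [1.0, 2.0]
assert z[0]*b[0] + z[1]*b[1] == 0    # solvability: (z, b) = 0, Eq. (2.6.27)

# B = A - z y†  (column times row gives a square matrix), Eq. (2.6.33)
B = [[A[i][j] - z[i]*y[j] for j in range(2)] for i in range(2)]
det = B[0][0]*B[1][1] - B[0][1]*B[1][0]
x = [( B[1][1]*b[0] - B[0][1]*b[1]) / det,
     (-B[1][0]*b[0] + B[0][0]*b[1]) / det]   # x = (A - z y†)^{-1} b

# x solves the original equation and also satisfies (y, x) = 0, Eq. (2.6.32)
assert all(abs(A[i][0]*x[0] + A[i][1]*x[1] - b[i]) < 1e-12 for i in range(2))
assert abs(y[0]*x[0] + y[1]*x[1]) < 1e-12
print(x)  # [0.5, 0.5]
```

Replacing z and y by the matrices V and U handles the multiple-root case in exactly the same way, as in Eq. (2.6.40).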
When equations like Eq. (2.6.23) arise in a purely algebraic context they are often expected to be nonsingular, for example because they solve a well-posed problem. But it is more common for Eq. (2.6.23) to be singular in a geometric context. For example, in ordinary geometry, if A represents a projection onto the (x, y) plane followed by a rotation around the z axis, then Eq. (2.6.23) can be solved only if b lies in the (x, y) plane, and the solution is unique only up to the possible addition of any vector parallel to the z axis. The equations of this section have similar geometric interpretations for unitary geometry.

In technology (and later in this text) the sort of analysis that has been applied to Eq. (2.6.23) is often somewhat generalized. Let the n-component vector x = (x¹, ..., xⁿ) be replaced by x(t), a vector function depending on a continuous parameter t. Apart from possible new requirements, such as continuity, this amounts to
little more than letting n → ∞. Suppose also that b is replaced by “drive” f(t) and A by the linear “response operator” m d²/dt² + k:

(m d²/dt² + k) x(t) = f(t).   (2.6.42)
This equation can have “transient” solutions with f = 0 and “driven” solutions with f ≠ 0. Important practical issues are whether a driven solution exists and, if it does, how the transient content of the solution is made unique by insisting that other conditions be satisfied. Examples of the use of the notation, manipulations, and results of this section can be found starting in Section 15.3.4.
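A sketch of the transient/driven distinction for Eq. (2.6.42) follows (plain Python; the values of m, k, the cosine drive, and the transient amplitudes are all illustrative choices of my own, with the drive frequency kept away from the resonance √(k/m)).

```python
import math

# For drive f(t) = F cos(w t), a driven solution of (m d^2/dt^2 + k) x = f is
#   x_p(t) = F cos(w t) / (k - m w^2),
# and any transient A cos(w0 t) + B sin(w0 t), w0 = sqrt(k/m), can be added.

m, k, F, w = 2.0, 8.0, 1.0, 3.0       # illustrative values; w0 = 2.0
w0 = math.sqrt(k / m)

def x_p(t):
    return F * math.cos(w * t) / (k - m * w * w)

def x_tr(t, Aamp=0.7, Bamp=-0.2):
    return Aamp * math.cos(w0 * t) + Bamp * math.sin(w0 * t)

def accel(x, t, h=1e-4):              # numerical second derivative
    return (x(t + h) - 2 * x(t) + x(t - h)) / (h * h)

for t in (0.0, 0.4, 1.3):
    # the driven solution satisfies the full (inhomogeneous) equation ...
    assert abs(m * accel(x_p, t) + k * x_p(t) - F * math.cos(w * t)) < 1e-5
    # ... and the transient satisfies the homogeneous (f = 0) equation
    assert abs(m * accel(x_tr, t) + k * x_tr(t)) < 1e-5
```

Extra conditions (for example, initial values) then fix the transient amplitudes and make the full solution unique, in direct analogy with condition (2.6.32).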
BIBLIOGRAPHY

References

1. R. Abraham and J. E. Marsden, Foundations of Mechanics, Addison-Wesley, Reading, MA, 1985.
2. B. F. Schutz, Geometrical Methods of Mathematical Physics, Cambridge University Press, Cambridge, UK, 1995.
3. V. I. Arnold, Mathematical Methods of Classical Mechanics, 2nd ed., Springer-Verlag, New York, 1989.
4. B. L. Van der Waerden, Algebra, Vol. 1, Springer-Verlag, New York, 1991.
References for Further Study

Section 2.2

É. Cartan, Leçons sur la géométrie des espaces de Riemann, Gauthier-Villars, Paris, 1951. (English translation available.)

J. A. Schouten, Tensor Analysis for Physicists, 2nd ed., Oxford University Press, Oxford, 1954.
Section 2.5

É. Cartan, The Theory of Spinors, Dover, 1981.
Section 2.6

H. Weyl, The Theory of Groups and Quantum Mechanics, Dover, New York, pp. 1–27.

V. A. Yakubovich and V. M. Starzhinskii, Linear Differential Equations with Periodic Coefficients, Wiley, New York, 1975.
3

GEOMETRY OF MECHANICS II: CURVILINEAR

3.1. (REAL) CURVILINEAR COORDINATES IN N-DIMENSIONS

3.1.1. Introduction

In this section the description of orbits in real n-dimensional Euclidean space is considered, but using nonrectangular coordinates. The case n = 3 will be called “ordinary geometry.” Generalizing to cases with n > 3 is unnecessary for describing trajectories in ordinary space, but it begins to approach the generality of mechanics, where realistic problems require the introduction of arbitrary numbers of generalized coordinates. Unfortunately the Euclidean requirement (i.e., the Pythagorean theorem) is typically not satisfied in generalized coordinates. However, analysis of curvilinear coordinates in ordinary geometry already requires the introduction of mathematical methods like those needed in more general situations. It seems sensible to digest this mathematics in this intuitively more familiar setting rather than in the more abstract mathematical setting of differentiable manifolds.

In the n = 3 case, much of the analysis to be performed may already be familiar, for example from courses in electricity and magnetism. For calculating fields from symmetric charge distributions, for example one that is radially symmetric, it is obviously convenient to use spherical rather than rectangular coordinates. This is even more true for solving boundary value problems with curved boundaries. For solving such problems, curvilinear coordinate systems that conform with the boundary must be used. It is therefore necessary to be able to express the vector operations of gradient, divergence, and curl in terms of these “curvilinear” coordinates. Vector theorems such as Gauss's and Stokes's need to be similarly generalized.

In electricity and magnetism one tends to restrict oneself to geometrically simple coordinate systems such as polar or cylindrical systems, and in those cases some of the following formulas can be derived by more elementary methods. Here we consider general curvilinear coordinates where local axes are not only not parallel at different points in space (as is true already for polar and cylindrical coordinates) but may be skew, not orthonormal. Even the description of force-free particle motion in terms of such curvilinear coordinates is not trivial; confirm this by describing force-free motion using cylindrical coordinates. More generally, one is interested in particle motion in the presence of forces that are most easily described using particular curvilinear coordinates. Consider, for example, a beam of particles traveling inside an elliptical vacuum tube which also serves as a wave guide for an electromagnetic wave. Since solution of the wave problem requires the use of elliptical coordinates, one is forced to analyze the particle motion using the same coordinates. To face this problem seriously would probably entail mainly numerical procedures, but the use of coordinates conforming to the boundaries would be essential. The very setting up of the problem for numerical solution requires a formulation such as the present one.

The problem just mentioned is too specialized for detailed analysis in a text such as this; these comments have been intended to show that the geometry to be studied has more than academic interest. But, as stated before, our primary purpose is to assimilate the necessary geometry as another step on the way to the geometric formulation of mechanics. Even such a conceptually simple task as describing straight-line motion using curvilinear coordinates will be instructive.

3.1.2. The Metric Tensor

An n-dimensional “Euclidean” space is defined to consist of vectors x whose components along rectangular axes are x¹, x², ..., xⁿ, now assumed to be real. The “length” of this vector is given by

x · x = (x¹)² + (x²)² + ··· + (xⁿ)².   (3.1.1)
The “scalar product” of vectors x and y is

x · y = x¹y¹ + x²y² + ··· + xⁿyⁿ.
(3.1.2)
The angle θ between x and y is defined by

cos θ = (x · y)/(|x| |y|),   (3.1.3)

repeating the earlier result Eq. (2.5.18). That this angle is certain to be real follows from the well-known Schwarz inequality. A fundamental “orthonormal” set of “basis vectors” can be defined as the vectors having rectangular components e₁ = (1, 0, ..., 0), e₂ = (0, 1, ..., 0), etc. More general “Cartesian” or “skew” components are related to the Euclidean components by linear transformations

x′ⁱ = Aⁱⱼxʲ,   xⁱ = (A⁻¹)ⁱⱼx′ʲ.
(3.1.4)
Such a homogeneous linear transformation between Cartesian frames is known as a “centered-affine” transformation. If the equations are augmented by (possibly vanishing) additive constants, the transformation is given the more general name “affine.” In terms of the new components the scalar product in (3.1.2) is given by

x · y = g′ⱼₖ x′ʲ y′ᵏ,   (3.1.5)

where the coefficients g′ⱼₖ are the primed-system components of the metric tensor. Clearly they are symmetric under the interchange of indices, and the quadratic form with x = y has to be positive definite. In the original rectangular coordinates gⱼₖ = δⱼₖ, where δⱼₖ is the Kronecker symbol with value 1 for equal indices and 0 for unequal indices. In the new frame, the basis vectors e′₁ = (1, 0, ..., 0), e′₂ = (0, 1, ..., 0), etc., are not orthonormal in general, in spite of the fact that their given contravariant components superficially suggest it; rather

e′ᵢ · e′ⱼ = g′ᵢⱼ.   (3.1.6)
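Eqs. (3.1.4)–(3.1.6) can be checked in two dimensions. In the sketch below (plain Python; the transformation matrix A is an arbitrary invertible example of my own), the primed metric works out to g′ = (A⁻¹)ᵀA⁻¹, and the skew-frame expression g′ⱼₖx′ʲy′ᵏ reproduces the Euclidean dot product.

```python
# Skew-frame metric in 2-D: x'^i = A^i_j x^j  implies  g' = (A^{-1})^T A^{-1}.

A = [[2.0, 1.0], [0.0, 1.0]]    # illustrative invertible transformation
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[ A[1][1]/det, -A[0][1]/det],
        [-A[1][0]/det,  A[0][0]/det]]

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

# g'_{jk} = sum_i (A^{-1})^i_j (A^{-1})^i_k
g = [[sum(Ainv[i][j]*Ainv[i][k] for i in range(2)) for k in range(2)]
     for j in range(2)]

x, y = [1.0, 2.0], [-3.0, 0.5]
xp, yp = matvec(A, x), matvec(A, y)   # primed (skew) components

euclid = x[0]*y[0] + x[1]*y[1]
skew = sum(g[j][k]*xp[j]*yp[k] for j in range(2) for k in range(2))
assert abs(euclid - skew) < 1e-12     # Eq. (3.1.5) reproduces x . y
```

Because g′ is built from a constant matrix A, its components are the same at every point; this is exactly the property that distinguishes skew frames from the curvilinear frames considered next.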
As defined so far, the coefficients gⱼₖ are constant, independent of position in space. Here, by “position in space” we mean “in the original Euclidean space.” For many purposes, the original rectangular coordinates x¹, x², ..., xⁿ would be adequate to locate objects in this space and, though they will be kept in the background throughout most of the following discussion, they will remain available for periodically “getting our feet back on the ground.” These coordinates will also be said to define the “base frame” or, when mechanics intrudes, as an “inertial” frame.¹

As mentioned previously, “curvilinear” systems, such as radial, cylindrical, or elliptical systems, are sometimes required. Letting u¹, u², ..., uⁿ be such coordinates, space is filled with corresponding coordinate curves; on each of the “u¹-curves,” u¹ varies while u², ..., uⁿ are fixed, and so on. Sufficiently close to any particular point P, the coordinate curves are approximately linear. In this neighborhood the curvilinear infinitesimal deviations Δu¹, Δu², ..., Δuⁿ can be used to define the scalar product of deviations Δx and Δy:

Δx · Δy = gⱼₖ(P) Δuʲ Δvᵏ,

where Δuʲ and Δvᵏ are the curvilinear components of Δx and Δy.
This equation differs from Eq. (3.1.5) only in that the coefficients g_{jk}(P) are now permitted to be functions of position P.^2 The quantity √(Δx · Δx) is designated as |Δx| or as Δs and is known as "arc length."

^1 There is no geometric significance whatsoever to a coordinate frame's being inertial, but the base frame will occasionally be called inertial as a mnemonic aid to physicists, who are accustomed to the presence of a preferred frame such as this. The curvilinear frame under study may or may not be rotating or accelerating.

^2 In this way, a known coordinate transformation has determined a corresponding metric tensor g_{jk}(P). Conversely, one can contemplate a space described by components u^1, u^2, …, u^n and metric tensor g_{jk}(P), with given dependence on P, and inquire whether a transformation to components for which the scalar product is Euclidean can be found. The answer, in general, is no. A condition that must be satisfied to ensure that the answer to this question be yes is given in Cartan [1].
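Footnote 2's observation, that a known coordinate transformation determines a corresponding metric tensor g_{jk}(P), can be made concrete symbolically. The following sketch (ours, not the book's; it assumes the sympy library and hypothetical variable names) derives the metric induced on the plane by polar coordinates x^1 = u^1 cos u^2, x^2 = u^1 sin u^2:

```python
# Illustrative sketch (not from the text): the metric tensor g_jk(P) induced
# by a known coordinate transformation, as footnote 2 describes, here for
# plane polar coordinates x1 = u1*cos(u2), x2 = u1*sin(u2).
import sympy as sp

u1, u2 = sp.symbols('u1 u2', positive=True)
x = sp.Matrix([u1 * sp.cos(u2), u1 * sp.sin(u2)])  # base-frame coordinates x^j(u)

J = x.jacobian([u1, u2])   # J[j, k] = dx^j/du^k
g = sp.simplify(J.T * J)   # g_jk = sum_l (dx^l/du^j)(dx^l/du^k)
print(g)                   # diag(1, u1**2), i.e. ds^2 = du1^2 + u1^2 du2^2
```

The position dependence of g_{22} = (u^1)^2 is exactly the feature that distinguishes Eq. (3.1.7) from the constant-coefficient Eq. (3.1.5).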
GEOMETRY OF MECHANICS II: CURVILINEAR
FIGURE 3.1.1. Relating the "natural" local coordinate axes at two different points in ordinary space described by curvilinear coordinates. Because this is a Euclidean plane, the unit vectors e_1 and e_2 at M can be "parallel slid" to point M + dM without changing their lengths or directions; they are shown there as dashed arrows. The curve labeled u^1 + 1 is the curve on which u^1 has increased by 1, and so on.
3.1.3. Relating Coordinate Systems at Different Points in Space

One effect of the coordinates' being curvilinear is to complicate the comparison of objects at disjoint locations. The quantities that will now enter to discipline such comparisons are called "Christoffel coefficients." Deriving them is the purpose of this section.

Consider the coordinate system illustrated in Fig. 3.1.1, with M + dM(u^1 + du^1, u^2 + du^2, …, u^n + du^n) being a point close to the point M(u^1, u^2, …, u^n). (The figure is planar but the discussion will be n-dimensional.) For example, the curvilinear coordinates (u^1, u^2, …, u^n) might be polar coordinates (r, θ, φ). The vectors M and M + dM can be regarded as vectors locating the points relative to an origin not shown; their base frame coordinates (x^1, x^2, …, x^n) refer to a rectangular basis in the base frame centered there; one assumes the base frame coordinates are known in terms of the curvilinear coordinates and vice versa. At every point, "natural" basis vectors^3 (e_1, e_2, …, e_n) can be defined having the following properties:

• e_i is tangent to the coordinate curve on which u^i varies while the other coordinates are held constant. Without loss of generality i can be taken to be 1 in subsequent discussion.

• With the tail of e_1 at M, its tip is at the point where the first coordinate has increased from u^1 to u^1 + 1.
^3 The basis vectors being introduced at this point are none other than the basis vectors called ∂/∂u^1, ∂/∂u^2, etc., in Section 2.4.5, but we refrain from using that notation here.
However, the previous definition has to be qualified because the unit increment of a coordinate may be great enough to cause the coordinate curve to veer noticeably away from the straight basis vector; think, for example, of a change in polar angle φ → φ + 1 radian. Clearly the rigorous definition of the "length" of a particular basis vector, say e_1, requires a careful limiting process. Instead, forsaking any pretense of rigor, let us assume the scale along the u^1 coordinate curve has been expanded sufficiently by "choosing the units" of u^1 to make the unit vector coincide with the coordinate curve to whatever accuracy is considered adequate.

• We will be unable to refrain from using the term "unit vector" to describe a basis vector e_i, even though doing so is potentially misleading because, at least in physics, the term "unit vector" usually connotes a vector of unit length; the traditional notation for a vector parallel to e_i, having unit length, is ê_i. Here we have e_i ∥ ê_i, but ê_i = e_i/|e_i|.

• If one insists on ascribing physical dimensions to the e_i, one must allow the dimensions to be different for different i. For example, if (e_1, e_2, e_3) correspond to (r, θ, φ), then the first basis vector has units of length while the other two are dimensionless. Though this may seem unattractive, it is not unprecedented in physics; one is accustomed to a relativistic 4-vector having time as one coordinate and distances as the others. On the other hand, the vectors (r̂, θ̂, φ̂) all have units of meters, but this is not much of an advantage since, as mentioned previously, the lengths of these vectors are somewhat artificial in any case.
Hence, deviating from traditional usage in elementary physics, we will use the basis vectors e_i exclusively, even calling them unit vectors in spite of their not having unit length. Dimensional consistency will be enforced separately.

Dropping quadratic (and higher) terms, the displacement vector dM can be expanded in terms of basis vectors at point M:^4

    d\mathbf{M} = du^1\,\mathbf{e}_1 + du^2\,\mathbf{e}_2 + \cdots + du^n\,\mathbf{e}_n = du^i\,\mathbf{e}_i.   (3.1.8)
Note that this equation implies ∂M/∂u^1 = e_1, ∂M/∂u^2 = e_2, etc. At each point other than M, the coordinate curves define a similar "natural" n-plet of unit vectors. The reason that "natural" is placed in quotation marks here and above is that what is natural in one context may be unnatural in another. Once the particular coordinate curves (u^1, u^2, …, u^n) have been selected, the corresponding n-plet (e_1, e_2, …, e_n) is natural, but that does not imply that the coordinates (u^1, u^2, …, u^n) themselves were in any way fundamental.

Our present task is to express the frame (e'_1, e'_2, …, e'_n) at M + dM in terms of the frame (e_1, e_2, …, e_n) at M. Working with just two components for simplicity,
^4 A physicist might interpret Eq. (3.1.8) as an approximate equation in which quadratic terms have been neglected; a mathematician might regard it as an exact expansion in the "tangent space" at M.
the first basis vector can be approximated as

    \mathbf{e}'_1 = \mathbf{e}_1 + d\mathbf{e}_1 = \mathbf{e}_1 + \omega_1^{\,1}\,\mathbf{e}_1 + \omega_1^{\,2}\,\mathbf{e}_2   (3.1.9a)
                 = \mathbf{e}_1 + \left(\Gamma^1_{1k}\,\mathbf{e}_1 + \Gamma^2_{1k}\,\mathbf{e}_2\right)du^k.   (3.1.9b)
The (yet to be determined) coefficients ω_i^j can be said to be "affine-connecting" since they connect quantities in affinely related frames; the coefficients Γ^i_{jk} are known as Christoffel symbols or as an "affine connection." Both forms of Eq. (3.1.9) will occur frequently in the sequel, with the (b) form being required when the detailed dependence on coordinates u^i has to be exhibited, and the simpler (a) form being adequate when all that is needed is an expansion of new basis vectors in terms of old. Here, as in the previous chapter, we employ a standard but bothersome notational practice; the incremental expansion coefficients have been written as ω_i^j rather than as dω_i^j, a notation that would be harmless for the time being but would clash later on when the notation dω is conscripted for another purpose. To a physicist it seems wrong for a differential quantity de_1 to be a superposition of quantities like ω_1^1 e_1 that appear, notationally, to be nondifferential. But, having already accepted the artificial nature of the units of the basis vectors, we can adopt this notation, promising to sort out the units and differentials later.

The terminology "affine connection" anticipates more general situations in which such connections do not necessarily exist. This will be the case for general "manifolds" (spaces describable, for example, by "generalized coordinates" and hence essentially more general than the present Euclidean space). For general manifolds there is no "intrinsic" way to relate coordinate frames at different points in the space. Here "intrinsic" means "independent of a particular choice of coordinates." This can be expressed by the following prohibition against illegitimate vector superposition: A vector at one point cannot be expanded in basis vectors belonging to a different point.^5
After this digression we return to the Euclidean context and Eq. (3.1.9). This equation appears to be doing the very thing that is not allowed, namely expanding e'_1 in terms of the e_i. The reason it is legitimate in this case is that there is an intrinsic way of relating frames at M and M + dM: it is the traditional parallelism of ordinary geometry, as shown in Fig. 3.1.1. One is really expanding e'_1 in terms of the vectors e_i slid parallel from M to M + dM. All too soon, the concept of "parallelism" will have to be scrutinized more carefully, but for now, since we are considering ordinary space, the parallelism of a vector at M and a vector at M + dM has its usual, intuitively natural meaning; for example, basis vectors e_1 and e'_1 in the figure are almost parallel, while e_2 and e'_2 are not.

With this interpretation, Eq. (3.1.9) is a relation entirely among vectors at M + dM. The coefficients ω_i^j and Γ^i_{jk} being well defined, we proceed to determine them,
^5 This may seem counterintuitive; if you prefer, for now replace "cannot" by "must not" and regard it as a matter of dictatorial edict.
starting by rewriting Eq. (3.1.9) in compressed notation:

    \mathbf{e}'_i = \mathbf{e}_i + d\mathbf{e}_i = \mathbf{e}_i + \omega_i^{\,j}\,\mathbf{e}_j   (3.1.10a)
                 = \mathbf{e}_i + \Gamma^j_{ik}\,du^k\,\mathbf{e}_j.   (3.1.10b)

The quantities

    \omega_i^{\,j} = \Gamma^j_{ik}\,du^k   (3.1.11)
are one-forms, linear in the differentials du^k.^6 The new basis vectors must satisfy Eq. (3.1.6):

    \mathbf{e}'_i \cdot \mathbf{e}'_r = g_{ir} + dg_{ir} = \left(\mathbf{e}_i + \omega_i^{\,j}\,\mathbf{e}_j\right) \cdot \left(\mathbf{e}_r + \omega_r^{\,s}\,\mathbf{e}_s\right).   (3.1.12)
Dropping quadratic terms, this can be written succinctly as

    dg_{ir} = \omega_i^{\,j}\,g_{jr} + \omega_r^{\,s}\,g_{is} \equiv \omega_{ir} + \omega_{ri}.   (3.1.13)
(Because the quantities ω_i^j are not the components of a true tensor, the final step is not a manifestly covariant, index-lowering tensor operation, but it can nonetheless serve to define the quantities ω_{ij}, having two lower indices.) Because also dg_{ir} = (∂g_{ir}/∂u^j) du^j, one obtains

    -\frac{\partial g_{ij}}{\partial u^k} = -g_{rj}\,\Gamma^r_{ik} - g_{ri}\,\Gamma^r_{jk},
     \frac{\partial g_{jk}}{\partial u^i} =  g_{rk}\,\Gamma^r_{ji} + g_{rj}\,\Gamma^r_{ki},
     \frac{\partial g_{ki}}{\partial u^j} =  g_{ri}\,\Gamma^r_{kj} + g_{rk}\,\Gamma^r_{ij}.   (3.1.14)
For reasons to be made clear shortly, we have written the identical equation three times, but with indices permuted, substitutions like g_{ri} = g_{ir} having been made in some terms, and the first equation having been multiplied through by −1.
Problem 3.1.1: Show that Eqs. (3.1.14) yield n²(n + 1)/2 equations that can be applied toward determining the n³ coefficients Γ^i_{jk}. Relate this to the number of parameters needed to fix the scales and relative angles of a skew basis set. For the n = 3, ordinary geometry case, how many more parameters are needed to fix the absolute orientation of a skew frame? How many more conditions on the Γ^i_{jk} does this imply? Both for n = 3 and in general, how many more conditions will have to be found to make it possible to determine all of the Christoffel coefficients?

^6 In old references they were known as Pfaffian forms.
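The counting that this problem (and the symmetry condition introduced shortly in Eq. (3.1.17)) relies on can be checked mechanically. A minimal sketch, using our own variable names:

```python
# Sketch (ours, not part of the problem statement): the n^2(n+1)/2 conditions
# from Eq. (3.1.14), together with the n^2(n-1)/2 symmetry conditions that
# appear later in Eq. (3.1.17), account for all n^3 Christoffel coefficients.
for n in range(1, 8):
    metric_conditions = n * n * (n + 1) // 2    # symmetric in two indices
    symmetry_conditions = n * n * (n - 1) // 2  # antisymmetric-pair count
    assert metric_conditions + symmetry_conditions == n ** 3
```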
3.1.3.1. Digression Concerning "Flawed Coordinate Systems." Two city dwellers part company intending to meet after taking different routes. The first goes east for N_E street numbers, then north for N_N street numbers. The second goes north for N_N street numbers, then east for N_E street numbers. Clearly they will not meet up in most cases because the street numbers have not been established carefully enough. Will their paths necessarily cross if they keep going long enough? Because cities are predominantly two-dimensional, they usually will. But it is not hard to visualize the presence of a tunnel on one of the two routes that leads one of the routes below the other without crossing it. In dimensions higher than two that is the generic situation.

Though it was not stated before, we now require our curvilinear coordinate system to be free of the two flaws just mentioned. At least locally, this can be ensured by requiring

    \frac{\partial^2 \mathbf{M}}{\partial u^i\,\partial u^j} = \frac{\partial^2 \mathbf{M}}{\partial u^j\,\partial u^i}.   (3.1.15)

When expressed in vector terms using Eq. (3.1.8), the quantities being differentiated here can be expressed

    \frac{\partial \mathbf{M}}{\partial u^i} = \mathbf{e}_i.   (3.1.16)

Hence, using Eq. (3.1.10b), we require
    \Gamma^k_{ij} = \Gamma^k_{ji}.   (3.1.17)

This requirement that Γ^k_{ij} be symmetric in its lower indices yields n²(n − 1)/2 further conditions which, along with the n²(n + 1)/2 conditions of Eq. (3.1.14), should permit us to determine all n³ of the Christoffel coefficients. It can now be seen why Eq. (3.1.14) was written three times. Adding the three equations and taking advantage of Eq. (3.1.17) yields

    g_{kr}\,\Gamma^r_{ij} = \frac{1}{2}\left(\frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k}\right).   (3.1.18)

For any particular values of "free indices" i and j (and suppressing them to make the equation appear less formidable) this can be regarded as a matrix equation of the form

    g_{kr}\,\Gamma^r = R_k, \quad \text{or} \quad G\,\Gamma = R.   (3.1.19)
Here G is the matrix (g_{kr}) introduced previously, Γ = (Γ^r) is the set of Christoffel symbols for the particular values of i and j, and R = (R_k) is the corresponding right-hand side of Eq. (3.1.18). The distinction between upper and lower indices has been
ignored.^7 Being a matrix equation, this can be solved without difficulty to complete the determination of the Christoffel symbols:

    \Gamma = G^{-1} R,   (3.1.20)
and the same can be done for each i, j pair. Though these manipulations may appear overly formal at this point, an example given below will show that they are quite manageable.

3.1.4. The Covariant (or Absolute) Differential

There is considerable difference between mathematical and physical intuition in the area of differentiation. Compounding this, there is a plethora of distinct types of derivative, going by names such as total, invariant, absolute, covariant, variational, gradient, divergence, curl, Lie, exterior, Fréchet, Lagrange, etc. Each of these (some are just different names for the same thing) combines the common concepts of differential calculus with other concepts. In this chapter some of these terms are explained, and eventually nearly all will be.

The differential in the denominator of a derivative is normally a scalar, or at least a one-component object, often dt, while the numerator is often a multicomponent object. The replacement of t by a monotonically related variable, say s = f(t), makes a relatively insignificant change in the multicomponent derivatives: all components of the derivative are multiplied by the same factor dt/ds. This makes it adequate to work with differentials rather than derivatives in most cases, and that is what we will do. We will disregard as inessential the distinction between the physicist's view of a differential as an approximation to a small but finite change and the mathematician's view of a differential as a finite yet exact displacement along a tangent vector.

We start with a type of derivative that may be familiar to physicists in one guise, yet mysterious in another; the familiar form is that of Coriolis or centrifugal acceleration. Physicists know that Newton's first law (free objects do not accelerate) applies only in inertial frames of reference.
If one insists on using an accelerating frame of reference, say one fixed to the earth, incorporating latitude, longitude, and altitude, the correct description of projectile motion requires augmenting the true forces, gravity and air resistance, by "fictitious" Coriolis and centrifugal forces. These extra forces compensate for the fact that the reference frame is not inertial. Many physicists, perhaps finding the introduction of fictitious forces artificial and hence distasteful, or perhaps having been too well taught in introductory physics that "there is no such thing as centrifugal force," resist this approach and prefer a strict inertial frame description. Here we insist instead on a noninertial description using the curvilinear coordinates introduced in the previous section.

^7 Failing to distinguish between upper and lower indices ruins the invariance of equations as far as transformation between different frames is concerned, but it is valid in any particular frame. In any case, since the quantities on the two sides of Eq. (3.1.19) are not tensors, distinction between upper and lower indices would be unjustified.
A particle trajectory can be described by curvilinear coordinates u^1(t), u^2(t), …, u^n(t) giving its location as a function of time t. For example, uniform motion on a circle of radius R is described by r = R, φ = ωt. The velocity v has curvilinear velocity components that are defined by

    v^i \equiv \frac{du^i}{dt} \equiv \dot{u}^i.   (3.1.21)

In the circular motion example, ṙ = 0. Should one then define curvilinear acceleration components by

    a^i \overset{?}{=} \frac{dv^i}{dt} = \ddot{u}^i\,?   (3.1.22)

No! One could define acceleration this way, but it would lead, for example, to the result that the radial acceleration in uniform circular motion is zero, certainly not consistent with conventional terminology. Here is what has gone wrong: while v is a perfectly good arrow, and hence a true vector, its components v^i are projections onto axes parallel to the local coordinate axes. Though these local axes are not themselves rotating, a frame moving so that its origin coincides with the particle and having its axes always parallel to local axes has to be rotating relative to the inertial frame. One is violating the rules of Newtonian mechanics. Here is what can be done about it: calculate acceleration components relative to the base frame.

Before doing this we establish a somewhat more general framework by introducing the concept of vector field. A vector field V(P) is a vector function of position that assigns an arrow V to each point P in space. An example with V = r, the radius vector from a fixed origin, is illustrated in Fig. 3.1.2. (Check the two boldface arrows with a ruler to confirm V = r.) In the figure the same curvilinear coordinate system as appeared in Fig. 3.1.1 is assumed to be in use. At each point the curvilinear components V^i of the vector V are defined to be the coefficients in the expansion of V in terms of local basis vectors:

    \mathbf{V} = V^i\,\mathbf{e}_i.   (3.1.23)
The absolute differential DV of a vector function V(P), like any differential, is the change in V that accompanies a change in its argument, in the small change limit. For this to be meaningful it is, of course, necessary to specify what is meant by the changes. In Fig. 3.1.2, in the (finite) change of position from point M to point M′, the change in V is indicated by the arrow labeled V′ − V; being an arrow, it is manifestly a true vector. In terms of local coordinates, the vectors at M and M′ are given, respectively, by

    \mathbf{V} = \mathbf{V}(M) = V^1\,\mathbf{e}_1 + V^2\,\mathbf{e}_2, \qquad \mathbf{V}' = \mathbf{V}(M') = V'^1\,\mathbf{e}'_1 + V'^2\,\mathbf{e}'_2.   (3.1.24)

In the limit of small changes, using Eq. (3.1.10a), one has

    D\mathbf{V} = d(V^i)\,\mathbf{e}_i + V^j\,d(\mathbf{e}_j) = d(V^i)\,\mathbf{e}_i + V^j\,\omega_j^{\,i}\,\mathbf{e}_i = \left(d(V^i) + V^j\,\omega_j^{\,i}\right)\mathbf{e}_i.   (3.1.25)
FIGURE 3.1.2. The vector field V(P) = r(P), where r(P) is a radius vector from point O to point P, expressed in terms of the local curvilinear coordinates shown in Fig. 3.1.1. The change V′ − V in going from point M to point M′ is shown.
This differential (a true vector by construction) can be seen to have contravariant components given by

    DV^i \equiv (D\mathbf{V})^i = dV^i + V^j\,\omega_j^{\,i} \equiv dV^i + V^j\,\Gamma^i_{jk}\,du^k,   (3.1.26)

where the du^k are the curvilinear components of M′ relative to M. (Just this time) a certain amount of care has been taken with the placement of parentheses in these equations. The main thing to notice is the definition DV^i ≡ (DV)^i. Note that, since the components u^k and V^i are known functions of position, their differentials du^k and dV^i are unambiguous; there is no need to introduce symbols D(u^k) and D(V^i) since, if one did, their meanings would just be du^k and dV^i. On the other hand the quantity DV is a newly defined true vector whose components are first evaluated in Eq. (3.1.26). (It might be pedagogically more helpful if these components were always symbolized by (DV)^i rather than by DV^i, but since that is never done, it is necessary to remember the meaning some other way. For the moment the superscript i has been moved slightly away to suggest that it "binds somewhat less tightly" to V than does the D.) Note then that the DV^i are the components of a true vector, while the dV^i, differential changes expressed in local coordinates, are not. DV is commonly called the covariant differential; this causes the DV^i to be the "contravariant components of the covariant differential." Since this is unwieldy, we will use the term absolute differential rather than covariant differential. If the vector
being differentiated is a constant vector A, it follows that DA = 0, and hence

    dA^i = -A^j\,\omega_j^{\,i}.   (3.1.27)
How to obtain the covariant components of the absolute differential of a variable vector V is the subject of the following problem.
Problem 3.1.2: Consider the scalar product, V · A, of a variable vector V and an arbitrary constant vector A. Its differential, as calculated in the base frame, could be designated D(V · A), while its differential in the local frame could be designated d(V · A). Since the change of a scalar should be independent of frame, these two differentials must be equal. Use this, and Eq. (3.1.27), and the fact that A is arbitrary, to show that the covariant components of the absolute differential of vector V are given by

    DV_i = dV_i - V_k\,\omega_i^{\,k}.   (3.1.28)
If Ṽ is the form naturally associated with V, then the DV_i are the components of a form D̃V.
Problem 3.1.3: The line of reasoning of the previous problem can be generalized to derive the absolute differential of more complicated tensors. Consider, for example, a mixed tensor a^i_j having one upper and one lower index. Show that the absolute differential of this tensor is given by

    Da^i_{\,j} = da^i_{\,j} + a^k_{\,j}\,\omega_k^{\,i} - a^i_{\,k}\,\omega_j^{\,k}.   (3.1.29)
Problem 3.1.4: Ricci's Theorem. Derived as in the previous two problems,

    Da_{ij} = da_{ij} - a_{kj}\,\omega_i^{\,k} - a_{ik}\,\omega_j^{\,k}.   (3.1.30)
Using this formula and Eq. (3.1.13), evaluate Dg_{ij}, the absolute differential of the metric tensor g_{ij}, and show that it vanishes. This means that the metric tensor elements act like constants for absolute differentiation. Use this result to show that the absolute differential D(A · B) of the scalar product of two constant vectors A and B vanishes (as it must).

Because the concept of absolute differentiation is both extremely important and quite confusing, some recapitulation may be in order. Since the difference of two arrows is an arrow, the rate of change of an arrow is an arrow. Stated more conventionally, the rate of change of a true vector is a true vector. Confusion enters only when a vector is represented by its components. It is therefore worth emphasizing that

The components of the rate of change of vector V are not, in general, the rates of change of the components of V.
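The content of Ricci's theorem, Dg_{ij} = 0, can be spot-checked independently of the problem. By Eq. (3.1.13), it is equivalent to ∂g_{ij}/∂u^k = g_{lj} Γ^l_{ik} + g_{il} Γ^l_{jk}. The sketch below (ours, assuming the sympy library) verifies this for the plane polar metric; it is not a substitute for the general derivation the problem asks for:

```python
# Spot-check of Ricci's theorem (Problem 3.1.4) in plane polar coordinates:
# Dg_ij = 0 is equivalent to dg_ij = omega_ij + omega_ji, i.e.
# dg_ij/du^k = g_lj Gamma^l_ik + g_il Gamma^l_jk.  Names are ours.
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
u = [r, phi]
g = sp.Matrix([[1, 0], [0, r**2]])
ginv = g.inv()
n = 2

def Gamma(i, j, k):  # Christoffel symbols from Eq. (3.1.18)
    return sum(ginv[i, l] * sp.Rational(1, 2) *
               (sp.diff(g[l, j], u[k]) + sp.diff(g[l, k], u[j]) - sp.diff(g[j, k], u[l]))
               for l in range(n))

ok = all(sp.simplify(sp.diff(g[i, j], u[k]) -
                     sum(g[l, j] * Gamma(l, i, k) + g[i, l] * Gamma(l, j, k)
                         for l in range(n))) == 0
         for i in range(n) for j in range(n) for k in range(n))
print(ok)   # True: the metric behaves as a constant under absolute differentiation
```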
This applies to all true tensors. Unfortunately, since practical calculation almost always requires the introduction of components, it is necessary to develop careful formulas, expressed in component form, for differentiating vectors (and all other tensors). The derivation of a few of these formulas is the subject of the set of problems just above.

3.1.5. Derivation of the Lagrange Equations of Mechanics from the Absolute Differential

In mechanics one frequently has the need for coordinate systems that depend on position (curvilinear) or time (rotating or accelerating). Here we analyze the former case while continuing to exclude the latter. That is, the coefficients of the metric tensor can depend on position but are assumed to be independent of t. On the other hand, the positions of the particle or particles being described certainly vary with time. In this section we symbolize coordinates by q^i rather than the u^i used to this point. There is no significance whatsoever to this change other than the fact that generalized coordinates in mechanics are usually assigned the symbol q. The first vector to be subjected to invariant differentiation will be the position vector of a point following a given trajectory.
Example 3.1.1: Motion that is simple when described in one set of coordinates may be quite complicated in another. For example, consider a particle moving parallel to the x axis at y = b with constant speed u; that is, (x = ut, y = b). In spherical coordinates, with θ = π/2, the particle displacement is given by

    r = \frac{b}{\sin\phi}, \qquad \phi = \tan^{-1}\frac{b}{ut}.

The time derivatives are

    v^r \equiv \dot{r} = u\cos\phi, \qquad v^\phi \equiv \dot{\phi} = -\frac{u}{b}\sin^2\phi;

following our standard terminology for velocities, we have defined v^r = ṙ and v^φ = φ̇. (This terminology is by no means universal, however. It has the disagreeable feature that the components are not the projections of the same arrow onto mutually orthonormal axes, as Fig. 3.1.3 shows. Also they have different units. They are, however, the contravariant components of a true vector along well-defined local axes.) Taking another time derivative yields

    \ddot{r} = \frac{u^2}{b}\sin^3\phi, \qquad \ddot{\phi} = \frac{2u^2}{b^2}\cos\phi\,\sin^3\phi.
Defining by the term "absolute acceleration" the acceleration in an inertial coordinate frame, the absolute acceleration obviously should vanish in this motion. And yet
FIGURE 3.1.3. A particle moves parallel to the x-axis at constant speed u.
the quantities r̈ and φ̈ are nonvanishing. We will continue this example below in Example 3.1.4.
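The closed forms quoted in Example 3.1.1 can be cross-checked numerically. The sketch below (illustrative only; variable names are ours) compares ṙ = u cos φ and φ̇ = −(u/b) sin²φ against centered finite differences of r(t) and φ(t) for the path x = ut, y = b:

```python
# Numerical cross-check of Example 3.1.1: for x = u*t, y = b, compare the
# closed forms rdot = u*cos(phi), phidot = -(u/b)*sin(phi)**2 against
# centered finite differences of r(t) and phi(t).
import math

u, b = 3.0, 2.0

def r(t):   return math.hypot(u * t, b)   # r = sqrt(u^2 t^2 + b^2) = b/sin(phi)
def phi(t): return math.atan2(b, u * t)   # phi = arctan(b/(u t))

t, h = 1.7, 1e-6
rdot_num = (r(t + h) - r(t - h)) / (2 * h)
phidot_num = (phi(t + h) - phi(t - h)) / (2 * h)

assert abs(rdot_num - u * math.cos(phi(t))) < 1e-6
assert abs(phidot_num + (u / b) * math.sin(phi(t)) ** 2) < 1e-6
```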
Example 3.1.2: In cylindrical (r, φ, z) coordinates the nonvanishing Christoffel elements are

    \Gamma^1_{22} = -r, \qquad \Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{r}

(as will be shown shortly). The polar components of r are (q^1, q^2) = (r, 0), and the components of the covariant derivative with respect to t are

    \frac{Dq^i}{dt} = \dot{q}^i + \Gamma^i_{jk}\,q^j\,\dot{q}^k.

That is, (ṙ, φ̇) are the polar components of Dr/dt. This is a misleadingly simple result, however. It follows from the fact that, if r = r e_r is the radius vector of a particle, and r is taken as its first, and its only nonvanishing, coordinate, then Dq^i/dt = q̇^i + Γ^i_{jk} q^j q̇^k; for this particular choice of coordinates, the second term vanishes. In general, though the elements dq^i transform like the components of a vector, the coordinates q^i themselves do not, because they are macroscopic quantities that cannot be expected to have properties derived from linearization. Hence the quantities Dq^i have nondescript geometric character and should perhaps not even have been written down. Nevertheless, (q^1, q^2, …, q^n) is a true vector.
Example 3.1.3: This same result can be obtained simply using traditional vector analysis on the vectors shown in Fig. 3.1.4:

    \frac{d\mathbf{r}}{dt} = \frac{d}{dt}\left(r\,\hat{\mathbf{r}}\right) = \dot{r}\,\hat{\mathbf{r}} + r\,\frac{d\hat{\mathbf{r}}}{dt} = \dot{r}\,\hat{\mathbf{r}} + r\dot{\phi}\,\hat{\boldsymbol{\phi}}.

The factor r in the final term reflects the difference between our basis vector e_φ and the unit length vector φ̂ of ordinary vector analysis.
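The cylindrical Christoffel elements quoted in Example 3.1.2 can be checked by the "direct" method of Eqs. (3.1.18) through (3.1.20). A sketch (ours, assuming the sympy library), restricted to the (r, φ) plane where the metric is diag(1, r²):

```python
# Sketch of the direct solution Gamma = G^{-1} R of Eqs. (3.1.18)-(3.1.20),
# for plane polar coordinates (u1, u2) = (r, phi); variable names are ours.
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
u = [r, phi]
g = sp.Matrix([[1, 0], [0, r**2]])   # polar-coordinate metric
ginv = g.inv()
n = 2

# Gamma[i][j][k] = Gamma^i_{jk}: Eq. (3.1.18) with the index raised by g^{-1}
Gamma = [[[sp.simplify(sum(ginv[i, l] * sp.Rational(1, 2) *
           (sp.diff(g[l, j], u[k]) + sp.diff(g[l, k], u[j]) - sp.diff(g[j, k], u[l]))
           for l in range(n)))
           for k in range(n)] for j in range(n)] for i in range(n)]

print(Gamma[0][1][1], Gamma[1][0][1])   # -r and 1/r, as in Example 3.1.2
```

All other components come out zero, confirming that Γ¹₂₂ and Γ²₁₂ = Γ²₂₁ are the only nonvanishing elements in the plane.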
FIGURE 3.1.4. Time rate of change of a unit vector.
If the changes to vector V discussed in the previous section occur during time dt, perhaps because a particle that is at M at time t moves in such a way as to be at M′ at time t + dt, the differentials DV^i of vector V can be converted to time derivatives:^8

    \frac{DV^i}{dt} = \dot{V}^i + \Gamma^i_{jk}\,V^j\,\dot{q}^k.   (3.1.31)
The quantity V being differentiated in Eq. (3.1.31) is any vector field. One such possible vector field, defined at every point on the trajectory of a moving particle, is its instantaneous velocity v = dx/dt. Being an arrow (tangent to the trajectory) v is a true vector; its local components are v^i = dq^i/dt = q̇^i. The absolute acceleration a is defined by
    \mathbf{a} = \frac{D\mathbf{v}}{dt}, \quad \text{or} \quad a^i = \frac{Dv^i}{dt}.   (3.1.32)
Substituting V^i = v^i in Eq. (3.1.31) yields

    a^i = \ddot{q}^i + \Gamma^i_{jk}\,\dot{q}^j\,\dot{q}^k.   (3.1.33)
As the simplest possible problem of mechanics let us now suppose that the particle being described is subject to no force and, as a result, to no acceleration. Setting a^i = 0 yields

    \ddot{q}^i = -\Gamma^i_{jk}\,\dot{q}^j\,\dot{q}^k.   (3.1.34)
This is the equation of motion of a free particle. In rectangular coordinates, since Γ^i_{jk} = 0, this degenerates to the simple result that v is constant; the motion in question is along a straight line with constant speed. This implies that the solution of Eq. (3.1.34) is the equation of a straight line in our curvilinear coordinates. Since a line is a purely geometric object, it seems preferable to express its equation in

^8 In this and previous expressions the common shorthand indication of total time derivative by an overhead dot has been used. One can inquire why V̇^i has been defined to mean dV^i/dt, rather than DV^i/dt. It is just a convention (due originally to Newton), but the convention is well established, and it must be respected if nonsense is to be avoided. The vector field V, though dependent on position, has been assumed to be constant in time; if V has an explicit time dependence, the term V̇^i would have to include also a contribution ∂V^i/∂t.
terms of arc length s along the line rather than time t. As observed previously, such a transformation is easy, especially so in this case since the speed is constant. The equation of a straight line is then

    \frac{d^2 q^i}{ds^2} = -\Gamma^i_{jk}\,\frac{dq^j}{ds}\,\frac{dq^k}{ds}.   (3.1.35)
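The straight path of Example 3.1.1 provides a concrete test of the free-particle equation (3.1.34). Using the polar Christoffel symbols Γ¹₂₂ = −r and Γ²₁₂ = Γ²₂₁ = 1/r quoted in Example 3.1.2, the combination q̈^i + Γ^i_{jk} q̇^j q̇^k should vanish along that path even though r̈ and φ̈ individually do not. A numerical sketch (ours):

```python
# Sketch: the straight line of Example 3.1.1 satisfies the free-particle
# equation (3.1.34) in polar coordinates, where the nonvanishing Christoffel
# symbols are Gamma^1_22 = -r and Gamma^2_12 = Gamma^2_21 = 1/r.
import math

u, b, t = 3.0, 2.0, 1.7
r = math.hypot(u * t, b)
phi = math.atan2(b, u * t)

rdot = u * math.cos(phi)                  # closed forms from Example 3.1.1
phidot = -(u / b) * math.sin(phi) ** 2
rddot = (u ** 2 / b) * math.sin(phi) ** 3
phiddot = (2 * u ** 2 / b ** 2) * math.cos(phi) * math.sin(phi) ** 3

# qddot^i + Gamma^i_jk qdot^j qdot^k must vanish for force-free motion:
assert abs(rddot + (-r) * phidot ** 2) < 1e-9
assert abs(phiddot + 2 * (1 / r) * rdot * phidot) < 1e-9
```

The cancellation works because r sin φ = b identically along the path, so r̈ = (u²/b) sin³φ equals r φ̇² exactly.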
Suppose next that the particle is not free, but rather is subject to a force F. Substituting a = F/m, where m is the particle mass, into Eq. (3.1.33) can be expected to yield Newton's law expressed in these coordinates, but a certain amount of care is required before components are assigned to F. Before pursuing this line of development, we look at free motion from another point of view. There are two connections with mechanics that deserve consideration: variational principles and Lagrange's equations. The first can be addressed by the following problems; the first of these is reasonably straightforward; the second is less so and could perhaps be deferred or looked up.^9
Problem 3.1.5: Consider the integral S = ∫_{t_1}^{t_2} L(q^i, q̇^i, t) dt, evaluated along a candidate particle path between starting position at initial time t_1 and final position at time t_2, where L = ½ g_{ij} q̇^i q̇^j. Using the calculus of variations, show that Eq. (3.1.34) is the equation of the path for which S is extreme. In other words, show that Eq. (3.1.34) is the same as the Euler–Lagrange equation for this Lagrangian L.
Problem 3.1.6: It is "obvious" also, since free particles travel in straight lines and straight lines have minimal lengths, that the Euler–Lagrange equation for the trajectory yielding extreme value to the integral I = ∫ ds, where ds² = g_{ij} dq^i dq^j, should lead also to Eq. (3.1.35). Demonstrate this.

These two problems suggest a close connection between Eq. (3.1.33) and the Lagrange equations that will now be considered. The key dynamic variable that needs to be defined is the kinetic energy T. In the present context, using Eq. (3.1.7),
    T = \frac{m}{2}\,g_{jk}\,\dot{q}^j\,\dot{q}^k.   (3.1.36)
(Though not exhibited explicitly, the metric coefficients depend on the coordinates q^i.) When one thinks of "force" one thinks either of what its source is, for example an electric charge distribution, or what it must be to account for an observed acceleration. Here we take the latter tack and (on speculation) introduce a quantity Q_i (to be

^9 Both problems are solved in Dubrovin et al. [2].
interpreted later as the "generalized force" corresponding to q^i) by

    Q_i = \frac{d}{dt}\frac{\partial T}{\partial \dot{q}^i} - \frac{\partial T}{\partial q^i}.   (3.1.37)

We proceed to evaluate Q_i by substituting for T from Eq. (3.1.36), and using Eq. (3.1.18):

    Q_i = m\,g_{ij}\left(\ddot{q}^j + \Gamma^j_{kl}\,\dot{q}^k\,\dot{q}^l\right).   (3.1.38)
This formula resembles Eq. (3.1.33); a comparison with Eq. (2.5.17) shows that their right-hand sides are covariant and contravariant components of the same vector. Expressed as an intrinsic equation, this yields

    \frac{Q^i}{m} = \ddot{q}^i + \Gamma^i_{jk}\,\dot{q}^j\,\dot{q}^k = a^i.   (3.1.39)

This confirms that the Lagrange equations are equivalent to Newton's equations, since the right-hand side is the acceleration a^i. For this equation to predict the motion, it is of course necessary for the force Q^i to be given.
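The equality of the Lagrange derivative (3.1.37) and the covariant form (3.1.38) can be verified symbolically in a concrete case. The sketch below (ours, assuming the sympy library) does this in plane polar coordinates, where g = diag(1, r²) and the nonvanishing Christoffel symbols are Γ¹₂₂ = −r and Γ²₁₂ = Γ²₂₁ = 1/r:

```python
# Symbolic check that the Lagrange derivative of T = (m/2) g_jk qdot^j qdot^k
# reproduces Q_i = m g_ij (qddot^j + Gamma^j_kl qdot^k qdot^l), Eq. (3.1.38),
# in plane polar coordinates.  Variable names are ours.
import sympy as sp

m, t = sp.symbols('m t', positive=True)
r = sp.Function('r')(t)
phi = sp.Function('phi')(t)

T = sp.Rational(1, 2) * m * (r.diff(t) ** 2 + r ** 2 * phi.diff(t) ** 2)

Q1 = T.diff(r.diff(t)).diff(t) - T.diff(r)      # d/dt(dT/drdot) - dT/dr
Q2 = T.diff(phi.diff(t)).diff(t) - T.diff(phi)  # d/dt(dT/dphidot) - dT/dphi

# Covariant form with g = diag(1, r^2), Gamma^1_22 = -r, Gamma^2_12 = 1/r:
rhs1 = m * (r.diff(t, 2) - r * phi.diff(t) ** 2)
rhs2 = m * r ** 2 * (phi.diff(t, 2) + 2 / r * r.diff(t) * phi.diff(t))

assert sp.simplify(Q1 - rhs1) == 0
assert sp.simplify(Q2 - rhs2) == 0
```

Note that Q_2 comes out as m(r²φ̈ + 2rṙφ̇), a covariant component; dividing by g_{22} = r² raises the index, exactly as the recapitulation below describes.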
Recapitulation: From a given particle trajectory it is a kinematical job to infer the acceleration, and the covariant derivative is what is needed for this task. The result is written on the right-hand side of Eq. (3.1.39), in the form of contravariant components of a vector. It was shown in Eq. (3.1.38) that this same quantity could be obtained by calculating the "Lagrange derivatives" d/dt(∂T/∂q̇^i) − ∂T/∂q^i, where T = (m/2) g_{jk} q̇^j q̇^k. (The occurrence of mass m in the definition of T suggests that it is a dynamical quantity, but inclusion of the multiplier m is rather artificial; T and the metric tensor are essentially equivalent quantities.) It is only a minor complication that the Lagrange derivative of T yields covariant components, which need to "have their indices raised" before yielding the contravariant components of acceleration.

Dynamics only enters when the acceleration is ascribed to a force according to Newton's law, a = F/m. When a in this equation is evaluated by the invariant derivative as in Eq. (3.1.39), the result is called "Newton's equation." When a is evaluated by the Lagrange derivative of T, the result is called "Lagrange's equation." Commonly force is introduced into the Lagrange equation by introducing L = T − V, where V is "potential energy." This is an artificial abbreviation, however, since it mixes a kinematic quantity T and a dynamic quantity V. From the present point of view, since it is not difficult to introduce forces directly, this procedure is, logically speaking, clearer than introducing them indirectly in the form of potential energy.
GEOMETRY OF MECHANICS II: CURVILINEAR
The prominent role played in mechanics by the kinetic energy T is due, on the one hand, to its close connection with ds² and, on the other hand, to the fact that virtual force components Q_l can be derived from T using Eq. (3.1.37).

3.1.6. Practical Evaluation of the Christoffel Symbols

The "direct" method of obtaining Christoffel symbols for a given coordinate system is to substitute the metric coefficients into Eq. (3.1.18) and solve Eqs. (3.1.19). But this involves much differentiation and is rather complicated. A practical alternative is to use the equations just derived. Suppose, for example, that spherical coordinates are in use: (q^1, q^2, q^3) = (r, θ, φ). In terms of these coordinates and the formula for distance ds, one obtains metric coefficients from Eqs. (3.1.5) and (2.5.17):

    ds² = dr² + r² dθ² + r² sin²θ dφ²,
    g_{11} = 1,   g_{22} = r²,   g_{33} = r² sin²θ,   (3.1.40)
and all off-diagonal coefficients vanish. (g^{11}, g^{22}, g^{33} are defined by the usual index raising.) Acceleration components a^i can then be obtained using Eq. (3.1.39), though it is necessary first to raise the index of Q_l using the metric tensor. From Eq. (3.1.39) one notes that the Christoffel symbols are the coefficients of the terms quadratic in velocity components, i.e., ṙ, θ̇, and φ̇, in Eq. (3.1.38), and this result can be used to obtain them. Carrying out these calculations (for spherical coordinates with m = 1), the kinetic energy and the contravariant components of virtual force are given by
    2T = ṙ² + r²θ̇² + r² sin²θ φ̇²,

    Q^1 = (d/dt)(∂T/∂ṙ) − ∂T/∂r = r̈ − rθ̇² − r sin²θ φ̇²,

    Q^2 = (1/r²)[(d/dt)(∂T/∂θ̇) − ∂T/∂θ] = θ̈ + (2/r) ṙθ̇ − sinθ cosθ φ̇²,

    Q^3 = (1/(r² sin²θ))[(d/dt)(∂T/∂φ̇) − ∂T/∂φ] = φ̈ + (2/r) ṙφ̇ + 2 (cosθ/sinθ) θ̇φ̇.   (3.1.41)
Matching coefficients, and noting that the coefficients with factors of 2 are the terms that are duplicated in the (symmetric) off-diagonal terms of (3.1.38), the nonvanishing Christoffel symbols are
    Γ^1_{22} = −r,        Γ^1_{33} = −r sin²θ,
    Γ^2_{12} = 1/r,       Γ^2_{33} = −sinθ cosθ,
    Γ^3_{13} = 1/r,       Γ^3_{23} = cosθ/sinθ.   (3.1.42)
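As a cross-check on Eq. (3.1.42), the symbols can be recomputed numerically from the metric coefficients of Eq. (3.1.40) alone, using the standard combination of metric derivatives evaluated by finite differences. The following Python sketch is my own illustration (the function names and the sample point are arbitrary, not from the text):

```python
import math

def metric(q):
    # spherical metric of Eq. (3.1.40): g = diag(1, r^2, r^2 sin^2(theta))
    r, th, ph = q
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, r * r * math.sin(th) ** 2]]

def dmetric(q, k, h=1e-6):
    # partial derivative of every g_ij with respect to q^k, by central difference
    qp, qm = list(q), list(q)
    qp[k] += h; qm[k] -= h
    gp, gm = metric(qp), metric(qm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(3)] for i in range(3)]

def christoffel(q):
    # Gamma^i_{jk} = (1/2) g^{il} (d_j g_{lk} + d_k g_{lj} - d_l g_{jk})
    g = metric(q)
    ginv = [[(1.0 / g[i][i] if i == j else 0.0) for j in range(3)] for i in range(3)]
    dg = [dmetric(q, k) for k in range(3)]
    return [[[0.5 * sum(ginv[i][l] * (dg[j][l][k] + dg[k][l][j] - dg[l][j][k])
                        for l in range(3))
              for k in range(3)] for j in range(3)] for i in range(3)]

q = (2.0, 0.7, 1.3)        # arbitrary sample point (r, theta, phi)
G = christoffel(q)
print(G[0][1][1])          # Gamma^1_22, compare with -r
print(G[1][2][2])          # Gamma^2_33, compare with -sin(theta)cos(theta)
print(G[2][1][2])          # Gamma^3_23, compare with cos(theta)/sin(theta)
```

Every entry of Eq. (3.1.42), and the vanishing of all the others, can be confirmed the same way; only the `metric` function needs to change for a different coordinate system.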
(REAL) CURVILINEAR COORDINATES IN N-DIMENSIONS
Example 3.1.4: We test these formulas for at least one example by revisiting Example 3.1.1. Using Eq. (3.1.41), we find that the force components acting on the particle in that example are

    Q^1 = (v²/d) sin³φ − rφ̇² = 0,
    Q^3 = (2v²/d²) cosφ sin³φ + (2/r) ṙφ̇ = 0,

where v is the particle's speed and d its distance of closest approach. This shows that the particle moving in a straight line at constant speed is subject to no force. This confirms statements made previously.
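The same conclusion can be verified numerically, without any symbol manipulation: parameterize straight-line motion at constant speed, convert it to polar coordinates, and evaluate the two generalized-force expressions of Eq. (3.1.41) by finite differences. The sketch below is my own; it takes unit speed and unit distance of closest approach as stand-in values for the data of Example 3.1.1:

```python
import math

V, D = 1.0, 1.0                       # assumed speed and closest-approach distance

def r_of(t):
    # straight line y = D traversed at speed V: radial polar coordinate
    return math.hypot(V * t, D)

def phi_of(t):
    # corresponding azimuthal coordinate
    return math.atan2(D, V * t)

def d1(f, t, h=1e-5):                 # first derivative, central difference
    return (f(t + h) - f(t - h)) / (2 * h)

def d2(f, t, h=1e-4):                 # second derivative, central difference
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

t = 0.3
Q1 = d2(r_of, t) - r_of(t) * d1(phi_of, t) ** 2                    # radial component
Q3 = d2(phi_of, t) + (2 / r_of(t)) * d1(r_of, t) * d1(phi_of, t)   # azimuthal component
print(Q1, Q3)   # both are ~0: free motion exerts no generalized force
```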
Problem 3.1.7: For cylindrical (ρ, φ, z) coordinates, calculate the Christoffel symbols both directly from their defining Eqs. (3.1.18) and indirectly using the Lagrange equation.
3.1.7. Evaluation of the Christoffel Symbols Using Maple

It is even easier to obtain the Christoffel symbols using a computer program. The following listing shows the use of Maple to obtain the Christoffel symbols for spherical and cylindrical coordinates. This would be most useful for less symmetric coordinates, defined by more complicated formulas.

    # Calculate Christoffel coefficients
    readlib(tensor):
    Ndim := 3:
    #
    # Spherical coordinates
    x1 := r;  x2 := theta;  x3 := phi;
    g11 := 1;  g22 := r^2;  g33 := r^2*sin(theta)^2;
    tensor();
    for i from 1 by 1 to 3 do
      for j from 1 by 1 to 3 do
        for k from 1 by 1 to 3 do
          if i <= j then
            if C.i.j.k <> 0 then
              printf(`%d,%d,%d`, i, k, j);
              print(C.i.j.k);
            fi;
          fi;
        od;
      od;
    od;

The output (abbreviated here; a page break interrupts it in the original) begins

    1,2,2
                  - r
    1,3,3
                             2
               - r sin(theta)

    # Cylindrical coordinates
    restart;
    readlib(tensor):
    Ndim := 3:
    x1 := r;  x2 := phi;  x3 := z;
    g11 := 1;  g22 := r^2;  g33 := 1;
    tensor();
    for i from 1 by 1 to 3 do
      for j from 1 by 1 to 3 do
        for k from 1 by 1 to 3 do
          if i <= j then
            if C.i.j.k <> 0 then
              printf(`%d,%d,%d`, i, k, j);
              print(C.i.j.k);
            fi;
          fi;
        od;
      od;
    od;

with output including

    2,1,2
               1/r
3.2. ABSOLUTE DERIVATIVES AND THE BILINEAR COVARIANT

Absolute differentials have been defined for contravariant vectors in Eq. (3.1.26), for covariant vectors in Problem 3.1.2, and for two-index tensors in Problem 3.1.3. The generalization to tensors of arbitrary order is obvious, and the following discussion also transposes easily to tensors of arbitrary order. For simplicity, consider the case of Eq. (3.1.30):

    Da^{ir} = [ ∂a^{ir}/∂u^k + a^{jr} Γ^i_{kj} + a^{ij} Γ^r_{kj} ] du^k.   (3.2.1)

Since du^k and Da^{ir} are true tensors, the coefficients

    (Da^{ir})_k = ∂a^{ir}/∂u^k + a^{jr} Γ^i_{kj} + a^{ij} Γ^r_{kj}   (3.2.2)

also constitute a true tensor. As another example, if X_i are the covariant components of a vector field, then

    (DX_i)_j = ∂X_i/∂u^j − X_k Γ^k_{ji}   (3.2.3)

is also a true tensor. The tensors of Eq. (3.2.2) and Eq. (3.2.3) are called covariant (or invariant) derivatives of a^{ir} and X_i, respectively.

We now perform an important, though for now somewhat poorly motivated, manipulation. What makes this apology necessary is that our entire discussion up to this point has been more specialized than will eventually be required. A (subliminal) warning of this was issued as the Christoffel symbols were introduced and described as a "connection." Their unique definition relied on the fact that the curvilinear coordinates being analyzed were embedded in a Euclidean space, with distances and angles having their standard meanings inherited from that space. In more general situations, an affine connection exists but is not calculable by Eq. (3.1.18). Once one accepts that the Γ^i_{kj} coefficients are special, one must also accept that covariant derivatives like (Da^{ir})_k or (DX_i)_j, rather than being universal, are specific to the particular connection that enters their definition. But, relying on the fact that the Christoffel symbols are symmetric in their lower indices, it is clear (in Eq. (3.2.3) for example) that a more universal (independent of connection) derivative can be formed by antisymmetrizing these tensors to eliminate the Christoffel symbols. Subtracting Eq. (3.2.3) and the same equation with indices interchanged yields

    (DX_i)_j − (DX_j)_i = ∂X_i/∂u^j − ∂X_j/∂u^i.   (3.2.4)

Being a sum of tensors, this is a tensor, and it has intrinsic significance for any system described by smoothly defined coordinates. It generalizes the "curl" of a vector field
X, familiar from vector analysis. For tensors having more indices, similar antisymmetrized, intrinsic derivatives can also be defined. That this combination is a true tensor can be used to prove the invariance of the so-called "bilinear covariant"¹⁰ formed from a differential form ω[d]. Here ω[d] is an abbreviation for

    ω[d] = X₁ du¹ + X₂ du² + ⋯ + X_n duⁿ.   (3.2.5)

The same differential, but expressed with a different argument δ, is

    ω[δ] = X₁ δu¹ + X₂ δu² + ⋯ + X_n δuⁿ.   (3.2.6)

(More symmetric notation, such as d^(1) and d^(2) instead of d and δ, could have been used but would have caused a clutter of indices.) Interchanging d and δ, then forming another level of differential in Eqs. (3.2.5) and (3.2.6), and then subtracting, yields

    dω[δ] − δω[d] = (1/2) (∂X_k/∂u^j − ∂X_j/∂u^k) (δu^k du^j − δu^j du^k).   (3.2.7)
(The factor 1/2 takes advantage of the fact that when this is expanded the terms are equal in pairs.) The right-hand side is the tensor contraction of the product of the first factor (shown to be a true tensor in Eq. (3.2.4)) and the bivector (also an invariant¹¹) formed from du and δu. The upshot of this paragraph is that this combination dω[δ] − δω[d], called the "bilinear covariant" of the form ω, has been shown to be independent of the choice of coordinates. The combination (DX_i)_j − (DX_j)_i has also been shown to be intrinsic, or a true tensor. This is true for fields defined on any smooth manifold. Strictly speaking, this result has only been proved here for manifolds with connection Γ^i_{kj} defined, but the only requirement on the connection is contained in Eq. (3.1.10b), which links basis frames at nearby locations, including the requirement of symmetry in the lower indices. For manifolds encountered in mechanics, this weak requirement is typically satisfied.
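Coordinate independence aside, the equality (3.2.7) itself is easy to check numerically for a concrete one-form: evaluate dω[δ] − δω[d] directly from small finite displacements and compare with the contracted antisymmetrized derivative. The field X, the point, and the displacements below are my own arbitrary choices, used only to exercise the formula:

```python
import math

def X(u):
    # a concrete covariant field X(u) on the plane, chosen arbitrarily
    return [u[0] * u[1], math.sin(u[0])]

def omega(u, step):
    # omega[step](u) = X_i(u) step^i
    Xi = X(u)
    return sum(Xi[i] * step[i] for i in range(2))

u0 = [0.4, 1.1]
du = [1e-4, 2e-4]                      # displacement "d"
su = [-3e-4, 1e-4]                     # displacement "delta"

# d omega[delta] - delta omega[d]: differentiate each form along the other displacement
lhs = (omega([u0[i] + du[i] for i in range(2)], su) - omega(u0, su)) \
    - (omega([u0[i] + su[i] for i in range(2)], du) - omega(u0, du))

def dX(i, j, h=1e-6):
    # partial derivative of X_i with respect to u^j at u0, central difference
    up, um = list(u0), list(u0)
    up[j] += h; um[j] -= h
    return (X(up)[i] - X(um)[i]) / (2 * h)

# right-hand side of Eq. (3.2.7)
rhs = 0.5 * sum((dX(k, j) - dX(j, k)) * (su[k] * du[j] - su[j] * du[k])
                for j in range(2) for k in range(2))
print(lhs, rhs)   # the two agree to second order in the displacements
```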
¹⁰For the time being we continue to use the somewhat archaic "bilinear covariant" as it was introduced in Section 2.3.2 rather than rely on the "exterior derivative" formalism, because the only result that will be used is explicitly derived in this section. The exterior derivative formalism streamlines the algebra and obviates the need for introducing the distinguished symbols d and δ, but our terminology is (arguably) better motivated in the present context.

¹¹Bivectors will be discussed at length in Chapter 4. For now, their tensor character can be inferred from their transformation properties under the transformations defined by Eq. (2.4.5).
*3.3. LIE DERIVATIVE-COORDINATE APPROACH¹²

3.3.1. Lie-Dragged Coordinate Systems

Prototypical coordinate systems discussed so far have been spherical, cylindrical, elliptical, etc., the fixed nonrectangular coordinate systems familiar from electricity and magnetism and other fields of physics. We now consider coordinate systems that are more abstractly defined in terms of a general vector field v defined at every point in some manifold of points M. The situation will be considerably more general than that of the previous section in that the coordinates of points of M are allowed to be any generalized coordinates, and no metric is assumed to be present.

From studying special relativity, one has become accustomed to (seemingly) paradoxical phenomena such as "moving clocks run slow." Closer to one's actual experience, at one time or another everyone has been sitting in a train watching the adjacent train pull out slowly, shortly to be surprised that the watched train is stationary and it is actually one's own train that is moving. The "Lie derivative" is a mathematical device for analyzing phenomena like this. To appreciate the name "the fisherman's derivative" that Arnold gives the Lie derivative, one has to visualize the fisherman sitting on the bank and watching the river flow by or (better, because there are more concrete objects to view) sitting in a boat that is coasting with the flow and watching the shore. I prefer the term "passing-parade derivative." Visualize yourself standing by the side of the road as a parade passes by. The evolution you observe is a marching band turning into a float that turns into prancing horses that turn into another marching band, and so on. An alternative is to visualize yourself in the parade watching the buildings along the route metamorphose into one another. Since these parade images violate the requirement of smooth evolution, let us try another one.
Visualize yourself by the side of the route of a marathon race, 10 miles from the starting point, 1 hour after the start, well after the leaders have passed. As the runners straggle by, you say "the runners are aging rapidly" when, in fact, it is just that the older runners have taken longer getting there. If the 30-year-olds run at 11 miles per hour and the 40-year-olds run at 10 miles per hour, 1 hour into the race the 30-year-olds will be ahead by 1 mile, relative to your viewing station, and the 40-year-olds will be where you are. The aging rate you observe is (40 − 30)/(10/10 − 10/11) ≈ 100 years/hour. The same result could have been obtained via the spatial rate of change of age at fixed time, which is −10/1 = −10 years/mile. To get (the negative of) the observed aging rate from this you have to multiply by the 10 miles/hour velocity. The 100 years/hour aging rate you observe can be said to be the negative of the "Lie derivative" of the runner's age.

¹²Sections marked by an asterisk are noticeably more difficult and should perhaps be skipped or skimmed over on first reading. The material in this section will only be required when the properties of vector fields are used to derive the Poincaré equations. It is placed here because it is based on concepts like those required to analyze curvilinear coordinates. It would be possible to avoid the Lie derivative altogether in analyzing the algebra of noncommuting vector fields, in which case the concept would never need to enter mechanics. But vector fields and Lie derivatives are so similar, and so thoroughly woven together in the literature, that one is eventually forced to understand this material. There will also be striking similarities between the Lie derivative and the covariant derivative derived earlier in this chapter.
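The marathon arithmetic above is easily tabulated. A small Python sketch (my own restatement of the numbers in the text; the two estimates agree only roughly, 110 versus 100 years/hour, because the age field is not exactly linear over the spread of speeds):

```python
station = 10.0            # observation point, miles from the start
t_obs = 1.0               # observation time, hours

# Age is tied to running speed: 30-year-olds do 11 mph, 40-year-olds do 10 mph.
def passage_time(speed):
    # time at which a runner of this speed passes the station
    return station / speed

# Estimate 1: observed aging rate da/dt at the fixed station
rate_at_station = (40 - 30) / (passage_time(10.0) - passage_time(11.0))

# Estimate 2: (minus) the spatial age gradient at t_obs, times the local speed.
# At t = 1 h the 30-year-olds are at 11 miles, the 40-year-olds at 10 miles.
da_dx = (40 - 30) / (10.0 - 11.0)     # -10 years/mile
rate_from_gradient = -da_dx * 10.0    # times the 10 mph local speed
print(rate_at_station, rate_from_gradient)   # 110 and 100 years/hour
```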
From the point of view of physics, age and rate of aging are fundamentally, dimensionally different, but from the point of view of geometry, apart from the limiting process, they differ only by a scalar multiple dt and hence have the same geometric character. A similar relationship is that position vectors and instantaneous velocity vectors have the same geometric character. In the same way, the Lie derivative of any quantity has the same geometric (i.e., tensor) character as the quantity being differentiated. When one recalls the mental strain that accompanied first understanding the time-dilation phenomenon of special relativity mentioned above, one will anticipate serious conceptual abstraction and difficult ambiguity avoidance in defining the Lie derivative. Here it will be defined in two steps. First, starting from one (henceforth to be called preferred) coordinate system, one defines another "Lie-dragged" coordinate system (actually a family of other Lie-dragged coordinate systems). Then, mimicking procedures from Sections 3.1.3 and 3.1.4, components in the dragged frame will be "corrected" to account for frame rotation or distortion, relative to the preferred frame, to form the Lie differential. To elaborate further, consider a row of birds flying across the sky. Assume the velocities v of the birds depend¹³ on their position but not on their time of arrival at that location. In other words, we assume that v(x) depends only on x. As time goes on, the birds remain in a (possibly curved) transverse line. A multiple-exposure photograph showing a single line of birds at uniformly spaced time intervals (interval = Δt) is shown in Fig. 3.3.1. The velocity in the lower left corner has been chosen to be about 1/Δt so that the curved grid conforms with the rectangular grid in that
FIGURE 3.3.1. Locations of a single "line" of birds at successive times, t = −Δt, 0, Δt, 2Δt, .... The picture is a multiple-exposure photograph. Because the curves in this plot refer to different times, they are not the coordinate curves sought at this time; that system is illustrated in the next-but-one figure.

¹³This and all similar dependencies are assumed to be smooth.
region. If the birds were numbered sequentially, the numbers could serve as a kind of one-dimensional transverse coordinate system. Successive lines of birds would permit the definition of a longitudinal coordinate system based on the exposure number. The reader should not become too attached to this system, however, since the Lie-dragged coordinate system to be introduced shortly, though also based on bird locations, is different because, unlike this system, it relies only on bird locations at a fixed time. A picture that is closer to what we seek (but still is not it) is shown as Fig. 3.3.2. In this figure, successive flights of airplanes ..., F0, F1, F2, ... take off at regular intervals, with large ones taking off first and later flights being made up of progressively smaller ones. (Airplane size has been introduced only to clarify the picture; it has no effect on the analysis.) The flight paths are the same as in the previous figure. A viewer at a fixed location may regard the overhead airplanes as being "dragged past"; however, the smooth curves in Fig. 3.3.2 still do not constitute a Lie-dragged coordinate system. Finally, after two false starts, consider Fig. 3.3.3, which could be derived from either one of the previous two figures. Let us suppose that at t = 0 there is a bird or airplane at each square of the initial rectangular (x, y) grid, and that a single snapshot is taken at a later time t = Δt. (Though the time interval Δt is arbitrary in principle, it will be useful to think of it as "small" because, for the most important considerations to follow, Δt will approach this limit.) To construct this figure it is necessary to plot displacement-in-time-Δt vectors in Fig. 3.3.1 and interpolate from them to Fig. 3.3.3. The rows and lines of birds at t = Δt provide the sought-for "Lie-dragged" new coordinate system (X, Y).
FIGURE 3.3.2. Successive flights of airplanes ..., F0, F1, F2, ... fly overhead, larger ones in front; a single snapshot records all of the flights at a fixed time. To a viewer at a fixed location, the airplanes appear to (and do) become gradually smaller with time. Though the planes may appear to be being "dragged past," the smooth contours still do not define a Lie-dragged coordinate system.
FIGURE 3.3.3. Corresponding to Fig. 3.3.1 or Fig. 3.3.2, the (new) (X, Y) coordinate system derived by "Lie-dragging" the (old) (x, y) coordinate system for time Δt with velocity vector field v. The boldface arrows are (almost) parallel to the trajectories of the previous figure but are displaced a bit transversely. To restrict clutter, only enough interpolated arrows are shown to illustrate where curves of constant X (shown dashed) and Y (shown dotted) come from. The coordinates of the labeled points, in both systems, are:

    point    x     y     X     Y
    a        3     3     1.9   2.9
    b        4     3     2.9   2.9
    c        4     4     2.7   3.8
    d        3     4     1.8   3.8
    A        4.2   3.2   3     3
    B        5.6   3.3   4     3
    C        6.0   4.8   4     4
    D        4.3   4.3   3     4
Since any point in the plane can be located by "old" coordinates (x, y) or "new" coordinates (X, Y), there necessarily are well-defined functional relations x = x(X, Y), y = y(X, Y) as well as inverse relations X = X(x, y), Y = Y(x, y). Of course this presupposes that all relevant functions are sufficiently smooth and invertible, and that interpolation between "bird locations" is arbitrarily accurate. The new (X, Y) system is said to be "Lie-dragged" from the old system. Four points a, b, c, and d, defining a unit rectangle in the old coordinates, and the corresponding, at-time-Δt, unit "rectangle" with corners A, B, C, and D, are emphasized in Fig. 3.3.3 and broken out on the right, where their coordinates are also shown. Note that none of these points lies on the bird paths in Fig. 3.3.1, but their velocities are parallel (after interpolation) to the bird velocities shown in that figure. In the table, the coordinates of all these points are given in both the old and the new coordinate systems. Because of the way they have been defined, the new coordinates (X_A, Y_A) of point A are the same as the old coordinates (x_a, y_a) of point a, and the same is true for every similarly corresponding pair of points. We now seek explicit transformation relations between old and new coordinates, though only as Taylor series in powers of Δt. Regarding point a as typical, one has
    x_A = x_a + v^x(a) Δt + ⋯,
    y_A = y_a + v^y(a) Δt + ⋯,   (3.3.1)
where v^x and v^y are known functions of position; they are the "old" components of v, an arbitrary vector field. (It is not necessary to make the arguments of v^x and v^y more explicit since, to the relevant order in Δt, it is not necessary to distinguish among x_a, x_A, X_a, and X_A.) Eq. (3.3.1) can be regarded as the description in the old system of reference of an active transformation of point a → A. However, since one has defined the numerical equalities X_A = x_a and Y_A = y_a, Eq. (3.3.1) can be rewritten as

    x_A = X_A + v^x Δt + ⋯,
    y_A = Y_A + v^y Δt + ⋯.   (3.3.2)
These equations can be regarded as a passive (X, Y) → (x, y) coordinate transformation. They can be checked mentally using the entries in the table in Fig. 3.3.3, making allowance for the nonlinearity (exaggerated for pictorial clarity) of that figure. Since the subscripts A in this equation now refer to the same point, they have become superfluous and can be dropped:

    x = X + v^x Δt + ⋯,
    y = Y + v^y Δt + ⋯.   (3.3.3)

The inverse relations are

    X = x − v^x Δt + ⋯,
    Y = y − v^y Δt + ⋯.   (3.3.4)
Since these equations describe transformation between curvilinear systems, they can be treated by methods described earlier in the chapter. But, as mentioned before, we are not now assuming the existence of any metric. Hence, though the old coordinate system in Fig. 3.3.3 is drawn as rectangular, it would not be meaningful to say, for example, that the line bc is parallel to the line ad, even though that appears to be the case in the figure. (The pair (x, y) might be polar coordinates (ρ, φ), for example.)

3.3.2. Lie Derivatives of Scalars and Vectors

Having Lie-dragged the coordinate system, we next define the Lie-dragging of a general scalar function f(x), or of a vector function w(x), or for that matter of tensors of higher rank. For vectors (and tensors of higher rank), this calculation is complicated by the fact that the Jacobian matrix relating coordinate systems depends on position. We defer addressing that problem by starting with scalar functions, which, having only one component, transform without reference to the Jacobian matrix. We define a "Lie-dragged function" f* (whose domain of definition is more or less the same as that of f) by asserting the relation
    f*(a) = f(A)   (3.3.5)
applies to a typical point a and its Lie-dragged image point A. This function describes a new physical quantity whose value at a is the same as that of the old quantity f at A. (If A is thought of as having been "dragged forward" from a, then f* might better be called a "dragged-back" function. This ambiguity is one likely source of sign error.) It could happen that f* and f have the same value at a point such as a but, because of f's dependence on position, this will not ordinarily be the case. Though it is not shown explicitly in Eq. (3.3.5), the definition of f* depends on Δt; this dependence has been incorporated by introducing a new function f* rather than by giving f another argument. For small Δt, f(A) can be approximated by the leading terms of a Taylor series:

    f(A) = f(a) + (∂f/∂x) v^x Δt + (∂f/∂y) v^y Δt + ⋯.   (3.3.6)
The "Lie derivative" of the function f, relative to the vector field v, evaluated at a, is defined, and then evaluated, by

    L_v f ≡ lim_{Δt→0} [f*(a) − f(a)] / Δt = (∂f/∂x) v^x + (∂f/∂y) v^y.   (3.3.7)
Problem 3.3.1: In the text at the beginning of this section, an observation of runners passing a stationary spectator was described. For the example numerical values given there, assign numerical values to all quantities appearing in Eq. (3.3.7) and confirm the equality approximately. Also derive a function a(x, t) that gives the age of runners at position x at time t. From it evaluate L_v a at the observation point.
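Equation (3.3.7) can be checked numerically for any smooth f and v: drag forward for a small Δt, difference the dragged function against the original, and compare with the directional derivative. The function and field below are arbitrary choices of mine, not taken from the text:

```python
import math

def f(x, y):
    # an arbitrary smooth scalar function
    return math.exp(x) * math.cos(y)

def v(x, y):
    # an arbitrary smooth vector field
    return (y, x * x)

a = (0.5, 1.2)
dt = 1e-6

# f*(a) = f(A), where A is a dragged forward along v for time dt
vx, vy = v(*a)
A = (a[0] + vx * dt, a[1] + vy * dt)
lie_numeric = (f(*A) - f(*a)) / dt

# right-hand side of Eq. (3.3.7), with the partials done by central differences
h = 1e-6
df_dx = (f(a[0] + h, a[1]) - f(a[0] - h, a[1])) / (2 * h)
df_dy = (f(a[0], a[1] + h) - f(a[0], a[1] - h)) / (2 * h)
lie_formula = vx * df_dx + vy * df_dy
print(lie_numeric, lie_formula)   # the two agree to first order in dt
```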
Before evaluating the Lie derivative of a vector field w(x), we must assign meaning to the Lie-dragging of a vector. To do this, first consider Fig. 3.3.4, which is a blown-up version of the circular insert in Fig. 3.3.3. Consider in particular the arrow ac, and suppose it to be w(a), the value of the vector field w evaluated at a. (It happens to connect two intersections of the original grid, but that is just to simplify the picture.) Further suppose that w(A) is the arrow AC'. The vectors ac and AC', because they are defined at different points in the manifold M, cannot be directly compared (or, more to the point, subtracted). But AC' is the result of Lie-dragging some vector ac* along the vector v for the time Δt being considered. In the small-Δt limit, the arrow Δ is the Lie differential of w with respect to v, and the Lie derivative of w with respect to v at the point a is defined by

    L_v w = lim_{Δt→0} (ac* − ac)/Δt = lim_{Δt→0} Δ/Δt.   (3.3.8)

The construction exhibited in Fig. 3.3.4 is similar to that in Fig. 3.1.2. In that figure, the vector V was slid from the point M to M' without changing its direction. This was possible because "parallelism" was meaningful in the metric geometry valid for that figure. The purpose in sliding V was to evaluate V' − V and from that the derivative of V. The corresponding construction in Fig. 3.3.4 is to obtain vector ac*
FIGURE 3.3.4. The circular inset of Fig. 3.3.3 is blown up to illustrate the Lie-dragging and Lie derivative of a true vector field w(x) whose value at a is the arrow ac and whose value at A is the arrow AC'. The components of the labeled points and arrows are:

          x     y
    a     0.0   0.0
    c     1.0   1.0
    c*    0.8   1.2
    Δ    -0.2   0.2
from vector AC' in order to evaluate Δ = ac* − ac and from that a derivative. The vector ac* is said to be "pseudo-parallel" to AC'. By the same token, "unit vectors" ab and AB can be said to be pseudo-parallel, as can ad and AD. Here the "pseudo" means that, their components being proportional, they would be truly parallel except for the fact that the basis vectors do not define parallelism. Because w is a true vector, this construction ensures that L_v w is also a true vector; the fact that we are able to draw the Lie differential Δ as an unambiguous arrow shows this. This implies that to transform to coordinates other than (x, y), the transformation matrices for L_v w and w must be the same. This requirement will now be applied to obtain a formula in component form for L_v w. It will be much like the formulas for the absolute differential in Section 3.1.4.

Before obtaining the formula, it will be useful to introduce a more expressive notation than (X, Y) for the new, Lie-dragged, coordinates, namely (x̃_Δt, ỹ_Δt). In addition to making explicit the previously implicit dependence on Δt, this makes manifest the smoothness requirement for the limit Δt → 0. The tilde indicates "new." (Often the notation * is used to indicate "new," but * is already in use here.) After this replacement, Eq. (3.3.4) becomes

    x̃_Δt = x − v^x Δt + ⋯,
    ỹ_Δt = y − v^y Δt + ⋯.   (3.3.9)
For deviations from x̃_Δt, ỹ_Δt these equations yield

    ( dx̃_Δt )   ( 1 − (∂v^x/∂x)Δt      −(∂v^x/∂y)Δt   ) ( dx )
    ( dỹ_Δt ) = ( −(∂v^y/∂x)Δt     1 − (∂v^y/∂y)Δt    ) ( dy ).   (3.3.10)

Example 3.3.1: For the situation illustrated in Fig. 3.3.3, the numerical values are, roughly,

    ( ∂v^x/∂x   ∂v^x/∂y )   ( 0.4   0.1 )
    ( ∂v^y/∂x   ∂v^y/∂y ) ≈ ( 0.1   0.2 ).   (3.3.11)

Contravariant components of a vector must transform with the same matrix as in Eq. (3.3.10). Applying this to the contravariant components of the vectors shown in Fig. 3.3.4 yields

    ( ÃC'^1 )   ( 1 − (∂v^x/∂x)Δt      −(∂v^x/∂y)Δt   ) ( AC'^1 )
    ( ÃC'^2 ) = ( −(∂v^y/∂x)Δt     1 − (∂v^y/∂y)Δt    ) ( AC'^2 );   (3.3.12)

the notation on the left (with tildes) indicates that the components are being reckoned in the "new" system. (In hopes of improved clarity, some redundancy has entered here. The superscripts 1, 2 are interchangeable with x, y, and the elements AC'^1, AC'^2 are identical to w^1(A), w^2(A).) But the prescription for Lie-dragging is that the elements on the left-hand side are numerically equal to the components of ac* in the old system, which are (ac*^1, ac*^2). (This is illustrated in Fig. 3.3.4, where the location of point c* is proportionally the same within the square above dc as the point C' within the parallelogram above DC.) Remembering that the arrows came originally from the vector w whose Lie derivative is being evaluated, in a step analogous to Eq. (3.3.6), the vector appearing on the right-hand side of Eq. (3.3.12) can be obtained as

    w^i(A) = w^i(a) + (∂w^i/∂x) v^x Δt + (∂w^i/∂y) v^y Δt + ⋯.   (3.3.13)

After substituting this in Eq. (3.3.12) and completing the multiplication, AC'^1 and AC'^2 can, to adequate accuracy, be replaced by w^1 and w^2 in the terms proportional to Δt. Combining formulas, we obtain

    (L_v w)^i = (∂w^i/∂x) v^x + (∂w^i/∂y) v^y − (∂v^i/∂x) w^x − (∂v^i/∂y) w^y,   i = 1, 2.   (3.3.14)

As required, L_v w is a tensor of the same order as w, and its contravariant components are displayed in this equation. For the sake of concreteness, this result has been derived in the 2-D case, but extending the result to higher dimensions is straightforward (as will be sketched shortly). One observes that the terms with positive sign come from changes of the individual components (treated as scalar functions of position) and the terms with negative sign come from frame distortion and (using the word loosely) rotation. It is also possible to define the Lie derivative of covariant vectors and of tensors of higher dimensionality. Toward this end, in order to take advantage of formulas derived previously, we recast result (3.3.14) in terms reminiscent of the "absolute derivative" derived in Section 3.1.4. Linear transformation equations like Eq. (3.3.10) were introduced and analyzed in Section 2.4.2. To make Eq. (3.3.10) conform with the notation of Eq. (2.4.8), we define the transformation matrix A (truncating higher-order terms for simplicity):

    A^i_j :  ( 1 − (∂v^x/∂x)Δt      −(∂v^x/∂y)Δt   )      (A⁻¹)^i_j :  ( 1 + (∂v^x/∂x)Δt      (∂v^x/∂y)Δt   )
             ( −(∂v^y/∂x)Δt     1 − (∂v^y/∂y)Δt    ),                  ( (∂v^y/∂x)Δt     1 + (∂v^y/∂y)Δt   ).   (3.3.15)

By Eq. (2.4.5), these same matrix elements relate unit vectors along the axes according to
    ẽ_{Δt,1} = e_i (A⁻¹)^i_1,    ẽ_{Δt,2} = e_i (A⁻¹)^i_2.   (3.3.16)

These equations can be related in turn to Eqs. (3.1.9), in which the connecting quantities ω_i^j were introduced:
    ẽ_{Δt,1} = e₁ + ω₁^1 e₁ + ω₁^2 e₂,
    ẽ_{Δt,2} = e₂ + ω₂^1 e₁ + ω₂^2 e₂.   (3.3.17)
Here, unlike in Section 3.1.3 and as has been stated repeatedly, no metric is being assumed. Still, even though coordinate systems at different points have been connected using the vector field v rather than a metric, we may as well use the same symbols ω_i^j for the connecting coefficients now:
    ω_j^i = (∂v^i/∂x^j) Δt.   (3.3.18)

According to Eq. (3.1.26), the components of the absolute differential Dw^i of a vector field w subject to "connection" ω_j^i are given by
    Dw^i = dw^i + w^j ω_j^i,   (3.3.19)
where the second term "adds" the contribution from frame rotation to the "observed" change dw^i. In our present context, wishing to evaluate ac* − ac (because dragging
ac forward is as good as dragging AC' back), we have to "subtract" the contribution from frame rotation. Hence we obtain

    (L_v w)^i = lim_{Δt→0} (dw^i − w^j ω_j^i)/Δt = (∂w^i/∂x^j) v^j − (∂v^i/∂x^j) w^j,

in agreement with Eq. (3.3.14). As mentioned before, this same line of reasoning makes it possible to evaluate the Lie derivative of arbitrary tensors, contravariant, covariant, or mixed, by simple alteration of the formulas for absolute derivatives derived in Section 3.1.3.
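The component formula can be checked against the dragging construction itself: push a to A = a + vΔt, express w(A) in the dragged frame by applying a matrix of the form of Eq. (3.3.10) (to first order, the identity minus Δt times the Jacobian of v), subtract w(a), and divide by Δt. A numeric sketch with fields of my own choosing (not from the text):

```python
import math

def v(x, y):
    # dragging field (arbitrary choice)
    return (math.sin(y), x)

def w(x, y):
    # field being differentiated (arbitrary choice)
    return (x * y, math.cos(x))

def jac(F, x, y, h=1e-6):
    # Jacobian dF^i/dx^j by central differences
    return [[(F(x + h, y)[i] - F(x - h, y)[i]) / (2 * h),
             (F(x, y + h)[i] - F(x, y - h)[i]) / (2 * h)] for i in range(2)]

a = (0.7, 0.3)
dt = 1e-6
vx, vy = v(*a)
A = (a[0] + vx * dt, a[1] + vy * dt)

# express w(A) in the dragged frame: to first order the matrix is I - dt * (dv/dx)
Jv = jac(v, *a)
wA = w(*A)
dragged = [wA[i] - dt * sum(Jv[i][j] * wA[j] for j in range(2)) for i in range(2)]
lie_construction = [(dragged[i] - w(*a)[i]) / dt for i in range(2)]

# component formula: v^j dw^i/dx^j - w^j dv^i/dx^j
Jw = jac(w, *a)
vv, ww = (vx, vy), w(*a)
lie_formula = [sum(Jw[i][j] * vv[j] - Jv[i][j] * ww[j] for j in range(2))
               for i in range(2)]
print(lie_construction, lie_formula)   # agree to first order in dt
```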
Problem 3.3.2: Evaluate L_v v, the Lie derivative of a vector with respect to itself, in two ways: once using the formula derived in the text, and a second time, in a more intuitive process, using the construction of a vector diagram like the one in Fig. 3.3.4.

*3.4. LIE DERIVATIVE-LIE ALGEBRAIC APPROACH¹⁴
3.4.1. Exponential Representation of Parameterized Curves

A family of noninteracting, space-filling curves such as those encountered in the previous section is known as a congruence. At each point on every curve of the congruence there is a unique tangent vector, call it v. The curves are then known as "flowlines" of v. Two of them are illustrated in Fig. 3.4.1. The lower curve passes through point A with parameter value λ_{v,0}, and other points on this curve are given by coordinates x^i(A, λ_{v,0}, λ_v). If another vector field, say u, is to be discussed, it is necessary to introduce another parameter, such as λ_u or μ, but for now, since only one vector field is under discussion, we can suppress the v subscript from λ_v. The coordinates of the point with parameter value λ, on the particular curve of the congruence that passes through point A (parameter value λ₀), can be expressed compactly using a Taylor series expansion in ε = λ − λ₀ and the exponential function:

    x^i(λ) = x^i(λ₀) + ε (dx^i/dλ)|_{λ₀} + (1/2!) ε² (d²x^i/dλ²)|_{λ₀} + ⋯
           = ( e^{ε d/dλ} x^i )|_{λ₀}.   (3.4.1)

FIGURE 3.4.1. Two out of a congruence of curves belonging to vector field v, both parameterized by λ_v.

¹⁴Sections marked by an asterisk are noticeably more difficult and should perhaps be skipped or skimmed over on first reading. The material in this section repeats the derivation of the Lie derivative of the preceding section, but using intrinsic, coordinate-free methods.
The "exponential operator" appearing in the last line can be regarded simply as an abbreviation for the expansion in the previous line. In the sequel, standard formulas satisfied by the exponential function will be applied to it. Such manipulations will be regarded as formal, subject to later verification, but they could be verified on the spot.
3.4.2. Identification of Vector Fields with Differential Operators

At this point we take what might be said to be the most important step on the route from classical formalism to modern formalism. It is to assert the curious identity

    d/dλ_v = v,   (3.4.2)

where v is one of the vector fields discussed in the previous section and λ_v is the parameter of the v-congruence of curves. Like v, the arrow corresponding to d/dλ_v depends on where it is located. By its definition (3.4.2), both it and v are tangent to the curve passing through that location, and it is assumed that λ_v is adjusted so their ratio is equal to 1 everywhere. In short, v and d/dλ_v are, by definition, two symbols for the same quantity. For any usefulness to accrue to this definition, it is necessary to ascribe more properties to d/dλ_v. First of all, in a linearized approximation, an increase by 1 unit of the parameter λ_v corresponds to the same advance along the curve as does v. This is like the relation of ordinary vector analysis in which, if arc length s along a curve is taken as the curve's parameter λ_v and x is a radius vector to a point on the curve, then dx/dλ_v is a unit-length tangent vector. If time t is taken as the curve's parameter λ_v, then dx/dλ_v is the instantaneous velocity. (It is just a coincidence that, in this case, the symbol v is appropriate for "velocity.") These formulas can be interpreted as the result of "operating" on the coordinates of x (which are functions of position and hence of λ) with the operator d/dλ_v. More generally, if f is any smooth function of
122
GEOMETRY OF MECHANICS 11: CURVILINEAR
position, then’5
d dAV
- f is the (linearized) change in f for Av +. Av
+ 1.
(3.4.3)
With the new notation, to recover velocity components from a trajectory parameterized as x^i(t), one applies the operator d/dt to x^i(t). Further justification for the derivative notation will be supplied in the next section.
3.4.3. Loop Defect

Consider next the possibility that two vector fields, the previous one v and another one u, are defined on the space under study. Since the quantity g = (d/dλ_v) f just introduced is a smooth function of position, it is necessarily possible to evaluate

\[
\frac{d}{d\lambda_u}\Big(\frac{d}{d\lambda_v} f\Big) = \Big(\frac{d}{d\lambda_u}\Big)\Big(\frac{d}{d\lambda_v}\Big) f.
\tag{3.4.4}
\]
Then, for consistency with Eq. (3.4.2), the quantity (d/dλ_u)(d/dλ_v), the "composition" of two operators, has to be regarded as being associated with a vector that is some new kind of "product" of the two vectors u and v. With multiplication being the primary operation that is traditionally referred to as "algebraic," one can say then that there is a new algebra of vector fields based on this product. It is not yet the Lie algebra of vector fields, however; the product in that algebra is the "commutator." An attempt to understand the new "product" of two vectors is illustrated in Fig. 3.4.2, which brings us to the first complication. Though, according to Eq. (3.4.4), the "multiplication" of two vectors is necessarily defined, according to the figure the result of the multiplication can depend on the order of the factors. To quantify this we will use Eq. (3.4.2) to calculate (approximately, for small ε) the difference of the coordinates of the two points B^{(uv)} and B^{(vu)} shown in Fig. 3.4.2:
\[
x^i_{(uv)} - x^i_{(vu)} = \Big(e^{\epsilon\,d/d\lambda_u}\, e^{\epsilon\,d/d\lambda_v} - e^{\epsilon\,d/d\lambda_v}\, e^{\epsilon\,d/d\lambda_u}\Big)\, x^i\Big|_0.
\tag{3.4.5}
\]

To abbreviate formulas like this we introduce square brackets to define the "commutator" of two vectors d/dμ and d/dλ by

\[
\Big[\frac{d}{d\mu}, \frac{d}{d\lambda}\Big] \equiv \frac{d}{d\mu}\frac{d}{d\lambda} - \frac{d}{d\lambda}\frac{d}{d\mu}.
\tag{3.4.6}
\]

With this notation, dropping terms cubic and higher in ε, Eq. (3.4.5) becomes
¹⁵The result (3.4.3) is also what would result from the replacement v → v·∇ in ordinary vector analysis.
FIGURE 3.4.2. Two routes to (potentially) the same destination. One route starts out from 0 along the v-congruence curve through that point; advancing the parameter by ε yields point A^{(v)}. The route continues from there to point B^{(vu)} along the u-congruence curve as its parameter advances by the same amount ε. For the other route, the congruences are traversed in reversed order. The deviations between tangent vectors and smooth curves are vastly exaggerated, especially since proceeding to the small ε limit is anticipated.
\[
x^i_{(uv)} - x^i_{(vu)} = \Big[e^{\epsilon\,d/d\lambda_u},\, e^{\epsilon\,d/d\lambda_v}\Big]\, x^i\Big|_0
= \bigg[\Big(1 + \epsilon\frac{d}{d\lambda_u} + \frac{1}{2}\epsilon^2\frac{d^2}{d\lambda_u^2}\Big)\Big(1 + \epsilon\frac{d}{d\lambda_v} + \frac{1}{2}\epsilon^2\frac{d^2}{d\lambda_v^2}\Big)
- \Big(1 + \epsilon\frac{d}{d\lambda_v} + \frac{1}{2}\epsilon^2\frac{d^2}{d\lambda_v^2}\Big)\Big(1 + \epsilon\frac{d}{d\lambda_u} + \frac{1}{2}\epsilon^2\frac{d^2}{d\lambda_u^2}\Big)\bigg]\, x^i\Big|_0
= \epsilon^2\Big[\frac{d}{d\lambda_u}, \frac{d}{d\lambda_v}\Big]\, x^i\Big|_0.
\tag{3.4.7}
\]
This shows that the commutator, a new vector field, when applied to the position coordinates x^i, provides (to leading, quadratic, order) the coordinate deviation between the two destinations. This justifies representing the closing vector by ε²[d/dλ_v, d/dλ_u], as shown in the figure. This has provided us with a geometric interpretation for the commutator of two vector fields.
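A numerical illustration may help; the example fields below are our own choice, not from the text. Integrating along the two routes of Fig. 3.4.2 and differencing the destinations reproduces ε² times the commutator components:

```python
# Sketch (assumed example fields): the loop defect of two flows equals
# eps^2 * [u, v] to leading order.
import numpy as np

def u(p):                        # example field u = (y, 0)
    return np.array([p[1], 0.0])

def v(p):                        # example field v = (0, x^2)
    return np.array([0.0, p[0]**2])

def flow(f, p, eps, steps=100):  # integrate dp/dlam = f(p) with RK4
    h = eps / steps
    for _ in range(steps):
        k1 = f(p); k2 = f(p + h/2*k1); k3 = f(p + h/2*k2); k4 = f(p + h*k3)
        p = p + h/6*(k1 + 2*k2 + 2*k3 + k4)
    return p

p0, eps = np.array([1.0, 1.0]), 1e-2
B_uv = flow(v, flow(u, p0, eps), eps)    # u first, then v
B_vu = flow(u, flow(v, p0, eps), eps)    # v first, then u
gap = B_uv - B_vu                        # approx eps^2 * [u, v] at p0

# [u,v]^i = u^j dv^i/dx^j - v^j du^i/dx^j = (-x^2, 2xy), i.e. (-1, 2) at (1,1)
assert np.allclose(gap, eps**2 * np.array([-1.0, 2.0]), atol=1e-5)
```

The residual difference is of order ε³, consistent with the "dropping terms cubic and higher" step above.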
3.4.4. Coordinate Congruences

The congruences corresponding to general vector fields u and v just analyzed have much in common with the coordinate curves of ordinary coordinate systems, such as the curves on which x¹ varies while x² (and, in higher dimensions, all other coordinates) remain constant. We anticipated this connection in Section 2.4.5, Eq. (2.4.34), where notation
\[
\mathbf{e}_i = \frac{\partial}{\partial x^i}
\tag{3.4.8}
\]
for unit vectors along coordinate axes was introduced formally. The main difference between the present notation and that of Eq. (3.4.2) is that partial derivative symbols are used here and total derivative symbols there. This distinction is intentional, for reasons we now investigate. One thing important to recognize is that the quantity x¹ plays different roles, usually distinguishable by context. The set (x¹, x², ..., xⁿ) serves as coordinates of a manifold M. Any one coordinate, such as x¹, is a one-component function of position in M; this is the role played by x^i in Eq. (3.4.2). But x¹ can equally well be regarded as the parameter establishing location on the curve resulting from variation of the first coordinate as the remaining coordinates (x², ..., xⁿ) are held fixed; this is one of the curves of one of the coordinate congruences. This is the sense of x¹ as it appears in Eq. (3.4.8). In this context, x¹ could just as well be symbolized by λ_{e₁}, where e₁ is the vector field yielding this congruence. The vector d/dλ_v can presumably be expanded in terms of the basis unit vectors
\[
\frac{d}{d\lambda_v} = v^i \frac{\partial}{\partial x^i},
\tag{3.4.9}
\]

where the v^i are the ordinary components of v. They are, themselves, also functions of position. When this expansion is used to evaluate the commutator defined in Eq. (3.4.6), the result is
\[
\Big[\frac{d}{d\lambda_u}, \frac{d}{d\lambda_v}\Big] = \Big(u^j \frac{\partial v^i}{\partial x^j} - v^j \frac{\partial u^i}{\partial x^j}\Big) \frac{\partial}{\partial x^i},
\tag{3.4.10}
\]

where the fact that the order of partial differentiation makes no difference has been used. In this form, it can be seen that the failure of u and v to commute is due to the possibility that their components are nonconstant functions of position; otherwise the partial derivatives on the right-hand side of Eq. (3.4.10) would vanish. When this observation is applied to the coordinate basis vectors themselves it can be seen that
\[
[\mathbf{e}_i, \mathbf{e}_j] = \Big[\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\Big] = 0,
\tag{3.4.11}
\]

since the expansion coefficients of basis vectors are all constant, either zero or one. In other words, coordinate basis vectors belonging to the same basis commute. Up to this point one might have been harboring the impression that a "grid" made up of curves belonging to the u- and v-congruences was essentially equivalent to a coordinate grid made of, say, curves of constant x¹ and x². It is now clear that this is not the case, since the latter set "commutes" while the former may not. (Actually it is a special property of two dimensions that the curves of two congruences necessarily
form a grid at all. In higher dimensionality the points B^{(uv)} and B^{(vu)} in Fig. 3.4.2 can be displaced out of the plane of the paper and the curves can pass without intersecting.) We have shown, then, that for u and v to serve as basis vectors it is necessary that they commute. It will be shown below that this condition is also sufficient.
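The component formula (3.4.10) can be spot-checked symbolically; the example fields here are our own choice, and the check lets the operator composition act on a test function:

```python
# Sketch (assumed example fields): [u, v]^i = u^j dv^i/dx^j - v^j du^i/dx^j,
# verified by acting with the composed operators on an arbitrary function f.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = [x1, x2]
u = [x2, sp.Integer(0)]          # u = x2 d/dx1
v = [sp.Integer(0), x1**2]       # v = x1^2 d/dx2

def apply(field, g):             # directional derivative: field^j dg/dx^j
    return sum(field[j] * sp.diff(g, X[j]) for j in range(2))

f = sp.Function('f')(x1, x2)
commutator_f = sp.expand(apply(u, apply(v, f)) - apply(v, apply(u, f)))

# Component formula (3.4.10), obtained by acting on the coordinate functions
comm = [sp.expand(apply(u, v[i]) - apply(v, u[i])) for i in range(2)]
formula_f = sp.expand(sum(comm[i] * sp.diff(f, X[i]) for i in range(2)))

assert sp.simplify(commutator_f - formula_f) == 0
```

The second-derivative terms cancel in the subtraction, which is exactly why the commutator of two first-order operators is again first order, i.e., a vector field.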
Example 3.4.1: Expressing a Vector in Other Coordinates. Consider the vector v = −y ∂/∂x + x ∂/∂y, with x and y being rectangular coordinates. How can this vector be expressed in terms of the unit vectors ∂/∂r and ∂/∂φ, where polar coordinates r and φ are defined by r(x, y) = √(x² + y²) and φ(x, y) = tan⁻¹(y/x)? Evaluating v^r and v^φ we find

\[
v^r = 0, \quad \text{and} \quad v^\phi = 1,
\]

and from this we find

\[
\mathbf{v} = (v^r)\frac{\partial}{\partial r} + (v^\phi)\frac{\partial}{\partial \phi} = \frac{\partial}{\partial \phi}.
\]
This example makes the act of changing coordinates seem simpler than is the case in general. The simplifying feature here is that both v^r and v^φ are independent of x and y. In general, to express the coefficients of v in terms of the new coordinates requires substitution for the old variables in the new coefficients. It is still a useful and straightforward exercise to generalize this procedure to arbitrary coordinate transformations, leaving this substitution implicit.
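The example can be verified mechanically by applying v = −y ∂/∂x + x ∂/∂y to the functions r(x, y) and φ(x, y); this is a check of our own, using SymPy:

```python
# Check of Example 3.4.1: v^r = v(r) and v^phi = v(phi), by the chain rule.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
r = sp.sqrt(x**2 + y**2)
phi = sp.atan2(y, x)

def v(f):                        # v = -y d/dx + x d/dy acting on f
    return -y*sp.diff(f, x) + x*sp.diff(f, y)

v_r = sp.simplify(v(r))
v_phi = sp.simplify(v(phi))
assert v_r == 0 and v_phi == 1
```

That v^r and v^φ come out as constants is the special feature noted above; for a generic field the results would still contain x and y and would need to be re-expressed in terms of r and φ.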
3.4.5. Commutators of Quasi-Basis-Vectors

A circumstance common in mechanics is that one wishes to use an independent set of vector fields u₁, u₂, ..., u_r as a "local" basis even though they do not commute. In this case they are called "quasi-basis-vectors." Let their expansions in terms of a true basis set be

\[
\mathbf{u}_\rho = A^i_{\ \rho}(\mathbf{q})\,\frac{\partial}{\partial q^i}, \qquad \rho = 1, 2, \ldots, r,
\tag{3.4.12}
\]

where the coefficients are functions of position and, as is customary in mechanics, the coordinates of x are denoted q^i. Sometimes the number r of these vectors is less than the dimensionality n of the space, but for now we assume r = n and that Eqs. (3.4.12) can be inverted; the inverse relations are

\[
\frac{\partial}{\partial q^i} = B^\rho_{\ i}(\mathbf{q})\,\mathbf{u}_\rho.
\tag{3.4.13}
\]
Using Eq. (3.4.10), the commutator of two such quasi-basis-vectors is given by
\[
[\mathbf{u}_\rho, \mathbf{u}_\sigma] = \Big(A^j_{\ \rho}\,\frac{\partial A^i_{\ \sigma}}{\partial q^j} - A^j_{\ \sigma}\,\frac{\partial A^i_{\ \rho}}{\partial q^j}\Big)\frac{\partial}{\partial q^i}.
\tag{3.4.14}
\]

This can be abbreviated as

\[
[\mathbf{u}_\rho, \mathbf{u}_\sigma] = c^\tau_{\ \rho\sigma}\,\mathbf{u}_\tau, \qquad
c^\tau_{\ \rho\sigma} = \Big(A^j_{\ \rho}\,\frac{\partial A^i_{\ \sigma}}{\partial q^j} - A^j_{\ \sigma}\,\frac{\partial A^i_{\ \rho}}{\partial q^j}\Big) B^\tau_{\ i}.
\tag{3.4.15}
\]

This formula will result in a remarkable simplification in Section 5.3.3.
3.4.6. Lie-Dragged Congruences and the Lie Derivative

A gratifying inference can be drawn by combining Eqs. (3.3.20), (3.4.9), and (3.4.10):

\[
\mathcal{L}_\mathbf{v}\mathbf{w} = \Big(v^j\,\frac{\partial w^i}{\partial x^j} - w^j\,\frac{\partial v^i}{\partial x^j}\Big)\frac{\partial}{\partial x^i} = [\mathbf{v}, \mathbf{w}],
\tag{3.4.16}
\]

which can be written succinctly as

\[
\mathcal{L}_\mathbf{v} = [\mathbf{v}, \cdot\,],
\tag{3.4.17}
\]
where the dot appearing as the second argument is a placeholder for an arbitrary vector field, such as w(x). In words, Lie differentiation with respect to v and commutation with respect to v are identical operations. Arnold also calls [u, v] the "Poisson bracket" of the vectors u and v, but in this text the term Poisson bracket is applied only to scalar functions. This is such an important result that it is worth rederiving it in the modern language of vector fields, not because the derivation of Eq. (3.4.17) has been deficient in any way, but to exercise the methods of reasoning. We return, then, to Lie-dragged coordinate systems, first encountered in Section 3.3.1, now discussed using the new vector field notation. Consider Fig. 3.4.3; it shows the same two curves of the v-congruence as are shown in Fig. 3.4.1. (These curves happen to lie in the (x¹, x²)-plane, but there is the possibility, not shown, of other, out-of-plane coordinates.) Temporarily supposing that some other vector field u is also defined, suppose that one curve of the u-congruence passes through points A and B. From A and B, advancing the v-parameter by ε results in motions to points P and Q′, and advancing other points on the curve through A and B results in the dashed curve PQ′. Heavy curve PQ is
FIGURE 3.4.3. The (heavy) curve through points A and B is Lie-dragged by amount Δλ_v = ε along the v-congruence, preserving its parameter values, to yield the dashed curve. If both heavy curves belong to the u-congruence, and if the curves PQ and PQ′ coincide and their parameters match, the u-congruence is said to be "Lie-dragged along v."
the member of the u-congruence passing through P. As drawn, the point Q lies on the curve BQ′, but in more than two dimensions the curves PQ and BQ′ might miss completely. In any case the points Q and Q′ do not necessarily coincide. On the other hand, if points Q and Q′ coincide and the λ_u parameter values at P and Q match those at A and B, the u-congruence is said to be "Lie-dragged along v." As an alternative, let us drop the assumption that a vector field u has been predefined, and proceed to define u, retaining only the curve AB to get started. (In higher dimensions it would be a hypersurface.) We assume that λ_v(A) = λ_v(B) (this can be achieved easily by "sliding" parameter values by the addition of a constant), and we assume the same is done for all points on the curve AB. Performing the dragging operation shown in Fig. 3.4.3 for a continuous range of parameter values ε yields a "Lie-dragged" u-congruence. In this dragging, points Q and Q′ coincide by definition, and the parameter values along curve AB are dragged (unchanged) to the curve PQ. By construction then, the parameters λ_v and λ_u can serve as coordinates over the region shown. (In higher dimensionality, if AB is a hypersurface with coordinates λ_{u₁}, λ_{u₂}, ..., then a similar dragging operation yields hypersurface PQ, and λ_v, λ_{u₁}, λ_{u₂}, ... form a satisfactory set of coordinates.) The basis vectors of this newly defined coordinate system are d/dλ_v and d/dλ_u, because these vectors point along the coordinate curves and (in linearized approximation) match unit advance of their parameter values. Furthermore, because λ_u is constant on a curve of the v-congruence, the replacement d/dλ_v → ∂/∂λ_v is valid. Similarly, since λ_v is constant on a curve of the u-congruence, the replacement d/dλ_u → ∂/∂λ_u is also valid. Applying Eq. (3.4.11), we conclude that the u-congruence generated by Lie-dragging along v satisfies
[u, v] = 0.
(3.4.18)
FIGURE 3.4.4. Construction illustrating the vector field derivation of the Lie derivative.
Finally, we revisit the Lie derivative concept, defining and then evaluating \mathcal{L}_\mathbf{v}\mathbf{w}, the Lie derivative of w relative to v. For vector w, we repeat both lines of reasoning applied to the vector u earlier in this section. On the one hand, w will not, in general, satisfy the requirements of having been Lie-dragged by v. On the other hand, starting with a single curve of the w-congruence, such as the curve AB in Fig. 3.4.4, setting λ_v = λ_{v,0} at every point on it by appropriate sliding, we can generate a v-dragged congruence; call it w* = d/dλ_{w*}. To simplify the notation slightly, since all calculations take place in the vicinity of the particular curve AB, which was chosen arbitrarily, we make the replacement λ_{v,0} → λ_v. For w* constructed in this way,

\[
\mathbf{w}^*(\lambda_v) = \mathbf{w}(\lambda_v) \quad \text{and} \quad \frac{d}{d\lambda_v}\frac{d}{d\lambda_{w^*}} = \frac{d}{d\lambda_{w^*}}\frac{d}{d\lambda_v}.
\tag{3.4.19}
\]
The notation here is not quite consistent with that used in Fig. 3.3.4, because here the function w* is dragged forward whereas there it was dragged back. This difference will be accounted for by a sign reversal below. For the following discussion, to avoid having to display vector functions, we introduce an arbitrary scalar function of position f; it could be called "catalytic" since it will appear in intermediate formulas but not in the final result. Using Taylor series expansion to propagate w* forward we obtain

\[
\frac{d}{d\lambda_{w^*}}\bigg|_{\lambda_v+\epsilon} f
= \frac{d}{d\lambda_{w^*}}\bigg|_{\lambda_v} f + \epsilon\,\frac{d}{d\lambda_v}\frac{d}{d\lambda_{w^*}}\bigg|_{\lambda_v} f
= \frac{d}{d\lambda_w}\bigg|_{\lambda_v} f + \epsilon\,\frac{d}{d\lambda_{w^*}}\frac{d}{d\lambda_v}\bigg|_{\lambda_v} f,
\tag{3.4.20}
\]
where both of the relations (3.4.19) have been used. We can evaluate the second term similarly, by propagating w backward:

\[
\frac{d}{d\lambda_w}\bigg|_{\lambda_v+\epsilon} f
= \frac{d}{d\lambda_w}\bigg|_{\lambda_v} f + \epsilon\,\frac{d}{d\lambda_v}\frac{d}{d\lambda_w}\bigg|_{\lambda_v} f;
\tag{3.4.21}
\]

with adequate accuracy, the second coefficient has been evaluated at λ_v. Because the two quantities just evaluated are both being evaluated at the same place, they can be directly subtracted to define the Lie derivative by

\[
(\mathcal{L}_\mathbf{v}\mathbf{w})\, f \equiv \lim_{\epsilon \to 0}\frac{1}{\epsilon}\Big(\frac{d}{d\lambda_w}\bigg|_{\lambda_v+\epsilon} - \frac{d}{d\lambda_{w^*}}\bigg|_{\lambda_v+\epsilon}\Big) f.
\tag{3.4.22}
\]

Combining formulas, ignoring the distinction between w and w* in the double derivatives, and suppressing the subsidiary function f, we obtain
\[
\mathcal{L}_\mathbf{v}\mathbf{w} = [\mathbf{v}, \mathbf{w}],
\tag{3.4.23}
\]
in agreement with Eq. (3.4.16). It is possible now (if one is so inclined) to abstract all "geometry" out of the concept of the commutator of two vector fields (or equivalently the Lie derivative). One can think of a curve not as something drawn with pencil and paper (or by a skywriter in 3-D) but as a one-dimensional (smoothly connected, etc.) set of points parameterized by λ_v in a space with coordinates (x¹, x², ...), and think of d/dλ_v as a directional derivative operator (where "directional" means along the set). Then determining the discrepancy resulting from changing the order of two directional differentiations is a problem of pure calculus. This observation will be put to good use when the Poincaré equation is derived in Chapter 5. Numerous properties of the Lie algebra of vector fields are investigated in the following series of problems (mainly copied from Schutz). The most important Lie algebraic applications apply to a set of vector fields that is "closed under commutation" (meaning that the commutator of two vectors in the set is also in the set) in spite of the fact that it is a "proper" (not the whole set) subset.
Problem 3.4.1: Show that

\[
(\mathcal{L}_\mathbf{v}\mathcal{L}_\mathbf{w} - \mathcal{L}_\mathbf{w}\mathcal{L}_\mathbf{v})\, f = [\mathbf{v}, \mathbf{w}]\, f,
\]

and from this, removing the catalytic function f, show that, when operating on a vector,

\[
\mathcal{L}_\mathbf{v}\mathcal{L}_\mathbf{w} - \mathcal{L}_\mathbf{w}\mathcal{L}_\mathbf{v} = \mathcal{L}_{[\mathbf{v}, \mathbf{w}]}.
\tag{3.4.24}
\]
Problem 3.4.2: Confirm Eq. (3.4.24) when the terms operate on a scalar function of position.
Problem 3.4.3: Using \mathcal{L}_\mathbf{v}\mathbf{u} = [\mathbf{v}, \mathbf{u}], show that, when acting on a vector u,

\[
[[\mathbf{u}, \mathbf{v}], \mathbf{w}] + [[\mathbf{v}, \mathbf{w}], \mathbf{u}] + [[\mathbf{w}, \mathbf{u}], \mathbf{v}] = 0,
\tag{3.4.25}
\]

which is known as the "Jacobi identity."
Problem 3.4.4: For scalar function f and vector function w, show that

\[
\mathcal{L}_\mathbf{v}(f\mathbf{w}) = (\mathcal{L}_\mathbf{v} f)\,\mathbf{w} + f\,\mathcal{L}_\mathbf{v}\mathbf{w},
\tag{3.4.26}
\]

which is known as the "Leibniz rule."
Problem 3.4.5: If v = ∂/∂x¹, which is to say v is one of the coordinate basis vectors, use Eq. (3.3.20) and the properties of a coordinate basis set to show that

\[
(\mathcal{L}_\mathbf{v}\mathbf{w})^i = \frac{\partial w^i}{\partial x^1}.
\tag{3.4.27}
\]
Problem 3.4.6: Consider any two vector "superpositions" of the form

\[
\mathbf{x} = a\mathbf{u} + b\mathbf{v}, \qquad \mathbf{y} = c\mathbf{u} + d\mathbf{v}, \qquad \text{where } [\mathbf{u}, \mathbf{v}] = 0,
\tag{3.4.28}
\]

and where a, b, c, and d are functions of position with arguments not shown. Show that [x, y] can be written as a similar superposition of u and v.
Problem 3.4.7: Consider any two vector "superpositions" of the form

\[
\mathbf{x} = a\mathbf{u} + b\mathbf{v}, \qquad \mathbf{y} = c\mathbf{u} + d\mathbf{v}, \qquad \text{where } [\mathbf{u}, \mathbf{v}] = e\mathbf{u} + f\mathbf{v},
\tag{3.4.29}
\]

and where a, b, c, d, e, and f are functions of position with arguments not shown. Show that [x, y] can be written as a superposition of u and v.

Before proceeding to further curvilinear properties, we will concentrate once again on linear spaces in order to further develop some linear-algebraic/geometric properties. In particular, multivectors are introduced in the next section. Curvilinear analysis resumes in Section 4.3.
BIBLIOGRAPHY
References

1. E. Cartan, Leçons sur la géométrie des espaces de Riemann, Gauthier-Villars, Paris, 1951 (English translation available).
2. B. A. Dubrovin, A. T. Fomenko, and S. P. Novikov, Modern Geometry: Methods and Applications, Part I, Springer-Verlag, New York, 1984, p. 317.
Reference for Further Study

Section 3.4: B. F. Schutz, Geometrical Methods of Mathematical Physics, Cambridge University Press, Cambridge, UK, 1995.
4 GEOMETRY OF MECHANICS III: MULTILINEAR
4.1. GENERALIZED EUCLIDEAN ROTATIONS AND REFLECTIONS

4.1.1. Introduction

"Generalized Euclidean" rotations will be considered next. As pointed out in Section 2.5, Euclidean geometry is characterized by the existence of a metric form Φ(x) that assigns a length to vector x. The discussion here will mainly be restricted to three dimensions, even though, referring to Cartan's book, one finds that most results can be easily generalized to n dimensions. Certainly relativity requires at least four dimensions, and we will need more than three dimensions later on. Though arguments are usually given in 3-D, only methods that generalize easily to higher dimensionality are used. This may make some arguments seem clumsy, but the hope is that maintaining contact with ordinary geometry will better ground the discussion, by addressing issues for which the motivation is familiar. On deeper study at a later date, the reader should have no difficulty constructing more general relations. The word "generalized" is intended to convey two ways in which something more general than Euclidean geometry is being studied. One of these, already rather familiar from special relativity, is the "pseudo-Euclidean" case in which one of the signs in the Pythagorean formula is negative. One knows of course that, including time, nature makes use of four coordinates. To save words without essential loss of generality, we will restrict the discussion to three, say x, y, and t. The more important "generalization" is that the "components" x¹, x², and x³ will be allowed to be complex numbers. In spite of the extra level of abstraction, the theorems and proofs are quite straightforward, and physical meanings can be attached to the results. In ordinary geometry, spatial rotations are described by "orthogonal" matrices. They are sometimes called "proper" to distinguish them from "improper" rotations
that combine a reflection and a rotation. But to avoid confusion later on, because the term "proper" will be used differently in connection with the pseudo-Euclidean metric of special relativity, we will use the terms "rotations or reversals" for transformations that preserve the scalar product of any two vectors. Such a transformation has the form x′^i = a^i_k x^k, or as a matrix equation x′ = Ax.
(4.1.1)
If both frames of reference related by this transformation are "orthonormal," then the orthogonality requirement is

\[
a^i_{\ j}\, a^i_{\ k} = \delta_{jk} \quad \text{(summed on } i\text{)},
\tag{4.1.2}
\]

where the usual "Kronecker-δ" symbol satisfies δ_{jk} = 1 for j = k and zero otherwise. These conditions simply express the assumed orthonormality of the new basis vectors. These terms determine the elements of the matrix product AAᵀ; they also imply that det|AAᵀ| = 1 and hence

\[
\det|\mathsf{A}| = \pm 1.
\tag{4.1.3}
\]
The same transformation, when expressed in terms of skew axes that are related to the orthonormal basis by matrix T, will be described by a matrix equation y′ = TAT⁻¹y = By. As a result, because of the multiplicative property of determinants, det|B| = ±1 for any basis vectors, orthonormal or not. Operations for which det|A| = 1 are to be called "rotations," and those for which det|A| = −1 are "reversals" or "reflection plus rotations."

4.1.2. Reflections

Referring to Fig. 4.1.1, the equation of a plane (or, in general, a hyperplane) π containing the origin is

a · x = a_i x^i = 0. (4.1.4)
FIGURE 4.1.1. Reflection of vector x in the plane π associated with vector a.
This implies that π is associated with a vector a having covariant components a_i and that any vector x lying in π is orthogonal to a.¹ Whenever the statement "a hyperplane is associated with (or corresponds to) a vector" appears, this will be its meaning. A vector x′ resulting from "reflection" of vector x in plane π is defined by two conditions: (1) the vector x′ − x is orthogonal to hyperplane π; (2) the point ½(x′ + x) lies in π.
The first condition implies that x′ − x is parallel to a:

\[
x'^i - x^i = \lambda a^i, \quad \text{or} \quad x'^i = x^i + \lambda a^i.
\tag{4.1.5}
\]

The second condition then implies that

\[
a_i(2x^i + \lambda a^i) = 0, \quad \text{or} \quad \lambda = -2\,\frac{a_i x^i}{a_j a^j}.
\tag{4.1.6}
\]
Since this formula fails if a_i a^i = 0, we insist that a be nonisotropic, and in that case we may as well assume that a is a unit vector. Then the reflected vector x′ is given by

\[
x'^i = x^i - 2\, a_k x^k\, a^i, \quad \text{or} \quad \mathbf{x}' = \mathbf{x} - 2(\mathbf{a}\cdot\mathbf{x})\,\mathbf{a}.
\tag{4.1.7}
\]
(For real vectors, using standard vector analysis, this formula is obvious.) This transformation can also be expressed in matrix form:

\[
\mathbf{x}' =
\begin{pmatrix}
1 - 2a^1 a_1 & -2a^1 a_2 & -2a^1 a_3 \\
-2a^2 a_1 & 1 - 2a^2 a_2 & -2a^2 a_3 \\
-2a^3 a_1 & -2a^3 a_2 & 1 - 2a^3 a_3
\end{pmatrix}
\begin{pmatrix} x^1 \\ x^2 \\ x^3 \end{pmatrix}
= \mathsf{A}\mathbf{x}.
\tag{4.1.8}
\]
Just as the reflection plane π can be said to be "associated" with the vector a, the 3 × 3 matrix A can also be said to be "associated" with a. Since a can be any nonisotropic vector, it follows that any such vector can be associated with a reflection plane and a reflection matrix. Transformation Eq. (4.1.8), when associated with plane π, or equivalently with unit vector a, is called a "reflection." Reflection preserves the scalar square, as can be checked. In the real, pseudo-Euclidean case, reflections for which a is space(time)-like are called "space(time)-like."

¹If a has a nonvanishing scalar square, as will be required shortly, then a can be taken to be a unit vector without loss of generality.

4.1.3. Expressing a Rotation as a Product of Reflections

In this section, some properties of rotations are obtained by representing a rotation as the product of two reflections. Though it may seem distasteful and unpromising to represent a continuous object as the product of two discontinuous objects, the arguments are both brief and elementary, and encompass real and complex vectors, as the following theorem shows. The theorem is expressed for general n because the proof proceeds by induction on n; to simplify it, mentally replace n by 3. It applies to all transformations that leave Φ (the form introduced in Eq. (2.5.2)) invariant, but differentiates between the two possibilities, rotations and reversals. For rotations of ordinary geometry the theorem is illustrated by Fig. 4.1.2.
THEOREM 4.1.1: Any rotation (reversal) in n-dimensional space is a product of an even (odd) number ≤ n of reflections.

Proof: For n = 1, the theorem is trivially satisfied because rotation (reversal) amounts to multiplication by 1 (−1). Assume it holds for n − 1. As a special case, suppose that the transformation leaves invariant a nonisotropic vector v₁. (In ordinary 2-D geometry (in a plane), this could be true for a reflection, but not for a typical rotation.) Taking v₁ as one basis vector, and augmenting it with n − 1 independent vectors all orthogonal to v₁ to form a complete set of basis vectors, the fundamental form becomes
\[
\Phi = g_{11}(u^1)^2 + \overline{\Phi},
\tag{4.1.9}
\]

where \overline{\Phi} = \overline{g}_{ij} u^i u^j, the summation running over 2, 3, ..., n. Since the transformation leaves u¹ invariant, by applying the theorem to the (n − 1)-dimensional hyperplane orthogonal to v₁ the domain of applicability of the theorem is increased from n − 1 to n, and hence to all n in this special case. Advancing from the special case just discussed to the general case, suppose the transformation is such as to transform some nonisotropic vector a into a′. Consider
FIGURE 4.1.2. Composition of a pure rotation from two reflections.
136
GEOMETRY OF MECHANICS 111: MULTILINEAR
then the reflection associated with the vector a − a′. For this reflection to make sense, the vector a − a′ must itself be nonisotropic; we assume that to be the case and defer discussion of the exception for the moment. Applying conditions (1) and (2) above, it can be seen that the vector a transforms to a′ under this reflection. The original transformation can then be thought of as being composed of this reflection plus another (n − 1)-dimensional rotation or reversal that leaves a invariant. Having manipulated the problem into the form of the special case of the previous paragraph, the theorem is proved in this case also. There still remains the possibility that the vector a − a′ is isotropic for all a. The theorem is true even in this case, but the proof is more difficult [1]. We will cross this particular bridge if and when we come to it. Factorization of a rotation in two-dimensional space into a product of two reflections is illustrated in Fig. 4.1.2.
4.1.4. The Lie Group of Rotations

That rotations form a group follows from the fact that the concatenation of two reflections conserves the fundamental form and that each reflection has an inverse (which follows because the determinant of the transformation is nonvanishing). To say that the group is continuous is to say that any rotation can be parameterized by a parameter that can be varied continuously to include the identity transformation. Continuous groups are also called Lie groups. One scarcely expects a proof depending on closeness to the identity to be based on transformations that are clearly not close to the identity, such as reflections. But that is what will be done; clearly the product of two reflections can be close to the identity if the successive planes of reflection are almost coincident. Referring again to Fig. 4.1.2, let a and b be the vectors associated with those reflections.
THEOREM 4.1.2: In complex Euclidean space, and in real Euclidean space with positive definite fundamental form, the set of rotations (real in the latter case) forms a continuous group.

Proof: For any two unit vectors a and b, and for n ≥ 3, there is at least one unit vector c orthogonal to both a and b. From these vectors one can construct two continuous families of reflections depending on a parameter t; they are the reflections defined by unit vectors
\[
\mathbf{a}' = \mathbf{a}\cos t + \mathbf{c}\sin t, \qquad \mathbf{b}' = \mathbf{b}\cos t + \mathbf{c}\sin t, \qquad 0 \le t \le \pi/2.
\tag{4.1.10}
\]
The planes of reflection are associated, as defined above, with these unit vectors. The product of these reflections is a rotation. Let us suppose that a particular rotation under study results from the reflection corresponding to a followed by the reflection corresponding to b. That case is included in Eq. (4.1.10) as the t = 0 limit. In the t = π/2 limit, the transformation is the identity rotation; it is the product of two reflections in the same plane, the one corresponding to c. This exhibits the claimed continuity for rotations constructible from two reflections. For dimensions higher
than 3, a rotation may need (an even number of) reflections greater than two. Proof that the continuity requirements hold in this case can be based on pairs of these reflections. ∎

Every rotation in real 3-D space is represented by a 3 × 3 orthogonal matrix with determinant equal to +1. The group formed from these matrices and their products is called SO(3), where "S" stands for "special" and implies determinant equal to +1, "O" stands for "orthogonal," and "3" is the dimensionality.
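The construction in the proof of Theorem 4.1.2 can be illustrated numerically (our own sketch, in the real Euclidean case): every member of the family of products is a rotation, and the t = π/2 limit is the identity.

```python
# Sketch: reflections through planes defined by a' = a cos t + c sin t and
# b' = b cos t + c sin t compose to a rotation (det +1) for every t, and
# the product degenerates to the identity at t = pi/2.
import numpy as np

def reflection(n):
    n = n / np.linalg.norm(n)
    return np.eye(3) - 2.0*np.outer(n, n)

a = np.array([1.0, 0.0, 0.0])
b = np.array([np.cos(0.3), np.sin(0.3), 0.0])
c = np.array([0.0, 0.0, 1.0])            # orthogonal to both a and b

for t in np.linspace(0.0, np.pi/2, 7):
    R = reflection(b*np.cos(t) + c*np.sin(t)) @ reflection(a*np.cos(t) + c*np.sin(t))
    assert np.isclose(np.linalg.det(R), 1.0)   # every product is a rotation

R_end = reflection(c) @ reflection(c)          # t = pi/2: both planes coincide
assert np.allclose(R_end, np.eye(3))           # the identity rotation
```

Since det(reflection) = −1 for each factor, the product automatically has determinant +1, which is the group-theoretic content of "an even number of reflections."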
4.2. MULTIVECTORS

"Multivectors" in n-dimensional space are multi-component objects, with components that are functions of p ≤ n vectors, x, y, z, .... Because of this they are also known as "p-vectors." They can be regarded as the generalization of the well-known vector cross product to more than two vectors and/or to dimensionality higher than 3. The number p must have one of the values 1, 2, ..., n. Of these, p = 1 corresponds to ordinary vectors, and the case p = n is somewhat degenerate in that, except for sign, all components are equal. (In the n = 3 case, they all equal the "triple product" x · (y × z). See Fig. 4.2.1.) For the case n = 3 then, the only nontrivial case is p = 2. That is the case that is "equivalent to" the vector cross product of standard physics analysis. Here this geometric object will be represented by a 2-vector, also known as a "bivector." This will permit generalization to spaces of higher dimension. Multivectors are also essentially equivalent to "antisymmetric tensors."
4.2.1. "Volume" Determined by Three and by n Vectors

The (oriented) volume of the parallelepiped defined by vectors x, y, and z is V = x · (y × z). The sign of this product depends on the order of the three vectors; that is the essential content of the "oriented" qualifier. The interpretation as "volume" can be inferred from the well-known geometric properties of cross products.

FIGURE 4.2.1. For n = 3-dimensional space, the p = 3-multivector formed from vectors x, y, and z is essentially equivalent to the triple product x · (y × z). Its magnitude is the volume of the parallelepiped defined by the three vectors and its sign depends on their orientation; this makes it an "oriented volume."

By this
interpretation, it is clear that the volume is invariant, except possibly in sign, if all vectors are subject to the same rotation or reversal. The same result can be derived algebraically from the known properties of determinants. For this the volume A is related to the determinant of the array of components

\[
A = \det\begin{pmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \\ z^1 & z^2 & z^3 \end{pmatrix}.
\tag{4.2.1}
\]
Assume temporarily that these components are Euclidean, i.e., the basis is orthonormal. If the three vectors are all transformed by the same rotation or reversal (defined previously), the determinant formed the same way from the new components is unchanged, except for being multiplied by the determinant of the transformation. This is known as "the multiplication property of determinants." (This result is regularly used as the "Jacobian" factor in the evaluation of integrals.) For rotations or reversals, the determinant of the transformation is ±1, and A is at most changed in sign. Now retract the assumption that the basis in Eq. (4.2.1) is Euclidean. A determinant can also be formed from the covariant components:
\[
A' = \det\begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix}.
\tag{4.2.2}
\]
Its value can be determined from the definition of covariant components (Eq. (2.5.15)) and the multiplication property of determinants:

\[
A' = g\,A,
\tag{4.2.3}
\]
where g is the determinant of the matrix of metric coefficients g_{ij}. Taking advantage of the fact that transposing a matrix does not change its determinant, the product
\[
A A' = \det\begin{pmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \\ z^1 & z^2 & z^3 \end{pmatrix}
\det\begin{pmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{pmatrix}
= \det\begin{pmatrix}
\mathbf{x}\cdot\mathbf{x} & \mathbf{x}\cdot\mathbf{y} & \mathbf{x}\cdot\mathbf{z} \\
\mathbf{y}\cdot\mathbf{x} & \mathbf{y}\cdot\mathbf{y} & \mathbf{y}\cdot\mathbf{z} \\
\mathbf{z}\cdot\mathbf{x} & \mathbf{z}\cdot\mathbf{y} & \mathbf{z}\cdot\mathbf{z}
\end{pmatrix}
= V^2,
\tag{4.2.4}
\]

where the product has been called V²; its value is independent of the choice of axes because the final determinant form is expressed entirely in terms of scalars. (They can be evaluated in the Euclidean frame.) From Eqs. (4.2.3) and (4.2.4), it follows that
where the product has been called V2; its value is independent of the choice of axes because the final determinant form is expressed entirely in terms of scalars. (They can be evaluated in the Euclidean frame.) From Eqs. (4.2.3) and (4.2.4), it follows that
(4.2.5) Here the signs of V and A have been taken to be the same. All the determinants in this section generalize naturally to higher dimension n. It is natural to define the volume V of the hypervolume defined by n vectors in ndimensions by Eq. (4.2.5), with A being the n x n determinant of the contravariant components of the n vectors.
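Relations (4.2.3) and (4.2.4) can be checked numerically for a random positive-definite metric (an illustration of our own):

```python
# Sketch: with covariant components x_i = g_ij x^j, the covariant determinant
# satisfies A' = gA, and the product AA' equals the Gram determinant V^2.
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
g = M @ M.T + 3*np.eye(3)        # symmetric positive-definite metric g_ij

Xup = rng.normal(size=(3, 3))    # rows: contravariant components of x, y, z
Xdn = Xup @ g                    # rows: covariant components (g symmetric)

A  = np.linalg.det(Xup)
Ap = np.linalg.det(Xdn)
G  = Xup @ g @ Xup.T             # matrix of scalar products x.x, x.y, ...

assert np.isclose(Ap, np.linalg.det(g) * A)      # A' = g A       (4.2.3)
assert np.isclose(A * Ap, np.linalg.det(G))      # A A' = V^2     (4.2.4)
```

Since the Gram determinant is manifestly built from scalars, this also illustrates why V² is independent of the choice of axes.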
4.2.2. Bivectors
In 3-D, consider the matrix of components of two independent vectors x and y:
\[
\begin{pmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \end{pmatrix}.
\tag{4.2.6}
\]

From this array, \binom{3}{2} = 3 independent 2 × 2 determinants can be formed:

\[
x^{23} = x^2 y^3 - x^3 y^2, \qquad x^{31} = x^3 y^1 - x^1 y^3, \qquad x^{12} = x^1 y^2 - x^2 y^1,
\tag{4.2.7}
\]
(4.2.7) as well as the three others that differ only in sign. (One might think it more natural to define x l3with the opposite sign, but this is just a matter of convention, and the present definition preserves the order of the columns of the subblocks and orders the indices correspondingly with no sign change.) The pair of vectors x and y can be said to constitute a “bivector” with components given by Eq. (4.2.7). Note that the components are the “areas” of projections onto the coordinate planes (except for a constant factor, which is 1 if the axes are rectangular). This is illustrated in Fig. 4.2.2. A common (intrinsic) notation for this bivector is x A y, which is also known as the
FIGURE 4.2.2. The components \(x^{ij}\) of the bivector x ∧ y, as defined in Eq. (4.2.7), are areas of projections onto the coordinate planes. Their magnitudes (with orientation) are also given by cross products of projections of x and y onto the corresponding planes.
GEOMETRY OF MECHANICS III: MULTILINEAR
"wedge product" or "exterior product" of x and y. The components are also said to belong to an antisymmetric tensor.² The bivector x ∧ y "spans" a two-dimensional space, the space consisting of all linear superpositions of x and y. The condition for a vector t to belong to this space is
\[
\det\begin{pmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \\ t^1 & t^2 & t^3 \end{pmatrix}
= t^1 x^{23} + t^2 x^{31} + t^3 x^{12} = 0 \tag{4.2.8}
\]
(the volume defined by x, y, and t is zero). This means that the necessary and sufficient condition for two bivectors \(x^{ij}\) and \(y^{ij}\) to span the same space is that their components be proportional. One can also define covariant components of a p-vector. They are the same determinants, but with contravariant components replaced by covariant components. From two one-forms \(\widetilde{a}\) and \(\widetilde{b}\) one can similarly form a two-form that will be the one-forms' wedge product \(\widetilde{a}\wedge\widetilde{b}\). Mixed two-forms for which one factor is a vector and the other a form can also be defined. The geometric object on which the symplectic geometry of Hamiltonian systems is based is such a two-form. (See Chapter 13.)
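The components of Eq. (4.2.7) and the membership condition of Eq. (4.2.8) can be checked with a few lines of NumPy. (The vectors here are arbitrary illustrative choices, and the axes are taken rectangular so that the components coincide with those of the conventional cross product.)

```python
import numpy as np

x = np.array([1.0, 2.0, 0.5])
y = np.array([0.0, 1.0, 2.0])

# Bivector components x^{jk} = x^j y^k - x^k y^j, cf. Eq. (4.2.7)
def biv(a, b, j, k):
    return a[j] * b[k] - a[k] * b[j]

x23 = biv(x, y, 1, 2)   # x^{23} (0-based indices)
x31 = biv(x, y, 2, 0)
x12 = biv(x, y, 0, 1)

# For rectangular axes these are the components of the cross product x × y
assert np.allclose(np.cross(x, y), [x23, x31, x12])

# Membership, Eq. (4.2.8): t lies in span{x, y} iff t^1 x^23 + t^2 x^31 + t^3 x^12 = 0
t_in = 2.0 * x - 3.0 * y                       # in the span
t_out = t_in + np.array([0.0, 0.0, 1.0])       # pushed out of the span
val_in = t_in[0] * x23 + t_in[1] * x31 + t_in[2] * x12
val_out = t_out[0] * x23 + t_out[1] * x31 + t_out[2] * x12
assert abs(val_in) < 1e-12 and abs(val_out) > 1e-6
```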
4.2.3. Multivectors and Generalization to Higher Dimensionality
In three dimensions one can define a 3-vector from the vectors x, y, and z. It consists of the \(\binom{3}{3} = 1\) independent determinant that can be formed by picking three columns from the 3 × 3 matrix whose rows are \(\mathbf{x}^T\), \(\mathbf{y}^T\), and \(\mathbf{z}^T\). From Eq. (4.2.4) it is clear that the value of this component is the oriented volume defined by the three vectors. In an n-dimensional space, a p-vector can be defined similarly, for p ≤ n. Its elements are the \(\binom{n}{p}\) determinants that can be formed by picking p columns from the matrix with p rows \(\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_p^T\). An invariant "measure" or "area" or "volume" (as the case may be) V of a p-vector can be defined by a p × p determinant:
\[
V^2 = \det\begin{pmatrix}
\mathbf{x}\cdot\mathbf{x} & \mathbf{x}\cdot\mathbf{y} & \cdots & \mathbf{x}\cdot\mathbf{z} \\
\mathbf{y}\cdot\mathbf{x} & \mathbf{y}\cdot\mathbf{y} & \cdots & \mathbf{y}\cdot\mathbf{z} \\
\vdots & \vdots & & \vdots \\
\mathbf{z}\cdot\mathbf{x} & \mathbf{z}\cdot\mathbf{y} & \cdots & \mathbf{z}\cdot\mathbf{z}
\end{pmatrix}
= x^i y^j \cdots z^k
\det\begin{pmatrix}
x_i & x_j & \cdots & x_k \\
y_i & y_j & \cdots & y_k \\
\vdots & \vdots & & \vdots \\
z_i & z_j & \cdots & z_k
\end{pmatrix}
= \frac{1}{p!}\,P^{ij\cdots k} P_{ij\cdots k}. \tag{4.2.9}
\]
²Normally the "antisymmetrization" of tensor \(x^{ij}\) is defined to yield \(x^{[ij]} \equiv (1/2!)(x^{ij} - x^{ji})\). This means there is a factorial factor (1/2!) by which the wedge product differs from the antisymmetrized product.
The validity of removing a factor common to a column of a determinant has been employed repeatedly in going from the first form to the second, and so has the vanishing of all terms for which two indices are equal. That V² is invariant is made manifest by its definition by the first expression. In the final summation the only nonvanishing terms have combinations of indices that are all different, and for any such combination there are p! equal terms; this accounts for the 1/p! factor. For example, consider a bivector whose covariant components \(P_{ij}\) are given by
\[
P_{ij} = \det\begin{pmatrix} x_i & x_j \\ y_i & y_j \end{pmatrix} = x_i y_j - x_j y_i. \tag{4.2.10}
\]
The square of the measure of this bivector is
\[
V^2 = \frac{1}{2!}\,P^{ij} P_{ij}. \tag{4.2.11}
\]
This has the dimensions of area squared. For n = 3, it is equal to \(P^{12}P_{12} + P^{23}P_{23} + P^{31}P_{31}\). If axes 1 and 2 are rectangular, \(P^{12}\) and \(P_{12}\) are each equal to the projected area on the 1,2 plane. Since the product \(P^{12}P_{12}\) is invariant for other, possibly skew, axes 1′ and 2′, provided they define the same plane, its value is the squared area of the projection onto that plane. As a result, the square of the measure of the bivector is the sum of the squared areas of the projections onto all coordinate planes. In particular, if x and y both lie in one of the basis planes (a thing that can always be arranged), the measure is the area of the parallelogram they define. These relationships can be thought of as a "Pythagorean relation for areas." They are basic to invariant integration over surfaces. Clearly the bivector \(P_{ij}\) and the conventional "cross product" x × y are essentially equivalent, and the measure of the bivector is equal to |x × y|, except possibly for sign. The virtues (and burdens) of \(P_{ij}\) relative to x × y are that it is expressible in possibly skew, possibly complex coordinates and is applicable to arbitrary dimensions. The invariant measure of the trivector formed from three vectors x, y, and z, in the n = 3 case, is
\[
V = \sqrt{g}\,\det\begin{pmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \\ z^1 & z^2 & z^3 \end{pmatrix}. \tag{4.2.12}
\]
An important instance where this combination arises is when x and y together represent a bivector (geometrically, an incremental area on the plane defined by x and y) and z = F is a general vector field. In this case V measures the flux of F through the area. In the next section, V will be equivalently regarded as the invariant formed from vector F and the vector “supplementary” to the bivector formed from x and y.
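The "Pythagorean relation for areas" can be confirmed numerically. The sketch below (NumPy, rectangular axes, arbitrary illustrative vectors) checks that the sum of squared projected areas equals the squared area of the parallelogram, i.e., that the bivector measure agrees with |x × y|:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 1.0])

# Rectangular axes: P_ij = x_i y_j - x_j y_i, cf. Eq. (4.2.10)
P = np.outer(x, y) - np.outer(y, x)

# Eq. (4.2.11): V^2 = (1/2!) P^{ij} P_{ij}; raising indices is trivial here.
# This sums the squared projected areas on the three coordinate planes.
V2 = 0.5 * np.sum(P * P)

# Squared area of the parallelogram defined by x and y
assert np.isclose(V2, np.dot(np.cross(x, y), np.cross(x, y)))
```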
As mentioned elsewhere, V as defined by Eq. (4.2.12) and regarded as a function of vectors x, y, and z is known, for example by Arnold, as a three-form because it is a linear, antisymmetric function of its three vector arguments. The measure defined by Eq. (4.2.9) will be important in generalizing Liouville's theorem in Section 13.8.1.

4.2.4. "Supplementary" Multivectors
There is a way of associating an (n − p)-vector Q with a nonisotropic p-vector P. Cartan calls Q the "supplement" of P, but it is more common to call it "the Hodge star of P," or *P. It is the mathematician's more-sophisticated-than-cross-product, but in simple cases equivalent, way of obtaining a vector from two other vectors when a fundamental form exists. The conditions to be met by Q are as follows:
(1) The (n − p)-dimensional manifold spanned by Q consists of vectors orthogonal to the p-dimensional manifold spanned by P; (2) the "volume" or "measure" of Q is equal to the "volume" of P; and (3) the signs of the volumes are the same. For the case n = 3, p = 2, the identification proceeds as follows. Let \(x^{ij} = (\mathbf{x}\wedge\mathbf{y})^{ij}\) be the 2-vector P and t be the sought-for 1-vector Q. The conditions for t to be orthogonal to both x and y are
\[
t_1 x^1 + t_2 x^2 + t_3 x^3 = 0, \qquad t_1 y^1 + t_2 y^2 + t_3 y^3 = 0. \tag{4.2.13}
\]
Eliminating alternately \(t_2\) and \(t_1\) yields
\[
t_1 x^{12} + t_3 x^{32} = 0, \qquad t_2 x^{21} + t_3 x^{31} = 0. \tag{4.2.14}
\]
On the other hand, as in Eq. (4.2.8), if the \(Q_i\) are the covariant components of Q, the condition for t to belong to the space spanned by Q is that all 2 × 2 determinants in the matrix
\[
\begin{pmatrix} Q_1 & Q_2 & Q_3 \\ t_1 & t_2 & t_3 \end{pmatrix}
\]
must vanish:
\[
t_1 Q_3 - t_3 Q_1 = 0, \qquad t_2 Q_3 - t_3 Q_2 = 0. \tag{4.2.15}
\]
Comparing Eqs. (4.2.14) and (4.2.15), it then follows that the \(Q_i\) and the \(x^{ij}\) are proportional when the indices (i, j, k) are an even permutation of (1, 2, 3):
\[
(Q_1, Q_2, Q_3) = \text{const.}\;(x^{23}, x^{31}, x^{12}). \tag{4.2.16}
\]
Using condition (2) to determine the constant of proportionality, further manipulation yields
\[
(Q_1, Q_2, Q_3) = \sqrt{g}\;(x^{23}, x^{31}, x^{12}). \tag{4.2.17}
\]
As an example, suppose that \(x^{ij}\) derives from x = (Δx, 0, 0) and y = (0, Δy, 0), so that its nonvanishing components are \(x^{12} = -x^{21} = \Delta x\,\Delta y\). Then the supplementary covector is \((Q_1, Q_2, Q_3) = \sqrt{g}\,(0, 0, \Delta x\,\Delta y)\). This combination will be used later, in Section 4.3.2, in deriving a generalized version of Gauss's theorem. This derivation could also have been carried out within traditional vector analysis. All that is required is that the three-component "vector" with components \(x^2y^3 - x^3y^2,\; x^3y^1 - x^1y^3,\; x^1y^2 - x^2y^1\) be orthogonal to both x and y. The point of the present derivation is that it works for complex components and for skew axes and, furthermore, that it can be generalized to arbitrary n and p (though it involves relatively difficult combinatorics). The most general multivector P has contravariant components with p superscripts. These components are related to the covariant components of the supplementary multivector Q, which has n − p subscripts, by
\[
P^{i_1 i_2 \cdots i_p} = \frac{1}{\sqrt{g}}\; Q_{i_{p+1} i_{p+2} \cdots i_n}.
\]
Altogether the indices are to be taken in cyclic order. Interchanging covariants and contravariants, the formula is
\[
P_{i_1 i_2 \cdots i_p} = \sqrt{g}\; Q^{i_{p+1} i_{p+2} \cdots i_n}.
\]
Applying this result in three dimensions, the vector supplementary to the bivector formed from (contravariant) vectors x and y is given by \(Q_i = \sqrt{g}\,\epsilon_{ijk}\,x^j y^k\), where \(\epsilon_{ijk}\) is the Levi-Civita antisymmetric symbol. This is the same formula used to calculate the components \((\mathbf{x}\times\mathbf{y})^i\) of the cross product in rectangular coordinates. One sees now that it is more appropriate to regard the results as being covariant components.

4.2.5. Sums of p-Vectors

An algebra of p-vectors can be defined according to which two p-vectors can be "added" component-wise. All components of a p-vector can also be multiplied by a common factor. Dual (n − p)-vectors are obtained using the same formulas as above.
After addition of two p-vectors, each derived from p 1-vectors as above, one can inquire whether p 1-vectors can be found that would yield the same p-vector. The answer in general is no. Hence one introduces new terminology. A “simple” p-vector is one obtainable from p 1-vectors as above, and the term p-vector is redefined to include sums of simple p-vectors. However, for n = 3 all bivectors are simple.
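The supplement construction of Section 4.2.4, \(Q_i = \sqrt{g}\,\epsilon_{ijk}x^jy^k\), can be checked numerically even for skew axes. The sketch below (NumPy; the basis and vectors are arbitrary illustrative choices) verifies conditions (1) and (2): orthogonality of Q to x and y, and equality of the measure of Q with the measure of the bivector:

```python
import numpy as np

# Skew (non-orthonormal) basis; rows of E are e_1, e_2, e_3 in an orthonormal frame.
E = np.array([[1.0, 0.4, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
g = E @ E.T                          # metric g_ij = e_i . e_j
sqrt_g = np.sqrt(np.linalg.det(g))

def eps(i, j, k):
    """Levi-Civita symbol for indices 0, 1, 2."""
    return (j - i) * (k - i) * (k - j) // 2

x = np.array([1.0, 0.0, 2.0])        # contravariant components in the skew basis
y = np.array([0.0, 1.0, 1.0])

# Supplement (Hodge star) of the bivector x ^ y:  Q_i = sqrt(g) eps_ijk x^j y^k
Q = np.array([sqrt_g * sum(eps(i, j, k) * x[j] * y[k]
                           for j in range(3) for k in range(3))
              for i in range(3)])

# Condition (1): Q_i x^i = Q_i y^i = 0
assert np.isclose(Q @ x, 0.0) and np.isclose(Q @ y, 0.0)

# Condition (2): measures agree. |Q|^2 = g^{ij} Q_i Q_j, while the bivector
# measure is V^2 = (1/2) P^{ij} P_{ij} with P^{ij} = x^i y^j - x^j y^i.
P_up = np.outer(x, y) - np.outer(y, x)
P_dn = g @ P_up @ g
assert np.isclose(Q @ np.linalg.inv(g) @ Q, 0.5 * np.sum(P_up * P_dn))
```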
4.2.6. Bivectors and Infinitesimal Rotations

We finally make contact with mechanics by identifying an infinitesimal rotation, such as a physical system might be subject to, with a bivector. It is appropriate to mention that the potential ambiguity between the active and passive interpretations of transformations is nowhere more troublesome than in this area. This difficulty shows up mainly as a difficulty in maintaining notational consistency between boldface index-free symbols that stand for geometric objects and regular-face symbols with
indices that stand for their components. However, the difficulty will mainly come up in later chapters, when the results of this section are applied in mechanics. Consider a rigid object, such as the one illustrated in Fig. 4.2.3, with a single point, perhaps its center of mass, fixed in space. Taking this point as the origin, let x(t) be a vector from the origin to a point P fixed in the body; this vector depends on time t because the object is rotating. The components \(x^i(t)\) of x(t) are taken with respect to Cartesian axes \((\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)\), not necessarily orthonormal, but fixed in space. The most general possible relation between x(t) and an initial state x(0) is a rotation
\[
\mathbf{x}(t) = \mathbf{O}(t)\,\mathbf{x}(0), \qquad \mathbf{x}(0) = \mathbf{O}^{-1}(t)\,\mathbf{x}(t). \tag{4.2.18}
\]
(With skew axes, the matrix O is not necessarily orthogonal, but it will facilitate use of this formula in a later chapter if it is given a symbol that suggests it may be orthogonal. O can be time-dependent.) The velocity of point P is given by
\[
\mathbf{v}(t) = \dot{\mathbf{x}}(t) = \dot{\mathbf{O}}(t)\,\mathbf{x}(0). \tag{4.2.19}
\]
This shows that the velocity components \(v^i(t)\) are linear combinations of the \(x^j(0)\) which, by the second of Eqs. (4.2.18), are in turn linear combinations of the instantaneous particle coordinates \(x^j(t)\). This implies that the \(v^i(t)\) are linear functions of the \(x^j(t)\),
\[
v^i(t) = \Omega^i_{\;k}(t)\,x^k(t), \tag{4.2.20}
\]
which serves to define the matrix Ω:
\[
\boldsymbol{\Omega} = \dot{\mathbf{O}}\,\mathbf{O}^{-1}. \tag{4.2.21}
\]
Since the body is rigid, the velocity is necessarily orthogonal to the position vector:
\[
0 = \mathbf{x}\cdot\mathbf{v} = g_{ij}\,x^i v^j = x_j\,\Omega^j_{\;k}\,x^k = \Omega_{lk}\,x^l x^k. \tag{4.2.22}
\]
Since this is true for all \(x^i\), it follows that \(\Omega_{lk} = -\Omega_{kl}\). This means that the components \(\Omega^i_{\;k}\) are the mixed components (one index up, one down) of a bivector. During an "infinitesimal rotation" occurring in the time interval from t to t + dt, the displacement of point P is given by dx = v dt. To work in linearized approximation, while nevertheless suppressing dt from the formulas, we simply set dt = 1, in effect assuming that the unit of time is so small as to legitimize linear approximation. This is the only way that one can "legitimize" the addition of the "tangent plane quantity" dx to x. Then, from Eq. (4.2.20), the coordinates of point P at time dt are given by
\[
x'^i = x^i + dx^i, \qquad \text{where} \quad dx^i = \Omega^i_{\;k}\,x^k, \tag{4.2.23}
\]
FIGURE 4.2.3. A rigid object with point O fixed rotates by angle dφ about an axis along vector ω during time dt. The velocity of point P is given by v = ω × x.
and where the \(\Omega^i_{\;k}\) are the mixed components of a bivector. (In an attempt to maintain dimensional consistency, since Ω has inverse time units, a physicist might call \(dx^i\) the incremental displacement of P per unit of time.) Referring to Fig. 4.2.3, one can compare this result with rotational formulas from vector analysis,
\[
\boldsymbol{\omega} = \frac{d\boldsymbol{\phi}}{dt}, \qquad
d\mathbf{x} = d\boldsymbol{\phi}\times\mathbf{x}, \qquad
\mathbf{v} = \boldsymbol{\omega}\times\mathbf{x}, \tag{4.2.24}
\]
where \(d\boldsymbol{\phi}\) is an infinitesimal vector, directed along "the instantaneous rotation axis," with magnitude equal to the rotation angle. Our bivector \(\Omega^i_{\;k}\) clearly corresponds to the "angular velocity" vector \(\boldsymbol{\omega}\); in fact, transcribing the last of Eqs. (4.2.24) into matrix notation,
\[
\Omega^i_{\;k} \;\overset{q}{=}\;
\begin{pmatrix} 0 & -\omega^3 & \omega^2 \\ \omega^3 & 0 & -\omega^1 \\ -\omega^2 & \omega^1 & 0 \end{pmatrix}. \tag{4.2.25}
\]
(The "qualified equality" symbols are intended to acknowledge the nonintrinsic, i.e., coordinate-dependent, nature of the relationship.) Infinitesimal rotations around individual Euclidean base axes can be expressed in terms of the antisymmetric matrices
\[
J_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad
J_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \qquad
J_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{4.2.26}
\]
For example, an infinitesimal rotation through angle \(d\phi^1\) around \(\mathbf{e}_1\) is described by
\[
\mathbf{x}' = (\mathbf{1} + J_1\,d\phi^1)\,\mathbf{x}. \tag{4.2.27}
\]
(The sign of the second term depends on whether the transformation is regarded as active (x rotates) or passive (the coordinate system rotates). As written, the frame is assumed fixed and the vector x is actively rotated: a positive value for \(d\phi^1\) corresponds to a vector aligned with the positive \(x^2\) axis being rotated toward the positive \(x^3\) axis.) The transformation yielding rotation through angle \(d\phi\) around unit vector \(\hat{\mathbf{a}}\) is
\[
\mathbf{x}' = \big(\mathbf{1} + (a^1 J_1 + a^2 J_2 + a^3 J_3)\,d\phi\big)\,\mathbf{x}
= (\mathbf{1} + \hat{\mathbf{a}}\cdot\mathbf{J}\;d\phi)\,\mathbf{x}, \tag{4.2.28}
\]
where the triplet \((J_1, J_2, J_3)\) is symbolized by J, as if it were a vector. Eq. (4.2.28) strains our notation. The appearance of a matrix like \(J_1\) as one element of the triplet \((J_1, J_2, J_3)\) suggests it should have a lightface symbol, while the fact that it, itself, has multiple elements suggests that a boldface symbol would be appropriate. However, the latter is somewhat unpersuasive since the elements of \(J_1\) are not different in different coordinate systems.
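The matrices of Eq. (4.2.26) and the rotations they generate can be verified directly. The sketch below (NumPy; the truncated power series for the matrix exponential is an assumption adequate for the modest angle used) checks the so(3) commutation relations, the sense of the infinitesimal rotation in Eq. (4.2.27), and the finite rotation obtained by iterating Eq. (4.2.28):

```python
import numpy as np

# Antisymmetric generators of Eq. (4.2.26)
J1 = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
J2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
J3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)

# so(3) commutation relations: [J_i, J_j] = eps_ijk J_k
comm = lambda A, B: A @ B - B @ A
assert np.allclose(comm(J1, J2), J3)
assert np.allclose(comm(J2, J3), J1)
assert np.allclose(comm(J3, J1), J2)

# Eq. (4.2.27), active convention: a vector along +x2 is carried toward +x3
dphi = 1e-6
xp = (np.eye(3) + J1 * dphi) @ np.array([0.0, 1.0, 0.0])
assert xp[2] > 0

# Compounding infinitesimal rotations -> finite rotation via a power series
def exp_series(M, terms=20):
    out, term = np.eye(3), np.eye(3)
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

Rz = exp_series(J3 * (np.pi / 2))        # quarter turn about e3
assert np.allclose(Rz @ [1, 0, 0], [0, 1, 0], atol=1e-12)
```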
4.3. CURVILINEAR COORDINATES (CONTINUED)

At this point, several of the threads encountered so far can be woven together: bivectors along with their measures, curvilinear coordinates, invariant differentiation, differential forms, and mechanics.
4.3.1. Local Radius of Curvature of a Particle Orbit Recall the analysis of a particle trajectory in Section 3.1.5. As in Eq. (3.1.36),with local coordinates being u' ,the particle speed u is given in terms of its velocity components 6' by u2
= gjkLiJuk3
(4.3.1)
where gjk is the metric tensor evaluated at the instantaneous particle location. Since the particle acceleration ai = iii I')kuJ6kwas shown to be a true vector in Section 3.1.5, it can be used along with the velocity to form a true bivector
+
(4.3.2) The square of the measure of this bivector (as defined by Eq. (4.2.11) it is equal to a sum of squared-projected-areason the separate coordinate planes) is known to be
coordinate-independent. In particular, if the particle orbit lies instantaneously in one such plane (a thing that can always be arranged), the measure of the bivector is the area defined by the instantaneous velocity and acceleration vectors, that is, \(v^3/\rho\), where ρ is the local radius of curvature of the particle trajectory.
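In rectangular coordinates the bivector measure reduces to |v × a|, so the relation above gives \(\rho = v^3/|\mathbf{v}\times\mathbf{a}|\). The following sketch (NumPy; arbitrary illustrative values of R, ω, t) checks this for uniform circular motion, where ρ must equal the circle radius:

```python
import numpy as np

# Uniform circular motion of radius R at angular frequency omega
R, omega, t = 2.0, 3.0, 0.7
v = R * omega * np.array([-np.sin(omega * t), np.cos(omega * t), 0.0])
a = R * omega**2 * np.array([-np.cos(omega * t), -np.sin(omega * t), 0.0])

# Bivector measure is |v x a| for rectangular axes; rho = v^3 / |v x a|
speed = np.linalg.norm(v)
rho = speed**3 / np.linalg.norm(np.cross(v, a))
assert np.isclose(rho, R)   # local radius of curvature equals the circle radius
```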
Problem 4.3.1: Write a manifestly invariant expression for the local radius of curvature ρ in terms of \(\dot{u}^i\), \(a^i\), and \(g_{jk}\). Check it for uniform circular motion on a circle of radius R in the x, y plane.

*4.3.2. Generalized Divergence and Gauss's Theorem³

As derived in vector analysis, Gauss's theorem relates the volume integral of the divergence of a vector to the flux of the vector through the volume's surface. Here, in several steps, this result is generalized to be independent of the choice of coordinates. This is only a partial generalization in that we are still describing Euclidean geometry, though using arbitrary curvilinear coordinates. The first step is to define the divergence of a vector. The other steps are: (a) express the volume element in terms of Christoffel symbols, (b) express the volume integral as a sum of one-dimensional integrals, (c) utilize the fundamental formula of integral calculus, \(\int (dX/dx)\,dx = X\big|\), and (d) express the resulting terms as a surface integral of an intrinsically defined flux. Absolute differentials have been defined for contravariant vectors in Eq. (3.1.26), for covariant vectors in Problem 3.1.2, and for two-index tensors in Problem 3.1.3. The generalization to tensors of arbitrary order is obvious, and the following discussion also transposes easily to tensors of arbitrary order. For simplicity, consider the case of Eq. (3.1.30):
\[
Da_{il} = (Da_{il})_k\,du^k. \tag{4.3.3}
\]
Since \(du^k\) and \(Da_{il}\) are true tensors, the coefficients⁴
\[
(Da_{il})_k \equiv a_{il;k} = \frac{\partial a_{il}}{\partial u^k} - a_{jl}\,\Gamma^j_{\;ik} - a_{ij}\,\Gamma^j_{\;lk} \tag{4.3.4}
\]
also constitute a true tensor. As another example, if \(X_i\) are the covariant components of a vector field, then
\[
(DX_i)_j \equiv X_{i;j} = X_{i,j} - X_k\,\Gamma^k_{\;ij} \tag{4.3.5}
\]
³Sections marked by an asterisk are noticeably more difficult and should perhaps be skipped or skimmed over on first reading. The next section contains similar material, based instead on differential forms, which does not depend on material in this section, and is more central to the later developments in mechanics.
⁴In this section, following Schutz, a covariant derivative with respect to \(u^k\) is indicated by the semicolon subscript ;k. This notation must be distinguished from the even more common notation in which the subscript ,k is used to indicate the ordinary partial derivative, as in \(X_{i,j} = \partial X_i/\partial u^j\), which appears in Eq. (4.3.5).
is also a true tensor. The tensors of Eq. (4.3.4) and Eq. (4.3.5) are called covariant (or invariant) derivatives of \(a_{il}\) and \(X_i\), respectively. In preparation for the formulation of volume integration, one must understand the curvilinear description of volume itself. It was shown in Section 4.2.1 that the volume of the parallelepiped defined by n independent-of-position vectors x, y, ..., z in a Euclidean space of n dimensions is given by \(V = \sqrt{g}\,A\), where A is the determinant of the n × n matrix of the vectors' contravariant coordinates and \(g = |\det\|g_{ij}\||\). The vectors x, y, ..., z also define an n-vector, which is an n-index antisymmetric tensor all of whose components are equal, except for sign, to \(a^{12\cdots n}\). (See Section 4.2.3.) For n = 3,
\[
a^{123} = \det\begin{pmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \\ z^1 & z^2 & z^3 \end{pmatrix} = A. \tag{4.3.6}
\]
The covariant differential of this tensor is
\[
Da^{12\cdots n} = da^{12\cdots n} + a^{i2\cdots n}\,\omega^1_{\;i} + a^{1i\cdots n}\,\omega^2_{\;i} + \cdots + a^{12\cdots i}\,\omega^n_{\;i}
= da^{12\cdots n} + a^{12\cdots n}\big(\omega^1_{\;1} + \omega^2_{\;2} + \cdots + \omega^n_{\;n}\big), \tag{4.3.7}
\]
which vanishes because the vectors of the multivector are assumed to be constant. Expressing this in terms of the volume V yields
\[
0 = d\!\left(\frac{V}{\sqrt{g}}\right) + \frac{V}{\sqrt{g}}\,\big(\omega^1_{\;1} + \cdots + \omega^n_{\;n}\big). \tag{4.3.8}
\]
Being defined by constant vectors, V itself is constant, which implies
\[
\omega^1_{\;1} + \cdots + \omega^n_{\;n} = \Gamma^i_{\;ik}\,du^k = \frac{d\sqrt{g}}{\sqrt{g}} = d\,\ln\sqrt{g}, \tag{4.3.9}
\]
a relation fundamental to volume integration.
Problem 4.3.2: Confirm Eq. (4.3.9) by direct differentiation of g. This is unacceptably tedious unless one takes advantage of the fact that the metric can be made diagonal (with variable elements) by appropriate choice of coordinates.

Consider next the covariant derivative of a contravariant vector \(X^i\); it is given by Eq. (3.1.26). It was shown in Section 2.4.4 that contraction on the indices of a mixed tensor such as this yields a true scalar invariant. In this case it yields what is to be known as the divergence of X,
\[
\operatorname{div}\mathbf{X} \equiv X^i_{\;;i} = \frac{\partial X^i}{\partial u^i} + \Gamma^i_{\;ik}\,X^k
= \frac{1}{\sqrt{g}}\,\frac{\partial(\sqrt{g}\,X^i)}{\partial u^i}, \tag{4.3.10}
\]
where Eq. (4.3.9) has been used. This quantity does not depend explicitly on the Christoffel symbols.
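The final form of Eq. (4.3.10) can be tested in a familiar curvilinear system. The sketch below (NumPy; spherical coordinates and the field with Cartesian components (x, y, z) are illustrative choices) uses a central finite difference to confirm that the curvilinear formula reproduces the known Cartesian divergence, which is 3 for this field:

```python
import numpy as np

# Spherical coordinates u = (r, theta, phi): sqrt(g) = r^2 sin(theta).
# The field with Cartesian components (x, y, z) has X^r = r, X^th = X^ph = 0.
r, th = 1.3, 0.8
h = 1e-6

sqrt_g = lambda r, th: r**2 * np.sin(th)
Xr = lambda r, th: r                 # only nonzero contravariant component

# div X = (1/sqrt(g)) d(sqrt(g) X^r)/dr, by Eq. (4.3.10)
div = (sqrt_g(r + h, th) * Xr(r + h, th)
       - sqrt_g(r - h, th) * Xr(r - h, th)) / (2 * h) / sqrt_g(r, th)
assert np.isclose(div, 3.0, atol=1e-6)   # matches the Cartesian result
```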
As in ordinary vector analysis, the primary application of the divergence operation is in Gauss's theorem. Cross-multiplying the \(\sqrt{g}\) factor in Eq. (4.3.10) and integrating over volume V (again specializing to n = 3 for convenience) yields
\[
\iiint_V \operatorname{div}\mathbf{X}\,\sqrt{g}\;du^1\,du^2\,du^3
= \iiint_V \frac{\partial(\sqrt{g}\,X^i)}{\partial u^i}\;du^1\,du^2\,du^3. \tag{4.3.11}
\]
At this point one should recall the derivation of Gauss's theorem for ordinary rectangular coordinates: the volume is broken up into little parallelepipeds with faces on which one of the coordinates is fixed and the others vary. Applying Taylor's theorem to approximate the integrand's variation, and recognizing that contributions from interior surfaces cancel in pairs, one can replace the right-hand side of Eq. (4.3.11) with a surface integral over the closed surface S bounding the volume V. The result is
\[
\iiint_V \operatorname{div}\mathbf{X}\,\sqrt{g}\;du^1\,du^2\,du^3
= \oiint_S \sqrt{g}\,\big(X^1\,du^2\,du^3 + X^2\,du^3\,du^1 + X^3\,du^1\,du^2\big). \tag{4.3.12}
\]
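Eq. (4.3.12) can be confirmed numerically. The sketch below (NumPy; a ball of radius R in spherical coordinates and the field X^r = r are illustrative choices) compares the volume integral of div X with the surface flux, both of which should equal 4πR³:

```python
import numpy as np

# Ball of radius R in spherical coordinates (r, theta, phi); sqrt(g) = r^2 sin(theta).
R, n = 1.5, 200
r = np.linspace(0, R, n, endpoint=False) + R / (2 * n)        # midpoint rule in r
th = np.linspace(0, np.pi, n, endpoint=False) + np.pi / (2 * n)  # midpoint rule in theta
dr, dth = R / n, np.pi / n
ph_weight = 2 * np.pi          # integrands are phi-independent

div_X = 3.0                    # from (1/sqrt(g)) d(sqrt(g) r)/dr = 3

# Left side of Eq. (4.3.12): volume integral of div X times sqrt(g)
lhs = sum(div_X * ri**2 * np.sin(tj) * dr * dth * ph_weight
          for ri in r for tj in th)

# Right side: flux term sqrt(g) X^1 du^2 du^3 evaluated on the sphere r = R
rhs = sum(R**2 * np.sin(tj) * R * dth * ph_weight for tj in th)

assert np.isclose(lhs, 4 * np.pi * R**3, rtol=1e-3)
assert np.isclose(lhs, rhs, rtol=1e-3)
```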
Expressed in terms of the volumes and areas defined by incremental coordinate vectors dp, dq, dr in the interior and dv, dw on the surface, this can be written⁵
\[
\iiint_V \operatorname{div}\mathbf{X}\,\sqrt{g}\,
\det\begin{pmatrix} dp^1 & dp^2 & dp^3 \\ dq^1 & dq^2 & dq^3 \\ dr^1 & dr^2 & dr^3 \end{pmatrix}
= \oiint_S \sqrt{g}\,
\det\begin{pmatrix} X^1 & X^2 & X^3 \\ dv^1 & dv^2 & dv^3 \\ dw^1 & dw^2 & dw^3 \end{pmatrix}. \tag{4.3.13}
\]
⁵Though expressed in coordinates, the coordinate-independence of individual factors has been demonstrated previously.
called the value of a two-form \(\widetilde{\omega}^{(2)}\) evaluated on the two vectors, so that \(\widetilde{\omega}^{(2)}(\mathbf{x},\mathbf{y})\) is the "measure" (in this case area) the vectors define. We have come most of the way toward deriving the generalized invariant integration theorem without introducing the formalism of exterior differential forms. This is no worse than following in the footsteps of Poincaré, who obtained these results in the first place. It also serves as preparation for the application of the exterior differential, which comes next.

*4.4. INTEGRATION AND EXTERIOR DIFFERENTIATION OF FORMS⁶

4.4.1. One-Dimensional Integrals
The integration of a one-form has already been considered in Eqs. (2.3.8) and (2.3.10), which are combined here in slightly modified form:
\[
\int_B^E \widetilde{dh} = \lim_{n\to\infty}\sum_{i=1}^{n}\big\langle \widetilde{dh},\,\Delta\mathbf{x}_{(i)}\big\rangle = h(E) - h(B). \tag{4.4.1}
\]
In this case, the one-form being integrated is \(\widetilde{dh}\) and the "line integral" runs along a curve from beginning point B to end E, in steps \(\Delta\mathbf{x}_{(i)}\) that become infinitesimally small as n becomes large, and the result is the difference in elevation h between B and E. For a general position-dependent one-form \(\widetilde{\omega}(\mathbf{x})\), a corresponding integral is
\[
\int_B^E \widetilde{\omega} = \lim_{n\to\infty}\sum_{i=1}^{n}\big\langle \widetilde{\omega}(\mathbf{x}_{(i)}),\,\Delta\mathbf{x}_{(i)}\big\rangle. \tag{4.4.2}
\]
An important special case has the line integral being evaluated over a closed curve γ, and in this case the result will be called the "circulation" of \(\widetilde{\omega}\) over γ and will be represented by an abbreviated notation:
\[
\oint_\gamma \widetilde{\omega} = \oint_\gamma \big\langle \widetilde{\omega},\,d\mathbf{x}\big\rangle. \tag{4.4.3}
\]
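The discrete sum defining a circulation lends itself directly to computation. The sketch below (NumPy; the one-form \(-y\,\widetilde{dx} + x\,\widetilde{dy}\) and the unit circle are illustrative choices) approximates the circulation by summing the pairings of the form with small chords, as in Eq. (4.4.2); the known value is 2π, twice the enclosed area:

```python
import numpy as np

# Circulation of the one-form w = -y dx + x dy around the unit circle
n = 100000
t = np.linspace(0.0, 2 * np.pi, n + 1)
x, y = np.cos(t), np.sin(t)
dx, dy = np.diff(x), np.diff(y)
xm, ym = (x[:-1] + x[1:]) / 2, (y[:-1] + y[1:]) / 2   # evaluate form at chord midpoints

# Sum of <w(x_(i)), Delta x_(i)> over the discretized closed curve
circulation = np.sum(-ym * dx + xm * dy)
assert np.isclose(circulation, 2 * np.pi, atol=1e-6)
```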
Line integrals like these are very important in physics, not only to represent work, as in mechanics (e.g., Eq. (4.4.1)), but even more so in electricity and magnetism to represent things like the line integral of a magnetic field around a closed path. This is where the curls come from in Maxwell's equations. There is a minor difference now, however, since our integrands contain differential forms, whereas the line integrals involved in the derivation of Maxwell's equations involve quantities like H · dx instead. Our task therefore is to find the analog of the curl for differential forms and in the process find the analog of Stokes's theorem, which one may recall as the mathematical tool needed to convert the integral forms of Maxwell's equations into their more compact differential forms.
⁶Sections marked by an asterisk are noticeably more difficult and should perhaps be skipped or skimmed over on first reading. The material in this section will be required mainly near the end of Chapter 13, but there it will be central to the entire development.
The analog of the curl of a vector is the “exterior derivative” of a one-form. This particular derivative is just one example of the more general form of differentiation of forms of any order known as exterior differentiation. However, since we will use only the derivative that is analogous to the curl, we will not attempt to proceed with such great generality. This being the case, no mathematics described here will be more difficult than the reader saw on first encountering the curl of a vector. For this reason, the treatment here will be very limited, intended to provide just enough discussion for the reader to be persuaded that nothing too mysterious is involved.
4.4.2. Two-Dimensional Integrals In this spirit let us restrict the discussion to two dimensions and integrate the differential form Z = f ( x , y ) & g ( x , y) $y over the curve y shown in Fig.4.4.la. A regular grid based on repetition of basis vectors u and v has been superimposed on the figure. As drawn, since the curve lies in a single plane, a vector normal to the surface is everywhere directed along the z-axis, and a surface r with y as boundary can be taken to be the ( x , y ) plane. (For a general, nonplanar curve, establishing the grid and the surface would be somewhat more complicated, but the following argument would be largely unchanged.) Though the curve y is macroscopic, the required circulation can be obtained by summing the circulations around individual microscopic “parallelograms” as shown. Instead of traversing y , one is instead traversing every parallelogram once and summing the results. The interior contributions cancel in pairs and the stair-step path around the periphery can be made arbitrarily close to y by reducing the grid size. (Though the number of steps is inversely proportional to the grid size, the error made in each step compared to the correct path along y is proportional to the square of the step size.)
+
FIGURE 4.4.1. (a) Approximation of the circulation of a one-form around a contour γ by summing the circulations around the elements of a superimposed grid. For simplicity, the surface Γ bounded by γ is here assumed to be a plane. (b) Evaluation of the circulation around one (differential) element of the grid.
In this way the problem has been reduced to one of calculating the circulation around a microscopic parallelogram as shown in Fig. 4.4.1b. The vectors u and v forming the sides of the parallelogram will be treated as "differentially small" so that their higher powers can be neglected relative to their lower powers. We now wish to introduce for one-forms the analog of the curl of a vector so that the integral can be expressed as an integral over the surface Γ rather than along the curve γ. We have not as yet assigned any meaning to the surface integral of a form, but we can do so by applying our previously defined line integration. The line integrals along the counterclockwise vectors in Fig. 4.4.1b coming from the first term \(f(x,y)\,\widetilde{dx}\) of the form \(\widetilde{\omega}\) being integrated are given approximately by
\[
\big\langle f\,\widetilde{dx},\,\mathbf{u}\big\rangle \approx
\Big(f(0) + \frac{\partial f}{\partial x}\frac{u^x}{2} + \frac{\partial f}{\partial y}\frac{u^y}{2}\Big)\,u^x, \qquad
\big\langle f\,\widetilde{dx},\,\mathbf{v}\big\rangle \approx
\Big(f(0) + \frac{\partial f}{\partial x}\Big(u^x + \frac{v^x}{2}\Big) + \frac{\partial f}{\partial y}\Big(u^y + \frac{v^y}{2}\Big)\Big)\,v^x,
\]
where all partial derivatives are evaluated at the lower left corner. Approximating the clockwise contributions similarly and summing the four contributions yields
\[
-\frac{\partial f}{\partial y}\,(u^x v^y - u^y v^x).
\]
Performing the same calculations on \(g(x,y)\,\widetilde{dy}\) and summing all contributions yields
\[
\oint_{\partial\Pi} \widetilde{\omega} \approx
\Big(-\frac{\partial f}{\partial y} + \frac{\partial g}{\partial x}\Big)\,(u^x v^y - u^y v^x). \tag{4.4.4}
\]
Here the notation Π has been introduced for the parallelogram under discussion described as an area, and ∂Π for the same parallelogram described as the curve circumscribing it in a counterclockwise sense. The exterior derivative of the form \(\widetilde{\omega} = f\,\widetilde{dx} + g\,\widetilde{dy}\) was defined in Eq. (2.3.31), which is repeated here,
\[
\widetilde{d}\widetilde{\omega} = \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)\,\widetilde{dx}\wedge\widetilde{dy}, \tag{4.4.5}
\]
with the notation \(\widetilde{d}\) being used for the exterior derivative. Also, when Eq. (2.3.29) is applied to the vectors u and v, the result is
\[
\widetilde{dx}\wedge\widetilde{dy}\,(\mathbf{u},\mathbf{v}) = u^x v^y - u^y v^x. \tag{4.4.6}
\]
All these results can be combined and abbreviated into the equation
\[
\oint_{\partial\Pi} \widetilde{\omega} \approx \widetilde{d}\widetilde{\omega}\,(\mathbf{u},\mathbf{v}). \tag{4.4.7}
\]
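The approximation of Eq. (4.4.4) can be checked directly. The sketch below (NumPy; the functions f, g, the corner point, and the small sides u, v are all arbitrary illustrative choices) integrates the one-form around a small parallelogram leg by leg and compares with the exterior-derivative estimate; agreement is to the relative accuracy expected from the finite size of the parallelogram:

```python
import numpy as np

# One-form w = f dx + g dy with illustrative f and g
f = lambda x, y: x * y
g = lambda x, y: x**2 - y

u = np.array([1e-3, 2e-4])       # small parallelogram sides
v = np.array([-3e-4, 1.2e-3])
x0 = np.array([0.7, -0.4])       # lower left corner

def leg(p, d, steps=200):
    """Line integral of w along the straight segment p -> p + d (midpoint rule)."""
    s = (np.arange(steps) + 0.5) / steps
    pts = p[None, :] + s[:, None] * d[None, :]
    return np.sum((f(pts[:, 0], pts[:, 1]) * d[0]
                   + g(pts[:, 0], pts[:, 1]) * d[1]) / steps)

# Counterclockwise circuit around the parallelogram
circ = (leg(x0, u) + leg(x0 + u, v) + leg(x0 + u + v, -u) + leg(x0 + v, -v))

# Exterior derivative estimate, Eqs. (4.4.4)-(4.4.7)
wedge = u[0] * v[1] - u[1] * v[0]     # dx ^ dy (u, v)
dfdy, dgdx = x0[0], 2 * x0[0]         # partials of f and g at the corner
assert np.isclose(circ, (dgdx - dfdy) * wedge, rtol=2e-3)
```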
This equation relates infinitesimal quantities, but completing the argument implied by Fig. 4.4.1 we have
\[
\int_\Gamma \widetilde{d}\widetilde{\omega} = \oint_{\gamma=\partial\Gamma} \widetilde{\omega}, \tag{4.4.8}
\]
where Γ is the complete surface and γ = ∂Γ is the curve circumscribing it in a counterclockwise sense. This is known as Stokes's theorem for forms. Strictly speaking, we have still not defined such a thing as an area integral, as only line integrals have appeared. It is implied, however, that we intend \(\int_\Gamma \widetilde{d}\widetilde{\omega}\) to be regarded as an integral \(\int_\Gamma \widetilde{\omega}^{(2)}\) of an antisymmetric two-form \(\widetilde{\omega}^{(2)}\) over the surface Γ. Any surface Γ can be spanned by a grid of infinitesimal parallelograms of which a typical one has sides u and v. The integral can then be regarded as the sum of infinitesimal contributions \(\widetilde{\omega}^{(2)}(\mathbf{u},\mathbf{v})\). In ordinary vector analysis, an integral over a two-dimensional surface can legitimately be called an "area integral" because areas are defined by the usual Pythagorean metric and, if the integrand is 1, the integral over surface Γ yields the total area. Another sort of integral over a two-dimensional surface in ordinary physics is the calculation of the "flux" of a vector, say E, through the surface. Not only does the definition of the meaning of such an integral rely on a metric within the surface, it implies the introduction of the concept of "normal to the surface" and the scalar product of E with that vector. In contrast, the integral
\[
\int_\Gamma \widetilde{\omega}^{(2)} \tag{4.4.9}
\]
does not require the existence of a metric on the surface and does not require anything involving "going out of the surface." Probably the main reason for having introduced two-forms (and forms of other order) is illustrated by Eq. (4.4.9), where the two-form serves as the integrand of an integral over a two-dimensional surface.

4.4.3. Metric-Free Definition of the "Divergence" of a Vector

The theorem expressed by Eq. (4.4.8) applies to the integration of an exterior differential form \(\widetilde{d}\widetilde{\omega}\) of arbitrary order n − 1 over the "surface" bounding a closed n-dimensional "volume." This generalization will also subsume Gauss's theorem (see Section 4.3.2) once the divergence has been expressed in the form of an exterior differentiation. While a metric was assumed to exist, the definition of div X amounted to requiring that Eq. (4.3.13) be valid in the limit of vanishingly small ranges of integration, but this definition is no longer applicable. Since we have dropped the requirement that a metric exist, it is now necessary to define differently the divergence of the given vector X(x). We assume that an n-form \(\widetilde{\omega}^{(n)}\) is also given. The number \(\widetilde{\omega}^{(n)}(d\mathbf{p}, d\mathbf{q}, \ldots, d\mathbf{r})\) obtained by supplying arguments to this form is the measure of the hyperparallelogram delineated by incremental vectors (dp, dq, ..., dr). Performing an n-volume integration amounts to
154
GEOMETRY OF MECHANICS 111: MULTILINEAR
filling the interior of the n-volume by such parallelograms and adding the measures. It is possible to choose coordinates such that
\[
\widetilde{\omega}^{(n)}(\overbrace{\,\cdot\,,\,\cdot\,,\ldots,\,\cdot\,}^{n\ \text{dots}})
= \widetilde{dx}^1\wedge\widetilde{dx}^2\wedge\cdots\wedge\widetilde{dx}^n\,(\,\cdot\,,\,\cdot\,,\ldots,\,\cdot\,). \tag{4.4.10}
\]
Expanded in terms of basis vectors, X is given by
\[
\mathbf{X} = X^1\mathbf{e}_1 + X^2\mathbf{e}_2 + \cdots + X^n\mathbf{e}_n. \tag{4.4.11}
\]
From \(\widetilde{\omega}^{(n)}\) and X one can define an (n − 1)-form
\[
\widetilde{\omega}^{(n)}(\mathbf{X}, \overbrace{\,\cdot\,,\ldots,\,\cdot\,}^{n-1\ \text{dots}}). \tag{4.4.12}
\]
Substituting from Eq. (4.4.11) into Eq. (4.4.10) yields
\[
\widetilde{\omega}^{(n)}(\mathbf{X}, \overbrace{\,\cdot\,,\ldots,\,\cdot\,}^{n-1\ \text{dots}})
= X^1\,\widetilde{dx}^2\wedge\widetilde{dx}^3\wedge\cdots\wedge\widetilde{dx}^n
+ X^2\,\widetilde{dx}^3\wedge\widetilde{dx}^4\wedge\cdots\wedge\widetilde{dx}^1 + \cdots, \tag{4.4.13}
\]
with the indices taken in cyclic order. Forming the exterior differential of this expression yields
\[
\widetilde{d}\,\big[\widetilde{\omega}^{(n)}(\mathbf{X}, \overbrace{\,\cdot\,,\ldots,\,\cdot\,}^{n-1\ \text{dots}})\big]
= \frac{\partial X^i}{\partial x^i}\;\widetilde{dx}^1\wedge\widetilde{dx}^2\wedge\cdots\wedge\widetilde{dx}^n\,(\overbrace{\,\cdot\,,\ldots,\,\cdot\,}^{n\ \text{dots}}). \tag{4.4.14}
\]
This has validated the following definition of divergence:
\[
\operatorname{div}_\omega\mathbf{X}\;\widetilde{\omega}^{(n)}
= \widetilde{d}\,\big[\widetilde{\omega}^{(n)}(\mathbf{X}, \overbrace{\,\cdot\,,\ldots,\,\cdot\,}^{n-1\ \text{dots}})\big]. \tag{4.4.15}
\]
This definition of divergence depends on \(\widetilde{\omega}^{(n)}\), a fact that is indicated by the subscript on \(\operatorname{div}_\omega\). Finally, Eq. (4.4.8) can be written
\[
\int_V \widetilde{\omega}^{(n)}\,\operatorname{div}_\omega\mathbf{X}
= \oint_{\partial V} \widetilde{\omega}^{(n)}(\mathbf{X}, \overbrace{\,\cdot\,,\ldots,\,\cdot\,}^{n-1\ \text{dots}}). \tag{4.4.16}
\]
Here the form \(\widetilde{\omega}^{(n)}\) is playing the role of relating "areas" on the bounding surface to "volumes" in the interior. This role was played by the metric in the previous form of Gauss's law. More precisely, the \(\sqrt{g}\,\det\|\cdot\|\) factor in Eq. (4.3.13) constituted the definition of the "volume measure" \(\widetilde{\omega}^{(3)}(d\mathbf{p}, d\mathbf{q}, d\mathbf{r})\). Finally, let us contemplate the extent to which the operations of vector calculus have been carried over to an intrinsic calculus of geometric objects defined on a general manifold. A true (contravariant) vector field is, in isolation, subject to no curl-like operation, but a one-form (or covariant vector) is subject to exterior differentiation, which can be thought of as a generalized curl operation. Furthermore, there
SPINORS IN THREE-DIMENSIONAL SPACE
155
is no divergence-like operation by which a true scalar can be extracted from a true (contravariant) vector field X,in the absence of other structure. But we have seen that an n-form-dependent divergence div,X can be formed.
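The content of Eq. (4.4.16) can be checked numerically in the simplest case, n = 2 with G^(2) = dx ∧ dy, where div_G reduces to the ordinary divergence; the vector field and region below are hypothetical choices, not taken from the text.

```python
# Numerical check of Eq. (4.4.16) with G = dx ^ dy on the unit square,
# so div_G X is the ordinary divergence.  Hypothetical field X = (x^2, x*y).

N = 400                        # subdivisions for the Riemann sums
h = 1.0 / N

def X(x, y):
    return (x * x, x * y)

def divX(x, y):
    return 2.0 * x + x         # d(x^2)/dx + d(x*y)/dy = 2x + x = 3x

# Volume integral of div_G X over the unit square (midpoint rule).
interior = sum(divX((i + 0.5) * h, (j + 0.5) * h) * h * h
               for i in range(N) for j in range(N))

# Boundary integral of G(X, .) = X^1 dy - X^2 dx, i.e. the outward flux,
# traversed counterclockwise around the square.
t = [(k + 0.5) * h for k in range(N)]
flux = (sum(X(1.0, y)[0] * h for y in t)      # right edge, n = (+1, 0)
        - sum(X(0.0, y)[0] * h for y in t)    # left edge,  n = (-1, 0)
        + sum(X(x, 1.0)[1] * h for x in t)    # top edge,   n = (0, +1)
        - sum(X(x, 0.0)[1] * h for x in t))   # bottom edge, n = (0, -1)

print(abs(interior - flux) < 1e-6)            # both sides equal 3/2
```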
*4.5. SPINORS IN THREE-DIMENSIONAL SPACE⁷

The treatment here resembles somewhat the discussion of "Cayley–Klein" parameters in Goldstein. Basically the close connection between the groups SO(3) and SU(2) is what is to be explored. The treatment follows naturally what has gone before and has the virtue of introducing Pauli matrices in a purely geometric context, independent of quantum mechanics. Our primary purpose is to exploit the representation of a rotation as the product of two reflections (see Fig. 4.1.2) in order to find the transformation matrix for a finite rotation around an arbitrary axis. The three components of certain vectors will, on the one hand, be associated with an object having two complex components (a spinor). On the other hand, and of greater interest to us because it applies to real vectors, a 3-D vector will be associated with one 2 × 2 complex matrix describing rotation about the vector and another describing reflections in the plane orthogonal to the vector.

4.5.1. Definition of Spinors

The Euclidean, complex components (x1, x2, x3) of an "isotropic" vector x satisfy

$$x_1^2 + x_2^2 + x_3^2 = 0. \qquad (4.5.1)$$
To the vector x can be associated an object called a "spinor" with two complex components (ξ0, ξ1), defined so that

$$x_1 = \xi_0^2 - \xi_1^2, \qquad x_2 = i(\xi_0^2 + \xi_1^2), \qquad x_3 = -2\xi_0\xi_1. \qquad (4.5.2)$$
Inverting these equations yields

$$\xi_0 = \pm\sqrt{\frac{x_1 - i x_2}{2}}, \qquad \xi_1 = \pm\sqrt{\frac{-x_1 - i x_2}{2}}. \qquad (4.5.3)$$

It is not possible to choose the sign consistently for all vectors x. To see this, start, say, with some particular isotropic vector x and the positive sign for ξ0. Rotating by angle α around the e3 axis causes x1 − ix2 to be multiplied by e^{−iα}, and ξ0 by e^{−iα/2}. Taking α = 2π causes x to return to its starting value, but causes the sign of ξ0 to be reversed. Rotation through 2π around any axis reverses the signs of (ξ0, ξ1). (This will be easier to see after the next few sections.) Another full rotation restores the signs to their original values.

⁷Some parts of this section should perhaps only be skimmed initially. Apart from its obvious importance in atomic physics, this formalism is necessary for analyzing the propagation of spin directions in accelerators and is helpful for analyzing rigid body motion.
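A quick numerical check of Eqs. (4.5.1)-(4.5.3); the starting spinor values below are hypothetical samples.

```python
import cmath

# Build x from a hypothetical spinor via Eq. (4.5.2), confirm isotropy
# (Eq. (4.5.1)), then invert via Eq. (4.5.3) and recover the spinor up to
# the unavoidable overall sign discussed in the text.
xi0, xi1 = 0.8 + 0.3j, -0.5 + 1.1j

x1 = xi0**2 - xi1**2
x2 = 1j * (xi0**2 + xi1**2)
x3 = -2 * xi0 * xi1

print(abs(x1**2 + x2**2 + x3**2) < 1e-12)       # isotropy, Eq. (4.5.1)

r0 = cmath.sqrt((x1 - 1j * x2) / 2)             # Eq. (4.5.3), principal branch
r1 = cmath.sqrt((-x1 - 1j * x2) / 2)
if abs(-2 * r0 * r1 - x3) > 1e-9:               # fix the relative sign via x3
    r1 = -r1
same = abs(r0 - xi0) < 1e-9 and abs(r1 - xi1) < 1e-9
flip = abs(r0 + xi0) < 1e-9 and abs(r1 + xi1) < 1e-9
print(same or flip)                             # recovered up to overall sign
```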
4.5.2. Demonstration That a Spinor Is a Euclidean Tensor

For (ξ0, ξ1) to be the components of a tensor, they must undergo a linear transformation when x is subjected to an orthogonal transformation x′ = ax:

$$x_1' = \xi_0'^2 - \xi_1'^2, \qquad x_2' = i(\xi_0'^2 + \xi_1'^2), \qquad x_3' = -2\xi_0'\xi_1'. \qquad (4.5.4)$$

In particular,

$$\xi_0'^2 = \frac{x_1' - ix_2'}{2} = \frac{(a_{11} - ia_{21})x_1 + (a_{12} - ia_{22})x_2 + (a_{13} - ia_{23})x_3}{2}, \qquad (4.5.5)$$

which, after substitution from Eq. (4.5.2), is quadratic in ξ0 and ξ1. To see that the right-hand side is a perfect square, write the discriminant

$$(a_{13} - ia_{23})^2 - (a_{11} - ia_{21} + ia_{12} + a_{22})(-a_{11} + ia_{21} + ia_{12} + a_{22})$$
$$= (a_{11} - ia_{21})^2 + (a_{12} - ia_{22})^2 + (a_{13} - ia_{23})^2 = 0, \qquad (4.5.6)$$

where the vanishing results because the rows of an orthogonal matrix are orthonormal. As mentioned before, ξ0′, with only its square determined by Eq. (4.5.5), can be given either sign. The second spinor component ξ1′ is given by a similar perfect square, but its sign follows from the third of Eqs. (4.5.4) and (4.5.2).
4.5.3. Associating a 2 × 2 Reflection (Rotation) Matrix with a Vector (Bivector)

It has been seen above in Eq. (4.1.8) that there is a natural association between a vector a, a plane of reflection π orthogonal to a, and a 3 × 3 transformation matrix describing the reflection of a vector x in that plane. There is a corresponding 2 × 2 matrix describing reflection of a spinor (ξ0, ξ1) in the plane. It is given by

$$X = x_1\sigma_1 + x_2\sigma_2 + x_3\sigma_3 = \begin{pmatrix} x_3 & x_1 - ix_2 \\ x_1 + ix_2 & -x_3 \end{pmatrix}, \qquad (4.5.8)$$

where (known as Pauli spin matrices in quantum mechanics)

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (4.5.9)$$
Some useful results follow easily from this definition:

$$\det X = -\mathbf{x}\cdot\mathbf{x}, \qquad XX = (\mathbf{x}\cdot\mathbf{x})\,\mathbf{1}, \qquad XY + YX = 2(\mathbf{x}\cdot\mathbf{y})\,\mathbf{1}. \qquad (4.5.10)$$

The latter two equations are especially noteworthy in that they yield matrices proportional to the identity matrix 1. In particular, if x is a unit vector, X² = 1. Also, if (x1, x2, x3) are real, X is Hermitean:

$$X^* = X^T. \qquad (4.5.11)$$
All these relations can be checked for σ1, σ2, and σ3. For example, σiσj = −σjσi for i ≠ j.
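The identities of Eqs. (4.5.10)-(4.5.11) can be confirmed directly; the sample vectors below are hypothetical, and plain nested lists are used so the sketch is self-contained.

```python
# Verify det X = -x.x and XY + YX = 2(x.y)1 for matrices built from the
# Pauli matrices of Eq. (4.5.9).  Sample vectors are hypothetical.

s1 = [[0, 1], [1, 0]]
s2 = [[0, -1j], [1j, 0]]
s3 = [[1, 0], [0, -1]]

def mat(v):                          # X = x1*s1 + x2*s2 + x3*s3, Eq. (4.5.8)
    return [[v[0]*s1[i][j] + v[1]*s2[i][j] + v[2]*s3[i][j]
             for j in range(2)] for i in range(2)]

def mul(a, b):
    return [[sum(a[i][k]*b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(a):
    return a[0][0]*a[1][1] - a[0][1]*a[1][0]

x = (0.3, -1.2, 0.7)                 # hypothetical sample vectors
y = (1.1, 0.4, -0.2)
X, Y = mat(x), mat(y)
dot = sum(a*b for a, b in zip(x, y))

print(abs(det(X) + sum(a*a for a in x)) < 1e-12)     # det X = -x.x
XY, YX = mul(X, Y), mul(Y, X)
anti = [[XY[i][j] + YX[i][j] for j in range(2)] for i in range(2)]
print(abs(anti[0][0] - 2*dot) < 1e-12 and abs(anti[0][1]) < 1e-12)
```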
Next consider the bivector

$$\mathbf{x}\times\mathbf{y} = (x_2y_3 - x_3y_2,\; x_3y_1 - x_1y_3,\; x_1y_2 - x_2y_1). \qquad (4.5.12)$$

If Eq. (4.5.8) is used, the bivector/matrix association is

$$2i(\mathbf{x}\times\mathbf{y}) \;\rightarrow\; XY - YX = [X, Y], \qquad (4.5.13)$$

where the matrix "commutator" [X, Y] ≡ XY − YX has made its first appearance. If x · y = 0, then XY = −YX and

$$i(\mathbf{x}\times\mathbf{y}) \;\rightarrow\; XY. \qquad (4.5.14)$$
Problem 4.5.1: Suppose spinor (ξ0, ξ1) is derived from vector x. Show that the matrices σ1, σ2, and σ3, when acting on (ξ0, ξ1), have the effect of reflecting in the y, z, the x, z, and the x, y planes, respectively; that is, of generating the spinor derived from the corresponding reflection of x.

4.5.4. Associating a Matrix with a Trivector (Triple Product)

Consider a trivector corresponding to three orthogonal vectors x, y, and z. It has six components, one for each permutation of the indices (1, 2, 3), all equal, except for sign (which depends on the evenness or oddness of the permutation), to the same determinant, which is (x × y) · z = u · z, where u = x × y, a vector necessarily parallel to z. The matrices associated to these vectors by Eq. (4.5.8) are to be called X, Y, Z, and U. By Eq. (4.5.10), the matrix i(u · z)1 is equal to iUZ. By Eq. (4.5.14), the matrix iU associated with iu is XY. Hence

$$XYZ = iUZ = i(\mathbf{x}\times\mathbf{y})\cdot\mathbf{z}\;\mathbf{1} = iu\,\mathbf{1}, \qquad (4.5.15)$$

where u is the volume of the trivector. In particular, σ1σ2σ3 = i1.
The following associations have been established:

$$\mathbf{x} \rightarrow X, \quad \mathbf{y} \rightarrow Y, \quad (\mathbf{x}\cdot\mathbf{y})\,\mathbf{1} = \tfrac{1}{2}(XY + YX), \quad 2i(\mathbf{x}\times\mathbf{y}) \rightarrow [X, Y], \quad iu \rightarrow XYZ. \qquad (4.5.16)$$

4.5.5. Representations of Reflections

Reflections in a plane orthogonal to unit vector a have been described previously, in Eq. (4.1.7):

$$\mathbf{x}' = \mathbf{x} - 2(\mathbf{a}\cdot\mathbf{x})\,\mathbf{a}. \qquad (4.5.17)$$
Rearranging this into a matrix equation using Eq. (4.5.10) and A² = 1 yields

$$X' = X - A(XA + AX) = -AXA. \qquad (4.5.18)$$
By Eq. (4.5.14), the matrix associated with the bivector corresponding to orthogonal vectors x and y is proportional to XY, and reflecting this in the plane defined by A yields

$$X'Y' = (-AXA)(-AYA) = AXYA. \qquad (4.5.19)$$
In terms of the matrix U, defined in the previous section as associated with the bivector,

$$U' = AUA. \qquad (4.5.20)$$
Comparing Eqs. (4.5.18) and (4.5.20), one can say that, except for sign, vectors and bivectors transform identically under reflection.
4.5.6. Representations of Rotations

It was demonstrated in Section 4.1.3 that any rotation can be expressed as the product of two reflections. Let the matrices for these reflections be A and B. When subjected to these, the vector-matrix X and the bivector-matrix U of the previous section transform according to

$$X' = BAXAB = (BA)\,X\,(BA)^{-1}, \qquad U' = BAUAB, \qquad (4.5.21)$$
which is to say identically. Note that AB = (BA)^{-1}, since it reverses the two reflections. Defining the matrix S ≡ BA to represent the rotation, the rotations can be written

$$X' = SXS^{-1}, \qquad U' = SUS^{-1}. \qquad (4.5.22)$$
From these one can say that vectors and bivectors transform identically under rotation.
These formulas can be expressed more concretely: Let l be a unit vector along the desired axis of rotation (it is unfortunate that the symbol l, for axis vector, and 1, for unit matrix, are so easily confused), L its associated matrix, and θ the desired angle of rotation. By Eq. (4.5.10) and Eq. (4.5.13), since the angle between unit vectors a and b is θ/2, suppressing the identity matrix for brevity,

$$AB + BA = 2\,\mathbf{a}\cdot\mathbf{b} = 2\cos\frac{\theta}{2}, \qquad AB - BA = 2iL\sin\frac{\theta}{2}. \qquad (4.5.23)$$

Subtracting and adding these yields

$$S = BA = \cos\frac{\theta}{2} - iL\sin\frac{\theta}{2}, \qquad (4.5.24)$$
which with Eqs. (4.5.21) and (4.5.8) yields

$$X' = \left(\cos\frac{\theta}{2} - iL\sin\frac{\theta}{2}\right) X \left(\cos\frac{\theta}{2} + iL\sin\frac{\theta}{2}\right). \qquad (4.5.25)$$

This is a very old formula, derived initially by Hamilton. Stated more compactly,

$$X' = SXS^{-1}, \qquad S = \cos\frac{\theta}{2}\,\mathbf{1} - i\sin\frac{\theta}{2}\,(\mathbf{l}\cdot\boldsymbol{\sigma}). \qquad (4.5.26)$$

A general, real, orthogonal 3 × 3 rotation matrix has nine parameters, of which all but three are redundant. This representation of the same rotation has only one redundancy: the components of l must make it a unit vector. The four elements of S are known as Cayley–Klein parameters. Eqs. (4.5.26) are somewhat coupled but, by Eq. (4.5.8), the third component x3' is not, and it is not difficult to separate x1' and x2'. When that is done, if x is a reference location of a point, then the new location x' is expressed in terms of the Cayley–Klein parameters. Also, formula Eq. (4.5.26) lends itself naturally to the "concatenation" of successive rotations. Note that the rotated vector x' can also be obtained from initial vector x and the axis of rotation vector l using normal vector analysis (see Fig. 4.5.1):

$$\mathbf{x}' = (\mathbf{l}\cdot\mathbf{x})\,\mathbf{l} + \cos\theta\,\big((\mathbf{l}\times\mathbf{x})\times\mathbf{l}\big) + \sin\theta\,(\mathbf{l}\times\mathbf{x}). \qquad (4.5.27)$$
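Eqs. (4.5.24)-(4.5.27) can be compared numerically: conjugating X by S = cos(θ/2) − iL sin(θ/2) should reproduce the vector-analysis rotation formula. The axis, angle, and vector below are hypothetical sample values.

```python
import math

# Rotate x about unit axis l by theta two ways: spinor conjugation
# X' = S X S^-1 (Eqs. (4.5.24)-(4.5.25)) versus Eq. (4.5.27).

l = (0.6, 0.0, 0.8)                  # hypothetical unit axis
x = (1.0, -2.0, 0.5)                 # hypothetical vector
th = 0.7                             # hypothetical angle

def mat(v):                          # v . sigma, Eq. (4.5.8)
    return [[v[2], v[0] - 1j*v[1]], [v[0] + 1j*v[1], -v[2]]]

def mul(a, b):
    return [[sum(a[i][k]*b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

c, s = math.cos(th/2), math.sin(th/2)
L = mat(l)
S    = [[c - 1j*s*L[0][0], -1j*s*L[0][1]], [-1j*s*L[1][0], c - 1j*s*L[1][1]]]
Sinv = [[c + 1j*s*L[0][0],  1j*s*L[0][1]], [ 1j*s*L[1][0], c + 1j*s*L[1][1]]]

Xp = mul(S, mul(mat(x), Sinv))       # X' = S X S^-1
x_spin = ((Xp[0][1] + Xp[1][0]).real / 2,          # x1' from off-diagonals
          ((Xp[1][0] - Xp[0][1]) / 2j).real,       # x2'
          Xp[0][0].real)                           # x3'

def dot(a, b): return sum(p*q for p, q in zip(a, b))
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

lx = cross(l, x)
lxl = cross(lx, l)
x_vec = tuple(dot(l, x)*l[i] + math.cos(th)*lxl[i] + math.sin(th)*lx[i]
              for i in range(3))     # Eq. (4.5.27)

print(max(abs(a - b) for a, b in zip(x_spin, x_vec)) < 1e-9)
```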
4.5.7. Operations on Spinors
The defining use of matrix A associated with vector a is to transform spinor ξ into ξ', its "reflection" in the plane defined by a:

$$\xi' = A\xi. \qquad (4.5.28)$$
FIGURE 4.5.1. Vector diagram illustrating Eq. (4.5.27) and giving the result of rotating vector x by angle θ around axis l. Except for the factor |l × x|, which is the magnitude of the component of x orthogonal to l, the vectors l × x and (l × x) × l serve as orthonormal basis vectors in the plane orthogonal to l.
It is necessary to show that this definition is consistent with our understanding of the geometry, including the association with the reflected isotropic vector associated with ξ. For special cases, this was demonstrated in Problem 4.5.1. Also the spinor rotation is given by

$$\xi' = BA\,\xi = S\xi. \qquad (4.5.29)$$
4.5.8. Real Euclidean Space
All of the results obtained so far apply to real or complex vectors, either as the components of points in space or as the vectors associated with reflections or rotations. Now we restrict ourselves to real rotations and reflections in Euclidean (ordinary) geometry. It has been seen previously that a real vector x is associated with a Hermitean reflection matrix X: X* = X^T. The matrix U associated with a real bivector satisfies U* = −U^T. Since a rotation is the product of two reflections, S = BA, it follows that

$$S^{*T}S = A^{*T}B^{*T}BA = ABBA = \mathbf{1}; \qquad (4.5.30)$$

this is the condition for S to be unitary. Hence a 2 × 2 spinor-rotation matrix is unitary. This is the basis of the designation SU(2) for the 2 × 2 representation of spatial rotations. Since a spinor is necessarily associated with an isotropic vector, and there is no such thing as a real isotropic vector, it is not possible to associate a spinor with a real vector. It is, however, possible to associate "tensor products" of spinors with real vectors. The mathematics required is equivalent to the "addition of angular momenta" mathematics of quantum mechanics.
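The unitarity claim can be verified on sample data: building A and B from (hypothetical) unit vectors and forming S = BA should give a matrix satisfying S†S = 1.

```python
import math

# S = BA for Hermitean reflection matrices A, B built from unit vectors;
# check that S is unitary.  The unit vectors are hypothetical samples.

def unit(v):
    n = math.sqrt(sum(c*c for c in v))
    return tuple(c/n for c in v)

def mat(v):                          # v . sigma, Eq. (4.5.8)
    return [[v[2], v[0] - 1j*v[1]], [v[0] + 1j*v[1], -v[2]]]

def mul(a, b):
    return [[sum(a[i][k]*b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

A = mat(unit((1.0, 2.0, -0.5)))
B = mat(unit((0.3, -1.0, 0.8)))
S = mul(B, A)
P = mul(dagger(S), S)                # should be the identity matrix
err = max(abs(P[i][j] - (1 if i == j else 0))
          for i in range(2) for j in range(2))
print(err < 1e-12)
```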
4.5.9. Real Pseudo-Euclidean Space

In special relativity, taking axis 2 as the time axis to simplify the use of preceding formulas, the position of a (necessarily massless) particle traveling at the speed of light can satisfy

$$x_1^2 + x_3^2 = c^2t^2, \qquad x_2 = ct, \qquad x_1^2 - x_2^2 + x_3^2 = 0. \qquad (4.5.31)$$
Replacing ix2 by x2 in the preceding formalism, and now requiring (x1, x2, x3) to be real, the associated matrix

$$X = \begin{pmatrix} x_3 & x_1 - x_2 \\ x_1 + x_2 & -x_3 \end{pmatrix} \qquad (4.5.32)$$

is real, and there is a spinor (ξ0, ξ1), also real, associated with x:

$$\xi_0 = \pm\sqrt{\frac{x_1 - x_2}{2}}, \qquad \xi_1 = \pm\sqrt{\frac{-x_1 - x_2}{2}}. \qquad (4.5.33)$$
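The real light-cone algebra can be checked on sample values. With real (ξ0, ξ1) and the sign convention obtained by literally replacing ix2 with x2 in Eq. (4.5.2) (other sign conventions are possible), the resulting vector satisfies x1² − x2² + x3² = 0, and the matrix of Eq. (4.5.32) is real and singular.

```python
# Hypothetical real spinor components; sign convention from the literal
# substitution ix2 -> x2 in Eq. (4.5.2).
xi0, xi1 = 0.9, -0.4

x1 = xi0**2 - xi1**2
x2 = -(xi0**2 + xi1**2)
x3 = -2 * xi0 * xi1

# Pseudo-Euclidean isotropy, Eq. (4.5.31)
print(abs(x1**2 - x2**2 + x3**2) < 1e-12)

# The matrix of Eq. (4.5.32) has determinant -(x1^2 - x2^2 + x3^2),
# hence it is singular exactly for light-like vectors.
X = [[x3, x1 - x2], [x1 + x2, -x3]]
det = X[0][0]*X[1][1] - X[0][1]*X[1][0]
print(abs(det) < 1e-12)
```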
BIBLIOGRAPHY

References

1. É. Cartan, The Theory of Spinors, Dover, New York, 1981, p. 10.
2. V. I. Arnold, Mathematical Methods of Classical Mechanics, 2nd ed., Springer-Verlag, New York, 1989, Ch. 7.
LAGRANGIAN MECHANICS
The major formulations of mechanics are known as the Newtonian, Lagrangian, and Hamiltonian. From the point of view of the physicist this division is rather artificial, since most predictions of one approach have identical predictions in each of the others. The segregation is based entirely on the mathematical methods used. In both the Lagrangian and the Hamiltonian descriptions, artificial functions (Lagrangian and Hamiltonian) are introduced and variational principles are prominent. Central to Lagrangian mechanics is the need to describe the system by n "generalized coordinates." Variational methods in the "configuration space" of these coordinates are used to derive the n second-order, ordinary equations of motion, determining the system evolution. Motion-describing kinetic quantities (velocities) appear only as time derivatives of the generalized coordinates. Though most powerful when idealizations are made that are frictionless, lossless, etc., the Lagrangian method also permits the introduction of phenomenological forces. The Poincaré equation further generalizes the Lagrange procedure when it is impossible to define generalized coordinates. The division into Newtonian, Lagrangian, and Hamiltonian descriptions has always been regarded as natural in mechanics, and we follow that tradition. In fact, the division is even more natural in a treatment (like this one) that emphasizes the geometric formulation of the subject. The space of generalized coordinates, though describing the same configurations as the Euclidean space with which we are familiar, requires a more complicated geometrical treatment. Furthermore, the geometry of configuration space and of phase space are different. The most basic aspect of this difference is that the momentum variables basic to phase space require different geometric representation than do the position vectors.
We start with Lagrangian descriptions. If only elementary methods were being described, the Newtonian description would come first, but that is not the case here because a more sophisticated "gauge-invariant" treatment employs different geometric methods than are needed for the Lagrange–Poincaré description. To see the dependencies amongst the chapters in this text, refer to the flowchart near the end of the introduction.
LAGRANGE-POINCARÉ DESCRIPTION OF MECHANICS

5.1. REVIEW OF THE LAGRANGE EQUATIONS

One of our stated aims is to show that the Poincaré equations are "better than" the Lagrange equations (while preserving the ways the Lagrange equations are "better than" Newton's equations). We start by recalling the Lagrange equations and review the benefits that justify the level of abstraction needed to use them. The actual use of the equations can be mechanized, making it tolerable, even natural, to forget whatever complexity attended their derivation. This will also largely be true for the Poincaré equations. In particular, in cases where the Lagrange equations are valid, the Poincaré equations trivially reduce to the Lagrange equations. It is when the Lagrange equations are not applicable, and/or when the system exhibits symmetry, that the power of the Poincaré method emerges. Consider an N-particle mechanical system. It is convenient to use Cartesian coordinates (x(i), y(i), z(i)) to locate each of the N point particles of the system. In the presence of constraints, however, it is often both necessary and possible to introduce "generalized coordinates" q = (q^1, q^2, ..., q^n), n ≤ 3N, that automatically respect the constraints. They are related to the Cartesian coordinates by functions f(i):
$$\mathbf{x}_{(i)} = \mathbf{f}_{(i)}(\mathbf{q}), \qquad i = 1, \ldots, N. \qquad (5.1.1)$$
In this case, the constraints are said to be "holonomic" or "integrable." It does not complicate the formalism greatly if the functions f(i) are allowed to depend also on time t, but for simplicity we exclude this possibility for now. (One of the merits of the Poincaré formalism is its ability to handle nonintegrable constraints, but to simplify the present discussion we exclude this case also and restrict discussion to the case where the mechanical system is unambiguously described by generalized coordinates.)
The kinetic energy of the system is defined as a sum over all particles in the system:

$$T = \sum_{i=1}^{N} \frac{1}{2}\,m_{(i)}\,\dot{\mathbf{x}}_{(i)}\cdot\dot{\mathbf{x}}_{(i)}. \qquad (5.1.2)$$

Substituting Eqs. (5.1.1) into Eq. (5.1.2), the kinetic energy of the system can be expressed in the form

$$T(\mathbf{q}, \dot{\mathbf{q}}) = \frac{1}{2}\,a_{kl}(\mathbf{q})\,\dot q^k \dot q^l. \qquad (5.1.3)$$
In general, the coefficients a_kl are functions of q as indicated, but not of the velocities q̇. If the system is subject to generalized forces, they are often describable by a potential energy function V(q, t) (this assumption is not essential, however). The Lagrange equations are

$$\frac{d}{dt}\frac{\partial T}{\partial \dot q^i} - \frac{\partial T}{\partial q^i} = -\frac{\partial V}{\partial q^i}, \qquad i = 1, \ldots, n. \qquad (5.1.4)$$
It is customary to simplify these equations by defining a "Lagrangian function" L(q, q̇) = T − V, but the resulting simplification is mainly cosmetic. Because it is common for T and V to have different symmetry, and because symmetry will often be essential, we will often leave force terms explicit but refer to the equations as Lagrange equations nevertheless.
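As a concrete illustration of Eqs. (5.1.3)-(5.1.4) (not an example from the text), take a single particle in the plane with polar generalized coordinates and V(r) = ½kr². Writing out the two Lagrange equations by hand and integrating them numerically, the conserved energy and angular momentum serve as a check; the initial conditions are hypothetical.

```python
# Lagrange equations (5.1.4) for polar coordinates (r, th), with
# T = (1/2) m (rdot^2 + r^2 thdot^2) and V = (1/2) k r^2:
#   m rddot - m r thdot^2 = -k r        (r equation)
#   d/dt (m r^2 thdot)    = 0           (th equation)
# Integrate with RK4 and confirm the two conserved quantities.

m, k = 1.0, 4.0

def deriv(state):
    r, th, rd, thd = state
    return (rd, thd, r*thd**2 - (k/m)*r, -2.0*rd*thd/r)

def rk4(state, h):
    k1 = deriv(state)
    k2 = deriv(tuple(s + 0.5*h*d for s, d in zip(state, k1)))
    k3 = deriv(tuple(s + 0.5*h*d for s, d in zip(state, k2)))
    k4 = deriv(tuple(s + h*d for s, d in zip(state, k3)))
    return tuple(s + (h/6.0)*(a + 2*b + 2*c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def energy(s):
    r, th, rd, thd = s
    return 0.5*m*(rd**2 + r**2*thd**2) + 0.5*k*r**2

def ang_mom(s):
    r, th, rd, thd = s
    return m*r**2*thd

state = (1.0, 0.0, 0.2, 1.5)         # hypothetical initial conditions
E0, L0 = energy(state), ang_mom(state)
for _ in range(2000):
    state = rk4(state, 0.005)
print(abs(energy(state) - E0) < 1e-6 and abs(ang_mom(state) - L0) < 1e-6)
```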
5.2. THE POINCARÉ EQUATION
We are now ready to apply our geometric ideas to mechanics proper. The plan is to introduce the "Poincaré equation" as an "improvement" over the Lagrange equation. This equation will be derived two different ways, first using traditional elementary calculus and then using modern geometric methods. This is not really an extravagance since it is important to correlate old and new methods. Furthermore, this discussion can serve as a review of Lagrangian mechanics, since more than half of the derivation amounts to studying the properties of the Lagrange equations. There are two rather different ways of introducing the Lagrange equations themselves. Both methods start by assuming that the configurations of the mechanical system under study are describable uniquely by "generalized coordinates" q^i. From there it is quickest to postulate Hamilton's principle of least action and then apply the calculus of variations. Though this procedure seems like "black magic" when first encountered, it has become so well established that it is now considered reasonably straightforward and elementary. This variational method has the further advantage of immediately exhibiting a remarkable "invariance" to changes of coordinates. The other method amounts to applying "brute force" to the equations given by Newton's second law to transform them into Lagrange equations using nothing but calculus.
This method is also not very well motivated since it is not clear a priori what one is looking for. Furthermore, once one has derived the Lagrange equations, one has to derive their remarkable invariance property. We will take the second approach, taking advantage of the fact that the Lagrange equations were already derived this way in Chapter 3, and proceed to study their properties. Before proceeding, we preview the ways in which it is necessary to be more ambitious in order to obtain the Poincaré equations by the same methods that yield the Lagrange equations. Two considerations are important:

• Fundamental to the Lagrangian formalism are its generalized coordinates q^i and their corresponding velocities q̇^i.¹ However, there are convenient velocities, of which angular velocities are the most familiar example, that cannot consistently be expressed as the time derivatives of generalized coordinates. Something about the noncommutativity of rotations causes this (as we shall see).

• In describing constrained motion, it is always difficult, and usually impossible, to express the constraints analytically without the use of velocity coordinates; such constraints normally are not "integrable." The coordinates are said to be nonholonomic, and the Lagrange procedure is not applicable.
One tends to be not much concerned about the latter restriction, but that is more because physics courses skip over the problem than because nonholonomic systems are rare in nature. Trains, which are holonomic (at least if they are on cog railways), are far less prevalent than automobiles, which are not. Because it permits "quasi-coordinates," the Poincaré equation can describe nonholonomic systems. Later we will see how the whole line of reasoning can be greatly streamlined by a "geometric" treatment. It is assumed the reader has already mastered the Lagrange equations, especially concerning the definition of generalized coordinates and generalized forces and the application of d'Alembert's principle to introduce force terms into the equations. We assign ourselves the task of changing variables in the Lagrange equations. This will illustrate some of the essential complications that Lagrange finessed when he invented his equations. To follow these calculations it is necessary to "understand" the tangent space of possible instantaneous velocity vectors of the system or (more likely) to accept without protest some steps in the calculus that may seem a bit shady. A remarkable feature of the Lagrange equations that has already been pointed out is that they maintain the same form when the generalized coordinates q^i are transformed. Since the Lagrangian depends also on velocity components q̇^i, one is tempted to consider changes of the velocity variables as well, especially in cases where the kinetic energy can be expressed more simply in terms of "new" velocities, call them s^i, rather than in terms of the q̇^i. In simple cases, the velocity transformation can be "integrated" and is equivalent to a pure coordinate transformation, for

¹In this text, the quantity q̇^i = dq^i/dt is always called "the velocity corresponding to q^i" even though in some cases this causes the physical dimensions of different velocity components to be different.
example, consider the change of rectangular coordinates describing transformation to a uniformly translating frame of reference, but in general this is impossible. The most familiar example of the sort of complication being discussed is rotational motion of an extended object: Let s^1, s^2, and s^3 be three instantaneous angular velocities² around respective axes of a rectangular frame. If two of the angular velocities vanish, then the motion can be "integrated" to yield the macroscopic value of the remaining angle and hence specify the orientation. One can attempt to define three global angles, one at a time, in this way. But if rotation occurs around more than one axis, since the order of application of rotations affects the final orientation, these angles would not satisfy the requirement that there be a one-to-one correspondence between generalized coordinates and system configurations.³ Our purpose in the rest of this section, therefore, is to learn how to change the velocity variables in the Lagrange equations to new variables in which these complications can be handled. Consider then a mechanical system described by generalized coordinates q^1, q^2, ..., q^n. To describe the system, introduce new "quasi-velocities" s^1, s^2, ..., s^n, some or all of which differ from q̇^1, q̇^2, ..., q̇^n. By definition they are invertible superpositions of the generalized velocities, related to the original velocities by linear relations of the form⁴

$$s^r = A^r{}_i(q)\,\dot q^i, \qquad r = 1, 2, \ldots, n. \qquad (5.2.1)$$

Typically the coefficients in Eq. (5.2.1) are functions of the coordinates, but they must not depend on velocities. If the number of quasi-velocities is small, it may be convenient to give them individual symbols such as (s^1, s^2, ...) = (s, g, l, ...). In this case the transformation looks like

$$\begin{pmatrix} s \\ g \\ l \\ \vdots \end{pmatrix} = \begin{pmatrix} \Sigma_1 & \Sigma_2 & \Sigma_3 & \cdots \\ \Gamma_1 & \Gamma_2 & \Gamma_3 & \cdots \\ \Lambda_1 & \Lambda_2 & \Lambda_3 & \cdots \\ \vdots & & & \end{pmatrix} \begin{pmatrix} \dot q^1 \\ \dot q^2 \\ \dot q^3 \\ \vdots \end{pmatrix}. \qquad (5.2.2)$$

²There are too few letters in the English alphabet. It is conventional to give the coordinates of a mechanical system symbols that are Roman characters such as r, x, y, etc., and, similarly, to use such characters as v for velocities. To emphasize their ephemeral character, we use Greek symbols σ, γ, λ, ..., to stand for the "quasi-coordinates" that are about to be introduced. Corresponding to these will be "quasi-velocities" and, to emphasize their real existence while preserving their ancestry, they will be symbolized by matching Roman letters s, g, l, .... A further (temporary and self-imposed) "requirement" on these characters is that there be uppercase Greek characters Σ, Γ, Λ, ..., available that "match" them (to serve as a mnemonic aid later on). The quantities s^1, s^2, s^3 being introduced here are "quasi-velocities," not momenta. Probably because the most common quasi-velocities are angular velocities, the symbol ω is commonly used in this context, but that symbol is already overworked, for example also as a general form. In any case, once the general arguments have been made, a less restrictive notational scheme will have to be acceptable.

³By carefully specifying their order of application, the so-called "Euler angles" circumvent this problem to some extent.

⁴The slight displacement to the right of the lower index is to facilitate mental matrix multiplication but otherwise has no significance, and the up/down location of indices will also have no significance for the time being. Also, the order of the factors A^r_i and q̇^i could be reversed without changing anything, except that it would no longer be a conventional representation of an equation using matrix multiplication.
This form has the advantage of mnemonically emphasizing the close connection between any particular quasi-velocity, say g, with its corresponding row (Γ1, Γ2, Γ3, Γ4).
As an example, for a single particle with coordinates x, y, and z, one could try the definition s = x × ẋ, which can be written

$$\begin{pmatrix} s^x \\ s^y \\ s^z \end{pmatrix} \overset{\text{no good,}}{\underset{\text{no inverse}}{=}} \begin{pmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{pmatrix} \begin{pmatrix} \dot x \\ \dot y \\ \dot z \end{pmatrix}. \qquad (5.2.3)$$
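That the matrix in Eq. (5.2.3) is singular everywhere, not merely at isolated points, is easy to confirm at (hypothetical) random sample points.

```python
import random

# The matrix relating (s^x, s^y, s^z) to (xdot, ydot, zdot) in Eq. (5.2.3)
# is 3x3 and antisymmetric, so its determinant vanishes identically and
# the proposed quasi-velocities cannot be inverted anywhere.
random.seed(1)

def det3(a):
    return (a[0][0]*(a[1][1]*a[2][2] - a[1][2]*a[2][1])
          - a[0][1]*(a[1][0]*a[2][2] - a[1][2]*a[2][0])
          + a[0][2]*(a[1][0]*a[2][1] - a[1][1]*a[2][0]))

ok = True
for _ in range(100):
    x, y, z = (random.uniform(-5, 5) for _ in range(3))
    A = [[0, -z, y], [z, 0, -x], [-y, x, 0]]
    ok = ok and abs(det3(A)) < 1e-9
print(ok)
```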
This choice, though linear in the velocities as required by (5.2.1), is illegal because the determinant vanishes identically, meaning the relation cannot be inverted. Note that the vanishing does not occur at a single point in configuration space (which would be tolerable), but rather is identically true for all (x, y, z) (see Problem 5.2.2 and Problem 5.2.3). This failure is unfortunate since the transformation (5.2.3) seems to be otherwise promising. Its purpose would have been to write the equations of motion in terms of the angular momentum variables (s^x, s^y, s^z) rather than the linear velocities (ẋ, ẏ, ż). The characteristics of this transformation can be discussed at greater length to illustrate the concept of foliation. To have a similarly motivated sample to visualize, we can try instead (5.2.4), which is invertible. For s^r defined as in Eq. (5.2.1) it can happen that coordinates σ^r are findable such that⁵
$$\frac{d\sigma^r}{dt} \mathrel{≗} s^r, \qquad r = 1, 2, \ldots, n, \qquad (5.2.5)$$
but this is the exception rather than the rule. (The σ^r would have to be found by "integrating" Eqs. (5.2.1), which may not be possible, even in principle.) Nevertheless, for the time being, we will pretend that "quasi-coordinates" exist, promising to later undo any damage that this incurs. In any case, Eq. (5.2.1) can be written in differential form
$$d\sigma^r = A^r{}_i(q)\,dq^i, \qquad r = 1, 2, \ldots, n. \qquad (5.2.6)$$
(5.2.6)
That this is a differential of the “old-fashioned” calculus variety is indicated by the absence of overhead tildes. The concept of “tangent space”-a linear vector space containing sums of and scalar multiples of tangent vectors-is central to the present derivation. The tangent space is a mathematical device for making it legitimate to regard a quantity like dx/dt as not just a formal symbol but as a ratio of two quantities dx and dt that 5Recall that the symbol
means “qualified” equality.
are not even necessarily "small." A physicist is satisfied with the concept of "instantaneous velocity" and does not insist on distinguishing it from an approximation to it that is obtained by taking dt small enough that the ratio dx/dt is a good approximation. In this text, the tangent space will be said to constitute a "linearized" approximation for evaluating the effects of small deviations from any particular configuration. The following partial derivatives can be derived from Eq. (5.2.6):
$$\frac{\partial \sigma^r}{\partial q^i} = A^r{}_i(q), \qquad \frac{\partial q^i}{\partial \sigma^r} = (A^{-1})^i{}_r(q); \qquad (5.2.7)$$

these are the "Jacobian matrices" for the coordinate transformations of Eq. (5.2.6). The invertibility requirement can be stated as a nonvanishing requirement on the determinant of the matrix of coefficients. We assume then that Eq. (5.2.6) can be inverted:

$$\dot q^k = (A^{-1})^k{}_r\,s^r. \qquad (5.2.8)$$
It must be remembered that the matrices A and A^{-1} depend on q. In mechanics, the most important differential is dW = F·dx, the work done by force F acting through displacement dx. For a system described by generalized coordinates q^i, this generalizes to

$$dW = Q_i\,dq^i, \qquad (5.2.9)$$
where the Q_i are said to be "generalized forces." The discussion of contravariant and covariant vectors in earlier chapters suggests strongly that the Q_i may be expected to be covariant, and the notation of (5.2.9) anticipates that this will turn out to be the case.⁶ When expressed in terms of the quasi-coordinates, dW is therefore given by

$$dW = S_i\,d\sigma^i, \qquad \text{where } S_i = Q_k\,(A^{-1})^k{}_i, \qquad (5.2.10)$$
where the generalized forces S_i have been obtained from the Q_i in the same ways that generalized forces are always obtained in mechanics. Geometrically it amounts to counting contours (represented by covariant vector Q_i or S_j) that are crossed by the arrow represented by the dq^i or the dσ^i. At this point, the "shady" steps mentioned above have to be faced. Regarding the velocities s^r(q, q̇) as depending on q and independently and linearly on q̇, one has

$$\frac{\partial s^r}{\partial \dot q^k} = A^r{}_k(q); \qquad (5.2.11)$$

the assumed linearity in the q̇^i has made this step simple. (Recall that the meaning of partial differentiation is only unambiguous if the precise functional dependence

⁶The summation in Eq. (5.2.9) may be more complicated than it appears. It may include sums over particles or independent systems. Such sums should not be implicitly included in the summation convention but, for brevity, we let it pass.
is specified. In this case, the implication of ∂s^r/∂q̇^k is that all the q^i and all the q̇^i except q̇^k are being held constant.) Many people have difficulty seeing why it makes sense for q^i to vary and q̇^i to not vary. However, Lagrange thought it made sense, and everyone since then has either come to their own terms with it or taken their teacher's word for it. As mentioned before, the accepted mathematical procedure for legitimizing this procedure is to introduce "tangent planes" at every location. Displacement in any tangent plane is independent of both displacements in the original space and displacements in any other tangent space. Once this concept has been accepted, it also follows that

$$\frac{\partial s^r}{\partial q^k} = \frac{\partial A^r{}_i}{\partial q^k}\,\dot q^i. \qquad (5.2.12)$$

This maneuver will be said to be "pure tangent plane algebra." Because s^r stands for dσ^r/dt and q̇^i stands for dq^i/dt, and because dσ^r and dq^i reside in the tangent space, it is legitimate (in spite of one's possible recollections from freshman calculus) simply to divide out the dt. With the first of Eqs. (5.2.7), this yields Eq. (5.2.12). In deriving the Lagrange equations from Newton's equations, the only tricky part is more or less equivalent to deriving Eq. (5.2.12). Because the Lagrange equations have already been derived (from purely geometric considerations) in Section 3.1.5, we skip this derivation and directly express the Lagrange equations as the equality of two ways of evaluating the work during arbitrary displacement δq^k:

$$\left(\frac{d}{dt}\frac{\partial T}{\partial \dot q^k} - \frac{\partial T}{\partial q^k}\right)\delta q^k = Q_k\,\delta q^k. \qquad (5.2.13)$$

(The only significance to the fact that δq^k is used instead of, say, dq^k, is that later the δq^k will be specialized for our own convenience.) We now wish to transform these equations into the new quasi-coordinates. Because the Lagrange equations are based on the expression for the kinetic energy of the system, the first thing to do is reexpress T. The function expressing kinetic energy in terms of the new coordinates will be called T̄:

$$T(q, \dot q, t) = \overline T(q, s, t). \qquad (5.2.14)$$
If this were a true coordinate transformation, then it would be possible for the first argument of T̄ to be σ. But since the very existence of coordinates σ is in question, a hybrid functional dependence on new velocities and old coordinates is all we can count on. What will make this ultimately tolerable is that only derivatives of T̄ will survive to the final formula. The terms in Eq. (5.2.13) can be worked on one at a time, using Eqs. (5.2.7) and (5.2.11), to obtain

$$\frac{\partial T}{\partial \dot q^k} = \frac{\partial \overline T}{\partial s^r}\,\frac{\partial s^r}{\partial \dot q^k} = \frac{\partial \overline T}{\partial s^r}\,A^r{}_k, \qquad \frac{\partial T}{\partial q^k} = \frac{\partial \overline T}{\partial q^k} + \frac{\partial \overline T}{\partial s^r}\,\frac{\partial A^r{}_i}{\partial q^k}\,\dot q^i. \qquad (5.2.15)$$
The strategy so far, in addition to trying to eliminate the q and q̇ variables, has been to replace σ̇^i by s^i wherever possible. The q̇^i factors remaining can be eliminated using Eq. (5.2.8). Collecting terms, the left-hand side of Eq. (5.2.13) contains the following three terms:
(5.2.16)
To abbreviate the writing of the second term, the following coefficients, destined to play a large role as "structure constants" in the sequel, have been introduced: (5.2.17)
Because the differentials δσ^r are arbitrary, they can be replaced by Kronecker δ's. As a result, the Lagrange equations have been transformed into (5.2.18)
These are known as the Poincaré equations. Unlike the n Lagrange equations, which are second order in time, these n equations are first order in time, and the independent variables are now velocities. The defining equations (5.2.1) provide n more equations, making 2n in all. In this regard, the Poincaré procedure resembles the transition from the Lagrange equations to Hamilton's equations but, in spite of this, I believe it is more appropriate to regard the Poincaré equations as only a modest generalization of the Lagrange equations. No momentum variables have been defined and no phase space has been introduced. Apart from the fact that these are complicated equations, an essential complication is that T̄ and the coefficients c^r_ij are a priori known explicitly only as functions of the original q-variables. If the equations were being solved numerically then, at each time step, once the s-variables had been updated, the corresponding q's would have to be calculated, and from them the c^r_ij coefficients. To avoid this complication, it would be desirable to have the c^r_ij coefficients expressed in terms of the σ variables. But this may be impossible, which brings us back to the issue that has been put off so far: what to do if the σ-variables do not exist.
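The numerical update cycle just described (advance the s-variables, recover the q's, then recompute the coefficients from q) can be sketched in Python. Everything below is illustrative rather than from the text: the trivial quasi-velocity choice s = q̇ is used, so that A = 1, every c-coefficient vanishes, and the test dynamics is a unit harmonic oscillator.

```python
import math

# Illustrative update cycle for the Poincare equations: advance s, then
# recover q, then (in general) recompute the c-coefficients from q.
def c_coeffs(q):
    # Would be re-evaluated from the updated q at every step;
    # identically zero for the trivial choice s = qdot.
    return 0.0

def s_dot(q, s):
    # Right-hand side of the Poincare equation for the oscillator;
    # the c-term is kept only to show where it would enter.
    return -q + c_coeffs(q) * s * s

def q_dot(q, s):
    # Defining relation qdot = Lambda(q) s; here Lambda = 1.
    return s

q, s, dt = 1.0, 0.0, 1e-4
for _ in range(10_000):          # explicit Euler steps to t = 1
    q, s = q + dt * q_dot(q, s), s + dt * s_dot(q, s)

# For the oscillator q(t) = cos t, so q(1) should be near cos(1).
assert abs(q - math.cos(1.0)) < 1e-3
```

In a genuine quasi-velocity problem, `c_coeffs` and `q_dot` would depend nontrivially on the updated q, which is exactly the complication the text is pointing at.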
THE POINCARÉ EQUATION
Though the quasi-coordinates σ^i do not necessarily exist, the quasi-velocities certainly do; they are given by Eq. (5.2.1). Our task then is to evaluate terms in the Poincaré equation that appear to depend on the σ^i in terms of only the s^i. Actually, we have already done this once without mentioning it, in Eq. (5.2.10), when we calculated the quasi-forces S_i. Because this calculation was "local," it depended only on differentials dσ^i, which, as we have said before, are strictly proportional to the s^i in linearized approximation. That was enough to relate the S_i's to the Q_i's. Essentially equivalent reasoning allows us to calculate the derivatives ∂T̄/∂σ^r that appear in the Poincaré equation: (5.2.19) Earlier, we minimized the importance of obtaining the generalized forces from a potential energy function U. But if this is done, and the Lagrangian L = T − U is introduced to take advantage of the simplification, then, to complete the transformation to quasi-coordinates, we have to introduce a new Lagrangian appropriate to the new coordinates; the formula is

L̄(q, s, t) = T̄(q, s, t) − U(q),   (5.2.20)

where T̄(q, s, t) is given by Eq. (5.2.14). The "force" terms of the Poincaré equations then follow as in Eq. (5.2.19). Cartan's witticism that tensor formulas exhibit a "débauche d'indices" is supported by formula (5.2.17) for c^r_ij. We can make this formula appear less formidable in preparation for its practical evaluation using matrices. First, having introduced the notation
(which declines to distinguish between lower and upper indices), by performing Problem 5.2.1 you can show that c^r_ij is antisymmetric in its lower indices. Furthermore, note that the upper index r is "free" on both sides of the equation. To illustrate this last point, suppose that, as in Eqs. (5.2.2), quasi-velocities are symbolized by s^1 = s, s^2 = g, . . . . Then the definitions A^1_i ≡ Σ_i and A^2_i ≡ Γ_i correlate the "rows" of "matrix" A with the "output variables" s and g in a mnemonically useful way (because the uppercase Greek (Σ and Γ) and lowercase Roman (s and g) symbols form natural pairs); an index has been suppressed in the bargain. With this notation, the defining equations (5.2.17) become
The definitions have been manipulated in this way to allow them to be treated by matrix multiplication; the order of factors has been changed and one matrix has been transposed in order to switch the order of indices. Also, the superscripts (s) and (g) are no longer running indices; they identify the particular quasi-velocities previously known as s^1 and s^2.⁷
Example 5.2.1: A simple pendulum, bob mass m = 1, hanging from the origin at the end of a light rod of length ℓ = 1, swings in the x, z plane in the presence of gravity g = 1. Let q^1 = θ define the pendulum angle relative to the z-axis (which is vertical, positive up). Define the quasi-velocity s = cos θ θ̇. Write the Poincaré equation for s. Is there a coordinate σ for which s = dσ/dt in this case? Will this result always be true in the n = 1 case?

The equation dσ/dt = cos θ dθ/dt, by which σ is to be found, transforms immediately into dσ/dθ = cos θ, which, neglecting a constant of integration, yields σ = sin θ. Since σ exists, all the coefficients (5.2.17) vanish. (In the n = 1 case this will clearly always be true.) The potential energy function can be chosen as U(θ) = cos θ; here a possible constant contribution to U has been dropped since it would not contribute to the Poincaré equation anyway. The kinetic energy is T(θ̇) = θ̇²/2, and transformation to quasi-velocities yields

T̄(θ, s) = s²/(2 cos² θ).
The derivatives needed are

∂L̄/∂s = s/cos² θ,   ∂L̄/∂σ = (1/cos θ) ∂L̄/∂θ = (s² sin θ)/cos⁴ θ + tan θ.

These are already complicated enough to make it clear that the transformation was ill-advised but, since this is just an example, we persevere. The Poincaré equation (along with the quasi-velocity-defining equation) is

ṡ = −(s² sin θ)/cos² θ + sin θ cos θ,   θ̇ = s/cos θ.

It is not hard to show that these are equivalent to θ̈ = sin θ, which is the well-known pendulum equation θ̈ = −sin θ once the angle is measured from the downward vertical.
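As a numerical cross-check (not in the text), the Poincaré pair as reconstructed here, ṡ = −(s² sin θ)/cos² θ + sin θ cos θ together with θ̇ = s/cos θ, can be integrated and compared with direct integration of θ̈ = sin θ, the equation it should reproduce under the example's conventions (U = cos θ, θ measured from the upward vertical):

```python
import math

def rk4(f, y, dt):
    # One classical Runge-Kutta step for the autonomous system y' = f(y).
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def poincare(y):
    # Unknowns (theta, s) with s = cos(theta) * thetadot.
    th, s = y
    return [s / math.cos(th),
            -s**2 * math.sin(th) / math.cos(th)**2
            + math.sin(th) * math.cos(th)]

def direct(y):
    # Unknowns (theta, thetadot) with thetadot' = sin(theta).
    th, v = y
    return [v, math.sin(th)]

th0, v0, dt = 0.5, 0.0, 1e-3     # initial data keeping cos(theta) > 0
ya = [th0, math.cos(th0) * v0]   # Poincare variables (theta, s)
yb = [th0, v0]                   # direct variables (theta, thetadot)
for _ in range(1000):            # integrate to t = 1
    ya = rk4(poincare, ya, dt)
    yb = rk4(direct, yb, dt)

assert abs(ya[0] - yb[0]) < 1e-6   # both forms give the same theta(t)
```

The agreement confirms that the quasi-velocity formulation, however ill-advised aesthetically, carries the same dynamics.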
Example 5.2.2: Suppose the pendulum in the previous example is a "spherical pendulum," free to swing out of the plane assumed so far. Letting φ be an azimuthal angle around the z-axis, define quasi-velocities by
(s^x, s^y, s^z) = (−sin φ θ̇, cos φ θ̇, φ̇).
⁷Though they have lower indices, the elements Σ_i or Γ_i are not automatically the covariant components of a tensor; recall that they are only the row elements of an arbitrary, position-dependent matrix. However, in their role of relating two tangent plane coordinate systems, they will be subject to important invariance considerations later on.
As mentioned before, only two of these (any two) are independent. Let us choose s^x and s^z. The matrix A and its inverse are then

A = [[−sin φ, 0], [0, 1]],   A⁻¹ = [[−1/sin φ, 0], [0, 1]].

To correlate with the numbering system used above, let θ → 1, φ → 2. For r = 1, the Poincaré coefficients are obtained from the upper row of A:
5.2.1. Some Features of the Poincaré Equations

In general, the factors c^r_ij(q) are functions of position q, and the Poincaré Eqs. (5.2.18) represent no improvement over the original Lagrange equations. However, there is immediate simplification in important special cases that will be considered now. Suppose that generalized coordinates σ^i(q) do in fact exist globally, such that s^i = dσ^i/dt, as in Eq. (5.2.5). Because the order of taking partial derivatives does not matter, differentiating Eq. (5.2.7) yields
(5.2.23)

From Eq. (5.2.16), it then follows that the factors c^r_ij all vanish. In this case, the Poincaré equations become simply the Lagrange equations in the new generalized coordinates σ^i. This means that the analysis up to this point amounts to an explicit exhibition of the form invariance of the Lagrange equations under a coordinate transformation.

Another important simplification occurs when the partial derivatives ∂T̄/∂σ^i vanish. This possibility is closely connected with symmetry. If the coordinate σ^i can be regarded as fixing the orientation or location of the system, and the kinetic energy is independent of orientation or location respectively, then ∂T̄/∂σ^i = 0. If the external forces are similarly independent of σ^i, the generalized force factors S_i also vanish. The best-known case in which both of these simple features occur is the force-free rotation of a rigid body; if s is an angular velocity, then the kinetic energy depends on s but not on the corresponding angle σ (which would specify the spatial orientation of the system), and there is no torque about the axis, so the corresponding generalized force S also vanishes. This example will be pursued shortly. In the traditional Lagrangian vocabulary, the coordinate σ is then said to be "ignorable."

In general, the factors c^r_ij(q), defined in Eq. (5.2.17), depend on position q, but it is the case where these factors are constant, independent of q, that Poincaré had particularly in mind when he first wrote these equations. In this case, as will be shown in the next chapter, the transformations resulting from changing the quasi-coordinates form a "Lie group" for which the c^r_ij are "structure constants" that fix
the commutation of infinitesimal group transformations. Examples illustrating this case are the subject of a series of problems to appear shortly.
*5.2.2. Invariance of the Poincaré Equation*

The newly introduced coefficients c^r_ij have a certain "intrinsic" coordinate independence. To show this they will be expressed in terms of the bilinear covariant introduced in Chapter 2. To accomplish this, start by recalling that the quasi-velocities s, g, . . . , defined by Eq. (5.2.1), can be written in the form
σ[d] = Σ_i dq^i,   γ[d] = Γ_i dq^i,   etc.,   (5.2.24)
where, as in Section 2.3.2, the "argument" d indicates that the coordinate deviation is dq^i (rather than, say, δq^i). The (position-dependent) coefficients are, on the one hand, elements in the row corresponding to s of the matrix A(q) and, on the other hand, coefficients of the form σ[d]. If we introduce a second coordinate deviation δq^i, by Eq. (3.2.7) the bilinear covariant is

dσ[δ] − δσ[d] = (∂Σ_j/∂q^k − ∂Σ_k/∂q^j) δq^k dq^j = ½ (∂Σ_j/∂q^k − ∂Σ_k/∂q^j)(δq^k dq^j − δq^j dq^k).   (5.2.25)
In Section 3.2, this quantity was shown to have an invariant significance, and in Chapter 2 the coefficients of an invariant form linear in the contravariant components of a vector were identified as covariant components. For the same reasons, the coefficients can be regarded as the covariant components of an antisymmetric tensor in the q^i coordinates, or as the coefficients of an antisymmetric differential two-form in the coordinate basis. One also recognizes the coefficients (∂Σ_j/∂q^k − ∂Σ_k/∂q^j) as appearing in Eqs. (5.2.22). Though the elements Σ_k were chosen arbitrarily in the first place, the matrix elements B^k_i appearing in Eq. (5.2.22) are none other than the Jacobian coefficients that relate tensors in the original and quasi-coordinates, and this imposes conditions on the partial derivatives of the Σ_k. According to Eqs. (5.2.1) and (5.2.21), the displacements are related by δq^k = B^k_i δσ^i and dq^j = B^j_l dσ^l, and as a result

dσ[δ] − δσ[d] = c^(s)_jk σ^j[δ] σ^k[d].   (5.2.26)
This shows that Eqs. (5.2.22) amount to being equations transforming the components of an antisymmetric tensor between two coordinate systems, and that the coefficients c^(s)_jk are components of the forms (5.2.25) in the quasi-basis.

*This section can be skipped on first reading. The same result will be obtained more geometrically later on.
We have seen therefore that the extra terms in the Poincaré equation (over and above those in the Lagrange equation) exhibit an invariance to coordinate transformation that is much like the invariance of the Lagrange equation itself. The invariance is more difficult to express, but that should perhaps have been expected, since the class of applicable coordinate transformations has been extended.
5.2.3. Translation into the Language of Forms and Vector Fields

Using notation first introduced in Section 2.2, the defining equations (5.2.1) or (5.2.2) can be expressed as

etc. A differential d̃q^i on the right-hand side is a form that, when operating on a displacement vector, projects out the change in coordinate q^i. Similarly, d̃σ projects out the change in quasi-coordinate σ. Note that at this stage the equations relate local coordinate systems and have no content whatsoever that depends on or describes actual motion of the system. Proceeding, as in Section 2.4.5, by labeling dual-basis vectors as ẽ^i = d̃q^i in the original system, and ẽ^(s) = d̃σ, ẽ^(g) = d̃γ, etc., in the new system, Eq. (5.2.28) is equivalently written as

As defined in Section 2.4, the ẽ^i are natural basis forms in the space dual to the space with basis vectors e_i along the coordinate directions q^i. The ẽ^(s), ẽ^(g), etc., are similar basis forms for the quasi-coordinates. It was shown in that section that the basis vectors themselves are then related by
(e_s  e_g  ⋯) = (∂/∂σ  ∂/∂γ  ⋯) = (∂/∂q^1  ∂/∂q^2  ⋯) Λ,   (5.2.29)

where the matrix has been expressed as Λ. All matrix elements are, of course, implicitly dependent on q. The velocity vector is expressed in the two bases by

v = q̇ = s e_s + g e_g + ⋯ = q̇^1 e_1 + q̇^2 e_2 + ⋯.   (5.2.30)
Repeating Eqs. (5.2.2) with the notation of Eq. (5.2.28), the velocity coordinates in the two frames are related by

s^i = A^i_j q̇^j,   and   q̇^i = Λ^i_j s^j.   (5.2.31)

The transformation formulas for the components of forms have been given in Eq. (2.4.10). As in that section, the matrices that have been introduced here are related by Λ = A⁻¹.
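The relation Λ = A⁻¹ can be checked concretely; the sketch below (an illustration, with the matrix simply read off from the quasi-velocity choice s^x = −sin φ θ̇, s^z = φ̇ of Example 5.2.2) round-trips a velocity through both matrices:

```python
import numpy as np

theta_dot, phi_dot, phi = 0.3, -0.8, 1.2    # arbitrary test values
qdot = np.array([theta_dot, phi_dot])

# A maps (theta_dot, phi_dot) to (s_x, s_z), read off from
# s_x = -sin(phi) * theta_dot and s_z = phi_dot.
A = np.array([[-np.sin(phi), 0.0],
              [0.0,          1.0]])
Lam = np.linalg.inv(A)                      # Lambda = A^{-1}

s = A @ qdot                                # quasi-velocity components
assert np.allclose(Lam @ s, qdot)           # round trip recovers qdot
```

Since A is position dependent (it involves sin φ), Λ must be recomputed at every configuration, which is the practical content of the remark that all matrix elements implicitly depend on q.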
5.2.4. Example: Free Motion of a Rigid Body with One Point Fixed

The orientation of a rigid body with one point fixed can be described by Euler angles (φ, θ, ψ) (see Fig. 5.2.1). The problem of noncommuting rotations mentioned above is overcome in this definition by specifying the order: first by angle φ about the z-axis, next by θ about the new x-axis, then by ψ about the new z-axis. The unprimed axes in Fig. 5.2.1 can be regarded as fixed in space; the triply primed axes are fixed in the rigid body, with origin also at the fixed point. That the origin remains fixed in space can result either from its being held there by a bearing, or from the facts that the body is free, the origin is its center of mass, and the origin of space coordinates is taken at that point. At any instant, the rigid body is rotating around some axis, and the angular velocity vector points along that axis, with length equal to the speed of angular rotation around that axis. This vector can be described by coordinates referred to the fixed-in-space ("laboratory") (x, y, z) axes or to the fixed-in-body (x‴, y‴, z‴) axes. For the
FIGURE 5.2.1. Definition of Euler angles (φ, θ, ψ). Initial coordinate axes are (x, y, z), final axes are (x‴, y‴, z‴), and the order of intermediate frames is given by the number of primes. The initial axes are usually regarded as fixed in space, the final ones as fixed in the rigid body with one point fixed at the origin.
FIGURE 5.2.2. An illustration of Euler angles that is less cluttered than that in Fig. 5.2.1, showing the rotation axes dφ/dt, dθ/dt, dψ/dt for angular rotations with one Euler angle varying and the other two held fixed.
former choice, evaluating the kinetic energy is complicated because the spatial mass distribution varies with time, and with it the moment-of-inertia tensor. Hence we will take the body angular velocities as quasi-velocities, calling them (ω¹, ω², ω³). To calculate them, one can treat the Euler angular velocities φ̇, θ̇, and ψ̇ one by one. Taking advantage of the fact that they are vectors directed along known axes, one can determine their components along the body axes. Finally the components can be superimposed. Fig. 5.2.2 shows the axes that correspond to varying the Euler angles one at a time. The transformation to quasi-velocities is illustrated in the following series of problems.
Problem 5.2.1: Show that the factors c^r_ij(q), defined in Eq. (5.2.17), are antisymmetric in their lower indices. For n = 3, how many independent components c^r_ij are there?
Problem 5.2.2: Define three vector fields (or operators)

R_x = ∂/∂φ_x = y ∂/∂z − z ∂/∂y,
R_y = ∂/∂φ_y = z ∂/∂x − x ∂/∂z,
R_z = ∂/∂φ_z = x ∂/∂y − y ∂/∂x,

where φ_x, φ_y, and φ_z are azimuthal angles around the respective coordinate axes. In spherical coordinates, φ_z is traditionally called simply φ. Show that these operators
satisfy the commutation relations

[R_x, R_y] = −R_z,   [R_y, R_z] = −R_x,   [R_z, R_x] = −R_y.
Using (r, θ, φ) spherical coordinates and Example 3.4.1, derive the relations

R_x = −sin φ ∂/∂θ − cos φ cot θ ∂/∂φ,
R_y = cos φ ∂/∂θ − sin φ cot θ ∂/∂φ,
R_z = ∂/∂φ.
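The stated commutation relations are easy to verify symbolically. The sketch below assumes the explicit Cartesian forms R_x = y ∂/∂z − z ∂/∂y (and cyclic permutations), one sign convention consistent with the relations as printed:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = sp.Function('f')(x, y, z)    # generic test function

# Rotation generators as first-order differential operators.
Rx = lambda g: y * sp.diff(g, z) - z * sp.diff(g, y)
Ry = lambda g: z * sp.diff(g, x) - x * sp.diff(g, z)
Rz = lambda g: x * sp.diff(g, y) - y * sp.diff(g, x)

# Commutator of two operators applied to a function.
comm = lambda A, B, g: A(B(g)) - B(A(g))

# [Rx, Ry] = -Rz and cyclic permutations:
assert sp.simplify(comm(Rx, Ry, F) + Rz(F)) == 0
assert sp.simplify(comm(Ry, Rz, F) + Rx(F)) == 0
assert sp.simplify(comm(Rz, Rx, F) + Ry(F)) == 0
```

Applying the operators to an arbitrary undefined function f(x, y, z) checks the operator identity itself, not just its action on some special function.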
Problem 5.2.3: For a single particle, with the three components of quasi-velocity s defined as linear functions of the Cartesian velocities (ẋ, ẏ, ż) in Eq. (5.2.3), evaluate the elements A^r_i according to Eq. (5.2.1). State why this is unsatisfactory and interpret the failure geometrically. Show how the result of the previous problem implies the same thing.
Problem 5.2.4: The pendulum defined in Example 5.2.1 is now allowed to swing freely out of the x, z plane, making it a "spherical pendulum." Natural coordinates describing the bob location are the polar angle θ relative to the vertical z-axis and the azimuthal angle φ around the z-axis, measured from the x, z plane. Instantaneous angular velocities around the x, y, and z-axes are given by

ω^x = s^x = −sin φ θ̇,   ω^y = s^y = cos φ θ̇,   ω^z = s^z = φ̇.

Choose s^x and s^z as quasi-velocities and write the Poincaré equations for these variables along with θ and φ. (This is just an exercise in organizing the work; there is no real merit to the choice of variables.)
Problem 5.2.5: With Euler angles (φ, θ, ψ) as defined in Fig. 5.2.1 playing the role of generalized coordinates q^i, define quasi-velocities (u¹, u², u³) = (ω¹, ω², ω³) as angular velocities of rotation of a rigid body around the body axes x‴, y‴, z‴. Evaluate (ω¹, ω², ω³) in terms of the "Euler angular velocities" (φ̇, θ̇, ψ̇). Express the transformations in the form ω^r = A^r_i(q) q̇^i, as in Eq. (5.2.1). [Since the angular velocity is a true vector (insofar as rotations and not reflections are at issue), it is valid to start with an angular velocity with only one Euler angle changing, say corresponding to θ̇ ≠ 0, work out its body components, do the same for the other two, and apply superposition.]
Problem 5.2.6: For the velocity transformation of the previous problem, evaluate the coefficients c^r_ij(q) and show that they are independent of (φ, θ, ψ). (Note that q stands for (φ, θ, ψ) in this case.) If highly satisfactory cancellations do not occur in the calculations of c^r_ij, you have made some mistake.
Problem 5.2.7: The kinetic energy of a rigid body, when expressed relative to "body axes" (necessarily orthogonal), is a sum of squares:

T(ω) = ½ I₁ (ω¹)² + ½ I₂ (ω²)² + ½ I₃ (ω³)².

Using this expression for the kinetic energy, write the Poincaré differential equations, Eq. (5.2.18), for the angular velocities (ω¹, ω², ω³).
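For orientation (this is background, not the worked solution): with this T and the rotation-group structure constants, the Poincaré equations are known classically to reduce to Euler's equations, I₁ ω̇¹ = (I₂ − I₃) ω² ω³ and cyclic permutations. The sketch below integrates them for assumed moments of inertia and checks the two classical invariants, the kinetic energy and |L|²:

```python
I1, I2, I3 = 1.0, 2.0, 3.0      # assumed principal moments of inertia

def deriv(w):
    # Euler's equations for free rotation, wdot_i in body axes.
    w1, w2, w3 = w
    return [(I2 - I3) / I1 * w2 * w3,
            (I3 - I1) / I2 * w3 * w1,
            (I1 - I2) / I3 * w1 * w2]

def rk4(w, dt):
    # Classical Runge-Kutta step.
    k1 = deriv(w)
    k2 = deriv([wi + 0.5 * dt * ki for wi, ki in zip(w, k1)])
    k3 = deriv([wi + 0.5 * dt * ki for wi, ki in zip(w, k2)])
    k4 = deriv([wi + dt * ki for wi, ki in zip(w, k3)])
    return [wi + dt / 6 * (a + 2 * b + 2 * c + d)
            for wi, a, b, c, d in zip(w, k1, k2, k3, k4)]

def energy(w):                  # T = (I1 w1^2 + I2 w2^2 + I3 w3^2) / 2
    return 0.5 * (I1 * w[0]**2 + I2 * w[1]**2 + I3 * w[2]**2)

def l_sq(w):                    # |L|^2, also conserved in free rotation
    return (I1 * w[0])**2 + (I2 * w[1])**2 + (I3 * w[2])**2

w = [0.3, 1.0, 0.2]
E0, L0 = energy(w), l_sq(w)
for _ in range(5000):           # integrate to t = 5
    w = rk4(w, 1e-3)

assert abs(energy(w) - E0) < 1e-9
assert abs(l_sq(w) - L0) < 1e-9
```

The conservation of T and |L|² reflects exactly the "ignorable coordinate" discussion of Section 5.2.1: the kinetic energy depends on the quasi-velocities but not on the orientation angles.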
Problem 5.2.8: Specialize the solution of the previous problem to the case of the spherical pendulum.
5.3. VECTOR FIELD DERIVATION OF THE POINCARÉ EQUATION

Since it has become standard for intermediate-level mechanics to be based on Hamilton's principle, it is assumed that the reader is familiar with the calculus of variations. Nevertheless, application of the principles of "least time" and of "least action" will now be reviewed. This is introductory, on the one hand, to a variational derivation of the Poincaré equation (later in this chapter) and, on the other hand, to the introduction into mechanics (in a later chapter) of ideas drawn from physical optics, the main new quantity being the "eikonal" or "optical path length."
5.3.1. The Basic Problem of the Calculus of Variations

A "Lagrangian" function L(q, q̇, t) depends on the generalized coordinates q, the generalized velocities q̇, and time t. Any configuration space curve specified in the form q = q(t), with instantaneous velocity given by q̇ ≡ dq/dt, is a candidate, though not necessarily a physically realized, trajectory of the system. Deferring any discussion of its physical meaning, the "action" S corresponding to such a curve is defined by

S(P₁, P₂) = ∫_{t₁}^{t₂} L(q, q̇, t) dt,   (5.3.1)

where the path begins at point P₁: (t₁, q₁) and ends at point P₂: (t₂, q₂). Only the dependence on endpoints is indicated explicitly by the notation S(P₁, P₂), but S depends on the entire curve joining these points as well. For this reason, S is known as a "functional." Two such curves are illustrated in Fig. 5.3.1. The basic problem is to find that "extremal" curve from P₁ to P₂ that makes S(P₁, P₂) take on an extreme (minimum or maximum) value.

The action S in mechanics will turn out to be the analog of the optical path length of physical optics. The basic integral like (5.3.1) in optics that can be analyzed using
FIGURE 5.3.1. Graph showing the extremal trajectory x*(t) and a nearby nontrajectory x*(t) + δx(t).
the calculus of variations has the form

O.P.L. = ∫ n ds = ∫ n(x, y, z) √(1 + x′² + y′²) dz,   (5.3.2)

where primes denote d/dz. Here x, y, and z are Cartesian coordinates, with x and y "transverse" and z more or less parallel to the "beam" of light. The abbreviation O.P.L. stands for "optical path length," which is the path length weighted by index of refraction n. The quantity O.P.L./c is the "time" in the "principle of least time." Although it is not entirely valid in physical optics to say that the "speed of light" in a medium is c/n, one may act as if it were and use the formula to obtain the time of flight of a particle following the given trajectory with this velocity.

5.3.2. The Euler-Lagrange Equations

Once a curve q*(t) making Eq. (5.3.1) extreme has been found for arbitrary endpoints, its dependence S(P₁, P₂) on its upper endpoint in particular has become well-defined. The so-called Hamilton-Jacobi theory is based on this dependence, but for now we consider the endpoints fixed. For notational simplicity (generalization is easy), we will take q as having just two components and call them x and y. Fig. 5.3.1 shows the dependence of one of these, the x coordinate, on time t. A similar figure for y(t) need not be drawn, as we will vary only x(t) at this time. Plotted are the extremal trajectory x*(t) and both a deviant trajectory x*(t) + δx(t) and the deviation δx(t) itself, which vanishes at both ends. The varied action is given by

S + ΔS = ∫_{t₁}^{t₂} L(x* + δx, y, ẋ* + δẋ, t) dt.   (5.3.3)
By Taylor expansion of the integrand, the variation is given by

ΔS = ∫_{t₁}^{t₂} ( (∂L/∂x) δx + (∂L/∂ẋ) δẋ ) dt.   (5.3.4)
The function δx(t) is arbitrary in shape, but smooth. It is also vanishingly small in order to justify having kept only first-order terms in the Taylor expansion. In order to make the integrand proportional to δx(t), the second term is integrated by parts, yielding the extremum condition

∫_{t₁}^{t₂} ( ∂L/∂x − (d/dt)(∂L/∂ẋ) ) δx dt + [ (∂L/∂ẋ) δx ]_{t₁}^{t₂} = 0.   (5.3.5)

The last term vanishes because (for now) δx(t₁) = δx(t₂) = 0. The parenthesized integrand factor must then vanish because δx is arbitrary. The result is the Euler-Lagrange ordinary differential equation

(d/dt)(∂L/∂ẋ) − ∂L/∂x = 0.   (5.3.6)

Spelled out more explicitly, with partial derivatives indicated by subscripts,

L_ẋẋ ẍ + L_ẋx ẋ + L_ẋt = L_x.   (5.3.7)

Identical manipulations in n dimensions yield the equations satisfied by generalized coordinates q^i:

(d/dt)(∂L/∂q̇^i) − ∂L/∂q^i = 0.   (5.3.8)

In mechanics, these equations reduce to Newton's equation, F = ma; in optics, to the equation for the path taken by a "ray" of light, Eq. (10.1.17).
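The reduction to Newton's equation can be confirmed symbolically. The sketch below applies the Euler-Lagrange operation of Eq. (5.3.6) to the standard one-particle Lagrangian L = m ẋ²/2 − U(x) (this choice of Lagrangian is illustrative, not from the text):

```python
import sympy as sp

t = sp.Symbol('t')
m = sp.Symbol('m', positive=True)
x = sp.Function('x')(t)          # the trajectory x(t)
U = sp.Function('U')             # an arbitrary potential

# L = m xdot^2 / 2 - U(x)
L = m * sp.diff(x, t)**2 / 2 - U(x)

# Form d/dt (dL/dxdot) - dL/dx, as in the Euler-Lagrange equation.
euler = sp.diff(sp.diff(L, sp.diff(x, t)), t) - sp.diff(L, x)

# Newton's equation m x'' = -U'(x) emerges:
assert sp.simplify(euler - (m * sp.diff(x, t, 2) + sp.diff(U(x), x))) == 0
```

Here sympy's ability to differentiate with respect to the function x(t) and with respect to the `Derivative` object ẋ mirrors exactly the "tangent plane" bookkeeping discussed in Section 5.2: q and q̇ are treated as independent arguments of L.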
Problem 5.3.1: Starting from Eq. (5.3.8) and the O.P.L. given by Eq. (5.3.2), derive the differential equation satisfied by optical rays (it will reappear as Eq. (10.1.17)):

(d/ds)(n dr/ds) = ∇n,

where n(r) is the index of refraction and r is a radius vector from an arbitrary origin to a point on a ray at a position given by arc length s along the ray.
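The extremal property of Section 5.3.2 can also be exhibited numerically. In the sketch below (an illustration, not from the text) the action for a unit harmonic oscillator, L = ẋ²/2 − x²/2, is discretized; the true path x = sin t between fixed endpoints gives a smaller S than smooth perturbed paths vanishing at the ends (for this short interval the extremum is a minimum):

```python
import math

def action(xs, dt):
    # Discretized S = sum L dt with L = xdot^2/2 - x^2/2 (midpoint rule).
    S = 0.0
    for a, b in zip(xs, xs[1:]):
        v = (b - a) / dt
        xm = 0.5 * (a + b)
        S += (0.5 * v * v - 0.5 * xm * xm) * dt
    return S

N, T = 2000, 1.0
dt = T / N
ts = [i * dt for i in range(N + 1)]
true = [math.sin(t) for t in ts]     # extremal of x'' = -x
S_true = action(true, dt)

# Perturbations u * sin(pi t / T) vanish at both endpoints, as required.
for eps in (0.1, -0.05, 0.01):
    bumped = [xi + eps * math.sin(math.pi * t / T)
              for xi, t in zip(true, ts)]
    assert S_true < action(bumped, dt)
```

The perturbation shape multiplied by an amplitude is precisely the u δx(t) device introduced in Section 5.3.3 below.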
5.3.3. Calculus of Variations Using the Algebra of Vector Fields

In this section (and only this section) we use the notation ∂/∂t instead of d/dt for the "total time derivative." The reason for this is that a new subsidiary variable u will be introduced, and the main arguments have to do with functions f(u, t) and derivatives holding one or the other of u and t constant.

Consider again Fig. 5.3.1. The particular (dashed) curve δx(t) can be called the "shape" of a variation from the looked-for "true trajectory" x*(t), shown as a solid curve. We now restrict the freedom of variation by replacing δx(t) with u δx(t), where u is an artificially introduced variable whose range runs from negative to positive and hence certainly includes u = 0; the range will not matter, but it may as well be thought of as −1 < u < 1. With δx(t) being called the shape of the variation, u can be called its "amplitude." Differentiation with respect to amplitude at fixed time and with fixed shape will be expressed as ∂/∂u. Differentiation with respect to time along a varied trajectory whose shape and amplitude are both fixed will be expressed as ∂/∂t. The function u δx(t) is still an arbitrary function of time, but at intermediate points in the analysis we will insist that only u vary, so that the shape δx(t) can be held fixed and the action variable S treated as a function only of u. The varied curve joining P₁ and P₂ is given, then, in parametric form, as
q^i(u, t) = q*^i(t) + u δq^i(t).   (5.3.9)
Though being "restricted" in this one sense, variations will be "generalized" in another sense. In the formalism developed so far, the variation δx(t) has been described only by deviations δq^i of the generalized coordinates describing the system at time t. But the variation at time t can be regarded as belonging to the tangent space at the point x(t) on the candidate true trajectory at that time. As a result, the shape of a variation can be described by a general vector field w(t). This vector will be treated as "differentially small" so that it makes sense to add it to the position vector of a point in the space. Also it is assumed to vanish at the endpoints. Then Eq. (5.3.9) is generalized to become

x(u, t) = x*(t) + u w(t).   (5.3.10)
With this notation, as in Eq. (5.3.3), the action corresponding to Lagrangian L(x, ẋ, t) is given by

S(u) = ∫_{t₁}^{t₂} L(x(u, t), ∂x/∂t, t) dt.   (5.3.11)

This is the first instance of our unconventional use of the symbol ∂/∂t mentioned above in italics; its meaning here is that the integrand L is being evaluated along a varied curve in which u and the shape of the variation are both held constant. Again S(u) depends on all aspects of the curve along which it is evaluated, but only the dependence on u is exhibited explicitly. The extremum condition is

0 = dS/du = ∫_{t₁}^{t₂} ( (∂L/∂x) · w + (∂L/∂q̇^i)(∂q̇^i/∂u) ) dt.   (5.3.12)

From this integral condition we wish to take advantage of the arbitrariness of the function w to obtain the differential equation of motion. As in Eq. (5.3.5), we must manipulate the integrand in such a way as to leave w (or one of its components) as
a common multiplier. This has already been done with the first term, which has furthermore been written as manifestly an invariant, which means it can be evaluated in any convenient coordinate system. If we proceeded from this point using the original coordinates we would reconstruct the earlier derivation of the Lagrange equations (see Problem 5.3.2). Instead we proceed to obtain the equations of motion satisfied by quasi-velocities.

Basis vectors directed along the original coordinate curves in the tangent space at any particular point are ∂/∂q^1, ∂/∂q^2, . . . , ∂/∂q^n. Symbolize the components of the velocity vector in this basis by v^i = q̇^i. At every point in configuration space, arbitrarily different, other basis vectors η₁, η₂, . . . , η_n can be introduced, and a vector in the tangent space such as the instantaneous velocity vector v can be expanded in terms of them. Such coordinates of the velocity are known as "quasi-velocities" s^i. (Recall that a typical example of an s^i is an angular velocity around a coordinate axis.) The velocity can then be expressed in either of the forms

v = v^i ∂/∂q^i = s^i η_i.   (5.3.13)

As in Eq. (5.2.1), the quasi-velocities can also be expressed as linear combinations of the v^i components:

s^r = A^r_j v^j.   (5.3.14)
With the time argument suppressed for simplicity, after substitution of Eq. (5.3.14) the Lagrangian is expressible as a new function

L̄(x, s) = L(x, ẋ).   (5.3.15)
Proceeding as in Eq. (5.3.12), we obtain

0 = dS/du = ∫_{t₁}^{t₂} ( (∂L̄/∂x) · w + (∂L̄/∂s^i)(∂s^i/∂u) ) dt.   (5.3.16)

As noted before, the first term is automatically expressible in invariant form, but we are still left with the problem of expressing the second term in a form that is proportional to the (arbitrary) vector w. We are forced at this point to take what may seem like a digression, referring back to Section 3.4.2, where vectors were interpreted as operators. Let f(x) be an arbitrary function of position in configuration space. If x is expressed as a function of t and u to describe evolution of the system along a given trajectory, then f(x) will be similarly expressible as a function of t and u. Its time rate of change with u fixed can be expressed two ways:

∂f/∂t = (∂f/∂q^i) q̇^i = s^i η_i(f).   (5.3.17)
Using Eq. (5.3.10), the derivative of f with respect to u with t fixed can be expressed in terms of the quasi-basis:

∂f/∂u = w^i η_i(f).   (5.3.18)

Recall that w is an arbitrary variation function. Note also that (though it is inconsistent with our normal conventions) the w^i are its components relative to the quasi-basis, not relative to the original coordinate basis. Eqs. (5.3.17) and (5.3.18) are therefore equivalent to ∂/∂t = s^i η_i and ∂/∂u = w^i η_i. When Eqs. (5.3.17) and (5.3.18) are differentiated with respect to u and t, respectively, the results are therefore
∂²f/∂u∂t = (∂s^i/∂u) η_i(f) + s^i w^j η_j(η_i(f)),
∂²f/∂t∂u = (∂w^i/∂t) η_i(f) + w^i s^j η_j(η_i(f)).   (5.3.19)

Subtracting these equations, the left-hand sides cancel because the order of partial differentiation does not matter, and the right-hand sides can be simplified using

[η_j, η_k] = c^i_jk η_i,   (5.3.20)

where a formula for the coefficients c^i_jk was derived in Eq. (3.4.15). The result is

∂s^i/∂u = ∂w^i/∂t + c^i_jk s^j w^k.   (5.3.21)
This formula is quite remarkable in that the s^i and w^i are utterly independent quantities; the last term entirely compensates for the differences between their derivatives. This digression puts us in a position to substitute Eq. (5.3.21) into Eq. (5.3.16):

0 = ∫_{t₁}^{t₂} ( (∂L̄/∂x) · w + (∂L̄/∂s^i)( ∂w^i/∂t + c^i_jk s^j w^k ) ) dt.   (5.3.22)
The first and third terms are proportional to w but, remembering that w^k is the component in the quasi-basis, we should express the first term as η_k(L̄) w^k in order to obtain w^k as a common factor. Also, the second term can be manipulated using integration by parts. Since the function w vanishes at both endpoints, the result is

∫_{t₁}^{t₂} ( η_k(L̄) − (d/dt)(∂L̄/∂s^k) + c^l_jk s^j (∂L̄/∂s^l) ) w^k dt = 0.   (5.3.23)
With the factor w^k being arbitrary, the integrand must vanish, or

(d/dt)(∂L̄/∂s^k) − c^l_jk s^j (∂L̄/∂s^l) = η_k(L̄).   (5.3.24)
Since the subsidiary variable u can now be discarded, the traditional d/dt notation for the total time derivative has been restored. Lo and behold, the Poincaré equation has reemerged. As before, the n first-order differential equations in Eq. (5.3.24) have to be augmented by the n defining Eqs. (5.3.14) in order to solve for the 2n unknowns q^i and s^i.

In addition to being much briefer than the previous derivation of the Poincaré equation, this derivation makes the interpretation of each term clearer. This derivation illustrates the power of the modern vector field formalism. (The derivation was made artificially long here by our verbosity and because of our need to digress to derive Eq. (5.3.21).)
Problem 5.3.2: Starting from Eq. (5.3.12) and assuming a Lagrangian of the form L(q^i, q̇^i, t), where the q^i are unconstrained generalized coordinates, derive the Lagrange equations (5.3.8). This yields a derivation that is equivalent to that of Section 5.3.2 but that is made neater by the introduction of the artificial variable u.

BIBLIOGRAPHY

References for Further Study

Section 5.2
N. G. Chetaev, Theoretical Mechanics, Springer-Verlag, Berlin, 1989.
H. Poincaré, C. R. Hebd. Seances Acad. Sci., 132, 369 (1901).
Section 5.3
M. Born and E. Wolf, Principles of Optics, 4th ed., Pergamon, Oxford, 1970.
R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. 1, Interscience, New York, 1953.
L. D. Landau and E. M. Lifshitz, Mechanics, Pergamon, Oxford, 1976.
H. Rund, The Hamilton-Jacobi Theory in the Calculus of Variations, Van Nostrand, London, 1966.
B. F. Schutz, Geometrical Methods of Mathematical Physics, Cambridge University Press, Cambridge, UK, 1995.
Section 5.3.3
V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt, Mathematical Aspects of Classical and Celestial Mechanics, 2nd ed., Springer-Verlag, Berlin, 1997, p. 13.
6 SIMPLIFYING THE POINCARÉ EQUATION WITH GROUP THEORY
6.1. CONTINUOUS TRANSFORMATION GROUPS
It is obvious that the symmetries of mechanical systems have a significant impact on the possible motions of the system. To describe this, the mathematical treatment can be functional or geometric. The former approach is probably familiar from the concept of "cyclic" or "ignorable" coordinates in Lagrangian mechanics: If the Lagrangian is independent of a coordinate, then its conjugate momentum is conserved. The more purely geometric treatment in Newtonian mechanics should be familiar, for example in the treatment of the angular velocity as a vector subject to the normal rules of vector analysis. In Chapter 8, the use of purely geometric, "Lie algebraic" methods in Newtonian mechanics will be studied, and here we apply group theory to the Lagrange-Poincaré approach. The power of the Lagrange procedure is that it becomes entirely mechanical once the coordinates and Lagrangian have been established, but it can be regarded as a weakness that symmetries of the system have an analytical but not a geometric interpretation. We wish to rectify this lack by incorporating group theoretic methods into Lagrangian mechanics, or rather into the Poincaré equation, since, as has been mentioned repeatedly, the Lagrange procedure is insufficiently general to handle many systems. Of course we also wish to maintain the "turn the crank" applicability of the Lagrangian method. Though a supposedly "advanced" subject, continuous groups, is to be used, only its simpler properties will be needed, and those that are will be proved explicitly. Furthermore, only calculus and linear algebra are required. This is consistent with the general policy of this text of expecting as preparation only material with which most physicists are comfortable, and developing theory on a "just in time" basis. It is not appropriate to claim that a deep understanding of the subject can be
obtained this way, but starting from "the particular" (manipulating a Lagrange-like equation) provides a good introduction to "the general." As mathematics, therefore, the treatment in this chapter would have to be regarded as "old-fashioned" (being due to Lie it is certainly old) and perhaps even clumsy.

A change of variables¹ (such as x' = (1 + a¹)x + a²) that depends on continuously variable parameters (such as a¹ and a²) and with the property that a small change in parameters causes a small change in the transformed variable is called an r-parameter continuous transformation (r = 2). Such a transformation acts on a space of n-component variables x (n = 1). If there is a parameter choice (such as a¹ = 0, a² = 0) for which the transformation is the identity, if the inverse transformation is included (x = (1 − a¹/(1 + a¹))x' − a²/(1 + a¹)), and if parameters can necessarily be found that give the same transformation as the product (also known as concatenation or composition) of any two transformations (x'' = (1 + a¹ + b¹ + b¹a¹)x + (a² + b² + b¹a²)), the transformation is called a continuous transformation group or a Lie group.

Let R(a) = R(a¹, a², …, a^r) symbolize the element of the transformation group corresponding to parameters a. (The main transformations of this sort that have been studied up to this point in the course are orthogonal transformations, with the orthogonal matrix O parameterized by three independent parameters, for example Euler angles. In this case R(a¹, a², a³) = O(φ, θ, ψ).) For notational convenience, zero values for the parameters are assumed to correspond to the identity transformation, that is, R(0) = I. (If this is not true, the parameters should be redefined to make it true, as it simplifies the algebra.) In general, R need not be a matrix, but that is the simplest case, since concatenation is then represented by matrix multiplication. In any case, the concatenation of R(a) followed by R(b) is indicated by R(b)R(a).
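The defining properties just listed can be checked numerically for the ongoing example. The following sketch (the function names are ours, not the text's) verifies closure under composition, the identity, and the inverse for x' = (1 + a¹)x + a².

```python
# Group axioms for the two-parameter example x' = (1 + a1)*x + a2.

def act(a, x):
    """Apply the transformation with parameters a = (a1, a2) to x."""
    a1, a2 = a
    return (1 + a1) * x + a2

def compose(b, a):
    """Parameters c with R(c) = R(b)R(a), i.e. c = Phi(a; b) of Eq. (6.1.3)."""
    a1, a2 = a
    b1, b2 = b
    return (a1 + b1 + b1 * a1, a2 + b2 + b1 * a2)

def inverse(a):
    """Parameters a-bar with R(a-bar)R(a) = R(0), Eq. (6.1.1)."""
    a1, a2 = a
    return (-a1 / (1 + a1), -a2 / (1 + a1))

a, b, x = (0.3, -0.7), (0.1, 0.4), 1.7
# Closure: acting twice agrees with acting once with the composed parameters.
assert abs(act(b, act(a, x)) - act(compose(b, a), x)) < 1e-12
# Identity and inverse.
assert act((0.0, 0.0), x) == x
c = compose(inverse(a), a)
assert abs(c[0]) < 1e-12 and abs(c[1]) < 1e-12
```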
The existence of a transformation inverse to R(a) requires the existence of parameters ā such that
    R(ā)R(a) = R(0)   (6.1.1)
(ā¹ = −a¹/(1 + a¹), ā² = −a²/(1 + a¹)). For transformation R(a) followed by R(b), the group multiplication property is expressed as the requirement that parameters c exist such that
R(c) = R(b)R(a)
(6.1.2)
(c¹ = a¹ + b¹ + b¹a¹, c² = a² + b² + b¹a²). It is primarily this concatenation feature that causes these transformations to have useful properties. Expressed functionally, the required existence of parameters c requires the existence² of functions Φ^κ(a; b)

¹In the following paragraphs, as new quantities are introduced they are illustrated, in parentheses, by an ongoing example, starting as here with x' = (1 + a¹)x + a².
²Though the existence of the functions Φ^κ(a; b) is analytically assured, their definition is implicit and they are not necessarily available in closed form. Examples for which they are explicitly available will be given shortly.
such that

    c^κ = Φ^κ(a¹, …, a^r; b¹, …, b^r),   κ = 1, 2, …, r,   or   c = Φ(a; b)   (6.1.3)

(Φ¹(a; b) = a¹ + b¹ + b¹a¹, Φ²(a; b) = a² + b² + b¹a²). For our purposes, Eq. (6.1.3) will be employed primarily in situations where b is infinitesimal, meaning that it corresponds to a transformation close to the identity; to signify this, we change symbols b → δa and identify c as a + da. Then Eq. (6.1.3) yields

    a + da = Φ(a; δa).   (6.1.4)
(Throughout this chapter, the symbol δ will always be associated with near-identity group transformations.) Differentiating Eq. (6.1.4) yields a linear relation between the increments (δa¹, …, δa^r) and the increments (da¹, …, da^r),

    da^λ = B^λ_μ(a) δa^μ,   where   B^λ_μ(a) = ∂Φ^λ(a¹, …, a^r; b¹, …, b^r)/∂b^μ |_{b=0}.   (6.1.5)

The matrix B is r × r. It provides the (linearized) variations of (arbitrary) Lie parameters due to a subsequent close-to-identity transformation. Introducing its inverse, A = B⁻¹, and inverting Eq. (6.1.5) yields

    δa^λ = A^λ_μ(a) da^μ.   (6.1.6)
(In our example,

    B(a) = ( 1 + a¹   0 )
           (   a²     1 ),

so that

    A(a) = B⁻¹ = (   1/(1 + a¹)    0 )
                 ( −a²/(1 + a¹)    1 ).)
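The matrix B of Eq. (6.1.5) can also be obtained by differentiating the composition function numerically. In this sketch (our construction; the composition function is recomputed from scratch) central differences at b = 0 reproduce B, and inverting gives A:

```python
# Finite-difference check of Eq. (6.1.5) for the example group
# x' = (1 + a1)x + a2, whose composition law is
# c1 = a1 + b1 + b1*a1,  c2 = a2 + b2 + b1*a2.
import numpy as np

def compose(b, a):
    """c = Phi(a; b)."""
    return np.array([a[0] + b[0] + b[0] * a[0],
                     a[1] + b[1] + b[0] * a[1]])

def B_matrix(a, eps=1e-6):
    """B^lam_mu(a) = dPhi^lam(a; b)/db^mu at b = 0, by central differences."""
    B = np.zeros((2, 2))
    for mu in range(2):
        db = np.zeros(2)
        db[mu] = eps
        B[:, mu] = (compose(db, a) - compose(-db, a)) / (2 * eps)
    return B

a = np.array([0.3, -0.7])
B = B_matrix(a)
assert np.allclose(B, [[1.3, 0.0], [-0.7, 1.0]], atol=1e-8)   # [[1+a1, 0], [a2, 1]]
A = np.linalg.inv(B)                                          # A = B^{-1}, Eq. (6.1.6)
assert np.allclose(A, [[1 / 1.3, 0.0], [0.7 / 1.3, 1.0]], atol=1e-8)
```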
Note that A and B, matrices belonging to the group, are independent of x. Continuous transformations, when acting on a configuration space with generalized coordinates q^i (in the simplest case the original coordinates q^i are Cartesian coordinates (x, y, z)), are expressed by functions f (nonlinear, in general, and independent in ways to be clarified later) such that
'
q" = f ' ( q 1 , . . . ,q";a 1, . .. , a'),
i = 1 , 2 , . . . , n or q' = f(q; a)
(6.1.7)
(f'(q; a) = (1 + a ' ) x + a 2 ) .Derivatives of these transformations will be symbolized by (6.1.8)
(u¹₁ = x, u¹₂ = 1). Because the derivatives are evaluated at a = 0, the functions u^i_κ(q) are independent of the near-identity parameters a. If an instantaneous system reconfiguration can be represented by an infinitesimal group transformation, there is a corresponding velocity of evolution of the system coordinates,

    dq^i/dt = u^i_κ(q) δa^κ/dt,   (6.1.9)

or in general

    dq^i = u^i_κ(q) δa^κ,   (6.1.10)

where the δa^κ are the parameters of the transformation. A requirement to be used below is that the functions u^i_κ be independent: none of them is allowed to be identically expandable in terms of the others. One says that all of the parameters are to be essential; identities like a² = 2a¹, a³ = a¹ + a², or a¹ = any-function(a², a³, …) are to be excluded. The concatenation requirement of Eq. (6.1.3) for these transformation functions is

    f(f(q; a); b) = f(q; Φ(a; b)).   (6.1.11)
In our example r > n, but commonly the number of parameters r is less than the number of independent variables n. The arguments in the next section are followed most easily in the case r = n.
6.2. USE OF INFINITESIMAL GROUP PARAMETERS AS QUASI-COORDINATES

Sometimes the time evolution of a mechanical system can be described using a continuous group (for example the rotation group when rigid body motion is being described), with the coordinates q(t) expressible in terms of the transformation functions f:
    q(t) = f(q(0); a(t)).   (6.2.1)
Here it is assumed that the configuration can necessarily³ be described as the result of operating on an initial configuration q(0) with R(a(t)). At a slightly later time the same equation reads
    q + dq = q(t + dt) = f(q(0); a(t + dt)).   (6.2.2)
³A continuous group is said to be "transitive" if it necessarily contains a transformation carrying any configuration into any other. This requires r ≥ n.
192
SIMPLIFYING THE POINCARE EQUATION WITH GROUP THEORY
FIGURE 6.2.1. Pictorial representation of alternate sequential transformations leading from an initial configuration to the same altered configuration.
The occurrence of time variable t suggests that Eq. (6.2.2) describes the actual motion of a particular system, but we wish also to describe "virtual" configurations that are close together, but not necessarily realized in an actual motion. For such configurations
    q + dq = f(q(0); a + da).   (6.2.3)
Eqs. (6.2.1) show that the quantities a, called parameters so far, can be satisfactory generalized coordinates and can serve as independent variables in Lagrange equations for the system.⁴ Before writing those equations we pursue some consequences of Eq. (6.1.11), applying it to a case where a is macroscopic and b = δa is infinitesimal;
    q + dq = f(f(q(0); a); δa) = f(q(0); Φ(a; δa)) = f(q(0); a + da) = f(q; δa).   (6.2.4)
As illustrated in Fig. 6.2.1, parameters a + da describe the direct system reconfiguration q(0) → q + dq. But the final configuration can also be produced by the sequence q(0) → q (via a) → q + dq (via δa). In the latter case the final transformation is infinitesimal, close to the identity, and its parameters are δa = (δa¹, …, δa^r). We encountered equations like Eq. (6.1.10) (or rather its inverse) in Eq. (5.2.1), while discussing quasi-coordinates with their corresponding quasi-velocities s^i. As in that case, though small changes δa^κ can be meaningfully discussed, it is not valid to assume that quasi-coordinates a^κ can be found globally for which these are the resulting differentials. On the other hand, Eqs. (6.2.1) define the parameters a^κ globally. We will shortly employ the δa^λ's as differentials of quasi-coordinates. We now have three sets of independent differentials: local displacements can be expressed in terms of dq^i (not independent if r < n), da^λ, or δa^κ. From the outermost members of Eq. (6.2.4), substituting from Eq. (6.1.8), variations dq^i can be related to variations da^λ indirectly via variations δa^κ,
    dq^i = u^i_κ(q) δa^κ = u^i_κ(q) A^κ_λ(a) da^λ,   (6.2.5)
⁴There are situations with r ≤ n in which reconfigurations are usefully described by variations of a, but they can form a complete set of generalized coordinates only if r = n. The parameters a will in fact be interpreted as generalized coordinates below.
where δa^κ has been replaced using Eq. (6.1.6). This leads to

    ∂q^i/∂a^λ = u^i_κ(q) A^κ_λ(a).   (6.2.6)

The first factor depends on the configurational coordinates q (and implicitly on the definition of, but not the values of, the parameters a); the second factor is a property of the transformation group only. The elements of (6.2.6) can be regarded as the elements of the Jacobian matrix of the transformation between q and a only if r = n, but they are well-defined even if r < n; in that case, the variations generated by variation of the group parameters span less than the full tangent space. Expressions (6.2.6) are fundamental to the proof of Lie's theorem, which governs the commutation relations of infinitesimal transformations of the Lie group. Before turning to that, it will be useful to first associate directional derivative operators with tangent space displacements.

6.3. INFINITESIMAL GROUP OPERATORS

It has been remarked that different variables can be used to specify displacements in the tangent space at q. Here we concentrate on Lie transformations close to the identity, as parameterized by δa^ρ. Consider any general (presumably real) scalar function F(q) defined on the configuration space. Using the left portion of (6.2.5), its change dF corresponding to parameter variation δa^ρ is
    dF = (∂F/∂q^i) dq^i = (∂F/∂q^i) u^i_ρ(q) δa^ρ.   (6.3.1)

The purpose of this expansion is to express dF as a superposition of changes in which one parameter a^λ varies while the rest remain constant. To emphasize this, Eq. (6.3.1) can be rearranged and reexpressed in terms of operators X_ρ defined by⁵

    dF = (δa^ρ X_ρ)F,   where   X_ρ = u^i_ρ(q) ∂/∂q^i.   (6.3.2)
The operators X_ρ (which operate on functions F(q)) are called "infinitesimal operators of the group." There are r of these operators, as many as there are independent parameters in the group. Each one extracts from function F the rate of change of F per unit change in the corresponding parameter with the other parameters fixed. Though the motivation for introducing these operators X_ρ comes entirely from our analysis of continuous groups, the operators are no different from the vector fields discussed in Section 3.4. To conform with notation introduced there, they have been assigned boldface symbols, and the ∂/∂q^i are given similar treatment.

⁵In Eq. (6.3.2) (and all subsequent equations) the order of the factors ∂/∂q^i and u^i_ρ(q) has been reversed to avoid the nuisance of having to state that ∂/∂q^i does not act on u^i_ρ. As a result, the indices no longer appear in their conventional, matrix-multiplication order. But since u^i_ρ has one upper and one lower index, their order doesn't really matter; if one insisted, one could restore the conventional order by defining (uᵀ)_ρ^i = u^i_ρ and regarding the elements ∂/∂q^i as being arrayed as a row instead of a column.
Instead of the differentials δa^ρ, one can introduce "quasi-velocities"

    s^ρ = δa^ρ/dt.   (6.3.3)

Then the system evolution described by Eq. (6.1.9) results in evolution of the function F(q) according to

    dF/dt = s^ρ X_ρ F.   (6.3.4)
Results obtained previously can be reexpressed in terms of the X_ρ. For example, choosing F to be the variable q^i yields a result equivalent to Eq. (6.1.9):

    dq^i/dt = s^ρ X_ρ q^i = u^i_ρ(q) s^ρ.   (6.3.5)

Example 6.3.1: For the infinitesimal transformation x' = (1 + a¹)x + a², it was shown above that u¹₁ = x and u¹₂ = 1, and the infinitesimal operators are therefore

    X₁ = x ∂/∂x,   X₂ = ∂/∂x.
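The coefficients u^i_κ of this example can be produced numerically by differentiating f at a = 0, and the resulting operators applied to a test function. This is an illustrative sketch only; the test function F(x) = x³ is ours.

```python
# u^i_kappa = df^i/da^kappa at a = 0 for f(x; a) = (1 + a1)x + a2,
# and the operators X1 = x d/dx, X2 = d/dx applied to F(x) = x**3.

def f(x, a):
    return (1 + a[0]) * x + a[1]

def u(x, eps=1e-6):
    """Central differences at a = 0; returns (u^1_1, u^1_2)."""
    du = []
    for e in [(eps, 0.0), (0.0, eps)]:
        minus = (-e[0], -e[1])
        du.append((f(x, e) - f(x, minus)) / (2 * eps))
    return du

x = 2.5
u1, u2 = u(x)
assert abs(u1 - x) < 1e-8 and abs(u2 - 1.0) < 1e-8   # u^1_1 = x, u^1_2 = 1

# X_rho F = u_rho dF/dx
F = lambda t: t ** 3
dF = lambda t: 3 * t ** 2
X1F = u1 * dF(x)   # X1 F = x * 3x^2 = 3x^3
X2F = u2 * dF(x)   # X2 F = 3x^2
assert abs(X1F - 3 * x ** 3) < 1e-6
assert abs(X2F - 3 * x ** 2) < 1e-6
```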
Example 6.3.2: For 2-D rotations given by x' = R(a¹)x,

    ( x'¹ )   ( cos a¹   −sin a¹ ) ( x¹ )
    ( x'² ) = ( sin a¹    cos a¹ ) ( x² ),

the composition law is simply

    c¹ = a¹ + b¹.   (6.3.6)

Note that this result followed from the fact that rotation angles about the same axis are simply additive. Explicitly, the transformation formulas and infinitesimal operators are

    f¹ = cos a¹ x¹ − sin a¹ x²,   f² = sin a¹ x¹ + cos a¹ x²,
    u¹₁ = −x²,   u²₁ = x¹,
    X₁ = −x² (∂/∂x¹) + x¹ (∂/∂x²).

Anticipating later formulas, the same result could have been obtained using the matrix J₃ defined in Eq. (4.2.26) (satisfying J₃² = −1 after suppressing the third row and the third column): x' = e^{φJ₃} x, where a more conventional symbol for the parameter results from setting a¹ = φ. Differentiating with respect to φ yields
    dx'/dφ = ( 0   −1 ) x' = ( −x'² )
             ( 1    0 )      (  x'¹ ).
Example 6.3.3: In 3-D, consider the transformation

    x' = e^{a·J} x,   where   a·J = (   0    −a³    a²  )
                                    (  a³     0    −a¹  )
                                    ( −a²    a¹     0   ),   (6.3.7)

and the triplet of matrices J was defined in Eq. (4.2.26). This expresses the matrix for angular rotation around the axis â by macroscopic angle |a| in terms of the matrix describing microscopic rotation around the same axis. Expanding it as e^{a·J} = 1 + a·J/1! + (a·J)²/2! + …, differentiating with respect to a¹, then setting a = 0 yields

    ∂x'/∂a¹ |_{a=0} = J₁ x = (0, −x³, x²)ᵀ.
Combining this with corresponding results for a² and a³ yields

    (u^i_κ) = (   0     x³    −x²  )
              ( −x³     0      x¹  )
              (  x²   −x¹      0   ).   (6.3.8)
A common notation is to change the name of these particular differential operators to R_i:

    R₁ = x² (∂/∂x³) − x³ (∂/∂x²),
    R₂ = x³ (∂/∂x¹) − x¹ (∂/∂x³),
    R₃ = x¹ (∂/∂x²) − x² (∂/∂x¹).   (6.3.9)

These can be written compactly as R_i = ε_{ijk} x^j ∂/∂x^k. In the next section it will be shown that R_i = ∂/∂φ^i, where φ^i is a rotation angle about axis i.
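The correspondence between the matrix generators J_i and the operators R_i can be checked numerically. In this sketch (our construction; the test function F is arbitrary and ours) the operator form grad F · (J₁x) is compared both with R₁F of Eq. (6.3.9) and with a finite-difference derivative along a small rotation:

```python
import numpy as np

# Generator matrices J_i read off from Eq. (6.3.7): a.J is the cross-product
# matrix of a, so the column J_k x reproduces u^i_k of Eq. (6.3.8).
J = np.zeros((3, 3, 3))
J[0] = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
J[1] = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
J[2] = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

x = np.array([0.4, -1.1, 0.8])
F = lambda p: p[0] * p[1] ** 2                        # arbitrary test function
gradF = lambda p: np.array([p[1] ** 2, 2 * p[0] * p[1], 0.0])

# Operator form: X1 F = grad F . (J1 x), which should equal
# R1 F = x2 dF/dx3 - x3 dF/dx2 of Eq. (6.3.9).
X1F = gradF(x) @ (J[0] @ x)
R1F = x[1] * 0.0 - x[2] * (2 * x[0] * x[1])           # dF/dx3 = 0 for this F
assert abs(X1F - R1F) < 1e-12

# Finite-difference check along an infinitesimal rotation about the x1-axis.
eps = 1e-6
num = (F((np.eye(3) + eps * J[0]) @ x)
       - F((np.eye(3) - eps * J[0]) @ x)) / (2 * eps)
assert abs(num - X1F) < 1e-5
```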
Example 6.3.4: The 3-D Rotation Group. To calculate the matrix B^i_j, defined in Eq. (6.1.5), for the 3-D rotation group (in a special case), it is sufficient to consider a finite rotation R(a) like that of Example 6.3.2 followed by an infinitesimal rotation R(b) like that in Example 6.3.3:

    R(c) = R(b)·R(a)

         = (  1   −b³    b²  ) ( 1      0          0      )
           (  b³    1   −b¹  ) ( 0   cos a¹    −sin a¹    )
           ( −b²   b¹     1  ) ( 0   sin a¹     cos a¹    )

         = (  1    −b³ cos a¹ + b² sin a¹    b³ sin a¹ + b² cos a¹ )
           (  b³    cos a¹ − b¹ sin a¹      −sin a¹ − b¹ cos a¹    )
           ( −b²    sin a¹ + b¹ cos a¹       cos a¹ − b¹ sin a¹    ).   (6.3.10)
Ideally, this result would be expressible in the form R(c) = e^{c·J}, since that would mean the coefficients c were known. Not knowing how to do this, we have to proceed less directly. The computations can usefully be performed using Maple; the listing is given below. The eigenvalues of a 3-D orthogonal matrix are, in general, given by 1, e^{±iφ}, where φ is the rotation angle. The trace of a matrix is the sum of its eigenvalues, and it can be seen from the matrix R(a), which represents pure rotation around the x-axis by angle a¹, that the sum of eigenvalues for an orthogonal matrix is 1 + e^{ia¹} + e^{−ia¹} = 1 + 2 cos a¹. Letting φ_C be the rotation angle due to matrix R(b)·R(a), it follows that

    1 + 2 cos φ_C = 1 + 2 cos a¹ − 2 sin a¹ b¹,

and hence, to lowest order in b¹, φ_C ≈ a¹ + b¹. For the matrix R(c), let v^(1) be the eigenvector corresponding to eigenvalue 1. That is, Rv^(1) = v^(1), and hence also Rᵀv^(1) = v^(1), since Rᵀ = R⁻¹. It follows that
(R − Rᵀ)v^(1) = 0, and from this follows the proportionality

    ( v^(1)₁ )     ( R₂₃ − R₃₂ )
    ( v^(1)₂ )  ∝  ( R₃₁ − R₁₃ )
    ( v^(1)₃ )     ( R₁₂ − R₂₁ ).

This can be converted to a unit vector and then multiplied by a¹ + b¹ to produce a vector with both the correct direction and correct magnitude. These calculations are carried out in detail in the accompanying Maple run. For the argument value a = (a¹, 0, 0), the functions defined in Eq. (6.1.3) are given by
    Φ¹(a; b) = a¹ + b¹,
    Φ²(a; b) = (a¹/2) ( b² cot(a¹/2) + b³ ),
    Φ³(a; b) = (a¹/2) ( −b² + b³ cot(a¹/2) ).   (6.3.11)

Then we have

    B(a¹, 0, 0) = ( 1           0                 0        )
                  ( 0    (a¹/2) cot(a¹/2)       a¹/2       )
                  ( 0        −a¹/2        (a¹/2) cot(a¹/2) ).   (6.3.12)
To deemphasize the special significance of the first axis, let us replace a¹ by a variable φ, a rotation angle around an arbitrary axis. The determinant J(φ) = |B|, the Jacobian of transformation (6.1.5), may depend on the rotation angle, but not on the rotation axis; by Eq. (6.3.12) it is given by

    J(φ) = (φ/2)² / sin²(φ/2),   (6.3.13)
which satisfies J(0) = 1. This Jacobian relates increments of parameter volume according to

    d³a = J(φ) δ³a,   (6.3.14)
where da and δa are the parameter increments illustrated in Fig. 6.2.1.

6.3.0.1. Invariant Integration: For a function f defined for every element T_i, i = 1, 2, …, N, of a finite group, it is possible to define the "sum" F = Σ_{i=1}^{N} f(T_i). This amounts to assigning equal (unit) weight to each element of the group, and the sum is "invariant" in the sense that, for any particular group element T, we have Σ_{i=1}^{N} f(T T_i) = Σ_{i=1}^{N} f(T_i), since the two summations differ only by the ordering of their terms. We wish to define a similar "sum" over elements of our continuous group. If the parameters a¹, a², a³ are plotted along rectangular axes, since the "radius" √(a¹² + a²² + a³²) = φ is the rotation angle, all possible rotations can be represented by points in the interior of a sphere of radius π, with diametrically opposite points being identified as representing the same transformation. A "summation" of function f over all transformations of the group takes the form of a sum over "grains" a_i or of an integral

    F = Σ_i f(R(a_i)) → ∫ f(R(a)) ρ_d(φ) d³a,   (6.3.15)

where R(a) is the group element, and ρ_d(φ) d³a is the "number of grains" in parameter volume d³a. For this sum to be invariant, the "density" ρ_d(φ) must depend only on φ, as shown, and for the same reason, the function f must depend only on φ as well. Complicating the task of defining this sum to be invariant is the fact that, for any particular transformation R, the parameters of the product transformation RR(a_i) will not in general match any of the initially chosen grains. But it is possible to choose the density ρ_d to provide the integral form of the summation with the desired property. Reintroducing the infinitesimal parameters δa¹, δa², δa³, the number of grains in parameter volume δ³a can be expressed as ρ_δ(δa) δ³a = δ³a, where we have arbitrarily set the density in the vicinity of the identity transformation equal to 1. Invariance can be assured by requiring ρ_δ δ³a = ρ_d d³a, and for this, by Eq. (6.3.14), we require ρ_d(φ) = J⁻¹(φ), where J(φ) is given by Eq. (6.3.13). Using polar coordinates to evaluate integral (6.3.15), we obtain

    ∫ f(R(a)) ρ_d(φ) d³a = 16π ∫₀^π f(φ) sin²(φ/2) dφ   (6.3.16)

as the invariant integral over parameter space.
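The invariance of this weighting can be spot-checked by Monte Carlo. The following sketch (our construction, in Python rather than the Maple used elsewhere in this chapter) samples rotations with angle density proportional to sin²(φ/2); for an invariant average, the mean of tr R must agree with the mean of tr(T₀R) for any fixed rotation T₀ (both are zero):

```python
import numpy as np

rng = np.random.default_rng(0)

def rot(axis, phi):
    """Rotation by angle phi about 'axis' (Rodrigues formula)."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(phi) * K + (1 - np.cos(phi)) * (K @ K)

def sample_phi(n):
    """Rejection-sample phi in [0, pi] with density ~ sin^2(phi/2)."""
    out = np.empty(0)
    while out.size < n:
        cand = rng.uniform(0, np.pi, 2 * n)
        keep = rng.uniform(size=2 * n) < np.sin(cand / 2) ** 2
        out = np.concatenate([out, cand[keep]])
    return out[:n]

n = 40000
phis = sample_phi(n)
axes = rng.normal(size=(n, 3))
axes /= np.linalg.norm(axes, axis=1, keepdims=True)
T0 = rot(np.array([0.3, -0.5, 0.8]), 1.2)   # arbitrary fixed rotation

tr = np.empty(n)
trT = np.empty(n)
for i in range(n):
    R = rot(axes[i], phis[i])
    tr[i] = np.trace(R)
    trT[i] = np.trace(T0 @ R)

# The invariant average of the trace over the rotation group is zero, and
# left multiplication by T0 must not change it.
assert abs(tr.mean()) < 0.05
assert abs(trT.mean()) < 0.05
```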
Maple Program

    > # Investigate 3D rotations
    > with(linalg):
    > readlib(trigsubs):
    > Ra := matrix( 3, 3, [ 1,       0,        0,
    >                       0, cos(a1), -sin(a1),
    >                       0, sin(a1),  cos(a1) ] );
    > Rb := matrix( 3, 3, [  1, -b3,  b2,
    >                       b3,   1, -b1,
    >                      -b2,  b1,   1 ] );
    > RbRa := evalm( Rb &* Ra );
    > CosphiC := 1/2*( RbRa[1,1] + RbRa[2,2] + RbRa[3,3] - 1 );
    > phiC := a1 + b1;
          phiC := a1 + b1
    > c11p := RbRa[2,3] - RbRa[3,2];
    > c12p := RbRa[3,1] - RbRa[1,3];
    > c13p := RbRa[1,2] - RbRa[2,1];
          c11p := -2 sin(a1) - 2 b1 cos(a1)
          c12p := -b2 - b3 sin(a1) - b2 cos(a1)
          c13p := -b3 cos(a1) + b2 sin(a1) - b3
    > expand( c11p^2 + c12p^2 + c13p^2 ):
    > csq := subs( {b1^2=0, b2^2=0, b3^2=0, b1^3=0, b2^3=0, b3^3=0}, " );
          csq := 4 sin(a1)^2 + 8 sin(a1) b1 cos(a1)
    > taylor( phiC*c11p/sqrt(csq), b1=0, 2 ):
    > simplify( ", sqrt ):
    > # csgn stands for "complex sign", which can be taken as -1
    > c11 := subs( csgn(sin(a1))=-1, " );
          c11 := a1 + b1 + O(b1^2)
    > taylor( c12p/sqrt(csq), b2=0, 2 ):
    > # This result is already linear in the small quantities (b's)
    > taylor( phiC*("), b1=0, 1 ):
    > simplify( ", sqrt ):
    > c12 := subs( csgn(sin(a1))=-1, " );
    > # Dropped terms are actually quadratic in b's
    > taylor( c13p/sqrt(csq), b2=0, 2 ):
    > taylor( phiC*("), b1=0, 1 ):
    > simplify( ", sqrt ):
    > c13 := subs( csgn(sin(a1))=-1, " );
Problem 6.3.1: Consider a matrix of the form e^X, where X = a·J as given in Eq. (6.3.7); that is, X is a skew-symmetric 3 × 3 matrix. Show that

    e^X = 1 + (sin φ / φ) X + ((1 − cos φ) / φ²) X²,   where   φ² = −(1/2) tr(X²).   (6.3.17)

Do this first for a¹ = φ, a² = a³ = 0, which was the special case appearing in Example 6.3.4.
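The formula of Eq. (6.3.17) can be checked numerically before attempting the proof. This sketch (ours) compares the closed form against a straightforward power series for the matrix exponential:

```python
import numpy as np

def skew(a):
    """The cross-product (skew-symmetric) matrix a.J of Eq. (6.3.7)."""
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0]])

def expm_series(X, terms=40):
    """Matrix exponential by direct summation of the power series."""
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

a = np.array([0.7, -0.2, 1.1])
X = skew(a)
phi = np.sqrt(-np.trace(X @ X) / 2)
assert np.isclose(phi, np.linalg.norm(a))   # phi is the rotation angle |a|

rodrigues = (np.eye(3)
             + (np.sin(phi) / phi) * X
             + ((1 - np.cos(phi)) / phi ** 2) * (X @ X))
assert np.allclose(rodrigues, expm_series(X), atol=1e-10)
```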
6.4. COMMUTATION RELATIONS AND STRUCTURE CONSTANTS OF THE GROUP
The operators X_ρ operate on functions to describe the effect of infinitesimal group transformations; the functions u^i_ρ are expansion coefficients (see Eq. (6.3.2)). The property of the infinitesimal operators making them specific to some continuous group is that their commutators can be expressed in terms of so-called group "structure constants." Lie proved that for a continuous group these structure constants are, in fact, constant. This will now be demonstrated. Substituting from Eq. (6.3.2), the required commutators can be expressed in terms of the operators ∂/∂q^i:

    [X_τ, X_σ] ≡ X_τ X_σ − X_σ X_τ = ( u^j_τ ∂u^i_σ/∂q^j − u^j_σ ∂u^i_τ/∂q^j ) ∂/∂q^i.   (6.4.1)

Though quadratic in the operators X_ρ (and hence in the functions u^i_ρ), these will be shown to be expressible as a linear superposition (with constant coefficients) of the operators X_κ themselves. The expression for [X_τ, X_σ] in Eq. (6.4.1) can be simplified using results from the previous section. The quantities δa^κ, being differentials of quasi-coordinates, are not necessarily the differentials of globally defined coordinates. But they occur only as intermediate variables, and the variables q and a are globally related by Eqs. (6.2.1). By equality of mixed partials it follows that

    ∂²q^i/∂a^μ ∂a^λ = ∂²q^i/∂a^λ ∂a^μ.   (6.4.2)

In these equations it is not required that r = n. In the summations, Roman indices range from 1 to n, and Greek indices run from 1 to r. Substituting from Eq. (6.2.6) yields

    ∂/∂a^μ ( u^i_κ(q) A^κ_λ(a) ) = ∂/∂a^λ ( u^i_κ(q) A^κ_μ(a) ).   (6.4.3)

Differentiating u^i_κ(q(a)) with respect to a^μ and using Eq. (6.2.6) yields

    ∂u^i_κ/∂a^μ = (∂u^i_κ/∂q^j) u^j_τ A^τ_μ.   (6.4.4)

Substitution into Eq. (6.4.3) yields a relation satisfied by the coefficient of ∂/∂q^i on the right-hand side of Eq. (6.4.1):

    (∂u^i_κ/∂q^j) u^j_τ A^τ_μ A^κ_λ + u^i_κ ∂A^κ_λ/∂a^μ = (∂u^i_κ/∂q^j) u^j_τ A^τ_λ A^κ_μ + u^i_κ ∂A^κ_μ/∂a^λ.   (6.4.5)
Multiplying by B^λ_τ ≡ (A⁻¹)^λ_τ and by B^μ_σ ≡ (A⁻¹)^μ_σ yields

    u^j_τ ∂u^i_σ/∂q^j − u^j_σ ∂u^i_τ/∂q^j = c^κ_{τσ} u^i_κ,   (6.4.6)

where the coefficients c^κ_{τσ} have the same form as encountered previously in the Poincaré equation:
    c^κ_{τσ} = ( ∂A^κ_λ/∂a^μ − ∂A^κ_μ/∂a^λ ) B^λ_τ B^μ_σ.   (6.4.7)

This definition should be compared with Eq. (5.2.17), encountered while discussing the Poincaré equation. (See also Section 3.4.5.) The only difference has to do with the choice of independent variable. Instead of using the q^κ variables in the earlier derivation of the Poincaré equation, we could have used the variables a^κ. In that case, the formula (6.1.6) would have resembled the formula (5.2.1) in which quasi-coordinates were first introduced. It seems therefore as though our notation has been ill-advised, since the matrices A^i_j appearing here and there have different meanings. However, it was shown in Section 5.2.2 that the coefficients c^κ_{τσ} depend on the quasi-coordinates only, not on the particular variables used in defining them. It follows that the coefficients c^κ_{τσ} calculated here are the coefficients appearing in the Poincaré equation. Previously the matrix A^i_j was arbitrary; now it is specific to the Lie group.

To continue the proof of Lie's theorem, one can take advantage of the antisymmetry of c^κ_{τσ} in its lower indices by subtracting from Eq. (6.4.6) the same equation with τ and σ reversed. Next differentiate with respect to a^ν, using the expansion ∂/∂a^ν = (∂q^k/∂a^ν)(∂/∂q^k) while evaluating the left-hand side (this permits the parameters a to be treated as constant) to show that it vanishes. Finally, because of the factorized form of the right-hand side, ∂/∂a^ν can be applied directly to it. The final result is
    ( ∂c^κ_{τσ}/∂a^ν ) u^i_κ = 0.   (6.4.8)

Such an identity would imply a functional dependency among the u^i_κ, which have, however, been hypothesized to be independent. It follows that the coefficients ∂c^κ_{τσ}/∂a^ν vanish and hence that the c^κ_{τσ} are constants. We have seen before that the constancy of these coefficients simplifies the Poincaré equation markedly. Lie also proved the remarkable converse result according to which the constancy of the c^κ_{τσ} implies the existence of a corresponding continuous transformation group. We will not need that result. Recapitulating, Eqs. (6.4.1), (6.4.6), and (6.3.2) yield
    [X_τ, X_σ] = c^κ_{τσ} u^i_κ ∂/∂q^i = c^κ_{τσ} X_κ.   (6.4.9)
The name "structure constants" given to the coefficients c^κ_{τσ} is based on this relation.
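For the rotation group the structure constants can be read off by computing commutators of the matrix generators J_i of Eq. (6.3.7). The following sketch (our check) verifies that [J_i, J_j] = ε_{ijk} J_k, so that the structure constants of the rotation group are c^k_{ij} = ε_{ijk}:

```python
import numpy as np

# Generators from Eq. (6.3.7).
J = np.zeros((3, 3, 3))
J[0] = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
J[1] = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
J[2] = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

def eps(i, j, k):
    """Levi-Civita symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) / 2

# Check [J_i, J_j] = sum_k eps_{ijk} J_k for all index pairs.
for i in range(3):
    for j in range(3):
        comm = J[i] @ J[j] - J[j] @ J[i]
        expected = sum(eps(i, j, k) * J[k] for k in range(3))
        assert np.allclose(comm, expected)
```

The constancy of these coefficients, proved in general above, is here simply visible: the ε_{ijk} do not depend on the group parameters.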
6.5. QUALITATIVE ASPECTS OF INFINITESIMAL GENERATORS
The infinitesimal operators X_i are examples of the vector fields introduced in earlier chapters and are related therefore to directional derivatives. Recall, for example, Eq. (2.4.34):

    ∂/∂y = ĵ = e_y.   (6.5.1)
When introduced, Eq. (2.4.34) was justified only for Euclidean axes. These definitions provided a natural association of basis vectors such as e_y and the corresponding operator ∂/∂y. When applied to function F(x), this operator extracts the (linearized) variation of F(x) as y varies by 1 unit while x and z are held fixed. This is also known as the directional derivative in the y direction of the function F. Before developing this further, we make two qualitative comments.

First, an abstraction that most physicists first grasp in high school is that of instantaneous velocity, as well as its distinction from average velocity. Whereas average velocity relates to actual displacements occurring over finite times, instantaneous velocity yields an imaginary displacement that would occur in unit time if conditions remained unchanged. Another abstract concept is that of "virtual displacement," say δq^i, when the generalized coordinates are q^i. This is a possible displacement, consistent with constraints if there are any; it is a conceivable but not necessarily an actual displacement of the system. The actual motion picks one out of the infinity of possible virtual displacements.

Second, for some purposes the detailed trajectory taken by an object is important, but the particular parameterization of the trajectory, and in particular the time, is not. Here is a far-fetched example: Suppose you are writing a field guide to birds, intended to provide information as to which birds can be seen near which cities. To research it you drive from New York to Florida, taking down at intervals the time, birds observed, and the towns where they were observed. Suppose you make a mistake and read the odometer instead of the clock, or fail to hold your speed constant. Neither of these errors would affect the data, since the time data is used at most for interpolating between towns.
Of course, it is important for the correct route to be taken and for town names to be recorded correctly whenever a bird is observed. The effect of recording distance instead of time is only to change the parameterization of the route, but not the route itself.

We now return to our discussion of continuous transformations. The operator X_ρ is a "directional derivative operator" acting in the direction for which only parameter a^ρ varies. Hence rotation operator R₁ operating on F(q) extracts a quantity proportional to the differential dF in an infinitesimal rotation around the x-axis. If F depended, say, only on r, then dF would vanish. Consider the "infinitesimal rotation" of Example 6.3.4:
    (  1   −b³    b²  )
    (  b³    1   −b¹  )
    ( −b²   b¹     1  ).
This equation requires b to be dimensionless. Substituting b = φ̂ δφ, where φ̂ is a unit vector along the rotation axis, is at least dimensionally consistent:

    (   1    −δφ³    δφ²  )
    (  δφ³     1    −δφ¹  )
    ( −δφ²    δφ¹     1   ).   (6.5.2)
Explicitly, pure rotation around the x-axis through an angle δφ¹ is described by

    x' = x,   y' = y − δφ¹ z,   z' = z + δφ¹ y.   (6.5.3)
Though valid for small rotation angles, these equations clearly break down well before the angle becomes comparable with one radian. They have to be regarded as a "linearized" extrapolation of the instantaneous angular motion as follows. Consider the vector PT = (Δx, Δy, Δz) = (0, −z, y) shown in Fig. 6.5.1. Being tangential, it can be said to be directed "in the direction of instantaneously increasing φ¹." Also, its length being equal to the radius R, it is the tangential motion corresponding to unit increase in coordinate φ¹. An angular change of one radian can scarcely be called an infinitesimal rotation, but one may proceed indirectly, starting with the vector (0, −z/q, y/q), where the numerical factor q is large enough that arc and vector are indistinguishable. Scaling this back up by the factor q produces the segment PT, which is declared to be the geometric representation of instantaneous rotation of 1 radian. Such a "tangent vector" PT can also be associated with a "directional derivative" symbolized by ∂/∂φ¹, with operational meaning the same as those of previously introduced tangent vectors.
FIGURE 6.5.1. Pictorial illustration of the "vector" ∂/∂φ¹. The point T is reached from point P by motion along the tangent vector corresponding to unit increment of φ¹.
Referring again to Fig. 6.5.1, the linearized change of an arbitrary function F(x, y, z) when (x, y, z) changes as in Eq. (6.5.3) is

    dF = F(x, y − δφ¹ z, z + δφ¹ y) − F(x, y, z) = δφ¹ ( −z ∂F/∂y + y ∂F/∂z ).   (6.5.4)

To be consistent with previous terminology, this change dF should be given by δφ¹ X_{φ¹} F, where X_{φ¹} is the infinitesimal operator corresponding to angle φ¹. Guided by this, we define the infinitesimal operator X_{φ¹} and the symbol ∂/∂φ¹ by

    X_{φ¹} = −z ∂/∂y + y ∂/∂z ≡ ∂/∂φ¹.   (6.5.5)
This is dimensionally consistent, since \(\phi^1\) is an angle. Again, even though 1 is not a "small" change in \(\phi^1\), \(X_{\phi^1}F\) yields the change in F that results from simply "scaling up" the first-order Taylor series approximation in the ratio \(1/\delta\phi^1\). This yields the linearized change in F for unit change of the parameter \(\phi^1\) in the direction in which only \(\phi^1\) changes. Also, even if the existence of \(\phi^1\) as a globally defined variable is problematic, the meaning of the partial derivative \(\partial/\partial\phi^1\) is not. The issue of angular units is confusing. If we choose to measure \(\phi^1\) in degrees instead of radians, Eq. (6.5.3) acquires extra factors of \(\pi/180\) multiplying the \(\delta\phi^1\)'s (because they are now in degrees). In order for \(dF = \delta\phi^1 X_{\phi^1}F\) to remain true, Eq. (6.5.5) is replaced by

$$X_{\phi^1} = \frac{\pi}{180}\left(-z\frac{\partial}{\partial y} + y\frac{\partial}{\partial z}\right) = \frac{\partial}{\partial\phi^1}, \qquad \phi^1\ \text{in degrees}.$$
In this case, the "unit" coordinate tangent vector is \((\pi/180)(0, -z, y)\). For pure rotation at angular velocity \(\omega^1\) around the x-axis, the positional changes occurring in time \(\delta t\) are given by

$$x' = x, \qquad y' = y - \omega^1\,\delta t\; z, \qquad z' = z + \omega^1\,\delta t\; y. \qquad(6.5.6)$$
Except for the factor \(\delta t\) (needed if nothing else to mollify physicists concerned with dimensional consistency), this is the transformation \(R(\omega)\) of Example 6.3.4 above.⁶ Since \(\omega^1\) is a velocity, one is perhaps interested in dF/dt, which is given by

$$\frac{dF}{dt} = \omega^1 X_{\phi^1} F = \omega^1 R_1 F. \qquad(6.5.7)$$
⁶The factor \(\delta t\), strongly suggestive of true particle motion, is misleading in Eq. (6.5.6) if virtual displacements are intended. We will be willing to simply suppress the factor \(\delta t\) (that is, set it to 1) while continuing to call \(\omega\) the angular velocity. Then the changes in Eq. (6.5.6) can be regarded as changes per unit time. The need to retain \(\delta t\) in these equations for dimensional consistency can serve as a reminder that the angular velocity \(\omega^1\) is not the quantity to be regarded as the Lie parameter to be identified as a quasi-coordinate. Rather, the Lie transformation to be used is that of Eq. (6.5.2), and \(\phi^1\) is the quasi-coordinate.
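The operational content of Eqs. (6.5.4) and (6.5.5), namely that \(X_{\phi^1} = -z\,\partial/\partial y + y\,\partial/\partial z\) reproduces the first-order change of any function under the rotation (6.5.3), can be checked numerically; the test function below is an arbitrary choice made for illustration.

```python
import numpy as np

def F(x, y, z):
    return y**2 + 3.0*z           # arbitrary test function

def XF(x, y, z):
    # X_{phi^1} F = -z dF/dy + y dF/dz, radian convention of Eq. (6.5.5)
    return -z*(2.0*y) + y*3.0

x, y, z = 0.5, 1.0, 2.0
dphi = 1e-6                        # small rotation angle, in radians
# Change of F under the rotation (6.5.3): y -> y - dphi*z, z -> z + dphi*y
dF = F(x, y - dphi*z, z + dphi*y) - F(x, y, z)
print(dF/dphi, XF(x, y, z))        # agree to first order in dphi
# If phi^1 were measured in degrees, XF would pick up the extra factor pi/180
```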
These comments have been made in preparation for applying the terminology of continuous groups to the Poincaré equation. A quantity such as \(d\phi^1\), the differential of a continuous group parameter, will be identified with \(d\sigma\), the differential of quasi-coordinate \(\sigma\), which is related to quasi-velocity s by \(s = d\sigma/dt\). Suppose, for example, that the role of s is to be played by \(\omega^1\), the angular velocity about the x-axis. For continuous group transformation relevant to this variable, we use Eq. (6.5.3), so that \(\phi^1\) is the quasi-coordinate corresponding to \(\sigma\), and \(s = \omega^1\). The coefficients defined in Eq. (6.1.8) become \(u^i{}_1 = \partial f^i/\partial\phi^1\), or \(u^1{}_1 = 0\), \(u^2{}_1 = -z\), and \(u^3{}_1 = y\), and the infinitesimal operator corresponding to \(\phi^1\) is \(X_{\phi^1} = y\,\partial/\partial z - z\,\partial/\partial y\). It is unfortunate that so much circumlocution has been required. With angular velocities being manifestly well-defined quantities, why not simply define matching angular coordinates? Once again, it is because such angular coordinates do not exist! (As an aid to memory on this point one can visualize taking a three-legged round trip, keeping track of one's "westerly displacement" by integrating one's westerly velocity. For the first leg, proceed in a more or less westerly direction. Proceed next in a pure northerly direction, all the way to the North Pole, as the second leg of the trip. For the final leg, take a pure southerly direction along the longitude passing through the starting point. One returns to the starting point but one's "westerly displacement" does not.) Finally, we can express the remaining terms in the Poincaré equation (or for that matter the Lagrange equation) using the infinitesimal operators \(X_\rho\). Derivatives, with respect to position, of functions such as T or V can be expressed in terms of derivatives with respect to quasi-coordinates such as \(\sigma^\rho\) (which may be a true coordinate) using the relations
$$\frac{\partial T}{\partial\sigma^\rho} = X_\rho T \qquad\text{and}\qquad \frac{\partial V}{\partial\sigma^\rho} = X_\rho V. \qquad(6.5.8)$$
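The nonexistence of global angle coordinates invoked above (the "westerly displacement" round trip) is rooted in the noncommutativity of finite rotations, which can be confirmed directly; the helper function below is purely illustrative.

```python
import numpy as np

def rot(axis, phi):
    # Rotation matrix about a coordinate axis (0 = x, 1 = y, 2 = z)
    c, s = np.cos(phi), np.sin(phi)
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    R = np.eye(3)
    R[i, i] = c; R[j, j] = c
    R[i, j] = -s; R[j, i] = s
    return R

A = rot(0, np.pi/2) @ rot(1, np.pi/2)   # x-rotation applied after y-rotation
B = rot(1, np.pi/2) @ rot(0, np.pi/2)   # y-rotation applied after x-rotation
print(np.linalg.norm(A - B))            # nonzero: finite rotations do not commute
```

Were globally defined angles \(\phi^x, \phi^y, \phi^z\) to exist, the two products would have to agree.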
6.6. THE POINCARÉ EQUATION IN TERMS OF GROUP GENERATORS

In the last chapter, the Poincaré equation was written in what we now appreciate was only a preliminary form. At this point we have developed machinery that permits it to be written in terms of group generators:

$$\frac{d}{dt}\frac{\partial T}{\partial s^\rho} + c^\sigma_{\ \tau\rho}(q)\,s^\tau\frac{\partial T}{\partial s^\sigma} - X_\rho T = -X_\rho V, \qquad \rho = 1, \ldots, n. \qquad(6.6.1)$$
The left side contains "inertial terms," the right side "force terms." Many points of explanation have to be made about these equations, especially because some of the symbols have not been fully defined: The quantities \(s^\rho\) are quasi-velocities, as defined previously. They are related to quasi-coordinates \(\sigma^\rho\) by \(s^\rho = \dot\sigma^\rho\). As we know, it is in general not possible to "integrate" these to define \(\sigma^\rho\) globally, but neither will it be necessary. It
is assumed that the kinetic energy has been reexpressed in terms of the quasi-velocities, T = T(q, s), but we are no longer indicating this with an overhead bar.
The coefficients \(c^\gamma_{\ \alpha\beta}(q)\) have also been defined previously, in Eq. (5.2.17):

(6.6.2)

where the \(s^\rho\) were given in Eq. (5.2.1):

$$s^\rho = A^\rho_{\ i}(q)\,\dot q^i, \qquad \rho = 1, 2, \ldots, n. \qquad(6.6.3)$$
Their dependence on position is indicated explicitly to emphasize the point that, in general, they are not constants. (In many cases of interest they are constant, however.) By inverting Eq. (6.6.3), one obtains expressions for \(\dot q^i\) which, when substituted in Eq. (5.1.3), provide the kinetic energy in the functional form T(q, s).
The quantities \(X_\rho\) are directional derivatives taken in the direction of changing \(\sigma^\rho\), which is to say, in the direction of \(s^\rho\):

$$X_\rho = \frac{\partial}{\partial\sigma^\rho}. \qquad(6.6.4)$$
(In discussing rotational motion below, these operators will be symbolized by \(R_x\), \(R_y\), \(R_z\), or just \(R_i\).) In the most important cases, these directional derivatives are infinitesimal operators corresponding to parameters of a continuous transformation group. In general, they need only correspond to whatever quasi-velocities have been chosen. Notice that the generalized force is calculated by differentiating the potential energy function V in the direction of varying \(\sigma^\rho\), which is just what the expression \(X_\rho V(q)\) provides. The term \(X_\rho T\) has been derived previously in Eq. (5.2.18) in the same way.
The coefficients \(c^\gamma_{\ \alpha\beta}(q)\) can also be expressed in terms of commutators of the Lie operators \(X_\rho\) according to

$$[X_\alpha, X_\beta] = c^\gamma_{\ \alpha\beta}(q)\,X_\gamma. \qquad(6.6.5)$$

If the \(X_\rho\) are infinitesimal operators of a continuous group, the coefficients \(c^\gamma_{\ \alpha\beta}\) are constants, independent of q, as has been shown previously. In this case it is not necessary to obtain the coefficients from Eq. (6.6.2) since they can be obtained from Eq. (6.6.5). For example, for rotations they are given by Eq. (6.3.9).
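For the rotation generators, both the commutator definition and the constancy of the resulting structure constants can be verified symbolically; a minimal check using sympy, with the generators in the Cartesian form \(R_x = y\,\partial_z - z\,\partial_y\) (and cyclic permutations), is sketched below.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)    # arbitrary function of position

def Rx(g): return y*sp.diff(g, z) - z*sp.diff(g, y)
def Ry(g): return z*sp.diff(g, x) - x*sp.diff(g, z)
def Rz(g): return x*sp.diff(g, y) - y*sp.diff(g, x)

# [Rx, Ry] applied to an arbitrary function equals -Rz applied to it,
# i.e. the structure constant is the constant -1, independent of position
comm = sp.expand(Rx(Ry(f)) - Ry(Rx(f)))
print(sp.expand(comm + Rz(f)))   # 0
```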
As with the Lagrange equations, by defining L = T - V, and using the fact that V is independent of velocities, these equations can be simplified somewhat.
Unlike in Lagrangian analysis, where finding appropriate generalized coordinates is the main focus, it is the choice of velocity variables that is central to the use of the Poincaré equations. Because rotational symmetry is so common, the quasi-angles \(\phi^x\), \(\phi^y\), and \(\phi^z\), corresponding to rotation angles around rectangular axes, are the prototypical quasi-coordinates. As we know, these angles do not constitute valid generalized coordinates because of the noncommutativity of rotations, but they are satisfactory as quasi-coordinates. Though attention has been focused on the quasi-velocities, once they have been found it remains to "integrate" them to find actual displacements.
6.7. THE RIGID BODY SUBJECT TO FORCE AND TORQUE

6.7.1. Group Parameters as Quasi-Coordinates

By interpreting group parameters as quasi-coordinates, the infinitesimal operators can also be derived directly, without using the continuous group formalism, though their signs depend on the interpretation of the transformations. Consider the group of translations and rotations in three dimensions

$$x^i = b^i + O^i_{\ k}\,\bar x^k. \qquad(6.7.1)$$
The coordinates \(x^k\) belong to a particular particle, say of mass m. A possible further index distinguishing among particles is not indicated explicitly. If there are N particles, this will have introduced 3N coordinates. But if the system is a rigid body, there must be enough constraints to reduce these to six independent generalized coordinates. The parameters of a transformation group can serve this purpose as quasi-coordinates. Clearly the vector b is to be the quasi-coordinate corresponding to translation. Its corresponding quasi-velocity is \(\mathbf v = \dot{\mathbf b}\). The matrix elements \(O^i_{\ j}\) will parameterize rotational motion. The group is transitive: there is a choice of parameters giving every configuration, and vice versa. As written, this transformation still has too many parameters, however. There are three parameters \(b^i\) and nine parameters of the orthogonal matrix \(O^i_{\ j}\). (Geometrically, the elements \(O^i_{\ j}\) are direction cosines of the axes in one frame relative to the axes in the other frame; this can be seen by assuming \((x^j - b^j)\hat{\mathbf e}_j = \bar x^i\,\bar{\mathbf e}_i\) and using Eq. (6.7.1) to evaluate \(\bar{\mathbf e}_i \cdot \hat{\mathbf e}_j\).) These matrix elements satisfy the orthogonality conditions,

$$O^i_{\ k}\,O^j_{\ k} = \delta^{ij}, \qquad O^k_{\ i}\,O^k_{\ j} = \delta_{ij}, \qquad(6.7.2)$$
where summation on k is implied even though both are upper or both are lower indices. The reduction to a minimal set of independent parameters can proceed as follows. Since the transformation to quasi-coordinates is actually a velocity transformation, we differentiate Eq. (6.7.1) with respect to time, yielding

$$\dot x^i = \dot b^i + O^i_{\ k}\,\dot{\bar x}^k + \dot O^i_{\ k}\,\bar x^k. \qquad(6.7.3)$$
As in Section 4.2.6, we introduce the matrix \(\Omega = O^T\dot O = -\dot O^T O\),

$$\Omega_{ij} = O_{ki}\,\dot O_{kj}, \qquad(6.7.4)$$

that was shown there to be antisymmetric. The components of \(\dot O = O\,\Omega\) are

$$\dot O^i_{\ n} = O^i_{\ j}\,\Omega_{jn}, \qquad \Omega_{jn} = -\epsilon_{jnk}\,\bar\omega^k, \qquad(6.7.5)$$

which can be written as

$$\dot O^i_{\ n} = -\epsilon_{jnk}\,O^i_{\ j}\,\bar\omega^k. \qquad(6.7.6)$$
This exhibits each of the redundant \(\dot O\) velocities (i.e., matrix elements) as a linear superposition of the three independent quasi-velocities \(\bar\omega^k\). We now interpret Eq. (6.7.1) in the spirit of Eqs. (6.2.1) and (6.5.7), with \(\bar x^k \equiv \bar x^k(0)\). Consider an arbitrary function of position \(F(\mathbf x(t))\) and evaluate its derivative dF/dt, which is its rate of change as observed at a point with moving frame coordinates \(\bar x^k\):

$$\frac{dF}{dt} = \dot x^i\,\frac{\partial F}{\partial x^i} = v^i\,\frac{\partial F}{\partial x^i} - \bar\omega^k\,\epsilon_{jnk}\,O^i_{\ j}\,\bar x^n\,\frac{\partial F}{\partial x^i}. \qquad(6.7.7)$$

Because the variation of F can be expressed in terms of the nonredundant variables \(\bar\omega^k\) along with the \(v^i\), together they comprise a complete set of velocities. The infinitesimal translation operators are

$$X_i = \frac{\partial}{\partial x^i}. \qquad(6.7.8)$$

The infinitesimal rotation operators are

$$R_k = -\epsilon_{jnk}\,O^i_{\ j}\,\bar x^n\,\frac{\partial}{\partial x^i}. \qquad(6.7.9)$$

In a problem below, this definition will be shown to be equivalent to our earlier definition of \(R_i\). The infinitesimal translations commute with each other. The structure constants of rotation generators are given by

$$c^1_{\ 32} = -c^1_{\ 23} = 1, \qquad(6.7.10)$$
with cyclic permutation, as the following problems show. These constants will also be obtained in a more direct way later on.
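The antisymmetry of \(\Omega = O^T\dot O\), which is what reduces the nine redundant \(\dot O^i_{\ j}\) to three independent quasi-velocities in Eqs. (6.7.4)-(6.7.6), can be spot-checked numerically on any rotation trajectory; the trajectory chosen below is arbitrary.

```python
import numpy as np

def O_of_t(t):
    # A sample time-dependent rotation: angle t about z followed by angle 2t about x
    cz, sz = np.cos(t), np.sin(t)
    cx, sx = np.cos(2*t), np.sin(2*t)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rx @ Rz

t, h = 0.7, 1e-6
Odot = (O_of_t(t + h) - O_of_t(t - h)) / (2*h)   # numerical time derivative
Omega = O_of_t(t).T @ Odot
print(np.round(Omega + Omega.T, 6))              # zero matrix: Omega is antisymmetric
```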
Problem 6.7.1: From these equations, derive the commutation relations

$$[R_1, R_2] = -R_3. \qquad(6.7.11)$$
Perform a similar operation for cyclic permutations. This result could have been obtained differently; for example, it could have been obtained following solution of the next problem.
Problem 6.7.2: Show that the infinitesimal rotation generators can be written

$$R_k = \frac{\partial}{\partial\phi^k}, \qquad(6.7.12)$$

where \(\phi^x\), \(\phi^y\), and \(\phi^z\) are quasi-angles corresponding to \(\omega^x\), \(\omega^y\), and \(\omega^z\). (Evaluate \(\partial F/\partial\phi^k\) in terms of variations in \(O^j_{\ k}\), and use the fact that \(\partial\dot O/\partial\dot\phi\) and \(\partial O/\partial\phi\) mean the same thing.)
Problem 6.7.3: Evaluate all commutators of the form \([X_i, R_j]\).

In the next section, to calculate the kinetic energy, \(v^k\) will be specialized as being the centroid velocity.

6.7.2. Description Using Body Axes

Calculation of the kinetic energy proceeds exactly as in Lagrangian mechanics. Consider a moving body with total mass M, with centroid at \(\mathbf x_C\) moving with speed v, and rotating with angular velocity \(\bar{\boldsymbol\omega}\) about that point. With a general point in the body located at \(\bar{\mathbf x}\) relative to C, one has
$$\sum m_{(i)}\,\bar{\mathbf x}_{(i)} = 0, \qquad(6.7.13)$$

by virtue of C's actually being the centroid. Then the kinetic energy is given by

$$T = \frac12 M v^2 + \frac12\sum m\,(\bar{\boldsymbol\omega}\times\bar{\mathbf x})^2 = \frac12 M v^2 + \frac12\,\bar I_{mn}\,\bar\omega^m\bar\omega^n, \qquad(6.7.14)$$
where

$$\bar I_{mn} = \sum m\left(\bar{\mathbf x}\cdot\bar{\mathbf x}\;\delta_{mn} - \bar x_m\,\bar x_n\right). \qquad(6.7.15)$$
A notational clumsiness has appeared in (6.7.14) that will recur frequently in the text; it has to do with the symbols \(\bar{\mathbf x}\) and \(\bar{\boldsymbol\omega}\). Since \(\mathbf x\) is a true vector it is not meaningful to associate it with a particular frame, and yet the notation seems to imply that it is so associated. The components \(\bar x^i\) themselves are not subject to this criticism; the overhead bar indicates that the components are taken relative to the moving frame (which is why they are constant). The only meaning the overhead bar on \(\bar{\mathbf x}\) can have is to suggest that these constant components will be the ones to be employed in subsequent calculations that use components. Once \(\bar{\boldsymbol\omega}\) and \(\bar{\mathbf x}\) appear in the form \((\bar{\boldsymbol\omega}\times\bar{\mathbf x})^2\), which, standing for \((\bar{\boldsymbol\omega}\times\bar{\mathbf x})\cdot(\bar{\boldsymbol\omega}\times\bar{\mathbf x})\), is a scalar quantity, it is clear
this quantity could equally well be written \((\boldsymbol\omega\times\mathbf x)^2\). Even a hybrid expression like \(\boldsymbol\omega\times\bar{\mathbf x}\) could enter without error provided components of \(\mathbf x\) and \(\boldsymbol\omega\) and their cross product are all taken in the same frame.⁷ As usual, to simplify T, one can choose body-fixed axes and orient them along the principal axes. In that case the kinetic energy is given by

$$T = \frac12 M v^2 + \frac12\left(\bar I_1(\bar\omega^1)^2 + \bar I_2(\bar\omega^2)^2 + \bar I_3(\bar\omega^3)^2\right). \qquad(6.7.16)$$

For substitution into the Poincaré equation we calculate partial derivatives of Eq. (6.7.16), assuming the elements \(\bar I_\rho\) are constant because (in this case) the axes are fixed in the body:

$$\frac{\partial T}{\partial v^i} = M v^i, \qquad \frac{\partial T}{\partial\bar\omega^\rho} = \bar I_{(\rho)}\,\bar\omega^\rho, \qquad(6.7.17)$$

where parentheses indicate absence of summation. Before including external forces, we consider the force-free case. Substitution into Eq. (6.6.1), with \(s \to \bar\omega\) and using structure constants \(c^\gamma_{\ \alpha\beta}\) from Eq. (6.7.10) yields
$$M\dot v^i = 0, \qquad \bar I_1\,\dot{\bar\omega}^1 + \bar\omega^2\bar\omega^3(\bar I_3 - \bar I_2) = 0, \qquad(6.7.18)$$

and cyclic permutations. Clearly the first equation(s) describe translational motion, the second rotational. It is pleasing that the Euler rotational equations and the centroid translation equations emerge side-by-side without having been subjected to individualized treatment. Furthermore, though developing the machinery has been painful, once developed, the equations of rotational motion have been written down almost by inspection. Forced motion is described by including the right-hand sides of Eqs. (6.6.1),
$$M\dot v^i = -X_i V, \qquad \bar I_1\,\dot{\bar\omega}^1 + \bar\omega^2\bar\omega^3(\bar I_3 - \bar I_2) = -R_1 V, \qquad(6.7.19)$$
and cyclic permutations. The right-hand sides are externally applied force and torque, respectively. The three quantities \((X_1, X_2, X_3)V\) are the components of a manifestly true vector \(\mathbf X V\), and \((R_1, R_2, R_3)V\) is a true (pseudo)vector \(\mathbf R V\).⁸ This result will soon be exploited to simplify the equations by permitting freedom in the choice of coordinate frames. The first of Eqs. (6.7.19) applies to translational motion, the second to rotational motion. Whatever influence external forces have on these motions is contained in the dependencies of the potential energy V, which, it should be remembered, is a sum over the mass distribution:
$$V = \sum_i e_{(i)}\,V'(\mathbf x_{(i)}). \qquad(6.7.20)$$

⁷It is especially important not to make the mistake of assuming \(\bar{\boldsymbol\omega} = -\boldsymbol\omega\) even though, given that \(\boldsymbol\omega\) describes the motion of the moving frame relative to the fixed frame, this might seem to be the natural meaning of \(\bar{\boldsymbol\omega}\).
⁸There is a clash between the use of boldface to indicate that \(X_1\), \(X_2\), and \(X_3\), in addition to being vector fields, are also the components of a three-component object \(\mathbf X\). In the examples in the next section, the boldface notation will be dropped.
Here potential energy V has been written as an explicit sum over the particles making up the rigid body, with "charges" \(e_{(i)}\) (which would be defined as masses \(m_{(i)} = e_{(i)}\) in the gravitational case) to permit the potential \(V'(\mathbf x)\) (potential energy per unit charge) to be regarded as an externally imposed field. In the final form of Eq. (6.7.20), the subscripts (i) have been suppressed, as they will be in most of the subsequent equations; they will have to be restored as appropriate. In this way gravitational forces are cast into terms like those used in electromagnetism. We assume V' is time-independent in the (inertial) space frame. Corresponding to V' we define a "force intensity" field at the position of particle (i),
$$\mathbf F'_{(i)} = -\frac{\partial V'}{\partial\mathbf x}\bigg|_{\mathbf x_{(i)}}. \qquad(6.7.21)$$

For simplicity, assume \(\mathbf F'\) is approximately constant over the body. The first Poincaré equation becomes Newton's law for the motion of a point mass,

$$M\dot{\mathbf v} = \Big(\sum_i e_{(i)}\Big)\,\mathbf F'. \qquad(6.7.22)$$

Rotational motion is controlled by applied torque. For unconstrained motion in a uniform force field there is no torque about the centroid. In practical cases of unconstrained motion in a nonuniform force field, the resulting translational motion has the effect of making the force at the position of the body change with time. Since this would make it impossible to decouple the rotational and the translational motion in general, we exclude that possibility and consider rotational motion with one point, not necessarily the centroid, fixed. When using body-frame coordinates, tumbling of the body causes the force components \(\bar F^{\ \alpha}_{(i)} \equiv e_{(i)}\bar F'^{\ \alpha}_{(i)}\) acting on particle (i) at location \(\bar{\mathbf x}_{(i)}\) to have nontrivial variation with time. To get around this complexity, let us work out the right-hand side of the Poincaré equation in the space frame, and later use the recently demonstrated fact that force is a vector to obtain its body-frame coordinates. Though it has been necessary to introduce body-frame components \(\bar F'_{(i)\alpha}\), it is not necessary to introduce a symbol \(\bar{\mathbf F}'\), as the force intensity \(\mathbf F'\) is a true vector. Assume \(\mathbf F'\) is constant in space and time, with its value being \(F'_0\,\hat{\mathbf q}\), a vector pointing along a fixed-in-space direction \(\hat{\mathbf q}\), whose (space-frame) components are \(q^i\). (For example, in a uniform gravitational force field, \(\mathbf F' = g\,\hat{\mathbf q}\), \(F'_0 = g\), and \(\hat{\mathbf q}\) would usually be taken to be \(-\hat{\mathbf k}\), pointing vertically downward along the z-axis.) Hence we have
$$\frac{\partial V'}{\partial\mathbf x} = -F'_0\,\hat{\mathbf q}, \qquad\text{or}\qquad V'(\mathbf x) = -F'_0\,x^i q^i \equiv -F'_0\,(\mathbf x\cdot\hat{\mathbf q}). \qquad(6.7.23)$$
According to Eq. (6.3.9), the rotation generators are \(R_j = \epsilon_{jnm}\,x^n(\partial/\partial x^m)\), and the right-hand side of the rotational equation (6.7.19) becomes

$$-R_j V = \sum e\,F'_0\,\epsilon_{jnm}\,x^n q^m = \sum e\,(\mathbf x\times\mathbf F')_j. \qquad(6.7.24)$$
This can be recognized to be the external torque, which is a true vector (actually a pseudovector). Taking advantage of its invariant property, as anticipated above, its body-frame components can be substituted directly into Eq. (6.7.19),

$$\bar I_1\,\dot{\bar\omega}^1 + \bar\omega^2\bar\omega^3(\bar I_3 - \bar I_2) = \sum e\,(\bar{\mathbf x}\times\bar{\mathbf F}')_1, \qquad(6.7.25)$$
and cyclic permutations.
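With the torque on the right-hand side set to zero these are the free Euler equations, which conserve both the rotational kinetic energy and the magnitude of the body-frame angular momentum. A short numerical integration confirms this; the principal moments and initial condition below are assumed sample values.

```python
import numpy as np

Ip = np.array([1.0, 2.0, 3.0])     # sample principal moments of inertia

def wdot(w):
    # Free Euler equations: I1*w1' = (I2 - I3)*w2*w3, and cyclic permutations
    return np.array([(Ip[1] - Ip[2])*w[1]*w[2] / Ip[0],
                     (Ip[2] - Ip[0])*w[2]*w[0] / Ip[1],
                     (Ip[0] - Ip[1])*w[0]*w[1] / Ip[2]])

w = np.array([0.3, 1.0, 0.2])
E0 = 0.5*np.dot(Ip, w**2)          # rotational kinetic energy
L0 = np.linalg.norm(Ip*w)          # magnitude of angular momentum, body frame
dt = 5e-4
for _ in range(20000):             # midpoint (2nd-order Runge-Kutta) steps to t = 10
    k1 = wdot(w)
    w = w + dt*wdot(w + 0.5*dt*k1)
print(0.5*np.dot(Ip, w**2) - E0)   # ~0: energy conserved
print(np.linalg.norm(Ip*w) - L0)   # ~0: |L| conserved
```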
Problem 6.7.4: To an observer stationed on the rigid body, the gravitational field, though spatially uniform, has a varying direction \(\bar{\mathbf q}(t)\). (a) Show that its time derivative is

$$\dot{\bar{\mathbf q}} = \bar{\mathbf q}\times\bar{\boldsymbol\omega}. \qquad(6.7.26)$$
Justify the sign, and (since the bars on the symbols for the vectors are either ambiguous or redundant) break out the same equation into equations for the separate body-frame coordinates. (b) The potential energy V of the body (not to be confused with potential V') acquires time dependence because it depends on the body's orientation relative to the gravitational axis or, if you prefer, on the orientation of the gravitational axis in the body frame. This can be expressed functionally as \(V = V(\bar{\mathbf q}(t))\). The (time-varying) "gradient" of this function is \(\nabla_{\bar q}V\). Show that the Poincaré equation can be written

$$\bar I_1\,\dot{\bar\omega}^1 + \bar\omega^2\bar\omega^3(\bar I_3 - \bar I_2) = (\bar{\mathbf q}\times\nabla_{\bar q}V)_1, \qquad(6.7.27)$$

with cyclic permutations. Paired with Eq. (6.7.26), this is known as the Euler-Poisson equation. Its virtue is that the vector \(\bar{\mathbf q}\) is independent of position in the body, facilitating the calculation of \(V(\bar{\mathbf q})\).
with cyclic permutations. Paired with Eq. (6.7.26), this is known as the EulerPoisson equation. Its virtue is that the vector ;ri is independent of position in the body, facilitating the calculation of V (7). 6.8. EXAMPLE: ROLLING SPHERE A spherical body has the simplifying feature that the elements of its moment of inertia tensor are constant even in the space frame. This permits us to use space-frame velocities as the quasi-velocities in the Poincark equation. Also, it seems rather pedantic to continue to employ boldface symbols for the generators and partial derivatives that enter only for the purpose of emphasizing their interpretation as vector fields. We therefore review and simplify somewhat.
6.8.1. Commutation Relations Appropriate for Rigid Body Motion

The continuous transformation equations (Eq. (6.7.1))

$$x^i = b^i + O^i_{\ k}\,\bar x^k \qquad(6.8.1)$$
FIGURE 6.8.1. Vector diagram illustrating \(R_z\), the generator of infinitesimal rotation about the z-axis.
form the "Euclidean Lie group in three dimensions" because they preserve lengths and angles. The infinitesimal displacement generators are

$$X_i = \frac{\partial}{\partial x^i}, \qquad(6.8.2)$$

and (as shown in Fig. 6.8.1) the infinitesimal rotation operators are \(R_i = \epsilon_{ijk}\,x^j\,\partial/\partial x^k\),

$$R_x = y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}, \qquad R_y = z\frac{\partial}{\partial x} - x\frac{\partial}{\partial z}, \qquad R_z = x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}, \qquad(6.8.3)$$

and the latter are equivalent to

$$R_x = \frac{\partial}{\partial\phi^x}, \qquad R_y = \frac{\partial}{\partial\phi^y}, \qquad R_z = \frac{\partial}{\partial\phi^z}. \qquad(6.8.4)$$
and the latter are equivalent to (6.8.4) These operators satisfy commutation relations
[Xi, Xjl = 0, (6.8.5)
[xi, R j ] = -6ijkXkt [Ri,Rjl = -cijkRk,
which constitute the "Lie algebra" of the Euclidean group. The Poincaré equations, with potential energy U, are

$$\frac{d}{dt}\frac{\partial T}{\partial\omega^\rho} + c^\sigma_{\ \tau\rho}(q)\,\omega^\tau\frac{\partial T}{\partial\omega^\sigma} - X_\rho T = -X_\rho U. \qquad(6.8.6)$$
This is the equation for quasi-coordinate \(\pi^\rho\), whose corresponding quasi-velocity is \(\omega^\rho = \dot\pi^\rho\) and whose infinitesimal generator is \(X_\rho = \partial/\partial\pi^\rho\). We interpret (x, y, z) as the laboratory coordinates of the center of mass of a moving rigid body, \((v^x, v^y, v^z)\) as the corresponding velocities, and \((\omega^x, \omega^y, \omega^z)\) as the instantaneous angular velocity of the body, as measured in the laboratory. The structure coefficients \(c^\kappa_{\ \rho\sigma}\) were defined in Eq. (6.6.5):

$$[X_\rho, X_\sigma] = c^\kappa_{\ \rho\sigma}\,X_\kappa. \qquad(6.8.7)$$
They can be obtained simply by identifying coefficients in Eq. (6.8.5).
6.8.2. Bowling Ball Rolling without Slipping

The sphere of unit mass and unit radius shown in Fig. 6.8.2 rolls without slipping on a horizontal plane. The moment of inertia of such a sphere about a diameter is I = 0.4. For specifying rotational motion, one uses axes parallel to the fixed-frame axes, but with origin at the center of the sphere. Since the Poincaré equation is to be used, there is no need for concern that these would not be legitimate as Lagrangian generalized coordinates. There are two conditions for rolling without sliding,

FIGURE 6.8.2. Bowling ball rolling without slipping on a horizontal alley. In addition to constraint force components Rx and Ry and external forces Fx and Fy, which are shown, there are also possible external torques about the center of the sphere Kx, Ky, and Kz. All vertical force components cancel.
$$v^x = \omega^y, \qquad v^y = -\omega^x, \qquad(6.8.8)$$

and these imply

$$\dot v^x = \dot\omega^y, \qquad \dot v^y = -\dot\omega^x. \qquad(6.8.9)$$
To record the structure constants in an orderly way let us assign indices according to \((\omega^x, \omega^y, \omega^z, x, y) \to (1, 2, 3, 4, 5)\). There are five nontrivial Poincaré equations, even though at any instant there are only three degrees of freedom. The excess is accounted for by the two conditions for rolling. The Lagrangian expressed in quasi-velocities is

$$L = \frac12\left(I(\omega^1)^2 + I(\omega^2)^2 + I(\omega^3)^2 + (\omega^4)^2 + (\omega^5)^2\right). \qquad(6.8.10)$$
The nonvanishing derivatives of L are

$$\frac{\partial L}{\partial\omega^1} = I\omega^1, \quad \frac{\partial L}{\partial\omega^2} = I\omega^2, \quad \frac{\partial L}{\partial\omega^3} = I\omega^3, \quad \frac{\partial L}{\partial\omega^4} = \omega^4, \quad \frac{\partial L}{\partial\omega^5} = \omega^5. \qquad(6.8.11)$$

The nonvanishing commutators are

$$[R_1, R_2] = -R_3, \quad [R_2, R_3] = -R_1, \quad [R_3, R_1] = -R_2, \quad [X_4, R_3] = X_5, \quad [X_5, R_3] = -X_4. \qquad(6.8.12)$$
Because the ball stays in the same plane, it is not necessary to retain \(X_6 = \partial/\partial z\). The nonvanishing operators appear on the right side of Eq. (6.8.12), and the corresponding structure constants are their coefficients:

$$c^1_{\ 23} = -1, \quad c^1_{\ 32} = 1, \quad c^2_{\ 31} = -1, \quad c^2_{\ 13} = 1, \quad c^3_{\ 12} = -1, \quad c^3_{\ 21} = 1,$$
$$c^5_{\ 43} = 1, \quad c^5_{\ 34} = -1, \quad c^4_{\ 53} = -1, \quad c^4_{\ 35} = 1. \qquad(6.8.13)$$
Let the transverse components of the force of constraint be \(R_4\) and \(R_5\), and allow for the possibility of external transverse force components \(F_4\) and \(F_5\) (for example
because the plane is tilted) as well as external torques \((K_1, K_2, K_3)\) about the center of the sphere. The constraint force itself provides torque \((R_5, -R_4, 0)\). The vertical components of F and R need not be introduced, as they will always cancel. The Poincaré equations are
$$I\dot\omega^1 = R_5 + K_1, \qquad I\dot\omega^2 = -R_4 + K_2, \qquad I\dot\omega^3 = K_3,$$
$$\dot\omega^4 + \omega^3\omega^5 = R_4 + F_4, \qquad \dot\omega^5 - \omega^3\omega^4 = R_5 + F_5. \qquad(6.8.14)$$
Reexpressed in more intuitive symbols these become

$$I\dot\omega^x = R_y + K_x, \qquad I\dot\omega^y = -R_x + K_y, \qquad I\dot\omega^z = K_z,$$
$$\dot v^x + \omega^z v^y = R_x + F_x, \qquad \dot v^y - \omega^z v^x = R_y + F_y. \qquad(6.8.15)$$
As a special case, temporarily suppose that \(\omega^z = 0\); for this to be compatible with the third equation implies \(K_z = 0\). Substituting from Eq. (6.8.9) the equations become

$$I\dot v^x = -R_x + K_y, \qquad -I\dot v^y = R_y + K_x, \qquad \dot v^x = R_x + F_x, \qquad \dot v^y = R_y + F_y. \qquad(6.8.16)$$
These equations permit the constraint forces to be calculated from the external forces:

$$R_x = \frac{1}{1+I}K_y - \frac{I}{1+I}F_x, \qquad R_y = -\frac{1}{1+I}K_x - \frac{I}{1+I}F_y. \qquad(6.8.17)$$
(These equations imply that the absence of external forces and torques implies the absence of constraint forces, so that ignoring all forces gives the correct equations of motion. But this simplification will be seen below to be fortuitous; in general, the forces of constraint should be allowed for, and then eliminated using the rolling conditions.) Substituting Eq. (6.8.17) into Eq. (6.8.16) yields

$$\dot v^x = \frac{1}{1+I}K_y + \frac{1}{1+I}F_x, \qquad \dot v^y = -\frac{1}{1+I}K_x + \frac{1}{1+I}F_y. \qquad(6.8.18)$$
These equations show that, in the absence of external torque, the ball responds to external forces like a point mass, but with its apparent mass being increased by the factor 1 + I. This result is derived, for example, in Landau and Lifshitz [1], as well as (painfully) in Whittaker [2]. From our derivation above we can see that the result is somewhat oversimplified, in that it makes no allowance for the possibility that the sphere is initially spinning around a vertical axis. To illustrate this possibility (in a special case), return to Eqs. (6.8.15), suppose that external forces and torques vanish, and impose initial condition \(\omega^z = \omega_0 \neq 0\). The equations of motion are
$$I\dot\omega^x = R_y, \qquad I\dot\omega^y = -R_x, \qquad I\dot\omega^z = 0,$$
$$\dot v^x + \omega^z v^y = R_x, \qquad \dot v^y - \omega^z v^x = R_y. \qquad(6.8.19)$$
The third equation implies that \(\omega^z\) remains constant. Using conditions (6.8.8), the equations reduce to

$$(1+I)\,\dot v^x = -\omega_0\,v^y, \qquad (1+I)\,\dot v^y = \omega_0\,v^x. \qquad(6.8.20)$$
These equations imply that the sphere does not travel in a straight line, even when subjected to no external force. In fact, the ball follows a circular path. (This presumably accounts for a bowler’s ability to make a bowling ball curve or a pool player’s ability to escape a seemingly impossible “snooker.”) The vertical component of angular momentum, due initially to the spin imparted to the ball on release, causes the total angular momentum to be not quite horizontal. This requires the constraint to exert a transverse force that causes the ball to curve.
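The circular path can be exhibited directly: Eq. (6.8.20) says the velocity vector rotates at the constant rate \(\Omega = \omega_0/(1+I)\), so after time \(2\pi/\Omega\) both velocity and position return to their initial values. In the sketch below the planar velocity is packed into a complex number; the sense of circulation depends on the sign conventions adopted.

```python
import numpy as np

I, omega0 = 0.4, 2.0               # unit sphere (I = 0.4) with initial vertical spin
Om = omega0 / (1 + I)              # rotation rate of the velocity vector
v = 1.0 + 0.0j                     # planar velocity, vx + i*vy
r = 0.0 + 0.0j                     # planar position
dt = 1e-4
n = int(round(2*np.pi / Om / dt))  # number of steps in one expected period
for _ in range(n):
    r += v*dt                      # accumulate position
    v *= np.exp(1j*Om*dt)          # v' = i*Om*v, advanced exactly
print(abs(r))                      # ~0: the ball has come back, tracing a circle
print(abs(v - 1.0))                # ~0: initial velocity restored
```

The radius of the circle is \(|v|/\Omega = |v|(1+I)/\omega_0\).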
Problem 6.8.1: A (riderless) "skateboard" is a pointlike object supported by a plane surface which has a line defined such that the skateboard slides or rolls without friction along that line, but not at all in the transverse direction. It can also rotate about the axis normal to the surface and passing through the single point of contact. Let the plane be inclined by a fixed angle \(\Theta\) relative to the horizontal and let (x, y), with y-axis horizontal, be the coordinates of the skateboard in that plane. Let \(\phi(t)\) be the instantaneous angle between the skateboard axis and the x-axis. The skateboard has mass m and rotational inertia such that its rotational kinetic energy is \(I\dot\phi^2/2\). Its potential energy is \(-mg\sin\Theta\;x\). (a) Write the Lagrangian \(L(x;\, \dot x, \dot y, \dot\phi)\), and express the sliding constraint as a linear relation among the velocities. (b) For quasi-coordinates \(x^1 = x\), \(x^2 = y\), and \(x^3 = \phi\), evaluate all coefficients \(c^\kappa_{\ \rho\sigma}\).
(c) Write the Poincaré equation for \(\phi(t)\). Solve it for the initial conditions \(\phi(0) = 0\), \(\dot\phi(0) = \omega_0\). (d) Write the Poincaré equations for x and y. Solve them assuming the skate is at rest at the origin at t = 0. [Answer: \(x = g\sin\Theta\,\sin^2(\omega_0 t)/(2\omega_0^2)\), \(y = g\sin\Theta\,(2\omega_0 t - \sin 2\omega_0 t)/(4\omega_0^2)\).] (e) As time is allowed to increase without limit, give the maximum displacements down the hill and horizontally along the hill.
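The answer quoted in part (d) can be checked by direct quadrature of the constrained equations: with \(\phi = \omega_0 t\), the speed u along the board obeys \(\dot u = g\sin\Theta\,\cos\omega_0 t\), and \((\dot x, \dot y) = u\,(\cos\phi, \sin\phi)\). The normalization \(g\sin\Theta = 1\) below is assumed only for this check.

```python
import numpy as np

g_sin = 1.0                       # g*sin(Theta), normalized to 1 for the check
w0 = 3.0
t = np.linspace(0.0, 5.0, 2001)
dt = t[1] - t[0]
u = (g_sin/w0)*np.sin(w0*t)       # integral of u' = g_sin*cos(w0*t), with u(0) = 0
x = np.cumsum(u*np.cos(w0*t))*dt  # crude left-endpoint quadrature of x' = u*cos(phi)
y = np.cumsum(u*np.sin(w0*t))*dt  # and of y' = u*sin(phi)
x_exact = g_sin*np.sin(w0*t)**2 / (2*w0**2)
y_exact = g_sin*(2*w0*t - np.sin(2*w0*t)) / (4*w0**2)
print(np.max(np.abs(x - x_exact)), np.max(np.abs(y - y_exact)))  # both small
```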
Problem 6.8.2: A spherical marble of unit mass rolls without sliding on the inside of a circular cylinder whose axis is perfectly vertical. If the marble is released from rest it will obviously roll straight down with ever-increasing speed. Assuming it is released with finite initial transverse speed, solve for its subsequent motion. Be sure to allow for the possibility of initial angular velocity about an axis through the point of contact and normal to the surface. You should obtain the (surprising) result that the ball does not "fall out the bottom," but rather oscillates up and down the tube.

Problem 6.8.3: Hospital beds and some carts roll on wheels attached by casters that swivel at one end and are fixed at the other, as shown in Fig. 6.8.3. To control the position (x, y) and angle \(\theta\) of the cart, forces \(F_f\) or \(F_s\) are applied at the midpoints between the wheels. (a) Write the equations of motion and constraint equations governing the motion. (b) Discuss the relative efficacy of pushing the cart from the fixed and swivel ends and explain the way you expect the solutions of the equations of motion to analytically confirm this behavior. (c) Complete the solution discussed in the previous part.
Problem 6.8.4: This problem exhibits the true power of the Poincaré, Lie algebraic approach, but don't expect to complete it in less than a day. It explains the motion of a "tippie-top," the curious toy which, when set spinning "right way up," ends up
FIGURE 6.8.3. The wheels at one end of a rolling cart or hospital bed are "fixed," while those at the other end are free to swivel. The cart can be propelled by forces \(F_f\) at the fixed end or \(F_s\) at the swivel end.
spinning "upside down" until it eventually falls over. The tippie-top will not be mentioned again, however, since the system to be analyzed is slightly idealized. A solid plastic spherical bowling ball, radius R, mass M, has one (instead of the usual three) shallow holes drilled on its periphery. This hole is filled with some material of density different from that of the plastic, so that the outer surface is again a perfectly smooth sphere. The drilled-out hole is small enough to be treated as a point mass m at point P. (To cover the possibility that the filling material has lower density than plastic, m may be negative.) The ball rolls or spins without sliding on a hard surface. Friction (coefficient \(\mu \approx 0.1\)) prevents the ball from sliding. For many purposes dissipation (due to the friction) is negligible, but to avoid unphysical residual motion as \(t \to \infty\) one can assume the ball is flattened in a tiny circle where it contacts the floor. (a) Define variables describing the motion of the ball, including the position of point P. Specify relations among redundant variables clearly. Define all relevant forces and torques and give formulas for them. Write formulas describing the constraints and make the no-slip restriction explicit. (b) Write equations of motion for the system that are free of approximation (other than those already mentioned). It is not realistic to solve these problems in complete generality (except numerically), but there are special cases where the motion can be described approximately for some appreciable period of time by analytic formulas, and they can be used to infer how the motion will evolve qualitatively over long periods of time. The approximation that is most productive is to treat the extra mass perturbatively, assuming \(m \ll M\). You are expected to use discretion as to what approximations are both useful and valid.
In this spirit, discuss the motion in the following cases, giving information about the (quantitative) early motion and the (qualitative) eventual motion. Dependence of qualitative features on initial angular velocity should be featured. (c) The ball is initially approximately "sleeping": spinning with angular velocity \(\omega_0\) about a vertical axis, with the point P almost on the rotation axis and the centroid at rest. Consider all four possibilities: P near top or bottom, m > 0 or m < 0. (d) Same setup as in (c) but the point P is initially at angle \(\theta_0\) relative to the vertical axis. Again consider all four possibilities. Attempt to find a "pure precession" in which the height of point P stays constant. (e) The ball is launched with no vertical spin but rolling straight down the bowling lane. The loaded point P starts out exactly at right angles to the motion so its initial velocity is equal to the centroid velocity. (The predicted motion can be persuasively confirmed by rolling a tippie-top on its side. It can also be used to understand how a bicycle can be steered by leaning.)
SIMPLIFYING THE POINCARE EQUATION WITH GROUP THEORY
BIBLIOGRAPHY

References

1. L. D. Landau and E. M. Lifshitz, Mechanics, Pergamon, Oxford, 1976, p. 124.
2. E. T. Whittaker, A Treatise on the Analytical Dynamics of Particles and Rigid Bodies, Cambridge University Press, Cambridge, UK, 1989.

References for Further Study

Section 6.1 M. Hamermesh, Group Theory and Its Application to Physical Problems, Addison-Wesley, Reading, MA, 1962.
Section 6.7.2 N. G. Chetaev, Theoretical Mechanics, Springer-Verlag, Berlin, 1989.
Section 6.8.2 V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt, Dynamical Systems III, Springer-Verlag, Berlin, 1990.
7 CONSERVATION LAWS AND SYMMETRY

The material in this chapter does not belong exclusively to Newtonian, Lagrangian, or Hamiltonian mechanics, as conservation laws can be derived using any of the formalisms. It is placed here largely to provide examples that exercise the Poincaré approach and to show that certain ideas familiar from Lagrangian mechanics carry over trivially to “Poincaré mechanics.” The main area that can be said to be strikingly simplified by the Poincaré procedure is that of “integrability,” which is briefly discussed. Noether’s theorem, which provides the most fundamental description of conservation laws, will also be discussed briefly here, even though it might be more appropriately deferred until after discussion of the tangent space and the development of symplectic mechanics in the Hamiltonian context. To make the present discussion self-contained, the “tangent bundle” is introduced here, even though the main discussion of it will be reserved for a later chapter, where it will be reintroduced. Because continuous transformation groups provided the original motivation for the Poincaré equation, and because the mathematical description of symmetry is based on groups, it is natural to use the Poincaré equation to investigate the effect of symmetry on mechanical systems.

7.1. MULTIPARTICLE CONSERVATION LAWS

7.1.1. Conservation of Linear Momentum

The kinetic energy of a system of N particles is
T = \frac{1}{2}\sum_{i=1}^{N} m_{(i)}\left(v_{x(i)}^2 + v_{y(i)}^2 + v_{z(i)}^2\right). \qquad (7.1.1)
Here, and in the future, replacements like x_(i) → x will be made to reduce clutter. The presence of the summation sign is the only reminder that there is one term for each of the N particles. An essential simplifying feature of these rectangular coordinates, one that is not valid in general, is that the coefficients of the quadratic velocity terms are independent of position. The particle-(i)-specific infinitesimal displacement generators are ∂/∂x_(i), ∂/∂y_(i), and ∂/∂z_(i). The Poincaré equation for v_{x(i)} = ẋ_(i) is

m_{(i)}\,\dot v_{x(i)} = -\frac{\partial U}{\partial x_{(i)}}, \qquad (7.1.2)
not different from the Lagrange equation, or for that matter from Newton’s equation. This has used the fact that ∂T/∂x_(i) = 0, which is a manifestation of the invariance of kinetic energy under pure translation. Defining total mass, centroid displacement, and velocity by

M = \sum m, \qquad M\mathbf{X} = \sum m\,\mathbf{x}, \qquad M\mathbf{V} = \sum m\,\mathbf{v}, \qquad (7.1.3)
and summing Eqs. (6.8.13) yields

M\frac{d\mathbf{V}}{dt} = -\frac{\vec{\partial}}{\partial\mathbf{X}}\,U, \qquad (7.1.4)

where the operator
(7.1.5) is the infinitesimal (vector) generator that translates all particles equdly. The somewhat ad hoc inclusion of an overhead arrow is to indicate that mere is one operator for each component. Operating on -U,the vector of operators yields the components of the total force FtOt. Suppose translation of the mechanical system parallel to the ( x , y) plane generates a “congruent” system. This would be true for a displacement parallel to the earth’s surface (when treating the earth as flat) with vertical gravitational acceleration g . In this case
&
a -u Bx
a
= ---u = 0,
By
(7.1.6)
with the result that MV^X and MV^Y are “constants of the motion.” But MV^Z is not constant in this case. The case of linear momentum conservation just treated has been anomalously simple because of the simple relation between velocity and momentum. A more general formulation would have involved defining the linear momentum vector p_(i) by

\mathbf{p}_{(i)} = \frac{\partial L}{\partial\mathbf{v}_{(i)}}, \qquad (7.1.7)

which would have led to the same result.

7.1.2. Rate of Change of Angular Momentum: Poincaré Approach
For a rigid body rotating with one point fixed, the kinetic energy summation of Eq. (7.1.1) can be reexpressed in terms of particle-specific angular velocities about the fixed point, (ω^x_(i), ω^y_(i), ω^z_(i)) (see Eq. 6.7.15),

T_{\rm rot} = \frac{1}{2}\sum m\Big((y^2+z^2)(\omega^x)^2 + (x^2+z^2)(\omega^y)^2 + (x^2+y^2)(\omega^z)^2 - 2yz\,\omega^y\omega^z - 2xz\,\omega^x\omega^z - 2xy\,\omega^x\omega^y\Big), \qquad (7.1.8)

(even though (ω^x_(i), ω^y_(i), ω^z_(i)) are invalid Lagrangian velocities). Define particle-specific operators

R_{(i)x} = y_{(i)}\frac{\partial}{\partial z_{(i)}} - z_{(i)}\frac{\partial}{\partial y_{(i)}}, \quad\text{and cyclic permutations.} \qquad (7.1.9)

We have seen previously that these are equivalent to

R_{(i)x} = \frac{\partial}{\partial\phi^x}, \qquad R_{(i)y} = \frac{\partial}{\partial\phi^y}, \qquad R_{(i)z} = \frac{\partial}{\partial\phi^z}, \qquad (7.1.10)

where φ = (φ^x, φ^y, φ^z) is a “vector” of quasi-angles. Since appearance of these angles would not be valid in Lagrangian mechanics, the following development would not be valid there (though the same results can be obtained from Newton’s equations). It does represent a considerable emancipation to be able to work, guilt-free, with angular velocity components. We also define
the corresponding whole-body generators,

\vec R = \sum_{i=1}^{N}\vec R_{(i)}. \qquad (7.1.11)

The first term needed for substitution in the equation for quasi-velocity ω^x is

\frac{d}{dt}\frac{\partial T_{\rm rot}}{\partial\omega^x} = \frac{d}{dt}\sum m\left((y^2+z^2)\,\omega^x - xz\,\omega^z - xy\,\omega^y\right). \qquad (7.1.12)
Also, using structure constants from Eq. (6.8.13),

\frac{1}{m}\frac{\partial T_{\rm rot}}{\partial\omega^y}\,(-\omega^z) = -(x^2+z^2)\,\omega^y\omega^z + yz\,(\omega^z)^2 + xy\,\omega^x\omega^z,

\frac{1}{m}\frac{\partial T_{\rm rot}}{\partial\omega^z}\,(\omega^y) = (x^2+y^2)\,\omega^y\omega^z - yz\,(\omega^y)^2 - xz\,\omega^x\omega^y. \qquad (7.1.13)
On substitution into the Poincaré equation, these terms all cancel. Defining

L_x = \frac{\partial T_{\rm rot}}{\partial\omega^x}, \qquad L_y = \frac{\partial T_{\rm rot}}{\partial\omega^y}, \qquad L_z = \frac{\partial T_{\rm rot}}{\partial\omega^z}, \qquad (7.1.14)
the Poincaré equation for L_x is

\frac{d}{dt}L_x = -R_x U = -\frac{\partial U}{\partial\phi^x}, \qquad (7.1.15)

with similar equations for L_y and L_z. These are the three components of the angular momentum vector L, and

\frac{d\mathbf{L}}{dt} = -\frac{\partial U}{\partial\vec\phi} = \mathbf{K}, \qquad (7.1.16)
where K is called the applied torque. If the externally applied force applies no torque around some axis, say the z-axis, then ∂U/∂φ^z = 0, and it follows that the component of L along that axis is conserved. If U is independent of direction (“isotropic”), then all components of the angular momentum vector L are conserved.

Problem 7.1.1: In the derivation of angular momentum conservation just completed, no account was taken of “internal” forces of one mass within the rotating object acting on another. Show that including such forces does not alter the result.
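The torque relation of Eq. (7.1.16) can be checked numerically. The following Python sketch is an addition to the text; the potential, the parameter values, and all function names are this sketch’s own choices. A unit-mass particle moves in a deliberately anisotropic potential (so no component of L is conserved), and a finite-difference derivative of L = r × p is compared with the torque K = r × F.

```python
# Hedged numerical check of Eq. (7.1.16), dL/dt = K. This example is an
# addition to the text; the potential U = 0.5*(x^2 + 2*y^2 + 3*z^2) and all
# names are chosen for illustration only.

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def force(r):
    x, y, z = r
    return (-x, -2.0*y, -3.0*z)  # F = -grad U

def deriv(s):
    # state s = (x, y, z, vx, vy, vz), unit mass
    return s[3:] + force(s[:3])

def rk4(s, dt):
    k1 = deriv(s)
    k2 = deriv(tuple(x + 0.5*dt*k for x, k in zip(s, k1)))
    k3 = deriv(tuple(x + 0.5*dt*k for x, k in zip(s, k2)))
    k4 = deriv(tuple(x + dt*k for x, k in zip(s, k3)))
    return tuple(x + dt/6.0*(a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def ang_mom(s):
    return cross(s[:3], s[3:])  # unit mass, so p = v

dt = 1e-4
s0 = (1.0, 0.5, -0.3, 0.2, 0.1, 0.4)
s1 = rk4(s0, dt)
dLdt = tuple((b - a)/dt for a, b in zip(ang_mom(s0), ang_mom(s1)))
K = cross(s0[:3], force(s0[:3]))
err = max(abs(a - b) for a, b in zip(dLdt, K))
print(err)
```

The residual is of order dt, reflecting only the one-sided finite difference, not any failure of the torque equation.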
7.1.3. Conservation of Angular Momentum: Lagrangian Approach

Proof of the conservation of angular momentum in the absence of external forces is also easy using ordinary Lagrangian methods. Under an infinitesimal rotation Δφ, each particle radius vector is shifted r → r + Δφ × r, and its velocity is shifted similarly, v → v + Δφ × v. With no external forces, the Lagrangian and the kinetic energy are equal; under the same rotation, its change is given by

\Delta L = \sum\left(\frac{\partial L}{\partial\mathbf{r}}\cdot(\Delta\vec\phi\times\mathbf{r}) + \frac{\partial L}{\partial\mathbf{v}}\cdot(\Delta\vec\phi\times\mathbf{v})\right) = \Delta\vec\phi\cdot\frac{d}{dt}\sum\mathbf{r}\times\mathbf{p}. \qquad (7.1.17)
In the last step, both the defining relation for p and the Lagrange equation for p have been used. Defining angular momentum
\mathbf{L} = \sum\mathbf{r}\times\mathbf{p}, \qquad (7.1.18)

it follows from Eq. (7.1.17) that

\frac{d\mathbf{L}}{dt} = 0. \qquad (7.1.19)
Problem 7.1.2: For the field sources listed, indicate what components of P and L are conserved. For definiteness suppose that the sources are static electric charge distributions and that the mechanical system under analysis is subject only to electric forces due to those sources. The particles making up the mechanical system have arbitrary masses and charges, but the fields due to their charges are to be ignored. In each case, indicate which choice of axes best takes advantage of the symmetry and assume that choice has been made.
(a) A plane is uniformly charged.
(b) An infinite circular cylinder is uniformly charged.
(c) The surface of a noncircular infinite cylinder is uniformly charged.
(d) Two parallel infinite lines have equal, uniform charge densities.
(e) Two points have equal charges.
(f) An infinite cone is uniformly charged.
(g) A torus, circular in both cross sections, is uniformly charged.
(h) An infinite solenoid of uniform pitch is uniformly charged. The pitch is such that the distance along the solenoid’s axis at which the helix has made one revolution is λ. In this case, the only conserved momentum is a combination of two of the elementary momenta.
7.1.4. Conservation of Energy

Suppose that both T and U are independent of time. Multiplying Poincaré equation (6.8.6) by ω^ρ, summing over ρ, and utilizing the antisymmetry of c^λ_{μρ} to justify setting ω^ρ c^λ_{μρ} ω^μ = 0 yields

\omega^\rho\frac{d}{dt}\frac{\partial T}{\partial\omega^\rho} = \omega^\rho X_\rho(T - U). \qquad (7.1.20)

The last two terms can be merged using

\frac{d}{dt}(T - U) = \dot\omega^\rho\frac{\partial T}{\partial\omega^\rho} + \omega^\rho X_\rho(T - U). \qquad (7.1.21)
(This equation would acquire an extra term (∂/∂t)(T − U) if we were not assuming this quantity vanishes. This extra term would otherwise have to be included in the following equations.) We obtain

\frac{d}{dt}\left(\omega^\rho\frac{\partial T}{\partial\omega^\rho} - T + U\right) = 0. \qquad (7.1.22)

Defining a new function

h(\omega, q) = \omega^\rho\frac{\partial T}{\partial\omega^\rho} - T + U \qquad (7.1.23)

and integrating Eq. (7.1.22) yields

h(\omega, q) = \text{constant} = E_0. \qquad (7.1.24)
This formula says that the function h remains equal to its initial value E₀. In our present approach the function h, obviously related to the “Hamiltonian,” makes its appearance here for the first time. It is a dynamical variable whose numerical value remains constant, equal to initial energy E₀. Technically the function h, though equal in value to the energy, cannot legitimately be called the Hamiltonian, however, since its functional dependency is incorrect.
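The constancy of h is easy to observe numerically. The following Python sketch is an addition to the text; the pendulum system and all names are this sketch’s own choices, and for a plane pendulum the generalized velocity is an ordinary (not quasi-) velocity, so Eq. (7.1.23) reduces to the familiar energy.

```python
import math

# Illustrative check, not from the text: for a plane pendulum with
# T = 0.5*m*l**2*w**2 and U = -m*g*l*cos(q), the function of Eq. (7.1.23),
# h = w*(dT/dw) - T + U = 0.5*m*l**2*w**2 - m*g*l*cos(q),
# keeps its initial value E0 along the trajectory.

m, g, l = 1.0, 9.8, 1.0

def h(q, w):
    return 0.5*m*l**2*w**2 - m*g*l*math.cos(q)

def step(q, w, dt):
    # one RK4 step of q' = w, w' = -(g/l)*sin(q)
    def f(q, w):
        return w, -(g/l)*math.sin(q)
    k1 = f(q, w)
    k2 = f(q + 0.5*dt*k1[0], w + 0.5*dt*k1[1])
    k3 = f(q + 0.5*dt*k2[0], w + 0.5*dt*k2[1])
    k4 = f(q + dt*k3[0], w + dt*k3[1])
    return (q + dt/6.0*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            w + dt/6.0*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

q, w = 1.0, 0.0
E0 = h(q, w)
for _ in range(5000):
    q, w = step(q, w, 0.002)
print(abs(h(q, w) - E0))
```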
Problem 7.1.3: Starting from the Euler equations of a freely moving rigid body with one point fixed,

I_1\dot\omega^1 = (I_2 - I_3)\,\omega^2\omega^3, \qquad I_2\dot\omega^2 = (I_3 - I_1)\,\omega^3\omega^1, \qquad I_3\dot\omega^3 = (I_1 - I_2)\,\omega^1\omega^2, \qquad (7.1.25)

exhibit explicitly the constancy of both the energy h and the total angular momentum squared, L² = L_x² + L_y² + L_z².
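The constancy asserted in this problem can also be confirmed numerically. The following Python sketch is an addition to the text; the moments of inertia and initial condition are arbitrary choices (the initial spin is deliberately close to the unstable intermediate axis, so the motion tumbles while the two invariants hold steady).

```python
# Hedged numerical companion to Problem 7.1.3 (an addition, not part of the
# text): integrate the free-body Euler equations (7.1.25) and confirm that
# h = 0.5*(I1*w1**2 + I2*w2**2 + I3*w3**2) and
# L2 = (I1*w1)**2 + (I2*w2)**2 + (I3*w3)**2 stay constant.

I1, I2, I3 = 1.0, 2.0, 3.0

def deriv(w):
    w1, w2, w3 = w
    return ((I2 - I3)*w2*w3/I1,
            (I3 - I1)*w3*w1/I2,
            (I1 - I2)*w1*w2/I3)

def rk4(w, dt):
    k1 = deriv(w)
    k2 = deriv(tuple(x + 0.5*dt*k for x, k in zip(w, k1)))
    k3 = deriv(tuple(x + 0.5*dt*k for x, k in zip(w, k2)))
    k4 = deriv(tuple(x + dt*k for x, k in zip(w, k3)))
    return tuple(x + dt/6.0*(a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(w, k1, k2, k3, k4))

def energy(w):
    return 0.5*(I1*w[0]**2 + I2*w[1]**2 + I3*w[2]**2)

def L2(w):
    return (I1*w[0])**2 + (I2*w[1])**2 + (I3*w[2])**2

w = (0.3, 1.0, 0.2)
h0, L20 = energy(w), L2(w)
for _ in range(4000):
    w = rk4(w, 0.0025)
print(abs(energy(w) - h0), abs(L2(w) - L20))
```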
7.2. CYCLIC COORDINATES AND ROUTHIAN REDUCTION

One of the important problems in “Dynamical System Theory” is that of reduction, which means exploiting some conserved quantity of the system to reduce the dimensionality of the problem. Normally the term “reduction” also includes the requirement that the equations retain the same form, be it Lagrangian, Poincaré, or Hamiltonian, in the reduced number of unknowns. Even when a constant of the motion is known, it is not necessarily easy to express the problem explicitly in terms of a reduced set of variables. This section considers the simplest example of reduction. This is not essentially different from a procedure due to Routh for reducing the Lagrange equations to take advantage of an ignorable coordinate. Within the Poincaré formalism the procedure is quite analogous. The procedure can also be regarded as a variant of the procedure for deriving Hamilton’s equations.
Suppose that one quasi-coordinate, say q¹, and its corresponding quasi-velocity v¹, have the property that (d/dt)(∂T/∂v¹) is the only nonvanishing term in their Poincaré equation. This equation can therefore be integrated immediately:

\frac{\partial T}{\partial v^1} = \beta_1, \qquad (7.2.1)

where β₁ is an integration constant. Two conditions sufficient for this simplification to occur are that all relevant commutators vanish, [X₁, X_ρ] = 0, and that X₁(T − U) = 0. In this case the variable q¹ is said to be “cyclic.” Elimination of q¹ and v¹ begins by solving Eq. (7.2.1) for v¹:

v^1 = v^1(q^2, \ldots, q^n;\ \beta_1, v^2, \ldots, v^n). \qquad (7.2.2)
For the method to work, this has to yield an explicit formula for v¹. The “Routhian” is defined by

R(q^2, \ldots, q^n;\ \beta_1, v^2, \ldots, v^n) = T - U - v^1\frac{\partial T}{\partial v^1}. \qquad (7.2.3)
The absence of q¹ is due to the fact that q¹ is cyclic, and the absence of v¹ is due to the fact that it will have been replaced using Eq. (7.2.2). The Poincaré-based Routhian reduction continues by writing the Poincaré equations (7.2.4) for the remaining variables. (Terms with coefficients c^λ_{1ρ} necessarily vanish.) The quantity β₁ can be treated as a constant parameter as these equations are solved. After they have been solved, v¹ can be found by substituting into Eq. (7.2.3) and then differentiating with respect to β₁:

v^1 = -\frac{\partial R}{\partial\beta_1}, \qquad (7.2.5)

which follows from substituting Eq. (7.2.1) into Eq. (7.2.3).
Example 7.2.1: Symmetric Top. Consider the axially symmetric top shown in Fig. 7.2.1, rotating with its tip fixed at the origin. Let its equal moments of inertia be I₁ and its moment of inertia about the ẑ′-axis be I₃. Its body-axis angular velocities s^{x′}, s^{y′}, and s^{z′} are related to the Euler angular velocities φ̇, θ̇, and ψ̇ by the relations

\begin{pmatrix} s^{x'} \\ s^{y'} \\ s^{z'} \end{pmatrix} = \begin{pmatrix} \sin\theta\sin\psi & \cos\psi & 0 \\ \sin\theta\cos\psi & -\sin\psi & 0 \\ \cos\theta & 0 & 1 \end{pmatrix} \begin{pmatrix} \dot\phi \\ \dot\theta \\ \dot\psi \end{pmatrix}. \qquad (7.2.6)
FIGURE 7.2.1. An axially symmetric top rotating with its tip fixed. Its orientation is determined by Euler angles φ, θ, and ψ. The vertical force of gravity mg acts at its centroid C, at distance ℓ from the tip. “Space axes” are x, y, z. “Body axes” are x′, y′, z′.
The kinetic energy is given by

T = \frac{1}{2}I_1\left(s^{x'2} + s^{y'2}\right) + \frac{1}{2}I_3\,s^{z'2} = \frac{1}{2}I_1\left(\sin^2\theta\,\dot\phi^2 + \dot\theta^2\right) + \frac{1}{2}I_3\left(\cos^2\theta\,\dot\phi^2 + 2\cos\theta\,\dot\phi\,\dot\psi + \dot\psi^2\right), \qquad (7.2.7)
and the potential energy by

U = mg\ell\cos\theta, \qquad (7.2.8)

neither of which depends on ψ, which is therefore chosen as the variable q¹, with ψ̇ = v¹, in the Routhian reduction. (This step was made possible by a cancellation depending on the equality of I₁ and I₂ and by the absence of “commutator terms” in the Poincaré equation which, in this case, is simply the Lagrange equation because legitimate generalized coordinates are being used.) The relations corresponding to Eqs. (7.2.1) and (7.2.2) are

\frac{\partial T}{\partial\dot\psi} = I_3\left(\cos\theta\,\dot\phi + \dot\psi\right) = \beta_1, \quad\text{and}\quad v^1 = \dot\psi = \frac{\beta_1}{I_3} - \cos\theta\,\dot\phi. \qquad (7.2.9)
The Routhian is

R = T - U - \dot\psi\frac{\partial T}{\partial\dot\psi} = \frac{1}{2}I_1\left(\sin^2\theta\,\dot\phi^2 + \dot\theta^2\right) + \beta_1\cos\theta\,\dot\phi - mg\ell\cos\theta - \frac{\beta_1^2}{2I_3}. \qquad (7.2.10)
Since the final term is constant, it drops out of the “Lagrange” equations obtained from R.
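The conservation of β₁ that drives this reduction can be confirmed numerically. The following Python sketch is an addition to the text; the parameter values and names are arbitrary, and the equations of motion are the Euler-Lagrange equations derived from the Lagrangian of Eqs. (7.2.7) and (7.2.8), written out for this sketch.

```python
import math

# Hedged numerical sketch (not part of the text; parameter values are
# arbitrary): integrate the symmetric top's Lagrange equations in Euler
# angles and confirm that beta1 = I3*(cos(th)*phid + psid) of Eq. (7.2.9)
# holds constant to integration accuracy.

I1, I3, mgl = 2.0, 1.0, 0.5

def deriv(s):
    th, phi, psi, thd, phid, psid = s
    sth, cth = math.sin(th), math.cos(th)
    s3 = cth*phid + psid                      # beta1 / I3
    thdd = (I1*sth*cth*phid**2 - I3*sth*phid*s3 + mgl*sth) / I1
    phidd = (I3*thd*s3 - 2.0*I1*cth*thd*phid) / (I1*sth)
    psidd = sth*thd*phid - cth*phidd
    return (thd, phid, psid, thdd, phidd, psidd)

def rk4(s, dt):
    k1 = deriv(s)
    k2 = deriv(tuple(x + 0.5*dt*k for x, k in zip(s, k1)))
    k3 = deriv(tuple(x + 0.5*dt*k for x, k in zip(s, k2)))
    k4 = deriv(tuple(x + dt*k for x, k in zip(s, k3)))
    return tuple(x + dt/6.0*(a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def beta1(s):
    return I3*(math.cos(s[0])*s[4] + s[5])

# state: (theta, phi, psi, theta-dot, phi-dot, psi-dot)
s = (1.0, 0.0, 0.0, 0.0, 0.4, 5.0)
b0 = beta1(s)
for _ in range(4000):
    s = rk4(s, 0.001)
print(abs(beta1(s) - b0))
```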
Problem 7.2.1: After the Routhian reduction just performed for the symmetric top, the Routhian R is independent of the coordinate φ. Perform another step of Routhian reduction to obtain a one-dimensional equation in Lagrangian form. Any such equation can be “reduced to quadratures.”

7.2.1. Integrability; Generalization of Cyclic Variables

The Routhian reduction just studied was made possible partially by the absence of “commutator terms” in the Poincaré equation. But reduction may be possible even if some of these terms are nonvanishing. It is possible for all terms in the Poincaré equation except (d/dt)(∂T/∂v¹) to cancel even when some of the c^λ_{ρ1} coefficients are nonvanishing. With T given by (7.2.11) and assuming X₁(T − U) = 0, the unwanted terms are given by (7.2.12), which vanishes if condition (7.2.13) holds.
Example 7.2.2: Consider the same axially symmetric top, subject to gravity, spinning with its lower point fixed. With body axes, T = (1/2)(I₁(ω¹)² + I₂(ω²)² + I₃(ω³)²). If I₁ = I₂, then the Poincaré equation for ω³ = φ̇³, where φ³ is a quasi-angle of rotation around the instantaneous axis of the top, is

I_3\dot\omega^3 = -X_3U = 0. \qquad (7.2.14)

Since rotation around the top axis does not change the potential energy, X₃U = 0, and as a result ω³ is a constant of motion. Note that this cancellation is not “generic” in that I₁ = I₂ cannot be exactly true in practice. On the other hand, in the absence of this cancellation the equation of motion is nonlinear, which almost always leads to bizarre behavior at large amplitudes. Hence one can say that bizarre behavior is generic. There are a small number of other known choices of the parameters in rigid body motion with one point fixed that lead to completely integrable systems. (See Arnold et al. [1].)
7.3. NOETHER’S THEOREM

In this section (and only this section) we use the notation ∂/∂t instead of d/dt for the “total time derivative.” The reason for this is that a new subsidiary variable s will be introduced and the main arguments have to do with functions f(t, s) and derivatives holding one or the other of t and s constant.

For much of Lagrangian mechanics the Lagrangian can be regarded as a purely formal construct whose only role is to be differentiated to yield the Lagrange equations. For this purpose it is not necessary to have more than an operational understanding of the meaning of the dependence of the Lagrangian on q̇. But here we insist on treating L(q, q̇) as a regular scalar function on the configuration space M of coordinates q augmented by the tangent spaces of velocities. We must therefore define carefully the meaning of the dependence on q̇. Based on curves q(t) in configuration space, parameterized by time t, one constructs the so-called tangent spaces TM_q at every point q in the configuration space. TM_q is the space of possible instantaneous system velocities at q. This space has the same dimensionality n as does the configuration space M itself. The union of the tangent spaces at every point is known as the “tangent bundle” TM. It has dimensionality 2n, with a possible choice of coordinates being q¹, q², …, qⁿ and the remaining n coordinates being the corresponding “natural” velocity components introduced in Eq. (3.1.8). The Lagrangian L(q, q̇) is a scalar function on TM.

We have seen many examples in which symmetries of the Lagrangian are reflected in conserved dynamical variables. The simplest of these involve “ignorable coordinates.” For example, the absence of z_i in the Lagrangian for particles in Euclidean space subject to potential U, depending on the x_i and y_i components but not on the z_i components,

L = \sum_i \frac{1}{2}m_i\left(\dot x_i^2 + \dot y_i^2 + \dot z_i^2\right) - U(x_i, y_i), \qquad (7.3.1)

implies the conservation of total momentum p_z; this follows trivially within the customary Lagrangian formalism.
The ad hoc nature of such conclusions should seem mildly troubling, and one wishes there were a more general theoretical construct from which all such conserved quantities, or at least broad classes of them, could be derived. Such a construct would exhibit simple invariance properties (symmetries) of the Lagrangian and associate a definite conserved dynamical variable with each such symmetry. This is what Noether’s theorem accomplishes. Since the Lagrangian is a scalar function, constancy under particular transformation of its arguments is the only sort of symmetry to which it can be subject. For example, the Lagrangian of Eq. (7.3.1) is invariant under the transformation

x \to x, \qquad y \to y, \qquad z \to z + s, \qquad (7.3.2)
where s is an arbitrary parameter, provided this configuration space transformation is accompanied by the following tangent space transformation:

\dot x \to \dot x, \qquad \dot y \to \dot y, \qquad \dot z \to \dot z. \qquad (7.3.3)
From a physical point of view, the fact that these configuration space and tangent space transformations have to go together in this way is an obvious requirement for descriptions of the same physics in two frames that differ only by constant displacement s along the z-axis. From a mathematical point of view, a smooth transformation f : M → M mapping q to f(q) implies a transformation f_{*q} : TM_q → TM_{f(q)} from the tangent space at q to the tangent space at f(q). For any particular curve through q, this maps its instantaneous velocity q̇ inferred at q into f_{*q}(q̇), which is equal to the instantaneous velocity inferred at f(q). “Infer” here means “go through the usual limiting procedure by which instantaneous velocity is extracted from a system trajectory.” f_{*q} is an n-dimensional transformation. Geometrically, the output of this transformation is a vector, which is the reason a boldface symbol has been used for f. Combining the transformations f_{*q} at all points q we obtain an n-dimensional transformation f_* at every point in an n-dimensional domain; it maps the entire tangent bundle. The coordinate map can depend on a parameter s and therefore be symbolized by f^s, and the corresponding map of velocities is symbolized f^s_*. Consider a valid system trajectory q(t). At every time t the point q(t) can be mapped by f^s to yield a point q(s, t). This yields a family of curves that are individually parameterized by t with different curves distinguished by parameter s (as illustrated in Fig. 7.3.1). The s = 0 curve satisfies the equations of motion, but the curves for other values of s will not be valid system trajectories, except under conditions to be considered next. The Lagrangian system is said to be invariant under a mapping f^s if

L\big(\mathbf f^s(\mathbf q),\ \mathbf f^s_*(\dot{\mathbf q})\big) = L(\mathbf q, \dot{\mathbf q}). \qquad (7.3.4)
THEOREM 7.3.1: If a Lagrangian system is invariant under the transformations f^s, along with the induced transformations f^s_*, for all values of the parameter s, then the
FIGURE 7.3.1. A family of curves, each parameterized by time t, with the different curves differentiated by s. The curve with s = 0 is a valid system trajectory, but a curve with s ≠ 0 is typically not.
quantity

I(\mathbf q, \dot{\mathbf q}) = \left\langle \frac{\partial L}{\partial\dot{\mathbf q}},\ \mathbf W(\mathbf q) \right\rangle = \frac{\partial L}{\partial\dot q^i}\,\left.\frac{\partial q^i(s, t)}{\partial s}\right|_{s=0} \qquad (7.3.5)

is a constant of the motion.
The middle term in Eq. (7.3.5) uses the notation first introduced in Eq. (2.2.3). Since ∂L/∂q̇ is covariant and W( ) is contravariant, the quantity I(q, q̇) is invariantly defined, independent of the choice of coordinates, but its practical evaluation requires the use of coordinates as spelled out in the rightmost term of Eq. (7.3.5). The former notation will be used in the proof that follows.
Proof: The analytic statement of the invariance of L under f^s is

\frac{\partial}{\partial s}\,L\big(\mathbf f^s(\mathbf q),\ \mathbf f^s_*(\dot{\mathbf q})\big) = 0. \qquad (7.3.6)

When this equation is expressed in terms of the function q(t, s), it becomes

\frac{\partial}{\partial s}\,L\left(\mathbf q(t, s),\ \frac{\partial\mathbf q(t, s)}{\partial t}\right) = 0. \qquad (7.3.7)
With the assumed invariance, the same variational calculation showing that q(t) satisfies the equations of motion shows that q(t, s) satisfies the equations of motion also, for all s. As a result, the Lagrange equations are satisfied as identities in s:

\frac{\partial}{\partial t}\frac{\partial L}{\partial\dot q^i} - \frac{\partial L}{\partial q^i} = 0. \qquad (7.3.8)

Proceeding to calculate ∂I/∂t (while recalling that the notation ∂/∂t is a total time derivative with s held constant) and using Eq. (7.3.8), we obtain

\frac{\partial I}{\partial t} = \left(\frac{\partial}{\partial t}\frac{\partial L}{\partial\dot q^i}\right)\frac{\partial q^i}{\partial s} + \frac{\partial L}{\partial\dot q^i}\,\frac{\partial}{\partial t}\frac{\partial q^i}{\partial s} = \frac{\partial L}{\partial q^i}\frac{\partial q^i}{\partial s} + \frac{\partial L}{\partial\dot q^i}\,\frac{\partial}{\partial s}\frac{\partial q^i}{\partial t} = \frac{\partial L}{\partial s} = 0. \qquad (7.3.9)

In the final step, the order of s and t derivatives has been reversed and Eq. (7.3.7) has been used. This completes the proof.
Example 7.3.1: For the transformation given by Eqs. (7.3.2) and (7.3.3), we have

\mathbf W = \hat{\mathbf e}_z, \qquad (7.3.10)

where ê_z is a unit vector along the z-axis. With the Lagrangian of (7.3.1) we obtain

I = \left\langle \frac{\partial L}{\partial\dot{\mathbf q}},\ \hat{\mathbf e}_z \right\rangle = \sum_i m_i\dot z_i = p_z. \qquad (7.3.11)
The assumed Euclidean geometry and the orthonormal property of the coordinates have been used here (for the first time) in evaluating the invariant product as an ordinary dot product of two vectors.
Example 7.3.2: Consider next rotation about a particular axis, say the ẑ-axis, through angle s. According to Eq. (4.2.24), keeping only the term linear in s, we have

\mathbf x_j \to \mathbf x_j + s\,\hat{\mathbf z}\times\mathbf x_j + \cdots, \qquad (7.3.12)

and therefore that

\dot{\mathbf x}_j \to \dot{\mathbf x}_j + s\,\hat{\mathbf z}\times\dot{\mathbf x}_j + \cdots. \qquad (7.3.13)

If the Lagrangian is invariant under transformation (7.3.12) (as (7.3.1) would be if U were invariant to rotation around ẑ), then the Noether invariant is given by

I = \sum_j \mathbf p_j\cdot(\hat{\mathbf z}\times\mathbf x_j) = \hat{\mathbf z}\cdot\sum_j \mathbf x_j\times\mathbf p_j. \qquad (7.3.14)

This shows that the component of angular momentum about the ẑ-axis is conserved.
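This conclusion is easy to test numerically. The following Python sketch is an addition to the text; the axially symmetric potential and all names are this sketch’s own choices. A unit-mass particle moves in a potential invariant under rotation about z, and the Noether invariant of Eq. (7.3.14), reduced to a single particle, is monitored along the trajectory.

```python
# Illustrative companion to Example 7.3.2 (added; not from the original): for
# a unit-mass particle with U depending only on x**2 + y**2 and z, the
# Lagrangian is invariant under rotation about z, and the Noether invariant
# I = p . (zhat x r) = x*vy - y*vx should be constant along any trajectory.

def force(r):
    x, y, z = r
    return (-x, -y, -2.0*z)   # F = -grad U for U = 0.5*(x**2+y**2) + z**2

def deriv(s):
    # state s = (x, y, z, vx, vy, vz)
    return s[3:] + force(s[:3])

def rk4(s, dt):
    k1 = deriv(s)
    k2 = deriv(tuple(x + 0.5*dt*k for x, k in zip(s, k1)))
    k3 = deriv(tuple(x + 0.5*dt*k for x, k in zip(s, k2)))
    k4 = deriv(tuple(x + dt*k for x, k in zip(s, k3)))
    return tuple(x + dt/6.0*(a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def noether_I(s):
    x, y, z, vx, vy, vz = s
    return x*vy - y*vx

s = (1.0, 0.0, 0.2, 0.1, 0.8, -0.3)
I0 = noether_I(s)
for _ in range(3000):
    s = rk4(s, 0.003)
print(abs(noether_I(s) - I0))
```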
Problem 7.3.1: Formulate each of the invariance examples of Problem 7.1.2 as examples of Noether’s theorem and express the implied conserved quantities in the form of Eq. (7.3.5).

BIBLIOGRAPHY
Reference

1. V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt, Mathematical Aspects of Classical and Celestial Mechanics, 2nd ed., Springer-Verlag, Berlin, 1997, p. 120.
Reference for Further Study

Section 7.3 V. I. Arnold, Mathematical Methods of Classical Mechanics, Springer-Verlag, Berlin, 1978, p. 88.
PART IV NEWTONIAN MECHANICS
Roughly speaking, Newtonian methods work directly with position vectors (and their geometric generalizations) without introducing variational principles or artificial functions such as Lagrangians. The prototypical Newtonian equation, F = ma, expands to a system of second-order, ordinary differential equations containing the measurable kinematic, inertial, and dynamic quantities. The Newtonian formulation is usually regarded as elementary, and the reader is assumed to be conversant with it. It is the introduction of the concept of gauge invariance, because it depends heavily on geometry, that justifies having delayed returning to the Newtonian approach until now. By relating descriptions in coordinate frames accelerating relative to each other, gauge invariance generalizes the concept of Galilean invariance, which applies only to inertial systems having constant velocities relative to each other. There will be a certain amount of repetition of material already encountered, for example concerning the Cartan matrix Ω, whose role here will be more “physical,” with the emphasis being on reconciling descriptions made by observers in different frames of reference.
8 GAUGE-INVARIANT MECHANICS
Geometry as the basis of mechanics is a theme of this textbook. Though it may not have been recognized at the time, the importance of geometry was already made clear in freshman mechanics by the central role played by vectors. The purpose of this chapter is to develop a similar, but more powerful, algebraic/geometric basis for mechanics. However, unlike the chapters just completed, the approach will be Newtonian, with no artificial Lagrangian or Hamiltonian-like functions being introduced and no variational principles. The description of motion in noninertial frames of reference will be of central importance. To indicate the intended style, we begin by reviewing vector mechanics.
8.1. VECTOR MECHANICS
8.1.1. Vector Description in Curvilinear Coordinates

In its simplest form, Newton’s law for the motion of a point particle with mass m (an inertial quantity) subject to force F (a dynamical quantity) yields the acceleration (a kinematical quantity):

\mathbf a = \frac{\mathbf F}{m}. \qquad (8.1.1)

This is hypothesized to be valid only in an inertial frame of reference. In such a frame, the acceleration vector is given by

\mathbf a = \frac{d^2\mathbf r}{dt^2} = \ddot{\mathbf r}, \qquad (8.1.2)
where r(t) is the radius vector from the origin. The traditional notation has been used of replacing d/dt, the “total derivative” taken along the actual particle trajectory, by an overhead dot. For actual computation, it is often appropriate to introduce unit vectors such as (x̂, ŷ, ẑ) or (r̂, θ̂, φ̂), with the choice depending, for example, on the symmetry of the problem. With Euclidean geometry being assumed implicitly, these are “unit vector”¹ triads, mutually orthogonal and each having unit length. The “components” of r are then given (in rectangular and spherical coordinates) as the coefficients in
\mathbf r = x\,\hat{\mathbf x} + y\,\hat{\mathbf y} + z\,\hat{\mathbf z} = r\,\hat{\mathbf r}. \qquad (8.1.3)
The component-wise differentiation of this vector is simple in the rectangular form, because the unit vectors are constant, but it is more complicated for other coordinate systems. In the case of spherical coordinates (see Fig. 8.1.1), as the particle moves, a local unit vector, r̂ for example, varies. As a result, the velocity v = dr/dt is given by

\mathbf v = v^r\,\hat{\mathbf r} + v^\theta\,\hat{\boldsymbol\theta} + v^\phi\,\hat{\boldsymbol\phi} = \dot r\,\hat{\mathbf r} + r\,\frac{d\hat{\mathbf r}}{dt}. \qquad (8.1.4)
Already at this stage there are minor complications. One is notational: in this text, when (r, θ, φ) = (q¹, q², q³) are taken as “generalized coordinates,” we refer to (q̇¹, q̇², q̇³) as their “generalized velocities,” and these are not the same as v^r, v^θ, and v^φ. Also, symbolizing the time derivative of r by v, we have to accept the fact that the components of the time derivative are not equal to the time derivatives of the components (except in rectangular components). Finally, formula (8.1.4), which is intended to give the velocity components, still depends on the rate of change of a basis vector.
FIGURE 8.1.1. For a particle moving instantaneously in the plane of the paper, the direction of the radial unit vector r̂(t) at its instantaneous location varies with time, but its length remains constant. Instantaneously, r̂ is rotating with angular velocity ω about the axis ω̂ normal to the paper.

¹In this chapter, and only in this chapter, “unit vectors” are defined to have unit length. In other chapters, a unit vector is usually a vector pointing along the curve on which its corresponding coordinate varies while the other coordinates are held fixed, and “unit” implies (in a linearized sense) that the coordinate increases by one unit along the curve. To reduce the likelihood of confusion, the overhead “hat” symbol will be used only for vectors having unit length.
In general, a vector can vary both in magnitude and direction, but a unit vector can vary only in direction; with its tail fixed, the most that can be happening to it is that it is rotating about some axis ω̂ with angular speed ω; together, ω = ω ω̂. Consider the radial unit vector r̂ illustrated in Fig. 8.1.1. Since its change in time Δt is given (in the limit) by ωΔt × r̂, we have

\frac{d\hat{\mathbf r}}{dt} = \boldsymbol\omega\times\hat{\mathbf r}. \qquad (8.1.5)

This same formula holds for θ̂ and φ̂, and hence for any vector u, or unit vector û, fixed relative to the coordinate triad (you should convince yourself):

\frac{d\hat{\mathbf u}}{dt} = \boldsymbol\omega\times\hat{\mathbf u}. \qquad (8.1.6)
The change in orientation of the unit triad is due to the motion of the particle; from the geometry of Fig. 8.1.1 one infers

\boldsymbol\omega = \frac{\hat{\mathbf r}\times\mathbf v}{r}. \qquad (8.1.7)
Combining this with Eq. (8.1.6) yields

\frac{d\hat{\mathbf u}}{dt} = \frac{1}{r}\,(\hat{\mathbf r}\times\mathbf v)\times\hat{\mathbf u} = (\hat{\mathbf r}\cdot\hat{\mathbf u})\,\frac{\mathbf v}{r} - (\mathbf v\cdot\hat{\mathbf u})\,\frac{\hat{\mathbf r}}{r}. \qquad (8.1.8)
When this formula is applied to each of the three spherical coordinate unit vectors, the results are

\dot{\hat{\mathbf r}} = \dot\theta\,\hat{\boldsymbol\theta} + \sin\theta\,\dot\phi\,\hat{\boldsymbol\phi}, \qquad \dot{\hat{\boldsymbol\theta}} = -\dot\theta\,\hat{\mathbf r}, \qquad \dot{\hat{\boldsymbol\phi}} = -\sin\theta\,\dot\phi\,\hat{\mathbf r}. \qquad (8.1.9)

Substitution into Eq. (8.1.4) yields

\mathbf v = \dot r\,\hat{\mathbf r} + r\dot\theta\,\hat{\boldsymbol\theta} + r\sin\theta\,\dot\phi\,\hat{\boldsymbol\phi}. \qquad (8.1.10)
FIGURE 8.1.2. Components of the velocity vector in a spherical coordinate system.
This has been a circuitous route to obtain a result that seems self-evident (see Fig. 8.1.2) but, if one insists on starting by differentiating Eq. (8.1.4), it is hard to see how to derive it more directly. The reason Eq. (8.1.10) seems self-evident is that it is taken for granted that velocity is a true vector whose spherical and Cartesian components are related as if they belonged to a displacement vector.
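Formula (8.1.10) can also be verified numerically. The following Python sketch is an addition to the text; the test trajectory and all names are arbitrary choices. The velocity assembled from spherical components and unit vectors is compared with a finite-difference derivative of the Cartesian position.

```python
import math

# Hedged numerical check of Eq. (8.1.10); this example is an addition to the
# text and the smooth test trajectory below is an arbitrary choice.

def spherical(t):
    # r(t), theta(t), phi(t)
    return 1.0 + 0.2*math.sin(t), 1.0 + 0.3*t, 0.5*t

def to_cartesian(r, th, ph):
    return (r*math.sin(th)*math.cos(ph),
            r*math.sin(th)*math.sin(ph),
            r*math.cos(th))

t, dt = 0.7, 1e-6
r, th, ph = spherical(t)
rp, thp, php = spherical(t + dt)
rd, thd, phd = (rp - r)/dt, (thp - th)/dt, (php - ph)/dt

# spherical unit vectors at (r, th, ph), expressed in Cartesian components
rhat = (math.sin(th)*math.cos(ph), math.sin(th)*math.sin(ph), math.cos(th))
thhat = (math.cos(th)*math.cos(ph), math.cos(th)*math.sin(ph), -math.sin(th))
phhat = (-math.sin(ph), math.cos(ph), 0.0)

# v = rdot*rhat + r*thdot*thhat + r*sin(th)*phdot*phhat
v_formula = tuple(rd*a + r*thd*b + r*math.sin(th)*phd*c
                  for a, b, c in zip(rhat, thhat, phhat))
x0, x1 = to_cartesian(r, th, ph), to_cartesian(rp, thp, php)
v_fd = tuple((b - a)/dt for a, b in zip(x0, x1))
err = max(abs(a - b) for a, b in zip(v_formula, v_fd))
print(err)
```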
Problem 8.1.1: The acceleration can be calculated similarly, starting by differentiating Eq. (8.1.10). In this way confirm the calculations of Section 3.1.6.

We have seen then that calculating kinematic quantities in curvilinear coordinates using vector analysis and the properties of vectors is straightforward though somewhat awkward.

8.1.2. The Frenet-Serret Formulas

Describing the evolution of a triad of unit basis vectors that are naturally related to a curve in space is one of the classic problems of the subject of differential geometry. It is done compactly using the formulas of Frenet and Serret. If the curve in question represents the trajectory of a particle, these formulas describe only variation in space and contain nothing concerning the time rate of progress along the curve. Also, the case of free motion (in a straight line) is degenerate and needs to be treated specially. For these reasons (and a more important reason to be mentioned later), traditional treatments of mechanics usually rederive the essential content of these elegant formulas explicitly rather than using them as a starting point. A vector x(t) pointing from some origin to a point on its trajectory locates a particle’s position P at time t. But because time t is to be suppressed from this treatment we take arc length s along the curve as an independent variable and represent the curve as x(s). To represent differentiation with respect to s, a prime (as in x′) will be used in the way that a dot is used to represent differentiation with respect to t. Any three disjoint points on a smoothly curving space curve define a plane. In the limit of vanishing separation of these points, the plane they determine is known as the “osculating plane.” This metaphorical terminology can be used when two smooth surfaces or curves “kiss,” which is to say, touch without crossing. If the smooth curve happens to be a straight line, no unique osculating plane is determined this way.
This fact represents something of a nuisance in applying this formalism. Clearly, the velocity vector v defined in the previous section lies in the osculating plane. However, depending as it does on speed v, it is not a unit vector, so it is replaced by the parallel vector

\hat{\boldsymbol\xi}_1 = \frac{\mathbf v}{v} = \frac{d\mathbf x}{ds} \equiv \mathbf x'. \qquad (8.1.11)

ξ̂₁ is known as the “unit tangent vector.” The unique vector ξ̂₂ that also lies in the osculating plane but is perpendicular to ξ̂₁ and points “outward” is known as the “principal normal” to the curve. From the study of circular motion in elementary mechanics, one knows that the trajectory is
FIGURE 8.1.3. Vector construction illustrating the derivation of the Frenet-Serret formulas. ξ̂₁ is the unit tangent vector; ξ̂₂ is the unit principal normal vector.
instantaneously circular, with the center C of the circle being “inward” and lying in the osculating plane as well. Letting p stand for the radius of curvature of this circle, we know that the acceleration vector is - ( u 2 / p ) t 2 , but we must again eliminate references to time. See Fig. 8.1.3. If xb is the tangent vector at the point P in question, then the tangent vector at a distance ds further along the curve is given by Taylor expansion to be xb +Gds +. .. Denoting the angle between these tangents by O(s), the radius of curvature is defined by (8.1.12)
From the figure it can be seen that ξ₂ is parallel to x″ and that |x″| = 1/ρ. Since ξ₂ is to be a unit vector, it follows that

ξ₂ = ρ x″.  (8.1.13)
To make a complete orthonormal triad of basis vectors at the point P we also define the "unit binormal" ξ₃ by

ξ₃ = ξ₁ × ξ₂.  (8.1.14)
Proceeding by analogy with the introduction of the radius of curvature, the angle between ξ₃|ₛ and ξ₃(s + ds) is denoted by Δφ(s), and a new quantity, the "torsion" 1/τ, is defined by

1/τ ≡ ± dφ/ds.  (8.1.15)

This relation does not fix the sign of τ. It will be fixed below. The Frenet-Serret formulas are first-order (in s) differential equations governing the evolution of the orthonormal triad (ξ₁, ξ₂, ξ₃) (to be called "Frenet vectors") as the point P moves along the curve. The first of these equations, obtained from Eqs. (8.1.11) and (8.1.13), is

ξ₁′ = ξ₂/ρ.  (8.1.16)
c3
Because the vector is a unit vector, its derivative 6; is normal to €3 and hence expandable in and &. But <$ is in fact also orthogonal to €1. To see this, differentiate the equation that expresses the orthogonality of t1and 5 3 ,
d 0 = -6. €3) = ds
<;
*
+ €1 . r;,
€3
(8.1.1 7)
where, using Eq. (8.1.16) and the orthogonality of <2 and t3,the first term must vanish. We have therefore that ($ is parallel to and the constant of proportionality is obtained from Eq. (8.1.15):
e2
t2.
(8.1 . I 8)
= -7’
the sign of T has been chosen to yield the sign shown. From Eq. (8.1.14), we obtain = t3x which can be differentiatedto obtain cb. Collecting formulas, we have obtained the Frenet-Serret formulas
ξ₁′ = ξ₂/ρ,
ξ₂′ = −ξ₁/ρ + ξ₃/τ,  (8.1.19)
ξ₃′ = −ξ₂/τ.
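The Frenet-Serret formulas are easy to exercise numerically. The sketch below is an illustration added here, not part of the original text; it uses a circular helix parametrized by arc length, for which the standard closed forms are ρ = c²/a and τ = c²/b with c² = a² + b². The helix parameters and the test point s₀ are arbitrary choices.

```python
import numpy as np

# Circular helix parametrized by arc length s:
#   x(s) = (a cos(s/c), a sin(s/c), b s/c),  c = sqrt(a^2 + b^2),
# for which rho = c^2/a and tau = c^2/b are the standard results.
a, b = 1.0, 0.5
c = np.hypot(a, b)
rho, tau = c**2 / a, c**2 / b

def xi1(s):  # unit tangent x'(s), Eq. (8.1.11)
    return np.array([-a/c*np.sin(s/c), a/c*np.cos(s/c), b/c])

def xi2(s):  # principal normal rho * x''(s), Eq. (8.1.13)
    return np.array([-np.cos(s/c), -np.sin(s/c), 0.0])

def xi3(s):  # binormal xi1 x xi2, Eq. (8.1.14)
    return np.cross(xi1(s), xi2(s))

def deriv(f, s, h=1e-6):  # central-difference d/ds
    return (f(s + h) - f(s - h)) / (2*h)

s0 = 0.7  # arbitrary point on the curve
assert np.allclose(deriv(xi1, s0), xi2(s0)/rho, atol=1e-8)                # (8.1.16)
assert np.allclose(deriv(xi2, s0), -xi1(s0)/rho + xi3(s0)/tau, atol=1e-8) # (8.1.19)
assert np.allclose(deriv(xi3, s0), -xi2(s0)/tau, atol=1e-8)               # (8.1.18)
```

Each of the three differential equations is checked by finite differences at a single arbitrary point; for a helix the triad is available in closed form, so no integration is needed.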
Problem 8.1.2: Show that

1/τ = x′ · (x″ × x‴) / (x″ · x″).
Problem 8.1.3: If progress of a particle along its trajectory is parameterized by time t, show that the curvature 1/ρ and torsion 1/τ are given by

1/ρ² = (ẋ × ẍ) · (ẋ × ẍ) / (ẋ · ẋ)³,
1/τ = ẋ · (ẍ × x⃛) / [(ẋ × ẍ) · (ẋ × ẍ)].
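A quick numerical check of these expressions (a sketch added for illustration; the helix and the instant t are arbitrary choices): for the time-parametrized helix x(t) = (a cos t, a sin t, bt) the standard results are 1/ρ = a/(a² + b²) and 1/τ = b/(a² + b²).

```python
import numpy as np

# Time-parametrized helix x(t) = (a cos t, a sin t, b t); its derivatives
# are available in closed form, so the formulas of Problem 8.1.3 can be
# evaluated directly and compared with the known helix curvature and torsion.
a, b = 1.0, 0.5
t = 0.9  # any instant
xd   = np.array([-a*np.sin(t),  a*np.cos(t), b])   # x-dot
xdd  = np.array([-a*np.cos(t), -a*np.sin(t), 0.])  # x-double-dot
xddd = np.array([ a*np.sin(t), -a*np.cos(t), 0.])  # x-triple-dot

cr = np.cross(xd, xdd)
inv_rho2 = (cr @ cr) / (xd @ xd)**3                # 1/rho^2
inv_tau  = xd @ np.cross(xdd, xddd) / (cr @ cr)    # 1/tau

assert np.isclose(np.sqrt(inv_rho2), a/(a*a + b*b))
assert np.isclose(inv_tau,           b/(a*a + b*b))
```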
As they have been defined, both ρ and τ are inverse parameters in the sense that the trajectories they quantify become more nearly straight as the parameters become large. For this reason, their inverses 1/ρ, known as "curvature," and 1/τ, known as "torsion," are more physically appropriate parameters for the trajectory. Loosely speaking, curvature is proportional to ẍ and torsion is proportional to x⃛. It might seem to be almost accurate to say that in mechanics the curvature is more important than the torsion "by definition." This is because, the curvature being proportional to the transverse component of the applied force, the instantaneously felt force has no component along the binormal direction. This is also why the leading contribution to the torsion is proportional to x⃛. The only circumstance in which the torsion can be appreciable is when the instantaneous force is strongly dependent on position. Unfortunately, in this case, the direction of the principal normal can change rapidly, even when the force is weak. If the motion is essentially free except for a weak transverse force, the principal normal tracks the force even if it is arbitrarily small, no matter how its direction is varying. In this case, the Frenet frame is simply inappropriate for describing the motion, as its orientation is only erratically related to the essential features of the trajectory. Furthermore, the torsion is also, in a sense, redundant because the specification of instantaneous position and velocity at any instant, along with a force law giving the acceleration, completely specifies the entire subsequent motion of a particle (including the instantaneous torsion). It is perhaps these considerations that account for the previously mentioned lack of emphasis on the Frenet-Serret formulas in most accounts of mechanics.

Since the triad (ξ₁, ξ₂, ξ₃) remains orthonormal, it is related to the triad of inertial frame basis vectors (x̂, ŷ, ẑ) by a pure rotation and, instantaneously, by a pure angular velocity vector ω such as was introduced just before Eq. (8.1.5). This being the case, the Frenet vectors should satisfy Eq. (8.1.6).
Applying this result to the Frenet equations yields

ω × ξ₁ = vξ₂/ρ,
ω × ξ₃ = −vξ₂/τ.  (8.1.20)
Furthermore, ω should itself be expandable in terms of the Frenet vectors, and this expansion must be

ω = (v/τ)ξ₁ + (v/ρ)ξ₃,  (8.1.21)
as can be quickly checked. Normalized curvature v/ρ measures the rate of rotation of the Frenet frame about the binormal, and normalized torsion v/τ measures its rate of rotation around the tangent vector. Previously in this text, relations specifying the relative orientations of coordinate frames at different positions have been known as "connections"; curvature and torsion can be said to parameterize the connection between the fixed and moving frames of reference.
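The expansion (8.1.21) can be verified concretely (again a sketch added for illustration, with an arbitrarily chosen helix, traversal speed v, and point s). For a circular helix parametrized by arc length the triad is available in closed form, and ω must reproduce dξᵢ/dt = ω × ξᵢ, whose right-hand sides follow from Eqs. (8.1.19) via d/dt = v d/ds.

```python
import numpy as np

# Arc-length helix of radius a and pitch parameter b (c^2 = a^2 + b^2),
# traversed at speed v; rho = c^2/a and tau = c^2/b.
a, b, v = 1.0, 0.5, 2.0
c2 = a*a + b*b
cc = np.sqrt(c2)
rho, tau = c2/a, c2/b

s = 1.3                                    # arbitrary point on the curve
C, S = np.cos(s/cc), np.sin(s/cc)
xi1 = np.array([-a/cc*S, a/cc*C, b/cc])    # unit tangent
xi2 = np.array([-C, -S, 0.0])              # principal normal
xi3 = np.cross(xi1, xi2)                   # binormal

omega = (v/tau)*xi1 + (v/rho)*xi3          # Eq. (8.1.21)
# d xi_i/dt = v xi_i', with xi_i' given by the Frenet-Serret formulas
assert np.allclose(np.cross(omega, xi1), v*xi2/rho)               # Eq. (8.1.20)
assert np.allclose(np.cross(omega, xi2), v*(-xi1/rho + xi3/tau))
assert np.allclose(np.cross(omega, xi3), -v*xi2/tau)              # Eq. (8.1.20)
```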
GAUGE-INVARIANT MECHANICS
8.1.3. Vector Description in an Accelerating Coordinate Frame

Another important problem in Newtonian dynamics is that of describing motion using coordinates that are measured in a noninertial frame. Two important applications of this are the description of trajectories using coordinates fixed relative to the (rotating) earth and the description of the angular motion of a rigid body about its centroid. These examples are emphasized in this and the following sections. Though frames in linear acceleration relative to each other are also important, the concepts in that case are fairly straightforward, so we will concentrate on the acceleration of rotation. The treatment in this section is not appreciably different and probably not clearer than the excellent and clear corresponding treatment in Symon [1]. Though we will not describe rigid body motion until a later section we borrow terminology appropriate to that subject, namely space-frame K and body-frame K̄. Also, it will seem natural in some contexts to refer to frame K̄ as "the laboratory frame" to suggest that the observer is at rest in this frame. The frame K, which will also be known as "the inertial frame," has coordinates r = (r¹, r², r³), which are related to coordinates r̄ = (r̄¹, r̄², r̄³) by rotation matrix O(t):
r = O(t)r̄, or rʲ = Oʲₖ(t)r̄ᵏ.  (8.1.22)
The "inertial" designation has been interjected at this point in preparation for writing Newton's equations in an inertial frame. Much more will be said about the matrix O(t), but for now we only note that it connects two different frames of reference. Unfortunately there is nothing in its notation that specifies what frames O(t) connects, and it is even ambiguous whether or not it deserves to have an overhead bar.² It would be possible to devise a notation codifying this information, but our policy is to leave the symbol O unembellished, planning to explain it in words as the need arises. In this section, the point of view of an observer fixed in the K̄ frame will be emphasized (though all vector diagrams to be exhibited will be plotted in the inertial system unless otherwise indicated). A K̄-frame observer locates a particular particle by a vector r̄ₚ, where the overhead bar connotes that its elements r̄ⁱₚ refer to frame K̄. If this frame is accelerating or rotating, the motion will be described by Newton's law expressed in terms of r̄ₚ, and the effects of frame rotation are to be accounted for by including fictitious forces, to be called "centrifugal force" and "Coriolis force." The absolute coordinates of P in inertial frame K are then given by the second of Eqs. (8.1.22).³

Eq. (8.1.22) could have been interpreted actively, with r̄ being, for example, the initial position of a particular point mass and r its position at time t. This would be a convenient interpretation for describing rigid body motion, for which r̄ remains

²It has been mentioned before, and it will again become clear in this chapter, that when one attempts to maintain a parallelism between "intrinsic appearing" formulas like the first of Eq. (8.1.22) and coordinate formulas like the second, there is an inevitable notational ambiguity that can only be removed by accompanying verbal description.
³It might seem artificial to describe motion from the point of view of a rotating frame were it not for the fact that, living on a rotating earth, we do it all the time.
constant in the K̄ frame, because it signifies the arrow joining two points that are fixed in that frame. We do not permit this interpretation of Eq. (8.1.22), however. Our policy is explained in the following digression.

When one vector, V̄, has an overhead bar and the other, V, does not, the equation V = OV̄ will always be regarded passively. That is, the symbols V and V̄ stand for the same arrow, and the equation is an abbreviation for the equation Vʲ = OʲₖV̄ᵏ, which relates the components of the arrow in two different coordinate frames.⁴ It is unattractive to have an intrinsic boldface symbol modified to key it to a particular frame, but this is a price that has to be paid to maintain a compact matrix-like notation. It is important to remember this feature of the notation. So V̄ is that true vector (or arrow) whose K̄-frame components are V̄ⁱ, and V is the same arrow whose K-frame components are Vⁱ.⁵ Note that it is only the equation relating apparently intrinsic (because they are in boldface type) quantities for which the notation has to be strained in this way; the relation among components Vʲ = OʲₖV̄ᵏ is unambiguous. Whenever the "matrix" O appears in a formula like V = OV̄ that links a barred and an unbarred quantity, the quantity Ō will not be used because it would become ambiguous later on. Also, with ω being the angular velocity of frame K̄ relative to frame K, we must resist the temptation to use ω̄ to signify the angular velocity of frame K relative to frame K̄, as that would clash with our later definition of ω̄. For the time being (until Section 8.2), this discussion will have been academic since equations of the form V = OV̄ will not appear and, for that matter, neither will vectors symbolized as V̄.

Description of the motion of a single particle by observers in frames K and K̄ is illustrated in Fig. 8.1.4. (One can apologize for the complexity of this figure without knowing how to make it simpler.)

Heavy lines in this figure are the images of arrows in a double-exposure snapshot (at t and t + Δt) taken in the inertial frame. Like all arrows, the arrows in this figure illustrate intrinsic vectors. Body-frame K̄ rotates with angular velocity ω about the common origin. Like O, ω is called a connecting quantity because it connects two different frames. Mainly for convenience in drawing the figure, the (x, y) and (x̄, ȳ) axes are taken orthogonal to ω, which is therefore a common z and z̄ axis, and the axes coincide at t = 0. (The axes will not actually be used in the following discussion.) At any time t, the position of the moving particle is represented by an arrow r(t), which is necessarily the same arrow whether viewed from K or K̄. But at time t + Δt, an observer in frame K̄ "remembers" the position of the particle at time t as having been at a point other than where it actually was;
⁴One must fear ambiguity whenever a frame-specific notation, such as an overhead bar, is attached to a (boldface) vector symbol. This ambiguity is intrinsic to the intrinsic nature of a true vector, since such a vector has an existence that transcends any particular frame of reference. There is no such ambiguity when a notation such as an overhead bar is attached to the components of a vector. We have to accept the unsettling feature of this notation that, though we say V̄ and V are the same arrow, it would not be good form to write V̄ = V, since that would make the equation V = OV̄ seem silly. If we slip into writing such an equation it should be written only as an intermediate step in an algebraic simplification, where the situation is to be repaired in a subsequent step.
FIGURE 8.1.4. Body-frame K̄ rotates with angular velocity ω about its common origin with inertial-frame K. All arrows shown are plotted in frame K. (a) v̄ determination, K's plot: two points r(t) and r(t + Δt) on the trajectory of a moving particle are shown. For an observer in frame K̄ at time t + Δt, the radius vector r(t + Δt) is the same arrow as the K-frame arrow at that time; but r(t)|K̄,t+Δt, the K̄-frame observer's recollection at t + Δt of where the particle was at time t, is different from its actual location at time t, which was r(t). (b) ā determination, K's plot: a similar construction permits determination of the acceleration a|K̄.
in the figure this is indicated by the dashed arrow labeled r(t)|K̄,t+Δt. As a result, the actual displacement Δr occurring during time interval Δt and the apparent-to-K̄ displacement Δ̄r (shown dashed) are different. Our immediate task is to relate the velocities observed in the two frames. Since the vectors Δr and Δ̄r stand for unambiguous arrows, plotted in the same frame K, it is meaningful to add or subtract them. From the figure, in the limit of small Δt,

r(t) = r(t)|K̄,t+Δt − ωΔt × r(t), or Δr = Δ̄r + ωΔt × r(t).  (8.1.23)
From this we obtain

dr/dt = (d̄/dt)r + ω × r, or v = v|K̄ + ω × r,  (8.1.24)

where, transcribing the geometric quantities into algebraic quantities, we have defined

v|K̄ ≡ (d̄/dt)r = lim Δt→0 Δ̄r/Δt,  (8.1.25)
and thereby assigned meaning to the operator d̄/dt. The components of (d̄/dt)r are (dr̄¹/dt, dr̄²/dt, …). Since Eq. (8.1.24) is a vector equation, it is valid in any coordinate frame, but if it is to be expressed in components, it is essential that components on the two sides be taken in the same frame. Since we are trying to describe motion from the point of view of a K̄ observer, we will eventually use components in that frame. First, though, to apply Newton's law, we must calculate acceleration.
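The velocity relation (8.1.24) can be checked numerically. In the sketch below (added for illustration; the rotation rate Omega and the trajectory r(t) are arbitrary choices), O(t) is a rotation about z as in Eq. (8.1.22), and the arrow v|K̄, whose K̄ components are dr̄/dt, is re-expressed in K components as O(t) dr̄/dt so that all three terms are commensurable.

```python
import numpy as np

Omega = 0.8
omega = np.array([0.0, 0.0, Omega])   # angular velocity of K-bar relative to K

def O(t):  # rotation matrix taking K-bar components to K components
    ct, st = np.cos(Omega*t), np.sin(Omega*t)
    return np.array([[ct, -st, 0.0], [st, ct, 0.0], [0.0, 0.0, 1.0]])

def r(t):  # arbitrary smooth motion, expressed in inertial frame K
    return np.array([np.cos(2*t), np.sin(t), 0.3*t])

def rbar(t):  # K-bar components: r-bar = O^T r
    return O(t).T @ r(t)

h, t0 = 1e-6, 0.4
v     = (r(t0+h) - r(t0-h)) / (2*h)        # inertial velocity
v_bar = (rbar(t0+h) - rbar(t0-h)) / (2*h)  # K-bar components of v|_Kbar
v_app = O(t0) @ v_bar                      # the arrow v|_Kbar, plotted in K

# Eq. (8.1.24): v = v|_Kbar + omega x r
assert np.allclose(v, v_app + np.cross(omega, r(t0)), atol=1e-6)
```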
Though the derivation so far was based on the displacement vector r, any other true vector V, being an equivalent geometric object, must satisfy an equivalent relation, namely

dV/dt = (d̄/dt)V + ω × V.  (8.1.26)
In particular this can be applied to velocity v, with the result

dv/dt = (d̄/dt)v + ω × v = (d̄²/dt²)r + (d̄/dt)(ω × r) + ω × v|K̄ + ω × (ω × r),  (8.1.27)
where the extra step of using Eq. (8.1.24) to replace v has been taken. Though the formal manipulations have been simple, we must be sure of the meaning of every term in Eq. (8.1.27). The term on the left is the well-known inertial-frame acceleration; for it we will use the traditional notation a = dv/dt. In the terms ω × v|K̄ and ω × (ω × r), only standard vector multiplication operations are performed on arrows illustrated in Fig. 8.1.4. All except v|K̄ are shown in the (a) part of the figure, and that is shown in the (b) part. v|K̄ is the apparent velocity, where "apparent" means "from the point of view of an observer stationary in the K̄ frame who is (or pretends to be) ignorant of being in a noninertial frame." The K̄-frame components of v|K̄ are (dr̄¹/dt, dr̄²/dt, …). It is shown as "approximately parallel" to Δ̄r because average and instantaneous velocities over short intervals are approximately parallel; in the limit of small Δt this becomes exact. For simplicity, let us assume that ω is time-independent (this restriction will be removed later), in which case (d̄/dt)(ω × r) = ω × v|K̄. The only remaining term in Eq. (8.1.27) deserves closer scrutiny. Defining

a|K̄ ≡ (d̄²/dt²)r,  (8.1.28)

it can be said to be the apparent acceleration from the point of view of K̄. Its components are (d²r̄¹/dt², d²r̄²/dt², …). The (b) part of Fig. 8.1.4 continues the construction from the (a) part to determine a|K̄. Combining these results, we obtain

a = a|K̄ + 2ω × v|K̄ + ω × (ω × r).  (8.1.29)
At the risk of becoming repetitious, let it again be stressed that, even though all terms on the right-hand side of this equation are expressed in terms of quantities that will be evaluated in the K̄ frame, the arrows they stand for are all plotted in the K frame in Fig. 8.1.4 and are hence commensurable with a; otherwise Eq. (8.1.29) couldn't make sense. On the other hand, since Eq. (8.1.29) is a vector equation, it can be expressed in component form in any frame, in particular in the K̄ frame. The resulting K̄ components are related to the K components in the well-known
(a) v̄ determination, K̄'s plot. (b) ā determination, K̄'s plot.

FIGURE 8.1.5. Vectors entering into the determination of v|K̄ and a|K̄, plotted in a plot that is stationary in the K̄ frame. a|K̄ is given by Δ̄(v|K̄)/Δt in the limit of small Δt. This figure violates our convention that all figures be drawn in an inertial frame.
way vectors transform, namely the component form of Eq. (8.1.22).⁶ The point of introducing fictitious forces has been to validate an analysis that describes kinematics purely in terms of the vectors shown as heavy arrows in Fig. 8.1.5. Since the second and third terms of Eq. (8.1.29), though artifacts of the description, appear to augment (negatively) the inertial acceleration, they are known as "fictitious" accelerations. With the inertial acceleration related to the "true force" F(true) by Eq. (8.1.1), the "fictitious" forces are

F(centrifugal) = −mω × (ω × r) = mω²r − m(ω · r)ω,
F(Coriolis) = −2mω × v|K̄,  (8.1.30)

and the equation of motion becomes

a|K̄ = (1/m)(F(true) + F(centrifugal) + F(Coriolis)).  (8.1.31)
For practical calculation in component form, each of the terms is decomposed into K̄-frame components. In the case of F(true), this exploits the fact that F(true) is in fact a true vector. After some examples illustrating the use of these formulas, this analysis will be reexpressed in different terms, not because anything is wrong with the derivation just completed, but in preparation for proceeding to more complicated situations.
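Equation (8.1.31) can be exercised on the simplest possible case (a sketch added here, with frame and trajectory parameters of my own choosing): a free particle moving in a straight line in inertial frame K, viewed from a frame K̄ rotating about z. Its apparent acceleration, computed directly from the K̄ coordinates, must equal (F(centrifugal) + F(Coriolis))/m since F(true) = 0.

```python
import numpy as np

Omega = 0.8
omega_bar = np.array([0.0, 0.0, Omega])  # K-bar components of omega

def Ot(t):  # O(t)^T, taking K components to K-bar components
    ct, st = np.cos(Omega*t), np.sin(Omega*t)
    return np.array([[ct, st, 0.0], [-st, ct, 0.0], [0.0, 0.0, 1.0]])

def rbar(t):  # free motion r = r0 + u t, re-expressed in K-bar
    r = np.array([1.0, 0.0, 0.0]) + t*np.array([0.2, 0.5, 0.1])
    return Ot(t) @ r

h, t0, m = 1e-4, 0.6, 1.0
v_bar = (rbar(t0+h) - rbar(t0-h)) / (2*h)               # apparent velocity
a_bar = (rbar(t0+h) - 2*rbar(t0) + rbar(t0-h)) / h**2   # apparent acceleration

F_cf  = -m*np.cross(omega_bar, np.cross(omega_bar, rbar(t0)))  # Eq. (8.1.30)
F_cor = -2*m*np.cross(omega_bar, v_bar)
# Eq. (8.1.31) with F(true) = 0
assert np.allclose(m*a_bar, F_cf + F_cor, atol=1e-5)
```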
Problem 8.1.4: Express in your own words the meaning of the symbols Δ̄r and Δ̄(v|K̄) in Fig. 8.1.5. If that figure seems obscure to you, feel free to redraw it in a way that makes it seem clearer.

⁶A vector construction analogous to that of Fig. 8.1.4 can be performed in the K̄ frame, as shown by the dashed vectors in Fig. 8.1.5. These vectors are only shown to make this point, though; the noninertial frame description describes the motion using only the heavy arrows.
Problem 8.1.5: The radial force on a mass m, at radius r relative to the center of the earth (mass M_E), is F = −mM_EG/r². If one ignores the motion of the earth about the sun, but not the rotation of the earth, one can describe the motion of a satellite of the earth in inertial coordinates with the earth at the origin or in terms of (r, θ, φ), which are the traditional radial distance, co-latitude, and longitude that are used for specifying geographical objects on earth.

(a) It is possible for the satellite to be in a "geosynchronous" orbit such that all of its coordinates (r, θ, φ) are independent of time. Give the conditions determining this orbit and find its radius r_s and latitude θ_s.

(b) Consider a satellite in an orbit just like that of part (a) except that it passes over the North and South poles instead of staying over the equator. Give (time-dependent) expressions for the coordinates (r, θ, φ), as well as for the Coriolis and centrifugal forces, and show that Newton's law is satisfied by the motion.

8.1.4. Exploiting the Fictitious Force Description

The mental exertion of the previous section is only justified if it simplifies some physical calculation. The reader has probably encountered discussions of the influence of Coriolis force on weather systems in the earth's atmosphere [2]. Here we will provide only enough examples to make clear the practicalities of using Eq. (8.1.31). Working Problem 8.1.5 goes a long way in this direction. The most historically significant example, the Foucault pendulum, is analyzed in the next chapter. Though Eq. (8.1.31) was derived by working entirely with vectors drawn in the K frame, since it is a vector equation, it can be used working entirely with vector calculations in the K̄ frame. In this frame, the Coriolis and centrifugal forces are every bit as effective in producing acceleration as is the force which to this point has been considered fundamental. For terrestrial effects, the angular velocity is

ω_E = 2π/(24 × 3600) = 0.727 × 10⁻⁴ s⁻¹.  (8.1.32)
On the earth's equator, since the centrifugal force points radially outward parallel to the equatorial plane, the acceleration it produces can be compared directly to the "acceleration of gravity" g = 9.8 m/s². The relative magnitude is

ω_E²R_E/g ≈ 3.44 × 10⁻³.  (8.1.33)
Though appreciable, this is comparable to the variation of g over the earth's surface and can be included by "renormalizing" the force of gravity slightly in magnitude and direction. The centrifugal force does not have appreciable meteorological consequences. The relative magnitude of the Coriolis and centrifugal forces tends to be dominated by the (small) extra factor of ω in F(centrifugal) relative to F(Coriolis). The centrifugal force would be expected to make itself visible most effectively through the force difference occurring over an altitude difference comparable with the height of the earth's atmosphere; let a typical value be Δr = 10 km. The Coriolis force can be estimated as being due to the velocity of a particle having "fallen" through such a change of altitude. For a particle accelerating through distance Δr under the influence of earth's gravity, the velocity v is √(2gΔr), and a typical value for the ratio v/Δr is ≈ 0.04 s⁻¹. This is for a "large" fall; for a smaller fall the ratio would be greater. Since it is already much greater than ω_E, the Coriolis force tends to be more significant than the centrifugal force in influencing terrestrial phenomena.

The Coriolis force has the property of depending on the velocity of the moving particle. Some precedents for velocity-dependence in elementary mechanics are friction and viscous drag. These forces are dissipative, however, while the Coriolis force clearly is not, since it results from a change of reference. In the regard of depending on velocity but being lossless, the Coriolis force resembles the force on a moving charged particle in a magnetic field, with the magnetic field and ω playing roughly analogous roles. The characteristic qualitative feature of motion of a charged particle in a magnetic field is that it tends to move in a roughly circular helix wrapping around the magnetic field lines. This suggests, at least in some ranges of the parameters, that the Coriolis force will cause qualitatively similar wind motion circulating around ω. Of course the presence of the earth's surface acting as a boundary that is not normal to ω tends to invalidate this analogy, but it should be expected that the Coriolis force can lead to atmospheric motion in "vortices."
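The order-of-magnitude estimates above are easy to reproduce numerically (a sketch added here; R_E = 6.378 × 10⁶ m and g = 9.8 m/s² are standard values, and the 10 km fall is the text's illustrative choice).

```python
import numpy as np

omega_E = 2*np.pi / (24*3600)      # Eq. (8.1.32)
R_E, g  = 6.378e6, 9.8
cf_over_g = omega_E**2 * R_E / g   # Eq. (8.1.33), centrifugal vs gravity

dr = 1.0e4                         # 10 km fall
v  = np.sqrt(2*g*dr)               # speed after falling through dr
coriolis_rate = v / dr             # compare with omega_E

assert abs(omega_E - 0.727e-4) < 1e-7
assert abs(cf_over_g - 3.44e-3) < 1e-4
assert coriolis_rate > 10*omega_E  # Coriolis dominates centrifugal
```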
Example 8.1.1: A Particle Falling Freely Close to the Earth's Surface. Newton's law for free fall with the earth's curvature neglected is r̈ = −g r̂, where g is the acceleration of gravity and r̂ points in the local "vertical" direction. Starting with velocity v₀, after time t the particle's velocity is v₀ − gt r̂. Including the Coriolis force, the equation of "free fall" is

r̈ + g r̂ = 2ω_E v × ω̂,  (8.1.34)

where ω̂ is directed along the earth's axis of rotation. As a matter of convention to be followed regularly in this text, the terms describing an idealized, solvable system have been written on the left-hand side of this equation, and the "perturbing" force that makes the system deviate from the ideal system has been written on the right-hand side. If the perturbing term is "small," then it can be estimated by approximating the factor v by its "unperturbed" value, which is obtained by solving the equation with the right-hand side neglected. This procedure can be iterated to yield high accuracy if desired. In the present case, the equation in first iteration is

r̈ + g r̂ = 2ω_E(v₀ − gt r̂) × ω̂,  (8.1.35)

where θ is the "co-latitude" angle (away from North) and r̂ × ω̂ = sin θ ŵ, with ŵ a unit vector pointing from east to west along a line of latitude. Since all the force terms are now functions only of t, (8.1.35) can be integrated easily. Starting from rest (in the Northern hemisphere), the falling object veers eastward because of the Coriolis force. This is contrary to the intuition of those people visualizing the Earth rotating "out from under" the mass.
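The first-iteration equation can be integrated in one's head for v₀ = 0: the eastward Coriolis acceleration is 2ω_E g t sin θ, so two integrations give an eastward deflection (1/3)ω_E g t³ sin θ. The sketch below (added for illustration; the drop height and co-latitude are arbitrary choices) confirms this with a crude numerical integration.

```python
import numpy as np

# Eastward component of Eq. (8.1.35) for a mass dropped from rest:
#   x_east'' = 2 * omega_E * g * t * sin(theta)
omega_E, g = 7.27e-5, 9.8
theta = np.pi/4            # co-latitude, 45 degrees (arbitrary)
T = np.sqrt(2*100.0/g)     # time to fall 100 m (unperturbed relation)

n = 20000
dt = T/n
x = vx = 0.0
for i in range(n):         # simple explicit integration
    t = i*dt
    vx += 2*omega_E*g*t*np.sin(theta)*dt
    x  += vx*dt

analytic = omega_E*g*T**3*np.sin(theta)/3   # double integration of the RHS
assert abs(x - analytic) < 1e-3*analytic
```

For a 100 m drop at 45 degrees co-latitude the deflection comes out to roughly 1.5 cm eastward, tiny but measurable.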
Problem 8.1.6: Integrate Eq. (8.1.35) (twice) to find r(t) for a freely falling mass subject to gravity and the Coriolis force. For the case v₀ = 0, find the approximate spatial trajectory by using the relation between time and altitude appropriate for unperturbed motion.

Problem 8.1.7: Using the velocity obtained from Eq. (8.1.35), perform a second iteration and from it write an equation of motion more accurate than Eq. (8.1.35).

Example 8.1.2: The Reduced Three-Body Problem. Though the problem of three bodies subject to each other's gravitational attraction is notoriously nonintegrable in general, there are simplifying assumptions that can be made that simplify the problem while still leaving it applicable to realistic celestial systems. In the so-called "reduced three-body" problem, the three masses are taken to be 1 − μ, μ, and 0 (where mass being zero would better be stated as "mass is negligible," implying that the position of the third mass has no effect on the motions of the first two). In this case, the motion of the first two is integrable and they move inexorably, independent of the third. This inexorable motion causes the gravitational potential sensed by the third particle to be time-varying. Since the problem is still complicated, one assumes also that all three orbits lie in the same plane; call it the (x, y) plane. For further simplification one assumes also that the orbits of the first two masses around each other are circular. All the approximations mentioned so far are applicable to the system consisting of sun, earth, and moon, so let us say that this is the system we are studying. Symbols defining the geometry are shown in Fig. 8.1.6. Formulated in this way, the problem still has interest apart from its mundane, everyday observability. One can, for example, inquire as to what stable orbits the earth's moon might have had, or what the possible orbits are of satellites around other binary systems. For that reason, although
FIGURE 8.1.6. The sun, earth, moon system, with all orbits assumed to lie in the same plane. The bisector of the sun-earth line is shown, and the point L4 makes an equilateral triangle with sun and earth.
it would be valid to assume m_m ≪ m_e ≪ m_s, we will only assume m_m ≪ m_e and m_m ≪ m_s. Also, we will not assume R_e ≪ R_s, even though it is true for the earth's moon.

As mentioned above, the gravitational potential at the moon depends on time. But if the system is viewed from a rotating coordinate system, this feature can be removed. Defining the "reduced mass" m of the sun-earth system by m ≡ m_s m_e/(m_s + m_e), R as their separation distance, and M as their angular momentum, one knows that this system is rotating about the centroid with constant angular velocity Ω given by

Ω = M/(mR²).  (8.1.36)

It was the requirement that Ω and R be constant that made it appropriate to require the sun and earth orbits to be circular. The other constant distances satisfy R_s = Rm/m_s, R_e = Rm/m_e, and R_s + R_e = R. Viewed from a system rotating with angular velocity Ω about the centroid, both sun and earth appear to be at rest, so the gravitational potential has been rendered time-independent. The potentials due to the sun and the earth are

V_s = −m_s G/√((x + R_s)² + y²) and V_e = −m_e G/√((x − R_e)² + y²).  (8.1.37)
The centrifugal force can be included by including a contribution to the potential energy given by

−½ Ω²(x² + y²).  (8.1.38)
Combining all potentials we define

V_eff = V_s + V_e − ½ Ω²(x² + y²).  (8.1.39)

Including the Coriolis force, the equations of motion are

ẍ = 2Ω ẏ − ∂V_eff/∂x,
ÿ = −2Ω ẋ − ∂V_eff/∂y.  (8.1.40)
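These equations of motion integrate readily. The sketch below (added for illustration) uses the normalization of Fig. 8.1.7, R = G = Ω = 1 and μ = 0.1, so that m_s = 1 − μ sits at (−μ, 0) and m_e = μ at (1 − μ, 0); the initial conditions are an arbitrary choice giving a roughly circular orbit about the sun. The quantity h = V_eff + v²/2 (the Jacobi integral of Problem 8.1.8 below) should stay constant along the orbit.

```python
import numpy as np

mu = 0.1
xs, xe = -mu, 1.0 - mu          # sun and earth positions on the x axis

def V_eff(x, y):
    rs, re = np.hypot(x - xs, y), np.hypot(x - xe, y)
    return -(1 - mu)/rs - mu/re - 0.5*(x*x + y*y)

def rhs(s):  # Eqs. (8.1.40): state s = (x, y, xdot, ydot)
    x, y, vx, vy = s
    rs, re = np.hypot(x - xs, y), np.hypot(x - xe, y)
    Vx = (1 - mu)*(x - xs)/rs**3 + mu*(x - xe)/re**3 - x
    Vy = (1 - mu)*y/rs**3 + mu*y/re**3 - y
    return np.array([vx, vy, 2*vy - Vx, -2*vx - Vy])

def jacobi(s):
    return V_eff(s[0], s[1]) + 0.5*(s[2]**2 + s[3]**2)

s = np.array([0.3, 0.0, 0.0, 1.2])   # starts 0.4 from the sun
h0, dt = jacobi(s), 1e-3
for _ in range(3000):                # fourth-order Runge-Kutta steps
    k1 = rhs(s); k2 = rhs(s + dt/2*k1)
    k3 = rhs(s + dt/2*k2); k4 = rhs(s + dt*k3)
    s += dt/6*(k1 + 2*k2 + 2*k3 + k4)

assert abs(jacobi(s) - h0) < 1e-8    # Jacobi integral is conserved
```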
Problem 8.1.8: The quantity h = V_eff + v²/2, where v² = ẋ² + ẏ², would be the total energy of the moon, which would be conserved (because the total energy of the sun-earth system is constant) except (possibly) for the effect of being in a rotating coordinate system. Show, by manipulating Eqs. (8.1.40) to eliminate the Coriolis terms, that h is, in fact, a constant of the motion. It is known as the "Jacobi integral."

Problem 8.1.9: Find a Lagrangian for which Eqs. (8.1.40) are the Lagrange equations. (It is not necessary for a Lagrangian to have the form L = T − V, and if it
FIGURE 8.1.7. Contour plot of V_eff for R = 1, G = 1, Ω = 1, μ = 0.1, m_s = 1 − μ, m_e = μ. The contours shown are for constant values of V_eff given by −10, −5, −4, −3, −2.5, −2.4, −2.3, −2.2, −2.1, −2, −1.9, −1.8, −1.7, −1.6, −1.54, −1.52, −1.5, −1.48, −1.46, −1.44, −1.42, −1.4, −1.38, −1.36, −1.3, −1.2, −1.1, −1, −0.5, −0.2, −0.1. The order of these contours can be inferred by spot calculation and the observation that there is a maximum running roughly along a circle of radius 1. Only the positive-y region is shown; V_eff is an even function of y.
is written in this form it is legitimate for V(r, ṙ) to depend on both velocities and positions.)

On the basis of the constancy of the Jacobi integral, some things can be inferred about possible motions of the moon from a contour plot of V_eff. For a particular choice of the parameters, such a contour plot is shown in Fig. 8.1.7. Some of these contours are approximate trajectories, in particular the "circles" close to and centered on the sun, but otherwise the relations between these contours and valid orbits are less clear. For each of these contours, if it were a valid trajectory, since both h and V_eff are constant, so also would be v. For an orbit that is temporarily tangent to one of these contours, the tangential components of both the Coriolis force and the force due to V_eff vanish, so v is temporarily stationary. Presumably the "generic" situation is for v to be either a maximum or a minimum as the orbit osculates the contour. For orbits that are approximate elliptical Kepler orbits around the sun, these two cases correspond approximately to the maximum and minimum values of v as the moon (in this case it would be more appropriate to say "other planet") moves more or less periodically between a smallest value (along a semiminor axis) and a largest value (along a semimajor axis). In this case, then, the orbit stays in a band between a lowest and a highest contour, presumably following a rosette-shaped orbit that, though resembling a Kepler ellipse, does not quite close. If the moon's velocity matches the speed v required by the osculating contour, then this band is
slender.⁷ In greater generality, at any point in the space, by judicious choice of initial conditions, it should similarly be possible to launch a satellite with the correct speed and direction so that it will follow the particular contour passing through the launch point for an appreciable interval. It will not stay on the contour indefinitely, though, because the transverse acceleration eventually deviates from that required to remain on the contour.

The points labeled L1, L2, and L3 are known as "Lagrange unstable fixed points," and L4, with its symmetric partner L5, are known as "Lagrange stable fixed points." These are the points for which ∂V_eff/∂x = ∂V_eff/∂y = 0. If a "moon" is placed at rest at one of these points, since the Coriolis force terms and the V_eff force terms vanish, the moon would remain at rest. The most interesting points are L4 and L5. Since there are closed curves surrounding these points, there appears to be the possibility of satellite orbits "centered" there. In modern jargon, one would say that Lagrange "predicted" the presence of satellites there. Some 100 years later it was discovered that the asteroid Achilles resides near L4 in the sun-Jupiter system, and numerous other asteroids have been discovered subsequently near L4 and L5. It has been proposed that a "Next Generation Space Telescope," to replace the Hubble Space Telescope, be located at L2. Its orbit would therefore resemble the Earth's elliptical orbit about the sun.
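The fixed-point conditions can be solved concretely (a sketch added here, for the Fig. 8.1.7 parameters R = G = Ω = 1, μ = 0.1). L4 and L5 are the equilateral-triangle points, at unit distance from both primaries, and the collinear point between the two primaries can be located by bisection on the x axis.

```python
import numpy as np

mu = 0.1
xs, xe = -mu, 1.0 - mu

def Vx(x, y):  # dV_eff/dx
    rs, re = np.hypot(x - xs, y), np.hypot(x - xe, y)
    return (1 - mu)*(x - xs)/rs**3 + mu*(x - xe)/re**3 - x

def Vy(x, y):  # dV_eff/dy
    rs, re = np.hypot(x - xs, y), np.hypot(x - xe, y)
    return (1 - mu)*y/rs**3 + mu*y/re**3 - y

# L4: equilateral point, unit distance from both primaries
x4, y4 = 0.5 - mu, np.sqrt(3)/2
assert abs(Vx(x4, y4)) < 1e-12 and abs(Vy(x4, y4)) < 1e-12

# Collinear point between the primaries: root of Vx on the axis.
# Vx -> +inf near the sun and -> -inf near the earth, so a root exists.
lo, hi = xs + 1e-6, xe - 1e-6
for _ in range(60):
    mid = 0.5*(lo + hi)
    if Vx(mid, 0.0) > 0:
        lo = mid
    else:
        hi = mid
x1 = 0.5*(lo + hi)
assert xs < x1 < xe and abs(Vx(x1, 0.0)) < 1e-9
```

For μ = 0.1 the interior collinear point comes out near x ≈ 0.61, noticeably displaced from the earth at x = 0.9.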
Problem 8.1.10: On a photocopy of Fig. 8.1.7, sketch the contours that pass through the Lagrange fixed points, completing the lower half of the figure by symmetry and taking account of the following considerations. At "generic" points (x, y), the directional derivative of a function V(x, y) vanishes in one direction (along a contour) but not in directions transverse to this direction. (On the side of a hill there is only one "horizontal" direction.) In this case, adjacent contours are more or less parallel and hence cannot cross each other. At particular points, though (fixed points), both derivatives vanish and contours can cross (saddle points) or not (true maxima or minima). It is easy to see that L1 is a saddle point, and from the figure it appears that L4 and L5 are either maxima or minima. For the parameter values given, test which is the case and see if this agrees with Eq. (8.1.44) below. For L2 and L3, determine if they are stable or unstable, and, if the latter, whether they are saddle points or maxima. Sketch contours that either cross or not, as the case may be, at these points. It should be possible to follow each such contour back to its starting point, wherever that is. Also, in general, one would not expect two fixed points to lie on the same contour. The linearized equations of motion, valid near one of the fixed points, say L4, are

\ddot{x} = 2\Omega\dot{y} - V_{xx}x - V_{xy}y, \qquad \ddot{y} = -2\Omega\dot{x} - V_{xy}x - V_{yy}y,
(8.1.41)
where partial derivatives are indicated by subscripts and the origin has been placed at the fixed point. Conjecturing a solution of the form x = Ae^{\lambda t}, y = Be^{\lambda t}, these

⁷There are remarkable "ergodic theorems" (due originally to Poincaré) that permit heuristically plausible statements such as these to be turned into rigorous results.
equations become

\begin{pmatrix} \lambda^2 + V_{xx} & -2\Omega\lambda + V_{xy} \\ 2\Omega\lambda + V_{xy} & \lambda^2 + V_{yy} \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = 0. \qquad (8.1.42)
The condition for such linear homogeneous equations to have nontrivial solutions is that the determinant of coefficients vanish:

\lambda^4 + \left(4\Omega^2 + V_{xx} + V_{yy}\right)\lambda^2 + \left(V_{xx}V_{yy} - V_{xy}^2\right) = 0. \qquad (8.1.43)
This is a quadratic equation in λ². The condition for stable motion is that both possible values of λ be purely imaginary. This requires λ² to be real and negative.
Problem 8.1.11: Evaluate the terms of Eq. (8.1.43) for the Lagrange fixed point L4, and show that the condition for motion in the vicinity of L4 to be stable is

27\mu(1-\mu) < 1, \qquad (8.1.44)

where μ = m_J/m_s. For the sun-Jupiter system, μ ≈ 10⁻³, which satisfies the condition for stability, consistent with the previously mentioned stable asteroids near L4 and L5.
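The criterion can be cross-checked numerically. The sketch below is a hypothetical helper, not from the text: it builds the quadratic in λ² of Eq. (8.1.43) from the second derivatives of the effective potential at L4 in Ω = 1 units (V_xx = −3/4, V_yy = −9/4, V_xy = ±(3√3/4)(1 − 2μ), values quoted from the standard restricted three-body analysis rather than derived here) and tests whether all four roots λ are purely imaginary.

```python
import numpy as np

def l4_stability(mu):
    """Roots lambda of Eq. (8.1.43) at L4 (Omega = 1), and a stability flag.
    Assumes the standard L4 Hessian values quoted in the lead-in."""
    Vxx = -0.75
    Vyy = -2.25
    Vxy = -(3 * np.sqrt(3) / 4) * (1 - 2 * mu)   # sign irrelevant: only Vxy^2 enters
    # Quadratic in L = lambda^2:  L^2 + (4 + Vxx + Vyy) L + (Vxx Vyy - Vxy^2) = 0
    L = np.roots([1.0, 4.0 + Vxx + Vyy, Vxx * Vyy - Vxy ** 2])
    lam = np.concatenate([np.sqrt(L.astype(complex)),
                          -np.sqrt(L.astype(complex))])
    stable = bool(np.all(np.abs(lam.real) < 1e-9))
    return lam, stable
```

With these values the quadratic reduces to L² + L + (27/4)μ(1−μ) = 0, whose roots are real and negative (hence λ purely imaginary) exactly when 27μ(1−μ) < 1.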
Problem 8.1.12: Larmor's Theorem (a) The force F on a particle with charge q and velocity v in a constant and uniform magnetic field B is given by F = qv × B. Write the equation of motion of the particle in a frame of reference that is rotating with angular velocity ω relative to an inertial frame. Assume that ω is parallel to B. Show that if the magnetic field is sufficiently weak, the magnetic and fictitious forces can be made to cancel by selecting the magnitude of the angular velocity. Give a formula expressing the "weakness" condition that must be satisfied for this procedure to provide a good approximation. (b) Consider a classical mechanics model of an overall neutral atom consisting of light negatively charged electrons circulating around a massive point nucleus. Causing what is known as the "Zeeman effect," placing an atom in a magnetic field B shifts the energy levels of the electrons. In the classical model, each electron is then subject to electric forces from the nucleus and from each of the other electrons as well as the magnetic force. Assuming the weakness condition derived above is satisfied, show that the electron orbits could be predicted from calculations in a field-free rotating frame of reference.
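Part (a) can be illustrated numerically. In the sketch below (illustrative units, not from the text: unit mass and charge, with a unit-frequency spring standing in for the atomic binding force), the orbit computed with a weak field Bẑ is compared with the field-free orbit viewed from a frame rotating at the Larmor rate ω_L = −B/2. Rotating at ω_L removes the secular drift between the two orbits; only a bounded residual of order B remains.

```python
import numpy as np

def orbit(B, T=10.0, dt=2e-3):
    """RK4 integration of x'' = -x + B*y', y'' = -y - B*x': a unit-mass,
    unit-charge particle bound by a unit-frequency spring in field B zhat."""
    def deriv(s):
        x, y, vx, vy = s
        return np.array([vx, vy, -x + B * vy, -y - B * vx])
    s = np.array([1.0, 0.0, 0.0, 1.0])          # r0 = (1, 0), v0 = (0, 1)
    for _ in range(int(round(T / dt))):
        k1 = deriv(s)
        k2 = deriv(s + 0.5 * dt * k1)
        k3 = deriv(s + 0.5 * dt * k2)
        k4 = deriv(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return s[0], s[1]

def larmor_error(B, T=10.0):
    """Distance at time T between the orbit in field B and the field-free
    orbit as seen from a frame rotating at the Larmor rate omega_L = -B/2."""
    x, y = orbit(B, T)
    x0, y0 = orbit(0.0, T)
    th = -B * T / 2.0                            # accumulated Larmor angle
    xr = np.cos(th) * x0 - np.sin(th) * y0
    yr = np.sin(th) * x0 + np.cos(th) * y0
    return float(np.hypot(x - xr, y - yr))
```

The weakness condition here is simply B much smaller than the orbital frequency (unity in these units); the mismatch with the rotated field-free orbit is then far smaller than the mismatch without the rotation.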
8.2. SINGLE-PARTICLE EQUATIONS IN GAUGE-INVARIANT FORM
The term "gauge-invariant," though long prevalent in electromagnetic theory, has recently acquired greater currency in other fields of theoretical physics [3]. In colloquial English, a "gauge" is a device for measuring a physical quantity: a thermometer is a temperature gauge, a ruler is a length gauge. In electromagnetic theory, "gauge-invariant" describes a kind of freedom of choice of scalar or vector potentials, but it is hard to see why the word "gauge" is thought to call such freedom to mind in that case. In the context of geometric mechanics, the term "gauge-invariant" more nearly approximates its colloquial meaning. When one describes a physical configuration by coordinates that refer to inertial, fixed, orthonormal, Euclidean axes, one is committed to choosing a single measuring stick, or gauge, and locating every particle by laying off distances along the axes using the same stick. A theory of the evolution of such a system expressed in those coordinates would not be manifestly gauge-invariant, because it would be expressed explicitly in terms of a particular gauge, but this does not imply that the same theory cannot be expressed in gauge-invariant form. An example of this sort of mathematical possibility (Gauss's theorem) was discussed in Section 4.3.2; though Gauss's theorem is commonly expressed in Euclidean coordinates, it is expressed in coordinate-independent form in that section. In this chapter the term "gauge-invariant" will have the similar meaning of "coordinate-frame-invariant." The gauge could, in principle, depend on position, but since that will not be the case here, we have to deal only with a much simplified form of gauge-invariance. Much of the subsequent analysis can be described as the effort to derive equations in which all quantities have overhead bars (or all do not). Such an equation will then be said to be form-invariant. If the frame in which the equation was derived is itself general, then the equation will have the powerful attribute of being applicable in any coordinate system. An example of equations having this property is Maxwell's equations; they have the same form in all frames traveling at constant speed relative to a base frame.

8.2.1. Newton's Force Equation in Gauge-Invariant Form

A particle of mass m,⁸ situated at a point P with coordinates xⁱ, is subject to Newton's equation,

m\,\frac{d^2x^i}{dt^2} = f^i(\mathbf{r}, \mathbf{v}, t), \qquad (8.2.1)
where fⁱ is the force⁹ (possibly dependent on r, ṙ, and t). Since the fⁱ are components of a vector, they are subject to the same transformation Eqs. (8.1.22) as r:

\mathbf{f} = O(t)\bar{\mathbf{f}}, \qquad f^i = O^i{}_k(t)\,\bar{f}^k. \qquad (8.2.2)
Recall that, by our conventions, this is to be regarded as a passive transformation, relating the components of f and f̄, which stand for the same arrow. Our mission is to write Newton's equation in "gauge-invariant" form where, in this case, choice of

⁸We talk of a point mass m even though it will often be the mass dm contained in an infinitesimal volume dV, perhaps fixed in a rigid body, that is being discussed.
⁹Since we will use only Euclidean coordinates for now, it is unnecessary to distinguish between covariant and contravariant components of the force.
"gauge" means choice of coordinate system, with rotating coordinate systems being allowed. (The ordinary Newton equations are gauge-invariant in this sense when the choice is restricted to inertial frames; this is known as "Galilean invariance.") Formally, the task is to reexpress Newton's equation entirely in terms of quantities having overhead bars. The introduction of Coriolis and centrifugal forces has gotten us a long way toward realizing our goal: they are fictitious forces which, when added to true forces, make it legitimate to preserve the fiction that a rotating frame is inertial. This "fictitious force" formulation has amounted to evaluating inertial frame quantities entirely in terms of moving frame quantities. Once the gauge-invariant formulation has been established, there will be no further need for more than one frame, and it will be unnecessary to distinguish, say, between d/dt and d̄/dt. For want of better terminology we will use the terms "fictitious force formalism" and "gauge-invariant formalism" to distinguish between these two styles of description even though the terminology is a bit misleading. It is misleading because not only are the formulations equivalent in content, they are similarly motivated. The present treatment strives to express Newton's equations in such a way that they preserve the same form in any reference frame. It is somewhat more general than the simple introduction of fictitious centrifugal and Coriolis forces because the axis and velocity of rotation of the rotating frame will not necessarily be constant. At some point a sense of déjà vu may develop, as the present discussion is very similar to that contained in Section 3.1.5, which dealt with the application of the absolute differential in mechanics, though the present situation is somewhat more general because time-dependent frames are being allowed.
In that earlier section, an operator D was introduced with the property that position r and its time derivative were related by ṙ = Dr, but the curvilinear effect was shown to cause the absolute acceleration vector a = D²r to differ from r̈. In the present case, an analogous differential operator D_t will be defined; in terms of this operator, the absolute velocity is D_t r. Again it will be true that r̈ ≠ D_t²r, but the time-dependent relation between frames will cause ṙ to differ from D_t r as well. Differentiating Eq. (8.1.22), the inertial-frame velocity v = ṙ is given by
\mathbf{v} = \dot{O}\bar{\mathbf{r}} + O\dot{\bar{\mathbf{r}}} = O\dot{\bar{\mathbf{r}}} + O\left(O^T\dot{O}\right)\bar{\mathbf{r}} = O\dot{\bar{\mathbf{r}}} + O\bar{\Omega}\bar{\mathbf{r}}, \qquad (8.2.3)

where¹⁰

\bar{\Omega} \equiv O^T\dot{O}, \qquad (8.2.4)

and the orthogonality of O has been used: O⁻¹ = Oᵀ. It is easy to be too glib in manipulations such as those just performed in Eq. (8.2.3). Having said that r and r̄ are in some sense the same but d/dt and d̄/dt are different, what is the meaning of r̄̇? We mean it as the array of elements (dx̄¹/dt, dx̄²/dt, ...), and we mean Or̄̇ as an abbreviated notation for the array of elements O^j{}_k dx̄^k/dt. We introduce a vector v̄, related to v in the same way r̄ is related to r:

\mathbf{v} = O(t)\bar{\mathbf{v}}. \qquad (8.2.5)

From Eq. (8.2.3), v̄ is given by

\bar{\mathbf{v}} = \dot{\bar{\mathbf{r}}} + \bar{\Omega}\bar{\mathbf{r}}. \qquad (8.2.6)

¹⁰OᵀȮ is known in differential geometry as "the Cartan matrix." It was introduced by Cartan in his "méthode du repère mobile" or "moving frame method."
(This shows that v̄ is not equal to the quantity r̄̇, which might have seemed to deserve being called the K̄-frame velocity of the moving point; this reminds us that transformation between frames deserves considerable care.) We introduce a "time derivative" operator

\bar{D}_t = \frac{d}{dt} + \bar{\Omega}, \qquad (8.2.7)

dependent on "gauge" Ω̄, that relates r̄ and v̄ as in (8.2.6):¹¹

\bar{\mathbf{v}} = \bar{D}_t\bar{\mathbf{r}}. \qquad (8.2.8)
This equation has the desirable feature that all quantities have overhead bars, making the equation "gauge-invariant."¹² It can be reiterated that the absence of a symbol Ō distinct from O is inessential, but it may be reassuring that Ω̄, because it is the product of a "forward" and a "backward" matrix, can naturally link quantities evaluated in the same frame of reference. A way of calculating v̄ equivalent to that of Eq. (8.2.8) is to first calculate the space-frame displacement Or̄, find its time derivative (d/dt)(Or̄), and then transform back:

\bar{\mathbf{v}} = O^T\frac{d}{dt}\left(O\bar{\mathbf{r}}\right) = O^TO\dot{\bar{\mathbf{r}}} + O^T\dot{O}\bar{\mathbf{r}} = \left(\frac{d}{dt} + \bar{\Omega}\right)\bar{\mathbf{r}} = \bar{D}_t\bar{\mathbf{r}}. \qquad (8.2.9)

This shows that D̄_t can also be written as

\bar{D}_t = O^T\,\frac{d}{dt}\,O. \qquad (8.2.10)

¹¹v̄ can be regarded as a relative velocity vector, composed from an apparent velocity in one frame and the relative velocity of the two frames. Its components are then subject to the same transformation relation as all other true vectors. It is unclear how helpful to the intuition this is, however, since the individual terms are not true vectors.
¹²When a tensor equation expresses a relationship between components in the same frame of reference using only invariant operations such as contraction on indices, it is said to be manifestly covariant. The concept of gauge invariance currently under discussion is therefore a similar concept.
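These relations are easy to verify numerically. The sketch below (illustrative names and sample functions, assuming a z-axis rotation with nonuniform angle) computes Ω̄ = OᵀȮ by finite differences, confirms that it is antisymmetric, and checks that v̄ = r̄̇ + Ω̄r̄ agrees with Oᵀv.

```python
import numpy as np

def O(t):
    """Sample rotation about the z axis with a nonuniform angle."""
    th = 0.3 * t + 0.1 * t * t
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rbar(t):
    """Sample moving point, components given in the rotating (barred) frame."""
    return np.array([1.0 + 0.2 * t, 0.5 * t, 0.3])

def ddt(f, t, h=1e-6):
    """Central finite-difference time derivative."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

t = 1.234
Om = O(t).T @ ddt(O, t)                        # Cartan matrix Omega = O^T dO/dt
v_inertial = ddt(lambda s: O(s) @ rbar(s), t)  # v = d/dt (O rbar)
v_bar = ddt(rbar, t) + Om @ rbar(t)            # Eq. (8.2.6): vbar = rbar' + Omega rbar
```

The last two lines are the two equivalent computations of v̄ compared in Eqs. (8.2.8) and (8.2.9).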
Remembering that Oᵀ = O⁻¹, this formula shows that D̄_t and d/dt, though they operate in different spaces, can be regarded as being related by a "similarity transformation." In this case, the two spaces are related by O, and the K̄-frame evolution operator D̄_t is "similar" to the K-frame evolution operator d/dt. We now write Newton's equation using the operator D̄_t:

m\bar{D}_t^2\,\bar{\mathbf{r}} = \bar{\mathbf{f}}(\bar{\mathbf{r}}, \bar{\mathbf{v}}). \qquad (8.2.11)
This has the sought-for property of being expressed entirely in terms of quantities with overhead bars. As in Eq. (8.2.2), the vectorial property of force f has been assumed. (The force can also depend on time, but that dependence is not shown because it does not affect the present discussion.) To check that this equation is correct, we need to see that it agrees with Eq. (8.2.1), which is Newton's equation in an inertial frame:
m\bar{D}_t^2\,\bar{\mathbf{r}} = m\left(O^T\frac{d}{dt}O\right)\left(O^T\frac{d}{dt}O\right)\bar{\mathbf{r}} = mO^T\ddot{\mathbf{r}} = O^T\mathbf{f}(\mathbf{r}, \mathbf{v}) = O^T\mathbf{f}(O\bar{\mathbf{r}}, O\bar{\mathbf{v}}) = \bar{\mathbf{f}}(\bar{\mathbf{r}}, \bar{\mathbf{v}}). \qquad (8.2.12)
These manipulations have been formally motivated by the goal of eliminating quantities without overhead bars. The only remaining evidence of the fact that the body frame is rotating is that the evaluation of f̄ from r̄(t) depends on the gauge Ω̄. When expanded more fully, the "acceleration" term of Newton's equation becomes

\bar{D}_t^2\,\bar{\mathbf{r}} = \bar{D}_t\left(\dot{\bar{\mathbf{r}}} + \Omega\bar{\mathbf{r}}\right) = \ddot{\bar{\mathbf{r}}} + \dot{\Omega}\bar{\mathbf{r}} + 2\Omega\dot{\bar{\mathbf{r}}} + \Omega^2\bar{\mathbf{r}}. \qquad (8.2.13)

The term −2mΩr̄̇ is the Coriolis force, −mΩ²r̄ is the centrifugal force, and −mΩ̇r̄ accounts for the nonconstancy of the relative angular velocities of the frames.¹³

¹³It has been explained repeatedly why one need not be concerned by the absence of an overhead bar on Ω as it appears in Eq. (8.2.13).
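Equation (8.2.13) can also be checked numerically against D̄_t²r̄ = Oᵀr̈. The sketch below is illustrative (a nonuniform z-axis rotation and a sample trajectory are assumed): it builds the right side of Eq. (8.2.13) from finite-difference derivatives and compares it with Oᵀ applied to the inertial-frame acceleration.

```python
import numpy as np

def O(t):
    th = 0.3 * t + 0.1 * t * t              # nonuniform rotation angle about z
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rbar(t):
    return np.array([1.0 + 0.2 * t, 0.5 * t * t, -0.3])

def d1(f, t, h=1e-5):
    return (f(t + h) - f(t - h)) / (2.0 * h)

def d2(f, t, h=1e-5):
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h ** 2

t = 0.9
Om = O(t).T @ d1(O, t)                          # Omega = O^T dO/dt
Omdot = d1(lambda s: O(s).T @ d1(O, s), t)      # time derivative of Omega
lhs = O(t).T @ d2(lambda s: O(s) @ rbar(s), t)  # O^T (d^2/dt^2)(O rbar) = Dt^2 rbar
rhs = (d2(rbar, t) + Omdot @ rbar(t)            # Eq. (8.2.13), term by term
       + 2.0 * Om @ d1(rbar, t) + Om @ Om @ rbar(t))
```

The Ω̇r̄ term is nonzero here precisely because the rotation rate is nonuniform; for a constant rotation rate it drops out, recovering the familiar Coriolis plus centrifugal form.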
*8.2.2. Active and Passive Interpretations of Time Evolution¹⁴

In discussing rotational motion, the sign of the angular velocity vector ω seems to be particularly confusing, and so also does the distinction between active and passive interpretations of transformations. Consider a point P fixed at location r̄_P in a rotating rigid body. At least temporarily, at a time we take as t = 0, the body, as viewed in inertial frame K, is rotating with angular velocity ω, and we assume that moving and inertial frames are aligned at this time, so we have r_P(0) = r̄_P(0). At later time t, the point P has K-frame

¹⁴This and the next section are confusing without being extremely important. It commonly is needed only to determine the sign of a rotation velocity, a factor that can typically also be determined by other means. Once one has appreciated the difficulty of avoiding ambiguity, it may be sensible to defer worrying about signs and whether a matrix or its inverse is required until a concrete case is being studied.
location r_P(t), related to its earlier location by

\mathbf{r}_P(t) = O'(t)\mathbf{r}_P(0) \doteq O'(t)\bar{\mathbf{r}}_P(0), \qquad (8.2.14)

where O' is an orthogonal matrix that will later be related to O. Other than some seemingly irrelevant differences in subscripts and arguments, the equality of the outer quantities here looks like the first of Eqs. (8.1.22). But we should compare these equations more closely; to allow for its possible difference from O, the matrix O' has been given a prime in Eq. (8.2.14). We have stated that the vectors r and r̄ in Eq. (8.1.22) are "the same arrow," whose coordinates differ because they refer to different frames. On the other hand, a picture illustrating Eq. (8.2.14) would seem to have r̄_P(0) independent of time and hence fixed and r_P(t) moving. This does not square with their being "the same arrow": a paradox. The presence of the qualified equality symbol ≐ acknowledges this, and the same qualification should have been placed on the earlier equation r_P(0) ≐ r̄_P(0). Since the quantity r in Eq. (8.1.22) stood for the location of any particle, it can stand for our particular particle P, so the subscript P has no bearing on the paradox. Furthermore, the presence or absence of arguments (t) cannot matter, since each of the vectors could be given such an argument, even r̄_P(0), which just happens to be time-independent (though only when viewed in the K̄ frame). This parenthetic comment is the tip-off; the resolution of the paradox is that r̄_P(0) is not in fact a fixed arrow in the K frame but is rotating along with r_P(t). This means that Eq. (8.1.22) and the outer equality of Eq. (8.2.14) are not necessarily inconsistent after all. In component form, Eq. (8.2.14) reads

r^j(t) = O'^j{}_k(t)\,\bar{r}^k(0). \qquad (8.2.15)
Not only is this consistent with the second of Eqs. (8.1.22), it has no "seemingly paradoxical" character. The constant quantities r̄^k(0) are related to the time-varying quantities r^j(t) by a time-varying matrix O'^j{}_k(t) in a way that simply reflects the instantaneous relative orientation of the reference frames. These considerations are important in understanding the Cartan matrix Ω̄ that figured prominently in the previous section. The Cartan matrix was also introduced in Section 4.2.6, where it was used to relate bivectors and infinitesimal rotations. As bivectors and infinitesimal rotations are discussed there, a fixed vector is related to a moving vector. The first equality in Eq. (8.2.14) (suppressing the P),

\mathbf{r}(t) = O'(t)\mathbf{r}(0), \qquad (8.2.16)
is just such an equality. Its natural interpretation is active, with O'(t) acting on constant vector r(0) to yield rotating vector r(t). This is illustrated in the upper part of Fig. 8.2.1. Let us contrast this with Eq. (8.1.22), which we repeat now for emphasis:

\mathbf{r} = O(t)\bar{\mathbf{r}}. \qquad (8.2.17)
FIGURE 8.2.1. Pictorial representations of Eqs. (8.2.16) and (8.2.17) exhibiting the active and passive effects of the same matrix. The figures have been arranged so that the coefficients of the inset equations are element-by-element equal (in the case θ_a = θ_p).
We have insisted on a passive interpretation of this equation, with r and r̄ standing for the same arrow; the equation is simply an abbreviation for the matrix multiplication that relates components in different coordinate systems. The lower part of Fig. 8.2.1 illustrates this interpretation. Any relation between O'(t) and O(t) would depend on the relation between active angle θ_a and passive angle θ_p. In general, there is no such relation since the relative orientation of frames of reference and the time evolution of systems being described are independent. Commonly though, the passive view is adopted in order to "freeze" motion that is rotating in the active view. To achieve this, we combine Eqs. (8.2.16) and (8.2.17),

\bar{\mathbf{r}}(t) = \left(O^{-1}(t)O'(t)\right)\bar{\mathbf{r}}(0), \qquad (8.2.18)

where we have assumed r(0) = r̄(0). In order for r̄(t) to be constant, we must have O'(t) = O(t). Though it is logically almost equivalent, essentially the same argument applied to linear motion seems much easier to comprehend: to freeze the motion x = a + vt we define x = x̄ + b + vt, so that x̄ = a + vt − b − vt = a − b.
*8.2.3. Continued Discussion of the Cartan Matrix

We now continue with the Cartan analysis, ceasing to distinguish between O' and O. Differentiating Eq. (8.2.16), which is the same as the first of Eqs. (8.2.14), the velocity of point P is

\mathbf{v}(t) = \dot{O}(t)\mathbf{r}(0). \qquad (8.2.19)
Using the inverse of Eq. (8.2.16), v(t) can therefore be expressed in terms of r(t),

\mathbf{v}(t) = \dot{O}O^T\mathbf{r}(t), \qquad (8.2.20)

where Oᵀ = O⁻¹, because O is an orthogonal matrix. It was shown in Section 4.2.6 (from the requirement r · v = 0) that the quantity ȮOᵀ is an antisymmetric tensor. It then follows from d/dt(OOᵀ) = 0 that OᵀȮ, for which we introduce the symbol Ω, is also antisymmetric:¹⁵

\Omega = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}. \qquad (8.2.21)

This meets the requirement of antisymmetry, but for the time being the quantities ω₁, ω₂, and ω₃ are simply undetermined parameters; the signs have been chosen for later convenience. We next introduce a quantity v̄ that is related to v by

\mathbf{v} = O\bar{\mathbf{v}}, \qquad (8.2.22)

which is to say the same way vectors introduced previously have been related in the two frames of reference; that is, v and v̄ stand for the same arrow but with components to be taken in different frames.¹⁶ Combining these formulas, we arrive at

\bar{\mathbf{v}} = O^T\mathbf{v} = O^T\dot{O}O^T\mathbf{r}(t) = \Omega(t)\bar{\mathbf{r}}. \qquad (8.2.23)
The essential feature of Ω is that it relates the instantaneous position vector and the instantaneous velocity vector "as arrows in the same frame." This is the basis for the phrase "Cartan's moving frame." If we now allow the point P to move with velocity r̄̇ in the moving frame, this becomes

\bar{\mathbf{v}} = \dot{\bar{\mathbf{r}}} + \Omega(t)\bar{\mathbf{r}}. \qquad (8.2.24)

We have rederived Eq. (8.2.6), though the components of Ω are as yet undetermined.
8.2.4. Reconciling Fictitious Force and Gauge-Invariant Descriptions

Clearly the "fictitious force" and "gauge-invariant" descriptions contain the same physics, but their conceptual bases are somewhat different and their interrelationships are subtle. Eq. (8.2.13) is equivalent to Eq. (8.1.29) (or rather it generalizes that equation by allowing nonuniform rotation), but minor manipulation is required to demonstrate the fact, especially because Eq. (8.2.13) is expressed by matrix multiplication and Eq. (8.1.29) is expressed by vector cross products. The needed formula was derived in Eq. (4.2.25), but to avoid the need for correlating symbols we rederive it now.

¹⁵The calculation OᵀȮ has generated an element of the Lie algebra of antisymmetric matrices from the Lie group of orthogonal matrices. This generalizes to arbitrary continuous symmetries.
¹⁶Comments made in the footnote following Eq. (8.1.22) concerning the risk of ambiguity in the interpretation of quantities like Ω̄ are just as applicable here.
To do this, and to motivate manipulations to be performed shortly in analyzing rigid body motion, the two representations of the same physics can now be juxtaposed, starting with velocities from Eqs. (8.1.24) and (8.2.6):

\mathbf{v} = \mathbf{v}|_{\bar K} + \boldsymbol{\omega}\times\mathbf{r}, \qquad \text{fictitious force description,} \qquad (8.2.25)

\bar{\mathbf{v}} = \dot{\bar{\mathbf{r}}} + \bar{\Omega}\bar{\mathbf{r}}, \qquad \text{gauge-invariant description.} \qquad (8.2.26)
The latter equation, in component form, reads

\begin{pmatrix} \bar{v}^1 \\ \bar{v}^2 \\ \bar{v}^3 \end{pmatrix} = \begin{pmatrix} \dot{\bar{x}} \\ \dot{\bar{y}} \\ \dot{\bar{z}} \end{pmatrix} + \begin{pmatrix} 0 & -\bar{\omega}_3 & \bar{\omega}_2 \\ \bar{\omega}_3 & 0 & -\bar{\omega}_1 \\ -\bar{\omega}_2 & \bar{\omega}_1 & 0 \end{pmatrix} \begin{pmatrix} \bar{x} \\ \bar{y} \\ \bar{z} \end{pmatrix}, \qquad (8.2.27)

where the components of Ω̄ have been chosen to make the two equations match term by term. The mental pictures behind Eqs. (8.2.25) and (8.2.26) are different. For the former equation one has two coordinate frames explicitly in mind, and the equation yields the inertial-frame quantity v from quantities evaluated in a moving frame. In the gauge-invariant description, one "knows" only one frame, the frame one inhabits, and that frame has a gauge Ω̄ externally imposed upon it. In Eq. (8.2.26), Ω̄ acts as, and is indistinguishable from, an externally imposed field. That the quantity v̄ is more deserving than, say r̄̇, of having its own symbol is just part of the formalism. (There is a similar occurrence in the Hamiltonian description of a particle in an electromagnetic field; in that case the mechanical momentum is augmented by a term proportional to the vector potential. Furthermore, recalling Problem 8.1.12, one knows that Ω̄ has somewhat the character of a magnetic field.) In Eq. (8.2.26), or more explicitly in Eq. (8.2.27), all coordinates have bars on them in the only frame that is in use. Except that it would introduce confusion into the comparison of the two views, we could simply remove the bars in Eqs. (8.2.26) and (8.2.27). For these two views to correspond to the same physics, there must be an intimate connection between the quantities ω and Ω̄. Identifying v|_K̄ ≐ (dx̄/dt, dȳ/dt, dz̄/dt)ᵀ (this is necessarily a "qualified" equality, since the quantity on one side is intrinsic and on the other side is in component form) and equating corresponding coefficients, it is clear that the quantities ω₁, ω₂, and ω₃ entering the definition of Ω̄ in Eq. (8.2.21) are in fact the components of ω. The two formalisms can then be related by replacing vector cross product multiplication by ω with matrix multiplication by Ω̄:

\boldsymbol{\omega}\times \;\longrightarrow\; \bar{\Omega}. \qquad (8.2.28)
Spelled out in component form, this is the well-known cross-product expansion of ordinary vector analysis:

\begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \boldsymbol{\omega}\times\mathbf{r}. \qquad (8.2.29)
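This correspondence is trivial to confirm in code; the helper below is illustrative, not from the text.

```python
import numpy as np

def skew(w):
    """The matrix Omega of Eq. (8.2.21); skew(w) @ r equals np.cross(w, r)."""
    w1, w2, w3 = w
    return np.array([[0.0, -w3, w2],
                     [w3, 0.0, -w1],
                     [-w2, w1, 0.0]])
```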
Like numerous previous formulas, this has to be regarded as a "qualified" equality since it equates an intrinsic and a nonintrinsic quantity. It is valid in any frame as long as the appropriate components are used in each frame. Accelerations, as given in the two approaches by Eqs. (8.1.29) and (8.2.13), can also be juxtaposed:

\mathbf{a} = \mathbf{a}|_{\bar K} + 2\boldsymbol{\omega}\times\mathbf{v}|_{\bar K} + \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}), \qquad \text{fictitious force description,} \qquad (8.2.30)

\bar{D}_t^2\,\bar{\mathbf{r}} = \ddot{\bar{\mathbf{r}}} + 2\begin{pmatrix} 0 & -\bar{\omega}_3 & \bar{\omega}_2 \\ \bar{\omega}_3 & 0 & -\bar{\omega}_1 \\ -\bar{\omega}_2 & \bar{\omega}_1 & 0 \end{pmatrix}\dot{\bar{\mathbf{r}}} + \begin{pmatrix} 0 & -\dot{\bar{\omega}}_3 & \dot{\bar{\omega}}_2 \\ \dot{\bar{\omega}}_3 & 0 & -\dot{\bar{\omega}}_1 \\ -\dot{\bar{\omega}}_2 & \dot{\bar{\omega}}_1 & 0 \end{pmatrix}\bar{\mathbf{r}} + \bar{\Omega}^2\bar{\mathbf{r}}, \qquad \text{gauge-invariant description.} \qquad (8.2.31)

One identifies acceleration components according to a|_K̄ ≐ (d²x̄/dt², d²ȳ/dt², d²z̄/dt²)ᵀ. The quantities on the left sides of Eqs. (8.2.30) and (8.2.31) are "dynamical" in the sense that they are to be inferred from the applied force and Newton's law; the quantities on the right side can be said to be "kinematical," as they are to be inferred from the particle's evolving position. Clearly the matrix and vector equations are equivalent representations of the same physics. It would be a helpful mnemonic if the correction terms in the fictitious force formalism were expressed using cross-product notation and the correction terms in the gauge-invariant formalism were expressed in matrix form. But this would be too great a sacrifice in the latter case, since the cross-product formulas are so handy to use. There will still, however, be a notational device to distinguish which interpretation is intended. In the fictitious force formalism, d/dt will have an overhead bar, but vectors like ω and v will not. In the gauge-invariant formalism, all quantities applicable to the frame will have overhead bars. At this point, one can further mull over the difficulty of maintaining logical, self-consistent notation, especially concerning the quantities ω and ω̄ that, on the one hand, relate different frames and, on the other hand, have been said to represent "the same arrow." Here the meaning of "same" includes all their intrinsic properties: they represent the same geometric object, the same axis of rotation, and the same angular speed and orientation around that axis. This geometric object, represented by a vector,¹⁷ is subject to the addition and transformation properties this implies. Of course the components of this geometric object differ in different frames. This is consistent with the components ωⁱ and ω̄ⁱ being different. But, as stressed previously, ω and ω̄ differ only by the frames in which their coordinates are expected to be expressed.
When a cross-product expression such as ω × v̄ is encountered, how is it to be interpreted intrinsically? How are its components to be calculated? Since it is the cross product of two unambiguous arrows, the rules of vector analysis assure that ω × v̄ is also an unambiguous arrow (it is an arrow perpendicular to both ω and v̄), but the latter question is poorly posed: it requires further specification of what frame is intended. If, as is likely, one wishes the components given as an answer to refer to frame K̄, then components ω̄ⁱ and v̄ⁱ are to be used in working out the cross product. Again our notational difficulties result from the fact that intrinsic quantities, by their nature, are not naturally related to any particular frame of reference.¹⁸

¹⁷ω is actually a pseudo-vector but the distinction is inessential here; the connection between bivectors and infinitesimal rotations is discussed in Section 4.2.6.
8.2.5. Newton’s Torque Equation
For analyzing rotational motion of one or more particles, it is useful to introduce "torques" and to write Newton's equation in terms of the angular momentum L (relative to O) of a particle with radius vector r (relative to O) and velocity v, which is defined by

\mathbf{L} \equiv \mathbf{r}\times m\mathbf{v}. \qquad (8.2.32)
By the rules of vector analysis, this is a true vector (actually a pseudo-vector) since both r and v are true vectors. The torque about O due to force F acting at position r is defined by

\boldsymbol{\tau} = \mathbf{r}\times\mathbf{F}. \qquad (8.2.33)
As it applies to L, Newton's "torque law," valid in inertial frames, is

\frac{d\mathbf{L}}{dt} = \frac{d\mathbf{r}}{dt}\times m\mathbf{v} + \mathbf{r}\times m\frac{d\mathbf{v}}{dt} = \mathbf{r}\times\mathbf{F} = \boldsymbol{\tau}. \qquad (8.2.34)
(8.2.34)
Consider next a point B, also at rest in the inertial frame, with radius vector rg relative to 0, and let r - r B = x so x is the displacement of mass m relative to point B. The angular momentum LB and torque T B relative to B are defined by
Lg=xxmv,
TBEXXF.
(8.2.35)
We have, therefore,
*
dv = ( i ( r - r I g ) ) x m v + x x m- = 7 8 - v g x mv, dt dt
(8.2.36)
where the final term vanishes because point B has so far been said to be at rest. We have not eliminated this correction term explicitly to prepare for the later possibility that point B is in fact moving. This formula is especially useful when a single particle is constrained by an element incapable of applying torque or when two or more particles are rigidly connected so that the torques due to their internal forces cancel in pairs; then, if there are external torques but no net external force, the point of application of the forces does not matter. But for the time being, all forces are to be applied directly to a single particle. I8It is because of the way that overhead bars are interpreted that we must not make the mistake of interpreting 5 as the angular velocity of frame K relative to frame since, by our conventions, that would yield the wrong sign.
266
GAUGE-INVARIANTMECHANICS
Next let us consider similar quantities reckoned in the rotating frame K.From the rules of vector analysis and from our algebraic conventions, certain relations have to be true:
-
LB
= il x mv,
FB
= I x 7,
d -LB dt
-
ddt
= -LB
-
+W x E B .
(8.2.37)
What is not clear is the relation between $LB and 7 ~One . complication is that even if the point B is at rest in the K frame, it will be moving in the frame and vice versa. Setting m = 1, we commence the evaluation: -
-
dd --Lg = - ( @ - F B ) dt dt
x
V) = (V- ve) x
V+X
x
(z-) -V
,
(8.2.38)
The tactic of this manipulation has been to work on removing bars. (The formula is now in some kind of hybrid state that would probably not pass mathematical muster because some quantities have bars and some do not, but the situation will be rectified in subsequent steps.) For a vector such as I,appearing at a place in the formula where it will not subsequently be differentiated, one has simply removed the bar. Continuing the evaluation, evaluating barred derivatives algebraically, and simply removing bars from quantities that will not be differentiated later, we find -
d-Lg dt
= (V - V B - w x
X)
x V+X x
d = -v,q x v - (w x x) x v + x x -v - x x dt d = x x - v - @ x (x x v) - v B x v. dt
(W
x v)
(8.2.39)
Taking advantage of the validity of putting bars back on quantities that will not be differentiated, this equation could have been written as
\left(\frac{\bar{d}}{dt} + \bar{\boldsymbol{\omega}}\times\right)\bar{\mathbf{L}}_B = \bar{\mathbf{x}}\times m\frac{d\bar{\mathbf{v}}}{dt} - \bar{\mathbf{v}}_B\times m\bar{\mathbf{v}} = \bar{\boldsymbol{\tau}}_B - \bar{\mathbf{v}}_B\times m\bar{\mathbf{v}}. \qquad (8.2.40)
In the final line, the bars have been put back on the vectors so that both sides of the equation are expressed in variables of the same frame. (If we had trusted the formalism, we could have written this directly from Eq. (8.2.36) and the last of Eqs. (8.2.37).)
8.2.6. The Plumb Bob

What could be simpler than a plumb bob, a point mass hanging at the end of a light string or rigid rod? It hangs straight down "in the laboratory." But what is "down"?
The fact that the earth rotates makes this system not quite so simple. But its apparent simplicity makes it a good system to exercise the methods under discussion. It will be approached several ways. Because the bob appears to be at rest there is no Coriolis force, even when working in the laboratory system.
(a) Inertial Frame Force Method: The earth-plumb bob system is illustrated in Fig. 8.2.2. The mass m hanging, apparently at rest, at the end of the light string of length λ is subject to gravitational force F_G directed toward the center of the earth and tension force F_T along the string. The resultant of these forces causes the mass to accelerate toward the axis of rotation of the earth. Of course this is just the acceleration needed to stay on a circular path of radius R_E sin θ and keep up with the earth's rotation at angular velocity ω. Its radial acceleration is a = −R_E sin θ ω², where θ is the co-latitude of the bearing point B on the earth from which the bob is suspended. The angle by which the plumb bob deviates from the line to the earth's center is α. Since the mass is at rest axially, we have

F_T\cos(\theta - \alpha) = F_G\cos\theta, \qquad (8.2.41)

where an extremely conservative approximation has already been made in the second argument, and the term α(λ/R_E), of order λ/R_E relative to α, will immediately be dropped in any case. Taking F_G = mg, we have therefore

F_T = mg\,\frac{\cos\theta}{\cos(\theta - \alpha)} \approx mg. \qquad (8.2.42)
FIGURE 8.2.2. Mass m, hanging at rest at the end of a light string of length λ, constitutes a plumb bob. Its length is much exaggerated relative to the earth's radius R_E. The support point is B.
Equating radial components yields

    F_G sin θ − F_T sin(θ − α) = m R_E ω² sin θ,        (8.2.43)

which simplifies to

    mg α cos θ ≈ m R_E ω² sin θ.        (8.2.44)

Finally this reduces to

    α = (R_E ω² / 2g) sin 2θ ≈ 1.6 × 10⁻³ radians,        (8.2.45)
where the Greenwich co-latitude of θ = 38.5 degrees has been used. It is customary to incorporate this tiny angle by redefining what constitutes "down" so that the plumb bob points in this direction and not toward the earth's center. (Actually, the bob would not point toward the center of the earth in any case, since the earth is not a perfect sphere. Its major deviation from being spherical is itself due to the centrifugal force which, acting on the somewhat fluid earth, has caused it to acquire an ellipsoidal shape.) Once this has been done, the centrifugal force can (to good approximation) be ignored completely. Up to this point it has been assumed that the gravitational acceleration vector is g₀ = −g r̂, that is, that it points toward the center of the earth, but from now on we will assume an "effective" gravitational acceleration vector g(θ) given by

    g(θ) = g₀ − ω × (ω × R_E r̂),        (8.2.46)
which will permit the centrifugal force to be otherwise neglected. This constitutes an approximation, but since the deviation is so small it is a good one. If the earth were spinning sufficiently fast, the plumb bob would end up pointing sideways, and treating the centrifugal force would be more difficult, though not as difficult as the other problems life on Earth would present.
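The size of the deflection predicted by Eq. (8.2.45) is easy to evaluate. The following short numerical check (an illustration added here, using standard round values for the earth's radius and rotation rate, not figures taken from the text) confirms the quoted order of magnitude:

```python
import math

R_E = 6.371e6               # earth radius, m
omega = 7.27e-5             # earth rotation rate, rad/s
g = 9.81                    # gravitational acceleration, m/s^2
theta = math.radians(38.5)  # Greenwich co-latitude

# Eq. (8.2.45): alpha = (R_E omega^2 / 2g) sin(2 theta)
alpha = R_E * omega**2 / (2 * g) * math.sin(2 * theta)
print(f"alpha = {alpha:.2e} rad")  # about 1.7e-3 rad, a few arc minutes
```

The deflection is a couple of milliradians, which is why redefining "down" disposes of the centrifugal force so painlessly.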
(b) Inertial Frame Torque/Angular-Momentum Method: To evaluate the torque acting on m about point O, we define local unit vectors r̂ radial, ê eastward, and ŝ southward, as shown in Fig. 8.2.3. Still working in the inertial frame, the angular momentum of the mass m relative to point O is

    L = m R_E r̂ × (ω × R_E r̂) = −m R_E² ω sin θ ŝ,        (8.2.47)
FIGURE 8.2.3. The same plumb bob as in the previous figure is shown along with local basis vectors r̂ radial, ê eastward, and ŝ southward.
and its time rate of change is given by

    dL/dt = −m R_E² ω² sin θ cos θ ê.        (8.2.48)

As shown in the figure, the only force applying torque about O is the string tension, which is given approximately by F_T = mg. Its torque is

    τ = R_E r̂ × F_T = −R_E mg sin α ê.        (8.2.49)
Equating the last two expressions, we obtain

    sin α = (R_E ω² / 2g) sin 2θ,        (8.2.50)
in approximate agreement with Eq. (8.2.45), since α is an exceedingly small angle.
(c) Fictitious Force Method: If we act as if the earth is at rest and continue to use the center of the earth as origin, the situation is as illustrated in Fig. 8.2.4. There is an outward-directed fictitious force F_cent with magnitude m R_E ω² sin θ which has to balance the gravitational force F_G in order for the bob to remain at rest. Since both of these forces are applied directly to m, we can equate their components normal to the bob, which amounts also to equating their torques about point B. The condition for this balance is

    mg α ≈ m R_E ω² sin θ cos θ,        (8.2.51)
FIGURE 8.2.4. When viewed "in the laboratory," the earth, the bearing point B, and the mass m all appear to be at rest, but there is a fictitious centrifugal force F_cent.
which agrees with the previous calculations. In this case it has been valid to work with torques about B, but this would be risky in general because B is not fixed in an inertial frame.
(d) Transformation of Angular Momentum; Origin at O: Alternatively, we can use transformation formulas to infer inertial frame quantities. Since m is at rest (to excellent approximation) relative to B, we can use R_E r̂ both as its approximate position vector and for calculating its velocity using Eq. (8.1.24). The inertial frame angular momentum of m about O is given by

    L = m R_E r̂ × ( d̄(R_E r̂)/dt + ω × R_E r̂ ) = −m R_E² ω sin θ ŝ.        (8.2.52)
Note that the d̄/dt term has vanished because the mass appears to be at rest. The time rate of change of angular momentum is given by

    dL/dt = d̄L/dt + ω × L = −m R_E² ω² cos θ sin θ ê,        (8.2.53)
where the d̄/dt term has again vanished because the angular momentum appears to be constant. The torque is again given by Eq. (8.2.49), and the result (8.2.45) is again obtained.
(e) Gauge-Invariant Method: Referring to Fig. 8.2.4 and substituting into Eq. (8.2.40), we have

    ( d̄/dt + ω̄ × ) ( x̄ × m v̄ ) = m ω̄ × (x̄ × v̄) = x̄ × F̄_G.        (8.2.54)
(In evaluating expressions such as this, a frequent mistake is setting v̄ to zero; this is a mistake because, though the bob appears to be at rest, it is d̄x̄/dt that vanishes, not v̄.) The term v̄_B × v̄ in Eq. (8.2.40) vanishes because, though the support point B is moving in the inertial system, the velocities of the bob and the support point are parallel. The needed vectors are
    x̄ = −λ r̂,    ω̄ = ω (cos θ r̂ − sin θ ŝ),    τ̄_B = x̄ × F̄_G ≈ λ m g α ê,        (8.2.55)
and Eq. (8.2.40) becomes

    m λ R_E ω² sin θ cos θ ê = λ m g α ê,        (8.2.56)
which agrees with the previous determinations. Incidentally, no great significance should be placed on which of the above plumb bob equations have been indicated as equalities and which as approximations. Most of the equations are approximate in one way or another. The first method, though the most elementary, has the disadvantage compared to all the other methods that, because of the need to equate components parallel to the plumb line, more careful approximation is required.
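The vector algebra of method (e) can be spot-checked numerically. The sketch below (an added illustration, not part of the text) places the earth's axis along ẑ and local east along ŷ, takes the bob's velocity to be that of a point riding the earth at radius R_E, and compares the surviving term m ω̄ × (x̄ × v̄) of Eq. (8.2.40) against the gravitational torque λmgα ê, with α from Eq. (8.2.45):

```python
import numpy as np

R_E, g, lam = 6.371e6, 9.81, 10.0   # metres; the string length lam is arbitrary here
w = 7.27e-5                         # earth rotation rate, rad/s
theta = np.radians(38.5)            # co-latitude

r_hat = np.array([np.sin(theta), 0.0, np.cos(theta)])  # local radial ("up") direction
e_hat = np.array([0.0, 1.0, 0.0])                      # local east
omega = w * np.array([0.0, 0.0, 1.0])                  # rotation vector along earth axis

m = 1.0
x = -lam * r_hat                    # bob position relative to the support point B
v = np.cross(omega, R_E * r_hat)    # bob velocity: it rides with the earth

lhs = m * np.cross(omega, np.cross(x, v))               # surviving term of Eq. (8.2.40)
alpha = R_E * w**2 * np.sin(theta) * np.cos(theta) / g  # deflection, Eq. (8.2.45)
rhs = lam * m * g * alpha * e_hat                       # gravitational torque about B

assert np.allclose(lhs, rhs)        # both sides point along local east
```

Both sides come out along local east, with equal magnitudes, as the text asserts.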
8.3. LIE ALGEBRAIC DESCRIPTION OF RIGID BODY MOTION

Much of the material in this section repeats material in earlier chapters, in which the Lagrange-Poincaré approach is taken. This is partly to permit this chapter to be accessible whether or not that material has been mastered, and partly to compare and contrast the two approaches. The main example will be the description of angular motion of a rigid body by the "Euler equations" describing rigid body motion. These are equations governing the time evolution of the body's angular velocity components, as reckoned in the body frame (ω̄₁, ω̄₂, ω̄₃). The equations will be derived using Lie algebraic methods. One purpose of this discussion is to practice with the commutator manipulations that are basic to application of Lie algebraic methods. It will turn out that the method to be developed is applicable to any system with symmetry describable as invariance under a Lie group of transformations. Every physicist has intuitively assimilated Newton's equations as giving a valid description of mechanics (at least in the nonrelativistic, nonquantal domain). Every physicist who has advanced beyond freshman-level mechanics has acquired a similar
(perhaps less deeply held) confidence in the Lagrange equations as containing the same physics and leading economically to correct equations of motion. And many physicists have followed a derivation of the Euler equations and their application to describe rigid body motion. Probably far fewer physicists can answer the question "Why is it that the Euler equations can be derived from Newton's law, but not from the Lagrange equations?" For our theory to be regarded as satisfactorily powerful, it should be possible to derive the Euler equations by straightforward manipulations. Well, the Poincaré equation derived in Section 5.2 provided exactly that capability, making it possible to derive the Euler equations just by "turning the crank." The situation was still somewhat unsatisfactory, in the way that Lagrangian mechanics often seems, because the formulas tend not to have visualizable content. This may make it hard to make sensible approximations, or to see how symmetries or simple features of the physical system can be exploited to simplify the equations, or how the method can be applied to other problems. This justifies working on the same problem with different methods, for example Newtonian methods. It will turn out that commutation relations again play an important role, but now the noncommuting elements will be 2 × 2 or 3 × 3 matrices rather than the vector fields prominent in Lagrange-Poincaré mechanics. In addition to the material just covered concerning rotating reference systems and Coriolis and centrifugal forces, the description of rotational motion of rigid bodies and the mathematics of infinitesimal rotations deserve review at this point. It is entirely intentional that Hamilton's equations and "canonical methods" have not been, and will not be, used in this discussion. Hamiltonian formulation is of no particular value for clarifying the issues under discussion, though of course the present discussion will have to be reviewed in that context later on.
8.3.1. Space and Body Frames of Reference

We wish to describe rigid body motion in much the way single-particle motion was described in the preceding sections. At this point, familiarity with the inertia tensor and its use in expressing the kinetic energy of a rotating body is assumed. Position and orientation of a rigid body can be specified in an inertial "space frame" K, or a "body frame" whose origin is fixed at the centroid of the body and whose axes are fixed in the body. Another inertial frame might be introduced with origin at the centroid and axes aligned with those of K, but for simplicity from now on we ignore centroid motion and assume the centroid remains at rest. The inertial-frame rotational kinetic energy T_rot can be written in terms of K-space variables as

    T_rot = ½ ω(t) · I(t) · ω(t),        (8.3.1)

where the angular velocity vector ω(t) is time-dependent (if the body is tumbling) and the inertia tensor I(t) is time-dependent because the mass distribution varies relative to an inertial frame (if the body is tumbling). In Eq. (8.3.1), the matrix multiplications are indicated by the · symbol. This is purely artificial, since the spelled-out form

    T_rot = ½ Σ_{i,j} ω_i I_ij ω_j        (8.3.2)

is the same as is implied by ordinary matrix multiplication. The dot or "dyadic" notation is intended to discourage the interpretation of I_ij as having any geometric significance whatsoever; it is preferably regarded as the array of coefficients of a quadratic form, since there are quite enough transformation matrices without introducing another one.¹⁹ The kinetic energy T_rot can be written alternatively in terms of body variables.²⁰ In the body frame, however, Ī is time-independent (because the body is rigid, its particles are not moving relative to one another in that frame):

    T_rot = ½ ω̄(t) · Ī · ω̄(t).        (8.3.3)

The inertia tensor also relates ω to angular momentum l:

    l = I · ω.        (8.3.4)

Here, as in Eqs. (8.3.1) and (8.3.3), we use dyadic notation in which expressions like these treat I as a matrix that multiplies a vector by the normal rules of matrix multiplication. Eq. (8.3.4) is simplest in the body frame, where Ī is time-independent. Since I is symmetric, it can be diagonalized by the appropriate choice of axes, in which case
    Ī = diag(Ī₁, Ī₂, Ī₃).        (8.3.5)

Like other vectors, the angular momentum components in different coordinate frames are related by a rotation matrix O, as in Eq. (8.1.22). The matrix O, being orthogonal, can itself be expressed in terms of basis matrices defined in Eq. (4.2.26):

    J₁ = ( 0  0  0        J₂ = ( 0  0  1        J₃ = ( 0 −1  0
           0  0 −1               0  0  0               1  0  0
           0  1  0 ),           −1  0  0 ),            0  0  0 ).        (8.3.6)
These satisfy the commutation relations

    [J_i, J_j] = Σ_k ε_ijk J_k.        (8.3.7)

¹⁹ Of course, there is a useful "moment of inertia ellipsoid," which is a kind of geometric significance that I_ij has, but this has little to do with the geometry of transformation between coordinate frames.
²⁰ It is still true that the vectors ω and ω̄ signify the same arrow and are best regarded as simple algebraic abbreviations for the arrays of elements ω₁, ω₂, … and ω̄₁, ω̄₂, ….
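The commutation relations (8.3.7) can be checked directly from the matrices of Eq. (8.3.6); a minimal numerical verification (added here as an illustration):

```python
import numpy as np

J1 = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]])
J2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]])
J3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]])

def comm(A, B):
    """Matrix commutator [A, B]."""
    return A @ B - B @ A

# the cyclic relations [J1, J2] = J3, and permutations
assert np.array_equal(comm(J1, J2), J3)
assert np.array_equal(comm(J2, J3), J1)
assert np.array_equal(comm(J3, J1), J2)
```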
274
GAUGE-INVARIANT MECHANICS
For rotation through angle φ about axis n̂, defining φ = φ n̂, the formula for O is²¹

    O = e^{φ·J}.        (8.3.8)
Example 8.3.1: Let us check this for J₃ (which satisfies the equation J̃₃² = −1 after the third row and the third column have been suppressed):

    e^{φ J̃₃} = 1 + φ J̃₃/1! + (φ J̃₃)²/2! + (φ J̃₃)³/3! + ⋯ = ( cos φ  −sin φ
                                                                 sin φ   cos φ ).

After restoring the suppressed rows and columns, when acting on a radius vector, this clearly produces rotation by angle φ about the z-axis.
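The same exponentiation can be confirmed for the full 3 × 3 matrix. The sketch below (an added illustration) sums the exponential series by hand, rather than relying on a library routine, and compares the result with the familiar rotation matrix about the z-axis:

```python
import numpy as np

J3 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
phi = 0.7

# sum the series e^{phi J3} = sum_k (phi J3)^k / k!
O = np.zeros((3, 3))
term = np.eye(3)
for k in range(1, 30):
    O = O + term
    term = term @ (phi * J3) / k

R = np.array([[np.cos(phi), -np.sin(phi), 0.0],
              [np.sin(phi),  np.cos(phi), 0.0],
              [0.0,          0.0,         1.0]])
assert np.allclose(O, R)            # Eq. (8.3.8) for a rotation about the 3-axis
```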
Problem 8.3.1: Derive Eq. (8.3.8) by "exponentiating" Eq. (4.2.28).

As explained above, since the angular velocity is a true vector (actually a pseudovector), the same rotation matrix relates the angular velocity vectors ω and ω̄:

    ω = O(t) ω̄,    ω_j = O_jk(t) ω̄_k;        (8.3.9)
similarly, the angular momentum vectors l and l̄ are related by

    l = O(t) l̄,    l_j = O_jk(t) l̄_k.        (8.3.10)
In terms of O, known to be an orthogonal matrix, the inertia tensors of Eqs. (8.3.1) and (8.3.3) are related by

    I = O · Ī · Oᵀ.        (8.3.11)
To confirm this, substitute it into Eq. (8.3.3) and use Eq. (8.3.9):

    T_rot = ½ (Oᵀω) · Ī · (Oᵀω) = ½ ω · (O Ī Oᵀ) · ω,        (8.3.12)

which agrees with Eq. (8.3.1). This manipulation has used the fact, mentioned previously, that the dot operation is performed by standard matrix multiplication. From here on, the dot will be suppressed.
8.3.2. Review of the "Association" of 2 × 2 Matrices to Vectors

This section consists mainly of a series of problems that review material developed in Chapter 4, but they should be intelligible even to one who has not studied that material. Furthermore, it is only a digression, as the association between vectors and 2 × 2 matrices that is derived here will not actually be used for the analysis of rigid body motion. The purpose is to refresh your memory of the essential ideas. In the

²¹ Because rotations do not commute, it is not legitimate to factorize this as a product of three exponentials, e^{φ₁J₁} e^{φ₂J₂} e^{φ₃J₃}, though, of course, angles can be found to make such a factorization correct.
section following this one, a corresponding association to 3 × 3 matrices will be developed, and that will be the basis for analyzing rigid body motion. A concept to be used again is that of "similarity transformation." Consider two arrows a and b, and suppose that the pure rotation of a into b is symbolized by b = T a. Imagine further an azimuthal rotation by some angle (such as one radian) around a; let it be symbolized by R_a. The result of this rotation about a of a vector x is a vector x′ = R_a x. Are we in a position to derive the operator R_b that rotates x by the same angle azimuthally around the vector b? The answer is yes, because we can first rotate b into a using T⁻¹, then rotate around a using R_a, and then rotate back using T. The result is

    R_b = T R_a T⁻¹.        (8.3.13)
This is known as a similarity transformation. The rationale for the terminology is that the transformations R_a and R_b are "similar" transformations around different axes; the word "similar" is used here as it is in the "high school" or "synthetic" Euclidean geometry of rulers and compasses. The same argument would be valid if R_a and R_b designated reflections in planes orthogonal to a and b, respectively. Consider the following associations (introduced in Section 4.5.3), where uppercase letters stand for matrices and lowercase letters stand for vectors:

    x → X ≡ x · σ = ( x₃          x₁ − i x₂
                      x₁ + i x₂   −x₃       ),        (8.3.14)

where

    σ₁ = ( 0  1      σ₂ = ( 0  −i      σ₃ = ( 1   0
           1  0 ),          i   0 ),          0  −1 ),        (8.3.15)

and, for x → X, y → Y,

    2i (x × y) → [X, Y] ≡ X Y − Y X.        (8.3.16)
Though these associations were derived previously, they can be rederived by solving the following series of problems, thereby obviating the need to review that material. In transforming between frames, as for example from inertial frame to body frame, vectors transform by matrix multiplication as in Eqs. (8.1.22), (8.2.2), (8.3.9), and (8.3.10). Because the matrices associated with these vectors themselves represent transformations, their transformations between frames are similarity transformations. This will now be spelled out explicitly. The matrices σ₁, σ₂, σ₃ are the Pauli spin matrices; they satisfy the algebraic relations

    σ_i σ_j = δ_ij 1 + i Σ_k ε_ijk σ_k.        (8.3.17)
Problem 8.3.2: Show that

    (a · σ)(b · σ) = (a · b) 1 + i (a × b) · σ.        (8.3.18)
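This identity, from which the other associations follow, holds for arbitrary vectors, and can be checked numerically (the random vectors below are purely illustrative):

```python
import numpy as np

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(3)

A = np.einsum('i,ijk->jk', a, sigma)   # a . sigma
B = np.einsum('i,ijk->jk', b, sigma)   # b . sigma

lhs = A @ B
rhs = np.dot(a, b) * np.eye(2) + 1j * np.einsum('i,ijk->jk', np.cross(a, b), sigma)
assert np.allclose(lhs, rhs)           # Eq. (8.3.18)
```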
Problem 8.3.3: Show that

    (8.3.19)

Shortly, this matrix will be symbolized by S⁻¹; it appeared previously in Eq. (4.5.24).

Problem 8.3.4: According to Eq. (8.3.14), the matrices σ₁, σ₂, and σ₃ are "associated with" the unit vectors ê₁, ê₂, and ê₃, respectively. Derive the following similarity transformations:

    e^{−i(γ/2)σ₃} σ₁ e^{i(γ/2)σ₃} = cos γ σ₁ + sin γ σ₂,
    e^{−i(γ/2)σ₃} σ₂ e^{i(γ/2)σ₃} = −sin γ σ₁ + cos γ σ₂,
    e^{−i(γ/2)σ₃} σ₃ e^{i(γ/2)σ₃} = σ₃.        (8.3.20)

A coordinate frame related to the original frame by a rotation by angle γ around the x₃ axis has unit vectors given by cos γ ê₁ + sin γ ê₂, −sin γ ê₁ + cos γ ê₂, and ê₃. The right-hand sides of Eq. (8.3.20) are the matrices "associated" with these unit vectors. This demonstrates, in a special case, that when vectors transform by an ordinary rotation, their associated matrices transform by a similarity transformation based on the corresponding matrix.

Problem 8.3.5: Making the association x → X ≡ x · σ, show that

    det X = −x · x.

Problem 8.3.6: Show that the inverse association X → x can be written in component form as

    x_i = ½ tr(X σ_i).

Problem 8.3.7: Compute X′ = e^{−i(θ/2) n̂·σ} X e^{i(θ/2) n̂·σ} and show that

    x′ = (n̂ · x) n̂ + cos θ (n̂ × x) × n̂ + sin θ (n̂ × x).

Note that this is the same as Eq. (4.5.27).

Problem 8.3.8: Show that

    (x · y) 1 = (X Y + Y X)/2,    and    x × y → −(i/2) [X, Y].
8.3.3. "Association" of 3 × 3 Matrices to Vectors

We now set up a similar association between vectors x and 3 × 3 matrices X. The use of the same uppercase symbol for both 2 × 2 and 3 × 3 matrices should not be too confusing, as only 3 × 3 matrices will occur for the remainder of this chapter.
277
LIE ALGEBRAIC DESCRIPTION OF RIGID BODY MOTION
Using the triplet of matrices J defined in Eq. (8.3.6), the association is

    x → X ≡ x · J.        (8.3.21)
Observe that the matrix infinitesimal rotation operator Ω̄ and the angular velocity vector ω̄ introduced previously are associated in this sense, and their symbols were chosen appropriately to indicate this.
Problem 8.3.9: Show that with this association

    x × y → [X, Y];

i.e., vector cross products map to matrix commutators.
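This mapping of cross products to commutators is the working heart of the section, and is easily verified numerically (the helper assoc below is introduced here only for the check):

```python
import numpy as np

J1 = np.array([[0., 0, 0], [0, 0, -1], [0, 1, 0]])
J2 = np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]])
J3 = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 0]])

def assoc(v):
    """The association x -> X = x . J of Eq. (8.3.21)."""
    return v[0] * J1 + v[1] * J2 + v[2] * J3

rng = np.random.default_rng(1)
x, y = rng.standard_normal(3), rng.standard_normal(3)

X, Y = assoc(x), assoc(y)
assert np.allclose(assoc(np.cross(x, y)), X @ Y - Y @ X)
```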
Problem 8.3.10: By analogy with Eq. (8.3.20), one anticipates the following equations:

    e^{φJ₃} J₁ e^{−φJ₃} = cos φ J₁ + sin φ J₂,
    e^{φJ₃} J₂ e^{−φJ₃} = −sin φ J₁ + cos φ J₂,
    e^{φJ₃} J₃ e^{−φJ₃} = J₃.        (8.3.22)

Prove this result.
Problem 8.3.11: Compute X′ = e^{φ·J} X e^{−φ·J} and show that

    x′ = O x,        (8.3.23)

where O is given by Eq. (8.3.8). A coordinate frame related to the original frame by a rotation by angle φ around the x₃ axis has unit vectors given by cos φ ê₁ + sin φ ê₂, −sin φ ê₁ + cos φ ê₂, and ê₃. The right-hand sides of Eq. (8.3.22) are the matrices "associated" with these unit vectors. This demonstrates, in a special case, that when vectors transform by an ordinary rotation, their associated matrices transform by a similarity transformation based on the corresponding matrix.
8.3.4. Some Interpretive Comments Concerning Similarity Transformations

What is the fallacy in the following line of "reasoning"? If x → X = x · J, and x̄ → X̄ = x̄ · J, and x = O x̄, then X = O X̄. The correct relation is X = O X̄ O⁻¹, as we now see. Referring to Fig. 8.3.1, consider a vector y(t) evolving in time according to

    y(t) = A(t) y(0).        (8.3.24)

Let us regard this as an active transformation, the vectors y(0) and y(t) being geometrically different arrows; the symbol A is mnemonic for "active." Consider also
FIGURE 8.3.1. Figure illustrating description of system evolution in two coordinate frames. Most of the discussion does not require the transformations to be rotations.
another, "barred" coordinate system, related to the previous frame so that

    ȳ = P y,        (8.3.25)

or in components,

    ȳ_i = P_ij y_j.        (8.3.26)

Our convention is to regard this as a passive transformation, with y and ȳ standing for the same geometric arrow; the symbol P is mnemonic for "passive." Both A and P stand for orthogonal transformations. When described in the "barred" coordinate system, the same time evolution as described in Eq. (8.3.24) is described by

    ȳ(t) = Ā(t) ȳ(0),        (8.3.27)

which serves to define Ā(t). Combining these equations yields the result

    Ā(t) = P A(t) P⁻¹.        (8.3.28)
One says that operator A has been subjected to a "similarity transformation." Let us spell out the active transformation in more detail, assuming it is the rotation "associated with" some time-varying vector x(t), according to Eqs. (8.3.8) and (8.3.21):

    A(t) = e^{X(t)} = e^{x(t)·J},    so that    y(t) = e^{x(t)·J} y(0).        (8.3.29)

The geometry of this equation is such that the configuration at time t is obtained from the configuration at time 0 by rotating by angle |x(t)| around axis x̂(t). (x(t) is not, for example, directed along the instantaneous axis of rotation.) Substitution into Eq. (8.3.28) yields

    Ā(t) = P e^{X(t)} P⁻¹ = e^{P X(t) P⁻¹}.        (8.3.30)
This formula relates finite rotations. The corresponding result for infinitesimal rotations,

    X̄(t) = P X(t) P⁻¹,        (8.3.31)

is obtained by making the replacement X → sX and allowing s to be very small. Various questions come to mind:

• The operators A and P were stated to be active and passive, respectively. This was intended to permit a verbal description of one easily visualized situation to which these formulas apply. Were the active/passive assignments essential? The answer has to be no. The reader should be able to visualize other pictures where the two transformations have reversed interpretation, or are both interpreted as passive, or both as active.

• Given its definition in terms of x, namely X = x · J, can we assume X̄ = x̄ · J? (Here the absence of an overhead bar on J is appropriate because, by convention, the matrix elements of (J₁, J₂, J₃) are the same in all frames.) For the answer to be yes, Eq. (8.3.27),

    ȳ(t) = e^{x̄(t)·J} ȳ(0),        (8.3.32)
would have to be true. Since this is the same result as is obtained by applying Eq. (8.3.29) in the barred frame, the result X̄ = x̄ · J is confirmed.

• In traditional tensor analysis, intrinsic geometric equalities are represented by demanding consistent indices on frame-dependent components; this is manifest covariance. In traditional vector analysis, intrinsic geometric equalities are represented by boldface equations such as c = a × b. Our policy of maintaining both of these conventions is greatly strained at this point; it has forced us, for example, to swallow the proposition that the quantities y and ȳ symbolize the same arrow. This is not entirely academic since, when one gets down to actual calculation using components, it is essential to be clear what frame is to be used. The use of group theoretic methods, in which vectors are associated with other objects, compounds the notational difficulties just discussed. As long as only one coordinate frame is in use, there is no problem. Introduction of other frames brings in complication. When symmetry considerations suggest system invariance under certain passive coordinate transformations, and this is exploited by coordinate transformation, the result is further abstraction.
8.3.5. Rigid Body Equations in Rotating Frame

We now apply the associations defined in the previous section to the generalized Newton's equation derived before that. For reasons that will only become clear gradually, rather than studying the evolution of the displacement vector x, we will study the evolution of its associated "displacement matrix" X. The fixed-frame and rotating-frame "displacement matrices" are X = x · J and X̄ = x̄ · J, respectively; they are related by

    X = O X̄ Oᵀ.        (8.3.33)
As we have seen, X and X̄ have geometric interpretations as transformation matrices for infinitesimal rotation around the vectors x and x̄. This conforms with the remark made previously that the operators X and X̄ are related by "similarity transformation." By analogy with our earlier treatment, the "time derivative operator" D_t should be defined so that "velocity" matrices are related by

    V = O V̄ Oᵀ = O (D_t X̄) Oᵀ,        (8.3.34)

where the parentheses indicate that D_t does not operate on the final factor Oᵀ. Differentiating Eq. (8.3.33) with respect to t and using (d/dt)(Oᵀ O) = 0 yields

    V = dX/dt = d(O X̄ Oᵀ)/dt = (dO/dt) X̄ Oᵀ + O (dX̄/dt) Oᵀ + O X̄ (dOᵀ/dt)
      = O ( dX̄/dt + [Ω̄, X̄] ) Oᵀ.        (8.3.35)

This conforms with Eq. (8.3.34) if

    V̄ = D_t X̄ = dX̄/dt + [Ω̄, X̄],    or    D_t = d/dt + [Ω̄, ·].        (8.3.36)
The · in [Ω̄, ·] is to be replaced by the quantity being operated upon.²² Yet another dispensation is required in that the symbol D_t has acquired a new meaning that can be inferred only from the context.²³ We can also define a matrix L^(i) associated with the angular momentum l^(i) = x^(i) × m^(i) v^(i) (cf. Eq. (8.3.4)). Here, in anticipation of analyzing multiparticle systems, the notation has been generalized by introducing the superscript (i), which is a particle index. The space and moving frame angular momentum "matrices" for the ith particle are given by

    L^(i) = m^(i) [X^(i), V^(i)],    L̄^(i) = Oᵀ L^(i) O = m^(i) [X̄^(i), V̄^(i)].        (8.3.37)
Newton's torque equation (8.2.36), expressed in an arbitrary frame as in Eq. (8.2.40), relative to a point on the axis of rotation, becomes

    D_t L̄^(i) = dL̄^(i)/dt + [Ω̄, L̄^(i)] = T̄^(i).        (8.3.38)

Expressed here in "associated" matrices, this is the gauge-invariant equation of rotation of a rigid body that consists of a single point mass m^(i) subject to applied torque. Since the centroid of a one-particle system is coincident with the mass itself, this equation so far gives a useful and complete description only for a spherical pendulum, with the mass attached to the origin by a light rod (or for an unconstrained cheerleader's baton, which amounts to the same thing). One wishes to employ Eq. (8.3.38) to obtain dL̄^(i)/dt and eventually the evolution of the angular momentum. This is simplest to do in the body frame, where dX̄^(i)/dt vanishes. Working in that frame, where it is also true that V̄^(i) = [Ω̄, X̄^(i)], and using Eq. (8.3.37), one obtains a formula for the "angular momentum" in terms of the "position" (and mass) of the particle:

    L̄^(i) = −m^(i) [X̄^(i), [X̄^(i), Ω̄]].        (8.3.39)

²² With Ω̄ regarded as a member of a Lie algebra, the operator [Ω̄, ·] is known as its adjoint operator ad_Ω̄ = [Ω̄, ·]. We refrain from using this notation because formal results from the theory of Lie algebras will not be used.
²³ A computer scientist would say D_t is "overloaded"; a mathematician might call it "abuse of notation" (but use it nevertheless).
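That Eq. (8.3.39) is the ordinary angular momentum in disguise can be spot-checked: for a point mass with velocity v = ω × x, the matrix −m[X̄, [X̄, Ω̄]] should be exactly the matrix associated with x × mv. A minimal numerical check (the values below are illustrative):

```python
import numpy as np

J = [np.array([[0., 0, 0], [0, 0, -1], [0, 1, 0]]),
     np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]]),
     np.array([[0., -1, 0], [1, 0, 0], [0, 0, 0]])]

def assoc(v):
    """x -> X = x . J, Eq. (8.3.21)."""
    return v[0] * J[0] + v[1] * J[1] + v[2] * J[2]

def comm(A, B):
    return A @ B - B @ A

rng = np.random.default_rng(2)
m = 2.5
x, w = rng.standard_normal(3), rng.standard_normal(3)  # position, angular velocity

L = -m * comm(assoc(x), comm(assoc(x), assoc(w)))      # Eq. (8.3.39)
l = m * np.cross(x, np.cross(w, x))                    # ordinary l = x x (m v), v = w x x
assert np.allclose(L, assoc(l))
```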
As required, this has dimensions [ML²/T]. Substituting it into Eq. (8.3.38), Newton's torque equation becomes

    −m^(i) ( [X̄^(i), [X̄^(i), dΩ̄/dt]] + [Ω̄, [X̄^(i), [X̄^(i), Ω̄]]] ) = T̄^(i).        (8.3.40)

This equation will be exploited in the next section. There has been a large investment in establishing formalism. Eq. (8.3.40) represents the first return on this "overhead." With a remarkable thrice-nested matrix commutation, the torque is augmented by a fictitious torque that accounts for the rotation of the moving frame relative to the inertial frame.

8.3.6. The Euler Equations for a Rigid Body
Consider a rigid body made up of masses m^(i). Any one of these masses m^(i) can be considered as contributing L̄^(i) (given by Eq. (8.3.39)) to the total angular momentum of the body, and hence an amount Ī^(i) to the moment of inertia tensor:

    Ī^(i){Ω̄} ≡ −m^(i) [X̄^(i), [X̄^(i), Ω̄]] = L̄^(i).        (8.3.41)

Here the (per particle) "moment of inertia" tensor has been generalized to be a function that generates the "angular momentum" linearly from the "angular velocity" Ω̄. Then the total moment of inertia tensor and the total angular momentum are

    Ī{Ω̄} = Σ_i Ī^(i){Ω̄},    L̄ = Σ_i L̄^(i) = Ī{Ω̄}.        (8.3.42)
Since the moment of inertia is a symmetric tensor, we know it can be diagonalized, yielding three orthogonal principal axes and corresponding principal moments of inertia, call them Ī_i. In the body frame, these are independent of time. The same algebra ensures the existence of principal axes determined by our new, generalized, moment-of-inertia tensor (see Problem 1.2.15). The argument of Ī, namely Ω̄, can itself be expanded in terms of the "basis" matrices J_i defined in Eq. (8.3.6), and each of these, being in turn associated with an angular rotation vector aligned with its respective axis, is the transformation matrix for an infinitesimal rotation around that axis. Superposition is applicable because Ī is a linear operator. Supposing that these
axes were judiciously chosen to start with, to be these principal axes, we must have

    Ī{J_i} = Ī_i J_i    (no summation).        (8.3.43)

Clearly it is advantageous to express Ω̄ in terms of components along these axes:

    Ω̄ = Σ_{i=1}^{3} ω̄_i J_i.        (8.3.44)
Of course, the coefficients ω̄_i are the body-frame, principal-axis components of the instantaneous angular velocity of the rigid body. The total angular momentum is then given by

    L̄ = Σ_{i=1}^{3} Ī_i ω̄_i J_i.        (8.3.45)

Substitution into Newton's Eq. (8.3.38) (with vanishing torque for simplicity) yields

    Σ_j Ī_j (dω̄_j/dt) J_j + Σ_{i,j} ω̄_i Ī_j ω̄_j [J_i, J_j]
        = Σ_j Ī_j (dω̄_j/dt) J_j + Σ_{i,j,k} ω̄_i Ī_j ω̄_j ε_ijk J_k = 0,        (8.3.46)

since the J_j satisfy the commutation relations of Eq. (8.3.7). The equations of motion become

    Ī₁ dω̄₁/dt − (Ī₂ − Ī₃) ω̄₂ ω̄₃ = 0,        (8.3.47)

and cyclic permutations. Once again, these are the Euler equations. Once the machinery was in place, their derivation was remarkably brief.
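The crank can be turned numerically as well: starting from the matrix equation (8.3.46), one can extract dω̄_i/dt and compare with the Euler equations (8.3.47). The principal moments and angular velocity components below are illustrative values only:

```python
import numpy as np

J = [np.array([[0., 0, 0], [0, 0, -1], [0, 1, 0]]),
     np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]]),
     np.array([[0., -1, 0], [1, 0, 0], [0, 0, 0]])]

I = np.array([1.0, 2.0, 3.0])        # principal moments of inertia
w = np.array([0.3, -0.5, 0.8])       # body-frame angular velocity components

Om = sum(w[i] * J[i] for i in range(3))           # Omega-bar, Eq. (8.3.44)
L  = sum(I[i] * w[i] * J[i] for i in range(3))    # angular momentum matrix, Eq. (8.3.45)

# torque-free Eq. (8.3.38): sum_j I_j wdot_j J_j + [Omega, L] = 0
C = Om @ L - L @ Om
wdot = -np.array([C[2, 1], C[0, 2], C[1, 0]]) / I  # read off the J_i coefficients

# the Euler equations, Eq. (8.3.47) and cyclic permutations
euler = np.array([(I[1] - I[2]) * w[1] * w[2],
                  (I[2] - I[0]) * w[2] * w[0],
                  (I[0] - I[1]) * w[0] * w[1]]) / I
assert np.allclose(wdot, euler)
```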
BIBLIOGRAPHY

References
1. K. R. Symon, Mechanics, 3rd ed., Addison-Wesley, Reading, MA, 1971.
2. D. Kleppner and R. J. Kolenkow, An Introduction to Mechanics, McGraw-Hill, New York, 1973, p. 364.
3. D. H. Sattinger and O. L. Weaver, Lie Groups and Algebras with Applications to Physics, Geometry, and Mechanics, Springer-Verlag, New York, 1993.
References for Further Study

Section 8.1.3
D. Kleppner and R. J. Kolenkow, An Introduction to Mechanics, McGraw-Hill, New York, 1973, p. 355.
K. R. Symon, Mechanics, 3rd ed., Addison-Wesley, Reading, MA, 1971, p. 271.

Section 8.1.4
V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt, Mathematical Aspects of Classical and Celestial Mechanics, 2nd ed., Springer-Verlag, Berlin, 1997, p. 69.
G. Pascoli, Éléments de Mécanique Céleste, Masson, Paris, 1997, p. 150.

Section 8.3
D. H. Sattinger and O. L. Weaver, Lie Groups and Algebras with Applications to Physics, Geometry, and Mechanics, Springer-Verlag, New York, 1993.
GEOMETRIC PHASES
9.1. THE FOUCAULT PENDULUM
Seen by everyone who visits a science museum, the Foucault pendulum is one of the best experiments of all time. It rarely gets the credit it deserves. It is cheap to construct (though it behaves improperly if it is implemented too cheaply) and requires no more sophisticated data acquisition apparatus, even for quantitatively accurate measurements, than a patient "nurse" willing to look at it every few hours for a few days.¹ Yet these observations have profound implications. Trusting mechanics, the (not so) profound implications concern the earth's state of rotation. Trusting that the earth rotates, the (very) profound implications concern the most fundamental aspects of mechanics. If one starts the pendulum swinging at, say, noon, say parallel to a nearby wall, then leaves and comes back a few hours later, the pendulum is no longer swinging parallel to the wall. "It's the earth's rotation," you say. "I'll check it when the earth has made a complete revolution, so everything will be back where it started." Coming back at noon the next day, you find that everything is back except the pendulum.² The
¹ If the base has a readable scale permitting one to note the pendulum's advance over a couple of hours, one can check the performance of the setup to perhaps 10-percent accuracy. If one is prepared to spend all day at the science museum, one can do better yet.
² The most professional setup I have seen is at the Science Museum in London, England, where the co-latitude is 38.5° and the plane of oscillation rotates 11.8°/hour.
FIGURE 9.1.1. Illustration of parallel transport of an ideally mounted pendulum around a line of latitude. Note that the fixed frame is specific to the particular latitude along which the pendulum is carried. The support point is B.
wall is presumably back (the earth's orbit around the sun introduces only an angular advance of 2π/365) but the pendulum is not.³ Since the Foucault pendulum at rest is nothing other than the plumb bob that was analyzed ad nauseam in the previous chapter, we know that it hangs down along the direction of effective gravity and is not precisely aimed toward the earth's center. As was anticipated there, we take the resting orientation of the pendulum as defining the effective direction of the acceleration of gravity and continue to use the traditional symbol g for its magnitude. Once this is done, we can from then on, to a good approximation, ignore the centrifugal force. Furthermore, we will use ĝ = −r̂ even though that neglects the small deviation angle. Using the Foucault pendulum, we can illustrate the discussion of evolving orientation and at the same time introduce the curious concept of holonomy, or more interestingly anholonomy. You are instructed to perform the gedanken experiment illustrated in Fig. 9.1.1. Supporting a simple bob-pendulum of mass m, length λ, by an ideal bearing (swivel bearing, not compass bearing), you are to walk west to east with angular velocity ω_E, once around a nonrotating earth, radius R, along the θ line of latitude, say the one passing through New York. Here "ideal bearing" means that the bearing cannot apply any torque component parallel to the support wire. (In practice, if the support wire is sufficiently long and slender and the bob sufficiently massive, this condition can be adequately met even without the support point being an actual rotating bearing, but the analysis relating to this will not be described here.)

³ Even such an excellent text as Kleppner and Kolenkow [1] seems to get this wrong when it says, "The Foucault pendulum maintains its motion relative to the fixed stars." And the literature describing the London Science Museum apparatus mentioned above is similar. (Not to gloat, though; the present text undoubtedly has statements that are as wrong as these.)
GEOMETRIC PHASES
The practical realization of this experiment with a very long, very heavy bobbed pendulum is known as the Foucault experiment.

9.1.1. Fictitious Force Solution
At this point, we "solve" the Foucault pendulum problem using "fictitious force" arguments. Though extremely efficient, this solution does not explain how the motion is consistent with the conservation of angular momentum. In the following section, the motion will be studied in greater detail, which will also serve to illustrate "gauge-invariant" reasoning.⁴ With centrifugal force accounted for by the redefinition of the gravitational "up" direction, the forces acting on the pendulum bob are gravity −mg r̂ and the Coriolis force −2mω_E(cos θ r̂ − sin θ ŝ) × (ṡ ŝ + ė ê), where "south" has been assigned coordinate s and "east" e. In the usual approximation of small pendulum oscillations,⁵ the equations of motion are
$$\ddot s - 2\omega_E\cos\theta\,\dot e + \frac{g}{\lambda}\,s = 0, \qquad \ddot e + 2\omega_E\cos\theta\,\dot s + \frac{g}{\lambda}\,e = 0. \tag{9.1.1}$$

Since ω_F = ω_E cos θ (given numerically by Eq. (8.1.32) to be 0.727 × 10⁻⁴ s⁻¹ × cos θ) and ω₀ = √(g/λ) are both frequencies, it is meaningful to compare their magnitudes. Recalling several seconds as being a typical period of Foucault pendulum oscillation, it is clear that

$$\omega_F \ll \omega_0. \tag{9.1.2}$$
As a general rule, the presence of velocity terms in second-order linear equations such as this reflects the presence of damping or anti-damping (very weak according to the numerical estimate just given), and in that case solution by Laplace transform would be appropriate. However, the fact that the coefficients in the two equations are equal and opposite (and our expectation that the solution should exhibit no damping when none has been included in the model) makes a special method of solution appropriate. Introducing the “complex displacement”
$$\eta = s + i e, \tag{9.1.3}$$

the equations of motion become

$$\ddot\eta + 2i\omega_F\,\dot\eta + \omega_0^2\,\eta = 0. \tag{9.1.4}$$
⁴We continue the somewhat artificial distinction between the "fictitious force" and "gauge-invariant" formulations.
⁵Approximate solution of large-amplitude pendulum motion is treated in Chapter 16.
THE FOUCAULT PENDULUM
Substituting

$$\eta = e^{-i\omega_F t}\,\zeta, \tag{9.1.5}$$

we get

$$\ddot\zeta + \omega^2\zeta = 0, \tag{9.1.6}$$

where

$$\omega = \sqrt{\omega_0^2 + \omega_F^2}. \tag{9.1.7}$$

Assuming the pendulum is started from rest with southerly amplitude a (that is, ζ(t = 0) = a), the solution of this equation is

$$\zeta = a\cos\omega t + i\,\frac{\omega_F}{\omega}\,a\sin\omega t \approx a\cos\omega t. \tag{9.1.8}$$
With the "complex coordinate" defined as in Eq. (9.1.3), with south defined by the real axis and east defined by the imaginary axis, the transformation (9.1.5) amounts to viewing the pendulum from a frame of reference rotating with angular velocity ω_F. Be sure to notice, though, that the rotation period is not the same as the earth's rotation period (except at the poles). When the pendulum is viewed for only a few periods, the motion as given by Eq. (9.1.8) is just what one expects for a pendulum oscillation with frequency √(g/λ), because the time-dependent factor in transformation (9.1.5) is so slowly varying. Coming back some hours later and viewing a few periods of oscillation one sees the same thing, but now the plane of oscillation is altered, according to (9.1.5). All the observations have been accounted for.
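The analysis above is easy to check numerically. The sketch below (plain Python; the parameter values and function names are illustrative choices of mine, not from the text) integrates the complex equation of motion (9.1.4) with a classical Runge-Kutta step and compares the result against the closed form obtained by combining (9.1.5), (9.1.7), and (9.1.8):

```python
import math, cmath

def eta_exact(t, a, w0, wF):
    """Closed-form solution: Eq. (9.1.5) applied to Eq. (9.1.8)."""
    w = math.sqrt(w0**2 + wF**2)                                      # Eq. (9.1.7)
    zeta = a * math.cos(w * t) + 1j * (wF / w) * a * math.sin(w * t)  # Eq. (9.1.8)
    return cmath.exp(-1j * wF * t) * zeta                             # Eq. (9.1.5)

def eta_rk4(T, a, w0, wF, n=20000):
    """Integrate eta'' + 2i*wF*eta' + w0^2*eta = 0 (Eq. 9.1.4), starting from rest at eta = a."""
    def f(y):  # y = (eta, eta'); returns (eta', eta'')
        return (y[1], -2j * wF * y[1] - w0**2 * y[0])
    y, h = (complex(a), 0j), T / n
    for _ in range(n):
        k1 = f(y)
        k2 = f((y[0] + 0.5*h*k1[0], y[1] + 0.5*h*k1[1]))
        k3 = f((y[0] + 0.5*h*k2[0], y[1] + 0.5*h*k2[1]))
        k4 = f((y[0] + h*k3[0], y[1] + h*k3[1]))
        y = (y[0] + h*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6,
             y[1] + h*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6)
    return y[0]

# Illustrative values with w0 >> wF, as required by Eq. (9.1.2).
a, w0, wF, T = 1.0, 2.0 * math.pi, 0.1, 10.0
assert abs(eta_rk4(T, a, w0, wF) - eta_exact(T, a, w0, wF)) < 1e-6
```

Plotting the real and imaginary parts of η over many hours would show the slowly rotating oscillation plane described above.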
9.1.2. Gauge-Invariant Solution

We now consider the same system in somewhat greater detail, using the gauge-invariant formulas. Assuming the trip starts in the x, z plane, the "trip equation" of the support point is φ = ω_E t, and the location of the point of support (x(t), y(t), z(t)) can be related to its initial position by

$$\begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} \cos\omega_E t & -\sin\omega_E t & 0 \\ \sin\omega_E t & \cos\omega_E t & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R\sin\theta \\ 0 \\ R\cos\theta \end{pmatrix}, \quad\text{or}\quad \mathbf r(t) = O(t)\,\mathbf r(0). \tag{9.1.9}$$
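The trip equation (9.1.9) can be exercised directly. The following sketch (plain Python; R, the colatitude, and the sample times are arbitrary choices of mine) applies the rotation matrix and confirms that the support point keeps its radius and its colatitude throughout the trip:

```python
import math

def O(t, omega_E):
    """Rotation matrix of Eq. (9.1.9): rotation by omega_E * t about the earth's axis z."""
    c, s = math.cos(omega_E * t), math.sin(omega_E * t)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# Illustrative values: unit radius, colatitude 49 degrees, one rotation per 24 h.
R, theta, omega_E = 1.0, math.radians(49.0), 2.0 * math.pi / 86400.0
r0 = [R * math.sin(theta), 0.0, R * math.cos(theta)]   # trip starts in the x, z plane

for t in (0.0, 3600.0, 43200.0):
    r = matvec(O(t, omega_E), r0)
    radius = math.sqrt(sum(x * x for x in r))
    colat = math.acos(r[2] / radius)
    # Orthogonal O(t) about z preserves both the radius and the colatitude.
    assert abs(radius - R) < 1e-12 and abs(colat - theta) < 1e-12
```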
Because of attraction by the earth's mass, which is treated as if it were concentrated at the origin (on the earth's axis, but slightly south of its center to account for centrifugal force if we are in the Northern hemisphere), the pendulum bob always points more or less toward the origin. It is the orientation of its instantaneous swing plane that will be of primary interest. The gravitational force between the earth and the bob applies torque to the pendulum, which is what makes the pendulum
oscillate, but the torque about the bearing point due to the gravitational attraction to the earth has no component parallel to r̂. We therefore expect the radial component of angular momentum to be either exactly or approximately conserved, where the "approximate" reservation has to be included until we are sure the Coriolis force is correctly incorporated. Motion of the pendulum could be analyzed in the inertial space frame K, with unit vectors (î₁, î₂, î₃) as shown (note that î₁ would be "south" except that it is not attached to the earth, and î₃ would be "east"). We will instead analyze the pendulum in a moving frame K̄, (ŝ, ê, r̂), with origin at the support point. The moving-frame axes satisfy ê = r̂ × ŝ and ŝ = ê × r̂. The angular velocity of frame K̄ is

$$\boldsymbol\omega_E = \omega_E\,\hat{\mathbf z} = \omega_E(\cos\theta\,\hat{\mathbf r} - \sin\theta\,\hat{\mathbf s}). \tag{9.1.10}$$

(Since this vector is constant, its K-frame components are constant, and its K̄-frame components are constant also, even though the body axes are varying. Also, we know by now that we can get away with being careless by identifying ω_E and ω̄_E as intrinsic arrows, even though their components in the two frames differ.) Notice that the moving frame in this case is not the body frame of the pendulum (unless the pendulum happens to be in its neutral position), so the use of body-fixed axes is contraindicated because the gravitational force applies a torque that complicates the motion in that frame. The bob location relative to the support position B is
$$\mathbf x = -\lambda\,\hat{\mathbf r} + \mathbf x_\perp. \tag{9.1.11}$$

Here a small term, quadratic in the pendulum swing angle, has been neglected in the first term, and a "transverse" K̄-frame displacement vector x_⊥ has been introduced; it satisfies x_⊥ · r̂ = 0. The angular momentum of the bob about the point B is

$$\bar{\mathbf L}_B = \mathbf x \times m\dot{\mathbf x} = m\,\mathbf x \times \left( \frac{\bar d\mathbf x}{dt} + \boldsymbol\omega_E \times \mathbf x \right). \tag{9.1.12}$$
In the K̄ frame, the equation of motion is given by Eq. (8.2.40), simplified by the fact that the support point is at rest:

$$\frac{\bar d}{dt}\,\bar{\mathbf L}_B = \mathbf T. \tag{9.1.13}$$

The force acting on mass m is −mg(r̂ + x/R), directed along the line pointing from the gravitational center toward m, so that the torque about B is

$$\mathbf T = -mg\,\mathbf x \times (\hat{\mathbf r} + \mathbf x/R), \tag{9.1.14}$$
which has no component parallel to r̂. As a result, the angular momentum component along r̂ is conserved. Taking L̄_r(0) as its initial value and substituting from
Eqs. (9.1.10) and (9.1.11) into Eq. (9.1.12),

$$\bar L_r(0) = \bar{\mathbf L}_B \cdot \hat{\mathbf r} = m\left( s\dot e - e\dot s + (e^2 + s^2)\,\omega_E\cos\theta - \lambda\,\omega_E\sin\theta\, s \right), \tag{9.1.15}$$
where we have neglected the radial velocity and have defined x_⊥ · ŝ = s and x_⊥ · ê = e. The pendulum could be set swinging initially in an elliptical orbit, but for simplicity we suppose that it is released from rest from initial displacement a along the ŝ-axis. As a result L̄_r(0) = 0. Casually viewed, the pendulum will continue to oscillate in this plane, but we will allow for the possibility that the plane changes gradually (because that is what is observed). Let us then conjecture a solution such that the components of x_⊥ are given by
$$\begin{aligned} s &= a\sin\omega_0 t\,\cos\phi(t) = \tfrac{a}{2}\big(\sin(\omega_0 t + \phi(t)) + \sin(\omega_0 t - \phi(t))\big),\\ e &= a\sin\omega_0 t\,\sin\phi(t) = \tfrac{a}{2}\big({-\cos(\omega_0 t + \phi(t))} + \cos(\omega_0 t - \phi(t))\big), \end{aligned} \tag{9.1.16}$$
where a is an amplitude factor, φ(t) defines the axis of the elliptical orbit, ω₀ = √(g/λ) is the pendulum frequency, and the small, quadratic-in-amplitude vertical displacement is neglected. This form of solution is likely to be valid only if the angle φ(t) is slowly varying compared to ω₀t. We are also implicitly assuming that the amplitude a remains constant; this relies on the fact that it is an adiabatic invariant, a fact that will be justified in Chapter 14. Substitution from Eq. (9.1.16) into Eq. (9.1.15) yields

$$\tfrac{1}{2}\,a^2\big(1 - \cos 2\omega_0 t\big)\,\dot\phi + a^2\,\omega_E\cos\theta\,\sin^2(\omega_0 t) - \omega_E\,\lambda\, a\,\sin\omega_0 t\,\cos\phi(t)\,\sin\theta = 0. \tag{9.1.17}$$

Note that the ansatz (9.1.16) has assigned zero initial radial angular momentum to the pendulum. In detail the motion implied by Eq. (9.1.17) is complicated but, since we are assuming |φ̇| ≪ ω₀, it is possible to distinguish between rapidly varying terms like sin ω₀t and slowly varying terms like φ̇ in Eq. (9.1.17). This permits the equation to be averaged over the rapid variation while treating the slow variation as constant. Recalling that ⟨sin² ω₀t⟩ = 1/2, this yields

$$\langle \dot\phi \rangle = -\omega_E\cos\theta. \tag{9.1.18}$$
[The sort of averaging by which Eq. (9.1.18) has been derived will be considered further and put on a firmer foundation in the chapter on adiabatic invariants.] This shows that carrying the support point along a line of latitude with angular velocity ω_E causes the plane of oscillation of the pendulum to rotate relative to axes ŝ, ê with rotation rate −ω_E cos θ. The presence of the cos θ factor gives a nontrivial dependence on latitude; at the end of one earth-rotation period T = 2π/ω_E, the support point has returned to its starting point, but the plane of oscillation has deviated by angle −ω_E cos θ · 2π/ω_E = −2π cos θ.
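For orientation, the precession rate −ω_E cos θ is easily converted into everyday units. The helper functions below are mine (not from the text); they take the earth's rotation to be 15° per hour, i.e. a 24-hour day, and express the latitude dependence through cos θ = sin(latitude), with θ the colatitude:

```python
import math

def foucault_rate_deg_per_hour(latitude_deg):
    """Magnitude of the precession rate omega_E * cos(theta), theta the colatitude:
    15 deg/hr times sin(latitude). The sign (clockwise in the north) is omitted."""
    colat = math.radians(90.0 - latitude_deg)
    return 15.0 * math.cos(colat)

def foucault_day_hours(latitude_deg):
    """Time for the swing plane to precess through a full turn."""
    return 360.0 / foucault_rate_deg_per_hour(latitude_deg)

assert abs(foucault_rate_deg_per_hour(90.0) - 15.0) < 1e-12   # pole: full turn per day
assert abs(foucault_day_hours(30.0) - 48.0) < 1e-9            # latitude 30 deg: 48 hours
```

At a latitude like New York's the "Foucault day" works out to roughly 36 hours, which is why a museum pendulum visibly fails to return to its plane after 24 hours.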
If one performs this experiment at the North Pole,⁶ the support point never moves and it is clear that the plane of oscillation remains fixed in space. This agrees with Eq. (9.1.18) which, because cos θ = 1, predicts a −ω_E rotation rate about the earth's axis. This just compensates for the +ω_E rotation rate of the earth. If you were performing this experiment at the North Pole, it would not be necessary to rely on the earth's rotation to perform the experiment. Rather, holding the pendulum at arm's length, you could move the support point around a tiny circle (radius equal to the length of your arm) centered over the North Pole. In this case, ω_E could be chosen at will, say fast compared to the earth's rotation rate, but still slow compared to ω₀. Since the plane of oscillation of the pendulum would be perceived to be invariant, the plane of oscillation would return to its original orientation after your hand had returned to its starting point. But this special case is misleading, as it suggests that the pendulum necessarily recovers its initial spatial orientation after one complete rotation. In fact, the pendulum orientation suffers secular change, as observation of an actual Foucault pendulum after 24 hours confirms experimentally. Things become more complicated for latitudes other than the poles, and the only experimentally easy value for ω_E is 2π/24 hour⁻¹. At the equator there is no apparent rotation of the plane of oscillation; formula (9.1.18) gives this result and it is just as well, since symmetry requires it, especially in the case of the pendulum plane being either parallel to or perpendicular to the equatorial plane. The angle of precession after one revolution is independent of the earth's rotation rate. Furthermore, the precession is independent of the gravitational acceleration g; the same experiment on the moon would yield the same precession angle (after one moon rotation period). These features show that the effect is geometric rather than dynamic.
Two masses joined by an ideal spring, supported on a frictionless horizontal table, and oscillating losslessly along the line joining them would exhibit the same result: the line joining them would precess. A geometric analysis of the phenomenon will be pursued in the next section.
Problem 9.1.1: Analyze the average motion of a Foucault pendulum that is performing circular motion about a "vertical" axis.

Problem 9.1.2: The circularly swinging pendulum of the previous problem can be used as a clock, with one "tick" occurring each time the bob completes a full circle. Suppose there are two such clocks, initially synchronized, and at the same place. Suppose further that one of the clocks remains fixed in space (or rather it stays on the earth's orbit about the sun) while the other, fixed on earth, comes side-by-side with the other clock only once per day. Which clock "runs slow" and by how much?
9.2. "PARALLEL" DISPLACEMENT OF COORDINATE AXES

The fact that after one circumnavigation of the earth the Foucault pendulum plane does not return to its original orientation is an example of anholonomy. Though one

⁶The distinction between magnetic axis and rotation axis is being ignored.
may try one's best to avoid "twisting it," supporting the pendulum with an ideal bearing, its plane will still be found to be "twisted" when the pendulum returns to its home position. Because no external torque is applied to the pendulum, its radial angular momentum is conserved, but this does not prevent rotational displacement around the radial axis from accumulating. Recall the two triads (î₁, î₂, î₃) and (ŝ, ê, r̂) used to analyze the Foucault pendulum. (In this section we will refer to the latter triad as (ê₁, ê₂, r̂) and restrict the description to inertial frame quantities.) The following comments and questions arise:

• In 3-D (x, y, z) space the triads are manifestly not parallel except initially. Can meaning be assigned to "parallel translation" of such a triad in the 2-D surface of the sphere?
• If so, are (î₁, î₂, î₃) and (ê₁, ê₂, r̂) parallel in this sense?
These questions relate to the concept of parallel displacement of a vector in differential geometry. This concept was introduced by the Italian mathematician Levi-Civita toward the end of the nineteenth century. The importance of the concept in physics is discussed, for example, by Berry [2]. He describes requirements to be satisfied for the "parallel transport" of an initially orthonormal triad of unit vectors (ê₁, ê₂, r̂) that is attached to the tip of a radius vector r(t) pointing from the center of a sphere of unit radius to a point moving on the surface of the sphere:

• r(t) is to remain a unit vector (so the origin of the triad stays on the surface of the sphere).
• r̂(t) is to remain parallel to r(t).
• ê₁ · r̂ is to remain zero. That is, ê₁ remains tangent to the sphere. With r̂ normal to the surface, and ê₂ normal to ê₁, ê₂ also remains tangential.
• The triad is not to "twist" about r̂, i.e., ω · r̂ = 0, where ω is the instantaneous angular velocity of the triad.
To visualize the meaning of the final requirement, imagine a single-gimbal-mounted globe with bearings at the North and South Poles. Such a mount allows only pure rotations with ω parallel to the north-south axis, and an arbitrary point A can be rotated into an arbitrary point B if and only if they are on the same latitude. The path followed by A is a circle with its center on the earth's axis (the circle's center, however, is not in general coincident with the earth's center), and the condition ω · r̂ = 0 is not met unless both A and B lie on the equator; only in this case would the motion of the triad be said to be twist-free. Next suppose the globe has a double-gimbal mounting. Then any point A can be rotated into any point B by a twist-free rotation; to obtain a pure rotation about a single axis, one has to seize the globe symmetrically with both hands and twist them in synchronism about the desired axis. Point A is then said to be taking the "great circle" route to B. The center
of such a circle necessarily coincides with the earth's center. Since the path taken by the Foucault pendulum is not a great circle path, the triads (î₁, î₂, î₃) and (ŝ, ê, r̂) used to analyze that system are not parallel in the sense being discussed. To meet the first requirement of parallel transport, the evolution of r(t) must be describable by an orthogonal matrix O(t) as in Eq. (9.1.9):

$$\mathbf r(t) = O(t)\,\mathbf r(0). \tag{9.2.1}$$
Our task, then, for arbitrary evolution r(t), is to find how O(t) evolves with t. In the differential evolution occurring during time dt, the simplest rotation carrying r to r + ṙ dt is around the axis r × ṙ; the motion remains in a plane through the center of the sphere. The angular speed being |ṙ| and the sphere having unit radius, the angular velocity vector of this rotation is ω = r × ṙ, which implies

$$\dot{\mathbf r} = \boldsymbol\omega \times \mathbf r. \tag{9.2.2}$$
This rotation does not determine O(t) uniquely, however, since there remains the possibility of further (pre- or post-) rotation of the globe around r̂. Still, this is the twist-free motion being sought since it satisfies

$$\boldsymbol\omega \cdot \mathbf r = 0, \tag{9.2.3}$$

which is the no-twist condition listed above. As in Eq. (9.2.2), the time rates of change of the unit vectors ê₁ and ê₂ due to angular rotation velocity ω are

$$\dot{\hat{\mathbf e}}_1 = \boldsymbol\omega \times \hat{\mathbf e}_1, \qquad \dot{\hat{\mathbf e}}_2 = \boldsymbol\omega \times \hat{\mathbf e}_2. \tag{9.2.4}$$

The moving origin is constrained to stay on the sphere, but it can otherwise be specified arbitrarily by specifying r(t), and then ω follows from Eq. (9.2.2). An example "trip plan" for r(t) is that taken by the Foucault pendulum in Fig. 9.1.1, but notice that ω as given by Eq. (9.2.2) is not parallel to the north-south axis and hence differs from the angular velocity vector of the Foucault experiment. From Eq. (8.2.4) we know the antisymmetric matrix "associated" with ω is J · ω = OᵀȮ and hence that

$$\dot O = O\,(\mathbf J \cdot \boldsymbol\omega) = O\,\mathbf J \cdot (\mathbf r \times \dot{\mathbf r}). \tag{9.2.5}$$
This differential equation is to be integrated to obtain the twist-free rotation matrix O(t). Not surprisingly, the solution turns out to depend on the path taken by r(t). The geometry used to investigate this is shown in Fig. 9.2.1. As P, the point at the tip of r, moves on the unit sphere, its velocity ṙ lies in the plane tangent to the sphere. Requiring the speed v of the point's motion to be constant, the vector î₁ = ṙ/v is a unit vector parallel to the motion. (This vector was referred to as the unit tangent vector t̂ in the Frenet-Serret description of a curve in space.) The vector î₁ can be taken as the first axis of yet another local coordinate system. It is specialized to the particular motion being studied and is not intended to be useful
FIGURE 9.2.1. Vector geometry illustrating the rate of accumulation of angular deviation φ_T between a twist-free frame and a frame with one axis constrained to line up with ṙ. A top view looking back along −r is included. To reduce clutter, it is assumed that v = 1. At any instant the figure can be drawn to look like the Foucault trajectory along a circular arc as shown here, but in general the point P moves along any smooth closed path on the surface of the sphere.
for any other purpose, but it does have the property, by definition, assuming the point P moves on a smooth, kink-free closed path, of returning to its starting value after one complete circuit along the path. The other two local orthonormal axes can be taken to be î₃ = r̂ and î₂ = r × ṙ/v. The latter is given by
$$\hat{\mathbf i}_2 = \frac{\mathbf r \times \dot{\mathbf r}}{v} = \frac{\hat{\mathbf k} - \cos\theta\,\mathbf r}{\sin\theta}, \tag{9.2.6}$$

where θ and k̂ are to be defined next. There is a best-fit (osculating) circle with center at point C, lying in the local orbit plane and having radius ρ ≡ sin θ equal to the local radius of curvature. The unit vector k̂ is directed from the origin toward point C. From the elementary physics of circular motion one knows that the acceleration vector has magnitude v²/ρ and points toward C. Explicitly, the acceleration is given by

$$\ddot{\mathbf r} = \frac{v^2}{\sin\theta}\,\frac{-\mathbf r + \cos\theta\,\hat{\mathbf k}}{\sin\theta}. \tag{9.2.7}$$
The component of r̈ lying in the tangential plane is

$$\ddot{\mathbf r}_\parallel = \frac{v^2\cos\theta}{\sin^2\theta}\,\big(\hat{\mathbf k} - \cos\theta\,\mathbf r\big) = \frac{v^2}{\tan\theta}\,\hat{\mathbf i}_2. \tag{9.2.8}$$

Here the result has been used that in circular motion with speed v on a circle of radius η, the rate of angular advance satisfies

$$\frac{d\phi_T}{dt} = \frac{v}{\eta}. \tag{9.2.9}$$

From the top view in Fig. 9.2.1 looking back along −r, it can be seen that the axis î₁ twists relative to an axis pointing along the tangential great circle through P, and that dφ_T/dt measures the time rate of twist. The radius η in this case is the distance from P to S, but this distance does not appear explicitly in the formulas. The accumulated twist in making a complete circuit is

$$\phi_T = -\oint \hat{\mathbf i}_1 \cdot (\mathbf r \times \ddot{\mathbf r})\,\frac{dt}{v} = -\oint (\mathbf r \times \mathbf r'') \cdot d\mathbf r, \tag{9.2.10}$$
where ds = v dt and primes indicate differentiation with respect to s. Let us apply this formula to the trip taken by the Foucault pendulum in Fig. 9.1.1. Using Eq. (9.2.8), with v = 1 and circuit time 2π sin θ,

$$\phi_T = \oint \frac{v}{\tan\theta}\,dt = \frac{\cos\theta}{\sin\theta}\,2\pi\sin\theta = 2\pi\cos\theta. \tag{9.2.11}$$
We have calculated the twist of î₁ relative to the no-twist frame. But since we know that î₁ returns to its starting value, it follows that the no-twist frame returns rotated by −φ_T = −2π cos θ. This is the same twist we calculated (and observed) for the Foucault pendulum. This implies that the pendulum frame and the no-twist frame are the same thing. As far as I can see there is no a priori connection between the no-twist of geometry and the "no-twist" of the Foucault support bearing, but it has just been shown that they are equivalent. No-twist displacement of the axes is also known as "parallel displacement" of the axes. It is clear that the orientation of the oscillation of a Foucault pendulum would satisfy Eq. (9.2.11) for a trip plan more complicated than one that simply follows a line of latitude.

Problem 9.2.1: Making any assumptions you wish concerning the orientations of path direction, twist direction, and solid angle orientation, show that the accumulated twist accompanying an arbitrary smooth closed path on the surface of a sphere can be expressed as 2π minus the solid angle enclosed by the path.
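The no-twist transport rule ė = ω × e, with ω = r × ṙ as in Eq. (9.2.2), can also be integrated numerically around a line of latitude. The sketch below (plain Python; step count and colatitude are arbitrary choices of mine) confirms the content of Problem 9.2.1: the transported tangent vector comes back rotated by the enclosed solid angle 2π(1 − cos θ), i.e. by 2π minus the accumulated twist 2π cos θ of Eq. (9.2.11):

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def transport_around_latitude(theta, n=20000):
    """Parallel-transport the southward tangent vector once around colatitude theta,
    integrating e' = (r x r') x e with classical RK4."""
    h = 2.0 * math.pi / n
    e = (math.cos(theta), 0.0, -math.sin(theta))   # initial tangent: local south
    def edot(t, e):
        r    = (math.sin(theta)*math.cos(t), math.sin(theta)*math.sin(t), math.cos(theta))
        rdot = (-math.sin(theta)*math.sin(t), math.sin(theta)*math.cos(t), 0.0)
        return cross(cross(r, rdot), e)            # omega x e, omega = r x rdot
    t = 0.0
    for _ in range(n):
        k1 = edot(t, e)
        k2 = edot(t + h/2, tuple(e[i] + h/2*k1[i] for i in range(3)))
        k3 = edot(t + h/2, tuple(e[i] + h/2*k2[i] for i in range(3)))
        k4 = edot(t + h,   tuple(e[i] + h*k3[i] for i in range(3)))
        e = tuple(e[i] + h*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i])/6 for i in range(3))
        t += h
    return e

theta = math.radians(45.0)
e0 = (math.cos(theta), 0.0, -math.sin(theta))
e1 = transport_around_latitude(theta)
angle = math.acos(max(-1.0, min(1.0, sum(x*y for x, y in zip(e0, e1)))))
solid_angle = 2.0 * math.pi * (1.0 - math.cos(theta))   # enclosed solid angle
assert abs(angle - solid_angle) < 1e-6
```

Running the same loop for a great circle (θ = π/2) returns the vector unrotated, which is the twist-free great-circle route described above.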
9.3. TUMBLERS, DIVERS, FALLING CATS, ETC.

A question that captures popular attention has to do with falling cats. Everyone "knows" that a cat released from an upside-down position from a height of a meter, or even somewhat less, always manages to land on its feet. Everyone also knows that angular momentum is conserved. An example of a little knowledge being less valuable than none at all is that most people (including some physicists) believe that these two statements are contradictory. One routine "explanation" is that the cat "pushes off," giving itself some initial angular momentum. Anyone who, one hopes in youth, has investigated this issue experimentally is certain to doubt this explanation, but is likely to leave the observation as yet another of nature's unsolved problems. Furthermore, this explanation requires the cat to be very good at mechanics to know how hard to push, and prescient enough to know its initial height. The stunts performed by divers and gymnasts are at least as amazing as falling cats. Again the maneuvers appear to violate the laws of nature, but human inability to register exactly what is happening makes one doubt one's eyes. Does the trampoline artist push off with a twist? Otherwise how can she or he be facing one way on take off, the other way on landing? Is the diver erect or bent on taking off from the diving board? And so on. These ambiguities introduce enough confusion to prevent mental resolution of observations that appear to violate the laws of nature. Once one has unraveled one of these "paradoxes" one is far less troubled by all the rest (and can perhaps advance to harder problems like "does the curve ball really curve?"). The remainder of this chapter is more nearly an invitation for the reader to work through one or more of these "falling cat" problems than a full explanation of any one of them.
It seems to me the moves of divers and gymnasts are more controlled, less ambiguous, and more subject to experimentation than are the gyrations of falling cats. They are therefore better subject to disciplined analysis using mechanics. The articles by Frohlich [3-5] describe these things clearly, and there is no reason to repeat his explanations in any detail. But the figures and tables of data in this section (largely extracted from the same articles) are intended to give uniform definitions to the variables and the physical parameters. From these, individuals can formulate their own questions and design projects to solve them, qualitatively, semiquantitatively, or quantitatively. Here is the easiest conundrum of this sort, and its resolution. One of the astronauts may have (actually should have) exhibited this while "weightless." Analysis of the maneuver will be based on Fig. 9.3.1, which defines variables that fix the "shape" or "internal" configuration of a nine-point-mass model of a rigid-limbed human being. The masses are not given here, but the principal moments of inertia are given in Table 9.3.1 for the shapes illustrated in Fig. 9.3.2. The 27 moments of inertia given are enough (actually too many) to determine all parameters of the model, especially because symmetry reduces the independent masses to head H, shoulder S, arm A, pelvis P, and leg L, and the independent lengths to neck n, shoulder s, arm a, body b, pelvis p, and leg l. (The point O in the figure is not a mass, but an origin from which the internal configuration is specified.)
FIGURE 9.3.1. Representation of a gymnast by point masses attached to light, rigid, connecting members. All lengths and masses are fixed. All shape (internal) angles are under the control of the gymnast. Euler angles specify the (external) orientation of the pelvis-body combination (assumed rigid).
All these definitions are spelled out in Table 9.3.2, which contains definitions of internal-configuration-defining variables and their valid ranges as well. For some maneuvers, it is essential that the azimuthal angles of the arms be allowed to increase (or decrease) past 2π and beyond without limit. Internal shape angles are (θ₁, φ₁), the polar and azimuthal angles of the left arm (corresponding right-arm angles are (θ₂, φ₂)), the twist angle α of the shoulders about the body axis, and the leg bend at the waist Λ. These angles are freely and arbitrarily controlled by gymnasts and divers
Table 9.3.1. Principal Moments of Inertia about Twist, Somersault, and Cartwheel Axes for a "Typical" Man.

Position | Name              | I(twist) | I(somersault) | I(cartwheel)
A        | Layout throw      | 1.10     | 19.85         | 20.66
B        | Layout somersault | 1.10     | 14.75         | 15.56
C        | Pretwist layout   | 3.42     | 16.38         | 19.17
D        | Twist position    | 1.06     | 16.65         | 17.24
E        | Twist throw       | 1.08     | 17.41         | 18.20
F        | Loose pike        | 4.83     | 10.45         | 7.53
G        | Pretwist pike     | 5.54     | 10.09         | 10.42
H        | Tight pike        | 1.75     | 5.89          | 6.05
I        | Tuck              | 2.03     | 3.79          | 3.62

[Units are kg·m².]
FIGURE 9.3.2. Positions used by divers and trampolinists (adapted from Frohlich). The names of the positions are: A: layout throw; B: layout somersault; C: pretwist layout; D: twist position (one arm across chest, one across head); E: twist throw; F: loose pike; G: pretwist pike (arms out, as in C); H: tight pike (elbows bent); and I: tuck (knees bent). Asterisks on figures H and I indicate that (with elbow/knee joints missing from the model) realistic mass distribution requires the gymnast to be empowered to change his or her arm/leg lengths (a/l) as well as altering angles. Principal moments of inertia for each of the positions for a typical man, as calculated by Frohlich, are given in Table 9.3.1.
as they execute their maneuvers. From the point of view of mechanics, this makes these variables "nonautonomous." The position and orientation of the figure in (inertial) space is indicated by Φ, Θ, Ψ, the Euler angles of the body-pelvis section (regarded as rigid), as well as (X, Y, Z), the position of the point O at the intersection of body and pelvis. Suppose the astronaut, standing straight up initially, with arms straight down (θ = φ₁ = φ₂ = 0), slowly (or quickly for that matter) raises her arms toward the front and on through one full revolution, ending therefore with the initial shape. The question is, "What is her final orientation?" The data in Table 9.3.1 that appear most useful for analyzing this motion seem to be the "somersault" moments of inertia, I(A) = 20 kg·m² in the "layout throw" shape and I(B) = 15 kg·m² in the "layout somersault" shape. (One may be pleased to note that the moment of inertia I(E) = 17.5 kg·m² in the "twist throw" shape is about halfway between the two just mentioned, which seems right.) To simplify the discussion, let us simplify the model by lumping head and shoulders into one mass, m₁ = H + 2S, pelvis and legs into m₂ = 2P + 2L, and arms into m₃ = 2A. This is not quite right for an actual human being, but it should be "good enough for government work." The notation has been chosen to match that of
Table 9.3.2. Definitions and Valid Ranges of the Internal-Configuration-Defining Variables.
Problem 1.2.14. (Be aware, though, that the symbol a has two different meanings.) The moments of inertia are worked out in terms of masses and lengths in the solution given there. It would be a lucky coincidence if the centroid of the astronaut coincided with the rotation axis of her arms. (If she had been doing this in the "tight pike" position this would be more nearly true.) Nevertheless, in the spirit of the discussion so far, let us take this to be the case, even if it mildly contradicts the moments of inertia already adopted. Proceeding by steps, let us suppose the astronaut pauses at φ = π/2; what is her orientation? Pauses at φ = π; what is her orientation? Pauses at φ = 3π/2; what is her orientation? Ends at φ = 2π; what is her orientation? These configurations are illustrated in Fig. 9.3.3. In the first step, the astronaut has to apply the torque needed to rotate her arms forward, and the torque of reaction pushes her torso back. Therefore, once her arms are straight forward, her orientation is something like that shown in the second figure. Much the same action occurs in the next step and is illustrated in the third figure. The shoulders of most men would make the next step difficult, but this is irrelevant because the astronaut is a woman with exceptionally supple shoulders. The torque she applies to keep her arms going in the same direction
FIGURE 9.3.3. Successive shapes and orientations of an astronaut performing the exercise of completing a full rotation of both arms.
has to be accompanied by a pull on the rest of her body, and this causes her torso to continue rotating in the same direction as in the first two steps. The final step leaves her as shown in the final figure, with orientation very different from her original orientation. Since the astronaut is in free space, angular momentum was presumably preserved at all times in the exercise just described, but no use was made of this fact. Let the moments of inertia (about the shoulder) of arms and torso be I₁ and I₂, respectively. Also let the angles of arms and torso in inertial space be α₁ and α₂ as shown, and let ψ = α₁ + α₂ be the angle between arms and torso. One notes, in passing, a possibly unexpected feature: the angle ψ does not in fact advance through 2π in the exercise. But let us work with angular momenta. The angular momentum of the arms is I₁α̇₁ and of the torso −I₂α̇₂. The total angular momentum is zero initially and presumably stays that way throughout. By conservation of angular momentum we conclude that

$$I_1\,\dot\alpha_1 - I_2\,\dot\alpha_2 = 0. \tag{9.3.1}$$

Solving this differential equation with appropriate initial conditions produces

$$\alpha_2 = \frac{I_1}{I_1 + I_2}\,\psi. \tag{9.3.2}$$
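The relation (9.3.2) makes the astronaut's net reorientation a one-line computation. In the sketch below (plain Python; the moment-of-inertia values are rough guesses of mine, not entries from Table 9.3.1), the arms advance through relative angle ψ = 2π and the torso counter-rotates by the fraction I₁/(I₁ + I₂) of a full turn:

```python
import math

def torso_rotation(psi, I_arms, I_torso):
    """Net torso rotation per Eq. (9.3.2) when the arms advance by relative
    angle psi, starting from zero total angular momentum."""
    return I_arms / (I_arms + I_torso) * psi

# Illustrative guesses: arms ~1 kg m^2 about the shoulder, torso ~15 kg m^2.
alpha2 = torso_rotation(2.0 * math.pi, 1.0, 15.0)
assert abs(alpha2 - 2.0 * math.pi / 16.0) < 1e-12   # 1/16 of a turn, i.e. 22.5 degrees
```

The heavier the arms relative to the torso, the larger the reorientation per cycle, which is the same lever the cat exploits by alternately making one body section's moment of inertia large while twisting the other.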
Problem 9.3.1: In the solution to Problem 1.2.14, formulas are given for the moments of inertia of a three-particle system. Of the parameters introduced there, if the particles are in the same line, then b = 0 and a is positive/negative when the masses are aligned parallel/antiparallel, but with equal magnitudes. Calling the moments of inertia in these two cases I(A) and I(B), solve for their sums and differences, and from these their values in terms of the masses and lengths.

Problem 9.3.2: Using the result of the previous problem and plausible estimates of the parameters specifying the mass distribution of the astronaut described in the text, find her change of orientation after one arm revolution.

Having persuaded ourselves in this simplest of cases that it is possible for an object to reorient itself in space even if its angular momentum is zero, we can contemplate the falling cat problem. It should by now be simple, qualitatively anyway, to conjecture a way the cat might do it. Let us represent the cat by the same point-mass model we have been using, even to the extent of referring to the cat's front legs as arms. When the cat falls, the reorientation it requires is a rotation around the twist axis. By going to the "pretwist layout" shape, the cat can make large the moment of inertia of its upper body. From there, the cat can twist its lower body (changing angle α in Fig. 9.3.1) by an arbitrarily large angle with negligible reorientation of its upper body. Then the cat can reduce its upper body moment of inertia and increase its lower body moment of inertia by going to a shape not shown in Fig. 9.3.2, but like the "loose pike" position except for front paws forward, parallel to the body. In this position the cat can twist its upper body without reorienting its lower body.
TUMBLERS, DIVERS, FALLING CATS, ETC.
That the cat can in principle land on its feet has therefore been demonstrated. How the cat is able to do it joins the list of problems that physicists are not talented at solving. The data in this chapter are organized as a source for possible projects somewhat more complicated and more quantitative than the discussion given so far. The remainder of the chapter enumerates issues, questions, and topics of possible interest. Perhaps trampoline moves make the best examples since they can be both amazing and amenable to analytic treatment. Even so, attempts at really faithful mass representations can lead to intolerable complexity. In any case one should start by oversimplifying. The simplest few definable shapes are listed in Table 9.3.3 with suggested parameter choices for achieving them. The case of two (heavy-)hinged masses is discussed analytically in an appendix to the Frohlich paper. Even in two dimensions, interesting and (superficially) paradoxical behavior can be observed. For this sort of motion, it may be helpful to think of planar figures sliding on a horizontal table (with connecting rods able to pass through each other), though the pure somersaults of a gymnast are equivalent once the translational and rotational motion have been separated. For anything more complicated than these, a computer program seems to be called for. But before following that line, one should allow oneself the luxury of speculating how such a program would perform without actually doing the programming. The shortage of reference material in this area does not indicate that the issues are simple or uninteresting; just the opposite is true, but the complexity makes idealization hard. One should begin by reading the references listed at the end of the chapter. Then, concentrating on one or more of the particular moves described there, one can elucidate its "surprising" features.
The detailed description can be fairly complicated (as in "How does the cat land on its feet?"), limiting one to semiquantitative analysis. Simpler issues can be analyzed more quantitatively.
9.3.1. Miscellaneous Issues 9.3.1.1. Initial Angular Momentum Possibilities
(1) Angular momentum vanishes initially (and hence always). This is a favored case since it is the easiest to analyze and the most fertile ground for seeming paradoxes. The diver/gymnast must not apply transverse force to the platform on takeoff. (2) Nonvanishing initial angular momentum only about the somersault axis (pelvis). Provided the gymnast preserves left-right symmetry, the subsequent motion is fairly simple. (3) Nonvanishing angular momentum around the twist axis (head-foot). (4) Nonvanishing initial angular momentum components around the front-to-back (cartwheel) axis. According to Frohlich, this possibility is disfavored by divers.
[Table 9.3.3: shapes and suggested parameter choices. The recoverable row labels are: Dumbbell, 2-D; Dumbbell, 3-D; 2 Masses, Hinged, 2-D; 2 Masses, Hinged, 3-D; 2 Masses, Heavy Hinge, 2-D; 2 Masses, Heavy Hinge, 3-D. The remaining columns of the table did not survive extraction.]
9.3.1.2. Control of Internal Configuration Possibilities

(1) The simplest possibility (because it restricts the number of independent equations of motion) has all internal configuration angles under the control of the performer. This makes the problem nonautonomous. As such, it might be considered to be a problem of controls engineering. (2) Somewhat more traditional as physics is to consider the various members of the body free to move subject only to the jointed constraints. For example, one might be interested in the motion of a rag doll. Such systems are known as autonomous. From the astronaut example described above it should be clear that generalized coordinates can be defined only with difficulty or not at all, ruling out the simple use of Lagrange's equations. This is a task for the Poincaré equation. (3) Continuing the previous point, it is curiously harder to analyze complicated autonomous systems than nonautonomous systems. A rag doll on a trampoline requires the Poincaré equation because the shape of the rag doll needs to be calculated, while the performer controls his or her own shape. This is perhaps an even worse problem than one might have guessed if multiple bounces are at issue. The performer is probably careful enough to interact with the trampoline in a controlled way, landing feet first or back first, but not arm first or head first. The rag doll motion is therefore certain to be "chaotic" and unpredictable over large times. Depending on assumptions, the motion of the trampoline artist may also be chaotic in some technical sense, but certainly far less chaotic than that of the rag doll. (4) The problem would be made more physical yet if, instead of constraints, the internal configuration were maintained by torsional restoring torques at the joints and if the connecting elements were replaced by springs. This would convert it into a molecular physics problem.

9.3.1.3. External Description Possibilities

(1) The simplest possibility has the body free of external forces such as gravity. Analyzing the reorientation of spacewalkers and spacecraft due to internal manipulations is an instance of the applicability of this assumption. For a completely satisfactory analysis of dives and stunts, the time interval available for performing them is essential. The appearance of the maneuver relies on both position and orientation, but these can be handled independently. (2) For sky divers, football passes, and curved baseballs, it is necessary to account for air resistance. (3) The possibility of external propulsion introduces a whole new class of problem, called "holonomic propulsion." Ordinary swimming seems easy enough to understand if the swimmer's arms move through the water in the head-to-foot direction and return through the air. But how does one swim underwater? In free space one can systematically and arbitrarily reorient oneself without
applying external torque; does that mean one can displace oneself arbitrarily without applying external force?

9.3.1.4. Some (of the Many Possible) Projects

(1) Recast/extend/improve upon Frohlich's description of the motion of a hinged body [3, Appendix, p. 591]. (2) Analysis of the "back-drop" trampoline stunt [3, Fig. 6]. (3) Analysis of the "swivel-hips" trampoline stunt [3, Fig. 10]. (4) Torque-free "cat-twists." (5) Distinguish between moves that require initial angular momentum and those that do not. (6) How does a diver avoid making a splash? (7) How does a vaulter exploit the stability or instability of pure rotational motion around his or her principal axes?
BIBLIOGRAPHY

References

1. D. Kleppner and R. J. Kolenkow, An Introduction to Mechanics, McGraw-Hill, New York, 1973.
2. M. V. Berry, in A. Shapere and F. Wilczek, eds., Geometric Phases in Physics, pp. 7-28, World Scientific, Singapore, 1989.
3. C. Frohlich, Am. J. Phys. 47, 583 (1979).
4. C. Frohlich, Sci. Am. 242, 154 (1980).
5. C. Frohlich, Am. J. Phys. 54, 590 (1986).
PART V  HAMILTONIAN MECHANICS
Hamiltonian methods are distinguished by treating momenta (dynamic quantities) on the same footing as configuration coordinates; together they form "phase space." In phase space, motion is described by 2n first-order, ordinary differential equations of motion. Then a description of configuration space evolution by partial differential equations (PDEs), known as Hamilton-Jacobi theory, arises naturally. By referring to the flowchart in the preface to the book, one can see that the next two chapters, though geometric in character, do not depend appreciably on the previous material in this book, though elementary Lagrangian notions, such as the definition of the momentum $p_i$ conjugate to $q^i$,
$$p_i = \frac{\partial L}{\partial \dot q^i},$$
are assumed. Though momentum is thoroughly familiar from Newtonian mechanics, it is only in Hamiltonian mechanics that momentum is treated as an independent variable on a nearly equal footing with the original coordinates. The Hamiltonian is defined by
$$H(q, p, t) = p_i \dot q^i - L(q, \dot q, t),$$
where it is essential that the argument $\dot q$ be eliminated in favor of $p$ as indicated. Only when we get to symplectic geometry, which is the geometry of phase space, is the material in the early geometric chapters required.
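The elimination of $\dot q$ in favor of $p$ can be illustrated concretely. The following sketch performs the Legendre transformation numerically for a 1-D harmonic oscillator; the mass and spring constant are arbitrary assumed values, not parameters from the text.

```python
# Minimal numeric illustration of the Legendre transformation H = p*qdot - L,
# for L = (1/2) m qdot^2 - (1/2) k q^2 with assumed parameters m, k.
m, k = 2.0, 8.0

def L(q, qdot):
    return 0.5 * m * qdot**2 - 0.5 * k * q**2

def p_of(q, qdot, h=1e-6):
    # p = dL/d(qdot), evaluated by a centered finite difference
    return (L(q, qdot + h) - L(q, qdot - h)) / (2 * h)

def H(q, p):
    qdot = p / m          # eliminate qdot in favor of p (here qdot = p/m)
    return p * qdot - L(q, qdot)

q, qdot = 0.3, 1.7
p = p_of(q, qdot)
print(p, H(q, p))         # p agrees with m*qdot; H with p^2/(2m) + k*q^2/2
```

Because $L$ is quadratic in $\dot q$, the centered difference reproduces $p = m\dot q$ essentially exactly, and $H$ comes out as the total energy.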
10 HAMILTONIAN TREATMENT OF GEOMETRIC OPTICS
In his formulation of classical mechanics, Hamilton was motivated by geometrical optics. For that reason, we digress into this subject. Though it may not be immediately apparent, only those aspects that can be reinterpreted as results or methods of classical mechanics will be discussed, though without much detail. It would be somewhat pointless to formulate mechanics in terms of a partial differential equation (which is what the Hamilton-Jacobi equation is) without reviewing a context in which that mathematics is familiar, namely physical optics. In this chapter, traditional vector analysis (gradients, divergences, curls) will be used.
10.1. MOTIVATION
Our initial purpose is to recast mechanics to more nearly resemble optics. One is to visualize a congruence of space-filling and nonintersecting valid trajectories, with time t parameterizing each curve. See Fig. 10.1.1. This very picture of the problem represents a deviation from Newtonian mechanics toward a description like the beam- and wave-oriented discussion of physical optics. This formulation emphasizes the importance of boundary conditions satisfied by initial and final configuration space coordinates, which contrasts with Newtonian mechanics, which concerns itself more naturally with matching initial conditions for both configuration and velocity space coordinates. Also, while Newtonian mechanics concentrates its attention on the particular trajectory of a solitary system under study, it is more natural in optics to consider whole "beams" of rays, and the corresponding fields. We are after the analog of the rays of geometric optics. In the process we will also find the analog of wavefronts. The equation analogous to the "eikonal," or wavefront, equation will be the Hamilton-Jacobi equation. The action S is the analog of
FIGURE 10.1.1. (a) Configuration space curves, transverse coordinate x versus longitudinal coordinate z, natural for describing optical rays. They can usefully be parameterized by arc length s. (b) Phase space trajectories, p versus x. They cannot cross. Modulo an arbitrary additive constant, they can best be regarded as parameterized by time t. It is sometimes useful to refer orbits to a single reference orbit as shown.
the optical path length, and the principle of least action, also known as Hamilton's principle, is the analog of Fermat's principle of least time. Not attempting to justify it a priori, we will take Hamilton's principle as a postulate leading to equations whose correctness is to be confirmed later. One way in which the analogy between mechanics and optics is imperfect follows from the fact that in variational integrals like Eq. (5.3.1), the curves are parameterized by the independent variable t, whereas in geometric optics it is the path taken by a ray, rather than the rate of progress along it, that is of interest. This can perhaps be understood by the historical fact that the existence of photons was not even contemplated when geometric optics was developed, the principle of least time notwithstanding. This made it natural to parameterize rays with arc length s or, in the paraxial case, with coordinate z along some straight axis. See Eq. (5.3.2). In mechanics we parameterize trajectories by time t and, by treating velocities, or perhaps momenta, on the same footing as displacements, keep track of progress along trajectories. This makes it natural to visualize trajectories in "phase space," such as those in Fig. 5.3.1b. An invaluable property of phase space is that trajectories cannot cross; this follows because the instantaneous values of positions and velocities uniquely specify the subsequent evolution of the system. Another way the analogy of mechanics with optics will be defective is that, in systems describable by generalized coordinates, the concept of orthogonality does not exist in general. While rays are perpendicular to wavefronts in optics, the absence of metrics (distances and angles) requires that the relation between trajectories and "surfaces of constant phase" be specified differently in mechanics.
This leads eventually to the so-called “symplectic geometry.” To a physicist who is unwilling to distinguish between “geometry” and “high school geometry,” this might better be called “symplectic nongeometry,” as the hardest step toward understanding it may
be the jettisoning of much of the geometric intuition acquired in high school. Stated differently, it may not seem particularly natural to a physicist to impose a geometric interpretation on Lagrangians and Hamiltonians that have previously been thought to play only formal roles as artificial functions whose only purposes were to be formally differentiated.
10.1.1. The Scalar Wave Equation

To study geometric optics in media with spatially varying index of refraction n = n(r), one should work with electric and magnetic fields, but to reduce complication (without compromising the issues to be analyzed) we will work with scalar waves. The simplest example is a plane wave in a medium with constant index of refraction n,
$$\Psi(\mathbf r, t) = a\,\Re\, e^{i(\mathbf k\cdot\mathbf r - \omega t)} = a\,\Re\, e^{ik_0(n\,\hat{\mathbf k}\cdot\mathbf r - ct)}.\qquad(10.1.1)$$
Here $\Re$ stands for real part, a is a constant amplitude, c is the speed of light in vacuum, $\omega$ is the angular frequency, and $\mathbf k$, the "wave vector," satisfies $\mathbf k = k_0 n \hat{\mathbf k}$, where $\hat{\mathbf k}$ is a unit vector pointing in the wave direction and $k_0$ is the "vacuum wave number." (That is, $k_0 \equiv 2\pi/\lambda_0 = \omega/c$, where $\lambda_0$ is the vacuum wavelength for the given frequency $\omega$.) Linearity implies that all time variation has the same frequency everywhere. The index of refraction n is a dimensionless number, typically in the range of 1 to 2 for the optics of light. Because n is the wavelength in free space divided by the local wavelength, the product $n\,dr$, or distance "weighted" by n, where dr is a path increment along $\hat{\mathbf k}$, is said to be the "optical path length." The "phase velocity" is given by
$$v = \frac{\omega}{|\mathbf k|} = \frac{c}{n},\qquad(10.1.2)$$
and the "group velocity" will not be needed. A result to be used immediately, valid for constant n, is
$$\nabla(n\,\hat{\mathbf k}\cdot\mathbf r) = n\,\nabla(\hat{\mathbf k}\cdot\mathbf r) = n\,\hat{\mathbf k}.\qquad(10.1.3)$$
A wave that is somewhat more general than is given by Eq. (10.1.1) but that has the same frequency is required if n(r) depends on position:
$$\Psi(\mathbf r, t) = \psi(\mathbf r)\,\Re\, e^{ik_0(\phi(\mathbf r) - ct)}.\qquad(10.1.4)$$
Since this wave function must satisfy the wave equation, the (weak) spatial variation of the amplitude $\psi(\mathbf r)$ is necessarily position-dependent, so that $\Psi$ can satisfy
$$\nabla^2\Psi = \frac{n^2}{c^2}\,\frac{\partial^2\Psi}{\partial t^2},\qquad(10.1.5)$$
FIGURE 10.1.2. The wave vector k is normal to the wavefronts of a plane wave.
FIGURE 10.1.3. Wavefronts of light wave in a medium with nonconstant index of refraction.
which is the wave equation for a wave of velocity c/n. The wave vector and wavefronts for plane and not-quite-plane waves are shown in Figures 10.1.2 and 10.1.3.

10.1.2. The Eikonal Equation
Since the function $\phi(\mathbf r)$ in Eq. (10.1.4) takes the place of $n\,\hat{\mathbf k}\cdot\mathbf r$ in Eq. (10.1.1), it generalizes the previously mentioned optical path length; $\phi$ is known as the "eikonal," a name with no mnemonic virtue whatsoever to recommend it. One can think of $\phi$ as a "wave phase" advancing by $2\pi$ and beyond as one moves a distance equal to one wavelength and beyond along a ray. The condition characterizing "geometric optics" is for the wavelength $\lambda$ to be short compared to distances x over which n(r) varies appreciably in a fractional sense, $\lambda\,(1/n)(dn/dx) \ll 1$. More explicitly, this is
$$\frac{1}{n}\frac{dn}{dx} \ll k.\qquad(10.1.6)$$
This is known as an "adiabatic" condition. (This condition is violated at boundaries, for example at the surfaces of lenses, but this can be accommodated by matching boundary conditions.) This approximation will permit dropping terms proportional to $|dn/dx|$. By matching exponents of Eq. (10.1.4) and Eq. (10.1.1) locally, one can
define a local wave vector $\mathbf k$ such that
$$\mathbf k(\mathbf r) = k_0\,\nabla\phi(\mathbf r).\qquad(10.1.7)$$
This amounts to best-approximating the wave function locally by the plane-wave solution of Eq. (10.1.1). Because n and $\hat{\mathbf k}$ are no longer constant, Eq. (10.1.3) becomes
$$\nabla\phi = \left(\nabla n(\mathbf r)\right)\hat{\mathbf k}(\mathbf r)\cdot\mathbf r + n(\mathbf r)\,\nabla\!\left(\hat{\mathbf k}(\mathbf r)\cdot\mathbf r\right) \approx n(\mathbf r)\,\hat{\mathbf k}(\mathbf r),\qquad(10.1.8)$$
where inequality Eq. (10.1.6) has been used to show that the first term is small compared to the second. Also, spatial derivatives of $\hat{\mathbf k}(\mathbf r)$ have been dropped because deviations of the local plane-wave solution from the actual wave are necessarily proportional to $|dn/dx|$. (A simple rule of thumb expressing the approximation is that all terms that are zero in the constant-n limit can be dropped.) Eq. (10.1.8) shows that $\nabla\phi$ varies slowly, even though $\phi$ varies greatly (i.e., by order $2\pi$) on the scale of one wavelength. One must ascertain, with $\phi$ given by Eq. (10.1.7), whether $\Psi$, as given by Eq. (10.1.4), satisfies the wave equation (10.1.5). Differentiating Eq. (10.1.4) twice, the approximation can be made by neglecting the spatial variation of the r-dependent factors, $n(\mathbf r)$ and $\psi(\mathbf r)$, relative to that of the eikonal $\phi(\mathbf r)$:
$$\nabla\Psi \approx i k_0\,\nabla\phi(\mathbf r)\,\Psi \quad\text{and}\quad \nabla^2\Psi = \nabla\cdot\nabla\Psi \approx -k_0^2\,|\nabla\phi|^2\,\Psi.\qquad(10.1.9)$$
Using this approximation and substituting into Eq. (10.1.5), we obtain
$$|\nabla\phi(\mathbf r)|^2 = n^2(\mathbf r) \quad\text{or}\quad |\nabla\phi(\mathbf r)| = n(\mathbf r),\qquad(10.1.10)$$
which is known as the "eikonal equation." It can be seen to be equivalent to Eq. (10.1.8), provided $\phi(\mathbf r)$ and $\mathbf k(\mathbf r)$ are related by Eq. (10.1.7), in which case the eikonal equation can be written as a vector equation that fixes the direction as well as the magnitude of $\nabla\phi$:
$$\nabla\phi = n\,\hat{\mathbf k}.\qquad(10.1.11)$$
The real content of this equation is twofold: it relates the rate of phase advance $\nabla\phi$, in magnitude, to the local index of refraction and, in direction, to the ray direction. Since it could have been considered obvious, and written down without apology at the start of this section, the discussion to this point can be regarded as a review of the wave equation and wave theory in the short-wavelength limit.
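The eikonal equation is easy to verify numerically whenever $\phi$ is known in closed form. The sketch below assumes a radially symmetric index $n(r) = 1 + r$ (an invented profile, not one from the text), for which $\phi(r) = r + r^2/2$ satisfies $d\phi/dr = n(r)$ exactly and the rays are radial; it then checks $|\nabla\phi| = n$ by centered differences.

```python
import math

# Assumed radially symmetric medium: n(r) = 1 + r, with eikonal
# phi(r) = r + r^2/2, so that d(phi)/dr = n(r).
def n(x, y):
    r = math.hypot(x, y)
    return 1.0 + r

def phi(x, y):
    r = math.hypot(x, y)
    return r + 0.5 * r * r

# Check |grad phi| = n at a sample point by centered differences.
x, y, h = 0.6, 0.8, 1e-6
gx = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
gy = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
print(math.hypot(gx, gy), n(x, y))   # both ~ 2.0 at r = 1
```

The gradient also points radially outward, as the next subsection's normality argument requires.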
10.1.3. Determination of Rays from Wavefronts

Rays and wavefronts are related as shown in Fig. 10.1.4. Any displacement $d\mathbf r$ lying in a surface $\phi(\mathbf r) = \text{constant}$ satisfies
$$d\phi = \nabla\phi\cdot d\mathbf r = 0.\qquad(10.1.12)$$
FIGURE 10.1.4. Geometry relating a ray to the wavefronts it crosses.
This shows that the vector $\nabla\phi$ is orthogonal to the surface of constant $\phi$ (which is why it is called the "gradient": $\phi(\mathbf r)$ varies most rapidly in that direction). From Eq. (10.1.11) we then obtain the result that $\hat{\mathbf k}(\mathbf r)$ is locally orthogonal to a surface of constant $\phi(\mathbf r)$. "Wavefronts" are, by definition, surfaces of constant $\phi(\mathbf r)$, and rays are directed locally along $\hat{\mathbf k}(\mathbf r)$.¹ It has been shown then that "rays" are curves that are everywhere normal to wavefronts. If the displacement $d\mathbf r$ lies along the ray, and ds is its length, then $d\mathbf r/ds$ is a unit vector and hence
$$\hat{\mathbf k} = \frac{d\mathbf r}{ds}.\qquad(10.1.13)$$
Combining Eqs. (10.1.11) and (10.1.13), we obtain a differential equation for the ray,
$$\frac{d\mathbf r}{ds} = \frac{1}{n}\,\nabla\phi.\qquad(10.1.14)$$
10.1.4. The Ray Equation in Geometric Optics
Equation (10.1.14) is a hybrid equation containing two unknown functions, $\mathbf r(s)$ and $\phi(\mathbf r)$, and as such is only useful if the wavefront function $\phi(\mathbf r)$ is already known. But we can convert it into a differential equation for $\mathbf r(s)$ alone. Expressing Eq. (10.1.14) in component form, differentiating it, and then resubstituting from it yields
$$\frac{d}{ds}\left(n\,\frac{dx_i}{ds}\right) = \frac{d}{ds}\frac{\partial\phi}{\partial x_i} = \left(\frac{d\mathbf r}{ds}\cdot\nabla\right)\frac{\partial\phi}{\partial x_i} = \frac{1}{2n}\,\frac{\partial}{\partial x_i}|\nabla\phi|^2.\qquad(10.1.15)$$

¹The unit vectors $\hat{\mathbf k}(\mathbf r)$, defined throughout some region and having the property that smooth rays are to be drawn everywhere tangent to them, form a good picture to keep in mind when contemplating the "vector fields" of geometric mechanics. In the jargon of "dynamical systems," the entire pattern of rays is known as a "flow." Another mathematical expression for them is a "congruence" of rays. Though perfectly natural in optics, such a congruence may seem artificial in mechanics, but it may be the single most important concept differentiating between the dynamical systems approach and Newtonian mechanics.
The final expression can be reexpressed using Eq. (10.1.10):
$$\frac{1}{2n}\,\frac{\partial}{\partial x_i}|\nabla\phi|^2 = \frac{1}{2n}\,\frac{\partial n^2}{\partial x_i} = \frac{\partial n}{\partial x_i}.\qquad(10.1.16)$$
Combining results yields the vector equation
$$\frac{d}{ds}\left(n\,\frac{d\mathbf r}{ds}\right) = \nabla n.\qquad(10.1.17)$$
This is the "ray equation." A second-order, ordinary differential equation, it is the analog for "light trajectories" of Newton's equation for a point particle. In this analogy, arc length s plays the role of time, and the index of refraction n(r) is somewhat analogous to the potential energy function U(r). The analogy will be made more precise shortly. All of the geometric optics of refraction of light in the presence of variable optical media, such as lenses, can be based on the ray equation.
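The ray equation can also be integrated numerically. The sketch below steps the first-order system $d\mathbf r/ds = \mathbf u/n$, $d\mathbf u/ds = \nabla n$ (with $\mathbf u \equiv n\,d\mathbf r/ds$) through a quadratic-index medium of the kind in Problem 10.1.1 below, using assumed values of $n_0$ and B, and compares the result against the paraxial sinusoidal solution.

```python
import math

# Integrate d/ds (n dr/ds) = grad n in a "lens-like" medium
# n(x) = n0 (1 + B x^2); n0 and B are assumed sample values (B < 0
# gives a guiding fiber).  State: (x, z) and u = n * dr/ds.
n0, B = 1.5, -0.005

def n(x):
    return n0 * (1.0 + B * x * x)

def dndx(x):
    return 2.0 * n0 * B * x

x, z = 1.0, 0.0          # launched parallel to the axis at x0 = 1
ux, uz = 0.0, n(x)       # u = n k-hat, initially along z

ds = 1e-3
for _ in range(20000):   # integrate to arc length s = 20
    nn = n(x)
    x += ds * ux / nn    # dr/ds = u / n
    z += ds * uz / nn
    ux += ds * dndx(x)   # du/ds = grad n (no z-dependence here)

# Paraxial prediction: x(z) ~ x0 * cos(sqrt(-2B) * z)
print(x, math.cos(math.sqrt(-2.0 * B) * z))
```

For this weakly focusing medium the numerically integrated ray tracks the paraxial cosine to within about a percent, the residual coming from non-paraxial corrections of order $Bx^2$.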
Problem 10.1.1: (a) Light Rays in a Lens-Like Medium. Consider paraxial rays, that is, rays almost parallel to the z-axis, in a medium, shown in Fig. 10.1.5, for which the index of refraction is a quadratic function of the transverse distance from the axis,
$$n(x, y) = n_0(1 + B r^2),$$
where $r^2 = x^2 + y^2$ and B is a constant. Given initial values $(x_0, y_0)$ and initial slopes $(x_0', y_0')$ at the plane z = 0, using Eq. (10.1.17) find the space curve followed by the light ray. See any book about fiber optics for discussion of applications of such media.

(b) Next suppose the coefficient B in part (a) depends on z (arbitrarily, though consistently with the short-wavelength approximation (10.1.6)) but that x and y
FIGURE 10.1.5. A ray of light being guided by a graded fiber having axially symmetric index of refraction n. Darker shading corresponds to lower index of refraction.
can be approximated for small r as in part (a). In that case the "linearized" ray equation becomes
$$x' = \frac{p}{n(z)}, \qquad p' = 2\,n(z)\,B(z)\,x,$$
where $p(z) = n(z)(dx/dz)$, prime stands for d/dz, and there is a similar equation for y. Consider any two (independent) solutions $x_1(z)$ and $x_2(z)$ of this equation. For example, $x_1(z) \equiv C(z)$ can be the "cosine-like" solution with C(0) = 1, C′(0) = 0, and $x_2(z) \equiv S(z)$ the "sine-like" solution with S(0) = 0, n(0)S′(0) = 1. Show that propagation of any solution from $z = z_0$ to $z = z_1$ can be described by a matrix equation
$$\begin{pmatrix} x(z_1) \\ p(z_1) \end{pmatrix} = M \begin{pmatrix} x(z_0) \\ p(z_0) \end{pmatrix},$$
where M is a 2 × 2 matrix called the "transfer matrix." Identify the matrix elements of M with the cosine-like and sine-like solutions. Show also, for sufficiently small values of r, that the expression obtained from two separate rays,
$$x_1(z)\,p_2(z) - x_2(z)\,p_1(z),$$
is conserved as z varies. Finally, use this result to show that det |M| = 1. The analog of this result in mechanics is Liouville's theorem. In the context of optics it would not be difficult to make a more general proof after removing assumptions made in introducing this problem.

Problem 10.1.2: Consider an optical medium with spherical symmetry (e.g., the earth's atmosphere) such that the index of refraction n(r) is a function only of distance r from the center. Let d be the perpendicular distance from the center of the sphere to any tangent to the ray. Show that the product nd is conserved along the ray. This is an analog of the conservation of angular momentum.

10.1.5. Variation of Light Intensity along a Ray

It is not difficult to include a bit of physical optics in this formulation, namely the variation of light intensity along a light ray. While not leading to a formula as directly analogous to mechanics as the ray equation, studying the evolution of light intensity encourages one to pay attention not just to one moving particle but to all nearby trajectories and to the evolution of their local density. Is one correct in anticipating that the intensity will decrease as the rays diverge? The mechanics addressing questions like this leads to Liouville's theorem and an entire "dynamical systems" approach. Visualize the trajectory described so far as one ray in a steady, transversely extended beam of light. The quantity needed to describe the light intensity I is the Poynting vector $\mathbf S = I\,\hat{\mathbf k} = I\,\nabla\phi/n$, where, as before, $\hat{\mathbf k}$ is a unit vector pointing along the ray, and the new physics needed is conservation of energy. The energy
crossing directed area $d\mathbf A$ per unit time is $\mathbf S\cdot d\mathbf A$, and energy conservation is expressed by the continuity equation
$$0 = \nabla\cdot\mathbf S = \nabla\cdot\left(\frac{I\,\nabla\phi}{n}\right),\qquad(10.1.18)$$
where Eq. (10.1.14) has been used, and there is no time derivative term because the beam is steady. This equation expresses the condition that there are no local sources or sinks of energy. Fig. 10.1.6 shows a sample of rays and wavefronts belonging to the steady waves; there is one ray and one wavefront through each point. It is important to distinguish between rays and general curves in the region under study. One such general curve C joins the points P₁ and P₂ in the figure. As shown in Eq. (10.1.14), because $\nabla\phi$ is directed along a ray, the operator $\nabla\phi\cdot\nabla = n\,(d/ds)$ is a directional derivative for displacements along a ray; with s being arc length, this operator is tailor-made for use with the ray parameterization $\mathbf r(s)$. Performing the differentiation indicated in Eq. (10.1.18) and solving for $\nabla^2\phi$ yields
$$\nabla^2\phi = -n\,\frac{d}{ds}\ln\frac{I}{n}.\qquad(10.1.19)$$
This expression allows $\nabla^2\phi/n$ to be integrated, but only along a ray, to obtain the variation of I:
$$I(s) = I(s_1)\,\frac{n(s)}{n(s_1)}\,\exp\left(-\int_{s_1}^{s}\frac{\nabla^2\phi}{n}\,ds'\right).\qquad(10.1.20)$$
FIGURE 10.1.6. A sample of rays and wavefronts belonging to a steady scalar "light" wave, and also a non-ray integration path along which the Lagrange integral invariant is being calculated. There is one ray and one wavefront through each point.
Taking ratios between two points, at longitudinal positions $s_1$ and $s_2$, yields
$$\frac{I(s_2)}{I(s_1)} = \frac{n(s_2)}{n(s_1)}\,\exp\left(-\int_{s_1}^{s_2}\frac{\nabla^2\phi}{n}\,ds\right).\qquad(10.1.21)$$
This result is significant in that, using only ray analysis, it governs a property (intensity) that might have seemed to need a wave calculation.
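Equation (10.1.21) can be spot-checked against a familiar case: for a spherical wave in a medium of constant n, $\phi = nr$ gives $\nabla^2\phi = 2n/r$, and the intensity ratio should reduce to the inverse-square law. The numbers in the sketch below are arbitrary.

```python
import math

# For a spherical wave in constant n, phi = n*r and laplacian(phi) = 2n/r.
# The intensity formula then predicts I(r2)/I(r1) = (r1/r2)^2.
n = 1.5
r1, r2 = 1.0, 4.0

# integrate laplacian(phi)/n = 2/r along the (radial) ray, midpoint rule
steps = 200000
dr = (r2 - r1) / steps
integral = sum(2.0 / (r1 + (i + 0.5) * dr) for i in range(steps)) * dr

ratio = math.exp(-integral)
print(ratio, (r1 / r2) ** 2)    # both ~ 1/16
```

The integral evaluates to $2\ln(r_2/r_1)$, so the exponential collapses to exactly $(r_1/r_2)^2$, as geometric spreading of the rays demands.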
Example 10.1.1: In the special case of a plane wave propagating in a region of constant n, one has $\nabla^2\phi = 0$ and the intensity is constant along a ray.

10.2. VARIATIONAL PRINCIPLES
10.2.1. The Lagrange Integral Invariant and Snell's Law

In this section, we will work with a "congruence of curves," which is the name given to families of curves, like the rays just encountered, that accompany a definite, single-valued wavefront function $\phi$; it is a family of nonintersecting smooth curves that "fill" the region of space under study, one and only one curve passing through each point. Returning to Eq. (10.1.14), it can be seen that the quantity $n\hat{\mathbf k}$ is the gradient of a scalar function, namely $\phi$. This causes the field $n\hat{\mathbf k}$ to be related to $\phi$ in the same way that an electric field is related to an electric potential, and similar "conservation" properties follow. We can define an invariant integral,²
$$\mathrm{L.I.I.}(C) = \int_{P_1}^{P_2} n\,\hat{\mathbf k}\cdot d\mathbf r.\qquad(10.2.1)$$
This so-called "Lagrange invariant integral" takes the same value for any path C joining P₁ and P₂, points that are not necessarily on the same ray. One such path is shown in Fig. 10.1.6. As the integration path crosses any particular ray, such as at point P in the figure, the unit vector $\hat{\mathbf k}$ corresponding to that ray is used in the integrand. Hence, though the integral is independent of path, it depends on the particular steady-wave field under discussion. The same result can be expressed as the vanishing of an integral around any closed path,
$$\oint n\,\hat{\mathbf k}\cdot d\mathbf r = 0,\qquad(10.2.2)$$
or as the vector equation³
$$\nabla\times\left(n\,\hat{\mathbf k}\right) = \mathbf 0.\qquad(10.2.3)$$

²In the context of mechanics an analogous integral is called the Poincaré-Cartan integral invariant.
³A possibility that is easily overlooked and is not being addressed satisfactorily here, but that will actually become important later on, is the case where $n\hat{\mathbf k}$ is the gradient of a scalar function everywhere except at a single point at which it diverges. In that case the integral for curves enclosing the divergent point, though independent of path, may not vanish. This problem will be faced in a later chapter.
FIGURE 10.2.1. Refraction of a ray as it passes from a medium with index of refraction n₁ to a medium with index of refraction n₂. The Lagrange invariant integral is evaluated over the closed, broken line.
Example 10.2.1: Snell's Law. Consider the situation illustrated in Fig. 10.2.1, with light leaving a medium with index n₁ and entering a medium with index n₂. It is assumed that the transition is abrupt compared to any microscopic curvature of the interface but gradual compared to the wavelength of the light. (The latter assumption should not really be necessary, but it will save us from getting distracted.) An actual ray is illustrated in the figure, and the Lagrange invariant can be evaluated along the broken line shown. The result, obtained by neglecting the short end segments (as one has probably done before in electricity and magnetism), is
$$n_1\sin\theta_1 = n_2\sin\theta_2,\qquad(10.2.4)$$
which is, of course, Snell’s law.
10.2.2. The Principle of Least Time

The optical path length of an interval $d\mathbf r$ has previously been defined to be $n|d\mathbf r| = n\,ds$. The optical path length O.P.L.(C) of the same curve C illustrated in Fig. 10.1.6 and used for the Lagrange invariant integral in Eq. (10.2.1) is
$$\mathrm{O.P.L.}(C) = \int_{P_1}^{P_2} n\,|d\mathbf r|;\qquad(10.2.5)$$
this need not be a path light actually follows, but if a photon did travel that path the time taken would be O.P.L.(C)/c, because
$$\frac{n\,ds}{c} = \frac{ds}{c/n} = dt.\qquad(10.2.6)$$
If the path taken is an actual ray R, which is only possible if P₁ and P₂ lie on the same ray R, as in Fig. 10.2.2, then the optical path length is
FIGURE 10.2.2. Ray and non-ray curves used in the derivation of the principle of least time. Note that $\hat{\mathbf k}\cdot d\mathbf r/ds = \cos\theta < 1$.
$$\mathrm{O.P.L.}(R) = \int_R n\,\hat{\mathbf k}\cdot d\mathbf r = \int_R \nabla\phi\cdot d\mathbf r = \phi(P_2) - \phi(P_1).\qquad(10.2.7)$$
We can calculate both the L.I.I. and the O.P.L., both for the ray joining P₁ and P₂ and for the non-ray C. The following series of inferences are each simple to derive:
$$\mathrm{O.P.L.}(R) = \mathrm{L.I.I.}(R),\qquad(10.2.8a)$$
$$\mathrm{L.I.I.}(R) = \mathrm{L.I.I.}(C),\qquad(10.2.8b)$$
$$\mathrm{L.I.I.}(C) < \mathrm{O.P.L.}(C),\qquad(10.2.8c)$$
and hence
$$\mathrm{O.P.L.}(R) < \mathrm{O.P.L.}(C).\qquad(10.2.8d)$$
Eq. (10.2.8a) is the same as Eq. (10.2.7). Eq. (10.2.8b) follows because the L.I.I. is in fact invariant. Eq. (10.2.8c) follows because the integrands of L.I.I. and O.P.L. differ only by a factor $\cos\theta$, where $\theta$ is the angle between curve C and the particular (different) ray R′ passing through the point P. Since $\cos\theta < 1$, the inequality follows. The conclusion is known as Fermat's Principle. Spelled out more explicitly for ray R and non-ray C joining the same points, it is
$$\int_R n\,ds < \int_C n\,ds.\qquad(10.2.9)$$
Except for a factor c, the optical path length is the same as the time a "photon" would take traveling along the ray, and for that reason the condition is also known as the Principle of Least Time: light gets from point P₁ to point P₂ by that path which takes the least time. (Under some conditions this becomes an extremum condition, and not necessarily a minimum.)
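The principle of least time can be tested directly by one-dimensional minimization. The sketch below (with invented geometry and indices, not values from the text) minimizes the optical path length of a two-segment path over the interface crossing point and confirms that the minimizer satisfies Snell's law, Eq. (10.2.4).

```python
import math

# Fermat's principle as a 1-D minimization: light goes from P1 = (0, 1)
# in medium n1 to P2 = (1, -1) in medium n2, crossing the interface
# y = 0 at (x, 0).  Assumed geometry and indices.
n1, n2 = 1.0, 1.5

def opl(x):
    # optical path length of the two straight segments
    return n1 * math.hypot(x, 1.0) + n2 * math.hypot(1.0 - x, 1.0)

# opl is convex in x, so a simple ternary search on [0, 1] suffices
a, b = 0.0, 1.0
for _ in range(200):
    m1, m2 = a + (b - a) / 3, b - (b - a) / 3
    if opl(m1) < opl(m2):
        b = m2
    else:
        a = m1
x = 0.5 * (a + b)

sin1 = x / math.hypot(x, 1.0)
sin2 = (1.0 - x) / math.hypot(1.0 - x, 1.0)
print(n1 * sin1, n2 * sin2)   # equal at the Fermat point
```

The stationarity condition of the travel time with respect to the crossing point is precisely $n_1\sin\theta_1 = n_2\sin\theta_2$, so the two printed values agree.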
10.3. PARAXIAL OPTICS, GAUSSIAN OPTICS, MATRIX OPTICS

In this section we will consider a light beam traveling almost parallel to and close to an axis, call it the z-axis. These conditions constitute the paraxial conditions. This setup lends itself to linearization and discussion using matrices, and is also known as Gaussian optics. For full generality the refractive index n(x, y, z) depends on transverse coordinates x and y as well as on longitudinal coordinate z. But since the purpose of this section is only to illustrate Hamilton's original line of thought, we will consider 1-D optics only and assume that n = n(x, z) is independent of y. Consider Fig. 10.3.1, in which a ray R propagates from input plane $z_1$ in a region with index of refraction $n_1$ to output plane $z_2$ in a region with index of refraction $n_2$. In the paraxial approximation, the intervals, such as from z to z′, are neglected (except as they contribute phase advance). The two outgoing rays are shown separate to illustrate the sort of effect that is to be neglected. Also, points such as z, z′, and z″ are to be treated as essentially the same point. The fact that this is manifestly not the case reflects the fact that, to support this discussion, the figure illustrates a case somewhat beyond where the approximation would normally be judged appropriate. As the ray R passes $z_1$, its displacement is $x_1$ and its angle is $\theta_1$, but, for reasons to be explained later, the quantity $p_1 = n_1\theta_1$ will be used as the coordinate fixing the slope of the straight line.⁴ Sometimes p may even be called a "momentum" in anticipation of later results. Curve C is a tentative trajectory, close to R but not, in fact, a physically possible ray. The quantity to be emphasized is $\phi(z_1, z_2)$, the optical path length from the input plane at $z_1$ to the output plane at $z_2$.
FIGURE 10.3.1. Paraxial ray R propagates from z1 in a region with index of refraction n1 to z2 in a region with index of refraction n2. A spherical surface of radius R separates the two regions. Curve C is tentatively, but in fact not, a ray.
4 The ray equation, Eq. (10.1.17), suggests that the variable n sin θ is preferable to nθ, but the distinction will not be pursued here.
HAMILTONIAN TREATMENT OF GEOMETRIC OPTICS
All important points will be made in the following series of problems, adapted from Guillemin and Sternberg [1].
Problem 10.3.1: Referring to Fig. 10.3.1, the ray R leads from the input plane at z1 to the output plane at z2. With coordinate p = nθ defined as above, show that propagation from z1 to z is described by

$$\begin{pmatrix} x \\ p \end{pmatrix} = \begin{pmatrix} 1 & (z-z_1)/n_1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ p_1 \end{pmatrix}. \tag{10.3.1}$$
Using Snell's law, the approximation x' ≈ x, and related approximations, show that propagation from z to z' of ray R is described by

$$\begin{pmatrix} x' \\ p' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -(n_2-n_1)/R & 1 \end{pmatrix} \begin{pmatrix} x \\ p \end{pmatrix}. \tag{10.3.2}$$
The point z' is to be interpreted as z+, just to the right of the interface. The 2 × 2 matrices in the previous parts are called "transfer matrices." Find the transfer matrix M⁽²¹⁾ for propagation from z1 to z2 as defined by

$$\begin{pmatrix} x_2 \\ p_2 \end{pmatrix} = M^{(21)} \begin{pmatrix} x_1 \\ p_1 \end{pmatrix} \equiv \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} x_1 \\ p_1 \end{pmatrix}. \tag{10.3.3}$$

Confirm that det M⁽²¹⁾ = AD − BC = 1.⁵ Suppose, instead of regarding this as an initial-value problem for which the output variables are to be predicted, one wishes to formulate it as a boundary-value problem in which the input and output displacements are known but the input and output slopes are to be determined. Show that p1 = (x2 − A x1)/B and p2 = (D x2 − x1)/B can be used to obtain p1 and p2 from x1 and x2. Under what circumstance will this fail, and what is the optical interpretation of this failure?
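The concatenation of drift and refraction matrices in this problem is easy to check numerically. The following sketch (the numerical values of the indices, radius, and drift lengths are illustrative assumptions, not taken from the figure) builds a composite transfer matrix, confirms det M = AD − BC = 1, and recovers the end slopes from the end displacements as in the boundary-value formulation:

```python
import numpy as np

# Paraxial transfer matrices acting on (x, p) with p = n*theta.
# Numerical values (n1, n2, R, drift lengths) are illustrative assumptions.
n1, n2, R = 1.0, 1.5, 0.2          # indices and surface radius (meters)
d1, d2 = 0.1, 0.15                  # drifts before and after the surface

def drift(d, n):
    """Free propagation through distance d in index n: x -> x + d*p/n."""
    return np.array([[1.0, d / n],
                     [0.0, 1.0]])

def refract(n1, n2, R):
    """Thin refraction at a spherical surface: p -> p - (n2 - n1)/R * x."""
    return np.array([[1.0, 0.0],
                     [-(n2 - n1) / R, 1.0]])

# Concatenate right-to-left, as in Eq. (10.3.3).
M = drift(d2, n2) @ refract(n1, n2, R) @ drift(d1, n1)
A, B, C, D = M[0, 0], M[0, 1], M[1, 0], M[1, 1]

# det M = AD - BC = 1 for any physically realizable 1-D transfer matrix.
print("det M =", np.linalg.det(M))

# Boundary-value form: recover the end slopes from the end displacements.
x1, p1 = 0.01, 0.02
x2, p2 = M @ np.array([x1, p1])
p1_recovered = (x2 - A * x1) / B         # fails when B = 0 (imaging planes)
p2_recovered = (D * x2 - x1) / B
```

The recovery of p1 and p2 fails exactly when B = 0, which is the imaging condition: then all rays leaving x1 arrive at the same x2 regardless of slope.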
Problem 10.3.2: You are to calculate the optical path lengths for the three path segments of tentative ray C. (a) Show that the optical path length of the segment of C from z1 to z is given approximately by

$$\phi(z_1, z) = n_1 (z - z_1) + \frac{n_1}{2}\,\frac{(x'' - x_1)^2}{z - z_1}. \tag{10.3.4}$$

You are to treat z, z', and z'' as being essentially the same point. As drawn in the figure, this optical path length is somewhat longer than the corresponding segment of the true ray R. Defining p1'' = n1 (x'' − x1)/(z − z1) as above,

5 For the simple 1-D case under study, the condition det M = 1 is necessary and sufficient for M to be a physically realizable transfer matrix. For higher dimensionality, it is necessary but not sufficient.
show that φ(z1, z) also can be written as

$$\phi(z_1, z) = n_1 (z - z_1) + \tfrac{1}{2}\, p_1'' \,(x'' - x_1). \tag{10.3.5}$$
(b) There is one phase advance for which the separation of z, z', and z'' cannot be neglected. The tiny segment of path from the transverse plane at z to the spherical surface occurs in a region with index n1 while, on-axis, this region has index n2. Show that this discrepancy can be accounted for in the optical path length of C by

$$\phi(z, z'') = -\frac{1}{2}\,\frac{n_2 - n_1}{R}\, x''^2. \tag{10.3.6}$$
Defining p2'' = n2 (x2 − x'')/(z2 − z), show that φ(z, z'') can also be written as

$$\phi(z, z'') = \tfrac{1}{2}\,(p_2'' - p_1'')\, x''. \tag{10.3.7}$$

(c) Now that the gap region has been accounted for, the rest of the optical path length can be calculated as in part (a), and the complete optical path length of C is

$$\phi(z_1, z_2) = n_1(z - z_1) + n_2(z_2 - z) + \frac{1}{2}\left[\, n_1 \frac{(x'' - x_1)^2}{z - z_1} - \frac{n_2 - n_1}{R}\, x''^2 + n_2 \frac{(x_2 - x'')^2}{z_2 - z} \,\right]. \tag{10.3.8}$$

(d) By differentiating Eq. (10.3.8) with respect to x'', regarded as its only variable, express the condition that the optical path length of C be an extremum. Show that this condition, reexpressed in terms of p1'' and p2'', is

$$p_2'' - p_1'' = -\frac{n_2 - n_1}{R}\, x'', \tag{10.3.9}$$

which is the same condition as was obtained using Snell's law in part (b) of the previous problem.
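Part (d) can also be checked numerically: extremizing the optical path length (10.3.8) over x'' should reproduce the paraxial Snell condition (10.3.9). In this sketch the indices, radius, and plane positions are illustrative assumptions:

```python
import numpy as np

# Illustrative values (all lengths in meters); z1 < z < z2.
n1, n2, R = 1.0, 1.5, 0.2
z1, z, z2 = 0.0, 0.1, 0.25
x1, x2 = 0.005, -0.003

def phi(xpp):
    """Optical path length of the tentative ray C, Eq. (10.3.8)."""
    return (n1 * (z - z1) + n2 * (z2 - z)
            + 0.5 * (n1 * (xpp - x1) ** 2 / (z - z1)
                     - (n2 - n1) / R * xpp ** 2
                     + n2 * (x2 - xpp) ** 2 / (z2 - z)))

# Extremize over x'' on a fine grid (a crude stand-in for d(phi)/dx'' = 0).
grid = np.linspace(-0.02, 0.02, 200001)
xpp_star = grid[np.argmin(phi(grid))]

# Slopes on either side of the surface at the extremal point.
p1pp = n1 * (xpp_star - x1) / (z - z1)
p2pp = n2 * (x2 - xpp_star) / (z2 - z)

# Eq. (10.3.9): p2'' - p1'' = -(n2 - n1)/R * x''  (paraxial Snell's law)
residual = p2pp - p1pp + (n2 - n1) / R * xpp_star
print("x'' =", xpp_star, " Snell residual =", residual)
```

With these values the quadratic form (10.3.8) has positive curvature in x'', so the extremum found by the grid search is in fact a minimum, and the residual of (10.3.9) vanishes to within the grid resolution.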
Problem 10.3.3: Eqs. (10.3.5) and (10.3.7) have been written to facilitate concatenation, or the successive accumulation of the effects of consecutive segments. A similar expression can be written for the third segment. (a) Using this remark and reverting to the case of true ray R, show that φ(z1, z2) can also be written as

$$\phi(z_1, z_2) = \text{constant} + \tfrac{1}{2}\,(p_2 x_2 - p_1 x_1), \tag{10.3.10}$$
where "constant" means independent of x and p. The motivation behind this form is that it can next be transformed to depend only on displacements at the endpoints and not on the "momenta." (b) When expressed entirely in terms of x1 and x2, φ(z1, z2) is known as "Hamilton's point characteristic," W(x1, x2). Substituting from part (c) of the first problem of this series, show that

$$W(x_1, x_2) = \text{constant} + \frac{A x_1^2 + D x_2^2 - 2 x_1 x_2}{2B}. \tag{10.3.11}$$

Repeating it for emphasis, W(x1, x2) is the optical path length expressed in terms of x1 and x2. As such it has an intuitively simple qualitative content, whether or not it is analytically calculable. Show finally that

$$p_1 = -\frac{\partial W}{\partial x_1}, \qquad p_2 = \frac{\partial W}{\partial x_2}. \tag{10.3.12}$$

These are "Hamilton's equations" in embryonic form. The reason Hamilton considered W(x1, x2) valuable was that it could be obtained from φ(z1, z2), which was obtainable from the simple geometric constructions of the previous two problems.
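Eq. (10.3.12) can be verified numerically for any transfer matrix with AD − BC = 1. The sketch below (the values of A, B, D are illustrative, arbitrarily chosen assumptions) propagates a ray with the matrix and compares p1 and p2 against finite-difference derivatives of the point characteristic W:

```python
import numpy as np

# A sample transfer matrix with det = 1 (values are illustrative).
A, B, D = 0.8, 0.12, 0.9
C = (A * D - 1.0) / B          # forces AD - BC = 1

def W(x1, x2):
    """Hamilton's point characteristic, Eq. (10.3.11), dropping the constant."""
    return (A * x1 ** 2 + D * x2 ** 2 - 2.0 * x1 * x2) / (2.0 * B)

# Propagate a ray with the transfer matrix.
x1, p1 = 0.01, -0.005
x2 = A * x1 + B * p1
p2 = C * x1 + D * p1

# "Hamilton's equations" in embryonic form, Eq. (10.3.12),
# checked by central finite differences (exact here, since W is quadratic).
h = 1e-6
dW_dx1 = (W(x1 + h, x2) - W(x1 - h, x2)) / (2 * h)
dW_dx2 = (W(x1, x2 + h) - W(x1, x2 - h)) / (2 * h)
print("p1 =", p1, " -dW/dx1 =", -dW_dx1)
print("p2 =", p2, "  dW/dx2 =", dW_dx2)
```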
10.4. HUYGENS’ PRINCIPLE
Because rays of light respect the principle of least time, there is a construction due to Huygens by which their trajectories can be constructed. (This construction is also much used in heuristic analyses of diffraction patterns, for example due to multiple thin slits.) This principle is illustrated in Fig. 10.4.1. In this case it is assumed that the medium in which the light is traveling is anisotropic in such a way that light travels twice as fast in the y direction as in the x direction. (None of the formulas appearing earlier in this chapter apply to this case since the most general possibility to this point has been the variability with position of the index of refraction.) As well as making it easier to draw a figure illustrating Huygens' principle, this anisotropic possibility more nearly represents the degree of complexity encountered when one proceeds from optics to mechanics in the next chapter. It is not really obvious that a medium such as this is possible or, if it is, that the principle of least time is actually valid, but we accept both to be true. The little ellipses in the figure indicate the progress that rays starting from their centers would make in one unit of time. Recall that, by definition, a "wavefront" is a surface of constant φ, where φ is the "optical path length," which is the flight time to that point along the trajectory taken by the light. The Huygens construction centers numerous little ellipses on one wavefront and calls the envelope they define the "next" wavefront. A few of the centers are marked by heavy dots, two on each of three wavefronts, or rather on the same wavefront photographed at equal time intervals.
FIGURE 10.4.1. Rays and wavefronts in a medium in which the velocity of propagation of light traveling in the y direction is twice as great as for light traveling in the x direction. Three “snapshots” of a wavefront taken at equal time intervals are shown.
The emphasized points, such as P, Q, and R, are chosen so that the point on the next curve is at the point of tangency of the little ellipse and the next envelope. If light actually travels along this path it gets to P at time 3, to Q at time 4, and to R at time 5, where "time" is the value of φ. Huygens' principle declares that the light actually does travel along the curve PQR constructed as has been described. (In this case the rays are not normal to the wavefronts.) To make this persuasive, a spurious wavefront supposedly appropriate for rays emerging from the point Q is drawn with a heavy broken line. This spurious "wavefront" has the property that light leaving point Q gets to it in 1 unit of time. We consider the path PQ as the correct path taken by a ray under consideration. If the spurious wavefront were correct, the ray would proceed from Q to point R' along the faint broken line. This means the light will have followed a path PQR' with a kink at point Q. But in that case, light could have got to R' even more quickly by a path not including Q. Since this contradicts our hypothesis, the ray must actually proceed to point R. In other words, any tentative "next" wavefront is spurious unless it is tangential to the envelope of little ellipses centered on the earlier wavefront. As already noted, because of the anisotropy of the medium, the light rays are not normal to the wavefronts of constant φ. On the other hand, the vector
$$\mathbf{p} = \nabla \phi \tag{10.4.1}$$
is normal to these wavefronts. This quantity has been called p in anticipation of a similar formula that will be encountered in mechanics; the momentum p will satisfy an analogous equation. Letting ẋ be the velocity vector of a "photon" following a ray, we have seen graphically that ẋ and p are not generally parallel. More striking yet, as Fig. 10.4.1 shows, their magnitudes are, roughly speaking, inverse. This is shown by the lengths of the vectors labeled ∇φ; they are large when directed along the "slow" axis. The gradient is greatest in the direction in which the ray speed is least. For this reason, Hamilton himself called p the vector of normal slowness to the front. Quite apart from the fact that this phrase grates on the ear, the picture it conjures up may be misleading because of its close identification of φ with time of flight. Whether or not one is willing to visualize a ray as the trajectory followed by a "material" photon, one must not think of a wavefront as derived from this picture. In preparation for going on to mechanics, it is better to suppress the interpretation of φ as flight time and to concentrate on the definition of the surface φ = constant as a surface of constant phase of some wave. Anyone who has understood wave propagation in an electromagnetic waveguide will have little trouble grasping this point. In a waveguide the "rays" (traveling at the speed of light) reflect back and forth at an angle off the sides of the guide, but surfaces of constant phase are square with the waveguide. Furthermore, the speed with which these fronts advance (the phase velocity) differs from the speed of light; in this case it is less. The "rays" are therefore able to keep in step since they are traveling at an angle. (Another velocity that enters in this context is the group velocity, but what has been discussed in this section bears at most weakly on that topic.)
What has been intended is to make the point that p as defined in Eq. (10.4.1) need not be proportional, either in magnitude or direction, to the velocity of a "particle" along its trajectory.
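This last point can be illustrated numerically for the anisotropic medium of Fig. 10.4.1. In the sketch below (the wavefront tilt angle and the propagation speeds are illustrative assumptions), the Huygens tangency point is found by maximizing the advance of the unit-time ellipse in the direction of the wavefront normal; the resulting ray direction is visibly not parallel to the normal:

```python
import numpy as np

# Anisotropic medium as in Fig. 10.4.1: light is twice as fast along y as along x.
vx, vy = 1.0, 2.0

# Unit normal of a tilted plane wavefront (an illustrative choice).
theta = np.deg2rad(30.0)
n = np.array([np.cos(theta), np.sin(theta)])

# Huygens: after unit time a point source fills the ellipse
# (x/vx)^2 + (y/vy)^2 <= 1.  The new front is tangent to these ellipses;
# find the tangency point by maximizing n . r over the ellipse boundary.
t = np.linspace(0.0, 2 * np.pi, 400001)
boundary = np.column_stack([vx * np.cos(t), vy * np.sin(t)])
ray = boundary[np.argmax(boundary @ n)]     # center-to-tangency = ray advance

# Front advance distance (support function) and closed-form tangency point.
h = np.hypot(vx * n[0], vy * n[1])
ray_exact = np.array([vx ** 2 * n[0], vy ** 2 * n[1]]) / h

# The ray is generally NOT parallel to the wavefront normal n.
cross = ray[0] * n[1] - ray[1] * n[0]
print("ray =", ray, " |ray x n| =", abs(cross))
```

The nonzero cross product confirms that the ray direction and the wavefront normal (the direction of p) differ except along the symmetry axes.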
BIBLIOGRAPHY

References

1. V. Guillemin and S. Sternberg, Symplectic Techniques in Physics, Cambridge University Press, Cambridge, UK, 1984.
References for Further Study
Section 10.1

M. Born and E. Wolf, Principles of Optics, 4th ed., Pergamon, Oxford, 1970.
L. D. Landau and E. M. Lifshitz, The Classical Theory of Fields, Pergamon, Oxford, 1971.
11 HAMILTON-JACOBI THEORY

11.1. THE HAMILTON-JACOBI EQUATION

11.1.1. Derivation
To develop mechanics based on its analogy with optics, we work initially in q-only configuration space rather than (q, p) phase space. Because they are both integrals on which variational principles are based, it is natural to regard the action

$$S = \int_{P_1}^{P} L\, dt \tag{11.1.1}$$

as the analog of the eikonal φ. The Lagrange-Poincaré equations were derived in Section 5.3 by applying Hamilton's principle to S. The present discussion will deviate from that primarily by replacing the upper limit by a point P that is variable in the vicinity of P2. For fixed lower limit P1, and any choice of upper limit P = P2 + δP in Eq. (11.1.1), after the extremal path has been found by solving the Lagrange equations, the action S(P1, P) = S(q, t) is a well-defined function of (q, t) = (q2, t2) + (δq, δt), the coordinates of P. Three particular variations are illustrated in Fig. 11.1.1. The variation δS accompanying change δx with δt = 0, as illustrated in Fig. 11.1.1a, can be obtained directly from Eq. (5.3.5), for which an upper-boundary contribution was calculated but then set to zero at that time. The result is δS = (∂L/∂ẋ)δx ≡ p_x δx. For multiple variables, the result is

$$\delta S^{(i)} = p_i\, \delta q^i. \tag{11.1.2}$$
FIGURE 11.1.1. Possible variations of the location of upper end point P for extremal paths from P1 to points P close to P2. The "reference trajectory" is the solid curve.
With the upper position held fixed, δq = 0, but with its time varied by δt as indicated in Fig. 11.1.1b, the change of action is

$$\delta S^{(ii)} = \frac{\partial S}{\partial t}\, \delta t. \tag{11.1.3}$$
The case in which the motion is identical to the reference motion over the original time interval, but is followed for extra time, is illustrated in Fig. 11.1.1c. In this case, the path of integration is unchanged and the dependence of S comes entirely from the path's upper end extension. Differentiating (5.3.5) with respect to the upper limit yields

$$\delta S^{(iii)} = L\, \delta t. \tag{11.1.4}$$
Using Fig. 11.1.1c and combining results,

$$L\, \delta t = \frac{\partial S}{\partial q^i}\, \dot q^i\, \delta t + \frac{\partial S}{\partial t}\, \delta t, \quad \text{or} \quad -\frac{\partial S}{\partial t} = p_i \dot q^i - L. \tag{11.1.5}$$
The final expression p_i q̇^i − L is of course equal to the Hamiltonian H, but one must be careful to specify the arguments unambiguously for the final result to be usable.¹ Solving
$$-\frac{\partial S}{\partial t}(q, t) = H(q, p, t) \tag{11.1.6}$$

is not quite practical since H(q, p, t) depends on p, which itself depends on the motion. This dependency can be eliminated using Eq. (11.1.2) to yield

$$-\frac{\partial S}{\partial t} = H\!\left(q, \frac{\partial S}{\partial q}, t\right). \tag{11.1.7}$$
This is the Hamilton-Jacobi equation. It is a first-order (only first derivatives occur) partial differential equation for the action function S(q, t). It is the analog of the eikonal equation. Momentum variables do not appear explicitly, but for given S(q, t)

1 The reader is assumed to be familiar with the Hamiltonian function but, logically, it is being introduced here for the first time.
the momentum can be inferred immediately from Eq. (11.1.2):

$$p = \frac{\partial S}{\partial q}. \tag{11.1.8}$$
The coordinates q used so far are any valid Lagrangian generalized coordinates. In the common circumstance that they are Euclidean displacements, this equation becomes

$$\mathbf{p} = \nabla S. \tag{11.1.9}$$
This result resembles Eq. (10.1.11), which relates rays to wavefronts in optics.

11.1.2. The Geometric Picture²

We have not yet explained what it means to "solve" the Hamilton-Jacobi equation, or even what good it would do us to have solved it. The latter is easier to explain. A function S(q, t) satisfying the Hamilton-Jacobi equation over all configuration space includes descriptions of the evolution of the system from various (but not all possible) initial conditions. Actual system trajectories are "transverse"³ to "wavefronts" of constant S in the following sense. Eqs. (11.1.2) and (11.1.6) indicate that the dependencies of S near point P are given by
$$dS = p_i\, dq^i - H\, dt. \tag{11.1.10}$$
If we treat q and t together as Cartesian coordinates, then a pair of dynamical variables (A, B) can be said to be "transverse" to (dq, dt) if

$$A_i\, dq^i + B\, dt = 0. \tag{11.1.11}$$
Suppose the displacement (dq^i, dt) lies in a surface of constant S, i.e., dS = 0. Using Eq. (11.1.10) one can then determine that the "vector" (p_i, −H) is transverse to the surface of constant S. Stated more simply, (p_i, −H) is a "generalized gradient" of S:

$$(p_i, -H) = \left( \frac{\partial S}{\partial q^i},\ \frac{\partial S}{\partial t} \right). \tag{11.1.12}$$

This can be regarded as the analog of the hybrid ray equation n(dr/ds) = ∇φ. Since there is now an extra independent variable, t, the geometry is more complicated than it is in optics. If the system configuration is specified by a point in configuration space and if, as is usual, many noninteracting identical systems are indicated by points on the same figure, then the hyperplane of constant t yields a "snapshot" of all systems at that time. (At each point in this space the further specification of momentum p uniquely specifies the subsequent system evolution; another

2 The geometric discussion in this section is not "intrinsic." This means the picture depends on the coordinates being used. An intrinsic description will be provided in Chapter 13.
3 It is the notion that a vector is "transverse" that makes the present discussion nonintrinsic. Inferences drawn from this discussion may therefore be suspect, especially for constrained systems.
snapshot taken infinitesimally later, at time t + dt, would capture this configuration and determine the subsequent evolution.) Consider a curve of constant S in the time t snapshot. By Eq. (11.1.9), momentum p is transverse to that surface. This resembles the relation between rays and wavefronts in optics. However, since system velocity and system momentum are not necessarily proportional, the system velocity q̇ is not necessarily transverse to surfaces of constant S (though it often will be). The important case of a charged particle in an electromagnetic field is an example for which p and q̇ are not parallel; this is discussed in Section 13.1.1.

11.1.3. Constant S Wavefronts

We now develop the qualitative picture of the connection between "wavefronts" and surfaces of constant S shown in Fig. 11.1.2 for a system with just two coordinates (x, y) plus time t. At an initial time t = t0, suppose the function S = S(x, y, t0) is specified over the entire t = t0 plane. One can attempt to solve an initial-value problem with these initial values. Loosely speaking, according to the theorem of Cauchy and Kowalewski, discussed for example in Courant and Hilbert [1], the Hamilton-Jacobi equation (11.1.7) uniquely determines the time evolution of S(x, y, t) as t varies away from t = t0. On the t = t0 plane, the out-of-plane partial derivative ∂S/∂t needed to propagate S away from t = t0 is given by Eq. (11.1.12). In principle, then, S(x, y, t) can be regarded as known throughout the (x, y, t) space when it is known at an initial time and satisfies the Hamilton-Jacobi equation. One can consider a contour of constant S in the t = t0 plane such as
$$C_0: \quad S(x, y, t_0) = S_c = \text{constant}. \tag{11.1.13}$$
As shown in Fig. 11.1.2, there is a surface on which S = S_c, and its intersection with the plane t = t0 is C0. Intersections of the constant S surface with planes of other constant t determine other "wavefronts." The equation S(x, y; t) = S_c can be inverted, t = S⁻¹(x, y; S_c), in order to label curves with corresponding values of t.
FIGURE 11.1.2. Wavefront-like curves in the right figure are intersections with planes of constant t of the surface of constant S shown in the left figure. Trajectories are not in general orthogonal to curves of constant S in the right figure, but (p, −H) is orthogonal to surfaces of constant S on the left.
A solution S(x, y; t; S̄(x, y)) of the Hamilton-Jacobi equation like this, that is able to match an arbitrary initial condition S(x, y, t0) = S̄(x, y), is known as a general integral. It is satisfying to visualize solving the Hamilton-Jacobi equation as an initial-value problem in this way, and using the solution to define wavefronts and hence trajectories. But that is not the way the Hamilton-Jacobi equation has been applied to solve practical problems. Rather there is a formal, operational procedure that makes no use of the geometric picture used in this section. This will be developed next.
11.2. TRAJECTORY DETERMINATION USING THE HAMILTON-JACOBI EQUATION

11.2.1. Complete Integral

In solving practical problems in mechanics, it is not necessary to find the general integral discussed in the previous section. Rather, one starts by finding a "complete integral" of the Hamilton-Jacobi equation. This is a solution containing as many free parameters as there are generalized coordinates of the system. Though there are ways of using such a complete integral to match initial conditions, that is not the profitable approach. Rather, there is an operational way of solving the mechanics problem of interest without ever completing the solution of the Hamilton-Jacobi initial-value problem. Though the method is completely general we will continue to work with just x, y, and t. We seek a solution for x, y, p_x, and p_y as functions of t, satisfying given initial conditions. If we have a solution with four "constants of integration," we can presumably find values for them such that initial values x0, y0, p_x0, and p_y0 are matched.⁴
11.2.2. Finding a Complete Integral by Separation of Variables

The Hamilton-Jacobi equation is

$$\frac{\partial S}{\partial t} + H\!\left(x, y, \frac{\partial S}{\partial x}, \frac{\partial S}{\partial y}, t\right) = 0. \tag{11.2.1}$$

Recall that its domain is (x, y, t)-space, with momentum nowhere in evidence. Assume a variable-separated solution of the form⁵

$$S(x, y, t) = S^{(x)}(x) + S^{(y)}(y) + S^{(t)}(t). \tag{11.2.2}$$
4 Without saying it every time, we assume that mathematical pathologies do not occur. In this case we assume that four equations in four unknowns have a unique solution.
5 Since S is a "phase-like" quantity, its simple behavior is additive, in contrast to a quantity like ψ = e^{iφ}, whose corresponding behavior is multiplicative. This accounts for the surprising appearance of an additive form rather than the multiplicative form appearing when "separation of variables" is applied to the Schrödinger equation in quantum mechanics.
For this to be effective, substitution into Eq. (11.2.1) should cause the Hamilton-Jacobi equation to take the following form:

$$\frac{dS^{(t)}}{dt} + H^{(x)}\!\left(x, \frac{dS^{(x)}}{dx}\right) + H^{(y)}\!\left(y, \frac{dS^{(y)}}{dy}\right) = 0. \tag{11.2.3}$$

By straightforward argument, this form assures the validity of introducing two arbitrary constants α1 and α2 such that

$$\frac{dS^{(t)}}{dt} = \alpha_1, \qquad H^{(x)}\!\left(x, \frac{dS^{(x)}}{dx}\right) = \alpha_2, \qquad H^{(y)}\!\left(y, \frac{dS^{(y)}}{dy}\right) = -\alpha_1 - \alpha_2. \tag{11.2.4}$$
Being first-order ordinary differential equations, each of these supplies an additional constant of integration, but only one of these, call it α, is independent since S^(x)(x), S^(y)(y), and S^(t)(t) are simply added in Eq. (11.2.2). Whether found this way by separation of variables or any other way, suppose then that one has a complete integral of the form
$$S = S(x, y, t; \alpha_1, \alpha_2) + \alpha. \tag{11.2.5}$$
That there is an additive constant α is obvious since the Hamilton-Jacobi equation depends only on first derivatives of S. It is required that α1 and α2 be independent. For example, it would not be acceptable for α2 to be a definite function of α1, even with position- or time-dependent parameters. For reasons to be explained later, the constants α1, α2, ... appearing in such a complete integral are sometimes called "new momenta." For the time being, the symbol α (and later P) will be reserved for these quantities. Before showing how (11.2.5) can be exploited in general, we consider an example.

11.2.3. Hamilton-Jacobi Analysis of Projectile Motion

Consider the motion of a projectile of mass m in a uniform gravitational field for which

$$V = mgy, \qquad L = \frac{m}{2}(\dot x^2 + \dot y^2) - mgy, \qquad H = \frac{1}{2m}(p_x^2 + p_y^2) + mgy, \tag{11.2.6}$$

and for which the Hamilton-Jacobi equation is
$$\frac{\partial S}{\partial t} + \frac{1}{2m}\left(\frac{\partial S}{\partial x}\right)^2 + \frac{1}{2m}\left(\frac{\partial S}{\partial y}\right)^2 + mgy = 0. \tag{11.2.7}$$
Neither x nor t appears explicitly in this equation. The reason for this is that there is no explicit dependence of the Lagrangian on x or t. This equation is therefore simple enough that the variable separation approach of Section 11.2.2 can be performed mentally to yield

$$S = \alpha_1 t + \alpha_2 x + S^{(y)}(y) + \alpha. \tag{11.2.8}$$
Substitution back into Eq. (11.2.7) yields

$$\alpha_1 + \frac{\alpha_2^2}{2m} + \frac{1}{2m}\left(\frac{dS^{(y)}}{dy}\right)^2 + mgy = 0. \tag{11.2.9}$$

Arbitrarily picking y0 = 0 and merging additive constants yields

$$S = \alpha_1 t + \alpha_2 x \mp \frac{2\sqrt{2m}}{3mg}\left(-\alpha_1 - \frac{\alpha_2^2}{2m} - mgy\right)^{3/2} + \alpha. \tag{11.2.10}$$
One can check Eq. (11.1.12):

$$p_x = \frac{\partial S}{\partial x} = \alpha_2, \qquad p_y = \frac{\partial S}{\partial y} = \pm\sqrt{2m}\,\sqrt{-\alpha_1 - \frac{\alpha_2^2}{2m} - mgy}. \tag{11.2.11}$$
From the first of these, it is clear that α2 = p_x0 = m ẋ0, the initial horizontal momentum, and that it is a conserved quantity. Rearranging the second equation and substituting p_y = m ẏ yields

$$\frac{1}{2} m \dot y^2 + \frac{1}{2} m \dot x_0^2 + mgy = -\alpha_1, \tag{11.2.12}$$

a result clearly interpretable as energy conservation, with −α1 being the (conserved) total energy E0. Based only on theory presented so far, the interpretation is merely that the function of variables y and ẏ appearing on the left side of Eq. (11.2.12) is conserved.
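That (11.2.10) really is a solution of the Hamilton-Jacobi equation (11.2.7) can be confirmed by finite differences. The parameter values below are illustrative assumptions:

```python
import numpy as np

# Illustrative parameter values for the projectile complete integral.
m, g = 1.0, 9.8
E0 = 20.0                      # total energy, equal to -alpha1
px0 = 3.0                      # horizontal momentum, equal to alpha2
a1, a2 = -E0, px0

def S(x, y, t):
    """Complete integral (11.2.10), upper sign, additive constant dropped."""
    u = -a1 - a2 ** 2 / (2 * m) - m * g * y     # vertical kinetic energy
    return a1 * t + a2 * x - (2 * np.sqrt(2 * m) / (3 * m * g)) * u ** 1.5

# Check the Hamilton-Jacobi equation (11.2.7) by central differences
# at a point where the vertical kinetic energy is positive.
x, y, t, h = 0.5, 0.3, 0.2, 1e-6
St = (S(x, y, t + h) - S(x, y, t - h)) / (2 * h)
Sx = (S(x + h, y, t) - S(x - h, y, t)) / (2 * h)
Sy = (S(x, y + h, t) - S(x, y - h, t)) / (2 * h)
residual = St + (Sx ** 2 + Sy ** 2) / (2 * m) + m * g * y
print("H-J residual:", residual)
```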
11.2.4. The Jacobi Method for Exploiting a Complete Integral

By "solving" the mechanics problem at hand, one usually means finding explicit expressions for x(t) and y(t). If velocities or momenta are also required, they can then be found by straightforward differentiation. It was recently seen that a complete integral S(x, y, t; α1, α2) yields immediately

$$p_x = \frac{\partial S}{\partial x}, \qquad p_y = \frac{\partial S}{\partial y}. \tag{11.2.13}$$

(In Section 11.2.3, these were Eqs. (11.2.11).) This can perhaps be regarded as having completed one level of integration since it yields p_x and p_y as functions of x, y, and t. There are only two free parameters, α1 and α2, but that is all that is needed for matching initial momenta. This leaves another level of integration to be completed to obtain x(t) and y(t). Where are we to get two more relations giving x(t) and y(t)? One relation available for use is

$$\frac{\partial S}{\partial t} + H\!\left(x, y, \frac{\partial S}{\partial x}, \frac{\partial S}{\partial y}, t\right) = 0, \tag{11.2.14}$$
but this is just another complicated implicit relation among the variables. The remarkable discovery of Jacobi was that, starting from a complete integral as in Eq. (11.2.5), expressions for x(t) and y(t) can be written down mechanically, uncluttered with dependency upon p_x and p_y. His procedure was to regard equations obtained by inverting Eqs. (11.2.13) as transformation equations defining new dynamical variables α1 and α2 as functions of x, y, p_x, p_y, and t. Then he defined two further dynamical variables β1 and β2 by

$$\beta_1 = \frac{\partial S}{\partial \alpha_1}, \qquad \beta_2 = \frac{\partial S}{\partial \alpha_2}. \tag{11.2.15}$$

These are to be regarded as a coordinate transformation (x, y) → (β1, β2). The amazing result then follows that all four of the dynamical variables α1, α2, β1, and β2, defined by Eqs. (11.2.13) and (11.2.15), are constants of the motion. This can be visualized geometrically as in Fig. 11.2.1. The newly introduced dynamical variables β1, β2, ... will be known as "new coordinates" and the symbol β (and later Q) will be reserved for them for the time being. To demonstrate Jacobi's result, first differentiate Eqs. (11.2.15) with respect to t:
$$\frac{\partial^2 S}{\partial \alpha_1 \partial x}\,\dot x + \frac{\partial^2 S}{\partial \alpha_1 \partial y}\,\dot y = -\frac{\partial^2 S}{\partial \alpha_1 \partial t} \quad \text{and} \quad \frac{\partial^2 S}{\partial \alpha_2 \partial x}\,\dot x + \frac{\partial^2 S}{\partial \alpha_2 \partial y}\,\dot y = -\frac{\partial^2 S}{\partial \alpha_2 \partial t}, \tag{11.2.16}$$
with terms proportional to α̇1, α̇2, β̇1, and β̇2 vanishing by hypothesis. The order of differentiations in these equations has been reversed for convenience in the next step. Next partially differentiate the Hamilton-Jacobi equation itself (11.2.14) with respect to α1 (respectively α2), holding x, y, and t fixed, after substituting for p_x and
FIGURE 11.2.1. A particle trajectory satisfying Hamilton's equations is found as the intersection of two surfaces derived from a complete integral of the Hamilton-Jacobi equation.
p_y from Eq. (11.2.13):

$$\frac{\partial H}{\partial p_x}\frac{\partial^2 S}{\partial \alpha_1 \partial x} + \frac{\partial H}{\partial p_y}\frac{\partial^2 S}{\partial \alpha_1 \partial y} = -\frac{\partial^2 S}{\partial \alpha_1 \partial t} \quad \text{and} \quad \frac{\partial H}{\partial p_x}\frac{\partial^2 S}{\partial \alpha_2 \partial x} + \frac{\partial H}{\partial p_y}\frac{\partial^2 S}{\partial \alpha_2 \partial y} = -\frac{\partial^2 S}{\partial \alpha_2 \partial t}. \tag{11.2.17}$$
Subtracting Eq. (11.2.17) from (11.2.16) yields

$$\frac{\partial^2 S}{\partial \alpha_a \partial x}\left(\dot x - \frac{\partial H}{\partial p_x}\right) + \frac{\partial^2 S}{\partial \alpha_a \partial y}\left(\dot y - \frac{\partial H}{\partial p_y}\right) = 0, \qquad a = 1, 2. \tag{11.2.18}$$

Unless the determinant formed from the coefficients vanishes, this equation implies

$$\dot x = \frac{\partial H}{\partial p_x} \quad \text{and} \quad \dot y = \frac{\partial H}{\partial p_y}. \tag{11.2.19}$$
But the vanishing of the determinant would imply that α1 and α2 were functionally dependent, contrary to hypothesis. It has been shown therefore that half of Hamilton's equations are satisfied. Similar manipulations show that the remaining Hamilton equations are satisfied. Differentiate Eqs. (11.2.13) with respect to t, again under the hypothesis that α1 and α2 are constant,

$$\dot p_x = \frac{\partial^2 S}{\partial x^2}\,\dot x + \frac{\partial^2 S}{\partial x \partial y}\,\dot y + \frac{\partial^2 S}{\partial x \partial t}. \tag{11.2.20}$$
Also partially differentiate the Hamilton-Jacobi equation with respect to x to obtain

$$\frac{\partial^2 S}{\partial x \partial t} = -\frac{\partial H}{\partial x} - \frac{\partial H}{\partial p_x}\frac{\partial^2 S}{\partial x^2} - \frac{\partial H}{\partial p_y}\frac{\partial^2 S}{\partial y \partial x}. \tag{11.2.21}$$
Using Eq. (11.2.19) and subtracting these equations,

$$\dot p_x = -\frac{\partial H}{\partial x}, \tag{11.2.22}$$

and ṗ_y = −∂H/∂y follows similarly. Hence all of Hamilton's equations are satisfied by Jacobi's hypothesized solution.

11.2.5. Completion of Projectile Example
To follow the Jacobi prescription for the projectile example of Section 11.2.3, define β1 by substituting Eq. (11.2.10) into the first of Eqs. (11.2.15) to obtain

$$\beta_1 = t \pm \frac{1}{g}\sqrt{\frac{2}{m}}\,\sqrt{E_0 - \frac{p_{x0}^2}{2m} - mgy}. \tag{11.2.23}$$
It was noted earlier that −α1 and α2 could be identified with the total energy E0 and p_x0, and those replacements have been made. The expression inside the final square root is the vertical contribution to the kinetic energy. It vanishes as the projectile passes through "zenith," the highest point on its trajectory. This makes it clear that β1 can be interpreted as the time of passage through that point, clearly a constant of the motion. It is necessary to take the positive sign for t < β1 and the negative sign otherwise. Eq. (11.2.23) can be inverted to give y(t) directly since x has dropped out. Superficially β1 appears to increase linearly with t, but this time dependence is precisely cancelled by the variation of the spatial quantities in the second term, in this case y. The remaining equation is the other of Eqs. (11.2.15):
$$\beta_2 = \frac{\partial S}{\partial \alpha_2} = x \pm \frac{\alpha_2}{mg}\sqrt{\frac{2}{m}}\,\sqrt{E_0 - \frac{p_{x0}^2}{2m} - mgy}. \tag{11.2.24}$$

Eliminating the square root expression using Eq. (11.2.23) yields

$$\beta_2 = x + \frac{p_{x0}}{m}(\beta_1 - t). \tag{11.2.25}$$
Since β1 is the time of passage through zenith, the second term vanishes at that instant and β2 is the x-coordinate of that point, obviously also a constant of the motion. Again the superficial dependence (on x) in the first term is cancelled by the second term. This example has shown (and it is the same in general) that the constants of motion β1, β2, ... in the Jacobi procedure have a kind of "Who's buried in Grant's tomb?" character. In this case the question is "How do the coordinates of the highest point on the trajectory vary as the projectile moves along its trajectory?"
When β1, β2, ... are expressed in terms of the evolving coordinates they may not look constant, but they are constant nevertheless.
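The constancy of β1 and β2 along a trajectory can be confirmed numerically. In the sketch below (the initial conditions are illustrative assumptions), both evaluate, at every sampled time while the projectile is rising, to the time and x-coordinate of the zenith:

```python
import numpy as np

# Projectile trajectory with illustrative initial conditions.
m, g = 1.0, 9.8
x0, y0 = 0.0, 0.0
vx0, vy0 = 3.0, 10.0
px0 = m * vx0
E0 = 0.5 * m * (vx0 ** 2 + vy0 ** 2) + m * g * y0

t = np.linspace(0.0, 0.5, 11)          # times before the zenith (vy > 0)
x = x0 + vx0 * t
y = y0 + vy0 * t - 0.5 * g * t ** 2

# Jacobi "new coordinates", Eqs. (11.2.23) and (11.2.25); the positive sign
# is appropriate while the projectile is still rising (t < beta1).
vert = E0 - px0 ** 2 / (2 * m) - m * g * y     # vertical kinetic energy
beta1 = t + (1.0 / g) * np.sqrt(2.0 / m) * np.sqrt(vert)
beta2 = x + (px0 / m) * (beta1 - t)

print("beta1 (time of zenith):   ", beta1)
print("beta2 (x-coord of zenith):", beta2)
```

For these values both arrays are constant, beta1 = vy0/g and beta2 = vx0 vy0/g, exactly the time and horizontal position at which the projectile tops out.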
11.2.6. The Time-Independent Hamilton-Jacobi Equation

Though this problem has been very simple and special, the features just mentioned are common to all cases where H is independent of t. The Hamilton-Jacobi equation can then be seen to be at least partially separable, with the only time dependence of S being an additive term of the form S^(t) = α1 t. Furthermore, since H = E = constant when the Hamiltonian is independent of time, it can be seen that −α1 can be identified as the energy E, so

$$S(x, y; t) = -Et + S_0(x, y; E). \tag{11.2.26}$$
S0(x, y; E) contains the spatial variation of S with E as a parameter but is independent of t. It satisfies the "time-independent" Hamilton-Jacobi equation:

$$H\!\left(x, y, \frac{\partial S_0}{\partial x}, \frac{\partial S_0}{\partial y}\right) = E. \tag{11.2.27}$$
The Jacobi coordinate definition yields the "constant" β1 according to

$$\beta_1 = \frac{\partial S}{\partial(-E)} = t - \frac{\partial S_0}{\partial E}. \tag{11.2.28}$$
Reordering this equation, it becomes

$$\frac{\partial S_0}{\partial E} = t - \beta_1. \tag{11.2.29}$$

Since β1 subtracts from t, the symbol is commonly replaced by t0, which is commonly then called the "initial" time. Translating the origin of time gives a corresponding shift in β1. (However, with S0 expressed in terms of x and y, it is typically not obvious that it can also be expressed as a linear function of t in this way.) We will develop shortly a close analogy between the Hamilton-Jacobi equation and the Schrödinger equation of quantum mechanics. Eq. (11.2.27) will then be the analog of the time-independent Schrödinger equation.
11.2.7. Hamilton-Jacobi Treatment of 1-D Simple Harmonic Motion

Though it is nearly the most elementary conceivable system, the one-dimensional simple harmonic oscillator is basic to most oscillations and provides a simple illustration of the Jacobi procedure. This formalism may initially seem a bit "heavy" for such a simple problem, but the entire theory of adiabatic invariance follows directly from it and nonlinear oscillations cannot be satisfactorily analyzed without this approach. The Hamiltonian is

$$H(q, p) = \frac{p^2}{2m} + \frac{1}{2} m \omega_0^2 q^2. \tag{11.2.30}$$
This yields as the (time-independent) Hamilton-Jacobi equation

$$\frac{1}{2m}\left(\frac{dS_0}{dq}\right)^2 + \frac{1}{2} m \omega_0^2 q^2 = E, \tag{11.2.31}$$

which can be solved to give

$$S_0(q; E) = \int_0^q \sqrt{2mE - m^2 \omega_0^2 q'^2}\, dq'. \tag{11.2.32}$$
(The lower limit has been picked arbitrarily.) It will be necessary to handle the ± ambiguity coming from the square root on an ad hoc basis; here the positive sign has been chosen. This is a complete integral in that it depends on E, which we now take as the first (and only) "Jacobi momentum" that would previously have been denoted by α1 (or rather −α1). Following the Jacobi procedure we next find β1, which we will now call Q, or rather Q_E, since it is to be the "new coordinate" corresponding to E. (If we were to insist on conventional terminology we would also introduce a "new momentum" P = E.) That is, we are performing a transformation of phase space variables (q, p) → (Q, P). Since the main purpose of S0(q, E) is to be differentiated, explicit evaluation of the integral in Eq. (11.2.32) may not be necessary, but for definiteness the result is

$$S_0(q; E) = \frac{q}{2}\sqrt{2mE - m^2\omega_0^2 q^2} + \frac{E}{\omega_0}\sin^{-1}\!\left(\sqrt{\frac{m\omega_0^2}{2E}}\, q\right). \tag{11.2.33}$$
By Jacobi's defining equation for Q_E, we have

Q_E = ∂S₀/∂E = ∫₀^q (m/√(2mE − m²ω₀²q′²)) dq′ = (1/ω₀) sin⁻¹(√(m/(2E)) ω₀q). (11.2.34)

As stated in a previous warning, it is not obvious that Q_E is a linear function of t, but from the general theory of the previous section, in particular Eq. (11.2.29), we know this to be the case:

Q_E = t − t₀. (11.2.35)
Combining Eqs. (11.2.34) and (11.2.35) yields

q = √(2E/(mω₀²)) sin(ω₀(t − t₀)), (11.2.36)

which begins to look familiar. The corresponding variation of p is given by

p = ±√(2mE − m²ω₀²q²) = √(2mE) cos(ω₀(t − t₀)). (11.2.37)
TRAJECTORY DETERMINATION USING THE HAMILTON-JACOBI EQUATION
FIGURE 11.2.2. The phase space trajectory of simple harmonic motion is a circle traversed at constant angular velocity ω₀ if the axes are q and p/(mω₀). The shaded area enclosed within the trajectory for one cycle of the motion in q and p phase space is 2πI, where I is the "action."
Phase space plots of the motion are shown in Fig. 11.2.2. Considerations relating to continuity in the figure made it necessary to restore the ± options for the square root that was entered in the first place. The trajectory equation is

E = p²/(2m) + (1/2)mω₀²q². (11.2.38)
11.2.8. The Kepler Problem

We now take up the Kepler problem from the point it was left in Problem 1.2.12. It is important both for celestial mechanics and as a classical precursor to the theory of the atom. The latter topic is introduced nicely by Ter Haar [2].
11.2.8.1. Coordinate Frames: Since we are dealing with formulas derived by astronomers, we may as well use their frames of reference, referring to Fig. 11.2.3. For studying earth satellites it is natural to use the equatorial plane as the x, y plane. For studying solar planetary orbits, such as that of the earth, it is natural to use the ecliptic plane, which is the plane of the earth's orbit around the sun. (Recall that the equatorial plane is inclined by about 23° relative to the plane of the earth's orbit.) In either case it remains to fix the orientation of the x-axis and, if comparisons are to be made between the two frames, the rule relating these choices. By convention, the x-axis in both frames is chosen to be the line of equinoxes, which is to say the line joining the earth to the sun on the day of the year the sun is directly over the equator, so night and day are equal in duration everywhere on earth. This line necessarily lies in both the equatorial and ecliptic planes and is therefore their line of intersection. It happens that the distant constellation Aries lies approximately along this line, and it can be used to remember this direction at other times of the year.
FIGURE 11.2.3. Equatorial and ecliptic planes.
11.2.8.2. Orbit Elements: Having chosen one or the other of these frames, specification of the three-dimensional motion of a satellite can be discussed using Fig. 11.2.4, in which the instantaneous satellite position is projected onto a sphere of constant radius. The trace of the orbit has the polar coordinates θ, φ of the true orbit.
FIGURE 11.2.4. A Kepler orbit is projected onto a sphere centered on the center of gravity of the binary system. The true orbit emerges from the x, y plane, passes through perigee, and is instantaneously situated along the lines OA, OP, and OC respectively.
The satellite lies instantaneously along ray OC, and the orbit plane is defined by this line and the line OA of the "ascending node." Fixing three initial positions and three initial velocities fixes the subsequent three-dimensional trajectory of the satellite. There are, however, other ways of specifying the orbit. The orbit plane can be specified by the "azimuth" β₃ of the line OA and the "inclination" i, which is the polar angle between the normal to the orbit plane and the z-axis. Two coordinates locate the point of nearest approach ("perigee") along the line OP in the orbit plane, plus the particle speed as it passes perigee (necessarily at right angles to OP). However, it is more conventional to choose parameters that characterize the geometric shape and orientation of the orbit, which is known to be elliptical as in Fig. 1.2.17. The semimajor axis is a and the eccentricity is ε. The angle β₂ between OA and OP specifies the orientation of the ellipse. Finally, the location C can be located relative to P by specifying the "time of passage through perigee" β₁. The parameters introduced in this way are known as "orbit elements." Other parameters are sometimes introduced for convenience. The parameters β₁, β₂, and β₃ (which depend on choice of coordinates and initial time) have already been named in anticipation of the way they will appear in the Hamilton-Jacobi theory, but it remains to introduce parameters α₁, α₂, and α₃ as functions of a, ε, and i. There are always six independent parameters in all.
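For orientation, the orbit elements just described can be extracted numerically from an instantaneous position and velocity. The sketch below is my own illustration using standard orbital-mechanics formulas (NumPy); the function name and the use of per-unit-mass quantities μ = K/m are conventions of this example, not of the text.

```python
import numpy as np

def orbit_elements(r, v, mu):
    """Inclination i, node azimuth, semimajor axis a, and eccentricity
    from an instantaneous state (r, v); mu = K/m, quantities per unit mass."""
    h = np.cross(r, v)                        # angular momentum, normal to orbit plane
    i = np.arccos(h[2]/np.linalg.norm(h))     # inclination: angle between h and z-axis
    n = np.cross([0.0, 0.0, 1.0], h)          # node vector, along line OA
    node = np.arctan2(n[1], n[0])             # azimuth of ascending node (beta_3)
    E = 0.5*np.dot(v, v) - mu/np.linalg.norm(r)   # energy per unit mass
    a = -mu/(2*E)                             # semimajor axis, cf. Eq. (11.2.50)
    e_vec = np.cross(v, h)/mu - r/np.linalg.norm(r)  # eccentricity vector, toward perigee
    return i, node, a, np.linalg.norm(e_vec)

# Circular orbit of unit radius inclined by 0.3 rad, ascending node on the x-axis
r = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, np.cos(0.3), np.sin(0.3)])
i, node, a, eps = orbit_elements(r, v, 1.0)
assert abs(i - 0.3) < 1e-9 and abs(node) < 1e-9
assert abs(a - 1.0) < 1e-9 and eps < 1e-9
```

Six numbers in (r, v) go in; six orbit-element-type numbers (two of the six, perigee angle β₂ and perigee time β₁, are omitted here for brevity) come out, illustrating the count of independent parameters.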
11.2.8.3. Hamilton-Jacobi Formulation: Using polar coordinates, the Lagrangian for a particle of mass m moving in three dimensions in an inverse square law potential V = −K/r is

L = (1/2)m(ṙ² + r²θ̇² + r² sin²θ φ̇²) + K/r. (11.2.39)

The canonical momenta are

p_r = mṙ,  p_θ = mr²θ̇,  p_φ = mr² sin²θ φ̇, (11.2.40)
and the Hamiltonian is

H = (1/(2m))(p_r² + p_θ²/r² + p_φ²/(r² sin²θ)) − K/r. (11.2.41)

Preparing to look for a solution by separation of the variables, the time-independent Hamilton-Jacobi equation is

E = (1/(2m))((∂S/∂r)² + (1/r²)(∂S/∂θ)² + (1/(r² sin²θ))(∂S/∂φ)²) − K/r. (11.2.42)

Since φ does not appear explicitly, we can separate it immediately in the same way t has already been separated:

S = −Et + α₃φ + S^(θ)(θ) + S^(r)(r). (11.2.43)
Here α₃ is the second "new momentum" of Jacobi. (E is the first.) It is interpretable as the value of a conserved angular momentum around the z-axis because

p_φ = ∂S/∂φ = α₃. (11.2.44)
Substituting this into Eq. (11.2.42) and multiplying by 2mr² yields

2mEr² + 2mKr − r²(dS^(r)/dr)² = (dS^(θ)/dθ)² + α₃²/sin²θ = α₂², (11.2.45)
where the equality of a pure function of r to a pure function of θ implies that both are constant; this has permitted a third Jacobi parameter α₂ to be introduced. The physical meaning of α₂ can be inferred by expanding M², the square of the total angular momentum:

M = √((mr²θ̇)² + (mr² sinθ φ̇)²) = √(p_θ² + p_φ²/sin²θ) = α₂. (11.2.46)

From the interpretation of α₃ as the z component of α₂ it follows that

α₃ = α₂ cos i. (11.2.47)
Determination of the other terms in S has been "reduced to quadratures," because Eqs. (11.2.45) give expressions for dS^(θ)/dθ and dS^(r)/dr that can be rearranged to yield S^(θ)(θ) and S^(r)(r) as indefinite integrals:

S^(θ) = ∫^θ √(α₂² − α₃²/sin²θ′) dθ′,
S^(r) = ∫^r √(2mE + 2mK/r′ − α₂²/r′²) dr′. (11.2.48)
Instead of using E as the first Jacobi "new momentum," it is conventional to use a function of E, namely

α₁ = K√(m/(−2E)). (11.2.49)

Like α₂ and α₃, α₁ has dimensions of angular momentum. Referring to Fig. 1.2.17, the semimajor axis a and the orbit eccentricity ε are given by

a = −K/(2E),  ε = √(1 + 2Eα₂²/(mK²)), (11.2.50)
TRAJECTORY DETERMINATIONUSING THE HAMILTONJACOBIEQUATION
341
with inverse relations

α₁² = Kma,  α₂² = (1 − ε²)Kma. (11.2.51)
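These inverse relations can be verified symbolically. The fragment below (SymPy) starts from Eqs. (11.2.49) and (11.2.50) and confirms Eq. (11.2.51); it is only a consistency check of the algebra, not part of the derivation.

```python
import sympy as sp

m, K, alpha2 = sp.symbols('m K alpha2', positive=True)
E = sp.symbols('E', negative=True)        # bound orbit, E < 0

a = -K/(2*E)                              # semimajor axis, Eq. (11.2.50)
eps_sq = 1 + 2*E*alpha2**2/(m*K**2)       # eccentricity squared, Eq. (11.2.50)
alpha1_sq = m*K**2/(-2*E)                 # alpha_1 = K*sqrt(m/(-2E)), Eq. (11.2.49)

assert sp.simplify(alpha1_sq - K*m*a) == 0              # alpha_1^2 = K m a
assert sp.simplify(alpha2**2 - (1 - eps_sq)*K*m*a) == 0  # alpha_2^2 = (1 - eps^2) K m a
```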
Combining results, the complete integral of the Hamilton-Jacobi equation is

S = (mK²/(2α₁²)) t + α₃φ + ∫^θ √(α₂² − α₃²/sin²θ′) dθ′ + ∫^r √(−m²K²/α₁² + 2mK/r′ − α₂²/r′²) dr′. (11.2.52)
The lower limits and some signs have been chosen arbitrarily so that the Jacobi "new coordinates" β₁, β₂, and β₃ will have conventional meanings. To define them requires the following "tour de force" of manipulations from spherical trigonometry. They are the work of centuries of astronomers. Starting with β₃ and using Eq. (11.2.47), we obtain

β₃ = ∂S/∂α₃ = φ − ∫^θ (α₃/(sin²θ′ √(α₂² − α₃²/sin²θ′))) dθ′ = φ − sin⁻¹(cot θ cot i) = φ − ψ. (11.2.53)
The second to last step is justified by the trigonometry of Fig. 11.2.5a and the last step by the spherical trigonometry of Fig. 11.2.5b, which can be used to show that cot θ cot i = sin ψ. Referring back to Fig. 11.2.4, one sees that β₃ is indeed the nodal angle and, as such, is a constant of the motion.
FIGURE 11.2.5. Figures illustrating the trigonometry and spherical trigonometry used in assigning meaning to the Jacobi parameters β₂ and β₃. The angle ψ in part (c) is the same angle as in Fig. 11.2.4.
342
HAMILTON4ACOBI THEORY
We next consider β₂:

β₂ = ∂S/∂α₂ = ∫^θ (α₂/√(α₂² − α₃²/sin²θ′)) dθ′ − ∫^r (α₂/(r′² √(2mE + 2mK/r′ − α₂²/r′²))) dr′. (11.2.54)

The second integral was evaluated using Eq. (1.2.28). The first integral was performed using Eq. (11.2.47) and changing the variable of integration according to

cos θ = sin i sin ψ,  or  ψ = sin⁻¹(cos θ/sin i). (11.2.55)
Using spherical trigonometry again on Fig. 11.2.5b, the variable ψ can be seen to be the angle shown both there and in Fig. 11.2.5c. That is, ψ is the angle in the plane of the orbit from ascending node A to the instantaneous particle position. It follows that β₂ is the difference between two fixed points, A and P, and is hence a constant of the motion (as expected). Since α₁ is a function only of E, we expect its conjugate variable β₁ to be a linear function of t. It is given by
β₁ = ∂S/∂α₁ = (∂E/∂α₁)(−t + ∫^r (m/√(2mE + 2mK/r′ − α₂²/r′²)) dr′) = −√(K/(ma³)) t + u − ε sin u. (11.2.56)
The integral has been performed by making the substitution r = a(1 − ε cos u), and the result of Eq. (1.2.36) has been replicated, with β₁ being proportional to the time since passage through perigee. We will return to this topic again in Section 14.6.3 as a (degenerate) example of conditionally periodic motion and as an example of the use of action-angle variables.
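The substitution r = a(1 − ε cos u) turns Eq. (11.2.56) into Kepler's equation, u − ε sin u = (mean anomaly), which must be inverted numerically to locate the particle at a given time. A standard Newton-iteration sketch (my own implementation, not from the text):

```python
import numpy as np

def eccentric_anomaly(M, eps, tol=1e-12):
    """Solve Kepler's equation u - eps*sin(u) = M for the eccentric anomaly u."""
    u = M if eps < 0.8 else np.pi          # common starting guesses
    for _ in range(50):
        du = (u - eps*np.sin(u) - M) / (1.0 - eps*np.cos(u))
        u -= du
        if abs(du) < tol:
            break
    return u

u = eccentric_anomaly(1.0, 0.3)
assert abs(u - 0.3*np.sin(u) - 1.0) < 1e-10
```

Given u, the radius follows from r = a(1 − ε cos u), and the elapsed time since perigee passage is proportional to u − ε sin u.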
Problem 11.2.1: In each of the following problems, a Lagrangian function L(q, q̇, t) is given. In every case the Lagrangian is appropriate for some practical physical system, but that is irrelevant to doing the problem. You are to write the Lagrange equations and, after defining momenta p, give the Hamiltonian H(q, p, t) and write the Hamilton-Jacobi equation. In each case find a "complete integral" (in all cases except one this can be accomplished by separation of variables). Finally, use the complete integrals to solve for the motion given initial conditions. Leave your answers as definite integrals, which in some cases will be quite ugly. Confirm that there are enough free parameters to match arbitrary initial conditions, but do not attempt to do it explicitly. Figure out from the context which symbols are intended to be constants and which variables.

(a) L = (1/2)mẋ² − (1/2)kx².
(b) L = (1/2)ẋ² + Atx. (Something to try here for solving the Hamilton-Jacobi equation is to make a change of dependent variable S → S′(S, x, t) such that the equation for S′ is separable. Another alternative is to try "cheating" by solving the Lagrange equation and working from that solution.)

(c) L = (1/2)m(R²θ̇² + R² sin²θ ω²) − mgR(1 − cos θ). θ is the only variable.

(d) L = (1/2)m(ṙ² + r²θ̇²) − V(r). (r and θ are cylindrical coordinates.)

(e) L = A(θ̇² + sin²θ φ̇²) + C(ψ̇ + φ̇ cos θ)² − MgL cos θ. (Euler angles.)

(f) L = −m₀c²√(1 − ṙ·ṙ/c²) + eA(r)·ṙ − V(r), where A(r) is a vector function of position r and V(r) is a scalar function of radial coordinate r. In this case write the Hamilton-Jacobi equation only for the special case A = 0 and assume the motion is confined to the z = 0 plane, with r and θ being cylindrical coordinates. This problem contains much of the content of relativistic mechanics.
11.3. ANALOGIES BETWEEN OPTICS AND QUANTUM MECHANICS

11.3.1. General Discussion

In the Hamilton-Jacobi formulation of mechanics, one proceeds by solving a partial differential equation for the "wavefront" quantity S. This makes the mathematics of mechanics closely analogous to the mathematics of waves, as shown in Fig. 11.3.1.
11.3.2. Classical Limit of the Schrodinger Equation

It seemed natural to Schrodinger to pursue the possibility that the Hamilton-Jacobi equation was itself the short-wave approximation to a more general wave equation. We know that, for the case of a single particle of mass m in a potential V(r), Schrodinger was led to the equation

iℏ ∂ψ/∂t = −(ℏ²/(2m))∇²ψ + V(r)ψ. (11.3.1)
GEOMETRIC PICTURE          OPTICS                       MECHANICS
Waves (PDE):               eikonal equation (λ small)   Hamilton-Jacobi equation
Wavefronts:                φ = constant                 S = constant
Variational principle
Trajectories (ODE):        ray equation                 Lagrange's equation,
                                                        Hamilton's equations,
                                                        Newton's equation

FIGURE 11.3.1. Chart indicating analogies between optics and mechanics. Topics only mentioned in the text are in broken-line boxes, and derivation paths discussed are indicated by arrows.
As in Section 10.1.1, we can seek a solution to this equation that approximates a plane wave locally:

ψ(r, t) = A e^{iS(r,t)/ℏ}. (11.3.2)
Planck's constant h, as introduced here and also expressed as h = 2πℏ, must have the same units as S but is otherwise, for now, arbitrary. This establishes ℏ to have units of action, which partially accounts for its having the name "quantum of action." Substitution into Eq. (11.3.1) yields
∂S/∂t + (1/(2m))|∇S|² + V(r) − (iℏ/(2m))∇²S = 0. (11.3.3)
We will consider the final term in more detail below, but it vanishes in the limit ℏ → 0, and in that limit the Schrodinger equation becomes
∂S/∂t + (1/(2m))|∇S|² + V(r) = 0. (11.3.4)
This is precisely the Hamilton-Jacobi equation, since the Hamiltonian for this system is

H = p²/(2m) + V(r). (11.3.5)
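The algebra leading from Eq. (11.3.1) to Eq. (11.3.3) can be checked symbolically. The one-dimensional SymPy fragment below (my own consistency check, not part of the text) substitutes ψ = e^{iS/ℏ} into the Schrodinger equation and confirms that the residual, divided by ψ, is exactly minus the left side of Eq. (11.3.3) in one dimension.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
m, hbar = sp.symbols('m hbar', positive=True)
S = sp.Function('S')(x, t)
V = sp.Function('V')(x)

psi = sp.exp(sp.I*S/hbar)                                  # Eq. (11.3.2), with A = 1
# residual of i*hbar*psi_t = -(hbar^2/2m)*psi_xx + V*psi, i.e. Eq. (11.3.1)
schrod = sp.I*hbar*sp.diff(psi, t) \
         + hbar**2/(2*m)*sp.diff(psi, x, 2) - V*psi

# Dividing by psi should give -(S_t + (S_x)^2/(2m) + V - (i*hbar/2m)*S_xx)
target = -(sp.diff(S, t) + sp.diff(S, x)**2/(2*m) + V
           - sp.I*hbar/(2*m)*sp.diff(S, x, 2))             # cf. Eq. (11.3.3)
assert sp.simplify(schrod/psi - target) == 0
```

Setting ℏ → 0 in `target` leaves exactly the Hamilton-Jacobi equation (11.3.4), as claimed.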
In Eq. (11.3.2), to make the local wavelength of the wave function be λ, the spatial dependence at fixed time of S(r, t) should be

S(r) = (2πℏ/λ) λ̂·r,  or  ∇S = ℏk, (11.3.6)

where λ̂ is a unit vector along the direction of propagation and k = (2π/λ)λ̂.
But in the Hamilton-Jacobi formalism the momentum is given by p = ∇S, and hence self-consistency requires

p = ℏk = (h/λ) λ̂, (11.3.7)
which is the deBroglie relation between momentum and wavelength. Since the momentum can be inferred from mechanics and the wavelength λ can be measured (for example, by electron diffraction from a crystal of known lattice spacing), the numerical value of h can be determined. It can be compared with the value appearing in E = hν, measured for example using photoelectric measurements. This provides a very significant test of the validity of quantum mechanics.

11.3.3. Condition for Validity of Semiclassical Treatment

With little loss of generality we can consider the one-dimensional motion of mass m in potential V(x). Classically, because the Hamiltonian H is independent of time, the
Hamilton-Jacobi solution takes the form

S(x, t) = −Et + S₀(x; E)  and  p = dS₀/dx ≡ S₀′. (11.3.8)
The time-independent quantum mechanical equation is

(ℏ²/(2m)) d²ψ/dx² + (E − V(x))ψ = 0. (11.3.9)
As in Eq. (11.3.2), trying a solution

ψ(x) = e^{iS(x)/ℏ} (11.3.10)

yields

S′² − iℏS″ = 2m(E − V(x)). (11.3.11)
Expanding S in powers of ℏ/i,

S = S₀ + (ℏ/i)S₁ + ..., (11.3.12)

terms independent of ℏ yield

S₀′² = 2m(E − V(x)), (11.3.13)

and the condition needed to justify dropping the next term is

|ℏS″/S′²| ≈ |(d/dx)(ℏ/p)| << 1, (11.3.14)

where the replacement of S′ by the "classical" momentum p is presumably valid in the classical regime. Recall Eq. (10.1.4), which expressed the condition for validity of geometric optics. Reexpressing it slightly yields
|dλ/dx| << 2π, (11.3.15)

the same as Eq. (11.3.14). This shows that the condition for the validity of classical mechanics with the deBroglie relation for momentum is essentially the same as the condition for the validity of geometric optics. With wavelength λ related to momentum p = √(2m(E − V)) according to deBroglie's relation (11.3.7), the condition can be written

λ |dV/dx| << 4π p²/(2m). (11.3.16)

With dV/dx being force and λ = h/p, the product λ|dV/dx| can be interpreted as the work done by the external force as the particle travels a distance equal to one deBroglie wavelength. With p²/(2m) being kinetic energy K.E., the inequality can be expressed

Δ(K.E.) over one deBroglie wavelength << 4π (K.E.). (11.3.17)

When this relation is applied to macroscopic (nonatomic) particles subject to realistic laboratory force fields, this condition for validity of classical mechanics is usually overwhelmingly satisfied.
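A numerical illustration of just how overwhelmingly the condition is satisfied (the particle and force values below are arbitrary choices of mine): for a 1-gram particle moving at 1 m/s in a 1-newton force field,

```python
h = 6.626e-34                 # Planck's constant, J*s
m, v, F = 1e-3, 1.0, 1.0      # 1-gram particle, 1 m/s, 1-N force (arbitrary)

p = m*v
lam = h/p                     # deBroglie wavelength, Eq. (11.3.7): ~7e-31 m
KE = p**2/(2*m)               # kinetic energy: 5e-4 J
dKE = F*lam                   # work done over one deBroglie wavelength

assert lam < 1e-30
assert dKE/KE < 1e-26         # condition met by roughly 27 orders of magnitude
```

The change in kinetic energy per deBroglie wavelength is smaller than the kinetic energy itself by a factor of about 10⁻²⁷, so classical mechanics is safe by any measure.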
BIBLIOGRAPHY

References

1. R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. 2, Interscience, New York, 1962.
2. D. Ter Haar, Elements of Hamiltonian Mechanics, 2nd ed., Pergamon, Oxford, 1971.
References for Further Study
Section 11.2

L. A. Pars, Analytical Dynamics, Ox Bow Press, Woodbridge, CT, 1979.

Section 11.2.8

F. T. Geyling and H. R. Westerman, Introduction to Orbital Mechanics, Addison-Wesley, Reading, MA, 1971.
12
RELATIVISTIC MECHANICS

After adopting Einstein's relativity principle, the entire Hamiltonian and Hamilton-Jacobi formalism remains intact. It should not be surprising, therefore, that in addition to encompassing special relativity, this chapter can be regarded as one long application of the formulas from Chapter 11. It is assumed that ideas like Lorentz contraction, time dilation, failure of simultaneity, etc., are already familiar from an elementary physics course. There is no shortage of reference material at that level. Nevertheless, to make the treatment self-contained, the first section rederives these results as succinctly as possible. Some of the tensor manipulations use or illustrate results from the early chapters of this text, but no explicit mention is made when this occurs.
12.1. REVIEW OF SPECIAL RELATIVITY THEORY
12.1.1. Form Invariance
Form invariance is the main principle of relativity: All equations should have the same form in all coordinate frames. If a scalar quantity such as c = 3 × 10⁸ m/sec occurs in an equation in one frame of reference, then that same quantity c, with the same value, must appear in the corresponding equation in any other coordinate system. If it should turn out, as it does, that c is the speed of propagation of waves (i.e., light) predicted by the equations, then the speed of light must be the same in all frames. (In systems of units such as SI units, c does not appear explicitly in Maxwell's equations, but it is derivable from other constants, and the same conclusion follows.) It is found that there are wave solutions to Maxwell's equations and that their speed is c. Maxwell evaluated c from electrical and magnetic measurements and, finding the value close to the speed of light, conjectured that light is an electromagnetic phenomenon.
Putting these things together we can say that light travels with the same speed in all inertial frames. This conclusion was corroborated by the Michelson-Morley experiment. Of course, the numerical values of physical quantities such as position, velocity, electric field, and so on, can have different values in different frames. In Galilean relativity, velocities measured by two relatively moving observers are necessarily different. Hence the constancy of the speed of light is not consistent with Galilean relativity. Einstein introduced the concept that time need not be the same in different frames. Treating time and space similarly, he stressed the importance of world points in a four-dimensional plot with almost symmetric treatment of time as one coordinate and space as the other three.

12.1.2. World Points

A world event is some occurrence, at position x and at time t, such as a ball dropping in Times Square, precisely at midnight, on New Year's Eve. Such a world event is labeled by its time and space coordinates (t, x). To describe the trajectory of a moving object, say a rocket, requires a world line, which describes where the rocket is, x(t), at time t.
12.1.3. World Intervals

If this world line describes a light pulse sent from world point (t₁, x₁) and received at world point (t₂, x₂), then, because it is light,

|x₂ − x₁| = c(t₂ − t₁). (12.1.1)
When described in a different coordinate frame, designated by primes, the same two events would be related by
|x₂′ − x₁′| = c(t₂′ − t₁′). (12.1.2)
The interval between any two such world events (not necessarily lying on the world line of a light pulse) is defined to be

s₁₂ = √(c²(t₂ − t₁)² − |x₂ − x₁|²). (12.1.3)
Since c²(t₂ − t₁)² − |x₂ − x₁|² can have either sign, s₁₂ can be real or purely imaginary. If real, the interval is said to be time-like; if imaginary it is space-like. From Eq. (12.1.1) it can be seen that the interval vanishes if the two ends are on the world line of a light pulse. The points are then said to be "on the light cone." This condition is frame-independent since, from Eq. (12.1.2), the same interval reckoned in any other frame also vanishes. A differential interval ds is defined by

ds² = c² dt² − dx² − dy² − dz² = c² dt² − |dx|². (12.1.4)
It has been seen that the vanishing of the value ds in one frame implies the vanishing of the value ds′ in any other. This implies that ds and ds′ are proportional (even if they are not on a light pulse), ds = A ds′. With space assumed to be homogeneous and isotropic, and time homogeneous, the proportionality factor A can at most depend on the absolute value of the relative velocity of the two frames. This implies A = 1:

ds = ds′. (12.1.5)
From the equality of differential intervals it follows that finite intervals are also invariant:

s₁₂ = s₁₂′. (12.1.6)
12.1.4. Proper Time

In special relativity the word "proper" has a definite technical meaning that happens to be fairly close to its colloquial meaning, as in "If you do not do it properly it may not come out right." Hence the "proper" way to time a mile run is to start a stopwatch at the start of the race and stop it when the leader crosses the finish line. The stopwatch is chosen for accuracy, but that is not the point being made here; the point is that the watch is present at both world events, start and finish. (This assumes the race is four laps of a quarter-mile track.) In this sense the traditional method of timing a 100-meter dash is not "proper," because the same watch is not present at start and finish (unless the winner of the race is carrying it). In practice, if the timing is not done "properly," it may still be possible to compensate the observation so as to get the right answer. In timing the 100-meter dash, allowance can be made for the time it takes for sound to get from the starting gun to the finish line. Hence a "proper time" in relativity is the time between two world events occurring at the same place. A "proper distance" is the distance between two world events occurring at the same time; this requires the use of a meter stick that is at rest in a frame in which the two events are simultaneous. The world line of a particle moving with speed v is described, in a fixed, unprimed frame, by coordinates (t, x(t)), where

v = |dx|/dt. (12.1.7)
Consider differential motion along this world line. The proper time on that world line advances not by dt, which is the time interval measured in the unprimed frame, but by dt′, the time interval measured by a clock carried along with the particle. The same interval can be worked out in the fixed frame, yielding ds, and in the frame of the particle, yielding ds′:

ds² = c² dt² − v² dt²,  ds′² = c² dt′². (12.1.8)
Since these are known to be equal, we obtain for the proper time

dt′ = dt √(1 − v²/c²). (12.1.9)

These equations include the result that, except for the factor c, proper time and proper distance are the same thing in this case. For finite intervals the proper time is obtained by integration:

t₂′ − t₁′ = ∫_{t₁}^{t₂} √(1 − v²/c²) dt. (12.1.10)

It will turn out that the use of proper time, rather than time measured in any particular frame, is the appropriate independent variable for describing the kinematics of the motion of a particle, i.e., the relativistic generalization of Newton's law. Recall the twin paradox, according to which a portable clock, carried away from and then returned to a stationary clock, gains less time than does the stationary clock. This leads to a seemingly paradoxical "principle of greatest time" according to which, of the motions from the initial to the final world-point, free motion takes the greatest proper time. Superficially it seems difficult to reconcile this with the principle that a straight line is the shortest distance between two points, but that's relativity!
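Eq. (12.1.10) is easy to evaluate numerically. In the sketch below (units with c = 1; the trip profile is an arbitrary example of mine) a traveler moves at 0.8c for a coordinate-time duration T, so the elapsed proper time should be T/γ = 0.6T:

```python
import numpy as np

c = 1.0
T = 10.0
t = np.linspace(0.0, T, 200001)
v = np.full_like(t, 0.8*c)                     # constant speed 0.8c

f = np.sqrt(1.0 - v**2/c**2)                   # integrand of Eq. (12.1.10)
tau = np.sum(0.5*(f[1:] + f[:-1])*np.diff(t))  # trapezoidal integration
assert abs(tau - 0.6*T) < 1e-9                 # gamma = 1/0.6 at v = 0.8c
```

Any v(t) profile can be substituted; free (inertial) motion between fixed endpoints maximizes τ, in accord with the "principle of greatest time."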
12.1.5. The Lorentz Transformation

In the so-called Galilean relativity, time is universal, the same in all coordinate frames. In Einstein relativity, space and time coordinates transform jointly. From an unprimed frame K at rest to a primed, identically oriented frame K′ moving with uniform speed V along the x-axis, the coordinates transform according to the Lorentz transformation, derived as follows. Except for signs, the metric given in Eq. (12.1.4) is the same as the Pythagorean formula in Euclidean geometry. In Euclidean geometry the transformations that preserve distances are rotations or translations or combinations of the two. Here we are referring to the relation between the coordinates in frames K and K′. By insisting that the origins coincide initially we exclude translations and are left with "rotations" of the form

x = x′ cosh ψ_V + ct′ sinh ψ_V,
ct = x′ sinh ψ_V + ct′ cosh ψ_V. (12.1.11)
Substitution into Eq. (12.1.4) verifies that indeed ds = ds′. The occurrence of hyperbolic functions instead of trigonometric functions is due to the negative sign in Eq. (12.1.4). Consider the origin of the K′ system, x′ = 0. By Eq. (12.1.11), the motion of that point in frame K is described by
352
RELATIVISTIC MECHANICS
x = ct′ sinh ψ_V,  ct = ct′ cosh ψ_V, (12.1.12)
and hence

x/ct = tanh ψ_V,  or  tanh ψ_V = V/c. (12.1.13)
Using properties of hyperbolic functions, this yields

sinh ψ_V = γ_V V/c,  cosh ψ_V = γ_V,  where  γ_V = 1/√(1 − V²/c²). (12.1.14)
Substituting back into Eq. (12.1.11), we have the Lorentz transformation equations

x = γ_V (x′ + β_V ct′),
y = y′,
z = z′,
ct = γ_V (β_V x′ + ct′), (12.1.15)

where

β_V = V/c,  γ_V = 1/√(1 − β_V²). (12.1.16)
The inverse relation can be worked out algebraically; by symmetry the result must be equivalent to switching primed and unprimed variables and replacing β_V by −β_V:

x′ = γ_V (x − β_V ct),
y′ = y,
z′ = z,
ct′ = γ_V (−β_V x + ct). (12.1.17)
12.1.6. Transformation of Velocities

The incremental primed coordinates of a particle moving with velocity v′ in the moving frame are given by dx′ = v_x′ dt′, dy′ = v_y′ dt′, dz′ = v_z′ dt′. Using Eq. (12.1.15), the corresponding unprimed coordinates are therefore

dx = γ_V (v_x′ dt′ + β_V c dt′),
dy = v_y′ dt′,
dz = v_z′ dt′,
c dt = γ_V (β_V v_x′ dt′ + c dt′). (12.1.18)
The fixed-frame, unprimed, velocity components are obtained by dividing the first three of these equations by the last one:

v_x = (v_x′ + V)/(1 + v_x′V/c²),
v_y = (v_y′/γ_V)/(1 + v_x′V/c²),
v_z = (v_z′/γ_V)/(1 + v_x′V/c²). (12.1.19)
In the special case that the particle is moving along the x′ axis with speed v′ this becomes

v = (v′ + V)/(1 + v′V/c²). (12.1.20)
Of all the formulas of relativity, this is, to me, the most counterintuitive, since the truth of the same formula, but with the denominator set to one, seems so “obvious.” It is easy to see from these formulas that the particle velocity cannot exceed c in any frame.
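That the composed velocity never exceeds c is easy to confirm numerically from Eq. (12.1.20) (units with c = 1; a small sketch of my own):

```python
def add_velocity(v_prime, V, c=1.0):
    """Relativistic velocity addition, Eq. (12.1.20)."""
    return (v_prime + V) / (1.0 + v_prime*V/c**2)

# Galilean addition of 0.9c and 0.9c would give 1.8c; here the sum stays below c
assert abs(add_velocity(0.9, 0.9) - 1.8/1.81) < 1e-12

# a light pulse (v' = c) moves at c in the unprimed frame too
assert add_velocity(1.0, 0.5) == 1.0
```

Setting the denominator to one recovers the "obvious" Galilean formula; the correction term v′V/c² is what caps the result at c.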
12.1.7. Four-Vectors

The formulas of relativity are made much more compact by using the four-component tensor notation introduced by Einstein. The basic particle coordinate 4-vector is given by xⁱ, i = 0, 1, 2, 3, where

x⁰ = ct;  x¹ = x;  x² = y;  x³ = z. (12.1.21)
Though it grates on the ear, and though it includes also time, for want of a better name this is called the 4-position of the particle. Any other four-component object whose components in different frames are related by Eqs. (12.1.15) is also called a 4-vector. Hence the 4-vector components Aⁱ (lowercase Roman indices are always assumed to range over 0, 1, 2, and 3) and A′ⁱ are related by
A⁰ = γ_V (A′⁰ + β_V A′¹),
A¹ = γ_V (β_V A′⁰ + A′¹),
A² = A′²,
A³ = A′³. (12.1.22)
The components Aⁱ are called contravariant. Also introduced are covariant components given by

A₀ = A⁰;  A₁ = −A¹;  A₂ = −A²;  A₃ = −A³. (12.1.23)
This is also called "lowering the index." The same algebra that assured the invariance of s₁₂ (see Eq. (12.1.6)) assures the invariance of the combination

(A⁰)² − (A¹)² − (A²)² − (A³)² = Σ_{i=0}^{3} AⁱA_i ≡ AⁱA_i. (12.1.24)
Because of its invariance one calls AⁱA_i a 4-scalar. From two 4-vectors Aⁱ and Bⁱ a scalar, called the scalar product, can be formed:

AⁱB_i ≡ A_iBⁱ = A⁰B₀ + A¹B₁ + A²B₂ + A³B₃. (12.1.25)
Its invariance is assured by the same algebra. A 16-component object, called a 4-tensor, can be formed from all the products of the components of two 4-vectors:
Tⁱʲ = AⁱBʲ,  i, j = 0, 1, 2, 3. (12.1.26)
Any 16-component object transforming by the same formulas as AⁱBʲ is also called a 4-tensor. If Tⁱʲ = Tʲⁱ, as would be true of AⁱBʲ if Aⁱ and Bⁱ happened to be equal, then the tensor is said to be "symmetric." If Tⁱʲ = −Tʲⁱ it is "antisymmetric." The operation of lowering or raising indices can be accomplished with the so-called metric tensor

g^{ij} = g_{ij} = diag(1, −1, −1, −1). (12.1.27)
Thus

A_i = g_{ij} Aʲ. (12.1.28)
The indices of tensors of any order can be raised and lowered the same way. Note that g^{ij} and g_{ij} themselves are consistent with this. Also the mixed tensor g^i_j is equal to the "Kronecker delta"

g^i_j = δ^i_j. (12.1.29)
The terminology "metric" for g_{ij} is justified by the fact that Eq. (12.1.4) can be written

ds² = g_{ij} dxⁱ dxʲ. (12.1.30)
The tensor g^{ij} has the same components in all coordinate frames. Show this. Also it is symmetric.
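These invariance and index-lowering statements can be checked with a concrete matrix representation (NumPy; β = 0.6 is an arbitrary choice of mine for illustration):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])      # metric tensor, Eq. (12.1.27)

def boost_x(beta):
    """Lorentz boost along x acting on (ct, x, y, z), cf. Eq. (12.1.15)."""
    g = 1.0/np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = g*beta
    return L

L = boost_x(0.6)
# interval invariance: the boost preserves the metric, Lambda^T eta Lambda = eta
assert np.allclose(L.T @ eta @ L, eta)

# lowering an index with eta flips the signs of the spatial components, Eq. (12.1.23)
A = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(eta @ A, [1.0, -2.0, -3.0, -4.0])
```

The first assertion is the matrix form of the statement that s₁₂ (and hence every 4-scalar built with g_{ij}) is frame-independent.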
12.1.8. Antisymmetric 4-Tensors

From the components p_x, p_y, and p_z of a polar 3-vector, and the components a_x, a_y, and a_z of an axial 3-vector, it is possible to construct an antisymmetric 4-tensor according to

          (  0    −p_x  −p_y  −p_z )
(Aⁱᵏ)  =  (  p_x   0    −a_z   a_y )
          (  p_y   a_z   0    −a_x )
          (  p_z  −a_y   a_x   0   ). (12.1.31)
This is a very important form in electrodynamics. By restricting transformations to pure rotations between frames at rest relative to each other, it can be checked that the elements p_x, p_y, and p_z transform among themselves like the components of a polar 3-vector, partially justifying Eq. (12.1.31). Also, an axial 3-vector and an antisymmetric 3-tensor are closely related, as will now be shown. The cross product C = A × B of two polar vectors A and B can be represented in component form as

C_i = e_{ijk} AʲBᵏ, (12.1.32)

where e_{ijk} is the usual antisymmetric symbol. The quantity C, unlike A and B, is an axial vector (also known as a pseudo-vector); the components of A and B switch sign upon inversion of the coordinate axes while those of C do not. But C_i can also be related to an antisymmetric tensor Cʲᵏ = AʲBᵏ − AᵏBʲ according to

C_i = (1/2) e_{ijk} Cʲᵏ. (12.1.33)
12.1.9. The 4-Gradient of a 4-Scalar Function

One can form a four-component object called the 4-gradient by differentiating a 4-scalar function φ(ct, x, y, z) with respect to its four arguments. These derivatives appear naturally in the expression for the differential dφ:

dφ = (∂φ/∂xⁱ) dxⁱ. (12.1.34)

From Eq. (12.1.25) it can be seen that for dφ to be a scalar quantity, as it should be, this 4-gradient must be a covariant tensor. A compact notation is

∂_i φ ≡ ∂φ/∂xⁱ. (12.1.35)
12.1.10. The 4-Velocity and 4-Acceleration

The 4-velocity is defined by

uⁱ = dxⁱ/ds. (12.1.36)
Comparison with Eq. (12.1.9) shows that, except for a factor c, uⁱ is the rate of change of the 4-position with respect to proper time:

uⁱ = (dx⁰/ds, dx/ds) = (γ_v, γ_v v/c), (12.1.37)
where v is the ordinary particle velocity. As defined, the 4-velocity does not have the dimensions of velocity. Rather it is scaled by a factor c so that it is the velocity in units in which c = 1; as far as units are concerned, that makes it, like β_v, the ratio of the particle speed to the velocity of light. Because ds² = dxⁱ dx_i, the 4-scalar formed from uⁱ is constant, independent of the particle's three-velocity:

uⁱu_i = 1; (12.1.38)
this makes calling it the "magnitude squared" of uⁱ potentially misleading and hence inappropriate. The 4-acceleration wⁱ is defined similarly:

wⁱ = duⁱ/ds. (12.1.39)

Differentiating Eq. (12.1.38), the 4-velocity and 4-acceleration are seen to be mutually "orthogonal":

uⁱw_i = 0. (12.1.40)
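Eqs. (12.1.38) and (12.1.40) lend themselves to a quick numerical check (units with c = 1; the rotating velocity history is an arbitrary example of mine):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

def four_velocity(v):
    """u^i = (gamma, gamma*v) in units with c = 1, Eq. (12.1.37)."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0/np.sqrt(1.0 - v @ v)
    return np.concatenate(([gamma], gamma*v))

# u^i u_i = 1 for any subluminal velocity, Eq. (12.1.38)
u = four_velocity([0.3, 0.4, 0.0])
assert abs(u @ eta @ u - 1.0) < 1e-12

# w^i = du^i/ds along a trajectory with |v| = 0.5 rotating in the plane;
# check u^i w_i = 0, Eq. (12.1.40), by central differences (ds = dt/gamma)
def u_of_t(t):
    return four_velocity([0.5*np.cos(t), 0.5*np.sin(t), 0.0])

t0, dt = 0.7, 1e-6
ds = np.sqrt(1.0 - 0.25)*dt
w = (u_of_t(t0 + dt) - u_of_t(t0 - dt))/(2*ds)
assert abs(u_of_t(t0) @ eta @ w) < 1e-6
```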
These simple results have considerable significance in relativistic mechanics.

12.2. THE RELATIVISTIC PRINCIPLE OF LEAST ACTION
In nonrelativistic mechanics the Lagrange equation is derivable from the principle of least action, according to which the actual trajectory taken by a particle, between times t₀ and t, minimizes the action function S defined by
S = ∫_{t₀}^{t} L dt. (12.2.1)
When the minimized function S(x₀, t₀; x, t) is expressed in terms of the coordinates of the upper limit x and t, with the lower endpoint fixed, it satisfies the Hamilton-Jacobi relations
p = ∂S/∂x,  H = −∂S/∂t, (12.2.2)

where H is the Hamiltonian corresponding to Lagrangian L.
It is straightforward to generalize what has just been stated in such a way as to satisfy the requirements of relativity while at the same time leaving nonrelativistic relationships (i.e., Newton's Law) valid when speeds are small compared to c. Owing to the homogeneity of space and the homogeneity of time, the relativistically generalized action S cannot depend on the particle's coordinate 4-vector x^i. Furthermore, it must be a relativistic scalar, as otherwise it would have directional properties forbidden by the isotropy of space. Owing to Eq. (12.1.38), it is impossible to form a scalar other than a constant using the 4-vector u^i. In short, the only possibility for the action of a free particle (i.e., one subject to no force) is
S = (−mc) ∫_{P₁}^{P₂} ds = (−mc²) ∫_{t₁}^{t₂} √(1 − v²/c²) dt,  (12.2.3)
where ds, the invariant interval defined in Eq. (12.1.4), is the proper time multiplied by c. The integral is to be evaluated between initial and final world points P₁ and P₂. A priori the multiplicative factor could be any constant, but it will be seen below why the factor has to be (−mc). The negative sign is significant. It corresponds to the seemingly paradoxical result mentioned above that the free-particle path from P₁ to P₂ maximizes the proper time taken. Comparing Eq. (12.2.1) and (12.2.3), it can be seen that the free-particle Lagrangian is
L(x, v) = −mc² √(1 − v²/c²).  (12.2.4)
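The "paradoxical" maximization of proper time can be checked numerically. The sketch below (an illustration, not from the text; the wiggle amplitude is an arbitrary assumed value, units with c = 1) compares the proper time of the straight worldline with that of a slightly wiggly one between the same endpoints:

```python
import math

# The straight worldline between two events has the largest proper time;
# a numeric sketch in units with c = 1 (wiggle amplitude assumed).
def proper_time(x, t0=0.0, t1=1.0, n=2000):
    """tau = integral of sqrt(1 - v^2) dt along worldline x(t), midpoint rule."""
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        tm = t0 + (k + 0.5) * dt
        v = (x(tm + 1e-6) - x(tm - 1e-6)) / 2e-6   # numerical velocity
        total += math.sqrt(1.0 - v * v) * dt
    return total

straight = proper_time(lambda t: 0.0)                       # free-particle path
wiggly = proper_time(lambda t: 0.05 * math.sin(math.pi * t))

# Any detour reduces the proper time, so the action S = -mc^2 tau of
# Eq. (12.2.3) is minimized on the straight path.
assert wiggly < straight
```

Because the action carries the factor −mc², the maximum of proper time is the minimum of S, reconciling the "least action" language with proper-time maximization.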
12.3. ENERGY AND MOMENTUM
Using standard Lagrangian formalism the momentum p is defined by
p = ∂L/∂v = mv / √(1 − v²/c²).  (12.3.1)
For v small compared to c this gives the nonrelativistic result p ≈ mv. This is the relation that fixed the constant factor in the initial definition of the Lagrangian. Using Eq. (12.2.4) and Eq. (12.3.1), one obtains the Hamiltonian H and hence the energy E by
E = p · v − L = mc² / √(1 − v²/c²) = γ_v mc².  (12.3.2)
For v small compared to c this gives
E ≈ mc² + ½ mv²,  (12.3.3)
which is the classical result for the kinetic energy, except for the additive constant E₀ = mc², which is known as the rest energy. An additive constant like this has no effect in the Lagrangian description. From Eq. (12.3.1) and Eq. (12.3.2) come the important identities
E² = p²c² + m²c⁴,  p = (E/c²) v.  (12.3.4)
For massless particles like photons these reduce to v = c and
p = E/c.  (12.3.5)
This formula also becomes progressively more valid for a massive particle as its total energy becomes progressively large compared to mc². As stated previously, m is the "rest mass," a constant quantity, and there is no question of "mass increasing with velocity" as occurs in some descriptions of relativity, such as the famous "E = mc²," which is incorrect in our formulation. Remembering to express H in terms of p, the relativistic Hamiltonian of a free particle is given by
H(p) = √( m²c⁴ + p²c² ).  (12.3.6)
H depends on p but not on q.

12.3.1. 4-Vector Notation
Referring back to Eq. (12.1.37), it can be seen that p, as given by Eq. (12.3.1), and E, as given by Eq. (12.3.2), are closely related to the 4-velocity u^i. We define a momentum 4-vector p^i by
p^i = mc u^i = ( E/c , p ).  (12.3.7)
We expect that p^i p_i, the scalar product of p^i with itself, should, like all scalar products, be invariant. The first of Eqs. (12.3.4) shows this to be true:
p^i p_i = E²/c² − p² = m²c².  (12.3.8)
Because they belong to the same 4-vector, the components of p and E in different coordinate frames are related according to the Lorentz transformation, Eq. (12.1.15).
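The frame-independence of the invariant in Eq. (12.3.8) can be verified directly. This minimal sketch (illustrative values, not from the text; units with c = 1; one spatial dimension, with the boost sign convention assumed) builds p^i = (E/c, p) and checks the invariant before and after a Lorentz transformation:

```python
import math

# The invariant p^i p_i = m^2 c^2 of Eq. (12.3.8), checked before and
# after a Lorentz boost; numbers are assumed illustrative values (c = 1).
def four_momentum(m, v, c=1.0):
    """p^i = mc u^i = (E/c, p) for motion along x, Eq. (12.3.7)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (gamma * m * c, gamma * m * v)

def boost(p, beta):
    """Lorentz transformation of (p^0, p^1) as in Eq. (12.1.15)."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return (g * (p[0] - beta * p[1]), g * (p[1] - beta * p[0]))

m, v = 2.0, 0.6
p = four_momentum(m, v)
assert abs((p[0] ** 2 - p[1] ** 2) - m ** 2) < 1e-9   # E^2/c^2 - p^2 = m^2 c^2

q = boost(p, 0.3)                   # the invariant is the same in any frame
assert abs((q[0] ** 2 - q[1] ** 2) - m ** 2) < 1e-9
```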
12.4. RELATIVISTIC HAMILTON–JACOBI THEORY
Corresponding to the Hamiltonian of Eq. (12.3.6), the Hamilton–Jacobi equation is
( ∂S/∂t )² = c² ( ∂S/∂x )² + c² ( ∂S/∂y )² + c² ( ∂S/∂z )² + m²c⁴.  (12.4.1)
Since the relations p = ∂S/∂x and E = −∂S/∂t of Eq. (12.2.2) were derived purely from the calculus of variations, without reference to physical meaning, they must remain valid in relativistic mechanics. Nevertheless we will rederive them in order to illustrate an abbreviated manipulation. The variations δS in the action accompanying a variation δx^i(t) away from the true world trajectory are what establish the equations of motion. Here δx^i(t) is an arbitrary function. Variation of the integrand of Eq. (12.2.3) yields
δ ds = δ √(dx_i dx^i) = [ δ(dx_i) dx^i + dx_i δ(dx^i) ] / (2 ds) = u_i d(δx^i) = d(u_i δx^i) − δx^i du_i.  (12.4.2)
The last line is preparatory to integration by parts. The variation of the action is
δS = −mc ∫_{P₁}^{P₂} δ ds.  (12.4.3)
With δ ds given by Eq. (12.4.2), the integration limits can be held fixed or varied as we wish. If the endpoints are held fixed, the term coming from the first term of the last line of Eq. (12.4.2) vanishes. In that case, since the principle of least action assures that δS vanishes, and since δx^i is arbitrary, the vanishing of the 4-acceleration w^i = du^i/ds follows; this is appropriate for a force-free situation. When the upper endpoint of the integral in Eq. (12.4.3) is varied, but with the requirement that the trajectory be a true one, then the term in the integral coming from the second term in Eq. (12.4.2) vanishes, leaving
δS = −mc u_i δx^i.  (12.4.4)
This yields
p_i = mc u_i = ( E/c , −p ) = −∂S/∂x^i,  (12.4.5)
which agrees with Eqs. (12.3.7). Notice that the contravariant and covariant indices magically take care of the signs. Also, the result is consistent with Eq. (12.1.35); the 4-gradient of a scalar is a covariant 4-vector.
12.5. FORCED MOTION
If the 4-velocity is to change, it has to be because force is applied to the particle. It is natural to define the 4-force g^i by the relation
g^i = dp^i/ds = ( γ_v F · v / c² , γ_v F / c ),  (12.5.1)
where the classically defined Newtonian force is
F = dp/dt.  (12.5.2)
The time component γ_v F · v / c² is related to the rate of work done on the particle by the external force. Note that it vanishes in the case that F · v = 0, as is true for a charged particle in a purely magnetic field.
12.6. GENERALIZATION OF THE ACTION TO INCLUDE ELECTROMAGNETIC FORCES
The action for a free particle
S = −mc ∫ ds  (12.6.1)
was selected because, except for an arbitrary multiplicative factor, ds is the only first-order-differential, origin-independent 4-scalar that can be constructed. The constant factor was selected to assure correspondence with Newtonian mechanics in the nonrelativistic limit. We now generalize this by introducing an initially arbitrary 4-vector function of position A^i(x^i) = (φ, A), and take for the action
S = ∫ ( −mc ds − e A_i dx^i ).  (12.6.2)
The integrand certainly satisfies the requirement of being a relativistic invariant. Like the factor −mc, chosen to make free motion come out right, the factor e is chosen to make this action principle lead to the forces of electromagnetism, with e being the charge on the particle. SI units (also known as MKS units) are being employed. The factors φ and A are called the scalar and vector potentials, respectively. Spelling out the integrand more explicitly, and making the differential be dt, so as to be able to
extract the Lagrangian, the action is
S = ∫ ( −mc² √(1 − v²/c²) + e A · v − e φ ) dt.  (12.6.3)
This shows that the Lagrangian is
L = −mc² √(1 − v²/c²) + e A · v − e φ.  (12.6.4)
(Another candidate for the action that would be consistent with relativistic invariance is ∫ A(x^i) ds, where A(x^i) is a scalar function of position, but that would not lead to electromagnetism.) Once the Lagrangian has been selected, one must slavishly follow the prescriptions of Lagrangian mechanics in order to introduce the "momentum" P, conjugate to x, and to obtain the equations of motion. This newly introduced (uppercase¹) momentum will be called the generalized momentum, to distinguish it from the previously introduced "ordinary momentum" or "mechanical momentum" p. You should continue to think of the (lowercase) quantity p as the generalization of the familiar mass times velocity of elementary mechanics. The generalized momentum P has a more formal significance connected with the Lagrange equations. It is given by
P = ∂L/∂v = mv / √(1 − v²/c²) + e A = p + e A.  (12.6.5)
Notice in particular that, unlike p, the generalized momentum P depends explicitly on 4-position x^i. We need only follow the rules to define the Hamiltonian by
H = P · v − L,  (12.6.6)
which must still, however, be expressed in terms of P rather than v. In Eq. (12.3.4), the rest mass m, the ordinary momentum p, and the ordinary or mechanical energy E_kin = mc²/√(1 − v²/c²) were related by
E_kin² = p²c² + m²c⁴.  (12.6.7)
Here we have used the symbol E_kin, which, since it includes the rest energy, differs by that much from being a generalization of the "kinetic energy" of Newtonian mechanics. Nevertheless it is convenient to have a symbol for the energy of a particle
¹Some authors reverse the roles of uppercase P and lowercase p.
that accompanies its very existence and includes its energy of motion but does not include any "potential energy" due to its position in a field of force. Using Eq. (12.6.5) and Eq. (12.6.6), this same relation can be expressed in terms of P and H:
( H − eφ )² = ( P − eA )² c² + m²c⁴.  (12.6.8)
Solving for H yields
H(x, P) = √( m²c⁴ + (P − eA(x^i))² c² ) + e φ(x^i).  (12.6.9)
Remember that the Hamiltonian is important in two ways. One is formal; differentiating it appropriately leads to Hamilton's equations. The other deals with its numerical value, which is called the energy, at least in those cases where it is conserved. Eq. (12.6.9) should seem entirely natural; the square root term gives the mechanical energy (remember that the second term under the square root is just c² times the square of the ordinary momentum) and the other term gives the energy the particle has by virtue of its charge e being located at position x^i where the electric potential function is φ(x^i). Corresponding to this Hamiltonian, the Hamilton–Jacobi equation is
( ∂S/∂t + eφ )² − ( ∇S − eA )² c² − m²c⁴ = 0.  (12.6.10)
12.7. DERIVATION OF THE LORENTZ FORCE LAW
To obtain the equations of motion for our charged particle, we write the Lagrange equations with L given by Eq. (12.6.4). One term is
∇L = e ∇( A(x^i) · v ) − e ∇φ(x^i).  (12.7.1)
Remembering that the very meaning of the partial derivative symbol in the Lagrange equation is that v is to be held constant, the first term becomes
e ∇( A · v ) = e ( v · ∇ ) A + e v × ( ∇ × A ),  (12.7.2)
where a well-known vector identity has been used. The meaning of the expression (v · ∇)A is certainly unambiguous in Cartesian coordinates. Its meaning is ambiguous in curvilinear coordinates, but we assume Cartesian coordinates without loss of generality since this term will be eliminated shortly. With Eq. (12.7.2), and using Eq. (12.6.5), the Lagrange equation becomes
(d/dt) p + e (d/dt) A = e ( v · ∇ ) A + e v × ( ∇ × A ) − e ∇φ.  (12.7.3)
At this point a great bargain appears. For any function F(x, t), the partial derivatives and the total derivative are related by
dF/dt = ∂F/∂t + ( v · ∇ ) F.  (12.7.4)
The first term gives the change of F at a fixed point in space, and the second term gives the change due to the particle's motion. This permits a hard-to-evaluate term on the left-hand side, dA/dt, and a hard-to-evaluate term on the right-hand side, (v · ∇)A, to be combined to make an easy-to-evaluate term, yielding
dp/dt = −e ∂A/∂t − e ∇φ + e v × ( ∇ × A ).  (12.7.5)
At this point we introduce the electric field intensity E and the magnetic field intensity B defined by
E = −∂A/∂t − ∇φ,  B = ∇ × A.  (12.7.6)
Finally we obtain the so-called Lorentz force law
dp/dt = e E + e v × B.  (12.7.7)
Since the A^i was arbitrary, the electric and magnetic fields are completely general, consistent with Eq. (12.7.6). From its derivation, the Lorentz equation, though not manifestly covariant, has unquestionable relativistic validity. It describes the evolution of spatial components. One can look for the corresponding time-component evolution equation. It is
dE_kin/dt = v · dp/dt,  (12.7.8)
and gives the change in mechanical energy due to the applied field. (Check it using Eq. (12.5.2).) This is entirely consistent with the Newtonian formula that the rate of change of energy is the rate that the external force (given by dp/dt) does work, as given by the right-hand side of Eq. (12.7.8). Under the Lorentz force law, Eq. (12.7.7), since the magnetic force is normal to v, it follows that a magnetic field can never change the particle energy. Rather, the rate of change of energy is given by
dE_kin/dt = e E · v,  (12.7.9)
just as should have been expected.

12.8. GAUGE INVARIANCE
Though the 4-potential A^i ≡ (φ, A) was introduced first, it is the electric and magnetic fields E and B that manifest themselves physically through the forces acting on charged particles. They must be determinable uniquely from the physical conditions. But because E and B are obtained from A^i by differentiation, there is a lack of uniqueness in A^i, much like the "constant of integration" in an indefinite integral. In
electrostatics this indeterminacy has already been encountered; adding a constant to the electric potential has no observable effect. With the 4-potential, the lack of determinacy can be more complicated because a change in φ can be compensated for by a change in A. For mathematical methods that are based on the potentials, this can have considerable impact on the analysis, though not on the (correctly) calculated E and B fields. The invariance of the answers to transformations of the potentials is called "gauge invariance." The gauge invariance of the present theory follows immediately from the action principle of Eq. (12.6.2). Suppose the action in that equation is altered according to
A_i → A_i + ∂f/∂x^i,  (12.8.1)
where f(x) is an arbitrary function of position. When integrating over the revised Lagrangian, the extra term, being a perfect differential evaluated at the fixed endpoints, does not alter the action and hence does not affect determination of extremals. As a result, the physics is unaffected by this "change of gauge." It is instructive also to confirm that this change in A^i has no effect on E and B when evaluated by Eq. (12.7.6).
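That confirmation can be carried out numerically. The sketch below (illustrative only; the static 2-D potentials φ, A and the gauge function f are assumed example choices, not taken from the text) applies A → A + ∇f and checks by finite differences that the fields of Eq. (12.7.6) are unchanged:

```python
import math

# Finite-difference check that the gauge change A -> A + grad f leaves
# E and B of Eq. (12.7.6) unchanged. The static 2-D potentials and f
# below are assumed illustrative choices.
def phi(x, y): return x * x - y
def Ax(x, y):  return y * y
def Ay(x, y):  return math.sin(x)
def f(x, y):   return x * y + x ** 3      # arbitrary gauge function

h = 1e-5
def d(g, x, y, axis):
    """Central-difference partial derivative of g(x, y)."""
    if axis == 0:
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

def fields(phi_, Ax_, Ay_, x, y):
    """Static case of Eq. (12.7.6): E = -grad phi, B_z = dAy/dx - dAx/dy."""
    return (-d(phi_, x, y, 0), -d(phi_, x, y, 1),
            d(Ay_, x, y, 0) - d(Ax_, x, y, 1))

# gauge-transformed vector potential A -> A + grad f
Ax2 = lambda x, y: Ax(x, y) + d(f, x, y, 0)
Ay2 = lambda x, y: Ay(x, y) + d(f, x, y, 1)

x0, y0 = 0.4, -0.7
before = fields(phi, Ax, Ay, x0, y0)
after = fields(phi, Ax2, Ay2, x0, y0)
assert all(abs(a - b) < 1e-4 for a, b in zip(before, after))
```

The B_z components agree because the added pieces are the mixed partials ∂²f/∂x∂y and ∂²f/∂y∂x, which cancel in the curl.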
12.9. TRAJECTORY DETERMINATION
12.9.1. Motion in a Constant Uniform Electric Field
As an example using these equations consider a particle with charge e moving in a uniform electric field E directed parallel to the x-axis. Nonrelativistically one would solve m dv/dt = eE, and this equation remains relativistically valid if cast in the form
dp/dt = e E.  (12.9.1)
With E constant this can be integrated once immediately,
p = p₀ + e E t,  (12.9.2)
but further integration requires p to be expressed in terms of v = ẋ. This becomes progressively more important as the motion becomes more relativistic, v ≈ c, since, though p can increase without limit, v cannot. The best procedure for finding v is to find E_kin, and use v = c²p/E_kin. To simplify the algebra a bit, with no essential loss of generality, consider the special initial conditions illustrated in Fig. 12.9.1, with initial energy E₀, initial momentum p₀ŷ parallel to the y-axis and normal to E, and origin chosen so the particle starts from x = E₀/(eE), y = 0. From Eq. (12.9.2), using Eq. (12.6.7),
p_x = e E t,  p_y = p₀,  E_kin = √( E₀² + (c e E t)² ).  (12.9.3)
FIGURE 12.9.1. Trajectory followed by a charged particle in a uniform electric field. The initial momentum is transverse to the electric field.
Then, using Eq. (12.3.4),
v_x = c² e E t / √( E₀² + (c e E t)² ),  v_y = p₀ c² / √( E₀² + (c e E t)² ).  (12.9.4)
Notice the superficially surprising result that v_y approaches zero for large t. Integrating these yields
x = (1/(eE)) √( E₀² + (c e E t)² ),  y = ( p₀ c / (eE) ) sinh⁻¹( c e E t / E₀ ).  (12.9.5)
The orbit equation can be obtained by eliminating t from this equation:
x = ( E₀ / (eE) ) cosh( e E y / (p₀ c) ).  (12.9.6)
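The velocity behavior just described can be checked by direct numerical integration. In the sketch below (illustrative values only, units with c = e = E = m = 1, none of which are from the text), dp/dt = eE of Eq. (12.9.1) is stepped forward, the velocity is recovered through v = c²p/E_kin, and the result is compared against Eqs. (12.9.3) and (12.9.4):

```python
import math

# Numerical check of the uniform-E-field results; illustrative values
# (c = e = E = m = 1 assumed, field along x, initial momentum along y).
c = e = E = m = 1.0
p0 = 0.5
E0 = math.sqrt((m * c**2) ** 2 + (p0 * c) ** 2)     # initial energy

def exact_v(t):
    """Velocity from Eqs. (12.9.3)-(12.9.4): v = c^2 p / E_kin."""
    Ekin = math.sqrt(E0 ** 2 + (c * e * E * t) ** 2)
    return c**2 * e * E * t / Ekin, c**2 * p0 / Ekin

# integrate dp/dt = eE, Eq. (12.9.1), in small steps
px, py, t, dt = 0.0, p0, 0.0, 1e-3
for _ in range(5000):
    px += e * E * dt
    t += dt
Ekin = math.sqrt((m * c**2) ** 2 + (px**2 + py**2) * c**2)
vx, vy = c**2 * px / Ekin, c**2 * py / Ekin

ex, ey = exact_v(t)
assert abs(vx - ex) < 1e-9 and abs(vy - ey) < 1e-9
assert math.hypot(vx, vy) < c        # speed stays below c although p grows
assert ey < c**2 * p0 / E0           # v_y has decreased from its initial value
```

The check that v_y shrinks while p_y stays fixed makes the "superficially surprising" result above concrete: the growing energy in the denominator of v = c²p/E_kin suppresses the transverse velocity.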
12.9.2. Motion in a Constant Uniform Magnetic Field
The equation of motion in a magnetic field is
dp/dt = e v × B.  (12.9.7)
Since the force depends on v, this equation cannot be integrated immediately. However, the motion is even simpler in the respect that E_kin is conserved in a pure magnetic field. This follows immediately from Eq. (12.7.8):
dE_kin/dt = v · ( e v × B ) = 0.  (12.9.8)
As a result, all of the quantities β_v, γ_v, E, v, and p are constants of the motion. Assuming the magnetic field B ẑ is directed along the z-axis, Eq. (12.9.7) becomes
dv/dt = ( e c² B / E₀ ) v × ẑ.  (12.9.9)
Introducing the "cyclotron frequency" (constant in the nonrelativistic regime, but energy-dependent at high energy)
ω_c = e c² B / E₀,  (12.9.10)
the velocity components satisfy
v̇_x = ω_c v_y,  v̇_y = −ω_c v_x,  v̇_z = 0.  (12.9.11)
With appropriate initial conditions these yield
v_x = v_⊥ cos ω_c t,  v_y = −v_⊥ sin ω_c t,  v_z = const.  (12.9.12)
Integrating these yields the expected motion on a cylinder of radius r given by
r = v_⊥ / ω_c = p_⊥ / (eB).  (12.9.13)
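Both the energy conservation of Eq. (12.9.8) and the circular orbit can be seen in a direct integration of Eq. (12.9.7). The following sketch (illustrative values only; c = e = m = 1 assumed, with an arbitrary field strength) steps the momentum with a simple Euler rule and confirms that |p| is essentially unchanged and that the particle returns near its starting point after one cyclotron period:

```python
import math

# Relativistic motion in a uniform magnetic field B z-hat; illustrative
# values (c = e = m = 1) assumed, motion in the x-y plane.
c = e = m = 1.0
B = 2.0
p_perp = 0.75                                        # initial |p|, along x

Ekin = math.sqrt(m**2 * c**4 + (p_perp * c) ** 2)    # conserved, Eq. (12.9.8)
wc = e * c**2 * B / Ekin                             # cyclotron frequency, Eq. (12.9.10)
r = p_perp / (e * B)                                 # expected radius, Eq. (12.9.13)

px, py = p_perp, 0.0
x = y = 0.0
dt = 1e-4
for _ in range(int(2 * math.pi / (wc * dt))):        # roughly one full turn
    vx, vy = c**2 * px / Ekin, c**2 * py / Ekin      # v = c^2 p / E_kin
    px, py = px + e * vy * B * dt, py - e * vx * B * dt   # dp/dt = e v x B
    x, y = x + vx * dt, y + vy * dt

# the magnetic force leaves |p| (hence the energy) essentially unchanged
assert abs(math.hypot(px, py) - p_perp) < 2e-3
# after one period the particle has returned close to its starting point
assert math.hypot(x, y) < 0.05 * r
```

Note that the frequency used in the loop is the energy-dependent ω_c of Eq. (12.9.10), not the nonrelativistic eB/m; using the latter would leave the orbit visibly unclosed at this momentum.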
12.10. THE LONGITUDINAL COORDINATE AS INDEPENDENT VARIABLE
Highly relativistic particles usually belong to "beams" of more or less parallel particles. As in optics, it is then convenient to distinguish between "transverse" coordinates x and y and a "longitudinal" coordinate, now to be called s, and to use s rather than t as the independent variable. (In the "paraxial approximation," if all particles are traveling at almost the speed of light, s and ct are approximately equal.) Ordinarily these coordinates are defined as "increments" relative to a "nominal" or "reference" particle that defines the center of the beam. That means that the triplet (x, y, t) is to be regarded as the quantities whose evolution as a function of s is to be described by the equations of motion. The motion of the reference particle is assumed to be known; it is given by (x₀(s), P₀ˣ(s), y₀(s), P₀ʸ(s), t₀(s), P₀ᵗ(s)), where Pᵗ will be defined shortly. For writing linearized equations we could define small differences, δx(s) = x(s) − x₀(s), etc., but instead, at a certain point below, we will simply redefine the coordinates (x, Pˣ, y, Pʸ, t, Pᵗ) as small deviations from the reference orbit. For now, they are absolute particle coordinates of a general particle in a global frame of reference. The transformation from t to s, as independent variable, in Hamiltonian language, is straightforward but confusing. For the moment suppress (y, Pʸ), since they enter
just like (x, Pˣ). The Hamiltonian (12.6.9) has the form
H = H(x, Pˣ, s, Pˢ, t) = √( m²c⁴ + (P − eA(x))² c² ) + e φ(x),  (12.10.1)
where the canonical momentum P and mechanical momentum p are related by
P = p + e A.  (12.10.2)
Of Hamilton's equations, the ones we will refer to below are ds aH _ ---aps' dt
or
dt -=(=) ds
-1
,
(12.10.3)
and
(12.10.4)
Define a new variable
Pᵗ = −H(x, Pˣ, s, Pˢ, t).  (12.10.5)
This is to be solved for Pˢ, with the answer expressed in terms of a function K, which will turn out to be the new Hamiltonian:
Pˢ = −K(x, Pˣ, t, Pᵗ, s).  (12.10.6)
From Eqs. (12.6.9) and (12.6.5), it can be seen that the numerical value of −Pᵗ is the total energy E = E_kin + eφ. The differential dPˢ can be obtained either directly from Eq. (12.10.6), or indirectly from Eq. (12.10.5), using Eq. (12.10.3). The results are
dPˢ = −(∂K/∂x) dx − (∂K/∂Pˣ) dPˣ − (∂K/∂t) dt − (∂K/∂Pᵗ) dPᵗ − (∂K/∂s) ds
    = ( −dPᵗ − (∂H/∂x) dx − (∂H/∂Pˣ) dPˣ − (∂H/∂t) dt − (∂H/∂s) ds ) ( ∂H/∂Pˢ )⁻¹.  (12.10.7)
Equating coefficients, and using Eq. (12.10.4), as well as the other Hamilton equations in the original variables, the equations of motion in the new variables can be written in Hamiltonian form, with derivatives with respect to s being symbolized by primes:
x′ = ∂K/∂Pˣ,  (Pˣ)′ = −∂K/∂x,  t′ = ∂K/∂Pᵗ,  (Pᵗ)′ = −∂K/∂t.  (12.10.8)
The manipulations that have been described can be performed explicitly, using Eqs. (12.10.1) and (12.10.5), with the result
K = −e A_∥ − √( (Pᵗ + eφ)²/c² − m²c² − (P_⊥ − e A_⊥)² ),  (12.10.9)
where components parallel to and perpendicular to the reference orbit have been introduced. (In field-free regions we have
K = −√( (Pᵗ)²/c² − m²c² − P_⊥² ),  (12.10.10)
where the generalized momentum Pᵗ is minus the energy.) The Hamilton equations are
(12.10.11)
These equations can be "linearized" by approximating the right-hand side of Eq. (12.10.11) by the first term in a Taylor expansion:
(12.10.12)
As forewarned, the quantities x^i and P_i are now to be interpreted as small deviations from the known reference trajectory. The partial derivatives are evaluated on the reference trajectory. These equations correspond to a quadratic Hamiltonian.
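The consistency of the change of independent variable can be verified numerically in the field-free case. The sketch below (illustrative values assumed, c = m = 1, not from the text) solves Pᵗ = −H for Pˢ via the field-free new Hamiltonian and confirms that substituting Pˢ = −K back into H recovers the energy:

```python
import math

# Field-free consistency check of the t -> s Hamiltonian transformation,
# Eqs. (12.10.5)-(12.10.6); illustrative values (c = m = 1) assumed.
c = m = 1.0

def H(Ps, Pperp):
    """Field-free form of Eq. (12.10.1) with A = phi = 0."""
    return math.sqrt(m**2 * c**4 + (Ps**2 + Pperp**2) * c**2)

def K(Pt, Pperp):
    """Field-free new Hamiltonian with s as independent variable."""
    return -math.sqrt(Pt**2 / c**2 - m**2 * c**2 - Pperp**2)

Pperp, energy = 0.3, 2.0
Pt = -energy                      # P^t is minus the energy
Ps = -K(Pt, Pperp)                # Eq. (12.10.6): P^s = -K

# substituting P^s back into H recovers the original energy
assert abs(H(Ps, Pperp) - energy) < 1e-9
```

The minus signs mirror the text's convention: K plays the role that −Pˢ played before, just as Pᵗ plays the role that −H played.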
BIBLIOGRAPHY
References for Further Study
L. D. Landau and E. M. Lifshitz, The Classical Theory of Fields, Pergamon, Oxford, 1971.
13 SYMPLECTIC MECHANICS

13.1. DERIVATION OF HAMILTON'S EQUATIONS
"Symplectic mechanics" is the study of mechanics using "symplectic geometry," a subject that can be pursued with no reference whatsoever to mechanics. However, we will regard "symplectic mechanics" and "Hamiltonian mechanics" as essentially equivalent. For coherence, we start by rederiving Hamiltonian mechanics, using however a more formal approach than in the previous chapter, where Hamiltonian mechanics was introduced from a "Hamilton–Jacobi" perspective. Here we review the formal, analytical derivation of Hamilton's equations starting from Lagrange's equations. This is largely repetitive of material in Section 7.2, where the Routhian reduction procedure was described; there a single momentum variable was introduced and used to replace the second-order differential equation for its conjugate coordinate by two first-order equations. This amounted to treating one coordinate by Hamilton's equations and all the others by Lagrange's equations and was effective primarily if the coordinate was cyclic so the momentum variable was conserved. Here we transform all the equations into Hamiltonian form. Given coordinates q and Lagrangian L, "canonical momenta" are defined by
p_j = ∂L/∂q̇^j.  (13.1.1)
p_j is said to be "conjugate" to q^j. To make partial differentiation like this meaningful, it is necessary to specify what variables are being held fixed. We mean implicitly that variables q^i for all i, q̇^i for i ≠ j, and t are being held fixed. Having established variables p, it is absolutely required in all that follows that velocities q̇ be explicitly expressible in terms of the q and p, as in
q̇^i = f^i(q, p, t),  or  q̇ = f(q, p, t).  (13.1.2)
The prototypical example of this is
L = (m/2)(ẋ² + ẏ²) − V(x, y),  p = ∂L/∂ṙ = m ṙ,  ṙ = p/m.  (13.1.3)
Hamilton's equations can be derived using the properties of differentials. Define the "Hamiltonian" by
H(q, p, t) = p_i f^i(q, p, t) − L( q, f(q, p, t), t ),  (13.1.4)
where the functions f^i were defined in Eq. (13.1.2). If these functions are, for any reason, unavailable, the procedure cannot continue; it is absolutely obligatory that the velocity variables be eliminated in this way. Furthermore, as indicated on the left-hand side of Eq. (13.1.4), it is essential that the formal arguments of H be q and p and t. Then, when writing partial derivatives of H, it will be implicit that the variables being held constant are all but one of the q and p and t. If all independent variables of the Lagrangian are varied incrementally the result is
dL = (∂L/∂q^i) dq^i + (∂L/∂q̇^i) dq̇^i + (∂L/∂t) dt.  (13.1.5)
(It is important to appreciate that the q^i and the q̇^i are being treated as formally independent at this point. Any temptation toward thinking of q̇^i as some sort of derivative of q^i must be fought off.) The purpose of the additive term p_i f^i in the definition of H is to cancel terms proportional to dq̇^i in the expression for dH;
dH = −ṗ_i dq^i + q̇^i dp_i − (∂L/∂t) dt,  (13.1.6)
where the Lagrange equations (5.3.8) as well as Eq. (13.1.2) have been used. This transformation is known as a "Legendre transformation." Such a transformation has a geometric interpretation,¹ but it is probably adequate to think of it as purely a formal manipulation. Similar formal manipulations are common in thermodynamics. Hamilton's first-order equations follow from Eq. (13.1.6):
ṗ_i = −∂H/∂q^i,  q̇^i = ∂H/∂p_i.  (13.1.7)
¹The geometric interpretation of a Legendre transformation is discussed in Arnold [1] and Lanczos [2].
Remember that in the partial derivatives of H, the variables p are held constant, but in ∂L/∂t the variables q̇ are held constant. For reminding oneself how p_i q̇^i − L metamorphoses into "total energy," a good example is that of a particle in a 2-D potential, mentioned in Eq. (13.1.3). The Hamiltonian is
H(x, p) = p_x ẋ + p_y ẏ − ½ m ẋ² − ½ m ẏ² + V(x, y) = ( p_x² + p_y² )/(2m) + V(x, y).  (13.1.8)
This example is also a good way to remember which of the Hamilton equations gets a negative sign.
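The Legendre transformation of Eq. (13.1.4) can be carried out numerically for this 2-D example. In the sketch below (the potential V and the test point are assumed illustrative choices, not from the text), the velocities are eliminated via Eq. (13.1.2) and the resulting H is confirmed to equal T + V as in Eq. (13.1.8):

```python
# Numeric Legendre transformation for the example of Eqs. (13.1.3), (13.1.8);
# the potential V and the sample point are assumed illustrative choices.
m = 2.0

def V(x, y):
    return 0.5 * (x**2 + 3.0 * y**2)

def L(q, qdot):
    """Lagrangian of Eq. (13.1.3)."""
    return 0.5 * m * (qdot[0]**2 + qdot[1]**2) - V(*q)

def f(q, p):
    """Eq. (13.1.2): velocities re-expressed through momenta, qdot^i = p_i/m."""
    return (p[0] / m, p[1] / m)

def H(q, p):
    """Legendre transformation, Eq. (13.1.4): H = p_i f^i - L(q, f)."""
    qd = f(q, p)
    return p[0] * qd[0] + p[1] * qd[1] - L(q, qd)

q, qdot = (0.5, -1.0), (2.0, 0.25)
p = (m * qdot[0], m * qdot[1])        # p_i = dL/d(qdot^i) = m qdot^i

T = 0.5 * m * (qdot[0]**2 + qdot[1]**2)
assert abs(H(q, p) - (T + V(*q))) < 1e-12   # H is the total energy, Eq. (13.1.8)
```

Note that H is written strictly as a function of (q, p): the function f is exactly the "absolutely obligatory" elimination of velocity variables demanded in the text.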
Problem 13.1.1: Recall Problem 10.1.1, which described rays approximately parallel to the z-axis, with the index of refraction given by n = n₀(1 + B(x² + y²)). Generalizing this a bit, allow the index to have the form n(ρ) where ρ = √(x² + y²). Using (ρ, φ) coordinates, where φ is an azimuthal angle around the z-axis, write the Lagrangian L(ρ, ρ′, φ, φ′, z) appropriate for use in Eq. (5.3.2). (As in that equation primes stand for d/dz.) Find momenta p_ρ = ∂L/∂ρ′ and p_φ = ∂L/∂φ′, and find the functions f^i defined in Eq. (13.1.2). Find an ignorable coordinate and give the corresponding conserved momentum. Write the Hamiltonian H according to Eq. (13.1.4). Why is H conserved? Take H = E. Solve this for p_ρ and eliminate p_φ using the conserved momentum found earlier. In this way the problem has been "reduced to quadratures." Write the integral that this implies.
Problem 13.1.2: For coordinates that are not rectangular, the kinetic energy acquires a somewhat more general form than in Eq. (13.1.8); T = ½ A_rs(q) q̇^r q̇^s, with V = V(q). In this case, defining matrix B = A⁻¹, find the Hamiltonian, write Hamilton's equations, and show that the (conserved) value of H is the total energy E = T + V. The momentum components are proportional to the velocity components only if matrix A_rs is diagonal, but they are always homogeneously related.

13.1.1. Charged Particle in Electromagnetic Field
Both as review and to exercise the Hamiltonian formalism, consider a nonrelativistic particle in an electromagnetic field. In Section 12.6, it is shown that the Lagrangian is
L = ½ m ( ẋ² + ẏ² + ż² ) + e ( A_x ẋ + A_y ẏ + A_z ż ) − e φ(x, y, z),  (13.1.9)
where A is the vector potential and φ is the electric potential. The middle terms, linear in velocities, cannot be regarded naturally as either kinetic or potential energies. Nevertheless their presence does not impede the formalism. In fact, consider an even more general situation,
L = ½ A_rs(q) q̇^r q̇^s + A_r(q) q̇^r − V(q).  (13.1.10)
Then
p_r = A_rs q̇^s + A_r,  and  q̇^r = B^rs ( p_s − A_s ).  (13.1.11)
By comparing this with Problem 13.1.2, it can be seen that in this case the momentum and velocity components are inhomogeneously, though still linearly, related. The Hamiltonian is
H = ½ B^rs ( p_r − A_r )( p_s − A_s ) + V(q),  (13.1.12)
and Hamilton’s equations follow easily.
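The inhomogeneous momentum–velocity relation of Eq. (13.1.11) can be checked directly on a small example. In this sketch (the 2×2 mass matrix A_rs, the linear coefficients A_r, and the velocity are all assumed illustrative values, not from the text), the momenta are formed and then inverted with B = A⁻¹:

```python
# Numeric check of Eqs. (13.1.10)-(13.1.11) for a 2x2 example; the matrix
# A_rs, the vector A_r, and the velocity are assumed illustrative choices.
Ars = [[2.0, 0.5],
       [0.5, 1.0]]                 # symmetric, positive definite
Ar = [0.3, -0.2]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def inverse2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]

B = inverse2(Ars)                  # B = A^{-1}

qdot = [1.2, -0.7]
# p_r = A_rs qdot^s + A_r : inhomogeneous, but still linear
p = [s + a for s, a in zip(matvec(Ars, qdot), Ar)]

# invert: qdot^r = B^rs (p_s - A_s), Eq. (13.1.11)
recovered = matvec(B, [p[0] - Ar[0], p[1] - Ar[1]])
assert all(abs(a - b) < 1e-12 for a, b in zip(recovered, qdot))
```

The offset A_r is exactly the role played by eA in Eq. (13.1.9): subtracting it from p before applying B recovers the velocity.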
13.2. RECAPITULATION
We have seen that Newtonian and Lagrangian mechanics are naturally pictured in configuration space, while Hamiltonian mechanics is based naturally in phase space. This is illustrated in Fig. 13.2.1. In configuration space, one deals with spatial trajectories (they would be rays in optics) and "wavefront-like" surfaces that are transverse to the trajectories. A useful concept is that of a "congruence" or bundle of space-filling, nonintersecting curves. A point in phase space fixes both position and slope
FIGURE 13.2.1. Schematic representation of the essential distinctions between configuration space and phase space. In configuration space (wavefronts and trajectories) trajectories can cross; initial position does not determine the trajectory. In phase space (trajectory of particle, reference trajectory) trajectories cannot cross; initial position determines the subsequent trajectory. In phase space it is especially convenient to define a "reference trajectory" as shown and to relate nearby trajectories to it.
of the trajectory passing through that point, and as a result there is only one trajectory through any point and the valid trajectories of the mechanical system naturally form a congruence of space-filling, nonintersecting curves. This is in contrast to configuration space, where a rule relating initial velocities with initial positions must be given to define a congruence of trajectories. In Newtonian mechanics, it is natural to work on finding trajectories starting from the n second-order, ordinary differential equations of the system. In Hamilton–Jacobi theory one first seeks the wavefronts, starting from a partial differential equation. As stated already, both descriptions are based on configuration space. If the coordinates in this space are the 3n Euclidean spatial components, the usual Pythagorean metric of distances and angles applies and, for example, it is meaningful for the wavefronts to be orthogonal to the trajectories. Also, the distance along a trajectory or the distance between two trajectories can be well defined. The natural geometry of Hamiltonian mechanics is phase space and one seeks the trajectories as solutions of 2n first-order, ordinary differential equations. In this space, the geometry is much more restrictive, as there is a single trajectory through each point. Also, there is no natural metric by which distances and angles can be defined. "Symplectic geometry" is the geometry of phase space. It is frequently convenient, especially in phase space, to refer a bundle of system trajectories to a single nearby "reference trajectory," as shown in Fig. 13.2.1. But because there is no metric in phase space, the "length" of the deviation vector is not defined. Even in Hamiltonian mechanics one usually starts from a Lagrangian L(q, q̇, t).
13.3. THE SYMPLECTIC PROPERTIES OF PHASE SPACE
13.3.1. The Canonical Momentum One-Form
Why are momentum components indicated by subscripts, when position components are indicated by superscripts? Obviously it is because momentum components are covariant whereas position components are contravariant. How do we know this? Most simply it has to do with behavior under coordinate transformations. Consider a transformation from coordinates q^i to Q^i = Q^i(q). Increments to these coordinates are related by
dQ^i = (∂Q^i/∂q^j) dq^j = Λ^i_j(q) dq^j,  (13.3.1)
which is the defining equation for the Jacobian matrix Λ^i_j(q). This is a linear transformation in the tangent space belonging to the manifold M, whose coordinates are q. The momentum components P corresponding to new coordinates Q are given by
P_l = p_j (Λ⁻¹)^j_l,
where ( h - ’ ) j l = a4’/aQ’.2 This uses the fact that . the. matrix of derivatives a q j / a Qiis the inverse of the matrix of derivatives 8 Q J / a q l .It is the appearance of the transposed inverse Jacobean matrix in this transformation that validates calling p a covariant vector. With velocity q (or displacement dq) residing in the tangent space, one says that p resides in the cotangent space. From Eq. (2.2.3) we know that these transformation properties ensure the existence of a certain invariant inner product. In the interest of making contact with the notation used there, we therefore introduce, temporarily at least, the symbol 7 for momentum. Then the technical meaning of the statement that 7 resides in the cotangent space is that the quantity (7,dq) = pi d4’ is invariant to the coordinate transformation from coordinates q to coordinates Q. As an alternate notation for 5, one can introduce a one-form or defined so that :(*) E (5,.), which yields a real number when acting operator if1) on increment dq. (The . in (E, .) is just a placeholder for dq.) It is necessary to distinguish mathematically between p dq and pi dq’, two expressions that a physicist is likely to equate mentally. Mathematically the expression pi dq’ is a one-form definable on any manifold, whether possessed of a metric or not, while p .dq is a more specialized quantity that is only definable if it makes sense for p and dq to be subject to scalar multiplication because they reside in the same metric space. The operator is known as a “one-form,” with the tilde indicating that it is a form and the superscript (1) meaning that it takes one argument. Let the configuration space, the elements of which are labeled by the generalized coordinates q, be called a “manifold” M. At a particular point q in M, the possible velocities q are said to belong to the “tangent space” at q, denoted by TMq. The operator “maps” elements of T M , to the space R of real numbers:
p̃ : TM_q → R.
Consider a real-valued function f(q) defined on M, f : M → R.
As introduced in Section 2.4.5, the prototypical example of a one-form is the “differential” of a function such as f; it is symbolized by f̃^(1) = d̃f_q. An incremental deviation dq from point q is necessarily a local tangent vector. The corresponding (linearized) change in value of the function, call it df_q (not boldface and with no tilde), is proportional to dq. Consider the lowest-order Taylor approximation,

f(q + dq) ≈ f(q) + (∂f/∂q^i) dq^i.
²Our convention is that matrix elements such as Λ^j_l do not depend on whether the indices are up or down, but their order matters; in this case the order j then l is indicated by the l being spaced to the right.
THE SYMPLECTIC PROPERTIES OF PHASE SPACE
By the “linearized” value of df we mean this approximation to be taken as exact, so that

d̃f_q(dq) = (∂f/∂q^i) dq^i;  (13.3.3)

this is “proportional” to dq in the sense that doubling dq doubles df_q. If dq is tangent to a curve γ passing through the point q then, except for a scale factor proportional to the rate of progress along the curve, d̃f_q can be regarded as the rate of change of f along the curve. Except for the scale factor, d̃f_q is the same for any two curves that are parallel as they pass through q. Though it seems convoluted at first, for the particular function f, d̃f_q therefore maps tangent vector dq to real number df_q:³
d̃f_q : TM_q → R.

To recapitulate, the quantity (p̃, ·), abbreviated as p̃ or later even just as p, is said to be a “one-form,” a linear, real-valued function of one vector argument. The components p_j of p̃ in a particular coordinate system, which in “classical” terminology are called covariant components, in “modern” terminology are the coefficients of a one-form. We are to some extent defeating the purpose of introducing one-forms by insisting on correlating their coefficients with covariant components. It is done because components are to a physicist what insulin is to a diabetic. A physicist says that “p_i q̇^i is manifestly covariant (meaning invariant under coordinate transformation) because q̇^i is contravariant and p_i is covariant.” A mathematician says the same thing in coordinate-free fashion as “cotangent space one-form p̃ maps tangent space vector q̇ to a real number.” What about the physicist’s quantity p·q̇? Here physicists (Gibbs initially, I believe) have also recognized the virtue of intrinsic coordinate-free notation and adopted it universally. So p·q̇ is the well-known coordinate-independent product of three factors, the magnitudes of the two vectors and the cosine of their included angle. But this notation implicitly assumes a Euclidean coordinate system, whereas the “one-form” notation does not. This may be the source of the main difficulty a physicist is likely to have in assimilating the language of modern differential geometry: Traditional vector calculus, with its obvious power, already contains the major benefits of intrinsic description without being burdened by unwelcome abstraction. But traditional vector analysis contains the implicit specialization to Euclidean geometry. This makes it all the more difficult to grasp the more abstract analysis required when Euclidean geometry is inappropriate. Similar comments apply with even greater force to cross products p × q and even more yet to curls and divergences.
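The transformation bookkeeping above can be checked numerically. The following sketch (an illustrative Cartesian-to-polar change of coordinates, not taken from the text) transforms displacement components contravariantly and momentum components covariantly, and confirms that the pairing p_i dq^i is unchanged:

```python
import math

# Illustrative check: the pairing p_i dq^i is invariant under (x, y) -> (r, phi).
x, y = 1.3, 0.7
r = math.hypot(x, y)

# Jacobian dQ^j/dq^i for Q = (r, phi), q = (x, y)
Lam = [[x / r, y / r],
       [-y / r**2, x / r**2]]
# Inverse Jacobian dq^i/dQ^j (written out analytically)
LamInv = [[x / r, -y],
          [y / r, x]]

dq = [0.01, -0.02]        # contravariant displacement components
p = [0.5, 1.1]            # covariant momentum components

# dQ^j = (dQ^j/dq^i) dq^i   (contravariant transformation)
dQ = [sum(Lam[j][i] * dq[i] for i in range(2)) for j in range(2)]
# P_j = p_i (dq^i/dQ^j)     (covariant transformation)
P = [sum(p[i] * LamInv[i][j] for i in range(2)) for j in range(2)]

pairing_old = sum(p[i] * dq[i] for i in range(2))
pairing_new = sum(P[j] * dQ[j] for j in range(2))
print(abs(pairing_old - pairing_new) < 1e-12)   # True: the one-form value is invariant
print(round(P[1], 6))                           # 1.08
```

Note that the covariant φ-component that emerges, P_φ = x p_y − y p_x, is the angular momentum, which illustrates why momentum components must transform covariantly.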
For a particular coordinate q^i, the coordinate one-form d̃q^i picks out the corresponding component V^i from an arbitrary vector V as V^i = (d̃q^i, V). Since the components p_i are customarily called “canonically conjugate” to the coordinates q^i, the

³We use the notation d̃f for the differential of f but, according to Section 4.4, where the d̃ operator is discussed, it is shown that this is the same as df.
one-form

p̃ = p_i d̃q^i  (13.3.4)

is said to be the “canonical momentum one-form.” Incidentally, when expanded in terms of its components as p_i d̃q^i, the differential form d̃q^i will eventually be replaced by an ordinary differential dq^i, and manipulations of the form will not be particularly distinguishable from the manipulations that would be performed on the ordinary differential. Nevertheless, it seems somewhat clearer, when describing a multiplicity of mechanical systems, to retain the form d̃q^i, which is a property of the coordinate system, than to replace it with dq^i, which is a reconfiguration of a particular mechanical system.
13.3.2. The Symplectic Two-Form ω̃

In spite of having just gone to such pains to explain the appropriateness of using the symbol p̃ for momentum in order to make the notation expressive, we now drop the tilde. The reason for doing this is that we plan to work in phase space, where q and p are to be treated on a nearly equal footing. Though logically possible, it would be simply too confusing, especially when introducing forms on phase space, to continue to exhibit the intrinsic distinction between displacements and momenta explicitly, other than by continuing to use subscripts for the momentum components and superscripts for generalized coordinates. By lumping q and p together we get a vector space with dimension 2n, double the dimensionality of the configuration space. (As established previously, there is no absolute distinction between covariant and contravariant vectors per se.) Since we previously identified the p’s with forms in configuration space and will now proceed to introduce forms that act on p in phase space, we will have to tolerate the confusing circumstance that p is a form in configuration space and a portion of a vector in phase space. Since “phase space” has been newly introduced, it is worth mentioning a notational limitation it inherits from configuration space. A symbol such as x can mean either where a particle actually is or where, in principle, it could be; context is necessary to determine which is intended. Also, when the symbol q̇ appears it usually refers to an actual system velocity, but it can also serve as a formal argument of a Lagrangian function. The same conventions have to be accepted in phase space. But the q’s and the p’s are not quite equivalent, as the q’s are defined independently of any particular Lagrangian, while the definition of the meaning of the p’s depends on the Lagrangian. Still, they can refer either to a particular evolving system or to a possible configuration of the system.
Mainly, then, in phase space the combined set q, p plays the same role as q plays in configuration space. In Problem 10.1.1 it was found that the quantity x₁(z)p₂(z) − x₂(z)p₁(z) calculated from two rays in the same optical system is constant, independent of longitudinal coordinate z. This seemingly special result can be generalized to play a central role in Lagrangian (and hence Hamiltonian) mechanics. That is the immediate task.
The simultaneous analysis of more than one trajectory at a time characterizes this newer-than-Newtonian approach. We start by reviewing some topics from Section 2.1. Recall Eq. (2.4.23), by which the tensor product f = x ⊗ y is defined as a function of one-forms ũ and ṽ:

f(ũ, ṽ) = (ũ, x)(ṽ, y).  (13.3.5)

Furthermore, as in Eq. (2.4.29), a (mixed) tensor product f̃ = ũ ⊗ y can be similarly defined by

f̃(x, ṽ) = (ũ, x)(ṽ, y).  (13.3.6)
From quantities like this, “wedge products” or “exterior products” are defined by

x ∧ y(ũ, ṽ) = (x, ũ)(y, ṽ) − (x, ṽ)(y, ũ),
ũ ∧ ṽ(x, y) = (ũ, x)(ṽ, y) − (ũ, y)(ṽ, x).  (13.3.7)
Another result from Section 2.1 was the construction of a multicomponent bivector from two vectors, x and y, with the components being the 2 × 2 determinants constructed from the components of the two vectors. As shown in Fig. 4.2.2 and again in Fig. 13.3.1, these components can be interpreted as the areas of the projections onto the coordinate planes of the parallelogram formed from the two vectors. They can also be regarded (except for a possible combinatorial factor) as the components of an antisymmetric two-index tensor, x^{ij}, with x^{12} = x^1 y^2 − x^2 y^1, etc. We now intend to utilize these quantities in phase space. As in geometric optics, we will consider not just a solitary orbit, but rather a congruence of orbits or, much of
FIGURE 13.3.1. The “projected area” on the first coordinate plane (q^1, p_1) defined by tangent vectors dz(1) = (dq(1), dp(1))^T and dz(2) = (dq(2), dp(2))^T.
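The projected areas of Fig. 13.3.1 are simply 2 × 2 determinants and can be computed directly; the sketch below uses illustrative component values, not numbers from the text:

```python
# Components of the bivector formed from two tangent vectors are the 2x2
# determinants of their projections onto coordinate planes (Fig. 13.3.1).
def projected_area(v, w, i, j):
    """Signed area of the projection of the (v, w) parallelogram onto plane (i, j)."""
    return v[i] * w[j] - v[j] * w[i]

# dz = (dq1, dp1, dq2, dp2) for two nearby orbits (illustrative values)
dz1 = [0.3, -0.1, 0.2, 0.4]
dz2 = [0.1, 0.5, -0.2, 0.3]

area_q1p1 = projected_area(dz1, dz2, 0, 1)   # projection onto the (q1, p1) plane
area_q2p2 = projected_area(dz1, dz2, 2, 3)   # projection onto the (q2, p2) plane
print(area_q1p1, area_q2p2)
```

Interchanging the two vectors reverses the sign of every projected area, which is the antisymmetry that the two-form below will encode.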
the time, two orbits. As stressed already, in phase space there can be only one valid orbit through each point, which is the major formal advantage of working in phase space. To discuss two particular close orbits without giving preference to either, it is useful to refer them both to a reference path as in Fig. 13.2.1. Though it would not be necessary, this reference path may as well be thought of as a valid orbit as well. A point on one nearby orbit can be expressed by dz(1) = (dq(1), dp(1))^T and on the other one by dz(2) = (dq(2), dp(2))^T. Consider a particular coordinate q, say the first one, and its conjugate momentum p. Since these can be regarded as functions in phase space, the differential forms d̃q and d̃p are everywhere defined.⁴ As in Eq. (2.3.1), when “coordinate one-form” d̃q operates on the vector dz(1), the result is

d̃q(dz(1)) = dq(1),  and similarly  d̃p(dz(1)) = dp(1).  (13.3.8)

Notice that it has been necessary to distinguish d̃q, say, which is a form specific to the coordinate system, from dq(1), which is specific to a particular mechanical system (1). As usual, the placing of the (1) in parentheses, as here, “protects” it from being interpreted as a vector index. Consider then the wedge product⁵

ω̃ = d̃q ∧ d̃p.  (13.3.9)
Copying from Eq. (13.3.7), when ω̃ operates on the two system vectors, the result is

ω̃(dz(1), dz(2)) = dq(1) dp(2) − dq(2) dp(1).  (13.3.10)

This quantity vanishes when the components are proportional, but not, in general, otherwise. So far q and p have either referred to a one-dimensional system or are one pair of coordinates in a multidimensional system. To generalize to more than one configuration space coordinate we define

ω̃ = Σ_{i=1}^{n} d̃q^i ∧ d̃p_i.  (13.3.11)
This is known as the “symplectic two-form” or, because conjugate coordinates are singled out, “the canonical two-form.” (The sum is expressed explicitly, rather than by the repeated index convention, to defer addressing the question of the geometric
⁴Recall that, since we are working in phase space, the symbol d̃q has a meaning different from what it would have in configuration space. Here it expects as argument a phase space tangent vector dz. A notational difficulty we will have is that it is not obvious whether the quantity d̃q is a one-form associated with one particular coordinate q or the set of one-forms d̃q^i corresponding to all the coordinates q^i. We shall state which is the case every time the symbol is used. Here it is the former.
⁵To be consistent, we should use ω̃^(2) to indicate that it is a two-form, but the symbol will be used so frequently that we leave off the superscript (2).
character of the individual terms.) Acting on vectors u and v, this expands to

ω̃(u, v) = Σ_{i=1}^{n} ((d̃q^i, u)(d̃p_i, v) − (d̃q^i, v)(d̃p_i, u)).  (13.3.12)

When ω̃ acts on dz(1) and dz(2), the result is

ω̃(dz(1), dz(2)) = Σ_{i=1}^{n} (dq^i(1) dp(2)i − dq^i(2) dp(1)i).  (13.3.13)
If the two terms are summed individually they are both scalar invariants, but it is more instructive to keep them paired as shown. Each paired difference, when acting on two vectors, produces the directed area of a projection onto one of the (q^i, p_i) coordinate planes; see Fig. 13.3.1. For example, dq^1(1) dp(2)1 − dq^1(2) dp(1)1 is the area of a projection onto the q^1, p_1 plane. For one-dimensional motion there is no summation, and no projection, and ω̃(dz(1), dz(2)) is simply the area defined by (dq(1), dp(1)) and (dq(2), dp(2)). As in Section 4.4.3, a two-form ω̃^(2) can be obtained by exterior differentiation of the one-form ω̃^(1). Applying Eq. (4.4.10),

ω̃ = −d̃(p_i d̃q^i) = Σ_{i=1}^{n} d̃q^i ∧ d̃p_i.  (13.3.14)

This yields an alternative expression for the canonical two-form.
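As a concrete check, the sum in Eq. (13.3.11) can be evaluated on two phase-space tangent vectors. The sketch below (illustrative component values, ordering (q^1, p_1, q^2, p_2)) also confirms the antisymmetry and the vanishing for proportional vectors:

```python
def omega(z1, z2):
    """Canonical two-form, Eq. (13.3.13): sum of (q^i, p_i) projected areas.
    Phase-space vectors are ordered as (q1, p1, q2, p2, ...)."""
    n = len(z1) // 2
    return sum(z1[2*i] * z2[2*i + 1] - z2[2*i] * z1[2*i + 1] for i in range(n))

dz1 = [0.2, 0.7, -0.1, 0.5]
dz2 = [0.4, -0.3, 0.6, 0.2]

print(omega(dz1, dz2))                        # sum of the two projected areas
print(omega(dz1, [2 * c for c in dz1]))       # proportional vectors give 0.0
print(omega(dz1, dz2) == -omega(dz2, dz1))    # True: antisymmetry
```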
13.3.3. Invariance of the Symplectic Two-Form

Now consider the coordinate transformation from q^i to Q^i = Q^i(q) discussed earlier in the chapter. Under this transformation

d̃Q^j = (∂Q^j/∂q^i) d̃q^i,   d̃P_j = (∂q^i/∂Q^j) d̃p_i + p_i (∂/∂q^l)(∂q^i/∂Q^j) d̃q^l.  (13.3.15)

(The expression for the differential of P_j is more complicated than the expression for the differential of Q^j because the coefficients ∂Q^j/∂q^i are themselves functions of position.) The Jacobian matrix elements satisfy

(∂Q^j/∂q^i)(∂q^i/∂Q^l) = δ^j_l.  (13.3.16)

After differentiation this yields

(∂²Q^j/∂q^m ∂q^i)(∂q^i/∂Q^l) + (∂Q^j/∂q^i)(∂/∂q^m)(∂q^i/∂Q^l) = 0.  (13.3.17)

The factor (∂/∂q^l)(∂q^i/∂Q^j) in the final term in Eq. (13.3.15) can be evaluated using these two results. In the new coordinates, the wedge product is

Σ_{i=1}^{n} d̃Q^i ∧ d̃P_i = Σ_{i=1}^{n} d̃q^i ∧ d̃p_i.  (13.3.18)
Here the terms proportional to d̃q^j ∧ d̃q^l with equal index values have vanished individually, and those with unequal indices have canceled in pairs because they are odd under the interchange of j and l, whereas the coefficient ∂²Q^i/∂q^j ∂q^l entering by virtue of Eq. (13.3.17) is even under the same interchange. To obtain the canonical two-form ω̃ and demonstrate its invariance under coordinate transformation, all that has been assumed is the existence of generalized coordinates q^i and some particular Lagrangian L(q, q̇, t), as the momenta p_i were derived from them. One says that the phase space of a Lagrangian system is sure to be “equipped” with the form ω̃. It is this form that will permit the identification of one-forms and vectors in much the same way that a metric permits the identification of covariant and contravariant vectors (as was discussed in Section 4.2.4). This is what will make up for the absence of the concept of orthogonality in developing within mechanics the analog of rays and wavefronts in optics. One describes these results as “symplectic geometry,” but the results derived so far, in particular Eq. (13.3.18), can be regarded simply as differential calculus. The term “symplectic calculus” might therefore be as justified.⁶ Another conclusion that will follow from Eq. (13.3.18) is that the two-form Σ d̃q^i ∧ d̃p_i evaluated for any two phase-space trajectories is “conserved” as time advances. We will put off deriving this result (which amounts to being a generalized Liouville theorem) for the time being. It is mentioned at this point to emphasize that it follows purely from the structure of the equations, in particular from the definition in Eq. (13.1.1) of the momenta p_j as a derivative of the Lagrangian with respect to velocity q̇^j.
Since the derivation could have been completed before a Hamiltonian has even been introduced, it cannot be said to be an essentially Hamiltonian result, or a result of any property of a system other than the property of being characterized by a Lagrangian. For paraxial optics in a single transverse plane, a result derived in Problem 10.1.1 was the invariance of the combination x(1)p(2) − x(2)p(1) for any two rays. This is an example of Eq. (13.3.18). Because that theory had already been linearized, the conservation law applied to the full amplitudes and not just to their increments. In general, however, the formula applies to small deviations around a reference orbit, even if the amplitude of that reference orbit is great enough for the equations of motion to be arbitrarily nonlinear.
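The invariance (13.3.18) can also be checked numerically for n = 1. The sketch below uses the hypothetical point transformation Q = exp(q) (chosen only for illustration, not taken from the text), under which P = p ∂q/∂Q, and confirms that dq(1) dp(2) − dq(2) dp(1) is unchanged:

```python
import math

# Illustrative check of Eq. (13.3.18) for n = 1 with Q = exp(q), P = p * exp(-q).
q, p = 0.4, 1.7                      # common phase-space point
dz1 = (0.02, -0.05)                  # (dq, dp) for system (1)
dz2 = (-0.01, 0.03)                  # (dq, dp) for system (2)

def transform_tangent(dq, dp):
    """Push (dq, dp) forward: dQ = (dQ/dq) dq, dP = d(p * exp(-q))."""
    dQ = math.exp(q) * dq
    dP = math.exp(-q) * dp - p * math.exp(-q) * dq
    return dQ, dP

dQ1, dP1 = transform_tangent(*dz1)
dQ2, dP2 = transform_tangent(*dz2)

old_form = dz1[0] * dz2[1] - dz2[0] * dz1[1]      # dq(1) dp(2) - dq(2) dp(1)
new_form = dQ1 * dP2 - dQ2 * dP1                  # dQ(1) dP(2) - dQ(2) dP(1)
print(abs(old_form - new_form) < 1e-12)           # True: the two-form is invariant
```

The position-dependent second term in dP is exactly what is needed to cancel the stretching of dQ, mirroring the cancellation argued from Eq. (13.3.17).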
⁶Explanation of the source of the name symplectic actually legitimizes the topic as geometry since it relates to the vanishing of an antisymmetric form constructed from the coordinates of, say, a triplex of three points. The name “symplectic group” was coined by Hermann Weyl (from a Greek word with his intended meaning) as a replacement for the term “complex group” that he had introduced even earlier, with “complex” used in the sense, “Is a triplex of points on the same line?” He intended “complex” to mean more nearly “simple” than “complicated,” and certainly not to mean “complex” as in complex numbers. But the collision of meanings became an embarrassment to him. Might one not therefore call a modern movie complex a “cineplectic group”?
13.3.4. Use of ω̃ to Associate Vectors and One-Forms

To motivate this discussion recall, for example from Eq. (2.5.15) (which read x_i = g_{ik} x^k), that a metric tensor can be used to obtain covariant components x_i from contravariant components x^k; this was “lowering the index.” The orthogonality of two vectors x^i and y^i could then be expressed in the form x·y = x_j y^j = 0. The symplectic two-form discussed in the previous section can be written in the form ω̃(·, ·) to express the fact that it is waiting for two vector arguments, from which it will linearly produce a real number. It is important also to remember that, as a tensor, ω̃ is antisymmetric. This means, for example, that ω̃(u⃗, u⃗) = 0, where u⃗ is any vector belonging to the tangent space TM_x at system configuration x. For the time being, we are here taking a “belt and suspenders” approach of indicating a vector u⃗ with both boldface and overhead arrow. This is done only to stress the point, and this notation will be dropped when convenient. Taking u⃗ as one of the two vector arguments of ω̃, we can define a new quantity (a one-form) ũ(·) by the formula

ũ(·) = ω̃(u⃗, ·).  (13.3.19)
This formula “associates” a one-form ũ with the vector u⃗. Since the choice of whether to treat u⃗ as the first or second argument in Eq. (13.3.19) was arbitrary, the sign of the association can only be conventional. The association just introduced provides a one-to-one linear mapping from the tangent space TM_x to the cotangent space TM*_x, spaces of the same dimensionality. For any particular choices of bases in these spaces, the association could be represented by matrix multiplication u_i = A_{ij} u^j, where A_{ij} is an antisymmetric, square matrix with nonvanishing determinant, and which would therefore be invertible. Hence the association is one-to-one in both directions and can be said to be an isomorphism. The inverse map can be symbolized by

I : TM*_x → TM_x.  (13.3.20)
As a result, for any one-form ũ there is sure to be a vector

u⃗ = I ũ  such that  ũ = ω̃(u⃗, ·).  (13.3.21)
An immediate (and important) example of this association is its application to d̃f, which is the standard one-form that can be constructed from any function f defined over phase space; Eq. (13.3.21) can be used to generate a vector d⃗f = I d̃f from the one-form d̃f so that

d⃗f = I d̃f  satisfies  d̃f = ω̃(d⃗f, ·).  (13.3.22)
13.3.5. Explicit Evaluation of Some Inner Products

Let q be a specific coordinate, say the first one, and p be its conjugate momentum, and let f(q, …, p, …) be a function defined on phase space. Again we use d̃q and
d̃p temporarily as the one-forms corresponding to these particular coordinates. The one-form d̃f can be expressed in two different ways, one according to its original definition, the other using the association (13.3.22) with ω̃ spelled out as in Eq. (13.3.12):

d̃f = (∂f/∂q) d̃q + (∂f/∂p) d̃p = (d̃q, d⃗f) d̃p − (d̃p, d⃗f) d̃q.  (13.3.23)

It follows that

(d̃q, d⃗f) = ∂f/∂p,   (d̃p, d⃗f) = −∂f/∂q.  (13.3.24)

In practice these equations would be applied to each of the individual pairs of coordinates and momenta.

13.3.6. The Vector Field Associated with d̃H

Since the Hamiltonian is a function on phase space, its differential one-form d̃H is well defined:

d̃H = (∂H/∂q^i) d̃q^i + (∂H/∂p_i) d̃p_i.  (13.3.25)
What is the associated vector d⃗H = I d̃H? Fig. 13.3.2 shows the unique trajectory (q(t), p(t)) passing through some particular point q(0), p(0) and an incremental tangential displacement at that point represented by a 2n-component column vector
dz = (dq, dp)^T = (q̇, ṗ)^T dt.  (13.3.26)
(Our notation is not consistent since, this time, dq and dp do stand for arrays of components. Also, to reduce clutter, we have suppressed the subscripts (0), which
FIGURE 13.3.2. The vector dz(0) = (dq(0), dp(0))^T is tangent to a phase space trajectory given by z(0)(t) = (q(0)(t), p(0)(t)). The trajectory is assumed to satisfy Hamilton’s equations.
were only introduced to make the point that what followed referred to one particular point.) Hamilton’s equations state
(q̇, ṗ)^T = (∂H/∂p, −∂H/∂q)^T,  (13.3.27)
and these equations can be used to evaluate the partial derivatives appearing in Eq. (13.3.25). The result is

d̃H = −ṗ d̃q + q̇ d̃p.  (13.3.28)

On the other hand, evaluating the symplectic two-form on dz yields

ω̃(dz, ·) = dq d̃p − dp d̃q = (q̇ d̃p − ṗ d̃q) dt,  (13.3.29)

where the inner products have been evaluated using Eq. (13.3.24). Dividing through by dt, the equation implied by Eqs. (13.3.28) and (13.3.29) can therefore be expressed using the isomorphism introduced in Eq. (13.3.21):

ż = I d̃H.  (13.3.30)
Though particular coordinates were used in deriving this equation, in the final form the relationship is coordinate-independent, which is to say that the relation is intrinsic. This is in contrast with the coordinate-dependent geometry describing the Hamilton-Jacobi equation in the previous chapter. In configuration space, one is accustomed to visualizing the force as being directed parallel to the gradient of a quantity with dimensions of energy, namely the potential energy. Here in phase space we find the system velocity related to (though not parallel to) the “gradient” of the Hamiltonian, also an energy. In each case, the motivation is to change the problem from that of finding vectorial quantities to the (usually) simpler problem of finding scalar quantities. This motivation should also be reminiscent of electrostatics, where one finds the scalar potential and from it the vector electric field.
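For the harmonic oscillator Hamiltonian H = (p² + q²)/2 (an illustrative choice, not from the text), this “gradient” picture is easy to exhibit: the velocity field is the skew-rotated gradient, and its ordinary dot product with the gradient vanishes, so H is constant along the flow:

```python
# Minimal sketch: for H = (p^2 + q^2)/2, form the phase-space "gradient"
# (dH/dq, dH/dp) and skew-rotate it to get the velocity (qdot, pdot).
def grad_H(q, p):
    return (q, p)                    # (dH/dq, dH/dp) for this particular H

def hamiltonian_field(q, p):
    dHdq, dHdp = grad_H(q, p)
    return (dHdp, -dHdq)             # zdot = J (dH/dz) with J = ((0, 1), (-1, 0))

q, p = 0.6, -0.8
qdot, pdot = hamiltonian_field(q, p)
print((qdot, pdot))                           # (-0.8, -0.6)

# The velocity is skew-related, not parallel, to the gradient: their ordinary
# dot product vanishes, so H is conserved along the flow.
dHdq, dHdp = grad_H(q, p)
print(qdot * dHdq + pdot * dHdp)              # 0.0
```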
13.3.7. Hamilton’s Equations in Matrix Form⁷

It is customary to represent first-order differential equations such as Eq. (13.3.30) in matrix form. Since there are 2n equations, one wishes to represent the operator I, which has so far been entirely formal, by a 2n × 2n matrix. Actually, according to

⁷This section applies especially the geometry of covariant and contravariant vectors developed in Section 2.2.
Eq. (13.3.22), Eq. (13.3.30) can be written even more compactly as ω̃(ż, ·) = d̃H, but the right-hand side remains to be made explicit. When expressed in terms of canonical coordinates, except for sign changes and a coordinate-momentum interchange, the components of d̃H are the same as the components of the ordinary gradient of H. If we are prepared to refrain from further coordinate transformations (especially if they mix displacements and momenta), we can artificially introduce a Pythagorean relation in phase space (even though displacements and momenta have different physical dimensions). Then incremental “distances” are given by

ds² = dq·dq + dp·dp,  (13.3.31)
where these are ordinary dot products of vectors. With this metric, the vector (dp, −dq) is orthogonal to (dq, dp) (see Fig. 13.3.3). In metric geometry, a vector can be associated with the hyperplane to which it is orthogonal. If the dimensionality is even and the coordinates are arranged in (q^i, p_i) pairs, the equation of a hyperplane through the origin takes the form a_i q^i + b^i p_i = 0. The vector with contravariant components q^i = b^i, p_i = −a_i is normal to this plane. On the other hand, the coefficients (a_i, b^i) in the equation for the hyperplane can be regarded as covariant components of the vector normal to the plane, since the formula expressing its normality to a vector (q^i_0, p_{0i}) lying in the plane is a_i q^i_0 + b^i p_{0i} = 0. In this way, a contravariant vector dz is associated with a covariant vector, or one-form. Fig. 13.3.3 illustrates the vector with components (a, b) normal to the dashed line with equation ax + by = 0. With this (coordinate-dependent) identification, the isomorphism I can be expressed explicitly as

d⃗H = I d̃H ≐ −S (∂H/∂q^1, …, ∂H/∂q^n, ∂H/∂p_1, …, ∂H/∂p_n)^T,  where  S = ( 0  −1 ; 1  0 ),  (13.3.32)
where 0 is an n × n matrix of 0’s and 1 is an n × n diagonal matrix of 1’s. One equality sign in Eq. (13.3.32) is “qualified” (≐) because it equates intrinsic quantities to coordinate-specific quantities. Notice that S is a rotation matrix yielding rotation
FIGURE 13.3.3. The line bx + ay = 0 is perpendicular to the line ax - by = 0.
through 90° in the q, p plane when n = 1, while for n > 1 it yields rotation through 90° in each of the q^i, p_i planes separately. Using S, the 2n Hamilton’s equations can be written in the form

ż = −S ∂H/∂z.  (13.3.33)
At this point it would have seemed more natural to have defined S with the opposite sign, but the choice of sign made in Eq. (13.3.32) will be convenient for making comparisons with notation standard in a different field in a later section. When the alternate symbol J = −S is used, Hamilton’s equations become ż = J(∂H/∂z). It should be reemphasized that, though a geometric interpretation has been given to the contravariant/covariant association, it is coordinate-dependent and hence artificial. Even changing the units of, say, momenta, but not displacements, changes the meaning of, say, orthogonality. It does not, however, change the solutions of the equations of motion. The isomorphism (13.3.32) can be applied to an arbitrary vector to relate its contravariant and covariant components:

ũ ≐ S u⃗.  (13.3.34)
13.4. SYMPLECTIC GEOMETRY

In the previous sections, the evolution of a mechanical system in phase space was codified in terms of the antisymmetric bilinear form ω̃, and it was mentioned that this form plays a role in phase space analogous to the metric form in Euclidean space. The geometry of a space endowed with such a form is called “symplectic geometry.” The study of this geometry can be formulated along the same lines that ordinary geometry was studied in the early chapters of this text. In Chapter 3, one started with rectangular axes for which the metric tensor was the identity matrix. When skew axes were introduced, the metric tensor, though no longer diagonal, remained symmetric. Conversely, it was found that, given a symmetric metric tensor, axes could be found such that it became a diagonal matrix: the metric form became a sum of squares (possibly with some signs negative). It was also shown that orthogonal matrices play a special role describing transformations that preserve the Pythagorean form, and the product of two such transformations has the same property. Because of this and some other well-known properties, these transformations were said to form a group, the orthogonal group. Next, when curvilinear coordinates were introduced, it was found that similar diagonalization could still be performed locally. Here we will derive the analogous “linearized” properties and will sketch the “curvilinear” properties heuristically.
13.4.1. Symplectic Products and Symplectic Bases
For symplectic geometry the step analogous to introducing a metric tensor was the step of introducing the “canonical two-form”

ω̃ = Σ_{i=1}^{n} q̃^i ∧ p̃_i.  (13.4.1)

Here, in a step analogous to neglecting curvilinear effects in ordinary geometry, we have removed the differential “d” symbols because we now assume purely linear geometry for all amplitudes. Later, when considering “variational” equations that relate solutions in the vicinity of a given solution, it will be appropriate to put back the “d” symbols. (Recall that for any vector z = (q^1, p_1, q^2, p_2, …)^T, one has q̃^i(z) = (q̃^i, z) = q^i and so on.) The form ω̃ accepts two vectors, say w and z, as arguments and generates a scalar. One can therefore introduce an abbreviated notation
[w,21 = Law, 2 ) .
(1 3.4.2)
and this “skew-scalar” or “symplectic” product is the analog of the dot product of ordinary vectors. If this product vanishes, the vectors w and z are said to be “in involution.” Clearly one has
[w, z] = −[z, w]  and  [z, z] = 0,  (13.4.3)
so every vector is in involution with itself. The concept of vectors being in involution will be most significant when the vectors are solutions of the equations of motion. A set of n independent solutions in involution is said to form a “Lagrangian set.” An example of such a set is given below in Section 15.4. The skew-scalar products of pairs drawn from the 2n basis vectors

e_{q^1}, e_{q^2}, …, e_{q^n}  and  e_{p_1}, e_{p_2}, …, e_{p_n}  (13.4.4)
are especially simple; with no summation implied,

[e_{q^(i)}, e_{p_(i)}] = 1,  and all other basis vector products vanish.  (13.4.5)
To express this in words: in addition to being skew-orthogonal to itself, each basis vector is also skew-orthogonal to all other basis vectors except that of its conjugate mate, and for that one the product is ±1. Any basis satisfying these special product relations is known as a “symplectic basis.” Though the only skew-symmetric form that has been introduced to this point was that given in Eq. (13.4.1), in general a similar skew-product can be defined for any skew-symmetric form ω̃ whatsoever. Other than linearity, the main requirements for ω̃ are those given in Eq. (13.4.3), but to avoid “degenerate” cases it is also necessary to require that there be no nonzero vector skew-orthogonal to all other vectors. With these properties satisfied, the space together with ω̃ is said to be symplectic. Let N stand for its dimensionality. A symplectic basis like (13.4.4) can be found
for the space. To show this one can start by picking any arbitrary vector u₁ as the first basis vector. Then, because of the nondegeneracy requirement, there has to be another vector, call it v₁, that has a nonvanishing skew-scalar product with u₁, and the product can be made exactly 1 by appropriate choice of a scale factor multiplying v₁. If N = 2, then n = N/2 = 1, and the basis is complete. For N > 2, if an appropriate multiple of u₁ is subtracted from a vector in the space, the resulting vector either vanishes or has vanishing skew-scalar product with u₁. Perform this operation on all vectors. The resulting vectors form a space of dimensionality N − 1 that is said to be “skew complementary” to u₁; call it U₁. It has to contain v₁. Similarly one can find a space V₁ of dimensionality N − 1 skew complementary to v₁. Since V₁ does not contain v₁, it follows that U₁ and V₁ do not coincide, and hence their intersection, call it W, has dimension N − 2. On W we must and can use the same rule [·, ·] for calculating skew-scalar products, and we now check that this product is nondegenerate. If there were a vector skew-orthogonal to all elements of W, because it is also skew-orthogonal to u₁ and v₁ it would have to be skew-orthogonal to the whole space, which is a contradiction. By induction on n we conclude that the dimensionality of the symplectic space is even, N = 2n, and since a symplectic basis can always be found (as in Eq. (13.4.5)), all symplectic spaces of the same dimensionality are isomorphic, and the skew-scalar product can always be expressed as in Eq. (13.4.1). The arguments of this section have assumed linearity, but they can be generalized to arbitrary curvilinear geometry and, when that is done, the result is known as Darboux’s theorem. From a physicist’s point of view, the generalization is obvious since, looking on a fine enough scale, even nonlinear transformations appear linear.
A variant of this “argument” is that, just as an ordinary metric tensor can be transformed to be Euclidean over small regions, the analogous property should be true for a symplectic “metric.” This reasoning, however, is only heuristic (see Arnold [1, p. 230] for further discussion).
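The inductive construction just described can be carried out numerically. The sketch below (the 4 × 4 antisymmetric matrix A is an arbitrary nondegenerate example, not from the text) builds a symplectic basis pair by pair, projecting the remaining candidate vectors into the skew complement at each step:

```python
def skew(u, v, A):
    """Skew-scalar product [u, v] = u^T A v for an antisymmetric matrix A."""
    n = len(u)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def symplectic_basis(A):
    """Build a symplectic basis (u1, v1, u2, v2, ...) for the skew product u^T A v,
    following the inductive construction described in the text."""
    n = len(A)
    cand = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    basis = []
    while cand:
        u = cand.pop(0)
        # nondegeneracy guarantees some candidate has nonzero product with u
        v = next(w for w in cand if abs(skew(u, w, A)) > 1e-12)
        cand.remove(v)
        s = skew(u, v, A)
        v = [c / s for c in v]                 # rescale so that [u, v] = 1
        basis += [u, v]
        # project remaining candidates into the skew complement of u and v
        cand = [[w[i] + skew(v, w, A) * u[i] - skew(u, w, A) * v[i] for i in range(n)]
                for w in cand]
        cand = [w for w in cand if any(abs(c) > 1e-12 for c in w)]
    return basis

# an arbitrary nondegenerate antisymmetric form (Pfaffian = -5, so invertible)
A = [[0, 1, 2, 0],
     [-1, 0, 0, 3],
     [-2, 0, 0, 1],
     [0, -3, -1, 0]]
B = symplectic_basis(A)
# verify the products of Eq. (13.4.5): [u_i, v_i] = 1, everything else 0
expected = {(0, 1): 1.0, (1, 0): -1.0, (2, 3): 1.0, (3, 2): -1.0}
ok = all(abs(skew(B[a], B[b], A) - expected.get((a, b), 0.0)) < 1e-9
         for a in range(4) for b in range(4))
print(ok)   # True
```

The projection step w → w + [v, w]u − [u, w]v is the numerical counterpart of passing to the intersection W of the two skew-complementary spaces.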
13.4.2. Symplectic Transformations
For symplectic spaces, the analogs of orthogonal transformation matrices (which preserve scalar products) are symplectic matrices M (which preserve skew-scalar products). The “transform” z̄ of vector z by M is given by

z̄ = Mz.  (13.4.6)
The transforms of two vectors u and v are Mu and Mv, and the condition for M to be symplectic is
[Mu, Mv] = [u, v].  (13.4.7)
If M₁ and M₂ are applied consecutively, their product, M₂M₁, is necessarily also symplectic. Since the following problem shows that the determinant of a symplectic matrix is 1, it follows that the matrix is invertible, and from this it follows that the symplectic transformations form a group.
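These group properties are easy to verify numerically for n = 1, where S is 2 × 2; the matrices below are illustrative choices, not taken from the text:

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

S = [[0, -1],
     [1, 0]]                      # the n = 1 version of the matrix S

def is_symplectic(M, tol=1e-12):
    """Algebraic test M^T S M = S of Eq. (13.4.12)."""
    T = matmul(matmul(transpose(M), S), M)
    return all(abs(T[i][j] - S[i][j]) < tol for i in range(2) for j in range(2))

M1 = [[1.0, 0.3], [0.0, 1.0]]     # a shear: det = 1
M2 = [[2.0, 0.0], [0.0, 0.5]]     # a squeeze: det = 1
print(is_symplectic(M1), is_symplectic(M2))           # True True

M = matmul(M2, M1)                # the product is again symplectic (group property)
print(is_symplectic(M))                               # True
print(M[0][0] * M[1][1] - M[0][1] * M[1][0])          # 1.0, the determinant
```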
Problem 13.4.1: In a symplectic basis, the skew-scalar product can be reexpressed as an ordinary dot product by using the isomorphism I defined in Eq. (13.3.21), and I can be represented by the matrix S defined in Eq. (13.3.32). Using the fact that det|S| = 1, adapt the argument of Section 4.1.1 to show that det|M| = 1 if M is a symplectic matrix.

13.4.3. Properties of Symplectic Matrices

Vectors in phase space have dimensionality 2n and, when expressed in a symplectic basis, have the form (q^1, p_1, q^2, p_2, …)^T or (q^1, q^2, …, p_1, p_2, …)^T, whichever one prefers. Because it permits a more compact partitioning, the second ordering is more convenient for writing compact, general matrix equations. But when motion in one phase space plane, say (q^1, p_1), is independent of, or approximately independent of, motion in another plane, say (q^2, p_2), the first ordering is more convenient. In Eq. (13.3.32), the isomorphism from covariant to contravariant components was expressed in coordinates for a particular form d̃H. The inverse isomorphism can be applied to an arbitrary vector u⃗ to yield a form ũ:
1%S(:)
where S = ( l 0 -1 o)
.
(13.4.8)
(The qualified equality symbol ≅ acknowledges that the notation is a bit garbled, with the left-hand side appearing to be intrinsic and the right-hand side expressed in components.) Using Eq. (13.4.8) it is possible to express the skew-scalar product [w, z] of vectors w and z (defined in Eq. (13.4.2)) in terms of ordinary scalar products:

[w, z] = Sw · z, (13.4.9)

and from that a quadratic form. Since displacements and momenta are being treated homogeneously here, it is impossible to retain the traditional placement of the indices for both displacements and momenta. Eq. (13.4.9) shows that the elements −S_{ji} are the coefficients of the quadratic form giving the skew-scalar product of vectors z_a and z_b in terms of their components:

[z_a, z_b] = −S_{ji} z_a^j z_b^i. (13.4.10)
When the condition Eq. (13.4.7) for a linear transformation M to be symplectic is expressed with dot products as in Eq. (13.4.9), it becomes

Su · v = SMu · Mv = MᵀSMu · v. (13.4.11)
SYMPLECTIC GEOMETRY
This can be true for all u and v only if
MᵀSM = S. (13.4.12)
This is an algebraic test that can be applied to a matrix M whose elements are known explicitly to determine whether or not it is symplectic.
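The test (13.4.12) is easy to apply numerically. The following minimal sketch (pure Python; the particular 2 × 2 matrices, a rotation, a shear, and a stretch, are illustrative choices, not taken from the text) applies it with the n = 1 matrix S of Eq. (13.4.8):

```python
# Check the symplecticity test M^T S M = S (Eq. 13.4.12) for 2x2 matrices.
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_symplectic(M, S, tol=1e-12):
    """Return True if M^T S M = S to within tol."""
    MTSM = matmul(matmul(transpose(M), S), M)
    return all(abs(MTSM[i][j] - S[i][j]) < tol
               for i in range(len(S)) for j in range(len(S)))

S = [[0.0, 1.0], [-1.0, 0.0]]            # n = 1 case, Eq. (13.4.8)
theta = 0.3
rotation = [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]   # det = 1
shear = [[1.0, 0.0], [-0.7, 1.0]]        # "thin-lens" kick, det = 1
stretch = [[2.0, 0.0], [0.0, 1.0]]       # det = 2, not symplectic

print(is_symplectic(rotation, S))  # True
print(is_symplectic(shear, S))     # True
print(is_symplectic(stretch, S))   # False
```

For n = 1 the test reduces to det M = 1, which the three examples confirm.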
Problem 13.4.2: Show that condition (13.4.12) is equivalent to
MSMᵀ = S. (13.4.13)
Problem 13.4.3: Hamilton’s equations in matrix form are

ż = S ∂H/∂z, (13.4.14)

and a change of variables z = MZ, with symplectic matrix M, is performed. Show that the form of Hamilton’s equations is left invariant. Such transformations are said to be “canonical.”

A result equivalent to Eq. (13.4.12) is obtained by multiplying it on the right by M⁻¹ and on the left by S:

M⁻¹ = −SMᵀS. (13.4.15)

This provides a handy numerical shortcut for determining the inverse of a matrix that is known to be symplectic, since the right-hand side requires only matrix transposition and multiplication by a matrix whose elements are mainly zero, the others being ±1. Subsequent formulas will be abbreviated by introducing Ā, to be called the “symplectic conjugate” of an arbitrary matrix A:

Ā = −SAᵀS. (13.4.16)

A necessary and sufficient condition for matrix M to be symplectic is then

M⁻¹ = M̄. (13.4.17)

From here on, when a matrix is symbolized by M, it will implicitly be assumed to be symplectic and hence to satisfy this equation. For any 2 × 2 matrix A = (a, b; c, d), with S given by Eq. (13.4.8), substitution into Eq. (13.4.16) yields

Ā = (d, −b; −c, a) = (det A) A⁻¹, (13.4.18)
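The shortcut (13.4.15) can be exercised directly. A minimal pure-Python sketch for the n = 1 case (the sample matrix, chosen to have det = 1, is an arbitrary illustration, not from the text):

```python
# Invert a symplectic matrix via M^{-1} = -S M^T S (Eq. 13.4.15), n = 1 case.
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

S = [[0.0, 1.0], [-1.0, 0.0]]    # Eq. (13.4.8)
M = [[2.0, 3.0], [1.0, 2.0]]     # det = 1, hence symplectic for n = 1

MT = [list(col) for col in zip(*M)]
Minv = [[-x for x in row] for row in matmul(matmul(S, MT), S)]  # -S M^T S

print(matmul(M, Minv))   # [[1.0, 0.0], [0.0, 1.0]]
```

Only a transposition and two sign-flipping multiplications were needed; no elimination or determinant evaluation was performed.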
assuming the inverse exists. Hence, using Eq. (13.4.17), for n = 1 a necessary and sufficient condition for symplecticity is that det |M| = 1. For n > 1, this condition will shortly be shown to be necessary, but it can obviously not be sufficient, since Eqs. (13.4.17) imply more than one independent algebraic condition.

For most practical calculations, it is advantageous to list the components of phase space vectors in the order z = (q¹, p₁, q², p₂)ᵀ and then to streamline the notation further by replacing this with z = (x, p, y, q)ᵀ. (Here, and when the generalization to arbitrary n is obvious, we exhibit only this n = 2 case explicitly.) With this ordering, the matrix S takes the form

S = (0, 1, 0, 0; −1, 0, 0, 0; 0, 0, 0, 1; 0, 0, −1, 0). (13.4.19)
Partitioning a 4 × 4 matrix M into 2 × 2 blocks, it and its symplectic conjugate are

M = (A, B; C, D), M̄ = (Ā, C̄; B̄, D̄). (13.4.20)

The eigenvalues of a symplectic matrix M will play an important role in the sequel. The “generic” situation is for all eigenvalues to be unequal, and that is much the easiest case for the following discussion, since the degeneracy of equal eigenvalues causes the occurrence of indeterminate ratios that require special treatment in the algebra. Unfortunately there are two cases where equality of eigenvalues is unavoidable:

(1) Systems often exhibit symmetries that, if exactly satisfied, force equality among certain eigenvalues or sets of eigenvalues. This case is more a nuisance than anything else, since the symmetry can be removed either realistically (as it would be in nature) or artificially; in the latter case the perturbation can later be reduced to insignificance. It is very common for perturbing forces of one kind or another, in spite of being extremely small, to remove degeneracy in this way.

(2) It is often appropriate to idealize systems by one or more variable “control parameters” that characterize the way the system is adjusted externally. Since the eigenvalues depend on these control parameters, the eigenvalues may have to become equal as a control parameter is varied. It may happen that the system refuses to allow this (see Problem 1.2.5), or sometimes the eigenvalues can pass through each other uneventfully. In any case, typically the possibility of such “collisions” of the eigenvalues contributes to the “essence” of the system under study, and following the eigenvalues through the collision or absence of collision is essential to the understanding of the device. For example, a “bifurcation” can occur at the point where the eigenvalues become equal, and in that case the crossing point marks the boundary of regions of qualitatively different behavior.
In spite of this inescapability of degeneracy, in the interest of simplifying the discussion, for the time being we will assume all eigenvalues of M are distinct. When discussing approximate methods in a later chapter, it will be necessary to rectify this oversimplification.
The eigenvalues λ and eigenvectors ψ_λ of any matrix A satisfy the “eigenvalue” and “eigenvector” equations

det |A − λ1| = 0, and Aψ_λ = λψ_λ. (13.4.21)

Since the determinant is unchanged when A is replaced by Aᵀ, a matrix and its transpose share the same set of eigenvalues. From Eq. (13.4.16) it follows that the symplectic conjugate Ā also has the same set of eigenvalues. Then, from Eq. (13.4.17) it follows that the eigenvalue spectra of a symplectic matrix M and its inverse M⁻¹ are identical. For any matrix, if λ is an eigenvalue, then 1/λ is an eigenvalue of the inverse. It follows that if λ is an eigenvalue of a symplectic matrix, then so also is 1/λ.

Even if all the elements of M are real (as we assume), the eigenvectors can be complex, and so can the eigenvalues. But here is where symplectic matrices shine. Multiplying the second of Eqs. (13.4.21) by M⁻¹ and using Eq. (13.4.17), one concludes both that

Mψ_λ = λψ_λ, and M̄ψ_λ = λ⁻¹ψ_λ; (13.4.22)

that is, λ and 1/λ are eigenvalues of M and M̄ belonging to the same eigenvector.
Writing λ = re^{iθ}, then 1/λ = (1/r)e^{−iθ} is also an eigenvalue, and these two eigenvalues are located in the complex λ-plane as shown in Fig. 13.4.1a. However, it also follows from the normal properties of the roots of a polynomial equation that if an eigenvalue λ = re^{iθ} is complex, then its complex conjugate λ* = re^{−iθ} is also an eigenvalue. This is illustrated in Fig. 13.4.1b. It then follows, as shown in Figs. 13.4.1c and 13.4.1d, that the eigenvalues can only come in real reciprocal pairs, or in complex conjugate pairs lying on the unit circle, or in quartets as in Fig. 13.4.1c.

For the cases illustrated in Fig. 13.4.1d, these requirements can be exploited algebraically by adding the equations (13.4.22) to give

(M + M̄)ψ_λ = Λψ_λ, where Λ = λ + λ⁻¹, (13.4.23)

which shows that the eigenvalues Λ of M + M̄ are real. Performing the algebra explicitly in the 4 × 4 case yields

M + M̄ = (A + Ā, B + C̄; C + B̄, D + D̄) = ((tr A)·1, B + C̄; C + B̄, (tr D)·1), (13.4.24)
where the off-diagonal combination E and its determinant ℰ are defined by

E = C + B̄ = (e, f; g, h), and ℰ = det |E| = eh − fg. (13.4.25)
FIGURE 13.4.1. (a) If λ = re^{iθ} is an eigenvalue of a symplectic matrix, then so also is 1/λ = (1/r)e^{−iθ}. (b) If an eigenvalue λ = re^{iθ} is complex, then its complex conjugate λ* = re^{−iθ} is also an eigenvalue. (c) If any eigenvalue is complex with absolute value other than 1, the three complementary points shown are also eigenvalues. (d) Eigenvalues can come in pairs only if they are real (and reciprocal) or lie on the unit circle (symmetrically above and below the real axis).
The eigenvalue equation is⁸

(tr A − Λ)(tr D − Λ) − ℰ = Λ² − (tr A + tr D)Λ + tr A tr D − ℰ = 0, (13.4.26)

whose solutions are

Λ_{A,D} = (tr A + tr D)/2 ± √((tr A − tr D)²/4 + ℰ). (13.4.27)
The eigenvalues have been given subscripts A and D to facilitate discussion in the common case that the off-diagonal elements are small, so the eigenvalues can be associated with the upper-left and lower-right blocks of M, respectively. Note that the eigenvalues satisfy simple equations

Λ_A + Λ_D = tr A + tr D, Λ_A Λ_D = tr A tr D − ℰ. (13.4.28)
⁸It is not in general valid to evaluate the determinant of a partitioned matrix treating the blocks as if they were ordinary numbers, but it is valid if the diagonal blocks are individually proportional to the identity matrix, as is the case here.
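These closed-form results are easy to check numerically. The following pure-Python sketch (the sample matrix, two uncoupled phase-plane rotations followed by a weak symplectic coupling “kick,” is an illustrative choice, not taken from the text) builds a 4 × 4 symplectic M in the ordering of Eq. (13.4.19), forms M + M̄, extracts tr A, tr D, and ℰ, and confirms that the Λ of Eq. (13.4.27) are eigenvalues of M + M̄:

```python
# Check the closed-form eigenvalues (13.4.27)-(13.4.28) of M + Mbar
# for a 4x4 symplectic matrix, ordering z = (x, p, y, q), Eq. (13.4.19).
import math

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def det(A):
    """Determinant by Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1)**j * A[0][j] * det([row[:j] + row[j+1:] for row in A[1:]])
               for j in range(len(A)))

S = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]]  # Eq. (13.4.19)

# Uncoupled rotations through mux, muy, then a weak coupling kick
# p_x += k*y, p_y += k*x (an exactly symplectic "thin-lens" map).
mux, muy, k = 0.6, 0.8, 0.1
cx, sx, cy, sy = math.cos(mux), math.sin(mux), math.cos(muy), math.sin(muy)
R = [[cx, sx, 0, 0], [-sx, cx, 0, 0], [0, 0, cy, sy], [0, 0, -sy, cy]]
T = [[1, 0, 0, 0], [0, 1, k, 0], [0, 0, 1, 0], [k, 0, 0, 1]]
M = matmul(T, R)

MT = [list(col) for col in zip(*M)]
Mbar = [[-x for x in row] for row in matmul(matmul(S, MT), S)]  # Eq. (13.4.16)
N = [[M[i][j] + Mbar[i][j] for j in range(4)] for i in range(4)]

trA, trD = N[0][0], N[2][2]                 # diagonal blocks are (trA)1, (trD)1
calE = N[2][0]*N[3][1] - N[2][1]*N[3][0]    # det of E = C + Bbar, Eq. (13.4.25)

LamA = (trA + trD)/2 + math.sqrt((trA - trD)**2/4 + calE)   # Eq. (13.4.27)
LamD = (trA + trD)/2 - math.sqrt((trA - trD)**2/4 + calE)

# Each Lambda must be a root of det(M + Mbar - Lambda*1) = 0.
for Lam in (LamA, LamD):
    NL = [[N[i][j] - (Lam if i == j else 0.0) for j in range(4)]
          for i in range(4)]
    print(abs(det(NL)) < 1e-9)   # True, True
```

The sum and product relations (13.4.28) hold automatically for the two roots of the quadratic (13.4.26), and the residual determinants vanish to rounding error.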
Though we have been proceeding in complete generality and this result is valid for any n = 2 symplectic matrix, the structure of these equations all but forces one to contemplate the possibility that ℰ be “small,” which would be true if the off-diagonal blocks of M are small, as would be the case if the x and y motions were independent or almost independent. Calling x “horizontal” and y “vertical,” one says that the off-diagonal blocks B and C “couple” the horizontal and vertical motion. If B = C = 0, the horizontal and vertical motions proceed independently. The remarkable feature of Eqs. (13.4.28) is that, though B and C together have eight elements capable of not vanishing, they shift the eigenvalues only through the combination ℰ.

In Eq. (13.4.27), we should insist that Λ_A (Λ_D) go with the + (−) sign respectively when tr A − tr D is positive, and vice versa. This choice ensures that, if ℰ is in fact small, the perturbed eigenvalue Λ_A will correspond to approximately horizontal motion and Λ_D to approximately vertical.

Starting from a 4 × 4 matrix, one expects the characteristic polynomial to be quartic in λ, but here we have found a characteristic polynomial quadratic in Λ. The reason for this is that the combination M + M̄ has nothing but pairs of degenerate roots, so the quartic characteristic equation factorizes exactly into the square of a quadratic equation. We have shown this explicitly for n = 2 (and for n = 3 in a problem below), but the result is true for arbitrary n.

Anticipating results to appear later on, multiplying M by itself repeatedly will be of crucial importance for the behavior of Hamiltonian systems over long periods of time. Such powers of M are most easily calculated if the variables have been transformed to make M diagonal, in which case the diagonal elements are equal to the eigenvalues. Then, evaluating M^l for large (integer) l, the diagonal elements are λ^l and their magnitudes are |λ|^l, which approach 0 if |λ| < 1 or ∞ if |λ| > 1.
Both of these behaviors can be said to be “trivial.” This leaves just one possibility as the case of greatest interest. It is the case illustrated on the left in Fig. 13.4.1d: all eigenvalues lie on the unit circle. In this case there are real angles μ_A and μ_D satisfying

Λ_A = e^{iμ_A} + e^{−iμ_A} = 2 cos μ_A, Λ_D = e^{iμ_D} + e^{−iμ_D} = 2 cos μ_D. (13.4.29)
In the special uncoupled case, for which B and C vanish, these angles degenerate into μ_x and μ_y, the values appropriate for pure horizontal and vertical motion, and we have

cos μ_x = tr A / 2, cos μ_y = tr D / 2. (13.4.30)

The sign of the determinant ℰ has special significance if the uncoupled eigenvalues are close to each other. This can be seen most easily by rearranging Eqs. (13.4.27) and (13.4.29) into the form

(cos μ_A − cos μ_D)² = (tr A − tr D)²/4 + ℰ. (13.4.31)
If the unperturbed eigenvalues are close, the first term on the right-hand side is small. Then for ℰ < 0, the perturbed eigenvalues Λ can become complex (which pushes the eigenvalues λ off the unit circle, leading to instability). But if ℰ > 0 the eigenvalues remain real and the motion remains stable (at least for sufficiently small values of ℰ > 0).

An even more important inference can be drawn from Eqs. (13.4.27) and (13.4.29). If the parameters are such that both cos μ_A and cos μ_D lie in the (open) range −1 < cos μ_A, cos μ_D < 1, then both angles μ_A and μ_D are real and the motion is “stable.” What is more, for sufficiently small variations of the parameters the eigenvalues, because they must move smoothly, cannot leave the unit circle, and these angles necessarily remain real. This means the stability has a kind of “robustness” against small changes in the parameters. This will be considered in more detail later. Pictorially, the eigenvalues on the left in Fig. 13.4.1d have to stay on the unit circle as the parameters are varied continuously. Only when an eigenvalue “collides” with another eigenvalue can the absolute value of either eigenvalue deviate from 1. Furthermore, if the collision is with the complex conjugate mate, it can only occur at either +1 or −1.

The reader who is not impressed that it has been possible to find closed-form algebraic formulas for the eigenvalues of a 4 × 4 matrix should attempt to do it for a general matrix ((a, b, c, d), (e, f, g, h), …). It is symplecticity that has made it possible. To exploit our good fortune we should also find closed-form expressions for the eigenvectors. One can write a four-component vector in the form

z = (x; y), where x = (x; p) and y = (y; q). (13.4.32)

One can then check that the vectors⁹

X = (x; Ex/(Λ − tr D)) and Y = (Ēy/(Λ − tr A); y), where Ē = B + C̄, (13.4.33)

satisfy the (same) equations (M + M̄)X = ΛX and (M + M̄)Y = ΛY for either eigenvalue Λ and arbitrary x or y. If we think of ℰ as being small so that the eigenvectors are close to the uncoupled solution, then we should select the Λ factors so that Eqs. (13.4.33) become

X = (x; Ex/(Λ_A − tr D)) and Y = (Ēy/(Λ_D − tr A); y). (13.4.34)

In each case, the denominator factor has been chosen to have a “large” absolute value so as to make the factor multiplying its two-component vector “small.” In this way, the lower components of X and the upper components of Y are “small.” In the limit of vanishing ℰ, only the upper components survive for x-motion and only the lower for y. This formalism may be mildly reminiscent of the four-component spin vectors describing relativistic electrons and positrons.

⁹If the coupling is strong, it is technically advantageous to define X with an explicit factor Λ − tr D that protects the lower component from divergence. Similarly, Y is defined with factor Λ − tr A.
There is another remarkable formula that a 4 × 4 symplectic matrix must satisfy. A result from matrix theory (the Cayley–Hamilton theorem) is that a matrix satisfies its own eigenvalue equation. Applying this to M + M̄, one has

(M + M̄)² − (Λ_A + Λ_D)(M + M̄) + Λ_A Λ_D = 0. (13.4.35)

Rearranging this (using MM̄ = M̄M = 1) yields

M² + M̄² − (Λ_A + Λ_D)(M + M̄) + 2 + Λ_A Λ_D = 0. (13.4.36)

By using Eq. (13.4.28), this equation can be expressed entirely in terms of the coefficients of M:

M² + M̄² − (tr A + tr D)(M + M̄) + 2 + tr A tr D − ℰ = 0. (13.4.37)

Problem 13.4.4: Starting with M + M̄ expressed as in Eq. (13.4.24), verify Eq. (13.4.35) explicitly.
Problem 13.4.5: Find the equation analogous to Eq. (13.4.36) that is satisfied by a 6 × 6 symplectic matrix

M = (A, B, E; C, D, F; G, H, J). (13.4.38)

It is useful to introduce off-diagonal combinations analogous to the E of Eq. (13.4.25). (13.4.39)

The eigenvalue equation for Λ = λ + 1/λ is cubic in this case,

Λ³ − p₁Λ² − p₂Λ − p₃ = 0, (13.4.40)

but it can be written explicitly, and there is a procedure for solving a cubic equation. The roots can be written in terms of the combinations (13.4.41). This is of more than academic interest, as the Hamiltonian motion of a particle in three-dimensional space is described by such a matrix.
13.4.4. Alternate Coordinate Ordering

The formulas for symplectic matrices take on a different appearance when the coordinates are listed in the order z = (q¹, q², …, p₁, p₂, …)ᵀ. With this ordering, the
2n × 2n matrix S takes the form

S_a = (0, 1; −1, 0), (13.4.42)

with each of the partitions being n × n (0 and 1 standing for the n × n zero and identity matrices). Partitioning M into four n × n blocks, it and its symplectic conjugate are

M_a = (A_a, B_a; C_a, D_a), M̄_a = (D_aᵀ, −B_aᵀ; −C_aᵀ, A_aᵀ). (13.4.43)

Subscripts a have been added as a reminder of the alternate coordinate ordering. This formula has the attractive property of resembling the formula for the inverse of a 2 × 2 matrix. With this ordering, the symplectic product defined in Eq. (13.4.9) becomes, for vectors u = (u_q; u_p) and v = (v_q; v_p),

[u, v] = u_p · v_q − u_q · v_p. (13.4.44)

This combination, which we have called a “symplectic product,” is sometimes called “the Poisson bracket” of the vectors u and v, but it must be distinguished from the Poisson bracket of scalar functions to be defined in the next section.

13.5. POISSON BRACKETS OF SCALAR FUNCTIONS

Many of the relations of Hamiltonian mechanics can be expressed compactly in terms of the Poisson brackets that we now define.
13.5.1. The Poisson Bracket of Two Scalar Functions

Consider two functions f(z) = f(q, p) and g(z) = g(q, p) defined on phase space. From them can be formed d̃f and d̃g, and from them (using the symplectic two-form ω̃ and the standard association) the vectors x_f and x_g. The “Poisson bracket” of functions f and g is then defined by

{f, g} = [x_f, x_g]. (13.5.1)

Spelled out more explicitly as in Eq. (13.3.12), this becomes

{f, g} = Σᵢ (∂f/∂qⁱ ∂g/∂pᵢ − ∂f/∂pᵢ ∂g/∂qⁱ), (13.5.2)

where the scalar products have been obtained using Eqs. (13.3.24). Though the terms in this sum are individually coordinate-dependent, by its derivation the Poisson bracket is itself coordinate-independent.
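Definition (13.5.2) can be exercised numerically. A minimal pure-Python sketch for one degree of freedom (the test functions f = qp and g = q², and the evaluation point, are illustrative choices, not from the text): central differences approximate the partial derivatives, and the result is compared with the bracket computed by hand, {qp, q²} = −2q²:

```python
# Poisson bracket {f, g} via Eq. (13.5.2), one degree of freedom,
# with partial derivatives approximated by central differences.
def poisson(f, g, q, p, h=1e-5):
    dfdq = (f(q + h, p) - f(q - h, p)) / (2*h)
    dfdp = (f(q, p + h) - f(q, p - h)) / (2*h)
    dgdq = (g(q + h, p) - g(q - h, p)) / (2*h)
    dgdp = (g(q, p + h) - g(q, p - h)) / (2*h)
    return dfdq*dgdp - dfdp*dgdq

f = lambda q, p: q*p        # illustrative test functions
g = lambda q, p: q**2

q0, p0 = 1.3, 0.7
print(abs(poisson(f, g, q0, p0) - (-2*q0**2)) < 1e-8)   # True

# Antisymmetry {g, f} = -{f, g}, and the canonical bracket {q, p} = 1:
print(abs(poisson(g, f, q0, p0) + poisson(f, g, q0, p0)) < 1e-8)          # True
print(abs(poisson(lambda q, p: q, lambda q, p: p, q0, p0) - 1.0) < 1e-8)  # True
```

The final line confirms the coordinate bracket {q, p} = 1, which will reappear in the quantum mechanical correspondence of Section 13.5.3.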
One application of the Poisson bracket is to express the time evolution of the system. Consider the evolution of a general function f(q(t), p(t), t) as its arguments follow the phase space system trajectory. Its time derivative is given by

df/dt = {f, H} + ∂f/∂t. (13.5.3)

In the special case that the function f has no explicit time dependence, its time derivative is therefore given directly by its Poisson bracket with the Hamiltonian.
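Equation (13.5.3) can be checked along an explicit trajectory (a sketch, assuming the harmonic oscillator H = (p² + q²)/2 as an example; for f = qp one finds {f, H} = p² − q² by hand):

```python
# Check df/dt = {f, H} (Eq. 13.5.3) for H = (p^2 + q^2)/2 and f = q p,
# using the exact solution q(t) = q0 cos t + p0 sin t, p(t) = q'(t).
import math

q0, p0 = 1.0, 0.5

def traj(t):
    """Exact solution of qdot = p, pdot = -q."""
    return (q0*math.cos(t) + p0*math.sin(t),
            -q0*math.sin(t) + p0*math.cos(t))

def f(q, p):
    return q*p

t, h = 0.9, 1e-5
qt, pt = traj(t)
dfdt_numeric = (f(*traj(t + h)) - f(*traj(t - h))) / (2*h)  # central difference
dfdt_bracket = pt**2 - qt**2                                # {q p, H} by hand
print(abs(dfdt_numeric - dfdt_bracket) < 1e-8)   # True
```

Since f has no explicit time dependence here, the ∂f/∂t term of (13.5.3) is absent.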
13.5.2. Properties of Poisson Brackets

The following properties are easily derived:

Leibnitz Property: {f₁f₂, g} = f₁{f₂, g} + f₂{f₁, g}. (13.5.4)

Jacobi Identity: {f, {g, h}} + {g, {h, f}} + {h, {f, g}} = 0. (13.5.5)

Explicit Time Dependence: ∂/∂t {f₁, f₂} = {∂f₁/∂t, f₂} + {f₁, ∂f₂/∂t}. (13.5.6)

Jacobi’s Theorem

THEOREM 13.5.1: If {H, f₁} = 0 and {H, f₂} = 0, then {H, {f₁, f₂}} = 0.

Proof: By the Jacobi identity (13.5.5), {H, {f₁, f₂}} = −{f₁, {f₂, H}} − {f₂, {H, f₁}}, and both terms vanish by hypothesis.

Corollary: If f₁ and f₂ are “integrals of the motion,” then so also is {f₁, f₂}. This is the form in which Jacobi’s theorem is usually remembered.
Perturbation Theory: Poisson brackets will be of particular importance in perturbation theory when motion close to integrable motion is studied. Using the term “orbit element” frequently used in celestial mechanics to describe an integral of the unperturbed motion, the coefficients in a “variation of constants” perturbative procedure will be expressible in terms of Poisson brackets of orbit elements, which are therefore themselves also orbit elements whose constancy throughout the motion will lead to important simplification.
13.5.3. The Poisson Bracket and Quantum Mechanics

13.5.3.1. Commutation Relations: In Dirac’s formulation, there is a close correspondence between the Poisson brackets of classical mechanics and the commutation relations of quantum mechanics. In particular, if u and v are dynamical variables, their quantum mechanical “commutator” [u, v]_QM ≡ uv − vu is given by

[u, v]_QM = iħ {u, v}, (13.5.8)

where ħ is Planck’s constant (divided by 2π) and {u, v} is the classical Poisson bracket. Hence, for example,

[q, p]_QM = iħ {q, p} = iħ. (13.5.9)

In the Schrödinger representation of quantum mechanics, one has q → q and p → −iħ ∂/∂q, where q and p are to be regarded as operators that operate on functions f(q). One can then check that

(qp − pq) f = −iħ q ∂f/∂q + iħ ∂(qf)/∂q = iħ f, (13.5.10)

in agreement with Eq. (13.5.9).
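The check (13.5.10) can be mechanized on polynomial wave functions (a pure-Python sketch; the coefficient-list representation and the particular polynomial are illustrative devices, not from the text). A polynomial Σₙ cₙqⁿ is stored as its coefficient list, q acts by shifting every coefficient up one degree, and p acts as −iħ d/dq:

```python
# Verify (q p - p q) f = i*hbar*f on polynomials f(q), with p = -i*hbar d/dq.
hbar = 1.0   # units chosen so that hbar = 1 (illustrative)

def q_op(c):
    """Multiply by q: shift every coefficient up one degree."""
    return [0j] + list(c)

def p_op(c):
    """Apply p = -i*hbar d/dq to the coefficient list."""
    return [-1j*hbar*(n + 1)*c[n + 1] for n in range(len(c) - 1)]

def sub(a, b):
    """Subtract coefficient lists, padding to equal length."""
    m = max(len(a), len(b))
    a = list(a) + [0j]*(m - len(a))
    b = list(b) + [0j]*(m - len(b))
    return [x - y for x, y in zip(a, b)]

f = [2 + 0j, 0j, 3 + 0j, 1 + 0j]     # f(q) = 2 + 3q^2 + q^3, arbitrary example
comm = sub(q_op(p_op(f)), p_op(q_op(f)))   # (q p - p q) f
expected = [1j*hbar*cn for cn in f]        # i*hbar*f
print(all(abs(x - y) < 1e-12 for x, y in zip(comm, expected)))   # True
```

The identity holds coefficient by coefficient, for any polynomial, mirroring the operator calculation in (13.5.10).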
13.5.3.2. Time Evolution of Expectation Values: There needs to be “correspondence” between certain quantum mechanical and classical mechanical quantities in order to permit the “seamless” metamorphosis of a system as the conditions it satisfies are varied from being purely quantum mechanical to being classical. One such result is that the expectation values of quantum mechanical quantities should evolve according to classical laws. A quantum mechanical particle is characterized by a Hamiltonian H, a wave function Ψ, and the wave equation relating them:

iħ ∂Ψ/∂t = HΨ. (13.5.11)
The expectation value of a function of position f(q) is given by

f̄ = ∫ Ψ* f Ψ dq. (13.5.12)

Its time rate of change is then given by

df̄/dt = ∫ (∂Ψ*/∂t f Ψ + Ψ* ∂f/∂t Ψ + Ψ* f ∂Ψ/∂t) dq = ∫ Ψ* (∂f/∂t + (1/(iħ))[f, H]) Ψ dq. (13.5.13)

In the final step the relation H* = H, required for H to be a “Hermitean” operator, has been used. To assure that

df̄/dt = (df/dt)‾, (13.5.14)

we must then have

df/dt = ∂f/∂t + (1/(iħ))[f, H]. (13.5.15)

When the quantum mechanical commutator [f, H] is related to the classical Poisson bracket {f, H} as in Eq. (13.5.8), this result corresponds with the classical formula for df/dt given in Eq. (13.5.3).
13.6. INTEGRAL INVARIANTS 13.6.1. Integral Invariants in Electricity and Magnetism In anticipation of some complications that will arise in studying integral invariants, it would be appropriate at this time to digress into the distinction between local and global topological properties in differential geometry. Unfortunately, discussions of this subject, known as “cohomology” in mathematics texts, are formidably abstract. Fortunately, physicists have already encountered some of the important notions in concrete instances. For this reason we digress to develop some analogies with vector integral calculus. Since it is assumed the reader has already encountered these results in the context of electricity and magnetism, we employ that terminology here, but with inessential constant factors set equal to 1; this includes not distinguishing between the magnetic vectors B and H. In the end the subject of electricity and magnetism will have played only a pedagogical role.
We have already encountered the sort of analysis to be performed in geometric optics. Because of the “eikonal equation” Eq. (10.1.11), n(dr/ds) = ∇φ was the gradient of the single-valued eikonal function φ. The invariance of the line integral of n(dr/ds) for different paths connecting the same endpoints then followed, which was the basis of the “principle of least time.” There was potential for fallacy in this line of reasoning, however, as Problem 13.6.1 is intended to illustrate. One recalls from electrostatics that the electric field is derivable from a single-valued potential Φ_E such that E = −∇Φ_E, and from this one infers that ∮_γ E · ds = 0, where the integration is taken over a closed path called γ. One then has the result that ∫_{P₁}^{P₂} E · ds is independent of the path from P₁ to P₂. Poincaré introduced the terminology of calling such a path-independent integral an “integral invariant” or an “absolute integral invariant.” But the single-valued requirement for Φ_E is not easy to apply in practice. It is more concise in electrostatics to start from ∇ × E = 0, rather than E = −∇Φ_E, since this assures ∮_γ E · ds = 0. (You can use Stokes’ theorem to show this.) Though ∇ × E = 0 implies the existence of Φ_E such that E = −∇Φ_E, the converse does not follow.
Problem 13.6.1: The magnetic field H of a constant current flowing along the z-axis has only x and y components and depends only on x and y. Recalling (or looking up) the formula for H in this case, and ignoring constant factors, show that

H = ∇Φ_M, where Φ_M = tan⁻¹(y/x).

In terms of polar coordinates r and θ, one has x = r cos θ and y = r sin θ. After expressing H in polar coordinates, evaluate ∮_γ H · ds where γ is a complete circle of radius r₀ centered on the origin. Comment on the vanishing or otherwise of this integral.

After having completed Problem 13.6.1, let us consider magnetostatic fields. In current-free regions of space, the magnetostatic field can be derived from a “magnetic” potential, H = −∇Φ_M. Why is this consistent with Ampère’s law, ∮_γ H · ds = I, where the line integral is taken over closed path γ, and where I is the (nonzero in general) current linking γ? In electromagnetism ∇ × H = J, where “current density” J is, in general, nonvanishing. One has to distinguish between J’s vanishing over regions near the integration path into which the contour is allowed to be deformed and its vanishing everywhere on a surface bounded by the integration path. In the former case, the integral ∮_γ H · ds is independent of path, consistent with Ampère’s law, but the integral necessarily vanishes only in the latter case. This may be most likely to seem paradoxical when the current density vanishes everywhere except in an “infinitesimally fine” wire carrying finite current I and linking the integration path. Recapitulating, the vanishing of J everywhere near the path of integration is insufficient; for the vanishing of the line integral of H to be assured, the curl has to vanish everywhere on a surface bounded by the path of integration. Let us formalize these considerations.
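The circulation in Problem 13.6.1 can be checked numerically (a pure-Python sketch; the midpoint-rule discretization is an illustrative choice). With H = ∇ tan⁻¹(y/x) = (−y, x)/(x² + y²), the line integral around a circle of radius r₀ gives 2π regardless of r₀, even though H is locally a gradient wherever the (multivalued) potential is defined:

```python
# Numerically evaluate the circulation of H = (-y, x)/(x^2 + y^2)
# around a circle of radius r0 (midpoint rule on the parameterized path).
import math

def circulation(r0, n=100000):
    total = 0.0
    dth = 2*math.pi/n
    for i in range(n):
        th = (i + 0.5)*dth
        x, y = r0*math.cos(th), r0*math.sin(th)
        hx, hy = -y/(x*x + y*y), x/(x*x + y*y)
        # tangential line element: ds = (-sin th, cos th) r0 dth
        total += (hx*(-math.sin(th)) + hy*math.cos(th))*r0*dth
    return total

print(round(circulation(1.0), 6))   # 6.283185  (= 2*pi)
print(round(circulation(7.5), 6))   # 6.283185  (independent of r0)
```

The integrand H · ds reduces analytically to dθ on every circle, so the nonzero answer 2π is exact and cannot be removed by deforming the radius: the potential Φ_M is not single-valued on any loop linking the wire.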
“Physical” Argument: For symplectic mechanics, it is the mathematical equivalent of Ampère’s law that we will need to employ. That law follows from the equation

∇ × H = J. (13.6.1)

Integrating this relation over a surface Γ₁ bounded by closed curve γ₁ and using Stokes’s theorem, the result is

∮_{γ₁} H · ds = ∫_{Γ₁} (∇ × H) · da = ∫_{Γ₁} J · da, (13.6.2)

giving the “flux” of J through surface Γ₁. As shown in Fig. 13.6.1, since J is “current density,” it is natural to visualize the flow lines of J as being the paths of steady current. The flow lines through γ₁ form a “tube of current.” The flux of J through γ₁ would also be said to be the “total current” flowing through γ₁. If another closed loop γ₂ is drawn around the same tube of current, then it would be linked by the same total current. From this “physical” discussion, the constancy of this flux seems to be “coming from” the conservation of charge, but the next section will show that this may be a misleading interpretation.
“Mathematical” Argument: Much the same argument can be made with no reference whatsoever to the vector J. Rather, referring again to Fig. 13.6.1, let H be any vector whatsoever, and consider the vector ∇ × H obtained from it. The flow lines of ∇ × H passing through closed curve γ₁ define a “tube.” Further along this tube

FIGURE 13.6.1. A “tube” formed by flow lines of ∇ × H passing through closed curve γ₁. The part Σ between γ₁ and another closed curve γ₂ around the same tube forms a closed volume when it is “capped” by surfaces Γ₁ bounded by γ₁ and Γ₂ bounded by γ₂.
is another closed curve γ₂ linked by the same tube. Let the part of the tube’s surface between γ₁ and γ₂ be called Σ. The tube can be visualized as being “capped” at one end by a surface Γ₁ bounded by γ₁ and at the other end by a surface Γ₂ bounded by γ₂, to form a closed volume V. Because it is a curl, the vector ∇ × H satisfies

∇ · (∇ × H) = 0 (13.6.3)

throughout the volume, and it then follows from Gauss’s theorem that

0 = ∫_V ∇ · (∇ × H) dV = ∮ (∇ × H) · da, (13.6.4)

where dV is a volume differential and da is a normal, outward-directed surface area differential. By construction, the integrand vanishes everywhere on the surface Σ.¹⁰ Then applying Stokes’s theorem again yields

∮_{γ₁} H · ds = ∮_{γ₂} H · ds. (13.6.5)

Arnold refers to this as “Stokes’s lemma.” Poincaré introduced the terminology “relative integral invariant” for such quantities. Since H can be any (smooth) vector, the result is purely mathematical and does not necessarily have anything to do with the “source” of H. This same mathematics is important in hydrodynamics, where H is the velocity of fluid flow and the vector ∇ × H is known as the “vorticity,” its flow lines are known as “vorticity lines,” and the tube formed from these lines is known as a “vorticity tube.” This terminology has been carried over into symplectic mechanics.

One reason this is being mentioned is to point out the potential for misinterpretation of this terminology. The terminology is in one way apt and in another way misleading. What would be misleading would be to think of H as in any way representing particle velocity, even though H stands for velocity in hydrodynamics. What is apt, though, is to think of H as a static magnetic field, or rather to think of J = ∇ × H as the static current density causing the magnetic field. It is the flow lines of J that are to be thought of as the analog of the configuration space flow lines of a mechanical system, and these are the lines that will be called vortex lines and will form a vortex tube. H tends to wrap around the current flow lines, and Ampère’s law relates its “circulation” ∮_γ H · ds for various curves γ linked by the vortex tube.
¹⁰Later in the chapter there will be an analogous “surface” integral, whose vanishing will be similarly essential.

13.6.2. The Poincaré–Cartan Integral Invariant

In spite of having identified a possible hazard, we boldly apply the same reasoning to mechanics as we applied in deriving the principle of least time in optics. In the space of q, p, and t, known as the (time-)extended phase space, we continue to analyze
the set of system trajectories describable by a function S(q, t) satisfying the Hamilton–Jacobi equation. The “gradient” relations of Eq. (11.1.12) were ∂S/∂t = −H and ∂S/∂qⁱ = pᵢ. If we assume that S is single-valued, it follows that the integral from P₁ : (q₍₁₎, t₁) to P : (q, t),

I.I. = ∫_{P₁}^{P} (pᵢ dqⁱ − H dt), (13.6.6)

which measures the change in S in going from P₁ to P, is independent of path. This is called the “Poincaré–Cartan integral invariant,” which for brevity we designate by I.I. The integration path is a curve in “extended configuration space,” which can also be regarded as the projection onto the extended coordinate space of a curve in extended phase space; it need not be a physically realizable orbit, but the functions pᵢ and H must correspond to a particular function S such as in Eq. (11.1.12). Unfortunately it will turn out that the requirement that S be nonsingular and single-valued throughout space is too restrictive in practice, and a more careful statement of the invariance of I.I. is

∮_{γ₁} (pᵢ dqⁱ − H dt) = ∮_{γ₂} (pᵢ dqⁱ − H dt), (13.6.7)

where the integration paths γ₁ and γ₂ are closed (in phase space, though not necessarily in time-extended phase space) and encircle the same tube of system trajectories. The evaluation of I.I. for a one-dimensional harmonic oscillator is illustrated in Fig. 13.6.2; in this case the solid curve is a valid system path in extended phase space. Because the form in the integrand is expanded in terms of coordinates, the differential form dq can be replaced by the ordinary differential dx. Energy conservation in simple harmonic motion is expressed by

p²/(2m) + (1/2) k x² = E, (13.6.8)
FIGURE 13.6.2. Extended phase space for a one-dimensional simple harmonic oscillator. The heavy curve is a valid system trajectory and also a possible path of integration for the evaluation of the Poincaré–Cartan integral invariant.
as the figure illustrates. This is the equation of the ellipse, which is the projection of the trajectory onto a plane of constant 1. Its major and minor axes are and Integration of the first term of Eq. (13.6.6) yields
m.
f p ( x ) d x = / / d p d x = n m m = 2 n E m = ET,
(13.6.9)
since the period of oscillation is T = 2 n m . The second term of Eq. (13.6.6) is especially simple because H = E and it yields - E T . Altogether 1.1. = 0. If the path defining 1.1. is restricted to a hyperplane of fixed time t , like curve y1 in Fig. 13.63, then the second term of (13.6.6) vanishes. If the integral is performed over a closed path y . the integral is called the “Poincark relative integral invariant” R.I.I. (13.6.10)
This provides an invariant measure of the tube of trajectories bounded by curve y1 and illustrated in Fig. 13.6.3. Using the differential form terminology of Section 4.4.2, this quantity is written R.I.I.(t) =
f E(t)
(13.6.11)
and is called the circulation of p(t) about y . Since this integral is performed over a closed path, its value would seem to be zero under the conditions hypothesizedjust before Eq. (13.6.6), but we have found its value to be 2n E m , which seems to be a contradiction. Clearly the R.I.I. acquires a nonvanishing contribution because S is not single-valued in a region containing the
FIGURE 13.6.3. A bundle of trajectories in extended phase space, bounded at time tl by curve y1. The constancy of R.I.I., the Poincare relative integral invariant, expresses the equality of line integrals over y1 and w.This provides an invariant measure of the tube of trajectories bounded by curve y1.
integration path. Looking at Eq. (11.2.37), obtained as the Hamilton-Jacobi equation was being solved for this system, one can see that the quantity ∂S₀/∂q is doubly defined for each value of q. This invalidates any inference about the R.I.I. integral that can be drawn from Eqs. (11.1.12). This shows that, though the Hamilton-Jacobi gradient relations for p and H provide an excellent mnemonic for the integrands in the I.I. integral, it is improper to infer integral invariance properties from them.
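The harmonic-oscillator evaluation above is easy to spot-check numerically. The following sketch (with arbitrarily assumed values of m, k, and E, chosen only for illustration) confirms that the phase-space area enclosed by one cycle of Eq. (13.6.8) equals ET, as stated in Eq. (13.6.9):

```python
import numpy as np

# Spot-check of Eq. (13.6.9): for p^2/(2m) + (1/2) k x^2 = E, the area
# enclosed by one cycle in the (x, p) plane should equal E*T, where
# T = 2*pi*sqrt(m/k).  The values of m, k, E are arbitrary assumptions.
m, k, E = 1.3, 2.7, 0.9
a = np.sqrt(2 * E / k)                        # turning point of the motion
x = np.linspace(-a, a, 200001)
p = np.sqrt(np.maximum(2 * m * (E - 0.5 * k * x**2), 0.0))
# Trapezoid rule for the upper half of the ellipse; double it for the loop.
area = 2 * np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(x))
T = 2 * np.pi * np.sqrt(m / k)
assert abs(area - E * T) < 1e-5 * (E * T)
```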
*13.7. INVARIANCE OF THE POINCARÉ-CARTAN INTEGRAL INVARIANT I.I.*

13.7.1. The Extended Phase Space Two-Form and Its Special Eigenvector

Eq. (13.6.11) shows that the integral appearing in I.I. is the circulation of a one-form ω̃⁽¹⁾. To analyze it using the analog of Stokes's lemma that was used in the electromagnetic example described above, it is necessary to define the vortex tube of a one-form, and this requires first the definition of a vortex line of a one-form. We start by finding the exterior derivative d̃ω̃⁽¹⁾ as defined in Eq. (4.4.10). To make this definite, let us analyze the "extended momentum one-form"

ω̃⁽¹⁾ = p_i dq^i − H dt,   (13.7.1)

which is summed on i and is in fact the one-form appearing in I.I. This is the canonical coordinate version of the standard momentum one-form with the one-form H dt subtracted. In this case, the integral is to be evaluated along a curve in the (2n + 1)-dimensional, time-extended phase space. For all the apparent similarities between Fig. 13.6.1 and Fig. 13.6.3, there are important differences, the most important being that the abscissa axis in the latter is the time t. Since all the other axes come in canonically conjugate pairs, the dimensionality of the space is necessarily odd. A two-form ω̃⁽ᴱ⁾ can be obtained by exterior differentiation of −ω̃⁽¹⁾, as in Eq. (4.4.10):

ω̃⁽ᴱ⁾ = −d̃ω̃⁽¹⁾ = dq^i ∧ dp_i + dH ∧ dt.   (13.7.2)
*This section depends on essentially all the geometric concepts that have been introduced in the text. Since this makes it particularly difficult, it should perhaps only be skimmed initially. But because the proof of Liouville's theorem and its generalizations, probably the most fundamental results in classical mechanics, and the method of canonical transformation depend on the proof, the section cannot be said to be unimportant. Arnold has shown that the more elementary treatment of this topic by Landau and Lifshitz is incorrect. Other texts, such as Goldstein, do not go beyond proving Liouville's theorem, even though that just scratches the surface of the rigorous demands that being symplectic places on mechanical systems. This section depends especially on Section 4.4.
As in Eq. (13.4.9), the content of this two-form can be re-expressed by associating a one-form ω̃_{z_E} = ω̃⁽ᴱ⁾(z_E, ·) with arbitrary extended phase space displacement vector

z_E = (dq¹  dp₁  dq²  dp₂  dt)ᵀ.   (13.7.3)

One can then define an extended skew-scalar product of two vectors:

[z_{Eb}, z_{Ea}]_E = ω̃⁽ᴱ⁾(z_{Eb}, z_{Ea}).   (13.7.4)
This can in turn be expressed as a quadratic form, as in Eq. (13.4.10):

[z_{Eb}, z_{Ea}]_E = z_{Eb}ᵀ A z_{Ea},   (13.7.5)

where, for n = 2, z_{Eb}ᵀ = (dq_b¹  dp_{b1}  dq_b²  dp_{b2}  dt_b), z_{Ea} = (dq_a¹  dp_{a1}  dq_a²  dp_{a2}  dt_a)ᵀ, and

A = (    0          1          0          0       ∂H/∂q¹
        −1          0          0          0       ∂H/∂p₁
         0          0          0          1       ∂H/∂q²
         0          0         −1          0       ∂H/∂p₂
     −∂H/∂q¹   −∂H/∂p₁   −∂H/∂q²   −∂H/∂p₂       0     ).
The partial derivatives occurring as matrix elements are evaluated at the particular point in phase space that serves as origin from which the components in the vectors are reckoned.
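Before attacking the algebra of Problem 13.7.1 below, the claimed properties can be checked numerically. The following sketch (an assumed example, not from the text, with H = (p₁² + p₂² + q₁² + q₂²)/2 evaluated at an arbitrary phase space point) builds the matrix of Eq. (13.7.5) and confirms that its determinant vanishes, that its rank is 4, and that the vector of extended Hamilton-equation velocities is annihilated by it:

```python
import numpy as np

# Assumed example: H = (p1^2 + p2^2 + q1^2 + q2^2)/2 at an arbitrary point.
q1, p1, q2, p2 = 0.4, -1.1, 0.7, 0.2
Hq1, Hp1, Hq2, Hp2 = q1, p1, q2, p2       # partial derivatives of H here
# The (2n+1)x(2n+1) antisymmetric matrix of Eq. (13.7.5), n = 2:
A = np.array([
    [0.0,   1.0,   0.0,   0.0,  Hq1],
    [-1.0,  0.0,   0.0,   0.0,  Hp1],
    [0.0,   0.0,   0.0,   1.0,  Hq2],
    [0.0,   0.0,  -1.0,   0.0,  Hp2],
    [-Hq1, -Hp1, -Hq2,  -Hp2,  0.0],
])
# Extended Hamiltonian flow direction: (dq1, dp1, dq2, dp2, dt)/dt.
u = np.array([Hp1, -Hq1, Hp2, -Hq2, 1.0])
assert np.allclose(A @ u, 0)              # null eigenvector, eigenvalue 0
assert abs(np.linalg.det(A)) < 1e-12      # determinant vanishes
assert np.linalg.matrix_rank(A) == 4      # rank 2n = 4
```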
Problem 13.7.1: Show that the determinant of the matrix in Eq. (13.7.5) vanishes but that the rank of the matrix is 4. Generalizing to arbitrary dimensionality n, show that the corresponding determinant vanishes and that the rank of the corresponding matrix is 2n.

If one accepts the result of this problem, the determinant vanishes and, as a result, it is clear that zero is one of the eigenvalues. One confirms this immediately by observing that the vector

u⁽ᴴ⁾ = (∂H/∂p₁  −∂H/∂q¹  ∂H/∂p₂  −∂H/∂q²  1)ᵀ   (13.7.6)
(or any constant multiple of this vector) is an eigenvector with eigenvalue 0. Furthermore, one notes from Hamilton's equations that this vector is directed along the unique system curve through the point under study. It will be significant that the vector u⁽ᴴ⁾, because it is an eigenvector with eigenvalue 0, has the property that

ω̃⁽ᴱ⁾(u⁽ᴴ⁾, w_E) = 0   (13.7.7)

for arbitrary vector w_E. Recapitulating, it has been shown that the Hamiltonian system evolves in the direction given by the eigenvector (with eigenvalue 0) of the (2n + 1) × (2n + 1) matrix derived from the two-form ω̃⁽ᴱ⁾. This has been demonstrated explicitly only for the case n = 2,
but it is not difficult to extend the arguments to spaces of arbitrary dimension. Also, though specific coordinates were used in the derivation, they no longer appear in the statement of the result.

13.7.2. Proof of Invariance of the Poincaré Relative Integral Invariant

Though we have worked only on a particular two-form, we may apply the same reasoning to derive the following result, known as Stokes's lemma. Suppose that ω̃⁽²⁾ is an arbitrary two-form in an odd, (2n + 1)-dimensional space. For reasons that will become clear immediately, we start by seeking a vector u₀ having the property that ω̃⁽²⁾(u₀, v) = 0 for arbitrary vector v. As before, working with specific coordinates, we can introduce a matrix A such that the skew-scalar product of vectors u and v is given by

ω̃⁽²⁾(u, v) = Au · v.   (13.7.8)
Problem 13.7.2: Following Eqs. (13.4.9), show that the matrix A is antisymmetric. Show also that the determinants of an arbitrary matrix and its transpose are equal, and also that, if the matrix is odd-dimensional, changing the signs of every element has the effect of changing the sign of its determinant. Conclude therefore that A has zero as one eigenvalue.

Accepting the result of the previous problem, we conclude that (if the stated conditions are met) a vector u₀ can always be found such that, for arbitrary v,

ω̃⁽²⁾(u₀, v) = 0.   (13.7.9)
This relation will be especially important when ω̃⁽²⁾ serves as the integrand of an area integral as in Eq. (4.4.9) and the vector u₀ lies in the surface over which the integration is being performed, since this causes the corresponding contribution to the integral to vanish.
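A quick numerical illustration of the conclusion of Problem 13.7.2 (a randomly generated example, not from the text): any odd-dimensional antisymmetric matrix is singular, so a null vector u₀ satisfying Eq. (13.7.9) can always be extracted:

```python
import numpy as np

# Any odd-dimensional antisymmetric matrix has zero determinant,
# hence a null eigenvector.  Example matrix generated at random.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B - B.T                        # antisymmetric by construction
assert abs(np.linalg.det(A)) < 1e-8
# A null vector u0 with A @ u0 = 0 can be read off the SVD: it is the
# right singular vector belonging to the (zero) smallest singular value.
U, s, Vt = np.linalg.svd(A)
u0 = Vt[-1]
assert np.allclose(A @ u0, 0, atol=1e-10)
```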
Vortex Lines of a One-Form: If the two-form ω̃⁽²⁾ for which the vector u₀ was just found was itself derived from an arbitrary one-form ω̃⁽¹⁾ according to ω̃⁽²⁾ = d̃ω̃⁽¹⁾, then the flow lines of u₀ are said to be the "vortex lines" of ω̃⁽¹⁾. We now wish to apply Stokes's theorem for forms (4.4.8) to a vortex tube such as is shown in Fig. 13.6.3. (Temporarily ignore that Fig. 13.6.3 illustrates phase space; think of the space as arbitrary and of the curves as vortex lines of the arbitrary form ω̃⁽¹⁾.) The curve γ₁ can be regarded, on the one hand, as bounding the surface Γ₁ and, on the other hand, as bounding the surface consisting of both Σ (formed from the vortex lines) and the surface Γ₂ bounded by γ₂. Applying Stokes's theorem (4.4.8) to curve γ₁, the area integrals for these two surfaces are equal. But we can see from the definition of the vortex lines that there is no contribution to the area integral coming from the area Σ. (The vortex lines belong to ω̃⁽¹⁾, the integrand is d̃ω̃⁽¹⁾, and the grid by which the integral is calculated can be formed from differential areas each having one side aligned with a vortex line. Employing Eq. (13.7.9),
one finds that the contribution to the integral from every such area vanishes.) We conclude therefore that

∮_{γ₁} ω̃⁽¹⁾ = ∮_{γ₂} ω̃⁽¹⁾.   (13.7.10)
This is known as "Stokes's lemma for forms." It is the vanishing of the integral over Σ that has been essential to this argument, which should be reminiscent of the discussion given earlier of Ampère's law in electromagnetism. All that was required to prove that law was the vanishing of a surface integral, and it was anticipated in a footnote to that derivation that the argument would be repeated.
We again specialize to phase space and consider a vortex tube belonging to the extended momentum one-form ω̃⁽¹⁾ = p_i dq^i − H dt. The vortex lines for this form are shown in Fig. 13.6.3, and we have seen that these same curves are valid trajectories of the Hamiltonian system. This puts us in a position to prove

R.I.I. = ∮_γ p_i dq^i = independent of time.   (13.7.11)
In fact, this has already been done, because Eq. (13.7.10) implies Eq. (13.7.11). This completes the proof of the constancy in time of the Poincaré relative integral invariant R.I.I. The constancy of R.I.I. is closely related to the invariance of ∑_i dq^i ∧ dp_i under coordinate transformations, as was shown earlier. The new result is that the system evolution in time preserves the invariance of this phase space area. This result is most readily applicable to the case in which many noninteracting systems are represented on the same figure and the curve γ encloses all of them. Since points initially within the tube will remain inside, and the tube area is preserved, the density of particles is preserved. The dimensionality of R.I.I. is

[R.I.I.] = [arbitrary] × [energy / (arbitrary/time)] = [energy × time] = [action].   (13.7.12)
Knowing that Planck's constant h is called "the quantum of action," one anticipates connections between this invariant and quantum mechanics. Pursuit of this connection will lead first to the definition of adiabatic invariants as physical quantities subject to quantization. It will be shown that R.I.I. is an adiabatic invariant.
It is interesting to reflect on the similarities and differences between the present derivation of the invariance of R.I.I. and the earlier derivation of Ampère's law. Consider, for example, the flux through the differential element defined by the vectors u₀ and v shown in Fig. 13.6.3. (For simplicity we use the same figure for both the electromagnetism and differential forms discussions.) Actually, two cases are shown. In the first case, both u₀ and v lie in the surface of the vortex tube; in the second case, though u₀ lies in the surface, v does not. In the two-form integration, the fact that u₀ is everywhere parallel to a vortex line already assures the vanishing of the differential
contribution corresponding to u₀ and v whether or not v lies in the surface of the vortex tube. In the case of the magnetic field of a straight current-carrying wire, the field H is parallel to the surface and hence its flux vanishes in the first case. This tempts one to leap to the (incorrect) conclusion that the vanishing of the surface integral is due to this. One might then be troubled by the nonvanishing of the flux of H in the second case, since the vanishing in the calculation using forms depended only on u₀, with no reference to the direction of v. The resolution of this "paradox" is that one leaped to the wrong conclusion in the magnetostatics case: the vanishing results from ∇ × H lying in the surface, and the direction of H itself is irrelevant.
13.8. SYMPLECTIC SYSTEM EVOLUTION

According to Stokes's theorem for forms, Eq. (4.4.8), an integral over a surface Γ is related to the integral over its bounding curve γ by

∫_Γ d̃ω̃⁽¹⁾ = ∮_γ ω̃⁽¹⁾.   (13.8.1)

Also, as in Eq. (13.7.2), we have

−d̃(p_i dq^i) = dq^i ∧ dp_i.   (13.8.2)
With ω̃⁽¹⁾ = p_i dq^i, these relations yield

∮_γ p_i dq^i = −∫_Γ dq^i ∧ dp_i.   (13.8.3)

Since the left-hand side is an integral invariant, so also is the right-hand side. Because it is an integral over an open region, the latter integral is said to be an absolute integral invariant, unlike R.I.I., which is a relative integral invariant because its range is closed. It is not useful to allow the curve γ of R.I.I. to become infinitesimal, but it is useful to extract the integrand of the absolute integral invariant in that limit, noting that it is the same quantity that has previously been called the canonical two-form:

−∑_i dq^i ∧ dp_i = canonical two-form ω̃ = invariant.   (13.8.4)

The "relative/absolute" terminology distinction does not seem particularly helpful to me, but the invariance of the canonical two-form will lead immediately to the conclusion that the evolution of a Hamiltonian system can be represented by a symplectic transformation. For the simple harmonic oscillator, the R.I.I. was derived in Eq. (13.6.9) using
∮ p(x) dx = ∬ dp dx.   (13.8.5)
Two important comments can be based on this formula. One is that, for area integrals in a plane, the relation Eq. (13.8.3) here reduces to the formula familiar from
elementary calculus by which areas (two-dimensional) are routinely evaluated by one-dimensional integrals. The other result is that the phase space area enclosed is independent of time. Because this system is simple enough to be analytically solvable, the constancy of this area is no surprise, but for more general systems this is an important result.
One visualizes any particular mechanical system as one of a cloud of noninteracting systems, each one represented by one point on the surface Γ of Eq. (13.8.3). Such a distribution of particles can be represented by a surface number density, which we may as well regard as uniform, since Γ can be taken to be arbitrarily small. (For a relative integral invariant, there would be no useful similar picture.) As time increases, the systems move, always staying in the region Γ(t) internal to the curve γ(t) formed by the systems that were originally on the curve γ. (It might be thought that points in the interior could in time change places with points originally on γ, but that would require phase space trajectories to cross, which is not allowed.) Consider systems close to a reference system that is initially in configuration z(0) and later at z(t), and let Δz(t) be the time-varying displacement of a general system relative to the reference system. By analogy with Eq. (13.4.6), the evolution can be represented by
Δz(t) = M(t) Δz(0).   (13.8.6)
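For the simple harmonic oscillator with m = k = 1, the evolution matrix of Eq. (13.8.6) is a phase-space rotation, and its symplecticity (preservation of the skew-scalar product) can be checked directly. A sketch, with arbitrarily assumed numerical values:

```python
import numpy as np

# For the oscillator with m = k = 1 the linearized flow over time t is
# M(t) = [[cos t, sin t], [-sin t, cos t]].  The skew-scalar product
# [za, zb] = za^T S zb, S = [[0, 1], [-1, 0]], should be preserved,
# i.e. M^T S M = S (M is symplectic).  Values of t, za, zb are assumed.
t = 0.83
M = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
assert np.allclose(M.T @ S @ M, S)
za, zb = np.array([1.0, 0.3]), np.array([-0.4, 2.0])
assert np.isclose(za @ S @ zb, (M @ za) @ S @ (M @ zb))
```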
We can now use the result derived in Section 13.4.2. As defined in Eq. (13.4.2), the skew-scalar product [z_a(t), z_b(t)] formed from two systems evolving according to Eq. (13.8.6) is the quantity R.I.I. discussed in the previous section. To be consistent with its invariance, the matrix M(t) has to be symplectic.

13.8.1. Liouville's Theorem and Generalizations
In Sections 4.2.2 and 4.2.3, the geometry of bivectors and multivectors was discussed. This discussion, including Fig. 4.2.2, can be carried over to the geometry of the canonical two-form. Consider a two-rowed matrix

( z_a¹  z_a²  ⋯  z_a^{2n} )
( z_b¹  z_b²  ⋯  z_b^{2n} ),   (13.8.7)

whose elements are the elements of phase space vectors z_a and z_b. By picking two columns at a time from this matrix and evaluating the determinants, one forms the elements x^{ij} of a bivector z_a ∧ z_b:

x^{ij} = z_a^i z_b^j − z_a^j z_b^i.   (13.8.8)
By introducing p vectors and arraying their elements in rows, one can form p-index multivectors similarly. As in Eq. (4.2.9), after introducing a metric tensor g^{ij} and using it to produce covariant components x_{ij…k}, one can define an "area" or "volume" (as the case may be) V by

V₍p₎² = det | [z_i, z_j] |,  i, j = 1, 2, …, p.   (13.8.9)

We have used the notation of Eq. (13.4.2) to represent the skew-invariant products of phase space vectors. For p = 1, we obtain

V₍₁₎² = det | [z, z] | = 0;   (13.8.10)
like the skew-invariant product of any vector with itself, it vanishes. For p = 2, we obtain

V₍₂₎² = det ( [z₁, z₁]  [z₁, z₂] ; [z₂, z₁]  [z₂, z₂] ) = [z₁, z₂]².   (13.8.11)

As we have seen previously, if the vectors z_i represent (time-varying) system configurations, the elements of the matrix in Eq. (13.8.9), such as

[z_a, z_b] = (q_a¹ p_{b1} − p_{a1} q_b¹) + (q_a² p_{b2} − p_{a2} q_b²) + ⋯,   (13.8.12)
are invariant. (As shown in Fig. 13.3.1, the first term in this series can be interpreted as the area defined by the two vectors after projection onto the q¹, p₁ plane, and similarly for the other terms.) Since its elements are all invariant, it follows that V₍p₎ is also invariant. In Section 4.2.3, this result was called "the Pythagorean relation for areas." One should not overlook the fact that, though the original invariant given by Eq. (13.8.12) is a linear sum of areas, the new invariants given by Eq. (13.8.9) are quadratic sums. The former result is a specifically symplectic feature, while the new invariants result from metric (actually skew-metric in our case) properties. A device to avoid forgetting the distinction is always to attach the adjective Pythagorean to the quadratic sums.
By varying p, we obtain a sequence of invariants. For p = 2 we obtain the original invariant, which we now call V₍₂₎ = [z₁, z₂]. Its (physical) dimensionality is [action] and the dimensionality of V₍₂p₎ is [action]^p. The sequence terminates at p = 2n, since beyond there all multivector components vanish, and for p = 2n, except for sign, all multivector components have the same value. Considering n = 2 as an example, the phase space is four-dimensional and the invariant is

V₍₄₎² = det | [z_i, z_j] |,  i, j = 1, …, 4.   (13.8.13)
If the vectors have been chosen so that the first two lie in the q¹, p₁ plane and the last two lie in the q², p₂ plane, the matrix elements in the upper right and lower left quadrants vanish, and V₍₄₎ is equal to the product of the areas defined by the first, second and third, fourth pairs. This is then the "volume" defined by the four vectors. It is the invariance of this volume that is known as Liouville's theorem. If noninteracting systems, distributed uniformly over a small volume of phase space, are followed as time advances, the volume they populate remains constant. Since their number is also constant, their number density is also constant. Hence one also states Liouville's theorem in the form: the density of particles in phase space is invariant if their evolution is Hamiltonian. Liouville's theorem itself could have been derived more simply, since it follows from the fact that the determinant of a symplectic matrix is 1, but obtaining the other invariants requires the multivector algebra.
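These statements lend themselves to direct numerical verification. The sketch below (with a randomly generated set of four phase space vectors and an assumed block-rotation symplectic matrix M, all chosen for illustration only) checks that every skew product [z_i, z_j], and hence V₍₄₎², is unchanged by the map:

```python
import numpy as np

# Four-dimensional phase space (q1, p1, q2, p2).  The skew product is
# [za, zb] = za^T S zb with the block-diagonal S below, matching the
# linear sum of projected areas in Eq. (13.8.12).
S = np.array([[0.,  1., 0., 0.],
              [-1., 0., 0., 0.],
              [0.,  0., 0., 1.],
              [0.,  0., -1., 0.]])
rng = np.random.default_rng(1)
Z = rng.standard_normal((4, 4))          # columns are z1..z4 (assumed vectors)
G = Z.T @ S @ Z                          # matrix of skew products [zi, zj]

# A simple symplectic M: independent rotations in each (q, p) plane.
def rot(t):
    return np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])

M = np.block([[rot(0.3), np.zeros((2, 2))], [np.zeros((2, 2)), rot(1.1)]])
assert np.allclose(M.T @ S @ M, S)       # confirm M is symplectic
G2 = (M @ Z).T @ S @ (M @ Z)
assert np.allclose(G, G2)                # every [zi, zj] invariant
assert np.isclose(np.linalg.det(G), np.linalg.det(G2))   # V_(4)^2 invariant
```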
BIBLIOGRAPHY

References

1. V. I. Arnold, Mathematical Methods of Classical Mechanics, 2nd ed., Springer-Verlag, New York, 1989.
2. C. Lanczos, The Variational Principles of Mechanics, University of Toronto Press, Toronto, 1949.
PART VI  APPROXIMATE METHODS
Hardly any problems of mechanics (or any other area of physics) are exactly solvable, but many are close to exactly solvable systems, and these constitute the bread and butter of the subject. Here the exactly solvable system will be called the "unperturbed" system, the forces causing the system to be no longer solvable will be called "perturbations," and the actual system of interest will be called the "perturbed" system.
There is a vast literature describing approximation methods and their application to practical problems. In the next chapter the analytic bases of some of these methods will be investigated. In the chapter after that, linear systems are analyzed. Because linear systems can almost always be solved exactly, it may seem artificial to include them among approximate methods. But usually systems are linear only because terms making them nonlinear have been dropped. Furthermore, most nonlinear methods assume that some degree of linear analysis has preceded the application of nonlinear methods. In the final chapter, practical methods of solution and examples will be studied in greater detail.
Most of the early developments in this field came in celestial mechanics, and some of the terminology derives from that source. A valid trajectory of the solvable system is known as the unperturbed "orbit," and the constants specifying the orbit geometry are known as "orbit elements." The method of "variation of constants" is the closest thing there is to a universal approach to analyzing the perturbed motion. At every instant, the dynamic values of the system are matched to the newly-allowed-to-be-variable orbit element values.
14 ANALYTIC BASIS FOR APPROXIMATION

Once equations of motion have been found they can usually be solved by straightforward numerical methods, but numerical results rarely provide much general insight, and it is productive to develop analytic results to the extent possible. Since it is usually believed that the most essential "physics" is Hamiltonian, considerable effort is justified in advancing the analytic formulation to the extent possible without violating Hamiltonian requirements. One must constantly ask, "Is it symplectic?"
In this chapter, the method of canonical transformation will be introduced and then exercised by being applied to nonlinear oscillators. Oscillators of one kind or another are probably the systems most frequently analyzed using classical mechanics. Some, such as relaxation oscillators, are inherently nonsinusoidal, but many exhibit motion that is approximately simple harmonic. Some of the sources of deviation from harmonicity are (usually weak) damping, restoring forces that violate Hooke's law, and parametric drive. Hamiltonian methods, and in particular phase space representation, are especially effective at treating these systems, and adiabatic invariance, to be defined shortly, is even more important than energy conservation.
14.1. CANONICAL TRANSFORMATIONS

14.1.1. The Action as a Generator of Canonical Transformations

We have encountered the Jacobi method in relation to Hamilton-Jacobi theory while developing analogies between optics and mechanics. However, one may also come upon this procedure in a more formal context while developing the theory of "canonical transformation," which means transforming the equations in such a way that Hamilton's equations remain valid. The motivation for restricting the field of acceptable transformations in this way is provided by the large body of certain knowledge one has about Hamiltonian systems, much of it described in the previous chapter.
From a Hamiltonian system initially described by "old" coordinates q¹, q², …, qⁿ and "old" momenta p₁, p₂, …, pₙ, we seek appropriate transformations

(q¹, q², …, qⁿ; p₁, p₂, …, pₙ) → (Q¹, Q², …, Qⁿ; P₁, P₂, …, Pₙ)   (14.1.1)

to "new coordinates" Q¹, Q², …, Qⁿ and "new momenta" P₁, P₂, …, Pₙ.¹ (Within the Jacobi procedure these would have been known as β-parameters and α-parameters, respectively.) Within Lagrangian mechanics we have seen the importance of variational principles in establishing the invariance to coordinate transformation of the form of the Lagrange equations. Since we have assigned ourselves essentially the same task in Hamiltonian mechanics, it is appropriate to investigate Hamiltonian variational principles. This method will prove to be successful in establishing conditions that must be satisfied by the new Q and P variables. Recall the Poincaré-Cartan integral invariant I.I. defined in Eq. (13.6.6), and from it write the closely related "Hamiltonian variational" line integral H.I.:²

H.I. = ∫_{P₁}^{P₂} (p_i dq^i − H dt).   (14.1.2)

Other than starting at P₁ and ending at P₂ (and not being "pathological"), the path of integration is arbitrary in the extended phase space q^i, p_i, and t. It is necessary, however, for the given Hamiltonian H(q, p, t) to be appropriately evaluated at every point along the path of integration. Here we use the modified symbol H.I. to indicate that p and H are not assumed to have been derived from a solution of the Hamilton-Jacobi equation, as they were in Eq. (13.6.6). In particular, the path of integration is not necessarily a solution path for the system. H.I. has the dimensions of action, and we now subject it to an analysis something like that used in deriving the Lagrange equations from ∫L dt. In particular, we seek the integration path for which H.I. achieves an extreme value. In contrast to the coordinate-velocity space where the principle of extreme action was previously analyzed, consider independent smooth phase space variations (δq, δp) away from an arbitrary integration path through fixed endpoints (P₁, t₁) and (P₂, t₂). (Forgive the fact that P is being used both for momentum components and to label endpoints.)
Evaluating the variations of its two terms individually, the condition for H.I. to achieve an extreme value is

0 = ∫_{t₁}^{t₂} ( δp_i dq^i + p_i d(δq^i) − (∂H/∂q^i) δq^i dt − (∂H/∂p_i) δp_i dt ).   (14.1.3)
¹It would be consistent with the more formally correct mathematical notation introduced previously to use the symbol p̃_i for momentum p_i, since the momenta are more properly thought of as forms, but this is rarely done.
²It was noted in the previous chapter that, when specific coordinates are in use, the differential forms are eventually replaced by the old-fashioned differentials dq^i, and similarly for the other differential forms appearing in the theory. Because we will not be insisting on intrinsic description, we make the replacement from the start.
FIGURE 14.1.1. Areas representing terms δp dq + p d(δq) in the Hamiltonian variational integral.
The last two terms come from H dt just the way two terms came from L dt in the Lagrangian derivation. Where the first two terms come from is illustrated in Fig. 14.1.1. At each point on the unvaried curve, incremental displacements δq(q) and δp(q) locate points on the varied curve. Since the endpoints are fixed, the deviation δp vanishes at the ends and d(δq^i) must average to zero in addition to vanishing at the ends. With a view toward obtaining a common multiplicative factor in the integrand, using the fact that the endpoints are fixed, the factor p_i d(δq^i) can be replaced by −δq^i dp_i, as the difference d(p_i δq^i) is a total differential. Then, since the variations δq^i and δp_i are arbitrary, Hamilton's equations follow:
q̇^i = ∂H/∂p_i,  and  ṗ_i = −∂H/∂q^i.   (14.1.4)
It has therefore been proved that Hamilton's equations are implied by applying the variational principle to the integral H.I. But that has not been our real purpose. Rather, as stated previously, our purpose is to derive canonical transformations. Toward that end we introduce³ an arbitrary function G(q, Q, t) of old coordinates q and new coordinates Q and alter H.I. slightly by subtracting the total derivative dG from its integrand:

H.I.′ = ∫_{P₁}^{P₂} ( p_i dq^i − H dt − (∂G/∂q^i) dq^i − (∂G/∂Q^i) dQ^i − (∂G/∂t) dt ).   (14.1.5)

³Goldstein uses the notation F₁(q, Q, t) for our function G(q, Q, t).
This alteration cannot change the extremal path obtained by applying the same variational principle, since the integral over the added term is independent of path. We could subject H.I.′ to a variational calculation like that applied to H.I., but instead we take advantage of the fact that G is arbitrary to simplify the integrand by imposing on it the condition

p_i = ∂G(q, Q, t)/∂q^i.   (14.1.6)
This simplifies Eq. (14.1.5) to

H.I.′ = ∫_{P₁}^{P₂} ( P_i dQ^i − H′ dt ),   (14.1.7)
where we have introduced the abbreviations

P_i = −∂G(q, Q, t)/∂Q^i,  and  H′(Q, P, t) = H + ∂G/∂t.   (14.1.8)
The former equation, with Eq. (14.1.6), defines the coordinate transformation, and the latter equation gives the Hamiltonian in the new coordinates. The motivation for this choice of transformation is that Eq. (14.1.7) has the same form in the new variables that Eq. (14.1.2) had in the old variables. The equations of motion are therefore
Q̇^i = ∂H′/∂P_i,  and  Ṗ_i = −∂H′/∂Q^i.   (14.1.9)

Since these are Hamilton's equations in the new variables, we have achieved our goal. The function G(q, Q, t) is known as the "generating function" of the canonical transformation defined by Eq. (14.1.6) and the first of Eqs. (14.1.8). The transformations have a kind of hybrid form (and this is an inelegance inherent to the generating function procedure), with G depending as it does on both old and new coordinates. Also, there is still "housekeeping" to be done, expressing the new Hamiltonian H′ in terms of the new variables, and there is no assurance that it will be possible to do this in closed form.
Condition (14.1.6), which has been imposed on the function G, is reminiscent of the formula for p in the Hamilton-Jacobi theory, with G taking the place of the action function S. Though G could have been any function consistent with Eq. (14.1.6), if we conjecture that G is a solution of the Hamilton-Jacobi equation H + ∂G/∂t = 0, we can determine from Eq. (14.1.8) that the new Hamiltonian is given by H′ = 0. Nothing could be better than a vanishing Hamiltonian because, by Eqs. (14.1.9), it implies that the new coordinates and momenta are constants of the motion. Stated conversely, if we had initially assigned ourselves the task of finding coordinates that were constants of the motion, we would have been led to the Hamilton-Jacobi equation as the condition to be applied to generating function G. The other equation defining the canonical transformation is the first of Eqs. (14.1.8):

P_i = −∂G(q, Q, t)/∂Q^i.   (14.1.10)
Without being quite the same, this relation resembles the Jacobi-prescription formula β = ∂S/∂α for extracting the constant of the motion β corresponding to separation constant α in a complete integral of the Hamilton-Jacobi equation. It is certainly true that if G is a complete integral and the P_i are interpreted as the separation constants in that solution, then the quantities defined by Eq. (14.1.10) are constants of the motion. But, relative to the earlier procedure, coordinates and momenta are interchanged. The reason is that the second arguments of G have been taken to be coordinates rather than momenta. We are therefore motivated to subtract the total differential dS(q, P, t)⁴ of an arbitrary function (or rather, for reasons that will become clear immediately, the differential d(S − P_i Q^i)) from the variational integrand:
H.I.′ = ∫_{P₁}^{P₂} ( p_i dq^i − H dt − (∂S/∂q^i) dq^i − (∂S/∂P_i) dP_i − (∂S/∂t) dt + P_i dQ^i + Q^i dP_i ),   (14.1.11)
where we have required

p_i = ∂S(q, P, t)/∂q^i,  Q^i = ∂S(q, P, t)/∂P_i,  and  H′ = H + ∂S/∂t.   (14.1.12)
(It was only with the extra subtraction of d(P_i Q^i) that the required final form was obtained.) We have now reconstructed the entire Jacobi prescription. If S(q, P, t) is a complete integral of the Hamilton-Jacobi equation, with the P_i defined to be the α_i separation constants, then the β_i ≡ Q^i obtained from the second of Eqs. (14.1.12) are constants of the motion.
To recapitulate, a complete integral of the Hamilton-Jacobi equation provides a generator for performing a canonical transformation to new variables for which the Hamiltonian has the simplest conceivable form (it vanishes), causing all coordinates and all momenta to be constants of the motion.
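As a toy illustration of the generating-function machinery (an assumed example, not from the text), the choice G(q, Q) = qQ generates the "exchange" transformation. The sketch below applies Eqs. (14.1.6) and (14.1.8) symbolically and checks that the resulting transformation is canonical:

```python
import sympy as sp

# Assumed toy generating function: G(q, Q) = q*Q (time-independent,
# so H' = H by the second of Eqs. (14.1.8)).
q, Q = sp.symbols('q Q')
G = q * Q
p = sp.diff(G, q)        # Eq. (14.1.6):  p = dG/dq = Q
P = -sp.diff(G, Q)       # first of Eqs. (14.1.8):  P = -dG/dQ = -q
assert p == Q and P == -q
# The transformation is therefore Q = p, P = -q.  Verify it is canonical
# by computing the Poisson bracket {Q, P} with respect to (q, p); for a
# canonical transformation it must equal 1.
qs, ps = sp.symbols('q_s p_s')
Qf, Pf = ps, -qs         # Q and P expressed in terms of (q, p)
pb = (sp.diff(Qf, qs) * sp.diff(Pf, ps)
      - sp.diff(Qf, ps) * sp.diff(Pf, qs))
assert sp.simplify(pb) == 1
```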
14.2. TIME-INDEPENDENT CANONICAL TRANSFORMATION

Just as the Hamilton-Jacobi equation is the short-wavelength limit of the Schrödinger equation, the time-independent Hamilton-Jacobi equation is the same limit of the time-independent Schrödinger equation. As in the quantum case, the methods of treating the two cases appear to be rather different even though time independence is just a special case.
When it does not depend explicitly on time, the Hamiltonian is conserved, H(q, p) = E, and a complete integral of the Hamilton-Jacobi equation takes the form

S(q, P, t) = S₀(q, P) − Et,   (14.2.1)
where the independent parameters are listed as P. The term action, applied to S up to this point, is commonly also used to refer to S₀.⁵ In this case the Hamilton-Jacobi equation becomes

H(q, ∂S₀/∂q) = E,   (14.2.2)
and a complete integral is defined to be a solution of the form

S₀ = S₀(q, P) + const.,   (14.2.3)
with as many new parameters P_i as there are coordinates. It is important to recognize, though, that the energy E can itself be regarded as a Jacobi parameter, in which case the parameter set P is taken to include E. In this time-independent case it is customary to use S₀(q, P) (rather than S(q, P, t)) as the generating function. By the general theory, new variables are then related to old by

p_i = ∂S₀(q, P)/∂q^i,  Q^i = ∂S₀(q, P)/∂P_i.   (14.2.4)
In particular, taking E itself as one of the new momenta, its corresponding new coordinate is

Q^E = ∂S₀(q, P)/∂E,   (14.2.5)
which is nonvanishing because the parameter set P includes E. Defined in this way, Q^E is therefore not constant. The quantity whose constancy is assured by the Jacobi theory is
∂S/∂E = Q^E − t = −t₀ = constant.   (14.2.6)
This shows that Q^E and time t are essentially equivalent, differing at most by the choice of what constitutes the initial time. Eq. (14.2.6) is the basis of the statement that E and t are canonically conjugate variables. Continuing with the canonical transformation, the new Hamiltonian is

H′(Q, P, t) = H + ∂S₀/∂t = E.   (14.2.7)

⁵Goldstein uses the notation W(q, P) for our function S₀(q, P). This function is also known as "Hamilton's characteristic function." The possible basis for this terminology has been discussed earlier in connection with Problem 10.3.3. Landau and Lifshitz call S₀ the "abbreviated action."
We have obtained the superficially curious result that in this simpler, time-independent case the Hamiltonian is less simple, because nonvanishing, than in the time-dependent case. This is due to our use of S_0 rather than S as the generating function. But H' is constant, which is good enough.⁶ We can already test one of the Hamilton equations, namely the equation for Q̇_E,

\dot{Q}_E = \frac{\partial H'}{\partial E} = 1,    (14.2.8)
which is in agreement with Eq. (14.2.6). For the other momenta, not including E, Hamilton's equations are

\dot{P}_i = 0 \qquad\text{and}\qquad \dot{Q}^i = \frac{\partial E}{\partial P_i} = 0.    (14.2.9)
Hence finding a complete integral of the time-independent Hamilton-Jacobi equation is tantamount to having solved the problem.
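As an illustration (a sketch, not from the text; all parameter values are arbitrary choices), the following Python fragment carries out the time-independent Jacobi prescription for a simple harmonic oscillator. The relation t = ∂S_0/∂E reduces the motion to the quadrature t(q) = ∫_0^q dq'/√((2/m)(E − V(q'))), which is evaluated numerically and checked against the known solution q(t) = A sin ωt.

```python
# Sketch (not from the text): Jacobi prescription for the SHO,
# H = p^2/(2m) + (1/2) m w^2 q^2.  All parameter values are arbitrary.
import math

m, w, E = 1.0, 2.0, 0.5
A = math.sqrt(2.0 * E / (m * w * w))     # turning-point amplitude

def t_of_q(q, n=20000):
    """t(q) = integral_0^q dq' / v(q'),  v = sqrt((2/m)(E - V))."""
    total, dq = 0.0, q / n
    for i in range(n):
        qp = (i + 0.5) * dq              # midpoint rule: stays off v = 0
        v = math.sqrt((2.0 / m) * (E - 0.5 * m * w * w * qp * qp))
        total += dq / v
    return total

q = 0.7 * A                              # any point short of the turning point
t = t_of_q(q)                            # time to reach q, starting from q = 0
q_analytic = A * math.sin(w * t)         # should reproduce q
print(q, q_analytic)
```

The quadrature is the entire content of "the problem is solved": once S_0 is known (here only through its E-derivative), the time dependence follows by inversion.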
14.3. ACTION-ANGLE VARIABLES

14.3.1. The Action Variable of a Simple Harmonic Oscillator

Recall that the variation of action S along a true trajectory is given, as in Eq. (11.1.10), by

dS = p_i\,dq^i - H\,dt, \quad\text{or}\quad S(P) = \int_{P_0}^{P} \left(p_i\,dq^i - H\,dt\right).    (14.3.1)
Applying this formula to the simple harmonic oscillator, since the path of integration is a true particle trajectory, H = E, and the second term integrates to -E(t - t_0). Comparing this with Eq. (14.2.1), we obtain for the abbreviated action S_0(q) = \int p_i\,dq^i, or, in one dimension,

S_0(q) = \int_{q_0}^{q} p\,dq'.    (14.3.2)

The word "action" has already been used to define the basic Lagrangian variational integral and as a name for the function satisfying the Hamilton-Jacobi equation, but it now acquires yet another meaning as "1/(2π) times the phase space area enclosed

⁶When applying the Jacobi prescription in the time-independent case, one must be careful not to treat E as functionally dependent on any of the other P_i, though.
422
ANALYTIC BASIS FOR APPROXIMATION
after one cycle." Because this quantity will be used as a dynamic variable, it is called the "action variable" I of the oscillator.⁷ For simple harmonic motion

I = \frac{1}{2\pi}\oint p\,dq = \frac{1}{2\pi}\iint dp\,dq = \frac{1}{2\pi}\,\pi\sqrt{2mE}\,\sqrt{\frac{2E}{m\omega^2}} = \frac{E}{\omega}.    (14.3.3)
The first form of integral here is a line integral along the phase space trajectory, the second is the area in ( q , p) phase space enclosed by that curve. In quantum mechanics, Planck‘s constant h specifies a definite area of phase space, and the number of quantum states is given by d p d q / h. (Reviewing the solution of the Schrodinger equation for a particle in a box would confirm this at least approximately.) Commonly, units are employed for which A = h/(21r) = 1, and in those units the number of states is given by 1d p d q . This is a possible justification for, or at least mnemonic to remember, the factor 1 / ( 2 n )entering the conventional definition of I. This factor will also give the “right” period, namely 231, for the motion expressed in terms of “angle variables” (to be introduced shortly).
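A quick numerical check of Eq. (14.3.3) (a sketch, not from the text; the values of m, ω, and E are arbitrary): parametrize the phase-space ellipse and evaluate I = (1/2π)∮ p dq directly.

```python
# Sketch (not from the text): evaluate I = (1/2pi) oint p dq on the
# phase-space ellipse of the SHO and compare with E/w.  Values arbitrary.
import math

m, w, E = 1.0, 3.0, 2.0
qa = math.sqrt(2 * E / (m * w * w))      # q semi-axis of the ellipse
pa = math.sqrt(2 * m * E)                # p semi-axis

n = 20000
I = 0.0
for k in range(n):
    th = 2 * math.pi * (k + 0.5) / n     # parametrize one full cycle
    p = pa * math.cos(th)
    dq = qa * math.cos(th) * (2 * math.pi / n)
    I += p * dq
I /= 2 * math.pi

print(I, E / w)                          # the two numbers should agree
```

The loop integral is just the ellipse area πq_a p_a divided by 2π, reproducing I = E/ω.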
14.3.2. Adiabatic Invariance of the Action I
Consider a one-dimensional system that is an "oscillator" in the sense that coordinate q returns to its starting point at some time. If the Hamiltonian is time-independent, the energy is conserved, and the momentum p returns to its initial value when q does. In this situation, the phase space trajectory is closed and the action variable I just introduced is unambiguously defined. Suppose however that the Hamiltonian H(q, p, t) and hence the energy E(t) have a weak dependence on time that is indicated by writing

H = H(q, p, λ(t)).    (14.3.4)
The variable λ has been introduced artificially to consolidate whatever time dependence exists into a single parameter for purposes of the following discussion. At any time t the energy E(t) is defined to have the value it would have if λ(t) were held constant at its current instantaneous value. Any nonconstancy of E(t) reflects the time dependence of H. The prototypical example of this sort of time dependency is parametric variation; for example, the "spring constant" k, a "parameter" in simple harmonic motion, might vary slowly with time, k = k(t). Eventually what constitutes "slow" will be made more precise but, much like the short-wavelength approximations previously encountered, the fractional change of frequency during

⁷The terminology is certainly strained since I is usually called the "action variable," in spite of the fact that it is constant, but "variable" does not accompany "action" when describing S_0, which actually does vary. Next we will consider a situation in which I might be expected to vary, but will find (to high accuracy) that it does not. Hence the name "action nonvariable" would be more appropriate. Curiously enough the word "amplitude" in physics suffers from the same ambiguity; in the relation x = a cos ωt it is ambiguous whether the "amplitude" is x or a.
one oscillation period is required to be small. Motion with λ fixed/variable will be called "unperturbed/perturbed." During perturbed motion the particle energy,

E(t) = H(q, p, λ(t)),    (14.3.5)
varies, possibly increasing during some parts of the cycle and decreasing during others, and probably accumulating appreciably over many cycles. We are now interested in the systematic or averaged-over-one-cycle variation of quantities like E(t) and I(t). The "time average" \bar{f}(t) of a variable f(t) that describes some property of a periodic oscillating system having period T is defined to be

\bar{f}(t) = \frac{1}{T}\int_t^{t+T} f(t')\,dt'.    (14.3.6)
From here on we take t = 0. Let us start by estimating the rate of change of E as λ varies. Since λ(t) is assumed to vary slowly and monotonically over many cycles, its average rate of change \overline{dλ/dt} and its instantaneous rate of change dλ/dt differ negligibly, making it unnecessary to distinguish between these two quantities. But the variation of E will tend to be correlated with the instantaneous values of q and p, so Ė can be expected to be above average at some times and below average at others. We seek the time-averaged value \overline{dE/dt}. To a lowest approximation we anticipate \overline{dE/dt} \propto dλ/dt, unless it should happen (which it won't) that \overline{dE/dt} vanishes to this order of approximation. Two features that complicate the present calculation are that the perturbed period T is in general different from the unperturbed period and that the phase space orbit is not in general closed, so its enclosed area is poorly defined. To overcome these problems, the integrals will be recast as integrals over one cycle of coordinate q, since q necessarily returns to its starting value, say q = 0. (We assume q̇(t = 0) ≠ 0.) The action variable
I(E, λ) = \frac{1}{2\pi}\oint p(q, E, λ)\,dq    (14.3.7)
is already written in this form. From Eq. (14.3.5) the instantaneous rate of change of energy is given by

\frac{dE}{dt} = \frac{\partial H}{\partial λ}\,\frac{dλ}{dt},    (14.3.8)

and its time average is therefore given by

\overline{\frac{dE}{dt}} = \frac{dλ}{dt}\,\frac{1}{T}\int_0^T \frac{\partial H}{\partial λ}\,dt.    (14.3.9)

(Because of the assumed slow, monotonic variation of λ(t), it is legitimate to move the dλ/dt factor outside the integral in this way.) To work around the dependence of T on
λ, we recast this expression in terms of phase space line integrals. Using Hamilton's equations we obtain

dt = \frac{dq}{\left.\partial H/\partial p\right|_{q,λ}}, \quad\text{and hence}\quad T = \oint \frac{dq}{\left.\partial H/\partial p\right|_{q,λ}}.    (14.3.10)

This formula was encountered first in part (c) of Problem 1.2.1. Here we must respect the assumed functional form H(q, p, λ) and, to emphasize the point, have indicated explicitly what variables are being held constant for the partial differentiation. (To be consistent, we should have similarly written \left.\partial H/\partial λ\right|_{q,p} in the integrand of Eq. (14.3.9).) Making the same substitution (14.3.10) in the numerator, formula (14.3.9) can be written

\overline{\frac{dE}{dt}} = \frac{dλ}{dt}\,\frac{\oint \left(\left.\partial H/\partial λ\right|_{q,p}\right)\,dq \big/ \left(\left.\partial H/\partial p\right|_{q,λ}\right)}{\oint dq \big/ \left(\left.\partial H/\partial p\right|_{q,λ}\right)}.    (14.3.11)
Because this expression is already proportional to dλ/dt, which is the order to which we are working, it is legitimate to evaluate the two integrals using the unperturbed motion. Terms neglected by this procedure are proportional to dλ/dt and give only contributions of order (dλ/dt)² to \overline{dE/dt}. (This is the sort of maneuver that one always resorts to in perturbation theory.) The unperturbed motion is characterized by functional relation (14.3.5) and its "inverse"

E = H(q, p, λ) \quad\text{and}\quad p = p(q, λ, E), \quad\text{or}\quad E = H(q, p(q, λ, E), λ).    (14.3.12)
From now on, since λ is constant because unperturbed motion is being described, it will be unnecessary to list it among the variables being held fixed during differentiation. Differentiating the third formula with respect to E yields

1 = \frac{\partial H}{\partial p}\,\frac{\partial p}{\partial E}, \quad\text{or}\quad \frac{\partial p}{\partial E} = \frac{1}{\partial H/\partial p},    (14.3.13)
which provides a more convenient form for one of the factors appearing in the integrands of Eq. (14.3.11). Differentiating the third of Eqs. (14.3.12) with respect to λ yields

0 = \frac{\partial H}{\partial λ} + \frac{\partial H}{\partial p}\,\frac{\partial p}{\partial λ}, \quad\text{or}\quad \frac{\partial p}{\partial λ} = -\frac{\partial H/\partial λ}{\partial H/\partial p}.    (14.3.14)
Finally, substituting these expressions into Eq. (14.3.11) yields

\overline{\frac{dE}{dt}} = -\frac{dλ}{dt}\,\frac{1}{T}\oint \left.\frac{\partial p}{\partial λ}\right|_{q,E} dq.    (14.3.15)
As stated previously, the integral is to be performed over the presumed-to-be-known unperturbed motion. We turn next to the similar calculation of \overline{dI/dt}. Differentiating Eq. (14.3.7) with respect to t using Eq. (14.3.8) and the first of Eqs. (14.3.10) yields

\overline{\frac{dI}{dt}} = \frac{1}{2\pi}\oint \left(\frac{\partial p}{\partial E}\,\overline{\frac{dE}{dt}} + \frac{\partial p}{\partial λ}\,\frac{dλ}{dt}\right) dq.    (14.3.16)

From the second of Eqs. (14.3.14), together with Eq. (14.3.15), it can then be seen that

\overline{\frac{dI}{dt}} = 0.    (14.3.17)
Of course this is only approximate, as terms of order (dλ/dt)² have been dropped. Even so, this is one of the most important formulas in mechanics. It is usually stated as the action variable is an adiabatic invariant. That this is not an exact result might be regarded as detracting from its elegance, utility, and importance. In fact the opposite is true since, as we shall see, it is often an extremely accurate result, with accuracy in parts per million not uncommon. This would make it perhaps unique in physics, an approximation that is as good as an exact result, except that the same thing can be said for the whole of Newtonian mechanics. It is still possible for I to vary throughout the cycle, as an example in Section 14.3.4 will show, but its average is constant. There is an important relation between action I and period T (or equivalently frequency ω = 2π/T) of an oscillator. Differentiating the defining equation (14.3.7) for I with respect to E and using Eq. (14.3.13) and Eq. (14.3.10) yields
\frac{\partial I}{\partial E} = \frac{1}{2\pi}\oint \frac{\partial p}{\partial E}\,dq = \frac{T}{2\pi} = \frac{1}{\omega}.    (14.3.18)
This formula can be checked immediately for simple harmonic motion. In Eq. (14.3.3) we had I = E/ω, and hence

\frac{\partial I}{\partial E} = \frac{1}{\omega}.    (14.3.19)
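The relation ∂I/∂E = T/(2π) of Eq. (14.3.18) holds for anharmonic oscillators as well. This hedged sketch (not from the text; parameter values arbitrary) checks it numerically for a plane pendulum, H = p²/(2ml²) + mgl(1 − cos q), comparing a finite-difference derivative of I(E) with the period computed from T = ∮ dq/(∂H/∂p).

```python
# Sketch (not from the text): check dI/dE = T/(2 pi) for the pendulum
# H = p^2/(2 m l^2) + m g l (1 - cos q).  Parameter values arbitrary.
import math

m, g, l = 1.0, 9.8, 1.0

def p_of_q(q, E):
    return math.sqrt(2 * m * l * l * (E - m * g * l * (1 - math.cos(q))))

def q_max(E):
    return math.acos(1 - E / (m * g * l))    # turning point (needs E < 2 m g l)

def action(E, n=40000):
    """I = (1/2pi) oint p dq; by symmetry, 4x the integral over a quarter cycle."""
    qm, s = q_max(E), 0.0
    dq = qm / n
    for i in range(n):
        s += p_of_q((i + 0.5) * dq, E) * dq
    return 4 * s / (2 * math.pi)

def period(E, n=40000):
    """T = oint dq / qdot with qdot = dH/dp = p/(m l^2)."""
    qm, s = q_max(E), 0.0
    dq = qm / n
    for i in range(n):
        s += dq * m * l * l / p_of_q((i + 0.5) * dq, E)
    return 4 * s

E, dE = 3.0, 1e-4
dIdE = (action(E + dE) - action(E - dE)) / (2 * dE)
print(dIdE, period(E) / (2 * math.pi))   # should agree
```

Note that the pendulum period exceeds the small-amplitude value 2π√(l/g), and the slope of I(E) tracks it exactly.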
To recapitulate, we have considered a system with weakly time-dependent Hamiltonian H, with initial energy E_0 determined by initial conditions. Following the continuing evolution of the motion, the energy, because it is not conserved, may have evolved appreciably to a different value E. Accompanying the same evolution, other quantities such as (a priori) action I and oscillation period T also vary. The rates dE/dt, dI/dt, dλ/dt, etc., are all proportional to dλ/dt; doubling dλ/dt doubles
all rates for small dλ/dt. Since these rates are all proportional, it should be possible to find some combination that exhibits a first-order cancellation, and such a quantity is an "adiabatic invariant" that can be expected to vary only weakly as λ is varied. It has been shown that I itself is this adiabatic invariant. In thermodynamics one considers "quasistatic" variations in which a system is treated as static even if it is changing slowly, and this is what we have been doing here, so "quasistatic" invariant would be slightly more apt than "adiabatic," which in thermodynamics means that the system under discussion is isolated in the sense that heat is neither added nor subtracted from the system. But the terminology is not entirely inappropriate, as we are considering the effect of purely mechanical external intervention on the system under discussion; heat is not even contemplated, and certainly neither added nor removed.

There is an important connection between quantized variables in quantum mechanics and the adiabatic invariants of the corresponding classical system. Suppose a quantum system in a state with given quantum numbers is placed in an environment with varying parameters (such as a time-varying magnetic field, for example), but that the variation is never quick enough to induce a transition. Let the external parameters vary through a cycle that ends with the same values as they started with. Since the system has never changed state, it is important that the physical properties of that state should have returned to their starting values, not just approximately, but exactly. That it does this is what distinguishes an adiabatic invariant. This strongly suggests that the dynamical variables whose quantum numbers characterize the stationary states of quantum systems have adiabatic invariants as classical analogs. The Bohr-Sommerfeld atomic theory, which predated slightly the discovery of quantum mechanics, was based on this principle. Though it became immediately obsolete, this theory was not at all ad hoc and hence has little in common with what passes for "the Bohr-Sommerfeld model" in modern sophomore physics courses. In short, the fact that the action is an adiabatic invariant makes it no coincidence that Planck's constant is called "the quantum of action."
14.3.3. Action/Angle Conjugate Variables
Because of its adiabatic invariance, the action variable I is an especially appropriate choice as parameter in applying the Jacobi procedure to a system with slowly varying parameters. We continue to focus on oscillating systems. Recalling the discussion of Section 14.2, we introduce the abbreviated action

S_0(q, I, λ) = \int_{q_0}^{q} p(q', I, λ)\,dq'.    (14.3.20)

Until further notice, λ will be taken as constant, but it will be carried along explicitly in preparation for allowing it to vary later on. Since λ is constant, both E and I are constant, and either can be taken as the Jacobi "momentum" parameter; previously we have taken E, now we take I, which is why the arguments of S_0 have been given
as (q, I, λ). Since holding E fixed and holding I fixed are equivalent,

p(q, I, λ) = p(q, E(I, λ), λ).    (14.3.21)

Being a function of q through the upper limit of its defining equation, S_0(q, I, λ) increases by 2πI as q completes one cycle of oscillation since, as in Eq. (14.3.7),

\oint p\,dq = 2\pi I.    (14.3.22)

Using S_0(q, I, λ), defined by Eq. (14.3.20), as the generator of a canonical transformation, Eqs. (14.2.4) become

p = \frac{\partial S_0}{\partial q}, \qquad \varphi = \frac{\partial S_0}{\partial I},    (14.3.23)

where φ, the new coordinate conjugate to new momentum I, is called an "angle variable." For the procedure presently under discussion to be useful, it is necessary for these equations to be reduced to explicit transformation equations (q, p) → (I, φ), such as Eqs. (14.3.30) of the next section. By Eq. (14.2.7), the new Hamiltonian is equal to the energy (expressed as a function of I)
H'(\varphi, I, λ) = E(I, λ),    (14.3.24)
and Hamilton’s equations are (14.3.25) where Eq. (14.3.18) has been used, and the symbol @ ( I , h ) has been introduced to stand for the oscillator frequency. Integrating the second equation yields (p
= w(Z,A ) ( ? - to).
(14.3.26)
This is the basis for the name "angle" given to φ. It is an angle that advances through 2π as the oscillator advances through one period. In these (q, p) → (φ, I) transformation formulas, λ has appeared simply as a fixed parameter. One way to exploit the concept of adiabatic invariance is now to permit λ to depend on time in a formula such as the second of Eqs. (14.3.25), φ̇ = ω(I, λ(t)). This formula, giving the angular frequency of the oscillator when λ is constant, will continue to be valid with the value of I remaining constant, even if λ varies arbitrarily, as long as its variation over one cycle is negligible when the frequency is being observed. A more powerful way of proceeding is to recognize that it is legitimate to continue using Eqs. (14.3.23) as transformation equations, even if λ varies, provided λ is replaced by λ(t) everywhere it appears. The generating function is then S_0(q, I, λ(t)), and φ will still be called the "angle variable," conjugate to I. Using Eq. (14.1.7), and
taking account of the fact that the old Hamiltonian is now time-dependent, the new Hamiltonian is

H'(\varphi, I, t) = H + \frac{\partial S_0}{\partial t} = E + \frac{\partial S_0}{\partial λ}\,\frac{dλ}{dt}.    (14.3.27)
The new Hamilton equations are
\dot{I} = -\frac{\partial H'}{\partial \varphi} = -\frac{\partial}{\partial \varphi}\left(\frac{\partial S_0}{\partial λ}\right)\dot{λ}, \qquad \dot{\varphi} = \frac{\partial H'}{\partial I} = \omega(I, λ) + \frac{\partial}{\partial I}\left(\frac{\partial S_0}{\partial λ}\right)\dot{λ}.    (14.3.28)
Since no approximations have been made, these are exact equations of motion provided the function S_0 has been derived without approximation.

14.3.4. Parametrically Driven Simple Harmonic Motion

Generalizing the simple harmonic motion analyzed in Section 11.2.7 by allowing the spring constant k(t) ≡ mλ²(t) to be time-dependent, the Hamiltonian is
H(q, p, t) = \frac{p^2}{2m} + \frac{1}{2}mλ^2(t)\,q^2.    (14.3.29)
Though time-dependent, this Hamiltonian represents a linear oscillator because the frequency is independent of amplitude. The time-independent transformations corresponding to Eqs. (14.3.23) can be adapted from Eq. (11.2.36) by substituting ω_0 = λ, E = Iω_0 = Iλ, and ω_0(t - t_0) = φ:
q = \sqrt{\frac{2I}{mλ}}\,\sin\varphi, \qquad p = \sqrt{2Imλ}\,\cos\varphi.    (14.3.30)
The abbreviated action is given by

S_0(q, I, λ) = \int_0^q p\,dq' = I\,(\varphi + \sin\varphi\cos\varphi).    (14.3.31)

The dependence on q is through its presence in the upper limit. This dependence can be rearranged as

λ = \frac{2I}{q^2 m}\,\sin^2\varphi.    (14.3.32)
This can be used to calculate the quantity

\left.\frac{\partial S_0}{\partial λ}\right|_{q,I} = \frac{I}{2λ}\,\sin 2\varphi,    (14.3.33)

which can then be substituted into Eqs. (14.3.28):

\dot{I} = -\frac{\dot{λ}}{λ}\,I\cos 2\varphi, \qquad \dot{\varphi} = λ + \frac{\dot{λ}}{2λ}\,\sin 2\varphi.    (14.3.34)
Here the frequency ω(I, λ) has been calculated as if λ were time-independent; that is, ω(I, λ) = λ. Since in this case the slowly varying parameter has been chosen as λ = ω, one can simply replace λ by ω, eliminating the artificially introduced λ. The first equation shows that dI/dt is not identically zero, but the fact that cos 2φ averages to zero shows that the equation implies that dI/dt averages to zero to the extent that I is constant over one cycle and can therefore be taken outside the averaging. Though this statement may seem a bit circular (if I is constant then I is constant), it shows why I is approximately constant and can be the starting point of an estimate of the accuracy to which this is true. The new Hamiltonian is obtained from Eqs. (14.3.27) and (14.3.33),

H' = I\,ω(t) + \frac{\dot{ω}}{2ω}\,I\sin 2\varphi,    (14.3.35)
where the time dependence is expressed as the dependence on time (but not amplitude) of the "natural frequency" ω(t). The linearity of the oscillator is here reflected by the fact that H' depends linearly on I. Problems below illustrate how this can be exploited to complete the solution in this circumstance. Eq. (14.3.35) can be used to check Eqs. (14.3.34) by substituting into Hamilton's equations, although that is not different from what has already been done. The angle φ has appeared in these equations only in the forms sin φ, cos φ, sin 2φ, cos 2φ. This is not an accident because, although the abbreviated action is augmented by 2πI every period, with this term subtracted the action is necessarily a periodic function of φ. The accumulating part does not contribute to \partial S_0/\partial λ|_{q,I} because I is held constant. It follows that H' is a periodic function of φ with period 2π and can therefore be expanded in a Fourier series with period 2π in variable φ. For the particular system under study this Fourier series has a single term, sin 2φ, augmenting its constant part.
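The within-cycle wiggle and cycle-averaged constancy of I can be seen numerically. This sketch (not from the text; the sweep rate is an arbitrary choice) integrates equations of the form given in (14.3.34), İ = −(ω̇/ω)I cos 2φ and φ̇ = ω + (ω̇/2ω) sin 2φ, for a slow linear frequency sweep; while ω triples, I never strays more than about a percent from its initial value.

```python
# Sketch (not from the text): integrate equations of the form (14.3.34)
# with a slow linear sweep w(t) = 1 + 0.02 t (sweep rate arbitrary).
import math

WDOT = 0.02                               # slow: wdot / w^2 << 1

def rhs(t, I, phi):
    w = 1.0 + WDOT * t
    Idot = -(WDOT / w) * I * math.cos(2 * phi)
    phidot = w + (WDOT / (2 * w)) * math.sin(2 * phi)
    return Idot, phidot

I, phi, t, dt = 1.0, 0.0, 0.0, 0.001      # fixed-step RK4
Imin = Imax = I
while t < 100.0:                          # w sweeps from 1 to 3
    a = rhs(t, I, phi)
    b = rhs(t + dt/2, I + dt/2 * a[0], phi + dt/2 * a[1])
    c = rhs(t + dt/2, I + dt/2 * b[0], phi + dt/2 * b[1])
    d = rhs(t + dt, I + dt * c[0], phi + dt * c[1])
    I += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    phi += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])
    t += dt
    Imin, Imax = min(Imin, I), max(Imax, I)

print(Imin, Imax)                         # I stays close to 1 throughout
```

The instantaneous excursion of I is of order ω̇/(2ω²) per cycle, while the secular drift is second order in the sweep rate, which is the content of adiabatic invariance.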
Problem 14.3.1: Eq. (14.3.30) gives a transformation (q, p) → (I, φ). Derive the inverse transformation (I, φ) → (q, p). Using a result from Section 13.4.2, show that both of these transformations are symplectic.
Problem 14.3.2: Consider a one-dimensional oscillator for which the Hamiltonian expressed in action-angle variables is

H = ω I + ε I^2 \sin^2\varphi,

where ω and ε are constants (with E not allowed to be arbitrarily large). From Hamilton's equations, express the time dependence φ(t) as an indefinite integral and perform the integration. Then express I(t) as an indefinite integral.
Problem 14.3.3: For the system with Hamiltonian given by H(q, p, t) = p²/(2m) + ½mλ²(t)q², as in Eq. (14.3.29), consider the transformation (q, p) → (Q, P) given by

Q = \frac{q}{r(t)}, \qquad P = r(t)\,p - m\dot{r}(t)\,q,    (14.3.36)

where r(t) will be specified more precisely in a later problem but is, for now, an arbitrary function of time. Show that this transformation is symplectic.
Problem 14.3.4: For the same system, in preparation for finding the generating function G(q, Q, t) defined in Eqs. (14.1.6) and (14.1.8), rearrange the transformation equations of the previous problem into the form P = P(q, Q, t) and p = p(q, Q, t). Then find G(q, Q, t) such that

p = \frac{\partial G}{\partial q}, \qquad P = -\frac{\partial G}{\partial Q}.    (14.3.37)
Problem 14.3.5: In preparation for finding the new Hamiltonian H'(Q, P, t) and expressing it (as is obligatory) explicitly in terms of Q and P, invert the same transformation equations into the form q = q(Q, P, t) and p = p(Q, P, t). Then find H'(Q, P, t) and simplify it by assuming that r(t) satisfies the equation

\ddot{r} + λ^2(t)\,r - r^{-3} = 0.    (14.3.38)
Then show that Q is ignorable and hence that P is conserved.
Problem 14.3.6: Assuming that the system studied in the previous series of problems is oscillatory, find its action variable and relate it to the action variable E/ω of simple harmonic motion.
14.4. EXAMPLES OF ADIABATIC INVARIANCE
14.4.1. Variable-Length Pendulum

Consider the variable-length pendulum shown in Fig. 14.4.1. Tension T holds the string, which passes over a frictionless peg, the length of the string below the peg being l(t). Assuming small-amplitude motion, the "oscillatory energy" of the system E_osc is defined so that the potential energy (with the pendulum hanging straight down) plus kinetic energy of the system is -mgl(t) + E_osc. With fixed l,

E_{\text{osc}} = \frac{1}{2}mgl\,θ_{\max}^2.    (14.4.1)
If the pendulum is not swinging, E_osc continues to vanish when the length is varied slowly enough that the vertical kinetic energy can be neglected. We assume the length changes slowly enough that l̇² and l̈ can be neglected throughout. The equation of motion is
\ddot{θ} + \frac{2\dot{l}}{l}\,\dot{θ} + \frac{g}{l}\,\sin θ = 0.    (14.4.2)
For “unperturbed” motion, the second term is neglected, and the (small-amplitude) action is given by (14.4.3) Change dZ in the pendulum length causes change d8,, in maximum angular amplitude. The only real complication in the problem is that the ratio of these quantities depends on 8. The instantaneous string tension is given by mg cos 8 +m1d2 -mi: but we will neglect the last term. The energy change d E,,, for length change dl is equal to the work done -Tdl by the external agent acting on the system less the change in
FIGURE 14.4.1. Variable-length pendulum. The fractional change of length during one oscillation period is less than a few percent.
potential energy:
d E,,, = -(mg cos 0
+ rnld2) dl + mg d l .
(14.4.4)
Continuing to assume small oscillation amplitudes,

\frac{dE_{\text{osc}}}{dl} = \frac{1}{2}mg\,θ^2 - ml\,\dot{θ}^2.    (14.4.5)
The right-hand side can be estimated by averaging over a complete cycle of the unperturbed motion and, for that motion,

\overline{θ^2} = \frac{1}{2}θ_{\max}^2, \qquad \overline{\dot{θ}^2} = \frac{g}{2l}\,θ_{\max}^2.    (14.4.6)

As a result, using Eq. (14.4.1), we have

\overline{\frac{dE_{\text{osc}}}{dl}} = \frac{1}{4}mg\,θ_{\max}^2 - \frac{1}{2}mg\,θ_{\max}^2 = -\frac{E_{\text{osc}}}{2l}.    (14.4.7)

Then from Eq. (14.4.3),

\overline{\frac{dI}{dl}} = \sqrt{\frac{l}{g}}\,\overline{\frac{dE_{\text{osc}}}{dl}} + \frac{E_{\text{osc}}}{2\sqrt{gl}} = 0.    (14.4.8)
Here we have treated both l and E_osc as constant and moved them outside the averages. The result is that I is conserved, in agreement with the general theory.
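The conservation of I = E_osc√(l/g) can be confirmed by direct integration of the small-angle equation of motion (in the form θ̈ + (2l̇/l)θ̇ + (g/l)θ = 0). The sketch below (not from the text; the shortening rate and initial amplitude are arbitrary choices) shows E_osc growing as the string is slowly drawn up while I stays fixed.

```python
# Sketch (not from the text): small-angle variable-length pendulum,
# theta'' + (2 ldot/l) theta' + (g/l) theta = 0, with l(t) slowly shortening.
# Shortening rate and initial amplitude are arbitrary choices.
import math

g = 9.8
def l_of(t): return 1.0 - 0.001 * t
LDOT = -0.001

def deriv(t, th, om):
    l = l_of(t)
    return om, -(2 * LDOT / l) * om - (g / l) * th

def invariant(t, th, om):
    """I = E_osc * sqrt(l/g), per unit mass."""
    l = l_of(t)
    E_osc = 0.5 * l * l * om * om + 0.5 * g * l * th * th
    return E_osc * math.sqrt(l / g)

th, om, t, dt = 0.1, 0.0, 0.0, 0.001      # fixed-step RK4
I0 = invariant(t, th, om)
while t < 300.0:                          # l shrinks from 1.0 to 0.7
    a = deriv(t, th, om)
    b = deriv(t + dt/2, th + dt/2 * a[0], om + dt/2 * a[1])
    c = deriv(t + dt/2, th + dt/2 * b[0], om + dt/2 * b[1])
    d = deriv(t + dt, th + dt * c[0], om + dt * c[1])
    th += dt/6 * (a[0] + 2*b[0] + 2*c[0] + d[0])
    om += dt/6 * (a[1] + 2*b[1] + 2*c[1] + d[1])
    t += dt

print(I0, invariant(t, th, om))           # nearly equal
```

Since E_osc ∝ l^{-1/2} here, the oscillation energy rises by about 20% over the run while the invariant holds to well under a percent.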
14.4.2. Charged Particle in Magnetic Field

Consider a charged particle moving in a uniform magnetic field B(t) that varies slowly enough that the Faraday-law electric field can be neglected and also slowly enough that the adiabatic condition is satisfied. With the coordinate system as defined in Fig. 14.4.2, the vector potential of such a field is

A_x = -\frac{1}{2}yB, \qquad A_y = \frac{1}{2}xB, \qquad A_z = 0,    (14.4.9)
because

\nabla\times\mathbf{A} = \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ A_x & A_y & A_z \end{vmatrix} = B\,\hat{\mathbf{z}}.    (14.4.10)
Introducing cylindrical coordinates, from Eq. (12.6.4) the (nonrelativistic) Lagrangian is
FIGURE 14.4.2. A charged particle moves in a slowly varying, uniform magnetic field.
L = \frac{1}{2}mv^2 + e\,\mathbf{A}\cdot\mathbf{v}.    (14.4.11)
Because this is independent of θ, the conjugate momentum,⁸

P_θ = mr^2\dot{θ} + \frac{1}{2}eB(t)\,r^2,    (14.4.12)
is conserved. With B fixed, and the instantaneous center of rotation chosen as origin, a condition on the unperturbed motion is obtained by equating the centripetal force to the magnetic force:

m\dot{θ} = -eB,    (14.4.13)
with the result that

P_θ = \frac{1}{2}mr^2\dot{θ},    (14.4.14)

and the action variable is

I_θ = \frac{1}{2\pi}\oint P_θ\,dθ = P_θ = \frac{1}{2}mr^2\dot{θ}.    (14.4.15)
⁸Recall that (uppercase) P stands for conjugate momentum, which differs from (lowercase) p, which is the mechanical momentum.
It is useful to express I_θ in terms of quantities that are independent of the origin. Using Eq. (14.4.13),

I_θ = -\frac{m^2}{2e}\,\frac{v_\perp^2}{B},    (14.4.16)

where v_⊥ is the component of particle velocity normal to the magnetic field. To recapitulate, v_⊥²/B is an adiabatic invariant. The important result is not that P_θ is conserved when B is constant, which we already knew, but that it is conserved even when B varies (slowly enough) with time. Furthermore, since the change in B is to be evaluated at the particle's nominal position, changes in B can be due either to changes in time of the external sources of B or to spatial variation of B in conjunction with displacement of the moving particle's center of rotation (for example parallel to B). P_θ is one of the important invariants controlling the trapping of charged particles in a magnetic "bottle." This is pursued in the next section.

14.4.3. Charged Particle in a Magnetic Trap

A particle of charge e moves in a time-independent, axially symmetric magnetic field B(R). Symbolizing the component of particle velocity normal to B by w, the approximate particle motion follows a circle of radius ρ with angular rotation frequency ω_c (known as the "cyclotron frequency"). These quantities are given by

ρ = \frac{mw}{eB} \qquad\text{and}\qquad ω_c = 2\pi\,\frac{w}{2\pi ρ} = \frac{eB}{m},    (14.4.17)
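The adiabatic invariance of v_⊥²/B claimed in Section 14.4.2 can be exercised numerically before continuing. This sketch (not from the text; e/m = 1 and the ramp rate are arbitrary choices) integrates the full Lorentz-force motion in a uniform B(t)ẑ ramped slowly from 1 to 2, including the induced electric field E = −∂A/∂t that actually does the work on the particle; v_⊥² doubles, but v_⊥²/B returns (nearly) to its initial value.

```python
# Sketch (not from the text): charge (e/m = 1, arbitrary) in uniform
# B(t) zhat with B ramped slowly, plus the induced field E = -dA/dt,
# A = B(t) (-y, x)/2.  Check that v_perp^2 / B is adiabatically invariant.
import math

def B_of(t): return 1.0 + 0.002 * t
BDOT = 0.002

def deriv(t, s):
    x, y, vx, vy = s
    B = B_of(t)
    Ex, Ey = 0.5 * BDOT * y, -0.5 * BDOT * x   # E = -dA/dt
    return (vx, vy, Ex + vy * B, Ey - vx * B)  # a = E + v x B (e/m = 1)

s, t, dt = (1.0, 0.0, 0.0, -1.0), 0.0, 0.002   # circular orbit about origin
mu0 = (s[2]**2 + s[3]**2) / B_of(t)
while t < 500.0:                               # B ramps from 1 to 2
    a = deriv(t, s)
    b = deriv(t + dt/2, tuple(si + dt/2 * ki for si, ki in zip(s, a)))
    c = deriv(t + dt/2, tuple(si + dt/2 * ki for si, ki in zip(s, b)))
    d = deriv(t + dt, tuple(si + dt * ki for si, ki in zip(s, c)))
    s = tuple(si + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
              for si, k1, k2, k3, k4 in zip(s, a, b, c, d))
    t += dt

mu1 = (s[2]**2 + s[3]**2) / B_of(t)
print(mu0, mu1)                                # nearly equal
```

In this symmetric gauge P_θ is exactly conserved, which is why the invariance here is so accurate; the small residual reflects the orbit lagging the instantaneous circular orbit.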
with the latter being independent of the speed of the particle. The field is assumed to be nonuniform but not too nonuniform. This is expressed by the condition

\frac{ρ\,|\nabla B|}{B} \ll 1.    (14.4.18)
This condition ensures that formulas derived in the previous section are applicable and that the particle "gyrates" in an almost circular orbit. This is also a kind of adiabatic condition in that the particle retraces pretty much the same trajectory turn after turn. The system is then known as a "magnetic trap"; the sort of magnetic field envisaged is illustrated in Fig. 14.4.3, which also shows typical particle orbits. But in general the particle also has a component of velocity parallel to B so that the center of the circle (henceforth to be known as the "guiding center") also travels along B. This motion is said to be "longitudinal." There will also be an even slower drift of the guiding center "perpendicular" to B. This is due to the fact that condition (14.4.18) is not exactly satisfied and the radius of gyration is least in regions where B = |B| is greatest. To describe these motions we introduce the radius vectors shown in Fig. 14.4.3a:

\mathbf{r} = \mathbf{R} + \boldsymbol{ρ}.    (14.4.19)
FIGURE 14.4.3. (a) Charged particle gyrating in a nonuniform magnetic field ("gyration"). Its longitudinal and azimuthal motion is exhibited in (b) ("plus longitudinal drift") and (c) ("plus perpendicular drift"). The reduction in radius of gyration near the end of the trap is also shown.
The corresponding three velocities v = dr/dt, u = dR/dt, and w = dρ/dt satisfy

\mathbf{v} = \mathbf{u} + \mathbf{w}.    (14.4.20)
Presumably R and |ρ| are slowly varying compared to ρ, which gyrates rapidly. Because particles with large longitudinal velocities can escape out the ends of the bottle (as we shall see), the ones that have not escaped have transverse velocity at least comparable with their longitudinal velocity, and it is clear from condition (14.4.18) that the transverse guiding center drift velocity is small compared to the gyration velocity. These conditions can be expressed as

v_\parallel = u_\parallel \qquad\text{and}\qquad u_\perp \ll w, \qquad\text{and hence}\qquad v_\perp \approx w.    (14.4.21)
General Strategy: To start with, one will ignore the slow motion of the guiding center in analyzing the gyration. (This part of the problem has already been analyzed in Section 14.4.2, but we will repeat the derivation using the current notation and approximations.) Having once calculated the adiabatic invariant μ for this gyration, it will subsequently be possible to ignore the gyration (or rather to represent it entirely by μ) in following the guiding center. This accomplishes a kind of "averaging over the fast motion." It will then turn out that the motion of the guiding center itself can be similarly treated on two time scales. There is an oscillatory motion of the guiding center parallel to the z-axis in which the azimuthal motion is so slow that it can be ignored. This motion is characterized by adiabatic invariant I_∥(μ). As mentioned already, its only dependence on gyration is through μ. Finally, there is a slow azimuthal drift I_⊥(μ, I_∥) that depends on gyration and longitudinal drift only through their adiabatic invariants. As a result of this analysis, at each stage there is a natural time scale defined by the period of oscillation, and this oscillation is described by equations of motion that neglect changes occurring on longer time scales and average over effects that change on shorter time scales.
Gyration: According to Eq. (12.6.3), the components of the canonical momentum are given by

\mathbf{P} = m\mathbf{w} + e\mathbf{A},    (14.4.22)

where approximations (14.4.21) have been used. The nonrelativistic Hamiltonian is

H = \frac{(\mathbf{P} - e\mathbf{A})^2}{2m};    (14.4.23)

this is the mechanical energy expressed in terms of appropriate variables. There is no contribution from a scalar potential because there is no electric field. The gyration can be analyzed as the superposition of sinusoidal oscillations in two mutually perpendicular directions in the transverse plane. For adiabatic invariant I_⊥ we can take their average:

I_\perp = \frac{1}{2}\left(\frac{1}{2\pi}\oint P_x\,dx + \frac{1}{2\pi}\oint P_y\,dy\right) = \frac{1}{4\pi}\oint \mathbf{P}\cdot d\mathbf{l},    (14.4.24)

where dl is incremental tangential displacement in the (x, y) plane, "right-handed" with the (x, y, z) axes being right-handed. It is therefore directed opposite to the direction of gyration as shown in Fig. 14.4.2 since B is directed along the (local) positive z-axis. Using Eq. (14.4.22), we have
I_\perp = -\frac{mwρ}{2} + \frac{e}{4\pi}\iint \mathbf{B}\cdot d\mathbf{S},    (14.4.25)

where dS is an incremental area in the plane of gyration. The first term is negative because the gyration is directed opposite to dl. The second term (in particular its positive sign) has been obtained using Stokes's theorem and B = ∇ × A. Using Eq. (14.4.17), we get

I_\perp = -\frac{eBρ^2}{4}.    (14.4.26)
This agrees with Eq. (14.4.16). I_⊥ can be compared to the "magnetic moment" μ = (e²/2m)Bρ² of the orbit (which is equal to the average circulating current eω_c/(2π) multiplied by the orbit area πρ²). Except for a constant factor, μ and I_⊥ are identical, so we can take μ as the adiabatic invariant from here on. If we regard μ as a vector perpendicular to the plane of gyration, then

\boldsymbol{μ}\cdot\mathbf{B} = -μ B.    (14.4.27)
We also note that the kinetic energy of motion in the perpendicular plane is given by

W_\perp = \frac{1}{2}mw^2 = μB.    (14.4.28)

Longitudinal Drift of the Guiding Center: Because of its longitudinal velocity, the particle will drift along the local field line. Since the field is nonuniform, this will lead it into a region where B is different. Because the factor Bρ² remains constant, we have ρ ∝ B^{-1/2} and (by Eq. (14.4.17)) w ∝ B^{1/2}. Superficially this seems contradictory, since the speed of a particle cannot change in a pure magnetic field. It has to be that energy is transferred to or from motion in the longitudinal direction. We will first analyze the longitudinal motion on the basis of energy conservation and later analyze it in terms of the equations of motion. The total particle energy is given by
E = μB(\mathbf{R}) + \frac{1}{2}mu_\parallel^2.    (14.4.29)
Since the first term depends only on position R, it can be interpreted as potential energy. It is larger at either end of the trap than in the middle. Since both E and μ are conserved, this equation can be solved for the longitudinal velocity

u_\parallel = \pm\sqrt{\frac{2}{m}\left(E - μB(\mathbf{R})\right)}.    (14.4.30)
In a uniform field, U I I would be constant, but in a spatially variable field it varies slowly. As the particle drifts toward the end of the trap the B field becomes stronger and U I I becomes reduced in magnitude. At some value ZQ, the right-hand side of Eq. (14.4.30) vanishes. This is therefore a “turning point” of the motion, and the guiding center is turned back to drift toward the center and then the other end. Perpetual longitudinal oscillation follows, but the motion may be far from simple harmonic, depending as it does on the detailed shape of B(R)-for example B can be essentially constant over a long central region and then become rapidly larger over a short end region. In any case, an adiabatic invariant 111 for this motion can be calculated (on-axis) by
I_∥ = (1/2π) ∮ m u_∥ dz,    (14.4.31)

where, by symmetry (as in Eq. (14.4.9)), A_z vanishes on-axis. Then the period of oscillation can be calculated using Eq. (14.3.18):
T_∥ = 2π ∂I_∥/∂E.    (14.4.32)

Problem 14.4.1: For the long uniform field magnetic trap with short end regions mentioned in the text, use Eq. (14.4.32) to calculate the period of longitudinal oscillation T_∥, and show that the result is the same as one would obtain from elementary kinematic considerations.
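The turning-point condition above is easy to check numerically. The sketch below assumes a hypothetical mirror-field profile B(z) = B₀(1 + (z/L)²) (an illustration, not from the text) and locates the z₀ at which the right-hand side of Eq. (14.4.30) vanishes by bisection, comparing with the closed-form result for this profile.

```python
import math

# Hypothetical mirror-field profile (an assumption for illustration):
# B(z) = B0 * (1 + (z/L)**2), minimum B0 at the midplane z = 0.
B0, L = 1.0, 2.0
m, mu, E = 1.0, 0.5, 1.0  # mass, magnetic moment mu, total energy

def B(z):
    return B0 * (1.0 + (z / L) ** 2)

def u_par_sq(z):
    # Eq. (14.4.30): u_par**2 = (2/m) * (E - mu * B(z))
    return (2.0 / m) * (E - mu * B(z))

def turning_point(z_lo=0.0, z_hi=100.0):
    # Bisect for the z0 > 0 at which u_par vanishes (the "turning point").
    for _ in range(200):
        z_mid = 0.5 * (z_lo + z_hi)
        if u_par_sq(z_mid) > 0.0:
            z_lo = z_mid
        else:
            z_hi = z_mid
    return 0.5 * (z_lo + z_hi)

z0 = turning_point()
# Closed form for this profile: E = mu*B(z0)  =>  z0 = L*sqrt(E/(mu*B0) - 1)
z0_exact = L * math.sqrt(E / (mu * B0) - 1.0)
```

For this parabolic profile u_∥² is quadratic in z, so the bounce motion is exactly simple harmonic with ω_∥² = 2μB₀/(mL²), consistent with the "perpetual longitudinal oscillation" described above.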
Equation of Motion of the Guiding Center: We have still to study the transverse drift of the guiding center and in the process will corroborate the longitudinal motion inferred purely from energy considerations in the previous paragraph. The equation of motion of the particle is

m d(u + w)/dt = e (u + w) × (B₀ + (ρ·∇)B),    (14.4.33)

which approximates the magnetic field by its value B₀ at the guiding center plus the first term in a Taylor expansion evaluated at the same point. We wish to average this equation over one period of the (rapid) gyration, which is described relative to local axes by

ρ_x = ρ cos θ,  ρ_y = ρ sin θ,  w_x = w sin θ,  w_y = −w cos θ.    (14.4.34)
When Eq. (14.4.33) is averaged with all other factors held fixed, the result is

m du/dt = e u × B + e ⟨(w × (ρ·∇)B)⟩.    (14.4.35)

Terms with an odd number of factors of ρ and w have averaged to zero. The second term evaluates to

e ⟨(w × (ρ·∇)B)⟩ = −(mw²/2B) ∇B = −μ ∇B,    (14.4.36)

where ∇·B = 0 and B_z ≈ B have been used. The equation of motion is therefore

m du/dt = e u × B − ∇(μB).    (14.4.37)
When applied to the longitudinal motion of the guiding center, the final term can be seen to be consistent with our earlier interpretation of μB as a potential energy. Furthermore, the only influence of gyration is through the parameter μ. The magnitude of the magnetic field presumably falls with increasing R. This causes the gyration to be not quite circular, with its radius increased by Δρ when the field is reduced by ΔB:

Δρ/ρ = −(1/2) ΔB/B.    (14.4.38)
Along with the cyclotron frequency ω_c/2π this can be used to estimate the ratio of the transverse drift velocity u_⊥ to w:

u_⊥/w ≈ (ω_c/2π) Δρ / w ≈ (ω_c/2π)(∂B/∂r) ρ² / (wB) = (1/2π)(ρ/B) ∂B/∂r ≈ (1/2π) ρ/R_typ,    (14.4.39)
where R_typ is a length of the order of the transverse dimensions of the apparatus. Since typical values of the cyclotron radius are much less than this, and since u_∥ and w have comparable magnitudes, our estimate shows that

u_⊥ ≪ w,  and  u_⊥ ≪ u_∥.    (14.4.40)
There will nevertheless be a systematic azimuthal motion of the guiding center on a circle of some radius R_⊥ centered on the axis. Let the angular frequency of this motion be ω_⊥. We then have

ω_⊥ ≪ ω_∥ ≪ ω_c.    (14.4.41)
By a calculation just like that by which I_g was calculated, an adiabatic invariant can also be obtained for this perpendicular drift:

I_⊥ = eBR_⊥²/4 − (1/2) m u_⊥ R_⊥ ≈ eR_⊥²B/4.    (14.4.42)

In practical situations the second term is negligible and we conclude that the third adiabatic invariant I_⊥ is proportional to the magnetic flux linked by the guiding center as it makes a complete azimuthal circuit.
*14.5. ACCURACY OF CONSERVATION OF ADIABATIC INVARIANTS⁹

In order to estimate the accuracy with which the action is invariant as a parameter changes, we continue to analyze the oscillator discussed in Section 14.3.4, but with a specific choice of time variation of the natural frequency λ(t), namely

λ(t) = ω₀ + Δω tanh(at).    (14.5.1)

As sketched in Fig. 14.5.1, this function has been carefully tailored to vary smoothly from ω₁ = ω₀ − Δω at −∞ to ω₂ = ω₀ + Δω at +∞, with the main variation occurring over a time interval of order 1/a. The adiabatic condition is

a/ω₁ ≪ 1.    (14.5.2)

⁹This section can be skipped without loss of continuity.
FIGURE 14.5.1. Prototypical adiabatic variation: The natural frequency λ(t) of a parametric oscillator varies from ω₁ at −∞ to ω₂ at +∞, with the time range over which the variation occurs being of order 1/a.
With definite parametric variation given as in Eq. (14.5.1), since the action-angle equations of motion (14.3.34) are exact, solving them will supply an estimate of the accuracy with which I is conserved. The second of Eqs. (14.3.34) yields the deviation of the instantaneous angular frequency from λ(t) during one cycle, but averaged over a cycle this vanishes and the angle variable satisfies

φ(t) ≈ ∫ᵗ λ(t′) dt′.    (14.5.3)
As shown in Fig. 14.5.2, the variable φ increases monotonically with t and at only a slowly changing rate. We will change the integration variable from t to φ shortly. In integrating an oscillatory function modulated by a slowly varying function, the frequency of the oscillation is not critical, so for estimation purposes we accept Eq. (14.5.3) as an equality. Assuming this variation, one can then obtain ΔI = I(+∞) − I(−∞) by solving the first of Eqs. (14.3.35). Substituting from the first of Eqs. (14.3.34) and changing the integration variable from t to φ, we obtain

ΔI ≈ −I ∫ (1/λ²)(dλ/dt) cos 2φ dφ.    (14.5.4)

Here I has been moved outside the integral in anticipation of it being shown to be essentially constant. With λ(t) given by Eq. (14.5.1),

dλ/dt = aΔω / cosh²(at).    (14.5.5)
When substituting this expression into the integral, it is necessary to replace t by t(φ). We will approximate this relation by t = (φ − φ̄)/ω̄, where φ̄ and ω̄ are parameters to be determined by fitting a straight line to the variation shown in Fig. 14.5.2.
FIGURE 14.5.2. Dependence of angle variable φ on time t as the natural frequency of an oscillator is adiabatically varied from ω₁ to ω₂; the asymptotic slopes are ω₁ and ω₂.
The integral becomes
The integrand has been made complex to permit its evaluation using contour integration as shown in Fig. 14.5.3. Because of the e^{2iφ} factor and the well-behaved nature of the remaining integrand factor, there is no contribution from the arc at infinity. Also, the integration path has been deformed to exclude all poles from the interior of the contour.
FIGURE 14.5.3. Contour used in the evaluation of integrals in Eq. (14.5.6). The integrals are dominated by the pole with smallest imaginary part, Im φ₀.
Our purpose is to show that the integral in Eq. (14.5.6) is "small." Because it is difficult to accurately evaluate the contributions of the contour indentations, this demonstration would be hard to carry out if it depended on the cancellation of the two terms, but fortunately the terms are individually small. One can confirm this by looking up the integrals in a table of Fourier transforms, such as Oberhettinger [1]. Alternatively, continuing to follow Landau and Lifshitz, the integral can be estimated by retaining only the dominant pole. Since the contour is closed on the side of the real axis for which the numerator factor e^{2iφ} is a decaying exponential, the integral is dominated by the singularity having the least positive imaginary part; the exponential factor strongly suppresses the relative contribution of the other poles. The first term of Eq. (14.5.6) has a pole at

φ₀ ≈ φ̄ + (iπ/2)(ω̄/a).    (14.5.7)
Since this is just an estimate, the precise value of ω̄ does not matter, but it is approximately the smaller of ω₁ and ω₂. By virtue of the adiabatic condition, the ratio ω̄/a in (14.5.7) is large compared to 1. As a result, apart from the other factors in Eq. (14.5.6), the deviation ΔI acquires the factor

e^{−2 Im φ₀} ≈ e^{−π ω̄/a}.    (14.5.8)
This factor is exponentially small, and the other term of Eq. (14.5.6) gives a similarly small contribution. Roughly speaking, if the rate of change of frequency is appreciably less than, say, 1 cycle out of every 10 cycles, the action I remains essentially constant. For more rapid change it is necessary to calculate more accurately. For slower variation, say 10 times slower, the approximation becomes absurdly good.
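The exponential accuracy can be seen directly by numerically integrating ẍ + λ²(t)x = 0 with a frequency program of the tanh form of Eq. (14.5.1). The sketch below (parameter values are assumptions, not from the text) measures the relative change of the action I = E/λ and checks that halving the sweep rate a shrinks ΔI/I far faster than any power of a.

```python
import math

w0, dw = 1.0, 0.2          # omega_0 and Delta-omega of Eq. (14.5.1)

def lam(t, a):
    return w0 + dw * math.tanh(a * t)

def action_change(a, t_end=40.0, dt=0.002):
    # Integrate x'' + lam(t)**2 x = 0 by RK4 from t = -t_end to +t_end
    # and return the relative change of the action I = E/lam.
    t, x, v = -t_end, 1.0, 0.0
    def deriv(t, x, v):
        return v, -lam(t, a) ** 2 * x
    def action(t, x, v):
        E = 0.5 * (v * v + lam(t, a) ** 2 * x * x)
        return E / lam(t, a)
    I0 = action(t, x, v)
    steps = int(round(2 * t_end / dt))
    for _ in range(steps):
        k1x, k1v = deriv(t, x, v)
        k2x, k2v = deriv(t + dt/2, x + dt/2*k1x, v + dt/2*k1v)
        k3x, k3v = deriv(t + dt/2, x + dt/2*k2x, v + dt/2*k2v)
        k4x, k4v = deriv(t + dt, x + dt*k3x, v + dt*k3v)
        x += dt/6 * (k1x + 2*k2x + 2*k3x + k4x)
        v += dt/6 * (k1v + 2*k2v + 2*k3v + k4v)
        t += dt
    return abs(action(t, x, v) - I0) / I0

dI_fast = action_change(a=1.0)   # estimate ~ exp(-pi*omega_1/a)
dI_slow = action_change(a=0.5)   # much smaller: exponential in 1/a
```

Halving a roughly squares the smallness factor e^{−πω̄/a}, in line with the "absurdly good" accuracy claimed above for slower variation.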
14.6. CONDITIONALLY PERIODIC MOTION

It may be possible to define action-angle variables even in cases where no multiple time scale approximation is applicable [2]. (An example of this is the three-dimensional Kepler satellite problem. This example is not entirely typical, however, because the orbit is closed and as a result all three independent momentum components vary periodically with the same frequency.) In fact, action-angle variables can be defined for any oscillatory multidimensional system for which the (time-independent) Hamilton-Jacobi equation is separable. The basic theorem on which this approach is based is due to Stäckel. In this section, unless stated otherwise, use of the summation convention will be suspended.
14.6.1. Stäckel's Theorem

Let the system Hamiltonian be

H = (1/2) Σᵢ₌₁ⁿ cᵢ pᵢ² + V(q),    (14.6.1)

where the cᵢ are "inverse mass functions" of only the coordinates qⁱ, so pᵢ = q̇ⁱ/cᵢ. The time-independent Hamilton-Jacobi equation, assumed to be separable, is

(1/2) Σᵢ₌₁ⁿ cᵢ (∂S₀/∂qⁱ)² + V(q) = α₁ = E,    (14.6.2)

where the first separation constant α₁ has been taken to be the energy E. Let the complete integral (assumed known) be given by
S₀ = S⁽¹⁾(q¹; α) + S⁽²⁾(q²; α) + ⋯ + S⁽ⁿ⁾(qⁿ; α),    (14.6.3)
where α stands for the full set of separation constants α₁, α₂, …, αₙ, but the individual terms each depend on only one of the qⁱ. Differentiating Eq. (14.6.2) partially with respect to each αⱼ in turn yields

Σᵢ₌₁ⁿ cᵢ uᵢⱼ(qⁱ) = δ₁ⱼ.    (14.6.4)

Because S₀ has the form (14.6.3), the function uᵢⱼ(qⁱ), introduced as an abbreviation for (∂S₀/∂qⁱ)(∂²S₀/∂αⱼ∂qⁱ), is a function only of qⁱ, and the same can be said for all j. Rearranging Eq. (14.6.2) and exploiting the expansion of δ₁ⱼ given by Eq. (14.6.4) yields an expansion for the potential energy

V(q) = Σᵢ₌₁ⁿ cᵢ wᵢ(qⁱ),    (14.6.5)

where the newly introduced functions wᵢ are also functions only of qⁱ. This has shown that separability of the Hamilton-Jacobi equation implies this related separability of V: a superposition, with the same coefficients cᵢ as appear in the kinetic energy, of functions wᵢ that depend only on qⁱ. Substituting back into Eq. (14.6.2) and using Eq. (14.6.4) again, the Hamilton-Jacobi equation can therefore be written as

Σᵢ₌₁ⁿ cᵢ ( (1/2)(dS⁽ⁱ⁾/dqⁱ)² + wᵢ(qⁱ) − Σⱼ₌₁ⁿ αⱼ uᵢⱼ(qⁱ) ) = 0.    (14.6.6)

Defining fᵢ(qⁱ) = 2( Σⱼ₌₁ⁿ αⱼ uᵢⱼ(qⁱ) − wᵢ(qⁱ) ), the individual terms in S₀ must satisfy
(dS⁽ⁱ⁾/dqⁱ)² = fᵢ(qⁱ).    (14.6.7)

Each of these, being a first-order equation, can be reduced to quadratures:

S⁽ⁱ⁾(qⁱ; α) = ∫ sqrt( fᵢ(qⁱ) ) dqⁱ.    (14.6.8)
Then, according to Hamilton-Jacobi theory, the momentum pᵢ is given by

pᵢ = ∂S₀/∂qⁱ = dS⁽ⁱ⁾/dqⁱ = ± sqrt( fᵢ(qⁱ) ).    (14.6.9)

When this equation is squared to yield pᵢ² = fᵢ(qⁱ), it resembles the conservation of energy equation for one-dimensional motion, with fᵢ(qⁱ) taking the place of total energy, which is constant, minus potential energy. The function fᵢ(qⁱ) can, therefore, be called "potential energy-like." The corresponding velocities are given by
q̇ⁱ = cᵢ pᵢ = ± cᵢ(q) sqrt( fᵢ(qⁱ) ),    (14.6.10)

where the second factor depends only on qⁱ but the first depends on the full set of coordinates q.
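As a concrete illustration (not worked in the text), planar motion in a central potential fits the Stäckel form with q¹ = r, q² = θ:

```latex
% Planar central-force motion in Staeckel form (illustrative example).
% H = (1/2) p_r^2 + (1/2r^2) p_theta^2 + V(r):  c_r = 1,  c_theta = 1/r^2.
%
% Choosing u_{r1} = 1,  u_{theta 1} = 0,  u_{r2} = -1/r^2,  u_{theta 2} = 1
% satisfies  sum_i c_i u_{ij} = delta_{1j}  (Eq. (14.6.4)),
% and  w_r = V(r),  w_theta = 0  gives  V = sum_i c_i w_i  (Eq. (14.6.5)).
\begin{align*}
  f_\theta(\theta) &= 2\alpha_2 , \\
  f_r(r) &= 2\Bigl(\alpha_1 - \frac{\alpha_2}{r^2} - V(r)\Bigr), \\
  p_\theta &= \pm\sqrt{2\alpha_2}, \qquad
  p_r = \pm\sqrt{2\Bigl(\alpha_1 - \frac{\alpha_2}{r^2} - V(r)\Bigr)} .
\end{align*}
```

Here α₂ = p_θ²/2 encodes the conserved angular momentum and α₁ = E, recovering the familiar effective-potential description of Section 1.2.5.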
Problem 14.6.1: With a Hamiltonian known to have the form given by Eq. (14.6.1), and the potential energy function V therefore necessarily having the form given by Eq. (14.6.5), write the Lagrange equations for the qⁱ variables; use the same functions uᵢⱼ(qⁱ) and wᵢ(qⁱ) as were used in the proof of Stäckel's theorem. Then show that a matrix with elements vⱼᵢ(q) can be found such that the quantities

Σᵢ₌₁ⁿ vⱼᵢ(q) ( (q̇ⁱ)²/(2cᵢ²) + wᵢ(qⁱ) ),  for j = 1, …, n,    (14.6.11)

are first integrals of the Lagrange equations.

14.6.2. Angle Variables
Equation (14.6.9) is amazingly simple. Its close similarity to the conservation of energy equation in one dimension implies that the motion in each of the (qⁱ, pᵢ) phase space planes is a closed orbit that repeats indefinitely as time goes on. This is illustrated in Fig. 14.6.1. The middle figure shows the function f₁(q¹) for any one of the coordinates (taken to be q¹) and the right figure shows the corresponding q¹, p₁ phase space trajectory. It is easy to overestimate the simplicity of the motion, however, for example by incorrectly assuming that the time taken in traversing this phase space orbit once will be the same for all subsequent traversals.
FIGURE 14.6.1. Sample relationships among phase space and regular space orbits and the "potential energy-like function" in more than one dimension.
In fact, if we use Eq. (14.6.10) to find the period we obtain

T₁ = ∮ dq¹ / ( c₁(q) sqrt( f₁(q¹) ) ).    (14.6.12)

Since the integrand depends on all the coordinates, the period in any one qⁱ, pᵢ plane depends on the motion in all the others. The sort of motion that is consistent with this is shown in the left-most portion of Fig. 14.6.1, which shows the motion in the q¹, q² plane. With the q¹ motion limited to the range a¹ to b¹, and q² limited to a² to b², the system has to stay inside the rectangle shown. The motion shown is started with both coordinates at one extreme. But depending on the relative rates of advance of the two coordinates, there are an infinity (of which only two are shown) of possible routes the system trajectory can take. The only simple requirement is that the trajectory always "osculates" the enclosing rectangle as it reaches its limiting values. Each pair (q¹, p₁), (q², p₂), …, (qⁿ, pₙ) lives a kind of private existence in its own phase space, repeatedly following the same trajectory without reference to time. In this respect, the motion resembles the motion of a one-dimensional mechanical system. If the motion is bounded, it takes the form of libration, as illustrated in Fig. 14.6.1, and this can be represented as rotation as in Problem 1.2.3. Because of the Hamilton-Jacobi separability, this representation is especially powerful.
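The "osculation" picture can be sketched numerically. Assuming two uncoupled librations with an irrational frequency ratio (all parameter values here are hypothetical), each coordinate stays within its own range while the pair never retraces a closed curve and passes arbitrarily close to the corners of the bounding rectangle:

```python
import math

a1, a2 = 1.0, 0.7              # libration bounds: q1 in [-a1, a1], q2 in [-a2, a2]
w1, w2 = 1.0, math.sqrt(2.0)   # irrational ratio => trajectory never closes

ts = [0.01 * k for k in range(200000)]
pts = [(a1 * math.cos(w1 * t), a2 * math.cos(w2 * t)) for t in ts]

# Each coordinate separately respects its own limits ...
bound_ok = all(abs(q1) <= a1 and abs(q2) <= a2 for q1, q2 in pts)
# ... while the trajectory approaches the corner (a1, a2) of the rectangle.
corner_miss = min(math.hypot(q1 - a1, q2 - a2) for q1, q2 in pts)
```

Running longer makes `corner_miss` as small as desired, the numerical counterpart of the trajectory "osculating" the enclosing rectangle.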
In Section 14.3.3, the one-dimensional action function S₀ was used as a generator of canonical transformation. We now do the same thing with S₀ as it is given in Eq. (14.6.3). But first we will replace the Jacobi momenta α by a set I = I₁, I₂, …, Iₙ, which are action variables defined for each of the phase space pairs (qⁱ, pᵢ) that have been under discussion:

Iᵢ = (1/2π) ∮ pᵢ dqⁱ,    (14.6.13)

where the integration is over the corresponding closed phase space orbit. As in the one-dimensional treatment, we express the generating function in terms of these action variables:
S₀ = Σᵢ₌₁ⁿ S⁽ⁱ⁾(qⁱ; I).    (14.6.14)
The "new momenta" are now to be the Iᵢ, and the "new coordinates" will be called φᵢ. The new Hamiltonian H = H(I) must be independent of the φᵢ in order for the first Hamilton equations to yield

İᵢ = −∂H/∂φᵢ = 0,  i = 1, 2, …, n,    (14.6.15)
as must be true since the Iᵢ are constant. The "angle variables" φⱼ are defined by

φⱼ = ∂S₀/∂Iⱼ.    (14.6.16)

The Hamilton equations they satisfy are

φ̇ᵢ = ∂H/∂Iᵢ = ∂E(I)/∂Iᵢ,    (14.6.17)

since H = E. These integrate to

φᵢ = (∂E(I)/∂Iᵢ) t + constant.    (14.6.18)
Though new variables I, φ have been introduced, the original variables qⁱ in terms of which the Hamilton-Jacobi equation is separable are by no means forgotten. In particular, by their definition in Eq. (14.6.13), each Iᵢ is tied to a particular qⁱ, and if that variable is allowed to vary through one complete cycle in its qⁱ, pᵢ plane with the other qʲ held fixed, the corresponding angle change Δφᵢ is given by

Δφᵢ = ∮ (∂²S₀/∂Iᵢ∂qⁱ) dqⁱ = (∂/∂Iᵢ) ∮ pᵢ dqⁱ = 2π.    (14.6.19)
Remember, though, that this variation is entirely formal and does not refer to an actual motion of the entire system. The transformation relations pᵢ = pᵢ(q, I) therefore have a rather special character. It is not as simple as pᵢ depending only on qⁱ, but should all the variables return to their original values, like the phase of a one-dimensional simple harmonic oscillator, φᵢ can only have changed by a multiple of 2π. Stated alternatively, when the angles φᵢ(q, p) are expressed as functions of the original variables q, p, they are not single-valued, but they can change only by integral multiples of 2π when the system returns to its original configuration. For this reason the configuration space is said to be "toroidal," with the toroid dimensionality equal to the number of angle variables and with one circuit around any of the toroid's cross sections corresponding to an increment of 2π in the corresponding angle variable.

The greatest power of this development is to generalize to more than one dimension the analysis of not quite time-independent Hamiltonian systems discussed in Section 14.1.1. If this time dependence is described by allowing a previously constant parameter λ(t) of the Hamiltonian to be a slowly varying function of time, the earlier analysis generalizes routinely to multiple dimensions. The "variation of constants" equations are

İᵢ = −λ̇ ∂/∂φᵢ (∂S₀/∂λ),  φ̇ᵢ = ∂E/∂Iᵢ + λ̇ ∂/∂Iᵢ (∂S₀/∂λ).    (14.6.20)

The strategy for using these equations perturbatively has been explained earlier.

14.6.3. Action/Angle Coordinates for Keplerian Satellites

All this can be illustrated using the Kepler problem. We now pick up the analysis of Kepler orbits where it was left at the end of Section 11.2.8, with Jacobi momenta α₁, α₂, α₃ and coordinates β₁, β₂, β₃ having been introduced and related to coordinates r, θ, φ and momenta p_r, p_θ, p_φ. Substituting from Eq. (11.2.48) into Eq. (14.6.13), we obtain
I_φ = (1/2π) ∮ p_φ dφ = α₃.    (14.6.21)
Similarly,

I_θ = (1/2π) ∮ p_θ dθ = α₂ − α₃,    (14.6.22)

and

I_r = (1/2π) sqrt(−2mE) ∮ sqrt( (r − r₀)(r_π − r) ) dr/r = α₁ − α₂,    (14.6.23)
where the subscripts 0 and π indicate the minimal and maximal radial distances (at the tips of the major axis) and are values of u in the formula (1.2.34) r = a − aε cos u, giving r in terms of the "eccentric anomaly angle" u as shown in Fig. 1.2.10. Transforming the integration variable to u used the relations

sqrt( (r − r₀)(r_π − r) ) = aε sin u  and  dr = aε sin u du,    (14.6.24)

and the integration range is from 0 to 2π. Notice that
I_r + I_θ + I_φ = K sqrt( m/(−2E) ),    (14.6.25)

which can be reordered as

E = − mK² / ( 2 (I_r + I_θ + I_φ)² ).    (14.6.26)

Two immediate inferences can be drawn from this form. According to Eq. (14.6.18) the period of the oscillation of, say, the r variable is

T_r = 2π ( ∂E/∂I_r )⁻¹ = πK sqrt( m/(−2E³) ).    (14.6.27)

The second inference is that T_θ and T_φ, calculated the same way, have the same value. The equality of these periods implies that the motion is periodic and vice versa.
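The period formula T = πK sqrt(m/(−2E³)) can be checked against a direct integration of the equations of motion. The sketch below (units and initial conditions are arbitrary choices) integrates a planar Kepler orbit with V = −K/r by RK4 and compares the time between successive upward crossings of y = 0 with the prediction:

```python
import math

m, K = 1.0, 1.0
x, y, vx, vy = 1.0, 0.0, 0.0, 1.1   # eccentric bound orbit (E < 0)
E = 0.5 * m * (vx**2 + vy**2) - K / math.hypot(x, y)
T_pred = math.pi * K * math.sqrt(m / (-2.0 * E**3))   # Eq. (14.6.27)

def accel(x, y):
    r3 = math.hypot(x, y) ** 3
    return -K * x / (m * r3), -K * y / (m * r3)

def f(s):
    X, Y, VX, VY = s
    AX, AY = accel(X, Y)
    return (VX, VY, AX, AY)

dt, t = 0.0005, 0.0
T_num, y_prev = None, y
while t < 2.0 * T_pred:
    s = (x, y, vx, vy)                    # one RK4 step
    k1 = f(s)
    k2 = f(tuple(s[i] + 0.5*dt*k1[i] for i in range(4)))
    k3 = f(tuple(s[i] + 0.5*dt*k2[i] for i in range(4)))
    k4 = f(tuple(s[i] + dt*k3[i] for i in range(4)))
    x, y, vx, vy = [s[i] + dt/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i])
                    for i in range(4)]
    t += dt
    if y_prev < 0.0 <= y and t > 0.5 * T_pred:   # upward crossing of y = 0
        T_num = t - dt * y / (y - y_prev)        # linear interpolation
        break
    y_prev = y
```

Since the orbit starts at perihelion moving counterclockwise, the first upward crossing of y = 0 occurs after one full revolution, so `T_num` is the orbital period; it agrees with Kepler's third law in the form T = 2π sqrt(a³m/K).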
BIBLIOGRAPHY

References

1. F. Oberhettinger, Tables of Fourier Transforms and Fourier Transforms of Distributions, Springer-Verlag, Berlin, 1990.
2. L. A. Pars, Analytical Dynamics, Ox Bow Press, Woodbridge, CT, 1979.
References for Further Study

Section 14.3.2: L. D. Landau and E. M. Lifshitz, Mechanics, Pergamon, Oxford, 1976.

Section 14.4.3: A. J. Lichtenberg, Phase Space Dynamics of Particles, Wiley, New York, 1969.
15 LINEAR SYSTEMS
Many systems, though time-dependent, are approximately Hamiltonian and approximately linear. Before facing other complications like nonlinearity it is therefore appropriate to find results applicable to systems that are exactly linear, though possibly time-dependent. Many of the important results have already been given in Chapter 13. Here we will study only some of the more important further properties of such systems. Many more are given in the two books listed at the end of the chapter. The book by Yakubovitch and Starzhinskii [1] is praised faintly by the authors of the other book, Meyer and Hall [2], as being "well-written but a little wordy," which I think means "mathematically valid but not concise." This is actually a good combination for a physicist, especially when the fully worked examples are taken into account.
15.1. LINEAR EQUATIONS WITH CONSTANT COEFFICIENTS

Consider two identical interacting particles that are moving more or less parallel to the z-axis (to be called "longitudinal") with their transverse coordinates being (x₁, y₁) and (x₂, y₂) (x to be called "horizontal," y "vertical"). The particles are held in the vicinity of the z-axis by linear restoring forces. The equation governing one of the coordinates, say horizontal, of one of the particles, say particle 1, is

ẍ₁ + p² x₁ + pΣ y₁ + pWi x₂ = R_{x1}(t),    (15.1.1)

where the first term describes pure simple harmonic motion at natural frequency p due to the linear restoring force. The second term describes "cross-plane coupling"
between the two coordinates of the same particle. The magnitude of this coupling is governed by the constant pΣ, with the factor p included for later dimensional convenience. The third term describes the interparticle force. Its strength is specified by pWi, where the p is again included only for dimensional convenience, but the factor i is to be regarded as a dimensionless "external control parameter" that multiplies the constant interaction strength W. The range of possible qualitative effects of the interparticle force can be investigated by varying i from small to large values. All terms described so far are both linear and time-independent. The remaining term R_{x1}, though possibly time-dependent, is assumed to depend only linearly on (x₁, y₁) and (x₂, y₂), at least for sufficiently small excursions of the transverse coordinates. This term is temporarily parked on the right-hand side of the equation (waiting to be discussed in a later section) while the constant terms on the left-hand side are being processed. This is done to illustrate how manipulation of the constant terms can complicate, or at least alter, the time-dependent terms. In summary, Eq. (15.1.1) is linear, though R_{x1} may later be permitted to be nonlinear. These equations are appropriate for describing two bunches of high-energy charged particles (each with current i) in an accelerator, but to be more picturesque, and to avoid getting side-tracked into relativistic irrelevancies, we can think of them as describing the motion of two airplanes or race cars or whatever, flying in close formation or racing approximately side by side. The term pWi x₂ represents the "wake force" the first car feels because of the presence of the second car, and so on.¹ Any system of two interacting objects, to some lowest (linearized) approximation, will be described by equations like (15.1.1). Here we are merely using them as typical example equations on which to base the subsequent discussion.
There are similar equations for y₁, x₂, and y₂. These equations are not intended to be completely general. Rather they are simple enough to be tractable while at the same time being general enough to illustrate the problems that typically arise in Hamiltonian or near-Hamiltonian motion. The dimensionality n = 4 is large enough to be far from trivial and to illustrate the most important Hamiltonian properties. Defining matrices

x = (x₁, y₁, x₂, y₂)ᵀ,  R′ = (R_{x1}, R_{y1}, R_{x2}, R_{y2})ᵀ,
P′ = p² 1 + p ( 0 Σ Wi 0 ; Σ 0 0 Wi ; Wi 0 0 Σ ; 0 Wi Σ 0 ),    (15.1.2)

the equations of motion are

( 1 d²/dt² + P′ ) x = R′(t).    (15.1.3)
¹Wake forces are known to give a following car or bike racer something of a "free ride." This is the sort of thing equations like these are useful for; note though that Eq. (15.1.1) describes transverse effects and does not include description of the longitudinal motion. We will make no attempt to obtain practical results in this area.
The constant Σ and the product Wi will be treated as at least potentially "small," again in spite of the fact that the linearity of the equations causes their most essential properties to be independent of these magnitudes. Taking R′ = 0, in the limit Σ → 0 and i → 0, the four coordinates are uncoupled and the motions degenerate to four independent simple harmonic motions. Since all four natural frequencies are equal, we are faced in this limit with degenerate eigenvalues, a situation we prefer to continue to ignore. To avoid the rather sophisticated algebra (Jordan normal forms) that is required to treat this limit in closed form, we will therefore treat the constants Σ and Wi, though small, as nonvanishing. The natural manipulation for Eq. (15.1.3) is to seek a transformation from the original coordinates x = (x₁, y₁, x₂, y₂)ᵀ to new "normal mode components" e = (e₁, e₂, e₃, e₄)ᵀ via a matrix G:
e = G⁻¹x,  x = Ge,  R = G⁻¹R′.    (15.1.4)
With these relations, Eqs. (15.1.3) become
( 1 d²/dt² + P ) e = R,    (15.1.5)
where G is to be chosen so that
P = G⁻¹P′G = diag(p₁², p₂², p₃², p₄²).    (15.1.6)
At this point, continuing to neglect the right-hand side of the equations, the equations are uncoupled and the "eigenmode" frequencies (squared) are the elements along the diagonal of the "potential energy" matrix P. In terms of the new basis vectors, x is given by

x = e₁ ê₁ + e₂ ê₂ + e₃ ê₃ + e₄ ê₄.    (15.1.7)
For these manipulations to be valid, it is not necessary for the off-diagonal constants to be small, but "weak coupling" is an extremely common circumstance and we mainly wish to illustrate the breaking of the degeneracy before advancing to the consideration of time-dependent terms. To be able to determine the matrix G in closed form, we therefore assume Σ ≪ p and Wi ≪ p and find an eigenvector basis in that limit to be

ê₁ = ½(1, 1, 1, 1)ᵀ,  ê₂ = ½(1, −1, 1, −1)ᵀ,  ê₃ = ½(1, 1, −1, −1)ᵀ,  ê₄ = ½(1, −1, −1, 1)ᵀ,    (15.1.8)

where the elements are the x₁, y₁, x₂, y₂ components. This set is not uniquely determined but, once fixed, the transformation matrix that corresponds to it is given
FIGURE 15.1.1. Labeling of the eigenmodes and dependence of eigenvalues on "control parameter" i. The (normalized) mode frequencies are p + Σ + iW, p + Σ − iW, p − Σ + iW, and p − Σ − iW. To break the degeneracy, the unperturbed system is taken at slightly positive i.
by

G = ½ ( 1 1 1 1 ; 1 −1 1 −1 ; 1 1 −1 −1 ; 1 −1 −1 1 )    (15.1.9)
(which is built from the basis vectors (15.1.8) as columns). As a consequence of this choice of basis vectors,

G⁻¹ = Gᵀ = G,    (15.1.10)

and P is given by
P = p diag( p + Σ + iW,  p − Σ + iW,  p + Σ − iW,  p − Σ − iW ).    (15.1.11)
The dependencies of the eigenvalues (diagonal elements) on the parameters are illustrated in Fig. 15.1.1.
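The claimed diagonalization can be verified by direct matrix arithmetic. The sketch below builds P′ from Eq. (15.1.1) and its partner equations (the numerical values of p, Σ, and iW are arbitrary choices) and confirms that the four vectors of Eq. (15.1.8) are exact eigenvectors with the eigenvalues of Eq. (15.1.11):

```python
p, Sig, iW = 1.0, 0.02, 0.005   # natural frequency and couplings (iW = i*W)

# "Potential energy" matrix P' in the (x1, y1, x2, y2) basis.
Pp = [[p*p,   p*Sig, p*iW,  0.0 ],
      [p*Sig, p*p,   0.0,   p*iW],
      [p*iW,  0.0,   p*p,   p*Sig],
      [0.0,   p*iW,  p*Sig, p*p ]]

# Basis vectors of Eq. (15.1.8) (columns of G) and eigenvalues of Eq. (15.1.11).
basis = [(1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1)]
expected = [p*(p + Sig + iW), p*(p - Sig + iW),
            p*(p + Sig - iW), p*(p - Sig - iW)]

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(4)) for r in range(4)]

residuals = []
for v, lam in zip(basis, expected):
    Mv = matvec(Pp, v)
    residuals.append(max(abs(Mv[r] - lam * v[r]) for r in range(4)))
```

The residuals vanish for any parameter values, illustrating that no "weak coupling" assumption is actually needed for this particular P′ to be diagonalized by G.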
*15.2. TREATMENT OF VELOCITY TERMS²

There are at least two ways that Eqs. (15.1.5) could also have acquired "first derivative terms" to become

( 1 d²/dt² + Q d/dt + P ) e = R(t).    (15.2.1)

One way was considered in Problem 1.2.10, where the term in question came from frictional damping forces. (In electronics such terms come from resistors according

²This section can be skipped without loss of continuity.
to Ohm's law.) From experience one knows that "lossy" terms like this will make the system non-Hamiltonian, whether or not this is true in their absence. Another way first derivative terms can enter was encountered while discussing the sun/earth/moon problem using a rotating coordinate system. See, for example, Eq. (8.1.40).
Forces of this second type are known as "gyroscopic." Since these terms entered only through a change of coordinates in a Lagrangian system, they did not make the system non-Hamiltonian. From these examples it is clear one cannot say automatically whether or not the terms Q d/dt destroy the possibility of expressing Eqs. (15.2.1) in Hamiltonian form, and we also do not as yet know under what conditions on the coefficients of P the equations are Hamiltonian. Hamiltonian conditions on Q will next be derived and the (largely successful) attempt will be made to eliminate the Q d/dt terms from the equations. In Eq. (13.4.14), Hamilton's equations were written in matrix form as

dz/dt = −S (∂H/∂z).    (15.2.2)

The notation has been altered a bit to match that in current use, and our present purpose is to find the conditions under which Eqs. (15.2.1) can be written in the form of Eqs. (15.2.2). In this section, H will be assumed to be a quadratic function of the coordinates z (with possibly time-dependent coefficients) so that it yields linear equations. With little loss of generality, coordinates can be used such that the kinetic energy is a sum of squares (with all "mass" coefficients equal to 1), and the Hamiltonian can then be written as

H(z, t) = (1/2) Σᵢ pᵢ² + (1/2) eᵀP(t)e = (1/2) zᵀH(t)z,    (15.2.3)

where

H(t) = ( P(t) 0 ; 0 1 ).    (15.2.4)

By its definition as the matrix of a quadratic form, P(t) is necessarily symmetric; its elements possibly depend on t but not on z. Here the dimensionality is being artificially restricted to n = 4, large enough to demonstrate most of the important results. Since half of the Hamilton equations imply ė = p, the full set can be written as

dz/dt = −S H(t) z.    (15.2.5)
Digression: Yakubovitch and Starzhinskii [1] define a class of differential equation considerably more general than Eq. (15.2.5):

dx/dt = −S̄ H̄(t) x,    (15.2.6)

where S̄ is any antisymmetric matrix with real constant coefficients and H̄(t) is any symmetric matrix with possibly time-dependent real coefficients. They further show (in their Eq. (2.8) on p. 176) that a real transformation matrix T can always be found such that

S̄ = T S Tᵀ.    (15.2.7)

This implies that Eqs. (15.2.6) can be transformed into Eqs. (15.2.2) by the substitutions x = Tz and

H(t) = Tᵀ H̄(t) T.    (15.2.8)
We will not exhibit the proof here since the equivalent transformation will be exhibited explicitly in the examples. Leaving the digression, if we assume that Q is skew-symmetric (which is sometimes also called "antisymmetric") and constant, and furthermore include the time-dependent terms R(t) from Eq. (15.2.1) in P(t) (which must therefore be allowed to be time-dependent), then Eq. (15.2.1) can be written as

( d/dt + Q/2 )² e + ( P(t) − Q²/4 ) e = 0,    (15.2.9)

and this equation can be written in Hamiltonian form with the Hamiltonian given by

H(z, t) = (1/2) pᵀp − (1/2) pᵀQe + (1/2) eᵀ( P(t) − Q²/4 )e,    (15.2.10)

provided the dependent variable is expressed as

z = ( e, p )ᵀ,  with p = ė + Qe/2.    (15.2.11)
This can be confirmed by substituting these expressions into Eq. (15.2.5). In general, the matrix Q can be written as the sum of symmetric and skew-symmetric parts Q = Q_sym + Q_skew. The substitutions leading to Eq. (15.2.9) can be performed whether or not Q is antisymmetric, but one sees that the contribution from Q_sym cancels, and hence the terms Q_sym de/dt in Eq. (15.2.1) cannot have come from the Hamiltonian formalism. On the other hand, this manipulation has shown that the term Q_skew de/dt is consistent with the equations being Hamiltonian. As mentioned above, forces that can be handled this way are known as "gyroscopic."
Many mechanical systems are characterized by weak or vanishing forces of the type Q_sym, and this is the case of greatest importance from the point of view of pure mechanics. The forces are weak because the systems are almost lossless, and these terms are likely to modify only slightly those relationships that must be rigorously satisfied by Hamiltonian systems, such as the requirement that the determinant of the "transfer matrix" must be 1. If these terms are in fact small, they can be studied profitably using perturbation theory; this is developed in the next chapter. Commonly such systems are not effectively studied by brute force numerical integration of their equations of motion; this is because over long periods of time such numerical solutions inevitably deviate gradually from the true motion, and these deviations may violate rigorously assured Hamiltonian properties. If forces of the type Q_sym are not small, they tend to wash out the Hamiltonian features. Such systems tend to "shrink" or "expand" or otherwise behave in non-Hamiltonian ways. Commonly it is effective to attack such systems by numerical approximation and solution of their differential equations. Even if small errors are made in processing the fully Hamiltonian terms, these errors are likely to have negligible effect relative to the effects of non-Hamiltonian terms. It is rarely possible to ignore "gyroscopic" terms coming from Q_skew, and there is commonly little reason for them to be "small." If possible they should therefore be "transformed away," for example as described above or as in the next problem. In the derivation given in this section it has been assumed that Q is a constant matrix, but the following problem obtains essentially the same result for time-dependent Q(t).
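The distinction between the two kinds of velocity term can be demonstrated numerically. In the sketch below (a two-coordinate toy system with P = 1; all parameter values are assumptions), a skew-symmetric Q does no work, since ėᵀQė = 0, so the energy ½|ė|² + ½eᵀPe is conserved, while a symmetric Q of the same magnitude drains it:

```python
def simulate(Q, t_end=20.0, dt=0.001):
    # Integrate e'' + Q e' + e = 0 (P = unit matrix) by RK4 and
    # return the final energy (1/2)|e'|**2 + (1/2)|e|**2.
    e, v = [1.0, 0.0], [0.0, 1.0]
    def f(e, v):
        a = [-(Q[r][0]*v[0] + Q[r][1]*v[1]) - e[r] for r in range(2)]
        return v, a
    for _ in range(int(round(t_end / dt))):
        k1e, k1v = f(e, v)
        e2 = [e[i] + 0.5*dt*k1e[i] for i in range(2)]
        v2 = [v[i] + 0.5*dt*k1v[i] for i in range(2)]
        k2e, k2v = f(e2, v2)
        e3 = [e[i] + 0.5*dt*k2e[i] for i in range(2)]
        v3 = [v[i] + 0.5*dt*k2v[i] for i in range(2)]
        k3e, k3v = f(e3, v3)
        e4 = [e[i] + dt*k3e[i] for i in range(2)]
        v4 = [v[i] + dt*k3v[i] for i in range(2)]
        k4e, k4v = f(e4, v4)
        e = [e[i] + dt/6*(k1e[i] + 2*k2e[i] + 2*k3e[i] + k4e[i]) for i in range(2)]
        v = [v[i] + dt/6*(k1v[i] + 2*k2v[i] + 2*k3v[i] + k4v[i]) for i in range(2)]
    return 0.5*(v[0]**2 + v[1]**2) + 0.5*(e[0]**2 + e[1]**2)

E_gyro  = simulate([[0.0, 0.3], [-0.3, 0.0]])  # skew-symmetric ("gyroscopic")
E_lossy = simulate([[0.3, 0.0], [0.0, 0.3]])   # symmetric ("frictional")
```

The initial energy is 1.0 in both runs; only the gyroscopic system returns it unchanged, illustrating why Q_sym terms are incompatible with the Hamiltonian formalism while Q_skew terms are not.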
Problem 15.2.1: Perform a coordinate transformation of the form (15.2.8),

x = Tz = T ( y, ẏ + Q(t)y/2 )ᵀ,    (15.2.12)

with T given by

T = ( 1 0 ; −Q(t)/2 1 ),    (15.2.13)

to the equations obtained from the Hamiltonian given by

H(z, t) = (1/2) pᵀp − (1/2) pᵀQ(t)y + (1/2) yᵀ( P(t) − Q²(t)/4 )y.    (15.2.14)

Because Q(t) is assumed to be skew-symmetric, H(t) is symmetric. Show that the equations take the form given in Eq. (15.2.6) with

S̄ = ( 0 −1 ; 1 Q(t) ),  H̄(t) = ( P(t) 0 ; 0 1 ).    (15.2.15)

According to the theorem quoted in the text below Eq. (15.2.7), the term coming from Q(t) therefore leaves the system Hamiltonian if Q(t) is a skew-symmetric, possibly time-dependent matrix.
15.3. LINEAR HAMILTONIAN SYSTEMS
We have seen that under fairly general conditions linear Hamiltonian systems can be described by equations of the form (15.2.1), which can be reduced to the form

( 1 d²/dt² + P(t) ) e = 0.    (15.3.1)

All time-dependent terms have been lumped into P(t), and velocity-dependent terms have been transformed away or dropped. Furthermore, as in Eq. (15.2.5), these equations can be written in Hamiltonian form as 2n equations for the unknowns arrayed as a column vector z = (e, ė)ᵀ:
dz/dt = A(t) z,    (15.3.2)

where

A(t) = (   0     1 )
       ( −P(t)   0 ).    (15.3.3)
It is possible to group independent solutions z_1(t), z_2(t), … of Eq. (15.3.2) as the columns of a matrix Z(t) = ( z_1(t)  z_2(t)  ⋯ ), which therefore satisfies

dZ/dt = A(t) Z.    (15.3.4)
The matrix Z can have as many as 2n columns; if it contains 2n independent solutions, it is known as a "fundamental matrix solution" of Eq. (15.3.2). The most important matrix of this form is the "transfer matrix" M(t), formed from the unique set of solutions for which the initial conditions are given by the identity matrix 1:

M(0) = 1.    (15.3.5)

Such transfer matrices were already much employed in Subsection 10.1.4. If the initial conditions to be imposed on a solution of Eq. (15.3.1) are arrayed as a column z(0) of 2n values at t = 0, then the solution can be written as

z(t) = M(t) z(0).    (15.3.6)
For some purposes it is useful to generalize transfer matrix notation to M(t_f, t_i), letting M depend on both an initial time t_i and a final time t_f. Then Eq. (15.3.6) can be manipulated into the form

z(t) = M(t, 0) z(0) = M(t, t′) M(t′, 0) z(0),    (15.3.7)
where t′ is an arbitrary time in the range 0 ≤ t′ ≤ t. This again illustrates that "concatenation" of linear transformations is accomplished by matrix multiplication.
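These properties are easy to exercise numerically. The sketch below (Python with NumPy/SciPy; the modulated-oscillator matrix A(t) is an invented illustration, not an example from the text) builds transfer matrices by integrating dZ/dt = A(t)Z from the identity, then checks the concatenation rule and the unit determinant required of Hamiltonian systems:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical time-dependent coefficient matrix A(t) for one oscillator
# with a slowly modulated frequency; z = (x, xdot)^T.
def A(t):
    return np.array([[0.0, 1.0],
                     [-(1.0 + 0.3 * np.cos(t)), 0.0]])

def transfer_matrix(t_final, t_initial=0.0):
    """Integrate dZ/dt = A(t) Z columnwise from Z(t_initial) = 1."""
    def rhs(t, y):
        return (A(t) @ y.reshape(2, 2)).ravel()
    sol = solve_ivp(rhs, (t_initial, t_final), np.eye(2).ravel(),
                    rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(2, 2)

# Concatenation (15.3.7): M(t, 0) = M(t, t') M(t', 0) for intermediate t'.
M_t0 = transfer_matrix(2.0, 0.0)
M_tp = transfer_matrix(2.0, 1.3) @ transfer_matrix(1.3, 0.0)
assert np.allclose(M_t0, M_tp, atol=1e-6)

# Hamiltonian system: the transfer matrix has determinant 1.
assert abs(np.linalg.det(M_t0) - 1.0) < 1e-6
```

The same columnwise integration works for any dimension 2n; only the reshape changes.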
15.3.1. Inhomogeneous Equations

Commonly Eqs. (15.3.2) are modified by inhomogeneous terms, perhaps due to external forces; these terms can be arrayed as a 2n-element column matrix k(t), and the equations are

dz/dt = A(t) z + k(t).    (15.3.8)

Such terms destroy the linearity (a constant multiple of a solution is in general not a solution), but the transfer matrix can still be used to obtain a solution satisfying initial conditions z(0) at t = 0. The solution is

z(t) = M(t) ( z(0) + ∫₀ᵗ M^{-1}(t′) k(t′) dt′ ).    (15.3.9)

This can be confirmed by direct substitution.
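The confirmation can also be done numerically. In the sketch below (a driven harmonic oscillator chosen for illustration; the particular A and k(t) are assumptions, not from the text), the solution built from the transfer-matrix formula (15.3.9) is compared against direct integration of Eq. (15.3.8):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

# Constant A for simplicity, so M(t) = expm(A t); k(t) a sinusoidal drive.
A = np.array([[0.0, 1.0], [-4.0, 0.0]])     # harmonic oscillator, omega = 2
k = lambda t: np.array([0.0, np.sin(t)])
z0 = np.array([1.0, 0.0])
t_end = 3.0

# Direct numerical integration of dz/dt = A z + k(t).
sol = solve_ivp(lambda t, z: A @ z + k(t), (0, t_end), z0,
                rtol=1e-10, atol=1e-12)
z_direct = sol.y[:, -1]

# Formula (15.3.9): z(t) = M(t) (z(0) + integral of M^{-1}(t') k(t') dt').
M = lambda t: expm(A * t)
def rhs_int(t, y):                 # quadrature done as an auxiliary ODE
    return np.linalg.solve(M(t), k(t))
w = solve_ivp(rhs_int, (0, t_end), np.zeros(2), rtol=1e-10, atol=1e-12)
z_formula = M(t_end) @ (z0 + w.y[:, -1])

assert np.allclose(z_direct, z_formula, atol=1e-5)
```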
15.3.2. Exponentiation, Diagonalization, and Logarithm Formation of Matrices

Suppose the elements of matrix A in Eq. (15.3.2) are constant:

dz/dt = A z.    (15.3.10)

Formally the solution to this equation with initial values z(0) is

z = e^{At} z(0).    (15.3.11)

The factor e^{At} can be regarded as an abbreviation for the power series

e^{At} = 1 + At + (At)²/2! + (At)³/3! + ⋯,    (15.3.12)

in which all terms are well-defined. Then, differentiating term-by-term, Eq. (15.3.10) follows. It is not hard to persuade yourself that these manipulations are valid in spite of the fact that At is a matrix. If independent solutions of the form (15.3.11) are grouped as columns of a matrix Z, the result is

Z = e^{At} Z(0).    (15.3.13)

In particular, for Z(0) = 1 the matrix Z becomes the transfer matrix M, and Eq. (15.3.13) becomes

M = e^{At}.    (15.3.14)
It is similarly possible to define the logarithm of a matrix. Recall that the logarithm of a complex number z = re^{iφ} is multiply defined by

ln( re^{iφ} ) = ln r + iφ + 2πim,    (15.3.15)
where m is any integer. For the logarithm to be an analytic function, it is necessary to restrict its domain of definition. Naturally, the same multiple definition plagues the logarithm of a matrix. To keep track of this it is all but necessary to work with diagonalized matrices. This makes it important to understand their eigenvalue structure, especially because the eigenvalues are in general complex. But for problems that are "physical" the elements of A are real, and this restricts the range of possibilities. Because the eigenvalues are complex, the eigenvectors must be permitted also to have complex elements. There is a way, though, in which the complete generality that this seems to imply is not needed. It is possible to require the basis vectors e_1, e_2, …, to have real components while allowing vectors to have complex expansion coefficients. For example, a complex vector u may be expressible as a_1 e_1 + a_2 e_2 + ⋯, where the coefficients a_i are complex. The complex conjugate of u is then given by

u* = a_1* e_1 + a_2* e_2 + ⋯.

It is not necessarily possible to restrict basis elements to be real in this way if vectors are permitted to have arbitrary complex elements; consider for example a two-dimensional space containing both (1, 1) and (i, 1). But if a vector space is sure to contain u* when it contains u, a real basis can be found. All possible arrangements of the eigenvalues of a symplectic matrix have been illustrated in Fig. 13.4.1. Since the eigenvalues are either real or come in complex conjugate pairs, the complex conjugate of an eigenvector is also an eigenvector. It follows that basis vectors can be restricted to be real (see Meyer and Hall [1, p. 47] or Halmos [3] for further explanation).
Returning to the transfer matrix M, because it is known to be symplectic, according to Eq. (13.4.12) it satisfies

M^T(t) S M(t) = S.

Substituting from Eq. (15.3.14), differentiating this equation with respect to t, and canceling common factors yields the result
A^T S = −S A.
(15.3.16)
A constant matrix A satisfying this relation is said to be "infinitesimally symplectic" or "Hamiltonian." This equation places strong constraints on the elements of A. (They resemble the relations implicit in Eq. (13.4.12) that must be satisfied by any symplectic matrix.) With the elements ordered as in Eq. (13.4.43), so that A = ( A₁ B₁ ; C₁ D₁ ) in n × n blocks, condition (15.3.16) becomes

B₁^T = B₁,    C₁^T = C₁,    A₁^T = −D₁.    (15.3.17)

These conditions reduce to the requirements that B₁ and C₁ be symmetric and A₁^T = −D₁. It is sometimes possible to diagonalize (or block diagonalize) a real Hamiltonian matrix A by a similarity transformation

A′ = R^{-1} A R    (15.3.18)
using a matrix R that is also real, even when the eigenvalues are complex. The general strategy is to simplify the factor AR in this equation by building R from column vectors that are eigenvectors of A. One can consider, one by one, the various possible eigenvalue arrangements illustrated in Fig. 13.4.1.
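The "infinitesimally symplectic" condition and its block form can be spot-checked numerically. The sketch below assumes the block convention S = ( 0 1 ; −1 0 ) (one common choice; the text's ordering is fixed by Eq. (13.4.43)), builds a random A obeying the stated conditions (B₁, C₁ symmetric, A₁^T = −D₁), and verifies both Eq. (15.3.16) and the symplecticity of e^{At}:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 2
I = np.eye(n)
S = np.block([[np.zeros((n, n)), I], [-I, np.zeros((n, n))]])  # assumed convention

# Blocks obeying the stated conditions: B, C symmetric, D = -A1^T.
A1 = rng.standard_normal((n, n))
B = rng.standard_normal((n, n)); B = B + B.T
C = rng.standard_normal((n, n)); C = C + C.T
A = np.block([[A1, B], [C, -A1.T]])

# Infinitesimally symplectic ("Hamiltonian") condition (15.3.16).
assert np.allclose(A.T @ S, -S @ A)

# Its exponential is then symplectic: M^T S M = S.
M = expm(0.5 * A)
assert np.allclose(M.T @ S @ M, S, atol=1e-10)
```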
Real, Reciprocal Eigenvalues: If λ = e^α and 1/λ = e^{−α} are eigenvalues of e^A, then α and −α are eigenvalues of A. Taking A to be a 2 × 2 matrix, let the eigenvectors be x₋ = (x₋, p₋)^T and x₊ = (x₊, p₊)^T; they satisfy

A x₋ = −α x₋,    and    A x₊ = α x₊.    (15.3.19)

Let the symplectic product (defined in Eq. (13.4.44)) of the two eigenvectors be

[−, +] = x₋ p₊ − p₋ x₊,    (15.3.20)

and build R from columns given by x₋ and x₊/[−, +]:

R = ( x₋   x₊/[−, +] ).    (15.3.21)

Direct calculation shows that

A′ = ( −α   0 )
     (  0   α ),    (15.3.22)

as required.
Pure Imaginary, Complex-Conjugate Pairs: Consider the pair of eigenvalues ±iμ, with the first eigenvector being x = u + iv, where u and v are independent and both real. Since we have both

A x = iμ x,    and    A x* = −iμ x*,    (15.3.23)

it follows that

A u = −μ v,    and    A v = μ u.    (15.3.24)

If the symplectic product [u, v] is positive, build R according to

R = ( u/√[u, v]   v/√[u, v] ).    (15.3.25)

Direct calculation shows that

A′ = (  0   μ )
     ( −μ   0 ),    (15.3.26)

as required. If necessary to make it positive, change the sign of the symplectic product before taking the square root.
Quartet of Two Complex-Conjugate Pairs: Consider a quartet of complex eigenvalues ±γ ± iδ. According to Eq. (15.3.17), a 4 × 4 Hamiltonian matrix reduced to block-diagonal form must have the structure

A′ = ( D₁     0   )
     ( 0    −D₁^T ),    (15.3.27)

and the 2 × 2 real matrix D₁ must have the form

D₁ = ( γ   −δ )
     ( δ    γ )    (15.3.28)

in order for A′ to have the correct overall set of eigenvalues. Meyer and Hall show that the transformation matrix R accomplishing this is real, and the manipulations in Section 13.4.3 explicitly performed an equivalent real diagonalization.
Pure Diagonalization: If one insists on pure diagonalization rather than block diagonalization, it is necessary for the matrix R to have complex elements. This will be the procedure of choice in much of the next chapter because it is so much simpler to work with purely diagonal matrices. Letting the eigenvalues of A be ±iμ and the eigenvector x = u + iv, with u and v both real, Eqs. (15.3.24) are applicable. The symplectic product of x and x* is given by

[x, x*] = −2i [u, v].    (15.3.29)

Build R according to

R = (1/√|[u, v]|) ( x   x* ).    (15.3.30)

Direct calculation shows that

A′ = ( iμ    0  )
     ( 0   −iμ ).    (15.3.31)

After diagonalization in ways such as the examples have illustrated, it is possible to find the logarithm of a matrix by taking the logarithms of the diagonal elements. In most cases the logarithm of a real matrix can be taken to be real.
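The real block-diagonalization of a pure-imaginary pair, Eqs. (15.3.23)-(15.3.26), can be carried out numerically. The sketch below uses the 2 × 2 oscillator matrix A = ( 0 1 ; −μ² 0 ) as an invented example and reproduces the block form ( 0 μ ; −μ 0 ):

```python
import numpy as np

# A 2x2 Hamiltonian matrix with eigenvalues +/- i*mu (harmonic oscillator).
mu = 2.0
A = np.array([[0.0, 1.0], [-mu**2, 0.0]])

# Complex eigenvector for +i*mu: x = u + i v.
lam, V = np.linalg.eig(A)
x = V[:, np.argmin(np.abs(lam - 1j * mu))]
u, v = x.real, x.imag

# Symplectic product [u, v] = u_1 v_2 - u_2 v_1 in the 2x2 case.
s = u[0] * v[1] - u[1] * v[0]
if s < 0:
    v, s = -v, -s          # flip sign so the product is positive

# R built from u and v, each divided by sqrt([u, v])  (cf. 15.3.25).
R = np.column_stack([u, v]) / np.sqrt(s)
Ablock = np.linalg.inv(R) @ A @ R
assert np.allclose(Ablock, np.array([[0.0, mu], [-mu, 0.0]]), atol=1e-10)
```

Note that R here is real even though the eigenvector x is complex, which is the point of the construction.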
15.3.3. Eigensolutions

A well-known approach is to seek solutions of Eq. (15.3.10) in the form

z(t) = e^{λt} a,    (15.3.32)

where λ is a number and a is a column vector to be obtained. Substitution into Eq. (15.3.10) yields

A a = λ a.    (15.3.33)
The possible values of λ and the corresponding vectors a are therefore the eigenvalues and eigenvectors of the matrix A. All eigenvalues and eigenvector elements can be complex. We are to a large extent retracing the mathematics of normal mode description. But the present case is not quite identical to that of Problem 1.2.6 or Section 15.1 because it concerns first-order equations, so complex eigenvalues and eigenvectors are more prominent. For simplicity, we assume the eigenvalues are all distinct, if necessary forcing this to be true as in Section 15.1; a set of 2n independent solutions is therefore

z_i = e^{λ_i t} a_i.    (15.3.34)

Transformation to these eigenvectors as basis vectors proceeds in the well-known way. If this has been done already, the matrix A is diagonal:

A = diag( λ_1, λ_2, …, λ_{2n} ).    (15.3.35)

The characteristics of the eigenvalues of a Hamiltonian matrix A were discussed in the previous section. In particular, if any one of them is not purely imaginary, then either it or its "mate" yields a factor e^{λ_i t} having magnitude greater than 1. By Eq. (15.3.34), the motion would then diverge after a long period of time. Furthermore, this would be true for any (physically realistic) initial conditions, since the initial motion would contain at least a tiny component of the divergent motion. With A in diagonal form as in (15.3.35), one easily derives the "Liouville formula"

det e^{At} = e^{t tr A},    (15.3.36)

and this result can be manipulated to derive the same formula whether or not A is diagonal.
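The Liouville formula is easy to confirm for a non-diagonal A; a random matrix suffices, and a traceless A (every Hamiltonian matrix is traceless, since it is similar to −A^T) then gives det M = 1:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
t = 0.8

# Liouville formula (15.3.36): det(e^{At}) = e^{t tr A}, diagonal or not.
assert np.isclose(np.linalg.det(expm(A * t)), np.exp(t * np.trace(A)))

# Any traceless A (in particular any Hamiltonian one) gives det M = 1.
A0 = A - (np.trace(A) / 4) * np.eye(4)
assert np.isclose(np.linalg.det(expm(A0 * t)), 1.0)
```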
15.3.4. Eigenvectors of a Linear Hamiltonian System

In the next chapter, we will need the explicit eigenvectors of the Hamiltonian matrix. Defining them now will serve as an example illustrating the preceding formulas and those of Section 2.6. Consider the matrix A appearing in Eq. (15.3.10). Changing its symbol to C (for convenience in the next chapter) and letting its eigenvectors be partitioned as c = (a, b)^T, using Eq. (15.3.3), its eigenequation is

(  0   1 ) ( a )       ( a )
( −P   0 ) ( b )  =  λ ( b ).    (15.3.37)

This equation requires

b = λ a.    (15.3.38)
Assuming the coordinates have been chosen to diagonalize the matrix P as in Eq. (15.1.11), the eigenequation reduces to

P a = −λ² a.    (15.3.39)

As a result, the eigenvalues of C are

λ_{±j} = ±iμ_j,    (15.3.40)

and the corresponding eigenvectors are

|±j) = ( a_j ; ±iμ_j a_j ),    (15.3.41)

where

a_j = e_j / √(2μ_j),    (15.3.42)

with e_j the jth unit vector. Later it will be useful to abbreviate the |±j) also into upper and lower partitions as

|±j) = ( a_j ; b_{±j} ),    b_{±j} = ±iμ_j a_j.    (15.3.43)

In labeling the | )'s we have used the abbreviation λ_{±j} ≡ ±iμ_j, where the μ_j are assumed to be real and positive. The normalizing coefficients will be justified shortly. Using Eq. (13.4.10), the symplectic products of these basis vectors are

[±h, ±j] = a_h^T b_{±j} − b_{±h}^T a_j.    (15.3.44)

Since the + and − signs are confusing, it seems better to allow h to range from −4 to +4 (skipping 0) and then to refer to |±h) simply as |h); and similarly for |j).
Then Eq. (15.3.44) can be written

[h, j] = { −i,   h = −j > 0
         {  i,   h = −j < 0
         {  0,   otherwise.    (15.3.45)
The only nonvanishing products occur for h and j equal in magnitude, opposite in sign. Recall that vectors for which this product vanishes are said to be "in involution." Copying from Eq. (2.6.17), the adjoint eigenequation is

C^T (u|^T = σ (u|^T,    or    (u| C = σ (u|.    (15.3.46)

Letting (u| = ( e^T  f^T ) and substituting for C yields

( e^T   f^T ) (  0   1 )  =  ( −f^T P   e^T )  =  σ ( e^T   f^T ).    (15.3.47)
              ( −P   0 )

This requires e^T = σ f^T and hence

P f = −σ² f.    (15.3.48)
Since this is the same as Eq. (15.3.39), the solution vectors f are the vectors a, but it is important to label them consistently. For σ = iμ_h, arbitrarily set f^T = −i a_h^T and define the symbol (+h| by

(+h| = ( e^T   f^T ) = ( (iμ_h)(−i a_h^T)   −i a_h^T ) = ( μ_h a_h^T   −i a_h^T ).    (15.3.49)
Making consistent choices for the σ = −iμ_h case, we obtain

(−h| = ( μ_h a_h^T   i a_h^T ).    (15.3.50)

Note that (±h| is not the same as |±h)†. (This reflects the fact that they are the eigenvectors of a non-Hermitean matrix.) But the conventions have been such that they differ only by the placement of the factor μ_h. Furthermore, the labeling and normalizations have been chosen so that

(±h | ±j) = ( μ_h a_h^T   ∓i a_h^T ) ( a_j ; ±iμ_j a_j ) = { δ_hj,   signs equal
                                                           { 0,      signs unequal.    (15.3.51)

Again allowing h to range from −4 to +4 (skipping 0), this simplifies to

(h | j) = δ_hj.    (15.3.52)
The most important application of this formula is for the derivation of eigenvector expansions. Suppose an arbitrary vector v can be expressed in the form

v = Σ_j v_j |j),    (15.3.53)
where the v_j are "expansion coefficients." Forming the Hermitean product with (h|, we obtain

(h | v) = Σ_j (h | j) v_j = v_h.    (15.3.54)

Substituting this back into (15.3.53), we obtain

v = Σ_h |h) (h | v).    (15.3.55)
In Euclidean geometry a similar expansion of an arbitrary vector v as a superposition of orthonormal basis vectors is possible. In that case, the expansion coefficients are dot products of v with the basis vectors. In the present case there is no such thing as a scalar product of two vectors belonging to the same space, but there is the usual way of forming a scalar from a (covariant) vector in the dual space with a (contravariant) vector in the original space, and that is what is being done in Eq. (15.3.52). Furthermore, the basis vectors in the dual space have been judiciously chosen to make Eq. (15.3.55) valid. In Eq. (2.4.4), the analogous choice was said to be the natural choice of basis vectors in the dual space. Later we will also have to deal with matrix elements of S with the Hermitean definition of scalar products. Because of the complex conjugation, this will not be quite the same as Eq. (15.3.45). We define the "Hermitean symplectic form" to be (15.3.56). For the basis vectors |h), the elements of this form are (15.3.57). These evaluate to

Y_jh = ±δ_jh = { ±1,   for j = h
              {  0,   for j ≠ h.    (15.3.58)
The upper of the ± signs is to go with j > 0 and/or h > 0; this is adequate specification, since the coefficient vanishes when their signs are opposite. As with Eq. (15.3.45), almost all of these products vanish, but the self-products are nonvanishing. Finally, substituting into Eq. (15.3.32), the functions

z(t) = e^{iμ_h t} |h),    h = ±1, ±2, ±3, ±4,    (15.3.59)

provide a fundamental set of solutions of Eq. (15.3.10).
*15.4. A LAGRANGIAN SET OF SOLUTIONS³

Since there are 2n first-order Hamilton equations, one expects to have to find 2n independent solutions in order to match arbitrary initial conditions. Suppose we have found only half this many independent solutions but know them to be "in involution." Calling these solutions z_i ≡ (x_i, p_i)^T, all their pair-wise symplectic products therefore vanish. An example of such a set is all the solutions listed in Eq. (15.3.59) for which h has the same sign, say positive. That these solutions are in involution is exhibited by Eqs. (15.3.45). The present discussion will be more general, however, in that the Hamiltonian, though linear, will be allowed to be time-dependent. It will now be demonstrated that the known functions can be used to "reduce the problem to quadratures." Build a 2n × n matrix E out of the known solutions:

E = ( z_1   z_2   ⋯   z_n ).    (15.4.1)
Also build the matrix
D = −S E (E^T E)^{-1},    (15.4.2)
(15.4.2)
and from these form the square matrix
P = ( D   E ).
(15.4.3)
This matrix P can be shown to be symplectic by evaluating the product
P^T S P = ( D^T S D   D^T S E )
          ( E^T S D   E^T S E )  =  S.    (15.4.4)
Here the vanishing of E^T S E is assured by the fact that the original set of solutions is in involution, and direct calculation shows that the presence of this factor causes D^T S D = 0 as well. Direct calculation also yields E^T S D = 1. Because P is symplectic, its inverse can be obtained using Eq. (13.4.15):

P^{-1} = (  E^T S )
         ( −D^T S ).    (15.4.5)

The matrix P and its now-known inverse can be used to define a coordinate transformation
z = P w,    and    w = P^{-1} z.    (15.4.6)
³This section, drawn from Meyer and Hall (1992), is more play than work, since it solves a problem based on a situation that is unlikely to arise in practice. But it does provide practice working with nonsquare partitioned matrices and exploiting symplectic relations. Since this material illustrates fundamental properties of symplectic geometry, it more properly belongs in Chapter 13, but it has been deferred to this point to take advantage of the example just worked out in the previous section.
Since z satisfies the equation ż = C z, the equation of motion for w ≡ (u, v)^T is

d/dt ( u )   =  P^{-1} ( C P − Ṗ ) ( u )
     ( v )                         ( v ).    (15.4.7)

The matrix appearing in this equation simplifies remarkably:

P^{-1} ( C P − Ṗ ) = P^{-1} ( C D − Ḋ    C E − Ė ) = (  E^T S (C D − Ḋ)    0 )
                                                     ( −D^T S (C D − Ḋ)   0 ),    (15.4.8)

where the last element vanishes because Ė = C E. Furthermore, the upper elements also vanish. This can be seen by differentiating the relation E^T S D = 1 and applying C^T S = −S C, which the matrix C satisfies, according to Eq. (15.3.16), since it is Hamiltonian. Eq. (15.4.7) therefore reduces to

( u̇ )   (         0          0 ) ( u )
( v̇ ) = ( −D^T S (C D − Ḋ)   0 ) ( v ).    (15.4.9)

Since the upper portion of this equation assures that u = u₀ = constant, the lower equation becomes

v̇ = D^T S ( Ḋ − C D ) u₀.    (15.4.10)
Finally, since the right-hand side consists of known functions of t, v can be obtained by integration. It is common in physics for "integrals of the motion" such as total energy or total angular momentum to be known even though the detailed trajectories are unknown. That such a quantity is constant implies that the system orbit lies in the surface on which the integral of the motion in question has the appropriate fixed value. In the unlikely event that there were 2n − 1 known integrals of the motion, then the actual orbit would be the curve formed by the intersection of all these surfaces. The ultimate purpose of equations of motion is to find detailed orbits matching known initial conditions, so the challenge is to learn how to make use of a known integral of the motion to advance toward that end. Pictorially this is easy since, as mentioned before, the range of possible orbits has been restricted to those lying in the appropriate surface. Unfortunately, it is usually not easy to take advantage of this feature analytically. The conditions that have been assumed in the reduction performed in this section are considerably more favorable than the "knowledge of a few integrals of the motion." In particular, with n solutions known, new coordinates, the first n of which are, say, x_1, x_2, …, x_n, can be tailored to those solutions by assuming that only x_1 varies along the first solution, only x_2 along the second, and so on. The remaining n coordinates, call them ξ_1, ξ_2, …, ξ_n, can then be regarded as constants of the motion, since they do not vary along the known solutions. (This construction is "local," however, and may not necessarily be valid "globally.") In treatments more advanced than the one in this text, it is demonstrated how integrals of motion can be exploited in similar ways. Much as our assumed vectors
have been said to be “in involution,” one defines scalar functions to be “in involution” if all their Poisson brackets vanish. Then one can prove that the presence of k integrals of the motion in involution can be used to reduce the number of degrees of freedom by k and hence the dimensionality by 2k.
15.5. PERIODIC LINEAR SYSTEMS

Suppose the matrix A in Hamilton's Eq. (15.3.10),

dz/dt = A(t) z,    (15.5.1)
though time-dependent, is periodic with period T,

A(T + t) = A(t).    (15.5.2)
This condition does not imply that the solutions of the equation are periodic, but it nevertheless greatly restricts the possible variability of the solutions. Condition (15.5.2) causes the "once-around" or "single-period" transfer matrix M_T,⁴ which is the ordinary transfer matrix M evaluated at t = T,

M(T) ≡ M_T,    (15.5.3)
to have special properties. A single-period transfer matrix can also be defined for times other than t = 0, and this is indicated by assigning t as an argument: M T ( ~ ) . Recall that the columns of M are themselves solutions for a special set of initial conditions, so the columns of MT are the same solutions evaluated at t = T . There are two main ways in which equations containing periodically varying parameters arise. One way is that the physical system being described is itself periodic. Examples are crystal lattices, lattices of periodic electrical or mechanical elements, and circular particle accelerator lattices. For particle accelerators, it is customary to work with a longitudinal coordinate s rather than time t as independent variable, but the same formulas are applicable. The other way periodic systems commonly arise is when one is analyzing the effects of perturbations acting on otherwise-closed orbits. The main theorems satisfied by solutions of Eq. (15.5.1) are due to Floquet and Lyapunov. These theorems are essentially equivalent. Lyapunov’s contributions were 4There is no consistency concerning the names for matrices M,which we call “transfer matrices,” and matrices MT.which we call “single-period transfer matrices.” Some of the names used by mathematicians are scarcely suitable for polite company. The term “monodromy matrix” is commonly applied to MT. Yakubovitch and Starzhinskii [l] refer to M as the “matrizant:’ and Meyer and Hall [2] refer to it as the “fundamental matrix solution satisfying M(0) = 1,” where any matrix 2 satisfying Eq.(15.3.4) is known as a “fundamental matrix solution.” This terminology agrees with that used by Pars. I, however, prefer terminologydrawn from electrical engineering and accelerator physics. Since (it seems to me) these fields make more and better use of the formalism, it seems their notation should be favored. 
The term "once-around" comes from circular storage rings that are necessarily periodic in the present sense of the word. However, "single-period transfer matrix" may be more universally acceptable.
to generalize Floquet's theorem to multiple dimensions and to use it for an especially effective coordinate transformation. For convenience of reference, we will muddle the chronology a bit by regarding the multidimensional feature as being included in Floquet's theorem and only the transformation in Lyapunov's. Both of these theorems are valid whether or not the system is Hamiltonian, but the most interesting questions concern the way that Hamiltonian requirements constrain the motion when "conditions," though changing, return periodically to previous values.
15.5.1. Floquet's Theorem

Substituting t′ = T in Eq. (15.3.7), propagation from t = 0 to t + T can be described by

M(t + T) = M(t + T, T) M(T).    (15.5.4)
Because of periodicity condition Eq. (15.5.2), propagation from t = T is identical to propagation from t = 0, or
M(t + T, T) = M(t).    (15.5.5)
Using definition (15.5.3), it follows then from Eq. (15.5.4) that
M(t + T) = M(t) M_T.    (15.5.6)
This is the essential requirement imposed by the periodicity. Since M(0) = 1, this relation is trivially true for t = 0, and an equivalent way of understanding it is to recall the definitions of the columns of M(t) as solutions of Eq. (15.5.1) satisfying special initial conditions. The corresponding columns on both sides of the equation are clearly the same solutions. According to Eq. (15.3.14), if the coefficient matrix A were constant, the single-period matrix M_T would be related to A by M_T = e^{TA}. Motivated by this equation, and being aware of the considerations of Section 15.3.2, we form a logarithm and call it K:

K = (1/T) ln M_T,    which implies    M_T = e^{TK}.    (15.5.7)
Assuming that the elements of A are real, the elements of M_T will also be real, but the matrix K may be complex. In any case, because the logarithm is not single-valued and because A(t) is time-dependent in general, it is not legitimate to identify K with A. By its definition, the matrix K is "Hamiltonian" in the sense defined below Eq. (15.3.16), but it is almost certainly misleading to read this as implying a direct relationship between K and the system Hamiltonian (assuming there is a Hamiltonian, that is). From K and transfer matrix M(t) we form the matrix

F(t) = M(t) e^{−tK},    (15.5.8)
an equation that we will actually use in the form

M(t) = F(t) e^{tK}.    (15.5.9)
What justifies these manipulations is that F(t) can next be shown to be periodic (obviously with period T). Evaluating (15.5.8) with t → t + T and using condition (15.5.6) and e^{−TK} = M_T^{-1}, we obtain

F(t + T) = M(t) M_T e^{−TK} e^{−tK} = F(t).    (15.5.10)
We have therefore proved Floquet's theorem, which states that transfer matrix M(t) can be written as the product of a periodic function F(t) and the exponential matrix e^{tK}, as in Eq. (15.5.9). By manipulating Eq. (15.3.7) and substituting from Eq. (15.5.9), we obtain a formula for the two-argument transfer matrix:

M(t, t′) = M(t) M^{-1}(t′) = F(t) e^{(t−t′)K} F^{-1}(t′).    (15.5.11)
With the transfer matrix given by Eq. (15.5.9), solutions take the form

z(t) = F(t) e^{tK} z(0).    (15.5.12)
Such a solution is known as "pseudo-harmonic" if K is diagonal and pure imaginary, because the motion can be regarded as simple-harmonic (that is to say, sinusoidally time-varying) but with "amplitude" being "modulated" by the factor F(t). By differentiating Eq. (15.5.8) and rearranging terms, one finds that F(t) satisfies the equation

Ḟ = A F − F K.    (15.5.13)
Since F(t) is known to be periodic, it is necessary to select the particular solution of this equation having this property.

15.5.2. Lyapunov's Theorem

We seek the coordinate transformation z → w that best exploits Floquet's theorem to simplify Eq. (15.5.1). It will now be shown to be
z = F(t)w.
(15.5.14)
There are two ways that ż can be worked out in terms of w. On the one hand,

ż = A z = A F w.    (15.5.15)
On the other hand, differentiating both Eq. (15.5.14) and Eq. (15.5.8) and taking advantage of the fact that the transfer matrix satisfies the same equation as z yields

ż = Ḟ w + F ẇ = ( Ṁ e^{−tK} − M e^{−tK} K ) w + F ẇ
              = ( A M e^{−tK} − M e^{−tK} K ) w + F ẇ.    (15.5.16)
Applying Eq. (15.5.8) again, the first term of this equation can be seen to be the same as the right-hand side of Eq. (15.5.15), and the second term can be simplified. Canceling a common factor, it then follows that w satisfies the equation

ẇ = K w.    (15.5.17)
This result is known as Lyapunov’s theorem. With K being a constant matrix, this constitutes a major improvement over Eq. (15.5.1), whose matrix depended on time.
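Floquet's theorem and the logarithm construction (15.5.7)-(15.5.10) can be verified numerically. The sketch below uses a Mathieu-type oscillator x″ + (1.2 + 0.4 cos t) x = 0 (an invented example with period T = 2π), forms K = (1/T) ln M_T with one branch of the matrix logarithm, and checks that F(t) = M(t) e^{−tK} is indeed T-periodic:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm, logm

T = 2 * np.pi
def A(t):
    return np.array([[0.0, 1.0], [-(1.2 + 0.4 * np.cos(t)), 0.0]])

def M(t):
    """Transfer matrix: integrate dZ/ds = A(s) Z from Z(0) = 1."""
    def rhs(s, y):
        return (A(s) @ y.reshape(2, 2)).ravel()
    sol = solve_ivp(rhs, (0.0, t), np.eye(2).ravel(), rtol=1e-11, atol=1e-13)
    return sol.y[:, -1].reshape(2, 2)

MT = M(T)
K = logm(MT) / T                    # one branch of (1/T) ln M_T
assert np.allclose(expm(T * K), MT, atol=1e-8)

# F(t) = M(t) e^{-tK} is periodic with period T (Floquet's theorem).
F = lambda t: M(t) @ expm(-t * K)
t1 = 1.234
assert np.allclose(F(t1 + T), F(t1), atol=1e-5)
```

The check holds whether or not the motion is stable; stability is instead governed by the eigenvalues of M_T, discussed next.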
15.5.3. Characteristic Multipliers, Characteristic Exponents

An eigenvalue ρ corresponding to eigenvector a of the single-period transfer matrix M_T satisfies

M_T a = ρ a    and    det | M_T − ρ 1 | = 0,    (15.5.18)
and is known as a "characteristic multiplier" of Eq. (15.5.1). If solution z(t) of Eq. (15.5.1) satisfies the initial condition z(0) = a, then its value at time t is M(t)a and, using Eq. (15.5.6), its value at t + T is

z(t + T) = M(t + T) a = M(t) M_T a = ρ M(t) a = ρ z(t).    (15.5.19)
This relation, true for arbitrary t, is the basis for the name characteristic multiplier. It shows that the essential behavior of the solution after a long period of time is controlled by the value of ρ. In particular, the amplitude grows (shrinks) uncontrollably if |ρ| > 1 (|ρ| < 1). The case of greatest interest from our point of view is therefore |ρ| = 1, in which case ρ can be expressed as ρ = e^{iαT} (a factor T has been included for later convenience), where αT is real and can be thought of as an angle in the range from −π to π.⁵ When M_T is expressed in terms of K as in Eq. (15.5.7), Eq. (15.5.18) becomes

e^{TK} a = ρ a,    or    K a = (1/T) (ln ρ) a.    (15.5.20)

This shows that K has the same eigenvectors as M_T and, calling its eigenvalue α,

α = (1/T) ln ρ.    (15.5.21)

Because of the multiple-valued nature of the logarithm this determines α only modulo 2π/T. The α values are known as "characteristic exponents."⁶ This discussion has been rather theoretical in that much has been inferred about the long-term behavior of the system from the eigenvalues of M_T, but it is not necessarily simple to find M_T. Approximate methods for finding M_T from A will be studied in the next chapter.
⁵One is accustomed to being able to make similar inferences from the eigenvalues of A in the coefficient-constant case. In this sense at least, M_T can be thought of as being the constant propagation matrix that "best represents" the time dependence of A(t).
⁶In accelerator physics, αT (or rather αT/(2π)) is known as the "tune" of its corresponding eigenmotion.
For the case |ρ| = 1, which we have declared to be of greatest interest, the α_i can be taken in the range −π < α_i < π. Even limited in this way there is too great a variety of possible arrangements of the α_i to permit a thorough survey in this text of all possibilities. We will assume them to be all unequal. In the cases we will study, they come in equal and opposite pairs ±α_i, and we will therefore be excluding even the case α_i = 0. If the coefficient matrix A appearing in Eq. (15.5.1) is a constant matrix C, it can be regarded as periodic for any period T. In this case, the single-period transfer matrix is given by
M_T = e^{CT}.    (15.5.22)
Then a characteristic multiplier ρ belonging to M_T can be associated with an eigenvalue λ of C according to

ρ = e^{λT}.    (15.5.23)
If the eigenvalues of C are expressed as λ_h = ±iμ_h as in Eq. (15.3.40), this equation becomes

ρ_h = e^{±iμ_h T}.    (15.5.24)

Comparing this with Eq. (15.5.21), we see in this case (constant C) that (modulo 2πi/T) the characteristic exponents are simply the eigenvalues of C.
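For constant C this association is immediate to check numerically: the multipliers of M_T = e^{CT} are the exponentials of T times the eigenvalues of C. A minimal sketch (the particular C is an invented example):

```python
import numpy as np
from scipy.linalg import expm

# Constant coefficient matrix C; treat it as periodic with any chosen T.
C = np.array([[0.0, 1.0], [-4.0, 0.0]])    # eigenvalues +/- 2i
T = 1.1
MT = expm(C * T)

rho = np.linalg.eigvals(MT)        # characteristic multipliers (15.5.18)
lam = np.linalg.eigvals(C)         # eigenvalues of C
expected = np.exp(lam * T)         # relation (15.5.23)
assert np.allclose(sorted(rho.real), sorted(expected.real))
assert np.allclose(sorted(rho.imag), sorted(expected.imag))

# Pure-imaginary eigenvalues of C give |rho| = 1: bounded motion.
assert np.allclose(np.abs(rho), 1.0)
```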
15.5.4. The Variational Equations

This section may initially seem to be a digression from our consideration of periodic systems. What will make it germane is that the significance of 1 as a characteristic multiplier will be illuminated. The first-order equations of motion for the most general autonomous system have the form

ẋ = X(x),    or    ẋⁱ = Xⁱ(x),    (15.5.26)

where the 2n functions Xⁱ(x) are arbitrary. Let x(t) stand for a known actual solution of Eq. (15.5.26), and let x(t) + δx(t) be a nearby function that also satisfies (15.5.26). Sufficiently small values of δx will satisfy an equation obtained by taking the first terms in a Taylor expansion centered on the known solution. This set of equations is

δẋ_i = Σ_j A_ij(t) δx_j.    (15.5.27)
The matrix of coefficients A(t) is normally called the Jacobian matrix, and these are known as "the variational equations," or sometimes as "the Poincaré variational equations," even though they are unrelated to the calculus of variations. By construction it is a linear set of equations, but they have become nonautonomous, since the coefficients depend explicitly on t. However, if the unperturbed solution is periodic, the coefficients will be periodic functions of t. The theory that has been developed can therefore be applied to equations that emerge in this way.
Problem 15.5.1: For a planar Kepler orbit, the Hamiltonian was given in Eq. (11.2.41) to be

H = p_r²/2 + p_θ²/(2r²) − K/r.    (15.5.28)

For an orbit characterized by the (conserved) value of p_θ being α, and with the coordinates listed in the order (r, θ, p_r, p_θ), show that the matrix of the variational equations is

A = (        0             0     1      0    )
    (    −2α/r³            0     0    1/r²   )
    ( −3α²/r⁴ + 2K/r³      0     0    2α/r³  )
    (        0             0     0      0    ).    (15.5.29)
For the special case of the orbit being circular, find the eigenvalues of this matrix.
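A numerical sketch of the circular-orbit case may be helpful as a check on hand calculation. It assumes concrete values a = α = 1 (so that circularity, p_r′ = α²/r³ − K/r² = 0 at r = a, forces K = α²/a, and the orbital frequency is ω = α/a²):

```python
import numpy as np

# Circular orbit r = a with p_theta = alpha; circularity fixes K.
a, alpha = 1.0, 1.0
K = alpha**2 / a
omega = alpha / a**2               # orbital angular velocity

# Variational matrix (15.5.29) evaluated on the circular orbit,
# coordinates ordered (r, theta, p_r, p_theta).
A = np.array([
    [0.0,                                     0.0, 0.0, 0.0],
    [-2.0 * alpha / a**3,                     0.0, 0.0, 1.0 / a**2],
    [-3.0 * alpha**2 / a**4 + 2.0 * K / a**3, 0.0, 0.0, 2.0 * alpha / a**3],
    [0.0,                                     0.0, 0.0, 0.0],
])
A[0, 2] = 1.0                      # d(delta r)/dt = delta p_r

lam = np.linalg.eigvals(A)
lam = lam[np.argsort(np.abs(lam))]
# A double (defective) eigenvalue 0 plus the pure-imaginary pair +/- i omega.
assert np.all(np.abs(lam[:2]) < 1e-6)          # loose tolerance: 0 is defective
assert np.allclose(np.sort(lam[2:].imag), [-omega, omega], atol=1e-8)
assert np.allclose(lam[2:].real, [0.0, 0.0], atol=1e-8)
```

The double zero eigenvalue found here is exactly the feature discussed in the text below.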
Problem 15.5.2: For not quite circular orbits, the unperturbed radial motion in the system of the previous problem is given by

r = a (1 − ε cos u),    where    t = √(a³/K) (u − ε sin u),    (15.5.30)
where ε can be treated as a small parameter. Find the time-dependent variational matrix. Each coefficient should be expressed as a (possibly terminating) Fourier series. Just by looking at the Jacobian matrix in Problem 15.5.1 one can see that it has λ = 0 as an eigenvalue. If the system is Hamiltonian (which it is), then this has to be a double root. This is too bad, since we have said that we would neglect the possibility of double roots. We will, in fact, for want of space and time, not work more with these equations, but we can at least contemplate the source of the vanishing eigenvalues. If A has 0 as an eigenvalue, then M_T = e^{TA} has 1 as an eigenvalue, and 1 is therefore one of the multipliers of the variational equations. If the multiplier is 1, then the corresponding solution is itself periodic, with the same period T as the underlying unperturbed motion. We could have seen a priori that such a periodic solution of the variational equations can be obtained directly from the known unperturbed solution
PERIODIC LINEAR SYSTEMS
473
x(t). Simply differentiating Eqs. (15.5.26) with respect to t yields

$$ \frac{d}{dt}\dot{x} = A(t)\,\dot{x}. \qquad (15.5.31) $$

This means that the velocity ẋ is a solution of the variational equations. But ẋ has the same periodicity as x and hence has period T. We see therefore that to have at least one vanishing eigenvalue is a generic property of the variational equations describing motion perturbed from a periodic orbit. This is something of a nuisance, and we will not pursue it further.

*15.5.5. Periodic Solutions of Inhomogeneous Equation⁷

A closed-form solution to inhomogeneous equation (15.3.8) was given by Eq. (15.3.9). Here we are interested in the situation when the coefficient matrix A(t) is periodic with period T, in the existence of a periodic solution in this case, and in whether or not such a solution is unique. That is, we seek a solution of the equation
$$ \frac{dx}{dt} = A(t)\,x + k(t) \qquad (15.5.32) $$

for which x(t + T) = x(t), or equivalently

$$ x(T) = x(0), \qquad (15.5.33) $$

since the latter equation implies

$$ x(t + T) = M(t)\,x(T) = M(t)\,x(0) = x(t). \qquad (15.5.34) $$
By Eq. (15.3.9), a formula for the required solution at time T is

$$ x(T) = M_T\left(x(0) + \int_0^T M^{-1}(t)\,k(t)\,dt\right). \qquad (15.5.35) $$
Two questions arise: is this solution periodic, and if so is it unique? If there is a periodic solution to the homogeneous equation obtained by setting k = 0 in Eq. (15.5.32), then the answer to the second question has to be "no," since such a solution could be added without affecting the equation. Let us demand therefore that Eq. (15.5.32) not have a periodic solution when k = 0. The previous section indicates that this is the same as demanding that the characteristic multipliers of A not include the value 1. The question still remains: does a periodic solution of Eq. (15.5.32) exist? If it does, substituting from Eq. (15.5.35) into Eq. (15.5.33) and rearranging terms yields
$$ (M_T - 1)\,x(0) = -M_T \int_0^T M^{-1}(t)\,k(t)\,dt. \qquad (15.5.36) $$
⁷This section may seem to be excessively mathematical, but its major result will be crucial for the development of symplectic perturbation theory in the next chapter.
Since, by hypothesis, 1 is not a multiplier, the matrix on the left can be inverted and we obtain
$$ x(0) = -(M_T - 1)^{-1}\,M_T \int_0^T M^{-1}(t)\,k(t)\,dt, \qquad (15.5.37) $$
which is therefore the required unique periodic solution. It will prove useful to manipulate this result into a form that is applicable for arbitrary t and is expressed as the convolution of an "influence function" R(t) with the "external force" k(t). Toward that end, we again apply Eq. (15.3.9), but at a displaced time, plus our knowledge that x(t + T) = x(t):
$$ x(t) = M(t + T)\left(x(0) + \int_0^{t+T} M^{-1}(t')\,k(t')\,dt'\right). \qquad (15.5.38) $$
When the range of integration in this formula is broken into one range from 0 to t and another from t to t + T, the first range yields the integral from which x(t) can be obtained. Substituting from Eq. (15.3.9), we obtain
$$ x(t) = M(t + T)\left(M^{-1}(t)\,x(t) + \int_t^{t+T} M^{-1}(t')\,k(t')\,dt'\right). \qquad (15.5.39) $$
The purpose of this manipulation has been to permit grouping the terms proportional to x(t) on the left-hand side of the equation. Doing this, and solving, we obtain

$$ x(t) = \left(1 - M(t + T)\,M^{-1}(t)\right)^{-1} \int_t^{t+T} M(t + T)\,M^{-1}(t')\,k(t')\,dt'. \qquad (15.5.40) $$
To be certain of the validity of the previous step, we have to be sure the assumed matrix inversion is possible, which it is, but we defer the demonstration temporarily. First we define a new function:

$$ R(t, s) = M(t + T)\,M^{-1}(t + s). \qquad (15.5.41) $$
One can check that this function is periodic in t (though not in s):
$$ R(t + T, s) = M(t + T)\,M(T)\,M^{-1}(T)\,M^{-1}(t + s) = R(t, s). \qquad (15.5.42) $$
Setting s = 0, the same equation shows that the matrices R(t, 0) and M(T) ≡ M_T are related by a similarity transformation and therefore have the same eigenvalues. Since M_T is known not to have 1 as an eigenvalue, the same is true of R(t, 0). This validates the matrix inversion assumed above. The denominator factor in Eq. (15.5.40) can be expressed in terms of R(t, s) because
$$ M(t + T)\,M^{-1}(t) = R(t, 0), \qquad (15.5.43) $$
and, changing the variable of integration from t' to τ according to t' = t + T - τ, the integral is transformed to

$$ \int_t^{t+T} M(t + T)\,M^{-1}(t')\,k(t')\,dt' = \int_0^T M(t + T)\,M^{-1}(t + T - \tau)\,k(t - \tau)\,d\tau = \int_0^T R(t, T - \tau)\,k(t - \tau)\,d\tau, \qquad (15.5.44) $$

where k(t + T - τ) has been replaced by k(t - τ); k is necessarily T-periodic, since k = ẋ - A(t)x with both x and A periodic.
Finally, then, we have

$$ x(t) = \left(1 - R(t, 0)\right)^{-1} \int_0^T R(t, T - \tau)\,k(t - \tau)\,d\tau. \qquad (15.5.45) $$
The function (1 - R(t, 0))⁻¹ R(t, T - τ) is therefore a kind of "Green's function" or "influence function" giving the effect at t of a unit impulse at t - τ. If the original matrix A is time-independent, then M(t) = e^{tA}, so
$$ R(t, s) = e^{(T-s)A}, \qquad\text{and}\qquad x(t) = \left(1 - e^{TA}\right)^{-1}\int_0^T e^{\tau A}\,k(t - \tau)\,d\tau. \qquad (15.5.46) $$
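The time-independent version (15.5.46) is easy to check numerically. The sketch below, with an arbitrarily chosen damped-oscillator matrix A and sinusoidal drive k (both our own illustrative choices, not from the text), evaluates x(0) from the formula by quadrature and then verifies that integrating dx/dt = Ax + k(t) over one full period returns to the same point.

```python
import numpy as np
from scipy.linalg import expm

# Check of Eq. (15.5.46): periodic solution of dx/dt = A x + k(t) for
# constant A and T-periodic k.  A (a damped oscillator) and k are
# illustrative choices, not taken from the text.
omega0, gamma, Omega = 1.0, 0.3, 2.0
A = np.array([[0.0, 1.0], [-omega0**2, -gamma]])
T = 2*np.pi/Omega
k = lambda t: np.array([0.0, np.cos(Omega*t)])

# x(0) = (1 - e^{TA})^{-1} * integral_0^T e^{tau A} k(-tau) d tau,
# evaluated by the trapezoid rule
taus = np.linspace(0.0, T, 4001)
vals = np.array([expm(tau*A) @ k(-tau) for tau in taus])
dtau = taus[1] - taus[0]
integral = dtau*(0.5*vals[0] + vals[1:-1].sum(axis=0) + 0.5*vals[-1])
x0 = np.linalg.solve(np.eye(2) - expm(T*A), integral)

# Integrate the ODE over one period with RK4; periodicity demands x(T) = x(0).
def rhs(t, x): return A @ x + k(t)
x, t, n = x0.copy(), 0.0, 4000
h = T/n
for _ in range(n):
    k1 = rhs(t, x); k2 = rhs(t + h/2, x + h/2*k1)
    k3 = rhs(t + h/2, x + h/2*k2); k4 = rhs(t + h, x + h*k3)
    x = x + (h/6)*(k1 + 2*k2 + 2*k3 + k4); t += h

print(np.linalg.norm(x - x0))   # should be very small
```

The invertibility of 1 - e^{TA} here reflects the requirement that no multiplier equal 1; for this damped A all multipliers have modulus less than one.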
Repeating the main result so far, Eq. (15.5.32) has a unique periodic solution if the homogeneous equation obtained from it has no periodic solution. But what if the homogeneous equation
$$ \frac{dy}{dt} = A(t)\,y \qquad (15.5.47) $$
has a periodic solution? This would imply that
$$ \det|M_T - 1| = 0, \qquad (15.5.48) $$
which would invalidate Eqs. (15.5.37) and (15.5.45). It would also imply the existence of a solution vector or vectors y₁, y₂, . . ., satisfying y(T) = y(0), or

$$ (M_T - 1)\,y_j(0) = 0. \qquad (15.5.49) $$
Any particular periodic solution of Eq. (15.5.32) can be augmented by any linear combination of these solutions and still remain a periodic solution. To answer the question associated with Eq. (15.5.47), we need the results on adjoint equations derived in Section 2.6. In particular, since the matrix M_T - 1 has been intentionally constructed to have 0 as an eigenvalue, the discussion of singular equations in Subsection 2.6.1 will be needed.
The equation adjoint to (15.5.47) is

$$ \frac{dz}{dt} = -A^\dagger(t)\,z, \qquad (15.5.50) $$

and, according to Eq. (2.6.12), the transfer matrix for this equation is (M^\dagger)^{-1}(t), so the adjoint single-period transfer matrix is

$$ (M_T^\dagger)^{-1}. \qquad (15.5.51) $$

The condition for the adjoint equation to have a periodic solution z is therefore

$$ \left((M_T^\dagger)^{-1} - 1\right)z(0) = 0. \qquad (15.5.52) $$

Multiplying this equation by -M_T^\dagger on the left yields

$$ \left(M_T^\dagger - 1\right)z(0) = 0. \qquad (15.5.53) $$
This equation has as many independent solutions as does Eq. (15.5.49). Call them z₁, z₂, . . . .
Suppose that x is a solution of Eq. (15.5.32) and z is any one of the solutions of Eq. (15.5.50). Utilizing the Hermitean metric of Eq. (2.6.1) and proceeding as in Eq. (2.6.9), we obtain

$$ \frac{d}{dt}(z, x) = (z, Ax + k(t)) - (A^\dagger(t)z, x) = (z, k(t)). \qquad (15.5.54) $$
Since the left-hand side is a pure time derivative, this equation can be integrated over one period, say from 0 to T, to yield

$$ (z(T), x(T)) - (z(0), x(0)) = \int_0^T (z(t), k(t))\,dt. \qquad (15.5.55) $$

Integrals over one period of a periodic function like that appearing on the right-hand side are of great importance in perturbation theory, where the integrand is evaluated using unperturbed motion and the integral has meaning for the perturbed motion. Dedicated notation will therefore be introduced, with the integral recast as an "average" over the period T of the scalar product of arbitrary functions v(t) and w(t):

$$ ((v, w)) \equiv \frac{1}{T}\int_0^T (v(t), w(t))\,dt. \qquad (15.5.56) $$

If the vectors are expressed in braket notation, this average is indicated by double angle brackets instead of double parentheses:

$$ \langle\langle v \,|\, w \rangle\rangle \equiv \frac{1}{T}\int_0^T \langle v | w \rangle\,dt. \qquad (15.5.57) $$
Note that, in the event the vectors are independent of time, their "averaged scalar product" reduces to the scalar product itself. Rewriting Eq. (15.5.55) with this notation yields

$$ (z(T), x(T)) - (z(0), x(0)) = T\,((z, k)). \qquad (15.5.58) $$

Since z is periodic, z(T) = z(0); the condition for solution x to be periodic is therefore

$$ ((z, k)) = 0. \qquad (15.5.59) $$
In words, there can be a periodic solution only if k is orthogonal on the average to all solutions of the adjoint homogeneous equation. The converse is also true, but we omit the proof. Condition (15.5.59) effectively imports condition (2.6.36), applicable to algebraic equations, into the context of periodic linear differential equations. In the next chapter, formulas like (15.5.58) will be of importance in the development of perturbation theory. In that context, one is not surprised to see an averaging procedure playing an important role. In mechanics such procedures are usually based on "physical intuition" and, from a strict mathematical point of view, can be described more nearly as wishful thinking than as theorems. One should observe, however, that no wishful thinking whatsoever has been needed to support the validity of Eq. (15.5.58) (and some of the others like it in the next chapter). It is a remarkable, rigorous, global property of periodic systems, valid in spite of the fact that arbitrary functions A(t) and k(t) appear in it. Another complication to be faced is the lack of uniqueness of the solution of an inhomogeneous equation when there is a nontrivial solution of its homogeneous equation. As in Eq. (2.6.32) or Eq. (2.6.38), the solution x can be made unique by imposing as many conditions
as there are independent solutions y of the homogeneous equation. As in Eqs. (2.6.39) and (2.6.40), the original equation can be modified by introducing matrices U and V built from independent solutions of the homogeneous and adjoint homogeneous equations. Including an (appropriately vanishing) extra term -VU†x, Eq. (15.5.32) becomes
$$ \left(\frac{d}{dt} - A(t) - VU^\dagger\right)x = k(t). \qquad (15.5.61) $$
The unique solution of the original equation, augmented by condition (15.5.60), will be denoted

$$ x = S\,k. \qquad (15.5.62) $$
This amounts to a formal replacement for the explicit formula (15.5.45), one that also covers the degenerate situation in which (15.5.45) is not applicable. Note that S, though a linear operator, is more complicated than a pure matrix.
BIBLIOGRAPHY
References
1. V. A. Yakubovitch and V. M. Starzhinskii, Linear Differential Equations with Periodic Coefficients, Wiley, New York, 1975.
2. K. R. Meyer and R. Hall, Introduction to Hamiltonian Dynamical Systems and the N-Body Problem, Springer-Verlag, New York, 1992.
3. P. R. Halmos, Finite-Dimensional Vector Spaces, Springer-Verlag, New York, 1987, p. 150.
16 PERTURBATION THEORY

Nowadays students of physics may be inclined to think of perturbation theory as a branch of quantum mechanics, since that is where they have mainly learned about it. For the same reason, they may further think that there are precisely two types of perturbation theory: time-independent and time-dependent. It seems to me more appropriate to think of perturbation theory as a branch of mathematics with almost as many perturbative methods as there are problems, all naturally motivated by the particular features of the problem. Still, there are perturbative methods that arise naturally and repeatedly in classical mechanics, and some of them are illustrated in this chapter. One natural way to categorize methods is by whether or not they assume the unperturbed motion is Hamiltonian. Since the "purest" mechanical systems are Hamiltonian, we will emphasize (with one important exception) methods for which the answer is affirmative. Another natural way of categorizing is by whether the perturbation (1) violates the Hamiltonian requirements or (2) respects them. It is not possible to say which of these possibilities is the more important. (1) The "generic" situation in physics is for perturbations to violate d'Alembert's principle and therefore to be non-Hamiltonian. In fact, most systems are treated as Hamiltonian only because non-Hamiltonian terms have been neglected in anticipation of later estimating the small deviations they cause. Lossy or viscous forces like friction and wind resistance are examples. This low-loss case is by far the most important as far as engineering considerations are concerned, and the required methods are rather straightforward. (2) The hardest problem, and the one that has tended to be of greatest theoretical interest over the centuries, is the case of loss-free perturbations that, though they respect d'Alembert's principle, lead to equations that can only be solved by approximation.
It is usually very difficult to ensure that an approximation method being employed does not introduce artificial nonsymplectic features into the predicted motion. This difficulty is most pronounced in nearly lossless systems such as occur with
high-energy particles circulating in the vacuum of particle accelerators or heavenly bodies moving through the sky. Another categorization method is based on whether the Hamiltonian is time-independent or time-dependent: (1) Time-independent systems are said to be "autonomous." They are systems that are so isolated from the rest of the world that there is no possibility of their being influenced by time-dependent external influences. (2) Systems that are not isolated are called "nonautonomous"; in general, the external effects influencing them will be time-dependent. Among such systems the time dependence can be periodic or nonperiodic. It might be thought justified to slight the periodic case as being too special, but the opposite is more nearly appropriate. When external conditions return regularly to earlier values, any errors that have been made in analyzing the motion are likely to "stick out," and this imposes serious demands on the methods of approximation. On the other hand, if the external conditions vary in an irregular way, that very irregularity tends to overwhelm any delicate Hamiltonian features of the motion. Some of the important themes are (1) variation of constants, (2) averaging over a cycle of the unperturbed motion, (3) elimination of secular terms, (4) eliminating arbitrariness from the solutions of homogeneous equations, (5) successive approximation ("iterative" methods of approximation), and (6) taking account of the possibility of "resonance." The first two of these have already been much discussed, and the others will be discussed in this chapter.
16.1. THE LAGRANGE PLANETARY EQUATIONS

16.1.1. Derivation of the Equations

It was Lagrange himself who introduced powerful approximation techniques into celestial mechanics. He developed a procedure for analyzing the effect of perturbing forces on the same Kepler problem that has played such a prominent role in the history of physics, as well as in this textbook. Lagrange's method can be characterized as "Newtonian," though Poisson brackets will play a prominent role, but not before the closely related and historically prior "Lagrange brackets" appear. Copying the Kepler potential energy from Problem 1.2.12 and augmenting it by a perturbing potential energy mR(r, t) that depends arbitrarily on position and time, but is weak enough to perturb the motion only slightly, we are to find the trajectory of a particle of mass m with potential energy
$$ V(r) = -\frac{K}{r} - mR(\mathbf{r}). \qquad (16.1.1) $$
It is not really essential that the perturbing force be representable by a potential energy function as here, and the force can be time-dependent without seriously complicating the solution, but we simplify the discussion a bit by making these assumptions. Because the motion is assumed to resemble the pure Kepler motion analyzed in earlier chapters, it is appropriate to introduce the Jacobi parameters
α₁, α₂, α₃, β₁, β₂, β₃ of the nearby pure motion. (Recall that these were also known as "orbit elements.") For the time being, since there will be no need to distinguish between the α and the β elements, we will label them as α₁, α₂, α₃, α₄, α₅, α₆ and represent them all as α. Knowing the initial position and velocity of the mass, one can solve for the orbit elements that would match these conditions if the motion were unperturbed and hence actually do match the perturbed motion briefly. Using rectangular coordinates, the equations accomplishing this have the form

$$ x = x(\alpha_1, \ldots, \alpha_6, t), \qquad y = y(\alpha_1, \ldots, \alpha_6, t), \qquad z = z(\alpha_1, \ldots, \alpha_6, t), $$
$$ p_x = p_x(\alpha_1, \ldots, \alpha_6, t), \qquad p_y = p_y(\alpha_1, \ldots, \alpha_6, t), \qquad p_z = p_z(\alpha_1, \ldots, \alpha_6, t). \qquad (16.1.2) $$
For later convenience, the Cartesian velocity components have been expressed in terms of Cartesian momentum components. These equations (actually their inverses) can be employed at any time t to find the instantaneous values of α that would give the actual instantaneous values of r and ṙ. The (Newtonian) perturbed equations of motion are

$$ \ddot{x} + \frac{Kx}{mr^3} = \frac{\partial R}{\partial x}, \qquad \ddot{y} + \frac{Ky}{mr^3} = \frac{\partial R}{\partial y}, \qquad \ddot{z} + \frac{Kz}{mr^3} = \frac{\partial R}{\partial z}. \qquad (16.1.3) $$
A convention that is commonly employed, especially in the remainder of this text, is to place the terms corresponding to unperturbed motion on the left-hand sides of the
equations and to place the perturbing terms on the right-hand sides, as here. Respecting the functional form of the unknowns as they were introduced in Eqs. (16.1.2), the unperturbed equations of motion can be written as
$$ \frac{\partial^2 x}{\partial t^2} + \frac{Kx}{mr^3} = 0, \qquad \frac{\partial^2 y}{\partial t^2} + \frac{Ky}{mr^3} = 0, \qquad \frac{\partial^2 z}{\partial t^2} + \frac{Kz}{mr^3} = 0. \qquad (16.1.4) $$
In words, these equations state that the functions specified in Eqs. (16.1.2) satisfy the unperturbed equations if the orbit elements are constants. The method of "variation of constants" (which is by no means specific to this problem) consists of allowing the "constants" α to vary slowly with time in such a way that the perturbed equations of motion are satisfied, while insisting that the relations (16.1.2) continue to be satisfied at all times. At any instant, the motion will
FIGURE 16.1.1. (a) The true orbit “osculates“ the matching unperturbed orbits at successive times. The orbits need not lie in the same plane. (b) The deviation of the true orbit (solid curve) from the unperturbed orbit that matched at the start of the calculation (dashed curve) is based on evaluating all functions on the dashed curve. The perturbed orbit may come out of the plane.
be appropriate to the orbit elements as they are evaluated at that time, and they will vary in such a way as to keep this true. This matching is based on a picture of "osculation," which means that the perturbed and unperturbed orbits not only touch, they "kiss," meaning they have the same slopes, as in Fig. 16.1.1. Figure 16.1.1a shows that a more or less arbitrary trajectory can be matched by Kepler ellipses, but Figure 16.1.1b more nearly represents the sort of almost-elliptical perturbed orbit we have in mind. The true instantaneous velocities are obtained (by definition) from

$$ \frac{dx}{dt} = \frac{\partial x}{\partial t} + \sum_s \frac{\partial x}{\partial \alpha_s}\dot{\alpha}_s, \qquad\text{and similarly for } y \text{ and } z, \qquad (16.1.5) $$

and the matching unperturbed velocities are given by the first terms on the right-hand sides of these equations. Hence the calculus expression of the osculation condition is

$$ \sum_s \frac{\partial x}{\partial \alpha_s}\dot{\alpha}_s = 0, \qquad \sum_s \frac{\partial y}{\partial \alpha_s}\dot{\alpha}_s = 0, \qquad \sum_s \frac{\partial z}{\partial \alpha_s}\dot{\alpha}_s = 0. \qquad (16.1.6) $$
Differentiating the lower of Eqs. (16.1.2) with respect to t, substituting the result into Eqs. (16.1.3), and taking advantage of Eqs. (16.1.4), we obtain

$$ \sum_s \frac{\partial p_x}{\partial \alpha_s}\dot{\alpha}_s = m\frac{\partial R}{\partial x}, \qquad \sum_s \frac{\partial p_y}{\partial \alpha_s}\dot{\alpha}_s = m\frac{\partial R}{\partial y}, \qquad \sum_s \frac{\partial p_z}{\partial \alpha_s}\dot{\alpha}_s = m\frac{\partial R}{\partial z}. \qquad (16.1.7) $$
Together, Eqs. (16.1.6) and (16.1.7) are six differential equations for the six orbit elements, but they are not yet manageable equations, as they depend as well on the Cartesian coordinates and momenta. This dependency can be removed by the following remarkable manipulations. Multiplying the first of Eqs. (16.1.7) by ∂x/∂α_r and subtracting the first of Eqs. (16.1.6) multiplied by ∂p_x/∂α_r yields

$$ \sum_s X_{rs}\dot{\alpha}_s = m\frac{\partial R}{\partial x}\frac{\partial x}{\partial \alpha_r}, \qquad\text{where}\quad X_{rs} = \frac{\partial x}{\partial \alpha_r}\frac{\partial p_x}{\partial \alpha_s} - \frac{\partial p_x}{\partial \alpha_r}\frac{\partial x}{\partial \alpha_s}. \qquad (16.1.8) $$
Quantities Y_{rs} and Z_{rs} are defined similarly, and the same manipulations can be performed on the other equations. We define the "Lagrange bracket" of pairs of orbit elements by

$$ [\alpha_r, \alpha_s] \equiv L_{rs} = \sum_{i=1}^{n}\left(\frac{\partial q^i}{\partial \alpha_r}\frac{\partial p_i}{\partial \alpha_s} - \frac{\partial p_i}{\partial \alpha_r}\frac{\partial q^i}{\partial \alpha_s}\right), \qquad (16.1.9) $$

where we have introduced (q¹, q², q³) = (x, y, z) and (p₁, p₂, p₃) = (p_x, p_y, p_z) and, though n = 3 in this case, similar manipulations are valid for arbitrary n. The purpose of the duplicate notation for [α_r, α_s] is to allow us to regard the Lagrange brackets as the elements of a matrix L = (L_{rs}). Adding the three equations like (16.1.8), we obtain

$$ \sum_s [\alpha_r, \alpha_s]\dot{\alpha}_s = m\left(\frac{\partial R}{\partial x}\frac{\partial x}{\partial \alpha_r} + \frac{\partial R}{\partial y}\frac{\partial y}{\partial \alpha_r} + \frac{\partial R}{\partial z}\frac{\partial z}{\partial \alpha_r}\right) = m\frac{\partial R}{\partial \alpha_r}. \qquad (16.1.10) $$
After these manipulations, the coordinates x, y, and z no longer appear explicitly on the left-hand sides of the equations. This may appear like an altogether artificial improvement, since the Lagrange brackets themselves depend implicitly on these quantities. The next stage of the development is to show that there is no such dependence, or rather that the dependence can be neglected in obtaining an approximate solution of the equations. More precisely, we will show that [α_r, α_s] is a constant of the unperturbed motion (provided both α_r and α_s are constants of the unperturbed motion which, of course, they are). This is exact for unperturbed orbits but, applying it to the perturbed orbit, we will obtain only an approximate result. We defer this proof and continue to reduce the perturbation equations. While discussing adiabatic invariants in the previous chapter, we already learned the efficacy of organizing the calculation so that the deviation from unperturbed motion can be calculated as an integral over an unperturbed motion. This is illustrated in Fig. 16.1.1b. In this case the unperturbed orbit, shown as a dashed curve, is closed while the perturbed orbit may not be, and that is the sort of effect a perturbation is likely to have. But we assume that over a single period the deviation of the perturbed orbit is small on the scale of either orbit. It is implicit in this assumption that the changes in the orbit elements α will also be "small" over one period. As in the proof of constancy of the action variable in Section 14.3.2, we can approximate the
right-hand side of Eq. (16.1.10) by averaging over one period T:

$$ \sum_s [\alpha_r, \alpha_s]\dot{\alpha}_s = m\left\langle\left\langle\frac{\partial R}{\partial \alpha_r}\right\rangle\right\rangle \equiv \frac{m}{T}\int_0^T \frac{\partial R}{\partial \alpha_r}\,dt. \qquad (16.1.11) $$

These are the "Lagrange planetary equations." Since they can be written in matrix form,

$$ L\,\dot{\alpha} = m\,\frac{\partial R}{\partial \alpha}, \qquad (16.1.12) $$

they can be solved for the time derivatives

$$ \dot{\alpha} = m\,L^{-1}\,\frac{\partial R}{\partial \alpha}. \qquad (16.1.13) $$

Since the integrations required to evaluate the averages in Eq. (16.1.11) are taken over the initially matching unperturbed orbit, the Lagrange brackets are, in principle, known. If they are calculated numerically, they are known also in practice, and this commonly solves the problem at hand satisfactorily. But we will continue the analytical development and succeed in completing realistic calculations in closed form. It has already been announced that the coefficients [α_r, α_s] are constants of the motion and, since linear differential equations with constant coefficients are very manageable, we can see what a great simplification the Lagrange procedure has brought.
16.1.2. Relation Between Lagrange and Poisson Brackets

Because it is needed in Eq. (16.1.13), we now set about finding P = L⁻¹, or rather showing that the elements of P are in fact the Poisson brackets of the orbit elements, defined in Eq. (13.5.2),

$$ P_{rs} \equiv \{\alpha_r, \alpha_s\} = \sum_{i=1}^{n}\left(\frac{\partial \alpha_r}{\partial q^i}\frac{\partial \alpha_s}{\partial p_i} - \frac{\partial \alpha_r}{\partial p_i}\frac{\partial \alpha_s}{\partial q^i}\right). \qquad (16.1.14) $$

We are generalizing somewhat by allowing arbitrary generalized coordinates and momenta, and Eq. (16.1.9) has already been generalized in the same way. Recall that there are 2n orbit elements α_r, as they include both the n Jacobi β_j elements and the n Jacobi α_i elements. As a result, the matrices under study are 2n × 2n; the indices r and s run from 1 to 2n while the indices i and j run from 1 to n. Using summation convention and relations like

$$ \frac{\partial q^i}{\partial \alpha_r}\frac{\partial \alpha_r}{\partial q^j} = \delta^i_j, \qquad \frac{\partial q^i}{\partial \alpha_r}\frac{\partial \alpha_r}{\partial p_j} = 0, \qquad (16.1.15) $$

we now show that

$$ [\alpha_r, \alpha_s]\,\{\alpha_r, \alpha_t\} = \delta_{st}, \qquad (16.1.16) $$

and hence

$$ L^T = P^{-1}, \qquad (16.1.17) $$
which is the desired result. It was proved earlier that the (Poisson bracket) elements of P are constants of the unperturbed motion, so we now know that the Lagrange brackets (elements of P⁻¹) are also constants of the motion. This greatly simplifies the Lagrange planetary equations. It also shows, incidentally, that Lagrange himself was aware of the most important properties of these bracket expressions well before Poisson. His proof of the invariance of the Lagrange brackets was specific to the Kepler problem and proceeded as in a problem below.
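The constancy claim is easy to illustrate numerically. Rather than the Kepler problem, the sketch below uses the simplest possible case, a one-dimensional harmonic oscillator with "orbit elements" (a, phi) defined by x = a cos(omega t + phi); for it the exact bracket is [a, phi] = -m a omega, independent of t. The oscillator example and all parameter values are our own illustration, not taken from the text.

```python
import numpy as np

# Finite-difference check that a Lagrange bracket of orbit elements is a
# constant of the unperturbed motion, using a harmonic oscillator with
# elements (a, phi):  x = a cos(omega t + phi),  p = -m a omega sin(omega t + phi).
# The exact bracket is [a, phi] = -m*a*omega for every t.
m, omega, a, phi = 1.0, 2.0, 0.7, 0.3

def x(a_, phi_, t): return a_*np.cos(omega*t + phi_)
def p(a_, phi_, t): return -m*a_*omega*np.sin(omega*t + phi_)

def bracket(t, d=1e-6):
    # [a, phi] = dx/da dp/dphi - dp/da dx/dphi, by central differences
    dx_da   = (x(a + d, phi, t) - x(a - d, phi, t))/(2*d)
    dx_dphi = (x(a, phi + d, t) - x(a, phi - d, t))/(2*d)
    dp_da   = (p(a + d, phi, t) - p(a - d, phi, t))/(2*d)
    dp_dphi = (p(a, phi + d, t) - p(a, phi - d, t))/(2*d)
    return dx_da*dp_dphi - dp_da*dx_dphi

vals = [bracket(t) for t in np.linspace(0.0, 5.0, 11)]
print(vals[0], -m*a*omega)   # the bracket equals -m*a*omega at every sampled t
```

The same finite-difference recipe, applied to the Kepler formulas (16.1.19) and (16.1.23), can be used to check the brackets requested in the problems below.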
16.1.3. Advance of Perihelion of Mercury
To exercise the formulas derived so far we consider Einstein's famous prediction, based on the theory of general relativity, that the perihelion of the planet Mercury should advance with time. See Fig. 16.1.2. Not pretending to understand general relativity, we simply accept Einstein's word that the mass of the sun introduces "curvature" into the space, which has the effect of modifying the Kepler potential slightly, into the form

$$ U(r) = -\frac{K}{r} - \frac{B}{r^n}, \qquad\text{or}\qquad mR = \frac{B}{r^n}. \qquad (16.1.18) $$
We will calculate the influence of the correction term on the orbits of planets of the sun. The planet for which the effect is most appreciable and the observations least subject to extraneous difficulties is Mercury.¹

¹In practice, the orbit of Mercury appears to precess at a rate roughly 100 times greater than the Einstein effect because the coordinate system (described in Section 11.2.8) is not itself fixed. Furthermore, there are perturbations to Mercury's orbit due to other nearby planets, and these forces cause precession on the order of 10 times greater than the Einstein effect. These precessions are eminently calculable using the Lagrange planetary equations, but we ignore them, or rather treat them as nuisance corrections that need to be made before the result of greatest interest can be extracted.
FIGURE 16.1.2. Advance of perihelion is registered as deviation from zero of the coordinate β₂, which is the angle from the major axis at time t = 0 to the major axis at time t.
Orbit elements to be used are a, ε, which are intrinsic properties of the unperturbed orbit, along with i and β₂, β₃, which establish its orientation. These quantities were all defined in Section 11.2.7. We specialize to planar geometry corresponding to the figure by assuming i = π/2, β₃ = 0. Generalizing Eqs. (1.2.35) by introducing β₂ to allow for the possible advance of the perihelion angle β₂, the Cartesian coordinates of the planet are given by

$$ x = \cos\beta_2\,(a\cos u - a\epsilon) - \sin\beta_2\,a\sqrt{1-\epsilon^2}\,\sin u, $$
$$ z = \sin\beta_2\,(a\cos u - a\epsilon) + \cos\beta_2\,a\sqrt{1-\epsilon^2}\,\sin u. \qquad (16.1.19) $$
Note that to specify the azimuthal position of the planet, we are using the intermediate variable u, known as the "eccentric anomaly," rather than the ordinary cylindrical angle, which is the "true anomaly." A formula relating u to time t is Eq. (1.2.36),

$$ t - \tau = \sqrt{\frac{ma^3}{K}}\,(u - \epsilon\sin u), \qquad (16.1.20) $$

where we have introduced the "time of passage through perigee" τ; it is closely related to β₁ because, according to Eq. (11.2.56),

$$ \tau = -\beta_1. \qquad (16.1.21) $$
For a general perturbing force, a replacement like this would be ill-advised, as the dependence of the coefficient on one of the other orbit elements, namely a, “mixes” the elements, which would lead at best to complication and at worst to error. But
our assumptions have been such that energy E is conserved, and the definitions of the orbit elements then imply that a is also conserved for the particular perturbation being analyzed. Since we need derivatives with respect to t, we need the result

$$ \frac{du}{dt} = \sqrt{\frac{K}{ma^3}}\,\frac{1}{1 - \epsilon\cos u}. \qquad (16.1.22) $$
Differentiating Eqs. (16.1.19), we obtain

$$ p_x = \left(-\cos\beta_2\,\sin u - \sqrt{1-\epsilon^2}\,\sin\beta_2\,\cos u\right)\frac{\sqrt{mK/a}}{1 - \epsilon\cos u}, $$
$$ p_z = \left(-\sin\beta_2\,\sin u + \sqrt{1-\epsilon^2}\,\cos\beta_2\,\cos u\right)\frac{\sqrt{mK/a}}{1 - \epsilon\cos u}. \qquad (16.1.23) $$
Since no further differentiations with respect to t will be required, and since the Lagrange brackets are known to be independent of t, we can set t = 0 from here on. Then, by Eq. (16.1.20), u is a function only of τ. Furthermore, we can set β₂ = 0 and, after differentiating with respect to τ, we can also set τ = 0 (and hence u = 0). These assumptions amount to assuming perigee occurs along the x-axis at t = 0, and that the orbit elements are being evaluated at that point. Hence, for example,
$$ \frac{\partial x}{\partial \tau} = 0. \qquad (16.1.24) $$

The vanishing of ∂x/∂τ reflects the fact that it is being calculated at perigee. For the Lagrange brackets involving orbit elements other than τ, it is legitimate to make the u = 0 simplification before evaluating the required partial derivatives:

$$ x = a(1-\epsilon)\cos\beta_2, \qquad z = a(1-\epsilon)\sin\beta_2, $$
$$ p_x = -\sqrt{\frac{mK}{a}}\sqrt{\frac{1+\epsilon}{1-\epsilon}}\,\sin\beta_2, \qquad p_z = \sqrt{\frac{mK}{a}}\sqrt{\frac{1+\epsilon}{1-\epsilon}}\,\cos\beta_2. \qquad (16.1.25) $$
Completion of this example is left as an exercise. Einstein's calculation of the constants B and n in Eq. (16.1.18) leads to the "astronomically small" precession rate of 43 seconds of arc per century. One should be suitably impressed not just by Einstein but also by Newton, whose theory permits such a fantastic "deviation from null" observation, made possible by the fact that the unperturbed orbit closes to such high precision.
Problem 16.1.1: Setting m = 1 so p = ẋ, the potential energy that yields a "central force," radial and with magnitude depending only on r, is given by V(r). The unperturbed equations of motion in this potential are

$$ \ddot{x} = -\frac{\partial V}{\partial x}, \qquad \ddot{y} = -\frac{\partial V}{\partial y}, \qquad \ddot{z} = -\frac{\partial V}{\partial z}. \qquad (16.1.26) $$

Let α stand for the orbit elements in this potential. By explicit differentiation show that d/dt [α_r, α_s] = 0.
Problem 16.1.2: Check some or all of the following Lagrange brackets for the Kepler problem. They assume as orbit elements a, ε, i along with β₂, β₃, all defined in Section 11.2.7, as well as τ, which is closely related to β₁ as in Eq. (16.1.21).

$$ [a, \epsilon] = 0, \qquad [a, i] = 0, \qquad [a, \tau] = \frac{K}{2a^2}, $$
$$ [a, \beta_2] = -\frac{1}{2}\sqrt{\frac{mK(1-\epsilon^2)}{a}}, \qquad [a, \beta_3] = -\frac{1}{2}\sqrt{\frac{mK(1-\epsilon^2)}{a}}\,\cos i, $$
$$ [\epsilon, i] = 0, \qquad [\tau, i] = 0, \qquad [\epsilon, \tau] = 0, $$
$$ [\epsilon, \beta_2] = \frac{\epsilon\sqrt{mKa}}{\sqrt{1-\epsilon^2}}, \qquad [\epsilon, \beta_3] = \frac{\epsilon\sqrt{mKa}}{\sqrt{1-\epsilon^2}}\,\cos i, \qquad [i, \beta_3] = \sqrt{mKa(1-\epsilon^2)}\,\sin i, \qquad (16.1.27) $$

with the remaining independent brackets vanishing.
Problem 16.1.3: To obtain orbits that start and remain in the (x, z) plane, assume i = π/2, β₃ = 0, ∂R/∂β₃ = 0, and ∂R/∂i = 0. Show that the Lagrange planetary equations are

$$ \frac{K}{2a^2}\,\dot{\tau} - \frac{1}{2}\sqrt{\frac{mK(1-\epsilon^2)}{a}}\,\dot{\beta}_2 = m\frac{\partial R}{\partial a}, \qquad \frac{\epsilon\sqrt{mKa}}{\sqrt{1-\epsilon^2}}\,\dot{\beta}_2 = m\frac{\partial R}{\partial \epsilon}, $$
$$ -\frac{K}{2a^2}\,\dot{a} = m\frac{\partial R}{\partial \tau}, \qquad \frac{1}{2}\sqrt{\frac{mK(1-\epsilon^2)}{a}}\,\dot{a} - \frac{\epsilon\sqrt{mKa}}{\sqrt{1-\epsilon^2}}\,\dot{\epsilon} = m\frac{\partial R}{\partial \beta_2}. \qquad (16.1.28) $$
Problem 16.1.4: Check some or all of the coefficients in the following formulas, which are the planetary equations of the Kepler problem solved for the time derivatives of the orbit elements, for example

$$ \dot{a} = -\frac{2a^2 m}{K}\,\frac{\partial R}{\partial \tau}, \qquad \dot{\beta}_3 = \sqrt{\frac{m}{(1-\epsilon^2)\,aK}}\,\frac{1}{\sin i}\,\frac{\partial R}{\partial i}. \qquad (16.1.29) $$
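The precession computed analytically from the planetary equations can also be checked by brute force. The sketch below integrates the orbit in the perturbed potential U(r) = -K/r - B/r³ (the n = 3 case of Eq. (16.1.18)) and measures the perihelion advance per orbit, comparing it with the first-order estimate 6πB/(Kp²), p = a(1 - ε²); that analytic estimate is our own cross-check, not a formula quoted from the text, and all parameter values (m = K = 1, etc.) are illustrative.

```python
import numpy as np

# Numerical perihelion advance for U(r) = -K/r - B/r^3 (n = 3 case of
# Eq. (16.1.18)), compared with the standard first-order estimate
# 6*pi*B/(K p^2), p = a(1 - eps^2).  All numbers are illustrative.
m, K, B = 1.0, 1.0, 1.0e-3
a_el, eps = 1.0, 0.2
p = a_el*(1.0 - eps**2)

def deriv(s):
    q, v = s[:2], s[2:]
    r = np.linalg.norm(q)
    acc = (-K/r**3 - 3.0*B/r**5)*q/m     # from F = -dU/dr r_hat
    return np.concatenate((v, acc))

def rk4(s, h):
    k1 = deriv(s); k2 = deriv(s + h/2*k1)
    k3 = deriv(s + h/2*k2); k4 = deriv(s + h*k3)
    return s + (h/6)*(k1 + 2*k2 + 2*k3 + k4)

# launch from perihelion of the matching unperturbed ellipse
r0 = a_el*(1.0 - eps)
v0 = np.sqrt((K/(m*a_el))*(1.0 + eps)/(1.0 - eps))
s, h = np.array([r0, 0.0, 0.0, v0]), 1.0e-3

peri_angles = []
for _ in range(40000):
    prev = s.copy()
    s = rk4(s, h)
    rdot_prev, rdot = prev[:2] @ prev[2:], s[:2] @ s[2:]
    if rdot_prev < 0.0 <= rdot:          # radial velocity turns positive: perihelion
        frac = -rdot_prev/(rdot - rdot_prev)
        q = prev[:2] + frac*(s[:2] - prev[:2])
        peri_angles.append(np.arctan2(q[1], q[0]))

dphi_meas = np.mean(np.diff(peri_angles))
dphi_pred = 6.0*np.pi*B/(K*p**2)
print(dphi_meas, dphi_pred)
```

Since the perturbation conserves angular momentum, the semi-latus rectum p entering the prediction is exactly that of the launching conditions, and the measured and predicted advances should agree to first order in B.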
Problem 16.1.5: Complete the calculation of the precession of Kepler orbits caused by the second term of Eq. (16.1.18), preferably by looking up Einstein's value of the constants B and n and evaluating the rate of precession of Mercury's orbit about the sun.

16.2. ITERATIVE ANALYSIS OF ANHARMONIC OSCILLATIONS

Consider a system that executes simple harmonic motion for sufficiently small amplitudes and hence is described by an equation like Eq. (15.1.1), which we now simplify to one dimension:
$$ \left(\frac{d^2}{dt^2} + \omega_0^2\right)x = R(x) = \alpha x^2 + \beta x^3. \qquad (16.2.1) $$
For now we assume the system is autonomous, which means that R does not depend explicitly on t, but R is here allowed to depend on "nonlinear" powers of x of higher than first order. Such terms could have been derived from a potential energy function V = -αx³/3 - βx⁴/4. (Note that function R is not the same as in the previous section.) Like all one-dimensional problems, this one could therefore be studied as in
Problem 1.2.1. The motion oscillates between the (readily calculable) turning points closest to the origin. Such motion is trivially periodic, but the presence of nonlinear terms causes the time dependence to be not quite sinusoidal, and the system is therefore called "anharmonic." We now wish to apply a natural iterative method of solution to this problem. This may seem to be an entirely academic undertaking since the solution described in the previous paragraph has to be regarded as already highly satisfactory. Worse yet, on a first pass the proposed method will yield an obviously wrong result. We are then led to a procedure that overcomes the problem, thereby repairing the iterative "tool" for use in multidimensional or nonconservative situations where no exact method is available. The previously mentioned nonphysical behavior is ascribed to so-called "secular terms," and the procedure for eliminating them is known as "Lindstedt's method." By choosing the initial position x₀ and velocity v₀ small enough, it is possible to make the terms on the right-hand side of Eq. (16.2.1) negligibly small. In this approximation, the solution takes the form

$$ x = a\cos\omega_0 t, \qquad (16.2.2) $$

where we have simplified to the maximum extent possible (with no loss of generality) by choosing the initial time to be such that the motion is described by a pure cosine term. This solution will be known as the zeroth-order solution. We will be primarily interested in somewhat larger amplitudes, where the anharmonic effects have become noticeably large. This region can be investigated mathematically by keeping only the leading terms in power series in the amplitude a. In fact, one usually works only to the lowest occurring power unless there is a good reason (and there often is) to keep more terms. An intuitively natural procedure, then, is to approximate the right-hand side of Eq. (16.2.1) by substituting the zeroth-order solution to obtain
$$ \left(\frac{d^2}{dt^2} + \omega_0^2\right)x = \frac{\alpha a^2}{2}\,(1 + \cos 2\omega_0 t) + \frac{\beta a^3}{4}\,(3\cos\omega_0 t + \cos 3\omega_0 t). \qquad (16.2.3) $$

The terms on the right-hand side have been expanded into Fourier series with period 2π/ω₀. Note that R(x) could have been any function of x whatsoever, and the right-hand side would still have been expressible as a Fourier series with the same period; any function of a periodic function is periodic. In general, the Fourier series would be infinite, but for our simple perturbation the series terminates. Though Eq. (16.2.1) was autonomous, Eq. (16.2.3) is nonautonomous. In fact, the terms on the right-hand side are not different from the terms that would describe external sinusoidal drive at the four frequencies 0, ω₀, 2ω₀, and 3ω₀. Furthermore, the equations have magically become "linear"; that was the purpose of the Fourier expansion. Methods for solving equations like these have been illustrated in the problems in the first chapter, especially Problem 1.2.9 and Problem 1.2.10.
ITERATIVE ANALYSIS OF ANHARMONIC OSCILLATIONS
491
The drive term (3βa³/4) cos ω₀t is troublesome. Solving by, for example, the Laplace transform technique, one finds its response to be proportional to t sin ω₀t, which becomes arbitrarily large with increasing time. This occurs because the "drive" frequency is equal to the natural frequency of the unperturbed system (which is an ideal lossless simple harmonic oscillator). The infinite buildup occurs because the drive enhances the response synchronously on every cycle, causing the amplitude to grow inexorably. This is known as "resonance." This infinite buildup is clearly nonphysical, and a perturbing term like this is known as a "secular term." The rate of growth is proportional to β, but the motion will eventually blow up no matter how small the parameter β. Having identified this problem, its source is fairly obvious. It is only because of the parabolic shape of the potential well that the frequency of a simple harmonic oscillator is independent of amplitude. Since the extra terms that have been added distort this shape, they can be expected to cause the frequency to depend on amplitude. This is known as "detuning with amplitude." This detuning will disrupt the above-mentioned synchronism, and this is presumably what prevents the unphysical behavior. Having identified the source of the problem, it is not hard to repair the situation. We need to include a term 2ω₀ δω x on the left-hand side of Eq. (16.2.3) to account for an amplitude-dependent shift of the "natural frequency of oscillation": ω = ω₀ + δω. (The term δω² x that might also have been expected will be dropped because it is quadratically small.)
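The linear growth of the secular response is easy to exhibit numerically. The following sketch (a pure-Python check with a hand-rolled RK4 integrator; unit drive amplitude and ω₀ = 1 are illustrative choices, not values from the text) integrates ẍ + x = cos t from rest and compares the result with the particular solution t sin t / 2:

```python
import math

# The resonant drive cos(w0 t) applied to an undamped oscillator produces
# the secular response t sin(w0 t)/(2 w0): the amplitude grows linearly
# in time.  Here w0 = 1 and the oscillator starts from rest.
def secular_response(t_end, dt=1e-3):
    x, v, t = 0.0, 0.0, 0.0
    def acc(x, t):
        return -x + math.cos(t)
    while t < t_end - dt/2:
        # one RK4 step for the state (x, v)
        k1x, k1v = v, acc(x, t)
        k2x, k2v = v + 0.5*dt*k1v, acc(x + 0.5*dt*k1x, t + 0.5*dt)
        k3x, k3v = v + 0.5*dt*k2v, acc(x + 0.5*dt*k2x, t + 0.5*dt)
        k4x, k4v = v + dt*k3v, acc(x + dt*k3x, t + dt)
        x += dt*(k1x + 2*k2x + 2*k3x + k4x)/6
        v += dt*(k1v + 2*k2v + 2*k3v + k4v)/6
        t += dt
    return x

for T in (10.0, 40.0):
    print(T, secular_response(T), T*math.sin(T)/2)
```

The agreement confirms that the resonant response grows without bound, linearly in t.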
\ddot{x} + \omega^2 x = \frac{\alpha a^2}{2}\,(1 + \cos 2\omega_0 t) + \left(\frac{3\beta a^3}{4} + 2a\omega_0\,\delta\omega\right)\cos\omega_0 t + \frac{\beta a^3}{4}\cos 3\omega_0 t. \qquad (16.2.4)
Having added a term to the left-hand side of the equation, it was necessary to add the same term to the right-hand side in order to maintain the equality. But (consistent with the iterative scheme) this term has been evaluated using the zeroth approximation to the motion. The only way for this equation to yield a steady periodic solution is for the coefficient of cos ω₀t to vanish; this yields a formula for ω:

\omega \overset{?}{=} \omega_0 - \frac{3\beta a^2}{8\omega_0}. \qquad (16.2.5)
For want of a better term, we will call this procedure "the Lindstedt trick." We have made this only a "qualified" equality since it will shortly be seen to be not quite right unless α = 0. Making this substitution, the equation of motion becomes

\ddot{x} + \omega^2 x = \frac{\alpha a^2}{2}\,(1 + \cos 2\omega t) + \frac{\beta a^3}{4}\cos 3\omega t. \qquad (16.2.6)
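The frequency formula (16.2.5) can be checked against direct integration. The sketch below is a numerical experiment, not part of the text: pure Python with a hand-rolled RK4 step, and the illustrative choices ω₀ = 1, β = 0.1, a = 0.5, with α = 0 so that the α² correction of the later iteration is absent. The oscillation frequency is measured from successive downward zero crossings:

```python
import math

# Numerical check of the Lindstedt frequency shift (16.2.5) for the pure
# cubic perturbation:  x'' + w0^2 x = beta x^3.
def simulate_period(w0=1.0, beta=0.1, a=0.5, dt=1e-3):
    """Integrate with RK4 and return the period measured between
    successive downward zero crossings of x."""
    def accel(x):
        return -w0**2 * x + beta * x**3

    x, v, t = a, 0.0, 0.0
    crossings = []
    while len(crossings) < 2 and t < 100.0:
        # one RK4 step for the state (x, v)
        k1x, k1v = v, accel(x)
        k2x, k2v = v + 0.5*dt*k1v, accel(x + 0.5*dt*k1x)
        k3x, k3v = v + 0.5*dt*k2v, accel(x + 0.5*dt*k2x)
        k4x, k4v = v + dt*k3v, accel(x + dt*k3x)
        xn = x + dt*(k1x + 2*k2x + 2*k3x + k4x)/6
        vn = v + dt*(k1v + 2*k2v + 2*k3v + k4v)/6
        if x > 0.0 and xn <= 0.0:  # downward zero crossing
            crossings.append(t + dt * x / (x - xn))  # linear interpolation
        x, v, t = xn, vn, t + dt
    return crossings[1] - crossings[0]

w0, beta, a = 1.0, 0.1, 0.5
omega_measured = 2*math.pi / simulate_period(w0, beta, a)
omega_lindstedt = w0 - 3*beta*a**2/(8*w0)   # Eq. (16.2.5)
print(omega_measured, omega_lindstedt)
```

The measured frequency agrees with Eq. (16.2.5) to within the neglected O(a⁴) terms, and is visibly shifted below ω₀.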
492
PERTURBATION THEORY
Because none of the frequencies on the right-hand side of the equation are close to ω, they have been approximated by ω₀ → ω. At this particular amplitude, the frequency ω₀ has lost its significance, and all functions of x have become periodic with frequency ω. A particular integral of this equation can be obtained by inspection:
x(t) = \frac{\alpha a^2}{2}\frac{1}{\omega^2} - \frac{\alpha a^2}{2}\frac{1}{3\omega^2}\cos 2\omega t - \frac{\beta a^3}{4}\frac{1}{8\omega^2}\cos 3\omega t + \cdots, \qquad (16.2.7)

where ⋯ is a reminder that the solution of an inhomogeneous equation like Eq. (16.2.6) remains a solution when it is augmented by any solution of the "homogeneous equation" (obtained by dropping the terms on the right-hand side). Augmenting Eq. (16.2.7) by the zeroth-order solution yields

x(t) = a\cos\omega t + \frac{\alpha a^2}{2}\frac{1}{\omega^2} - \frac{\alpha a^2}{2}\frac{1}{3\omega^2}\cos 2\omega t \;\overset{?}{-}\; \frac{\beta a^3}{4}\frac{1}{8\omega^2}\cos 3\omega t + \cdots. \qquad (16.2.8)
Each of these terms comes from the general formula for the response to a drive term cos rωt, where r is any integer:

\frac{1}{(1 - r^2)\,\omega^2}\cos r\omega t. \qquad (16.2.9)
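The response formula (16.2.9) can be verified directly: an oscillator started on the particular integral should track cos rωt/((1 − r²)ω²) indefinitely. The following sketch (illustrative parameter values, pure Python RK4; none of this is from the text) does so for r = 0, 2, 3:

```python
import math

# Direct check of the response formula (16.2.9): for the equation
# x'' + w^2 x = cos(r w t) with r != 1, the particular integral is
# cos(r w t) / ((1 - r^2) w^2).  Starting from that solution's initial
# conditions, the numerical trajectory should follow it exactly.
def check_response(r, w=1.3, t_end=20.0, dt=1e-3):
    amp = 1.0 / ((1 - r*r) * w*w)
    x, v, t = amp, 0.0, 0.0            # ICs of the particular integral
    def acc(x, t):
        return -w*w*x + math.cos(r*w*t)
    worst = 0.0
    while t < t_end - dt/2:
        # one RK4 step for the state (x, v)
        k1x, k1v = v, acc(x, t)
        k2x, k2v = v + 0.5*dt*k1v, acc(x + 0.5*dt*k1x, t + 0.5*dt)
        k3x, k3v = v + 0.5*dt*k2v, acc(x + 0.5*dt*k2x, t + 0.5*dt)
        k4x, k4v = v + dt*k3v, acc(x + dt*k3x, t + dt)
        x += dt*(k1x + 2*k2x + 2*k3x + k4x)/6
        v += dt*(k1v + 2*k2v + 2*k3v + k4v)/6
        t += dt
        worst = max(worst, abs(x - amp*math.cos(r*w*t)))
    return worst

for r in (0, 2, 3):
    print(r, check_response(r))
```

For r = 1 no such bounded particular integral exists, which is precisely the vanishing-denominator problem discussed next.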
It was the fact that the denominator factor r² − 1 vanishes for r = 1 that made it necessary to suppress the secular term before proceeding. This problem is ubiquitous in mechanics; it goes by the name "the problem of vanishing denominators." Had we solved the equation by the Laplace transform technique, the vanishing denominator problem would have manifested itself in the structure of the transform of the solution:

\tilde{x}(s) = \frac{1}{s + i\omega}\,\frac{1}{s - i\omega}\,\frac{s}{s^2 + r^2\omega^2}. \qquad (16.2.10)
For r = 1, the poles become double. We have to leave it ambiguous whether Lindstedt's trick constitutes a theorem of mathematics for solving the equation or a principle of physics stating that "nature" shifts the frequency of oscillation to avoid the infinity. As it happens, nature has another way of handling the problem, namely by creating chaos. Like a spoiled child who's losing, nature dumps the game board on the floor. Speaking very loosely, for small amplitudes nature chooses to maintain regular motion with shifted frequency, but for large amplitudes has to resort to irregular, chaotic motion. The way nature proceeds from the regular to the irregular regime is tortuous and poorly understood, but we will stay close to the regular regime. Returning to Eq. (16.2.8), even given that we are suppressing terms proportional to a⁴ and above, the ⋯ and the ? are still present for two reasons: The last term is not quite correct (which we rectify below) and, more fundamentally, but also more easily taken care of, the solution is not unique. To make it unique, we should make it match given initial conditions x(0) = x₀, ẋ(0) = v₀. Since the choice of time origin has been left arbitrary, we can make the replacement t → t − t₀ in Eq. (16.2.8) and then adjust a and t₀ to provide the required match.
FIGURE 16.2.1. Perturbed potential energy functions leading to anharmonic oscillations. (a) "Cubic" deformation makes the spring "hard" on the left, "soft" on the right. (b) "Quartic" deformation makes the spring symmetrically hard on the left and on the right.
One may find it surprising that only the anharmonic term proportional to x³ in Eq. (16.2.1) has led to an amplitude-dependent frequency shift in Eq. (16.2.5). Fig. 16.2.1 should make this result at least plausible, however. If one considers the restoring force in the problem as being due to a spring, then pure simple harmonic motion requires perfect adherence to "Hooke's law" by the spring. An actual spring may violate Hooke's law either by being "soft" and giving too little force at large extension or "hard" and giving too much. If a spring is "hard," the natural frequency increases with increasing amplitude. But if the spring is soft on the left and hard on the right, the frequency shifts on the left and right will tend to cancel. We have seen for a pure quadratic force term (cubic potential term) that this cancellation is exact. There is one more thing we should be concerned about, though. To some tastes, we may have been a bit cavalier in dropping small terms, and we have in fact made a mistake in the treatment so far. In our haste at accepting the absence of frequency shift coming from the quadratic force term αx², we failed to register (on dimensional grounds, by comparison with Eq. (16.2.5)) that the frequency shift it would have caused (but didn't, because of a vanishing coefficient) would have been proportional to αa/ω₀. Since formula (16.2.5) can be said to be the a² correction in a formula for the natural frequency as a power series in a, we have included only the effect of the βx³ term in the original equation, but not yet the αx² term. We must therefore perform another iteration step, this time substituting for x(t) from Eq. (16.2.8) into the right-hand side of Eq. (16.2.1). Exhibiting only the secular term from Eq. (16.2.4) and the only new term that may contain a part oscillating at the same frequency, the right-hand side of Eq. (16.2.3) becomes
\left(\frac{3\beta a^3}{4} + 2a\omega_0\,\delta\omega\right)\cos\omega_0 t + 2\alpha\,(a\cos\omega_0 t)\left(\frac{\alpha a^2}{2}\frac{1}{\omega_0^2} - \frac{\alpha a^2}{2}\frac{1}{3\omega_0^2}\cos 2\omega_0 t\right) + \cdots. \qquad (16.2.11)
In making this step, we have been justified in dropping some terms because they lead only to terms with higher powers of a than the terms being kept. To apply the Lindstedt trick to this expression, we need only isolate the term on the right-hand side that varies like cos ω₀t and set its coefficient to zero. In a later section, a formula will be given that performs this extraction more neatly, but here it is simple enough to do it using easy trigonometric formulas. Completing this work, and setting the coefficient of the secular term to zero, we correct Eq. (16.2.5) to

\omega \approx \omega_0 - \left(\frac{3\beta}{8\omega_0} + \frac{5\alpha^2}{12\omega_0^3}\right)a^2. \qquad (16.2.12)

One sees that "in a second approximation" (contribution proportional to a²), the quadratic force term gives a term of the same order that the cubic force term gave in the first approximation. Here "the same order" means the same dependence on a. Of course, one or the other of α²/ω₀² and β may dominate in a particular case. Already at this stage, the solution is perhaps adequate for most purposes. One has determined the frequency shift at amplitude a and, in Eq. (16.2.7), has obtained the leading "overtone" amplitude of the motion at the "second harmonic frequency" 2ω, as well as the "DC offset" at zero frequency. Since the third harmonic amplitude is proportional to a³, it is likely to be negligible. But if it is not, the final term in Eq. (16.2.8) needs to be corrected. This is left as an exercise.

Problem 16.2.1: Complete the second iteration step begun in the text in order to calculate to order a³ the third harmonic response (at frequency 3ω) for the system described by Eq. (16.2.1).
Let us consider the degree to which Eq. (16.2.7) (with its last term dropped or corrected), or any similar "solution" obtained in the form of a truncated Fourier series, "solves" the problem. By construction such a solution is perfectly periodic. But as the amplitude a increases, the convergence of the Fourier series, which has only been hoped for, not proved, becomes worse. From our knowledge of motion in a one-dimensional potential, we know that the true behavior at large a depends on the values of α and β. If β = 0, the potential energy becomes negative either on the left or on the right, depending on the sign of α. In this case, our periodic solution will eventually become flagrantly wrong. On the other hand, if α = 0 and β < 0 (in which case we have what is known as "Duffing's equation"), the restoring force becomes arbitrarily large both on the left and on the right, and the motion remains perfectly periodic. Even in this case, direct calculation of the total energy would show that energy is only approximately conserved according to our solution. If we declare that periodicity is the essential "symplectic" feature in this case, then we might say that our solution satisfies symplecticity (by construction) but not energy conservation. This example illustrates the difficulty in characterizing the strong and weak aspects of any particular method of approximation. Most methods of solution that derive power series solutions one term at a time do not converge. Like most valid statements in this area, there is a theorem by Poincaré
THE METHOD OF KRYLOV AND BOGOLIUBOV
495
to this effect. (However, there is a method due to Kolmogorov, called "superconvergent perturbation theory," that can yield convergent series. It is described briefly in Section 16.5.) It is not so much mathematical ineptitude that causes these procedures to not yield faithful solutions as it is the nature of the systems: most systems exhibit chaotic motion when the amplitude is great enough to cause the Fourier series convergence to deteriorate. (This last is a phenomenological observation, not a mathematical theorem.) In spite of all these reservations, solutions like Eq. (16.2.7) can describe the essential behavior of anharmonic systems.
16.3. THE METHOD OF KRYLOV AND BOGOLIUBOV

The method of Krylov and Bogoliubov (to be abbreviated here as "the K-B method") is probably the closest thing there is to a universal method for analyzing oscillatory systems, be they single- or multidimensional, harmonic or anharmonic, free or driven. The book by Bogoliubov and Mitropolsky [1] is perhaps the best (and not an excessively difficult) reference, but unfortunately it is not universally available. The method starts with an exact change of variables resembling that of Section 14.3.3 and then continues by combining the variation-of-constants, averaging, and Lindstedt methods described in the two previous sections of this chapter. Perhaps its greatest weakness (at least as the method is described here) is that it is not explicitly Hamiltonian. It is, however, based on action-angle variables, or rather on amplitude and phase variables that are very much like action-angle variables. It cannot be said that this method is particularly illustrative of the geometric ideas that this text has chosen to emphasize. But the method does lend itself to qualitative description that is well motivated and, in any case, every mechanics course should include study of this method. Furthermore, it will be appropriate in a later section to compare a symplectic perturbation technique with the K-B method. There is little agreement as to where credit is due. Even Bogoliubov and Mitropolsky credit Van der Pol for the method that is now commonly ascribed to Krylov and Bogoliubov. What is certain is that this school of Russians validated, expanded the range of applicability of, and otherwise refined the procedures. We will derive this method only in one dimension, but will apply it to a multidimensional example in a later section. Since the method is so well motivated and "physical," this extension is not particularly suspect.
16.3.1. First Approximation

We continue to analyze oscillatory systems and assume that the motion is simple harmonic for sufficiently small amplitudes, so the equation of motion has the form

\frac{d^2x}{dt^2} + \omega_0^2 x = \epsilon f(x, dx/dt), \qquad (16.3.1)
where f(x, dx/dt) is an arbitrary perturbing function of position and velocity, and ε is a small parameter. The unperturbed motion can be expressed as
x = a\cos\Phi, \qquad \dot{x} = -a\omega_0\sin\Phi, \qquad\text{where}\quad \Phi = \omega_0 t + \phi, \qquad (16.3.2)
where φ is a constant, and perturbed motion will later be expressed in the same form. These equations can be regarded as a transformation x, ẋ → a, Φ for which the inverse transformation is given by

a = \sqrt{x^2 + \dot{x}^2/\omega_0^2}, \qquad \tan\Phi = -\frac{\dot{x}}{\omega_0 x}. \qquad (16.3.3)

The variables a and φ are not essentially different from action and angle variables, but it will not be assumed that a is an adiabatic invariant. Since this will be a "variation of constants" method, the "constants" being a and φ, the motion in configuration and phase space will be as illustrated in Fig. 16.3.1. Since the parameter ω₀ will remain fixed, it is not wrong to think of φ as an angle measured in a phase space that is rotating at constant angular velocity ω₀. Viewed in such a frame, the system point moves slowly, both in radial position and in angle. For brevity, we will continue to use both symbols, φ and Φ. It is important to remember that they are equivalent, always satisfying Φ = ω₀t + φ. But φ will be used to express the argument of "slowly varying" functions, while Φ will be the argument of "rapidly varying" functions. Eq. (16.3.1) can be transformed into two first-order equations for a and Φ. Differentiating the first of Eqs. (16.3.3) and resubstituting from Eq. (16.3.1) yields
\dot{a} = -\frac{\epsilon}{\omega_0}\sin\Phi\; f(a\cos\Phi,\, -a\omega_0\sin\Phi). \qquad (16.3.4)
FIGURE 16.3.1. (a) In the K-B approximation the actual motion is fit by a cosine function modulated by amplitude a(t). (b) The angle in (normalized) phase space advances as ω₀t + φ(t), where ω₀ is constant and φ(t) varies slowly.
The arguments of the function f have also been reexpressed in terms of a and Φ. Since this factor will appear so frequently, we will abbreviate it as F(a, Φ) = f(a cos Φ, −aω₀ sin Φ). An expression like (16.3.4) can also be found for Φ̇. Together we have²

\dot{a} = -\frac{\epsilon}{\omega_0}\sin\Phi\, F(a,\Phi) = \epsilon G(a,\phi),

\dot{\Phi} = \omega_0 - \frac{\epsilon}{a\omega_0}\cos\Phi\, F(a,\Phi) = \omega_0 + \epsilon H(a,\phi). \qquad (16.3.5)
These are exact equations. They are said to be "in standard form." They have much the same character as Eqs. (14.3.34), but here we are dealing with autonomous equations with the perturbation expressed as a direct drive, whereas there we were dealing with a nonautonomous system with the perturbation expressed as a parametric drive. It is nevertheless natural to contemplate approximating the equations by averaging the right-hand sides, for Φ ranging from 0 to 2π. This yields

\dot{a} = \epsilon G_{\rm av}(a), \quad\text{where}\quad G_{\rm av}(a) = -\frac{1}{2\pi\omega_0}\int_0^{2\pi} F(a,\Phi)\sin\Phi\,d\Phi,

\dot{\phi} = \epsilon H_{\rm av}(a), \quad\text{where}\quad H_{\rm av}(a) = -\frac{1}{2\pi a\omega_0}\int_0^{2\pi} F(a,\Phi)\cos\Phi\,d\Phi. \qquad (16.3.6)

These equations constitute "the first K-B approximation." They are ordinary differential equations of especially simple form. Since the first depends only on a, it can be solved by quadrature. Then the second can be solved by integration.
16.3.2. Examples

Conservative Forces: If the force is derivable from a potential, then f(x, dx/dt) is, in fact, independent of dx/dt. In this case we have

G_{\rm av}(a) = -\frac{1}{2\pi\omega_0}\int_{-\pi}^{\pi} f(a\cos\Phi)\sin\Phi\,d\Phi = 0, \qquad (16.3.7)
because the integrand is an odd function of Φ. The first of Eqs. (16.3.6) then implies that a is constant, a gratifying result. The second of Eqs. (16.3.6) then yields

\omega_1(a) = \dot{\Phi} = \omega_0 - \frac{\epsilon}{2\pi a\omega_0}\int_0^{2\pi} f(a\cos\Phi)\cos\Phi\,d\Phi. \qquad (16.3.8)
²The functions G(a, φ) and H(a, φ) are introduced primarily for convenience in a later section of the text. As Hamiltonian methods are not being employed, there should be no danger that H(a, φ) will be interpreted as a Hamiltonian.
Here the frequency at amplitude a has been expressed by ω₁(a), where the subscript indicates "first K-B approximation." For a gravity pendulum with natural frequency ω₀ = √(g/ℓ), the equation of motion for the angle x is, retaining the leading anharmonic term of ω₀² sin x,

\ddot{x} + \omega_0^2 x = \epsilon\,\frac{\omega_0^2 x^3}{6}. \qquad (16.3.9)

We have F(a, Φ) = ω₀²a³ cos³Φ/6, and the equations in standard form are

\dot{a} = -\frac{\epsilon\omega_0 a^3}{6}\sin\Phi\cos^3\Phi, \qquad \dot{\Phi} = \omega_0 - \frac{\epsilon\omega_0 a^2}{6}\cos^4\Phi. \qquad (16.3.10)

Averaging the second equation and setting ε = 1,

\omega_1(a) = \omega_0\left(1 - \frac{a^2}{16}\right). \qquad (16.3.11)

This dependence on amplitude makes it important that pendulum clocks run at constant amplitude if they are to keep accurate time. The importance of this consideration and the quality of the approximation can be judged from the following table:

    a (radians)   a (degrees)   ω₁(a)/ω₀   ω_exact(a)/ω₀
    0.0             0.0          1.0         1.0
    1.0            57.3          0.938       0.938
    2.0           114.6          0.75        0.765
    3.0           171.9          0.438       0.5023
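The averaging integral in Eq. (16.3.8) can also be evaluated numerically and compared with the closed form (16.3.11). The sketch below (with the illustrative choices ε = 1, ω₀ = 1, and a simple midpoint rule over one period) shows the two agreeing essentially to machine precision:

```python
import math

# First K-B approximation (16.3.8) applied to the pendulum, whose
# perturbing force is f(x) = w0^2 x^3 / 6 (from sin x ~ x - x^3/6).
# The averaged phase equation should reproduce w1(a) = w0 (1 - a^2/16),
# Eq. (16.3.11).
def omega_first_kb(a, w0=1.0, n=1000):
    """w1(a) from Eq. (16.3.8), with the integral done numerically."""
    f = lambda x: w0**2 * x**3 / 6
    total = 0.0
    for k in range(n):                      # midpoint rule on [0, 2*pi]
        phi = 2*math.pi*(k + 0.5)/n
        total += f(a*math.cos(phi)) * math.cos(phi)
    integral = total * 2*math.pi/n
    return w0 - integral/(2*math.pi*a*w0)   # eps = 1

for a in (1.0, 2.0, 3.0):
    print(a, omega_first_kb(a), 1 - a*a/16)
```

The midpoint rule is exact here because the integrand is a trigonometric polynomial of low degree.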
Van der Pol Oscillator: Consider the equation

L\frac{d^2Q}{dt^2} + \left(-|R| + cQ^2\right)\frac{dQ}{dt} + \frac{Q}{C} = 0. \qquad (16.3.12)
The parameters in this equation (for charge Q on capacitor C) have obviously been chosen to suggest an electrical LRC circuit. For small Q, we can neglect the term cQ² dQ/dt. But the term −|R| has the wrong sign to represent the effect of a resistor in the circuit. Stated differently, the resistance in the circuit is negative. Normally, the effect of a resistor is to damp the oscillations, which would otherwise be simple harmonic. With negative resistance, one expects (and observes) growth. In fact, the circuit should spring into oscillation, even starting from Q = 0 (because of inevitable tiny noise terms not shown in the equation), followed by steady growth. But with growth, it will no longer be valid to neglect the cQ² dQ/dt term. This term has the "correct" sign for a resistance, and at sufficiently large amplitude it "wins." We anticipate some compromise, therefore, between the growth due to one term and the damping due to the other.
This system, known as the "Van der Pol" oscillator, is readily analyzed (which was done by Van der Pol, early in the twentieth century) using the K-B method. Eliminating superfluous constants, the perturbing term can be written as

\epsilon f(x, \dot{x}) = \epsilon\,(1 - x^2)\,\dot{x}, \qquad (16.3.13)

and the equations in standard form are

\dot{a} = \epsilon a\sin^2\Phi\,(1 - a^2\cos^2\Phi), \qquad \dot{\phi} = \epsilon\sin\Phi\cos\Phi\,(1 - a^2\cos^2\Phi). \qquad (16.3.14)

After averaging, these become

\dot{a} = \epsilon\,\frac{a}{2}\left(1 - \frac{a^2}{4}\right), \qquad \dot{\phi} = 0. \qquad (16.3.15)
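That the full (unaveraged) Van der Pol equation really settles to the a = 2 limit cycle predicted by Eq. (16.3.15) can be confirmed by brute force. The following sketch (pure Python RK4; ε = 0.1, the starting amplitude, and the integration times are all illustrative choices) integrates ẍ + x = ε(1 − x²)ẋ and records the peak amplitude over the final cycles:

```python
import math

# The Van der Pol oscillator x'' + x = eps (1 - x^2) x', integrated
# directly, should settle to the K-B limit cycle at amplitude a = 2
# regardless of its (small) starting amplitude.
def vdp_final_amplitude(eps=0.1, x0=0.1, dt=1e-2, t_end=300.0):
    def acc(x, v):
        return -x + eps*(1 - x*x)*v
    x, v, t = x0, 0.0, 0.0
    peak = 0.0
    while t < t_end:
        # one RK4 step for the state (x, v)
        k1x, k1v = v, acc(x, v)
        k2x, k2v = v + 0.5*dt*k1v, acc(x + 0.5*dt*k1x, v + 0.5*dt*k1v)
        k3x, k3v = v + 0.5*dt*k2v, acc(x + 0.5*dt*k2x, v + 0.5*dt*k2v)
        k4x, k4v = v + dt*k3v, acc(x + dt*k3x, v + dt*k3v)
        x += dt*(k1x + 2*k2x + 2*k3x + k4x)/6
        v += dt*(k1v + 2*k2v + 2*k3v + k4v)/6
        t += dt
        if t > t_end - 20.0:          # record peaks over the last cycles
            peak = max(peak, abs(x))
    return peak

print(vdp_final_amplitude())
```

The small residual discrepancy from a = 2 reflects the O(ε²) corrections neglected by the first K-B approximation.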
Problem 16.3.1: By solving Eqs. (16.3.15), show that for a Van der Pol oscillator starting with amplitude a₀ in the range 0 < a₀ < 2, the subsequent motion is given by

a(t) = \frac{a_0\,e^{\epsilon t/2}}{\sqrt{1 + (a_0^2/4)\,(e^{\epsilon t} - 1)}}, \qquad (16.3.16)

which inexorably settles to pure harmonic oscillation at a = 2 after a period of time lengthy compared to 1/ε. For 2 < a₀ < ∞, the solution settles to the same amplitude with the same time constant.

According to the previous problem, the motion settles to a "limit cycle" at a = 2 independent of its starting amplitude. The following graph of "growth rate" da/dt makes it clear that this result could have been expected. Only at a = 0 and at a = 2 is da/dt = 0, and only at a = 2 is the sign "restoring." For amplitudes in the vicinity of a = 2, it is sound to approximate da/dt by a straight line. Then one obtains
which inexorably settles to pure harmonic oscillation at u = 2 after a period of time lengthy compared to l/e. For 2 -= m, the solution settles to the same amplitude with the same time constant. According to the previous problem, the motion settles to a “limit cycle” at a = 2 independent of its starting amplitude. The following graph of “growth rate” d a / d t makes it clear that this result could have been expected. Only at a = 0 and at a = 2 is d a / d t = 0, and only at a = 2 is the sign “restoring.” For amplitudes in the vicinity of a = 2, it is sound to approximate d a / d t by a straight line. Then one obtains da dt
- = ~ ( -2a ) ,
and a = 2 - (2 -ao)e-“.
(16.3.17)
The growth rate is plotted in Fig. 16.3.2, which also shows the linear approximation.

16.3.3. Equivalent Linearization

We have seen that the first K-B approximation accounts fairly accurately for some of the most important nonlinear aspects of oscillators, such as amplitude dependence
FIGURE 16.3.2. Graph of (1/ε) da/dt for the Van der Pol oscillator in lowest K-B approximation, and its linear approximation near a = 2.
of frequency and limit cycles. Since autonomous linear equations do not exhibit oscillation, it can be said that autonomous oscillators are inherently nonlinear. Unfortunately, this takes away from us our best tool, the ability to solve linear equations. For multidimensional systems this problem is especially acute. In this section we study the method of "equivalent linearization," which is based on the K-B approximation (or similar methods) and imports much of the effect of nonlinearity into a description using linear equations. Nowadays, such approaches find their most frequent application in the design of electrical circuits. Such circuits can have many independent variables, and it is very attractive to be able to apply linear circuit theory even when some of the branches of the circuit are somewhat nonlinear. Consider again Eq. (16.3.1), which we rewrite with slightly modified coefficients intended to suggest "mass and spring":

m\frac{d^2x}{dt^2} + kx = \epsilon f(x, dx/dt). \qquad (16.3.18)

The small-amplitude frequency is ω₀ = √(k/m), and the nonlinear forces are contained in εf(x, dx/dt). We define an "equivalent system" to be one for which the equation of motion is

m\frac{d^2x}{dt^2} + \lambda_e(a)\frac{dx}{dt} + k_e(a)\,x = 0, \qquad\text{with}\qquad \omega_e^2(a) = \frac{k_e(a)}{m}. \qquad (16.3.19)
It is not quite accurate to say that this is a "linear equation," as the parameters depend on the amplitude a. But if a is approximately constant (and known), this may be a tolerable defect. By applying the K-B approximation, we find that the two equations "match" as regards their formulas for ȧ and φ̇ if we define the "equivalent damping coefficient" λe(a) and the "equivalent spring constant" ke(a) by copying from Eqs. (16.3.6):

\lambda_e(a) = \frac{\epsilon}{\pi a\omega_0}\int_0^{2\pi} F(a,\Phi)\sin\Phi\,d\Phi, \qquad k_e(a) = k - \frac{\epsilon}{\pi a}\int_0^{2\pi} F(a,\Phi)\cos\Phi\,d\Phi = k + k_1(a). \qquad (16.3.20)

These formulas are equivalent to making in Eq. (16.3.18) the replacement

\epsilon f(x, dx/dt) \;\rightarrow\; -k_1(a)\,x - \lambda_e(a)\frac{dx}{dt}, \qquad (16.3.21)

and the averaged equations are

\dot{a} = -\frac{\lambda_e(a)}{2m}\,a, \qquad \dot{\Phi} = \omega_e(a). \qquad (16.3.22)

The fractional reduction in amplitude after one period, (λe/ωe)(π/m), is sometimes known as the "damping decrement."
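For the Van der Pol force f(x, ẋ) = (1 − x²)ẋ, the integrals in Eq. (16.3.20) can be done numerically: the equivalent damping should come out as λe(a) = −ε(1 − a²/4), which is negative (growth) below the limit cycle and vanishes at a = 2, while the equivalent spring constant is unchanged. A minimal sketch, with the illustrative choices m = k = 1 (so ω₀ = 1) and ε = 0.1:

```python
import math

# Equivalent linearization (16.3.20) for the Van der Pol force
# f(x, x') = (1 - x^2) x'.  Expected: lambda_e(a) = -eps (1 - a^2/4)
# and k_e(a) = k (the cosine integral vanishes by symmetry).
def equivalent_params(a, eps=0.1, w0=1.0, k=1.0, n=2000):
    F = lambda phi: (1 - (a*math.cos(phi))**2) * (-a*w0*math.sin(phi))
    s = c = 0.0
    for j in range(n):                       # midpoint rule on [0, 2*pi]
        phi = 2*math.pi*(j + 0.5)/n
        s += F(phi)*math.sin(phi)
        c += F(phi)*math.cos(phi)
    s *= 2*math.pi/n
    c *= 2*math.pi/n
    lam_e = eps/(math.pi*a*w0) * s
    k_e = k - eps/(math.pi*a) * c
    return lam_e, k_e

lam, ke = equivalent_params(2.0)
print(lam, ke)   # damping coefficient vanishes on the limit cycle
```

Inserted into Eqs. (16.3.22), this reproduces the averaged amplitude equation (16.3.15) found earlier.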
16.3.4. Power Balance, Harmonic Balance

If we wish, we can interpret Eq. (16.3.18) as describing the interplay of an "agent" providing force εf(x, dx/dt) and acting on a linear system described by the terms on the left-hand side of the equation. The work done by the agent during one period of duration T = 2π/ω is given by

\int_0^T \epsilon f(x, dx/dt)\,\frac{dx}{dt}\,dt = -\epsilon a\int_0^{2\pi} F(a,\Phi)\sin\Phi\,d\Phi. \qquad (16.3.23)

Our "equivalent agent" provides force −k₁(a)x − λe(a) dx/dt, and hence does an amount of work per cycle given by

\int_0^T \left(-k_1(a)\,x - \lambda_e(a)\frac{dx}{dt}\right)\frac{dx}{dt}\,dt. \qquad (16.3.24)

The first term here gives zero, and the second gives −πa²ωλe(a). Equating the results of these two calculations, we recover the first of Eqs. (16.3.20). The expression εf(x, dx/dt)(dx/dt) is the instantaneous power dissipated in lossy elements, and we have matched the average power dissipated in the equivalent agent to that of the actual agent. To obtain ke(a) by a similar argument, it is necessary to define average
reactive power by εf(x, dx/dt)·x/T. The equivalent parameters can then be said to have been determined by the "principle of power balance." Another (and equivalent) approach to establishing an "equivalent" linear model is to express the function F(a, Φ) as a Fourier series:

F(a,\Phi) = \frac{g_0(a)}{2} + \sum_{n=1}^{\infty} g_n(a)\cos n\Phi + \sum_{n=1}^{\infty} h_n(a)\sin n\Phi. \qquad (16.3.25)

The coefficients in this expansion are given by

g_n = \frac{1}{\pi}\int_0^{2\pi} F(a,\Phi)\cos n\Phi\,d\Phi, \qquad h_n = \frac{1}{\pi}\int_0^{2\pi} F(a,\Phi)\sin n\Phi\,d\Phi. \qquad (16.3.26)

The "in-phase," "fundamental component" of force is therefore given by

\frac{\epsilon}{\pi}\cos\Phi\int_0^{2\pi} F(a,\Phi')\cos\Phi'\,d\Phi' = -k_1(a)\,a\cos\Phi, \qquad (16.3.27)

where the defining equation (16.3.20) for k₁(a) has been employed. This is equal to the in-phase portion of the "equivalent" force. The out-of-phase term can be similarly confirmed. This is known as the "principle of harmonic balance."
16.3.5. Qualitative Analysis of Autonomous Oscillators

From the analysis of the Van der Pol oscillator, and especially from Fig. 16.3.2, it is clear that much can be inferred about the qualitative behavior of an oscillator from the equation

\frac{da}{dt} = \epsilon G(a). \qquad (16.3.28)

The function G(a) may be approximated by G_av, obtained using the first or higher K-B approximation, or even phenomenologically. Points a_e at which G(a_e) = 0 are especially important because da/dt = 0 there, but it is not known a priori whether this "equilibrium" is stable or unstable. The linearized dependence on the deviation δa from equilibrium is given by

G(a_e + \delta a) = G'(a_e)\,\delta a. \qquad (16.3.29)

As in Eq. (16.3.17), it is clear that an initial deviation δa₀ evolves according to

\delta a = \delta a_0\,e^{\epsilon G'(a_e)\,t}. \qquad (16.3.30)

Stability is therefore governed by the sign of G'(a_e). Some possible oscillator profiles are illustrated in Fig. 16.3.3. In every case, G(a) becomes negative for sufficiently large a, because otherwise infinite-amplitude oscillation would be possible. Points where the curve crosses the horizontal axis are possible equilibrium points, but only those with negative slope are stable, and this is
FIGURE 16.3.3. Growth-rate profiles G(a) for various autonomous oscillators. Arrows indicate directions of progress toward stable points or stable limit cycles.
indicated by arrows that indicate the direction of system evolution. Stability at the origin is a bit special in that it is influenced by the sign of G(0). In the case shown in Fig. 16.3.3d, the system springs into oscillation spontaneously and evolves to the first zero crossing. In the case shown in Fig. 16.3.3c, stable oscillation is possible at the second zero crossing, but the system cannot proceed there spontaneously from the origin because the growth rate at the origin is negative. In Fig. 16.3.3a, the origin is stable and, in Fig. 16.3.3b, the system, like the Van der Pol oscillator, moves spontaneously to the first zero crossing (after the origin). The sorts of behavior that are possible can be discussed analytically in connection with a slightly generalized version of the Van der Pol oscillator (see Fig. 16.3.4). Let its equation of motion be

\frac{d^2x}{dt^2} + \left(\lambda_1 + \lambda_3 x^2 + \lambda_5 x^4\right)\frac{dx}{dt} + \omega_0^2 x = 0. \qquad (16.3.31)

The coefficient of dx/dt could also have even powers, but it is only the odd powers that contribute to da/dt in the first K-B approximation. The first of Eqs. (16.3.6) yields

G_{\rm av}(a) = -\frac{a}{2}\left(\lambda_1 + \frac{\lambda_3 a^2}{4} + \frac{\lambda_5 a^4}{8}\right). \qquad (16.3.32)
FIGURE 16.3.4. A small change in a parameter can move the system curve from the lower, nonoscillatory case to the upper curve, which indicates the possibility of stable oscillation at amplitude a₁. "Bifurcation" between these states occurs when a₁ = a₂.
Let us assume that λ₁ > 0, so that self-excitation is absent. Other than the root at the origin, the zeros of G_av(a) are given by

a_e^2 = \frac{-\lambda_3 \pm \sqrt{\lambda_3^2 - 8\lambda_1\lambda_5}}{\lambda_5}. \qquad (16.3.33)

Points at which a qualitative feature of the motion undergoes discontinuous change are known as points of "bifurcation." Assuming the first term is positive, the condition establishing a bifurcation point as any one of the parameters is varied is the vanishing of the square root term:

\lambda_3^2 = 8\lambda_1\lambda_5. \qquad (16.3.34)
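The equilibria (16.3.33) and their stability are easy to examine numerically. The sketch below uses parameter values chosen purely for illustration so that λ₃² > 8λ₁λ₅ holds; it locates the two nonzero roots of G_av(a) from Eq. (16.3.32) and classifies them by the sign of G′(a_e), as in Eq. (16.3.30):

```python
import math

# Equilibria and stability of the averaged growth rate (16.3.32),
# G_av(a) = -(a/2)(l1 + l3 a^2/4 + l5 a^4/8), for illustrative values
# l1 > 0, l3 < 0, l5 > 0 with l3^2 > 8 l1 l5, so the two nonzero
# equilibria of (16.3.33) exist.
l1, l3, l5 = 0.1, -1.0, 1.0

def G_av(a):
    return -(a/2)*(l1 + l3*a*a/4 + l5*a**4/8)

def G_prime(a, h=1e-6):
    # central-difference derivative, adequate for a sign check
    return (G_av(a + h) - G_av(a - h))/(2*h)

disc = l3*l3 - 8*l1*l5                    # must be > 0 for two roots
roots = sorted(math.sqrt((-l3 + s*math.sqrt(disc))/l5) for s in (-1, 1))
a_unstable, a_stable = roots
print(a_unstable, a_stable)
print(G_prime(a_unstable), G_prime(a_stable))
```

The inner root has G′ > 0 (unstable) and the outer root G′ < 0 (stable), which is the structure underlying the hard turn-on and hysteresis described in the text.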
Rather than having multiple parameters, the qualitative behavior of the oscillator can be more clearly understood if one dominant "control parameter" or "stabilizing parameter," call it p, is singled out. Suppose G(a) is given by

G(a) = -\frac{a}{p} + G_r(a), \qquad (16.3.35)

where the relative strength of the leading term is regarded as externally controllable via the parameter p. Small p corresponds to very negative G(a) and no possibility of oscillation. In Fig. 16.3.5, the separate terms of Eq. (16.3.35) are plotted (with the sign of the control term reversed). Different control parameter values are expressed by different straight lines from the origin and, because of the negative sign in Eq. (16.3.35), stability is governed by whether and where the straight line intersects the curve of G_r(a). If the curve of G_r(a) at the origin is concave downward, as in Fig. 16.3.5a, then, as p is increased, when the initial slope of the control line matches that of G_r(a), the oscillator is self-excited and settles to the first intersection point. This is known
FIGURE 16.3.5. Characteristics of an autonomous oscillator exhibiting hysteretic turn-on and extinction. (a) Soft turn-on. (b) Hard turn-on. (c) Hysteretic dependence on control parameter.
as "soft turn-on." But if the curve of G_r(a) at the origin is concave upward, as in Fig. 16.3.5b, as p is increased from very small values, a point is reached at which self-sustaining oscillation would be possible but does not in fact occur because the origin remains stable. This point is indicated by p₁ in Fig. 16.3.5b. As p is increased further, a point p₂ is reached where the origin is unstable and the system undergoes "hard turn-on" and continues to oscillate at the large amplitude a₂. From this point, if p is increased the amplitude increases. Furthermore, if p is reduced only modestly, the amplitude will follow down below a₂ without extinguishing. But when p is dropped below p₁, the oscillator turns off suddenly. The overall "hysteresis cycle" is illustrated in Fig. 16.3.5c. It is beyond the capability of this model to describe the turn-on and turn-off in greater detail, but the gross qualitative behavior is given.
Problem 16.3.2: A grandfather clock keeps fairly regular time because it oscillates at constant amplitude but, as lossless as its mechanism can be, it still has to be kept running by external intervention, and this can affect its rate. For high precision, its amplitude has to be kept constant. A "ratchet and pawl" or "escapement" mechanism, by which gravitational energy is imparted to the pendulum to make up for dissipation, is illustrated schematically in Fig. 16.3.6. This mechanism administers a small impulse I, once per cycle, at an approximately optimal phase in the cycle. An equation of motion for a system with these properties is

m\frac{d^2x}{dt^2} + \lambda\frac{dx}{dt} - \frac{I}{2}\left(\frac{dx}{dt} + \left|\frac{dx}{dt}\right|\right)\delta(x - x_0) + kx = 0, \qquad (16.3.36)
FIGURE 16.3.6. Grandfather clock with “escapement mechanism” exhibited schematically.
where the δ-function controls the phase of the impulse and the other factor in the term proportional to I assures that the impulse occurs on only one or the other of the "back" and "forth" trips. For "too small" amplitude, the K-B approximation yields da/dt = −(λ/2m)a, and the clock stops. Find the amplitude a₀ such that the clock continues to run if a > x₀. Find the condition on x₀, I, and λ that must be satisfied for the clock to keep running if the pendulum is started with initial amplitude exceeding x₀. In the same approximation, find the dependence on a of the frequency of oscillation.

*16.3.6. Higher Approximation³

Proceeding to an improved approximation in solving Eq. (16.3.1) may be necessary, especially if higher harmonics are to be accurately evaluated. Since this discussion is somewhat complicated, to make this section self-contained, we will rewrite some equations rather than referring to their earlier versions. The solution is sought in the form
x(t) = a\cos\Phi, \qquad (16.3.37)

and the phase is separated into "fast" and "slow" parts:

\Phi = \omega_0 t + \phi. \qquad (16.3.38)
The equations satisfied by a and φ are
(16.3.39) where G(a,4) and H ( a , 4) are known functions, appearing in the “equations in standard form,” Eq. (16.3.5). If the system were nonautonomous, these functions would also depend explicitly upon t . The following development would still proceed largely unchanged, but we will simplify by restricting discussion to autonomous systems. To solve these equations, we anticipate transforming the variables (a, @) + (a*,@*) according to a = a*
+ El+*,
a*),
@ = @*
+ E U ( U * , @*).
(16.3.40)
For the time being, the functions u and v are arbitrary. Later they will be chosen to simplify the equations. The "small parameter" ε will be used to keep track of the "order" of terms. Corresponding to Φ* we define also φ*, related as in Eq. (16.3.38):

Φ* = ω₀t + φ*,  and hence  Φ̇* = ω₀ + φ̇*.   (16.3.41)
³This section follows closely Stratonovich [2]. However, the concentration on random processes (though lucidly explained in this remarkable book) would probably be disconcerting to someone interested only in mechanics. According to Stratonovich, the procedure for proceeding to higher approximation is due to Bogoliubov.
PERTURBATION THEORY
(From here on, time derivatives will be indicated with dots, as here.) The equations of motion will be assumed to have the same form as in Eq. (16.3.39),

ȧ* = εG*(a*, φ*),   φ̇* = εH*(a*, φ*),   (16.3.42)
so the new functions G* and H* will also have to be found. Since Eqs. (16.3.39) are to be satisfied by values of a and Φ given by Eqs. (16.3.40), we must have

ȧ = εG(a* + εu(a*, Φ*), φ* + εv(a*, Φ*)),
φ̇ = εH(a* + εu(a*, Φ*), φ* + εv(a*, Φ*)).   (16.3.43)
These are the same as Eqs. (16.3.39) except that the arguments are expressed in terms of the new variables. They are exact. From here on, it will be unnecessary to exhibit arguments explicitly since the arguments of G* and H* will always be (a*, φ*) and the arguments of u and v will always be (a*, Φ*). (Since Φ* and φ* are equivalent variables, the distinction in arguments here is essentially cosmetic; the rationale behind the distinction should gradually become clear.) There is an alternate way of determining the quantities appearing on the left-hand side of Eqs. (16.3.43). It is by time-differentiating Eqs. (16.3.40) and using Eqs. (16.3.42):

ȧ = ȧ* + εu̇ = εG* + ε²(∂u/∂a*)G* + ε(∂u/∂Φ*)(ω₀ + εH*),
φ̇ = φ̇* + εv̇ = εH* + ε²(∂v/∂a*)G* + ε(∂v/∂Φ*)(ω₀ + εH*).   (16.3.44)

Equating Eqs. (16.3.43) and (16.3.44), we obtain

G* + ω₀(∂u/∂Φ*) = G(a* + εu, φ* + εv) − ε(∂u/∂a*)G* − ε(∂u/∂Φ*)H*,
H* + ω₀(∂v/∂Φ*) = H(a* + εu, φ* + εv) − ε(∂v/∂a*)G* − ε(∂v/∂Φ*)H*.   (16.3.45)
These are exact functional identities; that is, they are true for arbitrary functions u and v. But terms have been grouped with the intention of eventually exploiting the smallness of ε. This is a "high-frequency approximation" in that terms proportional to ω₀ are not multiplied by ε. u and v will be determined next. We assume that all functions are expanded in powers of ε:
G* = G₁* + εG₂* + ⋯,   H* = H₁* + εH₂* + ⋯,
u = u₁ + εu₂ + ⋯,   v = v₁ + εv₂ + ⋯.   (16.3.46)
Because all of the functions that have been introduced have to be periodic in Φ*, one is to imagine that all have also been expanded into Fourier series. Then, averaging over one period amounts to extracting the term in the Fourier series that is independent of Φ*. The guidance in determining the functions uᵢ and vᵢ is that they are to contain all the terms that depend on Φ* and only those terms. According to
Eqs. (16.3.40), the quantities a* and φ* will then contain no oscillatory factors. Then, because of Eq. (16.3.42), the terms Gᵢ* and Hᵢ* will also be independent of Φ*. That this separation is possible will be demonstrated by construction. The formalism has been constructed so that, at each stage, Φ*-dependent terms enter with an extra power of ε because u and v entered with a multiplicative factor ε. This is also legitimized constructively, but the overall convergence of the process is only conjectural. Since all functions are Fourier series, it is too complicated to make these procedures completely explicit, but all functions can be determined sequentially using Eq. (16.3.45). Since these equations contain only derivatives of u and v, only those derivatives will be determined directly. But the antiderivatives of terms in a Fourier series are easy: the antiderivatives of sin rΦ and cos rΦ are −cos rΦ/r and sin rΦ/r. Since all coefficients will be functions of a*, it will be necessary to evaluate the antiderivatives of the terms ∂uᵢ/∂Φ* and ∂vᵢ/∂Φ* to obtain the uᵢ and vᵢ functions themselves. All this is fairly hard to describe but fairly easy to accomplish. It is easiest to understand when one is guided by an example. Substituting Eq. (16.3.46) into Eq. (16.3.45) and setting ε = 0, we obtain the first approximation:
G₁* + ω₀(∂u₁/∂Φ*) = G(a*, φ*),   H₁* + ω₀(∂v₁/∂Φ*) = H(a*, φ*).   (16.3.47)

The functions on the right-hand side are unambiguous, as G and H are the functions we started with, only now the "old" arguments have been replaced by "new" arguments. We separate these equations into Φ*-independent terms,

G₁* = ⟨G(a*, φ*)⟩,   H₁* = ⟨H(a*, φ*)⟩,   (16.3.48)

(⟨ ⟩ denoting the average over one period of Φ*), and Φ*-dependent terms,

ω₀(∂u₁/∂Φ*) = {G(a*, φ*)}~,   ω₀(∂v₁/∂Φ*) = {H(a*, φ*)}~,   (16.3.49)

where the ad hoc notation { }~ stands for the bracketed quantity after constant terms have been removed. Before continuing, we illustrate using the Van der Pol oscillator as an example. From the equations in standard form, Eq. (16.3.14), after using some trigonometric identities to convert them to Fourier series, we have

G(a, φ) = a/2 − a³/8 − (a/2) cos 2Φ + (a³/8) cos 4Φ,
H(a, φ) = (1/2 − a²/4) sin 2Φ − (a²/8) sin 4Φ.   (16.3.50)
Applying Eq. (16.3.48), we obtain

G₁* = a*/2 − a*³/8,   H₁* = 0,   (16.3.51)

which recovers the result Eq. (16.3.15) obtained in the first K-B approximation. Applying Eq. (16.3.49) and integrating, we also obtain

u₁ = −(a/(4ω₀)) sin 2Φ + (a³/(32ω₀)) sin 4Φ,
v₁ = −(1/(4ω₀))(1 − a²/2) cos 2Φ + (a²/(32ω₀)) cos 4Φ.   (16.3.52)
All that remains in this order of approximation is to substitute these into Eqs. (16.3.40) and from there into Eq. (16.3.37) to obtain the harmonic content of the self-sustaining oscillations. We will show just one more step, namely the equations corresponding to Eq. (16.3.47) in the second approximation:

G₂* + ω₀(∂u₂/∂Φ*) = u₁(∂G/∂a*) + v₁(∂G/∂φ*) − (∂u₁/∂a*)G₁* − (∂u₁/∂Φ*)H₁*,
H₂* + ω₀(∂v₂/∂Φ*) = u₁(∂H/∂a*) + v₁(∂H/∂φ*) − (∂v₁/∂a*)G₁* − (∂v₁/∂Φ*)H₁*.   (16.3.53)
All functions required are available from the previous step, and the separation is performed in the same way.
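As a cross-check of the first approximation, one can integrate the Van der Pol equation numerically and compare the peak amplitudes against the envelope obtained by solving ȧ = ε(a/2 − a³/8) in closed form. The following is a minimal sketch (the parameter values ε = 0.05, a₀ = 0.5 are assumptions for illustration, not taken from the text):

```python
import numpy as np

# Van der Pol oscillator x'' - eps*(1 - x^2)*x' + x = 0 (omega_0 = 1),
# integrated with a hand-rolled RK4 stepper.  The first K-B approximation
# da/dt = eps*(a/2 - a^3/8) integrates in closed form to
#   a(t)^2 = 4 / (1 + (4/a0^2 - 1)*exp(-eps*t)).
eps, a0, dt, T = 0.05, 0.5, 0.01, 400.0

def f(s):
    x, v = s
    return np.array([v, eps*(1.0 - x*x)*v - x])

def kb_envelope(t):
    return 2.0/np.sqrt(1.0 + (4.0/a0**2 - 1.0)*np.exp(-eps*t))

s = np.array([a0, 0.0])          # released from rest at x = a0
peaks, times = [], []
t = 0.0
while t < T:
    k1 = f(s); k2 = f(s + 0.5*dt*k1); k3 = f(s + 0.5*dt*k2); k4 = f(s + dt*k3)
    s_new = s + (dt/6.0)*(k1 + 2.0*k2 + 2.0*k3 + k4)
    if s[1] > 0.0 >= s_new[1]:   # velocity changes sign: a maximum of x
        peaks.append(s_new[0]); times.append(t)
    s = s_new
    t += dt
```

The recorded peaks grow from a₀ toward the limit-cycle amplitude 2, tracking kb_envelope(t) to within O(ε), which is the accuracy claimed for the first approximation.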
Problem 16.3.3: Complete the next iteration step in the Krylov-Bogoliubov analysis of the Van der Pol oscillator. That is to say, complete Eqs. (16.3.53) and perform the separation into constant and varying terms. Show that G₂* = 0 and evaluate H₂*. Write Eqs. (16.3.15) with the newly calculated term included.
Problem 16.3.4: Find the term proportional to a⁴ in the amplitude dependence of the ordinary gravity pendulum. In other words, extend Eq. (16.3.11) to one more term.

*16.4. MULTIDIMENSIONAL, NEAR-SYMPLECTIC PERTURBATION THEORY⁴
For multidimensional systems, the problems that arise and the methods of overcoming them are much the same as have been described up to this point, but an already complicated situation becomes even more so. In this section, the formalism of Hamilton's equations based on unitary geometry will be employed. In addition to exercising some of the geometric methods described previously in the text, this will yield formulas that are quite reminiscent of quantum mechanics. In particular, the main result will be a formula for the eigenfrequency shift caused by a perturbing force, based on "matrix elements" of the perturbing term, themselves calculated using unperturbed eigenfunctions. We wish to analyze Hamiltonian systems starting from Eqs. (15.3.2), but with the coefficient matrix separated into two parts:

A(t) = C + B(t).   (16.4.1)

⁴If the previous few sections have left the reader staggering, this section may deliver a knockout punch.
The part C is to be the "unperturbed part" and is necessarily linear. For the formalism to be developed, its elements could be allowed to be periodic functions of t, but it will simplify the formalism if the elements are constant, so we assume this to be true. In any case, we are assured by Lyapunov's theorem (Section 15.5.2) of the existence of a transformation to coordinates for which C is constant. So there is little loss in generality in accepting C to be a constant matrix. The unperturbed equations are therefore

dx/dt = Cx,   (16.4.2)

where x(t) is the 2n-component phase-space vector and

C = −SH.   (16.4.3)
Solution of this equation (actually 2n equations) has already been studied in detail, starting in Section 15.3.3. The present development will be based on the eigenvectors of C defined in Section 15.3.4, the theory of adjoint systems developed in Section 2.6, and the methods developed in Section 15.5.5. All notation and all simplifying assumptions from these sections will be retained. (The most important assumption is that all eigenvalues are disjoint; ways of altering the system to make this applicable have been discussed previously.) A fundamental set of solutions to Eq. (16.4.2) has been given in Eq. (15.3.59). The perturbed equations are

dx/dt = (C + B(t))x,   where B(t) = ( 0  0 ; −P(t)  −Q(t) ).   (16.4.4)
Note that the symbol P has now been taken over to be the perturbed part of the Hamiltonian, and the possibility of lossy terms has been incorporated by the Q element. Since B(t) is only a perturbation, the requirements placed on it can be less strict than those on C. In particular, it can be time-dependent, but we assume such dependence is periodic with period T. Since the perturbing terms are periodic, it is possible to
Fourier-expand them in complex exponentials:

Pₕ(t) = Σₘ Pₕₘ e^{imμ_T t},   Qₕ(t) = Σₘ Qₕₘ e^{imμ_T t}.   (16.4.5)

Here μ_T = 2π/T is the frequency of an externally imposed, nonautonomous perturbation. The subscripts h correspond to the indices in Eq. (15.3.59). We may as well also adopt T as the "period" of the elements of C, since they can be regarded as periodic for an arbitrary period. Even if the perturbation is nonlinear, it can be expressed in the form of Eq. (16.4.4) if the elements are allowed to depend on x. A kind of "poor man's linearity" could then be obtained using Fourier expansions and iterative methods such as have been described in earlier sections of this chapter. But for simplicity we require B(t) to be linear. By hypothesis the unperturbed system is Hamiltonian, but the perturbed system may or may not be. The method under study can therefore be called "near-symplectic" perturbation theory. If the matrix B(t) is Hamiltonian (i.e., it satisfies Eq. (15.3.16)), then we want to be sure that any claimed solution is also symplectic. If B(t) is not Hamiltonian, then the perturbed system will obviously not be Hamiltonian, but the same method of solution will be applicable, and the degree to which it is non-Hamiltonian (for example in the form of a growth rate or decay rate or threshold of instability) will be of interest. As we have defined the problem, the entire theory of periodic systems, as described in Section 15.5, is valid, both for the unperturbed and the perturbed systems. This attaches special importance to the characteristic multipliers and characteristic exponents defined there. Since C is constant, the discussion at the end of Section 15.5.3 applies and, in particular, Eq. (15.5.25), according to which the characteristic exponents are the eigenvalues of C (even though the period T is independent of C). In the limit of weak perturbation B, continuity considerations require the eigenvalues of the perturbed system to deviate only smoothly from the unperturbed eigenvalues.
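The statement that, for constant C, the characteristic exponents coincide with the eigenvalues of C can be illustrated numerically: the monodromy matrix over one (arbitrarily adopted) period T is e^{CT}, and its eigenvalues are the characteristic multipliers. A small sketch, with frequencies and period chosen arbitrarily (they are not values from the text):

```python
import numpy as np
from scipy.linalg import expm

T = 2.0                               # any period may be adopted for constant C
mu1, mu2 = 1.0, 1.7                   # assumed (disjoint) normal-mode frequencies
C = np.zeros((4, 4))
C[0, 1] = 1.0; C[1, 0] = -mu1**2      # oscillator block for mode 1
C[2, 3] = 1.0; C[3, 2] = -mu2**2      # oscillator block for mode 2

M = expm(C*T)                         # monodromy matrix of x' = Cx over one period
multipliers = np.linalg.eigvals(M)    # characteristic multipliers
exponents = np.linalg.eigvals(C)      # eigenvalues of C: +/- i*mu1, +/- i*mu2
```

Each multiplier equals e^{αT} for one of the exponents α, which is the content of Eq. (15.5.25) in the constant-C case.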
As a result, if B is not too large, the eigenvalues of the perturbed system will remain disjoint, each one unambiguously associated with a particular unperturbed eigenvalue. Calling the unperturbed and perturbed exponents α₀ and α, our major task will be to obtain formulas for the "shift" α − α₀ caused by B. One hopes the approximation will remain valid for shifts comparable with the initial eigenvalue spacings, since, when previously disjoint eigenvalues approach equality, qualitatively different motion, often instability, usually ensues. In this case, the relatively simple formulas of the theory can accurately predict the position of instability thresholds. Were we to solve the equations (16.4.4) by brute force numerical solution of an initial-value problem, it would be impossible to retain the periodicity properties that we know to be guaranteed by Hamiltonian considerations, except approximately. As a result, over an arbitrarily large period of time the error would be arbitrarily large. We avoid the worst aspects of this problem by solving directly for the eigenvalue shifts. Then the long-time behavior of the solution can be kept "physical" through monitoring of the variation of the eigenvalues. Only if the eigenvalues move off the unit circle, or lose their symmetric relationships, or collide with one another, can the motion be qualitatively different from the unperturbed motion.
Under our assumptions, all eigenvalues are qualitatively equivalent, so we can work on any one of them and the formulas developed will be applicable for any of the others after obvious alterations. Of the unperturbed solutions listed in Eq. (15.3.59) we adopt the one labeled h = 1,

x(t) = e^{α₀t}|1),   where C|1) = iμ₁|1).   (16.4.6)
The subscript on α₀ indicates that it is an unperturbed eigenvalue.⁵ We immediately make a change of variable x → y, defined by

x = e^{α₀t}y,   (16.4.7)
so that the equation for y, instead of being Eq. (16.4.2), is

dy/dt − (C − α₀1)y = 0,   (16.4.8)

with (time-independent) solution

y₀ = |1).   (16.4.9)
Since this (unperturbed) solution is a constant, we have gone formally from a constant-operator, variable-solution equation to a variable-operator, constant-solution equation. In this respect, transformation (16.4.7) is analogous to the transition from the Schrödinger picture to the Heisenberg picture in quantum mechanics. In the present context, it can perhaps be more usefully regarded as viewing the motion from a judiciously chosen "moving" phase space frame of reference. (Because α₀ is pure imaginary) the equation that is adjoint to Eq. (16.4.8) is⁶

dz/dt + (C† + α₀1)z = 0.   (16.4.10)

According to Eqs. (15.3.46), this is satisfied (uniquely except for the multiplicative factor) by

z₀ = (1|.   (16.4.11)
As in earlier sections of this chapter, perturbing terms will be placed on the right-hand sides of the equations. We must therefore be prepared to include inhomogeneous terms on the right-hand side of Eq. (16.4.8):

dy/dt − (C − α₀1)y = |u(t)),   (16.4.12)

⁵Since α₀ corresponds to the first eigenvector, a more consistent notation would seem to be something like α₁₀, but the extra subscript would just be a nuisance. In any case, since there is no eigenvalue labeled 0, there is no excuse for thinking of α₀ as belonging to it. In short, for the case to be worked out, α₀ = iμ₁, and for the normal mode frequency μₛ it will be α₀ = iμₛ.
⁶The "adjoint" of a differential equation was defined in Eq. (2.6.7).
where |u(t)) is, a priori, an arbitrary function. (The ad hoc notations |y) and |u) do not correspond to our previous notation of placing eigenvalues within the | ) box, but this notation will improve the appearance of the formulas to be introduced next.) From our previous study of perturbation schemes, we know that Eq. (16.4.12) may not make sense because |u(t)) may contain secular terms. Physically speaking, a necessary condition for this to be the case would be that |u(t)) contain a term varying at a "natural frequency" of the system described by the left-hand side of the equation. Mathematically speaking, the condition to be concerned about is the possibility that the operator on the left-hand side of the equation has 0 as an eigenvalue and can therefore not be inverted. This is the case in which there is a nontrivial solution of the homogeneous equation. From the discussion of singular algebraic equations in Section 2.6.1 we anticipate that, though singularity will not necessarily make Eq. (16.4.12) unsolvable, the right-hand side will have to meet conditions analogous to Eq. (2.6.36) for a solution to exist. Now, since we are dealing with a differential equation, the condition will have to be that given by Eq. (15.5.59). Since this condition was only applicable to equations with periodic coefficients, the present analysis is similarly restricted. The same comments can be made in more physical terms. The fact that the system is multidimensional does not generically change the problem of secular terms, but it is possible for a particular vector |u(t)) to "not couple to" a particular normal mode even though its frequency matches the natural frequency of the mode. Before proceeding, we must therefore develop a plan for dealing with secular terms. Recall the definition of average given in (15.5.57):

((f)) = (1/T) ∫₀ᵀ f(t) dt.
According to Eq. (15.5.59), a periodic solution of Eq. (16.4.12) exists if and only if

((1|u)) = 0.   (16.4.13)

In words, the inhomogeneous term has to be orthogonal "on the average" to the solution of the homogeneous adjoint equation.⁷ We must therefore require that u satisfy Eq. (16.4.13). If a constant vector u is a candidate for appearing on the right-hand side of Eq. (16.4.12), but does not satisfy Eq. (16.4.13), it can be replaced by Pu, where u has been "operated on" by an operator P, defined by

P|u) = |u) − |1)(1|u).   (16.4.14)
The “projection operator” P can therefore be written
P = 1 − |1)(1|.   (16.4.15)
⁷We have slipped into a somewhat careless use of the term "orthogonal." A more careful statement would be that the inhomogeneous term u(t) has to be annihilated by the solution of the adjoint equation (1| that corresponds to the unperturbed solution |1) under study.
With normalizations given by Eq. (15.3.52), this causes P to have the properties

(1|Pu) = 0,   and   P|1) = 0.   (16.4.16)
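These properties are easy to confirm numerically; the following is a minimal sketch using a hypothetical 2 × 2 oscillator block of the kind treated in Section 16.4.1 (the value of μ₀ is an assumption for the sketch). With |1) a right eigenvector of C and (1| the corresponding left (adjoint) row vector normalized so that (1|1) = 1, the operator P = 1 − |1)(1| satisfies P² = P together with Eqs. (16.4.16):

```python
import numpy as np

mu0 = 1.3                                   # assumed frequency
C = np.array([[0.0, 1.0], [-mu0**2, 0.0]])

w, V = np.linalg.eig(C)                     # right eigenvectors (kets)
wl, Vl = np.linalg.eig(C.T)                 # left eigenvectors (bras), as columns
ket = V[:, np.argmin(np.abs(w - 1j*mu0))]
bra = Vl[:, np.argmin(np.abs(wl - 1j*mu0))]
bra = bra / (bra @ ket)                     # enforce (1|1) = 1; the pairing is
                                            # bilinear, so no complex conjugation
P = np.eye(2, dtype=complex) - np.outer(ket, bra)
```

P @ P reproduces P, P annihilates the ket |1), and (1| annihilates P u for any vector u, to machine precision.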
P can be said to project onto the space annihilated by (1|. Substituting for |1) from Eq. (15.3.41) and (1| from Eq. (15.3.50), the projection operator for our particular problem is therefore
P = 1 − |1)(1|   (written out as an explicit matrix by inserting the components of |1) from Eq. (15.3.41) and (1| from Eq. (15.3.50)).   (16.4.17)
The designation of P as a "projection operator" requires that P² = P; this relation can be confirmed directly (see for example Eq. (16.4.17)). When operating on a time-dependent function u(t), the averaged projection operator will be defined by

Pu(t) = u(t) − |1)((1|u)).   (16.4.18)

When operating on a function u(t) whose scalar product with (1| time-averages to zero, P = 1. Returning to Eq. (16.4.12), to assure that the inhomogeneous term appearing on the right-hand side satisfies Eq. (16.4.13), we write that term as Pu and obtain
dy/dt − (C − α₀1)y = Pu(t).   (16.4.19)

In this form, u can therefore be arbitrary and the equation is still guaranteed to have a periodic solution. We also have to face the other feature of an inhomogeneous equation whose homogeneous part is singular, namely that its solution is not unique; a particular solution can be augmented by any solution of the homogeneous equation. From a physical point of view this is more nearly a technicality than a problem of principle, since the requirement that initial conditions be matched is the "physics" that selects a unique solution. However, as with the K-B procedure, it is not easy to keep the solution matched to initial conditions and, even if it were, the effort would not be justified if the system is autonomous, because the phase of an autonomous oscillator is inessential. Adding any solution y⁽¹⁾(t) of (16.4.8) to some tentative particular solution y⁽⁰⁾(t) of (16.4.12) yields another solution of (16.4.12). One way of assuring a unique solution is to specify as many extra conditions like Eq. (15.5.60) as there are independent
solutions of (16.4.8). To make the solution of Eq. (16.4.12) unique we therefore modify y⁽⁰⁾(t) according to

|y) = |y⁽⁰⁾) − ((1|y⁽⁰⁾))|1);   (16.4.20)

because of normalization (15.3.52), this is equivalent to requiring

((1|y)) = 0.   (16.4.21)
As in Eq. (15.5.62), the solution to Eq. (16.4.19), made unique by condition (16.4.21), can be said to be the result of "operating on" Pu(t) with a linear operator S; that is,

y(t) = SPu(t).   (16.4.22)

By definition, then, the solution of Eq. (16.4.19) produced by operator S satisfies

((1|SPu)) = 0,   (16.4.23)

for arbitrary u.⁸ Combining this definition with Eq. (16.4.20) we have the identity

SPu(t) = y⁽⁰⁾ − ((1|y⁽⁰⁾))|1).   (16.4.25)
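The operational definition of SP can be mimicked numerically. A sketch under assumed values (a 2 × 2 constant-coefficient case, so dy/dt drops and Eq. (16.4.19) becomes the singular linear system −(C − α₀1)y = Pu): solve by least squares, then project away the |1) component so that condition (16.4.23) holds.

```python
import numpy as np

mu0 = 1.0
C = np.array([[0.0, 1.0], [-mu0**2, 0.0]], dtype=complex)
alpha0 = 1j*mu0

w, V = np.linalg.eig(C)
ket = V[:, np.argmin(np.abs(w - alpha0))]
wl, Vl = np.linalg.eig(C.T)
bra = Vl[:, np.argmin(np.abs(wl - alpha0))]
bra = bra / (bra @ ket)                       # (1|1) = 1

u = np.array([1.0, 0.5], dtype=complex)       # an arbitrary constant drive
Pu = u - ket*(bra @ u)                        # P u, guaranteeing solvability
A = C - alpha0*np.eye(2)                      # singular: alpha0 is an eigenvalue
y_part, *_ = np.linalg.lstsq(A, -Pu, rcond=None)
y = y_part - ket*(bra @ y_part)               # y = S P u, with (1|y) = 0
```

Because the projected right-hand side satisfies the solvability condition, the least-squares solution is an exact particular solution, and removing its |1) component selects the unique y that S is defined to produce.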
With these preliminaries out of the way, we now seek the coefficient α (close to α₀) in a solution to Eq. (16.4.4) of the form

x = e^{αt}y,   where y(t + T) = y(t).   (16.4.26)

The unknown y satisfies

dy/dt − (C − α₀1)y = ((α₀ − α)1 + B)y.   (16.4.27)
As established above, this equation can only be valid if the right-hand side satisfies

((1|((α₀ − α)1 + B)y)) = 0.   (16.4.28)

As a result, it has to be legitimate to replace Eq. (16.4.27) by

dy/dt − (C − α₀1)y = P((α₀ − α)1 + B)y.   (16.4.29)

⁸Were it valid to assume dy/dt = Ωy for some constant matrix or linear operator Ω, then S could be written as

S = (Ω − C + α₀1)⁻¹,   (16.4.24)

which would operate on the inhomogeneous term to yield a tentative solution needing only to be augmented by an appropriate solution of the homogeneous equation. In practice, it will typically not be possible to write S in closed form; rather it will be necessary to evaluate SPu(t) for each particular value of u(t) that arises by explicitly solving the applicable differential equation and applying condition (16.4.23). To avoid undefined expressions, S must always be "preceded" by P, as in SP.
A solution y(t) of this equation will not in general satisfy condition (16.4.21), but in any case there is a related function y − ξ|1) that satisfies both Eq. (16.4.29) and condition (16.4.21). Here ξ is a constant remaining to be determined. By the definition of S this same unique solution can also be obtained by operating with S on the right-hand-side expression. Hence we have

y − ξ|1) = SP((α₀ − α)1 + B)y.   (16.4.30)

Rearranging this equation and solving for y yields

y(t) = (1 − SP((α₀ − α)1 + B))⁻¹ ξ|1).   (16.4.31)
By construction this solution y(t) satisfies condition (16.4.21); requirement (16.4.28) now reads

φ₁₁ ≡ ((1|((α₀ − α)1 + B)(1 − SP((α₀ − α)1 + B))⁻¹|1)) = 0.   (16.4.32)
The requirement that this "matrix element" φ₁₁ vanish yields an implicit formula for the eigenvalue α being sought. The other frequency shifts are obtained similarly from φ₂₂ = 0, φ₃₃ = 0, etc. These formulas are valid for arbitrary B, but it will be necessary to assume that B is "small" in order to evaluate φ₁₁. At this point, all conceptually difficult points have been mastered, and all that remains is "turning the crank." Unfortunately, the algebra remaining to be faced is rather complicated. Although each step has been fully justified, the development has been rather formal. It can be said to have been "geometrically inspired" because of the importance of "annihilation conditions," Eqs. (16.4.13), (16.4.21), and (16.4.32), and their satisfaction using the projection operator P. To proceed further, it is necessary to take advantage of the smallness of B, sorting terms by "orders of smallness" equal to the number of factors of B. We introduce the temporary abbreviations

Q = (α₀ − α)1 + B,   and   N = SPQ,   (16.4.33)
both of which are quantities of "first order of smallness." With these, the middle factor in Eq. (16.4.32) can be transformed using

(1 − N)⁻¹ = 1 + N(1 − N)⁻¹.   (16.4.34)
The matrix element φ₁₁ that has to vanish is therefore given by

((1|Q(1 − N)⁻¹|1)) = ((1|Q|1)) + ((1|QN(1 − N)⁻¹|1)).   (16.4.35)

The final term is of a "second order of smallness." In it, because of Eq. (16.4.23), it is legitimate to keep only the second term from the factor Q,

((1|QN(1 − N)⁻¹|1)) = ((1|BN(1 − N)⁻¹|1)) = ((1|B(1 − N)⁻¹N|1)),   (16.4.36)
the latter step being allowed because N commutes with (1 − N)⁻¹. Further simplification results from using

N|1) = SPB|1),   (16.4.37)

which follows from the second of Eqs. (16.4.16). Setting φ₁₁ = 0, using (1|1) = 1, and solving for α yields the main result

α = α₀ + ((1|B|1)) + ((1|B(1 − SP((α₀ − α)1 + B))⁻¹SPB|1)).   (16.4.38)
This is a big improvement over Eq. (16.4.32) since it immediately yields an explicit expansion for α in ascending powers of B. Such a series is reminiscent of a quantum mechanical perturbation series. With the factor (α₀ − α) in the last term approximated by −((1|B|1)), Eq. (16.4.38) is valid up to terms of order B³. Eq. (16.4.38) is the master formula on which subsequent analysis can be based. Once an eigenvalue has been found, it is normally straightforward to find its eigenvector. In any case, the long-term behavior is governed by the value of α. Throughout this discussion, although it has not been indicated explicitly, the factor B has been allowed to be time-dependent. But the leading correction term, ((1|B|1)), because it is first-order in B, averages to zero for all but the time-independent part of B. In the higher-order terms, however, it is possible for the products of time-varying factors to have nonzero average values. Though Eq. (16.4.38) is pleasingly compact, there is still quite a bit of work to do to evaluate the higher-order terms, mainly because the operator S has been defined only implicitly so far. The expansions (16.4.5) enter the evaluation of the matrix elements in Eq. (16.4.38) via the factor B. Recall that μ_T is the frequency of an externally imposed, nonautonomous perturbation. It is this frequency that determines the period over which the preceding matrix elements are to be calculated, and it is only terms with m = 0 that give a contribution from the lowest-order term. Of course, products of oscillating terms can also yield nonvanishing averages.
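The leading term of the master formula is easy to check numerically. A minimal sketch with assumed numbers, using a weakly damped oscillator of the kind treated in the next subsection as the test case: the exact eigenvalue of C + B differs from α₀ + ((1|B|1)) only at second order in B.

```python
import numpy as np

mu0, a10 = 1.0, 0.01                        # assumed frequency and small damping rate
C = np.array([[0.0, 1.0], [-mu0**2, 0.0]])
B = np.array([[0.0, 0.0], [0.0, -2.0*a10]]) # time-independent perturbation

w, V = np.linalg.eig(C)
k = np.argmin(np.abs(w - 1j*mu0))
alpha0, ket = w[k], V[:, k]                 # unperturbed exponent and eigenvector
wl, Vl = np.linalg.eig(C.T)
bra = Vl[:, np.argmin(np.abs(wl - alpha0))]
bra = bra / (bra @ ket)                     # normalize so (1|1) = 1

shift1 = bra @ B @ ket                      # ((1|B|1)); analytically equal to -a10

wp = np.linalg.eigvals(C + B)
alpha = wp[np.argmin(np.abs(wp - alpha0))]  # exact perturbed eigenvalue
```

Here alpha0 + shift1 = iμ₀ − α₁,₀, and the residual discrepancy from the exact eigenvalue is of order α₁,₀², which is what the higher-order terms of Eq. (16.4.38) are there to supply.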
16.4.1. Pure Damping

Purely damped motion, with no coupling between modes, though elementary, provides practice in evaluating the matrix elements of Eq. (16.4.38). Consider a weakly damped, one-dimensional oscillator with equation of motion

d²θ/dt² + 2α₁,₀(dθ/dt) + μ₀²θ = 0.   (16.4.39)
In the four-mode oscillating system that we have been analyzing, this equation could describe the motion of one of the normal modes, say the first. Let us assume this to be the case and that all coupling terms in Eq. (15.1.1) vanish. Then the only perturbing term in the matrix B corresponds to the second term of Eq. (16.4.39). Free motion of this oscillator is known to be described by the real part of

θ(t) = e^{iμt}.   (16.4.40)
The "complex frequency" is

μ = (μ₀² − α₁,₀²)^{1/2} + iα₁,₀ ≈ μ₀ + iα₁,₀.   (16.4.41)

Normally it is the term α₁,₀ that constitutes the "important" perturbation. This is both because it is linear in the small quantity (which is α₁,₀ itself) and because it is real and therefore causes decay of the amplitude (or growth if it should be negative). To describe this system by the near-symplectic perturbation formalism, define

C = ( 0  1 ; −μ₀²  0 ),   B = ( 0  0 ; 0  −2α₁,₀ ).   (16.4.42)
(These are 2 × 2 matrices and two-component vectors.) In this case, the perturbation is time-independent. The first order matrix elements are

(h|B|j) = ( μₕaₕ  −iaₕ ) ( 0  0 ; 0  −Q ) ( aⱼ ; iμⱼaⱼ ) = −μⱼaₕQaⱼ.   (16.4.43)
With Q = 2 diag(α₁,₀, 0, 0, 0),

(1|B|1) = −α₁,₀,   (16.4.44)
and Eq. (16.4.38) reduces to

α = iμ₀ − α₁,₀.   (16.4.45)
This agrees to a first approximation with Eq. (16.4.41). Proceeding to the next approximation, the operator P is obtained by striking all but rows and columns 1 and 5 from Eq. (16.4.17). Then
PB|1) = (1 − |1)(1|) ( 0 ; −2iμ₀α₁,₀a₁ ) = α₁,₀a₁ ( 1 ; −iμ₀ ).   (16.4.46)
The result of operating on this vector with S is defined operationally by Eqs. (16.4.19), (16.4.22), and (16.4.23):

−(C − iμ₀1)y = PB|1).   (16.4.47)

(Since the perturbation term is time-independent its response is also constant, so the term dy/dt has been dropped.) The attempt to solve this equation by inverting C − iμ₀1 fails because its determinant vanishes, but solving the equations directly yields

y = ( 0 ; −α₁,₀a₁ ) + e ( 1 ; iμ₀ )   (16.4.48)

as a vector satisfying Eq. (16.4.47) for any value of e. The second term is proportional to |1) (as, according to the general theory, it must be). Condition (16.4.21) fixes

e = −iα₁,₀a₁/(2μ₀),   (16.4.49)

and

SPB|1) = ( −iα₁,₀a₁/(2μ₀) ; −α₁,₀a₁/2 ).   (16.4.50)

Finally we evaluate

((1|B SPB|1)) = −iα₁,₀²/(2μ₀).   (16.4.51)

When this is substituted into Eq. (16.4.38), the result agrees with Eq. (16.4.41). If the coupling terms in Eq. (15.1.1) had not been neglected, the damping calculated so far would be unaffected. But in higher orders of approximation, nonvanishing products coming from Q and P would appear. There would therefore be "sympathetic damping," by which a mode that would otherwise be undamped actually loses amplitude gradually because it couples "energy" into the damped mode, and does not get it all back when it later expects to, because some of it has been lost by the damped mode. Even nearly ideal oscillators are subject to some damping of the sort analyzed in this section. When the damping is weak, the oscillator is said to be a "high Q" oscillator where, speaking loosely, Q is the number of cycles over which the amplitude suffers appreciable fractional decay. In a multimode system, the damping rates of different modes are in general different. The presence of some such tiny damping is almost guaranteed for any physical system because even the tiniest of growth mechanisms would otherwise drive the amplitude of such an oscillator to infinity, which is, of course, impossible. A standard and sound method of approximation (to be employed in the next section) is to "turn off" these damping rates while calculating other perturbative effects, and then, after those effects have been calculated, to turn the damping back on again to obtain a final answer. This is a kind of "superposition principle satisfied by small effects of the same order of smallness." Though physically plausible, in principle the validity of this procedure needs to be checked case by case. Commonly a "threshold of instability" can then be obtained from the condition that the growth due to the perturbation just cancels the natural damping. For perturbation stronger than this, the normal mode in question grows uncontrollably, though of course not to infinity, as limiting effects invariably show up.

16.4.2. Time-Dependent Perturbations

Returning to Eq. (15.1.5), we now make it more realistic by including two time-dependent effects that have been neglected up to this point. Both are important only because of motion in the longitudinal direction. These extra effects will now be incorporated via the terms R_{x1}, R_{y1}, R_{x2}, and R_{y2}. Let us suppose the two particles are regularly changing places, alternately overtaking and being overtaken by each other. If they were bicycle racers this would be because they were, on the one hand, trying to win the race but, on the other hand, saving energy while being in the tail position because of being pulled along by the head rider. We will call this a "head-tail" effect and assume that the relative longitudinal coordinates of the riders are given by⁹

z₂(t) = −z₁(t) = (Δ/2) cos μ_T t.   (16.4.52)

This perfectly regular interchange of head-tail positions is being treated as externally imposed, making the equations nonautonomous. The longitudinal range over which the particles move relative to each other is Δ. One of the two effects we wish to include is "parametric." While one particle is passing the other, its momentum is temporarily greater. It is therefore "stiffer" and less subject to the transverse restoring force. Its "natural" transverse frequency is therefore less. This effect can be accounted for by including in the equations of motion the terms
R_{x1} = (D + d) sin μ_T t x₁,   R_{x2} = −(D + d) sin μ_T t x₂,
R_{y1} = (D − d) sin μ_T t y₁,   R_{y2} = −(D − d) sin μ_T t y₂,   (16.4.53)
where D and d are "small," known, constant parameters. For d = 0, the chromaticities are equal. The other effect to be included is the transverse wake force. This is not the force that is pulling the tail particle forward; rather it is the sideways force one particle exerts on the other; it depends on their relative positions. We assume these forces are described by

R_{x1} = −(λW/2π) cos μ_T t x₂,   etc.   (16.4.54)
The coefficient λW appearing here is the same factor that appeared already in Eq. (15.1.1). This reflects the following bit of "sleight of hand" that has been taking

⁹Eq. (16.4.52) is valid with good accuracy for the motion of two bunches of particles in a high-energy accelerator. The coefficient D, to be introduced shortly, is known as the "chromaticity."
522
PERTURBATION THEORY
place in the discussion to this point. In fact, though there is a transverse wake force on the tail particle due to the head particle, there is no reciprocal force on the head particle. This violates Newton's third law, but nobody said Newton's law was valid when relevant objects are left out of the description. In the case of bicycle riders, the air is capable of taking up transverse momentum. In the case of relativistic charged particles, electromagnetic radiation is capable of taking up transverse momentum. By violating one of Newton's laws, we are removing one of the cornerstones of classical mechanics, and we will not be surprised if we find below that we have jettisoned symplecticity as a consequence. Because of this nonreciprocal nature, the forces in (16.4.54) should have been expected to have the form

R_x1 = -(λW/2π) cos μ_T t x_2 U(z_2 - z_1),   etc.,   (16.4.55)
where the extra factor U(z_2 - z_1) is a step function, equal to 0 when particle 1 is in the head position and equal to 1 when it is in the tail position. The "sleight of hand"¹⁰ mentioned above was to artificially include the term λW x_2 on the left-hand side of Eq. (15.1.1). This meant that the focusing effect of the wake field was already accounted for "on the average." The situation is now being rectified by the inclusion of the perturbation terms (16.4.54). Including the wake force with the "wrong" sign when the particle subject to it is in the head position cancels the force that has previously been included "erroneously"; including a wake force with the "correct" sign when the particle subject to it is in the tail position gives twice the "correct" force. (The factor of 2 is being subsumed into the definition of W.) Since we are, in effect, approximating a step function by a "biased sine wave," we are making an error, but the error amounts to neglecting higher-order terms in a Fourier expansion of a "sawtooth" function for a saw with square teeth. This Fourier expansion was the source of the factors 1/(2π) appearing in Eqs. (16.4.54). Since this is just a constant factor, its actual value will not affect the analytic formulation to follow. Changing this constant is equivalent to changing the scale of the control parameter λ. The equation of free motion Eq. (15.1.1), now expressed in eigencoordinates as in Eq. (15.1.5), is
(16.4.56) where
(16.4.57)
¹⁰If the following discussion seems too "flaky," the reader can just accept the fact that the time dependencies in Eqs. (16.4.53) and (16.4.54) have been taken as sine function and cosine function, respectively, purely to produce a manageably simple perturbation as an example of symplectic perturbation theory.
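The "parametric" character of the terms (16.4.53) can be seen in a toy integration. The sketch below is illustrative only (the frequency μ₀, the modulation strength D, and the integration parameters are invented, not taken from the text): it integrates x″ + (μ₀² − D sin μ_T t) x = 0 with a leapfrog scheme and shows that the amplitude grows when μ_T is near the parametric-resonance value 2μ₀ but stays bounded off resonance.

```python
import math

def amplitude_growth(mu0, muT, D, turns=500, steps_per=200):
    """Integrate x'' + (mu0**2 - D*sin(muT*t))*x = 0 by leapfrog
    (kick-drift-kick) and return the largest |x| reached."""
    dt = 2 * math.pi / (mu0 * steps_per)
    x, v, t = 1.0, 0.0, 0.0
    xmax = abs(x)
    for _ in range(turns * steps_per):
        v += 0.5 * dt * (-(mu0**2 - D * math.sin(muT * t)) * x)
        x += dt * v
        t += dt
        v += 0.5 * dt * (-(mu0**2 - D * math.sin(muT * t)) * x)
        xmax = max(xmax, abs(x))
    return xmax

mu0, D = 1.0, 0.02
on_res = amplitude_growth(mu0, 2.0 * mu0, D)   # muT = 2*mu0: parametric resonance
off_res = amplitude_growth(mu0, 2.7 * mu0, D)  # detuned: amplitude stays bounded
print(on_res, off_res)
```

The on-resonance amplitude grows by many orders of magnitude while the detuned run remains of order one, which is the behavior the "stiffness" modulation of (16.4.53) threatens to produce.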
MULTIDIMENSIONAL, NEAR-SYMPLECTIC PERTURBATION THEORY
523
where the superscripts s and a differentiate symmetric and antisymmetric matrices. The matrices P^a have been generalized a bit compared to Eq. (15.1.2) by introducing the parameter w (small compared to W) to allow the horizontal and vertical wake fields to be unequal. The fact that P^s and P^a are symmetric and antisymmetric, respectively, is required by known laws. For algebraic convenience, the time-varying factors sin μ_T t and cos μ_T t will eventually be expanded in complex exponential form as shown, and the perturbation matrices have been partitioned to later take advantage of their zero blocks. To be fully realistic, a lossy term Q de/dt should also have been included in Eq. (16.4.57), but, as explained above, it will be dropped for now; its effects, as calculated in the previous section, will be restored later.

*16.4.3. Resonance Induced by Time-Dependent Perturbation

We are now ready to substitute the perturbation matrices into Eq. (16.4.38) in order to calculate the shifts in natural frequency they cause. (Here we are referring to "characteristic exponents" as "natural frequencies" in spite of the factor i by which these quantities differ.) Because P^s and P^a both average to zero, they cause no frequency shift in lowest order. That is to say, ⟨⟨1|B|1⟩⟩ = 0 in Eq. (16.4.38). Substituting from Eq. (16.4.4) into Eq. (16.4.38), the second-order matrix elements due to P^s and P^a are
(16.4.60)

where |j⟩ was defined in Eq. (15.3.41) and Eq. (15.3.42). For example, |1⟩ = (…). The bra ⟨1| was defined in Eq. (15.3.49). The factor P has been set to 1 because it operates on a quantity that averages to zero. By its definition in Eq. (16.4.22), the factor
is the solution of the equation (16.4.61), made unique if necessary by the inclusion of extra conditions. Here there will be no need for "extra conditions" since (to the order being retained) the second term in Eq. (16.4.20) vanishes. (The abbreviation u(t) has been temporarily reintroduced in Eq. (16.4.61) to allow brevity in the next few steps, for example by suppressing the index j.) It will be necessary to solve Eq. (16.4.61) for each of the values of j. For the problem under study, its time-dependent terms will be proportional to e^{±iμ_T t}. Specializing again to j = 1, this equation can be written as
Seeking a solution with time variation e^{ipt}, this becomes

(16.4.63)

Although there are eight equations, they are not very complicated, and they can be decoupled by multiplying both sides by
This has the effect of diagonalizing the coefficient matrix on the left-hand side of the equation, thereby simplifying the equations into the form
The quantity p is not simple here, as it depends on the particular unperturbed eigenvalue under study through the time dependence of the perturbation term being operated upon. If the perturbation term u(t) is a sum of terms with different time dependence, p has a different value for each different term. The matrix of coefficients on the left-hand side of Eq. (16.4.64) is diag(T_+, T_-), where

T_± = -(p ± μ_T)² 1 + P_0.   (16.4.65)

This has been symbolized by T_± because it depends on the particular eigenvalue being evaluated, namely μ_j = μ_1, and on p, which in our case will be ±μ_T. Inverting this matrix and continuing with (16.4.64),
which serves to define the "matrix" T_1±. The "matrices" T_j± are defined similarly for general index j. The quotation marks on "matrix" are a reminder that its elements depend on the time dependence of the vector on which it operates. The kth element of T_j± is given by

T_j±,k = -(μ_j ± μ_T)² + μ_k².   (16.4.67)
Then we have
(16.4.68)

Substituting from Eq. (16.4.58) and Eq. (16.4.59), the submatrix (P^s + P^a) T_1⁻¹ (P^s - P^a) is diagonal, with elements

(D sin μ_T t - W cos μ_T t) T_1^(x)⁻¹ (D sin μ_T t + W cos μ_T t)

and

(D sin μ_T t - w cos μ_T t) T_1^(y)⁻¹ (D sin μ_T t + w cos μ_T t).   (16.4.69)
Before completing the calculation, since the same calculations have to be made for j = 1, 2, 3, 4, it is worth introducing more compact notation. The elements of T_j±⁻¹ can be expressed in terms of the following functions:

g_jk = T_j+,k⁻¹ + T_j-,k⁻¹ = 1/(-(μ_j + μ_T)² + μ_k²) + 1/(-(μ_j - μ_T)² + μ_k²).   (16.4.70)

The quantity μ_j - μ_k can be related to "line separations" that can be read off from Fig. 15.1.1, as illustrated by the following formulas:

(16.4.71)
and similarly for other possibilities. For C ≈ μ_T and/or λW ≈ μ_T, one or more of the denominators in these expressions is capable of becoming small or vanishing. The result is unlimited growth that is said to be due to "resonance." Such resonance is almost certain to be considered unacceptable behavior for the system under study, and the avoidance of resonance is typically of greater interest than is the precise analysis of the motion when the condition for resonance is met. Although this is the very sort of result the method under study is intended to obtain, we will not go into greater detail here.
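The role of these resonant denominators is easy to see numerically. In the sketch below (the values of μ_j, μ_k, and μ_T are invented for illustration, not taken from the text), the two-denominator function of Eq. (16.4.70) is evaluated as μ_T approaches the resonant value μ_j − μ_k, where the second denominator vanishes:

```python
def g_jk(mu_j, mu_k, mu_T):
    """Two-denominator function of Eq. (16.4.70)."""
    term_plus = 1.0 / (-(mu_j + mu_T) ** 2 + mu_k ** 2)
    term_minus = 1.0 / (-(mu_j - mu_T) ** 2 + mu_k ** 2)
    return term_plus + term_minus

# illustrative mode frequencies; resonance at mu_T = mu_j - mu_k = 0.25
mu_j, mu_k = 1.25, 1.00
for mu_T in (0.10, 0.20, 0.24, 0.249):
    print(mu_T, g_jk(mu_j, mu_k, mu_T))  # |g_jk| blows up as mu_T -> 0.25
```

The printed values grow without bound as the detuning shrinks, which is the analytic signature of the resonance just described.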
16.4.4. Threshold of Instability

Not all instability is due to vanishing denominators. We now turn to one such example. We continue with the evaluation of the matrix element appearing in Eq. (16.4.69) and collect the factors in Eq. (16.4.38) to obtain the shift of mode 1:
(16.4.72) Similar calculations give the shifts for the other modes; they satisfy
To simplify the further discussion, let us make some inessential (easily fixed, that is) assumptions, which are usually valid. We assume C ≪ μ_T, λ|W| ≪ μ_T, and μ_T ≪ μ_j for all j, which leads to
(16.4.74) Furthermore, let us take d = w = 0. We then obtain
Consistent with the comment made just below Eq. (16.4.41), it is the second term that is likely to be the most important because, being real, it causes nonsymplectic growth or decay of the amplitude of motion. Since both D and W can have either sign, it is undetermined whether mode 1 will decay or grow. But the signs in Eq. (16.4.73) show that if mode 1 is damped, then mode 3 is antidamped, and vice versa. Let us say that the signs are such that, according to (16.4.75), mode 1 will grow for any finite value of λ (which is necessarily positive). However, before concluding from this that the motion is never stable, we recall that we intended to include the true damping effect studied in the previous section. Combining the damping rates
given by Eqs. (16.4.45) and (16.4.75),

(16.4.76)

The important qualitative feature of this formula is that the motion is stable for small current λ but unstable for large λ. The instability threshold is given by

(16.4.77)
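The competition just described — damping that wins at small λ, head-tail growth that wins at large λ — can be sketched numerically. In the fragment below the two rates are placeholder functions with invented coefficients a and b (the actual rates are those of Eqs. (16.4.45) and (16.4.75), which are not reproduced here); the threshold is located by bisection on the sign of the net growth rate.

```python
def net_growth(lam, a=0.30, b=0.05):
    """Net growth rate vs. control parameter lam.
    -a*lam stands in for the damping, +b*lam**2 for the head-tail
    growth; a and b are placeholder values, not taken from the text."""
    return -a * lam + b * lam ** 2

def threshold(rate, lo=1e-6, hi=100.0, iters=60):
    """Bisect for the value of lam at which the net rate changes sign."""
    assert rate(lo) < 0 < rate(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_th = threshold(net_growth)
print(lam_th)  # approximately a/b = 6.0 for these placeholder values
```

For this toy rate function the threshold is simply a/b, mirroring the qualitative statement that stability is lost once the current exceeds a definite value.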
The Hamiltonian-violating growth that has been obtained can be traced to the antisymmetric form of the wake potential P^a, and this was a consequence of the nonreciprocal nature of the wake force of one particle on the other. Notice, though, that the presence of P^a is not enough, by itself, to cause growth or decay. Rather, the presence of P^s oscillating at the same frequency is also required. In this sense, then, the growth that has been obtained can also be ascribed to a kind of resonance.

16.5. SUPERCONVERGENT PERTURBATION THEORY

Because transformations based on generating functions, such as have been discussed in Section 14.1.1, are automatically symplectic, there has been a strong historical tendency to base perturbation schemes on this type of transformation. G. D. Birkhoff was the leader of the successive-canonical-transformation approach. His book, Dynamical Systems, reprinted in 1991 by the American Mathematical Society [3], is both important and readable. The downside of this approach, as has been noted previously, is that it mixes old and new variables, giving implicit rather than explicit transformation formulas. The only systematic way to obtain explicit formulas is by the use of series expansion. When one truncates such series (as one always must), one loses the symplecticity that provided the original motivation for the method. It is my opinion, therefore, that the more "direct" methods described to this point are more valuable than this so-called canonical perturbation theory. There is, however, a theoretically influential development, due to Kolmogorov and known as "superconvergent perturbation theory," based on this approach. This is the basis for Kolmogorov's name being attached to the important "KAM," or "Kolmogorov, Arnold, Moser," theorem.

16.5.1. Canonical Perturbation Theory

For this discussion, we return to a one-dimensional, oscillatory system described by Hamiltonian

H(q, p) = H_0 + H_1,   (16.5.1)
where the term H_1 is the perturbation. We assume that the unperturbed system for Hamiltonian H_0 has been solved using the action/angle approach described in Section 14.3.3. When described in terms of action variable I_0 and angle variable φ_0, the
unperturbed system is described by the relations
(16.5.2)

When q is expressed in terms of I_0 and φ_0 and substituted into the function H_1, the result H_1(φ_0, I_0) is periodic in φ_0 with period 2π; it can therefore be expanded in a Fourier series, much as was done on the right-hand side of Eq. (16.2.3):

(16.5.3)
To "simplify" the perturbed system, we now seek a generating function S(φ_0, I_1) to be used in a transformation from "old" variables I_0 and φ_0 to "new" variables I_1 and φ_1 that are action/angle variables of the perturbed system. The generating function for this transformation has the form

S(φ_0, I_1) = φ_0 I_1 + Φ(φ_0, I_1).   (16.5.4)
According to Eqs. (14.2.1), the generated transformation formulas are then given by
(16.5.5) The second terms are of lower order than the first terms. Substituting these into Eq. (16.5.1), the new Hamiltonian is
= H_0(I_1) + ⟨H_1(φ_0, I_1)⟩ + ω_0 (∂Φ/∂φ_0)(φ_0, I_1) + {H_1(φ_0, I_1)} + ···   (16.5.6)
Here we have used the same notation for averaging that was used in Eqs. (16.3.49); operating on a periodic function, ⟨ ⟩ yields the average and { } yields what is left over. It has been unnecessary to distinguish between I_0 and I_1 where they appear as arguments in terms of reduced order, because the ensuing errors are of lower order yet. By choosing Φ(φ_0, I_1) appropriately, one can eliminate the angle-dependent part of Eq. (16.5.6) (the last two terms). This determines Φ according to
ω_0 (∂Φ/∂φ_0)(φ_0, I_1) = -{H_1(φ_0, I_1)}.   (16.5.7)
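With the Fourier expansion (16.5.3), Eq. (16.5.7) can be solved for Φ term by term. The display below is a standard reconstruction of that step in the present notation (it is not quoted from the text); note the small denominators kω₀, which foreshadow the resonance problems mentioned earlier in the chapter.

```latex
% With H_1(\varphi_0, I_1) = \sum_k H_{1k}(I_1)\, e^{ik\varphi_0},
% the angle-dependent part is \{H_1\} = \sum_{k\neq 0} H_{1k}\, e^{ik\varphi_0}.
% Matching Fourier coefficients in Eq. (16.5.7) gives
%   \omega_0\, ik\, \Phi_k = -H_{1k}(I_1),
% so that
\Phi(\varphi_0, I_1)
   = \sum_{k\neq 0} \frac{i\, H_{1k}(I_1)}{k\,\omega_0}\; e^{ik\varphi_0}.
```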
This is known as "killing" these angle-dependent terms. The task of obtaining Φ is straightforwardly faced, just as in Eqs. (16.3.49). Doing this makes the transformation equations (16.5.5) explicit. Since the Hamiltonian is then, once again, independent of the angle coordinate φ_1, the variable I_1 is the action variable to this order. After this procedure, the newly "unperturbed" Hamiltonian is
and its frequency is given by

(16.5.9)

By choosing (16.5.8) as another "unperturbed Hamiltonian," the whole procedure can (in principle) be iterated. In practice, the formulas rapidly become very complicated. It is easiest to follow an explicit example such as the following.

16.5.2. Application to Gravity Pendulum
To illustrate the preceding formulas and to see how they can be extended to higher order, let us consider the gravity pendulum, closely following the treatment of Chirikov listed at the end of the chapter. The Hamiltonian is

H = p²/2 + (1 - cos θ) = p²/2 + θ²/2! - θ⁴/4! + θ⁶/6! - θ⁸/8! + ···.   (16.5.10)
The constants have been chosen to simplify this as much as possible. In particular, mass m = 1 and ω_0 = 1. Taking the quadratic terms as the unperturbed part of this Hamiltonian, we have
H_0 = I_0,   θ = √(2I_0) cos φ_0.   (16.5.11)
Expressing H in the form of Eq. (16.5.3),

H(I_0, φ_0) = I_0 - (4I_0²/4!) cos⁴ φ_0 + (8I_0³/6!) cos⁶ φ_0 - (16I_0⁴/8!) cos⁸ φ_0.   (16.5.12)
This series has been truncated arbitrarily after four terms. For the first approximation, only the first two terms have any effect, but to later illustrate the Kolmogorov superconvergence idea it is appropriate to complete some calculations to a higher order than might initially seem to be justified. Define Fourier expansions based on the identities

⟨cosⁿ φ⟩ = (1/2ⁿ) C(n, n/2) for n even,   0 for n odd,   (16.5.13)

where C(n, n/2) is the binomial coefficient,
and the definitions

f_n = cosⁿ φ - ⟨cosⁿ φ⟩,   f_n′ = df_n/dφ,   F_n = {F_n}, where dF_n/dφ = f_n.   (16.5.14)
For example,

f_4′ = -sin 2φ - (1/2) sin 4φ,   F_4 = ∫₀^φ ((1/2) cos 2φ′ + (1/8) cos 4φ′) dφ′.   (16.5.15)
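The averages (16.5.13) underlying these expansions are easy to check numerically. The sketch below (a verification aid, not part of the text) compares the numerical average of cosⁿφ over one period with (1/2ⁿ)C(n, n/2):

```python
import math

def cos_power_average(n, samples=100000):
    """Numerical average of cos^n(phi) over one full period."""
    total = 0.0
    for i in range(samples):
        phi = 2 * math.pi * i / samples
        total += math.cos(phi) ** n
    return total / samples

for n in range(1, 9):
    exact = math.comb(n, n // 2) / 2 ** n if n % 2 == 0 else 0.0
    print(n, exact, round(cos_power_average(n), 6))
```

For n = 4, 6, 8 this reproduces the values 3/8, 5/16, and 35/128 used below.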
Rearranging the terms of Eq. (16.5.12) yields
(16.5.16)
The angle-dependent part of this Hamiltonian is of order I². The averaged Hamiltonian is

⟨H⟩(I_0) = I_0 - I_0²/16 + I_0³/288 - I_0⁴/9216.   (16.5.17)
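The coefficients in (16.5.17) follow from (16.5.12) together with the averages ⟨cos⁴⟩ = 3/8, ⟨cos⁶⟩ = 5/16, ⟨cos⁸⟩ = 35/128. An exact-arithmetic check (a verification aid, not part of the text):

```python
from fractions import Fraction as F
from math import comb, factorial

def cos_avg(n):
    """<cos^n phi> over one period, exactly (Eq. 16.5.13)."""
    return F(comb(n, n // 2), 2 ** n) if n % 2 == 0 else F(0)

# <H> = I0 - (4 I0^2/4!)<cos^4> + (8 I0^3/6!)<cos^6> - (16 I0^4/8!)<cos^8>
c2 = F(4, factorial(4)) * cos_avg(4)    # magnitude of the I0^2 coefficient
c3 = F(8, factorial(6)) * cos_avg(6)    # magnitude of the I0^3 coefficient
c4 = F(16, factorial(8)) * cos_avg(8)   # magnitude of the I0^4 coefficient
print(c2, c3, c4)   # -> 1/16 1/288 1/9216
```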
It is a priori unclear how many of these terms are valid, so the same is true of the perturbed frequency derived from this formula. Choosing to "kill" only the f_4(φ_0) term, the leading term of Eq. (16.5.7) yields
Substituting this into Eqs. (16.5.5) yields
As mentioned previously, because of the generating function formalism, the new and old coordinates are still inconveniently coupled at this point. The cost of uncoupling them is further truncated Taylor expansion:
(16.5.20)

The result of reexpressing Hamiltonian (16.5.17) in terms of the new variables (keeping only terms up to I_1⁴) is

H(φ_1, I_1) = I_1 - I_1²/16 + I_1³/288 - I_1⁴/9216 - I_1³ (f_4/6 - f_6/90 + ···) + I_1⁴ (f_4′²/8 - ···) F_4 + ···.   (16.5.21)
At this point, the angle-dependent terms in the Hamiltonian are of order I³. The reason for this is that the order increased from I² (the previous order) by the order of ∂Φ/∂I_1, which was I¹.

16.5.3. Superconvergence
For the particular problem (pendulum) being discussed, an analytic solution in the form of elliptic functions is known, so it is possible to check the formulas that have been obtained. One finds that Eq. (16.5.17) is correct only up to the I² term and Eq. (16.5.21) is correct only up to the I³ term. This is the same "rate of improvement" as has been obtained with the methods described previously in this chapter. What is remarkable is that, when we have completed the next iteration step using the current method, the next result will be correct up to the I⁵ term. The step after that will be correct up to the I⁹ term. In general, the nth iteration yields 2ⁿ + 1 correct powers of I. This is Kolmogorov's superconvergence. To see how this comes about, let us determine the generating function Φ(φ_1, I_2) that "kills" the leading angle-dependent term of Eq. (16.5.21). By Eq. (16.5.7) we have

(16.5.22)

which is of order I³. The order of ∂Φ/∂I_2 is therefore I². After this iteration the angle-dependent part of the Hamiltonian will be of order I⁵. The other statements in the previous paragraph are confirmed similarly. This is superconvergence. The key to its success is the appropriate segregation of time-dependent and time-independent terms at each stage, since this prevents the pollution of lower-order terms in higher-order calculations. The accelerated number of valid terms in each order is inherent to the scheme of iteration.
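The 2ⁿ + 1 rule is the signature of quadratic convergence, the same acceleration displayed by Newton's method for root finding. The toy computation below is an analogy only (it has nothing to do with the pendulum itself): it counts correct digits of √2 under an ordinary linearly convergent iteration and under Newton iteration, whose digit count roughly doubles at each step.

```python
import math

target = math.sqrt(2.0)

def correct_digits(x):
    """Rough count of correct decimal digits of x as an estimate of sqrt(2)."""
    err = abs(x - target)
    if err == 0:
        return 16
    return min(15, int(-math.log10(err)))

# linearly convergent fixed point: x <- x + 0.1*(2 - x*x)
x, linear = 1.0, []
for _ in range(5):
    x = x + 0.1 * (2 - x * x)
    linear.append(correct_digits(x))

# Newton: x <- (x + 2/x)/2; the error is squared at each step
x, newton = 1.0, []
for _ in range(5):
    x = 0.5 * (x + 2 / x)
    newton.append(correct_digits(x))

print(linear, newton)
```

The linear iteration gains digits at a fixed rate, while the Newton digit counts grow roughly geometrically, just as each Kolmogorov step doubles the number of correct powers of I.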
BIBLIOGRAPHY
References

1. N. N. Bogoliubov and Y. A. Mitropolsky, Asymptotic Methods in the Theory of Non-Linear Oscillations, Gordon & Breach, New York, 1961.
2. R. L. Stratonovich, Topics in the Theory of Random Noise, Vol. 2, Gordon & Breach, New York, 1963, p. 97.
3. G. D. Birkhoff, Dynamical Systems, American Mathematical Society, Providence, RI, 1991.
References for Further Study

Section 16.1
F. T. Geyling and H. R. Westerman, Introduction to Orbital Mechanics, Addison-Wesley, Reading, MA, 1971.
Section 16.2
L. D. Landau and E. M. Lifshitz, Mechanics, Pergamon, Oxford, 1976.
Section 16.4
V. Yakubovitch and V. Starzhinskii, Linear Differential Equations With Periodic Coefficients, Wiley, New York, 1975.
Section 16.5
B. V. Chirikov, A universal instability of many-dimensional oscillators, Phys. Rep., 52 (1979).
INDEX

∂/∂x, basis vector, 65 ;, covariant derivative, 147 *, Hodge-star operation, 142 ˙, overhead dot, time derivative notation, 103 ¨, overhead double dot, second time derivative, 103 ,, partial derivative, 147 1, identity matrix, 82 {}, suppressed terms notation, 509, 529 α_i or β_i, Jacobi momenta, 332, 447 A^i = (φ, A), relativistic four-potential, 360 A, electromagnetic vector potential, 53, 360 a, φ, Krylov-Bogoliubov variables, 496 (a_1, a_2, ..., a_n), covariant components, 40 ⟨a, x⟩, invariant product, 40 a(x), invariant product, 40 Abbreviated action, 420, 426, 428 Absolute differential of and bilinear covariant, 109 contravariant vector, 99 covariant vector or constant vector, 100 metric tensor, 100 scalar function or scalar product, 100 Absolute time derivative operator, D_t, 257 Accelerating frame description, 244 Acceleration absolute, 103 fictitious, 248 Accuracy of adiabatic invariant, 439 Action, 307 abbreviated S_0, 335, 420, 426 abbreviated, parametric oscillator, 428 adiabatic invariance of, 422 approximate invariance, 427 expanded in powers of ħ/i, 346 generator of canonical transformation, 415 h, Planck constant, 344
multidimensional, 447 principle of least, 181, 356 related to eikonal, 325 related to Lagrangian, 325 relation to period of motion, 425 relativistic, 356 relativistic, including electromagnetism, 360 simple harmonic oscillator, 422 spatial dependence, Hamilton-Jacobi, 345 Action variable. See Action-angle variable Action-angle conjugate variable, 426 conditionally periodic motion, 448 Fourier expansion, 429 -like K-B parameterization, 496 variable, 421, 446 Active/passive interpretation, 41, 59, 259 Adiabatic approximation, Foucault pendulum, 289 condition, 310, 346, 427, 430, 439 invariant, action is, 422 use of the term in thermodynamics, 426 Adiabatic invariance, 289, 408 accuracy of conservation, 439 charged particle in magnetic field, 434 importance for quantum mechanics, 426 magnetic moment in magnetic bottle, 437 proof of invariance, 425 R.I.I., 409 Adjoint equation, 81, 477 equation, linear Hamiltonian system, 463 of matrix, 81 of single-period transfer matrix, 476 Affine centered-, transformation, 90 connection, 94 transformation, 90
Alternate coordinate ordering, Hamiltonian, 395 Ampère law, mathematical and physical arguments for, 401 Analogy eikonal/action, 343 optics/mechanics, 415 optics/mechanics/quantum mechanics, 343 Angle in metric geometry, 90 Angle variable, 444. See also Action-angle variable adiabatic dependence, 442 defined, 427 proportional to I, 446 Angular momentum, 169, 216 conservation, 224, 233 rate of change, 223 Anharmonic oscillator, 489 first-order solution, 492 potential illustrated, 493 second-order solution, 494 zeroth-order solution, 490 Anholonomy, 285, 295, 300 Anisotropic medium, 312 Annihilation, 56 Anomaly, 28, 449, 486 Antisymmetrization, 140 Apparent time derivative, 246 velocity, acceleration, 247 Approximation accuracy of adiabatic invariance, 439 analytic basis for, 415 linear, 449 need to preserve symplecticity, 415 using linear expansion, 43 Area integral, 151 projected onto coordinate planes, 377 Pythagorean relation for, 141 Aries, star fixing astronomical axis, 337 Ascending node, 339 Association 2 × 2 matrix and vector, 274 2 × 2 matrix with vector or bivector, 156 3 × 3 matrix and vector, 276 z with dH, 383 angular momentum as matrix, 281 cross product, commutator, 157 dot, cross, triple products, 157 plane and vector, 134 spinor with vector, 155 torque as matrix, 280
trajectory and differential of Hamiltonian. 382 trivector with matrix, 157 vector or bivector with 2 x 2 matrix, 156 vector, plane, and matrix, 134 vectorlform, induced by &2), 381 Asteroid, 254 Autonomous oscillators, qualitative analysis, 502 Autonomoudnonautonomous, 489 Average, 423 Averaged perturbed Kepler equations, 481 projection operator, 5 15 scalar product, linear system, 476 Avoided line-crossing, 12 Axis body, 178 laboratory, 178 space, 178
pi or Q i,Jacobi coordinates, 332 Ball rolling without slipping, 213 Base frame, 91 Basis orthonormal. 133 reciprocal, 71 symplectic, 386 Basis vector linear Hamiltonian system. 462 partial derivative notation, 64 Beam of particles, 307 Bifurcation, 504 Bilinearcovariant, 47, 109 independence of coordinates. 1 10 related to quasi-velocity, 176 Bilinear form, 18 Bivector, 139 and infinitesimal rotation, 143 association with rotation, 156 Body -axes, 180 -axis principal moments of inertia, 209 -frame Poincark description, 208 -frame expression for kinetic energy, 209 and space frames, 270 frame, 244 Bohr-Somerfeld theory. 426 Boundary value formulation, ray tracing, 320 Bowling ball, 2 14 Bra and ket form linear Hamiltonian system. 462 multidimensional system, 5 16 Bra. Dirac, 8 1
Bracket Lagrange, 483 Poisson. See Poisson bracket Bragg law, 76 scattering, or reflection, 74 c, speed of light, 348 c^i_jk, structure constant, 172
Calculus differential form, 50 of variations, 325 of variations. basic problem, 181 of variations, vector field formulation, 183 Canonical. See also Symplectic momentum, 305 momentum one-form, 376 perturbation theory, 528 transformation, 415, transformation. time-independent, 4 19 Cartan, 37 mkthode du repkre mobile, 257 matrix n, 206,257,261 marrix Q as bivector, 144 matrix Q related to angular velocity, 145 Cartesian axes, 37 basis. 66 vectors in metric geometry, 84 Cat, how it lands on its feet, 301 Catalytic, 50 Cayley-Klein parameters, 154, 159 Central force. perturbed equations of motion, 488 Centrifugal acceleration, 247 force, 244,248 meteorological unimportance, 249 Centroid, 29,208 variables, 212 Chaotic motion, 303,492 Characteristic exponent, or tune, 471 multiplier, 471 Christoffel , form, W i J , 95 cylindrical coordinates. 107 derived from metric, 96 evaluation using MAPLE, 107 polar coordinates, 102 practical evaluation, 106 spherical coordinates, 107 95 symbol, riJk, Circulation, of one-form, 150,407
Close-to-identity matrix A;, 190 transformation, 190 Coherence, 75 Commutator Euclidean, 2 13 matrix, 156 quantum mechanical, and Poisson bracket, 398 quasi-basis vectors, 125 relation to loop defect, 123 rotation and translation, 214 same as Lie derivative, 126 and structure constants, 200 vector fields, 122 Complete integral examples, 330,343 CompI ex multiple meanings of word “complex,” 380 Complex conjugate, 81 Component curvilinear, 89 skew, 90 Composition of transformations, 59 Concatenation, 59,321 requirement for continuous group, 191 Condition adiabatic, 3 10.440 extremal, 183, 184. 318,416 Floquet, 468 Hamilton-Jacobi transverse, 327 Hamiltonian or infinitesimally symplectic, 459 imposed to yield unique solution, 87,477 magnetic trap velocities, 435 for matrix to be symplectic, 389 minimum, or extremum, 3 18 orthogonality on the average, 5 14 osculation, Lagrange planetary equations, 482 periodicity, 467 resonance, 526 single-value. 400 subsidiary, to make solution unique, 87 symmetric/antisymmetricsymplectic distinction, 523 validity of geometric optics, 3 10, 346 validity of magnetic trap analysis, 434.40 validity of semiclassical treatment, 345 Conditionally periodic motion, 442 Kepler problem, 447 Congruence, 3 16 lie-dragged, 126
Congruence (con?.) n o crossing requirement. 372 of curves, 123,372 Conjugate momentum, 305 syrnplectic,of matrix, 389 Connection affine, 94 symmetric, 94 Conservation angular momentum, Lagrange approach, 224 angular momentum, Poincad approach, 223 cancellation of internal forces, 224 and cyclic coordinates, 226 energy, 24,225 linear momentum, 24,221,331 magnetic flux linked by guiding center, 439 multiparticle laws, 221 Noether’s theorem, 230 of charge, misleading application, 401 reduction, 226 symmetric top, 227 and symmetry, 221 Constraint holonomic, or integrable, 165 nonholonomic.or nonintegrable, 167 Continuity equation, 53,315 Continuous group concatenationrequirement, 191 examples, 194 investigationusing MAPLE, 196 Contour equivalue, 40 reduced three-body problem, 253 Contour integration,441 Contraction of indices, 63 of indices theorem, 62 Contravariant relativistic components, 353 vector, 39 Control of internal configuration,303 parameter,450,504 Coordinate affne connection, 94 complex, 66, 156.286 congruence, 123 curvilinear, 89, 146 flawed, 96 genzralized, 165 ignorable, 175,230 longitudinal, I82
in magnetic trap, 434 skew, 68 transformation, 56 transverse, 182 Coriolis acceleration,247 force, 244,248 force, free fall subject to, 250 force, importance for satellite motion, 25 1 force, meteorologicai consequence. 250 Correspondence, quantudclassical, 398 Cosine of angle in metric geometry, 90 -like trajectory, 314 Coupled motion of two interacting particles, 449 Covariant component, 70 manifest, 363 relativisticcomponents. 353 vector, 37.39 vector, as pair of planes, 37 Curl analog of exterior derivative, 153 as differential form, 52 Curvature, radius of Frenet, 240 particle orbit, 146 Curvilinear coordinate, 89, 146.237 Cyclic (or ignorable) coordinate, 226,229 Cyclotron frequency, 434 6 ; j , Kronecker,91,354 6(x). Dirac delta function, 60
D,absolute differential, 97, 103 Dt ,absolute time derivative operator, 257 D,,gauge-invarianttime derivative operator, 259 d or 6, equivalent differential notations, 47 apparent time derivative, 246 d x , “ordinary” differential, 43 dJdA, vector field notation, 121 dx, differential form, 43 d’Alemben’s principle, 479 Darboux’s theorem, 387 deBroglie frequency/energyrelation, 345 relation, 78 wavelength, 346 wavelengthhornenturn relation, 345 Degeneracy, breaking the, 451 Density, in phase space, conserved, 412
Derivative covariant, 109 directional,along ray, 3 15 exterior. See Exterior differential invariant, 109 Lagrange, 105 second-order,related to commutator, 186 total, 103 Diagonalization, 12, 18 linear Hamiltonian system, 45 1,458 matrix, 458 Differential covariant,or absolute, 97 D,absolute differential, 103 d or 8, equivalent differential notations, 47 d_x, ‘‘ordinary” differential, 43 df of function f ( x ) , 42,375 exterior. See Exterior differential related to gradient. 65 Differential form ad hoc notation, 46 calculus, 46 closed, 47 divergence, 153 exact, 48 Gauss’ theorem, 147 geometric interpretation, 42 integration and differentiation, 150 “old-fashioned,”47 Stokes’ theorem, 150 surface integral, 15 1 and vector calculus, 50 Dimensional considerations, 3 1 Dirac bra and ket, 81 8-function, 60 Distribution function, 60 parameters, 60 Diver. See Gymnast Divergence, 154 as differential form, 52 generalized, 147 metric-free definition, 153 theorem. See Gauss’s theorem Dot product, 67 Drift azimuthal, in magnetic trap, 439 longitudinal, in magnetic trap, 437 Dual space, 54 vector, 54 Dyadic, 273
Dynamical system, 81, 226, 312, 314, 528 Dynamics, 105 ε_ijk, Levi-Civita three-index symbol, 73
E/B, electric/magnetic field, 363 E, relativistic energy, 357 E_0 = mc², relativistic rest energy, 357 e_j, basis vector, 58
2,basis covector, 58 Earth, moon, sun system, 25 I Eccentric anomaly, 28,449,486 Ecliptic plane, 337 Effective potential, 25 Eigenvalue dependence on control parameter, 452 labeling, 83.452 linear Hamiltonian system, 459 symplectic matrix, 392 Eigenvector,83 degenerate, 45 1 four degrees of freedom, 45 1 linear Hamiltonian system, 461 multidimensional linear system, 461 special, in extended phase space, 406 symplectic matrix, 391 Eikonal, 310,323,325 equation, 310,401 Einstein-Minkowski metric, or law, 67, 350 Eliminating velocity terms, 452 Energy kinetic, 29, 104, 166, 181 potential, 173,205,497 relativistic, 357 Equality of mixed partials, 199 Equation anharmonic oscillator, 489 Bragg. 76 coupled oscillator, 12,449 eikonal, 310 Floquet, 470 Hamilton-Jacobi equation for S, 325 Hamilton’s. See Hamilton homogeneous, 85 inhomogeneous, 457 Laue, 71 linear, periodic, Hamiltonian system, 469 Lorentz, 352,362 multidimensional,perturbed, 5 11 multidimensional,unperturbed, 5 1 I Newton, 237 Poincar6, 172 rolling ball, 215 Schrodinger, 343
Equation (conr.) variational, 472 wave, 309 Equatorial plane, 338 Equivalent damping (or growth) rate, 499 linearization, K-B method. 499 spring constant, 499 Ergodic theorems, 254 Essential parameter, 191 Euclidean basis, 66 Euler -Lagrange equation, 182 -Poisson equation, rigid body in force field, 21 1 angle, 178, 179, 296 equation, Lie algebraic derivation. 28 1 rigid-body equation, Paincad-derived, 21 1 Evolution. active/passiveinterpretations, 259 Examples action variables, 448 adiabatic invariance. 289,429 adiabatic invariantsfor magnetic trap, 434 advance of perihelion of Mercury, 485 astronaut reorientation, 300 beads on a string, 22 bowling ball, 2 14 canonical transformation, 429 charged particle in electric field, 364 charged particle in electromagnetic field, Hamiltonian,372 charged particle in magnetic field, 365. 432 commutation of group generators, 207 complete integral, 343 conservation and symmetry, 225-227 constrained motion, Poincart, 216 continuous transformation group, 194 contour map of Colorado, 44 Coriolis force, 25 1,254 differential forms, 50 exterior differential, 50.52 falling cat, 295 fiber optics, 3 13 fictitious force, 249 Foucault pendulum, 284 free fall in rotating frame, 250 geometric phases, 295 grandfather clock autonomous oscillator, 506
  Hamilton's equations, 372
  Hamilton-Jacobi method, 342
  Hamiltonian, 342
  Kepler problem. See Kepler
  Krylov-Bogoliubov method, 497
  Lie algebra, 275, 277
  magnetic trap, 434
  matrix optics, 320
  multidimensional perturbation theory, 518
  one-dimensional potential, 6
  parametric oscillator, action-angle approximation, 428
  pendulum, 174, 284
  pendulum anharmonic oscillator, 497
  pendulum, superconvergent analysis, 531
  perturbed oscillators, 429
  plumb bob. See Plumb bob
  Poincaré equation using group theory, 211
  Poincaré method, 303
  projectile, Hamilton-Jacobi treatment, 330, 334
  relativistic trajectory determination, 364
  rigid body. See Rigid body
  rolling cart, 217
  rolling inside cylinder, 217
  semiclassical quantum mechanics, 345
  simple harmonic motion, Hamilton-Jacobi treatment, 23, 335
  skateboard, 216
  solution by iteration, 250
  successive approximation, 250
  symmetric top, 227
  symplecticity in optics, 320
  three body problem, 251
  tippie-top, 218
  trampoline, 304
  two interacting particles, 49
  Van der Pol oscillator, K-B treatment, 503, 509
  variable-length pendulum, 431
  weakly coupled oscillators, 12
  worked, 6
Expansion, impossibility of remote, 94
Exponentiation, 120, 274
  matrix, 274, 457, 469
  representation of parametrized curve, 120
Extended phase space, 402
  displacement vector, 406
  one-form, 405
  simple harmonic oscillator, 403
  skew-scalar product, 406
  special eigenvector, 406
  trajectories, 404
  two-form, 405
Exterior differential, 50, 153, 405
  defined, 52
  independence of coordinates, 110
Extremal, 181
F(a, ψ), G(a, ψ), K-B functions, 497, 507
Falling cat, 295
Fermat principle, 308, 318
Few-particle system shapes, 302
Fiber optics, 313
Fictitious force, 248
  Foucault pendulum, 286
  gauge-invariant reconciliation, 262
Field
  electric, E, 53, 363
  electromagnetic, derived from four-potential, 53, 362
  magnetic, B, 53, 363
Fixed points, 254
Floquet
  analysis of periodic system, 468
  equation, 469
  pseudo-harmonic description, 469
  theorem, 468
  variables, 469
Flux, 401
  see divergence, 154
Force
  central, 488
  centrifugal, 244, 259
  conservative, Krylov-Bogoliubov treatment, 497
  Coriolis, 244, 259
  fictitious, 248
  generalized, 105, 170
  intensity, 209
  wake, 521
Form, 40
  fundamental, 67
  linear, 55
  metric, 67
  n-, 153
  one-, 40, 150, 375
  symplectic (or canonical) two-, 378
  symplectic, as phase space "metric," 386
  two-, 51, 153
Form-invariant, 256
  Maxwell's equations, 256, 348
  requirement of relativity, 348
Formalism
  gauge-invariant/fictitious force comparison, 251
  Hamilton-Jacobi, 325
  spinor, 154
Foucault pendulum, 284
  fictitious force solution, 286
  gauge-invariant method, 287
Four-vector, 353
Fourier
  expansion in terms of angle variable, 429
  expansion of perturbation, 502
  expansion, multidimensional perturbation theory, 512
  expansion, purpose for angle variables, 447
Frame
  astronomical, 337
  inertial, 91, 244
Frenet-Serret
  description, 293
  expansion of ω, 243
  formula, 240
  limited applicability in mechanics, 243
Frequency shift
  anharmonic oscillator, 494
  multidimensional perturbation theory, 511
  perturbation series for, 517
Function
  linear, 55
Functional, 179
Γijk, Christoffel symbol, 95
G(a), growth rate profile, K-B method, 502
G(q, Q), generating function, 417
Galilean
  invariance, 257
  relativity, 349
Gauge-invariant, 255
  angular motion definitions, 264
  description of rigid body motion, 271
  electromagnetism, 363
  fictitious force reconciliation, 262
  form of Newton's equation, 256, 259
  Foucault pendulum, 287
  Gauss's theorem, 149
  manifest covariant, 258
  mechanics, 237
  Newton torque equation, 268
  rigid-body description, 279
  single-particle equation, 276
  time derivative operator, 259, 280
  torque, 265
Gauss's theorem
  gauge-invariant form, 147
  generalized, 147
  as a special case, 153
Gaussian optics, 319
Generalized
  coordinate, 238
  gradient, 327
  rotation, 132
  velocity, 238
Generating function, 417
  F3(p, Q, t), 417
  F4(p, P, t), 417
  G(q, Q, t) ≡ F1(q, Q, t), 417
  S(q, P, t) ≡ F2(q, P, t), 417
  inherent impracticality of, 418
Generator
  commutation relations for, 201
  of unique solution, S, 87, 516
  transformation group, 200
Geodesic
  Euler-Lagrange equation, 104
  great circle, 292
Geometric
  optics, 307
  optics, condition for validity, 310
  phase, 284
Geometry
  differential, of curve, 240
  Euclidean, 66
  generalized Euclidean, 132
  linear, 35
  metric, 66
  n-dimensional, 89
  ordinary, 89
  symplectic, 373, 385
  synthetic, 54
  unitary, 80
  vector, 239
Gradient, 355
  related to differential form, 43
Grandfather clock, 506
Gravitational acceleration, effective, 268
Great circle, 292
Green's function solution, driven system, 475
Group
  commutation relations, 200
  infinitesimal operator, 193
  Lie, 188
  n coordinates, x¹, x², ..., xⁿ, 189
  operator as vector field, 193
  parameters as coordinates, 191
  r parameters, a¹, a², ..., aʳ, 189
  structure constants, 172, 201
  transformation, continuous, 188
  transitive, 191
  velocity, 324
Growth (or damping) rate
  multidimensional system, 518
  Van der Pol oscillator, 499
Guiding center, 434
  drift in magnetic trap, 437
Gymnast, 295
  control, 303
  external orientation, 298
  external possibilities, 303
  initial angular momentum, 301
  internal angles, 298
  internal configuration control, 303
  mass, shape, and orientation parameters, 298
  modeled by point masses, 298
  positions, 297
  principal moments of inertia, 296
  trampoline stunts, 304
Gyration, 432
  illustrated, 434
  in magnetic trap, 436
Gyroscopic terms, 453
  do not break symplecticity, 454
  reduced three-body problem, 254, 453
H.I., Hamiltonian variational line integral, 416
H/ΦM, magnetic field/potential, 400
h, Planck constant, quantum of action, 344
  ℏ = h/2π, 344
Hamilton, 324
  characteristic function, 420
  equations, 371, 417
  matrix, linear system, 456
  original line of thought, 319
  point characteristic, 417
  G(q, Q), 417
  W(x1, x2), 322
  principle, 307, 356
  principle of least action, 181, 356
  variational line integral H.I., 416
Hamiltonian, 305, 326
  symmetric/antisymmetric requirements, 523
Hamilton's equations
  conditionally periodic motion, 447
  in action-angle variables, 427
  in embryonic form, 322
  in matrix form, 383
  linear Hamiltonian system, 453
  linear system, 456
  multidimensional, perturbed, 511
  multidimensional, unperturbed, 511
Hamilton-Jacobi, 325
  and quantum mechanics, 343
  canonical transformation approach, 415
  equation, 326
  geometric picture, 327
  inverse square law potential, 339
  nonintrinsic discussion, 327
  relations, 327, 356
  relativistic, 359
  simple harmonic motion, 335
  transverse condition, 327
Hamilton-Jacobi equation
  derived from Schrödinger equation, 343
  energy E as Jacobi momentum, 335
  Kepler problem, 339
  projectile, 330
  relativistic, 359
  relativistic, electromagnetic, 362
  separability, 443
  Stäckel analysis, 443
  time-dependent, 418
  time-independent, 335, 419
Hamiltonian
  charged particle in magnetic trap, 436
  defined, 370
  differential of, 382
  expressed in terms of action variables, 446
  longitudinal coordinate as independent variable, 367
  matrix formulation, 456
  matrix symmetry related to symplecticity, 458
  perturbed, 422
  quadratic, for linearized motion, 367
  relativistic, including electromagnetism, 362, 367
Hard or soft spring, sign of cubic term, 493
Harmonic balance in K-B method, 501
Heisenberg/Schrödinger picture, 513
Hermitean
  non-, matrix eigenvectors, 463
  product, 81, 464
Hodge-star operation, 53, 142
Holonomy, 285
  holonomic drive or propulsion, 300, 304
Hooke's law, 415, 493
Huygens
  construction, 323
  principle, 322
  principle, proof of, 323
Hydrogen atom and Kepler problem, 339
Hyperplane, 68
Hysteresis
  autonomous oscillator, 505
  cycle, 503
I.I., Poincaré-Cartan integral invariant, 403
I, action variable, 422
I = R(0), identity transformation, 189
ℑ, imaginary part, 309
Identity
  matrix 1, 82
  transformation, I = R(0), 189
Ignorable (or cyclic) coordinate, 226, 230
Inclination of orbit, 339
Index lowering, 70
Index of refraction, 181, 309, 319
Inertial frame, 91, 244
Inexorable motion, 251
Infinitesimal
  generator of group, 200
  generator, commutation relations for, 205
  group operator, 193
  rotation operator, 207
  translation operator, 207
Inhomogeneous equation, 457
Initial value formulation, ray tracing, 320
Instability threshold, coupled system, 520
Integrability, 229
Integral
  complete, Hamilton-Jacobi, 329, 332
  and derivative of form, 150
  evaluation by contour integration, 444
  general, Hamilton-Jacobi, 329
  Hamilton, H.I., 416
  invariant, 197
  of motion, exploitation, 466
  particular, 492
Integral invariant, 316, 400
  absolute, 401
  invariance of I.I., 406
  Poincaré relative, 405
  Poincaré-Cartan, 403
  R.I.I. as adiabatic invariant, 409
  R.I.I., dimensionality, 409
  R.I.I., time-independence, 409
  relative, 409
Integration/differentiation of forms, 150
Intensity variation along ray, 314
Interface, spherical, 319
Intrinsic, 40
  nature of vector analysis, 375
Invariance
  adiabatic, of action, 422
  form, 348
  gauge, 364
  symplectic two-form, 379
Invariant, 41
  area, volume, etc., in phase space, 412
  form, 256
  Galilean, 257
  gauge-. See Gauge-invariant
  integral, 399
  Lagrange integral, 316
Invariant (cont.)
  multivector measure in phase space, 413
  Poincaré-Cartan, 316
  product, 40, 67
Inverse
  matrix, 57
  of symplectic matrix, 389
Involution
  in-, Lagrangian solution set, 465
  in-, linear Hamiltonian solutions, 465
  in-, property of solutions, 386
Isomorphism, vector/form, induced by the two-form, 381
Isotropic vector, 67, 155
Iterative solution
  anharmonic oscillator, 489
  first-order solution, 492
  second-order solution, 494
  zeroth-order solution, 490
(J1, J2, J3), rotation matrices, 145, 273, 279
Jacobian matrix, 373
Jacobi
  identity, 130
  initial time, β0, 334
  integral, 252
  method, 332
  new coordinate parameters, 332
  new coordinates and momenta, 340
  new momentum parameters, 332
  parameter, 420
  parameter, nature of, 334
  parameters, Kepler problem, 339, 447, 480
  theorem, 332, 397
Jupiter satellite at Lagrange point, 254
k, wave vector, 74, 344
KAM, Kolmogorov, Arnold, Moser, 528
K-B, abbreviation for Krylov-Bogoliubov, 495
Kepler
  geosynchronous orbit, 249
  Jacobi integral, 254
  orbit, 253
  orbit trigonometry, 341
  problem, 25
  reduced three-body problem, 251
  satellite orbit, 254
  sun, earth, moon system, 253
Kepler problem, 25, 337, 447, 480
  action elements, 447
  canonical momenta, 339
  conditionally periodic motion, 448
  equality of periods, 449
  Hamilton-Jacobi treatment, 339
  Hamiltonian, 339
  and hydrogen atom, 339
  Jacobi parameters, 447
  perturbation of, 480
  zero eigenvalue, 472
Ket, Dirac, 81
Killing terms in canonical perturbation theory, 531
Kinematic, 105
Kinetic energy
  expressed in quasi-velocities, 169
  space and body formulas, 273
Kolmogorov superconvergent perturbation theory, 528
Kronecker δ, 91, 354
Krylov-Bogoliubov method, 495, 507
  equations (exact) in standard form, 497, 507
  equivalent damping and spring constant, 499
  first approximation, 495
  higher approximation, 507
  power and harmonic balance, 501
Λij, transformation matrix, 58
⟨⟨λ|μ⟩⟩, averaged scalar product, 476
L.I.I., Lagrange integral invariant, 316
Lagrange
  bracket, 483
  bracket related to Poisson bracket, 484
  brackets for Kepler problem, 488
  identity, 73
  integral invariant, L.I.I., 316
  planetary equations, 480, 484
  planetary equations, explicit, 488
  set of solutions in-involution, 465
  stable/unstable fixed points, 254
  stationary points, 253
Lagrange equation, 182
  equivalence to Newton equation, 105
  from absolute differential, 101
  from geodesic, 104
  from variational principle, 104
  related to ray equation, 183
  review, 165
Lagrangian
  expressed in quasi-velocities, 214
  inclusion of potential energy, 166
  mechanics, 163
  related to action, 325
  relativistic, including electromagnetism, 360
  set of solutions in involution, 386
Laplace transform method, 21, 286, 491, 492
Larmor theorem, 255
Lattice plane, 74
Laue equation, 77
Law
  Ampère's, 400
  Einstein-Minkowski, 67
  Hooke's, 415, 493
  Newton, "violation", 522, 528
  Pythagorean, 67
Least time, principle of, 317
Left eigenvector, 83
Legendre transformation, 370
Leibniz rule, 130
Lens-like medium, 313
Levi-Civita three-index symbol, 73, 355
Libration, 11
  in multiperiodic motion, 445
Lie algebra, 129, 262
  derivation of Euler equation, 281
  rigid-body description, 271, 279
  structure coefficient or commutation coefficient, 200
Lie derivative
  contravariant components, 120
  coordinate approach, 111
  Lie algebraic approach, 120
  related to absolute derivative, 119
  same as vector field commutator, 126
  scalar function, 116
  vector, 116
Lie theorem, 200
Lie-dragged
  congruence, 126
  coordinate system, 111
  scalar function, 116
  vector field, 116
Light cone, 349
Limit cycle, Van der Pol oscillator, 499
Line of equinoxes, 338
Linear system, 449
  averaged scalar product ⟨⟨v(t), u(t)⟩⟩, 476
  averaged scalar product ⟨⟨λ|μ⟩⟩, 476
  constant coefficient equations, 449
  Hamiltonian, 456
  Hamiltonian, eliminating velocity terms, 452
  Hamiltonian, periodic, 468
  unique solution generator S, 477
Linearization, equivalent, 499
Linearized, 43
  change of function, 203, 374
  coordinate translation, 95
  equations of motion, 449
  function variation, 43
  introduction of tangent space, 171, 202
  Lie-dragged coordinate transformation, 115
  motion, 367
  ray equation, 314
Linstedt method, 490
Liouville
  determinant formula, 461
  symplectic transfer matrix requirement, 410
  theorem, 314, 380, 405, 410
Logarithm of matrix, 457, 470
Longitudinal coordinate as independent variable, 366
Loop defect, 122
Lorentz
  "rotation," 351
  force law, derived from Lagrangian, 362
  transformation, 351
  velocity transformation, 352
Lyapunov's theorem, 470, 511
Magnetic trap, or bottle, 434
  guiding center, 434
  invariants I∥ and I⊥, 437
  magnetic moment invariant, μ, 437
Manifest
  covariance and gauge invariance, 258
  covariant, 45
Manifold, 154, 374
MAPLE solution, 4
  Christoffel symbol evaluation, 107
  investigation of 3-D rotation group, 196
  particle in one-dimensional potential, 6
  reduced three-body problem, code not shown, 252
  two weakly coupled oscillators, 12
Matrix
  2 x 2, relation between inverse and symplectic conjugate, 389
  J = -S, in Hamilton's equations, 384
  S = -J, in Hamilton's equations, 384
  associated to torque, 281
  associated with angular momentum, 281
  associated with plane, 134
  association with vector or bivector, 156
  commutator, 156
  composition, 59
  concatenation, 59
  conventions, 57
  diagonalization, 12, 18, 451, 458
  element, multidimensional perturbation theory, 517
  equations of motion, 450, 456
  exponentiation, 274, 457, 469
  fundamental solution, 467
  Hamiltonian, 458
Matrix (cont.)
  infinitesimally symplectic, 458
  inverse of symplectic, 390
  Jacobian, 374
  logarithm, 457, 470
  monodromy, 467
  optics, 319
  Pauli spin, 156
  single-period transfer, 467
  symplectic conjugate in block form, 391
  symplectic, determinant = 1, 389
  transfer. See Transfer matrix
  transformation to normal coordinates, 451
Matrizant, 467
Maxwell equations, 52
Mechanics
  gauge-invariant, 237
  Hamiltonian, 305
  Lagrange-Poincaré, 164
  Lagrangian, 163
  Newtonian, 235
  quantum, 329, 343
  related to optics, 343
  relativistic, 348
  symplectic, 369
  vector, 237
Mercury, advance of perihelion, 485
Method
  action-angle, 426
  adiabatic invariance, 435
  approximate, 413
  averaging fast motion, 435
  canonical transformation, 416
  Fourier expansion, nonlinear, 490
  generating function, 417
  Hamilton-Jacobi, 418
  invariants of slow motion after averaging fast, 435
  iterative, 250, 489
  Jacobi, 332
  Krylov-Bogoliubov (Mitropolsky), 495
  Linstedt, for anharmonic oscillator, 490
  perturbation, 479
  semiclassical, or WKB, 344
  separation of variables, 329
  symplectic perturbation, 510
  variation of constants. See Variation of constants
Metric
  artificial Pythagorean metric in phase space, 385
  Einstein-Minkowski, 350
  form or tensor, 67, 90, 354
  geometry, 66
  Hermitean, 476
Michelson-Morley experiment, 349
Minkowski metric, 67, 350
Mixed partials, equality of, 199
Mode shape, 23
Moment of inertia
  ellipsoid, 273
  tensor, 30, 273
Momentum
  canonical one-form, 376
  canonical, in magnetic trap, 436
  conjugate, 188
  conjugate of θ, particle in magnetic field, 431
  conservation, 221
  conserved, 188
  from action S, 332
  from action S(q), 327
  from eikonal, 325
  from Lagrangian, 222, 305, 357, 369
  in geometric optics, 319
  not essential to Lagrangian description, 172
  one-form yields canonical two-form, 379
  quantum mechanical relation to wavelength, 345
  relativistic, 357
Monodromy matrix, 467
Moon, earth, sun system, 251
Motion
  autonomous, 296, 303
  nonautonomous, 296
Moving frame method, Cartan's, 257
Multidimensional system, 28, 443, 449
Multivector, 137
  area, volume, etc., 140
  Hodge-star, 142
  invariant measure, 140
  measure in phase space, 411
  p-vector, 140
  supplementary, 142
Natural
  basis vectors, 92
  frequency, 429
Near-symplectic perturbation theory, 510
New
  coordinates Qi and momenta Pi, 416
  coordinates and momenta, 332
Newton
  equation, 237, 256
  law, 237
Noether
  invariant, 232
  theorem, 230
Nonlinear. See Anharmonic oscillator
Normal mode analysis, 19, 451
Notation, xxiii
  multiple meanings of xi, 124
  relative/absolute integral invariant, 409
Ω, Cartan matrix, 144
ωij, Christoffel form, 95
1, identity matrix, 82
Old coordinates qi and momenta pi, 416
O.P.L., optical path length, 181, 309, 317
Operator
  infinitesimal group, 193
  vector field, 121
Optics
  angular momentum analog, 314
  geometric, 307
  matrix, 319
  paraxial or Gaussian, 319
  related to mechanics, 307, 343
Orbit
  element, 27
  element, planetary, 338
  equation, 26
Orientation
  inner/outer, 39
  of orbit, 339
Orthogonal
  condition, 85
  on the average, 514
  vector, 68, 81
Orthogonal matrix
  eigenvalues, 196
  orthonormal rows/columns, 156
  parameterization, 206
Oscillator
  action S0(q, E), 336
  action-angle approach, 427
  anharmonic, 489
  curve of constant energy, 337
  damped and driven, 23
  multidimensional, 18, 443
  new Hamilton's equations, 426
  new Hamiltonian, 426
  parametric, 428, 431
  parametric, new Hamiltonian, 429
  phase space trajectory, 337
  Van der Pol, 498
  weakly coupled, 12
Osculating plane, 240, 482
φ, electromagnetic scalar potential, 53, 360
Pi or αi, Jacobi momenta, 332
P, projection operator, multidimensional perturbation theory, 514
p, momentum, 305, 325, 369
p/P, mechanical/generalized momentum, 361
Pair of planes as covariant vector, 37
Parallel
  displacement of coordinate triad, 291
  displacement, Levi-Civita, 291
  pseudo-, 117
  translation, 291
Parallelism, 94
Parameter
  control, 450
  essential, 191
  independent, 330
  Jacobi, 334, 420
  reduction to minimal set, 206
Parametric oscillator, 422, 430, 431, 521
Paraxial optics, 319
Particle mass as gravitational "charge," 211
Pauli spin matrix, 156
  algebra, 275
Pendulum
  canonical perturbation method, 530
  example of Poincaré approach, 174
  Foucault, 284
  K-B method, 497, 506, 510
  variable length, 431
Perigee, 339, 486
Perihelion, 485
Periodic solution of inhomogeneous linear equation, 474
Periodic system, 467
  characteristic exponent, 470
  characteristic multiplier, 470
  conditionally, motion, 442
  linear Hamiltonian system, 468
  variation has 1 as characteristic multiplier, 474
Perturbation
  central force, 488
  Kepler equations, 480
  parametric of Hamiltonian, 422
  of periodic system, 1 as multiplier, 473
  series, 518
  theory, 479
  time-dependent, multidimensional, 521
  weak damping, 23, 456, 518, 526
Perturbation theory, 479
  canonical, 528
  Fourier expansion, 512
  Fourier expansion of perturbation, 502
  Lagrange planetary equations, 480
  multidimensional, 510
Perturbation theory (cont.)
  superconvergent, 528
  time-dependent, 521
  unperturbed action-angle analysis, 427
Pfaffian form, 95
Phase
  -like, action S, 329
  advance, rate of, 311
  astronaut exercise, 300
  geometric, 284
  velocity, 309, 324
Phase space
  artificial Pythagorean metric, 384
  configuration space comparison, 372
  density conserved, 412
  extended, 403, 405
  measure of area, volume, etc., 411
  no crossing requirement, 372
  orbit illustrated, 445
  rotation, 445
  trajectory of oscillator, 337
Photon
  energy, 78
  momentum, 78
  trajectory, 324
Planck's constant, 78, 344, 408
Plane
  ecliptic, 338
  equatorial, 338
Plumb bob, 264
  fictitious force method, 269
  gauge-invariant method, 270
  inertial frame force method, 267
  inertial frame torque method, 268
  transformation method, 270
Poincaré
  relative integral invariant, 404
  terminology for integral invariants, 400, 403
  variational equations, 471
Poincaré equation, 166, 172
  and rigid body motion, 178
  derivation using vector fields, 181
  examples, 174, 180
  features, 175
  generalized Lagrange equation, 165
  in terms of group generators, 205
  invariance, 176
  rigid body motion, 213
  simplification using group theory, 188
Poincaré-Cartan integral invariant, 402
Poisson bracket, 396
  and quantum mechanics, 398
  in perturbation theory, 398
  properties, 397
  related to Lagrange bracket, 484
  solutions, in-involution, 463
Poisson theorem, 397
Potential
  effective, 25
  gravitational, 210
  Morse, 6
  multidimensional, 18
  one-dimensional, 6
  relation to potential energy, 209
  scalar, 360
  vector, 360
Potential energy
  and generalized force, 105, 173
  derived from potential, 209
  inclusion in Lagrangian, 173
Power balance, in K-B method, 501
Precession
  of rigid body, 219
  of orbit, Einstein, 485
  Foucault pendulum, 289
Principal
  axes, 30, 280
  normal, 240
Principle
  of constancy of speed of light, 348
  d'Alembert's, 479
  Fermat's, 318
  greatest (proper) time, 351
  Hamilton's, 181
  Huygens', 322
  of least time, 317
  of least time, relativistic, 356
  variational, 316
Product
  exterior, 140
  Hermitean, 81
  inner, explicit evaluation, 381
  skew-scalar symplectic, 386
  skew-scalar, in various forms, 388
  symplectic, linear Hamiltonian system, 462
  tensor, 377
  wedge, 140, 377
Projected area, 378
Projection, 56
  operator P, multidimensional perturbation theory, 514
Proper
  distance, 350
  time, 350
Pseudo-harmonic solution, 469
Pseudo-parallel, 117
Pythagorean
  law, 35, 67
  relation for areas, 141, 412
Qi or βi, Jacobi coordinates, 331
Qualitative analysis, autonomous oscillators, 502
Quantum mechanics, 328
  commutator from Poisson bracket, 398
  importance of adiabatic invariants, 426
  Poisson bracket, 399
  quantum/classical correspondence, 398
  related to optics and classical mechanics, 343
  Schrödinger equation, 343
Quasi-
  basis vectors, 125
  coordinate, 168
  coordinate as group parameter, 191
  coordinate, example of its nonexistence, 204
  displacement, expressed as form, 176, 177
  velocity, 168
  velocity, related to generalized velocity, 178
R.I.I., Poincaré relative integral invariant, 404
R(0) = I, identity transformation, 191
R(t), Lagrange perturbation function, 480
ℜ, real part, 309
Radius of curvature
  Frenet-Serret, 241
  invariant expression for, 147
Ray, 307
  -wavefront equation, 312
  analog of Newton equation, 313
  equation, 312
  hybrid equation, 311, 328
  in lens-like medium, 313
  linearized equation, 314
  obtained from wavefront, 311
  variation of intensity along, 314
Reciprocal basis, 71
Reduced
  mass, 28
  three-body problem, 251
Reduction
  quadratic form, 18
  Routhian, 226
  to quadrature, 6, 340, 465
  to quadrature, Stäckel, 444
  to sum or difference of squares, 68
Reference
  frame, 91
  trajectory, 308, 366, 372, 378
Reflection
  Bragg, 74
  in hyperplane, 133
  vector and bivector, 157
Refractive index, 319
Relative
  angular velocity, 259
  velocity, 259
Relativistic
  action, 357
  action, including electromagnetism, 360
  antisymmetric tensor, 355
  energy, 357
  forced motion, 360
  four-acceleration, 355
  four-gradient, 355
  four-momentum, 358
  four-potential, 360
  four-tensor, 354
  four-vector, 353
  four-velocity, 355
  Hamilton-Jacobi equation, 359
  Hamilton-Jacobi equation, including electromagnetism, 362
  Hamiltonian, including electromagnetism, 362
  metric tensor, 354
  momentum, 357
  principle of least time, 356
  rate of change of energy, 363
  rate of work done, 360
  rest energy E = mc², 357
  trajectory determination, 364
Relativity, 348
  Einstein, 348
  Galilean, 349, 351
  special, 348
Remembered position, 246
Repeated-index summation convention, 40
Representation
  reflection of bivector, 157
  reflection of spinor, 157, 159
  reflection of vector, 157
  rotation of spinor, 159
  rotation of vector or bivector, 158
Resonance, 491, 523
  condition, 526
  threshold, 528
  vanishing denominator, 492, 526
Ricci's theorem, 100
Rigid body
  Euler equation, 281
  gauge-invariant description, 279
  Lie algebraic description, 279
Rigid body (cont.)
  motion, commutation relations, 212
  Poincaré analysis, 206
  Poincaré equation, 213
Rolling ball, 211
  asymmetrically loaded, 219
  equations of motion, 215
Rotation
  and reversal, 133
  as product of reflections, 134, 135
  expressed as product of reflections, 135
  group, 195
  infinitesimal, relation to bivector, 143
  Lie group of, 136
  noncommutation of, 274
  proof of group property, 136
  proper/improper, 133
  spinor representation, 158
Routh procedure, 226
Routhian, 227
(σ1, σ2, σ3), Pauli spin matrices, 156, 275
S, unique solution generator, 477, 523
S(q), action, 326
S(q, P, t), generating function, 419
S0(q, P), abbreviated action, 420
384
S . unique solution generator, 87,516 S.I. units, 348 Satellite orbit stability, 255. See ulso Kepler Scalar product, 67.90 wave equation, 309 Scaling considerations, 3 1 Schriidingerequation, 329,335.422 A = 0 limit, 344 time-dependent, 344 time-independent, 345 Schrodingerrneisenbergpicture, 5 13 Science Museum. London, England, 284 Secular terms, anharmonic oscillator. 490, 514 Self-adjoint,83 Separation additive/multiplicative,329 of variables. 329 of variables, Kepler problem. 339 Shape normal mode, 2 I of variation, 183 or internal configuration, 296,302 Similarity, 3 I, 275,277
Simple harmonic oscillator, 6. 23 action, 404.422 coupled, 449 Hamilton-Jacobi treatment of, 335 parametrically driven. 422,428,431 R.I.I. and I.I., 404 Sine-like trajectory,314 Singular sets of linear equations, 84 Skew coordinate frame, 68 slowness, vector of normal, 324 Snell's law, 25, 317 SO(3) orthogonal group, 137 related to SU(2). 154 Soft or hard spring, sign of cubic term. 493 Solution Hamilton-Jacobi, 328 sindcosine-like, 314 Space -like. 67,349 and body frames, 272 extended phase-, 403,405,416 or inertial, frame, 244 phase and configuration, 372 Special relativity.See Relativity; special Spinor association with vector, 155 defined, 155 in pseudo-Euclideanspace, 160 operation on, 159 proof it is a tensor, 155 reflection and rotation, 159 three dimensions, 154 unitarity of its rotation matrix, 160 Stackel's theorem, 443 Stability of satellite orbit, 255 Standard K-B form, 497 Stokes lemma, 402,405,407 lemma for forms, 408 theorem for forms, 153 Structure constant, 172 antisymmetry in lower indices, 179 as component of form, 176 as Lie algebra commutation coefficient, 200 Euclidean, 2 I5 example, 180 in terms of close-to-identity elements, 200 rotation group, 207 rotationhanslation, 2 15 SU(2) related to SO(3). 154 unimodular group, 160 Summation convention, 40
Sun, earth, moon system, 251
Superconvergent perturbation theory, 494, 528, 532
Surface integral, 151
Sylvester's law of inertia, 70
Symmetric top, 227
Symmetry and conservation laws, 219
Sympathetic damping, 520
Symplectic, 320, 369
  alternate coordinate ordering, 395
  basis, 386
  canonical form, 386
  conjugate in block form, 390
  conjugate, of matrix, 389
  feature of anharmonic oscillation, 494
  geometry, 373
  geometry, analogy to Euclidean geometry, 385
  geometry, properties derived, 388
  group, 387
  hard to maintain perturbatively, 494
  infinitesimally, 458
  mechanics, 369
  multidimensional perturbation theory, 510
  origin of the name "symplectic," 380
  perturbation theory, 510
  properties of phase space, 373
  skew-scalar product, 386
  space, dimension must be even, 387
  symmetric/antisymmetric symplectic distinction, 523
  system evolution, 409
  transformation, 387
  two-form, derived from momentum one-form, 379
  two-form, or canonical two-form, 376
Symplectic matrix
  4 x 4 diagonalization, 394
  6 x 6 diagonalization, 395
  determinant = 1, 388
  eigenvalue, 391
  eigenvalues, 392
  eigenvector, 391
  robustness of eigenvalue under perturbation, 394
TM, union of all tangent spaces, 228
TMq, tangent space at q, 228, 374
Tangent bundle, 228
Tangent space, 169, 374
  algebra, 171
  and instantaneous velocity, 170
  legitimacy of dividing by dt, 171
  linearized introduction of, 170
  or tangent plane, 171
Taylor approximation, 374. See also Linearized
Telescope in space
  Hubble, 254
  next generation, 254
Tensor, 61
  algebra, 54
  alternating, 62
  antisymmetric, 62
  contraction, 61
  multi-index, 61
  notation, Einstein, 353
  overlap of algebra and calculus, 64
  product, 378
Theorem
  adiabatic invariance, 425
  contraction of indices, 62
  Darboux's, 387
  Fermat's, 318
  Fermat's, fallacy in proof, 400
  Floquet's, 469
  Gauss's, 147
  I.I., integral invariant, 404
  invariance of R.I.I., 405
  Jacobi's, 332, 397
  Kolmogorov's, superconvergence, 533
  Larmor's, 254
  Lie's, 200
  Liouville's, 314, 380, 405, 410
  Lyapunov's, 470
  Noether's, 231
  Poincaré, series nonconvergence, 494
  Poisson's, 397
  Ricci's, 100
  Rotation as product of reflections, 135
  Rotations form group, 136
  Stäckel's, separability, 443
  Stokes', 150, 400
  Sylvester's, 70
  time evolution of quantum commutator, 398
Three-body problem, 251
Three-index antisymmetric symbol, 73
Threshold of instability, 527
Time
  -like, 67, 349
  average, 423
  dependence, normal mode, 460
  derivative expressed as Lie derivative, 207
  derivative operator, 259, 280
  of passage through perigee, 339
Toroidal configuration space geometry, 447
Torque, 210, 265
Torsion, 240
Trace, 63
Trajectory
  configuration space, 308
  determination, relativistic, 364
  parameterization not important, 201
  phase space, no crossing property, 308
  photon, 313
  reference, 308, 366, 372, 378
Trampoline, 295
Transfer matrix, 82, 314, 457
  periodic system, 468
  required to be symplectic by Liouville, 410
Transform
  Laplace, 21, 286
  symplectic, 387
Transformation
  affine, 91
  canonical, 415
  canonical, using G(q, Q, t), 418
  canonical, using S(q, P, t), 419
  centered affine, 90
  close-to-identity, 190
  of coordinates, 56
  of distribution, 60
  of force vector, 256
  from unperturbed action-angle analysis, 427
  gauge, 363, 364
  group of continuous, 188
  of Hamiltonian independent variable, 366
  of Hamiltonian matrix to standard form, 455
  Legendre, 370
  Lorentz, 352
  Lyapunov, 470
  relativistic velocity, 352
  similarity, 259, 277
  symplectic, 388
  from time to longitudinal position, 366
  to action-angle variables, 427
  to normal coordinates, 451
Transitive group, 191, 206
Transport, parallel, 291
Transverse wake force, 521
Trigonometry of Kepler orbit, 341
Trivector
  association with matrix, 157
  invariant measure of, 141
  vector triple product, 157
True anomaly, 486
True vector, 45
Tumbler. See Gymnast
Tune, or characteristic exponent, 470
Turning point, 6
Twin paradox for Foucault "clock," 291
Unit cell, 71
Unit tangent vector, 240
Unit vector, 98
  time derivative of, 239
  unit length, 238
Unitary
  geometry, 80
  Hermitean metric, 81
⟨⟨v(t), u(t)⟩⟩, averaged scalar product, 476
Van der Pol oscillator, 499, 509
  solution, precursor to K-B, 495
Vanishing denominator
  problem of, 492
  source of resonance, 526
Variable-length pendulum, 431
Variation
  calculus of, 181, 183, 324, 325
  end point, 326
  mono-frequency, 309
Variation of constants, 413, 448, 480, 489, 510, 528
  conditionally periodic motion, 448
  Kepler problem, 480
  Krylov-Bogoliubov approach, 495
Variational
  (or Poincaré) equations, 471
  principle, 183, 316
Vector
  association with reflection, 156
  curvilinear coordinates, 237
  incommensurate at disjoint points, 94
  mechanics, 237
  true, 45
Vector field, 98
  as directional derivative, 201, 202
  as group operator, 193
  associated with dH, 382
  identified with differential operator, 121
  rotation operators, 179
  total derivative notation, 121
Velocity
  group, 309, 324
  phase, 309, 324
  terms, elimination of, 452
Virial theorem, 31
Virtual displacement, 201
Volume
  determined by n vectors, 137
  oriented, 137
Vortex line, 402, 407
Vorticity, 402
W(x1, x2), Hamilton point characteristic, 322, 417
Wake force, felt by trailing object, 521
Wave
  -front, 14, 307, 322
  -front, surface of constant action, S, 327
  equation, 309, 343
  function, 309, 343
  -length, λ, 309, 345
  -length, vacuum, λ0, 309
  number, k, 309
  number, vacuum, k0, 309
  phase, 310
  plane, 309
  vector, k, 74
  vector, related to electric field, 316
Wedge product, 63, 377
Weyl, originator of symplectic geometry, 380
Work/energy relation, relativistic, 363
World
  interval, 349
  point, 349
(x¹, x², ..., xⁿ), contravariant components, 39, 55
(x₁, x₂, ..., xₙ), covariant components, 70
x ⊗ y, tensor product, 62
x ∧ y, wedge product, 63
i, 39
X-ray scattering, 74
Zeeman effect, 255