Essentials of Mathematical Methods in Science and Engineering
S. Selguk Bayin Middle Etrst Technical University Ankurc...
310 downloads
2491 Views
23MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Essentials of Mathematical Methods in Science and Engineering
S. Selguk Bayin Middle Etrst Technical University Ankurcl, T w k q
WILEY A JOHN WILEY & SONS, INC., PUBLICATION
This Page Intentionally Left Blank
Essentials of Mathematical Methods in Science and Engineering
This Page Intentionally Left Blank
Essentials of Mathematical Methods in Science and Engineering
S. Selguk Bayin Middle Etrst Technical University Ankurcl, T w k q
WILEY A JOHN WILEY & SONS, INC., PUBLICATION
Copyright C 2008 by John Wiley & Sons, Inc. All rights reserved Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means. electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written peimission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Perniissions Department, John Wiley & Sons, Inc., 11 1 River Street, Hoboken, NJ 07030. (201) 748-601 I , fax (201) 748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental. consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (3 17) 572-3993 or fax (3 17) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiIey.com.
Library of Congress Cataloging-in-Publication Data:
Bayin, $. SelGuk, 1951Essentials of mathematical methods in science and engineering / $. SelGuk Bayin. p. cm. Includes bibliographical references and index. ISBN 978-0-470-34379- I (cloth) I . Science-Mathematics. 2. Science-Methodology. 3. Engineering mathematics. 1. Title. Q158.5.B39 2008 501'S l L d c 2 2 2008004313
Printed in the United States of America. I 0 9 8 7 6 5 4 3 2 1
To my father, Omer Bayan
This Page Intentionally Left Blank
Contents in Brief
1
FUNCT I0NA L ANALYSIS
2
VECTOR ANALYSIS
3
GENERALIZED COORDINATES and TENSORS
139
4
DETERMINANTS and MATRICES
207
5
LINEAR ALGEBRA
241
6
SEQUENCES and SERIES
303
7
COMPLEX NUMBERS and FUNCTIONS
331
8
COMPLEX ANALYSIS
369
9
0RDINARY DIFFER ENT IA L EQ UAT I 0NS
407
1
57
10 SECOND-ORDER DIFFERENTIAL EQUATIONS and SPECIAL FUNCTIONS
11 BESSEL’S EQUATION and BESSEL FUNCTIONS
12 PARTIAL DIFFERENTIAL EQUATIONS and SEPARATION of VARIABLES
469 509 541
13 FOURIER SERIES
585
14 FOURIER and LAPLACE TRANSFORMS
607
15 CALCULUS of VARIATIONS
637
16 PROBABILITY THEORY and DISTRIBUTIONS
667
17 INFORMATION THEORY
721
vii
This Page Intentionally Left Blank
CONTENTS
Preface
xxi
Acknowledgments 1
xxvii
FUNCTIONAL ANALYSIS
1
1.1 1.2 1.3 1.4 1.5 1.G 1.7 1.8 1.9 1.10 1.11 1.12 1.13
1
Concept of Function Continuity and Limits Partial Differentiation Total Differential Taylor Series Maxima and Minima of Functions Extrema of Functions with Conditions Derivatives and Differentials of Composite Functions Implicit Function Theorem Inverse Functions Integral Calculus and the Definite Integral Riernann Integral Improper Integrals
4 6 8 10
14 18 22
24 30 32 34 37 ix
X
CONTENTS
1.14 1.15 1.16 1.17 1.18 1.19
Cauchy Principal Value Integrals Integrals Involving a Parameter Limits of Integration Depending on a Parameter Double Integrals Properties of Double Integrals Triple and Multiple Integrals Problcms
VECTOR ANALYSIS
2.1 2.2 2.3 2.4
2.5
2.6
2.7
2.8
2.9 2.10
Vector Algebra: Geometric Method 2.1.1 Multiplication of Vectors Vector Algebra: Coordinate Representation Lines and Planes Vector Differential Calculus 2.4.1 Scalar Fields and Vector Fields 2.4.2 Vcctor Differentiation Gradient Operator 2.5.1 Meaning of the Gradient 2.5.2 Directional Derivative Divergence and Curl Operators 2.6.1 Meaning of Divergence and the Divergence Theorem Vector Integral Calculus in Two Dimensions 2.7.1 Arc Length and Line Integrals Surface Area and Surface Integrals 2.7.2 An Alternate Way to Write Line Integrals 2.7.3 2.7.4 Green’s Theorem 2.7.5 Interpretations of Green’s Theorem Extension to Multiply Connected Domains 2.7.6 Curl Operator and Stokes’s Theorem 2.8.1 On the Plane 2.8.2 In Space 2.8.3 Geometric Interpretation of Curl Mixed Operations with the Del Operator Potential Theory 2.10.1 Gravitational Field of a Spherically Symmetric Star 2.10.2 Work Done by Gravitational Force
40 42 46 47 49 50 51
57 57 60 62 68 70 70 72 73 74 75 77 78 83 83 87 89 91 93 94 97 97 102 105 105 108 111 112
CONTENTS
2.10.3 Path Independence and Exact Differentials 2.10.4 Gravity and Conservative Forces 2.10.5 Gravitational Potential 2.10.6 Gravitational Potential Energy of a System 2.10.7 Helmholtz Theorem 2.10.8 Applications of the Helmholtz Theorem 2.10.9 Examples from Physics Problems
3
GENERALIZED COORDINATES and TENSORS
3.1
3.2
3.3
3.4
3.5
Transformations Between Cartesian Coordinates 3.1.1 Basis Vectors and Direction Cosines Transformation Matrix and the Orthogonality 3.1.2 Relation 3.1.3 Inverse Transformation Matrix Cartesian Tensors 3.2.1 Algebraic Properties of Tensors 3.2.2 Kronecker Delta and the Permutation Symbol Generalized Coordinates 3.3.1 Coordinate Curves and Surfaces Why Upper and Lower Indices 3.3.2 General Tensors 3.4.1 Einstein Summation Convention Line Element 3.4.2 Metric Tensor 3.4.3 How to Raise and Lower Indices 3.4.4 3.4.5 Metric Tensor and the Basis Vectors Displacement Vector 3.4.6 Transformation of Scalar Functions and Line 3.4.7 Integrals 3.4.8 Area Element in Generalized Coordinates Area of a Surface 3.4.9 3.4.10 Volume Element in Generalized Coordinates 3.4.11 Invariance and Covariance Differential Operators in Generalized Coordinates 3.5.1 Gradient 3.5.2 Divergence 3.5.3 Curl
xi
114 116 118 120 122 123 127 130
139 140 140 142 144 145 148 151 154 154 159 160 163 164 164 165 166 168 169 171 173 177 178 179 179 180 182
xii
CONTENTS
3.6
4
D E T E R M I N A N T S and M A T R I C E S
4.1 4.2 4.3 4.4 4.5 4.6 4.7 -1.8 -1.9 4.10
5
3.5.4 Laplacian Orthogonal Generalized Coordinates 3.6.1 Cylindrical Coordinates 3.6.2 Spherical Coordinates Problems
Basic Definitions Operations with Matrices Subinatrix and Partitioned Matrices Systems of Linear Equations Gauss’s Method of Elimination Determinants Properties of Determinants Cramer’s Rule Iiivcrse of a Matrix Homogeneous Linear Equations Problems
LINEAR ALGEBRA
5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 5.13 5.11 5.15 5.16 5.17 5.18
Fields and Vector Spaces Linear Combinations, Generators, and Bases Coniponents Linear Transformations Matrix Representation of Transformations Algebra of Transformations Change of Basis Irivariants Under Similarity Transformations Eigenvalues and Eigenvectors Moment of Inertia Tensor Inner product Spaces The Inner Product Orthogonality and Completeness Gram -Schmidt Ort hogonalization Eigenvalue Problem for Real Symmetric Matrices Prcsciice of Degenerate Eigenvalues CJiiatlratic Forms Herniitian bIatrices
186 186 187 193 198 207 207 208 214 216 217 22 1 223 226 230 233 234
241 241 244 246 249 250 252 254 256 256 265 270 271 274 276 277 278 285 289
CONTENTS
5.19 5.20 5.21 5.22
6
SEQUENCES and SERIES
6.1 6.2 6.3
6.4 6.5 6.6 6.7 6.8 6.9 6.10
7
Matrix Representation of Linear Transformations Functions of Matrices Function Space and Hilbert Space Dirac’s Bra and Ket vectors Problems
Sequences Infinite Series Absolute and Conditional Convergence 6.3.1 Comparison Test 6.3.2 Limit Comparison Test 6.3.3 Integral Test 6.3.4 Ratio Test 6.3.5 Root Test Operations with Series Sequences and Series of Functions Ail-Test for Uniform Convergence Properties of Uniformly Convergent Series Power Series Taylor Series and Maclaurin Series Indeterminate Forms and Series Problems
COMPLEX NUMBERS and FUNCTIONS
7.1 7.2 7.3 7.3 7.5 7.G 7.7 7.8 7.9 7.10
The Algebra of Complex Numbers Roots of a Complex Number Infinity and the Extended Complex Plane Complex Functions Limits and Continuity Differentiation in the Complex Plane Analytic Functions Harmonic Functions Basic Differentiation Formulas Elementary Functions 7.10.1 Polynomials 7.10.2 Exponential Function 7.10.3 Trigonometric Functions
xiii
293 294 296 297 298 303 304 308 309 309 309 309 310 310 314 316 318 319 32 1 324 324 326
331 332 336 339 342 344 345 349 350 352 353 353 354 356
xiv
CONTENTS
7.10.4 Hyperbolic Functions 7.10.5 Logarithmic Function 7.10.6 Powers of Complex Numbers 7.10.7 Inverse Trigonometric Functions Problems
8
CO MPL EX ANALYSIS 8.1 8.2 8.3 8.4
8.5 8.6 8.7
8.8
8.9 8.10 8.11
9
Contour Integrals Types of Contours The Caucl-iy-Goursat Theorem Iiidefinit e Integrals Simply and Multiply Connected Domains The Cauchy Integral Formula Derivatives of Analytic Functions Coniplex Power Series 8.8.1 Taylor Series with the Remainder 8.8.2 Laurent Series with the Remainder Convergelice of Power Series Classification of Singular Points Residue Theorem Problems
0 R DI N A R Y DIFFER ENTIA L EQ UAT1 0 NS 9.1 9.2 9.3
9.4
Basic Definitions for Ordinary Differential Equations First-Order Differential Equations First-Order Differential Equations: Methods of Solution 9.3.1 Dependent Variable Is Missing 9.3.2 Independent Variable Is Missing The Case of Separable f ( z ,y) 9.3.3 9.3.4 Homogeneous f ( ~y), of Zeroth Degree 9.3.5 Solution When f ( z ,y) Is a Rational Function 9.3.6 Linear Equations of First-Order 9.3.7 Exact Equations 9.3.8 Integrating Factors 9.3.9 Bernoulli Equation 9.3.10 Riccati Equation 9.3.11 Equations That Cannot Be Solved for y’ Second-Order Differential Equations
357 358 359 362 362
369 370 372 376 379 381 381 384 385 385 389 393 394 397 40 1
407 408 410 412 412 412 412 413 413 416 417 419 423 424 426 429
9.5
9.6
9.7 9.8
10
CONTENTS
xv
Second-Order Differential Equations: Methods of Solution 9.5.1 Linear Homogeneous Equations with Constant Coefficients 9.5.2 Operator Approach 9.5.3 Linear Homogeneous Equations with Variable Coefficients 9.5.4 Cauchy -Euler Equation 9.5.5 Exact Equations and Integrating Factors 9.5.6 Linear Nonhomogeneous Equations 9.5.7 Variation of Parameters 9.5.8 Method of Undetermined Coefficients Linear Differential Equations of Higher Order 9.6.1 With Constant Coefficients 9.6.2 With Variable Coefficients 9.6.3 Nonhomogeneous Equations Initial Value Problem and Uniqueness of the Solution Series Solutions: Froberiius Method 9.8.1 Frobenius Method and First-Order Equations Problems
430 431 437 438 44 1 442 444 445 446 450 450 451 451 452 452 462 463
SECOND-ORDER DIFFERENTIAL EQUATIONS and SPECIAL 469 FUNCTIONS
10.1
10.2
Legendre Equation 10.1.1 Series Solution 10.1.2 Effect of Boundary Conditions 10.1.3 Legendre Polynomials 10.1.4 Rodriguez Formula 10.1.5 Generating Function 10.1.6 Special Values 10.1.7 Recursion Relations 10.1.8 Orthogonality 10.1.9 Legendre Series Hermite Equation 10.2.1 Series Solution 10.2.2 Hermite Polynomials 10.2.3 Contour Integral Representation 10.2.4 Rodriguez Formula 10.2.5 Generating Function
4 70 470 473 474 477 4 78 480 48 1 482 484 487 487 491 492 493 494
xvi
CONTENTS
10.3
11
BESSEL’S EQUATION and BESSEL FUNCTIONS
11.1
11.2
12
10.2.6 Special Values 10.2.7 Recursion Relations 10.2.8 Orthogonality 10.2.9 Series Expansions in Hermite Polynomials Laguerre Equation 10.3.1 Series Solution 10.3.2 Laguerre Polynomials 10.3.3 Contour Integral Representation 10.3.4 Rodriguez Formula 10.3.5 Generating Function 10.3.6 Special Values and Recursion Relations 10.3.7 Orthogonality 10.3.8 Series Expansions in Laguerre Polynomials Problems
Bessel’s Equation and Its Series Solution 11.1.1 Bessel Functions J*,(z), N,(z), and H:’”(x) 11.1.2 Recursion Relations 11.1.3 Generating Function 11.1.4 Integral Definitions 11.1.5 Linear Independence of Bessel Functions 11.1.6 Modified Bessel Functions I m ( z )and K,(z) 11.1.7 Spherical Bessel Functions jl(x),nl(z), and h1(1’2)(x) Orthogonality and the Roots of Bessel Functions 11.2.1 Expansion Theorem 11.2.2 Boundary Conditions for the Bessel Functions Problems
495 495 496 499 500 500 502 502 503 504 504 505 506 507
509 510 514 518 519 521 522 523 525 527 531 531 535
PARTIAL DIFFERENTIAL EQUATIONS and SEPARATION of VARIABLES 541
12.1
12.2
Separation of Variables in Cartesian Coordinates 12.1.1 Wave Equation 12.1.2 Laplace Equation 12.1.3 Diffusion and Heat Flow Equations Separation of Variables in Spherical Coordinates 12.2.1 Laplace Equ at ion ’
542 544 546 550 553 557
CONTENTS
12.3
13
14
12.2.2 Boundary Conditions for a Spherical Boundary 12.2.3 Helmholtz Equation 12.2.4 Wave Equation 12.2.5 Diffusion and Heat Flow Equations 12.2.6 Time-Independent Schrodinger Equation 12.2.7 Time-Dependent Schrodinger Equation Separation of Variables in Cylindrical Coordinates 12.3.1 Laplace Equation 12.3.2 Helmholtz Equation 12.3.3 Wave Equation 12.3.4 Diffusion and Heat Flow Equations Problems
xvii
558 563 563 564 565 566 567 569 570 570 572 580
FOURIER SERIES
585
13.1 13.2 13.3 13.4 13.5 13.6 13.7 13.8
Orthogonal Systems of Functions Fourier Series Exponential Forni of the Fourier Series Convergence of Fourier Series Sufficient Conditions for Convergence The Fundamental Theorem Uniqueness of Fourier Series Examples of Fourier Series 13.8.1 Square Wave 13.8.2 Triangular Wave 13.8.3 Periodic Extension 13.9 Fourier Sine and Cosine Series 13.10 Change of Interval 13.11 Integration and Differentiation of Fourier Series Problems
585 59 1 592 593 595 596 597 597 597 599 600 601 602 603 604
FOURIER and LAPLACE TRANSFORMS
607
14.1 14.2 14.3 14.4 14.5 14.6 14.7
Types of Signals Spectral Analysis and Fourier Transforms Correlation with Cosines and Sines Correlation Functions and Fourier Transforms Inverse Fourier Transform Frequency Spectrums Dirac-Delta Function
607 610 611 615 615 617 618
xviii
CONTENTS
14.8 14.9 14.10 14.11 14.12 14.13
15
General Fourier Transforms and Their Properties Basic Definition of Laplace Transform Diffcrcntial Equations arid Laplace Transforms Transfer Functions and Signal Processors Coririectiori of Signal Processors Problems
CALCULUS of VARIATIONS 15.1 15.2 15.3 15.4 15.5 15.6 15.7 15.8 15.9 15.10
16
A Case with Two Cosines
A Siiiiple Case Variational Analysis Alternate Form of Euler Equation Variational Notation A Nore General Case Hamilton’s Principle Lagrange’s Equations of Motion Definition of Lagrangian Prescrice of Constraints in Dynamical Systems Conservation Laws Problems
PROBABILITY T H E O R Y and DISTRIBUTIONS 16.1
16.2
Introduction t o Probability Theory 16.1.1 Fundamental Concepts 16.1.2 Basic Axioms of Probability 16.1.3 Basic Theorems of Probability 16.1.4 Statistical Definition of Probability 16.1.5 Conditional Probability and Multiplication Theorem 16.1.6 Bayes’ Theorem 16.1.7 Geometric Probability and Buffon’s Needle Problem Permutations and Combinations 16.2.1 The Case of Distinguishable Balls with Replacement 16.2.2 The Case of Distinguishable Balls Without Replacement 16.2.3 The Case of Indistinguishable Balls
619 620 622 625 627 629 632 637 638 639 642 645 647 65 1 653 657 659 662 663 667 668 668 669 669 672 673 674 677 678 678 679 680
CONTENTS
16.2.4 Binomial and Multinomial Coefficients Applications to Statistical Mechanics 16.3.1 Boltzmann Distribution for Solids 16.3.2 Boltzmann Distribution for Gases 16.3.3 Bose-Einstein Distribution for Perfect Gases 16.3.4 Fermi -Dirac Distribution 16.4 Statistical Mechanics and Thermodynamics 16.4.1 Probability and Entropy 16.4.2 Derivation of p 16.5 Random Variables and Distributions 16.6 Distribution Functions and Probability 16.7 Examples of Continuous Distributions 16.7.1 Uniform Distribution 16.7.2 Gaussian or Normal Distribution 16.7.3 Gamma Distribution 16.8 Discrete Probability Distributions 163.1 Uniform Distribution 16.8.2 Binomial Distribution 16.8.3 Poisson Distribution 16.9 Fundamental Theorem of Averages 16.10 Moments of Distribution Functions 16.10.1 Moments of the Gaussian Distribution 16.10.2 Moments of the Binomial Distribution 16.10.3 Moments of the Poisson Distribution 16.11 Chebyshev’s Theorem 16.12 Law of Large Numbers Problenis
681 682 684 686 687 688 689 689 691 693 696 698 698 699 699 700 70 1 701 703 704 705 706 707 708 710 712 713
INFORMATION THEORY
721
16.3
17
xix
17.1 17.2
Elements of Information Processing Mechanisms Classical Information Theory 17.2.1 Prior Uncertainty and Entropy of Information 17.2.2 Joint and Conditional Entropies of Information 17.2.3 Decision Theory 17.2.4 Decision Theory and Game Theory 17.2.5 Traveler’s Dilemma and Nash Equilibrium 17.2.6 Classical Bit or Cbit 17.2.7 Operations on Cbits
724 726 729 731 735 736 742 746 750
XX
CONTENTS
17.3
Quantum Information Theory 17.3.1 Basic Quantum Theory 17.3.2 Single-Particle Systems and Quantum Information 17.3.3 Mach- Zchnder Interferometer 17.3.4 Mathematics of the Mach-Zehnder Interferometer 17.3.5 Quantum Bit or Qbit 17.3.6 The No-Cloning Theorem 17.3.7 Entanglement and Bell States 17.3.8 Quantum Dense Coding 17.3.9 Quantum Teleportation Problems
752 752 758 760 763 767 770 771 776 777 780 787
I11dcs
793
Prefa ce
After a year of freshman calculus, the basic mathematics training in science and engineering is accomplished during the second and third years of college education. Students are usually required to take a sequence of three courses on the subjects of advanced calculus, differential equations, complex calculus, and introductory mathematical physics. The majority of science and engineering departments today are finding it convenient t o use a single book that assures uniform formalism and a topical coverage in tune with their average needs. The objective of Essentials of Mathematical Methods in Science and Engineering is to equip students with the basic mathematical skills that are required by the majority of science and engineering undergraduate programs. Some of the basic courses taught in these programs are on the subjects of classical electrodynamics, classical mechanics, statistical mechanics, thermodynamics, modern physics, quantum mechanics, and relativity. The entire book contains a sufficient amount of material for a three-semester course meeting three or four hours a week. All this being said, respecting the disparity of the mathematics courses taught throughout the world, the topical coverage and the modular structure of the book make it versatile enough to be adopted for a number of mathematics courses and allows instructors the flexibility to individualize their own teaching while maintaining the integrity xxi
xxii
PREFACE
of the discussions in the book for their students.
About the Book
We give a coherent treatment of the selected topics with a style that makes the essential mathematical skills easily accessible to a multidisciplinary audience. Sirice t,he book is written in modular format, each chapter covers its subject thoroughly and thus can be read independently. This makes the book very useful as a reference or refresher for scientists. It is assumed that the reader has been exposed to two semesters of freshman calculus, which is usually taught, at the level of Thomas’ Calculus by Thomas, Jr. and Finney, or has acquired an equivalent level of mathematical maturity. The derivations and discussions are usually presented in sufficient detail so that the reader can follow the mathematics without much pause. Occasionally, when the proofs get t,oo technical for our purposes, we quote them without proof but refer to an appropriate book. All t,he references are collected at the back in alphabetical order with their full titles. Whenever there is credit due or some special reference worth pointing out, it is cited within the text. However, most of the references in our list are included as extra resources for the interested reader who wants to dwell on these topics further. Along with these references, students and researchers can use the websites http://en.wikipedia.org and http://scienceworld.wolfram.com/ for further resources. Of course, the website litt,p://lanl.arxiv.org/ is an indipensible tool for researchers on any subject,. This book concentrates on the analytic techniques. Computer programs like MathematicaO and MapleTh’arc capable of performing symbolic as well as numerical calculations. Even though they are extremely useful to scientists, one still needs a full grasp of the basic mathematical techniques to produce the desired result and to interpret it correctly. There are books specifically writt,eri for niatheniatical methods with these programs. The books by Kelly on Matheniatica and by Wang on Maple are included in our list of references at the back.
Summary of the Book
Chapter 1. Functional Analysis: This chapter aims to fill the gap between the introductory calculus and advanced mathematical analysis courses. It introduces the basic techniques that are used throughout mathematics. Limits, derivatives, integrals, extremum of functions, implicit function theorem, inverse functions, and improper integrals are among the topics discussed. Chapter 2. Vector Analysis: Since most of the classical theories can
PREFACE
xxiii
tie introduced in terms of vectors, we present a rather detailed treatment of vectors and their techniques. Vector algebra, vector differentiation, gradient, divergence and curl operators, vector integration, Green’s theorem, integral theorems, and the essential elements of the potential theory are among the topics discussed. Chapter 3. Generalized Coordinates and Tensors: Starting with the Cartesian coordinates, we discuss generalized coordinate systems and their transformations. Basis vectors, transformation matrix, line element, reciprocal basis vectors, covariant and contravariant components, differential operators in generalized coordinates, and introduction t o Cartesian and general tensors are among the other essential topics of mathematical methods. Chapter 4. Determinants and Matrices: A systematic treatment of the basic properties and methods of determinants and matrices that are much needed in science and engineering applications are presented here with examples. Chapter 5. Linear Algebra: We start with a discussion of abstract linear spaces, also called vector spaces, and then continue with systems of linear equations, inner product spaces, eigenvalue problems, quadratic forms, Hermitian matrices, and Dirac’s bra and ket vectors. Chapter 6. Sequences and Series: This chapter starts with sequences and series of numbers and then introduces absolute convergence and tests for convergence. We then extend our discussion to series of functions and introduce the concept of uniform convergence. Power series and Taylor series are discussed in detail with applications. Chapter 7. Complex Numbers and Functions: After the complex number system is introduced and their algebra is discussed, complex functions, complex differentiation, Cauchy-Riemann conditions and analytic functions are the main topics of this chapter. Chapter 8. Complex Analysis: We introduce the complex integral theorems and discuss residues, Taylor series and Laurent series along with their convergence properties. Chapter 9. Ordinary Differential Equations: We start with the general properties of differential equations, their solutions and their boundary conditions. Most commonly encountered differential equations in applications are either first- or second-order. Hence, we discuss these two cases separately in detail and introduce methods of finding their analytic solutions. We also study linear equations of higher order. We finally conclude with the Frobenius method applied to first- and second-order differential equations with interesting and carefully selected examples. Chapter 10. Second-Order Differential Equations and Special Functions: In this chapter, we discuss three of the most frequently encountered second-order differential equations of physics and engineering, that is, Legendre, Hermite, and Laguerre equations. We study these equations in detail from the viewpoint of the Frobenius method. By using the boundary conditions, we then show how the corresponding orthogonal polynomial sets
xxiv
PREFACE
arc constriictcd. We also discuss how and under what conditions these polynomial sets can be used to represent a general solution. Chapter 11. Bessel’s Equation and Bessel Functions: Bessel functions are among t,lie most frequently used special functions of mathernatical physics. Siiice their orthogonality is with respect to their roots and not with respect to it parameter in the differential equation, they are discussed here sepa,rately in great detail. Chapter 12. Partial Differential Equations and Separation of Variables: Most of the second-order ordinary differential equations of physics and engineering are obtained from partial differential equations via the method of separation of variables. We introduce the most commonly encountered partial differential equations of physics and engineering and show how the method of separation of variables is used in Cartesian, spherical, and cylindrical coordinates. Interesting examples help the reader connect with the knowledge gained in the previous three chapters. Chapter 13. Fourier Series: We first introduce orthogonal systems of functions and then concentrate on trigonometric Fourier series. We discuss their convergence and uniqueness properties along with specific examples. Chapter 14. Fourier and Laplace Transforms: After a basic introduction t,o signal analysis and correlation functions, we introduce the Fourier transforms and their inverses. We also introduce Laplace transforms and their applicat,ions to differential equations. We discuss met hods of finding inverse Lapla.cc transforms and their applications to transfer functions and signal proccssors. Chapter 15. Calculus of Variations: We introduce basic variational analysis for different types of boundary conditions. Applications to Hamilton‘s principle and to Lagrangian mechanics is investigated in detail. The presciicc of const,raiiits in dynaniical systems along with the inverse problem are discusscd with examples. Chapter 16. Probability Theory and Distributions: Some of the interest,ing t,opics covered in this chapter include the basic theory of probability, permutations and combinations, applications to statistical mechanics, and the connection with thermodynamics. We also discuss Bayes’ theorem, random variables, distributions, distribution functions and probability, fundamental theorem of averages, moments, Chebyshev’s theorem, and the law of large numbers. Chapter 17. Information Theory: The first part of this chapter is devoted to classical information theory, where we discuss topics from Shannon‘s tlieory, dccision theory, game theory, Nash equilibrium, and traveler’s dileninia. The definition of Cbits and operations with them are also introduced. Thc second part of this chapter is on quantum information theory. After a general survey of quantum mechanics, we discuss Mach-Zehnder interferometer, Qbits, entanglement, and Bell states. Along with the no-cloning theorem. quantum cryptology, quantum dense coding, and quantum teleportation arc amoiig the other interesting topics discussed in this chapter. This
PREFACE
XXV
chapter is written with a style that makes these interesting topics accessible to a wide range of audiences with minimum prior exposure t o quantum mechanics.
Course Suggestions Chapters 1-15 consist of the contents of the three, usually sequentially taught, core mathematical methods courses meeting 3-4 hours a week that most science and engineering departments require. These chapters consist of the basic mathematical skils needed for the majority of undergraduate science and engineering courses. Chapters 1-8 can be taught during the second year as a two-semester course. During the first or the second semester of the third year, a course composed of the Chapters 9-15 can complete the sequence. Chapters 9 through 12 can also be used in a separate one-semester course on differential equations and special functions. The two extensive chapters on probability theory and information theory (Chapters 16 and 17) are among the special chapters of the book. Even though most of the mathematical methods textbooks have chapters on probability, we have treated the subject with a style and level that prepares the reader for the following chapter on information theory. We have also included sections on applications to statistical mechanics and thermodynamics. The chapter on information theory is unusual for the mathematical methods textbooks at both the graduate and the undergraduate levels. By selecting certain sections, Chapters 16 and 17 can be incorporated into the advanced undergraduate curriculum. In their entirety, they are more suitable t o be used in a graduate course. Since we review the basic quantum mechanics needed, we require no prior exposure to quantum mechanics. In this regard, Chapter 17 is also designed to be useful to beginning researchers from a wide range of disciplines in science and engineering. Even though it is not meant to be complete, we have a rich list of references a t the back on probability theory, decision theory, game theory, and classical and quantum information theories. Others can be traced from these. Examples and exercises are always an integral part of any learning process, hence the topics are introduced with an ample number of examples. To maintain continuity of the discussions, we have collected excercises at the end of each chapter, where they are predominantly listed in the same order that they are discussed within the text. Occasionally, when proofs or extensions of certain results are too technical to be discussed within the text, they are assigned as exercises. Hence, it is recomended that the entire problem sections be read quickly before their solutions are attempted. Parts of this book are based on my lectures delivered at Canisius College, Buffalo, NY, during the years 1984-1986 and the Middle East Technical University, Ankara, Turkey, on various occasions. With their exclusive chap-
xxvi
PREFACE
ters, uniform level of formalism and coordinated, and complenientary coverage of topics, Essentzals of Mathematacal Methods an Scsence and Enganeerang connects with rny graduate textbook, Mathematzcal Methods zn Scaence and Engzneering, thus forming a complete set spanning a wide range of basic mathematical techniques for students, instructors, and researchers. For communications about the book and for some relevant sites to our readers, we will usc the website http://www.physics.metu.edu.tr/" bayin.
5.
Selquk Bayin ODTU Ankara, Turkey April 2008
Ac knowIedgment s
I would like to thank Prof. J.P. Krisch of the University of Michigan for always being there whenever I needed advice and for sharing my excitement at all phases of the project. My special thanks go to Prof. J.C. Lauffenburger and Assoc. Prof. K.D. Scherkoske at Canisius College. I am grateful to Prof. R.P. Langlands of the Institute for Advanced Study at Princeton for his support and for his cordial and enduring contributions t o METU culture. I am indebted to Prof. P.G.L. Leach for his insightful comments and for meticulously reading two of the chapters. I am grateful to Wiley for a grant to prepare the camera-ready copy, and I would like to thank my editor Susanne SteitzFiller for sharing my excitement. My work on the two books Mathematical Meth,ods in Science and Engineering and Essentials of Mathematical Methods in Science and Engineering has spanned an uninterrupted period of 6 years. With the time spent on my two books in Turkish published in the years 2000 and 2004, which were basically the forerunners of my first book, this project has dominated my life for almost a decade. In this regard, I cannot express enough gratitude to my darling young scientist daughter Sumru and beloved wife Adalet, for always being there for me during this long and strenuous journey, which also involved many sacrifices for them.
8.S.B. xxvii
This Page Intentionally Left Blank
CHAPTER 1
FUNCTIONAL ANALYSIS
A function is basically a rule that relates the members of one set of objects to the members of another set. In this regard, it has a very wide range of applications in both science and mathematics. Functional analysis is basically the branch of mathematics that deals with the functions of numbers. In this chapter, we confine ourselves t o the real domain and introduce some of the most commonly used techniques in functional analysis. 1.1 CONCEPT OF FUNCTION We start with a quick review of the basic concepts of set theory. Let S be a set of objects of any kind: points, numbers, functions, vectors, etc. When s is an element of the set S , we show it as s E
s.
(1.1)
For finite sets we may define S by listing its elements as SE
{Sl,SZ,...
>%I-.
Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
(1.2)
9. S e l p k Bayin
1
2
FUNCTIONAL ANALYSIS
For infinite sets, S is usually defined by a phrase describing the condition to be a member of the set, for example,
S = {All points on the sphere of radius R } .
(1.3)
When there is no room for confusion, we may also write a n infinite set as
S = { l , 3 , 5 ,." } .
(1.4)
When each member of a set A is also a member of set B, we say that A is a subset of B and write
A
c B.
(1.5)
The phrase B covers or contains A is also used. The union of two sets,
A
U B,
(1.6)
consists of the elements of both A and B. The intersection of two sets, A and B,is defined as
A n B = {All elements common t o A and B } .
(1.7)
When two sets have no common element, their intersection is called the null set or the empty set, which is usually shown by 4. The neighborhood of a point, ( x l , y l ) , in the zy-plane is the set of all points, ( x , y ) , inside a circle centered at (zl, y1) and with the radius 6:
An open set is defined as the set of points with neighborhoods entirely within the set. The interior of a circle defined by
x2 + y2 < 1
(1.9)
is an open set. A boundary point is a point whose every neighborhood contains at least one point in the set and at least one point that does not belong to the set. The boundary of the set in Equation (1.9) is the set of points on the circumference, that is,
x2 + y2
= 1.
(1.10)
An open set plus its boundary is a closed set. A function, f , is in general a rule, a relation that uniquely associates members of one set, A , with the members of another set, B. The concept of function is essentially the same as that of mapping, which in general is so broad that it allows mathematicians to work with them without any resemblance to the simple class of functions with numerical values. The set
CONCEPT OF FUNCTION
3
A that f acts upon is called the domain, and the set B composed of the elements that f can produce is called the range. For single-valued functions the common notation used is f :z
+f(z).
(1.11)
Here f stands for the function or mapping that acts upon a single number z, which is an element of the domain, and produces f ( z ) ,which is an element of the range. In general, f refers to the function itself and f ( x ) refers to the value it returns. However, in practice, f ( z ) is also used t o refer to the function itself. In this chapter we basically concern ourselves with functions that take numerical values as f ( z ) ,where the argument, z, is called the independent variable. We usually define a new variable, y, as
which is called the dependent variable. Functions with multiple variables, that is, multivariate functions, can also be defined. For example, for each point (z,y) in some region of the zy-plane we may assign a unique real number, f ( z ,y), according t o the rule
We now say that f (z, y) is a function with two independent variables, 2 and y. In applications, f ( z ,y) may represent physical properties like the temperature or the density distribution of a flat disc with negligible thickness. Definition of function can be extended to cases with several independent variables as
where rL stands for the number of independent variables. The term function is also used for the objects that associate more than one element in the domain to a single element in the range. Such objects are called multiple-to-one relations. For example,
f ( z ,y) = 2zy + x2: f ( z ) = sinz: f ( z ,y) = z + x2: f ( z ) = x 2 , z # 0: f(z, y) = sin zy:
single-valued or one-to-one, many- t o-one, single-valued, two-to-one, many-to-one.
Sometimes the term “function” is also used for relations that map a single point in its domain to multiple points in its range. As we shall discuss in Chapters 7 and 8, such functions are called multivalued functions, which are predominantly encountered in complex analysis.
4
1.2
FUNCTIONAL ANALYSIS
CONTINUITY AND LIMITS
Similar to its usage in everyday language, the word continuity in mathematics also implies the absence of abrupt changes. In astrophysics, pressure and density distributions inside a solid neutron star are represented by continuous functions of the radial position: P ( r ) and p ( r ) , respectively. This means that small changes in the radial position inside the star also result in small changes in the pressure and density. At the surface, r = R, where the star meets the outside vacuum, pressure has to be continuous. Otherwise, there will be a net force on the surface layer, which will violate the static equilibrium condition. In this regard, in static neutron star models pressure has t o be a monotonic decreasing function of T , which smoothly drops to zero at the surface:
P ( R ) = 0.
(1.15)
On the other hand, the density at the surface can change abruptly from a finite value to zero. This is also in line with our everyday experiences, where solid objects have sharp contours marked by density discontinuities. For gaseous stars, both pressure and density have t o vanish continuously at the surface. In constructing physical models, deciding on which parameters are going to be taken as continuous at the boundaries requires physical reasoning and some insight. Usually, a collection of rules that have to be obeyed at the boundaries are called the junction conditions or the boundary conditions. We are now ready to give a formal definition of continuity as follows: Continuity: A numerically valued function f ( z ) defined in some domain D , is said to be continuous at the point 20 E D if, for any positive number E > 0, there is a neighborhood N about z g such that If(.) - f(zo)l < E for every point common to both N and D , that is N fl D. If the function f ( z ) is continuous at every point of D , we say it is continuous in D. We finally quote two theorems, proofs of which can be found in books on advanced calculus: Theorem 1.1. Let f ( z ) be a continuous function at z and let {z,} be a sequence of points in the domain of f ( z ) with the limit lim zn
n+cc
42 ;
(1.16)
then the following is true:
(1.17) Theorem 1.2. For a function f(x) defined in D , if the limit (1.18) exists whenever x, E D and lim
n-cc
2, +z E
D,
(1.19)
CONTINUITY AND LIMITS
5
then the function f(z)is continuous a t z. For the limit in Equation (1.18) to exist, it is sufficient t o show that the right and the left limits agree, that is, lim f(z- E )
E’O
=
+E),
(1.20)
f(.).
(1.21)
lim f(z
E+O
f (z-1 = f(.+)
=
In practice, the second theorem is more useful in showing that a given function is continuous. If a function is discontinuous at a finite number of points in its interval of definition, [z,,zb], it is called piecewise continuous. Generalization of these theorems t o multivariate functions is easily accomplished by taking z to represent a point in a space with n independent variables as
(1.22)
z = (z1, 5 2 , . . . ,z n ) .
However, with more than one independent variable one has to be careful. Consider the simple function
(1.23) which is finite at the origin. Depending on the direction of approach to the origin, f(h,y) takes different values: lim(z,y)+(o,o)f(z,y) lim(z,y)+(o,o)f(z,y) lim(z,y~+(o,o) f ( z ,y)
+0
if we approach along the y = z line, 1 if we approach along the z axis, + -1 if we approach along the y axis. +
Hence the limit lim(z,y)+(o,o) f(s, y) does not exist and the function f ( z , y ) is not continuous at the origin. Limits: Basic properties of limits, which we give for functions with two variables, also hold for a general multivariate function: Let u = f ( z , y ) and ‘u = g(z,y) be two functions defined in the domain D of the zy-plane. If the limits
exist, then we can write =
fo + go,
(1.26)
= f o . go, =
fo
-, go
go
(1.25)
# 0.
(1.27)
6
FUNCTIONAL ANALYSIS
If the functions f ( x , y ) and g ( x , y ) are continuous a t (xo,yo), then the functions
are also continuous at (zo, yo), provided that in the last case g(x,y) is different from zero at ( 5 0 yo). , Let F ( u , v ) be a continuous function defined in some domain Do of the uv-plane and let F ( f ( z ,y), g(x,y)) be defined for (x,y) in D. Then, if ( f o , go) is in Do, we can write (1.29) If f ( x ,y) and g(x,y) are continuous at ( 2 0 , yo), then so is F ( f ( z ,y ) , g(x,y)). In evaluating limits of functions that can be expressed as ratios, L’HBpital’s rule is very useful. L’Hbpital’s rule: Let f and g be differentiable functions on the interval a 5 x < b with g’(z) # 0 there, where the upper limit b could be finite or infinite. If f and g have the limits lim f(x) = 0 and lim g(x) = 0
(1.30)
lim f(x) = 03 and lim g ( x ) = 00,
(1.31)
f’(x) = L lim -
(1.32)
x-b
x-b
or x-b
x-b
and if the limit x+b
g’(Z)
exists, where L could be zero or infinity, then = L.
1.3
(1.33)
PARTIAL DIFFERENTIATION
A necessary and sufficient condition for the derivative of f(x) to exist at xo is that the left, f L ( z ~ )and , the right, f$(xo), derivatives exist and be equal (Fig. 1.1),that is, fL(It.0) =
f’(Zo),
(1.34)
where (1.35) (1.36)
7
PARTIAL DIFFERENTIATION
t'
line.
Figure
When the derivative exists, we always mean a finite derivative. If f ( x ) has derivative a t xo, it means that it is continuous a t that point. When the derivative of f (x)exists a t every point in the interval (a, b ) , we say that f (x) is differentiable in (a, b ) and write its derivative as (1.37) Geometrically, derivative a t a point is the slope of the tangent line at that point:
(1.38) When a function depends upon two variables: z
=
f (z, Y),
(1.39)
the partial derivative with respect to x at (xo,yo) is defined as the limit lim Ax-0
f
(20
+ Ax,Yo) Ax
-
f ( 2 0 , Yo)
(1.40)
and we show it as in one of the following forms:
Similarly, the partial derivative with respect to y at
( 2 0 yo) ,
is defined as
(1.42)
8
FUNCTIONAL ANALYSIS
A geometric interpretation of the partial derivative is that the section of the surface z = f ( x , y ) with the plane y = yo is the curve z = f(x,yo); hence the partial derivative ~ ( X yo) O , is the slope of the tangent line (Fig. 1.2) t o z = f (x, yo) at ( 2 0 ,yo). Similarly, the partial derivative ~ ( x oyo) , is the slope of the tangent line to the curve z = f(x0, y) at (XO, yo). For a multivariate function the partial derivative with respect to the i t h independent variable is defined as df(X1,.
=
. . , x i , . . . , xn) 8x2 f(x1,. . . , x i
lim
+ Axz,. . . , x n )
-
f ( x 1 , . . . , x i , .. .
AX^
A z , -0
'xn).
(1.43)
For a given f ( x , y) the partial derivatives fz and fy are functions of x and y and they also have partial derivatives which are written as f
--=--
22 -
a2f ax2
YI
--=.["i d2f
f xy
When f C c gand
- dxdy
fyz
dx dy
["I
- y dy=dy-' -
y y - a2f d
ax ax
I
f
yx -
dydx a2f
-
i"
dy a x
(1.44) (1.45)
are continuous at (20,yo), then the relation fzy
(1.46)
=fyz
holds at (20,yo). Under similar conditions this result can be extended to cases with more than two independent variables and to higher-order mixed partial derivatives.
1.4 TOTAL DIFFERENTIAL When a function depends on two or more variables, we have seen that the limit at a point may depend on the direction of approach. Hence, it is important that we introduce a nondirectional derivative for functions with several variables. Given the function
for a displacement of Ar = (Ax, Ay, Az) we can write its new value as
f(r
+ A r ) = (X + Ax)(z + Az)
-
(y
+A Y ) ~
(1.48)
+ X A Z+ Z A X+ AXAZ y2 - 2yAy ( A Y ) ~ (1.49) = (xz y2) + (ZAX 2yAy + XAZ)+ AXAZ ( A Y ) ~ , (1.50) = xz
-
-
-
-
-
TOTAL DIFFERENTIAL
Figure 1.2
Partial derivative, f z , is the slope of the tangent line to
9
z = f(z,yo).
where r stands for the point (x,y, z ) and A r is the displacement (Ax, Ay, Az). For small. A r the change in f (x,y, z ) to first order can be written as
A f = f ( +~AT) - f ( r ) = (XZ - y 2 ) + (zAx - 2yAy + ZAZ)- ( I C Z - y2), (1.51)
Af
=
ZAX- 2yAy + X A Z .
(1.52)
Considering that the first-order partial derivatives of f are given as
af = z , -a f-- -2y, -
af = 2 , -
(1.53)
a f + -Az. af + -Ay dY dz
(1.54)
aY
dX
dz
Equation (1.52) is nothing but
af Af = -Ax dX
In general, if a function f (x,y, z ) is differentiable at (x,y, z ) in some domain D with the partial derivatives (1.55) then the change in f(x,y, z ) in D t o first order in (Ax, Ay, as
Af
2
-Ax df dX
+ -Ay a f + -Az. af dY
dz
Az) can be written (1.56)
10
FUNCTIONAL ANALYSIS
Figure 1.3
Total differential gives a local approximationto the change in a function.
In the limit as A r
-+
0 we can write Equation (1.56) as (1.57)
which is called the total differential of f (2, y, z ) . In the case of a function with one variable, f (x),the differential reduces to
Af
N
dfAx,
(1.58)
dx
which gives the local approximation to the change in the function at the point x via the value of the tangent line (Fig. 1.3) at that point. The smaller the value of Ax, the better the approximation. In cases with several independent variables, Af is naturally approximated by using the tangent plane at that point.
1.5
TAYLOR SERIES
The Taylor series of a function about
1?(x n.
50, when
it exists, is given as
00
f(x)=
-
(1.59)
x0)n
n=O
= a0
a2 + al(x - 2 0 ) + -(x 2!
-
2
XO)
+ .' .
.
(1.60)
To evaluate the coefficients, we differentiate repeatedly and set x = xo to find
11
TAYLOR SERIES
(1.61)
where (1.62) and the zeroth derivative is defined as the function itself, that is,
f'O'(x) = f ( x ) .
(1.63)
Hence the Taylor series of a function with a single variable is written as (1.64) This formula assumes that f ( x ) is infinitely differentiable in an open domain including X O . Functions that are equal to their Taylor series in the neighborhood of any point xo in their domain are called analytic functions. Taylor series about xo = 0 are called Maclaurin series. Using the Taylor series, we can approximate a given differentiable function in the neighborhood of zo to orders beyond the linear term in Equation (1.58). For example, to second order we obtain
(1.66) (1.67) Since xo is any point in the open domain that the Taylor series exists, we can drop the subscript in xo and write
df A ( 2 ' f (= ~ )-AX dx
1d2f + --(AX) 2 dx2
2
,
(1.68)
where A(')f denotes the differential of f to the second order. Higher-order differentials are obtained similarly.
12
FUNCTIONAL ANALYSIS
The Taylor series of a function depending on several independent variables is also possible under similar conditions and in the case of two independent variables it is given as
(1.69) n=O
where the derivatives are to be evaluated at (x0,yo). For functions with two independent variables and to second order in the neighborhood of (x,y ) , Equation (1.69)gives
which yields the differential, A(2)f (x, y) = f (x
+ Ax, y + Ay)
-
f ( x , y), as
(1.71)
For the higher-order terms note how the powers in Equation (1.69)are expanded. Generalization to n independent variables is obvious.
Example 1.1. Partial derivatives: Consider the function
z(x,y)
= xy2
Partial derivatives are written as dz
- = y2
dX dz
+ ex.
+ ex,
(1.74) (1.75)
- = 2xy,
d
&
dY dz
(&)
d22 = @ = ex,
&)=v d
dz
$($)=z&-
d2z
= 22,
d2z -
d 2z
(1.73)
2Y,
(1.76) (1.77) (1.78)
&($)=a-
2y.
(1.79)
TAYLOR SERIES
13
Example 1.2. Taylor series: Using the partial derivatives obtained in the previous example, we can write the first two terms of the Taylor series [Eq. (1.69)] of z = zy2 + e" about the point (0, I). First, the required derivatives at ( 0 , l ) are evaluated as
z ( 0 , l ) = 1,
(1.80) (1.81)
Using these derivatives we can write the first two terms of the Taylor series about the point (0,1>as
z(x, y) = z ( 0 , I )
($) (")
+
(Y-1)
0
+ - (1- ) 0 d 22 2+ z
x ( y - l ) + - ( -1) d 2 z ( y - 1 I 2 + . dXdY 0 2 dY2 0 1 1 = 1 + 22 + O(y - 1) + -x2 22(y - 1) -O(y - 1 ) 2 2 2 (1.87) 1 (1.88) = 1 22 -x2 2x(y - 1) + . .. . 2 2
8x2
+
+ +
+
+
t . .
+
where the subscript 0 indicates that the derivatives are to be evaluated at the point ( 0 , l ) . To find A(2)z(0,l ) ,which is good to the second order, we first write Lwz(0,l) =
(g)o + ($) ay Ax
(1.89)
0
= 2Ax
(1.90)
14
FUNCTIONAL ANALYSIS
Figure 1.4
Maximum and minimum points of a function.
and then obtain
(1.91)
1 = 2Ax + - (Ax)’ 2
+ 2AxAy.
(1.92)
1.6 M A X I M A A N D M I N I M A OF F U N C T I O N S We are frequently interested in the maximum or the minimum values that a function, f ( z ) , attains in a closed domain [a,b].The absolute maximum, M I , is the value of the function at some point, XO, if the inequality MI
=f(X0)
2 f(.)
(1.93)
holds for all x in [a,b]. An absolute minimum is also defined similarly. In general we can quote the following theorem (Fig. 1.4): Theorem 1.3. If a function, f(x), is continuous in the closed interval [a,b], then it possesses an absolute maximum, M I , and an absolute minimum, Adz, in that interval. Proof of this theorem requires a rather detailed analysis of the real number system, which can be found in books on advanced calculus. On the other hand, we are usually interested in the extremum values, that is, the local maximum or the minimum values of a function. Operationally, we can determine whether a given point, XO, corresponds to an extremum or not by
MAXIMA AND MINIMA OF FUNCTIONS
Figure 1.5
15
Analysis of critical points.
looking at the change or the variation in the function in the neighborhood of 2 0 . The total differential introduced in the previous sections is just the tool needed for this. We have seen that in one dimension we can write the first, Af('), the second, A(2)f , and the third, A(3)f , differentials of a function with single independent variable as (1.94) (1.95)
Extremum points are defined as the points where the first differential vanishes, which means (1.97) In other words, the tangent line a t an extremum point is horizontal (Fig. 1.5a,b). In order to decide whether an extremum point corresponds t o a local maximum or minimum we look at the second differential: (1.98) For a local maximum the function decreases for small displacements about the extremum point (Fig. 1.5a), which implies A(2)f(xo)< 0. For a local minimum a similar argument yields Ac2)f (xg) > 0. Thus we obtain the following criteria: = 0 and
(s) < 0 d2f
for a local maximum
(1.99)
for a local minimum.
(1.100)
50
and = 0 and
(z) d2f
xo
>0
16
FUNCTIONAL ANALYSIS
Figure 1.6
Plot of y(z) = z3.
In cases where the second derivative also vanishes, we look at the third differential, ~ l ( ~ ) f (We z ~now ) . say that we have an inflection point; and depending on the sign of the third differential, we have either the third or the fourth shape in Figure 1.5. Consider the function
f(.) = x 3 ,
(1.101)
where the first derivative, f ’ ( z ) = 3x2, vanishes at zo = 0 . However, the second derivative, f ” ( z ) = 622, also vanishes there, thus making 20 = 0 a point of inflection. From the third differential: (1.102)
1 3!
= -6(Az)3,
(1.103)
we see that A(3)f(zo)> 0 for Ax > 0 and A(3)f(zo)< 0 for Az < 0. Thus we choose the third shape in Figure 1.5 and plot f ( z ) = z3 as in Figure 1.6. Points where the first derivative of a function vanishes are called the critical points. Usually the potential in one-dimensional conservative systems can be represented by a (scalar) function, V ( z ) .Negative of the derivative of the potential gives the z component of the force on the system:
F,(z)
=
dV dz
--.
(1.104)
Thus the critical points of a potential function, V ( z ) correspond , t o the points where the net force on the system is zero. In other words, the critical points are the points where the system is in equilibrium. Whether an equilibrium is stable or unstable depends on whether the critical point is a minimum or maximum, respectively. Analysis of the extrema of functions depending on more than one variable follows the same line of reasoning. However, since we can now approach the
MAXIMA AND MINIMA OF FUNCTIONS
17
critical point from infinitely many different directions, one has to be careful. Consider a continuous function
z = f(X,Y),
(1.105)
defined in some domain D. We say this function has a local maximum at ( 2 0 , yo) if the inequality S(X,Y)
5 f(X0,Yo)
is satisfied for all points in some neighborhood of minimum if the inequality
(1.106) ( 5 0 , yo)
and to have a local
f ( x , Y) 2 f(z0,Yo)
(1.107)
is satisfied. In the following argument we assume that all the necessary partial derivatives exist. Critical points are now defined as the points where the first differential, A(')f(z, y), vanishes:
A ( l ) f ( ~ , y=) AX :[
+3 A y ] dY
= 0.
(1.108)
Since the displacements A x and Ay are arbitrary, the only way to satisfy this equation is to have both partial derivatives, fz and fv, vanish. Hence at the critical point ( I C O , yo), shown with the subscript 0, one has
(g)o
= 0,
(1.109)
($)o
= 0.
(1.110)
To study the nature of these critical points, we again look at the second differential, A(2)f(xo,yo), which is now given as
For a local maximum the second differential has to be negative, A(2)f(xo,yo) < 0, and for a local minimum positive, ~ I ( ~ ) f ( xyo) o , > 0. Since we can approach the point (50,yo) from different directions, we substitute (Fig. 1.7)
Ax
= Ascosd
and Ay
= Assind
(1.112)
to write Equation (1.111) as 1
A(2)f(xo,yo)= - [ A c o s 2 d + 2 B c o s d s i n ~ + C s i n 2 d ]AS)^, 2
(1.113)
18
FUNCTIONAL ANALYSIS
Figure 1.7
Definition of As.
where we have defined
A=
(g)o,(g)o> (w),’ f B=
‘=
d2
(1.114)
Now the analysis of the nature of the critical points reduces to investigating the sign of ~ I ( ~ ) f ( yo) z o ,[Eq. (1.113)]. We present the final result as a theorem (Kaplan). Theorem 1.4. Let z = f(z, y) and its first and second partial derivatives be continuous in a domain D and let (20,yo) be a point in D , where the partial derivatives (&)nand
($)
vanish. Then, we have the following cases:
n
I. For B2 - AC-< 0 and A % C < 0 we have a local maximum at (20,yo). 11. For B2 - AC < 0 and A + C > 0 we have a local minimum at (zo, yo). 111. For B2 - AC > 0 , we have a saddle point a t (z0,yo). IV. For B2 - AC = 0 , the nature of the critical point is undetermined. When B2 - AC > 0 at (z0,yO) we have what is called a saddle point. In this case for some directions A ( 2 ) f ( z ~ , y ois) positive and negative for the others. When B2 - AC = 0 , for some directions A(’)f(zo,yo) will be zero, hence one must look at higher-order derivatives to study the nature of the critical point. When A , B , and C are all zero, then A(2)f ( 2 0 , yo) also vanishes. Hence we need to investigate the sign of A(3)f (zo,yo).
1.7
EXTREMA OF FUNCTIONS W I T H CONDITIONS
A problem of significance is finding the critical points of functions while satisfying one or more conditions. Consider finding the extremums of
w
= f(z,y,z)
(1.115)
gl(z,Y,z) = 0
(1.116)
while satisfying the conditions
EXTREMA OF FUNCTIONS WITH CONDITIONS
19
In principle the two conditions define two surfaces, the intersection of which can be expressed as (1.118) (1.119) (1.120) where we have used the variable x as a parameter. We can now substitute this parametric equation into w = f (x,y, z ) and write it entirely in terms of 2 as
extremum points of which can now be found by the technique discussed in the previous section. Geometrically, this problem corresponds to finding the y, z ) on the curve defined by the intersection of extremum points of w = f(z, g1(.r, y, z ) = 0 and g2(x, y, z ) = 0. Unfortunately, this method rarely works to yield a solution analytically. Instead, we introduce the following method: At a critical point we have seen that the change in w to first order in the differentials Ax, Ay, and Az is zero:
Aw
=
af -Ax dX
8.f af + -Az + -Ay dY dz
= 0.
(1.122)
We also write the differentials of g1(x, y, z ) and g2(2,y, z ) as
%ax + -Ay ag1 dX
dY
-Ax dg2
+ -Ay ag2
dX
dY
+ -Az 891
=0
dz
(1.123)
and
+dg2 az
= 0.
dz
We now multiply Equation (1.123) with A 1 and Equation (1.124) with add to Equation (1.122) to write
(1.124) A2
and
(1.125) Because of the given conditions in Equations (1.116) and (1.117), Ax,Ay, and Az are not independent. Hence their coefficients in Equation (1.122)
20
FUNCTIONAL ANALYSIS
cannot be set to zero directly. However, the values of A 1 and X2, which are called the Lagrange undetermined multipliers, can be chosen so that the coefficients of A x ,Ay, and Az are all zero in Equation (1.125):
(1.126) (1.127) (1.128) Along with the two conditions, g1(x, y,z ) = 0 and g2(x, y,z ) = 0, these three equations are t o be solved for the five unknowns:
The values that A 1 and A2 assume are used to obtain the x,y, and z values needed, which correspond to the locations of the critical points. Analysis of the critical points now proceeds as before. Note that this method is quite general and as long as the required derivatives exist and the conditions are compatible, it can be used with any number of conditions.
Example 1.3. E x t r e m u m problems: We now find the dimensions of a rectangular swimming pool with fixed volume Vo and minimal area of its base and sides. If we denote the dimensions of its base with x and y and its height with z , the fixed volume is
vo = xyz
(1.130)
and the total area of the base and the sides is
a
= xy
+ 2x2 + 2yz.
(1.131)
Using the condition of fixed volume we write a as a function of x and y as
avo + -.avo
a = xy+ -
Y
X
(1.132)
Now the critical points of a are determined from the equations
(1.133) which give the following two equations:
(1.134) (1.135)
21
EXTREMA OF FUNCTIONS WITH CONDITIONS
or
yz2 - 2vo = 0,
(1.136)
2vo = 0.
( 1.137)
zy2
-
If we subtract Equation (1.137) from Equation (1.136), we obtain
(1.138)
Y = 5,
which when substituted back into Equation (1.136) gives the critical dimensions
(1.139) (1.140)
.=(?)
1/3
,
(1.141)
where the final dimension is obtained from Vo = xyz. To assure ourselves that this corresponds to a minimum, we evaluate the second-order derivatives at the critical point,
(1.142) (1.143)
(I.144) and find
B2 - AC
=
1-4
=
- 3 < 0 and A + C = 2 + 2 = 4 > 0.
(1.145)
Thus the critical dimensions we have obtained [Eqs. (1.139)-(1.141)] are indeed for a minimum by Theorem 1.4.
Example 1.4. Lagrange undetermined multipliers: We now solve the above problem by using the method of Lagrange undetermined multipliers. The equation to be minimized is now f(5, y, z ) = xy
+ 2zz + 2yz
(1.146)
with the condition g(z, g, 2 ) =
& - xyz = 0.
(1.147)
22
FUNCTIONAL ANALYSIS
The equations to be solved are obtained from Equations (1.126)-(1.128) as
y x 22
+ 22 - yzx = 0, + 22 xzx = 0, + 2y xxy = 0. -
(1.148) (1.149)
-
(1.150)
Along with VO= xyz, these give 4 equations to be solved for the critical dimensions x , y , z , and A. Multiplying the first equation by x and the second one by y and then subtracting gives
x
= y.
(1.151)
Substituting this into the third equation [Eq. (1.150)] gives the value of the Lagrange undetermined multiplier as A = 4/x, which when substituted into Equations (1,148)-(1.150) gives
xy
+ 2x2 4yz = 0, x + 22 - 42 = 0, 22 + 2y 4y = 0. -
-
(1.152) (1.153) (1.154)
Using the condition Vo = xyz and equation (1.151) these three equations [Eqs. (1.152)-(1.154)] can be solved easily to yield the critical dimensions in terms of Vo as =
(1.155)
y=
(1.156)
(T)
1/3
z=
(1.157)
Analysis of the critical point is done as in the previous example by using Theorem 1.4.
1.8 DERIVATIVES A N D DIFFERENTIALS OF COMPOSITE FUNCTIONS
In what follows we assume that the functions are defined in their appropriate domains and have continuous first partial derivatives. Chain rule: If z = f ( x , y) and x = x ( t ) , y = y(t), then
dz - -_ dzdx dt
Similarly, if z
= f ( x , y)
dx dt
+--ddyz ddty
and x = g ( u , v) and y = h(u,v), then
(1.158)
DERIVATIVES AND DIFFERENTIALS OF COMPOSITE FUNCTIONS
23
(1.159) (1.160)
A better notation t o use is
(1.162) This notation is particularly useful in thermodynamics, where z may also be expressed with another choice of variables, such as
(1.163) (1.164) (1.165) Hence, when we write the derivative
dz -
dX’
(1.166)
we have t o clarify whether we are in the ( ~ , y or ) the ( x , w ) space by writing
(1.167) These formulas can be extended to any number of variables. Using Equation (1.158) we can write the differential dz as
dz =
(”ax at +--d y ”) dt at dz
= -dx
ax
dz + -dy. dy
(1.168) (1.169)
We now treat x,y and z as functions of (u, v) and write the differential dz as
dz dU
dz
dz = - du + - dv
=
(g)
dV
(1.170) (1.171)
24
FUNCTIONAL ANALYSIS
Since z and y are also functions of u and u,we have the differentials (1.173) and (1.174) which allow us to write Equation (1.172) as dz
=
dz
dz
dX
dY
- dX + - dy.
(1.175)
This result can be extended t o any number of variables. In other words, any equation in differentials that is true in one set of independent variables is also true for another choice of variables. Formal proofs of these results can be found in books on advanced calculus (Apostol, Kaplan). 1.9
IMPLICIT FUNCTION THEOREM
A function given as
can be used to describe several functions of the form
z = f(X,Y), y = g(x,z ) , etc.
(1.177)
+ z2
(1.179)
(1.178)
For example,
x2 +y2
-
9=0
can be used to define the function
z
=
JW'
(1.180)
or (1.181) both of which are defined in the domain x2 + y2 + z 2 5 9. We say these functions are implicitly defined by Equation (1.179). In order t o be able to define a differentiable function. = f ( x ,Y),
(1.182)
IMPLICIT FUNCTION THEOREM
25
by the implicit function F ( x ,y, z ) = 0, the partial derivatives
a f and ax
af
(I.183)
-
ay
should exist in some domain so that we can write the differential (1.184) Using the implicit function F ( x ,y, z ) = 0, we write
F, dx
+ Fu d y + F, dz = 0
(1.185)
and F X
dz = -- dx F,
-
3 dy,
(1.186)
Fz
where
dF F --, F
,-ax
dF
--
dy
y -
andF,=-
dF dz
(1.187)
Comparing the two differentials [Eqs. (1.184) and (1.186)], we obtain the partial derivatives (1.188) Hence? granted that F, # 0, we can use the implicit function F ( x ,y , z ) = 0 to define a function of the form z = f ( x , y ) . We now consider a more complicated case, in which we have two implicit functions: (1.189) (1.190) Using these two equations in terms of four variables, we can solve, in principle, for two of the variables in terms of the remaining two as (1.191) (1.192) For f (x,y ) and g(x,y) to be differentiable, certain conditions must be met by F ( x ,y, z , w) and G ( x ,y, z , w). First we write the differentials
+
+
+
F, dx Fy dy F, dz F, dw = 0 , G, dx+Gy dy+G, dz+G, dw=O
(1.193) (1.194)
26
FUNCTIONAL ANALYSIS
and rearrange them as
+
F, d z Fw d w G, dz + Gw d w
-Fz d x - Fy d y ; = -G, d x - G, dy. =
We now have a system of two linear equations dur to be solved simultaneously. We can either determinants and the Cramer’s rule to write -F, d x - Fy dy -G, d x - Gy d y dz =
(1.195) (1.196)
for the differentials d z and solve by elimination or use
Gw Fw
I
(1.197)
and
F, dw =
~
G,
-F, d x -G, d x
Fy d y - G, d y -
(1.198)
Using the properties of determinants, we can write these as
and
For differentiable functions, z = f (x,y ) and w = g ( x , y ) , with existing firstorder partial derivatives we can write (1.201) (1.202) Thus by comparison with Equations (1.199) and (l.200), we obtain the partial derivatives
d ( F ,G) a ( F ,G) d f - - a(xlw) 3f - - d(Y,W) d(F,G)’ & d(F,G) dx q z , w) a ( z , w)
(1.203)
IMPLICIT FUNCTION THEOREM
27
and
(1.204)
(1.205) are called the Jacobi determinants. In summary, given two implicit equations
we can define two differentiable functions = f ( x , y ) and
w
= g(Z,Y)
(1.207)
with the partial derivatives given as in Equations (1.203)-(1.204), provided that the Jacobian
(1.208) is different from zero in the domain of definition. This useful technique can be generalized t o a set of m equations in n number of unknowns:
+m
(1.209)
(1.210)
28
FUNCTIONAL ANALYSIS
and obtain a set of m linear equations to be solved for the m differentials, d y i , i = 1,.. . ! nz, of the dependent variables. Using Cramer’s rule, we can solve for dyi if and only if the determinant of the coefficients is different from zero, that is,
To obta.in closed expressions for the partial derivatives,
(1.213)
we take partial derivatives of the Equations (1.209) to write
(1.2 14)
dYi
which gives the solubion for - as dXj
and similar expressions for the other partial derivatives can be obtained. In general, granted that. the Jacobi determinant does not vanish, namely
IMPLICIT FUNCTION THEOREM
29
dYi we can obtain the partial derivatives - as dXj
8% dXj
q y 1 , . . . ,yi-1, Xj,yi+1,. ‘ . ,Ym) d(F1,. ,Fm)
>
( 1.217)
”
a(vl,.. . > Ym) where i = 1 , .. . , m and j = 1,.. . n.We conclude this section by stating the implicit function theorem, a proof of which can be found in Kaplan: Implicit function theorem: Let the functions
Fi(y1,. . . , y m , x l , . . . ,xn)= 0, i = 1 , . . . ,m,
(1.218)
be defined in the neighborhood of the point
with continuous first-order partial derivatives existing in this neighborhood. If (1.220) then in an appropriate neighborhood of Po,there is a unique set of continuous functions yi = fi(zl,.. . ,x,), i = l , ,. . ,m,
(1.221)
with continuous partial derivatives,
where i = 1 , .. . , m and j = 1 , . . . n,such that ;yoi = fi(z01,. . . , ZO,),
i = 1 , .. . , m,
(1.223)
and
Fi(fl(zl,. . . , z n ) ., . . , f m ( z l , . . . , z n ) , z l , . . . ,x,) = 0, i = l , ,. . , m , (1.224) in the neighborhood of Po. Note that if the Jacobi determinant [Eq. (1.120)] is zero at the point of interest, then we search for a different set of dependent variables to avoid the difficulty.
30
FUNCTIONAL ANALYSIS
1.10 INVERSE FUNCTIONS
A pair of functions, (1.225) (1.226) can be considered as a mapping from the xy space to the uu space. Under certain conditions, this maps a certain domain D,, in the xy space t o a certain domain D,, in the uu space on a one-to-one basis. Under such conditions, an inverse mapping should also exist. However, analytically it may not always be possible to find the inverse mapping or the functions: (1.227)
( 1.228) In such cases, we may consider Equations (1.225) and (1.226) as implicit functions and write them a s
We can now use Equation (1.215) with y1 = u,y2 = u and x1 write the partial derivatives of the inverse functions as
= x, x2 = y
to
a F 1 , F2) (1.231)
(1.232)
(1.233) Similarly, the other partial derivatives can be obtained. As seen, the inverse function or the inverse mapping is well-defined only when the Jacobi determinant J is different from zero, that is, (1.234) where J is also called the Jacobian of the mapping. We will return to this point when we discuss coordinate transformations in Chapter 3. Note that
INVERSE FUNCTIONS
31
the Jacobian of the inverse mapping is 1/J.In other words,
(1.235)
Example 1.5. Change of independent variable: We now transform the Laplace equation:
(1.236) into polar coordinates, that is, to a new set of independent variables defined by the equations
x = r cos 4,
(1.237) (1.238)
y = r sin 4,
4 E [0,27r]. We first
where r E (0, cm)and of 2 = z(x,y) :
write the partial derivatives
dz d z d x dzdy - -+ --, dr dxdr dydr dz d z d x dzdy --a$ - dxdcp +--> dyd$
(1.239) (1.240)
which lead to d z = dz
-cos$+-sin$, dz .
dY
dx
dr
d z = -(-rsind) dz
a$
ax
+ -(rcosd). dz dY
(1.241) (1.242)
Solving for dzldx and dzldy,we obtain
dz
dz
- = -cosq!dx dr dz az . - = -sin$ dy dr
dz 1 --sin$,
(1.243)
dz 1 + -cos$.
(1.244)
84
84 r
32
FUNCTIONAL ANALYSIS
We now repeat this process with dz/dx to obtain the second derivative d 2 z / d z 2 as
[
1
sin4 d d z -cos+--sin4 r 84 ar 84 r . d 2z d2z 2 d2z 1 - -cos2 4 - -cos 4 sin 4 + -- sin2 4 dr drd4 r r2 1dz 2 dz 2 +--sin 4+--sin@cos4. (1.245) r dr 84 r2
A similar procedure for dz/dy yields d 2 z / d y 2 : d22 d2z . 2 822 2 d2z 1 -=-sin 4+-sin 4cos 4 -- cos2 4 dy2 dr2 drd4 r d42 r2 182 2 a2 2 -- cos 4 - -- sin4cos4. (1.246) r dr dd r2
+
,
+
Adding Equations (1.245) and (1.246), we obtain the transformed equation as
d 2 z ( r ,6 ) dr
( r ,0) z ( r ,6 ) + -r1-d z dr + -r21 d 2&b2 = 0.
(1.247)
Since the Jacobian of the mapping is different from zero, that is, J = - -d(x,Y)
d ( r , Q )-
I
- rc0s4 sin$
rcos4
= r, r
# 0,
(1.248)
the inverse mapping exists and it is given as
r=d
1.11
m
(1.249)
4= tan-' 2. 5
(1.250)
INTEGRAL CALCULUS A N D T H E D E F I N I T E INTEGRAL
Let f ( x ) be a continuous function in the interval [x,, 561. By choosing (n- 1) points in this interval, xl,z 2 , . . , ~ ~ - we 1 ,can subdivide it into n subintervals, Ax1 , Ax2, . . . , Az,, which are not necessarily all equal in length. From
INTEGRAL CALCULUS AND THE DEFINITE INTEGRAL
33
-z
AX, Ax2 Ax3 xo
XI
x2
Figure 1.8
Ax4
x3
.-.X b
*
Upper (left) and lower (right) Darboux sums.
Theorem 1.3 we know that f(x) assumes a maximum, M , and a minimum, m,in [x,,xb].Let Mi represent the maximum and mi the minimum values that f (x)assumes in Axi. We now denote a particular subdivision by d and write the sum of the rectangles shown in Figure 1.8 (left) as n
S ( d )= C M Z A X ,
(1.251)
i=l
and in Figure 1.8 (right) as n
~ ( d=)
C
(1.252)
miAXi.
i=l
The sums S ( d ) and s ( d ) are called the upper and the lower Darboux sums, respectively. Naturally, their values depend on the subdivision d. We pick the smallest of all S ( d ) and call it the upper integral of f(x) in [x,,xb]: (1.253) Similarly, the largest of all s ( d ) is called the lower integral of f(x) in [x,,xb] : (1.254) When these two integrals are equal, we say the definite integral of f (x)in the interval [x,,xb] exists and we write
I:'
-f ( x ) dx = l y f ( x ) dx = -
l:
f(x) dx.
( 1.255)
34
FUNCTIONAL ANALYSIS
T' Figure 1.9
Riemann integral
This definition of integral is also called the Riemann integral, and the function f(x) is called the integrand. Darboux sums are not very practical to work with. Instead, for a particular subdivision we write the sum n
a(d) =
f(zk)axk,
(1.256)
k=l
where 5 k is an arbitrary point in Axk (Fig. 1.9). It is clear that the inequality
s ( d ) 5 a ( d )5 S ( d )
(1.257)
is satisfied. For a given subdivision the largest value of Axi is called the norm of d , which we will denote as n ( d ) . 1.12
R I E M A N N INTEGRAL
We now give the basic definition of the Riemann integral as follows: Definition 1.1. Given a sequence of subdivisions d l , dz, . . . of the interval [x(~ q,] , such that the sequence of norms n ( d l ) ,n ( d z ) , . . . has the limit lim n(&)
---f
k-oo
( 1.258)
0
and if f ( ~ is ) integrable in [ x , , z ~ ] then , the Riemann integral is defined as
f(x) dx = lim a ( & ) , k-cc
(1.259)
where lim S ( d k ) = lim s ( d k ) = lim a ( & ) .
k-cc
k-cc
k-cc
(1.260)
RIEMANN INTEGRAL
35
Theorem 1.5. For the existence of the Riemann integral
L-b
f(x) dx,
where x, and xb are finite numbers, it is sufficient t o satisfy one of the following conditions: i) f ( x ) is continuous in [x,,zb]. ii) f(z)is bounded and piecewise continuous in [x,,z b ] . From these definitions we can deduce the following properties of Riemann integrals. Their formal proofs can be found in books on mathematical analysis such as Apostol: I. If fl(z)and fi(z) are integrable in [z,,zb],then their sum is also integrable and we can write
JI:'
[fl
(z)
+ f 2 ( ~ ) 1 dx =
l"
fl(z) dx +
Ixb
f2(z) dz.
(1.261)
2,
11. If f(x) is integrable in [z,, zb],then the following are true: a f ( z )dz = a
i:'
f(z)d z , a is a constant,
(1.262)
(1.263)
(1.264)
(1.265)
111. If f(z)is continuous and f(x) 2 0 in [z,,xb],then
f(z)dx = 0 means . f ( z ) = 0. IV. -The average or the mean, defined as
(f), of f(z)in the interval
(1.266)
[z,,zb]
is
(1.267)
36
FUNCTIONAL ANALYSIS
If f ( x ) is continuous, then there exist a t least one point z* E [x,,xb] such that
1:
f ( ~dx) = f ( ~ * ) ( b - a).
(1.268)
This is also called the mean value theorem or Rolle's theorem. V. If f ( x ) is integrable in [x,,xb]and if x, < z,< X b , then
l:
f ( z ) dx =
l:
f ( x )d z
+
1"
f ( x ) dx.
( 1.269)
VI. If f ( z ) 2 g(z) in [x,,xb],then ( 1.270) VII. Fundamental theorem of calculus: If f ( z ) is continuous in [x,,zb], then the function
(1.271) is also a continuous function of x in [z,, zb]. The function F ( x ) is differentiable for every point in [x,,zb] and its derivative at x is f ( x ) :
(1.272)
F ( z ) is called the primitive or the antiderivative of f ( x ) .Given a primitive, F (x),then F ( x ) + constant
(1.273)
is also a primitive. If a primitive is known for [x,,xb], then we can write
l:
f(x) dx
=
1:'
dx
(1.274) (1.275) (1.276)
When the region of integration is not specified, we write the indefinite in-
tegral
1
f ( x ) dx = F ( x )
+ C,
where C is an arbitrary constant and F ( z ) is any function the derivative of which is f(x).
37
IMPROPER INTEGRALS
VIII. If f(x) is continuous and f(x) 2 0 in [z,, zb], then geometrically the integral (1.277) Jza
is the area under f (x)between 2, and 2 6 . IX. A very useful inequality in deciding whether a given integral is convergent or not is the Schwarz inequality:
(1.278)
X. One of the most commonly used techniques in integral calculus is the integration by parts:
i:'
uddx = [UW];~
(1.279)
or
I:'
u du = [ U U ] ; ~-
1'"
u du,
(1.280)
where the derivatives u' and v' and u and v are continuous in [x,,xb]. XI. In general the following inequality holds:
szab
sxxab
that is, if the integral If(x)l dx converges, then the integral f(x) dx also converges. A convergent integral, f(x) dx, is said to be absolutely convergent, if If(x)l dx also converges. Integrals that converge but do not converge absolutely are called conditionally convergent.
sz:
s'y
1.13
IMPROPER INTEGRALS
We introduced Riemann integrals for bounded functions with finite intervals. Improper integrals are basically their extension to cases with infinite range and to functions that are not necessarily bounded. Definition 1.2. Consider the integral rc
(1.281) which exists in the Riemann sense in the interval [a,c],where a < c < b. If the limit
(1.282)
38
FUNCTIONAL ANALYSIS
exists, where the function f ( x ) could be unbounded in the left neighborhood b of b, then we say the integral f ( x )dx exists, or converges, and write
sa
[/(XI
dx = A .
(1.283)
Example 1.6. Improper integrals: Consider the improper integral (1.284) where the integrand, x / ( l - x ) l l 2 , is unbounded at the end point x = 1. We write I1 as the limit (1.285) =
lim
2(1 - x)3/2
(1.286)
c-1-
thereby obtaining the value of I1 as 413. We now consider the integral (1.288) which does not exist since 12 =
lim
1': 5
(1.289)
(1 - 2 ) = lim [-1n(l - x ) ] : c-1-
0
(1.290)
C-1-
=
lirn [-In(1 - c)] 4
00.
c-1-
(1.291)
In this case we say the integral does not exist or is divergent, and for its value we give fco.
A parallel argument is given if the integral
ib
f ( x ) dx
(1.292)
exists in the interval [c,b ] , where a < c < b. We now write the limit b
I = lim c-a+
f ( x ) dx,
(1.293)
IMPROPER INTEGRALS
39
where f(x) could be unbounded in the right neighborhood of a. If the limit (1.294) exists, we write
{ f(x) dx
= B.
(1.295)
We now present another useful result from integral calculus: Theorem 1.6. Let c be a point in the interval ( a , b ) and let f(x) be integrable in the intervals [a,a’] and [b’, b ] , where a < a’ < c < b’ < b. Furthermore, f(x) could be unbounded in the neighborhood of c. Then the integral (1.296) exists if the integrals (1.297) and b
I2 =
f(x) dx
(1.298)
both exist and when they exist, their sum is equal to I : I = 11 + 1 2 .
If either 11 or
12
(1.299)
diverges, then I also diverges.
Example 1.7. I m p r o p e r integrals: Consider the integral (1.300) (1.301) which converges provided that the integrals in Equation (1.301) converge. However, they both diverge: = lim
dx - = lim [ln1x1]: x c40-
- lim 1nIcl + -a (1.302) - c-0-
40
FUNCTIONAL ANALYSIS
and similarly,
i3e+
lim
2
c-o+
dx = lirn ; c+o+
[In 1x11:
-
In 3 - lim In IcI c-o+
--j
+a, (1.303)
hence the integral,
l:J $, also diverges.
When the range of the integral is infinite, we use the following results: If f(x) is integrable in [ q b ] and the limit rb
exists, we can write
La
f(x) dx = A.
(1.305)
f(x) dz
(1.306)
Similarly, we define the integral = B.
If the integrals (1.307) and (1.308) both exist, then we can write (1.309)
1.14
CAUCHY PRINCIPAL VALUE INTEGRALS
In Example 1.7, since the integrals
11 = l ! J
$ and 12 = Ji% both diverge,
-
%
we used Theorem 1.6 to conclude that the integral I = J !l is divergent. However, notice that I1 diverges as 1nIe -00, while I , diverges as lim,,o+(- In Icl) +cm.In other words, if we consider the two integrals
-
CAUCHY PRINCIPAL VALUE INTEGRALS
41
together, the two divergences offset each other, thus yielding a finite result for the value of the integral as (1.310) =
lim In IcI - In 1
C-0-
+ In3 -
lim In IcI -+ In3
(1.311)
c-o+
= ln3.
(1.312)
J21
The problem with $ is that the integrand, 1/x, diverges at the origin. However, at all the other points in the range [-1,3] it is finite. In Riemann integrals (Theorem 1.6), divergence of either I I or I2 is sufficient to conclude that the integral I does not exist. However, as in the above case, sometimes by considering the two integrals, I1 and 1 2 , together, one may obtain a finite result. This is called taking the Cauchy principal value of the integral. Since it corresponds t o a modification of the Riemann definition of integral, it has to be mentioned explicitly that we are taking the Cauchy principal value as X
= ln3.
(1.313)
Another example is the integral cc
I =
[
( 1.314)
x3dx,
J -co
which is divergent in the ordinary sense, since a4 + 00. x3dx = lim l a x 3 d x = lim a-cc a-co 4
(1.315)
However, if we take its Cauchy principal value, we obtain lim x3dx = a-cc
[Ta + la x3dx
x’dx]
(1.316) (1.317)
Example 1.8. Cauchy principal value: Considering the integral O3
( 1 + x ) dx
(1.318)
we write
( 1.319)
42
FUNCTIONAL ANALYSIS
For a finite c we obtain the integral (1.320) = tan-'
c
+ -21 l o g ( l + c'),
(1.321)
+
+
log(1 c')] 00. which in the limit as c + 00 diverges as [tan-' c Hence the integral I also diverges in the Riemann sense by Theorem 1.6. However, since the other integral also diverges, but this time as tan-'(-c)
00'C
-
C-00
1 2
- log(1
1
+ c') ,
--f
(1.322)
we consider the two integrals in Equation (1.319) together to obtain the Cauchy principal value of I as O0
1.15
( 1 + x ) dx
(1.323)
=T.
INTEGRALS I N V O L V I N G A P A R A M E T E R
Integrals given in terms of a parameter play an important role in applications. In particular, integrals involving a parameter and with infinite range are of considerable significance. In this regard, we quote three useful theorems: Theorem 1.7. If there exists a positive function Q(x) satisfying the inequality /f(cr,x)I Q(x) for all cy E [ a 1 , c r 2 ] ,and if Q(x)dx is convergent, then the integral
s,"
<
da)=
00
f ( a , x ) dx
(1.324)
is uniformly convergent in the interval [al,cra]. This is also called the Weierstrass M-test for uniform convergence. If an integral, Jamf(a, x)dx, is uniformly convergent in [cy1,cy2], then for any given E > 0, there exists a number co depending on E but independent of cr such that f ( a ,z)dxl < E for all c > cg > a.
IsCm
Example 1.9. Uniform convergence: Consider the integral
I =
lo
(1.325)
e-"" sinx dx,
which is uniformly convergent for cr E this we choose Q(x) as e c E Z so that
[ E , o ~ ) for
le-"" sinxi 5 e P Z
every
E
> 0. To show (1.326)
INTEGRALS INVOLVING A PARAMETER
is true for all a
43
2 E. Uniform convergence of I follows, since the integral (1.327)
is convergent. Note that by using integration by parts twice we can evaluate the integral I as 1
f”
(1.328) The case where Q = 0 may be excluded, since the integral does not converge at all.
Theorem 1.8. Let f ( o , x )and and x E [a,m). If the integral
”d(a’ a
sinx dx
be continuous for all a E
[QI, Q Z ]
(1.329) exists for all a E [ a l ,CYZ] and if the integral
(1.330) is uniformly convergent for all Q E [ a 1 , a ~then ] , g ( a ) is differentiable in [ a l ,a21 (at a1 from the right and a t a2 from the left) with the derivative
(1.331) In other words, we can interchange the order of differentiation with respect to Q and integration with respect to x as
(1.332) This is also called the Leibnitz’s rule (Kaplan). Theorem 1.9. Let f(cu,x) be continuous for all a E [a,m). Also let the integral
[al,cu2]and LC
E
(1.333) be uniformly convergent for all a E [al,a2]. Then, (a) g ( a )is continuous in [ C Y ~ , Q(at ~ ]a1 from the right and at left).
a2
from the
44
FUNCTIONAL ANALYSIS
(b) The relation
that is,
Jdm [/""
1: [I"
f(a', x) dx] da' =
1
f ( z ,a') da' dx,
(1.335)
is true for all a E [ a l , a 2 ]In . other words, the order of the integrals with respect to z and a' can be interchanged. Note that in case (a) the interval for cy does not have to be finite. Remark: In the above theorems, if the limits of integration are finite but the function f ( a ,x) or its partial derivative d f ( a ,x ) / d a is not bounded in the neighborhood of the segment defined by x = b and a E [a1,a2],we say that the integral
s(a)=
Jd
b
(1.336)
f ( a , x )dx
is uniformly convergent for all a E [ a l ,a2],if for every 60 > 0 independent of LY such that the inequality
E
> 0 we can find
a
(1.337) is true for all S E [0,So].We can now apply the above theorems with the upper limit 03 in the integrals replaced by b and the domain x E [a,00) by x E [a,b]. Example 1.10. Integrals depending on a parameter: Given the integral
(1.338) we differentiate with respect to a to write (1.339) However, this is not correct. The integral on the right-hand side of
1 dcr [--] O0
d
sinax
dx=l"cosaxdx
(1.340)
does not exist, since the limit lim
6+m
sin ax cos ax dx = lim -
(1.341)
INTEGRALS INVOLVING A PARAMETER
45
dg is not justified (Theorem does not exist. Hence the differentiation da
1.8). On the other hand, given the integral
dx
p / 2
J,
a2
cos2 x
7r
a>o, 2ff ' + sin2 x - -
(1.342)
we can write
1
dx
r/2 a2
cos2 x
d
+ sin2 x
(1.343)
to obtain the integral 2cr cos2 x dx
- --7r
2 -
2cr2'
(1.344)
Example 1.11. Integrals depending on a parameter: Consider >
f(ff,x)=
XfO, x
(1.345)
= 0,
which is continuous for all x and a.Since (1.346) which is also continuous for all x and a,and the integral (1.347) converges uniformly for all a > 0 (Example 1.9), using Theorem 1.8 we conclude that (1.348) exists and can be differentiated to write
da
X
dx =
1
" d
sin x [ e C a X T ] dx
(1.349)
( 1.350) where we have used the result in Equation (1.328). We now use Theorem 1.9 to integrate g'(a) [Eq. (1.349)], which is continuous for all a > 0 to obtain
46
FUNCTIONAL ANALYSIS
However, we can also write
Lm
g’(a)dcy= -
Lm[I*
e--az sinxdx] da
=-La[- I, e-ax sin x x
=
-
Lm[/I
-L
03
dx,
=
1
eCaXsinxda d x
sin x Tdx,
a > 0, (1.352)
which along with Equation (1.351) yields the definite integral sin x
dx = n/2.
( 1.353)
1.16 LIMITS OF INTEGRATION DEPENDING ON A PARAMETER Let A ( x ) and B ( x ) be two continuous functions with continuous derivatives
] , B ( x ) > A ( x ) . Also let f ( t , x ) and a f ( t ’ x ) be continuous in in [ x 1 , x 2 with dX the region defined by [x1,x2]and [ X I = A(x),x2 = B ( x ) ] .We can now write the intcgral ~
(1.354) and its partial derivative with respect to x as (1.355) Using the relations [Eq. (1.272)] (1.356) (1.357) we can write (1.358) (1.359) We can also write
DOUBLE INTEGRALS
47
Ai
Y;
X
Figure 1.10 The double integral.
Thus obtaining the useful formula
1.17 DOUBLE INTEGRALS Consider a continuous and bounded function, f ( ~y), , defined in a closed region R of the xy-plane. It is important that R be bounded, that is, we can enclose it with a circle of sufficiently large radius. We subdivide R into rectangles by drawing parallels t o the z and the y axes (Fig. 1.10). We choose only the rectangles in R and numerate them from 1 to n. Area of the i t h rectangle is shown as AAi and the largest of the diagonals, h, is called the norm of the mesh. We now form the sum n
(1.362) i=l
where, as in the one-dimensional integrals, (x:, y,2) is a point arbitrarily chosen in the i t h rectangle. If the sum converges to a limit as h + 0, we define the double integral as the limit n
f ( ~ 5 yT)AAi ,
lim
h-0
i=l
+
ss
f ( x , y) dxdy.
R
(1.363)
48
FUNCTIONAL ANALYSIS
Figure 1.11
Ranges in the iterated integrals.
When the region R can be described by the inequalities Yl(2)
IY I YZ(Z),
51 I
Zl(Y)
Iz I5 2 ( Y ) ,
Y1
5
I52
(1.364)
or
IY 5 Y2,
(1.365)
where Y ~ ( s ) , Y ~ ( zand ) z1(y),22(y) are continuous functions (Fig. l.ll),we can write the double integral for the first case as the iterated integral (1.366) The definite integral inside the square brackets will yield a function F ( z ) , which reduces I to a one-dimensional definite integral:
l:
F(x) dx.
(1.367)
A similar argument can be given for the second case [Eq. (1.365)]. We now present these results in terms of a theorem: Theorem 1.10. If f ( z , y) is continuous and bounded in a closed interval described by the region Yl(Z)
IY I YZ(X),
51
I2 I 52,
(1.368)
then (1.369)
PROPERTIES OF DOUBLE INTEGRALS
49
is a continuous function of x and
Similarly, if
R is described by Zl(Y)
F 2 L 52(Y), Y1 I YI Y2,
(1.371)
then we can write
A formal proof of this theorem can be found in books on advanced calculus. 1.18
PROPERTIES OF DOUBLE INTEGRALS
We can summarize the basic properties of double integrals, which are essentially same as the definite integrals of functions with single variable as follows:
I.
(1.373)
f(x,y) dxdy, c is a constant;
cf(x,y ) dxdy = c
(1.374)
R
(1.375) where
R is composed of R1 and R2,which overlap only at
the boundary.
11. There exists a point (XI,y1) in R such that
where A is the area of R.The value f(xl,yl) is also the mean value, ( f ) , of the function in the region R : (1.377)
50
FUNCTIONAL ANALYSIS
(1.378) where A l is the absolute maximum, that is,
in R and A is the area of R. IV. Uses of double integrals: If we set f ( z , y ) = 1 in J J f ( z , y ) dzdy, the double integral corresponds to the area of the region R :
R
dzdy = area of R.
(1.380)
R
For f ( : r , y ) 2 0 we can interpret the double integral as the volume between the surface z = f ( z , y ) and the region R in the zy-plane. If we interpret f(z, y ) as the mass density of a flat object lying on the zy-plane covering the region R, the double integral (1.381) gives its total mass M TRIPLE A N D MULTIPLE INTEGRALS
1.19
Methods and results developed for double integrals can easily be extended to triple and multiple integrals:
/ / If(.,
y, z ) dzdydz,
J’J’J’/f(x,
R
y, z , w) dxdydzdw . . . .
(1.382)
R
Following the arguments given for the single and the double integrals, for a continuous and bounded function f (2, y, z ) in a bounded region R defined by
we can define the triple integral
(1.384)
PROBLEMS
An obvious application of the triple integral is when f ( x , y , z ) gives the volume of the region R :
SJ'S R
d x d y d z = volume of R.
=
51
1, which (1.385)
In physical applications, total amount of mass, charge, etc., with the density p(z,y, z ) are given as the triple integral
( 1.386) The average value of a function f ( z ,y, z ) in the region R with the volume V is defined as
(1.387) Example 1.12. Volume between two surfaces: To find the volume between the cone z = d G 5and the paraboloid z = x 2 + y', we first write the triple integral
L'I'[L2+g2 L' [ &GiF
I/ = =
dz] dxdy
,/'= - x2 - y2] d x d y .
(1.388) (1.389)
We now use plane polar coordinates to write this as (1.390) (1.391) (1.392) (1.393)
PROBLEMS 1. Determine the critical points as well as the absolute maximum and minimum of the functions y = Inz, 0 < z 5 2, (i)
(iii)
+ 2x2 + 1, -2
y = z3
< x < 1.
52
FUNCTIONAL ANALYSIS
2. Determine the critical points of the functions (i)
z =z3
(ii)
z
=1
(iii)
z
= z2 - 42y
-
62y2 + y 3 ,
+ z2 + y2, -
y2.
and test for maximum or minimum.
3. Find the maximum and minimum points of z = x2
subject to the condition x2
+ 24xy + 8y2
+ y2 = 25.
4. Find the critical points of w=x+y subject to
x2
+ y2 + z 2 = 1
and identify whether they are maximum or minimum.
5. Express the partial differential equation
in spherical coordinates (r,I9,d) defined by the equations
x y
4, = r sin I9 sin 4, = r sin I9 cos
z = rcose.
where r E [0,co),I9 E [O,7r] , E [0,an]. Next, first show that the inverse transformation exists and then find it.
6. Given the mapping x
= u2 -
2,
y = 2uv,
(i) Write the Jacobian. (ii) Evaluate the derivatives
(e)z
and
($)z.
PROBLEMS
7. Find
(g)y
and
53
( h ) for dY
eu + x u - yv - 1 = 0, e” - xu yu - 2 = 0.
+
8. Given the transformation functions
show that the inverse transformations = u(x,Y),
4 2 ,Y)
=
satisfy
du
1 dy d u - _--1 ax _ dv 1 dy dv 1 dx - --- _ - -J d v ’ dy J d v ’ dx J d u ’ dy Jdu’
-- --
dx
where J = a(x ’). Apply your result t o Problem 1.6.
9. Given the transformation functions
x
= z ( u ,V ’w), = y ( u , ‘u, w),
z = z ( u ,v,w) with the Jacobian
J = a(x’”
show that the inverse transformation
v,w)’
functions have the derivatives
d u - 1 d(z,x) d u - -~ 1 d(x,y) J d ( v , w ) ’ dy Jd(v,w)’ dz Jd(v,w)’ 1 d ( y , z ) dv - 1 d(z,x) dv 1 d(x,y) - -~ J d(w, u)’ dy J d(w, u)’ d z J d(w, u)’
du 1 d(y,z) -
dx dv dx dw _ dz
-
d ( y_ , z )__ dw -- -1 _
J d ( u , v ) ’ dy
-
1 d(z,x) dw
Jd(u,v)’ dz
-
1d(x,y) -
Jd(u,v)’
Verify your result in Problem 1.5. 10. In one-dimensional conservative systems the potential can be represented by a (scalar) function, V(x),where the negative of the derivative of the potential gives the x component of the force on the system:
dV F3:(x)= -dx
54
FUNCTIONAL ANALYSIS
With the aid of a sketch, analyze the forces on a system when it is displaced from its equilibrium position by a small amount and show that a minimum corresponds to stable equilibrium and a maximum corresponds to unstable equilibrium. 11. In one-dimensional potential problems show that near equilibrium potential can be approximated by the harmonic oscillator potential V ( 2 )= -1k ( z 2
where k is a constant and
20 is
-20)
2
,
the equilibrium point. What is k?
12. Expand z(z, y) = x3 sin y
+ y2 cos z
in Taylor series up to third order about the origin. 13. If z
= z ( u ,v)
and y
= y(u, u ) , then
show the following:
14. Show the integrals
Hint: Use
som9 dx = 4.
15. Evaluate the improper integrals: dx
.I
'I2
fi(1
dx -
22)
16. First show the following: - coverges if and only if p
(ii)
(iii)
11$,
coverges if and only if p
.IC&?
> 1,
< 1,
coverges if and only if p
<1
PROBLEMS
55
and then check the convergence of
17. Check the integral
1’
x2dx (1 - x2)1/2(2x3 1)
+
for convergence.
18. Show that the integral
lco sin x
dx
is convergent by using integration by parts. 19. Using the integral
I ( a ,b ) =
1
T/2
dx a2 cos2 x + b2 sin2x
-
7r
2ab’
a>0,b > 0 ,
where a and b are two parameters, show the integral
.i
dx
T/2
20. Determine the convergent:
(u2
a!
cos2 x
2=&(+2+$).
+ b2 sin2 x>
values for which the following integrals are uniformly
21. Can the order of integration be interchanged in the following integral (explain):
1
x-a!
dx da.
22. Use the result
.I
03
g(a)
=
sin xa! dx x(x2 1)
+
=
iT
-(1 2
-
e-a),
a!
> 0,
56
FUNCTIONAL ANALYSIS
to deduce the integrals sin x a
dx
=
7r
-(122
e P Q ) ,c
>0
and
23. Evaluate the double integral
I = //2y
dxdy
over the triangle with vertices (-1,0), (0, l ) ,and (2,O).
24. Evaluate I = J’Szy
dzdy
over the triangle with vertices (0,0), (1,l),and ( 1 , 3 ) .
25. Evaluate the integral
26. First evaluate the integral
and then repeat the integration over the same region but with the x integral taken first.
27. Test the following integral for convergence:
CHAPTER 2
VECTOR ANALYSIS
Certain properties in nature like mass, charge, temperature, etc., are scalars. They can be defined at a point by just giving a single number, that is, their magnitude. On the other hand, properties like velocity and acceleration are vector quantities which have both direction and magnitude. Most of the Newtonian mechanics and Maxwell's electrodynamics are formulated in terms of the language of vector analysis. In this chapter, we introduce the basic properties of scalars, vectors, and their fields. 2.1
VECTOR ALGEBRA: GEOMETRIC M E T H O D
Abstract vectors on a plane are defined by directed line segments. The length of the line segment describes the magnitude of the physical property and the arrow indicates its direction. As long as we preserve their magnitude and * 3 in direction, we can move vectors freely in space. In this regard, A and A Figure 2.1 are equivalent vectors:
' A =3 A. Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
8. SelGuk Bayin 57
58
VECTOR ANALYSIS
Figure 2.1
Abstract vectors.
We use Latin letters with an arrow to show vector quantities:
It is also customary to use boldface letters, A, B , a,b, . . . , for vector quantities, as we use in the figures. Magnitude or norm of a vector is a positive number and is shown as
lXl,1Z1,IZl... or simply as A , B , a , .. . Multiplication of a vector with a positive number, cr > 0, multiplies the magnitude by the same number while leaving the direction untouched:
Multiplication of a vector with a negative number, /3 < 0, reverses the direction while changing the magnitude as
Two vectors can be added by using the parallelogram method (Fig. 2.2). A convenient way to add vectors is to draw them head to tail as in Figure 2.3. This allows us to define a null vector d as -+
A
+ (-1)X = d.
(2.8)
Using the cosine and the sine theorems, we can find the magnitude, r, and the angle, 4, of the resultant,
T+=x+z,
(2.9)
VECTOR ALGEBRA: GEOMETRIC METHOD
59
r=A+B
k5-A
B Figure 2.2
Addition of vectors, r = A
+ B.
0
Figure 2.3
Addition of vectors by drawing them head to tail.
as T =
(A2+B2+2ABcosB)1’2,
4 = arcsin
(t
sind) .
(2.10) (2.11)
With respect to addition, vectors commute:
x+3=z+x and associate:
+z 3 + ( B + + Z ) .
( X + 3) A set of vectors,
=
(2.12)
(2.13)
{ x,z,5 , .. . } , can be added by drawing them head t o
tail. In Figure 2.3 (left), the resultant is ?;’=
x + 3+z+a’.
(2.14)
60
VECTOR ANALYSIS
Figure 2.4
Force problems.
If the resultant is a null vector, then the head of the last vector added and the tail of the first vector added meet (Fig. 2.3, right): d = 3 + 2 + 3 +
i?+Z+T.
Example 2.1. Point of application: In physics the point of application of
a vector is important. Hence we have t o be careful when we move them around to find their resultant. In some equilibrium problems, where the forces act at the center of mass, the net force is zero (Fig. 2.4, left). In Figure 2.4 (right), where there is a net force, the resultant also acts a t the point 0. 2.1.1
Multiplication of Vectors
For the product of two vectors there are two types of multiplication: The scalar product, which is also known as the dot or the inner product, is
defined as +
A
. Z= A B C O ~ O (2.15)
where 0 is the angle between the two vectors. The dot product is also shown as (2,S‘). If we write -ri’ . as
Tt
(2.16) + Ag = A .EB = ACOSO,
2,
(2.17)
where 2~ is a unit vector along the direction of the dot product becomes a convenient way to find the projection of a vector along another vector, that is, the component, AB, of 2 along EB.
VECTOR ALGEBRA: GEOMETRIC METHOD
Figure 2.5
61
Dot or scalar product.
In physics, work is a scalar quantity defined as the force times the displacement along the direction of force. In other words, it is the dot product of the force with the displacement. For a particle in motion, the infinitesimal work is written as the dot product of the force, 3, with the infinitesimal displacement vector along the trajectory of the particle, d s ’ , as (Fig. 2.5)
sW=T.db.
(2.18)
We have chosen to write 6W instead of dW t o emphasize the fact that, in general, work is path-dependent. To find the total work done between two points A and B, we have to integrate over a specific path C connecting the two points: (2.19) JA
c
A different path connecting A to B in general yields a different value for the work done. Another type of vector multiplication is called the vector product or the cross product, which is defined as the binary operation (2.20)
3,
The result is a new vector, which is perpendicular to the plane defined by + A and 3 with the magnitude defined as (Fig. 2.6)
C = ABsinO.
(2.21)
The direction is found by the right-hand rule, that is, when we curl the fingers of your right hand from the first vector to the second, the direction of our
62
VECTOR ANALYSIS
Figure 2.6
Cross or vector product.
3.
thumb gives the direction of Note that when the order of the vectors multiplied is reversed, the direction of also reverses. Angular momentum
3
+
L = 7 x T
(2.22)
+ r = 7 x 3
(2.23)
and t,orque (Fig. 2.6)
are two important physical properties that are defined in terms of the vector product. In the above expressions, 7 is the position vector defined with respect to an origin 0 and and are the momentum and the force vectors, respectively. In celestial mechanics we usually choose the origin as the center of attraction, M . The gravitational force is central and directed toward the center of attraction; hence the torque is zero. Since the rate of change of the angular momentum is equal to the torque, in central force problems angular momentum is conserved, that is, its magnitude and direction remains fixed. This means that the orbit of a planet always remains in the plane defined by the two vectors 7 and 3. This allows us to use plane polar coordinates in orbit calculations thereby simplifying the algebra significantly.
3
Tg
2.2
VECTOR ALGEBRA: COORDINATE REPRESENTATION
A convenient way to approach vector algebra came with Descartes through the introduction of Cartesian coordinates. We define a Cartesian coordinate system by choosing three mutually orthogonal straight lines, which we identify as the 21,x2,x3-axes, respectively. We also draw three unit basis vectors, Zl,E2,E3, along these axes (Fig. 2.7). A point P in space can now be represented by the position vector 7, which can be written as the sum of three vectors, x1E1,x222, and 2323, along their respective axes as + T = XI21
+ x2s2 + 2323,
(2.24)
VECTOR ALGEBRA: COORDINATE REPRESENTATION
63
A
e.
..
\
\'
' 1
/-----__I/
Figure 2.7
Cartesian coordinates.
where z1,22,z3 are called the coordinates of the point P or the components ---f of 7. We also use 2 for the position vector. In general any vector, A , can be written as the sum of three vectors:
--f
where A l , A2, A3 are called the components of A . We can also write a vector as the set of ordered numbers
Since the unit basis vectors are mutually orthogonal, they satisfy the relations
el . e l = 1, e l . e2 = 0, 21 .23 = 0, e2 . e l = 0, e2 . e2 = 1, e2 . e3 = 0, e3 . e l = 0, e3 . e2 = 0, 23 .23 = 1, A , - .
A
h
A
A
-
h
A
A
-
A
-
h
(2.27) (2.28) (2.29)
which can be summarized as
e2. . 23. - 6.. 2 3 , i , j = 1,2,3.
h
(2.30)
The right-hand side, S i j , is called the Kronecker delta, which is equal to 1 when the two indices are equal and 0 when the two indices are different:
(2.31)
64
VECTOR ANALYSIS
Using Equation (2.30), we can write the square of the magnitude of a vector, + A , in the following equivalent ways:
The components, Ai, are obtained from the scalar products of unit basis vectors:
Ai
+
A . 2i, i
1
1
1,2,3.
2 with
the
(2.34)
In component notation, two vectors are added by adding their respective components:
Multiplication of a vector with a scalar a is accomplished by multiplying each component with that scalar:
Dot product of two vectors is written as --i
A 3 = ( X , T I ) = A l B l + A2B2 + A3B3
(2.39) (2.40)
3
=
C AiBi.
(2.41)
i=l
Using component notation one can prove the following properties of the dot product:
65
VECTOR ALGEBRA: COORDINATE REPRESENTATION
Properties of the dot product
(2.42) (2.43) (2.44) (2.45) (2.46) (2.47) (2.48) Equation (2.47) is known as the Schwarz inequality. Equation (2.48) is the triangle inequality, which says that the sum of the lengths of the two sides of a triangle is always greater than or equal to the length of the third side. Before we write the cross product of two vectors in component notation, we write the following relations for the basis vectors: el x El = 0, e2 x el = -e3, e3 x el = e2,
h
-
,
A
-
.
A
A
h
h
h
el x e2
A
= e3,
A
e2 x Z 2 = 0, h
h
h
h
e3 x e2 = -el,
h
h
el x e3 = -e2, e2 x e3 = e l , e3 x 23 = 0.
,
h
-
.
A
,
-
.
(2.49)
The cross product of two vectors can now be written as +
A x
I? = (AiEl+ A2Z2 + A3E3) x (BlEl+ B 2 E Z + B3E3) =
(AZB3 - A3B2)21+ (&B1 - AlB3)22
We now introduce the permutation symbol
& ZJk ..
=
{
0 1 -1
&ijk,
+ (All32
(2.50) -
A2Bl)E3.
which is defined as
When any two indices are equal. For even (cyclic) permutations: 123, 231, 312. For odd (anticyclic) permutations: 213, 321, 132.
(2.51)
An important identity that the permutation symbol satisfies is 3
(2.52) i=l Using the permutation symbol, we can write the i t h component of a cross product as 3
3
(2.53) j=1 k=l
66
VECTOR ANALYSIS
Using determinants we can also write a cross product as (2.54)
Note that we prefer to use the index notation ( Z ~ , I C ~ , over Z ~ ) labeling of the axes as ( ~ , y , z )and show the unit basis vectors as (21,&,23) instead of ( 2 , j , k ) . The advantages will become clear when we introduce generalized coordinates and tensors in n dimensions. A
h
-
Example 2.2. Triple product: In applications we frequently encounter the scalar triple product +
A
‘
(3x 3)= Al(B2C3 B3C2) + A2(B3Cl + A3(BlC2 B2C1), -
-
B1C3) (2.56)
-
which is geomet$ally by the vectors A ,
equal t o the volume of a parallelepiped defined Note that
3,and 3 (Fig. 2.8).
UBC =
1 3 x 31= B h l = B ( C s i n 4 )
(2.57)
is the area of the base and h2 = Acos0 is the perpendicular height t o the base thereby giving the volume as
V = h2 . UBC = (A cos 0) BC sin q5 =
(2.58) (2.59)
2 .(3x 3).
(2.60)
Using index notation, one can easily show that
v = X (3x 3) = 3. (3x X ) = 3. (2x 3). ’
(2.61)
The triple product can also be expressed as the determinant
X . ( x x 3)= d e t
(i
A2
l3;
E;
A3
)
.
Properties of the cross product can be summarized as follows:
(2.62)
VECTOR ALGEBRA: COORDINATE REPRESENTATION
67
B Figure 2.8
Triple product.
Properties of the cross product --f
AxI?=-ZXX,
(2.63)
2 x ( T ? + z ) = x X x + f x x ,
(2.64)
(02)x 3 = o ( X x 31,a is a scalar, 2 x (Z x 3)= Z(2. 3)- Z(Z.Z), (2x 3). (2x 2)= A2B2- (2.Z)2,
(2.65) (2.66) (2.67)
~ x ( Z x z ) + ~ x ( ~ x x ) + z x ( 2 x Z ) = 0 . (2.68)
Using the index notation, we can prove Equation (2.66) as 3
3
3
j=1 k = l 3
3
1=1 m=l 3
r 3
j = 1 1=1 m = l Lk=l 3
3
3
3
1
J
68
VECTOR ANALYSIS
Figure 2.9
2.3
Equation of a line.
LINES A N D PLANES
We define the parametric equation of a line passing through a point + ( ~ 0 1 , : ~ 0 2 , 2 0 2and ) in the direction of the vector A as
3 = 3 +tx,
3= (2.71)
where 3 = ( z 1 , x 2 , ~ 3 )is a point on the line and t is a parameter (Fig. 2.9). + If the components of A are (al, a 2 , a 3 ) , we obtain the parametric equation of a line in space as
(2.72) (2.73) (2.74) In two dimensions, say on the xlx2-plane, the third equation above is absent. Thus by eliminating t among the remaining two equations, we can express the equation of a line in one of the following forms: (2.75)
(2.76) a122 - a 2 2 1
=
(202a1 - z o 1 a 2 ) .
(2.77)
Consider a plane that contains the point P with the coordinates ( 1 ~ 0 1 , 2 0 2 , 2 0 2 ) . Let 3 be any nonzero vector normal to the plane at P and let 3 be any
LINES AND PLANES
Figure 2.10
69
Equation of a plane.
point on the plane (Fig. 2.10). Since ( 2- ?) is a vector on the plane whose dot product with 2 is zero, we can write
2 = 0.
(3 - ?),
(2.78)
Since any 2 perpendicular to the plane satisfies this equation, we can also write this equation as
( 2- 3). ti? = 0,
(2.79)
where t is a parameter and i? is the unit normal in the direction of write 6 as
6 = (721,722, 723),
72;
+ + 72; 72;
= 1,
we can write the equation of a plane, that includes the point and with its normal pointing in the direction i? as 721x1
+
72222
+
72323 = [zolnl
+
202722
+
5037233.
2. If we (2.80)
(~01,2025 , 02)
(2.81)
Example 2.3. Lines and planes: The parametric equation ofthe line passing through the point ? = (3,1,1) and in the direction of A = (1,5,2) is 2l(t)
=3
+t,
+ 5t, Q ( t ) = 1 + 2t. 22(t) =
1
(2.82) (2.83) (2.84)
70
VECTOR ANALYSIS
For a line in the z122-plane passing through 3 = ( 2 , 1 , 0 ) and in the + direction of A = (1,5,0) we write the parametric equation as
2 l ( t )= 2 52(t) =
1
+t,
(2.85) (2.86)
+ 5t.
We can now eliminate t to write the equation of the line as 22 =
52,
-
9.
(2.87)
For a plane including the point 7 = (2,1,-2) and with the normal = (-1,1, I ) the equation is written as [Eq. (2.8l)l
3
+ 2 2 + 5 3 = -3.
-21
(2.88)
In general, a line in the zlzs-plane is given as a51
+ b Z 2 = c.
(2.89)
Comparing with Equation (2.77), we can now interpret the vector ( a ,b) as a vector orthogonal to the line, that is,
( a , b ) .(a1,a2) = (-a2,a1).
(a1,a2)
= 0.
(2.90)
To find the angle between two planes,
+ + = 2, + + 223 = 1,
221 -21
22
(2.91) (2.92)
23
22
we find the angle between their normals, 3 1
= (2,1,1),
(2.93)
3
= (-1,1,2),
(2.94)
2
as (2.95)
e=cos-l =
2.4
[
-2+1+2
1 cos-1 6
436
]
(2.96) (2.97)
VECTOR DIFFERENTIAL CALCULUS
2.4.1 Scalar Fields and Vector Fields We have mentioned that temperature is a scalar quantity, hence a single number is sufficient to define it at a given point. In general, the temperature inside
VECTOR DIFFERENTIAL CALCULUS
71
a system varies with position. Hence in order t o define temperature in a system completely, we have to give the temperature at each point of the system. This is equivalent to giving temperature as a function of position:
This is an example of what we call a scalar field. In general, a scalar field is a single-valued differentiable function,
f(m, z2,23),
(2.99)
representing a physical property defined in some domain of space. In short, for f ( 2 1 , 2 2 , ~ we 3 ) also write f(7) or f(2). In thermodynamics temperature is a well-defined property only for systems in thermal equilibrium, that is, when the entire system has reached the same temperature. However, granted that the temperature is changing sufficiently slowly within a system, we can treat a small part of the system as in thermal equilibrium with the rest and define a meaningful temperature distribution as a differentiable scalar field. This is called the local thermodynamic equilibrium assumption and it is one of the main assumptions of the theory of stellar structure. Another example for a scalar field is the gravitational potential in Newton’s theory, a(?). For a point mass M located at the origin, the gravitational potential is written as
a(?)
=
M -G-, r
(2.100)
where G is the gravitational constant. For a massive scalar field, the potential is given as
a(?)
e-Pr
=
k-,
r
(2.101)
where p-lis the mass of the field quanta and k is a coupling constant. We now consider compressible flow in some domain of space. Assume that the flow is smooth so that the fluid elements, which are small compared to the body of the fluid but large enough to contain many molecules, are following well-defined paths called the streamlines. Such flows are called irrotational or streamline flows. At each point of the streamline we can associate a vector tangent to the streamline corresponding to the velocity of the fluid element at that point. In order to define the velocity of the fluid, we have to give the velocity vector of the fluid elements at each point of the fluid as
This is an example of a vector field. In general, we can define a vector field by assigning a vector to every point of a domain in space (Fig. 2.11).
72
VECTOR ANALYSIS
Figure 2.11
2.4.2
Flow problems.
Vector Differentiation
Trajectory of a particle can be defined in terms of the position vector ? ( t ) , where t is a parameter, which is usually taken as the time. The velocity 3 ( t ) and the acceleration 2 ( t )are now defined as the derivatives (Fig. 2.12)
7(t + v ( t )= lim
+ At)
7 ( t )d?(t) dt
(2.103)
3 ( t )d27(t) dt2 '
(2.104)
-
at
At-0
and
3(t + a ( t )= lim
+ At)
-
At
At-0
In general, for a differentiable vector field given in terms of a single parameter
t, +
+
+
A ( t )= Ai(t)Zi A2(t)Z2 A3(t)Z3,
(2.105)
we can differentiate componentwise as
(2.106) Higher-order derivatives are found similarly according t o the rules of calculus. Basic properties of vector differentiation
d +
-(A dt
dx dx + 3)= + -, dt dt
(2.107) (2.108)
d +
-(A dt
. Z )= -ddt.xZ +
-+ d 3
A . -, dt
d + d z dx - ( A x ~ ) = X X -dt+ - Xdt Z dt
(2.109) (2.110)
GRADIENT OPERATOR
73
J Figure 2.12
Vector differentiation.
Vector fields depending on more than one parameter can be differentiated partially. Given the vector field +
A (7) = A i ( 7 ) Z I + A 2 ( 7 ) Z 2 + A3(?)Z2,
since each component is a differentiable function of the coordinates, we can differentiate it as
(2.111) x 1 , x 2 , z3,
(2.112) (2.113)
2.5
GRADIENT OPERATOR
Given a scalar field @(7) defined in some domain of space described by the Cartesian coordinates ( x l , z 2 , z 3 ) ,we can write the change in a(?) for an infinitesimal change in the position vector as
@(7 +A?)
-
@(7) =d@(7)
(2.114)
If we define two vectors: (2.1 16)
74
VECTOR ANALYSIS
x A 3
/ x1
Figure 2.13
Equipotential surfaces.
and
d?
=
(dzl,d22,d ~ g ) ,
(2.117)
Va. d?.
(2.118)
we can write d@ as dQ,(?)
=
Note that even though Q, is a scalar field, introduce the differential operator
a‘@ is a vector
field. We now
(2.119) which is called the gradient or the del operator. On its own the del operator is meaningless. However, as we shall see shortly, it is a very useful operator. 2.5.1
Meaning of the Gradient
In applications we associate a scalar field, a(?), with a physical property like the temperature, gravitational, or the electrostatic potential. Usually we are interested in surfaces on which a scalar quantity takes a single value. In thermodynamics, surfaces on which temperature takes a single value are called the isotherms. In potential theory equipotentials are surfaces on which potential is a constant, that is,
GRADIENT OPERATOR
75
If we treat C as a parameter, we obtain a family of surfaces as shown in Figure 2.13. Since (a(?;') is a single-valued function, none of these surfaces intersect each other. For two infinitesimally close points, ?;'I and ?;'2, on one of the surfaces, @(XI, ~ 2 ~ x = 3 )C , the difference (Fig. 2.14), d?;' = 7
2
(2.121)
-71,
is a vector on the surface. Thus the equation
T(a.d?;'
=0
(2.122)
indicates that ?(a is a vector perpendicular to the surface This is evident in the special case of a family of planes: (a(?;')
= 12121
+
72252
+
12323
=
(a = C
(Fig. 2.14).
c,
(2.123)
where the gradient:
T@= ( n l , n 2 , 1 2 3 ) >
(2.124)
is clearly normal to the plane. For a general family of surfaces, naturally the normal vectors depend on the position in a given surface.
Example 2.4. Equation of the tangent plane to a surface: Since the normal to a surface, F ( z 1 , ~ 2 , 2 3 ) = C, and the normal t o the tangent plane at a given point, ? = ( 2 0 1 , 5 0 2 , I C O ~ ) ,coincide, we can write the equation of the tangent plane at P as
( 2- 9 ).TF = 0,
(2.125)
where 2 is a point on the tangent plane. In the limit as 2 can write ( 2- ?) = d 2 . Hence the above equation becomes
---f
a'F. d 2
? we
= 0.
In other words, in the neighborhood of a point 3 ,the tangent plane approximately coincides with the surface. To be precise, this approximation is good to first order in ( 2- ?). 2.5.2
Directional Derivative
We now consider a case where ?;'I is on the surface (a(?) = C1 and 7 2 is on the neighboring surface (a( 7) = C2. In this case the scalar product T(a.d?;' is different from zero (Fig. 2.15). Defining a unit vector in the direction of d? as B = d 7 / Id?;'l, we write (2.126)
76
VECTOR ANALYSIS
Figure 2.14 Gradient.
which is called the directional derivative of @ in the direction of G. If we move along a path, A , that intersects the family of surfaces iP = Ci, it is apparent from Figure 2.15 tha,t the directional derivative, (2.127) is zero when ct = 7 ~ 1 2 that , is, when we stay on the same surface. It is a maximum when we are moving through the surfaces in the direction of the gradient. In other words, the gradient indicates the direction of maximum change in as we move through the surfaces (Fig. 2.15). The gradient of a scalar field is very important in applications and usually defines the direction of certain processes. In thermodynamics heat flows from regions of high temperatures to low temperatures. Hence, the heat current density, is defined as proportional to the tempcrature gradient as
f,
J’(7;f)= - k v T ( ? ) ,
(2.128)
where k is the thermal conductivity. In transport problems mass flows from regions of high concentration to low. Hence, the current density of the flowing material is taken as proportional to the gradient of concentration, p C ( ? ) , as
7(7) = -KVC(?), where
ti
is the diffusion constant.
(2.129)
DIVERGENCE AND CURL OPERATORS
Figure 2.15
2.6
77
Directional derivative.
DIVERGENCE A N D CURL OPERATORS
The del operator,
d d 7= z1+ /.a e2 -+ z3 -, 8x1 8x2 ax3
(2.130)
+ can also be used to operate on a given vector field A either as
v.2,
(2.131)
7x 2.
(2.132)
or as
The first operation results n a scalar field:
7
dAl A=8x1
4
dA2 +-+8x2
dA3 8x3
(2.133)
is called the divergence o the vector field 3,and the operator V. is called the div y e r a t o r . The second operation gives another vector field, called the curl of A , components of which are given as
(2.134)
78
VECTOR ANALYSIS
or as
dA3
dA2
dAz
where
Vx
dA1
dA3
dA1
(2.135)
is called the curl operator and di stands for d / d x i .
Basic properties of the gradient, divergence, and the curl operators
a'(d$)
=
$Vd + dV$,
(2.136) (2.137) (2.138)
V x ( X + 3)= V x x + V x 3,
(2.139) (2.140)
2.6.1 Meaning of Divergence and the Divergence Theorem For a physical understanding of the divergence operator we consider a tangible case like the flow of a fluid. The density of the fluid, p ( 7 , t ) , is a scalar field and gives the amount of fluid per unit volume as a function of position and time. The current density, J(?,t ) ,is a vector field that gives the amount of fluid flowing per unit area per unit time. Another critical parameter related t o the current density is the flux of the flowing material through an area element A b . Naturally, flux depends on the relative orientation of 7 and A d . For an infinitesimal area, d 3 ,flux is defined as (2.141) (2.142) (2.143) which gives the amount of matter that flows through the infinitesimal area element da per unit time in the direction of the unit normal 5i to the surface (Fig. 2.16). Notice that when the area element is perpendicular to the flow, that is, 0 = ./a, the flux is zero. We now consider a compressible flow such as a gas flowing in some domain of space, which is described by the current density = ( J l ,J2,J3) and the
7
DIVERGENCE AND CURL OPERATORS
79
/
Figure 2.16
Flux through a surface.
matter density p. Take a small rectangular volume element
Ar = A x ~ A x ~ A x ~
(2.144)
centered at 7= (*, *, *) as shown in Figure 2.17. The net amount of matter flowing per unit time in the x2 direction into this volume element, that is, the net flux 4 2 in the 2 2 direction, is equal to the sum of the fluxes from the surfaces 1 and 2:
A42
= [ J ( x l ,0
+ Ax2)
23,
t)
+ T1\(X1,0,23,t ) ] 6AzlAx3 '
(2.145)
(2.147) where for the flux through the second surface we have used the Maclaurin series expansion of J ' ( 7 , t )for 2 2 and kept only the first-order terms for a sufficiently small volume element. Note that the flux through the first surface is negative, since 5 2 and the normal 6 to the surface are opposite in direction. Similar terms are obtained for the other two pairs of surfaces. Thus their sum gives us the net amount of material flowing into the volume element:
a4 = a41 + A42 + A43
(2.148)
Since the choice for the location of our rectangular volume element is arbitrary, for an arbitrary point in our domain we can write (2.150)
80
VECTOR ANALYSIS
"3
Figure 2.17
which is nothing but
Ad
=
Flux through a cube.
"
'
J'(7,t p r .
(2.151)
Notice that when the net flux A 4 is positive, it corresponds to a net loss of dP matter within the volume element Ar. Hence we equate it t o ----Or. Since dt
d 7
the position of the volume element is fixed, that is, dt = 0, we can write
-!&AT dt
=-
-
d p dx2 [dp& +--+--+-
8x1 dt
8x2 dt
dp d ~ 8x3 dt
dP
3
"I
dt
AT
(2.152)
--AT
at
to obtain (2.153) Since the volume element AT is in general different from zero, we can also write
a'
'
J'(?,t)
+ d P ( 7 , t ) = 0,
(2.154)
DIVERGENCE AND CURL OPERATORS
81
For a compressible fluid flow, current density can be related to the velocity field of the fluid as
7(?,t)
=p(?,t)T+(T,t),
(2.155)
where 3 ( ? , t ) is the velocity of the fluid element at ? and t. Equation (2.154) is called the equation of continuity and it is one of the most frequently encountered equations of science and engineering. It is a general expression for conserved quantities. In the fluid flow case it represents conservation of mass. In the electromagnetic theory, p stands for the electric charge density and is the electric current density. Now the continuity equation becomes an expression of the conservation of charge. In quantum mechanics, the continuity equation is an expression for the conservation of probability, where p = 99*is the probability density, while is the probability current density. For a finite rectangular region R with the surface area S we can use a network of n small rectangular volume elements, each of which satisfies
7
7
(2.156)
where the subscript i denotes the i t h volume element at T i . When we take the sum over all such cells and consider the limit as n -+ 03, fluxes through the adjacent sides will cancel each other, thus giving the integral version of the continuity equation as
Since the integral on the right-hand side is convergent, we can interchange the order of the derivative and the integral t o write
(2.158)
sv
where in the last step we have used total derivative since p ( 7 , t ) d r is only a function of time. The right-hand side is the rate of change of the total
dm
amount of matter, m, within the volume V . When m is conserved, - = 0. dt In other words, unless there is a net gain or loss of matter from the region, the divergence is zero. If there is net gain or loss of matter in a region, it implies the presence of sources or sinks within that region. That is, a nonzero divergence is an indication of the presence of sources or sinks in that region.
82
VECTOR ANALYSIS
It is important to note that if the divergence of a field is zero in a region, it does not necessarily mean that the field there is also zero, it just means that the sources are elsewhere. Divergence theorem: Another way to write the left-hand side of Equation (2.158) is by using the definition of the total flux, j SJ’ . 6 do,that is, the net amount of material flowing in or out of the surface, S , per unit time, where S encloses the region R with the volume V . Equating the left-hand side of Equation (2.158) with the total flux gives
9 . f ( T + t, ) d r =
f
.??do,
(2.159)
where 6 is the outward unit normal to the surface S bounding the volume V and do is the area element of the surface (Fig. 2.18). Equation (2.159), which is valid for any piecewise smooth surface S with the volume V and the outward normal 6, is called Gauss’s theorem or the divergence theorem, which can be used for any differentiable and integrable vector field Gauss’s theorem should not be confused with Gauss’s law in electrodynamics, which is a physical law. A formal proof of the divergence theorem for any piecewise smooth surface that forms a closed boundary with an outward unit normal 6 can be found in Kaplan. Using the divergence theorem for an infinitesimal region, we can write an integral or an operational definition for the divergence of a vector field f as
7.
(2.160)
where S is a closed surface enclosing the volume V. In summary, the divergence is a measure of the net in or out flux of a vector field over the closed surface S enclosing the volume V. I t is for this reason that a vector field with zero divergence is called solenoidal in that region. Derivation of the divergence theorem has been motivated on the physical model of a fluid flow. However, the result is a mathematical identity valid for a general differentiable vector field. Even though f . 6 da represents the flux of f through d d , may not represent any physical flow. As a mathematical identity, divergence theorem allows us to convert a volume integral to an integral over a closed surface, which then can be evaluated by using whichever is easier.
7
VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
Figure 2.18
2.7
2.7.1
83
Area element on S.
VECTOR INTEGRAL CALCULUS IN T W O DIMENSIONS Arc Length and Line Integrals
A familiar line integral is the integral that gives the length of a curve as 1=
]c d s ] d d x f + d x i , =
(2.161)
C
where C denotes the curve the length of which is to be measured and s is the arc length. If the curve is parameterized as
(2.162) (2.163) we can write 1 as
1 We can also use either
=
21
.I,
/(%)2
or
22
+ (%)
2
dt.
(2.164)
as a parameter and write
(2.165) or
(2.166)
84
VECTOR ANALYSIS
Line integrals are frequently encountered in applications with linear densities. For example, for a wire with linear mass density ~ ( s )we, can write the total mass as the line integral
(2.167) or in parametric form as
Extension of these formulas to n dimensions is obvious. In particular, for a curve parameterized as ( z l ( t )z,2 ( t ) x, 3 ( t ) )in three dimensions, the arc length can be written as
(2.169) If the coordinate
z1
is used as the parameter, then the arc length becomes
(2.170) Example 2.5. W o r k done o n a particle: An important application of the line integral is the expression for the work done on a particle moving along a trajectory under the influence of a force 3 as
W=
ds,
(2.171)
where FT is the tangential component of the force, that is, the component along the displacement ds. We can also write W as
(2.172)
(2.173)
(2.174)
(2.175)
VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
85
Normal and tangential components of force.
Figure 2.19
Using the relations (Fig. 2.19) F1 =
FT cos
+ FNcos
F2 =
FT sin a
-
(5 (5
- a ) = FTcos a
- a ) = FTsin a
FN sin
+ FN sin a ;
(2.176)
FN cos a ,
(2.177)
-
and
dxl = d s c o s a , dxz = d s s i n a ,
(2.178)
we can easily show the following equivalences:
W
=
L +L
+
[ F ~ c o s a F ~ s i n ac]o s a ds
[FTsin a
=
-
FNcos a]sin a ds
FT ds.
(2.179) (2.180)
In most applications, line integrals appear in combinations as
(2.181) which we also write as
(2.182)
Jc
We can consider P and Q as the components of a vector field Ti? as iij’ =
p(xi,xz)gi + Q(xi,x2)22.
(2.183)
86
VECTOR ANALYSIS
Figure 2.20
Unit tangent vector.
Now the line integral [Eq. 2.1811 can be written as
L
+ Q ( x 1 , m )dx2 =
P ( x I , x ~dzl )
LWT
ds,
(2.184)
where WT denotes the tangential component of %3 in the direction of the unit tangent vector ?(Fig. 2.20):
-t = -eldx1,
dxz+ -e2 ds + (sin a )2 2 ,
ds = (cos a )
(2.185) (2.186)
Using Equation (2.186) we write A
~ ~ = 8 ~ t = P c o s a + Q s i n a ,
(2.187)
hence proving
L
wT ds = =
(PcosQ
P dxl
+ Q sin a )ds
+ Q dxz
(2.188) (2.189)
If we represent the path in terms of a parameter t , we can write .d?
=L
P dzl + Q dx2
(2.190) (2.191) (2.192)
Example 2.6. Change i n kinetic energy: If we take 7 as the position of a particle of mass m moving under the influence of a force the
2,
VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
87
work done on the particle is written as
W=L?.d?
(2.193)
= L T . d?=
dt.
(2.194)
d 3 d 3 . Substituting the second law of Newton, 3 = m-, where - is the dt dt acceleration of the particle, we can write W as
(2.195) (2.196) (2.197) (2.198) (2.199) The quantity we have defined as
T
= -mv 1 2
2
,
(2.200)
is nothing but the kinetic energy of the particle. Hence the work done on the particle is equal to the change in kinetic energy.
2.7.2
Surface Area and Surface Integrals
We have given the expressions for the arc length of a curve in space. Our main aim is now to find the corresponding expressions for the area of a surface in space, which could either be given as x3 = X S ( X ~ , X or ~ ) in parametric form as
(2.201) (2.202) (2.203) Generalizations of the formulas [Eqs. (2.170) and (2.169)] to the area of a given surface are now written as
88
VECTOR ANALYSIS
or as
S where
..’.J’
(2.205)
dudv,
=
+(z) +($), (z)2+(z)2+(z)2 2
2
2
E = ( 2 )
(2.206)
8x1 8x1 F = --+--+--
du dv
G=
8x2
8 x 3 ax3
du dv
du dv
ax2
(2.207)
(2.208)
A propcr treatment of the derivation of this result is far too technical for our purposes. However, it can be found in pages 371-378 of Treatise on Adva~iced Calculus by Franklin. An intuitive geometric derivation can be found in Advanced Calculus by Kaplan. We give a rigorous derivation when we introduce the generalized coordinates and tensors in the following chapter. For the surface analog of the line integral
(2.209) we write
I
=
/ s, ?
.db.
(2.210)
Consider a sniooth surface S with the outer unit normal defined as ~ = c o s f f ~ ~ + c o s ~ & + c o s y ~ ~
(2.211)
and take ? = (V1, V2,V3) to be a continuous vector field defined on S. We can now write the surface integral I as
/ /;(7
.6)d a =
/s,
(V1 cos 0
+ v2c o s p + v3cosy) d a
where we used the fact that projections of the surface area element d b = %a onto the coordinate planes are given as cosa d a = dx2dx3, cosp d o = dx3dx1, and cos y d a = d x l dx2. Similar to line integrals, a practical application of surface integrals is with surface densities. For example, the mass of a sheet described by the equation
VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
89
x3 = x3(x1,2 2 ) with the surface density o ( x ~ , xx3) ~ ,is found by the surface integral
or in terms of the parameters u and v as (2.2 14)
2.7.3
An Alternate Way to Write Line Integrals
We can also write the line integral
L P d x l + Q dx2
(2.2 15)
as
(2.216) where the vector field
3 is defined
as + v = QZ1- P2z
(2.217)
and 2 is the unit normal to the curve, that is, the perpendicular to 2.21):
(Fig.
Fi=txz3
-
dx2-el ds
-
dxi-e2. ds
(2.218)
3 now becomes v, = 3 . 6
The normal component of
=z
(QE1- P&) . d ~ 2 dxl + P-,ds ds
= Q-
(2.2 19) el
-
-e2 ds
(2.220) (2.221)
which gives
(2.222)
90
VECTOR ANALYSIS
8
Figure 2.21
If we take
3
1
Normal and the tangential components.
3 as + u -
P dxl
+ Q dx2,
(2.223)
we get
(2.224) =
L
-Q dxl
+ P dx2.
(2.2 25)
Example 2.7. Line integrals: Let C be the arc y = x 3 from (0,O) t o (- 1,l).T h e line integral
I =
Ic
can be evaluated as
.I
-1
I =
+ x3y d y
(2.226)
+ 3 ~ ' )dx
(2.227)
xy2 dx
(x'
2'
-
(2.228)
8 -
_1 - - 1 8
3
(2.229) (2.230)
Example 2.8. Closed paths: Consider the line integral
I
=
P
y3 d x + x 2 d y
(2.231)
VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
Figure 2.22
91
Closed path in Example 2.8.
over the closed path in Figure 2.22. The first integral from (0,O) to (1,O) is zero, since y = 0 along this path. In the second part of the path from (1,O) to (1.1) we use y as our parameter and find
1'
1 dy = 1.
(2.232)
We use x as a parameter t o obtain the integral over y = x as
lo
x3 dx
+ x2 dx
-
(2.234) 7 -_
(2.235)
12'
Finally, adding all these we obtain I 2.7.4
(2.233)
=
A.
Green's Theorem
Theorem 2.1. Let D be a simply connected domain of the xlz2-plane and let C be a simple (does not intersect itself) smooth closed curve in D with its interior also in D. If P ( z l , x 2 ) and Q(x1,z2) are continuous functions with continuous first partial derivatives in D , then (2.236) where R is the closed region enclosed by C. Proof: We first represent R by two curves,
a 5x15 4
fl(X1)
5 2 2 F f2(x1),
(2.237)
as shown in Figure 2.23 and write the second double integral in Equation (2.236) as
92
VECTOR ANALYSIS
I
I
I
l
a
b
Figure 2.23
*x
1
Green’s theorem.
The integral over x2 can be taken immediately to yield
(2.240) P(Xi,X2) dXl.
=
(2.241)
Similarly, we can write the other double integral in Equation (2.236)as
(2.242) thus proving Green’s theorem. Example 2.9. Green’s Theorem: Using Green’s theorem [Eq. (2.236)], we can evaluate
I =
16xy3 dx + 24x2y2dy,
where C is the unit circle, x2+y2 = 1, and P I
=
/
=
(2.243)
16xy3and Q = 24x2y2as
L(48xy2 - 48xy2) dxdy = 0.
(2.244)
93
VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
Example 2.10. Green’s Theorem: For the integral 2y
dx+-
X
22
+ y2
dY,
(2.245)
+
where C is the circle x2 y2 = 2, we cannot apply Green’s theorem since P and Q are not continuous at the origin.
Example 2.11. Green’s Theorem:
For the integral
(32 - y) dx
I =
+ (X + 5y) dy
/L(1+
1) dxdy
= 2A.
(2.246)
(2.247) (2.248)
where A is the area enclosed by the closed path C.
2.7.5
Interpretations of Green’s Theorem
I. If we take
in [Eq. (2.184)] (2.250)
where W T is the tangential component of 3, that is, W T = 3 . T a n d notice that the right-hand side of Green’s theorem [Eq. (2.236)] is the 2 3 component of the curl of that is,
a,
(2.251)
we can write Green’s theorem as (2.252)
This is a special case of Stokes’s theorem that is discussed in Section 2.8. 11. We have seen that if we take 3 as + v = QZ1 - P&,
(2.253)
94
VECTOR ANALYSIS
we can write the integral
I =
P dxl + Q dx2
(2.254)
as [Eq. (2.2lG)l
I=Lg.??ds=
(2.255)
Now, the Green’s theorem for 5 can be written as (2.256) (2.257) This is the two-dimensional version of the divergence theorem [Eq. (2.159)]. 111. Area inside a closed curve: If we take P = 2 2 in Equation (2.241) or Q = x1 in Equation (2.242), we obtain the area of a closed curve as
/J
dxldx2 = R
i
52
=i x 1
dxl
dx2.
(2.258) (2.259)
Taking the arithmetic mean of these two equal expressions for the area of a region R enclosed by the closed curve C, we obtain another expression for the area A as
which the reader can check with Green’s theorem [Eq. (2.236)]. 2.7.6
Extension to Multiply Connected Domains
When the closed Dath C in Green’s theorem encloses Doints a t which one or dP both of the derivatives - and - do not exist, Green’s theorem is not
aQ
817:1
8x2
applicable. However, by a simple modification of the path, we can still use Green’s theorem to evaluate the integral
I
=
fc P dxl + Q dx2.
(2.261)
Consider the doubly connected domain D shown in Figure 2.24 (left) defined by the boundaries a and b, where the closed path C1 encloses the hole in the
VECTOR INTEGRAL CALCULUS IN TWO DIMENSIONS
Figure 2.24
95
Doubly connected domain and the modified path.
domain. As it is, Green’s theorem is not applicable. However, if we modify our path as shown in Figure 2.24 (right), so that the closed path is inside the region, where the functions P, Q and their first derivatives exist, we can apply Green’s theorem to write
[Pdxl + Q d ~ z ] (2.262) (2.263) where R is now the simply connected region bounded by the closed curve C = C1 L1 + L2 C2.We can choose the two paths L1 and L2 as close as possible. Since they are traversed in opposite directions, their contributions cancel each other, thereby yielding
+
+
In particular, when (2.265) we obtain
96
VECTOR ANALYSIS
“‘r
tx2
Figure 2.25
Paths in Example 2.12.
The advantage of this result is that by choosing a suitable path C2, such as a circle, we can evaluate the desired integral I , where the first path, C1, may be awkward in shape (Fig. 2.24, left). Example 2.12 Multiply-connected domains: Consider the line integral
where C1 and C2 are two closed paths inside the domain D defined by two concentric circles x: + xz = a2 and xf xz = b2 as shown in Figure 2.25 (left). For the path C1, P, Q, and their first derivatives exist inside aP C1.Furthermore, since and - are equal:
+
aQ
8x2
8x1
(2.269)
I is zero. For the second path C2, which encloses the hole at the center, we modify it as shown in Figure 2.25 (right), where C, is chosen as a circle with radius T ( T > a),so that the integral on the right-hand side of (2.270)
97
CURL OPERATOR AND STOKES'S THEOREM
L
Figure 2.26
Curl of a vector field
can be evaluated analytically. The value of the integral over 6'3, x? xi = r 2 , can be found easily by introducing a new variable 6' :
I
=
.d'
+
x1 = T C O S ~ ,
(2.271)
x2 = T sin 0, Q E [O, 2 ~ ]
(2.272)
[ P dxl
+ Q dxz]=
(2.273)
.1c3
2.8
CURL OPERATOR AND STOKES'S THEOREM
2.8.1 On the Plane Consider a vector field on the plane and its line integral over the closed rectangular path shown in Figure 2.26 as
We first consider the integral over C1, where
22 = 2 0 2 :
98
VECTOR ANALYSIS
Figure 2.27
Infinitesimal rectangular path.
Expanding 7 ( 5 1 , 2 2 ) in Taylor series about linear terms, we write
+
( ~ 0 1 , 5 0 2and )
(22
keeping only the
(2.276)
- 2021,
which, when substituted into Equation (2.275), gives
We now take the integral over C3, where Ax1 to 501:
201
+
2 2 = x02
+ Ax2 and x1 varies from
xo1 i 3 d ( z 1 , x 2 ) . d 7 = /
dZiVi(x1,xoz
+ Axz).
(2.278)
zoi+A~i
Substituting the Taylor series expansion [Eq. (2.276)] of ( 2 0 1 , xo2) evaluated at 2 2 = 2 0 2 Ax2:
+
Vl(x1, x2)
about
(2.279) into Equation (2.278) and integrating gives
-
AX2AX1.
(2.280)
CURL OPERATOR AND STOKES’S THEOREM
99
Note the minus sign coming from the dot product in the integral. Combining these results [Eqs. (2.277) and (2.280)] we obtain
L
.k
T ( x 1 , ~ 2 ) . d 7 + T‘(xi,x2).d?=
I
-
Ax~Ax~.
dV1(x11x2) dx2 (xo1 J 0 2 )
A similar procedure for the paths
C2
(2.281)
and C, yields
(2.282) which, after combining with Equation (2.281), gives
Since the location of 7 0 is arbitrary on the xlx2-plane, we can drop the subscript 0. If we also notice that the quantity inside the square brackets is the 2 3 component of x we can write
a‘ d,
(2.284) The approximation we have made by ignoring the higher-order terms in the Taylor series expansion is justified in the limits as Ax1 0 and Ax2 -+ 0. For an infinitesimal rectangle we can replace Axl with dxl and Ax2 with dx2. Similarly, AD= Ax1Ax2 can be replaced with the infinitesimal area element da12. For a finite rectangular path C, we can sum over the infinitesimal paths as shown in Figure 2.27. Integrals over the adjacent sides cancel, thereby leaving only the integral over the boundary C as -+
=
/l(7 7) x
.6da12,
(2.285)
where C is now a finite rectangular path. The right-hand side is a surface integral to be taken over the surface bounded by C , which in this case is the region on the xlx2-plane bounded by the rectangle C, with its outward normal 6 defined by the right-hand rule as the Z3 direction. Using Green’s theorem [Eq. (2.252)] in Equation (2.284), we see that this result is also valid for an arbitrary closed simple path C on the xlx2-plane as (2.286)
100
VECTOR ANALYSIS
tz3
Figure 2.28
Different surfaces with the same boundary C.
tf
where VT is the tangential component of along C and R is the region bounded by C. The integral on the right-hand side is basically a surface integral over a surface bounded by the curve C , which we have taken as S 1 lying on the zlzz-plane, with its normal 6 as defined by the right-hand rule (Fig. 2.28, left). We now ask the question, What if we use a surface Sz in three dimensions (Fig. 2.28, right), which also has the same boundary as the planar surface in Figure 2.28 (left)? Does the value of the surface integral on the right-hand side in Equation (2.286) change? Since the surface integral in Equation (2.286) is equal to the line integral (2.287) which depends only on and C, its value should not depend on which surface we use. In fact, it does not, provided that the surface is oriented. An oriented surface has two sides: an inside and an outside. The outside is defined by the right-hand rule. As in the first two surfaces in Figure 2.29, in an oriented surface one cannot go from one side to the other without crossing over the boundary. In the last surface in Figure 2.29 we have a Mobius strip, which has only one side. Following a closed path, one can go from one “side” with the normal 6i to the other side with the normal 60,which points exactly in the opposite direction, without ever crossing a boundary. Consider two orientable surfaces S 1 and S, with the same boundary C and cover them both with a network of simple closed paths Ci with small areas, Aa,, each (Fig. 2.30). In the limit as Aai 4 0, each area element naturally coincides with the tangent plane to the surface of which it belongs at that point. Depending on their location, normals point in different directions. For
CURL OPERATOR AND STOKES'S THEOREM
Figure 2.29
101
Orientable surfaces versus the Mobius strip.
each surface element we can write
?.d7
=
(? x 74) .%iAui,
(2.288)
where i denotes the ith surface element on either 5'1 or 5'2, with the boundary Ci.For the entire surface we have t o sum these as
l and for S (2.290) for S2. Since the surfaces have different surface areas, 1 and m are different in general. In the limit as 1 and m go to infinity, contributions coming from adjacent sides will cancel. Thus the sums on the left-hand sides of Equations (2.289) and (2.290) reduce t o the same line integral over their common boundary C : (2.291) On the other hand, the sums on the right-hand sides become surface integrals over their respective surfaces, S 1 and S2. Hence, in general we can write (2.292) where S is any oriented surface with the boundary C.
102
VECTOR ANALYSIS
Figure 2.30
2.8.2
Two orientable surfaces
In Space
In Equation (2.292) even though we took S as a surface in three-space, its boundary C is still on the zlx2-plane. We now generalize this result by taking the closed simple path C also in space. Stokes’s Theorem: Consider a smooth oriented surface S in space with a smooth simple curve C as its boundary. Then for a given continuous and differentiable vector field (2.293) in some domain D of space, which includes S, we can write
(2.294)
where 6 is the outward normal to S. Proof: We first write Equation (2.294) as
i
? . d?
=
V, dzl
+ V, dz2 + V3 dz3
(2.295)
CURL OPERATOR AND STOKES’S THEOREM
103
P-7 C
I I
I
I
I I
Figure 2.31
Stokes’s theorem in space.
which can be proven by proving three separate equations:
(2.298) We also assume that the surface S can be written in the form x3 = f(z1,xz)
(2.299)
and as shown in Figure 2.31, C12 is the projection of C onto the xlx2-plane. Hence, when (x1,~ 2 ~ x goes 3 ) around C a full loop, the corresponding point ( X I , 2 2 , O ) also completes a full loop in C l2 in the same direction. We choose the direction of 2 with the right-hand rule as the outward direction. Using Green’s theorem [Eq. (2.236)l with Q = 0, we can write
=
-
1L,,[9 9x1
We now use Equation (2.212) with
ax2
+
ax3
ax2
dxldx2.
(2.300)
7taken as (2.301)
104
VECTOR ANALYSIS
7
Note that in Equation (2.212) is an arbitrary vector field. Since the normal 75’ to d a is just the gradient to the surface 23
-
f (51,XZ)= 0,
(2.302)
that is,
*=
af
(--,--,I),
8x1
af
(2.303)
ax2
we write Equation (2.212) as
Since
where y is the angle between ?3 and 3, d a co s y is the projection of the area element, d b ,onto the xlx2-plane, that is,
We can now rewrite the left-hand side of Equation (2.304) as an integral over Rl2 to obtain the relation -
/ 1,2[zg + 21
dxldx2=
/ ./,2
dx3dx1 - dV1 dxldx2. 8x2 (2.307)
Substituting Equation (2.307) into (2.300), we obtain
which is Equation (2.296). In a similar fashion we also show Equations (2.297) and (2.298). Finally, adding Equations (2.296)-(2.298) we establish Stokes’s theorem.
MIXED OPERATIONS WITH THE DEL OPERATOR
105
A
n
Figure 2.32
2.8.3
Unit normal to circular path.
Geometric Interpretation of Curl
We have seen that the divergence of a vector field ? is equal to the ratio of the flux through a closed surface S to the volume enclosed by S in the limit as the surface area of S goes to zero [Eq. (2.160)], that is, (2.309) Similarly, we can give an integral definition for the value of the curl of a vector field in the direction G as
7
=
lim
r-0
jCr7. d 7 A,
(2.310) 1
where C, is a circular path with radius r and area A,, and G is the unit normal to A, determined by the right-hand rule (Fig. 2.32). In the limit as the size of the path shrinks to zero, the surface enclosed by the circular path can be replaced by a more general surface with the normal 6. Note that this is also an operational definition that can be used to construct a “curl-meter,” that is, an instrument that can be used to measure the value of the curl of a vector field in the direction of the axis of the instrument, 6.
2.9
MIXED OPERATIONS W I T H T H E DEL OPERATOR
3
By paying attention to the vector nature of the operator and also by keeping in mind that it is meaningless on its own, we can construct several other useful operators and identities. For a scalar field, @(7), a very useful operator can be constructed by taking the divergence of a gradient as
3.3@(7) = 32@(7),
a‘
(2.311)
where the operator is called the Laplacian or the Laplace operator, which is one of the most commonly encountered operators in science. Two
106
VECTOR ANALYSIS
very important vector identities used in potential theory are ~ ' . ( V+ AX )=O
(2.312)
and
Txvk=o,
(2.313)
where 2 and Q? are differentiable vector and scalar fields, respectively. In other words, the divergence of a curl and the curl of a gradient are zero. Using the definition of the operator, proofs can be written immediately. For Equation (2.312) we write
a'
a ' . ( v x x ) = d e t ( : A1
Az
2 ), A3
(2.314)
and obtain
(2.315)
(2.316)
(2.317) Since the vector field is differentiable, the order of differentiation in Equation (2.317) is unimportant and so the divergence of a curl is zero. For the second identity [Eq. (2.313)], we write (2.318)
(2.319) = 0.
(2.320)
Since for a differentiable Q? the mixed derivatives are equal, we obtain zero in the last step, thereby proving the identity.
107
MIXED OPERATIONS WITH THE DEL OPERATOR
Another useful identity is
a'.( U V V ) = a'u . a'v + u w u ,
(2.321)
where u and u are two differentiable scalar fields. We leave the proof of this identity as an exercise, but by using it we prove two very useful relations. We first switch u and u in Equation (2.321):
a'.(ua'u)= v u . vu + va'2u;
(2.322)
then we subtract this from the original equation [Eq. (2.32l)l to write
a'.( U V U ) - a'.(va'u)= ua'2u
-
va'2u.
(2.323)
Integrating both sides over a volume V bounded by the surface S and using the divergence theorem (2.324) we obtain Green's first identity:
d 2 . [ua'u - ua'u] =
d r [ua"u
-
ua'"u] .
(2.325)
Applying the similar process directly to Equation (2.321), we obtain Green's second identity:
~ d d - u ~ u - ~ d r [ ~ u - a ' ~ - u a ' ~ u(2.326) ]. Useful vector identities (2.327)
a'Xa'f=O,
9 .(T x 3 )= o ,
(2.328) (2.329)
(2.330) a' x a' x 2 = a'(T.2) v22, a ' . ( X x 3)= Z . ( ( a ' X 2 )- 2 . ( V x 3), (2.331) V ( 2 .3)= 2 x (a'x 3)+ 3 x (a'x 2)+ ( 2 .T ) 3+ (8. V)2, -
(2.332)
3 x (2x 3)= X p . 3) 3 p .2)+ (8. a')X - (2. V)3. -
(2.333)
108
VECTOR ANALYSIS
A
Figure 2.33
Gravitational force and gravitational field.
2.10 POTENTIAL THEORY The gravitational force that a point mass M located a t the origin exerts on another point mass m a t ? is given by Newton’s law (Fig. 2.33) as
Mm, 3 = -G-er, r2
(2.334)
where G is the gravitational constant and Zr is a unit vector along the radial direction. Since mass is always positive, the minus sign in Equation (2.334) indicates that the gravitational force is attractive. Newton’s law also indicates that the gravitational force is central, that is, the force is directed along the line joining the two masses. We now introduce the gravitational field 3 due to the mass M as +
M, r2
g = -G-er,
(2.335)
which assigns a vector to each point in space, with a magnitude that decreases with the inverse square of the distance and always points toward the central mass M (Fig. 2.33). Gravitational force that M exerts on another mass m can now be written as
F’
=m
3.
(2.336)
In other words, M attracts m through its gravitational field, which eliminates the need for action at a distance. Field concept is a very significant step in understanding interactions in nature. Its advantages become even more clear with the introduction of the Lagrangian formulation of continuum mechanics and then the relativistic theories, where the speed of light is the maximum speed with which any effect in nature can propagate. Of course, in Newton’s theory the speed of light is infinite and the changes in a gravitational field at
POTENTIAL THEORY
109
a given point are felt everywhere in the universe instantaneously. Today the field concept is an indispensable part of physics, at both the classical and the quantum level. We now write the flux 4 of the gravitational field of a point mass over a closed surface S enclosing the mass M (Fig. 2.34, left) as
(2.337) Since the solid angle, do, subtended by the area element du is (2.338) (2.339) (2.340) we can write the flux as
(2.341) = -GM i d , , ,
whcre dA is the area element in the direction of surface gives
(2.342)
&. Integration over the entire
4 = -4rGM,
(2.343)
where the solid angle subtended by the entire surface is 47r. We now use the divergence theorem [Eq. (2.159)] to write the flux of the gravitational field as
(2.344) which gives
7 .T d r = -47rGM,
(2.345)
where V is the volume enclosed by the closed surface S. An important property of classical gravity is linearity; that is, when there are more than one particles interacting with m, the net force that m feels is the
110
VECTOR ANALYSIS
Figure 2.34
Flux of the gravitational field.
sum of the forces that each particle exerts on m as if it were alone. Naturally, for a continuous distribution of matter with density p ( 7 ) interacting with a point mass m, the mass M in Equation (2.345) is replaced by an integral:
M
+
L
p ( 7 ) dr.
We now write Equation (2.345) as
.If 3 . T or as
d r = -47rG
L (7.+ T
L
(2.346)
p ( 7 ) dr
(2.347)
I r G p ( 7 ) ) d r = 0.
(2.348)
For an arbitrary but finite volume element, the only way to satisfy this equality is to have the integrand vanish, that is,
3 . ?J’+ 47rGp( 7)= 0,
(2.349)
which is usually written as
3 .3 = -47rGp(7).
(2.350)
This is the classical gravitational field equation to which Einstein’s theory of gravitation reduces in the limit of weak fields and small velocities. Given the
POTENTIAL THEORY
111
mass distribution p(?), it gives a partial differential equation to be solved for the gravitational field T . If we choose a closed surface that does not include the mass M , then the net flux over the entire surface is zero. If we concentrate on a pair of area elements, dAl and d A 2 , in the figure on the right (Fig. 2.34), we write the total flux as
+
(2.351)
d412 = d d i d4z = - G M d R l + GMdR2.
(2.352)
Since the solid angles, dR1 and dR2, subtended at the center by dA1 and Az, respectively, are equal, d41 and dq52 cancel each other. Since the total flux is the sum of such pairs, the total flux is also zero. The gravitational field equation to be solved for a region that does not include any mass is given as
T.7j+=O0.
(2.353)
As we have mentioned before, this does not mean the gravitational field is zero in that region, but it means that the sources are outside the region of interest.
2.10.1
Gravitational Field of a Spherically Symmetric Star
For a spherically symmetric star with density p ( r ) , the gravitational field depends only on the radial distance from the origin. Hence we can write ++
(
'
(2.354)
= g(')cT,
where ZT is a unit vector pointing radially outwards. To find g ( r ) ,we choose a spherical Gaussian surface, S ( T ) ,with radius r. Since the outward normal to a sphere is also in the ZT direction, we utilize the divergence theorem to convert the volume integral in Equation (3.347) to a surface integral,
JI,,, 7 .T
dr
and write
g ( r ) (ZT . G ) do
=
-4rG
f ' g ( r ) r z dR = -47rG
g(r)r2
T . d*,
=
f' dR = -47rG
L(T)
dr,
(2.356)
p(r)r2drdR,
(2.357)
f' dR,
(2.358)
P(T)
s,,) 1
(2.355)
p(r)r2 d r
where d o = r2dR = r2 sin 9 d9d+ is the infinitesimal surface area element of the sphere. Since $dR = 4n, we obtain the magnitude of the gravitational
112
VECTOR ANALYSIS
Figure 2.35
Work done by the gravitational field.
field as
(2.359) (2.360) (2.361) An important feature of this result is that part of the mass lying outside the Gaussian surface, which is a sphere of radius r, does not contribute to the field at r and the mass inside the Gaussian surface acts as if it is concentrated at the center. Note that dm, is the mass of an infinitesimal shell at r with thickness dr. Similarly, if we find the gravitational field of a spherical shell of radius R, we find that for points outside, r 2 R, the shell behaves as if its entire mass is concentrated a t the center. For points inside the shell, the gravitational field is zero. These interesting features of Newton’s theory of gravity also remain intact in Einstein’s theory, where they are summarized in terms of Birkhoff’s theorem.
2.10.2 Work Done by Gravitational Force We now approach the problem from a different direction. Consider a test particle of mass m moving along a closed path C in the gravitational field of another point particle of mass M (Fig. 2.35). The work done by the
POTENTIAL THEORY
113
gravitational field on the test particle is (2.362) (2.363) where gT is the tangential component of the gravitational field of M along the path. Using Stokes’s theorem [Eq. (2.294)], we can also write this as
W
=m
i 3. d 7
(2.364) (2.365)
If we calculate 3 x for the gravitational field of the point mass M located at the origin, we find
a‘~y=v’x =-GMTx
(2.366)
[
X l G (Lc;
+
x2z2
+ x323
+ x$ + x 3 3 / 2
= 0.
1
(2.367) (2.368)
3 T
Substituting x = 0 into Equation (2.365), we obtain the work done by the gravitational field on a point particle m moving on a closed path as zero. If we split a closed path into two parts as C1 and C2, as shown in Figure 2.35, we can write (2.369) CZ
c1
Interchanging the order of integration, we obtain r2
r2
(2.370) Since C1 and C2 are two arbitrary paths connecting points 1 and 2, this means that the work done by the gravitational field is path-independent. As the test particle moves under the influence of the gravitational field, it also satisfies the second law of Newton, that is, (2.371) (2.372)
114
VECTOR ANALYSIS
Using this in Equation (2.370), we can write the work done by gravity as (2.373)
(2.374)
(2.375)
(2.376)
(2.377)
(2.378)
(2.379) In other words, the work done by gravity is equal to the change in the kinetic energy,
T
=
1 2 -mu , 2
(2.380)
7
of the particle as it moves from point 1 to 2. This result, x 3 = 0, obtained for the gravitational field of a point mass M has several important consequences. First of all, since the gravitational interaction is linear, the gravitational field of an arbitrary mass distribution can be constructed from the gravitational fields of point masses by linear superposition. Hence Vx?j+=0
(2.381)
is a general property of Newtonian gravity, independent of the source and the coordinates used.
2.10.3 Path Independence and Exact Differentials We have seen that for an arbitrary vector field 3, if the curl is identically zero, then we can write 3 as the gradient of a scalar field, that is, if 7 X ? = O ,
(2.382)
POTENTIAL THEORY
we can always find a scalar field,
a(?),
such that
37 = ?a. The existence of a differentiable of 3, that is, by the conditions
115
(2.383)
is guaranteed by the vanishing of the curl
dVl
dV2
8x2
8x1
=0,
(2.384) (2.385)
dV3
dUl
8x1
8x3
1
v1 dxl
= 0.
(2.386)
+ v2 dx2 + u3 dx3.
(2.387)
We consider the line integral
l2
3. d 7=
If we can find a scalar function, are
2
@(XI,
x 2 , 2 3 ) , such
that its partial derivatives
(2.388) then the line integral [Eq. (2.387)] can be evaluated as
(2.389) (2.390) (2.391) = @(2) - @(l).
(2.392)
In other words, when such a @ can be found, the value of the line integral; J;” 37 .d?, depends only on the values that Q, takes at the end points, that is, it is path-independent. When such a function exists, vldxl + v2dxz + v3dx3 is called an exact differential and can be written as
116
VECTOR ANALYSIS
The existence of (a is guaranteed by the following sufficient and necessary differentiability conditions:
(2.394) (2.395) (2.396)
(2.397) (2.398) (2.399) which are nothing but the conditions [Eqs. (2.384)-(2.386)] for
2.10.4
Gravity and Conservative Forces
We are now ready t o apply all this t o gravitation. Since introduce a scalar function @ such that + g =-?(a, where
9 x 3 = 0.
(a(?)
9x3
=
0, we
(2.400)
is called the gravitational potential:
a(?)
= -G-
M r
(2.401)
The minus sign is introduced to assure that the force is attractive, that is, it is always toward the central mass M . We can now write Equation (2.370) as
(2.402) = -m[@(2) - @(I)].
(2.403)
Using this with Equation (2.379), we can write
[:
-m [@(a)- @(I)]= -mu
2]2-
[+U2l1.
(2.404)
POTENTIAL THEORY
117
If we rewrite this as (2.405) we see that the quantity 1 -mu2 +ma(?) = E 2
(2.406)
is a constant throughout the motion of the particle. This constant, E , is nothing but the conserved total energy of the particle. The first term, + m u 2 ,is the familiar kinetic energy. Hence we interpret ma(?) as the gravitational potential energy, 0, of the particle m,
R(?)
= m@(?),
(2.407)
and write 1 -mu2fR=E. 2
(2.408)
To justify our interpretation of R , consider m at a height of h from the surface of the Earth (Fig. 2.36) and write
R = m@(7) -
-m-
GM ( R+ h ) (2.409)
<< R, we
can use the binomial
GMm (1 __ ! I ,) R
(2.410)
where R is the radius of the Earth. For h expansion to write -
RE--GMm R + m ( g ) h .
(2.411)
If we identify GM/R2 as the gravitational acceleration g , the average numerical value of which is 9.8 m/s2, the second term becomes the gravitational potential energy, mgh, familiar from elementary physics. The first term on the right is a constant. Since from 3 = -q@, adding or subtracting a constant to @ does not effect the fields and hence the forces, which are the directly accessible quantities to measurement, we can always choose the zero level of the gravitational potential energy. Thus, when R >> h we can take the gravitational potential energy as
R
= mgh.
(2.412)
118
VECTOR ANALYSIS
tm I
Ih I
Figure 2.36
Gravitational potential energy.
From the definition of the gravitational field of a point particle, + MA g = -G-er,
(2.4 13)
r 2
it is seen that operationally the gravitational field at a point is basically the force on a unit test mass. Mathematically, the gravitational field of a mass distribution given by the density p ( 7 ) is determined by the field equation
V . ?= j’ -47rGp( 7),
(2.414)
which is also known as Gauss’s law for gravitation. Interactions with a vanishing curl are called conservative forces. Frictional forces and in general velocity-dependent forces are nonconservative, since the work done by them depends upon the path that the particles follow. 2.10.5
Gravitational Potential
We consider Equation (2.402) again and cancel m on both sides to write r2
3 . d 7
Q(2) - Q(1) = -
(2.4 15)
C
or (2.416)
If we choose the initial point 1 a t infinity and define the potential there as zero and the final point 2 as the point where we want to find the potential,
POTENTIAL THEORY
119
we obtain the gravitational potential as
-
@(?)
=
-
T
’
d?.
s’, -
(2.41 7)
From Figure 2.37 it is seen that the integral . d? is equal to the work that one has to do to bring a unit test mass infinitesimally slowly from infinity to ? : (2.4 18) (2.419) (2.420)
Note that for the test mass to move infinitesimally slowly, we have to apply a force by the amount (2.421)
so that the test particle does not accelerate towards the source of the gravitational potential. For a point mass M this gives the gravitational potential as
M @(?) = -G-. r
(2.422)
What makes this definition meaningful is that gravity is a conservative field. Hence @ is independent of the path we use (Fig. 2.37). We can now use + g =-?a
(2.423)
to write the gravitational field equation [Eq. (2.414)] as
7.g@= 47iGp,
(2.424)
V2@ = 4-irGp,
(2.425)
or as
which is Poisson’s equation. In a region where there is no mass, the equation t o be solved is
3% = 0,
(2.426)
which is Laplace equation, and the operator ?’ is called the Laplacian. The advantage of working with the gravitational potential is that it is a scalar and hence has only magnitude, which makes it easier t o work with.
120
VECTOR ANALYSIS
Figure 2.37
Gravitational potential.
Since gravity is a linear interaction, we can write the potential of N particles by linear superposition of the potentials of the individual particles that make up the systeni as N
mi
@(7) = -G
(2.427)
i=l
where 7 i is the position of the ith particle and 7is called the field point. In the case of a continuous mass distribution, we write the potential as an integral:
Q(7) = -G
p ( 7’) d37’
(2.428)
where the volume integral is over the source points 7’.After a(?) is found, one can construct the gravitational field easily by taking its gradient, which involves only differentiation. 2.10.6
Gravitational Potential Energy of a System
For a pair of particles, gravitational potential energy is written as [Eqs. (2.401) and (2.407)]
R
=
Mm -G-, T
(2.429)
where T is the separation between the particles. For a system of N discrete particles we can consider the system as made up of pairs and write the grav-
121
POTENTIAL THEORY
it,ational potential energy in the following equivalent ways: mimj 7.. - 7. - ---t
C
R=-G
-,
All pairs, i#j
211-
Ti
(2.430)
'Zj
(2.431)
(2.432) We have written R in three different ways. First of all, we do not include the cases with i = j , which are not even pairs. These terms basically correspond to the self energies of the particles that make up the system. We leave them out since they contribute as a constant that does not change with the changing configuration of the system. The factor of 1/2 is inserted in the last expression to avoid double counting of the pairs. Note that R can also be written as 1 2
+
+ . . . + m,@.,)
(2.433)
R = - (ml@1 m2@2 N
N
1 9. i f i. = - E m + @ + , = -GT
2
(2.434)
2=1
where @i is the gravitational potential at the location of the particle mi due to all other particles. If the particles form a continuum with the density p, we then write
R=il =
f
M
@dm @(?"')p(?"')
(2.435) (2.436)
d37',
where @ is the potential of the part of the system with the mass M acting on dm = p d 3 7 .
-
dm
Example 2.12. Gravitational potential energy of a uniform sphere: For a spherically symmetric mass distribution with density p ( r ) and radius R we can write the gravitational potential energy as
R(R) =
@ dm
(2.437) (2.438)
where m(r) is the mass inside the radius r, and d,m is the mass of the shell with radius r and thickness dr: dm = 47rr2p(r) dr.
(2.439)
122
VECTOR ANALYSIS
For uniform density po we write s1 as (47rp0r3/3) 47rr2po d r
R(R) = - G I
r
,
(2.440)
which gives
R(R) = --.
3GM2 5R
(2.441)
Because of the minus sign, this is the amount of work that one has t o do to disassemble this object by taking its particles t o infinity. 2.10.7
Helmholtz Theorem
We now introduce an important theorem due to Helmholtz, which is an important part of potential theory. Theorem 2.2. A vector field, if it exists, is uniquely determined in a region R surrounded by the closed surface S by giving its divergence and curl in R and its normal component on S. Proof: Assume that there are two fields, dl and 3 2 , that satisfy the required conditions, that is, they have the same divergence, curl, and normal component. We now need to show that if this be the case, then these two fields must be identical. Since the divergence, the curl, and the dot product are all linear operators, we define a new field 7i? as
d
=3 1
-
3 2 ,
(2.442)
which satisfies
?x
Since
d = O in R,
(2.443)
a'.7i? = 0 in R,
(2.444)
2 . 3= 0 on S.
(2.445)
? x 8 = 0, we can introduce a scalar potential @ as 7i? = -TQ.
(2.446)
Using Green's second identity [Eq. (2.326)]: (2.447)
with the substitution u = u
= @,
we write
POTENTIAL THEORY
123
When Equation (2.446) is substituted, this becomes
i d 3 . (@Tit)= h d r [-Tit. Tit - a ? . 31. Since the first integral,
A d z . @Tit =
A
do@( G . Tit) ,
(2.449)
(2.450)
is zero because of Equation (2.445) and the integral (2.451) is zero because of Equation (2.444), Equation (2.449) reduces t o
(2.452) Since (312 is always a positive quantity, the only way t o satisfy this equation for a finite volume is to have
5=0,
(2.453)
that is, (2.454) thereby proving the theorem.
2.10.8 Applications of the Helmholtz Theorem Helmholtz theorem says that a vector field is completely and uniquely specified by giving its divergence, curl, and normal component on the bounding surface. When we are interested in the entire space, the bounding surface is usually taken as a sphere in the limit as its radius goes to infinity. Given a vector field, we write its divergence and curl as (2.455) (2.456) where kl and k2 are constants. The terms on the right-hand side, p(?) and T(?),are known functions of position and in general represent sources and current densities, respectively. There are three cases that we analyze separately:
124
VECTOR ANALYSIS
(I) In cases for which there are no currents, the field satisfies (2.45 7) (2.458) We have already shown that when the curl of a vector field is zero, we can always find a scalar potential, @(?), such that
3 = -9@.
(2.459)
Now the second equation [Eq. (2.458)] is satisfied automatically and the first equation can be written as Poisson's equation
a'%
=
-k1p,
(2.460)
the solution of which can be written as (2.461) where the volume integral is over the source variable 7' and 7is the field point. Notice that the definition of scalar potential [Eq. (2.459)] is arbitrary up to an additive constant, which means we are free to choose the zero level of the potential. (11) In cases where p ( 7 ) = 0, the field equations become
9..=0,
(2.462)
a' x 3 = k 2 7 ( 7 ) .
(2.463)
We now use the fact that the divergence of a curl is zero and introduce a vector potential such that
x(?)
3 = 9 X Z .
(2.464)
We have already proven that the divergence of a curl vanishes identically. We now prove the converse, that is: if the divergence of a vector field 3 vanishes identically, then we can always find a vector potential 2 such that its curl + gives 3. Since we want A to satisfy Equation (2.464), we can write
dA3 8x2
dA2 8x3
= 'u1,
(2.465) (2.466) (2.467)
125
POTENTIAL THEORY
Remembering that the curl of a gradient is zero [Eq. (2.320)], we can always + add or subtract the gradient of a scalar function to the vector potential A ,
x x +T h , --f
(2.468)
without affecting the field 3.This gives us the freedom t o set one of the components of 3 to zero. Hence we set A3 = 0, which simplifies Equations (2.465)-(2.467) to (2.469) (2.470) dA2
ax,
dAl 8x2
= 213.
(2.471)
The first two equations can be integrated immediately to yield
I,, 23
A1 =
v2
(2.472)
dx3,
-lo, + x3
A2
=
211
dx3
f2(21,~2),
(2.473)
where f2(x1,x2) is arbitrary a t this point. Substituting these into the third equation [Eq. (2.471)],we obtain (2.474) Using the fact that the divergence of 3 is zero, that is, avl dv2 -+-+---=0, 8x1 ax2
av3
ax3
(2.475)
we can write Equation (2.474) as (2.476) The integral in Equation (2.476) can be evaluated immediately to give
+
a f 2 ( x 1 1 x 2 ) U3(51,22,23) -vuQ(Xl,x2,~03) = ‘k3(21,22,23),
ax 1
which yields
f2(x1,x2)as
the quadrature
(2.477)
126
VECTOR ANALYSIS
Substituting f2 into Equation (2.473) we obtain the vector potential as
Lo3 1:: z3
A1
=
A2
=
(2.480)
UZ(zl,zZ,z3) d53, V3(zlrz2,z03) dzl
-
A3 = 0.
1::
Ul(zl,Z2,23) dx3,
(2.481) (2.482)
In conclusion, given a vector field 3 satisfying Equations (2.462) and + (2.463), we can always find a vector potential A such that
3=a‘XX, where 2 is arbitrary up to Using a vector potential (2.463) as
(2.483)
tb gradient of a scalar function.
A , [Eq. (2.464)], we can now write Equation
a‘ x 3 = kJ(?;’),
(2.484)
X=kJ(T+),
(2.485)
a‘ x a ‘ x
V ( V .2) 322 = k 2 J ’ ( ? ; ’ ) . -
Using the freedom in the choice of
(2.486)
2 we can set
a‘.2=0,
(2.487)
which is called the Coulomb gauge in electrodynamics. The equation t o be solved for the vector potential is now obtained as 7
2
2
= -k,J’(?.).
(2.488)
Since the Laplace operator is linear, each component of the vector potential satisfies Poisson’s equation,
V2Ai= -k2Ji(?;’),
i = 1,2,3;
(2.489)
hence we can write its solution as (2.490) (111) In the general case, where the field equations are given as
a‘.3 = k1p(?;’), a‘ x 3 = kJ(?;’),
(2.491) (2.492)
POTENTIAL THEORY
we can write the field in terms of the potentials @ and
127
2 as
a' X 2.
37 = -a'@+
(2.493)
Substituting this into the first equation [Eq. (2.491)] and using the fact that the divergence of a curl is zero, we obtain
-a' a'@+ a'.(a'x 2)= kip, '
V2@ = -kip.
(2.494) (2.495)
Similarly, substituting Equation (2.493) into the second equation [Eq. (2.492)], we get
a' x (-a'@+ a' x 2)= k 2 7 ( 7 ) , -V x V@+ a' x a' x 2 = k 2 J ( 7 ) , a' (72 ) v2x = k J ( T + ) , -
'
(2.496) (2.497) (2.498)
where we used the fact that the curl of a gradient is zero. Using the Coulomb gauge (q. 2 = 0) and Equation (2.495), we obtain the two equations t o be solved for the potentials as (2.499) (2.500)
2.10.9
Examples from Physics
Gravitation: We have already discussed this case in detail. The field equations are given as
a'.Tj+= -47rGp(7), a'X?j+=O,
(2.501) (2.502)
where p( ?) is the source of the gravitational field, that is, the mass density. Instead of these two equations, we can solve Poisson's equation,
a"@= 47rGp(?'f),
(2.503)
for the scalar potential @, which then can be used to find the gravitational field by + g
Electrostatics:
=-a'@.
(2.504)
128
VECTOR ANALYSIS
In electrostatics the field equations for the electric field are given as
a'.3 = 47rp(?), a'XZ=O.
(2.505) (2.506)
Now, p(?) stands for the charge density and the plus sign in Equation (2.505) nieans like charges repel and opposite charges attract. Poisson's equation for the electrostatic potential is
V2@= -47rp(?), where
z
=
-T@.
(2.507)
(2.508)
Magnet ostat ics: Now the field equations for the magnetic field are given as
T..=O, a'x
(2.509)
3 = "c J ,
(2.510)
where c is the speed of light and J' is the current density. The fact that the divergence of 3 is zero is a direct consequence of the fact that magnetic monopoles do not exist in+ nature. Introducing a vector potential A and with the Coulomb gauge, 3 . A = 0, we can solve
(2.511) and obtain the magnetic field via
Z=VxX.
(2.512)
Maxwell's equations: The tinie-dependent Maxwell's equations are given as
3.23= 4np,
Tx
1 ax z+-0 c at , =
(2.513) (2.514)
T.Z=O,
(2.515)
1az =-J'. 47r a' x 3 --c at c
(2.516)
These equations are coupled and have to be considered simultaneously. We now introduce the potentials @ and 2 such that
(2.517) (2.518)
POTENTIAL THEORY
129
and use the Lorenz gauge: 1aQi
+
--+?.A c at
=O.
(2.5 19)
Hence Maxwell’s equations reduce t o (2.520) (2.521)
Applications of potential theory to electromagnetic theory can be found in Griffiths and Inan & Inan (a,b). Irrotational flow of incompressible fluids: For flow problems the continuity equation is given as
? . 7 + - =dP 0,
at
(2.522)
7
where is the current density and p is the density of the flowing material. In general, the current density can be written as
J
=p 3 ,
(2.523)
where d is the velocity field of the fluid. Hence the continuity equation becomes dP 7. ( p d ) + - = 0. at
(2.524)
For stationary flows, apldt = 0. If we also assume incompressible fluids, that is, p = constant, the continuity equation reduces to
T..=O.
(2.525)
However, from the Helmholtz theorem we know that this is not sufficient to determine the velocity field 3.If we also assume irrotational flow, which means ? X d = O ,
(2.526)
we can introduce the velocity potential Qi: ;ii’ =
?a.
(2.52 7)
Substituting this into Equation (2.525), we obtain Laplace equation
7%= 0.
(2.528)
130
VECTOR ANALYSIS
PROBLEMS
1. Using coordinate representation, show that
(2x 3). (3x 5)= (2.3)(3. 73)- (2.3)(3.3). 2. Using the permutation s l z b o l , show that the i t h component of the cross product of two vectors, A and 3, can be written as 3
3
j=1 k = l
3. Prove the triangle inequality
4. Prove the following vector identity, which is also known as the Jacobi identity:
2 x (2x 3)+2 x (3 x 2)+3 x (2 x 3) =o. 5 . Showthat
(2x 2)x (3x d)= (2.3 x 3 ) Z - ( 2 . 3x 7?)5. + * 6. Show that for three vectors, A , B and essary and sufficient condition is
3,to be noncoplanar the nec-
2 .(3x 73)# 0. 7. Find a parametric equation for the line passing through the points (a) ( 2 , 2 , - 2 ) and ( - 3 , 1 , 4 ) ,
(b) (-1, 4,3) and (4, -3,l). 8. Find the equation of the line orthogonal to i
(a) A = (1, -11,
(b)
3 = (-5,
21,
2 = ( 2 , -l), 7= (4,2).
9. Show that the lines 221 - 3 2 2 =
and
1
2 and passing through 7:
PROBLEMS
131
are not orthogonal. What is the angle between them?
10. Find the equation of the plane including the point normal 3:
3 and
with the
3 = (2,1, -11, 3 = (1, I, 21, (b) ? = (2,3,5), 3 = (-1,1,2).
(a)
11. Find the equation of the plane passing through the following three points: (a) (2,1,1), (4,1, -1) and ( L 2 , 21, (b) (-5, -1,2),(2,1, -1) and ( 3 , -1,2).
12. (a) Find a vector parallel t o the line of intersection of 4x1 - 2x2
+ 2x3 = 2
and
6x1
+ 2x2 + 2x3 = 4.
(b) Find a parametric equation for the line of intersection of the above planes.
13. Find the angle between the planes
14. Find the distance between the point
3x1
2 = (1,1,2) and the plane
+ 2 2 - 3x3 = 2.
15. Let P and Q be two points in n-space. Find the general expression for the midpoint of the line segment joining the two points. 16. If T ( t )and ?(t) are two differentiable vectors, then show that
(a) d?(t)
d
dz+(t)
x ?(t)] = ?(t) x -+ -x ?(t), d t [T(t) dt dt
d
+
- [ T ( t )x x ( t ) ]= 2 ( t ) x dt
Z(t).
132
VECTOR ANALYSIS
17. Given the parametric equation of a space curve, namely, = cost,
21
22
= sint,
23 =
2sin2t,
(a) sketch the curve,
(b) find the equation of the tangent line at the point P with t = 71.13, (c) find the equation of a plane orthogonal to the curve at P, (d) show that the curve lies in the surface
2: - 2;
+ 2 3 = 1.
18. For the following surfaces find the tangent planes and the normal lines at the points indicated: (a)
2::
(b)
z:
+ zi + zz = 6 at (1,1,2), + 2+ 2x: = 2 at (1, I, I ) , 5122
z22: -
19. Find the directional derivative of F ( 2 1 , 2 2 , 2 3 )=
22,2 +z,2
-
2 23
in the direction of the line from (1,2,3) to (3,5,1) at the point ( 1 , 2 , 3 ) .
20. For a general point, evaluate d F l d n for F = zyz, where n is the outer normal to the surface
x; + 22;
+ 42; = 4.
21. Determine the points and the directions for which the change in
f = 2xq
+ 2; + 5 3
is greatest if the point is restricted to lie on x: 22. Prove the following:
+ x ; = 2.
PROBLEMS
133
23. Prove the following properties of the divergence and the curl operators: (a)
d.(X+3)=V.X+V.Z,
(b)
$ . ( + x )= $ $ . x + $ $ . x ,
(c)
dx(X++VxX+$xZ,
(d)
dx
(42)
=4
7 x X + V +x
2.
24. Show that the following vector fields have zero curl and find a scalar function @ such that 3 = q@: (a)
(b)
+ 2yzz&, + y2zZz, 3 = ( 3 2 ’ ~+ z2y)Ez + (z3+ z2z)Zy + 2zxyZz
d
= y2zZz
25. Using the vector field + v = x 2 yze,- - 2x3y3Zv show that
+ xy2zZz,
9 .9 x 3 = 0.
26. If 7;’ is the position vector, show the following:
=.
7..=3,
VtX=O,
(3.7)3. 27. Using the following scalar functions, show that (a) (b)
CP
T X?@
= 0:
= exy cos z ,
1
= (z2
+ +z y2
y 2 .
28. An important property of the permutation symbol is given as 3
EijkEilm
= SjlSkrn
-
Sjrnbkl.
i=l
A general proof is difficult, but check the identity for the following specific values:
j=k=1, j=l=l,
k=m=2.
134
VECTOR ANALYSIS
29. Prove the following vector identities:
30. Write the gradient and the Laplacian for the following scalar fields:
+ = ln(x2 + y2 + z 2 ) ,
(a) (b)
=
(c)
@=
1
(x2
+ y2 + z2)1/2 ’
J2qT
31. Evaluate the following line integrals, where the paths are straight lines connecting the end points:
(b)
y dx
+ x dy.
32. Evaluate the line integral
I = L y 2 d x + x 2 dy over the semicircle centered a t the origin and with the unit radius in the upper half-plane.
33. Evaluate
I over the parabola y
=
=
x2.
34. Evaluate
over a circle of radius 2 .
i;;;)+ y dx
x2 dy
PROBLEMS
135
35. Evaluate the line integral
over the curve y = ex - ex5
36. Evaluate J J ,
+ 22.
2 .2do, where
and S is the portion of the plane 2x + 2y octant and 2 is the unit normal to S.
+ z = 6 included in the first
37. Evaluate
over y
=
x2 + 2x
-
2.
38. Evaluate
where C is the square with the vertices (1,l),(-1, l),(-1, -l), (1, -1).
39. Evaluate the line integral
I
=
y2dx
+x2dy
over the full circle x2 + y2 = 1.
40. Evaluate over the indicated paths by using Green’s theorem:
41. Evaluate
(a)
I = f c y 2 d z + x y dy, x 2 + y 2
(b)
I = jC(2z3 - y3) dx
(c)
1 = fc f(z) dx
= 1,
+ (x3+ 2y3) d y ,
+ g(y) d y ,
x2
+ y2 = 1,
any closed path.
136
VECTOR ANALYSIS
where 4
v =
( 2+ y2)Zz + 2xyZy
over y = x3 from (O,O) to ( I , 1).
42. Use Green’s theorem to evaluate
I = jhcvnds, where
2 = ( 2+ $)ZZ and C is the circle x2
+ 2zyZy
+ y2 = 2. +
43. Given the vector field 2 = -3yZZ 2zZy line integral by using Stokes’s theorem:
+ Z2,evaluate the following
where C is the circle x2 + y2 = 1, z = 1.
44. Using Stokes’s theorem, evaluate
+
h [ y 2 d z z2dy
+ x’dz],
where C is the triangle with vertices at (O,O, 0), (0, a,0) and (O,O, a) 45. Usc Stokes’s theorem to evaluate
I = {8xy2z dx
+ 8x2yz dy + (4x2y2
around the path x = cost, y = sint, z
= sint,
-
22) d z
where t E [ 0 , 2 ~ ] .
46. Evaluate the integral id*.?? for the surface of a sphere with radius R and centered at the origin in two different ways. Take 7? as (a)
+ v = zZZ +yey + z Z z ,
PROBLEMS
137
47. Given the temperature distribution 2 T(z1,52,53)= z 1
+ 22122 +
2;23,
(a) determine the direction of heat flow at ( 1 , 2 , l),
(b) find the rate of change of temperature at (1,2,2) in the direction of h
e2
+ Z3.
48. Evaluate f(22
-
y
+ 4) dx + (5y + 32
-
6) dy
around a triangle in the zy-plane with the vertices at (O,O), (3,0), ( 3 , 2 ) traversed in the counterclockwise direction. 49. Use Stokes’s theorem to evaluate
over x3 = 9
-
2 21 -
x; 2 0.
50. Obtain Green’s second identity:
51. Evaluate the following integrals, where S is the surface of the sphere z2 y2 z 2 = a2 :
+ +
f S
[z3 cos
oz,n + y3 c o ey,n ~ + 2 3 cos e,,,]
do
and
For the vector field in the second part plot on the zy-plane and interpret your result.
138
VECTOR ANALYSIS
52. Verify the divergence theorem for +
+
A = ( 2 2 ~ z)Ez
+ g2Ev
-
(X
+ 3y)E2
taken over the region bounded by the planes 2 ~ + 2 y + z = 6 , z=O, y=O, z = O . 53. Prove that
is a conservative field and find the work done between (3, -2,2) and -1).
54. Without using the divergence theorem, show that the gravitational force on a test particle inside a spherical shell of radius R is zero. Discuss your answer using the divergence theorem. 55. Without using the divergence theorem, find the gravitational field outside a uniform spherical mass of radius R. Repeat the same calculation with the gravitational potential and verify your answer obtained in the first part. Interpret your results using the divergence theorem. 56. Without using the divergence theorem, find the gravitational field for an internal point of a uniform spherical mass of radius R. Repeat the same calculation for the gravitational potential and verify your answer obtained in the first part. Discuss your results in terms of the divergence theorem.
57. Assume that gravitation is still represented by Gauss’s law in a universe with four spatial dimensions. What would be Newton’s law in this universe? Would circular orbits be stable? Note: You may ignore this problem. It is an advanced but fun problem that does not require a lot of calculation. However, if you want to attempt it, you may want to read Goldstein, Poole, and Safko on central forces first.
Hint: The surface area of a sphere in four dimensions is 2n2R3.In three dimensions it is 4nR2.
CHAPTER 3
GENERALIZED COORDINATES AND TENSORS
Scalar quantities are defined a t a point by just giving a single number. Hence they have only magnitude. Vector quantities are geometrically defined as directed line segments, which have both magnitude and direction. By assigning a vector to each point in space we obtain a vector field. Similarly, a scalar field is defined. Field concept is one of the most fundamental concepts of theoretical physics. In working with scalars or vectors, it is important that we first choose a suitable coordinate system. A proper choice of coordinates, one that reflects the symmetries of the physical system, simplifies the algebra and the interpretation of the solution significantly. In this chapter, we start with Cartesian coordinates and their transformation properties. We then show how a generalized coordinate system can be constructed from the basic principles and discuss general coordinate transformations. The definition of vectors with respect to their transformation properties brings new depths into their discussion and takes us beyond their geometric interpretation as directed line segments. This allows us t o introduce more sophisticated objects called tensors, where vectors and scalars appear only as special cases. We finally conclude with a detailed discussion of cylindrical and spherical coor-
Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
8. Selquk Bayin 139
140
GENERALIZED COORDINATES AND TENSORS
Figure 3.1
Orthogonal transformations.
dinate systems, which are among the most frequently encountered coordinate systems in applications. 3.1
TRANSFORMATIONS B E T W E E N CARTESIAN COORDINATES
Transformations between Cartesian coordinates that exclude scale changes,
x'
= kx, k = constant,
(3.1)
are called orthogonal transformations. They preserve distances and magnitudes of vectors. There are basically three classes of orthogonal transformations. The first class involves translations, the second class is rotations, and the third class consists of reflections (Fig. 3.1). Translations and rotations can be generated continuously from an initial frame; hence they are called proper transformations. Since reflection cannot be accomplished continuously, they are called improper transformations. A general transformation is usually a combination of all the three types. 3.1.1
Basis Vectors and Direction Cosines
We now consider orthogonal transformations with a common origin (Fig. 3.2). To find the transformation equations, we write the position vector in two frames in terms of their respective unit basis vectors as
and
TRANSFORMATIONS BETWEEN CARTESIAN COORDINATES
Figure 3.2
141
Orthogonal transformations with one point fixed.
Note that the point P that the position vector, 7 ,represents exists independent of the definition of our coordinate system. Hence in Equation (3.3) we have written 7instead of 7’. However, when we need to emphasize the coordinate system used explicitly, we also write 7’. In other words, Equations ( 3 . 2 ) and (3.3) are just two different representations of the same vector. Obviously, there are infinitely many choices for the orientation of the Cartesian axes that one can use. To find a mathematical dictionary between them, that is, the transformation equations, we write the components of ?;’ in terms of -i e, as
which, after using Equation ( 3 . 2 ) , gives
These are the transformation equations that allow us to obtain the coordinates in terms of the primed system given the coordinates in the unprimed system. These equations can be conveniently written as
c(q Zy) 3
2::=
.
j=1
zj,
i = 1,2,3.
(3.10)
142
GENERALIZED COORDINATES AND TENSORS
Figure 3.3
Direction cosines for rotations about the zs-axis.
The coefficients (< . Zj) are called the direction cosines and can be written as u 23- -- ( ~ ’ , Z j ) = c o s Q z ji,= l , 2 , 3 ,
(3.11)
where $ i j is the angle between the i t h basis vector of the primed system and the j t h basis vector of the unprimed system. For rotations about the z3-axis (Fig. 3.3), we can write a i j , i = 1,2,3, as the array 5’3 = aij,
=(
i = 1,2,3,
(3.12)
(Z1 . Zl) (Z1 .22)
(3. Z 1 ) 0
0
(& . Z 2 ) 0
(3.13)
0 (3.14)
0 cosQ
sin0
0 (3.15)
3.1.2 Transformation Matrix and the Orthogonality Relation General linear transformations between Cartesian coordinates can be written as (3.16)
TRANSFORMATIONS BETWEEN CARTESIAN COORDINATES
143
where the square array
s = uij, i = 1 , 2 , 3 ,
( it; ;;; ) a12
-
(3.17)
a13
@;
(3.18)
is called the transformation matrix. Let us now write the magnitude of ?;‘ in the primed system as (3.19) i=l
Using the transformation equations, we can write as
17-1
2
in the unprimed system
3 UijtXjl i=l 3
3
3
1
(3.20)
i=l j = 1 j’=l
Rearranging the triple sum, we write (3.22) Since the orthogonal transformations preserve magnitudes of vectors, the transformation matrix has t o satisfy 3
= tijj,, j , j ‘ = 1,2,3,
CUijUij‘ i= 1
(3.23)
which is called the orthogonality condition. Equation (3.22) now gives
Cz’..’. + x? + x’3” 3
7. 7=
3
3
=
(3.24)
j=1
cc 3
=
3
Sjj’XjXj!
= c x j x j = 2 21 j=1
+ x 2 + 2 32 .
(3.25)
(3.26)
144
GENERALIZED COORDINATES AND TENSORS
3.1.3
Inverse Transformation Matrix
For the inverse transformation we need to express xi in terms of x:. This can only be done when the determinant of the transformation matrix does not vanish:
(3.27) Writing the orthogonality relation explicitly as
( :t
a13
zzi z:i ) ( %:f a23
a33
a12
a13
a22
a23
a32
a33
) ( =
1 0 0 0 1 0 0 0 1
)
,
(3.28)
we can evaluate the determinant of S. The determinant of the left-hand side is the multiplication of the determinants of the individual matrices. Using the fact that interchanging rows and columns of a matrix does not change the value of its determinant, we can write (det S)2= I,
(3.29)
= fl.
(3.30)
which yields det S
The negative sign corresponds to improper transformations, hence the determinant of the transformation matrix provides a convenient tool t o test whether a given transformation involves reflections or not. For a formal inversion we multiply the transformation equation, 3
x:=
C u . . x . i = 1,2,3, a3
3,
(3.31)
j=1
with aiJf and sum over i to write
(3.32)
(3.33) Substituting the orthogonality relation for the sum inside the square brackets we get 3
i=l
3 j=1
= xjt, j ' = 1,2,3,
(3.35)
CARTESIAN TENSORS
145
which, when written explicitly, gives 21
= a112; f
a21.k
22
= a122; f
0422;
23
= a135:
+ a232;
+ + +
a315;,
(3.36)
a32xk,
(3.37) (3.38)
a334
We now write the inverse transformation matrix as
[ ::: 2: 2: 1 all
s-1 =
a21
a31
(3.39) '
Comparing with S in Equation (3.18), it is seen that
-
s-l = s,
(3.40)
s
where is called the transpose of S, which is obtained by interchanging the rows and the columns in S. In summary, the inverse transformation matrix for orthogonal transformations is just the transpose of the transformation matrix. For rotations about the q - a x i s [Eq. (3.15)],the inverse transformation matrix is written as cosd
-sin8
0 (3.41)
Note that S ,' corresponds t o a rotation in the opposite direction by the same amount, that is, S,-'(O) = S3(-O).
3.2
(3.42)
CARTESIAN T E N S O R S
So far we have discussed the transformation properties of the position vector + r . We now extend this to an arbitrary vector, 3, as -1
21
ST,
(3.43)
where S = u Z g ,i , j = 1 , 2 , 3 , is the orthogonal transformation matrix [Eq. (3.18)]. In other words, a given triplet of functions,
caririot he used to define a vector;
146
GENERALIZED COORDINATES AND TENSORS
unless they transform as
(3.46) Under the orthogonal transformations a scalar function, @(xi,x2,x3), transforms as
In the new coordinate systerrl, @ will naturally have a different functional dependence. However, the values that Q, assumes at each point of space remain the same. It is for this reason that in Equation (3.48) we have written @ instead of @ I . In order to indicate the coordinate system used, we may also write @ I . Since temperature is a scalar quantity, its value at a given point does not depend on the coordinate system used. A different choice of coordinate system assigns different coordinates (codes) to each point:
however, the numerical values of the temperature at each point remain the same. We now write the scalar product of two vectors, 2 and 3, in the primed coordinates by using the scalar product written in the unprimed coordinates, that is,
(3.50) = ZlYl
+ 52y2 + 23y3.
(3.51)
Using the orthogonal transformations,
(3.52) wc write 2
‘
3 as (3.53)
(3.54)
CARTESIAN TENSORS
Using the orthogonality relation [Eq. (3.23)]: s S
=
147
S s = I , this becomes (3.55)
(3.56) (3.57) (3.58) In other words, the orthogonal transformations do not change the value of a scalar product. Properties of physical systems that preserve their value under coordinate transformations are called invariants. Identification of invariants in the study of nature is very important and plays a central role in both special and general theories of relativity. In the previous chapter we have defined vectors with respect to their geometric and algebraic properties. Definition of vectors with respect to their transformation properties under orthogonal transformations brings new levels into the subject and allows us to free the vector concept from being just a directed line segment drawn in space. Using the transformation properties, we can now define more complex objects called tensors. Tensors of second rank, Tij , are among the most commonly encountered tensors in applications and have two indices. Vectors, vi, have only one index and they are tensors of first rank. Scalars, @, which have no indices, are tensors of zeroth rank. In general, tensors of higher ranks are written with the appropriate number of indices as T=Tijkl...,
i , j , k,... =1,2,3.
(3.59)
Each index of a tensor transforms like a vector:
(3.60) (3.61)
(3.62)
(3.63) etc.
148
GENERALIZED COORDINATESAND TENSORS
Tensors of second rank can be conveniently represented as 3 x 3 square matrices: (3.64) Definition of tensors can be easily extended to n dimensions by taking the range of the indices from 1 to n. As we shall see shortly, tensors can also be defined in general coordinates. For the time being, we confine our discussion to Cartesian tensors, which are defined with respect to their transformation properties under orthogonal transformations. 3.2.1
Algebraic Properties of Tensors
Tensors of equal rank can be added or subtracted term by term and the result does not depend on the order of the tensors: For example, if A and B are two second-rank tensors, then their sum is
A +B = B +A Cij
= Aij
= C,
+ Bij, i , j = 1 , 2 , 3 .
(3.65) (3.66)
Multiplication of a tensor with a scalar, a , is accomplished by multiplying all the component of that tensor with the same scalar. For a third-rank tensor, A, we can write
CYA = aAijk, i , j , k = 1 , 2 , 3 .
(3.67)
From the basic properties of matrices, second-rank tensors do not commute under multiplication. That is,
AB # BA,
(3.68)
A ( B C )= (AB)C,
(3.69)
however, they associate:
where A, B , C are second-rank tensors. Antisymmetric tensors satisfy
Aij = -Aji, i,j = 1 , 2 , 3 ,
(3.70)
or
-
A = -A,
(3.71)
where is called the transpose of A , which is obtained by interchanging the rows and columns. Note that the diagonal terms, All, A22, A33, of an
149
CARTESIAN TENSORS
antisymmetric tensor are necessarily zero. If we set i = j in Equation (3.70) we obtain (3.72) (3.73) (3.74) Symmetry and antisymmetry are invariant properties. If a second-rank tensor, A , is symmetric in one coordinate system,
Aij = Aji, i , j
=
(3.75)
1,2,3,
then A' is also symmetric. We first write 3
3
(3.76) Since the components, a i j , are constants or in general scalar functions, the order in which they are written in equations do not matter. Hence we can write Equation (3.76) as 3
3
A:j =
i , j = 1,2,3.
(3.77)
i'=l j ' = l
Using the transformation property of second-rank tensors [Eq. (3.62)], for a symmetric second-rank tensor, Aij , this implies i,j=1,2,3.
(3.78)
A similar proof can be given for the antisymmetric tensor. Any second-rank tensor can be written as the sum of a symmetric and an antisymmetric tensor:
+ Using the components of two vectors, d and b , we can construct a secondrank tensor A as
A=
(
albl
alb2
alb3
a2b1
ad2
a2b3
a361
a3b2
a3b3
1
,
(3.80)
which is called the outer product or the tensor product of d and and it is shown as A = Z T
3,
(3.81)
150
GENERALIZED COORDINATES AND TENSORS
or as
A = Z t T .
(3.82)
To justify that A is a second-rank tensor, we show that it obeys the correct transformation property; that is, it transforms like a second-rank tensor: A 2.7! . = afb'. z 3
(3.83) (3.84)
(3.85)
i'=l j'=]
One can easily check that the outer product defined as T@Z is the transpose of A. We remind the reader that even though we can construct a secondrank tensor from two vectors, the converse is not true. A second-rank tensor cannot always be written as the outer product of two vectors. Using the outer product, we can construct tensors of higher rank from tensors of lower rank:
(3.87) (3.88) (3.89)
where the indices take the values 1,2,3. For a given vector, there is only one invariant, namely, its magnitude. All the other invariants are functions of the magnitude. For a second-rank tensor there are three invariants, one of which is the spur or the trace, which is defined as the sum of the diagonal elements:
We leave the proof as an exercise but note that when A can be decomposed as the outer product of two vectors, the trace is the inner product of these vectors. We can obtain a lower-rank tensor by summing over pairs of indices. This operation is called contraction. Trace is obtained by contracting the two indices of a second-rank tensor as 3
trA
=
XAii. i=l
(3.91)
151
CARTESIAN TENSORS
Other examples of contraction are 3
(3.92) i= 1
(3.93) etc. We can generalize the idea of inner product by contracting the indices of a tensor with the indices of an other tensor: 3
bi =
C
i
1,2,3,
(3.94)
Tz-j k.Aj.k , i = 1 , 2 , 3 ,
(3.95)
Tijaj,
=
j=1
xx 3
ai =
3
j=1 k = l 2
2
(3.96) 3
a
3
=
(3.97)
~ i j ~ i j ,
2=1 J = l etc. The rank of the resulting tensor is equal to the number of the free indices, that is, the indices that are riot summed over. Free indices take the values 1, 2 , or 3. In this regard, we also write a tensor, say Tz3,i , j = I, 2,3, as simply T L JThe . indices that are summed over are called the dummy indices. Since dummy indices disappear in the final expression, we can always rename them.
3.2.2
Kronecker Delta and the Permutation Symbol
To check the tensor property of the Kronecker delta, we use the transformation equation for the second-rank tensors, 3
3
(3.98) with ZlJ/= 62fjfand use the orthogonality relation [Eq. (3.2311 to write 3
3
3
= saj.
3
(3.100)
152
GENERALIZED COORDINATES AND TENSORS
In ot,her words, the Kronecker delta is a symmetric second-rank tensor that transforms into itself under orthogonal transformations. It is also called the identity tensor, which is shown as I. Kronecker delta is the only tensor with this property. Permutation symbol, also called the Levi-Civita symbol, is defined as
EIJk
=
i
0 1 -1
when any two indices are equal. for even (cyclic) permutations: 123, 231, 312. for odd (anticyclic) permutations: 213, 321, 132.
(3.101)
Using the permutation symbol we can write a determinant as
(3.102)
(3.103)
(3.104) Interchange any two of the indices of the permutation symbol in [Eq. (3.103)], tlic determinant changes sign. This operation is equivalent to interchanging the corresponding rows and columns of a determinant. We now write the determinant of the transformation matrix, a i ~ j /as ,
Reiianiing the dummy indices: i -j,
j
+i,
(3.106) (3.107)
Equation (3.105) becomes det a2/3f= -
a2zaIja3kEtjk.
(3.108)
$3k
From Equation (3.30) we know that the determinant of the orthogonal transformation matrix is det az,jl = ~ 1hence , the component ~ 2 1 3transforms as (F1)&213 =
a2zalja3kEzjk. 2.7
k
(3.109)
CARTESIAN TENSORS
153
Similar arguments for the other components yields the transformation equation of &lmn as
(3.110) The niinus sign is for the improper transformations. In summary, & i j k transforms like a third-rank tensor for proper transformations, and a minus sign has to be inserted for improper transformations. Tensors that transform like this are called tensor densities or pseudotensors. Note that aside from the ~1 factor, E i j k has the same constant components in all Cartesian coordinate systems. Permutation symbol is the only third-rank tensor with this property. An important identity of & i j k is 3
Permutation symbol also satisfies
for the cyclic permutations of the indices. For the anticyclic permutations we write
Example 3.1. Physical tensors: Solid objects deform under stress to a certain extent. In general, forces acting on a solid can be described by a second-rank tensor called the stress tensor:
Components of the stress tensor represent the forces acting on a unit test area when the normal is pointed in various directions. For example, t i j is the ith component of the force when the normal is pointing in the j t h direction. Since the stress tensor is a second-rank tensor, it transforms as 3
3
k = l 1=1
The amount of deformation is also described by a second rank tensor, u i j , called the strain tensor. The stress and the strain tensors are related by the equation 3 tij
3
=
Cijklgkl, k = l 1=1
(3.116)
154
GENERALIZED COORDINATES AND TENSORS
where the fourth-rank tensor C i j k l represents the elastic constants of the solid. This is the most general expression that relates the deformation of a three-dimensional solid to the forces acting on it. For a long and thin solid sample, with cross section AA and with longitudinal loading F , Equation (3.116) reduces to Hook’s law: (3.117)
t
= Ycr,
where t is the force per unit area, Al/l, and Y is Young’s modulus.
(3.118)
is the fractional change in length,
Many of the scalar quantities in physics can be generalized as tensors of higher rank. In Newton’s theory, mass of an object is defined as the proportionality constant, m, between the force acting on the object and the acceleration as
Fi
= mai.
(3.119)
Mass is basically the ability of an object to resist acceleration, that is, its inertia. It is an experimental fact that mass does not depend on the direction in which we want to accelerate an object. Hence it is defined as a scalar quantity. In some effective field theories, it may be advantageous to treat particles with a mass that depends on direction. In such cases we can introduce effective mass as a second-rank tensor, mij, and write Newton’s second law as 3
Fi
=
1
mij aj
,
(3.120)
j=1
When the mass is isotropic,
mij
becomes
ma3. . - m&. a3 i
(3.121)
thus Newton’s second law reduces to its usual form.
3.3 3.3.1
GENERALIZED COORDINATES Coordinate Curves and Surfaces
Before we introduce the generalized coordinates, which are also called the curvilinear coordinates, let us investigate some of the basic properties of the Cartesian coordinate system from a different perspective. In a Cartesian coordinate system at each point there are three planes defined by the equations
x 1 = c1, x 2
= c2,
x3 = c3.
(3.122)
GENERALIZED COORDINATES
Figure 3.4
155
Coordinate surfaces and coordinate curves in Cartesian coordinates.
These planes intersect at the point ( e l ,c2, c 3 ) , which defines the coordinates of that point. In this section we start by writing the coordinates with an upper index as xi. There is no need for alarm: As far as the Cartesian coordinates are concerned there is no difference, that is, zz = xi. However, as we shall see shortly, this added richness in our notation is absolutely essential when we introduce the generalized coordinates. Treating c1, c2, c3 as parameters, the above equations define three mutually orthogonal families of surfaces, each of which is composed of infinitely many nonintersecting parallel planes. These surfaces are called the coordinate surfaces on which the corresponding coordinate has a fixed value (Fig. 3.4). The coordinate surfaces intersect along the coordinate curves. For the Cartesian coordinate system these curves are mutually orthogonal straight lines called the coordinate axes (Fig. 3.4). Cartesian basis vectors, E l , E 2 , E 3 , are defined as the unit vectors along the coordinate axes. A unique property of the Cartesian coordinate system is that the basis vectors point in the same direction at every point in space (Fig. 3.5). We now introduce the generalized coordinates, where the coordinate surfaces are defined in terms of the Cartesian coordinates ( x 1 , x 2 , x 3as ) three single-valued continuous functions with continuous partial derivatives: (3.123) (3.124)
(3.125) Treating ,&, Z3 as continuous variables, these give us three families of surfaces, where each family is composed of infinitely many nonintersecting surfaces (Fig. 3.6). Using the fixed values that these functions, Zi(xl, x2,z3),
156
GENERALIZED COORDINATES AND TENSORS
Figure 3.5 direction.
i
=
Basis vectors in Cartesian coordinates always point in the same
1,2,3, take on these surfaces, we define the generalized coordinates
(z' ,z2,z3)as (3.126) (3.127) (3.128)
Note that these equations are also the transformation equations between the Cartesian coordinates (zl,x2,z3)and the generalized coordinates (Z', Z 2 , T 3 ) . For the new coordinates to be meaningful, the inverse transformations, xz = Xi@):
(3.129) (3.130) (3.131)
should exist. In Chapter 1, we have seen that the necessary and the sufficient condition for the inverse transformation to exist, Jacobian of the transformation has t o be different from zero. In other words, for a one-to-one
GENERALIZEDCOORDINATES
Figure 3.6
157
Coordinate surfaces in generalized coordinates for T1
correspondence between (z', x2,x3) and (Z' ,z2, Z3)we need to have
J=
d ( d ,22,z3)
(3.132)
a(z',' 2 , 2 3 )
(3.133)
or since J K = 1.
(3.134)
For the coordinate surfaces given as Z1 =Zl(z',z2 , 23 )
x3) = c2,
2 2 = :2(z1,22, -3--3
x
-z
1
=c1,
2
3
(z ,z, 5 ) = z 3 ,
(3.135) (3.136) (3.137)
t,he intersection of the first two, Z1(x1,x2,x3) = and Z2(x1,x2,x3) = defines the coordinate curve along which Z3 varies (Fig. 3.7). We refer to this as the Z 3 curve, which can be parameterized in terms of z3as (2'(T3), x2(Z3), z3(Z3)) . Similarly, two other curves exist for the Z1and the x2 coordinates. These curves are now the counterparts of the coordinate axes in Cartesian coordinates. -
c2,
158
GENERALIZED COORDINATES AND TENSORS
Figure 3.7
Generalized coordinates
Wc now define the coordinate basis vectors, TI,?^, 2 3 , in terms of the Cartesian unit basis vectors (?I,&, &.) as the tangent vectors: (3.138)
8x1,
ax2,.
8x3,
z2 + 7ze 2 + -e3, z2 8x3, 8x1, ax2, e 3 = -el + -e2 + -e3. z3 E3 z3
j
e
2 = -el
j
(3.139) (3.140)
Note that TZare in general neither orthogonal nor unit vectors. In fact, their magnitudes,
(3.141)
as well as their directions depend on their position. We define unit basis vectors in the direction of as
(3.142)
3
Coordinate basis vectors, e i , point in the direction of the change in the position vector, when we move an infinitesimal amount along the 52 curve. In other words, it is the tangent vector to the ZZ curve at a given point. We can now interpret the condition, J # 0, for a legitimate definition of generalized
GENERALIZED COORDINATES
Figure 3.8
159
Covariant and contravariant components.
coordinates. We first write the Jacobian, J , as
8x1
ax2
ax3
1
J = det
Remembering that the triple product $1 . ($2 x ? 3 ) is the volume of the parallelepiped with the sides $1,$2, and $ 3 , the condition J # 0 for a legitimate definition of generalized coordinates means that the basis vectors have to be noncoplanar.
3.3.2 Why Upper and Lower Indices Consider a particular generalized coordinate system with oblique axis on the plane (Fig. 3.8). We now face a situation that we did not have with the Cartesian coordinates. We can define coordinates of a vector in two different ways, one of which is by drawing parallels t o the coordinate axes and the other is by dropping perpendiculars to the axes (Fig. 3.8). In general, these two methods give different values for the coordinates. Coordinates found by drawing parallels are called the contravariant components, and we write them with an upper index as ui.Now the vector 3 is expressed as + a = al a2 s2, (3.144)
s1 +
where g1 and 2 2 are the unit basis vectors. Coordinates found by dropping perpendiculars to the coordinate axes are called the covariant components. They are written with a lower index as ail and their values are obtained as Ul
=
i2 .&,
a2
=2
A
.&.
(3.145)
160 3.4
GENERALIZED COORDINATES AND TENSORS
GENERAL TENSORS
Geometric interpretation of the covariant and the contravariant components demonstrates that the difference between the two types of coordinates is, in general, real. As in the case of Cartesian tensors, we can further enrich the concept of scalars and vectors by defining them with respect to their transformation properties under general coordinate transformations. We write the transformation equations between the Cartesian coordinates, xi = (xl, x2,x3), and the generalized coordinates, T i = (T1,T2,T3), as
Similarly, we write the inverse transformations as xi =
xy7J+).
(3.147)
Note that each one of the above equations [Eqs. (3.146) and (3.147)] correspond to three equations for i = 1,2,3. Even though we write our equations in three dimensions, they can be generalized to n dimensions by simply extending the range of the indices to n. Using Equation (3.146), we can write the transformation equation of the coordinate differentials as (3.148) For a scalar function, @(xz), we can write the transformation equation of its gradient as (3.149) We now generalize these to all vectors and define a contravariant vector as a vector that transforms like d z j as (3.150) and define a covariant vector as a vector that transforms like the gradient of a scalar function: (3.151) Analogous to Cartesian tensors, a second-rank covariant tensor, Tij, is defined as (3.152)
GENERAL TENSORS
161
Tensors with contravariant and mixed indices are also defined with respect to their transformation properties as (3.153)
(3.154) Note that the transformation equations between the coordinate differentials [Eq. (3.148)] are linear, that is, El
dZ1 = - dx'
8x1
z2
z1dx2 + El +dx3, 8x2 8x3
(3.155)
E2 2 E2 3 +-dx +-dx, (3.156) 8x2 8x3 E3 E3 E3 dZ3 = - dx' - dx2 + - dx3, (3.157) 8x1 8x2 8x3 hence the elements of the transformation matrix, A , in V = Av [Eq. (3.151)] are given as 1
dZ2=-dx
8x1
+
m
A=A2=-=
(3.158)
8x3
- - -
8x1 8x2 8x3 If we apply this to orthogonal transformations between Cartesian coordinates defined in Equation (3.10), we obtain the components of the transformation matrix as
Ai. 3 = A a3 . . - S.. 22 - c osQ 23.
1
(3.159)
where d i j are the direction cosines and we have used the fact that for Cartesian coordinates covariant and the contravariant components are equal. Using the inverse transformation (3.160) in Equation (3.148), we write
3
(3.161) k=l
162
GENERALIZED COORDINATESAND TENSORS
to obtain the relation
(3.162)
In general, we write the transformation matrix, A, and the inverse transformation matrix, '21, as
(3.163)
respectively, which satisfy the relation
3
(3.164) j=1
One should keep in mind that even though for ease in comparison we have identified the variables xi as the Cartesian coordinates and we will continue to do so, the transformation equations represented by the transformation matrix in Equation (3.158) could represent any transformation from one generalized coordinate system into another. We can also write the last equation [Eq. (3.164)] as
(3.165)
thus showing that '21 is the inverse of A = A;.. If we apply Equation (3.163) to the orthogonal transformations between Cartesian coordinates [Eq. (3.31)] and their inverse [Eq. (3.35)],we see that
-
-
A = A.
(3.166)
GENERAL TENSORS
163
We can now summarize the general transformation equations as 3 j=1 3
(3.168)
51 = - p ( v , ,
T,
-2
3.4.1
=
cc 3
3
A$T$
(3.171)
Einstein Summation Convention
From the above equations, we observe that whenever an index is repeated with one up and the other one down, it is summed over. We still have not shown how to raise or lower indices but from now on whenever there is a summation over two indices, we agree to write it with one up and the other down and omit the summation sign. It does not matter which index is written up or down. This is called the Einstein summation convention. Now the above transformation equations and their inverses can be written as
(3.172)
A general tensor with mixed indices is defined with respect t o the transformation rule (3.173) To prove the tensor property of the Kronecker delta under general coordinate transformations, we use Equation (3.164) to write (3.174) = &A:.‘
(3.175)
= 6,; ..
(3.176)
164
GENERALIZED COORDINATES AND TENSORS
Hence 6; is a second-rank tensor and has the same components in generalized coordinates. It is the only second-rank tensor with this property. Algebraic propcrties described for the Cartesian tensors are also valid for general tensors. 3.4.2
Line Element
We now write the line element in generalized coordinates, which gives the distance between two infinitesimally close points. We start with the line element in Cartesian coordinates, which is nothing but Pythagoras’ theorem, which can be written in the following equivalent forms:
d.5’ = d 7 .d 7 = (dx’)2+ (dx’))”+ (dx3)’
(3.177)
3
=Cdxkdxk
(3.178) (3.179)
Using the inverse transformation (3.180) and the fact that ds is a scalar, we write the line element in generalized coordinates as 3
3
(ts’ =
ds2 = C d x k d z k=
axk C axk dEa--EJ dZJ
(3.181)
k=l
k=l
(3.182)
3.4.3
Metric Tensor
We now introduce a very important tensor, that is, the metric tensor, which is defined as
gij,
3
(3.183) k=l
Note that the sum over k is written with both indices up. Hence, even though we still adhere to the Einstein summation convention, for these indices we keep the sumniation sign. The metric tensor is the singly most important second-rank tensor in tensor calculus and general theory of relativity. Now the line element in generalized coordinates becomes
GENERAL TENSORS
165
Needless to say, components of the metric tensor in Equation (3.184) are all expressed in terms of the barred coordinates. Note that in Cartesian coordinates the metric tensor is the identity tensor, gzj = szj;
(3.185)
thus the line element in Cartesian coordinates becomes ds2 = 6ijdxadxj =
(dx’)’
(3.186)
+ (dx’)’ + ( d x 2 ) ’ .
(3.187)
3.4.4 How to Raise and Lower Indices Given an arbitrary contravariant vector
vj, let
us find how
[gijvj]
(3.188)
transforms. Using Equation (3.172), we first write (3.189) (3.190) and then substitute them into Equation (3.188) to get ., [ g i j d ] = A: A: A: [ijztjtVk] .I-’
= A:’ =
A:’
= A:’
(3.191)
[Aj’;iJ,] [gz,j,5k]
(3.192)
[s:]
(3.193)
[gi,j,~k]
[gzrk~jlc] .
(3.194)
Renaming the dummy variable Ic on the right-hand side as k+.i,
(3.195)
we finally obtain
[gijvj] = A:’ [?ji,jEj] .
(3.196)
Comparing with the corresponding equation in Equation (3.172), it is seen that gLjvJ transforms like a covariant vector. We now define the covariant component of vj as
vi = gajv3.
(3.197)
We can also define the metric tensor with the contravariant components as (3.198)
166
GENERALIZED COORDINATES AND TENSORS
where
(3.199) Note that in the above equations, in addition to the summation signs that come from the definition of the metric tensor. the Einstein summation convention is still in effect. Using the symmetry of the metric tensor, we can also write Slkgkl'
-
g11' = 6, 1' .
(3.200)
We now have a tool that can be used to raise and lower indices at will: T2,
= gz,,Ti1,
(3.201)
A'" 23
= gkk'Az3k',
(3.202)
c,, 3
-
g33 k z l g k k ) C ; : k t ,
(3.203)
etc. Metric tensor in Cartesian coordinates is 6", Using Equations (3.158) and (3.172), we can show that under the general coordinate transformations it transforms into the metric tensor:
(3.204) 3
=
C Ai A j -2
.
,
-2
.
I
(3.205)
(3.206) (3.207)
3.4.5 Metric Tensor and the Basis Vectors If we remember the definition of the basis vectors [Eqs. (3.138)-(3.140)],
GENERAL TENSORS
dXk 2 .- *z'
167
i = 1,2,3,
(3.208)
ax2ax3, + -e2 + rn zz
(3.209)
1 -
8x1,
= -el
which are tangents to the coordinate curves (Fig. 3.7),we can write the metric tensor as the scalar product of the basis vectors:
(3.210) (3.211) Note that the basis vectors 3 i are given in terms of the unit basis vectors of the Cartesian coordinate system Zi.Similarly, using the definition of the metric tensor with the contravariant components,
(3.212)
we can define the new basis vectors
$2
as
(3.2 13) which allows us to write the contravariant metric tensor as the scalar product
The new basis vectors, 22,are called the inverse basis vectors. Note that neither of the basis vectors, 2i or are unit vectors and the indices do not refer to their components. Inverse basis vectors are actually the gradients:
Ti,
Hence they are perpendicular t o the coordinate surfaces, while Ti are tangents to coordinate curves. Usage of the upper or the lower indices for the basis vectors is justified by the fact that these indices can be lowered or raised
168
GENERALIZED COORDINATES AND TENSORS
by the metric tensor as
(3.216) (3.2 17) 3
(3.2 18) k=l
(3.219) (3.220) Similar1y,
(3.221) (3.222)
(3.223) (3.224) (3.225) 3.4.6
Displacement Vector
In generalized coordinates the displacement vector between two infinitesimally close points is written as
(3.226) -
Ti&i
(3.22 7 )
+a
= &lTfl+c L z 2 z ) z
3 2 3
(3.228)
Using the displacement vector [Eq. (3.228)], we can write the line element as
(3.229) (3.230) (3.231)
GENERAL TENSORS
If we move along only on one of the coordinate curves, say covered is
?El,
169
the distance
Similarly, for the displacements along the other axes we obtain
For a general displacement we have to use the line element [Eq. (3.231)]. For orthogonal generalized coordinates, where
(Ti. Tj) = 0, 2 # j ,
(3.235)
the metric tensor has only the diagonal components and the line element reduces to
+ ds$ + ds$ (3.236) = 911 + g22 (a”)’ + 933 ( & 3 ) 2 (3.237) + ( 2 2 . 2 2 ) (&’))’+ (T3.2 3 ) (fi’))”.(3.238) = (21 . 21)
ds2 = ds$
3.4.7
Transformation of Scalar Functions and Line Integrals
As in orthogonal transformations, value of a scalar function is invariant under generalized coordinate transformations, hence we write Q ( x ~ , x ) ’ , x =~ )~ ( E ’ , E ~ , E or~ ) = @(21,2)’,23).
(3.239) (3.240)
+ The scalar product of two vectors, 3 and b , is also a scalar, thus preserving its value. In generalized coordinates we write it as
(3.241) (3.242) Using the transformation equations,
(3.243) (3.244)
170
GENERALIZED COORDINATES AND TENSORS
it is clear that it has the same value that it has in Cartesian coordinates: 3
.-
a . h =?Phi
(3.245) (3.246) (3.247) (3.248) (3.249)
In the light of these, a given line integral in Cartesian coordinates,
can be written in generalized coordinates as
(3.251) (3.252)
We can also write I as
In orthogonal generalized coordinates, only the diagonal components of the metric tensor are nonzero. hence I becomes
I
=
s
gI1v1&1
+ g22v2&2 + g 3 p 3 d z 3 .
7
(3.254)
It is important to keep in mind that a vector exists independent of the coordinate system used to represent it. In other words, whether we write in Cartesian coordinates as
d = v121+ v222 + v323 = v22,
7
(3.255)
or in generalized coordinates as
d
it is the same vector. Hence the bar on is sometimes omitted. We remind the reader that ??i are not unit vectors in general. Covariant components of
GENERAL TENSORS
J
171
are found as
(3.258) (3.259) (3.260) (3.261) J
Similarly, using the inverse basis vectors, e components as
z
, we
can find the contravariant
(3.262) (3.263) (3.264) (3.265) The two types of components are related by
V J= p V i .
(3.266)
We can now write the line integral [Eq. (3.250)] in the following equivalent ways:
I = = =
J' ? . J'
1.
(&'??1+
&'?)z
+ a3T3)
+ ( v . $ ~& 2 )+ (7.23) a 3
a 1
&'+V2
a2+V3
a3
(3.267) (3.268) (3.269) (3.270)
3.4.8 Area Element in Generalized Coordinates Using t h 2 expression for the area of a parallelogram defined by two vectors, * a and b , as area = 1 3x
71,
(3.271)
we write the area element in generalized coordinates defined by the infinitesimal vectors & E l 2 1 and &E2?2 (Fig. 3.9) as fiZlf2 =
x
Tz/&1a2,
(3.272)
172
GENERALIZED COORDINATES AND TENSORS
Figure 3.9
Area element in generalized coordinates.
Similarly, the other areas are defined:
A
In orthogonal generalized coordinates, where the unit basis vectors, i2i = Ti/ lT.;l,i = 1,2,3, satisfy (3.275) (3.276) (3.277) we can write
where the area element is oriented in the & direction. Similarly, we can write the area elements dZZ3?1 = =
and
I&
&fi3
s 2
(3.280)
g2,
(3.281)
173
GENERAL TENSORS
u and v coordinates defined on a surface.
Figure 3.10
3.4.9
Area of a Surface
A surface in three dimensional Cartesian space can be defined either as
x3 = f(X1,XZ) or in terms of two coordinates (parameters), u and v, defined on the surface as (Fig. 3.10)
x 1 = x y u ,v), x2 = x2(u,v), x3 = x3(u,v).
(3.284) (3.285) (3.286)
The u and v coordinates are essentially the contravariant components of the coordinates that a two-dimensional observer living on this surface would use, that is,
x x
-1
= u,
(3.287)
-2
= v.
(3.288)
We can write the infinitesimal Cartesian coordinate differentials, dx', dx2,dx3, corresponding to infinitesimal displacements on the surface, in terms of the surface coordinate differentials, d u and dv, as
dx'
dX = - du
dU
dX2
dX +dv, dV
dX2
dx2 = - du + - dv, dU dV ax3 8x3 dx3 = - du + - dv. dU dV
(3.289) (3.290) (3.291)
174
GENERALIZED COORDINATES AND TENSORS
We now write the distance d s between two infinitesimally close points on the surface entirely in terms of the surface coordinates u and v as
+ (dx2)’ + (dx’)’ 2 + +
d s 2 = (dx’)’ =
[(g)(g)2($3’1
+
(3.292)
du2
I‘):(
[(g)2 + (g)’+
dv2.
(3.293)
Comparing this with the line element for an observer living on the surface:
(3.294)
d s 2 = gij d u d v , i = 1 , 2 , = gZlu d u 2
+ 2g,,
dudv
+ guv d v 2 ,
(3.295)
we obtain the components of the metric tensor as 2
g u u = ( g ) guv =
+(g)2+(g), 2
(3.296)
ax1 ax1 + -ax2 ax2 8x3 8x3 -+ -d u dv
du dv ’
d u dv
(3.297)
2
guu=(g)2+(g)2+(!$)
(3.298)
Since the metric tensor can also be written in terms the surface basis vectors, 4 e z L and Tu, as
d s 2 = (2% . 3% dU2 )
+ 2 ( 2% . 7?u)d u d v + ( Zv. T u d)v 2 ,
(3.299)
we can read T Uand 7?ufrom Equations (3.296)-(3.298) as
-
ax1,
e
= -el
e
= -el
aU
ax1, dv
ax2, ax3+ -e2 + -e3, dU du dx2ax3 + 8v + -Z3. 8U -e2
(3.300)
(3.301)
Note that the surface basis vectors are given in terms of the Cart,esian unit basis vectors ( Z l , Z2,Z3). We can now write the area element of the surface in terms of the surface coordinates as
d 3 u u = T Ud u x
ZVd v ,
(3.302)
GENERAL TENSORS
175
which, after substituting Equations (3.300) and (3.301), leads to
db,,
=
f
[(--du dv
-
du dv
-d u dv
(3.303)
d u dv
which can also be written as
The signs f correspond to proper and improper transformations, respectively. Using Equation (3.303), we can write the magnitude of the area element as
/dZuvl = JEG where
-
F2 dudv,
E=(E)2+(Z)2+(E)
(3.305)
2
, (3.306)
(3.308) Integrating over the surface, we get the surface area (3.309)
(3.310)
which is nothing but the Equation (2.205) we have written for the area of a surface in the previous chapter. If the surface S is defined in Cartesian coordinates as z3- f(x1,x2) = 0, we can project the surface element, d 3 , onto the x1x2-plane as d x 1 d x 2 = ( E . z3)d a = cosyda, where 6 = 5,’1 5 1is the unit normal to the surface and integrate over the region RXlx2,which is the projection of S onto the
176
GENERALIZED COORDINATES AND TENSORS
A
.3
Figure 3.11
Projection of the surface area element
z1z2-plane (Fig. 3.11). Since the normal is given as
3=
8.f
8f
we write the surface area as
S
=//do
= /./nx,z2(l/cos7)dx'dx2
The two areas [Eqs. (3.310) and (3.311)] naturally agree. Example 3.2. Curvilinear coordinates o n the plane: Transformations from the Cartesian to curvilinear coordinates on the plane, say the z1x2plane, is accomplished by the transformation equations z1 = 2 1 ( U , V ) ,
(3.312)
z2= 2 ( U , V ) ,
(3.313)
x
3
= 0.
(3.314)
Metric tensor can easily be constructed by using Equations (3.296)-(3.298). Area element is naturally in the z3direction and is given as
(3.315)
GENERAL TENSORS
177
Taking the plus sign for proper transformations, we write the magnitude of the area element as
(3.316) In other words, under the above transformation [Eqs. (3.312)-(3.314)], the area element transforms as
(3.317) Notice that on the x1x2-plane
(3.318) Applying these to the plane polar coordinates defined by the transformation equations
where u = p and v
=
x1 = pcos4,
(3.319)
x2 = p sin 4,
(3.320)
4, we can write the line element ds2 = dp2
as
+ p2 dq5’.
(3.321)
Since
(3.322) the area element becomes dg 3.4.10
=p
dpdd.
(3.323)
Volume Element in Generalized Coordinates
In Cartesian coordinates the scalar volume element is defined as d r = C1 . (C2 x
Z3)
dzld~~d.~.
(3.324)
Since the Cartesian basis vectors are mutually orthogonal and of unit magnitude, the infinitesimal volume element reduces t o dr
= dz1dz2dz3.
(3.325)
In generalized coordinates we can write the scalar volume element dr’, which is equal to d7, as the volume of the infinitesimal parallelepiped with the sides defined by the vectors 2121, 2 2 2 2 , 2 3 2 3
(3.326)
178
GENERALIZED COORDINATES AND TENSORS
as
dr' =
(&&l).
(Z2&2x ?3&3)
=2 1 .( 2 2x
T3)&1&2&3.
(3.327) (3.328)
Using Equation (3.143), this can also be written as (3.329)
A tensor that transforms as
is called a tensor density or a pseudotensor of weight w. Hence the coorwhich transforms as dinate volume element, (3.331) is a scalar density of weight now transforms as
~
1. Volume integral of a scalar function p(Z1, T2, z3)
(3.332) In orthogonal generalized coordinates the volume element is given as dr'
3.4.11
=131
1 2 2
a21 1 2 3 a3/
=
1pq 1
=
&&& d z 1 d z 2 d z 3 .
~ 1 ~ ~ &l&2&3 1~ 1
(3.333) (3.334) (3.335)
lnvariance and Covariance
We have seen that scalars preserve their value under general coordinate transformations that do not involve scale changes. Magnitude of vectors and the trace of second-rank tensors are also other properties that do not change under such coordinate transformations. Properties which preserve their value under coordinate transformations are called invariants. Identification of invariants in natiirc is very important in understanding and developing new physical theories. An important property of tensors is that tensor equations preserve their form under coordinate transformations. For example, a tensor equation given as
DIFFERENTIAL OPERATORS IN GENERALIZED COORDINATES
179
transforms into
Even though the components of the individual tensors in a tensor equation change, the tensor equation itself preserves its form. This useful property is called covariance. Since the true laws of nature should not depend on the coordinate system we use, it should be possible to express them in coordinate independent formalism. In this regard, tensor calculus plays a very significant role in physics. In particular, it reaches its full potential with Einstein's special theory of relativity and the general theory of relativity. DIFFERENTIAL OPERATORS IN GENERALIZED COORDINATES
3.5 3.5.1
Gradient
We first write the differential of a scalar function @(Zi)as
a@
a@ a@ + - dZ2 + - dZ3. z1 z2 z3
d@ = - dZ1
(3.338)
Using the displacement vector written in terms of the generalized coordinates and the basis vectors 2 i as
+d Z 2 2 2 +dZ323,
d 7 = &'$I
(3.339)
we rewrite dQ, as d@ = T @ . d 7
(3.340)
+ dZ3$3)
(3.341)
to get
+
d@ = T@. (&El21 d Z 2 2 2 =
(?a.
21)
dZ1+ (T@. 2 2
)
dZ2
+
$3)
dZ3.
(3.342)
Comparing with Equation (3.261), this gives the covariant components of the gradient in terms of the generalized coordinates as (3.343) In orthogonal generalized coordinates, where the unit basis vectors are defined as el=,-h
3 e l
lesl
3 -
e l
3
-3
2
&'e2=-
e 2
d=,
2
e3=-
e 3
6'
(3.344)
Equation (3.343) gives the gradient in terms of the generalized coordinates and their unit basis vectors:
180
GENERALIZED COORDINATES AND TENSORS
Figure 3.12
3.5.2
Volume element used in the derivation of the divergence operator.
Divergence
To obtain the divergence operator in generalized coordinates, we use the integral definition [Eq. (2.309)] (3.346) where S is a closed surface enclosing a small volume of AV. We confine ourselves to orthogonal generalized coordinates so that the denominator can be taken as (3.347) For the numerator we consider the flux through the closed surface enclosing the volume element shown in Figure 3.12. We first find the fluxes through the top and the bottom surfaces. We chose the location of the volume element such that the bottom surface is centered at PI = (EA,Ei,O) and the top surface is centered at P2 = (?i$,zi, AT3).We write the flux through the bottom surface as (3.348) (3.349) -
where we used is.
x3for the component of 2 along the unit basis vector -
A3 =
+ -
A
.E3.
g3,
that
(3.350)
DIFFERENTIAL OPERATORS IN GENERALIZED COORDINATES
181
-
The minus sign is due to the fact that 2 3 and the normal - to the surface are in opposite directions. Note that @, &, and 713 are all functions of position. For the flux over the top surface we write
We now have a plus sign, since 2 3 and the normal are in the same direction. Since we use orthogonal generalized coordinates, the other components of + A do not contribute to the flux through these surfaces. Since the righthand side of Equation (3.352) is t o be evaluated at ($j,zg,AT3) , we expand
( 4 5 7 6 2 3 )in Taylor series about
($,,?i$,O)
and keep only the first-order
terms: -
&&A3
--2
,x ,AZ3
=
fi&Z
Substituting this into Equation (3.352), we obtain
Since the location of the volume element is arbitrary, we drop the subscripts and write the net flux through the top and the bottom surfaces as
Similar terms are written for the other two pairs of surfaces, giving
(3.356)
182
GENERALIZED COORDINATES AND TENSORS
4 x3
/==&+;* cll -X1 Figure 3.13 Closed path used in the definition of curl, where AT2 and AT3 represent the change in coordinates between the indicated points.
Substituting this into Equation (3.346) with Equation (3.347), we obtain the divergence in orthogonal generalized coordinates as
L
(3.357)
3.5.3 Curl We now find the curl operator in orthogonal generalized coordinates by using the integral definition [Eq. (2.310)]:
(3.358)
where C is a small closed path bounding the oriented surface AS. The outward normal to d d is found by the right-hand rule. We pick a single component of x A by pointing d d in the desired direction, say gl.In Figure 3.13 we show the outward unit normal 6 found by the right-hand rule, pointing in the direction of Z1, that is, 6 = gl.We now write the complete line integral
(a'
->
DIFFERENTIAL OPERATORS IN GENERALIZED COORDINATES
183
over C as +
J
i x - d f - ; Ca+Cb+Cc+Cd
A .d7
( 2 .&)
&I
= L.+C*+cc+cd
q
+ (2.
&2
+ ( 2 .T3)&3, (3.359)
where we have used Equation (3.268). We first consider the segments C, and C,. Along C, we write
+ where 2 2 = A . & . We now write the Taylor series expansion of about Po = (zi,?E;,Ti) with only the linear terms:
(z3-
zi).
6
3
2
(3.361)
(FA ,Z;,Zg)
Along C, we have T3 = T i and TI
= Ti;hence
Equation (3.360) becomes
(3.362) Next, to evaluate
+
jCC A . d 7 , we write
184
GENERALIZED COORDINATES AND TENSORS
-
We again use the Taylor series expansion [Eq.(3.361)]of 6 x 2 about Po = (zA,Ti,zi) with only the linear terms. Along the path C,, we have T3 = Ti AT; and z1= zA,which gives
+
(3.364)
Using this n Equation (3.363), we obtain
k x+
(3.365)
This allows us to combine the integrals in Equations (3.362) and (3.365) to yield
Since our choice of the point (3$, Tg,T i ) is arbitrary, we write this for a general point as (3.367)
A similar equation will be obtained from the other two segments as (3.368)
DIFFERENTIAL OPERATORS IN GENERALIZED COORDINATES
185
Addition of Equations (3.367) and (3.368) yields
Using this result in Equation (3.358) gives the component of direction of 21 as
3 x 2 in the
A similar procedure yields the other two components as
and
The final expression for the curl of a vector field in orthogonal generalized coordinates can now be given as
(3.373) which can also be expressed conveniently as
. (3.374)
186
GENERALIZED COORDINATES AND TENSORS
3.5.4
Laplacian
Using the results for the gradient and the divergence operators [Eqs. (3.345) and (3.357)], we write the Laplacian for orthogonal generalized coordinates as (3.375)
3.6
ORTHOGONAL GENERALIZED COORDINATES
The general formalism we have developed in the previous sections can be used to define new coordinate systems and t o study their properties. Depending on the symmetries of a given system, certain coordinate systems may prove to be a lot easier to work through the mathematics. In this regard, many different coordinate systems have been designed. To name a few, Cartesian, cylindrical, spherical, paraboloidal, elliptic, toroidal, bipolar, and oblate spherical coordinate systems can be given. Among these Cartesian, cylindrical, and spherical coordinate systems are the most frequently used ones, which we are going to discuss in detail. Historically, the Cartesian coordinate system is the oldest and was introduced by Descartes in 1637. He labeled the coordinate axes as x,y,and z :
x1 = 5 , x2 = y, x3 = z A
A
(3.376) (3.377) (3.378)
A
and used i , j , k for the unit basis vectors: A
A
.
(3.379)
e l = 2,
A
-32
h
(3.380)
=j, A
h
e3
= k.
(3.381)
In Cartesian coordinates motion of a particle is described by the radius or the position vector, 7 ( t ) ,as -+ T ( t )= x(t$+ y(t)T
+ z(t)Z,
(3.382)
ORTHOGONAL GENERALIZED COORDINATES
187
where the parameter t is usually the time. The velocity, T ( t )and , the acceleration, i T ( t ) ,are obtained as
(3.383)
z i+$ h
= ~
h
+z
A
(3.384)
j k, d2T
a ( t ) = - d= -3 dt
(3.385)
dt2
A
h
A
(3.386)
=xi+yj+zlc.
Example 3.3. Circular motion: Motion of a particle executing circular motion can be described by the parametric equations
x ( t ) = a0 coswt;
(3.387)
y ( t ) = a0 sinwt,
(3.388) (3.389)
z ( t ) = zo. Using the radius vector + T ( t )= a0 cos wt i
h
+ a0 sin wt j + zo k , h
A
(3.390)
we can obtain the velocity T ( t )as + 2, ( t ) = -uow sin w t i
h
+ aow cos wt j
A
(3.391)
and the acceleration Z ( t )as + a ( t ) = -aow
A
2
= -w”(t).
3.6.1
A
cos wt i - uow2 sin wt j
(3.392) (3.393)
Cylindrical Coordinates
Cylindrical coordinates are defined by -1
(3.394)
x =P,
-2
x x
-3
=4,
(3.395)
z.
(3.396)
=
They are related to the Cartesian coordinates by the transformation equations (Fig. 3.14)
x
= p cosq5,
y
= p sin
z
=
z,
4,
(3.397) (3.398) (3.399)
188
GENERALIZED COORDINATES AND TENSORS
Figure 3.14
Cylindrical coordinates: Coordinate surfaces for p, 4, z and the
unit basis vectors.
where the ranges are given as p E [ O , o o ] , $ E [0,2n],2 E [ O , o o ] . Inverse transformation equations are written as p=
VGqF,
(3.400) (3.401)
2
(3.402)
= 2.
We find the basis vectors [Eq. (3.208)], -=,
e
i =
axk
-,
(3.403)
zz
as f
e
+ sin 4 j , I X,,l = 1, 4 = - p s i n 4 i + p c o s 4 j , lZ41= p,
Z p= cos 4 i
3
e
h
h
1=
h
2 = 2
f
h
e3=Zz=k,
(3.404)
A
/+ ezI=l,
(3.405)
(3.406)
The unit basis vectors are now written as h
h
h
e l = ep = cos$ i + s i n $ j ,
-
-
A
A
h
e2
h -
= E4 =
-
sin $ i
(3.407)
+ cos $ j , A
(3.408)
h A
e3 =
e, = k .
(3.409)
ORTHOGONAL GENERALIZED COORDINATES
Figure 3.15
189
Infinitesimal displacements in cylindrical coordinates.
It is easy to check that the basis vectors are mutually orthogonal; hence they satisfy the relations (Fig. 3.14)
(3.410)
It is important to note that the basis vectors, ?i, are mutually orthogonal; however, their direction and magnitude depends on position. We now write the position vector, 7, and the infinitesimal displacement vector d 7 [Eq. (228)l as
7= p T p + $ 2 6 d
+z T z , 7 = d p T 0+d$T+ +d z T z .
(3.411) (3.4 12)
From the line element ds2 = d 7 . d 7
(3.413)
+
= dp2 ( T f . Z f ) d$' = dp2
(2+ .?+)
+ p2dq52 + dz2
= Q f f dP2
+ g+dJd42 + g z z
+ dz2 (2,. T z )
(3.414) (3.415)
dZ2,
(3.416)
we obtain the metric tensor: (3.417)
190
GENERALIZED COORDINATES AND TENSORS ..
We construct the contravariant metric tensor, g z J , by using the inverse basis vectors [Eq. (3.213)],
(3.418) (3.4 19) (3.420) which are found by using the inverse transformation equations [Eqs. (3.400)-(3.402)] as
dpi+ -
-3 dp, + -dpk dy dz X Y (x2 y2)1/Zi (x2 y 2ye = --z -3
2 1-Z P =
(3.421)
ax
h
+
P
+
(3.423)
P h
h
-2e
(3.422)
+
+
= C O S ~i + s i n $ j ,
(3.424)
84, dqk + -3 -- 3 4 = -t dx dy
(3.425) (3.426) (3.427) (3.428)
-3
e
- e--fz = k . A
(3.429) . .
We can now write the contravariant metric tensor, 9'3, as ..
gZ3= -ie
=(
. -ej
1
(3.430) 0
0
$-.
0
0
0 0 ) .
(3.431)
1
Note that
(3.432) (3.433) (3.434)
ORTHOGONAL GENERALIZED COORDINATES
191
Line integrals in cylindrical coordinates are written as
(3.435) (3.436) where
(3.437) (3.438) (3.439) Area elements in cylindrical coordinate (Fig. 3.15) are given as
(3.440) (3.441) (3.442) while the volume element is d r = dp(pd4)dz
(3.443) (3.444)
= pdpd4dz.
Applying our general results to cylindrical coordinates, we write the following differential operators: Gradient [Eq. (3.345)]:
(3.445) Divergence [Eq. (3.357)]:
(3.446) where --$
A,=A.Z,,
+
+ -
A,=A.Z$, A z = A . k .
(3.447)
Curl [Eq. (3.373)] :
(3.448)
192
GENERALIZED COORDINATES AND TENSORS
+
where A, = A . C,, A4 = expressed as
+
A .Z4, A,
=
+ A . k . Curl can also be conveniently
(3.449)
Laplacian [Eq. (3.375)] 1d
[
d@
P @ ( p , 4 , z ) = -- ppap iip]
1 d2@ d2@ + -p2 a@ dz2'
(3.450)
+
Example 3.4. Acceleration i n cylindrical coordinates: In cylindrical coordinates the position vector is written as ---f
r
= p cos
4 ;+
p sin 4
3 + zZ.
(3.451)
Using the basis vectors Z,,Z$,z [Eqs. (3.407)-(3.409)], we can also write this as
(7t.K)Z
+ T = (?.EP)E,+(?;).2$)z?4+
+ zk.
(3.452)
A
= pZ,
Since the basis vector particle is written as
(3.453)
Z, changes direction with position,
velocity of a
(3.454) A
Using Equation [3.407], we write the derivative of the basis vector, e,, as A
h
e , = 4(- s i n 4 i =
+ cos4 j ) h
$Z&
(3.455) (3.456)
thus obtaining the velocity vector + v = pZ,
+ p&4 + Zk. h
(3.457)
To write the acceleration, we also have t o consider the change in the direction of Ed:
ORTHOGONAL GENERALIZED COORDINATES
Figure 3.16 basis vectors.
Spherical coordinates: Coordinate planes for
T,
193
8, q5 and the unit
Using Equation (3.456) and h
&- cos 4 i
h
k4 = =
A
-
sin 4 j )
(3.459)
-4zp,
(3.460)
we finally obtain
Zp + (&
3.6.2
+ 2b&)z4+ Zk. h
(3.461)
Spherical Coordinates
Spherical coordinates (r,8,4) are related to the Cartesian coordinates by the transformation equations (Fig. 3.16)
4, y = r sin 0 sin 4,
(3.462)
z
(3.464)
x
= r sin 0 cos
(3.463)
= r cos 0,
whcre the ranges are given as
r E [ 0 , 4 , 0 E [0,7d, 4
E
[0,27d.
(3.465)
194
GENERALIZED COORDINATES AND TENSORS
The inverse transformations are (3.466) (3.467) (3.468) We write the radius vector as + r =zi+yj+zk A
h
-
= r sin B cos
.
(3.469)
4T+ r sine sin 4 3+ r cos 0 X.
(3.470)
Calling -1
x = r,
we write the basis vectors, 3
e
1=
3
$i
z2= 8, z3 = 4, dXj
= -,
as
zz
h
A
A
-f?,=sinBcosq5i+sin0sin$j+cose h
e2 =2 0
(3.471)
(3.472)
k, h
A
=rcosecosdi+rcosOsin4j-rsinB
3
A
k,
(3.473)
A
(3.474)
e3=-f?4=-rsinBsin$i+rsinBcos4j.
Dividing with their respective magnitudes,
\Tr1= 1,
I ~ Q \= r,
(3.475)
1241 = rsin8,
gives us the unit basis vectors:
Z,= sin 0 cos 4 T+ sin 0 sin $ 3+ cos 0 k , h
(3.476)
h A
ee =cosQcos$;+cosBsinq53-sintl A
24 =
-
sin
i
+ cos $ j ,
k,
(3.477)
A
(3.478)
which satisfy the relations A
e,
-
A
A
= ee x E d ,
(3.479) Using the basis vectors, we construct the metric tensor
(3.480) (3.481)
ORTHOGONAL GENERALIZED COORDINATES
Figure 3.17
195
Infinitesimal displacements in spherical coordinates.
which gives the line element as
(3.482)
ds2 = gijdTi&? = dr2
+ r2 de2 + r2sin28 dd2.
(3.483)
The surface area elements (Fig. 3.17) are now given as
dore = r drd8, dgr4 = r sin 8 drdq5,
(3.484) (3.485)
doe4 = r2sin 8 dedd,
(3.486)
d r = r2 sin8 drdedq5.
(3.487)
while the volume element is
Following similar steps to cylindrical coordinates, we write the contravariant components of the metric as
1
0
0
..
(3.488) r2 sin2 0
Using the metric tensor, we can now write the following differential operators for spherical polar coordinates:
196
GENERALIZED COORDINATES AND TENSORS
G r a d i e n t [Eq. (3.345)]:
a@, + --eQ Id@,. 1 d@, +r 80 rsine
?@(r, 0 , 4 )= -e, dr
(3.489)
Divergence [Eq. (3.357)]: +
d ( r 2sin OA,) d ( r sin OAQ) r2 sin 0 ae 1 d(r2AA,) 1 d(sin6'Ao) 1 dA, - -~ fr2 dr rsin6' d0 rsin6' 84 '
?.A=-
(3.490)
+
+--
(3.491)
where A,
--+
---t
=
A GT,AQ= A .EQ, A, =
(3.492)
dr
which can also be conveniently expressed as h
e,
A,
I,
r 20 rsin6' &$,
asd
?xA+ = - d e r2 t [ sin $1 0
(3.493)
t
rAQ rsinBA6
---t + + where A, = A . E,, A0 = A . EQ, A, = A . e+. Laplacian [Eq. (3.375)] :
(3.494)
(3.495) -
1
d2@
(3.496) E x a m p l e 3.5. Motion i n spherical coordinates: the position vector,
In spherical coordinates
ORTHOGONAL GENERALIZED COORDINATES
197
t'
Figure 3.18
Basis vectors along the trajectory of a particle.
is written as + r = rsinecos45+rsinOsin4 ?+rsino
Z,
+ r = re,.
(3.497) (3.498)
For a particle in motion the velocity vector is now written as h
u
--f
= G,
+re,.
(3.499)
Since the unit basis vectors also change direction with position (Fig. 3.18), we write the derivatives e , , e e , and e+ as h
h
6, = (cosocos+b-sinesin4$):+ -
=
sine
(cososin4b+sinecos4Q)T
;4
(3.500)
bz8 + sin e &z@,
(3.501)
A
ee = (-sinecos4 i - c o s e s i n 4 -
A
coso
= -I%,
ex
ed = -cos@
= -sin
(3.503) .,-.
4i - s i n 4 $j $2, - cos e &. . A
4)~
b+cos~cos~
(3.502)
+ cos 0 &4,
h
$)a+ (-sinesin4
(3.504) (3.505)
198
GENERALIZED COORDINATES AND TENSORS
Velocity [3.499] is now written as
which also leads to the acceleration
PROBLEMS
1. Show that the transformation matrices for counterclockwise rotations through an angle of 0 about the x1-,x2-,and x3-axes, respectively, are given as
1
0
0 -sin0 cosd
cos0
2 . ShowtJhat z’@b .
0
cos0 -sin0
sin0
the tensor product defined as
0
;5’ @ 3 is the transpose of
3. Show that the trace of a second-rank tensor,
is invariant under orthogonal transformations.
4. Using the permutation symbol, justify the formula
PROBLEMS
199
5 . Convert these tensor equations into index notation:
6)
(ii)
6. Write the components of the tensor equation u,
= vJvJwLl
i,j
=
1,2,3,
explicitly.
7. What are the ranks of the following tensors: (i) (ii) (iii)
KZJk1DkBrnAZJ, AZBB,WJkuk, A ~ J A ~ ~ B ~ .
8. Write the following tensors explicitly, take i, j = 1 , 2 , 3 :
(i) (ii)
9. Let
A i j , Bij,
AiBiWjkuk, AijAijBk.
and Ci be Cartesian tensors. Show that
is a first-rank Cartesian tensor. 10. Show that the following matrices represent proper orthogonal transformations and interpret them geometrically: (i)
200
GENERALIZED COORDINATES AND TENSORS
(ii)
cos30O -sin30°
-sin3O0 -cos30° 0
0 0 -1
11. Show that
12. Show that in cylindrical coordinates the radius vector is written as + r - p'Zp+zk.
h
13. Parabolic coordinates, (T1,Z2,Z3), usually called (7, [, 4 ) ,are related to the Cartesian coordinates, (2,y, z),by the transformation equations
x1 = x
x
2
= q
= y = rl<sin 4,
1 2
2 2 = 2 = -(<2
- 72)
(i) Show that this is an admissable coordinate system and write the inverse transformations. (ii) Find the basis vectors and construct the metric tensor
gij.
(iii) Write the line element. ..
(iv) Write the inverse basis vectors and construct 9'3. (v) Verify the relation between the basis vectors and the inverse basis vectors [Eq. (3.220) and (3.225)]. (vi) Write the differential operators, gradient, divergence, curl, and Laplacian, in parabolic coordinates. (vii) Write the velocity and acceleration in paraboloidal Coordinates. 14. Elliptic coordinates, ( q , [ ,4) are defined by
x1 = x
CL = -coshqcos<,
2
a . x = y = -sinhqsin<, 2 2 n
2'
= 2 = z.
(i) Show that this is an admissable coordinate system and write the inverse transformations. (ii) Find the basis vectors and construct the metric tensor
gij.
PROBLEMS
201
(iii) Write the line element. (iv) Write the inverse basis vectors and construct
.. 9'3.
(v) Verify the relat,ions between the basis vectors and the inverse basis vectors [Eqs. (3.220) and (3.225)]. (vi) Write the differential operators, gradient, divergence, curl, and Laplacian, in elliptic coordinates. (vii) Write the velocity and acceleration in elliptic coordinates.
15. Given the parametric equation of a surface:
x1 = xl(u,v), 2
2
= 22 ( u , v ) ,
z3 = x 3 ( u , v ) and the surface area element
show that the magnitude of the area element is given as
where 2
E = ( Z )
8x1 8x1 F=--+--+-du dv 2
G=($)
+(g)2+(g), 2
8x2 8x2
du av
8x3 8x3 du dv '
+(g)+(g). 2
2
16. For the area of a surface in terms of the surface coordinates ( u , v ) we wrote
and said that in three dimensions and in Cartesian coordinates S can also be written as
(x1,22,x3),
202
GENERALIZED COORDINATES AND TENSORS
Is S a scalar? If your answer is yes, then scalar with respect t o what? What happened to the third coordinate in the transformation? 17. Use the transformation x+y=u, y = uv to evaluate
18. Evaluate the integral
over the triangular region bounded by the lines x = 0, y = 0 and = 1 by making a suitable coordinate transformation. Is this an orthogonal transformation?
x +y
19. Transform the following integral:
by using the substitution
x
= u,
y=u-v.
Find the new basis vectors
20. Evaluate the surface integral I
=A
( x 2+ y2) d o
over the cone
z between z = 0 and z
=
d
m
= 3.
21. Express the differential operators
a d d ax ’ a y ’ 32
PROBLEMS
203
in spherical coordinates and then show that
d d +isin@-, 84 d0
= icotOcos@-
+ .
where L is the angular momentum operator in quantum mechanics defined as +
L
= -a
(7x 3).
We have set fi = 1.
Hint: Differential operators are meaningful only when they operate on a function. Hence, write
d v x , g, Z) - d v , e,@)dr dx dr dx
-+
d w - ,0 ’ 4 )-80 + do dx
a w , e,@)a4 d@
dx’
etc. and proceed. 22. Quantum mechanical angular momentum is defined as the differential operator +
L
= -2
(7x
3).
Show that
where L, and L y are the Cartesian components of the angular momentum. Also, show that
L2 = L;
+ L; + L: = -21 (L+L- + L-L+) + L;.
23. Show by using two different methods that in spherical coordinates + L =-i(7XV)
=i
(AslrlD
e~---e4-@:
:e)
204
GENERALIZED COORDINATES AND TENSORS
24. In spherical coordinates show that the L2 operator in quantum mechanics, that is,
+ + L;,
L2 = L& L; becomes
Hint: Use the result of Problem 3.21 and construct L2 Verify your result by finding L2 as
+
L~ = L ~ L * L ~ L ~ . 25. Show that the cross product of two vectors,
+
3 x b , is a pseudovector.
26. Show that the triple product of three vectors, doscalar.
3.
27. Prove that
28. Show that the covariant components of velocity are given as
vi = where v 2 = (i') coordinates.
d
6%
(:u2) , i
= 1,2,3,
+ (i2) + (i3) . Justify this result by using spherical
29. Given the following position vectors, find the rectangular components of velocity and acceleration:
30. Write the components of velocity and acceleration in polar coordinates for the two-dimensional position vector defined by (i) (ii)
T T
= a / ( b - sin4), 4 = wt, = a / t , 4 = bt.
PROBLEMS
31. What are the components of velocity and acceleration in spherical ordinates with the position of the particle given as
205 CO-
r = a, 0 = bsinwt, 4 = wt. 32. Find the expressions for the kinetic energy in (i)
cylindrical coordinates,
(ii)
spherical coordinates.
33. Find the expression for the covariant components of velocity and acceleration in (i)
parabolic coordinates (Prob. 3.13),
(ii)
elliptic coordinates (Prob. 3.14).
3.1. Given the expression .2
v2 = Ax2 + Bxx$sin4 + C x 2 4
for the velocity squared in the following generalized coordinates : -
21 = 2 , z2
= $5,
find the covariant components of the velocity and acceleration. 35. Find the unit tangent vector to the intersection of the surfaces Q1(x, y, z ) = x2
+ 3xy - y2 + yz + 2 = 5
and Q2(x,y, z ) = 3x2 - xy
+y
at (1.1,l).
Hint: First find the normals to the surfaces.
2
=3
This Page Intentionally Left Blank
CHAPTER 4
DETERMINANTS AND MATRICES
In many areas of mathematical analysis we encounter systems of ordered sets of elements, which could be sets of numbers, functions, or even equations. Determinants and matrices provide an efficient computational tool for handling such systems. We already used some of the basic properties of matrices and determinants when we discussed coordinate systems and tensors. In this chapter, we give a formal treatment of matrices and determinants and their applications to systems of linear equations.
4.1
BASIC DEFINITIONS
We define a rectangular matrix of dimension m x n as an array
Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
9. Selsuk Bayin 207
208
DETERMINANTS AND MATRICES
where the elements of A could be numbers, functions or even other matrices. An alternate way to write a matrix is
A = a 2.7. . i = l , . . . , m, j = 1 , . . . , n,
(4.2)
where aij is called the i j t h element or the component of A. The first subscript, i , denotes the row number and the second subscript, j , denotes the column number. For the time being, we take aij as real numbers. When the number of rows is equal to the number of columns, m = n, we have a square matrix of order n. We define a row matrix with the dimension 1 x n as
A
=
(
all
'..
a12
aln
)
(4.3)
or simply as
A = ( a1
a2
.
an
(4.4)
).
Similarly, we define m x 1 column matrix as all
A=
a21
or as A
=
am1
4.2
OPERATIONS W I T H MATRICES
The transpose of a matrixis obtained by interchanging its rows and columns and it is denoted by AT or A . The transpose of a matrix of dimension m x n,
A=
is of dimension n x m :
AT
=x=
[
all
a21
...
am 1 'm2
aln
a2n
..'
amn
The transpose of a column matrix is a row matrix,
1
(4.7)
OPERATIONS WITH MATRICES
209
and vice versa:
If the transpose of a square matrix is equal to itself, it is called symmetric. When the transpose of a square matrix is equal to the negative of itself, it is called antisymmetric or skew-symmetric. If only the diagonal elements of a square matrix are nonzero, then it is called diagonal:
(4.10)
The zero or the null matrix, 0, is defined as the square matrix with all of its elements zero:
0 0
..'
0 (4.11)
0 0
...
Identity matrix, I, is defined as the square matrix
(4.12)
= 6.. 2 3 > i = 1, ... ,n, j = 1,. . . ,n.
(4.13)
To indicate the dimension we may also write I,. Addition of matrices is defined only when they have the same dimension. Two matrices, A and B, of the same dimension, m x n, can be added term by term, with their sum being again a matrix of dimension m x n : C=A+B,
+
cij = ~ i j b i j ,
i
=
1,.. . , m, j = 1,. . . ,n.
(4.14) (4.15)
Multiplication of a rectangular matrix with a constant (Y is accomplished by multiplying each element of the matrix with that constant: C = aA, c23. . - a ua3 - ,. i = 1 , . . . ,m, j = 1,. . . ,n.
(4.16) (4.17)
210
DETERMINANTS AND MATRICES
Subtraction of two matrices can be accomplished by multiplying the matrix to be subtracted by -1 and then by adding to the other matrix as A-B
=A
+ (-1)B.
(4.18)
It is easy to verify that for three matrices, A, B , and C, of the same dimension addition satisfies the following properties: A + B =B +A, ( A + B ) + C = A + (B + C ) .
(4.19) (4.20)
Two matrices of the same dimension, A and B, are equal if and only if all their corresponding components are equal, that is, 23 -
b23. . for all i and j .
(4.21)
Multiplication of rectangular matrices with constants satisfy
(4.22) (4.23) (4.24) (4.25) where A and B have the same dimension and a and ,8 are constants. Multiplication of rectangular matrices is defined only when the number of columns of the first matrix matches the number of rows of the second. Two matrices, A of dimension m x n and B of dimension n x p , can be multiplied as
C = AB,
(4.26)
n
a zk . bk3r. i
ca3- .-
=
1, . . . ,m, j
= 1, . . . , p ,
(4.27)
k=l
with the result being a matrix of dimension m x p . In general, matrix multiplication is not commutative,
AB
# BA,
(4.28)
but it is associative:
(AB)C = A(BC).
(4.29)
The distributive property of multiplication with respect t o addition is true, that is,
(A + B)C = AC + BC, A(B + C) = AB + AC.
(4.30) (4.31)
OPERATIONS WITH MATRICES
211
For a given square matrix, A, of order n, we say it is invertible or nonsingular if there exists a matrix B such that
AB
= BA = I.
(4.32)
We call B the inverse of A and write
B = A-l.
(4.33)
If two matrices can be multiplied, their transpose and inverse satisfy the relations
A-i5 BA, =A+B,
(4.34)
=
(A+B)
(4.35)
-
~
aB = aB, N
A-1 =
(4.36)
(A)-',
(4.37)
( A B ) - ~= B - ~ A - ~ ,
(4.38)
+ B)-l = A-' + B-l.
(4.39)
(A
The sum of the main diagonal elements of a square matrix is called the spur or the trace of the matrix: 71
n
(4.40)
+ + . . . + unn. a22
(4.41)
tr(AB) = tr(BA), t r ( A + B) = t r A t rB.
(4.42) (4.43)
= all
Trace satisfies
+
Example 4.1. Operations with matrices: Consider the following 2 x 3 matrix A:
A = ( 21 2
-1
),
(4.44)
which has 2 rows
( 2
1 l ) , ( 1 2
-1)
(4.45)
and 3 columns
(T)- ( k ) ,
('1).
(4.46)
212
DETERMINANTS AND MATRICES
Its transpose is
(; ; )
A -=
(4.47)
1 -1
Given two square matrices
(4.48) we can write the following matrices
A+B=
(
2Ar(
""=(
2+1 -1-1
3+2
2'2 2.(-1)
2.3
3 -2
'-')=(
1 ) 5
')). 6
-2
i).
Now consider the following 3 x 3 matrices:
C=
(t ; 4 2 3
-1
)
'
a n d D = ( 61 ;l2
(4.49)
(4.50)
(4.51)
We can write the products
.fi ; ; )
CD=( 2 3
=(
-1
2-3-0 1-o+o 3-lf0 -1 1 2
p)
;l2
1
(4.52)
4-3-0 2-o+o 6-1+0
2+0-1 1+0+2 3+0+l
1 1 2 3 ) 5 4
and
DC=(:l
('1
fl
(4.53)
(4.54)
i)(.fi 7 ; ) 2
3
-1
(4.55)
4 (4.56)
OPERATIONS WITH MATRICES
Note that DC cations:
(
I:
# CD. We can also write the following matrix multipli-
8 a ) ( I)
=
(
2+0+1
);;;;2;
=
8 !)(!s)=( =(;;).
1+0+1 -1 0 2 1+0+1
+ +
(;l
213
()
,
(4.57)
0+0-1 -0 0 - 2 +0+1-1
+
(4.59)
Note that the dimensions of the multiplied matrices [Eqs. (4.57) and (4.59)] and the product matrices satisfy the following relations, respectively:
(3 x 3)(3 x 1) = (3 x l), (3 x 3)(3 x 2) = (3 x 2).
(4.60) (4.61)
The multiplication
1
0
0
1
(4.62)
is not allowed, since the column number of the first matrix does not match the row number of the second one.
Example 4.2. Multiplication with a diagonal matrix: When a rectangular matrix is multiplied with a diagonal matrix from the right, we obtain
(4.63)
(4.64)
214
DETERMINANTS AND MATRICES
Similarly,
(4.65)
(4.66)
4.3
SUBMATRIX AND PARTITIONED MATRICES
Consider a given (rn x n)-dimensional matrix A. If we delete T rows and s columns, we obtain a [ ( m- r ) x ( n - s)]-dimensional matrix called the submatrix of A. For example, given
all A=
a12
a13 a14
a21 a22 a23
a24
(4.67)
we can write the 3 x 2 submatrix by deleting the 4th row and the 3rd and the 4th columns as (4.68) Even though there is no standard way of writing submatrices, we can write them either by indicating the rows and the columns deleted as
B = A (413,4)
(4.69)
or by indicating the rows and the columns kept as
B = A [1,2,311,2].
(4.70)
For a given m x n matrix, consider the submatrices
All = A [ l , . . , r n l l l , . . . ,1211, A12 = A [ I , .. . , rnlIn1 1 , .. . , n] , A21 = A [rnl 1,. . . ,mil,.. . , nl], A22 = A [ml 1,.. . , 1 , .. . , n],
+ +
+
+
(4.71) (4.72) (4.73) (4.74)
SUBMATRIX AND PARTITIONED MATRICES
215
where ml and nl are integers satisfying 1 < ml < m and 1 < n1 < n. We now define the partitioned matrix in terms of the submatrices Aij as (4.75) Partitioned matrices are very important in applications. If A = Aij is a partitioned matrix, then multiplying A with a constant can be accomplished by multiplying each partition with the same constant. If A and B are two matrices of the same dimension and partitioned the same way, Aij and Bij, a = 1,.. . ,m, j = 1 , . . . , n, they can be added blockwise with the result being another matrix with the same partition, that is,
A + B = A .23. + B i j , i = l , . . . , m, j = 1 , . . . , n.
(4.76)
Two partitioned matrices can be multiplied blockwise, if the multiplied partitions are compatible. For example, we can write
A
(c
B D)(:
:)=(
AE+BG CE+DG
AF+BH CF+DH
)'
(4.77)
if the matrix multiplications AE, BG, AF, . . . exist. Partitioned matrices that can be multiplied are called conformable.
Example 4.3. Partitioned matrices: Given the 4 x 4 matrices 1
0
1
i
i1
A=[! 0 0
1
0
1
Il -1
andB=[
0 l!0
1 0 0
1
1,
(4.78)
the partitions
(4.79)
and
216
DETERMINANTS AND MATRICES
are not conformable. However, the partitioned matrices:
are conformable and can be multiplied blockwise to yield the partitioned matrix
(4.82)
a result that can be verified by direct multiplication of the matrices.
4.4
SYSTEMS OF LINEAR EQUATIONS
In many scientific and engineering problems we encounter systems of linear equations to be solved simultaneously. In general, such systems can be written in terms of n equations and n unknowns, x ~ , x z ., . , x,, as
+ a1222 + . . + Uln2, a2121 + a2222 + . . . + a2,2, a1121
'
= y1, = y2,
(4.83) where
aij
and yi are given constants. Introducing the matrices
GAUSS’S METHOD OF ELIMINATION
217
the above system can be written as
AX = y .
(4.85)
Solution of this system is the 2 1 , 2 2 , . . . ,z, values that satisfies all the n equations simultaneously. To demonstrate the basic method of solution, let us start with a simple system with two equations and two unknowns: a1121 a2121
+ = y1, + a 2 2 2 2 = y2.
(4.86) (4.87)
a1222
Since these are linear equations, using their linear combinations we can write infinitely many equivalent sets that will yield the same solution with the original set. Now, a method of solution appears. We simply try to reduce the above set into an equivalent set with the hope that the new set will yield the solution immediately. We first multiply Equation (4.86) by a21/a1l and then subtract from the second [Eq. (4.87)] to get the equivalent set a1121 a21
(a21 -
-a11)21 all
+
+ a1222 = Ill,
a2 1
(a22
-
(4.88) a2 1
-al2)z2
= y2 - -yl
all
all
’
(4.89)
or a1121
(a22 -
+ a 1 2 2 2 = y1,
a21 -a12152
all
= y2
(4.90) a21
-
-y1. all
(4.91)
Since aij and yi are known numbers, the second equation of the new set [Eq. (4.91)] gives the value of one of the unknowns, x2, directly, while the first equation [Eq. (4.90)] gives the remaining unknown, 21, in terms of 2 2 . Substituting the value of 2 2 obtained from the second equation into the first, we obtain the value of 21, thereby completing the solution.
4.5
GAUSS’S M E T H O D OF ELIMINATION
Consider three equations with three unknowns:
(4.92) (4.93) (4.94) To obtain an equivalent set, we start with the first equation and use it to eliminate 51 from the remaining two equations. To obtain the second equation of the new set, we multiply the first equation by 2 and then subtract from the
218
DETERMINANTSAND MATRICES
second. For the third equation, we multiply the first equation by 3 and then subtract from the third, thus obtaining the following equivalent set:
+ 2x2 +
1, [-2 - 2(1)]23 = 1 2 2 ( l ) , [-3 - 3 ( 1 ) ] ~ = 3 -3 - 3(1),
21
[a
+ +
2(l)] 2 1 [3- 2 ( 2 ) ] 2 2 [3 - 3 ( 1 ) ] ~ l [4 - 3(2)]22 ~
+ +
23 =
(4.95) (4.96) (4.97)
that is, 21
+ 222 +
(4.98) (4.99) (4.100)
1, -x2 - 423 = -1, -222 - 6x3 = -6. 23 =
We now use the second equation of the new set to eliminate 2 2 from the first and the third equations [Eqs. (4.98) and (4.100)]. We first multiply the second equation by 2 and then add to the first equation and then multiply the second equation by 2 and subtract from the third equation to obtain another equivalent set: 21
+ [2
-
2(1)]22
+ [I - 2 ( 4 ) ] ~ 3= 1
-
2(-1)]22
+
2(l),
423 = -1, [-6 - 2 ( - 4 ) ] ~ 3= -6 - 2(-1) -22
[-2
-
-
(4.101) (4.102) (4.103)
or
723 = -1, - 4x3 = -1, 2x3 = -4.
(4.104) (4.105) (4.106)
21 -22
From the last equation the value of 2 3 can be read immediately. For the other two unknowns, 2 1 and 2 2 , the method allows us t o express them in terms of the already determined 2 3 as
(4.107) (4.108) thus yielding the solution as 2 1
= -15,
~2
= 9,
23 =
-2.
(4.109)
It can be checked easily that this solution satisfies all three of the equivalent sets. This method is called the Gauss’s method of elimination, also called the Gauss- Jordan reduction. When the original set of equations are compatible, Gauss’s method of elimination always yields the solution. Furthermore, the algorithm of the Gauss’s method is very convenient to be carried out by computers.
GAUSS’S METHOD OF ELIMINATION
219
Of course, an important issue about a given system of n equations with n unknowns is the existence of solution, that is, the compatibility of the system. Consider the system
+
2x1 3x2 5x1 - 2 2 3x1 2x2
-
6x3 = 1,
+ 2x3 = 2,
+
-
4x3 = 2.
(4.110) (4.111) (4.112)
Proceeding with Gauss’s method of elimination, we use the first equation t o eliminate x1 from the other two t o get the equivalent set
(4.113) (4.114) (4.115) We now use the second equation of the new set to eliminate 2 2 from the other two equations of the new set to write the final reduced set as
14 221 = (4.116) 17’ -17x2 3423 = -1, (4.117) 22 0=-. (4.118) 17 Obviously the last equation is a contradiction, thus indicating that the original set is incompatible, hence does not have a solution. Could we have seen this from the beginning? Before we give the answer, let us go back to the first case where we have two equations and two unknowns [Eqs. (4.86) and (4.87)] and write it as
+
UlX
a22
+ b1y = c1,
+ b2y = c2.
We first solve the first equation for
J:
(4.119) (4.120)
in terms of y as
ci
bi -y,
a1
a1
2 == - -
(4.121)
which, when substituted back into the second equation [Eq. (4.120)] and solved for y gives
(4.122)
220
DETERMINANTS AND MATRICES
Using this in Equation (4.121), the final solution can be written as 2 =
c1bz baa1
-
hc2
-
ash'
c2a1
-
a2c1
(4.123)
(4.124) baa1 - a2b1. In general, the solution exists if the denominator is different from zero, that is, when Y=
baa1 - a 2 h
# 0,
(4.125) (4.126)
Geometrically, this result can be understood by the fact that the two equations, ~ 1 3 : b l y = c1 and a2x b2y = c2, correspond to two straight lines in the xy-plane and the existence of solution implies their intersection at a finite point. When the two ratios are equal,
+
+
(4.127) the two lines have the same slope. Hence they are parallel and they never intersect. We now introduce determinants that play a very important role in the theory of matrices. We first write Equations (4.119) and (4.120) as a matrix equation: (4.128)
AX = y ,
(4.129)
where the matrices A, x, and y are defined as (4.130) We now write the solution [Eqs. (4.123) and (4.124)] in terms of determinants as
where the scalar in the denominator is called the determinant of the coefficients, (4.132) = alba
- azbl,
(4.133)
DETERMINANTS
221
In this regard, solution of the linear system [Eqs. (4.119) and (4.120)] exists if the determinant of the coefficients does not vanish. Before we generalize this result to a system of n equations with n unknowns, we give a formal introduction to determinants. 4.6
DETERMINANTS
Determinants are scalars associated with square matrices. We have already introduced the determinant of a 2 x 2 square matrix (4.134) as
det A
all
=
= a11a22
a12
-
(4.135)
a21a12,
which is also written as D(A). The order of a square matrix is also the order of its determinant. There are several different but equivalent ways to define a determinant. For n = 1, the determinant of a scalar is the scalar itself. For n = 2, we have defined the determinant as in Equation (4.135). In general we define determinants of order n > 1 as n
det A
= ~ ( - l ) l + ” a l det , A(llu),
(4.136)
v=l ,
where detA(1lu) is called the minor of al,, which is the determinant of the submatrix obtained by deleting the 1st row and the uth column of A . When n = 2, Equation (4.136) gives
1 2: 1
= (-1)’+’a11
+ (-1)1+2a12 det A(ll2)
det A(1/1)
a22
= a11a22
-
(4.137) (4.138)
a12a21
and for n = 3, the determinant becomes all
a12
a13
a21
a22
a23
a31
a32
a33
=all
I
+ a13
a22
a23
a32
a33
1
1 1 1’
a21
a22
a31
a33
-
a12
a21
a23
a31
a33
1 (4.139)
In Equation (4.136) we have used the first row to expand the determinant in terms of its minors. However, any row can be used, with the result being
222
DETERMINANTS AND MATRICES
the same for all. In other words, for a square matrix, A, of order n the determinant can be written as n
det A
=
~ ( - l ) z i ” u i , det A(ilv),
(4.140)
v= 1
where i is any one of the numbers i = 1,.. . ,n. From this definition it is seen that the determinant of a square matrix of order n can be expanded in terms of the determinants of its submatrices of order n - 1. This is called the Laplace development. Determinants can also be found by expanding with respect to a column, hence we can also write n
det A
=
(4.141)
x ( - l ) ” + z u , i det A(vli), ,=l
where i is anyone of the integers i = 1,.. . , n. Formal proofs by induction can be found in linear algebra books. The rank of a square matrix is the largest number among the orders of the minors with nonvanishing determinants. Example 4.4. Evaluation of determinants: Given the matrix 0
1
2 (4.142)
let us evaluate the determinant by using the Laplace development with respect to the first row: d e t A = (-l)’+’(O)
1
-1
1
1
1 1
+(-1)1+2(1)
1 1
(4.143) = O(-1)
-
l ( 1 - 1) + 2(l)
(4.144) (4.145)
= 2.
To check we also expand with respect to the third row: detA
=
(-1)3+1(1)
1
1 -1
2
1
+(-1)3+2(0)
1 1 0
2
(4.146) = 1(1+ 2)
=2 and the second column:
+ 1(-1)
(4.147) (4.148)
PROPERTIES OF DETERMINANTS
1 1
d e t A = (-l)’+’(l) 1 1 +(-1)’+’(-1)
223
1 1 0 2
(4.149) = -1(1
-
1) + (-l)(-2)
- 0(-2)
(4.150) (4.151)
= 2.
A convenient way to determine the signs in front of the minors is to use the following rule.
+ - + - + -
+
-
+
-
+
-
+
...
+ ...
... (4.152)
+ ...
Advantage of the Laplace development is that sometimes most of the elements in a row or column may be zero, thus expanding with respect t o that row or column simplifies the calculations significantly. For example, for the matrix
1 2 1 - 1 A = [ ;
;; ;
0 0 0
2
j,
(4.153)
evaluation of the determinant is considerably simplified if we use the third column and then the second column of the 3 x 3 minor as detA=+(l)
1 2 3 1 0 1 0 0 2
(4.154) (4.155)
= -2(2) = -4.
4.7
PROPERTIES
(4.156)
OF D E T E R M I N A N T S
Evaluation of determinants by using the basic definition is practical only for small orders, usually up to 3 or 4. In this regard, the following properties are extremely useful in matrix operations:
224
DETERMINANTS AND MATRICES
Theorem 4.1. The determinant of the transpose of a square matrix is equal to the determinant of the matrix itself. For example, the determinant of the transpose of the matrix A in Equation (4.153) is again -4 :
11
1 1 0 1
1-1
3
1
1 1
detA=detA=
(4.157)
1 2 1
1 1 0 2 0 0 =(+1) 3 1 2
(4.158)
(4.159) =
-4.
(4.160)
Theorem 4.2. If each element of one row or column is multiplied by a constant, then the value of the determinant is also multiplied by the same constant. For example, if 1 0 1
0 1 1 1 =2, 0 2
(4.161)
then =
2 (ClCO)
,
(4.162)
where co and Clare constants. Theorem 4.3. If two rows or columns of a square matrix are proportional, then its determinant is zero. A special case of this theorem says, if two rows or colunins of a square matrix are identical then its determinant is zero. Consider the square matrix
*=\/ 2
4 2
2 1
O1
L
(4.163)
0
where the first two columns are proportional. the determinant is 2
= -2
+ 2 = 0.
0ll
(4.164) (4.165)
PROPERTIES OF DETERMINANTS
225
Theorem 4.4. If two rows or columns of a determinant are interchanged, then the value of the determinant reverses sign. For example, if we interchange the first two rows in the determinant 1 0 1
0 1 1 1 =1, 0 2
(4.166)
we get
0 1 1 0 1 0
1 1 2
=-1.
(4.167)
Theorem 4.5. The value of a determinant is unchanged, if we add to each element of one row or column a constant times the corresponding element of another row or column. For example, consider 1 0 1 A = ( 01 11 01 ) with det A
= 2.
(4.168)
We now add 3 times the first column to the second column: A’=(
1 Of3 1 1 1+3 O ) = ( 0 1+0 1
1 3 1 1 4 0 ) 0 1 1
(4.169)
and add twice the first row of A‘ to its third row to write
3 A”
=
(4.170) 0+2
1+6
1+2
2
7
3
where the determinants of A , A’ and A” are all equal: det A = det A‘= det A”
(4.171)
= 2.
Theorem 4.6. Determinant of the product of two square matrices, A and
B, is equal to the product of their determinants: det(AB) = (det A) (det B) .
(4.172)
Consider two square matrices
1 0 1 1 0 A = ( 10 11 O 1 ) a n d B = ( O 1 -1 1
0
I1 )
(4.173)
226
DETERMINANTS AND MATRICES
detAB
4.8
=
2 -1 1 1 1 0
1 1 2
=
( d e t A ) ( d e t B ) = 4.
(4.174)
CRAMER'S RULE
We now consider the case of three linear equations with three unknowns to be solved simultaneously: (4.175) (4.176) (4.177) Following Gauss's method of elimination, we use the first equation to eliminate 5 1 from the other two equations. We multiply the first equation by a21/a11, a l l # 0, and subtract from the second. Similarly, we multiply the first equation by ~ 3 1 / ~ 1 1all , # 0, and subtract from the third to get the equivalent set
We now use the second equation of the new set to eliminate two equations. We first multiply the second equation with
22
from the other
(4.181)
and subtract from the first, and then we multiply the second equation by
(4.182)
CRAMER’S RULE
227
and subtract from the third to get the final equivalent set:
(4.183)
(a22
-
5 2
a11
+ (a23
-
all
5 3 = (b2
-
Ebi) ,
(4.184)
The last equation [Eq. (4.185)] gives the value of 23 directly, while the first and the second equations [Eqs. (4.183) and (4.184)] give the values of 5 1 and 52 in terms of 53, respectively. Thus completing the solution. We have assumed that the coefficients all and
(
a22
-
1
-a12 a21 all
to be different from zero. In
cases where we encounter zeros in these coefficients, we simply rename the variables and equations so that they are not zeros. We now concentrate on the third equation [Eq. (4.185)], which can be rearranged as 23 =
all(a2263 all(a22a33
-
-
a32b2) - al2(a2lb3
a32a23) - alZ(a21a33
-
+ bi(azia3z a31a22) a31a23) + a13(a21a32 a31a22)
a3lb2)
-
-
-
’
(4.186) Using determinants, this can be written as
x3 =
all
a12
bl
a21
a22
b2
a31
a32
b3
all a21
a12 a22
a13 a23
a31
a32
a33
(4.187)
228
DETERMINANTS AND MATRICES
Values of the remaining two variables, 21 and 2 2 , can be obtained from Equation (4.186) by cyclic permutations of the indices, that is, 1+2+3+1+2-+...
21 =
52
=
1 det A
bi
a12
a13
b2
a22
a23
b3
a32
a33
1 det A
all
b1
a13
a21
b2
a23
a31
b3
a33
-
-
(4.188)
,
(4.191)
,
where det A is the determinant of the coefficients: detA
=
all
a12
a13
a21
a22
a23
a31
a32
a33
(4.192)
Note that for the system to yield a finite solution the determinant of the coefficients, det A , has to be different from zero. Geometrically, the three equations [Eqs. (4.175)-(4.177)] correspond to three planes in the ~ 1 1 ~ 2 space with their normals given as the vectors (4.193) (4.194) (4.195) Thus the condition det A #O is equivalent to requiring that n1, n2,and noncoplanar, that is, n1
. (nz x
n3)
# 0.
n3
be
(4.196)
~ 3 -
CRAMER'S RULE
229
The above procedure can be generalized to n linear equations with n unknowns:
an1x1 + a n 2 2 2
+ ' . + annxn = bn, '
(4.197)
= b,
(4.198)
which can be conveniently written as
Ax where
A=
X = an1
an2
".
,b=
(4.199)
ann
We can also write A in terms of the column matrices
(4.200)
as
A = A(A1, A2,. . . , A n ) .
(4.201)
Cramer's rule says that the solution of the linear system, Ax = b, is unique and is given as
xj =
det A(A1, A 2 , .. . , AjP1,b, A j + l , .. . , A n ) > det A
(4.202)
where
det A =
(4.203)
(4.204)
230
DETERMINANTSAND MATRICES
is the determinant obtained from A by replacing the j t h column with b. The solution exists and is unique when the determinant of the coefficients is different from zero, that is, det A
# 0.
(4.205)
Using determinants the proof is rather straightforward. We substitute Equation Ax = b into Equation (4.202) to write (det A) x j = det A(A1,.. . , Aj-1, (Alzl
+ . . . + Anz,), A j + l , . . . , An). (4.206)
Using Theorem 4.5 we can write this as (det A) z j
, Aj-1, A 1 ~ 1Aj+l , . . . ,A,) + . . . + det A(A1, . . . ,A,-1, Anxn,Aj-I,. . . ,A,),
= det A(A1,.. ,
(4.207)
which also after using Theorem 4.2 becomes (det A) z j
+ .. . + xj det A(A1,. . . , Aj-1, A j , Aj+l, . . . , A,) + . . . (4.208) + 2 , det A(A1,. . . ,Aj-1, A,, A j + l , . . . ,A,).
=5 1
det A(A1,.. . , Aj-1, A l , A j + l , .. . ,A,)
Except the j t h term, all the determinants on the right-hand side have two identical columns, hence by Theorem 4.3 they vanish, leaving only the j t h term giving (det A) x j = x j det A(A1,. . . , Aj-I, Aj, Aj+l,. . . ,A,), (det A) = (det A ) ,
(4.209) (4.210)
thereby proving the theorem. 4.9
INVERSE
OF A M A T R I X
Another approach to solving systems of linear equations, Ax the inverse matrix, A-l, which satisfies
= b, is
by finding
Thc solution can now be found as
A-lAx
= A-lb,
(4.212)
x = A-lb.
(4.213)
We return to Equation (4.186) and rewrite it as
(4.214)
INVERSE OF A MATRIX
231
We immediately identify the terms in the parentheses as determinants of 2 x 2 matrices and write 23
=
1 -
det A [bl
1
a32 a12
a33 a13
1 1 1 1
a23
a21
I 1
a21
a22
a31
a32
-
b2
a12
a13
1 1 1+ 1
a33 a23
a31 a21
1+ 1
a11
a12
a31
a32
+
b3
a22 a32
a23 a33
I] I]
a33
a31 all
I]
a21
a22
Similarly, by cyclic permutations of the indices we write 1 det A
= -[b2
I
and 22
1 det A
[h. 1
=-
Comparing with x
b3
-
-
bl
b1
b2
a13
. (4.215)
(4.216)
. (4.217)
= A-lb,
(4.218) we obtain the components of the inverse matrix as
a;
=
1
a21
a22
a31
a32
det A
1
1
a:;
-
-
-
1
a11
a12
a31
a32
det A
1
,
-1 a33
=
1
all
a12
a21
a22
det A
1
, etc. (4.219)
In short, we can write this result as (4.220) where detA(j1i) is the minor obtained by deleting the j t h row and the i t h column. Notice that the indices in det A(jli) are reversed. Furthermore, this result is also valid for n x n matrices. In general, for a given square matrix A , if det A #0, we say A is nonsingular and a unique inverse exists as (4.221) Usually the signed minors are called cofactors: (cofactorA)ij
=
(-l)z+j det A(i1j).
(4.222)
Hence (4.223)
232
DETERMINANTS AND MATRICES
Another way to write the inverse is
(4.224) where the adjoint matrix, adjA, is defined as the matrix, whose components are the cofactors of the transposed matrix, that is, adjA = cofactorx. Inverse operation satisfies
(A
+ B)-'
+ B-l,
= A-'
(AB)-' = B - ~ A - ~ , 1
(0A-l
= -A-'.
(4.225) (4.226) (4.227)
(Y
Example 4.5. Inverse Matrix: Consider the square matrix 1
A = ( ;
; ;), 0
2
(4.228)
since the determinant is different from zero, det A = - 3, the inverse exists. To find A-' we first write the transpose
1
0
1 (4.229)
2 1 0 and construct the adjoint matrix term by term by finding the cofactors i-2S
-
( I: -1 I!
2
-2
;I)
(4.230)
and write the inverse matrix as
A-l
=
-_
-1
2
-2 (4.231)
HOMOGENEOUS LINEAR EQUATIONS
233
To check we evaluate 2
1
0
2 (4.232)
-1
1
0
0
-3
1
1 1 0 (4.233)
=I =AA-~.
4.10
(4.234)
HOMOGENEOUS LINEAR EQUATIONS
When the right-hand sides of all the equations in a linear system of equations are zero, a1121
+ .. . +
UlnX,
= 0,
(4.235) an121
+ . . + annxn = 0, '
we say the system is homogeneous. When the determinant of the coefficients, det A, is different from zero, there is only the trivial solution:
x1 = x2 = . . . - 2 ,
= 0.
(4.236)
When the determinant of the coefficients vanishes.
(4.237)
I
a,l
an2
. . . ann
I
then there are infinitely many solutions. Example 4.6. Homogeneous equations: Let us consider the homogeneous system
(4.238) (4.239) (4.240)
2 3 5 -1 3 2
-6 2 -4
=o,
(4.241)
234
DETERMINANTS AND MATRICES
We seek a solution using Gauss’s method of elimination. Using the first equation, we eliminate x from the other two equations to obtain the equivalent set
+
3y - 62 = 0, 17 --y 172 = 0, 2 5 --y + 52 = 0. 2
22
(4.242)
+
(4.243) (4.244)
Since the last two equations are identical, this yields the solution x y
= 0,
(4.245) (4.246)
= 22.
For the infinitely many values that z could take this represents an infinite number of solutions to the homogeneous system of linear equations [Eqs. (4.238) - (4.240)]. Whether the equation is homogeneous or not Gauss’s method of elimination can always be used with m equations and n unknowns. In general, the method will yield all the solutions. In cases where there are no solutions, the elimination process will lead to a contradiction. In case we obtain 0 = 0 for an equation, we disregard it. In cases where there are fewer equations than unknowns, n > m, the system cannot have a unique solution. In this case, m of the variables can be solved in terms of the remaining n - m variables. For matrices larger than 4 x 4 the method becomes too cumbersome to follow. However, using the variations of Gauss’s method, there exists package programs written in various computer languages. Also programs like Mathematica, Maple, etc. , include matrix diagonalization and inverse matrix calculation routines. PROBLEMS
1. Given A, B, and C as
A = ( -1 1
21 0 3 ) , B = ( 24
+=(
1 -1
1 2
I0 ) ,
1 -1
write the following matrices: (i) A + B , B
(ii) A,
+ A, A + 2B, B-2A,
-B
-
z,c,A+ B , E ,( A A + B ) .
(iii) AB,BA, AA, AB, CA, AC
C.
PROBLEMS
235
(iv) Write the row and the column matrices for A, B, and C 2. If A and B are two m x n matrices and
-
(A+oB) = A
cy
is a constant. show that
+ aB.
3. For a symmetric matrix show that
A
= A.
Also show that
A+A is symmetric.
4. Using the index notation, prove that
A(BC) = (AB)C. 5. Calculate the product AB for
(9 A = ( - '4
1' ) , B = ( ;
a
i).
(ii)
1 2 1 1 2 1 - 1 1 A = ( I 1 O),B=(2 1 1 1 2 ) 2 0 1 1 0 1 - 1 1 6. Use Gauss's method of elimination t o solve the following linear systems:
(9 3x1 5x1
+ 2x2 = 1, + 6x2 = 2.
(ii)
5x1 - 2x2 21
= 3,
+ 2 2 = 2.
7. For a square matrix, A, prove the following: (i) A2 - I
=
(A
+ I)(A
-
I),
236
DETERMINANTS AND MATRICES
(A - I ) ( A 2 + A +I), (iii) A2 - 4A+3I = (A-31)(A - I). (ii) A3 - I
=
8. Solve the following system of linear equations by using Gauss’s method of elimination:
+
32 + 2 y 42 = 2, 22 - y + 2 = 1, 2 2y 32 = 0.
+ +
Check your answer via the Cramer’s rule. 9. If
2 A=( 4 3 0 -
1 0 2 1 1 2 1 1
1 2 1 2
1 0 - 3 3 0
1
write
10. Using the following partitioned matrices, find AB and A
where
I ? = ( 01 0l ) , A l = 2 1 B l = ( ; B2=( ~ ) , B ~ = 2 , 1 ~ = 1 . 11. Evaluate the determinants (i)
(ii)
1 0 0 1
2 3 0 2
0 4 5 3
0 0 6 ’ 4
i)’
+ B[1,2\1,2]
PROBLEMS
237
(iii)
0 2 0 - 1 2
1 0 1 2 1 0 - 1 2 0 1 0 2 . 1 1 2 1 1 0 1 0
12. Find the ranks of the following matrices:
(9
(ii)
0
1 0 1 2 1 0 - 1 2
y;
;1+
13. Find the values for which the following determinant is singular:
14. Solve
and
+ 2x2 + 253 = 2, 2 1 + 5 2 + 323 = 2, z 1 x2 + Lc1 = 1
221
-
by using Gauss’s method of elimination. solution by finding A-l.
For each case, verify your
15. First check that the inverse exists and then find the inverses of the following matrices. Verify your answers.
238
DETERMINANTS AND MATRICES
(i)
(ii)
A = ( O1 2
01 ) .
1 1 -1 (iii)
61. 0
2
9
2
3 0 4 1 2 - 1 0 2
'
16. Simplify (i) [ ( A B ) p ' A - l ] - l , (ii) ( A B C ) - ~( c - ~ B - I A - I ) - ', (iii) [(AB)TA T I T .
17. If A and B are two nonsingular matrices with the property AB then show that
18. Show that ( ~ ~ 1 =- ~ l - 1 ~ - 1
19. If ABC
= I,
then show that
BCA = CAB 20. Find the inverses of the matrices
= I.
= BA,
PROBLEMS
A=(;
0 2
239
42 ) 2
3 -1 and
Check the relation ( A B ) - l
= B-lA-l
for these matrices.
21. Given the linear system of equations: 2x1
-
+ 2 3 = 2,
2x2
+
51 2x2 - 5 3 = 1, 5x1 - 5x2 423 = 4,
+
(i) Solve by using Gauss’s method of elimination. (ii) Solve by using Cramer’s rule.
(iii) Solve by finding the inverse matrix of the matrix of the coefficients.
22. For the homogeneous system 21 - 2 2
21
+ 2x2
+ + x4 = 0, 23
-
23
-
x4
= 0,
+
3x1 - x2 - x3 2x4 = 0, X I 3x2 x3 - 2x4 = 0,
+
+
show that the determinant of the coefficients is zero and find the solution.
23. Show the determinant
24. Evaluate the determinant - 1 0 0 1
1 3 2 2
2 0 2 0 1 1 0 1
240
DETERMINANTS AND MATRICES
by using Laplace construction with respect to a row and a column of your choice.
25. Solve the following system of linear equations by a method of your choice:
(9 3x1 x1
+ 2 2 2 3 = 1, + 5 2 + x3 = 1, -
22
- 2 3 = 0.
(ii)
4x1 - 2x2 + 2x3 = 1,
+
XI 3x2 - 2x3 = 1, 4x1 - 3x2 +x3 = 1.
26. Write the inverse of a diagonal matrix. 27. Solve the following system of linear equations by using Cramer’s rule and interpret your results geometrically:
(9 3x1 - 2x2
+ 2x3 = 10,
+ 2x2 3x3 = -1, 4x1 + 2 2 + 2x3 = 3.
21
-
(ii)
(iii)
2x1
+ 5x2
- 3x3 = 3, 2x2 + 2 3 = 2, 4x2 - 3x3 = 12.
21 -
7x1
+
3x1 - 2x2 + 2x3 = 0, 21 2x2 - 3x3 = 0, 4x1 5 2 2x3 = 0.
+ + +
CHAPTER 5
LINEAR ALGEBRA
Discussion of vectors usually starts with their geometric definition as directed line segments. Introduction of coordinate systems allows us to extend the concept of vectors to a much wider class of objects called the tensors, which are defined with respect to their transformation properties. Vectors now belong to a special class of tensors called the first-rank tensors. It is well known that in n dimensions a given vector can be written as the linear combination of n linearly independent basis vectors. Linear algebra is basically the branch of mathematics that uses the concept of linear combination to extend the vector concept to a much broader class of objects. 5.1
FIELDS A N D VECTOR SPACES
We start with the basic definitions used throughout this chapter. As usual, a collection of objects is called a set. The set of all real numbers is denoted by R and the set of all complex numbers by @. The set of all n-tuples of real numbers,
x = (z1,22,.. . , zn), Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
(5.1)
5. SelSuk Bayin 241
242
LINEAR ALGEBRA
is shown by Rn.Similarly, the set of n-tuples of complex numbers is shown by C7'. An essential part of linear algebra consists of the definitions of field and vector space, which is also called the linear space. Definition 5.1. A field K is a set of objects that satisfy the following conditions: (I) If a , p are elements of K , then their sum, a+P, and their multiplication, ap, are also elements of K . (11) If a is an element of K , then -a such that a ( - a ) = 0 is also an element of K . Furthermore, if a # 0, then a - l , where a(.-') = 1 is also an element of K . (111) The elements 0 and 1 are also elements of K . The set of all real numbers, R, and the set of all coniplex numbers, @, are fields. Since the condition I1 is not satisfied, the set of all integers is not a field. Notice that a field is essentially a set of objects, elements of which can be added and multiplied according to the ordinary rules of arithmetic and that can be divided by nonzero elements. We use K t o denote any field. Elements of K will be called numbers or scalars. We now introduce the concept of linear space or as usually called the vector space. A vector space is defined in conjunction with a field and its elements are called vectors. Definition 5.2. A vector space, V, over a field, K , is a set of objects that can be added with the result being an element of V and the product of an element of V with an element of K is again an element of V. A vector space also satisfies the following properties: (1) Given any three elements, u , u and w, of V, addition is associative:
+
u
+ (v + w)= (u+ v) + w.
(5.2)
(11) There is a unique element, 0, of V, such that
o+u=u+o=u,
U € V .
(5.3)
(111) Let u be an element of V, then there exists a unique vector, -u,in V such that
u+ (-u)= 0.
(5.4)
(IV) For any two elements, u and v,of V, addition is commutative:
u + v = w +u.
(5.5)
(V) If a is a number belonging to the field K , then a(u+v)=au+av, u,vEV.
(5.6)
(VI) If a,P are two numbers in K , then we have
( a + P). = av
+ pv, 21 E v.
(5.7)
FIELDS AND VECTOR SPACES
(VII) For any two numbers
Q
243
and p in K , we have
(Qp)v = a(Dv),v E
v.
(5.8)
(VIII) For all elements, u,of V , we have
1.u = u,
(5.9)
where 1 is the number one. As we shall demonstrate in the following examples, use of the word vector for the elements of a vector space is largely a matter of convenience. These are essentially linear spaces, where their elements can be many things like ordinary vectors , matrices, tensors, functions, etc.
Example 5.1. The n.-tuple space X n : For a given field K , Let X" be the set of all n-tuples 17: = (x1,22,.
. . ,Z n ) ,
(5.10)
where xi are numbers in K . If
Y = ( Y l , y2,. . . , Y") is another element of X " with the sum
x + Y = (21 + Y l r Z 2
II:
(5.11)
+y :
+ Y 2 , . . . ,xn + Yn)
(5.12)
and the product ax is defined as
. ,
Q X = ( Q Z ~ , C Y I C .~ , QII:"), . Q E
K,
(5.13)
then it can easily be checked that X" is a vector space.
Example 5.2. The space of m x n matrices: The sum of two m x n matrices, A and B, is defined as A + B = a i j + b i j , i = l , . . . , m, j = 1 , . . . , n, while their multiplication with a scalar
Q
(5.14)
is defined as
a A = a a i j , i = l , . . . , m , j = l , . . . , n.
(5.15)
It is again obvious that the eight conditions in the definition of a vector field are satisfied. In this sense the set of m x n matrices defined over some field K is a vector space and its elements are vectors.
Example 5.3. Functions as vectors: Consider the set V of all continuous functions of R into R. It is obvious that V is a vector space. A subset of V is the set V' of all differentiable functions of R into R.If f and g are two differentiable functions, then f g is also differentiable. If Q is a scalar in K , then ~f is also a differentiable function. Since the zero function is differentiable, V' also forms a vector space. We say V' is a subspace of V.Similarly, if a subset K' of K also satisfies the conditions to be a field, it is a called a subfield.
+
244
5.2
LINEAR ALGEBRA
LINEAR COMBINATIONS, GENERATORS, A N D BASES
Let v l , 212,.. . , vTLbe the elements of a vector space V defined over the field K and let 0 1 , a 2 , . . . , a , be numbers. An expression like a1v1
+
a202
+ . . . + a,v,
(5.16)
is called a linear combination of v1,212,. . . , v,. As can be shown easily, the set, V’, of all linear combinations is a subspace of V and the vectors (5.17)
V I ~ V ~ , . 1.%.
are called the generators of V’.If V = V’, we say ~ 1 , 2 1 2 ., . . , w, generate V over K . Consider a vector space, V, defined over the field K . If v 1 , w 2 , . . . , v, are elements of V, then w1,v2,.. . , v, are said t o be linearly dependent over K ; if there exist numbers, a l , 0 2 , . . . , a,, in K , not all of them are zero, such that
a l % + a2212
+ . . . + a,v,
= 0.
(5.18)
If such a set of numbers, a 1 , a 2 , . . . ,a,, cannot be found, then we say the vectors 211, u2,. . . , v, are linearly independent. Example 5.4. Linear independence: Consider the vector space V and the vectors
=
Rn
el = (I,(),. . . , O ) , e2
= (0,1,. . . , O ) ,
(5.19) e, = (1,0,.. . ,I),
Since we cannot have ale1
+ a2e2 + . . . + a,e,
=0
(5.20)
unless all ai are zero, e l , e2, . . . , e, are linearly independent. Example 5.5. Linear independence of functions: Consider the functions cost and sint, where t is real. For these functions t o be linearly independent, it should be impossible to find two real numbers, a and b, both nonzero, such that the equation acost
+ bsint = 0
(5.21)
is satisfied for all t. Differentiating the above equation gives another equation, -asint
+ bcost = 0,
(5.22)
LINEAR COMBINATIONS, GENERATORS,AND BASES
245
which, when combined with the original equation, gives u2 = - b .2
(5.23)
Naturally, this cannot be satisfied in the real domain unless
u=b=O.
(5.24)
Hence, cos t and sin t are linearly independent. In general, let V be the vector space of all functions of the real variable t ; then the linear independence of any given number of functions,
can be checked by showing that the equation
cannot be satisfied unless all cri are zero. Consider an arbitrary vector space, V, defined over the field K . Let w1 ,212, . . . , v, be linearly independent vectors in V . Suppose that we have two linear combinations that are equal: QlUl
+ a2v2 +
'
' . + a,v,
=PlVl
+
p2w2
+ . . + &u,, '
(5.27)
then Qi
=pz,
i = 1, . . . ,n.
(5.28)
The proof is simple: We write Equation (5.27) as (a1 - P l ) V l
+
( 0 2 - P2)w2
+ . + (a, ' '
-
Pn)wn = 0.
(5.29)
Since ~1,212,. . . , w, are linearly independent, the only way to satisfy the above equation is by having
a Z = p Z , i = l , . . . ,n.
(5.30)
We now define basis vectors of V over K as the set of elements { w l , v2,.. . , v,,1 of V, which generate V and that are linearly independent. In this regard, the set of vectors {el, e2,. . . ,e,} defined in Equation (5.19), forms a basis for IWrL over R.In general, a basis for V is a set of linearly independent vectors, ( ~ 1 , 2 1 2 , .. . ,w,} , that spans V. In other words, every element v of V can be written as a linear combination of the basis vectors {q, v2,. . . ,v,} as
v
=~
1
+ ~a22121 + . . . + a,~,,
ayi E K .
(5.31)
A vector space is finite-dimensional if it has finite basis, and the dimension is defined as the number of the elements in its basis {q,v2,.. . ,w,} . In a
246
LINEAR ALGEBRA
vector space V, if one set of basis has n elements and another basis has m elements, then n = m. For a given vector space, V, over the field K , if U and W are subspaces such that
U+W=V
(5.32)
and
U nW
=
{o},
(5.33)
that is, the intersection of U and W is the null set, then V is the direct sum of U with W , which is written as
V=U$W.
(5.34)
Dimension of V is equal to the sum of the dimensions of U and W. Formal proofs of these results can be found in books on linear algebra.
5.3 COMPONENTS One of the advantages of introducing basis vectors, {vl, v2,.. . ,vn}, in ndimensional space V is that we can define coordinates or components analogous to the coordinates of vectors in Cartesian space. Components of a vector, w, in V are the scalars, a1, a2,. . . ,an, which are used to express w as a linear combination of the basis vectors as 'u
=QlU1
+ a2v2 + . . . +
1aiva.
QnUn
(5.35)
n
=
(5.36)
i=l
For a given vector, v, and the basis, B = {v1,v2,... ,urn},the scalars a{, = 1,.. . , n, are unique. A given vector v can be conveniently represented as the column matrix
i
V =
(5.37)
To indicate the matrix representation we use boldface character, that is, essentially w and v represent t,he same element in V . In the matrix representation order of the numbers ai follow the order in which the basis vectors are written. In a given n-dimensional space V, let
COMPONENTS
247
and
B’={?I;,,vi} .;,...
(5.39)
be two sets of bases. Then there exists unique scalars S i j , i = 1 , 2 , . . . , n, j = 1 , 2 , .. . , n, such that
c n
v; =
(5.40)
szjvj.
j=1
Let a:, i
= 1,.. .
, n , be the components in terms of B’ of a given u in V : ‘
I
21 = a1v1 n
=
+ a;.; + . . . + a;.:,
(5.41)
CLy:.;.
(5.42)
i=l
Using Equation (5.40); we can write Equation (5.42) as (5.43) n
n
(5.44)
(5.45) Since in the unprimed basis v is written as n
v =cajuj,
(5.46)
j=1
we obtain the relation between the components found with respect to the primed and the unprimed bases, B‘ and B, as (5.47) i=l
Representing S t j ; i = 1 , 2 , . . . , n, j = 1 , 2 , .. . , n, in Equation (5.40) as an n x 11 matrix,
S=
,
(5.48)
248
LINEAR ALGEBRA
we can write Equation (5.47) as
-
v =Sv‘,
(5.49)
where v and v’ are the column matrices
(5.50)
and S is the transpose of S. We now write Equation (5.47) explicitly as
+ Slza: +
SllQ’l
s l , C u :
s 2 1 4 s22a;
+ . . + Snla:, = + . . . + Sn2Q:, = ‘
Q1,
Q2,
+ s 2 , a ; + . . . + snna:,= a,.
(5.51)
When v = 0, that is,
the above system of linear equations [Eq. (5.51)] is homogeneous, hence the only solution that it has is the trivial solution:
v‘ = 0
(5.53)
or
s
This means that the determinant of is different from zero. Remembering that for a square matrix the determinant of S is equal t o the determinant of 3, we conclude that both S and 3 are invertible, hence we can write
-
(5.55)
v’=S - v.
We remind the reader that v‘ and v are essentially the same vector in V. They are just different representations of the same vector in terms of the primed, B’, and the unprimed, B, bases, respectively [Eqs. (5.42) and (5.46)]. Note that if we prefer to write Equation (5.40) as = Sjivj,Equation (5.49) becomes v = Sv’.
c,”=,
249
LINEAR TRANSFORMATIONS
5.4
LINEAR TRANSFORMATIONS
Linear transformations,which are also called linear operators, help us to relate the elements of two vector spaces defined over the same field. A linear transformation, T, is defined as a function that transforms the elements of one vector space, V, into the elements of another, W , which are both defined over the field K and such that
+ Pv) = aT(u)+ P T ( V )
T(au
for all u , v in V and for all scalars
(5.56)
a , P in K .
Example 5.6. Space of polynomials: Consider the differential operator d T = - acting on the space, V, of polynomials. Given a polynomial
dx
f ( x ) of order n :
f ( x )= a0
+ a15 + a252 + . . . + anxn,
(5.57)
the action of T on f ( x ) is to produce another polynomial of order n - 1 as d -f(x)
dx
= a1
+ 2a2x + . . . + na"xn-1.
(5.58)
It is easy t o check that T is a linear transformation from V into V
Example 5.7. Linear transformations: Consider the matrices all
a12
A=[ am1
am2
...
'.:
'i'
) , x =
' . . amn
with elements in the field K . The function T defined by the equation
T ( z )= AX
(5.60)
is a linear transformation from the (nx 1)-tuple space K" to the ( mx 1)tuple space K".
Example 5.8. Linear Transformations: Let V be the space of continuous functions defined over the field of real numbers R.The transformation defined as rx
(5.61) is a linear transformation from V into V , where f ( x ) is an element of V. Linearity of the transformation follows from the properties of the Riemann integral.
250
LINEAR ALGEBRA
An important property of linear transformations is that they preserve linear combinations, that is, if u1, u2,. . . , u, are vectors in V, then
We now state an important theorem: Theorem 5.1. In a given finite-dimensional vector space, V, defined over the field K , let the basis vectors be {Q, v2,.. . , v,}. In another vector space, W , defined over the same field, K , let { w l ,w 2 , . . . , w,} be any set of vectors. Then, there is precisely one linear transformation, T , from V into W such that T ( v i )= W 2, -
i = 1, . . . , n .
(5.63)
The proof of this theorem can be found in Hoffman and Kunze. Its importance is in formally stating the central role that linear transformations play among many possible transformations from V into W.
5.5 M A TRIX REP RESENT A T I0N 0F TRANS FOR M A TI0NS Consider an n-dimensional vector space, V, over the field K with the basis Bv = ( ~ 1 , 2 1 2 , .. . , v,}. Similarly, let W be an m-dimensional vector space over the same field K with the basis Bw = { W I , w2,. . . , w,}. If T is a given linear transformation from V into W over the field K , then the effect of T on a vector of V will be to convert it into a vector in W. In other words, the transformed vector, T v , can be expressed uniquely as a linear combination of the basis vectors of W. Similarly, each one of the transformed basis vectors, T v i , can be uniquely expressed as the linear combination ...
T v= ~
C
Aijwi
(5.64)
i=l
in terms of the basis vectors { w l ,w2,. . . , w,} , where the scalars Aij correspond to the components of Tvj in the W space in terms of the basis Bw. In other words, the m x n matrix, A i j , i = 1,.. . ,m, j = I , . . . , n, uniquely defines the effect of T on the vectors of V.Hence it is called the transformation matrix of T with respect to the bases Bv and Bw. Consider an arbitrary vector PI in V, which we write in terms of the basis Bv as 'u
= QlVl
c
+
Q2V2
+ . . . + a,v,
(5.65)
n.
=
ajvj.
j=1
(5.66)
MATRIX REPRESENTATION OF TRANSFORMATIONS
251
If we act on v by T , we obtain n
T~ =
T C ~ ~ U ~
(5.67)
j=1
n
=
Caj(Tvj).
(5.68)
j=l
Using Equation (5.64), we write this as n
m
(5.69)
(5.70) In other words, n
pi
=CAijaj
(5.71)
j=l
are the components, pi, i = 1 , 2 , . . . m , of T v in terms of the basis Bw of the space W. Using the matrix representations of v, w, and A:
(5.72)
(5.73)
we can write Equation (5.71) as w = Av.
(5.74)
We now summarize this result formally as a theorem: Theorem 5.2. Let V be an n-dimensional vector space over the field K with the basis vectors Bv = { v l ,v2, . . . , vn} and let W be an m-dimensional vector space over the same field with the basis vectors Bw = { w ~~ ,2 , ... ,w,}. For each linear transformation, T , from V into W, we can write an m x n matrix, A, with the entries in the field K such that
[TvIBw = Av = A [ u ] B ~ , ,
(5.75)
252
LINEAR ALGEBRA
where v is any element in V . Furthermore, the transformation from T + A is a one-to-one correspondence between the set of all linear transformations from V into W and the set of all rn x n matrices over the field K . We call the matrix A the transformation matrix of T with respect t o the bases Bv and Bw . To make this dependence explicit, we write the matrix that represents T as
We have written the subscripts as B w B v , since the row dimension is determined by W and the column dimension is determined by V.
5.6
ALGEBRA OF T R A N S F O R M A T I O N S
Let T and U be two linear transformations from V into W , where V is a vector space of dimension n and W is a vector space of dimension m, both of which are defined over the field K . Let A and B be the transformation matrices of T and U with respect to the bases Bv and B w , respectively. If P is the transformation written as the linear combination
P
=c
+
(5.77)
~ Tc ~ U ,
where c1 and c2 are scalars in field K , the transformation matrix of P with respect to the bases Bv and Bw is the same linear combination of the transformation matrices of T and U with respect to the same bases: [PIBwBv = 1 ' [TIBwBv
+ 2'
['IBwBv
.
(5.78)
For the proof we first write rn
(5.79) i=l
m
(5.80) i=l
where vj belongs to the basis Bv = {w1,v2,. . . , v,} and wj belongs to Bw {wl, w2,. . . , w,}. We then write Pvj = ( C l T
+ ~ 2 U ) v =j cl(Tvj) + ~ z ( U v j )
=
(5.81) (5.82)
m
(5.83)
ALGEBRA OF TRANSFORMATIONS
253
thus obtaining
+
[P]Bv.BL = CIA%?c ~ B , , ,i = I,.. . ,VL,j = 1,.. . ,TL.
(5.84)
Let us now consider the product of transformations, given three finitediniensional vector spaces, V ,W , and 2,defined over the field K. Let T be a linear transformation from V into W and let U be a linear transformation from W into 2.Also let Bv = ( u l , u 2 , . . . , u n } , Bw = (w1, w2,. . . , w n L } ,and BZ = ( ~ 1 ~ ~ . 2. ,, z .P } be the bases for the spaces V,W , and 2,respectively. We can write ( U T ) ( V j )= U ( T ? / j ) =
U
C
(5.85) (5.86)
Akjwk
m
(5.87) k=l m
/ P
k=l
\i=l
(5.88) w
/ m.
/ \
(5.89) where A k 3 , k = 1, . . . , m, j = 1, . . . , n, is the transformation matrix of T with respect to the bases Bv and B w , and B l . k , a = 1 , .. . , p , k = 1 , . . . , m, is the transformation matrix of U with respect to the bases Bw and B z . In other words, the matrix of the product transformation, U T , [ U T I B ~ B= ~ ['IBzBw
[TIBwBv
1
(5.90)
is the product of the transformation matrices of U and T. When T and U are two linear transformations in the same space, V , with respect to the basis Bv,we write
[UTIBv = [UlBv [TIBV '
(5.91)
The inverse transformation is defined as
[T-'TIBv
=
[TT-'IBv = I.
(5.92)
Using equation (5.90) we can conclude that (5.93) (5.94)
254
LINEAR ALGEBRA
That, is, the matrix of the inverse transformation is the inverse matrix of the matrix of the transformation. The one-to-one correspondence that Bv establishes between transformations and matrices guarantees that a linear transformation is invertible, if and only if the transformation matrix is invertible, that is, nonsingular:
5.7
CHANGE OF BASIS
We have seen that the matrix representation of a given transformation depends on thc bases used. Since there are infinitely many possible bases for a given n-dimensional vector space, we would like to find how the matrix elements change under a change of bases. Let T be a given linear transformation or an operator in n-dimensional vector space V and let Bv = {ul,vz,. . . ,vn} and BC = {ui,ui,. . . , u h } be two possible sets of bases for V. We need a relation between the matrices
[TIBVand [TIB;
(5.96)
'
Let S be the unique n x n matrix, Si3, i = 1 , 2 , . . . , n, j = 1 , 2 , . . . , n, relating the coinponents of a vector u in V in terms of the two bases as
bIBv = s 1 4 3 : ,
(5.97)
where Q1
Qn
By definition, for the linear transformation T we can write PIBv =
(5.99)
[TIBV[ 4 B V.
Applying Equation (5.97) to the vector Tv,we obtain [T~lBv = s [7%;
(5.100)
,
which, when substituted in Equation (5.99), gives [TIBv I.[ Bv
=
s [Tvl B; .
(5.101)
We now substitute Equation (5.97) into the left-hand side of the above equation to write [TIB"
s b1B;
=
s [T7JI,;
.
(5.102)
CHANGE OF BASIS
255
Multiplying both sides by the inverse S-l, we obtain
S-l
s [ u ] B ; = S-lS [ T u ] B ; ,
[T]BV
which we write as [TulB(,
'>
('-'
=
[TIB~
['IB;
'
(5.103)
(5.104)
Comparing with the corresponding equation [Eq. (5.99)] written in the BC basis, = fT1B; ['IB;
iTu1B;
,
(5.105)
we obtain the expression for the transformation matrix of T in the BL basis, [TIB;,in terms of [TIBVas
[TIB;= s-l [TIBVs.
(5.106)
Equation (5.97) means that the components of u in the bases Bv and BL are related as n
Qi =
csija;. j=1
In terms of the basis B v , u is written as u = we can write it in the basis BL as
(5.107)
ciaiui. Using Equation (5.107)
In other words, the transpose of the n x n matrix S relates the two sets of basis vectors as
c n
sijua = u;.
(5.109)
i=l
If two n x n matrices, A and B , defined over a field K , can be related by an invertible n x n matrix S over K as
B = S-lAS,
(5.110)
we say that B is similar to A over K . In conjunction with our result in Equation (5.106), when we say [TIB; is similar t o [TIBVmeans that on each n-dimensional vector space V over K , the two matrices, [TIBVand [TIB;, represent the same transformation or operator in terms of two different bases, Bv and BG, defined in V. Note that if B is similar to A, then A is similar to B through
A = SBS1.
(5.111)
256
5.8
LINEAR ALGEBRA
INVARIANTS UNDER SIMILARITY TRANSFORMATIONS
Under similarity transformations, the determinant and the trace of a matrix remains unchanged. If A and B are two similar matrices, A = SBS-l, then their determinants are equal. The proof follows from the properties of determinants: det A = det (SBS-I) = detSdetBdetS-l.
Since det S-l
=
(5.112) (5.113)
1/ det S, we get
det A = det B.
(5.114)
If A and B are two similar matrices, then their traces are equal: n.
trB =
(5.115)
[Ellii i=l n
=
C [S-'AS] i=l n
n
zz ..
(5.116)
n
(5.117)
(5.118) (5.119) j=1 k = l n
n
(5.120) j=1 k = l
n
(5.121) k=l
5.9
EIGENVALUES A N D EIGENVECTORS
We have seen that matrix representations of linear operators depend on the bases used. Since we have also established the fact that two different representations of a given operator are related by a similarity transformation, we are ready to search for a basis that presents the greatest advantage. Since diagonal matrices are the simplest matrices, whose rank and determinant can be read at a glance, our aim is to find a basis in which a given linear operator, T , is represented by a diagonal matrix. The question that needs to be
EIGENVALUES AND EIGENVECTORS
257
answered is, Can we always represent a linear operator by a diagonal matrix? If the answer is yes, then how do we find such a basis? Since in a given basis, B = ( u 1 , v2,. . . , en}, the matrix representation of T is diagonal:
(5.122)
if and only if n
n
j=1
j=1
(5.123) we start by searching vectors u that are sent to multiples of themselves by T . Definition 5.3. Let T be a linear operator on a vector space V defined over the field K . An eigenvalue, also called the characteristic value, is a scalar, A, such that there is a nonzero vector v in V with
T v = Xu,
(5.124)
where v is called the eigenvector of T corresponding t o the eigenvalue A. In an n-dimensional vector space a linear operator is represented by an n x n matrix, A, hence we concentrate on determining the eigenvalues and the eigenvectors of a given square matrix. We can write the eigenvalue equation (5.124) as the matrix equation
(A - AI)v = 0,
(5.125)
where A = A i j , i, j = 1, . . . , n is an n x n matrix. Eigenvector v, corresponding t o the eigenvalue A, is represented by the n x 1 column matrix
V =
(I:),
(5.126)
an where ai are the components. The n x n identity matrix is written as I. When Equation (5.125) is written explicitly, we obtain a linear system of homogeneous equations:
(5.127)
258
LINEAR ALGEBRA
IA - XI1
=
(All - A )
A12
...
A21
(A22 - A)
...
Al, A27l
An1
An2
...
(Ann. - A)
S-lAS = B,
= 0.
(5.128)
(5.130)
we can write d e t ( B - vI) = det(S-lAS
vI)
(5.131)
S-lvIS)
(5.132)
v1)S)
(5.133)
d e t ( A - vI) det S
(5.134)
-
= det(SplAS = det(S-'(A = det S-'
-
= d e t ( A - vI),
(5.135)
thereby proving that A and B have the same characteristic equation, thus the same eigenvalues. We now write the matrix equation [Eq. (5.129)] as (5.136) where a j k is the j t h component of the kth eigenvector. Consider the n x n matrix, s, columns of which are the eigenvectors, v k , k = 1 , 2 , . . . , n, as (5.137)
EIGENVALUES AND EIGENVECTORS
259
where
(5.138)
If the eigenvectors are linearly independent, then det S #O.
(5.139)
Hence the inverse of S exists. We can now write Equation (5.136) as n
(5.140) j=1
(5.141) =cSiiDlk:, i , k = l , ... ,n,
(5.142)
z=1
where we have introduced the diagonal matrix
Dlk
as
(5.143)
We can also write Equation (5.142) as
AS = SD,
(5.144)
which, after multiplying both sides by the inverse S-l, becomes
S-IAS
S-lSD = D.
=
(5.145) (5.146)
In othcr words, the matrix S, columns of which are the eigenvectors of A , diagorializes A by the similarity transformation
S I A S = D.
(5.147)
We now express these results formally in terms of the following theorems: Theorem 5.4. Let V be an n-dimensional vector space defined over the field K and let A be a linear operator or transformation from V into V.
260
LINEAR ALGEBRA
Let 211, u 2 , . . . ,v, be the eigerivectors of A corresponding to the eigenvalues X I , XI,. . . ,A, respectively. If the eigenvalues are distinct,
Xi
# Xj,
when i
#j,
(5.148)
then the corresponding eigenvectors, v1, v2, . . . , v,, are linearly independent and span V. Theorem 5.5. Let V be an n-dimensional vector space over the field K and let A be a linear operator from V into V. Assume that there exists a basis B = {vI,v2,. . . , v,} that spans V consisting of the eigenvectors of A with the eigenvalues A1, XI,. . . , A,, respectively, then the matrix of A with respect to this basis is the diagonal matrix
(5.149)
Example 5.9. Characteristic equation and the eigenvalues: Consider the matrix
;1) ,
A = ( ; 1; 0
(5.150)
l1
(1-A)
det(A-XI)
=
0
0 (1- A ) 0
1 0
(5.151)
(1- A)
Thus the roots of the characteristic equation gives three distinct real roots: A1 = 0,
A2 =
1,
A3
= 2.
(5.156)
Example 5.10. Characteristic equation and the eigenvalues: Consider
A = ( -2
') ' 0
(5.157)
261
EIGENVALUES AND EIGENVECTORS
where the characteristic equation is (5.158)
(1-A)
fi
det(A-XI) =
0
Jz ( 0 - A) 0
0 0 (0-A)
(5.160)
A1 = 0, A1 = 2, A1 = -1.
(5.166)
We now find the eigenvectors one by one, by using the eigenvalue equation
For A1 = 0 this gives
(#
J z o 0
0 ) ( 2 ) = 0 (
2),
(5.168)
which leads to the equations to be solved for the components as (5.169) (5.170) (5.171)
262
LINEAR ALGEBRA
The last equation leaves the third component arbitrary. Hence we take it as
where c1 is any constant. The first two equations determine the remaining components as a11 = a21
thus giving the first eigenvector as
= 0,
(5.173)
8)
v1 = c 1 (
(5.174)
Similarly, the other eigenvectors are found. For
A2
= 2 we write
(5.175) (5.176) (5.177) (5.178)
and obtain the second eigenvector as (5.179) (5.180)
where
c2
is an arbitrary constant. For
A3 =
-1 we write
(5.181) a13
f h a 2 3 = -a13,
ha13
+
0a23 = -a23,
0=
-a33
(5.182) (5.183) (5.184)
and obtain the third eigenvector as
v3 = c'3
():-
1
,
c3
# 0,
(5.185)
263
EIGENVALUES AND EIGENVECTORS
where c3 is an arbitrary constant. For the time being, we leave these constants in the eigenvectors arbitrary. Using these eigenvectors, we construct the transformation matrix S as (5.186)
Since the determinant of S is nonzero: det S = - 3 C l C 2 C g
# 0,
(5.188)
the inverse transformation, S p l , exists and its components can be found by using the formula (5.189) where S ( j l i ) is the minor obtained by deleting the j t h row and the i t h column. Note that the order of the indices in det S ( j 1 i ) are reversed; hence we first write the transpose of S as
0 (5.190) c3
-fie3
0
and then construct the inverse by finding the minors one by one as
1 s-'= det S
0
0
1
-
C1
(5.191)
264
LINEAR ALGEBRA
We can easily check that S-lS = I :
1 c1
0
0
-
0
4c2
c3
(5.192) 0
( ;; ) 1 0 0
=
cl
=s-ls.
(5.193)
We finally construct the transformed matrix
A'= S-lAS
(5.194)
as
0
0
-
1 .
C1
A'
=
0
0
-
1
c1 0 2 2c2 ac2
0
0
0o
a ) = ( 0 02 0
0
0 0
) ,
-1
(5.195) which is in diagonal form with the diagonal elements being equal to the eigenvalues:
(5.196) (5.197) (5.198)
MOMENT OF INERTIA TENSOR
5.10
265
M O M E N T OF INERTIA TENSOR
In rotation problems we can view the moment of inertia tensor, I, as an operator or transformation in the equation +
L = 13,
(5.199)
+
which relates the angular momentum, L , and the angular velocity, Ti?, vectors. Moment of inertia tensor is represented by the n x n matrix (5.200) where p ( r ) is the mass density and ri, i = 1 , 2 , 3 , stand for the Cartesian + components of the position vector r . The angular momentum, L , and the angular velocity, 3, vectors are represented by the column matrices
.=( Ei) 5). ancia=(
(5.201)
We can write the transformation equation [Eq. (5.199)] as
c 3
Lz =
Iij wj,
(5.202)
j=1
which, when written explicitly, becomes
+Ixywy +Ixzwz, L y = Iyzwx + I y y w y + I y z w z , L z = Izzwx + I z y w y + J z z w z , Lx
= Ixxwx
(5.203) (5.204) (5.205)
It is desirable to orient the Cartesian axes so that in terms of the new coordinates the moment of inertia tensor, 1’, is represented by the diagonal matrix
I =
(5.206)
0
0
I3
where 11,I,, and 1 3 are called the principal moments of inertia. Directions of the new axes denoted by the subscripts 1 , 2 , and 3 are called the principal directions of the object and the corresponding Cartesian coordinate system is called the principal coordinates. In the principal coordinates, Equations (5.203)-(5.205) simplify to (5.207) (5.208) (5.209)
266
LINEAR ALGEBRA
Figure 5.1
Moment of inertia tensor of a flat uniform triangular mass distribution.
whcre L1, Lz, L3 and wl,w2, w3 are the components of the angular momentum and the angular velocity along the principal axes, respectively. Finding the principal directions, which always exist, and the corresponding transformation matrix requires the solution of an eigenvalue problem for the moment of inertia tensor. This is demonstrated in the following example:
Exercise 5.12. Moment of inertia tensor: Let us now consider the moment of inertia tensor of a uniform flat rigid body with the density B and in the shape of a 45" right triangle as shown in Figure 5.1. Components of the moment of inertia tensor [Eq. (5.200)] are evaluated as follows: Since r2 = x 2 y2, we first find I,, as
+
1"1 lala-,
(5.211)
x ) 3 dx
(5.212)
a-x
I,, =
u ( r 2- x2)dydx
=0
=
I"(.
-
- 3a2x
= =
y2 dydx
u -(a4 3
-
+ 3ax2 - x3) dx
a2 a3 3a - + 3a2 3
-
a4 -1 4
a4 12
= u-.
(5.210)
(5.213) (5.214) ( 5 . 215)
Since the mass, M , of the object is M = cra2/2, we write I,, = M a 2 / 6 . Similarly, we obtain the other diagonal terms as
Ma2 Ma2 I,, = 6 I, zz = 3 .
(5.216)
MOMENT OF INERTIA TENSOR
267
For the off-diagonal terms, we find
Jo” 1 -gla la-x a-x
I,,
uxy dydx
= I,, = -
xy dydx
a2
dx
-
(5.2 17) (5.2 18) (5.2 19) (5.220)
-
Ma2 12 ’ I,, = I,, = 0, I,, = I,, = 0, - --
(5.221) (5.222)
thus obtaining the complete moment of inertia tensor as
Ma2 -
Ma2 12
--
6
Ma2 -
Ma2 12
I=
--
\
(5.223)
6
0
0
Ma2 3
The principal moments of inertia are found by solving the characteristic equation:
(5.224)
(5.225)
L
(5.226)
268
LINEAR ALGEBRA
which yields the principal moments of inertia as Ma2 Ma2 Ma2 , I3=(5.227) = -, I 2 = 4 12 . Orientation of the principal axes, which are along the eigenvectors, are found as follows: For the principle value 11 = we write the eigenvalue equation as 11
F,
1/2 3
-1/4
-1/4 0 1/2 0
0 1
)( )
=
MU2 3
( ::: )
,
(5.228)
a3 1
which gives
Ma2 all MU2 all 12
--
Ma2
Ma2 3 Ma2
-a l l ,
(5.229)
-a21 =
-a21,
(5.230)
Mu2
Ma2
-
12 a21 = -
+
Ma2
6
3 a 3 1
= -a 3 1 .
(5.231)
Thc first two equations reduce to a21 =
-2a11,
(5.232)
=
--all,
1 2
(5.233)
= a21 = 0.
(5.234)
a21
which cannot be satisfied unless a11
The third equation [Eq. (5.231)] gives the only nonzero component of the first eigenvector, a31 = c1, thus yielding v1=.1(;),
(5.235)
where c1 is any constant different from zero. Similarly, the other eigenvectors are found as
Again the constants c2 and c3 are arbitrary and different from zero. We now construct the transformation matrix S as (5.237)
(5.238)
MOMENT OF INERTIA TENSOR
269
Since det S = 2 c l c z c ~# 0, the inverse exists and is found by using the formula
(5.239)
where S ( j l i ) is the minor obtained by deleting the j t h row and the i t h column. This yields the inverse as
f o
0
-
1
c1
(5.240)
The inverse can be checked easily as
0
0
-
1
C1
1
-
2c2
l 2c2
o
1 -
1 -
2c3
2c3
0
--
( ; ; ;) 1 0 0
=
=ss-l.
(5.241)
270
LINEAR ALGEBRA
:I
In terms of the new basis vectors, ( v I , v ~ , v ~ we) ,can now write the moment of inertia tensor as
0
1
-
0
1 Ma2 S-lIS
3 0
=
\
5.11
0
0 Ma2 4 0
0 I2 0
:).
4
(5.244)
INNER P R O D U C T SPACES
So far our discussion of vector spaces was rather abstract with limited potential for applications. We now introduce the missing element, that is, the multiplication of vectors with vectors, which is called the inner product. This added richness allows us to define magnitude of vectors, distances, and angles and also determine the matrix representations of linear transformations. We will also be able to evaluate the proportionality constants, which were left undetermined in eigenvector calculations [Eqs. (5.174), (5.180), and (5.185) and Eqs. (5.235) and (5.236)] by normalizing their magnitudes to unity.
THE INNER PRODUCT
5.12
271
T H E INNER PRODUCT
For ordinary vectors in three-dimensional real space, the scalar product is defined as
(5.245) where d and
are two vectors given in terms of their Cartesian components:
The magnitude, a , of a vector d is found as
and the angle between two vectors, d and
7, is defined as (5.248)
We have seen that in terms of matrices, vectors can be written as column matrices, which we denote by boldface characters:
(5.249) Using the transpose, 6, we can write the scalar product of a with itself as
3
=
C aiai.
(5.251)
i=l
Similarly, the scalar product of two vectors,
3 and -b+, .is defined as (5.252)
3
=
C aibi. i=l
(5.253)
272
LINEAR ALGEBRA
Note that the scalar product is symmetric, that is,
(5.254)
(5.2 55) Generalization to n dimensions is simply accomplished by letting the sums run from 0 to 71. For a given abstract n-dimensional vector space, V , defined over the field K. we have already represented vectors, v, as column matrices:
(5.256)
where a; stands for the coefficients in the linear combination = Q12’1
+ a2u2 + . . . + cr,vn,
(5.257)
in terms of the basis
B = { v 1 , v 2 , . . . ,v,}.
(5.258)
We now introduce the inner product for abstract vector spaces as n
i=1
whcre
2’
and
UI
belong to V with the components
(5.260)
Wc can also write the inner product [Eq. (5.259)] in the following equivalent ways:
( v , w ) = vw = wv.
(5.261) (5.262)
The mgle between u and w is now defined as
(5.263)
THE INNER PRODUCT
273
where (v,u) and (w, w) stand for the magnitude squares, luI2 and lwI2 , respect ively. This definition of the inner product works fine for vector fields defined over the real field R.For vector fields defined over the complex field, @, it presents problems. For example, consider the one-dimensional vector v = za, its magnitude is
Iu1 = (ia, ia)l/2
(5.264)
= (-a2)1'2,
which is imaginary for real a. Since the inner product represents lengths or distances, it has to be real, hence this is not acceptable. However, there is a simple cure. We simply modify the definition of the inner product defined over the complex field @ as n
(5.265) i=l
where the asterisk indicates that the complex conjugate has to be taken. With this definition, the magnitude, Iv(,of a vector defined over the complex field becomes '*I
1/2
Ivl = ( v , v ) 1 / 2 =
&[+z]
i=l
=
[&$]
1/2
(5.266)
i=l
Since lail 2 0, the magnitude is always real. It is important t o keep in mind that for vector fields defined over the complex field the inner product is not symmetric: 2
# (w,u).
(5.267)
( v , w ) = (w,u)*
(5.268)
(u,w) However, the following is true:
Definition 5.4. For a given a vector field, V, defined over the real field,
R,or the complex field, @, an inner product is a rule that associates a
scalar-valued function t o any ordered pairs of vectors, v and w, in V such that (1)
(5.269)
(5.270) where u , u , and w are vectors in V
274
LINEAR ALGEBRA
(111) If a is a scalar in the underlying field K , then
(aw, w)= a*(v, w),
(5.271)
(v,QW)
(5.272)
= a ( v ,w),
ti and w are vectors in V. (IV) For any u in V,we have (v,u ) 2 0 and (v,u ) = 0 if and only if v = 0. The scalar (v(= ( v , v ) ' / ~is called the norm or the magnitude of v.
where
A vector space with an inner product definition is called an inner product space. In the above definition, if the underlying field is real, the complex conjugation becomes superfluous. A Euclidean space is a real vector space with an inner product, which is also called the dot or the scalar product. An inner product space defined over the complex field C is called a unitary space. Example 5.13. Definition of inner product: One can easily show that the set of all continuous functions defined over the complex field C and in the closed interval [0,1] forms a vector space V. In this space we can define an inner product as
(f(z), d z ) )= where
Jo
1
f * ( x ) d x )dx1
(5.273)
f ( ~ and ) g(z) are complex-valued functions of the real argument
3..
5.13
ORTHOGONALITY A N D COMPLETENESS
After we introduced the concepts like distance, magnitudelnorm, and angle, applications to physics becomes easier. Among the remaining central concepts that are needed for applications are the orthogonality and the completeness of sets of vectors that form a vector space. We say that two vectors, v and ui, are orthogonal, if and only if their inner product vanishes:
(v,w)
= 0.
(5.274)
Note that the condition of orthogonality is symmetric, that is, if (v,w) = 0, then ( w , v ) = 0. For ordinary vectors, orthogonality implies the two vectors being perpendicular to each other. Equation (5.274) extends the concept of orthogonality to abstract vector spaces. whether they are finite- or infinitedimensional. A set of vectors {vl,u2,.. . , vTL} is said t o be orthonormal if for all i and j , (vz,213)
= &,
(5.275)
holds. In a finite-dimensional vector space an orthonormal set is complete, if every vector v in V can be expressed as a linear combination of the vectors in
ORTHOGONALITY AND COMPLETENESS
the set. A set of vectors if the equation
(v1, v2,
el211
275
. . . ,v,} is said to be linearly independent
+ c2v2 + . . . + c,v,
(5.276)
= 0,
where c1, c2,. . . , c, are scalars, cannot be satisfied unless all the coefficients are zero: c1 = c2 = . . . = c, = 0. To obtain a formal expression for the criteria of linear independence, we form the inner products of Equation (5.276) by v l , v 2 , . . . , v, and use the linearity of the inner product to write
=o,
~ 1 ( ~ 1 , ~ 1 ) + ~ 2 ( ~ 1 , ~ 2 ) + ' ~ ~ + ~ , ( ~ 1 , ~ , )
+...+c,(m,v,)
c1(w,v1) +C2('U2,Q)
= 0,
This gives a set of n linear equations to be solved for the coefficients, c1, c2, . . . , c,, simultaneously. Unless the determinant of the coefficients vanishes, these equations do not have a solution besides the trivial solution, that is, c1 = c2 = . . . = c, = 0. Hence, we can write the condition of linear independence f o r t h e s e t (v1,v2, . . . ,v,} as
The determinant G is called the Gramian. For an orthonormal set in n dimensions the Gramian reduces to 1 0 ... 0 1 '..
G=
.. .
0 0
...
..
. ..
0 0
."
1
,
= 1.
(5.279)
Hence an orthonormal set is linearly independent. For a given orthonormal set, ( q ,7 1 2 , . . . , v,}, in an n-dimensional vector space, the following statements are equivalent: 1. The set (v1, 212,. . . , v,} is complete. 2. If (wi,v) = 0 for all i, then v = 0. 3 . The set (v1, v 2 , . . . , v,} spans V. 4. If v belongs to V, then n
v
=
ccivi,
(5.280)
i=l
where ci = ( u i , v ) are called the components of v with respect to the basis B = ( ~ 1 , 0 2 , .. . ,u,}.
276
LINEAR ALGEBRA
5.14 G R A M -S CHMIDT 0R T HOGO NA LIZ A T I0N It is often advantageous to work with an orthonormal set of n linearly independent vectors. Given a set of linearly independent vectors, { 211, 212, . . . ,v,}, we can always construct an orthonormal set. We start with any one of the vectors in the set, say 211, call it w1 = 211, and normalize it as
(5.281) where l(wl) = (w1, w1)1/2 is the norm of w1. Next we choose another element from the original set, say 212, and subtract a constant, c, times Z?l from it to write
We now determine c such that
h
(5.282)
eel.
w2
= 212
w2
is orthogonal to
-
(21,(212 - cZ?~)) = (21, 212)
A
-
Z?1 : h
c(e1, el) = 0.
(5.283)
Since (21, 21) = 1, this gives c as
(5.284)
c = (Z?1,vz). Hence
w2
becomes w2 = 212
We now normalize
w2
-
(&, v2)el. h
(5.285)
to obtain the second member of the orthonormal set as
(5.286) where
(5.287) We continue the process with a third vector from the original set, say write A
w3 = 213 - clel - c2e2.
213,
and
(5.288)
Requiring w3 to be orthogonal to both 21and Z?2 gives us two equations to be solved simultaneously for c1 and c2, which yields the coefficients as c1 = (21,213) and c2 = (22,213). Again, the third orthonormal vector is given as
(5.289)
EIGENVALUE PROBLEM FOR REAL SYMMETRIC MATRICES
277
Continuing this process, we finally obtain the last where I(w3) = (w3, member of the orthonormal set, {Zl, &, . . . ,gn}, as
(5.290) where n- 1
w, = un
-
- - y ( E i , un)Zi
(5.29 1)
i=l
and
l ( w n )= (wn,wn)1/2.
(5.292)
This method is called the Gram-Schmidt orthogonalization procedure. 5.15
EIGENVALUE PROBLEM FOR REAL S Y M M E T R I C MATRICES
In many physically interesting cases linear operators or transformations are represented by real symmetric matrices. Consider the following eigenvalue problem
A v= ~ AVX,
(5.293)
where A is a symmetric n x n matrix with real components and vx is the n x 1 column matrix representing the eigenvector corresponding to the eigenvalue A. Eigenvalues are the roots of the characteristic equation (all
IA-AII =
-A)
a21
...
a12
(a22
-A)
...
(5.294)
which gives an nth-order polynomial in A. Even though the matrix A is real, roots of the characteristic equation could be complex, and in turn the corresponding eigenvectors could be complex. Consider two eigenvalues, X i and A j , and write the corresponding eigenvalue equations as Avi = X i v i ,
Avj
(5.295) (5.296)
= Ajvj .
We multiply the first equation from the left by
V3 and write (5.297)
278
LINEAR ALGEBRA
We now consider the second equation [Eq. (5.296)) and take its transpose, and then we take its complex conjugate to write
-
v!A* = A*?* 3 3 3'
(5.298)
Since A is real and symmetric, A*= A, Equation (5.298) is also equal to
?*A = A*v* 3 3 3'
(5.299)
Multiplying Equation (5.299) with vi from the right, we obtain another equatioii:
v,*Avi= X?Z;*vi. 3 3
(5.300)
Subtracting Equation (5.297) from Equation (5.300), we obtain
0 = (A*3 - xz)v;Vz,
(5.30 1)
which leads us to the following conclusions: (I) When j = i, the expression Vfvi = ( u i , u i ) = IviI2 is the square of the norm of the i t h eigenvector, which is always positive and different from zero, hence we obtain
xg = xi.
(5.302)
That is, the eigenvalues are real. In fact, this important property of real and symmetric matrices was openly displayed in Examples 5.9-5.12. (11) When j # i and when the eigenvalues are distinct, X j # Xi, Equation (5.301) implies 3;;y =
(7JjpJZ)
= 0.
(5.303)
In other words, the eigenvectors corresponding t o the distinct eigenvalues are ort hog orial. (111) When i # j but X i = X j , that is, when the characteristic equation has multiple roots, for the root with the multiplicity s, s < n, there always exist s linmrly independent eigenvectors (Hildebrandt). We should make a note that t,his statement about multiple roots does not in general hold for nonsymmetric matrices. Eigenvalues corresponding to multiple roots are called degenerate.
5.16
PRESENCE OF DEGENERATE EIGENVALUES
To demonstrate how one handles cases with degenerate eigenvalues, we use an example. Consider an operator represented by the following real and symmetric matrix in a 3-dimensional real inner product space:
1 1 1
A = (
i).
(5.304)
PRESENCE OF DEGENERATE EIGENVALUES
279
We write the characteristic equation as (1-A)
JA-A11 =
1 1
1 (1 - A ) 1
1 1 (1 - A )
(5.305) (5.306)
which has the roots A1 = A2
= 0 and
A3 =
3.
We first find the eigenvector corresponding to the multiple root, by writing the corresponding eigenvalue equation as 1 1 1
( 11 1 1 ) ( 2 ; ) = 0 (
;;;),
(5.307) A1
= A 2 = 0,
(5.308)
which gives a single equation:
Hence we can solve only for one of the components, say a l l , in terms of the remaining two, a21 and a31, as = -a21
- U31.
(5.310)
Since a21 and a31 are not fixed, we are free to choose them a t will. For one of the eigenvectors we choose
which gives a l l = -c1, thus obtaining the first eigenvector as
v1=
( jl)
=c1(
jl).
(5.312)
(5.313)
For the second eigenvector corresponding to the degenerate eigenvalue, Xl,a 0. we choose
=
280
LINEAR ALGEBRA
which gives a l l = -cz, thus yielding the second and linearly independent eigerivector as
(5.315)
(5.316) At this point, aside from the fact that the eigenvectors look relatively simple, there is no reason for making these choices. Any other choice would be equally good as long as the two eigenvectors are not constant multiples of each other. Finally, for the third eigenvalue, A 3 = 3, we write
(5.317) (5.318) (5.319) (5.320) which implies a13 = a23 = a33 = c3, c3
This yields
v3
# 0.
(5.321)
up to an arbitrary constant, c g , as v3=c3(
1).
(5.322)
Linear independence of the eigenvectors, (v1,v 2 , v3) , can be checked by showing that the Gramian is different from zero, that is,
G = clczc3
1
-1
-1 1 0
1 1 1
1
=
-3clc~cg # 0.
(5.323)
We now construct the transformation matrix S as
(5.324)
(5.325)
PRESENCE OF DEGENERATE EIGENVALUES
281
Since the determinant of S, det S = -3clc2cg # 0, is different from zero, the inverse transformation matrix exists and can be found by using the formula
(5.326) where det S ( j l i ) is the minor obtained by deleting the j t h row and the i t h column. Notice that the indices in det S ( j l i ) are reversed. We find S-' as
s-1 =
(
2/3Ci
-1/3Ci
-1/3Cl
-1/3C2
-1/3C2
2/3C2
1/3c3
1/3~3
1/3~3
It can easily be checked that
1
.
ss-1 = s-1s = 1
(5.327)
(5.328)
and
S-lAS
=
(
0
0
) ( =
A2
A3
0 0 0
0 0 0 ) . 0 0 3
(5.329)
We now analyze the eigenvectors for this case more carefully. Using the definition of the inner product [Eq. (5.259)] it is seen that the two eigenvectors corresponding to the degenerate eigenvalues are not orthogonal, that is,
( w , m ) # 0,
(5.330)
However, the third eigenvector is orthogonal to both of them:
Since we picked two of the components of the eigenvectors corresponding t o the degenerate eigenvalue randomly, were we just extremely lucky to have them orthogonal to u3? If we look at the eigenvalue equation for the degenerate eigenvalues [Eq. (5.309)], X1,2 = 0, we see that we have only one equation for the three components, hence we can write the corresponding eigenvectors as
vx1,2
=
(
-a21
- a31
)
(5.332)
For reasons to be clear shortly, we introduce the constants el, c 2 , b, and c as
(5.333)
282
LINEAR ALGEBRA
and write
vx1.2
=
-c1(l+ b) - c2(l c2 clb c1 c2c
+ +
-(1
+ c) (5.334)
+ b)
-(I
+ c) (5.335)
In other words, the eigenvector, v ~ , ,for ~ the , doubly degenerate eigenvalue, X l , 2 = 0, can be written as the linear combination of two vectors vxl,s = C l W l
+ c2w2,
(5.336)
where
It can easily be checked that w1 and w2 satisfy the eigenvalue equations for the matrix A with the eigenvalue X = 0 [Eq. (5.309)],that is,
(5.338) (5.339) In general, any linear combination of eigenvectors belonging to the same eigenvalue is also an eigenvector with the same eigenvalue. The proof is simple. Let w1 and w2 be the eigenvectors of B with the eigenvalue A:
(5.340) (5.341) Then
B(cowi
+~
1
~ =2 B(cowi) )
+ B(ciw2)
= coBwl+ Q B W ~ = COXWl
+ ClXW2
= X(cow1
+ ClW2).
(5.342) (5.343) (5.344) (5.345)
The converse of this result is also true. If we can write an eigenvector, w, corresponding to the eigenvalue A,
BW= XW,
(5.346)
as the linear combination of two linearly independent vectors, w1 and w2, as
w = cow1
+
ClW2,
(5.347)
PRESENCE OF DEGENERATE EIGENVALUES
283
where both co and c1 are different from zero, then both w1 and w 2 are eigenvectors of B corresponding to the eigenvalue A. The proof is as follows:
+ +
+
B(cow1 C l W 2 ) = A ( C o W 1 C l W Z ) , c ~ B w c~~ B w = ~ coAwl+ c ~ A w ~ , c~(Bw Awl) ~ + c l ( B w 2 - A w ~ ) = 0.
(5.348) (5.349) (5.350)
Since in general co and c1 are different from zero, the only way to satisfy this equation for all co and c1 is to have
Bw~ = Awl, Bw= ~ Aw~,
(5.351) (5.352)
which completes the proof. As we stated earlier, for symmetric matrices of order n and for multiple roots of order s (s < n ) , there always exists s linearly independent eigenvectors. For the matrix under consideration [Eq. (5.304)],n = 3 and s = 2 for the degenerate eigenvalues A 1 , 2 = 0. Hence, we can take the two linearly independent eigenvectors as [Eq. (5.335)]: v1 = c,
b 1
,
v2 = c 2
(5.353)
where the scalars b and c are still arbitrary. In other words, there are infinitely many possibilities for the eigenvectors of the degenerate eigenvalue A 1 , 2 = 0. However, notice that all these vectors satisfy (5.354) (5.355) That is, they are all perpendicular to the remaining eigenvector, us,corresponding to A3 = 3. Using the freedom in the choice of the constants, b and c, we can also choose u1 and 212 as perpendicular: (Vl,V2)
= 0,
(5.356)
which gives a relation between b and c as (1
+ b )(1 + + b + c = 0, C)
+
1 2c b=-2+c
(5.357) (5.358)
Choosing c as 1, Equation (5.358) gives b = -1, thus obtaining the orthogonal set of eigenvectors of A as
284
LINEAR ALGEBRA
Figure 5.2
Eigenvectors corresponding to the degenerate eigenvalues.
In all the eigenvalue problems the eigenvectors are determined up to a multiplicative constant. This follows from the fact that any constant multiple of an eigerivector is also an eigenvector; that is, if
AV = XV, then
cyv, where cy
(5.360)
is a scalar, is also an eigenvector:
A(av) = X(cyv).
(5.361)
Having defined the inner product of two vectors, we can now fix the constants left arbitrary in the definition of the eigenvectors [Eq. (5.359)] by normalizing their norms to unity. Thus obtaining an a orthonormal set of eigenvectors as
Geometrically, the degeneracy can be understood from Figure 5.2, where all the vectors lying on the plane perpendicular t o the third eigenvector v3 are possible eigenvectors to XI,^ = 0. Now the transformation matrix S can be written as
(5.363) we can easily check that the inverse is given as
(5.364)
QUADRATIC FORMS
285
Note that the inverse transformation matrix is the transpose of S. In other words, S represents an orthogonal transformation.
5.17
QUADRATIC FORMS
Let A be an operator defined in an n-dimensional inner product space V over the field of real numbers R.Also let A be represented by a symmetric n x n matrix,
A = Aij, i , j = 1 , . . . , n ,
(5.365)
in terms of the basis vectors B = {q,v 2 , . . . , v,}. construct scalar quantities like
In this space we can
Q = GAv,
(5.366)
where v is a vector in V represented by the column matrix
V =
[
(5.367) ffn
If we write Q explicitly, we obtain a quadratic expression like
Such expressions are called quadratic forms, which are frequently encountered in applications. For example, the quadratic form constructed from the moment of inertia tensor [Eq. (5.200)] as
I
= GIG,
(5.370)
where G is the unit vector in the direction of the angular velocity, 8 = w f i , is called the moment of inertia (scalar). In terms of the moment of inertia, I, the kinetic energy:
T
=
1 -Iw’, 2
(5.371)
is written as the quadratic form
T = Izzw:
+ I y y w i + I,,w; + 2IzYw,wy + 2 I z Z w , ~ , + 21yzwyw,,
(5.372)
where w,, wy, w, are the components of the angular velocity in Cartesian coordinates. It is naturally desirable to orient the Cartesian axes so that the
286
LINEAR ALGEBRA
moment of inertia tensor is represented by a diagonal matrix, where the kinetic energy simplifies t o 1 1 1 T = -Ilw; - I Z W-I~w$. ~ (5.373) 2 2 2 In this equation 11,I,, 13 are the principal moments of inertia and w1, w2, w3 are the components of the angular momentum along the principal axes. This is naturally an eigenvalue problem, where we look for a new set of basis, B’ = {v;,v:, . . . , v;}, where A [Eq. (5.365)] is represented by a diagonal matrix. In Equation (5.97) we have written the transformation equation between two vectors, v and v’,as
+
v
+
= SV‘.
(5.374)
Since the inverse transformation exists, we can also write this as V’
=
s-lv.
(5.375)
For the new basis vectors we first solve the eigenvalue problem
Aw
= Xw
(5.376)
and find the orthogonal set of eigenvectors, w1, w2, . . . ,w,. Then by using these eigenvectors, we construct the new orthonormal basis, which also spans V as h
B’
= {Zl,Z2, . . .
,ZTL},
(5.377)
Aw,
i = 1 , 2 , . . . , n.
(5.378)
where ( Z t , Z J )= S,, and
-e, = 20,
= X,w,,
IWZI’
We have demonstrated for real and symmetric matrices that the eigenvectors corresponding to distinct eigenvalues are always orthogonal and for the eigenvectors corresponding to degenezte eigenvalues they can always be arranged as orthogonal. Hence, the set B’ can always be constructed. We can now write the n x n transformation matrix, S, as
s = ( ( isl where
( ) isi
)
(is2
)-( isn )).
(5.379)
is the n x 1 column matrix representation of the eigenvector
2i. Using its transpose,
(5.380)
QUADRATIC FORMS
287
we construct the product (21,Zl)
(21,22)
(Z2,21)
(22,221
( G , ~ I (zn,G) )
... ...
(5.381)
... h
Using the orthonormality relation of the new set B’, A
h
( e i ,e j ) = 6.. 3
(5.382)
ss = I.
(5.383)
Equation (5.381) becomes
In other words, the inverse transformation is the transpose of the transformation matrix:
-
s-’ = s.
(5.384)
Matrices satisfying this property are called orthogonal matrices. Transformations represented by orthogonal matrices preserve norms of vectors. This 2 can be easily seen by using v = Sv’ to write IvI in terms of the new basis, 3, as
-
lVl2
= vv = (SV’)SV’
(5.385) (5.386) (5.387) (5.388)
The quadratic form,
Q = GAv,
(5.389)
can now be written in terms of the new bases. Using the transformation equation,
v = SV’, we can write
-
Q = (Sv’)ASv’ = G‘SASV‘ = Z;’
(S-’ AS) v’,
(5.390)
(5.391) (5.392) (5.393)
288
LINEAR ALGEBRA
where S-IAS is the matrix representation of the operator A in terms of the new bases, that is, A'. It is important to note that orthogonal transformations preserve the values of quadratic forms, in other words, Q = vAv = ~ ' A ' v '= Q'.
(5.394)
Writing the transformation matrix [Eq. (5.379)], S =Z$,i , j = 1,.. . ,n,explicitly as
where Zijis the i t h component of the j t h normalized eigenvector, we form the product
(5.396)
In the above equation we have used the eigenvalue equation [Eq. (5.378)]: n.
n
j=1
1=1
Equation (5.396) can also be written as
(5.398)
which leads to
that is, in the new bases A is represented by a diagonal matrix with the diagonal elements being the eigenvalues.
HERMITIAN MATRICES
5.18
289
H E R M I T I A N MATRICES
We now consider matrices in the complex field @. The adjoint of a matrix, At, is defined as A*, where the transpose of A along with the complex conjugate of each element is taken. If the adjoint of a matrix is equal to itself, At = A, it is called self-adjoint. In real inner product spaces self-adjoint matrices are simply symmetric matrices. In complex inner product spaces selfadjoint matrices are called Hermitian. Adjoint operation has the following properties. (5.400) (5.401) (5.402) (5.403) Hermitian matrices have very useful properties. Consider the eigenvalue problem for an n x n Hermitian matrix H : H u = ~XU~,
(5.404)
where ux is the n x 1 column matrix representing the eigenvector corresponding to the eigenvalue A. For two eigenvalues, A, and A j , we write the corresponding eigenvalue equations, respectively, as Hu~ = X~U,
(5.405)
Huj
(5.406)
and = Ajuj.
We now multiply Equation (5.405) with 65 from the left and write U~HU = XiU,*ui. ~
(5.407)
Next we take the adjoint of Equation (5.406):
(5.408) and multiply by
ui
from the right:
U,*Htui= A;U;ui. Using the fact that for Hermitian matrices Ht
(5.409) = H, Equation
63Hui = A;U,*ui.
(5.409) becomes
( 5.4 10)
290
LINEAR ALGEBRA
We now subtract Equation (5.410) from (5.407) to write
0 = ( X i - x;,u;uz.
(5.411)
As in the case of real symmetric matrices, there are basically three cases: (I) When i = j , Equation (5.411) becomes (5.412) (5.413) Since luiI2is always positive, this implies that the eigenvalues of a Hermitian matrix are always real, that is, X i = A;.
(5.414)
In quantum mechanics eigenvalues correspond to directly measurable quantities in laboratory, hence observables are represented by Hermitian matrices. (11) When i # j and the eigenvalues are distinct, X i # X j , the corresponding eigenvectors are orthogonal in the Hermitian sense, that is, (Uj,Ui)
= u;uz = 0.
(5.415)
(111) As in the case of real and symmetric matrices, for repeated roots of order s there exists a set of s linearly independent eigenvectors. Hence for a Hermitian n x n matrix H, we can construct a set of linearly independent orthonormal basis vectors, {GI,G2,. . . , which spans the n-dimensional inner product space V defined over the complex field C.It then follows that any vector v in V can be written as the linear combination
a,},
n
u
=
X"iGi,
(5.416)
i=l
where the components ai are numbers in the complex field C.Using the definition of the inner product and the orthogonality of the normalized basis vectors, we can evaluate C Y ~as n
(2j,U)=
.y(Gj,"iGZ)
(5.417)
i= 1 n
(5.418) i=l n.
(5.419) = "j.
(5.420)
HERMITIAN MATRICES
291
In this complex vector space, Hermitian operators, H , transform vectors in V into other vectors in V. Again, it is desirable t o find a new set of basis vectors, where H is represented by a diagonal Hermitian matrix, H. Steps t o follow are similar to the procedure described for the real and symmetric matrices. We define the transformation matrix U as (5.421)
where Eii, i matrices
1 , 2 , . . . , n are the unit eigenvectors represented by the column
=
Gi=
( q,
(5.422)
U n1
where G j i corresponds to the j t h component of the i t h eigenvector, fying the eigenvalue equation
&, satis-
HGi = X,Gi.
(5.423) -*
Due to the orthonormality of the eigenvectors, ( G j , G i ) = GjGi write
= Sij,
we can
u*u
=z
(5.424)
hence (5.425) (5.426)
A matrix whose inverse is equal to its adjoint is called a unitary matrix. Similarly, a transformation represented by a unitary matrix is called unitary transformation. Unitary transformations are the counterpart of orthogonal transformations in complex inner product spaces, and they play a very important role in quantum mechanics. A unitary transformation, UtU = I, preserves the norms of vectors. This can be proven by using v = uv’
(5.427)
292
LINEAR ALGEBRA
to write
-* v*v = (UV’) uv’
(5.428)
-*-
= v’ u * u v ‘
-*
= v‘
(5.429)
u+uv’
= z;‘*v‘,
(5.430) (5.431)
/vI2 = lV’l2.
(5.432)
which is the desired result:
Consider a Hermitian matrix, H, representing a Hermitian operator, H , which transforms a vector v in V into another vector w in V defined over the complex field C as w = Hv.
(5.433)
Vectors, v and w, transform under unitary transformations as w = uw’,
v
(5.434) (5.435)
= uv’.
Using these in Equation (5.433) we obtain
UW’= HUv’,
(5.436)
U-lUw’= (U-’HU)v’,
(5.437)
w’= (U?HU)v’.
(5.438)
In conclusion, in terms of the new bases defined by the unitary transformation U, H is expressed as
H‘ = U-lHU = UtHU.
(5.439)
A unitary transformation constructed by using the eigenvectors of H, a s described in the argument leading to Equation (5.399), diagonalizes H as 0 H ’’== ( i.
...
’.:: ...
r)
An
(5.440)
MATRIX REPRESENTATION OF LINEAR TRANSFORMATIONS
5.19
293
M A T R I X REPRESENTATION OF LINEAR TRANSFORMATIONS
Having defined the inner product spaces, we can now show how the matrix representation of a given transformation or operator can be found. Consider an ndimensional vector space, V ,spun by the orthonormal set B = { E l , $ 2 , . . . , e n } and defined over the field K . Let T be a linear transformation from V into V, that is. h
w =Tv,
(5.441)
where v and w are elements in V. Using the orthonormal basis B , we can write the vectors v and w as
c n
u=
a&,
(5.442) (5.443)
where the components ai and
&
are evaluated by using the inner products
ai= (Zi, u)and
pj
= ( Z j , w).
(5.444)
We now write Equation (5.441) as n
n.
j=1
i=I n
= CaiTEi.
(5.446)
i= 1
If we take the inner product of both sides with condition, ( Z k , E j ) = S k j , we obtain n
n
j=1
i=l
Ek
and use the orthonormality
(5.447) n.
n
j=1
i=l
(5.448) n
(5.449)
c,"=,
i=l
Comparing with ,& = Aijaj [Eq. (5.71)], we see that the elements of the transformation matrix are obtained as
294
LINEAR ALGEBRA
For any linear operator T acting on V,we can define another linear operator Tt called the adjoint of T, which for any two vectors v and w in V has the property
(Tv, W ) = ( v ,Ti,). If we write v and w in terms of the basis B
=
(5.451)
{Z1,22, . . . ,Zn}, we obtain (5.452) (5.453)
In other words, the matrix [Tt],j = Aij, representing the adjoint operator T t , is obtained from the matrix [T]ij= Aij, which represents T by complex conjugation and transposition. A linear operator satisfying
Tt = T
(5.454)
is called a Hermitian operator or a self-adjoint operator. Hermitian operators are represented by Hermitian matrices.
5.20
FUNCTIONS OF MATRICES
We are now ready to state an important result from Gantmacher: Theorem 5.6. If a scalar function f ( x ) can be expanded in power series:
where r is the radius of convergence, then this expansion remains valid when x is replaced by a matrix A whose eigenvalues are within the radius of convergence. Using this theorem we can write the expansions:
(5.456) (5.457)
(5.458) An important consequence of this theorem is the Baker-Hausdorf formula: ;LAH~-ZA
=H
+ i [A,HI
-
1 2
- [A, [A,HI]
+ . .. ,
(5.459)
295
FUNCTIONS OF MATRICES
where A and H are n x n matrices and [ A , H ]= A H - H A
(5.460)
is called the commutator of A with H. For the proof we first expand ezA and e P L Ain power series and then substitute the results into e Z A H e c z A and collect similar terms. Another very useful result is the trace formula: det eA = etTA,
(5.461)
where A is a Hermitian matrix. Since unitary transformations, U t U preserve values of determinants (Section 5.8),we can write det eA = det(UteAU) = det eutAu.
=
I,
(5.462) (5.463)
Using a unitary h-ansformation that diagonalizes A as
UtAU = D,
(5.464)
(5.465)
where X i are the eigenvalues of A, we can write det eA = det eD
(5.466) (5.467) (5.468)
Since the determinant of a diagonal matrix is equal to its trace and unitary transformations preserve trace (Section 5.8), we finally obtain the trace formula as 3o
deteA=x--n=O
(trD)n n!
(5.469) (5.470) (5.471)
296
LINEAR ALGEBRA
5.21 FUNCTION SPACE AND HILBERT SPACE We now define a new vector space, L2, whose elements are complex valued square integrable functions of the real variable x,defined in the closed interval [a,01. A function is square integrable if the integral (5.472) exists and is finite. With the inner product of two square integrable functions, f l and f 2 , defined as (5.473) the resulting infinite-dimensional inner product space is called the Hilbert space. Concepts of orthogonality and normalization in Hilbert space are defined as before. In a finite-dimensional subspace, V,of the Hilbert space, a set of square integrable functions, (91, g2, . . . , g n } , is called an orthonormal set if its elements satisfy
J1"
g:gjdx
= Si3,
i , j = 1,.. . , n.
(5.474)
Furthermore, we say that the set {gl,g2,. . . , g n } is complete if it spans V; that is, any square integrable function, g(z), in V can be written as n
(5.475) i=O
Using the orthonormality relation, we can evaluate the expansion coefficients, c i , as (5.476) (5.477) Using a linear operator, A , acting in the Hilbert space we can define an eigenvalue problem as
A f i ( x )= A i f i ( z ) , i
=
1 , 2 , .. .
,
(5.478)
where f L ( x )is the eigenfunction corresponding to the eigenvalue A,. We have seen that in finite-dimensional vector spaces, V, Hermitian operators are represented by Hermitian matrices, which have real eigenvalues and with their eigcnvectors form a complete orthonormal set that spans V . For the infinitedimensional Hilbert space the proof of completeness is beyond the scope of
DIRAC'S BRA AND KET VECTORS
297
this book, a discussion of this point can be found in Byron and Fuller. The state of a system in quantum mechanics is described by a square integrable function in Hilbert space, which is called the state function or the wave function. Observables in quantum mechanics are Hermitian operators with real eigenvalues acting on square integrable functions in Hilbert space. 5.22
DIRAC'S BRA A N D K E T VECTORS
A different notation introduced by Dirac has advantages when we consider eigenvalue problems and Hermitian and unitary operators in quantum mechanics. For two vectors in Hilbert space the inner product is not symmetric: (5.479) (5.480) However, from the relations ( Q l , Qz) = (Qz, Q l ) * ,
+ Q 2 , Q 3 ) = (Ql,Q 3 ) + (Qz, Q 3 ) , ( Q l , Q2 + 9 3 ) = (Ql, Q 2 ) + (Ql, Q3),
(Ql
(Ql, a Q 2 ) = a(Q1,Qz),
( Q Q l , Q2) = a*(Q1,Qz),
(5.481) (5.482) (5.483) (5.484) (5.485)
where a is a number in the complex field @, we see that the inner product is nonlinear with respect to the prefactor. This apparent asymmetry can be eliminated if we think of both vectors as belonging to different spaces-- that is, the space of prefactor vectors and the space of postfactor vectors, where each space is linear within itself but related to each other in a nonlinear manner through the definition of the inner product. Hence, they are called dual spaces. Dirac called the prefactor vectors bra and showed them as (QI , and he called the postfactor vectors ket and showed them as IQ) . For each bra there exists a ket in its dual space, that is,
(5.486) (5.487) (5.488) Each space on its own is a vector space. The connection between them is established through the definition of the inner product as (Q11 92)= /Q;Q2
dx.
(5.489)
298
LINEAR ALGEBRA
Obviously, (5.490) Note that generally we write (Q1I Q 2 ) , rather than (Q1l / Q 2 ) . A linear operator, A, associates a ket with another ket:
A IQi
A IQ) = I@) > + Q 2 ) = A IQi)
+A
(5.491) (5.492) (5.493)
/Q2),
A ( a 19))= a ( A IQ)). By writing
(Qil
A
1Q2)
as (Qll
we may define a bra, or the bra as
(Qll
[AIQ2)1 =
(5.494)
[(Qll A1 1Q2)
A, that allows us to use A to act on either the ket
In other words, A is a linear operator in both the bra and the ket space. In terms of the bra-ket notation the definition of Hermitian operators become (Qll At
1Q2) = ( Q 2 l A
I%)*.
(5.497)
In general, we can establish the correspondence
A 1")
=
I@) * (@I
=
(Ql
At.
(5.498)
PROBLEMS 1. Is the set of integers,
. . . , - 1 , O , 1 , .. .
a subfield of
2. Is the set of rational numbers a subfield of
C?
C? Explain.
3. Write three vectors that are linearly dependent in R3 but such that any two of them are linearly independent. 4. Let V be the vector space of all 2 x 2 matrices over the field K. Show that V has the dimension 4 by writing a basis for V which has 4 elements. 5. Let V be the vector space of all polynomials from R into R of degree 2 or less. In other words, the space of all functions of the form
PROBLEMS
299
Show that B = ( 9 1 , ~2~ g 3 } , where
+ 1, 93(5) = ( 5 + 212, forms a basis for V. If f(x) = a0 + a l x + a21c2, find the components with g1(5)
= 1, 9 2 ( 5 ) = 5
respect to B .
6. Show that the vectors 1,0,0)1
7J1 =
(11
7J2 =
( L O , 0, a),
7J3 =
(O1O, 1,1),
7J4
= (0,010,1)
form a basis for R4.
7. Which of the following transformations, T ( z 1 , 5 2 ) ,from R2 into EX2 are linear transformations? T(51,52) = ( 2 1 , 1
(i) (ii)
+521,
T(51,52)= ( 5 2 , 5 1 ) ,
(4
T ( Z l , 5 2 )= ( 2 1 , 4 ,
(iv)
T ( 5 1 , ~ =)( 2 1 5 2 , 0 ) .
8. If T and U are linear operators on EX2 defined as T ( 5 1 , 5 2 )= ( 5 2 , 5 1 1 ,
U ( z 1 ,5 2 )
=
( a 01, ,
give a geometric interpretation of T and U . Also write the following transformations explicitly:
( U + T ) , UT, TU, T 2 ,U2. 9. Let V be the vector space of all n x n matrices over the field K . Show that T ( A )= A B
-
BA,
where A belongs to V and B is any fixed n x n matrix, is a linear transformation. 10. Let V be the space of polynomials of degree three or less:
+ a15 + a252 + a353. The differential operator D = 2 maps V into V, since its effect on f ( 5 ) f(5)= a0
is to lower its degree by one.
300
LINEAR ALGEBRA
(i) Let
be the basis for
V.Find the matrix representation of D , that is, [D]B.
(ii) Show that the set B' = (gl,g2,g3,94} ,where 91 = fl, Q2 = fl g3 = f l
Q4 = f i
also forms a basis for
+ f2,
+ 2 f 2 + f3, + 3f2 4- 3f3 + f4,
V.
(iii) Find the matrix representation of D in terms of the basis B'
11. For a distribution of m point masses, the moment of inertia tensor is written as m
where k stands for the kth particle and i and j refer to the Cartesian coordinates of the kth particle. Consider 11 equal masses located at the points:
(i) find the principal moments of inertia, (ii) find the principal axes and plot them,
(iii) write the transformation matrix an orthogonal transformation, (iv) show explicitly that
S and show that it corresponds to
S diagonalizes the moment of inertia tensor,
(v) write the quadratic form corresponding to the moment of inertia, I , in both coordinates. 12. Consider a rigid body with uniform density of mass 2M in the shape of a 45" right triangle lying on the xy-plane and with a point mass M on the z-axis as shown in Figure 5.3. Find the principal moments of inertia and the eigenvectors. 13.
Find the eigenvalues and the eigenvectors of the following matrices:
PROBLEMS
Figure 5.3
301
Mass distribution in Problem 5.12.
(ii)
; t).
A = ( 2 0 0
14. Write the quadratic form
in matrix form Q = ,Ax and then (i) find the eigenvalues and the eigenvectors of A (Hint: one of the eigenvalues is 6), (ii) construct the transformation matrix that diagonalizes the above quadratic form and show that this transformation is orthogonal and diagonalizes Q.
15. Find the transformation that removes the xy term in 2’
+ z y + y2 = 16
and interpret your answer geometrically.
16. Given the quadratic form
Q
=2 4
+ 52; + 2 ~ +: 4 2 1 2 3 ,
302
LINEAR ALGEBRA
(i) express Q in matrix form as Q
= %AX,
(ii) find the eigenvalues and the eigenvectors of A, (iii) find the transformation matrix that diagonalizes the above quadratic form and show that this transformation is orthogonal,
(iv) write Q in diagonal form. 17. Schwarz inequality: Since the cosine of any angle lies between -1 and +1, it is clear that geometric vectors satisfy the Schwarz inequality:
1x.q5 1x11x1. Show the analog of this inequality for the general inner product spaces: I(u,v)I
2
I (u,u)(v,v).
18. Triangle inequality: For the inner product spaces prove the triangle inequality Iu
+
'UI
5
IUI
+ Ivl.
19. Show that the space of square integrable functions forms a vector space. 20. Prove the Baker-Hausdorf formula:
, i a ~ ~ - iA
-H
+ i[A,HI - -21[A,[A,HI] + . . . ,
where A and H are n x n matrices and [A,H]= AH
-
21. In bra-ket notation Hermitian operators are defined as
(qll At
1Q2)
Establish the correspondence
A19) = I@)
=
-
( % A I%)*.
(@I
=
("/At.
HA.
CHAPTER 6
SEQUENCES A N D SERIES
Sequences and series have found a wide range of applications in both pure and applied mathematics. A sequence is a succession of numbers or functions that may or may not converge to a limit value. A natural way to generate sequences is to use partial sums of series. In this regard, the convergence of a series is synonymous with the convergence of the corresponding sequence of partial sums. In this chapter, we introduce the basic concepts of sequences and series and discuss their most commonly used properties. We can name the evaluation of definite integrals, calculation of limits, and series approximation of functions to any desired level of accuracy as being among the most frequently used techniques with series. Series solutions of differential and integral equations and perturbative techniques are other important applications of series.
Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
3. S e l p k Bayin 303
304
SEQUENCES AND SERIES
6.1 SEQUENCES An infinite sequence is defined by assigning a number, S,, t o each positive integer, n, and then by writing them in a definite order as
{ S n > n,
= 1 , 2, . . .
.
(6.2)
A finite sequence has a finite number of elements:
{SI1 SZ . . . Sn).
(6.3)
In general a sequence is defined by a rule, that is, by its n t h element, which allows one to generate the sequence. For example, the following sequences:
{-1,1, -1,. . . } ,
(6.6)
where the n t h terms are given, respectively, as
can also be written as
A sequence {S,} is said to converge t o the number S, that is, lim S,
n+m
if for every positive number inequality
E
+ S,
we can find an N such that for all n
ISn - S I < E, n > N , is satisfied. The limit, when it exists, is unique.
(6.10)
> N the (6.11)
SEQUENCES
A
I
I
Figure 6.1
A 1
I l l
I
Convergent sequence.
I
I
I
I
I
I
I
I
'2
B
I
4'
3'
,
1
'5
1
S
B
1
I
PS
'6
Figure 6.2
Convergent monotonic increasing sequence.
Figure 6.3
Convergent monotonic decreasing sequence.
I I
I
1'
s2
Figure 6.4
A I
I
s3
I
3'
1
I
1
I
,
I
4'
I I
+s
7'
' 5 '6
Divergent monotonic increasing sequence.
I l l
1 1 l 1 1
BI
, , I
I , , , ,
I
s5
Figure 6.5
Figure 6.6
52'
'6
Divergent bounded sequence.
Divergent unbounded sequence.
PS
305
306
SEQUENCES AND SERIES
In Figures 6.1-6.6 we show various possibilities for sequences. Convergence of a sequence can also be decided without the knowledge of its limit by using the Cauchy criteria: Theorem 6.1. Cauchy criteria: For a given infinite sequence {S,} the limit lim
n-30
S, = S
(6.12)
exists, if and only if the following condition is satisfied: For every exists an integer N , such that for all n and m greater than N
IS,
-
SmI < E , n
E
> 0 there
# m,
(6.13)
is true. A sequence is called divergent if it is not convergent. Sequences in Equations (6.4) and (6.5) converge since they have the limits 1 lim - + 0, n-30 2,
(6.14) (6.15)
while the sequences in Equations (6.6) and (6.7) diverge. A monotonic increasing sequence satisfies (Fig. 6.2)
s1 5 s, 5 s3 5 . ”
5
s, 5 . . ’ .
(6.16)
Similarly, a monotonic decreasing sequence is defined as (Fig. 6.3) s 1
2 s, 2 s3 2 . . . 2 s, 2 ’ . . .
(6.17)
A sequence is bounded if there are two numbers, A and B , such that A 5 S, 5 B for all R. Sequences in Figure 6.4 and Figure 6.6 are unbounded. It can be proven that every bounded monotonic decreasing sequence is convergent and every unbounded sequence is divergent (Kaplan). Sequences that are bounded but not monotonic can diverge or converge. For example, the sequence {-1,1, -1,1,. . . , ( - l ) n , . . . }
(6.18)
diverges by oscillating between -1 and 1. Another sequence {cos?},
n = 1 , 2, . . . ,
(6.19)
oscillates between the values 0 , l and -1 :
(0,- l , O , 1,0, -1,. . . }.
(6.20)
SEQUENCES
307
In other words, a bounded series may diverge not because it does not have a limit value but because it has too many. For bounded sequences we define an upper limit as
GTL-ms, +U
(6.21)
> 0 there exists a number N such that (6.22) IS, - UI < & is true for infinitely many n > N and if no number larger than U has this
if for every
E
property. Similarly, we define the lower limit as
bEn+&I if for every
E
+
L,
(6.23)
> 0 there exists a number M such that
(6.24) IS, - LI < & true for infinitely many n > M and if no number less than L has this
is property. A bounded sequence has an upper and a lower bound such that U > L. For unbounded sequences we can take +cc and -cc as possible limit “values.” For a sequence of real numbers, {S,},we can say the following: (i) -
limn+mSn
> lim, ,=S,.
(6.25)
(ii) A sequence converges if and only if both limits are finite and equal:
L,-m~, = lim, ,,s,
= S.
(6.26)
G,+mS, = lim, ,,Sn = +cc.
(6.27)
(iii) A sequence diverges to +oo if and only if
(iv) A sequence diverges to -cc if and only if
E,+m~, = lim, , o o ~ n= -m.
(6.28)
(v) A sequence where we have
i&-mSn
# limn-,mSn
(6.29)
is said to oscillate. By simple graphing one can verify the following limits:
S,
= ( - l ) , ( ~-
S,
=1
i); h,+wS,
+ (-1),/3,;
s, = ( - 1 ) V ;
-
= -1,
limn-wS,
=
+I,
b,+wS,
= 1,
lim,+wS,
=
1,
h,-wS,
= -00,
-
lim,+wSn = +00,
S, = nsin2(inrr);
bn++-Sn = 0,
lim,-wSn
S, = sin(+n.rr);
h,-mSn
lim,-wSn
= -1,
= +m, = +l.
(6.30)
308
SEQUENCES AND SERIES
6.2
INFINITE SERIES
An infinite series of real numbers is defined as the infinite sum
(6.31) n=l
When there is no room for confusion, we also write an infinite series as an or as C,,a,. With each infinite sum we can associate a sequence of partial sums,
{ 4 , S 2 , S3, . . . , Sn, . . . } ,
(6.32)
c.2.
(6.33)
where
n
sn=
i=l
The series CEl ai is said to converge if the sequence of partial sums, { S n } , converges and the series diverges if the sequence of partial sums diverges. When the sequence of partial sums converges to S , we say the series converges and call S the sum of the series and write 00
Y ai = S.
(6.34)
u
i=l
In other words, n
lim
n-00
00
C ui C =
ai
= S.
(6.35)
i= 1
i=l
When the limit exists and if limn+00 Sn = fm,we say the series is (properly) divergent. Let 00
03
n=l
n=l
be two convergent series and let a and
P be two constants, then we can write
00
C [ a a n+ ~ b n=] n=l
=
Q:
C a n + P cbn, 00
00
n=l
n= 1
as, + p s b .
(6.36) (6.37)
ABSOLUTE AND CONDITIONAL CONVERGENCE
6.3
309
ABSOLUTE AND CONDITIONAL CONVERGENCE
A series, X u T Lis, called absolutely convergent if the series constructed by taking the absolute value of each term, lanl , converges. A series that does not converge absolutely but converges otherwise is called conditionally convergent. For a convergent series the nth term, a,, necessarily goes t o zero as n goes to infinity. This is obvious from the difference
c
s,
-
(6.38)
s , 1 = a,,
which by the Cauchy criteria must approach to zero as n goes to infinity. However, the nth term test, lim a,
(6.39)
+ 0,
7%-cc
is a necessary but not a sufficient condition for the convergence of a series. In what follows we present some of the most commonly used tests, which are also sufficient for absolutely convergent series or series with positive terms. For series with positive terms the absolute value signs are redundant.
6.3.1
Comparison Test
Given two infinite series, then the convergence of
6.3.2
C a,
and C b,, with b, > 0 and lanl 5 b, for all n, b, implies the absolute convergence of a,.
c
Limit Comparison Test
Assume that b,
> 0 for n = 1 , 2 , . . . and suppose that (6.40)
then the series C a, converges absolutely, if and only if C b, converges. When the limit limn+m la,/b,l + 0 is true, we can only conclude that the convergence of C b, implies the convergence of C lanl .
6.3.3
Integral Test
Let f(x) be a positive decreasing function in the interval [l,m] satisfying the following conditions: (i) f(x) is continuous in [I,m], (ii) f(x) is monotonic decreasing and limn-m f(x) + 0, (iii) f ( n ) = a,. Then the series C a, converges if the integral f(x)dx converges and diverges if the integral diverges.
Jy
310
SEQUENCES AND SERIES
6.3.4
Ratio Test
Given an infinite series, xu,, if (6.41) then for L<1 L >1 L =1
6.3.5
C a, C a,
is absolutely convergent, is divergent,
(6.42)
the test is inconclusive.
Root Test
For a given series,
a,, let (6.43)
then for
R<1 R>1 R =1
C a, C a,
is absolutely convergent, is divergent, the test is inconclusive.
(6.44)
The strategy in establishing the convergence of an unknown series is t o start with the simplest test, that is, the comparison test. In order to use the comparison test effectively, we need to have a t our disposal some series of known behavior. For this purpose we often use the harmonic series or the geometric series: Theorem 6.2. The harmonic series of order p :
. y $ = l + - +1 - + 1. . . 2p 3p rz= 1
,
(6.45)
converges for p > 1 and diverges for 0 < p 5 1. We can establish this result most easily by using the integral test. For p > 0 we write the integral
When p > 1 the above limit exists and takes the value l / ( p - 1), hence the series converges. For 0 <.p < 1 the integral diverges, hence the series also diverges. For p = 1 we have
dx = lim (lnb) = 00 b-ca
(6.47)
311
ABSOLUTE AND CONDITIONAL CONVERGENCE
aiid tlie harrnonic series diverges again. Finally for p < 0, t.hc n,th term rcadily establishcs the divergence. thus completing the proof. Theorem 6.3. The geometric series,
c 31.
z?
= 1
+ .I: f 2 2 +
'
. , '
(6.48)
11=0
converges for -1 < 1' < 1 as 7c
CJ7L
1 = -, 11.1
1t=O
1-x
< 1.
(6.49)
The proof can be rstablishcd by the ratio test. From the limit
(6.50) i i c ~ d1.r < 1 for convergence. The series obviously divcrgcs for the nth tcrrn test.
\vv
T =
1 by
Example 6.1. Convergence tests: Consider tht: scrim 3c:
n+2 4n2 n
C
+ + 1.
n=l
(6.51)
Since the nth term converges to zero. it is of no help. For large n the ttth term behaves as 1/4n. which suggcsts comparison with 1/4n. Sirice the inequality
+2 1 + n + 1 > -4 71
ri
4n2
(6.52)
is satisfied for all n 2 1, by comparison with the harmonic scries. C,,+. we conclude that the above srrics [Eq. (6.51)]also diverges. Note that for this case the ratio tmt fails:
Example 6.2. Convergence tests: The series (6.54) divcrges by the integral test:
(6.55)
312
SEQUENCES AND SERIES
Note that comparison with the harmonic series is of no help since
(6.56) Example 6.3. Convergence tests: For the series
xF 00
(6.57)
n=l
we can use the comparison test with the harmonic series: Inn 1 >-, n n
n = 3 , 4,...,
(6.58)
to conclude that the series is divergent.
Example 6.4. Convergence tests: Given the series 03
(6.59) n=l
where c is a positive constant, we can use the ratio test,
(6.60) to conclude that the series converges. Similarly, for the series
c 00
(6.61)
$1
n=l
we apply the ratio test,
(6.62) to conclude that the series diverges. In fact, in this case we can decide quicker via the n t h term test, nn lim n!
n-03
---f
en lim -
n-m&
+
0,
where we have used the Stirling's approximation n! 2~ nneCn& large n.
(6.63) for
Example 6.5. Convergence tests: We can write the series 00
(6.64) n=l
ABSOLUTE AND CONDITIONAL CONVERGENCE
313
as 00
(6.65) n=l
where the first series converges by the ratio test, 1/3"+l 1 lim -- - < 1 , n+cc 1/3" 3
(6.66)
while the second series converges by the integral test,
1"$=1.
(6.67)
Hence the original series [Eq. (6.64)] converges. We should make a note that the sum of two convergent series is convergent and the sum of a divergent series with a convergent series is divergent. However, the sum of two divergent series may be convergent.
Example 6.6. Convergence tests: Given the series
2 (&y,
(6.68)
n=2
the ratio test fails: n+1
(6.69) However, the root test, lim
n-cc
n & +n-cc lim n2 +1
+
0,
(6.70)
yields that the series is convergent. In general the root test is more powerful than the ratio test. When the ratio test is inconclusive, usually a result can be obtained via the root test. The above tests work for the absolutely convergent series or series with positive terms. For conditionally convergent series we can use the following theorem: Theorem 6.4. An alternating series, (6.71) n=O
converges when the following two conditions are met:
314
SEQUENCES AND SERIES
(i) a,, are monotonic decreasing, that is,
an+l 5 a,, n = 0 , 1 , 2 , . . . .
(6.72)
(ii)
lim an
n-zc
-7’
(6.73)
0.
Example 6.7. Alternating series: The alternating series
(6.74) In n converges since - is monotonic decreasing for n = 2 , 3 , . . . and the ri limit In n lim - + 0
n-cx
(6.75)
n
is true.
6.4
OPERATIONS W I T H SERIES
Wc have already pointed out that convergent series can be added, subtracted, and multiplied by a constant: M
-&4+ C(Pbn) =a n=l
?a=l
c +P c 00
00
a,
n= 1
bn,
(6.76)
n=l
where C and C b,, are two convergent series and CY and p are constants. In addition to these there are three other operations, grouping, rearrangement and multiplication, that can be performed on series. For absolutely convergent series or series with positive terms, grouping- that is, inserting parentheses- yields a new convergent series having the same sum as the original series. For the sum
x u n
= a1
n=l
+ a2 + a3 + + . . . , a4
(6.77)
iriscrting parentheses, namely (a1
+ a2 + a3)+ (a4 + a5 + ag) + (a7 + U S + u g ) + . . . ,
(6.78)
while preserving the order of terms, will only cause one t o skip some of the terms in the sequence of partial sums of the original series; that is,
OPERATIONS WITH SERIES
315
will be replaced by
If the first sequence [Eq. (6.79)] converges to S, then the second sequence [Eq. (6.80)] obtained by skipping some of the terms will also converge to the same value. If the original series is properly divergent, then insertion of parentheses yields again a properly divergent series. Since a series with positive terms is either convergent or properly divergent, we can insert parentheses freely without affecting the sum. Divergent series with variable signs can occasionally produce convergent series. For example, the divergent series 03
C(-l)'" = $1
-
1
+1
-
1
+1
(6.81)
-
n=O can be written as
where all the pairs in parentheses give zeros, thus yielding 1 as the sum. Another operation with the series is rearrangement of its terms. For absolutely convergent series converging to S, every rearrangement of its terms will also converge absolutely and will yield the same sum S. For conditionally convergent series, no such guarantees can be given. In fact, by a suitable arrangement of its terms a conditionally convergent series can be made t o converge to any value or even t o diverge. To this effect we present the following theorem without proof (Apostol): Theorem 6.5. Let C ai be a conditionally convergent series and also let n,b, a < b, be any two numbers in the closed interval [-m,+co]; then there exists a rearrangement C bi of C ai such that lim, ,~
-
b, = a and limn+co
b, = b.
(6.82)
To demonstrate Theorem 6.5 consider the alternating series
(6.83) which converges to ln2. We rearrange its terms as 03
C
1 3
-I+-+-+ -
(-Y+l
k=l
1 5
-I+-+ 1 - +1. . . - 3 5 -
1 2 2
1 1 ---... 4 6 [1+;+;+-..].
(6.84) (6.85)
316
SEQUENCES AND SERIES
We first add and then subtract obtain
4 [l+ + + . . . ] to the above series to
00
k=l
]
(6.86) (6.87)
= 0.
(6.88)
Note that when we rearrange the terms of a series, there is a one-to-one correspondence between the terms of the rearranged series and the original series. So far we have emphasized the central role that absolute convergence plays in operations with series. In fact, multiplication is another operation defined only for the absolutely convergent series. If C;=, a , and Cz=,b, are two absolutely convergent series, then their product, (Ca,) (Cb,) = C en, is also absolutely convergent. Furthermore, if C a, converges to A and C b, converges to B , then their product converges to AB. Since the product series, Cc,,, is absolutely convergent, a particular rearrangement of its terms is known as the Cauchy product:
(6.89) = aobo
+ (aob1 + U l b O ) + (aoba + U l b l
+
+ . ..
,
(6.90)
where in general we can write en as n.
c,
6.5
=xakbn-k, k=O
n = 0,1,2,.. .
(6.91)
SEQUENCES AND SERIES OF FUNCTIONS
So far wc have confined our discussion to sequences and series of numbers. One can also define sequences of functions as { f n ( ~ ) )n, =
1 , 2 , .. . ,
(6.92)
where f,,(s)are functions defined over the same interval, which can be infinite along the x-axis. Similarly, we can define series of functions as w
317
SEQUENCES AND SERIES OF FUNCTIONS
where the n t h partial sum is defined as n
Sn(x)= C
Ui(4.
(6.94)
i=l
At each point of their interval of definition, sequences or series of functions reduce to sequences and series of numbers, hence their convergence or divergence is defined the same way. However, there is one subtle question that needs to be answered, that is, a given series or sequence may converge a t each point of its interval, but does it converge at the same rate? For this we introduce the concept of uniform convergence: Definition 6.1. A series, u,(x),is said to converge uniformly to S ( x ) in the interval [LI, L2] if, for a given E > 0, an integer N can be found such that
c,"==,
ISn(x) - S(z)l < E
(6.95)
for all n 2 N and for all x in [Ll, Lz]. This definition of uniform convergence works equally well for sequences, that is, a sequence, { f n ( x ) } ,is uniformly convergent to f ( z ) in the interval [LI,Lz] if, for a given E > 0, an integer N can be found such that
(6.96)
Ifn(z) - f(x)l < E
for all n 2 N. In other words, the uniform convergence of a series is equivalent to the uniform convergence of the sequence of its partial sums. Note that in this definition, E basically stands for the error that one makes by approximating a uniformly convergent series with its n t h partial sum. Uniform convergence assures that the error will remain in the predetermined margin,
f(.)
-E
< f n ( x ) < f(.)
+E,
(6.97)
for all z in [ L I ,LZ].
Example.6.8. The geometric series: Consider the geometric series
n=O
where the nth partial sum is given as
(6.99) and the sum is 1 S ( x ) = -. l-x
(6.100)
318
SEQUENCES AND SERIES
Absolute value of the error, E,, committed by approximating the geometric series with its n t h partial sum is (6.101) Consider the interval [-$, $1. Since the largest error always occurs a t the end points, for a n error tolerance of E, = lop3, we need t o sum at least 7 terms: 1/3n 10-3 = 1 - 1/3' 3 3 , = -lo3, 2
(6.102) (6.103) (6.104)
L
n = 7.
(6.105)
As we approach the end points of the interval (-1, l), the number of terms to be added increases dramatically. For example, for the same error margin, E, = l o A 3 , but this time in the interval (-0.99,0.99), the number of terms to be added is 1146. It is seen from Equation (6.101) that as n goes to infinity, the error E,, which is also equal t o the remainder of the series, goes to zero: lim 7L-w
-11F I - 2 1
--f
0 for 1x1 < 1.
(6.106)
Adding 1146 terms assures us that the fractional error, E,/S, committed in the interval [-0.99,0.99] is always less than (6.107) We write the geometric series approximated by its first 1146 terms as
S(z)e s 1 1 4 6 ( 2 )
+ 0(10-5),
[-0.99,0.99].
(6.108)
In this notation, O(10V5) means that the partial sum s 1 1 4 6 ( X ) approximates the exact sum S(z) accurate up t o the fifth digit. 6.6
M-TEST FOR UNIFORM CONVERGENCE
A commonly used test for uniform convergence is the Weierstrass M-test or in short, the M-test:
PROPERTIES OF UNIFORMLY CONVERGENT SERIES
c,"==,
319
Theorem 6.6. Let u,(z) be a series of functions defined in the interval [LI,L2].If we can find a convergent series of positive constants, M,, such that the inequality
c,"==,
Iun(z)I 5
Mn
(6.109)
holds for all z in the interval [ L l ,L2],then the series uniformly and absolutely in the interval [ L l ,Lz].
c,"==, u,(x) converges Xn
Example 6.9. Weierstruss M-test: Consider the series C,"=,- which np ' reduces to the geometric series for p = 0. To find the values of p for which this series converges, we first apply the ratio test: (6.110) (6.111) =
1x1 3
(6.112)
which says that independent of the value of p the series converges absolutely for 1x1 < 1 . At one of the end points, x = 1, the series converges for p > 1 and diverges for 0 < p 5 1 [Eq. (6.45)]. At the other end point, x = -1, for p > 1 the series becomes alternating series, which converges by Theorem 6.4. To check for uniform convergence, we use the M-test with the series " 1
(6.113)
n=l
which converges for p > 1 Comparing with
c,"=,we write Xn
np
(6.114) which holds for 1x1 5 1. Hence we conclude that the series
C,"==, np
converges uniformly and absolutely in the interval 1x1 5 1 for p > 1. Keep in mind that uniform convergence and absolute convergence are two independent concepts, neither of them implies the other. Weierstrass M-test checks for both uniform and absolute convergence.
6.7
PROPERTIES OF UNIFORMLY CONVERGENT SERIES
Uniform convergence is very important in dealing with series of functions. Three of the most important and frequently used properties of uniformly convergent series are given below in terms of three theorems:
320
SEQUENCES AND SERIES
Theorem 6.7. The sum of a uniformly convergent series of continuous functions is continuous. In other words, if un(z)are continuous functions in the interval [ L l ,L2],then their sum, f ( x ) = u n ( x ) ,is also continuous providcd that the series C;=, u,(x) is uniformly Convergent in the interval [Ll,L21. From this theorem we see that any series of functions which has discontiniiities in the interval [ L l ,Lz]cannot be uniformly convergent. Theorem 6.8. A uniformly convergent series of continuous functions can be integrated term by term in the interval of uniform convergence. In other words, in the interval of uniform convergence, [ L l ,L z ] ,we can interchange the integral and the summation signs:
xr=l
(6.115) (6.116) dun Theorem 6.9. If uiz(x)= - are continuous for [ L l , L z ]and the se-
dx
x,"==,
CC ries CTLE1 u,,(x) converges to f ( x ) for [ L l , L 2 ] ,and the series uL(x) coilverges uniformly for [ L I , L z ] ,then we can interchange the order of the differentiation and the summation signs:
(6.117) cc
=
Cu:,(x).
(6.118)
n=l
Other operations like addition, and multiplication with a continuous function, h ( r ) , of uniformly convergent series, f ( x ) = z,"==,un(x) and g ( x ) = 30 C,I=l I J , ~ ( Scan ) , be performed as follows:
(6.120) n=l
where the results are again uniformly convergent in [ L I L2]. , Example 6.10. Uniformly convergent series: Differentiability and integrability of uniformly convergent series gives us a powerful tool in obtaining new series sums from the existing ones. For example, we use the
POWER SERIES
321
geometric series 1 1-2
-=
00
Exn 1+ z + 2 =
+ . . . , lZ/
+23
< 1;
(6.121)
n=O
differentiating once gives us the series
d "
"
n=O
(1 - x)2
-
cn2n-1
(6.122)
n=O
00
--
d
- 1 -
+ 22 + 32' + . . . ,
1x1 < 1.
(6.123)
n= 1
Differentiating once more, we obtain 1
-
d "C
~ l d - = l
n=l
$ (nzn-l) ,
00
1x1 < 1, (6.124)
n=l
Similarly, by integration we obtain the series
Jc"dz[$---]
=l;'.[Fxn] n=O
00
=Eld s z n , "
(6.126)
n=O
22 23 -= II: + ++ . . . , 1x1 < 1. (6.127)
2
n=O
6.8
2
3
P O W E R SERIES
An important and frequently encountered class of uniformly convergent series is the power series of the form
c 00
c,,(5 - ICo)n = co
+
Cl(Z
- 20)
+
c2(2 - 20)'
+ ... ,
(6.128)
n=O
where C O , c1, c2, . . . are constants and ICO is called the point of expansion. Every power series has a radius of convergence defined about the point of expansion. To use the ratio test, we first obtain the limit
(6.130)
322
SEQUENCES AND SERIES
Hence the power series converges absolutely for (6.131) or for Ix - 201
< R,
(6.132)
where R is called the radius of convergence, which is defined as the limit (6.133)
The radius of convergence can be anything including zero and infinity. When R = 0, the series converges only at the point 2 0 . When R = 00, the series converges along the entire x-axis. If two power series, C,"==, c,,(z - xo), and CT=ob,L(x- ZO),, converge to the same function, f ( z ) , then the two series are identical, that is, c, = b, for all n. In other words, the power series representation of a function is unique. The uniform convergence of power series for the interior of any r satisfying 0 < r < R, where R is the radius of convergence, follows from the Ill-test by comparing C,"o=o c,(x - xo), with Cr=oM,,, Adn = Ic,I rn, where /c,(x - Z O ) ~ Ic,I rn for 0 < r < R. Example 6.11. Radius of convergence: Consider the following power series generated from the geometric series in Example 6.10:
We find their radius of convergences by using the ratio test. For the first series we write the limit (6.135) hence the series converges for 1x1 < 1 and R the ratio test gives
=
1. For the second series
(6.136) hence the radius of convergence is 1 and the series converges for 1x1 < 1. For the last series, we write the limit (6.137)
POWER SERIES
323
where the radius of convergence is again obtained as 1 and the series converges for 1x1 < 1. In other words, the series generated from the geometric series by successive differentiation and integration have the same radius of convergence. Actually, as the following theorem states, this is in general true for power series:
Theorem 6.10. Within the radius of converge, a power series can be differentiatedand integrated as often as one desires. The differentiated or the integrated series also converges uniformly with the same radius of convergence as the original series. Example 6.12. Binomial formula: Using the geometric series 00
1
(6.138)
l-x
n=O
after k-fold differentiation and using the formula
d kxn - n!xn-k dxk
(n- k)!'
(6.139)
we obtain
(6.140) which is the well-known binomial formula for (1 - x ) - ~ In . general the binomial formula is given as
(6.141)
where
(3
are called the binomial coefficients defined as
m!
(6.142)
If m is negative or noninteger, the upper limit of the series [Eq. (6.141)] is infinity. Two special cases,
(6.143) and
(6.144) are frequently used in obtaining approximate expressions.
324
6.9
SEQUENCES AND SERIES
TAYLOR SERIES A N D MACLAURIN SERIES
Consider a uniformly convergent power series, 03
f ( ~ =)
C
C,(Z
- 2 0 )n
, zo - R < z < zo
+ R,
(6.145)
n=O
with the radius of convergence R. Performing successive differentiations, we write
+ C1(Z f’(z)= c 1 + 2c2(z f(.)
= CO
-
+
f”(z)= 2 ~ 2
Icg)
+ + + 3c3(5 - + . . . C a ( Z - ZO)2
C3(Z
2 0 )2
-ZO)
+ ‘.
6 ~ 3 -( 20) ~
2 0 )3
+ ...
1
3
,
(6.146)
+
Based on Theorem 6.10, these series converge for zo - R < J: < 20 R Evaluating these derivatives at J: = ZO, we obtain the coefficients, en, as
.
(6.147)
c,
= f(n)(J:O)/n!,
thus obtaining the Taylor series representation of f(x) as
(6.148) When we set zo= 0, Taylor series reduces t o Maclaurin series: f(Z)=
6.10
c
031 -pqO)Z‘“. n. n=O
(6.149)
I N D E T E R M I N A T E FORMS A N D SERIES
It is well known that L’H6pital’s rule can be used to find the limits of indeterminate expressions like or However, this method frequently requires multiple differentiations, which makes it rather cumbersome to implement. In such situations, series expansions not only can help us t o find the limits of the
g.
325
INDETERMINATE FORMS AND SERIES
indeterminate forms but also can allow us t o write approximate expressions t o any desired level of accuracy. Using the series expansions of elementary functions, we can usually obtain the series expansion of a given expression, which can then be written t o any desired level of accuracy. Example 6.13. Indeterminate forms: Consider the limit
ex sin x
(6.150)
g.
Using the Maclaurin series of the elemenwhich is indeterminate as tary functions e x , s i n x , and cosx, we first write the numerator and the denominator in power series as
+ + + '.'] [x $ + $ + . " 1
ex sin x - [l + x $ $ cosx - ex 11 - ..Z "+ . . . ] 2! + . 4! +2-
+x2
3
-x - x 2
-
22 6
-
-
-
[I + x +
2-. .. 30
25 - . . . '
+ $ + .. .1 (6.151)
120
After a formal division of the numerator with the denominator, we obt ain the series ex sin x - -1 - -x 1 2 + -23 1 + ... , (6.152) cosx - ex 6 6 which yields the limit lim X+O
ex sin x cosx - ex
-+ -1.
(6.153)
This result can also be obtained by the L'H6pital's rule. However, the series method accomplishes a lot more than this. It not only gives the limit but also allows us to approximate the function in the neighborhood of x = 0 to any desired level of accuracy. For example, for small x we ex sin x can approximate as cosx - ex
ex sin x cos x - ex
=
1 -1 - -x2 6
+0(~3).
(6.154)
Example 6.14. Indeterminate forms: Let us now consider the function x2
+ (cos2 x
2 2 [COSZ x
which is as
-
-
1)
11 '
(6.155)
in the limit as x goes to 0. We first write the above expression 1 [cos2x - 11
+-1
22'
(6.156)
326
SEQUENCES AND SERIES
The first term, which is divergent as x 1 cos2z-1
0, can be written as
---f
1
-
...I
pie+f:4+ 2!
-
-z2
4!
2
-1
1 + -.4 - 226 + . . .’ 3
(6.157) (6.158)
45
which after a formal division gives the series expansion
1 -1 - -z2 1 1 - _[cos2z - 11 z2 3 15
+ ... .
(6.159)
Substituting Equation (6.159) into Equation (6.156), the divergent piece, -l/z2, is canceled by 1/z2to yield the finite result in the neighborhood of :1: = 0 as
2 + (cos2z - 1) 2 2 [cos2z -
11
-
1 1 - -z2 3 15
-_
+ 0(~4).
PROBLEMS
1. Find the limits of the following sequences:
(i) (ii)
{
(3n
-
I
q4+ J S T S ,
2+n3+7n4
{-},n=1,2
,....
2. Show the divergence by the n t h term test (i)
CT=lcos -.n237r
3. Show the divergence by the comparison test
n = 1 , 2 ,... .
(6.160)
PROBLEMS
327
4. Check the convergence of the following alternating series:
5 . Determine the convergence or the divergence of the following series:
(iii)
n2 C;==,n! + 3 '
6. Check the convergence of the following series and verify your answers by an other method:
328
SEQUENCES AND SERIES
7. Find the range of x for which the following series converge. Check the end points also.
(ii)
C:='=, n!zn.
8. The nth partial sum of a certain series is given as
s,(x)
= n2x/(1
+ n3x2).
For which one of the intervals, [-1,171 and [0.5,1],does this series converges uniformly. Explain.
9. Investigate the uniform convergence of the series with the partial sums
S,
= n2xePnx.
10. Check the uniform convergence of
11. Verify the formula
kx = 1 + - + k(k + 1lx2 + .. . (1 - X)k 1 1.2 k ( k 1). . ( k 72 - 1)%, . > 1x1 < 1. 1 . 2 . . .72 1
+ +
'
+
+
,
,
12. Using the binomial expansion, show the expansions
and
13. Find the Maclaurin expansions of sin2 x, cos2 x, and sinhx. Find their radius of convergence and show that they converge uniformly within this radius.
PROBLEMS
329
14. Find the Maclaurin expansion of 1/ cos x and find its radius of convergence.
15. Find the limit lim
Z+O
xe" sin 22 cos x - 2ex '
Using the series expansions of elementary functions about x = 0, find an approximate expression good to the fourth order in the neighborhood of x = 0.
16. Find the interval of convergence of the following series:
17. Evaluate the following definite integral to three decimal places:
18. Expand t,he following functions:
6)
1
(ii)
f(x) = G>
in Taylor series about x
=
f(x) =
x+2 (x 3)(x + 4) '
+
1 and find their radius of convergence.
19. Expand the following function in Maclaurin series:
20. Obtain the Maclaurin series
Show that the series converges for x = -1 and hence verify
c 00
log2
=
(- 1)"+' n n=l
330
SEQUENCES AND SERIES
21. Expand 1/x2 in Taylor series about x = 1. 22. Find the first three terms of the Maclaurin series of the following functions: (i)
f ( x ) = cos[ln(x
(ii)
sin x f ( x ) = -.
+ I)].
X
(iii)
1
f(x) =
JiT7G.
23. Using the binomial formula, write the first two nonzero terms of the series representations of the following expressions about x = 0:
6)
f(x) =
+d r n.
(4x - 1)* 1+ x 3
+7x4
24. Use the L’HBpital’s rule to find the limit lim
2-0
d2T-G 4x3 - 3x2
and then verify your result by finding an appropriate series expansion.
25. Find the limit lim . z+o 5x3
+ 2x2
by using the L’H6pital’s rule and then verify your result by finding an appropriate series expansion.
26. Evaluate the following limits and check your answers by using Maclaurin series: (i)
limx,o
1 - ex -
(ii)
limx-o
:[
X
-
1-
ex
1
-
1
.
CHAPTER 7
COMPLEX NUMBERS AND FUNCTIONS
As Gamow mentioned in his book, One Two Three ... Infinity: Facts and Speculations of Science, the 16th-century Italian mathematician Cardan is the first brave scientist t o use the mysterious number called the imaginary i with the property i2 = -1 in writing. Cardan introduced this number t o express the roots of cubic and quartic polynomials, albeit with the reservation that it is probably meaningless or fictitious. All imaginary numbers can be written as proportional to the imaginary unit i . It is also possible to define hybrid numbers, a ib, which are known as complex numbers. Complex analysis is the branch of mathematics that deals with the functions of complex numbers. A lot of the theorems in the real domain can be proven considerably easily with complex analysis. Many branches of science and engineering, like control theory and signal analysis, make widespread use of the techniques of complex analysis. As a mathematical tool, complex analysis offers tremendous help in the evaluation of series and improper integrals encountered in physical theories. With the discovery of quantum mechanics, it also became clear that complex numbers are not just convenient computational tools but also have a fundamental bearing on the inner workings of the
+
Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
3. SelGuk Bayin 331
332
COMPLEX NUMBERS AND FUNCTIONS
universe. In this chapter, we introduce the basic elements of complex analysis, that is, complcx numbers and their functions.
7.1 T H E ALGEBRA OF C O M P L E X N U M B E R S A wcll-known result from mathematical analysis states that a given quadratic equation,
ax2 always has two roots,
x1
+ bx + c = 0, a , b, c E R,
and
52,
(7.1)
and can be factored as
(x - q ) ( x - x2) = 0.
(7.2)
A general cxpression for the roots of a quadratic equation exists, and it is given by the well-known formula
When the coefficients satisfy the inequality b2 - 4ac 2 0, both roots are real. However, when b2 - 4ac < 0, no roots can be found within the set of real numbers. Hence, the number system has to be extended to include a new kind of number, the imaginary i with the property
Now the roots can be expressed in terms of complex numbers as x1,2 =
-b & i d2a
, 4ac - b2 > 0.
(7.5)
In general a complex number, z , is written as 2 =
x +iy,
where .c and y are two real numbers. The real and the imaginary parts of z arc written, respectively, as R e z = x and I m z = y.
(7.7)
Complcx numbers are also written as ordered pairs of real numbers:
z = (x,y).
(7.8)
When y = 0, that is, I m z = 0, we have a real number, z = x,and when x = 0, that is, R e z = 0, we have a pure imaginary number, z = i y . Two complex nurnbers:
THE ALGEBRA OF COMPLEX NUMBERS
333
and
(7.10)
2 2 = (X2,Y2),
are equal if and only if their real and imaginary parts are equal:
(7.11) (7.12)
51 = x 2 ,
y1
= y2.
Zero of the complex number system, = 0, means x = 0 and y = 0. Two complex numbers, zland 2 2 , can be added or subtracted by adding or subtracting their real and imaginary parts separately as 21
+ iy1) f ( 2 2 + i y 2 ) = (51 * 2 2 + i ( Y 1 f Y2)).
(7.13) (7.14)
f 2 2 = (51
Two complex numbers, zland 2122
can be multiplied by first writing
22,
=
(21
+
iYl)(X2
+
(7.15)
iy2)
and then by expanding the right-hand side formally as 2122 = 2 1 x 2
+
i51y2
+
iy122
+
i2YlY2.
(7.16)
Using the property i2 = -1, we finally obtain the product of two complex numbers as 2122 = ( 5 1 5 2 - YlY2)
+
i(x1y2
+
y122).
(7.17)
The division of two complex numbers,
(7.18) can be performed by multiplying and dividing the right-hand side by 21 - 2 1 22
22
+ iy1 +
-iy2
(7.19)
iy2 ' 5 2 - iy2
+ y1y2 + i(YlX2 21y2) 4 + Y22 2 1 x 2 + YlY2 Y1X2 x1y2
-
21x2
-
(
-
22
(x2-iy2):
-
-
4+y;
x;+y;
(7.20)
)
(7.21)
Division by zero is not defined. The following properties of algebraic operations on complex numbers can be shown to follow from the above properties: 21
+
22 = z 2
21,552 21
+
Zl,
= 2221,
+ ( 2 2 + 23) = (21 + 2 2 ) + 23, 21 ( 2 2 2 3 ) = ( 2 1 Z 2 ) z3i 21(Z2
+
23) = 2 1 z 2
+
Z123.
(7.22) (7.23) (7.24) (7.25) (7.26)
334
COMPLEX NUMBERS AND FUNCTIONS
The complex conjugate, z * , or simply the conjugate of a complex number, z , is defined as
z* = x
-
iy,
(7.27)
In general the complex conjugate of a complex quantity is obtained by reversing the sign of the imaginary part or by replacing i with -2. Conjugation of complex numbers has the following properties:
(z1
+ z2)* = z; + z;,
(z1z2)* = z r z ; ,
(7.28) (7.29) (7.30)
z + z* = 2 R e z = 22, z - z* = 2 i I m z = 2iy.
(7.31) (7.32)
The absolute value, ( z (, also called the modulus, is the positive real number
(7.33) (7.34) Modulus has the following properties:
121
= ->
(7.36)
IzI = Iz*I 1
(7.37)
2
(z(
= zz*.
(7.38)
Triangle inequalities, 121
+ z21 I bll + 1221
(7.39)
and Iz1 - z21
2 I l Z l l - Iz211,
(7.40)
derive their name from the geometric properties of triangles and they are very useful in applications. The set of all complex numbers with the above algebraic properties forms a field and is shown as @. Naturally, the set of real numbers, R,is a subfield of @. A geometric representation of complex numbers is possible by introducing the complex z-plane, where the two orthogonal axes, x- and y-axes, represent the real and the imaginary parts of a complex number, respectively (Fig. 7.1).
THE ALGEBRA OF COMPLEX NUMBERS
335
Yf X
Figure 7.1
Complex z-plane.
From the z-plane it is seen that the modulus of a complex number, IzI = T , is equal to the length of the line connecting the point z to the origin 0. Using the length of this line, T , and the angle that it makes with the positive x-axis, 8,which is called the argument of z , usually written as argz, we introduce the polar representation of complex numbers as
z
= r(cos8+
The two representations, z(x, y) and equations
Z(T,
isin8).
(7.41)
O ) , are related by the transformation
x = T cos 0,
(7.42)
y = rsine
and with the inverse transformations r=
Jm',
e = tan-'
(z)
.
(7.43)
Using the polar representation, we can write the product of two complex numbers, z1 = T I (cos 01
+ i sin 81)
(7.44)
and z2 =
r2(cos02+ i s i n Q 2 ) ,
(7.45)
as
ziza = T ~ T Z [ ( C O CS ~O ~S ~ Z- sin81 sin&) = rlrZ[cos(81
+ 8 2 ) + isin(81 + &)I.
+ i(sin81 cos02 + cosel sinez)] (7.46) (7.47)
336
COMPLEX NUMBERS AND FUNCTIONS
In other words, the modulus of the product of two complex numbers is equal to the product of the moduli of the multiplied numbers, lz1z2l
(7.48)
= lz1lIz21 = 7-17-2,
and the argument of the product is equal to the sum of the arguments: argzlz, = argzl
+ argz2 = 81 + 82.
(7.49)
In particular, when a complex number is multiplied with i , its effect on z is to rotate it by 7r/2 : (7.50) (7.51) Using Equation (7.47), we can write
. -t &)I. (7.52) Consequently, if z1 = z2
= ...=
z,
= r(cos8
+ i s i n Q ) ,we obtain
zn = rn [cosn8 + i sin 1281. When
7- =
(7.53)
1, this becomes the famous DeMoivre’s formula:
[cos6
+ i sin 81, = cos 71.8+ i sin no.
(7.54)
The ratio of two complex numbers can be written in polar form as z1 7-1 = -[cos(81 22
7-2
- 6,)
+ i sin(O1 - Q2)],
7-2
# 0.
(7.55)
As a special case of this, we can write z-l
= r-l [cos8 - i sin 81, 7-
# 0,
(7.56)
which leads to z - n = r P T[cos L n8 - i
sin nQ],r
# 0,n > 0;
thus DeMoivre’s formula is also valid for negative powers.
7.2
ROOTS OF A COMPLEX NUMBER
Consider the polynomial
(7.57)
ROOTS OF A COMPLEX NUMBER
zn which has n roots, z sentation of z ,
~
zo = 0,
= z : ' ~ , in
z
n = positive integer,
337
(7.58)
the complex z-plane. Using the polar repre-
= r(cosQ+isinQ),
(7.59)
zo = rO(cos&+isinQO),
(7.60)
we can write Equation (7.58) as
+
rn (cos nQ+ i sin no)= T O(cos 80 i sin 0,) ,
(7.61)
which offers the solutions 1/ n r=rO , nQ+27rk=Qo, k = 0 , 1 , 2,... .
(7.62)
The first equation gives the all equal moduli of the n roots, while the second equation gives their arguments. When k is an integer multiple of n,no new roots emerge. Hence we obtain the arguments of the n roots as (7.63) These roots correspond to the n vertices, (7.64) of an n-sided polygon inscribed in a circle of radius r;ln. Arguments of thesc roots are given as
("",((""."'>.(""+-) n
n
n
2T 2
n
,...,
n
("n
27r. ( n - 1)
n
which arc separated by
ae,
27r
= -.
n
(7.66)
In Figure 7.2 we show the 5 roots of the equation z5
-
1 = 0,
(7.67)
338
COMPLEX NUMBERS AND FUNCTIONS
Figure 7.2
where n
=5
Roots of '2
-
1 = 0.
and 1 = cos 0
+ i sin 0.
(7.68) (7.69)
60 = 0.
(7.70)
20 =
Hence TO
=
1 and
Using Equation (7.66), this gives the moduli of all the roots as 1 and their arguments: arg zi = arg zi-l i = 1,. . . , 5 , as
+ F,
argzl
= 0,
271 + -, 5 271 271. 471 argz3 = - + - = 5 5 5' 4~ 27l 67l argZ4 = - + - = 5 5 5 ' 67l 27l 8.ir argz5 = - + - = -. 5 5 5 argz2
If rn,and
12
=O
(7.71)
are positive integers with no common factors, then we can write
where k = 0 , 1 , 2 , . . . , n - 1.
INFINITY AND THE EXTENDED COMPLEX PLANE
339
7.3 INFINITY AND THE EXTENDED COMPLEX PLANE In many applications we need to define a number,
z
+
00
00,with
the properties
+ z = 00, for all finite z ,
= 00
(7.74)
and
z
‘00= 00
The number
03,
.z
z # 0 but including z = 00.
= 00, for all
(7.75)
which represents infinity, allows us to write
5=0O, z f 0 , 0
(7.76)
and
z
00
=o,
z#O0.
(7.77)
In the complex z-plane, @, there is no point with these properties, thus we introduce the extended complex plane, which includes this new point, 03, called the infinity:
c,
@. = c +{m}.
(7.78)
A geometric model for the members of the extended complex plane is possible. Consider the three-dimensional unit sphere S :
+ x; + xi = 1.
x:
(7.79)
For every point on S, except the north pole N at (O,O,l), we associate a complex number X I +ix2
z=
1 - x3 This is a one-to-one correspondence with the modulus squared,
(7.80)
(7.81) which, after using Equation (7.79), becomes
IzI 2 = -.1+x3
1 - x3
(7.82)
Solving this equation for x3 and using Equations (7.79) and (7.80) t o write and 5 2 , we obtain
XI
z 1=
+ z* 1 + jzI2 ’ z
~
z - z* x2
=i
23 =
~
(1 + lz12)
1zI2 - 1 1zI2 1’
+
(7.83) (7.84)
(7.85)
340
COMPLEX NUMBERS AND FUNCTIONS
T'
1N
Figure 7.3
Riemann sphere and stereorgraphic projections.
This is a one-to-one correspondence with every point of the z-plane with every point, except (0,0, l),on the surface of the unit sphere. The correspondence with the extended z-plane can be completed by identifying the point ( O , O , 1) with m. Note that from Equation (7.85) the lower hemisphere, 2 3 < 0, corresponds to the disc IzI < 1, while the upper hemisphere, 5 3 > 0, corresponds to its outside > 1. We identify the z-plane with the zlsz-plane, that is, the equatorial plane, and use 2 1 - and the 22-axes as the real and the imaginary axes of the z-plane, respectively. In function theory the unit sphere is called the Riemann sphere. If we write z = 2 iy and use Equations (7.83)-(7.85), we can establish the ratios
(zI
+
(7.86) which with a little help from Figure 7 . 3 shows that the points z , 2,and N lie on a straight line. In Figure 7 . 3 and Equation (7.86) the point N is the north pole, (O,O,l), of the Riemann sphere, the point 2 = ( Z ~ , Q , I C ~is) the point at which the straight line originating from N pierces the sphere and finally, z is the point where the straight line meets the equatorial plane, which defines the z-plane. This is called the stereographic projection. Geometrically a stereographic projection maps a straight line in the z-plane into a circle on S , which passes through the pole and vice versa. In general, any circle on the sphere corresponds to a circle or a straight line in the z-plane. Since a circle on the sphere can be defined by the intersection of a plane,
+ bxz +
U Z ~
C Z ~=
d, 0 5 d
< 1,
(7.87)
INFINITY AND THE EXTENDED COMPLEX PLANE
341
with the sphere
x:
+ + 2 52
2 23 =
(7.88)
1,
using Equations (7.83)-(7.85), we can write Equation (7.87) as U(Z
+ z*)
-
bi(z - z * )
+
~ ( 1 21 ~1) =
d(lzI2
+ 1)
(7.89)
or as
+
(d - c ) ( x 2 y2) - 2ax
-
2by
+ d + c = 0.
(7.90)
For d # c this is the equation of a circle, and it becomes a straight line for d = c. Since the transformation is one-to-one, conversely, all circles and straight lines on the z-plane correspond to stereographic projections of circles on the Riemann sphere. In stereographic projections there is significant difference between the distances on the Riemann sphere and their projections on the z-plane. Let ( x 1 , 2 2 ,x3) and (x:,xi,z j ) be two points on the sphere, that is,
x: xi2
+ x2 + 2
2 23 =
+ xi2 + x:
1,
(7.91)
= 1.
(7.92)
We write the distance, d ( z , z’), between these points as [ d ( z , z’)I2 = (XI =2 -2 ( x 4
+(
~ 2 +(23 + x2h. + 2 3 2 ; ) . -
-
(7.93) (7.94)
Using the transformation equations [Eqs. (7.83)-(7.85)] we can write the corresponding distance in the z-plane as (7.95)
If we take one of the points on the sphere as the north pole, N , that is, z’ Equation (7.95) gives
= 03,
(7.96)
+
*
Note that the point z = x iy, where x and/or y are infinity, belongs to the z-plane. Hence it is not the same point as the 00 introduced above.
342
COMPLEX NUMBERSAND FUNCTIONS
Figure 7.4
Graph of f(z)
7.4 COMPLEX FUNCTIONS We can define a real function, f , as a mapping that returns a value, f ( x ) , for each point, x , in its domain of definition:
f :x
---f
f(x).
(7.97)
Graphically, this can be conveniently represented by introducing the rectangular coordinates with two perpendicular axes called the x- and the y-axes. By plotting the value, y = f ( x ) , that the function returns along the y-axis directly above the location of x along the x-axis, we obtain a curve as shown in Figure 7.4 called the graph of f(x). Complex-valued functions are defined similarly as relations that return a complex value for each z in the domain of definition:
f :z
+ f(z).
(7.98)
Analogoiis to real functions, we introduce a dependent variable w and write a complex function as w =f(z).
(7.99)
Since both dependent and independent variables have real and imaginary parts, z=z+iy, w =u+iv,
(7.100) (7.101)
it is generally simpler to draw w and z on separate planes. Now the function w = f ( ~ )which , gives the correspondence of the points in the z-plane to the points in the w-plane, is called mapping or transformation. This allows us to view complex functions as operations that map curves and regions in their
343
COMPLEX FUNCTIONS
Figure 7.5
The w-plane.
domain of definition to other curves and regions in the w-plane. For example, the function
w=d m + i y
(7.102)
+
maps all points of a circle in the z-plane, x2 y2 = c2, c 2 0, to u = c and v = y in the w-plane. Since the range of y is -c 5 y 5 c, the interior of the circle is mapped into the region between the lines -u 5 T: 5 u and u = c in the w-plane (Fig. 7.5). The domain of definition, D , of f means the set of values that z is allowed to take, while the set of values, w = f ( z ) ,that the function returns is called the range of w. A function is called single-valued in a domain D if it returns a single value, w, for each z in D. From now on, we use the term function only for the single-valued functions. Multiple-valued functions like z1I2 or logz can be treated as single-valued functions by restricting them to one of their allowed values in a specified domain of the z-plane. Domain of definition of all polynomials,
f ( z ) = a,zn
+ an-lzn--l +
' ' '
+ ao,
(7.103)
is the entire z-plane, while the function
1 f(z) =
is undefined at the points z = 35,. Each function has a specific real, u(x,y), and an imaginary,
w =f = u(2,y)
+ i U ( Z , y).
(7.104)
~ ( 2 y ),,
part:
(7.105) (7.106)
344
COMPLEX NUMBERS AND FUNCTIONS
Consider f ( z ) = z 3 . We can write
(7.107)
f ( z ) = z3
= z 2z = ((7: = =
(7.108)
+iy)2(z + i y )
[(x2- y2)
(7.109)
+ i(2zy)](z+ iy)
(7.110)
[(2- y2)z - 2xy2]+ i[y(xc”- y2) + 2z2y],
(7.111)
u ( z ,y) = ( x 2- y2)z - 2xy2
(7.112)
+ 2Z2Y.
(7.113)
thus obtaining
and u ( 5 ,y) = y(z2 - v 2 )
For w = sinz, the u((7:,y) and the v(x,y) functions are simply obtained from the expression w = sin(x + iy) as w = sin x cosh y i cos x sinh y.
+
7.5
LIMITS AND CONTINUITY
Since a complex function can be written as
w
=
4 2 , Y)
+
iV(Z,
v),
(7.114)
its limit can be found in terms of the limits of the two real functions u(x, y) and u(z,y). Thus the properties of the limits of complex functions can deduced from the properties of the limits of real functions. Basic results can be summarized in terms of the following theorems: Theorem 7.1. Let f ( z ) = u ( z ,y )
The limit of f ( z ) at
+ iu(lc,y), z = + i y and zo = zo + iyo,
20
(7:
(7.115)
exists, that is, lim f ( z ) = uo Z-ZO
+ iwo,
(7.116)
if and only if
lirn
u ( x , y )= uo,
(7.117)
(s>Y)-(.o,Yo)
(7.118) Theorem 7.2. If fl(z)and exist at 20:
f2(z)
are two complex functions whose limits
(7.119) (7.120)
DIFFERENTIATION IN THE COMPLEX PLANE
345
then the following limits are true: = w l + w2,
(7.121)
= w1w2,
(7.122) (7.123)
The continuity of complex functions can be understood in terms of the continuity of the real functions u and u. Theorem 7.3. A given function f (2) is continuous at zo if and only if all the following three conditions are satisfied:
(i) (ii) (iii)
f(z0)
lim,,,, lim,,,,
exists, f ( z ) exists, f ( z ) = f(zo).
(7.124)
This theorem implies that f ( ~is)continuous if and only if u(z, y) and u(x,y) are continuous.
7.6
DIFFERENTIATION I N T H E COMPLEX PLANE
As in real analysis, the derivative of a complex function at a point, z , in its domain of definition is defined as (7.125) Nevertheless, there is a fundamental difference between the differentiation of complex and real functions. In the complex z-plane a given point z can be approached from infinitely many different directions (Fig. 7.6). Hence a meaningful definition of derivative should be independent of the direction of approach. If we approach the point z parallel to the real axis, AZ = A x , we obtain the derivative
(7.126)
du --+i--. ax -
dv dx
(7.128)
346
COMPLEX NUMBERS AND FUNCTIONS
Z
Az iAy
w
0
Figure 7.6
X
Differentiation in the complex plane.
On the other hand, if z is approached parallel t o the imaginary axis, iAy, the derivative becomes
=
-2-
.du+ dv dY dY
Az =
(7.131)
or
df
-
.du
dv
- - -- 2dz dy dy
(7.132)
For a meaningful definition of derivative, these two expressions should agree. Hence giving us the conditions for the existence of derivative at z as
--ax d y ’
du
dv
(7.133)
dv dX
-%-.
(7.134)
--
.du dY
These are called the Cauchy-Riemann conditions. Note that choosing the direction of approach first along the x- and then along the y-axes is a matter of calculational convenience. A general treatment will also lead t o the same conclusion. Cauchy-Riemann conditions shows that the real and the imaginary parts of a differentiable function are related. In summary, the Cauchy-Riemann conditions have t o be satisfied for the derivative t o exist at a given point. However, as we shall see, in general they are not the sufficient conditions.
DIFFERENTIATION IN THE COMPLEX PLANE
347
Example 7.1. Cauchy -Riemann conditions: Consider the following simple function: f(z)=z
2
,
(7.135)
We can find its derivative as the limit (7.136) =
lim
6-0
{ f (2z + 6))
(7.137) (7.138)
= 22.
If we write the function, f ( z ) = z 2 , as
+ i229,
f ( z ) = ( 2- y2)
(7.139)
we can easily check that the Cauchy-Riemann conditions are satisfied everywhere in the z-plane: dU
dV
dX
dYdU
dX We now consider the function
dY
- = 22 = -,
(7.140)
dV - 2y = --
f(.)
’
2
(7.141)
= IZI
and write the limit (7.142)
+ S)(z* + S*)
-
zz*]
(7.143) (7.144)
6-0
At the origin, z = 0, regardless of the direction of approach, the above limit exists and its value is equal to 0; thus we can write the derivative (7.145) For the other points, if 6 approaches zero along the real axis, S = t, we obtain
dz
(7.146) E+O
=z*+z
(7.147)
348
COMPLEX NUMBERS AND FUNCTIONS
and if 6 approaches zero along the imaginary axis, 6 = i e , we find dz
(7.148) iE-O
= z* - z .
(7.149) 2
Hence the derivative of f ( z ) = IzI does not exist except at z = 0. In fact, the Cauchy-Riemann conditions for f ( z ) = 1zI2 are not satisfied,
dv
dU
-=2x#-=O,
dY dv - = 0 # - - =dU 2 dX dY
(7.150)
dX
unless z
(7.151)
Y>
= 0.
Example 7.2. Cauchy-Riemann conditions: Consider the function

f(z) = [(x³ − y³) + i(x³ + y³)]/(x² + y²),  z ≠ 0,
f(0) = 0.    (7.152)

At z = 0 we can easily check that the Cauchy-Riemann conditions are satisfied:

∂u/∂x(0,0) = lim_{Δx→0} u(Δx, 0)/Δx = 1 = ∂v/∂y(0,0),    (7.153)
∂u/∂y(0,0) = lim_{Δy→0} u(0, Δy)/Δy = −1 = −∂v/∂x(0,0).    (7.154)

If we calculate the derivative using the limits in Equations (7.127) and (7.130), we find

df(0)/dz = lim_{Δx→0} { [u(Δx, 0) − u(0,0)]/Δx + i [v(Δx, 0) − v(0,0)]/Δx }    (7.155)
         = 1 + i    (7.156, 7.157)

and

df(0)/dz = lim_{iΔy→0} { [u(0, Δy) − u(0,0)]/(iΔy) + i [v(0, Δy) − v(0,0)]/(iΔy) }    (7.158)
         = lim_{Δy→0} (−Δy + iΔy)/(iΔy)    (7.159)
         = 1 + i,    (7.160)

so the two axis limits agree, as the Cauchy-Riemann conditions require. However, along the line Δy = Δx the difference quotient tends to (1 + i)/2 ≠ 1 + i.
In other words, even though the Cauchy-Riemann conditions are satisfied at z = 0, the derivative f′(0) does not exist. That is, the Cauchy-Riemann conditions are necessary but not sufficient for the existence of the derivative. The following theorem (for a formal proof see Brown and Churchill) gives the sufficient condition for the existence of f′(z):

Theorem 7.4. If u(x, y) and v(x, y) are real- and single-valued functions with continuous first-order partial derivatives at (x₀, y₀), then the Cauchy-Riemann conditions at (x₀, y₀) imply the existence of f′(z₀).

What happened in Example 7.2 is that in order to satisfy the Cauchy-Riemann conditions at (0, 0), all we needed was the existence of the first-order partial derivatives of u(x, y) and v(x, y) at (0, 0). However, Theorem 7.4 not only demands the existence of the first partial derivatives of u and v at a given point but also needs their continuity at that point. This means that the first-order partial derivatives should also exist in the neighborhood of a given point for the function to be differentiable.

7.7 ANALYTIC FUNCTIONS
A function is said to be analytic at z₀ if its derivative, f′(z), exists not only at z₀ but also at every other point in some neighborhood of z₀. Similarly, if a function is analytic at every point of some domain D, then it is called analytic in D. All polynomials,

f(z) = a₀ + a₁z + ⋯ + aₙzⁿ,    (7.161)

are analytic everywhere in the z-plane. Functions analytic everywhere in the z-plane are called entire functions. Since the derivative of

f = |z|²    (7.162)

does not exist anywhere except at the origin, it is not analytic anywhere in the z-plane. If a function is analytic at every point in some neighborhood of a point z₀, except the point itself, then the point z₀ is called a singular point. For example, the function

f(z) = 1/(z − 2)    (7.163)

has a singular point at z = 2. If two functions are analytic, then their sum and product are also analytic. Their quotient is analytic except at the zeros of the denominator. If we let f₁(z) be analytic in domain D₁ with its range contained in the domain D₂ where f₂(z) is analytic, then the composite function

f₂(f₁(z))    (7.164)
is also analytic in the domain D₁. For example, since the functions

f₁(z) = z² + 2 and f₂(z) = exp(z) + 1    (7.165)

are entire functions, the composite functions

f₂(f₁(z)) = exp(z² + 2) + 1    (7.166)

and

f₁(f₂(z)) = (exp(z) + 1)² + 2    (7.167)

are also entire functions.
7.8 HARMONIC FUNCTIONS

Given an analytic function, f(z) = u + iv, defined in some domain D of the z-plane, the Cauchy-Riemann conditions,

∂u/∂x = ∂v/∂y,    (7.168)
∂u/∂y = −∂v/∂x,    (7.169)

are satisfied at every point of D. Differentiating the first condition with respect to x and the second condition with respect to y, we get

∂²u/∂x² = ∂²v/∂x∂y,  ∂²u/∂y² = −∂²v/∂y∂x.    (7.170)

For an analytic function the first-order partial derivatives of u and v are continuous; hence the mixed derivatives, ∂²v/∂x∂y and ∂²v/∂y∂x, are equal, and by adding the two equations above we obtain

∂²u/∂x² + ∂²u/∂y² = 0.    (7.171)

That is, the real part of an analytic function, u(x, y), satisfies the two-dimensional Laplace equation in the domain of definition D. Similarly, differentiating Equation (7.168) with respect to y and Equation (7.169) with respect to x and combining the results, we obtain

∂²v/∂x² + ∂²v/∂y² = 0.    (7.172)

In other words, the real and the imaginary parts of an analytic function satisfy the two-dimensional Laplace equation. Functions that satisfy the Laplace equation in two dimensions are called harmonic functions. They could be used either as the real or the imaginary part of an analytic function. Pairs of harmonic functions, (u, v), connected by the Cauchy-Riemann conditions are called conjugate harmonic functions.
Example 7.3. Conjugate harmonic functions: Given the real function

f(x, y) = x³ − 3y²x,    (7.173)

it can be checked easily that it satisfies the Laplace equation

∂²f/∂x² + ∂²f/∂y² = 6x − 6x = 0.    (7.174)

Hence it is harmonic and can be used to construct an analytic function. Using it as the real part, u = x³ − 3y²x, we can find its conjugate pair as follows: Using the first Cauchy-Riemann condition, ∂u/∂x = ∂v/∂y, we write

∂v/∂y = 3x² − 3y²,    (7.175)

which can be integrated immediately to get

v(x, y) = 3x²y − y³ + Φ(x),    (7.176)

where Φ(x) is arbitrary at this point. We now use the second Cauchy-Riemann condition, ∂v/∂x = −∂u/∂y, to obtain an ordinary differential equation for Φ(x):

6xy + Φ′(x) = 6yx,    (7.177)
Φ′(x) = 0,    (7.178)

solution of which gives Φ(x) = C₀. Substituting this into Equation (7.176) yields v(x, y) as

v(x, y) = 3x²y − y³ + C₀.    (7.179)

It can easily be checked that v is also harmonic.
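The same construction can be carried out symbolically. The sketch below (our own illustration) follows exactly the integration steps of the example:

```python
# Constructing the harmonic conjugate of u = x^3 - 3*y^2*x with SymPy,
# following the steps of Example 7.3 (illustrative sketch).
import sympy as sp

x, y = sp.symbols("x y", real=True)
u = x**3 - 3*y**2*x

# u is harmonic:
assert sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)) == 0

v = sp.integrate(sp.diff(u, x), y)        # from v_y = u_x
Phi = sp.Function("Phi")
ode = sp.Eq(sp.diff(v + Phi(x), x), -sp.diff(u, y))   # from v_x = -u_y
print(sp.dsolve(ode))                     # Phi(x) = C1, a constant
print(sp.expand(v))                       # 3*x**2*y - y**3
```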
Example 7.4. Cauchy-Riemann conditions in polar coordinates: In polar representation a function can be written as

f(z) = u(r, θ) + iv(r, θ).    (7.180)

Using the transformation equations

x = r cos θ and y = r sin θ,    (7.181)

we can write the Cauchy-Riemann conditions as

∂u/∂r = (1/r) ∂v/∂θ,    (7.182)
(1/r) ∂u/∂θ = −∂v/∂r.    (7.183)
Example 7.5. Derivative in polar representation: Let us write the derivative of an analytic function, f(z) = u(r, θ) + iv(r, θ), in polar coordinates as

df/dz = (∂u/∂r)(dr/dz) + (∂u/∂θ)(dθ/dz) + i(∂v/∂r)(dr/dz) + i(∂v/∂θ)(dθ/dz).    (7.184)

Substituting the Cauchy-Riemann conditions [Eqs. (7.182) and (7.183)] into Equation (7.184), we write

df/dz = (∂u/∂r)(dr/dz) − r(∂v/∂r)(dθ/dz) + i(∂v/∂r)(dr/dz) + ir(∂u/∂r)(dθ/dz)    (7.185)
      = (∂u/∂r)[dr/dz + ir dθ/dz] + i(∂v/∂r)[dr/dz + ir dθ/dz].    (7.186)

Since z = re^{iθ}, we can write

dz = e^{iθ} dr + ire^{iθ} dθ.    (7.187)

Hence the expression inside the square brackets in Equation (7.186) is

dr/dz + ir dθ/dz = e^{−iθ},    (7.188)

which, when substituted into Equation (7.186), gives

df/dz = e^{−iθ} (∂u/∂r + i ∂v/∂r).    (7.189)

Following similar steps in rectangular coordinates, we obtain

df/dz = ∂u/∂x + i ∂v/∂x.    (7.190)
7.9 BASIC DIFFERENTIATION FORMULAS

If the derivatives w′, w₁′, and w₂′ exist, then the basic differentiation formulas can be given as

dc/dz = 0, c ∈ ℂ, and dz/dz = 1,    (7.191)

d(cw)/dz = c dw/dz,    (7.192)

d(w₁ + w₂)/dz = dw₁/dz + dw₂/dz,    (7.193)

d(w₁w₂)/dz = w₁ dw₂/dz + w₂ dw₁/dz,    (7.194)

d(w₁/w₂)/dz = [w₂ dw₁/dz − w₁ dw₂/dz]/w₂², w₂ ≠ 0,    (7.195)

dw(w₂)/dz = (dw/dw₂)(dw₂/dz),    (7.196)

dzⁿ/dz = nz^{n−1}, n > 0, and z ≠ 0 when n < 0 integer.    (7.197)
7.10 ELEMENTARY FUNCTIONS

7.10.1 Polynomials
The simplest analytic function different from a constant is z. Since the product and the sum of analytic functions are also analytic, we conclude that every polynomial of order n,

Pₙ(z) = a₀ + a₁z + ⋯ + aₙzⁿ, aₙ ≠ 0,    (7.198)

is also an analytic function. All polynomials are also entire functions. The fundamental theorem of algebra states that when n is positive, Pₙ(z) has at least one root. This simple-sounding theorem, which was the doctoral dissertation of Gauss in 1799, has far-reaching consequences. Assuming that z₁ is a root of Pₙ, we can reduce its order as

Pₙ(z) = (z − z₁)P_{n−1}(z).    (7.199)

Similarly, if z₂ is a root of P_{n−1}(z), we can reduce the order one more time to write

Pₙ(z) = (z − z₁)(z − z₂)P_{n−2}(z).    (7.200)

Cascading like this, we eventually reach the bottom of the ladder as

Pₙ(z) = aₙ(z − z₁)(z − z₂)⋯(z − zₙ).    (7.201)

In other words, a polynomial of order n has n, not necessarily all distinct, roots in the complex plane. The significance of this result becomes clear if we remember how complex algebra was introduced in the first place. When equations like

z² + 1 = 0    (7.202)

are studied, it is seen that no roots can be found among the set of real numbers. Hence the number system has to be extended to include the complex numbers. We now see that in general the set of polynomials with complex coefficients do not have any other roots that are not included in the complex plane, ℂ; hence no further extension of the number system is necessary.
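The fundamental theorem of algebra can be seen in action numerically. The following sketch (our own illustration) recovers all n roots of a degree-n polynomial:

```python
# All n complex roots of a degree-n polynomial (illustrative sketch).
import numpy as np

# z^2 + 1 = 0 has the purely imaginary roots +i and -i.
print(np.roots([1, 0, 1]))          # [0.+1.j  0.-1.j]

# A random degree-5 polynomial still has exactly 5 roots in the z-plane.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(6)
print(len(np.roots(coeffs)))        # 5
```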
7.10.2 Exponential Function
Let us consider the series expansion of the exponential function with a pure imaginary argument:

e^{iy} = Σ_{n=0}^∞ (iy)ⁿ/n!.    (7.203)

We write the even and the odd powers separately:

e^{iy} = Σ_{n=0}^∞ (−1)ⁿ y^{2n}/(2n)! + i Σ_{n=0}^∞ (−1)ⁿ y^{2n+1}/(2n+1)!.    (7.204, 7.205)

Recognizing the first and the second series as cos y and sin y, respectively, we obtain

e^{iy} = cos y + i sin y,    (7.206)

which is also known as Euler's formula. Multiplying this with the real number eˣ, we obtain the exponential function

e^z = eˣ(cos y + i sin y).    (7.207)
Since the functions u = eˣ cos y and v = eˣ sin y have continuous first partial derivatives everywhere and satisfy the Cauchy-Riemann conditions, using Theorem 7.4 we conclude that the exponential function is an entire function. Using Equation (7.190), namely

df/dz = ∂u/∂x + i ∂v/∂x,    (7.208)

we obtain the derivative of the exponential function as the usual expression

de^z/dz = e^z.    (7.209)

Using the polar representation in the w-plane, u = ρ cos φ, v = ρ sin φ, we write

w = e^z = ρ(cos φ + i sin φ).    (7.210)

Comparing this with

e^z = eˣ(cos y + i sin y),    (7.211)

we obtain

ρ = eˣ and φ = y,    (7.212)

that is,

|e^z| = eˣ and arg e^z = y.    (7.213)

Using the polar representation for two points in the w-plane,

e^{z₁} = ρ₁(cos φ₁ + i sin φ₁),    (7.214)
e^{z₂} = ρ₂(cos φ₂ + i sin φ₂),    (7.215)

we can easily establish the following relations:

e^{z₁} e^{z₂} = e^{z₁+z₂},    (7.216)
e^{z₁}/e^{z₂} = e^{z₁−z₂},    (7.217)
(e^z)ⁿ = e^{nz}.    (7.218)

In terms of the exponential function [Eq. (7.206)], the polar representation of z,

z = r(cos θ + i sin θ),    (7.219)

can be written as

z = re^{iθ},    (7.220)

which is quite useful in applications. Another useful property of e^z is that for an integer n we obtain

e^{2nπi} = (e^{2πi})ⁿ    (7.221)
         = 1;    (7.222)

hence we can write

e^{z+2nπi} = e^z e^{2nπi} = e^z.    (7.223)

In other words, e^z is a periodic function with the period 2πi. The series expansion of e^z is given as

e^z = 1 + z/1! + z²/2! + ⋯    (7.224)
    = Σ_{n=0}^∞ zⁿ/n!.    (7.225)
7.10.3 Trigonometric Functions
Trigonometric functions are defined as

cos z = (e^{iz} + e^{−iz})/2,    (7.226)
sin z = (e^{iz} − e^{−iz})/(2i).    (7.227)

Using the series expansion of e^z [Eq. (7.225)], we can justify these definitions as the usual series expansions:

cos z = 1 − z²/2! + z⁴/4! − ⋯,    (7.228)
sin z = z − z³/3! + z⁵/5! − ⋯.    (7.229)

Since e^{iz} and e^{−iz} are entire functions, cos z and sin z are also entire functions. Using these series expansions, we obtain the derivatives:

d sin z/dz = cos z,    (7.230)
d cos z/dz = −sin z.    (7.231)

The other trigonometric functions are defined as

tan z = sin z/cos z,  cot z = cos z/sin z,    (7.232)
sec z = 1/cos z,  csc z = 1/sin z.    (7.233)
The usual trigonometric identities are also valid in the complex domain:

sin² z + cos² z = 1,    (7.234)
sin(z₁ ± z₂) = sin z₁ cos z₂ ± cos z₁ sin z₂,    (7.235)
cos(z₁ ± z₂) = cos z₁ cos z₂ ∓ sin z₁ sin z₂,    (7.236)
sin(−z) = −sin z,    (7.237)
cos(−z) = cos z,    (7.238)
sin(π/2 − z) = cos z,    (7.239)
sin 2z = 2 sin z cos z,    (7.240)
cos 2z = cos² z − sin² z.    (7.241)

7.10.4 Hyperbolic Functions
Hyperbolic cosine and sine functions are defined as

cosh z = (e^z + e^{−z})/2,    (7.242)
sinh z = (e^z − e^{−z})/2.    (7.243)

Since e^z and e^{−z} are entire functions, cosh z and sinh z are also entire functions. The derivatives

d sinh z/dz = cosh z,    (7.244)
d cosh z/dz = sinh z,    (7.245)

and some commonly used identities are given as

cosh² z − sinh² z = 1,    (7.246)
sinh(z₁ ± z₂) = sinh z₁ cosh z₂ ± cosh z₁ sinh z₂,    (7.247)
cosh(z₁ ± z₂) = cosh z₁ cosh z₂ ± sinh z₁ sinh z₂,    (7.248)
sinh(−z) = −sinh z,    (7.249)
cosh(−z) = cosh z,    (7.250)
sinh 2z = 2 sinh z cosh z.    (7.251)

Hyperbolic and trigonometric functions can be related through the formulas

cos z = cos(x + iy) = ½(e^{ix−y} + e^{−ix+y})    (7.252)
      = ½e^{−y}(cos x + i sin x) + ½e^{y}(cos x − i sin x)    (7.253)
      = [(e^{y} + e^{−y})/2] cos x − i [(e^{y} − e^{−y})/2] sin x    (7.254)
      = cos x cosh y − i sin x sinh y,    (7.255)

and similarly,

sin z = sin x cosh y + i cos x sinh y.    (7.256)

From these formulas we can deduce the relations

sin(iy) = i sinh y,    (7.257)
cos(iy) = cosh y.    (7.258)
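Relations (7.255)–(7.258) can be spot-checked with the standard library (our own sketch):

```python
# Verifying cos z = cos x cosh y - i sin x sinh y and Eqs. (7.257)-(7.258)
# at a sample point (illustrative sketch).
import cmath, math

x, y = 0.8, -1.3
z = complex(x, y)
rhs = math.cos(x)*math.cosh(y) - 1j*math.sin(x)*math.sinh(y)
print(abs(cmath.cos(z) - rhs))                 # ~1e-16

print(cmath.sin(1j*y) - 1j*math.sinh(y))       # ~0: sin(iy) = i sinh y
print(cmath.cos(1j*y) - math.cosh(y))          # ~0: cos(iy) = cosh y
```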
7.10.5 Logarithmic Function
Using the polar representation, z = re^{iθ}, we can define a logarithmic function,

w = log z,    (7.259)

as

log z = ln r + iθ, r > 0.    (7.260)

Since r is real and positive, the natural logarithm ln r is well defined. Since the points with the same r but with the arguments θ ± 2nπ, n = 0, 1, ..., correspond to the same point in the z-plane, log z is a multivalued function; that is, for a given point in the z-plane there are infinitely many logarithms, which differ from each other by integral multiples of 2πi:

wₙ = log z = ln |z| + i arg z    (7.261)
   = ln r + i(θ ± 2nπ), n = 0, 1, ..., 0 ≤ θ < 2π.    (7.262)

The value w₀ corresponding to n = 0 is called the principal value or the principal branch of log z. For n ≠ 0, wₙ gives the nth branch value of log z. For example, for z = 5, z = −1, and z = 1 + i we obtain the following logarithms:

wₙ = log 5 = ln 5 + i arg 5 = ln 5 + i(0 ± 2nπ),    (7.263, 7.264)
wₙ = log(−1) = ln 1 + i(π ± 2nπ) = i(π ± 2nπ),    (7.265, 7.266)
wₙ = log(1 + i) = ln √2 + i(π/4 ± 2nπ).    (7.267)

For a given value of n, the single-valued function

wₙ = log z = ln |z| + i arg z    (7.268)
   = ln r + i(θ ± 2nπ), 0 ≤ θ < 2π,    (7.269)

with the u and the v functions given as

u = ln r,    (7.270)
v = θ ± 2nπ,    (7.271)

has continuous first-order partial derivatives,

∂u/∂r = 1/r, ∂u/∂θ = 0, ∂v/∂r = 0, ∂v/∂θ = 1,    (7.272)

which satisfy the Cauchy-Riemann conditions [Eqs. (7.182) and (7.183)]; hence Equation (7.269) defines an analytic function in its domain of definition. Using Equation (7.189), we can write the derivative of log z as the usual expression

d log z/dz = e^{−iθ}(∂u/∂r + i ∂v/∂r)    (7.273)
           = e^{−iθ}/r    (7.274)
           = 1/z.    (7.275)

Using the definition in Equation (7.269), one can easily show the familiar properties of the log z function:

log z₁z₂ = log z₁ + log z₂,    (7.276)
log(z₁/z₂) = log z₁ − log z₂.    (7.277)
Regardless of which branch is used, we can write the inverse of w = logz as ew
(7.278)
= ,lnz -
e(ln~+iO)
-
,lnr
(7.279)
iB
e
(7.280) (7.281) (7.282)
= re20 = z.
Hence elogz
(7.283)
= 2;
that is, the exp and the log functions are inverses of each other. 7.10.6
Let
7n
Powers of Complex Numbers
Let m be a fixed positive integer. Using Equation (7.269), we can write

m log z = m ln r + im(θ ± 2nπ), n = 0, 1, ....    (7.284)
Using the periodicity of e^z [Eq. (7.223)] and Equation (7.262), we can also write

log z^m = log[r^m e^{im(θ±2nπ)}], n = 0, 1, ...,    (7.285)
        = ln r^m + im(θ ± 2nπ).    (7.286)

Comparing Equations (7.284) and (7.286), we obtain

m log z = log z^m.    (7.287)

Similarly, for a positive integer p we write

(1/p) log z = (1/p) ln r + (i/p)(θ ± 2nπ)    (7.288)
            = ln r^{1/p} + (i/p)(θ ± 2kπ), k = 0, 1, ..., (p − 1).    (7.289)

We can also write

log z^{1/p} = log[r^{1/p} e^{(i/p)(θ±2nπ)}], n = 0, 1, ...,    (7.290)
            = ln r^{1/p} + (i/p)(θ ± 2kπ), k = 0, 1, ..., (p − 1).    (7.291)

Note that due to the periodicity of the exponential function, e^{(i/p)(θ±2nπ)} yields no new root when n is an integer multiple of p. Hence in Equations (7.289) and (7.291) we have defined a new integer k = 0, 1, ..., (p − 1). Comparing Equations (7.289) and (7.291), we obtain the familiar expression

(1/p) log z = log z^{1/p}.    (7.292)

In general we can write

(m/p) log z = log z^{m/p},    (7.293)

or

z^{m/p} = e^{(m/p) log z}.    (7.294)

In other words, the p distinct values of log z^{m/p} give the number z^{m/p}. For example, for the principal value of z^{5/3}, that is, for k = 0, we obtain

z^{5/3} = e^{(5/3) log z}    (7.295)
        = e^{(5/3)(ln r + iθ)}    (7.296)
        = r^{5/3} e^{i(5/3)θ}.    (7.297)
All three of the branches are given as

z^{5/3} = r^{5/3} e^{i(5/3)(θ±2kπ)}, k = 0, 1, 2.    (7.298)

We now extend our discussion of powers to cases where the power is complex:

w = z^c or w = z^{−c},    (7.299)

where c is any complex number. For example, for i^{−i} we write

i^{−i} = exp(−i log i)    (7.300)
       = exp{−i[ln 1 + i(π/2 ± 2nπ)]}    (7.301)
       = exp(π/2 ± 2nπ), n = 0, 1, ....    (7.302)

Replacing m/p in Equation (7.294) with c, we write

z^c = e^{c log z}.    (7.303)

Using the principal value of log z, we can write the derivative

dz^c/dz = (d/dz) e^{c log z}    (7.304)
        = e^{c log z} (c/z)    (7.305)
        = c e^{(c−1) log z}.    (7.306)

The right-hand side is nothing but cz^{c−1}; hence we obtain the formula

dz^c/dz = cz^{c−1},    (7.307)

which also allows us to write

log z^c = c log z, z ≠ 0.    (7.308)

Example 7.6. Complex exponents: Let us find i^i for the principal branch:

i^i = e^{i log i}    (7.309)
    = e^{i[ln 1 + iπ/2]}    (7.310)
    = e^{−π/2}.    (7.311)

As another example we find the principal branch of (1 + i)^i:

(1 + i)^i = e^{i log(1+i)}    (7.312)
          = e^{i[ln √2 + iπ/4]}    (7.313)
          = e^{i ln √2} e^{−π/4}    (7.314)
          = 2^{i/2} e^{−π/4}    (7.315)
          = e^{(i/2) ln 2} e^{−π/4}.    (7.316)
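Python's complex power operator uses exactly the principal branch of Eq. (7.303), so the results of Example 7.6 can be reproduced directly (our own sketch):

```python
# Principal values of complex powers via z**c = exp(c*log z) (illustrative sketch).
import cmath

print(1j**1j, cmath.exp(-cmath.pi/2))       # i^i = e^(-pi/2), both real

w = cmath.exp(1j*cmath.log(1 + 1j))         # (1+i)^i from the definition
print(w, (1 + 1j)**1j)                      # both agree
```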
7.10.7 Inverse Trigonometric Functions

Using the definition

z = sin w = (e^{iw} − e^{−iw})/(2i)    (7.317)

along with the quadratic equation it implies,

e^{2iw} − 2iz e^{iw} − 1 = 0,    (7.318)

we solve for e^{iw} to obtain

e^{iw} = iz + (1 − z²)^{1/2},    (7.319)

which allows us to write

w = −i log[iz + (1 − z²)^{1/2}].    (7.320)

Thus the inverse sine function is defined as

sin⁻¹ z = −i log[iz + (1 − z²)^{1/2}],    (7.321)

which is a multiple-valued function with infinitely many branches. Similarly, one can write the inverses

cos⁻¹ z = −i log[z + (z² − 1)^{1/2}],    (7.322)

tan⁻¹ z = (i/2) log[(i + z)/(i − z)].    (7.323)
PROBLEMS
1. Evaluate the following complex numbers:

(i) (1 + i) + i(1 + i√2),
(ii) 4/[(1 − i)(1 + i)(2 − i)],
(iii) (2 + 3i)/[(3 − 2i)(1 + i)],
(iv) (2, 1)(1, −2),
(v) (1 − i)⁴,
(vi) (1, 1)/[(2, −1)(1, 3)(2, 2)].

2. Evaluate the numbers z₁ + z₂, z₁ − z₂, and z₁z₂ and show them graphically when

(i) z₁ = (1, 1), z₂ = (3, −1),
(ii) z₁ = (x₁, y₁), z₂ = (x₁, −y₁),
(iii) z₁ = (1, 3), z₂ = (4, −1),
(iv) z₁ = (1 − i)², z₂ = 1 + 2i.
zlzz
= 2221.
PROBLEMS
4. Prove the following associative laws:
5. Prove the distributive law
+
zl(22
23)
+ zlz3.
= zlz2
6. Find
(z* + 2i)*,
(i)
(ii)
(2iz)*,
(iii)
2 (1- i ) ( l +i)*’
(iv)
[(I - i ) ‘ ] * .
7. Use the polar form to write
+ 24,
(i)
(z*
(ii)
(1 - i)(l i)*
(iii)
(iz+ I)*,
(iv)
(1 - i ) * ( 2 i).
5
+
+
8. Prove
and (24)*
= (2*)4
9. Prove the following: 121221 =
lz1l 1x21 ,
’
363
364
COMPLEX NUMBERS AND FUNCTIONS
10. Prove and interpret the triangle inequalities:
11. Describe the region of the z-plane defined by (i) (ii) (iii)
1< Imz
< 2,
Iz - I1 2 2 Iz / z - 41 > 3.
+ 11,
12. Show that the equation
describes a circle. 13. Express l+i
[-]
in terms rectangular and polar coordinates. 14. Firid the roots of z4
+ i = 0.
15. If z 2 = i , find z .
16. Show that
17. Show that tanh( 1
+
TZ)
e2 - 1 e2 + 1’
=-
+ 2i) with respect
18. Fiiid the complex number that is symmetrical to (1 t,o the line a r g z = a0.
19. Find all the values of (i)
(ii) (iii) (iv) (v) (vi)
z
=
(3~)l/~,
z = (1 + i3)3/2, z = (-1) 1/3 , z = (1- z). 1/3 , z = (-8)1/3, z = (1p4.
and show them graphically.
20. Derive Equations (7.83)-(7.85), which are used in stereographic projections.

21. Show the ratios [Eq. (7.86)] used in stereographic projections and verify that the points z, Z, and N lie on a straight line.

22. Derive Equation (7.95), used in stereographic projections to express the distance between two points on the Riemann sphere in terms of the coordinates on the z-plane.

23. Establish the relations

e^{z₁} e^{z₂} = e^{z₁+z₂} and (e^z)ⁿ = e^{nz}.

24. Establish the sum

1 + e^{iθ} + e^{2iθ} + ⋯ + e^{inθ} = (1 − e^{i(n+1)θ})/(1 − e^{iθ})

and then show

(i) 1 + cos θ + cos 2θ + ⋯ + cos nθ = cos(nθ/2) sin[(n + 1)θ/2]/sin(θ/2),

(ii) sin θ + sin 2θ + ⋯ + sin nθ = sin(nθ/2) sin[(n + 1)θ/2]/sin(θ/2),
< 0 < 27r
25. Show that the following functions are entire: (i) (ii)
f(z) = 22
-
z),
f(z) = -sinycoshz+icosysinhz,
(iii) f ( z ) (iv)
+ y + i(2y
+ isinrc),
= epY(cosI(:
f ( z ) = ezz2.
26. Find the singular points of
(ii)
3.2 z (z 2
+1 + 2) ’
and explain why these function are analytic everywhere except at these points. What are the limits of these functions a t the singular points?
27. Show that the functions
f ( z ) = 22y
+ iy
and
f ( z ) = e2Y(cos17:
+ i sin 2y)
are analytic nowhere.
28. Show that for an analytic function, f ( z ) harmonic, that is,
=u
+ iv,the imaginary part is
29. Show that f ( 2 , 1J)
= y2 - x2
+ 22
and
f(z, y)
= cosh II: cos y
are harmonic functions and find their conjugate harmonic functions.
30. In rectangular coordinates show that the derivative of an analytic function can be written as

df/dz = ∂u/∂x + i ∂v/∂x

or as

df/dz = ∂v/∂y − i ∂u/∂y.

31. Show that

e^{1+3πi} = −e.

32. Find all the values of z such that

e^z = −2,  e^z = 1 + i√3,  e^{(2z+1)} = 1.

33. Explain why the function

f(z) = (2z² + 3) e^z

is entire.

34. Justify the definitions

cos z = (e^{iz} + e^{−iz})/2, sin z = (e^{iz} − e^{−iz})/(2i),

and find the inverse functions cos⁻¹ z and sin⁻¹ z.

35. Prove the identities

sin(z₁ + z₂) = sin z₁ cos z₂ + cos z₁ sin z₂,
cos(z₁ + z₂) = cos z₁ cos z₂ − sin z₁ sin z₂.

36. Find all the roots of

(i) cos z = 2,
(ii) sin z = cosh 2.

37. Find the zeros of sinh z and cosh z.
38. Evaluate

(i) (1 + i)^i,
(ii) (−2)^{iπ},
(iii) (1 + i√3)^{1+i}.

39. What are the principal values of (1 − i)^i, i^{2i}, (−i)^{2+i}?

40. In polar coordinates, show that the derivative of an analytic function can be written as

df/dz = e^{−iθ}(∂u/∂r + i ∂v/∂r).
CHAPTER 8
COMPLEX ANALYSIS
Line integrals, power series, and residues constitute an important part of complex analysis. Theorems of complex integration are usually concise but powerful. Many of the properties of analytic functions are quite difficult to prove without the use of these theorems. Complex contour integration also allows us to evaluate various difficult proper or improper integrals encountered in physical theories. Just as in real analysis, in complex integration we distinguish between definite and indefinite integrals. Since differentiation and integration are inverse operations of each other, indefinite integrals can be found by inverting the known differentiation formulas of analytic functions. Definite integrals evaluated over continuous, or at least piecewise continuous, paths are not just restricted to analytic functions and thus can be defined exactly by the same limiting procedure used to define real integrals. Most complex definite integrals can be written in terms of two real integrals. Hence, in their discussion we heavily rely on the background established in Chapters 1 and 2 on real integrals. One of the most important places where the theorems of complex integration are put to use is in power series representations of analytic functions. In this regard, Laurent series play an important part in applications, which also allows us to classify singular points.
8.1 CONTOUR INTEGRALS
Each point in the complex plane is represented by two parameters; hence, in contrast to their real counterparts, ∫_{x₁}^{x₂} f(x) dx, complex definite integrals are defined with respect to a path or contour, C, connecting the upper and the lower bounds of the integral as

∫_C f(z) dz.    (8.1, 8.2)

If we write a complex function as f(z) = u(x, y) + iv(x, y), the above integral can be expressed as the sum of two real integrals:

∫_C f(z) dz = ∫_C [u dx − v dy] + i ∫_C [v dx + u dy].    (8.3)

Furthermore, if the path C is parameterized in terms of a real parameter t:

z(t) = x(t) + iy(t),    (8.4)

where the end points, t₁ and t₂, are found from

z(t₁) = z₁, z(t₂) = z₂,    (8.5, 8.6)

the complex integral in Equation (8.2) can also be written as

∫_C f(z) dz = ∫_{t₁}^{t₂} f(z(t)) [x′(t) + iy′(t)] dt.    (8.7)

In the above equations we have written

f(z(t)) = u(x(t), y(t)) + iv(x(t), y(t))    (8.8)

and

dz = [x′(t) + iy′(t)] dt.    (8.9)
Integrals on the right-hand sides of Equations (8.3) and (8.7) are real; hence, from the properties of real integrals, we can deduce linearity,

∫_C [f₁(z) + f₂(z)] dz = ∫_C f₁(z) dz + ∫_C f₂(z) dz, ∫_C cf(z) dz = c ∫_C f(z) dz,    (8.10–8.12)

and that reversing the direction of the contour reverses the sign of the integral. The two inequalities

|∫_{t₁}^{t₂} f(t) dt| ≤ ∫_{t₁}^{t₂} |f(t)| dt    (8.13)

and

|∫_C f(z) dz| ≤ ML,    (8.14)

where |f(z)| ≤ M on C and L is the arclength, are very useful in calculations. When z is a point on C, we can write an infinitesimal arclength as

|dz| = |x′(t) + iy′(t)| dt    (8.15)
     = √(x′(t)² + y′(t)²) dt.    (8.16)

The length of the contour C is now given as

L = ∫_C |dz|.    (8.17)

If we parameterize a point on the contour C as z = z(t), we can write

∫_C f(z) dz = ∫_{t₁}^{t₂} f(z(t)) (dz/dt) dt.    (8.18)

In another parametric representation of C, where the new parameter τ is related to t by t = t(τ), we can write the integral [Eq. (8.18)] as

∫_{τ₁}^{τ₂} f(z(t(τ))) (dz/dt)(dt/dτ) dτ,    (8.19)

which is nothing but

∫_{τ₁}^{τ₂} f(z(τ)) (dz/dτ) dτ.    (8.20)

Hence an important property of contour integrals is that their value is independent of the parametric representation used.
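The parameterization independence of Eq. (8.20) can be demonstrated numerically; the sketch below (our own illustration) evaluates ∫ z² dz over the upper unit semicircle with two different parameterizations:

```python
# Contour integrals from a parameterization, Eq. (8.7); the value does not
# depend on the parameterization chosen (illustrative sketch).
import numpy as np

def contour_integral(f, z, zprime, t1, t2, n=20001):
    t = np.linspace(t1, t2, n)
    return np.trapz(f(z(t)) * zprime(t), t)

# z^2 over the upper unit semicircle from 1 to -1, parameterized by t:
I1 = contour_integral(lambda z: z**2,
                      lambda t: np.exp(1j*t),
                      lambda t: 1j*np.exp(1j*t), 0.0, np.pi)

# The same path re-parameterized by t = s**2:
I2 = contour_integral(lambda z: z**2,
                      lambda s: np.exp(1j*s**2),
                      lambda s: 2j*s*np.exp(1j*s**2), 0.0, np.sqrt(np.pi))

print(I1, I2)    # both ~ -2/3
```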
8.2 TYPES OF CONTOURS
We now introduce the types of contours or paths that are most frequently encountered in the study of complex integrals. A continuous path is defined as the curve

x = x(t), y = y(t), t ∈ [t₁, t₂],

where x(t) and y(t) are continuous functions of the real parameter t. If the curve does not intersect itself, that is, when no two distinct values of t in [t₁, t₂] correspond to the same point (x, y), we call it a Jordan arc. If

x(t₁) = x(t₂) and y(t₁) = y(t₂),

but no other two distinct values of t correspond to the same point (x, y), we have a simple closed curve, which is also called a Jordan curve. A piecewise continuous curve like

x = t, y = t², t ∈ [1, 2];  x = t, y = t³, t ∈ (2, 3],    (8.21)

is a Jordan arc. A circle with the unit radius,

x² + y² = 1,    (8.22)

which can be expressed in parametric form as

x = cos t, y = sin t, t ∈ [0, 2π],    (8.23)

is a simple closed curve. If the derivatives x′(t) and y′(t) are continuous and do not vanish simultaneously for any value of t, we have a smooth curve. For a smooth curve the length exists and is given as

L = ∫_{t₁}^{t₂} √(x′(t)² + y′(t)²) dt.    (8.24)

In general a contour C is a continuous chain of a finite number of smooth curves, C₁, C₂, ..., Cₙ. Hence,

C = C₁ + C₂ + ⋯ + Cₙ.
Figure 8.1 Contours for Example 8.1.
A contour integral over C can now be written as the sum of its parts as

∫_C f(z) dz = ∫_{C₁} f(z) dz + ∫_{C₂} f(z) dz + ⋯ + ∫_{Cₙ} f(z) dz.    (8.25)

Contour integrals over closed paths are also written as ∮_C f(z) dz, where by definition the counterclockwise direction is taken as the positive direction.

Example 8.1. Contour integrals: Using the contours, C₁, C₂, and C₃, shown in Figure 8.1, let us evaluate the following integrals:

I_{C₁} = ∫_{C₁[y=x²]} z² dz,  I_{C₂} = ∫_{C₂[y=0]} z² dz,  I_{C₃} = ∫_{C₃[x=1]} z² dz.    (8.26)

We first write I = ∫_C z² dz as the sum of two real integrals [Eq. (8.3)]:

I = ∫_C [(x² − y²) dx − 2xy dy] + i ∫_C [2xy dx + (x² − y²) dy].    (8.27)

For the path C₁ we have y = x² and dy = 2x dx; hence the above integral is evaluated as

I_{C₁} = ∫₀¹ [(x² − x⁴) dx − 2x³ · 2x dx] + i ∫₀¹ [2x³ dx + (x² − x⁴) 2x dx]    (8.28)
       = ∫₀¹ (x² − 5x⁴) dx + i ∫₀¹ (4x³ − 2x⁵) dx    (8.29)
       = −2/3 + i(2/3).    (8.30)
Figure 8.2 Semicircular path.

For the path C₂ we set y = 0 and dy = 0 in Equation (8.27). Hence we obtain

I_{C₂} = ∫₀¹ x² dx + i[0]    (8.31)
       = 1/3.    (8.32)

Finally, on the path C₃ we write x = 1 and dx = 0 to obtain

I_{C₃} = −∫₀¹ 2y dy + i ∫₀¹ (1 − y²) dy    (8.33)
       = −1 + i(2/3).    (8.34)
Example 8.2. Parametric representation of the contour: We now consider the semicircular path C₁ in Figure 8.2 for the integral I_{C₁}:

I_{C₁} = ∫_{C₁} z² dz    (8.35)
       = ∫_{C₁} [ux′ − vy′] dt + i ∫_{C₁} [vx′ + uy′] dt,    (8.36)

with

u = x² − y²,    (8.37)
v = 2xy.    (8.38)

When we use the parametric form of the path,

x(t) = cos t, x′(t) = −sin t, y(t) = sin t, y′(t) = cos t, t ∈ [0, π],    (8.39)

Equation (8.36) becomes

I_{C₁} = ∫₀^π [−3 cos² t sin t + sin³ t] dt + i ∫₀^π [−3 sin² t cos t + cos³ t] dt    (8.40)
       = −2/3.    (8.41)

For the path along the real axis, C₂ (Fig. 8.2), we can use x as a parameter:

x = t, y = 0, u = t², v = 0;    (8.42)

hence Equation (8.36) yields

I_{C₂} = ∫_{−1}^{1} t² dt + i ∫_{−1}^{1} 0 dt    (8.43, 8.44)
       = 2/3.    (8.45)

Example 8.3. Simple closed curves: For the combined path in Example 8.2, that is, C = C₁ + C₂ (Fig. 8.2), which is a simple closed curve, the integral I_C = ∮ z² dz becomes

I_C = I_{C₁} + I_{C₂}    (8.46, 8.47)
    = −2/3 + 2/3 = 0.    (8.48)

Similarly, for the closed path C in Figure 8.3 we can use the results obtained in Example 8.1 to write

I_C = −I_{C₁} + I_{C₂} + I_{C₃}    (8.49)
    = −(−2/3 + i 2/3) + 1/3 + (−1 + i 2/3)    (8.50)
    = 0.    (8.51)
376
COMPLEX ANALYSIS
Figure 8.3
8.3
Closed simple path
T H E CAUCHY-GOURSAT T H E O R E M
We have seen that for a closed contour, C, the complex contour integral of f ( z ) can be written in terms of two real integrals [Eq. (8.3)] as
f ’ f ’ c
f(2)
dz =
C
[U
dz
-
v d ~+] Z
f’ c
[U d~
+ u dy].
(8.52)
Let us now look at this integral from the viewpoint of Green’s theorem introduced in Chapter 2, which states that for two continuous functions, P(z,y) and Q(z,y ) , defined in a simply connected domain, D , with continuous firstorder partial derivatives within and on a simple closed contour, C , we can write the integral (8.53) where the positive sense of the contour integral is taken as the counterclockwise direction and R is the region enclosed by the closed contour C. If we apply Green’s theorem to the real integrals defining the real and the imaginary parts of the integral in Equation (8.52), we obtain (8.54)
(8.55)
THE CAUCHY-GOURSAT THEOREM
Figure 8.4 We stretch CZ into
C2 =
L1
377
+ L2.
From the properties of analytic functions [Theorem 7.4], we have seen that a given analytic function,

f(z) = u(x, y) + iv(x, y),    (8.56)

defined in some domain D, has continuous first-order partial derivatives, u_x, u_y, v_x, v_y, and satisfies the Cauchy-Riemann conditions:

∂u/∂x = ∂v/∂y,    (8.57)
∂u/∂y = −∂v/∂x.    (8.58)

Hence for an analytic function, the right-hand sides of Equations (8.54) and (8.55) are zero. We now state this result as the Cauchy-Goursat theorem, a formal proof of which can be found in Brown and Churchill.

Theorem 8.1. Cauchy-Goursat theorem: If a function f(z) is analytic within and on a simple closed contour C in a simply connected domain, then

∮_C f(z) dz = 0.    (8.59)

This is a remarkably simple but powerful theorem. For example, to evaluate the integral

I = ∫_{C₁} f(z) dz    (8.60)

over some complicated path, C₁, we first form the closed path C₁ + C₂ (Fig. 8.4, left). If f(z) is analytic on and within this closed path, C₁ + C₂, we can
Figure 8.5 Definite integrals.
use the Cauchy-Goursat theorem to write

∮_{C₁+C₂} f(z) dz = ∫_{C₁} f(z) dz + ∫_{C₂} f(z) dz = 0,    (8.61)

which allows us to evaluate the desired integral as

I = ∫_{C₁} f(z) dz = −∫_{C₂} f(z) dz.    (8.62)

The general idea is to deform C₂ into a form such that the integral I can be evaluated easily. The Cauchy-Goursat theorem says that we can always do this, granted that f(z) is analytic on and within the closed path C₁ + C₂. On the right-hand side in Figure 8.4, C₂ is composed of two straight line segments, L₁ and L₂. In Example 8.3, for two different closed paths we have explicitly shown that ∮ z² dz is zero. Since z² is an entire function, the Cauchy-Goursat theorem says that for any simple closed path the result is zero. Similarly, all polynomials, Pₙ(z), of order n are entire functions; hence we can write

∮_C Pₙ(z) dz = 0,    (8.63)

where C is any simple closed contour.
379
over any given path, C1, as shown on the left in Figure 8.5. Since the integrand, f ( z ) = 3z2 + 1, is an entire function, we can form the closed path on the right and use the Cauchy-Goursat theorem to write
(3z2
+ 1) dz = 0,
(8.65)
which leads t o
(3z2
+ 1) dz = -
(3z2 + 1) dz.
(8.66)
From f ( z ) = 3z2 =
+1
[3(z2- y2)
+ 11 + i ( 6 ~ y ) ,
(8.67)
we obtain the functions u = [3(z2 - y2) + 11 and ‘u = 6zy, which are needed in the general formula [Eq. (8.3)]. For L2 we use the parameterization z = z, y = 1; hence we substitute u ( z , l ) = 3z2 - 2 and u ( z , 1) = 6z into Equation (8.3) to obtain
=5+9i.
(8.68)
Similarly, for L1 we use the parameterization z = 2, y = y; hence we substitute u(2, y) = -3y2 13, ~ ( 2y), = 12y into Equation (8.3) t o get
+
-
s,,
(-3y2 = -18
+ 13) dy
+ 62.
(8.69) (8.70)
Finally, using Equation (8.68) and (8.70) in Equation (8.66) we obtain
(3z2 JG
8.4
+ 1) dz = (5 + 92) + (-18 + 6i) = -13 + 15i.
(8.71) (8.72)
INDEFINITE INTEGRALS
Let zo and z be two points in a simply connected domain D , where f ( z ) is analytic (Fig. 8.6). If C1 and C2 are two paths connecting zo and z , then by using the Cauchy-Goursat theorem we can write
J’
f(d)dz’
c2
-
J’
c1
f(z’) dz’ = 0.
(8.73)
380
COMPLEX ANALYSIS
Figure 8.6
Indefinite integrals.
In other words, the integral
F ( z )=
l:s(.’)
(8.74)
dz’
c has the same value for all continuous paths (Jordan arcs) connecting the points zo and z . In general we can write (8.75) That is, the integral of an analytic function is an analytic function of its upper limit, granted that the path of integration is included in a simply connected domain D , where f ( z ) is analytic. Example 8.5. Indefinite integrals: An indefinite integral of f ( z ) = 32’ 1 exists and is given as
lc,
(32’
+ 1) dz = z3 +
+
(8.76)
Z.
+
Since ( z 3 + z ) is an entire function with the derivative (3z2 l ) ,for the integral in Equation (8.66) we can write (8.77) where C1 is any continuous path from (1,l) to ( 2 , 2 ) . Substituting the numbers in the above equation, we naturally obtain the same result in Equation (8.72):
Ll
f ( z ) dz = z(z’
+ 1)\!:1;;
= -13
+ 152.
(8.78)
SIMPLY AND MULTIPLY CONNECTED DOMAINS
Figure 8.7
8.5
381
Multiply connected domain between two concentric circles.
SIMPLY A N D MULTIPLY CONNECTED D O M A I N S
Simply and multiply connected domains are defined the same way as in real analysis. A simply connected domain is an open connected region, where every closed path in this region can be shrunk continuously to a point. An annular region between the two circles (Fig. 8.7) with radiuses R1 and R2, R2 > R1, is not simply connected, since the closed path Co cannot be shrunk to a point. A region that is not simply connected is called multiply connected. The Cauchy-Goursat theorem can be used in multiply connected domains by confining ourselves t o a region that is simply connected. In the multiply connected domain shown in Figure 8.7 we have (8.79) however, for C1 we can write
(8.80) where f ( z ) is analytic inside the region between the two circles.
8.6
T H E CAUCHY INTEGRAL FORMULA
The Cauchy-Goursat theorem [Eq. (8.59)] works in simply connected domains, D , where the integrand is analytic within and on the closed contour
382
COMPLEX ANALYSIS
Figure 8.8
Singularity inside the contour.
C included in D. The next theorem is called the Cauchy integral formula. It is about cases where the integrand is of the form (8.81)
where zo is a point inside C and f ( z ) is an analytic function within and on C. In other words, the integrand in fc F ( z ) dz has an isolated singular point in C (Fig. 8.8). Theorem 8.2. Cauchy integral formula: Let f ( z ) be analytic at every point within and on a closed contour C in a simply connected domain D. If zo is a point inside the region defined by C , then (8.82) where C[O]means the contour C is traced in the counterclockwise direction. This is another remarkable result from the theory of analytic functions with far-reaching applications in pure and applied mathematics. It basically says that the value of an analytic function, f ( z o ) , a t a point, 20, inside its domain D of analyticity is determined entirely by the values it takes on a boundary C, which encloses zo and which is included in D. The shape of the boundary is not important. Once we decide on a boundary, we have no control over the values that f ( z ) takes outside the boundary. However, if we change the values that a function takes on a boundary, it will affect the values it takes on the inside. Conversely, if we alter the values of f ( z ) inside the boundary, a corresponding change has to be implemented on the boundary to preserve the analytic nature of the function. Proof: To prove this theorem, we modify the path C as shown in Figure 8.9, where we consider the contour co in the limit as its radius goes to zero.
THE CAUCHY INTEGRAL FORMULA
383
'T'
Figure 8.9
Modified path for the Cauchy integral formula.
Now the integrand, f ( z ) / ( z - z ~ )is, analytic within and on the combined path
C [ O ]= L1[1]+L2[T]+C[(3]+co[O].By the Cauchy-Goursat theorem we can write
(8.83) The two integrals along the straight-line segments cancel each other, thus leaving (8.84) Evaluating both integrals counterterclockwise, we write
(8.85) We modify the integral on the right-hand side as
(8.86) = 11
+ 12.
(8.87)
For a point on co we can write
z
-
zo = roei0, dz
= iroei0d8;
(8.88)
384
COMPLEX ANALYSIS
thus the first integral, II,on the right-hand side of Equation (8.86) becomes (8.89) (8.90) (8.91) For the second integral, 1 2 , when considered in the limit as when z + 2 0 , we can write
T O -+
0, that is,
(8.92) (8.93) The limit (8.94) is nothing but the definition of the derivative of f ( z ) at zo, that is, (8.95) Since f ( z ) is analytic within and on the contour C O , this derivative exists with a finite modulus ldf(zo)/dzI ; hence we can take it outside the integral to write (8.96) Since 1 is an entire function, using the Cauchy-Goursat write
j
co [::1 [ T O -01
dz = 0 ,
theorem, we can (8.97)
thus obtaining 1 2 = 0. Substituting Equations (8.91) and (8.97) into Equation (8.87) completes the proof of the Cauchy integral formula. 8.7
DERIVATIVES OF ANALYTIC F U N C T I O N S
In the Cauchy-integral formula location of the point, 20, inside the closed contour is entirely arbitrary; hence we can treat it as a parameter and differentiate with respect to it to write (8.98)
COMPLEX POWER SERIES
385
A formal proof of this result can be found in Brown and Churchill. Successive diffcrentiation of this formula leads to
f'"'(.o)
=
.I
27rk n!
C [\ ]
f(.) d z , n = 1 , 2 , .. . . ( z - zo)n+l
(8.99)
Asstlining that this formula is true for any value of n, say n = k , one can show that it holds for n = k + 1. Based on this formula, we can now present an important result about analytic functions: Theorem 8.3. If a function is analytic at a given point 20, then its derivativcs of all orders, f ' ( z o ) , f " ( z o ) , . . . , exist at that point. In Chapter 7 [Eq. (7.190)] we have shown that the derivative of an analytic function can be written as
au av f'(z) = - + 2ax dy
(8.100)
or as
av dY
f'(z) = --
du
-.
aY
(8.101)
Also, Theorem (7.4) says that for a given analytic function, the partial derivatives I L , ~ , u,, ~ ~ and , uy exist and they are continuous functions of x and y. Using Theorem 8.3, we can now conclude that in fact the partial derivatives of all orders of u and u exist and are continuous functions of x and y at each point where f ( z ) is analytic.
8.8
C O M P L E X P O W E R SERIES
Applications of complex analysis often require manipulations with explicit analytic expressions. To this effect, power series representations of analytic functions are very useful.
8.8 COMPLEX POWER SERIES
COMPLEX ANALYSIS
..A z
Figure 8.10
Taylor series: ( z - z01 = T , Iz'
-
z01 = T'
where z' is a point on C and z is any point within C. We rewrite the integrand as
where zo is any point within C that satisfies the inequality
/ z - zo/ <
12'
- 201.
(8.104)
Note that choosing C as a circle centered at zo with any radius greater than r automatically satisfies this inequality. Before we continue with this equation, let us drive a useful formula: We first write the finite sum
s =1 + z + z 2 + . . . + z n
(8.105)
s = (1+ + z 2 + . .. + 2 - 1 ) + z n ,
(8.106)
as 2
where z is any complex number. We also write
s
-
S as
+ z 2 + ' . . + zn = z(l + + '. + 2 - 1 )
1= z
2
= z[S-
z"]
'
(8.107) (8.108) (8.109)
COMPLEX POWER SERIES
387
to obtain
S(1- z ) = 1 - P + l ,
(8.110)
which yields the sum as (8.111) Substituting this in Equation (8.106), we obtain (8.112) This is a formula which is quite useful in obtaining complex power series representations. We now use this result [Eq. (8.112)] to write the quantity inside the square brackets in Equation (8.103) to obtain
1 (8.113)
Using the derivative formula [Eq. (8.99)],
‘f 27ri
f ( 2 )dz‘
C [ O ] (2’ - z o ) k + l
1
= -f‘”(*0),
k!
(8.114)
we can also write Equation (8.113) as f ( z ) = f(z0)
+1
Zf’(ZO)(Z
- 20)
1 + . . + -P1) ( z o ) ( z- 2 0 ) ~ ~+’R,, ( n - l)! ’
(8.115) where (8.116) is called the remainder. Note that the above expression [Eq. (8.115)] is exact, and it is called the Taylor series with the remainder term R,. Using the triangle inequality,
388
COMPLEX ANALYSIS
we can put an upper bound t o \Rn\as (8.118)
1 271.
5 -M
L Iz - 201 min[lz’ - zoI - Iz - zol] min Iz’ - 201
[
(8.120)
where L is the length of the contour C , M stands for the maximum value that can take on C , and ‘min’ indicates the minimum value of its argument on C. From Equation (8.104) we have
\).(fI
12 -
zoI
min I z’ - zo I
(8.121)
1,
which means in the limit as n goes to infinity (R,I goes t o zero. Hence, we obtain the Taylor series representation of an analytic function as
When the function is known to be analytic on and within C , the convergence of the above series is assured. The contour integral representation of the coefficients, kf(’)(zo), are given in Equation (8.114). The radius of convergence, R , is from zo to the nearest singular point (Fig. 8.10). When zo = 0, The Taylor series is called the Maclaurin series: f(z) = f(0)
+ -f’(O)z 1 + z1f ” ( 0 ) z 2+ . . l!
’
.
(8.123)
Examples of some frequently used power series are: ez =
zn c,=o2 ,
I4 < 00,
CC
z2n-
1
sinz = CF=l(-l)n+l (2n - I)!’
I4 < 00,
c;=o(-l)“-
IzI
< 00,
sinh z = Cr==, (an - l)!’
IzI
< 00,
IzI
< 00,
cosz =
z2n
z 2 n -(an)!’ 1
zZn
cash z = Cr==, ( 2 n ) !’ 1 -= 1-2
x=:o
zn,
(8.124)
IzI < 1.
Example 8.6. G e o m e t r i c series: In the sum [Eq. (8.112)]
1
--
1-2
-
(1 + z
zn + z2 + .. ’ + 2 - 1 ) + 1-2’
(8.125)
COMPLEX POWER SERIES
389
when 1x1 < 1 we can take the limit n 4co to obtain the geometric series 00
(8.126)
Example 8.7. Taylor series of 1/z about z = 1 : Consider the function f ( z ) = l / z , which is analytic everywhere except at z = 0. To write its Taylor series representation about the point z = 1, we evaluate the derivatives, = z,
f'O'(z)
f'n)(z)= (-1)nat z
=
n!
(8.127)
Zn+l '
1 and obtain the coefficients in Equation (8.122) as
f'"'(1) = (-l)%!. Hence the Taylor series of f ( z )
=
1
"
-
n=O
1/z about z = 1 is written as
- = X(-l)n(z
-
1)n.
This series is convergent up to the nearest singular point, z we write the radius of convergence as
Iz
8.8.2
-
(8.128)
11 < 1.
(8.129) =
0; hence (8.130)
Laurent Series with the Remainder
Sometimes a function is analytic inside an annular region defined by two boundaries, B1 and Bz, or has a singular point at zo in its domain (Fig. 8.11). In such cases we can choose the path as shown in Figure 8.11 and use the Cauchy integral theorem to write (8.131) where z is a point inside the composite path
390
COMPLEX ANALYSIS
Figure 8.11 series.
Annular region defined by the boundaries, B1 and
B2,
in Laurent
and z’ is a point on C. Integrals over the straight line segments, L1 and Lz, cancel each other, thus leaving
where both integrals are now evaluated counterclockwise. We modify the integrands as
(8.134)
where zo is any point within the inner boundary Bz. When z’ is on C1 we satisfy the inequality
COMPLEX POWER SERIES
391
and when z’ is on Cz we satisfy lz’
-
201
< jz - 201.
(8.137)
Note that choosing C1 and Cz as two concentric circles, 12’ - zgi = 7-1 and lz’ - zo( = 7-2, respectively, with their radii satisfying 7-1 > T and 7-2 < T ailtonlatically satisfies these inequalities. We now proceed as in the Taylor series dcrivation and implement Equation (8.112):
-=Ez“1 ‘n-l
1 - Z
2”
1-2’
k=O
to obtain (8.138) where
f(z’) dz’
1
k = 0 , 1 , 2 ,... , n - 1 ,
(8.139) (8.140)
and the remainder terms are written as (8.141)
The proof that J R n (approaches t o zero as n goes to infinity is exactly the same as in the derivation of the Taylor series. For \&“I, if we let 111 be the maxiinurn of I f ( z ’ ) i on C2, we can write the inequality (8.143) Writing the triangle inequality (8.144)
Iz
-
z’l > / z - zoI
- (2’ -
zo/,
(8.145)
Figure 8.12 Closed contour C in the Laurent theorem.

we can also write

|Qₙ| ≤ (ML/2π) [max |z′ − z₀|/|z − z₀|]ⁿ / min[|z − z₀| − |z′ − z₀|],    (8.146, 8.147)

where L is the length of the contour C₂ and "min" and "max" stand for the minimum and the maximum values of their arguments on C₂. Since on C₂ we satisfy the inequality

max |z′ − z₀|/|z − z₀| < 1,    (8.148)

as n goes to infinity, |Qₙ| goes to zero. Since the function f(z) is analytic inside the annular region defined by the boundaries, B₁ and B₂, we can use any closed path encircling B₂ and z₀ within the annular region to evaluate the coefficients, aₙ and bₙ, without affecting the result. Hence, it is convenient to use the same path, C, for both coefficients (Fig. 8.12). A formal statement of these results is given in the following theorem:

Theorem 8.4. Laurent series: Let f(z) be analytic inside the annular region defined by the boundaries, B₁ and B₂, and let z₀ be a point inside B₂ (Fig. 8.12); then for every point inside the annular region, f(z) can be represented by the Laurent series

f(z) = Σ_{n=0}^∞ aₙ(z − z₀)ⁿ + Σ_{n=1}^∞ bₙ(z − z₀)^{−n},    (8.149)
aₙ = (1/2πi) ∮_C f(z′) dz′/(z′ − z₀)^{n+1}, n = 0, 1, 2, ...,    (8.150)
bₙ = (1/2πi) ∮_C f(z′)(z′ − z₀)^{n−1} dz′, n = 1, 2, ...,    (8.151)
8.9
CONVERGENCE OF POWER SERIES
Concepts of absolute and uniform convergence for series of analytic functions follow from their definitions in real analysis. For the power series M
(8.153) n=O
we quote the following theorem. Theorem 8.5. For every power series [Eq. (8.153)], we can find a real number R, 0 5 R 5 03, called the radius of convergence with the properties: (i) The series converges absolutely in Iz - zoI 5 R and uniformly for every closed disk Iz - 201 5 R’ < R. (ii) For IzI > R, the series diverges. (iii) For 121 < R, the sum of the series is an analytic function, hence its derivative can be obtained by termwise differentiation and the resulting series has the same radius of convergence. Furthermore, if the contour is entirely within the radius of convergence and the sum is a continuous function on C , then the series can be integrated term by term, with the result being equal to the integral of the analytic function that the original series converges to. Radius of convergence, R, can be found by applying the ratio test as (8.154) (8.155) lz
- 201
< lim n-m
I . 1 , an+1
(8.156)
394
COMPLEX ANALYSIS
thus (8.157)
8.10
CLASSIFICATION OF SINGULAR P O I N T S
Using Laurent series we can classify the singular points of a function. Definition 8.1. Isolated singular point: If a function is not analytic at 20 but analytic at every other point in some neighborhood of zo, then zo is called an isolated singular point. For example, z = 0 is an isolated singular point of the functions 1/z and l / s i n h z . The function l/sin.lrz has infinitely many isolated singular points at z = 0, f l , f 2 , . . . . However, z = 0 is not an isolated singular point of 1/ sin(l/z), since every neighborhood of the point z = 0 contains other singular points. Definition 8.2. Singular point: In the Laurent series of a function, M
(8.158) n=--oo
if for
R
< -m < 0, a, = 0
(8.159)
a-m # 0,
(8.160)
and
then zo is called a singular point or pole of order m. Definition 8.3. Essential singular point: If m is infinity, then 20 is called an essential singular point. For exp(l/z), z = 0 is a n essential singular point. Definition 8.4. Simple pole: If m = 1, then zo is called a simple pole. Power series representation of an analytic function is unique. Once we find a power series that converges t o the desired function, we can be sure that it is the power series for that function. In most cases the needed power series can be constructed without the need for the evaluation of the integrals in Equations (8.150) and (8.151) by algebraic manipulations of known series, granted that the series are absolutely and uniformly convergent.
Example 8.8. Power series representations: Let us evaluate the power series representation of

f(z) = e^z/(1 − z).    (8.161)
We already know the series expansions

e^z = 1 + z + z²/2! + z³/3! + ⋯, |z| < ∞,    (8.162)

and

1/(1 − z) = 1 + z + z² + ⋯, |z| < 1.    (8.163)

Hence we can multiply the two series directly to obtain

e^z/(1 − z) = [1 + z + z²/2! + z³/3! + ⋯][1 + z + z² + z³ + ⋯]    (8.164)
            = 1 + 2z + (5/2)z² + (8/3)z³ + ⋯.    (8.165)

The resulting power series converges in the interval |z| < 1.

Example 8.9. Power series representations: Power series expansions of e^{1/z} and sin(z²) can be obtained by direct substitutions into the expansions [Eq. (8.124)] of e^z and sin z as

e^{1/z} = Σ_{n=0}^∞ z^{−n}/n!, z ≠ 0,    (8.166)

sin(z²) = Σ_{n=1}^∞ (−1)^{n+1} z^{4n−2}/(2n − 1)!, |z| < ∞.    (8.167)

The second series converges everywhere, hence its radius of convergence is infinite, while the first function, e^{1/z}, has an essential singular point at z = 0.
Example 8.10. Power series representations: Consider

f(z) = (2z − 3)/[(z − 1)(z − 2)],    (8.168)

which can be written as

f(z) = 1/(z − 1) + 1/(z − 2).    (8.169)

Hence f(z) is analytic everywhere except the points z = 1 and z = 2. If we write f(z) as

f(z) = −1/(1 − z) − (1/2) · 1/(1 − z/2),    (8.170)
we can use the geometric series [Eq. (8.126)] to write

f(z) = −Σ_{n=0}^∞ zⁿ − Σ_{n=0}^∞ zⁿ/2^{n+1}    (8.171)
     = −Σ_{n=0}^∞ [1 + 1/2^{n+1}] zⁿ, |z| < 1.    (8.172)

This expansion is valid inside the unit circle. We can obtain another expansion if we write f(z) as

f(z) = (1/z) · 1/(1 − 1/z) − (1/2) · 1/(1 − z/2)
     = Σ_{n=1}^∞ z^{−n} − Σ_{n=0}^∞ zⁿ/2^{n+1}, 1 < |z| < 2,    (8.173, 8.174)

which is valid in the annular region between the two circles |z| = 1 and |z| = 2. We can also write

f(z) = (1/z)[1/(1 − 1/z) + 1/(1 − 2/z)]    (8.175)
     = Σ_{n=0}^∞ (1 + 2ⁿ)/z^{n+1}, |z| > 2,    (8.176)

which is valid outside the circle |z| = 2. Note that the series representations given above are all unique in their interval of convergence.

Example 8.11. Power series representations: Let us find the power series representation of

f(z) = 1/(z² cosh z).    (8.177)

Substituting the series expansion of cosh z [Eq. (8.124)], we write

f(z) = 1/[z²(1 + z²/2! + z⁴/4! + ⋯)].    (8.178, 8.179)

Hence f(z) is analytic everywhere except at z = 0. Since the series in the denominator of Equation (8.179) does not vanish anywhere except at the origin, we can perform a formal division of 1 with the denominator to obtain

1/(z² cosh z) = 1/z² − 1/2 + (5/24)z² − ⋯, z ≠ 0;    (8.180)

hence z = 0 is a pole of order 2.
Figure 8.13 Isolated poles.

8.11 RESIDUE THEOREM
The Cauchy integral formula [Eq. (8.82)] deals with cases where the integrand has a simple pole within the closed contour of integration. Armed with the Laurent series representation of functions, we can now tackle contour integrals,

∮_C f(z) dz,    (8.181)

where the integrand has a finite number of isolated singular points of varying orders within the closed contour C (Fig. 8.13). We modify the contour as shown in Figure 8.14, where f(z) is analytic in and on the composite path C′:

C′[↺] = C[↺] + Σ_{j=1}^n l_j[↑] + Σ_{j=1}^n l_j[↓] + Σ_{j=1}^n c_j[↻].    (8.182)

We can now use the Cauchy-Goursat theorem to write

∮_{C′} f(z) dz = 0.    (8.183)

Integrals over the straight line segments cancel each other, thus leaving

∮_{C[↺]} f(z) dz = Σ_{j=1}^n ∮_{c_j[↺]} f(z) dz,    (8.184)

where all integrals are to be evaluated counterclockwise. Using Laurent series expansions [Eq. (8.149)] about the singular points, we write

∮_{c_j} f(z) dz = Σ_{k=0}^∞ a_{kj} ∮_{c_j} (z − z_j)^k dz + Σ_{k=1}^∞ b_{kj} ∮_{c_j} dz/(z − z_j)^k,    (8.185, 8.186)

where the expansion coefficients for the jth pole, a_{kj} and b_{kj}, are given in Equations (8.150) and (8.151). Since (z − z_j)^k is analytic within and on the contours c_j for all j and k, the first set of integrals vanish:

∮_{c_j} (z − z_j)^k dz = 0, j = 1, 2, ..., n, k = 0, 1, ....    (8.187)

For the second set of integrals, using the parameterization

z − z_j = r_j e^{iθ},    (8.188)

we find

∮_{c_j} dz/(z − z_j)^k = ∫₀^{2π} (i r_j e^{iθ}/r_j^k e^{ikθ}) dθ    (8.189)
                       = 2πi for k = 1 and 0 for k = 2, 3, ....    (8.190, 8.191)

In other words,

∮_{C[↺]} f(z) dz = 2πi Σ_{j=1}^n b_{1j}.    (8.192)

The coefficient of the 1/(z − z_j) term, that is, b_{1j}, is called the residue of the pole z_j. Hence the integral ∮_{C[↺]} f(z) dz is equal to 2πi times the sum of the residues of the n isolated poles within the contour C. This important result is known as the residue theorem:

Theorem 8.6. If we let f(z) be an analytic function within and on the closed contour C, except for a finite number of isolated singular points in C, then we obtain

∮_{C[↺]} f(z) dz = 2πi Σ_{j=1}^n b_{1j},    (8.193)
Figure 8.14 Modified path for the residue theorem.
where b_{1j} is the residue of the jth pole, that is, the coefficient of 1/(z − z_j) in the Laurent series expansion of f(z) about z_j. The integral definition of b_{1j} is given as

b_{1j} = (1/2πi) ∮_{c_j[↺]} f(z) dz.    (8.194)

Integrals in Equations (8.193) and (8.194) are taken in the counterclockwise direction.

Example 8.12. Residue theorem: Let us evaluate the integral

∮_C (3z − 1) dz/[z(z − 1)],    (8.195)

where C is the circle of radius 2. Since both poles, 0 and 1, are within the contour, we need to find their residues at these points. For the first pole, z = 0, we use the expansion

(3z − 1)/[z(z − 1)] = (3 − 1/z)(−1)(1 + z + z² + ⋯)    (8.196, 8.197)
                    = 1/z − 2 − 2z − 2z² − ⋯, 0 < |z| < 1,    (8.198)

which yields the residue at z = 0 from the coefficient of 1/z as b₁(0) = 1. For the pole at z = 1 we need to expand 1/z in powers of (z − 1), which is given [Eq. (8.129)] as

1/z = 1 − (z − 1) + (z − 1)² − ⋯, |z − 1| < 1.    (8.199)

Now the series expansion of (3z − 1)/[z(z − 1)] in powers of (z − 1) is obtained as

(3z − 1)/[z(z − 1)] = 2/(z − 1) + 1 − (z − 1) + (z − 1)² − ⋯,    (8.200–8.202)

which yields the residue of the second pole as b₁(1) = 2. Hence the value of the integral [Eq. (8.195)] is obtained as

∮_C (3z − 1) dz/[z(z − 1)] = 2πi [b₁(0) + b₁(1)]    (8.203)
                           = 2πi[1 + 2]    (8.204)
                           = 6πi.    (8.205)
Example 8.13. Finding residues: To find the residues of unknown functions, we often benefit from the existing series expansions. For example, the residue of cos(1/z) at z = 0 can be read from

cos(1/z) = 1 − 1/(2!z²) + 1/(4!z⁴) − ⋯    (8.206)

as b₁(0) = 0. The residue of (sinh z)/z² can be read from

sinh z/z² = 1/z + z/3! + ⋯    (8.207, 8.208)

as b₁(0) = 1.

Example 8.14. A convenient formula for finding residues: Since the Laurent expansion of a function that has a pole of order m at z₀ is given as

f(z) = Σ_{n=−m}^∞ aₙ(z − z₀)ⁿ,    (8.209)

the combination

F(z) = (z − z₀)^m f(z)

is free of singularities. Hence the (m − 1)-fold differentiation of F(z) gives the residue of f(z) at z₀ as

b₁(z₀) = [1/(m − 1)!] lim_{z→z₀} d^{m−1}[(z − z₀)^m f(z)]/dz^{m−1}.
For example, the function

f(z) = 1/(z² + 1)⁴    (8.210)

has 4th-order poles at z = ±i. Using the above formula, we can find the residues as

b₁(i) = (1/3!) [d³/dz³ (z + i)^{−4}]_{z=i} = −5i/2⁵    (8.211)

and

b₁(−i) = 5i/2⁵.    (8.212)

Example 8.15. Residue theorem: Let us now evaluate the integral

∮_C (z³ + 1) dz/[(z − 1)(z² + 4)],    (8.213)

where the integrand has poles at z = 1 and z = ±2i and C is the circle |z| = 3/2. The only pole inside the contour C is z = 1. Hence we calculate its residue as

b₁(1) = lim_{z→1} [(z − 1) (z³ + 1)/((z − 1)(z² + 4))] = 2/5,    (8.214)

which leads to the value of the integral as

∮_C (z³ + 1) dz/[(z − 1)(z² + 4)] = 2πi(2/5)    (8.215)
                                  = 4πi/5.    (8.216)
PROBLEMS
1. Evaluate the following integral when C is the straight line segment from z = 0 to z = 1 + i:

∫_C (y − x + i2x²) dz.

2. Evaluate the integral

∫_C (y − x² + i(2x² + y)) dz,
where C is two straight line segments from z = 0 to z = 1 + i and from z = 1 + i to z = 2 + i.

3. Evaluate the integral

∫_C z* dz,

where C is the semicircle z = e^{iθ}, 0 ≤ θ ≤ π.

4. Evaluate the integral

∫_C (2z + z²) dz,

where C is the square with the vertices at z = 0, z = 1, z = 1 + i, z = i.

5. Prove the inequality

|∫_C f(z) dz| ≤ ML, where |f(z)| ≤ M on C and L is the length of C.

6. Bounds for analytic functions: Let a function f(z) be analytic within and on a circle C, which is centered at z₀ and has the radius R. If |f(z)| ≤ M on C, then show that the derivatives of f(z) at z₀ satisfy the inequality

|f⁽ⁿ⁾(z₀)| ≤ n!M/Rⁿ.

Using the above inequality, show that the only bounded entire function is the constant function. This result is also known as the Liouville theorem.

7. Without integrating, show that

|∫_C dz/(z² + 1)| ≤ π/3? (bound the integrand on C),

where C is the arc of the circle |z| = 2 lying in the first quadrant.

8. Show that

|∮_C f(z) dz| ≤ 2πR max_{|z|=R} |f(z)|,

where C is the circle |z| = R. What can you say about the value of the integral as R → ∞?
9. For the integrals

∮_C dz/(z² + z + 1) = 0, z² + z + 1 = 0,

what can you say about their contours?

10. If C is the boundary defined by the rectangle 0 ≤ x ≤ 2, 0 ≤ y ≤ 2 traversed in the counterclockwise direction, show that

∮_C dz/(z − 1 − i) = 2πi,
∮_C dz/(z − 1 − i)ⁿ = 0, n = 2, 3, ....

11. Evaluate the following integrals over the circle |z| = 2.

12. If f(z) is analytic within and on the closed contour C and z₀ is not on C, then show that

∮_C f′(z) dz/(z − z₀) = ∮_C f(z) dz/(z − z₀)².

13. Find the Taylor series expansions of cos z and sin z about z = π.

14. Expand sinh z about z = πi, and expand sinh(z²) about z = 0 and z = π/2.

15. Find series expansions of

f(z) = 1/(z² − 4)

valid outside the circle |z| = 2 and inside |z| = 1.

16. Represent the function

f(z) = 1/z
in powers of (z − 1). What is the radius of convergence?

17. Expand

f(z) = z/(z + 1)

in powers of (z − 1) and find the radius of convergence.

18. Expand

f(z) = 1/z²

in powers of (z − 1) for |z − 1| > 1 and for |z − 1| < 1.

19. Find the radius of convergence of the following series:

(i) Σ_{n=0}^∞ n³zⁿ,
(ii) Σ_{n=0}^∞ zⁿ/n!.

20. For what values of z is the series

Σ_{n=0}^∞ [z/(1 + z)]ⁿ

convergent? What about the series with the reciprocal ratio?

21. Using the geometric series, show the following series representations:

1/(1 − z)² = Σ_{n=1}^∞ nz^{n−1}, |z| < 1,

1/(1 − z)³ = (1/2) Σ_{n=2}^∞ n(n − 1)z^{n−2}, |z| < 1.
22. Find the limits

lim_{z→πi} cosh z/(z − πi)² and lim_{z→πi} sinh z/(z − πi).

23. Classify the singular points of

(i) f(z) = 1/z²,
(ii) f(z) = z/sin z,
(iii) f(z) = (z + exp(z))/z⁴,
(iv) f(z) = tanh z.

24. Evaluate the contour integral

∮_C f(z) dz,

where

(i) C is the circle |z − 2| = 2,
(ii) C is the circle |z − 2| = 4.

25. Find the value of

∮_C f(z) dz,

where

(i) C is the circle |z| = 2,
(ii) C is the circle |z + 2| = 3,
(iii) C is the circle |z + 4| = 1.

26. Using the residue theorem show that

(i) ∮_C tan z dz = −2πi? (over a contour enclosing z = ±π/2),
(ii) ∮_C cot z dz = 2πi.
This Page Intentionally Left Blank
CHAPTER 9
ORDINARY DIFFERENTIAL EQUATIONS
Differential equations are proven t o be very useful in describing the majority of physical processes. They are basically composed of derivatives of an unknown function. In general, the unknown function depends on several independent variables. Hence the corresponding differential equation, which involves partial derivatives, is called a partial differential equation. In many of the physically interesting cases, the number of independent variables can be reduced to one, thereby reducing the equation t o be solved to an ordinary differential equation. In this chapter, we discuss ordinary differential equations in detail and concentrate on the methods of finding analytic solutions in terms of known functions, like polynomials, exponentials, trigonometric functions, etc. We start with a discussion of the first-order differential equations. Since the majority of differential equations encountered in applications are second-order, we give an extensive treatment of second-order differential equations and introduce techniques for finding their solutions. The general solution of a differential equation always contains some arbitrary parameters called integration constants. To facilitate the evaluation of these constants, differential equations have to be supplemented with extra conditions called initial or boundary conditions. Initial conditions play a significant role in the Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
3. S e l p k Bayin 407
408
ORDINARY DIFFERENTIAL EQUATIONS
final solution of the problem. We also discuss the conditions for the existence of a unique solution which satisfies the given initial conditions. We finally discuss the Frobenius method, which can be used t o find infinite series solutions when all attempts to find a closed expression for the solution fails. There exists a number of tables for the exactly solvable cases. We highly recommend checking Ordinary Differential Equations and Their Solutions by Murphy, before losing all hopes for finding an analytic solution.
9.1
BASIC DEFINITIONS FOR ORDINARY DIFFERENTIAL EQ UAT10 NS
Unless otherwise specified, we use y for the dependent variable and the independent variable. Derivatives are written as
2
for
The most general differential equation can be given as g(z, y , y', . . . , y'"') = 0.
A general solution expressed as F(2,YjC1,C2,...,Cn) =
o
is called an implicit solution. For example, 22'
+ 3y2 + sin y = 0
is an implicit solution of 42
dY = 0. + (6y + COSY) dx
An implicit solution can also be given in terms of integrals, which are also called quadratures. Solutions that are given as
are called explicit solutions. For example,
y ( z ) = 2sin2x
+ 3cos 22
is an explicit solution of d2Y
+ 4 y = 0. dx2
(9.7)
BASIC DEFINITIONS FOR ORDINARY DIFFERENTIAL EQUATIONS
409
Arbitrary constants, C1, ( 3 2 , . . . , C,, are called the integration constants. The order of the highest derivative is also the order of the differential equation. The solution of an nth-order differential equation containing n arbitrary constants is called the general solution. A solution containing less than the full number of arbitrary constants is called the special solution or the particular solution. Sometimes a solution that satisfies the differential equation but is not a special case of the general solution can be found. Such a solution is called the singular solution. In physical applications t o complete the solution, we also need to determine the arbitrary constants that appear in the solution. In this regard, for a complete specification of a physical problem, an nth-order differential equation has to be complemented with n initial or boundary conditions. Initial conditions that specify the values of the dependent variable a t n different points, y(z1) = y1,. . . , y(x,) = y, are called n-point initial conditions. When the values of the dependent variable and its ( n - 1) derivatives at some point 20 are given, yl(x0) = yo,y'(xO) = yb, . . . , Y ( ~ - ' ) ( x o ) = y p p l ) , we have single-point initial conditions. It is also possible to define mixed or more complicated initial conditions. Sometimes as initial conditions we may impose physical principles like causality, where the initial conditions are implicit. A linear differential equation of order n is in general written as
d"Y a,n(Z>-dxn
d"-1
dY
+ a n - l ( z ) p + . . . + a 1 ( z ) -dx+ + o ( x ) y = F ( z ) ,
(9.9)
where the coefficients, ai(x), i = 1,. . . ,n, and F ( z ) are only functions of z. If a differential equation is not linear, then it is nonlinear. Differential equations with F ( z ) = 0 are called homogeneous. If the nonhomogeneous term is different from zero, F ( z ) # 0, then the differential equation is called nonhomogeneous. Differential equations can also be written as operator equations. For example, Equation (9.9) can be written as
x{Y(z)) = F ( z ) , where the differential operator,
X
= u,(z)-
d" dx"
X, is defined
(9.10) as
dn-1 + un-l(z)- dxn-l + . . . + u1(x)-dxd + ao(x).
(9.11)
Linear differential differential operators satisfy L{ClYl
+ ClY2)
= ClX{Y1)
+ c2x{Y2},
(9.12)
where c1 and c2 are arbitrary constants and y1 and y2 are any two solutions of the homogeneous equation.
-t{Y(.))
= 0.
Other definitions will be introduced as we need them.
(9.13)
410
ORDINARY DIFFERENTIAL EQUATIONS
Example 9.1. Classification of diflerential equations: given as
The equation
(9.14) is a linear, nonhomogeneous and second-order partial differential equation. The equation
is a nonlinear, homogeneous, partial differential equation of secondorder, while the equation d4y(x) + 4 d 2 Y O
+ 8y(x) =
4
dx2
dx4
(9.16)
is a linear, nonhomogeneous, fourth-order, ordinary differential equation with constant coefficients. The equation
(9.17) is a second-order, homogeneous and nonlinear ordinary differential equation and (1 - 32)- d3Y(4
dx3
+ 4x2-d2y(x) + 8y(x) = 0 dx2
(9.18)
is a third-order, homogeneous, linear, ordinary differential equation with variable coefficients.
9.2
FIRST- 0R DER DIFFER ENTI A L EQ UAT I0NS
The niost general first-order differential equation can be given as
d x ,Y! Y’) = 0.
(9.19)
We first investigate equations that can be solved for the first-order derivative as
(9.20) Before we introduce methods of finding solutions, we quote a theorem that gives the conditions under which the differential equation (9.20), with the initial condition yo = y(xo), has a unique solution. When all attempts to
FIRST-ORDER DIFFERENTIAL EQUATIONS
411
find an analytic solution fail, it is recommended that the existence and the uniqueness of the solution be checked before embarking on numerical methods.
Theorem 9.1. Consider the initial value problem (9.21)
If (9.22) are continuous functions of x and y in the neighborhood of the point (20,yo), then there exists a unique solution, y(x), that satisfies the initial condition yo = y(x0).
Example 9.2. Uniqueness of solutions: Given the initial value problem 4zy3, y(2) = 6.
(9.23)
Since the functions (9.24) are continuous in any domain containing the point (2,6), from Theorem (9.1) we can conclude that a unique solution for this initial value problem exists.
Example 9.3. Uniqueness of solutions: Consider the initial value problem dy = 6y2/3, y(2) = 0.
dx
(9.25)
From the equations 8.f 4 f (2, y) = 6y2l3 and - = dy y1/3’
(9.26)
we see that d f l d y is not continuous a t y = 0. Actually, it does not even exist there. Now the conditions of Theorem (9.1) are not satisfied in the neighborhood of the point (2,O). Hence, for this initial value problem we cannot conclude about the existence and the uniqueness of the solution. In fact, this problem has two solutions:
y(x) = 0 and y(x) = (2z - 4)3.
(9.27)
412
9.3
9.3.1
ORDINARY DIFFERENTIAL EQUATIONS
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION Dependent Variable Is Missing
When the dependent variable is missing in Equation (9.20), f(2,y) wc can write the solution as the quadrature y=
9.3.2
c+
.i’
q x ) dz.
=
a(.),
(9.28)
Independent Variable Is Missing
When the independent variable is missing, f ( z , y ) = O(y), the solution is given as the integral J’&=.+C.
9.3.3
The Case of Separable f
(9.29)
(z, y)
If neither variable is missing but f(x,y) is separable, f(x,y) then the solution can be written as
J’&/ a(.) =
dz
=
+ C.
(a(x)O(y),
(9.30)
The integrals in Equations (9.28) and (9.30) may not always be taken analytically; however, from the standpoint of differential equations, the problem is usually considered as solved once it is reduced to quadratures. Example 9.4. Separable equation: The differential equation (9.31) is separable and can be reduced to quadratures as
(9.32) The solution is implicit and it is given as 1 y3
1
1
y
4x2
1 - . +-=c.
4x
(9.33)
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
9.3.4
Homogeneous f
(z, y)
413
of Zeroth Degree
Here we use the term homogeneous differently. In general, a function satisfying
is called homogeneous of degree n. Homogeneous equations of the zeroth degree are in general given as
(9.35)
If we use the substitution
we obtain the first-order equation
(9.37) which is separable. Solution for z ( x ) is now written as
.I& .I + z =
?! X
= lnx
9.3.5
Solution When
f (z, y)
C.
(9.38)
I s a Rational Function
When the differential equation is given as yl =
+ bly + a2x + bzy + UIX
CI
c2
’
(9.39)
there are two cases depending on the value of the determinant
d = det
( t: i) ) .
(9.40)
Case I: When d # 0, that is, when alb2 # a z b l , the numerator and the denominator are linearly independent. Hence we can make the substitutions:
x=u+A y=v+B
(9.41) (9.42)
so that
dx = d u , d y
= dv
and d y l d x = d v / d u .
(9.43)
414
ORDINARY DIFFERENTIAL EQUATIONS
Now Equation (9.39) becomes
Since d
# 0, we can adjust the constants A and B so that (9.45) (9.46)
This reduces Equation (9.44) to (9.47) (9.48) which is homogeneous of degree zero. Thus can be solved by the technique introduced in the previous section. Case 11: When the determinant d is zero, that is, when alba = bla2, we can write
where k is a constant. We now introduce a new unknown z ( x ) as
+ biy =
(9.50)
+ bzy = k z
(9.51)
+ bly’ = z’.
(9.52)
so that
and
a1
Now the differential equation for z ( x ) is separable: (9.53) which can be reduced to quadratures immediately.
Example 9.5. Variable change: Consider the differential equation y’ =
+
2y - 4 2x+y+3
-X
(9.54)
FIRST-ORDERDIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
415
Since d = -5 [Eq. (9.40)], this is case I. We first write Equations (9.45) and (9.46):
- A + 2B - 4 = 0, 2A B + 3 = 0,
(9.55) (9.56)
+
and then determine A and B as
A = -2, B
=
1.
(9.57)
Using the substitutions [Eqs. (9.41) and (9.42)]
x=u-2,
(9.58) (9.59)
y=v+l, we obtain the differential equation [Eq. (9.48)]
-dv_ - - u + ~ v du
(9.60)
2u+v
-1 -
+ 2-UV
2+-
(9.61)
v .
U
Since the right-hand side is homogeneous of zeroth degree, we make another transformation, 2
= v/u,
(9.62)
to get
dz
=[
+
-1 22 2+2
-.I-
1 U
(9.63) which is separable and can be reduced t o quadratures immediately as
J [ s ] d x =
-/" U
-
lnu
+ C.
(9.64)
Evaluating the integral and substituting the original variables, we obtain the solution as
416
ORDINARY DIFFERENTIAL EQUATIONS
9.3.6
Linear Equations of First-Order
First-order linear equations in general can be written as y'
+ a(x)y = b ( z ) .
(9.66)
For homogeneous equations, b ( z ) is zero; hence the equation becomes separable and the solution can be written as
/
=
-
/a(.)
dx,
For the inhomogeneous equation, b(z) # 0, the general solution can be obtained by using the method of variation of parameters, where we treat C in Equation (9.67) as a function of x. We now differentiate
as
y' = C'e- J' 4.1
dx
-
cue-s 4 x 1
(9.69)
and substitute into Equation (9.66) to obtain a differential equation for C ( x ) : 14.1
dr
-
cue-s 4.1
dx
+ace- S
dx = b (XI,
C/ = b(x)eS a(.)
dz .
(9.70)
Solution of Equation (9.70) gives C(z) as (9.71) Note that as in the above equation, which means C ( x )= we omit the primes. Hence the final solution becomes
sxdz' [b(z')ei"'
a(x") dx"
(9.72) Example 9.6. Linear equations: Consider the linear equation
+
3 ~ '6 2= ~ x3.
(9.73)
We first solve the corresponding homogeneous equation 3y'
+ 6zy = 0
(9.74)
1,
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
417
as (9.75)
In(y/C)
=
-x2, y / >~ 0,
y
=
ce-x2.
(9.76)
We can now obtain the general solution by using the method of variation of parameters. We substitute y = C(x)e-Z2
(9.77)
into Equation (9.73) to write
C‘ e - x 2
-
1
2x~e-x’ + 2 x ~ e - x=~ - 5 3 , 3
(9.78)
which gives 1
C ( x )= - / x 3 e r 2 dx 3 1
= -eZZ(x2 6
-
1) + co.
(9.79)
We now write the final solution as 1 y = -(x2 - 1) 6
9.3.7
+ Coe-2’.
(9.80)
Exact Equations
Sometimes the right-hand side of (9.20) is given as a rational function: (9.81)
If we write this as
+
M ( x ,y) dn: N ( x ,9) dy = 0
(9.82)
and compare it with the total derivative of a function,
which is
d F = - dF - d X + - d ydF =O, dX 8Y
(9.84)
418
ORDINARY DIFFERENTIAL EQUATIONS
we see that finding the solution is equivalent t o find a function, F ( s ,y), where M ( z ,y) and N(z,y) are the partial derivatives: (9.85) (9.86) Sirice the condition for the existence of F ( z ,y) is given as
d 2 F ( x , y ) - d2F(x,7J) dxdy dyax ’
(9.87)
Equation (9.81) can be integrated when aMzl(x,Y>- d”z,y)
dY
ax
.
(9.88)
Starting with Equation (9.85), we can immediately write the integral (9.89)
In order to evaluate @(y) we differentiate Equation (9.89) with respect to y :
(9.90) (9.91) and use Equation (9.86) to write
which gives @(y) as (9.93) Now the solution [Eq. (9.89)] can be written as
Instead of Equation (9.85), if we start with (9.95)
419
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
we obtain an equivalent expression of the solution as
Example 9.7. Exact equations: Consider the following exact equation:
(x2 - 2y2) dx + ( 5 - 4xy) d y = 0.
(9.97)
Using Equation (9.96) we find the solution as
/ /(5
-
5y
~ ( xy), d y +
4xy) dy -
4 sy2 2
/ +/ /
+
/
[ ~ ( xy),
gJ ”J
-
[(x2- 2y2) - -
-
1
( 5 - 4xy) d y dx
dX
[(x’
4xy2 + 5y - 2
~ ( xy>, dy] d x = C,
-
[(x2
d
2y’)
-
-
2y2)
5y - 41t.y2 + 2
-
/
= C,
I 4y2)]
/ ( 5 - 4xy) dy dx = C ,
(
- 5y dX
-
- dx=C,
[(XZ - 2y2) 23
4xy2 + 5y - 2 3
-
+ ay”]
2y2x
dx = c,
+ 2y2x = c,
5y - 2xy2
x3 += C. 3
(9.98)
It is always a good idea to check the solution. Evaluating the total derivative of f ( x , y)
= 5y
-
2xy 2
+x3 = c, 3
(9.99)
we obtain the original differential equation [Eq. (9.97)]
(x2 - 2y2) dx + ( 5 - 4 2 ~ d) y 9.3.8
=
0.
(9.100)
Integrating Factors
If a given equation, M ( X 1 Y)
dx + N ( x , Y) d y = 0,
(9.101)
is not exact, then
dM(X,Y) dY
+
dN(X1 Y)
ax
.
(9.102)
420
ORDINARY DIFFERENTIAL EQUATIONS
Under most circumstances, it is possible t o find an integrating factor, I(x,y), such that the new functions, I ( z ,y)M(z,y) and I(x,y)N(x,y), in
I(., Y) [ M ( x Y) , d x + N ( x ,Y) dYl
=0
(9.103)
[ I ( %Y)N(Z, , Y)1
(9.104)
satisfy the integrability condition:
a [I(x,Y)M(Z, Y)1 aY
-
dX
Unfortunately, the partial differential equation to be solved for I(x,y) :
(9.105) is usually more difficult than the original problem. However, under special circumstances, it is possible to find an integrating factor. For example, when the differential equation is of the form (9.106) an integrating factor can be found by a quadrature. We rewrite Equation (9.106) as
[ P ( x ) y- &(.)I
+dy =0
dx
(9.107)
and identify the 111 and the N functions as
M ( x ,Y) = [ P ( ~ )Y &(.)I N(x,y) = 1.
,
(9.108) (9.109)
Applying the condition for exact differential equations [Eq. (9.88)]; namely (9.110) (9.111)
P ( z ) = 0,
(9.112)
we see that Equation (9.106) is not exact unless P ( x ) is zero. Let us multiply Equation (9.106) by an integrating factor, I(x),that depends only on x as
I ( z ) [ P ( x ) y- Q(z)]d x
+ I(z) d y = 0.
(9.113)
Now the differential equation that I(x) satisfies becomes (9.114) (9.115)
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
421
which is easily solved to yield the integrating factor
~ ( x=) e f P ( x ) d z .
(9.1 16)
Multiplying Equation (9.106) with I(x) we write
es P ( X )
dxdy
dx
+ ,S
~ ( x d )x p ( x ) y
= S,
~ ( x dxQ(x), )
(9.117)
which can be simplified as
(9.118) This gives the general solution of Equation (9.106) as ef P ( x ) dxy(x)=
J [ef W X dxQ(x)] ) dz + C.
(9.119)
Example 9.8. Integrating factor: Consider the following differential equation: dy -+dx
5x+1 X
y
= e-52
(9.120)
+
52 1 Since P ( x ) = -, we write the integrating factor as X
= exp
5x + 1 (J 7 dx)
= xe5x.
(9.122) (9.123)
Multiplying Equation (9.120) with I(x) and simplifying, we get
(9.124)
d [xe5"y] = 2 , dx xe5"y = 22 2
(9.125)
+ C,
(9.126)
and the general solution becomes
(9.127) If there are two independent integrating factors, I1 and I2, that are not constant multiples of each other, then we can write the general solution as 11(x,Y) = CIz(Z,Y).
(9.128)
422
ORDINARY DIFFERENTIAL EQUATIONS
Integrating Factors for Some Terms in Differential Equations: Term Y - XY’ f ( x ) + Y - XY’ Y + [ f ( Y ) - XIY’ + .f(XY)I - 4 1 - f ( X Y ) l Y ’
Integrating Factor 1 h 2 , 1 1 Y 2 , I l X Y , 1/(x2 + Y2) 1/x2 11Y2 1lXY
If an equation is exact and, furthermore, if M(x, y) and N(x, y) are homogeneous of the same degree different from -1, the solution can be written without the need for a quadrature as
First-order differential equations have an additional advantage; that is, the roles of dependent and independent variables can be interchanged. Under certain circumstances, this property may come in handy. For example, the equation y3 dx
+ (5xy
-
1) dy
=0
(9.130)
is clearly not exact. When we write it as dy -
dx
Y3 (1 - 5xy)’
(9.131)
it is nonlinear and not one of the types discussed before. However, if we interchange the roles of x and y: 1- 5 2 ~
dx -
dY
Y3
(9.132)
a i d rewrite it as (9.133) it becomes not only linear in x but also assumes the form (9.134)
Of course, it may not always be possible to invert the final solution, x(y), as Y (XI.
423
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
9.3.9
Bernoulli Equation
A class of nonlinear first-order differential equation that yields analytic solutions is the Bernoulli equation: Y l = f(X)Y
+ g(x)Yk.
(9.135)
There are the following special cases: (i) When f(x) and g(x) are constants, the Bernoulli equation is separable. (ii) When k = 0, the equation is linear. (iii) When k = 1, the equation is separable. (iv) When k is an arbitrary constant, not necessarily an integer and different from 0 and 1, the solution is given as
yl-'"= Y1 + Y2,
(9.136)
where
(9.137)
y1 = Ce4,
(9.138) J
(9.139) For the proof, make the transformation yl-k
=u
(9.140)
to get the linear equation
u'(4 = (1- k)[g(x)+ f(x)uI,
(9.141)
solution of which leads t o the above result [Eqs. (9.136)-(9.139)].
Example 9.9. Bernoulli equation: Consider the Bernoulli equation yl = xy
+ xy3,
(9.142)
where
(9.143) (9.144) (9.145) We first find
4 ( ~=)(1 - 3) ' - -2- 2
s
x dx
2
(9.146)
424
ORDINARY DIFFERENTIAL EQUATIONS
and then write the solution as
1
- = ce-z2 Y2
= ce-z2
9.3.10
+ (1 -
-
3)e-z’ J’rczz‘iz (9.147)
1.
Riccati Equation
The generalized Riccati equation is defined as
This simple-looking equation is encountered in many different branches of science. Solutions are usually defined in terms of more complicated functions than the elementary transcendental functions. If one or more special solutions can be discovered, then the general solution can be written. Case I: If a special solution, y1, is discovered, then the transformation Y(X)
= y1
+ -U1
(9.149)
changes Equation (9.148) into a linear form:
+
+
u’ U ( z ) u h ( z )= 0,
(9.150)
where
U ( Z )= g
+ 2hyi.
(9.151)
The general solution can now be written as
where ~ ( z=) exp
(1
~ ( z ‘)iz).
(9.153)
Case 11: Using the transformation = Yl + u ,
(9.154)
we can also reduce the Riccati equation into a Bernoulli equation:
d(z)= ( g + 2hy1)u + h U 2 ,
(9.155)
the solution of which is given in Equations (9.135)-(9.139). For cases where more than one special solution are known and for methods of finding special
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
425
solutions, we refer the reader to Murphy. Examples of Riccati equations with one special solution can be given as Y + -,
dy = 16x3(y - 2 ~ ) ’
dx
dY = 2(1 - x)y2
dx dY - -2y dx
y1 = 2 ~ ,
(9.156)
X
2
+ (2X - 1)y
+zy+
1 2
1 y1 = 2 2’
X
-
-,
(9.157)
X
-, y 1 = -,
(9.158)
2
(9.159) Sometimes a special solution can be spotted just by inspection. As in the following example, a series substitution with finite number of terms can also be tried.
Example 9.10. Riccati equation: Given the Riccati equation y f = X - 2 + - y -3; Y
1
X
(9.160)
21
where
f ( x )= x
-
(9.161)
2,
(9.162) (9.163)
To find a special solution, we can try y = c1x constants c1 and c2 to obtain y1
+
c2
and determine the
(9.164)
= X.
Making the transformation [Eq. (9.154)]
(9.165)
y=x+u,
the differential equation t o be solved for u becomes one of Bernoulli type: U’
= (9 =
+ 2 h y l ) u + hU2
( t - 2 3 - - u
1
X
= ( ; - 2 ) 1 L - ; u .1 2
(9.166) 2
(9.167) (9.168)
426
ORDINARY DIFFERENTIAL EQUATIONS
Solution for ~ ( xcan ) now be found as (9.169) where
4=-
1(9
= -3lnx
-
2 ) dx
+ 2z.
Evaluating the integrals, we obtain 1 - CeZ" U
-
23
(9.170)
.""/ x 3 e - 2 x
(9.171)
(-:)
dz
x3
(9.172) (9.173)
1
(9.174)
) Equation (9.165), we obtain the final solution as Substituting ~ ( xinto y(x) = 17: + x3
[
ce2x
-
-- - -
2
2
4
(9.175)
9.3.11 Equations That Cannot Be Solved for y' First-order equations, = 0,
S(Z,Y,Y')
(9.176)
that cannot be solved for the first derivative may be possible to solve analytically for some specific cases. One such case is Lagrange's equation:
Y = Cp(Y')X
+ @(Y'),
(9.177)
which can always be solved in terms of quadratures. Substitute y' = p in the above equation to write Y = cp(P)X
+ @(PI
(9.178)
and then differentiate with respect to x to get
p
-
cp = -(xcp' dP
dx
+ $1.
(9.179)
FIRST-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
427
We write this as
(9.180) (9.181) which is now a first-order equation, the general solution of which can be given as
x = x(p, C ) .
(9.182)
Substituting this back into the Lagrange’s equation [Eq. (9.178)] gives us a second equation, that is, y in terms of p as Y(P, C ) = Cp(P)X(P,C )
+ “1.
(9.183)
Eliminating p between Equations (9.182) and (9.183) gives us the solution
Y = Y(X, C ) .
(9.184)
Example 9.11. Lagrange ’s equation: Consider the equation given as 7J = 4xy’
Substituting y’
=p,
+ y’2.
(9.185)
we write this as = 4px
y
+p
2
(9.186)
and then differentiate with respect to x to get
p -3p
dp dP + 42 + 2p-,dx dx dP = (42 + 2p)-. dx = 4p
(9.187)
We now write this as
dx dp
-
2 3’
4x 3p
(9.188)
which is a first-order linear differential equation that can be solved by the substitution z = x/p (Section 9.3.4) as
x(p) = p
[cp-6
-
;]
(9.189)
Substituting this in Equation (9.186), we obtain
[
y = 4p.p c p - 3 = p2
[
4cp-5
’
-
271
- +P2
:I.
-
(9.190) (9.191)
428
ORDINARY DIFFERENTIAL EQUATIONS
We now have two equations:
[
y(p) = p2 4cp-3
-
and
x(p) = p $-.[
-
I:
(9.192)
-
51
,
(9.193)
which describes the solution in terms of the parameter p . Example 9.12. Clairaut equation: A special case of Lagrange’s equation, Y = ZY’ + $ (Y’)
,
(9.194)
where $(y’) is a known function of y’, is called the Clairaut equation. We first substitute p = y’ to write Y = XP + $(PI
(9.195)
and then differentiate with respect to x to get
(9.196) There are two possibilities: If we take
dP = 0, -
(9.197)
p = c.
(9.198)
dx
the solution becomes
Substituting this back into Equation (9.194)gives the general solution as
Y = c x + 1L(C),
(9.199)
which is a single parameter family of straight lines. The second alternative is
[x+
21
= O.
(9.200)
Since I) is a known function of p, this gives p as a function of x: P = P(X)’
(9.201)
which when substituted into the Clairaut equation [Eq. (9.195)]gives the singular solution
Y = xp(z)+ 1L[P(x)l.
(9.202)
This may be shown to be the envelope of the family of straight lines found before [Eq. (9.199)].
SECOND-ORDER DIFFERENTIAL EQUATIONS
9.4
429
SECOND-0 R D ER D I FFER ENT IA L EQ UAT I0 NS
The most general second-order differential equation can be given as
where its general solution, F(Y,
x,c1,C2) = 0,
(9.204)
contains two arbitrary integration constants, C1 and C2. The general linear second-order differential equation can be written as
Ao(x)y// + Al(Z)Y’ + AZ(2)Y
= f(.).
(9.205)
When the right-hand side, f(x), is zero, the differential equation is called homogeneous, and when f(x) is different from zero, it is called nonhomogeneous. The general solution of a second-order linear differential equation can be written as the linear combination of two linearly independent solutions, y1 and YZ, as Y(%) = ClYl
+ CZYZ,
(9.206)
where C1 and C2 are two arbitrary constants. Linear independence of two solutions, y1 and y2, can be tested by calculating a second-order determinant, W(y1, yz), called the Wronskian, which is defined as
(9.207) Two solutions, y1 and y2, are linearly independent if and only if their Wronskian, W(y1, y2), is nonzero. When the two solutions are linearly dependent, their Wronskian vanishes. Hence they are related by
Y1Y; = Y i Y 2 ,
(9.208)
which also means
(9.209) (9.210) (9.2 11) In other words, one of the solutions is a constant multiple of the other. This also means that for two linearly dependent solutions we can find two constants, C1 and C2, both nonzero, such that
ClYl + CZYZ = 0.
(9.212)
430
9.5
ORDINARY DIFFERENTIAL EQUATIONS
SECOND-ORDER DIFFERENTIAL EQUATIONS: M E T H O D S OF SOLUTION
In the most general equation,
f (x,Y, d ,Y”)
(9.213)
= 0,
we can reduce the differential equation to one of first-order when either the dependent or the independent variable is missing. Independent variable is missing: Using the transformation (9.214) we can write Equation (9.213) as (9.215) which is now a first-order differential equation. Dependent variable is missing: We now use the substitution
dP y’ = p , y” = p’, p’ = dx
(9.216)
and write Equation (9.213) as
f (x,P,P’)= 0.
(9.217)
which is again one of first-order.
+
+
Example 9.13. y” f (x)y’ g(y)yI2 = 0 : This is an exactly solvable case for second-order nonlinear differential equations with variable coefficients. We use the transformation (9.218)
Y l = U(Z:)V(Y)
to write the derivative y” = u’v
+ u-dv y
/
dY dv = UIV + u2u-. dY
(9.219)
Substituting Equations (9.218) and (9.219) into Yl’
+m
y ’ + S(Y)d2 = 0,
(9.220)
we get U‘V
+ u2 v-dv = dY
-
fuv
-
gu2v2.
(9.221)
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
431
Rearranging and collecting the x dependence on the left-hand side and the y dependence on the right-hand side, we obtain
(9.222) We can satisfy this equation for all x by setting both sides to zero, which gives two first-order equations
(9.223) (9.224) and
(9.225) Both can be integrated to yield
u(x)= Cle-J v(y)
f(x) ‘x,
= C2e-J” g(y) ‘9.
(9.226) (9.227)
Using Equation (9.218) and integrating once more, we obtain the implicit solution:
(9.228) where co and c1 are integration constants.
9.5.1
Linear Homogeneous Equations with Constant Coefficients
In this case the differential equation is written as y”
+ a l y l + a2y = 0,
(9.229)
where a1 and a2 are constants. It is clear by inspection that we can satisfy this equation if we find a function whose first and second derivatives are proportional to itself. An obvious candidate with this property is the exponential function: y
= erx
(9.230)
where r is a constant:
(9.231) (9.232) (9.233)
432
ORDINARY DIFFERENTIAL EQUATIONS
Substituting y = eTxinto Equation (9.229), we get a quadratic equation for r called the characteristic equation:
r2
+ alr
+a2 = 0 ,
(9.234)
which can be factorized to yield the roots as
( r - rI)(r - r 2 ) = 0,
(9.235) (9.236) (9.237)
With r taken as the roots of the characteristic equation, r1 and 1-2, we now have two solutions, eTIXand e r Z x ,of the second-order linear differential equation with constant coefficients [Eq. (9.229)]. Depending on the coefficients, a1 and a2, we have the following three cases: Case I: When - 122) > 0, the two roots, r1 and r2, are real and distinct. The corresponding solutions are linearly independent and the general solution is given as
(5
y = C1erlx
+ C2erZX,
(9.238)
where (9.239)
We can also express this solution as y = Ale-a1x12cosh(pz
+Az),
(9.240)
where (9.241)
(9.243) Case 11: When
= 0, the roots are real but identical: a1 r = r1 = r2 = -2'
(9.244)
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
433
In this case we get only one solution. To obtain the second linearly independent solution, we try y = ‘u(x)erz.
(9.245)
Substituting its derivatives,
+ ruerz, = u”erx + ruleTz + ru‘erx + r2’uerz,
y’ = d e T z
(9.246)
y”
(9.247)
into Equation (9.229), we obtain uuIIeI’s
+ ruUlerT+ rulerz + r2uerz + al(’ulerz+ ruerz)+ uZueTz = 0 ,
(9.248)
which, after rearranging, becomes [u” Since
el.,’
+ +r + (T
a1)d
+ ( r 2+ alr + uz)’u]erx = 0.
(9.249)
# 0, we obtain the differential equation to be solved for u ( x ) as Y”
+ (27- + a1)u’ + ( 2+
UlT
+
U2)Y
= 0.
(9.250)
Using Equation (2.244), namely
r
a1 = --
(9.251)
2’
arid the relation a2
=
4 -,
(9.252)
4
we write
+
u/! (-a1
which gives
‘u
+ u1)u’ + (-a:4
-
a;
-
2
u: + -)u 4
= 0,
(9.253)
ul! = 0,
(9.254)
as ‘U
=
bl
+ b22,
(9.255)
where bl and bz are constants. We can now take the second linearly independent solution as yz
= xeTz,
(9.256)
which leads to the general solution :y = Clerx
+ C2xerx, r =
a1
2
(9.257)
434
ORDINARY DIFFERENTIAL EQUATIONS
Note that there is nothing wrong in taking the second linearly independent solution as YZ =
(2
( h + b2z)erz,
however y~2taken like this also contains yl
Case 111: When
-
(9.258)
= erz.
az) < 0, the two roots,
TI
and
TZ,
are complex.
Hence they can be written in terms of two real numbers, dl and dz, as
(9.259) (9.260) where
(9.261) (9.262) Now the general solution is given as = e d ~ z[ c l e i d 2 z + cZe-i&z
I.
(9.263)
Since C1 and Cz are arbitrary, we can also take them as complex numbers and write
+
+
y = e d I z [ ( C I R i ~ l I ) ( c o s d z z isindzz)
+ (CZR + iC2I)(cosd2z =
-
i sin &z)]
(9.264)
e d 1 " { ( c l R + C 2 R ) c o S d ~ z + ( C-Cl1)sind2z z~
+ i [(Cu + C21)cosd2z + (C1R
-
C Z R sindzz]}, )
(9.265)
where C ~ and R Cir,i = 1 , 2 , are real numbers representing the real and the imaginary parts of Cl and Cz. Choosing
(9.266) (9.267) we can write the solution entirely in terms of real quantities as
+
y(z) = e d l z [ 2 C 1 ~ c o s d z x 2C21 sindzz].
(9.268)
This can a.lso be written as y(z) = AOedlZcos(dzz - A l ) ,
(9.269)
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
435
where we have defined two new real constants, A0 and A l , as
C2I
A1 = tan-’ -
(9.271)
G R
In Equation (9.269) A0 is the amplitude and function:
A1
is the phase of the cosine
Example 9.14. Harmonic oscillator: Small oscillations of a mass, m, attached to a spring is governed by the equation
mx
=
-kx,
(9.272)
where x denotes the displacement from the equilibrium position and k is the spring constant. Usually we write Equation (9.272) as
x + w 2o x = 0,
(9.273)
where w: = k / m is called the natural frequency of the spring. The general solution of Equation (9.273) is given as
2 ( t ) = a0 cos(w0t - a ) , where a0 is the amplitude and the initial conditions,
ai
is the phase of the oscillations. Using
x ( 0 ) = a0 cos a , vg = 2(0) = aOwO Sinai,
20 =
we can determine of the mass m as
a0
and
Q
(9.274)
(9.275) (9.276)
in terms of the initial position and velocity
(9.277) (9.278)
Example 9.15. Damped harmonic oscillator: When there is a damping force proportional to velocity, f d = -b2, the equation of motion of the harmonic oscillator becomes
mx
=
b ’ 2 + -x m
-bx
-
kx,
2 + w0x = 0,
(9.279) (9.280)
436
ORDINARY DIFFERENTIAL EQUATIONS
where b is the damping factor or the friction coefficient. Roots of the characteristic equation are now given as (9.281) which we analyze in terms of three separate cases: Case I: When b2/4m2 > wg,the harmonic oscillator is over-damped and the solution is given entirely in terms of exponentially decaying terms:
Since the first term blows up at infinity, we set C1 final solution as
=
0 and write the
Case 11: When b2/4m2 < w i , the harmonic oscillator is under-damped and the roots of the characteristic equation have both real and imaginary parts: (9.284) Now the general solution is given as
~ ( t=)Age-bt/2mcos(wlt - a ) ,
(9.285)
where the motion is oscillatory, albeit an exponentially decaying amplitude. Using the initial conditions, we can determine A0 and a as 1 tana = w1
(& + :),
(9.286) (9.287)
where 20
= z(O), v0 = i ( 0 ) .
(9.288)
Case 111: When b2/4m2 = wi,we have the critically damped harmonic oscillator, where the roots of the characteristic equation are equal: (9.289)
437
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
The two linearly independent solutions are now taken as e-bt/2m
and tePbtlam,
(9.290)
Hence, the general solution is written as
x ( t )= (
(9.291)
~ l +
where the arbitrary constants are given in terms of the initial position, 50,and velocity, UO, as (9.292) (9.293)
9.5.2
Operator Approach
An alternate approach t o second-order linear differential equations with constant coefficients is t o write the differential equation, y”
+ a1y’ + a2 = 0 ,
(9.294)
in terms of a linear differential operator, C, as
Cy(x) = 0,
(9.295)
where C is defined as (9.296) We can factorize C as the product of two linear operators:
( D - Q ) ( D - r l ) y = 0. Since the naming of the roots as as
r1
or
7-2
(9.297)
is arbitrary, we can also write this
( D - q ) ( D - r2)y = 0.
(9.298)
Solutions of the two first-order equations, =0
(9.299)
( D - 7-2)Y = 0,
(9.300)
( D - r1)y and
are also the solutions of the Equation (9.297), hence (9.294), which are y1 = COerlX,
y2
= C1er2x,
(9.301) (9.302)
438
ORDINARY DIFFERENTIAL EQUATIONS
respectively. Since the Wronskian of these solutions,
w(y1,yp) = (rz - rl)e(rl+rz)z,
(9.303)
is different from zero for distinct roots, these solutions are linearly independent and the general solution is written as their linear combination:
Y = ClYl + GY2.
(9.304)
In the case of a double root,
r = 7-1 = 7-2,
(9.305)
Equation (9.297) or (9.298) becomes
( D - r ) 2y = o ,
( D - r ) ( D - r)y = 0.
(9.306) (9.307)
( D - r ) y = u(x)
(9.308)
Let us call
so that Equation (9.307) becomes
( D - r ) u ( x ) = 0,
(9.309)
u = Coerx.
(9.310)
the solution of which is
Substituting this into Equation (9.308), we get
( D - r)y = Coerz,
(9.311)
which can be easily solved with the techniques introduced in Section 9.3.6, to yield the general solution as
y(z) = COerZ
+ C1xerx.
(9.312)
As expected, in the case of a double root the two linearly independent solutions are given as yi 9.5.3
= erx
and y2 = xerx.
(9.313)
Linear Homogeneous Equations with Variable Coefficients
We can write the general expression,
Ao(Z)Y’’ + Al(X)Y’
+ A2(X)Y = 0,
(9.314)
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
439
for the second-order linear homogeneous differential equation with variable coefficients as
y” + P(z)d+ Q(z)y = 0.
(9.315)
When Q ( x ) is zero, we can substitute (9.316)
v = yr
arid obtain a first-order equation, v’
+ P ( z ) v = 0,
(9.317)
which can be reduced to a quadrature as v = Cle-
s P ( z ) dx.
(9.318)
Using this in Equation (9.316) and integrating once more, we obtain the solution as y = C1
1
e-
+ C,.
p ( z ) dxdx
(9.319)
In general, the solution for Equation (9.315) cannot be given except for certain special cases. However, if a special solution, y1 (z),is known, then the transformation (9.320)
Y =Yl4X)
reduces Equation (9.315) into the form y1dI
+ (ay: + Py1)u’ = 0,
(9.321)
where the dependent variable is missing, hence the solution can be reduced to quadratures with the substitution w = u/ as
(9.322)
(9.323)
Some suggestions for finding a special solution are (Murphy): (i) If P ( x ) z&(z) = 0, then y1 = z. (ii) Search for a constant k such that k2 + ICP Q = 0. If you can find such a k , then y1 = e“. (iii) Try a polynomial solution, y1 = a0 alz . . a,zn. Sometimes one can solve for the coefficients.
+
+
+ +. +
440
ORDINARY DIFFERENTIAL EQUATIONS
(iv) Set x = 0 or its solution is y1.
5
= 00 in Equation (9.315). If the result is solvable, then
Example 9.16. When a particular solution is known: Consider the differential equation (9.324) / ~particular . solution is given where P ( x ) = 2/32 and &(x) = - 1 / 9 ~ ~ A 113 as y1 = e x . We can now use Equation (9.323) to write the general solution as (9.325) (9.326) (9.327) In the last step we redefined the arbitrary constant C1. Example 9.17. Wronskian from the differential equation: An important property of the second-order differential equations is that the Wronskian can be calculated from the differential equation without the knowledge of the solutions. We write the differential equation for two linearly independent solutions, y1 and yz, as Y:'
+ P(z)Y:+ Q(z)Y~= 0
(9.328)
Y;
+ P(.)Y~+ Q ( ~ ) V Z= 0.
(9.329)
and
Multiply the second equation by y1 and the first one by yz,
+ P ( X ) Y +~ QY(~z ) Y ~ Y=~0, Y~Y:' + J'(z)YzY/~ + Q(Z)YzYi = 0, YiY;
(9.330) (9.331)
and then subtract to write y1y;
-
YZYj:l+ P(.)
[YlYL
-
Y a d l = 0.
(9.332)
This is nothing but
+
d W ( z ) P ( z ) W ( z )= 0,
dx
(9.333)
441
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
where W ( x ) = [ylyi - y2yi]. Equation (9.333) gives the Wronskian, W ( x )= W [y1(x),y2(x)] , in terms of the coefficient P ( x ) as
W ( x )= Woe- s P(.)
dz.
(9.334)
Note that the solution in Equation (9.323) can also be written as
Y = y1 [CI 9.5.4
Cauchy-Euler
1w(z) Y1
dx + C2]
(9.335)
Equation
A commonly encountered second-order linear differential equation with variable coefficients is the Cauchy-Euler equation: 2ytt
where
a1
and
a2
+ a1xy’ + a2y = 0,
(9.336)
are constants. Using the transformation y(z) = u ( z ) , z = l n x
(9.337)
and the derivatives (9.338) we can write the Cauchy-Euler equation as
u”(z)
+ (a1
-
+
l)u’(z) a2u(z) = 0,
(9.339)
which has constant coefficients with the characteristic equation T2
+ (a1
-
1)‘
+ a2 = 0.
(9.340)
There are three cases: Case I: When (a1 - 1)2> 4U2, the general solution for Equation (9.336) is y = C1xT’
+
C2xT2,
(9.341)
where ‘1,2
Case 11: When becomes
=
(a1 -
~
(’
-
2
&
1
-J(Ul
2
-
1)’ - 4U2.
(9.342)
1)2 = 4a2,the general solution of Equation (9.336)
(9.343)
442
ORDINARY DIFFERENTIAL EQUATIONS
Case 111: When (a1 - 1)2< 4a2, roots of the characteristic equation are complex: ~ 1 ,= 2
a fib,
(9.344)
where a and b are given as
a = - (1 - a1) 2 ,
1
b = - [4az - (a1 - 1)2] 2
(9.345) 1/2 .
(9.346)
Hence the general solution of Equation (9.336) can be written as y = Clza cos [blnz - CZ].
9.5.5
(9.347)
Exact Equations and Integrating Factors
We now extend the definition of exact equation to second-order differential equations written in the standard form
Ao(z)y”
+ A l ( ~ ) y+’ A2(z)y= 0.
(9.348)
For an exact equation we can find two functions, Bo(z) and Bl(x), which allow us to write the above equation in the form (9.349) the first integral of which can be written immediately as
+
Bo(z)l~’ B i ( z ) y = C.
(9.350)
This is a first-order differential equation and could be approached by the techniques discussed in Sections 9.2 and 9.3. To obtain a test for exact equations, we expand Equation (9.349):
Boy!’
+ (BL+ B1)y’ + B i y = 0;
(9.351)
then we compare it with Equation (9.348). Equating the coefficients of y”, y’, and y gives the following relations, respectively: (9.352) (9.353) (9.354) Using the first two equations [Eqs. (9.352) and (9.353)], we write B1 = A1 Ah, which after differentiation and substitution of Equation (9.354) gives the
443
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
condition for exactness entirely in terms of the coefficients Ai, i as
A:
-
A:
=
=
A2.
0,1,2
(9.355)
In the case of first-order differential equations, granted that it has a unique solution, it is always possible t o find an integrating factor. For the secondorder differential equations there is no guarantee that an integrating factor exists. If a differential equation fails to satisfy the test for exactness [Eq. (9.355)], we can multiply it with a n integrating factor, ~ ( z ) t,o write
+
PL(5)[A0(4Y1’+ Al(z)Y’ A2(Z)Y1 = 0, [P(2)Ao(.)l Y’I + MzM1 ().I Y’ + [P(Z)AZ (.)I Y = 0.
(9.356) (9.357)
Substituting the new coefficients of Equation (9.357) into Equation (9.355), we write
(PAO)” - (PAl)’
+ PA2 = 0,
(9.358)
which gives the differential equation to be solved for p ( x ) :
Aop”
+ (2Ab
-
A1)p’
+ (A:
- A:
+ A2)p = 0.
(9.359)
This is also a second-order differential equation with variable coefficients, which is not necessarily easier than the original problem. Example 9.18. Integrating factor: Motion of a particle of mass m in uniform gravitational field g is governed by the equation of motion
m x = -mg,
(9.360)
where the upward direction is taken as positive. An overhead dot denotes time derivative. Solution is easily obtained as
+ vo, 2 = --gt 1 2 + vot + zo, 2 2,
= x = -gt
(9.361) (9.362)
where zo and uo denote the initial position and velocity. In the presence of friction forces proportional to velocity, we write the equation of motion as
mx = -bx _. 2
b .
+ --2 m
-
mg,
= -9;
(9.363)
(9.364)
where b is the coefficient of friction. The homogeneous equation
b . x+-z=o m
(9.365)
444
ORDINARY DIFFERENTIAL EQUATIONS
is not exact, since the test of exactness [Eq. (9.355)] is indeterminate, that is, 0 - 0 = 0. Using Equation (9.359), we write the differential equation to be solved for the integrating factor,
b . = 0, m
b-
-p
(9.366)
the solution of which yields p ( t ) as p ( t ) = ebtlm.
(9.367)
Multiplying Equation (9.364) with the integrating factor found gives ebtlm [% +
kj-]
=
-gebt/m,
(9.368) (9.369)
This can be integrated immediately to yield (9.370)
or -
”””[
e-bt/m
b
(1+2)-1]
(9.372)
Note that in the limit as t + co,velocity goes to the terminal velocity, vt = - g m / b . Equation (9.372) can be integrated once more t o yield (9.373) In the limit of small friction, this becomes (9.374) 9.5.6
Linear Nonhomogeneous Equations
In the presence of a nonhomogeneous term, the general solution of the linear equation
is given as Y = Yc
+ Yp,
(9.376)
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
445
where yc is the solution of the homogeneous equation,
+ Czy2,
Ye = ClY1
(9.377)
which is also called the complementary solution. The particular solution, yp, is a solution of the nonhomogeneous equation that contains no arbitrary constants. In what follows we discuss some of the most commonly used techniques to find particular solutions of linear second-order nonhomogeneous differential equations. Even though there is no general technique that works all the time, the techniques we introduce are rather effective when they work. 9.5.7
Variation of Parameters
If we know the complementary solution, yc, one of the most common techniques used to find a particular solution is the variation of parameters. In this method we treat the integration constants, C1 and C2, in the complementary solution as functions of the independent variable, 2, and write Equation (9.377) as
+ %(Z)Yz.
Yp = .Ul(Z)Yl
(9.378)
For a given y1 and y2, there are many choices for the functions u1 and 212 that give the same yp. Hence we can use this freedom to simplify equations. We first write the derivative = ViYl
+ w:+ 4 Y 2 + WY;
(9.379)
and then use the freedom we have to impose the condition
4Yl
+ 4 4 2 = 0,
(9.380)
so that the first derivative of g p becomes
Y:, = w; + WY:.
(9.3811
Now the second derivative is written as
+ v1yy + u;y; + vzy;.
y; = 'u:y:
(9.382)
Substituting these derivatives into the differential equation [Eq. (9.375)] gives A0
+A1 [ Q Y ~
+
[v~Y;
+ V;Y; + V Z ~ ; ]
V I Y ~
+ v~Y;]+ Az [WYI + 2 1 ~ ~ 2 .=1 f ( x ) .
(9.383)
After rearranging and using the fact that y1 and yz are the solutions of the homogeneous equation, we obtain (9.384)
446
ORDINARY DIFFERENTIAL EQUATIONS
Along with the condition we have imposed [Eq. (9.380)],this gives two equations,
(9.385) (9.386) to be solved for vi and vh. Determinant of the coefficients is nothing but the Wronskian
(9.387) Since y1 and 9 2 are linearly independent solutions of the homogeneous equation, W[yl,y2] is different from zero. Hence we can easily write the solutions for vi and vi as
I
0
Y 2 1
I f/Ao v; = I
Y1
Yh Y2
I
I
(9.388)
(9.389)
(9.390)
(9.391) These can be integrated for v1 and v2 as
(9.392) (9.393) to yield the particular solution YP
9.5.8
=Vl(Z)Yl
+ v2(x)y2.
(9.394)
Method of Undetermined Coefficients
When this method works, it is usually preferred over the others. It only depends on differentiation and simultaneous solution of linear algebraic equations. Even though its application is usually tedious, it does not require any
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
447
integrals to be taken; and as in the variation of parameters method, it does not require the knowledge of the solution of the homogeneous equation. For a general nonhomogeneous term, which can be written as f ( - z ) = eaz[cosbz(a0
+ a1-z + . . . + a,-zn)
+ sin b-z(Po + P I X + . . . + pn-zn)],
(9.395)
where a , b, n, ai, pi are constants, we substitute a solution of the form
+
yp(x) = -zTeaz[cosb-z(A1O All-z+ . . . + A l n z n ) sin b-z(Blo B1l-z . . . Bln-zn)]
+
+
+ +
(9.396)
into the differential equation
Ao(z)y”+ Ai(z)y’+ A ~ ( X = ) Yf(.).
(9.397)
The result is 2n linear algebraic equations for the coefficients
Ali, i
= 0 , . ..
(9.398)
,n,
and
Bli, i = O , . . . , n.
(9.399)
For the value of T we take 0, if a f i b is not one of the roots of the characteristic equation. In other cases we take r as the number of roots coinciding with a+ib, which in no case will be greater than 2 for a second-order differential equation. Usually the inhomogeneous term, f ( z ) , can be decomposed as
f(.)
= fl(-z)
+ . + fm(-z), ’ ’
(9.400)
where each term is of the form in Equation (9.395). For each term we repeat the process t o obtain the particular solution of
AO(-z)d/+Al(-z)Y/+ A2(X)Y = f ( - z ) , =fl(-z)+..‘+fm(-z),
(9.401)
as
where
+
+
Ao(x)yii Ai(z)ybi Az(-z)y,i = fi(-z),
i = 0 , . . . ,m.
(9.403)
Example 9.19. Damped and driven harmonic oscillator: The equation of motion is given as
FO z + --zmb . + W,2X = COSWt, m
(9.404)
448
ORDINARY DIFFERENTIAL EQUATIONS
where the nonhomogeneous term is the driving force. Comparing with Equation (9.395), we identify
FO a = 0 , b = w , ao=-,n=O, m
(9.405)
hence write x p as x p = tr [A10cos w t
+ Blo sin w t ] .
(9.406)
We take the value of r as 0, since i w does not coincide with any one of the roots of the characteristic equation of [Eq. (9.28l)l I(:
+ --zm6 . + w ; x
Using Equation (9.406) with r
=0
= 0.
(9.407)
and its derivatives,
ip = -Alow sin w t + Blow cos w t ] ,
(9.408)
xp = -A10w2 cos w t - Blow' sin w t ] ,
(9.409)
in Equation (9.404), we obtain two linear equations for
AIO(W; - w2)
+ Blo
w2) - A10
(g) m, (5)
A10
= FO
= 0.
and Blo : (9.410) (9.411)
If we first square and then add these equations, we obtain (9.412) Using Equation (9.411) we can also write the ratio BlO bw/m -~ A10 W: - w''
(9.413)
Now the particular solution can be written as (9.414) where
A0
2
1/2
[A?, + B101 , BlO = tan-' .
=
(9.415) (9.416)
SECOND-ORDER DIFFERENTIAL EQUATIONS: METHODS OF SOLUTION
449
Finally, the general solution becomes
4 t ) = zc(t) + z p ( t ) ,
(9.417)
where 2 , is the solution of the homogeneous equation found in Example 9.15. For the under-damped harmonic oscillator, the complementary solution is given as [Eq. (9.285)]
x c ( t )= Coe- b t / 2 m cos(w1t - CI), w1 =
.J
wo -2:: ’
(9.418)
where CO and C1 are integration constants. Note that for sufficiently large times, z c ( t )dies out leaving only the particular solution. It is for this reason that 5 , is called the transient solution and z p is called the steady-state solution.
Example 9.20. Method of undetermined coeficients: Comparing the nonhomogeneous term of the equation
y”
-
3y‘ + 2y = 2xe”
(9.419)
with Equation (9.395) we identify the constants as
a = 1, b = 0,
a0 =
0,
a1
= 2 and n = 1,
(9.420)
where a f ib is just 1. The characteristic equation in this case has the roots r1 = 2 and 7-2 = 1. Since the second root coincides with 1, we take r = 1 and write yp as
yp = ze”[Alo
+ Allz].
(9.421)
Substituting this into Equation (9.419) and after rearranging, we get
-2Allze”
+ (-A10 + 2All)e” = 2xe”
(9.422)
or
+
- 2 A 1 1 ~ (-A10 Equating the equal powers of for All and Alo:
5 , we
+ 2A11) = 22.
(9.423)
get two linear equations to be solved
-2Al1 = 2, -A10 2A11 = 0,
(9.424) (9.425)
A l l = -1 and A10 = -2.
(9.426)
+
which yields
Thus a particular solution is obtained as
yp = e”(-22 - 2 2 ). It is always a good idea to check.
(9.427)
450
9.6 9.6.1
ORDINARY DIFFERENTIAL EQUATIONS
LINEAR DIFFERENTIAL EQUATIONS O F HIGHER ORDER With Constant CoefFicients
The homogeneous linear differential equation of order n with real constant coefficients, dng aodxn
dn-l + a1-dxn-l + " . + any=O,
(9.428)
can be solved exactly. If we try a solution of the form y a erx,
(9.429)
we obtain an nth-order polynomial as the characteristic equation: a0rn
+ a1rn-l + . ' . + a, = 0 ,
(9.430)
which has n roots in the complex plane. Depending on the nature of the roots, we have the following cases for the general solution: Case I: If the characteristic equation has a real root, a , occurring k 5 n times, then the part of the general solution corresponding to this root is written as
[co + c1z
+ c2z2 + . . . + c k - 1 ~ ~ - eax. ~ ]
(9.431)
Case 11: If the remaining roots of the characteristic equation are distinct and real, then the general solution is of the form y = [co
+ c1z + c2z 2 + . . . + ck-1zk--l] eaz
+ c k + l e a k + l z + . . . + cneanz.
(9.432)
Case 111: If there are more than one multiple root, then for each multiple root we add a term similar to Equation (9.431) for its place in the general solution. Case IV: If the characteristic equation has complex conjugate roots, a f i b , then we add y
= eas (c1 sin bz
+ c2 cos bz)
(9.433)
for their place in the general solution. Case V: If any of the complex conjugate roots are l-fold, then the corresponding part of the general solution is written as
LINEAR DIFFERENTIAL EQUATIONS OF HIGHER ORDER
9.6.2
451
With Variable Coefficients
In general, for nth-order linear homogeneous differential equations with variable or constant coefficients, there are n linearly independent solutions,
{fl,f 2 , . . , f n ) , '
(9.435)
which forms the fundamental set of solutions. The general solution is given as their linear combination:
where ci, i = 1, . . . , n are arbitrary integration constants. For a linearly independent set, fl, f 2 , . . . , f i L , the Wronskian W [ f l f,2 , . . . , f n . ] , which is defined as
is different from zero. Unlike the case of constant coefficients, for linear differential equations with variable coefficients, no general recipe can be given for the solution. However, if a special solution, ~ ( x )is, known, then the transforniat ion
can be used to reduce the order of the differential equation by 1 to ( n- 1) in thc depcnderit variable w 9.6.3
du dx
= -.
Nonhomogeneous Equations
If glr, and yzp arc the particular solutions of
and
respectively, then their linear combination
Y = ClYlp
+ CZYlp,
(9.441)
452
ORDINARY DIFFERENTIAL EQUATIONS
is the particular solution of
where c1 and
c2
are constants. For the nonhomogeneous equation,
(9.443) the general solution is the sum of the solution of the homogeneous equation, yc, which is called the complementary solution, plus a particular solution, yp, of the nonhomogeneous equation: Y = Yc
+ Yp.
(9.444)
To find a particular solution, the method of variation of parameters and the method of undetermined coefficients can still be used.
9.7
INITIAL VALUE PROBLEM AND UNIQUENESS OF T H E SOLUTION
Consider the initial value problem
y(4= f (x,Y, Y', . . . > y(n-U), Yo
= Y(ZO),
. . . , Y o(-1)
= y("-l)(zo),
(9.445) (9.446)
If the functions (9.447) are continuous in the neighborhood of the point P(z0,yo,. . . , y r - " ) as functions of the n + 1 variables, then there exists precisely one solution satisfying the initial values
Yo = Y(Zo),... , Yo(n-1) = y ( n - l ) ( z o ) . 9.8
(9.448)
SERIES SOLUTIONS: FROBENIUS METHOD
So far we have concentrated on analytic techniques, where the solutions of differential equations can be found in terms of elementary functions like exponentials and trigonometric functions. However, in a large class of physically interesting cases, this is not possible. In some cases, when the coefficients of the differential equation satisfies certain conditions, a uniformly convergent series solution may be found.
SERIES SOLUTIONS: FROBENIUS METHOD
453
A second-order linear homogeneous ordinary differential equation with two linearly independent solutions can be written as (9.449) When xo is no worse than a regular singular point, that is, if lim (z - z o ) P ( z )t finite
(9.450)
lim (x- x ~ ) ~ Q (+x finite, )
(9.451)
2-20
and 2-20
we can seek a series solution of the form 00
y(x) = c u k ( z -
a0
# 0.
(9.452)
k=O
Substituting the series in Equation (9.452) into the differential equation [Eq. (9.449)], we obtain an equation that in general looks like
Usually the first couple of terms go as
+
bo(a0,0)(x - z0)a-2 b1(ao,al, a)(. - xO)--l +bz(az, al, a3, a)(. - zo)' . . . = 0.
+
(9.454)
For this equation to be satisfied for all x in our region of interest, all the coefficients, b,(ai, ai-1, ai-2, a ) , must vanish. Setting the coefficient of the lowest power of (x - 20) with the assumption a0 # 0 gives us a quadratic equation for a , which is called the indicial equation. For almost all of the physically relevant cases the indicial equation has two real roots. Setting the coefficients of the remaining powers to zero gives a recursion relation for the remaining coefficients. Depending on the roots of the indicial equation, we have the following possibilities for the two linearly independent solutions of the differential equation (Ross): 1. If the two roots differ, a1 > 0 2 , by a noninteger, then the two linearly independent solutions are given as
454
ORDINARY DIFFERENTIAL EQUATIONS
2. If (cyl - ~112) = N , where N is a positive integer, then the two linearly independent solutions are given as co
Yl(Z) =
12 -
QIQ1
C U k ( Z -Z k=O
O K
a0
# 0,
(9.456)
and 00
y2(x) = 1%
-
C
20/Q2 b k ( z - ~ k=O
0
+ c) ~~ ~In( IIZ c ZO) I , -
bo # 0. (9.457)
The second solution contains a logarithmic singularity, where C is a constant that niay or may not be zero. Sometimes a2 will contain both solutions; hence it is advisable to start with the smaller root with the hopes that it might provide the general solution. 3. If the indicia1 equation has a double root, a1 = ~ 2 then , the Frobenius method yields only one series solution. In this case the two linearly independent solutions can be taken as
(9.458) where the second solution diverges logarithmically as z 20. In the presence of a double root, the Frobenius method is usually modified by taking the two linearly independent solutions as ---f
(9.459)
and y2(.c) =
1 2-
cE0bk(x
-~
0
+ Y ~ ( zIn) Iz )
~
-
20
I.
In all these cases the general solution is written as Y ( 2 ) = Alyl(z)
where Al and
A2
+ AZYZ(2)I
(9.460)
are integration constants.
Example 9.21. A case with distinct roots: Consider the differential equation z2y”
+ ();
y’
+ 2 2 y = 0.
(9.461)
Using the Frobenius method, we try a series solution about the regular singular point, 20 = 0, as
c 03
Y(Z) =
/ZIT
n=O
anzn1 a0
# 0.
(9.462)
SERIES SOLUTIONS: FROBENIUS METHOD
455
Assuming that x > 0, we write
c 03
y(x)
unxn+r,
=
(9.463)
n=O
which gives the derivatives, y’ and y”, as 03
y/ =
C(n+ r)u,xn+r-l,
(9.464)
n=O 03
y” =
- y ( n+ r ) ( n+
T -
(9.465)
l)unxn+r-2.
n=O
Substituting y, y’, and y” into Equation (9.461), we get w
03
n=O
n=O
n=O
(9.466) We express all the series in terms of z ~ + ~ : w
w
+
where we have made the variable change n 2 4n’ in the last series and dropped primes at the end. To start all the series from n = 2, we write the first two terms of the first two series explicitly:
(9.468) This equation can only be satisfied for all z, if and only if all the coefficients vanish, that is,
[.
(r
[(r
-
+ 1) ( r +
31
= 0,
(9.469)
a1 = 0,
(9.470)
uo
31
456
ORDINARY DIFFERENTIAL EQUATIONS
The coefficient of the first term [Eq. (9.469)] is the indicia1 equation and with the assumption a0 # 0 gives the values of r as r1 =
1 2
- and 7-2 = 0.
(9.472)
The second equation [Eq. (9.470)] gives a1 = 0 for both values of r and finally, the third equation [Eq. (9.471)] gives the recursion relation an
=
We start with
-
2an-2 [(n r ) (n r
+
+
-
i)], n = 2 , 3 , . . . .
(9.473)
= 1 / 2 , which gives the recursion relation
a,
=
-
2an-2 n n(n +) ’
+
=
2,3,... ,
(9.474)
and hence the coefficients
(9.475)
The first solution is now obtained as
Similarly, for the other root,
a,
=-
7-2
= 0, we obtain the recursion relation
2an-2 n(n -
i)’ n = 2 , 3 , . . . ,
(9.477)
457
SERIES SOLUTIONS: FROBENIUS METHOD
and the coefficients a1 =
0,
2a0 a2 = -3 ’ a3 = a4
a5
=
0,
-,2a0
(9.478)
21 = 0,
4a0 a6 = --
693 ’
which gives the second solution as y2 = a.
2x2 2x4 4x6 1- - + - - - + . . . 3 21 693
[
(9.479)
We can now write the general solution as the linear combination zX2 2x4 4x6 1 - - + - - -+ . . . 5 45 1755 2x2 2x4 4x6 +c2 I--+---+... 3 21 693
y = QXI
[ [
1
1 (9.480)
Example 9.22. General expression of the nth term: In the previous example we have found the general solution in terms of infinite series. We now carry the solution one step further. That is, we write the general expression for the n t h term. The first solution was given in terms of the even powers of x as y1=x2
2x2 zX4 I--+---+... 5 45
[
1
4x6 1755
03
= X1l2
(9.481) (9.482)
a2kx2k.
k=O
Since only the even terms are present, we use the recursion relation [Eq. (9.474)] to write the coefficient of the k term in the above series as a2k = -
We let k
+k -
2a2(k - 1) k = 1,2,. . . 2k(2k $) ’
(9.483)
+
1: U2k-2
=
-
1 ( k - 1)(2k - 2
+ i)a2k -4
(9.484)
458
ORDINARY DIFFERENTIAL EQUATIONS
and use the result back in Equation (9.483) to write a2k =
-
1 k(2k +
3) ( k
1 1)(2k - 2 1 9(2k-2
k ( k - 1)(2k +
We iterate once more. First we let k =
a2k-2
+ +) a 2 k - 4
-
1 ( k - l)(k - 2)(2k - 2
(9.485)
+ i)U 2 ( k P 2 ) ’
+k
- 1 in the above equation:
+3(2k
-
4
+ 3)a 2 ( k - 3 )
(9.486)
and then substitute the result back into Equation (9.483) to write a2k
1 a2(k-3) k ( 2 k $ ) ( k - l ) ( k - 2 ) ( 2 k - 2 ;)(ark - 4 $) 1 - a 2 (k- 3 ) 2.2.2.k(k - 1)(k - 2 ) ( k i ) ( k - 1 z)( k - 2 + $ ) 1 -a2 ( k - 3 ) . 2 3 q k - i)(k - 2)(k i ) ( k - 1 + ) ( k- 2 (9.487)
=
-
+
+
+
+
+
+
After k iterations we hit ao. Setting U2k
= 2k
+
a0 =
+ 2)
1, we obtain
(-Ilk [ k ( k - l ) ( k - 2 ) . . .2.1] ( k + i ) ( k - 1
+ T)(k - 2 + + ) . . . ( 1 + a ) (9.488)
We now use the gamma function:
r ( x + 1) = zr(x), z > 0 ,
r(i)= 1,
(9.489)
which allows us t o extend the definition of factorial to continuous and fractional integers as
qn+ 1) = qn) = n(n -
i ) q n- 1)
= n(n - i)(n- 2 ) r ( n - 2)
This can also be written as
n(n - l ) ( n- 2) ’ . . ( n- k ) = r ( n + l ) n - k > O . r(n- k ) ’
(9.491)
SERIES SOLUTIONS: FROBENIUS METHOD
459
Using the above formula, we can write
Substituting Equation (9.492) and k ( k Equation (9.488), we write a 2 k as
-
l)(k
-
2).-.2.1
=
k ! into
(9.493) which allows us to express the first solution [Eq. (9.481)] in the following compact form: 00
y1(z) = x1'2
k=O
(-1)kr(5/4) 22k 2"!r(k 5/4)
(9.494)
+
Following similar steps we can write y 2 ( x ) as (9.495)
Example 9.23. W h e n the roots difler by an integer: Considerthedifferential equation y(z) = 0,
dx Since
20 =
3:
2 0.
(9.496)
0 is a regular singular point, we try a series solution
c 00
y=
unxn+T,a0 # 0 ,
(9.497)
n=O
where the derivatives are given as 00
yl =
C(n+
(9.498)
T)unxn+r-l,
n=O M
C(n+ .)(n + ~~
y" =
(9.499)
T - l)CLnxn+r-2.
n=O
Substituting these into the differential equation [Eq. (9.496)] and rearranging terms as in the previous example, we obtain
[,.(,
+ 2) + -
UOXT+1
431
+
[
(r
+ 1 ) ( T + 3) + -4
+ n=2 C {[( n+ r ) ( n+ + 2) + -431 an + un-2 T
31
1
ulxr+2
x n f r + l = 0.
(9.500)
460
ORDINARY DIFFERENTIAL EQUATIONS
Orice again we set the coefficients of all the powers of
[
T(T
[
(T
[
+
( n .)(n
5
to zero:
+ 2) + -431 a0 = 0, a0 # 0,
+ 1 ) ( T + 3 ) + -431 a1 = 0 ,
+ + 2) + -31 a, + an-2 4
(9.502)
= 0 , 12
T
(9.501)
2 2.
(9.503)
The first equation [Eq. (9.501)] is the indicia1 equation and with the assumption a0 # 0 gives the values of T as (9.504) Let. us start with the first root, (9.502)] gives
TI
= -1/2.
The second equation [Eq.
(9.505)
[3
a1 = 0,
(9.506)
0.
(9.507)
a1 =
The remaining coefficients are obtained from the recursion relation: a7,= -
-
1
+ +3
an-2,
[ ( n- i)(n $) k - 2
(2n - l)(2n
+ 3) + 3'
n
n
2 2,
2 2.
(9.508)
All the odd terms are zero and the nonzero terms are obtained as a 2 = - - a0 a 4 = - a0 6' 120'"' '
a2n =
Hence the solution corresponding t o
T
a0 (-1y (2n+ . l)!?...
=
(9.509)
- l / 2 is obtained as (9.510)
We can write this as (9.511) = a 0 ~ - 3 / sin 2 x.
(9.512)
SERIES SOLUTIONS: FROBENIUS METHOD
For the other root, comes
7-2
461
= -3/2, the second equation [Eq. (9.502)] be-
[-; (i)+ i]
a1 = 0 ,
(9.513)
0 a1 = 0,
(9.514)
thus a1 # 0. The remaining coefficients are given by the recursion relation
a,
=
-
4
~
2
(2n - 3)(2n + 1) + 3'
n 2 2,
(9.515)
as
Now the solution for
7-2
= -3/2 becomes
+alx-3/2
( -xi 3
+ -x5 120
2 -
-
...
)
(9.517)
We recognize that the first series in Equation (9.517) is nothing but cos x and that the second series is sin x;hence we write this solution as
y = aOz-3/2 cos x
+ u ~ x - sin ~ /x.~
(9.518)
However, this solution may also contain a logarithmic singularity of the form y1(x) In 121:
y
= agz-3/2
sinz
+ u ~ x - sinx ~ / + ~ Cyl(x) In 1x1.
(9.519)
Substituting this back into Equation (9.496), we see that c(2x1I2cos x
-
x - ' / ~sin x) = 0.
(9.520)
the quantity inside the brackets is in general different from zero, hence we set C to zero. Since Equation (9.518) also contains the solution obtained for 7-1 = we write the general solution as
-3,
y=
COX-^'^ cos x + c ~ x - sin ~ /2, ~
(9.521)
where co and c1 are arbitrary constants. Notice that in this case the difference between the roots, (9.522) is an integer. Since it also contains the other solution, starting with the smaller root would have saved us some time. This is not true in general. However, when the roots differ by an integer, it is always advisable to start with the smaller root hoping that it yields both solutions.
462
ORDINARY DIFFERENTIAL EQUATIONS
9.8.1 Frobenius Method and First-Order Equations It is possible to use Frobenius method for a certain class of first-order differential equations that could be written as
+ p(z)y = 0.
y’
(9.523)
A singular point of the differential equation, zo, is now regular, if it satisfies
(x - zo)p(lcO) 4finite.
(9.524)
Let us demonstrate how the method works with the following differential equation: zy’
Obviously, ICO
=0
+ (1
-
z)y = 0.
(9.525)
is a regular singular point. Hence we substitute 03
(9.526) n=O
and its derivative,
c 03
un(n
y/ =
+
(9.527)
7-)Zn+r-1,
n=O
into Equation (9.525) to write 03
n=O
00
00
n=O
n=O
Renaming the dummy variable in the first two series as n dropping primes, we obtain
-+
n’+ 1 and then
which, after rearranging, becomes 00
(7-
+ l)zruo + C[(n + 7- + 2)un+1
-
u,]zn+r+l = 0.
(9.530)
n=O
Indicia1 equation is obtained by setting the coefficient of the term with the lowest power of to zero:
+
(1 r)ao = 0, uo
# 0,
(9.531)
463
PROBLEMS
which gives
r
=
-1.
(9.532)
Using this in the recursion relation
(9.533) we write
(9.534) and obtain the series solution as
(9.535) -
a0
-ex
(9.536)
5
This can be justified by direct integration of the differential equation. Applications of the Frobenius method to differential equations of higher than second order can be found in Ince. PROBLEMS
1. Classify the following differential equations:
dy dx
+ x2y2= 5xex.
(i)
-
(iii)
d4y d3y - + 5 - - x2y dx4 dx3 d2U
-
8x2
d2u -+8x2
(ii)
=
dY
0,
d2U ++ - = f(x,y, dy2 dz2 d2U
d2U
dydx
+ - ddz2 =2oU.
d3Y + x2y2 = 0. dx3
2).
(vi) (viii
+ x2y = 5.
-=$+-@. dr d2r ds
d4y d3y - + 4- 7y = 0. dx4 dx3 x3dx + y2dy = 0 .
2. Show that the function
y(x) = (8x3+ C)e-6x satisfies the differential equation
_ dy -- -6y + 24x2e-6x dx
464
ORDINARY DIFFERENTIAL EQUATIONS
3. Show that the function y(x) = 2
+ ce-*z2
is the solution of the differential equation
dY + 16x3 = 322. dx
4. Find the solution of
,
2x-y+9 =x-3y+2
5. Given the differential equation
show that its general solution is given as y = (4x2
+ C)ep2"
6. Show that [Eq. (9.72)]
satisfies the differential equation y'
+ a(x)y = b(x).
7. Solve thc following initial value problem:
*+ dx
2 1 =~ 1~6 ~ ~ e -~ ~( 0~=), 2.
8. Show that the following initial value problems have unique solutions:
(ii)
dY - 2Y2 -
(iii)
-
dx
dy dx
2-2'
+ 2y = Sxe-'",
y(1) = 0. y(0) = 2.
PROBLEMS
465
9. Which one of the following equations are exact:
(i) (ii) (iii) (iv) (v)
+ + + + + + +
(3x + 4y) dx (4z 4y) dy = 0. (2x74 2) dx (x2 4y) dy = 0. (y2 1)cosx dx 2y(sinx) dy = 0. (3s 2y) dx (2x y) dy = 0. (4xy - 3) dx (x2 49)dy.
+ + +
+ +
10. Solve the following initial value problems: (i) (ii)
(iii)
+ (4x2 + 4y) dy = 0, (2ye“ + 2e” + 4y2) dx + 2(e” + 4xy) dy = 0,
(8xy - 6) dx
9dx
2y - 2 -3+2y-222’
Y(2) = 2 y(0) = 3.
y(-1) = 2.
11. Solve the following first-order differential equations:
+ +
+ + +
16xy dx (4x2 1) dy = 0. x(4y2 1) dx (x4 1) dy = 0. tan 6’dr 27- d% = 0. (iv) (x 2y) dx - 2 2 dy = 0. 2xy y2) dx ( x 2 2xy - y2) dy = 0. (v) (22’ (i)
(ii) (iii)
+
+
+
+
+
+
12. Find the conditions for which the following equations are exact: (i)
(Aox+ A1y) dx + (Box+ B l y ) dy
= 0.
13. Solve the following equations:
+ +
y’ (3/x)y = 482’. (i) (ii) xy’ [(4x 1)/(2x I)] y = 2 2 - 1. (iii) y’ - ( l / x ) y = -(1/x2)y2. (iv) 2xy’ - 4y = 2x4. (v) 2y’ + (8y - l/y3)x = 0.
+
+
14. The generalized Riccati equation is defined as
Y’(4
=
f(z)+ d Z ) Y
+
+W ) Y 2 .
Using the transformation y(x) = y1 u,show that the generalized Riccati equation reduces t o the Bernoulli equation:
466
ORDINARY DIFFERENTIAL EQUATIONS
+
15. If an equation, Ad dx N dy = 0, is exact and if M(x,y) and N ( z , y ) are homogeneous of the same degree different from -1, then show that t,he solution can be written without the need for a quadrature as
16. If there are two independent integrating factors, 11and 12, that are not constant multiples of each other, then show that the general solution of M ( r ,y) dx N ( x ,y) dy = 0 can be written as
+
11 (x,y )
= C12(2,Y).
17. Solve by finding an integrating factor:
+ 16y2 + 1) dx + (2x2 + 8xy) dy = 0. (8x7~’+ 2y) dx + (2y3 - 22) dy = 0. (1Ozy
(i)
(ii)
18. Show that the general solution of 2
( D - T ) y(x) is givcn as y(2)
=
D
= 0,
cOerZ + clxerZ.
=
d
-,
dx
19. Which one of the following sets are linearly independent:
{ eZ,e2Z,e-22}. {x,x2,x3}. {x,22, e32}. {sinx,cosz,sin2x} {ez,xex,x2eZ}.
(i) (ii) (iii) (iv) (v)
20. Given that y
= 22
is a solution, find the general solution of
(2+
i) 2
-
22-dY
dx
+ 2y = 0
by reduction of order.
21. Given the solution y = 2 2
+ 1, solve
22. Find the general solution for (i) (ii) (iii) (iv) (v)
+ 4y = 0. + 2y’ + 3y = 0. 2y’ + 15y = 0. y(2”) + 6y“ + 9y = 0.
y” y” y”
y/l’
-
-
2y”
+ y’
-
2y = 0.
PROBLEMS
467
23. Verify that the expression
satisfies the differential equation
+ P(z);Y'+ Q ( x ) =~ 0,
7~"
where y1 is a special solution.
24. Use the method of undetermined coefficients to write the general solution of y"
+ y = co sin x + c1 cos x.
25. Show that a particular solution of the nonhomogeneous equation d2Y + n2y -
dx2
= n2f(x)
is given as y=nsinnx
.c
f(x)cosnx dx-ncosnz
26. Use the method of undetermined coefficients to solve 5y' - 3y = x2ex. 2y' - 3y = sinx. (ii) y" 2y' - 3y = xe". (iii) y" 2y' - 3y = 2 sin x 52e" (iv) ~ ( 2 " ) + y' = x2 + sinx + ex. (v) (i)
2y"
y"
-
-
+
27. Show that the transformation x differential equation,
= et reduces
dn-1 d"Y uOxn u12"-1dz" dzn-
+
the nth-order Cauchy-Euler
+ ' . + any = F ( x ) , '
into a linear differential equation with constant coefficients.
Hint: First show for n = 2 and then generalize. 28. Find the general solution of d2Y dx2
X2 -+ x -
dy + n 2 y = xm dx
468
ORDINARY DIFFERENTIAL EQUATIONS
29. Solve the following Cauchy-Euler equations: (i)
2x2-d2Y - 52-dY dx2 dx
(ii)
d2Y x2dx2
-
d2Y (iii) 2x2dx2 d2Y (iv) x2-dx2 d3y x3dx3
(v)
dY 2xdx
-
-
3y = 0.
3y = 0.
dY + y = 0. + 32-dx
dY - 6y = 0. + x-dx dy + 22- - 2y = Inx. dx2 dx d2y
-
z2-
d2Y dY (vi) 22’- dx2 - 52-dx
+ y = x3.
30. Classify the singular points for the following differential equations:
+
+ + + + + + + + + +
(x2 3x - 9)y” (x 2)y’ - 3x2y = 0. (i) (ii) x 3 ( i - x)y” x2 sinxy’ - 3xy = 0. (iii) (x2 - 1)y” 2xy’ y = 0. (iv) (x2 - 4)y” (x 2)y’ 4y = 0. (v) (x2 - 2 ) ; ~ ” (X - 1 ) ~-’ 6y = 0. (vi) x2y” - 2xy’ 2y = 0. 31. Find series solutions about x ential equations: (i) (ii) (iii) (iv) (v)
=0
and for x > 0 for the following differ-
+
x(x3 - 1)y” - (3x3 - 2)y’ 2x2y = 0. xy” + 49’ - 4y = 0. (4x2 1)y” 4xy’ 16xy = 0, y(0) = 2, y’(0) 3xy” (2 - x)y’ - y = 0. xy” 4y’ - 4xy = 0.
+ + +
+
+
= 6.
Discuss the convergence of the series solutions you have found.
32. Find the general expression for the series in Example 9.21: 2x2 + 2x4 - 4x6 + . . . yz(x) = 1 - 3 21 693 where the recursion relation is given as an = -
2an-2 n = 2,3, . . . n ( n- +) ’
CHAPTER 10
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Applications of differential equations are usually accompanied by boundary or initial conditions. In the previous chapter, we have concentrated on techniques for finding analytic solutions to ordinary differential equations. We have basically assumed that boundary conditions can in principle be satisfied by a suitable choice of integration constants in the solution. The general solution of a second-order ordinary differential equation contains two arbitrary constants, which requires two boundary conditions for their determination. The needed information is usually supplied either by giving the value of the solution and its derivative a t some point, or by giving the value of the solution at two different points. As in chaotic processes, where the system exhibits instabilities with respect t o initial conditions, the effect of boundary conditions on the final result can be drastic. In this chapter, we discuss three of the most frequently encountered second-order ordinary differential equations of physics: Legendre, Laguerre, and Hermite equations. We approach these equations from the point of view of the Frobenius method and discuss their solutions in detail. We show that the boundary conditions impose severe restrictions on not just the integration constants but also on the parameters that the differential equation itself includes. Restrictions on such parameters Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
3. Selquk
Bayin
469
470
SECOND-ORDERDIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
may have rather dramatic effects like the quantization of energy and angular momentum in physical theories.
10.1 LEGENDRE EQUATION Legendre equation is defined as (1 - x 2 )-d2Y
dx2
-
2x-dY +Icy = 0,
(10.1)
dx
where Ic is a constant parameter with no prior restrictions. In applications, I; is related to physical properties like angular momentum, frequency, etc. Range of the independent variable x is [ - 1,1].
10.1.1 Series Solution Legendre equation has two regular singular points at the end points x = f l . Since x = 0 is a regular point, we use the Frobenius method and try a series solution of the form 00
y(x) = C a n x n + s , a0
# 0.
(10.2)
n=O
Our goal is to find a finite solution in the entire interval [-1,1]. Substituting Equation (10.2) and its derivatives, (10.3)
n=O
c 03
y” =
un(n
+ s ) ( n+ s
-
1)Zn+s--2,
(10.4)
n=O
into the Legendre equation, we obtain
-2
03
03
n=O
n=O
C an(n+ s ) ~ c +~ k+C ~ a n ~ n + s 0, -
(10.5)
03
C a n ( n + s ) ( n+ s - l)xn+s-Z
n=O 03
+ C a, [-(n +
S)(TZ
+s
-
1) - 2(n
+ S) + k] z
~ =+0. ~
(10.6)
LEGENDRE EQUATION
471
To equate the powers of x, we substitute n-2=n'
(10.7)
into the first series and drop primes to write
n=-2
+
c 03
a, [-(n
+ s ) ( n+ s + 1)+ k] xn+s = 0.
(10.8)
n=O
Writing the first two terms of the first series explicitly, we get ao(-2
+
c 03
+ s + 2)(-2 + s + 1 y 2+ a1(-l+ s + 2)(-1+ s + 1)xS-l z ~ =+0. ~ + s + 2)(n + s + 1)+ an [-(n+ + s + 1) + 1~11
{~7i+2(12
,n=o
(10.9) Since this equation can be satisfied for all x only when the coefficients of all powers of x vanish simultaneously, we write
an+2
aos(s - 1) = 0, a0 # 0) a l s ( s 1) = 0) -(n s ) ( n s 1) k , n = 0 , 1 , 2 ,... . = -an ( n s 2)(n s 1)
+ + + + + + + + +
(10.10) (10.11) (10.12)
The first equation [Eq. (lO.lO)] is the indicia1 equation and its roots give the values of s as SI = 0
Starting with the second root, s
and
s2 =
1.
= 1, Equation
(10.13)
(10.11) gives
a1 = 0.
(10.14)
From Equation (10.12) we obtain the recursion relation for the remaining coefficients as an+2 = an
+
+ 2) k + 2)(n+ 3) , n = 0 , 1 , 2 , . . . ,
(n I)(. (n
-
(10.15)
472
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
which gives the coefficients 1.2 - k a0 7 2.3 2.3 - k ag = a ] = 0, 3.4 3.4 - k 1.2 - k a4 = a2 = 2.3 4.5 4.5 - k 2.3 - k a5= 3.4 5.6
a2 =
( 10.16)
~
(10.17)
~
[
~
~
[
][
~
and the series solution y(z) = aon: [l
3.4 - k
][ i ]
]
~
x 2+
( 10.19)
= O’
I,.[
+2
( 10.18)
1.2 - k
3.4- k
x4 +
...
]
.
(10.20)
For the time being, we leave this solution aside and continue with the other root,. s = 0. The recursion relation and the coefficients now become an+2 = a n
n(n+ 1) - k n = 0 , 1 , 2 , ... ( n l ) ( n 2) ’
+
(10.21)
+
(10.22)
# 0, a1 # 0, a0
( 10.23)
-k -ao, 1.2 1.2 - k a3 = ___ a1, 2.3 2.3 - k a4 = 3.4 3.4 - k 1.2 - k a 5 = - ___ 2.3 4.5
(10.24)
a2 =
(10.25) (10.26)
~
[
][
]
( 10.27)
This gives the series solution
k y(n:) = a0 1 - -x2 1.2
[
-
k (2.3 - k ) x4 1.2.3.4
+
.
,
.
1
]
(10.28)
( 10.29)
473
LEGENDRE EQUATION
Note that this solution contains the previous solution [Eq. (10.20)]. Hence we can take it as the general solution of Legendre equation, where y1 and y2 are the two linearly independent solutions and the coefficients, a0 and a l , are the integration constants. In the F'robenius method when the roots of the indicia1 equation differ by an integer, it is always advisable t o start with the smaller root with the hopes that it will give both solutions. 10.1.2
Effect of Boundary Conditions
To check the convergence of these series, we write Equation (10.28) as (10.30) and consider only the first series with the even powers. Applying the ratio test with the general term, uzn = ~ 2 ~and x the ~ recursion ~ , relation C2n+2
=
+
2n(2n 1) - k ~ (an l ) ( 2 n 2)
+
+
2 ri ~= 0, , 1 , 2 , .
.. ,
(10.31)
we obtain (10.32)
=I
+
2n(2n 1) - k (2n 1)(2n 2)
+
(10.33)
+
For convergence we need this limit to be less than 1. This means that the series converges for the interior of the interval [-1,1], that is, for 1x1 < 1. For the end points, z = 51, the ratio test is inconclusive. We now examine the large n behavior of the series. Since limn-m C Z ~ + ~ / C 1,~ we ~ can write the high n end of the series as ---f
y1=
[
1-
kx2 1.2
C2nx2n(
1
+ x2 + x4 + . . .
)I
,
(10.34)
which diverges at the end points as (10.35) The conclusion for the second series with odd powers is exactly the same. Since we want finite solutions everywhere, the divergence at the end points is unacceptable. A finite solution can not be obtained just by fixing the integration constants. Hence we turn to the parameter, k , in our equation. If we restrict the values of k as k = L ( l + l ) , Z=O,1,2 , . . . ,
(10.36)
474
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
one of the series in Equation (10.28) terminates after a finite number of terms, while the other series continues to diverge at the end points. However, we still have the coefficients, a0 and a l , at our disposal [Eq. (10.29)]; hence we set the coefficient in front of the divergent series to zero and keep only the finite polynomial solution as the meaningful solution in the entire interval [-1,1]. For example, if we take 1 = 1, hence k = 2 , only the first term survives in the second series, y2, [Eqs. (10.28) and (10.29)], thus giving 2 y(2) = a0 1 - --a: 1.2
[
2 -
+ a1x.
2(2.3 - 2) 2 4 + . . . 1.2.3.4
1
(10.37)
Since the remaining series in Equation (10.37) diverges at the end points, we set the integration constant a0 to zero, thus obtaining the finite polynomial solution as (10.38)
y'=l(z) = a1-a:. Similarly, for 1 = 2, hence k = 6, Equation (10.28) becomes y(x)
=a0
We now set
[
1 - --a:
a1 = 0
162
2]
to obtain the polynomial solution (10.40)
In general, the solution is of the form (10.41)
10.1.3
Legendre Polynomials
To find a general expression for the coefficients, we substitute k write [Eq. (10.21)],
an
= -%+2
+
+ + +
(n 2)(n 1) (1 - n)(l n 1)'
= 1(1+
1) and
(10.42)
as an-2 =
-a,
n(n - 1)
(1 - n
+ 2)(1+ n
-
1).
(10.43)
LEGENDRE EQUATION
475
Now the coefficients of the decreasing powers of x can be obtained as
an-4
(n- 2)(n - 3) (1 - n 4)(1+ n - 3) ’
= -an-2
(10.44)
+
Starting with the coefficient of the highest power, coefficients in the polynomials as
al,
we write the subsequent
a1 ’ (2-2
(2-4
(10.45)
Z(1 - 1) 2(21 - 1)’ (1 - 2)(I - 3) Z(1 - 1)(1- 2)(1 - 3) = -Ul-2 = a1 2.4(21 - 1)(21 - 3) ’ 4(21 - 3)
= -al
(10.46) (10.47)
Now a trend begins to appear, and after s iterations we obtain al-2s = al(-l)S
+
1)(1 - 2 ) . . . (1 - 2s 1) 2.4 . . . (2 ~ ) ( 2 1 - 1 ) ( 2 1 - 3 ) . . . ( 2 1 - 2 ~ + 1 ) ’ Z(1
-
(10.48)
The general expression for the polynomials can be written as
(10.49) s=o
where [&I stands for the greatest integer less than or equal to values, and the number of terms in yl are given as
[i]
1
r31
# ofterms
0 1 2 3 4 5 6 7 8 0 0 1 1 2 2 3 3 4 . 1 1 2 2 3 3 4 4 5
To get a compact expression for (10.48) as Z(1 - 1)(1- 2) .
’ ’
a2lPs,
(1 - 2s
6.
For some 1
( 10.50)
we write the numerator of Equation
+ 1) ((1I
-
2s)! I! 2s)! (1 - 2s)!.
(10.51)
476
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Similarly, by multiplying and dividing with the appropriate factors the denominat,or is written as
+ 1)
2.4.. . ( 2 ~ ) ( 2I l)(2l - 3 ) . . . (21 - 2s -
(1.2)(2.2)(2.3).. . ( 2 . ~ ) ( 2 1 l)[21 - 2](2I - 3)[2I - 41 . . . (2l - 2s [2l - 2][2l - 41.. . [2I - 2s f a ] . [2l - 2s]!
+ 1) . [2l - as]! (10.52)
- 2SS!-
-
[all (2l 2"
-
sI!
l)[2I - 2](2I - 3)[2l - 41 . . . (2l - 2s + 1) . [2l - 2 ~ ].![I [I(I - 1)(l - 2 ) . ' . (1 - s + 1)[l - s ] ! ] [21 - as]! (10.53) '
s!(2l)!(l - s ) ! I!(2I - 2s)!
(10.54)
Combining Equations (10.51) and (10.54) in Equation (10.48), we obtain 1! l!(2l - 2s)! (I - as)! s!(2l)!(I - s ) ! (l!)2(2I - 2s)! al(-l)s (I - 2S)!S!(21)!(1- s ) !'
ul-2s = U l ( - l ) S
(10.55) (10.56)
which allows us to write the polynomial solutions as
(10.57) (1!)22l
= a[-
(2l)!
c
(-1y 21 (I
-
s=o
(2l- 2s)! &ZS 2s)!s!(l - s)!
(10.58)
-
Legendre polynomials are defined by setting Ul =
(2l)! (1!)221'
-
(10.59)
as rf1
c 21
p1(2= ) s=o
(-1y
(2I - 2s)! 51-2s (I - 2s)!s!(I - s ) !
(10.60)
These are the finite polynomial solutions of the Legendre equation IEq. ( l O . l ) ] in the entire interval [-1, 11:
LEGENDRE EQUATION
477
Legendre Polynomials
Po ).( = 1, Pl(.) = 5 , P2
(i)
(x)=
(;)
P3 (x)=
(i) P5(z) (i) P ~ ( x )=
(k)
(10.61)
\5x3 - 3x1,
[35x4 - 30x2
[63x5 - 70z3
=
p6(x)=
[3x2 - 13 >
+ 31 ,
+ 15x1,
+
[231x6 - 3 1 5 ~ 1052' ~ - 51.
10.1.4 Rodriguez Formula Legendre polynomials are also defined by the Rodriguez formula
PZ(2) =
1 d' 2l1! dx
(10.62)
1)Z.
-
To show its equivalence with the previous formula [Eq. (10.60)], we use the binomial formula and expand ( x 2- 1)' as
(10.63) where the binomial coefficients are defined as
(a)
I!
(10.64)
= s!(l- s)!'
We now write Equation (10.62) as Z
1 dz P1(x)= -C(-l)" 211! dxz s=o
1
=
I!
1 C(-1)" 211! s=o
1! s!(Z
s!(Z
-
-
%2(Z-s)
s)!
d' -x2(z-s) s ) ! dxl
(10.65) (10.66)
When the order of the derivative is greater than the power of x, we get zero:
(10.67)
478
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Hence we can write
r41
<
I
(10.68) Using the useful formula
dlxm dxl
-= m(m - l ) ( m - 2 ) . . . ( m- 1 - 1 ) P - l
(10.69) (10.70)
we obtain
rt1
=
c s=o
(21 - 2s)! 2ps T s ! ( l- s ) ! ( l- 2s)!
(10.72)
which is identical to our previous definition [Eq. (10.60)].
10.1.5
Generating Function
Another useful definition of the Legendre polynomials is the generating function definition: w
1
T ( x , t )=
(1 - 2xt
+ t2)1/2
= C f i ( X ) t l , It1
< 1.
(10.73)
l=O
Equivalence with the previous definitions can be established by using complex contour integrals. Let the roots of (1 - 2xt t 2 )be given as T I and r2 in the complex t-plane. If T is the smaller of the lrll and 1 ~ 2 1 ,then the function
+
1 (1 - 2xt t2)1/2
( 10.74)
+
is analytic inside the region It1
< T . Hence we can write
the Taylor series
r,
( 10.75)
where the coefficients are defined by the contour integral 8
(10.76)
LEGENDRE EQUATION
I Figure 10.1
479
u - complex
Contour C’ in the complex u-plane.
which is evaluated over any closed contour C enclosing t It1 < r. We now make the substitution 1 - ut
=
(1 - 2xt
= 0 within
+ t2)1’2,
the region
( 10.77)
which is also equal to t = 2(u - x ) / ( u 2- 1) and convert the contour integral [Eq. (10.76)] into
( 10.78)
2‘ (u- 2 ) 1 + 1 ’
where C’ is now a closed contour (Fig. 10.1) in the complex u-plane enclosing the point u = x . Note that t = 0 corresponds t o u = x. Using the Cauchy integral formula, namely (10.79) where f ( z ) is analytic within and on C, with the replacements z = u,zo = x , n = 1 and (10.80) we obtain
1 dl(U2 q ( x )= Substituting these coefficients,
T ( x , ~=)
al,
+t2)1/2
1)’
(10.81)
into Equation (10.75), namely 03
1
(1 - 2xt
-
1=0
l=O
(10.82)
480
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
we obtain the Rodriguez formula:
1 d1 PL(2) = - c ( x 2
-
2l1! dx
1)l.
(10.83)
10.1.6 Special Values Special values of the Legendre polynomials are given as
We can prove the special values at the end points, x = ~ 1by, using the generating function [Eq. (10.73)] and the geometric series as
m
(10.86) (10.87)
thus obtaining
(10.88)
fi(r1)=
For the special values at the origin, we write
1
T ( 0 , t )= (1
03
’
+t2)1/2
=
Ct19(0), l=O
(10.89)
481
LEGENDRE EQUATION
and use the binomial expansion t o write the left-hand side as (-1) ( - 2 ) . . . ( - 2 3 ) ( t 2 yt (1 + t 2 ) - " 2 = 1 + ( - + + . . . + 2 2 l! (10.90) w
1.3.5.. , (2l - l)tZ1 (10.91) =C(i1)' 211! 1=0
00
-
x'")' 1=0
1.2.3... (21 - 1)(21)t2' 211!2.4.6.. . (21 - 2)(21) (10.92) (10.93)
l=O
(10.94)
Finally, comparing with the right-hand side of Equation (10.89), we get the desired result
10.1.7
Recursion Relations
Differentiating the generating function with respect to
2,
we writ'e (10.95)
t (1 - 22t
t (1 - 22t
c
c 00
+t2)3/2
=
+t2)1/2
=
(10.96)
Pf( z ) t l ,
1 =o 00
l=o
m
c
Pf(2)tl - 22
1=0
(10.97)
1 P;(z)tl+l + cP;(2)tz+2 (10.98) =o w
00
P1(z)tl'l =
/=0
EP[(Z)(l- 22t + t 2 ) t l , 03
1
l=O
In the first sum on the right-hand side we make the change 1 = I' -t 1, and in last sum we let I = I" - 1 and then drop primes to write
c 00
P1(z)tl+l =
1=0
x m
1=-1
00
P;+,(z)tL+l- 2 2 c Pf(z)tl+l + 1=0
1P;-,(z)tl+l. w
1=1
(10.99)
Since
PA = 0 ,
(10.100)
482
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
we can start all the sums from 1 = 0 to write oc
C [ P , ( x )- P,',,(X) 1=o
+ 2xP[(x)
-
P;-l(x)]tl+l= 0.
(10.101)
This equation can be satisfied for all t only when the coefficients of all the powers of t vanish simultaneously, thus yielding the recursion relation
P1(x)= P;+l(x) - 2xP[(x)+ P;-l(x).
(10.102)
A second relation can be obtained by differentiating the generating function with respect to t as
+ l)Pl+l(X) +lR-l(X).
(21+ l ) x P l ( x )= ( 1
(10.103)
These two basic recursion relations are very useful in calculations and others can be generated by using these.
10.1.8 Orthogonality We can write the Legendre equation [Eq. ( l O . l ) ] for two indices, 1 and m, as d [(I
dx
-" %I
+l(l+
l)q = 0
(10.104)
and
d
[
dPm
+ m(m+ 1)P,
dx (1 - x2)-] dx
(10.105)
= 0.
We multiply the first equation by P, and then subtract the second equation multiplied by Pl, and integrate the result over the entire range to write
l1d", [ P,-
+ [1(1 + 1)
]
( 1 - x 2 )-dpz d x -
dx
-
+
1
dPm
d", [(1-x2)-] dx
8-
dx
1
m(m l)]
-1
Pz(x)P,(x) dx = 0.
(10.106)
Integrating the first two integrals by parts gives
(10.107)
LEGENDRE EQUATION
483
Since the surface terms vanish, we are left with the equation
+ [1(1 + 1) - m(m+ l)]
Pl(z)Pm(x)dx = 0,
(10.108)
which is
We now have two possibilities:
This is called the orthogonality relation of Legendre polynomials. To evaluate the normalization constant, Ni,we use the Rodriguez formula [Eq. (l0.62)] to write
(10.111)
l-fold integration by parts gives
N'
=
1
1 ~
22' (1!)2
[(-I)'/
1
d2' ( x 2 - 1 ) ' p ( z 2 - 1)' d x .
-1
(10.113)
Since the highest power in the expansion of (zz- 1)' is x2',we use the formula
d2' dx21
-(z2
-
1)l =
(21)!
(10.114)
t o write
(-1)'(21)!
(zz - 1)' d z .
(10.115)
Using trigonometric substitution and contour integrals (Bayin) the above integral can be evaluated as
l 1 ( z 2- 1)' d x =
(-
1)122l+1(1!)2
(21 + l)!
'
(10.116)
484
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
thus yielding the normalization constant as
N1
= -
(-1)L (21)! (-1)l221+1(1!)2 221(1!)2 . (2l I)! 2 21 1'
+
+
(10.117) (10.118)
10.1.9 Legendre Series If a real function, F ( x ) , is continuous apart from a finite number of discontinuities in the interval [-1,1], that is, F ( x ) is piecewise continuous and the integral J", [F(x)I2 dx is finite, then the series c ~ q ( z ) where , the coefficients are calculated using the orthogonality relation,
xzo
9 ( x ) P m ( x )dx = [2/(2m
+ 111hirn,
( 10.119)
as
( 10.120) converges to F ( x ) when x is not a point of discontinuity:
(10.121) 1=0
At the point of discontinuity, the series converges to the mean (Lebedev): 1
- [F(x:+)+ F ( x - ) ] . 2
(10.122)
This series is called the Legendre series of F ( x ) .
Example 10.1. Generating function: A practical use €or the generating function can be given in electromagnetic theory. Electrostatic potential, $(r, 6, q5), of a point charge, Q, displaced from the origin along the z-axis by a distance a is given as (Fig. 10.2) (10.123)
Q(r, 6 , 4 ) = Q R -
Jr2 -
Q -
+ a2 Q 2ar cos 6
(10.124)
-
1
( 10.125)
LEGENDRE EQUATION
485
Q‘ a
Figure 10.2
Electrostatic potential of a point charge
If we substitute t = a / r and t = cos 8 , electrostatic potential becomes
(10.126) Using the generating function of Legendre polynomials [Eq. (10.73)], we can express Q(r,8 , 4 ) in terms of the Legendre polynomials as (10.127)
Example 10.2. Another derivation for Nl: The normalization constant can also be obtained from the generating function. We multiply two generating functions and integrate between -1 and 1 to write
L1
1
1
T ( x ,t ) T ( x t, ) d z =
s_,
1 J1- 2xt
1
+ t 2 J1-
2xt
+t2
dx
1 1 -- In(1- 22t t2)1-, 2t 1 1 = -- ln(1 - t ) - ln(1 t ) .
+
=
+t
t
+
(10.128) (10.129)
(10.130)
This is also equal to
FF[L1
1 dzfiPk] t‘+k = -- ln(1 - t )
1=0 k=O
t
+ -1t ln(1 + t ) .
(10.131)
486
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Expanding the right-hand side in powers o f t and using the orthogonality relation [Eq. ( l O . l l O ) ] we write
( 10.133) Thus, obtaining the normalization constant as
1, 1
Nl =
2 dxPf(x) = 21 1'
+
(10.134)
s-, 1
Example 10.3. Evaluation of the integral I k l = x k f i ( x ) dx, k 5 1 : This is a useful integral in many applications with Legendre polynomials. We use the Rodriguez formula to write I k l as
( 10.135) Integration by parts gives:
where the surface term vanishes. A second integration by parts gives
where the surface term vanishes again and a trend appea,rs. After k-fold integration by parts, all the surface terms vanish, thus giving
(10.138) For k
< 1 the integral, I k L ) vanishes. This can be seen by writing
Ikl
as
HERMITE EQUATION
487
which becomes (10.140) (10.141) Since k < 1 and both k and 1 are integers, the highest k value without k being equal to 1 is 1 - 1, which makes the integral zero: J(1-l)l
=
(-l)l-l(l211!
l)!
[(x2-
1 -1
= 0.
(10.142) (10.143)
For the lower values of k , the derivative in Equation (10.141) always contains positive powers of (x2- l ) ,which vanishes when the end points are substituted. When k = 1, no derivative is left in Equation (10.138); hence the integral reduces to
( 10.144) This is the same integral in Equation (10.116). Substituting its value, we obtain 2l+1(1!)2
Ill =
(21
(10.145)
+ l)!
Now the complete result is given as
10.2 HERMITE EQUATION 10.2.1 Series Solution Hermite equation is defined as
h”(z)- 2Ich’(Z)
+ (c
-
l)h(Ic) = 0,
Ic E
[-m, 001,
( 10.147)
where 6 is a real continuous parameter, which is unrestricted at this point. Hermite equation is primarily encountered in the study of quantum mechanical harmonic oscillator, where E stands for the energy of the oscillator. We
488
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
look for a finite solution in the entire interval [-m, m]. Since x = 0 is a regular point, we use the Frobenius method and try a series solution of the form 03
h(x)= Ca,xk+s, k=O
a0
# 0.
(10.148)
Substituting h ( x ) and its derivatives,
c 00
hl(x) =
Uk(k
+ s)zk+S-l,
(10.149)
k=O
into the Hermite equation we obtain 03
M
k=O
k=O w +(f
- 1)
akxk+‘
(10.151)
= 0.
k=O
To equate the powers of x , we let k primes:
2 = k’ in the first series and then drop
-
co
03
k=O m
k=O If we write the first two terms of the first series explicitly, we can write the remaining terms under the same sum as
+ s ( s + l)a&-l {ak+Z(k + s + 1 ) ( k + s + 2 ) ak [2(k+ S) - ( E s ( s - l)a()xS-2
+
03
-
(10.153) -
I ) ] } 5k+S= 0.
k=O
Setting the coefficients of all the powers of x t o zero, we obtain
s ( s - 1)ao = 0,
a0
# 0,
s ( s + 1)al = 0, 2(k + s ) - c + 1 k=O,l, ... . ak+2 = ak (k+s+l)(k+s+2)’
(10.154) ( 10.155) (10.156)
489
HERMITE EQUATION
The first equation is the indicia1 equation, the roots of which give the values of s as 0 and 1. Since their difference is an integer, guided by our past experience, we start with the smaller root, hoping that it will directly yield the general solution. Now the recursion relation becomes
2k-€+1 k = O , l , 2,... , ( k l ) ( k 2) '
+
ak+2 = a k
+
(10.157)
which gives the coefficients a0
# 0,
a1
# 0,
a2
=
a3 a4
a5
1--E 1.2 ao, 3--E =2.3 5--E 1--E 5 - € = -a2=-1.2 . 3.4 ao, 3.4 7--E 3--E 7 - t = -a 3 = - 2.3 ' 4.5 4.5
-
and the series solution
[+
h ( x ) = a0 1
+ a1
-x2
1--E 5--E x 4 + . . . +1.2 3.4 '
( 10.158)
I
1
3--E 7--E [x + 2 .3x3+--2.3 . 4.5 x 5 + . . . . 3--E
(10.159)
As we hoped, this solution contains two linearly independent solutions: 1--E hl(X)= 1 + 1 .2X 2
5--E + 1--E x4+.. 1.2 3.4
(10.160)
and
3--E 7 - € 3--E h2(5) = 2 + 2 .3x 3 + - -2.3 4.5 x 5 + . . . .
(10.161)
'
Since a0 and a1 are completely arbitrary, from now on we treat them as the integration constants and take the general solution as
+
h ( x ) = aohi(x) ~ h z ( x ) .
(10.162)
To check the convergence of these series, we write them as 00
00
hl
a2nx2n and h2 =
= n=O
~ 2 ~ +2 n1+xl n=O
(10.163)
490
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
and concentrate on the first one with the even powers: h1(z) = ao[l
+ a2x2 + a4x4 + . . .I.
(10.164)
From the recursion relation [Eq. (10.157)] the ratio of the coefficients of two consecutive terms is given as (10.165) For sufficiently large n, this has the limit (10.166) Hence, we can write the series as
... where the second series is nothing but (10.168)
( 10.169) In other words, h l ( z ) diverges as ex2when J: + f m . This conclusion is also valid for the second series, h2(z),with the odd powers of z. Our aim is to find solutions that are finite everywhere in the interval [-m, m]. This cannot be accomplished by just adjusting the integration constants, a0 and al. Hence, we turn to the parameter c in the differential equation and restrict it to the following values: c = 2 n + 1 , n=0,1,2, . . . , n = 0 1 2 3 4 ... € = 1 3 5 7 9 ..‘.
(10.170) (10.171)
This allows us to terminate one of the series in Equation (10.159) after a finite number of terms. For the even values of n, the general solution now looks like
+ . . .anJ:nl + a l [ z + a 3 z3 +...I, n = e v e n .
hn(z)= ao[l+ a222
(10.172)
Since the second series diverges as x --t i c o , we set al = 0 and keep only the polynomial solutions. Similarly, for the odd values of n we have
+ a2z2 + ’ . . ] + u ~ [ +J :a3x3 + . . . + anzn],
h n ( z )= ao[l
12
= odd.
(10.173)
HERMITE EQUATION
491
We now set a0 = 0 and keep only the polynomial solutions with the odd powers of x. In summary, we obtain the following finite polynomial solutions:
n = 0, n = 1, n = 2, n = 3,
10.2.2
ho(x) = ao, hl(x) = a l x , h2(x)= ao(1- 2x2), h3(x) = a l ( z - $x3),
(10.174)
Hermite Polynomials
To find a general expression for the polynomial solutions of the Hermite equation, we write the recursion relation [Eq. (10.157)] for the coefficients as
(10.175) Starting with the coefficient of the highest power of x,a,, we write the coefficients of the decreasing powers of x as
n(n - 1) n(n - 1) (10.176) -an 2 ( n - 72 + 2) 2.2 ’ n(n - 1)(n - 2 ) ( n - 3) (n- 2 ) ( n - 3) . (10.177) an-4 = -an-2 = (-)(-)an 2.2 (2.4) 2.4
an-2
=
-a,
After j iterations we obtain
an-2j
(-l)jn(n - l ) ( n - 2 ) . . . ( n - 2 j 2 j 2 . 4 . . . (2j) (- 1)in! - an ( n- 2 j ) ! 2 ? 2 j j ! . = a,
+ 1)
(10.178)
( 10.179)
We can now write h n ( x ) as
(10.180) where [5]stands for the greatest integer less than or equal t o .: If we take a, = 2,, the resulting polynomials are called the Hermite polynomials:
(10.181)
492
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Hermite Polynomials
H o ( z ) = 1, Hl(2) = 22, H 2 ( z ) = 4x2 - 2, H3(2) = 8z3 - 122, H ~ ( x=) 16z4 - 482' + 12,
10.2.3
(10.182)
Contour Integral Representation
Hermite polynomials can also be defined by the complex contour integral
(10.183) where C is any closed contour enclosing the point integral by using the residue theorem as -22
dz
Using the expansion e z &z2
-
( z - z)n+l
=
=
2ni Residue
{
2.
We can evaluate this
(10.184) Z=X
.
Ernz m / m ! ,we write
e-(Z+x)(z-x)
( z - z)n+l
=x
+ 2 p ( z - z p.
(-l)"(z
O0
m=O
m!(z- z ) n + l
(10.185)
Substitution of the binomial expansion
( z +z y
=
[2z
+ ( z -"I).
(10.186)
( 10.187) gives e:cz-z'
( z - 2)7L+l
c o r n
(- l)"m!
(2z)"-L(z
-.)I
(z-X ) m m!(z- z)n+l
(10.188)
7n=0 I=O
m=O I=O
l!(m- I ) !
z m - l ( z - 2)m+l-n-l
(10.189)
Since the desired residue is the coefficient of the ( z - z)-' term, in the above double sum [Eq. (10.189)] we set
m + l - n - 1 = -1, m + l - n = 0, m=n-1,
(10.190) (10.191) (10.192)
493
HERMITE EQUATION
which gives the residue as
This yields the value of the integral in Equation (10.184) as dz
I41 ( 10.194) 1=0
Hence, Hermite polynomials [Eq.(10.183)] become (10.195) which is identical to our previous definition [Eq. (10.181)].
10.2.4 Rodriguez Formula We first write the contour integral definition [Eq. (10.183)]: (10.196) and the derivative formula:
( 10.197) where f ( z ) is analytic within and on the contour C , which encloses the point 20. Making the replacements
f ( z ) = e-2'
(10.198)
and
zo = z
(10.199)
in Equation (10.197): (10.200) and using this in Equation (10.196) we obtain the Rodriguez formula n
-5'
, n = 0 , 1 ; 2, . . . . H , ( z ) = (-1) n ex 2 d e dxn
(10.201)
494
10.2.5
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Generating Function
Starting with the definition
(10.202) we aim to construct a series of the form
2
Hn(X)tn
n!
n=O
(10.203) '
Using equation (10.202) we write
(10.204) and make the dummy variable change n
+ n'
+j .
(10.205)
After dropping primes, we obtain
5
n=O
Hn (x>tn n! =
c o n n=O j = o c o n
(-1)jy-j
,.ra-jtn+j
(n-j)!j!
(10.206)
(10.207) n=O
j=o
where the second equation is multiplied and divided by n!. We rearrange Equation (10.207) as
The second series on the right-hand side is the binomial expansion (10.209) hence Equation (10.208) becomes
c 00
C 00
Hn(x)tn = tn(2x - t)" n! n! n=O n=O
( 10.210)
HERMITE EQUATION
495
Furthermore, the series on the right-hand side is nothing but the exponential function with the argument t(2x - t ) :
(10.211) thus giving us the generating function, T ( z ,t ) , of H n ( z ) as (10.212)
10.2.6
Special Values
Using the generating function, we can find the special values at the origin as
T ( 0 , t )= e-t2 =
c O0
n=O
Hn(0)tn n! '
(10.213)
( 10.214) which gives
( 10.215) ( 10.216) 10.2.7
Recursion Relations
By using the generating function, we can drive the two basic recursion relations for the Hermite polynomials. Differentiating T ( z , t )with respect to x we write
(10.217) Substituting the definition of the generating function t o the left-hand side, we write this as
(10.218) (10.219) Making a dummy variable change in the first series:
n+n'-I
(10.220)
496
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
and dropping primes, we write
(10.221) (10.222) which gives the first recursion relation as
2nHn-1(x)
= Hk(X),
n = 1,2,... .
(10.223)
Note that HE,= 0. For the second recursion relation we differentiate T ( 5 , t ) with respect to t and write
(10.224) (10.225) (10.226)
( 10.227) To equate the powers oft, we let n -+n'-1 in the second series and n in the third series and then drop primes t o write 03
tn
C [ 2 x H n - 2nHn-1 - H n + i ] , r = 0. n=O
+ n"+l
(10.228)
Setting the coefficients of all the powers of t t o zero gives us the second recursion relation:
2xHn
-
2nHn-1
= Hn+l.
(10.229)
10.2.8 Orthogonality We write the Hermite equation [Eq. (10.147)] as
Hl
-
2xH; = -2nHn,
(10.230)
where we have substituted 2 n = E - 1. The left-hand side can be made exact (Chapter 9) by multiplying the Hermite equation with e-"' as
e - x Z Hnf f- 2 x e P x 2 Hnf = - 2 n e P x 2 H n ,
(10.231)
HERMITE EQUATION
497
which can now be written as
(10.232) and integrate over [-m, co]to write
We now multiply both sides by H,(x)
x2dH,
d
H,(x)Hn(x)e-x2dx,
lcoHmz [ep dx ]
(10.233)
which, after integration by parts, becomes
H:,H:,e - x 2
co
H:, H;epx2dx = -2n
Hm(x)Hn(x)e-x2dx. (10.234)
Since the surface term vanishes. we have
H k HAe-x2dx = 2 n
Hm(x)H,(z)e-x2dx.
(10.235)
Interchanging n and m gives another equation:
L
00
03
H:, H:,e-x2dx
=
2m
L
Hm(x)H,(x)e-x2dz,
(10.236)
which, when subtracted from the original equation [Eq. (10.235)], gives 00
2(m
-
Hm(x)Hn(x)e-x2dx
n)
= 0.
(10.237)
J-CO
We have two cases:
which shows that the Hermite polynomials are orthogonal with respect to the weight factor e-”’. To complete the orthogonality relation we need to calculate the normalization constant. Replace n by n - 1 in the recursion relation (10.229) to write
22Hn-1 Multiply this by
-
2(n - l)Hn-z
= H,.
(10.239)
H, :
2xHnH,-1
-
2nHnH,-2
+ 2HnHn-z
= H:.
( 10.240)
498
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
We also multiply the recursion relation (10.229) by H,-1 equation:
2xHn-1H,
2nH,_, 2
-
t o obtain a second
= Hn-1H,+1.
(10.241)
Subtracting Equation (10.241) from (10.240), we get
2xHnH,-1 =
Hi
+ 2H,H,-2
2nH,H,-2
-
-
+ 2nH?-,
2zH,-IH,
H,-lH,+I
-
(10.242)
or
+ 2HnHn-2 + 2nH,-1 + 2
-2nH,H,-2
Hn-lH,+l
= H?,
(10.243)
which, after multiplying by the weight factor e P x 2and integrating over [-m, 001, becomes 00
33
-
[
2n
+2 [
dxePx2H,H,-2
dxePx2H,H,_2
J -CC
J-CC
J-CC
J-Oc
(10.244) J
-00
Using the orthogonality relation [Eq. (10.238)],this simplifies t o 33
2n
[
CC
dxe-x2H:-1
[
=
dxePx2H:,
(10.245)
n = 1,2,3,. ..
(10.246)
J-02
J--03
2nN,-1
= N,,
Starting with N,, we iterate this formula to write
N , = 2nN,_1 = 2n2(n - 1)N,-2 = 2122(n- 1 ) 2 ( n- 2)N,-3
= 2j+'n(n
-
1 ) . . . ( n - j ) Nn-j-1.
(10.247)
We continue until we hit j = n - 1 , thus
N,
= 2,n!No.
(10.248)
We evaluate No using HO= 1 as
lCC 00
NO =
e - x 2 H i ( x ) dx
(10.249) (10.250)
J
=
-33
A,
(10.251)
HERMITE EQUATION
499
which yields N,, as
N , = 2 n n ! f i ,n = 0 , 1 , 2 , .. . .
(10.252)
10.2.9 Series Expansions in Hermite Polynomials A series expansion for any sufficiently smooth function, f ( x ) ,defined in the infinite interval (-00,oo)can be given as M
(10.253) n=O
Using the orthogonality relation,
(10.254) J -0
we can evaluate t,he expansion coefficients, cn, as
Convergence of this series is assured, granted that the real function, f(x),defined in the infinite interval (-co,co) is piecewise smooth in every subinterval [-a, a] and the integral
is finite. At the points of discontinuity the series converges to
(10.257)
A proof of this theorem can be found in Lebedev. Example 10.4. Expansion of f ( z ) = e a x , a is a constant: Since f ( z ) is a sufficiently smooth function, we can write the convergent series:
(10.258) n=O
where the coefficients are
(10.259)
500
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Using the Rodriguez formula [Eq. (10.201)], we write (10.260) (10.261) (10.262) (10.263) We have used n-fold integration by parts in Equation (10.261) and completed the square in the last integral. Now the final result can be expressed as (10.264)
10.3
LAGUERRE EQUATION
Laguerre equation is defined as d2Y dx2
2-
+ (1
-
dY
x)-
dx
+ ny = 0,
x
E
[O, 001,
(10.265)
where n is a real continuous parameter. It is usually encountered in the study of single electron atoms in quantum mechanics. The free parameter is related to the energy of the atom. 10.3.1 Series Solution Since x = 0 is a regular singular point, we can use the Frobenius method and attempt a series solution of the form (10.266) with the derivatives y'(z, s) =
c
ar(T
+ s)zr+s-l,
(10.267)
r=O 00
f ( x , s) =
Car(.+ r=O
S)(T
+s
-
(10.268)
501
LAGUERRE EQUATION
Substituting these into the Laguerre equation, we get
c M
+ s ) ( r+ s
a,(?-
T=o
l)xT+s-l
-
c
c
a, ( r
M ._
a,(?-
+ s)xT+s-l
T=o
00
-
c c +
+n
a,(?- 4-s)xT+s
+ s)22T+s-1
-
c
a,(r
00
0,
(10.269)
= 0.
( 10.270)
aTIcT+s-
+s-
n)xT+S
In the first series we let r - 1 = r’ and drop primes at the end t o write 00
00
c 03
~
o
~
+~
+ +
x[ u ~~+ ~-( Ts~ 1)2- U,(T
+ s - n ) ]xT+’
(10.272)
= 0.
T=o
Equating all the coefficients of the equal powers of x to zero we get a092
= 0, a0
aT+l
=
# 0,
( 10.273)
+
(?- s - n ) r s 1)2 ’
(7-
+ +
= 0,1,
(10.274)
In this case the indicia1 equation (10.273) has a double root, s = 0. The recursion relation becomes a,+1
=
-a,-
n-r
(r
+
’
r=O,l,...
,
(10.275)
which leads to the series solution
n(n - 1 l x 2 + . . . + (-I), n(n - 1).. . ( n - r (2!)2 (r!)2
+ 1)x T +
...
( 10.276) This can be written as 00
r=O
n(n - 1 ) . . . ( n - r + 1) (r!)2
(10.277)
Laguerre equation has also a second linearly independent solution. However, it diverges IogarithmicalIy as x + 0, hence we set its coefficient in the general solution to zero and continue with the series solution given above.
1
502
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
10.3.2
Laguerre Polynomials
As we add more and more terms, this series behaves as
[::
y(x) = a0 1 - -x+
. ' . +arxr
hence it diverges as e x as x 403. For a finite solution everywhere in [O, co] we have no choice but to restrict n to integer values, the effect of which is to terminatc the series [Eq. (10.276)] after a finite number of terms, thus leading to the polynomial solutions:
r=O
n(n - I ) . . . ( n - r (r!)2
+ 1)
(10.279)
(10.280) Polynomials defined as (10.281) are called the Laguerre polynomials and constitute the everywhere finite solutions of the Laguerre equation:
10.3.3
Contour Integral Representation
Laguerre polynomials are also defined by the complex contour integral
dz
(10.283)
where the contour C is any closed path enclosing the point z = x. To show the equivalence of the two definitions, we evaluate the contour integral by using the residue theorem as (10.284)
To find the residue, we use the expansions
zn = (2+ x - x)n = [ ( z- z) 21"
+
(10.285) (10.286) (10.287)
LAGUERRE EQUATION
503
and -
ex-r
e-(r--2)
(10.288)
00
= C(-l)m
(2 -
m=O
x)."
m!
(10.289) '
Using these, we write the integrand of the contour integral [Eq. (10.283)] as zrl
e x -z
( z - x)n+l
c n
-
n! ( z - x)G7?--l l!(n- l ) ! ( z - x)n+l m=O 1 =o a
m=O
n
(.
(- l)"n!
-
x)Z-n-l+m
( z - x)" m! 5 n-Z .
(10.290) (10.291)
z=o
For the residue we need the coefficient of the (z-x)-' term, that is, the terms with
I-n-
I + m = -1, l=n-m.
(10.292) (10.293)
Therefore the residue is obtained as (10.294) Substituting into Equation (10.284)' we obtain (10.295) which agrees with our previous definition [Eq. (10.281)]
10.3.4
Rodriguez Formula
Using the contour integral representation [Eq. (10.283)] and the Cauchy derivative formula: 27ri n!
--f'"'(Zo)
=
(10.296)
we can write the Rodriguez formula of Laguerre polynomials as
ex dn(xne-2) n! dx"
Ln(x)= -
(10.297)
504
10.3.5
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
Generating Function
To obtain the generating function, T ( z , t ) ,we multiply L,(x) [Eq. (10.295)] by tn and sum over n to write
cc c o n
=
n=O T=O
( -l)'n!z'tn (n - r)!(r!)Z'
+ s t o write
We introduce a new dummy variable s as n = r 0000
+
(-l)T(r
(10.299)
S)!ZTtY+S
(10.300)
r=O s=O
Note that both sums now run from zero to infinity. We rearrange this as
c 00
T ( z , t )=
03
(-1)T5rtr
r!
r=o
s=o
+
( r s)!tS (r!)s! .
(10.301)
If we note that the second sum is nothing but (Dwight) 00
(r
+ s)!t"
1
(10.302)
s=O
we can rewrite Equation (10.301) as
(10.303) Finally, using ex of L,(x):
=
~ ~ o z r we / robtain ! the generating function definition 1
T ( z , t )= (1 - t ) exp
-xt
03
[m]
(10.304)
n=O
10.3.6 Special Values and Recursion Relations Using the generating function, and the geometric series, 1/(1- t ) = C,"==, tn, we easily obtain the special value
L,(O) = 1.
(10.305)
From the Laguerre equation [Eq. (10.265)] we also get by inspection
L',(O) = -n.
(10.306)
505
LAGUERRE EQUATION
Differentiating the generating function with respect t o t gives
(n+ I)Ln+l(X)= (271
+ 1 - x)L,(x)
-
US
nLn-l(x)
(10.307)
and differentiating with respect t o x , we obtain
L',+, (X )
-
L;(z)
=
(10.308)
-Ln ( x ).
Using the first recursion relation [Eq. (10.307)], the second recursion relation can also be written as
x L ~ ( x=) nLn(x) - nLn-I(x).
(10.309)
10.3.7 Orthogonality If we multiply the Laguerre equation by e c X as
d2L, (10.310) + (1- x)e-"--dLn = -ne-"L,(x), ePxxdx2 dx the left-hand side becomes exact and can be written as (see Chapter 9) (10.311) We first multiply both sides by L m ( x ) and then integrate by parts:
-
Jd
03
z dL, [xe-"%]
dx = -n
Jd
(10.313) 03
e-"LnLm dx. (10.314)
Interchanging m and n, we obtain another equation: O0
dL,
( 10.315)
which, when subtracted from Equation (10.314) gives ( m- n)
e-"L,L,
dx = 0.
( 10.316)
This gives the orthogonality relation as
e-"L,L,dz
= N,Sn,.
(10.317)
Using the generating function [Eq. (10.304)], the normalization constant, Nn; can be obtained as 1.
506
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
10.3.8 Series Expansions in Laguerre Polynomials Like the previous special functions we have studied, any sufficiently smooth real function in the interval [0, 00) can be expanded in terms of the Laguerre polynomials as 03
(10.318) n=O
where the coefficients en are found by using the orthogonality relation: cn = JO
e -" f ( x )L, ( x ) dx.
(10.319)
Convergence of this series t o f ( x ) is guaranteed when the real function, f ( x ) , is piecewise smooth in every subinterval, [ X I , Z ~ ]where , 0 < x1 < 2 2 < 00,of [O, cm) and x is not a point of discontinuity, and the integral
(10.320) is finite. At the points of discontinuity the series converges to (Lebedev)
(10.321) Example 10.5. Laguerre series of e-ax : This function satisfies the conditions stated in Section 10.3.8 for a > 0; hence we can write the series 00
(10.322) n=O
where the expansion coefficients are obtained as
(10.323)
$1
00
=
e-a" __ dn (e-"xn) dx dxn
(10.325)
n!
an
-
(a
(10.324)
+ 1)"+1' n = 0 , 1 , ... .
(10.326)
PROBLEMS
507
PROBLEMS
1. Find Legendre series expansion of the step function:
Discuss the behavior of the series you found at x
= a.
Hint: Use the asymptotic form of the Legendre series given as
where
E
is any positive number (Lebedev).
2. Show the parity relation of Legendre polynomials: P1(-x)
=
(-1)1fi(x).
3. Using the basic recursion relations [Eqs. (10.102) and (10.103)], derive (i)
Pi+l(x) = (1
+ 1)Pi(x)+ x q ’ ( x ) .
4. Show the relation cc 1-t2 = C(21fl)fi(z)tl. (1 - 2xt f t2)3/2 1=0
5. Show that Legendre expansion of the Dirac delta function is
6. Show that Hermite polynomials satisfy the parity relation
Hn(x)= (-1yHn(-z). 7. (i) Show that Hermite polynomials can also be defined as
508
SECOND-ORDER DIFFERENTIAL EQUATIONS AND SPECIAL FUNCTIONS
(ii) Define your contour and evaluate the integral t o justify your result.
8. Show the integral
9. Show that 00
z2e-"2H,(z)Hm(x) dz
= 2n-W2(2n
+ l)n!6,, + 2"7W(n + 2)!5,+a,m
+ 2n-27r1'2n!5,
- 2 ,m .
10. Show the Laguerre expansion m
xm
=
C c , ~ , ( x ) , m = 0 , 1 , 2 , .. . , n=O
where
11. Using the generating function definition of Laguerre polynomials, show that the normalization constant, N,, in
is 1.
r
e-"L,L,dx
= N,6,,
12. Prove the basic recursion relations of the Laguerre polynomials:
13. Using basic recursion relations obtained in Problem 10.12, derive
z L k ( x ) = n L n ( x )- nL,-l(x)
CHAPTER 11
BESSEL’S EQUATION AND BESSEL FUNCTIONS
Bessel functions are among the most frequently encountered special functions in physics and engineering. They are very useful in quantum mechanics in WKB approximations. Since they are usually encountered in solving potential problems with cylindrical boundaries, they are also called cylinder functions. Bessel functions are used even in abstract number theory and mathematical analysis. Like the other special functions, they form a complete and an orthogonal set. Therefore, any sufficiently smooth function can be expanded in terms of Bessel functions. However, their orthogonality is not with respect to their order but with respect to a parameter in their argument, which usually assumes the values of the infinitely many roots of the Bessel function. In this chapter, we introduce the basic Bessel functions and their properties. We also discuss the modified Bessel functions and the spherical Bessel functions. There exists a wealth of literature on special functions. Like the classic treatise by Watson, some of them are solely devoted to Bessel functions and their applications.
Essentials of Mathematical Methods in Science and Engineering. By $. SelGuk Bayin Copyright @ 2008 John Wiley & Sons, Inc.
509
510
BESSEL'S EQUATION AND BESSEL FUNCTIONS
11.1 BESSEL'S EQUATION AND I T S SERIES SOLUTION Bessel's equation is defined as 2
Ym x 2d - dx2
dym
+ X - dx
2
+ (x
2
-
m )ym = 0, x
2 0,
(11.1)
where the range of the independent variable could be taken as the entire real axis or even the entire complex plane. At this point we restrict m t o positive and real values. Since x = 0 is a regular singular point, we can try a series solution of the form 00
(11.2) k=O
with the derivatives 03
(11.3) k=O 00
=E c k ( k
Y77 " l
+r)(k+T - 1)~"+'-~.
(11.4)
k=O
Substituting these into the Bessel's equation we write 03
00
k=O
k=O
k=O
(11.5) k=O which can be arranged as 00
+
x [ ( k r)(k
+r
-
1)
+ (k +r )
k=O
We now let k 03
+ 2 = k'
-
+
03
m 2 ] ~ k x k + rc ~ k=O
k
x
=~0.
+ (11.6) ~ ~
in the second sum and drop primes to write 03
+
c ( ( k T ) ( k 4-T - 1 )
+ ( k + r ) - m 2 ] ~ k x k ++r ~
C
k
-
2
~= ' 0.~
~(11.7)
k=2
k=O
Writing the first two terms of the first series explicitly, we can have both sums starting from k = 2 , thus
+ +
(r2- m2)cOzT [(r 1)2- m2]c1xr+1 00
+ C [ ( ( k+ k=2
T)2 - m2)Ck
+
Ck-2]Zk+T
= 0.
(11.8)
~
BESSEL'S EQUATION AND ITS SERIES SOLUTION
511
Equating coefficients of the equal powers of x t o zero, we obtain
(r2- m2)co= 0, co
+
[(r
+
-
[ ( k r ) 2- m2]c k
# 0,
(11.9)
m2]c1= 0,
+ ck-2
(11.10)
= 0, k = 2 , 3 , . . .
.
(11.11)
The first equation is the indicia1 equation, the solution of which gives the values of r as
r = f m , m > 0. For the time being, we take r gives
=
(11.12)
m, hence the second equation [Eq. ( l l . l O ) ]
[ ( m+ 1)2 - m 2 ] q= 0, (am 1 ) C l = 0 ,
(11.13) (11.14)
+
which determines c1 as zero. Finally, the third equation [Eq. (11.11)] gives the recursion relation for the remaining coefficients as ck-2
[(lc Now, with r
=
+ ?-)2 - m2]' k = 2 , 3 , . . . .
(11.15)
m, all the nonzero coefficients become
co # 0, c2
=
c4 =
(11.16)
+
CO
(11.17)
( m 2)2 - m2 '
+ 2)2
[(m
CO -
+
m2][(m 4)2 - m2]'
A similar procedure gives the series solution for r
=
-m; hence for r
(11.18)
=m
we
can write
=
C C Z ~ X ~ m~ +> ~0. ,
(11.20)
k=O
To check convergence, we write the general term as (11.21)
512
BESSEL'S EQUATION AND BESSEL FUNCTIONS
From the limit
c2 k
= lim k-oo
=
(11.23)
c2 ( k- 1) X2
lim
(2k
k-oo
"1
= lim
+ m)2
4k2
k-x
+o<
(11.24)
m2
-
1,
(1 1.25)
we conclude that the series converges on the real axis for T = +m. A parallel argument leads to the same conclusion for T = -m. To obtain a general expression for the kth term of this series, we use the recursion relation [Eq. (11.15)] with T = m to write c2k
=-
Ca(k-1)
=-
CZ(k-2) =
c2(k-1)
22k(k
(11.26)
+ m)' C2(k-2)
(11.27)
22 ( k - l ) ( k + m - l ) ' C2 ( k -3 )
- 2
2 ( k - 2)(k
(11.28)
+ m - 2) '
Substitution of Equation (11.27) into Equation (11.26) gives
(11.29) Next we substitute Equation (11.28) into the above equation t o obtain czk = (-113
c2(k - 3)
222222k(k- 1 ) ( k - 2)(k
+ m)(k+ m - l ) ( k + m - 2) '
(11.30)
where a trend begins to appear. After s iterations we obtain C2k
= (-1)'
c2 ( k -s )
22Sk(k- I ) . . . ( k - s + l ) ( k + m ) ( k + r n -
1)...(k+m-s+1)
(1 1.31) arid after k iterations we hit C2k
CO:
CO
= (-1)k 22kk(k
-
1). . . 2 . l ( k + m ) ( k
+m
-
+
1) . . . ( m 1) '
(11.32)
Dividing and multiplying by m ! ,we can write this as (11.33)
BESSEL'S EQUATION AND ITS SERIES SOLUTION
513
Now the series solution [Eq. (11.20)] becomes
(11.34) Bessel function of order m is defined by setting
c 0 2 ~ m= ! 1
(11.35)
as (11.36)
J1,(z) are also called the Bessel function of the first kind. Series expressions of the first two Bessel functions are (11.37) (11.38)
Bessel functions of higher order can be expressed in terms of JOand 51: We first multiply Equation (11.36) by xm and then differentiate with respect to x to write
d d " z[xrnJrnI = k=O
(x/2)2k+2m
zE(-1)2-mk!(k + m)! k
(11.39) (11.40)
00
=C(-l)k k=O
= xm
[();
+
(2k 2m)x2k+2rn-1 22k+mk!(k+ m)! m-1
(11.41)
O0
C(-l)k
k=O
k ! ( k+ m
= xmJm-l.
-
I)!
(11.42) (11.43)
Writing the left-hand side explicitly,
m x m - l J m + x m J ~= x m J m - l ,
(11.44)
we obtain
(11.45)
514
BESSEL'S EQUATION AND BESSEL FUNCTIONS
Similarly, multiplying Equation (11.36) by xPm and then differentiating with respect to x, we obtain another relation:
m --Jm X
+ JL = -Jm+l.
( 11.46)
Subtracting Equation (11.46) from (11.45) gives 2m -Jm
+ Jm+1,m = 1 , 2 , . . . ,
= Jm-l
X
(11.47)
while their sum yields 2JL
= Jm-l- Jm+l, m=
1 , 2 , .. . .
(11.48)
Repeated applications of these formulas allow us to find J , and J A in terms of JO and J1. For m = 0, Equation (11.48) is replaced by
Jh
(11.49)
= -J1.
11.1.1 Bessel Functions J * m ( z ) , N,(z), and
HZ3')(z)
The series solution of the Bessel's equation can be extended to all positive values of m through the use of the gamma function as (See Problem 11.6.) O0
J7n(x) =
k=O
(-1)k
k ! r ' ( m+ k
+ 1)
( 2 )m+2k , 2
(11.50)
which is called the Bessel function of the first kind of order m. A second solution can be written as (11.51) However, the second solution is independent of the first solution only for the noninteger values of m. For the integer values, m = n, we multiply J-, by ( - l ) - " and write (-l)-'"-n(x)
= (;)-n
(;),
03
(;)-'"z(-l)k-n k=n
=
(;)"
?(-I) k=n
k-n
(x/2)2k k ! ( k - n)!
(11.52)
2(k--n)
k ! ( k - n)! '
(11.53)
We now let k - n + k' and then drop primes to see that J-, and Jn are proportional to each other as
J-,(x)= ( - l ) n J n ( ~ ) .
(11.54)
BESSEL’S EQUATION AND ITS SERIES SOLUTION
-0.5
515
-
-1 Figure 11.1
Bessel functions, JO and NO.
When m takes integer values the second and linearly independent solution can be taken as
Nm(x)=
cosmnJ,(x) - L m ( z ) ’ sin m.rr
(11.55)
which is called the Neumann function or the Bessel function of the second kind. Note that N,(x) and Jm(x) are linearly independent even for the noninteger values of m. Hence it is common practice t o take N,(z) and Jn, (x)as the two linearly independent solutions for all m (Fig. 11.1). Since N, is indeterminate for the integer values of m, we can find the series expression of N , by considering the limit m + n (integer) and the L’HGpital’s rule:
dm [cosm.rrJ,(x) - J-,(x)] m 8, [sin mn]
N T 1 ( z= ) lim N m ( x ) = lim m
m-n
1
=-
lim
T m-n
[$-(-1)”-
(11.56) (11.57)
dm
We first evaluate the derivative 8, J , as 8Jm = ~
dm
();
~ ~ln( x )-
();
m co
C ( - l ) k * ( m + k + 1) k!r(m+k + 1 ) k=O
(2)2k, (11.58) 2
where we used the definition (11.59)
516
BESSEL'S EQUATION AND BESSEL FUNCTIONS
Some of the properties of Q(z) are
*(I) = -7,
(11.60)
where y = 0.57721566 is the Euler constant and
S(m+ 1) = -7
+ 1 + -21 + . . . + -,m1
m = 1 , 2 , 3 , .. . .
(11.61)
+ n + l)] .
(11.62)
We can now write the limit
00
-
Q(k
Similarly, we write the derivative 00
dJ-, dm
[-In k=O
2
+ S(k
-
m
+ l)].
(11.63)
In the limit as m t n (integer), for the first n terms, k = 0,1,. . . , n- 1, r(k - m + 1) and Q ( k - m + 1) are infinite since their arguments are zero or negative integer (Prob. 11.6): lim m-n
{
r(lc-m+i)
03,
, k = O , l , . . . , (n-1),
9(k-m+1)
(11.64)
00.
4
However, using the well-known results from the theory of gamma functions:
r(z)r(iQ(1-
7r
=-
sin T X '
2) - *(2) = 7rCOtT2,
their ratio is constant: lim
m-n
Q ( k - 772
+ 1) =
r(k - m + 1)
lim
m-n
[qm
- k ) sinT(rn - k )
S ( m- k )
+
7r cot 7r(m-
k)
T
, (n- 1).
(11.65)
(42)n+2p + (-1y C(-1)" , [-In 2 + Q ( p + l)], P ! ( n + PI.
(11.66)
= (-1)"-'(n
-
k
-
l)!, k
= 0,1,. ..
We can now write
dJ-, dm
lim --
m-n
k=O co
p=o
k! 2
1
BESSEL’S EQUATION AND ITS SERIES SOLUTION
517
where we have defined a new dummy index in the second sum as (11.67)
p=k-n.
We now substitute the limits in Equations (11.62) and (11.66) into Equation (11.57) to find the series expression of N n ( z ) as
c(-l)k +
lc0
+?r
k=O
(2/2)n+2k
k)!
k!(n
[21n 5 2
-
*(k
+ 1)
-
Q(k
+ n + l)],
where n = 0 , 1 , . . . , and the first sum should be set to zero for n also write this as
= 0.
(11.68)
We can
Other linearly independent solutions of the Bessel’s equation are given as the Hankel functions, H z ) ( s ) ,H g ) ( x ) ,which are defined as
and
H g ’ ( 2 ) = J m ( 2 ) - iNm(2).
(11.71)
Hankel functions are also called the Bessel functions of the third kind. The motivation for introducing H g ’ 2 ) ( z is ) that they have very simple asymptotic expressions for large 2 , which makes them very useful in applications. In the limit as ~ 7 :4 0, the Bessel function, J m ( z ) ,is finite for rn 2 0 and behaves as lim Jm(x) +
2-0
All the other functions diverge as
r(m+i)
(”>,2
(11.72)
518
BESSEL'S EQUATION AND BESSEL FUNCTIONS
(11.74) (11.75) (11.76) (11.77) In the limit as x + 00, the Bessel functions, J m ( z ) , N,(z), H ,(1)(x),and H,,, ( 2 ) (z), with m > 0, behave as (11.78)
(11.79)
We remind the reader that m
11.1.2
2 0 in these equations.
Recursion Relations
Using the series definition of Bessel functions, we have obtained the following recursion relations [Eqs. (11.47) and (11.48)]:
2m JTn-l(x) J m + l ( z ) = - J m ( X ) ,
+
m = 1 , 2 , .. .
(11.82)
Jm+l(z) = ~ J ; ( x ) , m = 1 , 2 , . . . .
(11.83)
X
and
Jm-l(x)
-
BESSEL'S EQUATION AND ITS SERIES SOLUTION
519
First by adding and then by subtracting these equations, we obtain two more relations:
(11.84) and
m Jm+l(x) = -Jm(x) - J ~ ( x ) X
- -xm
d
[x-" Jm (41. dx
(11.85)
Other Bessel functions, N,, H;'), and Hi'), satisfy the same recursion relations. 11.1.3
Generating Function
Similar t o the other special functions, we look for a generating function, g ( x , t ) , which produces Bessel functions as
(11.86) m=--00
Multiply the relation
(11.87) with tm and sum over m to write
c m
c
03
m=-m
?. u
m=--00
-00
t
z tmJm+l=-
tmJm-l+
tm-'Jm-l+t
m=--03
l
c m
tm+'Jm+l
c c
2 O "
=-
mtmJm,
(11.88)
mtmJm.
(11.89)
m=-m
m=-m
m=--00
We first substitute m - 1 = m' into the first sum and m second and then drop primes to get
c m
t
m=-m
l o o
tmJm+t
m=-m
tg
+ 1 = m"
2t dg
tmJm=--, x at
+ -g1t
(I+ ;) g
2tdg
=
--,
=
--&.
2
at
2t dg
into the
( 11.90) (11.91)
(11.92)
520
BESSEL'S EQUATION AND BESSEL FUNCTIONS
This can be reduced to the quadrature (11.93)
to yield the generating function g(x,t ) as (11.94)
where @(x) is a function to be determined. We now write the above equation as d X >t ) = 4 ( z > ex t / 2 e - x / 2 t
(11.95)
and substitute series expressions for the exponentials to get
(11.97) Since (11.98) m=-w
we can write
(11.99) = ."
+ 1t
-J-1
+ Jo + t J 1 + . . . .
(11.100)
To extract 4(z),it is sufficient to look at the simplest term, that is, the coefficient of to. In the double sum on the left-hand side this means taking only the terms with m = n. Hence,
(11.101) The quantity inside the square brackets is nothing but Jo, thus determining 4(x) as 1. We can now write the generating function as exp
[(f) (1- :)]
00
=
C m=--00
tmJm.
(11.102)
BESSEL'S EQUATION AND ITS SERIES SOLUTION
11.1.4
521
Integral Definitions
Since the generating function definition works only for integer orders, integral definitions are developed for arbitrary orders. Among the most commonly used integral definitions we have
and
J m ( x )=
(x/2)m
J;;r(m+;)
J" dt (1 -1
- t2)m-tcosxt,
1 ( m > --). 2
(11.104)
For the integer values of m the second term in the first definition [Eq. (11.103)] vanishes, thus leaving cos [mp - z s i n p ] dp, m = 0, f l ,f 2 , . . .
,
(11.105)
which can be proven by using the generating function (Prob. 11.14). We prove the second definition [Eq. (11.104)], which is due to Poisson, by using the integral represent.ation of the gamma function (Prob. 11.6):
(11.106)
which, when substituted into the definition of the Bessel function [Eq. (11.50)]:
(11.107) gives
Since the series converges absolutely, we can interchange the summation and the integral signs to write Jm(X)
=
1
. (11.109)
Finally, using the so-called duplication formula of the gamma functions,
(11.110)
522
BESSEL’S EQUATION AND BESSEL FUNCTIONS
where I?( l / 2 ) = fi,we obtain the desired integral representation:
11.1.5 Linear Independence of Bessel Functions Two functions, u1 and u2, are linearly independent if and only if their Wronskian, that is, the determinant
(11.112) does not vanish identically. Let us and u2 be two solutions of Bessel’s equation, hence
(11.113) and
4x4)
dx
+ (x
-
m2 -)u2 X
= 0.
(11.114)
We first multiply Equation (11.113) by u2 and Equation (11.114) by us and then subtract to write
d d us-(xu;) - u2-(zu:) = 0, dx dx d -[x(u1u; - u2u;)] = 0,
dx
d
-[[.W(x)] = 0. dx
(11.115) (11.116) (11.117)
Hence the Wronskian of two solutions of the Bessel’s equation is (11.118)
where C is a constant. If we take US
= J , and u2 = J-rn,
(11.119)
C can be calculated by the limit C = lim x W [ J r n,Lm], x-+o
(11.120)
BESSEL’S EQUATION AND ITS SERIES SOLUTION
523
where m is a noninteger. Using the asymptotic expansion of J m ( x ) [Eq. (11.72)] and the following properties of the gamma function, namely
(11.121) we can determine C as -2m C = S+O lim r ( l + m ) r ( l - m) [I+ 0(x2)1 -
2 sin rnn -
7T
(11.122)
>
thus obtaining
W [ J mJ-,] , =-
2 sin m.ir T X
(11.123)
Similarly, we can show the following Wronskians:
W[J,,Hp]
= -2i TX’
(11.124) (11.125) (11.126)
This establishes the linear independence of the functions J,, H g ) . Hence the solution of the Bessel’s equation, X2
d2y, dy, dx2 +x- dx
+ ( x 2- m
2
) y m = 0,
N,, H g ) , and
(11.127)
can be written in any one of the following ways:
(11.128) (11.129) (11.130) (11.131)
where a l , a2,. . . , d2 are constants to be determined from the initial conditions.
11.1.6 Modified Bessel Functions 1, (z) and K , (z) If we take the arguments of the Bessel functions, J m ( x ) and H ~ ) ( x as ) , imaginary, we obtain the modified Bessel functions (Figs. 11.2 and 11.3) (11.132)
524
BESSEL'S EQUATION AND BESSEL FUNCTIONS
5-
43-
21
0.5
1
Figure 11.2
1.5
2
2.5
3
Modified Bessel functions 10 and 11.
Figure 11.3 Modified Bessel functions KOand
K1.
and Tim+l
K,(J:) = -H$(iII:). 2
(11.133)
These functions are linearly independent solutions of the differential equation
(11.134) Their
J: -+
0 and
II:
403
limits are given as (real m
2 0) (11.135)
525
BESSEL’S EQUATION AND ITS SERIES SOLUTION
Figure 11.4
Spherical Bessel functions j o ,
-
[In
(z) +.I,
jl
and j z .
m = 0, (11.136)
2-0
,
2
m#O
and lim I m ( z )
-j
-
(11.137)
2-00
(11.138)
11.1.7 Spherical Bessel Functions j l ( x ) ,n l ( x ) ,and
hi1’2)(2)
Bessel functions with half-integer orders often appear in applications with the factor
6;
- so that they are given a special name. Spherical Bessel functions,
jl(x),nl(x),and t ~ ! ” ~ ) (are z )defined , as (Figs. 11.4 and 11.5) jl(,)
=
@l++(x),
m ( x ) = @l+t(z), h!1)2)(z) =
(11.139)
526
BESSEL’S EQUATION AND BESSEL FUNCTIONS
where 1 = 0, f l ,f 2 , . . . . Bessel functions with half-integer indices, Jz+;(x) and Nl+; ( x ) ,satisfy the differential equation
while the spherical Bessel functions, j l ( x ) ,~ ( xand ) h!””)(x)satisfy
(11.141) Series expansions of the spherical Bessel functions are given as cc
(-l)n(l+ n)!
(11.142)
22n,
n=O
(11.143) where the first equation can be derived using Equation (11.50), the duplication formula [Eq. (1l.llO)l: 22zf2”(1 n)!r(l n l / 2 ) = @(a1 2n)! and r(n 1) = nr(n).Spherical Bessel functions can also be defined as
+ +
+
+
+
(11.144) nz(x) = ( - X y
(q(---)
cos x
(11.145)
x dx
The first two spherical Bessel functions are cos x n,(x) = --, X cosx cosx sinx -7 j l ( X ) = -- -, n1(x) =
jo(.)
sin x .x sinx
= -,
22
X
(11.146)
X
and
h, h,(( 11 ))(( xx ))= = -i-, -i-, h,( 1 ) ( x )=eta:
eax
e-ix
h,( 2 ) ( x )= i-, h, ( x )= i-,
.T X
[-;
1 -
$1,
.T X
(11.147) (2)
h, ( x )=e-i“
Using the recursion relations given in Equations (11.84) and (11.85), one can easily obtain the recursion relations
-[d
xz+l 7Jz(x)1= xZ+17Jn-1(x), dx d -“Lc-zYl(x)l = -2- Z Y n + l ( X ) , dx
(11.148) (11.149)
ORTHOGONALITY AND THE ROOTS OF BESSEL FUNCTIONS
527
“0
L
0.2,
-0.2. -0.4, -0.6.
Figure 11.5
Spherical Bessel functions 720,721 and
722.
where yl stands for any of the spherical Bessel functions, j l , n l , or f ~ ! ” ~ ) . Asymptotic forms of the spherical Bessel functions are given as
(21
jl(,)
i
n1(x)
+
-
22 +X1 l)!! (’- 2(2I+ 3) +...) , x <
(21 - l)!!
X2
2(1 - 21)
+...),
1,
(11.150)
x < 1,
(11.151)
1 In j l ( x ) + - sin(x - -), x >> 1, X 2 1 1T m(x) -+ -- cos(x - T ) , x >> 1,
(11.152) (11.153)
X
+
where the double factorial is defined as (21+ l)!! = (21 1)(21- l ) ( 2 l - 3) . . . 5 . 3 . 1 = (21 1)!/2l1!.
+
11.2 ORTHOGONALITY AND T H E ROOTS OF BESSEL FUNCTIONS Bessel functions also satisfy an orthogonality relation. Unlike the Legendre, Hermite, or Laguerre polynomials, their orthogonality is not with respect to their order but with respect to a parameter in their argument. We now write the Bessel equation with respect to a new argument, kx, and replace x with kx in Equation (11.1) t o write d2J , (kx) dx2
+ -x1 d J ,dx(kx)
(11.154)
where k is a new parameter. For another parameter, 1, we write
d2J ,
1 dJ, (12) + -x dx
(22)
dx2
(11.155)
528
BESSEL'S EQUATION AND BESSEL FUNCTIONS
Multiply Equation (11.154) by x J m ( l x ) and Equation (11.155) by x J m ( k x ) and then subtract to obtain
d -[x ( J , ( k x ) J & ( l ~ )- J k ( k x ) J , ( l x ) ) ] = ( k 2 - l 2 ) ~ J m ( k ~ ) J m ( l ~ ) . dx (11.156) In physical applications the range is usually [O,a],hence we integrate both sides from zero to a to write
( 11.157) and substitute the definitions
k = -X m i l = - X m j which gives
(xfni- x i j )
(5)La
= a [J,, (?a)
xJ,
(y
J A (%a)
(11.158)
U
U
x ) J,
-
(y
J A (%a)
x ) dx
.
J , (?a)]
(11.159)
In most physical applications the boundary conditions imposed on the system are such that the right-hand side vanishes, that is,
J,,l (?a)
J L (?a)
=
JA (%a)
Along with the above condition, when x,i
La
J , (%a)
# x,j,
.
(11.160)
Equation (11.159) implies
x J , ( y x ) J , ( y x ) d x = 0.
(11.161)
To complete the orthogonality relation, we have to evaluate the integral (11.162) for x,,,i = x m j , that is, i
=j.
Calling Xmi
-x
U
we write
= t,
(11.163)
Iii = I as I
=
=
LaxJL
(yx)
dx,
(5)' LXmi
t J & ( t ) dt.
(11.164)
ORTHOGONALITY AND THE ROOTS OF BESSEL FUNCTIONS
529
When we integrate Equation (11.164) by parts with the definitions
u = J A ( t ) and dv = t d t ,
(11.165)
we obtain (11.166)
Using the Bessel’s equation [Eq. (11.l)]:
t 2 J m ( t ) = m2Jm(t)- tJh(t)- t2J”(t),
(11.167)
we now write Equation (11.166) as XW,i
J A [ m2 Jm
-
t J 2 - t2J$] d t ,
(11.168)
which can be simplified as
(11.169)
(11.170) 1
-
-m2J$(t) 2
+
(11.171)
For m > -1, the right-hand side vanishes for impose the boundary condition
2 =
0. For
2 =
a we usually
or
Jm (xmi)= 0.
(11.172)
In other words, x,,i are the roots of J m ( z ) . From the asymptotic form [Eq. (11.78)] of the Bessel function, it is clear that it has infinitely many distinct positive roots (Fig. 11.6).
J ? n ( ~ m i=) 0,
i = 1 , 2 , 3 , .. . ,
(11.173)
where z,i stands for the i t h root of the mth-order Bessel function. When m takes integer values, the first three roots are given as m =0 m =z 1 7n = 2
2.405
502
= 3.832 = 5.136
512
201 = 211 221
222
= 5.520 = 7.016 = 8.417
203
= 8.654
513
=
223
10.173 = 11.620
... ... ...
(1I.174)
530
BESSEL'S EQUATION AND BESSEL FUNCTIONS
Figure 11.6
Roots of the Jo, J1, and J Z functions.
Higher-order roots are approximately given by the formula (11.175)
We now write Equation (11.171) as (11.176) which, after using with Equation (11.164), gives (11.177) Using the recursion relation [Eq. (11.85)] m
Jm+l = -Jm X
and the boundary condition J,(x,i)
-
JA
(11.178)
= 0, we can also write this as
I a x J i ( T x ) dx = ?a 1 2 Jm+l(~mi). 2
(11.179)
Orthogonality relation of the Bessel functions can now be given as
laxJm(Yx) (Yx) J,
dx = ~a2J ~2 + ~ ( x ~ i )m& 2 j ,-1.
(11.180)
Since Bessel functions form a complete set, any sufficiently smooth function, f(z),in the interval X E P,aI
(11.181)
ORTHOGONALITY AND THE ROOTS OF BESSEL FUNCTIONS
531
can be expanded as (11.182) where the expansion coefficients, A,i,
are found from (11.183)
Series expansions [Eq. (11.182)] with the coefficients calculated using Equation (11.183) are called the Fourier-Bessel series. 11.2.1
Expansion Theorem
If a given real function, f ( x ) , is piecewise continuous in the interval (0, a ) and of bounded variation in every subinterval [ a l ,Q], where 0 < a1 < a2 < a and if the integral (11.184)
is finite, then the Fourier-Bessel series converges to f(x) at every point of continuity. At points of discontinuity the series converges to the mean of the right and left limits at that point: (11.185)
For the proof and definition of bounded variation we refer to Watson (p. 591) and Titchmarsh (p. 355). However, bounded variation basically means that the displacement in the y direction, Sy, as we move along the graph of the function is finite. 11.2.2
Boundary Conditions for the Bessel Functions
For the roots given in Equation (11.174) we have used the Dirichlet boundary condition, that is,
J,(ka) = 0,
(11.186)
which gives us the roots
k
- xmi
(11.187)
{ ~ ~ ( y x )i =} 1 ,, 2 , . . . , m > o ,
(11.188)
a
Now the set of functions
532
BESSEL'S EQUATION AND BESSEL FUNCTIONS
form a complete and an orthogonal set with respect t o the index i in the interval [0,u ] . The same conclusion holds for the Neumann boundary condition defined as dJm (xmix/u) dx
I
= 0.
(11.189)
x=a
Similarly, the general boundary condition is written as
In the case of general boundary conditions, using Equations (11.161) and (11.171), the orthogonality condition becomes
I a x J m ( y x ) Jm ( y x ) dx xmi # xmj,
0,
(11.191)
Now the expansion coefficients in the series [Eq. (11.182)] 00
f ( x )=
C A , ~J , i=l
(%x) U
,
m
2 -1,
(11.192)
become
In terms of the Bessel function, J n ( k z ) , Neumann and the general boundary conditions are written, respectively, as
lo.Jdk
(11.194)
dx
x=ka
=O
and
+
A o J n ( x ) Bok-
(11.195)
For the Neumann boundary condition [Eq. ( l l . l 8 9 ) ] there exist infinitely many roots, which can be read from existing tables. For the general boundary condition, roots depend on the values that A0 and Bo take. Thus each case
ORTHOGONALITY AND THE ROOTS OF BESSEL FUNCTIONS
533
must be handled separately by numerical analysis. From all three types of boundary conditions we obtain a complete and an orthogonal set as (11.196)
so"
Example 11.1. Evaluation of e-kxJg(lx)lk , 1 > 0 : To evaluate this integral we make use of the integral representation in Equation (11.104). Replacing Jo with its integral representation we obtain
1"
e-"JO(lx) dx
=
=
r2 lT'21
I" 27r
dx e-kx-
(11.197)
dz eCkz cos [lzsin p]
(11.198)
00
dp
= 7r -
cos [lzsinp] d p
7r
k dp k2 + l 2 sin2 p
(11.199)
k , l > 0.
(11.200)
1
d m l
so"
Since the integral e-kzJO(lz) dx is convergent, we have interchanged the order of the p and z integrals in Equation (11.198). Example 11.2. Evaluate e - k 2 x 2Jm(lz)zmtldx: This is also called the Weber integral, where k , 1 > 0 and m > -1. We use the series representation of J , [Eq. (11.50)] t o write
I"
e-kzz2Jm(lz)zm+1 dx
(11.204) -
I" (2Ic2)m+l
-12/4k2,
k , l > 0 and m > -1.
(11.205)
Since the sum converges absolutely, we have interchanged the summation and the integration signs and defined a new variable, t = k 2 x 2 , in Equation (11.203).
534
BESSEL’S EQUATION AND BESSEL FUNCTIONS
Example 11.3. Crane problem: We now consider small oscillations of a mass raised (or lowered) by a crane with uniform velocity. Equation of motion is given by d --(mL20) mgl sin 6 = 0, (11.206) dt where 1 is the length of the cable and m is the mass raised. For small oscillations we can take
+
sine
21
8.
( 11.207)
For a crane operator changing the length of the cable with uniform velocity, VO, we write dl dt
- = VO
(11.208)
and the equation of motion becomes 19
+ 2voe + g o
= 0.
(11.209)
We now switch to 1 as our independent variable. Using the derivatives (11.210)
(11.211) we can write the equation of motion in terms of 1 as d2Q
g + 21 ddl9 + +(l) lv,
- -d12
= 0.
(11.212)
In applications we usually encounter differential equations of the form
]
a2 - p2c2 2
52
y(x) = 0,
(11.213)
solutions of which can be expressed in terms of Bessel functions as
Y(X)= za [A,J,(bz“)
+ AINp(bxC)].
( 11.2 14)
Applying to our case, we identify 1 - 2a 2
2 2
a -pc
2,
(11.215)
=0,
(11.216)
=
(11.217) (11.218)
PROBLEMS
535
which gives (11.219)
We can now write the general solution of Equation (11.212) as
Time-dependent solution, Q ( t )is, obtained with the substitution
l ( t ) = lo + vot.
(11.221)
PROBLEMS 1. Drive the recursion relations
2m JnL-l(x) Jm+l(x)= -Jm(x),
+
m = I , & . ..
X
and J ~ - ~ ( X-) J ~ ~ + ~ (=X2)J k ( x ) , m =
1,2,. . . .
Use the first equation to express a Bessel function of arbitrary order ( m = 0 , 1 , 2 , . . . ) in terms of J o ( x ) and J ~ ( x )Also . show that for m = 0 the second equation is replaced by Jb(X) = -J1(x). 2. Derive Equation (11.58):
and Equation (11.63):
3. Verify the following Wronskians: W [ J m ,H E ' ] =
22 --
7rX
,
4i
w [ H : ) , H g ) ] = --
n-2
2
W [ J m , N m ]= E'
,
536
BESSEL'S EQUATION AND BESSEL FUNCTIONS
4. Find the constant, C , in the Wronskian
C w [&n(x),Km(z)l= ---. X 5. Verify the Wronskian W [ & , Lm] =-
2 sin rnr 7lX
6. Gamma function: To extend the definition of factorial t o noninteger values, we can use the integral
where r(z) is called the gamma function. Using integration by parts, show that for x 2 1
Use the integral definition to establish that r(1)= 1 and then show that when n is a positive integer 1) = n!. Because of the divergence at x = 0, the integral definition does not work for x 5 -1. However, definition of the gamma function can be extended to negative values of x by writing above formula as
r(n+
1 qX) = -qX + I), X provided that for negative integers we define
1 r(-n)
= 0,
n is integer.
Using these first, show that
r(-1/2)= J;; and then find the value of r(-3/2). Also evaluate the following values of the gamma function:
7. Evaluate the integral d z , a, b
3
> 0, 2n + - > m > -1. 2
PROBLEMS
537
Hint: Use the substitution
8. For the integer values of n prove that A-,(z) = ( - l y N , ( X ) .
9. Use the generating function
n=-cc
to prove the relations
Jn(-x)
=
(-l)nJn(x)
and
which is also known as the addition formula of Bessel functions.
10. Prove the formula
Jn(x) = (-1)CX"
('")" x dx
Jo(z)
by induction, that is, assume it to be true for n = N and then show for Nfl. 11. Derive the formula eizcose -
C
imJm(.z)eime,
m=-cc
where z is a point in the complex plane. This is called the Jacobi-Anger expansion and gives a plane wave in terms of cylindrical waves.
12. In Fraunhofer diffraction from a circular aperture we encounter the integral
I
- la 12x r dr
d0 eibr'OS
',
538
BESSEL'S EQUATION AND BESSEL FUNCTIONS
where a and b are constants depending on the physical parameters of the problem. Using the integral definition
1LT
cos [mcp- xsincp] dcp,
Jm(x)= first show that
I
-
27r
m = 0, f l ,f 2 , . . . ,
La
Jo(br)r d r
and then integrate to find
I = (27ra/b)Ji(ab). Hint: Using the recursion relation:
m -Jm X
+ J A = Jm-l,
first prove that
d dx
-[zmJm ( x ) ]= x m Jm-l ( x ) .
Also note that a similar equation,
can be obtained via
m --Jm X
+ J A = -Jm+l.
13. Prove the following integral definition of the Bessel functions:
14. Using the generating function definition of Bessel functions, show the integral representation
Jm(x)= 15. Prove
i*
cos[mcp - x sin cpldcp, m = 0 , fl, f2,..
PROBLEMS
539
where n is positive integer. 16. Show that
17. Show that the spherical Bessel functions satisfy the differential equation
18. Spherical Bessel functions, jl(x) and n~(x),can also be defined as
Q(X) =
(-x)l
(:$
(---) cos x
Prove these formulas by induction 19. Using series representations of the Bessel functions, y 7 n and fl ,.L, show that the series representations of the spherical Bessel functions, j l and n ~are , given as
c 00
jl(X) = 2lX1
n=O
( - l y ( l + n)! X2n, n!(2n 21 l)!
+ +
n=O
(-l)"(n - l ) ! zn X n!(2n - 21)!
Hint: Use the duplication formula: 22kr(k
+ i ) r ( k + 1/2) = r ( i / 2 ) r ( 2 l c + 1) = J;;(2k)!.
20. Using the recursion relations given in Equations (11.84) and (11.85), J"-l(X)
and
= x-"-
d [X"J"(X)] dx
540
BESSEL'S EQUATION AND BESSEL FUNCTIONS
show that spherical Bessel functions satisfy the following recursion relations:
where yl stands for anyone of the spherical Bessel functions, j l , nl, or hj1,2) 21. Show that the solution of the general equation
can be expressed in terms of the Bessel functions as ~ ( 2= )
[AoJ,(bz") + AINp(bxC)].
CHAPTER 12
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
The majority of the differential equations of physics and engineering are partial differential equations. The Laplace equation,
T%(T+)= 0,
(12.1)
which plays a central role in potential theory, is used in electrostatics, magnetostatics, and stationary flow problems. Diffusion and flow or transfer problems are commonly described by the equation 1 N(?.’,t)
T h ( T + t, ) - -
a2
at
= 0,
(12.2)
where o is a physical constant depending on the characteristics of the environment. The wave equation,
V%(?.’,t)
1 #Q(?.’,t)
-
v2
at2
= 0,
(12.3)
where ‘u stands for the wave velocity, is used to study wave phenomena in many different branches of science and engineering. The Helmholtz equation, Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
s. SelGuk Bayin
541
542
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
TPQ(7) + k ; Q ( 7 ) = 0,
(12.4)
is encountered in the study of waves and oscillations. Nonhomogeneous versions of these equations, where the right-hand side is nonzero, are also frequently encountered. In general, the nonhomogeneous term represents sources, sinks, or interactions that may be present. In quantum mechanics, the timeindependent Schrodinger equation is written as
(12.5) while the time-dependent Schrodinger equation is given as tl2
--PQ(7,t) 2m
+ V(?)Q(?,t)
= itl
as(?, t )
(12.6)
dt
Partial differential equations are in general more difficult to solve. Integral transforms and Green’s functions are among the most commonly used techniques to find analytic solutions (Bayin). However, in many of the interesting cases it is possible to convert a partial differential equation t o a set of ordinary differential equations by the method of separation of variables. The majority of the partial differential equations of physics and engineering can be written as a special case of the general equation:
VZQ(7,t) +KQ(7,t)
=a
d Z Q ( 7 , t )+,dQ(7,t)
’
at
at2
(12.7)
where a and b are usually constants but K could be a function of ?. In this chapter we discuss treatment of this general equation by the method of separation of variables in Cartesian, spherical and cylindrical coordinates. Our results can be adopted to specific cases by an appropriate choice of the parameters K , a and b.
12.1 SEPARATION OF VARIABLES IN CARTESIAN COORDINATES In Cartesian coordinates we start by separating the time variable in Equation (12.7) by the substitution Q ( 7 , t )= F(?)T(t)
(12.8)
and write
T(t)g2F(?)
:: Z]
+ r ; F ( 7 ) T ( t )= F ( 7 ) [a-
+ b-
,
(12.9)
543
SEPARATION OF VARIABLES IN CARTESIAN COORDINATES
where we take
K
as a constant. Dividing both sides by F(?)T(t) gives
1
(12.10)
where the left-hand side is only a function of 7and the right-hand side is only a function o f t . Since T and t are independent variables, the only way this equation can be true for all 7and t is when both sides are equal to the same constant. Calling this constant - k 2 , we obtain two equations: (12.11) and (12.12) The choice of a minus sign in front of k2 is arbitrary. In some problems, boundary conditions may require a plus sign if we want to keep k as a real parameter. In Cartesian coordinates the second equation is written as
We now separate the x variable by the substitution
and write
(12.15) which, after division by X ( x ) G ( y z, ) , becomes
X ( X ) dx2
1 G(y,z)
dz2 (12.16)
Similarly, the only way this equality can hold for all x and (y, z ) is when both sides are equal to the same constant, k:, which gives the equations
1 d2X(x) 2 -~ +k,=O
X ( x ) dx2
(12.17)
and
(12.18)
544
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
Finally, we separate the last equation by the substitution
which gives
Dividing by Y ( y ) Z ( z ) we , write 1 d2Y(y)-
Y ( y ) dy2
22z )
[%+
1
z(z)
(K
+ k2
-
]
k z ) Z(Z) ;
(12.21)
and by using the same argument used for the other variables, we set both sides equal t o the constant k i t o obtain
(12.22)
d2Z(z) + (6 dz2
+ k 2 - k; - k i ) Z ( z ) = 0.
(12.23)
In summary, after separating the variables, we have reduced the partial differential equation [Eq. (12.7)] t o four ordinary differential equations:
+ b-ddTt + k 2 T ( t )= 0 , d 2 X ( x )+ k 2 X ( X ) = 0 , dx2
d2T adt2
+ + k2 (K
-
(12.25)
+ k ; Y ( y ) = 0,
(12.26)
k; - k i ) z ( z ) = 0.
( 12.27)
d2Y(Y) dY2
d2Z(z) dz2
(12.24)
During this process, three constants, k , k,, and k,, which are called the separation constants, have entered into our equations. The final solution is now written as
Q ( T + , t )= T ( t ) X ( x ) Y ( y ) Z ( z ) .
(12.28)
12.1.1 Wave Equation One of the most frequently encountered partial differential equations of physics and engineering is the wave equation:
T’”Q(?;t,t)
-
1 82Q(?;t,t)
212
at2
= 0.
(12.29)
SEPARATION OF VARIABLES IN CARTESIAN COORDINATES
545
For its separable solutions we set 1
a=-,
V2
b=0, K=O,
(12.30)
where zi is the wave speed. Introducing w ,
w
=
k v , k2 = kq + k ; + k : ,
(12.31)
which stands for the angular frequency, we find the equations to be solved in Cartesian coordinates as
d2T dt2 d2X(x) dx2
-+ W 2 T ( t ) = 0 ,
+ k ; X ( z ) = 0, d2Y(y) + k;Y(y) = 0, dY2 d2Z(z) dz2
+ k:Z(z) = 0.
(12.32)
( 12.33)
( 12.34) (12.35)
All these equations are of the same type. If we concentrate on the first equation, the two linearly independent solutions are coswt and sinwt. Hence the general solution can be written as
T ( t )= a0 cos wt
+ a1 sin wt
or as
T ( t )= A c o s ( w t + 6), where ( a o , a l ) and ( A , 6) are arbitrary constants to be determined from the boundary conditions. In anticipation of applications to quantum mechanics, one can also take the two linearly independent solutions as e*Zwt. Now the solutions of Equations (12.32)-(12.35) can be conveniently combined to write
where
( 12.37) (12.38) These are called the plane wave solutions of the wave equation, and Q ( 7 , t ) corresponds to the superposition of two plane waves moving in opposite directions.
546
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
12.1.2 Laplace Equation In Equation (12.7) if we set K=O
(12.39)
and assume no time dependence, we obtain the Laplace equation
V’”Q(?”)= 0.
(12.40)
Since there is no time dependence, in Equation (12.11) we also set k = 0 and
T ( t )= 1, thus obtaining the equations to be solved for a separable solution as d2X(x) dx2 d2Y(y) dY2
d2Z(z) (kp dz2
--
+ kPX(X) = 0,
(12.41)
+ ICiY(y) = 0,
(12.42)
+ r$)Z(z) = 0,
(12.43)
Depending on the boundary conditions, solutions of these equations are given in terms of trigonometric or hyperbolic functions. Example 12.1. Laplace equation inside a rectangular region: If a problem has translational symmetry along one of the Cartesian axes, say the z-axis, then the solution is independent of z . Hence we solve the Laplace equat,ion in two dimensions: (12.44) solution of which consists of a family of curves in the xy-plane. Solutions in three dimensions are obtained by extending these curves along the z direction to form a family of surfaces. Consider a rectangular region (Fig. 12.1) defined by x E [O,aI, Y E [O,bI,
( 12.45)
and the boundary conditions given as Q(x, 0) = f(x), Q(x, b) = 0, Q(0,Y) = 0, Q ( a , y ) = 0.
In the general equation [Eq. (12.7)] we set
(12.46) (12.47) (12.48) (12.49)
547
SEPARATION OF VARIABLES IN CARTESIAN COORDINATES
I' I
I
Figure 12.1
Laplace equation inside a rectangular region.
a = b = 6 = 0.
(12.50)
No time dependence gives k = 0 and T ( t ) = 1. Since there is no z dependence, Equation (12.27) also gives kz kp = 0; hence we define
+
X2 = k2 = -k2
Y'
(12.51)
which gives the equations to be solved for a separable solution as
( 12.52) (12.53) Solutions of these equations can be written immediately as
X ( z ) = a0 sin X z + a1 cos Ax, Y (y) = bo sinh Xy + bl cosh Xy. Imposing the third boundary condition [Eq. (12.48)], we set which yields
X ( z ) = a0 sin Xz.
(12.54) (12.55) a1
= 0,
(12.56)
Using the last condition [Eq. (12.49)], we find the allowed values of X as nrr X n -- - , n = 1 , 2 ,.") ( 12.57)
a
548
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
which gives the solutions
nrx
X,(x) = a0 sin -, Y,(y)
n = 1 , 2 , .. . , nr nr
(12.58)
a
+ bl cosh -ya
= bo sinh -y
a
( 12.59)
Hence, the solution of the Laplace equation becomes
+ bl cosh -ya
1.
(12.60)
Without any loss of generality, we can also write this as
nr nr Qn(x,y) = A [sin T-51[ Bsinh -9 a
+ cosh -ya
(12.61)
We now impose the second condition [Eq. (12.47)] to write (12.62) and obtain B as cosh 7b sinh 7b '
B=-
(12.63)
Substituting this back into Equation (12.61), we write
[
XPn(x,y)= A sin-x
1[
y
sinh ( b - y) sinhFb
(12.64)
So far we have satisfied all the boundary conditions except the first one, that is, Equation (12.46). However, the solution set
{ x,(x)=
a0
nrx
sin -, a
n = 1,2,...
1,
( 12.65)
like the special functions we have seen in Chapters 10 and 11, forms an orthogonal set satisfying the orthogonality relation
la
[sin
7 1 y]dx [sin
=
(s)
.&t
(12.66)
Using these base solutions, we can express a general solution as the infinite series. 00
00
n.= 1
n=l
a
sinh
yb ( 12.67)
SEPARATION OF VARIABLES IN CARTESIAN COORDINATES
549
't
Figure 12.2
A different choice for the boundary conditions.
Note that the set
{ @ n ( x , y ) }> n = 1 , 2 , . . . ,
(12.68)
is also orthogonal. At this point, we suffice by saying that this series converges to S ( x ,y ) for any continuous and sufficiently smooth function. Since the above series is basically a Fourier series, we will be more specific about what is meant from sufficiently smooth when we introduce the (trigonometric) Fourier series in Chapter 13. We now impose the final boundary condition [Eq. (12.46)] to write (12.69)
To find the expansion coefficients we multiply the above equation by sin and integrate over [0, a] and then use the orthogonality relation [Eq. (12.66)l to obtain
yx
c, =
(:)
La
f(x) sin n r x dx. a
(12.70)
Since each term in the series (12.67) satisfies the homogeneous boundary conditions [Eqs. (12.47)-(12.49)], so does the sum. Now, let us consider a different set of boundary conditions (Fig. 12.2): (12.71) (12.72) (12.73) (12.74)
550
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
In this case the solution can be found by following similar steps:
I
c c,, ?I [ oc
~ ( xy) , =
sinh ? ( a - z) sinhyu '
[sin
n=l
where
C,
=
(12.75)
(f) lb
F ( y ) sin n T Y dy. b
(12.76)
Note that in this case the boundary conditions forces us to take X2 = -k:. = k i in Equation (12.51). Solution for the more general boundary conditions (Fig. 12.3) (12.77) (12.78) (12.79) (12.80) can now be written as the linear combination of the solutioris given in Equation (12.67) and (12.75) as
1c, sin a":sinh 00
~ ( zy ), =
n=l
[
b
1
+ F c n s i n y sinh y ( u - z) sinh y u n=l
'
where the coefficients are found as in Equations (12.70) and (12.76). Similarly, when all the boundary conditions are not homogeneous, the general solution is written as a superposition of all four cases.
12.1.3 Diffusion and Heat Flow Equations For the heat flow and diffusion problems, we need to solve the equation 1 aQ(T+,t)
TPQ(T+,t)- 2 Q
at
= 0,
(12.81)
which can be obtained from the general equation [Eq. (12.7)] by setting K=O,
a=0, andb=-
1
Q2.
(12.82)
551
SEPARATION OF VARIABLES IN CARTESIAN COORDINATES
'T
Figure 12.3
For more general boundary conditions.
Now the equations to be solved for a separable solution becomes 1 dT -+ k 2 T ( t )= 0,
a2 d t d2X(x) k 2 X ( Z ) = 0, dx2 d2Y(y) dY2
(12.83)
+
(12.84)
+ k i Y ( y ) = 0,
(12.85) (12.86)
dz2 In the last equation we have substituted
k 2 - k2 - k2 x y
= k2
2'
(12.87)
Solution of the first equation gives the time dependence as
T ( t )= T g e - k 2 a Z t ,
(12.88)
while the remaining equations have the solutions
+ + +
X ( x ) = a0 cos k x x a1 sin k,x, Y ( y ) = bo cos k,y bl sin k,y, Z ( z ) = co c o s k , ~ c1 sin k,z.
(12.89) (12.90) (12.91)
E x a m p l e 12.2. Heat transfer equation in a rectangular region: consider the one-dimensional problem
First
552
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
with the boundary conditions
(12.93) (12.94) (12.95) Using the time dependence given in Equation (12.88) and the Fourier series method of Example 12.1, we can write the general solution as
c 00
~ ( xt ), =
a, (sin E )e - ( n 2 r 2 a 2 / a 2 ) + U
n=l
(12.96)
where the boundary conditions determined k as k = nr/u,n = 1,2, . . . and the expansion coefficients are given as
a, =
(:)
L a f ( x ) s i n -n r x dx.
,
(12.97)
U
For the heat transfer equation in two dimensions,
d2Q(X,Y,t) dx2
+
d2Q(2,Y,t)- 1 dQ(X,Y,t) dY2 a 2 at '
(12.98)
the solution over the rectangular region
satisfying the boundary conditions
(12.100) (12.101) (12.102) (12.103) (12.104) can be written as the series
(12.105) The expansion coefficients are now obtained from the integral
):(
Am, =
I" 1"
1-
f ( x , y) [sin mrx [sin U
y]
dxdy.
(12.106)
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
12.2
553
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
In spherical coordinates, Equation (12.7) is written as
(12.107) where the ranges of the independent variables are (12.108) We first substitute a solution of the form
Q(?,t) = F(?)T(t)
(12.109)
and write Equation (12.107) as
(12.110) Multiplying the above equation by (12.111) and collecting the position dependence on the left-hand side and the time dependence on the right-hand side, we obtain
Since 7and t are independent variables, the only way to satisfy this equation for all ? and t is to set both sides equal t o the same constant, say - k 2 . Hence, we obtain the following two equations:
(12.113)
554
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
and (12.114)
where the equation for T ( t )is an ordinary differential equation. We continue separating variables by using the substitution
F ( 7 ) = R(T)Y(O,41,
(12.115)
Equation (12.113) as
Multiplying both sides by (12.117) we obtain
Since r and ( 0 , 4 ) are independent variables, this equation can only be satisfied for all r and (0,d) when both sides of the equation are equal to the same constant. We call t,his constant X and write
& (r 2:r- R ( r ) ) +
[ ( K + k 2 ) r 2 - A] R ( r ) = 0
( 12.119)
and
Equation (12.119) for R ( r )is now an ordinary differential equation. We finally separate the 0 and 4 variables in Y (B,4) as
y (0,4) = -0 (0)
(4)
(12.121)
and write sin0 d0
+ XO (0) ip (4)=
0 (0) d 2 @(4)' sin2e
---
(12.122)
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
555
Multiplying both sides by
(12.123) and calling the new separation constant m2,we write
We now obtain the differential equations to be solved for 0 (0) and @ ( 4 ) as
+ [Asin'
0 - m2]o (0) = o
(12.125)
and
(12.126)
In summary, via the method of separation of variables, in spherical coordinates we have reduced the partial differential equation
VZQ(T+,t) + K!P(T+,t)
=a
a2Q(T+,t) at2
+
,aQ(?,t) at
'
(12.127)
to four ordinary differential equations:
(12.128) (12.129)
+ [Xsin2e
-
m2]@Am (8)= 0 ,
( 12.130) (12.131)
which have to be solved simultaneously with the appropriate boundary conditions to yield the final solution as
During this process three separation constants, k , X and m, indicated as subscripts, have entered into our equations. For the time-independent cases we set a = b = k = 0 and
T ( t ) = 1.
(12.133)
556
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
For problems with azimuthal symmetry, where there is no take
4 dependence, we
m = 0, Q(4)= 1. For the
4 dependent solutions, we impose the periodic boundary condition
+ an)= a m ($1
@7rl(4
(12.134)
>
to write the general solution of Equation (12.131) as
( 4 )= no cos m4 + a1 sin m4, m = O,1,2, . . . .
(12.135)
Note that with applications to quantum mechanics in mind, the general solution can also be written as
( 4 ) = a0 ezm@+ al ePrn@.
(12.136)
(cosm4, s i n m 4 } , m = O , l , 2 , . . .
(12.137)
@71L
Since the set
is complete and orthogonal, an arbitrary solution satisfying the periodic boundary conditions can be expanded as w2
A, cos m4 + ,3€
Q (4) =
sin m4.
(12.138)
nt=O
This is basically the trigonometric Fourier series. We postpone a formal treatment of Fourier series to Chapter 13 and continue with Equation (12.130). Defining a new independent variable, namely 2
= cose, z E [-1,1],
we write Equation (12.130) as
For i n = 0, this reduces to the Legendre equation. If we impose the boundary condition that O ~ o ( z be ) finite over the entire interval including the end points, the separation constant X has to be restricted to integer values: X = l ( l + l ) , l = O , l , 2, . . . .
( 12.140)
Thus, the finite solutions of Equation (12.139) become the Legendre polynomials (Chapter 10): @lO(Z) =
9(z).
(12.141)
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
557
Since the Legendre polynomials form a complete and an orthogonal set, a general solution can be expressed in terms of the Legendre series as M
(12.142) 1=0
For the cases with m # 0, Equation (12.139) is called the associated Legendre equation, polynomial solutions of which are given as (12.143) For a solution with general angular dependence, Y ( 8 ,q5), we expand in terms of the combined complete and orthogonal set
ulnL= P r ( c o s 0) [A[,,cos m#
+ B I , sin m4 , I = 0,1, . . . , m = 0,1, . . . , I
A particular complete and orthogonal set constructed by using the 0 and q~! solutions is called the spherical harmonics. A detailed treatment of the associated Legendre polynomials and spherical harmonics is given in Bayin, hence we suffice by saying that the set {Plm(z),1 = 0, I , . . . , } is also complete and orthogonal. A general solution of Equation (12.139) can now be written as the series
c 00
O(8)=
C,Plrn(C0S8).
(12.144)
1=0
So far, nothing has been said about the parameters a, b, and K ; hence the solutions found for the q5 and 8 dependences, (4) and O(z), are usable for a large class of cases. To proceed with the remaining equations that determine the t and the T dependences [Eqs. (12.128) and (12.129)], we have to specify the values of these parameters, a , b, and K , where there are a number of cases that are predominantly encountered in applications. 12.2.1 Laplace Equation To obtain the Laplace equation,
a'Q(?")= 0,
(12.145)
we set K = a = b = 0 in Equation (12.7). Since there is no time dependence I; is also zero, hence the radial equation [Eq. (12.129)] along with Equation (12.140) becomes -
Z ( 1 + 1 ) R ( r )= 0
(12.146)
558
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
or
d2R r2dr2
+ 2r-ddRr - 1(1+
This is nothing but the Cauchy-Euler solution of which can be written as
R[ ( T ) = Carl
l ) R ( r ) = 0.
(12.147)
equation (Chapter 9), the general
+ c1-. 1
(12.148)
We can now write the general solution of the Laplace equation in spherical coordinates as o
o
l
r
.
l
where Almr Blm, a[,, and bl, are the expansion coefficients to be determined from the boundary conditions. In problems with azimuthal or axial symmetry, the solution does not depend on the variable 4 , hence we set m = 0, thus obtaining the series solution as (12.150)
12.2.2 Boundary Conditions for a Spherical Boundary Boundary conditions for Q ( T , ~ ) on a spherical boundary with radius a is usually given as one of the following three types: I. The Dirichlet boundary condition is defined by specifying the value of Q ( r , 0 )on the boundary, r = a , as Q ( a , 0) = f
(6
(12.151)
11. When the derivative is specified,
( 12.152) we have the Neumann boundary condition. 111. When the boundary condition is given as
( 12.153) where do could be a function of 0, it is called the general boundary condition.
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
For finite solutions inside a sphere, we set take the solution as
559
BI= 0 in Equation (12.150) and
co
@(T,O) = C A l r ' P l ( c o s 8 ) .
( 12.154)
1=0
For the Dirichlet condition
the remaining coefficients, Al, can be evaluated by using the orthogonality relation of the Legendre polynomials as Al =
2 i) (l+
l T f ( 0 ) P l ( c o s ( I ) s i n Ddo.
( 12.156)
Outside the spherical boundary and for finite solutions at infinity we set Al = 0 and take the solution as
( 12.157) Now the expansion coefficients are found from (12.158)
For a domain bounded by two concentric circles with radii a and b, both A1 and BI in Equation (12.150) are nonzero. For Dirichlet conditions
and
we now write
and (12.162)
560
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
Using the orthogonality relation of the Legendre polynomials, we obtain two linear equations,
Alal
+ Bl-&
=
(1
+
i) 1
7r
f l ( Q ) P l ( c o s 8sin8 ) dQ
(12.163)
and (12.164) which can be solved for Al and Bl. Solutions satisfying the Neumann boundary condition [Eq. (12.152)] or the general boundary conditions [Eq. (12.153)] are obtained similarly. For more general cases involving both angular variables, 8 and 4, the general solution [Eq. (12.149)] is given in terms of the associated Legendre polynomials. This time the Dirichlet condition for a spherical boundary is given as
and the coefficients, Al, and Ell,, in Equation (12.149) are evaluated by using the orthogonality relation of the new basis functions: ulnL(Q, 4) = Py"(cos8)[a~ cosrn4
+ a2 sinrn41.
(12.166)
Example 12.3. Potential of a point charge inside a sphere: Consider a hollow conducting sphere of radius a held at zero potential. We place a charge q at point A along the z-axis at r' as shown in Figure 12.4. Due to the linearity of the Laplace equation, we can write the potential, a(?),at a point inside the conductor as the sum of the potential of the point charge and the potential due to the induced charge on the conductor, Q(?), as
( 12.167) where
a(?,)
Due to axial symmetry has no 4 dependence, hence we write it as n(r,O). Since O ( r , Q )must vanish on the surface of the sphere, we have a Dirichlet problem. The boundary condition we have to satisfy is now written as -
4
da2+ r f 2- 2ar' cos Q
= Q ( u , 8).
(12.169)
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
Figure 12.4
561
Point charge inside a grounded conducting sphere.
Using the generating function definition of the Legendre polynomials:
we can write the left-hand side of Equation (12.169) as
a
(12.171) 1=0
r' Since - < 1, the above series is uniformly convergent. Using the LegU
endre expansion of Q(r,Q): 00
Q(T,
Q) =
C
Alr'fi(COSO),
(12.172)
1=0
we can also write the Dirichlet condition, !€'(a,O ) , as 00
(12.173) 1=0
562
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
Comparing the two expressions for Q ( a ,8) [Eqs. (12.171) and (12.173)] we obtain the expansion coefficients:
(5) 1
A1 which allows us to write
Q(T,
=
-2
a
,
(12.174)
19) [Eq. (12.172)] as
(12.175) Using the generating function definition of Pl [Eq. (12.170)],we rewrite Q(r?B)as
1 4 Q ( r , 0 )= -a J 1 - 2 ( 3 c o s B + ( ~ )2' Now the potential at
( 12.176)
7becomes
We rearrange this as
a
Q(7) =
(12.178)
4
If we call
p =1 7- 7 ' 1
( 12.179)
and introduce q', r" and p' such that g' = -9-
a r' '
a" r' '
+ T"2
= - pl = JT2
-
2rlrr.cos 0,
( 12.180)
we can also write Q(7) as
a(?) = -9 + -.4/ P
P'
(12.181)
Note that this is the result that one would get by using the image method, where an image charge, q', is located along the z axis at A' at, a distance of T" from the origin (Fig. 12.4).
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
12.2.3
563
Helmholtz Equation
In our general equation [Eq. (l2.7)] we set
a
=
b = 0,
K
=
( 12.182)
k o2 ,
to obtain the Helmholtz equation:
a'%(?")+ lC;*.(?"))
= 0.
(12.183)
Since there is no time dependence, in the separated equations [Eqs. (12.128)(12.131)] we also set k = 0 and T ( t )= 1. The radial part of the Helmholtz equation [Eq. (12.129)] becomes
+ [kgr2 or d2R 2 d R -+--+ dr2 r dr
-
[& - - -
1(1+ l)]R ( r )= 0
1(1+ l ) ]
r2
R(r)= 0.
( 12.184)
( 12.185)
The general solution of Equation (12.185) is given in terms of the spherical Bessel functions as
Rz(r) = coji(kor) + cini(kor).
(12.186)
Now, the general solution of the Helmholtz equation in spherical coordinates can be written as the series
cc a ? ,
Q ( Tt ),=
+
[AlmjL(kor) Bzmnz(kor)l P ; " ( C O S ~ ) cos(,4
+ fjzm),
I=O m = O
(12.187) where the coefficients Al,,, Bl,, 61, are to be determined from the boundary conditions. Including the important problem of diffraction of electromagnetic waves from the earth's surface, many of the problems of mathematical physics can be expressed as such superpositions. In problems involving steady state oscillations with time dependence described by ezwt or e-z"wt,nl(k0r) in Equation (12.187) is replaced by h ~ " ( k 0 r )or h,(1)( k o r ) ,respectively. 12.2.4
Wave Equation
In Equation (12.7) we set
( 12.188)
564
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
and obtain the wave equation
a'"(7,t)
1
=-Q(7,t),
(12.189)
V2
where v stands for the wave velocity. The time-dependent part of the solution,
T ( t ) ,is now determined by the equation [Eq. (12.128)] d2T -k2v2T(t), (12.190) dt2 the solution of which can be written in the following equivalent ways: -=
T ( t )= a0 cos wt + a1 sin wt = A0 cos(wt + A l ) , w = k ~ ,
(12.191) (12.192)
where ao, a1, AO, A1 are integration constants. The radial equation [Eq. (12.129)] is now written as
-+--+ d2R 2 d R dr2 T dr
[
k -~ l ( l T ; l ) ] R ( r ) = 0,
(12.193)
where the solution is given in terms of the spherical Bessel functions as
Rl(r) = coj1(kr)
+ c1n1(lcr).
(12.194)
We can now write the general solution of the wave equation [Eq. (12.189)] as 0
0
1
[Al,jl(kr)
Q ( 7 , t )=
+ Bl,nl(kr)]
Plm(cos8)cos(m4
+ Sl,)
cos(wt
+A),
1=0 m=O
( 12.195) where the coefficients Al,, ary conditions. 12.2.5
Bl,,, Sl,, Al are to be determined from the bound-
Diffusion and Heat Flow Equations
In Equation (12.7) if we set ~ = 0 b, f O , a = 0 ,
(12.196)
t) V2Q(7,t ) = b as(?, dt '
( 12.197)
we obtain
which is the governing equation for diffusion or heat flow phenomenon. Since k 2 # 0, using Equation (12.128) we write the differential equation to be solved for T ( t )as
bdi'o + k2T(t) = 0, dt
( 12.198)
565
SEPARATION OF VARIABLES IN SPHERICAL COORDINATES
which gives the time dependence as
T ( t )= C e - k 2 t / b ,
(12.199)
where C is an integration constant to be determined from the initial conditions. Radial dependence is determined by Equation (12.129), -d + 2 R- - +2 d R dr2 r dr
[
k 2 - - i ( i + l ) ] R ( r ) = 0,
r2
(12.200)
solutions of which are given in terms of the spherical Bessel functions as
Rl(r)
= Aoj l(k r)
+ Bonl(kr).
(12.201)
Now the general solution of the diffusion equation can be written as
1=0
m=O
(12.202) where the coefficients Al,, conditions.
12.2.6
Bl,,
61, are to be determined from the boundary
Time-Independent Schrodinger Equation
For a particle of mass m moving under the influence of a central potential, V ( r ), the time-independent Schrodinger equation is written as (12.203) where E stands for the energy of the system. To compare with Equation (12.7) we rewrite this as
2mE
2mV(r)
( 12.204)
In the general Equation [Eq. (12.7)] we now set
(12.205)
a=b=O. Using Equation (12.128) we also set k a function of r :
2m
K(T)
=0
and T ( t )= 1. Note that
= - [E - V ( r ) ;] fi2
K
is now
(12.206)
566
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
hence the radial equation [Eq. (12.129)] becomes
f( r z dR) + ($[ E 2
-
V(r)] - 1(1
+ 1)
(12.207)
For the Coulomb potential, solutions of this equation are given in terms of the associated Laguerre polynomials, which are closely related to the Laguerre polynomials (Bayin). 12.2.7
Time-Dependent Schrodinger Equation
For the central force problems the time-dependent Schrodinger equation is
+ V(r)Q(?;f,t) = ih d Qd( t7 , t )’
ti2 2m
--+$(7,t)
(12.208)
which can be rewritten as
?Q(?;f,t)
2mV(r)
-
___ Q ( 7 , t )= -2ti2
, 2 m as(F ,t ) h dt .
(12.209)
We now have K =
2mV(r)
-___ , a = Q a n d b = - - . ti2
2mi fi
(12.2 10)
The time-dependent part of the solution satisfies
2mi dT h dt
+ k 2 T = 0,
(12.211)
(12.212) We relate the separation constant k 2 with the energy, E , as
2mE
k =-.
h , ’
( 12.213)
hence T ( t )is written as
T ( t )= TOe-iEt/h.
(12.214)
The radial part of the Schrodinger equation [Eq. (12.129)] is now given as
f (r
2
z)+ ($ dR
[E - V(T)] - 1(1+ 1)
(12.215)
where 1 = 0, 1,.. . . Solutions of Equation (12.215) are given in terms of the associated Laguerre polynomials. Angular part of the solution, Q(?, t ) , comes from Equations (12.130) and (12.131), which can be expressed in terms of the spherical harmonics xm(8, 4).
SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
567
12.3 SEPARATION OF VARIABLES IN CYLINDRICAL CO 0R DINATES
In cylindrical coordinates we can write the general equation,
d 2 Q ( 7 ’ t ) b a Q ( 7 t, ) PQ(7, t )+ K Q ( 7 , t ) = a at2 at ’ +
( 12.216)
as
(12.217) Separating the time variable as
Q ( 7 ’ t )= F ( 7 ) T ( t ) ,
(12.218)
we write
(12.219) Dividing by F ( f ) T ( t )gives us the separated equation
-
a-
+ b-1 dt
’
Setting both sides equal to the same constant,
-x2, we obtain (12.221)
and
We now separate the z variable by the substitution
568
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
to write
(12,224) Dividing by G(r,qb)Z(z) and setting both sides equal to the same constant, -A2, we get
d22(z) dz2
+
(K
- X"Z(z)
=0
(12.225)
and
(12,227) to write
Dividing by R ( r ) @ ( 4 ) / rand 2 setting both sides to the constant p2 gives us thc last two equations as
--% 1d
(r-$-) dR(r)
+ ( x 2 + x2
-
(12.229)
and (12.230)
In summary, we have reduced the partial differential equation [Eq. (12.216)] in cylindrical coordinates to the following ordinary differential equations: (12.231)
Ir d dr
(rF) d R ( r )
+ ( x 2 + X2
-
g)
R ( r )= 0 ,
(12.232) (12.233)
d2Z(z) dz2
_ _ _ -( A 2 - K ) Z ( Z ) = 0.
(12.234)
SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
569
Combining the solutions of these equations with the appropriate boundary conditions, we write the general solution of Equation (12.216) as
Q(?,t) = T ( t ) R ( r ) @ ( 4 ) Z ( z ) .
(12.235)
When there is no time dependence, we set in Equations (12.231)-(12.234)
(L
= 0,
b = 0,
For azimuthal symmetry there is no
x = 0 , T ( t )= 1.
(12.236)
4 dependence, hence we set
@(4)= 1.
(12.237)
a=0, b=0,
(12.238)
p = 0,
12.3.1 Laplace Equation When we set K=O,
Equation (12.216) becomes the Laplacc equation:
PQ(?) = 0.
(12.239)
For a time-independent separable solution, namely
we also set x = 0 and T solved become
=
1 in Equation (12.231); hence the equations to be
(12.24 1) (12.242)
d2Z(z)
~-
dz2
X”(z)
=
0.
(12.243)
Solutions can be written, respectively, as
+ + +
R ( r ) = aoJp(Xr) alNp(Xr), @ ( 4 )= bo cos p$ bl sin p $ , Z ( Z )= co cosh Xz c1 sinh Xz.
(12.244) (12.245) (12.246)
570
12.3.2
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
Helmholtz Equation
In Equation (12.216), when we set
~ = 2k a ,= 0 , b = 0 ,
(12.247)
we obtain the Helmholtz equation,
a‘”(7) + I c 2 8 ( ? )
(12.248)
= 0,
which when a separable solutions of the form Q(7) = R(r)@(q6)Z(z) is substituted, leads t o the following differential equations: (12.249) (12.250) (12.251) In terms of the separation constants the solution is now written as
x
Note that in Equation (12.231) we have set = 0 and T = 1 for no time dependence. We can now write the solution of the radial equation as
Solution of Equation (12.250) gives the q6 dependence as
@(@)= bo cos pq5
+ bl sin &.
(12.254)
Finally, for the solution of the z equation [Eq. (12.25l)l we define 2 - k2 =
k20
sinh koz
} { x - kki > o } .
( 12.255)
to write the choices
Z ( z ) = co 12.3.3
{
cOskOz
cash koz
}+ { c1
for
-
(12.256)
Wave Equation
For the choices K.
= 0,
1 b = 0, and a = -, V2
( 12.257)
SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
571
Equation (12.216) becomes the wave equation,
( 12.258) where v stands for the wave velocity. For a separable solution, namely
equations to be solved become [Eqs. (12.231)-(12.234)] 1 d2T(t) -X2T(t)= 0,
v2 dt2
+
(12.259) (12.260) (12.261) (12.262)
where x,A, and p are the separation constants. Solution of the time-dependent equation gives
T ( t )= a0 cos w t + a1 sinwt,
(12.263)
w = vx.
(12.264)
x2 + x 2 = m2,
(12.265)
where we have defined
Defining a new parameter, namely
the solution of the radial equation is immediately written in terms of Bessel functions as
The solution of Equation (12.261) is
@(4)= bo cos p4 + bl sin p$
(12.267)
and for the solution of the z equation we write
Z ( z ) = co cosh Xz + c1 sinh Xz.
(12.268)
572
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
12.3.4
Diffusion and Heat Flow Equations
For the diffusion and the heat flow Equations, in Equation (12.216) we set
n = 0 , a = 0 , andb#O
(12.269)
to obtain
a*(?, V2Q(?,t) =b
t)
(12.270)
at
we have to solve the following differential equations:
1d dR(r) --z (r7)
bdll‘o + x2T(t) = 0 , dt
+ (x2+ X2
-
,)R(r) P2 r
= 0,
(12.272) (12.273) (12.274)
d22(z) dz2
_ _ _ - X”(z)
= 0.
(12.275)
The time-dependent part can be solved immediately to yield
( 12.276)
T ( t )= Toe-X2t/b, while the remaining equations have the solutions
R ( r ) = aoJp(rnr)+ U l N , ( r n T ) , m2 = x2 @(4)= bo cos pq5 bl sin p 4 , Z ( z ) = co cosh Xz + c1 sinh Xz .
+
+ X2,
(12.277) (12.278) (12.279)
Example 12.4. Dirichlet problem f o r the Laplace Equation: Consider the following Dirichlet conditions for a cylindrical domain (Fig. 12.5):
( 12.280) (12.281) (12.282) for the Laplace equation we have
VQ(T, $ , z ) = 0.
(12.283)
SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
Figure 12.5
573
Laplace equation with Dirichlet conditions.
This could be a problem where we find the temperature distribution inside a cylinder with the temperature distributions at its top and bottom surfaces are given as shown in Figure 12.5, and the side surface is held at 0 temperature. Since the boundary conditions are independent of 4,we search for axially symmetric separable solutions of the form
= 1 for no 4 dependence, we use Equations (12.244) Setting /I = 0, and (12.246) to write R(r) and Z ( z ) as
+
R ( r ) = uoJo(Xr) u1No(Xr), Z ( z ) = co cosh Xz + c1 sinh Xz.
(12.284) (12.285)
Since N,(s)+ 00 when r + 0, for physically meaningful solutions that are finite along the z-axis, we set a1 = 0 in Equation (12.284). Using the first boundary condition [Eq. (12.280)], we write &(Xu) = 0
(12.286)
and obtain the admissable values of X as (12.287)
574
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
where XO, are the zeros of the Bessel function Jo(x).Now a general solution can be written in terms of the complete and orthogonal set, {Q,(T,z)
=
I).?(
[co,cosh ( % z )
Jo
+cl,sinh
(Tr)} , (12.288)
n = 1 , 2 , . . . as
c 00
Q ( r , z )=
[n,cosh
n=l
(12.289) Using the remaining boundary conditions [Eqs. (12.281) and (12.282)], we also write 00
,
A,Jo ( %ar )
f o ( r )= Q(r,O) =
(12.290)
n=l
c 00
fl(r) =
Q ( r , l )=
+ B,sinh
[A, cosh ( % l )
n.=l
(12.291) Using the orthogonality relation,
La’
(Fr) (Tr)
~ J o
JO
dr
=
a’ 2 5 [JI (xo,)] , ,,S
(12.292)
we can evaluate the expansion coefficients, A , and B,, as 2J:rfo(r)J0
( F r )dr
A, =
(12.293) a2 [Ji(xon)I’
and
(-)
2 [ L a r f ~ ( r ) J 0X O n r dr - cosh
Bn
=
a a2 [ J1(Q,)]
(%) (%)
(-)
Larf0(r)Jo
dr]
a
sinh
(12.294)
Example 12.5. Another boundary condition for the Laplace equation: We now solve the Laplace equation with the following boundary conditions: (12.295) (12.296)
( 12.297)
SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
575
= 1 in Equation Because of axial symmetry, we again take p = 0, (12.242). To satisfy the second condition, we set co = 0 in Equation (12.246):
Z ( z ) = co cosh Xz
+ c1 sinh Xz,
(12.298)
to write Z ( z ) as Z ( z ) = c1 sinhXz.
(12.299)
If we write sinh as sin with an imaginary argument, that is, sinh Xz = -i sin i X z ,
(12.300)
and use the third condition [Eq. (12.297)], the allowed values of X are found as n --
nm -, n = 1 , 2) . . . .
(12.301)
1
Now the solutions can be written in terms of the modified Bessel functions as
(12.302)
Z ( z ) = c1 sin
(3
Since
K~ we also set
a1
(7.)
---f
oo as r
0,
4
= 0, thus obtaining the complete and orthogonal set
{ Q n ( r , z )= a010 ( Y r ) sin ( Y z ) } , n = 1 , 2 , . . . .
(12.303)
We can now write a general solution as the series 03
z) =
C A , I ~(5) 1 sin I (“2)
.
(12.304)
n=l
Using the orthogonality relation (12.305)
576
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
we find the expansion coefficients as
A,,=
(f)
~ g ( ~ a ) ] - ' ~ F ( r ) s i n ( ~ z ) d z (12.306) .
Example 12.6. Periodic boundary conditions: Consider the L a p h e equation with the following periodic boundary conditions:
q(?-, 0, z ) = *(?-,27r, z ) ,
(12.307) (12.308)
and Q ( u ,4 , z ) = 0.
(12.309)
Using the first two conditions [Eqs. (12.307) and (12.308)] with Equation (12.245):
Q(4)= bo cos pq5 + bl sin p4, which we write as
where 60 and S,,, are constants, we obtain the allowed values of m as p = m = O , 1 , 2 ,... . For finite solutions along the z-axis, we set [Eq. (12.244)] to write
a1 =
(12.311)
0 in the radial solution
( 12.312)
R ( r ) = aoJ,(Ar). Imposing the final boundary condition [Eq. (12.309)]:
( 12.313)
& ( x u ) = 0,
we obtain the admissable values of X as 57nn
An=---,
a
n = 1 , 2 ,...,
(12.314)
where z,,~ are the roots of J,(z). Finally, for the z-dependent solution we use Equation (12.246) and take the basis functions as the set
{ 'Psn,, = J,,
(y )
+
[cos(m$ S,)]
[cg
Zmnz
+
xmrLzl
cosh - c1 sinh ' a a (12.315)
SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
where m = 0 , 1 , . . . and n soliition as the series
=
J,,, Ill
I1
=
1 , 2 , .. .
(y) cos(mq5 + 6 ) ,
577
. We can now write a general
[Amncosh -+ Bmnsinh a a Xmnz
(12.316)
Example 12.7. Cooling of a long circular cylinder: Consider a long circular cylinder with radius a , initially heated to a uniform temperature T I ,while its surface is maintained at a constant temperature To. Assume the length to be so large that the z-dependence of the temperature can he ignored. Since we basically have a two-dimensional problem, using the cylindrical coordinates, we write the heat transfer equation as
d Q ( r ,t ) - d 2 9 ( r ,t ) b-at dr
+ -r1-d 9dr( r ,t )'
( 12.317)
where b is a constant depending on the physical parameters of the system. We take the boundary condition a t the surface as
Q ( a , t ) = TO, 0 < t < 00,
(12.318)
while the initial condition at t = 0 is Q(T,O) =TI, 0
5 r < a.
( 12.319)
We can work with the homogeneous boundary condition by defining a new dependent variable as
R(r, t ) = Q ( T , t ) - TO,
(12.320)
where O ( r ,t ) satisfies the differential equation
b-
dR(r,t ) - d2R(r, t ) at dr
t) + -r1-dR(r, ar
(12.321)
with the boundary conditions
f2(a,t)= 0, 0 < t < 00
(12.322)
and
R(r,O)
= TI - To,
0 5 r < a.
(12.323)
We need a finite solution for 0 5 r < a and one that satisfies R(r, t ) as t 00. Substituting a separable solution of the form
----f
0
--f
O ( r ,t ) = R(r)T(t),
(12.324)
578
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
we obtain the differential equations to be solved for R(r) and T ( t )as
dT dt
(12.325)
dR [7] + X2R = 0,
(12.326)
b-++2T=O
and
1d
--$
where x is the separation constant. Note that Equations (12.325) and (12.326) can be obtained by first choosing a = 0 arid K = 0 in Equation (2.216) and then from Equations (12.231)-(12.234) with the choices p = 0, Q, = 1 and X = 0, Z ( z ) = 1. Solution of the time-dependent equation [Eq. (1.325)] can be written immediately as
T ( t )= Ce-X2t/b,
(12.327)
while the solution of the radial equation [Eq. (12.266)] is
R ( r ) = aoJo(xr)+ U l N O ( X T ) . Since No(xr) diverges as
T
o(T,
---f
0, we set
a1
( 12.328)
to 0, thus obtaining
t ) = uoJo(Xr)e-X”’b.
(12.329)
To satisfy the condition in Equation (12.322), we write
J o ( x a ) = 0, which gives the allowed values of Xn
x
(12.330)
as the zeros of J o ( z ) :
Xon
= -, n = 1 , 2 ) . . . U
.
(12.331)
Now the solution becomes
(”””.)
~ ~ t() =r A, ~ J ~ e-zgntlab.
(12.332)
U
Since these solutions form a complete and orthogonal set, we can write a general solution as the series
Since On(r,t ) satisfies all the conditions except Equation (12.323), their linear combination will also satisfy the same conditions. To satisfy the remaining condition [Eq. (12.323)], we write (12.334)
SEPARATION OF VARIABLES IN CYLINDRICAL COORDINATES
579
We now use the orthogonality relation:
along with the recursion relation Z"J,-l(Z)
d dx
= -[Z"Jm,(Z)]
(12.336)
and the special value J l ( 0 ) = 0, to write the expansion coefficients as (12.337) Ex a m p l e 12.8. Symmetric vibrations of a circular drumhead: Consider a circular membrane fixed at its rim and oscillating freely. For oscillations symmetric about the origin we have only r dependence. Hence we write the wave equation [Eq. (12.258)] in cylindrical coordinates as (12.338) where Q(r,t ) represents the vertical displacement of the membrane from its equilibrium position. For a separable solution, q ( r ,t ) = R ( r ) T ( t ) , the corresponding equations to be solved are (12.339) (12.340) where x is the separation constant. These equations can again be obtained from our basic equation [Eq. (12.216)] with the substitution ti = 0, b = 0, a = l / v 2 and X = 0, 2 = 1, p = 0, = 1 in Equations (12.259)-(12.262). The time-dependent equation can be solved immediately as
T ( t )= 60cos(wt + SI),
( 12.341)
where we have defined w 2 = x2v2. The general solution of Equation (12.340) is (12.342) We again set
a1 =
0 for regular solutions at the origin, which leads to
Q(r,t ) = AoJo ( t r ) cos(wt + 61). U
(12.343)
580
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
Since the membrane is fixed a t its rim, we write W
* ( a , t ) = JO (,r)
= 0.
( 12.344)
This gives the allowed frequencies as
w,
u n = 1,2,.. . , a
= Qn-,
(12.345)
where zon are the zeros of Jo(z). We now have the complete set of functions V
Q n ( r , t )= JO ( % r ) cos (zon-t a
+ 6,)
, n = 1 , 2 , .. . ,
(12.346)
which can be used to write a general solution as
C An& (a3. cos) (xon-t a" + 6,1 . 03
@(T, t ) =
(12.347)
n=l
Expansion coefficients A, and the phases 6, come from the initial conditions,
(12.349) as
6, = tan-' -,Yon
Xon
(12.350) (12.351)
(12.352) (12.353)
PROBLEMS
1. Solve the two dimensional Laplace equation in Cartesian coordinates,
PROBLEMS
Figure 12.6
581
Boundary conditions for the Problem 12.2.
inside a rectangular region, boundary conditions:
LC
E [O,u]
and y E [O,b],with the following
2. Solve the Laplace equation in Cartesian coordinates,
inside a rectangular region, boundary conditions:
LC
E [0, u] and y E [O, b ] , with the following
where fo and f i are constants (Fig. 12.6).
582
PARTIAL DIFFERENTIAL EQUATIONS AND SEPARATION OF VARIABLES
3 . In Example 12.1 show that the Laplace equation with the boundary conditions Q ( x ,0) = 0, 9 ( z ,b) = 0, Q‘(0,Y) = Q l ( % Y ) = 0,
leads to the solution
c 03
~ ( zy), =
C, [sin
n=l
I
?I [
sinh ?(a - x) sinh ?a
’
where
4. Under what conditions will the Helmholtz equation:
+
PQ(?;t) rC”?;t)S(?;t) = 0,
be separable in Cartesian coordinates.
5. Toroidal coordinates (a,p, 4)are defined as X=
c sinh a cos 4 coshcr - c o s p ’
c sinh a sin q5 csinp z= = cosha - cosp’ cosha - c o s p ’
where cy E [0, oo),P E ( - T , 7 r ] , 4 E ( - T , 7r] and the scale factor c is positive definite. Toroidal coordinates are useful in solving problems with a torous as the bounding surface or domains bounded by two intersecting spheres (see Lebedev for a discussion of various coordinate systems available). (i) Show that the Laplace equation a‘”(W
P, 4) = 0
in toroidal coordinates is given as d
sinhcr
dQ
da [ c o s h a - c o s / 3 ~ ]
+
(cosh CY
d [ c o s sinh ha-cosPdfl +-dp 01
1 d29 = 0. - cos p ) sinh a &h2
(ii) Show that as it stands this equation is not separable.
PROBLEMS
583
(iii) However, show that with the substitution
the resulting equation is separable as O(Q, P , 4) = A ( a I B ( P M 4 )
and find the corresponding ordinary differential equations for A ( Q ) , W P ) and C(4).
6. Using your result in Problem 12.5, find separable solutions of the heat flow cquation in toroidal coordinates.
7. Consider a cylinder of length 1 and radius u whose ends are kept at temperature zero. Find the steady-state distribution of temperature inside the sphere when the rest of the surface is maintained at temperature To. 8. Find the electrostatic potential inside a closed cylindrical conductor of length 1 and radius a , with the bottom and the lateral surfaces held a t potential V and the top surface held at zero potential. The top surface is separated by a thin insulator from the rest of the cylinder. 9. Show that the stationary distribution of temperature in the upper halfspace, z > 0, satisfying the boundary condition
T ( z ,y, 0) = F ( T ) =
TO, T < a , 0, r > a,
is given as
e-’”Jo(Xr)JI(Xu)dX. Hint: Use the relation 1 r
-S(T
Can you derive it?
-T
)
‘
=
XJ,(Xr)J,(Xr’)dX.
This Page Intentionally Left Blank
CHAPTER 13
FOURIER SERIES
In 1807 Fourier announced in a seminal paper that a large class of functions can be written as linear combinations of sines and cosines. Today, infinite series representation of functions in terms of sinosoidal functions is called the Fourier series, which has become an indispensable tool in signal analysis. Spectroscopy is the branch of science that deals with the analysis of a given signal in terms of its components. Image processing and data compression are among other important areas of application for Fourier series.
13.1 ORTHOGONAL SYSTEMS OF FUNCTIONS After the introduction of Fourier series, it became clear that they are only a part of a much more general branch of mathematics called the theory of orthogonal functions. Legendre polynomials, Hermite polynomials, and Bessel functions are among the other commonly used orthogonal function sets. Certain features of this theory are incredibly similar to geometric vectors, where in n dimensions a given vector can be written as a linear combination of n linearly independent basis vectors. In the theory of orthogonal functions, we can express almost any arbitrary function as the linear combination of a Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
3. Selsuk Bayin 585
586
FOURIER SERIES
set of basis functions. Many of the tools used in the study of ordinary vectors have counterparts in the theory of orthogonal functions. Among the most important ones is the definition of inner product, which is the analog of scalar or dot product for ordinary vectors. Definition 13.1. I f f and g are two complex-valued functions, both (Riemann) integrable in the interval [a,b],their inner product is defined as the integral
(13.1) For real-valued functions the complex conjugate becomes redundant. From this definition, it follows that the inner product satisfies the properties
(13.2) (13.3) (13.4) (13.5) (13.6) where c is a complex number. The nonnegative number, (f,f)'I2, is called the norm o f f . It is usually denoted by l l f l l and it is the analog of the magnitude of a vector. The following inequalities follow directly from the properties of inner product: Cauchy-Schwarz inequality:
l(f,s)l 5 llfll 11911
'
(13.7)
Minkowski inequality:
llf + 911 I llfll + 11g11 ,
(13.8)
which is the analog of the triangle inequality. Definition 13.2. Let S = {uo,u1,. . . } be a set of integrable functions in the interval [a,b]. If
(u,,un) = 0 for all m # n,
(13.9)
then the set S is called orthogonal. Furthermore, when the norm of each element of S is normalized to unity, IIunI)= 1, we have (urn,un) = &nn
( 13.10)
and the set S is called orthonormal. We have seen that Legendre polynomials, Hermite polynomials and Bessel functions are orthogonal sets. As the reader can verify, the set einx
{un(x)=
I
, n = 0 , 1 , . . . , xE[0,27r]
(13.11)
ORTHOGONAL SYSTEMS OF FUNCTIONS
587
is one of the most important orthonormal sets in applications. Linear independence of function sets is defined similar to ordinary vectors. A set of functions,
S = { U O , ~ I , . . .,un},
( 13.12)
is called linearly independent in the interval [a,b] if the equation
couo
+
ClUl
+ '.
'
cnu,
= 0,
(13.13)
where co, c1, . . . ,c, are in general complex numbers, cannot be satisfied unless all ci are zero:
co = c l
=
. . . = c,
= 0.
(13.14)
An infinite set, S = {uo, u1,. . . }, is called linearly independent in [a,b], if every finite subset of S is linearly independent. It is clear that every orthogonal/orthonormal set is linearly independent. All the similarities with vectors suggest that an arbitrary function may be expressed as the linear combination of the elements of an orthonormal set. An expansion of this sort will naturally look like M ~~
f(x) =
C cnun(x), x E [a,b].
(13.15)
n=O
There are basically two important questions t o be addressed: first, how to find the expansion coefficients and, second, will the series C,"==, c,u, converge to f(x)? Finding the coefficients, at least formally, is possible. Using the orthonormality relation, J u & u n dx = S,, we can find c, as M
(13.16) m=O
(13.17) - c,.
(13.18)
In other words, the coefficients of expansion are found as
=la pb
c,
u:f dx.
( 13.19)
As far as the convergence of the series C,"=,c,u, is concerned, the following questions come to mind: Does it converge uniformly? Does it converge only a t certain points of the interval [a,b]? Does it converge pointwise, that is, for all the points of the interval [a,b]? Uniform convergence implies the integrability of f(x) and justifies the steps leading t o Equation (13.19). In other words,
588
FOURIER SERIES
when the series Czzocnu, converges uniformly to a function f ( z ) , then that function is integrable and the coefficients, en, are found as in Equation (13.19). Note that uniform convergence implies pointwise convergence but not vice versa. For an orthonormal set, S = ( U O , U ~ ,.. . } , defined over the interval [u,b], an integrable fiinction can be written as the series
which is called the generalized Fourier series of f ( z ) with respect to the set S. A general discussion of the convergence of these series is beyond the scope of this book, but for majority of the physically interesting problems they converge (Bayin). However, for the (trigonometric) Fourier series we address this question in detail. In the meantime, to obtain further justification of the series representation in Equation (13.20), consider the partial sum
(13.21)
and the finite sum
where
bk
are arbitrary complex numbers. We now write the expression
ORTHOGONAL SYSTEMS OF FUNCTIONS
589
Using the properties of inner product; along with the orthogonality relation [Eq. (13.10)], we can write
c n
bkUk(Z),
( 13.25)
bzuz(z)
Z=O
( 13.26) k=O 1=O
n
=C l b k l
2
,
(13.27)
k=O
(13.28) n
(13.29) k=O n
=
C
(13.30)
bkCE
k=O
and (tn,
f) = ( f ,i n ) * =
c n
(13.31)
b2k.
k=O
Using Equations (13.27), (13.30),and (13.31), Equation (13.24) can be written as n
n
n
k=O
k=O
k=O
k=O n
k=O n
k=O
k=O
(13.34) Since the right-hand side [Eq. (13.34)] is smallest when write
lbIf
b -
SnI2 dz 5
If
- G I 2 dz.
bk
= C k , we can also
(13.35)
Sigriificancc of these results beconies clear if we notice that each linear combination, t,,, can be thought of as an approximation of the function f ( z ) . The intcgral on the left-hand side of Equation (13.35) can now be interpreted as
590
FOURIER SERIES
the mean square error of this approximation. The inequality in Equation (13.35) states that among all possible approximations, tn = b k u k , of f(z), the nth partial sum of the generalized Fourier series represents the best approximation in terms of the mean square error. Using Equation (13.34), we can obtain two more useful expressions. If we set c k = b k and notice that the right-hand side is always positive, we can write
c;=,
(13.36) Ja
k=O
which is called the Bessel's inequality. With the substitution Equation (13.34) also implies
ck = bk,
( 13.37) (13.38) Ja
Since as n
+c m
we have
Ilf
- SnI(
----f
k=O
0 , we obtain (13.39)
k=O
which is known as the Parseval's formula. We conclude this section with the definition of completeness: Definition 13.3. Given an orthonormal set of integrable functions defined over the interval [a,b] :
s = {uo,u1,. . . } .
(13.40)
In the expansion (13.41) n=O
if the limit (13.42) is true for every integrable f, then the set S is called complete and we say the series C ~ = o c n u n ( converges x) in the mean to f(z). Convergence in the mean is not as strong as uniform or pointwise convergence, but for most practical purposes it is sufficient. From Bessel's inequality [Eq. (13.36)] it is seen that for absolutely inb tegrable functions, that is, when the integral Ifldx exists, the series
sa
FOURIER SERIES
Cr=oIck(2 to 0 as n
591
also converges, hence the n t h term of this series necessarily goes In particular, if we use the complete set
+ 03.
s = {u7%= e i n z } , n = 0 , 1 , 2 , . . . , x E [ o , ~ T ] ,
(13.43)
we obtain the limit r271
(13.44) which can be used to deduce the following useful formulas: lim
.I,
lim
i2*
Pa"
n-cc
n+m
f ( x ) cosnx dx f ( x ) sinnx dx
= 0,
(13.45)
= 0.
(13.46)
This result can be generalized as the Riemann-Lebesgue Lemma for absolutely integrable functions (Apostol p. 471) as
2%Jd
b
f ( x ) sin(ax
+ p) dx = 0,
(13.47)
which holds for all real CY and p, and where the lower limit, a , could be --oo and the upper limit, b, could be 00.
13.2
FOURIER SERIES
The term Fourier series usually refers to series expansions in terms of the orthogonal system
S
= { u O , U ~ ~ - ~ , U ~ ~n} = ,
1 , 2 , .. . , x E
[-7r,~],
(13.48)
where (13.49) which satisfy the orthogonality relations
lT ; 1, ; 1"
1 " ; cos n x cos m x dx = , , ,a
1
"
1
"
sin n x sin m x dx
=,,S
sin n x cos m x dx = 0.
(13.50)
,
(13.51) (13.52)
592
FOURIER SERIES
Since the basis functions are periodic with the period 27r, their linear combinations are also periodic with the same period. We shall see that every periodic function satisfying certain smoothness conditions can be expressed as the Fourier series (13.53) Using the orthogonality relations, we can evaluate the expansion coefficients as
1
ulL=
f ( t )cosnt d t , n = 0 , 1 , 2 , . . .
/T
7r
,
(13.54)
-7r
(13.55) Substituting the coefficients [Eqs. (13.54) and (13.55)] back into Equation (13.53) and using trigonometric identities, we can write the Fourier series in more compact form as
& 1;
f ( t ) dt
f ( ~= )
1 "
+ -7r
f ( t )c o s n ( ~ - t ) dt.
,=I
(13.56)
--?r
Note that the first term uo/2 is the mean value of f ( x ) in the interval
[-7r,
7r].
13.3 EXPONENTIAL FORM OF T H E FOURIER SERIES Using the relations
(13.57)
we can express the Fourier series in exponential form as 03
( 13.58) where 1
c, = -(un 2
-
1 2
&), c; = -(u,
+ it),),
co
= ao.
(13.59)
Since c; = c - ~ we , can also write this in compact form as 00
(13.60) n=--co
CONVERGENCE OF FOURIER SERIES
13.4
593
CONVERGENCE OF FOURIER SERIES
We now turn t o the convergence problem of Fourier series. The whole theory can be built on two fundamental formulas. Let us now write the partial sum
(13.61) Substituting the definitions of the coefficients (13.55)]into the above equation, we obtain
+
n
(cos k t cos k x
ak
and
bk
[Eqs. (13.54) and
+ sin kt sin k x )
( 13.62)
k=l
(13.63) k=l
(13.64) where we introduced the function 1
Dn(t) = 5
+ CCOS kt.
(13.65)
k=l
Since both f ( t ) and D n ( t ) are periodic functions with the period 27r, after a variable change, t - x -+ t , we can write
Sn(x)= 1 7T
-"-x
f(x
+ t ) D n ( t )dt
(13.66) (13.67)
Using the fact that D n ( - t ) = D n ( t ) , this can also be written as
(13.68) Using the trigonometric relation (see Prob. 13.6):
t 2 (13.69)
594
FOURIER SERIES
we can also write o n ( % ) as
D,(2t) =
sin(2n + 1)t , 2 sin t
I (n+%
t # mr, m is an integer, (13.70) t
= mr, m
is an integer,
thus obtaining the partial sum as
which is called the integral representation of Fourier series. It basically says that the Fourier series written for f(z) converges at the point z, if and only if the following limit exists:
f(z + at) + f(x - at) sin t
2
]
dt.
(13.72)
In the case that this limit exists, it is equal to the sum of the Fourier series. We now write the Dirichlet integral: lim
n-cc
1
2 'g(tlt sinnt d t = g(0'), i~
(13.73)
which Jordan has shown to be true when g ( t ) is of bounded variation. That is, when we move along the z-axis the change in g ( x ) is finite (Apostol p. 473). Basically, the Dirichlet integral says that the value of the integral depends entirely on the local behavior of the function g ( t ) near 0. Using the Riemann-Lebesgue lemma, granted that g ( t ) / t is absolutely integrable in the interval [€,&I, 0 < E < 6, we can replace t in the denominator of the Dirichlet integral with another function, like sin t , that has the same behavior near 0 without affecting the result. In the light of these, since the function
(13.74) is continuous at t = 0 , we can write the integral
Now, the convergence problem of the Fourier series reduces to finding the conditions on f ( z ) which guarantee the existence of the limit [Eq. (13.72)]
f ( z + at) + f(x ~ [ ~ '2 ~
-
lim
2
114m 7(-
2t)
dt.
(13.76)
SUFFICIENT CONDITIONS FOR CONVERGENCE
595
Employing the Riemann-Lebesgue lemma one more time, we can replace the upper limit of this integral with 6, where S is any positive number less than 7r/2. This result, which follows from the fact that the limit
lini
11-30
2 T
lTl2 [ + f(z
at) + f ( z - 2 t ) 2
dt
+0
(13.77)
is true, is quoted as the Riemann localization theorem: Theorem 13.1. R i e m a n n localization theorem: Assume that f (z) is absolutely integrable in the interval [0, 27r] and periodic with the period 27r. Then, the Fourier series produced by f ( z ) converges for a given z, if and only if the following limit exists:
lim
1c-cc
5
2 1[ 7r
f(z
+ at) + f ( x - 2 t )
dt
2
1
(13.78)
where S < is a positive number. When the limit exists, it is equal to the sum of the Fourier series produced by f ( z ) . lmportance of this result lies in the fact that the convergence of a Fourier series at a given point is determined by the behavior of f(x) in the neighborhood of that point. This is surprising, since the coefficients [Eqs. (13.54) and (13.55)] are determined through integrals over the entire interval.
13.5
SUFFICIENT CONDITIONS FOR CONVERGENCE
We now present a theorem due to Jordan, which gives the sufficient conditions for the convergence of a Fourier series at a point (Apostol pg. 478). Theorem 13.2. Let f ( z ) be absolutely integrable in the interval ( 0 , 2 ~ ) with the period 27r and consider the interval [z - 6,z 61 centered at z in which f(x) is of bounded variation. Then, the Fourier series generated by f ( r )converges for this value of z to the sum
+
f(.+) + f(z-1 2
(13.79)
Furthermore, if f ( z ) is continuous a t z, then the series converges to f ( z ) . Proof of this theorem is based on showing that the limit in the Riemann localization theorem: (13.80) exists for (13.81)
596
FOURIER SERIES
and equals g(Of). This theorem is about the convergence of Fourier series at a given point. However, it says nothing about uniform convergence. For this, we present the so called Fundamental theorem. We first define the concepts of piecewise continuous, smooth and very smooth functions. A function defined in the closed interval [u,b] is piecewise continuous if the interval can be subdivided into a finite number of subintervals, where in each of these intervals the function is continuous and has finite limits at both ends of the interval. Furthermore, if the function, f ( z ) ,coincides with a continuous function, f i ( z ) , in the ith subinterval and if f i ( z ) has continuous first derivatives, then we say that f ( z ) is piecewise smooth. If, in addition, the function fi(z)has continuous second derivatives, we say f(z)is piecewise very smooth.
13.6
T H E FUNDAMENTAL T H E O R E M
Theorem 13.3. Let f ( z )be a piecewise very smooth function in the interval [ - 7 r , 7r] with the period 27r, then the Fourier series
- 5+ C 00
j(z)
a0
(a,cosnz + b, s i n n z ) ,
(13.82)
n=l
where the coefficients are given as
a, = 7r
b,
=
r
1 7r
f(z)cosnz, n = 0,1,. .. ,
(13.83)
f(z)sinnz, n = 1 , 2 , . . .
(13.84)
-7r
-7r
converges uniformly t o f(z)in every closed interval where f ( z ) is continuous. At each point of discontinuity, zl, inside the interval [ - ~ , 7 r ] , Fourier series converges to
and at the end points
17:
=b r to
1 [ lim 2 x-i?r-
1
f ( z ) + 2 lim - * 7 rf(z) + .
(13.86)
For most practical situations the requirement can be weakened from very smooth to smooth. For the proof of this theorem we refer the reader to Kaplan (p. 490). We remind the reader that all that is required for the convergence of a Fourier series is the piecewise continuity of the first and the second derivatives of the function. This result is remarkable in itself, since for the convergence of Taylor series, derivatives of all orders have to exist and the remainder term has to go t o zero.
597
UNIQUENESS OF FOURIER SERIES
13.7
UNIQUENESS OF FOURIER SERIES
Theorem 13.4. Let f(x) and g(x) be two piecewise continuous functions in the interval [ - T , 7r] with the same Fourier coefficients, that is,
1J"-" f(x)cosnx = -= 7r
J"-"
( 13.87)
g ( x )cosnx,
(13.88) Then f(x) = g(x),except perhaps a t the points of discontinuity. Proof of the uniqueness theorem follows at once, if we define a new piecewise continuous function as
and write the Fourier expansion of h(x)and use Equations (13.87) and (13.88). 13.8
EXAMPLES OF FOURIER SERIES
13.8.1 Square Wave
A square wave is defined as the periodic extension of the function (Fig. 13.1) -1,
-7r
+I,
o
(13.90)
f(x) =
W e first evaluate the expansion coefficients as
7r
b, =
-1J 7r
I"
cosnxdx=O, n = 0 , 1 , ... ,
~ ~ = - ; l[ ~ coo s n z d x + -
0,
0
sinnx d x +
7r
-T
n=2,4
(13.91)
,...,
n = l , 3 , ...
, (13.92)
which gives the Fourier series 4
c O3
f(x) = ; n=l =
4
sin(2n - 1). (2n - 1) 4 . 37r
(13.93)
4 57r
- s i n z + - s1n3x + - sin52 + , 7r
(13.94)
598
FOURIER SERIES
Figure 13.1
Square wave and partial sums.
We plot the first four partial sums,
4
5’1 = - sinx, iT
4 . +sin 32, 3n 4 5’s = 5’2 + -sin5x, 57r 4 . 5’4 = 5’3 + -s1n72, 77r 5’2 = 5’1
(13.95) (13.96) (13.97) (13.98)
in Figure 13.1, w ich clearly demonstrates the convergence of t,,e Fourier series. Note that at the points of discontinuity the series converges to the average
Convergence near the discontinuity is rather poor. In fact, near the discontinuity the Fourier series overshoots the value of the function at that point. This is a general feature of all discontinuous functions, which is called the Gibbs phenomenon. The first term no/2 is zero, which represents the mean value of f(x) over the interval (-7r, n).
EXAMPLES OF FOURIER SERIES
Figure 13.2
13.8.2
599
Partial slims for the triangular wave.
Triangular Wave
Let us now consider the triangular wave with the period 2 ~which , is continuous everywhere (Fig. 13.2): f(x)=
(13.100)
We find 0
a,,=lST(;+x)cosnxdx+7 r -
7r
sin n x
[-$(I
=
2
-
cosn7r)
= -(1 - cosn7r), 7rn2
1 + -(I n2
l*(t
-
x) cos n x dx
sinnx d x + . . .
I
- cosnn)
n = 1 ,2 ,
1
(13.101) (13.102) (13.103) (13.104)
The coefficient a0 needs to be calculated separately as (13.105)
600
I,
FOURIER SERIES
n
------------
V
-n
3n
R
Figure 13.3
471:
w
Periodic extension.
Similarly, we evaluate the other coefficients: bn =
n-
=o,
1' (t+
z) s i n n z
dz + 7r
-77-
1-(;
-
z) s i n n z dz
n = 1 , 2,... .
(13.106) (13.107)
Hence, the Fourier expansion of the triangular wave becomes
4
f ( z ) = - cosz
4 4 +cos3z + -cos5z f . 257r
97r cos(2n- 1). (2n - 1 ) 2 .
=-c 7r
4
O0
n=l
..
(13.108) (13.109)
The first three partial sums are plotted in Figure 13.2. For the triangular wave the function is continuous everywhere but the first derivative has a jump discontinuity at the vertices of the triangle, where the convergence is weaker.
13.8.3
Periodic Extension
Each term of the Fourier series is periodic with the period 27r, hence the sum is also periodic with the same period. In this regard, Fourier series cannot be used to represent a function for all z if it is not periodic. On the other hand, if a Fourier series converges to a function in the interval [-n-,n-], then the series represents the periodic extension of f ( z ) with the period 27r. Consider the function shown in Figure 13.3:
f(.) =
{
0,
-7r
2,
0 2 2
where the dashed lines represent its periodic extension.
(13.110)
FOURIER SINE AND COSINE SERIES
Figure 13.4
601
Fourier series representation of f(z) by S7.
Fourier coefficients of f ( x ) are found as a0 =
-
a, =
-7r
(1:O
f ( x ) dx =
d x + L " s dx)
I"
x cos nz dx =
b , = L7rL " x s i n n x d x =
7r
=
2,
(13.111)
cosnx+nzsinnz rn2 sin n x - nx cos n x
(13.112)
7rn2
(13.113)
--
where n = 1 , 2 , .. . . Thus, the Fourier series representation of f ( x ) in the interval (-7r,r)is 00
(-1)n
-
1
cosnx
n=l
-
1
(-'Inn sinnx
.
(13.114)
Fourier extension of f ( x ) converges at the points of discontinuity to (Fig. 13.4) 1
-2 [ f ( r -+ ) f(.+)]
7r
=
5.
(13.115)
13.9 FOURIER SINE AND COSINE SERIES So far we have considered periodic functions with the period 27r or functions defined in the interval [-7r,n].If the function has the period 27r, then any
602
FOURIER SERIES
interval [c, c+27r] can be used. For such an interval, the Fourier series becomes
--
w
+
f ( z )=
a, cos nz
+ b, sin nz,
(13.116)
n=l
a, = 1 b,=-L
f (z) cos nz dz,
(13.117)
f ( z )sin nz d z .
( 13.118)
C+2T
7r
Using [ - T , 7r] as the fundamental interval has certain advantages. If the function is even, that is, f ( - z ) = f ( z ) ,all the b, are zero, thus reducing the Fourier series to the Fourier cosine series
2+c 00
f(.)
=
a, cosnz,
(13.119)
f ( ~C O) S E E dx.
(13.120)
n=l
LT
a, = 7r
Similarly, for an odd function where f ( - z ) giving the Fourier sine series as
=
- f ( z ) , all a, are zero, thus
c a2
f ( z )=
(13.121)
b, s i n n z ,
n=l
b, = 13.10
27r
f ( z )s i n n z d z .
( 13.122)
CHANGE OF INTERVAL
If the function f(z)has the period 2L, that is, f ( z series is written as
f ( x )=
7+ 1 / L
a, cos
L
a, =
(nrxz)+ b, sin
00
n=l
f ( z )cos
-L
+ 2L) = f ( z ) ,the Fourier
(yz)
dx,
b,,, = zl ~ LLf ( x ) s i n ( ~ z ) d x .
(13.123) (13.124) (13.125)
In this case the Fourier cosine series is given as
f(.)
c ;1 a0
=
00
+
n=l
nr a, cos ( y.), 11: E 10, L],
( 13.126)
L
a,
=
f ( z )cos ( Y z ) dz,
( 13.127)
603
INTEGRATION AND DIFFERENTIATION OF FOURIER SERIES
and the Fourier sine series become M
(13.128) (13.129) Example 13.1 P o w e r in periodic signals: For periodic signals the total power is given by the expression (13.130) Substituting the Fourier expansion of f ( z )we write
+2anbm cos (nwoz)sin (mwox)
+ bnb,
sin (nwoz) sin (mwoz)]} dz,
(13.132) where we have substituted wo = T/L. Using the orthogonality relations [Eqs. (13.50)-(13.52)], we finally find
(13.133)
13.11 INTEGRATION A N D DIFFERENTIATION OF FOURIER SERIES
The term by term integration of Fourier series, 00
(13.134) n=l
yields sinnz n=l
+cosnz n
Hence the effect of integration is to divide each coefficient by n.If the original series converges, then the integrated series will also converge. In fact, the integrated series will converge more rapidly than the original series. On the other hand, differentiation multiplies each coefficient by n,hence convergence is not guarantied.
604
FOURIER SERIES
PROBLEMS
1. Using properties of the inner product, prove the following inequalities:
and
2. Show that the following set of functions is orthonormal: einx
{ u n ( x )= -},
72
6
= 0,2,.. .
,
2 E
[0,27r]
3. Show the following orthogonality relations of the Fourier series:
1
cos n x cos m x dx
/7r
7r
/*
1 1 7r
= S,,
-7r
sin n x sin m x dx =,,,S
-7r
sin n x cos m x dx = 0. -7r
4. Show that the Fourier series can be written as
f ( x )= g
s_, 7r
f ( t ) dt
c
lo" +77 n=l
/T
f ( t )c o s n ( x - t ) dt, n = 0 , 1 , 2 , . .
-=
5. Dirichlet integral is given as
which says that the value of the integral depends entirely on the local behavior of the function g ( t ) near 0. Show that Riemann-Lebesgue lemma allows us to replace t with another function, say sint, that has the same behavior near 0 without affecting the result.
6. After proving the following trigonometric relations: t
k=l
2
sin -
PROBLEMS
and
5
sin kt
=
(sin
k=l
605
t
sin - / sin 2
show that the function [Eq. (13.65)]:
k=l
can be written as
Dn(2t) =
sin(2n + 1)t , 2 sin t
t # m x , m is an integer,
(n+i),
t = m x ,m is an integer.
Hint: First show that
and then take the real and imaginary parts of this equation, which gives two useful relations, where the real part is the desired equation.
7. Expand f (z) = 3z
+ 2 in Fourier series for the following intervals:
6) ( - T , T ) . (ii) (0,27r).
(iii) In Fourier cosine series for [0, 7r]. (iv) In Fourier sine series for (0, T ) . (v) In Fourier series (0, T ) . Graph the first three partial sums of the series you have obtained.
8. Expand f ( z ) = x2 in Fourier series for the interval (0,27r) and plot your result. 9. Expand s i n z in Fourier cosine series for [0,7 r ] . 10. Expand s i n z in Fourier series for [O,27r]. 11. Find the Fourier cosine series of f ( z ) = z in the interval [0,7r].
12. Find the Fourier series of the saw tooth wave O < X < T ,
f(x)= x-~x,
T < X < ~ I I
606
FOURIER SERIES
and plot the first three partial sums.
13. Fourier analyze the asymmetric square wave d,
O<X
0,
7r<x<27r.
f(x) =
14. Show that
15. Show that the Fourier expansion of
is given as
16. Show that the Fourier expansion of
f(.)
=
{
2,
0<x<7r,
-5,
-7r<x<0
is given as 7r 4 f(.) = - - -
c
=nn=O
17. Show that
cos(2n+l)x (2n
+
CHAPTER 14
FOURIER AND LAPLACE TRANSFORMS
Using Fourier series, periodic signals can be expressed as infinite sums of sines and cosines. For a general nonperiodic signal, we use the Fourier transforms. It turns out that most of the signals used in applications can be broken down into linear combinations of sines and cosines. This process is called spectral analysis. In this chapter, Fourier transforms are introduced along with the basics of signal analysis. Apart from signal analysis, Fourier transforms are also very useful in quantum mechanics and in the study of scattering phenomenon. Another widely used integral transform is the Laplace transform, which will be introduced with its basic properties and applications to differential equations and to transfer functions. 14.1 TYPES OF SIGNALS
Signals are essentially modulated forms of energy. They exist in many types and shapes. Gravitational waves emitted by a collapsing star and the electromagnetic pulses emitted from a neutron star are natural signals. In quantum mechanics, particles can be viewed as signals in terms of probability waves. In technology, electromagnetic waves and sound waves are two predominant Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
8. S e l p k
Bayin
607
608
FOURIER AND LAPLACE TRANSFORMS
X
t
I
I
I
I I
,
I
I 1 I I I
Figure 14.1 Analog signal (top) and the sampled signal (bottom) with the sampling frcquciicy I / At.
sources of signals. They are basically used to transmit and receive information. Signals exist in two basic forms as analog and digital. Analog signals are given as continuous functions of time. Digital signals are obtained from analog signals either by sampling or binary coding. In sampling (Fig. 14.1) we extract information from an analog signal with a certain sampling frequency, f s = 1/&. Given a sampled signal, the main problem is to generate the underlying analog signal. In general, there exist infinitely many possibilities. For the sampled signal on the top in Figure 14.2, any one of the furict ions: cos 5t, cos lot, cos 25t, . . .
(14.1)
with the frequencies 5/27r, 10/27r, 25/27r,. . . , could be the underlying signal. To eliminate the ambiguity, we can either increase the sampling frequency or obtain some prior information about the underlying signal. For example, we may determine that the frequency cannot be greater than (8/2n)Hz, which in this case implies that the underlying signal is cos 5t. In fact, a theorem by Shannon, also called the sampling theorem, says that in order to recover
SPECTRAL ANALYSIS AND FOURIER TRANSFORMS
609
.
Figure 14.2 Ambiguity (bottom) in the sampled signal (top): 1 = cos5t, 2 = cos lot, 3 = cos 2 3 .
the original signal unambiguously, the sampling frequency has to be a t least twice the highest frequency present in the sampled signal. This minimum sampling frequency is also called the Nyquist sampling frequency (Kusse and Westwig). In characterizing signals we often refer to amplitude and phase spectrums. For the signal
the amplitude and phase spectrums are obtained by plotting the amplitudes and the phases of the individual cosine waves that make up the signal as functions of frequency. In Figure 14.3, on the top we show the underlying analog signal [Eq. (14.2)] and on the bottom the corresponding amplitude and phase spectrums. The amplitude and phase spectrums of a signal completely characterize the signal, and together they are called the frequency spectrum (Woolfson and Woolfson).
610
FOURIER AND LAPLACE TRANSFORMS
:j ;I!
b P
1
1
is,
2[
; 5 ,
2;f
-1
-2 -3
Figure 14.3 2 cos(l0t - 2)
-2 -3
-
Spectral analysis of the analog signal on the top, z ( t ) = cos(5t + 1)+ 3cos(25t + a), and its amplitude and phase spectrums.
14.2 SPECTRAL ANALYSIS A N D FOURIER TRANSFORMS For a given analog signal (Fig. 14.3 top), it is important to determine the amplitude, the frequency, and the phases of the individual cosine waves that make up the signal. For this purpose we introduce the correlation function or the correlation coefficient denoted by r'. The correlation coefficient between two samples: 2 =
{xi} and y = {yi}, i = 1 , 2 , . . . , N ,
(14.3)
is defined as (14.4) where cT and gy are the standard deviations of the samples and () denotes the mean value of the corresponding quantity. When the two samples are identical, x = y, their correlation coefficient is 1. When there is no systematic relation between the two samples, their correlation coefficient is 0. Hence, the closer r is to unity, the stronger the relation between the two samples. We usually take y as a cosine or a sine wave with unit amplitude, where the standard deviation is fixed and the mean is 0, thus reducing the correlation
611
CORRELATION WITH COSINES AND SINES
coefficient to
. N-1 (14.5) which is also called the modified correlation coefficient. For continuous signals, the correlation coefficient is defined as the integral (14.6) where T is the duration of the signal or the observation time.
14.3
CORRELATION WITH COSINES A N D SINES
Let the signal be a single cosine function:
~ ( t=)A c o s ( 2 ~ f o+ t &),
(14.7)
where A stands for the amplitude, f o is the frequency measured in hertz (Hz), and Qo is the phase. To find the modified correlation coefficient, Ro, we use a cosine function with zero phase,
y(t)
= cos (2Tft) ,
(14.8)
where f is a test frequency, to write T/2
Ro = l
A c o s ( 2 ~ f o+ t Qo) c o s ( 2 ~ f t )d t .
(14.9)
T / 2
Using the trigonometric identity
cos ACOSB
=
1 2
1
- COS(A + B ) + - COS(A- B ) , 2
(14.10)
we write Ro as
This is easily integrated t o yield
+
27r(fo
-
f ) , - Qo
11
.
(14.12)
612
FOURIER AND LAPLACE TRANSFORMS
Figure 14.4 T=l.
Modified correlation function Ro for z(t) = 15cos(2n4t
+ 1.5) and
Using the relation sin(A i B ) = sin A cos B i cos A sin B , Equation (14.12) can be simplified to
RO --
24.f
+ fo)
+ W f - fo)
(14.13)
In Figure 14.4 we plot Ro for
z ( t )= 1 5 c o s ( 2 ~ 4+ t 1.5)
(14.14)
with T = 1. There are two notable peaks, one of which is at f = -4 and the other a t f = 4. The first peak a t the negative frequency is meaningless; however, the other peak gives the frequency of the signal. The next problem is to find the amplitude and the phase of the given signal. For this we look at Ro in Equation (14.13) more carefully. Near the peak frequency, f = f o , the second term dominates the first; thus the peak value REax can be written as (14.15) where we have multiplied and divided by T/2. Using the limit (14.16)
CORRELATION WITH COSINES AND SINES
Figure 14.5
613
+
Correlation function SOfor z ( t )= 1 5 c o s ( 2 ~ 4 t 1.5) and T = 1.
we obtain
Rrax=
(F)
cos 60.
(14.17)
In other words, the peak value of the correlation function gives us a relation between the amplitude, A, and the phase, 00, of the signal. To find A and 60, we need another relation. For this we now correlate the signal with a sine function, y(t)
= sin(27rft),
(14.18)
and write rT/2
so = J-T,2 A cos(27rfot + 60) sin(27rft) d t .
(14.19)
Following similar steps we can evaluate this integral to obtain
(14.20)
+
For the signal z ( t ) = 15cos(27r4t 1.5), SOpeaks again a t f = 4 but with a negative value (Fig. 14.5). Following similar steps that lead to Bornax,we can write the peak value of SOas (14.21)
614
FOURIER AND LAPLACE TRANSFORMS
Figure 14.6
Ro for z ( t )= 40cos(2?r2t + 1.5) + 15cos(2dt + 1) for T = 1.5.
which gives another relation between A and 00. We can now use Equations (14.17) and (14.21) to evaluate the amplitude and the phase of the underlying signal as
A=
y w ,
(14.22) (14.23)
+
For thc signal z ( t ) = 1 5 c o s ( 2 ~ 4 t 1.5) we read the values of RFax and from the Figures 14.4 and 14.5 as
Rfax == 0.52, Srax == -7.5,
,Fax
(14.24)
which yield the amplitude and the phase as
A = 15 and
00 =
1.5,
(14.25)
as expected. Note that the modified correlation functions, Ro and So [Eqs. (14.9) and (14.19)],are linear integral operators. Hence for signals composed of several cosine waves the corresponding correlation functions will be the sum of the correlation functions of the individual terms. For example, the correlation function, Ro, of
z ( t )= 40 c o s ( 2 ~ 2+ t 1.5) + 15 cos(2n4t + 1) has two peaks a t
fl
= 2 and
f2
= 4 (Fig. 14.6).
(14.26)
CORRELATION FUNCTIONS AND FOURIER TRANSFORMS
14.4
615
CORRELATION FUNCTIONS AND FOURIER TRANSFORMS
Using complex notation we introduce the modified complex correlation function as
X ( f ) = Ro
-
(14.27)
iSo
TI2
-LTI2
A c o s ( 2 ~ f o+ t 0,) cos ( 2 ~ f td)t
rTl2
+
A c o s ( 2 ~ f o t 0,) sin (2nft) d t ,
-i
(14.28)
J-T/2
which for a general signal, x ( t ) ,can be written as
X ( f )=
1"
x ( t )[cos ( 2 ~ f t ) isin ( 2 ~ f t )dl t
(14.29)
-TI2
(14.30) If the signal has infinite duration or it is observed for an infinite amount of time, we let T 00, thus obtaining the expression ---f
(14.31) which is called the Fourier transform of the signal z ( t ) . The amplitude spectrum, A ( f ) , of the signal is now the modulus of X ( f ) : A ( f ) = IX(f)l =
(Rg + Si)' I 2.
(14.32) (14.33)
The phase spectrum, 0(f),is the argument of X(f), that is,
Kf)= a r g X ( f ) = tan-'
(-$)
(14.34) (14.35)
Note that the phase spectrum is meaningful only when the amplitude spectrum is nonzero. 14.5
INVERSE FOURIER TRANSFORM
We have seen that Fourier transform of a signal is defined as
1
00
X ( f )=
-00
z(t)e-2"ift dt.
(14.36)
616
FOURIER AND LAPLACE TRANSFORMS
4
3
5
4
Figure 14.7
Amplitude spectrum of z ( t ) = 15 c o s ( 2 ~ 4 tt 1.5) for T = 1.
Multiplying both sides by ei2"ft and integrating over
X ( f ) e i 2 " f t df
=
/
00
00
dfei2"ft
f gives
(t')e--i27rft' dt'
( 14.37)
-00
(14.38) Using one of the representations of the Dirac-delta function,
(14.39) J
-00
we can write Equation (14.38) as
(14.40) = x(t).
(14.41)
Thus, the inverse Fourier transform is defined as 00
x(t)=
.I,
X(f)e22"ftdf.
(14.42)
FREQUENCY SPECTRUMS
14.6
617
FREQUENCY SPECTRUMS
In Figure 14.4 we have seen that the modified correlation function, Ro, for ~ ( t=) 15 cos(27r4tS1.5) is peaked a t f = 4. Let us now consider the amplitude spectrum, A(f ) ,for this signal [Eq. (14.33)]. The top plot in Figure 14.7 gives A ( f ) for T = 1. Ideally speaking, if the integration was carried out for an infinite amount of time, the amplitude spectrum would look like the bottom plot, that is, an infinitely long spike at f = 4, which is a Dirac-delta function. Similarly, when T goes t o infinity, the plot of Ro in Figure 14.4 will consist of two infinitely long spikes at f = -4 and f = 4. To see what is going on, let us consider the cosine signal
~ ( t=) A c o s ( ~ T ~ ~ ~ ) ,
(14.43)
where we have set the phase to zero for simplicity. The modified correlation function, Ro, for this signal is written as
Ro
=
24.f
+fo) (14.44)
Near the peak, Ro is predominantly determined by the second term. Hence, near fo we can write
Ro
C!
24.f
-
(14.45)
fo)
which has the peak value
AT RFax = -
(14.46) 2 ’ The first two zeros on either side of the peak are symmetrically situated about f o , which are determined by the zeros of the sine function in Equation (14.45): 27r(f
T
-
f o ) T = T and 27r(f
- f0)-
T 2
= -7r,
(14.47)
which give the frequencies
fl
=fo+,,
f2
= fo
1
-
1 -.
T In other words, the width of the peak is determined by
(14.48) (14.49)
(14.50) In Figure 14.4, Af is shown for T = 1. The longer the duration of the signal or the observation time, the sharper and taller the peak will be. In the limit as T --t 00 it will be an infinitely long spike.
618
FOURIER AND LAPLACE TRANSFORMS
14.7
DIRAC-DELTA F U N C T I O N
One of the most frequently used functions of mathematical physics is the Dirac-delta function or the impulse function. There are several ways of defining a Dirac-delta function. For our purposes the following equivalent forms are very useful: S(z - z’) = -
dkeik(x-z’)
(14.51) (14.52)
=
- lim
sin k ( z - z’)
1
(14.53)
When the argument of the Dirac-delta function goes to zero, we obtain infinity: lim S(z - z’)
x-x’
+
k lirn - --$ 00.
k-cc
T
(14.54)
Area under a Dirac-delta function can be found by using the well-known definite integral dy = T
(14.55)
d z = 1.
(14.56)
as +cc
S(z
- 2’)
Dirac-delta function allows us to pick the value of a function a t the point that makes the argument of the Dirac-delta function zero:
s_,
+rn
f(z)6(z- z’)dz = f ( z ’ ) .
(14.57)
This is also called the sampling property. To visualize a Dirac-delta function, S(z - d ) ,consider a rectangular region of unit area centered at z = z’ with the base length A L and height S(0). A Dirac-delta function described by Equations (14.56) and (14.57) is obtained in the limit as A L + 0, while the area of the rectangle is preserved as 1. In this limit, we obtain an infinitely long spike, that is, a function that is zero everywhere but assumes the value infinity at z = 2’. However, the area under this function is 1 (Prob. 14.11). Comparing the correlation function, Ro [Eq. (14.45)], with the Dirac-delta function [Eq. (14.52)], it is seen that as T + 00, peaks become infinitely long spikes. The same argument goes for the peaks in SO and the amplitude
A CASE WITH TWO COSINES
a
2
Figure 14.8
619
Freq
A(f) for z ( t ) = 4 0 c o s ( 2 ~ 2 + t 1.5) + 15cos(2~4t+ 1) for T = 0.5.
spectrum. Properties of Dirac-delta function
JTmf(z)S(n'(z- a ) d z = (-l)"f'"'(a), q-2) = q z ) , 1
6 ( a z ) = --s(z), a # 0, la1 S(z - a ) = O'(z
-
a),
where O(IC - a ) is the unit step function: 0 for
14.8
IC
< a and 1 for IC > a.
A CASE WITH TWO COSINES
Let us now consider the amplitude spectrum for a signal composed of two cosines:
z ( t )= 4 0 c o s ( 2 ~ 2+ t 1.5) + 15 c o s ( 2 ~ 4+ t 1).
( 14.58)
The amplitude spectrum for T = 0.5 is shown in Figure 14.8. Instead of the expected two peaks, we have a single peak. What happened is that the integration time is too short for us to resolve the individual peaks. That is, the two peaks are so wide that they overlap. In Figure 14.9 we increase the integration time to T = 2.5. Now the two peaks can be seen clearly. Note that the first peak at f = 2 is off the scale.
620
FOURIER AND LAPLACE TRANSFORMS
i Figure 14.9
2
4
3
A ( f ) for z ( t ) = 4 0 c o s ( 2 ~ 2+ t 1.5) + 15 cos(27r4t + 1) for T = 2.5.
14.9 GENERAL FOURIER TRANSFORMS A N D THEIR PROPERTIES In general, the Fourier transform of a function, z ( p ) , in terms of a parameter, a. is defined as
(14.59) Thc inverse Fourier transform is written as M
z(p) =
X(a)ei"P d a , J
(14.60)
-M
where cy arid p are two conjugate variables related by the Fourier transform. Symbolically, Fourier transforms are shown as
X ( a ) = F { z ( p ) }and x ( p ) = F - l { X ( a ) } . For thc existence of a Fourier transform, the integral exist. Since le-Lapl = 1, we can write the inequality 00
IL
z ( P ) e - i a p d3j
5
s-",
(14.61)
z(j?)e-i"Pd/? has to
lI
I.(P)I @.
(14.62)
Hence, the Fourier transform, X ( a ) ,exists if the function, z(P), is absolutely integrable, that is, if the integral /z(,@( d p is finite. The Fourier transform has a number of very useful properties that make it exceptionally versatile. First of all, F and F-' are linear operators. If
s-",
Xl(Q)= F{Zl(P)} and
X2(a) = F { z 2 ( @ ) } ,
621
GENERAL FOURIER TRANSFORMS AND THEIR PROPERTIES
then where c1 and c2 are constants. Similar equations for the inverse operator F-’ hold. Next, when z(P) has continuous derivatives through order n, we can write (14.64) = (ia)RX(a),
(14.65)
which makes Fourier transforms extremely useful for applications t o differential equations. Also, if c is a real constant, then we can write
F{Z(@ - c ) } = e-zcCuX(a),
(14.66)
F{eicPII:(p)}= ~ ( c -r c).
(14.67)
Using two functions, x(P) and y(P), we can form their convolution, II: * y h(/3),to obtain another function, h(P),as
lm .(MP
=
+m
h(P) =
-
t ) dt.
(14.68)
Using the definition of Fourier transforms, it can be shown that 1z:
* y = P {X ( a ) Y ( a ) } .
(14.69)
In other words, the inverse Fourier transform of the product of two Fourier transforms is the convolution of the original functions. Let X ( a ) and x(p) be two functions related by the Fourier transform equation [Eq. (15.59)]. Taking the complex conjugate of X ( a ) we write oo
X * ( a )=
.I_,
z * ( P )exp[iap] dp.
(14.70)
Integrating I X ( a ) I 2over all space and using the definition of Dirac-delta function we can write
J,
+a
2
IX(a)I dcr
[.i_+Z [s”
z * ( P )exp[iaPI dP]
= /oo --o=
1 lm Jm +-I.[: x
-
z(P0 exp[-iaP’] dp‘
+m
+oo
(14.71)
I
exp~-ic\.(~’ - P)I d a ~ d / 3
.*(PMP’)
(14.72) +a
foo
--.Ioo Loo (P).(P’)W’ II:*
-
-
P ) dP’dP
(14.73)
+m
(14.74)
622
FOURIER AND LAPLACE TRANSFORMS
Using the defining property of Dirac-delta function [Eq. (14.57)], this gives us an important identity,
+w
2
+w
IX(a)I d a =
(14.75)
which is known as the Parceval's theorem. One of the main applications of Fourier transforms is to quantum mechanics, where the state functions in position space, Q(z), are related to their counterparts in momentum space, @ ( p ) , via the Fourier transform. That is,
(14.76)
where h is the Planck's constant. For further properties of Fourier transforms, existence theorems, and tables of transforms, we recommend books specifically written on Fourier transforms.
14.10
BASIC D E F I N I T I O N OF LAPLACE T R A N S F O R M
We define Laplace transform of a function, ~ ( t as ),
X ( s ) = L{z(t)} =
.I"
z(t)ePstdt,
(14.77)
where the operator L indicates the Laplace transform of its argument. Laplace transforms are linear transformations, which satisfy
where a is a constant. Because of their widespread use, extensive tables of Laplace transforms are available. The following list of elementary Laplace transforms can be verified by direct evaluation of the integral in Equation (14.77):
BASIC DEFINITION OF LAPLACE TRANSFORM
623
Elementary Laplace Transforms
44
X(S)
(1)
1
l/s, s
(2)
t"
n!/sn+l, s
(3)
eat
l/(s - a ) , s > a
(4)
sinbt
b/(s2
(5)
cosbt
s/(s2
(6)
tneCat
?I!/(.
(7)
sinhbt
b / ( s 2 - b'), s
(8)
coshbt
s / ( s 2- b2), s > b
(9)
t sin bt
2bs/(s2
(10)
t cosbt
+ b2)', s > 0 ( s 2- b 2 ) / ( s 2+ b 2 ) 2 ,s > 0
(11)
6(t - a )
e-a s , a 2 0
>0 > 0 , n > -1
+b2), s > 0 + b2), s > 0 + a)n+', n > -1,
sf a
>0
>b
Inverse of a Laplace transform is shown as L-', which is also a linear operator:
L ? { X ( S ) + Y ( s ) }= L - l { X ( s ) } + L + { Y ( S ) } , L - ' ( U X ( S ) } = U L - ' { X ( S ) } , a is a constant.
(14.80) (14.81)
The above table can also be used to write inverse Laplace transforms. For example, using the first entry, we can write the inverse transform
L-l{
;}
= 1.
(14.82)
Two useful properties of Laplace transforms are given as
c > 0, t > c
L { I c (~ c ) } = e-''X(s),
(14.83)
and
L{eb"(t)} = X ( s - b ) , where more such relations can be found in Bayin. The convolution, is defined as
z ( t )=
Jc'
z(t')y(t
-
t') dt'
(14.84) IC
*y = z, (14.85)
624
FOURIER AND LAPLACE TRANSFORMS
It can be shown that the convolution of two functions, z(t) and y ( t ) , is the inverse Laplace transform of the product of their Laplace transforms: z
* y = X-l{x(s)Y(s)}.
(14.86)
In most cases, by using the above properties along with the linearity of the Laplace transforms, the needed inverse can be generated from a list of elementary transforms.
Example 14.1. I n v e r s e Laplace t r a n s f o r m s : Let us find the inverses of the following Laplace transforms: S
xl(s) = ( s + 1)(s + 3 ) '
(14.87) (14.88) (14.89)
Using partial fractions (Bayin), we can write X l ( s ) as S
(14.90)
Using the linearity of i?' and the third entry in the table we obtain
1 - -- ,-t 2
3 + -e-3t. 2
(14.91)
For the second inverse, we complete the square to write 1 X-l { X z ( s ) }= X-l { s 2 + 2 s + 3 } = x-1 = 1-1
{ {
s2
+ 2sl+ 1+ 2 l
( s + 1 l) 2 + 2 l .
(14.92)
We now use the fourth entry in the table t o write (14.93)
DIFFERENTIAL EQUATIONS AND LAPLACE TRANSFORMS
625
and employ the inverse of the property in Equation (14.84):
L-l { X ( s - b ) } = e b t z ( t ) ,
(14.94)
t.o obt,ain (14.95) For the third inverse, we use the property in Equation (14.83) to write the inverse:
L-’ ( e - “ ” X < s ) }= z(t - c ) , c > 0, t > c,
(14.96)
along with S
cosht,
(14.97)
Lpl { X , ( t ) } = cosh(t - c ) .
(14.98)
=
thus obtaining the desired inverse as
14.11 DIFFERENTIAL EQUATIONS A N D LAPLACE TRANSFORMS An important application of the Laplace transforms is t o ordinary differential equations with constant coefficients. We first write the Laplace transform of a derivative as (14.99) which, after integration by parts, becomes
Assuming that s > 0 and the limit limtioo z ( t ) e p s t
---f
0 is true, we obtain (14.101)
where s ( 0 + ) means the origin is approached from the positive t-axis. Similarly, wc find (14.102) =S2X(S)
-
SX(O+) - X’(O+),
(14.103)
626
FOURIER AND LAPLACE TRANSFORMS
where we have assumed that all the surface terms vanish in the limit as t + 03 and s > 0. Under similar conditions, for the n t h derivative we can write
Example 14.2. Solution of differential equations: Consider the following ordinary differential equation with constant coefficients and with a nonhomogeneous term:
d2x
dx
+ 2-dt dt2
+ 4x(t) = sin%,
( 14.105)
where the initial conditions are given as
x(0) = 0 and x’(0) = 0.
(14.106)
Assuming that Laplace transform of the solution exists, X ( s ) = L{x(t)}, and using the fact that L is a linear operator, we write
L
[ S 2 X ( S )-
sx(0) - x’(O)]
{
d2x
+ 2-dx + 4z(t) dt
I
=
L{sin2t),
( 14.107)
z + 2 [ s X ( s )- z(0)] + 4 X (s ) = s2+4‘
(14.109) By imposing the boundary conditions [Eq. (14.106)] we obtain the Laplace transform of the solution as S2X(S)
2 + 2 s X ( s ) + 4X(S) = 52 + 4’
( 14.110) n
X ( s )=
L
(s2+2s+4)(s2+4)’
(14.111)
To find the solution, we now have to find the inverse transform
Z(t) = L-I
{
2
(s2
+ 2s + 4) ( s 2 + 4)
I
(14.112)
TRANSFER FUNCTIONS AND SIGNAL PROCESSORS
627
Using partial fractions we can write this as
(14.113)
+ +
(9 2s 4) (s2
(14.114)
+ 2s + 4)
S
+ 4) + ' p i s + l }. 4 (s + +3 s2
(s
+ +3 1)2
(14.115)
1)2
Using the forth and the fifth entries in the table and Equation (15.84) we obtain the solution as
z ( t ) = --cos2t 41
+e;t
(
sn iJ?
(14.116)
14.12 TRANSFER F U N C T I O N S A N D SIGNAL PROCESSORS There are extensive applications of Laplace transforms to signal processing, control theory, and communications. Here we consider only some of the basic applications, which require the introduction of the transfer function. We now introduce a signal processor as a general device, which for a given input signal, u ( t ) ,produces an output signal z ( t ) .For electromagnetic signals the internal structure of a signal processor is composed of electronic circuits. The effect of the device on the input signal can be represented by a differential operator, which we take to be a linear ordinary differential operator with constant coefficients, say
d O=a--1,
a>0.
dt
(14.117)
The role of 0 is to relate the input signal, u ( t ) ,to the output signal, z ( t ) ,as
O z ( t )= u ( t ) ,
(14.118)
ad z ( t )+ z ( t ) = u ( t ) . dt
(14.119)
Taking the Laplace transform of this equation, we obtain
+
a s X ( s )- a z ( 0 ) X ( s ) = U ( s )
(14.120)
628
FOURIER AND LAPLACE TRANSFORMS
Figure 14.10 A single signal processor.
Since there is no signal out when there is no signal in, we take the initial conditions as
x ( 0 ) = 0 when u(0)= 0.
(14.121)
Hence, we write
(as
+ 1)X(s)= U ( s ) .
(14.122)
X(S) 1 U ( s ) as 1
( 14.123)
The function defined as
G ( s )= -- -
+
is called the transfer function. A general linear signal processor, G ( s ) , allows us to obtain the Laplace transform of the output signal, X ( s ) , from the Laplace transform, U ( s ) ,of the input signal as
X(S) = G ( s ) U ( s ) .
(14.124)
A single component signal processor can be shown as in Figure 14.10. For the signal processor represented by 1 (14.125) G ( s )= 1+as’ consider a sinosoidal input as u ( t )= sinwt.
( 14.126)
Since the Laplace transform of u ( t ) is
U ( s )=
W ~
52
+ w2’
(14.127)
Equation (14.124) gives us the Laplace transform of the output signal as X ( s )=
W
(9+ w 2 ) ( 1 + a s ) .
(14.128)
Using partial fractions, we can write the inverse transform as
z ( t )= F { X ( S ) }
(14.129)
CONNECTION OF SIGNAL PROCESSORS
Series connection of signal processors.
Figure 14.11
Figure 14.12
629
Parallel connection of signal processors.
which yields
x(t) =
[
1 +
w2a2]
sinwt -
[
wa +
w2a2]
coswt
a ] eCtla. (14.130) + [ 1+ww2a2
The last term is called the transient signal, which dies out for large times, hence leaving the stationary signal as
x ( t )=
[
I, ,
1
+
[+ ]
sinwt - 1
wa
w2a2
cos wt.
(14.131)
This can also be written as (14.132) where S = tan-' aw. In summary, for the processor represented by the differential operator
d dx
O=a--1,
a>0,
(14.133)
when the input signal is a sine wave with zero phase, unit amplitude, and angular frequency w, the output signal is again a sine wave with the same angular frequency w but with the amplitude (1 w2a2)-1/2 and phase S = tan-' aw,both of which depend on w.
+
14.13
CONNECTION OF SIGNAL PROCESSORS
In practice we may need to connect several signal processors to obtain the desired effect. For example, if we connect two signal processors in series (Fig.
630
FOURIER AND LAPLACE TRANSFORMS
Figure 14.13
G = Gl(GsG4 + GzGsGs)G7.
14.11), thus feeding the output of the first processor into the second one as the input, that is,
Xi(s) = Gi(s)Ui(s), X2(s) = Gz(s)Xi(s),
(14.134) (14.135)
we obtain X2 ( s ) =
Gz (s)Gi(s)Ui ( s ) .
(14.136)
In other words, the effective transfer function of the two processors, G1 and G2, connected in series become their product:
G ( s ) = Ga(s)Gi(s).
( 14.137)
On the other hand, if we connect two processors in parallel (Fig. 14.12), thus feeding the same input into both processors, (14.138) (14.139) along with combining their outputs,
we obtain the effective transfer function as their sum:
G(s) = Gz(s)
+ Gi(s).
(14.141)
For the combination in Figure 14.13 the effective transfer function is given as
G = G1 (G3G4
+ GzGgG6)G7.
(14.142)
Example 14.3. Signal processors in series: Consider two linear signal processors represented by the following differential equations:
.(t)
+ z ( t )= u(t)
(14.143)
CONNECTION OF SIGNAL PROCESSORS
631
with
x(0)= 0
(14.144)
and
.(t)
+ 2 i ( t ) + 42 = u ( t )
(14.145)
z(0) = i ( 0 ) = 0.
(14.146)
with
The individual transfer functions are
1 Gl(s) = -
(14.147)
1+s'
(14.148) thus for their series connection we write the effective transfer funct,ion as
G(s) = G z ( s ) G i ( s ) 1 (s2+2s+4)(1+s)'
(14.149)
For an input signal represented by the sine function:
( 14.150)
u ( t ) = sint; the Laplace transform is written as dc
1 1+ s 2 '
(14.151)
{ u ( t ) }= U ( s )= -
We can now write the Laplace transform of the output signal as X2 ( s )
= G2 (s)Gi(s)U ( S ) =
[
'I
1
(s2
(14.152)
+ 2s + 4 ) ( 1 + ).
(s2
+ 1)'
(14.153)
Using partial fractions this can be written as 14/3(26)
+ 2~/3(26)
5~/26 . (14.154) X ~ ( S=) s2 + 2 s + 4 1+s 52 + 1 We rewrite this so that the inverse transform can be found easily as +
14
(s
+
13/3(26)
+ 1)2+ 3
+
3(26)
1/26
(s
-
+ 1)2+ 3
L( 5I (1 ) ) . (14.155) -
26
1+s2
26
1+s2
632
FOURIER AND LAPLACE TRANSFORMS
This is the Laplace transform of the output signal. Using our table of Laplace transforms and Equation (14.84), we take its inverse to find the physical signal as
+ d$ where 6 =
-
sin(t
+ 6),
(14.156)
tan-' 5. Note that the output of the first processor, GI, is 1 x l ( t ) = - [ePt+ sint - cost] , 2
which satisfies the initial conditions in Equation (14.146).
PROBLEMS
1. The correlation coefficient is defined as
r = (ZY) - (4 (Y) , o x Uy
where oz and oy are the standard deviations of the samples. Show that doubling all the values of rc does not change the value of T . 2. Show that the correlation function Ro [Eq. (14.9)] can be written as
+ 2T(f
-
fo)
3. Show that the correlation function
s -
O -
24.f
+ fo)
-
24.f 4. Show that
-
fo)
SO [Eq. (14.19)] is given as
'>
2 7 r ( f o + f),
sin00
PROBLEMS
633
5. Show that the amplitude spectrum is an even function and the phase spectrum is an odd function, that is,
4 f )= A ( - f ) , W )= - O ( - f ) . 6. Find the Fourier transform of a Dirac-delta function. 7. Dirac-delta functions are very useful in representing discontinuities in physical theories. Using the Dirac-delta function express the three dimensional density of an infinitesimally thin shell of mass M . Verify your answer by taking a volume integral. 8. Find the Fourier transform of a Gaussian:
[-a 2 J:2 ] .
CY
f ( z ) = -exp
J;;
Also show that 00
where
9. If X ( s ) is the Fourier transform of ~ ( t show ) , that the Fourier transform of its derivative d x / d t is given as
X’(s) = (i2nf)X(s), granted that ~ ( t+) 0 as t + 500. Under similar conditions, generalize this to derivatives of arbitrary order. 10. Given the signal 4e-t,
f ( t )=
{o,
t 2 0,
t
find its Fourier transform.
11. A normalized Gaussian pulse has the form cy
x ( t ) = -ePa
2 2
J;;
Show that the area under this pulse is independent of cr and is equal to 1.
634
FOURIER AND LAPLACE TRANSFORMS
12. With the aid of a plot program, see how the Gaussian
scales with cy and how the Dirac-delta function is obtained in the limit as Q -+ 00. Also show that c y l f i is the peak value of the Gaussian. 13. Find the Fourier transform of f(t) = t3, 0
< t < 1.
14. Given the exponentially decaying signal,
find its Fourier transform. 15. Show that the Fourier transform of a damped sinosoidal signal,
is
where
w1 = w
+ ia and w:! = w - ia.
16. For the signal
z ( t ) = 40 cos(27r2t + 1.5) + 15 cos(2~4t+ 1)
for various values of T. Using these plots, generate the analog signal given above.
Hint: The plot program you use may show one of the peaks off scale as in Figure 14.9. In that case you can read the peak value by restricting the plot to the neighborhood of the critical frequency you are investigating. You can also check the maximum values you read from the plots by direct substitution.
PROBLEMS
635
17. Find the Laplace transforms of (i) e P t sint, (ii) cosht sinht, (iii) sin(t - 2 ) , (iv) tn, (v) t3et.
18. Find the inverse Laplace transforms of
(ii)
1
(s
+ 1)(s+ 2)(s + 3 ) ' S2
(iii) s4 - a4 '
19. Using Laplace transforms, solve the differential equations
+ 42 = c o s t , z(O) = 1, ~ ' ( 0=) 1, (ii) ~ " ( t-)22' + IZ: = sint, z(O) = 1, ~ ' ( 0 = ) 1.
(i) ~ " ( t )42'
20. Action of a signal processor is described by the differential equation 22'
+ z(t)= u ( t ) .
Find the transfer function for the input
u ( t ) = ePt cost. 21. Two signal processors are described by the differential equations 22;
and
+ 2l(t)=
Ul(t)
.;
+ 2 2 ( t ) = uz(t).
(i) Find the transfer functions for the cases where they are connected in series.
(ii) Find the transfer functions for the cases where they are connected in parallel.
636
FOURIER AND LAPLACE TRANSFORMS
(iii) Find their outputs when the input is a step function:
u ( t )=
{
0,
t 5 0,
1,
t > 0.
22. Make a diagram representing the following combination of signal processors:
23. In Example 14.3, for the input signal given as
u ( t )=
c
0 1
,
t<0,
I
t>o,
can you still use the boundary conditions in Equation (14.146)? If your answer is no, then solve the problem with the appropriate boundary conditions.
CHAPTER 15
CALCULUS OF VARIATIONS
The extremum point of a surface, z = z(x,y), is defined as the point where the first differential, A(l)z(x ,y), vanishes:
dz A(l)z(x,y) = - dx dX
dz +dy = 0. dy
(15.1)
Since dx and dy are independent infinitesimal displacements, Equation (15.1) can only be satisfied when the coefficients of dx and dy vanish simultaneously: (15.2)
A point that satisfies these conditions is called the stationary point or the extremum point of the surface. To determine whether an extremum corresponds to a maximum or a minimum, one has to check the second differential, A(2)z(x,y), which involves second-order partial derivatives of z ( 2 ,y). For a function of n independent variables,
Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
S. Selsuk Bayin 637
638
CALCULUS OF VARIATIONS
the stationary points are determined by the n independent conditions:
dz
dz 8x2
dz dxn
- ... - -- --
8x1
=o
(15.4)
The simultaneous solution of these n conditions gives the coordinates of the stationary points, which, when substituted into f ( x 1 ,x 2 , . . . , x n ) , gives the stationary values, that is, the local maximum or the minimum values of f ( x I , x 2 , .. . , x ? ~ All ) . this, which sounds familiar from Chapter 2, is actually a simple application of variational analysis to functions of numbers. In general, the calculus of variations is the branch of mathematics that investigates the stationary values of a generalized function, defined in terms of some generalized Variables. These variables could be numbers, functions, tensors, etc. In this regard, calculus of variations has found a wide range of applications in science and engineering. In this chapter, we introduce the basic elements of this interesting branch of mathematics.
15.1 A SIMPLE CASE A particular type of variational problem, where the unknown is a function that appears under an integral sign, has proven to be quite useful. For example, if we want to find the shortest path between two points, we have to extremize the line integral that gives the distance, d , as a function of the path, y ( x ) ,as (15.5) In the above equation, y ( x ) is any path connecting the points 1 and 2 and ds is the infinitesimal arclength. In flat space, ds is given by the Pythagoras theorem:
ds
=
dl
+
(2) 2
dx,
(15.6)
hence the variational problem to be solved is to find the path that minimizes d[Y(X)l (15.7) Since distance depends on the path taken, d [ y ( x ) ]is a function of a function. Hence, it is called a functional. This is a special case of the general variational problem:
VARIATIONAL ANALYSIS
639
where J[y(z)] is the functional t o be extremized, y(x) is the desired function with the independent variable x, and f ( y , y’,x)is a known function of y, y’ and x. 15.2
VA R IAT I0NA L A N A LYSIS
We first introduce the basic type of variational problem, where we look for a function or path that extremizes the functional (15.9) and satisfies certain boundary conditions. To find the stationary function, we need to analyze how J[y(x)] changes when the path is varied. Similar to finding the extremum of a function of numbers, we look for a y(x), which in its neighborhood the change in J[y(x)] vanishes to first order. We parameterize all allowed paths connecting the points x1 and x2 in terms of a continuous and differentiable parameter E as y(x,~), where y(x,O) is the desired stationary path. It is important t o keep in mind that z is not varied, hence it is not a function of E . Substituting y(x,~) into Equation (15.9), we obtain J as a function of E as
which can be expanded in powers of
E:
(15.11) The extremum of J ( E )can now be found as in ordinary calculus by imposing the condition = 0. Differentiating J ( E )with respect to E , we get
%IIxo
( 15.12) (15.13)
( 15.14) where in writing Equation (15.13) we assumed that integration with respect to x and differentiation with respect to E can be interchanged. Also assuming that (15.15)
640
CALCULUS OF VARIATIONS
we can write the last integral as
dJ
dE =
I , [az dFdy
x2
+
dF d
aylz
dy
dx.
(z)]
(15.16)
Integrating the second term on the right-hand side by parts gives
dJ de
dFdy --
-=
x2
dF
x2
dy’deIxl+ll
dJ For the stationary path, we set - = 0 a t dc
(v)] (g)dx’ dF
d
[K-dr
E
(15.17)
= 0 t o get
To see how y(x, E ) behaves in the neighborhood of the desired path, we expand in powers of
E:
y(X,€)=y(x,O)+(~) E=O
which to first order in
E
E + - ( -1 )
2!
d2y dE2
E2+’.’
>
(15.19)
E=O
can be written as (15.20)
Note that since x is not varied, we can write d y ( x , ~ ) / d e= dy(x,e)/de and
ay(x, &)/ax= dy(x,E)/dz. Since
is a function of x only, we intro-
duce the continuous and differentiable function q(x), (15.21) which reduces Equation (15.18) to
From Equation (15.20) it is seen that the variation 6y, that is, the infinitesimal difference between the varied path and the desired path is given as (15.23) (15.24) In other words,
E
is a measure of the smallness of the variation and q(x) gives
VARIATIONAL ANALYSIS
Figure 15.1
Stationary path y(x), the varied path
Y(Z,E)
641
and the variation ~ ( x ) .
how the variation changes with position between the end points (Fig. 15.1). Since Equation (15.22) has t o be satisfied by the desired path for all allowed variations, we have dropped the subscript E = 0. There are now two cases that need to be discussed separately: Case I The desired function is prescribed at the end points: In this case the variation at the end points is zero:
.
77(z1) = rl(z2) =
0;
(15.25)
hence the surface term in Equation (15.22) vanishes, that is, (15.26) thus leaving
( 15.27) For all allowed variations, q ( x ) ,Equation (15.27) can only be satisfied if the quantity inside the square brackets vanishes:
dF
d
dF
d y - dz (G) = O .
(15.28)
This is called the Euler equation. In summary, when the variation at the end points is zero, to find the stationary function, we have to solve the Euler equation, which is a second-order differential equation. Of course, at the end we have t o evaluate the integration constants by imposing the boundary
642
CALCULUS OF VARIATIONS
conditions, which are the prescribed values of the function at the end points:
Y(Zl)
= Y1 and
Y(Z2)
= Y2.
(15.29)
Case 11. The natural boundary conditions: In cases where the variations at the end points do not vanish, that is, when y(x) does not have prescribed values a t the end points, the integral in Equation (15.22) is still zero for the stationary function (path):
(15.30) but in order to satisfy Equation (15.22), we also need to have (15.31)
( 15.32) Since the variations, ~ ( “ 1 )and ~ ( x l )are , completely arbitrary, the only way to satisfy Equation (15.32) is to have
(5) (5) =Oand
=O.
(15.33)
x1
22
These conditions are usually called the natural boundary conditions. If predetermined at one of the end points, say II: = I I : ~ then , ~(zl) is zero but q(z2)is arbitrary. Hence we impose the boundary conditions
~ ( I I : is )
(15.34) Similarly, if y(x) is predetermined at we use
II:
=2
(3zl =0
2
but free at the other end, then
and y(z2)
= YZ.
(15.35)
15.3 ALTERNATE FORM OF EULER EQUATION In the Euler equation: d
dz
(w) dF
-
dF =O,
(15.36)
ALTERNATE FORM OF EULER EQUATION
643
if we write the total derivative in the first term explicitly:
we obtain
d 2F
d2F
dF -
0.
(15.38)
It is useful to note that this equation is equivalent to
F - - - d F dy dy’dx)
1
-[$( Y’
E]
__
=O
’
(15.39)
which gives the alternate form of the Eider equation as
dF
(15.40)
dx
This form is particularly useful when F ( x , y , y f ) does not depend on I(: explicitly, which in such a case allows us to write the first integral immediately as
dF F ( y , $) - -y’ dYf
= C.
(15.41)
This is now a first-order differential equation to be solved for the stationary function. E x a m p l e 15.1. Shortest path between two points: We have seen that finding the shortest path between two points on a plane is basically a variation problem, where
+ Yf 2 1112 .
(15.42)
F ( W ’ Y ’ ) = (1
Since F does not depend on x explicitly, it is advantageous t o use the second form of the Euler equation [Eq. (15.41)]. We first write the needed derivative, (15.43) 1
(1 + y’2)-1/2y’,
(15.44)
and then obtain the Euler equation to be solved for y(z) as
(1+Y f 2 11/2 This has the solution
-
y’2(1
+ y’2)-1/2
= C.
(15.45)
644
CALCULUS OF VARIATIONS
which is the equation of a straight line. Example 15.2. A case with different boundary conditions: Let us now consider the functional (15.47) where T , p , and w are positive constants. The Euler equation is now written as (15.48) where the general solution can be immediately written as
y(z) = c1 cospx
+ c2 s i n p z , p2 = -.PW2 T
(15.49)
We first consider the boundary conditions where the values of y(x) are prescribed at both end points as y(0) = 0 and y(l) = 1,
(15.50)
which determines the integration constants as c1 = 0
and c2
1 sin pZ ’
=-
(15.51)
thus giving the stationary function dz)=
sin ,Ox m’ pl # 0,
7r,
2T,. . . .
(15.52)
Next, we consider the case where y(z) is prescribed at one of the end points, z = I , as y(Z) = 1 but left free at the other end point. We now use Equation (15.35) with F = Tyt2 - pw2y2 to write the second condition to be imposed as (15.53) to obtain the solution
Y(.)
=
cos px
37r 2 2
7r
plf -,-,...
( 15.54)
Finally, when neither y ( 0 ) nor y(l) are prescribed, we impose the natural boundary conditions [Eq. (15.33)]: = 0; 2Ty’(O) = 0,
(15.55)
= 0; 2Ty’(1) = 0,
(15.56)
VARIATIONAL NOTATION
645
which yields the stationary function as
d X )=
{ c“:
cospz,
p1 # T , 2T, . . . , pz = T,2T,. .. ,
(15.57)
where c1 is an arbitrary constant. 15.4
VA R IAT I0 NA L NO T A T I0N
So far we have confined our discussion of variational analysis t o functionals of the form J[Y(Z)l
=
1-
F ( x ,Y, Y’) dx.
(15.58)
In order t o be able to discuss more general cases with several dependent and several independent variables and functionals with higher-order derivatives, we introduce the variational differential “6”.Compared t o “d” , which indicates an infinitesimal change in a quantity between two infinitesimally close points: dx, d f ( x ) , etc., the variational differential, 6,denotes an infinitesimal change in a quantity between two neighboring paths: S y , S F ( x , y, y’), etc. We can write S F ( z ,y, 9’) as SF
= F ( x ,Y
+ E r / ( X ) , Y’ + E V ’ ( X ) )
- F ( x ,Y, Y’).
(15.59)
Expanding the first term on the right-hand side in the neighborhood of ( z ,y, y’) and keeping only the first-order terms in E we obtain
dF SF=-E~(z) dY
dF + ?E~/’(X), dY
(15.60)
which, after substituting Sy = E~/(x)[Eq. (16.24)], becomes (15.61) Note that this expression is consistent with the assumption 6x = 0. We now consider the variational integral
SJ = 6
1:
F ( x , y , y ’ ) dx.
(15.62)
Since the variation of the independent variable, Sx, and the variation of the end points, 6x1 and 6x2, are zero, we can interchange the variation and the integration signs t o write (15.63)
646
CALCULUS OF VARIATIONS
hence (15.64) (15.65) Since the derivative with respect to an independent variable and the variation sign can be interchanged:
6 -
d
(15.66)
= -"sy],
dx
we can also write (15.67) After integrating the second term on the right-hand side by parts, we obtain (15.68) This is the analog of the first differential A(')z ; hence we also write it as S ( ' ) F ( x y, , y'). For the stationary functions we set the first variation to zero, thus obtaining
Since the above equation has to be satisfied for any arbitrary variation, Sy,we reach the same conclusions introduced in Section 15.2 as cases I and 11. If we notice that Sy = q ( x ) , which reduces Equation (15.69) t o Equation (15.22), the connection between the two approaches becomes clear. It can be easily verified that the laws of variation of sums, products, ratios, powers, etc., are completely analogous to the corresponding laws of differentiation:
+
dF dF 6 F = - Sy + - Sy',
(15.70)
+
(15.71)
dY
8Y'
S(F1 F2) = SFi 6F2, S(FlF2) = F1 6F2 F2 SFi,
+
(15.72) (15.73)
SF" = nFn-' SF.
(15.74)
So far nothing has been said about the stationary function being a maximum or a minimum. As in ordinary extremum problems, the answer comes from a
A MORE GENERAL CASE
647
detailed analysis of the second variation, J , which is obtained by keeping the second-order terms in Equation (15.11), where Y ( Z , E ) t o second order is now taken as y ( x , E ) = y ( x , 0)
+
T](Z)E
1 + -rl'(Z)E2. 2
(15.75)
However, in most of the physically interesting cases, this analysis, which could be quite cumbersome, can be evaded. I t is usually intuitively clear that the particular stationary function at hand is the needed maximum or minimum. In other cases, where we may have more than one stationary function, we can pick the one that actually maximizes or minimizes the given functional by direct substitution. I t is also possible that in some cases we are only interested in the extremum, regardless of it being a maximum or a minimum.
15.5
A M O R E GENERAL CASE
Consider the functional
where there are two independent variables, x and y, and two dependent variables, u(x,y ) and v ( x ,y ) . Using variational notation, we write the first variation, S ( ' ) J , and set it to zero for stationary functions:
dv Here we assume that 6u and 6v are continuous and differentiable functions over the region R. We now consider a typical term, which involves variation of a derivative: (15.78) and write it as
648
CALCULUS OF VARIATIONS
Using Green's theorem [Eqs. (2.253)-(2.257)], we can write the following useful formulas:
/L
(15.80)
dxdy =
(15.81)
where 0 represents the angle between the positive x-axis and the outward normal to the boundary, C, of the region R and ds is the infinitesimal arclength along d b . Using Equation (15.80), we can transform the first integral on the right-hand side of Equation (15.79) t o write
$6
(2)
dF cos0 Su ds dux
dxdy =
(15.82) Treating other such terms in Equation (15.77) with the variation of a derivative, we can write the equation t o be satisfied by the stationary functions as
(15.83) If the values of the functions u and v are prescribed on the boundary, then the first integral over the boundary C vanishes. Since the variations Su and SV are arbitrary, the only way t o satisfy Equation (15.83) is to have
dF
d
d
dF
du - dJ: ( G ) dy -
),.(
dF =O
(15.84)
and (15.85) These are now the Euler equations to be solved simultaneously for the stationary functions u(x,y) and v(x,y). For this case the natural boundary conditions are given as
dF
dF .
dux
dUY
-cos8 + -SlnO = 0 on C
(15.86)
A MORE GENERAL CASE
Figure 15.2
649
Minimal surfaces.
when u(x, y) is not prescribed on C and
( 15.87) when u(x, y) is not prescribed on C. Generalization t o cases with m-dependent and n-independent variables is obvious (Hildebrand) .
Example 15.3. Minimal surfuces: Consider a smooth surface, z = z(x, y), bounded by a simple closed curve C (Fig 15.2), whose surface area can be written as (Chapters 2 and 3)
where R,, is the region in the xy-plane bounded by the projection C,, onto the xy-plane. To find the minimal surface with the minimal area we have to extremize S with
F=/l+(E)2+($),
2
(15.89) where we have one dependent, z , and two independent, x and y, variables. Using Equation (15.84) for the only Euler equation we have, we write
(1+ z:
+
= 0,
(15.90)
= 0,
(15.91)
650
CALCULUS OF VARIATIONS
which can be simplified as
+
(1 z;)z,,
-
2z,zyzzy
Example 15.4. Laplace Equation: following variational problem:
+ (1+ z,)zyy 2
= 0.
(15.92)
Let us find the Euler equation for the
( 15.93) where the only dependent variable, 4(z, y, z ) , takes prescribed values on the boundary of R. Using F = (4: 4; 42) and the generalization of Equation (15.84) to three independent variables as
+ +
(15.94) we obtain (15.95) which is the Laplace equation.
Example 15.5. The inverse problem: Sometimes we are interested in writing a variational integral whose Euler equation is a given differential equation. For example, consider the differential equation
dx ( F 2 ) + p w 2 y + p
= 0,
(15.96)
which is the governing equation for the small displacements of a rotating st,ring of length I , where y(z) represents the displacement from the axis of rotation, F ( x ) is the tension, p(x) is the linear mass density, W ( X ) is the angular velocity of the string, and p(x) is the intensity of the load distribution. To find the corresponding variational problem, we multiply the above equation with 6y(z) and integrate over ( 0 , l ) :
The second and third integrals are nothing but
Using integration by parts, the first term in Equation (15.97) can be written as
HAMILTON'S PRINCIPLE
651
Substituting Equations (15.98) and (15.99) into Equation (15.97), we obtain
If the boundary conditions imposed are such that the surface term vanishes: (15.101) that is, when y(z) is prescribed at both end points, which implies
6y(O)
(15.102)
= Sy(l) = 0
or when the natural boundary conditions are imposed:
(15.103) the variational integral that produces Equation (15.96) as its Euler equation becomes
6 l [ ~ p w z y +z p y 15.6
-
5F
(3'1 dz=O.
(15.104)
HAMILTON'S PR INC IPL E
In the previous example we considered the inverse problem, where we started with a differential equation and obtained a variational integral whose Euler equation is the given differential equation. Let us now apply this to classical mechanics. In Newton's theog, a single particle of mass m, moving under the influence of a force field, f , follows a path that satisfies the equation of motion
d27 + + m dty - f = 0 , T ( t l )= 7
-+
1 , T
( t z ) = ?2,
(15.105)
where 7 1 and ?z are the initial and the final positions of the particle. Let us now consider variations, 6 7 ,about the actual path ?(t). Since the end points are prescribed, we take the variations at the end points as zero: 6 T + ( t , ) = S?"(tz)
= 0.
( 15.106)
At any intermediate time the varied path is given by
7 ' t )+ 6 7 ( t ) .
(15.107)
652
CALCULUS OF VARIATIONS
We now take the scalar product of the equation of motion [Eq. (15.105)] with 6 7 ( t ) and integrate over time from tl to t 2 :
(15.108) d Integrating by parts and interchanging - and 6 the first term becomes dt
(15.109) Since variation at the end points is zero, the surface term vanishes. Utilizing the relation (15.110) we can write (15.111) (15.112) = -61; T d t ,
( 15.113)
(s) - 2
where T = 4m is the well-known expression for the kinetic energy and we have interchanged the variation and the integral signs. In the light of these, Equation (15.108) becomes
( 15.114) Furthermore, if the particle is moving under the influence of a conservative force field, we can introduce a potential energy function, V(?;'),such that 4
f
w e can now write
f.
=--
dV
d 7 '
6as ~
= -6V
(15.115)
LAGRANGE'S EQUATIONS OF MOTION
653
which gives the final form of Equation (15.114) as
q2
(T - V) d t = 0.
(15.116)
This is called Hamilton's principle. It basically says that the path that a particle follows is the one that makes the integral (T - V) d t stationary, that is, an extremum. In most cases the extremum is a minimum. The difference (T - V) is called the Lagrangian, L , and the functional
st:'
rt2
(15.117) is called the action. Hamilton's principle can be extended t o systems with
N particles by writing Equation (15.114) for a single particle and then by summing over all particles within the system:
where T k is the net force acting on the kth particle (Hildebrand; Goldstein, Poole, and Safko). 15.7
LAGRANGE'S EQUATIONS OF M O T I O N
In a dynamical system with n degrees of freedom, it is usually possible to find n independent parameters called the generalized coordinates that uniquely describe the state of the system. For example, for a pendulum consisting of a mass m suspended by a string of fixed length I , the position of the mass can be completely specified by giving the angle that the string makes with the vertical (Fig. 15.3). If we were t o use Cartesian coordinates of m, the x and y coordinates will not be independent, since they are related by the constraining equation
x2 + y2
(15.119)
= 12,
which indicates that the length of the string is fixed. Similarly, for a double pendulum the two angles, 81 and Q2, are sufficient to completely specify the state of the system a t any given time (Fig. 15.3). In this case, the four Cartesian coordinates, ( X I y1) , for ml and (x2, y2) for m2, are related by the two constraining equations:
x;
(22 -
+ (y2
-
+ y:
= 19,
(15.120)
.z;
(15.121)
y1)2 =
654
CALCULUS OF VARIATIONS
't
X
0
*
2
m2 Generalized coordinates for the simple pendulum (left) and for the Figure 15.3 double pendulum (right).
In general, the total kinetic energy of a system will be a function of the n generalized coordinates, q l , q 2 , . . . , q,, and the n-generalized velocities, . . q l , q z , . . . , q,. In terms of generalized coordinates the f .S?;' term in Equation (15.114) is replaced by Q1Sq1
+ Q 2 . h + . . + QnSqn,
(15.122)
where Qi are the components of the generalized force. For conservative systems we can define a generalized potential energy function, which depends only on the generalized coordinates as
( 15.123)
V(41, q 2 , . . . , q n ) , where the generalized forces are given as
(15.124) In terms of generalized coordinates, Hamilton's principle,
( 15.125) determines the dynamics of an n-dimensional system by the n Euler equations:
=o,
i=l,2,
" . , n,
(15.126)
which are now called Lagrange's equations of motion. In addition to the conservative forces, if the system also has nonconservative forces, Q i , &I,. . . , QL,
LAGRANGE’S EQUATIONSOF MOTION
655
Lagrange’s equations become
= Q : , i = 1 , 2 , . . . , n.
( 15.127)
In general, Lagrangian formulation of mechanics is given in terms of Hamilton’s principle as
( 15.128) where 41, q 2 , . . . , qn are the n independent generalized coordinates. The corresponding n Lagrange’s equations of motion,
= 0 , i = 1 , 2 , . . . ,n,
(15.129)
have to be solved simultaneously with the appropriate boundary conditions. As we have seen, for conservative systems the Lagrangian can be written as
L=T-V.
(15.130)
Example 15.6. Simple penduZum: For a simple pendulum (Fig. 15.3) the kinetic energy is written as 1 T = -rn(lb)’. 2
(15.131)
In the absence of friction we can write the gravitational potential energy with respect t o the equilibrium position as
v = rngl(1-
COS0),
(15.132)
which leads t o the Lagrangian 1 L = -m(lb)2 - rngl(1 - COS0). 2
( 15.133)
The corresponding Lagrange’s equation for 0 is now written as
d d0
-(rnl’b)
-
o + rngl sin 0 = 0,
(15.134)
0 + - 9s i n 0 = 0 ,
(15.135)
”
1
which is the well-known result for the equation of motion of a simple pendulum. For small oscillations, sin0 N 0, the solution is given as 0 ( t ) = a0 cos(wt+al), where w2 = g/1 and a0 and a1 are the integration
656
CALCULUS OF VARIATIONS
constants t o be determined from initial conditions. Solutions for finite oscillations are given in terms of elliptic functions (Bradbury). Example 15.7. Double pendulum: For the double pendulum in Figure 15.3, we write the Cartesian coordinates of the mass ml as
x1 = ll s i n & , y1
=
-11 cos&
(15.136)
and for m2 as 52 =
11sin01
+ 12sin02, y2 = -11cos81
-12~0~82.
( 15.137)
The total kinetic energy of the system is now written as 1 2
2
T = -rn1(i1
1 .2 .2 + $1)2 + -m2(22 + Y2) 2
(15.138)
while the total potential energy is
V = migyi - - (ml
+ rn2gy2
+ ma)g11 cos 01
-
(15.139) ( 15.140)
m2912 cos e2.
Using Equations (15.138) and (15.140)' we can write the Lagrangian, L = T - V, and obtain Lagrange's equation of motion for 01 as
(15.141)
+
+(mi m2)gsinQl = o
( 15.142)
and for 02 as d dt
.2
LlQl cos(O1 - 6'2)
+ 1282 - 1119, sin(&
-
02) + gsin82
= 0.
(15.144)
We have now two coupled differential equations [Eqs. (15.142) and (15.144)] t o be solved for & ( t ) and &(t).
DEFINITION OF LAGRANGIAN
15.8
657
D E F I N I T I O N OF LAGRANGIAN
We have seen that for one dimensional conservative systems in Cartesian coordinates, Hamilton's principle is written as t2
[
61 = 6
( 15.145)
L ( x , i , t ) dt
l:' Jt,
=
6L(x,Z , t ) dt
= 0,
(15.146)
where the Lagrangian is defined as L ( x , x , t ) = T - V . Let us now consider the following Lagrangian: 1 .2 L' = -mx 2
+ kxxt,
which does not look like anything familiar. However, if write the corresponding Lagrange's equation of motion, namely
(15.147) d + kxt) - k x t dt m x + lcxt kx - kxt m x kx -(mX
+
+
= 0,
(15.148)
= 0,
(15.149) (15.150)
= 0,
we see that it is nothing but the harmonic oscillator equation of motion for a mass m attached to a spring with the spring constant k . This equation of motion can also be obtained from the well-known Lagrangian 1
.
2
L = -mx 2
-
1
-kx 2
2
,
(15.151)
where V = +kx2is the familiar expression for the potential energy stored in the spring. If we look a little carefully, we notice that the two Lagrangians are related by the equation
L
= L' - kxxt =L
'f-
-
1 -kx2 2
1
--kx
ddt[
t ,
(15.152)
2 ]
where they differ by a total time derivative. In fact, from Hamilton's principle,
6I =
1;'
6L dt
= 0,
(15.153)
658
CALCULUS OF VARIATIONS
it is seen that the definition of Lagrangian is arbitrary up to a total time derivative. That is, two Lagrangians, L(q,4, t ) and L'(q, i,t ) ,that differ by a total time derivative,
d
L'(q, 4, t ) = L(q, ;I,t ) + ZwI,4, t ) ,
(15.154)
have the same equation of motion. The proof follows directly from Hamilton's principle: (15.155)
( 15.156) (15.157) =S
1:
L dt
+6F(t2)
-
6F(tl).
(15.158)
Since the variation at the end points is zero, the last two terms, 6F(t2) and S F ( t l ) ,vanish, thus yielding
6 1 ; L' dt
= 61;
L dt,
(15.159)
which means that, the two Lsgrangians, L and L', have the same equation of motion. For an n-dimensional system, we replace q by 41, q 2 , . . . ,qn, and replace q by Sl,q 2 , . . . ,qn, but the conclusion remains the same. In this regard, in writing a Lagrangian we are not restricted with the form T - V. In special relativity the Lagrangian for a freely moving particle is given as
L=
rn;cqq,
(15.160)
which is not the relativistic kinetic energy of the particle. In general relativity, Einstein's field equations are obtained from the variational integral (15.161) where R is the curvature scalar and g is the determinant of the metric tensor. In electromagnetic theory, Hamilton's principle is written as
6 1 1 t 2L d t = 61.6'
[//L
( - & F ~ ~ F- ~c ~
( 15.162)
PRESENCE OF CONSTRAINTS IN DYNAMICAL SYSTEMS
659
where FOP is the field strength tensor and J“and A” are the four current and four potential, respectively. Lagrangian formulation of mechanics makes applications of Newton’s theory t o many-body problems and t o continuous systems possible. It is also an indispensable tool in classical and quantum field theories.
15.9
PRESENCE OF CONSTRAINTS IN D Y N A M I C A L SYSTEMS
For a system with n degrees of freedom, one can in principle find n independent generalized coordinates to write Hamilton’s principle as
6 J ( q i ,q i , t ) = 6
1:
L(qi,q i , t ) d t = 0.
(15.163)
Using the condition that the variation at the end points vanish: S q z ( t 1 ) = Sqi(t.2) = 0, i
= 1,2,.
. . ,n,
( 15.164)
we write
(15.165)
Since 6qi are independent and arbitrary, the only way to satisfy Equation (15.165) is to have all the coefficients of 6qi vanish simultaneously:
dL dqi
--
=o,
i=1,2
, . ” ) 12,
( 15.166)
thus obtaining n Lagrange’s equations to be solved for the n independent generalized coordinates qi ( t ) . Sometimes we may have k constraints imposed on the system:
4l~(41q,2 , . . . > qn) = 0.
( 15.167)
In this case the n 6qis in Equation (15.165) are no longer independent, hence we cannot set their coefficients t o zero. One way to face this situation is to find a reduced set of ( n- k ) generalized coordinates that are all independent.
660
CALCULUS OF VARIATIONS
However, this may not always be possible or convenient. In such cases, a practical way out is to use the Lagrange undetermined multipliers. Using the constraining equations [Eq. (15.167)], we write the following variational conditions:
(15.168) In short, these k equations can be written as (15.169) We now multiply each one of these equations with a Lagrange undetermined multiplier, XI, to write A1
c
841
i= 1
-Sqz dqi
= 0)
1 = 1 , 2 , .. . , k ,
(15.170)
and then add them to have (15.171) Integrate this with respect to time from
t1
to
t2:
(15.172) and then add to Equation (15.166) t o get
SJ
=
lr2 [ i= 1
k
d
(dG L ) % d L + X A L 6qi ~ d]t = 0. -
(15.173)
1=1
In this equation Sqi are still not independent. However, we have the k Lagrange undetermined multipliers, &, at our disposal to set the quantities inside the square brackets to zero:
( 15.174)
PRESENCE OF CONSTRAINTSI N DYNAMICALSYSTEMS
661
m2 g Figure 15.4
Constraining force.
+
thus obtaining new Lagrange’s equations. We now have n k unknowns, that is, the n qis and the k Lagrange undetermined multipliers. We can solve for these n+ k unknowns by using n Lagrange’s equations and the k constraining equations. If we write Lagrange’s equations [Eq. (15.174)] as
( 15.175) and compare with Equation (15.127), we see that the right-hand side,
behaves like the ith component of a generalized force. In fact, it represents the constraining force on the system (Goldstein, Poole, and Safko). Example 15.8. Constraining force: Consider the simple pulley shown in Figure 15.4, which has only one degree of freedom. We solve the problem with two generalized coordinates, q1 and 42, and one constraining equation: q1
+ 42 = 1,
(15.176)
that is, d( qi , 42) = Q1
+ 42 - 1 = 0,
which indicates that the string has fixed length. grangian as
L = -1 2 mlql 2
(15.177) We write the La-
.2 + -m2q2 + m l q l g -k m 2 q 2 g . 2
662
CALCULUS OF VARIATIONS
Now Lagrange's equation of motion for
q1
becomes (15.178)
( 15.179) m1ql
and for
92
-
mlg
=
(15.180)
-A
we obtain
( 15.181) d dt
-(m2q2)
-
m2g
m2q2 - m 2 g
= -4
(15.182)
= -A.
( 15.183)
All together, we have the following 3 equations to be solved for q l , q 2 , and X : (15.184) (15.185) ( 15.186)
mlql - mlg = -A, m2q2 - m 2 g = -A, q1
+ q2 -
= 0.
Eliminating X from the first two equations and using third gives the accelerations ql = -q2 =
G1 = -q2
ml - m2
from the
( 15.187)
Using this in Equation (15.184) or (15.185), we obtain X as (15. Note that in this case -A; namely (15. 89)
( 15.190) corresponds to the tension, T , in the rope, which is the constraining force.
15.10
CONSERVATION LAWS
Given the Lagrange equations:
= 0 , i = 1 , 2 , . . . , n,,
(15.191)
PROBLEMS
663
if the Lagrangian:
L(q1,q2,... , Q n , 4 1 , 4 2 , . . . ,in,t),
(15.192)
does not depend on the ith generalized coordinate explicitly, then dL/dqi corresponds to a conserved quantity: dL
- = c.
(15.193)
a4i
In Cartesian coordinates, one dimensional motion of a free particle is governed 2 by the Lagrangian L = 3mx , hence the conserved quantity: dL
- = mx = C,
(15.194)
dX
corresponds to the linear momentum of the particle. In general, pi = dL/dqi, ,i = 1 , 2 , .. . , n, corresponds to the ith generalized momentum of the system (Prob. 15.16). PROBLEMS 1. Determine the dimensions of a rectangular parallelepiped, which has sides parallel to the coordinate planes with the maximum volume and that is inscribed in the ellipsoid
x2
y2 22 + - + - = 1. a2 b2 c2
-
2. Find the lengths of the principal semiaxes of the ellipse c1x2
+ 2c2xy + c 3 y 2 = 1, c,c, > c;.
3. Consider solids of revolution generated by all parabolas passing through (0,O) and ( 2 , l ) and rotated about the x-axis. Find the one with the least volume between 2 = 0 and z = 1.
Hint: Take y
=z
+ Coz(2
-
x) and determine CO.
4. In the derivation of the alternate form of the Euler equation verify the equivalence of the relations
d2F dy'2
and
(E)
+
d2F
Byay'
(2) dzdy' d2F
+
dF
-
dy = o
664
CALCULUS OF VARIATIONS
5. Write the Euler equations for the following functionals:
(9 F
=
+
2 ~ ’zyy’ ~ - y2,
(ii)
F = yf2+ csin y, (iii)
F = x3yf2- xzy2 + 2yy’,
6. A geodesic is a curve on a surface, which gives the shortest distance between two points. On a plane, geodesics are straight lines. Find the geodesics for the following surfaces: (i) Right circular cylinder. (ii) Right circular cone. (iii) Sphere.
7. Determine the stationary functions of the functional
for the following boundary conditions: (i) The end conditions y(0) = 0 and y(1) = 1 are satisfied.
(ii) Only the condition y(0) = 0 is prescribed. (iii) Only the condition y(1) = 1 is prescribed.
(iv) No end conditions are prescribed.
8. The brachistochrone problem: Find the shape of the curve joining two points, along which a particle initially a t rest falls freely under the influence of gravity from the higher point t o the lower point in the least amount of time. 9. Find the Euler equation for the problem
6lr2F ( z ,y,y’,y’’) d z = 0
PROBLEMS
665
and discuss the associated boundary conditions. 10. Derive the Euler equation for the problem
subject to the condition that u ( x ,y ) is prescribed on the closed boundary of R.
11. Derive the Euler equation for the problem
F ( X , Y , U , U x ; ~ y ; ~ x x , ~ x y , U ydXdY y) =
0.
What are the associated natural boundary conditions in this case. 12. Write Hamilton's principle for a particle of mass m moving vertically under the action of uniform gravity and a resistive force proportional to the displacement from a n equilibrium position. Write Lagrange's equation of motion for this particle. 13. Write Hamilton's principle for a particle of mass m moving vertically under the action of uniform gravity and a drag force proportional to its velocity. 14. Write Lagrange's equations of motion for a triple pendulum consisting of equal masses, m, connected with inextensible strings of equal length 1. Use the angles @ I , & , and 0 3 that each pendulum makes with the vertical as the generalized coordinates. For small displacements from equilibrium show that the Lagrangian reduces to
15. Small deflections of a rotating shaft of length 1, subjected to an axial end load of P and transverse load of intensity p ( x ) is described by the differential equation d2 (
E I S )
+ Pdz" d2Y
-
pw2y - p ( x )
= 0,
dx2 where E I is the bending stiffness of the shaft, p is the density and w is the angular frequency of rotation. Show that this differential equation can be obtained from the variational principle
6
s' [ 0
~ E I Y"~ -Pyf2 1 - -pw 1 2y2 2 2 2
-
py
666
CALCULUS OF VARIATIONS
What boundary conditions did you impose? For other examples from the theory of elasticity see Hildebrand. 16. A pendulum that is not restricted t o oscillate on a plane is called a spherical pendulum. Using spherical coordinates, obtain the equations of motion corresponding t o T , 0,4:
mgcosQ-T=-m(s8
.2
.2
+ssin284), .2
-mg sin 8 = m(s0 - s sin 0 cos 04 ),
o=--
(ms2sin28$), s sin 8 d t where s is the length of the pendulum and T is the tension in the rope. Since there are basically two independent coordinates, 6 and 4, show that the equations of motion can be written as .2 0 - 4 s i n 0 c o s 8 + -9s i .n 8 = 0 and ms2sin28$ = I , S
where 1 is a constant. Show that the constant 1 is actually the ponent of the angular momentum: Z3
23
com-
= m(xl22 - 2 1 x 2 )
17. When we introduced the natural boundary conditions we used Equation (15.22):
where for stationary paths we set
and
Explain.
18. Write the components of the generalized momentum for the following problems: (i) plane pendulum, (ii) spherical pendulum, (iii) motion of earth around the sun and discuss whether they are conserved or not.
CHAPTER 16
PROBABILITY THEORY AND DISTRIBUTIONS
Probability theory is the science of random events. It has long been known that there are definite regularities among large numbers of random events. In ordinary scientific parlance, certain initial conditions, which can be rather complicated, lead to certain events. For example, if we know the initial position and velocity of a planet, we can be certain of its position and velocity a t a later time. In fact, one of the early successes of Newton’s theory was its prediction of the solar and lunar eclipses for decades and centuries ahead of time. An event that is definitely going t o happen when certain conditions are met is called certain. If there is no set of conditions that could make an event happen, then that event is called impossible. If under certain conditions an event may or may not happen, we call it random. From here, it is clear that the certainty, impossibility, and randomness of an event depends on the set of existing conditions. Randomness could result from a number of reasons. Some of these are the presence of large numbers of interacting parts, insufficient knowledge about the initial conditions, properties of the system, and also the environment. Probability is also a word commonly used in everyday language. Using the available information, we often base our decisions on how probable or Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
8. Selquk Bayin 667
668
PROBABILITY THEORY AND DISTRIBUTIONS
improbable we think certain chains of events are going t o unravel. The severity of the consequences of our decisions can vary greatly. For example, deciding not to take an umbrella, thinking that it will probably not rain, may just result in getting wet. However, in other situations the consequences could be much more dire. They could even jeopardize many lives including ours, corporate welfare, and in some cases even our national security. Probability not only depends on the information available but also has strong correlations with the intended purpose and expectations of the observer. In this regard, many different approaches to probability have been developed, which can be grouped under three categories: (1) Definitions of mathematical probability based on the knowledge and the priorities of the observer. (2) Definitions that reduce to a reproducible concept of probability for the same initial conditions for all observers. This is also called the classical definition of probability. The most primitive concept of this definition is equal probability for each possible event or outcome. ( 3 ) Statistical definitions, which follow from the frequency of the occurrence of an event among a large number of trials with the same initial conditions. We discuss (1) in Chapter 17 within the context of information theory. In this chapter, we concentrate on (2) and (3), which are related. 16.1
INTRODUCTION T O PROBABILITY THEORY
Origin of the probability theory can be traced as far back as to the communications between Pascal (1623-1662) and Fermat (1601-1665). An extensive discussion of the history of probability can be found in Todhunter. Modern probability theory is basically founded by Kolmogorov (1903-1987). 16.1.1
Fundamental Concepts
We define a sample space, S, as the set of all possible outcomes, A, B , . . . , of an experiment:
S
=
{ A ,B , . . . }.
(16.1)
If we roll a die, the sample space is
s = {1,2,3,4,5,61,
(16.2)
where each term corresponds to the number of dots showing on each side. We obviously exclude the events where the die breaks or stands on edge. These events could be rendered as practically impossible by adjusting the conditions. For a single toss of a coin the sample space is going to be composed of two elements.
S
=
{head, tail}.
(16.3)
INTRODUCTION TO PROBABILITY THEORY
669
An event, E, is defined as a set of points chosen from the sample space. For example,
E
=
{1,3,5}
corresponds to the case where the die comes up with an odd number, that is, either a 1, a 3 or a 5. An event may be a single element, such as
E = {4),
(16.4)
where the die comes up 4. Events that are single elements are called simple or elementary events. Events that are not single elements are called compound elements. 16.1.2
Basic Axioms of Probability
We say that S is a probability space if for every event in S we could find a number P(E)with (1) P ( E ) 2 0, (2) P ( S ) = 1, (3) If El and E2 are two mutually exclusive events in S, that is, their intersection is a null set, El n Ez = 8, then the probability of their union is the sum of their probabilities:
+
P(E1 u E2)= P(E1) P(E2).
(16.5)
Any function P ( E ) satisfying the properties (1)-(3) defined on the events of S is called the probability of E. These axioms are due to Kolmogorov. They are sufficient for sample spaces with finite number of elements. For sample spaces with infinite number of events, they have to be modified. 16.1.3
Basic Theorems of Probability
Based on these axioms, we now list the basic theorems of probability theory. Proofs can be found in Harris.
Theorem 16.1. If El is a subset of E s , that is, El
P(E1) < P ( E 2 ) . Theorem 16.2. For any event E in S ,
Theorem 16.3. Complementary set, E", of E is
P ( E " )= 1 - P(E),
c E2, then
670
PROBABILITY THEORY AND DISTRIBUTIONS
where EC
+ E = S , also shown as E" U E = S.
Theorem 16.4. Probability of no event happening is zero, that is,
P ( 0 )= 0 , where 0 denotes the null set. Theorem 16.5.For any two events in S , we can write
Theorem 16.6. If E l , E2, . . . , Em are mutually exclusive events with Ei n Ej = 0, 1 5 i, j 5 m and i # j , then m
P(E1 U E2 U . . . U E m ) =
C P(Ei).
(16.6)
i=l
So far we said nothing about how to find P ( E ) explicitly. When S is finite with elementary events E l , E 2 , .. . , E m , any choice of positive numbers P I ,P2, . . . , P,, satisfying m
C P i 4
(16.7)
i= 1
will satisfy the three axioms. In some situations, symmetry of the problem allows us to assign equal probability to each elementary event. For example, when a die is manufactured properly, there is no reason to favor one face over the other. Hence, we assign equal probability t o each face as 1
p1 = p2 = . . . = p(j = -. 6
(16.8)
To clarify some of these points, let us choose the following events for a die rolling experiment:
Ei
=
{1,3,5}, E2
=
{1,5,6}, and E3
=
{3},
(16.9)
which give
1 1 1 1 P(E1) = - - - = -, 6 6 6 2 1 1 1 1 P(E2) = - - - = 6 6 6 2 ' 1 P(E3) = -. 6
+ + + +
(16.10)
INTRODUCTION TO PROBABILITY THEORY
671
Note that in agreement with Theorem 16.1 we have
We can also use
EE
=
{ 2,4,6), Eg
=
{ 1 , 2 , 4 , 5 , 6 ) , El U E2
=
{1,3,5,6),
(16.12)
and
El n E2 = {1,5}
(16.13)
to write (16.14) 1 6
5 6'
P(E,")= 1 - P(E3) = 1 - - = -
(16.15)
which are in agreement with Theorem 16.3. Also in conformity with Theorem 16.5. we have
+
P(E1 U E2) = P(E1) P(E2) - P(E1 n E2)
(16.16)
( 16.17) For any finite sample space with N elements,
for an event
E = El U E2 U E3 U . . . U Em, m 5 N , where E l , Ez,. . . ,Em are elementary events with Ei n Ej we can write
(16.19) =
8, 1 5 i , j 5 m,
m
P ( E )= C P ( E i ) .
(16.20)
2x1
If the problem is symmetric so that we can assign equal probability to each elementary event, &, then the probability of E becomes
m P ( E )= -, N
(16.21)
where m is the number of elements of S in E . In such cases, finding P ( E ) reduces to simply counting the number of elements of S in E . We come back to these points after we introduce permutations and combinations.
672
PROBABILITY THEORY AND DISTRIBUTIONS
16.1.4
Statistical Definition of Probability
In the previous section using the symmetries of the system, we have assigned, a priori, equal probability to each elementary event. In the case of a “perfect” die, for each face this gives a probability of 1/6 and for a coin toss it assigns equal probability of 1/2 to the two possibilities: heads or tails. An experimental justification of these probabilities can be obtained by repeating die rolls or coin tosses sufficiently many times and by recording the frequency of occurrence of each elementary event. The catch here is, How identical can we make the conditions and how many times is sufficient? Obviously, in each roll or toss, the conditions are slightly different. The twist of our wrist, the positioning of our fingers, etc., all contribute to a different velocity, position and orientation of the die or the coin at the instant it leaves our hand. We also have to consider variation of the conditions where the die or the coin lands. However, unless we intentionally control our wrist movements and the initial positioning of the die or the coin to change the odds, for a large number of rolls or tosses we expect these variations, which are not necessarily small, to be random and to cancel each other. Hence, we can define the statistical probability of an event E as the frequency of occurrences:
P ( E ) = lim n-oo
number of occurrences of E n
(16.22)
Obviously, for this definition to work, the limit must exist. It turns out that for situations where it is possible to define a classical probability, fluctuation of the frequency occurs about the probability of the event and the magnitude of fluctuations die out as the number of tries increases. There are plenty of data to verify this fact. In the case of a coin toss experiment the probability for heads or tails quickly converges to 1/2. In the case of a loaded die, frequencies may yield the probabilities: 1 1 1 1 1 1 P(l)= -, P ( 2 ) = -, P(3) = -, P ( 4 ) = -, P ( 5 ) = -, P ( 6 ) = 2 8 4 16 32 32’ (16.23)
which also satisfy the axioms of probability. In fact, one of the ways t o find out that a die is loaded or manufactured improperly is to determine its probability distribution. When it comes to scientific and technological applications, classical definition of probability usually runs into serious difficulties. First of all, it is generally difficult to isolate the equiprobable elements of the sample space. In some cases the sample space could be infinite with infinite number of possible outcomes or the possible outcomes could be distributed continuously, thus making it difficult, if not always impossible, to enumerate. In order to circumvent some of these difficulties, the statistical probability concept comes in very handy.
INTRODUCTION TO PROBABILITY THEORY
673
16.1.5 Conditional Probability and Multiplication Theorem Let us now consider an experiment where two dice are rolled, the possibilities are given as
The first number in parentheses gives the outcome of the first die, and the second number is the outcome of the second die. If we are interested in the outcomes, { A } ,where the sum is 6, obviously there are 5 desired results out of a total of 36 possibilities. Thus the conditional probability is
5 P ( A ) = -. 36
(16.24)
We now look for the probability that the sum 6 comes (event A ) , if it is known that the sum is an even number (event B ) . In this case the sample space contains only 18 elements. Since we look for the event A after the event B has been realized, the probability is given as 5 P ( A / B )= -. 18
(16.25)
P ( A / B ) ,which is also shown as P ( A 1 B ) , is called the conditional probability of A. It is the probability of A occurring after B has occurred. Let us now generalize this to a case where {Cl,( 3 2 , . . . , en}is the set of uniquely possible, mutually exclusive and equiprobable events. Among this set let rri
5 n denote the number of events acceptable by A,
k: 5 n denote the number of events acceptable by B ,
r denote the number of events acceptable by both A and B . We show the events acceptable by both A and B as AB or A n B . Obviously, r 5 k: and r 5 m. This means that the probability of A happening after B has happened is
r P ( A / B )= k
r/n k/n
= -=
~
P(AB) P(B)
(16.26)
674
PROBABILITY THEORY AND DISTRIBUTIONS
Siiiiilarly.
( 16.27) Note that if P ( B ) is an impossible event, that is, P ( B ) = 0, then Equation (16.26) becomes meaningless. Similarly, Equation (16.27) is meaningless when P ( A ) = 0. Equations (16.26) and (16.27), which are equivalent, represent the multiplication theorem, and we write them as
P ( A B ) = P(A)P(B/A) = P(B)P(A/B).
(16.28)
For independent events, that is, the occurrence of A (or B) is independent of B (or A ) occurring, then the multiplication theorem takes on a simple form:
P ( A B )= P ( A ) P ( B ) .
(16.29)
For cxample. in an experiment we first roll a die, A, and then toss a coin, B. Clearly, the two events are independent. The probability of getting the iiuiiiber 5 in the die roll and a head in the coin toss is
P ( A B )=
1 1
6 2
=
1 12
-.
(16.30)
16.1.6 Bayed Theorem Let us iiow consider a tetrahedron with its faces colored as the first face is red, A, the second face is green: B , the third face is blue, C, and finally the fourth face is in all three colors, ABC. In a roll of the tetrahedron the color red has the probability 1 1 1 P ( A ) = - + - = -. 4 4 2
(16.31)
This follows from the fact that the color red shows in 2 of the 4 faces. Similarly, wc can write the following probabilities: 1 2
P ( B ) = P ( C ) = -, 1 P ( A / B ) = P ( B / C )= -, 2 1 P ( C / A )= P ( B / A ) = -, 2 1 2
P ( C / B )= P ( A / C ) = -.
(16.32)
INTRODUCTION TO PROBABILITY THEORY
675
This means that the events A, B , C are pairwise independent. However, if it is known that B and C has occurred, then we can be certain that A has also occurred, that is,
P ( A / B C )= 1,
(16.33)
which means that events A, B , C are collectively dependent. Let us now consider an event B that can occur together with one and only one of the n mutually exclusive events:
{ A l , Az, . . . ,An}.
(16.34)
Since BA, and BA, with i # j are also mutually exclusive events, we can use the addition theorem of probabilities to write n
P ( B )= C P ( B A i ) .
( 16.35)
i=l
Using the multiplication theorem [Eq. (16.28)], this becomes n
( 16.36) i=l
which is also called the total probability. We now drive an important formula called Bayes’ formula. It is required to find the probability of event Ai provided that event B has already occurred. Using the multiplication theorem [Eq. (16.28)], we can write
P(AiB) = P ( B ) P ( A i / B )= P ( A i ) P ( B / A i ) ,
(16.37)
which gives (16.38) Using the formula of total probability [Eq. (16.36)], this gives Bayes’ formula:
( 16.39) which gives the probability of event Ai provided that B has occurred first. Bayes’ formula has interesting applications in decision theory and is also used extensively for data analysis in physical sciences (Bather; Sivia and Skilling). Example 16.1. Colored balls in six bags: Three bags have composition A1 with 2 white and 1 black ball each, one bag has composition A2 with 9 black balls, and the remaining 2 bags have composition A3 with 3
676
PROBABILITY THEORY AND DISTRIBUTIONS
white balls and 1 black ball each. We select a bag randomly and draw one ball from it. What is the probability that this ball is white? Call this event B. Since the ball could come from any one of the six bags with compositions All A2, and A3, we can write
B
= A1B
+ A2B + A3B.
(16.40)
Using the formula of total probability, we write
P(B)= P ( A i ) P ( B / A i )+ P(A2)P(B/A2)+ P(A3)P(B/A3), (16.41) where (16.42) 3 4
2 3
P ( B / A l )= -, P(B/A2)= 0 , P(B/A3)= -,
(16.43)
to obtain 3 2 6’3
1 6
P(B)=--+-.O+--
2 3 6‘4 (16.44)
Example 16.2. Given six identical bags: The bags have the following contents:
3 bags with contents A1 composed of 2 white and 3 black balls each, 2 bags with contents A2 composed of 1 white and 4 black balls each, 1 bag with contents A3 composed of 4 white and 1 black balls each. We pick a ball from a randomly selected bag, which turns out t o be white. Call this event B. We now want to find, after the ball is picked, the probability that the ball was taken from the bag of the third composition. We have the following probabilities:
( 16.45) 2
1 5
4 5
P ( B / A I )= -, P(B/A2)= -, P(B/A3)= -. 5
(16.46)
Using the Bayes’ formula [Eq. (16.39)], we obtain
(16.47)
677
INTRODUCTION TO PROBABILITY THEORY
2a
Figure 16.1
Buffon’s needle problem.
16.1.7 Geometric Probability and Buffon’s Needle Problem We have mentioned that the classical definition of probability is insufficient when we have infinite sample spaces. It also fails when the possible outcomes of an experiment are distributed continuously. Consider the following general problem: On a plane we have a region R and in it another region r. We want to define the probability of a thrown point landing in the region r. Another way to pose this problem is: What is the probability of a point, coordinates of which are picked randomly, falling into the region r. Guided by our intuition, we can define this probability as
P=
area of r area of R ’
(16.48)
which satisfies the basic three axioms. We can generalize this formula as
P=
measure of r measure of R’
(16.49)
where the measure stands for length, area, volume, etc.
Example 16.3. Buffon’s needle problem: We partition a plane by two parallel lines separated by a distance of 2a. A needle of length 21 is thrown randomly onto this plane (1 < u ) . We want to find the probability of the needle intersecting one of the lines. We show the distance from the center of the needle to the closest line with x and the angle of the needle with Q (Fig. 16.1). Configuration of the needle is completely specified by x and 8. For the needle to cross one of the lines, it is necessary and sufficient that
x 5 1 sin Q
(16.50)
678
PROBABILITY THEORY AND DISTRIBUTIONS
I I
a
7--
x
I
= lsin 0
I I I I I I I
I I I I I I I I I I
0
Figure 16.2
0
Area in the Buffon’s needle problem
be satisfied. Now the probability is the ratio of the region under the curve x = 1 sin8 to the area of the rectangle 1 sinddd in Figure 16.2:
st
P=
:J I sin 8 d8 an
(16.51) Historically, the Buffon’s needle problem was the starting point in solving certain problems in the theory of gunfire with varying shell sizes. It has also been used for purposes of estimating the approximate value of 7r. For more on the geometric definition of probability and its limitations, we refer to Gnedenko.
16.2 PERM UTATlONS A N D COMB INAT10NS We mentioned that in symmetric situations, where the sample space is finite and each event is equally probable, assigning probabilities reduces to a simple counting process. To introduce the basic ideas, we use a bag containing a number of balls numbered as 1 , 2 , . . . , N . As we shall see, the bag and the balls could actually stand for many things in scientific and technological applications.
16.2.1 The Case of Distinguishable Balls with Replacement We draw a ball from the bag, record the number, and then throw the ball back into the bag. Repeating this process k times, we form a k-tuple of numbers,
679
PERMUTATIONS AND COMBINATIONS
( I C ~ , X .~. . , ,xz,. . . ,xk), where x, denotes the number of the zth draw. Let 5' be the totality of such k-tuples. In the first draw, we could get any one of the N balls; hence there are N possible and equiprobable outcomes. In the second draw, since the ball is thrown back into the bag or replaced with an identical ball, we again have N possible outcomes. All together, this gives N2 possible outcomes for the first two draws. For k draws, naturally the sample space contains
(16.52)
lVk
k-tuples as possible outcomes 16.2.2
The Case of Distinguishable Balls Without Replacement
We now repeat the same process but this time do not replace the balls. In the first draw, there are N independent possibilities. For the second draw, there are N - 1 balls left in the bag, hence only N - 1 possible outcomes. For the r t h draw, r 5 k , there will be N - T 1 balls left, thus giving only N - T 1 possibilities. For k draws, the sample space will contain N ( k )elements:
+
N ( k )= N ( N
-
+
1)(N- 2 ) . . . ( N - k
+ 1),
(16.53)
which can also be written as "k)
=
-
N ( N - 1) . . . ( N - k + 1) [ ( N - k ) ( N - k [(N-k)(N- k-1)...2.1] N! (N -k)!'
-
1). . . 2 . 11
~
(16.54)
Permutation is a selection of objects with a definite order. N objects distributed into k numbered spaces has N ( k )distinct possibilities, N ( ' ) , which is also written as N P ~When . k = N , we have N ! possibilities, thus we can write (16.55) This is also taken as the definition of O! as O! = 1.
(16.56)
Example 16.4. A coin i s tossed 6 t i m e s : If we identify heads with 1 and tails with 2, this is identical t o the bag problem with replacement, where AT = 2 and k = 6. Thus there are 26
= 64
680
PROBABILITY THEORY AND DISTRIBUTIONS
possibilities for the 6-tuple numbers. The possibility of any one of them coining, say E = ( I , 2 , 1 , 1 , 1 ,a),is 1 P ( E ) = -. 64
(16.57)
Example 16.5. A die i s rolled five times: This is identical to the bag problem with replacement, where N = 6 and k = 5. We now have 65 = 7776
(16.58)
possiblc outcomes Example 16.6. Five cards selected f r o m a deck of 52 playing cards: If we assume that the order in which the cards are selected is irrelevant, then this is equivalent to the bag problem without replacement. Now, N = 52 and k = 5, which gives 52(5) =
52! = 311,875,200 (52 - 5)!
(16.59)
possible outcomes. Example 16.7. Friends t o visit: Let us say that we arrived at our home town and have 5 friends to visit. There are 5! = 120 different orders that we can do this. If we have time for only three visits, then there are 5(3) = 60 different ways. Example 16.8. Number of different numbers: How many different numbers can we make from the digits 1 , 2 , 3 , 4 ? If we use two digits and if repeats are permitted, there are 42 = 16 possibilities. If we do not allow rcpeats, then there are only 4(2)= 12 possibilities.
16.2.3 The Case of Indistinguishable Balls Lct us iiow consider N balls, where not all of them are different. Let there be 7 7 1 balls of one kind, 122 balls of the second kind,. . . , n k balls of the kth kind SUCll that n1
+ n2 + . . . + nk = N .
(16.60)
We also assume that balls of the same kind are indistinguishable. A natural question to ask is, In how many distinct ways, N(”ln2. n k ) (also written as N P , , ~ ,r l,A~) , can we arrange these balls? When all the balls are distinct, wc have N ! possibilities but n1 balls of the first kind are indistinguishable. Thus. n l ! of these possibilities, that is, the permutations of the n1 balls among themselves, lead to identical configurations. Hence, for distinct arrangements
PERMUTATIONS AND COMBINATIONS
681
we have t o divide N ! by nl!. Arguing the same way for the other kinds, we obtain
Permutation is an outcome with a particular ordering. For example, 1234 is a different permutation of 4231. In many situations we are interested in selection of objects with no regard to their order. We call such arrangements combinations and show them as (16.62)
ncr,
which means the number of ways r objects can be selected out of n objects with no att,ention paid to their order. Since the order of the remaining n,- r objects is also irrelevant, among the ,P, = n ! / r ! permutations, there are ( n - r ) ! that give the same combination, thus
,c,= ( n -p,r ) ! 72
(16.63)
~
-
n! ( n - r)!r!
(16.64)
'
Combinations are often shown as
nc, =
(;).
(16.65)
It is easy to show that
(r)
=
(n
r).
(16.66)
Example 16.9. N u m b e r of p o k e r hands: In a poker hand there are 5 cards from an ordinary deck of 52 cards. These 52 cards can be arranged in 52! different ways. Since the order of the 5 cards in a player's hand and the order of the remaining 47 cards do not matter, the number of possible poker hands is
(Y) 16.2.4
=
52! = 2,598,960. (5!)(47!)
(16.67)
Binomial and Multinomial Coefficients
Since ,C, also appear in the binomial expansion n
(16.68) j=O
682
PROBABILITY THEORY AND DISTRIBUTIONS
they are also called the binomial coefficients. Similarly, the multinomial expansion is given as
where the sum is over all nonnegative integer r-tuples ( k l ,k2,. . . ,k r ) with their sum kl k2 . . . k , = n. The coefficients defined as
+ + +
( 16.70) are called the multinomial coefficients. Some useful properties of the binomial coefficients are:
(16.71) (16.72) (16.73)
( 16.74) (16.75)
16.3
APPLICATIONS T O STATISTICAL MECHANICS
An important application of the probability concepts discussed so far comes from statistical mechanics. For most practical applications, it is sufficient to consider gases or solids as collection of independent particles, which move freely except for the brief moments during collisions. In a solid, we can consider atoms vibrating freely essentially independent of each other. According to quantum mechanics, such quasi-independent particles can only have certain discrete energies given by the energy eigenvalues €1, € 2 , . . . . Specific values of these energies depend on the details of the system. At a given moment and at a certain temperature, the state of a system can be described by giving the number of particles with energy number of particles with energy
€1, €2,
(16.76)
683
APPLICATIONS TO STATISTICAL MECHANICS
Our basic goal in statistical mechanics is to find how these particles are distributed among these energy levels subject to the conditions
Cr~i N ; =
(16.77)
i
C r L i E i=
u,
(16.78)
i
where N is the total number of particles and U is the internal energy of the system. To clarify some of these points, consider a simple model with 3 atoms, a,b, and c (Wilks). Let the available energies be O , E , ~ E ,and 3 ~Let . us also assume that the internal energy of the system is 3 ~Among . the three atoms, this energy could be distributed as
It is seen that all together there are 10 possible configurations or complexions, also called the microstates, in which the 3~ amount of energy can be distributed among the three atoms. Since atoms interact, no matter how briefly, through collisions, the system fluctuates between these possible complexions. We now introduce the fundamental assumption of statistical mechanics by postulating, a priori, that all possible complexions are equally probable. If we look at the complexions a little more carefully, we see that they can be grouped into three states, S1,S2,S3, with respect t o their occupancy numbers, 121 , 712, 7131 as
Note that only 1 complexion corresponds to state S1, 3 complexions to S2 and 6 complexions to state 5’3. Since all the complexions are equiprobable, probabilities of finding the system in the states 5’1,S2, and Ss are and respectively. This means that if we make sufficiently many observations, 6 out of 10 times the system will be seen in state S3, 3 out of 10 times it will be in state S2,and only 1 out of 10 times it will be in state 5’1. In terms of
&,
&, &,
684
PROBABILITY THEORY AND DISTRIBUTIONS
a time parameter, given sufficient time, the system can be seen in all three states. However, this simple 3-atom model will spend most of its time in state 5'3, which can be considered as its equilibrium state.
16.3.1 Boltzmann Distribution for Solids We now extend this simple model t o a solid with energy eigenvalues E I , € 2 , . . . . A particular state can be specified by giving the occupancy numbers 7 2 1 , 1 2 2 , . . . of the energy levels. We now have a problem of N distinguishable atoms distributed into boxes labeled E ~ , E Z , . . , , so that there are nl atoms in box 1, 712 atoms in box 2, etc. Atoms are considered as distinguishable in the sense that we can identify them in terms of their locations in the lattice. Since how atoms are distributed in each box or the energy level is irrelevant, the number of complexions corresponding to a particular state is given as (16.79) The most probable state is naturally the one with the maximum number of complexions subject t o the two constraints
N=CTl,
(16.80)
2
and
u =Y n i E , .
(16.81)
Mathematically, t is problem is solved by finding the occupancy numbers that make W a maximum. For reasons t o be clear shortly, we maximize 1nW and write (16.82)
d (In W ) Sni . dni
=C i
(16.83)
The maximum number of complexions satisfy the condition SlnW
= 0,
(16.84)
subject to the constraints
C~ni= 0, C~ni~i = 0.
(16.85)
i
i
(16.86)
APPLICATIONS TO STATISTICAL MECHANICS
685
We now introduce two Lagrange undetermined multipliers, Q and P. Multiplying Equation (16.85) by o and Equation (16.86) by P and then adding to Equation (16.83) gives (16.87) With the introduction of the Lagrange undetermined multipliers. we can treat all Sn, in Equation (16.87) as independent and set their coefficients to zero:
d (In W) dni
=o.
+cr+PEi
(16.88)
We now turn to Equation (16.79) and write In W as In W
N!
= In
nl!nz!..
'
= InN! - C 1 n ( n Z ! ) .
(16.89)
2
Using the Stirling approximation for the factorial of a large number, namely lnn,!
rz
n, Inn,
-
n,,
(16.90)
this can also be written as (16.91) After differentiation, we obtain
=
-Inn,.
(16.92)
Substituting this into Equation (16.88), we write -hn,
+ + PE, = 0 Q
(16.93)
to obtain
n,= A e P E 7 ,
(16.94)
where we have called A = e a . This is the well-known Boltzmann formula. As we shall see shortly, p is given as
p=--
1 kT
,
(16.95)
where k is the Boltzmann constant and T is the temperature and A is determined from the condition which gives the total number of particles as N = ni.
xi
686
16.3.2
PROBABILITY THEORY AND DISTRIBUTIONS
Boltzmann Distribution for Gases
Compared to solids. the case of gases have basically two differences. First of all, atoms are now free t o move within the entire volume of the system. Hencc, they are not localized. Second, the distribution of energy eigenvalues is practically continuous. There are many more energy levels than the number of atonis, hence the occupancy number of each level is usually either 0 or 1. Mostly 0 and almost never greater than 1. For example, in 1 cc of helium gas at 1 atm and 290 K there are approximately lo6 times more levels than atonis. Since atoms can not be localized, we treat them as indistinguishable particlcs. For the second difference, we group neighboring energy levels in tmndlcs so that a complexion is now described by saying nl particles in the 1st bundle of g1 levels with energy E I , n2 particles in the 2nd bundle of g2 levels with energy ~ 2
,
The choice of g k is quite arbitrary, except that n k has to be large enough to w:trraiit usage of the Stirling approximation of factorial. Also, gk must be large but not too large so that each bundle can be approximated by the avcragc energy E L . As before, the most probable values of n k are the ones corresponding to the niaxiniuni number of complexions. However, W is now more complicated. We first conceiitrate on the kth bundle, where there are gk levels available for t i & particles. For the first particle, naturally all gk levels are available. For thc second particle, there will be (gk - 1) levels left. If we keep going on like this, we find
diffcwiit possibilities. Since gk
>> n k , we can write this as g:k.
(16.97)
Witliiii the kth bundle, it does not matter how we order the n k particles, thus we divide 9:' with n k ! . This gives the number of distinct complexions for the kth biuidle as (16.98) Siiiiilar expressions for all the other bundles can be written. Hence the total nuiiibcr of coriiplcxions become (16.99)
APPLICATIONS TO STATISTICAL MECHANICS
687
This has to be maximized subject to the conditions
N
(16.100)
=c n k , k
u= C n k E k .
(16.101)
k
Proceeding as for the gases and introducing two Lagrange undetermined multipliers, a: and 0,we write the variation of 1nW as
( 16.102) This gives the number of complexions as nk
where A comes from N
=
(16.103)
=AgkePEk,
C kn k , arid ,i3 is again equal to -1lkT.
16.3.3 Bose-Einstein Distribution for Perfect Gases We now remove the restriction on n k . For the Bose-Einstein distribution there is no restriction on the number of particles that one can put in each level. We first consider the number of different ways that we can distribute n k particles over the g k levels of the kth bundle. This is equivalent to finding the number of different ways that one can arrange N indistinguishable particles in g k boxes. Let us consider a specific case with 2 balls ( n k = 0 , 1 , 2 ) and three boxes ( g k = 3 ) . The 6 distinct possibilities, which can be described by the formula
6=
+
[2 (3 - l)]! 2!(3 - I)!
are shown below:
I ++ II - II - I
I + II
-
I1 + I
I - II ++ II I -
I
-
II + II + I
I - II - II ++ I
’
( 16.104)
688
PROBABILITY THEORY AND DISTRIBUTIONS
This can be understood by the fact that for three boxes there are two partitions, shown by the double lines, which is one less than the number of boxes. The numerator in Equation (16.104), [ 2 + (3 - l)]!, gives the number of permutations of the number of balls plus the number of partitions. However, the permutations of the balls, 2!, and the permutations of the partitions, (3l)!, among themselves do not lead to any new configurations, which explains the denominator. Since the number of partitions is always 1 less than the number of boxes, this formula can be generalized as (16.105) For the whole system this gives W as (16.106) Proceeding as in the previous cases, that is, by introducing two Lagrange undetermined multipliers, cr and p, for the two constraints, N = C k n k and = n k & k , respectively and then maximizing In W , we obtain the Bose-Einstein distribution as
u
ck
( 16.107) One can again show that /3 = -l/lcT. Notice that for high temperatures, where the - 1 in the denominator is negligible, Bose-Einstein distribution reduces to the Boltzmann distribution [Eq. (16.103)]. Bose-Einstein Condensation: Using the method of ensembles, one can show that the distribution is also written as (16.108) where ni is now the average number of particles in the i t h level, not the group of levels, with the energy ~ i .For the lowest level, i = 1, this becomes (16.109) which means that we can populate the lowest level by as many particles as we desire by making cr very close t o E I I l c T . This phenomenon with very interesting applications is called the Bose-Einstein condensation. 16.3.4
Fermi- Dirac Distribution
In the case of Fermi-Dirac distribution the derivation of ni proceeds exactly the same way as in the Bose-Einstein distribution. However, with the exception that due to Pauli exclusion principle, each level can only be occupied
STATISTICAL MECHANICS AND THERMODYNAMICS
689
by only one particle. For the first particle there are g k levels available, which leaves only ( g k - 1) levels for the second particle and so on, thus giving the number of arrangements as. (16.110) Since the particles are indistinguishable, n! arrangements among themselves have no significance, which for the kth bundle gives the number of possible arrangements as gk!
n!(gk
-
nk)!'
(16.111)
For the whole system this gives (16.112) Using the method of Lagrange undetermined multipliers and the constraints = c k n k and u = x k n k & k , one obtains the Fermi-Dirac distribution function as (16.113) We have again written ,O = - l / k T and a is to be determined from the condition N = C kn k . With respect to the Bose-Einstein distribution [Eq. (16.107)], the change in sign in the denominator is crucial. It is the source of the enormous pressures that hold up white dwarfs and neutron stars. 16.4
STATISTICAL MECHANICS A N D T H E R M O D Y N A M I C S
All the distribution functions considered so far contained two arbitrary constants, cr and @, which were introduced as Lagrange undetermined multipliers. In order to be able to determine the values of these constants, we have to make contact with thermodynamics. In other words, we have t o establish the relation between the microscopic properties like the occupation numbers, energy levels, number of complexions, etc., and the macroscopic properties like the volume (V), density ( p ) , pressure ( P ) ,and entropy ( S ) . 16.4.1
Probability and Entropy
We know that in reaching equilibrium, isolated systems acquire their most probable state, that is, the state with the most number of complexions. This is analogous t o the second law of thermodynamics, which says that isolated
690
PROBABILITY THEORY AND DISTRIBUTIONS
systems seek their maximum entropy state. In this regard, it is natural to expect a connection between the number of complexions, W , and the thermodynamic entropy, S.To find this connection, let us bring two thermodynamic systems, A and B , with their respective entropies, SA and S B , in thermal contact with each other. The total entropy of the system is
S = SA + SB.
(16.114)
If W A and W , are their respective number of complexions, also called microstates, the total number of complexions is
W = W A. W B . If we call the desired relation
(16.115)
S = f ( W ) ,Equation (16.114) means that
+ f(Wi3) = f(WAWB).
f(wA)
(16.116)
Differentiating with respect to Wu gives us
~ ’ W =BW) A ~ ’ ( W A W B ) .
(16.117)
Differentiating once more but this time with respect to W A ,we get
(16.118) The first integral of this gives In f ’ ( W ) = - In W
+ constant
(16.119)
or
( 16.120) Integrating once more, we obtain f ( W ) = k l n W + constant, where k is some constant to be determined. We can now write the relation between the entropy and the number of complexions as
S
=
k l n W +SO.
(16.121)
If we define the entropy of a completely ordered state, that is, W = 1 as 0, we obtain the final expression for the relation between the thermodynamic entropy and the number of complexions, W, as S=klnW.
(16.122)
691
STATISTICAL MECHANICS AND THERMODYNAMICS
16.4.2
Derivation of
Consider two systems, one containing N and the other N’ particles, brought into thermal contact with each other. State of the first system can be described by giving the occupation numbers as nl
n2
particles in the energy states particles in the energy states
€1 ~2
Similarly, the second system can be described by giving the occupation numbers as n: particles in the energy states n/2 particles in the energy states
E:
E;
Now the total number of complexions for the combined system is
w = w1 . w,
(16.123) (16.124)
When both systems reach thermal equilibrium, their occupation numbers become such that In W is a maximum subject to the conditions
N =
Eni, En:,
(16. 25)
i
N’
=
(16. (16.127)
2
where N , N’. and the total energy, U , are constants. Introducing the Lagrarige undetermined multipliers, a , a’, and ,/3, we write
(16.128) i
( 16.129) i
(16.130) i
Proceeding as in the previous cases, we now write Sln W Stirling’s formula [Eq. (16.90)]to obtain
=0
E(-Inn, + a + ,Lkl)Sn,+ X(-Inn; + a’ + p&i)Sn; L
J
and employ the
= 0.
(16.131)
692
PROBABILITY THEORY AND DISTRIBUTIONS
For this to be true for all Sni and an;, we have to have (16.132) where A = eQ and A' = ea'. In other words, p is the same for two systems in thermal equilibrium. To find an explicit expression for p, we slowly add d Q amount of heat into a system in equilibrium. During this process, which is taking place reversibly at constant temperature T , the allowed energy values remain the same but the occupation numbers, ni, of each level change such that W is still a maximum after the heat is added. Hence, using Equation (16.83), (16.133) and [Eq. (16.87)]: (16.134) the change in In W is written as
=
-aC~ni- P C E ~ S ~ ~(16.135) . i
i
During this process, the total number of particles does not change: = 0.
(16.136)
i
Since the heat added to the system can be written as
dQ =
SniEi,
(16.137)
i
Equation (16.135) becomes
p = --. d l n W dQ
(16.138)
Using the definition of entropy obtained in Equation (16.122), S = kln W, Equation (16.138) can also be written as (16.139)
RANDOM VARIABLES AND DISTRIBUTIONS
Figure 16.3
693
Time function X ( t ) .
In thermodynamics, in any reversible heat exchange taking place at constant temperature, dQ is related to the change in entropy as
dQ = TdS.
(16.140)
Comparing Equations (16.139) and (16.140), we obtain the desired relation as
p=-- 1
kT’
(16.141)
where k can also be identified as the Boltzmann constant by further comparisons with thermodynamics.
16.5
RANDOM VARIABLES A N D DISTRIBUTIONS
The concept of random variable is one of the most important elements of the probability theory. The number of rain drops impinging on a selected area is a random variable, which depends on a number of random factors. The number of passengers arriving at a subway station at certain times of the day is also a random variable. Velocities of gas molecules take on different values depending on the random collisions with the other molecules. In the case of electronic noise, voltages and currents change from observation to observation in a random way. All these examples show that random variables are encountered in many different branches of science and technology. Despite the diversity of these examples, mathematical description is similar. Under random effects, each of these variables is capable of taking a variety
694
PROBABILITY THEORY AND DISTRIBUTIONS
of values. It is imperative that we know the range of values that a random variable can take. However, this is not sufficient. We also need to know the frequencies with which a random variable assumes these values. Since random variables could be continuous or discrete, we need a unified formalism to study their behavior. Hence, we introduce the distribution function of probabilities of the random variable X as
F/y(x) = P ( X 5 x).
(16.142)
From now on we show random variables with the uppercase Latin letters, X , Y, . . . , and the values that they can take with the lowercase Latin letters, x,y,. . . . Before we introduce what exactly F X ( L Cmeans, ) consider a time function X ( t ) shown as in Figure 16.3. The independent variable t usually stands for time, but it could also be considered as any parameter that changes continuously. We now define the distribution function Fx(x)as 1
rT
F X ( z ) = lim -5 C x [ X ( t ) dt, ] T - m 2T 1-T
( 16.143)
where C, is defined as (16.144) The role of C, can be understood from the next figure (Fig. 16.4), where the integral in Equation (16.143) is evaluated over the total duration of time during which X ( L C is ) less than or equal to x in the interval [-T,T].The interval over which X 5 x is indicated by thick lines. Thus, (16.145) is the time average evaluated over the fraction of the time that X(t)is less than or equal to x. Note that the distribution, F x ( z ) , gives not a single time average but an infinite number of time averages, that is, one for each x. Example 16.10.
Arcsine distribution: Let us now consider the function
X ( t ) = sinwt.
(16.146)
%],
During the period [0, X ( t ) is less than x in the intervals indicated by thick lines in Figure 16.5. When LC > 1, X ( t ) is always less than x, hence
Fx(Z)= 1, x > 1.
(16.147)
RANDOM VARIABLES AND DISTRIBUTIONS
T
-T Figure 16.4
When x
695
Time average of X ( t )
< -1, X ( t ) is always greater than x , hence F x ( x )= 0 , x
< 1.
(16.148)
For the regions indicated in Figure 16.5 we write
=
1 -[7r+2sin-'x],
(16.149)
W
thus obtaining the arcsine distribution as 1,
x > 1,
1 + - s1i n. - 1 x ,
1x1 5 1,
0,
x < -1.
2 7 r
(16.150)
Arcsine distribution is used in communication problems, where an interfering signal with an unknown constant sinosoid of unknown phase may be thought to hide the desired signal.
696
PROBABILITY THEORY AND DISTRIBUTIONS
Figure 16.5
16.6
Arcsine distribution
DISTRIBUTION FUNCTIONS AND PROBABILITY
In the above example, the evaluation of the distribution function was simple, since the time function, X ( t ) , was given in terms of a simple mathematical expression. In most practical situations due to the random nature of the conditions, X ( t ) cannot be known ahead of time. However, the distribution function may still be determined by some other means. The point that needs to be emphasized is that random processes in nature are usually defined in terms of certain averages like distribution functions. If we remember the geometric definition of probability given in Section 16.1.7, the fraction of time that X ( z ) 5 J: is actually the probability of the event { X ( t )5 x} happening. In the same token, we call the fraction of time that 2 1 < X ( t ) 5 52 the probability of the event (51 < X ( t ) 5 52} and show it as P(z1 < X ( t ) 5 z 2 } . From this, it follows that
P { X ( t )5
5) = Fx(5).
(16.151)
Since
1, a < x < b , (16.152)
G ( x )- Ca(5)= 0, otherwise,
DISTRIBUTION FUNCTIONS AND PROBABILITY
697
we can write
P{zl < X ( t )5 =
z2} =
l o o
lim 2T
T+=
{CZ* [X(t)l- cz, [X(t)I}dt
Fx(z2)- Fx(z1).
(16.153)
Thus, the probability P { z l < X ( t ) 5 z2} is expressed in terms of the distribution function F x (x).This argument can be extended to nonoverlapping intervals [ X I , zz], [x3,24],. . . as P{z.l < X ( t ) 5
x2,53
< X ( t )5 24,’..}
=
[Fx(x2)- Fx(z1)l + [Fx(z4)- Fx(z3)I + . . . .
(16.154)
From the definition of the distribution function, also called the cumulative distribution function, one can easily check that the following conditions are satisfied: (i) 0 I Fx(z)51, (ii) limr--oo F x ( z ) = 0 and limz-m Fx(z)= 1, (iii) F x ( z ) 5 Fx(z’), if and only if z 5 2’. This means that Fx(z)satisfies all the basic axioms of probability given in Section 16.1.2. It can be proven that a real valued function of a real variable which satisfies the above conditions is a distribution function. In other words, It is possible to construct at least one time function, X ( z ) , the distribution function of which coincides with the given function. This result removes any doubts about the existence of the limit i
F x ( z ) = Iim T-oo 2;]-~
r7
CZ[X(t)]d t .
(16.155)
Note that condition (iii) implies that wherever the derivative of F x ( z ) exists, it is always positive. At the points of discontinuity, we can use the Diracdelta function to represent the derivative of F x ( z ) . This is usually sufficient to cover the large majority of the physically meaningful cases. With this understanding, we can write all distributions as integrals: (16.156) where (16.157) The converse of this statement is that if p x ( z ) is any nonnegative integrable function, (16.158)
698
PROBABILITY THEORY AND DISTRIBUTIONS
Figure 16.6
The uniform distribution.
) in Equation (16.156) satisfies the conditions (i)-(iii). The then F X ( I Cdefined function, ~ x ( I cobtained ), from Fx(x)is called the probability density function. This name is justified if we write the event { X ( t ) in D } , where D represents some set of possible outcomes over the real axis, as
P { X ( t ) in D} =
(16.159)
or as
P { z 5 X ( t ) 5 IC
16.7
+ d r ~ }= p x ( ~dx. )
(16.160)
EXAMPLES OF CONTINUOUS DISTRIBUTIONS
In this section we introduce some of the most commonly encountered continuous distribution functions. 16.7.1
Uniform Distribution
Probability density for the uniform distribution is given as
( 16.161)
699
EXAMPLES OF CONTINUOUS DISTRIBUTIONS
where a is a positive number. The distribution function Fx(x) is easily obtained from Equation (16.156) as (Fig. 16.6)
I
0,
x < -a, (16.162)
16.7.2
Gaussian or Normal Distribution
The bell-shaped Gauss distribution is defined by the probability density
PX(X) =
1
e-(1/202)(z-m)2
, a>0, - m < x < c Q ,
(16.163)
and the distribution function (Fig. 16.7)
( 16.164) where
( 16.165) Gaussian distribution is extremely useful in many different branches of science and technology. In the limit as a + 0, px(x) becomes one of the most commonly used representations of the Dirac-delta function. In this sense, the Dirac-delta function is a probability density.
16.7.3
Gamma Distribution
{
The Gamma distribution is defined by the probability density
a n e ; i i )2n-l
, x>o,
PX(X) = 0,
( 16.166)
x50,
where a , n > 0. It is clear that px(x) 2 0 and S_",px(x) dx = 1. There are two cases that deserves mentioning: (i) The case where n = 1 is called the exponential distribution:
PX(X> =
ae-"",
x > 0,
0,
x 5 0.
(16.167)
700
PROBABILITY THEORY AND DISTRIBUTIONS
Figure 16.7
The Gauss or the normal distribution
(ii) The case where a = 1/2, n = m/2, where m is a positive integer, is called the x2 distribution (Chi-square), with m degrees of freedom:
,-xPx(m12)-1 Px(X) =
2m/2qm/2) '
x > 0, (16.168)
In general, the integral
(16.169) cannot be evaluated analytically for the x2 distribution, however there exists extensive tables for the values of Fx(x).The x2 distribution is extremely useful in checking the fit of an experimental data t o a theoretical one.
16.8
DISCRETE PROBABILITY DISTRIBUTIONS
When X is a discrete random variable, the distribution function, Fx(x),becomes a step function. Hence it can be specified by giving a sequence of numbers, x1,x2,. . . , and the sequence of probabilities, px(xl),px(z2),. . . , satisfying the following conditions: px(z2) > 0,
2
= 1 , 2 , .. .
, ( 16.170)
DISCRETE PROBABILITY DISTRIBUTIONS
701
Now, the distribution function, F x ( z ) ,is given as
Some of the commonly encountered discrete distributions are given below: 16.8.1
Uniform Distribution
Given a bag with N balls numbered as 1 , 2 , . . . ,N . Let X be the number of the ball drawn. When one ball is drawn at random, the probability of any outcome z = 1 , 2 , .. . , N is 1 P x ( Z ) = -.
(16.172)
N
The (cumulative) distribution function is given in terms of the step function as
( 16.173) A ,
i=l
where i is an integer and O(z - i) =
16.8.2
{
1,
xzi,
0,
x
(16.174)
Binomial Distribution
Let us consider a set of n independent experiments, where each experiment has two possible outcomes as 0 and 1. If Yi is the outcome of the i t h experiment, then n
X = C K
( 16.175)
i=l
is the sum of the outcomes with the result 1. We also let the probability of 1 occurring in any one of the events as
P{K
=
l} = p ,
( 16.176)
where i = 1 , 2 , . . . , n and 0 < p < 1. Now, for any event, that is, the set of n experiments with the results
Yl = Y1, y2 = Y 2 , . . . ,yn = Y n ,
(16.177)
702
PROBABILITY THEORY AND DISTRIBUTIONS
we have the probability
P(Y1
= y1,
Yz = y2,. . . , Y,
= yn} = cp"(1
-
( 16.178)
p)"-",
where yi takes the values 0 or 1 and z is the number of 1's obtained. The number c is the number of rearrangements of the symbols 0 and 1 that does not change the number of 1's. Hence, the probability of obtaining z number of 1's becomes
( 16.179) The reason for multiplying with
(:)
n! z ! ( n- z)!
=
(16.180)
is that the permutations of 0's and the permutations of 1's do not change the number of 1's. We now define the binomial probability density function as
p x ( z )=
(p -
p)"-",
z = 0,1,. . . ,n.
(16.181)
Notice that
p x ( 2 ) > 0,
2
= 0,1,. . . , n ,
( 16.182)
and (16.183) This is clearly seen by using the binomial expansion [Eq. (16.68)] as n
n
=(p
I
+1
\
-
p)"
( 16.184)
= 1.
Finally, we write the binomial distribution function as
F x ( z )=
c
(;)p"(l -p)"-",
z = 0 , 1 , . . . ,n.
(16.185)
xln
3,
Example 16.11. A coin tossed 6 t i m e s : We now have p = n = 6 and X stands for the total number of heads. Using Equations (16.181) and
DISCRETE PROBABILITY DISTRIBUTIONS
703
(4 631 64
2016
1516
6/6
11
4I 0
1
2
3
Figure 16.8
571 64
1 42/ 64
I 2 2 64
I1 64
11 6(,
4
5
6
c
X
+
6
Plots of Px(z) and Fx(z)for a coin tossed 6 times.
(16.185), we can summarize the results in terms of the following table:
where the histograms of p x ( z ) arid Fx(z)are given as in Figure 16.8.
16.8.3
Poisson Distribution
Poisson probability density is given by (16.186)
It is obvious that px(x) is positive for every nonnegative integer and (16.187)
704
PROBABILITY THEORY AND DISTRIBUTIONS
Poisson distribution function is now written as (16.188) Poisson distribution expresses the probability of a number of events occurring in a given period of time with a known average rate and which are independent of thc time since the last event. Other distributions and examples can be found iri Harris.
16.9
FUNDAMENTAL T H E O R E M O F AVERAGES
We now state an important theorem, which says that all time averages of the form i
rT
(16.189) where the limit exists and q5 is a real valued function of a real variable, can be calculated by using the probability density px(z) as l TEmm
2T
.II,dX(t)l /” 4(Z)Px(Z) T
dt =
dz
(16.190)
--oo
or by using the distribution function F x ( z ) as (16.191) This theorem is also called the quisi-ergodic theorem. We shall not discuss the proof, since it is far too technical for our purposes. However, a plausibility argument can be found in Margenau and Murphy by Hofstetter. Significance of this theorem is in the fact that it allows us to calculate time averages in terms of distributions or probability densities, which are easier t o obtain and to work with. The quantity on the right-hand side is called the expected value of 4(z) and it is usually shown as
( 16.192) Of course, the time function X ( t ) is still a quantity of prime importance to the theory. Proving the equivalence of relations like Equations (16.190) and (16.191), which establishes the connections between the expected values ca.lculated by using the distribution functions and the time averages, is done in a, series of philosophically involved theorems usually referred to as the law of large numbers or the ergodic theorems. Note that the expected value is basically a weighted average with the weights determined by px(z).
MOMENTS OF DISTRIBUTION FUNCTIONS
705
In the case of discrete random variables? since PX(Z2)
= P{X = Xi},
(16.193)
the expected value is written as
Basic rules for manipulations with expected values are given as
(4= a , (ag1 + bgz) = a h ) + b ( g 2 ) , (9) 5 (Igl), where a and b are constants and g,91, and 16.10
g2
(16.195) (16.196) ( 16.197)
are functions.
MOMENTS OF DISTRIBUTION FUNCTIONS
We mentioned that the majority of the time averages can be calculated by using distribution functions. However, in most cases by concentrating on a few but more easily measured parameters of the distribution function, such as the moments of the distribution functions, one can avoid the complications that measuring or specifying a complete distribution function involves. The nth moment, a,, of a distribution function, Fx,is defined by the equation
1
00
an = (P) =
z,px(z) d z ,
(16.198)
-00
where a, is called the nth moment of the Random variable X . First-order moment, ~ 1 which , gives the expected value or the most probable value of X is also called the mean and is shown as m or m x . Interpretation of the higher moments become easier, if we define the n t h central moment as
I, 00
(z - m)"px(z)dz.
p n = ( ( X - m)")=
(16.199)
The second central moment, p2, is usually called the variance of the distribution and it is shown as u2or u : . Variance can often be calculated by the formula
2 = ( ( x- m)') =
(x')
-
2 m ( x )+ m2 2
= ~ 2 - m .
(16.200)
706
PROBABILITY THEORY AND DISTRIBUTIONS
The square root of variation is called the standard deviation and it is a measure of the spread about the mean value. Another important concept is the median, me,which is defined as
(16.201) gives the midpoint of the distribution. In the case of Gaussian or the uniform distribution, the median coincides with the mean. However, as in the case with binomial distribution with p # 1/2, this is not always the case. In the X!
x j k ) ,where x ( ~=) -we refer ( X - k)! to ( X ) = c\i[k] as the kth factorial moment.
case of discrete random variables, X
16.10.1
=
Moments of the Gaussian Distribution
The mean of the Gaussian distribution [Eq. (16.163)] is written as 03
al=Lxzz 1
After a suitable change of variable,
,-(1/202)(z-m)*
< = (x
-
dx.
(16.202)
m)/a,the integral gives (16.203) (16.204)
=O+m = m.
(16.205) (16.206)
With the same variable change, the variance is obtained as
(16.208) = u2 .
(16.209)
Central moments of Gaussian distribution are given as
( 16.210) and
where n = 0 , 1 , 2 , .
MOMENTS OF DISTRIBUTION FUNCTIONS
16.10.2
707
Moments of the Binomial Distribution
Mean of the binomial distribution [Eq. (16.18l)l is written as
c n
a1 = m =
(16.212)
-
5
x=o n -
-
C (n
x=o n
c nc c x=l
x= 1 n
-
(16.213)
p" (1 - p)"-"
x)!x!
n!
(16.214)
p"(1 - p)"-" ( n - x)!(x - l)!
n
-
xn! -
x=l
( n- l ) ! p z (1 - p)"-" ( n - x)!(x - l)!
( 16.215)
(n- l)! p X ( l- p)"-" [ ( n- 1) - (x - l)]!(x - l)!
(16.216)
( 16.217) We now make the substitutions
2 - 1 = y a n d n - 1 =m'
(16.218)
to write Equation (16.217) as (16.219) Since the sum is equal t o 1 [Eq. (16.184)], we obtain the mean as m = np.
(16.220)
To find the variance, a', we use Equation (16.200): (p2 =)a2 = a2
To evaluate as follows:
a2,
-
we first write (X2) as ( X ' )
m2 . -
( X )+ ( X ), and then proceed
+(X)
a:! = (x2)= ( X ( X - I))
(16.222) (16.223)
= ~x =x2( x - l ) ( ~ ) p x ( l - p ) n - x + m
c n
-
x=2
n!
p"(1- p)"-"
(x - 2)!(n - x)!
c
(16.221)
+m
(16.224)
n
= n(n - l)p2
x=2
( 16.225)
708
PROBABILITY THEORY AND DISTRIBUTIONS
Using the substitutions y =x
-
2 and m' = n - 2,
(16.226)
we write this as (16.227) Sunimation is again 1, thus yielding 0 2 =
n(n - 1 ) p 2 + m.
(16.228)
Using this in Equation (16.221) and the expression for the mean in Equation (16.220), we finally obtain the variance, cr2 = a2 - m2,as cr 2 = n(n - 1)p2 = n2p2 - n p 2
+ m - m2 + n p n2p2 -
= np(1 - p ) .
16.10.3
(16.229)
Moments of the Poisson Distribution
We write the factorial moments of the Poisson distribution as
x=o
(16.230)
*;
where (16.231) Thus,
(16.232)
= Ak,
where we have defined a new variable as x the mean as
m =cq
-
= a[l]=
k A.
= y.
Since x ( l ) = x , we write
(16.233)
MOMENTS OF DISTRIBUTION FUNCTIONS
709
Since (16.234) we can write a2 =q21
+ “111;
(16.235)
hence the variance, c2,is obtained as
c2
=a2--m
=921 = x2
2
+ ap] - m2
+ x - x2
(16.236)
= A.
Using the Dirac-delta function we can write the Poisson probability density [Eq. (16.186)] as a continuous function of z :
(16.237)
and find the mean, m, as the integral
(16.238) Note that the gamma function F(z
+ 1) is equal t o k ! when z = k , integer.
710
PROBABILITY THEORY AND DISTRIBUTIONS
Similarly, we evaluate the second moment, 02, as
= Ae-’ = A2
[Ae’
+ ex]
+ A.
(16.239)
Using definition of the variance,
a2 we finally obtain a’
2 =a2-m,
(16.240)
= A.
16.11 CHEBYSHEV’S THEOREM To demonstrate how well a and c2 represent the spread in the probability density, p x ( z ) ,we prove a useful theorem which says that if m and IY are the mean and the standard deviation of a random variable, X , then for any positive constant, k , the probability that X will take on a value within k standard deviations of the mean is at least 1 that is,
&,
px(Iz - mi- k a )
1
2 1 - -, k2
0
# 0.
(16.241)
Proof: Using the definition of variance we write (16.242)
CHEBYSHEV’STHEOREM
711
Chebyshev theorem.
Figure 16.9
and divide the region of integration as 2
=L m-ku
(x - m)’px(x) dx
+ J’
m+ku
m-ku
(x - m ) 2 ~ ~dx (x)
00
.I,,(x,
-
+
m)2px(x) dx.
(16.243)
Since p x ( z )2 0 , the second integral is always
(x - m)’px(x) dx In the intervals x 5 m - k a and x
+
2 0, thus we can also write
00
(z - rn)’px(x) dx.
(16.244)
L + k u
2 m + k a the inequality
(x - m)’ 2 k2a2
(16.245)
is true; hence Equation (16.244) becomes (16.246) (16.247) or (16.248)
712
PROBABILITY THEORY AND DISTRIBUTIONS
The sum of the integrals on the right-hand side is the probability that X will take a value in the regions x I m - k a and x 2 m k a (Fig. 16.9), which means
+
( 16.249) Since J-",
px (x)dx = 1, we can write
(16.250) thereby proving the theorem. The proof for discrete distributions can be given by similar arguments. Note that the Chebyshev's theorem gives a lower bound to the probability of observing a random variable within k standard deviations of the mean. The actual probability is usually higher. 16.12
LAW OF LARGE N U M B E R S
One of the fundamental concepts of the probability theory is the law of large numbers. It describes how the mean of a randomly selected sample from a large population is likely to be close to the mean of the entire population. In other words, for an event of probability p, the frequency of occurrence, that is, the ratio of the number of times that event has actually occurred to the total number of observations, approaches p as the number of observations become arbitrarily large. For example, the average mass of 20 oranges randomly picked from a box of 100 oranges is probably closer to the actual average than the average mass calculated by just 10 oranges. Similarly, the average calculated by randomly picking 95 oranges will be much closer to the actual average found by using the entire sample. Even though this sounds as if we are not saying much, the law of large numbers allows us t o give precise measurements of the likelihood that an estimate is close to the right or the correct value. This allows us to make forecasts or predictions that otherwise would not be possible. A formal proof of the law of large numbers is rather technical for our purposes (Grimmett and Stirzaker). However, t o demonstrate how the law of large numbers work, consider a random variable X having a binomial distribution [Eq. (16.185)] with the parameters n and p. We now define a new random variable:
X
y=(16.251) n' where Y is the ratio of success in n tries. For example, X could be the number of times a certain face of a die comes in n number of rolls. Using the results for X, that is, np for the mean and np(1 -p) for the variation [Eqs. (16.220) and (16.229)], we can write the mean and the variance of Y as my = p and o$ =
P(1 -. - P )
n
(16.252)
PROBLEMS
713
Using the Chebyshev’s theorem with k = c / g , where c is a positive constant, we can conclude that the probability of obtaining a particular value for the ratio, Y , within the range my - c and my c, that is, between
+
p-c
and p + c
(16.253)
in n trials is at least (16.254) The actual frequency, p , is observed only when the number of tries goes t o infinity, which in this case [Eq. (16.254)] approaches 1. This result is called the law of large numbers. For a coin toss experiment the actual frequency is p = 0.5, hence our chance of observing the frequency of heads or tails in the interval (0.5 - c, 0.5
+ c) is at least
(
):on
1- - . Since the probability in
Equation (16.254) is always a number < 1, this means that n and c have to satisfy the inequality n > 0.25/c2. PROBLEMS 1. A coin is tossed four times. Write two different sample spaces in line with the following criteria:
(i) Only the odd number of tails is of interest. (ii) Outcome of each individual toss is of interest. 2. Write the sample spaces when (i) a die rolled 3 times, (ii) 3 dice rolled simultaneously, (iii) 4 balls selected without replacement from a bag containing 7 black and 3 white balls.
3. If a pair of dice are rolled, what is the probability of the sum being 8?
4. When a tetrahedron with its sides colored as the first face is red, A, the second face is green, B, the third face is blue, C, and the fourth face is in all three colors, ABC, justify the following probabilities by writing the appropriate sample spaces: 1
P ( A ) = P ( B ) = P ( C ) = -, 2 1
P ( A / B ) = P ( B / C )= -, 2 1 2 1 P ( C / B ) = P ( A / C ) = -, 2
P(C/A)= P ( B / A )= -,
P ( A / B C )= P ( B / A C )= P ( C / A B )= 1.
714
PROBABILITY THEORY AND DISTRIBUTIONS
5. Two individuals decide to toss a coin n times. One bets on heads and the other on tails. What is the probability of them coming even after n tosses?
6. Given 6 bags, where 3 of them have the composition A1 with 4 white and 2 black balls each, 1 bag has the composition A2 with 6 black balls, and the remaining 2 bags have the composition A3 with 4 white balls and 2 black balls each. We select a bag randomly and draw one ball from it. What is the probability that this ball is black?
7. Given 8 identical bags with the following contents: 4 bags with the contents A1 composed of 3 white and 3 black balls each, 3 bags with the contents A2 composed of 1 white and 2 black balls each, and 1 bag with the contents A3 composed of 3 white and 1 black ball. We pick a ball from a randomly selected bag. It turns out t o be black; call this event B . Find, after the ball is picked, its probability of coming from the second bag. 8. In a coin toss experiment, you know that the first 4 tosses came heads. What would you bet on for the next toss? What would you bet on if you did not know the result of the previous tosses. 9. In a die roll experiment two dice are rolled. What is the probability of the sum being 2, 5, 9? What is the probability of these sums coming in succession? 10. Four coins are tossed. (i) What is the probability of the third coin coming heads? (ii) What is the probability of having an even number of tails?
11. A bag contains 20 black and 30 white balls. What is the probability of drawing a black and a white ball in succession?
12. Five cards are selected from an ordinary deck of 52 playing cards. Find the probability of having two cards of the same face value and three cards of different face value from the remaining 12 face values. 13. Given 5 bags: Two of them have the composition A1 with two white and one black ball each, one bag has the composition A2 with 10 black balls, and the remaining two bags have the composition A3 with three white balls and one black ball each.
A ball is drawn from a randomly selected bag. (i) Find the probability of this ball being white.
(ii) Find the probability of this ball being black.
PROBLEMS
715
14. How many different six letter “words” can you make using the English alphabet? Count every combination as a word. How many words if consonants and vowels must alternate?
15. What is the number of ways of distributing k indistinguishable objects in N boxes with no empty box. 16. In how many ways can you distribute k indistinguishable balls into N boxes so that there is at most one ball in each box? Obviously, k 5 N .
17. How many distinguishable rearrangements of the letters in the word “distinguishable.” 18. Prove the binomial theorem, n k=O
by induction.
19. Show the following:
(4
2 (;)
= n2n-1,n
2 0, integer.
c ( - l ) k k ( L ) = 0, n
2 0, integer.
k
k=O
(ii)
k=O
(iii)
2 (3
k=O
( ‘) (;)
g ( - l ) km-k k=O
=
n 2 0, integer.
= 0, m, n are integers with
n 2m20.
716
PROBABILITY THEORY AND DISTRIBUTIONS
20. Prove the following properties of the binomial coefficients:
21. In a roll of three dice, what is the chance of getting for the sum either a 5. 9 or a 10.
22. Ten black and 10 white balls t o be ordered along a line. (i) What is the probability that black and white balls alternate?
(ii) If they are positioned along a circle, what is the probability that they alternate?
23. A bag contains three balls; red, white, and green. Five balls are drawn with replacement. What is the probability of green balls appearing three times?
24. Starting with [Eq. (16.106)]
give a complete derivation of the Bose-Einstein distribution:
Also show that Equation (16.99):
PROBLEMS
717
used in the derivation of the Boltzmann distribution for gasses, can be obtained from W = ( ~ ~ with ~ the~ appropriate ~ ~ assumption. ~ $ !
n,
25. Complete the intermediate steps of the derivation of the Fermi-Dirac distribution:
26. Hypergeometric distribution: Consider a shipment of N objects, in which M are defective. Show that the probability density function:
where X is the number of defective items, gives the probability of finding x number of defective items in a random selection of n objects without replacement from the original shipment. Check that p ~ ( x 2) 0 and C , p x ( x ) = 1. As a specific illustration make a table of p x ( x ) vs. x for N = 15, M = 5, n = 5. 27. Find the probability density of the Cauchy distribution:
" Verify that your result is indeed a probability density function. 28. Show that
f(z)=
I
O<x
l<x<2,
otherwise,
is a probability density. Find the corresponding distribution function, F x ( z ) , which is called the double triangle distribution. Plot both f(x) and FX (z). 29. A die is rolled until a 5 appears. If X is the number of rolls needed, find
P { X 5 x}. 30. Two dice are rolled. If X is the sum of the number of spots on the faces showing, find P { X 5 z}.
718
PROBABILITY THEORY AND DISTRIBUTIONS
31. On the average, 10 % of the goods produced in a factory are defective. In a package of 100 items, what is the probability of 3 or more of them being defective. 32. A book of 400 pages has 150 misprints. What is the probability of having 2 misprints in a randomly chosen page? What is the probability of having 2 or more misprints? 33. A fair die is rolled until every face has appeared twice. If X is the number of rolls, what is the probability density of X ? 34. Show that when X has binomial distribution, then 1
P { X = even} = -[I+ ( q - p)"]. 2 Also show that when X has the Poisson distribution, then
P{X
= even) =
1 -[I 2
+ e-"].
35. Show that the following are probability density functions: (i)
(ii) Rayleigh distribution:
36. Find the factorial moments of the hypergeometric distribution described by the probability density
37. Find the factorial moments of the Polya distribution described by the following probability density function:
PROBLEMS
719
where x = 0 , 1 , . . . , N , cx,p > 0. Using this write the mean and the variance .
38. Given the distribution function F x ( x ) = 22/37r --1/2xe-"4/2 1
x>0,
find the mean and the variance. 39. Complete the details of the integrals involved in the derivation of the moments of the Gaussian distribution.
40. Given the following probability density of X : c0x4(1
-
x)4, 0 <
< 1,
f(x) = 0, elsewhere. (i) Find the normalization constant
CO.
(ii) Find the mean m and the variance cr2. (iii) What is the probability that X will take a value within two standard deviations of the mean? (iv) Compare your result with the Chebyshev's theorem.
41. Find a bound on the probability that a random variable is within 4 standard deviations of its mean. 42. Using the Gaussian probability distribution, find the probability of an event to be found within 3 standard deviations of its mean and compare your result with Chebyshev's theorem. 43. Show that the mean and the variation of the hypergeometric distribut ion,
where
x = O , 1 , 2 ,... ,n, x 5 M andn-x 5 N - M, are given as
720
PROBABILITY THEORY AND DISTRIBUTIONS
and
a2 =
n M ( N - M ) ( N - n) N 2 ( N - 1)
44. Using a suitable coin toss simulator, which can be found on the internet, demonstrate the law of large numbers.
CHAPTER 17
INFORMATION THEORY
Information is a frequently used word in everyday language, and almost everybody has some notion of what it stands for. Today, as a technical term, information has also entered into many different branches of science and engineering. It is reasonable to start the beginning of modern information theories with the discovery of communication devices and computers. Before we introduce the mathematical theory of information, let us start by sorting out some of the complex and interrelated concepts that this seemingly simple word implies. For a scientific treatment of information, it is essential that we define a meaningful measure for its amount. Let us start with an idealized but illustrative example. Consider large numbers of aeroplanes and cars of various kinds and models, all separated into small parts and put into separate boxes. Also assume that these parts are separated in such a way that by just looking at one of them, you cannot tell whether it came from an aeroplane or a car. These boxes are then stored randomly in a warehouse with codes, (z,y, z ) , indicating their locations in the warehouse printed on them. These parts are also separated in such a way that they can be brought together with only a few relatively simple tools, like screwdrivers and wrenches. Using the parts in Essentials of Mathematical Methods in Science and Engineering. By Copyright @ 2008 John Wiley & Sons, Inc.
8. SelGuk Bayin 721
722
INFORMATION THEORY
the warehouse and a given set of instructions, it is now in principle possible for anybody who can operate a few simple tools t o build a car or an aeroplane. Instructions will always be simple like: Take the parts in the boxes at ( 2 1 , y1, z1) and (Q, y2, z 2 ) , bring their indicated sides together, and then connect them with a certain size screw, bolt, etc. At each step of the way, the assembler does not have to know what he or she is building, nor will he or she have to make any decisions on anything. Obviously, building a full-size luxury car is going to require a lot more instructions than for a subcompact car. Similarly, a Boeing 747 will require much more information than either one of them. This shows that it should be possible to define a measure for the amount of information, at least for comparable tasks. In the mathematical theory of information, we use bit as the unit of information, where 1 bit is the information content of the answer to a binary question like right or left? top or bottom? etc. This example also contains several other important elements of the information theory like data, information, knowledge and goal, which we will discuss shortly. In the above example, data consist of the locations of the needed parts. Information is the set of instructions, consisting of the sequence in which the parts are to be put together and how. Knowledge is the ability to understand and to follow the instructions; and finally the goal is the product. Adding purpose to these, we can draw the following block diagram for a basic information processing mechanism, where information is included as a part of knowledge. In the above example, purpose lies within whoever
purpose
data
knowledge I - b - / T l
supplies the information t o the assembler. Only the person who supplies the information knows what the final product looks like and why it is being built. An important aspect of information is its value or relevance, which is subjective and naturally very hard to quantify. Sometimes l b of information, like the answer to a simple binary question as go or no go, buy or sell, push or don’t push, may have life or death consequences to a person, to a company, or even to an entire nation. Information that makes one person respond as “great” may only get “SO what” from another. Yet another person may respond as “what else is new?” In the last case, can we still consider it as information to that person? In summary, information depends not only on the receiver but also on where and when it is received. Being too late, too little, or out of place may easily render otherwise valuable information worthless. Besides, information not only depends on the receiver but also on the sender and how reliable the sender is. In many instances, the sender may be trying to manipulate the receiver’s decision processes by deliberately giving carefully selected wrong or incomplete information. When the sender is not reliable, the receiver may
723 be perplexed even by the correct information, hence lose precious time in reaching a critical decision. These become significant tools for manipulating one’s opponent, especially when important decisions have t o be made over a very short period of time. Another important aspect of information is how it is transmitted. This process always involves some kind of coding and then decoding device so that the sender and the receiver can understand each other. In the above example, instructions can be transmitted via written documents or e-mail. The security of the channels used for transmission is also another major concern and brings a third party into the system. Coding theory is primarily interested in developing a spy-proof means of communications. We also need ways to store information for future uses. In such cases, problems regarding the efficient and safe means of information storage and its rapid and reliable retrieval are among the major technological challenges. Sometimes the information we have may involve gaps or parts that have changed or parts that cannot be read for a number of reasons. A major problem of information science is the need t o develop methods of recovering the missing or degraded information. So far, the information we talked about has a sender and a receiver with some communication device in between. Sometimes, the information we require is in the environment, which could be social, political, electronic, or physical. In such cases we need mechanisms to extract the needed information from the environment quickly and correctly and possibly without affecting the environment. From the above introduction, one may get the impression that information is a just a bag of worms that will defy mathematical treatment for years to come. In part this is true. However, beginning with the classical paper of Shannon in 1948, mathematical theory of information has become a serious interdisciplinary science at the crossroads of mathematics, physics, statistics, psychology, computer science, and electrical engineering. Its impact on the development of CDs, DVDs, mobile phones, internet, search engines, and the study of linguistics and of human perception has been extremely important. Even the part of information that has defied proper mathematical treatment is being carefully studied and extensively used in political science and economics for manipulating target audiences at rational levels through the controlled flow of information over various direct or indirect channels without the subject noticing. Also, certain channels, when they cannot be closed or censored, can be rendered useless by filling them with irrelevant information. This is essentially equivalent to lowering the signal-to-noise ratio of that channel. Such tactics with military and political applications are usually discussed within the context of disinformation, which should not be confused with misinformation, which is not intentional.
724
INFORMATION THEORY
17.1 ELEMENTS OF INFORMATION PROCESSING MECHANISMS We have mentioned that the basic elements of information processing can be shown by the above block diagram. We will now discuss what each term stands for in greater detail and add more structure to this diagram. In general, the subject of an information processing mechanism could be a factory, a person, a team of individuals, a company, a social network, or even an entire nation. All of these subjects could be considered as having a collective intelligence and memory-in short, a “mind.” For each one of these subjects-depending on their p u r p o s e d a t a , information, knowledge, and goal have different meanings. The effect of the information processing mechanism on the subject is to change it from one state of mind, ~QI), to another, 1Qz). The state of mind function is going to depend on a number of internal parameters, which at this point is not important for us. For an individual, IQ) represents a certain pattern of neural networks in that person’s brain. Let us now consider a typical scientific endeavor, which is basically an information processing mechanism that can be generalized t o any other situation relatively easily. A scientist starts the process with a definite state of mind, IQ,), with an idea or incentive, which helps to define the purpose of the entire process. The next step is probably one of the most critical steps, that is, the initiative. What starts the entire process is the initiative, not the existence of purpose or incentive. Life is full of examples where two individuals who had the same idealpurpose and the knowledge, but somehow only one of them got the whole process started and finalized with a definite result. The next step involves data. It usually involves raw numbers, consisting of the outputs of some measuring instruments in terms of currents, voltages, times, etc. After the data are reduced, we obtain what we call information. They are now luminosities, electric fields, distances, temperatures, pressures, etc. This information can now be fed into the next box, which involves knowledge. In scientific endeavors, knowledge is a model or a theory like Maxwell’s theory. In this case, information is the initial conditions needed by the model, which will produce new information that needs to be checked by new experiments and data. If the new data support the new information, we will have gained a better understanding of the phenomenon that we are investigating. Or, this may also lead to new questions and thus t o new incentives for new scientific endeavors, which will take us back t o the beginning of this process t o start all over again. Our improved understanding may also bring in sight new technological applications, which could start a new round of information processing. If the new data do not support the new information, we have three options: (i) We may go back and modify our model and resume the process. (ii) If we trust our model, we may recheck our data and information. (iii) If nothing works, we may go back to the beginning and redefine our purpose. Another important aspect of this entire process, which is usually ignored, is its time scale. One of the things that affects the time scale is the subject’s
ELEMENTS OF INFORMATION PROCESSING MECHANISMS
Basic information processing mechanism for scientific endeavor:
c 2. Purpose
3. Initiative
4. Data: Numbers
1
Initial conditions
6. Knowledge: Modelflheory 7. New data (output)
b. Understanding I
Oa. New purpose for new scientific endeavor Ob. New purpose with technological applications
725
726
INFORMATION THEORY
e x p e c t a t i o n , which is what the subject expects or hopes will happen when the goal is realized. Expectation could be quite different from purpose, and it is usually neither openly declared nor clearly formulated by the subject. When expectations are deliberately hidden, they are called u l t e r i o r motives, which could be an important part of the strategy. A n t i c i p a t i o n is another thing that affects the time scale, which involves projections about what may happen along the way. This not only affects the speed of the entire process by making it possible to skip some of the time-consuming intermediate steps, but also helps to avoid potential hazards and traps that may halt or delay the process. At any point along the way, targets of opportunity may pop up, the merits of which may outweigh the potential gains of the ongoing process. This may make it necessary to abandon or postpone the existing process and to start a new one, which may not even be remotely connected to the old one. Among all these steps, incentive, expectation, and understanding are the ones that most critically rely on the presence of a human brain. Understanding naturally occurs when the subjects “mind” reaches a new state, l Q 2 ) . The entire process can be represented in terms an operator G(2, l ) , which represents all the intermediate steps as 1Q2) =
G(2,1) 1Qi).
(17.1)
In real-life situations there are usually several ongoing processes collaborating or competing with each other that complicates the structure of IQI), IQ,), and G(2 , l ) . For another person-say an entrepreneur-data, information, knowledge, etc., mean different things but the above process basically remains intact.
17.2
CLASSICAL I N F O R M A T I O N T H E O R Y
So far we have discussed the conceptual side of information. Classical theory of information quantifies certain aspects of information but does not concern itself with by whom and why it was sent and why one should have it. It is also not interested in its potential effects on both the sender and the receiver. Shannon’s classical information theory basically concerns itself with computers, communication systems, and control mechanisms. It treats information content of a message independent of its meaning and purpose and defines a measure for its amount, thus allowing one t o study its degradation during transmission, storage, and retrieval. This theory works with qualitative expressions concerning the possible outcomes of a n experiment and the amount of information to be gained when one of the possibilities has actually occurred. In this sense, it is closely related to the probability theory. This is also the reason why the term s t a t i s t i c a l i n f o r m a t i o n is often used in classical information theory. To emphasize the statistical side of information, it is frequently related t o dice rolls, coin tosses, or pinball machines. We prefer to work with a different model with properties that will be useful t o us when we
CLASSICAL INFORMATION THEORY
IO> Figure 17.1
727
I 1> Pinball machine with one pin.
discuss other aspects of information and its relation t o decision theory. Our system is composed of a car with a driver and a road map, where at each junction the road bifurcates. The road could have as many junctions as one desires and naturally a corresponding number of potential destinations. We can also have an observer who can find out by which destination the car has arrived by checking each terminal. If the driver flips a coin at each junction to choose which way to go, we basically have the Roederer’s pinball machine, where the car is replaced by a ball and the gravity does the driving. Junctions are the pins of the pinball machine that deflect the balls to the right or the left with a definite probability, usually equally, and the terminals are now the bins of the pinball machine. For the time being, let us continue with the pinball machine with one pin and two bins as shown in Figure 17.1. Using binary notation, we show the state of absence of a ball in one of the bins, left or right, as 10) and its presence as 11). Note that observing only one of the bins is sufficient. In classical information theory, if the ball is not in one of the bins, we can be certain that it is in the other bin. If the machine is operated N times, the probability of observing 10) is po = No/N, where NOis the number of occurrences of 10) in N runs. Similarly, the probability of observing the state 11) is p l = N1/N, where N I is the number of occurrences of the state 11). Naturally, N = NO N1 and po + P I = 1. In the symmetric situation, po = p l = 0.5, by checking only one of the bins an observer gains on the average l b of information. That is, the amount of information equivalent t o the answer of a yes or no question. In the classical theory, the ball being in the right or the left bin has nothing to do with the observer checking the bins. Even if the observer is not interested in checking the bins, the event has occurred and the result does not change. Let us now consider the case where the observer has adjusted the pin so that the ball always falls into the left bin. In this case po = 0 and
+
728
INFORMATION THEORY
= 1. Since the observer is certain that the ball is in the left bin, by checking the bins, left or the right, there will be no information gain. In other words, the observer already knows the outcome and no matter how many times the pinball machine is run, the result will not change. This is a case where the observer may respond as “What else is new?” If the pin is adjusted so that p l is very close to zero, but not zero, then among N observations, the observer will find the ball in the left bin only in very very few of the cases. Now the response is more likely to be ‘LWow!” The classical information theory also defines an objective i n f o r m a t i o n value-as Roederer calls it, the “Wow” factor. It is also called the novelty value or the i n f o r m a t i o n content, which is basically a measure of how surprising or how rare the result is t o the observer. Information value, I , will naturally be a function of how probable that event is among all the other possible outcomes, that is, I ( p ) . To find a suitable expression, let us call the information value of observing the state 10) as 10 and 11) as 11.For the case where we adjusted the pin so that the ball always falls into the same bin, that is, pi = 1 (i = 0 or l), we define the information value as I, = 0. This is so, since the result is certain and no information is to be gained by an observation. We also define Ii -+ 00 for the cases where pi + 0. For independent events, a, b, . . . , we expect the information value t o be additive, that is, I = I , + I b + . . . . From the probability theory, we know that for independent events with probabilities p a , p b , . . . , the total probability is given as the product p = papb.. . , hence we need a function satisfying the relation
p1
+ Ib(pb) + . = I(&) + I(Pb) + . . .
I(paPb . . . ) = Ia(Pa)
‘ ’
‘
(17.2)
We also require the information value to satisfy the inequality I ( p i ) > I ( p j ) , when pi > p j . A logarithmic function of the form I ( p ) = -Clogp, C = const.
(17.3)
satisfies all these conditions. Since p 5 1, the negative sign is needed to ensure I > 0. To determine the constant C, we use the symmetric case of the pinball machine, where the average information gain is l b . Setting I(0.5) to 1, namely I(0.5) = -Cl0g(O.5) = 1,
(17.4)
we find C = 1/ log 2. Thus, the information value for the pinball machine can be written as 1
I a. -- -I log Pi log 2
= -log,pi,
i
=0
or 1,
where log, is the logarithm with respect to base 2.
(17.5)
CLASSICAL INFORMATION THEORY
729
H
Figure 17.2
Shannon’s H function for the pinball machine.
17.2.1 Prior Uncertainty and Entropy of Information Since Shannon and Weaver were interested only in the information content of the message received with respect to the set of all potentially possible messages that could be received, they called I the information content. One of the most important quantities introduced by the Shannon’s theory answers the question: Given the probabilities of each alternative, how much information on the average can we expect to gain when one of the possibilities is realized beforehand? In other words, what is the prior uncertainty of the outcome? For this, Shannon introduced the entropy of the source of information or in short the entropy of information, H , as
(17.6) Note that H is basically the average information value or the expected information gain. For the pinball machine, if the probability of finding the ball in the left bin is p , then the probability of not finding it is (1 - p ) , which gives the H function as (17.7) For a pinball machine with 2 possible outcomes, the entropy of information, H , is shown in Figure 17.2. For the symmetric case, p = 0.5, the expected gain of information is l b . When p = 0 or p = 1, H is zero since there is no prior uncertainty and we already now the result. We can extend our road map or the pinball machine to cases with more than two junctions or pins. At each junction or pin there will be two possibilities.
730
INFORMATION THEORY
Figure 17.3
Pinball machine with 4 possible equiprobable outcomes.
In general we can write H as N-I
(17.8) i=O
where N is the number of possible final outcomes and Cipi = 1. When all the possible outcomes are equiprobable, H has an absolute maximum
H = log2 N .
(17.9)
From the above definition [Eq. (17.8)], it is clear that H is zero only when all pi except one is zero. Since pi = 1, the nonzero pi is equal to 1. Note that when pi = 0, we define p i log, pi as 0. In the case of four equiprobable possible outcomes (Fig. 17.3), probability of finding the ball in one of the bins in the final level is 1/4. In this case H = 2b.
xi
Example 17.1. Pinball machine: In the binary pinball machine, if the pin is adjusted so that p~ = 2/3 and p1 = 1/3, the expected information gain, H , is
H
2 3 3 2 = 0.92b.
= - log2 -
+ -13 log2 3 ( 17.10)
Example 17.2. Expected information gain: With the prior probabilities 0.3, 0.2, 0.2, 0.1, 0.2,
(17.11)
731
CLASSICAL INFORMATION THEORY
H is found as
H = - [0.3log2 0.3
+ 3(0.2) log2 0.2 + 0.1 log2 0.11 ( 17.12)
= 2.25b.
Similarly, for a single roll with an unbiased die, which can be viewed as a pinball machine with a single pin, where the ball has 6 all equiprobable paths to go, the value of H is obtained as
(17.13)
= 2.583.
Example 17.3. Scale of H : We have the freedom t o choose the basis of the logarithm function in the definition of H . In general we can use log,, where T is the number of possible outcomes. In the most frequently used binary case, T is 2. However, a change of basis from T to s is always possible. Taking the logarithm of 2 = T ' O ~ Tto~ the base s, we write log, z = [log, T ] log,
2,
z > 0,
( 17.14)
and obtain the scale relation between H, and H , as
H,
=
( 17.15)
[log, T ] H,.
We can now use logarithms with respect to base 10 to find
= 3.32Hlo.
H2
as
(17.16)
In what follows, unless otherwise specified, we use the binary basis. We also use the notation where log means the logarithm with respect to base 10 and In means the logarithm with respect t o base e as usual. 17.2.2
Joint and Conditional Entropies of Information
The Case of Joint Events: Consider two chance events, A and B , with m and n possibilities, respectively. Let p ( i , j ) be the joint probability of the i t h and j t h possibilities occurring for A and B , respectively. For the joint event we write the entropy of information as m
n
(17.17)
732
INFORMATION THEORY
For the individual events, A and
B,we have, respectively, (17.18)
(17.19)
Comparing
with Equation (17.17) it is easily seen that the inequality
H ( A ,B)5 H ( A ) + H ( B )
(17.20)
holds. Equality is true for independent events, where (17.21) Equation (17.20) says that the uncertainty or the average information value of a joint event is always less than the sum for the individual events. The Case of Conditional Probabilities: We write the conditional probability [Eq. (16.27)] of B assuming the value j after A has assumed the value i as (17.22) where the denominator is the total probability of A assuming the value i that is acceptable by B.The conditional entropy of information of B, that is, H ( B / A ) ,is now defined as the average of the conditional entropy: (17.23) as
m
n
i
j
(17.24)
CLASSICAL INFORMATION THEORY
733
1x7,
where in the last step we have substituted p ( i , j ) for p ( z , j ’ ) ] p ( j / i ) [Eq. (17.22)]. The quantity H ( B / A ) is a measure of the average uncertainty in B when A has occurred. In other words, it is the expected average information gain by observing B after A has happened. Substituting the value of p ( j / i ) [Eq. (17.22)] and using Equation (17.18), and after rearranging, we obtain
( 17.25)
= H ( A ,B ) - H ( A ) .
Thus.
H ( A ,B ) = H ( A ) + H ( B / A ) .
(17.26)
Using Equations (17.20) and (17.26), we can write the inequality
H(A)
+ H ( B ) 2 H ( A ,B)= H ( A ) + H ( B / A ) ,
( 17.27)
hence
H ( B )2 H(B/A).
(17.28)
In other words, the average information value to be gained by observing B after A has been realized can never be greater than the average information value of the event B alone. Entropy of Information for Continuous Distributions: For discrete set of probabilities, p l , p l , . . . , p n , the entropy of information was defined as n
H = -Cp.10g2pi.
(17.29)
i
For continuous distribution of probabilities, H is defined as
JI, co
H
=
-
P ( Z ) log2 P ( Z )
dx.
(17.30)
For probabilities with two arguments we can write H ( x ,y) as (17.31) Now the conditional probabilities become (17.32) (17.33)
734
Pi
1/23
1/23
1\23
1/23
1/23
1/8
1/23
1/23
Ii
3
3
3
3
3
3
3
3
Figure 17.4
Car and driver with 8 possible targets.
Example 17.4. H for the Gaussian distribution: Gaussian probability distribution is given as (17.34)
We can find H as
(17.35)
CLASSICAL INFORMATION THEORY
735
17.2.3 Decision Theory We now go back to our car and driver model. For the sake of argument we use three junctions or “decision” points shown as 1,2 and 3 (Fig. 17.4). At each junction the road bifurcates and the driver has to decide which way t o go. This allows us to add purpose, decisions, and strategy into our model, where the driver has a specific target, say, a friend in terminal 3. When pi = 1/8, i = 1,.. . , 8 , we have the maximum entropy of information of this system, which is 3b: 8
8
i=l
i=l
(17.36) If the driver is not given any instructions and flips a fair coin at each junction to decide which way to go, the probability of reaching terminal 3 in a given try is 1/8. Reaching terminal 3 will naturally require numerous tries involving backtracking and recalling past decisions and then reversing to the other alternative. On the average, only one out of 8 tries the driver will be successful. Once the terminal 3 is reached, the driver will have acquired information with the information value I3= 3b: 13 =
-
1 -log, 8
1 log0.125 log 2
= 3b.
(17.37)
If the driver is given the sequence of instructions about which way to turn at the junctions, like right - left - right, the information content of which is 3b, then he or she can reach the desired terminal with certainty. However, the information expected to be gained by the driver when he or she gets there is now zero. The information given by the driver’s friend has removed all the prior uncertainty. Now, consider a case where some of the roads are blocked as shown in Figure 17.5. Now the H value of the system is
i=l
=
=-[&I
-
i= 1
+ 2(0.125) log, 0.1251 [3(0.25) log0.25 + 2(0.125) logO.125]
[3(0.25)log, 0.25
= 2.256.
(17.38)
Since the prior uncertainty has decreased, H is naturally less than 3b. However, the driver still has two make three binary decisions. Hence, the needed
736
INFORMATION THEORY
Pi
114
0
118
118
0
0
114
114
Ii
2
0
3
3
0
0
2
2
Figure 17.5
Car and driver with some of the roads blocked.
information to reach the terminal 3 is still worth 3b. This is also equal to the information (novelty) value, 13, of terminal 3. The difference originates from the fact that in Shannon’s theory, H is defined independent of purpose. It is basically the average information expected to be gained by an observer from the entire system. It doesn’t matter which terminal is intended and why. On the other hand, if the driver aims t o reach terminal 7, then there are only two binary decisions to be made, left - right; hence the amount of information needed is 2b. Note that a t junction 2B (Fig. 17.5) the right road is blocked. Hence, there is no need for a decision by the driver. The information value, 17, of terminal 7, is also 2b. 17.2.4
Decision Theory and Game Theory
To demonstrate the basic elements of the decision theory and its connections with the game theory, we consider a case where a merchant has to decide whether he/she should expand his/her business or not. In this case, each terminal of the decision tree has a prize/profit or penalty/loss waiting for our subject. Advisors tell that if the merchant expands the electronic goods department (EGD), in recession (R) he/she will lose $50,000. However, if the economy remains good (EG), the electronic goods department will make $120,000. On the other hand, if the merchant expands the household items department (HHI), and gets caught in recession, he/she will lose $30,000 but if
CLASSICAL INFORMATION THEORY
737
-$50000$120000 -$30000 $80000 Figure 17.6 Decision tree for the merchant.
the economy remains good, he/she will make $80,000. Finally, if the merchant does not expand (DNE), in recession, he/she will make $2,000; and if the economy remains good, he/she will make $30,000. This merchant thinks that the probabilities for the economy going into recession and remaining good are 2/3 and 1/3, respectively. The merchant also thinks that if he/she expands, the probabilities for the electronic goods and household items departments outperforming each other are 1/3 and 2/3, respectively. What should this merchant decide? We can draw the decision tree shown in Figure 17.6. We can calculate the merchants expected losses or gains for the next fiscal year as follows: If the merchant expands the electronic goods department, the expected gain is 1 2 3 3
- (50,000) -.-
11 + (120,000) -.= 2,222. 3 3
(17.39)
If the merchant expands the household items department, the expected gain is
.
(;)
= 4,444
(17.40)
738
INFORMATION THEORY
If the merchant does not expand, the expected gain is 2,000
(9 (3 -
$30,000
-
= 11,333.
(17.41)
Since the merchant’s expected profit in the last option, $11,333, is greater than the previous two cases, this merchant should delay expanding capacity for another year. This method is called Bayes’ criteria and works when we can identify the decision points and assess the probabilities. If the merchant has no idea about the probabilities and if he/she is a pessimist afraid of losing money, then the merchant should decide t o wait for another year and avoid the risk of losing $50,000. This is called the minimax criteria. That is, you minimize the maximum expected loss. Minimax criteria is aniorig the many that can be used in such situations. In this example, the merchant appears as if he/she is playing a game with the economy. The merchant has four moves: expand or wait and if he/she decides to expand, expand the electronic goods department or the household items department. The merchant also gets to make the first move. On the other hand, economy has two moves, go into recession or remain good, and it does not care about what the merchant has decided. For a game with two players, A and B , each having two moves, a l , a2 and b l , b2. rcspectively, we can write the following payoff matrix, which is also called the normal form representation : Player A
In this representation, L(ai,b j ) is the loss function for the player A, when A chooses strategy ai and B chooses strategy b j , where i, j = 1 , 2 . For the player A , the loss function, L ( a i,b j ) , is positive for losses and negative for gains and vice versa for the player B . Since whatever one player wins, the other player loses, such games are called zero-sum games. In other words, there is no cut for the house and no capital is neither generated nor lost. Zero-sum games could be symmetric or asymmetric under the change of the identities of the players. That is, in general, L(ai, b j ) # -L(bj, a i ) . Depending on which player makes the first move, the above payoff matrix can also be shown with the decision trees in Figure 17.7. These are called extensive form representations. In extensive form games, players act sequentially and they are aware of the earlier moves made. There are also games where players act without knowing what their opponent has decided. Games where players act simultaneously are essentially games of this type. Games where the players act with incomplete information about their opponents moves have also been
CLASSICAL INFORMATION THEORY
B
A
Figure 17.7 1,2.
739
Decision trees for the players A and B , where Lij = L(ai, b j ) , i , j =
designed. In such games, normal form representation is usually preferred over the extensive form. Let us now consider the game depicted in Figure 17.8, where the player A makes the first move at the first decision point, where he/she has two alternatives, a1 and u2. At the second decision point, player B gets to make his/her move with four choices: b l , 62 when A decides a1 and b3, b4 when A decides a2. We now introduce two random variables, x and y,that can only take the values 0 and 1 at the decision points 1 and 2, respectively. The decision function, dl(x), at point 1 is defined as
dl(X)
=
{
a1,
z = 0,
u2,
z = 1.
(17.43)
Similarly, the decision functions for the player B are defined as
Dl(Y) =
{
bi,
y=0, D2(Y) =
b2,
y = 1,
{
b3,
y = 0,
bq,
y = 1.
(17.44)
In this game the random variables can be tied to the actions of a third element. For example, in designing a strategy for a political confrontation one often has to factor in the potential responses of other countries. Let the player A assign the probabilities p and q for the potential actions of this third element, which affects the decisions of both A and B (Fig. 17.8). To compare the merits of all these decisions for A, we also define the expected risk/loss function R(ai,b j ) as
R ( a z , b , ) = E{L(dl(X),D,(Y))}
7
(17.45)
740
INFORMATION THEORY
A
Ll 1
Figure 17.8
Ll2
L21
Statistical game, where
Lij
L22 = L ( a i ,b j ) , i, j = 0 , l .
where the expected value, E , has to be taken with respect to the random variables z and y as
Player A can now write the payoff matrix
and minimize the expected maximum losses or risks by using the minimax criteria.
Example 17.5. Normal form games and payofl matrices: In decision theory, given the alternatives, it is important to make a n informed choice. Depending on the situation, costs or gains could stand for many different things and the payoff matrices can be made as complex as one
741
CLASSICAL INFORMATIONTHEORY
B
Figure 17.9
Decision trees for the players A and B in Example 17.5.
desires. Consider the following payoff matrix: Player A
Player B I bl
I
7
I
-4
Ib2I 3 I 5
1, I
(17.46)
The corresponding decision trees are given in Figure 17.9. When no probabilities can be assigned, using the first decision tree and the minimax criteria, player A chooses strategy a2 to avoid the risk of losing 7 points in case everything goes bad (remember that plus sign is for losses for player A ) .
Example 17.6. Another game and strategy f o r A: Let us consider player A in the second decision tree (Fig. 17.9, right). Now, B makes the first move and A has to decide without knowing what B has decided. Since A acts without seeing his/her opponent’s move, we connected points 2 and 3 by a dotted line. Another way to look at this game is that A and B decide simultaneously. We now show how A can use a random number generator to minimize his/her maximum loss by adjusting the odds for the two choices, a1 and a2, as y and (1 - y) a t points 2 and 3, respectively. If B chooses strategy b l , then A can expect t o lose Ebl =
7y - 4(1 - y)
(17.47)
742
INFORMATION THEORY
Figure 17.10
Ebl and
Eb2
points. If B chooses strategy b2, then A can expect to lose
Eb2 = 3y 4-5(1 - y)
(17.48)
points. If we plot E b l and Eb2 as shown in Figure 17.10, we see that A can minimize his/her maximum expected loss by following strategy a1 , 9 out of 13 times, and by following strategy a2,4 out of 13 times. If this is a one-time decision, A can use a random number generator to pick the appropriate strategy with the odds adjusted accordingly. The theory of games is a relatively new branch of mathematics closely related to the probability theory, information theory and the decision theory. Since the players and payoffs could have many different forms, it has found a wide range of applications to economic phenomena such as auctions, and bargaining. Other important applications are given in biology, computer science, political science, and philosophy (Szabo and Fath; Miller and Miller; Bather; Osborne) .
17.2.5
Traveler’s Dilemma and Nash Equilibrium
An interesting non-zero-sum game that Basu introduced in 1994, where each player tries to maximize their return with no concern to what the other player is getting, has attracted a lot of attention among game theorists. Even though the game can be played among any number of players, it is usually presented in terms of two players as the traveler’s dilemma.
CLASSICAL INFORMATION THEORY
743
An airline loses two pieces of luggage, both of which contain identical antiques that belong to two separate travelers. Being afraid that the travelers will claim inflated prices, the airline manager separates the passengers and tells them that the company is liable for up to $100 per luggage and asks them to write an integer between and including 2 and 100. The manager also adds that if they both write the same number, the company will honor that as the actual price of the antique and pay both travelers that amount. In case they write different numbers, the company will take the lower number as the actual price of the antique and pay that amount to both travelers. However, in this case the company will deduct $2 from the traveler who wrote the larger number as penalty and add $2 as bonus to the traveler who wrote the smaller number. According to these rules, we can now construct the following payoff matrix, where the first column represents the choices for the first traveler, John, and the first row represents the choices for the second traveler, Mary. The numbers in parentheses represent the reimbursements that John and Mary will receive, respectively. Payoff matrix for the traveler’s game:
For example, the numbers in the third column of the second row is (5,1), which means that when John chooses 3 and Mary chooses 4, John gets $5 and Mary one gets $1. In this case the lower number is 3; hence they both get $3 but since John gives the lower number, 3 , a $2 bonus is added to his reimbursement, while a $2 penalty is deducted from the Mary’s, thus bringing their total reimbursements to $5 and $1, respectively. The question is, What numbers or strategy should the travelers choose? Assuming that both travelers are rational, let us see what the game theory predicts. We start with John. Since his aim is to get the maximum possible amount as reimbursement, he first thinks of writing 100. However, he immediately realizes that for the same reason Mary could also write 100. Hence, by lowering his claim to 99, John expects to pick up the $2 bonus, thus increasing his return to $101. Then on a second thought, he realizes that Mary, being a rational person like himself, could also argue the same way and write 99, thus
744
INFORMATION THEORY
reducing his return to $99. Now, John could do better by pulling his claim down to 98, which will allow him to get $100 with the bonus. Continuing along this line of reasoning, John cascades down to the smallest number 2. Since they are both assumed to be rational, Mary also comes up with the same number. Hence, the game theory predicts that both travelers write the number 2, which is the Nash equilibrium for this game. However, in practice almost all participants pick the number 100 or a number very close to it. The fact that the majority of the players get such high rewards by deviating so much from the Nash equilibrium is not easy to explain mathematically. Since it is not possible to refer to the vast majority of people as being irrational, some game theorists have questioned the merits of this game, while others have proposed various modifications. In analyzing payoff matrices a critical concept is the Nash equilibrium. If each player has chosen a strategy and no player can improve his or her situation unilaterally, that is, when the other players keep their strategy unchanged, the set of strategies and the corresponding payoffs correspond to a Nash equilibrium. In 1950 Nash showed in his dissertation that Nash equilibrium exists for all finite games with any number of players. There is an easy way to identify Nash equilibrium for pure strategy games, which is particularly helpful when there are two players with each player having more than two strategies available. In such cases, formal analysis of the payoff matrix could be quite tedious. For a given pair of numbers in a cell, we simply look for the maximum of a column and check if the second member of the pair has a maximum of the row. When these conditions are met, as in the cell with ( 2 , 2 ) in the traveler’s dilemma, then that cell represents the Nash equilibrium. An N x N payoff matrix can have N x N pure strategy Nash equilibria. In the traveler’s dilemma there is only one Nash equilibrium. A mixed strategy is a strategy where the players make their moves randomly according to a probability distribution that tells how frequently each move is to be made. A mixed strategy can be understood in contrast t o a pure strategy, where players choose a certain strategy with the probability 1. For example, in the traveler’s dilemma game, if at least one of the travelers chooses his or her number by using a random number generator, then the game is called a mixed strategy game. The concept of stability is very important in physical systems and has been investigated for many different kinds of equilibrium. Stability of Nash equilibria in mixed strategy games can be defined with respect t o infinitesimal variations of the probabilities as follows: In a given Nash equilibrium, when the probabilities for one of the players are varied infinitesimally, the Nash equilibrium is stable (i) if the player who did not change has no better strategy in the new situation and (ii) if the player who did change is now playing strictly with a worse strategy.
CLASSICAL INFORMATION THEORY
745
When these conditions are met, a player with infinitesimally altered probabilities will quickly return to the Nash equilibrium. An important point is that stability of the Nash equilibrium is related to but not identical with the stability of a strategy. The dilemma in the traveler’s dilemma game is in our difficulty in explaining why people choose something which the game theory deems as irrational and yet get such high rewards. With the hopes of coming up with an explanation, game theorists have introduced a number of different equilibrium concepts like strict equilibrium, the rationalizable solution, perfect equilibrium, and more. Yet in all these cases one reaches the prediction (2,2) for the traveler’s dilemma (Basu). Game theory assumes that both travelers are rational and have the time to construct and analyze the payoff matrix correctly. The fact that so many players decide diametrically opposite t o what the theory predicts and do well means that there are other important factors in the decision making process that we use. In real-life situations, aside from being rational, we can also assume that, on the average, people are honest and will not view this as an opportunity to make extra cash, hence would be glad to get out even. However, in order to be able to exercise this option, players need to have access to a critical piece of information, which the game lacks-that is, how much they have actually paid for the antique. A realistic modification of the game would be to give each player a number chosen with a certain distribution representing the actual cost of the antique for that player. Most of the players will be given numbers close t o each other within a reasonable bargaining range, say 10%. Fewer players will have paid a lot more or a lot less than the mean price for various reasons. When these factors are missing from the game, players will naturally view picking any number from 2 t o 100 as their declared rightful choice; hence to maximize their return, they will not hesitate to choose a number very close to the upper limit. Since the players are not given any clue whatsoever about the actual cost of the antique and since they are not taking away the other traveler’s right to write whatever he or she wants, and also considering that the airline company is the strong hand, which after all is at fault, they will have no problem in rationalizing their action. In summary, what is rational or irrational largely depends on the circumstances under which we are forced to make a decision and how much time and information we have available. In some situations, panicking may even be a rational thing to do. In fact, in times of crisis when there is no hope or time for coming up with a strategy that will resolve our situation, to avoid freezing or decision paralysis, our brains are designed to panic. This helps us t o come up with a strategy of some sort with the hopes that it will be the right one. Decisions reached through panic are not entirely arbitrary. During this time, our brain and our subconscious mind goes through a storm of ideas and potential solutions and somehow picks one. Once panic sets in, the decision that comes out is out of our control. All we can do is t o hope that it turns out to be the right one or one that is close to being right. When all else fails, in line with the famous saying A bad decision is better than n o decision, the ability
746
INFORMATION THEORY
to panic may actually be a n invaluable advantage in evolution. Of course, unnecessary and premature activation of this mechanism, when there is still time and the means of coming up with the correct decision, is a pathology that needs to be cured. The thought processes that lead t o the experimental findings of the traveler’s dilemma game still remains unknown. The game has been applied to situations like arms race and competing companies cutting prices, where the players find themselves in slowly but gradually worsening situations. While the game theory and the Nash equilibrium may not pinpoint what the right decision is for a given situation, they may help to lay out how far things can escalate and what some, and only some, of the options are. When one finds a stable Nash equilibrium, there is no guarantee that others, nearby or far away, with much higher rewards do not exist. In such cases, information channels called weak links may give players the hints or the signals of the existence of other equilibria. Once a stable Nash equilibrium is reached, whether one should stay put or decide t o abandon that position and search for other equilibria with potentially higher rewards depends largely on one’s insight, ability to interpret and utilize such weak signals, and courage. When these games are applied t o economics, the payoff is money and in biology it is gene transmission, both of which are crucial in survival. All this being said, the role of leadership in critical decisions can neither be overlooked nor underestimated. It is the leadership qualities that open up new options, notice changing parameters, and create new channels of communication, which to others appear nonexistent or impossible. As evidenced in the behavior of many different complex systems, weak links play a crucial role in the processes of decision making (Csermely). As for the traveler’s dilemma, the two players and the airline manager are parts of the same social network, hence they can never be considered as totally isolated. Their minds continuously gather information through weak links from their collective memory, which has important bearing on their final decisions. For example, one, or both, of the passengers may remember from the news that, once an equally reputable company has declined to pay when both passengers claimed reimbursements very close the high end. Such considerations are important even in the experimental game.
17.2.6
Classical Bit or Cbit
The classical bit is defined as the physical realization of a binary system, which can exist only in two mutually exclusive states and which can be read or measured by an appropriate device. If the two states are equiprobable, the amount of information gained in such a measurement is 1 bit. This is also the maximum amount of information that can be obtained from any binary device. We now introduce Cbit as a device or a register, not as a unit of information. Cbit is a stable device whose state does not change by measurement. However, its state can be changed by some externally driven operation. An
CLASSICAL INFORMATION THEORY
Figure 17.11
747
Necker cubes.
external device that sets or resets a Cbit is called a gate. In preparation to our discussion on its quantum version, Qbit, we represent the states of a Cbit by two orthonormal 2-vectors:
(17.49) As far as the observer is concerned, H refers to the potential knowledge, that is, the knowledge that the observer does not have but expects to gain on the average after the outcome is learned. After the observation is made and the state of the Cbit is found, H collapses t o zero, since subsequent observations cannot change the result for that observer. However, for a second observer, who does not know the outcome, H is still l b . In other words, collapse is in the mind of the observer. Cbit is always either in state 10) or 11) , whether an observation has been made or not does not change this fact. Observer’s state of mind, which is basically a certain pattern of neural networks, wonders between the two possibilities, 10) or 11); and after the observation is made, it collapses to a new state, from “I wonder what it is?” to “I see” or from not knowing to knowing. This reminds us the Necker cubes (Fig. 17.11). If we relax and look continuously at the center of the cube on the left, we see two surfaces oscillating back and front. Since we are looking at the projection of a three-dimensional cube onto a plane, our brain does not have sufficient information t o decide about which side is the closer one. Hence, it shows us the two possibilities by oscillating the surfaces with a definite frequency, which is probably a function of the processing speed of our brain. On the other hand, if we look at the cube on the right, which is still two-dimensional but with the missing information about the surfaces included, the two surfaces no longer oscillate. We shall not go into this any further; however, the inclusion of human mind as an
748
INFORMATION THEORY
information processor always makes the problem much more interesting and difficult. For nontrivial calculations, one needs more than one Cbit. A 2-bit classical system can be constructed by combining two Cbits, ( A ) and I B ) , each of which has its own two possible states. Now the combined state, ( A B ), has four possible states given as
Any measurement will yield 2b worth of information culminating with only one of the above 4 states. In general, one represents the states of n Cbits in terms of 2" orthonormal vectors in 2" dimensions. Since technological applications always involve multiple Cbits, it is worth getting acquainted with the nomenclature. The four states of a 2-Cbit system is usually written in terms of tensor products as 10) @ 10) , 10) @ 11) , 11) c 3 10) , 11)@ 11).
(17.54)
Sometimes we omit @ and simply write (17.55) Other equivalent ways of writing these are
and
lo), , 1 1 ), ~ , 1 3 ) ~ .
( 17.57)
In the last case the subscript stands for the number of Cbits in the system, which in this case is 2 and the numbers 0 , 1 , 2 , 3 correspond to the zeroth,
CLASSICAL INFORMATION THEORY
749
first, second, and third states of the 2-Cbit system. In general, for an n-Cbit system there are 2% mutually exclusive states, hence we write
lx), , where
J:
is integer 0 5 x < 2%.
(17.58)
The analogy with vector spaces and orthonormal basis vectors gains their true meaning only in quantum mechanics, where we can construct physical states by using their linear combinations with complex coefficients in Hilbert space. In classical systems, the only meaningful reversible operations on nCbit systems is the 2n! distinct permutations of the 2%basis vectors. We can write the 4-Cbit state 10)10)11) 10) in the following equivalent ways:
10) @ 10) @ 11) @ 10) = 10) 10)11) 10) = j0010) = la), .
( 17.59)
In general a 4-Cbit state is written as
kW2xixn) = b ) 4 ,
(17.60)
where x is a number given by the binary expansion 2 = 823
+ 422 + 221 + 2 0 .
(17.61)
The states are enumerated starting with zero on the right. For example, for the state l0OlO) we have 20= 0, z1 = 1, x2 = 0, 2 3 = 0, thus becomes
x
+ 4.0 + 2.1 + 1.0
= 8.0
= 2.
(17.62)
The tensor product is defined as
(17.63)
For example,
1001) = p)3 =
(i)@ b)@( ( :)=
(17.64)
@ :( ) =
750
17.2.7
INFORMATION THEORY
Operations on Cbits
In quantum information theory all operations on Qbits are reversible. In the classical theory there are only two reversible operations that could be performed on a single Cbit, do nothing and flip, which can be represented as matrices: Do nothing or the identity operator I:
w = (01
l0 ) (
:>=( ;)=lo): (17.65)
Flip operator X:
(17.66) An example for an irreversible operation is the action of the erase operator El:
(17.67) (17.68)
It. is irreversible, since the original states cannot be reconstructed from the output states. We also have the following operators:
w = ( ;. ' ) ( ; ) = ( ; ) = t l ) ,
,,,=(; ZlO)=
(
; ) = - ( :)=-to,, ) ( ; ) = ( ;)=to),
;l)(
01
-1 0
(17.69)
(17.70) Even though these operations are mathematically well-defined, they are meaningless in the classical context. Only the states 10)and 11) have meaning within the context of Cbits. However, operators like Z, which are meaningless for a single Cbit, when used in conjunction with other meaningless operators, could
751
CLASSICAL INFORMATION THEORY
gain classical meaning in multi-Cbit systems. For example, the operator
-(I 1 2
+ ZlZO)
(17.71)
acts as the identity operator for the 2-Cbit states 10) 10) and 11) 11). On the other hand, it produces zero, another classically meaningless result, when operated on 10) 11) and 11) 10) . The subscript on Z indicates the Cbit on which it acts on. For example, on a 4-Cbit state,
the flip operator, XI, acting on the first Cbit is defined as XI = I @ I @ X @ I ,
(17.73)
where
We cannot emphasize enough that we start counting with the zeroth Cbit on the right. Similarly, 1
-(I 2
-
(17.75)
ZlZ0)
is the identity operator for 10) 11) and 11) 10) and produces 0 when operated on 10) 10) and 11) 11). For multiple Cbit systems, another operator which represents reversible operations, is the Swap operator S i j . It exchanges the values of the Cbits represented by the indices i and j :
Another useful operation on 2-Cbit systems is the reversible XOR or the controlled-NOT gate implemented by the operator Clo as
, whenever the control The task of Cl0 is to flip the value of the target bit, 01). bit, [ x i ) ,has the value 1. One can easily show that Clo can be constructed from single Cbit operators as 1 ClO = -(I 2
+ z1+ x0
-
XOZ1).
Other examples of Cbits and their gates can be found in Mermin.
(17.78)
752
17.3
INFORMATION THEORY
QUANTUM INFORMATION THEORY
In the previous sections we have introduced the basic elements of Shannon’s theory and discussed Cbits, which are binary devices with two mutually exclusive and usually equiprobable states. Even though the technological applications of Cbits involve quantum processes like photoelectric effect, semiconductivity, tunneling, etc., they are basically classical devices working with classical currents of particles. Recently, the possibility of using single particles and quantum systems opened up the possibility of designing new information processing devices with vastly different properties and merits. Even though the classical information theory has been around for over 60 years with many interesting interdisciplinary applications, quantum information theory is still at its infancy with its technological applications not yet in sight. However, considering its potential in both theory and practice, quantum information theory is bound t o be a center of attraction for many years to come. In what follows we start with a quick review of the basics of quantum mechanics. 17.3.1
Basic Quantum Theory
A quantum system is by definition a microscopic system, where the classical laws of physics break down and the laws of quantum mechanics have to be used. The most fundamental difference between classical physics and quantum mechanics is about the effect of measurement on the state of a system. Measurement of a system always involves some kind of interaction between the measuring device and the system. For example, to measure the temperature of an object, we may bring it in contact with a mercury thermometer. During the measurement process, the thermometer and the object come to thermal equilibrium by exchanging a certain amount of heat. Finally, they reach thermal equilibrium at some common temperature and the mercury column in the thermometer settles at a new level, which allows us t o read the temperature from a scale. At the end of the measurement process, neither the thermometer nor the object will be at their initial temperatures. However, the amount of mercury inside the bulb is usually so small that it reaches thermal equilibrium with the object with only a tiny amount of heat exchanged, hence the temperature of the object does not change appreciably during this process. This shows that even in classical physics measurement effects the state of a system. However, what separates classical physics from quantum mechanics is that in classical physics these effects can either be minimized by a suitable choice of instrumentation or algorithms can be designed to take corrective measures. In this regard, in classical information theory a Cbit remains in its original state no matter how many times it is measured. It changes its state only by the action of some external devices called gates. In classical physics there are particles and waves. These are mutually exclusive properties of matter. Particles are localized objects with no dimensions,
QUANTUM INFORMATION THEORY
753
while waves are spread out in entire space. One of the surprising features of quantum mechanics is the duality between the particle and wave properties. Electrons and photons sometimes behave like particles and sometimes like waves. Furthermore, nature of the experimental setup determines whether electrons or photons behave as particles or as waves. The de Broglie relation A=-
h
(17.79)
P’
where h = 6.60 x 10-27~m2gs-1is the famous Planck constant, establishes the relation between a wave of wavelength X and a particle of momentum p. Similarly, the Planck formula
E = hu
(17.80)
establishes the particle property of light by giving the energy of photons in terms of the frequency of the electromagnetic waves. In quantum mechanics, measurement on a system changes the state of the system irreversibly and there are limits on the accuracy with which certain pairs of observables can be measured simultaneously. Position and momentum are two such conjugate observables. Heisenberg’s uncertainty principle, which could be considered as the singly most important statement of quantum mechanics, states that position and momentum cannot be determined simultaneously with greater precession than AX&
TI 2
2 -,
(17.81)
where TI = h/27r. Uncertainty principle does not say that we cannot determine the position or momentum as accurately as we want. But it says that if we want to know the momentum of a particle precisely, that is, as Ap + 0, then the price we pay is to lose all information about its position: lim Ax Ap-0
ti
2 ---+ 2AP
m.
(17.82)
In other words, particles begin t o act like pure waves extended throughout the entire space, hence they are everywhere. Similarly, if we want to know the position precisely, then we lose all information about its momentum. As Feynman says, “The uncertainty principle protects quantum mechanics. Heisenberg recognized that if it were possible t o measure the momentum and the position simultaneously with greater accuracy, the quantum mechanics would collapse. So he proposed that it must be impossible.” Since 1926, Heisenberg’s uncertainty principle and quantum mechanics have been victorious over many experimental and conceptual challenges and still maintain their correct status. Mathematical formulation of quantum mechanics is quite different from classical physics and it is based on a few principles:
754
INFORMATION THEORY
(I) The state of a system is completely described by the state vector, defined in the abstract Hilbert space, which is the linear vector space of square integrable functions. State vector is a complex valued function of real arguments. When continuous variables like position are involved, I Q) can be expressed as Q ( x ) .In this form it is also called the wave function or the state function. When there is no room for confusion we use both. The absolute value square of the state function, I S ( x ) l 2 ,gives the probability density, and \Q(z)I2dx is the probability of finding a system in the interval between x and x+dx. Since it is certain that the system is somewhere between -co and co,the state function satisfies the normalization condition
IQ),
2
IQ(x)l dx
=
1.
(17.83)
(11) In quantum mechanics the order in which certain dynamical variables are measured is important. It is for this reason that observables are represented by Hermitian differential operators or Hermitian matrices acting on the state vectors in Hilbert space. Due to their Hermitian property, these operators have real eigenvalues and their eigenvectors, also called the eigenstates, form a complete and orthogonal set that spans the Hilbert space. For a given operator, A, with the eigenvalues ai and the eigenstates I u i ) , or ui(x), we can express these properties as
A lUi) = a2 lu2) , Orthogonality: Completeness:
JC
u,’(x)uj(x)dx = Sij,
ut(x’)ui(x) = ~ ( x-’ x).
(17.84) (17.85) (17.86)
i
Eigenvalues, ai,which are real, correspond to the measurable values of the dynamical variable A. When an observable has discrete eigenstates, its observed value can only be one of the corresponding eigenvalues. Using the completeness property [Eq. (17.86)], we can express the general state vector of a quantum system as a linear combination of the eigenstates of A as (17.87) i
where lui)are also called the basis states. Expansion coefficients, ci, which are in general complex numbers, can be found by using the orthogonality relation [Eq. (17.85)] as (17.88)
755
QUANTUM INFORMATION THEORY
These complex numbers, ci, are called the probability amplitudes, and from the normalization condition (17.83) satisfy the condition (17.89) Now the expectation value, ( A ) ,of a dynamical variable is found as
( A )=
/
Q * ( x ) A Q ( x dx. )
(17.90)
Using the orthogonality relation (17.85), we can write ( A ) as
i
i
(17.91)
A quantum state prepared as (17.92) is called a mixed state. When a measurement is performed on a mixed state of the dynamical variable A , the result is one of the eigenvalues. The probability of the result 2 being the m t h eigenvalue, a,, is given by p , = Ic,~ . The important thing here is that once a measurement is done on the system, it is left in one of the basis states and all the information regarding the initial state, that is, all cis in Equation (17.92) are erased. This is called the collapse of the state vector or the state function. From the collapsed state vector, it is no longer possible to construct the original state vector. That is, all information about the original state contained in the coefficients, cis, are lost irreversibly once a measurement is made. How exactly the collapse takes place is still debated and is beyond the conventional quantum mechanics. Expansion coefficients, cis, in the state function can only be determined by collecting statistical information on the probabilities, p i s , by repeating the experiment over many times on identically
756
INFORMATION THEORY
prepared states. It is also possible to prepare a system in the mth eigenstate of A as l‘zl) =
1%).
(17.93)
In this case, all c,s except c,, which is equal to 1, are zero. Such states are called pure states. We have to remind the reader that for any another observable, Equation (17.93) will not be a pure state, since Iu,) is now the mixed state of the new observable’s eigenstates. Applying all this to Shannon’s theory, we see that for a pure state, that is, p , = 0, i # m, and p , = 1, Shannon entropy, H , is zero-in other words, zero new information. No matter how many times the measurement is repeated on identically prepared states, there will be no surprises. For a mixed state, Shannon entropy is given as N
H = - c p k l o g 2 p k , where k
c
Ick/
2
p k = 1,
=c
k
(17.94)
k
where N is the dimension of the Hilbert space. From a mixed state, the maximum obtainable information is
H = log2 N ,
(17.95)
which occurs when all the probabilities are equal, that is, when pi = 1,” for all i. Note that once the state vector collapses, further measurements of A will yield the same value. In other words, with respect to that observable, the system behaves classically. Using the definition of the expectation value, we can write the uncertainty principle for two dynamic variables, A and B , as
AAAB
1
2 5 IW,ml,
(17.96)
where [ A , B ]= AB - B A is called the commutator of A and B . From the above equation, it is seen that unless A and B commute, AB = BA, they can not be measured precisely simultaneously. Furthermore, if two operators do not commute, AB # BA, then the order in which they are measured becomes important (Merzbacher). In some cases, we can ignore most of the parameters of a quantum system and concentrate on just a few of the variables. This is very useful when we are interested in some discrete set of states like the spin with up/down states or the polarization with vertical/horizontal or as in the quantum pinball machine, which we shall discuss in detail, the path with right/left. In such cases, we can express the state vector as a superposition of the orthonormal basis states, le,) and 1Q2) , corresponding t o the two alternatives as
le) = c1 l‘zll) + c2 led,
( 17.97)
QUANTUM INFORMATION THEORY
757
where 2 lCll
+ lc2l 2 = 1.
(17.98)
When a which-one or which-way measurement is done, pl = lcll 2 and p 2 = lc2I2 are the respective probabilities of one of the basis states, 1Q1) or 1 Q 2 ) , being seen. Statistically speaking, if we repeat the measurement N times on identically prepared states, pl and p 2 will be equal to the following limits:
Nl -,
pl = lim
p2
N
N-cc
N2 N N’
= lim N-cc
= Ni
+
N2,
(17.99)
where N1 and N 2 are the number of occurrences of the states IQl) and 1Q2), respectively. If we do not perform a which-one or which-way measurement on the system, the system will be in the superposed state [Eq. (17.97)) with the probability density given as
+ C2Q2)*
=
(ClQl
=
lC1l2 (Q1I2
= p1 1Q1l2
+
+
(ClQl
C;caQ;Q2
c;c29p2
+ + +
c2Q2)
+
C;clQ;Ql
lc2I2 IQ2I2
+ \q2.
C;clQ;Ql
p2
( 17.100)
The two terms in the middle are the interference terms responsible for the fringes seen in a quantum double slit experiment. In classical physics, the interference terms are absent and the joint probability density reduces to (Feynman et al.) 1QI2 = Pl 1Q1I2
+ P 2 1Q2I2.
(17.101)
In classical physics, regardless of the presence of an observer, the system is always in one of the states: IQ1) or 1Q2). When we make an observation, we find the system in one or the other state with its respective probability, pl or p 2 . However, in quantum mechanics, until a measurement is done and the state function has collapsed, the system is in both states simultaneously, that is, in the superposed state:
IQ)
= c1 IS,)
+c2 1Q2).
(17.102)
Evolution of a state vector is determined by the Schrodinger equation,
a IQ)
H IQ) = ih-
at
,
(17.103)
where H is the Hamiltonian operator, which is usually obtained from its classical expression by replacing 2 and p with their operator counterparts. For example, for a particle of mass m moving under the influence of a conservative
758
INFORMATION THEORY
force field, V(?), the Hamiltonian operator is obtained from the classical Hamiltonian: (17.104) with the replacements
7
-+
7. (17.105)
as
-v2 + V(?). 2m tL2
(17.106)
In technological applications, changes in the state vector,
are usually managed through the actions of reversible transformations, U, called gates, which are represented by unitary transformations satisfying the relation
UUt = I ,
(17.108)
where I is the identity operator and U t is the Hermitian conjugate defined as the complex conjugate of the transpose of U , that is,
Ut = g*.
(17.109)
17.3.2 Single-Particle Systems and Quantum Information To demonstrate the profound differences between the classical and the quantum information processing devices, we start with the experimental setup shown in Figure 17.12. In this setup, we have a light source emitting coherent monochromatic light beam with the intensity I . The beam impinges on a beam splitter, B , and then separates into two, each with intensity I / 2 . Aside from its intensity, the transmitted beam on the left goes through the beam splitter unaffected. The reflected beam on the right undergoes a phase shift of 7r/2 with respect to the transmitted beam. Naturally, the detectors Do and D1 receive the transmitted and the reflected beams, respectively. Next, we consider this experiment in the limit as the intensity of the beam is reduced to almost zero. Technically, it is possible t o control the intensity so that we are actually sending one photon at a time t o the beam splitter, which diverts these photons with equal probability to the left or the right channels. The detectors respond to individual photons with equal probability. For an
QUANTUM INFORMATION THEORY
759
DO Figure 17.12
Quantum pinball machine. T for transmitted and R for reflected.
experiment repeated many times, half of the time DO and the other half of the time D1 will click. So far everything looks like the pinball machine. The difference between the two cases begin to appear when we search an answer to the question: Which path did the photon take? In the case of the pinball machine, the ball has two possibilities; it will go to either the left or the right. The source of randomness is the pin, which diverts the ball to the left or the right with equal probabilities. Once the ball clears the pin, it has a definite trajectory and ends up in either the left or the right bins. The observer just doesn’t know it yet. Whether the observer actually checks the bins or not has no bearing whatsoever on the result. In the case of photons, there are two possible basis states, lQ0) and I Q l ) , corresponding to the eigenstates of the “which-way” operator. State IQO) corresponds to the photon following the left path and 1Q1) corresponds to the photon following the right path. When the detectors are turned off or absent, the photon is in the superposed state
I*)
= co
IQO)
+ c1 IQ1) ,
(17.110)
where co and c1 are in general complex numbers satisfying the normalization condition lcol
2
+ ICll2 = 1.
(17.111)
In other words, the photon is neither in the left channel, l q o ) , nor in the right channel, 1@1), but it is in both of them, simultaneously. To find out which way the photon goes through, we turn on the detectors. One of the detectors
760
INFORMATION THEORY
clicks and the state function collapses to either ~ Q o or ) IQ1). Once the state function has collapsed, the photon is in a pure state, IQo) or l q ~ )During . this process we gain l b of information about the system, but the price we pay is that we have lost (destroyed) the initial state function [Eq. (17.110)] irreversibly. It can no longer be constructed from the collapsed state:
IQ)
=
IQO)
or
IQ)
= 1Ql).
(17.112)
By repeating the experiment many times on identically prepared setups, all we can gain is statistical information about the initial state. That is, square of the absolute values of co and c1, which are related to the probabilities of the initial state vector collapsing to either I@,) or lQ1) as PO = lcol
2
and P I
2
= lc11
.
( 17.113)
In the case of a symmetric beam splitter, the probabilities are equal, po = pl = 112, and the state function can be given in any one of the following forms:
1
IW = 5 [IS,) + lQd1
1 or IS) = 2 [ P o ) - lQ1)l.
( 17.114)
In the classical pinball machine the ball is always in a “pure” state. It is following either the left or the right paths. In other words, it is either in state IQO) or in state I Q l ) . It has nothing to do with the presence of an observer, knowing or not knowing, measuring or not measuring, peeking or not peeking through the cracks of the pinball machine. In classical physics, observation or measurement never has the same dramatic effect on the state of the system that it has in quantum systems. In the following section we discuss how all this can be verified in laboratory through the use of the Mach-Zehnder interferometer.
17.3.3
Mach-Zehnder
Interferometer
In a Mach-Zehnder interferometer (Fig. 17.13), after the first beam splitter, B1, the transmitted and the reflected beams are reflected at the mirrors M L and M R , respectively, and allowed to go through a second beam splitter, B2, before being picked up by the detectors D1 and DO. We refer to the transmitted and the reflected beams of the first beam splitter as the left and the right beams, respectively. We first consider the case of coherent monochromatic beam with intensity I produced by the light source 5’. Keep in mind that each time light gets reflected, it leads the incident wave by a phase difference of 7r/2 and the transmitted wave suffers no phase shift. The beam that gets transmitted at B1 follows the left path and gets reflected at M A , and finally splits into its reflected and transmitted parts at B2. They are joined by the parts of the wave reflected at B1, which follows the right path. The left beam reflected
QUANTUM INFORMATION THEORY
761
*R
Figure 17.13
Mach-Zender interferometer.
at B2 meets the right beam transmitted at B2. Since they both suffered two reflections, they are in phase and interfere constructively t o shine on D1 with intensity I . The part of the left beam transmitted a t B2 is joined by the part of the right beam reflected at B2. Since the right beam has suffered three reflections, while the left beam has suffered only one, they are out of phase by T , thus interfering destructively t o produce zero intensity a t DO.In summary, D1 gets the full original beam with intensity I , while DOgets nothing. Note that all this is true when all the legs of the interferometer are equal in length. We now turn down the intensity so that we are sending one photon a t a time to B1. The experimental result is completely consistent with the macroscopic result; that is, all photons are detected by D1 and no photon is detected by DO. To understand all this in terms of the interference of electromagnetic waves is easy. However, in the case of individual photons we find ourselves in the position of accepting the view that the photon has followed both paths to interfere with itself t o produce no response a t DOand a sure response at D1, From the information theory point of view, we already know the answerthat is, the detector D1-responds for sure. However, we know nothing about which path the photon has followed to get there. To learn the path that the
762
INFORMATION THEORY
p=
114 + 114 = 112
Figure 17.14 Mach-Zehnder experiment with a pinball machine. mutually exclusive paths for the balls seen in Bin 1 are shown on the left.
The two
photon follows, we remove Ba, and either Do or D1 clicks. We now know which path the photon follows, and we gain l b of information in finding that out. In the Mach-Zehnder interferometer, we know exactly which detector responds, that is, D1, but we have no knowledge of how the photon gets there. There is a region of irremovable uncertainty about where the photon is in our device. There is no classical counterpart of this. If we try the same experiment with the classical pinball machine, we see the ball half of the time in bin 1 and the other half of the time in bin 2 (Fig. 17.14, left). This is because the events corresponding to the ball following the left path and the right path to reach bin 1 are mutually exclusive, each with their respective probability of 1/4 (Fig. 17.14, right). If one happens, the other one does not. Thus, their joint probability is given as their sum: = $. A symmetric argument works for the ball reaching bin 2. In other words, no matter how many times we run the experiment, we never see a case where the ball that followed the left path colliding or interfering with itselj, that followed the right path. This is the strange position that we always find ourselves in when trying t o understand quantum phenomenon in terms of our intuition, which is predominantly shaped by our classical experiences.
a+a
QUANTUM INFORMATION THEORY
763
Figure 17.15 Undisturbed paths for the eigenstates of the “which way” operator: I * l e f t ) = * L and l * r ~ g h t ) = *R.
17.3.4
Mathematics of the Mach-Zehnder
Interferometer
In a Mach-Zehnder interferometer let us choose the orthonormal basis states as (17.115) These are the eigenstates of the which-way operator. They correspond to the undisturbed paths, that is, the paths in the absence of both beam splitters (Fig. 17.13). For a photon incident from the right (Fig. 17.15) the left channel is defined as SB1 M L B ~ D owhile , the second one defines the right channel for the undisturbed path for a photon incident from the left: SB1 M R B D1. ~ Strictly speaking, these are meaningless statements. What we need is the full solution of the time-dependent Schrodinger equation, which exhibits the change to the superposition of the right and left path solutions after going through the beam splitter. However, for our purposes it is perfectly all right to work with the reduced degrees of freedom represented by the Left and the right phrases. Between the source, S, and the beam splitter, B1, the photon is in the basis state (Fig. 17.13) (17.116)
764
INFORMATION THEORY
After the first beam splitter, B1,and between B1 and B2, the solution is transformed into the mixed, that is, the superposed state of the transmitted and the reflected parts as / Q I B I B z )= co P l e f t )
+ c1 /QTi,ht).
(17.117)
We now write a mathematical expression for the action of the beam splitter, B1, which acts on state I Q l e f t ) [Eq. (17.116)] t o produce the superposed state [Eq. (17.117)]. Since we use a symmetric beam splitter, we have
( 17.118) We also know that the reflected photon in Figure 17.13 has suffered a phase shift of 7r/2 with respect to the state I Q l e f t ) ; hence without any loss of generality we can take 1 ei7r/2 - i co = - and c1 = -- -
Jz
Jz
(17.119)
Jz'
Thus,
="(;>.i(:)] Jz
(17.121)
We now write a matrix, B, that acts on the initial state [Eq. (17.116)] and produces the mixed state [Eq. (17.121)] as
( 17.122) We can easily verify that (17.123)
1
= - (I
Jz
+ 2X)
(17.124)
accomplishes this task, where (17.125) This can be understood by the fact that half of the incident wave goes unand the identity affected into the left channel, thus explaining the factor operator I, while the other half gets reflected into the right channel with a
&
QUANTUM INFORMATION THEORY
765
phase shift of 7~12,thus explaining the flip operator, X, and the phase shift factor eiTI2: (17.126) Note that the left channel is the path followed by the undisturbed photon incident from the right and vice versa. Some of the important properties of B are: (i) It is a transformation not a n observable. (ii) Since BBt = I, where B t = B*, it is a unitary transformation. (iii) Since it is a unitary transformation, its action is reversible with its inverse given as B-' = B t . (iv) B2 = iX. After the second beam splitter, B2, the final state of the photon between B2 and the detectors Do or D1 is given as
-
IQ'BzD1 or
2)
=
B IQB*Bz) = BB lQSB1)
ix I*SBI) =ix( =
:,)
=i(
y)
(17.127) ( 17.128) (17.129)
( 17.130)
Notice that the phase factor, i = eiTl2, is physically unimportant and has no experimental consequence. However, we shall see that the individual phases in a superposition are very important. We now introduce the channel blocker represented by the operators
E L ? = ( 01 0 ) a n d E L = ( 0 01 ) .
(17.131)
They represent the actions of blocking devices which block the right and the left channels, respectively. They eliminate one of the components in the superposition and lets the other one remain. Another useful operator is the phase shift operator @(4), (17.132) which introduces a relative phase shift of 4 between the left and the right channels. Note that we have written @(4)symmetrically so that the operator delays the left channel by 4/2, while the right channel is advanced by 4/2. In other words, its action is equivalent to (17.133)
766
INFORMATION THEORY
If we insert an extra phase shift device between the two beam splitters, we can write the most general output state of the Mach-Zehnder interferometer as IQoutput)
= BQ?(4)BI Q i n p u t ) .
For a wave incident from the right,
1
1)Jloutput)
=2 [(l - 2 4 )
(Qinput )
=
(17.134)
(3,
this gives
( ;) + i ( l + ( ;)] . 24)
(17.135)
Other commonly encountered operators are
I=
X
=
Z= Y=
( h ;) ( ; ;) (0 ) -1 XZ ( )
(17.136)
: identity, :
( 17.137)
shifts the phase of
(17.138)
:
-1
=
flips two basis states,
0
:
phase shift by
T
followed by a flip,
(17.139)
Among these, I, X, Y, Z, and @ are unitary operators, while E is an irreversible operator. Notice that unitary transformations produce reversible changes in the quantum system, while the changes induced by a detector are irreversible. In addition to these, there is another important reversible operator called the Hadamard operator, which converts a pure state into a mixed state:
=
1
-(X+ Z).
Jz
(17.140)
Hadamard operator, H, is a unitary operator with the actions
(17.141) and
l"L)=H(
!)=&[(;)-( ;)]
(17.142)
Hadamard operator is the workhorse of quantum computing, which converts a pure state into a superposition.
QUANTUM INFORMATION THEORY
17.3.5
767
Quantum Bit or Qbit
We have defined a Cbit as a classical system with two mutually exclusive states. It is also possible to design binary devices working at the quantum level. In fact, Mach-Zehnder interferometer is one example. A Qbit is basically a quantum system that can be prepared in a superposition of two states like 10) and 11). The word superposition on its own implies that these two states are no longer mutually exclusive as in Cbits. Furthermore, in general, quantum mechanics forces us to use complex coefficients in constructing these superposed states. We used the word forces deliberately, since complex numbers in quantum mechanics are not just a matter of convenience, as they are in classical wave problems, but a requirement imposed on us by nature. Real results that can be measured in laboratory are assured not by singling out the real or the imaginary part of the state function, which in general cannot be done, but by interpreting the square of its absolute value, 1912,as the probability density and by using Hermitian operators that have real eigenvalues to represent observables. The superposed state of a Qbit is defined as
where co and c1 are complex amplitudes, which satisfy the normalization condition:
lcol
2
+
2 lCll
(17.144)
= 1.
If we write co and c1 as
co = aoeial, c1 = boeibl,
a0
> 0, bo > 0,
(17.145)
we obtain
Since it has no observable consequence, we can ignore the overall phase factor
eial. To guarantee normalization [Eq. (17.144)], we can also define two new real parameters, 8 and 4, as
a. = cose, bo = sin8 and (bl
-
a l ) = 4,
(17.147)
so that 19)is written as
I 9)= cos 8 10)
+ sin 8e24 11).
(17.148)
Note that the probabilities, 2
2
2
po = lcol = cos 8 and p l = IclJ = sin20,
(17.149)
768
INFORMATION THEORY
are not affected by the phase,
4,at all. po =p1
When 0 = ~ =
1 2
-.
1 4we , have (17.150)
This is the equiprobable case with the Shannon entropy of information, H , of one bit, which is the maximum amount of useful average information that can be obtained from any binary system. In other words, whether we measure one Qbit or make measurements on many identically prepared Qbits, the maximum average information that can be gained from a single Qbit is one bit. This does not change the fact that we need both 0 and 4 to specify the state of a Qbit completely. What happens to this additional degree of freedom and the information carried by the phase 4? Unfortunately, all this wealth of information carried by the phase is lost irreversibly once a measurement is made and the state function has collapsed. Once a Qbit is measured, it behaves just like a Cbit. Given two black boxes, one containing a Cbit and the other a Qbit, there is no way to tell which one is which. If you are given a collection of 1000 Qbits prepared under identical conditions, by making measurement on each one of them, all you will obtain is the probabilities, PO and p l , deduced from the number of occurrences, No and N1, of the states 10) and 11), respectively, as (17.151) Furthermore, there is another restriction on quantum information, which is stated in terms of a theorem first proposed by Ghirardi. It is known as the no-cloning theorem, which says that the state of a Qbit cannot be copied. In other words, given a Qbit whose method of preparation is unknown, the no-cloning theorem says that you cannot produce its identical twin. If it were possible, then one would be able to produce as many identical copies as needed and use them to determine statistically its probability distribution as accurately as desired, while still having the original, which has not been disturbed by any measurement. In other words, there is absolute inaccessibility of the quantum information buried in a superposed state. You have to make measurement to find out, and when you do, you destroy the original state with all your expected gain as one bit. Is there then no way to harvest this wealth of quantum information hidden in the two real numbers 0 and 4? Well, if you do not temper with the quantum state, there is. The no-cloning theorem says that you cannot copy a Qbit but you can manufacture many Qbits in the same state and manage them through gates representing reversible unitary operations and have them interact with other Qbits. As long as you do not meddle in the inner workings of this network of Qbits, at the end you can extract the needed information by a measurement, which itself is an irreversible operation. If we act on a
QUANTUM INFORMATION THEORY
769
superposition,
(17.152) with the Hadamard operator [Eq. 17.140)], H, we get IQsup.2)
=H =
l%lp.l)
;)]
HZ[( i ) + e i 4 (
(17.153)
( 17.154) which is another superposition, ities for the state IQsup.l),
with different phases. The probabil-
IQsup.2),
1 P o = - ,2
1 2'
(17.155)
p1=-
has now changed with the second state,
I Q s u p. % ) ,
1 po = -(1+ cos4), p1 2
=
to 1 -(12
COS@),
(17.156)
thus demonstrating how one can manipulate these phases with reversible operators and with observable consequences. Other relations between reversible operators that are important for designing quantum computing systems can be given as (Mermin)
xz = -zx,
( 17.157) (17.158)
HXH = Z, HZH
=X =
(17.159) -iB2
( 17.160)
On the practical side of the problem, it is the task of quantum-computational engineering to find ways to physically realize and utilize these unitary transformations. For practical purposes, most of the existing unitary transformations are restricted to those that act on single or at most on pairs of Qbits. An important part of the challenge for the software designers is to construct the transformations that they may need as combinations of these basic elements. Any quantum system with binary states like 10) and 11) can be used as a Qbit. In practice, it is desirable to work with stable systems so that the superposed states are not lost through decoherence, that is, through interactions with the background on the scale of the experiment. Photons, electrons, atoms, and quantum dots can all be used as Qbits. It is also possible to use internal states like polarization, spin, and energy levels of an atom as Qbits.
770
INFORMATION THEORY
17.3.6
The No-Cloning Theorem
Using the Dirac bra-ket notation and the properties of inner product spaces introduced in Chapter 5, we can prove the no-cloning theorem easily. Let a given Qbit to be cloned, called the control Qbit, be in the state ( Q A ) . A second quantum system in state Ix) , called the target Qbit, is supposed to be transformed into ~ Q A )via a copying device. We represent the initial state of the copying device as I@). The state of the composite system can now be 1 ~ I@)). ) Similar to an office copying device, the whole process written as IQA) should be universal and could be described by a unitary operator U,. The effect of Uc on the composite system can be written as (17.161) where I @ A ) is the state of the copier after the cloning process. For a universal copier, another state, ~ Q B ) not , orthogonal to ~ Q A ) ,is transformed as
where all the states are normalized, that is,
(17.163)
s_',"
Note that in bra-ket notation the inner product of two states, Q*(z)@(z)dz, is written as (Q I@). Since IQA) and IQB) are not orthogonal, we have ( Q A IQB)
# 0.
(17.164)
From the properties of the inner product, we also have the inequalities
I(@A
I@B)I
51 and
IQB)~
~(QA
5 1,
(17.165) t
where the bars stand for the absolute value. Since a unitary operator, U, U, = I , preserves inner product, the inner product of the composite states before the operation:
(@I
(XI
(QAl
u,'uc1 Q B ) Ix) I@) = ( Q A
IQB)
( x Ix) (@ I@) = ( * A
IQB) 1
(17.166) has to be equal to the inner product after the operation. Hence we can write ( Q A I Q B ) = ( Q A I q B ) ( Q A I Q B ) ( @ A I@B)
= (*A 1@B)2 ( @ A (@B)
( 17.167)
QUANTUM INFORMATION THEORY
771
or = ( Q A IQB) ( @ A
I@B)
(17.168)
Taking absolute values, this also becomes (17.169) Since for nonorthogonal states the inequalities I ( Q A IQB) I 5 1 and I ( @ A I@B) 1 are true, the equality in Equation (17.169) can only be satisfied when I*A)
=
Is,).
I5
(17.170)
Hence, the unitary operator U, does not exist. In other words, no machine can make a perfect copy of another Qbit state, IXPB), that is not orthogonal to I Q A ) . This is called the no-cloning theorem (for other types of cloning see Audretsch; BruP and Leuchs).
17.3.7
Entanglement and Bell States
Let us consider two Qbits, A and B , with no common origin and interaction between them. We can write their states as
and
where 10) and 11) refer t o their respective basis states: (17.173)
Each pair of amplitudes satisfy the normalization condition separately as IQAl 2
+ IPAI2 = 1
( 17.174)
+ IPBI2 = 1.
(17.175)
and 2
IaB/
Since there is no interaction between them, both of them preserve their identity. Their joint state, ~ X A B ) is , given as the tensor product / X A )@ I x B ) , which is also written as ~ X A I x)B ) :
lo),
(17.176) lo), +PB I1)Bl = Q A Q B lo), lo), + ~ A P B lo), I1)B + PAQB I1)A lo), + PAPB l 1 ) A l l ) B .
IXAB) = l x A )
'8 I X B ) =
[QA
+ P A ll)A1 [aB
(17.177)
772
INFORMATION THEORY
However, this is only a special two-Qbit state, which is composed of noninteracting two single Qbits. A general two-Qbit state will be a superposition of the basis vectors:
which span the four-dimensional two-Qbit space as
Complex amplitudes,
aij , satisfy
the normalization condition
c 1
= 1.
(17.180)
J02jj2
i,j=O
In general, it is not possible to decompose ~ X A B in ) Equation (17.179) as the tensor product, I X A ) 63 I x B ) , of two Qbit states: ~ X A and ) I x B ) . Note that only under the additional assumption of QOOQll
(17.181)
= Q01Q10,
Equation (17.179) reduces to Equation (17.177). Qbit states that cannot be decomposed as the tensor product of individual single Qbit states are called entangled. As in the single Qbit case, a measurement on a two-Qbit state causes the state function to collapse into one of the basis states in Equation (17.178 ) with the probabilities given as P i j = IaijI
2
(17.182)
.
Maximum average information that can be obtained from a two-Qbit system, 2b, is when all the coefficients are equal. For entangled states, this may not be the case. As in the case of single Qbits, where we can prepare maximally superposed states, we can also prepare maximally entangled two-Qbit states. There are four possibilities for the maximally entangled states, which are also called the Bell states. Following Josza and Roederer, we write them for la011
2
+ 101012 = 1, a00 =
all
=0
(17.183)
as
(17.184) (17.185)
QUANTUM INFORMATION THEORY
773
and for boo1
2
+ la1112 = 1, a01 = a10 = 0
(17.186)
as (17.187)
(17.188) These four Bell states are orthonormal and span the four-dimensional Hilbert space. Hence, any general two-Qbit state can be expressed in terms of the Bell basis states as IXAB) = c 1 1
I*-)
-k c 1 2 Iq+)-k c 2 1
I@-)
-k c 2 2
I@')
1
(17.189)
where Cij are complex numbers. Since Bell states are constructed from the linear combinations of the original basis states in Equation (17.178), Cij are also linear combinations of aij. If we consider the actions of the unitary operators I, X, Y, and Z on Bell states, we find (17.190) (17.191) (17.192)
( 17.193) The subscript indicates on which Qbit the operator is acting on. To see what all this means, consider a pair of entangled electrons produced in a common process and sent in opposite directions to observers A and B. Electrons are also produced such that if the spin of the electron going toward the observer A is up, lo), , then the other one must be down, Il),, and vice versa. Obviously, the pair is in one of the two Bell states given by Equations (17.184 ) and (17.185). For the sake of argument, let us assume that it is the symmetric one, that is, (17.194)
A measurement by one of the observers, say A, collapses the state function to either lo), ll)Bor l1)A lo), with the equal probability of This removes all the uncertainty in any measurement that B will make on the second electron. In other words, when A makes a measurement, then any measurement of B will have zero information value, since all prior uncertainty will be gone. That is, despite the fact that there are two Qbits involved, this Bell state carries
i.
774
INFORMATION THEORY
only one bit of classical information. In this experiment, separation of A and B could be as large as one desires. This immediately brings t o mind action at a distance and the possibility of superluminal communication via quantum systems. As soon as A (usually called Alice) makes a measurement on her particle, spin of the electron at the location of B (usually called Bob) adjusts itself instantaneously. It would be a flagrant violation of causality if Alice could communicate with Bob by manipulating the spin of her particle. This worried none other than Einstein himself (see literature on E P R paradox). A way out of this conundrum is to notice that Bob has to make an independent measurement on his particle, and still he has t o wait for Alice t o send him the relevant information, which can only be done by classical means at subluminal speeds, so that he can decode whatever message was sent to him. Let us review the problem once more. Alice and Bob share an entangled pair of electrons. Alice conducts a measurement on her electron. She has a 50150 chance of finding its spin up or down. Let us say that she found spin up. Instantaneously, Bob’s electron assumes the spin down state. However, Bob does not know this until he performs an independent measurement on his electron. He still thinks that he has 50150 chance of seeing either spin, but Alice knows that the wave function has collapsed and that for sure he will get spin down. Bob conducts his measurement and indeed sees spin down. But to him this is normal, he has just seen one of the possibilities. Now, Alice calls Bob and tells him that he must have seen spin down. Actually, she could also call Bob before he makes his measurement and tell him that he will see spin down. In either case, it would be hard for Alice to impress Bob, since Bob will think that Alice has after all a 50/50 chance of guessing the right answer anyway. To convince Bob, they share a collection of identically prepared entangled electrons. One by one, Alice measures her electrons and calls Bob and tells him that she observed the sequence TJJTJJT . . . and that he should observe the sequence J T T I T T L . . . . When Bob measures his electrons, he now gets impressed by the uncanny precision of the Alice’s prediction. This experiment can be repeated this time with Alice calling Bob after he conducted his measurements. Alice will still be able t o predict Bob’s results with 100% accuracy. In this experiment, quantum mechanics says that the wave function collapses instantaneously no matter how far apart Alice and Bob are. However, they still cannot use this to communicate superluminally. First of all, in order to communicate they have to agree on a code. Since Alice does not know what sequence spins she will get until she performs her measurements, they cannot do this before hand. Once she does measure her set of particles, she is certain of what Bob will observe. Hence, she embeds the message into the sequence that she has observed by some kind of mapping. For Bob to be able t o read the Alice’s message, Alice has t o send him that mapping, which can only be done through classical channels. Even if somebody intercepts Alice’s message, it will be useless without the sequence that Bob has. Hence, Alice and Bob can establish spy-proof communication through entangled states. One of the
QUANTUM INFORMATION THEORY
775
main technical challenges in quantum information is decoherence, which is the destruction of the entangled states by interactions with the environment. It is for this reason that internal states like spin or stable energy states of atoms are preferred to construct Qbits, which are less susceptible to external influences by gravitational and electromagnetic interactions.
Example 17.7. Quantum cryptology- The Vernam coding: Alice wants to send Bob a message. Say the nine directions to open a safe, where the dial can be turned only one step, clockwise (CW) or counterclockwise (CCW),a t a time. They agree to use binary notation, CW=1, CCW=O, to write the message as
101010011 Afraid of the message being eavesdropped by a third party, Alice and Bob share 9 ordered entangled electrons. Alice measures her particles one by one and obtains the sequence
010010110, where 0 stands for spin up and 1 stands for spin down. Using this as a key, she adds the two sets of binary numbers according to the rules of modulo 2, which can be summarized as
o+o=o, o + 1 = 1 + 0 = 1, 1+1=0, to obtain the coded text, that is, the cryptograph as message key
cryptograph
1 0 1
0 1 1
1 0 1
0 0 0
1 1 0
0 0 0
0 1 1
1 1 0
1 0 1
Now Alice sends the cryptograph to Bob via conventional means. Bob measures his ordered set of electrons to obtain the key and thus obtain the message by adding the key t o the cryptograph with the same rules as cryptograph key message
1 0 1
1 1 0
1 0 0 0 1 0
0 1 1
0 0 0
1 0 1 1 0 1
1 0 1
Since the key is a completely random sequence of zeros and ones, the cryptograph is also a completely random sequence of zeros and ones. Hence, it has no value whatsoever t o anybody who intercepts it without the key that Bob has. This is called the Vernam coding, which cannot be broken. However, the problem that this procedure poses in practice
776
INFORMATION THEORY
is that for each message that Bob and Alice want to exchange they need a new key. That is, it can only be used once, which is also called a onetime-pad system. Another source for major concern is that during this process, the key may somehow be obtained by the eavesdropper. On top of all these, during the transmission, quantum systems are susceptible to interferences (decoherence), hence one needs algorithms to minimize and correct for errors. To attack these problems, various quantumcryptographic methods, which are called protocols, have been developed (Audretsch; BruP and Leuchs; Trigg, and more references can be found at the back of this book). 17.3.8
Quantum Dense Coding
We have seen that entanglement does not help t o communicate superluminally. However, it does play a very important role in quantum computing. Quantum dense coding is one example where we can send two bits of classical information by just using a single Qbit, thus potentially doubling the capacity of the information transfer channel. Furthermore, the communication is spy proof. Let us say that Alice has two bits of secret information t o be sent to Bob. Two bits of classical information can be coded in terms of a pair of binary digits as
00, 10, 01, 11.
(17.195)
First Alice and Bob agree to associate these digits with the following unitary transformations:
uoo = I, UOl = z, UlO
= XI
UIl
=Y.
(17.196) (17.197) (17.198) (17.199)
Then, Alice and Bob each receive one Qbit from an entangled pair prepared, say in the asymmetric Bell state I*-) . Alice first performs a unitary transformation with the subscripts matching the pair of the digits that she is aiming to send Bob safely and then sends her Qbit t o Bob as if it is a mail. Anyone who tempers with this Qbit will destroy the superposed state, hence the message. The unitary transformation that Alice has performed changes the bell state according to the corresponding formulas in Equations (17.190) - (17.193). When Bob receives the particle that Alice has sent, he makes a Bell state measurement on both particles to determine which one of the four states in Equations (17.190) - (17.193) it has assumed. The result tells him what Alice’s transformation was, hence the pair of binary digits that she wanted t o send him. Quantum dense coding was the first experimental demonstration of
QUANTUM INFORMATION THEORY
777
quantum communication. It was first realized by the Innsbruck group in 1996 (Matte et al.). The crucial part of these experiments is the measurement of the Bell state of the tangled pair without destroying the entanglement (Roederer; Audretsch).
17.3.9 Quantum Teleportation Consider that Alice has an object that she wants t o send Bob. Aside from conventional means of transportation, she could somehow scan the object and send all the information contained to Bob. With a suitable technology, Bob then reconstructs the object. Unfortunately, such a technology neither exists nor can be constructed because of the no-cloning theorem of quantum mechanics. However, the next best thing, which guarantees Bob that his object will have the same properties as the original that Alice has, is possible. And most importantly, they do not have t o know the properties of the original. We start with Alice and Bob sharing a pair of entangled Qbits, A and B , which could be two electrons or two photons. We assume that the entangled pair is in the Bell state l Q - ) A B . A third Qbit, the teleportee, which is the same type of particle as A and B and is in the general superposed state IX)T = QT
lo), + PT
I1)T
(17.200)
,
is available to Alice. Any attempt t o determine the exact state of 1 ~ will ) destroy it. Our aim is to have Alice transport her Qbit, I x ) ~ , to Bob without physically taking it there. In other words, Bob will have t o reconstruct 1 ~ at his location. In this process, due t o the no-cloning theorem, we have to satisfy the following two conditions: (i) At any time t o neither Alice nor Bob, the exact state 1 ~ is revealed. ) ~ (ii) At the end, the copy in Alice's hand has t o be destroyed. Otherwise, there will be two copies of 1 ~ ) ~ . We now write the complete state vector of the three particles as 1X)ABT = I'-)AB
Ix)T
We can express lxjABT in terms of the Bell states of the particles A and T held by Alice, that is, in terms of the set [Eqs. (17.184), (17.185), (17.187), and (17.188)] {I'-)AT
,
l'+)AT,
,
I'-)AT
I'+)AT).
(17.202)
The expansion coefficient for basis state 1
I'-)AT
=
-(lo),
Jz
ll)T
-
[')A
lo),)
(17.203)
~
)
~
778
INFORMATION THEORY
is found by projecting
Ix)ABT
along \Q'-)ATas
where we have used the Dirac bra-ket notation (Chapter 5) and the orthogonality relations
(01 0) = (11 1) = 1, (01 1) = (11 0) = 0,
(17.205) (17.206)
for both A and T . Similarly, evaluating the other coefficients, we write the complete state vector, I x ) A B T , as
Now, Alice performs a Bell state measurement on her particles A and T that collapses 1 ~ into one ) of the ~ four~ Bell states ~ in Equation (17.202) with the equal probability of In no way this process provides Alice any information about I x ) ~ , that is, the probability amplitudes (YT and PT,but the particle B , which Bob holds, jumps into a state connected to whatever the Bell state
i.
QUANTUM INFORMATION THEORY
779
that Alice has observed, that is, one of
None of these states is yet the desired 1 ~ measurement that Alice has performed on I of the transportee, 1 ~ ) ~ :
) However, ~ . due t o the Bell state
x ) ~ ~ they , are related to the state
(17.212) by the same unitary transformation that Alice has observed, that is, one of
+ aT lo), - aT lo), + PT lo), - PT lo),
+ PT I1)B
= I IX)T
+ PT ll)B = -z
+ QT I1)B + aT ll)B
1
IX)T
=
IdT ,
=
lx)T 1
,
( 17.213) ( 17.214) (17.215) ( 17.216)
where the operators are defined in Equations (17.136)-(17.139). At this point, only Alice knows which transformation t o use. That is, which Bell state the complete state function, J x J A B Thas , collapsed to. She calls and gives Bob the necessary two bit information, that is, the two digits of the subscripts of the operator Uij, which they have agreed upon before the experiment to have the components
(17.217) ( 17.218) ( 17.219) (17.220) corresponding to the four Bell states, I Q ' - ) A T , I Q ' + ) A T , I@-)AT, I @ ' + ) A T, respectively. Now, Bob uses the inverse transformation, UG', on particle B that he has and obtains an exact replica of the transportee, 1 ~ . For ) example, ~ if Alice observes I @ + ) A T when collapses, the two digits she gives Bob
780
INFORMATION THEORY
is 11, then Bob operates on the particle B that he has with Y-l, which is in state -PT lo), aT ll)B,t o obtain 1 ~ as) ~
+
y - l (-/&
lo), + aT
I1)B) = aT
+ PT ll)B
lo),
= Y-lY
Ix)T (17.221)
= IXJT.
Let us summarize what has been accomplished: (i) Since the teleportee is the same type of particle as A and B , we have obtained an exact replica of 1 ~ at) Bob’s ~ location who has the particle B . (ii) The original, lxjT, that Alice had is destroyed. That is, neither the particle A nor the other particle that Alice is left with, that is, the transportie whose properties has been transferred t o B a t Bob’s location, is in state 1 ~ ) (iii) If somebody had spied on Alice and Bob, the two bit information, that is, the subscripts of the unitary transformation, would have no practical value without the particle, B , that Bob holds. Notice that Alice and Bob has communicated a wealth of quantum information hidden in the complex amplitudes of IX)T = QT
lo), + PT l1)T.
by just sending 2 bits. Neither Alice nor Bob has gained knowledge about the exact nature of the state 1 ~ ) In ~ . other words, neither of them knows what CIT and PT are. This intriguing experiment was first realized by the Innsbruck group in 1997 (Bouwmeester et al.).
PROBLEMS 1. Consider two chance events, A and B , both with 2 possibilities, where p ( i , j ) is the joint probability of the i t h and j t h possibilities occurring for A and B , respectively. We write the entropy of information for the joint event as 2
2
For the individual events we write
rz and
2
1
~ .
PROBLEMS
781
Show that the following inequality holds:
H ( A ,B ) 5 H ( A )
+H(B).
Also show that the equality is true for independent events, where
P(A,B)= P(A)P(B). Apply this to the case where two fair coins are tossed independent of each other. 2. Two chance events ( A ,B ) have the following joint probability distribution:
AJ\B+
1
2
3
4 L
1
L
8
16
L
L
3
1 _ 32
1 _ 32
&
4
1 1 1 0 32 32 16
16
4
Find
and interpret your results.
3. Analyze the following payoff matrices [Eq. (17.42)] for zero-sum games for both players for optimum strategies: (i)
Player A
Player B
I b~ I PI
6
I I
1. l1 I
-3
Note that it is foolish for the player B t o choose bl, since b2 yields more regardless of what A decides. In such cases we say b2 dominates bl and hence discard strategy b l . Finding dominant strategies may help simplifying payoff matrices.
782
INFORMATION THEORY
(ii) Player A
Player B
)b11-2)
I
Ib2l
5 1-11
I
I
4. Consider the following zero-sum game: Player A
(i) Find the randomized strategy that A has to follow to minimize maximum expected loss.
(ii) Find the randomized strategy that B has to follow to maximize minimum expected gain.
5. The two-player competition game is defined as follows: Both players simultaneously choose a whole number from 0 to 3. Both players win the smaller of the two numbers in points. In addition, the player who chose the larger number gives up 2 points to the other player. Construct the payoff matrix and identify the Nash equilibria. 6. Identify the Nash equilibria in the following payoff matrices: (i)
PROBLEMS
783
(ii)
7. Two drivers on a road have two strategies each, to drive either on the left or on the right with the payoff matrix
I I
1 1 \2
+
Drive on the left
I Drive on the left I Drive on the right I 1 (100,100) I (0,O) I)
where the payoff 100 means no crash and 0 means crash. Identify Nash equilibria for (i) the pure strategy game, (ii) the mixed strategy game with the probabilities (SO%, 50%).
Which one of these is stable.
8. In a 2-Cbit operation, find the action of the operator
-(I 1
+ ZlZO)
2
on the 2-Cbit states
9. In a 2-Cbit operation, find the action of the operator [operator not legible in this copy] on the 2-Cbit states [states not legible in this copy].
10. Prove the following operator representation of the operator S₁₀, which exchanges the values of Cbits 1 and 0:
\[
S_{10}\,|x_1\rangle|x_0\rangle = |x_0\rangle|x_1\rangle,
\]
or
\[
S_{10} = \frac{1}{2}\left[ I + Z_1 Z_0 + X_1 X_0 - Y_1 Y_0 \right],
\]
where Y = XZ.
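This identity can be verified numerically; a minimal sketch, assuming the usual matrix representations of X and Z and letting the left Kronecker factor act on Cbit 1:

```python
import numpy as np

kron = np.kron
I4 = np.eye(4)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
Y = X @ Z                              # Y = XZ, as in the problem

S10 = 0.5 * (I4 + kron(Z, Z) + kron(X, X) - kron(Y, Y))

# S10 should exchange the two Cbit values: |x1 x0> -> |x0 x1>.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)
assert np.allclose(S10, SWAP)
```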
11. Another useful operation on 2-Cbit systems is the reversible XOR, or the controlled-NOT (in short, c-NOT) gate, executed by the operator C₁₀ as
\[
C_{10}\,|x_1\rangle|x_0\rangle = |x_1\rangle\,X_0^{x_1}|x_0\rangle = |x_1\rangle\,|x_0 \oplus x_1\rangle.
\]
The task of C₁₀ is to flip the value of the target bit, |x₀⟩, whenever the control bit, |x₁⟩, has the value 1.
(i) Show that C₁₀ can be constructed from single-Cbit operators as
\[
C_{10} = \frac{1}{2}(I + X_0) + \frac{1}{2}Z_1(I - X_0)
\]
or as
\[
C_{10} = \frac{1}{2}(I + Z_1) + \frac{1}{2}X_0(I - Z_1).
\]
(ii) Show that the c-NOT operator can be generalized as
\[
C_{ij} = \frac{1}{2}(I + X_j) + \frac{1}{2}Z_i(I - X_j)
\]
or as
\[
C_{ij} = \frac{1}{2}(I + Z_i) + \frac{1}{2}X_j(I - Z_i).
\]
(iii) What is the effect of interchanging X and Z?
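Both constructions in part (i) can be checked against the c-NOT matrix directly; a minimal sketch under the same conventions as in the previous fragment:

```python
import numpy as np

kron = np.kron
I2, I4 = np.eye(2), np.eye(4)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# c-NOT with control Cbit 1 (left factor) and target Cbit 0,
# in the basis |00>, |01>, |10>, |11>.
CNOT10 = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=float)

X0, Z1 = kron(I2, X), kron(Z, I2)
form_i = 0.5 * (I4 + X0) + 0.5 * Z1 @ (I4 - X0)
form_ii = 0.5 * (I4 + Z1) + 0.5 * X0 @ (I4 - Z1)

assert np.allclose(form_i, CNOT10) and np.allclose(form_ii, CNOT10)
```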
12. The Hadamard operator,
\[
H = \frac{1}{\sqrt{2}}(X + Z) = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\]
is classically meaningless. Show that it takes the Cbit states |0⟩ and |1⟩ into the two classically meaningless superpositions
\[
\frac{1}{\sqrt{2}}\left( |0\rangle \pm |1\rangle \right).
\]
13. Verify the following operator relations:
\[
X^2 = I, \qquad XZ = -ZX,
\]
\[
HX = \frac{1}{\sqrt{2}}\left( I + ZX \right),
\]
\[
HXH = Z, \qquad HZH = X.
\]
14. Find the effect of the operator
\[
C_{01} = (H_1 H_0)\,C_{10}\,(H_1 H_0)
\]
on Cbits. Using the relations
\[
HXH = Z, \qquad HZH = X,
\]
and
\[
C_{ij} = \frac{1}{2}(I + X_j) + \frac{1}{2}Z_i(I - X_j),
\]
also show that
\[
C_{ji} = (H_i H_j)\,C_{ij}\,(H_i H_j).
\]
This seemingly simple relation has remarkable uses in quantum computers (Mermin).
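A numerical check of the relations appearing in Problems 13 and 14 (an illustrative sketch; H₁H₀ is represented by the Kronecker product H ⊗ H):

```python
import numpy as np

kron = np.kron
I2, I4 = np.eye(2), np.eye(4)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = (X + Z) / np.sqrt(2)

# Problem 13: HXH = Z and HZH = X.
assert np.allclose(H @ X @ H, Z) and np.allclose(H @ Z @ H, X)

# Problem 14: conjugating C10 by H1 H0 exchanges control and target.
X0, Z1 = kron(I2, X), kron(Z, I2)
X1, Z0 = kron(X, I2), kron(I2, Z)
C10 = 0.5 * (I4 + X0) + 0.5 * Z1 @ (I4 - X0)   # control 1, target 0
C01 = 0.5 * (I4 + X1) + 0.5 * Z0 @ (I4 - X1)   # control 0, target 1
HH = kron(H, H)
assert np.allclose(HH @ C10 @ HH, C01)
```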
15. Show that the Bell states
\[
|\Phi^{\pm}\rangle = \frac{1}{\sqrt{2}}\left( |0\rangle_A|0\rangle_B \pm |1\rangle_A|1\rangle_B \right)
\]
and
\[
|\Psi^{\pm}\rangle = \frac{1}{\sqrt{2}}\left( |0\rangle_A|1\rangle_B \pm |1\rangle_A|0\rangle_B \right)
\]
are unit vectors and that they are also orthogonal to each other.
16. Show the following Bell state transformations:
\[
|\Psi^{-}\rangle = I_A\,|\Psi^{-}\rangle,
\]
\[
|\Psi^{+}\rangle = Z_A\,|\Psi^{-}\rangle,
\]
\[
|\Phi^{-}\rangle = -X_A\,|\Psi^{-}\rangle,
\]
\[
|\Phi^{+}\rangle = Y_A\,|\Psi^{-}\rangle.
\]
The subscript A indicates the Qbit that the operator is acting on.
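The orthonormality of the Bell states and the transformations of Problem 16 can be confirmed in a few lines (an illustrative sketch, with the Qbit A as the left Kronecker factor):

```python
import numpy as np

kron = np.kron
k0, k1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
Y = X @ Z

# The four Bell states; A is the left Kronecker factor.
phi_p = (kron(k0, k0) + kron(k1, k1)) / np.sqrt(2)
phi_m = (kron(k0, k0) - kron(k1, k1)) / np.sqrt(2)
psi_p = (kron(k0, k1) + kron(k1, k0)) / np.sqrt(2)
psi_m = (kron(k0, k1) - kron(k1, k0)) / np.sqrt(2)

# Problem 15: unit norm and mutual orthogonality in one shot.
B = np.vstack([phi_p, phi_m, psi_p, psi_m])
assert np.allclose(B @ B.T, np.eye(4))

# Problem 16: one-Qbit operators on A permute |Psi-> among Bell states.
on_A = lambda U: kron(U, I2)
assert np.allclose(on_A(I2) @ psi_m, psi_m)
assert np.allclose(on_A(Z) @ psi_m, psi_p)
assert np.allclose(-on_A(X) @ psi_m, phi_m)
assert np.allclose(on_A(Y) @ psi_m, phi_p)
```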
17. Given the states
\[
|\chi\rangle_T = \alpha_T|0\rangle_T + \beta_T|1\rangle_T
\]
and
\[
|\Phi^{+}\rangle_{AB} = \frac{1}{\sqrt{2}}\left( |0\rangle_A|0\rangle_B + |1\rangle_A|1\rangle_B \right),
\]
show that their product can be written in terms of the Bell states of the particles A and T as
\[
|\chi\rangle_T\,|\Phi^{+}\rangle_{AB} = \frac{1}{2}\Big[
|\Phi^{+}\rangle_{TA}\left(\alpha_T|0\rangle_B + \beta_T|1\rangle_B\right)
+ |\Phi^{-}\rangle_{TA}\left(\alpha_T|0\rangle_B - \beta_T|1\rangle_B\right)
+ |\Psi^{+}\rangle_{TA}\left(\beta_T|0\rangle_B + \alpha_T|1\rangle_B\right)
+ |\Psi^{-}\rangle_{TA}\left(-\beta_T|0\rangle_B + \alpha_T|1\rangle_B\right)
\Big].
\]
18. Verify the following transformations used in quantum teleportation, and discuss what happens if Bob makes a mistake and disturbs the state of the particle B that he holds:
\[
\alpha_T|0\rangle_B + \beta_T|1\rangle_B = I\,|\chi\rangle_T,
\]
\[
\alpha_T|0\rangle_B - \beta_T|1\rangle_B = Z\,|\chi\rangle_T,
\]
\[
\beta_T|0\rangle_B + \alpha_T|1\rangle_B = X\,|\chi\rangle_T,
\]
\[
-\beta_T|0\rangle_B + \alpha_T|1\rangle_B = Y\,|\chi\rangle_T.
\]
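These four identities, and the cost of a wrong correction on Bob's side, can be checked directly (an illustrative sketch with arbitrary amplitudes):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
Y = X @ Z
I2 = np.eye(2)

a, b = 0.6, 0.8                      # |chi> = a|0> + b|1>
chi = np.array([a, b])

# The four states Bob may hold, each equal to an operator acting on |chi>.
assert np.allclose(np.array([a, b]), I2 @ chi)
assert np.allclose(np.array([a, -b]), Z @ chi)
assert np.allclose(np.array([b, a]), X @ chi)
assert np.allclose(np.array([-b, a]), Y @ chi)

# If Bob applies the wrong correction -- say Z when Y was called for --
# he is left with Z Y |chi>, which is not |chi>: the state is spoiled.
print(Z @ (Y @ chi))                 # [-0.8, -0.6], not [0.6, 0.8]
```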
References
Akhiezer, N.I., The Calculus of Variations, Blaisdell, New York, 1962. Ahlfors, L.V., Complex Analysis, McGraw-Hill, New York, 1966. Andel, J., Mathematics of Chance, Wiley, New York, 2001. Apostol, T.M., Mathematical Analysis, Addison-Wesley, Reading, MA, fourth printing, 1971. Appel, W., Mathematics for Physics and Physicists, Princeton University Press, Princeton, NJ, 2007. Arfken, G.B., and H.J. Weber, Mathematical Methods for Physicists, Elsevier, Boston, sixth edition, 2005. Artin, E., The Gamma Function, Holt, Rinehart and Winston, New York, 1964. Audretsch, J., Entangled Systems, Wiley-VCH, Weinheim, 2007. Audretsch, J., editor, Entangled World: The Fascination of Quantum Information and Computation, Wiley-VCH, Weinheim, 2006. Basu, K., The Traveler's Dilemma, Scientific American, p. 68, June 2007. Bather, J.A., Decision Theory, An Introduction to Programming and Sequential Decisions, Wiley, Chichester, 2000. Bayin, S.S., Mathematical Methods in Science and Engineering, Wiley, Hoboken, NJ, 2006.
Bell, W.W., Special Functions for Scientists and Engineers, D. Van Nostrand, Princeton, NJ, 1968. Boas, M.L., Mathematical Methods in the Physical Sciences, Wiley, Hoboken, NJ, third edition, 2006. Bouwmeester, D., J.W. Pan, K. Mattle, M. Eibl, H. Weinfurter, and A. Zeilinger, Experimental Quantum Teleportation, Nature, vol. 390, pp. 575-579, 1997. Bradbury, T.C., Theoretical Mechanics, Wiley, New York, 1968. Bromwich, T.J.I., An Introduction to the Infinite Series, Chelsea Publishing Company, New York, 1991. Brown, J.W., and R.V. Churchill, Complex Variables and Applications, McGraw-Hill, New York, 1995. Bruß, D., and G. Leuchs, editors, Lectures on Quantum Information, Wiley-VCH, Weinheim, 2007. Buck, R.C., Advanced Calculus, McGraw-Hill, New York, 1965. Butkov, E., Mathematical Physics, Addison-Wesley, New York, 1968. Byron, F.W., Jr., and R.W. Fuller, Mathematics of Classical and Quantum Physics, Dover, New York, 1992. Churchill, R.V., Fourier Series and Boundary Value Problems, McGraw-Hill, New York, 1963. Cover, T.M., and J.A. Thomas, Elements of Information Theory, Wiley, Hoboken, NJ, second edition, 2006. Csermely, P., Weak Links, Stabilizers of Complex Systems from Proteins to Social Networks, Springer, Berlin, 2006. Dirac, P.A.M., The Principles of Quantum Mechanics, Clarendon Press, Oxford, fourth edition, 1982. Dennery, P., and A. Krzywicki, Mathematics for Physics, Dover Publications, New York, 1995. Dwight, H.B., Tables of Integrals and Other Mathematical Data, Macmillan, New York, fourth edition, 1961. Erdelyi, A., W. Magnus, F. Oberhettinger, and F.G. Tricomi, Higher Transcendental Functions, vol. I, Krieger, New York, 1981. Feynman, R., R.B. Leighton, and M. Sands, The Feynman Lectures on Physics, Addison-Wesley, Reading, MA, 1966. Franklin, P., A Treatise on Advanced Calculus, Wiley, New York, 1940. Gamow, G., One Two Three ... Infinity: Facts and Speculations of Science, Dover Publications, 1988. Gantmacher, F.R., The Theory of Matrices, Chelsea Publishing Company, New York, 1960. Gasiorowicz, S., Quantum Physics, Wiley, Hoboken, NJ, third edition, 2003. Ghirardi, G., Sneaking a Look at God's Cards, Princeton University Press, Princeton, NJ, 2004.
Gnedenko, B.V., The Theory of Probability, MIR Publishers, Moscow, second printing, 1973. Goldstein, H., C. Poole, and J. Safko, Classical Mechanics, Addison-Wesley, San Francisco, third edition, 2002. Griffiths, D.J., Introduction to Electrodynamics, Benjamin Cummings, third edition, 1998. Grimmett, G.R., and D.R. Stirzaker, Probability and Random Processes, Clarendon, Oxford, third edition, 2001. Harris, B., Theory of Probability, Addison-Wesley, Reading, MA, 1966. Hartle, J.B., An Introduction to Einstein's General Relativity, Addison-Wesley, San Francisco, 2003. Hassani, S., Mathematical Methods: For Students of Physics and Related Fields, Springer Verlag, New York, 2000. Hassani, S., Mathematical Physics, Springer Verlag, New York, second edition, 2002. Hauser, W., Introduction to Principles of Mechanics, Addison-Wesley, Reading, MA, first printing, 1966. Haykin, S., Neural Networks, A Comprehensive Foundation, Prentice Hall, Upper Saddle River, 1999. Hildebrand, F.B., Methods of Applied Mathematics, Dover Publications, New York, second reprint edition, 1992. Hoffman, K., and R. Kunze, Linear Algebra, Prentice Hall, Upper Saddle River, NJ, second edition, 1971. Inan, U.S., and A.S. Inan (a), Engineering Electrodynamics, Prentice Hall, Upper Saddle River, 1998. Inan, U.S., and A.S. Inan (b), Electromagnetic Waves, Prentice Hall, Upper Saddle River, 1999. Ince, E.L., Ordinary Differential Equations, Dover Publications, New York, 1958. Jones, G.A., and J.M. Jones, Information and Coding Theory, Springer, London, 2006. Jozsa, R., in H.-K. Lo, S. Popescu, and T. Spiller, editors, Introduction to Quantum Computation and Information, World Scientific, Singapore, 1998. Kaplan, W., Advanced Calculus, Addison-Wesley, Reading, third edition, 1984. Kelly, J.J., Graduate Mathematical Physics, With Mathematica Supplements + CD, Wiley-VCH, Weinheim, 2007. Kolmogorov, A.N., Foundations of the Theory of Probability, Chelsea Publishing Company, New York, 1950. Kusse, B.R., and E.A. Westwig, Mathematical Physics: Applied Mathematics for Scientists and Engineers, Wiley-VCH, Weinheim, second edition, 2006. Kyrala, A., Applied Functions of a Complex Variable, Wiley, New York, 1972. Lang, S., Linear Algebra, Addison-Wesley, Reading, MA, 1966.
Lebedev, N.N., Special Functions and Their Applications, Prentice-Hall, Englewood Cliffs, NJ, 1965. Lebedev, N.N., I.P. Skalskaya, and Y.S. Uflyand, Problems of Mathematical Physics, Prentice-Hall, Englewood Cliffs, NJ, 1965. Margenau, H., and G.M. Murphy, editors, The Mathematics of Physics and Chemistry, Van Nostrand, Princeton, NJ, 1964. Marion, J.B., Classical Dynamics of Particles and Systems, Academic Press, New York, second edition, 1970. Mathews, J., and R.L. Walker, Mathematical Methods of Physics, Addison-Wesley, Menlo Park, CA, second edition, 1970. Mattle, K., H. Weinfurter, P.G. Kwiat, and A. Zeilinger, Dense Coding in Experimental Quantum Communication, Phys. Rev. Lett., vol. 76, pp. 4656-4659, 1996. McCollum, P.A., and B.F. Brown, Laplace Transform Tables and Theorems, Holt, Rinehart and Winston, New York, 1965. McMahon, D., Quantum Computing Explained, Wiley-IEEE Computer Society Press, Hoboken, NJ, 2007. Medina, P.K., and S. Merino, Mathematical Finance and Probability, Birkhauser Verlag, Basel, 2003. Mermin, N.D., Quantum Computer Science, Cambridge University Press, Cambridge, 2007. Merzbacher, E., Quantum Mechanics, Wiley, New York, 1998. Miller, I., and M. Miller, John E. Freund's Mathematical Statistics With Applications, Pearson Prentice Hall, Upper Saddle River, NJ, seventh edition, 2004. Morsch, O., Quantum Bits and Quantum Secrets: How Quantum Physics Is Revolutionizing Codes and Computers, Wiley-VCH, Weinheim, 2008. Morse, P.M., and H. Feshbach, Methods of Theoretical Physics, McGraw-Hill, New York, 1953. Murphy, G.M., Ordinary Differential Equations and Their Solutions, Van Nostrand, Princeton, NJ, 1960. Myerson, R.B., Game Theory, Analysis of Conflict, Harvard University Press, Cambridge, MA, 1991. Nagle, R.K., E.B. Saff, and A.D. Snider, Fundamentals of Differential Equations and Boundary Value Problems, Addison-Wesley, Boston, 2004. Osborne, M.J., An Introduction to Game Theory, Oxford University Press, New York, 2004. Peters, E.E., Complexity, Risk, and Financial Markets, Wiley, New York, 1999. Pathria, R.K., Statistical Mechanics, Pergamon Press, Oxford, 1984. Rektorys, K., Survey of Applicable Mathematics, Volumes I and II, Springer, Berlin, second revised edition, 1994. Roederer, J.G., Information and Its Role in Nature, Springer, Berlin, 2005.
Ross, S.L., Differential Equations, Wiley, New York, third edition, 1984. Saff, E.B., and A.D. Snider, Fundamentals of Complex Analysis with Applications to Engineering and Science, Prentice Hall, Upper Saddle River, NJ, 2003. Shannon, C.E., A Mathematical Theory of Communication, The Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948. Shannon, C.E., and W. Weaver, The Mathematical Theory of Communication, The University of Illinois Press, Urbana, IL, 1949. Sivia, D.S., and J. Skilling, Data Analysis: A Bayesian Tutorial, Oxford, New York, second edition, 2006. Spiegel, M.R., Advanced Mathematics for Engineers and Scientists: Schaum's Outline Series in Mathematics, McGraw-Hill, New York, 1971. Stapp, H.P., Mind, Matter and Quantum Mechanics, Springer, Berlin, second edition, 2004. Stolze, J., and D. Suter, Quantum Computing: A Short Course from Theory to Experiment, Wiley-VCH, Weinheim, 2004. Szabo, G., and G. Fath, Evolutionary Games on Graphs, Physics Reports, vol. 446, pp. 97-216, 2007. Szekeres, P., A Course in Modern Mathematical Physics: Groups, Hilbert Space and Differential Geometry, Cambridge University Press, New York, 2004. Titchmarsh, E.C., The Theory of Functions, Oxford University Press, New York, 1939. Thomas, G.B., Jr., and R.L. Finney, Thomas' Calculus, Addison-Wesley, Boston, alternate edition, 2000. Todhunter, I., A History of the Theory of Probability From the Time of Pascal to Laplace, Chelsea Publishing Company, New York, 1949. Trigg, G.L., editor, Mathematical Tools for Physicists, Wiley-VCH, Weinheim, 2005. Wan, F.Y.M., Introduction to the Calculus of Variations and its Applications, Chapman and Hall, New York, 1995. Wang, F.Y., Physics with Maple: The Computer Algebra Resource for Mathematical Methods in Physics, Wiley-VCH, Weinheim, 2006. Watson, G.N., A Treatise on the Theory of Bessel Functions, Cambridge University Press, London, second edition, 1962. Weber, H.J., and G.B. Arfken, Essential Mathematical Methods for Physicists, Academic Press, San Diego, 2003. Wilks, J., The Third Law of Thermodynamics, Oxford University Press, London, 1961. Whittaker, E.T., and G.N. Watson, A Course of Modern Analysis, Cambridge University Press, New York, 1958. Woolfson, M.M., and M.S. Woolfson, Mathematics for Physics, Oxford University Press, Oxford, 2007. Zeilinger, A., Quantum Information, Physics World, vol. 11, no. 3, March 1998.
Zeilinger, A., Quantum Teleportation, Scientific American, p. 32, April 2000. Ziemer, R.E., Elements of Engineering Probability and Statistics, Prentice Hall, Upper Saddle River, NJ, 1997.
INDEX
Absolute maximum, 14 Absolute minimum, 14 Absolutely integrable, 591 Action, 653 Action at a distance, 109 Addition formula Bessel functions, 537 Alternating series, 313 Amplitude spectrum, 609 Analytic functions, 349 derivative, 384 Taylor series, 11 Antiderivative primitive, 36 Arc length, 83 Area of a surface, 173 Argument, 335 function, 3 Associated Laguerre polynomials, 566 Average function, 35 Baker-Hausdorff formula, 294 Basis states, 754 Basis vectors, 141, 167, 245 Bayes' criteria, 738
Bayes' formula, 675 Bell states entanglement, 771 Bernoulli equation, 423 Bessel function addition formula, 537 Jacobi-Anger expansion, 537 Bessel functions boundary conditions, 531 expansion theorem, 531 first-kind, 513 generating functions, 519 integral definitions, 521 orthogonality roots, 527 recursion relations, 518 second-kind, 514 third-kind, 517 Weber integral, 533 Wronskians, 522 Bessel's equation series solution, 510 Bessel's inequality, 590 Binomial coefficients, 681 Binomial distribution, 701 moments, 707
Binomial formula binomial coefficients, 323 Bit, 723 Boltzmann distribution gases, 686 solids, 684 Bose-Einstein condensation, 688 Bose-Einstein distribution, 687 Boundary conditions, 4 spherical coordinates, 558 Boundary point, 2 Bounded variation, 594 Bra-ket vectors, 297 Brachistochrone problem, 664 Buffon's needle, 677 Cartesian coordinates, 62 Cartesian tensors, 148 Cauchy criteria, 306 Cauchy integral formula, 382 Cauchy principal value, 41 Cauchy product, 316 Cauchy-Euler equation explicit solutions, 441 Cauchy-Goursat theorem, 376 Cauchy-Riemann conditions, 346 polar coordinates, 351 Cauchy-Schwartz inequality, 586 Cbit, 746, 767 operations, 750 Central moment, 705 Change of basis, 254 Channel blocker, 765 Characteristic equation, 258 Characteristic value eigenvalue, 257 Chebyshev's theorem, 710 Chi-square, 700 Clairaut equation, 428 Closed set, 2 Collectively independent events, 675 Combinations, 681 Commutator, 756 Comparison test, 309 Completeness, 274, 754 Complex algebra, 332 Complex conjugate, 334 Complex correlation function modified, 615 Complex functions exponentials, 354 hyperbolic functions, 357 inverse trigonometric functions, 362
limits and continuity, 344 logarithmic function, 358 polynomials, 354 powers, 359 trigonometric functions, 356 Complex infinity, 339 Complex integrals contour integrals, 370 indefinite integrals, 379 Complex plane extended, 339 Complex series convergence, 393 Laurent series, 389 Maclaurin series, 388 Taylor series, 385 Components, 275 covariant /contravariant, 159 Compressible flow, 81 Conditional probability, 673 Conjugate harmonic functions, 351 Conjugate variables, 753 Conservative forces, 118 Constraints, 659 Continuity, 4 piecewise, 596 Continuity equation, 129 Contour integrals, 370, 373 Contraction tensors, 150 Contravariant components, 159 Control bit, 770 Convergence absolute, 309 conditional, 309 integrals conditionally convergent, 37 series, 309 uniform, 309 Convergence tests, 309 Convolution, 621 Coordinate axes, 155 Coordinate curves, 155 Coordinate surfaces, 155 Coordinates components, 246 Correlation coefficient, 610 modified, 611 Correlation function, 610 Coulomb gauge, 126 Covariant components, 159 Cramer’s rule, 226 Critical point, 16 Cross product
vector product, 61 Cryptography, 775 Cumulative distribution, 697 Curl, 77 Curl-meter, 105 Curvilinear coordinates, 154 Cylindrical coordinates, 187, 191 Darboux sum, 33 De Broglie relation, 753 Decision theory, 735 Decoherence, 769, 775 Del operator gradient, 74 DeMoivre's formula, 336 Dense coding, 776 Dependent variable function, 3 Derivatives chain rule, 22 Determinants, 220 Laplace development, 222 minor, 221 order, 221 properties, 223 rank, 222 Differential equations exact equations integrating factors, 442 explicit solutions, 408 first-order, 410 exact, 417 integrating factors, 419 linear, 416 methods of solutions, 412 Frobenius method, 452 first-order equations, 462 general solution, 408 harmonic oscillator, 435 homogeneous nonhomogeneous, 409 implicit solution, 408 initial conditions, 409, 452 linear and higher order, 450 operator approach, 437 particular solution, 408, 444 quadratures, 408 second-order, 429 methods of solution, 430 singular solution, 408 uniqueness of solutions, 452 Differential operators differential equations, 409
Diffusion equation Cartesian coordinates, 550 cylindrical coordinates, 572 heat flow equation, 541 spherical coordinates, 564 Dirac's bra-ket vectors, 297 Dirac-delta function, 618, 699 Direction cosines, 142 Directional derivative, 75 Dirichlet boundary condition, 558 Discrete distributions, 700 binomial, 701 Poisson, 703 uniform, 701 Displacement vector, 168 Distribution function, 694 Distribution functions arcsine, 694 Cauchy, 717 chi-square, 700 double triangle, 717 exponential, 699 gamma, 699 Gaussian, 699 hypergeometric, 719 Polya, 718 probability theory, 696 Rayleigh, 718 uniform, 698 Distributions expected value, 705 mean, 705 standard deviation, 705 variance, 705 Divergence div operator, 77 integral definition, 82 Divergence theorem Gauss's theorem, 82 Domain function, 3 Domain of definition, 343 Dominant strategies, 781 Double integrals, 47 properties, 49 Dual spaces, 297 Duality, 753 Dummy index tensors, 151 Duplication formula, 521 Eigenstates, 754 Eigenvalue characteristic value, 257
degenerate, 257 Eigenvalue problem symmetric matrices, 277 degenerate roots, 278 distinct roots, 278 Eigenvectors, 258 Electrostatics, 128 Entanglement Bell states, 771 Entire function, 349 Entropy solids, 689 Entropy of information, 729 Equation of continuity, 81 Equilibrium, 16 Essential singular point, 394 Euler constant, 516 Euler equation alternate form, 642 variational analysis, 642 Euler’s formula, 354 Events certain, 667 collectively independent, 675 impossible, 667 independent, 674 mutually exclusive, 669 pairwise independent, 675 random, 667 Exact differentials path independence, 114 Expectation value, 755 Expected gain, 739 Expected loss, 739 Expected value, 705 Extensive forms, 739 Extremum local absolute, 15 maximum minimum, 15 with conditions, 18 Extremum points, 637 Fermi-Dirac distribution, 689 Fields, 242 Flip operator, 766 Fourier series change of interval, 602 convergence, 593 differentiation, 603 Dirichlet integral, 594 exponential form, 592 fundamental theorem, 596
generalized, 588 Gibbs phenomenon, 598 integral representation, 594 integration, 603 periodic extension, 600 Riemann localization theorem, 595 sine/cosine series, 602 square wave, 597 triangular wave, 599 trigonometric, 591 uniqueness, 597 Fourier transform correlation function, 615 derivative, 621 existence, 620 inverse, 615 properties, 621 Free index tensors, 151 Frequency of occurrence, 672 Frequency spectrum, 609, 617 Frobenius method, 452 Function, 2 Functionals, 638 Fundamental theorem averages, 704 calculus, 36 Game theory, 737 Gamma distribution, 699 Gamma function, 458, 521, 526, 536 duplication formula, 526, 527 Gates, 747, 752, 758 Gauss's law, 118 Gauss's method linear equations, 217 Gauss's theorem divergence theorem, 82 Gauss-Jordan reduction, 218 Gaussian distribution, 699 moments, 706 Gaussian surface, 111 General boundary condition, 558 General solution, 409 Generalized coordinates, 154, 653 area element, 171 curl, 185 divergence, 182 gradient, 179 Laplacian, 186 orthogonal, 186 volume element, 177 Geometric probability, 677 Geometric series, 310
Gibbs phenomenon, 598 Gradient del operator, 74 generalized coordinates, 179 Gram-Schmidt orthogonalization, 276 Gramian, 275 Gravitational field, 108 Birkhoff's theorem, 112 stars, 111 Gravitational potential, 116 Gravitational potential energy uniform sphere, 121 Green's first identity, 107 Green's second identity, 107, 137 Green's theorem, 91 Cauchy-Goursat theorem, 376 multiply connected domains, 96 Hadamard operator, 766 Hamilton's principle, 651 Hamiltonian operator, 758 Hankel functions, 517 Harmonic functions, 350 Harmonic series, 310 Heat flow equation Cartesian coordinates, 550 cylindrical coordinates, 572 spherical coordinates, 564 Heisenberg uncertainty, 753 Helmholtz spherical coordinates, 563 Helmholtz equation, 542 cylindrical coordinates, 570 Helmholtz theorem, 122 Hermite equation series solution, 487 Hermite polynomials, 491 contour integral definition, 492 generating function, 494 Hermite series, 499 orthogonality, 496 recursion relations, 495 Rodriguez formula, 493 special values, 495 weight function, 497 Hermitian, 289 Hermitian operators, 294 Hilbert space, 296, 754 completeness, 754 orthogonality, 754 Homogeneous differential equation, 409
Identity matrix unit matrix, 209 Identity operator, 766 Identity tensor, 152 Implicit functions, 25 Implicit solution, 408 Improper transformations, 140 Impulse function Dirac-delta function, 618 Incompressible fluids, 129 Independent variable function, 3 Indicial equation, 453 Inflection point, 15 Information conditional probabilities, 733 continuous distributions, 733 H-function, 729 joint events, 732 unit, 723 Information content, 728 Information processing, 726 Information value, 728 Initial conditions boundary conditions, 409 Inner product, 272, 586 norm, 586 Inner product space, 274 Integral indefinite, 36 Integral test, 309 Integrals absolutely convergent conditionally convergent, 37 Cauchy principal value, 41 Darboux sum, 33 double triple, 47 improper, 37 M-test, 42 multiple, 50 with a parameter, 42 Integrating factor, 419 Integration by parts, 37 Integration constant, 409 Interference, 757 Interferometer Mach-Zehnder, 760 Invariants, 147, 178 Inverse basis vectors, 167 Inverse Fourier transform, 615 Inverse functions, 30 Inverse matrix, 230 Inverse transformation, 144
Irrotational flow, 129 Isolated singular points, 394 Jacobi determinant implicit functions, 27 Jacobi identity, 130 Jacobi-Anger expansion Bessel function, 537 Jacobian, 157 inverse functions, 30 Jordan arc, 372 Kinetic energy, 87 Kronecker delta, 63 identity tensor, 152 L'Hôpital's rule limits, 6 Lagrange multiplier extremum problems, 20 Lagrange's equation, 426 Lagrangian, 653, 657 constraints, 659 Laguerre equation, 500 series solution, 500 Laguerre polynomials, 502 contour integral definition, 502 generating function, 504 Laguerre series, 506 orthogonality, 505 Rodriguez formula, 503 special values, 504 Laplace development, 222 Laplace equation, 119, 541, 650 Cartesian coordinates, 546 cylindrical coordinates, 569 spherical coordinates, 557 Laplace transform, 622 differential equation, 625 inverse, 623 transfer functions, 627 Laplacian, 105 Laurent series, 389 Law of large numbers, 712 ergodic theorems, 705 Left derivative, 6 Legendre equation, 470 polynomial solutions, 474 series solution, 470 Legendre polynomials, 474 generating function, 478 Legendre series, 484 orthogonality, 482 recursion formulas, 481 Rodriguez formula, 477
special values, 480 Leibnitz's rule, 43 Levi-Civita symbol permutation symbol, 152 Limit comparison test, 309 Limits, 5 Line element, 164, 168 Line integrals arc length, 83 Linear combination, 244 Linear equations, 216 homogeneous, 233 Linear independence, 244, 275 Linear spaces vector space, 242 Linear transformations matrix representation, 293 operators, 249 Lines, 68 Liouville theorem, 402 Lorentz gauge, 129 M-test integrals, 42 Mach-Zehnder interferometer, 760 mathematics, 763 Maclaurin series, 11, 324, 388 Magnetostatics, 128 Magnitude, 58 Mapping function, 2 Matrices adjoint, 232 algebra, 209 cofactor, 231 diagonal, 209 dimension, 207 Hermitian, 294 self-adjoint, 289 identity matrix, 209 inverse matrix, 230 linear equations, 216 orthogonal, 287 rectangular, 207 row matrix column matrix, 208 spur trace, 211 square order, 208 submatrix partitioned matrix, 215 symmetry, 209
transpose, 208 unitary, 291 zero matrix null matrix, 209 Maxwell’s equations, 128 Mean, 705 function, 35 Mean square error, 590 Mean value theorem Rolle’s theorem, 36 Median, 706 Method of elimination, 218 Metric tensor, 165 Minimax criteria, 738 Minkowski inequality, 586 Minor determinants, 221 Mixed state, 755 Modified Bessel functions, 523 Modulus, 334 Moment of inertia scalar, 285 Moment of inertia tensor, 265 Multinomial coefficients, 681 Multiple integrals, 50 Multiple-to-one functions, 3 Multiplication theorem, 673, 674 Multiply connected domain, 381 Multivalued functions, 3 Multivalued functions complex functions, 358 principal value, 358 Mutually exclusive events, 762 Nash equilibrium, 742 Natural boundary conditions, 642 Necker cubes, 748 Neighborhood, 2 Neumann boundary condition, 558 Neumann function, 515 No-cloning theorem, 768 control Qbit target Qbit, 770 Norm, 58 magnitude, 274 Riemann integral, 34 Normal distribution, 699 Normal forms, 738 Novelty value, 728 Null matrix zero matrix, 209
Null set, 2 Numbers scalars, 242 Nyquist sampling frequency, 609 One-time-pad quantum cryptography, 776 Open set, 2 Operators on Cbits, 750 Ordinary derivative, 6 Orthogonal functions completeness, 590 convergence mean, 590 inner product, 586 linear independence, 587 theory, 586 Orthogonal matrices, 287 Orthogonal transformations, 140 Orthogonality, 274, 754 Orthogonality condition, 143 Outer product tensors, 149 Pairwise independent events, 675 Parseval's formula, 590 Parseval's theorem, 622 Partial derivative, 6 Particular solution, 409 Partitioned matrices symmetry, 214 Path independence, 113 Payoff matrix, 738 Permutation symbol, 65 Levi-Civita tensor, 152 Permutations, 681 Phase shift operator, 766 Phase spectrum, 609 Piecewise continuous, 5 Planck formula, 753 Planes equation, 69 Poisson distribution, 703 moments, 708 Poisson's equation, 119 Potential energy gravitational, 117 Power series, 321 Primitive antiderivative, 36 Principal coordinates, 265
Principal directions, 265 Principal moments of inertia, 265 Prior uncertainty, 729 Probability classical definition, 668 entropy, 689 Probability amplitudes, 755 Probability density, 754 Probability density function, 698 Probability theory basic theorems, 669 Bayes' formula, 675 Buffon's needle, 677 Chebyshev's theorem, 710 combinations, 678 compound element, 669 conditional probability, 673 distribution function, 694 elementary event, 669 event, 669 frequency of occurrence, 672 fundamental theorem, 704 geometric probability, 677 law of large numbers, 705 multiplication theorem, 673, 674 permutations, 678 random variables, 693 sample space, 668 simple event, 669 statistical definition, 672 total probability, 675 Proper transformations, 140 Protocols quantum cryptography, 776 Pseudotensors, 178 Cartesian tensors, 153 Pure state, 756 Qbit, 767 Qbit operators, 766 Qbit versus Cbit, 767 Quadratic forms, 285 Quadratures differential equations, 408 Quantum cryptography protocols, 776 Quantum dense coding, 776 Quantum information cryptography, 775 Vernam coding, 775 Quantum mechanics, 752 Radius of convergence, 322 Random variables, 693
Range, 343 function, 3 Rank, 222 tensors, 147 Ratio test, 310 Residue theorem, 398 Riccati equation, 424 Riemann integral, 34 Riemann localization theorem, 595 Riemann sphere, 340 Riemann-Lebesgue lemma, 591 Right derivative, 6 Rolle's theorem mean value theorem, 36 Root test, 310 Row matrix, 208 Sample space, 668 Sampling property, 618 Sampling theorem Shannon, 609 Scalar field, 71 Scalar product dot product inner product, 60 Schrodinger equation time-dependent spherical coordinates, 566 time-independent spherical coordinates, 565 Schwarz inequality, 37, 65 Solenoidal fields, 82 Self-adjoint operators, 294 Self-energy gravitational, 121 Separation of variables, 542 Cartesian coordinates, 542 cylindrical coordinates, 567 spherical coordinates, 553 Sequences Cauchy criteria, 304 upper/lower limit, 307 Series Cauchy product, 316 convergence, 309 grouping, 314 indeterminate forms, 325 infinite, 308 multiplication, 314, 316 rearrangement, 315 Series of functions uniform convergence, 316 Series operations, 314 Signals, 608
Similarity transformation, 256 Simple closed curve Jordan curve, 372 Simple pole, 394 Simply connected domain, 381 Singular point, 349 Singular points classification, 394 Singular solution, 409 Smooth curve, 372 Smooth functions very smooth functions, 596 Spectrum eigenvalues, 258 Spherical Bessel functions, 525 Spherical coordinates, 193 Spherical pendulum, 666 Spur tensors, 150 trace, 211 Square matrix, 208 Standard deviation, 705 State function wave function, 297, 754 State vector collapse, 755 wave function, 754 Stationary functions, 639 Stationary points, 637 Stationary values, 638 Statistical information, 727 Statistical probability, 672 Stereographic projection, 340 Stirling's approximation, 312 Stokes's theorem, 97, 102 Strain tensor, 153 Streamlines, 71 Stress tensor, 153 Submatrices, 214 Summation convention Einstein, 163 Superposed state, 757 Surface integrals, 88 Tangent plane to a surface, 75 Target bit, 770 Taylor series, 11, 324, 388 radius of convergence, 388 remainder, 387 Teleportation, 777 Teleportee, 777 Temperature, 691
Tensor density, 178 Cartesian tensors, 153 Tensors algebra, 148 rank, 147 spur trace, 150 tensor product outer product, 149 transpose, 149 Total differential, 10 Total probability, 675 Trace spur, 211 tensors, 150 Trace formula, 295 Transfer functions Laplace transforms, 627 Transformation matrix, 143 Transformations active/passive, 286 algebra, 252 inverse, 254 linear, 249 matrix representation, 250 product, 253 similar, 255 unitary, 291 Transpose, 149, 208 Traveler's dilemma, 742 Triangle inequality, 65 Triple product, 66 Uncertainty principle, 753 Uniform convergence M-test, 42 properties, 319 Weierstrass M-test, 318 Uniform distribution, 698, 701 Union, 2 Unitary matrices, 291 Unitary space, 274 Unitary transformation, 758 Variational analysis Euler equation, 642 functionals, 638 general case, 647 inverse problem, 650 Laplace equation, 650 minimal surfaces, 649 natural boundary conditions, 642 notation, 645 stationary functions, 639
stationary paths, 638 Vector algebra, 62 Vector field, 71 Vector multiplication, 60 Vector product cross product, 61 Vector spaces, 242 basis vectors, 245 dimension, 246 generators, 244 Vectors addition, 60 differentiation, 72 magnitude norm, 58 vector spaces, 242 Velocity potential, 129 Vernam coding, 775
Wave equation, 541 Cartesian coordinates, 544 cylindrical coordinates, 570 spherical coordinates, 563 Wave function state vector, 754 Weak links, 746 Weber integral Bessel functions, 533 Work done, 84, 113 Wronskian, 429 differential equations, 440 Zero matrix null matrix, 209 Zero-sum games, 738