Applied and Numerical Harmonic Analysis

Series Editor
John J. Benedetto, University of Maryland

Editorial Advisory Board
Akram Aldroubi, NIH, Biomedical Engineering/Instrumentation
Ingrid Daubechies, Princeton University
Christopher Heil, Georgia Institute of Technology
James McClellan, Georgia Institute of Technology
Michael Unser, NIH, Biomedical Engineering/Instrumentation
M. Victor Wickerhauser, Washington University
Douglas Cochran, Arizona State University
Hans G. Feichtinger, University of Vienna
Murat Kunt, Swiss Federal Institute of Technology, Lausanne
Wim Sweldens, Lucent Technologies Bell Laboratories
Martin Vetterli, Swiss Federal Institute of Technology, Lausanne
Applied and Numerical Harmonic Analysis (published titles)

J.M. Cooper: Introduction to Partial Differential Equations with MATLAB (ISBN 0-8176-3967-5)
C.E. D'Attellis and E.M. Fernandez-Berdaguer: Wavelet Theory and Harmonic Analysis in Applied Sciences (ISBN 0-8176-3953-5)
H.G. Feichtinger and T. Strohmer: Gabor Analysis and Algorithms (ISBN 0-8176-3959-4)
T.M. Peters, J.H.T. Bates, G.B. Pike, P. Munger, and J.C. Williams: Fourier Transforms and Biomedical Engineering (ISBN 0-8176-3941-1)
A.I. Saichev and W.A. Woyczynski: Distributions in the Physical and Engineering Sciences (ISBN 0-8176-3924-1)
R. Tolimieri and M. An: Time-Frequency Representations (ISBN 0-8176-3918-7)
G.T. Herman: Geometry of Digital Spaces (ISBN 0-8176-3897-0)
A. Prochazka, J. Uhlir, P.J.W. Rayner, and N.G. Kingsbury: Signal Analysis and Prediction (ISBN 0-8176-4042-8)
J. Ramanathan: Methods of Applied Fourier Analysis (ISBN 0-8176-3963-2)
A. Teolis: Computational Signal Processing with Wavelets (ISBN 0-8176-3909-8)
W.O. Bray and C.V. Stanojevic: Analysis of Divergence (ISBN 0-8176-4058-4)
G.T. Herman and A. Kuba: Discrete Tomography (ISBN 0-8176-4101-7)
J.J. Benedetto and P.J.S.G. Ferreira (Eds.): Modern Sampling Theory (ISBN 0-8176-4023-1)
P. Das, A. Abbate, and C. DeCusatis: Wavelets and Subbands (ISBN 0-8176-4136-X)
L. Debnath: Wavelet Transforms and Time-Frequency Signal Analysis (ISBN 0-8176-4104-1)
K. Gröchenig: Foundations of Time-Frequency Analysis (ISBN 0-8176-4022-3)
D.F. Walnut: An Introduction to Wavelet Analysis (ISBN 0-8176-3962-4)
David F. Walnut

An Introduction to Wavelet Analysis
With 88 Figures

Birkhäuser
Boston • Basel • Berlin

David F. Walnut
Department of Mathematical Sciences
George Mason University
Fairfax, VA 22030
USA
Library of Congress Cataloging-in-Publication Data

Walnut, David F.
  An introduction to wavelet analysis / David F. Walnut.
    p. cm. (Applied and numerical harmonic analysis)
  Includes bibliographical references and index.
  ISBN 0-8176-3962-4 (alk. paper)
  1. Wavelets (Mathematics) I. Title. II. Series.
  QA403.3 .W335 2001
  515'.2433 dc21    2001025367 CIP

Printed on acid-free paper.
© 2002 Birkhäuser Boston

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

ISBN 0-8176-3962-4
ISBN 3-7643-3962-4
SPIN 10574019

Production managed by Louise Farkas; manufacturing supervised by Jacqui Ashri.
Typeset by the author in LaTeX2e.
Printed and bound by Edwards Brothers, Inc., Ann Arbor, MI.
Printed in the United States of America.

Birkhäuser Boston • Basel • Berlin
A member of BertelsmannSpringer Science+Business Media GmbH
To my parents
and to Megan
Unless the LORD builds the house, its builders labor in vain.
— Psalm 127:1a (NIV)
Contents

Preface

I  Preliminaries

1  Functions and Convergence
   1.1  Functions
        1.1.1  Bounded (L∞) Functions
        1.1.2  Integrable (L¹) Functions
        1.1.3  Square Integrable (L²) Functions
        1.1.4  Differentiable (Cⁿ) Functions
   1.2  Convergence of Sequences of Functions
        1.2.1  Numerical Convergence
        1.2.2  Pointwise Convergence
        1.2.3  Uniform (L∞) Convergence
        1.2.4  Mean (L¹) Convergence
        1.2.5  Mean-square (L²) Convergence
        1.2.6  Interchange of Limits and Integrals

2  Fourier Series
   2.1  Trigonometric Series
        2.1.1  Periodic Functions
        2.1.2  The Trigonometric System
        2.1.3  The Fourier Coefficients
        2.1.4  Convergence of Fourier Series
   2.2  Approximate Identities
        2.2.1  Motivation from Fourier Series
        2.2.2  Definition and Examples
        2.2.3  Convergence Theorems
   2.3  Generalized Fourier Series
        2.3.1  Orthogonality
        2.3.2  Generalized Fourier Series
        2.3.3  Completeness

3  The Fourier Transform
   3.1  Motivation and Definition
   3.2  Basic Properties of the Fourier Transform
   3.3  Fourier Inversion
   3.4  Convolution
   3.5  Plancherel's Formula
   3.6  The Fourier Transform for L² Functions
   3.7  Smoothness versus Decay
   3.8  Dilation, Translation, and Modulation
   3.9  Bandlimited Functions and the Sampling Formula

4  Signals and Systems
   4.1  Signals
   4.2  Systems
        4.2.1  Causality and Stability
   4.3  Periodic Signals and the Discrete Fourier Transform
        4.3.1  The Discrete Fourier Transform
   4.4  The Fast Fourier Transform
   4.5  L² Fourier Series

II  The Haar System

5  The Haar System
   5.1  Dyadic Step Functions
        5.1.1  The Dyadic Intervals
        5.1.2  The Scale j Dyadic Step Functions
   5.2  The Haar System
        5.2.1  The Haar Scaling Functions and the Haar Functions
        5.2.2  Orthogonality of the Haar System
        5.2.3  The Splitting Lemma
   5.3  Haar Bases on [0, 1]
   5.4  Comparison of Haar Series with Fourier Series
        5.4.1  Representation of Functions with Small Support
        5.4.2  Behavior of Haar Coefficients Near Jump Discontinuities
        5.4.3  Haar Coefficients and Global Smoothness
   5.5  Haar Bases on R
        5.5.1  The Approximation and Detail Operators
        5.5.2  The Scale J Haar System on R
        5.5.3  The Haar System on R

6  The Discrete Haar Transform
   6.1  Motivation
        6.1.1  The Discrete Haar Transform (DHT)
   6.2  The DHT in Two Dimensions
        6.2.1  The Row-wise and Column-wise Approximations and Details
        6.2.2  The DHT for Matrices
   6.3  Image Analysis with the DHT
        6.3.1  Approximation and Blurring
        6.3.2  Horizontal, Vertical, and Diagonal Edges
        6.3.3  "Naive" Image Compression

III  Orthonormal Wavelet Bases

7  Multiresolution Analysis
   7.1  Orthonormal Systems of Translates
   7.2  Definition of Multiresolution Analysis
        7.2.1  Some Basic Properties of MRAs
   7.3  Examples of Multiresolution Analysis
        7.3.1  The Haar MRA
        7.3.2  The Piecewise Linear MRA
        7.3.3  The Bandlimited MRA
        7.3.4  The Meyer MRA
   7.4  Construction and Examples of Orthonormal Wavelet Bases
        7.4.1  Examples of Wavelet Bases
        7.4.2  Wavelets in Two Dimensions
        7.4.3  Localization of Wavelet Bases
   7.5  Proof of Theorem 7.35
        7.5.1  Sufficient Conditions for a Wavelet Basis
        7.5.2  Proof of Theorem 7.35
   7.6  Necessary Properties of the Scaling Function
   7.7  General Spline Wavelets
        7.7.1  Basic Properties of Spline Functions
        7.7.2  Spline Multiresolution Analyses

8  The Discrete Wavelet Transform
   8.1  Motivation: From MRA to a Discrete Transform
   8.2  The Quadrature Mirror Filter Conditions
        8.2.1  Motivation from MRA
        8.2.2  The Approximation and Detail Operators and Their Adjoints
        8.2.3  The Quadrature Mirror Filter (QMF) Conditions
   8.3  The Discrete Wavelet Transform (DWT)
        8.3.1  The DWT for Signals
        8.3.2  The DWT for Finite Signals
        8.3.3  The DWT as an Orthogonal Transformation
   8.4  Scaling Functions from Scaling Sequences
        8.4.1  The Infinite Product Formula
        8.4.2  The Cascade Algorithm
        8.4.3  The Support of the Scaling Function

9  Smooth, Compactly Supported Wavelets
   9.1  Vanishing Moments
        9.1.1  Vanishing Moments and Smoothness
        9.1.2  Vanishing Moments and Approximation
        9.1.3  Vanishing Moments and the Reproduction of Polynomials
        9.1.4  Equivalent Conditions for Vanishing Moments
   9.2  The Daubechies Wavelets
        9.2.1  The Daubechies Polynomials
        9.2.2  Spectral Factorization
   9.3  Image Analysis with Smooth Wavelets
        9.3.1  Approximation and Blurring
        9.3.2  "Naive" Image Compression with Smooth Wavelets

IV  Other Wavelet Constructions

10  Biorthogonal Wavelets
    10.1  Linear Independence and Biorthogonality
    10.2  Riesz Bases and the Frame Condition
    10.3  Riesz Bases of Translates
    10.4  Generalized Multiresolution Analysis (GMRA)
          10.4.1  Basic Properties of GMRA
          10.4.2  Dual GMRA and Riesz Bases of Wavelets
    10.5  Riesz Bases Orthogonal Across Scales
          10.5.1  Example: The Piecewise Linear GMRA
    10.6  A Discrete Transform for Biorthogonal Wavelets
          10.6.1  Motivation from GMRA
          10.6.2  The QMF Conditions
    10.7  Compactly Supported Biorthogonal Wavelets
          10.7.1  Compactly Supported Spline Wavelets
          10.7.2  Symmetric Biorthogonal Wavelets
          10.7.3  Using Symmetry in the DWT

11  Wavelet Packets
    11.1  Motivation: Completing the Wavelet Tree
    11.2  Localization of Wavelet Packets
          11.2.1  Time/Spatial Localization
          11.2.2  Frequency Localization
    11.3  Orthogonality and Completeness Properties of Wavelet Packets
          11.3.1  Wavelet Packet Bases with a Fixed Scale

          B.1.2  Wavelets with Rational Noninteger Dilation Factors
          B.1.3  Local Cosine Bases
          B.1.4  The Continuous Wavelet Transform
          B.1.5  Non-MRA Wavelets
          B.1.6  Multiwavelets
    B.2  Wavelets in Other Domains
          B.2.1  Wavelets on Intervals
          B.2.2  Wavelets in Higher Dimensions
          B.2.3  The Lifting Scheme
    B.3  Applications of Wavelets
          B.3.1  Wavelet Denoising
          B.3.2  Multiscale Edge Detection
          B.3.3  The FBI Fingerprint Compression Standard

C  References Cited in the Text

Index
Preface

These days there are dozens of wavelet books on the market, some of which are destined to be classics in the field. So a natural question to ask is: Why another one? In short, I wrote this book to supply the particular needs of students in a graduate course on wavelets that I have taught several times since 1991 at George Mason University. As is typical with such offerings, the course drew an audience with widely varying backgrounds and widely varying expectations. The difficult if not impossible task for me, the instructor, was to present the beauty, usefulness, and mathematical depth of the subject to such an audience. It would be insane to claim that I have been entirely successful in this task. However, through much trial and error, I have arrived at some basic principles that are reflected in the structure of this book. I believe that this makes this book distinct from existing texts, and I hope that others may find the book useful.

(1) Consistent assumptions of mathematical preparation. In some ways, the subject of wavelets is deceptively easy. It is not difficult to understand and implement a discrete wavelet transform and from there to analyze and process signals and images with great success. However, the underlying ideas and connections that make wavelets such a fascinating subject require some considerable mathematical sophistication. There have been some excellent books written on wavelets emphasizing their elementary nature (e.g., Kaiser, A Friendly Guide to Wavelets; Strang and Nguyen, Wavelets and Filter Banks; Walker, Primer on Wavelets and their Scientific Applications; Frazier, Introduction to Wavelets through Linear Algebra; Nievergelt, Wavelets Made Easy; Meyer, Wavelets: Algorithms and Applications). For my own purposes, such texts required quite a bit of "filling in the gaps" in order to make some connections and to prepare the student for more advanced books and research articles in wavelet theory. This book assumes an upper-level undergraduate semester of advanced calculus. Sufficient preparation would come from, for example, Chapters 1-5 of Buck, Advanced Calculus. I have tried very hard not to depart from this assumption at any point in the book. This has required at times sacrificing elegance and generality for accessibility. However, all proofs are completely rigorous and contain the gist of the more general argument. In this way, it is hoped that the reader will be prepared to tackle more sophisticated books and articles on wavelet theory.

(2) Proceeding from the continuous to the discrete. I have always found it more meaningful and ultimately easier to start with a presentation of wavelets and wavelet bases in the continuous domain and use this to motivate the discrete theory, even though the discrete theory hangs together in its own right and is easy to understand. This can be frustrating for the student whose primary interest is in applications, but I believe that a better understanding of applications can ultimately be achieved by doing things in this order.
(3) Prepare readers to explore wavelet theory on their own. Wavelets is too broad a subject to cover in a single book and is most interesting to study when the students have a particular interest in what they are studying. In choosing what to include in the book, I have tried to ensure that students are equipped to pursue more advanced topics on their own. I have included an appendix called Excursions in Wavelet Theory (Appendix B) that gives some guidance toward what I consider to be the most readable articles on some selected topics. The suggested topics in this appendix can also be used as the basis of semester projects for the students.
Structure of the Book

The book is divided into five parts: Preliminaries, The Haar System, Multiresolution Analysis and Orthonormal Wavelet Bases, Other Wavelet Constructions, and Applications.

Preliminaries

Wavelet theory is really very hard to appreciate outside the context of the language and ideas of Fourier analysis. Chapters 1-4 of the book provide a background in some of these ideas and include everything that is subsequently used in the text. These chapters are designed to be more than just a reference but less than a "book-within-a-book" on Fourier analysis. Depending on the background of the reader or of the class in which this book is being used, these chapters are intended to be dipped into either superficially or in detail as appropriate. Naturally there are a great many books on Fourier analysis that cover the same material better and more thoroughly than do Chapters 1-4 and at the same level (more or less) of mathematical sophistication. I will list some of my favorites below: Walker, Fourier Analysis; Kammler, A First Course in Fourier Analysis; Churchill and Brown, Fourier Series and Boundary Value Problems; Dym and McKean, Fourier Series and Integrals; Körner, Fourier Analysis; and Benedetto, Harmonic Analysis and Applications.

The Haar System

Chapters 5 and 6 provide a self-contained exposition of the Haar system, the earliest example of an orthonormal wavelet basis. These chapters could be presented as is in a course on advanced calculus, or an undergraduate Fourier analysis course. In the context of the rest of the book, these chapters are designed to motivate the search for more general wavelet bases with different properties, and also to illustrate some of the more advanced concepts such as multiresolution analysis that are used throughout the rest of the book. Chapter 5 contains a description of the Haar basis on [0, 1] and on R, and Chapter 6 shows how to implement a discrete version of the Haar basis in one and two dimensions. Some examples of images analyzed with the Haar wavelet are also included.
Orthonormal Wavelet Bases

Chapters 7-9 represent the heart of the book. Chapter 7 contains an exposition of the general notion of a multiresolution analysis (MRA) together with several examples. Next, we describe the recipe that gives the construction of a wavelet basis from an MRA, and then construct corresponding examples of wavelet orthonormal bases. Chapter 8 describes the passage from the continuous domain to the discrete domain. First, properties of MRA are used to motivate and define the quadrature mirror filter (QMF) conditions that any orthonormal wavelet filter must satisfy. Then the discrete wavelet transform (DWT) is defined for infinite signals, periodic signals, and for finite sets of data. Finally the techniques used to pass from discrete filters satisfying the QMF conditions to continuously defined wavelet functions are described. Chapter 9 presents the construction of compactly supported orthonormal wavelet bases due to Daubechies. Daubechies's approach is motivated by a lengthy discussion of the importance of vanishing moments in the design of wavelet filters.
Other Wavelet Constructions

Chapters 10 and 11 contain a discussion of two important variations on the theme of the construction of orthonormal wavelet bases. The first, in Chapter 10, shows what happens when you consider nonorthogonal wavelet systems. This chapter contains a discussion of Riesz bases, and describes the semi-orthogonal wavelets of Chui and Wang, as well as the notion of dual MRA and the fully biorthogonal wavelets of Daubechies, Cohen, and Feauveau. Chapter 11 discusses wavelet packets, another natural variation on orthonormal wavelet bases. The motivation here is to consider what happens to the DWT when the "full wavelet tree" is computed. Wavelet packet functions are described, their time and frequency localization properties are discussed, and necessary and sufficient conditions are given under which a collection of scaled and shifted wavelet packets constitutes an orthonormal basis on R. Finally, the notion of a best basis is described, and the so-called best basis algorithm (due to Coifman and Wickerhauser) is given.
Applications Many wavelet books have been written emphasizing applications of the theory, most notably, Strang and Nguyen, Wavelets and Filter Banks, and Mallat's comprehensive, A Wavelet Tour of Signal Processing. The book by Wickerhauser, Applied Wavelet Analysis from Theory to Software, also contains descriptions of several applications. The reader is encouraged to consult these texts and the references therein to learn more about wavelet applications. The description of applications in this book is limited to a brief description of two fundamental examples of wavelet applications. The first, described in Chapter 12, is to image compression. The basic components of a transform image coder as well as how wavelets fit into this picture are described. Chapter 13 describes the Beylkin-Coifman-Rokhlin (BCR) algorithm, which is useful for numerically estimating certain integral operators known as singular integral operators. The algorithm is very effective and uses the same basic properties of wavelets that make them useful for image compression. Several examples of singular integral operators arising in ordinary differential equations, complex variable theory, and image processing are given before the BCR algorithm is described.
Acknowledgments

I want to express my thanks to the many folks who made this book possible. First and foremost, I want to thank my advisor and friend John Benedetto for encouraging me to take on this project and for graciously agreeing to publish it in his book series. Thanks also to Wayne Yuhasz, Lauren Schultz, Louise Farkas, and Shoshanna Grossman at Birkhäuser for their advice and support. I want to thank Margaret Mitchell for LaTeX advice and Jim Houston and Clovis L. Tondo for modifying some of the figures to make them more readable. All of the figures in this book were created by me using MATLAB and the Wavelet Toolbox. Thanks to The MathWorks for creating such superior products. I would also like to thank the National Science Foundation for its support and the George Mason University Mathematics Department (especially Bob Sachs) for their constant encouragement. I also want to thank the students in my wavelets course who were guinea pigs for an early version of this text and who provided valuable feedback on organization and found numerous typos in the text. Thanks to Ben Crain, James Holdener, Amin Jazaeri, Jim Kelliher, Sami Nefissi, Matt Parker, and Jim Timper. I also want to thank Bill Heller, Joe Lakey, and Paul Salamonowicz for their careful reading of the text and their useful comments. Special thanks go to David Weiland for his willingness to use the manuscript in an undergraduate course at Swarthmore College. The book is all the better for his insights, and those of the unnamed students in the class. I want to give special thanks to my Dad, with whom I had many conversations about book-writing. He passed away suddenly while this book was in production and never saw the finished product. He was pleased and proud to have another published author in the family. He is greatly missed. Finally, I want to thank my wife Megan for her constant love and support, and my delightful children John and Genna who will someday read their names here and wonder how their old man actually did it.
Fairfax, Virginia
David F. Walnut
Albrecht Dürer (1471-1528), Melencolia I (engraving). Courtesy of the Fogg Art Museum, Harvard University Art Museums, Gift of William Gray from the collection of Francis Calley Gray. Photograph by Rick Stafford, © President and Fellows of Harvard College. A detail of this engraving, a portion of the magic square, is used as the sample image in 22 figures in this book. The file processed is a portion of the image file detail.mat packaged with MATLAB version 5.0.
Part I
Preliminaries
Chapter 1
Functions and Convergence

1.1 Functions

1.1.1 Bounded (L∞) Functions

Definition 1.1. A piecewise continuous function f(x) defined on an interval I is bounded (or L∞) on I if there is a number M > 0 such that |f(x)| ≤ M for all x ∈ I. The L∞-norm of a function f(x) is defined by
\[ \|f\|_\infty = \sup\{ |f(x)| : x \in I \}. \qquad (1.1) \]
Example 1.2. (a) If I is a closed, finite interval, then any function f(x) continuous on I is also L∞ on I (Theorem A.3).
(b) The function f(x) = 1/x is continuous and has a finite value at each point of the interval (0, 1] but is not bounded on (0, 1] (Figure 1.1).
(c) The functions f(x) = sin(x) and f(x) = cos(x) are L∞ on R. Also, the complex-valued function f(x) = e^{ix} is L∞ on R. In fact, ||sin||∞ = ||cos||∞ = ||e^{ix}||∞ = 1.
(d) Any polynomial function p(x) is not L∞ on R but is L∞ on every finite subinterval of R.
(e) Any piecewise continuous function with only jump discontinuities is L∞ on any finite interval I.
1.1.2 Integrable (L¹) Functions

Definition 1.3. A piecewise continuous function f(x) defined on an interval I is integrable (or of class L¹ or simply L¹) on I if the integral
\[ \int_I |f(x)|\,dx \]
is finite. The L¹-norm of a function f(x) is defined by
\[ \|f\|_1 = \int_I |f(x)|\,dx. \]
FIGURE 1.1. Left: f(x) = 1/x is finite-valued but unbounded on (0, 1]. Right: sin(x) (solid) and cos(x) (dashed) are L∞ on R.
Example 1.4. (a) If f(x) is L∞ on a finite interval I, then f(x) is L¹ on I.
(b) Any function continuous on a finite closed interval I is L¹ on I. This is because such a function must be L∞ on I (Theorem A.3).
(c) Any function piecewise continuous with only jump discontinuities on a finite closed interval I is L¹ on I.
(d) For any 0 < α < 1, the function f(x) = |x|^{-α} is L¹ on the interval [-1, 1]. Clearly f(x) is piecewise continuous with an infinite discontinuity at x = 0. Thus the integral ∫_{-1}^{1} |f(x)| dx is improper and must be evaluated as an improper integral as follows:
\[ \int_{-1}^{1} |x|^{-\alpha}\,dx \;=\; \lim_{\epsilon \to 0^+} \int_{-1}^{-\epsilon} |x|^{-\alpha}\,dx \;+\; \lim_{\epsilon \to 0^+} \int_{\epsilon}^{1} x^{-\alpha}\,dx \;=\; \frac{2}{1-\alpha}\,\lim_{\epsilon \to 0^+}\bigl(1 - \epsilon^{1-\alpha}\bigr) \;=\; \frac{2}{1-\alpha}. \]
The above example shows that an L¹ function need not be L∞. If α ≥ 1, then f(x) is not L¹ on [-1, 1].
(e) If α > 1, the function f(x) = x^{-α} is L¹ on the interval [1, ∞) since the improper Riemann integral
\[ \int_{1}^{\infty} x^{-\alpha}\,dx \]
converges.
(f) If 0 ≤ α ≤ 1, then f(x) = x^{-α} is not L¹ on [1, ∞). But f(x) is L∞ on [1, ∞). This shows that an L∞ function need not be L¹ on I if I is infinite.
(g) The function f(x) = e^{-|x|} is integrable on R since the improper Riemann integral
\[ \int_{-\infty}^{\infty} e^{-|x|}\,dx \]
converges. In fact,
\[ \int_{-\infty}^{\infty} e^{-|x|}\,dx = 2. \]
We present below our first approximation theorem. It says that any function L1 on R can be approximated arbitrarily closely in the sense of the L1-norm by a function with compact support. Theorem 1.5 is illustrated in Figure 1.2.
Theorem 1.5. Let f(x) be L¹ on R, and let ε > 0 be given. Then there exists a number R > 0 such that if g(x) = f(x) χ_{[-R,R]}(x), then
\[ \int_{-\infty}^{\infty} |f(x) - g(x)|\,dx = \|f - g\|_1 < \epsilon. \]

Proof: Since f(x) is integrable, the definition of the improper Riemann integral implies that
\[ \lim_{r \to \infty} \int_{-r}^{r} |f(x)|\,dx = \int_{-\infty}^{\infty} |f(x)|\,dx. \]
Hence, given ε > 0, there is a number r₀ > 0 such that if r > r₀, then
\[ 0 \le \int_{-\infty}^{\infty} |f(x)|\,dx - \int_{-r}^{r} |f(x)|\,dx < \epsilon. \]
Pick a number R > r₀, and define g(x) = f(x) χ_{[-R,R]}(x). Then
\[ \|f - g\|_1 = \int_{|x| > R} |f(x)|\,dx = \int_{-\infty}^{\infty} |f(x)|\,dx - \int_{-R}^{R} |f(x)|\,dx < \epsilon. \qquad \blacksquare \]
FIGURE 1.2. Illustration of Theorem 1.5. Left: Graph of f(x). Area of shaded region is < ε. Right: Graph of g(x) with R = 10.
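As a quick numerical illustration of Theorem 1.5, the following Python sketch (my own, with f(x) = e^{-|x|} and the grid parameters chosen only for illustration) compares the L¹ error of truncation to [-R, R] with the exact tail integral 2e^{-R}.

```python
# A minimal numerical illustration of Theorem 1.5 (not from the text):
# truncating an L^1 function f to [-R, R] makes the L^1 error small.
# Here f(x) = exp(-|x|), whose tail integral can be computed exactly.
import numpy as np

def l1_tail_error(R, grid_half_width=50.0, n=2_000_000):
    """Approximate ||f - f*chi_[-R,R]||_1 = integral of |f| over |x| > R."""
    x = np.linspace(-grid_half_width, grid_half_width, n)
    f = np.exp(-np.abs(x))
    outside = np.abs(x) > R
    return np.trapz(f * outside, x)

for R in [1, 5, 10]:
    print(R, l1_tail_error(R), 2 * np.exp(-R))  # numeric vs. exact tail 2e^{-R}
```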
1.1.3 Square Integrable (L²) Functions

Definition 1.6. A piecewise continuous function f(x) defined on an interval I is square-integrable (or of class L² or simply L²) on I if the integral
\[ \int_I |f(x)|^2\,dx \]
is finite. The L²-norm of a function f(x) is defined by
\[ \|f\|_2 = \left( \int_I |f(x)|^2\,dx \right)^{1/2}. \]
Example 1.7. (a) Any function bounded on a finite interval I is also L² on I. This includes functions continuous on closed intervals and functions piecewise continuous on closed intervals with only jump discontinuities.
(b) Any function that is L∞ and L¹ on any interval I (finite or infinite) is also L² on I.
(c) For any 0 < α < 1/2, the function f(x) = |x|^{-α} is L² on the interval [-1, 1]. Therefore an integrable function need not be bounded. If α ≥ 1/2, then the corresponding f(x) is not L² on [-1, 1].
(d) If α > 1/2, the function f(x) = x^{-α} is L² on the interval [1, ∞). If 0 ≤ α ≤ 1/2, then the corresponding f(x) is not L² on [1, ∞).
Theorem 1.8. (Cauchy-Schwarz Inequality) Let f(x) and g(x) be L² on the interval I. Then
\[ \left| \int_I f(x)\,g(x)\,dx \right| \le \left( \int_I |f(x)|^2\,dx \right)^{1/2} \left( \int_I |g(x)|^2\,dx \right)^{1/2}. \qquad (1.5) \]

Proof: Let us assume first that f(x) and g(x) are real-valued, and let t be an arbitrary real number. Then
\[ 0 \le \int_I \bigl(f(x) + t\,g(x)\bigr)^2\,dx = \int_I f(x)^2\,dx + 2t \int_I f(x)\,g(x)\,dx + t^2 \int_I g(x)^2\,dx. \]
This expression represents a real-valued quadratic function of t that is nonnegative for all t. Hence its discriminant must be nonpositive (the discriminant of a quadratic expression at² + bt + c is b² - 4ac). Therefore,
\[ 4\left( \int_I f(x)\,g(x)\,dx \right)^{2} - 4 \int_I f(x)^2\,dx \int_I g(x)^2\,dx \le 0, \]
and (1.5) follows. If f(x) and g(x) are not real-valued, then we observe that
\[ \left| \int_I f(x)\,g(x)\,dx \right| \le \int_I |f(x)|\,|g(x)|\,dx \]
(Theorem A.4) and then proceed as in the real-valued case.  ∎
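As a small sanity check of (1.5), here is a short Python sketch (my own illustration; the choice f(x) = x and g(x) = sin(x) on [0, 1] is arbitrary) comparing the inner product with the product of the L² norms.

```python
# Quick numerical check of the Cauchy-Schwarz inequality (illustration only):
# |integral of f*g| <= ||f||_2 ||g||_2 for f(x) = x and g(x) = sin(x) on [0, 1].
import numpy as np

x = np.linspace(0.0, 1.0, 100_001)
f, g = x, np.sin(x)
inner = np.trapz(f * g, x)
bound = np.sqrt(np.trapz(f**2, x)) * np.sqrt(np.trapz(g**2, x))
print(inner, bound, inner <= bound)
```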
From the Cauchy-Schwarz inequality, we can say something about the relation between L¹ and L² functions.

Theorem 1.9. Let I be a finite interval. If f(x) is L² on I, then it is L¹ on I.

Proof: By the Cauchy-Schwarz inequality,
\[ \int_I |f(x)|\,dx = \int_I |f(x)| \cdot 1\,dx \le \left( \int_I |f(x)|^2\,dx \right)^{1/2} \left( \int_I 1\,dx \right)^{1/2} = \|f\|_2\,|I|^{1/2}. \]
Since I is a finite interval and since f(x) is L², the right side and hence the left side of the above inequality are finite.  ∎
Remark 1.10. (a) The conclusion of Theorem 1.9 does not hold if I is an infinite interval. For example, let f(x) = 1/x if x ≥ 1 and f(x) = 0 if x < 1. Then f(x) is L² on R but is not L¹ on R.
(b) The converse of Theorem 1.9 is false for both finite and infinite intervals. This means that if f(x) is L¹ on I, then it is not necessarily true that f(x) is L² on I. For example, let f(x) = x^{-1/2} for x ∈ (0, 1). Then f(x) is L¹ but not L² on (0, 1).

The following consequence of the Cauchy-Schwarz inequality allows us to conclude that linear combinations of L² functions are also L² (Exercise 1.19). It is also known as the triangle inequality for L² functions; that is, it says that ||f + g||₂ ≤ ||f||₂ + ||g||₂ (see Exercise 1.18).
Theorem 1.11. (Minkowski's Inequality) Let f(x) and g(x) be L² on the interval I. Then
\[ \|f + g\|_2 \le \|f\|_2 + \|g\|_2. \qquad (1.6) \]

Proof: By the Cauchy-Schwarz inequality,
\[ \|f + g\|_2^2 = \int_I |f(x)+g(x)|^2\,dx \le \int_I |f(x)|^2\,dx + 2\int_I |f(x)|\,|g(x)|\,dx + \int_I |g(x)|^2\,dx \le \|f\|_2^2 + 2\|f\|_2\|g\|_2 + \|g\|_2^2 = \bigl(\|f\|_2 + \|g\|_2\bigr)^2, \]
and (1.6) follows.  ∎
The following approximation theorem says that a function f(x) that is L² on R can be approximated arbitrarily closely by a compactly supported function in the sense of the L²-norm. The proof is very similar to that of Theorem 1.5 and is left as an exercise (Exercise 1.20).

Theorem 1.12. Let f(x) be L² on R, and let ε > 0 be given. Then there exists a number R such that if g(x) = f(x) χ_{[-R,R]}(x), then
\[ \int_{-\infty}^{\infty} |f(x) - g(x)|^2\,dx = \|f - g\|_2^2 < \epsilon. \]
1.1.4 Differentiable (Cⁿ) Functions
Definition 1.13. Given n ∈ N, we say that a function f(x) defined on an interval I is Cⁿ on I if it is n-times continuously differentiable on I. C⁰ on I means that f(x) is continuous on I. f(x) is C^∞ on I if it is Cⁿ on I for every n ∈ N.
We say that f(x) is C_c^n on I if it is Cⁿ on I and compactly supported, C_c^0 on I if it is C⁰ on I and compactly supported, and C_c^∞ on I if it is C^∞ on I and compactly supported.
Example 1.14. (a) If p(x) is a polynomial, then p(x) is C^∞ on R.
(b) The "hat" or "tent" function defined by f(x) = (1 - |x|) χ_{[-1,1]}(x) is continuous but not differentiable on R since the derivative fails to exist at x = -1, 0, 1. Thus f(x) is C⁰ (in fact C_c^0) but not C¹ on R (Figure 1.3(a)).
(c) We can generate an example of a smoother function by taking the tent function f(x) defined in (b) and taking its antiderivative. That is, we can define the function g(x) by
\[ g(x) = \int_{-\infty}^{x} f(t)\,dt. \]
Putting -∞ as the lower limit of integration is just a notational convenience since we could just as easily have started the integration at any constant less than -1. By the Fundamental Theorem of Calculus (Theorem A.5), g'(x) = f(x) for all x ∈ R since f(x) is continuous on R. Therefore g(x) is C¹ but not C². Note that g(x) is not C_c^1 on R (Figure 1.3(b)).
(d) The function g(x) defined in (c) is L∞ on R but is not L¹ or L² on R. In fact, since g(x) = 1 for all x > 1, g(x) does not even go to zero as x goes to ∞. However, by modifying the construction in (c), it is possible to define a function that is C_c^1 but not C² on R. In particular, such a function would be L¹ and L² on R. The idea is to form two shifts of f(x), and subtract one from the other so that the resulting sum has integral zero. Then the antiderivative will vanish after a certain point. Specifically, define the function F(x) by
\[ F(x) = f(x + 1/2) - f(x - 1/2), \]
and the function g(x) by
\[ g(x) = \int_{-\infty}^{x} F(t)\,dt. \]
Then this g(x) is C¹ but not C² for the same reason as in (c), and it is C_c^1 since it is supported in [-3/2, 3/2] (Figures 1.3(c) and (d)).
(e) It is possible to define a sequence of functions with increasing smoothness based on the procedure outlined in (d). First define the function B₀(x) by B₀(x) = χ_{[-1/2,1/2]}(x), and for each n ∈ N, define Bₙ(x) by
\[ B_n(x) = \int_{-\infty}^{x} \bigl( B_{n-1}(t + 1/2) - B_{n-1}(t - 1/2) \bigr)\,dt = \int_{x - 1/2}^{x + 1/2} B_{n-1}(t)\,dt. \]
Note that B₁(x) is exactly the tent function defined in (b). Each Bₙ(x) vanishes outside the interval [-(n+1)/2, (n+1)/2] and for n ≥ 1 is C^{n-1} but not Cⁿ on R (B₀(x) is not C⁰). The function Bₙ(x) is called the B-spline of order n and is in fact a piecewise polynomial. More interesting properties of spline functions are given in Section 7.7.1.
FIGURE 1.3. Top left: Graph of "hat" function f(x) = (1 - |x|) χ_{[-1,1]}(x). Top right: Antiderivative of f(x). Bottom left: Graph of f(x + 1/2) and -f(x - 1/2). Bottom right: Antiderivative of F(x). This function is C_c^1 on R but not C².
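The B-spline recursion in Example 1.14(e) is easy to try numerically: each Bₙ is the moving average of B_{n-1} over a window of length 1. The following Python sketch (my own illustration; the grid spacing and the number of iterations are arbitrary choices) approximates B₃ this way.

```python
# A rough numerical sketch (not from the text) of the B-spline recursion
# B_n(x) = integral of B_{n-1} over [x - 1/2, x + 1/2], starting from
# B_0 = indicator of [-1/2, 1/2].  Implemented as a moving average on a grid.
import numpy as np

h = 0.001                                  # grid spacing
x = np.arange(-4, 4, h)
B = (np.abs(x) <= 0.5).astype(float)       # B_0
window = int(round(1.0 / h))               # number of samples in a length-1 window

for n in range(1, 4):
    # moving average of width 1 == integral over [x - 1/2, x + 1/2]
    B = np.convolve(B, np.ones(window) * h, mode="same")

# B now approximates B_3, a piecewise cubic supported in [-2, 2]
print(B.max(), np.trapz(B, x))             # peak value and total area (area stays ~1)
```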
Exercises

Exercise 1.15. Prove each of the statements made in Example 1.2.
Exercise 1.16. Prove each of the statements made in Example 1.4.

Exercise 1.17. Prove each of the statements made in Example 1.7.

Exercise 1.18. Prove that each of the norms ||·||_p for p = 1, 2, ∞ satisfies the norm axioms: Given f(x) and g(x) L^p on an interval I:
(a) ||f||_p ≥ 0.
(b) ||f||_p = 0 if and only if f(x) ≡ 0.¹
(c) ||αf||_p = |α| ||f||_p, for every α ∈ C.
(d) ||f + g||_p ≤ ||f||_p + ||g||_p (this is known as the triangle inequality).

¹ Since we assume that f(a) is undefined if f(x) has a discontinuity at x = a, "f(x) ≡ 0 on I" means "f(x) = 0 at each point of continuity of f(x)."
Exercise 1.19. Prove that the collection of functions L^p on an interval I (p = 1, 2, ∞) is closed under the formation of linear combinations. That is, show that if fₙ(x) is L^p on I for 1 ≤ n ≤ N and if α₁, α₂, ..., α_N are complex numbers, then Σ_{n=1}^{N} αₙ fₙ(x) is also L^p on I.

Exercise 1.20. Prove Theorem 1.12.
1.2 Convergence of Sequences of Functions

1.2.1 Numerical Convergence

Definition 1.21. The sequence {aₙ}_{n∈N} converges to the number a if for every ε > 0, there is an N > 0 such that if n ≥ N, then |aₙ - a| < ε. In this case, we write aₙ → a as n → ∞, or lim_{n→∞} aₙ = a.
A numerical series, denoted Σ_{n=1}^{∞} aₙ, converges to a number S if the sequence of partial sums {s_N}_{N∈N} defined by s_N = Σ_{n=1}^{N} aₙ converges to S. In this case, we write Σ_{n=1}^{∞} aₙ = S. We will frequently denote the series Σ_{n=1}^{∞} aₙ by Σ_{n∈N} aₙ.
A series Σ_{n∈N} aₙ converges absolutely if Σ_{n∈N} |aₙ| converges.
Remark 1.22. (a) A fundamental property of the real numbers is known as the completeness property. The completeness property for the real numbers says that every set of real numbers bounded above has a supremum (or least upper bound).
(b) A sequence of real numbers {aₙ}_{n∈N} is an increasing sequence if aₙ ≤ aₙ₊₁ for all n ∈ N. The completeness property of the real numbers implies that any bounded, increasing sequence of real numbers always converges to its least upper bound.
The partial sums of a series with nonnegative terms (i.e., Σ_{n∈N} aₙ where aₙ ≥ 0) form an increasing sequence. Therefore, it follows that if a series of nonnegative terms is bounded, then it converges.
(c) A sequence of real or complex numbers is Cauchy if for every ε > 0, there is an N > 0 such that if n, m ≥ N, then |aₙ - aₘ| < ε. Another consequence of the completeness property for real numbers is the following: Every Cauchy sequence of numbers converges.
Example 1.23. (a) Consider the series Σ_{n=0}^{∞} rⁿ. The partial sum s_N can be computed since
\[ s_N = \sum_{n=0}^{N} r^n = \frac{1 - r^{N+1}}{1 - r}, \qquad r \ne 1. \]
If |r| < 1, then s_N → 1/(1-r) as N → ∞. Therefore, if |r| < 1, then
\[ \sum_{n=0}^{\infty} r^n = \frac{1}{1 - r}. \]
(b) Consider the series Σ_{n=1}^{∞} 1/n². Clearly,
\[ s_N = \sum_{n=1}^{N} \frac{1}{n^2} \le 1 + \sum_{n=2}^{N} \frac{1}{n(n-1)} = 1 + \sum_{n=2}^{N} \left( \frac{1}{n-1} - \frac{1}{n} \right) = 2 - \frac{1}{N}. \]
Therefore, s_N ≤ 2 for all N. Since each of the terms 1/n² is positive, {s_N}_{N∈N} is a bounded, increasing sequence. Therefore, it converges to its least upper bound and the series Σ_{n=1}^{∞} 1/n² converges. Note that we have proved that the series converges but we have made no statement about the value of its limit. The same argument can be used to show that the series Σ_{n=1}^{∞} 1/n^p converges for every p > 1, but again does not give the value of its limit.
(c) The Weierstrass M-test is a well-known test for convergence of a series. Consider the series Σₙ aₙ. The Weierstrass M-test says that if |aₙ| ≤ bₙ for all n and if Σ_{n∈N} bₙ converges, then Σ_{n∈N} aₙ converges. For example, consider the series Σ_{n=1}^{∞} cos(n)/n². Since |cos(n)/n²| ≤ 1/n² for all n and since Σ_{n=1}^{∞} 1/n² converges, so does Σ_{n=1}^{∞} cos(n)/n². Note that again we have proved that the series converges but have not given the value of its limit.
(d) A consequence of the Weierstrass M-test is the following. If a series converges absolutely, then the original series also converges. Absolute convergence is equivalent to saying that the series converges regardless of the order in which the terms are summed. It is not true that all convergent series are absolutely convergent. For example, it is shown in most calculus books that the series Σ_{n=1}^{∞} (-1)ⁿ/n converges but the harmonic series Σ_{n=1}^{∞} 1/n does not.
(e) A doubly infinite sequence is a sequence of the form {aₙ}_{n∈Z}. In discussing the convergence of such sequences, we look at two limits, namely, lim_{n→∞} aₙ and lim_{n→∞} a₋ₙ. If both converge to the same number, say, to a, then we write lim_{|n|→∞} aₙ = a.
(f) A doubly infinite series is a series of the form Σ_{n=-∞}^{∞} aₙ. In discussing the convergence of such series, we look at two series, namely, Σ_{n=0}^{∞} aₙ and Σ_{n=1}^{∞} a₋ₙ. If both of these series converge, then there is no problem. If Σ_{n=0}^{∞} aₙ = S⁺ and Σ_{n=1}^{∞} a₋ₙ = S⁻, then Σ_{n=-∞}^{∞} aₙ = S⁺ + S⁻ = S. In this case, we also write lim_{N→∞} Σ_{n=-N}^{N} aₙ = S. We will frequently denote the series Σ_{n=-∞}^{∞} aₙ by Σ_{n∈Z} aₙ or simply by Σₙ aₙ.
(g) If a doubly infinite series converges absolutely, then it converges regardless of the order in which the terms are summed. This is not the case with series that do not converge absolutely. Consider the series Σ_{n=-∞}^{∞} 1/n, where the n = 0 term is understood to be zero. Clearly, this series does not converge absolutely. However, because of cancellation, s_N = Σ_{n=-N}^{N} 1/n = 0. Hence, the symmetric partial sums converge to zero. However, if we define
\[ s_N = \sum_{n=-N}^{N^2} \frac{1}{n} = \sum_{n=N+1}^{N^2} \frac{1}{n} \;\ge\; \int_{N+1}^{N^2} \frac{1}{x}\,dx \;\longrightarrow\; \infty \]
as N → ∞. Therefore, if a doubly infinite series does not converge absolutely, then the form of the partial sums must be given explicitly in order to discuss the convergence of the series. This is true of any series that converges but not absolutely.
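The phenomenon in (g) is easy to observe numerically. The short Python sketch below (my own illustration; the values of N are arbitrary) prints the symmetric partial sums, which vanish, next to the lopsided sums from -N to N², which grow.

```python
# A small numerical check (illustration, not from the text): for the doubly
# infinite series of 1/n (n != 0), symmetric partial sums vanish while the
# lopsided sums from -N to N^2 grow without bound.
def partial_sum(lo, hi):
    return sum(1.0 / n for n in range(lo, hi + 1) if n != 0)

for N in [10, 100, 1000]:
    print(N, partial_sum(-N, N), partial_sum(-N, N * N))
```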
1.2.2 Pointwise Convergence
Definition 1.24. A sequence of functions {fₙ(x)}_{n∈N} defined on an interval I converges pointwise to a function f(x) if for each x₀ ∈ I, the numerical sequence {fₙ(x₀)}_{n∈N} converges to f(x₀). We write fₙ(x) → f(x) pointwise on I as n → ∞.
The series Σ_{n=1}^{∞} fₙ(x) = f(x) pointwise on an interval I if, for each x₀ ∈ I, Σ_{n=1}^{∞} fₙ(x₀) = f(x₀).

Example 1.25. (a) Let fₙ(x) = xⁿ, x ∈ [0, 1) for all n ∈ N. Then fₙ(x) → 0 pointwise on [0, 1) as n → ∞. See Figure 1.4(a).
(b) Let
\[ f_n(x) = \begin{cases} 2nx & \text{if } x \in [0, 1/2n) \\ 2 - 2nx & \text{if } x \in [1/2n, 1/n) \\ 0 & \text{if } x \in [1/n, 1]. \end{cases} \]
Then fₙ(x) → 0 pointwise on [0, 1]. See Figure 1.4(b).
(c) Let
\[ f_n(x) = \begin{cases} 2n^2 x & \text{if } x \in [0, 1/2n) \\ 2n - 2n^2 x & \text{if } x \in [1/2n, 1/n) \\ 0 & \text{if } x \in [1/n, 1]. \end{cases} \]
Then fₙ(x) → 0 pointwise on [0, 1]. See Figure 1.4(c).
(d) The series Σ_{n=0}^{∞} xⁿ = 1/(1-x) pointwise on (-1, 1).
(f) The series Σ_{n=1}^{∞} cos(nx)/n² converges pointwise on R to its limit by the Weierstrass M-test.
(g) The series Σ_{n=1}^{∞} cos(nx)/n converges at odd multiples of π (since it reduces to the alternating series Σ_{n=1}^{∞} (-1)ⁿ/n) but diverges at even multiples of π (since it reduces to the harmonic series). In fact, it can be shown that the series converges for all x that are not even multiples of π.
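Example 1.25(a) is already enough to see the gap between pointwise and uniform convergence that the next subsection makes precise. The Python sketch below (my own illustration; the grid and the values of n are arbitrary) shows each point value of xⁿ shrinking while the supremum over [0, 1) stays near 1.

```python
# Illustration (not from the text): f_n(x) = x^n tends to 0 at every fixed
# point of [0, 1), yet the supremum of |f_n| over [0, 1) is 1 for every n,
# so the convergence is pointwise but not uniform (cf. Remark 1.27).
import numpy as np

x = np.linspace(0.0, 1.0, 10_001)[:-1]    # grid in [0, 1)
for n in [1, 5, 25, 125]:
    fn = x ** n
    print(n, fn[5000], fn.max())           # value at x = 0.5 shrinks; grid max stays near 1
```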
1.2.3 Uniform (L∞) Convergence
Definition 1.26. The sequence {fₙ(x)}_{n∈N} converges uniformly on I to the function f(x) if for every ε > 0, there is an N > 0 such that if n ≥ N, then |fₙ(x) - f(x)| < ε for all x ∈ I. We write fₙ(x) → f(x) uniformly on I as n → ∞.
The series Σ_{n=1}^{∞} fₙ(x) = f(x) uniformly on I if the sequence of partial sums s_N(x) = Σ_{n=1}^{N} fₙ(x) converges uniformly to f(x) on I.
Remark 1.27. (a) With uniform convergence, for a given ε the same N works for all x ∈ I, whereas with pointwise convergence N may depend on both ε and x. In other words, uniform convergence says that given ε > 0 there is an N > 0 such that for all n ≥ N, the maximum difference between fₙ(x) and f(x) on I is smaller than ε. Because of this, uniform convergence is also called L∞ convergence. That is, fₙ(x) → f(x) uniformly on I if and only if ||fₙ - f||∞ → 0 as n → ∞.

FIGURE 1.4. Top left: Graph of fₙ(x) = xⁿ on [0, 1) for n = 2, 4, 8. Top right: Graph of fₙ(x) on [0, 1) where fₙ(x) is defined in Example 1.25(b). Bottom: Graph of fₙ(x) on [0, 1) where fₙ(x) is defined in Example 1.25(c).
(b) In Example 1.25(b), the convergence of fₙ(x) to 0 is pointwise but not uniform. This is because the maximum difference between fₙ(x) and the limit function f(x) ≡ 0 is 1 no matter what n is. In other words, ||fₙ - f||∞ = ||fₙ||∞ = 1 for all n, and so ||fₙ - f||∞ does not tend to 0 as n → ∞.
(c) In Example 1.25(c), the convergence of fₙ(x) to 0 is also pointwise but not uniform. In fact, in this case, ||fₙ - f||∞ = ||fₙ||∞ = n for all n. Therefore ||fₙ - f||∞ → ∞ as n → ∞.
In fact, there are no examples of sequences that converge uniformly on an interval but not pointwise. In other words, the following theorem holds.
Theorem 1.28. If fₙ(x) → f(x) in L∞ on an interval I, then fₙ(x) → f(x) pointwise on I.

Proof: Exercise 1.44.  ∎
An important theorem from advanced calculus is the following. Its proof is left as an exercise but can be found in almost any advanced calculus book (for example, Buck, p. 266, Theorem 3).
Theorem 1.29. If fₙ(x) → f(x) uniformly on the interval I, and if each fₙ(x) is continuous on I, then f(x) is continuous on I.

Proof: Exercise 1.45.  ∎
Example 1.30. As an illustration of Theorem 1.29, let
Then each fₙ(x) is continuous on [-1, 1] and {fₙ(x)}_{n∈N} converges pointwise to the function f(x) defined by
which has a jump discontinuity at x = 0 (see Figure 1.5). It can be shown directly that fₙ(x) does not converge to f(x) in L∞ on [-1, 1], but a different argument utilizing Theorem 1.29 would be as follows. If fₙ(x) → f(x) in L∞ on [-1, 1], then since each fₙ(x) is continuous, Theorem 1.29 implies that f(x) should also be continuous. Since this is not the case, the convergence cannot be in L∞.
Example 1.31. (a) The sequence {xⁿ}_{n∈N} converges uniformly to zero on [-a, a] for all 0 < a < 1 but does not converge uniformly to zero on (-1, 1).
(b) The series Σ_{n=0}^{∞} xⁿ = 1/(1-x) uniformly on [-a, a] for all 0 < a < 1, but not on (-1, 1).
(c) The series Σ_{n=0}^{∞} xⁿ/n! = eˣ uniformly on every finite interval I, but not on R.
(d) The series Σ_{n=1}^{∞} cos(nx)/n² converges uniformly to its limit on R by the Weierstrass M-test.
FIGURE 1.5. Left: Graph of fₙ(x) of Example 1.30 for n = 2, 4, 8. Right: Graph of the limit function f(x).

1.2.4 Mean (L¹) Convergence
Definition 1.32. The sequence {fₙ(x)}_{n∈N} defined on an interval I converges in mean to the function f(x) on I if
\[ \lim_{n \to \infty} \int_I |f_n(x) - f(x)|\,dx = 0. \]
We write fₙ(x) → f(x) in mean on I as n → ∞. Mean convergence is also referred to as L¹ convergence because fₙ(x) → f(x) in mean on I as n → ∞ is identical to the statement that lim_{n→∞} ||fₙ - f||₁ = 0.
The series Σ_{n=1}^{∞} fₙ(x) = f(x) in mean on I if the sequence of partial sums s_N(x) = Σ_{n=1}^{N} fₙ(x) converges in mean to f(x) on I.

Mean convergence can be interpreted as saying that the area between the curves y = fₙ(x) and y = f(x) goes to zero as n → ∞. This type of convergence allows point values of fₙ(x) and f(x) to differ considerably but says that on average the functions fₙ(x) and f(x) are close for large n.
Example 1.33. (a) Let fₙ(x) = xⁿ, x ∈ [0, 1) for all n ∈ N. As we have seen in Example 1.25(a), this sequence converges to f(x) ≡ 0 pointwise on [0, 1) but not uniformly on [0, 1). Since
\[ \int_0^1 |f_n(x) - 0|\,dx = \int_0^1 x^n\,dx = \frac{1}{n+1} \to 0 \]
as n → ∞, fₙ(x) → 0 in mean on [0, 1).
(b) Consider the sequence {fₙ(x)}_{n∈N} defined in Example 1.25(b). The sequence converges pointwise but not uniformly to f(x) ≡ 0 on [0, 1]. Since the area under the graph of fₙ(x) is 1/2n for each n, the sequence also converges in mean to f(x) on [0, 1]. In this example, we can see the character of mean convergence. If n is large, the function fₙ(x) is close to the limit function f(x) ≡ 0 (in fact identical to it) on most of the interval [0, 1], specifically on [1/n, 1], and far away from it on the rest of the interval [0, 1/n]. However, on average, fₙ(x) is close to the limit function.
(c) The sequence {fₙ(x)}_{n∈N} defined in Example 1.25(c) tells a different story. The sequence converges pointwise but not uniformly to f(x) ≡ 0 on [0, 1], but since the area under the graph of fₙ(x) is the same for every n, fₙ(x) does not converge to f(x) in mean. The width of the triangle under the graph of fₙ(x) decreases to zero, but the height increases to infinity in such a way that the area of the triangle does not go to zero.
The above examples show that sometimes pointwise convergence and mean convergence go together and sometimes they do not. The proof of the following theorem is left as an exercise (Exercise 1.47).
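Before the theorem, here is a short Python sketch (my own illustration of Examples 1.33(b) and (c), using the piecewise-linear spikes as reconstructed in Example 1.25 above) comparing the areas under the two families of tent functions.

```python
# Numerical illustration (mine, not the text's): the spikes of Examples
# 1.25(b)/(c) both tend to 0 pointwise, but only (b) tends to 0 in the
# L^1 (mean) sense, because its area shrinks like 1/(2n).
import numpy as np

x = np.linspace(0.0, 1.0, 200_001)

def tent(n, peak):
    """Piecewise linear spike supported on [0, 1/n] with height `peak`."""
    up = np.clip(2 * n * x, 0, None) * (x < 1 / (2 * n))
    down = np.clip(2 - 2 * n * x, 0, None) * ((x >= 1 / (2 * n)) & (x < 1 / n))
    return peak * (up + down)

for n in [4, 16, 64]:
    fb = tent(n, 1.0)      # Example 1.25(b): height 1
    fc = tent(n, n)        # Example 1.25(c): height n
    print(n, np.trapz(fb, x), np.trapz(fc, x))   # areas shrink vs. stay constant
```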
Theorem 1.34. If fₙ(x) → f(x) in L∞ on a finite interval I, then fₙ(x) → f(x) in L¹ on I.
Remark 1.35. (a) The conclusion of Theorem 1.34 is false if the interval I is infinite. Consider for example the sequence fₙ(x) = (1/n) χ_{[0,n]}(x). Then fₙ(x) → 0 in L∞ on R but ∫_{-∞}^{∞} |fₙ(x) - 0| dx = 1 for all n, so that fₙ(x) does not converge to zero in L¹.
(b) The converse of Theorem 1.34 is also false, as can be seen by considering Example 1.33(b). In this example, fₙ(x) converges to 0 in L¹ on [0, 1] but does not converge to 0 in L∞ on [0, 1].
(c) In all of the examples of mean convergence considered so far, the sequences have also converged pointwise. Must this always be the case? The answer turns out to be "no," as is illustrated by the following example.
Example 1.36. Define the interval I_{j,k} by I_{j,k} = [2^{-j}k, 2^{-j}(k+1)), for j ∈ Z⁺ and 0 ≤ k ≤ 2ʲ - 1. Let us make some elementary observations about the intervals I_{j,k}.
(a) Each I_{j,k} is a subinterval of [0, 1).
(b) The length of I_{j,k} is 2^{-j}; that is, |I_{j,k}| = 2^{-j}.
(c) Each natural number n corresponds to a unique pair (j, k), j ∈ Z⁺ and 0 ≤ k ≤ 2ʲ - 1, such that n = 2ʲ + k. For each n ∈ N, call this pair (jₙ, kₙ). As n → ∞, jₙ → ∞ also.
(d) For each j, the collection of intervals {I_{j,k}}_{k=0}^{2ʲ-1} forms a partition of [0, 1); that is, the intervals are disjoint and cover all of [0, 1).
Now, define fₙ(x) = χ_{I_{jₙ,kₙ}}(x). Then since |I_{jₙ,kₙ}| → 0 as jₙ → ∞, fₙ(x) → 0 in mean on [0, 1). However, fₙ(x) does not converge to zero pointwise because for every x ∈ [0, 1), there are infinitely many n for which fₙ(x) = 1. Therefore, fₙ(x) does not converge to anything at any point of [0, 1). See Figure 1.6.
FIGURE 1.6. Graph of fₙ(x) of Example 1.36 for 1 ≤ n ≤ 12.
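The enumeration n = 2ʲ + k is easy to play with directly. The following Python sketch (my own; the fixed point x₀ = 0.3 and the range of n are arbitrary) lists the intervals I_{jₙ,kₙ}, their lengths, and the value fₙ(x₀), showing the norm shrinking while the point values keep returning to 1.

```python
# A short check (illustration, not from the text) of Example 1.36: the
# indicators of the dyadic intervals I_{j,k}, enumerated by n = 2^j + k,
# have L^1 norm 2^{-j} -> 0, yet every point of [0,1) lies in infinitely
# many of them, so there is no pointwise limit.
def interval(n):
    j = n.bit_length() - 1          # n = 2**j + k with 0 <= k < 2**j
    k = n - 2**j
    return (k / 2**j, (k + 1) / 2**j)

x0 = 0.3                             # any fixed point of [0, 1)
for n in range(1, 17):
    a, b = interval(n)
    fn_at_x0 = 1 if a <= x0 < b else 0
    print(n, (a, b), "length", b - a, "f_n(0.3) =", fn_at_x0)
```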
1.2.5 Mean-square (L²) Convergence

Definition 1.37. The sequence {fₙ(x)}_{n∈N} converges in mean-square to the function f(x) on an interval I if
\[ \lim_{n \to \infty} \int_I |f_n(x) - f(x)|^2\,dx = 0. \]
We write fₙ(x) → f(x) in mean-square on I as n → ∞. Mean-square convergence is also referred to as L² convergence because fₙ(x) → f(x) in mean-square on I as n → ∞ is equivalent to the statement that lim_{n→∞} ||fₙ - f||₂ = 0.
The series Σ_{n=1}^{∞} fₙ(x) = f(x) in mean-square on I if the sequence of partial sums s_N(x) = Σ_{n=1}^{N} fₙ(x) converges in mean-square to f(x) on I.
The proof of the following theorem is left as an exercise (Exercise 1.48). The proof of the first part is similar to the proof of Theorem 1.34 and the proof of the second part makes use of the Cauchy-Schwarz inequality (Theorem 1.8).

Theorem 1.38. (a) If fₙ(x) → f(x) in L∞ on a finite interval I, then fₙ(x) → f(x) in L² on I.
(b) If fₙ(x) → f(x) in L² on a finite interval I, then fₙ(x) → f(x) in L¹ on I.
Remark 1.39. (a) The conclusion of Theorem 1.38(a) is false if L∞ convergence is replaced by pointwise convergence. Example 1.33(c) shows a sequence that converges to zero pointwise on [0, 1] but not in L¹ on [0, 1]. By Theorem 1.38(b), the sequence does not converge in L² either, for if it did, then Theorem 1.38(b) would imply that it also converged in L¹.
(b) The conclusion of Theorem 1.38(a) is false if I is an infinite interval. For example, if fₙ(x) = (1/√n) χ_{[0,n]}(x), then fₙ(x) → 0 uniformly on R, but ∫_{-∞}^{∞} |fₙ(x) - 0|² dx = 1 for all n, so that fₙ(x) does not converge to 0 in L².
(c) The conclusion of Theorem 1.38(b) is false if I is an infinite interval. For example, if fₙ(x) = (1/n) χ_{[0,n]}(x), then since
\[ \int_{-\infty}^{\infty} |f_n(x)|^2\,dx = \frac{1}{n^2}\cdot n = \frac{1}{n} \to 0 \]
as n → ∞, fₙ(x) → 0 in L² on R, but
\[ \int_{-\infty}^{\infty} |f_n(x)|\,dx = \frac{1}{n}\cdot n = 1 \]
for all n so that fₙ(x) does not converge to 0 in L¹ on R.
(d) The converse of Theorem 1.38(a) is false. To see this, consider Example 1.33(b). In this example, it can be shown directly that fₙ(x) → 0 in L² on [0, 1] (Exercise 1.49). However, fₙ(x) does not converge to 0 in L∞ on [0, 1].
(e) The converse of Theorem 1.38(b) is false. To see this, let fₙ(x) = √n χ_{(0,1/n]}(x). Then
\[ \int_0^1 |f_n(x)|\,dx = \sqrt{n}\cdot\frac{1}{n} = \frac{1}{\sqrt{n}} \to 0 \]
as n → ∞ so that fₙ(x) → 0 in L¹ on (0, 1]. However,
\[ \int_0^1 |f_n(x)|^2\,dx = n\cdot\frac{1}{n} = 1 \]
for all n so that fₙ(x) does not converge to 0 in L² on (0, 1].
(f) Finally, note that Example 1.36 shows that L² convergence does not imply pointwise convergence since the sequence defined there also converges to 0 in L² on [0, 1).
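The arithmetic behind Remark 1.39(e) is simple enough to verify by hand; the tiny Python sketch below (my own illustration) just tabulates the two norms.

```python
# Small check (illustration) of Remark 1.39(e): f_n = sqrt(n) on (0, 1/n]
# has L^1 norm 1/sqrt(n) -> 0 while its L^2 norm stays equal to 1.
import math

for n in [1, 10, 100, 10_000]:
    l1 = math.sqrt(n) * (1.0 / n)          # integral of |f_n|
    l2 = math.sqrt(n * (1.0 / n))          # square root of integral of |f_n|^2
    print(n, l1, l2)
```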
1.2.6 Interchange of Limits and Integrals

A problem that we will encounter frequently in this book is the following. Suppose that a sequence of functions {fₙ(x)}_{n∈N} on an interval I converges in some sense described in one of the previous four subsections to a function f(x) on I. Under what conditions is it true that
\[ \lim_{n \to \infty} \int_I f_n(x)\,dx = \int_I f(x)\,dx\,? \]
Since we can write f(x) = lim_{n→∞} fₙ(x), the above can be rewritten as
\[ \lim_{n \to \infty} \int_I f_n(x)\,dx = \int_I \lim_{n \to \infty} f_n(x)\,dx, \]
and this problem is often stated as: When can we exchange the limit and the integral?
The most typical form in which this problem arises is when the sequence is a sequence of partial sums of a series of functions. In this case, the equivalent question is: When can we integrate a series of functions term-by-term? To see this, recall that the integral of a finite sum of functions is the sum of the integrals, so that if s_N(x) = Σ_{n=1}^{N} fₙ(x), then
\[ \int_I s_N(x)\,dx = \sum_{n=1}^{N} \int_I f_n(x)\,dx. \]
If we could interchange the limit and the integral in this case, we would have
\[ \sum_{n=1}^{\infty} \int_I f_n(x)\,dx = \lim_{N \to \infty} \int_I s_N(x)\,dx = \int_I \lim_{N \to \infty} s_N(x)\,dx = \int_I \sum_{n=1}^{\infty} f_n(x)\,dx. \]
The following theorem gives several conditions under which interchanging the limit and the integral is permitted.

Theorem 1.40. (a) If fₙ(x) → f(x) in L¹ on I, then
\[ \lim_{n \to \infty} \int_I f_n(x)\,dx = \int_I f(x)\,dx. \]
(b) If fₙ(x) → f(x) in L∞ on a finite interval I, then
\[ \lim_{n \to \infty} \int_I f_n(x)\,dx = \int_I f(x)\,dx. \]
(c) If fₙ(x) → f(x) in L² on a finite interval I, then
\[ \lim_{n \to \infty} \int_I f_n(x)\,dx = \int_I f(x)\,dx. \]

Proof: (a) Let fₙ(x) → f(x) in L¹ on I. Then
\[ \left| \int_I f_n(x)\,dx - \int_I f(x)\,dx \right| \le \int_I |f_n(x) - f(x)|\,dx = \|f_n - f\|_1 \to 0. \]
(b) By Theorem 1.34, if fₙ(x) → f(x) in L∞ on I, then it also converges in L¹. Then the result follows from part (a).
(c) By Theorem 1.38(b), if fₙ(x) → f(x) in L² on I, then it also converges in L¹. Then the result follows from part (a).  ∎
If I is an infinite interval, then the conclusions of Theorems 1.40(b) and 1.40(c) are both false, as can be seen by considering the example given in Remark 1.39(c). In this example, fₙ(x) → 0 both in L∞ and L² on [0, ∞). However, since ∫_I fₙ(x) dx = 1 for all n,
\[ \lim_{n \to \infty} \int_I f_n(x)\,dx = 1 \ne 0 = \int_I f(x)\,dx. \]
However, in the case of infinite intervals, we can prove a useful theorem by making an additional assumption on the sequence {fₙ(x)}_{n∈N}.
Theorem 1.41. Suppose that for every R > 0, fₙ(x) → f(x) in L∞ or in L² on [-R, R]. That is, for each R > 0,
\[ \lim_{n \to \infty} \int_{-R}^{R} |f_n(x) - f(x)|\,dx = 0. \]
If f(x) is L¹ on an interval I and if there is a function g(x), L¹ on I, such that for all x ∈ I and all n ∈ N, |fₙ(x)| ≤ g(x), then
\[ \lim_{n \to \infty} \int_I f_n(x)\,dx = \int_I f(x)\,dx. \qquad (1.7) \]

Proof: If I is a finite interval, then there is nothing to do by Theorem 1.40(b) and (c), so we may assume that I is infinite, and for convenience we will take I = R. By Theorem 1.40(a), it will be sufficient to prove that fₙ(x) → f(x) in L¹ on R.
Let ε > 0. Since f(x) and g(x) are L¹ on R, by Theorem 1.5, there is a number R > 0 such that
\[ \int_{|x| > R} |f(x)|\,dx < \epsilon/3 \quad\text{and}\quad \int_{|x| > R} |g(x)|\,dx < \epsilon/3. \]
Therefore, using the triangle inequality for the L¹-norm (Exercise 1.18(d)),
\[ \int_{-\infty}^{\infty} |f_n(x) - f(x)|\,dx \le \int_{-R}^{R} |f_n(x) - f(x)|\,dx + \int_{|x| > R} |f_n(x)|\,dx + \int_{|x| > R} |f(x)|\,dx \le \int_{-R}^{R} |f_n(x) - f(x)|\,dx + \int_{|x| > R} g(x)\,dx + \int_{|x| > R} |f(x)|\,dx. \qquad (1.8) \]
By Theorem 1.34 and Theorem 1.38(b), if fₙ(x) → f(x) in L∞ or L² on [-R, R], then it also converges in L¹ on [-R, R]. That is,
\[ \lim_{n \to \infty} \int_{-R}^{R} |f_n(x) - f(x)|\,dx = 0. \]
Hence, there is an N such that if n ≥ N, then
\[ \int_{-R}^{R} |f_n(x) - f(x)|\,dx < \epsilon/3. \]
Therefore, if n ≥ N, then
\[ \int_{-\infty}^{\infty} |f_n(x) - f(x)|\,dx < \epsilon, \]
and (1.7) follows.  ∎
Next we present a variant of Theorem 1.41.
+ f (x) in L" o r in L~ and a n N E N such that for all
Theorem 1.42. Suppose that f o r every R > 0, f,,(x) o n [-R, R]. If for every n 2N,
t
> 0 , there
is an R
>0
Then
Proof: The proof is the same as that of Theorem 1.41, except that we choose R > 0 and N E N such that for all n 2 N.
Then (1.8) hecomes
from which (1.9) follows.
Exercises Exercise 1.43. Prove each of the statements made in Example 1.25. Exercise 1.44. Prove Theorem 1.28. Exercise 1.45. Prove Theorern 1.29. Exercise 1.46. Prove each of t,he claims made in Example 1.31. Exercise 1.47. Prove Theorern 1.34. Exercise 1.48. Prove Theorem 1.38.
1.2. Convergence of Sequences of Functions
25
Exercise 1.49. Prove that if f,(x) is defined as in Example 1.33(b),then f,(x) 4 0 in L~ on [ O , l ] . Exercise 1.50. (a) A sequence of functions { f , ( ~ ) ) ,defined ~ ~ on an interval I is said to be unzformly Cauchy on I if for every E > 0, there is an N > 0 such that if n, m 2 N then 11 f n - f,lI, < E . Prove that any sequence that converges in L" on I is uniformly Cauchy on I.
(b) A sequence of functions { f n ( x ) j n E Ndefined on an interval I is said to be L1 Cauchy on I if fur every E > 0, there is an N > 0 such that if n , m 2 N , then 11 f, - f,lll < E for all x E I. Prove that any sequence that converges in L1 on I is L1 Cauchy on I . (c) A sequence of functions { f n ( x ) j n E Ndefined on an interval I is said to be L2 Cauchg on I if for every E > 0, there is an N > U such that if n , m 2 N , then I(f , - f, ( I z < E for all x E I. Prove that any sequence that converges in L2 on I is L~ Cauchy on I.
Chapter 2 Fourier Series 2.1 Trigonometric Series 21.1
Periodic Functions
A fur~ctionf (x) defined o n R has period p > 0 Lf f ( r+ p ) = f (x) for all x E R. Such a function is said t o be periodic.
Definition 2.1.
Remark 2.2.
(a) Tlie functions sin(x) arid cos(x) have periocl 2n. The functions sin(ax) and cos(a,z), a > 0. have period 27rla. (b) If f (x) has period p > 0. it also lias period kp. for k E N. IIeilce a periodic function car1 have many periods. Typically the sirlallest period of f (x) is referred to as the pernod of f ( x ) .
Definition 2.3.
Given a function f(z)on R. n.n,d periodization o f f (x) is defined as the fun,ction
0,
number. p
> 0 , the
p
provzded that the sum makes sense. See Figure 2.1
Remark 2.4. (a) It is easy to verify that in fact the function f,,(x) lias period p by nlakirig a change of sunimatiori index in the sun1 on the right side of (2.1). Specifically,
where we have made the change of suinnlatiorl index n e n + 1.
(b) If f (z) is conlpactly supportcd, then the surri in (2.1) will converge poiritwise on R. This is because for each x the slirn will have only finitely riiaiiy terms. (c) If f (x) is supported in an interval I of length p. then f,(a.) is referred to as the period p extension of f (x). This is because for z E I. f,(x) = C,LEz f ( x + np) = f (x) since all terms in the sum besides the n = 0 tern1 are zero. (Whys?)Another way of thinking of this is that we a,re taking ilifi~iitel~ Inany copies of the fuiictiorl f (.c) and placing thein side-by-side on the real line.
28
Chapter 2. Fourier Series
FIGURE 2.1. Top Left: Graph of f (x). Top Right: Graphs of f (z + n p ) for -2 n 2 and p = 1. Bottom: Graph of the 1-periodization of f (z).
< <
Definition 2.5.
Given a
> 0,
the collection of functions (2.2)
{e2711r'T'a )nt~
ss called the (period a ) trigonometric system.
Remark 2.6.
+
(a) Recall Euler's formula: ei.' = cos(x) i sin(x). This formula can be proved by expariding both sides of the equation in a Taylor series (Exercise 2.20). Therefore
and it follows from this that each element in the trigonometric system has period a.
(b) The period a trigonometric system is sometimes given in the form
Systems (2.2) and (2.3) can be obtained from each other by forming simple
2.1. Trigonometric Series
29
linear combinations. Specifically, for n E Z,
and for n E N,
and
(c) A function that can be written as a finite linear combination of elements of the (period a) trigonornctric system is called a (period a ) t r i g o n o m e t ~ i c polynomial. That is, a trigonometric polynorrlial has the forni
for some h1,N E Z and some coefficients c ( n ) .
Theorem 2.7. T h e period a trigonometric s y s t e m ( 2 . 2 ) satisfies the following orthogonality relations:
Proof: Exercise 2.22. Remark 2.8. Note that since the functions e2"inxl" a11 have period a. the integral in (2.4) can be taken over any interval of length a. For example,
A fundamental problem in Fourier series is the following: Given a function f ( x ) with period a
> 0. can we write
,for some choice of coeficients { ~ ( n ) )? , , ~ ~ This problem leads t o three related questions that will be answered in the following subsections:
30
Chapter 2. Fourier Series
( a ) I n order ,for. (2.5) to hold, what m u s t the coeficients c ( n ) be? (b) Assuming we know the answer to question (a), in what serlse doea the s e ~ i e so n the right side of (2.5) converge? (c) Assurniny we know the answers to que.stions ( a ) and ( b ) ,does the series o n the right of (2.5) converge to f (x),or to some other funct,io*n,?
2.1.3 Let
The Fourier Coeficients
11s begin
hy answering question (a) above.
Definition 2.9.
G i v e n a function f ( x ) w i t h period a , the Fourier coefficier~ts o f f ( z ) are defined by
provided that those zntegrals m a k e sense. For example. i f f ( - r ) zs L' on [O. a ] , t h e n the integral in (2.6) converges for each 7 1 .
Remark 2.10. Tlie definition of the Fourier coefficients of a function f (x) is by no means arbitrary. In fact we are essentially forced to define them that way by the followilig a r g u m ~ n l . Sr~ppusethat in fact f(x) = C r l E c(7L) Z e2ai,1x/u. T1lei1 in light of' Theorern 2.7. for rrr 6 Z fixed,
since by (2.4), the only nonzero terni in t,he sum is the rz = m term. Note that the above argument is not a rigorous proof since we interchanged an integral and an irifiriite surn without having any idea liow or even if the slim converged. However. the argument is sufficient motivation for defining the Fourier coefficients as in Definition 2.9.
Definition 2.11.
G w e n a function f ( z ) with period a , L' o n [O, a ] , the Fourier series associated witahf ( z ) is defined as the formal series
where the c ( n ) are defined b y ( 2 . 6 ) . W e refer to (2.7) as a 'fformal series" since w e d o n o t yet know how or i f the series converges. W e write
2.1. Trigonometric Series
31
Remark 2.12. It is possible to rewrite the Fourier series of a function in terms of the real trigonometric system defined by (2.3). To see this, note that
Conversely, a series of the form
can be rewritten as
where
Example 2.13. (a) Let f (x) be the period 2 extension of the f ~ ~ n c t ~ i o n X,-1/2,1/21 ( 2 ) .The Fourier coefficients of f (x) are
=
i
0 if n is even, n # 0, 1 -( - I ) ( ~ - ' ) / ~ if n is odd, nn 1 if n = 0. 2
32
Chapter 2. Fourier Series
The Fourier series associated to f ( x ) is
See Figures 2.2 and 2.3.
(b) Let f ( x ) be the period n extension of the function x X(,,,)(x). Then
and for n
# 0,
Therefore,
f(x,
-.+,
i
C I,
e2inx =
~ E Z
-
C sin (n2 n x ) LEN
See Figure 2.4. (c) Let f ( x ) be the period n extension of the function x X(-,/2,,/2)(x). Then c(O) = 0, and for n # 0, c(n,)= ( - l ) n1;n/2n8, so t,ha,t,
(d) Let f ( x ) be the period 2n extension of the function 1x1 Xc-,,,,
1 . 4
( 2 ) .Then
Convergence of Fourier Series
Definition 2.14. A function f (x) o n a finite inten)u,l I i s piecewise differentiable o n I i f ( a ) f (x)i s piecewise continuous o n I with only j u m p discontinuities (zf a n y ) , ( b )f f ( z ) exists at a21 but finitely m a n y points in I and ( c ) f ' ( z ) i s piecewise continuous o n I with only jump discontinuities (zf a n y ) . A function f (z)i s piecewise dzfferentiable o n a n infinite interval I if i t is piecewise differentiable o n every finite subinterval of I .
2.1. Trigonometric Series
33
FIGURE 2.2. Top left: Graph of f (z) from Example 2.13(a). Top right: Graph of Fourier coefficients of f (z).Bottom lcft: Graph of f (z) from Example 2.13(b). Bottom right: Graph of absolute value of Fourier coefficients of f (z) .
Example 2.15. on I .
(a) Any function C1 on I is also piecewise differentiable
(b) If I is any finite interval, then the function Xr(x) is piecewise differentiable on any i~ltervalJ with I C J . (c) The tent function Bl(x) is piecewise differentiable on R because it is linear on the intervals (-m, -I), (-1, O), (0, I ) , and (1, GO).
(d) Any piecewise polynomial function is piecewise differentiable on R. The following convergence result is due to Dirichlet.'
Theorem 2.16.
>0
and is piecewise diflerentiable on R . T h e n the sequence of partial sums of the Fourier series (Dirichlet) Suppose that f (x)has perzod a
'The proof of Theorem 2.16 will not be given here but can be found for example in Walker, Fourier Analysis, Oxford University Press (1988), p. 19 (Theorem 4.5) and p. 48ff.
34
Chapter 2 . Fourier Series
of f (z),{SN( ~ ) } N E Nwhere ,
-
converges pointwise to the function f (z),where
FIGURE 2.3. Partial sums S N ( X )of the Fourier series f (z) from Example 2.13(a). Top left: A; = 10, top right: N = 20, bottom: N = 60.
-
Note that F ( a ) = f ( a ) if f (z) is continuous at x = a and that f ( a ) is the average value of the l e f t and right-hand limits of f ( x ) at x = a when f ( x ) has a jurrlp discontinuity. If we assume that f (z) has 110 discontinuities, then we can make a stronger statement as in the following heo or em.^
Theorem 2.17.
Suppose that f (z) has period a
>
0 and is continuous and
2 ~ h proof e of Theorem 2.17 can be found in Walker, Fourier Analysis, Theorem 4.4, p. 59.
2.1. Trigonometric Series
35
FIGURE 2.4. Partial sums S N ( Z )of the Fourier series f(x) from Example 2.13(b). top left: N = 10, top right: N = 20, bottom: N = 60.
piecewise dzfferentiable o n R. T h e n the sequence of partial s u m s S N ( X )gzuen b y (2.8) converges t o f ( z ) i n L" o n R.
What if the function f (x) is continuous but not piecewise differentiable? What can be said about the convergence of the Fourier series of such a function? It is by no means obvious that such functions exist, but they do. The most famous example is due to Weierstrass, who constructed a function continuous on R but not differentiable at any point of R. This function is defined by f (x) = CrLCN 3-n ~ o s ( 3 ~ xThe ) . Weierstrass AT-test can be used t o show that this function is continuous, but the proof that it is nowhere differentiable is hard.3 By the Weierstrass M-test, the Fourier series of the Weierstrass function converges uniformly on R. However, this is not the case for all periodic functions, continuous on R. The following theorem is due to ~ u ~ o i s - ~ e ~ r n o n d . ~ 3An example of a continuous, nowhere differentiable function similar t o the Weierstrass function, together with a very readable proof, can be found in Korner, Fourier Analysis, Cambridge University Press (1988). Chapter 11. 4 ~ x c e l l e n texpositions and proofs of this theorem can be found in Korner, Fourier Analysis, Chapter 18. and also in Walker, Fourier Analysis, Appendix A.
36
Chapter 2. Fourier Series
Theorem 2.18.
(DuBois-Reyrnoud) There exists a function f ( : E ) continuous on. R. n.nd e~rifhperiod 27r .ssuch that the Fourier serzes o f f (x) diucrgcs at .r = 0 . T h a t is, lim~,,
SN(O)does not exist where S N ( X )is given by (2.8).
In fact. it is possible to find a continuous, period 27r function whose Fourit>r s e r i ~ sdiverges at every rational nlultiple of 27r.5 Therefore, it is iinpossil~leto make the st,ateiiieilt that the Fourier series of every coiltiliuous fuilctioii coilverges pointwise t o tliat function. The ~ i e x ttheoreill, Theorem 2.19, is due to Fejitr and makes a geilera,l st,ate~nentahout the convergence of the Fourier series of a continllolls function. The idea behind Fejbr's Thcorcrn is tlie following. Instead of looking at tlw part,iixl sums (2.8), coilsider the arithmetic m e a n s of those part)ial suins; that is. coi~sicterthe sequence
It is oftell the case that when the corivergence of a sequence fails due t o oscillatioli ill the terms of the sequence, the arithmetic rrleans of the sequence will have better convergerlee behavior. Take the simple exariiple of the sequeiice { ( ~ ( n ) } ,where ~ ~ . a ( n ) = (- 1 ) " . Clearly lim,,, a ( n ) does not exist hecause the t,erms sirrlply oscillate back and forth between 1 and -1. Homrever. if mre coiisider the sequence of aritllrrletic means, { ~ ( T z ) } ~ ~ ~ . given 1 ) ~ .
so that linl,,, ~ ( 7 1= ) 0 (Exercise 2.25). If t,he original sequence { ~ ( n , ) } ,already , ~ ~ converges. taking the arithmetic means will not affect the convergence; that is, if lim,,,, a ( n ) = a. then also lim,,,, g ( n ) = a (Exercise 2.26). (Ft:j&r'sTheorerri) Let f (x) be a f7~nctionwith period n > O con,tiri,vovs o n R. and define for each 71, E N the functior~a ~ ( x by ) (2.9), where Sl;(cr) i s give77, by (2.8). T11,enCT.~T (z)converges uniformly to f (x) o n R as N -+ GO.
Theorem 2.19.
"bValkcr. Fourier .Arlal,ysis, Xppcrltlix A.
37
2.2. Approximate Identities
Exercises Exercise 2.20.
Prove Euler's formula: For every x E R, em
=
cos(x) +
i sin(x).
Exercise 2.21.
Prove that for every real nilmher o,,
Exercise 2.22. Prove Theorem 2.7. Exercise 2.23.
Prove each of the statements made in Remark 2.12.
Exercise 2.24.
Prove each of the statements made in Example 2.13.
Exercise 2.25.
Show that if a ( n ) = n
1 ~ ( n =) - x a ( k ) = n
n E N, then 0 -l/n
k=l
Exercise 2.26.
Show that if lini,,, where a ( n ) is given by 2.10.
2.2
a(n)
if n is even, if n is odd.
=
a , then liin,,,
a(n)
=
a,
Approximate Identities
The notion of an approximate identity or summability kernel is used extensively in all branches of analysis. The idea is t o make precise the notion of a "delta funct.ionn that is well known and widely used by physicists. engineers, and mathematicians. The delta function, 6(z),has the property that for any continuous function f (x),
or more generally,
f (t) d(x
-
t) d t = f (4
for. every z E R. From sorrle elernenlary considerations he reader. may fill in the details), any function b(t) satisfying (2.11) must satisfy, b(t) = 0, t
#0
and
b(t) dt = 1.
38
Chapter 2. Fourier Series
It is impossible for any ordinary function to satisfy these conditions since the Riemann integral of a function, f (z), vanishing at every 17: # 0 must be zero. This must be true even under more general definitions of the integral (such as the Lebesgue integral). Therefore, 6(t) is not an ordinary function. So the question remains: How are we t o make sense of this concept? There are two ways to do this.
1. Extend the definition of function. This has been done by L. Schwartz who defined the notion of a distribution or generalized f u n ~ t i o n . ~ 2. Approximate the delta by ordinary functions in some sense. This more elementary approach has its natural completion in the theory of distributions alluded to above, but can be understood without any advanced concept,s. The idea is t o replace the single "function" d(t) by a collection of ordinary functions {KT(t)},>" such that for every continuous function f (z),
and more generally,
where the limit is interpreted in some sense and described in Section 1.2. The purpose of this section is to explain the theory of approximate identities.
2.2. I
Motivation from Fourier. S e ~ i e s
In order t o further motivate the notion of an approxirrlate idenlily, let us consider how one might prove Theorems 2.16 and 2.19.
Definition 2.27. For each lc E N , and a > 0, define the Dirichlet kernel Uk (x) b y k
D&)
=
C
,2nimx/a.
(2.12)
m=-k
See Figure 2.5. good exposiliorls uf this theory can be found in Horvath, A n introduction t o distributions, The American Mathematical Monthly, vol. 77 (1970) 227-240, and Benedetto, Harmonic Analysis and Applications, CRC Press (1997).
2.2. Approximate Identities
Theorem 2.28. For each k
E
39
N , and a > 0, the Dirichlet kernel, D k ( x ) , can
be written as Dk(x)=
+
sin(-ir(k l ) x / a ) sin(~x/a) '
and for any period a function f ( x ) ,
Proof: Eqlxakion (2.13) is a,n exercise (Exercise 2.38) and reqi~iresonly t,he formula for summing a geometric series. As for equation (2.14),
Sk ( r )
=
C c ( n )e2K"x/U
The result follows by making the change of variables t F+ x - t in the above int,egral a.nd remembering t,ha.t bot,h Dk (a)a.nd f (z) ha.ve periocl a.
Definition 2.29.
For each n E N , and a > 0, define the Fejhr kernel F,(x)
bY n- 1
CD~(X).
1 ~ ~ (= x ) k=O
(2.15)
See Figure 2.6.
Theorem 2.30. For each n E N , and a > 0 , the Fej6r kernel, F,(x),can he written as
and for any period a function f ( x ) ,
:La
a ( n )( x ) = -
f
(X -
t ) F,(t) d t .
Proof: Equation (2.16) is an exercise (Exercise 2.39) and requires only the formula for summing a geometric series and some manipulation.
40
Chapter 2. Fourier Series
FIGURE 2.5. The Dirichlet kernel D k ( x ) (2.14) for a
= 1.
Equation (2.17) is also a,n exercise (Exercise 2.40), ancl the derivation is similar to (2.14).
17
From Theorems 2.28 and 2.30, we see that the proofs of Theorems 2.16 (Dirichlet) arid 2.19 (Fejer.) amoul~tto sliowing that
pointwise for every period a f~inct~ion f (z),piecewise differentiable on R, ancl
in LDO on R for every period a function f (x) continuous on R. Such convergence results depend on properties of the sequences { D k ( ~ ) ) k E and N { p ? 2 ( ~ ) ) 7Consideration 26N. of tJhe required properties of these sequences leads to the notion of an approximate identity or summability kernel.
2.2.2 Definition and Elcamples Definition 2.31. A collection of functions {Kr(x)),>o o n a n interval I
=
( - a , a ) ( a = m is permitted) is a n approximate identity or a summability kernel
2.2. Approximate Identities
20
i
i
i
i
i
i
i
i
i
'
O
i
i
i
i
i
I s . ~ ~ j ~ - - j - - - ~ - - i - - - ; - - - ~ - - j - - - ; . . . ~,s-.-j---i-..;..;...j...;.. ..l l l l 16--.]---+...t..]...i...t.--
l
l
I I--
I
I
I
I
I
I
I
I
I
I
I
I
I
I
l
l
l
I
l
l
l
6~~~d~-~l...L..A...l..-l~~~l...l...l...
- 0 5 0 . 4 - 0 3 -0.2 -0.1
0
I 1
l
l
l
01 02 03 04 05
I
I
I
I
I
-05 - 0 4 0 . 3 - 0 2 - 0 1 0
i I
01
i
A
;' :1 . -
i --
41
i
-
-. .
0.2 0.3 0 4 0 5
FIGURE 2.6. The Fej6r kernel F,,(x) (2.14) for
a =
1.
on I if the following conditions hold. (a,) For all
7
> 0,
L
KT (x)d z = 1 .
( b ) There exists M > 0 such that for all
7-
> 0,
S_:
/ K T ( x )dl x 5 M
( c ) For every 0 < 6 < a ,
IKT( x ) 1 d x = 0.
lim ~--t'+.
S
Example 2.32. (a) For 0 < r < 1, let K T ( x ) = 1 / ( 2 r )X(-,,,I. { K T(x))O
Then
Then {h',(x))o<,
( c ) More generally, let g ( x ) be any function L1 on R and supported in (-a, u ) ( u = oo is permitted) such that
42
Chapter 2. Fourier Series
For 0 < 7 < 1, let KT(x) = (117) y ( x / ~ ) Tlier~ . {KT(x))o
7
> 0, let K T ( x )= 7
sin(nx/ ( r a ) ) sin(nx/a)
X(-a,a)
(4-
Tlien { I ( , (x)),,~ is an approximate iclentity oil (-a, a). 111 fact, Kll, (x) = F, (x) ?C(-,.) (x), where F, (z) is the Fej6r kernel defined by (2.16). (e) For
T
> 0, let
Then Kllk(z)= D k ( x )X(-,.,) (x), where D k ( x ) is the Dirichlet kernel defined by (2.13). However, {KT(z)),>o is not, a.n a.pproximate identity on (-a, a ) .
(f) For each r
> 0, let
-1
a,
=
[~:,(1
-
t2)'11 dt]
, and let
Then {K,(z)),>o is an approximate identity on (-1,l). Define L, (n;) = Kll,,(x). Then the sequence { L , ( X ) ) , ~ is ~ referred to as the Landau kernel.
2.2.3 Convergence Theorems The idea behind approximate identities is to investigate the validity of the statement f ( t ) K T ( x- t) dt = lim f T ( x ) = f (x).
,to
Proving (2.18) will allow 11s t,o a,pproxima,te a,rbitrary functions f (n;) by functions of the form f, (x) that often can be made to have desirable properties (see Corollary 2.37). In this subsection, we will consider mostly approximate identities on R as those will be most useful t o us later in the book. Let us first consider pointwise approximation.
Theorem 2.33. Let f (x) be L"
o n R and continuous at the point x = a . Suppose that {K,(x)),>o i s an approximate identity o n R. T h e n f ( t )K T ( a - t ) dt = f ( a ) .
(2.19)
2.2. Approximate Identities
43
Proof: Note first that by a change of variables,
f (t) KT( a - t ) dt =
f (a - t ) KT (t) dt.
By Definition 2.31(a),
Therefore, for ally umber 6
> 0,
Let t > 0. Since f ( x ) is continuous at x = a , there is a 6 > 0 such that if It1 < 6, then If (a) f ( a t)l < c/(2116), where 116 is the upper bound in Definition 2.31 (b). Hence for such a delta, -
Since f (z)is L"
on R,
-
If
(y)l
< 11 f /lo
for all t E R. Hence, for any 6 > 0,
for all y E R. Therefore,
Chapter 2 . Fourier Series
44
as
T
-+ of
0
by Definition 2.31(c). Therefore, there is a then
Hence for 0
TO
> 0 such
that if
< r < 70,
and (2.19) follows. Note that if K T ( x ) = (117)g ( x / r ) for some function g ( x ) , L1 on R and compactly supported, then the assumption that f (x) is L" on R is unnecessary. It would only be necessary t o assume that f (x) is bounded on some small interval containing x = a , which follows from the continuity of f ( T ) at n: = a (Exercise 2.42). Now let us consider uniform convergence. Theorem 2.34.
T,et f(z) he L" and unzformly continuous on R . Suppose is an approximate identity o n R. T h e n that {K,(X)),>~
Proof: Making the same estimate as in (2.20), it is sufficient t o show that given r > 0, there is a 6 > 0 and an TO > 0 such that if 0 < T < TO, then
f (x) f (X t ) / -
-
IKT(t)I
dt
+
.I
If
(z)
-
f (X - t)l lKT(t)l df
< E.
It126
(2.22) As for the first term in (2.22), the uniform continuity of f (x) implies that there is a 6 > 0 such that if It1 < b , then for all x E Ft, f (x) - f (x - t)l < € / ( 2 M ) , where M is the bound in Definition 2.31(b). Hence, a.s in tJheproof of Theorem 2.33,
for all r > 0. As for the second term in (2.22), since f (x) is L"
on R,
2.2. Approximate Identities
which converges to 0 as r that if 0 < r < T O , then
45
+ 0+ for any 6 > 0. Hcncc thcrc is a TC,> 0 such
for all z E R, and (2.21) follows. The condition that f (z)is uniformly continuous on R is satisfied if for example f (z) is C: on R. More generally, it is satisfied if f (z) is Co on R and liml,l,, f (x) = 0 (Exercise 2.43). Lemma 2.35 (which we state without proof) establishes a very important property of functions referred to as the continuity of translation for L1 and Lqunctions. This means that the translate of an L1 or L2 function remains very close to the original function provided that the translation is small enough. Here closeness is measured in the sense of the L' or L2 norm. Continuity of translation is used to prove L1 and L2 convergence using approximate identities.
Lemma 2.35.
(Continuity of Translation) Suppose that f (x) is piecewise con-
tznuous o n R . T h e n the followzng hold.
(a) I f f (z)i s L1 o n R, then
(b) I f f ( z ) is L2 o n R, then
f%Lf(x)-
f ( x - t ) 2 dz=O.
(2.24)
As for L1 or L' convergence in (2.18), the following theorem (which we state without proof) holds.
Theorem 2.36. (a) Suppose that f ( x ) is L1 o n R, and that {K,(x)),,o identity o n R. T h e n
zs a n approximate
r
( b ) Suppose that f ( z ) is L2 o n R, and that {K,(X)),>~is a n approzimate identity o n R. T h e n
46
Chapter 2. Fourier Series
FIGURE 2.7. Illustration of Lemma 2.35. Top Left: Graph of function f (x). Top Right: Graph of f (x) and f ( x - t ) where t = 1/16. Bottom: Graph of If (4 - f ( x - t)l. Area of shaded region is small for small t.
The following approximation theorein is a consequence of Theorein 2.36. Corollary 2.37.
( a ) Let f (x) be L' o n R, and let R such that
E
> 0. T h e n there i s a function g(x), C: o n
llf ( b ) Let f (x) be L~ o n R, and let R such that
E
-
9111 < E .
> 0. T h e n there i s a function g(x), C: o n
Ilf
-
9/12 < t.
Proof: (a) By Theorem 1.5, there is a compactly supported function h(x), L1 on R, such that 11 f - hill < €12. Now, let
> 0 (SCC
Examplc 2.32(b)). Thcrl { K T ( ~ ) ) T > isoan approximatc idcntity on R. By Theorem 2.36(a), T
lim h, (x)= lim r+o+
7+0+
h(t)K T ( x - t)dt = h(x)
47
2.3. Gcncralizcd Fourier Series
in L1 on R. Hence there is a 70 > 0 such that Ilh, - hill < ~ / 2 .Let g(x) = h, (x). That g ( x ) is compactly supported follows from Exercise 3.25, and that g (x) is C0 on R follows from Theorem 3.18. The proof of (b) is similar (Exercise 2.44).
Exercises Exercise 2.38. N-1 r n
'1
CrL=O
-
Prove equation (2.13). (Hint: Recall that for any number
1-r N
~
-
1
Exercise 2.39.
Prove equation (2.16).
Exercise 2.40.
Prove equrttiorl (2.17).
Exercise 2.41.
Prove each of the statements made in Example 2.32.
Exercise 2.42. Prove that if f (x) is continuous at x = a:then there is a > 0 such that If (xjl 5 M for all x E [a - b, a 61. b > 0 and a number
+
Exercise 2.43. (a) Prove that i f f (x) is C: on R, then f (x) is uniformly continuous on R. (b) Prove that if f ( x ) is C0 on R and limlslj30 f (x) uniformly contir~uouson R.
=
0, then f (z) is
Exercise 2.44. Prove Corollary 2.37(b).
2.3 Generalized Fourier Series 2.3. I
Orthogonality
Definition 2.45.
A collection of functions { g n ( ~ ) ) , E L~ ~ , o n a n interval I is a (general) orthogonal system o n 1 provided that (a) / gI n ( X ) S r n ( x ) d x = O L f n j i m , and
Part ( b ) says i n particular that none of the gn(x) can be zdentically zero. Th.e collection { ~ , ( x ) ) , ~ Nis a (general) orthonormal system o n I provided t l ~ u ti L ,is ur1 orthogonal system o n I and
48
Chapter 2. Fourier Series
It is not nccessary that the set {gn(x)) be indexed by N, and in fact we have seen an example (the trigonometric system) that is indexed by Z. In all future examples, the index set will either be specified or will be clear from the context. Whenever a generic system of functions is considered, the index set will be assumed t o be N.
Remark 2.46.
(a) Any orthogonal system can be normalized so that it becomes an orthonormal system. Tha,t, is, if {gn(.x)) is a11 orthogonal system, then we may define the functions
Then the system {&(x)) is an orthonormal system. (b) The Cauchy-Schwarz inequality guarantees that each of the integrals in Definition 2.45 exists as a finite number. That is, since f (x) and g(x) are L2 on I,
(c) Throughout t,he hook, we will use inner product notation to represent the integrals in Definition 2.45. That is, we write for any functions f (x), and g(x) L2 on I,
This means in particular that
Example 2.47.
(a) Given any a
> 0, the collection
is an orthogonal system over [-a, a ] . I t is also orthogonal over [O,2a]and in fact over any interval I of length 2a. The collection
is an orthonormal system over [-a, a ] . I t is also ort'honormal over [O, 2a] and in fact over any interval I of length 2a.
2.3. Generalized Fourier Series
(b) Given any a
49
> 0, the collections
{sin(-irnz/a)},~~
and
{cos(~nz/aj},~~
are each orthogonal systems over [-a, a ] . The collections
arc each orthonormal systems over [ a , a ] . (c) Given a > 0, the collection
is an orthogonal system over [0, a], and in fact over any interval I of length a. The collection
is an orthonormal systcm ovcr [0,a], and in fact ovcr any intcrval I of lcngth a.
2.3.2
Generalized Fourier Series
Definition 2.48.
Given a fun,ction f (z), L 2 o n a n interval I , and a n orthonormal system {gn(x)) o n I , the (generalized) Fourier coefficients, {c(n)) of f (x) with respect t o {g,(x)) are defined by
The (generalized) Fourier series o f f (x) with respect to {g,(z)) is
The fundamental problem is to determine under what circumstances the in the above definition becomes a "=" and, if so, in what sense the infinite series on the right side of the equality converges. It turns out that the most convenient form of convergence in this case is L~ convergence on "N"
T. Theorem 2.49. {y,(z))
(Bessel's inequality) Let f (x) be L 2 o n a n interval I , and let be wrt, outhonownul sysLe,ln U,IL I . T J L ~ ~ I L
50
Chapter 2. Fourier Series
The proof of Bessel's inequality will require the following lemma. Lemma 2.50. Let { g , (x)} be an, orthon.orm.al system, on, o,n, in,ten~o,lI . Then for every f (x),L~ on I , and every N E N ,
Proof: The proof is just a calculation making use of thc orthonormality of
{gn(x>>.
+
C C (f'g n ) (f.Smj
n=l m=l
I
~n ( 2 )gm
( x )d:r
which is (2.28).
Proof of Theorem 2.49: Let f ( x ) be given, and let {gn(x)) be an orthonormal system. Then by Lemrrla 2.50, for. each fixed N € N,
2.3. Generalized Fourier Series
51
Therefore, for all N E N,
Since I (f, g,) 1' > 0 for all n , the partial sums of the series CncN( f ,gn) l 2 form an increasing sequence bounded above by JI 1 f (x)I2d x . Thus the series CnEN 1 ( f , gn)I2 converges so that we can allow N to go to infinity Thus;
which is (2.27). Closely related to Lemma 2.50 is another very important inequality that will be very useful in the next subsection.
Lemma 2.51. Let { g , ( x ) } be a n orthonormal system o n I . Then for every f (x),L~ o n I , and every finite sequence of numbers { a ( n ) ) ~ =, ,
Proof: Let f ( x )be given, and let { g n ( x ) )be an orthonormal system. Then
52
Chapter 2 . Fourier Series
by Lemma 2.50.
We are now in a position t o answer the fundamental question about Fourier series, namely: When is an arbitrary function equal to its Fourier series and in, ~r~h,nf; sense does t h t Fourier series converge? The answer lies in the notion of a complete orthonormal system.
Definition 2.52.
Given a collection of functions { g , ( z ) } , L~ o n a n interval 1 , the span of {g,(x)), denoted span{g,(x)), is the collection of all finite linear combinations of the elements of { g , ( x ) ) . In other words, f (z) E span{g,(x)) zf N and only i f f ( x ) -a ( n )y n ( x ) for some finite sequence {a(n)};=, . Note that N is alwal~sfinite but m a y he arbitrarily large.
E x a m p l e 2.53. (a) Let PI denot,e t,he set of all polynomials on the interval I. Then PI = span{xn),"==,. (b) span{e-2TznLjn,,Z is the set of all period 1 trigonometric polynomials. ( c ) Lel p(z) = (1- 1x1) X r l , l l ( x ) .Then span{cp(x - n)InEz is the set of all functions that are ( i ) continuous on R, (ii) linear on intervals of the form
[n, n
+ l ) ,n E Z, and (iii) compactly supported.
R e m a r k 2.54. (a) For any collection of functions {g, (x)} , span{gn (x)} is a linear space: that is, it is closed under the formation of lincar combinations. Specifically, if {f,(x))E=, & span{g,(x)), then for any finite seN quence { a ( n ) ) L , the function f (x) = a(n)f, (x) is in span{gn (z)) (Exercise 2.61). (b) The definition of span involves only finite sums. Without additional assumptions on the collection {g,(x)}, there is no guarantee that any sum of the form CnEN a(n) g,(x) will converge in any sense. For example, if g,(x) = xn-l for n E N, then the series n! x r V o e s not converge except at x = 0, and the series Cr=o 2-n x n does not convcrgc if 1x1 2. See also Theorem 2.55 below.
E,"&
>
(c) Related to the notion of span{g,,(x)) is the notion of the mean-square (or L') closure of span(g,(x)), denoted @ZTi{g,,(x)} which is defined as follows. A function f (x) E SjZZTi{g,(x)) if for every E > 0, there is a function g(z) E span{gn(x)) such that 11 f - gllz < E .
2.3. Generalized Fourier Series
53
As a partial answer to the question of when finite sums can be replaced by infinite sums, we have the following theorem. Theorem 2.55. Lct { g , , ( x ) )bc a n orthonormal s y s t e m on a n zr~tervalI . T h e n a function f (z), L~ on I , i s in SjZiiT{g,(z)) i f and only i f
Proof: (+)
(2.29) is equivalent t o the statement that
Therefore, given
E
> 0,there
is an N
> 0 such that
and f (x) t w { g n (x)).
(==+) Suppose that f (x)E m { g , (z)), and let c > 0.Then by definition there is a finite sequence {a(n)):l,, some NO t N, such that
Since
is a, decrea,sing secliience (Exercise 2.63), it follows t,hat for every N
> No,
54
Chapter 2. Fourier Series
and (2.29) follows. If every function L2 on I has a representation like (2.29), then we say that the collection { g , ( 2 ) )is complete on I. This means that every furiction L~ 011 I is equal t o its Fourier series in L2 on I.
Definition 2.56. Let {g,,(x)) be a n orthonormal system o n I. T h e n {g,(x)} is complete o n I provided that e v e q function f (x),L' o n I, is i n m { g , ,( x ) } . A complete orthonormal system is called a n orthonormal basis. The next theorem gives several equivalent criteria for an orthonormal system to be complete.
Theorem 2.57.
Let {g,(x)} be a n orthonormal system o n I. Then, the ,following are equivalent.
(a){gn(x)) is complete o n I ( b ) For ever9 f (x),L ' on
I,
( c ) Every f u n c t i o n f (z), C: o n I , i s i n span{g,,( z ) } .
(d) For every function f ( x ) , C: o n I,
Remark 2.58. (a) Note that Theorem 2.57(c) is precisely the definition of completeness but with C: functions replacing more general L2 functions. It is often easier t o work with cont,inuous compactly supported filnctions, and the theorem states that this is sufficient.
(b) Theorern 2.57(d) says that Bessel's inequality is an equality for conlplete orthonormal systems. This eyualily is referred t u as Pluszcherel's FOI-mula.
Proof of Theorem 2.57: (a) of Theorem 2.55.
(b). This follows exactly as in the proof
(a) + (c). This follows immediately from the fact that every furiction C: on I is also L2 on I. (a) +== (c). Let f (a)be L~ on I, and let E > 0. Then by Corollary 2.37 there exists a function g(x), C: on I such that I f - 9112 < 612. By (c).
2.3. Generalized Fourier Series
55
there exists N E N such that
Applying Minkowski's inequality, we obtain
Therefore,
so that f (x) E span{g,(x)) and (a) follows. (c)
* (d). By Theorem 2.55, (c) holds if and only if
for all functions f (x) C: on I. But by Lcmma 2.50,
Therefore, (c) is equivalent t o the statement that
and (d) follows. To illustrate an application of this theorem, we will prove the following result about trigonometric Fourier series.
Theorem 2.59.
T h e trigonometric system {e2""x}nEz is complete o n [ O , l ] .
Proof: Wc will usc Thcorcm 2.57(c). To that end, let f (x) be continuous on [0,1] (note that it is also compactly supported), and let E > 0. By
Chapter 2. Fourier Series
56
Exercise 2.65, we call find a functioil f ( x ) that has period 1, is C' on R, and such that - 7112 < t/2. By %er's Theorem (Theorem 2.19), G N ( z ) converges in L" on R t o as N i m, where
If
S(2)
and
1
j ( t ) e-2Tmxdz.
c(n)=
By Theorem 1.38(a), ZN(z) also converges t o Note that
-
f (z) in L~ on [0,11.
(Exercise 2.66). Therefore, for N large enough
arid by the triangle inequality,
But the function
2
(1
-
y)
11") 1.2"znx
n=-N
is in
s p a n { e 2 ~ z r ~ z }nEZ
Hence, f (z) is in span{e2Tinz}n,Ezand by Theo-
rem 2.57(c), the trigonometric system is cornplete on [O, I].
Exercises Exercise 2.60. Prove that if {g,(x)} is an orthonormal systerrl on an interval I and if {a(n));=, is any finite sequence of numbers, then
2.3. Generalized Fourier Series
57
Exercise 2.61. Prove that if {g,(x)) is any systerrl of L2 ~ U I I C L I O I ~Llle11 ~, span{g,(x)) is a linear space (that is, it is closed under the formation of linear combinations, see Remark 2.54 (a)). Exercise 2.62.
f (x)E
Prove that if {g,(x)) is any system of L2 functions, then
(x)) if and only if there is a sequence of fiinctions {fk(z)}
such that fk(2) E span{g,(x)) and such that limk,, 11 f - f k ( I 2 = 0. (Hint: For the "only if" direction, choose f k ( x ) E span{g,(x)) such that 11 f
-
fkll2
< Ilk..)
Exercise 2.63. Prove that if {g,(x)} is an orthonormal system on an interval I, then for any f (x), L2 on I, the sequence
is
il,
decreasing sequence. (Hint: Use Lemma 2.50 or 2.51.)
Exercise 2.64. Show that if {g, ( x ) ) is a complete orthonormal system on an interval I, then (2.30) holds for every f (x), L' on I . Exercise 2.65. Let E > 0, and let f (x) be C0 on [0, 11. Then there is a function f(x) that has period 1, is C0 on R, and such that (J: 1 f d ~ ) ' /<~e. (Hint: For 6 > 0 sufficiently small, you can construct r ( x ) by modifying f jx) only on the interval [l - 6, I ] , and then extending periodically.)
fi2
Exercise 2.66.
Prove equation (2.31).
Exercise 2.67. Let {g,(x)) be a complete orthonormal system on an interval I. Show that the Fourier series of any function f (x), L2 on I, can be integrated term-by-term in the following sense. For any numbers a < b, such that [a,b] I,
(Hint: If the sum converges in 1,' on I , it converges in L2 on [a,b]. Then use Theorem 1.40(c).)
Chapter 3
The Fourier Transform 3.1 Motivation and Definition We have seen that if f (x) is a function supported on an interval [-L. L] for some L > 0, then f (x) can be represented by a Fourier series as f
=
C
e27Mx/2L)
where
l
2L c(n)= -
S_, f ( t ) L
e-"7t(n/2L) d t .
,"
(3.1)
Of course the Fourier series actually equals the 2L-periodization of f (x) (Figure 3.1). What happens to this representation if we let L + co? In order to answer this question, define for each L > 0 and each integer
- 1;
n the number
T((nl2~)
f (t) c-~""('"'")
dt: A
so that f^(n/2L) = (2L) c ( n ). If we were to plot the numbers { f (n/2L) inEz for very large values of L , then the resulting graph would begin to resemble a function of a continuous variable on R (Figure 3.2). This function would naturally be defined by
In addition, we could also write
for large L, since the last sum is a Riemann sum for the last integral. Therefore, we have formally established the duality
fiY)
/
R
f(t) e
dt.
f (x)
-
~ ( 7e2nzx7 ) d7.
(3.3)
The discussion in the remainder of this chapter will focus on two general questions: In what way are the properties of f (x). for example, continuity.
60
Chapter 3. The Fourier Transform
FIGURE 3.1. 2L-Periodizations of a function f (x) assumed to be supported in [-L. L]. Top left: L = 1, top right: L = 2 , bottom: L = 4.
differentiability. int,egral)ility or square-integral~ility. reflect ecl in t h~ corresponding properties of f (y)? and What properties lrl~lstf (s) and f (7) satisfy in order for the *.N" in (3.3) to be replaced by "="'! Let us first make a defiiiitio~~. A
A
Definition 3.1. The Fourier trarlsforrn o f n function f (z). L' on R. i s also n function o n R, denoted f^(-y) defined by
Remark 3.2. The assumption that f (x)is L1 oil R is rrlade in order to ensure that the integral in (3.4) converges for each riuinber 7 . This convergence liolds by virtue of tlie fact that for each 2 E R, we call establish a Cauclly condition on the numbers
La 0
d.r
and
s,
=
f (.r)c - ~ " ~ 'd. ~ . :
cr.
>0
3.1. hlotivation and Definition
61
FIGURE 3.2. Fourier coefficirrrts for the functions graphed in Figure 3.1. Note how the graphs of the sequences begin to resemble the graph of a continuously defined function. as follows. If b
> u > 0, then
<
-
=
lim lim
..e+w
1 dz
- Z . ~ Z ~ X
1,
If(x)ldx
By the completeness property of the complex numbers (Remark 1.22(c)), there exist numbers s+ and s- such t h a t limaico S$ = sf and lim,,, sa S - . Thus, f ( y ) = s+ s-. A
+
A
Example 3.3.
(a) Let
f(x) = Xi-112,1i21 ( 2 ) .Then f ( 7 )= sirl(*7)/~7(Exer-
cise 3.4). (b) Let f (x) = (1 - 1x1) Xl-,,ll (l-). Then
f(y) = sin2( * ? ) / ( T ~ )(Exercise ~ 3.5).
(c) Let f (x)= e-Z"xI.Then f l y ) = l/s(l
+ y2) (Exercise 3.6).
62
Chapter 3. The Fourier Transform A
(d) Let f (z) = e ? T z 2. Then f ( 7 ) = e-"-r2 (Exercise 3.7).
See Figure 3.3.
FIGURE 3.3. Example 3.3. Left: f (x). Right: f^(?).
Exercises A
Exercise 3.4.
Prove that if
f(z) = X[-,,,]
(z), then f
(y)=
sin(2roy) Tiry
3.2. Basic Properites of the Fourier Transform
Exercise 3.5.
Prove that if f (x) = (1 - Ix/al)X[-a.al (x):a
>
63
0. then
A
Exercise 3.6. If f ( z ) = e-"lzI, a
2a > 0, then f (y) = (a,+(2x712) .
= ~ c ( " ~ ) ~(Hint: / ~ Exercise 3.7. If f (x)= ePaz2, a > 0, then f^(?) See, for example, Kammler, A First Course in Fourier Analysis, PrenticeHall (2000) p. 132-133 for the a = 7r case.)
3.2 Basic Properties of the Fourier Transform In this section, we will present two basic properties of the Fourier transform of an L1 function. A
Theorem 3.8. I f f (z)is L' on R, then f (?) is unzformly continuous on R. Proof: Given 71: 7 2 E R,
1fix)l
e-2""~1-~z)x
-
1( dx.
Note that the last term depends only on the difference yl - 7 2 and not on the particular values o l yl a ~ 7~2 . dHer~ceto show unzform continuity on R. it is enough to show that P
We will use Theorem 1.41 t o do this. Since 1 - 1 15 2 ,
If
(x)1 le-2"iax
-
11 5 2 If
(XI/,
and 2 1 f (z)l is L1 on R since f (x)is. By Taylor's Theorem, given any A > 0,
for some [ t [-A, A] and all x t [-A, A].Therefore,
.
64
Chapter 3. The Fourier Transform
for all x E [-A, A]. Therefore, A
1 f (z)J/e-'""""
A
-
11 dx
5 5
Thus, for every A by Theoreni 1.41,
> 0, ( f (x)1 e-2""x
-
lirn 27r)al
a!-0
lim 2 ~ A l a l lfJ111 = 0.
a-0
11
+ 0 in L~ on
[ - A , A ] ,so that
The next Theorem is known as the Riemann-Lebesgue L e m m a and describes the decay a t infinity of f (7). A
Theorem 3.9.
(Riemann-Lebesgue Lemma) I f f (x) is L'
on
R, then
A
IY
lim f ( 7 ) = 0. l--too
Proof: We will present an outline of the proof. The details are left to the reader in Exercise 3.10. S t e p 1. Show that if f (x) = X [ a , b l ( ~ )then , (3.6) holds. This can be done by direct calculation. S t e p 2. Show that if f (x) is a step function of the form
for some coefficients c(n) and intervals [a,, b,], then (3.6) holds. Step 3. Show that if f ( x ) is C: on R, then given E > 0, there is a step function g(x) of the form (3.7) such that 11 f -g(Il < t. Then show that this implies that (3.6) holds for f (x) (cf. Exercise 5.26). S t e p 4. Show t,hat (3.6) holds for any function f (x), L' on R.
Exercises Exercise 3.10. Complete the proof of Theorem 3.9. (Hint: For Steps 3 and 4, use the estimate if^?) 1 5 11 f - 9/11.)
ls(-y)l+
3.3. Fourier Inversion
65
3.3 Fourier Inversion The purpose of this section is to investigate the conditions under which equality holds in (3.3). From the definition of the Fourier transform, we can write
where we have exchanged the order of integration in the double integral. This formal calculation is not valid, strictly speaking, because the integral
does not converge for any particular value of x or t. Nevertheless, this calculation provides a starting point for investigating (3.3). The idea will be to place a "convergence factor" in (3.9) so that it converges for each value of x and t ; that is, we write instead of (3.9),
A
for some function K ( x ) chosen so that its Fourier transform, K ( y ) , forces the integral in (3.10) t o converge and so that equality holds in (3.3) for
K(x). We now obtain
If K ( t ) is some element in an approximate identity, then
which gives us a valid approximate inversion formula for the Fourier Transform. It orlly relllairls lo choose arl approximate identity satisfying the required conditions. There are many valid choices, but a very convenient one is to let ,-.
In this case KT(?) = epm27' (see Exercise 3.7). and the same calculation shows that equality holds in (3.3) in this case, that is, that
66
Chapter 3. Tlie Fourier Transform
It is also easy to see that K,(z) is L' on R for each 7 > 0 and that {KT( x ) ) , , ~ is an approximate identity on R (Example 2.32(c)). Now we are in a position to prove the following theorem.
Theorem 3.11. If f (x) is C O and L1 on R, then for each z
E
R,
Proof: Repeating the calculation in (3.8), we obtain
But since {KT(x)),>o is an approximate identity on R, Theorem 2.33 says that
dt
=
f (4,
for each x E R . A
With an additional assumption on f (y), we can get equality in (3.3) in a pointwise sense.
Corollary 3.12. V f (x) is
SR
for each x 1: R,
CO
and L' on R, a n d iff^(?) i s L' on R, t h e n
f^(y) e2Ti7xdy = f ( : r ) .
(3.14)
Proof: By Theorem 3.11, it will be enough to show that
2
2
But since lim,,o+ e-"' 7 = 1, the proof amounts to justifying the interchange of the limit and the integral in (3.15). This is accomplished using Theorem 1.41 in a similar way to the proof of Theorem 3.8. We leave the details as an exercise (Exercise 3.14). Corollary 3.12 does not cover all the cases that will be of interest to us in this book. For example, in Example 3.3(a), we saw that if f (x) = X~-,,,](x), A
then f (?) = s i n ( 2 ~ a x ) / ( ~ c cIn) . this case, f (x) is L1 but is not continuous, and f^(x) is not L1, though it is L~ (Exercise 3.15). Therefore, neither
3.3. Fourier Inversion
67
Theorem 3.11 nor Corollary 3.12 apply. The answer is to replace pointwise convergence of the limit in (3.13) with L2 convergence. In this case, we have the following theorem.
Theorem 3.13.
If f (z) is L' and L2 on R, and ifT ( y ) i s L' on R, then liln 7+0+
JR~ ( yeTT272 ) e Z n a ydxy = f (x),
(3.16)
Proof: Since f (z) is L1, Theorem 3.18 implies that the function
is continuous on R and Theorem 3.21(a) implies that f T ( x ) is L1 for each r > 0. Since f (z)is L2, Theorem 2.36(b) says that f,(x) i f (x)in L2 on R as r i o+. Therefore,
Since f T ( x ) satisfies all of the hypotheses of Theorem 3.11, it follows that
) e2ni7xdy =
rw
2 y 2 f (?) e2rr-yr A
and (3.16) follows.
d 71
n
Exercises Exercise 3.14.
Complete the proof of Corollary 3.12.
Exercise 3.15.
Prove that the function f (x) = s i n ( ~ x ) / ( ~isz )L2 on R
but not L1 on R.
68
Cl~aptel-3 . The Fourier Transform
3.4 Convolution Definition 3.16.
G i v e n functions f ( x ) and g ( x ) , t h e convolution o f f (x) and g(x), denoted h ( x ) = f * g ( x ) , is defined by
,tuheneve7. the integral m a k e s sense.
Remark 3.17. (a) We have encountered integrals like (3.17) before, namely in thc dcfinition of approxirnate identity. There it was shown that under specific hypotheses on f (x), the integral
is a good approxiirlatiori to f (x) as long as {K,(z))T>o is an approximate identity.
The above observatioil can provide good insight into the action of convolution. Take, for example, the approximate identity defined by (11)
(Example 2.32(a)). In this case, we can see that for any function f ( x ) , the value of f * KT(xo)is just the average value of f (z) on an interval of length T centered at xo. If f (x) is continuous, then these averages are good approxirrlations to the actual point values of f (z). If we consider K T ( z ) = (llr)(1- I x ~ / T ) X,-,,,](x), T > 0, then f * KT(xo) can be interpreted as a "weighted average" of f (z) around the point xo, where points close to zo are given more "weight" than are points further from zo. Thus. the convolution f * g(x) can be interpreted as a "moving weighted average" of f(x),where the "weighting" is determined by the function g(x). See Figure 3.4. By changing variables, it can be shown that convolution is commutative, that is, that f * g(z) = g * f (x) (Exercise 3.22). Then f * g(x) can also be iriterpretecl as a nloving weighted average of g(x),where the weighting is determined by the function f (x). (c) If the function f (x) has large variations, sharp peaks, or discontinuities. then averaging about each point x will tend to decrease the variations, lower the peaks, arid smooth out the discontinuities. In this sense, convolution is often referred t o as a smoothing operation. A more precise statement of this idea is contained in Theorems 3.18 and 3.19.
3.4. Convolution
69
FIGURE 3.4. Illustration of convolution. Top left: Graph of f (x). Top right: Graph of g(x). Bottom: Integral of the product of the solid and dashed function is f * g(1).
Theorem 3.18. If f(x) is L" o n R, and volution f * g(x) is continuous o n R.
zf
g(x) is L' o n R, t h e n the con-
Proof: Given z, y E R,
If * 9 ( x ) - f * 9(y)l
=
li,f s S ,If
=
5
If llm
(t)(9(x - t )
- 9(y -
t))dt
(t)l Ig(x - t ) - 9(y - t)l dt
1 R
9 ( t - ('
-
Y))
-
9(t)l d t .
By Lemma 2.35(a) (continuity of translation for L' functions), lim
if * g(x) f * g(y)I 4 IIf I , -
X+Y
and the result follows.
lim X+?/
j
R
lg(t - (z - Y ) )
- g(t)l dt =
0%
70
Chapter 3. The Fourier. Trarlslurm If f (z)and g ( x ) are both L2 o n continuous o n R.
Theorem 3.19. f
* g ( x ) is
R, t h e n the convolution
Proof: Let e > 0. Then given x, y E R,we calculate as above, but this timc using thc Cauchy-Schwarz inequality,
(1I f
-
(
Idt)
'(
1' 2
9
-
(x - Y ) ) - li(tl12 dt)
R
By Lemma 2.35(b) (continuity of translation for L~ ~ u I I c ~ ~ o ~ ~ B ) ,
and the result follows. We have seen that the convolution of a bounded function with an integrable function and the corlvolutiorl of two L~ functions produces a continuous function. The next theorem addresses the issue of the decay at infinity of a convolution.
Theorem 3.20. ( a ) I f f( x ) and g(z) are both L1 o n
on
R,then the convolution f * g ( x ) is also L'
R,and
llf * 9111 2 Ilf 111 119111. (b) If f ( x ) is L1 o n R,and g ( x ) is L' o n R,t h e n the convolution f L2 o n
(3.18)
* g ( x ) is
R,and llf * 9112 5 llf 111 I d s
( c ) I f f( x ) and g ( x ) are both L2 o n
(3.19)
R,then the convolution f * g ( x ) is Lw o n
R,and
[If * d m 5 Uf ( d ) If f (x) is L"
on Lm o n R, and
R,and
g(z) is
112
119112.
L1 o n R, t h e n the convolution f
IIf * gIIm 5 IIf IIm IlgII1.
(3.20)
* g ( x ) is (3.21)
3.4. Convolution
71
Proof: We will prove (a) and (b) and leave (c) and (d) as exercises (Exercise 3.24). (a) Let f (x) and g(z) be L1 on R. Then
and (3.18) follows. (b) Let f (z)be L1 on R, and g(x) be L~ on R. By the Cauchy-Schwarz inequality,
Therefore,
and (3.19) follows.
Theorem 3.21.
(The Convolution Theorem) If f (x) and g(z) are L' o n R,
then fG-ig(7) =
37).
Proof: Let, f (z) and g(x) be L1 on R. Then
(3.22)
72
Chapter 3. The Fourier Transform f ( t )c~(z- t) e - 2 T i ~dt x dz =
/R IR
Exercises Exercise 3.22. 9
Show that if f ( x ) and g(x) are L1 on R, then f
* g(z) =
* f (4.
Exercise 3.23. Show t h a t under the hypotheses of Theorems 3.18 and 3.19, f * g ( z ) is actually uniform,ly continuou,s on R. Exercise 3.24.
Prove Theorem 3.20(c) and (d)
Exercise 3.25. (a) If f (x) and g ( z ) are compactly supported and L1 on R, prove that f * g(x) is also. (b) If f (z)and g ( z ) are compactly supported and L%n R, prove that f * g(x) is also.
3.5 Plancherel's Formula Theorem 3.26. (Plancherel's Formula) Iff (i-) is L' and L' o n R, t h e n f^(y) i s also L~ o n R and (3.23)
Proof: Define f(z) = f (-z). Then
3.5. Plancherel's Formula
73
where we have made the change of variable x e -z in the last step. Since f (z) is L1 and T , ~on R, so is By the Convolution Theorem (Theorem 3.21),
F(r).
A
A
f * .f'((ri
l2
= F(0)5(7)= I~^((T)) *
y(n:) *7(x)
Since f (z) and are both L1 on R, Theorem 3.20(a) implies that f *F(r) is also L' on R, and since f (x)and f x ( ( r ) are both L~ on R, Theorern 3.19 implies that f is continuous on R . Therefore, we can apply the Fourier inversion formula (3.13) and conclude t,hat for each x E R,
=
IR
f (t) f(x
- t) d t
Evaluating the above equality at z = 0 gives
It remains only to show that in fact,
We will do t,his in t,wa steps.
Step 1. We will show that
then
f^((y)
is L~ OII R by showing that if
74
Chapter 3. The Fourier Transform
contradicting (3.24) in light of the assumption that f (x) is L2. If (3.25) holds, then given any number 111 > 0, there exists a number A > 0 such tl~at
d m ) .But
whenever r > 0 is small enough (specifically, if 0 < r < this is exactly the meaning of (3.26). Therefore, f (y) is L~ on R. A
Step 2. Since f^(?) is L' on R, lf^(y)12is L1 011 R. We leave it as an exercise (Exercise 3.29) to prove that
From this, (3.23) follows.
A related result is the following formula. Theorem 3.27. R, t h e n
(Parseval's Formula) Iff (z) and g(x) are both L' and L~ o n
Proof: Exercise 3.30. One easy consequence of Theorem 3.26 is to sinlplify the statement of the L v o u r i e r inversion formula (3.16). Specifically, we no longer need to state explicitly the llypothesis that f ^ ( ( ~ is ) L2 on R since by Theorem 3.26 tjhis is a,i~t,orna,t,ic given the assumption that f (x) is L1 and L2 on R. Theorem 3.28.
(Theorem 3.13) I f f (z) is L' and L%n R, t h e n
3.6. The Fourier Transform for u unctions
75
Exercises Exercise 3.29. Complete the proof of Theorem 3.26. (Hint: Use Theorem 1.41 and Corollary 3.12.) Exercise 3.30. Prove Parseval's Formula (Theorem 3.27). (Hint: Consider the function g(x) = g ( - r ) , and repeat t,he a,rgllment in the proof of Theorem 3.26 with appropriate modifications.)
Exercise 3.31.
Prove that sin2( t )
dt = T ,
where the first integral is interpreted as
k i sin(t)
dt
= 2 1im )'+a2
sin ( t ) -dt
t
since sin(t)/t is not an Ls function. Hint: Prove the first equality by integrating the second integral by parts, and prove the second equality using Plancherel's Forrxlula and Example 3.3(a,).(See Benedetto, Harmonic AuaI.ysis and Applications, p. 25.)
3.6 The Fourier Transform for L2 Functions Until now, we have been rnaking the assunlption that a function f (x) must be L1 on R in order for its Fourier transform to be defined. This assumption was rnade in order t80gi~a~rantee that the integral in (3.4) converges absolutely for each y. However, we have seen examples that suggest that we need to expand the definition to a larger class of functions. Specifically,
y(?)
f ( L ) is L1 on R, but is not, and in order for equality to hold in both parts of (3.3), we would like to be able to make the statement that if
that is, that
The question is: How do we interpret the integral in (3.29) since it does not converge absolutely?
76
Chapter 3. The Fourier Transform
We have sccn the answer already in Theorem 3.28, which asserts in this case that
in L~ on R. That is, we interpret the nonconvergent integral (3.29) as a limit (in the L~ sense) of convergent integrals. The remaining question is: C a n we do this with any L2 function? The answer is "Yes," but the proof of this assertion is beyond the scope o l this book and involves knowledge of the theory of Lebesgue measure and the Lebesgue integral. We state the relevant theorem for completeness. C L T ~function f (x),L~ o n R, there exists a function Theorem 3.32. Gi~uer~ ,f^(?), L~ o n R (in the sense of Lebesgue), such that
iirn 7+0+
(Z)e-"'
2
2
' e2"77x d s = f ( i )
(3.31)
zn L~ o n R. I n this case, Plancherel's formula holds; that is,
and the Fourier inziersiorz holds in the sense of Theorem 3.28; that is,
3.7 Smoothness versus Decay One of the basic principles of Fourier Transform theory can be loosely stated as follows: T h e smoother f ( z ) is, the more rapidly f^(?) will decay a t infinity, and conversely, the more rapidly f decays at infinity, t h e smoother f (y) will be. There are many ways to measure the smoothness of a given function f (s), but for the purposes of this book, we will measure smoothness of f (x) by counting the number of continuous derivatives it has. We have already seen an illustration of this principle in Theorem 3.8, which asserts that if f (z) is L1 on R (a statement about its decay at infinity), then j^(-y) is uniformly continuous on R ( a statement about its smoothness). In light of the Fourier inversion formula (Corollary 3.12), we can assert that if an L1 function f (z) has an L' Fourier transform (decay of
(XI
A
3.7. Smoothness versus Decay
77
f^(?) at infinity), then f (z) is also uniformly continuous on R (smoothness of f ( x > ) . A more precise statement of this duality starts with the following theorem. Theorem 3.33. (Differentiation Theorem) If f (x) and x f (x) are L' o n R, t h e n f^((r) is continuously differentiable o n R, and
Proof: We wish t o show that for each y,
First, form the difference quotient for
Since
e-2~ihx
f^(?) and calculate.
-
1
lim
= -2nix, h the proof reduces t u justifying the interchange of a limit and an integral. Specifically, we must prove that h-tO
) dx. We will make two estimates on thc quantity ( l l h ) (e-2nihx - 1). First, we expand the function g ( h ) = e-2Tihx about h = 0 in a Taylor series and use Taylor's formula (keeping only one term in the expansion) t o obtain the estimate
Taking now two terms in the expansion, we obtain the estimate
5
I h 2'
-
max
olsih
1
d2 -e
ds2
x
n
2
x
2
(3.37)
Chapter 3. The Fourier Transform
78
Using (3.36), we estimate
By hypotllesis, 277- 1x1 If (x)l is L1 on R. Using (3.37), we note that for any R > 0,
as h 3 0. Therefore, by Theorern 1.41, the interchange of lirrlit and integral is justified and (3.34) follows. The following corollary to Theorem 3.33 can be proved by induction (Exercise 3.37).
Corollary 3.34. If $f(x)$ and $x^N f(x)$ are $L^1$ on $\mathbb{R}$ for some $N \in \mathbb{N}$, then $\hat{f}(\gamma)$ is $C^N$ on $\mathbb{R}$, and for $0 \le j \le N$,
$$\hat{f}^{(j)}(\gamma) = \big((-2\pi i x)^j f(x)\big)^{\wedge}(\gamma). \qquad (3.38)$$
We can state a partial converse of Theorem 3.33 relating smoothness of the Fourier transform of a function to the decay at infinity of the function itself.

Theorem 3.35. Suppose that $f(x)$ is $L^1$ on $\mathbb{R}$, and that $\hat{f}(\gamma)$ satisfies the following hypotheses.
(a) For some $N \in \mathbb{N}$, $\hat{f}(\gamma)$ is $C^N$ on $\mathbb{R}$.
(b) Both $\hat{f}(\gamma)$ and $\hat{f}^{(N)}(\gamma)$ are $L^1$ on $\mathbb{R}$.
(c) For $0 \le j \le N$, $\displaystyle\lim_{|\gamma|\to\infty} \hat{f}^{(j)}(\gamma) = 0$.
Then
$$\lim_{|x|\to\infty} x^N f(x) = 0. \qquad (3.39)$$

Proof: Consider the function $F(x)$ defined by the integral
Integrating by parts N times and using (a) and (c), we obtain
Using ( b ) and the Fourier inversion formula,
Hence $F(x) = (2\pi i x)^N f(x)$. By (b) and the Riemann–Lebesgue Lemma (Theorem 3.9), $\lim_{|x|\to\infty} F(x) = 0$ and (3.39) follows.

Finally, we present a theorem relating decay at infinity of the Fourier transform of a function to smoothness of the function.
Theorem 3.36. If $f(x)$ and $\hat{f}(\gamma)$ are $L^1$ on $\mathbb{R}$ and if $\gamma^N \hat{f}(\gamma)$ is $L^1$ on $\mathbb{R}$ for some $N \in \mathbb{N}$, then $f(x)$ is $C^N$ on $\mathbb{R}$, and for $0 \le j \le N$,
$$f^{(j)}(x) = \int_{-\infty}^{\infty} (2\pi i \gamma)^j\, \hat{f}(\gamma)\, e^{2\pi i \gamma x}\, d\gamma. \qquad (3.40)$$

Proof: By the Fourier Inversion Formula (Corollary 3.12), we can write
$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\gamma)\, e^{2\pi i \gamma x}\, d\gamma.$$
Equation (3.40) follows by repeated application of the argument in the proof of Theorem 3.33. The fact that $f(x)$ is $C^N$ on $\mathbb{R}$ follows since $\gamma^j \hat{f}(\gamma)$ is $L^1$ on $\mathbb{R}$ and by applying the argument of Theorem 3.8.
Exercises

Exercise 3.37. Prove Corollary 3.34.

Exercise 3.38. Prove that if $f(x)$ has period 1 and is $C^K$ on $\mathbb{R}$, then there is a constant $A > 0$ such that for all $n \in \mathbb{Z}$, $|c(n)| < A\, |n|^{-K}$, where $c(n) = \int_0^1 f(x)\, e^{-2\pi i n x}\, dx$.
3.8 Dilation, Translation, and Modulation

Definition 3.39. Given $a > 0$, the dilation operator, $D_a$, defined on functions $f(x)$, $L^1$ or $L^2$ on $\mathbb{R}$, is given by
$$D_a f(x) = a^{1/2} f(ax). \qquad (3.41)$$
Given $b \in \mathbb{R}$, the translation operator, $T_b$, defined on functions $f(x)$, $L^1$ or $L^2$ on $\mathbb{R}$, is given by
$$T_b f(x) = f(x - b). \qquad (3.42)$$
Given $c \in \mathbb{R}$, the modulation operator, $E_c$, defined on functions $f(x)$, $L^1$ or $L^2$ on $\mathbb{R}$, is given by
$$E_c f(x) = e^{2\pi i c x} f(x). \qquad (3.43)$$
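To make the action of these operators concrete, here is a minimal Python sketch (not part of the original text) that applies $D_a$, $T_b$, and $E_c$ to a function given as a callable; the Gaussian test function is an arbitrary illustrative choice.

```python
import numpy as np

def dilate(f, a):
    """D_a f(x) = a**0.5 * f(a*x), for a > 0."""
    return lambda x: np.sqrt(a) * f(a * x)

def translate(f, b):
    """T_b f(x) = f(x - b)."""
    return lambda x: f(x - b)

def modulate(f, c):
    """E_c f(x) = exp(2*pi*i*c*x) * f(x)."""
    return lambda x: np.exp(2j * np.pi * c * x) * f(x)

f = lambda x: np.exp(-np.pi * x**2)        # illustrative test function
g = translate(dilate(f, 2.0), 1.0)         # T_1 D_2 f: narrow by D_2, then shift by T_1
x = np.linspace(-3, 3, 7)
print(g(x))
```

Theorem 3.42 below records how such compositions of dilations and translations interact.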
Theorem 3.40.
For any function f (x) L' o n R, A
( a ) For every a > 0 , Daf (7) = ~ 1 / ~ 7 ( 7 ) .
Proof: Exercise 3.44. Remark 3.41. (a) Note that if a > 1, then D, f (x) is a "narrowed down" version of f (x), and if 0 < a < 1, then D, f (x) is a "spread out" version of f ( x ) . Theorern 3.40(a) says that when a function is "narrowed down" by dilation by a > 1, its Fourier transform is "spread out" by dilation by 0 < l / a < 1.
(b) Theorcrrl3.40 says that modulatiori in the time or spatial variable corresponds to translation in the frequency variable. For this reason, modulation is often referred to as a frequency shift or phase shift. Theorem 3.42. (Properties of Dilation and Translation) For every f ( z ) and g ( z ) , L~ o n R, and for e7rery a > 0 , b E R, the following hold. (a) D,Tbf(z) = a"2 f ( a x - 6 )
(e)
( f , DaTb9) = ( T - b D a - l f ,
( f ) (Dof , Dagj =
(f
9
9).
g).
Proof: Exercise 3.45.

Theorem 3.43. (Properties of Translation and Modulation) For every $f(x)$ and $g(x)$, $L^2$ on $\mathbb{R}$, and for every $b, c \in \mathbb{R}$, the following hold.
(a) $T_b E_c f(x) = e^{-2\pi i b c}\, E_c T_b f(x)$.
(b) $(f, E_c g) = (E_{-c} f, g)$.
(c) $(f, T_b E_c g) = e^{2\pi i b c}\, (T_{-b} E_{-c} f, g)$.

Proof: Exercise 3.46.
3.9 Bandlimited Functions and thc Sampling Formula Definition 3.47. A function f ( x ) , L'
on R, is bandlimited zf there is a number 0 > 0 such that f^(y) is supported in the interval [ - R / 2 , 0 / 2 ] . In this case, the function f (x) is sazd to have bandlirnit 0 > 0. The function f (x) has bandwidth B > 0 if there is an interval I such that II/ = B and such that f (7)is supported in I . A
Remark 3.48. (a) A furlclivn J(x) with bandlimit R > 0 also has bandwidth R > 0. However, in general the numbers are not the same. For example, let f (x) be the function whose Fourier transform f^(?) equals (?). (Whet is f (x) in this case?) Then the bandlirnit of f (x) is 2, whereas the bandwidth is 1.
(b) The bandlimit and bandwidth of a function f (x) are not unique numbers. For example, iff (x) has bandlimit fl > 0, then f (x) also has bandlimit 0' > 0 for any number 0' > fl. Similarly, if f (x) has bandwidth B > 0, then f (x) also has bandwidth B' > 0 for any number B' > B. (c) Intuitively, if f (x) is bandlimited, then f (x) does not contain arbitrarily high-frequency components. The Fourier inversion formula for a function with handlimit fl looks like
82
Chapter 3. The Fourier Transform
That is, f (z) consists only of "frequencies" e2"irz of period 2/R or greater. Thus, one might expect that a bandlimited function would be slowly oscillating and not have any sharp jumps or disconlinuities. In fact, the following theorern holds.
Theorem 3.49. Let f (x) be a bandlimited function with bandlimit R . Then: (a) T h e Fourier inversion formula holds for f (x);that is, for each z E R,
(b) f (x) is C" o n R.
Proof: We will prove (b) first, given the assumptiori that (a) holds. We would like to use Theorem 3.36, since f (x)being bandlirnited iniplies lllat f^(?) is L1 arid that rN is L' for every N F N. However, since we have only assuriied that f (x) is L2 on R and not necessarily L1, we carinot use the theoreni directly. However, if we examine the proof of Theorern 3.36, we see that all that is required is that the Fourier inversion formula hold. Then the argument in the proof of Theorem 3.33 may be applied. But this is exactly (3.44). In proving (a), we agairi run into the difficulty that f (z) has been assumed only to be L2 and not L1 on R. This is certainly not an insurmountable obstacle, but it does require some rather subtle argumentation. According to Theorem 3.32, the Fourier inversion formula holds in the L2 sense for f (x);that is,
y(?)
in L2 on R. By Plancherel's Formula for the L2 Fourier Transform (3.32), we know that f^(?)is L2 on R Since f ( y ) is also compactly supported, Theorem 1.9 says that is also L1 on R. Therefore, we can prove (see Exercise 3.51) that in fact, A
y(?)
lim
r+o+
Y,
in L" on R. Let us call this uniform lirrlil function g ( x ) ; that is,
Thus, we have an L2 limit function f (x) and an L" limit function g ( x )for the same sequence of functions. So we must show that in fact, these limit functions are the same.
3.9. The Sampling Formula
83
In order to do this, define the functions f, (a)by
and fix a number A
> 0.Then by Minkowski's inequality,
The left side of the inequality is independent of r , and the right side can be made as srnall as desired by choosing r > 0 srna.11 enollgh. Therefore for every A > 0, A
which implies that
If
(x)- 9(2)12dz
= 0.
Since f (x) is piecewise continuous by assumption, and g(z)is in fact continuous by the argument used in the proof of 'l'heorem 3.8, f (z) = g(z), except possibly at the discontinuities of f (x).Since there is no problem redefining f (x)at these points, we can conclude that f (x) = y(x) for every x E R. But this is (3.44). One of the fundamental results in Fourier analysis is the Shannon sampling theorem. The theorern asserts that a bandlimited function can be recovered from its sarrlples on a regularly spaced set of points in R, provided that the distance between adjacent points in the set is srnall enough. The formula is also very important in digital signal processing applications. Theorem 3.50. (The Sharlrlorl Sarrlplir~gTheurem) I1 S ( x ) with bandlimit Cl, t h e n f (x)c a n be w r i t t e n as
where the s u m converges in L2 and L"
,is b~7idli,11L%led
o n R.
'This theorem has a long and interesting history that is recounted beautifully in the article by Higgins, Five short stories about the cardinal series, Bulletin of the AMS, vol. 12 (1985) 45-89.
84
Chapter 3. The Fourier Transform
Proof: Since f^(y) is supported in the interval 1-R/2, fl/2], we can expand f^(y) in a Fourier series and obtain
for y E [-fl/2,fl/2], where
But by (3.44), it follows that
Making the change of summation index n
+ -n
leads to
Again applying (3.44), we obtain
where we have used the fact that Fourier series can be integrated term-byterm (Exercise 2.67) and that for any numbers a > 0 and b # 0,
(Exercise 3.4). To see that the convergence of (3.47) is uniform on R, let N, M E Z be fixcd.
3.9. The Sampling Formula
85
where we have used the Cauchy-Schwarz inequality and where the c(n) are t,he Fourier coefficients of f^(y). But since the Fourier series of f (?) , . coriverges to f (y) in L~ on [-012, fl/2], A
The L2 convergerice of (3.47) follows from the fact that the collectio~i
is an orthonormal system on R (Exercise 3.52).
Exercises Exercise 3.5 1. Prove equation (3.46). Exercise 3.52. Prove that the collectiorl
is an ortllonormal system on R. (Hint: Use Parseval's formula.)
Exercise 3.53.
x E R,
Show that if f (x) has bandlimit fl > 0, then fbr every
86
Chapter 3. T h e Fourier Transform
Exercise 3.54. Show that if f ( x ) has bandwidth B > 0, then f (x)can be completely recovered from the samples { f (TL~B)),,~. Exercise 3.55. Let f (x) be L~ on R, and let r > 0 be given. Prove that there exists a handlimited function g(x) such that $(y) is C0 on R and 11 f - g1I2 < r. (Hint: Use Corollary 2.37(b) in the Fourier transform domain.)
Chapter 4
Signals and Systems In the previous chapter, we considered piecewise continuous functions with period 1 and showed that it is possible to represent such functions as an infinite superposition of exponentials en(t) = e2"int, n E Z. Each such exponential has period l / n and hence completes n cycles per unit length (which we can interpret as measuring time). If the exponentials are interpreted as "pure tones" of n cycles per second, then each f (t) has a "frequency representation" of the form
where
We also know that
Conceptually, there is nothing stopping us from changing our perspective and regarding the sequence { f (n)InEz as the object to be given a "frequency representation." In this case, (4.2) is such a representation in ,. which we consider f (n) to be a coritinuous superpositiorl of "pure tones" on Z, e,(n) = e2T"Lx,which complete about one cycle every l / x time steps (indexed by r ~ ) Equation . (4.1) now gives a forrnula for the coefficients in this continuous frequency representation. This new perspective is very well suited for digital signal processing (DSP) applications in which data are necessarily in the form of arrays of numbers. These arrays are of course always finite but can be of arbitrary length. Hence, it is convenient t o regard these objects (to which we will give the name signals) as being infinite sequences. A related perspective regards signals of length N as periodic sequences with period N . This chapter provides a discussion of some of the ternlinology and basic results of the mathematical theory of DSP from both perspectives. A
88
Chapter 4. Signals and Systems
4.1
Signals
Definition 4.1. A signal is a sequence of numbers {~(n)),~z satisfying
Remark 4.2. By basic results on convcrgcnt scrics, any signal must be bounded, that is, there is a number M > 0 such that /x(n)1 < M for all rL E Z. It is also true that any signal satisfies C , Ix(n) 1' < m. Such sequences are said to be (Exercise 4.6).
Definition 4.3.
t2 sequences or somctimes to have finite energy
The frequency domain representation of a signal x(n)is the
function
Remark 4.4. (a) Since En 1x(n)1 < a, the sum defining ?(w) converges ilniformly to a continuous function with period 1.
(b) Recall that the set {e2""}w,,[o,I) is the unit circle in the complex plane. This is because if z = e2"7w,then lzl = 1. Hence the function X(eZKLW) car1 be tlwught of as the restrictioii t o tlie unit circle of some function X ( z ) defined on some portion of the conlplex plane containing the unit circle. Specifically, we can define
wherever the surn makes sense. The function X ( z ) is referred t o as the z t m n s f o r m of z(n,).
Example 4.5.
Also,
(a) Let z ( n )=
1 ifO
Then,
4.1. Signals
89
See Figure 4.1. (b) Let la( < 1, and let
Then,
and CU
X(z) =
C a"
z-n, =
r~=0
whenever lzl
1 1-az-I
-
Z
z-a
> lal. See Figure 4.2.
Exercises Exercise 4.6. Show that every signal is an f 2 sequence and that there are f 2 sequences that are not signals.
90
Chapter 4. Signals and S y s t e m s
FIGURE 4.2. Left: x ( n ) of Example 4.5(b). Right: the real part of z ( w ) .
4.2
Systems
Definition 4.7. ( a ) A s y s t e m is any transformation T that takes a n input signal x ( n ) t o a n output signal y ( n ) . We write T x ( n )= y(n). ( b ) A system T i s linear if
for every pair of signals
XI
( n ) and
5 2 ( n ) ,and
every pair of constants a and b.
( c ) A linear system T is stable i f there is a constant C signals x ( n ),
>
0 such that for all
( d ) For ~ z oE Z , ,we d e f i r ~ el l ~ eLrarlslatiorl operator, rn,,, by
Tn,,x(n)= x ( n - n o ) . ( e ) A linear translation-invariant ( L T I ) s y s t e m is a linear s y s t e m T for which
T(rnox ) ( n )= T~~ ( T x )( n )= T x ( n - n o ) . ( f ) T h e convolution of two signals szgnal y ( n ) given by y(n) = xl
rl
( n ) and x z ( n ) , denoted
* x2(n) =
C x 1 ( I C )x Z ( n
-
li).
XI
* x z ( n ) , is
the
4.2. Systems
91
Theorem 4.8. (a) If x1 ( n ) and x z ( n ) are signals, then so is y ( ~ z ) = X I * x2 (7.1).
( b ) For any pair of signals
XI
( n ) and
2 2 ( n ),
(c) Let h ( n ) be a signal, and define the transformation Th o n signals b y
T h e n TI, is a stable LTI system.
Proof: (a)
(b) Exercise 4.25. (c) By (a), whenever z ( n ) is a signal, so is Thz(n), so that Tjzis a system. That Th is linear is a simple verification (Exercise 4.26). That Th is stable follows from (a) by taking C = C , Ih(n)1. To see that Th is LTI, fix no E Z. Then
by a change of summation index.
Remark 4.9. (a) The r~otiorlof stability defined in Definition 4.10(c) is another way of saying that T is continuous in the sense that a small change in the input z ( n ) results in only a small change in the output y(n) = T z ( n ) .
92
Chapter 4. Signals and Systems
(b) To be more precise about what this means, we define a n o r m on the class of all signals (called the tl-norm) by
We measure the distance between two signals z l ( n ) and z;?(n)as ((zlzz( ( e l ,so that a small change in z ( n ) (to say Ic(n)) means that - lcllPl is small. In this notation, the inequality in Definition 4.10(c) can be written as
IITzIlel I C llll:lel. (c) We can now give a precise definition of continuity as follows: Given > 0, there is a S > 0 such that whenever lllc < 6, IlY'z - T.Zllel < 6 . An equivalent definition (Exercise 4.27) is the following: Let {zk ( n ) ) k , ~ be a sequence of signals converging t o the signal z ( n ) in the sense that 1l.z. - xkllp -+0 as k + m. T h e n T z k ( n ) convenjes to T z ( n ) in the sense that llTz - Tlckllpl + 0 as k + m.
zllel
E
Our next goal is to show that all stable LTI systems have the form Th for some signal h(n) (Theorem 4.12(c)).
Definition 4.10.
T h e unit impulse signal 6 ( n ) is defined b y
See Figure 4.3.
Lemma 4.11. A n y signal x ( n ) can be written as a s u m of shifted impulses; that is, for each n E Z,
Proof: Exercise 4.28. Theorem 4.12. Let T be a stable L T I system. T h e n there is a signal h ( n ) such that
T s ( n )= ( x * h ) ( n )=
a ( k )h ( n - k ) .
(4.5)
kEZ
Proof: Define h ( n ) = (Tb)(n). Since T is a system, h(n) is also a signal; that is, C , J h ( n ) J< m. Given a signal z ( n ) , we can write z(n) =
4.2. Systems
93
FIGURE 4.3. The unit impulse.
C ~ Ex(k) Z rk6(n) by
(4.4). Since T is a stable LTI system,
( T z )(n)
~ ( k( )T T ~( n~ ))
= k€Z
which is (4.5). The assurnptions of stability and linearity are required to justify pulling T inside an infinite sum like we did above. The details for this justification are the content of Exercise 4.29.
Definition 4.13.
Given a stable LTI systern, T , the signal h ( n ) such that T x ( n ) = ( x * h ) ( n ) is called the impulse response of T. T h e impulse response of a stable L T I system i s often called a filter. The frequency representatzon of h ( n ) , h ( w ) , is called the frequency response of T, and the z-transform of h ( n ) ,H ( z ) , is called the system function of T .
A
Remark 4.14. (a) Referring to h ( n )as an impulse response makes sense because according to the proof of Theorem 4.12, (TS)(n)= h ( ~ z )so, that h(n) is the response of the systenl to a unit impulse. (b) That x ( w ) is called the frequency response also makes sense for the following reason. Suppose that the input to the system T was a "pure frequency" z ( n ) = e2"inwo. Even though z ( n ) is not a signal according to our definition, it still makes sense to form the convolution (z* h ) ( n ) .
94
Chapter 4. Signals and Systems
Therefore,
( T s )( n )
s ( k )h ( n - k )
kEZ
x x ( n - k)h(k)
=
k€Z
C
-
e 2 x i w ~ ( n - k )h
k€Z -
e2ninw"
C h(k)
(k) e-2xikwU
k€2 -
e2xinwo
IL(WO)
A
In other words, thc signal z ( n ) passes through the system T unchanged ex,--. cept for multiplication by the constant h ( w o ) .We say that pure frequencies ,-. are eigenvectors of LTI systems with eigenvalues given by h ( w ) . (c) A similar calculation is valid when dealing with real valued signals and justifies some further terminology. Suppose that the input to the system T is a real sinusoid of the form
Moreover, by (b),
Assuming that h ( n ) is real valued, we have that
(Exercise 4.30), so that writing
it follows that Therefore,
A
A
A
h(-wo) = h ( w a )= I h ( w o )1 e - ' Q ( W o ) .
4.2. Systems
95
In other words, the signal z ( n ) passes through the system T unchanged except for multiplication by the real constant Ih(wo)l and a phase shift by O(wo). We refer to lx(w)l as the m a g n i t u d e response of T and to B ( w ) as the phase response of T . A
Remark 4.14(a) can be thought of as a particular case of the following theorem.
Theorem 4.15.
Let
XI
( n ) and x z ( n ) be signals, and let y ( n ) =
* xz)(n).
(zl
Then
G(W)=
A
J;1
( w) $2 (w) .
Proof:
=
z, ( k )z 2 ( n - k ) e-2"zn,w
zl (k)
=
i a ( n- k ) e-2T'n"
Corollary 4.16. Let T be a n L T I system with frequency response x ( w ) . T h e n for every signal x ( n ) , T z ( w ) = z ( w )h ( w ) . h
4.2.1
Causality and Stability
Definition 4.17.
A stable LTI system T is causal
satisfies h ( n ) = 0 for n
2f
its impulse response h ( n )
< 0.
Remark 4.18. (a) The system function for a causal system has the form
which is a power series in z - l . Consequently, wc can talk about the radius of convergence of H, and note that if R > 0, then H ( z ) converges uniformly for 1x1 > R - l .
96
Cliapter 4. Signals and Systems
Obviously, if h(n) is a causal FIR filter (Definition 4.24), then the system function H ( z ) is a polynomial in z-I. (b) If T is causal, then given r ( n ) ,
(Ts)( n )
z ( k ) h ( n - lc)
= kEZ
~ ( nh(0) ) + z ( n - 1)h(1) + z ( n - 2) h(2)
=
+ ....
Hence the output y ( r ~ )at n = no depends only on the input z ( n ) at n = no, no - 1, no - 2, . . .. If we imagine that our input signal z ( n ) is some time series, then T being causal means that the output q ( n o )does not depend on the "future" values of the input (that is, x ( n ) , n > no) but only on the "past" values of the input (that is, s ( n ) , n 2 n o ) In this sense, any realistic systeni rnust be causal. Definition 4.19. A system T is realizable zf the relation between the input x(n) and the output y(n) of T is given by a n equation of the form
for each n E Z
Remark 4.20. (a) Under the assuniption that a(0) note that (4.6) car1 be written for each n as y(n)
=
b ( O ) . x ( r ~ ) + b ( l ) z ( n -1)
+
- a ( o ) ' ( a ( l ) y(n - 1)
+ - - . + a ( K ) y(n
#
0 and b(0)
# 0,
+b(A4)x(n-hf)
+ 4 2 ) y(n - 2)
-
K)).
In other words, the output y(n) a t n = no depends on the current value of the input z ( n a ) and the values of the input M steps in the past, and on the values of the output K time steps in the past. Of course, the output of such a system at times n = 0, 1, . . . will differ depending on the K "initial values" of the output, y(-1), y(-2), . . . , y(- K ) . These values may in principle be computed knowing the values of the input s ( n ) infinitely far in the past, but as a practical matter, it is usually assumed that y(-1) = y(-2) = - . . = y(-K) = 0.
(b) Note that each side of (4.6) is a convolution with the finite filters a ( n ) and b(n). Thus, (4.6) can be rewritten as (a * Y) (n) = (b * z)(n) . By Theorem 4.15,
4.2.Systems
97
Hence the system function for T is given by
where
B ( t )=
x
x --
b(m)z P m
and
A(z) =
a(k) z p k
so that R(x) is a rational function of x-l. (c) Note that if R ( r ) is a rational function in z-', it is clearly also a ratiorial lu~lctionin t . This is true since z N B ( z ) and z N A ( r ) are polynomials in z as long as N 2 max(M, K). Therefore, the system function of a realizable system is completely determined by its poles (the zeros of its denominator) and its zeros (the zeros of its numerator). (d) We know that given n rational function R(z) in wlliclr the degree of the denominator is greater than or equal to the degree of the numerator, we can write the rational function R(z)/z in a partial fraction expansion as follows:
where the pi are the poles of R ( t ) / z , m, is the multiplicity of each pi,and
(e) Next note that if a
# 0, then
whenever 1x1 > la/. Taking the derivative with respect to a , we get
Therefore, whenever 1x1 > /a/,
98
Chaptcr 4. Signals and Systems
Hence we can write
which will converge as long as lzl > max{lpi I).
By the above calculations, we have nearly proved the following theorem.
Theorem 4.21.
If the system function, R ( z ) , of a realizable system T has all of its poles inside the unit circle of C , then T i s causal and stable.
Proof: It follows from Remark 4.20(e) that T is causal with impulse response given by
To show that T is stable, we rnust show that En Ir(n) 1 note that r ( n ) is a finite surn of terms of the f o r n ~
< m. To do this,
and that for each i and j : (n+l?!
$+'-J
j!qn + 1 - j)! n! Ai,j j!(n - j ) ! P?
as n + m . If pi1 < 1 for each i , then for each i and j, and it follows that C , Ir(n) I < m .
Example 4.22.
(a) Let
C , Ir(n,i,j)l < m ,
4.2. Systems
99
This rational function has first-order poles at x = 1/2 and z = 3. Expanding R(x)/x in a partial fraction expansion, we obtain
and
Therefore, T is causal but not stable.
(b) Let
-
This system function has a first-order pole at z = 1/2 and a second-order pole at x = 1/3. Expanding in partial fractions, we obtain
Therefore, T is both causal and stable.
Remark 4.23. (a) Note that in the above example, the impulse response of each systcrrl described is infinite in length, so that it direct computation of T x ( n ) fur some signal x ( n ) would in principle require infinitely many calculations. In fact only finitely many calculations are required since Tx(n) can he realized as the solution to the finite difference equation (4.6). (b) The infinite lengths of the impulse responses of the systems in Example 4.22 arise from the fact that the system function has nonzero poles. A pole at t = 0, cven of high order, will not result in an irlfirlite impulse response. Such a system is called a delay (Exercise 4.31). This leads to the following definition.
Definition 4.24.
The i17zpulse Y ~ S ~ O T of L Sa~ realizable system whose system function has n o poles except possibly at z = 0 is called a finite impulse response (FIR) filter and one with nonzero poles inside the unit circle is called a n infinite impulse response (TTR)filter.
100
Chapter 4. Signals and Systems
Exercise 4.25.
(a) (xl
Let x1 (n), x2(n), and h(n) be signals. Show that
+ x 2 ) * h(n) = X I * h(n) + 2 2 * h ( n ) .
(b) If a is any number, then ( a 21) * h(n) = a (xl* h) (n).
Use (a) and (b) above to show that any system defined as in Theorem 4.8(c) is linear. Exercise 4.26. Prove Theorem 4.8(b). Exercise 4.27. (a) Show that the two definit,ions of continuity given in Remark 4.9(c) are equivalent. That is, show that a systern T satisfies the first definition if arld only if it satisfies the second.
(b) Show that a stable linear system is continuous by showing that such a system satisfies either definition of continuity given in Renlark 4.9(c). (Hint: If you use the first definition, then given 6 > 0, take S = E I C . ) Exercise 4.28.
(a) Prove Lemma 4.11 by showing that for each n t Z, s(k)S(n-k) =x(n).
lirn
N-+co
k--N
(b) Prove that in fact (4.4) holds in the following strong sense:
lirn
N+m
1Ix(n) k=-N C z(k) b(n -
-
k) = 0.
~LEZ
In the notation of Remark 4.9(b), we are being asked to prove that lim IIx - :xNllel,
N+m
where
N
Z N ( ~= )
C
x(k)h(n - k).
k=-N Exercise 4.29. (a) We say that x ( n ) is a finite signal if there exist numbers N, M E Z such that z ( n ) = 0 if n, < I\/[ or n > N. In other words, x ( n ) is a finite signal if it has only finitely many nonzero entries. Prove that, given a signal x(n), there is a sequence of finite signals {xk ( n ) j k E N such that lirnk,, Ilx - xlc = 0.
(b) Note that all calculations in the proof of Theorem 4.12 are legitimate for finite signals ~ ( n ) Show . that the stability of T implies that in fact, Tx(n) = (x * h)(n) for all signals x ( n ) (finite or not).
4.3. Periodic Signals and the DFT
101
A
Exercise 4.30.
Prove that if h ( n ) is a real valued signal, then h(-ci?) =
A
h(4. , Exercise 4.31. Show that if R ( s ) = s P mm system T is givcn by T x ( n ) = z ( n - n ~ ) .
> 0, then tlie correspoiiding
Periodic Signals and the Discrete Fourier Transform A different inode1 for thinking lnathenlatically about finite sigrlals is to corisider the finite signal to be infinite in length but periodic. Iri other words, given a finite data set of leligtli N , ( ~ ( 0 )x(1). . . . . , . r ( N l ) } .define a corresporidirig irifi~iitesequence F(n),n E Z, by F ( n ) = . c ( r ~11iod N ) . Iii this case, . c ( n ) = J:(TI,)wlierlever 0 < rl, < N , so that .?(?I,) is considcrctl an extension of
n.(~i,).
FIGURE 4.4. Left: A signal tension F(n).
~ ( 7 1 )of
length 5. Right: its period-5 ex-
Definition 4.32. G i v e n N E N , a sequence { ~ ( n ) tz ) , ,is a period N sigrial "x(n+ N ) = z ( n ) t o r all 7~ g Z . I n this case x(n) is said to be periodic. Remark 4.33. (a) It is clear that a period N signal, unless it is identically zero, can never be a sigrial in t,he sense of Definition 4.1, since the absolute values of its entries will always surn t o infinity. However, a periodic signal is bounded in the sense that there is a number A f such that I.c(n)1 A1 for all n E Z.
<
102
Chapter 4. Signals and Systems
(b) Even though a periodic signal is never a signal in the sense of Definition 4.1, it is always possible to pass a periodic signal through a stable LTI system and have the result make sense. Since stable LTI systems are characterized by their impulse response filter, this statement amounts to the statement that the convolution of a filter and a periodic signal is well defined. This is the content of the following theorem. Them-err1 4.34. Given u filler. h(71) a n d u. period N signal x(n),t h e convolut i o n x * h(n) i s defined for all n a n d i s a period N signal.
Proof: That z* h ( n )= CktZ ~ ( kh )( n - k) is defined for all n amounts to showing that the sum converges for each n. But this follows from the fact that for sorne M, Ix(n)l5 M for all n and the calculatiorl
To see that z * h,(n) has period N, note that
=
z ( k ) h(n - (k - N))
since x(n)has period N. Since filtering operations are defined on periodic signals, it makes sense to look for an interpretation of such an operation in the frequency dorrlairl analogous to Corollary 4.16. Consequently, we need a notion of frequency representation for periodic signals. This is done via the the Discrete Fourier Transform (DFT).
4.3.1
The Discrete Fourier Transform
Definition 4.35.
G i v e n a period N signal x(n),the (N-point) Discrete Fourier i s t h e period N sequence
T~-ar~slor.rr~ ur. (N-point) D F T of z ( n ) , denoted
defined b y N-1
z(n),
4.3. Periodic Signals and the DFT
103
The DFT is invertible as follows.
Theorem 4.36.
G i v e n a period N sequence x(n)with D F T ;(n),
for each j E Z .
Proof: Note first that for any nurnber r ,
/ir
l-r
'
n=O
so that N-1
In
n=O
Hence, for 0 I j _< N
-
Therefore, for each 0
<j
-
1- e 2 ~ i ( . i - k ) / N
'
1,
-
1,
N-I
Since the sum
1-e2rd~-k)
N-1
N-1
defines a period N sequence in j , the above holds for all j E Z. We can now prove the following theorem relating the DFT of a periodic signal to the DFT of a filtered version of that signal. Theorem 4.37.
L e t h ( n ) be a filter a n d x(n) a period N signal. T h e n
Chapter 4. Signals and Systems
104 where
z(n)is the D F T of z ( n ) and x ( w ) is given by Definition 4.3.
Proof:
,.
Remark 4.38. (a) Notice that since h(w) has period 1, the sequence h(n,/N) has pcriod N . Therefore, it is reasonable to ask if there is some period N signal h(n) wllose N-point DFT is X(n/N), and if there is such a periodic signal, how is it related to the filter h,(n)?
A
(b) To find the period--N signal h(n,),all we need to do is take the inverse DFT of ;(n/N). This gives
BY (4.91, N - l
N - l e2n"".(-~)/N En=,
0 otherwise. Therefore,
= 1 if j -
k. = ,N
for some
V Lt
Z and
4.3. Periodic Signals and the DFT
We say that
h(n)is the
-
105
N-pcriodizntion of h ( n ) .
(c) In terms of h(n), we can write the formula for the convolution of x(n) with h ( n ) as follows.
Since z ( n ) has period N , x ( j - m N ) = x ( j ) for all m and we continue
This suggests the following definition of the convolutiori of periodic signals.
Definition 4.39. Let x(n) a n d y ( n ) be period-N signals. T h e n the circular convolution of x ( n ) a n d
~ ( ' I L ) is
defined b y N-I
x * y(n) =
Remark 4.40. cise 4.42(a)).
(a) Note that x
x ( k ) y ( n - t).
-c ~ ( 7 7 ; ) is
also a period-N signal (Exer-
(b) The sum in Definition 4.39 can be obtained by surrlnlirlg over any N adjacent indices. To see this, let j E Z be given, and let m be the unique integer such that (m - I)N 5 j < m N . Then,
106
Chapter 4. Signals and Systems
Frorn this it can he shown that x * g(n) = g * x(n) (Exercise 4.42(\1)). (c) Circular convolution car1 be realized as nlultiplicatiorl by a matrix whose rows are shifts of one anotlier. Giver1 a period N sequence z ( n ) ,define the matrix X by
If y has period N and r (n) = x * :q(n),define y = [g(0) r = [r.(O) . - . r ( N - I)]. Then r = Xy.
Theorem 4.41.
. . . y (N- I)] and
L e t x(rr) a n d ? / ( r ~he ) period-N sign,al.s, a n d let F ( n ) a n d c(n)
be t h e l r DFTs; t h e n (3:
whew (x * y)"
(TI)
* !J)"
(11)
d e n o t e s the D F T of
=~
:L:
I
L ~ )( T L ) ,
* IJ(TL).
Proof: Exercise 4.43.
Exercises Exercise 4.42.
Let x(n) and y(n) be period-N signals.
(a) Prove that thc circular convolutiorl x * y ( n ) is also a period-N signal. (b) Prove that z
Exercise 4.43.
* y(n) = y * x(n).
Prove Theorem 4.41.
Exercise 4.44. Prove that if 2(n)is the N -point DFT of the period-N signal x (n,), then N-1
N-1
4.4. The Fast Fourier Transform
107
The Fast Fourier Transform
4.4
The N-point DFT can be thought of as a linear transformation on the finite-dilnensional vector space C N ,and hence can be writtcrl as a matrix. That is, given the vector
its DFT,
can 1)e writtcn
Tlie matrix WN dofirled 1-y (WN),,,i= where CI:, = syrnrnctric, ort2logoiia1, and
w$'.
is
Tlic irlversc DFT can he writtc.11
where Wh is the atljoint of WN. Notc tliat all of the cnt,ries in tlie rnatrix WN have absolute value 1. This illearis that to coinpute the DFT of a period N sequcnce . c ( n ) by doing straight rnatrix in~lltiplicationwill require N%ri~llti~lications. This niakcs the tlircct iinplerrlcritatiori of the DFT inipractical for large N . Fortllrlately therc is an algorithrrl known as the Fast Fo7~rler.Tmn,sfonn,or F F T t,h;~t, s p ~ p d su p thc? co~~lputatiorl of thc: DFT c~rlsiclorabl~.~ Tlie idea hehincl the F F T is the following. Lot N E N witli N even. and \Vnr = ?-"dlN. If N = 2 A f , t,heil
Now. given ;-L period N scqlierlce .c(2j 1). Tlleri
+
.C(TL)
(lefiiie ( ~ ( j )= x ( 2 . j ) a~lclb ( j ) =
'The following discussion is adapted frorr~Papolilis, Signal Analysts, hlcGraw-Hill (1977), p. X2ff.
108
Chapter 4. Signals and Systems
where Z ( n ) and a(n)are the W-point DFTs of n and b respectively. Note further that
so that if 0 5 n 5 A1
and if hf 5 r~ 5 N
-
z(71)
-
1 , the11
1, then
1 )+ 1
(71-hI)
=
(
=
ii(7~+ ) win' w;i(rL)
1 -
-b(n
-
hl)
A
=
a ( ri,)
A
-
W; b (n),
A
rernern1)cring that Z ( n ) and b ( n ) have period h l . In inatrix notation, the above discussioll can be sumrnarizcd by the fol. . . ~ ( A I- I ) ] . lowing. Let S = p(0) 6(1) . - . Z ( h 1 - I ) ] and = Tlleil
6(0)Z(l)
(4.10) where flnl is the A4 x h! diagonal matrix giver1 by
and PN is the N x N permutation matrix that separates the even and odd entries of a vector. For example,
4.5. L~ Fourier Series
109
The efficiency of the F F T comes from the fact that the dense rnatrix WN can be factored into a product of sparse matrices as in (4.10). If Ad is a,lso even, then Wnr can also be factored in this way, reducing conlputations still further. The greatest reduction in conlputations is realized when N = 25 for some s > 0. In order t o determine the cornplexity of the F F T , define m ( N ) t o be the number of multiplications required to compute an N-point DFT using the F F T algorithm. Theri since
corrlplitirlg 2 requires rrb(N/2) ~rlultiplicatioilsto cornpute each of 2 and b arid N/2 r~~~~ltiplications to compute W$ b(r1). Thus, A
Thcrcforc, we lii-lve the following theorem.
Theorem 4.45.
If N = 2" f o r sorne s E N , then
Proof: 'l'l1e proof' is by induction. Sirice n1,(4) = 4 the result holds for N = 4. Assurrie that it holds for N. Now. by (4.11), rn(2N)
+N
=
2711(N)
=
2 ((N/2) lo&2(N)) N log2(N) N log, (2)
=
= =
+
+
N(log2( N ) log;?(2)) N log2(2N).
4.5 L~ Fourier Series By Definit,iorl 4.1, a signal z(n,)rnl~st,sa,t,isfyC,,Iz(n,)1 < oo.LVc ha,ve seen t>liatall signals are .!%equences: that is, that C , Iz(n)l\ m, but that not all t%sequerlces are signals (see Remark 4.2 and Exercise 4.6). It turns out that there is a very rich arid clegarit rnathenlatical theory related t o the Fourier analysis of t2 sequences which provides an entryway into the very powerful theory of Hilbert spaces. The full devcloprnent of this tl-icory is beyorid the scope of this book, but we will need one of the main results of the theory (The Riesz-Fischer Theorem, Theorem 4.48) in Chapter 9.
110
Chapter 4. Signals and Systems
Definition 4.46.
e2
Given an sequence z = denoted z ( y ) , is the period 1 function given b y
The series (4.12) is referred to as a n ~"ourier
{ I C ( T L ) ) ~ ~ its Z,
Fourier series,
series.
Remark 4.47. (a) Since z ( n )does not rlecessarily satisfy En Iz(n)l < m, the sum in (4.12) does not necessarily converge uniformly on R. In fact, it is not clear that such a series even converges pointwise. For example, consider the series 02 cos(;;nt) 71=1
This is just (4.12) with x(n) = 1/(2)nJ)for n # 0 a.nd z(0) = 0. Clearly, x ( n ) is t % ~ ~ if tt = 0 (or any even multiple of T),the series reduces to the llarmonic series arld does not converge. (b) In light of Plancherel's formula for Fourier series (Theorem 2.57(d)), it seems reasonable t o expect that the series (4.12) will converge in L2 on [O, 1). However, if we carlrlot even guarantee that the infinite sum (4.12) converges at any given point, then the question arises: How are we t o interpret the s u m (4.12) as u function defined o n [0, l ) ?
(c) It turns out that the proper answer to this question requires us t o allow into the picti~refunctions that are not piecewise continuous. This more general notion of function is referred t o as a Lebesyue measurable function,, and to integrate such functiorls requires a more general notion of integral than does the Rierrlann int,egral, na,mely, the Lebesyue integrul. Both of tliese ideas require the notion of the Lebesgue measure of a set that is beyond the scope of this book. (d) The Riesz-Fischer Theorem (Theorem 4.48) gives the final word on the convergence of (4.12) in L" and Carleson's Theorem (Theorem 4.49) asserts that (4.12) converges pointwise except on a set of Lebesgue measure zero. Such sets includc finite sets of points and countable sets of points but are by no means limited t o that.
Theorem 4.48.
e2
(the Riesz-Fischer Theorem) Given an sequence { c ( n ) ,) , E ~ , there exists a Lebesgue measurable function f (x) on [ O , l ) with the property that
where the integral is the Lebesgue integral. In this case,
4.5. L' Fourier Series
111
Theorem 4.49.
(Carleson's Theorem) Given an l2sequence { ~ ( n ) } , , ~the z, symmetric partial sums N
converge at each point of [ O , 1 ) except possibly on a set of Lebesgue measure zero.
Part I1
The Haar System
Chapter 5

The Haar System

In this chapter we will present an example of an orthonormal system on $[0,1)$ known as the Haar system. The Haar basis is the simplest and historically the first example of an orthonormal wavelet basis. Many of its properties stand in sharp contrast to the corresponding properties of the trigonometric basis (Definition 2.5). For example, (1) the Haar basis functions are supported on small subintervals of $[0,1)$, whereas the Fourier basis functions are nonzero on all of $[0,1)$; (2) the Haar basis functions are step functions with jump discontinuities, whereas the Fourier basis functions are $C^\infty$ on $[0,1)$; (3) the Haar basis replaces the notion of frequency (represented by the index $n$ in the Fourier basis) with the dual notions of scale and location (separately indexed by $j$ and $k$); and (4) the Haar basis provides a very efficient representation of functions that consist of smooth, slowly varying segments punctuated by sharp peaks and discontinuities, whereas the Fourier basis best represents functions that exhibit long-term oscillatory behavior. More will be said about this contrast in Section 5.4.

Our first goal is the construction of the Haar basis on the interval $[0,1)$. In the course of this presentation, we will introduce many of the concepts required for the understanding of multiresolution analysis and for the construction of general wavelet bases.
5.1 Dyadic Step Functions 5.11
The Dyadic Intervals
Definition 5.1. For. each pair of integers j , k E Z7 define the interual
I,,k
by
The collection of all such intervals i s called the collection of dyadic subirltervals of R.
The dyadic intervals have the following useful property.
Lemma 5.2. either
G i v e n j o , ko, j ~k.1, E Z, with, either jo # jl or. k o
# kl, then
116
Chapter 5 . The Haar Systerri
In the latter two cases, the smaller interval i s contained in either the right half o r left half of the larger.
•
Proof: Exercise 5.6.
Definition 5.3. G i v e n a dyadic interval at scale j , I,,,, we write I,,, = I , : . ~U I;,,, where I;,, and are dyadic intervals at scale j + 1, to denote the left half and rzght half of the interval I,,k.. I n fact, I:,, = I J f 1 , 2 k and I;,, = (Exercise 5.13.
5.1.2
1,+1,2k+l
T h e Scale j Dpadzc Step Functions
Definition 5.4. A dyadic step function i s a step ,fi~n,ctionf ( r c ) ulith the property that for som,e j E Z , f ( r )i s constar~to n all dyadic inter?~als1 3 , k , k E Z . W e say in this case that f (x) is a scale j dyadic step fiinction. For a n y interval I , a dyadic step filnctiorl on I i s a dyadic step function that is supported o n I . See Fzgure 5.1. Remark 5.5. (a) khr each j E Z, the collectioil of all scale j dyadic step functiorls is a linear space. That is, any linear conibination of scale j clyadic step functions is also a scale j dyadic stcp f~xnctiori.
(h) For each j E Z arid interval I, the ~ollect~ion of all scale j dyadic step filrlctiorls on I is a linear space. That is, any lirlea,r co1nbina.tion of sci~lcj dyadic step funct,ions on I is also a. scalc j dyadic step fullction on I. (c) If f (z) is a scale j dyadic step function on an interval I, then f (:I;.) is also a scade j' dyadic step furiction on I for ariy j' 2 j .
Exercises Exercise 5.6.
Prove Lerrlnla 5.2.
Exercise 5.7.
Prove that for each j,k E Z ;
~ i % ~ and I;\i = 13+1,2k
Ij+1.2k+1.
Exercise 5.8. Prove that any function f (z) of the form
=
5.2. Tlle Haar Systerrl
117
FIGURE 5.1. Scale j dyadic step fur~ctions.Left,: j = 2. Right: ,j = 4.
car1 he written i11 the form
5.2 The Haar System
Definition 5.9.
Lct p(.r') = X p . l ) (n.), an,$ for.
,TLC/L
j. k E Z, dc,firj,e
pl,k,(.r)= 2 1 ' 2 p ( 2 J . r- X.) = D 2 , 7 ; ~ ) ( . ~ ) .
(5.2)
(For the dej?n,ztion o,f thc d,ll(~,tionoper,ntor D,, c ~ n dthe t ~ a n s l a t i o roperator ~ 7b, see D ~ ~ f i n i t i o3.39.) r~ The collection { p , l , ~ ( ~ ) }i s, r.ef'erre(t l . ~ ~ ~ t o as t h e .system of Haar scaling hliirtions. For ea,ch ,j E Z . th,c rsollection { p , . k ( . r ) } ~ i s ~r.efr.r-r.rjd ~ t o a s the s!lsfrrn of' scale j Haar scaling fil~lctioris.See figure 5.4
Remark 5.10.
(a) For eadl j. k E Z. p,.k(.r.) = 2,1/' t I , , , ( r ) so . that, supported or1 the interval IJ,kant1 tlocs rlot, v a l ~ i s l on i that ilit,ei.vi~l. Therefore, u7e refer to Dhe scaling fullction o , , , k . ( . I : ) as being ussociatcd with,
p , ] . k ( z ) is
the inte,rvc~,lIj,h..
(b) For each j , k E Z,
and
Definition 5.11. Let h , ( z )
=
X 1 0 , 1 / 2 j ( x )- X [ 1 / 2 , 1 ) ( ~ )and , for each j , k E Z,
d efiiz e
h j , k ( z ) = 2J'2 h(2"
-
k) = D2,,Tkh(z).
(5.3)
Th,e collection, { h , j , k ( x ) } , , k c ~ i s referred t o as the Haar system on R. For each j E Z, the collection { h l , k ( x ) } k E is ~r e f e ~ e dt o as the s y s t e m of scale j Haar
functions.
Remark 5.12. (a) For each j , k E Z, hj,/c(x)= 2"'
( X p. I . i( )
-
I
.I
,
(
)
' 2
2'
XI,,,,.^ (x)
-
Y I ~ + ,,,,+,(r)),
so that h,,k(x) is supported on the interval I j , k and does not vanish on that interval. Therefore, wc refer t o tllc Hnur furlctiorl hj,Jlc;.) as being ussociated with the interval I J , k .
(Is) For each j , k E Z, F L , ~ , ~ ( x )is a scale j (c) For each j , k E Z,
+ 1 dyadic step function.
and
5.2.2 Orthogonality of the Haar System Theorem 5.13.
T h e Haar s y s t e m o n R is a n orthonorrnal systern o n R.
Proof: First, we show orthonormality within a given scale. Let j E Z be fixed, and suppose that k , kt E Z are given. By Lemma 5.2,
5.2. The Haar System
119
If k # k', then the product hj,k(x) hj,,/(x) = 0 for all x since the functions are supported on disjoint intervals. Hence, if k # k',
If k = k', then
Nexl, we show orthoriormality between scales. Suppose that j, j' E Z with j # j', say j > j', and let k , k' E Z. Then by Lelrinia 5.2, there are three possibilities.
(1)Ij/,k/ f' Ij,k= 8. In this case, h,,k(x)hj/,k/(x) = 0 for all z and
I;, ,,.
(x)is identically 1 on I:,,,,.Since Ij.*C In this case, I:,,,, it is also identically 1 on I j , , Sirrce hi,,( s )is supported on I,>r,
(2)
C
(3) I j , k c I;, Thus,
k,.
111this case, hj,,kt(x) is identically -1 on Ij;,,, arid on
IJ,k.
Theorem 5.14.
Given any j E Z , the collection of scale j Haar scaling functions is a n orthonormal system o n R.
Proof: Exercise 5.19. Although it is by no means true that the collectiori of all Haar scaling functions is an orthonormal system on R, the following theorem holds.
Theorem 5.15.
Given J E Z , the following hold.
( a ) T h e collection {p.~,k(x), h,,,k(x):j > .I, k E Z ) is a n orthonormal system o n R. (h) T h e collection {P.I,k(x),h.~,k(x): k E Z) is a n orthonormal system o n R.
Proof: Exercise 5.20.
120
5.2.3
Chapter 5. The Haar System
The Splitting Lemma
Lemma 5.16. (The Splitting Lemma) Let $j \in \mathbb{Z}$, and let $g_j(x)$ be a scale $j$ dyadic step function. Then $g_j(x)$ can be written as $g_j(x) = r_{j-1}(x) + g_{j-1}(x)$, where $r_{j-1}(x)$ has the form
$$r_{j-1}(x) = \sum_{k\in\mathbb{Z}} a_{j-1}(k)\, h_{j-1,k}(x) \qquad (5.4)$$
for some coefficients $\{a_{j-1}(k)\}_{k\in\mathbb{Z}}$, and $g_{j-1}(x)$ is a scale $j-1$ dyadic step function.
In other words, on I j - l , k , g3- 1 (x) takes the average of the values of y j (z) on the left and right halves of Ij-l,k(see Figure 5.2(a)). Let r,-l (z)= gj (z) - gjPl(x). By Remark 5.5(a), gj-l (x) is a scale j dyadic step function, and by Remark 5.5(c), so is T ~ - ~ ( xFixing ). a dyadic interval Ij-l,l., recall that lIj-l,k= 2-(i-'). hen'
1 2-(j-l) (cj(2k)
2
=
+ cj (2k + I ) )
0.
Therefore, on Ij-l,k, r j p l ( x ) must be a rnultiple of the Haar function hj-l,k(x) and must have the form (5.4) (see Figure 5.2(c)).
Exercises
5.2. The Haar System
121
FIGURE 5.2. Illustration of one step in the Splitting Lemma. Top left: Solid: Scale 4 dyadic step function y4(z). Dashed: The scale 3 dyadic step function g3(x) constructed as in the Lemma. Top right: Graph of g3(x). Bottom: Graph of the residual rs(x).
Exercise 5.17.
Prove the statements rrlade in Remarks 5.10(a) and 5.12(a)
Exercise 5.18. Prove that po.o(x) = 2-112 P,.,(x)
+ 2-'12
ho,o(") = 2-'12 p 1 , 0 ( x )
-
p1)L,1(2)
and
Exercise 5.19. Prove Theorem 5.14. Exercise 5.20.
Prove Theorem 5.15.
2-112
p1.1 (x).
122
Chapter 5. The Haar System
5.3 Haar Bases on [ O , 1 ] Definition 5.21. For any integer J 2 0, the scale J Haar system on [ O , 1 ] is the collection
W h e n J = 0, this collection will be referred to simply as the Haar system on [0, 11. See Figure 5.3.
Remark 5.22. (a) The Haar system on [O,1] consists of precisely those Haar functiorrs h j , s ( x )corresponding to dyadic intervals Ij,k that are subsets of [0,I], together with the single scaling function po,0(x).
(b) For J > 0, the scale J Haar systern on [0, I] consists of precisely those Haar functions h j , k ( ~corresponding ) to dyadic intervals Ij.k for which j 2 J and that are subsets of [0, I], together with those scale J Haar scaling functions that are supported in [O,l].
Lemma 5.23. Given f(x) continuous o n
[U, 11, and
t
> 0, there is a J
E Z,
and a scale J dyadic step function y (x) supported in [O,1]such that If (x)-y (x)1 < for all x E [0, I]; that is, 11 f - g J ( , < E .
E,
Proof: Exercise 5.26. See also figure 5.5. Theorem 5.24. For each integer J > 0, the scale J Haar system o n [0,I] is a complete orthon,omn,al system o n [0,11. Proof: That the scale J Haar system on [0, I] is an orthonormal system on [0, I] follows from the fact that it is a subset of the collectio~l { p ~ , ~ ( hj,k(z): x), j J , k E Z), which is an orthonormal system on R by Theorern 5.15(a). For completeness, it is sufficient, by Theorern 2.57(c), to show that if f (x) is Ci' on [ O , l ] , then
>
Let E > 0, and let f (x) be C0 on [0, I]. By Lemma 5.23 there exists j and a scale j dyadic step function on [0, I], gj(x) such that
> 0,
Since any scale j dya,dic st,ep function is also a scale j dyadic step function at all higher scales, we can assume that j 2 J.
5.3. Haar Bases on [O, 11
123
FIGURE 5.3. Some uf the Haar functions h j S k ( 5 on ) [O, 1).
By the Splitting Lenima (Lemma, 5.16), g,i(z) m a y be written g j (x) = rJ- 1(x) gj- 1(x), where rj- (x) 1-ias the form (5.4), and is suppurled in [O, 11 and gj-1 (x) is a scale j-1 dyadic step function. Repeating this process j - J times, we conclude that
+
where each r e ( x )can be written
for some constants a p ( k ) and where g~(x)is a scale .7 dyadic step function (Figure 5.6). But this just means that g J ( z ) is a finite linear combination of the collection {pJ , (~s)}:Lil. Thus gi ( z ) is in the span of the scale J Haar system on [ O , l ] and I f - gj1I2 < t.
0
124
Chapter 5. The Haar System
FIGURE 5.4. Some of the Haar scaling functions p , , k ( x ) on [0,1).
Example 5.25. (a) Let f (.c) = X[0.3/4)( 2 ) . Taking J = 0, we see that = 314, and that ( f ,11,,7.k)= O wheiiever Ij,kC [O: 314) or I j , k 5 (f, [3/4,1). This is true for every j > 2 arld all 0 5 k: 5 2J - 1. Thus, the only ilorizero Haar coefficieiits are (f , ho.") = 114 arid ( f ,h l T l )= 2 - v 2 . Notice that the P ( ~ , ~ ( Xterrn ) is simply the average value of the function on 10, I ) , and that the only nonzero Haar coefficients correspond to the Haar functions that "strad(lleVthe discontinuity of f (z). (lr). Again, assuine that J = 0. Then (.f, po,o) = (b) Let f ( z ) = X p l 11/16, which is the average value of f (x) on [0, 11, and ( f ,hj,k) = 0 whenever IjSk C: [O, 11/16) or I j , k C: [11/16,1). This is true for every j 2 4 and all 0 k 5 2j - 1. The only nonzero Haar coefficients are ( f ,ho.o)= 5 . T4, ( f ,h l , l ) = 3 . r 7 i 2 ,(f,h2,2)= T3,and ( f ,h3,5)= 2-5/2. See Figure 5.7.
<
(c) Let f (x) = X [ 0 , 2 / 3 ) ( ~and ) , assume that J = 0. Since 2/3 is not a dyadic rational number (that is, one whose denominator is a power of 2), there will be nonzero Haar coefficierlts for this furlctiorl a t all scales. However, note that ( f ,t ~ j , = ~ )0 if 213 6 I,,k so that at each scale there is exactly one nonzero Haar coefficient. We list the absolute values of several of these coefficients below. Note that as the scale increases, the size of the coefficients
5 . 3 . Haar Bases on [ O , 1 ]
125
FIGURE 5.5. Illustration of Lemma 5.23. Approximation of a continuous function by scale 2, 3, and 4 dyadic step functions.
decreases, but as tlle third colurr~ilof the table indicates, the coefficients are exactly proportional t o 2-j/2. See Figure 5.8. (d) Let 0 x E [0,1/3), 6x - 2 z E [1/3,1/2), .f (.c) = - 6 : ~+ 4 J E [1/2,2/3), 0 x E [2/3, 1).
This function is zero on 10,113) U [2/3, I ) , rises to 1 linearly on [1/3,1/2) and falls to zcro again on 1112,213). Hence, f (x) is C0 on [O,1)with discontinuities in its first derivative at x = 113, 112. a~ltl213. Note tllal nonzero Haar coefficie~ltsare possit~leori1-y when I j , kn [1/3,2/3) # 8.This nlearls that at each scale, about 213 of the Haar coefficients will be zero (See Figure 5.10 and cf. Section 5.4.1). In the previous example, we saw that near the jump discontinuity, the Haar coefficients were proportional t o 2-jl2. Consider the behavior of the Hamarcoefficient,^ near the corner at x = 213. Some values of the Haar coefficients for which 213 E Ij.k are listed in the table in Figure 5.9. Note that the Haar coefficients are approximately proportional to 2-3j/2. Now consider the behavior of those Haar coefficients for which 17,kC [1/2,2/3). It is easy to calculate that each of these Haar coefficients is
126
Chapter 5. The Haar System
FIGURE 5.6. Full decomposition of the function y4(x) of Figure 5.2 using the Splitting Lemma. Down the left column are the functions g:<(x),. q ~ ( x ) , (x),and go(x). Down the right column are the functions 7-3 (.c), T Z (z), r 1 (x),and T O (x).
eqllal to (3/2) 2 - " . 1 / ~ Thus, . the Haar coefficients bchave like 2-"12.
Exercises Exercise 5.26. Prove Lernma 5.23. (Hint: Use the fact that since f (x) is continuous on [O. I], it must be unzform,Zy continuous on [0,I].)
5.4. Comparison of Haar with Fourier
127
FIGURE 5.7. Example 5.25(b).
Exercise 5.27. Prove that every scale j dyadic step Pr~nctionis also a dyadic step functiori at all scales larger than j , and that any linear combination of scale j dyadic step function s is also a scale j dyadic step function. Exercise 5.28. Calculate (numerically or analytically) the Haar coefficients up to scale j = 8 for the function f ( t )= 8 t 2 ( 1 - t ) on [O, 1).Explain the behavior of the coefficients as j increases.
5.4
Comparison of Haar Series with Fourier Series
Note that by Remark 5.12, each Haar function h3,k(x)vanishes outside the interval I j , k .The length of the interval Ij,kis 2-,I SO that if j is large, then the length of Ij,k is small. We say therefore, that the function h,,k(x) is well localized in time (or depending on the context, well localized in space). This property is to be contrasted with the trigonometric basis {e2"i7Lx}nEZ. Note that each element of the trigonometric basis has absolute value 1 for every z E [0, I ) , and so it never vanishes for any x.
128
Chapter 5. T h e Haar System
FIGURE 5.8. Example 5.25(c).
FIGURE 5.9. Example 5.25(d).
5.4.1 Representation of Functions with Small Support One consequence of the good time localization of the Haar basis is that a function f (z) that vanishes outside a small subinterval ( a ,b) of [0, 1 ) will have rrlosL of its Haar coefficients identically zero, since the Haar coefficient ( f ,h j , k )will be zero if does not intersect ( a , b). Note that even if a function f (x) was supported inside a small subinterval of [0,I ) , that most of its Fourier coefficients would be nonzero. Of course, in both cases, we are dealing with infinite series with infinitely many coefficients; so we must be precise about what we mean by "most" or "manynof either the Fourier or the Haar coefficients of a given function. In the case of the Haar series, let us fix an integer j 0 and note that there are 2j functions h j , k ( x )in the Haar system on [0, I). Given a function f (z) supported in an interval ( a ,b), then for this j , the Haar coefficients
>
130
Chapter 5. The Haar Systcm
so that
Therefore, we conclude that t l ~ efraction of possibly rlorlzero Haar cocfficierlts for a fulictio~lvstrlisliing outside an interval is approximately proportional t o the lengtli of that interval (see also Figure 5.10).
5.4.2 Behavior 0.f Huur Coeficients Ncur Jump Disco~~tinuilies Suppose that f (z) is a furictiorl defined on [ O , l ] , with a j u ~ i ~discontinuity p a t 2 0 E ( 0 , l ) arid contiriuous a t all other points in [0, I]. The fact that the Haar functions hj,k(x) have good localization iii tirrie leads us t o ask tlic question: Do the H a a ~coc:ficient.s (f,I L ~ . ~such ) that x0 E 13,k behave differer~tlythan do the H(l,ar. coefjicients S ~ L C that ~ L ~r:o $ IJ,k? In ptirticular, can wc firltl the location of a jump discontin~~ity just by examining tlle ;~l)solutevaluc of tlle Haar coefficients? Wc will sce that in fact wc can do tliis. For sirriplicity, let 11s asswnc that the givcn functio~if ( J ; ) is C2 or1 the iritervals [O, xu] :-tnd [.xu, I]. Tliis nlcarls that both f t ( x ) and f t t ( x ) exist. are co~it,iriuousfunctions, arid lierice are l)ou~idcdon each of these intervals. Fix i~ltegcrsj 2 0 a~itlO 5 k 5 2 5 - 1, anc1 let .c5,k 1)c the rriidpoirit of the interval IJ.k.;that is, .r,?k. - 2 - j ( k : 1/21. Tliere ;ire ~ i o wtwo possibilities; cithtr :x:o E I j . k .or 20 $!
+
Casel: .rao @ I,,,.. If .r.o 6 I j , k , then expancling f (:r) a1)out .x.,,,.by Taylor's fo~~iliulit, it follows tllttt for all .r: E
wliore
Ej,,.
i s s o ~ n epoint, i r ~rJ,k. Now, using thc k c t that
h j q k ( xd)s
-
0,
5.4. Comparison of Haar with Fourier
131
If j is large, then 2-5f12 will be very small compared with 2-"12; so we conclude that for large j ,
Tf xo E I,,*, t,hen either it is in I:,, or it is in I:,. Let us Case 2: r o E assume that xo E I:,,. The other case is similar (Exercise 5.30). Expanding f (x)in a Taylor series about $0, we have
Therefore,
where
Thus,
If j is large, then T 3 j I 2will be very small compared with 2 - j I 2 ; so we conclude that for large j ,
132
Chapter 5. The Haar System
The quantity Izo - 2TJk can in principle be small if s o is close t o the left endpoint of I ; , , arid can even be zero. However, we can expect that in rnost cases, xo will bc iii the middle of so that s o- 2-Jkl F; (114) 2-J. Thus, for large j. 1 ( f , )= f ( - ) - f ( ~ o + ) 2 - . 7 / ~ . (5.8)
11~
1 for 1a.rgej is Cornparing (5.8) with (5.6), we see that the decay of I ( f ,11 considerably slower if zo E Ij.kthan if :co @ IJ,k. That is, large coefficients in the Haar expansion of a fiinction f (.c) that persist for all scales suggest the preserice of a jurnp tliscoritiriliity ill the intervals IJ,k corrt:sporiclirig to the largc coefficients. 5.4.3 H a w Coeficients and Global Smoothness We know that the global silioothrless of a function f (.c) defined on [O. I] is reflected in the decay of its Fourier coefficients. Specifically, if f (z) is periodic and CK 011 R. thcn therc exists a constant A depending on f (z) such that for all Ir E Z, I(., 1 A ~ T I J - ~where . c,, are the Fourier coefficients of f (z) (Excrcisc 3.38). This can be regarded as a stntcrrlcnt about the frequency contcrlt of srnootli fi~nctions~ rlanlcly that snloother filnctions tend to have smaller high-frequency corrlponerlt,~than do filnctiorls that are not smootli. However. no such cstirnat,~holds for tlie Haar series. To see tliis. sirrlply note that tlie f~~nctiori f (n.) = o'"' has period 1 and is Cw on R with all of its dcrivatiw:~1)oilnded by 1 (Show tliis). But by Exercise 5 . 3 2 ,
<
-
ant1 since siri(:c)/~:= 1 a,t .c = 0 and sirice sin((1/4) 2-.I) % (114) 2-J for large j , this means that I(f. I I , . ~ ) (114) 2-"/' for large j . But this is the same rate of decay ol~servedfor furictions continuous lout with a tliscoritiriuous first derivative (Exaniplc 5.25(tl)). Herice, g1ol)al snloothness of a furlctiorl docs riot affect the rate of decay of its Haar coefficierits.
Exercises

Exercise 5.29. Show that if f(x) is C^3 on I_{j,k}, then the Haar coefficients satisfy

    |<f, h_{j,k}>| = (1/4) |f'(x_{j,k})| 2^{-3j/2} + rho_{j,k},

where

    |rho_{j,k}| <= (1/768) max_{x in I_{j,k}} |f'''(x)| 2^{-7j/2}.
Exercise 5.30. Show that if x0 is in I^r_{j,k}, then for large j,

    |<f, h_{j,k}>| ~ 2^{j/2} |2^{-j}(k+1) - x0| |f(x0-) - f(x0+)|.
Exercise 5.31. Let a1 denote the first positive local maximum of the function f(x) = sin^2(x)/x. Show that a1 = 1.16556 . . ..
Exercise 5.32. For integers j >= 0 and 0 <= k <= 2^j - 1, let

    hhat_{j,k}(n) = integral of h_{j,k}(x) e^{-2 pi i n x} dx.

(a) Show that

    hhat_{j,k}(n) = i 2^{-j/2} e^{-2 pi i n x_{j,k}} sin^2(pi n 2^{-j-1}) / (pi n 2^{-j-1}),

where x_{j,k} = 2^{-j}(k + 1/2) is the midpoint of I_{j,k}.

(b) Show that |hhat_{j,k}(n)| is maximized when |n| = (3/4) 2^j, and that the first positive zero of |hhat_{j,k}(n)| occurs when n = 2^{j+1} (see Exercise 5.31). Hence it is reasonable to say that h_{j,k}(x) is localized at the 2^j frequencies satisfying 2^{j-1} <= |n| <= 2^j.
Exercise 5.33. Using MATLAB or some other software, explore numerically the statement that h_{j,k}(x) is approximately localized in frequency at the frequencies satisfying 2^{j-1} <= |n| <= 2^j. Specifically, with f(x) = cos(2 pi m x) for some fixed positive integer m, compute the quantities

    A_j = sum over k of |<f, h_{j,k}>|^2

for various j. Show numerically that A_j is largest when 2^{j-1} <= m <= 2^j. Do the same when f(x) is some finite linear combination of cosines at various frequencies.
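One way to carry out such an experiment is sketched below in Python rather than MATLAB; the choice of test frequency m, the sampling grid, the use of a Riemann sum for the inner products, and the reading of A_j as the total squared energy of the scale-j coefficients are all our own assumptions.

```python
# Numerical sketch for Exercise 5.33 (our own implementation choices): estimate
# A_j = sum_k |<f, h_{j,k}>|^2 on [0,1] for f(x) = cos(2*pi*m*x) and check that
# A_j peaks for the scale j with 2^(j-1) <= m <= 2^j.
import numpy as np

N = 14
t = (np.arange(2**N) + 0.5) / 2**N
m = 12                                   # expect the peak at j = 4 since 2^3 <= 12 <= 2^4
f = np.cos(2 * np.pi * m * t)

def haar_energy(f, j, N):
    """Riemann-sum approximation of sum_k |<f, h_{j,k}>|^2 at scale j."""
    blocks = f.reshape(2**j, -1)          # each row holds the samples in one I_{j,k}
    half = blocks.shape[1] // 2
    coeffs = 2**(j / 2) * (blocks[:, :half].sum(axis=1) - blocks[:, half:].sum(axis=1)) / 2**N
    return np.sum(coeffs**2)

for j in range(1, 9):
    print(f"j={j}: A_j ~ {haar_energy(f, j, N):.4f}")
```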
5.5 Haar Bases on R In this section, we wish to define a system of Haar and scaling functions that form a complete orthonormal system on R.
5.5.1 The Approximation and Detail Operators
Definition 5.34. For each j in Z, define the approximation operator P_j on functions f(x), L^2 on R, by

    P_j f(x) = sum over k in Z of <f, p_{j,k}> p_{j,k}(x).
Remark 5.35.

(a) For each j in Z, define the approximation space V_j to be the closed span of {p_{j,k}(x) : k in Z}. Since {p_{j,k}(x) : k in Z} is an orthonormal system on R, Lemma 2.51 implies that P_j f(x) is the function in V_j best approximating f(x) in the L^2 sense.

(b) Since p_{j,k}(x) = 2^{j/2} chi_{I_{j,k}}(x), for x in I_{j,k},

    P_j f(x) = 2^j integral over I_{j,k} of f(t) dt.

In other words, on the interval I_{j,k}, P_j f(x) is the average value of f(x) on I_{j,k}.

For this reason, we think of the function P_j f(x) as containing the features of f(x) at resolution or scale 2^{-j}. In other words, in passing from f(x) to P_j f(x), the behavior of f(x) on the interval I_{j,k} is reduced to a single number, the average value of f(x) on I_{j,k}. Therefore, any data about the small-scale variation of f(x) on I_{j,k} is lost. In this sense, P_j f(x) can be thought of as a blurred version of f(x) at scale 2^{-j}; that is, the details in f(x) of size smaller than 2^{-j} are invisible in the approximation P_j f(x), but features of size larger than 2^{-j} are still discernible in P_j f(x).

We can prove the following facts about the operators P_j.
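Since P_j f is just the dyadic average of f at scale 2^{-j}, it is easy to compute on sampled data. The following is a minimal sketch, not from the text; samples of f on a fine grid stand in for f itself, and the test function is arbitrary.

```python
# Minimal sketch: P_j f realized as averaging over the dyadic intervals I_{j,k},
# using samples of f on a grid of 2^N points in [0,1).  (Our own discretization.)
import numpy as np

def haar_approximation(samples, j):
    """Return samples of P_j f, where `samples` are values of f on a dyadic grid."""
    block = len(samples) // 2**j           # samples per interval I_{j,k}
    means = samples.reshape(2**j, block).mean(axis=1)
    return np.repeat(means, block)         # piecewise constant on each I_{j,k}

N = 10
t = (np.arange(2**N) + 0.5) / 2**N
f = np.exp(-t) * np.sin(6 * np.pi * t)
for j in (2, 4, 6):
    err = np.sqrt(np.mean((f - haar_approximation(f, j))**2))
    print(f"j={j}: L2 error of P_j f is about {err:.4f}")   # shrinks as j grows (Lemma 5.37(a))
```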
Lemma 5.36.

(a) For each j in Z, P_j is linear; that is, given f(x) and g(x), L^2 on R, and alpha, beta in C,

    P_j(alpha f + beta g)(x) = alpha P_j f(x) + beta P_j g(x).

(b) For each j in Z, P_j is idempotent; that is, given f(x), L^2 on R,

    P_j P_j f(x) = P_j f(x).

(c) Given integers j, j' with j <= j', and g(x) in V_j,

    P_{j'} g(x) = g(x).

(d) Given j in Z, and f(x), L^2 on R,

    ||P_j f||_2 <= ||f||_2.
Proof: The proofs of parts (a) and (b) are left as exercises (Exercise 5.46). Part (c) follows from (b) and the fact that if g(x) is in V_j, then P_j g(x) = g(x). We will prove (d) in detail. Since {p_{j,k}(x)}_{k in Z} is an orthonormal system on R, by Exercise 2.60,

    ||P_j f||_2^2 = sum over k of |<f, p_{j,k}>|^2.

By the Cauchy-Schwarz inequality,

    |<f, p_{j,k}>|^2 = 2^j | integral over I_{j,k} of f(t) dt |^2 <= integral over I_{j,k} of |f(t)|^2 dt.

Thus,

    ||P_j f||_2^2 <= sum over k of integral over I_{j,k} of |f(t)|^2 dt = ||f||_2^2.
Lemma 5.37. Given f(x), C^0 with compact support on R,

(a) lim as j goes to infinity of ||P_j f - f||_2 = 0, and

(b) lim as j goes to minus infinity of ||P_j f||_2 = 0.

Proof: Suppose that f(x) is supported in an interval of the form [-2^N, 2^N] for some integer N, and let epsilon > 0. By Exercise 5.26(b), there is an integer J and a function g(x) in V_J such that

    ||f - g||_2 < epsilon/2.    (5.10)

If j >= J, then by Lemma 5.36(c), P_j g(x) = g(x), and by Minkowski's inequality and Lemma 5.36(d),

    ||P_j f - f||_2 <= ||P_j f - P_j g||_2 + ||g - f||_2 = ||P_j(f - g)||_2 + ||g - f||_2 <= 2 ||f - g||_2.

Combining this with (5.10) proves (a). The proof of part (b) is left as an exercise (Exercise 5.48).
Definition 5.38. For each j in Z, define the detail operator Q_j on functions f(x), L^2 on R, by

    Q_j f(x) = P_{j+1} f(x) - P_j f(x).    (5.11)

Remark 5.39.

(a) For each j in Z, we define the wavelet space W_j to be the closed span of {h_{j,k}(x) : k in Z}. Since {h_{j,k}(x)}_{k in Z} is an orthonormal system on R and in light of (5.12), Lemma 2.51 implies that Q_j f(x) is the function in W_j best approximating f(x) in the L^2 sense.

(b) In light of the interpretation of P_j f(x) as the blurred version of f(x) at scale 2^{-j}, we can interpret Q_j f(x) as containing those features of f(x) that are of size smaller than 2^{-j} but larger than 2^{-j-1}. That is, Q_j f(x) has those details invisible to the approximation P_j f(x) but visible to the approximation P_{j+1} f(x).
Lemma 5.40.

(a) For each j in Z, Q_j is linear; that is, given f(x) and g(x), L^2 on R, and alpha, beta in C,

    Q_j(alpha f + beta g)(x) = alpha Q_j f(x) + beta Q_j g(x).

(b) For each j in Z, Q_j is idempotent; that is, given f(x), L^2 on R,

    Q_j Q_j f(x) = Q_j f(x).

(c) If g(x) is in W_j and if j' is an integer with j' not equal to j, then

    Q_{j'} g(x) = 0.

(d) Given j in Z, and f(x), L^2 on R,

    ||Q_j f||_2 <= ||f||_2.
Proof: Exercise 5.47.
Lemma 5.41. Given j in Z, and a function f(x), C^0 with compact support on R,

    Q_j f(x) = sum over k of <f, h_{j,k}> h_{j,k}(x),    (5.12)

where the sum is finite.

Proof: To prove this, let j in Z be given and let f(x) be C^0 with compact support on R. Consider Q_j f(x) for x in I_{j,k}. Note that P_j f(x) is the average value of f(x) on I_{j,k}, while P_{j+1} f(x) is the average value of f(x) on I^l_{j,k} for x in I^l_{j,k}, and the average value of f(x) on I^r_{j,k} for x in I^r_{j,k}. Since

    h_{j,k}(x) = 2^{j/2} if x is in I^l_{j,k},  -2^{j/2} if x is in I^r_{j,k},  and 0 otherwise,

a direct computation on each half of I_{j,k} shows that on I_{j,k},

    Q_j f(x) = <f, h_{j,k}> h_{j,k}(x),

and (5.12) follows.
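Lemma 5.41 can be checked on sampled data by computing P_{j+1} f - P_j f directly and comparing it with the Haar expansion on the right side of (5.12). The following sketch uses our own discretization, not the text's.

```python
# Numerical check (our own discretization) of Lemma 5.41: the detail
# Q_j f = P_{j+1} f - P_j f coincides with sum_k <f, h_{j,k}> h_{j,k}.
import numpy as np

def haar_approximation(samples, j):
    block = len(samples) // 2**j
    return np.repeat(samples.reshape(2**j, block).mean(axis=1), block)

N, j = 10, 4
t = (np.arange(2**N) + 0.5) / 2**N
f = np.cos(3 * np.pi * t) + t**2

detail = haar_approximation(f, j + 1) - haar_approximation(f, j)

# Rebuild the same detail from the Haar coefficients <f, h_{j,k}>.
expansion = np.zeros_like(f)
block = 2**N // 2**j
for k in range(2**j):
    seg = f[k * block:(k + 1) * block]
    coeff = 2**(j / 2) * (seg[:block // 2].sum() - seg[block // 2:].sum()) / 2**N
    h = np.zeros_like(f)
    h[k * block:k * block + block // 2] = 2**(j / 2)
    h[k * block + block // 2:(k + 1) * block] = -2**(j / 2)
    expansion += coeff * h

print("max difference:", np.max(np.abs(detail - expansion)))   # ~0 up to rounding
```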
5.5.2 The Scale J Haar System on R

Definition 5.42. Let J in Z be given. The collection

    { p_{J,k}(x) : k in Z } union { h_{j,k}(x) : j >= J; k in Z }

is called the scale J Haar system on R.
Theorem 5.43. The scale J Haar system on R is a complete orthonormal system on R.
Proof: Note first that by Theorem 5.15(a), the scale J Haar system on R is orthonormal on R. To prove completeness, by Theorem 2.57(c), it is sufficient to show that every function f(x), C^0 with compact support on R, can be approximated arbitrarily well in L^2 by finite linear combinations of elements of the system. To this end, let epsilon > 0 and let f(x) be C^0 with compact support on R. By Lemma 5.37(a), there is an integer N such that ||P_N f - f||_2 < epsilon. We can take N > J. By (5.11),

    P_N f(x) = P_J f(x) + sum from j = J to N-1 of Q_j f(x).

Therefore,

    P_N f(x) = sum over k of <f, p_{J,k}> p_{J,k}(x) + sum from j = J to N-1 of sum over k of <f, h_{j,k}> h_{j,k}(x),

where the sum over k is necessarily finite because f(x) is compactly supported. Since P_N f(x) is therefore a finite linear combination of elements of the scale J Haar system and ||P_N f - f||_2 < epsilon, completeness is proved.
5.5.3 The Haar System on R

Theorem 5.44. The Haar system on R (Definition 5.11) is a complete orthonormal system on R.
Proof: By Theorem 5.13, the Haar system on R is orthonormal on R. To prove completeness, by Theorem 2.57(c), it is sufficient to show that every function f(x), C^0 with compact support on R, can be approximated arbitrarily well in L^2 by finite linear combinations of Haar functions. To this end, let epsilon > 0 and let f(x) be C^0 with compact support on R. For any J in N, by (5.11),

    P_J f(x) = P_{-J} f(x) + sum from j = -J to J-1 of Q_j f(x).

By Lemma 5.37(a) and (b), there exists J in N so large that

    ||P_J f - f||_2 < epsilon/2  and  ||P_{-J} f||_2 < epsilon/2.

Therefore, by Minkowski's inequality,

    || f - sum from j = -J to J-1 of Q_j f ||_2 <= ||f - P_J f||_2 + ||P_{-J} f||_2 < epsilon.

Also,

    sum from j = -J to J-1 of Q_j f(x) = sum from j = -J to J-1 of sum over k of <f, h_{j,k}> h_{j,k}(x),

where the sum over k is necessarily finite because f(x) is compactly supported. Since this is a finite linear combination of Haar functions, completeness is proved.
Exercises

Exercise 5.45. Expose the fallacy in the following argument: Let f(x) be L^2 on R. By Theorem 5.44, we may write

    f(x) = sum over j, k of <f, h_{j,k}> h_{j,k}(x).

Integrating both sides, and using the fact that each h_{j,k}(x) has integral zero, we obtain the conclusion that the integral of f(x) dx equals 0. Therefore, every function L^2 on R must satisfy integral of f(x) dx = 0. (Hint: The fallacy has nothing to do with f(x). That is, we may assume that f(x) is integrable, or infinitely differentiable, or compactly supported, and it will not change the argument as written.)

Exercise 5.46. Prove Lemma 5.36(a) and (b).

Exercise 5.47. Prove Lemma 5.40.

Exercise 5.48. Prove Lemma 5.37(b). (Hint: Use Minkowski's inequality and the fact that f(x) is compactly supported.)
Chapter 6

The Discrete Haar Transform

6.1 Motivation

Recall that a function f(x) defined on [0,1] has an expansion in terms of Haar functions as follows. Given any integer J >= 0,

    f(x) = sum from k = 0 to 2^J - 1 of <f, p_{J,k}> p_{J,k}(x) + sum from j = J to infinity of sum from k = 0 to 2^j - 1 of <f, h_{j,k}> h_{j,k}(x)    (6.1)

in L^2 on [0,1]. In order to motivate a discrete version of this expansion, the Discrete Haar Transform (DHT), we assume that we have only a finite, discrete approximation to f(x). In this case, the most natural such approximation is by the dyadic step function P_N f(x), where N is in N and N > J. That is, given f(x),

    f(x) ~ P_N f(x) = sum from k = 0 to 2^N - 1 of <f, p_{N,k}> p_{N,k}(x).    (6.2)
Thus, the Haar coefficients of f(x) can be approximated by the Haar coefficients of P_N f(x). That is,

    <f, h_{j,k}> ~ <P_N f, h_{j,k}>.    (6.3)

Note that

(1) by Theorem 5.15(a), if j >= N, then <P_N f, h_{j,k}> = 0 for 0 <= k <= 2^j - 1, and

(2) since lim as N goes to infinity of P_N f(x) = f(x) in L^2 on R (Lemma 5.37(a)), the Cauchy-Schwarz inequality implies that

    lim as N goes to infinity of <P_N f, h_{j,k}> = <f, h_{j,k}>  and  lim as N goes to infinity of <P_N f, p_{j,k}> = <f, p_{j,k}>    (6.4)

(Exercise 6.5).

Suppose that we are given a finite sequence of data of length 2^N for some N in N, {c_0(k)}_{k=0}^{2^N - 1}. We assume that for some underlying function f(x), c_0(k) = <f, p_{N,k}>. Fix J in N, J < N, and for each 1 <= j <= J, define

    c_j(k) = <f, p_{N-j,k}>  and  d_j(k) = <f, h_{N-j,k}>.
It turns out that there is a convenient recursive algorithm that can be used to compute the coefficients c_j(k) and d_j(k) from c_{j-1}(k). This algorithm uses the fact, proved in Exercise 5.18, that for each l, k in Z,

    p_{l,k}(x) = (1/sqrt(2)) (p_{l+1,2k}(x) + p_{l+1,2k+1}(x)),    (6.5)
    h_{l,k}(x) = (1/sqrt(2)) (p_{l+1,2k}(x) - p_{l+1,2k+1}(x)).    (6.6)

Therefore,

    c_j(k) = <f, p_{N-j,k}> = (1/sqrt(2)) <f, p_{N-j+1,2k}> + (1/sqrt(2)) <f, p_{N-j+1,2k+1}> = (1/sqrt(2)) (c_{j-1}(2k) + c_{j-1}(2k+1)),    (6.7)

and by (6.6),

    d_j(k) = <f, h_{N-j,k}> = (1/sqrt(2)) (c_{j-1}(2k) - c_{j-1}(2k+1)).    (6.8)

By writing (6.7) and (6.8) in matrix form, it is easy to see that the calculation is completely reversible. In fact,

    c_{j-1}(2k) = (1/sqrt(2)) (c_j(k) + d_j(k))  and  c_{j-1}(2k+1) = (1/sqrt(2)) (c_j(k) - d_j(k)).
6.1.1 The Discrete Haar Transform (DHT)

Therefore, we make the following definition.

Definition 6.1. Given J, N in N with J < N and a finite sequence c_0 = {c_0(k)}_{k=0}^{2^N - 1}, the Discrete Haar Transform (DHT) of c_0 is defined by

    { d_j(k) : 1 <= j <= J; 0 <= k <= 2^{N-j} - 1 } union { c_J(k) : 0 <= k <= 2^{N-J} - 1 },

where

    c_j(k) = (1/sqrt(2)) (c_{j-1}(2k) + c_{j-1}(2k+1))  and  d_j(k) = (1/sqrt(2)) (c_{j-1}(2k) - c_{j-1}(2k+1)).

The inverse DHT is given by the formula

    c_{j-1}(2k) = (1/sqrt(2)) (c_j(k) + d_j(k)),  c_{j-1}(2k+1) = (1/sqrt(2)) (c_j(k) - d_j(k)).
As with the DFT, the DHT can be thought of as a linear transformation on a finite-dimensional space and as such can be written as multiplication by a matrix.
Definition 6.2. Given L in N even, define the (L/2) x L matrices H_L and G_L by

    H_L = (1/sqrt(2)) [ 1  1  0  0  ...  0  0
                        0  0  1  1  ...  0  0
                        .................
                        0  0  0  0  ...  1  1 ],

    G_L = (1/sqrt(2)) [ 1 -1  0  0  ...  0  0
                        0  0  1 -1  ...  0  0
                        .................
                        0  0  0  0  ...  1 -1 ].

Define the L x L matrix W_L to be the matrix whose first L/2 rows are the rows of H_L and whose last L/2 rows are the rows of G_L. The matrix H_L is referred to as the approximation matrix, the matrix G_L as the detail matrix, and the matrix W_L as the wavelet matrix.
From here on, we will suppress the subscript L in the matrices H_L, G_L, and W_L in order to clarify the presentation. The value of L will either be clear from context or will be indicated separately. It is easy to verify that for each L, W is an orthogonal matrix (Exercise 6.6). Hence its adjoint is its inverse; that is,

    W*W = H*H + G*G = I,

where I is the L x L identity matrix.
If we consider our initial sequence of data, c_0, to be a vector of length L = 2^N, then the DHT algorithm reduces to matrix multiplication. Specifically, let c_0 = (c_0(0) c_0(1) . . . c_0(2^N - 1)), and for 1 <= j <= J, let

    c_j = (c_j(0) c_j(1) . . . c_j(2^{N-j} - 1))

and

    d_j = (d_j(0) d_j(1) . . . d_j(2^{N-j} - 1)).

Then the DHT of c_0 is given by

    c_j = H c_{j-1}  and  d_j = G c_{j-1},

where H and G are 2^{N-j} x 2^{N-j+1} matrices. The inverse DHT is given by

    c_{j-1} = H* c_j + G* d_j.
The above can be summarized as follows.

Theorem 6.3. Given J, N in N with J < N and a vector

    c_0 = (c_0(0) c_0(1) . . . c_0(2^N - 1))

of length 2^N, the DHT of c_0 is the vector consisting of c_J together with d_1, d_2, . . . , d_J, where

    c_j = H c_{j-1}  and  d_j = G c_{j-1},

where H and G are 2^{N-j} x 2^{N-j+1} matrices as in Definition 6.2, and 1 <= j <= J. The inverse DHT is given by

    c_{j-1} = H* c_j + G* d_j.
Theorem 6.3 is illustrated in the tree diagram in Figure 6.1.

FIGURE 6.1. Tree diagram for the DHT.
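The recursion in Theorem 6.3 translates directly into code. The following Python sketch is our own implementation, not taken from the text; it computes the DHT of a length-2^N vector by repeated averaging and differencing, and inverts it exactly.

```python
# Sketch of the DHT of Definition 6.1 / Theorem 6.3 (our own code, based on the
# averaging/differencing recursion (6.7)-(6.8)).  Input length must be 2^N.
import numpy as np

def dht(c0, J):
    """Return (cJ, [d1, d2, ..., dJ]) for the length-2^N vector c0."""
    c, details = np.asarray(c0, dtype=float), []
    for _ in range(J):
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2))   # d_j = G c_{j-1}
        c = (even + odd) / np.sqrt(2)               # c_j = H c_{j-1}
    return c, details

def idht(cJ, details):
    """Invert dht: c_{j-1}(2k) = (c_j(k)+d_j(k))/sqrt(2), and so on."""
    c = np.asarray(cJ, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * len(c))
        out[0::2] = (c + d) / np.sqrt(2)
        out[1::2] = (c - d) / np.sqrt(2)
        c = out
    return c

c0 = np.random.randn(2**8)
cJ, ds = dht(c0, J=3)
print(np.allclose(idht(cJ, ds), c0))   # True: the transform is invertible
```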
Theorem 6.4. Given N in N, the DHT of a vector of length 2^N can be computed with no more than 2^{N+1} multiplications.

Proof: Let c_0 = (c_0(0) c_0(1) . . . c_0(2^N - 1)) be given. Computing

    c_1(k) = (1/sqrt(2)) (c_0(2k) + c_0(2k+1))

requires one multiplication for each 0 <= k <= 2^{N-1} - 1, and computing d_1(k) = (1/sqrt(2)) (c_0(2k) - c_0(2k+1)) requires one multiplication for each k, for a total of 2^N multiplications. Computing c_2 and d_2 requires 2^{N-1} multiplications. It follows that the total number of multiplications is at most

    2^N + 2^{N-1} + 2^{N-2} + . . . <= 2^{N+1}.
Exercises

Exercise 6.5. Show that (6.4) holds for any function continuous on [0,1].

Exercise 6.6. Prove that the matrix W_L defined in Definition 6.2 is symmetric and orthogonal.
Exercise 6.7. Prove that if {d_j(k) : 1 <= j <= J; 0 <= k <= 2^{N-j} - 1} union {c_J(k) : 0 <= k <= 2^{N-J} - 1} is the DHT of a sequence {c_0(k)}_{k=0}^{2^N - 1}, then

    sum from k = 0 to 2^N - 1 of |c_0(k)|^2 = sum from j = 1 to J of sum from k = 0 to 2^{N-j} - 1 of |d_j(k)|^2 + sum from k = 0 to 2^{N-J} - 1 of |c_J(k)|^2.
6.2 The DHT in Two Dimensions

In many applications, especially image processing, the objects being analyzed are best thought of as matrices, rather than one-dimensional finite signals. That is, we are interested in L x M matrices c of the form c = {c(n, m) : 0 <= n <= L - 1; 0 <= m <= M - 1}. The purpose of this section is to define a generalization of the DHT for matrices.
6.2.1 The Row-wise and Column-wise Approximations and Details

The generalization of the DHT to two-dimensional signals involves the separate application of the ordinary DHT to the rows and columns of the signal. Given an even number L in N, let H and G be the (L/2) x L matrices defined in Definition 6.2. Let c be an M x L matrix with rows c_0, c_1, . . . , c_{M-1}, where c_l is the lth row of c. We define the row-wise approximation matrix of c, H^row c, to be the M x (L/2) matrix whose lth row is the ordinary averaging step H applied to the lth row of c, and the row-wise detail matrix of c, G^row c, to be the M x (L/2) matrix whose lth row is G applied to the lth row of c. In fact,

    (H^row c)(n, m) = (1/sqrt(2)) c(n, 2m) + (1/sqrt(2)) c(n, 2m+1),
    (G^row c)(n, m) = (1/sqrt(2)) c(n, 2m) - (1/sqrt(2)) c(n, 2m+1).

Clearly, H^row c is the matrix obtained by multiplying each row of c by the matrix H, and G^row c is the matrix obtained by multiplying each row of c by the matrix G. See Figure 6.3 (left).
Given L in N even, let c be an L x M matrix with columns c_0, c_1, . . . , c_{M-1}, where c_l is the lth column of c. We define the column-wise approximation matrix of c, H^col c, to be the (L/2) x M matrix whose lth column is H c_l. In fact,

    (H^col c)(n, m) = (1/sqrt(2)) c(2n, m) + (1/sqrt(2)) c(2n+1, m).    (6.12)

We define the column-wise detail matrix of c, G^col c, to be the (L/2) x M matrix whose lth column is G c_l. In fact,

    (G^col c)(n, m) = (1/sqrt(2)) c(2n, m) - (1/sqrt(2)) c(2n+1, m).    (6.14)

Clearly, H^col c is the matrix obtained by multiplying each column of c by the matrix H, and G^col c is the matrix obtained by multiplying each column of c by the matrix G. See Figure 6.3 (right).
6.2.2 The DHT for Matrices

We are now ready to define the DHT for matrices. For simplicity we will assume (1) that the matrices we analyze are square, and (2) that the number of rows and columns of these matrices are powers of two; that is, that c is always a 2^N x 2^N matrix for some N in N.

Definition 6.8. Given J, N in N with J < N and a matrix c_0 = {c(n, m)}_{n,m=0}^{2^N - 1}, for 1 <= j <= J define the 2^{N-j} x 2^{N-j} matrices c_j, d_j^(1), d_j^(2), and d_j^(3) by

    c_j = H^col H^row c_{j-1},   d_j^(1) = G^col H^row c_{j-1},
    d_j^(2) = H^col G^row c_{j-1},   d_j^(3) = G^col G^row c_{j-1},    (6.15)

where H^col, G^col, H^row, and G^row are the 2^{N-j} x 2^{N-j+1} matrices defined by (6.9)-(6.13). The DHT of c_0 is the collection of matrices

    { d_j^(1), d_j^(2), d_j^(3) : 1 <= j <= J } union { c_J }.
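A single stage of Definition 6.8 amounts to applying the averaging/differencing step along the rows and then along the columns. The following Python sketch is our own implementation, not the text's, and it only loosely mimics the quadrant layout shown in Figures 6.4 and 6.5.

```python
# One stage of the 2-D DHT of Definition 6.8 (our own code): split a
# 2^N x 2^N matrix c into the approximation c1 and the details d1, d2, d3.
import numpy as np

def dht2_step(c):
    c = np.asarray(c, dtype=float)
    # Row-wise averaging and differencing (H^row, G^row).
    row_avg = (c[:, 0::2] + c[:, 1::2]) / np.sqrt(2)
    row_dif = (c[:, 0::2] - c[:, 1::2]) / np.sqrt(2)
    # Column-wise averaging and differencing (H^col, G^col).
    c1 = (row_avg[0::2, :] + row_avg[1::2, :]) / np.sqrt(2)   # H^col H^row c
    d1 = (row_avg[0::2, :] - row_avg[1::2, :]) / np.sqrt(2)   # G^col H^row c
    d2 = (row_dif[0::2, :] + row_dif[1::2, :]) / np.sqrt(2)   # H^col G^row c
    d3 = (row_dif[0::2, :] - row_dif[1::2, :]) / np.sqrt(2)   # G^col G^row c
    return c1, d1, d2, d3

c0 = np.random.randn(8, 8)
c1, d1, d2, d3 = dht2_step(c0)
print(c1.shape)   # (4, 4); repeating the step on c1 gives c2, d_2^(i), and so on
```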
FIGURE 6.2. Original "magic square'' image.
The inverse of the DHT for matrices involves applying the adjoints of the matrices H and G row-wise and column-wise. Given L in N even, let H* and G* be the adjoints of H and G. Let c be an M x (L/2) matrix with rows c_0, . . . , c_{M-1}. We define the row-wise approximation adjoint of c, H^row* c, to be the M x L matrix whose lth row is H* applied to the lth row of c, and the row-wise detail adjoint of c, G^row* c, to be the M x L matrix whose lth row is G* applied to the lth row of c.

FIGURE 6.3. Left: The row-wise approximation and detail matrices applied to the image in Figure 6.2. Right: The column-wise approximation and detail matrices applied to the image in Figure 6.2.

H^row* c is the matrix obtained by multiplying each row of c by the matrix H*, and G^row* c is the matrix obtained by multiplying each row of c by the matrix G*. Given L in N even, let c be an (L/2) x M matrix with columns c_0, . . . , c_{M-1}. We define the column-wise approximation adjoint of c, H^col* c, to be the L x M matrix

    H^col* c = ( H*c_0  H*c_1  . . .  H*c_{M-1} ),

and the column-wise detail adjoint of c, G^col* c, to be the L x M matrix

    G^col* c = ( G*c_0  G*c_1  . . .  G*c_{M-1} ).    (6.19)

H^col* c is the matrix obtained by multiplying each column of c by the matrix H*, and G^col* c is the matrix obtained by multiplying each column of c by the matrix G*. Combining (6.9)-(6.19) and the fact that H*H + G*G = I, we have that

    H^row* H^row c + G^row* G^row c = c  and  H^col* H^col c + G^col* G^col c = c.

Theorem 6.9. The inverse DHT for matrices is given by

    c_{j-1} = H^row* H^col* c_j + H^row* G^col* d_j^(1) + G^row* H^col* d_j^(2) + G^row* G^col* d_j^(3),    (6.23)

where H^col*, G^col*, H^row*, and G^row* are the adjoints of the 2^{N-j} x 2^{N-j+1} matrices given above, and 1 <= j <= J.

Proof: Fix j with 1 <= j <= J. Then

    H^row* H^col* c_j + H^row* G^col* d_j^(1) = H^row* ( H^col* H^col + G^col* G^col ) H^row c_{j-1} = H^row* H^row c_{j-1}.

Similarly,

    G^row* H^col* d_j^(2) + G^row* G^col* d_j^(3) = G^row* G^row c_{j-1}.

Since

    H^row* H^row c_{j-1} + G^row* G^row c_{j-1} = c_{j-1},

(6.23) follows.
Exercises

Exercise 6.10. Prove that the matrices H^row and H^col commute. That is, prove that for any matrix c, H^row H^col c = H^col H^row c.
6.3 Image Analysis with the DHT

A digital black-and-white image is stored in a computer as a matrix of numbers {c_0(n, m)}. Each location (n, m) in the matrix corresponds to a picture element or pixel, and the value of c_0(n, m) is a nonnegative integer indicating the gray scale value or pixel value at that location. The pixel values range from 0 to some maximum value M. A value of 0 means that the intensity at that location in the image is black, and a value of M means that the intensity at that location is white. Numbers in between 0 and M represent various shades of gray. If the matrix of pixels is displayed on a grid with each cell of the grid given the appropriate shade of gray, then the eye interprets the display as an image.

FIGURE 6.4. DHT of "magic square" image with J = 1. The four quadrants contain (clockwise from upper left) c_1, d_1^(1), d_1^(3), d_1^(2).

In this section we will use the DHT for matrices to analyze images represented by matrices of gray scale values. We will give interpretations of the DHT matrices c_j, d_j^(1), d_j^(2), and d_j^(3) in terms of corresponding features of the image and will indicate the properties of the DHT for matrices that make it effective for image compression.
6.3.1 Approximation and Blurring

In this subsection, our goal is to provide an interpretation for the approximation matrices c_j produced by (6.15).

FIGURE 6.5. DHT of "magic square" image with J = 2. Upper left quadrant contains (clockwise from upper left) c_2, d_2^(1), d_2^(3), d_2^(2).

Consider first the 2^{N-1} x 2^{N-1} approximation matrix c_1. Since c_1 = H^col H^row c_0, we have by (6.12) that for any fixed 0 <= n, m <= 2^{N-1} - 1,

    c_1(n, m) = (1/2) ( c_0(2n, 2m) + c_0(2n, 2m+1) + c_0(2n+1, 2m) + c_0(2n+1, 2m+1) ).

In this calculation, the pixel values of the four pixels at locations (2n, 2m), (2n, 2m+1), (2n+1, 2m), and (2n+1, 2m+1) are replaced by a single value c_1(n, m) that is twice the average value of those four pixels. This means that any variation in pixel values within that 2 x 2 block of pixels is lost. In other words, c_1 represents only those features of the image that exist at scale 2 or larger. See Figure 6.4.

Computing the matrix c_2 involves taking twice the average of the four values c_1(2n, 2m), c_1(2n+1, 2m), c_1(2n, 2m+1), and c_1(2n+1, 2m+1). Since each of these numbers was computed as an average (times 2) of a 2 x 2 block of pixels, each element of c_2 is an average (times 4) of a 4 x 4 block of pixels. Thus, the variation in pixel values within that 4 x 4 block of pixels is lost. That is, c_2 represents only those features of the image that exist at scale 4 or larger. In general, the matrix c_j represents those features of the image that exist at scale 2^j or larger. See Figure 6.5.

In order to see this "blurring effect" more clearly, we take a 256 x 256 image and compute the 128 x 128 matrix c_1, the 64 x 64 matrix c_2, and the 32 x 32 matrix c_3 (Figure 6.6). We reconstruct a blurred version of the original image from these matrices by assuming that the detail matrices d_j^(i) are zero for i, j = 1, 2, 3, and then applying the inverse DHT for matrices. We see that the blurred images are very blocky and are fairly unpleasant to look at. We will see in Chapter 9 that this effect can be mitigated by designing wavelet and scaling filters that correspond to smoother scaling and wavelet functions.
+
+
6.3.2 Horizontal, Vertical, and Diagonal Edges

Intuitively, an edge in an image is a point at which there is a large variation in pixel value. That is, if the value of a pixel is significantly different from the value of one of its neighbors, then we say that that pixel is an edge point of the image. Now, each pixel in an image has eight neighbors: two in the horizontal direction, two in the vertical direction, and four in the diagonal direction. If at a given location the variation in pixel value is small in the vertical direction but large in the horizontal direction, then that pixel is a vertical edge point of the image. Similarly, if the variation in pixel value is small in the horizontal direction but large in the vertical direction, then that pixel is a horizontal edge point of the image. If the variation is large in both the horizontal and vertical directions, then the pixel is a diagonal edge point of the image.

Since the DHT for matrices involves computing averages and differences of adjacent pixel values in various combinations, we can interpret the DHT coefficients as identifying edge points of the image. Consider for example the 2^{N-1} x 2^{N-1} matrix d_1^(1) derived from an image c_0 = {c_0(n, m)}_{n,m=0}^{2^N - 1} by (6.15). Fix 0 <= n, m <= 2^{N-1} - 1. Since d_1^(1) = G^col H^row c_0, we have by (6.12) and (6.14) that

    d_1^(1)(n, m) = (1/2) ( c_0(2n, 2m) + c_0(2n, 2m+1) ) - (1/2) ( c_0(2n+1, 2m) + c_0(2n+1, 2m+1) ).

If (2n, 2m) is a horizontal edge point of the image c_0, then the differences c_0(2n, 2m) - c_0(2n+1, 2m) and c_0(2n, 2m+1) - c_0(2n+1, 2m+1) will tend to be large due to the large variation in pixel values in the vertical direction. If (2n, 2m) is a vertical edge point, then these same differences will tend to be close to zero. If (2n, 2m) is a diagonal edge point, then the pixel values will tend to be similar in one of the diagonal directions. That is, at least one of c_0(2n, 2m) - c_0(2n+1, 2m+1) or c_0(2n, 2m+1) - c_0(2n+1, 2m) will be
+
+
+
+
+
+
+
close to zero. Hence, if (2n, 2m) is a horizontal edge point, then d_1^(1)(n, m) will tend to be larger than if (2n, 2m) is either a vertical or a diagonal edge. The same argument can be made if the edge point is at (2n, 2m+1), (2n+1, 2m), or (2n+1, 2m+1). Similarly, d_1^(2)(n, m) will be largest if any of (2n, 2m), (2n, 2m+1), (2n+1, 2m), or (2n+1, 2m+1) is a vertical edge, and d_1^(3)(n, m) will be largest if any is a diagonal edge. See Figure 6.4.

Since the matrix c_j can be thought of as containing the features of the original image that are larger than scale 2^j, the matrices d_j^(1), d_j^(2), and d_j^(3) are interpreted as identifying, respectively, the horizontal, vertical, and diagonal edges at scale 2^j (see Figures 9.3, 6.7-6.9).
6.3.3 "Naive" Image Compression

The key to good image compression is to find a representation of the image with as few numbers as possible. In the language of orthogonal decompositions, this means finding an orthonormal basis in which most of the coefficients of the original image are zero or at least very close to zero. In principle the small coefficients can be set to zero without significantly affecting the quality of the image. The purpose of this section is to illustrate some of the principles that make wavelets effective for image compression.

The central idea has been alluded to before, namely that in decomposing a matrix c_0 into the four matrices c_1, d_1^(1), d_1^(2), and d_1^(3), we have separated the smooth (or slowly varying on a scale of two pixels) parts of the image from the nonsmooth (or rapidly varying on a scale of two pixels) parts of the image. These latter parts are usually interpreted as edge points. If the image consists of large areas of constant intensity separated by edges (which is true of many images), the detail matrices will contain many elements that are nearly zero. The same is true when we decompose the matrix c_1 into c_2, d_2^(1), d_2^(2), and d_2^(3).

This principle is illustrated in Figure 6.10. Here we have taken an image and have computed its DHT with J = 3. We choose various thresholds, that is, fixed numbers below which the DHT coefficients are set to zero, and compute reconstructed images. We see that if 80% of the smallest coefficients are set to zero, the image is virtually unchanged. If 90% of the smallest coefficients are set to zero, most important features of the image are still visible. If 97% are set to zero, there is significant distortion, but gross features of the image are still recognizable.
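A rough version of this thresholding experiment can be scripted in a few lines. The following Python sketch is our own code, not the text's; the random matrix standing in for an image, the number of levels, and the percentile threshold are arbitrary choices.

```python
# Naive compression sketch (our own code): compute a multi-level 2-D Haar DHT,
# zero out a given percentage of the smallest detail coefficients, reconstruct.
import numpy as np

def dht2(c, J):
    coeffs = []
    for _ in range(J):
        ra, rd = (c[:, 0::2] + c[:, 1::2]) / np.sqrt(2), (c[:, 0::2] - c[:, 1::2]) / np.sqrt(2)
        c = (ra[0::2] + ra[1::2]) / np.sqrt(2)
        coeffs.append(((ra[0::2] - ra[1::2]) / np.sqrt(2),
                       (rd[0::2] + rd[1::2]) / np.sqrt(2),
                       (rd[0::2] - rd[1::2]) / np.sqrt(2)))
    return c, coeffs

def idht2(c, coeffs):
    for d1, d2, d3 in reversed(coeffs):
        ra = np.empty((2 * c.shape[0], c.shape[1])); rd = np.empty_like(ra)
        ra[0::2], ra[1::2] = (c + d1) / np.sqrt(2), (c - d1) / np.sqrt(2)
        rd[0::2], rd[1::2] = (d2 + d3) / np.sqrt(2), (d2 - d3) / np.sqrt(2)
        c = np.empty((ra.shape[0], 2 * ra.shape[1]))
        c[:, 0::2], c[:, 1::2] = (ra + rd) / np.sqrt(2), (ra - rd) / np.sqrt(2)
    return c

img = np.random.rand(64, 64)                 # stand-in for a gray scale image
cJ, coeffs = dht2(img.astype(float), J=3)
flat = np.concatenate([np.abs(d).ravel() for ds in coeffs for d in ds])
cut = np.percentile(flat, 90)                # zero the smallest 90% of detail coefficients
coeffs = [tuple(np.where(np.abs(d) < cut, 0.0, d) for d in ds) for ds in coeffs]
print(np.max(np.abs(idht2(cJ, coeffs) - img)))   # distortion introduced by thresholding
```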
FIGURE 6.6. Original image (top left). Reconstruction using only the c_1 coefficients (top right), c_2 coefficients (bottom left), and c_3 coefficients (bottom right). Reconstructions are increasingly blurred and blocky.
FIGURE 6.7. Left: Horizontal edges at scale 1. Right: Horizontal edges at scale 2.

FIGURE 6.8. Left: Vertical edges at scale 1. Right: Vertical edges at scale 2.
FIGURE 6.9. Left: Diagonal edges at scale 1. Right: Diagonal edges at scale 2.
FIGURE 6.10. Original image (top left). Compressed image with smallest 80% of DHT coefficients set t o zero (top right). Compressed image with smallest 90% (bottom left) and 97% (bottom right) of DHT coefficients set to zero.
Part III

Orthonormal Wavelet Bases
Chapter 7

Multiresolution Analysis

In Section 5.5, we saw that if h(x) = chi_{[0,1/2)}(x) - chi_{[1/2,1)}(x), then the collection

    { h_{j,k}(x) = 2^{j/2} h(2^j x - k) : j, k in Z }

forms an orthonormal basis on R. In this chapter, we will see how this construction can be generalized. In particular, we will present a general framework for constructing functions psi(x), L^2 on R, such that the collection

    { psi_{j,k}(x) = 2^{j/2} psi(2^j x - k) : j, k in Z }

is an orthonormal basis on R. Such a function psi(x) is called a wavelet and the collection {psi_{j,k}(x)}_{j,k in Z} a wavelet orthonormal basis on R. This framework for constructing wavelets involves the concept of a multiresolution analysis or MRA. Before giving the definition of an MRA, we need to study some properties of collections of functions of the form

    { T_n g(x) = g(x - n) : n in Z },

where g(x) is some fixed L^2 function. In Section 7.1, we address the following questions: (1) When is the collection {T_n g(x)} an orthonormal system? and (2) When does the closed span of {T_n g(x)} admit an orthonormal basis of the form {T_n h(x)} for some possibly different function h(x)? In Section 7.2, we define the notion of an MRA and derive some of its basic properties, and in Section 7.3 we present some examples of MRAs. In Section 7.4, we give the very simple recipe for constructing a wavelet orthonormal basis from an MRA and present some examples of wavelet bases. In Section 7.5, we present a proof that this recipe works, and in Section 7.6, we gather some necessary properties of the scaling and wavelet functions that follow from the definition of MRA and the construction of the wavelet. These properties will be useful in later chapters when we explore more examples and generalizations of wavelet orthonormal bases. Finally, in Section 7.7, we discuss the Battle-Lemarie construction of spline wavelet orthonormal bases.
7.1 Orthonormal Systems of Translates

In our study of multiresolution analyses and their associated wavelet bases, we will frequently encounter orthonormal systems that are integer translates of a single function. In addition to sharing the general properties of orthonormal systems presented in Section 2.3, systems of this form also have special properties that will be valuable in the construction of wavelet bases. In this subsection, we present some of these properties.

Definition 7.1. An orthonormal system on R of the form {T_n g(x)}_{n in Z}, where g(x) is L^2 on R, is called an orthonormal system of translates.
Example 7.2.
(a) The collection of scale 0 Haar scaling functions
(Definition 5.9) is an orthonormal system of translates by Theorern 5.14. (b) The collection of scale 0 Haar functions {ho.k(z):k E Z ) (Definition 5.11) is an orthonormal systern of translates by Theorem 5.13.
Rernark 7.3. By Theorem 2.55, if (T,g(x)) is an orthonormal system of translates, then it is by definition an ortllornormal basis for the subspace span{Tng(x)). In other words,
f (x) E span{T,g(x)) if and only if
Lemma 7.4. The collection {T_n g(x)} is an orthonormal system of translates if and only if for all gamma in R,

    sum over n of |ghat(gamma + n)|^2 = 1.    (7.1)

Proof: Note first that <T_k g, T_l g> = <g, T_{l-k} g> = delta(k - l) if and only if <g, T_k g> = delta(k). By Parseval's formula,

    <g, T_k g> = integral over R of |ghat(gamma)|^2 e^{2 pi i k gamma} d gamma = integral from 0 to 1 of ( sum over n of |ghat(gamma + n)|^2 ) e^{2 pi i k gamma} d gamma

(see Exercise 7.10 for a justification of the interchange of the sum and integral in the last step). By the uniqueness of Fourier series, <g, T_k g> = delta(k) if and only if sum over n of |ghat(gamma + n)|^2 = 1 for all gamma in R.
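Condition (7.1) is easy to test numerically for a candidate g. The following small Python sketch is our own; it samples the periodization on a grid and truncates the sum over n, and it uses the Haar scaling function g = chi_{[0,1)}, whose Fourier transform has modulus |sin(pi gamma)/(pi gamma)|.

```python
# Numerical check of the periodization condition (7.1) for g = chi_[0,1):
# sum_n |g^(gamma+n)|^2 should be identically 1.  (Our own sketch; the grid and
# the truncation of the sum over n are arbitrary choices.)
import numpy as np

def ghat_sq(gamma):
    """|Fourier transform of chi_[0,1)|^2 = (sin(pi*gamma)/(pi*gamma))^2."""
    return np.sinc(gamma) ** 2          # numpy's sinc is sin(pi*x)/(pi*x)

gamma = np.linspace(-0.5, 0.5, 1001)
periodization = sum(ghat_sq(gamma + n) for n in range(-500, 501))
print(np.max(np.abs(periodization - 1.0)))   # small; shrinks as the truncation grows
```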
Lemma 7.5. Let {T_n g(x)} be an orthonormal system of translates. Then the function f(x) belongs to the closed span of {T_n g(x)} if and only if there is an l^2 sequence {c(n)} such that

    fhat(gamma) = ( sum over n of c(n) e^{-2 pi i n gamma} ) ghat(gamma).    (7.2)
Remark 7.6. The only assumption being made about the coefficieiits { ~ ( n ) is ) that C , Ic(n)I2 < CCI. Therefore, we cannot conclude necessarily that the Fourier series Enc ( n )e p Z T i nis~ a piecewise continuous function. The most we can conclude, however, is that this Fourier series represents a function L2 on [0,11 in tlie sense of Lebesgue. We know this to be true by the Riesz-Fischer Theorem (Theorem 4.48). This particular technicality does not enter seriously into the proof of Lemma 7.5; so we will not mention it further. In light of (7.3), we know that the coefficients c ( n ) are given by ( f ,T,g) so that by making appropriate assumptions on the functions f ( x ) and g(x), we can say more about the coeficierlLs ~ ( 7 1 )and llle corresponding Fourier series.
Proof:
(=-=+) Suppose that f ( x ) E ~ { T n g ( x ) )By . Theorem 2.55,
in L2 on R.Taking the Fourier transform of both sides and using Theorem 3.40(b),
166
Chapter 7. Multiresolution Analysis
By Bessel's inequality,
so that
sat sifies (7.2). (t=) Suppose that (7.2) holds. Then by the Riesz-Fischer Theorem (Theorem 4.48),
Letting
C
f ~ ( =4
c ( n )Tng(:c),
InllN
it follows that f N ( x ) E span{~,g(x)} and that
Therefore, by Plancherel's formula (Theorem 2.57(d)), Lemma 7.4, and c(n) tip2""?, the periodicity of
and hence f (x) E @ZE{T,g(z)).
7.1. Orthonormal Systems of Translates
167
In some of the examples of multiresolution analyses that follow, we will encounter collections of the form { T , ~ ( X ) )that , ~ ~are not orthonormal systems of translates but that satisfy a weaker version of (7.1), namely that there exist constarlts A, B > 0 such that for all y E R,
For such a system, we wish to consider the subspace ~ { T n g ( x )and ) show that in fact there is an L~ function g(x) such that {T,,g(x)) is an ortllonormal basis for @Zii{T,g(x)). The construction of g ( x ) is referred to as an orthogonalixation of the collection { ~ , , g ( x ) ) . The following lemma shows how to orthogonalize a collection {T,g(x)) satisfying (7.4). To avoid certain technicalities in the proof, we assume that g(x) has compact support. Lemma 7.7. Suppose that g ( x ) is L' o n R with compact support. If the system { T , g ( x ) ) satisfies (7.41, then there is a function ;(XI, L~ o n R, such that: (a) {T,g(z)) is a n orthonormal system of translates and
Proof: Sirice g(x) is compactly supported, Exercise 7.11 and (7.4) implies that the function
is a trigonometric polynomial that never equals zero. Define
Then @(y) is C0 (in fact, Cm) on R and car1 be expanded in a Fourier series as
where the Fourier coefficients satisfy Define the function g ( x ) by
C , Ic(n)l2 < a.
Taking the inverse Fourier transform of both sides, it follows that
168
Chapter 7. Multiresolution Analysis
Since g(x) has compact support, the sum on the right side is finite if x is restricted to any closed finite interval. Hence, on every such interval, g(x) is piecewise continuous and so is piecewise continuous on R. By (7.4), @(y) is L" on R so that ?(?) and F(z)are L2 on R. Since @(y)has period 1,
for each k E Z, and hence
By Lemma 7.4, {T,g(z)) is an orthonormal system of translates and (a) is proved. To see that (b) holds, note that by (7.5) and Lemma 7.5,
By Exercise 7.9, Tkg(x) E span{T,g(x)) for each k E Z and by Exercise 2.61, span{TTLg(z)} is closed under the formation of linear combinations. Therefore span{T,,g(x)) C ~pan{T,g(x)}. Let f (x) E S@Zifi{~,g(x)). This means that given t > 0, there is a function h ( z ) E span{~,ij(x)) such that (1 f - hlla < c/2. Since span{T,g(x)) C: ~p""{T,g(x)), there is a function r ( x ) E span{T,g(x)) such that 11 h-rl2 < €12. Therefore, by Minkowski's inequality,
Hence f (x) E span{T,g(x)) and
A
Since g^(y) = y(?) @-l(-y) the same argument with a l ( y ) replacing @(y) shows that span{T,g(x)) C spnn{T,g(x)} and (b) follows.
Exercises Exercise 7.8. Given any function f (x), L~ on R with J J f 1J2 center of mass of f (x) is defined to be the number
=
I, the
7.2. Definition of MRA
169
provided that the integral exists. Prove that if the center of mass of a function $(x), L2 on R with )($112 = 1, is m, then the center of mass of 4,,k(x)is 2 - j ( r n k ) .
+
Exercise 7.9. Consider the collection of functions (orthonormal or not) {Tng(x)) where g(x) is L~ on R. Show that w { T n g ( x ) ) is invariant under integer translatio~l.T l l a ~is, slluw that if f (x) t span{TT,g(z)), then Tkf(x) E spal?{Tng(x)) fur each k E Z. Exercise 7.10. Justify the interchange of the sum and int,egral in the proof of Lemma 7.4 by showing that if g ( x ) is L2 on R, then the suln
converges in L1 on [0, I].
Exercise 7.11.
(a) Show that if g(x) is L2 on R, then
(b) Conclude that if g(z) is L2 on R and has compact support, then
En
+ n) l2
is a trigonometric polynomial.
7.2 Definition of Multiresolut ion Analysis Definition 7.12. A multiresolution analysis o n R i s a sequence of subspaces { V , ) , E ~of functions L~ o n R satisfying the following properties. ( a ) For all j f Z , T/,
C V,+1.
( b ) I f f ( z ) i s Cz o n R, then f (z) E Sj5iE(V,),Ez. That is, given i s a j E Z and a function g ( x ) E V, such that I1 f - 9112 < E .
E
> 0, there
( d ) A function f (x)E Vo i f and only if D02.1 f (x) E VJ. (F)
There exists a function cp(z),L~ o n R, called the scaling function such that the collection {T,cp(z)) i s a n orthonormal system of tran,slates and
170
Chapter 7. Multiresolution Analysis
Remark 7.13. (a) Typically, an MRA is defined by first identifying the subspace Vo, defining V, by letting
so that Definition 7.12(d) is satisfied, and then proving that Definition 7.12(a), (b), (c), and (e) hold. Vo can be defined by first identifying a function cp(x) such that {T,p(x)) is an orthorlormal system of translates, and then defining
(b) Note that if f (x) E Vo, then by Definition 7.12(e),
By the orthonormality of {T,,cp(x)),
7.21 Some Basic Properties of MRAs In this subsection,
{VJ)
always denotes an MRA with scaling function p(x).
The Approxirrlation and Detail Operators
Definition 7.14.
For each j , k
E
Z define cpj,k(x)b y
For each j E Z, define the approximation operator Pj o n functions f (x),L' o n R by
For each j
E
Z,
define
th,e detail operator Q, o n functions f (x),L2 o n
R
by
Lemma 7.15. For each j E Z, {pj,k(x))j,kEz i s a n orthornormal basis for
7.2. Definition of MRA
171
Proof: Note t h a , t since y O , k ( ~ E) VOfor all k, Definition 7.12(d) implies that D 2 3 ~ 0 , k= ( ~(Pj,k(x) ) E for a11 k . Also, since {cpo,k(x)}is an orthonormal system of translates, Theorem 3.42 (c) implies that,
Hence { c p j , k ( ~ ) ) k E z is an orthornormal system on R. Given f (x) E DZ-/ f (x) E Vo SO that by Definition 7.12(e) and Theorem 3.42(c),
4,
Applying D2J to both sides of Lhe above equation, we obtain
and the result follows by Theorem 2.55.
Lemma 7.16. (a) lim (IP,f
For all f (x),C: o n R: -
3'00
flln =
0, and
Proof: To see (a), let E > 0. By Defirlition 7.12(b), there exists J E Z and g(x) E VJ such that 11 f - g1I2 < ~ / 2 By . Definition 7.12(a), g(x) E and Pig(x) = g (x) for all j 2 J. Thus,
VJ
IIf
-
Pjf 112
Ilf 5 Ilf
g I PJg - P J f l l z
=
9/12 + IIPJ(f- g)112 2llf - 9112
<
-
E,
where we have applied Minkowski's and Bessel's inequality. Since this holds for all j J, (a) is proved.
>
172
Chapter 7. Multiresolution Analysis
To see (b), suppose that f (x)is supported in [-A, A] and let c > 0. By the orthonorrnality of { p j , k ( z ) ) k E z , and applying the Cauchy-Schwarz and Minkowski inequalities,
To do this, let c > O and choose K so large that
Therefore, if 2,'A
< 112, tiler]
Since for each k E Z, limj+-,
.2'A-k J-2JA-k
lq(x)l2dx = 0 ,
7.2. Definition of MRA
Since
E
173
> 0 was arbitrary, (b) follows.
The Two-scale Dilation Equation Lemma 7.17.
There exists a n t2 sequence of coeficients { h ( k ) } such that
in L' o n R. Moreover, we m a y write
where
Proof: Since cp E VoG Vl, and since by Lemma 7.15(a), { q l , k ( ~ ) )isk E Z an orthonorrrlal basis for Vl,
Thus, (7.7) holds with h ( k ) = (cp, c p l , k ) , which is e2 by Bessel's inequality. Equation (7.8) follows by taking the Fourier transform of both sides of (7.7).
Definition 7.18. LeL y3 ( z ) be the scaling function associated with a n MRA {V,}. T h e sequence { h ( k ) ) satisfying (7.7) is called the scaling filter associated with y ( x ) . T h e .function r r ~ ~ defined ( ~ ) by (7.9) is called the auxiliary function associn.ted with ~ ( 2 ) .
Remark 7.19. To call h ( n ) a filter is slightly misleading. According to Definition 4.13, a .filter must satisfy C , ( h ( n 1) < oo.This does not necessarily follow from the definition of h(n)given in Lemma 7.17. It is convenient t o makc this assumption, and we will do so in what fulluws. In fact the sca,ling filter will satisfy I h(n)1 < CX; in every example in this book but one (the Bandlimited MRA) .
En
Chapter 7. Multiresolution Analysis
174
7.3 Examples of Mult iresolut ion Analysis 3
.
The Haar M R A
Let Vo consist of all step functions f (x) such that (1) f (x) is L2 on R and (2) f (x) is constarlt on l l ~ einle~vals for k: E Z. 111other words, Vo is the collection of all scale 0 dyadic step functions, L2 on R. By Exercise 7.26,
where p(x) = XIO,L)(x). Since by Theorem 5.14, {T*(z)) is an orthonormal system of translates, this proves that Definition 7.12(e) is satisfied. For each j E Z, define 4 by Definition 7.12(d); that is, f (x) C T$ if and only if D2-.,f (x)E Vo. By Exercise 7.27, 4 consists of all step functions f (x) such that (1) f (z) is ~ % n R and (2) f (x) is constant on the intervals Ij,k, for k E Z. In other words, V, is the collection of all scale j dyadic step functions, L2 011 R. It remains only to verify Definition 7.12(a)-(c). To see tha,t Definition 7.12(a,) holds, we must prove that if f ( x ) E Ii,, then f ( x ) E y+l for ariy j E Z. Recall that by Definition 5.3, IJ,k = Ij+1,2kU Ij+1,2k+l for all j , k E Z. This means that if f (x) is corlstarlt or1 Ij,k for all k E Z, it is also constant on Ij+l,efor all 4!E Z. Thus, if f (x) is a scale j dyadic step function, it is also a scale j 1 dyadic step function, and Defiriitiorl 7.12(a) is verified. That Definition 7.12(b) holds is a direct consequence of Lemma 5.37(a). To see that Dcfirlition 7.12(c) holds, note that to say that f (x) E ng-, is to say that (1) f (x) is L2 on R and (2) f (z) is constant on thc intcrvals [o,CCI) and (-oo,0). But the only such function is the function identically zero.
+
v.
7.32 The Piecewise Linear MRA Let Vo consist of all functions f ( x ) , L~ and C0 on R., a n d linear on the intervals for k E Z. For each j E Z, define V, by Definition 7.12(d); that is, f (x) E if and only if D2-) f (x) E Vo.By Exercise 7.28, consists of all functions f (x), h2 and C0 on R, and linear on the intervals IJ k , for k c Z. It remains to verify Definition 7.12(a)-(c) and (e). To see that Definition 7.12(a) holds, we must prove that if f (x) E V J 1 then f ( x ) E V,+l for any j E Z. Since IJ,k = IJ+1,2kU IJ+1,21c+l for all for all k E Z, is also lincar on IJ+l,e j, k E Z , any function linear on for all .t E Z. Thus, if f (x) is L2 and C0 on R and linear on IJ,k for all k E Z, it is also L2 and C0 on R and linear on IJ+l,efor all !E Z. Thus, Definition 7.12(a) is verified. In order to prove Definition 7.12(b), let E > 0 and let f(x) be C: on R and supported in the interval [-A, A]. Since f (x) is continuous and has
5
compact support, it is uniformly continuous; so for j large enough, we know that given xo E I j , k , If ( x ) - f ( xo)l < for all x E IJ,kand k E Z. Now, let f j ( x ) be defined as follows. For each k E Z, let 13,k= [a,b) and let b-x x-a f (a') -f ( h ) , f,(4= - a b-a for x E Ij,+.Since b-x x-a = 1, b-a b-a
~/m
,
+
+-
.ir~i-.:~w =
b-x x 7 f
(
b -a b-x
x-a xb -) -
b-x b-u
x-a
b-a x
-
a
for all x E Ij,k.Thus,
This proves Defintion 7.12(b). To see that Definition 7.12(c)holds, note that to say that f ( x ) E ng-,V, is to say that ( 1 ) f ( x ) is L~ and C O on R arid (2) f ( x ) is linear on the intervals [O, m) and ( - O O , ~ ) . But the only such function is the function identically zero. To see that Defintion 7.12(e) holds, we will use Lemma 7.7. Before applying the Lernrna, we will need to establish the following facts. Let
Lemma 7.20.
I f f (x) is then f (x) can be written
CO
o n R and linear o n the intervals Io,k for k E Z ,
where the s u m converges pointwise.
Proof: Let k E Z be fixed, and consider (7.11) for x E 10,k.For any such x , the sum on the right side of (7.11) consists of exactly two terms. Hence the sum converges pointwise and we must verify that in fact f ( x ) = f ( k ) Tkcp(x)
+ f ( k + 1)T k + l v ( ~ ) .
(7.12)
176
Chapter 7. Multiresolution Analysis
Since T,cp(x) is linear on I o , k for all n t Z , it follows that the right side of Since Tkcp(k) = cp(0) = 1, and Tk+lcp(k) = p ( - 1 ) = (7.12) is linear on 0 , equation (7.12) is satisfied when x = k. Since Tkcp(k 1 ) = p ( 1 ) = 0 , and T k + l p ( k + 1) = p ( 0 ) = 1 , equation (7.12) is satisfied when x = k + 1. Thus, the right side of (7.12) is a linear function on I o , k that agrees with f ( x ) at the endpoints. Since f (x) is also linear on it must agree with f ( x ) on the wholc interval. Sincc this holds for any k E Z , (7.11) holds for every x E R.
+
Since we are interested in L~ convergence of (7.11) and since pointwise convergence does not necessarily imply L 2 convergence, we must prove L~ convergence separately.
Lemma 7.21. Suppose that f (z) i s linear o n the i.r~ter.uul [n,rL + 1) Jos some n E Z. Then
Proof: Since f ( x ) is linear on [n,n+ 1 ) , f ( x ) = f ( n )+ ( f (n+1 ) - f ( n ) (x ) n) for x E [n, n 1 ) . Therefore,
+
Because of the inequality 2ab 5 a 2
Therefore.
Also
+ b2, for any real numbers a and b,
7 . 3 . Examples of MRA
177
a'nd (7.13) follows.
Lemma 7.22. Suppose that f ( z ) is L~ and C O o n R and is linear o n the intervals I o , for ~ k E Z. T h e n (7.11) holds in L~ o n R. Proof: Since f ( x ) is L2 on R, (7.13) implies that
In particular limlni,, 1 f (n)I2= 0. Let M , N E N , and consider the partial sum cause (7.l l ) holds pnintwise,
x
f(-M)(x I M+1)
N
f ( n ) T n ~ ( x=)
n=-hl
f (4
f ( N ) ( N +1 - x ) 0
xr=N=_M f (n)T,,q(x). Beif x E [-A(- 1 , - M ) , if x E [-A[, N ) , if x E [ N , N I ) , otherwise.
+
Therefore.
+
lim
+
lim
If
(x>12 dx
I f (412dx
Chapter 7. Multiresolution Analysis
178
Lemma 7.23. Vo = span{T,p(x)). Proof: Since f (z) E 6 is L~ and C0 on R and linear on 1 0 , k for k E Z. Lernma 7.22 says that (7.11) holds in L2. Since every partial sum of the f ( n )Tnv(x) is in span{Tnp(x)}, Lemma 7.22 implies that form f ).( E s p a n { T T L c p ( . ~ ) ) . Conversely, suppose that f (x) E SjT%i{Tncp(x)}. By Exercise 2.62, we can find a sequence of functions { fk(z)}such that f k ( x ) E span{Tn cp(z)) and lirnk,, 11 f - f k 11 2 = 0. We need to show that f (z) is linear on the intervals loe for e E Z, that f ( x ) is C0 on R and that f (x) is L~ on R. Fix & E Z. Since each f k (x) is linear on In,! and since f k (x) converges to f (x) in L~ 011 Exercise 7.30(a) implies that f (x) is also linear on Io,g. Since by Exercise 7.30(b), the convergence is also uniform, f (x) is C0 on R. Finally, since each f k ( x ) is L2 011 R arid since f k ( x ) converges to f (x) in L~ OII R, f (z) is also L~ on R. Hence, f (x) is in Vo.
c:=-~
We are now in a position to prove that Definition 7.12(e) holds.
Theorem 7.24.
There is a functi07~@(LC), L~ o n R, S,UCJZ that:
( a ) ( T , , y ( x ) }i s a n orthonormal system 0.f translates, and
(b) Vo = 3FGi{Tr,F(x)).
Proof: We will prove this theorem by applying Lemma 7.7. To do this, we must first show that cp(.z.) = (I - 1x1) Xr-l,ll (x) satisfies (7.4). In order to do this, note that (Tncp,T,p)
=
213 if n = m , 116 if In - ml = 1, 0 otherwise.
(7.14)
By Exercise 7.11, it follows that
Hence.
for all y E R and (7.4) is satsified. The theorem follows by Lernma 7.7(a) and Lemma 7.23. See Figure 7.1.
7 . 3 . Examples of MRA
7.1. Top left: ~ ( z ) . Top IG(? + n) 12)-'I2. Bottom right:
FIGURE
(C,
7.3.3
right:
F(?).
Bottom
179
left:
A
G(?).
The Bandlimited MRA
Let Ifo consist of all functions f (x) bandlimited with baridlirriit 1 (Definition 3.47). In other words, Vo consists of all functions f (z),L~ on R, such that is supported in the interval [-1/2, 1/21. For each j t Z, define I.; by Definition 7.12(b). That is, f (x) E V, if and only if D z P Jf (x) E Vo By Exercise 7.31, 5 consists of a11 filnctiorls f (x) bandlimited wit21 bandlirriit 2J. To see that Definition 7.12(e) holds, let
y(?)
Then {T,cp(x)} is an orthonormal system of translates by Exercise 3.52. It remains to show that
To see this, note that if f (x) E Vo, then by the Shannon Sampling Theorem (Theorem 3-50), f (x) E s p a n { T , c p ( ~ ) ) , ~ ~For . the opposite inclusion, note that if f (x) E span{T,,,cp(~)}, then by Exercise 2.62 there is a sequence of functions {fk(x)}k6Nsuch that for each k E N, f k ( x ) E span{T,cp(x)),
180
Chapter 7. Multiresolution Analysis
and such that limk,, 11 f - f k I 2 with bandlimit 1. Therefore,
= 0. For each
=
Ilf
k
E
N, Jk(x)is ba~ldlii~iited
2
- fklla
-+ m , we conclude that h71,,,z 1 f ( ? ) I 2 dy = 0 and hence that Y(7) = 0 if 171 > 1/2. Thus, f ( x ) E Voand (7.15) follows. A
Letting k:
. . ,
,
4
Clearly the subspaces are nested so Definition 7.12(a) is satisfied. To verify Definition 7.12(b), let f (x) be C: on R. Since f (z) is also L2 on R, Plancherel's forniula says that f^(y) i d 2 on R By Corollary 2.37(b), there is a function g(y), C: on R such that 11 f - ? I z < E . Since ?(y) is L2 orr R, so is g ( x ) (it's inverse Fourier transform) by Pla,ncherel's fnrrn~lla,.Also by ,. Plancherel's formula, 11 f - gl12 = 11 f - 5112 < E . Since g(y) is supported in an interval of the form [-A, A] for some A > 0, then y (x) E VJ as long as 2 ~ - I> A. Definition 7.12(b) follows. To verify Definition 7.12(c), let f (z) E nJEzVi. Then f (7) is supported in [ - 2 j 1 , 2 i 1 ] for all j E Z. Letting j i m , it follows that f^(?) varrislies everywhere except possibly at y = 0. But since f (7) is L2, this rrrearis that .f (y) must be identically zero so that f (x) is ideritically zero as well. A
A
A
A
7.4
The Meyer M R A
Tliis example of an MRA is due to Y, hleyer, and the correspondiilg wavelet basis is historically the first example of a smooth orthonormal wavelet basis. The idea behind the Meyer MRA is to create a "smoothed" version of the bandlimited MRA by replacing the sharp frequency cutoff function X[-1/2,1/2)(y)by a smoother bell-shaped cutoff function in the frequency domain. The result will be a wavelet with better decay in the time domain (see Section 3.7). To this end, we define below the specific properties required of our smooth cutoff function.
Definition 7.25. Given k E N ( o r k = oo), a ,function b ( z ) is a C" bell function over [- 1/2,1/2] provided that b(z) is C" o n R and satisfies the follouling conditions: ( a ) b ( z )= 1 zf 1x1 5 l / 3 , (b) b ( z ) = 0 if 1x1
> 2/3,
7 . 3 . Examples of MRA
181
(c) 0 5 b ( x ) 5 1 for all x E R, and (d) x l b ( x + n ) 1 2
-- 1.
One way to construct such a function is as follows (see Figure 7.2).
c"'
(1) Define a "bump" function P(n:), (or Cw) on R and supported in the interval [-I, 11. If k is finite, this can be done with the followirlg construction (cf. Example 1.14(e) and Section 7.7.1). Define P o ( 4 = X[-l/(k+t),~/(k+l)](~). (7.16) and let Pn(x) = PO* PrL-l(x), for n E N. By this definition, (Exercise 7.32). Finally, let
(7.17)
PI; is Ck-I on R and is supported in [
1,11
where ck is chosen to guarantee that
(Exercise 7.33). For the k = m case, see Exercise 7.34. (2) Define a "sigmoid" furlctiorl B ( s ) 1,y taking an antiderivative of /?(n)
Then H(x) is C y o r C") on R arid satisfies: (a)
Q(2)=
O i f x 5 -1,
(b) Q(x)= 1 if n:
(c) 0 5 H(x)
2 1, and
< 1 for all n: E R.
(3) Define
(
)
=1
(
'7i
( 6 ) )
and
c ( r ) = cos
(i
~ ( 6 s ) .)
Then s(x) and c(z) are each C v o r CCO)on R and satisfy: (a) s(x) = 0 and c(x) = 1 if x 5 -1/6, (b) s(z) = 1 and c(x) = 0 if n:
2 1/6,
(7.21)
182
Chapter 7. Multiresolution Analysis
< c(x) 5 1 for all x E R, and
(c) O 5 s ( x ) 5 I and 0
+
(d) s 2 ( x ) c2(z) = 1, for all x
E
R.
(4) Define b(z) by
Then b(x) is a Ck bell function over [-1/2: 1/21. The Meyer MRA is defined as follows. Given k E N (or k = oo), let p(x) be such that +(y) is n C k bell function over [-1/2,1/2]. Define
and for j E Z, T(, by f (x) E T(, if and only if D2-If ( x ) E Vb. Thus, Definition 7.12(d) is satisfied. Since +(7) satisfies Definition 7.25(d),
{T,cp(x)) is an orthonormal system of translates by Lemrna 7.4, and Defirliliori 7.12(e) is satsified. In order to verify that Definition 7.12(a) holds, recall that by Lemnla 7.5, f (x) E Vo if and only if there is an t2 sequence of coefficients {a(n)) such that i ( 7 ) = ( x a i n ) e-2x.n-)
pi-) = a:-:
~(7).
(7.22)
Tl
We first show that Vo Vl. Let f (x) E Vb. In order to show that also f (z) E Vl , we must show that Dll2f (x) E VO. By (7.22),
Since @(2y) is supported on the interval [-1/3,1/3] and @(y) = 1 on [-1/3,1/31, 8 2 7 ) = F(27)
@(?I.
Therefore, A
Dl12f (7) = A a ( 2 y ) +(2y) Q(y). Define a(y) to be the period I extension of the function a(2y) @(2(~)). Since 6;(2(~))is continuous on R and periodic, it is bounded. Hence a(?) is L~ on [0, 1) and therefore has t2 Fourier coefficients {Z(n)}. Hence,
7.3. Examples of MRA
183
D l 1 2 f ( z ) E Vl. If for any j E Z, f ( z ) E 4, then D2-Jf ( x ) E V". By the above argument, D2-Jf ( x ) E Vl and hence D2-3-l f ( x ) E VO.Thus, and Definition 7.12(a)holds. f(x) E In order to prove Defiiiition 7.12(b),let E > 0 and let f (x) be C: oil R. By Exercise 3.55, we can find a function g^(y),C: on R and supported in an interval of the form [-A, A] such that 11 f - glla < E . In order to see that g E U J F Z y ,clloose J SO large that A < (1/3)2'. In this case @ ( 2 ~ ~=? 1) 011 [-A, A] SO that ?(?) = c ( y )F(2Ps7y). Let a(?) be the period 2.' extension of G(y).Since 2 > A, G(y) = a ( y )@(2-J y ) , and so Ij(zJ y ) = a(2J y ) $ ( y ) . Since F ( 2 J y ) is continuous on R, c ~ ( 2 ~has y ) t2 Fourier coefficients. Thus, g ( 2 - J x ) E Vo and g ( x ) E V J . To prove Definition 7.12(c),let f ( z )E n j E z V , . This means that for every j t Z, D 2 ,f ( x ) E Vo or that f^(?) has the form T ( y ) = a ( y )@(2-jy).But sirlcc +(2-.jy) = 0 if (2/3)2-1,lctting j + C O , wc scc that f (?) = 0 for every y 1 > 0, which implies that T ( y )isidentically zero and hence that f ( z )is identically zero.
lyl >
A
Exercises Exercise 7.26.
Show that
where p(:x) = Xlo,l)(z), where Vo is given in Section 7.3.1 Exercise 7.27. Prove that a step function g ( x ) is constant on the interfor k t Z if arid only if D2-.,g ( x ) is a step function coristant on vals the intervals for k E Z. Exercise 7.28. Prove that a function g ( x ) is Go on R and linear on the intervals Ij,k, for k E Z if and only if D Z p g 7 ( x ) is C" on R and linear on the intervals for k E Z. Exercise 7.29. Prove equation (7.14). Exercise 7.30. (a) Let {fn(z)),EN be a sequence of linear functions converging to a function f ( x ) in L2 on a finite interval I . Show that f ( x ) must also be linear on I. (Hint: Let f n ( x ) = anx b,. Use the fact that { f n ( z ) ) , is~ L2 ~ Cauchy on I-see Exercise 1.50(c)-to show that {a,,) ,EN and { b n j n E N are Cauchy and hence convergent sequences of numbers. Prove that if lim,,, a, = a and limn,, b, = b, then f ( z ) = ax b on I . )
+
+
(b) Prove that under the assumptions in part (a), f n ( x ) converges uniformly to f ( x ) on I .
184
Chapter 7. Multiresolution Analysis
FIGURE 7.2. From top left: P(x) with k = 2, the sigmoid function O(x), s ( x ) , and c ( x ) . The c2bell function b ( x ) (bottom).
Exercise 7.31. Prove that for any j E Z , f (z) is bandlirnited with bandlimit 23 if and only if D2-3f ( z ) is bandlimited with bandlimit I. (Hint: Use Theorem 3.40(a).) Exercise 7.32. Prove that for each n E N, the function /?,(z) defined by (7.16) and (7.17) is on R and is supported in the interval [-I, 11.
c"'
Exercise 7.33. For each k, find c(k) in (7.18) such that (7.19) is satisfied. (Hint: Consider the Fourier transform.)
7.4. Construction and Examples
185
Exercise 7.34. Show that the function e("-l)/("+1)
Q(x) =
0 1
if < 1, if x < -1, i f n : ? 1,
is a C" sigmoid function. That is, Q(x) is Cw on R, and satisfies (a) 8(x) = 0 if n: -1, (b) O(x) = 1 if z 2 1, and (c) 0 5 8(2) 1 for all x E R. Use tliis fullcliorl t o construct a CCObell functioii on [-I, 11.
<
<
7.4 Constructio~~ and Examples of Ort honormal Wavelet Bases The goal of this section is to provide an algorithm for constructing a wavelet orthonorma,l ba,sis given an MRA. The algorithm is given in the following theorem.
Theorem 7.35. Let {V_j} be an MRA with scaling function phi(x) and scaling filter h(k).1 Define the wavelet filter g(k) by

    g(k) = (-1)^k conj( h(1 - k) ),    (7.23)

and the wavelet psi(x) by

    psi(x) = sum over k of g(k) phi_{1,k}(x).    (7.24)

Then {psi_{j,k}(x)}_{j,k in Z} is a wavelet orthonormal basis on R. Alternatively, given any J in Z,

    { phi_{J,k}(x) : k in Z } union { psi_{j,k}(x) : j >= J; k in Z }

is an orthonormal basis on R.
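The passage from scaling filter to wavelet filter in Theorem 7.35 is a one-line computation. The following Python sketch is our own and not part of the text; it builds g from h using the rule in (7.23), takes the Haar scaling filter h(0) = h(1) = 1/sqrt(2) from Section 7.4.1 as its example, and checks numerically the standard orthonormality conditions sum_k h(k) conj(h(k-2m)) = delta(m) and sum_k h(k) conj(g(k-2m)) = 0.

```python
# Sketch (our own, not from the text): given a scaling filter h, form the
# wavelet filter g(k) = (-1)^k * conj(h(1-k)) of (7.23) and verify the standard
# orthonormality conditions numerically.  The example h is the Haar filter.
import numpy as np

def wavelet_filter(h, offsets):
    """g(k) = (-1)^k * conj(h(1-k)); h is given as a dict {k: h(k)}."""
    return {k: (-1) ** k * np.conj(h.get(1 - k, 0.0)) for k in offsets}

h = {0: 1 / np.sqrt(2), 1: 1 / np.sqrt(2)}          # Haar scaling filter
ks = range(-4, 6)
g = wavelet_filter(h, ks)

def corr(a, b, m):
    return sum(a.get(k, 0.0) * np.conj(b.get(k - 2 * m, 0.0)) for k in ks)

for m in range(-2, 3):
    print(m, round(corr(h, h, m), 12), round(corr(h, g, m), 12))
# Expect: sum_k h(k) conj(h(k-2m)) = delta(m) and sum_k h(k) conj(g(k-2m)) = 0.
```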
Remark 7.36. (a) Since 6 = SjEF{cpl,k(z)),(7.24) implies that $(z) E Vl (b) Taking the Fourier transform of both sides of (7.24) gives that
'We are making the assumption that h ( n ) is a filter, i.e., t h a t convenience only. See Remark 7.19.
lh(n)l
<
m for
186
Chapter 7 . Mi~lt,iresnli~tinn Analysis
where
and mo(y) is defined by (7.8) (Exercise 7.39)
7 . 1 Examples of Wavelet Bases The Haar Wavelet: Recall that scaling function for the Haar MRA was defined by y ( z ) ( z ) . Also note that by Exercise 5.18,
=
Therefore,
and as in the proof of Theorem 7.35,
Therefore, by (7.24),
By Theorern 7.35, { I j l j , k ( ~ ) ) j , k E Z is a wavelet orthonormal basis on R. The Piecewise Linear Wavelet: Recall that the scaling function, defined by
where q ( x ) = (1 - 1x1) X[-I.I]
F(z), for
(2) and
the piecewise linear MRA is
7.4. Construction and Examples
187
Since
p(x) = -
1 -
2
p(2i
+ 1) + p(x) + 51 p(2x
1
1)
1 1 +cp1,o(z) + cp1,1(x), Jz 2Jz
-c P l , - l ( ~ )
2&'
-
-
(7.27)
it follows that (Exercise 7.40)
@(Y)= c 0 s 2 ( ~ 7 / 2P) ( Y / ~ ) Therefore, $(y)
=
Therefore, by (7.26),
6(1 + 2 cus2( n y ) )1/2 y-(y)
(7.28)
188
Chapter 7 . Multiresolution Analysis
Taking the inverse Fourier transform of both sides,
wliere
See figure 7.3.
FIGURE 7.3. Left: The piecewise linear scaling function F ( x ) . Right: The piecewise linear wavelet.
The Bandlimited Wavelet: Recall that tlie scaling function for the bandlimited MRA is defined by ?(y) = Xl-llz,llz)(7').Since p(y/2) = X[-l,l)(y),it follows that
where rno(7/2) is the period 2 extensiori of X~-112,112)(?) SO that mo(?) is the period 1 extension of X[-114,~14) (7). Now, by (7.26), rnl(7) is the period 1 extension of the function
7.4. Construction and Examples
189
so that
By taking the inverse Fourier transform,
?%4=
s i n ( 2 ~ x) COS(TX) n(.?: - 1/21
-
sin n ( z - 112) (1 - 2 sin TX) T ( x - 112)
See figure 7.4.
FIGURE 7.4. Left: The bandlimited scaling function sin(~x)/(;.x)). Right: The bandlimited wavelet.
The Meyer Wavelet: Recall that the scaling function for the Meyer MRA if 171 5 113, .s(y 1/21 if E (1/3,2/3), c(y-112) i f y E (-213,-1/3),
I
+
where s(y) and c(y) are defined by (7.21). Recall also that P(y) is ~ % n R and has the property that 0 < F(y) 1 lor all y E R. Since @(y/2) is identically 1 on the interval [-2/3,2/3], @(y)= mo(y/2) @(y/2), where mo(y) is the period 1 extension of the function $(2y) X,-1/2,1/21(7).Note also that mo(y + 112) e-2ffz(*1/2) vanishes on [-1/6,1 jG] , is identically 1 on 1113, 2/31 and on 1-213, - 1/31, rises from 0 to 1 on [1/6, 1/31, and falls from 1 to 0 on [-113, -1161.
<
190
Chapter 7 . Multiresolution Analysis
By (7.25), $ (x) is defined by
and
' 4 ' , ( ~ ?=
Tllell 0
< $(?)
5 1 for all y E R, and T(y) is ~ " . n R. See Figurc 7.5.
FIGURE 7.5. Top left: The Fourier transform of the Meyer scaling function with k = 2 . Top right: The Fourier transform of the Meyer wavelet. Bottom left: The Meyer scaling function. Bottom right: The Meyer wavelet.
7 . 2 Wavelets in Two Dimensions The goal of this section is to dcfinc an expansion of a function f (x, y), L~ on R2 in terms of an orthonormal collection of functions that is based on dilations and translations of basic wavelets. There are many ways to
7.4. Construction and Examples
191
do this but here we will present a very popular and simple construction called tensor-product wavelets. The construction is based on a single twodimensional scaling function @(x,y) and a set of three two-dimensional wavelet functions P ( l ) ( x ,y), Q ( ~ ) ( xy), , and Q ( ~ ) ( xy). , We will prove the following theorem.
Theorem 7.37.
Let p(z) and $(x) be the scaling and wavelet functions associated with s o m e MRA and define
For each j , F1, k2 E Z, define
T h e n the followzng hold. ( a ) T h e collection
~~:ll~
, k 2 ( X , ~ ) 1 1 < 7 < 3 , , / , k 1. k 2 E z
(7.32)
is a n orthonormal basis o n R2. ( b ) For each J E Z, the collection
i s a n orthonormal basis o n R2
Remark 7.38.
(a) Note lhal in the above construction the dilation is isotropic; that is, it is the sarne in both the x and y direction. This is desirable for a couple of reasons. First, the basis elements are indexed by only three parameters, scale j, and location (kl, k 2 ) , instead of four (we are not counting the index 2 ) . Second, the basis does not "favor" any spacial direction; that is, each basis element does not have any particular orientation but is spread out evenly over its center of mass.
(b) It is possible to define orthonormal wavelet bases of the form (7.32) where only one wavelet is required rather than three. However, such wavelets are not associated with arly MRA so there is 110 corresponding scaling function. Any wavelet orthonormal basis in two dimensions that comes from an MRA will require three wavelet functions.
192
Chapter 7. Multiresolution Analysis
(c) We can provide some insight into why three wavelets are required for this construction by considering natural extensions of the approxirrlatiori and detail operators Pj and Qj to two dimensions. For each j E Z, let us define Pj by
We can realize the approxirnation operator P j as the composition of two operators operating in the x and y directions independently. Define
Then we car1 assert that (2) (1) P j f ( x , Y ) = p' j" ~ (j~ ' f ( . x , t J=) P j P, f ( x , y )
(7.35)
(Exercise 7.41). We also define the detail operators in the x and y directions
and
Then and Therefore we can write
7.4. Construction and Examples
193
where
4:" f
((r,
y)
=
x x(f,
*:il
' @ ~, k ~2 ~ ) l
kl
, k 2 (1, Y).
kz
Proof of Theorem 7.37: Verifying orthonormality of the systems (7.32) and (7.33)is easy and follows from the orthonorrnality of the corresponding one-dimensional wavelets and scaling functions. To verify completeness, note that it can be shown (Exercise 7.42) that for every f (n;, y ) , C: or1 R ~ ,
Then for any such f (x,y),
7.4. ,? I;ocnlzzatzon of Wuvelet Bases Time or Spatial Localization: We have seen in Remark 5.12(a) that if $ ( a ) is the Haar wavelet, then $;,k(x) is supported on the dyadic interval I,,* = [ 2 - i k , 2 - J ( k + 1)).This
194
Chapter 7. Multiresolution Analysis
identifies the Haar basis as a local basis in the sense that if f (x) is supported in an interval (A, B), then its Haar coefficients { (f,Qj,k)) will vanish whenever does not intersect (A, B) (Section 5.4.1). In all of the other examples of this section, the scaling function cp(z) is even (cp(-z) = cp(x)) and has its center of mass at zero. Consequently, +(x) satisfies $(1 - x) = $(x) and has its center of mass at 1 / 2 (Exercise 7.43). Thus, we may think of $(x) as concentrated in the interval = [O, 1) arid of $j.k(x) as co~icentratedin the interval Ij,k. Another point of view is to note that since the center of mass of $(x) is 112, the center of mass of @j,k(x)is 2-j(k + 112) (Exercise 7.8). Therefore, even with non-compactly supported wavelets, we think of the bases as local bases in the sense that the coefficierlt {(f,$I,k)) somehow captures sorne aspect of the behavior of f (z)only near the interval I J , k or near the point 2-j(k 112). In the case of wavelets in two dimensions, we need to consider the coricentration or support of the scaling functions as well. In the Haar case, p(x) is supported on [ O , l ) so that the two-dimensioilal scaling furletions and waveletmu,@(x,y) a n d * ( i ) ( z , g ) , 1: = I , 2, 3 , are supported on the square [0, 1) x [0, 1). Consequently, the scaling functions cDj,k,,kz (z, y) and (i) the wavelet fimctions *j,kl , k z (x, y) are supported on JJ,k, x Ij,k2. The situation is slightly but not substantially different in the other examples considered in this section. As above, we consider y(z) to be concentrated in the interval lo,o - = [-1/2,1/2) and hence c p j . k ( z ) to be concentrated in the interval I , , k = [2-j(k - 1/2),2-j(k 1/2)). We think of the scaling furictiori cD,j,k,,k2 (z,y) as being cori<:entrated on the interval ~ ~ x ,I j , k k2 , ~ , k Z( 2 , y) on the interval Ij,r., x I,sn,, .i2 (x,y) on
+
+
~ :- l i ~
~jZ2,
(3)
the interval l j , k 1 x l j , k , ,and !J!j,k, ,k, (x, y) on the interval I j , k l x I j , k z . Another point of view is to consider the center of mass of the two-dimensional wavelets and scaling functions. This is done in Exercise 7.44. Frequericy Localization: Here we consider the frequency concentration of the scaling functions and wavelets pI,k(x) and II,,,k(x). The question is: What is the best way to do this? If we compute the center of mass of these functions, we quickly run into trouble. Since (at least in all of the examples presented so far) p ( x ) arid ,+(x) are real--valued, the functiorls Ipj.k(?) 1' and (?) l2 are even and hence A
/Y
?j,k(?)12
d~=/T
&.x(r)12 d7 = O
for every j and k. Since in all of our examples F(O) = 1, we car1 safely recognize that @ ( y )l2 resembles a "bump" centered at zero in the frequency domain. However, since q ( 0 ) = 0, we see that in general l $ ( ? ) I 2 will reA
A
7.4. Construction and Examples
195
semble a pair of symmetric bumps in frequency. A proper discussion of frequency localization of thc wavelets should take this into account. A reasonable approach is to consider the bandlimited wavelets as a paradigm for the notion of frequency localization of a wavelet. Specifically, we note that in this case
arid Ihal
4(?)l2= XI-1.-1/21 (7)+ X [ ~ / ~ , L ) ( Y ) .
It follows then that
and that
~ $ ~ , r ( y= ) lx~~ - z ~ , - z ~ - ~ ) ( + Y )X [ ~ ~ - ~ . Z J ) ( Y ) ,
and we define the frequency localization or concentration of the scaling functions y3,k(x) and the wavelets $ j , k (x) to be the intervals [-2j-', 2'-') and [-2j, -2-1-I ) U [2jp1,23) respectively. The accuracy of t#hisparadigm can be verified by computing the Fourier transform of the wavelets and scaliiig fuiictions giver1 in this section. This is the content of Exericse 7.46.
Exercises Exercise 7.39. Prove equation (7.26). Exercise 7.40.
Prove equation (7.28).
Exercise 7.41. Assuming that f (x, y) is C: on R2 and that the scaling function p(t) has compact support on R, show that (7.35) holds.
Exercise 7.42. Show that if f (x, y) is (7.34), then lim llP,f
j-+m
-
C,O on
R 2 and if Pj is given by
f 112 + IIP-3fllz
= 0.
(Hint: The proof is similar to that of Lemma 7.5.)
Exercise 7.43. Show that in each wavelet example of this section (with the exception of the Haar wavelet) the following hold. (a) The scaling function cp(x) is even; that is,
cp(-2)
= cp(x)
(b) The center of mass (see Exercise 7.8) of cp(x) is zero.
196
Chapter 7. Multiresolution Analysis
(c) The wavelet $(z) is symmetric about 112; that is, $(1 - x)
=
+(x).
(d) The center of mass of $(x) is 112.
Exercise 7.44. Suppose that
Show that
Exercise 7.45. Show that if $(x) is a wavelet, then
1$(2jr)l2 = 1 for 3
all y. Exercise 7.46. For each example of a wavelet given in this section, calculate either numerically or analytically its Fourier transform and verify informally that this function is concentrated on the intervals [-I, -112) U [l/2,1).
7.5 Proof of Theorem 7.35 The goal of this section is to prove the first part of Theorem 7.35. This is done in two steps. The first step is Lemma 7.48, which lists three conditions on the wavelet $(z) sufficient to guarantee that { $ J , k ( ~ ) ) J , k E Z forms an orthonormal wavelet basis on R. The second step involves showing that the
7.5. Proof of Theorem 7.35
197
function $(z) given by (7.24) satisfies those three conditions. The proof of the second part of Theorern 7.35 uses the same techniques and is left as an exercise (Exercise 7.50).
7 . 5 1 Suficient Conditions for a Wavelet Basis Definition 7.47. For each j E Z , the wavelet subspace W, is defined by
Lemma 7.48. Suppose that there exists a function +(z) E VI satisfying the following conditions. (a) ( T , @ ( x ) )is a n orthonormal system 0.f translates. I n light of Definition 7.47, this is the same as saying {T,+(x)) is a n orthonormal basis for Wo. (b) (T,@,T, p) = 0 for a11 n, m E Z . Since {T,,p(x)) is a n orthonormal basis for Vo,this i s the same as saying that Vo and Wo are orthogonal subspaces; that is, .Ilf f (x) E Vo and g ( x ) E Wo,then ( f , g ) = 0. This i s often written as Wo IVo. ( c ) Given a function f (z), C: o n R,
QO f ( x ) E SjZE{T,$(x)) = Wo.
is a wavelet orthornormal basis o n R.
Proof: As usual we must show orthonorrnality and completeness. Let us first prove orthonornlality within scales. Let j t Z he fixed. Then
by (a), as requircd. To prove orthonormality between scales, let j, j' t Z with j' < j , and let k , kt E Z be arbit,rary. Since $(z) E Vl, $ o , k / (z) E Vl also (cf. Exercise 7.9). By Definition 7.12(d), $j~,/c'(x)= D ~ ~ / $ o , ~EI5 ( x/ )+1. Since by (b), (qo,,,
( ~ 0 ,= ~ 0)
for all n, rn E Z , it follows that
( D 2 J '$0,n, D2J ( ~ 0 . m= ) ($3,n, ( P j , n l )
for all j , m, n E Z.
= 0,
198
Chapter 7.Multircsolution Analysis
Given f (z)E
4, we know by Lemma 7.15(a) that
Hence for f ( x ) E V,,
Finally, note that by Definition 7.12(a),since j' < j , 4 1 + ~4 and since $ 3 / , k , ( 5 ) E T/n,+l, g j , , k , (x) E V, also. Hence ( g j , k , = 0. Therefore { $ ~ ~ , ~ ( isz an ) )orthonormal ~ , ~ ~ ~ system on R. In order to prove completeness, let f (z) be C: on R. We will show first that (c) implies that for any j E Z , $j/,kl)
To see this. note that
Qjf(x)
=
Pj+l
f
( 2 )-
By (c) (see Remark 7.3),
Therefore,
Pjf (4
7.5. Proof of Theorem 7.35
199
and (7.37) follows. In order to complete the proof, recall that by Lemma 7.16,
For J f Z fixed,
=
Ilf
-
P J +~P-.~f 112
Since for each 3,
& i f(2)
C(f1'$.7.k.)
'd1j.k (lr)>
k.
Therefore by Lemma 7.16,
and { Y ~ . ' J , / ~ ( x )is) ~complete. .~~Z
7 . 5 2 Proof of Theorem 7.35 Let h ( n ) be the scaling filter. Define the wavelet filter g(n) by (7.23) and $ J ( x )E Vl by (7.24). We will show that { $ J , k ( . x ) ) j , k E zis a complete orthonormal system on K. by showing that Lemma 7.48(a)-(c) are satisfied. To prove (a), note that since {T,cp(x)) is an orthonormal system of translates,
200
Chapter 7. Pvlultiresolution Analysis
where we have split the second sum into its even and odd terms, and used the periodicity of m o ( r ) . By (7.26): Iml (?/2)12 lml (?/2 1/2)12 = 1 also and using an argument similar t o the above
+
+
and (a) follows. To prove (b), note that by (7.26), and the orthonormality of {Tnp(x)),
7.5. Proof of Theorem 7.35
201
for all n, !E Z. To prove (c), let f(x) be C,O on R. Since Qof(z)= Plf(z)- Pof(z),and by Lemma 7.15(a), 1
=
f
l
k
l
k
and
f i f ( 2 )=
k
C(f. Po,?+)P " . ~ ( Z ) . k
(7.39) Taking the Fourier transform of both sides of (7.39), we have that
and
Since by Bessel's inequality
the Riesz-Fischer Theorem implies that a,(?) a n d b(y) are functions L2 on [O, 11 in the sense of Lebesgue. By Lemma 7.5, it is enough to show that there is an f2sequence { c ( n ) ) such that (7.42) ( 7 )= ~ Y ) ? ? ( Y ) = m,l(Y/2) $(7/2). If (7.42) did hold, then in light of the definition of Qo, (7.40), and (7.41), E(y) would satisfy
Kf
BY)
Thus, (7.42) would follow if we could find E(7) such that
Chapter 7. Multiresolution Analysis
202
Replacing y by y + 1 in (7.43) and remembering that a(?), b(y), and ?(y) each have period 1, we obtain
Combining (7.43) with (7.44), we obtain the system
(
ml (712) m1(y/2 112)
+
mo;;Pf!/2)
) ( Eh) ) - ( b(y)
a(y/2) a(yI2 + l / 2 )
1.
(7.45)
Since ml(?) =
e-Z7rz(y+l/Z)
mob + 112)
and lmo(y)I2 + Imo(7 + 1/2)12 = 1, the matrix
is unitary for all y
E
R; that is,
Applying this fact to (7.45) gives
so that
It can be verified directly that ?(-(y) has period 1, and since m l ( y ) l 5 1 for all y,that ?(?) is L~ 011 [O, 11 ill the sense of Lebesgue arld Ilence has t 2Fourier coefficients. Therefore, (7.42) holds, (c) is proved, and the result follows from Lemma 7.48.
Remark 7.49. In the course of the preceeding proof, we showed the following facts. (a)
(b) If j
(z))kEz is an ortl~onorrnalbasis for WJ
# j'
then W j IWjl.
(c) For each j E Z,
VjIW j .
7.6. Necessary Properties of the Scaling Function
203
(d) For each j E Z, q+l= V, @ W j . This means that every f ( x ) E y+, can be "split" as f ( z ) = fi (z) f 2 (z), where f (z) E V, and f 2 (x) E Wj . By (c), (f1 , f 2 ) = 0. This fact is t o be compared with the Splitting Lemma (Lemma 5.16) for the Haar system.
+
(e) Every f (z), L2 on R, can be written as a sum
where fj (x) E written as
W'
and by (Is),
(fj,fj1)
=
O if j
# j'.
This is usually
Exercises Exercise 7.50. Prove the second part of Theorem 7.35.
7.6 Necessary Properties of the Scaling Function In this ~ e c t i o nwe , ~ derive sorne propcrtics that thc scaling function, c p ( ~ ) , and the wavelet function, $(+), for a given MRA must satisfy. Throughout the section, it will be assumed that the scaling function is both L' and L2 on R and that the wavelet function defined by (7.24) is also L1 on R.
Theorem 7.51.
Proof: Let f (z) be given so that
I f 11
A
= 1, f (7) is continuous and sup-
ported in the interval [-R, R] for some R
> 0. By Theorem
3.40,
-
2 ~ h material e in this section is adapted from Daubechies, T e n Lectures on Wavelets, Society for Industrial and Applied Mathematics (1992) p. 144.
204
Chapter 7. Multiresolution Analysis
By Parseval's formula,
Since (2 j I 2 e 2 T i k 2 p " y ) k , z is a complete orthonormal system on the interval [-2j-', 2jP1], then as long as 2j-' > R, the above sum is the sun1 of the squares of the Fourier coefficients of the period 23' extension of the function f^(o)@(2-iy). Therefore, by thc Planchcrcl formula for Fourier series.
Sincc cp(z)is L1 on R, @(?) is continuous on R by the Riemann-Lebesgue Lemma (Theorem 3.9). It follows that
un,iforrn,ly on [-R, R]. Therefore, by Theorem 1.40(b),we can take the limit under tlie integral sign and conclude that
Ilf 1 ;
=
lim IPjf l l 2
3-rn
Hence I@(O) I = 1, and since p(z) is L1,
7.6. Necessary Properties of the Scaling Function
205
Corollary 7.52.
Proof: Since @(y)= mo(y/2) @(y/2),where 7 ~ (y) ~ 0is defined by (7.8),and since by (7.46), @(0)# 0, mo(0) = 1. Since by (7.25)
and since by the ortl-lonormality of {Tkcp(z)),
$(o)
mo(l/2) = 0, and hence on R.
= 0. Therefore,
(7.47) holds since $(z) is L'
Corollary 7.53. G(n)= 0 for. all irrtegers n # 0.
Proof: Since {T,,p(z)) is an orthornorrnal system of translates, Leinma 7.4 says that 7L
for all y E R. Letting y
= 0,
By Theorem 7.51, Ig(0) l2
=
this gives
1 so that
and (7.48) follows.
Corollary 7.54.
+
Proof: Note first that the function C , p(z n) is L1 on [0,1) and has period 1. By Corollary 7.53, @(0) = 1 and @ ( k )= 0 for all integers k # 0.
206
Chapter 7. Multiresolution Analysis
Therefore for each k E Z,
The only function with period 1 and Fourier coefficents equal t o b ( k ) is the function that is identically 1 on [0, 1). Therefore, (7.49) follows.
7.7 General Spline Wavelets In Section 7.3.2, we studied an MRA in which the spaces VJ consisted of continuous piecewise linear functions. The wavelet $(z) associated with this MRA is also piecewise linear and continuous, but is not compactly supported. However, since it has rapid decay a t infinity, it is very small outside a relatively small interval. As a result, the piecewise linear wavelet expansion has most of the advantages of the Haar expansion and does a better job of representing smooth functions. In particular, any partial sum of the piecewise linear wavelet series of a smooth function is continuous. We would like to do even better. In this section, we will construct wavelets that are smooth and piecewise polynomial. Specifically for each n E N, we will construct a wavelet that is C7'-I on R and that is a piecewise polynomial of degree n. To do this, we will require some preliminary properties of piecewise polyrlomial functions and specifically of spline functions.
7.7.1 Basic Properties of Spline Functions Definition 7.55.
Let B O ( x )= X [ 1 / 2 , 1 / 2 ~( x ) , and for n E N , define x+1/2
B r L ( x )= BTL-1* B o ( x ) =
S
B , l ( t ) dt.
(7.50)
x- 1 / 2
Thc function B,(x) zs callcd thc B-splinc (or spline) of ordcr n. For n E Z', define ( x ) by (7.51) & ( x ) = B,(x - ( n 1 ) / 2 ) .
g,
+
7.7. General Spline Wavelets
(k) Bl (s)= Bo * Bo(:) = ( 1
Example 7.56.
-
207
1x1)X [ - l , l j ( x )
x
€
[-3/2, -112)
otherwise
1 i 2+,Ix 1 4
+
B3 (4 = B2 * B o ( z )= =
x
[ 3 / 21 / 2 )
otherwise,
2-1/2
Exercise 7.64.
Bob) =
-
B
)
-
(4,
X[0>1]
{
2
z 0
zE[O,l),
: E , 2 ] ,
otherwise.
-
B3( z ) = Exercise 7.64.
Lemma 7.57.
( a ) B,(z) i s supported in [-(n
[O,n+11.
-
T h e functions B, (x)and B,(x) satsify the following properties.
+ 1)/2, ( n + 1)/2], and
&(x)
is supported in
( b ) B, ( x ) and
-
En(z)are cnP1 o n R.
(c) B,(x) is equal t o a degree n polynomial o n intervals of the f o r m [ k ,k k E Z.
gn(y)=
(y)
sin(~y)
n+l
-
A
, and B,(-y)
=
e-Ti'nli'
(?)
sin(7i.y)
+ 11,
n+ 1
.
Proof: (a) Exercise 7.65.
(b) The proof is by induction on n. Clearly B l ( x ) is continuous on R. By (7.50), B,,(x) = B, - (t) d t . By the Fundamental Theorem of Calculus, BA,(x) = B n P l ( x 112) - B,,-l(z - 112).
J~!T~;:
+
By the induction hypothesis, B,,-l (x) is CnP2on R. Therefore, B, (z) is CIL-1 on R. (c) The proof -is by induction on n. Clearly the result holds for B ~ ( X ) . Now assume that B,,(z) = pk(x) on [k, k + 11, k E Z, where pk (x) is a degree n polynomial. Fix k, and let x E [k,k I]. By Exercise 7.63,
+
Since the indefinite integral of a degree n polynomial is a degree n polynomial, we are done.
+1
(d) Exercise 7.66.
7.72 Spline Multiresolution Analyses Given n E N, define the degree n s p l i n e m u l t i r e s o l u t i o n a n a l y s i s by
and for j E Z, define
Note that any function f (z) E Vois Cn-' on R and is a degree n polynomial on each interval k E Z. Any function f (z) E V, is Cn-' on R and is a degree n polynomial on each interval I j , k , I% E Z.
7.7. General Spline Wavelets
209
We need to verify that { V , ) j c z is an MRA. To verify Definition 7.12(a), we need the following lemma.
Lemma 7.58. For each n E N ,
B,(z) satisfies n
E n
where m o ( y ) = 2-'"'
(1
Proof: If n = 0, then
(7) = mo (712) En (74121,
+ e-2"")nf
(7.54)
l.
Bo(z)= Xlo,ll
(x), and by Exercise 5.18,
Taking Fourier transforms gives
A
B,,
A
Since (?) = (go ("r))"+' (Exercise 7.63 and the Convolution Theorem), raising both sides of ( 7 . 5 5 ) to the n 1 power gives (7.54).
+
Since
and Definition 7.12 (a) follows. The verification of Definition 7.12(e) is contained in the following lemma. 1
Lemma 7.59. thonormal basis for
Thcrc cxists a function F(x) such that {T,@(X)}
is a n or-
Vi.
Proof: In light, of T,emrna 7.7, it, will he sufficient t,o show that there exist constants A, B > 0 such that for all y E R,
To see this, note that
210
Chapter 7. Multiresolution Analysis
A
Since CI:(En(?+6)l2 has period 1, it is enough t o show that it is boundcd above and away froni zero on the interval [-1/2,1/2]. For y E [-1/2,1/2],
and
To verify Definition 7.12(b), we require the following lemma. Fix n E N , let f (z) be Da.r~dlirruiLedwilh b a ~ ~ d l i n z R i t > 0, (Definition 3.4'7) and suppose that f (7) is C' on R. 'L'hen
Lemma 7.60.
A
( f , D~~ &) D~~TxEn (x)= f (x)
lim J'S
Proof: Applying Parseval's formula, wc obtain
(f,D~~T, B,)
=
/f
(x) ~
2
)
dx
R
=
~ ( y~) ~ ( 2 -2 j j /~2 e) 2 r i ( 2 - " y ) k d,Y
7.7. General Spline Wavelets
211
Recall that {2-312 e-2TL2-'7 is an orthonormal basis on [ - 2 k 1 , 2,1-']. This means that as long as 2J-I > R, ( f , D2J is the kth Fburier A
fi;) g,,(2-j;).
coefficient of the period 2i extension of the fur~ction
C f^(?+ 2%) 5,(2-j1. + k ) .
Hence
..
-
2i/2
(7.57)
k
In light of (7.57), taking the Fourier transform of the left side of (7.56) gives
The term in the sllrri for m = 0 is
A
and in fact, this term converges to f ( 7 )as follows.
-
A
since limj+, that
As long as 2"' that
11- B , ( 2 - 9 ) l2
=
0 uniforrr~lyon [-R, R].It remains to show
> R, the supports of each term in thc sum arc disjoint so
Chapter 7. Multiresolution Analysis
212
-
A
y(f^(~)
Since iscontinuous and compactly supported, it and also Bn(?) are L" on R. Therefore,
But since
1
5 jj
03
c
m=l
( T
1 (~ 1/2))2n+2
.2-.1
R
J-2-,n
s i ~ l ~ (" ~+ y~dy. )
Since { 2 j f 1 R XI-2.1R,2-J n]( x ) ) , is~ an ~ ~approximate identity arid si~lce ( ~ y is) a, continuous function that vanishes at y = 0: Theorem 2.33 says that ,in1 2~ l 2 - I R sin2"+ 2 ( ~ yd )y
,i+"o
Thus,
2
3R
= 0.
7.7. General Spline Wavelets
213
asj-m. To complete the verification of Definition 7.12(b), let f (z) be C: 011 R. Let F > 0. By Corollary 2.37(b), and Plancherel's formula, there is a bandlimited function g(x) such that i?j(y) is C0 on R and such that 11 f - g1I2 < ~ / 2 By . Lemma 7.60, we can find J > 0 such that
Therefore,
T~B,,
Sillce Ek(y. D ~TJk E n )0 2 . 7 (z) t I/,, Definition 7.12(11) holds. 'l'hc verification of Definition 7.12(c) and (d) is left as a n exercise (Exercise 7.68).
Exercises Exercise 7.61.
Verify the formula for B3(z)given in Exan~plc7.56(a).
Exercise 7.62. Verify directly that
E,(r)
is CrL-'on R for
IL =
1, 2, 3.
Exercise 7.63. Prove that for each n E N,
Exercise 7.64. Calculate explicit formulas for B3(x) and Example 7.56. Exercise 7.65.
Prove Lemma 7.57(a).
~
~
( as2 in)
214
Chapter 7. Multiresolution Analysis
Exercise 7.66. Prove Lemrna 7.57(d). (Hint: Use the Convolution Theorem. Theorem 3.21 .) Exercise 7.67. Prove that for each n E N, B,(x) satisfies E n
( 7 ) = m n ( ? / 2 ) En(?/2),
where mu(y) = cosn+l(2ny). Exercise 7.68. Verify Definition 7.12(c) and (d) for the degree n spline
MRA.
Chapter 8
The Discrete Wavelet Transform 8.1 Motivation: From MRA to a Discrete Transform The MRA structure allows for the convenient, fast, and exact calculstiorl of the wavelet coefficients of an L2 function by providing a recursion relation between the scaling coefficients at a given scale and the scaling and wavelet coefficients at t,he next coarser scale. In order to specify this relation, let {V,) be an MRA with scaling furictiorl cp(z). Then by Lernma 7.17: ~ ( 3 : satisfies a two--scale dilation equation (7.7)
The corresponding wavelet $(z) is defined by (7.24)
where g ( n ) = (-l)?'h ( l - n) (7.23). Suppose that we are given s signal or scqucrlcc of data {cO(k))kEZ. We lllake the assurrlption that c o ( k ) is the ktli scaling coefficient for some underlying furlctiorl f (z); that is,
fur. each km E Z. This assunlption allows the recursive algorithrrl to work, but
it is important t o understand that this interpretation ofco(k) 0,s th,e sca,ling coe.ficient of some function f (x) is dzfferent from the usual interpretation of data in signal processing as the samples of some underlying function f (z).' We will show that all scalirig and wavelet coefficients o f f (x) for all negative scales car1 be calculated using a very convenient recursive algorithm. lIn spite of the interpretation of data as scaling coefficients and not samples, the samples of a function f (z), { f ( k ) J k E z will , often be treated as input to the Discrete Wavelet Transform. That is, it is assumed that f ( k ) zz ((f, which need not be tile case. Strang referrs to this assumption as a "wavelet crime." Strang strongly suggests preprocessing sampled data by taking c o ( k ) = f ( n )p ( n - k ) . See Strang and Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press (1996) p. 232-233.
En
)
216
Chapter 8. The Discrete Wavelet Transform
Since p o , o ( x ) = En h ( n ) yl,,,(x), it follows that for any j. k E Z ,
Similarly,
&,k(x) = C g ( n- 2k) ~ j + l , n ( x ) .
(8.2)
For every j E N, define c j ( k ) and d j ( k ) by
for k E Z . Then b y (8.1))
n
In order t o see t h a t the calculation of cj+,(k) and d j + l ( k ) is completely reversible, recall thal by Definition 7.14,
and that by (7.37),
Also, by Definition 7.14, for any j E Z ,
Writing this out in terms of (8.5) and (8.6) gives
By matching coefficients, we conclude that
cJ ( k ) =
+
cj+l ( n )h(k - 2 ~ ' )
dj+l ( n )g ( k
-
2n).
(8-7)
We sumnlarize these results in the following theorem. Let {V,) be an M R A with associated scaling function p(x) and scaling filter h ( k ) . Define the wavelet filter g ( k ) b y (7.23) and the wavelet function $ ( z ) by (7.24). Given a function f ( z ) ,L~ o n R, define for k € Z,
Theorem 8.1.
and for every j E N and k E Z ,
and CJ
( k )=
C C J + I (h~( )k
-
2n)
+ C d j + l ( n )g ( k - 2n).
218
Chapter 8. The Discrete Wavelet Transform
8.2 The Quadrature Mirror Filter Conditions Theorem 8.1 suggests that the key object in calculating ( f ,pj,k)and ( f , $,,k) is the scaling filter h ( k ) and not the scaling function cp(x). It also suggests that as long as (8.7) holds, (8.3) and (8.4) define an exactly invertible transform for signals. The question is: What conditions must the scaling filter h ( k ) satisfy i n order for the transform defined b y (8.3) and (8.4) to be in,?~ertihEeb y (8.7)? These properties will be referred to as the Quadrature Mirror Filter ( Q M F ) conditions and will be used in the next section to define the Discrete Wavelet Transform. In this section, we will motivate the QMF conditions, refornlulate them in the language of certain filtering operations on signals called the approximation and detail operators, and finally give a very simple characterization of the QMF conditions that will be used in the design of wavelet and scaling filters.
8.2.1 Motzuation from MRA In this subsection, we will derive some properties of the scaling and wavelet filters I L ( ~ z ) and g ( n ) that follow directly from propcrtics of MRA. Ultimately this will motivate our definition of the QMF conditions.
( I ) By Theorem 7.51, Jk ~ ( xdx)
=
# 0 so that
h ( n )2-1/2 n
L
~ I ( z d) s .
Cancelling the nonzero factor JR p ( s ) d z from both sides, it follows that
By Corollary 7.52, JR $ ( s )d x = 0 so that
8.2. Thc QMF Conditions
219
Hence,
This is equivalent to the statement that
(Exercise 8.15). (2) Since {YO,, (x)) and {cpl,,(x)}are ortllorlorrnal systerrls on R,
Hence.
Since {$o,,(z):n E Z ) is also an orthorlormal system on R, the same argument gives y(k) y ( k - 271,)= 6 ( n )
C
Since ( $ o , ~ ~PO.^^) , = 0 for all n, m E Z , the same argument gives
for all n E Z.
(3) Since for any signal co (n),
Chapter 8. T h e Discrete Wavelet Transform
220
where co (m) h(m - 2k)
cl ( k ) = m
and dl ( k )=
co(m) y ( m - 2k) ; m
it follows that
Hence we must have
We surrlmarize these results in the followirlg theorem. Theorem 8.2.
Let {V,) be a r ~MRA with scaling filter h ( k ) and wavelet .filter g ( k ) g i ~ ~ e hy n , (7.23). Then
h ( k ) h ( k - 2 n ) = x g ( k ) y(k
(c) k
(d)
-
2n) = 6(n),
k
1
g ( k ) h ( k - 2n) = 0 for all n t Z , and
h(m- 2k) h ( n - 2k)
(e) k
+x g ( m
-
2k)g(n- 2 k ) = 6 ( n - m)
k
Remark 8.3. (a) Condition (a) is referred to as a normalization condition. The value fi arises from the fact that we have chosen to write the two-scale dilation equation as p(z) = Enh ( n )21/2 p(2x - n) . In some of the literature on wavelets and especially on two-scale dilation equations,
8.2. The QR4F Conditions
221
the equation is written p(x) = C h ( n ) cp(2x - n). This leads to the normalization C , h ( n ) = 2. The choice of normalization is just a convention and has no real impact on any of the results that follow.
(b) Conditions (c) and (d) are referred to as orthogonality conditions since they are immediate consequences of the orthogonality of the scaling functions at a given scale, the orthogoilality of the wavelet functions at a given scale, and t h e fa,c,t,t,hat the wavelet functions are orthogonal t o all scaling functions at a given scale. (c) Condition (e) is referred to as the perfect reconstruction condition since it follows from the reconstrliction formula for orthonormal wavelet bases.
8.2.2
The Approzimation and Detail Operators and Thei~, Adjoints
The goal of this subsection is to reformulate Theorem 8.2(c)-(e) in terms of certain filtering operations on signals referred to as the approximation and detail operators. These operators will also play an important role in the defi~itionof l l ~ eDiscrete Wavelet Transform.
Definition 8.4. Let c ( n ) be a signal. ( a ) Given m E Z, the shift operator r, is defined by
( b ) The downsampling operator ./. ,is defined by
(Note: (.j,c)(n) is formed by removing every odd term in c ( n ) . ) ( c ) T h e upsampling operator
( (Note:
(T c ) ( n ) is formed
is
)
=
defin,ed h p
{
c ( n / 2 ) i f n is even, a if n is odd.
by inserting a zero between adjacent entries of
c ( n ).) See Fzgurc 8.1.
Definition 8.5. Civcn a signal ~ ( nand ) a filter h ( k ) , define g ( k ) by (7.23). T h e n the approximation operator H and detail operator G correspondtng to h ( k ) are defined by
222
Chapter 8. The Discrete Wavelet Transform
FIGURE 8.1. Top left: A signal c ( n ) , top right: ( J c ) ( n ) bottom: , (Tc)(n) (right). The approxiniation adjoirit H* a n d detail adjoirlt G* are defined b y
Remark 8.6. ( a ) The operators H and G can be thought of as convolution with the filters h(n) = h ( - n ) and g ( n ) = y ( - n ) followed by downsampling. That is, ( H c ) ( n )=J ( c * b)( n ) and
( G c )( n )=J ( c * g)( n ) .
( b ) H * and G* can be thought of as upsampling followed by convolution with h and g ( x ) . That is, ( H * c ) ( n )= ( T c ) * h ( n ) and
( G A c ) ( n= ) (?c)*g(n).
( c ) The operators H * and G* are the formal adjoints of H and G . That is, for all signals c ( n ) and d ( n ) , ( ~ cd ), =
x
C ( H C ) (=~ )e ( k ) ( ~ * d ) ( =t )(c. ~ * d ) k
k
8.2. The QMF Conditions
223
and
(Exercise 8.16). Taliing the above remarks into consideration, we car1 reformulate the conditions of of Theorem 8.2(c)-(e) as follows.
Theorem 8.7. Given a filter h ( k ) , de6ne g ( k ) by (7.23) and let I denote the identity operator. Then:
if and onlg i,f H H * = GG* = I , where I is the identitg operator o n sequences,
for all n E Z i f a,nd only if HG* = GH* = 0 , and
k
k
i f and only if
H*H
+ G*G = I ,
where I is the identity operator.
Proof: Exercise 8.17
8.2.3
The Quadrature Mirror Filter (QMF) Conditions
In Theorem 8.2, we set forth conditions on the scaling filter h ( k ) that are consequences of the fact that h ( k ) is the scaling filter for an MRA. In Theorem 8.7, we saw that sorrle of these conditions can be characterized in terms of the approximation and detail operators and their adjoints defined in Definition 8.5. In this section, we will show that all of the conditions in Theorem 8.2 can be written as a single condition (Theorem 8.11(a)) on the auxiliary function mo(7)= 1/1/2 Enh ( n )e-2Xin' plus the normalization condition mo(0)= 1. These two conditions will be referred t o as the Quadrature Mirror Filter ( Q M F ) conditions.
224
Chapter 8. The Discrete Wavelet Transform
We will need the following lemmas.
Lemma 8.8.
Given a signal c ( r ~ )the , followir~g hold.
( a ) For every m E Z
(y)
( T ~ ~ C ) =~ e-2x"%(y)
See Figure 8.2.
Proof: (a) Exercise 8.18. (b) To prove (8.21), we compute tlie Fourier coefbcier~tsof the right-hand side. Let n E Z be fixed.
and (8.21) follows by the Uniqueness of Fourier series.
(4
8.2. The QMF Conditions
225
FIGURE 8.2. Top left: The Fourier series, ?(?) of the signal c ( n ) of Figure 8.1. Top right: ( J c ) ~ ( Bottom: ~). (?C)~(?).
Lemma 8.9. ml
Given a jiltcr h ( k ) , define g ( k ) b y ( 7 . 2 3 ) , 7 7 ~ " ( ~ )2lrj (7.9), and
( y ) b y (7.26). T h e n for any signal c ( n ) ,
Proof: To prove (8.23), note that by defining h(n)= h(-n) that
and recall that for any c, H c
=I( c * h).Taking the Fourier transform,
226
Chapter 8. The Discrete Wavelet Transform
The other part of (8.23) follows similarIy. To prove (8.24),note that
and recall that for any c, H*c = h
* (Tc).
Talking t h e Fourier transform,
The other part of (8.24) follows similarly. In Theorem 8.7(c), the equivalence of the conditioris X I ,g ( k ) Fb(k - 2 n ) = 0 for all n E Z (Theorem 8 . 2 ( d ) ) and HG* = G H * = 0 was demonstrated. The next lemma shows that Theorem 8 . 2 ( d ) is a consequence only of the way in which the wavelet filter y(k) was defined arid is not related t o any other property of the scaling filter h ( k ) . Lemma 8.10.
Given a filter h ( k ), define the jilter g ( k ) b y (7.23). Then
and
HG* = G H * = 0.
Proof. To see (8.25), note that by the definition of g ( n ) ,
Then,
+
+
mo(7)r r ~ o ( + y 112) ml(7)m l ( y 1/21 = mo( y )mo( y + 112) - e - 2 " i ~ m o( y + 1/2)e2"" mmo (7) = mo(r)m,o(y + 112) - m o ( + ~ 1/21 ~ o ( Y )
To see (8.26), note that given any signal (8.23),
~ ( Y L ) ,a i d
applying (8.24) and
8.2. The Q M F Conditions
227
U
We can now prove the following theorem.
Theorem 8.11.
Given a filter h ( k ) , define g(k) b y (7.23), m o ( y ) b y (7.9), rnl ( y ) b y ( 7 . 2 6 ) , and the operators H , G, H * , n,n,d G* hg (8.1 8 ) an,$ (8.19 ) . Then. the following are equiua81ent.
( a ) 1mn(y)l2
+ + m o ( y+ 1/2)12
1.
h ( n )h ( n - 2 k ) = d ( k )
(b) n
( c ) H* H
+ G*G = I .
( d ) H H * = GG* = I .
Proof: (a)
* (b). Applying Parseval's formula to h ( n )and ~
=
Jo
+
e2""'y ( I r n 0 ( ~ / 2 ) 1 ~Irn0(7/2
Therefore, (h) is equivalent t o the statement that
~ ~ hgives ( n )
+ 1/2)12) dy.
228
Chapter 8. The Discrete Wavelet Transform
But this is true if and only if Imo(y/2)12
+ mo(y/2 + 1/2)12 = 1
for all y E [0, I ) , which is (a). (c). Given a signal c(n), (a)
Similarly,
Therefore,
by (8.25). Therefore (c) holds if and only if 4 7 ) (lmo(7)12+ lm1(Y)l2)= ?(YL
A
for every signal c ( n ) , which is true if and only if
+
I m ~ ( r l 2I2 ) + l m o ( ~ l 2 1/2)12 = 1 for all y E [O,l). This is (a). (d). C ivcn a signal c(n), (a)
8.2. T h e Q M F Conditions
229
Similarly,
Therefore,
H H * c = GG*c = c, for every signal c ( n ) if and only if
which is (a).
Definition 8.12.
Given a filter h ( k ) , define m.o(y) h g (7.9) Then h ( k ) is a
QhfF provided that: ( a ) m o ( 0 ) = 1 and
+
( b ) ( r n 0 ( ~ / 2 ) (+~m o ( y / 2+ 1/2)12 = 1 for all y F R. W e refer t o ( a ) and ( b ) as the quadrature mirror filter ( Q M F ) conditions.
Theorem 8.13. Suppose that h ( k ) is a QMP. Define g ( k ) by (7.23). Then: (a)
h ( n )=
d?,
( e ) x g ( k )h ( k - 2 n ) = 0 for nil n E Z . k
Proof: (a) By the definition of mo(y),
and (a) follows.
230
Chapter 8. The Discrete Wavelet Transform
(b) Since mo(0) = 1 and l r r ~ ~ ( +~ I) 7 7 ~ mu(1/2) = 0. But by the definition of g(k),
l2
+~ (1/2)12 ~ =
1, it follows that
so that
and (b) follows. (c) By Exercise 8.15, (b) is equivalent to (c). (d) By Theorem 8.7(a), (d) is equivalent t o the staterrleilt that H H * = GG* - I , which by Theorem 8.11 is equivalent to Definition 8.12(b). (e) By Theoreni 8.7(b), (e) is equivalent to the staternerit that G H * = HG* = 0, which is Lemma 8.10.
(f) By Theorem 8.7(c), (f) is equivalent t o the statement that H* H+G*G I , which by Theorem 8.11 is equivalent to Definition 8.12(b).
=
Remark 8.14. It follows frorn the first part of Theorem 8.13(d) that an FIR filter h,(n) that satisfies the QMF conditions car1 be supported on only an even number of points. That is, if h(n) = 0 for n < hf a r ~ dn > N , h ( M ) # 0, and h ( N ) # 0, then N - M 1 is even (Exercise 8.19).
+
Exercises Exercise 8.15. (7.23), then
Prove that if h,(k) is any filter and if g ( k ) is given by
Exercise 8.16. Verify the statement made in Reinark 8.6(5) Exercise 8.17. Prove Theorem 8.7. Exercise 8.18.
Prove Lemma 8.8(a).
Exercise 8.19.
Prove the statement made in Remark 8.14.
Exercise 8.20. The purpose of this exercise is to show that the formula (7.23) for the wavelet filter g ( k ) is not arbitrary. Prove that if h(k) is a
8.3. The Discrete Wavelel Trafislol.nl
231
real-valued FIR QMF and if g(k) is any real-valued FIR filter such that Theorem 8.13(a)-(f) are satisfied, then g(k) must be of the form
for surrle odd integer equivalent to
71.
(Hint: (1) Show that the QMF conditions are
H(z) H(zpl)+G(z) ~ ( z - l )= 2
and
H(-z) ~ ( z - ' ) + G ( - z ) ~ ( z - l )= 0,
where H ( z ) and G(z) are the z-transforms of h(n) arid g(n). (2) Show that these identities imply that ~ ( z - l )= G(-a) cwzN and G(zpl) = - H ( - 2 ) a z N for some N E Z and a E C . (3) Sllow that u2 = ( - I ) ~ + ' , and rewrite the identities from (2) in terrns of h ( n ) and g ( n ) . )
8.3 The Discrete Wavelet Transform (DWT) 8.3.1
The D WT for. Signals
Summarizing some of the considerations giver1 in the prcviolis section, we can now make a formal definition of the discrete wavelet transform.
Definition 8.21.
Let h ( k ) be a Q M F , define g ( k ) by (7.23), and let H , G , H * , and G* be giz~enby (8.18) and (8.19). Fix J t N . The D W T of a signal c o ( n ) , is the collection of sequences { d , ( k ) :1 5 j _< J ; k E Z ) U { c ~ ( k k) :E Z ) , where C,+I ( n )= ( H c 3 ) ( n , )
and
d,+l(n) = ( G c , ) ( n ) .
(8.30)
The inverse transform is defined by the fomw~~la
If J = m, then the D W T of co is the collection of sequences {d,(k): j E N ; k E Z ) .
8.3.2
The D WT for Finite Signals
In practice, we never deal with infinite signals and this raises the question of how to take the DWT of a finite signal. There are essentially two ways to do this. (1) Zero Padding. This approach is to treat the finite signal as an infinite signal padded with zeros. Then apply the DWT as in Definition 8.21.
232
Chapter 8. The Discrete Wavelet Transform
The main difficulty with this approach is that the representations we obtain are not as efficient as possible. For example, suppose that our signal has length 2N. That is, suppose that co(n) satisfies co(n) = 0 if n < 0 or n > 2N - 1. Suppose also that the scaling filter h ( n ) and the wavelet filter ( 1 1 ) have length L > 2, with L even. In this case, the sequences cl = Heo and dl = Gco would each have length (2N L - 2)/2. Similarly, cj and dj would have length at least 2Np3 (1 - 2-j)(L - 2) (Exercise 8.24). This means that the total length of the DWT for co would be at least
+
+
where J E N indicates the depth chosen for the DWT. Thus, the representation of a length 2N signal (which may be thought of as an 2N-vector) is achieved with at least 2N + J ( L - 2) coefficients. This may be acceptable for certain applications, especially if J and L are srnall compared to 2N, but is not the most efficient representation possible. (2) Periodization. A rriore efficient representation is achieved if the finite signal is viewed as a periodic signal. The following lernnia shows that the DWT defined in Definition 8.21 can be applied to periodic signals. In this case, for a period 2N sequence, c j ( n ) and dj(n) will have period 2N-.j so that if the depth of the DWT is J N, then the DWT of the sequence has exactly 2N coefficients (see Exercise 8.25(b)).
<
Lemma 8.22. For some N E N , let c ( n ) be a period 2 N signal, let h ( k ) be a QI\!lF, define g ( k ) by ( 7 . 2 3 ) , and let H , G , H * , and G* be defined as in. Definition 8.5. T h e n ( H c ) ( n ) and (Gc)( n ) are well-defined sequences (that is, the surns defir~ing( H c ) ( k ) and ( G c ) ( k ) converge absolutely for each k E 2)with period 2 N , and ( H *r ) ( n ) and ( G * c ) ( n ) are well-de,fined sequences with period 2Nfl
'
Proof: Exercise 8.25(a).
8.3.3
T h e D WT as an, Orthogonal Transforrnatiorr,
The DWT of a period M = 2N signal can be thought of as a linear transforrnat,ion takirlg the hl-vector
into the &I-vector
d = [dlId2 1
.--
I ~ I cJ ] ,
where
dj = [dj(0) d, (1) - - . d j (2-JlV1 - I ) ]
8.3. The Discrete Wavelet Transform
233
and CJ =
[ ~ j ( 0~) ~ ( . .1. C)
J ( ~ - ~ M-
I)].
This linear transformation from R~ to R" (for this discussion we are assuming that the data sequence c o i n ) and the scaling filtcr h(k) arc rcal valued) can be represented by an Ad x A l matrix W such that
In the remainder of this subsection, we will make some observations about the strilcti~rea n d properties of the matrix W .
W is an Orthogonal Matrix Because of the orthogonality properties of the scaling and wavelet filters, the matrix W will be an ortliogonal matrix; that is, its rows (and columns) form an ortliogonal set in RM.This rrlearls that W-' = W*, where W* is the conjugate transpose (or adjoint) of W. To see why t,his is true, consider the action of the averaging operator H on a sequerice c ( n ) with period p > 0. By Lernma 8.22, (Hc)(n,) has and car1 period p/2. Thus, H is a linear trarisfornlation from R" to be represented by a p/2 x p matrix. We will call this matrix HI, or sirriply H when its size is clear frorri context. Sirriilarly, the detail operator G car1 be represented by a p/2 x p matrix G,, (or G).
Example 8.23. Let h(k) be a real-valued scaling filter of length four2; that is, h,(k) = 0 if k < 0 or k 4. Define g(k) = ( - I ) ~h(3 - k ) , so that also g(k) = 0 if k < O or k 4, and let p = 8. Then
>
>
and
The approxirnation and detail adjoints H* and G* are represented by the adjoints of the matrices H and G (Exercise 8.27). 'For examples of such filters, see Exercise 8.26
234
Chapter 8. The Discrete Wavelet Transform
Since
=
I,,
where I, is the p x p identity matrix, W, is an orthogonal matrix. Therefore, the first step in the DWT of an A{-vector co is given by
the second step by
and in general, the j t h step by
The A4 x DI matrix W representing the DWT taken to level J is therefore the product of J such matrices. Since each matrix in this product is orthogonal, so is W (Exercise 8.28). Basis Vectors for the Finite DWT Since the DWT d of an M-vcctor co is realized as the product of co with an I\/I x M orthogonal matrix W, it follows that each number in the vector d is the inner product of co with the corresponding row of W. Taken as a set of vectors in R", the rows of W form an orthonormal basis for R"',
8.3. The Discrete Wavelet Transform
235
which is referred to as a discrete wavelet basis for R M .These vectors can be calculated and plotted simply by taking the inverse DWT (8.31)of the canonical basis vector ei = [0 . . . 0 1 0 . . . 0] in Rn*,where 1 is in the i t h position. A plot of the discrete wavelet basis for R16 based on the Daubechies lengtli-four scaling filter is shown in Figure 8.3. This lor firldirlg and displaying the wavelet basis vectors is actually the same as the cascade algorithm described in Section 8.4.2. The only difference is that here we consider our sequences t o be periodic, and in Section 8.4.2, the sequences are assumed to be zero padded.
FIGURE 8.3. Discrete wavelet basis for R~~based on the Daubechies length-four scaling filter.
236
Chapter 8. The Discrete Wavelet Transform
Exercises Exercise 8.24. Let co(n) be a finite signal that satisfies co(n) = 0 if n < 0 or n > 2N - 1 for some N E N. Also suppose that the scaling and wavelet filters h and g ( x ) have length L > 2. Prove that if cj and dj are given by Definition 8.21, then cj and d j are finite sequences with length equal to the smallest integer greater than 2N-j (I - 2-j) (L - 2).
+
Exercise 8.25. (a) Prove Lemma 8.22. (Hint: Use the fact that a periodic sequence must be bounded.) (b) Show that if the depth of the DWT of a sequence with period 2N is J 5 N, then the DWT has exactly 2 N coefficients.
Exercise 8.26. Prove that all four-coefficient scaling filters (that is, QMFs h(n) such that h(n) = 0 for n < 0 and n > 3) can be parametrized by
ho =
JZ -+4
JZ
cosa , hl = 2 4
sin a
2 '
h2=
JZ 4
cosa 2 '
h3=
JZ
sina
4
2
(Hint: The QMF conditions reduce to:
(b) h i
+ hf + h; + h i = 1, and
+
+
+
(1). Show that (ho h2)2 (hl h ~ - )1. ~ (2). Show that ho h2 = hl h3 = &/2.
+
+
Jz + t , hl = $ + s , (3). Letting ho = 4 s, t E R, show that s2 + t2 = 114.)
h2 =
JZ - t , and h3 = a Jz - s , for
Exercise 8.27. Show that when applied to period p signals, the approximation and detail adjoint operators H* arid G* are linear transformations from Rp t o R2" and can be represented by the matrices Hzp and Gzp respectively. Exercise 8.28. onal.
Show that the product of orthogonal matrices is orthog-
8.4 Scaling Functions from Scaling Sequences We have seen how the scaling function, ip(x), associated with an MRA gives rise to a sealing filter, h(k) , namely h(k) = ( c p , cpllk). We have also seen that any scaling filter associated with the scaling function of an MRA
8.4. Scaling Functions from Scaling Sequences
237
must satisfy the QMF conditions (Theorem 8.13). The question we address in this section is: G i v e n a QMF, can we find a scaling function associated with it that gives rise t o a n MRA?
8 . 1
The Infinite Product Formula
(5)with scaling function cp(x),we know by Lemma 7.17
Given an MRA that p(x) satisfies
8 7 ) = mo(y12) @(7/2),
where mo(y) is given by (7.9). Therefore, we may write
Letting n 4 co,it follows that
provided that the infinite product makes sense. In order t o deal with infinite products of functions such as in (8.32), wc will require a few definitions and theorems. I~lfi~lite Products of Numbers
Definition 8.29. Let {z,},~N be a sequence of complex numbers. Then
provided that the limit exists.
Remark 8.30. (a) If z, (b) Let p~ =
n,-,z,, N
= 0 for any
with z ,
n E N, then
lim
r ) =~ z
Then p ~ / p = ~ z- ~~Since . lim p~
= 0.
# 0 for all n , and suppose that
N+w
N--too
nr=,z,,
=
lim
N+oo
p ~ + l=
z,
238
Chapter 8. The Discrete Wavelet Transform
limN+,m z~ = 1. In other words, if a n infinite product of numbers convcrgcs, then t h e limit of t h e terms must be 1. In what follows, we will always assume that the sequence { z , ) , , ~ satisfies z, 0 for all n and tha,t lim,,, z,, = 1.
+
Let {Z,),~N be a sequence of complex numbers. Let log(z) denote the ~ , I . Z ~ L C~Z*~II.LUL~ofI Cthe / ~ logarithm; that is, i f z = Izl e z e , with 0 8 < 27r, then log(z) = In lzl + i 0 . If log(z,) converges, then so does z,.
Theorem 8.31.
<
xr=l
Proof: Let s~ =
N
n:=,
log(zn)- Then
Tf s~ + s as N + cc. then esN + es as N p~ + eS as N + oo. Therefore nr==l z,, = e S .
Definition 8.32.
W e say that
+ oo;or
in other words:
n;=,z, converges absolutely xr=llog(&) if
conuerqes absolutely.
C,,l(~,,-
Let { z , ) , ~ bc ~ a scqucnce of complex numbers. ~f 1 ) converges absolutely, then z, converges ubsolutely.
Theorem 8.33.
n:=,
Proof: From Taylor series, we know t h a t if lzl
If
121
W
< 1, then
< 112, then
+
Therefore, fur /zl < 112, / log(1 z ) 1 5 ( 3 / 2 ) / a / . 00 If x , = , lz, - 11 < 00,then lim,,, z, = 1 so that for all n large enough, lzn - 11 < 112. For all such n ,
Hence x r = = ,1 log(z, ) 1 converges since converges absolutely.
C;=, 1 zn
-
1/ does and so
Hzl• 2n
8.4. Scaling Functions from Scaling Sequences
239
The Scaling Function as an Infinite Product
Theorem 8.34. L e t h ( k ) be a finite QMF, a n d define mo(y) b y (7.9). T h e n for all R > 0 , t h e infinite p m d u c t
converges absolutely a n d zn Lm o n [-R, A].
Proof: Since mu(0) = 1,
Lct C = En Ih(n)(In(.Then
since for all x, ( sin(x)l 5 1x1. Thus,
and so given R
> 0, for all
171 I R ,
Therefore, by Theorem 8.33 for every R absolutely and in LDO on [-R, R].
> 0,
njc,mo(y/2i) converges 17
Theorem 8.34 asserts that the infinite product formula converges uniformly on intervals [-R, R] to some liniit function. Anticipating that this liinit function will be the Fourier transform of our scaling function, let us write M
8(r)=
n
j=1
m"(r/2').
240
Chapter 8. The Discrete Wavelet Transform
Note that since mo(0) = 1, G(0) = 1. It remains t o prove that in fact @(?) is L2 on R so that by Plancherel's formula we may corlclude that our scaling function cp(x) is also L2 on R. Since mo(y) has period 1, the partial e product n j = , rno(7/2j) has period 2'. 'l'herefore, it does not make sense to say the the partial products converge in L2 on R since no periodic function can be close to an L2 function in the sense of the L~ norm. We therefore restrict our attention to one period of each partial product and define
Note that Kt(?) and hence also p e ( ~is) L2 on R and that p e ( ~is) a bandlimited approximation t o the scaling function p(x). The next theorem asserts that in fact k ( y ) converges in L2 to G(y).
Theorem 8.35.
L e t h ( k ) be a finite QMP', a n d let m o ( y ) be defined by (7.9). S.n,,ppose t h a t there is a n u m b e r c > 0 s u c h t h a t
For l E N , define G ( y ) by (8.34) a n d ?(y) b y (8.33). T h e n : (a) ,!%(y) -+ (P(y) in L'
o n R, a n d
Proof: The proof will use Theorern 1.41. Specifically, we will show that: (1) for each R > 0, Fl(7)+ $(Y) in LO" on [-R, R],
(2) there is a constant co and y E R and
(3)
> 0 such that lF!(y)(I co (@(y)(for all
.e E N
JR l a o I 2 d 7 < m.
Once (1)-(3) have been established, we apply Theorem 1.41 as follows. Consider the sequence of functions {Ik(r) - @(y)12)eEN, and note that by (I), for each R > 0, I & ( ? ) - F(7)l2 + 0 in LDO on [-R, R]. Since II.e(7)
-
@( ? ) I 2
1 2(lGh(7)l2+ I ( P ( Y ) I ~ )
(Exercise 8.41), it follows from (2) that IKp(7)
-
G(7)I2 I 2 ( 1 +
4)lG(7)12,
which is L1 on R by (3). Therefore, Theorem 1.41 applies and r
241
8.4. Scaling Functions frorn Scaling Sequences
Proof of (1). Lct R > 0 be given. By Theorem 8.34, we know that
e n,=, mO(y/2j) for all
in L" on [-R, R]. As long as 2'-' > R, = y E [-R, R]. Thus, k ( y ) t p(y) in Lm on [-R, R].
Proof of (2). Since h ( k ) is finite, mo(y) is continuous and hence so is
n:=,
ny=,
mo( r / 2 j ) for ea,ch I t N. Since @(y)= lirn,,, mo(r/2j)uniformly on every interval [-R, R], @(y)is continuous on R. Since @(O) = I, there is an E > 0 such that if Iyl < E, then I@(y)l2 1/2. Since
we may choose J so large that < E for all lyl then for 1 5 j 5 J, /2-jyl 114, and by (8.35),
<
Given
e E N, if lyJ <: 2e-1,
Of course, if 12 1
then 12-e71
5 112. If l y /
< 112,
112 so that
> 2(-' , then E ( y ) = 0; so the inequality holds for all y.
Proof of (3). We will prove by induction that for all !E N, ilpel12 = 1. First, let k' = 1. Then
242
Chapter 8. The Discrete Wavelet Transforrrl
For the induction step, fix l. Then
Since /i1(7)converges ~inifornilyto a(?) on all intervals [-R, R],it follows (Exercise 8.42) that
8.4. Scaling Functions from Scaling Sequences
8.4.2
243
The Cascade Algorithm
Another way to compute the scaling function from the scaling filter is t,o examine the two-scale relation directly. That is, we know that
p(z) =
h(n) 21/2 p(22 - n).
In other words, the function p(z) is a fixed point of the operator defined by
This suggests an iterative scheme to compute p ( x ) . Specifically, we fix some initial function qO(x)and define for all !E N,
Q(X) =
x
h(n) 2'/"7-1(22
-
(8.36)
n).
If the sequence {qe)eENconverges, it will converge to the scaling function. See Figure 8.4. We can prove the following theorem.
Theorem 8.36. Let h ( k ) be a finite QMF, let mo(y) be defined by (7.9), and , f o r ! E N , define r l e ( ~ ) suppose that (8.35) holds. Let q o ( x ) = X , - l / 2 , 1 / 2 ~ ( x )and by (8.36) and p(x) b y (8.33). T h e n :
( a ) q g ( z ) + p(x) in L~ o n R, and
( b ) (Tnp(x)),Ez is a n orthonormal s y s t e m of translates.
Proof: The proof will use Theorem 1.42. Specifically, we will show that:
el(?)
(1) for each R > 0, + $(?) in L" on [-R, R], and (2) given E > 0, there is an R > 0 and an L > 0 such that if!
> L, then
Once ( I ) and (2) have been established, we apply Theorem 1.42 as follows. Consider the sequence of functions { I ? & ( ? ) - G(y)12)eEN, and note that by ( I ) , for each R > 0, IFt(?) @(?)l 2 + 0 in L"" on [-R, R]. Given E > 0, there is an R > 0 and an L > 0 such that if e L, then -
>
by (2) and by the fact that @(?) is L2 011 R (Tlieorerrl 8.35(b)). Since
l?t(?)
-
@(?)I25 2(l@t(?)I2+ l$(7)l2):
244
Chapter 8. The Discrete Wavelet Transform
it follows that
Therefore, Theorem 1.42 applies and linl/
t+m R
1G(r)- G(Y)l2
dry
=
0.
Proof of (1).Let R > 0 be given. Since
in Lm on [-R, R] and it is easy to see that
lim e+x
sin(~y/2~) =I ~ y / 2 ~
in Lm on [- R, R]. Thus, qt(y) + @(y) in L X on [-R, R].
Proof of (2). We will prove by induction that for each 4 E N . JJqclJ2 =1 by showing that for each l E N , {TnqB( x ) ) , ~is~ an orthornormal system of translates. First note that {Tnqo(x)),Ez is an orthonormal system of translates. Next note that for any Q E N , and k E Z ,
mvt, 5)
=
(
h
)2
h
1 ( 2 1n) m
)2
-
I
( - m)
8.4. Scaling Functions from Scaling Sequences
245
by the induction hypothesis and the QMF conditions. Therefore, {T,,rll (x)) is an orthornormal system of translates. Setting k = 0, it follows that for each E N, (w,rle) = Ilrlell; = 1. Given E > 0, choose R > 0 so that
Since
I@(y)1 dy = 1 , this means that
Since by ( I ) , ?jp(y)t @ ( y )in Lm on [-R. R ] ,there is an L E N such that
Since JR lf&(r)12
d y = JR Irlr(x)12dx = 1, this means that
which was to be proved.
8.4.3
The Support of the Scaling Function
We have seen in the previous subsections that in most cases, a finite QAIIF, h ( k ) , gives rise t o a scaling function y ( x ) that is L2 on R by means of the formula X
where m o ( y ) is given by (7.9). In this subsection, we will show that in fact the scaling function associated with a finite scaling filter by means of the above formula is compactly supported and that the length of the smallest interval in which the scaling function is supported is closely rclntcd to thc length of the scaling filter. First we state a very simple lemma whose proof is left as an exercise.
Lemma 8.37. Let {V,) be a n M R A with scaling function p(z) and scaling filter h ( k ) . If p ( x ) i s conzpactly supported, t h e n h ( k ) is a finlte sequence.
Proof: Exercise 8.43. The next theorem shows that the length of the scaling filter determines the length of the support of the scaling function.
246
Chapter 8. The Discrete Wavelet Transform
FIGURE 8.4. Illustration of the cascade algorithm. The scaling filter is the Uaubechies filter of length 4 (see Examples 9.16 and 9.22(a)). Top left: q a ( z ) . Top right: qz (x). Bottom left: T ~ ( X ) .Bottom right: v ~ ( x ) .
Theorem 8.38. Suppose that h ( k ) is a finite QMF, let rno(y) be given by (7.9), and suppose that (8.35) holds. Suppose that for some N E N , h ( k ) has length 2 N ; that is, if ma is the least integer such that h,,, # 0 and illo is the largest integer such that h b ~ ,# 0, then iLf0 - m o = 2N - 1. T h e n the scaling function p ( x ) defined by (8.33) is supported i n a n interval of length 2N - 1.
Proof: Let m ( z ) = XL-1/2,1/21(x),and for !E N, define ve(x)by (8.36). Letting Le be the length of the smallest interval on which rlp(x)is supported, we see that Lo = 1 and that Le satisfies the recursion formula
This recursion formula is solved by
(the reader is asked t o verify this in Exercise 8.45). By Theorem 8.36,~ ( x + ) ~ ( z in ) L2 on R as !+ rn and since Lu 2N la! boo, t h e l e n g t h o f t h e s u p p o r t o f p ( x ) must b e 2 N - 1 .
+
8.4. Sra,ling Filnctions from Scaling Sequences
247
The next theorem shows that the length of the support of the scaling function determines the length of the scaling filter.
Theorem 8.39.
Suppose that p(z) is tlze scalirzg furzctiurz assuciated ,wilh su~rze MRA and that p(x) is supported i n a n internal of length 2N - 1 for some N E N and that N is the smallest such integer for which this i s true. T h e n the scaling filter h.(k) hu.s 1en.gth.2 N .
Proof: In light of Exercise 8.44, we can assume that p ( x ) is supported in the interval [O,2N - 11. We can find numbers €0 0 and €1 0 such that p(z) is supported in the interval [eo,2 N - 1 - el] and not in any smaller subintcrval. Notice also that eo < 2 and el < 2 for if either were larger than or equal to 2, it would follow that ~ ( xis)supported in an interval of length 2N1 - 1 with N1 < N , contrary to the hypothesis of the theorem. Therefore, for each n, E Z, ~ ( 2 . 1-: n,) is supported in the interval [eo/2 n/2, (2N 1 - c1)/2 n/2] and not in any smaller subinterval. By Lemma 8.37, the scaling filter h ( k ) is a finite sequence. Let mo be the smallest integer such that h,, # 0, and let Ago be the largest integer such that h,, f 0. Then the two-scale dilation equation becomes
>
-
>
+
+
and by looking at the supports of both sides of the above equation, we must have
Thus, mo = €0 and h40 = 2N - 1 - €1. This constrains €0 2 0 and €1 2 0 t o be integers and since each is strictly less than 2, the only possible values each can take are 0 or 1. If €0 = 1, then €1 cannot be 1 since this would imply that p ( x ) is supported in an interval of length 2(N - 1) - 1, contrary to hypothesis. Also, €1 cannot bc 0 sincc this would mean that h(k) would have odd length, irr~possiblefor a sequence satisfying the QMF conditions (Remark 8.14). Similarly, if €1 = 1, then €0 cannot be 0 or 1 for the same reasons. Hence €0 = CI = 0 SO that mo = 0 and Mu = 2N - 1. El
Exercises (X,
Exercise 8.40. Prove directly that for each z t R, T L = ~
sin x cos(z/2") = -. .x
(Hint: Use the facts that sin(2x) = 2 sin(x) cos(x) and that sin(x)/x + 1
248
Chapter 8. The Discrete Wavelet Transform
Exercise 8.41. Prove that for any two numbers a and b, ( a 2(a2 + b2). (Hint: Since ( a - b)2 0, 2ab 5 a 2 b2.)
>
+
+ b)2
5
Exercise 8.42. Prove that if for every R > 0, f,,(x) + f (x) in Lm on [-R, R], and if there is a number Ad > 0 such that J l f n 1 2 M for all n, then J J f j I 2 5 A 4 . (Hint: Prove this by contra.dict,ion.)
<
Exercise 8.43. Prove Lemma 8.37. (Hint: Recall that the scaling filter satisfies h ( k ) = (p,c p ~ , ~ ) . ) Exercise 8.44. Let h ( k ) be a finite QMF with scaling function p(x) given by (8.33). If h(k) is shifted by some integer m, prove that the scaling function is also shifted by m. (Hint: Shifting h(k) by m means that mo(y) becomes e2"jrn? mo(y) .) Exercise 8.45. Verify that (8.38) solves the recursion formula given by (8.37) and Lo = 1. Exercise 8.46. Investigate the convergence of the cascade algorithm for 4-coefficient QhlFs given by various values of a in Exercise 8.26.
Chapter 9
Smooth, Compactly Supported Wavelets We have seen in Chapter 7 several examples of orthonormal wavelet, bases. However, the only example we have seen so far of a compactly supported wavelet has been the Haar wavelet. In Section 5.4.1, we saw that the compact support of the Haar wavelets rriearit that Llie Haar decompositiur~had good time localization. Specifically. this meant that the Haar coefficients were effective for locating jump discontinuities and also for the efficient representation of signals with small support. We have also seen disadvantages in the fact that the Haar wavelets have jump discontinuities, specifically in the poorly decaying Haar coefficients of smooth fiinctions (Section 5.4.3) and in the blockiness of images reconstructed from subsets of the Haar coefficients (Section 6.3.1). The goal of this chapter is to construct wavelet bases that have the advantages of the Haar system, namely compact support, but that are also smooth. This should result in good time localization but also better decay of the coefficients for smooth functions and higher quality image reconstruction. The starting point for this construction is the observation rnade in Section 8.4.3 that compactly supported scaling functiorls correspond to finite scaling filters. So we seek finite filters satisfying the QhlF conditions. But how do we know that the scaling function constrlicted via (5.33) will be smooth? The answer is given in the next section.
9.1 Vanishing Moments We have seen that any wavelet $(x) that cornes from an AIRA must satisfy
(Corollary 7.52). The integral in (9.1) is referred to as the zeroth of $(x), so that if (9.1) holds, we say that $(x) has its zeroth urtnishing. Tlle integral JRxk $(x) dx is referred t o as the k t h of $(x) and if JR x k $(x) dx = 0, we say that $(x) has its k t h
moment moment moment moment
vanishing. In this section, we will examine three very important properties of the wavelet $(x) related t o the number of its vanishing moments.
250
Chapter 9. Smooth, Compactly Supported Wavelets
The first property is smoothness. We will show that if {$j,k(x))j,ktz is an orthonormal system on R and if $(x) is smooth, t,hen it, will have vanishing moments. The smoother $(x), the greater the number of vanishing moments. The second property is approximation. We will see that vanishing monients have implications for the efficient representation of functions. Specifically we will see that the wavelet series of a srilootli function will converge very rapidly to the function as long as the wavelet has a lot of vanishing moments. This means that in this case, relatively few wavelet coefficients will be required in order t o get n good approximation. If in addition the wavelet is supported on some finite interval, then we can say that where the function is smooth, few wavelet coefficients are needed but where it is not smooth, more wavelet coefficients are needed for a good approximation. The implications for image compression are clear: Where the image is smooth, we need t o keep only a few coefficients and where it is nol srr~uotli (i.e., where there are edges) we need more coefficients. The third property is the reproduction of polynomials. This property says that if $(x) has compact support and N vanishing moments, then any polynomial of degree N - 1 call be written as a linear combination of integer shifts of the scaling function p ( x ) . Loosely speaking, this says that polynornials of degree N - 1 reside in the scaling space Vo.lA more precise way of saying this is the following. If f (x) is a piecewise polynomial function of degree N - 1, L2 on R, then the polyrlomial parts of f (x) will be In other words, if f (x) is a degree invisible to the wavelets { $ j 3 . k ( ~ ) ) j . k E Z . N - 1 polynomial on the support of the wavelet $j,k (x), then (f,G j , k ) = 0. This means that the nonzero wavelet coefficients of f (x) will only occur when the support of $j,k(x) contains a point of discontinuity of f (z), that is, a point where f (x) changes from one polynomia.1 to another. Since any srrlootli fulictioii can be well approximated by piecewise polynomial functions, this property can be thought of as a restatement of the general principle that where a function is smooth, few wavelet coefficients arc nceded to accurately represent it, and where a function is not smooth, more wavelet coefficients are required.
9. I . 1
Vanishing Moments and Smoothness
The goal of this subsection is to prove Theorem 9.3, which relies on Theorem 9.1 below. The reader may recognize that the conclusion of Theorem 9.1 is the same as that of Corollary 7.52. The difference is that Theorem 9.1 assumes only some smoothness of the functiorl $(x) and orthogonality. It does not require that the collection of functions { $ j , k ( ~ ) ) J , k E Z he an or] This statement is not correct mathematically since polynomials are not L~ on R and hence cannot be elements of Vo.
9.1. Vanishing Moments
251
thonormal basis, nor that it be associated with an MRA.
Theorem 9.1. Suppose that {$,,k ( x ) } , , i~s ~a ~n orthogonal system o n R and that $(z) and $(?) are both L' o n R. T h e n
Remark 9.2. (a) The assumption that $(x) is L1 guarantees that the integral JR $(z) d~ exists, and the assumption that $(y) is also L1 can be viewed as a smoothness assumption since by the Riemann-Lebesgue Lemma (Theorem 3.9), if $(y) is L ~then , $(x) is uniformly continuous on R and goes to zero at infinity. Also note that we have not assumed that {$j,k(x))j,kE~is complete. (b) The idea of the proof is contained in the observation that if J $ f 0, then by a suitable normalization, we can assume that $ = 1. 111 this case, the collection of functions {2il"j,0(x): j E N) form an approximate identity on R. Thus, A
A
P
but by orthogonality,
for all j . Thus $(O) = 0. We can shift the argument to any dyadic point xo = 2J0ko,jo,ko E Z , by noting that as j -+ m,
=
L
$(2)2jI2 ?,hj,o (x
-
xo) dx
+ .1cl(xo).
Therefore, $(xo) = 0 at every dyadic point, and since $(x) is continuous, 0. we arrive at the absurd conclusion that $(x)
Proof of Theorem 9.12: Since ,$(z)is continuous, clioose a dyadic point xo = 2J0kosuch that (jo, ko) # (0,O) and $(xo) # 0. By Parseval's formula, 2 ~ h proof e of Theorem 9.1 is taken from the paper by Benedetto, Heil, and Walnut,
252
Chapter 9. Smooth, Compactly Supported Wavelets
Choose a sequence tkj E Z satisfying 2-jkj = 2jok-o= u-0 for all j E N (just This can be done as long as j > -jo.Then as j + m, let k j = 2j+jqlco).
Since $(zo)# 0, ;(0)
=
0, which is the same as (9.2).
A similar argument applies to the case of higher vanishing moments. Theorem 9.3. Let $(z) be such that for s o m e N E N , both x N + ( x ) and y ~ f;(?)' are L' O,JLR. IS { $ l l r k ( ~ ) ) l l , l c E ,is C L 07-thogolzal ~ s y s t e m o n R, t h e n z m + ( z ) d z = ~for
O
(9.3)
Remark 9.4. (a) The assurrlptiorl that z N $(z) is L' guarantees that each of the integrals in (9.3) exists.
(b) The assumption that y"+'$(7) is L' can be viewed as a smoothness assumption. Note that since (27rfy)"+' G(7) is L' on R,$(x) has N 1 continuous derivatives and since
+
'1 (z)is uniformly continuous and vanishes at infinity by the KiemannLebesgue Lemma.
dl("+
Proof of Theorem 9.3: The proof is by induction on m. Uncertaznty principles for time-frequency operators, Operator Theory: Advances and Applications, vol. 58 (1992) 1-25.
9.1. Vanishing Moments
253
If m = 0, then the result holds by Theorem 9.1. Assume that the result holds for 0 5 m 5 k - 1 for some k 5 N. By Taylor's formula, 1 G(7)= yk $(r)(0) + Rk(?)r
2
where
for some
between 0 and y. Thus,
Choose zo = 2j0ko such that $ J ( ~ ) ( # z ~0 )and (jo,ko) # (0'0). It follows LS in the proof of Theorem 9.1 that for all j 2 -jo,
By (9.4) and the fact that
ly
lN+1($(7)l is L1 on R,
Also,
Therefore, 0
and
cI,2-jk
+ I2
( k ) (0) d k ) ( x o )
0 = C I , $(") (0) l/:(k)(zo)+ 2 5 q 2 . A
Since by (9.5), 1 2 I2~/ 5 ~ CI, 2 - j , letting j
or G("(0) = 0, which was to be proved.
cc gives
254
9.1.2
Chapter 9. Smooth, Compactly Supported Wavelets
Vanishing Moments and Approximation
The purpose of this subsection is to show that a wavelet with many vanishing moments does a good job of approximating smooth functions. By a smooth function, we mean one with a large number of continuous derivatives. We will show that the wavelet coefficients of such a function will have very rapid decay as j -+ m. To make the proofs easier, we will assume that our wavelet $ ( z ) has compact support.
cN
Theorem 9.5.
G i v e n N E N , assume that the function f ( z )i s o n R, and that f ( N ) ( z ) is Lm o n R. A s s u m e that the function @ ( z ) has compact suppor.1, that
xm +(x) d z = 0,
for
o5m5N
-
I
(9.6)
and that S,/$,,k(x)12 d z = 1 for all j , k E Z . T h e n there is a constant C depending only o n N and f (x) such that for every j , k E Z,
>0
Proof: - (1)- Suppose that $(x) is supported in the interval I, which has the = [0,a] for some a > 0. It follows that the function Q j l k ( x )= form I = 23/'$(21x - k ) is supported in the interval = [2-3k, 2-1(k a ) ] ,and that the length of denoted is 2 ~ j Denote a the center of the interval Ij,k by Ej,k, and note that % , k = 2-(3+')a 2-jk. As a consequence of (9.6), given any polynomial p ( x ) of degree no greater than N - 1, and any j , k E Z ,
Gtn
IG,~~,
+
+
( 2 ) Since f ( z ) is C N on R, for each j , k t Z , f ( x ) can be expanded in a Taylor expansion about the point E j c k . That is,
where
.-
for some number J between Zj,k and x. If x E
Ij,k,
then we have the estimate
9.1. Vanishing Moments
255
(3) Applying (9.8) to (9.9),we compute
Applying the estimate (9.10) and the Cauchy-Schwarz inequality,
Note that with C = ( l / N ! )a3j222-N 11 f ( " ) 1 1, ,
(9.7) is satisfied.
We can actually go a little bit further by observing that since f (")(z)is C0 on R and since the lengths 1 i 0 as j i m, f (N)( E ) will be very close to f(N)(Zi,k) for 5 E I;,a.Therefore, for large j ,
Hence
256
Chapter 9. Smooth, Compactly Supported Wavelets
Hence we can make the qualitative statement that as j gets large,
'l'he value of (9.11) is that it identifies the decay of the wavelet coefficients of a smooth function as a local phenomenon. That is, suppose that f (x) is not CN on all of R, but does have N continuous derivatives at some point zo.This means that f (N)(x)is in fact defined on some small interval I' containing the point xo. Since $(x) is supported in the interval I as described above, computing the coefficient ( f ,Qj,k) requires knowing the values of f (z) only on the interval ?;,k. From this, it follows that the estimate (9.11) holds for every J and k such that &,r L 1'. The above paragraph suggests the general principle that the wavelet coefficients corresponding to the smooth parts of a function will be very small compared with the wavelet coefficients corresponding to the nonsmooth parts of a function. This observation has implications for the compression of images using wavelets since marly classes of images consist of large areas of constant intensity (for example, background areas) that can be interpreted as the smooth parts of the image, separated by edges, that can be interpreted as the nonsmooth parts of thc imagc. Therefore, the wavelet transform of such an image will have a few large coefficients, corresponding to the edges, and a lot of small coefficients, corresponding t o the regions of near-constant intensity. This is just what is required in most image compression algorithms.
Example 9.6.
Consider the linear spline function f (x) defined by
It is clear that f(x) is linear on the intervals (-co,-I), (-1: O), (0, I ) , and (I,m), and hence continuously differentiable infinitely many times there. At the points -1, 0, and 1: f (x) is continuous but has a discontiniuty in
9.1. Vanishing Moments
257
its first derivative. Hence f (x)is only C0 on R, but in fact f (x)is C" at all but a few points. Now suppose that $(x) is a real-valued wavelet function with two vanishing moments; that is,
and that $(x) is supported in the interval [O, 31. We will see in Section 9.2 that such a wavelet exists. If j = 0, then by considering the support of the fullctions $ J ~ , ~ ( Xit) , follows that ( f ,Q O , k ) = 0 for k 1 and for k -4, so that there will be at niost four nonzero wavelet coefficients of f (x) at this scale. The same holds when j is any negative integer as well. = If j = 1, then support considerations lead to the conclusion that (f, O for k 2 2 and for k I -5 leading to at most six nonzero wavelet coefficients at this scale. For j = 2, something different happens. Support considerations lead to the conclusion that (f, ?,b2$) = 0 for k 2 4 and k I -7, but we also observe that if k = 0, then
>
<
since $(x) has two vanishing moments. The same holds when k = -3 and k = -4. In general, we observe that ( f , $ ~ ,is, ~ zero ) whenever the support of ll,j,k(x)is entirely contained in either of the intervals [ O , 1 ] or [-I, 0] since f (x)is linear there, and that the only potentially nonzero coefficients in the expansion occur for those j and k for which the support of Gj,k(x)contains the points at which f (x) has discontinuities in its derivative.
9.1.3
Vanishing Moments and the Reproduction of Polynomials
Corollary 7.54 said that any compactly supported scaling function associated with an MRA must satisfy C , y ( x n) = 1. In this subsection, we generalize that result t o say that if the wavelet $(x) has N vanishing moments, then for any 0 < k < N - 1, the polynomial x"an be reproduced exactly as a linear combination of integer shifts of the scaling
+
258
Chapter 9. Smooth, Compactly Supported Wavelets
function. Specifically, we show that there exists a sequence of coefficients { q k , n ) n E Zsuch that qk,rcp(x n ) = x k . This result g e n ~ r a , l i x to ~ s say that any polynomial of degree N - 1 can be exactly reproduced as a linear combination of integer shifts of the scaling function (Exercise 9.14).
En
+
Lemma 9.7. Let p ( z ) be a compactly supported scaling function associated with a n MRA, and let $ ( x ) be the wavelel deJi7~t.d by (7.24). If ,$(x) has N vanishing moments, then @ ( k ) ( n=) 0 for all integers n f 0 and 0 5 k 5 N - 1.
Proof: By Theorcrn 9.11 (b), $(x) has N vanishing moments if and only if mC1(1/2) = O for O 5 k N - 1. Suppose first that n is an odd integer so tfhatn = 2m+ 1 for some m E Z. since G(r)= mo (712) Gi(r/2),
<
If n # 0 is even, then n can be written n = 2 P m for some p E N and odd integer m. Since
by the previous paragraph.
Lemma 9.8. Let p ( z ) be a compactly supported scaling function nssocin.ted with a n MRA, and let + ( x ) be the wavelet de$ned b y (7.24). If $(z) has N vanishing moments, then
9.1. Vanishing Moments
259
Proof: Fix k . Since p(z) has compact support,, ( 2 ~ i . zcp(x) ) ~ is L' on R, and by Corollary 3.34,
The same argument as in the proof of Corollary 7.54 gives
by Lemma 9.7 and (9.12) rolluws.
Lemma 9.9. Let p(x) be a com,pactly supported scaling function associated with a n MRA, and let $(z) be the wavelet defined by (7.24). If $(z) has N vani,sh,i,n.g m,om,ents, t h e n for each 1 of degree k - 1 such that
k
-
1, thcrc i s a polynomial
Proof: The proof will be by induction on k. For each k, let
Recall that by Corollary 7.54, En y ( x
+ n) = 1. If k = 1, then
Therefore,
x n , p ( x + n , )= -x+al, and (9.13) holds for k = 1. For the induction step, not,^ that by the binomial theorem,
pk-1 (z)
Chapter 9. Smooth, Compactly Supported Wavelets
260
+ Letting P ~ - I ( x )be the degree k
n%(x -
+ n).
1 polynomial defined by
we have that
Since by Exercise 9.12,
0 -2
if k is odd, if k is even,
(9.13) follows.
A straightforward induction argument, left as an exercise (Exercise 9.13), gives the following theorem. Theorem 9.10. Let y(x) be a compactly supported scaling function associated w i t h a n MRA, and let $(x) be the wavelet defined by (7.24). If $(x) has N vanishing m o m e n t s , t h e n for each znteger 0 5 k < N - 1 , there are coeficients {qk,n )
9
4
i
L
~
such that ~
Equivalent Conditions for Vanishing Moments
The goal of this section is to prove a theorem giving ecluivdent conditions for vanishing nioments of a wavelet $ ( x ) . In the theorem, va.nishing moments of $(x) are characterized in terms of conditions on the auxiliary
9.1. Vanishing Moments
261
function m o ( y )and on the scaling filter h ( n ) . Thus the idea will be to find a QMF h ( n ) satisfying the additional conditions guaranteeing vanishing moments of $ (2).
Theorem 9.11.
Let p(x) be a compactly supported scaling function assoczated with a n MRA with finite scaling filter h(n). Let $(x) be the corresponding wavelet given b y (7.24). Then for each N E N , the following are equivaler~t.
( b ) m c ' ( l / 2 ) = 0,for 0 5 k
< N - 1.
(c) mo(y) can be factored as
for some ~ e r i o d1 trigonometric polynomial C(y) h ( n ) (-1)" n"
(d)
0 for 0 5 k
< N - 1.
Proof: (a) (b). Note that (a) is equivalent to the statement that G("(0) = 0 for 0 < k 5 N 1. By construction,
*
-
Therefore, since @ ( O ) = 1,
if and only if
m o ( 1 / 2 )= 0;
$(o)
=
n r o ( l / 2 )(-@'(O)
+ 27ri $(o)) + -21 rnb(1/2) Q(0) = 0
if and only if
mb(1/2) = 0, and an easy induction argument leads to
if and only if
rnc'(1/2) f o r O < k i N-1.
=0
Chapter 9. Smooth, Compactly Supported Wavelets
262
*
(b) (c). Let A(z) = (I/&) C , h(n) C n be the z-transform of h. Then A ( e 2 " i ~ )= mo(y), and we may assume that A(x) is a polynomial in x-' with a nonzero constant term (see Remark 4.18(a)). If not, all we need to do is look a t x ( z ) = z p L A ( t ) for some appropriate integer L. Everything that follows could as easily be done for A j r ) as for A(z). Note that since eTi = -1, A(-1) = mo(1/2). Therefore, if (b) holds, then A(')(-1) = 0 for 0 5 k 5 N - 1.
By Taylor's formula,
where RN(z) is some polynolnial in x-l. Letting x = eZTi7,then
where
which is a period 1 trigonometric polynomial since R N ( z ) is a polynomial in z-'. This is (c). If (c) holds, then A(z) can be written
for some polynomial P ( x ) in x-'. Hence A(z) has a zero of order N a t z = -1. If xo is on the unit circle, then zo = e2"i~ofor some real number yo. By Exercise 9.15, zo is a zero of A(z) of multiplicity p if and only if yo is a zero of mo(y) of multiplicity p. Therefore, since x = -1 is a zero of multiplicity N for A(x), 1/2 is a zero of m~iltiplicityN for mo(y). This is (b). (b)
* (d) Since
and since h(n) is a finite filter, it follows that
9.1. Vanishing Moments
263
Therefore,
Therefore,
x
h ( n )n' (-1)"
=0
if and only if
(k)
rno (1/2) = 0
n
Exercises Exercise 9.12. Prove that for any integer k k-1
e= 1
0 -2
> 1,
if k is odd, if k is even.
(Hint: By the binomial theorem, for any numbers a and bi
so that
Exercise 9.13.
Prove Theorem 9.10.
Exercise 9.14. Assume the hypotheses of Theorem 9.10. Prove that for such any polynomial p ( x ) of degree N - 1, there exist coefficients {tfn),,z that ifn cp(z n ) = p ( x ) .
En
+
Exercise 9.15. Let c ( n ) be a finite signal with polynomial z-transform C(z) = c ( n ) z-n and Fourier series E ( y ) . Prove that y F [O, 1 ) is a zero of multiplicity m for Z ( y ) if and only if x = e 2 " i ~is a zero of multiplicity m for C ( x ) .
En
264
Chapter 9. Smooth, Compactly Supported Wavelets
9.2 The Daubechies Wavelets Theorem 9.3 says that if { $ ~ ~ . ~ (isz an ) ) orthonormal wavelet basis, and the wavelet $ ( z ) is smooth, then it rnust have vanishing moments. The smoothcr thc wavelet, the more moments that must vanish. Therefore if we are interested in the construction of smooth orthonormal wavelets, we must look for wavelets $ ( z )with many vanishing moments. This is the approach taken by I. Daubechies who constructed a family of smooth, compact,ly supported wavelet bases. The Daubechies wavelets have the largest number of vanishing moments for their support. Specifically, for each N E N, the Daubechies wavelet of order N has N vanishing moments and is supported on the interval [O,2N - 11. The Daubechies wavelets also become smoother with increasing N .
9.2. I
The Daubechies Polynomials
In light of Theorem 9.11 and Theorem 8.39, t o construct a, wa,velet with N vanishing moments and that is supported in the interval [O,2N - 11, we must find a finite scaling filter h ( k ) of length 2N or equivalently, a t,rigomometric polynomial of the form
satisfying Theorem 9.11(c) and the QMF conditions. For simplicity and because we are primarily interested in real-valued wavelets, we assume that our filters h ( k ) are real valued. If Theorem 9.11(c) holds, then for some trigonometric polynomial L ( y ) ,
where L ( y ) = IC(y)I? Hence
Since all of the h ( n ) are real, L ( y ) is a real-valued trigonometric polynomial with real coefficients. It follows that if
thcn c ( n ) = c ( - n ) so that
9.2. The Daubechies Wavelets
265
Since c o s ( 2 ~ n y can ) be written as a degree n polynomial in cos(21~y) with real coefficients (Exercise 9.23), L(y) is a polynomial in c o s ( 2 ~ y ) Since . c o s ( 2 ~ y )= 1 - 2 sin2( ~ y ) L(y) , can also be viewed as a polynomial in sin2(IT?). In order for the QMF condition lmo(y) Imo(y 1/2) = 1 to hold, we must have for some polynomial P,
l2 +
+
'1
Letting y = sin2(ny), this means that we must find a polynomial P satisfying the equation
<
Also, since L(y) = p(sin2((ir7)),and since 0 sin2(j7y) must also satisfy P(g)LO, for yE[0,1].
< 1 for all y, P ( y ) (9.16)
Finding a polynomial satisfying (9.15) and (9.16) can be done in several ways (Exercise 9.24).3 Fix N E N. Then,
Making the change of index m = 2N - 1- k in the second sum and observing
we continue
3 ~ e r ewe use a n idea of Strichartz in the excellent article How to make wavelets, American Mathematical Monthly, vol. 100 (1993) 539-556.
266
Chapter 9. Smooth, Compactly Supported Wavelets
where PN-l ( 9 ) is the degree N
-
1 polynomial defined by
For example,
Example 9.16. Let us calculate the Daubechies scaling filter for a few values of N. Recall that
where
Obtaining mo(7) requires factoring P
This factorization is trivial, yielding
~ (sin2(Ty)) - ~ to find L(y).
9.2. The Daubechies Wavelets
267
Therefore,
h ( n )=
{
.JZ
if n = 0, 1,
otherwise,
and we have recovered the Haar system.
JNsince ~ ~ ( y 1)+ 2y, =
We seek a trigonometric polynomial C ( y ) of thc form
C ( ~ ) = a + b e - ~ ~ a~ ,~ b, ~ R . We obtain
~ ( yl 2 = ) (a
+ b eC2"")
(a
+ b e2"'?) = ( a 2 + b2) + 2nb cos(27iy).
By matching coefficients,
so that
( a + b,
2
2
= a 4-
b
2
+ 2ab = 1
Since mo(0)= C ( 0 ) = 1, a
and
( a - h)2 =
= =
1,
a.
Solving gives
Thercfore,
and
-
2nb = 3,
+ b = 1, and we have the 2 x 2 linear system, a+b a-b
a=
+ b2
I+& 2
and
b=
1 - v'5 2
268
Chapter 9. Smooth, Compactly Supported Wavelets Since P2(y) = 1
I q y ) l2
+ 3y + 6y2,
=2(sin2(.ir:i)) = 1 + 3 s i n 2 ( . i r Y ) + 6~ i n ~ ( ~ y ) =
We seek a trigonometric polynomial
L(?)of
the form
We obtain after some manipulation
By matching coefficients, we obtain the nonlinear system,
We can solve the system by noting that
but since, as before, a
+ b + c = 1,
(a- b + ~ = ) (~ a + b + c - 2b)2 = (1 - 2bj2. Thus,
Choosing the root b = (1/2)(1 -
m)and substituting back' we obtain
Solving the resulting quadratic equation gives the solutions
9.2. The Daubechies Wavelets
269
Substituting back into
9.2.2
Spectral Factorization
For N 2 4, solving for the coefficients in the Daubechies scaling filters becomes very complicated; so we are interested in a more general technique to find the coefficients. The technique of spectral factorization is fairly standard in the engineering literature. Recall that in designing the Daubechies scaling filters, we encountered the equation
where P N - ~ ( is Ythe ) Daubechies polynomial of degree N the degree 2 N - 1 polynomial p 2 ~ - l ( y )by
so that
+ p 2 N - 1 (1
P ~ N - 1 (siii2(ny))
-
-
1. We define
sil12(.;r7)) = 1.
Since sin2(rr) = (1- cos(2;77))/2, and since c o s ( 2 ~ y= ) (ezTi7+ e - 2 T z)/2) ~ ~ 2 ~ - l ( s i n ~ ( ; .can r ~ )be ) written as a trigonometric polynomial. We define
270
Chapter 9. Smooth, Compactly Supported Wavelets
that Pzni-l(e2"") = has the form SO
(sin2(T?)).By Theorem 9.18(a), PZN1(2) 2N-1
P2N-l(~)= amzm. m=-2N+1 We will often want to refer to the polynomial in z having the same coefficients as P 2 N P 1 ( 2 ) ; SO we define
-
where a,
= a,-z~.
Example 9.17. N.
Let us compute the polynomials named above for various
9.2. T h e Daubechies Wavelets
271
The functions P 2 N - 1 ( ~satisfy ) some very special properties, which we summarize below.
Theorem 9.18.
For each N E N , P 2 N - 1 ( ~ satisfies: ) 2N-1
( a ) Pzn-- I ( z ) =
a,, z
m
for some real-valued coefficients a,, .
m=-2N+l
+
( b ) P Z N - ~ ( z ) P Z N - I ( - Z ) = 1 for all z E C , z
( d ) P Z N - ~ ( z=) P ~ N - I ( Z - ' ) for all z E C , z
(e) a,
= a-,
for -2N
+ 1 5 m 5 2N
( f ) am = O i f m i s even and m
-
# 0, and
# O.
# 0.
1.
ao = 1/2.
+ +
nNP1
Proof: (a) Since ( z ) = P2N-1(1/2 ( Z z-')/4), where (y) is a degree 2N 1 polynonlial with real coefficients, P2N-1 (x) is a degree 2N - 1 polynomial in ( x x - l ) and can be written -
+
Chapter 9. Smooth, Compactly Supported Wavelets
272
where we have made the change of index k = 2! order of the summation. Since both c, and rk,,
m and interchanged the are real, (a) follows.
-
(b) Since z2"-l P 2 N - I (is~a) polynomial in z of degree 4N - 2, it will be enough to prove that (b) holds at more than 4N - 2 points. In fact, we will show that (b) holds for all lzl = 1. Let lzol = 1. Then zo = e2"i~'jfor some yo E R. Thus (b) is equivalent to
Since - e 2 r i y o
-
e?r,i e 2 ~ i ~ ~e2STz(yo+1/2). ) But
by definition,
so that (b) is equivalent t o
which is just (9.15) with y = sin2(;7y). (c) Follows from the definition of P 2 N - 1 ( ~ ) (d) Follows from the fact that (z z-I). (e) Follows from (a) and (d).
+
(f)
P2N-l(d) can
be written as a polynomial in
BY (b) and (a),
By matching coefficients, a, = 0 if m is even and nonzero, and
-
a0 =
112.
n U
In order to use spectral factorization on the polynomial P 4 N - 2 ( we ~), will require a full understanding of the location of its zeros. This is the content of the following Lemma.
Lemma 9.19.
Let N E N be given. Then:
9.2. The Daubechies Wavelets
273
-
( a ) P ~ N - ~ ( has z ) a zero of order 2N at z = -I; that is,
for some degree 2N - 2 polynomial & 2 N - 2 ( ~ ) .
(c)
-
If zo E C, zo # 0 and if zo zs a zero of mu~tzplzcity - m for P 4 N - 2 ( ~ )then , -1 z , , zn, and E-' are zeros of multzplicity m for P 4 ~ - 2 ( 2 ) .
R e m a r k 9.20. Lemma 9.19 shows that the zeros of three categories.
P 4 N P 2 (2)
fall into
(1) The zero at -1.
(2) The real zeros not equal to -1. By Theorem 9.18(d), these zeros come in pairs, namely (zo,z i l ) , and since z0 # *l, one of the pair must have absolute value less than 1 and the other absolute value greater than 1. For the purposes of the proof of Lemma 9.19, let us define ZR by
-
(3) The nonreal zeros. By Lemma 9.19: and since P 4 N - 2 ( ~ has ) real co-1 - --I efficients, these zeros come in clusters of four, namely (zg, z0 , zg, zg ), and only one of these zeros can lie within the unit circle and in the upper half-plane. For the purposes of the proof of Lemma 9.19, let us define Zc by (9.22) Zc = {zo E C :P4N-2(zO) = 0, lzOl < 1, ~ ( Z O>)0).
P r o o f of L e m m a 9.19: (a) By the definition of p 2 ~ -(x), 1 P2N-l(")
Thus
1 z+x-I
=
-
%N-1(:
(1
5(
x+x-I 4
))
))
N
(1
1
+f ' ) )
=
(1
=
z 4l z ) 2N P4 N - l ( ~ ( z4 + z l ) )
-
-z p N
-
-
2
1
1 4N
(r
PN-1
2
1
4
- -
(2
1
(-2
+ 1l2" pN-l
1
1 (z 4
- -
+XI)).
Chapter 9. Smooth, Compactly Supported Wavelets
274
Since PNPlis a polyrlomial of degree N is a polyiioniial of degree 2N
-
-
1, zNpl PNPl
2.
(b) By definition, P4N-2(1) = P 2 N - l (1) and by Theorem 9.18(a) and (b),
(c) Since ~ ~ Theorem 9.18(d),
~ has ~real ~coefficients, ( 2
-
PQW(Z) ) =
If zo is a zero of multiplicity m, then for each 0 5 k 5
P4~-2(-)
rrl
-
and by
1,
Hence s t 1 is also a zero of multiplicity m. A sirrii1a.r argument shows the same for % and %- 1 . Let N E N . T h e n there exists a polynornial BZN-1( z ) of 1 with real coeficients such that
Theorem 9.21. degree 2N
-
+
Moreover, B Z N - ( z ) = ( z 1 ) C N - I( 2 ) for some degree N ( z ) with real coe.ficients.
-
1 polynomial
CN-
-
Proof: Since both sides of (9.23) are polynomials, it will be enough t o show that (9.23) holds for all x with 1x1 = 1. Now, by Lemma 9.19, P4NP2(z)can be written as a product of (z+l)'" with factors of the form (2-20) (z-z,~), -, xu E Zc. where 2, E ZR and (2-2") ( z - z l l 1) ( Z - ~ ) ( X - X ; ~ )where If = 1, then z = zP1 and
IxI
9.2. The Daubechies Wavelets
275
so that 1 (z - zo) (x - xo I )-) = 1 - --I zl I Z - zoI2 = Izol-I Iz - zoi2. X" Courltirlg multiplicities, P 4 N - 2 (has ~ )a total of4N -2 zeros with exactly 2N zeros at -1. The remaining zeros occur either in pasirs of the forrn (z,, 2,') if zo E ZR or in quadruplets of the form ( z o ,-z - ' , X;)' with xu E Zc. Hence, remembering that lzl = 1 and that P 4 N - 2 (2 ~ )0 for (21 = 1, --
z,
-
-
P4~-2(z) =
IP~N-~(x)I
Hence the result follows with
It remains t o verify that B 2 N - 1 ( ~ has ) real coefficients and that it has the factorization described in the theorern (Exercise 9.25).
Example 9.22.
-
(a) With N
=
2, we have seen that
Since P6(2) has four zeros at -1, we factor
276
Chapter 9. Smooth, Compactly Supported Wavelets
Therefore,
-
l+fiz" 3 + 8
8
d z2 + 3 - f i
z+-.
I-&
8
This leads to the same scaling filter as in Example 9.16.
(b) With N = 3,
Sirlee ~ ~ ~has( six z zeros ) at
- 1,
we factor
where
Therefore.
which leads to the same scaling filter - as in Example 9.16. Figure 9.1 shows the zeros of P4N-2, for N = 1-4. The corresponding scaling and wavelet functions, computed using the cascade algorithm, are shown in Figure 9.2.
Exercises Exercise 9.23. Prove that for any n E N, cos(2srny) can be written as a degree n polynomial in cos(2ry) with real coefficients. Exercise 9.24.
(a) Use Taylor's Formula t o show that
FIGURE 9.1. Zeros of the polynomials ~ i N - 2 , for N = 1 (upper left), N = 2 (upper right), N = 3 (lower left), and N = 4 (lower right).
for Iyl
< 1. Verify that
(b) Prove that for some corlstalit (1
-
CN
2 N 2 =1 sin (ny)) PN-l (sill (nTTY))
Exercise 9.25.
-
CN
s i 1 1 ~ (~2 -7~~dd. )
Complete the proof of Thc~orenl0.21.
9.3 Image Analysis with Smooth Wavelets In this section, we will apply the DWT to images in the form of square matrices. The procedure will be to apply the periodic one-dimensional transform row-wise and colilmn-wise exactly as for thc DHT for- ~na,t,riccs.
278
Chapter 9. Smooth, Compactly Supported Wavelets
9.3.1 Approzimation and Blurring One very beneficial effect of using smooth wavelets in image analysis is that the blocking effect present with the Haar wavelet (see Section 6.3.1) is significantly reduced. In Figures 9.3-9.?5below, we retain only the approximation matrices when reconstructing an image lisirlg the four-coefficient (db2), 8-coefficient (db4), and 12-coefficient (db6) Daubechies wavelets. The resulting blurred images are much less blocky and more pleasing to t8heeye than are the corrcspoilding images using Haar.
9.3.2
"Nui71e" Frr~uye Corr~pressio7~ w.itlz S7r1,ootl~W ( ~ vlets e
Here we repeat the calclllatioris of Sectiorl 6.3.3 rising. slnoother wavelets instead of the Haar wavelet. The basic principle is the same: If the irnage consists of large areas of corlst,arlt intensity separated by edges, the c-letlail matrices will contain rriany elenierlts that are nearly zero. By setting the sirlallest coefficierlts to zero, we can achieve significant compression of the images while retairiing rrlost of the important features. This procedure is carried out in Figures 9.6-9.8 using tllc four-, eight-, and twelve-coefficient Daubechics filters. The same nuurlt)er of coefficients are retained in each corrlprcssetl image, but the images are rnl~cllbetter looking. Tlle rnain irriproverrlerlt over thc cornpressed images iri Figure 6.10 is t: reduction of 1)locking cffects.
9.3. Iniage Analysis with Smooth Wavelets
279
FIGURE 9.2. The Daubechies scaling and wavelet functions with two vanishing moments (top), four vanishing moments (middle) and six vanishing moments (bottom).
280
Chapter 9. Smooth, Compactly Supported Wavelets
FIGURE 9.3. Original i111ag.e (Cop left). Reconstruction using Daubechies four-coefficent filter and only the cl coefficients (top right), c2 coefficients (bottom left), and c3 coefficients (bottom right).
9.3. Image Analysis with Smooth Wavelets
281
FIGURE 9.4. Original image (top left). Reconstruction using Daubechies eight-coefficent filter and only the cl coefficients (top right), c:! coefficients (bottom left), and c3 coefficients (bottom right).
282
Chapter 9. Smooth, Compactly Supported Wavelets
FIGURE 9.5. Original image (top left). Reconstruction using Daubechies twelve-coefficent filter and only the cl coefficients (top right), cz coefficients (bottom left), and cs coefficients (bottom right).
9.3. Image Analysis with Smooth Wavelets
283
FIGURE 9.6. Original image (top left). Compressed image using Daubechies $-coefficient filter with smallest 80% (top right), 90% (bottom left), and 97% (bottom right) of DWT coefficients set to zero.
284
Chapter 9. Smooth, Conlpactly Supported Wavelets
FIGURE 9.7. Original image (top left). Compressed image using Daubechies B-coefficient filter with smallest 80% (top right), 90% (bottom left), and 97% (bottom right) of DWT coefficients set to zero.
9.3. Image Analysis with Smooth Wavelets
285
FIGURE 9.8. Original image (top left). Compressed image using Daubechies 12-coefficient filter with smallest 80% (top right), 90% (bottom left), and 97% (bottom right) of DWT coefficicnts sct to zero.
Part IV
Other Wavelet Constructions
Chapter 10
Biort hogonal Wavelets In Chapter 2, we considered the notion of orthonormal bases that have infinitely many elements and that can be used t o represent arbitrary L2 functions. In this section, we will consider nonorthogonal systems with many of the same properties. Such systems are referred t o as Riesx bases.
10.1 Linear Independence and Biort llogo~lality The notion of the linear independence of vectors is an important concept in the theory of finite-dimensional vector spaces like Rr',and is closely related to the notion of a basis. Specifically, a collection of vectors { v l , v2, . . . , v,) in Rn is linearly independent if any collection of scalars {a1,a 2 , . . . . a,) such that a1v1 +a2v2 . . . +Q,V, =0
+
must satisfy a1 = 02 = . . . - a, = 0. If in addition m = n: that is, if the number of vectors in the set matches the dimension of the space, then {vl, v2, . . . , v,,) is called a basis for Rn. This means that any vector x E Rn has a unique representation as
where the pi are real scalars. How are the scalars p, computed? It can bc shown that there exists a unique collection of vectors {Vl, V2, . . . , ) ,V called the dual basis that is biorthogonal t o the collection {vl, va, . . . , v,). This m m n s that
In this case, the pi are given by pi = (Vi, x). In generalizing the notion of a basis to the infinite-dimensional setting, we retain the notion of linear independence.
Definition 10.1.
A collectzon of functsons { g n ( ~ ) ) n F N L~, on a n interval I , is linearly independent i f given any t2 sequence of coeficients { a ( n ) ) such that
in L~ o n I , then a ( n ) = 0 for all n
E
N.
290
Chapter 10. Biorthogonal wavelet,^
It is often difficult to verify directly whether a given collection of functions is linearly independent. The next lemma gives a sufficient condition for linear independence that we will use throughout the remainder of the chapter. It relies or1 the noti011 of biorthogonality.
Definition 10.2.
A collection functions { & ( x ) ) ~L2~ o~n a n interval 1 is biorthogonal t o a co2lection { g ? z ( ~ ) ) n L2 E ~ o, n I , if (~n,Sia)
/ " 9 , L ( x ) ~ m ( x )= d r6 ( n - m ) .
Lemma 10.3. Let { g , ( z ) ) be a collection of functions L~ o n a n interval I and suppose that there i s a collection { & ( x ) } , L2 o n I , biorthogonal t o {g,(x)}. T h e n { g , ( x ) ) is linearly independent.
Proof: Let { ~ ( n ) )be, an ~ l~2sequence, and satisfy
in L~ on I. Then for each m E N.
0 = (0,L)=
( Ca(.)
g,,
g,,.) = C a ( n ) (g,,
-
9,)
= a(m)
n=l
n= 1
by b i o r t h o g o ~ ~ a l iTherefore t~. {g, (x)) is linearly independent.
C]
10.2 Riesz Bases and the Frame Condition Definition 10.4.
A collection of functions {.g,(x)), L' o n a n interval I , is a
Riesz basis o n I ij: ( a ) { g , ( x ) ) is linearly independent and ( b ) there are constants A, B
>0
such that fo7- all functions f (z),C: on I ,
Remark 10.5. (a) An orthonormal basis is also a Riesz basis. Linear independence of an orthonormal basis follows from Lemrna 10.3 and the fact that an orthonormal basis is biorthogonal to itself. Definition 10.4(b) is satisfied with A = B = 1 by Theorem 2.57(d).
10.2. Riesz Bases and the Frame Condition
291
(b) Riesz bases are valuable since they have many of the useful properlies of orthonormal bases, such as the unique representation of arbitrary functions, but do not require orthogonality. Hence greater flexibility can be exercised in the construction of such bases. (c) Condition (b) in Definition 10.4 is referred to as the frame condition. It is a weakening of the Plancherel formula (Theorem 2.57(d)) for orthonormal bases.
(d) Any set {g,, (x)) , L2 on I , not necessarily linearly independent, which satisfies the frame condition is called a frame on 1.' If {g,(x)) is a frame, then any function f (x), L2 on I, has a representation
in L2 on I for some choice of t2coefficients {c(n)). If {g,(x)) is not linearly independent, then this representation is not unique. To see why, note that linear dependence means that there is an e2 sequence of coefficients {a(n)) such that
in L2 and where not all of the a ( n ) are zero. Therefore; we may write
Since both (c(n)) and (ajn)) are t2,so is {c(n)
+a(n))
(e) The frame condition can be interpreted as guaranteeing the stability of the reconstruction of an arbitrary function f ( x ) . To illustrate this, let {e7i(x))nENbe some orthonormal basis on an interval I and define g,(x) = ( l l n ) e,,(x). Then {g,(x) ) is linearly independent but does not satisfy the frarne condition since A = 0 (Exercise 10.6.) Since { e , ( ~ ) ) , ~ is ~an orthonormal basis, any function f (x) can be written as
'
For more information on frames, see Daubechies, ' l k n Lectures on Wavelets, Chapter 3, and Heil and Walnut, Continuous and discrete wavelet transforms, SIAhI Review, vol. 31 (1989) 628-666.
Chapter 10. Biorthogonal Wavelets
292
Suppose that for some large m E N, an error is made in the calculation of the coefficient c ( m ) = ( f ,g,) so that E(m)is calculated instead. This error could be the result of noise or simply roundoff error. Using this erroneous coefficient in thc reconstruction, we arrive at
S(x) =
C c(n) n en (x)+ Z(m)rn
em (z)
and
since {e,(z)) is an orthonormal basis. Thus the small error Ic(m) - E(rn)I2 is magnified by the large factor m2 leading to a large error in the reconstruction. This sensitivity to small errors is what is meant by instability.
( f ) Definition 10.4 illust,ra,testhe contrast between the finite- and the infinitedimensional settings. In the finite case, the linear independence of n vectors is enough to guarantee that the vectors form a basis for Rn. Stability of reconstruction is automatic in this case. In the infinite case, stability is not automatic and must be included as part of the definition. (g) It is a fact (whose proof is beyond the scope of this book) that if {g,(x)) is a Riesz basis on I, then for each function f (x), L2 on I, there is a unique e2 sequence of coefficients { c ( n ) )such that: e ( n )g,,,(z) in L2 on I and
i f (z) = 1
ii
Bllfll$
5
lc(n)125 71
1
f 1.:
where A and B are the same con-
=1
stants as in Definition 10.4(b).
Exercises Exercise 10.6. Verify the statements made in Remark 10.5(5).
10.3. Riesz Bases of Translates
293
10.3 Riesz Bases of Translates Definition 10.7. Let {g,(x)} be a collection of functions L2 o n a n interval I . T h e n {g,(x)} is a Riesz basis for span{g,(x)) if: ( a ) {g,(x)) i s linearly independent and ( b ) there are constants A, B
> 0 such that for every f ( x ) E W { g , ( z ) ) ,
In this section, we will consider collections of functiorls of the form { T n y ( ~ ) ) n E zwhere , y(x) is a fixed function L2 on R and give necessary and sufficient conditions under which {Tncp(x)) is a Riesz basis for -{Tn y (x)). We will require some preliminary lemmas.
Le11l11ia 10.8. Suppose
LILUL fu,r~ckior~s p(z) W L (P(z), ~ L' Then { T , p ( x ) ) i s biorthogonal t o { ~ l ; , ~ ( izf )and ) only zf
"72
R, ure yzuen.
Proof: The proof of this result is very similar to the proof of Lemma 7.4 and is left as an exercise (Exercise 10.14). L e m m a 10.9. Suppose that the function p ( x ) , L2 o n R. satisfies the following condition: There exist constants c l , cz
> 0 such that
for all y E R . T h e n { T n P ( z ) }is linearly independent.
Proof: Wc will find a function q(x) such that {T,$(x)) is biorthogonal to {Tnp(z)). The resull will lollow by Lerrlma 10.3. Define g(x) by
Ry (10.1), the denominator is never zero so that this division is defined for all y.
294
Chapter 10. Biorthogonal Wavelets
Note that
By Lernrna 10.8, {Tn@(x))is biorthogonal to {Tnv(x)) and the result follows by Lemma 10.3. Lemma 10.10. Suppose that p ( ~ )satisfies (10.1). T h e n for any
1
~ I 5I c2: 1 ~ z ( r ) ~ ~ d x
(10.3)
Enc ( n )Tnp(z)
and ?(T) is
1
I
5
where { ~ ( n )is} a finite sequence such that f (z) i t s Fourier transform.
=
Proof: Since f (z) = C , c ( n )Tncp(x) by Plancherel's Formula,
If (10.1) holds, then
which is (10.3). Lemma 10.11. A compactly supported function p ( x ) , L' o n R, s a t i ~ f i e s (10.1) if and only zf there exist constants A, B > 0 such that for all f (x) E
10.3. Riesz Bases of Translates
295
Proof: Note first that by Plancherel's formula,
(===+)Suppose that (10.1) holds, and let f (x)E span(', is a finite sequence {c(n)) such that
~(x)).Then there
Therefore, for each m E Z,
Therefore (f,T,p) is the -mth Fourier coefficient of the period 1 function ClcI @ ( 7 k) E(7) and by Planchcrcl's formula for Fourier series,
+ l2
so that
Chapter 10. Biorthogonal Wavelets
296
But by (10.3),
so that c Z 1f
ll;
5
C l(f,Tnv)125
(.:~;l
llf 1:.
and (10.5) follows. ( k The )
proof of the converse is somewhat inore complicated and is not given here. The outline of a proof is given in Exercise 10.15.
Remark 10.12. (a) If y ( x ) is compactly supported and satisfies (10.1), then by Exercise 7.1 1 , E nI?(y+n,) 1' is a trigonometric polynomial bounded away from zero. Therefore, ( E n[@(y+ n ) I 2 ) ' can be written as an L2 Fourier series. If
then the function F(x) given by (10.2) satisfies
Taking the inverse Fourier transform of both sides, we obtain
in L2 on R. Therefore, F(x) E s p a n { ~ , , y ( ~ ) } . (b) In fact, @(x) given by (10.2) is the unique function in span{Tnp(z)} such that {T, Fix)) is biorthogonal to {T, cp(x)).However, there exist other functions @(x) n o t in span{Tncp(x))such that {T,,F(x)) is biorthogonal to {Tncp(x)).This fact will be exploited in the construction of Riesz bases of wavelets that have compact support. Theorem 10.13. Let p ( z ) be L~ on R and compactly supported and let { T , p ( x ) } be a Riesz basis for s p a n { T n p ( x ) ) . If there exists a fwr~ct.Lo,r~ G(5) such that {T,$(x)} is biorthogonal t o {T,p(x)}, then:
( a ) for every f (x)E Sf%Zi{T,p(x))
where the s u m converges i n L~ on R and
10.3. Riesz Bases of Tkanslates ( b ) there exist constants A, B
> 0 such that for
297
all f (x) E ~ { T n P ( x ) ) ,
Proof: Wc will first prove (a) and (b) lor f (x) E span(Tncp(x)) and then generalize to f (x) E -{T, cp(x)). To see (a), let f (x) E span{Tnp(x)). Then there is a finite sequence {c(n)) such that (10.9) f (+) = c(n) TnP
x n
By the biorthogonality of {TnF(x)),
and (a) follows. To see (b), recall that by (10.3), there are constants el, c;?> 0 such that for all f (x) E span(T,cp(x)),
where E(7) is the Fourier transform of the sequence { c ( n ) ) of (10.9). By the Plancherel formula for Fourier series and the fact that c(n) = ( f , T,g):
and (b) follows. Generalizing the previous results to span{Tncp(x)), we will prove (b) first. By Exercise 2.62, given f (x) E @Zii{T,,cp(x)},there is a sequence { f m ( x ) } m Gsuch ~ that fm(x) E span{Tn9(x)) for each rn and
Also, by the Cauchy-Schwarz inequality,
for every n E Z. For every N E N, since (10.8) holds for each f,,(x), we have that
Chapter 10. Biorthogonal Wavelets
298
Since the right side of t,he ineqixality has nothing to do with N, we may let N + cc and conclude that
Therefore, we have established the upper bound in (10.8). To see that the lower bound in (10.8) holds, note that by the CauchySchwarz inequality for t2 sequences, we have that for each m E Z ,
Since (10.8) holds for each fm( x ) ,and since the upper bound of (10.8) holds for each f m ( x ) - P ( 4 ,
Letting m
-+co,we have that since I f r n ;
i
(1f 1 ;
which is the lower bound in (10.8). To prove (10.7) for f (x)E span{Tncp(x)l-7 let
E
> 0 and consider the partial sum
and since I fm- f
1 ; + 0,
10.3. Riesz Bases of Translates
299
for some N, A1 E N. Let g ( x ) t s p a n { T , y ( x ) } be such that 11 f - 9 / 1 2 < 6. Since (10.7) holds for g ( x ) . we know that for all N , A f t N large enough.
'l'herefore,
By the Cauchy-Schwarz inequality,
by (10.3), there is a constant
c2
such that
and by (10.8)
Therefore, for all n, ~$1t N large enough,
Since
t
> 0 was arbitrary,
(10.7) follows.
Exercises Exercise 10.14.
Prove Lemma 10.8.
300
Chapter 10. Biorthogonal Wavelets
Exercise 10.15. The purpose of this exercise is to prove the "only if" part of Lemma 10.11. (1) Since cp(x) has compact support, Exercise 7.11 implies that the function + k)I2 is a period 1 trigonometric polynomial and therefore bounded on [0, 1).Therefore it remains only to prove the existence of the lower bound of (10.5).
(2) Equations (10.4) and (10.6) hold regardless of whether (10.1) holds. (3) Let { F , ( x ) ) , be ~ ~the Fejkr kernel defined by Definition 2.29, and fix [ O , l ) . There is a trigonometric polynomial ZN(?)such that lZM(y)l2 = F N ( y - y o ) (Hint: Use an argument similar to that of Theorem 9.21 on spectral factorization of the Daubechies polynomials.)
70 E
(4) Let ZN (7) = C I cN L (n) e-2Tiny, and let f N ( x ) = Use (10.6) to show that
C,,cN (n,)y ( x
-
n).
(5) Use (10.4) to show that
(6) Show that if the lower bound in (10.1) fails t o exist, then for every > 0, we can find a function f (x) (which will be f N ( x ) for some N and snme yo) si~chthat E
10.4 Generalized Multiresolution Analysis (GMRA) In order to construct Riesz bases of wavelets, we require a generalized notion of Mulliresoluliorl Analysis. The defirlition below is exactly the same as Definition 7.12 except that (e) no longer requires orthonormality for the collection {T,,y (x)) of shifts of the scaling function. Definition 10.16.
A generalized multiresolution analysis (GMRA) o n R i s a sequence of subspaces {V,),tz of functions L~ o n R satisfying the following properties. (a) For. all j E Z, T/j C T/5+1
Chapter 10. Biorthogonal Wavelets
302
Lemma 10.18. Suppose that {V,},Ez is a GMRA with scaling function p(x). T h e n there ezists a n e2 sequence {h(n)},€z called the scaling sequence or scaling filter
l;h.n.t
S'LLC~,
p(x) =
h ( n )2'"
y (2x - n)
(10.10)
and a period 1 function mo(y) called the auxilliary function such that
Proof: By Lemma 10.17, {cpl,,(x)).,,z is a Riesz basis for Vl. Since cp(x) E Vo Vl, (10.7) says that there is an t2sequence {h(n)),Ez such that ~ ( 1 := )
C h(n) y ~ , , ( x ) C h ( n )2'1' =
~ ( 2 s n), -
which is (10.10). Taking the Fourier transform of both sides of (10.10) gives (10.11) with
10.4.2 Dual GMRA and Riesx Bases of Wavelets Dual GMRA Definition 10.19. A pui,,. of G M R A ' s
{&}lE~ with scaling function p(x) and with scaling function $(z) are dual t o each other if {TnP(z)}is biorthogonal t o {T,@(x)).
{e}3E~
Remark 10.20. (a) Since there may be more than one function F(x) such that (Tny(x)) is biorthogonal t o {TnF(x)), there may be more than one GMRA {&}jEZ dual to {ll,Ijtz. (b) Since {Tncp(x)) is a Riesz basis for Vo = span{T,cp(x)), it is always possible t o define @(x) by (10.2). In this case, the GMRA generated by @(x) will be dual t o the one generated by cp(x). However, if @(z)is defined by (10.2) then {T,F(x)) is also a Riesz basis for Vo.From this, it follows that = V, for all j t Z.
&
for dual GMRAs. Definition 10.21. Let y(x) and $(x) be scaling functions For each j E Z , define the approximation operators P,, P,, and the detail operafunctions f (z) b y tors Q j and I), on
10.4. Generalized Mult,iresolution Analysis
Lemma 10.22.
&, Q,
The operators P,,
303
and Q,satisfy the following prop-
erties. (a,) P, f ( z )= f ( x ) i f and only i f f ( x ) E f ( x ) E V,.
( b ) &, f ( x ) = 0 for all f(z)E
4, and
f ( z )= f (z) if and only if
-
11/7 and &x,f(z) = 0 fi)r a11 f ( x ) E I/li.
( c ) For all f ( x ) , C: on R ,
lirn((P,f-fl(2-0
and 0-'3
lim JJP3fJ12 =0.
h,,)
Proof: ( a ) Pj f (x)= f ( x ) i f and only if f ( x ) = C , ( f , pjln(x). Since { ~ j( ~ , ~7 . 'is) a,) Riesz ~ ~ ~basis for T/, and since { @ J , , L ( x ) ) l L EisZ biorthogonal to {vj,n( x ) ) n , z , Theorem 10.13 says that f ( x ) = C , ( f , @ j , n )pj.n (x) if and only if f ( x ) E ~ ( c p j 3 , ( x ) ) , G z= V j . A sirnilar argument works for Pf ( x ) . so that by (a), P j f ( x ) = P j + 1 f ( x )= (b) If f ( x ) t q,then f(x) E f ( x ) . Hence Q3f ( x ) = P,+l f ( x ) - Pj f ( x ) = f ( x ) - f ( x ) = 0. A similar argument works for G j f ( x ) . ( c )The proof of (c) is only a slight modification of the proof of Lemrna 7.16. The details are left as an exercise (Exercise 10.27).
0
The Wavelet $(z) and the Dual Wavelet '4;;(x)
Definition 10.23. Let p ( z ) and @ ( x ) be scaling functions for dual GMRA's, and let h ( n ) and x ( n ) be the scaling filters corresponding to p ( z ) and @ ( x ) (Lemma 10.18). De,fine the dual .filters g ( n ) and g(n) by
-
g(n)=(-l)nh(l-n)
and
-
g(n)=(-l)nh(l-n).
(10.12)
Define the wavelet $ ( z ) and the dual wavelet J ( x ) b y
$(.)
=
g ( n ) 21/2y ( 2 x - n )
and
-
$(x) =
?(n,) 21'2 G(2x - n)
304
Chapter 10. Biorthogonal Wavelets
The followi~lglemma contains some basic properties of the wavelet and its dual.
Lemma 10.24. Let $ ( x ) and
&(z) be the wavelet and dual wavelet corre-
sponding to the G M R A ' s {&) with scaling function p ( z ) and function @ ( x ) . Then the following hold. ( a ) (/, ( z ) t
(b)
{&,n
{c)
with scaling
K and $(z) E GI.
(x)} is biorthogonal to { Q o , , ( z ) ) .
( c ) { $ ~ , n ( x ) 2s ) a Riesz basis foi-sl,air{$o,,,(z)} and { , & ~ , ~ ( is x ) a} Ricsz basas for span{Go,, ( x ) } .
-
( d ) For all n, m E Z, ( $ o , n , ~ o , m= > ($)o,n,po,m)= 0 .
( e ) For any f ( z ) ,C: on R, &of ( x ) E sp8n{&,n (z))
-
and
Qo f
(z)
~ { & o ,( z, j ) .
Proof: (a) This follows from the definition of $(x) and $(x). (b) Taking the Fourier transform of both sides of (10.13) gives
where -2ni(y+1/2)
So(r+ 1/21
(10.15)
-27r2(-y+1/2)
mo(r + 112).
(10.16)
m1(r) = e and %1(y) = e
Since {po,,(z)) is biortl~ogonalt o {@o,,,(x)),Lemma 10.8 says that
Combining (10.15), (10.16), and (10.17) gives
10.4. Generalized hlultiresol~ltionAnalysis
305
Repeatirlg the argument giving (10.17) gives
Therefore, by Lemma 10.8. {$",,, (x)} is biorthogonal to
{qo,,(x)}.
(c) By Lemma 10.11, it is enough to show that for some constants e l . 0,
c2
>
,-.
and similarly for &(?). Sincc {po,n(x)}is n Riesz basis for span{~o,,,(x)), Lemma 10.11 implies that there are constants A, B > 0 such that
Therefore.
so that
A B < lmo(r12)12+ l h 0 ( r / 2 + 1/2)12 < A. B -
-
(10.21)
306
Chapter 10. Biorthogonal Wavelets
also. Finally,
and similarly,
A2
<
B -
X~@(Y
+n)12.
n
Therefore, by Lemnia 10.11, {$o,n (x)) is a Reisz basis for span{$o.,(x)) . A similar argument shows that {.II;O,n(~)} is a Reisz basis for ?p%i{To.,(x)). (d) Let n,m E Z be fixed. By Plancherel's formula,
10.4. Generalized iLIultiresolution Analysis
307
since by (10.15),
(qo,,
= 0 for all n; rn E Z. Similarly, , (e) Let f ( x ) be Cf on R. Mimicking the argument used t o obtain (7.40) and (7.41),we have that
where a(?) and b ( y ) are L2 Fourier series. In order to prove that Q0f ( x ) E will be sufficient to find an L2 Fourier series c ( y ) such that
span{ll.)o,,(x)},it
C f(7)
=
47)&Y)
=
~ (m 4 1 ( 7 / 2 )@(?I21-
As in the proof of Theorem 7.35, this leads t o the linear system
Letting =
(
rno (?/a) ml (?/a) ( 7 1 2 1/21 m~( 1 1 2 + 112)
+
l
it follows from (10.15) and (10.17) that det M ( y ) = e - " ' ~ so that M(y)-' is given by
Finally, we arrive at
Since rn,o(y) is bounded, c ( y ) is an L2 Fourier series. A similar argument shows that f (z)E span{&, ( x ) } .
Go
Theorem 10.25.
•
T h e collections {$,,r,(z)),,kEz
and
{&,k(x)),,ktz
defined
b y (10.13) are Rzesz bases o n R.
Proof: We must verify (a) linear indcpcndcncc and (b) thc framc condition.
(a) For linear independence, it will be enough t o show that { ' $ j , k ( ~ ) ) j , k E Z is biorthogonal to { + j , k ( ~ ) ) j , k E ~
308
Chapter 10. Biorthogonal Wavelets
To show hiorthogonality within a given scale, let j . k and k' E Z be fixed. We will show t,hat (10.23) ( $ j , k . $ J , k f ) = 6(ik - k').
B y Lernrna 10.24(b), (10.23) holds with j = 0. If j rem 3.42(f).
#
0, then by Theo-
To show biorthogonality between scales, let ,j, j ' , k and k' E Z be fixed and assurne that j < j'. We will show that
First note that since Qo,k(z) E Vl. $hj,k(x) - = D 2 J d l o . k ( ~E) & + I & & I . Therefore, it will be enough to show that Q j / , k / ( x ) is orthogonal t o every element of K t . To that end, let f (x) E & I . By Lemina 10.17, { p J l , k ( x ) ) k , z is a Riesz basis for 41. Therefore. there exists an t2 sequence ( ~ ( k ) } ~ ~ ~ such that f (x) = C kc ( k )P ~% Ik(x) in L~ on R. By Lerrlma 10.24(d).
Hence:
and (10.24) follows. (b) To show - the frame condition, we must show that there exist constants A. B. A, B > 0 such that for all f ( x ) , C: on R:
and
We will prove this in three steps. Step 1. Show that for every f (x). C; on R,
in L~ on R. Step 2. Show that having the upper bound in (10.25) and (10.26) together with (10.27) implies that we have the lower bound in (10.25) arid (10.26).
10.4. Generalized hIultiresolution Analysis
309
Step 3. Show that we have the upper bound in (10.25) and (10.26).
Step 1. Using the same argument as iri Theorem 7.35, it is possible to show that pj+, f (x)= D,, PI D,-,
f (x)
and
(XI.
P, f (x)= D ~ I P O Df~ - J
By Lemma 10.24(e), Qaf (x) E sp""{$o,n (x)},,tz so that for some P2 sequence {c(k))~ E z , &of (r)= c ( k )$)O,I(~). To see that in fact c(k) = ( f :
-
-
Since Qo f (x)= Pl f (x)- Pof (s),
so that for m E Z fixed,
=
C(f, a,.,)( d ~ ~ P. , ~~ ~. ,~ )
Therefore, for j E Z. Q j f
(XI
=
q,+lf(XI
-
note that since L'o.k(s) E VI:
-
Pjf
(XI
For any J E N, we may write
By Lemma 10.22(c) and Minkowski's inequality,
as J + oo. This proves the first part of (10.27). The second part is proved similarly.
Step 2. Suppose that we have upper bounds in (10.25) and (10.26). That is, suppose that there exist constants $B, \tilde B > 0$ such that
$$\sum_j \sum_k |(f, \psi_{j,k})|^2 \le B\,\|f\|_2^2 \tag{10.28}$$
and
$$\sum_j \sum_k |(f, \tilde\psi_{j,k})|^2 \le \tilde B\,\|f\|_2^2. \tag{10.29}$$
We will show that the lower bounds in (10.25) and (10.26) hold also. To see this, note that by (10.27) and the Cauchy-Schwarz inequality for sequences,
Canceling $\|f\|_2^2$ from both sides, we arrive at
$$\frac{1}{B}\,\|f\|_2^2 \;\le\; \sum_j \sum_k |(f, \tilde\psi_{j,k})|^2.$$
A similar argument shows that
Step 3. The proof of Step 3 is rather complicated and relies on the following lemma.
Lemma 10.26. If $\varphi(x)$ and $\tilde\varphi(x)$ satisfy
$$|\hat\varphi(\gamma)| \le C\,(1+|\gamma|)^{-1} \quad\text{and}\quad |\hat{\tilde\varphi}(\gamma)| \le C\,(1+|\gamma|)^{-1} \tag{10.30}$$
for some $C > 0$ and all $\gamma \in \mathbf{R}$, then (10.25) and (10.26) hold.
Condition (10.30) is sa,t,isfied by all examples we will consider in this book.
Exercises Exercise 10.27. Prove Lemma 10.22(c).
10.5 Riesz Bases Orthogonal Across Scales

In this section, we will construct² Riesz bases of wavelets that satisfy a partial orthogonality condition; specifically, they are orthogonal across scales.

²This construction is due to Chui and Wang, A cardinal spline approach to wavelets, Proceedings of the American Mathematical Society, vol. 113 (1991), p. 785-793.
That is, we will construct Riesz bases of the form $\{\psi_{j,k}\}_{j,k\in\mathbf{Z}}$ with the property that $(\psi_{j,k}, \psi_{j',k'}) = 0$ whenever $j \ne j'$. An advantage to this construction is that the dual GMRA's are the same; that is, $\tilde V_j = V_j$. This means that finite approximations to a function $f(x)$ have similar properties. For example, if we start with the piecewise linear MRA of Section 7.3.2, then the partial sums
are both in $V_J$ and hence are both piecewise linear approximations to $f(x)$. This example will be explored in detail in Section 10.5.1. A drawback to this construction is that the wavelet $\psi(x)$ and its dual $\tilde\psi(x)$ cannot both be compactly supported. This is a problem especially for numerical algorithms involving these bases. This difficulty can be overcome by allowing the dual GMRA's to be different (Section 10.7). Let $(V_j)$ be a GMRA with compactly supported scaling function $\varphi(x)$. Let
$$a(\gamma) = \sum_k |\hat\varphi(\gamma + k)|^2.$$
Then $a(\gamma)$ is a period-1 trigonometric polynomial bounded above and away from zero on $[0,1)$ (since $\{\varphi_{0,n}(x)\}$ is a Riesz basis for $V_0$). Define $\tilde\varphi(x)$ as in (10.2) by
$$\hat{\tilde\varphi}(\gamma) = a(\gamma)^{-1}\, \hat\varphi(\gamma).$$
By Lemma 10.9, $\{\varphi_{0,n}(x)\}$ and $\{\tilde\varphi_{0,n}(x)\}$ are biorthogonal. Since $a(\gamma)^{-1}$ is $C^0$ on $\mathbf{R}$, we can write it as an $L^2$ Fourier series as
(10.31)
a(?)-'
Taking inverse Fourier transforms of both sides of (10.31), we have that
Thus, $\tilde\varphi(x) \in V_0$ and it follows eventually that $\tilde V_0 = V_0$, and that $\tilde V_j = V_j$ for all $j \in \mathbf{Z}$. Now, in order to define the wavelets $\psi(x)$ and $\tilde\psi(x)$ in this case, note that by (10.11), there is an $L^2$ Fourier series $\tilde m_0(\gamma)$ such that
where
$$\tilde m_0(\gamma) = a(2\gamma)^{-1}\, a(\gamma)\, m_0(\gamma). \tag{10.32}$$
Remembering that $a(\gamma)$ is
real-valued and has period 1, we define by
(10.15),
arid ?)(l,-;
= e-2.rri(r+1/2)
mo(r t 1/2).
(10.34)
Then the wavelets $(z) and &(z) are given by
and
10.5.1 Example: The Piecewise Linear GMRA

Recall the MRA defined in Section 7.3.2 in which $V_0$ consisted of all functions $f(x)$, $C^0$ on $\mathbf{R}$ and linear on the intervals $I_{0,k}$ for $k \in \mathbf{Z}$. We showed in Section 7.3.2 that the MRA $(V_j)$ satisfies Definition 10.16(a)-(d). It remains to show that there is a function $\varphi(x)$ such that $\{\varphi_{0,n}\}$ is a Riesz basis for $V_0$. However, letting $\varphi(x) = (1 - |x|)\,\chi_{[-1,1]}(x)$, then Exercise 7.11 implies that
so that
$$\frac{1}{3} \;\le\; \sum_k |\hat\varphi(\gamma + k)|^2 \;\le\; 1.$$
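The bound just displayed can be checked numerically. The sketch below (assuming numpy; the truncation range and grid are arbitrary choices) uses the fact that the hat function has Fourier transform $\mathrm{sinc}^2(\gamma)$ and compares the periodized sum with its closed form $(1 + 2\cos^2\pi\gamma)/3$.

import numpy as np

def hat_phi_hat(gamma):
    # Fourier transform of the hat function (1 - |x|) on [-1, 1]: sinc^2
    return np.sinc(gamma) ** 2        # np.sinc(x) = sin(pi x)/(pi x)

gamma = np.linspace(0, 1, 1001)
k = np.arange(-200, 201)              # truncated range of integer translates
S = np.sum(hat_phi_hat(gamma[:, None] + k[None, :]) ** 2, axis=1)

print(S.min(), S.max())               # approximately 1/3 and 1
assert np.allclose(S, (1 + 2 * np.cos(np.pi * gamma) ** 2) / 3, atol=1e-6)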
Moreover, we saw that $\overline{\mathrm{span}}\{\varphi_{0,n}(x)\} = V_0$. Therefore, by Lemma 10.11, $\{\varphi_{0,n}(x)\}$ is a Riesz basis for $V_0$ and $(V_j)$ is a GMRA. In this case, by Exercise 7.67,
$$m_0(\gamma) = \cos^2(\pi\gamma)$$
and
$$a(\gamma) = \frac{1}{3}\bigl(1 + 2\cos^2(\pi\gamma)\bigr).$$
Therefore,
$$\tilde m_0(\gamma) = \frac{\bigl(1 + 2\cos^2(\pi\gamma)\bigr)\,\cos^2(\pi\gamma)}{1 + 2\cos^2(2\pi\gamma)}$$
and
$$\tilde m_1(\gamma) = e^{-2\pi i(\gamma + 1/2)}\,\sin^2(\pi\gamma).$$
Finally, we arrive at
and
so that
where (1
and
where
+ 2 sin2(xy)) sin2(a?) 1 + 2 cos2(2ay) $(2)= C Z(n
-
I)
-1
d(n) e-2nii~~,
p(2x - n ) .
10.6 A Discrete Transform for Biorthogonal Wavelets

As with orthogonal wavelets, there is a very simple and fast discrete version of the biorthogonal wavelet expansion.
10.6.1 Motivation from GMRA

Suppose that we are given a signal $c_0(k)$. We make the assumption that $c_0(k)$ is the sequence of scaling coefficients for some underlying function $f(x) \in V_0$, that is, that for $k \in \mathbf{Z}$,
The scaling and wavelet coefficients of $f(x)$, $(f, \varphi_{j,k})$ and $(f, \psi_{j,k})$ for $j < 0$, can be calculated using a very convenient recursive algorithm. Since $\varphi_{0,0} = \sum_n h(n)\,\varphi_{1,n}$, it follows as in Section 8.1 that for any $j, k \in \mathbf{Z}$,
$$\varphi_{j,k}(x) = \sum_n h(n - 2k)\, \varphi_{j+1,n}(x) \tag{10.38}$$
and that
$$\psi_{j,k}(x) = \sum_n g(n - 2k)\, \varphi_{j+1,n}(x). \tag{10.39}$$
For every $j \in \mathbf{N}$, define the sequences $c_j(k)$ and $d_j(k)$ by
$$c_j(k) = (f, \varphi_{-j,k}) \quad\text{and}\quad d_j(k) = (f, \psi_{-j,k}).$$
Then by (10.38),
$$c_{j+1}(k) = \sum_n c_j(n)\, h(n - 2k), \tag{10.40}$$
and by (10.39),
$$d_{j+1}(k) = \sum_n c_j(n)\, g(n - 2k). \tag{10.41}$$
The calculation of c j + l ( k ) and d j I i ( k ) is completely reversible. Recall that by Definition 10.21, for any j E Z ,
and that by (10.28),
Also, by Definition 10.21,
Writing these out in terms of (10.42) and (10.43), we have
By matching coefficients, we conclude that
We summarize these results in the following theorem.

Theorem 10.28. Let $\varphi(x)$ and $\tilde\varphi(x)$ be scaling functions for dual GMRA's, and let $h(n)$ and $\tilde h(n)$ be the corresponding scaling filters (Lemma 10.18). Define the filters $g(n)$ and $\tilde g(n)$ by (10.12) and the wavelets $\psi(x)$ and $\tilde\psi(x)$ by (10.13). Given a function $f(x)$, $L^2$ on $\mathbf{R}$, define for $k \in \mathbf{Z}$,
and for every $j \in \mathbf{N}$ and $k \in \mathbf{Z}$,
$$c_j(k) = (f, \varphi_{-j,k}) \quad\text{and}\quad d_j(k) = (f, \psi_{-j,k}).$$
Then
$$c_{j+1}(k) = \sum_n c_j(n)\,h(n-2k) \quad\text{and}\quad d_{j+1}(k) = \sum_n c_j(n)\,g(n-2k), \tag{10.45}$$
and
$$c_j(n) = \sum_k c_{j+1}(k)\,\tilde h(n-2k) + \sum_k d_{j+1}(k)\,\tilde g(n-2k). \tag{10.46}$$
The operations in (10.45) are precisely the approximation and detail operators corresponding to the filters $h(n)$ and $g(n)$ (8.18). Equation (10.46) involves the approximation and detail adjoints corresponding to the filters $\tilde h(n)$ and $\tilde g(n)$ (8.19). This leads to the following definition (cf. Definition 8.5).
Definition 10.29.
Given a pair of filters h ( k ) , and % ( k ) , define g ( k ) and g ( k ) b y (10.12) g ( k ) = (-qkh ( 1 - k ) . Define the-corresponding approximation opcrntors H and H and detail operators G and G o n signals c ( n ) by
and the approximation adjoints H * , H* and detail adjoints G * , G* by
This leads to the following restatement of Theorem 10.28.

Theorem 10.30. Keeping the same notation as Theorem 10.28,
$$c_{j+1} = H c_j \quad\text{and}\quad d_{j+1} = G c_j,$$
and
$$c_j = \tilde H^* c_{j+1} + \tilde G^* d_{j+1}.$$
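The following is a minimal sketch of one analysis step and one synthesis step in the sense of Theorem 10.30, using periodic (circular) indexing for a finite signal. The function names, the indexing of the filters from 0, and the choice of the Haar pair (which is self-dual, so that h and its dual coincide) are assumptions made here purely to check perfect reconstruction; any other biorthogonal pair could be substituted.

import numpy as np

def analyze(c, h, g):
    # One analysis step: c1(k) = sum_n c(n) h(n-2k), d1(k) = sum_n c(n) g(n-2k)
    M = len(c)
    c1, d1 = np.zeros(M // 2), np.zeros(M // 2)
    for k in range(M // 2):
        for i, (hv, gv) in enumerate(zip(h, g)):
            n = (2 * k + i) % M          # periodic indexing
            c1[k] += c[n] * hv
            d1[k] += c[n] * gv
    return c1, d1

def synthesize(c1, d1, ht, gt):
    # One synthesis step: c(n) = sum_k c1(k) ht(n-2k) + d1(k) gt(n-2k)
    M = 2 * len(c1)
    c = np.zeros(M)
    for k in range(len(c1)):
        for i, (hv, gv) in enumerate(zip(ht, gt)):
            c[(2 * k + i) % M] += c1[k] * hv + d1[k] * gv
    return c

# Haar filters; g(k) = (-1)^k h(1-k), and the dual filters equal h, g here
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)

c0 = np.random.randn(16)
c1, d1 = analyze(c0, h, g)
assert np.allclose(synthesize(c1, d1, h, g), c0)   # perfect reconstruction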
10.6.2 The QMF Conditions
In this subsection, we will define the analogue of the QMF conditions (Definition 8.12) in the biorthogonal case. By way of motivation, suppose that $\varphi(x)$ and $\tilde\varphi(x)$ are scaling functions for dual GMRA's, with scaling filters $h(n)$ and $\tilde h(n)$ (Lemma 10.18) and $g(n)$ and $\tilde g(n)$ the corresponding wavelet filters (Definition 10.23). Then we can prove the following analogue of Theorem 8.2.

Theorem 10.31. With $h(n)$, $\tilde h(n)$, $g(n)$, and $\tilde g(n)$ as above:
(a) $\sum_n h(n)\,\tilde h(n-2k) = \sum_n g(n)\,\tilde g(n-2k) = \delta(k)$.
(b) $\sum_n g(n)\,\tilde h(n-2k) = \sum_n \tilde g(n)\,h(n-2k) = 0$ for all $k \in \mathbf{Z}$.
(c) $\sum_k h(m-2k)\,\tilde h(n-2k) + \sum_k g(m-2k)\,\tilde g(n-2k) = \delta(m-n)$.
Proof: Exercise 10.35.
We also have the following analogue of Theorem 8.11.

Theorem 10.32. With $h(n)$, $\tilde h(n)$, $g(n)$, and $\tilde g(n)$ as above, define $m_0(\gamma)$, $\tilde m_0(\gamma)$, $m_1(\gamma)$, and $\tilde m_1(\gamma)$ by (10.15). Define the operators $H$, $\tilde H$, $G$, $\tilde G$, $H^*$, $\tilde H^*$, $G^*$, and $\tilde G^*$ as in Definition 10.29. Then the following are equivalent.
(a) $m_0(\gamma)\,\overline{\tilde m_0(\gamma)} + m_0(\gamma+1/2)\,\overline{\tilde m_0(\gamma+1/2)} = 1$.
(b) $\displaystyle\sum_{n=-\infty}^{\infty} h(n)\,\tilde h(n-2k) = \delta(k)$.
Proof: Exercise 10.36.
This leads to the following definition.

Definition 10.33. Given filters $h(k)$ and $\tilde h(k)$, define the Fourier series $m_0(\gamma)$ and $\tilde m_0(\gamma)$ by
$$m_0(\gamma) = \frac{1}{\sqrt 2} \sum_k h(k)\, e^{-2\pi i k\gamma} \quad\text{and}\quad \tilde m_0(\gamma) = \frac{1}{\sqrt 2} \sum_n \tilde h(n)\, e^{-2\pi i n\gamma}.$$
Then $h$, $\tilde h$ form a QMF pair provided that
(a) $m_0(0) = \tilde m_0(0) = 1$, and
(b) $m_0(\gamma/2)\,\overline{\tilde m_0(\gamma/2)} + m_0(\gamma/2 + 1/2)\,\overline{\tilde m_0(\gamma/2+1/2)} = 1$ for all $\gamma \in \mathbf{R}$.
W e refer t o ( a ) and ( b ) as the (biorthogonal) Q M F conditions.
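The two conditions are easy to test numerically for a candidate filter pair. The sketch below (grid size, tolerance, and function names are arbitrary choices; numpy assumed) evaluates $m_0$ and $\tilde m_0$ on a grid and checks (a) and (b); the Haar pair is used only as a filter pair known to satisfy the conditions.

import numpy as np

def m0_hat(h, gamma):
    # m0(gamma) = (1/sqrt 2) sum_k h(k) exp(-2 pi i k gamma), h indexed from 0
    k = np.arange(len(h))
    return np.exp(-2j * np.pi * np.outer(gamma, k)).dot(h) / np.sqrt(2)

def is_qmf_pair(h, ht, tol=1e-10):
    gamma = np.linspace(0, 1, 257)
    a = np.isclose(m0_hat(h, np.array([0.0]))[0], 1) and \
        np.isclose(m0_hat(ht, np.array([0.0]))[0], 1)
    b = np.allclose(m0_hat(h, gamma/2) * np.conj(m0_hat(ht, gamma/2))
                    + m0_hat(h, gamma/2 + 0.5) * np.conj(m0_hat(ht, gamma/2 + 0.5)),
                    1, atol=tol)
    return a and b

h = ht = np.array([1.0, 1.0]) / np.sqrt(2)   # Haar: a self-dual QMF pair
print(is_qmf_pair(h, ht))                    # True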
Theorem 10.34. Suppose that $h(k)$, $\tilde h(k)$ is a QMF pair. Then:
(a) $\sum_n h(n) = \sum_n \tilde h(n) = \sqrt 2$.
(b) $\sum_n g(n) = \sum_n \tilde g(n) = 0$.
(d) $\sum_n g(n)\,\tilde h(n-2k) = \sum_n \tilde g(n)\,h(n-2k) = 0$ for all $k \in \mathbf{Z}$.
Proof: Exercise 10.37.
Exercises Exercise 10.35. Prove Theorem 10.31. Exercise 10.36. Prove Theorem 10.32. Exercise 10.37. Prove Theorem 10.34.
10.7 Compactly Supported Biorthogonal Wavelets

The idea behind the construction of compactly supported biorthogonal wavelets is similar to the construction of compactly supported orthonormal wavelets. That is, we seek trigonometric polynomials satisfying the biorthogonal QMF conditions. As before, we will impose vanishing moment conditions on the wavelet and its dual in order to get smoothness and good approximation properties.³ Once appropriate filters have been found, the wavelets and scaling functions can be constructed using the cascade algorithm.⁴ Recall that in constructing compactly supported orthonormal wavelet bases, we defined for each $N \in \mathbf{N}$ the Daubechies polynomial $P_{N-1}(y)$ satisfying
$$(1-y)^N\, P_{N-1}(y) + y^N\, P_{N-1}(1-y) = 1.$$
Replacing $y$ by $\sin^2(\pi\gamma)$, this becomes
³Note that Theorem 9.3 relating smoothness and vanishing moments required that the collection $\{\psi_{j,k}\}_{j,k\in\mathbf{Z}}$ be an orthonormal system. However, the same theorem holds under the assumption that there exists a collection $\{\tilde\psi_{j,k}\}_{j,k\in\mathbf{Z}}$ biorthogonal to $\{\psi_{j,k}\}_{j,k\in\mathbf{Z}}$. The proof is very similar.
⁴The construction presented in this section is due to Cohen, Daubechies, and Feauveau, Biorthogonal bases of compactly supported wavelets, Communications on Pure and Applied Mathematics, vol. 45 (1992), 485-560.
Therefore, we will have found trigonometric polynomials satisfying the QMF conditions by finding trigonometric polynomials satisfying
where $P_{2N-1}(z)$ is given by (9.19) and satisfies the properties listed in Proposition 9.18. Constructing compactly supported biorthogonal wavelets, then, amounts to factoring the polynomial $P_{4N-2}(z) = z^{2N-1}\, P_{2N-1}(z)$.
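As a quick sanity check of the polynomial that is being factored, the sketch below verifies the Bezout identity numerically. It assumes the standard explicit form $P_{N-1}(y) = \sum_{k=0}^{N-1}\binom{N-1+k}{k} y^k$ (the text refers back to (9.19) for this; the explicit binomial form is an assumption made here, not quoted from the text).

from math import comb
import numpy as np

def P(N, y):
    # P_{N-1}(y) = sum_{k=0}^{N-1} C(N-1+k, k) y^k
    return sum(comb(N - 1 + k, k) * y**k for k in range(N))

y = np.linspace(0, 1, 11)
for N in (1, 2, 3, 4):
    lhs = (1 - y)**N * P(N, y) + y**N * P(N, 1 - y)
    assert np.allclose(lhs, 1.0), N
print("Bezout identity holds for N = 1, ..., 4")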
10.7.1 Compactly Supported Spline Wavelets

The Haar Case

If $\varphi(x) = \chi_{[0,1]}(x)$, then we have seen that $\varphi(x)$ is the scaling function for the Haar MRA and that the auxiliary function is $m_0(\gamma) = (1/2)(1 + e^{-2\pi i\gamma})$ in this case. For any $N \in \mathbf{N}$, the polynomial $P_{4N-2}(z)$ contains $2N$ factors of the form $(z + 1)$, so that for each $N \in \mathbf{N}$ it is possible to find trigonometric polynomials $\tilde m_{0,N}(\gamma)$ such that
&A7N
Let 11s consider some examples. Here
d
Therefore, 1
-
1 2
P I @ )= - P 2 ( 2 )= - ( 1 + 2 - l ) Z
and
1 2
P ~ ( ~ ~= " "- )( 1 + e-2"")-
1 (2
+ 1)
1 (1+e-2~ir), 2
Thus, %A,1(7) = mo(7), and we have recovered the Haar orthonormal basis.
1N
=
2 I Here
Therefore.
arid
Therefore,
and -1 -
h ( n )=
0
n=O,l, otherwise
-
h ( n )=
and
8y5
----
8 -p
JZ 0
-2, 3 , n = -1, 2, n = O , 1, otherwise.
n
=
The Linear Spline Case

If $\varphi(x) = (1 - |x - 1|)\,\chi_{[0,2]}(x)$, then we have seen that $\varphi(x)$ is the scaling function for the piecewise linear MRA and that the auxiliary function is $m_0(\gamma) = (1/4)(1 + e^{-2\pi i\gamma})^2$. For any $N \ge 1$, the polynomial $P_{4N-2}(z)$ contains a factor of the form $(z + 1)^2$, so that it is possible to find trigonometric polynomials $\tilde m_{0,N}(\gamma)$ such that
Let us consider some examples.
so that Pl(X) =
1-P2(z) Z
=
1 -1 2 (l+z ) Z 4
-
and
Therefore, 0
and
= e-25ri7
This wavelet is not useful since F ( x ) must be a "6" and hence not an L2 function.
so that
Therefore,
and -1
n = 0 , 2, 0
IN
= 4 I Here
Therefore,
otherwise
n = -1, 3, n = 0, 2, n = 1,
and
and
0
h ( n )=
otherwise
The Cubic Spline Case

If $\varphi(x) = B_3(x)$, then the auxiliary function is $m_0(\gamma) = (1/8)(1 + e^{-2\pi i\gamma})^3$. For any $N \ge 2$, the polynomial $P_{4N-2}(z)$ contains the factor $(z+1)^3$, so that it is possible to find trigonometric polynomials $\tilde m_{0,N}(\gamma)$ such that
1 N = 2 I Here
so that
Therefore,
0
IN
=4
1.
Here
n - 2 otherwise
a d
-
-1
h ( n ) = { T
n, = 0. 3. n, = 1. 2, otherwise.
Therefore,
and -5
zi (n)=
{"
n = -6, 3 , n = -5. 2, - $1 -h ( r ~ ) = 256 n = -4, 1, 256 fi - 26 n = -3, 0, 2 5 6 9 L 7L = -2, -1, 251ifi 0 otherwise). 25p9&
7b
= 0: 3,
n=1.2, otherwise
O
and
10.7.2 Symmetric Biorthogonal Wavelets

Each of the filters described in the previous subsection was symmetric in the sense that for some $M$, $h(M - n) = h(n)$. For example, the linear spline filters with $N = 2$ have $h(2-n) = h(n)$ and $\tilde h(2-n) = \tilde h(n)$, and the cubic spline filters with $N = 2$ have $h(3-n) = h(n)$ and $\tilde h(3-n) = \tilde h(n)$. In fact, this is not an accident but follows from the way in which the polynomial $z^{2N-1}P_{2N-1}(z)$ is factorized. The following theorem contains the basic idea.

Theorem 10.38. Let $P(z) = \sum_n a(n)\,z^n$ be any polynomial in $z$ (here $a(n)$ is a finite sequence with $a(n) = 0$ if $n < 0$). Then the following are equivalent.
(a) For some $M \in \mathbf{Z}$, $a(M - n) = a(n)$ for all $n$.
(b) $z^{-M}\,P(z) = P(z^{-1})$ for all $z \ne 0$.
Proof: If (a) holds, the11 for all z,
which is (b). If (b) holds, then for all z E Z,
and (a) follows.
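The equivalence is easy to see in a few lines of code. The following sketch (the coefficient sequence is an arbitrary symmetric example chosen here; numpy assumed) checks Theorem 10.38(b) at random points on the unit circle for a sequence satisfying (a).

import numpy as np

def P(a, z):
    # P(z) = sum_n a(n) z^n for a finite coefficient sequence a(0), ..., a(M)
    return sum(an * z**n for n, an in enumerate(a))

a = np.array([1.0, -3.0, 5.0, -3.0, 1.0])     # symmetric: a(M - n) = a(n), M = 4
M = len(a) - 1
z = np.exp(2j * np.pi * np.random.rand(50))   # sample points on the unit circle
assert np.allclose(z**(-M) * P(a, z), P(a, 1 / z))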
Remark 10.39. (a) Any polynonlial of the forrr~
satisfies Theorem 10.38(b) since
(b) In order to guarantee that our biorthogonal scaling filters are syrnrnetric, we require a factorizatiorl of , Z " - ' P ~ ~ - ~ ( Z ) into factors of tile form (10.51). In each exarrlple in the previous subsection, such a factorizatiorl was made. T h e following examples illustrate that other sucll factoriz iLt'~ o n s are possible. Filters Corresponding t o N = 4
where a ==: .32887591778603
and
B = .28409629819182 +i.24322822591038.
1 The 8/8 filter pair / We factorize z7 P 7 ( z ) as
so that
This leads to the filter coeficierlts
h ( n )=
arid
I
-.05261415011925 .I8870142780633 .60328894481394 0
n = -3, 2, n = -2, 1, n = -1, 0,
.07565691101399 -.I2335584105275 .09789296778/110 .85269867900940 0
n = -3, 4, n = -2, 3, n = -1: 2, n = 0, 1, otherwise.
The 9/7 filter pair We factorize z7 P 7 ( z ) as
so that
otherwise
This leads t o the filter coefficients
-.06453888262894 -.04068941760956 .41809227322221 .78848561640566 0
n = -4, 2, T L = -3, 1, rL = -2, 0, n = -1, otherwise
.03782845550700 -.02384946501938 -.I1062440441842 .37740285561265 .85269867900940 0
n = -3, 5, n = -2, 4, n = -1, 3, n = 0, 2, n = 1,
and
otherwise.
he 10/6 filter p a q
so that
This leads t o the filter coefficients
h(n) =
-.I2907776525788 .04769893003876 .78848561640566
0
n = -3, 2, n = -2, 1, n = - 1 , 0, otherwise
and
-
h(n) =
L
10.7.3
.01891422775350 .00698949524381 -.06723693471890 .I3338922559712 .61505076731103 0
n = -4, 5 , Y L = -3, 4, n = -2, 3, n = 1 , 2, n = 0, 1, otherwise.
Using Symmetry in the D WT
Symmetric filters are most valuable for minimizing so-called edge eflects in the wavelet representation or DWT of a function. The source of these effects is the fact that the periodization of a smooth function need not be smooth. For example, suppose t,ha,t a C" function f (x) defined on the interval [0,p] satisfies f (0) # f (p). Then the period-p extension of f (x) will have jump discontinuities a t the points n p , n E Z . If we apply the DWT to the periodized version of f (x), then there will be large coefficients a t each scale because of the jump discontinuities. These large coefficients will be artificial sincc they do not correspond t o a fcnture of f (x) itself but orily t o the fact that we periodized f (z) before taking the DWT. One possible solutiori t o this problem is t o define the function feven(x) on p] by fevell (z) = (f (x) + f (-x))/2. Then feverl (x) is coritinuol~son [-p, and satisfies feveIl (x) = feven(-x). Then the period-2p extension of feveri(z)will be coritinlious on R. Taking riow the DWT of this function will provide a representation of f (x) (since fevel1(x) = f (x) on [O,p]) and the edge effects will be rnini~nized. Thc drawback to this proposed solution is that we will need twice a,s nialiy coefficients to represent feveIl(x) since its support is twice as long as that of f (x). Tliis clearly destroys any advantage gained by eliminating the jump cliscont,inuity in the periodic extension of f (x). Tlle purpose of this subsection is to sliow how sy~nrnetricfilters can be used t o eliminate this disadvantage.
Definition 10.40. A sequence h ( n ) is synlrnetric i f there is a n integer N such that h ( n ) = h ( N - n ) f o r a21 n E Z. It is whole-point symmetric if N i s c.r~sn,and half-point symmetric if N is odd.
Remark 10.41. (a) The terms whole-point and lialf-point syinllletry arise from the observatiori that if h(n) = h ( N - n ) for all n, then the vertical line x = N/2 is an axis of syrnmetry for h(n). For a whole-point symmetric sequence, this axis of symmetry is an integer and for a half-point symrrietric sequence, it is a half-integer. (b) If h(n) is whole-point symmetric, then its shift ( N / 2 ) )satisfies 7-l\il2h(-n) = ' 7 - ~ / 2 ~ ( n )
T - ~ / ~ ~ L (I I h(n )
+
for all n. If h ( n ) is half-point symmetric, then its shift ~ - ( ~ - ~ ) ~= ~ h ( n ) h ( n + ( N - 1)/2) satisfies
and
T - ( ~ + ~h)( /n ~ )=
h(n
+ ( N + 1 ) / 2 )satisfies
In the following, we show how to cornpute the DWT using syrrimetric filters in such a way that edge effects are minimized arid yet efficient represeritatiori of the signal is achieved. The idea will be to modify slightly the approximatiori arid detail operators H and G arid their adjoirits to take advantage of symmetry. We will assumc tllroughol~tthat we are analyzing an hf-vector c = (c(0) c(1) . . . c(A1 - 1)). where &I is even. Tlierc are four separate cases tliat rriust be considerecl. We examine two below. The other two are left as exercises. Case 1: Whole-point Symmetry of
/L(~L)
and h ( n )
If the filters h ( n ) and h(n) are whole-point symmetric, we nlay -assume after an appropriate shift that h(n,) = h ( - n ) and that h(-72) = h(n) for all n. We will define nlodificatiorls of the four operators H, G, H * , and G'*. We first define a period-2M - 2 sigrlal c(n) corresponding t o the vector c as
where one period of the signal is shown. Note that neither c(0) nor c(h4- 1) is repeated, and that c(-n) = C(YL). In other words, c(n,) is whole-point symmetric about x = 0 and also about z = A/I - 1. To define H c , let a ( n ) be the period-M - 1 signal H c ( n ) and note that a(n) satisfies a ( - n ) = a ( n ) .To see this, note that by the symmetry of c(n) and h ( n ),
Also note that n.(n,) is completely determined by the M / 2 values
Therefore, we define the M/2-vector
Now suppose that we are given an M/2-vector a and that we want to define the M-vector ~ * aFirst . define a corresponding period-hf - 1 signal a ( n ) as
Note tlml u ( ( M -2)/2) is repeated but a(0) is not, and that a(-n) - a(n). Next apply H* to a ( n ) as usual. Then ~ * a ( nwill ) have period 2121-2 and, because of the symmetry of h ( n ) , will satisfy Ij*a(-n) = H * a ( n ) .Finally, since H * a ( n j is completely determined by its first M contiguous values, we define ~ *= a ( H * ~ ( oH)*, a ( l ) , . . . , H * n ( M - 1)).
-
-
-
h ( l - n ) satisfies g(2 - n ) = Siiice h ( n ) = h ( - n ) , the filter g(n) = g(n). In other words, it is whole-point symmetric with axis of symmetry x = I . To see this, note that g(2 - n )
-
=
( - I ) ~ -h~( l
=
(-I)n h(1 - n,)
-
(2 - n ) )
To define Gc, we define the period-2M - 2 signal c(n) as before, let d(n) be the period-M - 1signal Gc(n), and note that d(n) satisfies d(-1-71) = d(n). To see t,l~is,note that by the syrnrnetry of c(r1) and y ( r ~ ) ,
Since din) is complet,ely determined by the A4/2 valucs
we define the hi'/2-vector
Now suppose that we are given an M/2-vector d and that we want t,o define the M-vector ~ * dAs . above, since h(-n) = ~ ( I L ) T(n) , = h ( l - n) satisfies g(2-n) = g(n). First define a corresponding periodM - 1 signal d(n) as
d(n) =
{
( 0 ( 1 . . . , ( ( M 4)/2), d ( ( M d ( ( M - 4)/2), . . . , d ( l ) , d(0). . . . } . -
- 2)/2),
Note that d(0) is repeated but d ( ( M - 2)/2) is not, and that d(- 1 - n) = d ( n ) . Next apply t o d(n) as usual. Then G*ci(n) will have period 2hf - 2 and, because of the symmetry of T(n), will satisfy e * d ( - n ) = G*d(n). Findly, since G*d(n) is completely determined by its first Ad contiguoils values, we define
C*
Note also that all four operators defined above correspond to applying the usual operators H and C: tJo the period-2A4 - 2 signal c ( n ) and the usual operators H* and to H c ( n ) and Gc(n) respectively Therefore, we still have the identity
e*
Hence both the DWT and the inverse DWT are properly defined. Case 2: Half-point Synlnretry of h(n) and h ( n ) If both filters h(n) and h ( n ) have half-point symmetry, - we may -assume after an appropriate shift that h(-1 - n ) = h ( n ) and hi-1 - n ) = h(,n) for all n. Let c be an hi'-vector where M is even. We will define modifications of thc four operators 11, G, H*, and We first define a period-2DI signal, c(n) corresponding t o the vector c as
c*.
where one period of the signal is shown. Notc that c ( 0 ) and c(Af - 1) are both repeated, and that E(1 - n ) = F(n).In other words, c(n) is half-point symmetric about x : 1/2 and also about x = (2Af - 1)/2.
To define Hc, let a ( n ) be the period-ll.f signal H c ( n ) and note that a ( n ) satisfies a ( l - n ) = a ( n ) . To see this, note that by the symmetry of c(n) and h ( n ),
Also note that a ( n ) is completely determined by tJhe h1/2 values
Now suppose that we are giver1 an Al/2-vector a and that we want to define the Al-vector ~ * aFirst . define a corresponding period-hl signal ~ ( 1 2 a ) S
Note that both a(U) and a ( ( h i - 2)/2) are repeat,ed, and that a ( l - n ) = a(,rr). Next apply to a ( n ) as usual. Then H * a ( n ) will have period 2A4 and, beca~lseof the symmetry of z ( n ) ,will satisfy ~ * a (-l n ) = H * a ( n ) . Finally. since &*a(n) is completely determined by its first ill contiguous values, we define
H*
-
Since h(-1
-h.(n,),the filter g(n) = ( - l ) n-h ( l
n ) satisfies g(3 n ) = - g ( n ) . In other words, it is whole-point symmetric w i t h axis of symmetry x = 1. To see this, note that - n.) =
g ( 3 - n) = =
( - ) 3 n
-
( 1- (3- n))
-
h(-1
-
(1- n))
To define Gc, we define the period-2A.l signal c(n) as before, let d ( n ) be the period-Al signal Gc(n), and note that d(n) satisfies d(-1 - n) = - d ( n ) . To see this, note that by t,he symrr~et~ry of c(n,) a,nd g(n.),
we defiiic: the Al/2-vector
Now suppose that we are given an AI/2-vector d aiid that we wai~tto define tlie AI-vector G * d . As above, sirice h(-1 - 11) = ~ ( T L ) ,? ( I ] ) = (-1)" h(1 - n ) satisfies 5(3 - n,) = -!i(rl). First, define a correspouding period-hl signal d(n) as
Note that both d(0) and d((A4 - 2)/2) are repeat,ed, and that d(-1 - n ) = d(n). Next apply G* to d(n) as usual. Tlieri G*d(n) - will have period - 2121 and, because of the symmetry of y(n),will satisfy G*d(l - n ) = -G"d(n). Finally, since G*d(n) is completely determined by its first A / l contiguous values, we define
As before, the fact that the operators H*H and d * -~ are' consistently operating on the same periodic extension of c implies that H*HC+G*GC= c so that the inverse DWT also works.
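Both periodic extensions described in this subsection are one-liners in practice. The sketch below (numpy assumed) builds one period of each; the phase, that is, where the period is taken to start, is a convention chosen here and may differ from the displays in the text, but the symmetry type (axis at an integer versus at a half-integer) is the point being illustrated.

import numpy as np

def whole_point_extend(c):
    # one period, length 2M - 2: c(0), ..., c(M-1), c(M-2), ..., c(1); endpoints not repeated
    return np.concatenate([c, c[-2:0:-1]])

def half_point_extend(c):
    # one period, length 2M: c(0), ..., c(M-1), c(M-1), ..., c(0); endpoints repeated
    return np.concatenate([c, c[::-1]])

c = np.arange(4.0)                       # M = 4
wp = whole_point_extend(c)               # [0 1 2 3 2 1], period 6 = 2M - 2
hp = half_point_extend(c)                # [0 1 2 3 3 2 1 0], period 8 = 2M
# whole-point symmetry: axis at an integer (here n = 0)
assert all(wp[(-n) % 6] == wp[n % 6] for n in range(12))
# half-point symmetry: axis at a half-integer (here n = -1/2 with this phase convention)
assert all(hp[(-1 - n) % 8] == hp[n % 8] for n in range(16))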
Chapter 11
Wavelet Packets

11.1 Motivation: Completing the Wavelet Tree

Recall that computing the DWT of a signal $c_0(n)$ involves recursively applying the filtering operators $H$ and $G$ as in the diagram in Figure 6.1, where each node on the tree corresponds to a sequence. Each sequence $c_j(k)$ is split into a pair of sequences $c_{j+1}(k)$ and $d_{j+1}(k)$ by the action of the approximation and detail operators $H$ and $G$; that is,
$$c_{j+1} = H c_j \quad\text{and}\quad d_{j+1} = G c_j.$$
In motivating the
FIGURE 11.1. The wavelet packet tree.
Let 11s first consider the result of coi~ipletiilgthe tree dowii t,o level 2. Define , w O ( x ) = ~ ( x a)i d ull(.c) = y ? ( r ) . Then we have seen thnt rl:(k) in Figure 11.1corresporids to thc initial data c o ( k ) . which we interpret as coefficients co( k ) = (f.poTx.) of sorrie l~riderlvirlgL2 fi~nctionf (x).TVe have also seen that
Moreover,
d:(k) = c2(k) = (f,~
1 2 % ~ )and
di(k)= da(k)= (f,w52%k).
We now ask the question: Is there a function w2('x) such that dz(k)
(f,~ 5 2 , k ) ' To answer this, note that
(x). where w 2(x)= El,h(n)u?:,,, Similarly,
di (k)= (Gdl)(k)= (f u ) ? ~ %, ~ ) where w 3( x )= CvL y (n)w:%, (2). Going on t o the next level in the tree and using the same argument, we see that d z ( k ) = ( f ,wn3.,.) for n = 0, . . . , 7, where wo(x),w 1(a;), w2(a;), and w3(x)are as before and
(Exercise 11.9). This motivates the following definition.
Definition 11.1. Let $w^0(x)$ be an orthogonal scaling function with corresponding scaling filter $h(n)$. Define the wavelet filter $g(n)$ as usual, and define the sequence $\{w^n(x)\}_{n\in\mathbf{Z}^+}$ of wavelet packet functions by
$$w^{2n}(x) = \sum_k h(k)\, w^n_{1,k}(x), \qquad w^{2n+1}(x) = \sum_k g(k)\, w^n_{1,k}(x).$$
Some examples of wavelet packets corresponding to various scaling filters are given in Figures 11.2 and 11.3. In the following sections, we present some properties and important features of wavelet packets and indicate why they are a usefill generalization of wavelet bases.
11.2 Localization of Wavelet Packets

11.2.1 Time/Spatial Localization
We have seen that under certain assumptions on the scaling filter h ( k ) and the auxiliary function mO(y), a filter of length 2N corresponds to a scaling funct,ion supported in an interval of lcrlgth 2N - 1. In Fact, if wO(x) is supported in [0,2N - 11, then all of the wavelet packets w" (z), n E N, are supported in [0, 2N - 11. Note that as in the proof of Theorem 8.38, we can assert that if h j ( k ) = O for k < 0 and k 2 2N and if a function f (x) is support,ed in [0,2N - 11, then the function
is also supported on [0,2 N - 11. A simple induction argument (Exercise 11.7) conlpletes the proof of tjhe following t,henrem.
Theorem 11.2.
Suppose that for some N E N , the scaling function wo(x) is supported i n [O,2N - 11 and that the scaling filter h ( k ) satisfies h ( k ) = O for k < 0 and k 2 2 N . T h e n for each n E N , wn(x)i s supported i n [O,2N - 11.
FIGURE 11.2. The first eight wavelet packets corresponding to the Daubechies four-coefficient filter.
11.2.2 Frequency Localization

In this subsection, we will examine the frequency localization properties of the wavelet packets $w^n(x)$. We will show that each $w^n(x)$ can be associated with a specific frequency interval.

The Fourier Transform of $w^n(x)$

By definition, $w^0(x) = \varphi(x)$, the scaling function of some MRA, and $w^1(x) = \psi(x)$, the corresponding wavelet. Hence $w^0(x)$ satisfies
FIGURE I 1 -3. The first eight wavelet packets corresponding to thc Daubechies eight-coefficient filter.
and $w^1(x)$ satisfies
$$\hat w^1(\gamma) = m_1(\gamma/2)\, \hat w^0(\gamma/2). \tag{11.2}$$
Proceeding by induction, we can prove the following theorem.
Theorem 11.3. Let $n \in \mathbf{Z}^+$ be given, and let $(\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{k-1})$ be the unique sequence such that each $\varepsilon_j = 0$ or $1$ and such that $n = \varepsilon_0 + 2\varepsilon_1 + \cdots + 2^{k-1}\varepsilon_{k-1}$. (In other words, $\varepsilon_{k-1}\cdots\varepsilon_1\varepsilon_0$ is the binary representation of the integer $n$.) Then
$$\hat w^n(\gamma) = \prod_{j=1}^{k} m_{\varepsilon_{j-1}}(\gamma/2^j)\; \hat w^0(\gamma/2^k).$$
Proof: If n = 0, then we have k
If
12
= 1, then
k = 1 and
€0 =
=
1 and
€0 = 0,
and by (11.1),
1 and by (11.2),
Let n > 1 be given, and assume tliat the theorem holds for all rn < n. If rl is even, then n = 2m with m < n and
Hence, n = 0+2c0 + 4 ~ l + arid since
m" (z)=
C l r ( k ) 2'12
+2jcj-i, ~ ~ " ' ( 21
k),
and the result follows for $n$. A similar argument gives the result if $n$ is odd (Exercise 11.8). Some examples of $\hat w^n(\gamma)$ are calculated and plotted in Figure 11.4 for various orthogonal MRA's (the absolute values are plotted). Note that $|\hat w^n(\gamma)|$ is symmetric and appears to have a single dominant peak in $[0,\infty)$ as well as several smaller peaks. The location of this dominant peak is referred to as the nominal frequency of $w^n(x)$.
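The product in Theorem 11.3 is straightforward to evaluate numerically, and its modulus is essentially what is plotted in Figures 11.4-11.7. The sketch below (numpy assumed; the Haar filters are a concrete illustrative choice, and only the filter part of the product is computed, without the trailing scaling-function factor) evaluates the product for the binary digits of a given n.

import numpy as np

def m(filt, gamma):
    # m(gamma) = (1/sqrt 2) sum_k filt(k) e^{-2 pi i k gamma}, filt indexed from 0
    k = np.arange(len(filt))
    return np.exp(-2j * np.pi * np.outer(gamma, k)).dot(filt) / np.sqrt(2)

def filter_product(n, h, g, gamma):
    # prod_{j=1}^{k} m_{eps_{j-1}}(gamma / 2^j), eps = binary digits of n, least significant first
    eps = [int(b) for b in bin(n)[2:][::-1]] if n > 0 else [0]
    prod = np.ones_like(gamma, dtype=complex)
    for j, e in enumerate(eps, start=1):
        prod *= m(g if e else h, gamma / 2**j)
    return prod

h = np.array([1.0, 1.0]) / np.sqrt(2)   # Haar scaling filter (illustrative)
g = np.array([1.0, -1.0]) / np.sqrt(2)
gamma = np.linspace(0, 4, 1000)
mag = np.abs(filter_product(5, h, g, gamma))   # n = 5: digits (1,0,1), i.e. m1(g/2) m0(g/4) m1(g/8)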
The Nominal Frequency of $w^n(x)$

In order to identify the nominal frequency of $w^n(x)$ more precisely, consider the wavelet packet functions associated with the bandlimited MRA (Section 7.3.3). We will identify these wavelet packet functions as $w^n_{BL}(x)$. Recall that the bandlimited MRA had "perfect frequency localization" with $\hat\varphi(\gamma) = \chi_{[-1/2,1/2)}(\gamma)$ and $\hat\psi(\gamma) = (\chi_{[-1,-1/2)}(\gamma) + \chi_{[1/2,1)}(\gamma))\,e^{-2\pi i\gamma}$. Using Theorem 11.3, we can calculate explicitly the Fourier transform of the $w^n_{BL}(x)$. Some of these (in absolute value) are plotted in Figures 11.5-11.7. Note that for each $n \in \mathbf{Z}^+$, there is an interval $A_{0,n} \subseteq [0,\infty)$ of the form $A_{0,n} = [k(n)/2, (k(n)+1)/2)$ for some $k(n) \in \mathbf{Z}^+$ such that $|\hat w^n_{BL}(\gamma)| = \chi_{A_{0,n}}(\gamma) + \chi_{A_{0,n}}(-\gamma)$. We therefore define the nominal frequency of any wavelet packet function $w^n(x)$ as the number $k(n)/2$. Some useful properties of the intervals $A_{0,n}$ and the numbers $k(n)$ (the number $k(n)$ is referred to as the Gray code of $n$) are summarized in Theorem 11.6.
~z(~)l
FIGURE 11.4. for n = 0, 1, . . . , 7 corresponding to the Daubechies twelve-coefficient filter.
Definition 11.4. Suppose that a nonnegative integer $n \in \mathbf{Z}^+$ has the unique representation $n = \varepsilon_0 + 2\varepsilon_1 + 4\varepsilon_2 + \cdots + 2^j\varepsilon_j + \cdots$, where each $\varepsilon_j = 0$ or $1$. Then $n$ has even sequency if $\sum_j \varepsilon_j$ is even and odd sequency if $\sum_j \varepsilon_j$ is odd.

Definition 11.5. Given $j \in \mathbf{Z}$ and $n \in \mathbf{Z}^+$, we define the interval $A_{j,n}$ by
$$A_{j,n} = 2^j A_{0,n},$$
where $A_{0,n}$ and $k(n)$ are defined above.
FIGURE 11.5. Top left t o bottom right: Imo(r/2)1, Imo(r/4)1, Imo(r/8)l, and their product,
1t~'g~,(~)l.
Theorem 11.6. For each j E Z and n
E
z':
( a ) If n has even sequency, then k ( n ) = 2 k((n/2)) and i f n has odd sequency k(n) = 2 k((n/2))+ 1. Here (x) denotes the greatest integer less than or equal to z. ( b ) A,,, = A3-1.2n U AJ-l,an+l, where the union is disjoint. If n has even sequency, A , - 1 ) 2 , is h e lefL / ~ u l f oAf J , n , and 2f 12 has odd sequency, A , - 1 . 2 , is the right half of A,,,,
Proof: (a) The proof is by induction on n. If n = 0 , then n has even sequency and Ao.o = [0, 112). Thus k(0) = 0 = 2 k((012)) = 2 k(0). If n = 1, then n has odd sequency and A"," = [1/2,1). Thus k(1) = 1 = 2 k((1/2)) = 2 k(0) + 1 and the theorem holds for n = 0, 1. Suppose that n > 1 has even sequency and that m = (7212) < n also has even sequency. By Theorem 11.3, it must be true that (why?)
with
11.2. Localizat.ion of Wavelet Packets
343
FIGURE 11.6. Top left t o bottom right: ( m l ( y / 2 1), ( m o ( y / 4 )1, ( r n 0 ( ~ / 8 ) 1 , and their product, J w h L( 7 )I.
+
and 2Ao,, = [k(m),k(m) 1) with k(m) even by the induction hypothesis. Since mo(712) = 1 if y E [& - 1/2, Q 1/2) for even integers !and 0 elsewhere, it follows that
with Ao,, = Ik(m), k(m)
+
+ 112) = [k(n)/2,(k(n) + 1)/2).
Thus, k(n) = 2 k(m) = 2 k((n/2)). The other three cases are similar: (1) n > 1 has even sequency and m = (n/2) < n has odd sequency, (2) n 2 1 has odd sequency and m = (n/2) < n has even sequency, (3) n > 1 and m = (n/2) < n have even sequency. The details are left as an exercise (Exercise 11.10).
(b) Since Aj,, = 2 AjPl,,, it is enough t o show that for all n E Z + , 2 A",, = U A0.2n+l By Definition 11.5,
If n has even sequency, then 2n has even sequency and
212
+ 1 has odd
344
Chapter 11. Wavelet Packets
FIGURE 11.7. Top left t o bottom right: lmo(y/2)1, lml (r/4)1,lrrlo(r/8)1, and their product, Iw;, (y)l. A
sequcncy. By (a), k ( 2 n ) = 2 k ( n ) and k ( 2 n
+ 1 ) = 2 k ( n ) + 1. Since
(b) follows for n,.Tho argurr~entis similar when n has odd sequerlcy (Exercise 11.11). Figure 11.8 lists the first 16 wavelet packet indices and their nominal frequencies. By cornparing these frequencies with the graphs given in Figures 11.2 and 11.3, we see that the greater the nominal frequency, the larger the r~urriberof zero-crossings of 21in(:c) per unit length. The reader is asked to explore this relationship in Exercise 11.14.
Exercises Exercise 11.7. Prove Theorem 11.2.
11.2. Localization of Wavelet Packets
n
sequency
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
even odd odd even odd even even odd ocld even even odd cvcrl odd odd even
k(n)
345
Ao.??
FIGURE 11.8. The first 16 wavelet-packet indices and their nominal frequencies.
Exercise 11.8. Conlplete the proof of Theorern 11.3 by considering tlie case wlien n is odd. 7nG(r). Exercise 11.9. Verify tlie fornlul;~sgive11 above for v~"(a), ~rlyx)). and 7u7(.c) .
Exercise 11.10. Complete the proof of Tlleoreal 1I.G(a) by consitlering the remaining three cases. Exercise 11.11. Cornplete the proof of Theorem 11.6(t)) by considering the case in which n has odd seqliency. Exercise 11.12. Write a h2ATLAB program that cornputes k(r2) and one that computes k-l (n). Exercise 11.13. Complete the proof of Theorern 11.6 by considering the case in which n has odd sequency. Exercise 11.14. (a) For several wavelet packets with conlpact support, compare the nominal frequency of wn(x) with the number of zero-crossings per unit lengtli (that is, for each w n ( z ) , count the number of times its graph
346
Chapter 11. Wavelet Packets
crosses the x-axis and divide this riuinber by the length of its supporti~ig interval).
(b) Conjecture a relationship between the ill~rnberof zcro-crossings per unit length and the nomina.1 frequency of a wavelet packet. (c) Write a RIATLAB prograrn that takes as input a finite scaling filter h ( k ) a d a11 integer n and returns tlie number of zero-crossings per uiiit length of the wavelet packet uln(x)corresponding to that scaling filter. (d) Check whether your corijecture i11 part (b) persists for large scalirlg functions with more and more vanishing rnoments.
71, arid
for
11.3 Orthogonality and Completeness Properties of Wavelet Packets Since U I ' ( Z ) = p(x) and llll(.r) = 1/1(.7:), a simple restateirierit of tlie sccond part of Theoreni 7.35 with J = 0 is that the collectiorl
is an ort,tlonormal basis on R. It is also true tllal { t ~ ( ; ' , ~is )an~ orthonornial basis on R (Theorem 11.19). Thcre may bc other collectiorls of scalcd and shifted wavelet packets that form orthonorrnal lxises on R. Tlie goal of this section is to dcterlrliiie exactly wlic.11 this takes place. The solut,ioxl is closely related to the properties of the intervals {A,,>,,) defirietl in Definition 11.5. To each sllcli iritcrval is associated a sut~space Wl.,, as follows.
Definition 11.15. Gzven j E Z , and n E Z' , define
E Z, the silbspax:e WJ,l= Wj, where WJ is the wavelet subspace defined in Definition 7.47. Note also that WoSo = Vo (see Defiriitioli 7.12(e)). As rnentiorierl above, the collection
Remark 11.16. (a) For eacli j
is an orthonormal basis on R.Note further that thc, collection of intervals { A j , l .Ao.o),lEz+.k,z is a disjoint partition of [0, m).
(b) By Theorem 11.19, tlie collectio~i{ u ) $ , ~E Z) +~. k E Z is a11 orthonorinal basis on R. Note that { A o ~ n ) n G zis + a disjoint partition of [ O , m). Also, by Corollary 11.20, for each fixed j E Z, {wZk. ( z ) ) ~ ~ is ~ an + orthonorrnal , ~ ~ z basis on R and {Aj*,),Ez+ is a disjoint partition of [0, cm).
~ ~ ~ + ~
In light of Remark 11.16, we can formulate at this point a correspondence between disjoint partitions of [0, m ) by intervals of the form A,,, and orthonormal bases on R consisting of the functions {wzk(x)}j,,tZ+ ,k tZ. The remainder of this section is devoted to making this correspondence explicit.
11 3 1
Wavelet Packet Bases with a Fixed Scale
Theorem 11.17. o n R.
The collection { w : ~( x ) ) , ~ ,k,z ~ + is a n orthonor7nal system
Proof: Since
{w;~, ' u I ~ ~ @= ) {w$,~-@, ~1;;'~) for any n,, rn E Z f , k , !E Z, it will be enough to show that for all k t Z and n,, m E Z+, ( 11.3) ( U I ? . ~ ,wCo) = d(m - n,)6 ( k ) .
The proof of (11.3)is by induction on n and m. If n = 0, then ( i r ~ : , ~t , r ~ = ~ ~ (q~~,~:, p) = h ( k ) by t,he ort,l~onorrnalityof the scaling function. Given 7n > 0, assurrie that {w;,~, w : . ~ )= (Sit') SIX:) for all k E Z, 0 ':O < m. If m is even and m > 0, then m / 2 > 0 arid by the induction hypothesis,
If m is odd and m > 1, then (m- 1 ) / 2 > 0 and by the irldlictiori liypothesis.
Similarly, if m
=
1, then (m - 1 ) / 2 = 0 and by (8.13),
~ }
348
Chapter 11. Wavelet Packets
Thus, (wZ;L,, wi>,) = 6 ( m )S ( k ) . Let n t 'N be given, arid assume that for all rn t Z + , k t Z, (tuck, w e ~ ,=~ ~ ) 6 ( t - m) 6 ( k ) for all 0 5 /. < n. If m and n are even and m > n, then m / 2 > n / 2 and b y the induction hypothesis, (w:d2, w:,f) = 0 for every !E Z and so
If r , and n are both odd and if m tlic ~ e s u l follows t similarly sirice
,,,,
> n, then (m - 1 ) / 2 > ( r ~ 1 ) / 2 and -
( I 2 :md (u10 . a, ( 1 1 1 ) / 2) = 0 by the irldriction hypothesis. If nL is even and n is odd a.nd i f r n n,, then m / 2 > (n 1 ) / 2 and b y the iriduction liypothesis,
,
-
Finally, if r n is o d d and 72, is even with rrl > n , then either (711 - 1 ) / 2 > n./2, in which case, ( u ~ ; 'a,;,,) ~ , = 0 as above; or ( m - 1 ) / 2 = n / 2 . in wllicll case,
Corollary 11.18. For each fixed j E Z , the collection a71
O T ~ L ~ L O T L O ~ ~ I I LsCyLs~ t e m on
{
W
~
~
(
X
)
)
is~
R.
Proof: Exercise 11.27.
We now prove completeness of the systems defined in Theorem 11.17 and Corollary 11.18.
~
~
,
~
~
~
+
11.3. Orthogonality and Completeness
Theorem 11.19.
The collection
{ W ~ , ~ ( X ) ) ~ ~ ~is, a , n~ orthonormal ~ +
349 basis
o n R.
Proof: Since orthogonality was proved in Theorem 11.17, it remairis only to show cor-npleterless. We will do this by proving that for each J t Z f ,
(see Exercise 11.28). Since 1
{'1CI~,k)kcz= { w , J , ~ ) ~ E z is a11 orthonormal basis for W J (Remark 7.49(a)), it is enough to show that for each m E Z,
The proof of (11.5) is by induction on J . If J = 1, then n = 1 and clearly
Suppose that (11.5) holds for J - 1. By the induction hypothesis and the orthogonality of { w $ , ~ ) ~ ~ ~ , ~ ~ ~ Z + ,
so that
Note also that by definit,io11,for any 1E Z , ~ ~ ~ ( 2 ) = ~ h ( ~ - 2 C ) w and ; ~ ( x w;",+'(z:) ) =~g(p-2e)ur;lP(x) P
'2
Therefore, by the QMF condition (8.14), for any k E Z,
350
Chapter 11. Wavelet Packets
Thus, for each k E Z , (
x
)
t
span{w$(r), wo2n+1 ( L . ) : e E Z , 2J-2 j n < 2J-1}
=
~ p a n { w F ~ ( x )E : k~ , 2 ~ - n ' < ZJ).
<
Corollary 11.20.
Given j E Z , the collection
is a n or-
thonormal basis o n R.
Proof: Exercise 11.30.
I I . 3.2
Wavelet Packets with Mzxed Scales
Lemma 11.21. For each j E Z ,
2n+l
and { ~ ; 1 1 ~ , ~ (wJP x),
(x))ktz
is a n orthonormal basis for W J ,.,
Remark 11.22. (a) Recall that
and
m5-1.2,,+1
= span{u(lti(r):
k
E
Z}.
In this sense, Lemma 11.21 can be interpreted as saying that the suhspace Wj,.,Lis "split" into the subspaces WJ-1,2n and Wj-1.2n+l. In notation analogous to that used for subspaces of finite-dimensional spaces, we write
(b) By C~rolla~ry 1 1 .I 8, t,he si~bspacesWj-l,2n and Wj-1,2n+l are orthogonal subspaces meaning that if f E Wj-1,2n and g E Wj-1,2n+lr then ( f , g ) = 0 (Exercise 11.31). Thus we say that the splitting (11.6) is an orthogonal splitting.
Proof of Ler1111ia11.21: It follows froin Corollary 11.18 that
is an orthonormal system on R. It remains to show that
11.3. Orthogonality and Completeness
351
and that
(x),ur:",:(x):
-{w;",~
k E Z}
c W,,,,
Since by definition
Wj,n = S @ Z f i { ~ j n , ~p ( E ~ )Z}, : in order t,o show the former inclusion, it will suffice to show that for each
P E Z, 2n+l
(11.7)
w ~ , ( x )t span{w:'ll,k (x),wl-l,k(x) J ~ E z , and for the latter inclusion to show that for each p E Z,
(
x
)EW
and that
w ~ ~ T ~G ~W( jx, n).
(11.8)
By definition,
'"
"j-1,~
h(r - 2p) wj",,(x) and
(2) =
r
ur;?;
( 7-
(x) =
-2
~w)T r ( x )
7-
and (11.8) follows. Equation (11.7) follows from the QMF condition (8.14) by
Lemma 11.23. Let A,,, n A,/,,I = 0 for some j , j' E Z , n , n' E Z + . T h e n the subspaces W,,, and W,/,,I are orthogonal; that is, zf f E W,,, and g E W,I,,I, then ( f , g ) = 0.
Proof: By Theorem 11.19, it, will he enough to show that for all k , k' E Z ,
If .j = j',then in order that Aj,, n A,!,,/ = 0, it must be true that n in which case, (11.9) follows from Corollary 11.18. Otherwise, we may assume that j > j'. Then (11.9) reduces t o
# n';
352
Chapter 11. Wavelet Packets
Hence it is enough t o show that for all j E N, k E Z, and n , n' E Z t , if Ai,n f' Ac,r = 0, then ( w , w ) = 0. By applying the result of Lemma 11.21 j tirries, we see that
Since by Theorem 11.6(b), Aj,,, = " P2Ji17"-'~ose, = ~nJ it follows that All.[ n AO,,,l= 0 for all 2jn O < 2j+'n. Therefore, since w,: t W],,, and
<
Theorem 11.17 implies that
and (11.10) follows.
Theorem 11.24. fo7 j ,
TL
E Z',
Suppose that P its a collection of intervals of the form A,<,, that f o r m s a disjoint partition of [0,a); that is:
( a ) I f I , J E P with I
# J , then I n J
=
@,and
T h e n the collection {~;lk(x):
E Z,
E P}
i s a n orthonormal basis o n R.
Proof: That {W;,(X): k c Z, Aj,n c P } is an orthonormal systcm follows from Lemma 11.23. To see that the collection is complete, not,e that by Theorem 11.19, it is enough t o show that for each n E Z + , Wo,,,c sp;tn{wit,(z): k E Z , A,,,
E
P)
and since {w:,, (x)}kEz is an orthonormal basis for Wo,, , it is enough to show that for each n E Z+ and k E Z ,
11.3. Orthogonality and Completerless
353
Since A,., t P implies that j > 0. then for each n E Z + , there is a unique A,,, E P such that A",, Aj,p, arid note that we must have 2,ip n < 2j+lp as well. But since
c
<
lVj,,
= span{ul~,,(:r):k t
Z, 2Jn 5 P < 2ji1n),
it follows that for all k E Z.
( r ): ,n E Z ) w:, (x)E WJ,p = span{~u1?,~ C W { w , " :,, (a:): rn E Z. A,,, t P ) sirice Aj.p E P. Hence
{ U I ~ ~ , ( . Z )k:
E
Z , A,,,,
E
P ) is complete on R.
Corollary 11.25. I J P is a collection, of intervals of the ,form I,<,,. for j E Z , j > .I, 71 E Z', wh,ich fomns a disjoint partition of [O, XI), then the collectzorl
is an or*thonormal basis o n R.
Proof: Excrcisc 11.32.
Exercises Exercise 11.26. Provo tjha,t ( u ! ( ' ;~, ~1.2 ,=~ S(k) ) for all k E Z. Exercise 11.27. Prove Corollary 11.18. Exercise 11.28. Explain why (11.4) is sufficient to prove Theorerii 11.19. Exercise 11.29. Prove that ill fact equality holds in (11.4); that is, prove that
Exercise 11.30.
Prove Corollary 11.20.
Exercise 11.31. Prove that for any j E Z and n E Z + . W,-1.2,1+1are orthogonal subspaces (see Remark 11.22(b) for a definition and a hint). Exercise 11.32.
Prove Corollary 11.25.
354
Chapter 11. Wavelet Packets
11.4 The Discrete Wavelet Packet Transform (DWPT)

As with the DWT, we wish to define and interpret wavelet packet coefficients for discrete and especially finite signals.
11.4.1 The DWPT for Signals
Suppose we are given a signal $c_0(n)$. As with the DWT, we interpret this signal as the scaling coefficients of some underlying function $f(x)$, $L^2$ on $\mathbf{R}$; that is, $c_0(n) = (f, w^0_{0,n})$. According to Definition 11.1, for each $j, k \in \mathbf{Z}$ and $n \in \mathbf{Z}^+$,
$$(f, w^{2n}_{j,k}) = \sum_m h(m - 2k)\,(f, w^n_{j+1,m}) \tag{11.11}$$
(see Exercise 11.36). Hence we define the DWPT for $c_0(n)$ as follows.
Definition 11.33. Given a signal $c_0(n)$, the DWPT of $c_0(n)$ is the collection of sequences $d_j^n = \{d_j^n(k)\}_{k\in\mathbf{Z}}$ for $n \in \mathbf{Z}^+$, $j \in \mathbf{N}$ defined by
$$d_j^{2n}(k) = H d_{j-1}^n(k), \qquad d_j^{2n+1}(k) = G d_{j-1}^n(k), \tag{11.13}$$
where $c_0(n) = d_0^0(n)$. The DWPT is inverted by means of the formula
$$d_{j-1}^n = H^* d_j^{2n} + G^* d_j^{2n+1}. \tag{11.14}$$
The D WPT for Finite Signals
As with the DWT, there are essentially two ways to deal with finite signals. (1) Zero-padding. Here we make our finite signal an infinite signal by padding with zeros infinitely in both directions. As with the DWT, we can assert that if CO(TL) has length hf = 2N, that is, c O ( n ) = 0 if n < 0 or n 2 M, and if the scaling filter h(n) satisfies h(n) = 0 if n < 0 or n L for sorrie even integer L > 2, the11 l l ~ eseyuerlce d;L(k) will have length at least 2 N p j (1 - 2 - 9 (L - 2).
>
+
(2) Periodization. Here we assume that co(n) is a period M = 2N sequence. Then the DWPT is defined as in Definition 11.33. In this case, each d y ( k ) is a period
11.4. Discrete Wavelet Packets
355
2-JM = 2N-i sequence so that it is only necessary t o store dy(k) for k = 0, 1, . . . , 2 - j M - 1. Also note tha,t, t,hs depth of the wavelet packet tree can be at most log2( M ) = N. Therefore a total of M log,(M) = N 2N wavelet packeL cueficierlts will be kept for a length &Isignal. The DWPT as a Linear Transformation We can think of a period M = 2N sequence co(n) as an M-vector
as in Section 8.3.3. Since each sequence dy(,k) has period 2-JM we can think of d y ( k ) as a 2-W-vector
=
2N-J,
Since for every j and n,
where the matrices W, are defined in Section 8.3.2. Since W2-1nlis an orthogonal 2-3 M x 2 - W matrix, it follows that for each 1 5 j 5 N and 0 <_ n < 2% tliere is an orthogunal 2-Jhl x h f rrlatrix that takes co into dy. In other words, for each 0 n < 2" there are 2-Jib1 orthogonal vectors, call them v;:,, 0 5 k < 2-jM such that
<
d''( k ) = (v:,., It is easy to compute the vectors
c0).
(11.15)
~ 7 ,Simply ~ . set dy(/c) = 1 and all other
2.' - 1 ] to zero. Then reconstruct elements of the vector dj = [dy 1 d$ 1 dl the vector co using (11.14). This vector will be v z k . AnoLher rr~etl~od for computing and working with the vectors vTk involves the following Lemma.
Lemma 11.34.
356
Chapter 11. Wavelet Packets co(M - I)] be an arbitrary vector in R".
Proof: Let co = [co(0)co(l) By (11.15),
M-1
dT(k) = (v;,,cO) =
1 vj:,(!) e=o
co(P).
(11.16)
By definition, d?Tl ( m ) = (Hdy)(m) =
x
h(k
-
2m) dy ( k ).
(11.17)
k
Again, by (11.15),
Combining (11.16)-(11.18) gives
and (a) follows. (b) is proved similarly, and (c) is proved by applying the same argument to the identity
The Discrete Wavelet Packet Subspaces
< <
<
For each 1 j N and 0 n < 2i, we can define a subspace of RAf corrcsporlding to the wavelet packet subspaces of Definition 11.15; let us define 2N-j-1 w3,n= span{vzk}k=O Then each Wj,, is a 2N-j-dirnensional subspace of R". In analogy with Lemma 11.21, we can prove the following 1,emma..
Lemma 11.35.
For each 1 < j 5 N and 0 5 n
,,,,
< 2J,
2n+1 2N-J-1 is an orthonormal Proof: We must show that {v:; v,+,,,}~=, basis for W,,,. To this end, note that given k and k': Lemma 11.34(a) and
11.5. The Best-Basis Algorithm
357
(b) imply that
2N-3-1
is an orthonormal system and by the QMF conditions. since { v ; , ) ~ = ~ 2N-1-1 -1 2 r ~ + 1 2N-3-1 Since {v:;l,k}k=O a#n' ivl+l,k)k=o are each orthonormal sys2n+1 2N-~-1-1 . terns, we have shown that vj+, ,k}i=O is an orthonormal sys-
{~:n+~,~,
tem. To see that this system spans Wj,,,, it is enough to show that
But this Iullows immediately from Lemrna 11.34(c).
Exercises Exercise 11.S6.
Derive Equations (11.11) and (11.12) from Definition 11.1.
11.5 The Best-Basis Algorithm

11.5.1 The Discrete Wavelet Packet Library
2N) cg has M log,(M) = N 2 N DWPT coefficients, {dy(k): 1 j N , 0 5 n < 21, 0 < - I; < 2 N - i } , each of which corresporlds to the dot product of co with a vector v,",, by (11.15). The collection of vect,ors
A given M-vector (M
=
< <
is referred to as the discrete wavelet packet library for R M .Since the number of vectors in V ( M log2(M)) exceeds the dimension of thc spacc (Ill),the vectors in V do not form a linearly independent set. However, it is possible to choose certain subsets of V that form orthonormal bases of RM,as the following examples show. Each such orthonormal basis is called a discrete wavelet packet basis for R".
Example 11.37. the vectors
(a) Suppose that for some fixed 1 5 J 5 N , we choose 1
0
{v,,,, v , , : 1
< j < J, 0 5 k < 2N-j}.
(11.21)
358
Chapter 11. Wavelet Packets
The corresponding coefficents of a givcn vcctor c o arc indicntcd in thc tree diagram in Figure 11.9. This is clearly the DWT of co, and hence the collection (11.21) is an orthonorrnal basis for R".
(b) For a fixed 1 5 J
< N, we can also select the vectors
The correspoilding coefficents of a given vector co are indicated in the tree diagram in Figure 11.10. It is clear frorn the tree diagram that co can be reconstructed from its coefficients {dy(k):O n 5, '2 0 5 k < 2 " ~ ~ ) and that there are exactly M vectors in the collection (11.22). Hence it represents a basis for RM.The fact that the vectors are orthonormal follows from the fact that each pair of vectors v ; , ~ ,v?:~,either sit in the same subspace WJ,, (if n = n') or in orthogonal subspaces Wj),rL, WJ,m,for some j J and m # rn'.
<
<
FIGURE 11.9. The wavelet packet basis of Example 11.37(a).
In analogy with Theorem 11.24 and Corollary 11.25, we can characterize all discrete wavelet packet bases for R~ in terms of dyadic partitions of the interval [O, 112) consisting of intervals with lengths no less than 2TNp1.
Theorem 11.38. Let P be a partition of [O, 1/2) chosen from the dyadic intervals
{A-,,,: 1 5 3 5 N , 0 5 n < 2". T h e n t h e collection
11.5. The Best-Basis Algorithm
359
FIGURE 11.10. The wavelet packet basis of Example 11.37(b). is a n or.thonorma1 basis f i r R".
Proof: First, since (A-j,,,(= 2-J-'
and since
P is a partition of
[U, 1 / 2 ) ,
Since there are 2-.JAf vectJors in V p associated with each APj,,,,,thc total number of vectors in V p is
Thc proof that the collectioi~V p is urtl~ogonalis left to the reader but follows from Lemma 11.34 and a proof analogous to that of Theorern 11.24 (Exercise 11.42). Theorem 11.38 allows us to count the rlunlber of discrete wavelet packet bases for a given M = 2N. Specifically, let P ( N ) be the number of such bases for R ~Then ~ clearly . P ( 1 ) = 2 since there are exact,ly two dyadic partitions of [O, 112) with intervals of length not less than 1 / 4 , namely
360
Chapter 11. Wavelet Packets
= {Ao,o) and P2 = {A-l,07A-l,l). For N > 1, note that any dyadic partition of [0, 112) will either be {Ao,o) or else will be the union of a dyadic partition of [0, 114) and one of [1/4,1/2). Since there are P ( N - 1) dyadic partitions of [O, 1/4) and of [1/4,1/2) with intervals of length not less than 2-N-1 , we have the recursion formula
$P_1 = \{A_{0,0}\}$ and $P_2 = \{A_{-1,0}, A_{-1,1}\}$. For $N > 1$, note that any dyadic partition of $[0, 1/2)$ will either be $\{A_{0,0}\}$ or else will be the union of a dyadic partition of $[0, 1/4)$ and one of $[1/4, 1/2)$. Since there are $P(N-1)$ dyadic partitions of $[0, 1/4)$ and of $[1/4, 1/2)$ with intervals of length not less than $2^{-N-1}$, we have the recursion formula
$$P(N) = P(N-1)^2 + 1.$$
FIGURE 11.11. The number of wavelet packet bases of R".
11.5.2
The Idea of th,e Best Basis
Exercise 11.41 says that there are more than 2"12 discrete wavelet packet bases for R". The goal of this subsection is to consider the problem of finding the discrete wavelet packet basis that "best fits" or is "best adapted to" a given vector c o . We need to be more precise about what this means. Intuitively, we would like to say that an orthonormal basis is well adapted to a vector if the vector can be accurately represented by just a few of its coefficients in that basis. For definiteness, let us assume that our vector c o is normalized so that llcoI = 1. The best possible fit of an orthonormal basis to c o will occur when c o is one of the basis vectors. In this case, exactly one of the coefficients of c o in this basis will be 1 and all the rest will be 0. Now consider the case when c o sits in a subspace of R" spanned by, say, three of the vectors in an orthonormal basis, call them v l , v2, and vg. Then Cg = Q1 V1
+
Q2 V2 r ' Qy V y
11.5. The Best-Basis Algorithm
361
with cuf + a; +a: = 1. This is still a very efficient representation of co, but we would like to be able to find some way to say that the first representation, with only one nonzero coefficient, is "better" than the second, with three nonzero coefficients. In order to do this, we define a cost functional M tliat can be thought of as a way to measure the "distance" from a vector to an orthonormal system in Rhf. The way this works is as follows. M is a function that maps a vector c and an orthonorrnal system B = {bj) to a nonnega,t,ivereal number. Typically, M (c,B)will be small if the vector c is well represented by just a few of its coefficients in the basis B. For the purposes of the bestbasis algorithm, we will ask that tlie cost f~nct~iorlal M satisfy a mildly restrictive but very powerful additivity condition. ( a ) A function M is a n additive cost functional if there is a nonnegative functzon f ( t ) o n R such that for all vectors c E R" and orthonomnal systems B = {b,} C R",
Definition 11.39.
(b) G i v e n a vector c E Rn',a n addztzve cost functional M , and a finite collection, B, of orthonormal systems in R", a best basis relative to M for' c i s a system B E B for which M ( c , B ) is minimized. Although it i s n o t required by the definition, for the purposes of the best-basis algorithm, w e will alwr~ysm,ake t h e assumpkion that all of the system,^ in B have the s a m e span. I n other u!ord.s, each B E B i s a n orthonormal basis for the same subspace of R"' ( o r for all of R ~ ' ) .
Sorne exanlples of the type of cost functionals we will corlsider are given below.
(1) Shannon Entropy We define the Shannon entropy functional by
Entropy is a well-known quantity in information theory and is used a s a rrleasllre of the amo~xritof 11ricerta.intyin a probability distribution, or equivalently of the arnount of infornlatiorl obtained from one sample from the probability space. If the probability of the it11 outco~llei n a probability space corlsisting of P outcomes is pi, then the entropy of the probability distribution is
362
Chapter 11. Wavelet Packets
If, for example, pl = 1 and p, = 0 for i # 1, then the entropy of this distribution is zero. This is often interpreted as the statement that there is no urlcerLairlLy in llle outcome, or that no inforination is obtained from a single outcome. A probability distribution in which all outcomes are equally probable will result in high entropy, which is interpreted as high uncertainty of each outcome and that a large amount of information is obtained from each outcome. For our pixrposes, it suffices to note that if x is close to 0 or to 1, then the quantity x logx will be close to zero. Therefore, assuming that c is a unit vector in span(B), the entropy M ( c , B) will tend to be small if the coeficier~ts{ ( c ,bJ)}corlsisl of a few large coefficients (close to 1) and many small ones (close to 0). Note that there is no generality lost by assuming that c is a unit vector in span(B) because if not, just define PCto be the projection of c onto span(B) (which we assume will be the same regardless of which B E B is being considered; see Definition 11.39 above). Then
1 M / P c / l l P c , B) =
-M ( c , B)
IlPcll
+ log Ilpsl12
so that minimizing Pc/IIPcII over B is equivalent to minimizing c over 8. It is certainly possible that PC= 0; in which case, any basis from B will be a best basis. (2) Numbcr Abovc Thrcshold Here, for a given threshold value 0
< A, we define M by
In the context of signal or image processing, M measures how rnany coefficients are "negligible" (that is, below threshold) in a transformed signal or image and how many are "important." The more negligible coefficients, the lower the cost. (3) Sum of pth Powers Fix some p
> 0, and define
If p = 2, then for any vector c and orthonormal system {bj),
11.5. The Best-Basis Algorithm
363
Hence this measure is of no value in best-basis selection if p = 2, since llPcI is always the same no matter which system B E B is chosen. If p >> 2, then I(c, bj)IP will tend to be much smaller than ( c . bj)l if (c, bj) is close to zero, and hence M ( c , {bj))will tend to be small if the coefficients {(c, b,)) consist of a few la.rge coefficient,^ (close to 1) and many small ones (close to 0). (4) Signal-to-Noise Ratio (SNR) This cost functional is a combination of (2) and (3) when p = 2. For a given threshold value A, define
This is a direct measure of the rneari-square error erlcountered when the sniall (meaning below tliresholtl) coefficients are discarded arid the signal or irriage is reconstructed lisirig only the large (above threshold) coefficients. Typically, SNR is measured in decibels (db) arid is sorlietinies given by SNR
=
-10 loglo(M(c/llc~ll, {b,))) dl).
<
Since M(c/llcll, B) 1 for any vector c and B E B and since - loglo(x) is a decreasing, nonriegative furlctiorl for 0 < x < 1, niiriirrlizing M ( c , B) over t3 is equivaleilt to rliaxirriizirlg SNR over 8.
1 I . 5.3 Description of the Algorithm The best-basis algorithm is a divide-and-conquer strategy for finding the best basis for a given vector in R" relative to a given cost functional M from among the P ( N ) (here A4 = 2N) possit)le wavelet packet bases. Since P ( N ) can be very large even for relatively srrlall N (see the table followiiig (11.23)), it is not feasible to exhaustively search all such bases to find the one minimizing M. The algoritlim described here uses the lree slruclure of the DWPT and the additivity of the cost functional to avoid this exhaustive search. The Inlportance of Additivity Each example in Section 11.5.2 is an additive cost functional. For example, for the Shannon entropy functional, f (x) = x log ( l / x ) and for the Number Above Threshold functional, f ( x ) - X[x,,)(x). The key to understandirig the best-basis algorithm is the following observation. Suppose that B1 is a collection of orthonormal systems, each of which spans the same subspace S1, and B2 is a collection of orthonormal systems, each of which spans a subspace S2 orthogonal to S1. Let B be the
364
Chapter 11. Wavelet Packets
collection of ortlionormal systems that is the union of a system from B1 and one from 232. Finally, let c be a vector in R ~ ' .Then given B1 E B1 and B2 E B2,
Equation (11.28) can be interpreted as saying that that cost of representing c in B1 U B2 is the sum of the separate costs of representing c in B1 and in B 2 . This is trivial to verify in light of Definition 11.39(a) but is remarkably powerful. This is illustrated in the following Lemma.
Lemma 11.40. If BI E 231 i s the best basis forc relntiue to M in B1 and if Bz E B2 is the best basis for c relative to M i n 232, then B, U Bz is the best basas for c relative to M in B.
Proof: The proof is by contradiction. Suppose that Bi U B&E B is a lower cost basis than B1 U B2.By (11.28), we would then have
Hence it must be true that either M (c, Bi ) < M (c, B1)or M (c, Bh) < M ( c ,Bz) (or both). But this contradicts the assumptiori that B1 and B2
0
were both best bases. Lemnia 11.40 says tllal in order to find the best basis for c in 23, it is enough to separately find the best basis for c in B1 and B2. Wlla,t, is required is that the subspace spanned by the bases in B1 be ortliogorial to the subspace spanned by the bases in B2. This is why a divide-and-conquer strategy works in finding the best wavelct packet basis for a finite signal. The Algorithm Given c E R~ and a wavelet packet library V as in (11.20), let B,,, denote the best basis for c chosen from among those ortliononnal systems that are subsets of V and that span W , , , , and let m j , , , be the cost of representing c in this best basis. In what follows, we will assume that M ( c , (bj}) = f (l(c,bj)l). The best-basis algorithm is as follows:
c:L,
(1) Compute the full DWPT for c down to the desired level J 5 N.
(2) For 0 _< n < 2*', initialize
B J .= ~ {v~.k}::;.'-l
11.5. The Best-Basis Algorithm
365
and
(3) For j = J-1, J - 2 , . . . , Odo For n = 0 , 1, . . . , 2.7'- 1 do
otherwise
( 4 ) Bo,0 is the best wavelet packet basis for c relative to M, and mo,o is the cost of representing c in Bo,o. Example of the Algorithm
In the following example, we will use the best-basis algorithm to compute the best wavelet packet basis for a chirp signal. First, we take as our signal the function sin(40t2) on [0, 11. This is an example of a linear chirp and is shown in Figure 11.12.
FIGURE 11.12. The linear chirp sin(40t2) on [ O , l ] .
We apply the best-basis algorithm as follows. (1) Compute the full DWPT down to level J = 3 for this signal using zero-padding, and calculate the quantity X I ,f (ldT(k)l) for each 0 5 j 5 3
366
Chapter 11. Wavelet Packets
and 0 5 n < 23, where f (x) Figure 11.13(left).
=
-x log2(x). The results are shown in
(2) lnitialize the entropy values ms,,, 0 5 n <: 7 to be the values on the bottom row of the tree. Initialize the best-basis at level 3 to be the basis vectors corresponding to the bottom row of the tree. That is, B3,n= {v;,~) for 0 5 n 5 7. This is shown in Figure 11.13(right).
FIGURE 11.13. Left: Calculated Shannon entropy for the full DWPT of the linear chirp. Right: Initial best-basis for the linear chirp.
+
(3) Fix j = 2. For n = 0, observe that 4.1 3.4 < 8.9. Update m2,o = m3," m3~1= 4.1 3.4 and B2,o = 83,0 U B:3,1. For n = 1, observe that .67+2.1 < 3.9 so that r n 2 , 1 = m 3 , 2 + m 3 , 3 = .67+2.1 and B2.1= B3,2UB3,3. For n = 2, observe that .18 .39 > .5 so that m 2 , 2 = .5arid B2,2= ( ~ 2 2 . ~ ) . Siniilarly, since .33+ .33 > .47, m2,s = .47 and B2,3= { v : , ~ }The . updated entropy vahies and the updated best-basis are shown in Figure 11.14.
+
+
+
+
(4) Fix j = 1. For n = 0, since 7.4 + 2.8 < 17.3, let m1.0 = m2$0 m2,l = 7.4 2.8 and Bl,o = 8 2 , " U B 2 , For ~ n = 1, since .5 .47 < 1.02, let ml,l = m 2 , 2 + m 2 , 3 = .5 + .47 and BIt1= B2,2U B2,3.The updated entropy values and the updated best-basis (which actually has not changed) are shown in Figure 11.15.
+
+
+
+
+
(5) Fix j = 0. Since 10.2 .97 < 28.5, let mo,o = rn,l,o m131= 10.2 .97 and Bo,o= BlIoU B1,l.This basis is the best-basis and its entropy is equal to m o , The ~ final entropy value and best-basis a.re shown in Figure 11.16.
Exercises Exercise 11.41. Prove that for N and P ( N ) = P ( N - 1)2 1.
+
> 1, P ( N ) > 2 2 N - 1 ,whew P ( 1 ) = 2
11.5. The Best-Basis Algorithm
367
FIGURE 11.14. Updated entropy values and best basis at level j = 2 for the linear chirp.
FIGURE 11.15. Updated entropy values and best basis at level j = 1 for the linear chirp.
Exercise 11.42. Complete the proof of Theorem 11.38.
368
Chapter 11. Wavelet Packets
FIGURE 11.16. Final updated entropy values and best basis for the linear chirp.
Part V
Applications
Chapter 12 Image Compression The purpose of this chapter is to present some of the basic concepts behind image coding with the wavelet transform. There are many excellent expositions of the theory and practice of image and signal con~pressioriusing wavelets, and the reader is encouraged to consult those references for more information. The goal here is to give the reader enough information to design a model wavelet-transform image coder. A typical black-and-white image is an hf x Ad array of integers chosen from sorne specified range, say, 0 through L - 1. Each elenlent of this array is referred to as a picture element or pixel, and the value of each pixel is rcfcrrcd to as a grayscale value and rcprcscnts thc shadc of gray of the given pixel. Usually a pixel value of 0 is colored black, and L - 1 is colored white. In this chapter, we will assume for simplicity that h1 is some power of 2, usually 256 or 512. If M = 256 (hence 65536 pixels) and L = 256 (hence 8 bits per pixel), then the storage requirements for an image would be 256 x 256 x 8 = 524288 bits. The goal of image compression is to take advantage of hidden structure in the image to reduce these storage requirements. Any transform coding scheme consists of three steps: (1) the Transform Step, (2) the Quantixation Step, and (3) the Coding Step. (1) The Transform Step. In this step, the image data are acted on by sorne invertible transform T whose purpose is to decorreelate the data as rrluch as possible. This means to remove rediindarlcy or hidden structure in the image. Such a transform usually amounts to computing the coefficients of the image in some orthonormal or rlorlorthogonal basis. Because any such transform is exactly invertible, the transform step is referred to as lossless. See the can (2) The Quantization Step. The coefficients calculated in the transform step will in general be real numbers, or at least high-precision floatingpoint numbers, even if the original data consisted of only integer values. As such, the number of bits required to store each coefficient can be quite high. Quantization is the process of replacing these real numbers with approximations that require fewer bits to store. This "rounding off" process is rlecessarily lossy, meaning that the exact values of the coefficients cannot be recovered from their quantized versions. In a typical transform coding algorithm, all error occurs at this stage.
(3) The Coding Step. Typically, most of the coefficients computed in the transform step will be close to zero, and in the quantization step will actu-
372
Chapter 12. Image Compression
ally be set to zero. Hence the output of Steps (1) and (2) will be a sequence of bits containing long stretches of zeros. It is known that bit sequences with that kind of structure can be very efficiently compressed. This is what takes place at this step.
The Transform Step 1 . 11
Wavelets or Wavelet Packets?
We have seen that wavelet bases are very good at efficiently representing functions that are smooth except for a small set of discontinuities. Any image that has large regions of constant grayscale (for example, a white or black background) can therefore be well represented in a wavelet basis. Hence a wavelet basis wit,h sufficient vanishing rnoments can be used effectively in the transform step. It is also possible to find the best wavelet packet basis for an image and use the expansion in that basis as the transform. The advantage of this approach is that the resultiug coefficients will be optimized relative to some appropriate measure of efficiency. For example, maximizing the number of coefficients below a given threshold is precisely what is called for in a transform coding scheme as described here. A clear disadvantage is that the best basis will depend on the image so that a description of which basis is used must be included in the overhead. Since for an M x M image, therc arc rnoto than 2"2/2 wavelet packet bases, at least ~ ~ bits/ are2required to specify the transform being used. This amounts to at least .5 bits per pixel in overhead costs. One solution to this problem that is especially effective when a large number of images with similar characteristics are being compressed is to compute a single basis well suited to the collection. The way this is done is as follows. First a representative subset { fi):=, of the images to be compressed is chosen. Then for a given cost functional M, the basis B is chosen t,ha,t,minimizes
The basis B is the ensemble best basis for the subset and is used to specify the transform to be used for compression. The best-basis algorithm is still applicable in this case; so this calculation is efficient. An example of a situation in which an ensemble best basis is used is in the compression of fingerprint images. The ridges on a typical fingerprint translate to rapid oscillations in pixel values; so it is not silrprising that a standard wavelet basis does not give the optimal representation.
12.2. The Quantization Step
12.1.2
373
Choosing a Filter
Another question to be raised in choosing the transform is which scaling and wavelet filters to use. There are several things to consider. (1) Symmetry. Symmetric filters are preferred for the reasons outlined in Section 10.7.3, namely that large coefficients resulting from false edges due t o periodization can be avoided. Since orthogonal filters (except the Haar filter) cannot be symmetric, biorthogonal filters are almost always chosen for image compression applications.
(2) Vanishing moments. Since we are interested in efficient representation, we require filters with a large number of vanishing nioments. This way, the smooth parts of an image will produce very small wavelet coefficients. Since, because of symmetry considerations, we are only interested in biorthogonal wavelets, it is possible to have a different number of vanishing moments on the analysis filters than on the reconstruction filters. Vanishing moments on the analysis filter are desirable as they will result in small coefficients in the transform, whereas vanishing moments on the reconstruction filter are desirable as they will result in fewer blocking artifacts in the compressed image. Hence sufficient vanishing moments on both filters are desirable. (3) Size of the filters. Long analysis filters mean greater cornputation time for the wavelet or wavelet packet transform. Long reconstruction filters can produce unpleasa.nt atifacts in the compressed image for the following reason. Since the reconstructed image is made up of the superposition of only a. few scaled and shifted reconstruclion filters, features of t,he reconstruction filters, such as oscillatioris or lack of smoothness, can be visiblc in the reco~istructedimage. Smoothness can be guaranteed by requiring a large number of vanishing moments in the recoristr~ictionfilter, but such filters tend to be oscillat,ory. Therefore, we seek both analysis and rccorlstruction filters that are as short as possible. The more vanishing moments a filter has, the longer that filter must be. Therefore there is a tradeoff between having lots of vanishing morrients and short filters. The 9/7 filter pair turns out to be a good conipro~llisearld is in fact the filter used for fingerprint compression.
The Quantization Step After the image has been transformed, we are left with an M x h1 array of coefficients that can be high-precision floating-point numbers. These values must be quantized or rounded in such a way that they take only a relatively small number of values. Quantization is achieved by means of a quantization map, Q, an int,eger valued step function. A simple quantization scheme called unzform scalar quantixation is defined as follows.
374
Chapter 12. Image Compression
(1) Supose that all of the coefficients in the array fall in the range [ - A , A ] , and that the number of quantization levels, an integer q (usually even) is specified. The interval [ - A , A] is partitioned into q equal subintervals [xO,xl),[xl, xZ): . . ., [xq-1, x q ) , where xo = - A and xi+l - xi = 2 A / q . (2) We define a quantization map Q ( x ) as shown in Figure 12.1(left). Note that the rangc of Q is thc set of q - 1 integers { - ( q - 2)/2, . . . , ( q - 2)/2}.
(3) A dequ~n~tizing function, Q-l, is specified as shown in Figure 12.1(right). Note that each integer value in the range of Q is mapped to the center of the corresponding interval in the partition with the exception that Q-l(O) = 0. There are other types of quantization, such as vector quantixtion and predictive quantization. More complete discussions of the theory of image quantization can be found in the texts listed in the appendix. The goal is to rninimize the quantization error or distortion in the transformed signal.
FIGURE 12.1. Left: Q ( x ) , right:
Q-'(2).
A hallmark of an effective transform for image coding is that rnost of the coefficients of a given iniage are small and hence are quantized t,o zero. If the quantization map Q ( x ) shown in Figure lZ.l(left) is used, then all coefficients less than 2 A / q in absolute value are quantized to zero. It is often desirable to specify an independent parameter or threshold X > 0 such that all coefficients less than X in absolute value are quantized to zero. There are two types of thresholding, hard and soft thresholding. The difference between them is related to how the coefficients larger than X in absolute value are handled. In hard thresholding, these values are left alone, and in soft thresholding, these values are decreased by X if positive and increased by X if negative. Specifically, we define a pair of thresholding
12.3. The Coding Step
375
functions as follows:
Hard and soft thresholding functions are shown in Figure 12.2. If thresholding is used, then the quantization map has the form Q o T (x), where T is either a hard or soft thresholding function.
FIGURE 12.2. Left: Thard (rc), right: TsOft (x).
The Coding Step Suppclse that the tra~lsforri~ed hl x M irrlage has bee11 yuar~tiaedin sudl a way that the data t o be cornpressed consist of a string of M' integers between 0 and r - 1, for some positive integer r. The idea behind coding this string of numbers is t o exploit redundancy in order t o reduce the number of bits required to store the string. A simple example of this idea is the following. Suppose that r = 4, M~ = 16, and the data to be compressed were written as
AABCDAAABBADAAAA (we have substituted the letters A, B , (7,D for the int,egers 0, 1 , 2, 3 for simplicity in what follows). Since there are a total of four distinct symbols
376
Chapter 1 2 . Image Compression
in t,he data, it is possible t o code each symbol with 2 bits or binary digits. We could do this as follows;
In this case, our data would read as
a total of 32 bits. On the other hand, observing that the symbol A appears far more often in the data than does any other symbol (A appears 10 times, B 3 times, C once, and D twice), we can compress the data by represerltirlg A with fewer bits and using more bits for the other symbols. For example, we could use the fullowing code:
Then the data would read as
a total of 25 bits and a savings of about 22%.
In the rema,inder of this subsection, we will present sorrie basic concepts of information and coding theory and introduce the concept of entropy of a symbol source.'
12.3.1
Sources and Codes
Definition 12.1. A symbol source is afinite set S
= { s l , s2,
. . . , s q ) together
<
with associated probabilities given by pi = P ( s i ) for 1 5 i 5 q. Here 0 5 p, 1 and - 1. T h e symbol source S i s interpreted as a "black box" that produces a stream of symbols from S accordzng t o the probabilities given by P . T h e probability that the black box will produce symbol s, is p,. A binary code, C , i s a finite set of jinitc length strings of 0's and 1 's. Each element of C i s called a codeword. A coding scheme is a one-to-one mapping f
xp,
his material is
adapted from Roman, Introduction to Coding and Information
Theory, Springer (1997).
12.3. The Coding Step
377
from S into C . Given a coding scheme, f , for the symbol source S , the average codeword length of f is g i v e n b y
Example 12.2. (a) Let S = { A , B , C , D), and let P ( A ) = 5 / 8 , P ( B ) = 3/16, P ( C ) = 1/16, and P ( D ) = 118. Consider the code C = (00, 01, 10, 11) and the coding scheme
The average codeword length for this coding scheme is
It makes sense of course in this case that the average codeword length would be 2 since each codeword has length 2. (b) Let's consider a different coding scheme.
The ACL for this coding scheme is
This scheme will tend to be more efficient in the sense that the coded version of a typical output of the source will be about 1.5625/2 - .78125 or about 78% as long as for the less efficient coding scheme. Suppose that we are given a message coded using the coding scheme in Example 12.2(b):
Note that no indication is given as to where one codeword ends and the next one begins. Nevertheless, there is only one way t o decipher this message using the given code. The first character, 0, must represent the symbol A
378
Chapter 12. Image Compression
since there is no ot,her codeword beginning with 0. The next character 1 can be the beginning of the codeword for either B, C , or D; however, the next two characters 10 can only represent B since the string 10 is not the beginning of any other codeword. Continuing in this fashion, it is possible to decipher without ambiguity the message as ABACCADBBACAAADB The relevant property of the code is that no codeword appears as the prefix for any other codeword. This property is referred t o as the prefix property and guarantees that every string of codewords can be uniquely deciphered, and moreover guarantees that each codeword can be deciphered as soon as i t is read. A code with this property is said t o be instantaneous. All examples of coding schemes in this chapter will have the prefix property.
1 . 3 . Entropy and Inform,ation Given any symbol source S, there is an intuitive notion related to the amount of uncertaznty in each output of S . For example, if S = {A, B) and P ( A ) = .99 and P(B) = .01, then because it is almost certain that the next symbol put out by the source is A, the source has very little uncertainty. However, if P ( A ) = P(B) = 1/2, then the source has much greater uncertainty. A related intuitive notion is the amount of informatton in the source. When P ( A ) = .99 and P(B) = .01, we learn very little about the source when an A is put out, and we learn much more about the source when a B is put o u t . On average, however, we will see an A being put out 99% of the time so that the average amount of information contained in a given output is very small. On the other hand, when P ( A ) = P(B) = 1/2, then we will on average learn more about the source from each output. Associated with a symbol source S is a number, H(S),called the entropy of the source, which is a way t o measure the amount of uncertainty or information in the source. The entropy of a source S is defined by
In order to see that this definition of entropy makes sense, we will list below a few common sense properties that any measure of uncertainty or information should satisfy, and show that H (S) satisfies these properties.
(1) A symbol source S for which P ( s i ) = 1 for some i and P ( s j ) = 0 for j # i has n o uncertainty, and the average amount of information in each output is zero. Since log2(1) = 0 and since lim,,o+ x log2(x) = 0, we define 0 . log, (0) = 0. Hence H(S) = 0 for such a source.
12.3. 'l'he Coding Step
379
(2) The source with the most uncertaintg is one in which each symbol is equally likely. In this case, P ( s i ) = l / q for all i and
It can be shown (though we will not show it here) that this is the maximum value that H ( S ) can take.
(3) Adding symbols to a source that has n o chance of occurring does not change the amount of uncertainty o r the average arr~ovn~t of in$orm,ation in, the source. In other words, if we add a new symbol s,+l to S and define P(s,+1) = 0, then the new source Sf = { s l , s2, . . . , s q ,s ~ + has ~ ) the same entropy as S. This clearly follows from the definition of H (S) and H ( S 1 ) . ( 4 ) If a pair of independent sources are putting out symbols simultaneously, then the information i n the pair.ed source is the s u m of the inform,ation in each source separately. Suppose that sources A = { u l , . . . , a,) arid B = {bl, . . . , bT) are indethat the output of A is a, and pendent in the sense that the the probability that the output of B is b:, is the ~ r o d u c tP ( a , ) P ( b 7 ) .We define a new source A B = ( a z b j ) l l i l q ; l l j l r with P ( a i b j ) = P ( a j i ) P ( b 3 ) . Then
380
12.3.3
Chapter 12. Image Compression
Coding and Compression
Suppose that we are given a finite sequence of symbols from some alphabet of size q = 2', i s l , sa, . . . , s,), and say that the length of the sequence is AT, where we think of M as being quite large. This sequence can be thought of as a message or signal or image that wc want to store on some storage device or transmit over some channel. Since q = 2', it will require s bits to represent each symbol in the alphabet, so that the message can be represented by a total of s h f bits. Our goal is to exploit the redundancy in the message in order to reduce the number of bits required to represent it. We can interpret this problem as the problem of coding a symbol source if we assign to each symbol in the alphabet a probability P ( s i ) .This probability can be assigned aftcr thc fact by calculating the proportion of times that the symbol si actually occurs in the message. That is,
P(.,) =
number of times s; occurs in the message M
For example, suppose that the alphabet consisted of the integers 0 through q- 1,which were the pixel values of a trans~ormedand a quantized image. If we set a threshold on the transformed image so that 95% of the coefficients are below the threshold (and so are set t o zero), we would assign P ( 0 ) = .95. The probability can also be assigned beforehand based on known probabilities of occurrence of the symbols in the given type of message. For example, it is known that, on average, the letter "en occurs about 13% of the time in English prose, so that if we were trying to compress an English sentence, then we would assign P ( e ) = .13. An efficient coding sdlerne ol this synlbol source is therefore interpreted as an efficient cornpression scheme for the message or signal or image. A good way to rneasure the efficiency of compression is to compute the ACL of the coding scheme. Therefore, for a give11 coding sche~nef , we could expect to represent the symbol sequence with A C L ( f ) . M bits. In the context of image compression, we say that the irnage is compressed at A C L ( f ) bzts per pzxel, and would calculate a compression ratio of s / A C L ( f ) . A fundamental result in information theory gives a relationship between the optimal ACL for a given symbol source and the entropy of that s ~ u r c e . ~
Theorem 12.3. Let S be a symbol source, and let m i n A C L ( S ) = m i n ( A C L ( f ) ) where the minimum is taken over all codirlg schernes, J , of S . Then
Example 12.4. Suppose that we have quantized the transform of an im2A proof of this theorem and Information Theory.
can be found in the book
Rolrlarl,
J~~troductjon to Coding
12.3. The Coding Step
381
age t o q = 32 quantization levels, and that 95% of the trarlsfornl coefficients quantize t o zero. Suppose also for simplicity that the remaining pixel values are distributed evenly ainong the remaining 31 quantization levels. Then P ( 0 ) = .95
and
P ( i ) = .05/31 = 1/620 z .00161.
The entropy of this source is
Therefore, the best possible coding of this image would require about .53 bits per pixel, a t a compression ratio of 51.53 z 9.4 or about 9.4-to-one compression. The previous example exposes a need to improve Theorem 12.3. The theoretical minimum of ACL(f ) is about .5. However, any codeword must conta,in a,t lea,st,one symbol and hence must have length at least 1. Therefore, we must have ACL(f) > 1 for any coding scheme f . It would be very desirable t o somehow get closer t o the theoretical minimurn of .5 as this would improve compression by a factor of about 2. One way to overcorrle this limitation is t o allow single codewords t o represent strings of more than onc symbol.
Example 12.5. (a) Suppose we are given the following message of length 64 made up of the four-symbol alphabet {A, B, C, D),
AABCAAAAAAAAAAAAAAAABCAAAAAAAAAA AAAAAAAAAAAAAADAAAAAAAAAACAAAAAA. By counting the frequency of occurrence of each symbol, we can model this message as the output of a symbol source, S, with the following probabilities: P ( A ) = 58/64, P(B) = 2/64, P ( C ) = 3/64, and P ( D ) = 1/64. The entropy of S is
H(S)
=
- (58164) log,
(58164) - (2164) log, (2164)
-(3/64) log, (3164) - (1164) log, (1164) .5857. Using the coding scheme
382
Chapter 12. Image Compression
we can code this message as
which is 7 3 bits or about 73/64 = 1.14 bits per symbol. Note that this number is well below the upper bound of H ( S ) 1 = 1.5857 for the minimum ACL of any coding scheme. This identifies this coding scheme as an efficient one. However, it clearly does not approach the theoretical lower bound of .5857 bits per symbol, and is in fact almost two times worse.
+
(b) Now suppose that we pair adjacent symbols in the above message, obtaining the following message:
AAAAAAAAAAAAAADAAAAAAAAAACAAAAAA, which can be thought of as a length 32 message from the 16-symbol alphabet {AA, AB, AC, AD, BA, BB, B C , B D , CA, C R , CC, C D , n A , D B , DC, D D ) . By counting symbols, we arrive at a model for this message as the output of a symbol source, S2,with probabilities P(AA) = 28/32, P ( B C ) = 2/32, P ( D A ) = 1/32, P ( A C ) = 1/32, and the probabilities of all the other symbol pairs zero. The entropy of S2is
H(s~) =
- (28/32) log, - (1/32)
(28/32)
log2(1/32)
-
-
(2/32) log2(2/32) (1/32) log2(1/32)
.7311. Using the coding scheme
we can code this message as
which is 38 bits. This is 38/32 = 1.19 bits per symbol if we consider the message to be of length 32 but made up of' symbols chosen from the 16character alphabet, but is 38/64 = .59 bits per symbol if we consider the
12.3. The Coding Step
383
message t o be of length 64 chosen from the four-character alphabet. Note that this second coding is very close to optimal. (c) In practice, storing a coded message also requires storing the coding scheme, f , in order that the coded message can be deciphered. The additional bits required to store f are referred t o as overlzeuci and will always increase the number of bits per symbol needed to store the message. The real significance of this overhead t o the effciency of a particular code can depend on a number of facttors, including the length of the message being coded or the number of messages being coded with the same coding scheme. In parts (a) and (b) of this example, there are only four codewords in each of the coding schemes given. For the example in part (a), there can be no more than four codewords since there are only four characters in the alphabet. However, for the cxamplc in part (b), there are 16 characters in the alphabet, each of which could conceiveably require a codeword. By considering pairs of characters, we see that we can code with near-optimal efficiency at the potential cost, of a, la8rgeincrease in overhead. We now present a general framework for efficient coding of a symbol source by grouping adjacent symbols to form a new symbol source. Given a symbol source S = { s l , sz, . . . , s,) with associated probabilities P ( s i ) = p,, define the n t h extension of S to be the set
Definition 12.6.
with associated probabilities
Remark 12.7. (a) The syinbols in the set Sn corlsist of all strings of length n of symbols in S. There are a total of qn such strings. (b) The assignment of probsbilitics in Definition 11.5 arises from the assumption that each symbol in S represents a probabilistically independent event. In other words, the output of any given symbol does not influence which symbol will be put out next. No matter which symbol a#ct,uallyoccurs, the next symbol is determined only by the original probabilities pi.
Theorem 12.8. Let S be a symbol source and S n its n t h extension. T h e n H(Sn) = nH(S).
Proof:
384
since
Chapter 12. Image Compression
C: -II
p Y 7= l f o r l < j < q
Theorem 12.9.
Let S be a symbol source, and let S" be its n t h extension.
Here m i n A C L ( S n ) = min(ACL(f)), where the m i n i m u m is taken over all coding schemes of SrL.
Proof: By Theorem 12.3 and Theorem 12.8,
Dividing both sides by n gives the result. Remark 12.10. (a) Since each symbol in Sn is a string of 71,syrrlbols frorn S, any coding scheme for the source Sncan also be used as a coding scheme for S.If f n is such a code, then the average codeword length of f n , interpreted as a coding scheme for S, is ACL( f n ) / n . Therefore, Theorem 12.9 says that by taking extensions of a symbol source, it is theoretically possible t o find a coding scheme for the source whose ACL is arbitrarily close to H (S). (b) Any code for Sn must contain qn syrnbols; hence the number. ol coclewords in any coding scheme for Sn grows exponentially with n. For exam-
12.4. The Binary Huffman Code
385
ple, if we are coding a transformed and quantizcd image with 64 quantization levels using its nth extension with n = 5, then the coding scheme would have t o contain 645 > 1 billion codewords! Of course, for any image of reasonable size, only relatively few of the codewords would ever be used but it would not be unreasonable t o expect that several hundred codewords would be required. Such a large code can result in considerable overhead.
The Binary Huffman Code The binary Huffman code3 is a simple algorithm that produces a coding scheme for a symbol source that is optimal in the sense that its average codeword length is as small as possible.
Definition 12.11. Given a symbol source S ltles { p l , p2 , . . . , p,) :
=
{ s l , s2, . . . , s,) with probabil-
1. If q = 2 , let f ( s 1 ) = 0 and f ( s z ) = 1.
>
>
2. Otherwise, reorder S i f necessary so that pl p:! 2 . . . p, and define the new symbol source S' = {sl, s z , . . . , s,-2, s ' ) with probabilities
{ P l , Pz, . . . , P,-Z,P,-l
+ P,).
3. Perform the Huflman coding algorithm o n S ' , obtaining the coding scheme f' ~ Z , U ~ , Iby L
(Here the c, and d are strings of 0 ' s and 1's.) 4. Define the coding scheme, f , for S by
Example 12.12. Consider the message in Example 12.5(a). To show how Huffman coding worlts on such a mcssage, we use the following informal algorithm taken from Roman, Introduction to Coding and Information Theory.
(1) List all probabilities in decreasing order. Think of these probabilities as being the nodes at the base of a, binary tree a,s shown helow. 3 ~ h description e of the Huffman code in this section is adapted from Roman, Coding and Information the or,^, Springer-Verlag ( l Y Y 7 ) , and Roman, lntroduction to Coding and Information Theory.
386
Chapter 12. Image Compression
(2) Combine the smallest two probabilities into one, and assign it the sum of the two probabilities.
(3) Repeat steps (1) and (2) at the next level.
12.5. A Model Image Coder
387
(4) Now assign codewords to each node starting from the top of the tree. Each time a node splits into two children, assign the left child the the codeword of the parent with a zero appended and assign the right child the codeword of the psrcnt with a 1 appended.
12.5 A Model Wavelet Transform Image Coder In this section, we will implement a simple wavelet-based image coder using the principles discussed in this chapter. The compression scheme shown here is very rudimentary and the results far from optimal. MATLAB code implementing the scheme is given a t the end of the section. This code assumes that the reader has access to the MATLAB Wavelet Toolbox and the MATLAB Image Processing Toolbox. (1) Transform. Our coder will use an orthogonal wavelet transform with Daubechies orthogonal filters as described in Chapter 9. The coder will implement the MATLAB command wavedec2, which performs a two-dimensional DWT with zero-padding. Before processing the image, we will subtract from each pixel a constant equal t o the average value of all the pixels in the image. This will guarantee that the image we transform has zero mean. The purpose of doing this is t o make the wavelet coefficients of the image more evenly distributed around zero. Otherwise, the lowest level average coefficient of the wavelet transform will be extremely large thereby affecting the quantization. The reader is invited to remove this command from our MATLAB code in order
388
Chapter 12. Image Compression
t o more clearly see its effect. (2) Quantization. Our coder will use uniform scalar quantization of the wavelet coefficients without thresholding. The quantization map will be exactly as described in Figure 12.l(left). Specifically, for a given even number q, the coder will specify q - 1 bins given by a partition of [-A, A] of the form
{-A
(N'2)-1 + nA>,,o
U
(0) u {A
(N-2)-1 - nA),=,
,
where A is such that the wavelet coefficients of the image are contained in [-A, A] and A = 2A/q. The inverse of the quantization map will map t h e kt,h ql~a.ntizedvalue t o the center of the kth bin. This map is exactly as described in Figure 12.l(right).
(3) Coding. We can think of the q- 1 quantization levels in our quantization scheme as symbols in an alphabet and the string of quantized wavelet coefficients as a message over this alphabet. We have seen in Section 12.3 that the entropy is a useful tool for estimating how efficiently such a message can be coded. We will not explicitly implement a coding scheme for these quantized wavelet coefficients but will instead estimate the optimal compression rate by computing the entropy of thc probability distribution corresponding to the distribution of values in the q - 1 quantization bins. Specifically, we compute q-1
where Pk =
(number of wavelet coefficients in the kth bin) (total number of wavelet coefficients)
This figure is a good estimate t o the minimal codeword length required t o code the quantized wavelet coefficients. The quantity b is measured in bits per pixel and will be referred to as the nominal compression rate.
12.5.1 Examples In our first set of examples, we consider compressing the magic square image at a fixed quantization level and with various choices of wavelet filter. The quantization level is fixed at q = 26, and the Haar, Daubechies four-coefficient, and Daubechies 20-coefficient scaling and wavelet filters are used. Figure 12.3 shows the original image, and Figure 12.4 shows the irnage compressed using the Haar filter. The nominal compression rate is about .26 bits per pixel. Figure 12.5 shows the image compressed using the Daubechies four-coefficient filter. 'l'he nominal compression rate is about
12.5. A Model Image Coder
389
.28 bits per pixel, about the same as with the Haar wavelet. However, the former image exhibits blocking artifacts present with the Haar filter, and the latter image largely lacks these artifacts. Figure 12.6 shows the image compressed using the Daubechies 20-coefficient filter. Again the nominal compression rate of .39 bits per pixel is comparable with the other filters. In the reconstructed image, we clearly see ringing artifacts arising from the fact that long filters with many vanishing moments tend t o be highly oscillatory. In the next set of examples, we fix the scaling filter to be the Daubechies ten-coefficient filter (so five vanishing moments) and try various quantization levels. Here we use q = 64, 46, and 26. The results are shown in Figures 12.7-12.9. The MATLAB code used to produce the above pictures is listed below. The reader is invited t o try variations on the given examples, such as (1) writing your own code implementing hard and soft thresholding, (2) implementing a periodic wavelet transform in two dimensions, (3) extending the image by reflection and using symmetric biorthogonal filters, or (4) using a wavelet packet transform. The MATLAB Code
%% Load the image from a preexisting .mat file. %% Here the file is called msquare.mat. %% The image is placed in the variable X. LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
load msquare; X=msquare;
LLLLLLLLLLLLALLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
%% Set the parameters: %% q = determines number of quantization levels %% (as described in the text above). %% Must be even. %% wname = determines the wavelet and scaling filters %% used. The names are from the MATLAB Wavelet Toolbox. LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
n
0
0
0
0
0
0
0
0
0
0
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
%%
%%
Demean the data and take the wavelet transform down to 8 levels.
390
Chapter 12. Image Compression
X=X-mean(mean(X)); [C S] =wavedec2(X,8,wname) ; 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 P 0 0 0 0 0 0 0 0 0 0 0 0
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
0 0
Create a vector z containing the centers of the %% bins used for quantization. This will be used %% in the standard MATLAB hist command. %% %% Determine the range of the data: [-L,L]. Then %% specify the binwidth.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LLLLLLLLLhLLLALLLLhLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL %% Specify bin centers to the left of zero (this is a), %% then those to the right of zero (this is b). 4 Define z t o contain the bin centers. %% Define w to contain half the width of each bin 0 0
a=-L+(del/2):del:-(3/2)*del; b= (3/2) *del :del :L- (de1/2) ; z=[a 0 bl ; w=[(del/2)*ones(l,length(a))
del (del/2)*ones(l,length(b))l
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LLLLLLLLLLLALLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
%%
LL
0 0
%%
Define H to be the histogram corresponding the the bin centers z with binwidths w. Then calculate the nominal compression rate.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
for i=l:length(z) H(i)=sum(abs(C-z(i)) end x=H/length(C)+eps; ent=sum(-x.*log2(x))
<= w(i))
;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL %% Dequantize the wavelet coefficients by mapping %% %%
all coefficients that fall in a specified bin to the center of that bin. We store the
12.5. A Model Image Coder
%% dequantized wavelet coefficients in the variable %%
TMP for convenience.
len=length(C); TMP=C ; for i=l:len for j=l:length(z) if abs(TMP(i)-z(j))<=w(j) end end end
~MP(i)=z(j);
%% Reconstruct the compressed image. LLhLLLALLLLLLLLLLLLLLLLLLLLLLLLLL/oLLLLLLLLLLLLLLLLL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LLLLLLLLLLLLLALLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
%% %%
Display the image with 256 uniform gray scales by rescaling the pixel values of Y to lie in the range [1,2561 and defining a uniform gray colormap.
%% LLLLLLLLLhLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL/oL/oLLL 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
m=min (min (Y ; M=max(max(Y)) ; YY=(255* (Y-m)/(M-m))+l; colormap (gray (256)) ; image (YY) ;
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
391
392
Chapter 12. Image Conlpression
50
100
150
200
250
FIGURE 12.3. Original magic square image.
FIGURE 12.4. Iage compressed using Haar with q = 26. Nominal compression rate is .26 bits per pixel.
12.5. A Tblndel Tma.ge Coder
393
FIGURE 12.5. Image compressed using Daubechies four-coefficient filters with q = 26. Nominal compression rate is .28 bits per pixel.
FIGURE 12.6. Image compressed with q = 26 and the Daubechies 20-coefficient filters. Nominal compression rate is .39 bits per pixel.
394
Chapter 12. Image Compression
FIGURE 12.7. Image compressed with q = 64 and the Daubechies ten-coefficient filters. Nominal compression rate is 1.15 bits per pixel.
FIGURE 12.8. Irr~agecorr~pressedw i t h y = 46 and the Daubechies ten-coefficient filters. Nominal compression rate is .74 bits per pixel.
12.5. A Model Image Coder
395
FIGURE 12.9. Image compressed with q = 26 and the Daubechies ten-coefficient filters. Nominal compression rate is .34 bits per pixel.
Chapter 13 Integral Operators The purposc of this ctlaptcr is to dcscribe the aplication of wavelets and wavelet-related ideas to the problem of tlie fast computation of certain integral operators. As with image compression, the success of wavelets in this applicatiorl arises from the fact that wavelet bases provide a very efficient representation of fiinctiorls that are smooth off a small set of singularities. The objects we are interested in are operators of the form
Here cr, and b can be infinite. Such operators arise in a variety of contexts in rriatheniatics. Several exaniples are given in the next section. The algorithm for efficiently coniputing integral operators is described in Sectiori 13.2. It is knowri as the Beylkin-Coifrnun-Rokhlir (BCR) algorithm.' The idea is to discretize the integral operator (13.1) in a special way by using a wavelet basis. This discretization is sornetirrles referred to as the n,onstundu~*d reprf;sentation (the stundurd representution is described in Exercise 13.24).
13.1 Examples of Integral Operators 131.1
Sturm-Liouville Boundary Value Problems
Consider tlie following nonhorriogeneous, second-order boundary value problem: (1:j.a) b ( x )y']' q(x)y = f (z).
+ +
a1 ~ ( 0 ) a2 d(O)= 0,
(13.3)
(13.4) bly(1) + b2 ~ ' ( 1= ) 0, where p ( z ) , p f ( z ) , and q ( z ) are continuous and strictly positive on [O, 11, and u l , a z , bl, and b2 are given. An equation of the form (13.2)-(13.4) is an example of a Sturrn-Liolxville boundary value p r ~ b l e m Such . ~ problems 'This algorithrr~was first described in the paper by Beylkin, Coifman, and Rokhlin. Fast wavelet transforms and numerical algorzthms, Comml~nicationson Pure and Applied Mathematics, vol. 44 (1991) 141-183. 2 ~ e for e example Boyce and DiPrima, Elementary Differential Equations and Boundary Value Problems, Third Edition, Wiley (1977) Section 11.3. See also Folland, Fourier Analysis and Its Applications, Wadsworth & Brooks/Cole (1992).
398
Chapter 13. Integral Operators
arise in the solution of certain partial differential equations of classical physics. We will show that the solution to (13.2)-(13.4) is given by an iritegral operator of the forin (13.1): where K(z,y) is called tlie Green's function for the system and is independent of the right side f (.r). The Honiogeneous Equation Coiisider the liorriogerieous version of (13.2); namely
By the theory of linear, second-order differential equations," there exists a pair of linearly independent solutions yo(z) and y (z) satisfyiilg (13.5). Also, by the standard theory, the Wronskiur~of yo and y l , W(yo,y1)(r) = yo( 2 )yi ( J ' ) - yl (s) yb (z) is norizero on the interval [O: 11. Tlle geiieral solution t o (13.5) has the forrri
+
u(:.c)= C o Yo (x) c1 y1 (x).
(13.6)
Suppose ilow that, we seek a solution to (13.5) satisfyiiig the initial conclit,ion (13.3). Such a solutiorl will always exist. In order to satisfy (13.3), we nlust llave
This equation is satisfird if co and c l satisfy the linear systerri
Tlie deterrninant of the above matrix is ul a2 W(yo,yz)(0), ancl since the Wronskian is nonzero, the above systern will have a unique solution \inless (11 or a2 is zero. If, say, a1 = 0 arld a2 # 0, then we rriust solve co yb(0) cl yi(0) = 0, which will always have at least one solution. Let IL(](T) be a solutiori of (13.5) satisfying (13.3). A sirnilar argument shows that there will always he a solution of (13.5), call it P L (x), ~ satisfying tlie initial conditioil (13.4). We will assume in wliat follows that uo(x) and ul(z)are riot rrlultiples of each other. This means in particular that there is no single furictiori (apart froiri the trivial solution u(z) = 0) that satisfies the boundary value problerri (13.2)-(13.4). Another way of saying this is to say that 0 is not an eigeiivalue of (13.2)-(13.4).
+
"See for example Boyce and DiPrima. Sections 3.2 and 3.3.
13.1. Examples of Integral Operators
399
The Nonhomogeneous Equation Tn order to find a soliltion to the nonhomogeneous equation (13.31, we need to find a particular solution, call it up(x). Such a solution is given by4
Then t,he general ~olut~ion to (13.3) has the form
Note that by our assumption that uo (z) is not a constant multiple of ul (z) we have guaranteed that W (uo,u l ) (z) is never zero. In fact, it can be shown that W (uO,u l ) (z) is a constant multiple of l / p ( z ) (Exercise 13.16). Considering the initial condition (13.31, we arrive at
where we have used the fact that uo(z) satisfies (13.3) and that up(0) = ub (0) = 0. If a1 u1(0) a2 ui (0) = 0, then ul (a) would satisfy the boundary value problem (13.2)-(13.41, contrary to our assurription that there is no such function. Therefore, a1 ul(0) a;!u', (0) # 0 and so cl = 0. Considering the initial condition (13.41, note first that
+
+
so that we arrive at
4This solution is the result of using the technique of variation of parameters; see Boyce and DiPrima, Section 3.6.2.
400
Chapter 13. Integral Operators
where we have used the fact that ul (z) satisfies (13.4).Solving the above equation gives Co
=
Jo
1
U'(t) f ( t )dt. ~ ( M.(uo, t ) ~l)(t)
Combining this with (13.7) gives
where ul(x)uO(t)
~ ( W t )( 7 ~ 01 ~, 1()f )
if
o jt j2 ,
K ( z ,1 ) = uO(s)ul(t)
ifz
~ ( tW)( u o ,,ul)(t) The Green's function K ( s ,t ) satisfies the following properties. ( 1 ) K ( z ,t ) is continuous on the unit square [O,l]"
( 2 ) K ( x ,t ) has continuous partial derivatives of order at least 2 on the sets { ( z ,t ) E [O,l]?x < t } and { ( z ,t ) E [O,l]?x > t } .
(3) We have that
dK ~ - + t + dz
lim - ( z , t )
and lim .rtt
dK
-( z ,t
ax
=
)=
ui ( z ) u o ( z )
~ (W4( ~ uOi ), ( z )f (4 ul (.)b(.) P(.T) W ( I I 7,1~ 1 ) ,(.T)
f.I.(
Therefore d K / d z is in general discontinuous on [O, 112. This suggests that the kernel K ( z ,t ) can be efficiently represented in a wavelet basis. Example 13.1.
(a) Consider the boundary value problem
It is easy to verify that the homogeneous equation y" general solution y = co sin(x) cl cos(z).
+
+y
=
0 has the
13.1. Examples of Integral Operators
401
Condition (13.9) is satisfied by the choice co = 1, cl = 0 giving uo(z) = sin(z), and condition (13.10) is satisfied by the choice co = cos(l), cl = giving u l ( z ) = sin(z - 1).
- sin(1)
The Wronskian Mf(uO,u l ) is found to be
Hence the Green's function, K (2,t ) , for (13.8)-(13.10) is given by sin(z - 1) sin(t) K ( z ,t )
=
sin(z) sin t sin(1' sin(1)
-
1)
ifO
The kernel is plotted in Figure 13.1. A discontinuity in the first derivative is clearly visible on the line z = t. (b) Consider the boundary value problem
(1
+ x 2 )3'' + 2z 3' - 23 = f (z),
(13.11)
~ ( 0=) 0,
(13.12) (13.13)
g(1) + g l ( l ) = 0. It is easy to verify that the homogeneous equation (1+z2)y" +22 y' - 2y = 0 has the general solution
Condition (13.9) is satisfied by the choice co = 1, cl = 0 giving and condition (13.10) is satisfied by the choice co = T + 3, cl
=
-6 giving
u , (z) = (T+ 3) 2 - 6 (1 + z arctan(2)). The Wronskian W ( u ou, l ) is found to be
Hence the Green's function, K ( z , t ) , for (13.11)-(13.13) is given by
The kernel is plotted in Figure 13.1. A cliscontinuity in the first derivative is clearly visible on the line z = t.
402
Chapter 13. Integral Operators
FIGURE 13.1. Left: K ( x ,t ) from Example 13.1(a). Right: K ( x ,t ) from Example 13.1(b)
.
13.1.2
The filbert Transform
Definition 13.2.
The Hilbert transform of a function f ( x ) , denoted H f (s),
is dtficr~edb y
provided that the limit exists in some sense.
Remark 13.3.
(a) If we simply set
t =
0 in (13.14), we arrive a t the
expressiorl
which suggests tlrc convolution of f ( r )and (rrz)-l. Because /_"a z-' dx is not defined for any a > 0 as an irliproper Rie~llallnintegral, the expression (13.15) will in general not exist for arbitrary functions f (x).
(b) Since by symmetry we car1 write
this suggests that the expression (13.15) can make sense for somp funct,ions f (z) provided that we approach the singularity a t 0 symmetrically. This suggests the definition (13.14). This way of approaching a singularity is referred to as the principal vulue znteyrul.
13.1. Examples of Integral Operators
403
(c) As a. function of two variables, K ( x - t) = (x - t)-' is COCon the sets { ( x , t ) :z < t) and {(z, t): z > t ) but has an infinite discontinuity on the diagonal {(z,t): z = t). The Hilbert transform is a simple example of a singulur integrul o p e r ~ t o r . ~ In order t o investigate some basic properties of the Hilbert transform, it is important to establish the class of functions on which it is defined, that is, for which the limit exists. To this end, we consider for a given f (s)and t > 0, the integral
If f (z) is L' on R, then the first integral on the right side is well defined for each z E R as an improper Riemann integral. This follows from t,he fact that since f (z - t ) / t J 5 If (a - t ) if ltl _> 1,
By using thc Cauchy-Schwarz inequality, it would also be sufficient to assume that f (z) is L2 on R. As for the second integral, assume that f (z) is C1 on R. Then by Taylor's formula, f ( s - t ) = f(z)- f r ( z ) t + R ( z . t ) , where limt+o R ( z , t ) / t = 0, and for each z, R ( z , t ) is C1 on R as a function of t. Then
L 5 1 t l
lim
/
tlltl
E
> 0, we arrive at
f (z - t, dt = -2 f r ( z ) +
t
ll 1
R(f' t, dt,
which exists for all z E R . We have proved the following theorem.
Theorem 13.4. Suppose that f (x) is L~ ( o r L') and is defined for each x E R.
c1o n R. T h e n H.f (z)
%n advanced treatment of this theory may be found in Stein, Singular Integrals and Differentiability Properties of Functions, Princeton University Press (1970). A more accessible treatment of the properties of the Hilbert transform may be found in Neri, Singular Integrals, Lecture Notes in Mathematics, vol. 200, Springer-Ver lag (1971).
404
Chapter 13. Integral Operators
We will now consider how the Hilbert transform behaves relative t o the Fourier transform. In order t o do this, we will suppose that f (z) is L ~and , C' on R, and define for each c > 0 the function H
S
f
(4 = 7r
I
f(x;
t, dt.
Itl2f
This integral is well defined for all s E R and is an "honest7' convolution of f ( a ) and the function g, (s)= (7is)-'X l X>, l (s),which is L2 on R. Since f (z) is L1, Theorem 3.20(b) says that H, f (z) is L2 on R. By the Convolution Theorem (Theorem 3.21)' f (y) = F(y) F, (7). We can calculate gE(y)as follows. For M > 0, define
Then limM,,
g E r M ( s )= g,(z) in L2 on R. B y Plancherel's formula, = ge(7) in L2 on R. We calculate g6,M( 7 ) .
L . M ( ~=)
=
1
e-2xi,x
t
dn: 7rx cos 2 r y z dz
-
sin 2 r y x
z
dx
" sin 2 x y x dn:
-2i
7ix
sin u
Letting M
-+ co,we arrive at
I,,, m
XY)=
-
sin u
du
if y
Tisin u
2i
> 0, (13.18)
U
Therefore, for each y ,
A
lim H€f(-Y) = €--+a
-2i
f (7)
sin u
du
if y
> 0.
7r
u
Here we have used the fact that
S,sin(u)/udu =
clu
j7
ify
(Exercise 3.31).
13.1. Examples of Integral Operators
405
We have proved the following theorem.
Theorem 13.5.
Suppose Llmt f (x) is L1 and C' o n
R. With H, f (x) given
by (13.17),
pointwise o n R.
Remark 13.6. (a) Equation (13.19) allows us to extend the definition of H f (z) to any L2 function f (z). Since the Fourier transform is defined for any L~ function f (z) (Section 3.6), and since Plancherel's formula holds for any L~ function, the function -2 sgn(y) f (y) is also an L2 function. Wc can then define H f (z) t o be the inverse Fourier Irarlsforrn of this function. Moreover, this definition establishes that 11 H f [ I 2 = [if 112 for any f (z), L2 on R. h
(b) Equation (13.19) also allows us to write down the expression ( l / z ) = -ni sgn even though 11s is neither L1 nor L2 on R. This is an example of a distributionul Fourier truns.form. (c) In light of (13.19), it is easy t o show (Exercise 13.19) that the Hilbert trallsfvrrrl cvrrlrrlutes with translations and positive dilations, and anticommutes with negative dilations. In fact, it is the only such operator (up to a constant multiple). From the point of view of wavelet theory, there are two immediate consequences of this fact. (i) If { T / I (z) ~ ,) ~ is a wavelet basis (orthonormal or not), then { ( H I / I ) ~ , ~ ( Z ) ) is also a wavelet basis. (ii) If p(z) satisfies a two-scale dilation equation like (7.71, then so does Hp(z). Since Hp(s) of a compactly supported function p(s) will never have compact support (unless p(z) = 0), this shows that every finite dilation equation that has a compactly supported solution will also have a noncompactly supported solution.
(d) The Hilbert transform arises naturally in complex analysis, in particular in the calculation of conjugute hurmonic functions on the upper half-plane. Higher dimensional generalizations of the Hilbert transform are the Riesz transforms of singular integral theory. The Hilbert transform also arises in signal processing when dealing with cuusul signuls (h(t) is causal if h(t) = 0 for t < 0) and with sig~lalstllal coritain only positive frequencies (i.e., h ( y ) = 0 if y < 0).
A
406
Chapter 13. Integral Operators
13.1.3 The Radon Transform Definition 13.7.
Given f
(XI,
xz), L' o n R ~ define , the Radon transform of
f , denoted R f b y
for
e E [O, 27r)
and s E R.
Remark 13.8. (a) Note that for a fixed value of s E R,the vertical line through X I = s is given by the parametric equations X I = s: x2 = t for t E R.If we rotate this line counterclockwise through an angle 8 , then the parametric equations become
( ) ( =
cosI9 i n
sin" -cosH
(;)=(
+
s cos8 t sin8 s sin 8 - t cos 8
Hence (13.20) can be interpreted as the line integral of f (xl, x2) on the line perpendicular t o the angle 0 and a directed distance s from the origin (see Figure 13.2).
(b) Note also that if $\ell$ is the line corresponding to the angle $\theta$ and directed distance $s$, then $(x_1, x_2) \in \ell$ if and only if $x_1\cos\theta + x_2\sin\theta = s$. This follows by a direct calculation using the parametrization given in part (a), and is also geometrically obvious from Figure 13.2.
(c) We sometimes write $x = (x_1, x_2)$ and $\Theta = (\cos\theta, \sin\theta)$, in which case we say $x \in \ell$ if and only if $s = x\cdot\Theta$. In this notation, the Radon transform can be defined by
\[
Rf(\theta, s) = \int_{\mathbf{R}} f(s\Theta + t\Theta^{\perp})\,dt,
\]
where $\Theta^{\perp} = (\sin\theta, -\cos\theta)$.
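To make Definition 13.7 concrete, the sketch below approximates $Rf(\theta, s)$ for a sampled image by a Riemann sum along the parametrized line $(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)$ with nearest-neighbor lookups. This is only one simple discretization, not an algorithm from the text; the grid conventions, the test image `phantom`, and the name `radon_sample` are assumptions made for the illustration.

```python
import numpy as np

def radon_sample(img, theta, s):
    """Approximate Rf(theta, s) by a Riemann sum along the line
    (s*cos(theta) + t*sin(theta), s*sin(theta) - t*cos(theta)),
    for an image sampled on the square [-1, 1] x [-1, 1]."""
    n = img.shape[0]
    half = (n - 1) / 2.0
    dt = 2.0 / n                                   # step of the Riemann sum
    t = np.arange(-n, n) * dt                      # parameter along the line
    x1 = s * np.cos(theta) + t * np.sin(theta)
    x2 = s * np.sin(theta) - t * np.cos(theta)
    i = np.rint((x1 + 1) * half).astype(int)       # nearest-neighbor indices
    j = np.rint((x2 + 1) * half).astype(int)
    inside = (i >= 0) & (i < n) & (j >= 0) & (j < n)
    return np.sum(img[i[inside], j[inside]]) * dt

# A disk of radius 1/2: Rf(theta, 0) should be close to the chord length 1.
n = 256
xx, yy = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n), indexing="ij")
phantom = (xx**2 + yy**2 <= 0.25).astype(float)
print(radon_sample(phantom, 0.0, 0.0), radon_sample(phantom, np.pi / 4, 0.0))
```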
(d) The Radon transform is the basic mathematical model for many different types of tomographic imaging, including CT (Computerized Tomography), PET (Positron Emission Tomography), SPECT, and MRI (Magnetic Resonance Imaging).⁶ For example, in CT imaging, an x-ray beam of intensity $I_0$ is introduced at a point on the object or body being imaged, and its intensity $I$ is measured at the point where it emerges from the body. If the object has constant density, then the intensity of the emerging beam is related to $I_0$ by $I = I_0\,e^{-a\rho s}$,

⁶See Deans, The Radon Transform and Some of Its Applications, Wiley (1983), and Natterer, The Mathematics of Computerized Tomography, Teubner (1986), for more information.
FIGURE 13.2. The line perpendicular to the angle $\theta$ and a directed distance $s$ from the origin.

where $\rho$ is the density of the object, $s$ is the distance between the entry point and exit point of the beam, and $a > 0$ is an attenuation coefficient that is related to the physical properties of the object. In general, the density and attenuation coefficient of the object will vary with position. Suppose that they are given by $\rho(x)$ and $a(x)$. Then in the usual way, we can consider an area element with width $ds$ in a two-dimensional cross section of the object, centered at $x$. Then the attenuation of the beam as it passes through this element is given by $I_0\,e^{-a(x)\rho(x)\,ds}$. Integrating over each of these area elements yields the integral
\[
I = I_0\,e^{-\int_{\ell} a(x)\rho(x)\,ds},
\]
where $\ell$ is the line joining the entry and exit points of the beam. Assuming that the attenuation coefficient is constant throughout the object (so by normalization equal to 1), we arrive at
\[
-\ln\frac{I}{I_0} = \int_{\ell} \rho(x)\,ds = R\rho(\theta, s),
\]
where $\ell = \{x : x\cdot\Theta = s\}$. Therefore, the problem of tomographic imaging becomes the problem of inverting the Radon transform. A reasonable image
of the cross section of the object can be produced once the density function is known.

(e) It is geometrically obvious from Figure 13.2 that the line corresponding to the angle $\theta$ and the directed distance $s$ is identical to the line corresponding to $\theta + \pi$ and $-s$, as well as to $\theta - \pi$ and $-s$. Therefore, we conclude that $Rf(\theta, s) = Rf(\theta + \pi, -s) = Rf(\theta - \pi, -s)$.
Inversion of the Radon Transform

We will now present a formula for the inversion of the Radon transform. Part of this inversion formula involves the computation of an integral operator with a singular kernel.

Theorem 13.9. If $f(x_1,x_2)$ is $L^1$ on $\mathbf{R}^2$, then $Rf(\theta,s)$ is $L^1$ on $[0,2\pi)\times\mathbf{R}$ (that is, $\int_0^{2\pi}\int_{\mathbf{R}} |Rf(\theta,s)|\,ds\,d\theta$ is finite). If in addition, $f(x_1,x_2)$ is $C^0$ on $\mathbf{R}^2$ and has compact support, then $Rf(\theta,s)$ is $C^0$ on $[0,2\pi)\times\mathbf{R}$.
Proof: To see that $Rf(\theta, s)$ is $L^1$ on $[0,2\pi)\times\mathbf{R}$, note that for each $\theta$ fixed,
\[
\int_{\mathbf{R}} |Rf(\theta,s)|\,ds \le \int_{\mathbf{R}}\int_{\mathbf{R}} |f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)|\,dt\,ds
= \int_{\mathbf{R}}\int_{\mathbf{R}} |f(u,v)|\,du\,dv,
\]
where we have made the change of variables $u = s\cos\theta + t\sin\theta$, $v = s\sin\theta - t\cos\theta$ and noted that the Jacobian of the transformation is 1. Therefore,
\[
\int_0^{2\pi}\int_{\mathbf{R}} |Rf(\theta,s)|\,ds\,d\theta \le 2\pi\,\|f\|_1 < \infty.
\]
As for continuity, suppose that $f(x_1, x_2)$ vanishes outside a ball of radius $A > 0$ about the origin. Then for every $\theta$ and $s$,
\[
Rf(\theta,s) = \int_{-A}^{A} f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)\,dt.
\]
Since $f(x_1, x_2)$ is continuous and compactly supported, it is uniformly continuous on $\mathbf{R}^2$. This implies that
\[
\lim_{(\theta',s')\to(\theta,s)} f(s'\cos\theta' + t\sin\theta',\ s'\sin\theta' - t\cos\theta') = f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)
\]
in $L^\infty$ on $\mathbf{R}$. Therefore,
\[
\lim_{(\theta',s')\to(\theta,s)} Rf(\theta', s')
= \lim_{(\theta',s')\to(\theta,s)} \int_{-A}^{A} f(s'\cos\theta' + t\sin\theta',\ s'\sin\theta' - t\cos\theta')\,dt
= \int_{-A}^{A} f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)\,dt
= Rf(\theta,s).
\]
Hence $Rf(\theta, s)$ is $C^0$ on $[0,2\pi)\times\mathbf{R}$.

If $f(x_1,x_2)$ is $L^1$ on $\mathbf{R}^2$, then for each $\theta$, $R_\theta f(s)$ is $L^1$ on $\mathbf{R}$. Therefore, we can compute its Fourier transform. The following theorem relates the one-dimensional Fourier transform of $R_\theta f(s)$ to the two-dimensional Fourier transform of $f(x_1, x_2)$. It is referred to as the Fourier slice theorem because $\widehat{R_\theta f}(\gamma)$ is a "slice" of $\hat f(\gamma_1, \gamma_2)$ on a line through the origin making an angle $\theta$ with the positive $x$-axis. This observation will be used to derive an inversion formula for the Radon transform.
Theorem 13.10. (Fourier Slice Theorem) Suppose that $f(x_1, x_2)$ is $L^1$ on $\mathbf{R}^2$. Then
\[
\widehat{R_\theta f}(\gamma) = \hat f(\gamma\cos\theta,\ \gamma\sin\theta).
\]

Proof:
\[
\widehat{R_\theta f}(\gamma)
= \int_{\mathbf{R}}\int_{\mathbf{R}} f(s\cos\theta + t\sin\theta,\ s\sin\theta - t\cos\theta)\,dt\ e^{-2\pi i\gamma s}\,ds
= \int_{\mathbf{R}}\int_{\mathbf{R}} f(u, v)\, e^{-2\pi i\gamma(u\cos\theta + v\sin\theta)}\,du\,dv
= \hat f(\gamma\cos\theta,\ \gamma\sin\theta),
\]
where we have made the change of variable $u = s\cos\theta + t\sin\theta$, $v = s\sin\theta - t\cos\theta$ and noted that $s = u\cos\theta + v\sin\theta$.
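The Fourier slice theorem lends itself to a direct numerical check. The sketch below evaluates both sides of the identity for the Gaussian $f(x_1,x_2) = e^{-\pi(x_1^2+x_2^2)}$ using plain Riemann sums; the truncation interval $[-4,4]$, the step size, and the particular angle and frequency are assumptions made only for this illustration.

```python
import numpy as np

# Check the Fourier Slice Theorem for f(x1, x2) = exp(-pi*(x1^2 + x2^2)).
h = 0.05
x = np.arange(-4, 4, h)
theta, gamma = 0.7, 1.3                      # arbitrary angle and frequency

# Left side: 1-D FT of the projection R_theta f(s) (s along rows, t along cols).
x1 = x[:, None] * np.cos(theta) + x[None, :] * np.sin(theta)
x2 = x[:, None] * np.sin(theta) - x[None, :] * np.cos(theta)
Rf = np.sum(np.exp(-np.pi * (x1**2 + x2**2)), axis=1) * h
lhs = np.sum(Rf * np.exp(-2j * np.pi * gamma * x)) * h

# Right side: 2-D FT of f evaluated at (gamma*cos(theta), gamma*sin(theta)).
u, v = np.meshgrid(x, x, indexing="ij")
f = np.exp(-np.pi * (u**2 + v**2))
rhs = np.sum(f * np.exp(-2j * np.pi * gamma * (u * np.cos(theta) + v * np.sin(theta)))) * h**2

print(abs(lhs - rhs))                        # should be very small
```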
Theorem 13.11. (Radon Inversion Formula) Suppose that both $f(x_1,x_2)$ and $\hat f(\gamma_1,\gamma_2)$ are $L^1$ on $\mathbf{R}^2$. Then
\[
f(x_1, x_2) = \frac{1}{2}\int_0^{2\pi}\!\!\int_{\mathbf{R}} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\,|r|\,dr\,d\theta. \tag{13.21}
\]
Proof: Writing the Fourier inversion formula for $f(x_1,x_2)$ in polar coordinates gives
\[
f(x_1,x_2) = \int_0^{2\pi}\!\!\int_0^{\infty} \hat f(r\cos\theta, r\sin\theta)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\, r\,dr\,d\theta
= \int_0^{2\pi}\!\!\int_0^{\infty} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\, r\,dr\,d\theta,
\]
where we have used Theorem 13.10. Since $\widehat{R_\theta f}(r) = \widehat{R_{\theta-\pi} f}(-r)$ (Exercise 13.20), replacing $\theta$ by $\theta+\pi$ we can write
\[
\int_0^{2\pi}\!\!\int_0^{\infty} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\, r\,dr\,d\theta
= \int_0^{2\pi}\!\!\int_0^{\infty} \widehat{R_\theta f}(-r)\, e^{2\pi i r(x_1\cos(\theta+\pi) + x_2\sin(\theta+\pi))}\, r\,dr\,d\theta
\]
\[
= \int_0^{2\pi}\!\!\int_0^{\infty} \widehat{R_\theta f}(-r)\, e^{-2\pi i r(x_1\cos\theta + x_2\sin\theta)}\, r\,dr\,d\theta
= \int_0^{2\pi}\!\!\int_{-\infty}^{0} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\, |r|\,dr\,d\theta.
\]
Combining the two calculations,
\[
f(x_1,x_2) = \frac{1}{2}\int_0^{2\pi}\!\!\int_{\mathbf{R}} \widehat{R_\theta f}(r)\, e^{2\pi i r(x_1\cos\theta + x_2\sin\theta)}\, |r|\,dr\,d\theta.
\]
Equation (13.21) can be "unpacked" by looking at the outer and inner integral separately. The outer integral is referred to as backprojection, and the inner as ramp filtering, which we will see corresponds to an integral operator with a singular kernel. We will describe each operator below. Given any function $h(\theta, s)$ defined on $[0,2\pi)\times\mathbf{R}$, define the backprojection operator, $R^{\#}$, applied to $h(\theta, s)$ as follows.

Definition 13.12.
\[
R^{\#} h(x_1, x_2) = \int_0^{2\pi} h(\theta,\ x_1\cos\theta + x_2\sin\theta)\,d\theta.
\]
Note that if $h(\theta - \pi, -s) = h(\theta, s)$, then
\[
R^{\#} h(x_1, x_2) = 2\int_0^{\pi} h(\theta,\ x_1\cos\theta + x_2\sin\theta)\,d\theta
\]
(Exercise 13.21).
Definition 13.13. Suppose that $h(x)$, $L^1$ on $\mathbf{R}$, has the property that $|\gamma|\,\hat h(\gamma)$ is also $L^1$ on $\mathbf{R}$. Define the ramp-filtering operator $Q$ on such $h(x)$ as follows.
\[
Qh(x) = \int_{\mathbf{R}} \hat h(\gamma)\,|\gamma|\, e^{2\pi i\gamma x}\,d\gamma.
\]
Note that by Fourier inversion, we have that
\[
\widehat{Qh}(\gamma) = |\gamma|\,\hat h(\gamma).
\]
Remark 13.14. (a) It is clear from (13.21) and Definitions 13.12-13.13 that the Radon inversion formula can be written as
\[
f(x_1, x_2) = \frac{1}{2}\,R^{\#}\bigl(Q(R_\theta f)\bigr)(x_1, x_2).
\]

(b) The ramp-filtering operator is related to the Hilbert transform. By (13.19), we have that for all $L^2$ functions $f(x)$, $\widehat{Hf}(\gamma) = -i\,\mathrm{sgn}(\gamma)\hat f(\gamma)$. We know by the Differentiation Theorem (Theorem 3.33) that differentiation corresponds to multiplication of the Fourier transform by $-2\pi i\gamma$. Since $(-2\pi i\gamma)(-i\,\mathrm{sgn}(\gamma)) = -2\pi|\gamma|$, we can conclude at least formally that
\[
Qf(x) = -\frac{1}{2\pi}\,H\!\left(\frac{df}{dx}\right)(x).
\]
Of course the interchange of limiting processes in the above calculation must be justified.

(c) Just as the Hilbert transform corresponds to "convolution" with the function $(\pi x)^{-1}$, we can in a similar way interpret $Qf(x)$ as a convolution operator $Qf(x) = f * w(x)$, where $\hat w(\gamma) = |\gamma|$. Evidently $w$ cannot be a function as we have defined them (but is in fact a generalized function or distribution). Nevertheless we can now write the Radon inversion formula as
\[
f(x_1, x_2) = \frac{1}{2}\,R^{\#}(Rf * w)(x_1, x_2). \tag{13.22}
\]
This explains why $Q$ is referred to as ramp-filtering: $R_\theta f$ is filtered via convolution with something whose Fourier transform is a "ramp" in the frequency domain.

(d) What is often done in practice is to replace $w$ in the above formula with a function $w_0(x)$ that approximates $w$ in some sense. Usually, $w_0(x)$ is defined by writing $\hat w_0(\gamma) = |\gamma|\,\hat g(\gamma)$ for some function $g$ whose Fourier transform decays rapidly at infinity. In this case, (13.22) is used to define an approximation to $f(x_1, x_2)$ as
\[
f_0(x_1, x_2) = \frac{1}{2}\,R^{\#}(Rf * w_0)(x_1, x_2). \tag{13.23}
\]
(e) The relationship between $f_0(x_1, x_2)$ and $f(x_1, x_2)$ can be determined via the filtered backprojection formula (Exercise 13.22):
\[
R^{\#}(g_\theta * R_\theta f)(x_1, x_2) = (R^{\#}g * f)(x_1, x_2),
\]
where $g(\theta, s) = g_\theta(s)$ is any function $L^\infty$ on $[0,2\pi)\times\mathbf{R}$ and $f(x_1,x_2)$ is $L^1$ on $\mathbf{R}^2$. The convolution on the left is in one dimension and that on the right is in two dimensions. Applying filtered backprojection to (13.23) yields
\[
f_0(x_1, x_2) = \frac{1}{2}\,(W_0 * f)(x_1, x_2),
\]
where $R^{\#}w_0(x_1, x_2) = W_0(x_1, x_2)$. That is, once we know the smoothing function $g(t)$, we can determine the two-dimensional convolution kernel $W_0(x_1, x_2)$.

(f) We would also like to go in the other direction, specifying the function $W_0(x_1, x_2)$ and determining the smoothing function $g(t)$. Here we will typically allow the smoothing function to depend on $\theta$, so that we are really determining a collection of functions $\{g_\theta(t)\}_{\theta\in[0,2\pi)}$. Since
\[
W_0(x_1,x_2) = \frac{1}{2}\,R^{\#}\bigl(Q R_\theta W_0\bigr)(x_1,x_2) = R^{\#}w_\theta(x_1,x_2),
\]
this suggests the relation $w_\theta(s) = \frac{1}{2}\,Q R_\theta W_0(s)$. When taken on the transform side, this becomes
\[
\hat w_\theta(\gamma) = \frac{1}{2}\,|\gamma|\,\widehat{R_\theta W_0}(\gamma) = \frac{1}{2}\,|\gamma|\,\widehat{W_0}(\gamma\cos\theta,\ \gamma\sin\theta).
\]
Since $\hat w_\theta(\gamma) = |\gamma|\,\hat g_\theta(\gamma)$, we arrive finally at
\[
\hat g_\theta(\gamma) = \frac{1}{2}\,\widehat{W_0}(\gamma\cos\theta,\ \gamma\sin\theta).
\]

(g) One way to use wavelets in the inversion of the Radon transform is to require that the kernel functions $W_0(x_1,x_2)$ be the elements of a two-dimensional wavelet basis. It turns out that the fact that wavelets have vanishing moments is advantageous for inverting the Radon transform efficiently and locally. Local inversion means that a good approximation to the image on a small region of interest can be obtained from processing the Radon transform data corresponding to lines that pass close to that region.⁷

⁷See for example Rashid-Farrokhi, Liu, Berenstein, and Walnut, Wavelet-based multiresolution local tomography, IEEE Transactions on Image Processing, vol. 6 (October 1997) 1412-1430, and the references cited there.
In light of Remark 13.14(b), we can make the formal calculation
\[
Qf(x) = -\frac{1}{2\pi}\,H\!\left(\frac{df}{dx}\right)(x) = -\frac{1}{2\pi^2}\lim_{\varepsilon\to 0}\int_{|t|\ge\varepsilon}\frac{f'(x-t)}{t}\,dt.
\]
Of course, this calculation involves an exchange of limiting processes that must be justified. Leaving that aside for the moment, we integrate the right side by parts and obtain (Exercise 13.23)
\[
\int_{|t|\ge\varepsilon}\frac{f'(x-t)}{t}\,dt = \frac{f(x-\varepsilon)}{\varepsilon} + \frac{f(x+\varepsilon)}{\varepsilon} - \int_{|t|\ge\varepsilon}\frac{f(x-t)}{t^2}\,dt.
\]
In any case, the following theorem can be rigorously proved.
Theorem 13.15. Suppose that $f(x)$ is $L^1$ and $C^1$ on $\mathbf{R}$. Then

(a) $\displaystyle\lim_{\varepsilon\to 0}\left[\frac{f(x-\varepsilon)+f(x+\varepsilon)}{\varepsilon} - \int_{|t|\ge\varepsilon}\frac{f(x-t)}{t^2}\,dt\right]$ exists for every $x \in \mathbf{R}$.

(b) If we define for $\varepsilon > 0$,
\[
Q_\varepsilon f(x) = -\frac{1}{2\pi^2}\left[\frac{f(x-\varepsilon)+f(x+\varepsilon)}{\varepsilon} - \int_{|t|\ge\varepsilon}\frac{f(x-t)}{t^2}\,dt\right],
\]
then for each $\gamma \in \mathbf{R}$,
\[
\lim_{\varepsilon\to 0}\widehat{Q_\varepsilon f}(\gamma) = |\gamma|\,\hat f(\gamma).
\]
This shows that the ramp-filtering operator Q involves an integral operator with a singular kernel.
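Before turning to the exercises, here is a compact numerical sketch of the filtered backprojection recipe of Remark 13.14: each projection $R_\theta f(\cdot)$ is ramp-filtered by multiplying its FFT by $|\gamma|$, and the filtered projections are then backprojected as in Definition 13.12. This is only one possible discretization, not code from the text; the array layout of `sinogram`, the grid conventions, and the function name `fbp` are assumptions made for the illustration.

```python
import numpy as np

def fbp(sinogram, thetas, grid):
    """Filtered backprojection.  sinogram[i, :] holds samples of
    R_theta f(s) for theta = thetas[i] on the s-grid `grid`; thetas
    is assumed to sample [0, 2*pi) uniformly."""
    n_s = sinogram.shape[1]
    ds = grid[1] - grid[0]
    freqs = np.fft.fftfreq(n_s, d=ds)
    ramp = np.abs(freqs)                             # the "ramp" |gamma|
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))

    x1, x2 = np.meshgrid(grid, grid, indexing="ij")
    recon = np.zeros_like(x1)
    dtheta = thetas[1] - thetas[0]
    for i, theta in enumerate(thetas):
        s = x1 * np.cos(theta) + x2 * np.sin(theta)  # s = x . Theta
        idx = np.clip(np.rint((s - grid[0]) / ds).astype(int), 0, n_s - 1)
        recon += filtered[i][idx]                    # nearest-neighbor lookup
    # the 1/2 from (13.22) and the measure dtheta of the backprojection sum
    return 0.5 * recon * dtheta
```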
Exercises

Exercise 13.16. Show that if $u_0(x)$ and $u_1(x)$ are linearly independent solutions to the homogeneous equation $[p(x)\,y']' + q(x)\,y = 0$, then the Wronskian $W(u_0, u_1)(x)$ is a constant multiple of $1/p(x)$. (Hint: Show that the derivative of the function $p(x)\,W(u_0, u_1)(x)$ is zero.)

Exercise 13.17. Find the Green's functions for the following boundary value problems. Verify that each function has discontinuous first derivatives.

(a) $y'' - 4y' - 12y = f(x)$, $y(0) = y(1) = 0$.

(b) $(1 + x)\,y'' + y' = f(x)$, $y(0) + y'(0) = 0$, $y(1) + y'(1) = 1$. (Hint: $y(x) = 1$ is one solution to the homogeneous problem.)
Exercise 13.18. Show that a homogeneous second-order linear differential equation with constant coefficients $a\,y'' + b\,y' + c\,y = 0$, $a > 0$, is equivalent to an equation of the form $[p(x)\,y']' + q(x)\,y = 0$ for some continuous functions $p(x)$ and $q(x)$ that never vanish on $\mathbf{R}$. (Hint: Take $p(x) = A\,e^{Bx}$ and $q(x) = c\,e^{Dx}$, and determine appropriate values of the constants.)
Exercise 13.19. Show that the Hilbert transform commutes with translations and dilations. That is, show that if $a > 0$ and $b \in \mathbf{R}$, then $H(D_a f)(x) = D_a(Hf)(x)$ and $H(T_b f)(x) = T_b(Hf)(x)$. (Hint: Use (13.19).)
Exercise 13.20. Show that for any r , Ref ( r )= Re-, f ( - r ) . (Hint: Use the fact that Ref (s) = Re-, f (-s).) Exercise 13.21.
Prove that if h(O - T ,s )
= h(Q,s ) , then
where R# denotes the backprojection operator defined in Definition 13.12.
Exercise 13.22. that
Prove the filtered backprojection formula. That is, show
\[
R^{\#}(g_\theta * R_\theta f)(x_1, x_2) = (R^{\#}g * f)(x_1, x_2),
\]
where $g(\theta, s) = g_\theta(s)$ is any function $L^\infty$ on $[0, 2\pi)\times\mathbf{R}$ and $f(x_1, x_2)$ is $L^1$ on $\mathbf{R}^2$. The convolution on the left is in one dimension and that on the right is in two dimensions.
Exercise 13.23.
Prove that if $f(x)$ is $L^1$ and $C^1$ on $\mathbf{R}$, then for any $\varepsilon > 0$,
\[
\int_{|t|\ge\varepsilon}\frac{f'(x-t)}{t}\,dt = \frac{f(x-\varepsilon)}{\varepsilon} + \frac{f(x+\varepsilon)}{\varepsilon} - \int_{|t|\ge\varepsilon}\frac{f(x-t)}{t^2}\,dt.
\]
(Hint: Integrate by parts.)
13.2 The BCR Algorithm

In this section, we describe the BCR algorithm. Suppose that we wish to approximate the integral operator $T$ given by
\[
Tf(x) = \int K(x, y)\,f(y)\,dy.
\]
We do not specify any limits of integration, but they should be clear once we specify the integral operator we are interested in. Suppose that we are given a scaling function $\varphi(x)$ and a wavelet function $\psi(x)$, which we assume for simplicity are orthonormal. The changes required for the case when they are not orthonormal are straightforward and left to the reader.
13.2.1 The Scale $j$ Approximation to $T$
A simple way to discretize the operator $T$ is to assume that we can write down an expansion of the kernel $K(x, y)$ in terms of the scaling function as follows:
\[
K(x, y) = \sum_k\sum_\ell c_0(k, \ell)\,\varphi_{0,k}(x)\,\varphi_{0,\ell}(y).
\]
Of course there is no reason to expect that equality will actually hold in the above expansion, as this would assume that the kernel is a function in the two-dimensional scaling space $V_0\times V_0$. However, the above assumption corresponds to our usual procedure for approximating a continuously defined function by discrete data in such a way that we can conveniently apply the wavelet transform. From now on, we will assume that the only knowledge we have of the kernel $K(x, y)$ is the coefficients $\{c_0(k,\ell)\}$. We also note that in any practical setting, we will only have finitely many coefficients to work with; so we assume in addition that $0 \le k, \ell < M$, where $M = 2^N$ for some $N \in \mathbf{N}$. Inserting this expansion of $K(x, y)$ into the definition for $T$, we obtain
\[
Tf(x) = \sum_k\sum_\ell c_0(k,\ell)\,s_0(\ell)\,\varphi_{0,k}(x),
\]
where $s_0(\ell) = \langle f, \varphi_{0,\ell}\rangle$. By the orthonormality of the scaling function,
\[
s_0'(k) := \langle Tf, \varphi_{0,k}\rangle = \sum_\ell c_0(k,\ell)\,s_0(\ell).
\]
The function $Tf(x)$ is then approximated by the expansion
\[
Tf(x) \approx \sum_k s_0'(k)\,\varphi_{0,k}(x),
\]
with equality holding if and only if $Tf(x)$ is in the scale space $V_0$. Summarizing these calculations, we can write this approximation to $T$ as the following $M\times M$ matrix multiplication:
\[
\mathbf{s}_0' = C_0\,\mathbf{s}_0, \tag{13.24}
\]
where $C_0 = [c_0(k,\ell)]$. We can call this the scale 0 approximation to $T$. In fact, we could have presented the efficient evaluation of the matrix multiplication (13.24) at the start as the problem to be solved and ignored the connection with integral operators. From this point of view, the BCR algorithm is simply a way to do fast matrix multiplication when the matrix is such that it has an efficient representation in a wavelet basis. Looking at the scale 0 approximation to $T$, we realize that there is nothing stopping us from forming a scale 1 approximation to $T$ in a similar way. Once we have done it, we will see that it was a good idea. Applying one step of the two-dimensional DWT to $K(x, y)$, we obtain
so that
where s l (1)= ( f ,(I-,,!) the scaling function,
and d l (t) = ( f ,$ ~ ~ , e By ) . the orthonormality of
s; (k) = (Tf,p-l,n) = C ( c i ( k ,4) SI (t)+
(k,
dl (C))
e and d;(k) = (Tf,$ - l , n )
=
C(~l(k.,t) + r l ( k , t )dl(t)). e SI(~)
The function T f (x) is then approximated by the expansion
again with equality holding if and only if Tf (rc) is in Vo.We can write the scale 1 approximation t o T as the following h f x M matrix multiplication:
13.2. The BCR Algorithm
417
where $\Gamma_1 = [\gamma_1(k,\ell)]$, $B_1 = [\beta_1(k,\ell)]$, $A_1 = [\alpha_1(k,\ell)]$, and $C_1 = [c_1(k,\ell)]$ are each $M/2\times M/2$ matrices. Applying the next step in the DWT to $K(x, y)$, we can write
=
CC
~2
(k0) P-2;k ( 2 )y-2,e (y)
so that
where sz(P) = ( f ,c p - 2 , e ) and d2(!) = ( f ,T / - ~ , ~By ) . the orthonorrnality of the scaling and wavelet functions,
The fi~nctionTf (x)is then approximated by t h e expansion
with equality holding if and only if Tf (x) is in Vo.We can write the scale 2
418
Chapter 13. Integral Operators
approximation to T as the following 3 M / 2 x 3 M / 2 matrix multiplication:
where $\Gamma_1$, $B_1$, and $A_1$ are $M/2\times M/2$ matrices and $\Gamma_2 = [\gamma_2(k,\ell)]$, $B_2 = [\beta_2(k,\ell)]$, $A_2 = [\alpha_2(k,\ell)]$, and $C_2 = [c_2(k,\ell)]$ are $M/4\times M/4$ matrices. Continuing in this fashion up to $N$ times, we can form the scale $N$ approximation to $T$ as the matrix product
(13.27)
where for each $1 \le j \le N$, $\Gamma_j = (\gamma_j(k,\ell))$, $B_j = (\beta_j(k,\ell))$, $A_j = (\alpha_j(k,\ell))$, and $C_j = (c_j(k,\ell))$ are $2^{-j}M\times 2^{-j}M$ matrices, so that (13.27) is a $2M\times 2M$ system.
13.2.2
Description of the Algorithm
The scale J approximation to the integral operator T really consists of the following steps:
(1) Approximate the kernel function $K(x, y)$ by its projection onto the subspace $V_0\times V_0$. This is written as the expansion
\[
K(x, y) \approx \sum_k\sum_\ell c_0(k,\ell)\,\varphi_{0,k}(x)\,\varphi_{0,\ell}(y).
\]
(2) Approximate the function $f(x)$ by its projection onto the subspace $V_0$. This is accomplished by calculating the coefficients $d_j(k) = \langle f, \psi_{-j,k}\rangle$ and $s_j(k) = \langle f, \varphi_{-j,k}\rangle$ for all $k$ and $1 \le j \le J$. Of course, not all of these coefficients are required in order to fully represent $f(x)$. This can be accomplished by the expansion
\[
f(x) \approx \sum_k s_J(k)\,\varphi_{-J,k}(x) + \sum_{j=1}^{J}\sum_k d_j(k)\,\psi_{-j,k}(x).
\]
(3) Approximate the function $Tf(x)$ by calculating its projection onto the subspace $V_0$. This is the expansion
\[
Tf(x) \approx \sum_k s_J'(k)\,\varphi_{-J,k}(x) + \sum_{j=1}^{J}\sum_k d_j'(k)\,\psi_{-j,k}(x).
\]
The BCR algorithm consists of one further approximation that is based on the following observation. If the kernel $K(x, y)$ has the property that it is smooth apart from singularities on the diagonal, then each of the submatrices $\Gamma_j$, $A_j$, and $B_j$ will have large entries near the diagonal and small entries away from the diagonal. The smoothness of the kernel and the number of vanishing moments of the wavelet chosen will help determine exactly how small the off-diagonal entries are. In many cases, these off-diagonal entries are so small that by establishing a threshold value $\lambda$, which is usually some small fraction of the largest value in the matrix, and setting to zero all entries whose absolute value is less than $\lambda$, each of these submatrices becomes a matrix whose nonzero entries lie in a narrow band (say $r$ entries wide, where $r \ll M$) around the diagonal. Typically, the submatrix $C_J$ is a full matrix (see Figure 13.3). Hence, after the suppression of the small entries, each of the submatrices $\Gamma_j$, $A_j$, and $B_j$ will have approximately $r\,2^{-j}M$ nonzero entries, for a total of approximately
\[
\sum_{j=1}^{J} 3r\,2^{-j}M + (2^{-J}M)^2
\]
nonzero entries.
If $J = N$, where $M = 2^N$, then this becomes $3rM - 3r + 1$ nonzero entries. Therefore, with some clever programming, one can perform the $2M\times 2M$ matrix multiplication approximating $T$ with $O(M)$ multiplications.
FIGURE 13.3. The scale 3 approximation to the integral operators defined by the Green's functions of left: Example 13.1(a) and right: Example 13.1(b). The coefficients displayed are in absolute value larger than $10^{-6}$ of the maximum value of the matrix in (13.27). The vertical and horizontal lines are edge effects coming from periodization. Here we have used the Daubechies filter with six vanishing moments.
Since it requires $O(M)$ multiplications to calculate the wavelet and scaling coefficients of the approximation to $f(x)$, and then another $O(M)$ multiplications to reconstruct the approximation to $Tf(x)$ from its wavelet and scaling coefficients, we see that once the expansion of the kernel $K(x, y)$ is calculated, the BCR algorithm is an $O(M)$ algorithm.
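The compression phenomenon behind the operation count above can be seen numerically with a very small experiment. The sketch below builds a matrix of kernel samples with a logarithmic singularity on the diagonal, conjugates it by a full orthonormal Haar transform (the simpler "standard form" rather than the nonstandard form used by the BCR algorithm), thresholds the small entries, and checks that matrix-vector products are barely affected. The kernel, the Haar choice, the threshold, and the helper name `haar_matrix` are assumptions made only for this illustration; the dense operations here are $O(M^2)$ and are meant to show the sparsity, not the $O(M)$ implementation.

```python
import numpy as np

def haar_matrix(M):
    """Orthonormal matrix of the full Haar DWT on R^M (M a power of 2)."""
    W = np.eye(M)
    size = M
    while size > 1:
        half = size // 2
        step = np.zeros((size, size))
        for k in range(half):
            step[k, 2 * k] = step[k, 2 * k + 1] = 1 / np.sqrt(2)   # averages
            step[half + k, 2 * k] = 1 / np.sqrt(2)                  # differences
            step[half + k, 2 * k + 1] = -1 / np.sqrt(2)
        W[:size, :] = step @ W[:size, :]
        size = half
    return W

M = 256
x = (np.arange(M) + 0.5) / M
K = np.log(np.abs(x[:, None] - x[None, :]) + 1e-12)   # smooth away from the diagonal
W = haar_matrix(M)
B = W @ K @ W.T                                        # the matrix in the wavelet basis
thresh = 1e-6 * np.max(np.abs(B))
B_sparse = np.where(np.abs(B) > thresh, B, 0.0)
print("kept entries:", np.count_nonzero(B_sparse), "of", M * M)

f = np.random.randn(M)
exact = K @ f
approx = W.T @ (B_sparse @ (W @ f))                    # multiply in the compressed basis
print("relative error:", np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```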
Exercises

Exercise 13.24. The standard representation of a linear operator $T$ on $L^2$ functions relative to an orthonormal basis $\{g_n(x)\}_{n=1}^{\infty}$ is derived as follows. Given an $L^2$ function $f(x)$, we can expand $f(x)$ as
\[
f(x) = \sum_n \langle f, g_n\rangle\,g_n(x).
\]
Then $Tf(x)$ has the expansion
\[
Tf(x) = \sum_n \langle f, g_n\rangle\,Tg_n(x). \tag{13.28}
\]
For each $n$, we expand $Tg_n(x)$ as
\[
Tg_n(x) = \sum_m \langle Tg_n, g_m\rangle\,g_m(x). \tag{13.29}
\]
Substituting (13.29) into (13.28), we arrive at
\[
Tf(x) = \sum_m \Bigl(\sum_n \langle f, g_n\rangle\,\langle Tg_n, g_m\rangle\Bigr)\,g_m(x). \tag{13.30}
\]
By restricting our attention to a finite orthonormal system, say $\{g_n(x)\}_{n=1}^{N}$, (13.30) reduces to the matrix equation
\[
c(m) = \sum_{n=1}^{N} \langle Tg_n, g_m\rangle\,\langle f, g_n\rangle, \qquad m = 1, \ldots, N, \tag{13.31}
\]
which we write as $\mathbf{c} = \mathbf{T}\,\mathbf{f}$, where
\[
\mathbf{T} = (T_{n,m})_{n,m=1}^{N} = \bigl(\langle Tg_n, g_m\rangle\bigr)_{n,m=1}^{N} \quad\text{and}\quad \mathbf{f} = \bigl(\langle f, g_n\rangle\bigr)_{n=1}^{N}.
\]
The result $\mathbf{c} = (c(m))_{m=1}^{N}$ of (13.31) can be interpreted as the coefficients $\{\langle TP_N f, g_m\rangle\}_{m=1}^{N}$, where $P_N$ is the orthogonal projector onto $\mathrm{span}\{g_n\}_{n=1}^{N}$; that is,
\[
P_N f(x) = \sum_{n=1}^{N} \langle f, g_n\rangle\,g_n(x).
\]
In other words, calculating (13.31) results in the approximation
\[
Tf(x) \approx \sum_{m=1}^{N} c(m)\,g_m(x).
\]
(a) Recall that if $\varphi(x)$ and $\psi(x)$ are the scaling and wavelet function for an orthonormal MRA, then for any finite $J > 0$, the collection
\[
\{\varphi_{-J,k}(x)\}_{k} \cup \{\psi_{-j,k}(x)\colon 1 \le j \le J\}_{k}
\]
is an orthonormal basis for $V_0$. Show that if $T$ is an integral operator of the form (13.1), then
(b) For either of the Green's functions of Example 13.1, compute numerically the coefficients of $\mathbf{T}$ for the Daubechies filter with six vanishing moments. Contrast this matrix with the matrix displayed in Figure 13.3. (Hint: For a given $M$, say $M = 256$, take the coefficients $c_0(k,\ell)$ of $K(x, y)$ to be just the samples $K(k/M, \ell/M)$, $k, \ell = 0, \ldots, M-1$.)
Part VI
Appendixes
Appendix A Review of Advanced Calculus and Linear Algebra A.1 Glossary of Basic Terms from Advanced Calculus and Linear Algebra N denotes the set of natural numbers; that is, N = (1, 2, 3, . . .). Z denotes the set of integers; that is, Z = {. . . , -2, -1. 0, 1, 2,. . .). Z+ denotes the set of nonnegative integers; that is, Z = (0,1, 2, . . .). R denotes the set of real numbers. C denotes the set of conlplex numbers; that is, C = {x i y: x, y E R): where i = RrLdenotes the vector space of n-tuples of real numbers over the field of real numbers. We denote a vector in Rn by x = ( x l , x2, . . . , xi,).Addition and scalar lrlultiplication are defined componentwise in the usual way. Cn denotes the vector space of n-tuples of complex numbers over the field of conlplex numbers. We denote a vector in R" by z = ( z l , z2. . . . , x,,). Addition and scalar multiplication are defined corriponentwise in the usual way. absolute value. The absolute value of a real riumber a, denoted /all is a if a > 0 and -a if a < 0.The absolute value of a complex riurnber x = z+ iy, denoted 121, is . It is also true that \;I2 = z f , where 2 is the complex conjugate of x. (See complex conjugate.) adjoint of a matrix. The ad3oint of an m x n rriatrix A = { a ( i ,3 ) ) is the n x m matrix A* = {a(j,2 ) ) . (See matrix, transpose of a matrix.) characteristic function of an interval. The characteristic function of an interval I, der~otedX I ( x ) ,is the function defined by X I ( x ) = 1 if x E 1, and Xr(x) = O if J: @ I . closed interval. A closed interval is an interval of the form [a,b] = {x: a 5 x 5 b ) , for some real numbers a < b. compact support. A function f ( x ) defined on R has compact support if it is supported on a finite interval. (See supported on an interval.) complex conjugate. The complex conjugate of a complex number z = x + i y is Z = J: - iy. continuous at a point. A complex-valued function f ( x ) defined on an interval I is said to be continuous at a point xo E I if for every t > 0, there
a.
+
is a 6 > 0 such that if x E I and l x - x o l < 6 , then f ( x ) - f ( x o ) l Ecluivalently, f ( x ) is continuous at xo if lirn,,,, f ( x )= f (xo).
< E.
continuous on an interval. A complex-valued function f ( x ) defined on an interval I is said to be continuous o n I if it is continuous at every point in I. continuously differentiable on an interval. A function f ( x ) is continuously differentiable o n a n interval I if the function f ' ( x ) = lirn
f ( t )- f (4 t-x
t+x
is continuous on I. (See limit.)
differentiable at a point. A function f ( x ) is dzfferentiable a t a point xo if ( x o ) exists as a finite number. (See limit.) t - 20 differentiable on an interval. f (z)is said to bc differentiable o n a n interval I if it is differentiable at every point in I. disjoint intervals. A pair of intervals I and J are disjoint if their inter-
the limit f J ( x o )= lim t+xo
section is empty or consists of a single point. finite interval. A fin,ite in,tervn,l is a,n int,erva,l of t,he form [n,,b] = {x:o: 5 x 5 b ) , ( a , b ] = { x : a < x 5 b ) , [ a , b ) = { x : a 5 x < b ) , or ( a , b ) = { x : a < x < b ) , for some real numbers a < b.
greatest lower bound. See infimum. improper Riemann integral. (1) Suppose that a function f ( a ) is continuous on the interval ( a ,b] and has an infinite or oscillatory discontinuity at x = a. Then the improper R i e m a n n integral of f ( x ) on ( a ,b] is defined I. h
f ( x ) d x if the limit exists. If f ( x ) is continuous on the interval
by lirn t+O+
a+c
[a,b) and has an infinite or oscillatory discontinuity at x
=
b, then the i m -
proper R i e m a n n integral of f ( z ) on [ a ,b) is defined by lirn
1"'
t+Of (4d x if the limit exists. (2) Suppose that a function f (x) is piecewise continuous on the infinite interval (-oo,b]. Then the improper Riemann integral of
LM b
f ( a ) on (- oo, b] is defined by lirn M j m
f ( x )d x if the limit exists. If f ( x )
is piecewise continuous on the infinite interval [ a ,GO), then the improper
Riemann integral of f ( x ) on [a,oo) is defined by lirn
M+m
ihf
f ( x ) d x if the
limit exists. (See infinite discontinuity, oscillatory discontinuity.)
infimum. The infimum of a set of real numbers S is a real number A such that A 5 z fur all z E S arld such 1llil.l fur every rlurllber B > A, lhere exists J: E S such that x < B. The infimum of S is also called the greatest lower bound of S and is denoted inf S. (See lower bound.)
A.1. Terms from Advanced Calculus and Linear Algebra
427
infinite discontinuity. A function f (x) has an infinite discontinuity at a point xo if at least one of lim f ( a )or lim f ( x )is infinite. See Figure A.1. x+x;
x+x;
FIGURE A.1. Examples of infinite discontinuities at x = 0.
inner product of vectors. The inner product of two vectors v
=
(v (I), u ( 2 ) , . . . , v(n)) and w
denoted ( v ,w) , is the number ( v ,w) = vectors, orthonormal system.)
=
( w ( l ) , w ( 2 ) ,. . . , w ( n ) ) :
xi=,v ( k )w ( k ). (See orthogonal
interval. An interval is a subset of R of the form [a,b] = { z :a < z < b ) , (a,b]= { x : a < x < b } , [a,b) = { x : a < x < b ) , ( a , b ) = { x : a < x < b } , (-00, b] = { x :x I b ) , (-oo, b) = { x :x < b ) , [a,m ) = { x :x a ) , ( a ,m ) = { x :x > a ) , or (-ca,oo) = R, for some real numbers a < b. julxlp discolltilluity. A fu~lctionf ( z ) has a junxp disco~xti~xuity at a point xo if lim f ( x ) and lim f ( x )both exist as finite numbers but are unequal.
>
x,.;
xwx;
We adopt the convention that if a function has a jump discontinuity at a point xo, then it is undefined at that point. See Figure A.2.
least upper bound. See supremum, limit. Given a function f (x) defined on an interval I, and given real numbers x0 and L , lim f ( x ) = L means that for every E > 0, there exists a x+xo
6 > 0 such that i f x E I and x - x o l < 6 , then If(x)-LI < E . lini f ( x ) = L x+x;
means that for every 6 > 0, there exists a 6 > 0 such that if x E I , x > zo, and - z o l < 6, then I f ( x ) - LI < E . lim f ( x ) = L means that for every
IJ:
x+x; E
> 0, there exists a 6 > 0 such that
if x E I , x
< xo and Ix - xol < 6, then
4213
Appendix A. Review of Calculus and Linear Algebra
FIGURE A.2. Left: Example of a jump discontinuity at Example of an oscillatory discontinuity at ~c = 0.
If
(:c) - LI
=
0. Right:
< E . lirri f ( x ) = cu, rriearis that for every A l > 0. there exists a .I+J(,
b > O s u c l i t h s t i f . r : ~I ; t n d I ~ : - x ~ I< 6 , t l i e r i I f ( z ) >A[.
lirn f ( . c ) = -m .r'+S(,
nieans that for every Al > 0 there exists a 6 > 0 sucli that if .r E I arid 1.c - xol < b thcn (a)J< -hl.
If
linear combination. A linear combination of a collection of functions { fj(s)}r=i defined or1 ;m iriterval I, is a function of tlie form h(.c) = a,
f 1 (z)+(L~ (.c) + + a fN~ (.c) =
N
a,
fj (7) for
some constands {aj},;=,
]=I
A linear con~binationof a collection of vectors {v.,},:!, form x
=
zJxl bJ vj for some corrstants {h, }El.
is a vector of the
N
linear transformation. A function T from R7"nto Rtnis a linear transformation if for every pair of vectors x arid y in R n ,and every pair of real nurnbers u and b, T ( u x by) = n T(x) b T ( y ) .(See matriz representation of a linear transformation.)
+
+
lower bound. A number A is a lower bound for a set of real numbers S if A 5 z for every z E S. (See least upper bound, lower bound, greatest lower bound, supremum, infimum.) matrix. An m x n matrix is an array or numbers arranged in m rows and 12 C ~ ~ U I L We I ~write S. A ={ ~ ( i , j ) } ~ < z ~ ~ , ~ ~ ~ < ~ matrix multiplication. The product of an n a x 111atrix A = { u ( i ,j ) } and an n x p matrix B = {b(i, j ) } is the matrix AB = C = ( c ( i ,j ) } , where c ( i ,j ) = Cz=,a ( i , k ) b ( k ,j ) . matrix representation of a linear transformation. It is always possi-
A.1. Terms from Advanced Calculus and Linear Algebra
429
ble to represent a linear transformation from R.n into Rm as an m x n matrix with respect to a given pair of orthonorrnal bases. Specifically, if { ~ ~ } r = ~ is an orthonormal basis for Rn and if {wi)z?s an orthonormal basis for RTn,then we say that T is represented by the matrix T = {(T(vi), wj)). In this case, let V be the n x n matrix whose colunins are the vectors v, and let W be the m x m matrix whosc columns are the vectors wj. If x is a given vector in Rn, then T ( x )= W T V ~ X(See . transpose of a matrix,
adjoint of a ,matrix, vector, rnatmx multiplication, orthonormal basis, Linear transforn~ation.) monomial. A monomial is a function of the form z~~for some n E Z + . (See polyr~omial.) n-times continuously differentiablc on an interval. A function S ( x ) is n - t i m e s continuously diflerentiable o n a n interval I if the nth derivative f ( n )( x ) , defined recursively by
where f ( ' ) ( z ) = f (z), is continuous on I. In this case, f(z)is said t,o be Cn o n I . C0 on I means that f (z) is continuous on I. A functiorl f (x) is CO" on I if it is Cr' on I for every n E N. open interval. An open interval is an interval of the form ( a . b) = { . T : n. < z < b ) , for sorrie real numbers a < b. orthogonal matrix. A11 n x n matrix is orthogonal if its rows forrn an orthorlorrrlal system. In this case, its columns will also form an orthonormal system. (See inner product of vector:^, orthonomnal system.) orthogonal projector. Given a subspace ll.1of R7"or Cr", and an orthonormal basis {wi}:=, for &I, the orthogonal projector onto A4 is the d linear transforniation PM defined by PM(x)= Gill ( x ,wz)w,. (See sub-
space, linear transformation, ortilonormal system.) orthogonal vectors. A pair of vectors v and w are orthogonal if (v,w) = 0. (See i n n e r product of uectors.) orthonormal basis of vectors. An orthonormal system of n vectors in Rn or C7Lis an orthonorrnal basis for R 7 k r C n . If { v , ~ ) ?is =~ an orthonorma1 basis for RrL(or C n ) , then any vcctor x can be written uniquely as x = C;=~X, vi) vi. orthonormal system of vectors. A collection of vectors {vi)zn=, is an orthonormal system if (vi,vJ)= b(i - j ) . oscillatory discontinuity. A function f (z) has an oscillatory discontinuity at a point z o if f (z)is not continuous at xo and if it has neither a jump nor an infininte discontinuity at z o . See Figure A.2. (See jump
discontinuity, infinite d~is~o~r~l'i~ru'ity.) piecewise continuous. A furiction f ( m ) is piecewise continuous on a finite interval I if f ( z )is continuous at each point of I except for at most finitely
430
Appendix A. Review of Calculus and Linear Algebra
many points. A function f (x) is piecewise continuous on an infinite interval I if it is piecewise continuous on every finite subinterval of I . (See infinite intenial, finite intenial, ,cubinter?ial.) piecewise polynomial. A function f (x) defined on R is a piecewise polyand a nomial function if there is a collection of disjoint intervals {In)nEN collection of polynomials { ~ ~ ( x ) such ) , ~that ~ f (z) can be written in the 00
form
f (4=
C P,,(.P) XI,. ( 4 . rr=l
polynomial. A polynomial is a function of the form p(z) = a"
+ a1 z +
N
a2 z 2
+ - - . + U N x N = C a1 z2for some coristants {ai}N,". 2=0
R i e m a n n integral. The Riemann integral of a function f (x) continuous on a finite closed interval I
=
[a,b], denoted
f (z)dx or
f (z)d z , is > 0,there is a 6 > 0 /a
the number v with the following property: For every t such that for every choice of numbers {.~,,)f=~ such that u = xo < 2 1 < - . . < Z N = b and such that (xitl - x i ) < 6 for 0 5 i 5 N - 1; and for I
every choice of numbers z T E [xi,
N-1
I
f (25) (xi+1 - xi) - v < E .
sequence. A sequen,ce of nvm,berv is a collection of numbers indexed by some index set Z.Typically, Zwill be the integers Z, the natural numbers N, or the nonnegative integers Z + . Such sequences will be denoted by { c , ) , , ~ ~ {, c * ) , , ~ ~or, { c ~ , )Z+ , ~rcspect,ively. ~ A seqz~enceof fun.ction.s is a collection of furictions indexed by Zand denoted {y, ( x ) ) , ~ ~ . s p a n . The span of a collection of vectors is the set of all finite linear combinations of vectors in that set. (See linear combination.) step function. A function f (z) defined on R is a step function if there is a collection of disjoint intervals {In)nEN such that f (z) can be written in the 00
form f ( z ) =
C a,, xi. (x) for some constants {a,,)nEN.A step function is n= l
also referred t o as a piecewise constwnt f u r ~ c t i o ~ ~ . subinterval. An interval I is a subinterval of an interval J if I
C J.
s u b s p a c e . A subset M of the vector space Rn (or C n )is a subpace if it is closed under the formation of linear combinations. That is, if X I , x2 are in hf,then so is ax1 + b x 2 , for ally real (or complex) iiumbers u and b. There will always exist an orthonormal system of vectors { w ~ ) $ = where ~, d 5 n such that M = span{wi). The number d is the dimension of M. The collection { w ~ ) $is~said t o be an orthonormal basis for M. s u p p o r t e d o n an interval. A function f (x) defined on R is supported on the interval I if f (x) = 0 for all x $ I.
A.1. Theorems from Advanced Calculus
431
supremum. The s u p r e m u m of a set of real numbers S is a real number A such that z A for all .7: E S and such that for every number B < A, there exists z E S such that B < x. The supremum of S is also called the least u p p e r bound of S, and is denoted s u p s . (See upper bound, lower bound, infimum.) transpose of a matrix. The transpose of an m x n matrix A = { a ( i ,j ) ) is the n x m matrix = { u ( j , i ) ) . (See matmx, adjoint of a m a t r i x . ) uniformly continuous on an interval. A function f (z)defined on an interval I is u n i f o r m l y c o n t i n u o u s o n I if for every E > 0, there is a h > 0 such that if 2 , y E I satisfy J z- y J < 6,then (z) - f ( y ) l < E . upper bound. A number A is an u p p e r bound for a set of real numbers S if z 5 A for every z E S. (See least u p p e r boun,d, lower hound, greatest lower bound, s u p r e m u m , infimum.) vector. An n-tuple of real or complex numbers is referred t o as a vector. In this book we rnake no distinction between row vectors (1 x n matrices) and column vectors (n x 1 matrices). If v is a vector and A an m x n matrix, then the product Av is defined as though v was written as a column vector. (See matrix, m a t r i x multiplication.)
<
If
A.2 Basic Theorems from Advanced Calculus Theorem A.1.
A linear combination, of functions continuous o n a n interval I i s also continl~,ouso n 1. A bin.ear com,bin,n.tion,of f?sn.ctions uniform.ly continuous o n a n intcrval I is also uniformly continuous o n I .
Theorem A.2. I f f ( x ) is continuous o n a closed, jinite interval I , then f
(z)
is uniform,lg con,tin,~ro7rson. T .
Theorem A.3.
I f f ( x ) I S contznuous o n a closed, jinite interval I , then f(x) i s bounded o n I ; that is, there exists a number M > 0 such that I f ( x ) l M for allx E I .
<
Theorem A.4.
I f f (x) is a complex-valued function o n an interval I , and if its (improper) Riemann integral exists as a finite number o n I , then
Theorem A.5.
(The Fundamental Theorem of Calculus) I f f (x) is piecewise continuous o n a n interval [ a ,b ] , then the function defined by
Appendix A. Review of Calculus and Linear Algebra
432
is continuous o n [a,b ] , and g ' ( x ) exists and equals f ( x ) at each point of continuity o f f (4.
Theorem A.6.
(Taylor's Theorem) Suppose that f ( x ) i s n - t i m e s continuously differer~liubleo n sorrce Z T L L ~ T ~ ~ U IU I ~071Luillii1yL I L ~po%7it LCO. Tlcen fo,r x E I , f ( z ) can be written. f(x)
=
f(x.)+(x-x")ff(z")+ +
where
< is s o m e point
'..
(IC +
(x-~
0
x " ) ~ f- ("-1) ~ (7~ l)!
f
) If
-
~
(xo)
+
(20)
(x - x " ) ~ ~ n!
f
'"I( E l 1
between xo and x.
Theorem A.7. (Mean Value Theorem) Suppose that f(x) i s C 1 o n s o m e interval I containing t h e point xo. T h e n for a n y x, y E between x an,d y such that
f ( z )- f
(I/) = f (E) (X
-
Y).
I, there
i s a point
<
(A.1)
Appendix B Excursions in Wavelet Theory In this appendix, we list some variations, extensions, generalizations, and applications of wavelet theory that were not covered in this book. We give very brief descriptions and then suggest some references for further study. Each topic mentioned here should be accessible (perhaps with some guidance) to anyone who has been able t o follow the preseritation and arguments elsewhere in this book. The list is suggestive but definitely not exhaustive. The goal is t o give the reader some perspective on the marly interesting aspects of wavelet theory. These topics car1 also serve as a source of ideas for semester or senior projects irivolving wavelet theory.
B. 1 Other Wavelet Constructions B. 1.1 M - band Wavelets In this construction, the scaling factor 2 is replaced by an arbitrary integer M > 2. In this case, the definition of MRA (Definition 7.12) chaiiges and Definition 7.12(d) becomes
(d)
A
function f (z) E Vo if and only if DhIl f (x) E
4.
Consequently, the two-scale dilation equation (7.7) becomes
Finally, instead of a single wavelet, $(x), such that
forms an orthonormal basis on R, we require M - 1 wavelets . . ., G n f - l (x) such that
{nrJi2$~(M'z- k ) .
( 2 ) )$j2(x),
~/'z(M'x- k), . . . , hf3/2$ n r - I ( M J-~ l ~ ) ) , ~ ~ ~
forms an orthonormal basis on R. This problem is discussed already in Daubechies, Ten Lectures on Wavelets in Section 10.2, and some of the early references are given there. In a discrete approach to this problem, the downsampling and upsampling operators (Definition 8.4) are modified as follows.
Appendix B. Excursions in Wavelet Theory
434
(b) The downsampling operator, J is defined by
( J c ) ( n )= c(A4n). ( J c )(n,)is formed by keeping only every M t h term in c ( n ) (c) The upsampling operator
is defined by
c ( n / A f ) if n evenly divides M, otherwise. ( ? c ) ( n ) is formed by inserting Ad of c ( n ).
-
1 zeros between adjacent entries
The problem then is to construct filters that give perfect reconstruction in analogy with the DWT as defined here. This a,pproach is described in KOvacevii: and Vetterli, Perfect reconstruction filter banks with rational sampling rates in one and two dimensions, Proc. SPIE Conf. on Visual Comrnunications and Irnage Processing, Philadelphia (1989) 1258-1265. One can also consult their book, I4Ciivelet.s and Subband Coding, Prenticp-Hall (1995).
B. 1.2
Wavelets with Rat.ional Noninteger Dilation Factors
Here the scaling factor 2 is replaced by a positive ratiorial rlurnber r = p l q > 1. The definition of MRA is modified exactly as above, and we ask what modifications to the usual construction of a wavelet basis rrirlst be made. The solution is given in the following theorem due to Auscher, which appears in his article Wavelet bases for L2(R) with rational dilation factor in the book Wavelets and Their Applications edited by Ruskai et al.
Theorem B.1.
There exist p
-
q functions
4,(x),1 < i 5 p
-
q such that
the collection
is a n orthonormal basis o n R.
A brief discussion of the example r = 3/2 is found in Daubechies, Ten Lectures on Wavelets, Section 10.4. For a discrete approach to this problem, see Kovacevii: and Vetterli cited above.
B . 1.3 Local Cosine Bases Recall that we could construct wavelet packet bases that were frequencylocalized to any dyadic partition of the interval [0, co) (Theorem 11.24).
B. 1. Other Wavelet Constructions
435
Also recall that in the case of compactly supported wavelet packets, the frequency localization was only approximate. In particular, the functions ~u;"(~) 1 had a clearly dominant frequency but also had significant "sidelobes" (see Figure 11.4). The idea of local cosine bases is t o construct a wavelet-like basis that is freqliency-localized t o an arbitrary partition of [0, a)and that is compactly supported in the frequency domain; that is, it has no sidelobes. A very readable article describing this construction is Auscher, Weiss, and Wickerhauser, Local sine and cosine bases of Coifman and Meyer and thc construction of smooth wavelets in the book Wavelets: A Tutorial in Theor,y and Applications, edited by Chui. Tlie construction is very beautiful and elementary (which is not t o say easy!)
B. 1.4
The Continuous Wavelet Yi-.ansforrn
111rrlotivating the Fourier trailsform in Section 4.1, we saw that passing from n frequericy representation of periodic functions t o one for functions on R required us to replace discrete frequencies e2"it('"2L), n E Z t o continuously defined frequencies e2""fy, y E R. Consequently the discrete representation
where
f^(n)=
/
.I
f (t) c ~ ~dt " ~ ' ~ ~
is replaced by an integral representation f (t)
- 1 f^(?)
e2"its d?,
R
where f^((?)=
j
f (t) ep2""" dt.
R
In the same way, we can seek t o replace the discrctc wavclct rcprcscntntion of a furletion
where w 4 > f ( 2 j k, ) = by a continuous representation
jRf (t) ozJTIC^ dt
Appendix B. Excursions in Wavelet Theory
436
where
P
The transform W+is referr~dt o as the cnn,tin,uou,uwn,?)el~t tm,n,,sfomn(CWT). There are many very good expositions of the CWT, including in Kaiser, A Friendly Guide to Wavelets, and Daubechies, Ten Lectures on Wavelets.
B. 1.5 Non- MRA Wavelets A
In the case of the bandlimited wavelet (Section 7.4.1), $(z) satisfies $(?)I2 = X [ - 1 , - ~ / 2 ) (7)+X,1/2,11 (7).Because of this particular structure, it is possible t o prove directly, that is, without using any facts related t o rnultiresolution analyses, that the collection { $ J , k (x)) is an orthonormal basis on R. This idea of examining orthonormal wavelet bases without consideration of a MRA structure has been carried out. The following theorem holds. Theorem B.2. Lct $(x) bc L' o n R. T h e n {g3,k:( 2 ) )is a n orthono7.mal basis o n R if and only zf:
1$(2'~)1' I 1 a n d
(a)
(b)
C G(2' (7+ k))y'^(? + k ) = o f o r all j 2 I
As there is no mention of MRA in this theorem, it is possible t o come up with examples of orthonormal wavelet bases for which there is no associated MRA. A good place t o start examining this type of wavelet basis is the book of HernAndez and Weiss, A First Course o n Wavelets.
B. 1.6
Multiwauelets
In this variation, the scaling function of a,n MR,A is replaced hy a finite collection of scaling functions, {cpl (z),. . . , cp,(x)). We define MRA as usual except now we say: (e) The collection
is an orthonormal (or Riesz) basis for Vo. In this case, we have a systern of two scale dilation equations.
B.2. Wavelets in Other Domains
437
which can be more efficiently expressed as
is a vector valued function, and for each n E Z,
is an r x r matrix. Now t,he natural questions to ask include the following. Is there a vector valued wavelet that generates an orthonormal basis on R? Do there exist smooth, vector valued wavelets'? Is there an analog of the QMF conditions for matrix valued scaling filters? Under what conditioris does the cascade algorithm converge?
B.2 Wavelets in Other Domains B. 2.1
Wavelets on Intervals
In Section 5.3, we defined the Haar basis on the iriterval [0, 1). An important property of this simple wavelet basis is that each furictiori in the basis is supported in [0, 1).It is important for a wavelet basis to have this property because it renders moot the problem of edge or. boundary effects that occur when the function (or image) being analyzed has conipact support. This problem is discussed in rnore detail in Section 8.3.2, where the relative rrierits of zero-padding versus periodization are discussed, and in Section 10.7.3, where some techniques for minimizing edge effects are discussed. The question arises: Can we construct a wavelet basis with good properties (smoothness, vanishing moments, symmetry, for example) that, like the Haar basis on [ O , l ) has all of its elements supported in [0, l)? This would completely eliminate the problem of edge effects while retaining the other advantages of wavelet bases. Many authors address this issue (including Daubechies in T e n Lectures on Wavelets, Section 10.7). A very nice treatment of this problem st,arting from a discrete perspective (so perhaps more accessible to readers of this
438
Appendix B. Excursions in Wavelet Theory
~al and ~ ~ ~ ~ l t i ~ . e s ~ l , u t ' i o n book) is found in Madych, Finite o s ~ t h o y o ~ L~u'IL$Jo~.~TLs analyses o n intervals, Journal of Fourier Analysis and Applications, vol. 3, no. 3 (1997) 257-294. There is also a nice description of the construction of Daubechies, Cohen, and Vial in Wavelets o n the interval and fast algorithms, Journal of Applied and Computational Harrnonic Analysis, vol. 1 (1993) 54-81.
B.2.2
Wavelets in Higher Dimensions
Defining wavelets in higher dimensions has been one of the rnost persistent and difficult problems in wavelet theory, and there is now a tremendous literature on the subject. The construction of wavelets in two dimensions presented in this book (Section 7.4.2, know11 as tensor product wavelets) is fairly straiglitforward and has been well known for a long time. The more general problem of duplicating the theory of wavelets in a higher dimerlsional setting is hard. As a starting point, the reader could consult the article of Madycli, S o m e elementary properties of rnultiresolution analyses of L2(RT')in the book Wavelets: A Tutorial in Theory a,nd Applica.tions edit,ed by CEnli. A very readable paper illustrating the complexity of the subject is Grijchenig and Madych, Multiresolutzon analysi.~,Haar bases, and self-similnr. tikings uf R" in IEEE Transactions on Information Theory (Marc11 1992). A paper in tlie s a n e issue by Kovacevid arid Vetterli called Nonscpar.able rnultidimenszonal perfect recon,struction ban,ks and wavelets for Rr' is a bit more challenging to read hut coritairls some fundaniental constructiorls and results in the theory.
B.2.3
The Lijling Scheme
The lifting scheme was developed by Sweldens and Herley and is essentially an easy way t,o find new filters satisfying the QMF conditions given a known QMF. It also leads to a new implementation of the DWT that essentially cuts the processing time in half. Moreover, the ideas also lead to the ability to construct wavelet bases on domains such as the sphere. Some good references are the following. Sweldens, T h e lifting scheme: A custom-design constmction of biorthogonal wavelets in Applied and Computational Harmonic Analysis, vol. 3 (1996) 186-200. Schroder and Sweldens, Building your own wavelets at home, ACM SIGGRAPH Course notes (1996). Schroder and Sweldens, Spherical wavelets: Eficiently representing functions o n the sphere, Computer Graphics (SIGGRAPH '95 Proceedings) (1995).
B.3. Applicatioris of Wavelets
439
B.3 Applications of Wavelets B. 3.1
Wazlelet Denoising
The probleni of removing noise from a signal or image is in rnany ways similar to the problem - of cornpression. Giveii a signal f ( : c ) , we wish to pr.oduce all estilrlate f (x)that we (leein to be a fairly faithful representation of - f ( T ) . In tlie compressiori problern, the main criteria for a good estimator f (z) is that it 11e somehow efficiently representable (for example. with only a few wavelet coefficients). In the derioising problem, the main criterion is that the estililator be frcte of '.noise." This noise may be duc to any nunlber. of sources and is usually nlodeled by some raildon1 process. An overview of wavelet estimatiorl techniques can be found in Clla~>ter 10 of Mallat's book, A n'avelet Tour of Sig~lillP~.ocessi~lg.. Estin~,ations are Approzimati~n~s. 111Strarig arid Ng~xyen,387-388. a brief description of denoisirlg via soft thresholding (first proposed by Donoho in Denoising by soft thre.sholding. IEEE Trarlsactiorls on I~lforrrlationTheory, vol. 41 (1995) 613-627) is given. In Burrus, Gopinatl.1, arld Guo. Wa wllets and Wave1c.t Transforrus:A Pri~i~er, brief descriptio~ls(wit11exterlsive references) of v;~riolis wavelet-based deiioisiilg tecliniqlles are given in Section 10.3.
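To give a flavor of soft thresholding, the sketch below shrinks the detail coefficients of a single Haar level of a noisy piecewise-constant signal using the universal threshold $\lambda = \sigma\sqrt{2\log n}$ mentioned in connection with Donoho's work. It is only a one-level toy version (in practice one thresholds the details at several scales); the test signal, the noise level, and the helper names are assumptions made for this illustration.

```python
import numpy as np

def soft_threshold(c, lam):
    """Soft thresholding: shrink each coefficient toward zero by lam."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def haar_step(x):
    """One level of the orthonormal Haar DWT: averages and differences."""
    s = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return s, d

def haar_step_inv(s, d):
    x = np.empty(2 * len(s))
    x[0::2] = (s + d) / np.sqrt(2)
    x[1::2] = (s - d) / np.sqrt(2)
    return x

rng = np.random.default_rng(0)
n, sigma = 1024, 0.1
t = np.linspace(0, 1, n, endpoint=False)
clean = np.where(t < 0.5, 1.0, -0.5)
noisy = clean + sigma * rng.standard_normal(n)

s, d = haar_step(noisy)
lam = sigma * np.sqrt(2 * np.log(n))          # universal threshold
denoised = haar_step_inv(s, soft_threshold(d, lam))
print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))
```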
B. 3.2 M'11,ltiscaleEdge Detection The idea here is to find the edges or tliscontinnities in an iiiiage by fillding a systerrlatic way of extracting the locations of the "large" wavelet coefficients of the irnage. By exarnirling how these coefficients decay with scalp, it is possible to identify the type of discoritiiluity (that is, a jump, or a discontinuity in the first derivative, or in the secorltl derivative, etc.). It is also possible in soriie cases. by nlearls of an iterative algorithm. to co~nplctely recover t,lle irnage frorrl its edges at all dyadic scales. Tlltlse itlexs are due to Alallat and a.re cxplairlecl very well in Chapter G of his book, A Wavelet Tour of Signal Processing-.
B. 3.3 The F B I Fingerprint Compression Standard A front-to-back image compressiori standard adopted by the FBI for tlie conlpression of fingerprint images is fully described in several publicatiorls and is available to the public. See first Brislawn, Fingerprints Go Digital, Notices of the AhIS, vol. 42 (Nov. 1995) 1278-1283 for a general overview of t,he cornpression standard. More details can be found in the following publications: Brislawn. Symmetric Extension Transforms and The FBI Fingerprint Im,age Compression Specification, in the book Wavelet Image and Video Compression, edited by Topiwala, Chapters 5 and 16 (1998) and references therein.
Appendix C
References Cited in the Text Below is a list of references cited in this text. It is by no means intended to be comprehensive. There are rliariy excelleril and exlerisive bibliographies on wavelets available (for example, in Mallat, A Wavelet Tour of Signal Processing) . 1. P. Auscher, Wavelet bases for L2(R) with rational dilation factor in Wavelets and Their Applications, M. B. Kuskai et al., eds., Jones and Bartlett (1992) 439-452. 2. P. Auscher, G. Weiss, and M. V. Wickerhauser, Local sine and cosine bases of Coifman an,d Meyer and the construction of smooth wavelets in Wavelets: A Tutorial in Theory and Applications, Chui, ed., Academic Press (1992) 237-256. 3. J. Benedetto, Harmonic Analysis and Applications, CR.C Press (1997).
4. J. Benedetto, C. Heil, and D. Walnut, Uncertainty principles for timefrequency operators, Operator Theory: Advances and Applications, vol. 58 (1992) 1-25. 5 . G. Beylltin, R. Coifman, and V. Rokhlin, Fast wavelet transforms and numerical algorithms, Communications on Pure and Applied Mathematics, vol. 44 (1991) 141-183. 6. W. Boyce and R. DiPrima, Elementary Differential Equations and Boundary Value Problems, Third Edition, Wiley (1977).
7. C. Brislawn, Fingerprints go digital, Notices of the AMS, vol. 42 (Nov. 1995) 1278-1283. 8. C. Brislawn, Symmetric Extension Transforms and The F B I Fingerprint Image Compression Specification, in Wavelet Image and Video Compression, Topiwala, ed., Kluwer Academic Publishers (1998) Chapters 5 and 16. 9. R. Buck, Advanced Calculus, Third Edition, McGraw-Hill (1978). 10. C. Burrus, R. Gopinath, and H. Guo, Wavelets and Wavelet Tkansforms: A PL-imer,Prentice-Hall (1998). 11. C. Chui and J. Wang, A cardinal spline approach to wavelets, Proceedings of the American Mathematical Soceity, vol. 113 (1991) 785-793. 12. C. Chui, ed., Wavelets: A Tutorial in Theory and Applications, Academic Press (1992). 13. J. Brown and R. Churchill, Fourier Series and Boundary Value Problems, Sixth Edition, McGraw-Hill (2001).
442
Appendix C. References Cited in the Text
14. A. Cohen, I. Daubechies, and J.-C. Feauveau, Biorthogonal bases of compactly supported wavelets, Communications on Pure and Applied Mat,hematics, vol. 45 (1992) 485-560. 15. A. Cohen, I. Daubechies, and P. Vial, Wa,uelets on the interval and fast algorithms, Journal of Applied and Computational Harnlonic Analysis, vol. 1 (1993) 54-81. 16. I. Daubecllies, Ten Lectures on Wavelets, Society for Industrial and Applied Matherriatics (1992). 17. S. Deans, The Radon Dansform and Some of Its Applications, Wiley (1983). 18. D. Donoho, Denoising by soft th,re.sholding, IEEE Transactions on Information Theory, vol. 41 (1995) 613-627. 19. H. Dyrn and H. McKean, Fourier Series and Integrals, Acadernic Press (1992). 20. G. Folland, Fourier Analysis ant1 Its Applications, Wadsworth & Brooks/Cole (1992). 21. AT. Frazier, Ii~troductionto Wavelets through Linear Algebra, SpringerVerlag (1999). 22. K. Grocherlig and W. Madych, Multiresolutio7~analysis, Haar bases, and self-similar tilings of RrL,IEEE Transactior~son Information Theory, vol. 38 (1992) 556-568. 23. C. Heil and D. Walnut, Continuous and discrete u~avelettransforms, SIAAl Review, vol. 31 (1989) 628-666. 24. E. HerriAndez and G . Weiss, A First Covrse on Wavelets, CRC Press (1996).
25. J. R. Higgins, Five short stories about the cardinal series, Bulletin of the AMS, vol. 12 (1985) 45-89. 26. J. Horvatl-1, An introduction to distributions, The Arrierican Mathematical Monthly, vol. 77 (1970) 227-240. 27. G. Kaiser, A Friendly Guide to Wavelets, Birkhauser (1994). 28. D. Kammler, A First Course in Fourier Analysis, Prentice-Hall (2000). 29. T . Korner, Fourier Analysis, Cambridge Uriiversity Press (1988). 30. J . KovaceviC and M. Vetterli, Perfect reconstruction filter banks with rational sampling rates in one and two dimensions, Proceedings of the SPIE Conference on Visual Communications and Image Processing, Philadelphia (1989) 1258-1265. 31. J. Kovacevid and M. Vetterli, Nonseparable multidimensional perfect reconstruction banks and wavelet for Rn, IEEE Transactions on Information Thcory, vol. 38 (1992) 533-555. 32. J. KovaceviC and M. Vetterli, Wavelets and Subband Coding, PrenticeHa11 (1995).
Appendix C. References Cited in the Text
443
33. W. Madych, Finite orthogonal transforms and multiresolution analyses on intervals, Journal of Fourier Analysis and Applications, vol. 3, no. 3, 257-294 (1997). 34. W. Madych, Some elementary properties of multiresolution analyses of L ~ ( R in ~ )Wavelets: A Tutorial in Theory and Applications, Chui, ed., Academic Press (1992) 259-294. 35. S. Mallat, A TIVavelet Tour of Signal Processing, Academic Press (1998). 36. Y. Meyer, Wa,velets: Algorithms a,nd Applica,tions, Society for Industrial and Applied Mathematics (1993). 37. F. Natterer, The Mathematics of Comprlterized Tomography, Teubner (1986). 38. U. Neri, Singular Integrals, Lecture Notes in Math,ematics, vol. 200, Springer-Verlag (1971). 39. Y. Nievergelt, Wavelets Made Easy, Birkhauser (1999). 40. A. Papoulis, Signal Analysis, McGraw-Hi11 (1977). 41. F. Rashid-Farrokhi, R. Liu, C. Berenstein, and D. Walnut, Waveletbased multiresolution local tomograph,y, IEEE Transactions on Irnage Processing, vol. 6 (1997) 1412-1430. 42. S. Roman, Introduction to Coding and Information Theory, SpringerVerlag (1997). 43. S. Roman, Coding and Information Theory, Springer-Verlag (1992). 44. M. B. Ruskai et al., eds., Wavelets and Their Applications, Jones and Bartlett (1992). 45. E. Stein, Singular Integrals and Differentiability Properties of Functions, Princeton University Press (1970). 46. G. Strang and T . Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press (1996). 47. R. Strichartz, How to make wavelets, Arnericarl Mathematical Monthly, vol. 100 (1993) 539-556. 48. P. Schroder and W. Sweldens, Building your own wavelets at home, ACM SIGGRAPH Course notes (1996). 49. P. Schroder and W. Sweldens, Spherical wavelets: Eficiently representing functions on the sphere, Computer Graphics (SIGGRAPH '95 Proceedings) (1995). 50. W. Sweldens, The lzfting scheme: A custom-design construction of biorthogonal wavelets, Applied and Computational Harmonic Analysis, vol. 3 (1996) 186-200. 51. P. Topiwala, ed., Wavelet Image and Video Compression, Kluwer Acadernic Publishers (1998). 52. J . Walker, A Primer on Wavelets and Their Scientific Applications, CRC Press (1999).
444
Appendix C. References Cited in the Text
53. J. Walker, Fourier Analysis, Oxford University Press (1988). 54. M. V. Wickerhauser, Applied Wavelet Analysis from Theory to Software, A. K. Peters (1994).
Index

ℓ² sequence, 88, 89, 109-111, 165, 173, 182, 183, 201, 202, 289-292, 298, 302, 308, 309
z-transform, 88, 93, 231, 262, 263
adjoint
    approximation, 148, 149, 221-223, 233, 236, 316, 317, 329
    detail, 148, 149, 221-223, 233, 236, 316, 317, 329
    of a matrix, 107, 143, 148, 233
    of an operator, 222
approximate identity, 37, 38, 40-42, 44-46, 65, 66, 68, 212, 251
approximation, 5, 8
approximation space, 134
Auscher, 434, 435
backprojection, 410, 414
bandlimit, 81-83, 85, 179, 180, 184, 210
bandlimited
    function, 81-83, 86, 179, 180, 184, 210, 213
    MRA, 173, 179, 180, 188, 340
    wavelet, 188, 189, 195, 436
bandwidth, 81
Benedetto, xiv, 38, 252
Berenstein, 412
Bessel's inequality, 49, 50, 54, 166, 171, 173, 201
Beylkin, 397
biorthogonal
    scaling filter, 325, 373, 389
    system, 289, 290, 293, 294, 296, 301-305, 307, 312, 319
    wavelet, 319, 320, 373, 438
Boyce, 397-399
Brislawn, 439
Brown, xiv
Buck, xiii
Burrus, 439
Carleson's Theorem, 111
Cauchy-Schwarz inequality, 6-8, 20, 48, 70, 71, 85, 135, 141, 172, 255, 297-299, 310, 403
center of mass, 168, 169, 191, 194-196
Chui, 311, 435, 438
Churchill, xiv
Cohen, 319, 438
Coifman, 397, 435
compression, of images, 151, 154, 256, 278, 371, 373, 380, 439
continuity
    of translation, 45, 69, 70
    piecewise, 3, 4, 6, 32, 45, 83, 87, 110, 165, 168
    uniform, 44, 45, 47, 72, 126, 175, 251, 252, 408
convolution
    circular, 105, 106
    of functions, 68-71, 402, 404, 411, 412, 414
    of signals, 90, 93, 96, 102, 105, 222
Daubechies, 203, 264, 291, 319, 433, 436, 438
    polynomial, 264, 269, 300, 319
    scaling filter, 235, 246, 266, 269, 280-285, 338, 339, 341, 387-389, 393-395, 420, 421
    scaling function, 279
    wavelets, 264, 278, 279
Deans, 406
dilation equation, 173, 215, 220, 221, 247, 405, 433, 436
DiPrima, 397-399
Dirichlet kernel, 38-40, 42
Donoho, 439
downsampling, 221, 222, 433, 434
dyadic
    interval, 18, 115, 116, 120, 122, 123, 358
    partition, 358-360, 434
    point, 251
    step function, 115-118, 120-123, 125, 127, 141, 174
Dym, xiv
Feauveau, 319
Fejér kernel, 39, 41, 42
filtered backprojection, 412, 414
Folland, 397
frame, 291
frame condition, 291, 301
Frazier, xiii
Gauss kernel, 42
Gopinath, 439
Grochenig, 438
Green's function, 398, 400, 401, 413, 420, 421
Guo, 439
Haar
    scaling function, 117-119, 164, 186, 320
    system, 115
    system on [0,1], 118, 119, 141, 437
    system on R, 115, 203, 249, 267, 320
    system, comparison with Fourier series, 115, 127, 130
    system, localization of, 128, 130
    transform, discrete (DHT), 141-144, 146-148, 150
    wavelet, 186, 193, 249, 278, 373, 388, 389, 392
Heil, 252, 291
Herley, 438
Hernández, 436
Higgins, 83
Hilbert space, 109
Hilbert transform, 402-405, 411, 414
Horváth, 38
infinite product, 237, 238
infinite product formula, 237, 239
Kaiser, xiii, 436
Kammler, xiv
Korner, xiv, 35
Kovačević, 434, 438
Landau kernel, 42
Lebesgue
    integral, 38, 76, 110
    measure, 76, 110, 111
    measurable function, 76, 110, 165, 201, 202
linear space, 52, 57, 116, 143, 289
Liu, 412
localization
    frequency, 194, 195, 338, 340, 435
    time, 128, 130, 193, 249, 337
Madych, 438
Mallat, xvi, 439
McKean, xiv
Meyer, xiii, 180, 435
    MRA, 180, 182, 189
    scaling function, 190
    wavelet, 189
Minkowski's inequality, 8, 55, 83, 135, 139, 140, 168, 171, 172, 310
multiresolution analysis, 115, 163, 169, 171, 173, 174, 215, 217, 218, 220, 223, 236, 237, 245, 247, 433, 434, 436
    bandlimited, 173, 179, 180, 188, 340
    dual generalized, 302, 312
    generalized, 300, 314
    Haar, 174, 186
    Meyer, 180, 182, 189
    piecewise linear, 174, 186, 206
    scaling filter associated with, 173, 185, 199, 217, 218, 220, 223, 236, 245, 261, 264, 266, 269, 280-285, 302, 303, 316, 317
    scaling function associated with, 169, 170, 173, 185, 186, 188-191, 194, 203, 215, 217, 236, 237, 245, 247, 257-261, 302-304, 312, 316, 317, 338, 421, 436
    spline, 163, 208, 209
    wavelet filter associated with, 185, 199, 220, 316, 317
    wavelet from, 163, 185, 191, 203, 249, 251, 303, 316
Natterer, 406
Neri, 403
Nguyen, xiii, xvi, 215, 439
Nievergelt, xiii
operator
    approximation, 134, 170, 192, 218, 221, 302, 316, 317, 329, 335
    backprojection, 410, 414
    detail, 134, 136, 170, 192, 218, 221, 302, 316, 317, 329, 335
    dilation, 79, 80, 117, 190, 191, 405, 414
    integral, 397, 398, 403, 408, 410, 413-416, 418, 420, 421
    modulation, 80
    ramp-filtering, 411, 413
    translation, 80, 90, 117, 190, 221, 405, 414
orthogonal subspaces, 197, 350, 351, 353, 358
orthonormal
    basis, 163, 167, 170, 173, 191, 197, 202, 209, 211, 234, 251, 289-292, 346, 349, 350, 352, 353, 356-361, 371, 420, 421, 433, 434, 436, 437
    system, 47-51, 53, 54, 56, 57, 85, 115, 118, 119, 122, 134-136, 163, 190, 198, 219, 250, 319, 347, 348, 350, 352, 357, 361-364, 421
    system of translates, 164, 165, 167-171, 174, 178, 179, 182, 197, 199, 243, 244
    system, complete, 52, 54, 57, 122, 133, 138, 154, 163, 191, 199, 204
Papoulis, 107
Parseval's formula, 74, 75, 85, 164, 204, 210, 227, 252
piecewise
    constant, 174, 183
    continuous, 3, 4, 6, 32, 45, 83, 87, 110, 165, 168
    differentiable, 32, 33, 35
    linear, 174, 186-188, 206, 256
    polynomial, 10, 33, 206, 250
Plancherel's formula, 54, 72, 75
quadrature mirror filter
    conditions, 218, 223, 229-231, 236, 237, 245, 247, 249, 264, 265, 349, 351, 357, 437, 438
    conditions, biorthogonal, 317-320
quadrature mirror filter (QMF), 229, 231, 232, 236, 237, 239, 240, 243, 245, 246, 248, 261, 438
quadrature mirror filter pair, 318
Radon inversion formula, 408, 409, 411, 412
Radon transform, 406-409, 412
Rashid-Farrokhi, 412
Riemann-Lebesgue Lemma, 64, 79, 204, 251, 252
Riesz basis, 289-293, 296, 300-305, 307, 308, 311-314, 436
Riesz transform, 405
Riesz-Fischer Theorem, 109, 110, 165, 166, 201
Rokhlin, 397
Roman, 376, 380, 385
Ruskai, 434
scaling
    filter, 153, 163, 173, 185, 199, 217, 218, 220, 223, 226, 232, 233, 236, 243, 245, 247, 249, 261, 264, 276, 302, 303, 316, 317, 325, 337, 346, 354, 373, 437
    function, 153, 163, 169, 170, 173, 185, 191, 193-195, 203, 215, 217, 218, 221, 236, 237, 239, 240, 243, 245, 247-250, 257-261, 300-304, 312, 316, 317, 319, 337, 338, 346, 347, 415-417, 421, 436
scaling subspace, 206, 250, 300, 415, 418, 419
Schroder, 438
sequence, Cauchy, 12, 25
Shannon entropy, 361, 363
Shannon Sampling Theorem, 83, 179
spline functions, 9, 10
Stein, 403
Strang, xiii, xvi, 215, 439
Strichartz, 265
subspace, 163, 164, 167, 169, 170, 180, 360, 361, 364
Sweldens, 438
Topiwala, 439
triangle inequality, 8, 11, 23
trigonometric polynomial, 29, 52, 167, 169, 261, 262, 264, 267-269, 296, 300, 312, 319-321, 323
trigonometric system, 28, 29, 31, 48, 55, 56, 115, 127
uniform scalar quantization, 373
upsampling, 221, 222, 433, 434
vanishing moments, 249, 250, 252, 254, 257-261, 264, 279, 319, 346, 372, 373, 389, 412, 419-421, 437
vector space, 107
Vetterli, 434, 438
Vial, 438
Walker, xiii, xiv, 33-36
Walnut, 252, 291, 412
Wang, 311
wavelet
    filter, 163, 185, 199, 217, 218, 220, 221, 232, 233, 236, 316, 317, 319, 373
    function, 163, 185, 191, 193-195, 203, 217, 249, 258-260, 303, 316, 317, 415, 417, 421
wavelet basis, 115, 163, 164, 180, 185, 186, 191, 196, 197, 235, 264, 372, 397, 400, 405, 412, 416, 434, 436, 437
wavelet packet subspace, 346, 350, 356, 358
wavelet space, 136
wavelet subspace, 197, 346
Weiss, 435, 436
Wickerhauser, xvi, 435