This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
) 2. It is conjectured that, for 1 < p < 2n/(n + 1), these operators are of strong type (p,p) precisely when S > S(p) — n{\/p - 1/2) - 1/2. Progress on this conjecture can be found in Fefferman,12 Carleson-Sjolin,4 Bourgain, 2 Wolff;39 it remains open for n > 2 and p close to 2n/(n + 1 ) . A natural strengthening of the above conjecture is the claim that 5*W is of weak type (p,p) whenever* 1 < p < 2 n / ( n + 1 ) . This claim has been verified "The necessity of the condition on p was shown in Fefferman.13 0, where $}>•* is the portion of ft on the set {|ft| > 2 " } . One defines the corresponding functions Kf'*, etc. in the obvious manner. The sum in (10) can now be rewritten as
<8-l-p
(2.18)
for all J-separated {J/J} in [0, l ] 2 , all {a,} which satisfy (S2 £V \aj\q')1/q' < 1, and all pj G (1,2). Let xjj — Xj, pj = Tj and a, = (62M)~1/q for j = 1 , 2 , . . . , M . Then by (2.18)
/x^AM*) 1 /"' < M|JSI|1/P' < 11*11^ < 6-'-"{82M)l">', which implies (2.16).
□
16
W. Schlag
At this point it might be instructive to consider those bounds on fi that cor respond to the points P, R, 5, T in Fig. 2.1. By Lemma 2.2.1, P : p = 5/2,9 = 5
n<\-3/2M^2
(2.19)
R:p
n<M
(2.20)
/x<
(2.21)
^A"1*-1.
(2.22)
= l,
g = co
S:p=l,
9
=1
T:p=2,
9= 2
Not surprisingly, inequalities (2.20) and (2.21) are trivial, whereas (2.22) follows (up to a | log 6| ' factor) from (2.7). Our main goal will be to show (2.19) (the result below will involve a | log S\ factor, though). In order to do this we shall need an improved version of the L 2 statement, i.e. inequality (2.22). 2.3. The L2
theory
Before formulating the result, we consider an example. E x a m p l e . Let 10 S < lOp < r < 1/2 and define JS = { x e R 2 : l - p < |X| < 1} and A = \fpjr. It is easy to see that F = {MSXE > A} ~ B(0, r) and M ~ r2/S2. To determine fx, note that $ will be approximately constant on E1 = {x:l-p<
|x| < 1 - p/2} .
Hence n\Ei\ ~ /
$~\M5.
JEX
Thus
We shall prove below that this improved version of (2.22) holds in general (up to a | log 81 factor) with r replaced by the typical distance of two intersecting annuli (for a precise version of this see p. 22). To this end we need a refined version of the L 2 inequality (2.7). First we recall a result from Ref. 2.
IP —► Li Estimates for the Circular Maximal Function
Lemma 2.3.1. Let K e L1^) aj=
17
assuming K differentiable. Define for j e Z sup |AT(0|, |*|~2>
Pj=
sup |{VAT(0,OIsuch that supp(f) C {Rd : 2 J _ 1 < |f| <
Then for any fixed j andfeS sup\f t-
1
*Kt\
2j+1} D
a
L (R<<)
By well-known decay properties of daT (see Proposition 1.1.1 above) Lemma 2.3.1 implies that HA*/IU»(tf) < II/IIL»(H»)
(2-23)
for any / € 5 whose Fourier transform is supported in {R2: 2^ _ 1 < |£| < 2^ +1 } for some j > 0. The following proposition shows that this estimate can be improved if one restricts the maximal function to a small ball. We prove this fact by combining Bourgain's original argument with Lemma 2.3.2 below. Proposition 2.3.1. There exists an absolute constant Co so that for any j = 1,2,..., all f G <S with supp(f) C {R 2 : 2*" 1 < |£| < 2> + 1 }, and all 0 < t < 1, x0 e R 2 l l ^ / | | ^ ( f l ( I O , 0 ) < Cot1/2||/||L>(R>) •
(2-24)
Proof. We may assume that x 0 = 0. Choose cutoff functions %j) G C Q ° ( R 2 ) with V = 1 on B(0, 1), 77 G C*£°(l/2, 4) so that r j = l o n (1, 2), and
= 1>(t-lx)r,(r) [
e 2 ™<
to(r\{\)$(2-1
$/(£)<%.
Jit?
Let {r T } T be a 2 _ J net in [1,2]. Suppose rT
18
W. Schlag
sup \Alf\2
[2\JVpf\>dp + 2-i f J\ h
= 2iA +
\^A^ \dP
dp
2-jB.
By Proposition 1.1.1, do- has the representation £ « ) = K{e 2 ^lu,(|£|)}
(2.25)
dku(s) dsk
(2.26)
withw G C°°(0,oo) and
for all fc = 0,1,2,
< ( i + H ) -l/2-fc
Hence the integral of A can be written as
m,m2-jok2-jom)W)<%d£,
f Adx= f f Js?
(2.27)
Jit? Jv?
where (ICI-ICD) J-oo JV?
x
xl>2{x)T]2{r)Lj{r\i\)uj{r\l\)dxdr.
(2.28)
Integrating by parts and applying (2.25) shows that
\m,D\ < 2-ji2(i+t\t - f i)-a(i + IKI - \i\\y2, provided that |£| ~ |f | ~ 2 J . Lemma 2.3.2 and Schur's lemma yield / Jvt?
Adx<2-H\\f\\l
L
Bdx<2^||/||22,
Similarly,
and the proposition follows.
□
The following lemma is true because the ||£| — |£|| factor reduces the twodimensional scaling in the integral below to one dimension.
V —> Lq Estimates for the Circular Maximal Function 19
Lemma 2.3.2. Let 0 < t < 1. Then
sup / (i+t\t - i\r2(i + IKI - i£nr2 dtzt-1. Proof. Fix a £ € R2. Then, on the one hand,
(i + *K-£ira(i + IKI-K"ll)-a#
•^{C:|€-«l<|. = l«l/2}
c £
(1 + *2>W
ai<|€-|/2
.
(i + Hfl-HlD-a^
•fofc-fl**}
,-, /•l«l+2' 2 1 £ (l+^)- 2>|^|- / (l + | r - ^ | | ) - 3 r d r
~
2'<|€|/2
2><|£|
2H<1
t-1<2i<|(|
~ r1 + t~2t ~ t-1. On the other hand, J{t \(-i\> l«l/2} /
( l + ^ - ^ l ) - 2 ( l + ll^|-|CH)-2^ .
.
.
(i
+ t |£|)-2(i +
|| f |_ K -||)-ade
•/{«: l«-€l>l€l/2.ll€|-|«ll
+/
.
( i + ^ - ^ i r a ( i + ii«i-ifiirade
J
it- ll«l-l€ll>l€l/2}
A + B. The first term in (2.29) can be estimated as follows: ^-(l+^l)-2/
(•3|C|/2 /•3ICI/2
(l + | r _ | e | | ) - 2 r d r
•/|£l/2 'ie"l/2
-(i + 'llir^lSr 1 . For the second term compute
(2.29)
20
W. Schlag
(1 + ^ | ) - 2 ( 1 + | ^ | ) - 2 ^
JS~/ 3|||/2<|«|}
+/ [
(l+*|l|)- 2 (l + |il)- 2 «
._ ICI/2>|«I}
(t|Cl)-2(l + ICI)- 2 ^+/
=/ l
■'{«:«-
(1 + kl)- 2 ^
•'{«: 3|€|/2<|€|
a
2
+(i+*ili)- (i+ieir KT < l + |logt| + l < t - x .
D
R e m a r k . (2.28) above shows that Proposition 2.3.1 is essentially equivalent to the following estimate for the two-dimensional wave equation. Let u solve □ u = 0,
u(0) = / ,
ut(0)=0.
Then there exists an absolute constant Co such that /
\u(x,t)\2dxdt
/
JO
(2.30)
JB{x0,r)
for all 0 < r < 1. It might be interesting to ask whether such an estimate can hold in IF with p ^ 2. Interpolating (2.30) with Sogge's sharp local smoothing conjecture, 15 i.e. f
f
|u(x,r.)| 4 dxdr.
(2.31)
Jo JK>
with £ > 0 yields ( f Wo
f JB(XO,T)
''
\u(x,t)\"dxdt)
(2.32)
/
for 2 < p < 4, x0 G R 2 , 0 < r < 1, £ > 0 and all / e 5 . Solving the wave equation above with initial condition / equal to a smooth version of Xcs(o,r) shows that the exponent 2/p—1/2 is optimal. Moreover, as in the case of local smoothing, (2.32) cannot hold for p ^ [2,4] or with e — 0 if p > 2. It is standard to pass from / as in the statement of Proposition 2.3.1 to general / € L 2 . This is done in the following corollary.
V —> V Estimates for the Circular Maximal Function 21
Corollary 2.3.1. There exists an absolute constant Co such that for any f G L 2 (R 2 ), io G R2 and 0 < 6,t < 1,
\\Msf\\»m*o.t)) < ^o* 1/2 |iog*| 1/2 ||/|| 2 .
(2.33)
The equivalent dual statement to (2.33) is: $3 a iXc«(vi,/»i)
^CollogS^S-H1'2
(2.34)
3
L (1P)
for all S-separated {t/j} in B(xo,t), Pi € ( 1 , 2 ) .
all {aj} for which S2 £ ) . I a j | 2 < 1> <*n^ <*H
Proof. Choose
%X6,r-
If Af denotes the usual Hardy-Littlewood maximal operator it is easy to see that \\MSf\\L'(B(zo,t))
£ l|M/o||LJ(B(io,t)) +
< t||Af/o||oo +
XI l<2Kf"'
E
SUP
l
\X6,r*fj\ L'(B(zo,t))
\\Wj\\vw*o,t))
l<2i<6~i
<<||/o||oo + *1/2
E
Mh
K2)<S-i 1/2 2
1 2
t||/0||2 + ^ | l o g * | / (
£
2
H/,11 )
1<2J<«-
(2.35)
In line (2.35) we have used a special case of Bernstein's inequality, namely ll/0||oc<||/0||2.
□
22
W. Schlag
In order to obtain information on /x from (2.33) we will determine the typical distance of the centers of two intersecting annuli in any collection of annuli. More precisely, we can specify the distance of the centers and the angle of intersection of those annuli that contribute most to the multiplicity function $. Following Ref. 8, we will accomplish this by applying the pigeon hole principle to our family of annuli satisfying (2.13). Define A = | log 8\~ A/2, p, = | l o g $ | ~ V & = |log5|" 2 M/2. Furthermore, for all i,j € { 1 , 2 , . . . , M } we let (for the meaning of A see Lemma 2.4.2 below) A i j = max(<J, \\x{ -Xj\-
\r{ - r,-||),
S ( e = { i : Ci n Cj ^ 0, t/2 < \n - xj\ < t,e < Aitj
< 2e},
(2.36)
(recall that C» = Cs(xi, r<) and C* = End). The pigeon hole principle asserts that there are numbers t £ [6,1], e £ [S, 1] such that |{C;:*£,>p}|>A|C,-|
(2.37)
for at least M values of j , say 1 < j < M. Indeed, let j be one of the (at least) M / 2 indices satisfying (2.13), i.e.
KC7:*>/*}|>^|C,|. Let x € CJ so that $ ( i ) > /i. We conclude that for some choices of t and e depending only on x and j we have
For if not, then (in the following sum t and e are dyadic)
t,e€lS, 1]
contradicting the choice of i . Similarly, we see that (2.37) holds for any j as above and for some choices of t and e depending only on j . Finally, applying the pigeon hole principle in j yields that there are t and e such that (2.37) holds for at least M values of j . Otherwise, the number of j ' s satisfying (2.13)
W —► Li Estimates for the Circular Maximal Function 23
would have to be strictly less than | logS\ M = M/2. Henceforth we will fix e and t to be those numbers. By essentially the same argument as in the second part of Lemma 2.2.1 we can now establish the refined version of (2.22). L e m m a 2.3.3. The multiplicity fj, satisfies the following a priori estimate with absolute constants C and b. fiKCllogSfX^S-H.
(2.38)
Proof. Let {ZJ} be a t-net and consider the quantities Mi(i) = card{l < j < M: Xj € B{zut)} M2{i) = card{l < j < M: x} G B{zu
, 2t)}.
Then, clearly
Y^Mi(i)^M
and
^M2(i)~M.
i
(2.39)
t
Since M = | log <5J— M/2 we conclude from (2.39) that there is a point of the net, say z0, such-that Mi = Mi(0) and M2 = M2(0) satisfy Mi > \\ogS\~ M2. Define *1 =
YJ j:\Xj-zo\S2t
XC
'i ■
As in Lemma 2.2.1 we distinguish two cases. If
\{E:9i>ii)\<jrl\M2S,
\Ei\ =
then by Corollary 2.3.1 (setting x 0 = z0) WMsXErWwBfr.t))
< Co I log 8\1/2tl'2\Ei\1'2 2
1 2
The expression on the left is > \(6 Mi) / . that Xj G B(zi,t) we have
Thus (2.37) implies that
or equivalently
.
(2.40)
Indeed, for any 1 < j < M so
24
W. Schlag
\Cjr\Ex\yX\Cj\, MSXEAXJ)
^
or
*•
Since the {ij} are 5-separated, our claim follows. On the other hand, the right side of (2.40) is < C0 | l o g J | 1 / V / 2 ( / i - 1 A M 2 5 ) 1 / 2 by our assumption on \E\\. Recalling the definition of Mi, M2, X etc., we obtain (2.38). If |-Ei| > p~1XM25 we use duality, i.e. (2.34). Letting xo = z0 in (2.34), replacing t by 2t, setting j/j = Xj, pj = r,, and a,j = (S2M2)~1^2 for j = 1 , . . . , M2, or dj = 0 otherwise, we obtain ||*i||a < Co \log6\l/26-HV2{52M2)1/2
.
The left-hand side is > fi\{E : $1 > A}| 1 / 2 = fi\Ei\l/2 > and the lemma follows.
iJL{ix-l\M25yi2 D
In Sec. 3.4 it is shown how to obtain (2.38) (in a slightly different form) without using the Fourier transform. The main tool turns out to be a two-circle lemma, cf. Lemma 3.4.4. 2.4. The three-circle
lemma
In the previous section implicit information about circles was used to prove an L2 bound on the maximal function and thus a bound on the multiplicity /j. In this section we shall attempt to use explicit geometric properties of circles in order to bound /x. The procedure we apply here was discovered by L. Kolasa and T. Wolff,8 who in turn use Marstrand's three-circle lemma, cf. Ref. 9 and Lemma 2.4.1 below. The underlying principle for that lemma is the following geometric observation. We call two tangent circles internally tangent if the smaller one is contained inside the larger one. By assuming that the centers of all circles under consideration are contained in a fixed set of diameter one, we shall henceforth rule out external tangencies. Fact. Given any three circles which are not internally tangent at a single point, there are at most two circles which are internally tangent to the three given ones.
Lp —► Lq Estimates for the Circular Maximal Function
25
Kolasa and Wolff observed that this fact can be combined with a basic result from extremal graph theory to control the total number of possible tangencies in a large collection of circles of which no three are tangent at a single point. Indeed, we have the following: Proposition 2.4.1. Suppose {Cj}f is a collection of distinct circles in the plane so that no three are tangent at a single point. Then card{(i, j): Cu C, ore tangent} < N5/3 . In particular, at least half the circles will be tangent to no more than < N2^3 other circles. Proof. Let Q = {{CuCh,C^,Cj3):
Ci and Cjk are tangent for k = 1,2,3} .
On the one hand, the above fact implies that
(N card(Q) < 2 On the other hand, with rn = card{Cj: d, Cj are tangent}, On the other hand, with U{ = card{Cj: d, Cj are tangent}, N
/
\
N
card(Q)>J2[ * > ^ K - 2 ) 3 / 6 . Thus N
N
^2ni = Y,(ni-2) + 2N i=l
t=l N
N
<(5>i-2)3) Vj=i
1/3
N2/3 + 2N
/ 1/3
N*'3 + N
< #5/3
Since card{(i,j): CitCj are tangent} = £ ) i = 1 "t, we are done.
□
26
W. Schlag
This proof is just a special case of a well-known argument that provides upper bounds for the maximal number of edges in a bipartite digraph with m edges and n sinks containing no Ks
Lemma 2.4.1. Let (xj,rj)?=1 the set S= ix€R2\\jB(xJte):3r€{l,2) *•
with
< X,e < 1. Consider
||z< - x\ - \n - r\\ < e
3=1
for i = 1,2,3 and |e<(x,r) - e,(x,r)| > A for i ^ j ,i,j = 1,2,3 >. Here ei(x, r) = — \Xi -
rsgn(ri - r ) . X\
Then Ns(S) < ( | )
A" 3
for any 0 < S < e. Remark. It is easy to see that the bound on Ng(S) can be attained. Proof. Let fi = {(x,r) € R2 x (1,2): \x-xj\>3e,r?ri, for i^j
|ei(x,r) - ej(x,r)\
,i,j = 1,2,3} 3
and F: fi -> R be denned by F(x,r) = ( | x i - * | - | r i - r | ) | L 1 . It is easy to see that the Jacobian JF of F satisfies
>A
IP —► Lq Estimates for the Circular Maximal Function
27
JF ~ |ei - e2\\ei - e 3 ||e 2 - e 3 | > A3 . Since card(F _ 1 (p)) < CQ for some absolute constant Co and all p € R 3 , we conclude that \F-l(B(0,2e))\<e3X-3. According to the definition of S there exists a function r : S —► (1,2) such that for every x € 5 we have \F(x,r(x))\ < e. Then clearly {(x,r): i e 5 , |r - r(x)| < £> C F - 1 (fl(0,2e)) Ui(x,r):xeSn
( J fl(x,-,3e), |r - r(z)| < e - |
and thus | 5 | < £2A~3.
D
The following lemma contains bounds on the diameter and the area of Cs(x, r)C\Cs(y, s). In various forms it appears in several papers on this subject, see, e.g. Refs. 1, 8, 9 and 15. Since the exact version we use here does not seem to be contained explicitly in any of these references, we provide a proof for the reader's convenience. Let A = max(||x-y|-|r-5||>($). L e m m a 2.4.2. Suppose x, y € R 2 , x jt yt \x — y\ < 1/2, and r,s € (1,2), r =fi s, 0 < 6 < 1. Then there is an absolute constant A such that (a) Cs(x, r) n Cs(y,s) is contained in a 6 neighborhood of an arc of length < Ay/A/\x — y\ centered at the point x — rsgn(r — s) i*"^. (b) The area of intersection satisfies \Cs(x,r)nCs(y,s)\
y/A\x-y\
'
Proof. Let z € Cs(x,r) D Cs(y,s). Then \z - x\ — r\ and \z — y\ = s\, where |r — r i | < 5 and \s — si\ < S. By simple algebra 2(z - x) • (y - x) = r 2 - s 2 + \y - x| 2 . Assume r < s. Then (2.41) implies 2 r i | x - y | ( l - c o s Z ( z - a ; , x - j / ) ) = (rt + |x - y\)2 - sj and thus
(2.41)
28
W. Schlag
^-'■'-W''~t- ( ;r n W^
<242)
If r > s one estimates Z(z — x,y — x) in a similar fashion. If A < 105 the bound in (b) follows from (a). Otherwise consider a = £{z — x, x — y) as a function of r\ and s\. Taking partial derivatives in (2.41) yields da , , . , , ■3— ri\x- y | s i n a — r\ 4- \x - y | c o s a , da - — n x - y\ svna = - s x . os 1 Thus
da
da <(|a||x-y|)-1~(A|x-y|)-1/2. dsi The last equality is true since A > 105 implies that (2.42) holds with ~ instead of <. Since r^ and si vary in a S interval, a will be contained in an interval of length < 6/y/A\x-y\ and (b) follows. □
+
Proposition 2.4.2 below is the main result of this thesis. Proposition 2.4.2. Ms is of restricted weak type (5/2, 5), i.e. for any f €
L1nL°°(R2) ||^/||5,oo
(2-43)
where b and C are absolute constants. Proof. In this proof we let Bs denote a constant of the form C\ log 5\ , where the values of C and b are allowed to vary depending on the context. By Lemma 2.2.1 we need to show IM < Bs\-3/2M1'2
.
(2.44)
C and b are determined implicitly in the calculation below. This will follow from the combinatorial argument in Ref. 8, which is based on the three-circle lemma, and the refined L2 bound from above. A is the absolute constant from Lemma 2.4.2. Case 1:
/e\1/2 X < 100Al-\
On the one hand, by (2.37) and Lemma 2.4.2,
(2.45)
LP -> Li Estimates for the Circular Maximal Function
i™ ~ X ? Xc< ~ card(5'-e)^ ~ M ^'
29
(2 46)
'
On the other hand, by Lemma 2.3.3 On the other hand, by Lemma 2.3.3
Thus
Hence, if M<(-)
(-)
,
(2.47)
then
= BiA~3/2M1/2,
(2.48)
where we have used (2.45) and (2.47) in line (2.48) to replace A respectively. If, on the other hand, M>(i)
(i)
,
and M 1 / 2 ,
(2.49)
then
= Bs\~3'2M1'2. Here we have used (2.45) and then (2.49) in line (2.50).
(2.50)
30
W. Schlag
Case 2:
(2.51)
A>100A(-J
Following Ref. 8 we let Q = {(j, *i,*2»*3): 1 < J1 < A/,ii,t2 > t3 6 5j e and the distance between any two of the sets Cj n Cj,, Cj D C i2 , C, D Q , is at least A/20}. (2.52) Suppose (j,h, 12,13) G Q. Then Lemma 2.4.2 implies that any two of ei = xj - Tj sgn(r,-
-n)-^-
\Xj -
Xi\
for i = ii, %2, 13 are separated by a distance A/20. Indeed, by that lemma, ei is the center of Cj n Cj and in view of (2.36), for any i € Sf e 7 inn Cj) C J )<< 2AJ2 A j | < A/50 diam (Cj
(2.53)
by (2.51). Lemma 2.4.1 therefore implies that card(Q) < ( j )
r3M3.
(2.54)
On the other hand, we claim that
(2 55)
-"^"('i^s)'-
'
This would clearly follow from min_ card({(ii, 12,13) € {Sfe)3
: The distance between any two of the sets
l<j
Cj n Ch, Cj n Ci2, Cj n C i3 is at least A/20}) > U ^ — \
.
(2. 56)
Denote the set on the left-hand side by Q^ and fix any j as above. By (2.46) the number of possible choices of ij is
Suppose that (11,12,13) e Q^.
We claim that
L" — ► L* Estimates for the Circular Maximal function card({t G Si
: ( i i , i 2 , i) € (?<»}) > ^ - ^ l - .
31
( 2 .57)
To prove (2.57) let Ry and ft2 be "rectangles" in Cj of length A/5 and width S centered at e^ and e^, respectively. Using (2.53) we conclude that i G S{e,
d n Rr = 0
for r = 1,2
implies that dist(Cj n C i r , Cj n Q ) > A/20
for r = 1,2.
Since |{C; \ (IZx U i l a ) : *>,« > /i}l > | | C i l , (2.57) follows from (2.46) (simply replace (2.37) with the previous inequality). Estimating the number of admissible choices of i 2 given a fixed i\ in a similar fashion proves (2.56) and thus (2.55). We infer from (2.54) and (2.55) that
,3 S B i r°(£)"'(^) s V..
„58)
Combining (2.58) and (2.38) yields
Hence, if
^ r m *-■''• 'e\,/6/*M1/J\s/,1
(2.59)
we conclude that
s ^ (( ^(^yv.,f = BSX~3/2M1/2.
(2.60)
32
W. Schlag
The expressions in (2.60) are obtained by estimating A by (2.59) and (2.51), respectively. If, on the other hand,
^GJTF) -"""■
(2.61)
then
,(r(«e)-V) w («!e)V = BS\~3,2M1'2.
(2.62)
To obtain (2.62), use (2.61) and the inequality
(D^y'v^,,
which follows from (2.51). Consequently, we have established (2.44) and the proposition follows. □ 2.5. Proof of Theorem 2.1.1 The following lemma states that instead of averaging over S annuli we can average over a mollified version of dar which is essentially concentrated on a 8 annul us. Lemma 2.5.1. Fix a radial function <j> 6 <5(R2). Suppose that for fixed 1 < P < 9 < oo, a < 3 \\Msf\\q <S-a\\f\\p forall0<6
Then \\M{f*
for allO <S < 1, / 6 5 .
IS -* Lq Estimates for the Circular Maximal Function
33
Proof. Write ^(|x|) = 4>{x). We construct a radial, non-increasing majorant for <j> as follows. Let p{r) = r2\<j>'{r)\ and define
V-(N)= r\4>'{r)\dr J\x\ or equivalently
f
1>(x) = / (XB)r(x)p(r)dr, Jo where B is the unit ball in R 2 . Note that /
ip(x) dx = I
JK*
ip(r)r dr •
p(r) dr =
JO
JO
Let / € S. Then sup \dat *{<j>6*f)\<\ Kt<2
I
+
L Jo
\ sup Js-i
[dut * (XB)TS] * | / | p(r) dr
) Kt<2
A + B. On the one hand
\\A\\q < Jof
lMiOr*|/|
<5~a
p(r) dr
r-"p{r)dr\\f\\t Jo
<*-aH/llp since a < 3. On the other hand, by Young's inequality with l + l/q = | | B | | , < r\\(XB)iOrS*\f\\ Js-l
< r
Js-i
"
P(r)dr I <J
UxBhorS ii
1/p+l/s,
Wf\\rP(r)dr *
[°° (6r)-V''p(r)dr\\f\\p
< 11/11, and the lemma follows.
□
34
W. Schlag
Proof of Theorem 2.1.1. Statements (2.9), (2.10) of Theorem 2.1.1 follow via Marcinkiewicz's theorem from the estimates at the points Q, R, T, P (see Fig. 2.1). To prove (2.8), suppose we are given any / 6 S. Let oo
/ = £/; j=0
be a Littlewood-Paley decomposition, i.e. supp(/o) C {R2 : |f | < 2} and supp(/j) C {R 2 : 2?-1 < \i\ < 2J+ 1 } for j = 1,2,.... On the one hand, (2.9), (2.10), and Lemma 2.5.1 imply l|A4/ill,
if(l/p>l/9)eQPuPT(seeFig.2.1)
(2.63)
for any e > 0 and j — 1,2, On the other hand, by the local smoothing theorem in Ref. 10 (see also Ref. 2 and 15) IWillp<2-^||/i||P, where 2 < p < oo, f3 = /3(p) > 0, and j = 1,2, (2.63) yields I W i l l , £ 2-*||/,-||p
(2.64) Interpolating (2.64) with
if (1/p, 1/9) G region I\QP U PT
(2.65)
for some 7 = j(p, q) > 0. Furthermore, IWo||,<||/o||p
(2.66)
by the Hardy-Littlewood and Bernstein inequalities. Finally, (2.8) follows from (2.65) and (2.66) by the Littlewood-Paley inequality. Up to a I log SI factor, (2.11) follows by interpolating the estimate at T, i.e. (2.7), with the ones at the endpoints R and 5: ll^/lli + lW/llcc^^ll/lli. To obtain the sharp estimates, let / = Y^f fj (2.67) is
De
(2.67)
as above. The analogue of
WMfih + WMfiUZnfih
(2.68)
(see Lemma 3.2.1 below). Interpolating (2.68) with the L 2 bound (2.23) yields \\Mfj\U < 2* 2 /»- 1 )||/ j || p
if (1/p, 1/q) € region IV.
(2.69)
V
-+ Lq Estimates for the Circular Maximal Function
35
(2.11) now follows from (2.69) by the same type of argument as in the proof of Corollary 2.3.1 provided that 1 < p. The estimates on the segment SR follow from the ones at the endpoints. We skip the details. D 3. Some Remarks and a Slight Improvement 3.1. Estimates
for the global maximal
function
Theorem 2.1.1 holds for the global version of M. Let Mf = sup
ra\dar*f\.
0
The number a can vary and will be specified below. One can pass from bounds on Ms to bounds on M by Littlewood-Paley theory. In Sec. 3.2 we will then apply these global bounds to obtain a version of Theorem 2.1.1 for the wave equation. The following lemma is essentially a calculation from Ref. 1. Lemma 3.1.1. Let I < p < oo, 1 < p < q < oo and /? € R. Suppose that \\Mf\\q<
2#||/||
(3.1) 2
j 1
2j+1}.
for all j = 1,2,... and f eS such that supp(/) C {R : 2 ~ < |£| < Then with a = 2 ( l / p - 1/q), \\Mf\\q < ll/Hp
ifP<0,q>
2, and 1 < p
\\Mf\\q < C H / H M .
t//3>0, e > 0
for all f € <S. Here L* are the usual Sobolev spaces. Proof. Let / = 53o°/i be as in the proof of Theorem 2.1.1. Recall that (9h-(x) = 2-2kg(2-kx). Then (with Af° denned in Lemma 3.1.2) Mf
= sup sup ra\doT * f\ fe>0r~2-fc
< sup sup r ° dar * k
fc>0r~2-
(S>)
+ sup sup 2 fc>0r~2-*
-fco
dar *
(EA) X
j>k
'
< M ° / + sup^2-fc«(>((/i)2k)2fc>0
j>k
2-ka(M(fj)2>h-> fc>0
j>fc
9\l/9
(3.2)
36
W. Schlag
The first term in (3.2) is bounded by Lemma 3.1.2. Assume first that 0 < 0. Let s = max(2,p). Using the inequalities of Young and Littlewood-Paley, we can then estimate the second term as follows. 9 \ 1/91
2_fco
(£|E "*>0'j>k
k
(^)2 )2->D 9 \ 1/9
= (E(E 2 " fco ii(^(/^)2-ii,) 9 ) 2
, 9\ V?
w
s(E(£ °- ii/iiipY) *>0 ^j>k
'
'
^(Ei^iip) \1/2II
11/ N\
j
/
(3.3)
Up
If f3 > 0 we compute, starting in line (3.3) above,
/ I
(E E
2_fca
<
2_
l
^^k>0 fc>0 j>k I i>Jb
| 9\\9\:1/911
-
(^(/i)2*) 2 - ) 'I ''
2 i£2 +e)
(E ^(E " ^k>0
S>*
^
'U 9\l/9
9
ii/iiip) ) 9 \ 1/9
<
(E 2~kPq (E2_je 11 a - A)^)/ 2 /II P ) ')
<^II/IIL S+ . for any e > 0.
D
In the following lemma we recall a well-known fact about certain maximal averages.
Lp —► L* Estimates for the Circular Maximal Function
37
Lemma 3.1.2. Let 0 < a < 2 and define for any f € L1 f~l L°°(R 2 )
Maf(x)=
sup r°" 2 / 0
\f(x-y)\dy.
JBr(x)
Then 1 1 a fbr- = - - 5 l < p < 9 < o o . q~p~2
Maf\\q < H/llp
(3.4)
Proof. For a = 0 this is just the usual Hardy-Littlewood maximal function. In case 0 < a < 2 , g < o o w e have the inequality
"-'MSjCl**Since the kernel is in weak L2^2 a\ (3.4) follows from Young's inequality. Finally, if q = oo, (3.4) follows from Holder's inequality. □ We can now state the global version of Theorem 2.1.1. Theorem 3.1.1. For any f € <S(R2), l|A
in region I \ ( Q P U P T )
(3.5)
and for any e > 0
IWII,
in region n ,
(3.6)
1 in region IV. P We have set a = 2 ( l / p - l/o) throughout.
(3.8)
P
1
2 7 =
Proof. The above statements follow from Theorem 2.1.1 and Lemmas 2.5.1 and 3.1.1. □
38
W. Schlag
3.2. Some estimates
for the wave
equation
It is well known that circular averages can be imbedded into an analytic family of operators. As in Ref. 18 we let
A?f = k?*f,
v
fes
where 22\a-l
k°(x)
T(a) (i-W )? For 3?(a) < 0 this is defined by analytic continuation. In particular,
dor*f = A%f and u(x, t) = tAf
f solves Du = 0,
u(0) = 0,
ut(0) = f.
By interpolation one can obtain analogues of Theorems 2.1.1 and 3.1.1 for the operators
Maf=
sup
\A?/\:
Kt<2
Maf = sup ta\A?f\. 0
Estimates of this type go back to Ref. 18 (see also pp. 518, 519 of Stein 19 and the references therein). For simplicity we shall restrict ourselves to the wave equation, i.e. a = 1/2. Solving the wave equation above with initial conditions given by suitable modifications of Examples 1-3 on p. 11 shows that the following theorem is optimal (up to e). For the meaning of the points Pi etc. see Fig. 3.1. Theorem 3.2.1. For any f G S let u be a solution of the wave equation as above. Then sup < 0-1 u(-,*)
< ll/Hp
in region A \ XPX U P^Y U YZ
0
and sup t a - 1 u ( - , t ) 0
with e > 0 and
two-dimensional
l^+.
(3.9)
V
- f Lq Estimates for the Circular Maximal Function
39
S=(l,l)
•X = (i.O)
Q = (o,o)
«-(l,0)
'
Fig. 3.1. Regions of boundedness in Theorem 3.2.1.
p 7
q
= JL_i_! 2p
7 =
in region B ,
(3.10)
in region C,
(3.11)
2 2q
2 3 P~2
in region D .
(3.12)
We have set a = 2(l/p — 1/q) throughout. The most interesting statement in Theorem 3.2.1 is probably the bound at P1 = (7/10, 1/10), i.e. sup 1<*<2
u(-,t) L»o(Ra)
l|Lio/7(RJ)
It is easy to see that this is exactly what would follow from the local smoothing conjecture. The same remark applies to the operators Ma. On the other hand, the position of all other points in Fig. 3.1 can be explained fairly easily by
40
W. Schlag
invoking the decay properties of ka. This is done in the following two lemmas. They are proved in general dimensions. Lemma 3.2.1. For all a = a + ir e C there exist constants Ca such that for any j = 1,2,..., o« / € <S with supp(/) C {Rd: 2 J _ 1 < |£| < 2j+1}, and 1
^ C^W'^ll/llixR-),
(3.13)
\\Mafhw < c^Q-HfUw -
(3-14)
\\Maf\\mV)
(3.15)
< Ca2-X»+W- 3 >/ a )||/||| i , ( R - ) .
Moreover, for every \a\ < S there exists a constant Cs so that for all —oo < < oo
T
|Ca|
(3.16)
Proof. Choose a radial function <j> € S(Rd) with supp(«£) C {R d : 1/4 < |f | < 4} and so that f = f *
Sup
Kt<2
< 2>'(1-ff>,
Atfa-i
(3.17)
(3.18)
L^R*)
for all 1 < q < oo. Since
M?/l
sup \J?f\ < sup \A?
,1<«2
inequalities (3.13) and (3.14) follow from (3.17) and (3.18), respectively. By Lemma 2.3.1
Wfhw where
< «)'\aj + /3i)l'a||/||L.(,-),
—► L1 Estimates for the Circular Maximal Function
V
41
CCJ = sup | P ( f ) | , |<|~2i
% = sup | ( V p ( 0 , 0 | . By (3.19) and (3.20) below Qi<2-'«"-
1
)/a+*),
a. < 2-i((<*-3)/2+«r)
Thus (3.15) follows.
D
Lemma 3.2.2. Let <j> £ S be a radial function such that supp(^) C {R d : 1/4 < |£| < 4}. Then for any a = a + ir € C, JV* = 0 , 1 , 2 , . . .
+ 2j\\x\ -
t\)~N
Moreover, the constants Ca,N
Proof. This will follow from stationary phase. We shall use the asymptotic expansion of ka derived in Lemma 2.2.3 in Sogge's book 16 : P ( 0 = e 2 -KI LJ+(\£\) + e-2'^w-m
,
(3.19)
where a;+ and w~ are € C°°(0, oo). Their decay is given by dk
Sf^W
(3.20)
k — 0 , 1 , . . . where Ck,a satisfy (3.16), as can be seen by Stirling's formula. Note that the representation (3.19) includes the surface measure do = k\. By definition of the A?, J°fa-i(x)=
e2™tk°(tt)$(2-*Odt
f
e2^x(+t^w+(£t)4>(2-j0d£
= [
JR*
= /+ + / _ . It suffices to consider J_. Introducing polar coordinates yields
42
W. Schlag
J_ = 2jd f Jo
e2ni2i^-t)ru^{2j\x\r)w-(2jtr)4>(r)rd-1dr
+ 2jd j Jo
X
e- 2 i r i 2 i ( | l l + t ) r w 0 -(2 J >|r)a;-(2 J 'tr)^(r)r < J - 1 dr
= A + B. Consider the first integral. |A| < CNVd(V\\x\
- t\)~N j f ° I (J^j
Y,
I
{u+(V\x\r)ujZ(2nr)4>(r)rd-1} (2 , >|) f c (l + 2 J >|r)-( < J - 1 )/ 2 - f c (2 J 't)'
fc+/<w'1/4 x (1 + 2 ^ r ) - ( d - 1 ) / 2 - a - ' dr < Ca,N2>d{2j\\x\
- t\)~N{\ + 2 j |i|)-( < J - 1 )/ 2 2-^d-^2+^
.
Hence
\A\
+
^\\x\-t\)-N
provided that 1 < t < 2. Estimating B in a similar fashion completes the proof. □ In the following proposition we move point P in Fig. 2.1 to position Pi in Fig. 3.1 by Stein's interpolation theorem. Proposition 3.2.1. Let f € S(R 2 ) so thatsupp(f) for some j = 1,2, Then for any e > 0
C {R2: 2*~l < |£| < 2 J + 1 }
I|M 1 / 2 /IIL>O(R*) < C , y | | / | | L i „ T ( R a ) •
(3.21)
Proof. By Eq. (4) in Ref. 18,
-***& = WZ\ f-VtfW1 1 (") Jo
- »2)a-l»d*
(3.22)
provided that 8?a > 0. Let a = e+ir. In view of (3.22), Theorem 3.1.1 implies that ||Ala/||5
Lp —► L* Estimates for the Circular Maximal Function
43
||^ 1+iT /Hoo < CrVWfWi (see inequality (3.13)) via Stein's theorem yields l|/ai/2/ll,<<7,2*||/||p,
(3-23)
where p -> 10/7 and q -* 10 as e -► 0. The proposition follows by interpolating (3.23) with
IM*1/a/lloo < 2>/2\\f\\u which is a special case of (3.13).
□
Remark. Just as in the case of circular means, the endpoint result (3.21) would follow from the sharp local smoothing conjecture.15 Namely, by that conjecture \\M1/2f\\4 < C £ 2-* 3 / 4 - £ >||/|| 4 for any / as in Proposition 3.2.1. Interpolating this with estimate (3.13), i.e.
ll^ 1/2 /ll=c<2>/ 2 ||/|| 1 yields (3.21). Proof of Theorem 3.2.1. Let f 6 S such that supp(/) c {R2: 2j~1 < |£| < 2j+1} for some j = 1,2,.... By (3.13)-(3.15) ll-M 1 / 2 /IU/3
if (1/p, 1/q) tQXUQZ\ {X, Z}
for some /? = /?(p, q) > 0. By what was shown in the first part of this proof ||A4 1 / 2 /||, < Ct2*\\f\\p for any £ > 0. By interpolation,
if (1/p, 1/q) € XPx U PtY U YZ
44
W. Schlag
\\M^f\\q
< 2~"\\f\\p
if (l/p, l/q) €A\XP,U
PlY U YZ
for some 7 = */(p, q) > 0. Thus (3.9) follows from Lemma 3.1.1 provided that q > 2. In Ref. 19, Stein proved (3.9) on the segment QZ \ {Z}. The theorem now follows by interpolating those estimates with the ones we just derived. □ 3.3. A alight
improvement
In this section we prove Remark (b) from Sec. 2.1. This will follow from an argument similar to the one in Sec. 3.4. However, we will not use (2.38), which contains a logarithmic factor, but rather the following geometric observation. It might be worth noting that (3.25) is not sufficient for the (5/2, 5) estimate. First we need to carry out the pigeon hole argument from p. 22 with different weights. More precisely, let a : [<$,1] x [5,1] —► [0,1] be such that a(2k6,2l5) = l.
£
(3.24)
fc>0,l>0
Define A = \{t,e) = a(t,e)\/2, fi = fi(t,e) = a(t,e)n, M = M(t,e) = a(t,e)M/2. The same discussion as on p. 22 then shows that the inequali ties (2.37) will hold for an appropriate choice of e and t and with the param eters A, fi, M we have just defined. In this section we will always mean these parameters. Lemma 3.3.1. The multiplicity p, satisfies the following a priori inequality: H<eWl*/*8-3.
(3.25)
The constant in (3.25) is absolute, in particular it is independent of the choice of a. Proof. Let 1 6 Cs(xj,rj) n Cs(xi,n) where A ^ ~ e and |XJ — Xj\ ~ t. Then |r» — r^| < t and the angle /C(xi,x,Xj) ~ y/ei. Hence n has to lie in a rectangle of dimensions approximately (st)1^2 x t , By S separatedness the maximal number of x<'s is bounded by (3.25). D Proof of Remark (b) on p. 13: Case I:
A
(3.26)
IP —f Lq Estimates for the Circular Maximal Function
45
Here Co is the same constant that appeared in the case distinction in the proof of Proposition 2.4.2. For later purposes we rewrite (3.26) as
Since card(5(£) < min(M,*2/J2) by the definition (2.36) of the set 5/ e , we conclude from (2.46) that
It is convenient to rewrite (3.25) as follows: /^\1/2/
/
\3/2 M
"*■-'(?) ( s i * )
'"-
<"•>
Let 0 < (3 < 1/2 and 0 < r? < 1 - 20. Multiply (3.28) with the /9th power of (3.29) to wit
/ A \ ( 1 ~2^-'))/ 2 /XKfi/2\
(l~2^+f?)/2
x M( 3+2/3 -")/ 4 min f 1, - £ * V \ M6 ) where we have used (3.27) to obtain (3.30). Let
(3.30)
for r > 0. Then we can choose C\ sufficiently large (depending only on r) so that a satisfies (3.24). For suitably small r depending on 0 and r) we conclude from (3.30) that = \-aM*,
H < X-V+I/V+WMW-""^^™
(3.32)
where 1 < a < 2 and a —¥ 1 astj —*0,a—t2asT] —> 1 and 0 —> 0. Moreover, 1
-±Z = 6 + 21-2/-'1>6 1-/3 1 + 2/3 + 77
and
I±S - 6 1-/J
46
W. Schlag
as r\ —► 1 — 2/3. Thus, according to Lemma 2.2.1, (3.32) corresponds to weak type p -> q estimates with 2 < p < 3 and q > 6. Case 2:
A>C0(-J
•
(3.33)
We rewrite (3.33) as
^(f(flSi)V. With Q defined by (2.52), we infer from Lemma 2.4.1 that card(Q) < ( j ) r 3 M m i n ( l , J^j
.
(3.35)
The minimum occurs on the right-hand side because for a given choice of xit we must have |Zi2 - x», | < |x<, -Xj\ + ^ - Xj\ < 2t. Here 0'»*i>*2i»3) is a typical element of Q (cf. (2.52)). This follows immediately from the definition (2.36) of S3t€. Clearly, the same inequality also holds for Xi3. Combining (3.35) with (2.55) we therefore obtain
Suppose /3 > 0. Writing A6 = A^A6-'3 in (3.36) and estimating A^ by the /3th power of (3.34) yields
(3.37) If 1 < /3 < 3 we can choose r in (3.31) so that (3.37) implies /i < A - ° M 1 - ( 1 + Q ) / 6
for 1 < a < 2 .
(3.38)
Combining (3.32) and (3.38) and applying Lemma 2.2.1 shows that ||/a«/||,,oo < «5 2/9 - 1/p ||/||p,i
for 2 < p < 3, 6 < q.
(3.39)
IP —► Li Estimates for the Circular Maximal Function
47
Inequality (2.9) of Theorem 2.1.1 with e = 0 now follows from (3.39) and the obvious estimates for q — oo via Marcinkiewicz's theorem. □ 3.4. The l? theory
revisited
The purpose of this section is to rederive the crucial L? estimate (Lemma 2.3.3) by purely geometric/combinatorial arguments. This is accomplished by using a two-circle lemma, see Lemma 3.4.4 below, and a suitable iteration scheme. First we introduce some notation, following Ref. 22. By C we shall always mean a family of circles with tf-separated centers lying in some fixed compact set. Let C = C(x,r) = {y GR 2 : \X - y| = r} , C = {y e R 2 : r - p < \x - y\ < r + p}, A(C,C)
=
\\x-x\-\r-f\\,
d(C, C) = \x - x\ + \r - f\,
(3.40)
Cec C?t = {C € C: e/2 < A(C, C) < e, t < \x - x\ < 2t, x\r-f\<
4t}.
The following proposition clearly implies that /x <
Ce5-e\-l8-H
for any e > 0, which is essentially the same as (2.38). The only difference is that we have powers 8~c instead of logarithms, which is inessential as far as the proof of Theorem 2.1.1 is concerned. P r o p o s i t i o n 3.4.1. For all rj > 0 there exist constants Cv > 0 and 6o > 0 with the following properties: given any family C there exists A C C with \A\ > C~l\C\ such that \{x € Cs: nfl(x)
> r ^ A - 1 * - 1 * } ! < X6
for all C G A, all 6 < e < t, 0 < A, provided 8 < 80.
(3.41)
48
W. Schlag
This will follow by iterating Lemma 3.4.5 below. The idea of considering weak type inequalities for the multiplicity in the full range of the parameters originates from Ref. 22. First we establish some technical lemmas needed in the proof of Lemma 3.4.5. Lemma 3.4.1. Suppose A(Ci,C 2 ) - /?, d(Ci,C2) 0 > 10£. Then
= T, t > 2e, and that
Kffnc£|<£-^.
(3-42)
Proof. Let F(x,r) = (|x — x\\ — \r — r\\, \x — x 2 | — \r — r 2 |) be denned on
n = {(x,r): t < \x-Xj\
<2t,j = 1,2}.
Suppose (x,r) € fi and let e< = x — Xj/|x — Xj| and CTJ — sgn(r — r 7 ). Then
(
ei
-<Ti
e2
-01
and thus JF{x,r)
~ |Z(eia 1 ,e 2 CT 2 )| = <*•
2
Here JF denotes the sum of the squares of all 2 X 2 subdeterminants of DF. Suppose (x,r) G fi and \F(x, r)\ < e. Then there exist r'j such that \rj— r'j\ < e and \x - Xj\ = \r - r'j\ for j = 1,2. Moreover, | x - X j I > t > 2e and ||»" —rj| — | x - i j | | <e imply that s g n ( r - T j ) = o~j. Consider first the case where o\ = <72. Then |xi - x 2 | 2 = |x - x i | 2 + |x - x 2 | 2 - 2(xi - x) • (x 2 - x) = |r-ri|2 + |r-r2|2-2|r-r'1||r-r2| + 2|x — Xj||x — x 2 |(l — cosa) = l r i - T'i? + 2 | x - xi||x - x 2 |(l - c o s a ) , and thus t2a2>/?T-8T£>/3T. If <Ti ^ CT2, then by a similar calculation t 2 a 2 > /3r. We conclude that
W -¥ L* Estimates for the Circular Maximal Function 49 JF
~^T
oaQnF-\D(0,e)).
By the coarea formula,
H1(F-1(y)nn)dy=
/ JD(0,c)
f
JF(x,r)dxdr,
yflnF-'(D(0,£))
eH>^\SlnF-HD(0,e))\ which implies \Projv(SinF-l(D(0,6)))\<e-^.
D
For example, set /? = e and t = T in (3.42). Then (3.42) says that the total number of circles in C^1 n Cft3 is no larger than the maximal number of circles in C^tl that pass through one of the points C\ n C 2 . To estimate \Cftl n C ^ 2 | in those cases where Lemma 3.4.1 does not apply we will use the following observation. Roughly speaking, it says that if Cj = C(XJ, 3/4) are internally tangent to C(0,1) for j = 1,2 with the points of tangency being far apart, then C\ and C2 cross each other. Lemma 3.4.2. Let Cj = C{XJ,TJ) for j = 0,1,2. Suppose A(C0,Cj) and \r0 — r,-| < 4pj, pj <\XQ- Xj\< 2pj for j = 1,2. Assume
< 0j
a = Z(sgn(ri - r0)(xi - x 0 ),sgn(r 2 - r 0 )(x 2 - x0)) > Ao.m±M£l±£lL V P1P2
(3.43)
for some sufficiently large constant AQ. Then A(Ci, C2) > Pi + (hProof. Let crj = sgnfo - r 0 ). Then \xi - x2\2 = | n - x 0 | 2 + \x2 - x0\2 - 2 ( i ! - x 0 ) • (x 2 - x 0 ) , In - r 2 | 2 = | n - r0\2 + |r 2 - r 0 | 2 - 2(rx - r 0 )(r 2 - r 0 ) , |*i - Z2I2 - In - r 2 | 2 = | i ! - xo| 2 - | n - r0\2 + |x 2 - xo| 2 - |r 2 - r 0 | 2 + 2a 1 a 2 (|r 1 - r 0 ||r 2 - rQ\ - |x x - x 0 ||x 2 - x 0 |) + 2
50
W. Schlag
A(CuC2)(\xi
- x2\ + \ri - r2\) > Pi/^Q 2 - &\P\ - 02p2 - pifh - P20i = PiPia2 - (/3i + fo)(pi + P2)
where we have used (3.43) in the last step. Furthermore, d{C\,C2) and thus finally
< p\ + pi
A(C1,C2)>-^-Q2>^(/31+/32)>/31+^2. Pl+ P2
□
Using Lemma 3.4.2 we can deal with the case /? < 10e that was left open in Lemma 3.4.1. As suggested by the case where C\ and C2 are tangent, we will show, roughly speaking, that any circle C G C^ 1 n C ^ 2 has to intersect the arc of minimal length on C\ that contains C\ n C2. Lemma 3.4.3. Suppose C2 G C%}. Then l # n C £ | < £ ^ ± ^ .
(3.44)
Proof. We may assume that T < 4t (otherwise C^ 1 n C£ 3 = 0). Let
Suppose C G Cft1 satisfies min(Z(x, 11, x2), Z(x, xi, - x 2 ) ) > Aojo , AQ being the constant in (3.43). In view of (3.40) we apply Lemma 3.4.2 with Co = Ci, C\ = C2, C2 — C, Pi = /3, fa = e, pi = T, and p2 = t to wit A(C,,C2)>£ + / 3 > e . In particular C ^Cft3. We conclude that any C G C^1 n Cft3 has to satisfy m i n ( Z ( x , x i , X 2 ) , / ( x , x i , - x 2 ) ) < Aojo ■
1? —> Lq Estimates for the Circular Maximal Function
51
In particular, the centers of all circles in C^1 n C%? are contained in a At x It A)7o rectangle centered at xi and thus
as claimed.
D
The following result is the aforementioned two-circle lemma. L e m m a 3.4.4. Suppose C2 G C^I. Then |C£nC5'l<£ndn(yi,-I.).
(3.45)
Proof. As before we may assume that T < At. Moreover, we may also assume that 2e < t. Indeed, since |C£' nC%\ < t2/62 either
F *
1
e
^ '■
y/J?F - 24
without loss of generality. In the first case 16e
K#nc£*|<'2 If on the other hand A(Cl,C2)
£
< 10e, then (3.45) follows from (3.44).
□
L e m m a 3.4.5. Let a G (0,1] be fixed and suppose that every C € C satisfies \(xeCp:lxCp''(x)>AX-1S-1t(-\
T } <Ap
(3.46)
for allS < p<e
Ux G C: rf''(x) > ^ l l o g ^ A - 1 * - 1 ^ - )
I J I < Xp
(3.47)
52
W. Schlag
for all C £ A, 6 < p < e < t and 0 < A. Here A\ is some absolute constant depending on A but not onC or the parameters (in fact, we can take A\ = co^/A for some absolute constant Co). Proof. Suppose there are at least |C|/2 many circles C € C with the property that | { x G C : nC^{x) > A i l l o g ^ V 1 * - 1 * ^
j j l > Xp
(3.48)
for some choice of 6 < p < e
for j = l,2,sgn(r - n )
< B,T < \xt - x2\ < 2r} .
(3.49)
Here 8 and T are chosen by pigeonholing so that the lower bound card(S) > | l o g i r 2 | 8 | ( | l o g « r ' * ^ i | l » g < r A - ' j ( J ) ° ' 4 f ) J
^WmtiSflfffl"
(3-50)
holds. To obtain an upper bound on card(S) we would like to estimate the number of choices for C given C\ and C2 by Lemma 3.4.4. However, (3.49) does not specify \r\ — r2\ so it is not clear whether C2 £ C^ (see (3.40)). On the other hand, we only need to consider those (Ci,C2) which also satisfy In - r 2 | < |xi - x 2 | + 2 e . Indeed, given any (C,C\, C 2 ) € 5 we can estimate | n - Fa| = |(ri - r) - (r 2 - r ) | = | | n - r| - |r 2 - r\\ < I In - r \ - \xi - x\\ + ||r 2 - r\ - |x 2 - x|| + ||xi - x| - |x 2 - x|| < 2e + \xi - x 2 | , as claimed. Let us assume first that r > e. Then
V
-* lfl Estimates for the Circular Maximal Function
53
|»*i — f"2| < \*i — Z2I + 2 T < 4 r ,
and thus C2 6 C^^ for any {C\, C2) that can appear in an element of S. Also note that such (C\, C2) satisfy
Cfnc^0. Hence card(S)<£
-£
|CS l nCg»|
(3.51)
To pass from line (3.51) to line (3.52) we have applied Lemma 3.4.4 to bound the cardinality of the intersection whereas the number of terms in the second sum of line (3.51) can be estimated by invoking assumption (3.46). In fact, here we have used the special case p = 0 of the following estimate card({C 2 e < $ : C{nCj
± 0}) < ^ J ^ ) °
|log*|.
To prove (3.53) write C[ as the union of p/y/rj3 rectangles {Rj}*=1.
m = m{X) = AX-1T5-x\J-\
(3.53) Let
J.
Then assumption (3.46) implies that (with m = m(A)) mcard({j: card({C*2 G C%: Rj nC$?
0}) G [m, 2m]})
By summing (3.54) over the |logJ| many dyadic values of m G [1,5 2] we obtain (3.53). We now show that the upper and lower bounds (3.50) and (3.52) are in compatible for large A\. Consider first the case e < /?. Then (keeping in mind that r < At) the right-hand side of (3.52) is
54
W. Schlag
< A|C||log<5|
SFJ3T(T\a/2t2 tet2
e
/t\a/2
which contradicts (3.50) for large A\. If /? < £ then the right-hand side of (3.52) is
„
,0/2/^0/2/^1/2
~»G)'(;) (I) (?)
rsy
,
<^ws(j) er /^/
where we have used a < 1 in order to pass to the last line. For large Ai this will again result in a contradiction. Recall that we have assumed T > e throughout. We will now treat the case T < e. As before, card(S)>|log«5|5|S|A?|QyQy
2
,
(3.55)
whereas trivially card(S)<|B|££<|B|^.
(3.56)
Inequalities (3.55) and (3.56) imply that A\ < 1, which is a contradiction for large A\. O Proof of Proposition 3.4.1. In view of Lemma 3.4.5 it will suffice to show the following. Claim. There exists an absolute constant A such that for any family C \lxeCp:fJ,c/t{x)>A\-16-1t(-} for all C G C, all 6 < p < e < t and 0 < A.
| | \<\p
(3.57)
Lp -+ L* Estimates for the Circular Maximal Function
55
If (3.57) failed for one circle C € C and one choice of the parameters then integrating fip" over C (as in Ref. 8) would yield
l"*'<,«,.£<*.£. which is a contradiction for large A.
□
R e m a r k . It is possible to prove Lemma 3.4.5 without considering variable p, i.e. with p = S. One thus obtains a smaller power of | log<J| in (3.47), but this is immaterial in the present context. Suppose we fix TJ. Given any collection C of circles with (^-separated centers, Proposition 3.4.1 implies that there exists A C C, so that |.4| > C^"1 |C| and that card({C e J& :CsnCs?
0}) < « T ^ - J
^-J
(3.58)
for all C e A (this can be seen by the same type of argument that leads to (3.53)). In particular, if C is the family of circles arising in the context of a restricted weak-type inequality for the maximal function, see p. 14, we define the multiplicity \i using A instead of C - this will merely introduce a S~v in the norms. It turns out that (3.58) then allows for a more transparent proof of Proposition 2.4.2. Indeed, suppose we are in the transverse case (2.45), i.e. A< Then for any C, € A, see (2.37),
A
S-K"(raf) s "'"<-*"(?) (?) • Thus, using our assumption on A, we conclude that
a < r V/2«r"/2 (?)
1/4
< S-'VX-WN1'* ,
56
W. Schlag
as desired. It is more complicated to deal with the tangential case in the same spirit. This seems to require a slightly different version of Marstrand's threecircle lemma from the one we have used, i.e. Lemma 2.4.1. Since the argument does not improve on what we already have, we do not write it out. Appendix A. Capacity and Hausdorff Dimension In this Appendix we establish a special case of the well-known connection between capacity and Hausdorff dimension. For any set E c R d , 1 < p < oo, and a > 0 we let the (p, a) capacity of E (strictly speaking, the Bessel capacity) to be CPtQ(E) = inf {||u||£P : u > 1 on a neighborhood of E} . Here L£(R d ) is the usual Sobolev-Bessel space with norm
n/ik(R') = ii('-Ar/ 2 /iip. Suppose p > 1 and ap < d. Then it is essentially a consequence of Sobolev's inequality that there exists a constant Co depending only on p, a and d such that < C0rd-<"> c - i r d - a P < Cpia(B{x,r)) for all balls B{x,r). This suggests that C p , a and Ud~aip are related. For a general statement in this direction see Theorem 2.6.16 of Ref. 23 and the references to the original literature given there. Here we only deal with a special case which is sufficient for our purposes. The following proof is an adaptation of the argument on p. 107 of Ref. 23 to the case of fractional Sobolev spaces. Proposition. Let 2 < p < oo, 0 < a < 1, and ap < d. Then Cp 0.
Proof. By the definition of capacity there exists a sequence u, G Lva such that Uj > 1 on a neighborhood of E for each j and ||uj||i,p < 2~K We will use the following result about the relation between Sobolev spaces defined in terms of Bessel potentials and Besov spaces. Let A p , ,, (R d ) be the space of functions for which the following norm is finite:
V —► Lq Estimates for the Circular Maximal Function
ll«lk-«(R-) = ll«llp + ^ JKd — j ^ p + s i — - d t )
>
(A-x)
where 1 < p, q < oo, 0 < a < 1. It was shown on p. 155 of Stein, 18 that L£(R d ) C A%p(Rd)
provided that p > 2.
(A.2)
Now let oo
and define u(x, r) = r~d I
u{y) dy.
JBr(x)
Since Uj > 1 in a neighborhood of E for each j , u satisfies lim inf u(x, r) = oo
for all x € E.
(A.3)
r-foo
Let /x be the measure
»{A)=f
j
m-wdydz.
Firstly, fj. is finite by (A.l) and (A.2). Secondly we claim that limsupr a p -' J - e fi(jB r (z)) = oo
for all x e E.
(A.4)
r-+0
Indeed, if for a fixed x e E there is an M < oo such that rap-dfi(Br(x))
< Mre,
then r~d f
\u(y)-u(x,r)\"dy
f
JBr(x)
f
JBr(x)
~
\u(y) - u(z)\" dy dz JBr(x)
LiJ*
= rap-dfjL(Br(x))
f \«v)-«*)\pd dydz dydz \y-z\"°+
< Mre .
In particular,
\u{x,2-i-l)-u{x,2-')|*
<M2"^,
57
58
W. Schlag
which would imply that limj_>oo "(z, 2 J ) exists, contradicting (A.3). Thus (A.4) holds. By a standard covering argument, see Lemma 3.2.1 of Ref. 23, we conclude from (A.4) that
Hd-ap+e{E)
= 0.
□
Acknowledgments It is a pleasure to thank my advisor, Prof. Thomas Wolff. He taught me the harmonic analysis I needed in order to carry out the research leading to this thesis and provided excellent guidance during my thesis project. I also wish to thank Prof. L. C. Evans, who was my advisor at the University of California at Berkeley. Most of what I know about partial differential equations I learned from him. It has been a privilege for me to study at the University of California at Berkeley and at the California Institute of Technology. I am grateful to these institutions for their generous financial support and to Prof. Thomas Wolff for two semesters of support. Bibliography 1. J. Bourgain, Averages in the plane over convex curves and maximal operators, J. Anal. Math. 47 (1986) 69-85. 2. J. Bourgain, On high-dimensional maximal functions associated to convex bodies, Amer. J. Math. 108 (1986) 1467-1476. 3. A. Cordoba, A note on Bochner-Riesz operators, Duke Math. J. 46 (1979) 505511. 4. K. L. Clarkson, H. Edelsbrunner, L. J. Guibas, M. Sharir and E. Welzl, Com binatorial complexity bounds for arrangements of curves and spheres, Discrete Comput. Geom. 5 (1990) 99-160. 5. K. J. Falconer, The Geometry of Fractal Sets. Cambridge Tracts in Math. #85 (Cambridge Univ. Press, 1985). 6. L. Hormander. The Analysis of Linear Partial Differential Operators I. Grund. der math. Wiss. #256 (Springer-Verlag, 1983). 7. S. Klainerman and M. Machedon. Space-time estimates for null forms and the local existence theorem, Comm. Pure Appl. Math. (1993) 1221-1268. 8. L. Kolasa and T. Wolff, On some variants of the Kakeya problem, preprint 1995, to appear in Pac. J. Math. 9. J. M. Marstrand, Packing circles in the plane, Proc. London Math. Soc. (3) 55 (1987) 37-58. 10. G. Mockenhaupt, A. Seeger and C. Sogge, Wave front sets and Bourgain's circular maximal theorem, Ann. Math. 134 (1992) 207-218.
I? —> Lq Estimates for the Circular Maximal Function
59
11. J. Peral, V estimates for the wave equation, J. Funct. Anal. 36 (1980) 114-145. 12. W. Schlag. A generalization of Bourgain's circular maximal theorem, preprint 1995, to appear in J. Amer. Math. Soc. 13. W. Schlag and C. Sogge, Local smoothing estimates related to the circular maxi mal theorem, preprint 1996, to appear in Math. Res. Lett. 14. A. Seeger, C. Sogge and E. Stein, Regularity properties of Fourier integral oper ators, Ann. Math. 134 (1991) 231-251. 15. C. Sogge, Propagation of singularities and maximal functions in the plane, Invent. Math. 104 (1991) 349-376. 16. C. Sogge, Fourier Integrals in Classical Analysis, Cambridge Tracts in Mathe matics #105 (Cambridge Univ. Press, 1993). 17. E. Stein, Singular Integrals and Differentiability Properties of Functions (Prince ton Univ. Press, 1970). 18. E. Stein, Maximal functions: spherical means, Proc. Nat. Acad. Sci. U.S.A. 73 (1976) 2174-2175. 19. E. Stein, Harmonic Analysis, Princeton Math. Series 43 (Princeton Univ. Press, 1993). 20. E. Stein and S. Wainger, Problems in harmonic analysis related to curvature, Bull. Amer. Math. Soc. 84 (1978) 1239-1295. 21. M. Talagrand, Sur la measure de la projection d'un compact et certaines families de cercles, Bull. Sci. Math. 104 (1980) 225-231. 22. T. Wolff, A Kakeya type problem for circles, preprint 1996, submitted to Pac. J. Math. 23. W. Ziemer, Weakly Differentiable Functions, Graduate Texts in Mathematics #120 (Springer-Verlag, 1989).
61
T H R E E R E G U L A R I T Y RESULTS I N H A R M O N I C ANALYSIS
Terry Tao Department of Mathematics, University of California at Los Angeles, 405 Hilgard Ave., Los Angeles, CA 90095, USA E-mail: [email protected] In this dissertation we prove certain regularity properties of three unrelated families of operators arising from separate problems in harmonic analysis. The first result concerns the classical Bochner-Riesz operators Ss on Euclidean spaces R n (as well as more general Riesz means on manifolds). By the work of C. Fefferman, P. Tomas, E. Stein and M. Christ, one can obtain regularity results on these operators from l? restriction theory. We encompass these re sults by using the restriction theorem to prove an optimal weak-type estimate for the index S = (n — l)/[2(n -I-1)], which is the sharpest possible result one can obtain from L 2 restriction theory alone. The second result addresses the question of the pointwise convergence of various wavelet sampling methods. In applications one often samples a func tion / = ^2. . ajiici>j,k by discarding all but the largest wavelet coefficients, leaving a reconstructed function of the form ^ . . . a^V^fe- We show that under general conditions the sampled function converges pointwise almost ev erywhere to the original function as A —> 0. This is achieved by approximating the sampling operator by a linear sampling method whose convergence was established by Kelly, Kon and Raphael. Our final result extends recent work by M. Christ, J. L. Rubio de FVancia and A. Seeger, on weak (1,1) estimates for rough operators in Euclidean spaces, to more general homogeneous groups. This is still a work in progress by the author, and the results presented here are somewhat partial in nature. We show that a homogeneous singular integral convolution operator is of weak-type (1,1) if it is bounded on L2 and the kernel is L log L on the sphere.
1. Genera] Notation For further details on some of the terms used below, the reader is referred to Stein, 30 Meyer20 or Folland and Stein. 16
62
T. Tao
We shall use the symbol C to denote various large positive constants depending only on the inessential variables, and e to denote various small con stants. In this section the inessential variables are the dimension n, the mani fold M, and the pseudo-differential operator P. In Sec. 2 the only inessential variable is the wavelet rp. In Sec. 3 the only inessential variable is the underly ing homogeneous group H. We shall write x = 0{y) or x < y for the statement that |x| < Cy, and write x ~ y if x < y and y < x. If x is a variable of integration, we use dx to denote Lebesgue measure and d#(x) to denote counting measure. When E is a set we use \E\ to denote the Lebesgue measure of E, and #E to denote the cardinality of E. On a Euclidean space R n we use ( , ) to denote the inner product, and define the Fourier transform by /(£) = / e - 2 , r ' ( x , ^ / ( x ) dx. If <j> is a function on R n , we define the Fourier multiplier operator 4>{D) by
#B)/(0 = *(0/(0More generally, we shall use the functional calculus to define operators <j>{P) whenever P is a suitable operator. For example, if P is self-adjoint and has the spectral resolution P = ^ j l j ^j£j, where the Xj are a discrete set of eigenvalues and e, are a mutually orthogonal spanning set of projections, then we can formally define <j>(P) by <j>{P) = Z ) j l i < H^j) e .r Let T be a linear operator. We use T[x, y] to denote the (distributional) kernel of T; thus
Tf(x) = JT[x,y\f(y)dy. If T[x, y] is supported on the set {(x, y) : d(x, y) < CK) where d is an ap propriate metric and R is a positive real, then we say that T is local at the scale of R. Finally, if A and A+ are linear operators, we say that A+ majorizes A if \A[x,y]\ < CA+[x,y] for all x, y, or equivalently that A+ is positivity preserving and \Af\ < CA+\f\ for all functions / . The following notation will be described for balls in a quasi-metric space, but will also be applied to Euclidean dyadic cubes. Suppose that H is a space with a quasi-distance d(x, y), such that the balls B(x, r) = {y: d(x,y) < r} satisfy the usual Vitali-type doubling and covering properties. li I = B(x,r) is a ball (or cube) and c > 0, we write r(I) for the radius (or sidelength) of / , xj for the center of / , cl for the dilate B(x, cr), and I& for an annulus of the form Cl — C~lI. If / is dyadic in the sense that r is a power of 2, and
Three Regularity Results in Harmonic Analysis
63
both / and i appear in an expression, we shall use the convention that i is the integer such that 2' = r{I) unless i is otherwise defined. By a wavelet we shall mean a 0-regular orthogonal wavelet tp on R, in the sense of Meyer.20 In other words, we require ip to be a function on R that is bounded and rapidly decreasing, satisfies the moment condition J i/;(x)dx = 0, and is such that the family {ipj,k} = {2j/2ip(2j • -k)} for j , k 6 Z form an orthonormal basis of L 2 (R). Note that no regularity assumptions are assumed on V; hi particular, our definition of a wavelet will include the Haar wavelet i> = X[0,l/2) - X [ l / 2 , 1 ) -
A homogeneous group shall be a canonical Euclidean space R n equipped with a group multiplication xy = x + y + P(x, y) and inverse x~l = —x + Q(x), where P and Q are polynomials consisting only of second-order and higher terms, and a family of dilations 5o(x1,...,xn)
=
(6a*x1,...,6a'>x)n)
for some constants 0 < a i < • • • < a n , such that i K+ J O I is a group automorphism for each 5 > 0. It can be shown that Lebesgue measure dx is both left- and right-invariant, and \S o E\ = 6N\E\ for all measurable sets E, where iV = QI + • • • + a n is the homogeneous dimension of H. We give H a quasinorm p by declaring p(x) = 1 on the Euclidean unit sphere S = {x: \x\ = 1}, and p(6 o x) = 6p{x) for all 8 > 0, x G H. The function d(x,y) = p(x-1y) thus becomes a quasi-distance. Let / be a function on some ball B(x, r) in a homogeneous group. We say that / is a normalized bump function adapted to I if f(xr °y) =
64
T. Tao
when n = 2 (Seeger 23 ), or when the operator is restricted to radial functions (Chanillo and Muckenhoupt 5 ). In Christ, 7,8 the claim is proved for p < po, where po is such that there is a (po,2) restriction theorem for the sphere in R n . The well-known restriction theorem of Tomas and Stein states that such a restriction theorem holds precisely when po < 2(n + l ) / ( n + 3). Thus the conjecture is known when b 1 < p < 2(n + l ) / ( n + 3). Our first result is that the conjecture also holds at the endpoint p = 2(n 4l ) / ( n + 3). Theorem 2.1. Suppose there is a (p, 2) restriction theorem for the sphere in R n . Then the operator Ss^ is of weak type (p,p). In fact, under the restriction theorem hypothesis we have the stronger es timate / |^)/(x)|2dx
here Mpf = M ( | / | ) / and M is the Hardy-Littlewood maximal function. We also show the analogous result for Riesz means on compact manifolds. More precisely, suppose that P is a first-order self-adjoint elliptic pseudodifferential operator on a smooth compact manifold M. We say that there is a discrete (p, 2) restriction theorem for P if ||X[Jfe,fc+i](-f )l|(p,2) ^ (1 + k)s^ for all k > 0, and consider the Riesz means SSR = (1 - P/R)s+, defined for R, 6 > 0. In Sogge26 it was shown that, if there is a (p, 2) restriction theorem for P, then S^ is of strong type (p,p) uniformly in R, for 5 > 6(p). Also, it was shown by Christ and Sogge10 that S^ ' is weak (1,1) uniformly in R for any P. Combining ideas from these results with the methods of Christ, 7 it was shown by Seeger22 that S'W was weak (p,p) uniformly in R for p < po, if one has a (po, 2) restriction theorem for P. By modifying the proof of Theorem 2.1, we have an end point version of the result in Seeger.22 Theorem 2.2. Suppose there is a (p, 2) restriction theorem for P. Then the operators S R are of weak type (p,p) uniformly in R. As in the Euclidean case, we have the stronger estimate /
|Si?"/Wl1*c
IMpf(x)
holding uniformly in R. b
Hardy space analogues of this conjecture for p < 1 are also known; see Stein et cU.31
(2)
Three Regularity Results in Harmonic Analysis
65
It is known that the (1,2) restriction theorem holds for any P. Furthermore, under an additional curvature assumption, one has a (p, 2) restriction theorem for all 1 < p < 2(n + l ) / ( n + 3). See Seeger,25 Sogge,27 Sogge.28 Combining these results with (2) and interpolating with the trivial case S > 0,p = 2, one has a corollary: Corollary 2.3. Suppose the co-spheres {£ € T*M\0: p(x,£) = 1} have everywhere nonvanishing Gaussian curvature for each x G M. Then the operators S^ are of weak type (p,p) uniformly in R when 1 < p < 2 and S > max((S(p), (n - l ) ( l / p - l/2)/2). The method of proof of Theorems 2.1 and 2.2 is a modification of that used by Christ, 7 which involved L2 Calderon-Zygmund techniques (as used in Fefferman 1 2 ), a dyadic decomposition of the Bochner-Riesz multiplier with special properties, and the Cauchy-Schwarz inequality. Our main innovations are the replacement of the dyadic decomposition with a decomposition into two terms (which depend on the level of resolution), and a greater reliance on a certain "locality" principle. In the Euclidean case the principle is the trivial observation that if a multiplier m has its inverse Fourier transform supported on a set of width 2', then the operator m(D) is local at the scale of 2*. On compact manifolds the corresponding principle is that if a function m on R has its Fourier transform supported on an interval of width T/R, then the operator m(P) is - apart from an error with good decay - local at the scale of 2i/R. 2.1. Proof of Theorem
2.1
Fix n > 2, and assume that 1 < p < 2 is such that a (p, 2) restriction theorem holds for the sphere. Write Ss = ms(D), where m'(£) = (1 - |£|2)$. and S = 8(p) = n ( l / p - 1/2) - 1/2. We have to show that ms(D) is of weak type (p,p). Since Mp is of weak type (p,p), it suffices by Tchebyshev's inequality to prove (1). By linearity we may assume that a = C~l for some large C to be determined later. Fix / , and apply the Calderon-Zygmund decomposition at height C to | / | p . This allows us to write / = g + J2] f>h where ||g||oo ^ 1) the bj are supported on disjoint dyadic cubes / and satisfy ||6/|| p ~ \I\l^p, and the I are such that £ / |J| ;$ 11/11?. Note that Mpf(x) > 1 whenever x e (J/ CI.
66
T. Too
Since ms is bounded, the contribution of g to (1) is acceptable. In fact, as m is also compactly supported, we may similarly dispose of the contribution of the small cubes. 0 Specifically, let g be g + J2i
/
-U, c '
E«»'(0)fc(*)
(3)
<=i
to complete the proof. To estimate this expression we make a decomposition of the multiplier m* for each i > 0, whose properties are summarized in the following lemma: Lemma 2.4. For each i > 0, there exists a decomposition ms = rm + rjiUi such that (i) The inverse Fourier transforms of m< and n* ore supported in the ball of radius 2' around the origin. (ii) For all £ € R n , we have ^ |ifc(fl|a < 1. (iii) For a certain large N, we have \m(t)\ < 2 " w ( l + 2*|1 - |fl|) - A f . Before we prove this lemma, let us see how it implies (3). Write ms(D)bi as mi(D)bi + f]i(D)rii(D)bi. By (i), the operator nii(D) is local at the scale of 2*, so m.i(D)bi is supported on Ur(/)=2* CI and makes no contribution to (3). Thus it suffices to prove || "£,Zi Vi(D)ni(D)bi\\l £ £ / M- % (")> Plancherel's formula, and Cauchy-Schwarz, this expression is 0 ( ^ 2 ^ ||n»(Z))6»H^)- So it will suffice to show that
WmiDMl =
E
r(J)=2<
"iW1"
Em
r(/)=a«
for each i > 0. However, by (i) the operator n<(£)) is local at the scale of 2*, so the individual ni(D)bi have almost disjoint support. Thus we only need prove c
This can be thought of as a dual to the more well-known principle that, if the kernel has compact support, then the large cubes can be safely discarded.
Three Regularity Results in Harmonic Analysis
67
the above estimate for a single cube. But by Plancherel's formula, (iii), and the restriction theorem, we have
\\ni(D)br\\22= f MOI 2 |S/(OI 2 de < f Jo
2-2*(")<(l + 2*|1 - r\)~2N
<2-26(?)i
/" 00 (i + Jo
2 '|l-r|)-
2W
f \bi(ru)\2 Js
durn-ldr
r- 2n /P'||6 / ||2r n - 1 dr
<2-2<5(p)»2-«|/|2/p = \I\ as desired, if N is chosen to be sufficiently large. It remains only to prove Lemma 2.4. Following Christ, 7 we start by choos ing a smooth radial function
(5)
holds. Since (5) holds for m*(f)> it suffices to prove this estimate for m<(^). By differentiating under the integral sign we see that the gradient of nii(£) is 0(2l2~Sx) in the region of interest, so it suffices to prove (5) for f on the unit sphere. Since m* is radial, it suffices to prove it for f = e\. But this follows by approximating ms(e\ — f) by 2*(f,e\) s + and using the assumed moment condition for
< m ( 0 < 2 - " ( l + 2*|1 -
\i\\)~N
for some large M, N, which we reserve the right to choose later. Once we do this, it can easily be seen from (4), (5), and the above estimates that the
68
T. Tao
function rft = (m* — mi)/ni satisfies (ii), and so all the required properties will hold. Let N be a large even integer to be chosen later and consider the onedimensional function V ( 0 = [(£ _ sinf)/£ 3 ] w / 2 , which is comparable to (1 + |£|) - i V and has a compactly supported inverse Fourier transform. By tak ing the tensor product of n copies of tp, averaging over the orthogonal group and then rescaling, we can construct a function ^>n on R n which is positive and comparable to (1 + | £ | ) - w - n + 1 , and whose inverse Fourier transform is supported on the unit ball. It is then a routine matter to check that the function rii = 2 - * , 2( n_1 ) , V>n(2 , •) * dw satisfies all the required properties, with M = N + n — 1; here du is surface measure on the unit sphere. This completes the proof of Lemma 2.4, and the theorem follows. □
Remarks 2.5. (a) The method of proof of Theorem 2.1, in particular the construction of rrii and the use of the Cauchy-Schwarz inequality, is essentially due to Christ. 7 However, a slightly different decomposition was used in Christ 7 ; in our notation, it would be ms = m* + Z)^lo(77l«+*+1 — m»+*)i w here the individual m.i+a+i — m j + s play much the same role in Christ 7 as the function ms — rrii = rjiTii does here. (b) This method can provide alternate proofs of the weak (1,1) type of various oscillatory convolution operators. For example, the weak boundedness of 5 ( n - ! ) / 2 follows immediately from Theorem 2.1, as the (1,2) restriction theorem is trivial. Another example is the one-dimensional operator / H-> f*elirx /x. The multiplier, which is roughly e _Mr ^ /£, can be decomposed for each i > 0 as mj + T/JUJ, where (i) [for rrii only] and (ii) of Lemma 2.4 holds and m(0 = 2 _< (1 + 2-*|f | ) - 1 + e . We omit the details. (c) By interpolation, (1) will hold with 5{p) replaced by 6, for 1 < p < 2 and 6 > max(<5(p), (n — l ) ( l / p — l/2)/2). Simple calculations involving bump functions on small balls or on thin tubes show that this region is the best possible. In the interior of this region (1) also follows from the weighted inequality /
\Ssg(x)\2tP(x)dx<
f
\g\2MT^{x)dx
(6)
Three Regularity Results in Harmonic Analysis
69
proven by Christ 6 ; here r = p/(2 — p) and V" is any positive function. In deed, the adjoint of (6) is / R n ' ^ / J ^ dx ~ JR« V $ 3 dx > w h i c h y i e l d s (X) after substituting ip = | / | 2 - p and using Tchebyshev's inequality. Unfortu nately (6) fails for end point values of S, as the bump function examples will show. (d) We have seen how restriction theorems for the sphere can imply weak type end point bounds for Bochner-Riesz multipliers. A converse to that observation holds: if 1 < p < 2n/(n + 1 ) is such that S*(p) is of weak type (p,p), then there is a strong (p,p) restriction theorem for the sphere. The proof can be sketchedd as follows. Assume the weak estimate. It suffices to show that | | / | | L P ( S ) < C | | / I I P f° r all / supported in a ball of radius r, where the constant C is independent of r. We may heuristically replace the kernel of the Bochner-Riesz multiplier by |z| _n /p e ±2*»|*l away from the origin. This will mean that S'Wf(x) behaves like e ± 2 , r i l x l|x|- n /P./(±x/|x|) when |x| 3> rc. The claim then follows from observing that the weak-L p quasinorm of this last expression is comparable to ||/||L»>(S) uniformly in r. In a similar fashion one can recover the (p, 2) restriction theorem from the inequality (1).
2.2. Interlude
on functions
of elliptic
operators
Let M be a smooth compact connected manifold without boundary of dimen sion n > 1, endowed with a smooth positive measure dx. It will be convenient to fix a smooth global Riemannian metric d(x, y) on M. Let P = P(x, D) be a first-order self-adjoint pseudo-differential operator on M, which we assume to be elliptic in the sense that the principal symbol p(x,£) = lim.\-»oo A _ 1 P(x, A£) of P exists and is positive for nonzero elements £ of the cotangent space T*M. We will assume without loss of generality that the \j are non-negative. If m is a tempered function on R, one can assign an operator m(P) using the functional calculus of P, as described in Sec. 1. The purpose of this section is to relate the "locality" (i.e. decay) properties of m(P)[x, y] to those of m. Proposition 2.6. Suppose m is a function of R such that rh is supported on [—r, r] for some sufficiently small r . Then A similar argument may be found in Wolff.'
70 T. Too n 1 (i) One has \Tn(P)[x,y]\<\\m\\iTd(x,y) whenever d(x,y) > CT. 00 (ii) If in addition m(T-) is in S' , then \m(P)[x, y}\ < T~n(l+d(x, y)/r)-n-1 for all x, y.
Part (i) is a slightly stronger version of a result proved in Christ. 10 The key idea is to exploit the fact that eitp is local when t = 0. Further improvements in this lemma are possible but will not be pursued here. Proof. We first prove (i). By the triangle inequality we may assume that m(P) is of the form eitp. But eitP = I + / 0 ^ e " p ds, so it suffices to show that \fteitp[x,y\\ < d{x,y)-n~l when d(x,y)/t is large. To do this we shall apply a well-known approximation of eltP by a Fourier integral operator. Specifically, if e > 0 is sufficiently small, the (distributional) kernel of extP can be written for \t\ < e in the form eitp [x, y] = Ve(x, y) f
eW'.vWtvMqit,
x, y,0<% + E.(t, x, y),
(7)
JT-M
where r]e(x, y) is supported on a Ce-neighborhood of the diagonal of M x M ,
^p[x^=n€{x,y)
j
e^+pq d£ + 0(l).
Since the integrand is 0 ( 1 +1£|), the integral on the region {|£| < d(x,y)~1} is 0(d(x, j / ) - n _ 1 ) . Outside this region, the term in brackets is — after a suitable rescaling by d(x, y) — a symbol of order 1, and so this part of the integral is also 0(d(x,y)~n~1), by the standard estimates on nonstationary oscillatory integrals. Thus one has the desired bound on ^e* t p [x, y], which proves (i). To show (ii), it suffices by (i) to prove that \m(P)[x,y]\ < r _ n . From (7) we have
Three Regularity Results in Harmonic Analysis 71 m(P)[x,y] = f,,(x,y) Je**^^
( Je**v*)q(ttx,y,^(t)dt)
d£ + O ( l ) .
But by the usual nonstationary phase estimates and the fact that p(y,£) ~ |£|, the inner integral will be ON(1 + T\£\)~N for all N > 0. Taking N > n thus gives the result. □ We now show that operators with the decay of 2.6(i) are effectively local in a specialized IP sense. Proposition 2.7. Let {1} be a collection of disjoint "cubes" in M. Suppose that for each I there is a function 6/ supported on I and a function mi whose Fourier transform is supported on [-2*,2*] such that ||6/||i < \I\ and ||m/||i < 1. Then, for 1 < p < oo, one has p
cic
<
c,£|/|.
In particular, if the CI are almost disjoint, then
Y.miWb,
^(DKC^MIS+EI/IV
Proof. It suffices to prove the first inequality. By Proposition 2.6(i) and elementary estimates, we see that |m/(P)6/(a;)xc/ c (^)| < C{Mxi{x))q, where q = 1 + 1/n > 1. Thus
CI'
M
£( *')
9
1/9
pq
PQ
f
1/9
P9
PQ
7
= Cp£l l as desired, where we have used the vector-valued maximal theorem of Fefferman and Stein, 14 adapted to general manifolds.8 □ "One can also use BMO methods to prove this estimate; cf. the proof of Lemma 4.2.
72
T. Tao
2.3. Proof of Theorem
2.2
Let 1 < p < 2 be such that there is a (p, 2) restriction theorem for P. We wish to show that SR is of weak type (p,p) uniformly in R, where 6 — 5{p). Since the proof is almost identical to that of Theorem 2.1 except for a rescaling and some minor technical complications, we give only a brief sketch. We first note that, since each projection operator is smoothing, the claim is clearly true when R is bounded. Thus we may assume that R > C for some large C to be determined later. Secondly, since the eigenvalues of P are non-negative, we may write SR — msR(P), where mR(X) = £(\/R)(l - X/R)s+ and £ is smooth, compactly sup ported, and equal to 1 on [0,1]. Note that ||m|j||i = O(l). Fix / . It suffices to show that (2) holds for a = C - 1 . We may assume that / is supported on an arbitrarily small subset of M , which we can identify with a subset of R n . We then decompose f = g + ^2jb[ as before except that we use a rescaled dyadic mesh, so that the cubes have sidelength 2X/R instead of 2V Since msR is bounded and supported in [-CR, CR], we may dispose of the contributions of g and the cubes of sidelength at most l/R. Instead of a Gaus sian, one uses an S~°° symbol tp on R such that V> is supported on [—C, C] and ipR = rp(R~1-) majorizes mR. By Proposition 2.6(ii), the kernel of T/»R(P) is majorized by a constant multiple of a positive and "radial decreasing" ap proximation to the identity of "thickness" l/R, and one can use the argument in the proof of Theorem 2.1. The appropriate analogue of Lemma 2.4 is the following decomposition of mR: Lemma 2.8. For each i > 0, there exists a decomposition mR = m4 -(- t]irii such that (i) The functions rhi and Hi are supported on [—2*/.R, 2'/.R], and each has an L1 norm which is 0(1). (ii) For allXeR, we have £ £ i ^(A)! 2 < 1. (iii) For a certain large N, we have |rii(A)| < 2 _ 4 i ( l + 2*|1 X/R\)~N. Apart from the last part of (i), Lemma 2.8 is proved in exactly the same way as Lemma 2.4. Since rhi is by construction the product of mR and a bounded expression, we see that the L 1 norm of ihi is 0(1). To prove the
Three Regularity Results in Harmonic Analysis
73
corresponding bound for nj, use (hi) to observe that r^ is uniformly bounded by 0(2" ,5i 2-*fl), and therefore has a L1 norm of 0 ( 2 " " ) = O(l), as desired. The rest of the argument, deducing (2) from the above decomposition, the Cauchy-Schwarz inequality, and the restriction theorem, is virtually un changed, apart from a rescaling by R. There are only two technical differences: firstly, as the restriction theorem is now discrete, one replaces a certain integral by its Riemann sum instead; and secondly, because the analogous "locality" principle (i.e. Proposition 2.7 for p = 2 combined with Lemma 2.8(i)) is not absolute, one incurs further error terms which are 0(%2{ \I\), but these errors are acceptable in proving (2). We omit the details. D 3. Pointwise Convergence of Wavelet Summation Methods Let ^ be a wavelet of the form discussed in Sec. 1. To each function / G ^ ( R ) , 1 < p < oo, one can associate a wavelet expansion / ~ £V k Oj,fcVi,fc> where a j,k — (f*i'j,k) for j,k G Z. It is well known (Meyer 20 ) that this series converges unconditionally in U norm to / , but the pointwise convergence is more delicate, being dependent on the summation method used. In Sec. 3 we show the pointwise convergence of the following three partial summation operators: the wavelet projections:
Pjf = V^ <*j,kipj,k , i<JM
the hard sampling operators:
T\f =
}J
a
j,k'4)i,k ,
the soft sampling operators:
T\f =
}J
s n a
S ( i,fc)(lai,fcl
—
•^)V'j,fc •
|oj.*l»
Theorem 3.1. / / / G ^ ( R ) for some 1 < p < co, then for almost every x one has Urn Pjf(x) = lim T A /(x) = lim f A / ( x ) = f(x). J-too
A->0
A-+0
The almost everywhere convergence of the wavelet projection operators was already shown (under more general hypotheses) in Kelly et al.19 The nonlinear hard and soft sampling methods are used frequently in applications (Averbuch et al.,1 Donoho et al.11) but this seems to be the first pointwise convergence result for these methods.
74
T. Tao
The proof of the convergence of the hard and soft sampling operators relies on the following observation. At fine scales (i.e. when j is large), the narrow support of the V'j.fe ensures that the wavelet coefficients a,jtk will be relatively small. Thus, a summation such as T\f(x) will consist only of coarse wavelet coefficients, and should be well approximated by a wavelet projection Pjf(x), where J depends nonlinearly on x, A and / . One can therefore hope to ob tain the convergence of T\ from the corresponding convergence of the linear operator Pj. In the last section we show that there are (somewhat artificial) wavelet summation methods which need not converge pointwise, at least for wavelets with little regularity. Theorem 3.2. Suppose that \l> is the Haar wavelet. Then there exists a func tion f which is in IP for all 1 < p < oo but is such that lim supA_^0 \^xf(x)\ = co for all x € R, where S\ is defined by S
*f =
a
Yl
3,k4>j,k ■
a
K,*l>2-'/ A
3.1. Proof of Theorem
3.1
The proof shall be based on the following key estimate: sup \Pjf(x)\, J
sup | 7 \ / ( x ) | , s u p | f A / ( x ) | < Mf(x) X
a.e.
(8)
X
We first show that (8) implies the theorem. Since we only require convergence almost everywhere, we may assume that x is a Lebesgue point f of / and of every Vj\fc- Choose any e > 0. By the usual method of convolving / with a smooth approximation to the identity and then truncating smoothly, one can find a g in C£°(R) for which \M(f - g)(x)\ < e. In turn, one can approximate g uniformly almost everywhere by a finite linear combination h of wavelets (for this fact, see Meyer 20 ). Thus one can find such an h for which |/(x) — h(x)\ < \M(f — h)(x)\ < e. Now we use the fact that the wavelet coefficients of T\f and h + T\{f — h) differ by O(X) when the wavelet occurs in the expansion of h, and are identical otherwise. This implies that T!x/(x) = h(x) + T\(f — h)(x) + O/^A) = / ( x ) + 0{e) + Oh{\) as A -> 0. Since e was arbitrary, we 'In this section we shall not identify functions which agree almost everywhere, so the notion of a Lebesgue point is well defined.
Three Regularity Results in Harmonic Analysis 75 thus have lmiA-»o f\f{x) = f(x) as desired.8 The argument for Pj and T\ is completely analogous. We now show that (8) holds for Pj. We note that this estimate was first proven in Kelly et al.19 under more general hypotheses on tp and / ; we shall give an alternate proof below. The argument relies on two observations. The first is the crude estimate \1>i,k(*)1>i,k(v)\ <
CN2~:>(1
+ V\x - y\YN{\
+ \7?x - k\)~N
(9)
for all N > 0; this is a simple consequence of the rapid decrease of ip and the easy inequality 1 + 2j\x - y\ < (1 + |2J'x - fc|)(l + |2 J y - fc|). The second is the fact that ^2,il>j,k{x)tl)j,k{y) = 0 a.e. j,k
The identity follows from the fact that the {V,j,*}i,fceZ form an orthonormal basis for L 2 (R), so that the above summation converges in a distributional sense to 8(x - y). Since the series is absolutely convergent for x / y by (9), the two limits must agree almost everywhere, hence the claim. As a consequence of these two observations, we see that the kernel defined almost everywhere by K(x,y)=
Y^ 1>j,k(x)1>jAv) = ~ ] C ^i,fc(a:)lk,fc(y) i<0;fc J>0;fc
converges absolutely and is bounded and rapidly decreasing away from the diagonal. However, we have from Fubini's theorem that
pjf(*) = E ^.*(x) [*jMv)f(v) dy j<J;k ~r1.u
JK
J2 ^jMx)ipj>k{y)f(y) dy *j<J;k
-L
2JK(2-Jx,2-Jy)f(y)dy;
the justification for Fubini's theorem follow from (9), the fact that / € Lp, and Holder's inequality. Since K(2~Jx,2~Jy), as a function of y, is rapidly g This argument will in fact give pointwise convergence on the entire Lebesgue set of /, if one assumes some minimal regularity (such as right-continuity) for V>. See Kelly et al.19
76
T. Tao
decreasing away from x, the desired estimate (8) now follows from standard maximal function estimates (see e.g. Stein 2 9 ). We now show that (8) holds for Tx. Fix / G L ^ R ) , A > 0, and x G R. Since we may assume that Mf(x) is nonzero, we can choose a J for which A ~ 2 _ - 7 / 2 M/(x); one can think of J as the finest scale for which significant terms in the summation for T\f(x) can appear. We will show that \T\f(x) — Pjf(x)\ < M / ( x ) , which will imply (8) from our results for the Pj. We shall first need some simple estimates. For each j , k G Z, let D(j, k) be the integer part of 2 + |2 J x —fc|.Note that for each j and for each integer d > 2 there are at most two k for which D(j, k) = d. From the rapid decrease of ip and the usual estimates on approximations to the identity, we have that, for all N > 0: \1>j,k{x)\
h,fc<M*)l <
Note that the second estimate implies that \dj,k\ is greater than A only if j< J + C\ogD(j,k). The estimate for \T\f(x) — Pjf(x)\ is now straightforward: \Txf(x) -
Pjf(x)\
E
HrfiA*) - E ai^j,k{x)
|Oj, fc |>A
j<J\k
E
ai,fcr/'J,fc(x)
l«i,kl>A;J<j
\"j.k\
< E AiiM*)i + j<J;k
E J<j<J+C
oo
<EE
aj,ktpj,k(x)
E
ioj.fc^.fc(x)i
log D(j,k)
oo
A2J/2d
~"+E
d=2j<J
E d=2J<j<J+C\ogd
< A 2 J / 2 + ^ Mf(x)d~N d=2
<M/(x) + M/(x),
log <2
^M/(x)
Three Regularity Results in Harmonic Analysis
77
as desired, where N is a suitable large number. Note that this argument also shows that the sum denning T\f{x) is absolutely convergent, since we know the sum for Pjf(x) is. The proof of (8) for T\ is almost identical to that for T\, although one must use the estimate Y^
sgn(flj,fc)(|oj,fc| - A)ifofc(x) = 5 Z °-i,k^j,k{x)
\<*i,k\»>;i<J
J<J;k
\j<J;k
in the above argument; we omit the details. Thus (8) holds for all three operators under consideration, and the theorem is proved. □ Remarks 3.3. (a) This result can be generalized to bi-orthogonal wavelets, and to decomposi tions involving more than one "mother wavelet". The decay conditions on ip can also be relaxed; for instance, one can get the pointwise convergence result for the Pj alone if one merely assumes that ip has an integrable radial majorant. The proofs of these assertions are simple applications of the ideas developed in Kelly et al.19 (b) The result is also robust under changes in the thresholding method. For example, if one samples under the threshold |a,-,fc| > a* A instead of |a,-,fc| > A, then one has almost everywhere convergence as long a s a / 2 - 1 / 2 . When a = 2 - 1 / 2 , one is thresholding according to the "heights" of the summation terms &j,kipj,k, and by fine-tuning these heights one can sum a large number of terms of comparable magnitude in any desired order. As we shall see in Sec. 3.2, this freedom in summation can destroy the cancellative effect of the wavelet series, so that one no longer has pointwise convergence. 3.2. Proof of Theorem
3.2
We shall construct, for each integer N > 1, a function fs = S , ^ ^fcV'j,* [0,1) for which:
on
(i) All but a finite number of the a^k are 0 and those which are not zero satisfy 1 < 2^ 2 a£ fc < 2;
78
T. Tao
(") || IN Up < CpN^2 for every 1 < p < oo; (iii) supA \T£fN(x)\ > N/3 for every x 6 [0,1). Once we do this, one can see that, if one chooses an increasing sequence of natural numbers M< whose successive differences are sufficiently large, the function oo
i
2M<-1
F(x) = J2^ £ /*(2M'*-*) t=l
fc=0
will be in V for every 1 < p < oo, but is such that limsup A | T " F ( x ) | = oo for every x G [0,1). The function f(x) = ££L_oo 2~^F(x - n) thus satisfies all the required conditions. We shall define fs explicitly:
1+2_J
/NW=EE j=0
fc=0
^
fc+
2 r(2Jx_fc)-
^
'
'
The verification of (i) is immediate, and (ii) follows quickly from Khinchin's inequality11
£ a i. fc ^.* ~ f ElaJ'.fe|2|^.fe|2) j,k
p
\ j,k
/
To prove (iii), fix an x G [0,1) and observe that all but N of the factors ^/>(2Jx — k) in the sum defining / N ( X ) are zero, and the remainder are either + 1 or —1. Furthermore, this factor is non-negative if 1 -I- 2 _J (fc + 1 / 2 ) > 1 + x and nonpositive otherwise. Thus, by inspection of the definition of T " for A = 1 + x and A = 2, we have
*?+./(*)=
(1+2~j(k+\))>
£ V>(2ix-fc)=+l ^
^
' '
and
T1\xf(x)-T2°f(x)=
£
2T1%/(x) - !?/(*) =
E
so
(i + 2-Vfc + i ) ) ;
f1 + 2_j ( fc + I)) > N ■
4>(2'x-k)jt0
which implies (iii). This proves the theorem. "A proof of this inequality can be found in Meyer.20
□
Three Regularity Results in Harmonic Analysis
79
4. Weak Bounds for Homogeneous Convolution Operators Let H be an arbitrary homogeneous group, and suppose that fi is a function on the unit sphere S of H, which we endow with the natural measure duj in duced by the dilation structure. We consider the question of what the minimal conditions on il are so that the convolution operator Tn: / M- / * K is of weak type (1,1), where the convolution kernel K = KQ is given by K = Kn denned by K(r OUJ) = r~NS}(u) and N is the homogeneous dimension of H. Among the necessary conditions known are that Cl must be in L 1 (5), and Tn must be bounded on L2. This in turn necessitates that Q satisfies the cancellation condition fs fi(w) du = 0. When H is an isotropic Euclidean space, the classical theorem of Calderon and Zygmund 3 shows that T is indeed bounded on L2 when il satisfies the cancellation condition and is either odd and in i 1 (-S), or even and in the Hardy space H1^). In particular, we have boundedness on L2 whenever fi satisfies the cancellation condition and is in the Orlicz class LlogL. However, their argument relies on the method of rotations and therefore cannot be applied directly to the question of weak (1,1) boundedness (cf. the discussion by R. Fefferman15). It is natural to conjecture that analogous L2 results hold for arbitrary homogeneous groups. If fi is odd and in L1 one can use the method of rotations and the work of Ricci and Stein 21 on convolution operators on singular sets in homogeneous groups to obtain L2 boundedness, but to the best of the author's knowledge the corresponding question for fi even and in Hl or L log L remains open. The end point question in Euclidean space has been considered by several authors (Christ, 8 Christ and Rubio de Prancia,9 Hofmann17 and Seeger 24 ); recently A. Seeger 24 has shown that Tn is of weak type (1,1) for all il in LlogL that satisfy the cancellation condition. The corresponding questions for odd L1 or even H1 kernels remain open. In this section we generalize the result in Seeger 24 to arbitrary homoge neous groups. Specifically, we show that if Q is any function in L log L such that Tn is bounded on L 2 , then Tn is automatically of weak type (1,1). If the above conjecture is true, the L2 hypothesis on Tn would just reduce to the cancellation condition. The methods in Seeger 24 rely on the Euclidean Fourier transform and do not appear to be adaptable to non-Abelian settings. Our approach is more in the spirit of Christ and Rubio de Francia, 9 in that one considers the expression T n / as an operator acting on the kernel K rather than one acting on / . One
80
T. Tao
can then reduce weak (1,1) boundedness to something resembling a (L 2 ,L 2 ) estimate, which is now amenable to orthogonality techniques such as the TT* and (TTm)M methods. The argument can also be used to treat the slightly smoother maximal and square function operators corresponding to L log L generators il, either by direct modification of the proof, or by using a Radamacher function argument based on the fact that £V Ti(t)f * Ki is of weak type L 1 uniformly in t. One can also use the arguments in Seeger 24 to weaken the radial regularity on K to a Dini-type continuity condition. 4.1. Preliminary
reductions
Let if be an arbitrary homogeneous group, and let fi be an L log L function such that T = Tn is bounded on L2. Our objective is to show that T maps L1 to weak-L 1 . We will assume that all relevant functions are real-valued, since the extension to complex numbers is a trivial matter. Also, we suppose that n = dxm(H) > 2, since the n = 1 theory is treatable by the classical theory of the Hilbert transform. By linearity we may set ||0||i,iogL equal to 1. Let / be an arbitrary L 1 function. By linearity again, the desired estimate reduces to \{x:\Tf(x)\>l}\<\\fh. We now decompose / in the standard Calderon-Zygmund fashion as / = g + 22 ; 6/, where \\gW2 ^ | | / | | i , and the 6/ are functions supported on finitely overlapping balls I with dyadic radii 2*, with ^Zj |/| < ||/|| 1, such that fj 6/ = 0 and fj \bj\ < \I\. Since T is bounded on L2, we may dispose of g in the usual manner, and it suffices to prove that
£*>/**(*)
>i
£l'l-
(io)
By rescaling we may assume that YLi \I\ ' s comparable to 1. The next step is to break up the kernel K into dyadic pieces, although our decomposition scheme will differ slightly from the standard one. We start with an auxiliary decomposition K =■ ^2°Z_00 Ki which can be constructed by any of the standard methods; the only assumptions we place on the decomposition are that the Ki depend linearly on ft, are supported on some annulus JB(0, 2*)AI and satisfy the bounds \\Ki\\p ~ 2N,/p ||£2||p. Define the pushforward operators 51 corresponding to a dilation by t as
Three Regularity Results in Harmonic Analysis
6t.f(y) =
81
t-Nf(t-1°y).
Since K is homogeneous of degree —N, we have S\K = K for all t > 0. Thus, if (p is a suitably normalized bump function adapted to the interval [ C - 1 , C], we can write K — £V ^ t , where Ai = D^Ki is a smoothed out version of K{ and Z)^ = J Sl
£ 6 '** = ££ 6 '**<+' + £ E 6 ' * ^ + £ £ 6 ' * hi * I
s
3>C
I
s>C
I
The first sum on the right-hand side is supported in the set \Jt CI, which has an acceptable measure, and so can be discarded. To treat the second sum, we estimate its L1 norm by the method of rotations as follows:
EEiNiiii^+nii s>C
I
1
>>C
I
s £i'i
E« n>
L log L
Thus, by Tchebyshev's inequality, this term is also acceptable. It remains to treat the last sum, which is the most interesting of the three. By Tchebyshev's inequality again, it suffices to prove the following V estimate for some 1 < p < oo and all s > 0:
>**&•
< Cp2~'''
Suppose that the parameter s > 0, the balls I, and the functions 6/ are fixed. Rewrite 6/ * K^*a as ^jT/Z^A:/, where r/ is the operator corresponding to leftconvolution by jfr, ipi is a normalized bump function equal to 1 on a suitable annulus 2*/A and supported on a slightly thickened version of that annulus,
82
T. Tao
and k, = \I\K^aB. Since ||fc/||2 < 2"2~N'/2\I\1/2 for some arbitrarily small £, we see that the desired inequality will be implied by the following proposition. P r o p o s i t i o n 4 . 1 . Let I = {/} be a collection of essentially disjoint balls with dyadic radii such that ^i \I\ ~ 1) <*n^ fc* bi be functions on I such that IIMIi ~ 1^1 and Sj bi = 0. Then, if kj are an arbitrary family of functions in L2{H), one has the following estimate for all 1 < p < 2 and all s > 0:
(11)
Note that all the support, size, regularity, and interdependence assump tions on ki have been discarded; those aspects of the inequality have either disappeared or been transferred to other components of (11). This deliberate forgetting of structure is advantageous because it simplifies the Hilbert space in which the {&/} live in. By the TT* method, the estimate (11) is equivalent to the claim that Pll(P',P) < C p 2- £ " 5 for all 1 < p < 2, where A : V'{H) -* D>{H) is the self-adjoint operator A/ = 2-"*£>/T/D^T;V>//, and (fi2 is the bump function f2{t) = f sN(p(s)(p(ts) — . For each I, let 1/7 be a normalized bump function on 2*J& which is equal to 1 on the support of ipi, and let
have ||yl+||(q)p) < C 9iP .
Three Regularity Results in Harmonic Analysis
83
Proof. Since the r and D operators are averages of translations and dilations by factors comparable to the identity, they are uniformly bounded on every V space. Thus, the claim will follow by combining the inequality
^-"'^H^/d
which is valid for 1 < r < q < oo, with its adjoint
PV'
,-AT*
En//
*/ r
which is valid for 1 < p < r < oo. To prove this inequality, it suffices by the generalized Holder inequality to prove the estimate H2-"* £ / i>f\\b < Ca>b for all 1 < a, b < oo. Since all the powers of tpi are essentially comparable we shall assume that a = 1. When 6 = 1 the claim follows immediately from the triangle inequality and the normalization condition £ ) 7 |/| ~ 1. It therefore suffices to check the end point 6 = BMO; in other words, we will show that TJT fj \2~N* Y^,[ Vv — c j | < 1 for all balls J and some constants cj. Fix J. We shall estimate the mean oscillation of each summand separately. When r(2*7) < r(J), there is a nonzero contribution only if / is in CJ, and even then the contribution is majorized by C2-N'\2'I\/\J\ = C\I\/\J\. Thus the total contribution of the small cubes is at most C\CJ\/\J\, which is accept able. When r(2'I) > r(J), there are at most C2Ns cubes J at each scale for which there is a nonzero contribution, and an integration by parts argument shows that the maximum contribution from each such cube is 0{2~Na *k.\\)Summing over all large scales thus gives the desired BMO estimate, and we are done. □ Because of the trivial estimates on A, any bound of the form ||>l w || g , p < 2~" for a pair of exponents 1 < p < q < oo and a single integer M > 0 will imply the desired estimate ||i4||(p/>p) < CP2~E** for any 1 < p < 2. To see this, note that one can repeatedly compose such an estimate on AM with the trivial estimate and interpolate with further trivial estimates until M is a power of 2 and q is the dual exponent of p. One can then use the TT* method to obtain a boundedness result (e.g. (2,p) boundedness) on AMI2 with exponential decay, which one can interpolate with the trivial estimates to again obtain a (p',p) boundedness result. Iterating this procedure eventually gives
84
T. Tao
a bound on A, which can be interpolated to give the desired estimate for all 1 < p < 2 (although the decay factor e would of course have been considerably worsened by this process). Thus, to finish the proof of the weak (1,1) boundedness of T, we have to find some estimate on a power AM of A which improves on the corresponding trivial estimate on A+. Since each Dv operator contained in A has a smoothing effect in one "random" direction, one intuitively expects that n iterations of A should create a smoothing effect in all directions, which would interact with cancellation in the 6; to obtain the desired gain. There are two difficulties with this approach, however. Firstly, if one composes parts of A that come from balls I of vastly different sizes, then the cutoff functions tpi corresponding to the smaller balls can seriously truncate the smoothing effect from the Dv corresponding to the larger balls. Secondly, when H is a general homogeneous group, the smoothing effects of the Dv will tend to be along almost parallel directions, rather than being isotropically dispersed. The combined effect of these two obstacles will be that one might only get smoothing effects along very short, very parallel arcs, which will not give much isotropic regularity. To avoid this problem we shall use a much larger number of iterations, namely M = 2 2 n - 3 , which is enough to ensure the existence of at least n untruncated arcs for each M-tuplet of balls h,..., IM ■ Before we continue, however, we shall need a certain amount of machin ery related to the inverse function theorem. These rather technical devices unfortunately seem to be necessary when applying iteration methods such as the (TT*)M method; see for example Christ and Rubio de Francia, 9 Ricci and Stein. 21 4.2. Interlude
on the inverse function
theorem
As in the previous section, s > 0 is a fixed large number. We shall also set 5 > 0 to be an extremely small constant to be chosen later; S will depend only on the homogeneous group H. In this section we shall concentrate on the Euclidean structure of H instead of the group structure. Actually, the correct structure to consider should be the vector space structure of the Lie algebra h, but we shall take advantage of the Euclidean identification between H and f) and avoid introducing any new spaces. We will also be using the space f\ H of fc-forms for all 0 < fc < n, which we give a Euclidean structure by declaring the canonical basis ejf to be
Three Regularity Results in Harmonic Analysis
85
orthonormal, where K runs over all increasing subsequences of l , . . . , n of length k. This will induce a metric on the projective space P(/\ H). We shall use the volume form dx to identify A" H with R in the usual manner. For technical reasons we shall need a coordinate dependent approximate partial ordering ^ on f\ , as well as a rather artificial non-cancellative analogue 0 of the wedge product. We say that ^2K XK^K ^ YLK VK^K, or alternatively x e CSs X ) K K K — 0(Y1K VK^K)I whenever \XK\ < 2 \yKI for each basis element ex- We use x « y to denote the statement that x £ y and y ^ x. The nonlinear operator 0, taking j-forms and fc-forms to j + fc-forms, is defined by
^bjejO^cxeK J
K
= ^2^2±\bjcK\ej j
/\eK ,
K
where the sign is chosen such that ±ej A e# is a basis (j + A:)-form whenever possible. Note that the 0 operator preserves the ^ ordering, and that i A y ^ xOy for all forms x, y. A heuristic converse holds: for "most" x and y one should expect that x A y w x§y. This motivates the following definition: a collection x\,...,Xk of vectors in H shall be called good if one has x a , A--- A i „ r « x a i 0 - - - 0 a : a r for all subsequences o i , . . . , ar of 1 , . . . , k of length at most n. Note that the property of being good is invariant under Euclidean dilations of the individual Xi, or under a single homogeneous group dilation acting on all the x<. Also, note that the property of being good is equivalent to the existence of vectors 2/1, • • •, J/fc such that x 0 ^ ya for all a and xai A • • • A xar £ 2/Ol0 • • • 0j/or for all subsequences a\,..., ar. We shall frequently use the element 1 = ( 1 , . . . , 1) of H, or more generally expressions such as (6\ o 1)0 • • • 0(<Jfc o 1), to dominate various expressions using the ^ relation. Let
86
T. Tao
^-{t) = Jim (A*)-1 • Otk
mrl¥t+*k)),
dtk->Q
4>yk{t) = lim (dtk)-1
■ (
otfc—fO
We shall need the following relationship between the two derivatives: if J £ ( t ) £ 2* o 1 and p(4>{t)) < a2* for some a < 1, then J £ ( r ) - <£,*(*) £ a e • (2* o 1). This is easily verified by rescaling to i = 1 and using the fact that group multiplication is approximately equal to vector addition near the origin. We are now in a position to state a precise version of the following princi ple: if the partial derivatives -g*- are good and essentially constant, then the pushforward
(iii) We have d(<j>0(x),
I / ( ^ ( * ) M * ) dx-
JB
[ /Wo(aOW(z) dx <2~'"
JB
f |/(tfo(*))M*) dx. JB
Three Regularity Results in Harmonic Analysis
87
Proof. By rescaling we can assume that r = 1, so that 2"*+* > 2Cas for all k. Let D
{x:\DMxo)(x-x0)\<2""}
have diameter at most 0(2~Ca3). A covering argument and (ii) then shows that we can restrict our domain from B to a single ellipsoid BXo, and by a partition of unity we can assume that tp, V> are now adapted to BXo instead of B and satisfy the Lipschitz condition \1>(xi) - V(*2)| < 2-"°\DMxo)(xl
- *a)|fe).
We may assume that XQ and
of the proof
We now conclude the proof of weak (1,1) boundedness of the convolution operator T by dominating most of the operator Am by C2~" times the corre sponding portion of A™. One minor difficulty will be that the smoothing effect of the Dv will be along curves instead of straight lines when H is nonisotropic. To avoid this problem we shall make a decomposition which will partition these arcs into a relatively
88
T. Tao
small number of segments which are approximately straight. Choose a such that 5 < < 7 < 1 , and partition the interval [C~X,C\ as the finitely overlapping union of a set G of intervals 7 of length 0{2~"'), where # G ~ 2"'. Write ip2 = 53 ^7, ^2 = J2-y V-n where r]-, and T)7 are normalized bump functions adapted to 7 that satisfy the Lipschitz condition \r)~,(x) — r)^(y)\ < 2
/ =—jji
iji
* W
)yd#(7)-
We use X = \Jj Xi to denote the space of all steps. Finally, we define the propogator P : H x X -> 77 to be the function P(x,{I,v,w,-y,t))
=v(to(w-1x)).
The operators ^4 and A+ can now be written as
Af(x) = Y,A'fW = 2~N° £ A+/(*) = £ > ! + / ( * ) = 2 - " '
^
/ /(^^/W/t^E))^, /" / ( P ( x > E ) ) ^ / ( i ) ^ / ( P ( x > E ) ) d E ? .
Before we proceed further we shall record in the following proposition some facts on how the propogator reacts to changes in the x, v and t variables, and how evenly distributed the tangent vectors ^ are. Proposition 4.4. Let E = (7, v, w, 7, t) be an arbitrary step, x,x' G 77 be in range of E and such that d(x,x') < 2 - £ < " 2 ' + 5 , and let E' = (7,v',10,7, t') be
Three Regularity Results in Harmonic Analysis
89
a step that differs from E only in the v and t variables. Then the following statements hold: P(x,E)- 1 P(ar',E) = t o ( x - 1 a ; ) . d(P(x,E),P(x'X))
<2i+^-^a.
if t = f, then d(P(x, E), P(x, E')) < 2 <+ ( 1 - £ >*.
f(*.S)^2'+'ol.
Finally, for any set y = { j / l t . . . , j/jt} of good vectors and any E, we have that 2/1) • • • 12/fe> l ^ C 1 ) £ ) a r e fl00^ / o r m o s * s- More precisely, if one defines EViz to be the bad set j x G 2 V A : yBl A- • -Aj/ar A — ( x , E ) £ y a i 0 - • • 0ya r 0(2 i + *ol) and Ey^
\/alt...
,ar\
to be the slight enlargment Ey^ = {x: d(x, £ y ,s) < 2 ,+£<5 *}, then
where the constant Ck is independent o/ j/i, • • •, j/fc and E. Proof. All the statements above are invariant under dilations and translations, so we may assume that 2 , + 5 = 1 and xj = 0. The first identity is proven by inspection, and the next four inequalities follow from the polynomial nature of the expressions on the left-hand sides and the bounds on the variation in x and E. To prove the final claim, we write ^ as 8P — (x,E) = t o ( r 1 - X ( W x ) ) , where X(x) is the tangent vector at the origin to the curve t H-» X _ 1 ( 1 +1) o x. The map X is obviously given by polynomials; we shall see shortly that X is invertible and that the inverse map is also polynomial. This implies that, for every hyperplane 7r, the set {x: p(x) ~ 1, d(X(x),ir) < 2~eS3} has measure at most C2~eS', uniformly in ■n. The claim now follows by noticing that the property of x being in Ey<^ (or £ v ,s) is essentially equivalent to the
,
90 T. Tao statement that X(wx) avoids a certain finite union of 2~ £a '-neighborhoods of hyperplanes. It remains to prove that X is invertible and that X~l is given by poly nomials. This will come from the fact that H is a nilpotent Lie group, and that all the group operations are "lower diagonal" in some sense. Since the basis vectors e i , . . . , e n are arranged according to their homogeneous weight, the multiplication and inversion laws must have the form {xy)j = Xj + yj +Pj{xi,..., (x~l)j
=
Xj-i, 2/i,..., j / j - i ) ,
-XJ+<3J(XI,...,ZJ_I).
From the definition of X we can therefore write X(x)j
= ajXj + Rj(xi,...
,Xj_i),
where the Rj are polynomials, and this map can be inverted explicitly with a polynomial inverse by the usual iterative procedure. □ Let M = 2 2 " - 3 . We write AM as £ / , . . . / „ AIl ■ ■ ■ A I M , and use the fol lowing combinatorial lemma to extract for each such sequence a monotonic subsequence of length n which avoids the truncation problems mentioned ear lier. Lemma 4.5. Let I\,...,IM be an arbitrary collection of balls, where M = 2 2 n _ 3 . Then there exists a strictly monotonic sequence fci,...,fcn of integers in { 1 , . . . , M } such thatr(Ikd) < r(Ik) for allj = 1 , . . . , n - l , and allk between kj and kn inclusive. Proof. We first construct an auxiliary sequence i i , . . . , hn-2 of integers and a sequence S\,..., 52 n -2 of intervals of integers by the following inductive procedure. Let Si be the interval { 1 , . . . , M } . For each j = 1 , . . . , 2n — 2 in turn, we choose lj G Sj such that hi has minimal radius among all the balls {/j: / G Sj}. Removing the element lj from Sj divides the remainder into two intervals {I G Sj : I < lj} and {I G Sj : I > lj}; we choose 5 J + i to be the larger of the two intervals. One can easily show that \Sj\ > 2 2 n _ 2 ~J for all j , so that all the lj are well defined. Furthermore, one has r(/j i ) < r(Ii) for all I between lj and /2n-2- Since one of the sets {U < i2r»-2}> {h > ^ n - 2 } has a cardinality of at least n, we can extract a monotonic subsequence fci,..., k„ with the desired properties. □
Three Regularity Results in Harmonic Analysis
91
Fix the sequence I\,..., IM , and let k\,..., kn be as in the lemma. Except for an error term which we shall deal with later, we shall majorize Ajx • • ■ A\M pointwise by 2~"Af • • • AfM. By taking adjoints if necessary and using the fact that the Ai and Aj operators are self-adjoint, one can assume that the fa are monotonically increasing. We can also truncate the sequence I\,..., IM at both ends, reducing M in the process, so that k\ = 1 and kn — M. In fact, we will make the simplifying assumption that M — n and kj — j for j = 1 , . . . , n, so that ij is a monotonically nondecreasing sequence; the general case is completely analogous, but would involve more intricate notation. We will dominate most of |J4J, •• -AInf(x)\ by 2 " £ M £ • • ■ Afjf\(x) for all / and x; we may take x = 0 without loss of generality. The expression Ai1 • • • Ain /(0) can be written as a repeated integral
U L "/„'<">*<-!>n ("wr>f) *$*<*>. where Z is the set of (3n - l)-tuples z = (v2,... ,vn,wi,. ■ •, Wn,7i. • • • ,7n), the Xj are defined inductively by XQ = 0, Xj = P(xj-i,(Ij,Vj,Wj,ij,tj)), the cutoff function * is given by $(z,vi,t) = I"I?=i ^ ( ^ - 1 ) 1 and dfi(z) is some measure whose explicit form is not important. One can of course write a similar integral formula for the A\ ■■■ Af |/|(0), with all terms replaced by enlarged, non-cancellative versions of themselves. We may assume that Xj_i, Xj are in range of Ij for each j . By Propo sition 4.4 and the monotonicity of the ij, the maximum variation of each Xj as a function of the v\ variable is 0(2*' + ( 1 - E ) s ), and the maximum variation as a function of the t variables is 0(2,'+^1~"r^B). In particular, this means that the ^ is essentially independent of v\ and t, as the variation in ^ can be dominated by 2~" times the corresponding function
92
T. Tao
/•■•/ /wiiKcj 1 )?) is majorized by -E<J8
Al
An
j=l V
l
l
/
This however will follow directly from Propositions 4.3, 4.4, and the assumption that z was good. It remains to estimate the error terms arising from the bad histories. The total contribution of these terms to the original expression AM f(x), with M = 2 2 " - 3 , can be dominated by
f
\f(xM)\X(T)dT+,
where X^1 is the space of all histories T = (x, E i , . . . , E M ) endowed with the measure j=l
\
Ij€l
/
a
and x is cutoff function that vanishes on good histories T. We now construct a large set of good histories. Let S = Ufc=i P ( A H) D e the disjoint union of the projective exterior algebras of H, and let N = {v} be a maximal 2 _5s -seperated set of 5, so that #N = 0(2CSs). For any v € N, 7 € G, and x e H, define lx,-r,u to be the set of balls IX.-Y.Z/
= {I e I: x is in range of / , (t o v) A X(x7 x x) ^ ( t o v)0(2 i + a o 1)} ,
where v is any representative of v, t is any element of 7, and X is the map in the proof of Proposition 4.4. Define an exceptional set
E=
(J
{x£H:\lx„,v\>C2(N-£V*};
we shall see presently that E has very small measure if e is chosen appropriately. We can now construct a large subset il™ of X™ by the following inductive procedure. Set ill equal to X\. For 1 < M < 2 2 n _ 3 , a history T will be in fi£* if its predecessor (the truncation of T of length M — 1) is in fi£*-1, the
Three Regularity Results in Harmonic Analysis
93
point XM-I is not in the set E, and IM avoids a certain finite collection of I X M - I ^ M , " - More precisely, for any subsequence k\,..., kT of 1 , . . . , M — 1, we shall require that IM is not in Ii M _ l l 7 M ,i/, whenever v is a ray in N which is within C2~6a of the projective ray corresponding to gp1- A • • • Adx ' ^I t is a routine matter to verify inductively that this condition ensures that all elements of QM are good. Lemma 4.6. \E\ < Cm2~ma for
allm>0.
Proof. It suffices to show this bound for a single £ 7 i „ = {x € H: \Ix,y,v\ > C2( 1 - e ')*}. Fix 7 and u, and for each / € I let \l)jn
\{x:\f(x)\>a}\<e-c°/W™°Mkt a it suffices to show the two estimates
52i>ln,i
< 2(N-cS),
(
z*>,
< s2^N~eS^a
"I,"
BMO
By Proposition 4.4, each Vv.-y.i/ is supported on a subset of 2*/A of measure 0(2~cSa\2*I&\) and is essentially constant on sets of width 0(2*) as measured by the homogeneous norm p, with all constants independent of the choice of v. Furthermore, ^7,7,1/ commutes with left-translations of / . One now repeats the argument in the proof of Lemma 4.2 to obtain the desired estimates. There will be O(s) exceptional scales for which one cannot take advantage of either the smoothness or support conditions on the ^7,7,1/1 and which must be treated individually (using the left-invariance of the ^7,7,1/); tbis is the cause of the additional 5 factor in the BMO estimate. We omit the details. □ The complement of Q.M consists of those histories T which are such that there exists a 1 < j < M such that either Xj is in E, or that Xj is not in E but Ij+\ is on an exceptional set Ip consisting of a bounded number of the 1^,7^!,!/ as detailed above, so that # I r = 0(2^-^"). We shall prove the estimate
L
-n^
l/(zAf)|dTH
< CP,?2"
"ll/ll,
94
T. Tao
for all 1 < p < q < oo, which will finish the proof of weak (1,1) boundedness. This however will be immediate from a finite number of applications of the following lemma. Lemma 4.7. Let M be a fixed integer, and suppose that x(F) is a bounded function defined on \Jx€u X^. Then, ifx satisfies either of the two conditions stated below, one has
f
\f(xM)\X(T)aT+
for any 1 < p < q < oo and f € Lq. (i) There exists 0 < k < M and a set E of measure 0(2~es) such that x ( r ) vanishes whenever Xk is outside E. Or (ii) There exists 0 < k < M and sets /p of cardinality 0(2(-N~eS^3) that depend only on XQ, EO, . . . , Efe, 7^+1, such that x ( r ) vanishes whenever Ik+i is not in Ir-
Proof. Assume first that Condition (i) holds. The expression whose norm is to be bounded is then majorized pointwise by C(A+)kXE(A+)M~k\f\, and so the desired estimate follows immediately from the trivial bounds on A+ and the fact that ||xE||( g , P ) < Cq,p^~£"pS for all 1 < p < q < oo. Now assume that Condition (ii) holds. In this case it will suffice to find (Lq, V) bounds with exponential decay on the operator B defined by Bf{x) = f 2~N> £ £ rPir+D^r+^,f(xk) dT+ , Jx * -reG/eir since the desired expression is majorized by B(A+)M~k~1\f\. Since B is ma jorized by (A+)k+1, it suffices by interpolation to prove a bound of the form ||£||(oo,p) < Cp2~c>' for some 1 < p < oo. However, as the operators i/>/, T / , \G\D^, and Tj"* are all uniformly bounded in L°°, we have \Bf(x)\<
[
2-^|Ir|||/||oodr+<2-^||/||0O(A+)fcl(a:)>
and so the estimate follows from the trivial (oo,p) bounds on (^4+)fe.
□
Three Regularity Results in Harmonic Analysis
95
Acknowledgments T h e author is deeply indebted to his advisor, Elias Stein, for his constant guidance, advice and suggestions. T h e author also wishes t o t h a n k Charles Fefferman, Anna Gilbert, Ingrid Daubechies, and Andreas Seeger for several useful discussions. P a r t s of this research were conducted while t h e a u t h o r was supported by a Fulbright G r a d u a t e award and a Sloan G r a d u a t e Fellowship. N o t e s A d d e d in Proof T h e results of Sec. 2 have been published in Tao 3 2 ; this result has since been improved by the author in Tao. 3 5 (See also T a o 3 4 and Tao, Vargas a n d V e g a 3 6 for related progress.) T h e results of Sec. 3 have been published in Tao, 3 3 and generalized in Tao and Vidakovic. 3 7 T h e results of Sec. 4 have not been previously published. References 1. A. Averbuch, G. Beylkin, R. Coifman and M. Israeli, Fast Adaptive Algorithm for the Elliptic equations with periodic boundary conditions, preprint. 2. J. Bourgain, Besicovitch-type maximal operators and applications to Fourier analysis, Geom. Funct. Anal. 22 (1991) 147-187. 3. A. P. Calderon and A. Zygmund, On singular integrals, Amer. J. Math. 78 (1956) 289-309. 4. L. Carleson and P. Sjolin, Oscillatory integrals and a multiplier problem for the disc, Studia Math. 44 (1972) 287-299. 5. S. Chanillo and B. Muckenhoupt, Weak type estimates for Bochner-Riesz spher ical summation multipliers, Trans. Amer. Math. Soc. 294 (1986) 693-703. 6. M. Christ, On almost-everywhere convergence of Bochner-Riesz means in higher dimensions, Proc. Amer. Math. Soc. 95 (1985) 16-20. 7. M. Christ, Weak type endpoint bounds for Bochner-Riesz multipliers, Rev. Mat. Iberoamericana 3 (1987) 25-31. 8. M. Christ, Weak type (1,1) bounds for rough operators, Ann. Math. 128 (1988) 19-42. 9. M. Christ and J. L. Rubio de Prancia, Weak-type (1,1) bounds for rough operators II, Invent. Math. 9 3 (1988) 225-237 10. M. Christ and C. D. Sogge, The weak-type L1 convergence of eigenfunction ex pansions for pseudo-differential operators, Invent. Math. 94 (1988) 421-453. 11. D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika. 81 (1994) 425-455. 12. C. Feffermann, Inequalities for strongly singular convolution operators, Acta Math. 124 (1970) 9-36. 13. C. Fefferman, The multiplier problem for the ball, Ann. Math. 94 (1971) 330-336.
96
T. Tao
14. C. Fefferman and E. M. Stein, Some maximal inequalities, Amer. J. Math. 93 (1971) 107-115. 15. R. Fefferman, A theory of entropy in Fourier Analysis, Adv. Math. 30 (1978) 171-201. 16. G. B. Folland and E. M. Stein, Hardy Spaces on Homogeneous Groups, Math Notes # 2 8 (Princeton Univ. Press). 17. S. Hofmann, Weak (1,1) boundedness of singular integrals with nonsmooth kernel, Proc. Amer. Math. Soc. 103 (1988) 260-264. 18. L. Hormander, The spectral function of an elliptic operator, Ada Math. 121 (1968) 193-218. 19. S. Kelly, M. A. Kon and L. A. Raphael, Local convergence for wavelet expansions, J. Fund. Anal. 126 (1994) 102-138. 20. Y. Meyer, Ondelettes (Hermann, 1990). 21. F. Ricci and E. M. Stein, Harmonic analysis on nilpotent groups and singular integrals II: Singular kernels supported on submanifolds, J. Fund. Anal. 78 5684. 22. A. Seeger, Endpoint estimates for multiplier transformations on compact mani folds, Indiana Math. J. 40 (1991) 471-533. 23. A. Seeger, Endpoint inequalities for Bochner-Riesz multipliers in the plane, Pa cific J. Math. 174 (1996) 543-553. 24. A. Seeger, Singular integral operators with rough convolution kernels, J. Amer. Math. Soc. 9 (1996) 95-105. 25. A. Seeger and C. D. Sogge, On the boundedness of functions of (pseudo)differential operators on compact manifolds, Duke Math. J. 59 (1989) 709-736. 26. C. D. Sogge, On the convergence of Riesz means on compact manifolds, Ann. Math. 126 (1987) 439-447. 27. C. D. Sogge, Concerning the Lp norm of spectral clusters for second-order elliptic operators on compact manifolds, J. Fund. Anal. 77 (1988) 123-138. 28. C. D. Sogge, Fourier Integrals in Classical Analysis, Cambridge Tracts in Math. # 105 (Cambridge Univ. Press, 1993). 29. E. M. Stein, Singular Integrals and Differentiability Properties of Functions (Princeton Univ. Press, 1970). 30. E. M. Stein, Harmonic Analysis (Princeton Univ. Press, 1993). 31. E. M. Stein, M. H. Taibleson and G. Weiss, Weak-type estimates for maximal operators on certain Hp spaces, Rend. Circ. Mat. Palermo Suppl. 1 (1981) 81-97. 32. T. Tao, Weak-type endpoint bounds for Riesz means, Proc. Amer. Math. Soc. 124 (1996) 2797-2805. 33. T. Tao, On the almost-everywhere convergence of wavelet summation methods, Appl. Comput. Harmonic Anal. 3 (1996) 384-387. 34. T. Tao, The Bochner-Riesz conjecture implies the Restriction conjecture, Duke Math. J., to appear. 35. T. Tao, The weak-type endpoint Bochner-Riesz conjecture and related topics, Indiana Math. J., to appear.
Three Regularity Results in Harmonic Analysis
97
36. T. Tao, A. Vargas and L. Vega, A bilinear approach to the restriction and Kakeya conjectures, J. Amer. Math. Soc., to appear. 37. T. Tao and B. Vidakovic, Almost everywhere convergence of general wavelet shrinkage estimators, to appear. 38. T. H. Wolff, Recent work on sharp estimates in second-order elliptic unique con tinuation problems, J. Geom. Anal. 3 (1993) 621-650. 39. T. H. Wolff, An improved bound for Kakeya type maximal functions, Rev. Mat. Iberoamericana. 11 (1995) 651-674.
99
T I M E - F R E Q U E N C Y ANALYSIS I N T H E D I S C R E T E PHASE PLANE
Christoph Martin Thiele Department of Mathematics, Yale University, USA With a section jointly with Lars Villemoes (KTH Stockholm) This thesis is divided into two independent parts. First Part: The bilinear Hilbert transform is given by # ( / , 9) = / l / t / ( x - t)g(x +
t)dt.
We introduce a Walsh model Hw, called quartile operator, for this bilinear operator and prove that the model operator satisfies
l|JM/.»)l|r
Contents 0. Introduction 1. The Walsh Phase Plane 1.1. Introduction 1.2. Geometric properties of the Walsh phase plane
100 104 104 107
100
C. M. Thiele
1.3. 1.4. 2. The 2.1. 2.2. 2.3. 2.4. 2.5. 2.6.
Basic algebraic properties of the Walsh phase plane Carleson measure estimates Quartile Operator Introduction The decomposition of the trilinear form An integration lemma The decomposition of the bilinear operator The forest estimate The connection between the quartile operator and the bilinear Hilbert transform 3. A.E. Convergence of Walsh Fourier Series 3.1. Introduction 3.2. Proof of Theorem 1.1 3.3. The forest estimate 4. A Fast Algorithm for Adapted Time-Frequency Tilings 4.1. Introduction 4.2. Walsh tilings 4.3. Best basis algorithm 4.4. An example 4.5. Algebraic time-frequency localization 4.6. An example related to the FFT
109 Ill 114 114 116 118 122 124 126 129 129 130 132 134 134 136 140 144 146 149
0. Introduction Time-frequency analysis deals with decompositions of functions or operators into pieces which are localized in time and frequency. This localization can be defined in several ways. The most restrictive definition is that a function in one real parameter, which we call time, is localized at a bounded time interval / , if it is supported on this interval, and it is localized at a bounded frequency interval u>, if its Fourier transform is supported on this interval. Unfortunately, with this definition, there is no function which is simultaneously localized in time and frequency. Hence one often considers a weaker notion of localization, which is defined only relative to one fixed function. One fixes a mother function / and defines it to be localized at the rectangle I x LJ in the time-frequency plane. Then the translated, rescaled and modulated function f(x) = f(2kx — XfOe 2 ** 0 ^" 1- * 0 ) is localized at the rectangle 2~k(I + x0) x 2k{w + &)• The time-frequency plane is also called the phase plane.
Time-Frequency Analysis in the Discrete Phase Plane
101
The recent development of multiresolution analysis has shown that there are many orthonormal bases of L 2 (R), the wavelet bases, which consist of translated and rescaled copies of one mother wavelet. If the mother wavelet is chosen appropriately, then the weak definition of time-frequency localization reflects the properties of the wavelet basis very well. These bases are used to obtain an elegant approach to many problems of Littlewood-Paley theory and the theory of Calderon-Zygmund singular integral operators, as well as to numerical calculations. However, for some applications, one needs a larger variety of time-frequency localized functions than those given by wavelet bases. In addition to translating and rescaling one mother function, one also needs to modulate it, in order to shift the localization in frequency. We call a set of functions a complete collection of time-frequency localized functions, if it is the set of functions obtained from one mother function by applying an appropriate discrete group of translations, rescalings, and modulations. Complete collections of time-frequency localized functions are frequently used in data compression and noise reduction. For example, the acoustic signal of a musical piece is well described as a sum of notes, each having a well-defined time, duration and frequency. A more theoretical application of complete collections of time-frequency localized functions appears in the various proofs of the classical theorem of Carleson, 3 which states that the Fourier series of a function in L 2 [0,1) con verges almost everywhere in [0,1). Our point of view of time-frequency analysis is especially apparent in the proof by C. Fefferman.6 More recently, M. Lacey 11 has tried to use complete collections of time-frequency localized functions to prove a bound on the bilinear Hilbert transform. Such a bound has been con jectured by A. Calderon in the early '70s. Those two applications have been the main source of inspiration for much of the work in this thesis. One of the main ideas of time-frequency localization is that two functions are almost orthogonal, if they are localized at regions of the phase plane which are far away from each other. For wavelet bases one can even achieve exact or thogonality of the whole basis by choosing the mother wavelet appropriately. Unfortunately for complete collections of time-frequency localized functions, the situation is much worse. Here one cannot choose a mother function such that two functions in the complete collection, which are localized at distant regions of the phase plane, are orthogonal. This causes many technical diffi culties in applying complete collections of functions, which is reflected in the
102
C. M. Thiele
fact that the proofs of the Carleson theorem are technically very complicated and the fact that the bilinear Hilbert transform is still not well understood. However, one can find discrete models of time-frequency analysis, in which this difficulty does not appear. Probably the most important one is the Walsh model, which is obtained by taking the characteristic function of the unit interval [0,1) as mother function, and by modulating with Walsh functions Wn instead of exponentials e2*^0*. Here the Walsh functions are defined recursively on the unit interval [0,1) by Wo = 1 and, for every non-negative integer n, Wn(2x)
if x < 0.5,
Wn{2x - 1 )
ifx>0.5,
Wn(2x) WWix
-Wn(2x
if x < 0.5, - 1)
if x > 0.5.
The complete collection of time-frequency localized functions in the Walsh model contains all functions 2~i!2Wn{2-jx-l), with integers j , I, n and n > 0, and we define such a function to be localized at the rectangle [2jl,2j{l + 1)) x [2->n,2-J(n + 1)). The Walsh model is a discrete model, because all functions in the complete collection are finite step functions. It has the property, that two functions in the complete collection are orthogonal, whenever they are localized at disjoint rectangles. An important subset of this complete collection is the well-known Haar basis of L 2 (R). It plays the role of a standard wavelet basis in the Walsh model. The aim of this thesis is to develop the theory of time-frequency analysis in the Walsh model and other discrete models. The main example for a discrete model of time-frequency analysis besides the Walsh model is related to the Fast Fourier Transform. The functions of the complete collection of time-frequency localized functions are most conveniently described by their inner products with an arbitrary signal: these inner products are obtained as intermediate coefficients while performing a Fast Fourier Transform.
Time-Frequency Analysis in the Discrete Phase Plane
103
It is clear that the complete collection of time-frequency localized func tions in the Walsh model is too large to form a basis of L 2 (R). For example the Haar functions, a small subset of this collection, form a basis, and there are many other bases obtained by modulating the Haar basis with a fixed Walsh function. It is a natural question, to classify all those subsets of the complete collection which form bases of L 2 (R). This is important for applications like data compression and noise reduction, in which one wants to expand signals into bases which are adapted to the signals. This classification is done in terms of tilings of the phase plane into rectangles, which are associated to the timefrequency localized functions. L. Villemoes has first given a fast algorithm which selects the best basis with respect to certain cost functions. This algo rithm is described in Part II of this thesis, together with its generalizations to other models of time-frequency analysis. In Part I of this thesis, we discuss the analogues in the Walsh model of Carleson's theorem and Calderon's conjecture. We will prove almost everywhere convergence of Walsh Fourier series of V functions by proving a bound on the maximal partial sum operator of Walsh Fourier series. We will also define an operator which is a Walsh model for the bilinear Hilbert transform, and prove boundedness of this operator. Both operators can be naturally decomposed into sums of time-frequency localized pieces, and these sums are essentially indexed by the complete collection of time-frequency localized functions in the Walsh model. There are three main themes in the proofs of boundedness of the two oper ators. The first theme is to study clusters of time-frequency localized pieces in the phase plane, i.e. sets of time-frequency localized pieces which are localized at a region in the phase plane. Such clusters we call trees. The region, where a tree is localized, corresponds to a subspace of L 2 (R), and part of the theory is to restrict the attention to the projections of functions onto these subspaces. Inside the region of a tree, we will detect a subset of the Haar basis or another basis, which is obtained by modulating the Haar basis by a fixed Walsh func tion. This means we can do Littlewood-Paley theory and Calderon-Zygmund theory to study the trees. The second theme is to study the union of trees, which are localized at distant regions of the phase plane. Such a union is called a forest. The main tool here is the orthogonality of the functions or operators associated to the trees. We will apply the Parseval inequality to obtain several estimates, which we call Carleson measure estimates.
104
C. M. Thiele
The third theme is to collect all forests together and to obtain an estimate on the sum over the operators associated to the forests. Here the convergence is obtained by making use of two effects. For some forests, the operator norm of the associated operator is small. For other forest, the operator norm is large, but then the operator is supported only on a small exceptional set. These two effects are measured by a quantity called density, which is associated to the forests. 1. The Walsh Phase Plane 1.1.
Introduction
The Walsh functions W0, W\, W2, . . . are denned recursively on the unit interval [0,1) by WQ = 1 and, for every non-negative integer n,
W2n{x)
WWi(s) =
Wn{2x)
ifi<0.5,
Wn(2x - 1)
if x > 0.5,
Wn(2x)
if x< 0.5 ,
-Wn(2x - 1)
if x > 0.5.
They form an orthonormal basis of the Hilbert space L 2 [0,1) (see Ref. 15). The Walsh Fourier transform of a function / is its expansion 00
£(/,Wn)Wn n=0
into this basis. With the natural pairing,
(f,Wn)=Jf(x)Wn(x)dx, the Walsh Fourier transform is formally defined for every integrable function. We have the following analogue of the famous Carleson-Hunt theorem for a.e. convergence of Fourier series. Theorem 1.1. For every function f £ Iffi,1) sums
with 1 < p < 00, the partial
N
Y,(f,Wn)Wn{x) n=0
converge as N —► 00 for almost every x in [0,1).
Time-Frequency Analysis in the Discrete Phase Plane
105
We extend the Walsh functions to the whole real line by defining them to be zero outside the unit interval. Then we define for each triple j , I, n of integers with n > 0 the Walsh wave packet wjtl,n(x)
=
2-i'2Wn(2-ix-l).
The quartile operator Hw is a bilinear operator which formally assigns to two locally integrable functions / and g on R the function oo
Hw(f,9)=
oo
oo
^'Yj'l-:>l2(f,wjMn)(g,wjMn+i)wJsAn+2-
51 j=-oo
l = - o o n=0
As usual we denote the norm of the Lebesgue space LP[R] by ||/|| p . Then we have the following bounds on the quartile operator. Theorem 1.2. Let 1 < s, t < oo and 1 < r < oo, such that 1/r = \/s + \/t. Then the above series for Hw(f, g) converges absolutely in Lr, whenever f € L* and g € Ll, and we have the bound \\Hw(f,g)\\r
106
C. M. Thiele
for the Fourier transform, which in nature relates time and frequency. In the Walsh model, time-frequency localization is very sharp and allows a straight forward translation of geometric properties of the phase plane into properties of functions which are built of time frequency localized pieces. The correspond ing theory for the Fourier transform, which should reflect many of the aspects of the model theory, is much more complicated because strict localization of functions can be achieved only either in time or in frequency. The theory in the Fourier case is not yet understood satisfactorily. The analogue of Theorem 1.1 in the case of the Fourier transform is clearly the famous Carleson-Hunt theorem about almost everywhere convergence of Fourier series. The basic ideas of time-frequency localization, which are worked out in this thesis, already appear in the two classical proofs of this theorem by L. Carleson in Ref. 3 and C. Fefferman in Ref. 6. The proof for the Walsh case has first been done by Billard for L 2 (R) in Ref. 2 and Sjolin in Ref. 13 for LP(R). The proof we present here is slightly refined such that it gives the Lp estimate without using Hunt's trick 9 with characteristic functions. The Fourier analogue of the quartile operator is the bilinear Hilbert trans form, given by H{f,g)(x)=
( f{x-t)g{x JR.
+
t)\dt. *
It is conjectured that the bilinear Hilbert transform satisfies the same estimates as the quartile operator in Theorem 1.2. This conjecture was first formulated by A. P. Calderon in the early '70s for the special case L2 x L°° y-t L2. Recently, M. Lacey 11 has tried to apply the techniques of time-frequency localization to prove Calderon's conjecture, an approach which has also been suggested by R. Coifman and Y. Meyer. Some of the ideas here have been inspired by Lacey's work. Unfortunately we do not see a method to apply the results for the quartile operator to obtain any information about the bilinear Hilbert transform. The rest of this section is devoted to developing the geometry of the Walsh phase plane and to discussing how it translates into a theory of time-frequency localization of functions. In Sec. 2 we discuss the quartile operator and prove Theorem 1.2. This is done in two steps. First we pass to the trilinear form
T(/!,/ 2 ,/ 3 ) = ( # ( / ! , / 2 ) , / 3 ) .
Time-Frequency Analysis in the Discrete Phase Plane
107
By duality, Theorem 1.2 translates into boundedness properties of the trilinear form. There is a special symmetry in the case that all three arguments in the trilinear form are in Lebesgue spaces with exponents between 2 and oo, allowing an especially elegant proof of boundedness. To get the remaining estimates, we then work directly on the quartile operator, which will also give another proof for the symmetric estimates of the trilinear form. Finally we will give a heuristic argument, which shows the connection between the quartile operator and the bilinear Hilbert transform. In Sec. 3 we will formulate a proof of Theorem 1.1 using the techniques that have been introduced in the previous sections.
1.2. Geometric
properties
of the Walsh phase J
plane
J
Recall that a dyadic interval is of the form [2 n, 2 (n + 1)) with integers j , n. Given two dyadic intervals, either they have empty intersection, or one is contained in the other. The Walsh phase plane is the closed upper half-plane R x R j . We think of the first coordinate as time and the second coordinate as frequency. A tile p is a rectangle Ip x CJP in the Walsh phase plane, such that |/ p ||w p | = 1 and both Ip and up are dyadic intervals. We call Ip the time interval and wp the frequency interval of p. The dyadic structure forces many interesting properties of the set of tiles, and the aim of Sec. 1.2 is to summarize those which are needed later. A bitile is a dyadic rectangle P = Ipx up in the Walsh phase plane of area two, i.e. |/p||wp| = 2 and both Ip and wp are dyadic intervals. Similarly we define quartiles as dyadic rectangles of area four. If P — [XQ, XI) x [fo>£i) is a bitile, we call the tile [loi^i) x [£o, (fo + £i)/2) the upper son of P. Similarly we define the lower, left, and right sons to be the tiles \XQ, X\)X [(£O + £ I ) / 2 , £ I ) , [x0, (x0+xi)/2) x [fo,6)i and [(zo+zi)/2,xi) x [£o,6)- We call the upper and lower sons frequency brothers of each other, and the left and right sons time brothers. In this way each tile has a unique frequency brother and a unique time brother. Lemma 1.3. Let P be a bitile with upper and lower sons p andp', and let Q be a different bitile with upper and lower sons q and q4. Then at least one of the intersections pC\q and p' Ciq' is empty.
108
C. M. Thiele
To prove the lemma, assume that both intersections are nonempty. Pick two points (x,f) G p n q and (x',£') € p' Ciq'. Then the frequency interval of the bitile P is the smallest dyadic interval which contains both £ and £'. The same holds for the frequency interval of Q, hence the two frequency intervals are equal. This implies that the time intervals of both bitiles have the same length, and as the bitiles have nonempty intersection, the time intervals must also be equal. This is a contradiction to the assumption that P and Q are different. We define a tile p to be less than a tile p', and we write p < p', if Ip C Ip' and Up/ C uip. This defines a partial order on the set of all tiles. Observe that two tiles are comparable, i.e. p < p' or p' < p, if and only if they have nonempty intersection. If p is a set of tiles, we denote by p m i n the set of minimal tiles in p and by p m a x the set of maximal tiles in p. Lemma 1.4. Let p be a finite set of pairwise disjoint tiles. Let p be the set of all tiles p with the property
pC (Jp. Then either p = p m i n or p contains a pair of frequency brothers. Proof. The tiles in p m i n are minimal and therefore pairwise disjoint. They are all covered by tiles of p . Hence if p C p m i n we already have p = p m m and we are done. Therefore we can assume that there is a tile in p which is not in p m i n . We pick such a tile p with maximal time interval Ip. There is a tile p G p which is strictly less than p. This p is also strictly less than the frequency brother q of p. The tile p is covered by p, hence there is a tile p' € p which intersects q nontrivially. This tile does not intersect p, hence we conclude that q < p'. But then p' is also a tile in p which is not minimal, and by choice of p we must have \IP>\ < \IP\. This implies q = p', and we have shown that p contains a pair of frequency brothers. Corollary 1.5. Let p and p be as in Lemma 1.4. Then
0 = U Pp€p
p€pmln
As long as p ^ p m i n , we can replace a pair of frequency brothers in p by the pair of time brothers which covers the same bitile, and it suffices to show
Time-Frequency Analysis in the Discrete Phase Plane
109
the corollary for the new set of tiles. Doing this successively, the procedure must terminate, because changing from frequency brothers to time brothers gives tiles with larger frequency intervals, but there is an upper bound on the size of frequency intervals of tiles in p. Hence this procedure terminates with p = p m i n , and the corollary follows. Corollary 1.6. Let p and p be as in Lemma 1.4. Then for each p in p there is a set p ' of pair-wise disjoint tiles which contains p and satisfies
0=0
P€P'
p€p
By the previous corollary we can assume that p = p m i n . We can also assume that each tile in p m m intersects, p nontrivially, by removing all those, which do not. Then p is maximal in p. By the dual version to the previous corollary the set of maximal tiles in p satisfies the requirements we need. The following lemma gives a criterion for a subset of the phase plane to be decomposable into a disjoint union of tiles. We define for bitiles the same partial order as for tiles. Then a set P of bitiles is called convex, if for all ordered triples P < P' < P" of bitiles we have the property that if P and P" are in P , then P' is also in P . Lemma 1.7. The union of a finite convex set of bitiles can be decomposed into a disjoint union of tiles. We use induction on the number of bitiles. Clearly the lemma is true for the empty set of bitiles. Given a nonempty convex set we pick a minimal bitile P in the set. Let p be one of the two frequency brothers which constitute P. Let P' be the bitile which is the union of p and its time brother. If there is any bitile P" € P besides P which intersects p nontrivially, then we have pii < P' < Pt and by convexity we conclude that P' € P . This means that p is either contained in or disjoint from the union of bitiles in P \ {P}. As this is true for both frequency brothers in P, we can apply the induction hypothesis to conclude the statement of the lemma. 1.3. Basic algebraic properties
of the Walsh phase
plane
We relate the geometric properties of tiles in the Walsh phase plane to alge braic properties of wave packets via the following correspondence. A tile p is determined by three integer parameters j,l,n with n > 0, if we write
110
C. M. Thiele
p = [2*1,2*(I + 1)) x [2-jn,2-j(n
+ 1)).
We associate to this tile the wave packet wp(x) = «/,-«,„(*) = T*12 Wn(2~*x - I). Observe that the wave packet wp is supported on the time interval Ip of the tile p, and that its absolute value is constant on this interval. Lemma 1.8. Let P be a bitile and let wup, ioj0, wi and wr be the wave packets corresponding to upper, lower, left and right son of P. Then we have wi = -=(wto
+ wup),
m = -T=(u>io ~ u>up). After unwinding the definitions, this is nothing but a reformulation of the recursion relations for the Walsh functions. Corollary 1.9. Letp andp' be two finite sets of pairwise disjoint tiles, which cover the same region in the phase plane. Then the corresponding sets of wave packets span the same vector subspace of L 2 (R). As in Corollary 1.5, in each set we can successively replace pairs of frequency brothers by pairs of time brothers without changing the spanned vector space, until both sets become p m m . This proves Corollary 1.9. This corollary gives for each subset S of the Walsh phase plane, which is a disjoint union of tiles, a unique vector space associated to the subset. We denote by Us the orthogonal projection onto this subspace in £ 2 (R), and we say that the set S defines a projection. Corollary 1.10. Let p be a finite set of pairwise disjoint tiles and let A denote their union. Then the wave packet of any tile p C A is contained in the vector space spanned by the wave packets of the tiles in p . This is an immediate consequence of Corollary 1.6 and the previous corol lary. Lemma 1.11. / / two tiles p and p' are disjoint, then the corresponding wave packets are orthogonal in L 2 (R).
Time-Frequency Analysis in the Discrete Phase Plane
111
If the time intervals of the two tiles are disjoint, the two wave packets have disjoint support, which implies orthogonality. If the time intervals are not disjoint, then the frequency intervals must be disjoint. By rescaling the time axis we can assume that both time intervals are subsets of [0,1). Let p be the set of tiles of the form [0,1) x w which intersect p and let p ' be the set of tiles of this form which intersect p'. These two sets are disjoint. The wave packet corresponding to a tile [0,1) x u> is a Walsh function restricted to [0,1), and two different Walsh functions are known to be orthogonal on [0,1). Hence by the previous corollary the two wave packets in question are contained in orthogonal subspaces of L 2 (R), which proves the lemma. 1.4. Carleaon measure
estimates
Assume that we are given a function / € L2 and a set p of pairwise disjoint tiles in the phase plane. Then we can use orthogonality of the wave packets to obtain the Parseval inequality
pep
We will be concerned with sets of tiles for which \(f,wp)\/-^/\Ip\ a fixed size 2k. Then the Parseval inequality becomes
is roughly of
pep
We can rewrite the left-hand side of this inequality as
£|J P |= f Np(x)dx, where Np is the function which assigns to each time x the number of tiles in p whose time intervals contain x. We call a bound as above on the sum of time intervals or the function Np a Carleson measure estimate. The following lemma describes Carleson measure estimates when the function / is merely in some LT space. Lemma 1.12. Let f be a function in Lr. Denote the conjugate exponent of r by r', i.e. 1/r' + 1/r = 1. For each integer k let p* be a set of pairwise disjoint tiles p such that \\Upf \\oo > 2k. Then
112 CM. Thiele (a) if 1 < r < 2, ri/(r'+«) + ll^pr °llr < C ( £ , r ) ^ for any small e > 0. (b) ifr = 2, P€pk
(c) i/2 < r < o o ,
2rfc ^ |/p| < C(r
£ k=-oo
P€pk
Given the fact that HIIy/Hoo = \{f,wp)\/y/\Ip\, we have already shown the lemma for r = 2. For r > 2 we use that the function / is locally square integrable. Denote by Mif the maximal function
M2/(x) =
sup
(±- [\f(y)\2dy)
I dyadic; x6/ V M I Jl
. )
Clearly the time interval of a tile in p* must be contained in the set {x:M2/(x)>2fc}. Hence we can apply our L2 estimate to the restriction of / to this set. We claim that
I
k
\f{y)\2dy
J {x:Mif(x)>2 }
Multiplying by 2 rfe_2fc and summing over k proves the lemma in the case r > 2, since the right-hand side becomes ||M2/||r, which by the maximal theorem is bounded by ||/||;. To see the claim, observe that on each maximal dyadic subinterval I of the set {x: | M 2 / ( x ) | > 2 fc_1 } we have
w\Lmy)?dy-C22k since the next larger dyadic interval contains a point x where the maximal function M
Time-Frequency Analysis in the Discrete Phase Plane
113
To show the lemma for r < 2 we need to invoke interpolation of vector valued functions. Let p be any collection of pairwise disjoint tiles. For each function g on R we can form the vector valued function G on R with values in R p as follows: G(x) = (Upg(x))p€p. The L*(i t )-norm of the function G is defined as t/t
\\(G(x))pep\\L.(lt):=
(j
v l/«
(^2\G(x)\ P€p'
We want to use interpolation for the linear operator which maps g to G. The end points of the interpolation are the obvious bounds
H(np
and for some some small e > 0
ll(nP(z))P6pllit+£.(i») = J (sup |npff(x)|)
dx
C\\g\\r
with a new e > 0 which can be made arbitrarily small. Now we apply this estimate to the function / and the set of tiles p*. Together with the observation /
/(r +£)
r+
ll< ' li; = JNPk(xy« ' *Ux
<±.J(J2
x r/(r'+e)
+£
in P /Wr' J
dx
this finishes the proof of the lemma. Corollary 1.13. The Carleson measure estimates of Lemma 1.2 are also true, if the sets pk are replaced by sets Pk of pairwise disjoint bitiles (quartiles).
114
C. M. Thiele
We prove the corollary for bitiles, the case of quartiles being similar. If P is a bitile such that IIIIp/Hoo > 2k, then one of the frequency brothers in the bitile, p, satisfies ||n p (2/)||oo > 2fe. We pick for each bitile in the set P * such a tile and call this set of tiles p*. Then we apply the lemma to the sets pjt and the function 2 / . This proves the corollary. 2. The Quartile Operator 2.1.
Introduction
In this section P denotes the set of all quartiles. Each quartile P is the union of four tiles ap,bp, cp and dp, which have the same time interval as the quartile. They are shown in Fig. 1. dp op Up bp ap Ip Fig. 1. Subdivision of quartiles.
The quartile operator is Hw(f,g)
= ^2
{wap,f)(wbp,g)i 'Cp
P6P
■
VUPI
It has no significant meaning that we have broken the symmetry between the four tiles in writing down the quartile operator. For example the operator Hw{f,g)
= 52 P 6
P
—7ff=i(™dp,f){wcp,9)wa \/\lp\
has the same boundedness properties as the quartile operator. It is important though, that the three quartiles we have picked in the definition of the quartile operator are different.
Time-Frequency
Analysis in the Discrete Phase Plane 115
One makes sense of the infinite sum in the definition of the quartile operator as follows. Fix an / € L' with 1 < s < oo. Then the sum is uniformly absolutely convergent if g is the characteristic function of the interval [0,1]. To see this, observe that in this case a quartile P contributes to the sum only if bp intersects the square [0,1) x [0,1). We find 2" such quartiles with time interval [0,2" + 1 ), if n is a positive integer. This gives the trivial estimate: £
-^r=(wapJ)(wbp,g)wcp
PeP
By translating and dilating g dyadically, and taking linear combinations, one obtains uniform absolute convergence for all dyadic step functions g. Hence the quartile operator extends to all g 6 L* with 1 < t < oo and is bounded, as soon as the bound of Theorem 1.2 is shown for all step functions g. In the case when / or g is in L°°, one uses similar arguments for the appropriate dual of the quartile operator. We will present two different approaches to Theorem 1.2, the first one proving only part of the theorem, and the second proving the full theorem. The reason that we present also the first approach is that it makes explicit use of the symmetry of the quartile operator and therefore results in a more elegant and straightforward proof of this part of the theorem. The symmetry is expressed by pairing the quartile operator with a third function and thereby passing to the trilinear form
T(/i,/2,/3) = V -
^ K . / I K . / J K . M -
(i)
We will show the bound |T(/1,/2,/3)|
I{.:|M#)I>A}|
116
C. M. Thiele
If we fix g and view the quartile operator as linear operator in / , the adjoint operator is essentially of the same form as the quartile operator and satisfies the same estimates. By duality we obtain the theorem for triples of the form (£, oo, t) with 2 < t < oo. Interpolation gives the result for all triples having 2 < t < oo as third entry. By symmetry we also get the estimate for all triples having 2 < s < oo as second entry. Then the only estimate missing is (1,2,2), but this is proved by an appropriate interpolation with one of the weak type estimates above. 2.2. The decomposition
of the trilinear
form
We organize the set of quartiles into a hierarchy of subsets. We need the partial order of the set of quartiles given by:
P
if P n P ' / O
and
IpClp-.
We define the following density of the quartile P with respect to the function / , S(P,f)=
SUp HIWIloo, P'>p
where lip denotes the orthogonal projection onto the space spanned by wap, Wbp) WcP and WdP ■ Then we define for all integers k the sets of homogeneous density P f c (/) = { P e P : 2 k < 5(P, f) < 2 fe+1 } . The top level in our hierarchy of subsets are the subsets with constant den sity with respect to all three functions that are involved in the trilinear form:
Pfc„fe2,fc3= n p *i(/ot=l,2,3
This gives a partition of all quartiles with nonvanishing density, but the quar tiles with vanishing density do not contribute to the trilinear form anyway. We point out that if one of the functions /< is a linear combination of characteristic functions of dyadic intervals, and we can assume that this is the case, then the sets Pfc1,fc2,fc3 are finite. Let P ™ " fc3 denote the set of maximal quartiles in PkiMM with respect to the ordering < of quartiles. We enumerate the quartiles in P ™ ^ fc3 as MM' anc ^ s o o n ' an< * w e define recursively Pfe^fc^fcj to be the set of elements in Pfc,,fc2lfe3 which are less than PJ^MM ^U* n o * ' e s s t n a n a n v °^
Time-Frequency Analysis in the Discrete Phase Plane
117
the quartiles P^^fa with i' < i. The sets PJ.( k k3 are called trees, and the maximal quartiles Pki ki k3 tree tops. The trees are the smallest subsets in our hierarchy. The rest of this section is devoted to proving the following estimate on a single tree.
—frr=XwapJ\){^bp,f2){wcp,h) <2*«+*» + *»|J p*,j . « 2 . « 3 |.
$3
(2)
For boundedness of the trilinear form it will then suffice to show
2 f c '+ f c ^ f c 3^| / ^^ k j< c .|| / l || r i || / 2 || r a || / 3 || r 3
£ ki,k2,k3
(3)
i
which is done in Sec. 2.3 using Carleson measure estimates with respect to the functions / i , /2, and fa. We denote the tree Pxki fca k3 for short by Q and the tree top Pk k k by Q. Fix a frequency £ in the frequency interval of the tree top and split the tree into four subsets, called subtrees: Q a , Q&, Q c and Qd, where Q n contains all quartiles P in Q such that £ lies in the frequency interval of ap, etc. The tree estimate follows by the corresponding estimate for each of the four subtrees, and as the four subtrees are handled in similar fashion, we discuss only the tree Q„. The first crucial observation is that the tiles bp and bp< are disjoint, when ever P and P' are two different quartiles in Q 0 . This follows from Lemma 1.3, because whenever the time intervals of P and P' have nonempty intersection, the tiles ap and a'p have nonempty intersection. Hence as P runs through the quartiles of Q Q , the wave packets correspond ing to the tiles bp form an orthogonal set. The same is true by a similar argument for the tiles cp. Therefore we can apply the Cauchy-Schwarz in equality to get
£K^Kp)/2)Kp)/3) p€Q
Vl-^l
<2*> £
\(wbp,f2)\\(wcp,f3)\
<2 fc MI/2|| 2 ||/3||2.
PGQ
The second crucial observation is that we can replace in this estimate the functions /2 and fz by their projections onto the area AQ = UpeQ ^ °^ *^ e
118 C. M. Thiele tree. This projection IIyiQ is denned, because by construction the tree Q is a convex set of quartiles, and we can apply Lemma 1.7. The left-hand side of the above estimate does not change if we replace /2 and f^ by their projections onto the area of the tree, because by Corollaries 1.6 and 1.9, the functions iu&p and wcp are contained in the image of the orthogonal projection IIyiQ. We estimate the norms of the functions I I ^ Q / J for i = 2,3. Clearly they are supported on IQ. Furthermore if x € IQ, and if P is the minimal quartile in the tree whose time interval contains x, then U.AQfi(x) = Tlpfi(x), because the difference of the two projections is supported outside x. But Hpfi(x) is bounded by 2ki by construction of the tree, hence IlAqfi is also a bounded function. Now we can estimate 2 fcl ||nA Q / 2 || 2 ||nA Q /3||2 < 2k^y/\iQ~\2^yf\IQ'\
= 2fc>+fe*+fe*|JQ|
which completes the proof of the tree estimate. 2.3.
An integration
lemma
We first reduce inequality (3) to a slightly simpler inequality by an easy geo metric argument. Fix a quartile P in Pfe1,ax(/i) and consider the collection P ' of quartiles in P J ? " k which are less than P. The quartiles in P ' are pairwise incomparable. But the frequency intervals of all these quartiles contain the in terval wp, hence the time intervals of the quartiles in P ' are pairwise disjoint. This implies the estimate
E i'H
|/H<
£ J"ePZ,1"fc2.k3
E
l7"l-
P6Pf«(/,)
This holds also with the sets PJP^Cfe) and PJ^Cfe), on the right-hand side. To complete the proof of boundedness of the trilinear form it therefore suffices to show
E kiMM
2kl+k2+k3
M3
'
E
' ' P€Pf"(/i)
i'H
Time-Frequency Analysis in the Discrete Phase Plane
119
This follows by a purely algebraic calculation using the Carleson measure estimates
E2"4*
£
\ip\
p e pjn., ( / .)
k
for i = 1,2,3 of Lemma 1.12. At this point it is convenient to pass from discrete sums over the integers to integrals over the real line. Define for i = 1,2,3 and any real number x:
Si(x)=
p
J2
I'H,
eP£7 l o l 2 | (/0
where [a:/log 2] denotes the largest integer which does not exceed x / l o g 2 . With this definition the above Carleson measure estimates are equivalent to [
e't'SiWdxKCWMZ
JR
and the inequality we want to prove is equivalent to /
e'i+'»+'»
JR3
inl■ Siixjdxidxadx*
< CIIMUIMkH/allr,.
«=1,2,3
It is now a direct application of the following lemma. L e m m a 2 . 1 . Let 5< for i = 1 , . . . ,n be positive measurable functions on the real line such that
f
eriXSi(x)dx
JR for some constants r*, Fi with 1 < r< < oo and J^ i ^- = 1. Then I:=
eXl++x»(
f n
J*
inf \i=1
5«(*i) | dxx • -dxn < n
/
nflFi. .=i
Proof. The interesting case is n > 2. Let r[ denote the conjugate exponent of rit i.e.
i + A-i. We write each i g R " uniquely as x = u + tv + w, where t is a real number, the coordinates of u and v are for i = 1 , . . . , n
120
C. M. Thiele
1
Vi = — ,
n
u i = log(F i r i 1 / r ') l and w is a vector orthogonal to ( 1 , 1 , . . . , 1). Then we can change variables as follows:
7=4= I y/n Jjtn-l
I <£'Uj+tVi+Wi inf(5i(iii + tvi + Wi))dtdw JR
I
= 4 = TT F
I e* inf(5i(tii + tvt + wt)) dt dw,
(4)
where dw is the standard normalized Lebesgue measure on the orthogonal complement of the vector ( 1 , 1 , . . . , 1) and the factor 1/y/n comes from the change of variables. We estimate the inner integral /„, for a fixed w. Let j be the index such that rjWj is maximal. It is clear that Wj is positive. /„, = I e* inf(Si(ui + tvi + Wi)) dt < JR » =
e-rttuj+w,)
f
e'S^Uj + tvj
+Wj)dt
JR
e^^+^i+^^Sjiuj+tVj+Wj^dt
JR
Hence / « < e~r'w' . It is now clear that the integral (2.1) converges, because Wj is up to a factor of at most n equal to the Euclidean norm of w. The purpose of the rest of this section is to get the right constants. The impatient reader may omit this part of the proof. We have /
Iwdw<
f
e-
r
«-) , B «-) dw = V
/
er'v> dw,
where Cj is the cone of those w in the orthogonal complement of ( 1 , 1 , . . . , 1) for which we have picked j(w) = j . We consider the case j = 1. We cut the cone into (n — 2)-dimensional slices with constant height x\. Then the
Time-Frequency Analysis in the Discrete Phase Plane
121
scaling behavior of these slices gives a constant C depending only on r i and the geometry of the cone, such that f eriWi dw = C f JCi
( r 1 x 1 ) n - 2 e - r ' X l r 1 dxi = C(n - 2)!.
JR+
To get this constant, we integrate the characteristic function on the truncated cone /•l/ri
C
1
(r1xl)n-2r1dxj=C-.
Jo (n- 1) and observe that this integral is the volume of the convex hull of the origin and the vertices
(LLL
J_ I _ i J _
1)
where i runs from 2 to n. This volume is equal to 1 det(M)|, (n-1)! where M is the matrix
l/n
i/n
i/n
l/>/n
1/ra-l
l/r 2
1/T2
1/Vn
l/r3
l/r3 - 1
h/y/H
l/ra ^lA/n
l/r„
l/rn
I K - 1^/ The first column is a unit vector chosen to be orthogonal to the others, and the rest of the matrix maps the convex hull of the vectors ( 0 , 1 , 0 , . . . , 0 ) , . . . , ( 0 , 0 , 0 , . . . , 1), which has volume / n i 1 \i to the convex hull which is to be in vestigated. The matrix acts like minus the identity on all vectors orthogonal to ( 1 , 0 , . . . , 0) and ( 0 , 1 , . . . , 1) and acts like 1/v/H
(n-l)/r,
l/yfi
-1/n
122
C. M. Thiele
on those two vectors modulo their orthogonal complement. Hence the deter minant of the matrix has absolute value y/n/ri and we use this to get
L/c,
n
Resubstituting everything gives
i
It is well known that the entropy X)?=i 7rl°g r « ^ maximal subject to the constraint £ " = 1 ^- = 1 if all rj are equal n. This proves n
which was claimed in the lemma. 2.4. The decomposition
of the bilinear
operator
We fix a function / € L* with 1 < s < 2 and view Hw(f, g) as a linear operator in the second argument. We decompose the operator similarly, but this time we consider only the density of a quartile with respect to the function / . We write for simplicity Pfc := Pfc(/), the latter being defined as before. The maximal quartiles in Pfc are again denoted by P™**. For each quartile in Pfc we count how many quartiles of PjJ ,ax are above the quartile, and we define for each non-negative integer i: Pki
= { P e P f c : 2* < \{P' 6 P ^ " : P' > P}\ < 2 i + 1 } .
The sets Pfc,i are called forests. We define the forest operator Tpki
as cp
PePk
Then the basic decomposition of the quartile operator is 0
H{f,9)= £
oo
£TPfc».
fc=-oo »=0
Time-Frequency Analysis in the Discrete Phase Plane
123
We will prove in Sec. 2.5 that the forest operator is bounded on the Hilbert space L 2 (R): ||T P k , j 5 || 2
(5)
With this estimate, the rest of the proof consists of putting the pieces together in the appropriate way, using a Carleson measure estimate. Fix a large constant m > s', and let fco be an integer to be chosen conveniently later. Let A be the set of all forests Pk
B
*o Fig. 2. Decomposition of the set of forests.
For a time x to be contained in the time interval of a quartile of the forest Pfc.i, it must be contained in the time interval of at least 2* quartiles of Pj?11*. Hence if Pkti is a forest in B, then the support of Tpk tg is a subset of
Ek =
{x: Nk{x) > 2m(fc°-*>}
if k < ko
{*: Nk{x) > 1}
if k > ko ■
The size of the set Ek is controlled by the Carleson measure estimate of Lemma 1.12.
\Ek\<
C\\f\\'J{2ak2sm^-k^la'-^)
if Jk
CII/II:/(2'*)
if k > ko .
The constant m is chosen such that these estimates are summable over A;. We obtain for E = Ukl-oo Ek tae bound
124
C. M. Thiele
\E\
2»ko
Outside the set E we only have to consider the forests in A. Hence if g G L2 we have for the quartile operator outside the set E:
£
\\Hw{f,9)lE42<
Tpk,g
P*,,6A
C\\gh2km(ko
< £
- k) < C\\g\\22k°.
k
If g 6 L* with t > 2 we observe again that the time intervals of all quartiles of the forest Pfc,i are contained in the set {x: Nk(x) > 1}. Applying the forest operator to g gives the same as applying it to the restriction of g to this set. For this restriction we have, using the Carleson measure estimate llffl{x: *fc(»)>i}lla < \\9\\t\{x: Nk(x) > i>|i/a-V«
A2
+ C H/II: 2*fco
Now for given A we pick fco to be the closest integer to
iog2(v/('+«)|i/n:/c+*)||fl||t-t/(,+t)) such that we obtain the desired weak type estimate ta/ t+3)
(\\f\\.\\a\\t) \{T(f,g)(x)>X}\
2.5. The forest
(
estimate
We decompose the forests further. For each quartile Q in Pfc 1 " we consider the set of all quartiles in Pfc,j which are less than Q and call this set the tree with tree top Q.
Time-Frequency Analysis in the Discrete Phase Plane
125
Clearly the union of the trees in the forest gives back the forest. It is a beautiful observation of Fefferman in Ref. 6 that two such trees are disjoint. For a proof of this fact assume to the contrary that a quartile P € Pk,i lies below two maximal quartiles Q and Q'. This implies that Q and Q' have disjoint frequency intervals. By definition of P ™ " the quartiles Q and Q' are less than at least 2' maximal quartiles in P*, and by the disjointness of frequency intervals these two collections of maximal quartiles are disjoint. Hence P is less than at least 2* + 1 maximal quartiles in Q t , a contradiction to P G Pfc.tWe conclude that whenever two quartiles in a forest are comparable, they are both comparable to a common maximal quartile, and therefore they are in the same tree. This means that quartiles from different trees are incomparable. This implies that the tree operators Yl
Tr—l(^ap,f)(yJbP,9)wCp
PePk.i,P
'
p
'
are mutually orthogonal as Q runs through Pj^f*- Therefore it suffices to prove the desired L2 estimate for a single tree operator. We decompose as before the tree Q into four subtrees: Q 0 , Qj,, Q c and Q j by fixing a frequency in the frequency interval of each tree top. The bound for the trees Q 0 and Q j can be done in the same way as we did the tree estimate for the trilinear form. The adjoint of the tree operator of Q c is essentially of the same form as the tree operator of Q j , hence by analogy we are done once we have shown the bound for the tree operator of Q^. As in the proof of the tree estimate for the trilinear form we can replace / by its restriction to the area AQ of the tree, and this projection U.AQf is pointwise bounded by C2k. We calculate, using that the tiles cp run over a disjoint collection of tiles t Yl / i T - r ( "ap,/)(w6p,g)wc P PeQ„ VP\
~j£^.M\<*.9WNow we view Q(, as a measure space, where each quartile has the measure
n({P}) = \(wapJ)\2. Then the previous sum can be written as
126
C. M. Thiele
=
\/f*({ p: ^ iK " 9)|j ^}h
We estimate the measure of the set inside the integral. The time intervals of the quartiles occurring in the set are obviously contained in the set E =
{x:Mg(x)2>\},
where M is the maximal function. The tiles ap are pairwise disjoint, hence the measure of the above set by the Plancherel theorem is bounded by the square of the L 2 -norm of the function II^p/, restricted to the set E. This norm is bounded by C2ky/\E~\. We continue the calculation, using the maximal theorem, to get
yj < Jf°° C2*\{x : Mg{xY > \}\d\ C2k\\Mgh
=
This proves the bound for the forest operator. 2.6.
The connection between the quartile and the bilinear Hilbert transform
operator
A formal calculation with the Fourier transform gives for the bilinear Hilbert transform:
H(f,g)(x)=
f f
mme^^'i-^sgniv-^dr,^.
Consider the 77, £ plane and a Whitney decomposition into squares of the area below the diagonal t] — £ as in Fig. 3. This Whitney decomposition is chosen such that for each dyadic interval w = [2Jn, 2 J (n + 1)) there is a square w x w' in the decomposition with w' — [2J(n — 2), 2 J (n — 1)). We fix the notation that w' denotes the dyadic interval which is two steps below w. Let fi be the set of dyadic intervals of the form
2
[ 'H>*K))
Time-Frequency Analysis in the Discrete Phase Plane
127
Fig. 3. Whitney decomposition of the lower triangle.
with integers j and n, i.e. those intervals which occur as frequency intervals of third tile cp of quartiles P. The squares w x w' with u> € $7 are marked with a dot in Fig. 3. We consider the following modified bilinear operator, where 1^, denotes the characteristic function of the interval w and l w its inverse Fourier transform. i( x E / / />)$(0e ^° M»7)M0*?de = EQ ^ * f^WiS * ^'K*) • utnJRjR U€
This is to be understood in a formal and heuristic sense. If we needed to avoid difficulties with the fact that 1^, is not integrable and therefore the convolutions on the right-hand side are troublesome, we would choose smooth cutoff functions instead of lu. To recover the bilinear Hilbert transform from this expression, we have to take a linear combination with the corresponding sum for the area above the diagonal and to average over translated and dilated dyadic grids in order to get rid of the gaps we have left over in the Whitney decomposition. Therefore boundedness properties of the modified operator will translate into boundedness properties of the bilinear Hilbert transform. The expression / * lu can be understood as taking the frequency band u of the function / . In the Walsh model this corresponds to
I: dyadic, ||I|M = 1
128
C. M. Thiele
In the continuous case, the frequency band w of / is multiplied to the frequency band w' of g to get a function with frequency band w + u/. This situation is shown in Fig. 4. In contrast to that, the product ^2(f,wiXu,)wIxw^2(g,wjXU')wIXU' i
I
would give a frequency band which is close to the bottom of the half plane. In order to model the cancellation properties of the bilinear Hilbert transform, one has to shift this frequency band up. We shift to get the frequency band of the continuous case scaled by the factor 1/2, such that it is the band right in between to and w'. frequency
u + u'
(w + u / ) / 2
time Fig. 4. Time-frequency diagram for ( / * lu)(x)(g
* lw')-
Then each term 1 (/, W/xuiXs.
VW\
WIX.U')'WI-X.(U+UJI)/2
equals a summand of the quartile operator except for a permutation of the three frequency bands, which in view of the symmetry of the corresponding trilinear form is inessential to the theory. The double sum over all u 6 Q and
Time-Frequency Analysis in the Discrete Phase Plane
129
all / with |/||u>| = 1 can be indexed as a smn over all quartiles, hence we have arrived at the quartile operator. 3. A.E. Convergence of Walsh Fourier Series 3.1.
Introduction
To prove Theorem 1.1 is equivalent to proving the boundedness of the maximal partial sum operator Smaxf(x)
= sup n
£(/,WW*(x) fc=0
We will prove the weak type estimate |{x: S m a x / ( x ) > A} | < C II/II; for 1 < r < 2. By standard arguments this proves Theorem 1.1. Fix a very large number N and fix a function n : [0,1) i-> No that is bounded by 2N and constant on each dyadic interval of length 2~N. The linearized maximal partial sum operator is n(x)
S"(x) = *=0
To prove the above weak type estimate, it suffices to show
|{x: Snf(x) > X}\ < c f f i with a constant C independent of the number TV and the function n. We decompose the linearized maximal partial sum operator into a sum of operators associated to bitiles. Throughout Sec. 3, P will denote the set of bitiles. Recall that each bitile P is the union of a lower son lp and an upper son up. Define for each bitile P the operator Tp by TPf(x) We claim that
=
(f,wlp)wlp(x)
if (x,n(x))
£uP,
0
if (x,n(x))
$.up.
130
C. M. Thiele
Snf(x)=Y,TrfWP€P
Observe that due to our special choice of the function n there are only finitely many summands on the right-hand side of the equation which are nonzero. Fix a n i o e [0,1). Then the rectangle R = [0,1) x [0,n(xo)) defines a projection Ilfl. Decomposing the rectangle into tiles of the form [0,1) x [fc, fc+1) with corresponding wave packets Wk, we see that Snf(xo) = IIfl/(xo). Now we decompose the rectangle R into another collection of pairwise dis joint tiles. For each point (x, f) in the rectangle there is a unique bitile whose lower son contains (x, £) and whose upper son contains (x, n(xo)). Indeed, the frequency interval of this bitile is the minimal dyadic interval which contains £ and n(xo), and this determines the length of the time interval of the bitile, which is therefore unique. The rectangle is therefore the disjoint union of the lower sons of all bitiles whose upper son intersects the set [0,1) x {n(xo)}. We use this decomposition to calculate I I R / ( X O ) . In doing this, we can re move all tiles from the collection, whose time interval does not contain xo, or equivalently whose upper son does not contain the point (xo, n(xo)). But what remains is clearly equal to S p g p Tpf{xo)3.2. Proof of Theorem
1.1
We organize the set of bitiles into a hierarchy of subsets, as we have done in Sec. 2.4 for the set of quartiles. For any subset Q C P we write T Q for the operator 2Jpgtj Fix a functionTP./ € IS with 1 < r < 2. With the partial order of the set of birtiles, Fix a function / € Lr with 1 < r < 2. With the partial order of the set of birtiles, P
SUp HIWHoo. P'>p
The sets of homogeneous density are The sets of homogeneous density are Pfc = P f c (/) = {P G P : 2fc < 6(P,f)
< 2k+1} ,
and the forests are P M = {P e P f c : 2i < \{P' e P g " " : P' > P}\ < 2 i + 1 } .
Time-Frequency Analysis in the Discrete Phase Plane 131 In the next section we will prove for 1 < s < oo the following estimate on the forest operator
l|rPtil/iL
£
\ip\)
for 1 < s < oo. With that we simply proceed as in Sec. 2.4 and split the set of forests into the set A of all forests Pk,i such that k < ko and i < m(ko — k) and the set B of all other forests, with m large enough and ko to be chosen. The forests in B are supported in the small set E = \Jk Ek, which was defined in Sec. 2.4 by
{
{x : Nk(x) > 2m(fc°-fc)}
if k < ko
{x : Nk(x) > 1}
if k > ko
and which is bounded in measure by
1*1 * C ^f • Pk,i is a forest in the set A, then we can remove all bitiles of this forest whose time interval is contained in the set E, because we will disregard this set anyway. Then we have the following Carleson measure estimate
Y\ Pep
k T x ■•*
\IP\<
f
miniNkix^dx
ip£E
< <
/(ATfc(x))^(2m(fc°-fc))1~^dx cll/llr2m(fc0-fc)(l-7^T)
2rk We substitute this into the forest estimate and sum over all forests in the set A to get
fc=-oo
^
'
which is for large s summable to
lirP/iB.||.
132
C. M. Thiele
HrP/(«) > A}| < cllTp{fM'E' + |E| < c « . 3.3.
The forest
estimate
As in Sec. 2.5, each forest Pfc,i is the union of convex trees such that two different trees cover subsets of the Walsh phase plane which are disjoint. If Q is a tree, then the support of the function T Q / is contained in the set of all x such that (x, n(x)) is contained in U Q G Q Q - Hence for two different trees Q and Q' in the forest Pk,i we conclude that T Q / and T Q < / are supported on disjoint sets. It therefore suffices to show for 1 < s < oo ||T Q k | i /||,
r Qu /(x) = n / x w ( x ) /(x). Moreover we will show that this interval can be written as u(x) = (wi(x)\cj2(x))r\w,
(6)
where ui(x) and a>2(x) are dyadic intervals with cJ2(x) C wi(x) and w = [0,£) with £ as above. By slicing everything into tiles with time interval / , we see that this gives the operator identity
Time-Frequency Analysts in the Discrete Phase Plane
133
njxw(x) = (n; X ui(i) - n/X(1^(X))n/xw • The operator HiXui(x) is pointwise boimded in absolute value by the maxi mal operator. This one can see from slicing the rectangle J x u>i (x) into minimal tiles, i.e. those which have frequency interval wi(i). Then at each point the projection operator coincides with the projection onto a single tile, which is clearly bounded by the maximal operator. The same is true for Hixu>2(x)This gives \TQJ(x)\ < \nIxu(x)f(x)\ < 2MUIxuf(x). The operator Hixu is nothing but the ordinary partial sum operator n fc=0
with a certain n, rescaled to the interval I. This one can see from slicing the rectangle into maximal tiles, i.e. tiles with time interval I. By Paley's theorem 12 this operator is bounded in V for 1 < r < oo with a bound not depending on n. This and the maximal theorem applied to the previous in equality gives the desired estimate for the tree operator. It remains to show the existence of the frequency interval u>(x) with the stated properties. Pick a>i(x) to be the frequency interval of the minimal bitile in Q u which contains the point (x,n(x)), and let W2(x) be the frequency in terval of the upper son of the maximal bitile in Q„ which contains the point (x,n(x)). If no bitile in Q u contains this point, there is nothing to prove because then the operator vanishes at x. By Eq. (6) this determines the inter val u(x). Let P be any bitile in Q u which contributes to TQU/(X), i.e. whose upper son contains (x, n(x)). We show that the lower son lp is contained in the rectangle I x cu. The time interval of lp is certainly contained in the time interval / of the tree top. As to the frequency interval of lp, we observe that it lies below the frequency £ by definition of Q„, and hence is contained in a;. Furthermore it is contained in ti>i(x), because of the extremality of the latter. But it is disjoint from W2(x), because by extremality the interval U2(x) must contain in the frequency interval of up, which is certainly disjoint from the frequency interval of lp. By Lemma 1.3 the lower sons of the bitiles in Q u whose upper sons contain (x, n(x)) are disjoint. To show that the projection onto the union of these tiles coincides at the point x with the projection onto the rectangle Ixw(x), it
134
C. M. Thiele
remains to show that those tiles cover each point of the line {x} x w(x), because then the difference between the two projections is supported outside x. So let (x, f") be a point on this line. Then there is a unique bitile whose upper son contains (x, n(x)) and whose lower son contains (x, £")• By reversing the arguments above we see that it is squeezed between two bitiles of Q u w.r.t. the partial order of bitiles. By convexity of Q the bitile is in Q. Moreover the frequency interval of its upper son contains u^C^) and hence contains the point (x, £). Hence the bitile is in Q u . 4. A Fast Algorithm for Adapted Time-Frequency Tilings* We first consider orthonormal bases of R w consisting of discretized rescaled Walsh functions, where N is a power of two. Given a vector, the best basis with respect to an additive cost function is found with an algorithm of order 0(N log N). The algorithm operates in the time-frequency plane by construct ing a tiling of minimal cost among all possible tilings with dyadic rectangles of area one. Then we discuss generalizations replacing the Walsh group, which controls the structure of the time-frequency plane, by other finite Abelian groups. The main example here involves the Fast Fourier transform. 4.1.
Introduction
The goal of time-frequency analysis is to decompose a signal into elementary wave forms, each having a well-defined position, duration, and frequency. For applications like data compression or noise reduction, the decomposition should be realizable by a fast, stable and adaptive algorithm. The recent best basis algorithm of Coifman and Wickerhauser5 achieves this goal by starting from a large library of orthonormal bases, containing one basis for each dyadic seg mentation of a fixed interval. This interval is the time axis when the library consists of local trigonometric bases and it can be thought of as the frequency axis when the library consists of wavelet packet bases. Given a signal, the best basis is the one which minimizes an additive information cost function defined on the set of expansion coefficients. It can be found by a binary tree search, requiring only O(N) operations for a signal of length N. The whole algorithm for expanding a signal in the best basis is of order 0(N log N) for the wavelet packet library and 0(N(\ogN)2) for the local trigonometric library. a
Section 4 is taken from Ref. 14, which is a joint work with Lars Villemoes.
Time-Frequency Analysis in the Discrete Phase Plane
135
Each basis element from these two libraries of bases can be identified with a rectangle of area one in the time-frequency plane, such that orthonormal bases correspond to disjoint coverings, or tilings of this plane. This method is applied in the software tool 16 to visualize the time-frequency decomposition of a signal. From that point of view, the local trigonometric bases correspond to tilings that can be obtained by first choosing a dyadic partition of the time axis, thereby dividing the time-frequency plane into vertical sectors, and then tiling each sector with rectangles of maximal time width and area one. For wavelet packet bases, the role of the time and frequency axes are interchanged. In the double-tree algorithm proposed by Herley, Kovacevic, Ramchandran and Vetterli, 8 a time segmentation is followed by local frequency segmenta tions. The elementary wave forms are wavelet packets on each time interval, constructed as the interval wavelets of Cohen, Daubechies, and Vial. 4 The num ber of bases in the double-tree library is substantially larger than for global wavelet packets, but there are still many dyadic tilings which do not correspond to orthonormal bases. On the other hand, for the special wavelet packets defined from the Haar filter, any tiling corresponds to an orthonormal basis. These wavelet packets are just rescaled versions of Walsh functions. The first purpose of this thesis is to present a fast algorithm for selecting the best basis among all these bases, given a signal and an additive cost function. Instead of relying on dyadic in terval segmentations, the algorithm will operate directly in the time-frequency plane. Of course, since the frequency resolution of Walsh functions is very poor the algorithm should be viewed only as a first step towards a new class of time-frequency decompositions, based on a more symmetric treatment of the time and frequency parameters. A better frequency resolution would require some nontrivial smoothing procedure. The fast algorithm for selecting the best basis as described in the first part of this thesis applies to more general bases than just the one consisting of Walsh functions. Those bases are described by successively applying orthonor mal transformations on pairs of basis vectors. However in the case of Walsh functions there is a global understanding of the basis vectors which comes from the fact that the Walsh functions are characters of a group. In the second part of this thesis we specialize to this point of view and introduce libraries of bases attached to other groups, for which there is an analogue algorithm to pick the minimal cost basis. The basis elements in these
136
C. M. Thiele
libraries correspond to subsets of the phase plane hat are no longer rectangles, but spread out over the plane. Therefore they are not localized in time and frequency in the usual sense, so we call them algebraically localized. However, for the case of a finite cyclic group, the basis elements have a very natural meaning, they are implicitly used in the Fast Fourier transform. That means in the intermediate steps of the Fast Fourier transform one calculates the expansions of the signal into all bases of the library. Especially the pure frequency basis is contained in the library. Generally the basis elements are localized both in time and frequency at sampling points with certain sampling rates. Rather than trying to formulate the algorithm of this thesis in most gen erality, we decided to present it for the cases we find most important and interesting. That is for Walsh functions, here the algorithm was discovered by L. Villemoes and for bases attached to finite groups, this application was pointed out by Thiele. The library of orthonormal bases of rescaled Walsh functions is defined in Sec. 4.2 and the selection algorithm is described in Sec. 4.3. To illustrate the algorithm, a small numerical example is worked out in Sec. 4.4. The algebraic point of view is developed in Sec. 4.5, and illustrated by an example related to the Fast Fourier transform in Sec. 4.6. 4.2. Walsh
tilings
Let Wo(t) = 1 for 0 < t < 1 and zero elsewhere, and define recursively by W2n(t) = Wn(2t) + (-l)nWn(2t n
W2n+1(t) = Wn(2t) - (-l) Wn(2t
Wi,W2,...
- 1), - 1).
Then {W„}JJL0 is the Walsh system in sequency order, as constructed in Ref. 15. It is an orthonormal basis for L 2 (0,1) and each basis function is piecewise equal to either 1 or —1 on [0,1[. The number of sign changes for Wn on this interval is equal to n as it can easily be seen by induction from (7). Therefore, we will think of n as a frequency parameter, although the actual frequency localization is very poor. The natural (Paley) ordering is obtained by omitting the factors (—l) n in (7). We will build bases consisting of rescaled versions of the Walsh functions. Considering S = [0, l[x [0, oo[ as a time-frequency plane, we will define a dyadic rectangle to be a rectangle with dyadic sides, i.e. a subset of S of the form
Time-Frequency Analysis in the Discrete Phase Plane
137
I x u> = [2~jk, 2~\k + 1)[ x [2j'n, 2^'(n + 1)[ , where j , j ' , n, k are non-negative integers and k < 2 J . Let V be the collection dyadic rectangles of area one, (j = j ' ) . These special rectangles will be called tiles. To each tile p e P w e assign the corresponding rescaled Walsh function wp(t) = 2^2Wn(2jt-k).
(8)
Thus wp is supported on / where it has n sign changes. It is therefore natural to think of p as a Heisenberg box with time interval / and frequency interval w. We will see that disjoint tiles lead to orthogonal functions wp. For the moment, just observe that this is true for tiles with disjoint time intervals. A dyadic rectangle of area two in S can either be split in a left and a right tile {l,r} or in a lower and upper tile {d, u}. All orthogonality properties of rescaled Walsh functions can then be derived from the following simple observation. Lemma 4 . 1 . Assume T = I x u be a dyadic rectangle of area two. Define the four tUes {l,r,d,u} C V by T = lUr = dUu, u(l) = w(r) and 1(d) = I(u). Then {wi,wr} and {w- 1 (2fc+l),2-^- 1 (2A:+2)[, w(d) = [2j(2n),2j(2n + 1)[ and w(u) = [2^(2n + l),2j(2n + 2)[. From (7) it follows that 2j/2W2n(2jt
-k)
= 2j/2Wn(2j+1t
- 2k) + (-l)n2j'2Wn(2j+H
-2k-l),
2j/2W2n+i(2jt
-k)
= 2j/2Wn(2j+H
- 2k) - (-l)n2j/2Wn(2j+H
-2k-l).
Therefore, by (4.2),
( : ) - * ( : - ! : ; ) ( : ) showing that {wd, wu} is obtained from {wi, wr} by an orthogonal transforma tion. □
138
C. M. Thiele
R e m a r k . The matrix in (a) could actually be replaced by any unitary matrix, even depending on the dyadic rectangle T. The algorithm of Sec. 4.3 would still work. In fact, Lemma 4.1 can be seen as a definition of the kind of libraries of orthonormal bases for which Algorithm 4.7 can be used directly. Obvious variations are possible like changing from dyadic to triadic rectangles, (letting the matrix in (9) to be of size 3 x 3), or starting from another basis for RN than the standard basis which we use in Sec. 4.3. Bases that have multiplicative structures like the Walsh functions are discussed in Sec. 4.5. The next result is obtained by induction from Lemma 4.1. It states that dyadic rectangles can be identified with subspaces of L2(0,1). Corollary 4.2. Let T = I x w be a dyadic rectangle of area \T\ = \I\\u\ > 1, and assume that B and B' are two collections of tiles, both defining disjoint coverings ofT. Then {wp\p G B} and {wp\p G B'} are orthonormal bases of the same subspace of L2 (0,1). Proof. Put V — span{io p |p G B} and LQ = max{/(p)|p G B}. Then the set of tiles in B with \I(p)\ — LQ consists of pairs of the form {d, u} as in Lemma 4.1. An application of this lemma to all those pairs changes B to a collection Boo with maximal time interval length L\ = Lo/2 and such that the rescaled Walsh functions corresponding to B^, also define an orthonormal basis of V. We can repeat this process until all tiles have time intervals of length l/|w|. This condition is only satisfied by one tiling £ C V of T. Therefore the same modification of B' must also lead to C. Hence {«>P|p G B}, {wp\p € B'} and {ivp\p G £ } are orthonormal bases of the same subspace of L 2 (0,1). □ Corollary 4.3. The functions wp and wq are orthogonal if and only if the tiles p and q are disjoint. Proof. Assume p and q are disjoint tiles. If they have disjoint time intervals I(p) n I(q) = 0 then wp and wq are trivially orthogonal. If I{p) and I(q) are not disjoint, one of the intervals must contain the other, because they are dyadic. Let us assume that I(p) C I(q), and let u be a dyadic frequency interval such that u>(p), uj(q) C ui. Then it is not hard to construct a tiling of T = I(q) x UJ containing the tiles p and q. Therefore, by Corollary 4.2, wp and wq are different members of an orthonormal set. Conversely, suppose q intersects p. We can assume that I(p) C I(q) and hence w(q) C u)(p). The rectangle I(q) x w(p) can now be covered by disjoint
Time-Frequency Analysis in the Discrete Phase Plane
139
tiles congruent with p or by disjoint tiles congruent with q. By Corollary 4.2 the two corresponding sets of rescaled Walsh functions are orthonormal bases for the same subspace V of L 2 (0,1), and that V contains both wp and wq. Expanding wq in the basis denned from tiles congruent with p it follows that wq J- wp implies wq = 0 on I(p), contradicting the fact that the support of wq is all of I(q). D The above results about finite systems of rescaled Walsh functions are suf ficient for understanding the best basis algorithm of Sec. 4.3, and the next the orem concerning infinite systems is only included for aesthetic reasons. Note that Theorem 4.4 contains the Haar and Walsh bases as special cases. Theorem 4.4. A collection {wp\p € B] is an orthonormal basis of L2(0,1) if and only if B cV is a disjoint covering of S except for a set of measure zero. Proof, (a) Let us first note that the complement of the union of a pairwise disjoint set of tiles B C V is a disjoint union of tiles and a set of the form E x [0, oo [. To see this, let S' = UB and pick a point in the complement (t, £) 6 5 \ S'. Now either ({t} x [0, oo[) n S" = 0 or there is a largest dyadic frequency interval w(f) C [0, oo[ containing f and such that ({t} xu>(£))nS" = 0. In the last case, consider the tile q = I(t) x w(£) where I(t) is the dyadic time interval of length |o;(£)| -1 containing t. Let o/(£) be the neighbor of w(£) in the sense that the union of these two intervals is dyadic. By construction, we know that the line segment {t} x
140
C. M. Thiele
tiles B'j all having time intervals of length 2~J and such that the {u>p|p £ B'j} also span Vj. Moreover, if Ej is the complement of the union of intervals {I(p)\p £ B'j}, then we conclude that Vj is simply the space of functions piecewise constant on dyadic intervals of length 2~J and vanishing on Ej. By construction, Sj = UBj is an increasing sequence of subsets of S such that UJSJ = S\(E x [0, oo[). This implies that Ej is a decreasing sequence of subsets of [0,1[ with HJEJ = E. In particular, limj \Ej\ = \E\ — 0. Let ip be a function piecewise constant on dyadic subintervals of [0,1[ of length 2 - m . Then the projection
algorithm
J
Let N = 2 for some non-negative integer j . We can identify R ^ with the closed subspace Vj of L 2 (0,1) spanned by {wp\p £ B\), where B\ consists of the tiles [k/N, (k + \)/N[x [0, N[, k = 0 , 1 , . . . , N - 1. In other words, Vj is the space of functions piecewise constant on dyadic intervals of length 2 - J , and we identify RN with the subset Sj = [0, l[x[0, N[ of the time-frequency plane 5 = [0, l[x [0, co[. It is clear from the results of Sec. 4.2 that any disjoint covering B C V of Sj corresponds to an orthonormal basis of Vj, and therefore ofR". The above identification is equivalent to associating the function N-l
/(*) = £ xk\fNW0{Nt fc=o
- k)
(10)
to the given vector x £ RN. For each tiling B of Sj, Corollary 4.2 implies that we also have
f=
^(f,wp)wp.
Time-Frequency Analysis in the Discrete Phase Plane
141
This gives a large number of bases to choose from, and we will use an additive cost function as in Ref. 5 to compare these bases. Definition 4.5. A map H from the set of finite pairwise disjoint collections of tiles, B C V and vectors v = (VP)P€B to R is called an additive cost function if for all such pairs (v, B) H(V,B) =
J2H(VP,{P})pGB
(In fact, this is a slight generalization of the measure from Ref. 5.) Typically H(v,{p}) = h(v) and standard choices are h(v) = —u 2 logu 2 or h(v) = \v\p with p < 2. The idea is that an additive cost function should be small when the energy of the vector (vp) is concentrated in few elements. The cost of expanding / € Vj in the basis induced by B is then H(((f,wp))peB,B) = £ / / ( ( / , ™p),p).
(11)
p€B
We are looking for a tiling B of Sj that minimizes (4.5). The corresponding basis of RN is then a "best" basis relative to the vector x and the additive cost function H. Once the vector x is fixed, all tiles p G V with p C Sj have a fixed cost c(p) = H((f,wp),p). There are (j + 1)N such tiles, and the problem above is equivalent to the problem of tiling Sj in the cheapest possible way. The fast algorithm for finding such a minimizing tiling relies on the next lemma which reduces the problem to four problems of half size. Lemma 4.6. Let T C S be a dyadic rectangle of area greater than or equal to two, with left half L, right half R, lower half D, and upper halfU. Assume each tile p C T has the cost c(p). Define m.T = min < Y j c(p)|S c V is a disjoint covering of T> ,
Ipes and similarly mi,
TOJJ, TOD,
J m y . Then
TUT = min{m/, 4-TOR,mo + n^u} ■ Proof. Let B be any tiling of T = I x w. If a tile p G B intersects both L and R, then I(p) = I and if it intersects both D and U then u;(p) = w. Hence
142
C. M. Thiele
B cannot have both types of intersecting tiles since these would intersect each other. Either each p € B is contained in one of the halves L or R or each p € B is contained in one of the halves D or U. The union of a tiling of L and a tiling of R is a tiling of T, so m r < m/, +m/j holds. With the same argument we get that TUT < mo + m y . On the other hand, if B is a minimizing tiling of T, we have just seen above that B is the disjoint union of two tilings, either of L and R or of D and U. Therefore either TTIT > TTII, + WIR or m-r > m/j + my.
D
Repeated use of Lemma 4.6 starting with T = Sj gives a recursive procedure for finding a minimizing tiling of Sj in j steps. The actual algorithm will work in the opposite direction. A l g o r i t h m 4.7. (a) Compute the cost c(p) of all tiles contained in Sj, and put A(p) = {p}. (b) For I = 1,2,..., j ; Consider the dyadic rectangles P C Sj of area 2l~l as tiles with costs c(P). For each dyadic rectangle P' C Sj of area 2l, find a minimizing tiling of P' with the rectangles of area 2 ' - 1 . There are two possibilities. Then mark each rectangle P' with the resulting cost c(P'), and put A(P') = A(Pi)L)A(P2) where P\ andP2 are the chosen rectangles of area2l~l. By Lemma 4-6, A(P') is now a minimizing tiling of P'. (c) Finally A(Sj) is a minimizing tiling of Sj with the minimal cost c(Sj). There are (j — I + 1)2 J ~' dyadic sub-rectangles of Sj with area 2l, so the Ith. step of the iteration involves (j — I + l ) 2 J _ i times two additions and one comparison. This gives a total of
3 J2U ~ I + 1)2J'"' = 3 J2 V2"-1 = 3(1 + (j - 1)2*) 1=1
i/=i
operations. That is, the search for the best basis takes less than 37V log2 N operations. Finding the coefficients (/, wp) for all (j + 1)2J tiles p C Sj can be done by using Lemma 4.1 in j steps of N multiplications and additions. Going back wards from the chosen tiles, the same number of operations will reconstruct the vector x G RN. Thus the whole process of representing x with the best among all rescaled Walsh bases takes 0(N log N) operations.
Time-Frequency Analysis in the Discrete Phase Plane
143
The number aj of such bases for KN can be found recursively from the argument in the proof of Lemma 4.6. Indeed, aj is equal to the number of tilings of a dyadic rectangle T of area 2 J with dyadic tiles of area one. Each such tiling can be split in two sub-tilings horizontally or vertically. There are Oj_! of each of those split tilings, giving 2a 2 _ x possibilities for the tilings of T, except for the fact that we counted the doubly split tilings twice. There are a*_2 such tilings, hence a0 = 1, a
j =
2 a
ai = 2 , j-i-°i-2.
J'^2-
The corresponding recursion for the number bn of wavelet packet bases is 5
In the double tree algorithm of Ref. 8 the bases are obtained by a dyadic time segmentation followed by dyadic frequency segmentation. The number dj of such bases is given by * = 1,
dj=d2_1+bj-bj.1,
J>1.
For j = 0,1,2 we have dj = aj but there are 12 Walsh tiling bases of R 8 that cannot be obtained by the double tree algorithm. All three numbers of bases grow exponentially with N, and asymptotic estimates can be found by comparison with the recursion Zj+i = /xz2, which has the solution Zj = /i - 1 [(/*z m ) 2 m\2' for j > m. For instance, we can define \ij = aj+\/a2j, and observe that /io = 2 and (ij+i = 2 — fij1, so that (j,j \ fj, = (1 + \fb)/2 for j -> oo. Therefore ^ _ 1 [(/«i m ) 2 " m ] 2 J < aj < ^ - 1 [ ( / i m a m ) 2 _ m ] 2 ' for j > m. Using similar techniques for bj and dj we find that for j > 4, 0.618(1.84)2J <
aj
< 0.618(1.85) 2i ,
(1.50) 2 ' < bj < (1.51) 2 ', (1.71)* < dj < (1.72) 2 ' .
144
C. M. Thiele
For example, there are approximately 10 181 wavelet packet bases, 10 240 double tree bases, and 10 272 Walsh tiling bases for R 1024 . 4.4. An
example
The algorithm of Sec. 4.3 is easiest to understand graphically. Therefore we will consider a small numerical example in some detail.
V2
v/2
V2
1
0
1
2
0
0
4
4
v57T M 2J2~
-M 3V2/2.,
3 1 2 2 3 1 1 3 2v/2 2\/2 2\/2 2 ^
V2 1=1
<
4^/2
---2- — •2v^"
Sv^ 4
1
— 4 — - —-4
-4%/?-
2+\/2 1=2
•4+V/2-
-6- — -4\/2-
0 0
V^
2 J=3 0
0 0
4V2 Fig. 5. Graphical representation of Algorithm 4.7 for a signal of length 8. The cost function is equal to the sum of absolute values of the expansion coefficients. The top row of squares contain all tiles and coefficients. The bottom square represents the best basis expansion.
Time-Frequency Analysis in the Discrete Phase Plane
145
Consider the vector x = (3,1,2,2,3,1,1,3) e R 8 . For j = 3, the timefrequency plane is 53 = [0, l[x[0,8[, and for figures we will scale the frequency axis to get a square. The collection of all 32 tiles p C S3 and the corresponding coefficients (/, wp) are placed on four copies of S3 in the top row of squares of Fig. 5. The first of these four squares contains just the elements of the vector x, and describes the initial expansion (10). In the next square, the eight new expansion coefficients are obtained from Lemma 4.1, more specifically (9). Up to a factor \Jl the numbers are just sums and differences of the elements of x. The remaining two squares in the first row are filled with coefficients (/, wp) obtained by successive use of (a). In fact, this is just the usual iterated filter bank operations giving the wavelet packet expansion (in sequency order) of x. To make calculations simple, we use the cost function based on h(v) = |v|. This means that the cost of a tile p is simply the absolute value of the corresponding coefficient c(p) = \(f, wp)\. This makes step (a) of Algorithm 4.7 particularly easy. The next step, I = 1, is to consider the 12 dyadic sub-rectangles of S3 of area 2. These are delimited by solid lines in the second row of squares of Fig. 5. Each of these rectangles can be covered in two ways by tiles from the first row of squares. We mark each rectangle with the cost of the cheapest tiling and indicate the addresses of the chosen tiles by a dashed line. For example, the very first rectangle can be covered by time neighbors with cost 3 + 1 = 4 or frequency neighbors with cost \/2 + 2\/2 = 3-\/3. The first choice is cheapest. If the costs of two coverings are the same, we choose time neighbors. We continue the process with the four dyadic sub-rectangles of S3 of area four on the third row of squares, I = 2. The dashed lines delimit the minimizing tilings. For example, the first rectangle can be covered with dyadic rectangles of area two from the row above in two ways. Time neighbors with cost 4 4- 2\/2 or frequency neighbors with cost 4 -I- y/2. The first choice is cheapest, so the corresponding tiling and cost is carried over to the first rectangle of the row Z = 2. Finally, since 4 + \/2~+6 > 4\/2 + 2 + %/2 the tiling of the right square of the row I = 2 is a minimizing tiling of S3, so we transfer this tiling to the bottom square, I = 3, of Fig. 5. The tiles are marked with the corresponding expansion coefficients taken from the top row of squares. There are only three nonzero coefficients so the algorithm has indeed concentrated the energy of the signal x. Observe that the resulting tiling can neither be obtained by the wavelet
146
C. M. Thiele
packet algorithm nor by the double tree algorithm, where a time segmentation is followed by a frequency segmentation. In the present situation, both of these algorithms lead to a best basis with the higher cost 1 + 6\/2 and four nonzero coefficients in the resulting representation of x. 4.5. Algebraic
time-frequency
localization
In order to motivate the definition below, we now view the interval [0,1] as an image of the group G of all sequences of zeroes and ones with binary addition as group multiplication. Each sequence is mapped to its binary number. Then the intervals [0,2~k] are images of subgroups of G, and an arbitrary dyadic interval is an image of a coset of such a subgroup. The dual group of G becomes the set of Walsh functions on [0,1] with pointwise multiplication as group operation. If we consider as above the sequency order of the Walsh functions, then W0,..., W ^ - i form a subgroup, and the Walsh functions of each dyadic interval of integers form a coset of such a subgroup. The same would be true if we chose the natural Paley order. Hence a dyadic rectangle is a Cartesian product of a coset in G with a coset in G. We take this property as our new definition of phase space cells, and call this point of view algebraic time-frequency localization. For simplicity we restrict ourselves to finite groups. Let G be a finite Abelian group, let G = Hom[G, T] be its dual group consisting of the characters of G. Both are naturally subsets of the group algebra C(G), and with the usual invariant scalar product on the group algebra both are orthogonal bases. We call them time and frequency basis. The linear span of a subset S of the group algebra will be denoted by span(S). Definition 4.8. Consider subgroups H C G and X C G, elements t € G and ( € G , and the corresponding cosets Ht and X£. Then the Cartesian product
HtxXZ is called a phase space cell. It is corresponding vector space is VHtxX( = span(ift) n span(X£). The following theorem relates the area of a phase space cell to the dimen sion of the corresponding vector space. Therefore it resembles the Heisenberg uncertainty principle.
Time-Frequency Analysis in the Discrete Phase Plane
147
The Annihilator Ann(H) of a subgroup H c G is the set of all characters of G that are identically 1 on H. As usual \G\ denotes the order of G. Theorem 4.9. Consider the data of Definition 4.8. If Aan(H) = X, then the phase space cell is one-dimensional and spanned by the unit vector
y/\H\ £»t IfArm(H)cX,
y/\G\\Aim(H)\
V€£(H)e
then dim(VmxX()
=
\Ht\\XZ\/\G\,
and finally if Axm(H) <£ X, then Vmxxz = {0}. We call the number |.ff£||X£|/|G| the normalized area of the phase space cell Ht x X£. Observe that the conditions in Theorem 4.9 are actually symmetric in X and H, if we use Pontryagin duality to identify Ann(X) with a subgroup ofG. Proof of Theorem 4.9. For the first part of the theorem, observe that both vectors for which equality is claimed are unit vectors, and that once the equality is shown, it follows immediately that the vector is contained in the phase space cell. We calculate the scalar product of both sides using the formula |H]|Ann(i/)| — \G\, which follows from well-known facts about finite Abelian groups (see Refs. 7 and 10).
(-TrW E *(«% „ J* 0 , m \ VW tftm
y/\G\\kim{H) ^ ^ ^ 1
V\G\\H\\Ann(H)\
= ]gf 1
E e®?)
E
/
^
t
, 6 / / t ^ „ „ W € WM)?M(v)
HO = L
' t"e//,{"€Ann(H)
The scalar product of the two unit vectors is one, hence equality holds. That this vector spans the phase space cell will follow from the second statement of Theorem 4.9, which we prove by a counting argument. First
148
C. M. Thiele
observe that the corresponding vector spaces of two disjoint phase space cells are orthogonal, because the two cells have disjoint support either in time or in frequency, and both the time basis and the frequency basis are orthogonal. Assume Ann(iZ) C X. The coset X£ is the disjoint union of as many as \X/Ann(H)\ = \X\\H\/\G\ cosets of Ann(Jf), each giving rise to at least a one-dimensional subspace of the corresponding vector space of Ht x X£, which therefore itself has at least the dimension claimed in the theorem. There are \G\/\H| cosets of H and \G\/\X\ cosets of X in G, hence there are \G\2/(\H\\X\) combinations of such cosets, giving the same number of mutually orthogonal vector spaces. As the dimension of the group algebra is \G\, it follows that the lower bound on the dimension of each of these vector spaces, that was found above, is the exact dimension. Finally assume that Aim(H) <£_ X, and pick a character in Ann(iJ) \ X. Considering its natural action on the group algebra, it acts as a scalar on span(iJt), but the only vector in span(X£) on which it acts as a scalar is the zero vector. Hence the intersection of both spaces is the zero vector only. From now on we will only consider phase space cells which satisfy the condition of Theorem 4.9 so as to represent a nontrivial vector space. We have the following analogue of Corollary 4.2. Corollary 4.10. Let a phase space cellp be the disjoint union of phase space cells pj, j G J, each hatting normalized area one. Then picking a unit vector out of each of the one-dimensional vector spaces VPj gives an orthonormal basis
ofVp. Proof. Comparison of the areas shows by Theorem 4.9, that we have the right number of unit vectors. That they are orthogonal has been shown in the proof of Theorem 4.9. From Corollary 4.10 we get a whole library of orthonormal bases of the group algebra to choose from, namely one for each partition of G x G into phase space cells of normalized area one. To get a fast algorithm as before we have to restrict ourselves to a sub-library. For that we fix a chain of subgroups Go = (0) C Gi C • • • C Gk C • • • C Gn = G of G. An admissible tile with respect to this chain is a phase space cell Gfct x Ann(Gfc)£ with Gk contained in the chain. Observe that such a tile has normalized area one. A tiling of a phase space cell is a partition of the latter into admissible tiles. We then have the analogue of Lemma 4.6:
Time-Frequency Analysis in the Discrete Phase Plane 149
Lemma 4.11. With the notation above, consider a phase space cell C x D of normalized area greater than one, where C is a coset of Gk and D is a coset of Ann(Gj). Then a tiling of this phase space cell is either the disjoint union of tilings of the \Gk\/\Gk-i\ phase space cells Cj x D, where the Cj denote the different cosets of Gk-\ in C, or it is the disjoint union of tilings of the |Ann(Gj)|/|Ann(G/ + i)| phase space cells C x Dj, where the Dj denote the different cosets o/Ann(G/+i) inside D. Proof. If the tiling fails to split into tilings in the first way, then necessarily for one tile C" x D' of this tiling the first coset C" is not contained in any of the cosets of Gk+i inside C, which implies that C" = C. Assuming that the tiling does not split in the second way neither, we similarly find a tile C" x D in the tiling ofCxD. But then those two tiles have nonempty intersection, a contradiction. This lemma gives a recursive control over the bases of a phase space cell, which gives a fast algorithm to pick the optimal basis with respect to an addi tive cost function as before in the Walsh case. 4.6. An example related to the FFT For numerical treatment of the Fourier transform, for example on the circle, one discretizes the latter to a finite cyclic group. This, besides the Walsh group, provides the main example for the results of Sec. 4.5. Indeed, when doing the Fast Fourier transform, one implicitly makes use of phase space cells. As an example we discuss the cyclic group G of order eight. It is represented by the residues {0,1,2,3,4,5,6,7} modulo eight. By Go, Gi and G2 we denote the subgroups {0}, {0,4} and {0,2,4,6} and we also write G = G3. These are all subgroups of G, and as for any cyclic group they form a chain, so that the fast algorithm works for the collection of all phase space cells. The characters of the group G are of the form
with n an integer. In fact such a character depends only on the residue of n modulo eight, hence we can identify the dual group also with the set of integers from 0 to 7. It is clear that the Ann(Gfc) is G3_fc. In Fig. 6 each square represents the phase plane with coordinates as shown for the square in the bottom row. Inside each square (k, I) the black area shows the phase space cell Gfc x Gj. The dark gray and the light gray areas,
150
C. M. Thiele (0,3) 1 "in 1 1 1
am
I
■
tS9 »u
! n 11 11 M 1I
sin
11 |
i«n
mi: m:t
1 mn
(1,3)
frequency
1 2 3 4 5 6 7 8
time Fig. 6. Phase space cells for the cyclic group of order eight.
which are obtained from the black areas just by translations, show phase space cells of cosets of the same subgroups. One can see from Fig. 6 how a black phase space cell Gk x Gj splits into cells of smaller dimension: it is the union of the light gray cell and the black cell in the square (k — 1,1), and at the same time it is the union of the dark gray cell and the black cell in (k,l — 1). That each cell of normalized area greater than one is a union of exactly two smaller cells is certainly a special property of our situation, which can be achieved only for groups having a power of two as order. It implies that all
Time-Frequency Analysis in the Discrete Phase Plane
151
calculations for the fast algorithm have the same complexity as in the Walsh case. For each pair (k, I) the phase plane can be partitioned into phase space cells that are products of a coset of Gk with a coset of G/. This means that for each square in the top row of Fig. 6 we obtain an orthonormal basis of the group algebra by picking a unit vector out of each of the one-dimensional cells. These special bases are called homogeneous. Assume that we are given a signal in time basis, that is the basis corresponding to the square (0,3). To calculate the coefficients of the basis corresponding to the square (1,2), we use that each phase space cell in this square is contained in a phase space cell of the square (1,3), which itself is spanned by two phase space cells of the square (0,3). Hence each coefficient of square (1,2) is the linear combination of two coefficients of the square (0,3). The explicit form of the linear combination can be deduced from the formula in Theorem 4.9. Continuing this way, we can successively calculate all coefficients in the upper row. In order to get the signal in frequency basis, we have to calculate 24 linear combinations. This algorithm for calculating the Fourier transform is called the Fast Fourier transform. Observe that in general for a group of order 2 n we need to calculate n2n linear combinations. The algorithm for selecting the best basis now goes along the same lines as in the example in Sec. 4.4: Given an additive cost function, we first select the optimal basis out of the two choices for each of the two-dimensional phase space cells in the second row. Then we successively continue with the other rows, until we obtain the optimal basis for the last square, which represents the whole group algebra. For each phase space cell we have to compare only two bases.
Acknowledgments It is my pleasure to thank my advisor, Ronald R. Coifman, for his guidance through the work of this thesis. I am grateful to him for enthusiastically introducing me into the world of pure and applied harmonic analysis, and for his generous support throughout my years at Yale. My special thanks to Lars Villemoes. This work had been impossible with out the ideas that came out of our collaboration. I am very happy to include a joint paper with Lars in this thesis.
152
C. M. Thiele
There are m a n y people inside and outside t h e Yale M a t h e m a t i c s Depart ment, who deserved being mentioned here for supporting my research, encour aging me or teaching m e useful mathematics. I a m grateful t o all these people, b u t unfortunately it would not fit onto this page t o list t h e m all by n a m e . Finally, I would like t o dedicate this thesis t o my parents, my family, a n d t o my friend Andrea, who is a t r u e master of applied harmonic analysis. Bibliography 1. J. Bergh and J. Lofstrom, Interpolation Spaces, An Introduction (Springer-Verlag, 1976). 2. P. Billard, Sur la convergence presque partout des series de Fourier Walsh des fonctions de I'espace L 2 (0,1), Studia Math. 28 (1967) 363-388. 3. L. Carleson, On convergence and growth of partial sums of Fourier series, Ada Math. 116 (1966) 136-157. 4. A. Cohen, I. Daubechies and P. Vial, Wavelets on the interval and fast wavelet transforms, Appl. Comput. Harmonic Anal. 1 (1993) 54-81. 5. R. R. Coifman and M. V. Wickerhauser, Entropy-based algorithms for best basis selection, IEEE Trans. Info. Theory 38 (1992) 713-718. 6. C. Fefferman, Pointwise convergence of Fourier series, Ann. Math. 98 (1973) 551-571. 7. D. Gorenstein, Finite Groups (Chelsea, 1980). 8. C. Herley, J. Kovacevic, K. Ramchandran and M. Vetterli, Tilings of the timefrequency plane: construction of arbitrary orthogonal bases and fast tiling algo rithms, IEEE Trans. Signal Proc. 41 (1993) 3341-3359. 9. R. A. Hunt, On the convergence of Fourier series, Orthogonal Expansions and their Continuous Analogues, Proc. Conf. Edwardsville (1970) 235-255. 10. Y. Katznelson, An Introduction to Harmonic Analysis (Dover, 1968). 11. M. Lacey, The bilinear Hilbert transform is pointwise finite, preprint (1995). 12. R. E. A. C. Paley, A remarkable series of orthogonal functions (I), Proc. London Math. Soc. 34 (1932) 241-264. 13. P. Sjolin, An inequality of Paley and convergence a.e. of Walsh-Fourier series, Ark Mat. 7 (1968) 551-570. 14. C. M. Thiele and L. F. Villemoes, A fast algorithm for adapted time frequency tilings, Appl. Comput. Harmonic Anal. 3 (1996) 91-99. 15. J. L. Walsh, A closed set of normal orthogonal functions, Amer. J. Math. 45 (1923) 5-24. 16. Wavelet Packet Laboratory for Windows, PC software, A. K. Peters (1993).
153
MULTIRESOLUTION HOMOGENIZATION SCHEMES FOR DIFFERENTIAL EQUATIONS A N D A P P L I C A T I O N S
Anna C. Gilbert Department of Mathematics, University of Princeton, USA To my family
The multiresolution analysis (MRA) strategy for homogenization consists of two algorithms: a procedure for extracting the effective equation for the aver age or for the coarse scale behavior of the solution to a differential equation (the reduction process) and a method for building a simpler equation whose solution has the same coarse behavior as the solution to a more complex equa tion (the homogenization process). We present two multiresolution reduction methods for nonlinear differential equations; a numerical procedure and an analytic method. We discuss the convergence of the analytic method and ap ply the MRA reduction methods to find and characterize the averages of the steady-states of a model reaction-diffusion problem. We also compare the MRA methods for linear differential equations to the classical homogenization methods for elliptic equations.
Contents 1. Introduction 2. A Comparison of MRA and Classical Homogenization Methods . 2.1. Multiresolution homogenization method 2.1.1. Reduction of linear ODEs 2.1.2. Augmentation procedure for linear ODEs 2.2. Second-order elliptic problems 2.2.1. Reduction procedure without forcing terms 2.2.2. Homogenization via augmentation 2.2.3. Reduction procedure with forcing terms 2.3. Several approaches in classical homogenization theory: A review 2.3.1. Asymptotic method
154 158 159 159 167 169 170 179 180 183 184
154 A. C. Gilbert
2.3.2. Flows 2.4. Physical examples 2.5. Conclusions 3. MRA Reduction Methods for Nonlinear ODEs 3.1. Nonlinear reduction method 3.2. Series expansion of the recurrence relations 3.2.1. Recursion relations for autonomous equations . . . . 3.2.2. Algorithm to generate recurrence relations 3.3. Convergence of the series expansion 3.3.1. Closed form expressions 3.3.2. Convergence of the lowest two order terms 3.3.3. Linear ODEs and convergence issues 3.4. Implementation and examples 3.4.1. Implementation of the reduction procedure 3.4.2. Examples 3.5. Homogenization 3.6. Conclusions 4. Steady-States of a Model Reaction-Diffusion Problem 4.1. Setting the stage 4.2. New techniques 4.2.1. Recurrence relations for n-dimensional systems . . . . 4.2.2. Boundary value problems 4.2.3. Rescaling the interval [0,1] 4.2.4. Generalized Haar basis 4.3. Characterizing the average in terms of L 4.4. Complexity of reduction algorithm 4.5. Conclusions 5. Conclusions Appendix A Acknowledgments Bibliography
185 187 193 193 195 203 206 209 211 213 214 221 226 226 228 238 240 241 242 244 245 246 248 252 255 263 264 264 265 266 267
1. Introduction There are many important physical problems which incorporate multiple scales. The interactions and the fineness of these scales make solving these
Multiresolution Homogenization
Schemes
155
problems on the finest scales prohibitively expensive. Often, we would be con tent with the coarse scale behavior of the solution but the fine scales affect this behavior so we cannot simply ignore these. Instead, it is useful to find a way of extracting or constructing equations for the coarse behavior of the solution which take into account the effect of the fine scales. This amounts to writing an effective equation for the coarse scale component of the solution which can be solved much more economically. Alternatively, we might want to construct simpler fine scale equations whose solutions have the same coarse properties as those of more complicated systems. These simpler equations would also be considerably less expensive to solve. These procedures are generally referred to as homogenization, though the specifics of the approaches vary significantly. An example of a problem which encompasses many scales and which is dif ficult to solve on the finest scale is molecular dynamics. The highest frequency motion of a polymer chain under the fully coupled set of Newton's equations determines the largest stable integration time step for the system. In the con text of long time dynamics, the high frequency motions of the system are not of interest but current numerical methods (see Refs. 1 and 22) which directly access the low frequency motions of the polymer are ad hoc methods, not meth ods which take into account the effects of the high frequency behavior. The work of Bornemann and Schiitte (see Refs. 8 and 19) is a notable exception and appears quite promising. Let us briefly mention several classical approaches to homogenization. The classical theory of homogenization, developed in part by Bensoussan, Lions and Papanicolaou, 4 Jikov, Kozlov and Oleinik,15 Murat 18 and Tartar, 24 poses the problem as follows: Given a family of differential operators L£, indexed by a parameter e, assume that the boundary value problem L€ue = /
in
(1
(with u £ subject to the appropriate boundary conditions) is well-posed in a Sobolev space H for all e and that the solutions uE form a bounded subset of H so that there is a weak limit uo in H of the solutions ue. The small parameter £ might represent the relative magnitude of the fine and coarse scales. The problem of homogenization is to find the differential equation that uo satisfies and to construct the corresponding differential operator. We call the homogenized operator Lo and the equation Lotto = / in fi the homogenized equation. There are several methods for solving this problem. A standard technique is to expand the solution in powers of e, to substitute the asymptotic series
156 A. C. Gilbert
into the differential equations and associated boundary conditions, and then to recursively solve for the coefficients of the series given the first-order ap proximation to the solution (see Refs. 3, 16 and 17 for more details). If we consider a probabilistic interpretation of the solutions to elliptic or parabolic PDEs as averages of functionals of the trajectory of a diffusion process, then homogenization involves the weak limits of probability measures defined by a stochastic process. 4 In Refs. 4 and 15, the methods of asymptotic expansions and of G-convergence are used to examine families of operators Le. Murat and Tartar (see Refs. 18 and 24) developed the method of compensated compact ness. Coifman et al. (see Ref. 12) have recently shown that there are intrinsic links between compensated compactness theory and the tools of classical har monic analysis (such as Hardy spaces and operator estimates). Using a multiresolution approach, Beylkin and Brewster in Ref. 9 give a procedure for constructing an equation directly for the coarse scale component of the solution. This process is called reduction. Prom this effective equation one can determine a simpler equation for the original function with the same coarse scale behavior. Unlike the asymptotic approach for traditional homog enization, the reduction procedure in Ref. 9 consists of a reduction operator which takes an equation at one scale and constructs the effective equation at an adjacent scale (the next coarsest scale). This reduction operator can be used recursively provided that the form of the equation is preserved under the transition. For systems of linear ordinary differential equations, a step of the multiresolution reduction procedure consists of changing the coordinate sys tem to split variables into averages and differences (in fact, quite literally in the case of the Haar basis), expressing the differences in terms of the aver ages, and eliminating the differences from the equations. For systems of linear ODEs there are relatively simple explicit expressions for the coefficients of the resulting reduced system. Since the system is organized so that the form of the equations is preserved, we may apply the reduction step recursively to obtain the reduced system over several scales. Beylkin and Coult in Ref. 7 present a multiresolution approach to the re duction of elliptic PDEs and eigenvalue problems. They show that by choosing an appropriate MRA for a given problem, the small eigenvalues of the reduced operator differ only slightly from those of the original operator. This fact is used to reduce parabolic PDEs and generalized eigenvalue problems. In this thesis we will first compare the classical homogenization theory with the algorithm of Brewster and Beylkin9 in the case of linear one-dimensional
Mvltiresolution
Homogenization
Schemes
157
second-order elliptic operators. Second, we will consider a multiresolution strategy for the reduction and homogenization of small systems of nonlinear ordinary differential equations. Third, we will apply these methods to search for and to characterize the steady-state solution(s) of a model one-dimensional reaction-diffusion problem. In Sec. 2 we apply the MRA homogenization strategy of Ref. 9 to onedimensional elliptic equations and compare the results to those obtained by the classical theory of homogenization. This is a natural situation to examine because it is the simplest setting in which classical results are derived. We will examine physical situations where both theories are valid and explore what physical quantities are preserved with the two methods. We will also investi gate several key physical problems (both numerically and theoretically) which highlight the distinctions between classical and multiresolution homogenizations. This work appears separately as Ref. 14. In Sec. 3 we present a multiresolution strategy for the reduction and homog enization of nonlinear equations; in particular, of a small system of nonlinear ordinary differential equations. The main difficulty in performing a reduction step in the nonlinear case as compared to the linear case is that there are no explicit expressions for the differences in terms of the averages. We offer two basic approaches to address this problem. First, it appears possible not to require an analytic substitution for the differences and, instead, to rely on a numerical procedure. Second, we use a series expansion of the nonlinear functions in terms of a small parameter related to the discretization at a given scale (e.g., the step size of the discretization) and obtain analytic recurrence relations for the terms of the expansion. These recurrence relations allow us to reduce repeatedly. A third method is a hybrid of the two basic approaches. We apply these three approaches to several examples. We also examine the convergence of the series expansions. Most of this work is joint work with Greg Beylkin and Mary Brewster and appears separately as Ref. 5. In Sec. 4 we apply the reduction methods for nonlinear ODEs developed in Sec. 3 to a second-order differential equation. This second-order equation with periodic boundary conditions determines the steady-state solution(s) of a coupled system of PDEs which are a generic one-dimensional model for the oxidation and diffusion of CO on a composite reactive surface or on a reactive surface with complex microstructured geometry. In experiments the composite surface consists of a base reactive component and a grid of inclusions of another reactive material.
158 A. C. Gilbert
Experimental results show that spatiotemporal patterns form during the heterogeneous chemical reactions on composite catalyst surfaces.2 Shvartsman et al.21 present a numerical study of pattern formation on one-dimensional reactive media. They vary the geometry of the composite and use the size of the medium as a bifurcation parameter to explore dynamic patterns (including non-uniform steady-states). We cannot, however, directly apply the reduction methods of Sec. 3 to this second-order equation; we must construct several new techniques for the reduction of boundary value problems, of equations on intervals of arbitrary length L, and of n-dimensional systems of equations. After crafting these new techniques, we apply them to the second-order equation to search for and to characterize the steady-state solution(s) and their averages of the reactiondiffusion model. The reduction procedure is a faster approach to finding the averages of the steady-state solution(s) than more standard methods and re veals precise dependence of the averages on the size of the medium and the geometry of the composite. This analysis of the steady-states by reducing the ODE which determines them is a first step towards the more difficult task of reducing the coupled system of PDEs which model reaction and diffusion on composite surfaces and examining how the inherent scales of the composite surface interact with the scales (both spatial and temporal) of the dynamic patterns. 2. A Comparison of M R A and Classical Homogenization Methods* In this section we first summarize the MRA homogenization methods presented in Ref. 9. We present the reduction and augmentation procedures for linear differential equations. Next, we apply these ideas to one-dimensional elliptic differential equations of the form: d ( du\ dx\dxj
_
(2.1)
where u € H&([0,1]), u(0) = u(l) = 0, / € ^ 0 _ 1 ([0,1]), K € L°°([0,1]) and K(X) > v > 0 for all x £ [0,1]. We answer the following three questions: • What is the effective equation we extract for the average of the solution u? •This section was published in Appl. Comput. Harmonic Anal. 6, 1-35 (1998), © Academic Press.
Midtireaolution Homogenization
Schemes
159
• What is the homogenized equation or constant coefficient equation whose solution has the same average as u? • Are the algorithms in Ref. 9 computationally feasible with bases other than the Haar basis? Then we review the classical homogenization techniques for these elliptic differential equations (2.1). We show that the homogenized equation is
_l_g
=/
v/ear'(|M]).
Next we investigate several key examples to highlight • that for those problems for which the classical theory was developed, the MRA methods reproduce the classical results; • that the MRA strategy does not simply provide a higher order term in the asymptotic expansion of the classical theory; and • that we can apply the MRA strategy to problems which fall beyond the reach of classical techniques.
2.1. Multiresolution
homogenization
method
Let us first summarize the methods in Ref. 9. The algorithm for numerical homogenization depends on the general framework or multiresolution analysis (MRA) associated to the construction of a wavelet basis. An MRA is a natu ral framework for discussing the behavior of a solution on both fine and coarse scales. Also, we use an MRA to represent operators in a matrix form.6 For a wide class of operators (e.g., Calder6n-Zygmund operators), the MRA repre sentation is a sparse matrix and allows us to construct fast algorithms. This MRA representation gives an explicit description of the operator's interactions between different scales and appears to be an appropriate tool for numerical homogenization. 2.1.1. Reduction of linear ODEs Let us now describe the MRA reduction method for linear ODEs. Consider the differential equation ^(G(t)x(t) + q(t)) = F(t)x(t) + p(t),
t e [0,1],
160 A. C. Gilbert
where F and G are bounded matrix-valued functions and p and q are vectorvalued functions (with elements in L 2 ([0,1])). We will rewrite this differential equation as an integral equation G(t)x{t) + q(t) - P = f (F(s)x(s) + p(s)) ds, Jo
t G [0,1],
(2.2)
(where ft is a complex or real vector) since we can preserve the form of this equation under reduction, while we cannot preserve the form of the corre sponding differential equation. To express this integral equation in terms of an operator equation on functions in L2([0, l]), let F and G be the operators whose actions on functions are pointwise multiplication by F and G and let K be the integral operator whose kernel K is
{
1, 0,
0<s
Then Eq. (2.2) can be rewritten as Gx + q-(3 =
K(Fx+p).
We will use a general MRA of L2([0,1]). See Appendix A for definitions. We begin with an initial discretization of our integral equation by applying the projection operator Pn and looking for a solution x„ in Vn. This is equivalent to discretizing our problem at a very fine scale. We have Gnxn +qn-/3
= Kn{Fnxn
+ pn),
(2.3)
where Gn = PnGP: , Fn = PnFP: , Kn = PnKP* , pn = Pnp
and
qn = Pnq.
We rewrite xn in terms of its averages (u n _i £ K - i ) and differences (io n -i € W„-i), Xn = P „ _ l X n + Qn-lXn
= t;„_i + tU n _i ,
and plug this into our Eq. (2.3): G„(u„_i + ru„_i) + qn - 0 = Kn(Fn(vn-!
+ w n _i) + p „ ) .
(2.4)
MuUiresolution Homogenization Schemes 161
Next, we apply the operators Pn-\ and Qn-\ to Eq. (2.4) to split it into two equations, one with values in V^,_i and the other with values in Wn-i> and we drop the subscripts: {PGP')v + (PGQ')w + Pq = PKPm((PFP*)v
+ (PFQ*)w + Pp)
+ PKQ*((QFP')v
+ (QFQ')w + Qp),
{QGP')v + (QGQ*)w + Qq = QKP'((PFP')v
+ {PFQ*)w + Pp)
+ QKQ'((QFP*)v
+ {QFQ')w + Qp).
Let us denote Toj = PjOj+1P;
,
Coj =
PjOj+iQ;,
Boj = QjOj+1P;
,
Aotj =
QjOj+1Q*
(see Ref. 6 for a discussion of the nonstandard form or representation of an operator 0), so that we may simplify the system of equations in v and w. Then we obtain (again dropping the subscript n - 1) TGv + CGw + Pq-0
= TK(TFv + CFw + Pp) + CK {BFv + AFw + Qp),
BGv + AQW + Qq = BK(TFv
(2.5)
+ CFw + Pp)
+ AK(BFv
+ AFw + Qp).
(2.6)
Let us assume that R = AG - BKCF
-
AKAF
is invertible so that we may solve Eq. (2.6) for w and substitute the result into Eq. (2.5), giving us a reduced equation in Vn-\ for v. (TG - CKBF - (CG - CKAF)R-\BG
- BKTF -
+ (Pq - CKQp - (CG - CKAF)R-l{Qq = TK[(TF - CFR-l{BG
- BKTF -
+ Pp-CFR-1(Qq-BKPp-AKQP)}.
AKBF))v
- BKPp - AKQp)) - 0
AKBF))v (2.7)
162
A. C. Gilbert
This equation for v n _i = Pn-\Xn exactly determines the averages of x„. That is, we have an exact "effective" equation for the averages of xn which contains the contribution from the fine scale behavior of xn. Since we have a linear system and we assumed that R is invertible, we can solve Eq. (2.6) exactly for w and substitute the solution into Eq. (2.5). Note that this reduced equation has half as many unknowns as the original system. We call this procedure the reduction step. Remark. There are differential equations for which R = Ac — BKCF — AKAF is not invertible. An example of such an equation can be found in Ref. 9. If we apply this reduction method to one-dimensional elliptic equations, the matrix R is always invertible. See Ref. 7 for a proof. We should point out that under the reduction step the form of the original equations is preserved. Equation (2.7) for vn-i has the form G n _iu n _i +qn-i
- 0 = Kn-i(Fn-ivn-i
+pn_i),
where G„_i
= TG - CKBF
- {CQ - CKAF)R~
F n _ ! = TF - CFR-\BG 9n_!
(BG - BKTF
- BKTF -
CFR~\Qq
,
AKBF),
= Pq - CKQp - (CG - CKAF)R-\Qq
p n _ ! = Pp-
— AKBF)
- BKPp -
AKQp),
- BKPp - AKQp) ■
This procedure can be repeated up to n times using the recursion formulas: F j n ) = TFJ - CFJRjl(Boj G
-
CKJBFJ
x (BG<j - BKJTFJ q™
= PjQ
- CKJQJP
- AKJBFJ),
BKJTFJ
(2.8)
CKJAF^RJ1
-
(CG,J
- AKJBFJ),
- (CGJ
x (QjQ - BKijPjP p
-
-
- AKJQJP) - BKjPiP
(2.9) CKJAF^RJ1
, - AKJQJP)
(2.10) .
(2.11)
The superscript "n" denotes the resolution level at which we started the re duction procedure and the subscript " j " denotes the current resolution level. Let us summarize this discussion in the following proposition.
Multiresolution Homogenization
Proposition 2.1.1. Suppose we have an equation for xt^ Vj+i, u
j+ixj+i
Schemes
163
= Pj+iXn
in
+ fy+i P - ^ j + H ^ j + i ^ + i + Pj+i).
where the operator Rj = AG,J—BK,JCF,J —AK,JAF,J is invertible, then we can n) { ] write an exact effective equation for x j = PjX n inVj,
G^xf + q
+ pf),
using the recursion relations (2.8)-(2.11). Remark. We initialize the recursion relations with the following values: Gn = PnGP*,
Fn = PnFPn* , Kn = PnKP: , pn = PnP
and
qn = Pnq,
where G and F are the operators whose actions on functions are pointwise multiplication by G and F, bounded matrix-valued functions with elements in L 2 ([0,1]); K is the integration operator; and p and q are vector-valued functions with elements in L2([0,1]). Remark. This recursion process involves only the matrices F-
, Gj
and
Kj and the vectors Pj and q^ . In other words, we do not have to solve for x at any step in the reduction procedure. If we apply the reduction procedure n times, we get an equation in Vo,
G
+p£°),
for the coarse scale behavior of XQ , which is an easily solved scalar equation. If we are interested in only this average behavior of x, then the reduction process gives us a way of determining the average of x exactly without having to solve the original equation for x and computing its average. This technique is very useful for complicated systems which are computationally expensive to resolve on the finest scale and which solutions we are only interested in the coarsest scale. These recursion relations hold for general wavelets. In most of this section we shall use the Haar basis. Since the supports of the Haar scaling functions at the same scale are disjoint, many of the matrices involved in the reduction procedure are very simple. However, other wavelets with short support may be
164 A. C. Gilbert
used as well. To illustrate that the scheme remains computationally viable with wavelets of short support, we will also work out an example in the following section with a different wavelet scheme; in particular, we will use a biorthogonal basis where the analyzing wavelet has three vanishing moments, leading to better approximation properties. If the reduction process is stopped at some level j > 0 in order to retain slightly more detail, then with the Haar basis, PjX is a piecewise constant function with step-width 2 - ^ ; with the biorthogonal basis (and other wavelet bases in general), PjX is a smoother function, still an approximation of x with resolution 2 _ J , but with a higher approximation order. Haar basis We are restricting ourselves to a one-dimensional system here for simplicity. (For JV-dimensional systems, the analysis is similar, except that the scalar entries in the matrices below become N x N matrices.) Let us now work out these formulas in detail for the Haar basis. First, the integral operator K in PnKP* :Vn^Vn has the Haar basis has a simple form. The operator TK,n the matrix form \ /l 0 0 2 1 T
*.« = y j 0
'b
V
The operator Cjf,n = ^nKQ* : Wn -¥ Vn has the matrix form (l CV n =
0
... o\
o '•• 2 n+2
•• o \0
...
0
1/
i.e. it identifies the space Wn with Vn in the sense that the element ^2k CfcVn.fc is mapped to l/2 n+2 £ fc c fc <£ n ,jfc. Also, BK>n = QnKP* : Vn -4 Wn identifies Wn with V„ and has the matrix form
Mvltireaolvtion Homogenization Schemes 165
0
...
0^
0
'••
'••
:
:
••
'••
0
(l 2n+2
*>
[o ... o
I)
The operator An,n = QnKQn '• Wn -► W n is identically zero. The initial operators F^"' and G„ ' have the matrix forms f
Mn,o
M« =
0
^ 0
0
...
0
'••
'••
:
'•• '••
o
...
> = diag{M„ )0 , M n > i , . . . , M n , 2 n _ i } ,
-J
0
M n>2 n.
where Mn>k = 2n /22-„"fc(*+1) M(x) dx, for M = F or G. The operators TM,J, CM,J, BMJ and AM,J also have a simple form in the Haar basis: TMJ = AM<j
= diag{s£ ) i 0 , . . . , SJfcJ, ^ . . J
and
CM,, = S M)i = diag{J?g,i0,.... ZJJJJ,. ^ . J , where W 9?(") - - f Ar(») fw M M,j,k — 9\ j+l,2k-l
S
U
M,j,k
w 4- M (n) ^ + ^-+1,2*;
r(»)
and ana
r(») M
- 2 ^Mj+l,2k-l
j+l,2k>-
So our recursion relations can be written simply as F , _ ! = SFJ - DF<jRj\DGij G j _ ! = SGj -
CJDFJ
+
+ (DGJ - CjSFJ)Rjl(-SFJ
qj-i = (DG<j - CjSF<j)Rjl(-Dq,j Pj-i = DF,jRjl{-Dq,j where Cj = 2 - J _ 1 .
CjSFJ),
-
DGJ),
- CjSpj) - CjDpj + Sq<}
- CjSpj) + Sp<j ,
and
166 A. C. Gilbert
(3,1) biorthogonai basis We will also work with the (3,1) biorthogonai wavelet basis, which has analyz ing filters
JJ_ J _ l
I
.
f -1
-1
1 -1
1
11
and synthesizing filters
h - J^-L — — -L _ L —\
= I — —X
See Ref. 10 for a discussion of biorthogonai wavelet bases. In this basis, the operator 7V,„ = PnKP* : Vn —» Vn has the matrix form *'n
\ '720'45'2'
45'720 /
We use this notation to signify that Tji
1 _ . f - 2 49 - 2 1 * . » = s 2n+2 5=«B«,d(45,45'45;
That is, CK,U is a banded matrix with the entry 49/45 along the main diagonal and the entry —2/45 to the left and to the right of the main diagonal. The operator Bx,n = QnK-Pn '• Vn -+ W„ has the matrix form K n
- —!— B dl-^— — — ~ 767 — —
< ~ 2 n+2
^
1
]
\ 1440' 320' 160' 1440 ' 160' 320' 1440 J '
The operator Ax,n = Qr»KQ* : W„ —► Wn has the matrix form . l «,n = ^
A
_
Band
A l -11 . -11 l 1 ( l 8 0 ' IQ-<°> "60"' 1 8 0 / •
All of these matrices must be altered appropriately at the boundaries of the interval. This alteration is made by changing the analyzing and synthesizing
Multiresolution Homogenization
Schemes
167
filters at the boundaries (see Refs. 11 and 23 for details). The initial operators FA and G„ have the matrix forms Mkn) = {mij\i,j
0,...,2n-1},
=
whereTO^J= (<£„,<, M<£n)j) for M = F or G. Unlike the Haar basis, this basis does not yield simplified recursion relations. We must use instead the recursion relations for a general wavelet basis which we have previously derived. 2.1.2. Augmentation procedure for linear ODEs Standard homogenization results are really formulated in terms of an "eleva tion" or "augmentation" of the reduction step. That is, an equivalent equation is written down where the solution has the same coarse behavior as the origi nal solution. Let us illustrate the numerical augmentation approach with two linear integral equations. Suppose we have two different equations, G(t)x(t)-0=
[ F(s)x(s)ds Jo
By(t)-f3=
and
(2.12)
f Ay(s)ds, Jo
(2.13)
such that after we reduce both to effective equations in Vo, Gi^xo
- /? = i^ 0 (00) *o
and
(2.14)
^°o)j/o-/3=^oo)yo, the effective coefficients GQ every value of 0; i.e.
and BQ ', and FQ'
G<,°°) = B<°°>
and
(2.15) and A^°°^ are equal for
F 0 (oo) = A^
.
Then the solutions x 0 and yo must be equal. In other words, the solutions of Eqs. (2.12) and (2.13) agree on a coarse scale and differ only on a fine scale. Suppose that one of the equations, say Eq. (2.13), has a "simpler" form; in this case, Eq. (2.13) is a constant coefficient equation. We will exploit this more desirable structure by replacing Eq. (2.12) with Eq. (2.13) and we are confident that the coarse scale behavior of the solution is not affected
168 A. C. Gilbert
by this replacement. In other words, we have substituted a more desirable equation for a complicated one but the desirable equation has the same coarse properties as the solution of the original equation. We call the simpler equation a homogenized equation and refer to this process of refining or simplifying an equation as homogenization. In many physical situations, we are only interested in the coarse scale be havior of a solution and so a reduced or effective equation for this behavior is sufficient. We need not use the second half of the MRA strategy to find a homogenized equation. We think that the real advantage in the MRA scheme is a precise algorithm for determining this effective equation. The classical the ory provides no such algorithm, only a homogenized equation. On the other hand, we will use the augmentation process so as to compare the numerical ho mogenization procedure with the classical results, both theoretically and with physical examples. We will now describe how to augment the effective Eq. (2.14) or how to determine the homogenized coefficients Fk and Gh in the integral equation Ghx{t) -p = Fh [ x(s)ds Jo
(2.16)
such that applying the same reduction procedure to Eq. (2.16) produces Eq. (2.14) for all /3. In other words, we want to find a constant coefficient integral equation whose solution has the same average on [0,1] as the solution ofEq. (2.12). The recurrence relations applied to Eq. (2.16) after simplification give us
G 5 _ i = ^ + (* J ) a ^(G^)- 1 F i \ where Sj = 2 _ J - 1 . Since F* remains unchanged at each level of the reduc tion procedure, the homogenized coefficient is Fh = Fq°°'. We now have to determine the homogenized coefficient Gh\ in general, it is not simply G0°°'. The solution of Eq. (2.16) is x{t) = and its average is
-eM(Ghr1Fht)((G'l)-10)
Multiresolution Homogenization
169
eM{Gh)-lFht)dt\{{Gh)-lp)
j
zo
Schemes
= (1 - exv((Gh)-1Fh))((Gh)-1Fh)-1((Gh)-1P).
(2.17)
However, we can also solve Eq. (2.14) for the average of x and get
x0 =(£">-±F 0 <~>)~V
(2.18)
Since we want to preserve the average of the solution under homogenization, Eq. (2.17) must equal to Eq. (2.18) for all /?. In other words, ((ft*) _ IfA
=
(i _ exP((Gh)-1Fh))((GhylFh)-1(Gh)-1,
(2.19)
where we have replaced F 0 (oc) with Fh. Solving (2.19) for Gh in terms of G^oc) and Fh, we have Gh =
phffi-i f
where F = log(-(l + (G^ ±Fh)-lFh)). We have derived the augmentation algorithm for zero forcing terms p and q. See Ref. 9 for a more detailed discussion. We should also note here that we do not have to preserve simply the average of the solution; we can, instead, preserve a linear functional of the solution (again, see Ref. 9). Also, we do not have to take as our simpler equation one which has constant coefficients. We can choose any equation so long as applying the reduction procedure to this equation produces an effective equation which is equal to the effective equation of the original problem. That is, any equation whose solution has the same average as the solution of our original equation will suffice. 2.2.
Second-order
elliptic
problems
We will now examine the results of applying the MRA homogenization scheme to one-dimensional second-order elliptic differential equations. This approach works only for one-dimensional elliptic problems. For higher-dimensional el liptic problems, we must use the methods presented in Ref. 7. The results in Ref. 7 indicate that the MRA homogenization methods for n-dimensional elliptic equations do not preserve the form of differential operators; instead, pseudo-differential operators seem to be the classes of operators to consider. In order to apply the above algorithm to equation
170 A. C. Gilbert
d_( dx \
du\_ dx)
we must rewrite this as a system of first-order differential equations: du dx
v
dv dx
K
2.2.1. Reduction procedure without forcing terms We first discuss the case where / is identically equal to zero which simplifies our calculations, we shall come back to the case where / ^ 0 afterwards. Theorem 2.2.1. If we are given the system of first-order differential equations du
v
dx
K
dv — = 0, dx
and
with the initial conditions u(0) = /?i and v(0) = fc, K G L°°([0,1]), and K bounded away from zero, then we can apply the MRA reduction procedure to extract the effective system for the averages UQ and vo in VQ : UQ + M2VQ - /?i = -Afjvo
and
v0 - (h = 0 .
The coefficient Mi is given by the average of K, MI = / 0 ^4|y, and M2 is given
Proof. Using the notation of Sec. 2.1.1, we have
GW=I
: ; 1,
*"(')= ( °
and
V
Q(t)l »
*(') =
(°~ p{t) = q(t) =
We will derive the operators GQ and FQ for general n and then determine limn-too G$ n ' and lim„_+oo Fo , the effective operators on the space VQ begin ning with an infinitely small discretization.
Multi-resolution Homogenization
Schemes
171
We begin by simplifying the recursion relations for our two-dimensional system. Let us write G(t) and F(t) in block form so that
G{t)=
\o T ) "^
FW=
(o T)'
where T(t) = 0 initially and &(t) = l/«(t). Due to the structure of G(t) and F(t), the two-dimensional recursion relations for this system are very simple. In particular,
Gj = |
j
\
and
F, = I
i
\
(2.20)
with T, = PjTj+1P;
- (PjKQ'KQiQ^P;)
9 j = PJQJ+IPJ
.
and
Since these recursion relations change only Tj and Qj, we will work only with these with the understanding that the operators Gj and Fj are organized as in Eq. (2.20). Haar MRA At this point we must choose a basis in which to evaluate the algorithm. We will use the Haar basis first (see Appendix A for the definitions of the Haar scaling function
where
172 A. C. Gilbert
I1 0
cf =
0
0
1 0
0
0
0
0 0
1
0
0
/l 2
x-\
0
0
F x (1)
and
1 0
0
0
0o,o
0o,i^
0
0
0 liO
0i,i
0
0
0
0
yO 0
0
0
\
1 - 0 0 2 0
^0
and Piv
0 ^ 0 2
o o
i
-,
The entries 0j, m in the matrix 9 i are defined as inner products of 1/K with scaling functions: 0j)Tn = (4>iti, \
and
T0
1 -Qo©i^o* 4
The reduced operators are simply scalars and are given by the inner products &o = (
and
r 0 = ( - -ipo, -
Before we proceed to a reduction spanning more than one level, let us in troduce several definitions. Recall that in the Haar basis the operator PjKQ^ : Wj -¥ Vj has the matrix form /l P&Q)
=
0
o\
^
\o ... o \) and that it identifies the space Wj with Vj in the sense that the element Efc CfcV'j.fc is mapped to 2~(J +2 ) £ f c ck
Multiresolution Homogenization
Schemes
173
Definition. Let Ej :Wj-¥Vj be the operator which equates Wj with Vj by mapping £) fc Ckipj,k to £ f c c ^ , * so that we may write PjKQ] as 2~^+2^Ej. Definition. Assume that we begin the reduction process at resolution level n and reduce I levels so that we are at resolution n — l. For a multi-index A* of the form A/k = 0 , . . . , 0 , l , we define the operator (T\k)n-i as a composition of the operators Pj, Qj and Ej (for j ranging from n to n - 1 ) , (Txk)n-i = Pn-i ■ ■ • Pn-i+(k-i)En-i+kQn-i+kNote that the following three relations hold for (T\k)n-i(T\0)n-l (Txk,o)n-i
= (Ti)n-l
= En-lQn-l
= CTxJn-j
(To)n-i = Pn-i
,
(equivalents Ej-xQj-xPj
= £;-iQ;-i) ,
(by convention).
In terms of the scaling and wavelet functions, using the operator (T\k)n-i amounts to introducing simply a special type of wavelet packet. Recall that the wavelet packet rpn^iiei t n _, where tj = 0 or 1 (see Ref. 13), is defined by means of its Fourier transform;
^-«i« 1 ,...,«.-.(o=nn»« i «/2 > )^/2 n -' +i ). J=l
Using the same notation A* as above, we will work with the wavelet packets ipn-i\\k ■ Notice that the following relations also hold: V>n-i;Ao(z) = V>n-j( x ) » ^n-J;A*,o(aO = ll>n-l;\k{x)
,
*l>n-i;
(by convention).
For n — Z = 0, these special wavelet packets in the Haar basis are simply Walsh functions. For simplicity we will drop the subscript "n — l" on both xp\k and T\k when n — l — 0. The operator T\k applied to a function / is the product of the wavelet packet and the inner product of the wavelet packet ip\k with / ; i-e.
(Txk)f = Wxk,f)tPxk.
174 A. C. Gilbert
We may now write the result of our first calculation in this form
e 0 = PQiP' = Uo, %o\ with <#> = f 1
r
and r0 = -irAoe1p0* = /-iv>o,^o
\ and F 0 (1) = ( °
°
9
°
Y Note that 9 0 and T0 are
both scalars. Lemma 2.2.1. The form of the effective operators GQ n is given by the following:
\o
i I
and F^' for arbitrary
\o o
where 1
n)
n_1
rS = - E r ^ e » " ) p »
and
e0n) = p 0 ewp 0 \
fc=0
(2.21)
Proof. We proceed by induction. Assume that for level n we have
0
1 /
\0
0
where n-l
n)
r0 = -\Y, 2"fc7Afce(,")p0* 4
and
f c =o
e0n) = P^Q^PS.
If we start at level n + 1 and reduce n steps, then we have (dropping super scripts) n-l
ri = -
S
8
£
2-*(T A Ji0n+iPi*
and
Gj = P i 6 n + 1 P r .
f c =o
We now apply the recursion relation for &j to Qi to obtain ©0 = i^oQl-Po = - P o ( P l 0 n + l P r ) P o = -PoQn+lPo* •
For To we have
Multiresolution Homogenization
To - P 0 riP 0 * -
Schemes
175
\EOQO9IPS
= p° ( - 1 E 2-fc(rA jxen+.p; j P0* - ^ogoCPie^xP^Po' = —EoQoQn+iPS
- I X^2-fcP0(TAJ1en+1P0* fc=0
= -^r%e» 4
+
ii'o'
*=o
"■(") o r , ^ « ( " ) This gives us the general form of TQ and 65 ' and proves Eq. (2.21) for all n. D
In the limit as n —> 00, we find
ejr»-(*,i*)-/'.j (0 for K any continuous function which is bounded away from zero. Let us now determine the limiting behavior of IQ . L e m m a 2.2.2. For the Haar basis n-l
lim T ^ = lim - i V 2- f c T A t eWp 0 ' = / " t—- ,^ ,d t .
Proof. Let ft(i) = - ± Efclo 2 ~VA f c (0- Observe that ft is an infinite (but pointwise convergent) sum of Walsh functions supported on [0,1]. The Fourier transform of ft is given by 00
fc=0
= -imi(^/2)^/2)-
if;2-fcnmo(^)m1(^/2fc+1)^/2fc+1). fc=i
j=i
We now multiply ft(£/2) by mo(£/2) and obtain:
176 A. C. Gilbert
mo(*/2)n«/2) =
-\mo{Zl2)mi(ZH)4>{iH) 1
oo
k
i X)2~ f c n m o ( ^ / 2 ) m o ( ^ + 1 ) m 1 ( ^ / 2 f c + 2 ) ^ / 2 *
= 2ft(0 + ^ ( 0 For the Haar basis mo(£/2) is given by mo(£/2) = 1/2 + l/2e~^^2, rewrite the product of mo(£/2) and n(f/2) as
so we can
\Cl{i/2) + | e - « / a A « / 2 ) - 2ft(0 + ^ ( 0 .
(2.22)
If we take the inverse Fourier transform of Eq. (2.22), we see that fi must satisfy the relation fi(2t) + fi(2t - 1) = 2fi(t) + ^ ( t ) •
(2-23)
Since fi(£) is restricted to the unit interval [0,1], the weight function fl(i) = t — 1/2 does indeed satisfy Eq. (2.23). On the other hand, suppose 0*(t) were another solution of Eq. (2.23), also bounded and supported on [0,1]. Then u(t) = fl(t) - ft*(t) would satisfy w(£) = ±(1 + e-*/ 2 &(£/2)). Since WjLii1 + e ~ i 2 _ i c ) / 4 = 0 for all £, it follows that w = 0. This shows that Eq. (2.23) determines Q uniquely. Thus, we have proven our claim. □ Finally, the limiting behavior of GQ
d-> - (l yo
M
'\
I
-
and FQ
is
tf"
■= (°
J
Ml
\o o
where Mi = / 0 ^4jy df. and Af2 = / 0 t~/1A d£. This proves our theorem.
□
Biorthogonal MRA We turn now to the (3,1) biorthogonal basis and evaluate the reduction al gorithm in this basis. For a biorthogonal basis the recursion relations are given by
r<"> = Pjr&p; -
(PJK^Q^Q^P;)
and
Multiresolution Homogenization
Schemes
177
When we write P, and Qj, we mean the 2* x 2 J + 1 matrices which map P, : Vj+1 -+ Vj and Qj : Vj+l -+ Wj. Similarly, P* and Q) are the 2 ' + 1 x V matrices which map P* : Vj -t Vj+\ and Qj : Wj -t Vj+i. These mappings are the matrix form of the filters h, h, g and g. The 2 J + 1 x 2 J + 1 matrix Kj+i maps Vj+i -> V}+i and the product PJKJ+IQJ is a 2 J x 2j matrix which maps Wj -> t^. This notation is reminiscent of the projection operators in the recurrence relations for the Haar basis but should not be mistaken for projection operators. Using arguments similar to those for the Haar basis, we find the general form of r£ n ) and 9^ n ) to be 9< n) = P 0 e! n) Po*
and
k=0
The matrix R^ is defined as a composition of the matrices Pj and Qj (for j ranging from nton—k): R\k
We can write IQ o
Pn-kQn-(k+l)
as
I n-\ r n)
- Pn ■ - -
2"-< f c + 1 »-l
\
= ( 5Zsn,fc»-
where
£„,*(*) = -
^ j=o
'Xj^n-Cfc+ijjW-
The coefficients rk,j are the entries in the 1 x 2 n_ ( fc+1 ) matrix Po • • • PnKnR"\k Once again we find that in the limit as n -¥ oo
*-.<*.. I * ) - / ' * L e m m a 2.2.3. TAe limiting behavior of TQ as for the Haar basis. That is,
/or the (3,1) 6asis is the same
n) lim r< = n—foo um -£(p ---p„/<:„^J(Q„-(* +*i)e(' l ")p 0 *) n-too *—' 0 " k=0
I
K(l)
178
A. C. Gilbert
Proof. Since we know that for the Haar basis ft(<) = - 1 / 4 £ £ L 0 2 _ V A * (*) = t — 1/2, it suffices to show that the difference between X/fc=o ^n,*(*) a n ( ^ ^(*) goes to zero as n tends to infinity. We begin with the nth (n > 2) partial sum n-1 /2"-
n-l
J2 ^.kW = " S (
S
*=o
i=o
fc=o
\
\
r
*jV'n-(/k+l),j (*) • /
One can show that the coefficients r^j (for A; > 2) are given by rfc,o = r fc , 2 „_ 1 = 2- 3 ("/ 2 + 1 )
*"fc,2
77 1920' 187 5760'
r f c 2 n_ 3 = 2 - 3 ( n / 2 + 1 ) - ,
=
For k = 0 and 1, we have r 0 , 0 = 1/3 and n , 0 = rlti = 119v/2/1440. The "boundary" coefficients are different from the interior coefficients (just as the boundary wavelets are different from the interior ones) because we are
0
0.1
02
0.3
0.4
0.5
0.6
0.7
O.t
0.0
1
Fig. 2.1. This figure shows the difference between the fourth partial sum of (3,1) biorthogonal wavelets and the weight function in the Haar basis, ££=o—4,*W — ^(')-
Multiresolution Homogenization
Schemes
179
working on the interval [0,1]. See Ref. 23 for the construction of these boundary wavelets. One can also show that the difference £fe=o Hn,fc(0 — fi(0 is zero for the "interior" of the interval and is nonzero at the "boundary". More specifically, one can show that the difference is a piecewise constant function which takes on the values: ( -59 °'2 n +V' 2880 59 2880'
( l - — -
43 2880:
n-l
££ S , * ( t ) - f t ( * ) = 2 - " n
+2
{
k=0
1
2"+V'
2«+i'2"+
/
-43 2880
t £
_ _2_
-7 1440
_
2"+ 1 '
\
1 "
2B+1J '
2«+i' 2 n + 1 ) ■
7 1440'
(
-
y1
10,
—
■
2n+1'
2 n +!
otherwise
^n-1 — See Fig. 2.1. That is, the difference between the nth partial sum ^fc=o —n,fc(0 and Q(t) is nonzero on an interval of length 3/2" and has largest magnitude equal to 2 _ n + 2 (59)/2880. We can then conclude that YlkZo ^n.fcW converges pointwise to Q(t) = t — 1/2, proving our claim. D
2.2.2. Homogenization via augmentation We now apply the augmentation procedure of Sec. 2.1.2 to our effective equa tion. The corresponding homogenized integral equation Ghx(t)-/3
= Fh f
Jo
x(s)ds
has homogenized coefficients Gh =
1
o\ and
o iy
. /o Fh = I
\o
Mi - 2M 2
o
This integral equation corresponds to the differential equations
180 A. C. Gilbert
— =
{Mv-2M2)v, (2.24)
dv ~dl
0.
Notice that these are different from the homogenized equations for the classical theory (see Sec. 2.3): du = ~dl
M\v,
dv - r = 0. . dt However, the first system of differential equations (2.24) is consistent with the goal of the wavelet-based homogenization. The averages of the solutions to the original equations (the nonconstant coefficient case) are <«)=/%
and
(u) = /3i+/? 2 y" (J
-^rds\dt;
note that the initial condition is /3 = {(5\,(3v) € R 2 . To compare this with the averages of u and v as determined by Gh and Fh, given by (v)=fh
<«)=/?!+ft ^ - M i ) ,
and
notice that l
M
Ml M2
2 -
l
M
fX
dt
2j0W)-Jo-^)-dt
=
-L
1-t 0
^ * - 1/2 *■
dt;
«(*)
if we now integrate this last integral by parts, we see that it is exactly
2.2.3. Reduction procedure with forcing terms Let us now apply the MRA scheme to the general problem given by
±.( dx\
—\-f
dxj
(2.25)
Multiresolution Homogenization Schemes 181
where / is no longer taken to be identically equal to zero. Let / be a continuous function on [0,1]. We now have to include forcing terms p and q in our reduction procedure. With the same notation as in Eq. (2.2), Eq. (2.25) corresponds to the initial choices
q(t) = I ° )
and
p(t) = (
The operators GQ and FQ and their limits GQ' and FQ' remain un changed. Using similar techniques as those above, we calculate the general form of the vectors pg and q^ and determine that the limiting behavior of these quantities is lim p0n' = I
n
~>°°
I
and
I .m \ \ m l /
lim
=
I
n—*oo
1 \
m
_ 2 ,
where
qi
mi
= Jo ( t - l / 2 ) - L j f sf(s)dsdt + Jo = / f(t) dt Jo
and
(l-3)f(s)J\l/2-t)^r)dtds,
m2 = [ (t - l / 2 ) / ( t ) dt. Jo
The reduced equations for the averages (u) and (v) are (u) =Pi+fo x^M
Q M J - M 2 J 4- Q m ! - m 2 J
1
-M
2
)+ip
1
-
g i )
<«>=/% + £ m i - "»2 • If we simplify the expressions (2.26) and (2.27), we have
(2.26) (2-27)
182 A. C. Gilbert
{u)=0i+
^l! I!-^dt+J!l!{i-t)fm-s)is)dsdt + {i t) l! ~ w)(l!sf{s)ds~l{i~s)f{s)ds)dt
= A+^ f f^dt+ K s Jo Jo
f
i)
Jo Jo
f\l-t)f(t)(l-s)-±-dsdt K s
+ J\l-t)^f)(j\s-l)f(s)ds
= Pi+lh[
i)
+
I -T\&+ Jo Jo K(s)
£f(s)dsyt
I I I ±-Trrdsdtdx Jo Jo Jo K(t)
and
(v)=fo+
I I f{s)dsdt. Jo Jo In other words, Eqs. (2.26) and (2.27) are indeed the averages of the solutions to Eq. (2.25) given by u(x) = ft + f ' K^t d t and v(x) = ft + f" f(t) Jo \ ) Jo The corresponding homogenized integral equation Ghx(t) -0=Fh
dt.
[ x{s) + phds Jo
(2.28)
has homogenized coefficients Gh and Fh as above and the coefficient ph is / P
Pi/2-9i + (mi/2-m2)/3 Mi/2-M2 m.\ — 2m.2
V
One can verify that the solutions of Eq. (2.28) given by u{t) = A + (M x - 2M 2 ) j
v(s) +
** _ ^
+ ^ Q m x - m2j
v(t) = ft + / (mi - 2m2) ds Jo have the same averages as the solutions of Eq. (2.25).
ds,
Mtdtiresolution Homogenization
Schemes
183
We conclude that the homogenized coefficients Gh and Fh do not depend on the forcing term / , only the homogenized coefficient ph depends on our choice of / . Furthermore, the MRA scheme produces a homogenized equation for the general problem (2.25) which preserves the averages of the solution. 2.3. Several approaches theory: A review
in classical
homogenization
In this section we review the classical homogenization theory for one-dimen sional elliptic differential equations. Let K be a periodic function in L°°([0,1]) such that K(X) > u > 0 for all x G [0,1]. We will associate to K the differential operator
dx \ l
If we define Ke(x) = n(xe~ ),
dxj
then we have an associated family of operators
^-=("<--)=)• We also have a family of solutions ue in problems
HQ(\0,
1]) which solve the Dirichlet
L£U£ = A ( K ( x £ - i ) ^ f ) = / .
(2-29)
A positive constant KQ is the homogenized or effective coefficient for this prob lem if for any / G J/ - 1 ([0,1]), the solutions uE of the Dirichlet problem (2.29) have the following property: ue -> UQ
weakly in
#o([0,1])
ne-r^--¥ KQ-T— weakly in L2([0,11) ax ax where u 0 is the solution of the Dirichlet problem
JL ( dx
K 0
^)
= /
f r
°
and as
£-40,
u G
° ^o([0,l]).
The operator ^ ( « o ^ ) is called the homogenized operator and the equation
—( dx\
^ A =f dx J
184 A. C. Gilbert
is called the homogenized equation. The vector fields pE = KC ^gf-, po = /Co faare called flows. Let us derive the value of KQ with two different methods: an asymptotic expansion of the solution u e in powers of £ and a direct examination of the flows p e . We want to emphasize that these methods are used for physical problems which have two or more (but finitely many) distinguished scales. We will show that the multiresolution approach can be applied to physical problems with a continuum of scales and as such is more robust. 2.3.1. Asymptotic method In the problem ^ ( K ( X £ - 1 ) ^ ) = / , we have two distinguished scales (the scales of x and xe-1) so we seek a two-scale asymptotic expansion of the solution ue. As a first approximation, we look for a solution of the form uc(x, e) = u0(x) + £Ui(x, y), where y = xe~l and u\ is periodic with respect to y. Note that ^ = ^ + j $-. Then f=
dx\^~dx)
= £
1
( * l U l + K 2Uo) + («3UO + K 2 U l ) + £ ( K 3 W l ) ,
(2, 30)
where
K2 =
^{K{y)Yx)+Tx(K{y)lTy)
^
«3 = «( ( y\ ) *^ .
The first term on the right-hand side of Eq. (2.30) must be equal to zero, so dy'\K[y)~dy~)
=~dy"te-
This is a periodic boundary value problem in y with the right-hand side de pending on x as a parameter. Let N be the solution of d ( . .dN\ dK K dry{ ^)=-d-yNotice that Eq. (2.31) is equivalent to the problem
.
„ 2 31
<- )
Multiresolution Homogenization
Then ui(x,y) = N(y)%* and ue(x) = uo{x)+eN(y)%*. and the second term in Eq. (2.30) to determine: K3u0
+ K2UI
Schemes
185
Let us use this fact
. sd?uo d . . . . . . sscPu0 . .dNdPuo = K(y)-^ + _(«(,,)*(,))_ + «(j/)-^^r d?tio
dx2
(«v) + £M)NM) + *%)
Averaging this term with respect to y, we get v / / N / ,dN\cPuo (K3U0 + K2«l> = ( K(y) + K(y)-^r )-faa K0-
d?u0
where no = (*(j/) + K(j/)^jrf) is our homogenized coefficient. From Eq. (2.32) we know that N(y) = -y + £ ft fa where Afi = /„ and so
o= K(y) K(y)+
* ( -
fa
=
i) = i
the harmonic average of K. For the justification of this method see Ref. 15. 2.3.2. Flows In this section we review a different approach using the flows pf(x) — K(X£~1)^-. In this one-dimensional case these are sufficient for us to de termine the value of «o- Set B(x) = ^ y and F(x) = fQx f(t)dt. Then the equation can be rewritten as p £ (x) = / c ( x £ - 1 ) ^ = F ( i ) - c E
and
^
= B(x£-x)(F(x)
-
ce).
The constants c£, which are indexed by e, are determined by our boundary conditions 0 = / ^ d a ; = / B(xe-1)(F{x)-ce)dx. (2.33) Jo dx J0 To find l i m ^ o ce we must invoke a simple property of periodic functions that is frequently used in homogenization theory.
186
A. C. Gilbert
T h e o r e m 2.3.1. Let g : R n —> C be a periodic function whose period cell is a box B with edges directed along the coordinate axes and edge lengths l\, I2, ■ ■ ■, ln respectively. We denote the mean value of g by (); i.e.
{9) =
g(x) dx, W\ JEB where \B\ = lih--ln. The space LP(B) is the space of periodic functions with finite norm (|<7|p) 'p for p > 1. Assume that g 6 LP(B), p > 1. Then g{x/e) —► (g) weakly in V{U) as e -* 0, where fi is an arbitrary bounded domain in Rn; i.e. g(x/e) —> (g) weakly in Lpoc(Rn).
\W\f
Proof. We can restrict ourselves to the situation fi = sB where fi is a dilation of the basic box B with ratio s > 1. Observe that for / G L?{B) and e < 1,
/ \f(x/e)\'dx JQ
= en f
mxWdxKe^se-^
+
irUfnKcodfn
JsB/e
for Co depending on £2 and for |_ 5£-1 J * n e greatest integer not larger than se'1. Let q be a trigonometric polynomial with the same periodicity as g such that (Q) = (g) a n d (\g ~ Q\p) ^ £• Then for e < 1, we also have / \g(x/e)-q(x/e)\pdx
(B) r F(x)dxJo
limc £ (B) = 0 ,
e-fO
or limE_+o c£ = / 0 F(x) dx. Now we can also determine the weak limits iin L 2 ([0,1]) of the sequences ^ and p£. We have F(x) dx = ;po(x) limp £ (x) = F(x) - / F(x)dx Jo
«-+o
SS^W-W^-JC^)*)-^.)These formulas show us that
Mtdtiresolution Homogenization Schemes
187
so that uo is the solution of the Dirichlet problem
^(^_1^)=/
with
«oe^([o,i]).
Therefore, the homogenized coefficient /Co is ( K _ 1 ) ~ . We note that the value of «o for these one-dimensional elliptic equations holds in a much more general context (although we derived the value in the above, more restricted context). In particular, the operators Le : HQ([0, 1]) —> HQ ([0,1]) form a sequence of linear operators which are uniformly coer cive and uniformly bounded. The sequence of inverses Ljl : 1/^"1([0,1]) —> l HQ([0, 1]) is bounded uniformly so we may extract a subsequence L~ which converges (with respect to the weak topology in the space of operators) to a bounded operator M : / / ^ ( [ 0 , 1 ] ) —► HQ([0,1]). This operator M is coer cive and admits an inverse, denoted L0 : H£([0,1]) -> tf0-1([0,l]). It can be shown that the operator LQ is the homogenized operator with the form J ^ K O ^ J with homogenized coefficient K0 = ( K _ 1 ) ~ . See Ref. 15 for a more detailed discussion. We can see from the previous calculations and discussions that we have an explicit value for KQ in one-dimensional elliptic equations. In two or more dimensions it is sometimes possible to determine explicitly the homogenized matrix (see Ref. 15 for examples). For cases in which an explicit value cannot be determined, we can apply the Voigt-Reiss inequality (in Ref. 15). Theorem 2.3.2. / / we assume that the initial periodic matrix K is symmetric, then the homogenized matrix /«o satisfies the following inequality. (K-1)"
1
Furthermore, KQ — (K) holds only if div« = 0 and ( K - 1 ) if c u r l * - 1 = 0. 2.4. Physical
= «o holds only
examples
In this section we will present two examples which illustrate the differences between the classical and the MRA homogenization methods. We will show that the MRA method is more physically robust, meaning with this method we can handle many more physical situations.
188 A. C. Gilbert
The physical problem which we will look at is the steady-state heat distri bution in a rod of length one. We will assume that the temperature T satisfies T(0) = 0 and T ( l ) = A. We also assume that the average temperature gradient (z£) = Jo ^ ( s ) ds = ^- Our heat equation is then
d_( aT\ dx\ dx J with the conditions T(0) = 0, T(l) = A, and ( ^ ) = A. Also, the thermal conductivity K is a bounded function, bounded away from zero, and has period one. We will homogenize this problem for several different functions K. First, we will look at a family of thermal conductivities K„(X) = « ( 2 n i ) . Each function Kn models a and we want to know the n -> oo (or as the length physical motivation for the
material composed of period cells (of length 2 _ n ) effective thermal conductivity of the material as of each period cell shrinks to zero). This is the classical theory.
T h e o r e m 2.4.1. / / we use the MRA strategy to homogenize the problem
for each n and then take the limit as n —> oo of the homogenized coefficients K£, we will replicate the classical homogenization results. Proof. Again, rewriting Eq. (2.34) as a system of ODEs gives us dx
— Kn
and
where M\>n = f0
-j^- = 0 dx d K
f,y
with
T„(0) = 0 and n
vn(x) = (vn) "
Mi, n '
We know from the results of Sec. 2.2 that
(Tn) = r„(0) + vn(0) ( ^
W = Wo))=
- M 2 , n ) = \ - ™*& ,
A
Mlin-
Here M2,,, = / 0 *~ (s\ ds. Furthermore, the homogenized equations are
MvUiresolution Homogenization
Schemes
189
T*(x) = f «*(*)(M llB - 2M 2 , n ) ds, JO
^(z)
=0,
or, in differential form,
£( «fl*\ „ : ( fc^ ) = 0 dx 1
... with mk J*
cfl*, ^(0) =
and
^ -
The effective coefficient is given by
«5n =
Mi,„ - 2M2,,
In the limit as n goes to infinity, we have lim Mi „ = lim / n-Kx>
n-*ooj0
lim M2 „ = lim /
—7-—r = ( - ) K(2nS)
\K/
* ~1^
ds=(-\
and
f (s- 1/2) ds = 0
by Theorem 2.3.1. In general, we can conclude that 1
1
,fc = lim •• lim K„ — „., — ,, v , „_>„, « „_>«, M 1 ) n - 2M2,„ (i> or that our homogenized coefficient is simply the harmonic average of K, (the same as the classical theory!). □
K
We will now examine a specific family of thermal conductivities. \ \ = 2 — sin(27r2nx). The moments M\
= /
M2 n=
70
Let
— r ^ = / (2 - sin(27r2nx)) dx = 2 ,
K
n\Z)
= (x i/2)(2_s (2 2nx))dx Jo
- rw^ r " " " ^2irW-
So, (I*) = £(1 - j ^ r ) and 2*(z) = A(l - 2^»)x. Furthermore, our homog enized coefficient is
190
A. C. Gilbert
The classical theory tells us that our homogenized problem is
±(J_dJh\ = ±(m±\ dx \ M\ dx )
=Q
dx \ 2 dx )
with To(0) = 0 and T 0 (l) = A. We get T0{x) = Xx and (T0) = A/2. Observe that in the limit as n —> oo the two methods agree; i.e. lim »->oo
K£ »
= lim —
:—- = - = -r-r ,
n ^ s o 2 ( 1 - j2ir2 A.)
2
l i m ^ x ) = lim A^l - ^ ) x
n—foo
n—*oo
Mx
= Ax
This example prompts a question. Does the MRA strategy provide a higher order correction term in the asymptotic expansion derived in the classical the ory? The answer to the question is no, the MRA scheme is not simply a higher order term in the asymptotic expansion of the classical theory. Recall that the classical theory tells us T(x) » T £ (x) = T 0 (x) +
eN(y)^-,
where we take e = 2~n and N solves ^(/c(i/)(l + ^ ) ) = 0 with y = 2 n x and N(0) = N(l) = 0. Here, T 0 (x) = Ax and N(y) = -y + -fc /0V $ j . Therefore, r ( * ) « r a - . ( x ) = ro(x) + 2 - » J V ( y ) § = Xx + X2~» (-y =
+ ±-f"
^ )
f2"x 2^M-J0 2 - s i n ( 2 ^ d 5 X
A/cos(27r2"x)_J_>) 2\ 27r2n 27r2"/ So, the correction term is | C 2*2"
~ 2 ^ F ) - ^ e MRA algorithm gives
T£(x) = Ax - - ^ Ln . nV '
27r2
It is clear that the MRA scheme does not give us simply a more accurate approximation to the true solution T. The solution T„ is a linear function
Multiresolution Homogenization Schemes
191
which has the same average as t h e t r u e solution 7V, b u t which t e n d s pointwise to To as n goes t o infinity. <\ i
MRA homogenized soution Asymptotic s4Mlon -0.002 •0.004 -0.006 -0.008 -0.01 -0.012 -0.014 -0.016 -0.018 -0.02 0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Fig. 2.2. This figure shows a comparison of the difference between the MRA homogenized solution and the true solution T*(x) — Xx (dotted line) on one hand, and of the difference between the asymptotic solution and the true solution T 2 _„(x) — Ax (dashed line). Here n = 3 and A = 1. Both functions T% and T2_„ correspond to the temperature in a rod with period cells of length 2~ n .
If we plot the difference of To and the two functions T^-* and T„ (see Fig. 2.2), we see that the approximate solution T 2 -» oscillates just below the line To(x) = Xx and as n tends to infinity these oscillations increase in fre quency and decrease in amplitude. The function T„ is a straight line from the origin to the point (1, A(l — 2 ^ r ) ) w i t n its average value exactly equal to the average of Tn. Also, in the limit T£ is the line Ax. As we discussed previously, the MRA scheme is more physically robust than the classical theory. The next example will illustrate a situation that falls outside of the reach of classical theory and yet, physically, this is an important case we would like to homogenize.
192 A. C. Gilbert
This example is a problem with a continuum of scales. Let —TT = 2 — sin ( 27rtan I — x 1 1 K(X) \ \2 JJ (see Fig. 2.3). This conductivity corresponds to a material composed of period cells but which has been stressed or distorted at one end. We emphasize that there is no small parameter e (or family of thermal conductivities «„(x) = *c(2nx)) unlike the previous examples. We can calculate Mi = / 2 - sin (27rtan ( ^-x) j dx w 1.89173
and
M 2 = I (x - 1 / 2 ) ( 2 - s i n ( 2 7 r t a n ( ^ x ) ) J d x « 0.052 25.
Fig. 2.3. This is a plot of the thermal conductivity -^r = 2-sin(27rtan(^x)). This function "contains" a continuum of scales.
These quantities allow us to determine the average temperature distribution (T) and to write a homogenized equation for this example even though there
Muitiresolution Homogenization Schemes 193
is no small parameter in which we could do an asymptotic expansion as in the classical theory. 2.5.
Conclusions
The MRA strategy for numerical homogenization consists of two algorithms; a procedure for extracting the effective equation for the average or for the coarse-scale behavior of the solution (the reduction process) and a method for augmenting this effective equation (the augmentation process). In other words, once one has determined what the average behavior of the solution is, one can construct a simpler equation whose solution has the same average behavior. For physical problems in which one wants to determine only the average behavior of the solution, the reduction process is very useful and is not part of the classical theory of homogenization. In some applications, this step suffices. On the other hand, the augmentation procedure yields effective material parameters (or homogenized coefficients) just as the classical theory does; however, the MRA procedure produces a homogenized equation which preserves important physical characteristics of the original solution, such as its average value. The MRA method is more physically robust in that it can be applied to many more situations than the classical theory. For example, the MRA strat egy can be applied to problems which have a continuum of scales while the classical theory may be applied to problems with only a finite number of distin guished scales. Moreover, for those two-scale problems for which the classical theory was developed, the MRA results agree with the results of classical ho mogenization in one dimension. 3. M R A Reduction Methods for Nonlinear ODEs* Let us begin by highlighting the difficulty in the reduction procedure for non linear equations. The reduction procedure begins with a discretization of the nonlinear equation. Just as the initial discretization of a linear ODE is a linear algebraic system, the initial discretization of a nonlinear ODE is a nonlinear system .F(x) = 0. (3.1) The nonlinear function T maps RN to RN (for N = 2 n ) and we denote the fcth coordinate of 7-{x) by T[x){k). Similarly, we denote the fcth coordinate *This section was published in Appl. © Academic Press.
Compvt.
Harmonic Anal.
5, 450-486 (1998),
194 A. C. Gilbert
of x by x(k). We change basis by writing and
s(k) = -Ux(2fc + 1) + x(2fc)) V2
d(k) = -=(x(2k v2
+ 1) -
x(2k)),
the averages and differences of neighboring entries in x. We split our equation into two equations in the two unknowns s and d by applying Ln and Hn to Eq. (3.1). The 2 n _ 1 x 2 n matrix Ln is the top half of the matrix Mn and 2 n - i x 2 n matrix # n j s t h e bottom half of /
M
""V5
1
1
0
0 0
1
- 1 1 0
0 0 - 1
0
...\ 0 0
0 1
0
0
V Our two equations are (3.2) (3-3) Notice that the function LnT maps R N / 2 x RN'2 to R N / 2 and similarly for HnT but that we cannot split these functions into their actions on Lnx = s and Hnx = d (as we did in the linear case). Instead, we can give the coordinate values for LnF and HnJ-:
{LnF{s, d)){k) = 4 ^ ( 3 , d)(2k + !) + ^ (HnF(s,d))(k)
= -j=(F(s,d)(2k
d
)(2fc)) >
+ 1) - ^(-,d)(2fc))
for fc = 0 , . . . , 2 n - 1 - l . As with the linear algebraic system, we must eliminate the differences d from the nonlinear system, Eqs. (3.2) and (3.3). In other words, we must solve Eq. (3.3) for d as a function of s. This equation, however, is a nonlinear equation and may not be easily solved (if at all). Let us assume that we can solve Eq. (3.3) for d as a function of s and let d(s) denote the solution. We then substitute d(s) into Eq. (3.2) to get
Lnr(a,d(s)) = o
Multiresolutum Homogenization
Schemes
195
which is the reduced equation for the coarse behavior of x. The form of the original system is preserved under this procedure and we may write the recur rence relation for T as follows: Ti-l{a)
=
LiFi(a,d\a))
where d(s) satisfies HjFj(s,d(s)) — 0. In the following two sections we will • give the precise form of the nonlinear system, Eqs. (3.3) and (3.2) in d and s, • state conditions for Eqs. (3.2) and (3.3) under which we can solve for d as a function of s, • develop two approaches for solving Eqs. (3.2) and (3.3) for d (a numerical and an analytic approach), and • derive formal recurrence relations for the nonlinear function Tj. 3.1. Nonlinear
reduction
method
We now extend the MRA reduction method to nonlinear ODEs of the form x'(t) = F(t,x(t)),
t€[0,l].
(3.4)
We will address the difficulties raised in the previous section with two ap proaches, a formal method to be implemented numerically and an asymptotic method. We will assume that F is differentiable as a function of x and as a function of t. The assumption that F is Lipschitz as a function of x guarantees the existence of uniqueness of the solution x(t). For the reduction procedure F must be Lipschitz in t and differentiable in x. We will rewrite this differential equation as an integral equation in a slightly unusual form: G(t, x(t)) - G(0, x(0)) = / F{s, x(s)) ds, (3.5) Jo where dG/dX ^ 0. The more usual differential equation (3.4) is obtained by setting G(t, x(t)) = x(t) and by differentiating. We choose this integral formulation since we can maintain this form under the reduction procedure. In our derivations we find it helpful to use an operator notation in addition to the coordinate notation so we write Eq. (3.5) in an operator form, G(x) = K(P(x)), where
(3.6)
196 A. C. Gilbert
K(y)(t) = / y(s)ds,
G(y)(t) = G(t,y(t))
and
F(y)(t) = F(*,y(t)) •
We will use the MRA of L2([0,1]) associated with the Haar basis to begin our discretization. We discretize Eq. (3.6) in t by applying the projection operator P„ to Eq. (3.6) and seeking a solution x„ € V^, to the equation
G n (x„) =
KnFn(xn),
(3.7)
where Gn(xn) =
PnG(xn),
Kn = P n KP n *
and
Fn(xn)
= P„F(xn).
Since we are using the Haar basis, xn is a piecewise constant function with step width Sn = 2~n. The functions Gn(xn) and Fn(xn) are also piecewise constant functions. Note that Gn, Fn and Kn map Vn to V„, although G„ and Fn are nonlinear functions. Let xn(k) denote the value of the function x n on the interval k5n < t < (k + l)Sn, for k = 0 , . . . , 2 n - 1. Let gn(xn)(k) and fn(xn)(k) denote the values of the functions Gn(xn) and Fn(xn) on the same interval. That is, 9n(xn)(k)
=
T
/
g(s, xn(k)) ds = (P B G(x„))(t),
where k5n < t < (k + l)Sn, and similarly for fn(x)(k). We can say that 9n(xn)(k) is the average value of the function G(t, •) over the time interval (kSn, (k+l)6n) and evaluated at xn(fc). Notice that gn(xn)(k) is the shorthand for gn(xn(k))(k). As in Ref. 9 we use the integration operator K„ defined by
l
\ o
0
\
(3.8)
Kn = Sn 0 1
1
1 2/
With this notation, the coordinate form of Eq. (3.7) is fc-i
Sn
9n{Xn)(k) = SnJ2 /»(*»)(*') + f^C-OW fc'=0
(3.9)
Muitiresolutton Homogenization
Schemes
197
This equation gives the precise form of the nonlinear system J-{x) = 0 discussed in the previous section. We are now ready to begin the reduction procedure. We first split Eq. (3.7) into two equations, one with values in Vn_\ and the other with values in Wn-i, by applying the projection operators P„_i and Qn-i- We now have the two equations P n _ ! G n ( x n ) = Pn-iKn(Fn(xn)),
(3.10)
Qn-lGn(Xn) = Qn-lKn(Fn{xn))
.
(3.11)
At this point let us work with two consecutive levels and drop the index n indicating the multiresolution level (assume that 5 = 5„). We recall that for the Haar basis the action of the operators Pn-i and Qn-i amounts to forming averages and differences of the odd and even elements of a vector (renormalized by a factor of y/2). We will modify the Haar basis slightly and normalize the differences by 1/5. The averages will not be adjusted by any factor. By forming successive averages of Eq. (3.9), we can rewrite Eq. (3.10) in coordinate form as \{g(x)(2k
+ 1) + g(x)(2k)) = S-
+ 2 E / ( * ) ( * ' ) + j/(*)(2fc).
(3.12)
fc'=0
In the same manner we rewrite Eq. (3.11) by taking successive differences normalized by the step size 6: i( 5 (x)(2fc + 1) - g{x)(2k)) = \{f{x){2k
+ 1) + /(x)(2fc)).
Let us rearrange the right-hand side of Eq. (3.12) as follows: Ik
2k—1
|z £ /(*)(*') +4jf(x)(2k + 1) +z | E '(*)(*') +4 7/(*X2fc) fc'=o
fc'=o
= 6 £ K*W) + 7/(*)(2* +1) +
T/( X )( 2/C )
(3.13)
198
A. C. Gilbert fc-1
c
= S £ U { x ) { 2 k ' + 1) + /(x)(2A;')) + -(/(x)(2fc + 1) +
f(x)(2k))
*'=0
-5-(f(x)(2k
+
l)-f(x)(2k)).
To simplify our notation, let us define S and D as "average" and "difference" operators which act on g(x) and f(x) by taking successive averages and differ ences of elements g(x)(k) and f(x)(k). We define S and D as follows: Sg(x)(k) = ±(g(x)(2k + 1) 4-
g(x)(2k)),
Bg(x)(k)
g(x)(2k)).
= ±(g(x)(2k + 1) -
Then we may write the coordinate form of Eqs. (3.10) and (3.11) in a compact form Sg(x)(k) + i-D/(i)(fc) = 26 £
Sf(x)(k')
+ *S/(x)(*),
(3.14)
fc'=0
Dg(x)(k) = Sf(x)(k).
(3.15)
We have split Eq. (3.9) into two sets and now we split the variables accord ingly. We define the averages s n _i and the scaled differences d„_i as s„-i(k) = -(xn(2k-rl) I
+ xn(2k))
and
dn-^k)
= 7(xn(2k o
+ 1)
-xn(2k)).
Notice that since xn is a piecewise constant function with step width 6n, then s„_i and dn-i are piecewise constant function with step width 2<5n = £„_i. We will now change variables in Eqs. (3.14) and (3.15) and replace x with x(2Jfc + 1) = a(k) + |d(ife)
and
x(2k) = s(k) - |d(fc).
We will abuse our own notation slightly for clarity and denote the change of variables by Sg(s,d)(k)
= l-(g(a + *-d)(2k + l)+g(a-
Bg(a,d)(k)
= I(g(a
&
-d)(2*))
,
+ §-d)(2k + ! ) - * ( « - id)(2k))
.
Multireaolution Homogentzation
Schemes
199
Note that when we write g(x)(k), this is the shorthand for g(x(k))(k); so g(x)(2k + 1) stands for g(x(2k + l))(2fc + 1). When we replace x(2fc + 1) with s(k) + §d(fc) and write g(x)(2k + 1 ) = g(s + § d)(2k + 1 ) , this is the shorthand for the expression g(x(2k + l))(2k + l) = g(s(k) + ^d(k)\(2k
+ 1).
The shorthand notation g(s — |d)(2fc) is similar. Then our system of two equations in the two variables s and d is given by 8g(s, d)(k) + -I>f(s,
d)(k) = 26^2
S/(s, d)(k') + 68f(st d)(k),
(3.16)
fc'=0
T>g(s,d)(k) = Sf(s,d)(k).
(3.17)
Our goal, as in the linear case, is to eliminate the variables d from Eqs. (3.16) and (3.17) to obtain a single equation for s. We consider (3.17) as an equation for d which we have to solve in order to find d in terms of s. Let us assume that we can solve (3.17) for d and let d represent this solution. Notice that Eq. (3.17) is a nonlinear equation for d so that d is a nonlinear function of s. We will discuss how this is implemented numerically in Sec. 3.4 and how this is implemented analytically in Sec. 3.2. In the linear case, d is a linear function of s and it can be easily computed explicitly. Provided that we have d, we substitute this into Eq. (3.16) and obtain Sg(s, d)(k) + -Df(s,
d)(k) = 25J2
Sf(s, d)(k') + 6Sf(s, d)(k).
(3.18)
fc'=0
Observe that we may arrange Eq. (3.18) as follows: *-i
ft.-l(fc)(«n_l) = *„-l £
s
/„-i(*>„-l) + f / „ - l ( l ) K - l ) ,
(3.19)
fc'=0
where 5„_i(*)(«„_i) = 8gn(k)(sn-i1dn.1)
+ ^D/nM^n-i.dn-x)
(3.20)
200 A. C. Gilbert
and /„_l(fc)(«n-l) = S A W ^ - i A - , ) •
(3-21)
In other words, the reduced Eq. (3.19) is the effective equation for the averages s n _i of xn. It is important to note that this equation has the same form as the original discretization. Let us switch now to operator notation to present the recurrence relations for the reduction procedure. We use the solution d of Eq. (3.17) to write Eq. (3.19) in operator form as
G™l(sn-i) = Kn-1Fin}1(sn-1), where s n _i = Pn-\x and the nonlinear operators G„_ x and F£J1 map V^_i to Vn-i. The superscript "(n)" on the operators denotes the level at which we start the reduction procedure and the subscript "n — 1" denotes the current level of resolution. The operators G„_x and F^j1 are denned as the opera tors which act elementwise according to Eqs. (3.20) and (3.21), respectively. Notice that they have the same form as the operators G„ and F^'; both functions G ^ ^ a n - i ) and i ^ " \ ( 5 n _ i ) are piecewise constant functions with step width (5„_i. In particular, the fcth element of G„_ 1 (s n _i) depends only on the arguments through the kth element of sn-i(k). Because the form of the discretization is preserved under reduction, we can consider Eqs. (3.21) and (3.20) as recurrence relations for the operators G£2i and F „ " \ and, as such, may be applied recursively to obtain a sequence of operators Gj and Fjn', j < n. The recurrence relations for Gj are given by Gj n ) = PjG^ Fjn) = PjFJll,
and Fj'
+
(for j < n) in operator form
"-^QiF™
,
(3.22) (3.23)
provided the solution dj of the equation QjG^-y = PjFJ"\ exists. Observe that the operator forms of the "average" and "difference" operators S and D, which we introduced in working with the coordinate forms of our expressions, are the projections Pj and Qj. We emphasize that this is a formal derivation of the recurrence relations. We show in Sec. 3.4 how to implement numerically this formal procedure. In Sec. 3.2 we derive analytic expressions for these recurrence relations.
Mtdtiresolution Homogenization
Schemes
201
Let us now address the existence of the solution d, to the equation QjGj+\ = PjF>"\. We will write this equation in coordinate form as follows (dropping subscripts): F(a,d)(k) = Bg(s,d)(k) - Sf(s,d)(k) = 0, where J : £ - > R 2 ' , ( j , d ) G £ a n open set in R 2 ' x R 2 ', and k = 0 , . . . , 2 j - 1. Assume that g and / are both differentiable functions so that T 6 Cl(E). Suppose that there is a pair (s°,d°) £ E such that
?(S°,d°)(k) = D5(s°,d°)(fc) - Sf(s°,d°)(k) = 0 and that the Jacobian of T with respect to d at (s°,d°) does not vanish. (We know that such a pair (s°,d°) € E must exist since a unique solution to our ODE exists.) The Implicit Function Theorem tells us that there is a neighborhood S of s° in R 2 ' and a unique function d : S -> R 2 ' (d € C ^ S ) ) such that d(s°) = d° and F{s, d{s)) = 0 for a € S. Let us investigate what it means for the Jacobian of T with respect to d at (s°,d°) to be nonzero. Notice that the kth coordinate of J-, J-(s,d)(k), depends only on the kth coordinates of s and d F(s,d)(k)
= T>g(s,d)(k) -
Sf(s,d)(k).
In turn, s(k) and d(k) depend on x(2k + 1) and x(2k) and we may write T(s, d){k) in terms of x(2k + 1) and x(2k). In particular, we can write Dg(s, d)(k) = ±(g(x)(2k + 1) Sf(s,d)(k)
g(x)(2k)),
= | ( / ( x ) ( 2 * + 1) + /(*)(2*)),
where x(2k + 1) = s(k) + ^d(k)
and
x(2fc) = s(k) -
b
-d{k).
When we differentiate IF(s,d)(k) with respect to d(k), we can apply the chain rule and differentiate with respect to x(2k + 1) and x(2k) instead. Therefore, the derivative of the term Dg(s,d)(k) with respect to d(k) is
ad(fc)
"v '
/ w
2 dx(2fc + l)
2 di(2fc)
202
A. C. Gilbert
We calculate a similar expression for the derivative of Sf(s,d)(k). Hence, the Jacobian of T with respect to d is given by the matrix J? with entries (k, I): JAs,d)(k,l)
= ^
-
= A(D5(s,d)(fc) -
Sf(s,d)(k))
Sg>(s,d)(k)-jT>f>(s,d){k),
k = l,
0
k^l.
Requiring the Jacobian of T to be nonsingular at (s°,d°) is equivalent to stipulating that the product below be nonzero; i.e. 2,-1
/ S2 1 7 (Sg'(s°,d?)(k) 0 d -jBf'{s°,cP)(k)) [Sg'(s ,d°)(k)'
^0.
fc=0
In other words, the quantity Sg'(s°, d°)(k) - ^Df^s0, d?)(k) must be nonzero J for every k = 0 , . . . , 2 ' — 1 to find a solution d(s) for each k. If <52 is sufficiently small, the product n*=o Sff'(s°,d0)(fc) ^ 0 dominates the condition. We will see this condition reappear in the analytic reduction procedure. We summarize the above derivation as Proposition 3.1.1. Given an equation of the form (3.9) on some scale j + 1 (with dyadic intervals of size 2 _ ^ + 1 ^ ) , we arrange the transition of this equation to an equation at scale j as follows:
fc'=0
*
where gi(k)(8j) = Sgj+iWia^dj)
+ - * ± ± D / j + , (*)(*,-, dj)
and
fj(k)(sj) = Sfj-1(k)(sj,dj). The solution dj to the equation Dgj+i(k)(sj,dj) — Sfj+\(k)(sj,dj) exists pro vided that there is a pair (s°,d9) which satisfy the equation and the product below does not vanish:
n (s5i+1(fc)(5jo,^)-%iD/J+1(fc)(ajo,^)) to.
Mtdtiresolution Homogenization
3.2.
Series expansion
of the recurrence
Schemes
203
relations
In the previous section we derived recurrence relations for the operators Gj and FJn', (3.22) and (3.23) which depended on the existence of dj, the solution of the equation D G S = SF-"{. In this section we derive analytic expressions for the recurrence relations (3.20) and (3.21) and an explicit expression for dj. Let us begin at the initial discretization scale 6n = 2~n and examine the transition from scale n to scale n — 1. We will not include the subscripts n and n — 1 unless they are necessary for clarity. Assume that S = Sn. The equation which determines d„_i is given by D ^ J n - i . c k - i K f c ) = S/ B (« B _i t d„_i)(*).
(3.24)
Below it will be convenient to expand g(x)(2k + 1) as follows: g(x(2k + 1))(2* + 1) = g(s(k) + 6-d{k) J (2* + 1) = g(s(k))(2k + 1) + g'(s(k))(2k + l)^d(k)
+
0(62).
We will then use a slight abuse of notation and write g(s(k))(2k + 1) as g(s) (2k + 1) (and g'(s(k))(2k + 1) as g'(s)(2k + 1)). The reader should beware that the notation convention for g(x) and g(s) is thus slightly different. To solve this equation for d, we will first expand g(s, d) and f(s, d) in Taylor series about s(k) (for each A; = 0 , . . . , 2 " - 1 — 1) and keep only the terms which are of order O(l) in 5. Observe that we may expand the left side of Eq. (3.24) as
l(9(S
+
I*)^
+ 1) 9 S
+ ^Y-(9'(s)(2k
~ ( ~ ld){2k))
+ 1) + g'(s)(2k)) +
=
]MsH2k
+
V-S(s)(2*0)
0(S2),
and similarly for the right side. After expanding both sides of Eq. (3.24) and retaining only terms of order 0(1) in S, we have the equation Bg(s)(k) + Sg'(s)(k)d(k)
=
Sf(a)(k),
which we may solve for d(s)(k): -
d(s)(fc) =
Sf(s)(k)-Dg(s)(k)
Sg^jik)
2
+0{5)
-
204
A. C. Gilbert
Next we expand the recursion relations for gn-i(sn-i) and fn-\(sn-i) in Taylor series about sn-i and keep only the terms which are of order O(l) in 8n-\. This gives us the following expressions for gn-i and fn-ign-i(sn-i)(k)
= Sp n (s n _i)(fc)
and
/„-i(s„_i)(fc) = S/„(s„_i)(A;).
Notice that if we retain terms which are only of order 0(1) in <Sn_i, the re cursion relations do not depend on dn_i! These equations simply reproduce the discretization procedure without incorporating any information from the fine scale. In operator form, we have done nothing other than project onto the next coarsest scale, reducing PnG(xn) = KnPnF(xn) to P„_iG(a; n _ 1 ) = K n _i-P„_iF(x n _i). Therefore, we have to include higher order terms in the recurrence relations to determine any contribution from the fine scales. Let us expand the recurrence relations for gn-i(sn-\) and / n - i ( s n - i ) in Taylor series again, but this time we will retain terms of order 0(1) and 0(^n-i)- This gives us recurrence relations of the form
fc-iW(i)
= Sgn(s)(k) + (&^(Bg'n(s)(k)+SfM{k))
+
±Dfn{s)(k)
/„-!«(*) = s/.(#) + (^V(sXfc)+^srww)*!., • Notice that these equations do include information from the fine scale. If we solve Eq. (3.24) for d n _i(s)(fc) to order 0(1) and substitute dn-i(s)(k) into the recursion relations for gn-i(s) and / n _ i ( s ) , we may split the functions gn-i(s) and fn-i(s) into two terms, one of order O(l) in J n _i and one of
order
0(6^): gn-i(s)(k) = l0(s)(k) + 7i(«)(*)#_!, /»-iW(t) = « b ( # ) + « i ( # ) t i .
where 7o = Sgn, #o = S/„,
7! = 4(D<£ + S/A) + ^D/ n + | W ,
Multiresolution Homogenization Schemes 205
-
S/n-Dfln Sg'n
We summarize the previous discussion in the following proposition. P r o p o s i t i o n 3.2.1. If F andG are twice continuously difjerentiable as func tions of x and if F is a Lipschitz function in both t and x, then we can obtain analytic expressions, at least up to order <$?, for the recurrence relations and for d. Let us again introduce a superscript "(n)" on the functions to denote the level at which we started the reduction procedure, the subscript "j ", as before, signifies the current level of resolution. If the functions
= 7 $ . 1 (*)(*) + 7J5 + i(*)(*)^+i
/ $ ( » ) ( * ) = ^j+1(s)(k)
and
+*i3 + i(«)(*)*i a + i.
then we may arrange the transition of these functions to functions g^ (s) and fj(s)
at scale j as follows: 9in)(s)(k)=j^(s)(k)+^(s)(k)S]
and fjn)(s)(k)
= e$(s)(k)
+
9<$(s)(k)6l
where (dropping superscripts) 7o,j = S7o,j_i, 0o,j — S0o,j-i, 71 j = J s 7 i j - i +
^(DVOJ-I
j _ Sg0|j_x - D7oiJ-_1 a
7o,i-i
+ Sfl&j-x) + ^ E W o j - i +
^ W j - i >
206 A. C. Gilbert
In other words, at level j , we arrange the functions g^ and / ■ so that they consist of two terms of the appropriate orders and we write recurrence relations for each of these two terms. Remark. We usually initialize the reduction procedure with the O(l) terms:
7$(«)(*) = 9^(s){k),
fin\s)(k),
oQ(,)(k) =
and the 0(5„) terms:
7i2(*)(*) = 0,
*g(*)(fc)=0.
This can be modified, however. Remark. Higher order expansions may be obtained in the same manner. We supply an algorithm implemented in Maple in Sec. 3.2.2 to compute the recurrence relations for arbitrarily high order terms. 3.2.1. Recursion relations for autonomous equations We will now apply the reduction procedure to the autonomous integral equa tion G(x(t))=
f F(x(s))ds (3.25) Jo and examine the series expansions for the recurrence relations when applied to this autonomous integral equation. We will consider only the first two terms in the expansions; higher order discretization schemes can be obtained if we keep higher order terms in the expansions. Theorem 3.2.1. Let us assume that the functions F and G are both twice continuously differentiable as functions of x and that ^ ^ 0. Then the coef ficients %J, 7 J J , fl£y and 0£y are given by
JM-G io,j — " i 0M_F
-<»> _ ( 2 2 " - D / 3 / F \ n,j — gin)
2
2 +
2m
\G'J
(22"-l)/3/FY2
(2^-l)/3/F\2 23+2m
\G')
'
Multi-resolution Homogenization
Schemes
207
where m = n — j . Furthermore, in the limit as m tends to infinity, the coeffi cients converge to
j-°°)-c
J-°°)-L(L\F'
+ ±-(—\2G"
7c,;
lhi
+ 2A\G>)
-<*,
/}(-<») _ F p
\j
~>
~
_0 fl( °) 6
l2\G>)*
— * f
U
-Yl\G>)
^ >
) F p"
■
Proof. Since the functions G and F do not depend explicitly on time, the terms gn(xn){k) and fn(xn)(k) in the initial discretization 9n(Xn)(k) = <Sn £
fn(Xn)(k')
+ f
fn(xn)(k) l
k<=0
are simply the values of G and F evaluated at xn(k). In the non-autonomous case, the terms gn(xn)(k) and fn(xn)(k) are the averages of the function G(t, •) and F(t, •) over the time interval k5n
= ^(gn(sn-i(k))(2k
+ l)+gn(sn.1(k))(2k))
=
gn(sn^)(k).
We will drop the parameter k in what follows for this reason and simply write G(x n ) and F(xn) instead of gn(xn)(k) and fn(xn){k) and we will simplify the recursion relations. We begin with an initial discretization of our integral equation at resolution level n = 1 and initialize the coefficients as follows: $l(x1)
= G(xl),
7{,1i)(xi) = 0,
B$(xl)
= F(xl)t
0[1}(x1) = O.
208
A. C. Gilbert
We reduce one level to j = 0 so that the difference in resolution (n — j) is one. Using the simplified recursion relations, we calculate the reduced coefficients:
<>(*„) = C(x„),
e™(x„) = i ( | M ) V ( z „ ) .
We want to find the forms of the coefficients for an arbitrary difference in resolution {n—j) = m. We proceed by induction. Assume that for (n — j) = m we have
«-'.
p.",
We will apply the simplified recursion relations to these coefficients and reduce one more level so that n — {j — l) = m + l. It is clear that 7o"-_i = G and *JJ_i = /r. The simplified recursion relations tell us that 7IJ-I
- 4 S7 W + 1 6 ^ G , J * + 3 2 ^ G ,J _
(F\(l \GVV16
G
l(22"-l)/3\ 4 23+2™ j
V ^ V V32 4 23+2"> y ~
(2 2(m + i) _ 1 ) / 3 2 2 m 1 2 + ( + )
(F\ \ G 7
(2»("""> - l ) / 3 / F\2 23+2("*+1) \G'J
fl(») _ 1 « f l ( " ) ^ X { FYf" *U-i-4 Stf iJ+32^J F / F \ V l VGV ^32
l(22"-l)/3\ 4 2 3 + 2m /
(2a(m+i)_1)/3/Fy 2 3+2(m+l)
^G/^ ^ •
'
Midtiresolution Homogenization
Schemes
209
This proves formulas (3.26) and (3.27) for all m = (n — j). Note that these forms depend only on the difference in resolution levels n — j . In the limit as m tends to infinity, we find that the coefficients converge to the following:
Additionally, the limiting values of these coefficients eliminate the error of the initial discretization, give us expressions independent of resolution level j , and contribute errors only from the truncations of the original Taylor series. The reduced equation at level j is then given by fc-i
9i{xj)(k)
s
= Si Y, /i(*i)( f c ') + ifj(xi)(k) Z fc'=o
*(*)(*) =7$~ ) (*(*))+7i ( ~ ) (*(*))* 2 f(x)(k)
= 9i°°\x(k))
wh
« e (dropping j)
and
+ ^ oo) (x(fc))* 2 .
(3.28) (3.29) (3.30)
3.2.2. Algorithm to generate recurrence relations In Sec. 3.2, we limited our expansions to 0(52) terms. In this section we present an algorithm (implemented in Maple) to compute the recurrence relations for the terms of the power series expansions including higher powers of 6 I
9J(sj)(k) = J2^MW5^\
(3.31)
t=0
/
£('*)(*) = X>>i)(*)*f.
(3.32)
t=0
/ <*j('i)(*) = X>jte)(*)#.
(3-33)
i=0
In other words, if we group the terms in gj, fj and dj by their order in 6j and if we stipulate that the terms in
210
A. C. Gilbert
fashion, then we can determine the recurrence relations for the coefficients 7i i j_i(sj_i)(fc) (i = 0 , . . . , / ) in the series expansion of gj-i (and similarly for the coefficients 9ij-i). In the program shown in Fig. 3.1, we first specify the order I of the expan sions. In the example program the order is four. Next the four quantities ge, go, f e and f o are defined. ord := 2: ge := sum((SG(i,x) + h*DG(i,x)/2)*h~(2*i), i » C o r d ) : go :- sum((SG(i,x) - h*DG(i,x)/2)*b.-(2*i), i = C o r d ) : fe :- sum((SF(i,x) + h*DF(i,x)/2)*h"(2*i), i » C o r d ) : fo := sum((SF(i,x) - h*DF(i,x)/2)*h~(2*i) , i = C o r d ) : QG := (subs(x = s + h/2*d,ge) - subs(x = s - h/2*d,go))/h: QF := (subs(x = s + h/2*d,fe) + subs(x = s - h/2*d,fo))/2: dsub := {d = sum(d(i)*(2*h)-(2*i), i=0..ord)}: eql := taylor(subs(dsub, qG - qF), h, 2*ord + 2): solve(coeff(eql, h, 0), d(0)); newf := taylor(subs(b. = h/2, subs(dsub, qF)), h, 2*ord + 2): coeff(newi, h, ord); PG := (subs(x = s + h/2*d, ge) + subs(x = s - h/2*d, go))/2 + h/4*(subs(x = s + h/2*d, fe) - subs(x = s - h/2*d,fo)): newg := taylor(subs(h = h/2, subs(dsub, PG)), h, 2*ord + 2): coeff(newg, h, ord); Fig. 3.1. Maple code to compute recurrence relations for coefficients up to any specified order in series expansions of g and / . The specified order for the example is ord : - 2 . The variable h stands for the 6 used in the text.
Notice that we are using the fact that (Sg)(x)(k)
= \(g(x)(2k
+ 1) +
g(x)(2k)),
(T)g)(x)(k) = ±(g(x)(2k + 1) -
g(x){2k)),
to express ge = g(x)(2k), the even-numbered values of g(x), and go = g(x) (2k + 1), the odd-numbered values. The step-size <5 is accorded the variable h in the program. Next we form the two sides of the equation qG-qF=0 which determines d; at the same time we substitute x(2k+l)=s(k)+h/2d(k) and
Multiresolution Homogenization
Schemes
211
x(2k)=s(k)-h/2d(k) into ge and fe (respectively, go and fo). Into the ex pression QG-QF, we substitute the series expansion for d, d = sum(d(i) * (2 * h) A (2 * i ) , i = C o r d ) . We expand the expression QG-QF in a Taylor series and we peel off the zerothorder coefficient in h and solve for d(0), which gives us the first term in our expansion for d. This is the recurrence relation for 770. To determine higher order terms in the expansion of d, we use, for example, simplif y(solve(coef f (eql, h, 2), d ( l ) ) ) . Recall that the recurrence relation for / j is fj = S/j+i and notice that S/j + i is the same as QF so we simply substitute the expansion for d into QF. Then we let h=h/2 to adjust the resolution size for the next step and finally expand the expression in a Taylor series. (Recall that gj+i and / J + i are expanded in powers of 5j+i = Sj/2 and gj and fj are expanded in powers of 8j.) To determine the recurrence relation for the coefficient 6i(s)(k), we peel off the ith coefficient (for t < ord): coeff (newf,h, i ) . The recurrence relation for gj is given by gj = S
of the series
expansion
In Sec. 3.2 we derived the series expansion of the recurrence relations, provided an algorithm for generating the recurrence relations for arbitrarily high order terms in the series expansion, and showed that for autonomous equations, the
212 A. C. Gilbert
recurrence relations for the two lowest order terms in the series have a fixed point. In this section we address more general issues. We have shown that if we begin, at some scale j , with functions 9J(SJ) and fj{sj) of the form /
9i('i) = £7i>;)*?
/
and
f^) = Y,*A»iW >
i=0
then we can arrange the transition of these functions to functions and fj-i(sj-i) at scale j — 1 as 9j-i(sj-i)
/ = 5Z7i,j-i(sj-i)*ili t=0
(3-34)
i=0
and fj.^Sj-i)
gj-i(sj-i)
/ = 52^,j-i(sj-i)^ii, i=0
(3.35) where 7 J ) J _ I ( S J _ I ) and 6ij-i(sj-i) are given by recurrence relations which we can generate with the algorithm given in Sec. 3.2.2. There are three questions we will (at least partially) answer in this section: • Can we find a closed form for the recurrence relations for fij and dij? The algorithm which we presented does not yield a closed form; it only computes the recursion relation. • If we begin our reduction procedure at level n and apply the recursion rela tions repeatedly, does this process converge? In other words, can we deter mine the limit as n tends to infinity of an arbitrary coefficient 9\nj{sj)? • If we let the highest power / in the series expansions (3.34) tend to infinity, do these series converge? If the series (3.34) do converge and have radii of convergence R, do the series (3.35) at a coarser scale also converge and how do their radii of convergence depend on R? While we cannot fully answer the above questions, we do have partial answers. First, we can find expressions for the recurrence relations for 7 ^ and Oij without resorting to our algorithm to produce them. Second, we can show that repeatedly applying the recursion relations for the four coeffi cients 7o,j,71 j,6o,j and 9ij does converge (under certain specified conditions). Third, if we restrict our attention to linear ODEs of the form ^(G(t)x(t)) = F(t)x(t), then we can answer the above questions. That is, we can derive a closed form expression of the recurrence relations for 7 i j and Oij and we can show that the limits limn-too 6\J(SJ) and lim„_+oo 7<," ( s j) e x i s t f° r any i and
Multi-resolution Homogenization
Schemes
213
j . Also, we can determine conditions under which the series (3.35) will con verge and how that convergence depends on the convergence of the fine-scale series (3.34). 3.3.1. Closed form expressions To find expressions for the recurrence relations for 7 ^ and 8ij, we recall the equation which determines df D 5 i + 1 ( S i ) <*,)(*) = Sfj+1(sj,dj)(k).
(3.36)
As in Sec. 3.2, where we derived the series expansion of the recursion relations, we expand both sides of Eq. (3.36) in a Taylor series of order Sy+l about Sj. Next, we assume that gj+i(sj+i) and fj+i(sj+\) have the form 1
1
9j+i{sj+i) = J2-ri,i+i(sj+1)Sf+1
and
fj+1{sj+i)
= J^ij+ifo+O^+i.
1=0
1=0
We substitute these forms into our Taylor series expansions and retain only terms of order less than 21 in Sj. Note that we have used the fact that Sj+i = Sj/2 to write this series in powers of Sj rather than in powers of Sj+\. The equation which determines dj is now given by /
D-rP,J+i(sj)(k) + di(Sj)(fc)s7^+1(5J)(fc) p=0
p-i
S0Ptj+1(sj)(k)
^i-pEiAsJ)(k) = 0,
+
(3.37)
i=0
where (dropping subscripts) T2(p-i) p
^• -(2(p-i))!
J2(p-») + l 1J7i
+
(2(p-i) + l)r7<
T2(p-i)-l
_ _h
riaWp-o-i] _
(2(p-i)-l)!
*
j2(p-t) a
j
se[2{p-i)]
(2(p-»))! <
Note that the superscript "[•]" denotes differentiation. To solve for (up to order 21 in Sj), we set /
di(sj)(k) =
»=o
Y,r>^sWi?'
dj(sj)(k)
214 A. C. Gilbert
we substitute this expansion into Eq. (3.37), and we solve for each f]i,j(sj)(k). For example, to solve for »?Dj(Sj)(fc) (as we did in Sec. 3.2), we keep only those terms in Eq. (3.37) which are linear in Sj Ehtoj+i («*)(*) +*JDj(«j)(*)Sy 0 j+i(«i)(*) = Sfloj+i(«,-)(*), and we solve for Vo,j{sj)(k) Sflo,j+i(sj)(fc)-Dflo,j + i(s J )(fc)
VoA*i)(k)
S0'o,j+iM(k)
as calculated previously. If we apply similar manipulations to the formal recurrence relations for fj(sj) and gj(sj), we obtain the following expressions for 7i,j(sj) and Oi,j(sj): P-1 i
P-1 4
jij = 4 - S 7 i , J - i + £
Py
'" '.P
and
6i
'i =
4_iS
^.i-i + Z
4
'""^.P • (3.38)
where j2(p-Z)-l
j2(p-J)
y, _ _ 2 iVyPtP-O-il + _JL? s Ja
, _^J +
j2(P-0 nfl[2(p-0-l]
(( 22((pp- -l )I -)l-) l! ) r « 72(p-/)-l
^,P = (2(p-Z)-l)!
1
, +
fl
J
qfl|2(p-')l
( 2(2(p-Z))! ( -/))!S^-1 P
'
J2(p-0 i eff[a(p-01 + (2(p-Z))! 'J-1 ' a
'J-
1
i=0
We should point out that these are not quite closed formulas since they include powers of dj and hence powers of the expansion dj = J2i=0 Vi.jtf1- In other words, these expressions are not homogeneous in Sj. 3.3.2. Convergence of the lowest two order terms We now examine the limiting behavior, as n tends to infinity, of the recurrence relations for the two lowest order coefficients 7o"^> 7i 5 , 0$ and 0 $ . We
Multi-resolution Homogenization
Schemes
215
emphasize that these are recurrence relations for a non-autonomous nonlinear differential equation. Recall that the superscript "(n)" indicates the initial res olution level and the subscript j the current resolution level. We will show that these limits exist by proving that the coefficients are bounded as a sequence in n and that the difference between successive members in the sequence goes to zero as n tends to infinity.
Theorem 3.3.1. Let us assume that the solution x(t) of the equation G(t,x(t))
- G(0,z(0)) = [
F(s,x(s))ds
Jo
is bounded so that \x(t)\ < R for t € [0,1]. We will also assume that F and G are twice continuously differentiable as functions of x G [—R, R] and that F is Lipschitz as a function oft. The function G is automatically continuous as a function oft. Let us also assume that ^ is bounded away from zero for all x € [-R, R] and for all t € [0,1]. Then we may conclude that the limits as n tends to infinity of the coefficients fg-, 7}"', 0Q"- and 6^ exist and are finite.
Proof. The recurrence relations are given in operator form by the formulas
/)(") _ p /)( n )
°oj —
n°o,j+i'
f
^
-
A"+1
~ -
E
■ „(»)
JQJ7OJ+I
wirj+xY
'
and we initialize the recurrence relations as follows:
216
A. C. Gilbert
» -— _ pr n G« > ^1 7o,n -v (n)
= o,
7l,n "0,n
= P„F, = 0.
Recall that the operator Ej is the identification operator which identifies the subspace Wj with the subspace Vj. At each level j , we must compose the operators 7^"' and 0 Q J with the operator P / to form the operators
I^JPJ-.VJ
-> Vj and
0QJPJ
'• vi ~* vi
since 7^- and 0QJ must act on Xj € Vj and not on x G L2([0,1]). It is clear that the O(l) terms 7 ^ and 0Q"- are simply the projections Pj • ■ • PnGPj = PjGPf and P, • ••PnFPf = P j F P / , respectively. Furthermore, the nonlinear operator PjGPj : Vj -¥ Vj is bounded independently of n and the limit of the operator %}Pj
= PjGPj
as n tends to infinity is simply PjGPj
(and
similarly for 0Q"f). We turn now to the second order terms 7}^ and 0("j. In what follows, we will drop the composition with the operator Pj and the operators f[j
and
#i"- will implicitly{ include in< the composition with Pj" so that they map Vj to Vj. We will split 7}"' into three terms ~(»)_r(»)+r(»)+r(») and 0, „• into two terms I>J
I.J
*
Each of these terms is generated by a specific term in the recurrence relations for 7j"- and 0\j. We write the recurrence relations for the individual terms (n)
' , An)
r\ f and t\ / as
r
S = J ^ S H + jgVoAPjVirJ+iY + EjQjd^l,)'),
Multiresolution Homogenization
n (n) _ Ip./( ) i r
rf
. _L/V,„ \2P(ft^
Schemes
217
V
2,j - 4 J 2 , i + i + 3 2 ^ / ^ " o j + i J •
We can check that the sum of the recurrence relations for r\j,
r£ j and r^y is
the entire recurrence relation for the sum 7}"' = r, „• + r,,- + r,,-. A similar check holds for 0\n'• and the terms t, ■ , and t, ... We observe that the terms
r
S- r S • *S>and 45 are similar in that they all are a product of 770,7 (or
T)Q
•) and a function of 7oV i and OQJ r^J aisproduct distinguished are +similar in +V thatThe theyterm all are of 770,7 from (or
T)OJ) and a function of %j+i and #o"+i- The term rj"- is distinguished from the four other terms because it consists solely of a function of 0o\V+i" We will restrict our attention to the term r^J. Lemma 3.3.1. The limit as n tends to infinity o/rj 0 '(zo) is / 0 (t — 1/2) xF(t,x0)dt. Proof. When we derived the series expansion of the recurrence relations in Sec. 3.2, we expanded the recurrence relations for •• and / • in Taylor series and grouped terms in powers of <5?. We did this because we wanted g*1 and /■ to have the same structure as the function « " \ and /_•"\ at the finer scale j + 1 . Had we grouped the terms in gy' and / j n ' in powers of <J|+1, the smaller step size, instead, we would have had the expansions
sjn) = PMZ + (\E&f&
+ \*oAPi(fi:\Y + EiQiij,^)')
+ J(*J)aPi(ff}?i),')*?+i,
r
»*=
- < » y PMtY
218 A. C. Gilbert
The difference between the expansion in 6? and the expansion in S?+1 is that the second-order terms in the expansion in 6?+1 are four (= (Sj/Sj+i)2) times those in the expansion in 5"j. Similarly, if we had grouped the terms in powers of <5j+1 rather than 6?, the recurrence relation for r["- would be r
S = Prfhi + \E>QA1+i.
(3-39)
again with an extra factor of four when compared with the recurrence relation from the expansion in 6?. Let us initialize r[ n „ = 0 and apply the recurrence relation (3.39). We claim that Ti"o is given by
^^E ^fro, 4
(3-40)
*=o
where T\k is a composition of the operators Pj and Qj, T\k = PQ • ■ • Pk-iEkQk, and the multi-index A& has the form Afc = 0 , . . . , 0 , l .
We have explicitly included the operator P0* for clarity here. With the nor malization we have chosen for the Haar basis, the expression (3.40) is almost identical to the result we derived in Sec. 2.2.1 for the effective coefficient in the elliptic equation. The difference between the two results is that T\Q is a nonlinear operator. We may conclude, however, that the limiting value of » is r1 lim rft^xo) = / (1/2 - t)F(t,x0) dt, for x0 G V0 . □ n fo - ° ' Jo
Let us now examine the term r^J ■ L e m m a 3.3.2. The limit lim n _ +00 r^J exists and is finite. Proof. We return to the expansion of the recurrence relations in powers of J | and the expected form of the recurrence relation for r£y:
MuUireaolution Homogenization
r
2j-l -
r
An
2J+l Z J + 1+ T^°j
4 ■*
'
219
w n e r e
>
16
Schemes
\
-
0
-
with the initialization r$"„ = 0. Observe that we may bound the operator bj independently of n and j since ( .)
J
/ P J F - Q J G W
~V P,fg
OF
A'a*
ag\
+gj
a*y
and since we have assumed that F and G are twice continuously differentiable for x G [—R,R], that both functions are (at least) bounded for t G [0,1], and that ^ is bounded away from zero for all x G [—R, R] and all t G [0,1]. We claim that r,,- is the sum r
2,i -
16°j
\
+ 64
J J+1
4l+n-i * V
1= 7 + 1
i
n-20„-i
/ n
fc( ) ;= The reader is invited to check this himself. Since each bj is bounded indej
pendently of n and j by a constant C, we can bound rj ,• n-l
■•fflisis
»+EJR i=J+l
<*■
Finally, the difference between two successive terms r^"* as n tends to infinity: lim | | 7 > + 1 ) - 7 > j | | = lim
1
p ...p
— r^j goes to zero
J,("+1) < lim
r = 0.
n-+oo 4 n — ^
Therefore, the limit as n tends to infinity of Tj"- exists and the convergence is of order O ( ^ ) . ' n Similar arguments can be made for the other terms r$j, t\J, and t£j.
□
220 A. C. Gilbert Remark. We should point out that F must be Lipschitz as a function of t so that we may expand the recurrence relations for gy' and />"' in a series in Sj. In other words, the product of D6oj+i(s)(k) and 6? must remain of order 0(6j) for the series expansion to be valid. So, the ratio below must be bounded by a constant for all fc = 0 , . . . , 2 J — 1 \fj+i(s)(2k
sup
xe[-R,R]
+ l)-fj+1(s)(2k)\
^c>
fy+l
which holds if F is Lipschitz in t. If F is merely bounded in t, the order of the product D60j+i(s)(k) is 0(Sj). That is, for all k = 0 , . . . , 2j - 1, we have
" i€|-fi,fi]
and <5?
|/ J+1 («)(2fc + l ) - / J + 1 ( « ) ( 2 f c ) | C x —x d d J+i i+i
This means that our series expansions can no longer be expanded in powers of 5? but must be expansions in powers of Sj instead. To derive the series expansion for the recursion relations for non-Lipschitz systems, we follow the derivation presented in Sec. 3.2 but we begin with the assumption that the functions gj+i(sj+\) and fj+i(sj+i) at scale j + 1 are grouped by powers of 5j rather than by powers of 5?: 9i+i(t>j+i) = 7o,j+i(sj+i) +-n,j+i(sj+1)5j+1
+72,J+I(SJ+I)<^+I
/j+i( s j+i) = 9o,j+i(sj+i) +^1^+1(^+1)^+1 +
02,j+i(sj+i)6j+1.
We carry out the derivation as before and determine 9j(sj) = 7oj(*i) + l\Asi)si
+ 72,j0»j)<*j ,
fA*i) = 9oA*i) + hAsj)SJ
+ *aj(«i)«? -
where (in operator form) 7o,j =
Pjloj+i,
&oj = Pj6o,j+i,
,
Multiresolution Homogenization
72,i - PjKj+i
hj
+ ^jEjQfrj+x
= 2^j&l^+1
+
+ -VoAPAj+i
+
Schemes 221
EjQn'o,j+i)
$f>jT)o,jEjQj6'0j+1,
h j = J - P j ^ j + l + ^Mo.j-EjQjfli.j + l + g*j»?lj^,-Qi^o,i+l + ^(%,i)2P^o,J+i, 32' %,>
Pj6o,j+i - EjQjlo,j+i Pj7o,j+i PjOo,j+i - EjQjfo,j+i
fi,i =
+ PjQ\,j+i -
SjEiQjffoj+i'lojH
Pjloj+i
As in the derivation of the recursion relations for the Lipschitz systems, higher order expansions may be obtained in the same manner. 3.3.3. Linear ODEs and convergence issues We now restrict our attention to linear ODEs of the form
±(G(t)x(t)+q(t)) = F(t)x(t),
te[0,l]
and address the convergence of the series expansions for gj and / , , the general formula for the recurrence relations for the ith-order coefficients jij and 6ij in these series, and the convergence of these recurrence relations. Let G and F denote the linear operators G(t)x(t) and F(t)x(t) respectively. If we begin with linear functions <7J+i(s) and fj+i(s) at scale j + 1 of the form
t=0 J t=0
222 A. C. Gilbert
then the reduction of these functions to linear functions gj(s) and fj(s) at scale j preserves the form of the functions above. For linear ODEs, the recurrence relations (3.38) for 7 ^ and Oij do indeed have a closed form and are given in operator form by
t-i +
£ 4 - < i + 1 > + V ; ( ^ - ( < + i ) , ; + i + £;Q;7i- ( / + i),,- + i), 1=1
hi = hWw „. . -
P
+ X>- ( i + 1 ) + WWi- ( J + i)j+i.
A J + 1 ~ EjQtfiJ+l ■OToj+l
_ V4-(<+1>+'r7, • l=1
KEiQj0'i_(l+1)J+1+tPJT'i_ltj+l
■Pj'Toj+i
Note that each of these operators must be composed of P * so that they map the space Vj to Vj. These are initialized as 7o,n = <7n and 0o,n = /n and all higher order terms are set to zero. There are several important observations to make here. First, the recur rence relations tell us that 7 ^ and $ij depend only on sums and products of lower order terms 71^, 9ij (and their derivatives), and nij for 0 < I < i — 1. Second, the quantity TJIJ depends only on sums and products of the terms jij and 8ij (and their derivatives) for 0 < / < i. The denominator in rjij is always Pj7o,j+i» W Q i c n is simply P , G . (We must assume that G(t) is bounded away from zero.) Theorem 3.3.2. Let us assume that G and F are the linear operators whose action on functions is pointwise multiplication by the functions G and F. Suppose that G(t) and F(t) are bounded for all t G [0,1] and that G(t) is bounded away from zero. Then we can conclude that the limits limn_foo 7<," , limn-»oo #1" and lim n _ +00 77}"' exist and are finite. Proof.
We
can
easily
check
that
limn^oo
y^J = PjGPJ
and
l i m ^ o o t f ^ = P , F P / . We also have the limits l i m „ _ > 0 0 7 ' ^ = PjGP* and
Mtdtiresolution
limn-foo QQJ — PjFPj,
Homogenization
Schemes
223
which are constant operators. Additionally, we know
that linin-voo %"• exists and equals
PiFP'-EjQiGP*
(n)
hm % „• =
„ ^^
—.
Let us now assume that the limits limn-^*, 7 j " , limn-Kx, 0," , limn-Kx, 77}" and the limits of their derivatives exist, and that the coefficients 7/" , 0}y, and 77}"' are bounded uniformly in n and j , for 0 < / < i - 1. We will show that 0,-y is bounded uniformly in n and j and that the difference between #j*J and 0>y tends to zero as n —► 00. Let b\j denote the sum i-l
b^ = V4 _ ( i + 1 ) + i T7, ( T , ) B O (0{n)
V
i=l
We use b\j to simplify the recurrence relation for 0>y: fl(n) _
1 u(n)
* p /j(")
We apply this recurrence relation and obtain a general expression for 0\n(which the reader is invited to check):
fc=j+i
Since b\j depends only on the sum and products of lower order terms, we know that it is bounded uniformly in n and j . Therefore, 9\j is also bounded uniformly. The difference between two successive terms 0;"- ' and #;"• can be written in three pieces as /}(«+!)
fl(")
_ /t("+l)
J,(n)\,
n-l V*
X 4i
p
p,
,/A("+ 1 )
*=j+l i(n+l)
Let us examine each piece individually beginning with the first:
/,(n)\
224
A. C. Gilbert
1=1
Since we assumed that the lower order terms 7/"', fljy and 77,"' (and thenderivatives) for 0 < / < t — 1 converged as n -> 00, we can conclude that the above difference goes to zero as n tends to infinity. In particular, for n sufficiently large, ||&i"+ — b\j\\ can be made as small as we want, uniformly in j . Thus, the second piece can be shown to tend to zero. Finally, the last piece tends to zero as well since 6>^- ' is bounded. This proves our claim that the limit as n -> 00 of 9l/ exists. We must still show that the limit of the derivative exists. Observe that the derivative of #{"• is given by
.*.«<»> - iLfcW ++ V dx *<* ~ dx ^
^
1
P 3... P t , —h(n)
4«(*-»
^ d x
<•*'
k=j+l
where
£ «
- p-(i+1)+livuEjQM-UJ+J
and
^ . - ^ d ( ^iQ 3 (e{ l+1 ), J+ i)'+^(7a J+1 )' f^
dx^'J
p.tJ.*)
V
A similar argument, using our initial observations and the inductive hypothesis, shows that the limits of ^6\", , 7 ^ ' and jzJij exist. Thus, we may conclude that the repeated appUcation of the reduction procedure converges and that we may make the initial resolution infinitely small, eliminating the discretization error in our effective system. We have made no comment, however, about the convergence of the solutions to these effective equations. □ Now we address the convergence of the series expansions for gy' and / > ' .
Multiresolution Homogenization Schemes
225
Theorem 3.3.3. Let us assume that the series
<#?i = E'rS)+i*&i
and
(3-41)
/i?i = E«Sli«r + i
i=0
i=0
converge and have radius of convergence R. is
bounded
Let us also assume
uniformly
by
C\
and
that that
is bounded uniformly by C?,. We stipulate that 11%" II ^ ^o- Finally, we assume that C = max(C\,C2) satisfies the inequality (1 + C) < 22d+1\ With these assumptions, we can conclude that the series for gf,f\n) and^ oo
oo
oo
i=0
»=0
i=0
converge. Note that we retain the initial level of resolution n and n does not tend to oo. Proof.
(»)
First, because we assumed that
pA-iS!£y
QM-\i+».J+lY+4P>rt"-\.J+1Y 'J(C)' » by by C2, we may bound the term rfcj
uniformly by C\ and that
is bounded
is bounded uniformly
•-1
||ifci-i|| < ^
+ C 2 ^ 4 - ( < + 1 ) + ' | | ^ _ 1 | | < cU-* 1=0
^
+ 5>-+Wi||) > J=O
'
where C = max(Ci,C2). Next, because we assumed that 11 T7OTJ H — ^ ° ' w e show that 7/)"- is bounded by "iterating" the above inequality to obtain:
bS'n s c(i + £) I (';') £ < (1 + £) go + o-. To determine the radius of convergence flj of dj, we calculate a (l+Co/^C^l+C)1"1/* — — — a = lirnsup^lTj^ljII) 1 /* = limsup/ ■ i—>oo
1+ C
can
226
A. C. Gilbert
and set Rd — 1/a = 4/(1 + C). Therefore, if 6? < Rd, the series is convergent and if 6? > Rd, the series is divergent. To have a convergent series, C must satisfy (1 + C) < 22(J+1), which C does indeed satisfy. Using this estimate, we examine the series for gy' and fj
. Let us assume
that HP^y+iH and ||Q>(«{2(i+i)a-+1)'II are uniformly bounded by C. we can bound 9$
II^II
Then
by
(A-+g4-<*'>«c<1+c°/ff1+c>'
t-i
\
/
C"
s-jrd + c)1We calculate the radius of convergence of fy' 2 J+1
same condition (1 + C) < 2 (
) for / j
n)
and find that C must satisfy the
to converge. A similar calculation
holds for gy' with the same result. 3.4.
Implementation
and
□
examples
In this section we present the numerical implementation of our formal reduction procedure, which we derived in Sec. 3.1, and three examples to evaluate the ac curacy of our reduction methods and to explore "patching" together the series expansion of the recursion relations and the numerical reduction procedure. We also determine numerically the long-term effect of a small perturbation in a nonlinear forced equation. 3.4.1. Implementation of the reduction procedure We initialize our numerical reduction procedure with two tables of values, one table for each of the discretizations of the functions F and G at the starting resolution level n. The first coordinate k in our table enumerates the averages in time of the functions F and G, the functions gn(sn)(k) and fn(s„)(k), for fc = 0 , . . . , 2 - n — 1. Notice that these are still functions of sn which is unknown, so we also discretize in s„. In other words, from the start, we look at a range of possible values sn(k, i) (i — 0 , . . . , N — 1) for each k, and work with all of them
Multiresolution Homogenization
Schemes
227
together. This discretization gives us the second coordinate i for our tables. We then have the values gn(sn(k,i))(k) and fn(sn(k,i))(k) for k = 0 , . . . , 2 n - l and i = 0 , . . . , N - 1. To look at a range of possible values in sn(fc), we must have some a priori knowledge of the bounds on the solution of the differential equation. Next we form the equation (dropping the subscript "n") which determines d on the interval fc£n-i
(3.42)
Notice that this is a sampled version of Eq. (3.17) and for each sample value s(k, i) and for each k = 0 , . . . , 2 n _ 1 - 1 we must solve Eq. (3.42) for d(k, i). That is, our unknowns d(k, i) form a two-dimensional array. To solve for each d(k, i) we must interpolate among the known values g(s(k, i))(k) since we need to know the value g(s(k,i) + |d(fc, i))(2fc + 1) (and similarly for g(s(k,i) — ^d(k,i))(2k)) and we only have the values at the sample points s(k,i) for i = 0 , . . . , N — 1. For higher order interpolation schemes, we need fewer grid points in s to achieve a desired accuracy which reduces the size of the system with which we have to work. Once we have computed the values d(k, i), we calculate the reduced tables of values gn-i(s(k,i))(k) and / n _i(s(fc,i))(fc), where k = 0, . . . , 2 n _ 1 - 1 and i = 0 , . . . , N—l, according to the sampled versions of the recurrence relations (3.20) and (3.21): gn^(s(k,i))(k) = Sgn(k)(s(k,i)J{k,i)) + /^(sik^Xk)
5
-fDfn(k)(s(k,i),d(k,i)),
= Sfn(k)(s(k,i)J(k,i)).
Notice that the tables are reduced in width in k by a factor of two and that this procedure can be applied repeatedly. Remark. Observe that when i = 0 (respectively, t = N — 1), we cannot in terpolate to calculate the values g(s(k, i) — |d(fc, i)) (respectively, g(s(k, i) + |d(fc,i))). We must either extrapolate (and then ignore the resulting "bound ary effects" which propagate through the reduction procedure) or adjust the grid in the s variable at each resolution level. An alternate approach could be to use asymptotic formulas valid for large s. We implemented this algorithm in Matlab as a prototype to test the fol lowing examples.
228
A. C. Gilbert
3.4.2. Examples With the first example we verify our numerical reduction procedure and de termine how the accuracy of the method depends on the step-size <5n = 2 _ n of the initial discretization. We also evaluate the accuracy of the linear versus cubic interpolation in the context of our approach. We use a simple separable equation x'(t)=(-jx2(t)cos(-j
and
x(0) = x0
(3.43)
with the solution available analytically. We observe that the solution x(t) to Eq. (3.43) exhibits behavior at two scales. We choose s = l/(47r) and the initial value xo = 1/2. The exact solution is given by XQ
X{t)
~
l-xcB^^fe)'
which we use to verify our reduction procedure. In particular we check if the averages of x(t) satisfy the difference equation derived via reduction. Let us assume that we reduce to resolution level 6j = 2~* so that we have two tables of values for fj(s(k, i))(k) and gj(s(k, i))(k). If Xj(k) is the average of x over the interval fc2J < t < (k + 1)2^, then the following equation should hold
*•(*;)(*) = *j £ /;fe)(fc') + siM*m ■ fc'=0
*
We denote by ej(k) the error over each interval kSj
=
9A*m - *j E fji'iW)
+ 1)5j and define
*,
- y/i(*i)(*)
Jb'=0
Note that we have only sampled values for gj(s(k, i))(k) and fj(s(k, i))(k) and so we must interpolate among these values to calculate 9j(xj)(k) for a specific value Xj(k). We want to know how the errors ej(k) depend on the level of resolution at which we begin the reduction procedure. We reduce to resolution level with Sj = 2 _ 1 and calculate the errors e^(0) and ej(l) using the averages Xj(0) — Xj(l) = 0.5774. We fix the number of sample points in s to be 50 and use linear interpolation. Table 3.1 lists the errors as a function of the initial resolution. If we exclude the errors associated with the initial resolution
Multiresolution Homogenization Schemes 229 Table 3.1. Errors as a function of the initial resolution. Initial resolution = <5„
Average error
2-2
0.0774
2"3
0.0290
2
_4
0.0069
2
-5
0.0019
Sn = 2 - 2 and plot the logarithm of the remaining errors asafunction of log(<Sn), the slope of the fitted line is 1.9660. We can conclude that the accuracy of our numerical reduction scheme increases with the square of the initial resolution. As we described above, we must interpolate between known function values in the tables. We can use the built-in Matlab linear or cubic interpolation routines. We would like to know how the interpolation affects the error of the method and the minimum number of sample points in s we need for both interpolation methods. We use Eq. (3.43) again with the same values for xo and e. We fix the initial resolution at Sn = 2 - 5 . For technical reasons, with cubic interpolation we can reduce only to resolution level Sj = 2 - 2 . Table 3.2 lists Table 3.2. Error as a function of the number of sample points in s, with linear interpolation and with cubic interpolation. No. of sample points in s
Average error linear
cubic
6
0.0238
0.0045
10
0.0098
0.0020
15
0.0052
0.0020
25
0.0029
0.0020
30
0.0024
50
0.0019
— —
the errors as a function of the number of sample points in s for both linear and cubic interpolations. In Fig. 3.2 we have plotted the average error as a function of the number of sample points in s for the two methods of interpolation. We can see that with cubic interpolation the minimum number of grid points in s is
230
A. C. Gilbert 0.025
^
1 r^ Inev interpolation cubic Interpolation
0.02
0.015
I 0.01
0.005
"5
_i
10
i_
15
20
25
30
3S
-i
40
i_
45
50
grid points in x
Fig. 3.2. The error as a function of the number of sample points in a for linear and cubic interpolation methods
15 and that with linear interpolation we can achieve the same accuracy with 50 grid points. We can also see from the graph that increasing the number of grid points (past 15) will yield no gain in the accuracy of the cubic interpolation method. In the second example we will combine the analytic reduction procedure with the numerical procedure. We begin at a very fine resolution <5„0 = 2"° and reduce analytically to a coarser resolution level <5ni = 2™. From this level we reduce numerically to the final coarse level 6j. The analytic reduction procedure is computationally inexpensive compared to the numerical procedure and we want to take advantage of this efficiency as much as possible. However, we must balance computational expense with accuracy. With this example we will determine the resolution level <5ni at which this balance is achieved. Again we use a separable equation given by x'(t) = x2{t)cos(-), \e J The solution to Eq. (3.44) is
x0 = 0.1,
e= —. 47r
(3.44)
Multi-resolution Homogenization
*(«) =
Schemes
231
Xo
1 — exo sin(t/e)
We begin with analytic reduction at resolution Sno = 2 - 1 0 . We choose the final resolution level to be Sj = 2 - 2 and we let m , the resolution at which we switch to the numerical procedure, range from 2 to 5. Table 3.3 lists the errors as a function of n i . Note that we have used cubic interpolation and ten grid points in x. Figure 3.3 is a graph of the average error as a function of Table 3.3. Errors as a function of the intermediate resolution. Intermediate resolution = nj
Average error
2-a
0.00106
2~3
0.00093
_4
0.00092
2-s
0.00092
2
0.00106
0.00102
0.001 -
0.000M -
0.0O0M -
0.00082
0.25
Fig. 3.3. The error as a function of the intermediate resolution level at which we switch from the analytic reduction method to the numerical reduction method.
232 A. C. Gilbert.
the intermediate resolution. We can see from this graph that the biggest gain in accuracy occurs at the intermediate resolution Sni = 2 - 3 . In other words, at the finer intermediate levels (n\ = 4,5) we get a small gain in accuracy compared to the computational expense of the additional resolution levels in the numerical reduction. To balance accuracy with computational time for this particular example, we should reduce analytically to resolution 6ni = 2 - 3 and then switch to the numerical reduction to reach the final level Sj = 2 - 2 . The analytic procedure allows us to reduce our problem with very little computational expense (compared to the numerical procedure) and then for the additional accuracy needed we can use only one relatively more expensive numerical reduction step. The third example we will consider is the equation x'(t) = (l~x2(t))x(t)
+ Asm(t/e),
x(0) = x 0 ,
(3.45)
where c is a small parameter associated to the scale of the oscillation in the forcing term. If the amplitude A = 0, then the solution x(t) has one unstable equilibrium point at Xo = 0 and two stable equilibria at x<> = —1,1 (see Fig. 3.4). 1.5
1
0.5
f
0 -0.5
-1
0
0.5
1
1.5
2
2.5 t-timt
3
3.5
4
4.5
Fig. 3.4. The flows for Eq. (3.45) with zero forcing.
5
Multi-resolution Homogenization
Schemes
233
A small perturbation in the forcing term will affect large changes in the asymptotic behavior as t tends to infinity. Therefore, the behavior of the solution on a fine scale will affect the large scale behavior. In particular, if the amplitude A is nonzero but small, then the solution x(t) has three periodic orbits. Two of the periodic orbits are stable while one is unstable (see Fig. 3.5).
Fig. 3.5. The flows for Eq. (3.45) with small but nonzero forcing. Notice that there are three periodic orbits, two stable and one unstable.
As we increase the amplitude A, there is a pitchfork bifurcation — the three periodic orbits merge into one stable periodic orbit (see Fig. 3.6). We would like to know if we can determine numerically the initial values of these periodic orbits from the reduction procedure and if those periodic solutions are stable or unstable. We will compare these results derived from the reduction procedure with those from the asymptotic expansion of x for initial values near xo = 0 and for small e. Let us begin with the asymptotic expansion of x for small values of e. Assume that we have an expansion of the form *(*; e) ~ 0 + exi (t, T) + e2x2 (*, T) + • • • ,
(3.46)
where the fast time scale r is given by T = t/e. If we substitute the expan sion (3.46) into Eq. (3.45), we have the equation
234
A. C. Gilbert
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Fig. 3.6. The flows for Eq. (3.45) with large amplitude A. Notice that there is only one (stable) periodic orbit in this diagram as the system has undergone a pitchfork bifurcation.
dxi
+£
V dt
dr )
.Asinr + exi +
0{e2).
Equating terms of order one in e, we have ^ = A sin T, which has the solution x\{t,r) = —Acosr + w(£). The function w is determined by a secularity condition which we impose on the terms of order e. Equating the terms of order e gives us the equation —— = — A cos T + (j(t) — u)'(t). OT
The non-oscillatory term u — ui' in the above equation is "secular" because if it were nonzero, we would have a linear term in r which is incompatible with the assumed form of the expansion (3.46). Therefore, we set this term equal to zero, u) — w' = 0, and determine that v(t) = C\£. So we have obtained an asymptotic expansion for x X{T;E)
~ 0 + £(-^4cosT + C i e t ) .
(3.47)
Note that this asymptotic expansion is valid only for t < | log e |. We can, however, determine the behavior of x for large time by examining the direction
Multiresolution Homogenization
Schemes
235
of the growth in x since the direction signifies which stable periodic orbit (1 or — 1) captures the solution. Observe that the sign of the coefficient C\ depends on the initial value xo. In particular, if xo > — Ae, then C\ > 0 and if xo < — Ae, then C\ < 0. In other words, if £ is sufficiently small, there is a separation point xj", defined as the largest value such that if xo < XQ, then x(t) < 0 as t tends to infinity. According to the asymptotic expansion (3.47), the separation point XQ as e goes to zero is given by XQ
~ —Ae.
This is an approximation of the initial value of the unstable periodic solution. Let us derive another approximation for the separation point by linearizing Eq. (3.45) about xo = 0. The linearized differential equation is the equation x'(t) = x(t) + A sin(t/e),
x(0) = x 0 ,
which has the solution x(t) given by
x ++ /
"i ° J!
x(t) = et(x0
Ae
t
a\n(s/e)ds
The sign of the factor x 0 4- / 0 Ae~* sin(s/e) ds as t tends to infinity determines the direction of growth in x(t). In other words, the separation point XQ is the value for which the following is true lim XQ+ Ae * sin(s/e) ds = 0. - °° Jo
t +
If we evaluate the integral in the above expression, we determine that xg sat isfies Ae2 lim x'0 + —-^eT1 t-Kx> 1 + e*
Ae sm{t/e) - — ^ (z e _ t cos(*/£) - 1) = 0. 1+ e
Thus the separation point is given by xX =
—Ae x 1 + e*
for e sufficiently small.
We have derived two approximations for the initial value near xo = 0 of the unstable periodic orbit. We will compare these two approximations with the values we determine numerically from the reduction procedure.
236 A. C. Gilbert
We now turn to the numerical reduction procedure. Assume that we can reduce the problem to a resolution level 6j = 2 _ J where it no longer depends on time (i.e. the problem is now autonomous). This means that the tables gj(s(k,i))(k) and fj(s(k,i))(k) depend only on i and not on k. Let Xj(k) denote the average of the solution x over the interval kSj < t < (k + 1)5j. Observe that for Eq. (3.45), the functions G and F are given by G(t, x{t)) = x(t) - x0
and
F(t, x(t)) = (1 - x2(t))x(t)
+ A sin(t/e)
so that the initial value xo is simply a parameter in the numerical reduction scheme and we may take G(t, x(t)) = x(t). If the solution x(t) is periodic and if 5j is an integer multiple of that period, the averages Xj(k) will all be equal to the value xe (call that the average); i.e. Xj(l) = Xj(2) = • ■ ■ = xe. Therefore the value of gj(-){k) at each average Xj(k) is the same: 9j(xj)(l)
= gj(xj)(2) = •■• =
gj(xe).
Since this holds for all k, we will drop the parameter. We will also drop the subscript j for clarity. If we take the expressions for g evaluated at two successive averages x(l) and x(l + 1) and subtract them, we find that / ( x e ) must satisfy 0 = g(xe) - g(xe) = g(x(l + 1)) - g(x(l)) = 6-(f(x(l + 1)) + f(x(l)))
= /(*«).
This gives us a criterion for finding the average value xe. We know that the average value of the periodic solution x is a zero of / . Finally, the separation point XQ is the initial value such that g(xe) — XQ — 0. To determine if the separation point XQ is stable or unstable, we will perturb it by a small value X. Set the new initial value xo equal to xj + ^- L ^ (Ax e )j denote the deviation from the average value x e in the average of x over the interval 16j < t < (I + 1)6j. Then, the discretization scheme relates the difference between (Ax e )j and (Ax e )j + 1 : g(xe + (Ax e ) / + i) - g(xe + (Ax«)j) = ^ ( / ( ^ + ( A x e)/+i) + /(*« + (A*.)/)). If we linearize the above equation, the following holds: 5'(xe)((Axe)/+1
- (Ax e ) ( ) = | / ' ( x . ) ( ( A x « ) , + i + (Ax.),),
or equivalently, we may use the ratio
Mtdtiresolution Homogenization Schemes 237
(Ale)
'
fix.) - §/'(*.)
to test the stability of the separation point xjj. Table 3.4 lists several values for e, the amplitude A, and the corresponding average values xe for the periodic orbits, separation points, and ratios. The separation point which has a corresponding ratio greater than one is the un stable periodic orbit with initial value XQ. We reduce to a level where the problem is autonomous and use cubic interpolation. We compare the cal culated separation points with those determined by the two analytic meth ods. The first number in the errors column is the error in the asymptotic method and the second number is the error in the linear method. Notice that for the values A = 40 and e = l/(87r) we have only one stable peri odic solution. In other words, the two stable periodic orbits have merged with the unstable one to create one stable periodic solution. Clearly, this merging of solutions shows that the fine scale behavior of the solution has a large effect on the coarse scale (or long time) behavior. Furthermore, we Table 3.4. The entry x e is the value of x(t) for the corresponding initial value xj$, which we call a separation point. The ratio tells us if the separation point is stable or unstable. These three columns are calculated using the effective equation. We also calculate the separation point closest to XQ = 0 with an asymptotic method and a linear method. £
A
Xe
4.0 x 1 0 - 7 1 16ir
1 16w
1 16ir
1 8ir
1 8*
1
10
20
1
Separate points Ratios -0.0199
x j (asymp.)
x'0 (lin.)
errors 1.0 x 1 0 - *
1.1354 0.7796 -1.989 x 1 0
-2
l 16*
1.0006
0.9807
-1.0006
-1.0204
0.7796
6.0 x 1 0 - 6
4.0 x 1 0 - 6
-0.1989
1.1276
< 1.0 x 10" 5
0.9746
0.7759
0.7868
-0.9746
-1.1732
0.7868
4.0 x 1 0 - 6
-0.3978
1.1056
0.8927
0.4951
0.8185
-0.8927
-1.2901
0.8186
3.0 x 10" 6
-0.0397
-0.1989
4.37 x 10" 4 < 1.0 x 1 0 " 5 -0.3978
20 16*
8.7 x 1 0 - 5
1.1345
1.50 x 1 0 - *
0.9991
0.9595
2 0.7815 -3.985 x 1 0 -
-0.9991
1.0387
0.7815
-1.5891
0.7594
40 - 1 . 4 X 10-*
10 16>r
1 Sir
8.9 x 1 0 " s -1.592
—
0.0029
238
A. C. Gilbert
have detected numerically this large effect. Note that we had to resort to a different asymptotic expansion from the one used previously to determine the separation point for A = 40 and e = l/8ir. 3.5.
Homogenization
In the previous sections we discussed only the MRA reduction procedure for nonlinear ODEs. In this section we construct the MRA homogenization scheme for nonlinear ODEs. In the multiresolution approach to homogenization, the homogenization step is a procedure by which the original system is replaced by some other system with the desired properties (perhaps a "simpler" system). By making sure that both systems produce the same reduced equations at some coarse scale, we observe that as far as the solution at that coarse scale is concerned, the two systems are indistinguishable. We should emphasize that this is a preliminary investigation of the homogenization method for nonlinear ODEs. Homogenizing a nonlinear ODE is a difficult and subtle problem. It is not even clear what constitutes a "simpler" equation. Suppose we reduce our problem to level j , using the series expansion of the recurrence relations, and have a discretization of the form fc-i 9j(Sj)(k) = 6j £
s
MSJW)
+ -±Msj)(k),
(3.48)
z
fc'=o
where the functions gj(sj) and fj(sj) are expanded in powers of Sji 9iisi)(k)
= 70j(*j)(*) + 7ij(*i)(*)* J ?
fi(»i){k)
= 0oAsj)(k)
and + 0iAsi)(W*
■
We want to find 2 ' functions G(s)(k) and F(s)(k) (indexed by k = 0 , . . . , 2 _ J ; 1) with expansions G(s)(k) = G0(s)(k) + qdi (s)(k)
and
F(s)(k) = F0(s)(Jfe) + 5?Fi (*)(*)
such that for each k and all Sj € V, we have 9i(»i)W = 7b(«i)(*) + #tt(«i)(*) = G0(sj)(k) + ^Gi(*;)(*), (3.49)
fiMW
= «o('i)(*) + ^*i(«i)(*) = F0(sj)(k) + ^Fi(*,-)(*),
Multiresolution Homogenization
Schemes
239
where U A ;
24\G'0(x)(k)J
°y
>K
'
12\G'0(x)(k)J °K ,K '
and
In other words, on each interval (fc)2 -J < t < (k + 1)2 _ J we want to find two functions G(x)(k) and F(s)(k) which depend only on x such that the reduction scheme applied to these functions on each interval yields the same discretization (3.48) as the original. We know what the fixed point or limiting value of the reduction process for autonomous equations is so we may use this exact form to specify Gi(x)(k) and Fi(x)(fc) in terms of Go(x)(k) and Fo(x)(k). We can eliminate Gi(x)(k) and Fi(x)(k) from Eqs. (3.49) to get the following coupled system of differential equations for each k
/,-(x)(fc)-F 0 (s)(fc)
**
=
1 /Fo(x)(A:)\2F (x)(fc)
24 K^m)
°
•
We may pick out the non-oscillatory solution to the system of differential equa tions and obtain
*-* + <-»-5®'*-n(S) < 0' A-*+<*-ii($) , 4 This homogenization procedure will yield a simplified equation which is au tonomous over intervals of length 2 - J and whose solution has the same average over these intervals as the solution to the original, more complicated differ ential equation. One can replace the original equation by this homogenized
240
A. C. Gilbert
equation and be assured that the coarse behavior of the homogenized equation is identical to the coarse behavior of the original solution. 3.6.
Conclusions
We can extend the MRA reduction and homogenization strategies to small systems of nonlinear differential equations. The main difficulty in extending the reduction procedure to nonlinear equations is that there are no explicit expressions for the fine scale behavior of the solution in terms of the coarse scale behavior. We resolve this problem with two approaches; a numerical reduction procedure and a series expansion of the recurrence relations which gives us an analytic reduction procedure. The numerical procedure requires some a priori knowledge of the bounds on the solution since it entails using a range of possible values for the solution and its average behavior and working with all of them together. The accu racy of this scheme increases with the square of the initial resolution but it is computationally feasible for small systems of equations only. We can use the reduced equation, which we compute numerically, to find the periodic orbits of a periodically forced system and to determine the stability of the orbits. One reduction step in the analytic method consists of expanding the recur rence relations in Taylor series about the averages of the solution. We gather the terms in the series which are all of the same order in Sj, the step size, and identify them as one term in the series so that we have a power series in 5j. Then we write recurrence relations for each term in the series so that the nonlinear functions which determine the solution on the next coarsest scale are themselves power series in the next coarsest step size <5j_i. We determine the recurrence relations for an arbitrary term in this power series, show that the recurrence relations converge if applied repeatedly, and investigate the conver gence of the power series for linear ODEs. The homogenization procedure for nonlinear differential equations is a pre liminary one. We replace the original equation with an equation which is autonomous on the coarse scale at which we want the solutions to agree. If we are interested in the behavior of our solution only on a scale 2 - J , then our simpler equation which we use in place of the original equation does not depend on t over intervals of size 2 - J . Unlike the linear case where a con stant coefficient equation (or an equation with piecewise constant coefficients) is clearly simpler than a variable coefficient equation, it is not clear what kind of "simpler" equation should replace a nonlinear equation. We present one candidate type for a simpler equation.
Multiresolution Homogenization
Schemes
241
4. Steady-States of a Model Reaction-Diffusion Problem Spontaneous pattern formation in physical and biological systems is a major current area of research. Many researchers in chemistry, chemical engineering, physics and mathematics study pattern formation in reaction-diffusion systems and their models. For instance, Bar et al., in Ref. 2, perform physical and numerical experiments to study how microstructured and composite surfaces affect pattern dynamics during the oxidation of CO on Pt surfaces. They find that when the scale of the heterogeneity is large compared to the wavelength of the spontaneously arising structures, the interactions of the patterns with the boundaries dominate the system and when the heterogeneity is very small, the system exhibits effective behavior. Motivated by these experiments and others, Shvartsman et al., in Ref. 21, present a numerical study of the pattern formation on model one-dimensional reactive media. The surface in this one-dimensional model is a periodic interval of period or ring length L. They vary the geometry of the composite and use the length of the medium as a bifurcation parameter to explore dynamic patterns. Shvartsman et al. use the Fitzhugh-Nagumo equations as a model excitable reaction-diffusion system:
_ = -„» + „ - „ + _ ,
(4.1)
dv — = e(u - aiv - a0).
(4.2)
The parameters ao and aj represent the catalyst activity or kinetics of the reactants (and depend on x). The parameter £ is a ratio of time scales. The com posite surface is made of two components which individually satisfy Eqs. (4.1) and (4.2) and which have individual kinetic parameters OQ and a\. Through diffusion the two components interact and we take the diffusion constant on both components to be equal, to match experimental observations. Since the variation in catalyst activity in the experiments (the spatial dependence of the kinetic coefficients in the reaction-diffusion equations) is abrupt, the model composites are designed to look like striped media. The stripes are (almost) step functions which model (almost) step changes in ac tivity. The transition between the two components at Xo is modeled by several different smooth cutoff functions; e.g.,
to(x) = a0,b + ° M " °°* (l + tanh ( ^ ) )
•
242 A. C. Gilbert
300
Fig. 4.1. This is a graph of 256 samples of the parameters ao and a\ with base values —0.4 and 2/3 (respectively) and defect values 0.65 and 4/3 (respectively).
The constant an,6 is one of the kinetic parameters for the base component and ao,d is for the defect component. Shvartsman et al. studied several defect stripe configurations, including two and four symmetric stripes and asymmetric stripes. See Fig. 4.1 for a graph of the parameters ao and a\ with the above transition function and four symmetric stripes. The reader can check that approximately 20% of the surface is covered by the defect component, that there are eight transitions between the defect and the base components, and that approximately 80% of the surface is covered by the base component. 4.1. Setting
the
stage
We restrict ourselves to exploring steady-state solutions of the FitzhughNagumo Eqs. (4.1) and (4.2). The steady-state solutions u(x) satisfy a secondorder differential equation
Multiresolution Homogenization
Schemes
243
with periodic boundary conditions u(0) = u(L) and ^ ( 0 ) = ^(L). Note that the steady-state solution v(x) depends algebraically on u(x), v = (u — ao)/ai, and we have used this algebraic relationship to eliminate v from Eq. (4.3). We put four stripes on our surface. As the number of stripes is fixed and because the defect remains 20% of the surface regardless of the length L of the surface, we construct the kinetic parameter ao as follows. We first construct ao(x) for x € [0,1], then we rescale x G [0, L] and take ao(x/L) as one of the coefficients in Eq. (4.3). We model the transition by the smooth cutoff function t0(x) = a0,6 -I
=—-—— ( 1 -I- tanh(647rx)
■
We use tanh(647ri) because we want a sharp transition. In order to have an integral number of average values at the initial level of resolution for each part of the parameter profile (base, defect and transition), we must begin with a res olution no coarser than 5n = 2 - 8 . Notice that once we have constructed ao(x) a for x G [0,1] and computed the initial averages ao,„(fc) = 2 n J2-"k o ( I ) ^x> we may use the same average values for the initial discretization of ao(x/L) with x e [0, L] because 2n
r2-n(k+l)L
— / L> J2~nkL
/^""(fc+l)
ao(x/L) dx = 2" /
ao(x)dx.
J2~nk
This observation will be crucial in Sec. 4.2.3. We will answer the following two questions: • Can we characterize averages over the interval [0, L] of the steady-state so lution^) in terms of the period length LI Numerical results obtained by Shvartsman (Ref. 20) show that the steady-state undergoes a bifurcation at L = 47.5?r. In fact, at this bifurcation four pairs of eigenvalues (two distinct pairs and one pair with multiplicity two) move transverse to the imaginary axis into the right half-plane, crossing the imaginary axis. (For a Hopf bifur cation, a pair of eigenvalues moves transverse to the imaginary axis, crossing the imaginary axis, at nonzero velocity.) • What is the complexity of the reduction algorithm for this example and how does it compare with the pseudo-spectral method used by Shvartsman et al. (Refs. 20 and 21)?
244 A. C. Gilbert
Before we can proceed, however, we must develop several new techniques in addition to the methods discussed previously. • We must be able to reduce a small system of ODEs (i.e. we need the recur rence relations for an n-dimensional system). • We must also be able to reduce a boundary value problem. All of our previous derivations address initial value problems only. • Finally, we must determine how to incorporate the period length L into our reduction procedure without recomputing the effective equation for each new value of L. We present these techniques first in the following section and then discuss the results of their application to this problem. 4.2. New
techniques
Equation (4.3) to which we want to apply our reduction method is a secondorder nonlinear equation. We can write this equation as a system of two first-order equations
|
—
<">
— =u3- 1 ax \
u aj J
(4.5) a\
with the periodic boundary conditions u(0) = u(L) and w(0) = w(L). As in the previous sections, we must rewrite these equations (4.4) and (4.5) as a system of integral equations before applying the reduction procedure. The new system is u(x) - u(0) = / w(s) da, Jo
(4.6)
w(x) - «,(0) = f u3(s) - (l - - 4 T W )
Jo
\
a s
n )J
-^r\da, a
(4.7)
us)
where x € [0, L], u(0) = u(L) and w;(0) = w(L). We want to calculate the effective system of equations which determine the averages uo and WQ of the solutions u and w to Eqs. (4.6) and (4.7). We want to include the period length L as a parameter in this effective system. We also want to address
Multiresolution Homogenization Schemes 245
the boundary conditions u(0) = u{L) and w(0) = w(L). We begin with the recurrence relations for an JV-dimensional system of integral equations. 4.2.1. Recurrence relations for n-dimensional
systems
Let us assume that our iV-dimensional system has the form G = K F where
and
\GN;
F =
\FN)
The operators G i , . . . , G N and F i , . . . , F N are nonlinear operators and are the same as those discussed in Sec. 3.1. K is the integration operator (for TVdimensional systems). We apply the same arguments and notation as those in Sec. 3.1 to this system and derive formal recurrence relations for the functions gjt and / ' " ' (i = 1,...,N). Then we expand these recurrence relations in series as in Sec. 3.2: 9j = lo,}> + 7u<$
and
fj=0Oj+8itj5^,
(4.8)
where
9j
M
>/; =
M
UJ
\9J,N)
' 7o,j,i ^
. 7o,j =
^OJ.1^ > Qo,j
\7o.i.^/
=
\eo,j,jv/
and similarly for 7 i j and 6ij. The recurrence relations for the two lowest order terms in the series (4.8) are 7o,j-i = S70J, 0o,j-i = S0o,j , 7i j - i = \sjld
+
1 32
+ ^D0o,j + ^(SVtfoj + DV7o,i)di-i
246 A. C. Gilbert
Oij-i
= \s9ltj
+ ^(SVtfojH--! +
^
^•.^J-^Stfoj-DTbj)The first factor in the recurrence relation for d,_i is the inverse of the Jacobian of 7o,j. The expression Hyoji is the Hessian of 7o,j,i and the operators S and D act on each entry in this matrix. The operators S and D also act coordinatewise on the vectors 9oj and 70,j. These recurrence relations are initialized in the same way as the one-dimensional recurrence relations. 4.2.2. Boundary value problems At first glance, the reduction procedure seems to be applicable only to initial value problems. Because we rewrote the differential equation as an integral equation, we must use the initial value of the solution to solve the reduced equation. To solve for the steady states of this one-dimensional reactiondiffusion equation, we do need to be able to apply the reduction procedure to boundary value problems (specifically, to periodic boundary value problems). We begin with an observation about the reduction procedure and we use a simple one-dimensional example for this observation. Our one-dimensional integral equation is x(t)-x(Q)=
f F(s,x(s))ds, Jo
t€[0,l],
(4.9)
which requires the initial value x(0). We can reverse the coordinate t € [0,1] (in numerical analysis terminology, "shoot backwards") and obtain the integral equation c ( l ) - z ( l - t ) = / F(l-s,x(l-s))ds, Jo
t€ [0,1],
(4.10)
which requires the final value x(l). Observe that the form of the reversed equation (4.10) is the same as that of the forward equation (4.9), so we can apply the reduction algorithm to the reversed equation. At the initial resolu tion level the only distinction between the two equations is the indexing in k of the values gn (xn)(k) and /« (xn)(fc) and a minus sign on the left-hand
Multi-resolution Homogenization
Schemes
247
side. Furthermore, at resolution level j = 0 the average of the forward so lution x(t) is the same as the average of the reversed solution x(l — t). In the forward direction, the initial condition x(0) is a parameter in the effective equation at level j = 0 and in the backward direction, the final condition is also a parameter in the effective equation. We now apply this observation to a second-order ODE with periodic bound ary conditions, x(0) = x(l) and y(0) = y(l): x(t)-x{0)=
j Jo
F1(s,x{s),y(s))ds,
y(*)-»(0)= / F2(s,x(s),y{s))ds. Jo Suppose that we reduce the forward integral equation (either formally or by the series expansions) and obtain 9i{xo,y0) - x(0) = - / i ( x 0 , y 0 ) ,
(4.11)
Stefco, yo) - y(0) = ^.Mzo, y 0 ),
(4.12)
an effective system of equations for xo and yo (the averages of x and y over the interval [0,1]). Suppose that we also reduce the reversed integral equations and obtain x(l) -
(4.13)
2/(1) -52(2:0,yo) = ^ ( z o , y o ) ,
(4.14)
an effective system of equations for xo and yo. Notice that we can combine these four equations and the periodic boundary conditions to obtain a system of equations which we can solve for xo and yo and which does not depend on the boundary conditions. We add Eqs. (4.11) and (4.13) and Eqs. (4.12) and (4.14) and eliminate the periodic boundary conditions for x and y. Our system for xo and yo is
gi(x0,y0) - gi{x0,yo) = ^(fi(x0,yo) + fi(xQ,y0)), 92(x0,yo) - 92{x0,yo) = 2-(/2(zo,2/o) + /20 c o,yo)) which we can solve for xo and yo-
248 A. C. Gilbert
This method of reversing the integral equation and solving the reversed equation is similar in spirit to the shooting methods for boundary value prob lems in that we paste together the results for the forward and backward equa tions. However, unlike shooting methods, the reduction procedure gives us an equation for the average of the solution (whether computed backward or forward) and the initial (or final) value is a parameter in the reduced equa tion. Shooting methods yield point values of the solution only, not averages or equations for averages and the point values must be recalculated for each new initial (or final) value. 4.2.3. Rescaling the interval [0,1] In all of our previous examples and derivations we constrained our problems to the unit interval [0,1]. With this application we want to know the effect of varying the physical length L of the interval [0, L] and we want to be able to examine this effect without reducing the differential equation for each value of L. In other words, we would like to make L a parameter in the reduced equation. If we take the naive approach and begin with a nonlinear system of size \2nL\ (or [2"LJ +1) and resolution size 2~", we do not make L a parameter of the reduced equation and we have to recalculate the reduced equation for each value of L. If we rescale the differential equation by setting v(x) = u(Lx), we introduce the coefficient L2 on the right-hand side of the equation. Instead we want to rescale our original grid on [0,1], which consists of 2" intervals of size 2 _ n , so that the rescaling does not affect the reduction procedure and we still have a nonlinear system of size 2 n , independent of L. The grid of 2 n intervals on [0,1] determines the intervals over which we initially discretize the integral equation. Let us examine what type of grid we may use with the reduction procedure. First, the reduction procedure can be thought of as simply a change of basis, where we split an approximation of the solution into a coarser approximation and the differences between these two approximations, and then eliminate the differences. The form of the discretization of the ODE must remain the same under the change of basis and the elimination of the differences, and the grid determines the step size of the coarser approximation. We claim that a dyadic partitioning of the interval [0,L] gives us a grid which we may use in the reduction procedure. A dyadic partitioning {/,-,* \j G N, k G K.{j)} of [0, L] is the collection of sets Ijik = [2-jkL,2~j(k + \)L) where j G N and k G /C(j)
Multireaolution Homogenization
Schemes
249
= { 0 , . . . , 23; — 1}. Notice that the intervals Ijtk are all of the same length 2 3L and that the coarser intervals {Ij-i k} are twice as long as the finer intervals We will apply the reduction procedure to the integral equation G(t ,x(t))-G{0,x(0))=
f F(s,x(s))ds,
te[0,L].
(4.15)
Jo
We begin with the discretization of Eq. (4.15) at scale n using a dyadic par titioning of [0, L\. Let x„(A:) denote the average of the solution x(t) over the interval In>k- Let gn(xn)(k) and fn(xn)(k) denote the averages of the functions G and F over the interval In
0
Kn = hn
V
1
I
h
The initial discretization of (4.15) is given in coordinate form by fc-i 9n(Xn)(k) = hnJ2
h
fn(Xn)(k')
+ -ffn(xn)(k)
fe'=0
.
(4.16)
*
As in Sec. 3.1, we split (4.16) into two equations by applying the average operator S and the difference operator V. Note that V amounts to taking successive differences normalized by the step size of the grid so V has the form Vgn{xn){k)
= i-( 5 n (x n )(2fc + 1) -
gn(xn)(2k)).
S remains unchanged. We also change coordinates at this point and write Sn-i(fc) = ^(xn(2k and
+ 1) + xn(2k))
250
A. C. Gilbert
dn-i(k)
= ^~(xn(2k
+ 1) - Xn(2k)) .
The resulting system of two equations in the variables s n _i and d„_i is given by (dropping subscripts) 2k
Sg(s, d)(k) = ^
£
f(s, d)(k') + ^f(s,
£
d)(2k + 1)
4
fc'=o 2Jb—1
+ T £ /(*• d)(fc') + T /(s - d)(2fc) '
(4 17)
-
fc'=0
Vg(s,d)(k)
= Sf(s,d)(k).
(4.18)
As before, let us assume that we can solve Eq. (4.18) for d as a function of s and that we substitute d into Eq. (4.17). The important question is whether or not we can rewrite Eq. (4.17), after substituting d, in the same form as Eq. (4.16). Indeed, we can do this since the coarser step-size /i„_i is twice as large as the finer step-size hn, /i n -i = 2hn. Observe that the right-hand side of Eq. (4.17) can be rearranged as follows:
-^ 52 f(s,d)(k') + -ff{s,d)(2k + 1) + ^ J2 / M ( * ' ) + ^f(s,d)(2k) z
4
fc'=o
= 2hn J2 Sf(s,d)(k')
4
* k'=o
+ hnSf(s,d)(k)
-
-fVf(s,d)(k)
fe'=0
= /*„_! J2 S / M ( f c ' ) + ^-Sf(s,d)(k)
-
%-Vf(s,d)(k).
fc'=0
This means t h a t our effective equation for s is h2
Sg(s,d){k) + -fVf(sJ)(k) 4
lc 1
~
= / i n _ ! J2 Sf(s,d)(k') fc'=o
h
+ l
!h^lSf(s,d)(k)
and that it has exactly the same form as Eq. (4.16). Furthermore, our recur rence relations for gj and fj are given by 9j(»j)(k) = 8gJ+i(8j,dj)(k) /j(«i)(*) = S/ j + i(* > J i )(fc).
+ - ^ P / j + i (*>,(*;)(*),
Multireaolution Homogenization
Schemes
251
The function Sj is a piecewise constant function with step-width hj = 2 - J L and Sj(k) denotes the average of x(t) over the interval IjtkWe might ask how rescaling the grid and taking step sizes hj = 2 _ J L affects the series expansions for the recurrence relations. With the exception of introducing a factor of \/L in the difference operator D (the action of which is to divide successive differences by 2 _ J — Sj), the recurrence relations for the coefficients 7 ^ and 6itj are not altered. If we examine the derivation of the recurrence relations for 7o,j, 71 j , 0o,j, #i,j (see Sec. 3.2), and the algorithm for generating the general recurrence relation (see Sec. 3.2.2), we will see that the only property of Sj used is that <5j_i is twice as large as Sj. Notice in the derivation in Sec. 3.2 we did not use the fact that Sj — 2 - J (and similarly in the algorithm in Sec. 3.2.2 where h represents the step-size). If we rescale the grid, the recurrence relations for the two lowest order terms are given by 7o,j = S7o,j_i, 0o,j = S#o,j-i i
-
Sg0,J_1-P7o,;,_1 S7^_1
Sg0j--i-D7oj-i/L S7^_1
Similar expressions hold for systems of nonlinear ODEs. The key feature of our application which makes L a parameter in the re duced equation and which we have not used up to this point is the rescaling of the coefficients in our integral equations. We assume that the profiles of the kinetic parameters ao(x) and ai(x) in Eq. (4.3) are the same for every value of L. In other words, the percentage of the ring which is ON is always 20% (and similarly for the percentage OFF). So we can simply rescale the coefficients in our integral equation (see Sec. 4.1). We begin the reduction procedure with the averages
252 A. C. Gilbert 2n 9n{un, U>n) = -f L
I-2-" (k+l)L I G(x, Un(k), Wn(k)) dx = J2-"kL
2n
fn(un,wn)
/-2—(k+l)L
= — / L
un(k)
F(x,un(k),wn(k))dx
J2-"kL
wn(k) 3
^un(k) -un(k)(l-an(k))-pn(k)t Notice that the coefficients an(fc) and /3n(k) do not depend on L since we can rescale the averages which define these coefficients: dx = 2n /
«»(*) = T /
——dx
^ J2-"kL
0.i(X/L)
J2-"k
a x
^ J2-'>kL
ai\Z/L)
J2~nk
a x
and
n)
l\ )
In other words, we initialize the reduction procedure with the same values regardless of the ring length L and hence L is simply a parameter in the recurrence relations. We must calculate the reduced equation only once with the series expansion method and then solve the reduced equation for each value of L. We will apply these derivations to our problem in Sec. 4.3 and examine the results. 4.2.4. Generalized Haar basis In Sec. 4.2.3 we made no mention of an MRA when we showed that we can rescale our step-size by L and that the discretization (4.16) using this rescaled step-size is preserved under reduction; we simply let xn(k), gn(xn)(k) and fn(xn)(k) represent the averages of the functions x, G(t,x(t)) and F(t, x(t)) over intervals of size 2~nL, changed bases, and then eliminated half of the variables. At each resolution level j , we wrote Xj as a piecewise constant function, with the constants equal to the averages of x(t) over intervals of size 2 - J L . We can view this decomposition of Xj as a projection onto the stretched Haar scaling functions. Taking into account also a convenient change of normalization, we can write this decomposition as a stretched orthogonal Haar decomposition
Mvltireaolution Homogenization
Schemes
253
2>-l fc=0
where ^ t = JJTXJ^,. and c/^fc = X / j t . Furthermore, we can interpret the recurrence relations for gj and fj as the coordinate form of operator recur rence relations and we can view this procedure as a generalization of the MRA reduction procedure. The dual scaling function <j> satisfies the following refinement equation: 1
fc,-+i
1 ,
,
and is associated with the filter h = {\,\}\
1-
IT
i.e.
2J+,-l 1=0
where the only nonzero entries in hjtk,i are the entries hjtk,2k+i = 1/2 and hj,k,2k — 1/2. The primal scaling function (j> satisfies the refinement equation <$>j,k - Xlj
and is associated with the filter h = {1,1}. We claim that our average operator S is merely the application of the filter h associated to the dual scaling function
1
Sxj(k) = -(xj(2k +1) + Xj(2k)) -= £
2^-1
hj,Ki Xj(l) = Y, hk,i (x, ij,i)
1=0
= I x, ^
1=0
hjtkj,i ) = (x,>j-i,fc)
Therefore, S acts on the coefficients in the decomposition of Xj and gives us the coefficients {(x, 4>j-i,k) I k = 0 , . . . , 2-*-1 - 1} in the decomposition 2J-'-i x
i-i
-
2_/ {x'
of x(t) into a piecewise constant function at scale 2 ~ J + 1 .
254 A . C. Gilbert Let us now discuss the primal and dual generalized Haar wavelets. The standard definition of the filters for the wavelets, given those for the scaling functions, determine the primal and dual wavelets by
i'j.k — 4>j+l,2k+l - <j>j+l,2k and the associated filters by g = {-1/2,1/2} and g = {-1,1}. We claim that our difference operator D is the application of the filter g renormalized by l/hj. We can see that applying D to Xj and evaluating at the result at k gives us the same result as computing the inner product of x with the renormalized dual wavelet tl>j-i,k'2i l
1
~
1
/
1
\
DXj(k) = -^(xj(2k +1) - Xj{2k)) = J2 J - & , * , I * J ( Q = (x> £ 7 ^ - 1 . * / • To achieve the correct normalization for the differences Xj — Xj-i between the two approximations Xj and Xj_i, we must multiply the primal wavelet V'j-i.fc by hj to balance the normalization of ^ j . j ^ . Therefore, the renormalized primal and dual wavelets satisfy the refinement equations: £-V>j-i,fc = -^-(4>j,2k+i - 4>j,2k) = -j^ixij.^+i 3
3
-
xij,2k).
j
h h jTpj-i,k - y(
h
The operator D acts on the coefficients in the decomposition of Xj and gives us the coefficients {{x, l/hjipj_itk) | k = 0 , . . . , 2 J _ 1 — 1} in the decomposition of the differences
Xj
Xj _ 1 =
X ) (x>T-ii>j-i,k)hjii>j-i,k= fc=o x "■' '
y^ t=o
(x,ij>j-i,k)i/>j-i,k-
It is easy to check that the integration operator K has the form
Midtiresolution Homogenization Schemes
II Kj = hj
255
0
2 1
in the (renormalized) biorthogonal Haar basis. We verify that the quantity
hAy)dvdx
/ 4>JAX) Jo
\h,k{y)dydx
TXIJAX) n Jo j
=/
Jo
h 2 ' hj, I 0,
Jo
l-k ' I > k + 1, l
gives us the entries in the matrix representation of Kj. 4.3. Characterizing
the average in terms
of L
We begin with the forward system of integral equations u(x) — u(0) — / w(s) ds, Jo w(x)-w(0)= Jo
f \ ^ s ) - ( l - - ^ ) u ( s ) - ^ d s , \ ai(s)/ ai(s)
where x e [0, L] and u and w satisfy the boundary conditions u(0) = u{L) and w(0) — w(L). The interval length L is an arbitrary value. We will construct coefficients oo(x) and ai(a;) according to the description in Sec. 4.1. We use the values ao,b - - 0 . 4 ,
a0,d = 0.65,
ai,t =
ahd = 3 •
3'
We will determine the effective system for the averages UQ and Wo of u and w over the interval [0, L\. We will use the series expansions of the recurrence
256
A. C. Gilbert
relations (and retain the two lowest order terms in each series) and reduce to resolution level j — 0. We begin with an initial resolution size hn = 2~nL and use the dyadic partitioning on [0, L] described in Sec. 4.2.3. We will solve the effective system for uo and WQ and will examine the dependence of the averages on L. We will use the techniques in Sec. 4.2.2 to solve the effective boundary value system. We initialize the recurrence relations with the values
W„
fl(») _
70,n =
Un - anUn - /?„
*s
-7l,n ( » )— _
where a n = 1 — l / o i | B and /?„ = Oo,n/ai,n- Recall that u„ and wn are piecewise constant approximations to our solutions u and w at scale n. Let us reduce one step to resolution level n — 1 (and step-size 2~n+1L) so that we may determine the form of the terms at an arbitrary resolution level j . We apply the two-dimensional recurrence relations from Sec. 4.2.1 and the rescaling arguments from Sec. 4.2.3 and calculate (n)
/ «n-l
7o,n-l Wn-i
™n-l "O.n-l —
" n - i - S a n u n _ i -S/3„ —
7l,n-l —
- S o . — - - S A ,
3ul_,w, -
San — 0
y-Dan-r
+
- D a „ — \
-un_1<_1y
— j
Multi-resolution Homogenization Schemes
257
If we apply the recurrence relations repeatedly, we see that 7QV, ^oj > nj and 9$ are polynomials in Uj and Wj. Several coefficients of these polynomials are determined by applying the recurrence relations to an and /? n . That is, some of the coefficients of these polynomials are averages and differences of a „ and (3n and some of the coefficients are constants. Note that an and /3„ are the only quantities which depend explicitly on k. Furthermore, L appears in the denominator of several coefficients in these polynomials for 7}"' and 0j"-. At resolution level j = 0 the reduced system for the averages uo and wo is u0 + L 2 (CHUQ + C12U0 + ci 3 ) - u(0) = -w0 ,
(4.19)
W0 + L2 I — U0 + C22W0 + -j- + C24«0U'0 1 - «>(0)
=
2" ( U° + °3lU°
+
°32 + ^ ( C33U°wo
+
~TW° ) I ■
( 4 ' 2 °)
At this resolution level the step size is ho = L. The coefficients cj m are given by n-l
2"-l
N-1
p=0
q=0
1=0
1
n-l
.
p=0
c
" = E ^rP 2 E £ E -««(«*+') p=0
q=0
1=0
c
* = E 2*5 E ' £ E -*.(«*+0 p=0
9=0
(4-22) (4-23)
1=0
M/2-1 c
2" = 2M2 E
(2Z + 1 ) ( - Q « W 2 - 1 - 0 + o»(M/2 + 0),
(4-24)
1=0 C22 = C12,
(4.25) M/2-1
c
« = 2M2 E 1=0
(2/ + ! ) ( - ^ ( M / 2 - 1 - 0 + /3n(M/2 + 0),
(4.26)
258
A. C. Gilbert n-1
2',-l ,
,
C
24 = XJ 2<+3P ^ p=0 .
9=0
N-1
N ^
4
i=0
as n —> oo,
(4.27)
M-l
(4.28) i=0 1
M-l
(4.29) /=0 n-1
2"-l
1
1
N-1
C
33 = E 25+3p E AT S p=0
9=0
1
6
^
4
(4.30)
as n -4- oo,
Z=0
^E^E22^-" p=0
9=0
/N/2-1 X
\
( E
~<*n{qN + 0 + *n(giv- + JV/2 +1) J ,
(4.31)
where M = 2 n and iV = 2 n _ p . The reader may check these formulas by induction. In Table 4.1, we list the values of the coefficients cj m for initial discretization level n = 15. (We also calculated the coefficients beginning with 2 20 values for 0:20 and /?20 and found no difference between the coefficients for the two initial resolutions, n = 15,20.) In other words, we eliminated the dis cretization error from the coefficients in the reduced system (4.19) and (4.20). The error from the truncation of the Taylor series, however, still remains. We now apply the same methods to the reversed integral system u{L) — u(L — x) = j Jo w{L) -w{L-x)=
w(L -
s)ds,
j u3(L - s) Jo -
1
\
77
a^L-s);
T)U(L-S)
)j
'rd,S,
ai{L-s)
where x € [0, L] and u and w satisfy periodic boundary conditions. effective system for uo and WQ is
The
Multiresolution Homogenization Schemes
259
Table 4.1. This table lists the values of the coefficients cj m for the initial resolution level n = 15. Coefficient c j m
Value
Cll
0.083 333333 333 333 3
C12
0.004 689078 664008 6
C13
0.0231912486796427
C21
0.000 000 000 000 0000
C22
0.004 689078 6640086
C23
0.000 000000 000 0000
C24
0.250 000000000 0000
C31
0.056 268944020 508 4
C32
0.278 294984414 912 4
C33
0.250 0000000000000
C34
0.000000000000 0000
u{L) - (u 0 + L2(cnul
(4.32)
+ c12«o + c 13 )) = -u>o ,
2 w(l) -- [w ( w00 + + L21[ -^-u —u0 + c C22W0 4«o«;o 0 ) 22W0 ++ - -y -p ++ Cc224UQW
=
~2 ( "° +
C3lU
°
+
°32 +
L,2 C33U w
\
°o
1
+ ~J~W° ) ) •
(433)
As in Sec. 4.2.2 we add Eqs. (4.19) and (4.32) and eliminate the boundary conditions for u. The sum of these two equations gives us an equation for Wo, 0=
LWQ
,
from which we get ion = 0. In fact, we could have determined that the average ler since of w = 3^ is zero in a different manner 0 =
«(L) - u(0)
-ir^-ir-" 1 fL du(x) ,
1
rL
dx.
We now add Eqs. (4.20) and (4.33), after substituting Wo = 0, and eliminate the boundary conditions for w. The sum gives us an equation for UQ
260
A.C. Gilbert ° = y ( " 0 + C31«0 + C32) = UQ +C31U0 + C32 .
This cubic equation has one real root uo ~ —0.62417, the average of u, and the two complex roots uo « 0.31209 ± 0.590 31i, all of which are independent of L. We may conclude that the average of u over [0, L] is independent of L up to the accuracy of our average value. Let us compare our results with those obtained by Shvartsman (Ref. 20) and determine the accuracy of our method. Instead of working directly with the second-order ODE which determines the steady-state solution(s), Shvartsman discretized the system of PDEs (Eqs. (4.1) and (4.2)) and sought solutions whose time-derivatives were zero. He used a pseudo-spectral code with 256 collocation points and 50 Fourier modes. He also used a continuation method to find the value of L at which a bifurcation occurred. For L < 47.5TT the steady-state is stable and for L > 47.5n the steady-state is unstable. In other words, at L = 47.57T there is a bifurcation in the system. Table 4.2 lists Table 4.2. This table lists the averages of the solution u(x) over the interval [0, L] as a function of the interval length L. Period length L
Average uo
46.500000
-0.62646796
46.510 100
-0.62646667
46.530100
-0.62646412
46.570100
-0.626459 01
46.650100
-0.626448 82
46.810100
-0.626428 53
47.130100
-0.626388 23
47.450100
-0.62634835
47.770100
-0.62630886
48.090100
-0.626269 77
48.410100
-0.62623107
48.730100
-0.626192 76
49.050100
-0.626154 83
Multiresolution Homogenization -0.6261 S I
1
i
,
1
4.6262 -
1
261
1
yr
-0.62625 -
jf
-0.6263 -
/
-0.62S3S -
jf
-0.6264 -
•0.62645 ■
Schemes
/
/
-0 6265 ' 46.5
'
'
'
1
i
47
47.5
48
46.5
49
49.5
ponoo Mnom L
Pig. 4.2. This is a graph of the average of the solution u(x) over the interval [0, L] as a function of the interval length L. Notice that there is an (almost) linear relationship between the average uo and the period length L.
the averages of u over [0, L] as a function of L, as computed by Shvartsman. These are averages of the 256 point values for tt. Figure 4.2 is a graph of these values. Figure 4.3 is a graph of u(x) for L = 48.07T where u is computed by the pseudo-spectral method. We should point out that the pseudo-spectral method gives point values or samples of the solution u rather than averages of it over very small intervals. This distinction between point values and averages is a source of discrepancy between our two sets of results. Assume that the 2 8 values u(xk) of the solution are the samples of u at the midpoints of the intervals In>k = [2~8kL,2~6(k + 1)L). The difference between the point value of u(xfc) and the average of u(x) over the interval /„,* is given by I u x
( k)
~ Tft
/-M*+i) I
1 u(x)dx
n Jhnk
+ u'(xk)(x
th„(k+i)
= U(lfc) - — / n
n
- xk) + \u"{xk){x
U(lfc)
Jhnk
- xk)2 + R(x) dx,
(4.34)
262 A. C. Gilbert -0.52
■0.54 -1
Fig. 4.3. This is a graph of the solution u(x) for period length L = 48.Ow. Since the solution u(z) is computed by the pseudo-spectral method there are small oscillations in the solution which are a result of Gibbs' phenomenon.
where R(x) is the remainder term which we assume to be bounded by Ch\. Notice that because u(x) satisfies a second-order ODE, we can substitute Qo(xfc)
u"(xk) = u3(xk) - (1 - —L-)u{xk)
- 2°
into our expression for the difference (4.34). We then simplify Eq. (4.34) and obtain 1
rhn(k+1)
u{xk) - — / h n Jhfcn(fc)
rw u(x) ax = -
■fc«(fc+i)
«'(«*) n
(x — xjt)dx
Jhnk
-(-<-)-('-=55)-^) j
x —
,fc„(fc+l)
/
•''In 7/»„fc
^)^+al
( x - x f c ) 2 d x + C/i^
Multireaolution Homogenization
Schemes
263
We used the assumption that Xk is the midpoint of In,k to eliminate the first integral on the left of the above equation. Let us examine the "constant" c(xfc) to see how large the quantity c(xk)h\ is. First, u{xk) ranges between —0.650 7 and -0.5249. This is a rough approximation since we have a number of solutions u(x), one for each value of L. Second, 1 ^—r and a°)Xk\ range between —1/2 and 1/2, and between —3/5 and 1/8, respectively. Putting these together, we determine that |c(a;fc)| is no larger than 0.00717 and that the discrepancy between the point values u(xfc) and the averages un(k) is less than \c(xk)\hl
< (0.00717)(2 -8 x 48TT)2 « 0.00249.
Therefore, we should expect uo, the average of u calculated by the reduction procedure, to agree with the average of u as calculated by Shvartsman to no more than three decimal places. In fact, this is exactly what we see. Therefore, the average «o is independent of L up to three decimal places. If we use the series expansions of the recurrence relations and retain the three lowest order terms to increase the accuracy of our solution, we find that the average uo does indeed depend on L. This observation is reflected in Shvartsman's data (see Table 4.2). It is interesting to note that this dependence on L and this structure of the average uo does not depend on the coefficients ao and a\. The value of UQ clearly depends on these coefficients, but the form of the effective system does not. Only the coefficients of the effective system depend on ao and ax. So, regardless of the geometry or the nature of the composite medium, the average of the steady-state solutions will not depend on the size of the medium (to first order). 4.4. Complexity
of reduction
algorithm
In this section we examine the complexity of the algorithm to compute the effective system for ito and two- In Sec. 4.3 we showed that the effective system for uo and wo is a system of polynomials in uo and WQ with coefficients cj m given by Eqs. (4.21)-(4.31). We assume that the form of the polynomials in this system is determined ahead of time, separate from the computation of the coefficients belonging to these polynomials or "off-line". We also assume that the initial averages an(k) and /?„(fc) are computed off-line. This leaves only the computation of the coefficients cj m . There are 11 coefficients; several are constants and several are simple averages of the 2 n values for an(k) or 0n(k). The more complicated coefficients, such as C12 or C34, require n 2 n operations.
264 A. C. Gilbert
If we let N = 2 n , then we can compute these coefficients in 0(N log N) steps. Once we compute the coefficients in the system of polynomials, we have to solve a two-dimensional nonlinear system using our favorite nonlinear solver. However, this should not be very costly since one relation in our system is linear (wo = 0) and the second relation is simply a cubic polynomial. 4.5.
Conclusions
We can generalize and extend the MRA reduction methods for nonlinear ODEs to boundary value problems, to small systems of differential equations, and to equations which are denned on intervals of arbitrary length [0, L\. Also, we can use a generalized (bi)orthogonal MRA as a framework for our reduction procedure. We can apply the MRA reduction procedures and their generalizations to characterize the steady-state solutions of a model reaction-diffusion equation. We find that the average of the steady-state does not depend on the size of the composite medium (up to first order) and that this independence holds regardless of the geometry and nature of the inhomogeneous material. We also find that the procedure for calculating the average of the solution is computa tionally inexpensive. This is only a preliminary study and is a first step towards the more diffi cult task of reducing the coupled system of PDEs which models reaction and diffusion on composite surfaces. We are currently extending MRA reduction methods to nonlinear PDEs to explore how spatial and temporal scales interact with each other and with the inherent scales of the composite surface. 5. Conclusions The MRA strategy for the homogenization of differential equations consists of two algorithms; a procedure for extracting the effective equation for the coarse scale behavior of the solution (the reduction procedure) and a method for con structing a simpler equation whose solution has the same coarse scale behavior (the augmentation or homogenization procedure). For physical problems in which one wants to determine only the average behavior of the solution or how the average depends on physical parameters such as the length of the medium or the amplitude of the forcing, the reduction process is very useful and is not part of the classical theory of homogenization. On the other hand, the MRA homogenization procedure produces a homogenized equation which preserves important physical characteristics of the original solution.
Multiresolution Homogenization
Schemes
265
The MRA method can be applied to linear and nonlinear systems of differ ential equations, including both initial and boundary value problems. We can also apply the MRA methods to problems which contain a continuum of scales; we need not restrict ourselves to problems with a finite number of distinguished scales. We can include boundary values of the solution and the length of the interval over which the differential equation is defined as parameters in the re duction procedure so that we may determine how the coarse scale behavior of the solution depends on these values without computing the effective equation for every value of these parameters. There are many directions in which we can extend this work. One direc tion is to develop MRA homogenization methods for nonlinear PDEs. The reaction-diffusion model in Sec. 4 is just one example of a future application. Another direction is to use the techniques in Ref. 7 to explore the homogenized coefficients of iV-dimensional elliptic equations and to compare the results with those from classical theory. We think that these MRA methods are important new numerical and analytical tools for many applications and that there are many interesting issues to explore. Appendix A A multiresolution analysis (MRA) of L2([0,1]) (square-integrable functions with period 1) is a decomposition of the space into a chain of closed subspaces ^oCViC---cVn--. such that
[Jvj = L2({0,1})
and
j>0
If we let Pj denote the orthogonal projection operator onto Vj, then linij-foo Pjf = f for all / € L2([0,1]). We have the additional requirements that each subspace Vj (j > 0) is a rescaled version of the base space V^: / € Vj; <=> f{V-) G V0 . Finally, we require that there exists
266 A. C. Gilbert
{
k).
As a consequence of the above properties, there is an orthonormal wavelet basis Wj,k\j> 0,* = 0,... > 2 * - l } of L 2 ([0,1]), V>j,fc(x) = 2i/2ip(2>x - k), such that for all / in L2([0,1]) 2'-l
Pj+1/ = Pi/+^(/>J-,fc>Vi,fc. fc=0
If we define Wj to be the orthogonal complement of Vj in Vj+i, then
We have, for each fixed j , an orthonormal basis {rpj,k\k = 0 , . . . , 2 J — 1} for Wj. Finally, we may decompose L2([0,1]) into a direct sum
^2([o,i]) = v b 0 ^ . i>o
The operator Qj is the orthogonal projection operator onto the space Wj. The Haar wavelet rp and its associated scaling function <j> are denned as follows: . 1,
x G [0,1) elsewhere
and
ip(x)
l,
x e [o, 1/2)
-1,
x£ [1/2,1)
0,
elsewhere.
Acknowledgments First, I would like to thank my advisor Ingrid Daubechies for her patience, guidance and inspiration. She has been an excellent mentor both mathemati cally and personally. I would also like to thank Greg Beylkin and Mary Brewster. Their work set the stage for this thesis and we worked together on a major part of it. My
Multiresolution Homogenization
Schemes
267
visits to Pacific Northwest National Laboratories to work with Mary were very rewarding and enjoyable. I also thank Ioannis Kevrekidis for his interest and enthusiasm in this work. The final portion of this thesis was completed through his encouragement. I have received financial support and guidance through two different pro grams and two different companies. I thank Lawrence Cowsar and Wim Sweldens of Lucent Technologies for their support through the Ph.D. Fellow ship program. I thank Robert Calderbank for his support at AT&T Labs. (formerly AT&T Bell Laboratories) through the Graduate Research Program for Women. I would also like to thank my friends and colleagues in the mathematics, applied mathematics, and chemical engineering departments at Princeton Uni versity; especially George Donovan, Mark Johnson, Peter Kramer, Jonathan Mattingly, Stas Shvartsman and Terence Tao for their many helpful discus sions. I give personal thanks to my mother Lynn Gilbert for her support and encouragement. I also thank my father and stepmother John and Vicki Gilbert for their support. I give thanks to Phyllis and Walter Strauss for welcoming me into their family. Finally, I would like to thank Martin Strauss for his love and patience. Bibliography 1. A. Askar, B. Space and H. Rabitz, The subspace method for long time scale molecular dynamics, J. Phys. Chem. 99, 7330-7338 (1995). 2. M. Bar, A. K. Bangia, I. G. Kevrekidis, G. Haas, H.-H. Rotermund and G. Ertl, Composite catalyst surfaces: effect of inert and active heterogeneities on pattern formation, J. Phys. Chem. 100, 19-106 (1996). 3. C. M. Bender and S. A. Orszag, Advanced Methods for Scientists and Engineers (McGraw-Hill, 1978). 4. A. Bensoussan, P. L. Lions and G. Papanicolaou, Asymptotic Analysis for Peri odic Structures (North-Holland, 1978). 5. G. Beylkin, M. E. Brewster and A. C. Gilbert, Multiresolution homogenization schemes for nonlinear differential equations, Appl. Comput. Harmonic Anal. 5, No. 4, 450-486 (1998). 6. G. Beylkin, R. Coifman and V. Rohklin, Fast wavelet transforms and numerical algorithms I, Comm. Pure Appl. Math. 44 (1991). 7. G. Beylkin and N. Coult, A multiresolution strategy for reduction of ellip tic PDE's and eigenvalue problems, Appl. Comput. Harmonic Anal. 5, No. 2, 129-155 (1998).
268 A. C. Gilbert 8. F. Bornemann and C. Schiitte, Homogenization of Hamiltonian systems with a strong constraining potential, Physica D102, No. 1-2, 57-77 (1997). 9. M. E. Brewster and G. Beylkin, A multiresolution strategy for numerical homog enization, Appl. Comput. Harmonic Anal. 2, No. 4, 327-349 (1995). 10. A. Cohen, I. Daubechies and J. Feauveau, Bi-orthogonal bases of compactly sup ported wavelets, Comm. Pure Appl. Math. 4 5 , No. 5, 485-560 (1992). 11. A. Cohen, I. Daubechies and P. Vial, Multiresolution analysis, wavelets, and fast algorithms on an interval, Appl. Comput. Harmonic Anal. 1, 54-81 (1993). 12. R. Coifman, P. L. Lions, Y. Meyer and S. Semmes, Compensated compactness and hardy spaces, J. Math. Pures Appl. 72, No. 3, 247-286 (1993). 13. R. Coifinan, Y. Meyer and M. V. Wickerhauser, Size properties of wavelet packets, in Wavelets and Their Applications, eds. M. Ruskai et al. (Jones and Bartlett, 1992), pp. 453-470. 14. A. C. Gilbert, A comparison of multiresolution and classical one dimensional homogenization schemes, Appl. Comput. Harmonic Anal. 5, No. 1, 1-35 (1998). 15. V. V. Jikov, S. M. Kozlov and O. A. Oleinik, Homogenization of Differential Operators and Integral Functionals (Springer-Verlag, 1994). 16. J. Kervorkian and J. D. Cole, Perturbation Methods in Applied Mathematics (Springer-Verlag, 1985). 17. P. A. Lagerstrom, Matched Asymptotic Expansions, Ideas and Techniques (Springer-Verlag, 1988). 18. F. Murat, Compaciti par compensation, Ann. Scuola Norm. Sup. Pisa Cl. Sci. 5, No. 3, 489-507 (1978). 19. C. Schiitte and F. Bornemann, Homogenization approach to smoothed molecular dynamics Nonlinear Anal. 30, No. 3, 1805-1814 (1997). 20. S. Shvartsman, private communication. 21. S. Shvartsman, A. K. Bangia, M. Bar and I. G. Kevrekidis, On spatiotemporal patterns in composite reactive media, preprint, 1996. 22. B. Space, H. Rabitz and A. Askar, Long time scale molecular dynamics subspace integration method applied to anharmonic crystals and glasses, J. Chem. Phys. 99, No. 11, 9070-9079 (1993). 23. W. Sweldens, The lifting scheme: A construction of second generation wavelets, SIAM J. Math. Anal. 29, No. 2, 511-546 (1997). 24. L. Tartar, Compensated compactness and applications to partial differential equa tions, in Heriot-Watt Symp., IV (1979).
269
LOCAL F E A T U R E E X T R A C T I O N A N D ITS APPLICATIONS USING A LIBRARY OF BASES
Naoki Saito
To the Saito Family And to the Memory of My Mother Tervko
Contents 1. Introduction 1.1. Importance of feature extraction 1.2. Historical background on feature extraction 1.3. The best basis paradigm and a library of bases 1.4. Overview of the thesis 2. A Library of Orthonormal Bases 2.1. Introduction 2.2. Wavelet bases 2.3. Wavelet packet bases 2.4. Local trigonometric bases 2.5. Selection of a "best basis" from a library of orthonormal bases 2.5.1. Information cost functions 2.5.2. Best basis selection from a dictionary of orthonormal bases 2.5.3. Best basis selection from a library of orthonormal bases 2.6. Joint best basis and Karhunen-Loeve basis 2.7. Extension to images
272 272 273 275 278 280 280 281 285 286 289 289 289 291 292 294
270
N. Saito
3. A Simultaneous Noise Suppression and Signal Compression Algorithm 3.1. Introduction 3.2. Problem formulation 3.3. The minimum description length principle 3.4. A simultaneous noise suppression and signal compression algorithm 3.5. Examples 3.6. Discussion 3.7. Summary 4. Local Discriminant Bases and Their Applications 4.1. Introduction 4.2. Problem formulation 4.3. A review of some pattern classifiers 4.3.1. Linear discriminant analysis 4.3.2. Classification and regression trees 4.4. Construction of local discriminant basis 4.4.1. Discriminant measures 4.4.2. The local discriminant basis algorithm 4.5. Examples 4.6. To denoise or not to denoise? 4.7. Signal/background separation by LDB 4.8. Summary 5. Local Regression Bases 5.1. Introduction 5.2. Problem formulation 5.3. Construction of local regression basis 5.4. Examples 5.5. Discussion 5.6. Summary 6. Extraction of Geological Information from Acoustic Well-logging Waveforms Using LDB and LRB Methods 6.1. Introduction 6.2. Data description and problem setting 6.3. Results 6.3.1. Analysis by LDB 6.3.2. Analysis by LRB
295 295 296 297 303 308 312 315 316 316 316 318 318 319 321 321 323 327 333 334 336 337 337 337 338 341 345 347 348 348 352 354 355 363
Local Feature Extraction and Its Applications
6.4. Discussion 6.4.1. On the choice of the training data set 6.4.2. Using the physically-derived quantity 6.4.3. On the measure of regression errors 6.5. Summary 7. Multiresolution Representations Using the Autocorrelation Functions of Wavelets and their Applications 7.1. Introduction 7.2. Orthonormal shell: A shift-invariant representation using orthonormal wavelets 7.2.1. The orthonormal shell 7.2.2. A fast algorithm for expanding into the orthonormal shell 7.2.3. A fast reconstruction algorithm 7.3. Autocorrelation shell: A symmetric shift-invariant multiresolution representation 7.3.1. Properties of the autocorrelation functions of compactly supported wavelets 7.3.2. The autocorrelation shell of compactly supported wavelets 7.3.3. A direct reconstruction of signals from the autocorrelation shell coefficients 7.3.4. The subsampled autocorrelation shell 7.4. A review of Dubuc's iterative interpolation scheme 7.5. On reconstructing signals from zero-crossings 7.5.1. Zero-crossing detection and computation of slopes . . 7.5.2. An algorithm for reconstructing a signal from its zero-crossings representation 7.5.3. Examples 7.6. Summary 8. Further Development 8.1. Introduction 8.2. Discovering time-warping functions by the entropy minimization principle 8.2.1. Problem formulation 8.2.2. Numerical implementation 8.3. Discussion 9. Conclusion Appendix A. MDL-Based Tree Pruning Algorithms A.l. Introduction
271
370 370 378 382 382 383 383 385 387 388 393 393 394 398 407 408 411 418 418 419 423 424 425 425 429 429 430 432 433 436 436
272
N. Saito
A.2. Minimal cost-complexity pruning A.3. MDL-based pruning algorithms Bibliography
436 438 443
1. Introduction 1.1. Importance
of feature
extraction
In analyzing and interpreting signals such as musical recordings, seismic sig nals, or stock market fluctuations, or images such as mammograms or satellite images, extracting relevant features from them is of vital importance. Often, the important features for signal analysis, such as edges, spikes, or transients, are characterized by local information either in the time (or space) domain or in the frequency (or wave number) domain or in both a : for example, to dis criminate seismic signals caused by nuclear explosions from the ones caused by natural earthquakes, the frequency characteristics of the primary waves, which arrive in a short and specific time window, may be a key factor; to distinguish benign and malignant tissues in mammograms, the sharpness of the edges of masses may be of critical importance. In this thesis, we explore how to extract relevant features from signals and discard irrelevant information for a variety of problems in signal analysis. We address five aspects of signal analysis: Compression: how to represent and describe signals in a compact manner for information transmission and storage. Noise removal: how to remove random noise or undesired components from signals (also called denoising). Classification: how to classify signals into known categories. Regression: how to predict a response of interest from input signals. Edge Characterization: how to detect singularities (e.g., spikes, step edges, or "ramp" edges) in signals and characterize them. Although the methods we develop here can be applied to many different types of signals and images, we focus our attention on those measured by a FVom now on, unless otherwise stated, time and frequency also means space and wave number (spatial frequency) respectively.
Local Feature Extraction and Its Applications
273
sensing devices and representing certain properties of natural objects such as acoustic properties of subsurface geological formations. Also, the signals and images treated in this thesis (whether synthetic or real) are all discrete: they are simply vectors and matrices consisting of a finite number of real-valued samples. These signals normally have very large number of samples; e.g., a typical exploration seismic record per receiver has 1,000 samples, and a typical CT scanner image has 512 x 512, or 262,144 samples. Therefore, extracting only important features for the problem at hand and discarding irrelevant information (this strategy is called the reduction of dimensionality) are crucial; if one succeeds in doing so, the subsequent objectives can be improved in both in accuracy and computational efficiency. 1.2. Historical
background
on feature
extraction
The problem of feature extraction in general has intrigued many scientists, and several fundamental ideas have been proposed. The philosophically most important and notable among them are as follows: Kolmogorov contended that the best description of data is defined as the length of the shortest com puter program to generate that data (this length is called the Kolmogorov complexity)82 (see also Ref. 90). This concept was further refined by Rissanen and led to the minimum description length principle,128 which states that the best theory to infer from data is the one minimizing the sum of the length of the theory and the length of the data using the theory both encoded as binary digits. Watanabe paraphrased that "pattern recognition is a quest for the minimum entropy, b when suitably defined". 151 ' 152 Grenander proposed the pattern theory for understanding regular structures of patterns based on the "analysis by synthesis" approach, i.e. analyzing patterns by decomposing into and synthesizing from the elementary building blocks64 (see also the work of Mumford 108 ). All of these proposals essentially share the same theme: an efficient and compact representation or description of data suitable for the problem at hand leads to an improvement in solving the problem itself. Al though these proposals have influenced many others including the author of this thesis, their numerical implementations are not always straightforward and efficient. This thesis attempts to answer partially how to implement these philosophical proposals via concrete and efficient numerical algorithms for a specific form of data, i.e. discrete signals and images. Entropy is essentially a measure of the disorder in a system. We shall define entropy more precisely in the following sections.
274
N. Saito
If we look back at practical methodologies proposed for feature extraction in signal analysis, they may be grouped into two general approaches: one is the statistical (or decision-theoretic) approach (representative references are Refs. 48, 63 and 103); the other is the structural (or syntactic) approach. 62 ' 114 The major accomplishments in the statistical approach — some of them are reviewed in the subsequent sections — are the development of the Fast Fourier transform (FFT) 3 7 for signal representation and noise removal, the KarhunenLoeve transform (KLT) [also known as Principal Component Analysis (PCA)] for signal compression, 74 ' 80,91,150 the Linear Discriminant Analysis (LDA) 59 for classification, and more generally, the Projection Pursuit (pp), 6 1 - 7 7 . 8 6 Clas sification and Regression Trees (CART™), 1 8 or Artificial Neural Networks (ANNs). 24,107 ' 125 Each of them can be considered as a tool for providing an efficient coordinate system (whether orthogonal or not) for the specific prob lems. However, when we use them to extract features, especially the features whose energy is localized simultaneously in the time and frequency domains, we immediately face one or both of the following problems: (a) computational complexity and (b) inability to capture localized features. KLT and LDA re quire solving the eigenvalue systems which is an expensive operation 0(n3), where n is a number of samples in each signal. In the case of P P and ANNs, optimization of certain nonlinear functions of high dimensional variable is re quired; in general, they are computationally expensive, and moreover, easily get stuck in local minima of these nonlinear objective functions. CART re quires searching and sorting all coordinates for the best partition of the input signal space. The second difficulty, the lack of ability to capture local features, implies that the interpretation of the features extracted by these methods be comes difficult. For example, with FFT, one can analyze signals locally in the frequency domain, but cannot obtain local information in the time domain at all. Since KLT and LDA rely on the eigenvalue systems, they are fragile to outliers and perturbations and only extract global features in either the time or frequency domain. ANNs have difficulty in interpreting the physical meaning of the weights of the connections among neurons (computational units), and so on. In short, these methods are overwhelmed by the direct input of raw signals of large dimensions; they are not originally designed as feature extrac tors even though they have been used as such. Rather they are designed to optimize certain criteria for the specific tasks such as compression, denoising, classification, and regression for given features. Therefore, they can be very powerful tools if a small number of key features are supplied to them.
Local Feature Extraction and Its Applications
275
On the other hand, the structural approach, 62 ' 114 is based on the philosophy of "analysis by synthesis". This is similar to the pattern theory of Grenander, which describes signals in terms of a hierarchical composition of predefined primitive components or elementary building blocks (e.g., peaks/valleys or en ergy levels of certain frequency bands of signals etc.); see Ref. 62 for various examples. In particular, this approach associates the description of signals with a formal language theory: it makes the correspondence of: (a) the elementary features of the signals; (b) the signals themselves; and (c) the structural de scription of the signals or class of signals (normally tree structures) to: (a) the words; (b) the sentences; and (c) the syntax or grammar of languages. The fundamental problem of this approach is how to define the elementary building blocks in the first place. It does not solve the feature extraction problem either. Considering this situation, it is worthwhile to pursue a unified approach to feature extraction to fully utilize these available statistical tools and to permit the intuitive interpretation of the results, as in the structural approach. This thesis offers a new approach by blending the statistical and structural approaches. 1.3. The best basis paradigm
and a library of bases
The approach to the feature extraction explored in this thesis is guided by the so-called best basis paradigm.35'105 This paradigm consists of three main steps: 1. select a "best" basis (or coordinate system) for the problem at hand from a library of bases (a fixed yet flexible set of bases consisting of wavelets and their relatives: wavelet packets, local trigonometric bases, and the autocor relation functions of wavelets), 2. sort the coordinates (features) by "importance" for the problem at hand and discard "unimportant" coordinates, and 3. use the surviving coordinates to solve the problem at hand. What is "best" and "important" clearly depends on the problem. For sig nal compression, a basis which provides only a few large components in the coordinate vectors should be used since we can then discard the other compo nents without much signal degradation. Thus, to measure the efficiency of the coordinate system for compression, an information cost such as entropy may be appropriate since entropy measures the number of significant coordinates in a vector. For classification, a basis through which we can "view" classes as maximally-separated point clouds in the n-dimensional space, where n is
276
N. Saito
the number of samples in each signal, is a choice. In this case, the class sep arability index or "distances" among classes (such as relative entropy) should be used as a measure of the efficiency of the coordinate system. For regres sion, a basis through which we can "see" the essential relationships between the input signals and output responses of interest should be used. For this purpose, prediction error such as relative £2 error may measure an efficiency of the coordinate system. For edge characterization problems, a coordinate system which allows one to detect, characterize, and manipulate edges in a convenient manner should be used. The original best basis paradigm was de veloped mainly for signal compression problems. 35 In this thesis, this paradigm is extended to noise removal, classification, and regression problems using these basis selection criteria. One may ask why we use the library approach. There are several answers to this question. If we confine ourselves to a single coordinate system for the problem at hand, we may lose our flexibility for handling various changes in signals; for example, KLT and LDA work perfectly if all the signals or the classes of signals obey the multivariate normal distributions, in fact, they pro vide the optimal coordinate system under these assumptions. However, for the cases where these assumptions cannot be guaranteed, they may fail miserably. On the other hand, if we try to seek the absolutely best coordinate system without limiting our resources and without assuming models as Kolmogorov suggested, it may take an infinite amount of time to compute or find it. In Y. Meyer's words, we cannot afford to have "the library of Babel" 105 which includes all possible coordinate systems of K n in our case. Therefore, a com promise is necessary. As Rissanen correctly pointed out in his book, 128 we need to define a language to describe the signals as in the syntactic approach, and within that language, we should seek the best possible solution by opti mizing certain criteria depending on the problem at hand. The language in our context is a collection of different coordinate systems or bases. Hence, the performance of the signal analysis tasks strongly depends on what language we use. The language must be flexible and versatile enough to describe various local features of signals such as transients and singularities but also must be computationally efficient to be practical. This is why we use wavelets and their relatives as library members. They provide flexible coordinate systems which can capture local features in the time-frequency plane in a computationally efficient manner: e.g., to find a good coordinate system for one's problem, it costs 0(n[logn] p ), where p = 0,1,2
Local Feature Extraction and Its Applications
277
depending on the basis type. These bases are particularly useful for the sig nals considered in this thesis. Some signals represent the multiscale or fractal nature of geological formations (see e.g. Refs. 83 and 146) which may be wellcompressed by the standard wavelet bases including the Haar basis. If one cares about edge types (steps or ramps, etc.) in this class of signals, then the autocorrelation functions of wavelets provide a natural way to detect them and characterize them. Other signals represent responses of natural objects to the inputs of acoustic energy of oscillatory nature. These signals may be handled naturally by local trigonometric bases since they perform "local Fourier analy sis" by segmenting the time axis into smaller windows. Moreover, these bases "almost diagonalize" Green's operator for the partial differential equations de scribing the system responses. 16 The wavelet packets can be considered as a dual version of the local trigonometric bases; they segment the frequency axis so that the oscillatory signals can be handled efficiently. Figure 1.1 shows some of the basis functions in the library. Standard Basis
Haar Basis
"I" -v0 20 40 60 80100120
0 20 40 60 80100120
0 20 40 60 80100120
C12Wavlet Packet Basis
Local Skis Basis
Discrete Sine Basis
0 20 40 60 80100120
0 20 40 60 80100120
-AT
0 20 40 60 80100120
Fig. 1.1. Examples of basis functions used in this thesis. Top row: from left, the standard Euclidean basis, the Haar basis, and the Walsh basis. Bottom row: the smooth wavelet packet basis, the local sine basis, and the discrete sine basis. Horizontal axes indicate time in this figure.
278
N. Saito
The connection of this approach to the syntactic approach can be explained as follows: as the words, i.e. the elementary building blocks, we use the basis vectors of the wavelets and their relatives, such as those shown in Fig. 1.1. A collection of words defines a dictionary which corresponds to a collection of wavelet packet bases specified by their frequency localization characteristics, or a collection of local trigonometric bases. As shown in Fig. 1.1, each dic tionary has its own characteristics or flavor; the "Haar-Walsh dictionary" has various scaled and oscillatory versions of a simple piecewise constant function whereas the "local sine dictionary" contains localized smooth oscillations, etc. A library is defined as a collection of such dictionaries. Finally, the grammar for describing a signal or a class of signals is defined as a binary tree of selected subspaces or basis vectors from this library. In summary, this paradigm (with the library of bases) provides us with an array of tools bridging between "two extremes," i.e. (a) the standard Eu clidean basis and the Fourier basis; (b) the computational efficiency and the descriptive efficiency; and (c) the statistical approaches and the structural ap proaches. This paradigm leads us to a vastly more efficient representation, processing, and analysis of signals, compared with strategies of confining our selves to a single basis, or of seeking the absolutely best solution without restricting the size of the library. This thesis is a record or a snapshot of the continuing effort of implementing this philosophy begun by Coifman, Meyer, and Wickerhauser. 35 ' 105 1.4. Overview
of the
thesis
This thesis explores a variety of feature extraction and signal analysis problems using the best basis paradigm. In particular, the original best basis paradigm proposed by Coifman, Meyer, and Wickerhauser is extended for: simultane ous noise suppression and signal compression, classification, and regression, focusing on the real geophysical datasets. We also derive a library of nonorthogonal bases using the autocorrelation functions of wavelets for multiscale edge characterization and representation. We first review our elementary building blocks, i.e. wavelets, wavelet pack ets, and local trigonometric bases, and define a dictionary and a library of orthonormal bases more precisely in Sec. 2. We also review the original best basis algorithm of Coifman and Wickerhauser, and compare it with the KLT. The original contribution of this thesis starts from Sec. 3, where we con sider how to simultaneously remove random noise (e.g., additive white Gaussian
Local Feature Extraction and Its Applications
279
noise) and compress a signal component and show how naturally the concept of the minimum description length principle fits the best basis paradigm. The key observation here is that one or more of the bases in a library of orthonormal bases can compress the signal component quite well whereas the noise component cannot be compressed efficiently by any basis in the library. Based on this observation, we derived an algorithm to estimate the signal component in the data by obtaining the "best" basis and the "best" number of terms to retain using the MDL criterion. Because of the use of the MDL criterion, this algorithm does not require the user to specify any parameter or threshold values. Then in Sec. 4 we derive an algorithm for selecting a basis suitable for clas sification from the library. This algorithm uses a discrimination measure such as relative entropy as a basis selection criterion and provides a fast numerical algorithm of 0(n[logn] p ), p — 1,2. Once the basis for classification has been selected [we call this the Local Discriminant Basis (LDB)], a small number of most important features (for classification) is supplied to the conventional clas sifiers such as LDA or CART. Using two synthetic datasets, we demonstrate the superiority of our method over the direct application of these conventional classifiers to the raw input signals. As a further application of LDB, we also describe a method to extract signal component from data consisting of signal and textured background. In Sec. 5, we extend the best basis paradigm for regression problems. The proposed method uses prediction error (computed by a specified regression scheme) as a measure of the goodness of bases so that the regression scheme is integrated into the basis selection mechanism. We call such basis Local Regression Basis (LRB). We show that the LRB method can also be used for the classification problems and examine its performance using the same examples as Sec. 4. The LRB method is more flexible and general than the LDB method; however, it is more computationally demanding than the LDB method. In Sec 6, we apply the LDB and LRB algorithms to a real geophysical regression problem, i.e. prediction of geological information about subsurface formations (e.g., volume fractions of quartz, gas, etc.) from acoustic welllogging waveforms. Using these methods, we extract the useful features for predicting this information. The results, in general, agree with the explana tions from the physics of wave propagation, although our use of the physics in constructing the regression rules is minimal. The best results using our
280
N. Saito
methods are found to be comparable to the predictions using the physicallyderived quantities. In Sec. 7, we derive a library of non-orthogonal bases using the autocorrela tion functions of wavelets with which we can explicitly extract multiscale edge information and characteristics of singularities. This library provides us shiftinvariant multiresolution representations of signals which have: (a) symmetric analyzing functions; (b) shift-invariance; (c) natural and simple iterative in terpolation schemes; (d) a simple algorithm for finding the locations of the multiscale edges as zero-crossings. Then we develop a non-iterative method for reconstructing signals from their zero-crossings (and slopes at these zerocrossings) in our representation. This method reduces the problem to that of solving a system of linear equations. For certain classes of signals, such as frequency-modulated signals (often called chirps), our basis functions in the library may not be too efficient. In Sec. 8, we discuss how to handle this problem and propose an algorithm to discover the modulation laws of simple chirp signals. Finally, we conclude in Sec. 9 with some discussion on our future projects. We would like to note that the following sections are the detailed and expanded version of our published materials. Section 3 is based on Refs. 129 and 130. Section 4 is based on Refs. 34 and 135. Section 5 is based on Ref. 34. Section 7 is based on Refs. 132-134. We concludes this section by quoting Y. Meyer 105 : "Wavelets, whether they are of the time-scale or time-frequency type, will not help us to explain scientific facts, but they will serve to describe the reality around us, whether or not it is scientific." In this thesis, we try to show that the good description sometimes makes an explanation of scientific facts easier. 2. A Library of Orthonormal Bases 2.1.
Introduction
In Sec. 1, we have defined our strategy for feature extraction: analyzing and describing signals using a library of bases, in particular, orthonormal bases of wavelets, wavelet packets, and local trigonometric transforms. In this section, we review the most important properties of these bases and precisely define what a dictionary and a library mean, for their applications starting from the
Local Feature Extraction and Its Applications
281
next section. Since the autocorrelation functions of wavelets will not be used until Sec. 7, we defer their definitions and properties. Then, as a first step of the "best basis paradigm", we review the "best basis" algorithm of Coifman and Wickerhauser 35 which was developed mainly for signal compression, and compare this basis with the well-known Karhunen-Loeve basis. In Sec. 2.7, we briefly review the higher dimensional versions of these bases. Historical aspects and more details of the properties of these bases can be found in the literature, most notably, in Refs. 3, 43, 105, 106, 124 and 157. Throughout this thesis, we consider real-valued discrete signals with finite length n (= 2 n °). To focus our attention on our main theme, we assume the periodic boundary condition on the signals; a signal x = (xk)£~Q is ex tended periodically beyond the interval [0, n — 1] with xk = xk (mo(j n ) for any k € Z if necessary. If one is concerned with the discontinuities created by the periodic boundary condition, their effects can be reduced by considering an evenly-folded version x' = (xo, ■ ■ ■,xn-i,in-i> • • •,xo) of period of 2n. The compactly-supported orthonormal wavelet bases which do not assume the pe riodic boundary conditions have been proposed; see e.g. Ref. 28. These can certainly be incorporated in our library of bases. 2.2. Wavelet
bases
The wavelet transform42,43'93'94,106 can be considered as a smooth partition of the frequency axis. The signal is first decomposed into low and high frequency bands by the convolution-subsampling operations with the pair consisting of a "lowpass" filter {hk}%~Q anc ^ a "highpass" filter {gk}kIo directly on the discrete time domain. Let / = {/fc}]^ 1 be a real-valued vector of even length K. Let H and G be the convolution-subsampling operators using these filters which are defined as: L-l
(Hf)k = £ hihk+l, (=0
L-l
(G/)fc = £
9lhk+i,
1=0
for fc = 0 , 1 , . . . , K — 1. Because of the periodic boundary condition on / (whose period is K), the filtered sequences Hf and Gf are also periodic with period K/2. Their adjoint operations (i.e. upsampling-anticonvolution) H* and G* are defined as
0
0
282
N. Saito
for k = 0 , 1 , . . . , 2K — 1. The operators H and G are called (perfect reconstruc tion) quadrature mirror filters (QMFs) if they satisfy the following orthogonal ity (or perfect reconstruction) conditions:
HG*=GH'=0,
and
H'H + G*G = I,
where I is the identity operator. These conditions impose some restrictions on the filter coefficients {hk} and {gk}- Let TUQ and mi be the bounded periodic functions defined by L-\
h-\
h K
mo(0 = X) ^
•
m
i (0 = Efl*c<*•
Daubechies proved in Ref. 42 that H and G are QMFs if and only if the following matrix is unitary for all { g R : ™o(0
rnQ(Z + Tr) \
"*l(0
"li(£ + 7T) /
Various design criteria (concerning regularity, symmetry etc.) on the lowpass filter coefficients {hi} can be found in Ref. 43. Once {hk} is fixed, we can have QMFs by setting gk = (-l) fc /iL-i-fc. This decomposition (or expansion, or analysis) process is iterated on the low frequency bands and each time the high frequency coefficients are retained. At the last iteration, both low and high frequency coefficients are kept. In other words, let x = {xk}^Zo G K n be a vector to be expanded. Then, the convolution-subsampling operations transform the vector x into two subse quences Hx and Gx of lengths n/2. Next, the same operations are applied to the vector of the lower frequency band Hx to obtain H2x and GHx of lengths n/4. If the process is iterated J (< no) times, we have the discrete wavelet coefficients (Gx, GHx, GH2x,..., GHJ~xx, HJx) of length n. As a result, the wavelet transform analyzes the data by partitioning its frequency content dyadically finer and finer toward the low frequency region (i.e. coarser and coarser in the original time domain). If we were to partition the frequency axis sharply using the characteristic functions (or box-car functions), then we would have ended up the so-called Shannon (or Littlewood-Paley) wavelets, i.e. the difference of two sine func tions. Clearly, however, we cannot have a finite-length filter in the time do main in this case. The other extreme is the Haar basis which partitions the
Local Feature Extraction and Its Applications
283
frequency axis quite badly but gives the shortest filter length (L = 2 with ho = h\ = l/\/2) in the time domain and which are suitable for describing discontinuous functions. The QMFs described in detail in Ref. 43 essentially bridge between these two extreme cases; see Fig. 1.1 of Sec. 1. The reconstruction (or synthesis) process is also very simple: starting from the two lowest frequency bands HJx and GHJ~1x, the adjoint operations are applied and added to obtain HJ~1x — H*HJx + G*GHJ~1x. This process is iterated to reconstruct the original vector x. The computational complexity of the decomposition and reconstruction process is in both cases 0(n) as easily seen. Because of the perfect reconstruction condition on H and G, each decom position step is also considered as a decomposition of the vector space into mutually orthogonal subspaces. Let flo,o denote the standard vector space R n . Let Qito and fi^i be mutually orthogonal subspaces generated by the applica tion of the projection operators H and G respectively to the parent space fio.oThen, in general, j t h step of the decomposition process can be written as ftj.o = fy+1,0 © fij+i,i for j = 0 , 1 , . . . , J. It is clear that dimfi^. = 2 n ° - J . The wavelet transform is simply a way to represent Qo,o by a direct sum of mutually orthogonal subspaces,
and the decomposition process is illustrated by the following figure: We can construct the basis vector Wj,k,i € ^ j , * * fc = 0,1, at "scale" j and "position" I {0 < I < 2n°-j,l e Z) simply by putting (GkHj-kx)i = Siti, where 6ij denotes the Kronecker delta, setting all the coefficients except j at the finer scales to zero, and synthesizing x = Wj,k,i by the reconstruction algorithm. Using these basis vectors, we can express the wavelet transform in a vector-matrix form as
a =
WTx,
where a 6 R" contains the wavelet coefficients and W £ R n X n is an orthogonal matrix consisting of column vectors Wjtk,i- This basis vector has the following important properties:
284
N. Saito nOl.l
^o,o-
r-n2,i
Fig. 2.1. A decomposition of fio.o into the mutually orthogonal subspaces using the wavelet transform (with J = 3). The symbols in bold font represent the subspaces kept intact by the wavelet transform.
vanishing
moments: n-1
^2 imwjtk,i(i)
= 0 for m = 0 , 1 , . . . , M - 1.
t=0
The higher the degrees of vanishing moments the basis has, the better it compresses the smooth part of the signal. In the original construction of Daubechies, 42 it turns out that L = 2M. There are several other possibilities. One of them is a family of the so-called "coifiets" with 43 L = 3M which are less asymmetric than the original wavelets of Daubechies. • regularity. |t0j,fc,{(z + 1) - Wjtk,i(i)\ ^
C
2_J",
where c > 0 is a constant and p > 0 is called the regularity of the larger the value of p is, the smoother the basis vector becomes. may be important if one requires high compression rate since the basis vectors become "visible" in those cases and one might fractal-like shapes in the compressed signals/images. 122 • compact support: wj
wavelets. The This property the shapes of want to avoid
for i £ [2jl, 2jl + (2j - 1)(L - 1)].
The compact support property is important for efficient and exact numerical implementation. Due of these properties, wavelet bases generate very efficient and simple representations for piecewise smooth signals and images.
Local Feature Extraction and Its Applications
2.3.
Wavelet
packet
285
bases
For oscillating signals such as acoustic signals, however, the analysis by the wavelet transform is not always efficient because it only partitions the fre quency axis finely toward the low frequency. The wavelet packet trans form31'32'105'157 is a generalized version of the wavelet transform: it decom poses even the high frequency bands which are kept intact in the wavelet transform. Examples of the wavelet packet basis vectors were already shown in Fig. 1.1. They are much more oscillatory compared to the wavelet basis vectors. The first level decomposition generates Hx and Gx just like in the wavelet transform. The second level generates four subsequences, H2x, GHx, HGx and G2x. If we repeat this process for J times, we end up having Jn expansion coefficients. It is easily seen that the computational cost of this whole process is about O(Jn) < 0 ( n l o g 2 n ) . This iterative process naturally generates subspaces of R" of a binary tree structure where the nodes of the tree represent subspaces with different frequency localization characteristics. The root node of this tree is again fio.o- The node ilj^ splits into two orthogonal subspaces ftj+i^t and £lj+i,2k+i by the operators H and G, respectively: fij.fc = fii+i,2fc © nj+1}2k+i
for j = 0,1,...,J,
k = 0,...,2j
-1.
The following figure shows the binary tree of the subspaces of flo.o: Clearly, we have a redundant set of subspaces in the binary tree. In fact, it is easily proved that there are more than 2 2 possible orthonormal bases in this binary tree; see e.g. Ref. 157. This binary tree is our main tool in this thesis: Definition 2.1. A dictionary of orthonormal bases D for R" is a binary tree if it satisfies: (a) Subsets of basis vectors can be identified with subintervals of I = [0, n) of theform/j.fc = ^"^k^^-^k + l)), for j = 0 , 1 , . . . , J, k = 0 , 1 , . . . , 2 ' 1, where J
Qj+i,2k®^j+i,2k+i-
Each subspace Cljtk is spanned by 2"° _ J basis vectors {wj,k,i}2=o _ 1 - In the wavelet packet dictionary, the parameters fc and I roughly indicate
286
N. Saito
frequency bands c and the location of the center of wiggles, respectively: the vector Wjtk,i is roughly centered at 2H, has length of support « 2 J , and oscil lates RS k times. Note that for j = 0, we have the standard Euclidean basis of R n . By specifying a pair of QMFs, we obtain one dictionary which contains a large number of orthonormal bases of R n : we have a large number of coor dinate systems to "view our signals" at our disposal. An important question is how to select the best coordinate system efficiently for the problem at hand from this dictionary. This is the main theme of this thesis. 2.4. Local trigonometric
bases
Local trigonometric transforms 3,33,105 ' 157 or lapped orthogonal transform 99 ' 100 can be considered as conjugates of wavelet packet transforms: they partition the time axis smoothly. In fact, Coifman and Meyer 33 showed that it is possible to partition the real-line into any disjoint intervals smoothly and construct orthonormal bases on each interval. Each basis function on an interval uses the signal values on the interval itself and on the adjacent intervals; hence it is named the "lapped" orthogonal transform. It is also "local" since this essentially performs the Fourier analysis on the short intervals. Figure 1.1 in Sec. 1 clearly showed this capability. Since it partitions the time axis smoothly, these local cosine and sine transforms (LCT/LST), have less edge (or blocking) effects than the conventional discrete cosine/sine transforms (DCT/DST). For definiteness we use a particular symmetric window (or "bell") function s i n - ( l + sin7rf)
ii--
0
otherwise.
This function generates a partition of unity: oo
^2
&2(* + k) = 1
for any t G E .
fc= — oo
This function is also symmetric about t = 1/2, smooth on (—1/2,3/2) with vanishing derivatives at the boundary points, so that it has a continuous deriva tive on R; see Ref. 157 for more general bell functions. Now let us consider c T h e original binary tree generated by successive applications of H and G is called "Paley ordered" and the frequency band of fl,*. is not monotonically increasing as a function of k. This behavior is corrected by the so-called "Gray code" permutation; see Ref. 157 for details.
Local Feature Extraction and Its Applications
287
the interval of integers I = { 0 , 1 , . . . ,n — 1}. The lapped orthogonal func tions are mainly supported on I but also take some values on {—n/2,..., — 1} and { n , . . . ,3n/2 - 1}. For integers k £ / , we can define the local sine basis functions on the interval / :
By replacing sin by cos, we obtain the local cosine basis function Ck(l) as well. Apart from the bell function factor, these are exactly the basis functions for the so-called DST-IV (and DCT-IV for cosines).121 The orthogonality condition certainly holds since 3n/2-l
J2
Sk(l)Sk,(l) = 5k,k,.
l=-n/2
The bell function allows sines on adjacent intervals to overlap while remain ing orthogonal. For example, the function Sk(l + n) is centered over the range I £ {—n,—n+ 1,...,—1} and overlaps with the function Sk<(l) at I £ {—n/2, —n/2 + 1 , . . . , n/2 — 1}. Yet these two are orthogonal since n/2
^2
Sk(l + n)Sk.(l) = 0
for any k,k' 6 { 0 , 1 , . . . , n - 1} .
J=-n/2
In the numerical implementation of these transforms, we should not calculate the actual inner products of data and the basis functions. Instead, we should preprocess the data so that the standard DST-IV (or DCT-IV for LCT) algo rithms may be used. This is accomplished by "folding" the overlapping parts of the bells back into the interval. This folding can be transposed onto the data and results in the disjoint intervals of samples over which the DST-IV al gorithms can be applied. These folded disjoint intervals can also be "unfolded" to produce smooth overlapping segments. The details of these operations can be found in Ref. 3, 156 and 157. Since we can segment an interval into arbitrary disjoint intervals, it is nat ural to segment the whole interval into dyadic subintervals recursively. This segmentation makes the number of signal samples contained in each subinterval a dyadic number (2 n ° - - 7 at step j) if the length of the original signal is also a dyadic number (2"°); this enables one to utilize the fast DCT/DST-IV algorithm. 121 By this segmentation, the original interval / = [0,n) is split into
288 N. Saito
[0,n/2) and [n/2,n), and each subinterval is further split into two in a recur sive manner. Let us set Io,o = I and let Ijtk be a subinterval of / after j t h iteration of the splitting process. Then we have a familiar relation /,-.* = I.j+l,2fc U / j + i i 2 *+i
for j = 0 , 1 , . . . , J, k = 0 , 1 , . . . , 7? - 1.
Now we can consider the subspaces Qj^ associated with the interval Ijtk • Then we obtain a binary tree of the subspaces with the same tree structure as the one shown in Fig. 2.2. Each subspace is spanned by the basis vectors
r^2
rfil.l-J -^2
^o,o-
Lfl i,o-
u^o-T3'1
Fig. 2.2. A decomposition of fio.o into the tree-structured subspaces using the wavelet packet transform (with J = 3).
/ \ Wjkdm) =
,
1
Jm+l/2 =b[ — -.
k
\
. / / l\\m + l/2 sin In ll + — '-.
k
1\
,
for LST. The LCT version can be obtained in an obvious manner. Note that the triplet (j, k, I) now corresponds to scale, location (or window index) and fre quency, respectively. For j = 0, this reduces to a simple DCT/DST. Hence, we can obtain two additional dictionaries of orthonormal bases using LCT/LST. The computational complexity to obtain this dictionary (or expanding a sig nal into this dictionary) is about 0(n[log 2 n] 2 ); see e.g. Ref. 157. Once the dictionary of orthonormal bases is obtained, the question is again how to select the best basis for the problem at hand from the collection of bases.
Local Feature Extraction and Its Applications
289
In Sec. 2.5, we review the best basis algorithm of Coifman-Wickerhauser for signal compression. 2.5. Selection of a "best basis" from a library of orthonormal bases 2.5.1. Information cost functions An efficient coordinate system for representing a signal should give large mag nitudes along a few axes and negligible magnitudes along most axes when the signal is expanded into the associated basis. We then need a measure to eval uate and compare the efficiency of many bases. Let J denote this measure which is often called "information cost" function. There are several choices for J; see e.g. Chap. 9 of Ref. 63 and Refs. 112 and 157. All of them essentially measure the "energy concentration" of the coordinate vector. A natural choice for this measure is the Shannon entropy of the coordinate vector. 35 ' 157 Let us define the entropy of a non-negative sequence p = {p{} with YliPi — 1 by tf(P) = - £ p i l ° g 2 P i ,
(2.1)
t
with the convention 0 • log 0 = 0. (From now on, we use "log" for the logarithm of base 2.) For a signal x, we set pi = (|xj|/||x|| r ) r where || • || r is the C norm and 1 < r < oo and define
Often r = l o r r = 2 i s used. In this thesis, we always use r = 2. 2.5.2. Best basis selection from a dictionary of orthonormal bases The "best basis" algorithm of Coifman and Wickerhauser 35 was developed mainly for signal compression. This method first expands a given single signal into a specified dictionary of orthonormal bases. Then a complete basis called a best basis (BB) which minimizes a certain information cost function such as entropy (2.2) is searched in this binary tree using the divide-and-conquer algorithm. More precisely, let Bjtk denote a set of basis vectors belonging to the subspace fi^* arranged as a matrix B
i,k = {Wj,k,0, ■■-, ^i,fc,2»o-i-l) T •
(2.3)
290
N. Saito
Now let Ajtk be the BB for the signal x restricted to the span of Bjik and let J be an information cost function measuring the goodness of nodes (subspaces) for compression. The following BB algorithm essentially "prunes" this binary tree by comparing efficiency of each parent node and its two children nodes: Algorithm 2.2 (The Best Basis Algorithm 3 5 ). Given a vector x, Step 0: Choose a dictionary of orthonormal bases 2) (i.e. specify QMFs for a wavelet packet dictionary or decide to use either the local cosine dictionary or the local sine dictionary) and specify the maximum depth of decomposition J and an information cost J. Step 1: Expand {Bj,kX}0<j<j,
x
into
the dictionary
2) and obtain
coefficients
0
Step 2: Set AJtk = BJ
if J(Bjtkx)
Aj,k = <
<
J(Aj+i}2kxUAj+i>2k+ix), Aj+i,2k © Aj+i^k+i
(2.4)
otherwise.
To speed up this algorithm, the cost functional J needs to be additive: Definition 2.3. A map J from sequences {x(\ to R is said to be additive if J ( 0 ) = 0 a n d J({Si}) = £ i . 7 ( * i ) Thus, if J is additive, then in (2.4) we have J(Aj+i<2kX
U Aj+ifik+ix)
= J{Aj+i<2kx)
+
J{Aj+i<2k+ix).
This implies that a simple addition suffices instead of computing the cost of union of the nodes. Although (2.1) is additive with respect to p , Hr(x) is not additive with respect to x. But it is easy to show that minimizing the additive measure hr(x)^-J2\xi\rlog\Xi\r (2.5) t
implies minimizing Hr(x) since Hr(x) = hr(x)/\\x\\^. + log ||x||£.
Local Feature Extraction and Its Applications
291
With the additive information cost function, we have the following propo sition: Proposition 2.4 (Coifman & Wickerhauser 3 5 ). Algorithm 2.2 yields the best basis relative to J if J is additive. See Refs. 35 and 157 for the proof. The computational complexity of computing the best basis from a dictio nary is O(nlogn) for a wavelet packet dictionary and 0(n[logn] 2 ) for a local trigonometric dictionary; it is dominated by the expansion of a signal into the dictionary and the cost for searching the BB is about 0{n) because of the use of the divide-and-conquer algorithm. The reconstruction of the original vector from the BB coefficients has the same computational complexity. 2.5.3. Best basis selection from a library of orthonormal bases We now consider a "meta" algorithm for the best basis selection. Definition 2.5. A library of orthonormal bases for R n is a collection of the dictionaries of orthonormal bases for R". This library of bases is more adaptable and versatile for representing various transient signals than a single dictionary of bases. For example, if the signal consists of blocky functions such as acoustic impedance profiles of subsurface structure, the Haar-Walsh dictionary captures those discontinuous features ac curately and efficiently. If the signal consists of piecewise polynomial functions of order p, then the Daubechies wavelets/wavelet packets with filter length L > 2(p + 1 ) or the coiflets with filter length L > 3(p + 1) would be efficient because of the vanishing moment property. If the signal has a sinusoidal shape or highly oscillating characteristics, the local trigonometric bases would do the job. Moreover, computational efficiency of this library is also attractive; the most expensive expansion in this library, i.e. the local trigonometric expansion, costs about 0(n[logn] 2 ). How can we choose the best dictionary from this library? The strategy of Coifman and Majid 30 is very simple: pick the one giving the minimum entropy among them. (We note that the purpose of Ref. 30 is not the compression but the noise removal. More on this aspect in Sec. 3.) More precisely, let £ — { S i , . . . , 2 ) M } denote a library of orthonormal bases where S m represents
292 N. Saito
a dictionary of orthonormal bases. For each dictionary 25 m , the BB Q5m of the signal a; is computed by Algorithm 2.2. This generates M different sets of the expansion coefficients { a m } " = 1 of the signal. For each expansion coefficient set, entropy h
basis
To compress a given set of signals { a j j } ^ C ^ C R " rather than a single sig nal, one of the well-known traditional methods is the Karhunen-Loeve trans form (KLT). 1,63,112 - 150 Let X denote the data matrix: X = (xi,...,xN) G RnxN. Then, the Karhunen-Loeve basis (KLB) for a finite number of realvalued discrete signals of finite lengths is denned as the eigenvectors of the symmetric positive definite matrix called the sample autocorrelation matrix
The KLB satisfies a number of optimality criteria, and in particular, it is the minimum entropy basis in R" among all the orthonormal bases associated with orthogonal transformations in 0(n).150 To measure the efficiency of the coordinate systems for a set of signals, the entropy of the total energy distri bution to the coordinate axes is used. This energy distribution is defined as the normalized diagonal vector of the sample autocorrelation matrix: 7
±
*
diag(fl x )
||diag(/fr)ir
Thus, 7 x (i) represents the energy distribution to the ith coordinate axis of the signals in the original basis which is normally either the standard Euclidean ba sis or the discrete trigonometric basis. Then, i/(7x)> the entropy (2.1) for the vector 7 x is well-defined. Now we rotate the axes or apply an orthogonal trans formation U € 0(n) to the set of signals. This generates a set of transformed signals j/j = iP-Xi for i — 1 , . . . , N. Let Y = UTX. Then, the energy distribu tion of the new coordinate system is given by yy = diag(i?y) = diag(i?(/Tx)S. Watanabe proved the following theorem: Theorem 2.6 (Watanabe 1 5 0 , 1 5 2 ). The orthogonal matrix U € 0(n) is the Karhunen-Loeve basis UKL if and only if
Local Feature Extraction and Its Applications
^(7t/TtX) =
293
(/nuni)H(7t/TX).
The main problem of the KLB is its computational cost 0 ( n 3 ) for diagonalizing Rx- In fact, its dependence on the eigenvalue system creates more problems; the sensitivity to the alignment of the signals and difficulty in cap turing local features in the signals. Wickerhauser proposed a method to overcome these problems of the KLB using the "best basis paradigm," which is an extension to the BB met hod. 155 ' 157 Let us fix a dictionary 2). Then, the idea is to use the energy distribution of the set of the signals to the coordinate axes in 2) by computing ^2i=i(wJ,k,ix*)2 f° r each (j, k,I) and organize them into a binary tree so that the divide-and-conquer algorithm can search a basis minimizing the entropy of the energy distribution from the tree-structured subspaces corresponding to the tree of energy distribution. Such a BB is called the joint best basis (JBB) for { x j } ^ ! . In this thesis, we will also use the term "best basis" as a JBB for simplicity. The following is the algorithm for obtaining such a BB derived by Wickerhauser. We also show the computational cost at each step. Algorithm 2.7 (The Joint Best Basis Algorithm 1 5 5 ). signals {xi}?=1,
Given a set of
Step 0: Choose a dictionary of orthonormal bases I) (i.e. specify QMFs for a wavelet packet dictionary or decide to use either the local cosine dictionary or the local sine dictionary) and specify maximum depth of decomposition J and an additive information cost J. Step 1: Expand {xi}^
into the dictionary 2); 0{Nn log n).
Step 2: Sum the squares of the expansion coefficients into the tree of energy distribution; 0(n log n). Step 3: Search the tree for a JBB;
0(n).
Step 4: Sorting the BB vectors in decreasing order of importance; 0(n log n). Step 5: Use the top k (< n) BB vectors for representing the signals. One can further apply the KLT to the top k BB vectors to compress more with additional cost 0(k3); however, since a good amount of compression is usually achieved at Step 5, this final KLT may not be necessary.
294
N. Saito
We note that a JBB as well as a KLB can also be computed after subtracting the mean {1/N) J2i=i x « fr°m e a c n signal. The entropy criterion used in the BB algorithm is good for signal compres sion; however, it may not be necessarily good for other problems. We extend this algorithm for classification in Sec. 4 and for regression in Sec. 5. 2.7. Extension
to
images
For images or multi-dimensional signals, we can easily extend the above algo rithms by using the multi-dimensional version of the wavelets, wavelet packets, and local trigonometric transforms. In this section, we briefly summarize the two-dimensional (2-D) versions of these transforms. For the 2-D wavelets, there are several different approaches. The first one, which we call the se quential method, is the tensor product of the one-dimensional (1-D) wavelets: applying the wavelet expansion algorithm separately along two axes t\ and t?. corresponding to column (vertical) and row (horizontal) directions respectively. Let x € R"i x n 2 and Hi,Gi be the 1-D convolution-subsampling operators de fined on matrices along axis U,i = 1,2. Then this version of the 2-D wavelet transform first applies the convolution-subsampling operations along the t\ axis to obtain x\ = {G\x, G\H\x,..., GiHl1x), then applies the convolutionsubsampling operations along the <2 axis to get the final 2-D wavelet coeffi cients (G2Xx,G2H2X\,... ,G2H23X\) of length ni x 712, where Jj (< lognj) and J2 (< log 712) are maximum levels of decomposition along t\ and £2 axes respectively. We note that one can choose different 1-D wavelet bases for ti and *2 axes independently. Given M different QMF pairs, there exist M 2 possible 2-D wavelets using this approach. The second approach is the basis generated from the tensor product of the multiresolution analysis. This decomposes an image x into four different sets of coefficients, H1H2X, GiH2x, HiG2x and G1G2X, corresponding to "low-low", "high-low", "low-high", "high-high" frequency bands of the image, respectively. The decomposition is iterated on the "low-low" frequency band and this ends up in a "pyramid" structure of coefficients. Transforming the digital images by these wavelets to obtain the 2-D wavelet coefficients are described in detail in e.g. Refs. 43 and 94. There are also 2-D wavelet bases which do not have a tensor-product struc ture, such as wavelets on the hexagonal grids and wavelets with matrix dila tions; see e.g. Refs. 65 and 84 for details.
Local Feature Extraction and Its Applications
295
There has been some argument as to which version of the 2-D wavelet bases should be used for various applications. 13 ' 43 Our strategy toward this problem is this: we can put as many versions of these bases in the library as we can afford it in terms of computational resources. Then we select the best possible basis for the problem at hand. As for the 2-D version of the wavelet packet transform, the sequential method may be generalized, but it is not easily interpreted; the 1-D BBs may be different from column to column so that the resultant coefficients viewing along the row direction may not share the same frequency bands and scales unlike the 2-D wavelet bases. This also makes the reconstruction algorithm complicated. Therefore, we should use the other tensor-product 2-D wavelet approach for the construction of the 2-D wavelet packet BB: we recursively decompose not only the "low-low" frequency band but also the other three fre quency bands. This process produces the "quad-tree" structure of subspaces instead of the "binary-tree" structure for 1-D wavelet packets. The 2-D version of the local trigonometric transforms can be constructed similarly: the original image is smoothly folded and segmented into four subimages recursively, and in each subimage the separable DCT/DST-IV is applied. This process results in the quad-tree structure of the subspaces. Thus, these 2-D wavelet packet transforms and local trigonometric trans forms generate dictionaries of orthonormal bases of the quad-tree structure. The 2-D BB can be selected in the same manner as the 1-D version. As for the computational complexity for an image of n = n\ x « 2 pixels, it costs ap proximately 0(n), 0 ( n l o g 4 n ) , 0(n[log 4 n] 2 ) to compute a 2-D wavelet, a 2-D wavelet packet BB, a 2-D local trigonometric BB, respectively; see Refs. 156 and 157 for the details of the 2-D version of the BB algorithms. 3. A Simultaneous Noise Suppression and Signal Compression Algorithm 3.1.
Introduction
This section attempts to bridge between the "best basis paradigm" and Rissanen's minimum description length (MDL) principle. 128 In particular, we pro pose an algorithm to simultaneously reduce random noise in signals and com press the "structural" component of the signals using a library of orthonormal bases and the MDL criterion.
296
N. Saito
Wavelet transforms and their relatives have given a major impact on data compression field; see e.g. Refs. 17, 49, 73, 138 and 156. Meanwhile, several researchers have claimed that wavelets and their relatives are also useful for reducing noise in (or denoising) signals/images. 30 ' 36 ' 51 ' 53 ' 92 ' 97 In this section, we take advantage of both sides: we show that compression of signals leads to random noise suppression in the signals. In other words, we try to "kill two birds with one stone." Throughout this section, we consider a simple degradation model: observed data consists of a signal component and additive white Gaussian noise although it is possible to extend to more general noise models. The key motivation here is that the structural component in signals can often be efficiently represented by one or more of the bases in the library whereas the noise component cannot be represented efficiently by any basis in the library. The use of the MDL criterion frees us from any subjective parameter setting such as threshold selection. This is particularly important for real field data where the noise level is difficult to obtain or estimate a priori. This section is organized as follows. In Sec. 3.2, we formulate our problem. We view the problem of simultaneous noise suppression and signal compres sion as a model selection problem out of models generated by the library of orthonormal bases. In Sec. 3.3, we review the MDL principle which plays a critical role in this thesis. We also give some simple examples to help to under stand its concept. In Sec. 3.4, we develop an actual algorithm of simultaneous noise suppression and signal compression. We also give the computational complexity of our algorithm. In Sec. 3.5, we apply our algorithm to several geophysical data sets, both synthetic and real, and compare the results with other competing methods. We discuss the connection of our algorithm with other approaches in Sec. 3.6. 3.2. Problem
formulation
Let us consider a discrete degradation model y = x + e, where y,x,e € R n and n = 2"°. The vector y represents the noisy observed data and x is the unknown true signal to be estimated. The vector e is white Gaussian noise (WGN), i.e. e ~ A/"(0, o21). Let us assume that a2 is unknown. We now consider an algorithm to estimate x from the noisy observation y. First, we prepare the library of orthonormal bases for x described in Sec. 2.
Local Feature Extraction and Its Applications
297
In this section, we use the library
x = Wma£>,
(3.1)
where Wm e R n x n is an orthogonal matrix whose column vectors are the basis elements of ! 8 m , and a£? e R" is the vector of expansion coefficients of x with only k nonzero coefficients. At this point, we do not know the actual value of k and the basis 53 m . We would like to emphasize that in reality the signal x might not be strictly represented by (3.1). We regard (3.1) as a model at hand rather than a rigid physical model exactly explaining x and we will try our best under this assumption. (This is often the case if we want to fit polynomials to some data.) Now the problem of simultaneous noise suppression and signal compression can be stated as follows: find the "best" k and m given the library £. In other words, we translate the estimation problem into a model selection problem where models are the bases !B m and the number of terms k under the additive WGN assumption. For the purpose of data compression, we want to have A: as small as possible. At the same time, we want to minimize the distortion between the estimate and the true signal by choosing the most suitable basis 2$ m , keeping in mind that the larger k normally gives smaller value of error. How can we satisfy these seemingly conflicting demands? 3.3. The minimum
description
length
principle
To satisfy the above-mentioned conflicting demands, we need a model selec tion criterion. One of the most suitable criteria for our purpose is the so-called Minimum Description Length (MDL) criterion proposed by Rissanen. 126-128 The MDL principle suggests that the "best" model among the given collection of models is the one giving the shortest description of the data and the model itself. For each model in the collection, the length of description of the data
298
N. Saito
is counted as the code length of encoding the data using that model in binary digits (bits). The length of description of a model is the code length of spec ifying that model, e.g., the number of parameters and their values if it is a parametric model. To help to understand what "code" or "encoding" means, we give some simple examples. We assume that we want to transmit data by first encoding (mapping) them into a bitstream by an encoder, then receive the bitstream by a decoder, and finally try to reconstruct the data. Let L{x) denote the code length (in bits) of a vector x of deterministic or probabilistic parameters which are either real-valued, integer-valued, or taking values in a finite alphabet. E x a m p l e 3 . 1 . Code length of symbols drawn from a finite alphabet. Let x = (xi, X2, • •., xn) be a string of symbols drawn from a finite alphabet X, which are independently and identically distributed (i.i.d.) with probability mass function p(x), x e X. In this case, clearly the frequently occurring symbols should have shorter code lengths than rarely occurring symbols for efficient communication. This leads to the so-called Shannon code 40 whose code length (if we ignore the integer requirement for the code length) can be written as L(x) = — logp(i) for x £ X. The Shannon code has the shortest code length on the average, and satisfies the so-called Kraft inequality 40 : ^-2-L(x)<1)
( 3 2 )
xex
which is necessary and sufficient for the existence of an instantaneously decodable code, i.e. a code such that there is no codeword which is the prefix of any other codeword in the coding system. The shortest code length on the average for the whole sequence x becomes n
n
L{x) = J2 L(xi) = - 5 3 logp(xi). i=l
i=l
E x a m p l e 3.2. Code length of deterministic integers. For a deterministic parameter j € Z n = ( 0 , l , . . . , n — 1) (i.e., both the encoder and decoder know n), the code length of describing j is written as
Local Feature Extraction and Its Applications
299
L(j) = logn since logn bits are required to index n integers. This can also be interpreted as a code length using Shannon code for a sample drawn from the uniform distribution over ( 0 , 1 , . . . , n — 1). Example 3.3. Code length of an integer (universal prior for an integer). Suppose we do not know how large a natural number j is. Rissanen 126 proposed that the code of such j should be the binary representation of j , pre ceded by the code describing its length logj, preceded by the code describing the length of the code for log j , and so forth. This recursive strategy leads to L*(j) - log* j + logco = logj + loglogj H
h logco,
where the sum involves only the non-negative terms and the constant co « 2.865 064 which was computed so that equality holds in (3.2), i.e. Yl'jLi 2~L U) = i. This can be generalized for an integer j by denning fl
if j" = 0,
L'{j) = { y log* \j | + log 4co
(3.3) otherwise.
(We can easily see that (3.3) satisfies £ ^ - o o 2~L'W = 1.) The following two examples are important for pruning the tree-based clas sification and regression rules used in Sees. 4-6; see Appendix A for the details. Example 3.4. Code length of a binary string with specified number of 1 's. Let a; be a binary string of length n containing exactly k l's. Then, we must describe: (a) the integer k which requires log(n +1) bits since 0 < fc < n, and (b) the index of this string in the list of all possible strings of length n with k l's, which costs log (£) bits. 117 ' 128 Hence the total description length is L(n, k) = log(n + 1) + log Q
= log ± ± 1 L
(3.4)
bits. Notice that L(n,k) does not depend on the position of l's but just on the number of l's. This encoding scheme wins over the obvious code (i.e., just sending it as it is, which costs n bits) when k is small. As a generalization of this example,
300
N. Saito
E x a m p l e 3.5. Code length of a d-ary string with specified numbers of symbols. Let x be a string of symbols from d-ary alphabet, say, { 0 , 1 , . . . , d — 1} and let n be the length of x and let n* be the number of occurrences of symbol i in x, i.e., n = no H h rid-i- To encode this string, we need to specify: (a) the description length of numbers of occurrences of symbols, ( n o , . . . , n ^ - i ) , which requires l o g ( " ^ ~ 1 ) bits (think of how to assign n apples to d children), and (b) the index of this string x in the list of all possible strings with no 0's, . . . and n<j_i (d — l)'s, which needs log( w " ). Thus, we need L(n;no,...,nd_i) = l o g ( n
) + log (
"
)
Let us now turn to real-valued parameters: E x a m p l e 3.6. Code length of a truncated real-valued parameter. For a deterministic real-valued parameter v € R, the exact code generally requires infinite length of bits. Thus, in practice, some truncation must be done for transmission. Let 6 be the precision and vs be the truncated value, i.e. \v — vg\ < 6. Then, the number of bits required for vs is the sum of the code length of its integer part [v] and the number of fractional binary digits of the truncation precision 6, i.e. L(v5) = L'([v})+\og(l/6).
(3.6)
Having gone through the above examples, now we can state the MDL prin ciple more clearly. Let M. = {6m : m = 1,2,...} be a class or collection of models at hand. The integer m is simply an index of a model in the list. Let x be a sequence of observed data. Assume that we do not know the true model 6 generating the data x. As in Refs. 110 and 128, given the index m, we can write the code length for the whole process as L(x, 6m, m) = L{m) + L(0m\m) + L(x\Om, m).
(3.7)
This equation says that the code length to rewrite the data is the sum of the code lengths to describe: (i) the index m, (ii) the model 6m given m, and
Local Feature Extraction and Its Applications
301
(iii) the data x using the model Om. The MDL criterion suggests picking the model 0m- which gives the minimum of the total description length (3.7). The last term on the right-hand side (R.H.S.) of (3.7) is the length of the Shannon code of the data assuming that the model 0m is the true model, i.e., L(x\em,m)
= - logp{x\6m,m),
(3.8)
and the maximum likelihood (ML) estimate 0m minimizes (3.8) by the defini tion: L(x\em,m) = - logp{x\6m,m) < - logp(x\0m,m). (3.9) We should consider a further truncation of 0m as shown in Example 3.6 above to check that additional savings in the description length is possible. The finer truncation precision we use, the smaller the term (3.9), but the larger the term L(0m\m) becomes. Suppose that the model 6m has km real-valued parameters, i.e. 0m = (0m,i.- • • ,0m,k„)- Rissanen showed in Refs. 126 and 128 that the optimized truncation precision (6") is of order l/i/n and min L(x,0m<s,m,
8)
6
= L(m) + L(0m,s- \m) + L{x\0m,s., « L(m) + V L'([9mij}) + ^
m) +
0{km)
logn + L(x\0m,m)
+ 0(fc m ),
(3.10)
where 0m is the optimal non-truncated value given m, 0m,&- is its optimally truncated version, and L*() is defined in (3.6). We note that the last term 0(km) in the approximation in (3.10) includes the penalty code length neces sary to describe the data x using the truncated ML estimate 0m,S' instead of the true ML estimate 0m. In practice, we rarely need to obtain the optimally truncated value 0m,s- and we should compute 0m up to the machine precision, say, 1 0 - 1 5 , and use that value as the true ML estimate in (3.10). For suffi ciently large n, the last term may be omitted, and instead of minimizing the ideal code length (3.7), Rissanen proposed to minimize MDL(x,0m,m)
= L{m) + V t ' ( [ 9
m
J + ^ l o g n + L(x\0m,m).
(3.11)
The minimum of (3.11) gives the best compromise between the low complexity in the model and high likelihood on the data.
302
N. Saito
The first term on the R.H.S. of (3.11) can be written as L(m) = - logp(m),
(3.12)
where p(m) is the probability of selecting m. If there is prior information about m as to which m is more likely, we should reflect this in p{m). Otherwise, we assume that each m is equally likely, i.e., p(m) is a uniform distribution. Remark 3.7. Even though the list of models M does not include the true model, the MDL method achieves the best result among the available models. See Barron and Cover7 for detailed information on the error between the MDL estimate and the true model. We would also like to note that the MDL principle does not attempt to find the absolutely minimum description of the data. The MDL always requires an available collection of models and simply suggests picking the best model from that collection. In other words, the MDL can be considered as an "oracle" for model selection. 110 This contrasts with the algorithmic complexities such as the Kolmogorov complexity which gives the absolutely minimum description of the data; however, in general, it is impossible to obtain. 126 Before deriving our simultaneous noise suppression and signal compression algorithm in the context of the MDL criterion, let us give a closely related example: Example 3.8. A curve fitting problem using polynomials. Given n points of data (tj, yi) 6 R 2 , consider the problem of fitting a polynomial through these points. The model class we consider is a set of poly nomials of orders 0 , 1 , . . . , n — 1. In this case, 6m = (ao, a i , . . . , a m ) represents the m + 1 coefficients of a polynomial of order m. We also assume that the data are contaminated by the additive WGN with known variance a2, i.e., Vi = x(ti) + Et, where x(-) is an unknown function to be estimated by the polynomial models and £i ~ -A/"(0, a2). To invoke the MDL formalism, we pose this question in the information transmission setting. First we prepare an encoder which computes the ML estimate of the coefficients of the polynomial, ( a o , . . . , a m ) , of the given degree m from the data. (In the additive WGN assumption the ML estimate coincides with the least squares estimate.) This encoder transmits these m
Local Feature Extraction
and Its Applications
303
coefficients as well as the estimation errors. We also prepare a decoder which receives the coefficients of the polynomial and residual errors and reconstruct the data. (We assume that the abscissas {<<}"=! and the noise variance o-2 are known to both the encoder and the decoder.) Then we ask how many bits of information should be transmitted to reconstruct the data. If we used polynomials of degree n — 1, we could find a polynomial passing through all n points. In this case, we could describe the data extremely well. In fact, there is no error between the observed data and those reconstructed by the decoder; however, we do not gain anything in terms of data compression/transmission since we also have to encode the model which requires n coefficients of the polynomial. In some sense, we did not "learn" anything in this case. If we used the polynomial of degree 0, i.e. a constant, then it would be an extremely efficient model, but we would need many bits to describe the deviations from that constant. (Of course, if the underlying data is really a constant, then the deviation would be 0.) Let us assume there is no prior preference on the order m. Then we can easily see that the total code length (3.11) in this case becomes m
MDL(y,0m,m)
= logn + YiL*{[aj})
m 4-1
+ ——
i=l
\
logn
j=0
/
The MDL criterion suggests picking the "best" polynomial of order m* by minimizing this approximate code length. The MDL criterion has been successfully used in various fields such as signal detection, 153 image segmentation, 88 and cluster analysis 149 where the optimal number of signals, regions, and clusters, respectively, should be determined. If one knows a priori the physical model to explain the observed data, that model should definitely be used, e.g., the complex sinusoids in Ref. 153. In general, however, as a descriptor of real-life signals which are full of transients or edges, the library of wavelets, wavelet packets, and local trigonometric transforms is more flexible and efficient than the set of polynomials or sinusoids. 3.4. A simultaneous noise suppression signal compression algorithm
and
We carry on our development of the algorithm based on the information transmission setting as the polynomial curve fitting problem described in the
304
N. Saito
previous section. We consider again an encoder and a decoder for our prob lem. Given (k,m) in (3.1), the encoder expands the data y in the basis 03m, then transmits the number of terms k, the specification of the basis m, and k expansion coefficients, the variance of the WGN model a 2 , and finally the estimation errors. The decoder receives this information in bits and tries to reconstruct the data y. In this case, the total code length to be minimized may be expressed as the sum of the code lengths of: (i) two natural numbers (fc, m), (ii) (k + 1) realvalued parameters (ain\(T2) given (k,m), and (iii) the deviations of the ob served data y from the (estimated) signal a: = Wma.m given (k,m, a™ , o2). The approximate total description length (3.11) now becomes MDL(y,cx£\a2,k,m) = L(k,m) + L(a£\&2\k,m)
+ L(y\a£\a2,k,m),
(3.13)
where d i , and a2 are the ML estimates o f a L and a2, respectively. Let us now derive these ML estimates. Since we assumed the noise compo nent is additive WGN, the probability of observing the data given all model parameters is P(y\a^,a2,k,m) = (2na2)~^2exp ( -
fc^"t)
,
where || • || is the standard Euclidean norm on R n . For the ML estimate of a2, first consider the log-likelihood of (3.14) \np(y\a£\o2,k,m)
= - £ In W » -
lly
~ ^f^11' •
(3.15)
Taking the derivative with respect to a2 and setting it to zero, we easily obtain aa = i | | y - W m a W | | 2 .
(3.16)
Insert this equation back to (3.15) to get \np(y\a£\a2,k,m)
= -^ln(^\\y-WmaW\\2^
- \ .
(3.17)
Let ym = W^y denote the vector of the expansion coefficients of y in the basis Q3m. Since this basis is orthonormal, i.e., Wm is orthogonal, and we use the t2 norm, we have
(3.14)
Local Feature Extraction and Its Applications
\\y- W m a W | | 2 = | | W m ( W £ y - a « ) | | a = \\ym-aWf.
305
(3.18)
From (3.17), (3.18), and the monotonicity of the In function, we find that maximizing (3.17) is equivalent to minimizing llym-aWf.
(3.19)
Considering that the vector atm only contains k nonzero elements, we can easily conclude that the minimum of (3.19) is achieved by taking the largest k coefficients in magnitudes ofy m as the ML estimate of otm , i.e., &W
=
0«y m
=
0(*> (W£y),
(3.20)
where ©(fc) is a thresholding operation which keeps the k largest elements in absolute value intact and sets all other elements to zero. Finally, inserting (3.20) into (3.16), we obtain o2 = ± | | W £ y - e « W £ y | | a = ± | | ( I - e W ) W £ y | | 2 , (3.21) n n where I represents the n-dimensional identity operator (matrix). Let us further analyze (3.13) term by term. If we do not have any prior information on (k,m), then the cost L(k,m) is the same for all cases, i.e. we can drop the first term of (3.13) for minimization purpose. However, if one has some preference about the choice of basis by some prior information about the signal x, L(k,m) should reflect this information. For instance, if we happen to know that the original function x consists of a linear combination of dyadic blocks, then we should obviously use the Haar basis. In this case, we may use the Dirac distribution, i.e., pirn) = Smimo, where mo is the index for the Haar basis in the library £. By (3.12), this leads to
{
L(k)
if m = mo ,
+oo
otherwise.
On the other hand, if we either happen to know a priori or want to retain the number of terms k to satisfy fci < k < fa, then we may want to assume the uniform distribution for this range of k, i.e. L(k,m)=\
f L(m) + log(fc2 - h + 1) I +oo
if ki < k < k2 , ~ ~ otherwise.
(3.22)
306
N. Saito
As for the second term of (3.13), which is critical for our algorithm, we have to encode k expansion coefficients d m and
= h\ogn
+ c,
(3.23)
where c is a constant independent of (fc,m). Since the probability of observing y given all model parameters is given by (3.14), we have for the last term in (3.13) L(y|d m ),a 2 ,fc,m) = | l o g | | ( / - 0 ( f e ) ) ^ y | | 2 + c ' ,
(3.24)
where d is a constant independent of (k,m). Finally we can state our simultaneous noise suppression and signal compres sion algorithm. Let us assume that we do not have any prior information on (k, m) for now. Then, from (3.13), (3.23) and (3.24) by ignoring the constant terms c and c', our algorithm can be stated as: Pick the index (k*,m*) such that AMDL(k*,m*)=
min
(hhgn
+ J l o g \\(I - &^)W^y\\2)
.
(3.25)
l<m<M
Then reconstruct the signal estimate x = Wm.<x^:).
(3.26)
Let us call the objective function to be minimized in (3.25), the approximate MDL (AMDL) since we ignored the constant terms. Let us now show a typical
Local Feature Extraction and Its Applications
307
Fig. 3.1. Graphs of AMDL vs. k: AMDL (solid line) which is the sum of the (3/2)fclogn term (dotted line) and the (n/2) log (residual energy) term (dashed line).
behavior of the AMDL value as a function of the number of terms retained (k) in Fig. 3.1. (In fact, this curve is generated using Example 3.9 below.) We see that the log (residual energy) always decreases as k increases. By adding the penalty term of retaining the expansion coefficients, i.e. (3/2)fclogn (which is just a straight line), we have the AMDL curve which typically decreases for the small k, then starts increasing because of the penalty term, then finally decreases again at some large k near k = n because the residual error becomes very small. Now what we really want is the value of k achieving the minimum at the beginning of the fc-axis, and we want to avoid searching for k beyond the maximum occurring for k near n. So, we can safely assume that fci = 0 and &2 = n/2 in (3.22) to avoid searching more than necessary. (In fact, setting &2 > n/2 does not make much sense in terms of data compression either.) We briefly examine below the computational complexity of our algorithm. To obtain (fc*,m*), we proceed as follows: Step 1: Expand the data y into bases 951,..., 95 M. Each expansion (includ ing the BB selection procedure) costs about 0(n) for wavelets, 0(n log n) for wavelet packet BBs, and 0(n[logn] 2 ) for local trigonometric BBs.
308
N. Saito
Step 2: For fci
Examples
In this section, we give several examples to show the usefulness of our algo rithm. Example 3.9. The synthetic piecevnse constant function of stone.
Donoho-John-
We compared the performance of our algorithm in terms of the visual qual ity of the estimation and the relative £2 error with Donoho-Johnstone's method using the piecewise constant function used in their experiments. 53 The results are shown in Fig. 3.2. The true signal is the piecewise constant function with n = 2,048, and its noisy observation was created by adding the WGN se quence with ||a:||/||e|| = 7. The library £ for this example consisted of 18 different bases: the standard Euclidean basis of R n , the wavelet packet BBs created with D02, D04, . . . , D20, C06, C12, . . . , C30, and the local cosine and sine best bases (Dm represents the ro-tap QMF of Daubechies and Cm
Local Feature Extraction and Its Applications
-HJ—!f ».J U^^j
309
"^n
TJ-
i,j'
»"»■'
u,.„»,.»..,«
n^-
(c>
U
W>
■AJ
(e>
Jlmj 500
Lr
J
^L
u-
J
n_
1000
1500
2000
Fig. 3.2. Results for the synthetic piecewise constant function: (a) original piecewise constant function, (b) Noisy observation with (signal energy)/(noise energy) = 7 2 . (c) Estimation by the Donoho-Johnstone method using coiflets C06. (d) Estimation by the Donoho-Johnstone method using Haar basis, (e) Estimation by the proposed method.
represents the m-tap coiflet filter). In the Donoho-Johnstone method, we used the C06, i.e. 6-tap coiflet with two vanishing moments. We also specified the scale parameter J = 7, and supplied the exact value of a2. Next, we forced the Haar basis (D02) to be used in their method. Finally, we applied our algorithm without specifying anything. In this case, the Haar-Walsh BB with k* = 63 was automatically selected. The relative t2 errors are 0.116, 0.089, 0.051, respectively. Although the visual quality of our result is not too different from Donoho and Johnstone's (if we choose the appropriate basis for their method), our method generated the estimate with the smallest relative I2 error and slightly sharper edges. (See Sec. 3.6 for more about the Donoho-Johnstone method and its relation to our method.) E x a m p l e 3.10. A pure white Gaussian noise. We generated a synthetic sequence of WGN with cr2 = 1.0 and n = 4,096. The same library as in Example 3.9 (with the best bases adapted to this pure WGN sequence) was used. We also set the upper limit of search range fc2 = n / 2 = 2,048. Figure 3.3 shows the AMDL curves versus k for all bases
Fig. 3.3. The AMDL curves of the white Gaussian noise data for all bases. For each basis, k = 0 is the minimum value. The vertical dotted line indicates the upper limit of the search range for k.
Fig. 3.4. The estimate of the natural radioactivity profile of subsurface formations: (a) original data, which were measured in the borehole of an oil-producing well; (b) estimation by the proposed method; (c) residual error between (a) and (b).
in the library. As we can see, there is no single minimum in the graphs, and our algorithm satisfactorily decided k* = 0, i.e., there is nothing to "learn" in this data set.

Example 3.11. A natural radioactivity profile of subsurface formations. We tested our algorithm on actual field data: measurements of the natural radioactivity of subsurface formations obtained at an oil-producing well. The length of the data is n = 1,024. Again, the same library was used as in the previous examples. The results are shown in Fig. 3.4. In this case, our algorithm selected the D12 wavelet packet best basis (Daubechies' 12-tap filter with six vanishing moments) with k* = 77. The residual error, shown in Fig. 3.4(c), consists mostly of a WGN-like high frequency component. The compression ratio is 1024/77 ~ 13.3. To be able to reconstruct the signal from the surviving coefficients, we still need to record the indices of those coefficients. Suppose that we can store each index in b_i bytes of memory and that the precision of the original data is b_f bytes per sample. Then the storage reduction ratio R_s can be computed by
R_s = [(n/r) x (b_f + b_i)] / (n x b_f) = (1/r)(1 + b_i/b_f),    (3.27)
where r is the compression ratio. The original data precision was b_f = 8 (bytes) in this case. Since it is enough to use b_i = 2 (bytes) for the indices, and r = 13.3, we have R_s ~ 9.40%, i.e., 90.60% of the original data can be discarded.

Example 3.12. A migrated seismic section. In this example, the data are a migrated seismic section as shown in Fig. 3.5(a). The data consist of 128 traces of 256 time samples. We selected six 2-D wavelet packet best bases (D02, C06, C12, C18, C24, C30) as the library. Figure 3.5(b) shows the estimate by our algorithm. It automatically selected the filter C30 and the number of terms retained as k* = 1611. If we were to choose a good threshold in this example, it would be fairly difficult since we do not have an accurate estimate of sigma^2. The compression ratio, in this case, is (128 x 256)/1611 ~ 20.34. The original data precision was b_f = 8 as in the previous example. In this case we have to use b_i = 3 (1 byte for the row index, 1 byte for the column index, and 1 byte for the scale level). If we put these and
r = 20.34% into (3.27), we have Rs ss 6.76%, i.e., 93.24% of the original data can be discarded. Figure 3.5(c) shows the residual error between the original and the estimate. We can clearly see the random noise and some strange high frequency patterns (which are considered to be numerical artifacts from the migration algorithm applied).
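A quick numerical check of (3.27), reproducing the figures of Examples 3.11 and 3.12 (a hedged sketch; the function name is ours):

```python
def storage_reduction(r, b_i, b_f):
    # Eq. (3.27): fraction of the original storage still needed after
    # keeping n/r coefficients, each stored with b_f bytes of precision
    # plus b_i bytes of index information.
    return (1.0 / r) * (1.0 + b_i / b_f)

print(storage_reduction(13.30, b_i=2, b_f=8))   # ~0.094, i.e. R_s ~ 9.40% (Example 3.11)
print(storage_reduction(20.34, b_i=3, b_f=8))   # ~0.068, i.e. R_s ~ 6.76% (Example 3.12)
```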
Fig. 3.5. Results for the migrated seismic section: (a) original seismic section with 128 traces and 256 time samples, (b) Estimation by the proposed method, (c) Residual error between (a) and (b). (Dynamic range of display (c) is different from those of (a) and (b).)
3.6. Discussion
Our algorithm is intimately connected to the "denoising" algorithm of Coifman and Majid.^{30,36} Their algorithm first picks the BB from the collection of bases and sorts the BB coefficients in order of decreasing magnitude. Then they use the "theoretical compression rate" of the sorted BB coefficients {alpha_i}_{i=1}^n as a key criterion for separating a signal component from noise. The theoretical compression rate of a unit vector u is defined as c(u) = 2^{H(u)}/n(u), where H(u) is the l^2-entropy of u, i.e., H(u) = -sum_{i=1}^n u_i^2 log u_i^2, and n(u) is the length of u. We note that 0 <= c(u) <= 1 for any real unit vector u, and c(u) = 0
implies u = +-delta_{i_0} for some i_0 (the best possible compression), and c(u) = 1 implies u = (1, ..., 1)/sqrt(n(u)) (the worst compression). Then, to decide how many coefficients to keep as a signal component, they compare c({alpha_i}_{i=k+1}^n), the theoretical compression rate of the noise component (defined as the smallest n - k coefficients), to a predetermined threshold tau. They search for the first k = 0, 1, ... which gives an unacceptably bad compression rate: c({alpha_i}_{i=k+1}^n) > tau. Their algorithm critically depends on the choice of the threshold tau, whereas our algorithm needs no threshold selection. On the other hand, their algorithm does not assume the WGN model we used in this section; rather, they defined the noise component as a vector reconstructed from the BB coefficients of small magnitude.

Recently, Coifman discovered a way to improve the denoising algorithms. This method utilizes the fact that the expansion coefficients in the library are not shift invariant. This is certainly an undesirable property of the library; however, the key observation is that the amount of variation of the coefficients caused by a shift is less emphasized in the signal component than in the noise component, since the former is normally smoother than the latter. Based on this observation, he proposed to apply his denoising algorithm to several translated versions of the original signal and then take the average of these denoised signals after undoing the translations. We applied this method with our MDL-based denoising algorithm to the signal of Example 3.11. Figure 3.6 shows the result of this "shift-denoise-average" algorithm. Compare this with Fig. 3.4. Eleven different versions with shifts 0, +-1, ..., +-5 were used. We observe that the Gibbs-like phenomena around the edges are less emphasized in this method and the residual error becomes much closer to WGN than in the one-step MDL denoising result of Fig. 3.4. An interesting issue here is to examine whether this "shift-denoise-average" process can be formulated in the MDL formalism and whether the optimal shift parameters can be selected automatically.

Our algorithm can also be viewed as a simple yet flexible and efficient realization of the "complexity regularization" method for estimation of functions proposed by Barron.^6 He considered a general regression function estimation problem: given the data {(t_i, y_i)}_{i=1}^n, where {t_i in R^p} is a sequence of (p-dimensional) sampling coordinates (or explanatory variables) and {y_i in R} are the observed data (or response variables), select a "best" regression function x_n out of a list (library) L_n of candidate functions (models). He did not impose any assumption on the noise distribution, but assumed that the number of models in the list L_n depends on the number of observations n. Now the complexity regularization method of Barron is to find x_n such that
Fig. 3.6. The result of the "shift-denoise-average" method using the signal of Example 3.11: (a) original data; (b) estimation by the "shift-denoise-average" method using the MDL-based denoising algorithm with 11 shifts; (c) residual error between (a) and (b).
R(x_n) = min_{x in L_n} { (1/n) sum_{i=1}^n delta(y_i, x(t_i)) + (lambda/n) L(x) },
where delta(., .) is a measure of distortion (such as the squared error), lambda > 0 is a regularization constant, and L(x) is the complexity of a function x (such as the L(m) + L(delta_m | m) term in (3.7)). He derived various asymptotic properties of the estimator x_n as n -> infinity, such as bounds on the estimation error and the rate of convergence. If we restrict our attention to a finite-dimensional vector space, use the library of orthonormal bases described in Sec. 2, adopt the length of the Shannon code (3.8) as a distortion measure, assume the WGN model, and finally set lambda = 1, then Barron's complexity regularization method reduces to our algorithm. Our approach, although restricted in the sense of Barron, provides a computationally efficient and yet flexible realization of the complexity regularization method, especially compared to a library consisting of the polynomials, splines, and trigonometric series discussed in Ref. 6.
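A minimal sketch of this finite-list minimization, assuming squared-error distortion; `models` and `L` are illustrative stand-ins for the candidate functions and their complexities, not Barron's original code.

```python
import numpy as np

def complexity_regularized_fit(models, L, t, y, lam=1.0):
    # Evaluate (1/n)[sum_i (y_i - x(t_i))^2 + lam * L(x)] over a finite
    # list of candidate callables and return the minimizer x_n.
    n = len(y)
    def criterion(x):
        return (np.sum((y - x(t)) ** 2) + lam * L(x)) / n
    return min(models, key=criterion)
```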
Our algorithm also has a close relationship with the denoising algorithm via "wavelet shrinkage" developed by Donoho and Johnstone.^53 (A well-written summary of wavelet shrinkage and its applications can be found in Ref. 52.)
Their algorithm first transforms the observed discrete data into a wavelet basis (specified by the user), then applies a "soft threshold" tau = sigma sqrt(2 ln n) to the coefficients, i.e., shrinks the magnitudes of all the coefficients by the amount tau toward zero. Finally, the denoised data are obtained by the inverse wavelet transform. Donoho claimed informally in Ref. 52 that the reason why their method works is the ability of wavelets to compress the signal energy into a few coefficients. The main differences between our algorithm and that of Donoho and Johnstone are:

- Our method automatically selects the most suitable basis from a collection of bases, whereas their method uses only a fixed basis specified by the user.
- Our method includes adaptive expansion by means of wavelet packets and local trigonometric bases, whereas their method only uses a wavelet transform.
- Their method requires the user to set the coarsest scale parameter J <= n_0 = log_2 n and a good estimate of sigma^2, and the resulting quality depends on these parameters. On the other hand, our method does not require any such parameter setting.
- Their approach is based on the minimax decision theory in statistics and addresses the risk of the estimation, whereas our approach uses an information-theoretic idea and combines denoising and the data compression capability of wavelets explicitly.
- Their method thresholds the coefficients softly, whereas our method can be said to threshold sharply. This might cause some Gibbs-like effects in the reconstruction using our method. (A minimal sketch contrasting the two rules is given at the end of this subsection.)

Future extensions of this research are to: (a) formulate Coifman's "shift-denoise-average" method using the MDL principle; (b) incorporate noise models other than Gaussian noise; (c) extend the algorithm to highly nonstationary signals by segmenting them smoothly and adaptively; (d) investigate the effect of sharp thresholding; and (e) study further the relation with the complexity regularization method of Barron as well as the wavelet shrinkage of Donoho-Johnstone.
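The following hedged sketch contrasts the two thresholding rules named in the last bullet (an illustration, not the authors' code); tau would be taken as sigma sqrt(2 ln n) in the Donoho-Johnstone method:

```python
import numpy as np

def soft_threshold(c, tau):
    # Donoho-Johnstone shrinkage: every coefficient moves toward zero by tau
    return np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)

def sharp_threshold(c, k):
    # keep-or-kill rule of the MDL method: retain the k largest-magnitude
    # coefficients untouched and zero out the rest
    out = np.zeros_like(c)
    if k > 0:
        idx = np.argsort(np.abs(c))[-k:]
        out[idx] = c[idx]
    return out
```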
3.7. Summary
We have described an algorithm for simultaneously suppressing the additive WGN component and compressing the signal component in data. One or more of the bases in a library of orthonormal bases can compress the signal
component quite well, whereas the WGN component cannot be compressed efficiently by any basis in the library. Based on this observation, we have derived an algorithm to estimate the signal component in the data by finding the "best" basis and the "best" number of terms to retain using the MDL criterion. Because of the use of the MDL criterion, this algorithm does not require the user to specify any parameter or threshold values. Both synthetic and real field data examples have shown the wide applicability and usefulness of this algorithm.

4. Local Discriminant Bases and Their Applications

4.1. Introduction
Extracting relevant features from signals or images is an important process for data analysis, such as classifying signals into known categories (classification) or predicting a response of interest based on these signals (regression). In this section, we describe an extension of the "best basis" method reviewed in Sec. 2 that constructs an orthonormal basis suitable for classification problems rather than compression problems. In particular, we propose a fast algorithm to select an efficient basis (or coordinate system) from a library of orthonormal bases to enhance the performance of a few classification schemes. This algorithm reduces the dimensionality of these problems by using the basis functions described in Sec. 2 (which are well-localized in the time-frequency plane) as feature extractors. Since this basis illuminates the differences among classes, it can also be used to extract a signal component from data consisting of a signal and a textured background.

This section is organized as follows. In Sec. 4.2, we formulate the problem of feature extraction and classification. Then, in Sec. 4.3, we review some of the pattern classification schemes used in our study. In Sec. 4.4, we propose a fast algorithm for constructing such a local basis for classification problems. This is immediately followed by examples in Sec. 4.5. We then examine, in Sec. 4.6, the effect of the denoising capability of the selected bases on the classification problems and discuss whether we should preprocess the input signals via the denoising algorithm described in Sec. 3. Finally, we discuss a method of signal/"background" separation as a further application of such a basis in Sec. 4.7.

4.2. Problem formulation
Let us first define appropriate spaces of input signals (or patterns), extracted features, and outputs (or responses), and the mapping functions among them. Let
X subset R^n denote a signal space (or a pattern space), a subset of the standard n-dimensional vector space containing all signals (or samples, or patterns) under consideration. In this case, the dimensionality of the signal space, or equivalently the length of each signal, is n. Let Y = {1, 2, ..., C} be the set of class or category names corresponding to the input signals. We call this space a response space. Signal classification can be considered as a mapping function (usually many-to-one) d: X -> Y between these two spaces. Direct manipulation of signals in the signal space for classification is prohibitive because: (a) the signal space normally has very high dimensionality (e.g., n ~ 1,000 for a typical exploration seismic record per receiver, and n = 512 x 512 = 262,144 for a typical CT scanner image); and (b) the existence of noise or undesired components (whether random or not) in the signals makes classification difficult. On the other hand, the signal space is overly redundant compared to the response space. Therefore, it is extremely important to reduce the dimensionality of the problem, i.e., extract only the relevant features for the problem at hand and discard all irrelevant information. If we succeed in doing this, we can greatly improve classification performance in both accuracy and efficiency. For this purpose, we set a feature space F subset R^k with k < n between the signal space and the response space. A feature extractor is defined as a map f: X -> F, and a predictor (also called a classifier for classification) as a map g: F -> Y. Let T = {(x_i, y_i)}_{i=1}^N subset X x Y be a training (or learning) data set with N pairs of signals x_i and responses (class names) y_i. This is the data set used to construct a feature extractor f. Let N_c be the number of signals belonging to class c, so that N = N_1 + ... + N_C. Also, let us denote the set of class c signals by {x_i^{(c)}}_{i=1}^{N_c} = {x_i}_{i in I_c}, where I_c subset {1, ..., N} is the set of indices of the class c signals in the training data set, with |I_c| = N_c. Preferably, the performance of the whole process should be measured by the misclassification rate on a test data set T' = {(x_i', y_i')}_{i=1}^{N'} (which has not been used to construct the feature extractors and classifiers), namely (1/N') sum_{i=1}^{N'} delta(y_i' - d(x_i')), where delta(r) = 1 for r != 0 and delta(0) = 0. If we use the resubstitution error rates (the misclassification rates computed on the training data set), we obviously obtain overly optimistic figures. In this section, we focus on feature extractors of the form f = Theta^{(k)} o Psi, where Theta^{(k)}: X -> F represents the selection rule (e.g., picking the k most important coordinates out of n), and Psi in O(n), i.e. an n-dimensional orthogonal matrix. In particular, we consider the matrices representing the orthonormal bases in the library as candidates for Psi. As a classifier g, we adopt
Linear Discriminant Analysis (LDA) of R. A. Fisher^59 and Classification and Regression Trees (CART).^18

4.3. A review of some pattern classifiers
In this section, we review the pattern classifiers used in our study, i.e. LDA and CART, although other classifiers such as k-nearest neighbors (k-NN)^39 or artificial neural networks (ANN)^125 can also be used in our algorithm. The reader interested in comparisons of different classifiers is referred to the excellent review article of Ripley.^125 Useful information on pattern classifiers in general can be found in Refs. 48, 63, 103 and 152.

4.3.1. Linear discriminant analysis

LDA first performs its own feature extraction by a linear map A^T: X -> F (in this case A is not necessarily an orthogonal matrix). This map A simultaneously minimizes the scatter of the sample vectors (signals) within each class and maximizes the scatter of the mean vectors around the total mean vector. To be more precise, let m_c = (1/N_c) sum_{i=1}^{N_c} x_i^{(c)} be the mean vector of the class c signals.^d Then the total mean vector m can be defined as

m = sum_{c=1}^{C} pi_c m_c,
where pi_c is the prior probability of class c (which can be set to N_c/N in the absence of knowledge of the true prior probability). The scatter of the samples within each class can be measured by the within-class covariance matrix

Sigma_w = sum_{c=1}^{C} pi_c Sigma_c,
where Sigma_c is the sample covariance matrix of class c:

Sigma_c = (1/N_c) sum_{i=1}^{N_c} (x_i^{(c)} - m_c)(x_i^{(c)} - m_c)^T.
" T h e sample mean operation (1/NC) ^2 -1 ' n ' ^ ' s subsection can be replaced by expectation Ec for general cases; however, in this thesis, we focus our attention on the cases of a finite number of samples, we stay with the sample mean operations.
The scatter of the mean vectors around the total mean can be measured by the between-class covariance matrix

Sigma_b = sum_{c=1}^{C} pi_c (m_c - m)(m_c - m)^T.
Then, LDA maximizes the class separability index

J(A) := tr[(A^T Sigma_w A)^{-1} (A^T Sigma_b A)],
which measures how much these classes are separated in the feature space. This requires solving the so-called generalized (or pencil-type) eigenvalue problem,
where A is a diagonal matrix containing the eigenvalues. Once the map A is obtained (normally k = C — 1 for LDA), then the feature vector A^Xi is computed for each i, and finally it is assigned to the class which has the mean vector closest to this feature vector in the Euclidean distance in the feature space. This is equivalent to bisecting the feature space T by hyperplanes. In this section we regard LDA as a classifier although, as explained, it also includes its own feature extractor A. LDA is the optimal strategy if all classes of signals obey multivariate normal distributions with different mean vectors and an equal covariance matrix. 63 ' 103 In reality, however, it is hard to assume this condition. Moreover, since it relies on solving the eigensystem, LDA can only extract global features (or squeezes all discriminant information into a few [C — 1] basis vectors) so that the interpretation of the extracted features becomes difficult, it is sensitive to outliers and noise, and it requires 0(n3) calculations. 4.3.2. Classification and regression trees Another popular classification/regression scheme, CART 18 is a nonparametric method which recursively splits the input signal space along the coordinate axes and generates a partition of the input signal space into disjoint blocks so that the process can be conveniently described as a binary tree where nodes represent blocks. Such a tree for classification problems is called a classification tree (CT). At each node in a CT, a class label is assigned by the majority vote at that node. Then, candidate splits are evaluated by the "information gain" or the quantity called deviance and the most "informative" split is selected.
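The generalized eigenproblem above can be solved directly with standard routines. Below is a hedged sketch of the LDA map of this subsection (no regularization is applied, so Sigma_w must be nonsingular; the function name and interface are our assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def lda_map(X_by_class, k=None):
    # X_by_class: list of (N_c x n) arrays, one per class; priors pi_c = N_c/N.
    N = sum(len(Xc) for Xc in X_by_class)
    n = X_by_class[0].shape[1]
    means = [Xc.mean(axis=0) for Xc in X_by_class]
    pis = [len(Xc) / N for Xc in X_by_class]
    m_tot = sum(p * m for p, m in zip(pis, means))
    Sw = np.zeros((n, n)); Sb = np.zeros((n, n))
    for Xc, p, m_c in zip(X_by_class, pis, means):
        D = Xc - m_c
        Sw += p * (D.T @ D) / len(Xc)                  # within-class scatter
        Sb += p * np.outer(m_c - m_tot, m_c - m_tot)   # between-class scatter
    if k is None:
        k = len(X_by_class) - 1                        # normally C - 1 directions
    w, V = eigh(Sb, Sw)                                # solves Sb a = lambda Sw a
    return V[:, ::-1][:, :k]                           # columns: top eigenvectors
```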
4.3.2. Classification and regression trees

Another popular classification/regression scheme, CART,^18 is a nonparametric method which recursively splits the input signal space along the coordinate axes and generates a partition of the input signal space into disjoint blocks, so that the process can be conveniently described as a binary tree whose nodes represent the blocks. Such a tree for classification problems is called a classification tree (CT). At each node of a CT, a class label is assigned by the majority vote at that node. Then, candidate splits are evaluated by the "information gain", or the quantity called deviance, and the most "informative" split is selected. The popular measure used as the deviance for classification is again entropy! This time the entropy of a node is defined as

H(node) = -sum_{c=1}^{C} p_c log p_c,

where p_c is the proportion of class c samples among all the samples at that node. Thus, the best split is the one that maximally reduces the entropy of that node. Once the best split is determined, all the input signals belonging to that
Fig. 4.1. An example of a classification tree. Nodes are represented by ellipses (interior nodes) and rectangles (terminal nodes/leaves). The node labels are the predicted class names, which are "cylinder", "bell" and "funnel" in this case. The ratio displayed under each node represents the misclassification rate of the cases reaching that node. The splitting rules are displayed on the edges connecting the nodes. The rule "x.1 < 10.0275" means "if the first coordinate value of the input signal is less than 10.0275, go to this branch." See Example 4.7 for the details.
node is split into two groups (children nodes). Splitting is continued recursively until nodes become "pure", i.e., they contain only one class of signals, or become "sparse", i.e., they contain only a few signals.^e An example of a CT is shown in Fig. 4.1 (which, in fact, is the best tree for Example 4.7, studied in Sec. 4.5). Finally, a pruning process that eliminates unimportant branches is usually applied after growing the initial tree, to avoid "overtraining". In Appendix A, we develop a pruning algorithm based on the MDL principle. Regression trees (RTs) of the CART methodology are extensively used in Secs. 5 and 6 and are described there. We refer the reader to Ref. 18 for the details of the splitting, stopping and pruning rules. CART requires searching and sorting all the coordinates of the training signals for the best splits: it is computationally expensive for problems of high dimensionality. This is even more pronounced if we want to split the signal space "obliquely" by taking linear combinations of the coordinates to generate a tree. (A sketch of a single entropy-based split appears below.)
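The following hedged sketch illustrates one entropy-based split of the kind described above (a toy version of a single CART step; the exhaustive scan over all coordinates and cut points is exactly what makes CART expensive in high dimensions):

```python
import numpy as np

def node_entropy(labels):
    # H = -sum_c p_c log p_c, the deviance used to score a node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def best_axis_split(X, y):
    # Return (weighted entropy, axis, cut) of the most informative split.
    N, n = X.shape
    best = (np.inf, None, None)
    for j in range(n):
        for cut in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= cut
            h = (left.sum() * node_entropy(y[left])
                 + (~left).sum() * node_entropy(y[~left])) / N
            if h < best[0]:
                best = (h, j, cut)
    return best
```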
4.4. Construction of local discriminant basis
In order to fully utilize classifiers, including the ones reviewed in the previous section, we must supply them with good features (preferably just a few) and throw out the useless part of the data. This improves both the accuracy and the speed of these classifiers. In this section, we describe a fast algorithm for constructing good feature extractors. In particular, we follow the "best basis" paradigm discussed in Secs. 1 and 2, which permits a rapid [e.g., O(n log n)] search among a library of orthonormal bases for the problem at hand; we select basis functions which are well-localized in the time-frequency plane and which most discriminate the given classes, and then the coordinates (expansion coefficients) in these basis functions are fed into LDA or CART.

4.4.1. Discriminant measures

Recall that the best basis algorithm of Coifman and Wickerhauser^35 was developed mainly for signal compression, as reviewed in Sec. 2. It selects a basis suitable for signal compression from a dictionary/library of orthonormal bases (i.e. a set of tree-structured subspaces which generates many orthonormal bases
^e In the S-PLUS package^140 (the extended version of the statistical language S^{10,23}), which we extensively use to test our algorithms, by default the split stops if either the number of samples belonging to that node becomes less than 10 or the deviance of that node becomes less than 1% of the deviance of the root node.
and which have different time-frequency localization characteristics) by measuring the efficiency of each subspace in the dictionary/library for the representation/compression of signals. The Shannon entropy (2.1) is a natural choice as such a measure of efficiency, or information cost. This quantity measures the flatness of the energy distribution of the signal, so that minimizing it leads to an efficient representation (or coordinate system) for the signal. For classification problems, however, we need a measure that evaluates the power of discrimination of each subspace in the tree-structured subspaces, rather than the efficiency of representation. Once the discriminant measure (or discriminant information function) is specified, we can compare the goodness of each node (subspace) for the classification problem to that of the union of its two children nodes, and can judge whether we should keep the children nodes or not, in the same manner as the BB search algorithm. There are many choices for the discriminant measure (see e.g. Ref. 9); all of them essentially measure "statistical distances" among classes. For simplicity, let us first consider the two-class case. Let p = {p_i}_{i=1}^n, q = {q_i}_{i=1}^n be two nonnegative sequences with sum p_i = sum q_i = 1 (which can be viewed as normalized energy distributions of signals belonging to class 1 and class 2, respectively, in a coordinate system). The discriminant information function D(p, q) between these two sequences should measure how differently p and q are distributed. One natural choice for D is the so-called relative entropy (also known as cross entropy, Kullback-Leibler distance, or I-divergence)^87

I(p, q) := sum_{i=1}^{n} p_i log(p_i / q_i),    (4.1)
with the conventions log 0 = -infinity, log(x/0) = +infinity for x != 0, and 0 . (+-infinity) = 0. It is clear that I(p, q) >= 0, and equality holds iff p = q. This quantity is not a metric since it is not symmetric and does not satisfy the triangle inequality, but it measures the discrepancy of p from q. Note that if q_i = 1/n for all i, i.e., the q_i are distributed uniformly, then I(p, q) = -H(p) + log n, the negative of the entropy of the sequence p up to an additive constant.

The relative entropy (4.1) is asymmetric in p and q. For certain applications the asymmetry is preferred (see e.g. Sec. 4.7). However, if a symmetric quantity is preferred, one should use the J-divergence^87 between p and q:

J(p, q) := I(p, q) + I(q, p).    (4.2)
Another possibility for the measure D is an l^2 analogue^152 of I(p, q):

D(p, q) = sum_{i=1}^{n} (p_i - q_i)^2.    (4.3)
Clearly, l^p (p >= 1) versions of this measure are all possible. To obtain a fast computational algorithm, the measure D should be additive, similarly to J:

Definition 4.1. The discriminant measure D(p, q) is said to be additive if

D({p_i}_{i=1}^n, {q_i}_{i=1}^n) = sum_{i=1}^{n} D(p_i, q_i).    (4.4)
The measures (4.1) (and consequently (4.2) as well) and (4.3) are all additive. For measuring discrepancies among C distributions p^{(1)}, ..., p^{(C)}, one may take all C(C-1)/2 pairwise combinations of D:

D({p^{(c)}}_{c=1}^{C}) = sum_{i=1}^{C-1} sum_{j=i+1}^{C} D(p^{(i)}, p^{(j)}).    (4.5)
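A hedged sketch of the measures (4.1), (4.2) and (4.5) (the small `eps` guards the log conventions numerically and is our addition):

```python
import numpy as np

def rel_entropy(p, q, eps=1e-12):
    # I(p, q) = sum_i p_i log(p_i / q_i), Eq. (4.1)
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    return np.sum(p * np.log(p / q))

def j_divergence(p, q):
    # symmetric version, Eq. (4.2)
    return rel_entropy(p, q) + rel_entropy(q, p)

def pairwise_measure(dists, D=j_divergence):
    # Eq. (4.5): sum of D over all C(C-1)/2 pairs of class distributions
    C = len(dists)
    return sum(D(dists[i], dists[j])
               for i in range(C - 1) for j in range(i + 1, C))
```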
4.4.2. The local discriminant basis algorithm

The first step of our strategy for classification is to select a basis which most discriminates the given classes from a library of orthonormal bases. Let us first consider the selection of such a basis from a dictionary of orthonormal bases in the library. Given an additive discriminant measure D, what quantity should be supplied to D to evaluate the discrimination power of each subspace Omega_{j,k} in the binary-tree-structured subspaces of the dictionary? In order to fully utilize the time-frequency localization characteristics of our dictionary of bases, we compute the following time-frequency energy map for each class and supply these maps to D:

Definition 4.2. Let {x_i^{(c)}}_{i=1}^{N_c} be the set of training signals belonging to class c. Then the time-frequency energy map of class c, denoted by Gamma_c, is a table of real values specified by the triplet (j, k, l) as
Gamma_c(j, k, l) := sum_{i=1}^{N_c} (w_{j,k,l} . x_i^{(c)})^2 / sum_{i=1}^{N_c} ||x_i^{(c)}||^2,    (4.6)

for j = 0, ..., J, k = 0, ..., 2^j - 1 and l = 0, ..., 2^{n_0 - j} - 1.
In other words, Gamma_c is computed by accumulating the squares of the expansion coefficients of the signals at each position in the table, followed by normalization by the total energy of the signals belonging to class c. (This normalization may be important, especially if there are significant differences in the number of samples among the classes.) In the following, we use the notation

D({Gamma_c(j, k, .)}_{c=1}^{C}) = sum_{l=0}^{2^{n_0 - j} - 1} D(Gamma_1(j, k, l), ..., Gamma_C(j, k, l)).
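A hedged sketch of the time-frequency energy map (4.6); `expand` stands in for the dictionary expansion of Sec. 2 (mapping a signal to its table of tree-structured coefficients) and is an assumed helper, not part of the original text:

```python
import numpy as np

def tf_energy_map(signals, expand):
    # Gamma_c: squared expansion coefficients accumulated over the class,
    # normalized by the class's total signal energy, as in Eq. (4.6).
    total_energy = sum(np.sum(np.asarray(x, float) ** 2) for x in signals)
    return sum(expand(x) ** 2 for x in signals) / total_energy
```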
Here is an algorithm to select the orthonormal basis (from the dictionary) which maximizes the discriminant measure on the time-frequency energy distributions of the classes. We call this a local discriminant basis (LDB). Similarly to the BB algorithm, let B_{j,k} denote the set of basis vectors of the subspace Omega_{j,k} as defined in (2.3). Let A_{j,k} represent the LDB (which we are after) restricted to the span of B_{j,k}. Also, let Delta_{j,k} be a work array containing the discriminant measure of the subspace Omega_{j,k}.

Algorithm 4.3 (The Local Discriminant Basis Selection Algorithm). Given a training data set T consisting of C classes of signals {{x_i^{(c)}}_{i=1}^{N_c}}_{c=1}^{C},

Step 0: Choose a dictionary of orthonormal bases (i.e., specify QMFs for a wavelet packet dictionary, or decide to use either the local cosine dictionary or the local sine dictionary), specify the maximum depth of decomposition J, and choose an additive discriminant measure D.

Step 1: Construct the time-frequency energy maps Gamma_c for c = 1, ..., C.

Step 2: Set A_{J,k} = B_{J,k} and Delta_{J,k} = D({Gamma_c(J, k, .)}_{c=1}^C) for k = 0, ..., 2^J - 1.

Step 3: Determine the best subspace A_{j,k} for j = J - 1, ..., 0, k = 0, ..., 2^j - 1 by setting Delta_{j,k} = D({Gamma_c(j, k, .)}_{c=1}^C) and then:
if Delta_{j,k} >= Delta_{j+1,2k} + Delta_{j+1,2k+1}, keep A_{j,k} = B_{j,k};
otherwise, set A_{j,k} = A_{j+1,2k} (+) A_{j+1,2k+1} and Delta_{j,k} = Delta_{j+1,2k} + Delta_{j+1,2k+1}.
Step 4: Order the basis functions by their power of discrimination (see below).
Step 5: Use the k (< n) most discriminant basis functions for constructing classifiers.

The selection (or pruning) process in Step 3 is fast, i.e. O(n), since the measure D is additive. After this step, we have a complete orthonormal basis, the LDB.

Proposition 4.4. The basis obtained by Step 3 of Algorithm 4.3 maximizes the additive discriminant measure D on the time-frequency energy distributions among all the bases in the dictionary obtainable by the divide-and-conquer algorithm.

Proof. Similarly to the proof of Proposition 2.4 described in Refs. 35 and 157, we show this by induction on j (in decreasing order, J, J - 1, ..., 0). Let Omega_{j,k} be the span of B_{j,k} as in Sec. 2. Let A'_{j,k} be any basis of Omega_{j,k}, and let Delta'_{j,k} be its discriminant measure on the time-frequency energy distributions. There is only one basis for Omega_{J,k} in the dictionary (which is B_{J,k}), because J is the maximum depth of decomposition. Then, for J - 1, let A'_{J-1,k} be any basis for Omega_{J-1,k}. Either A'_{J-1,k} = B_{J-1,k} or A'_{J-1,k} = A'_{J,2k} (+) A'_{J,2k+1}. By the inductive hypothesis, Delta_{J,2k} >= Delta'_{J,2k} and Delta_{J,2k+1} >= Delta'_{J,2k+1}. Thus, from the equations in Step 3, Delta_{J-1,k} >= Delta_{J,2k} + Delta_{J,2k+1} >= Delta'_{J,2k} + Delta'_{J,2k+1} for any k in {0, 1, ..., 2^{J-1} - 1}. This implies that Delta_{0,0} becomes the largest possible discriminant value obtainable by this divide-and-conquer algorithm. []

Once the LDB is selected, we can use all the expansion coefficients of the signals in this basis as features; however, if we want to reduce the dimensionality of the problem, the two subsequent steps are still necessary. In Step 4, there are several choices for the measure of discriminant power of an individual basis function. For simplicity of notation, let lambda = (j, k, m) in Z^3 be a triplet specifying one of the LDB functions selected in Step 3, and let alpha_{i,lambda}^{(c)} = w_lambda . x_i^{(c)}, i.e. the expansion coefficient of x_i^{(c)} in the basis vector w_lambda.

(a) The discriminant measure of a single basis function w_lambda:

D(Gamma_1(lambda), ..., Gamma_C(lambda)).    (4.7)
(b) The Fisher class separability of the expansion coefficients in the basis function w_lambda:

sum_{c=1}^{C} pi_c (mean_i(alpha_{i,lambda}^{(c)}) - mean_c(mean_i(alpha_{i,lambda}^{(c)})))^2 / sum_{c=1}^{C} pi_c var_i(alpha_{i,lambda}^{(c)}),    (4.8)
where mean_i(.) and var_i(.) denote the sample mean and variance taken over the samples indexed by i, respectively.

(c) The robust version of (b):

sum_{c=1}^{C} pi_c |med_i(alpha_{i,lambda}^{(c)}) - med_c(med_i(alpha_{i,lambda}^{(c)}))| / sum_{c=1}^{C} pi_c mad_i(alpha_{i,lambda}^{(c)}),    (4.9)
Local Feature Extraction and Its Applications
327
set {(aJi,y»)}£Li consists of C classes of samples {{(xi,yi)}ieic}^=i as before. Let us assume that the response yj for i € Ic is now a real number conditioned as y< € Rc = [ac,bc] and that n ^ i i c ^ 0. Under this assumption, suppose one wants to estimate the response y< for a given input signal Xi rather than its class label or assignment. This situation is not really special; we often encounter this type of regression problems in medical and geological sciences where the objects are made in the course of nature. In Sec. 6, we test and analyze the real data set from the field of geophysical prospecting using the algorithms described in this section. 4.5.
Examples
To demonstrate the capability of the LDB method, we conducted two classi fication experiments using synthetic signals. In both cases, we specified three classes of signals by analytic formulas. For each class, we generated 100 train ing signals and 1,000 test signals. We first constructed LDA-based classifier and CT (with and without pruning) using the training signals represented in the original coordinate (i.e. standard Euclidean) system. We used the pruning algorithm based on the MDL principle described in Appendix A. Then we fed the test signals into these classifiers. Next we computed the LDB (using (4.5) as a discriminant measure and (4.7) for ordering the individual basis functions) on the training signals. Then we selected a small number of most discriminant basis functions, say about 10% of the dimensionality of the signals, and used these coordinates to construct LDA-based classifier and CTs. Finally the test signals were projected onto these selected LDB functions and then fed into these classifiers. For each method, we computed the misclassification rates on the training data set and the test data set. E x a m p l e 4.6. Triangular waveform classification. This is an example for classification originally examined in Ref. 18. The dimensionality of the signal was extended from 21 in Ref. 18 to 32 for the dyadic dimensionality requirement of the bases under consideration. Three classes of signals were generated by the following formulas: xW(i) = uhi(i) + (1 - u)h2(i) + e(i)
for Class 1,
xW(i) = u/ii(i) + (1 - u)h3(i) + e(i)
for Class 2,
x^ (j) = uh2(i) + (1 - u)h3(i) + e(i)
for Class 3 ,
328
N. Saito
0
5
10
%
20
25
30
10
15
20
25
30
10
15
20
25
30
Fig. 4.2. Five sample waveforms from (a) class 1, (b) class 2, and (c) class 3.
Fig. 4.3. Plots from the analysis of Example 4.6: (a) top five LDA vectors; (b) top five LDB vectors; (c) the subspaces selected as the LDB.
where i = 1, ..., 32, h_1(i) = max(6 - |i - 7|, 0), h_2(i) = h_1(i - 8), h_3(i) = h_1(i - 4), u is a uniform random variable on the interval (0, 1), and the eps(i) are standard normal variates. Figure 4.2 shows five sample waveforms from each class. The LDB was computed from the wavelet packet coefficients with the 6-tap coiflet filter.^43 Then the five most discriminant coordinates were selected. In Fig. 4.3, we compare the top five vectors from LDA and LDB. Only the top two vectors were useful in LDA in this case. The top five LDB vectors look similar to the functions h_j or their derivatives, whereas it is difficult to interpret the LDA vectors. The misclassification rates are given in Table 4.1. The best result was obtained by applying LDA to the top five LDB coordinates. We would like to note that, according to Breiman et al.,^18 the Bayes error of this example is about 14%.

Table 4.1. Misclassification rates of Example 4.6. FCT and PCT denote the full and pruned classification trees, respectively. STD, LDB5 and LDB represent the standard Euclidean coordinates, the top five LDB coordinates, and all the LDB coordinates, respectively. We do not show the error rates of LDA on all the LDB coordinates since these are theoretically the same as those of LDA on STD. The smallest error on the test data set is shown in bold font.
Method         Training error (%)   Test error (%)
LDA on STD          13.33               20.90
FCT on STD           6.33               29.87
PCT on STD          29.33               32.97
LDA on LDB5         14.33               15.90
FCT on LDB5          7.00               21.37
PCT on LDB5         17.00               25.10
FCT on LDB           7.33               23.60
PCT on LDB          17.00               25.10
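For concreteness, here is a hedged sketch generating the three classes of Example 4.6 from the formulas above (the seed, helper names and interface are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def h(i, shift):
    # h_1 is the triangle max(6 - |i - 7|, 0); h_2 = h_1(i - 8); h_3 = h_1(i - 4)
    return np.maximum(6 - np.abs(i - shift - 7), 0)

def triangular_sample(cls, n=32):
    # random convex combination of two triangle bumps plus WGN
    i = np.arange(1, n + 1)
    u = rng.uniform()
    pairs = {1: (0, 8), 2: (0, 4), 3: (8, 4)}   # shifts of the two bumps
    a, b = pairs[cls]
    return u * h(i, a) + (1 - u) * h(i, b) + rng.standard_normal(n)

X = np.stack([triangular_sample(c) for c in (1, 2, 3) for _ in range(100)])
```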
Example 4.7. Signal shape classification. The second example is a signal shape classification problem. In this exam ple, we try to classify synthetic noisy signals with various shapes, amplitudes, lengths, and positions into three possible classes. More precisely, sample sig nals of the three classes were generated by:
c(i) = (6 + eta) . chi_{[a,b]}(i) + eps(i)                      for the "cylinder" class,
b(i) = (6 + eta) . chi_{[a,b]}(i) . (i - a)/(b - a) + eps(i)    for the "bell" class,
f(i) = (6 + eta) . chi_{[a,b]}(i) . (b - i)/(b - a) + eps(i)    for the "funnel" class,

where i = 1, ..., 128, a is an integer-valued uniform random variable on the interval [16, 32], b - a also obeys an integer-valued uniform distribution on [32, 96], eta and the eps(i) are standard normal variates, and chi_{[a,b]}(i) is the characteristic function of the interval [a, b]. Figure 4.4 shows five sample waveforms from each class.

Fig. 4.4. Five sample waveforms from (a) the "cylinder" class, (b) the "bell" class and (c) the "funnel" class.

If there is no noise, we can characterize the "cylinder" signals by two step edges and constant values around the center, the "bell" signals by one ramp followed by one step edge and positive slopes around the center, and the "funnel" signals by one step edge followed by one ramp and negative slopes around the center. The 12-tap coiflet filter^43 was used for the LDB selection. Then the ten most important coordinates were selected. In Fig. 4.5, we compare the top ten LDA and LDB vectors. Again, only the top two vectors were used for classification in the LDA case. These LDA vectors are very noisy and it is difficult to interpret what information they captured. On the other hand, we can
Fig. 4.5. Plots from the analysis of Example 4.7: (a) top ten LDA vectors; (b) top ten LDB vectors; (c) the subspaces selected as the LDB.
observe that the top ten LDB vectors are located around the edges and the centers of the signals. Also note that some of the vectors work as smoothers (low pass filters) and the others work as edge detectors (band pass filters), so that the resulting expansion coefficients carry information on the edge positions and types. The misclassification rates in this case are displayed in Table 4.2. As expected, LDA applied to the original coordinate system was almost perfect with respect to the training data, but it adapted too much to features specific to the training data and lost its generalization power; when applied to the new test data set, it did not work well. The best result was obtained using the full CT on the top ten LDB coordinates. In this case, the misclassification rates on the training data and test data are very close, i.e., the algorithm really "learned" the structures of the signals. In fact, this best tree was already shown in Fig. 4.1. If tree-based classification is combined with a coordinate system capturing local information in the time-frequency plane, the interpretation of the result becomes explicit and easy: in Fig. 4.1 we find that LDB coordinate #1 is checked first. If this is less than 10.0275, the signal is immediately classified as "bell". From Fig. 4.5(b), we observe that LDB function #1 is located around i = 30 which, in fact, coincides with the
Table 4.2. Misclassification rates of Example 4.7. LDB10 represents the top ten LDB coordinates; the other abbreviations are exactly the same as in Table 4.1. The smallest error on the test data set is shown in bold font.

Method          Training error (%)   Test error (%)
LDA on STD           0.33               13.17
FCT on STD           3.00               13.37
PCT on STD           6.00               10.67
LDA on LDB10         3.67                6.20
FCT on LDB10         3.00                3.83
PCT on LDB10         4.33                6.40
FCT on LDB           3.33                5.97
PCT on LDB           4.00                6.13
Table 4.3. Misclassification rates of Example 4.6 with the denoised input signals.

Method          Training error (%)   Test error (%)
LDA on STD          15.33               24.13
FCT on STD           6.33               24.97
PCT on STD          18.00               24.37
LDA on LDB5         17.00               21.37
FCT on LDB5          7.67               22.40
PCT on LDB5         22.00               28.87
FCT on LDB           6.33               25.07
PCT on LDB          18.00               27.40
Table 4.4. Misclassification rates of Example 4.7 with the denoised input signals.

Method          Training error (%)   Test error (%)
LDA on STD           2.00               18.83
FCT on STD           2.33                7.33
PCT on STD           5.00                8.13
LDA on LDB10         4.67                6.73
FCT on LDB10         3.67                8.10
PCT on LDB10         4.00                7.03
FCT on LDB           2.67                7.10
PCT on LDB           4.00                7.03
starting position (the parameter a in the formulas) of the various signals. Around this region, both the cylinder and the funnel signals have sharp step edges, whereas the bell signals start off linearly. Thus the CART algorithm found that LDB function #1 is the most important coordinate in this example. Separating the cylinder class from the funnel class turned out to be more difficult because of the large variability of the ending positions. This resulted in the more complicated structure of the right branch from the root node. But we can still obtain an intuitive interpretation: the first node in the right branch from the root node (with the "cylinder" label) is split into either "funnel" or "cylinder" depending on LDB coordinate #5, which is located around the middle of the axis (i = 64). The cylinder signals have roughly constant values around this area, whereas the funnel signals decrease roughly linearly. One can continue the interpretation in a similar manner for all the remaining nodes.

In both examples, we see that the misclassification rates (on the test data sets) using the pruned CTs are worse than those using the full CTs. This tendency can also be found in some of the examples studied in Ref. 125. We will investigate this issue further in our future research. From these examples, we can see that it is more important to select good features than to select the best possible classifier without supplying good features; each classifier has its advantages and disadvantages,^125 i.e., the best classifier depends heavily on the problem (e.g., LDA was better than CART in Example 4.6 whereas the situation was the opposite in Example 4.7). By supplying a handful of good features, we can greatly enhance the performance of classifiers.

4.6. To denoise or not to denoise?
The LDB-based classification developed so far worked quite well for the noisy synthetic data sets. An interesting question is whether this good performance is attributable solely to the denoising capability of the basis functions, or to the local features extracted by the basis functions, or both. Thus, we applied the MDL-based denoising algorithm combined with the shift-average method developed in Sec. 3 to the input signals. In this exercise, we fixed the QMF for the denoising, i.e. C06 for the signals of Example 4.6 and C12 for the signals of Example 4.7. Also, we used the five-point shifts in that algorithm. After the denoising, exactly the same procedures were applied. Tables 4.3 and 4.4 summarize these results. We can observe the following tendencies:
- Each error rate on the denoised signals represented in the LDB coordinates (whether taking the top few coordinates or not) is consistently worse than the corresponding one without denoising.
- The error rates on the denoised signals represented in the standard Euclidean system depend on the method: in both examples LDA resulted in larger error rates on the denoised signals than on the original noisy signals, and the situation is the opposite for the CTs.
- In both examples, the best performance is obtained using the signals without denoising.

Based on these observations, we may conclude that too much denoising should not be applied prior to the LDB analysis, since it may remove some important information for classification. We will address how to choose the number of basis functions to retain for denoising without deteriorating the classification performance in our future project.

4.7. Signal/background separation by LDB
LDB vectors can also be used as a tool for extracting a signal component from data obscured by some unwanted noise or "background" (which may not be random). Let class 1 consist of a signal plus noise or a signal plus "background", and let class 2 consist of pure noise or "background". Then, by selecting the LDB maximizing D between class 1 and class 2, we can construct the best basis for denoising arbitrary noise or pulling a signal out of a textured background. In this application, the asymmetric relative entropy (4.1) makes more sense than the symmetric version (4.2). We show one example here. As "background" (class 2), we generated 100 synthetic sinusoids with random phase as b(k) = sin(pi(k/32 + u)), where k = 1, ..., 128 and u is a uniform random variable on (0, 1). As class 1 samples, we again generated 100 "backgrounds", and added a small spike (as a "signal" component) to each sample vector at a random position with 20 <= k <= 60, i.e., x(k) = sin(pi(k/32 + u)) + 0.01 delta_{k,r}, where delta_{k,r} is the Kronecker delta and r is an integer-valued uniform random variable on the interval [20, 60]. Figure 4.6 shows how these "backgrounds" were removed. Figure 4.6(a) shows 10 sample vectors of class 1. We can hardly see the spikes. Then we transformed both the class 1 and class 2 samples by the discrete sine transform (DST) into the "frequency" domain. Figure 4.6(b) shows the transformed version of Fig. 4.6(a). Then these DST coefficients of both classes were supplied to the LDB algorithm of
Fig. 4.6. (a) Ten samples of class 1 vectors, i.e. sinusoids plus spikes; (b) DST coefficients of the vectors in (a); (c) top 20 LDB vectors using the local sine dictionary in the frequency domain; (d) reconstructed spikes after removing the "background".
Sec. 4.4 using the local sine basis dictionary (which essentially performs segmentation in the frequency domain). After the LDB was found, the basis vectors were sorted by (4.7). The top 20 LDB vectors are displayed in Fig. 4.6(c). We can clearly see that the top eight basis vectors are concentrated around the low frequency region and the other vectors are located in the higher frequency region. We regard the subspace spanned by these eight LDB vectors as "background", using the a priori knowledge that the "background" component consists only of a low frequency component. The reason why these vectors have large values in (4.7) is that the "background" parts of the class 1 samples differ from the class 2 samples in phase, and the DST is not a shift-invariant transform. After removing the component belonging to this "background" subspace, we reconstructed the "signal" component of the class 1 samples by the inverse DST; the results are shown in Fig. 4.6(d). We can clearly see the spikes now. The LDB can thus improve the algorithm for extracting the "coherent" component from data by Coifman, Majid and Wickerhauser,^{30,36} if we know the statistics of the background a priori or have actual pure background signals.
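A hedged sketch of the data generation and the DST step of this example (`scipy.fft.dst`/`idst` are used for the transforms; the LDB search itself is omitted, and the seed and helper names are ours):

```python
import numpy as np
from scipy.fft import dst, idst

rng = np.random.default_rng(0)
k = np.arange(1, 129)

def background():
    # class 2: sinusoid with random phase
    return np.sin(np.pi * (k / 32 + rng.uniform()))

def signal_plus_background():
    # class 1: same background plus a small spike at a random position
    x = background()
    x[rng.integers(20, 61) - 1] += 0.01      # spike between k = 20 and 60
    return x

# move both classes to the "frequency" domain before running the LDB search
C1 = np.stack([dst(signal_plus_background(), norm='ortho') for _ in range(100)])
C2 = np.stack([dst(background(), norm='ortho') for _ in range(100)])
# ...run the LDB algorithm with a local sine dictionary on C1 and C2, zero out
# the coordinates spanning the "background" subspace, then reconstruct via idst.
```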
A similar idea for multidimensional signals has been proposed by Harlan et al.;^68 they considered the problem of removing linear and hyperbolic structures from images (representing geophysical acoustic signals such as Fig. 3.5 in Sec. 3) using the Radon and generalized Radon transforms. The key observation is that the structural components (e.g. lines and hyperbolas) in the images can be well-compressed or "focused" in certain transformed domains (e.g., the Radon and generalized Radon transformed domains). On the other hand, the unstructured components or backgrounds are "defocused" in these domains. Based on this observation, a thresholding operation is applied in the transformed domain so that only the "focused" objects remain. Then the inverse transform reconstructs only the structural components and eliminates the backgrounds. In this sense, the "structure" strongly depends on the transform under consideration. Our philosophy is to use the library of bases of Sec. 2; we have a large collection of transforms, each of which can represent many different "structures" in signals. For images or multidimensional signals, it is not simple to determine which bases should be included in the library because: (a) there are many possible two-dimensional bases, both separable and nonseparable (see Sec. 2.7 and the references therein); and (b) the computational cost is much higher (~ O(n^2 log^4 n^2) for an image of n rows and n columns). Here is the place to use a priori information carefully to restrict the number of bases or dictionaries in the library, to achieve both computational efficiency and representation power of the bases. Classification and signal/noise separation for images are important future projects for us.
4.8. Summary
We have described an algorithm to construct an adaptive local orthonormal basis [a local discriminant basis (LDB)] for classification problems by selecting a basis from a library of orthonormal bases using a discriminant measure (e.g., relative entropy). The basis functions generated by this algorithm can capture relevant local features (in both time and frequency) in the data. The LDB provides us with better insight into, and understanding of, the relationships between the essential features of the input signals and the corresponding outputs (class names), and enhances the performance of classifiers. We have also demonstrated that the LDB can be used for pulling a signal component out of data consisting of signals plus "backgrounds".
5. Local Regression Bases

5.1. Introduction
Basis functions selected by the "best basis" paradigm from a library of orthonormal bases have been found useful for the classification problems of Sec. 4. They provide us with better insight and understanding of the relationships between the local features of the input signals and the class assignments. A natural question is how to extend this paradigm to regression problems. Our definition of regression is simply any statistical method that constructs a mapping function from the input signal space (generally high dimensional) to the response space (normally low dimensional). Estimation or prediction of some quantity from the input signals can be considered a regression problem; e.g., classification problems are a subset of regression problems. In this section, we describe a method to select a complete orthonormal basis, from a library of orthonormal bases, suitable for regression problems. In Sec. 5.2, we formulate the problem of feature extraction and regression, and summarize the tree-based regression method of the CART methodology. In Sec. 5.3, we propose an algorithm for constructing such a local basis for regression problems. Then in Sec. 5.4, we show the results of applying our method to the same examples used in Sec. 4.5 and compare the performances. Finally, we discuss related methods proposed by others and contrast them with our method in Sec. 5.5.

5.2. Problem formulation
We formulate a regression problem in a similar manner to the classification problem of Sec. 4. Let T = {(x_i, y_i)}_{i=1}^N subset X x Y be a training data set with a signal space X subset R^n and a response space Y = R. We want to find a feature extractor f: X -> F subset R^k (k < n) for extracting relevant features and reducing the dimensionality of the problem while losing as little important information as possible. If we succeed in constructing such a feature extractor, then the subsequent regression process can be improved in both accuracy and efficiency. We call this regression process g: F -> Y a predictor. As in the previous section, d = g o f: X -> Y denotes the overall regression process. We assess the performance of the whole process by the regression error (also called prediction error) on a test data set T' = {(x_i', y_i')}_{i=1}^{N'} (which has not been used to construct the feature extractors and predictors), namely (1/N') sum_{i=1}^{N'} delta_p(y_i' - d(x_i')), where delta_p(r) = |r|^p with 1 <= p < infinity, or the
relative l^p error ||y' - y_hat'||_p / ||y'||_p, where y' = (y_i')_{i=1}^{N'} and y_hat' = (d(x_i'))_{i=1}^{N'}. The resubstitution error (computed on the training data set), of course, gives overly optimistic figures. Here, we focus on feature extractors of the form f = Theta^{(k)} o Psi, where Theta^{(k)}: X -> F represents the selection rule (e.g., picking the k most important coordinates out of n), and Psi in O(n), i.e. an n-dimensional orthogonal matrix. As the regression method g, we adopt the tree-based regression of the CART methodology,^18 although other multivariate regression techniques such as ordinary linear regression,^{54,120} projection pursuit regression,^{60,77} or neural network regression^{24,125} could all be used. Before proceeding to the description of the basis selection algorithm, we briefly review regression trees (RTs) in the CART methodology. Essentially, an RT constructs a piecewise constant approximation to the response vector y by recursively partitioning the input signal space (along the coordinate axes) into a set of disjoint blocks and taking the average of the response values in each block. An actual prediction works as follows: each input signal x_i is dropped into the RT and reaches a certain terminal node; then the average value mentioned above is assigned as the predicted response value y_hat_i. An algorithm for constructing an RT is easily obtained by small modifications of the classification tree (CT) algorithm: as the node value, replace the class assignment by the average of the response values of the samples at that node, and as the deviance of a node, replace the entropy by the residual sum of squares (i.e. ||y - y_hat||^2); see Chap. 8 of Ref. 18 for more details. The MDL-based pruning algorithm can also be modified easily for RTs; see Appendix A.

5.3. Construction of local regression basis
We want to select a complete orthonormal basis from a library of orthonormal bases, i.e. a set of tree-structured subspaces. For classification, we used relative entropy to measure the goodness of each subspace, which led to the LDB. For the regression problem, instead of relative entropy, we use the regression error computed from the expansion coefficients belonging to a subspace by invoking a specified regression method. As described in the previous section, let g denote the final regression method (such as an RT) applied after selecting the basis suitable for the regression. Let g_j: R^{2^{n_0 - j}} -> Y denote the regression method on the subspace Omega_{j,k}. The methods g_j are normally the same regression method as g, except for the dimensionality of the input space. Then, we may take the following relative l^p error on the subspace to evaluate it:
R_p(B_{j,k} X; g_j) := ||y - y_hat(B_{j,k} X; g_j)||_p / ||y||_p,    (5.1)
where B_{j,k} is the matrix of basis vectors belonging to the subspace Omega_{j,k} defined as in (2.3), X = (x_1, ..., x_N) is the matrix consisting of the training signals, and y_hat_i(B_{j,k} X; g_j) is the estimate of y_i by the regression method g_j on B_{j,k} X, i.e. on all the expansion coefficients of the training signals belonging to the subspace Omega_{j,k}. Alternatively, we may take a residual sum of pth powers:
R_p(B_{j,k} X; g_j) = sum_{i=1}^{N} |y_i - y_hat_i(B_{j,k} X; g_j)|^p.    (5.2)
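A hedged sketch of the two error measures (5.1) and (5.2); the function names are ours:

```python
import numpy as np

def rel_lp_error(y, y_hat, p=2):
    # Eq. (5.1): ||y - y_hat||_p / ||y||_p for the subspace under evaluation
    return np.linalg.norm(y - y_hat, ord=p) / np.linalg.norm(y, ord=p)

def residual_power_sum(y, y_hat, p=2):
    # Eq. (5.2): sum_i |y_i - y_hat_i|^p
    return np.sum(np.abs(y - y_hat) ** p)
```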
We note that these measures are not additive in the sense of Definition 2.3. The following algorithm selects a basis suitable for the regression problem relative to g. We call this basis a local regression basis (LRB) relative to g. In contrast to the LDB algorithm of Sec. 4, where the statistical method (classification) is used after the basis selection, the following LRB algorithm integrates the statistical method (regression) into the basis selection mechanism. The following algorithm selects such a basis from a dictionary of orthonormal bases.

Algorithm 5.1 (The Local Regression Basis Selection Algorithm). Given a training data set T,

Step 0: Choose a dictionary of orthonormal bases (i.e. specify QMFs for a wavelet packet dictionary, or decide to use either the local cosine dictionary or the local sine dictionary) and specify the maximum depth of decomposition J, a regression method g, and a measure of regression error R_p.

Step 1: Expand each signal into the dictionary.

Step 2: Set A_{J,k} = B_{J,k} for k = 0, ..., 2^J - 1.

Step 3: Determine the best subspace A_{j,k} for j = J - 1, ..., 0, k = 0, ..., 2^j - 1 by
340
N. Saito
'Bj
ii1Zp(BJikX;9j) < Hp(Aj+i<2kX
Aj,k k Aj+i,2k
© ^j+i,2fc+i
U
Aj+it2k+iX;gj+i),
otherwise. (5.3)
Step 4: Supply k(< n) most important coordinates to the final regression method g. Unlike the BB and LDB, this basis may not give the smallest prediction error (using g/s) in the set of all possible bases obtainable by the divide-andconquer algorithm from the dictionary. This is not because the prediction error is non-additive but because the best prediction error of the union of the two individually-best subspaces may not be necessarily smaller than the best prediction error of the union of the two subspaces each of which is not individually-best by itself. This is a rather general problem in feature selection based on the prediction error or misclassification error. See Refs. 38 and 145 for a few interesting examples; see also Sec. 10.5 of Ref. 63 and Chap. 12 of Ref. 103 for more information. In this sense, the LRB is still a first step toward the general regression problem using the best basis paradigm. We will study how to obtain a better selection scheme in our future project. Step 4 is the so-called "selection-of-variables" problem. After Step 3, the regression error using gj is assigned to each subspace. Hence one way to reduce the dimensionality of the problem is to only consider the coefficients generated by the projection onto the subspaces whose regression errors are smaller than some threshold. The other way is to let the final regression method g select the most important coordinates by supplying all the coefficients. We use the latter approach for the examples in Sec. 5.4. In that case, each basis function is assigned its column index (ranging from 1 to n) of the table of tree-structured subspaces rather than an index of importance. In general, however, the MDL criterion 128 should be a good candidate for obtaining the optimal fc, and our future research will address this problem. As for an extension to a library of orthonormal bases, we can pick the LRB giving the smallest prediction error among the LRBs constructed from dictionaries in the library. Remark 5.2. A useful variant of this LRB algorithm is to consider nonlinear operations (such as taking log of absolute values, squaring, or thresholding) of
Local Feature Extraction and Its Applications
341
the expansion coefficients prior to the subspace evaluation (5.3). In particular, supplying the squares of the coefficients corresponds to regressing the response function on the time-frequency energy distributions of the input signals. We call the LRB selected from the squares of the coefficients "LRB2". 5.4.
Examples
It is possible to use the LRB algorithm for classification by replacing regression methods g and g/s by classification methods (e.g., CTs), and regression errors by misclassification rates. In this section we apply this LRB-based classification to the examples shown in Sec. 4 and compare the results. A genuine regression example using a real data set is described in Sec. 6 in depth. We first applied the LRB algorithm to the data set described in Exam ple 4.6 and generated the following table of misclassification rates. The lowest misclassification rate was obtained by the fully-grown RT on all the LRBP co ordinates. Here LRBP means the LRB selected by Algorithm 5.1 with pruned CT as gj's for subspace evaluation. Figure 5.1 shows this best tree. This tree selected 11 LRBP coordinates for the classification out of 32 possible Table 5.1. Misclassification rates of Example 4.6 using the LRB methods. The same QMF, C06, was used to generate the tree-structured expansion coefficients. F C T and P C T denote the full and pruned classification trees, respectively. LRBF and LRBP represent the coor dinates selected by the subspace evaluation using the FCTs and P C T s on the expansion coefficients, respectively. Thus, e.g., FCT on LRBP means that the full classification tree grown on all the coordinates selected by Algorithm 5.1 with pruned classification tree as
Method F C T on LRBF PCT FCT PCT FCT PCT FCT PCT
on on on on on on on
LRBF LRBP LRBP LRB2F LRB2F LRB2P LRB2P
4.33
rate (%) Test
17.00 4.33 16.67 5.00 21.67 4.00
24.33 25.10 22.13 25.00 23.00 27.50 25.30
17.00
25.10
342
N. Saito
I (dHtf)
X-S>O70O796
X.16
I dM«2 I "TS7~
1<3.31028 \ / «.1>331028
[ <*>—2 |
«5<0 494215 / x5>0 494215 (ctawT)
[ c*u»3 1
raj «1&<00522097 \ / x18>0 0522097 classS 37J
Fig. 5.1. The full C T giving the lowest misclassification rate using the LRB methods on the data set of Example 4.6. This tree is grown on the signals represented in the LRBP coordinates.
coordinates. Figure 5.2 shows these selected coordinates as well as the subspace pattern of the LRB. Comparing with the corresponding table and figures of the LDB methods in the Sec. 4, we observe the following: • The misclassification rates except the one by the LDA-based classification in Table 4.1 are comparable. • Seven functions out of 11 selected LRB functions have larger scale features (from the subspaces ^4,0.^4,11^3,1,^3,2) than the top five LDB functions shown in Fig. 4.3(b) (from ^2,0)- In fact the LRB functions try to combine
Local Feature Extraction
and Its Applications
343
the elementary triangular waves /ii,/i2,/i3 of Example 4.6, e.g., the LRB function # 6 has two major positive peaks around the functions hi and /12 and a major negative peak around /13.
m f
°; 4 10
20
15
25
30
(a)
Fig. 5.2. (a) The LRB functions used in the CTshown in Fig. 5.1. (b) The selected subspaces as the LRB. Table 5.2. Misclassification rates of Example 4.7. The same QMF, C12, was used to generate the tree-structured expansion coefficients. The abbreviations are exactly the same as in Table 5.1. The smallest error on the test data set is shown in bold font. Error
Method
Training
F C T on LRBF P C T on LRBF FCT on LRBP PCT FCT PCT FCT PCT
on on on on on
LRBP LRB2F LRB2F LRB2P LRB2P
2.00 4.67 2.00 2.67 2.00 4.67 2.33 4.67
rate (%) Test 5.73 8.20 6.47 5.40 5.73 8.20 9.60 9.27
Next we applied the same procedures to the data set of Example 4.7. The misclassification errors are summarized in Table 5.2. In this example, the
344
N. Saito
200/30CT x.3<10.0275 x.3> 10.0275
/
bell 1/99
101/201 x.5<17.3111 x.5>17.3111
_^ cylinder 21/120 x.11<-2.3464 x.11>-2.3464
0/81
z.
cylinder 2/14
9/106 x.20<-2.77825 / x.20>-2.77825 _X_ funnel cylinder
1/5 4/101 Fig. 5.3. The pruned CT giving the lowest misclassiflcation rate using the LRB methods on the data set of Example 4.7. This tree was initially grown on the signals represented in the LRBP coordinates and then pruned by the MDL-based pruning algorithm.
Fig. 5.4. (a) The LRB functions used in the CT shown in Fig. 5.3. (b) The selected subspaces as the LRB.
Local Feature Extraction and Its Applications
345
pruned CT on the LRBP coordinates gives the lowest misclassification rate. Figure 5.3 shows this best tree. This best tree selected 4 LRBP coordinates for the classification out of 128 possible coordinates. Figure 5.4 shows these selected coordinates as well as the subspace pattern of the LRB. Comparing with the corresponding table and figures of the LDB methods in Sec. 4, we observe the following: • Both misclassification rates are comparable. • Two basis functions from the subspace fi^o were selected by both methods. From these two classification examples, it is difficult to judge which method is superior. For the genuine regression problems where the data does not have a natural association with classes or categories, the LDB methods cannot be used. We study this type of regression problem in Sec. 6. 5.5.
Discussion
In this section, we discuss some of the related methods proposed by others and a possible extension of our methods. In Ref. 67 Guo and Gelfand proposed an idea to improve the CT by using a small neural network at each node in the tree. By the use of neural networks, their method allows one to split the input signal space nonlinearly to gain more class separability than the CTs using coordinatewise splits and the linear splits. The fundamental difference between our approach and their approach is that they do not use the local information in the time-frequency plane; they claim that they can extract the local features but in our opinion, local only in the sense of features mappable from the original coordinates by the neural networks they use. For example, it is essentially impossible to use the local frequency information by their algorithm unless one supplies the input signals represented in the Fourier basis. They applied their algorithm to the same example we used, i.e. Example 4.6, and achieved the misclassification rate about 18% using 250 training samples and 5,000 test samples. This is comparable with our results. In fact, their result is worse than the best LDB method and is better than the best LRB method. The key advantage of the use of neural networks is their ability to approximate nonlinear relationships between inputs and outputs. It is an interesting exercise to compare their result with the performance of the LRB method using a neural network as a regression scheme g which more directly address the extraction of the local features using neural networks. We expect that our LRB method will improve
346
N. Saito
significantly if nonlinear interactions among the local basis coordinates at each subspace are allowed via neural networks. Similar approaches to Guo and Gelfand's method have been reported in the field of speech recognition. 119 ' 141 Regarding the use of the wavelet basis functions in the neural networks, the reader can refer to Refs. 4, 15, 113, 142 and 160. The works described in Refs. 4, 15 113 and 160 are essentially the same: they suggest replacing the activation functions (three popular choices are (a) the step function; (b) the sigmoid function; and (c) the Gaussian function) by wavelet basis functions for their multiresolution capabilities. Their approach addresses the neural net works themselves and incorporates wavelet bases for improving the classifica tion/regression ability of the neural networks. Our approach is fundamentally different; we address selecting the best possible basis functions localized in the time-frequency plane as feature extractors from a library of bases using the regression scheme which gives the best performance for the problem at hand. A neural network is simply one of the regression schemes from our viewpoint: if there is a better regression scheme, we use it instead. We do not attach any inherent importance to the neural network although we appreciate their flexibility and nonlinear capability. The work of Szu et ai. 142 is slightly different from the above-mentioned works; they input the wavelet coefficients of the training signals to the neural network having the sigmoidal activation function. More precisely, let ip(t) be a wavelet mother function and tl>(ak
for i = 1,...,N. The whole exercise is to compute (wk, afc, &t) so as to minimize the classification error measured by TZ = (1/2) $2 i=1 (y« —Vi)2 where j/j is the de sired classifier output, i.e. x/i = 1 if Xi belongs to class 1 and j/j = 0 if x^ belongs to class 2 (they considered the binary classification case there). In particular, they use the Morlet mother wavelet, ip(t) = cos(u>t)exp(—12/2) and compute the parameters (wk,ak,bk) using the conjugate gradient method. We can see a clear difference from our philosophy. They try to optimize the translationdilation parameters for the fixed wavelet mother function whereas our approach uses many different wavelet bases (including wavelet packets/local trig, bases)
Local Feature Extraction and Its Applications
347
for a set of predefined translation-dilation parameters (dyadic dilations and associated circular translations). The potential problems of their approach are: (a) how to select the mother wavelet function in the first place; and (b) the conjugate gradient method can get stuck in the local minima of the objective function TZ. As for an extension of the LRB method, instead of restricting only one regression family gj to evaluate subspaces at level j , we may consider many different regression methods gij,.. .,gm>j, and take a method which gives the smallest regression error for each subspace. Thus, if we register the best re gression method as well as the smallest regression error at each subspace, then we obtain the LRB with a list of the regression methods. This strategy makes sense since each regression method has pros and cons; e.g., RTs are relatively good for nonlinear relationships between input signals and responses, but are less accurate for the problems with linear relationships where the linear regres sion works much better, etc. We will address the use of multiple regression methods in our future project. Finally, we give our thoughts on the LRB method versus the LDB method. As we can easily see, the LRB method is more flexible and general than the LDB method. But it is more computationally intensive than the LDB method; a regression method gj has to be invoked at each subspace in the tree-structured subspaces. Which method to be used really depends on the problem at hand. For the general regression problem, the choice is definitely the LRB method. For classification problems and certain regression problems mentioned in Re mark 4.5, the LDB method may be a first choice because of its computational efficiency.
5.6.
Summary
In Sec. 5, we have proposed a method to select a complete orthonormal basis [local regression basis (LRB)] from a library of orthonormal bases which is suitable for regression problems. This method uses prediction error (computed by a specified regression scheme) as a measure of the goodness of each subspace so that the regression scheme is integrated into the basis selection mechanism. We have shown that the LRB method can also be used for the classification problems and have examined its performance using Examples 4.6 and 4.7 of Sec. 4. The results are comparable with those of the LDB method. The LRB method is more flexible and general than the LDB method; however, it is more computationally intensive than the LDB method.
348
N. Saito
6. Extraction of Geological Information from Acoustic Well-Logging Waveforms Using LDB and LRB Methods 6.1.
Introduction
In this section, we apply the LDB and LRB methods developed in the previous sections to a real geophysical regression problem. The problem we consider here is to infer some geological properties of subsurface formations from measured acoustic waveforms which propagated through these formations. The acoustic measurements have been used in geophysical well logging for a long time to infer petrophysical properties of subsurface formations. 144 These measurements consist of the following procedure. First, an acoustic pulse is generated at the transmitter of a measurement tool lowered down in a borehole. Then, this pulse propagates through the surrounding formations. Finally, the pressure field is recorded at the receiver of the same measurement tool. This process is repeated until the tool is drawn up to a certain depth level. See the illustration of this type of tool in Fig. 6.1. The main purpose of this measurement is to:
Fig. 6.1. An illustration of a simple sonic tool. The tool is represented by the ellipse. The symbols T and R denote a transmitter and a receiver equipped in the tool, respectively. A typical distance between the transmitter and the receiver js 9 ft. An actual tool normally has two or eight receivers to compensate the borehole effects. The arrows connecting the transmitter and the receiver simply illustrate raypaths of P or S wave components. The recorded wave form data is digitized and sent to the surface processing unit through the cable attached to t h e tool.
Local Feature Extraction and Its Applications
349
• Calibrate the reflection seismic imaging algorithms. • Deduce the lithology information from the acoustic/elastic properties of sub surface formations. • Assess the mechanical property of formations including fracture detection. A typical recorded wave form, as shown in Fig. 6.2, consists of three types of localized wave components: a refracted compressional (or longitudinal or primary) wave called P wave, a refracted shear (or transverse or secondary) wave called S wave, and a guided surface wave called the Stoneley wave. The P and S waves follow paths that minimize the traveling times between the trans mitter and the receiver. The Stoneley wave, which is guided by the fluid-rock interface, travels more slowly than the two refracted waves and is the dominant event at later times in the wave form. Traditionally, velocities of these three wave components with or without amplitudes of these components have been used to infer geological information of the formations. These quantities are re lated to the formation properties such as porosity, mineralogy, grain contacts, and fluid saturation etc.; see e.g., Refs. 109,154 and 158 and references therein for more details.
0.001
0.002
0.003 Time (sec)
0.004
0.005
Fig. 6.2. A typical acoustic wave form recorded in the downhole. The surrounding subsurface formation consists of shale in this case. The Stoneley wave component normally has a dominant energy.
350
N. Saito
The velocity and amplitude information of a particular wave component is just a part of the information contained in the entire wave fofm shape since the velocity can be computed from its arrival time (the starting time position) and the amplitude is simply the maximum value of the wave component. It is an extremely difficult task to validate the entire shape information by the ex act mathematical modelling and computer simulation because of the complex ity of: (a) the material, i.e., the subsurface formations of various mineralogy with varying pore spaces containing different types of fluids; (b) the geometry, i.e. varying diameters of the borehole and the rugosity of the borehole wall; and (c) the physics, i.e., acoustic/elastic wave propagation phenomena in such formations. These are the reasons why there have only been a few attempts to fully utilize the wave form shape information, 71,75 although the relationships between the shapes of the wave forms and the types of rocks have long been recognized. The first systematic method to use the wave form shape information is due to Hoard. 71 His method estimates lithologic information or attributes (i.e. porosity, volume percentages of various rocks such as sandstone, shale, and limestone, etc.) from the full acoustic wave forms combining with several other geophysical measurements such as the natural radioactivity and resis tivity of the formations. In his study, the lithologic information was obtained by careful study of all available data including drill cuttings, core samples, and other geophysical measurements. After selecting the training data set, the clusters in the input signal space are identified using the graph-theoretic clustering algorithm (pp. 539-541 of Ref. 63). Any clustering method requires one to specify the similarity measure among input vectors, and the standard Euclidean norm was used in his approach. This is, however, a global measure of similarity: a slight time shift in the wave forms creates a large distance in this norm, and the large amplitude portions "mask" small features. Because of this problem, the envelopes of the wave forms are computed by their Hilbert transforms, and then the log values of the envelopes are used as inputs to the clustering algorithm. After identifying the clusters, a mean vector and mean lithologic attributes are computed for each cluster. Finally, for each vector in the test data set, the distances between that vector and the mean vectors of the clusters are compared and the lithologic attributes of the closest mean vector is taken as the test vector's lithologic attributes. Although he claims that it works well as long as the test data set comes from the similar geo logical environment such as the wells near the training well, there are several
Local Feature Extraction and Its Applications
351
problems with this approach. Since this simply uses the entire envelope infor mation with the Euclidean norm as a similarity measure, it is very difficult to interpret the results: which wave component is responsible for what lithologic attributes? Also, the training process is computationally extremely expensive since this approach does not reduce the dimensionality of the problem. Finally, it is not too clear why the clustering technique (unsupervised learning) is used rather than classification/regression techniques (supervised learning methods). A few years later, Hsu recognized the importance of the relationships be tween individual wave components and lithologic information, and proposed a different approach. 75 His approach first extracts the wave components in the data set separately by aligning the arrival times of each component throughout the data set and segmenting each component with an appropriate time win dow. This process generates three sets of vectors corresponding to P, S, and Stoneley wave components. Then for each wave component set, the KarhunenLoeve transform (KLT) is applied and a few largest eigenvectors are obtained. Due to the alignment, the first and second eigenvectors account for major part (in his example, more than 90%) of the total energy of the set. Then all vectors in the set are projected in this coordinate system (spanned by these two eigen vectors) and the structures of the point clouds in this coordinate system are examined. In his example, the projected data points tended to have clusters depending on the formation rock types (e.g., sandstone, limestone etc.) where they propagated and this tendency was more pronounced in the Stoneley wave set than in the P wave set. (He could not use the S wave components because the S waves did not exist for certain depth levels due to the soft formation conditions.) Although his findings are interesting and his method is easier to interpret than that of Hoard, it is still computationally very expensive due to the use of the KLT as we mentioned in Sec. 2. (He proposed a compu tationally faster method to simulate the first and second eigenvectors. Such a method, however, may only be applicable to the specific example he used. See Ref. 75 for more details.) Moreover, the features extracted may not carry subtle discriminatory information since a few largest KL basis vectors only capture the major and common features in the data set. His method is again f
This alignment is normally done in a semi-automatic way: the user first defines an ap propriate time window, then the computer tries to track the first zero-crossing of the wave component within the time window. Since the positions of these zero-crossing vary (some times wildly) from trace to trace, the manual editing is sometimes necessary.
352
N. Saito
considered as a clustering or an unsupervised learning technique rather than classification/regression. Although our purpose here is similar to the above studies, our approach is fundamentally different from theirs. We use the regression techniques de veloped in the previous sections which use the local information in the timefrequency plane. Similarly to the genuine classification problems, the key is how to extract useful features from input signals and reduce the dimensionality of the problem at hand since this again enhances the conventional regression methods in both efficiency and accuracy. 6.2. Data description
and problem
setting
In this exercise, we use the acoustic wave forms recorded at a certain well. We have 3,012 wave forms recorded at every 0.5 ft depth interval. Each wave form consists of 512 time samples with sampling rate of 1 0 - 5 second. Along with the wave forms, we also have lithologic information around each receiver location which was computed from various geophysical measurements using the volumetric analysis methods described in Refs. 22 and 118. In this study, we use the volume fractions of quartz (main constituent of sandstone), illite (a type of clay which is a main constituent of shale in this area), and gas as the lithologic information. We note that no acoustic/elastic information was used to generate the lithologic information in this case. The region where the well is located mainly consists of sandstone-shale sequences, i.e., this is a relatively simple geologic setting. Most sandstone layers contain either gas or water. Figure 6.3 shows the data set under study. From the volume fraction curves in Fig. 6.3, we observe that 1. There is a thick sandstone layer containing gas around the depth index ranging from 1,600 to 2,100. 2. There is a shale layer around the depth index ranging from 700 to 1,100. 3. There are alternating sandstone-shale sequences above the thick sandstone layer and below the shale layer described above. Let us call the wave forms propagated through sandstone layers "sand wave forms" and those propagated through shale layers "shale wave forms" for short. We observe the following wave form features from Fig. 6.3: 1. The S wave components in the sand wave forms have much stronger energy and faster speed than those in the shale wave forms.
Local Feature Extraction and Its Applications
353
2. Velocities of the P wave components in the sand wave forms are higher than those in the shale wave forms. 3. Velocities of the Stoneley wave components in the shale wave forms are lower than those in the sand wave forms except those in the bottom 200 levels.
0.0 0.2 0.4 0.6 0.8 1.0 Volume Fraction
0
0.001
0.002 0.003 Time (sec)
0.004
0.005
Fig. 6.3. These figures show the whole data set used in this study. The left figure shows three curves representing volume fractions of quartz (solid line), illite (dotted line), and gas (thick solid line). The right figure shows the acoustic wave forms recorded at the corresponding depth levels as a gray scale image. The depth index 0 corresponds to the deepest level.
The physics of wave propagation suggests that in fact the P and S velocities are sensitive to the fluid content and the mineralogy, and the Stoneley wave velocity is sensitive to the permeability of the formations as well as the bore hole conditions such as rugosity and diameters of the borehole. 109,154,158 The exceptionally high velocities of the Stoneley wave components in the bottom region mentioned above may be due to the borehole conditions. Because of the sensitivity to the borehole conditions, we smoothly taper off the Stoneley wave component from each wave form and only consider the earlier part of the wave forms.
354
N. Saito
0.0 0.2 0.4 0.6 0.8 1.0 Volume Fraction
0
0.0005
0.001 0.0015 Time (sec)
0.002
0.0025
Fig. 6.4. These figures show the training data set selected for this study. Bottom 201 recordings correspond to the shale dominant region (depth index ranging from 800 to 1,000 in Fig. 6.3). Top 201 recordings correspond to the sandstone regions (depth indices ranging from 1,160 to 1,188 [water sand], from 1,340 to 1,410 [water sand], and from 1,780 to 1,880 [gas sand]). Three curves in the left figure again correspond to the volume fractions of quartz (solid line), illite (dotted line), and gas (thick solid line). The acoustic wave forms shown in the right figure have been smoothly tapered off to eliminate the Stoneley wave components.
As a training data set, we selected the data from the most representative regions, i.e. 201 contiguous depth levels from the main shale layer and 201 depth levels from three different sandstone layers as shown in Fig. 6.4. The purpose of this exercise is to examine: (a) how accurately we can predict (in an automatic manner) the volume fractions of quartz, illite, gas at each depth level from the acoustic wave form propagated around that level without assuming the detailed physical models; and (b) what features in the wave form are important for estimating these volume fractions. Using this training data set, we proceed to the regression analysis using LDB and LRB with CART as a basic regression tool. 6.3.
Results
Since the velocity information (i.e. the locations of the wave components in the time domain) is important in this study, a natural choice of the time-frequency
Local Feature Extraction and Its Applications
355
decomposition is the local trigonometric transforms rather than the wavelet packets. Hence, we use the local sine transform (LST) in this study. In the following, the test data set means the whole data set excluding the training data set, i.e., the test data set consists of 2,610 wave forms and the corresponding lithologic attributes. Also, we simply say "volume" instead of volume fraction for short. 6.3.1. Analysis by LDB First we computed the LDB assuming that the training data set consists of two classes, i.e. shale class and sandstone class. In reality, there exist layers of "shaly sand", a mixture of sandstone and shale, in this region, which are difficult to classify into either sandstone or shale in a clear manner. This assumption, however, is a good starting point to examine what features in the wave forms carry the discriminatory information between sandstone and shale. In this study, the symmetric relative entropy (4.2) was used as the discriminant measure. The order of importance of individual LDB functions was computed by (4.7). Top 50 basis functions and selected subspaces are displayed together in Fig. 6.5. We can observe that the most discriminant basis functions are
0
0.0005
0.001
0.0015
0.002
0.0025
0.0015
0.002
0.0025
(a)
0
0.0005
0.001 (b)
Fig. 6.5. (a) Top 50 LDB functions using LST. (b) Selected subspaces as the LDB.
356
N. Saito
the localized wiggles around the P and S wave components (around t = 0.001 and t = 0.0015, respectively). Then, for each lithologic attribute, we applied the CART procedure to the training wave forms represented in the standard Euclidean coordinates and then those represented in the LDB coordinates. In each case, the full tree was grown first, and the regression (or equivalently prediction) errors for the training data set and the test data set were obtained. Finally, the full tree was pruned by the MDL-based algorithm described in Appendix A and the regression error was computed. We adopt the relative I2 error, ($2(yi — d(xi))2/Y^yf)1^'', as a measure of regression error since this is used as the deviance in the regression tree (RT) procedure in S 1 0 , 2 3 and SPLUS140 which we use for all the experiments in this section. These regression errors are summarized in Table 6.1. Table 6.1. The prediction errors on the lithologic attributes using the tree-based regression with the wave form d a t a represented in the standard Euclidean coordinates and the LDB coordinates. FRT and PRT denote full regression tree and pruned regression tree, respec tively. STD, LDB50, LDB denote the wave forms represented in the standard Euclidean coordinates, the top 50 LDB coordinates, and all the LDB coordinates, respectively. The smallest errors in the test data columns are displayed in bold font. Quartz
FRT PRT FRT PRT FRT
on on on on on
STD STD LDB50 LDB50 LDB
P R T on LDB
Illite
Gas
Training
Test
Training
Test
Training
Test
0.069 85
0.2641
0.095 97 0.072 75 0.09799 0.069 88
0.260 2 0.249 9 0.253 2 0.2574
0.163 0 0.208 3 0.168 9 0.2314 0.162 9
0.6770 0.664 8 0.619 6 0.6186 0.594 8
0.213 2 0.239 0 0.218 0 0.282 5 0.198 6
0.8616 0.858 6 0.868 8 0.834 9 0.881 8
0.096 97
0.242 3
0.2117
0.584 3
0.243 7
0.885 9
For the quartz volume, the best result (in this table) was obtained by using the pruned tree regression on all the LDB coordinates. This pruned tree is plotted in Fig. 6.6. Only four LDB coordinates out of 256 are used in this tree. These LDB functions are displayed in Fig. 6.7. Three LDB functions with indices (127,27,85) are located around the P wave components of the sand wave forms and one LDB function #10 is located around the S wave components of the sand wave forms. (We use #fc to denote index k for short.) Examining the tree in Fig. 6.6 reveals that the combination of the LDB functions #127 and #10 is responsible for high quartz volume: the tree says,
Local Feature Extraction and Its Applications
357
tfioafc x.127<-4.3207
\ x.127>-4.3207
x1(k-54.9496 / x.10>-54.9496
x.27<-26.0971 / x.27>-26.0971
x.B5<-3.38718 x85>-3 39718
Fig. 6.6. The pruned regression tree for the quartz volume. This tree was initially grown on the complete wave forms represented in the LDB coordinates. Nodes are represented by ellipses (interior nodes) and rectangles (terminal nodes/leaves). The node labels are the predicted values of the quartz volume. The numbers displayed under each node represent the residual sum of squares within that node. The splitting rules are displayed on the edges connecting nodes and x.127 means the basis coordinate of index 127.
\
\
1 T 0.0005
0.001 0.0015 Time (see)
0.002
Fig. 6.7. The LDB functions used in the pruned regression tree for the quartz rate shown in Fig. 6.6. These are displayed in the depth-first search manner in the tree.
358
N. Saito
"If the LDB coordinate #127 is less than —4.320 7, then check the coordi nate #10. If that is less than —54.9496, then assign 0.719 3, otherwise assign 0.806 3 as the quartz rate." This observation agrees with the physics of wave propagation described in Refs. 109, 154 and 158. R e m a r k 6.1. The indices (127, 10, 27, 85) used in the best tree are in fact the order of importance in terms of (4.7). These indices change when a different ordering scheme is used; e.g., with the Fisher index (4.8), they are (1, 21, 112, 3). This would suggest, at least in this example, letting the regression method select the best LDB coordinates by supplying all the LDB coordinates rather than worrying about the order of importance and selection of the coordinates prior to applying the regression method. (This is only applicable if the regression method has a built-in capability of selecting the best coordinates. CART is one of such regression methods.) The LRB method completely avoids this problem at the expense of the computational time. Figure 6.8 compares the original and the predicted quartz volume for the training data set and the whole data set (including the training data set) by this pruned tree regression.
00
02
0.4
0.6
0.8
1.0
(a)
Fig. 6.8. The prediction of the quartz volume by the pruned regression tree shown in Fig. 6.6. (a) The training data set. (b) The whole data set. In both cases, the solid line and the dotted line correspond to the predicted quartz volume and the original quartz volume, respectively.
Local Feature Extraction and Its Applications
359
.182C X.127<-4.3207 X.127>-4.3207
L
0.06753 0.33130 £.851 ( X.27<-26.0971 x.27>-26.0971
L
0.07150 0.04511 X.85<-3.39718 x.85>-3.39718 /
0.15880 0.04249 -0.6511( x.222<-7.17828e-06 x.222>-7.17828e-06
L
0.19830
A.
0.32100
0.04128 0.52230 Fig. 6.9. The pruned, regression tree for the illite volume. This tree was initially grown on the complete wave forms represented in the LDB coordinates. Notice that three coordinates are the same as the ones in the tree for the quartz volume.
For the illite volume, the same procedure, i.e. the pruned tree on all the LDB coordinates, gives the best result in this table. This tree is in Fig. 6.9. Again only four LDB coordinates are used in this tree, and three out of which are exactly the same as the quartz case. This may be explained by the "duality" of the response curves of quartz and illite volumes in Figs. 6.3 and 6.4. The illite volume curve is roughly a flipped version (around volume fraction 0.4) of the quartz volume curve. The corresponding basis functions are displayed in Fig. 6.10. The basis function #222, located rather late in time, seems responsible for high illite volume (see Fig. 6.9); however, it is not at all clear whether that is the case since the threshold used for this coordinate is very small (—7.178 28x 10~ 6 ), and the function is located around the tapered region. On the other hand, removing this coordinate from the tree does not reduce the regression error either (the error on the test data set becomes 0.588 5 instead of 0.5843). Figure 6.11 shows the prediction of the illite rate for the training data set and the whole data set by this pruned tree regression.
I
I 0
OWa5
OW1
O.Wi5
0.002
OW25
Tim (ra)
Fig. 6.10. The LDB functions used in the pruned regression tree for the illite rate ahown in Fig. 6.9. These are dimplayed in the depth-first search manner in the tree.
Fig. 6.11. The prediction of the illite volume by the pruned regreasion tree shown in Fig. 6.9. (a) The training data set. (b) The whole data set.
Local Feature Extraction and Its Applications
361
Fig. 6.12. The pruned regression tree for the gas volume. This tree was initially grown on the top 50 LDB coefficients of the wave forms.
Finally for the gas volume, the pruned regression tree on the top 50 LDB coordinates gives lower regression error than the one on all the LDB coordinates does. This tree is plotted in Fig. 6.12. The corresponding basis functions are displayed in Fig. 6.13. The majority of the selected LDB functions are located around the 5 wave components. Examining the tree in Fig. 6.12 carefully, we observe that the nodes in the right branch from the root node has higher gas volume than those in the left branch. In particular, the LDB functions of indices (23, 17, 11) are responsible for high gas volume. Out of these three basis functions, the functions (23, 11) are located around t — 0.0015 where in fact the S wave components with high amplitudes and relatively low frequency bands exist in the sand wave forms, especially the "gas sand" wave forms as we can see from Fig. 6.4. Note that the basis function #35 in Fig. 6.13 also located around t = 0.0015; however, its frequency band is higher than functions (23, 11). In fact the function #35 is used in the left branch of the tree in Fig. 6.12 and is responsible for low gas volume if combined with the low value of the LDB coordinate # 2 . These observations again agrees with the explanation
362
N. Saito
9
00005
0.001
0.002
0.0015 Time (sec)
0.0025
Fig. 6.13. The LDB functions used in the pruned regression tree for the gas volume shown in Fig. 6.12. These are displayed in the depth-first search manner in the tree.
0.0
0.05
0.10 (a)
0.15
0.20
0.0
0.05
0.10 (b)
0.15
0.20
Fig. 6.14. The prediction of the gas volume by the pruned regression tree shown in Fig. 6.12. (a) The training data set. (b) The whole data set.
Local Feature Extraction
and Its Applications
363
from the physics. 109 ' 154,158 Figure 6.14 shows the prediction of the gas rate for the training data set and the whole data set by this pruned tree regression. 6.3.2. Analysis by LRB Now we describe the results using the LRB methods. Without assuming class assignments, four different LRBs were computed using the combinations of the regression method for the subspace evaluation (Full RT or Pruned RT) and the coefficients mapping (original expansion coefficients or the squares of them). For each LRB, all the basis coordinates were supplied to the CART program and the full and pruned RTs were obtained. Unlike the LDB method, we decided not to order the individual basis functions in terms of their importance; see also Remark 6.1 and Sec. 5.3. The regression errors are shown in Table 6.2. These were also measured by the relative I2 error. Overall, errors using the LRB coordinates are comparable with those using the LDB coordinates. For the quartz and gas volumes, the best results by the LRB method beat those by the LDB method. Table 6.2. The prediction errors on the lithologic attributes using the tree-based regression with the wave form d a t a represented in the LRB coordinates. The smallest errors in the test d a t a columns are displayed in bold font. LRBF and LRBP denote the bases selected by invoking the full and pruned RTs at each subspace, respectively. LRB2F and LRB2P denote the bases selected by invoking the full and pruned RTs on the square of the expansion coefficients at each subspace. Illite
Quartz
Method
Gas
Training
Test
Training
Test
Training
Test
FRT on LRBF
0.07166 0.09635 0.07423 0.098 25 0.069 31 0.09518 0.065 81
0.2744 0.2517
0.1593 0.2073 0.1611 0.2078 0.1699 0.2177
0.2174 0.2075 0.244 2 0.189 9 0.195 6 0.1911 0.203 7
0.8614 0.893 6
0.2481 0.235 6 0.253 4 0.2523 0.255 0 0.2475
0.6144 0.5949 0.6097 0.595 7 0.615 8 0.6061 0.615 8 0.606 3
0.189 0
PRT FRT PRT FRT PRT FRT PRT
on on on on on on on
LRBF LRBP LRBP LRB2F LRB2F LRB2P LRB2P
0.09198
0.1584 0.2044
0.823 3 0.810 8 0.870 2 0.8727 0.862 8 0.867 5
For the quartz volume, the best result so far was obtained by using the pruned tree regression on the LRBP coordinates. As explained in Sec. 5, LRBP denotes the basis selected by the LRB algorithm using the subspace evaluation based on the pruned tree regression errors. This tree is plotted in Fig. 6.15. The
364 N. Saito
x.153>-0.0408494
x.81<-28.8741
x.85<1.4S48 \ / x.85>1.4548
\ x.81>-28.8741
x.16<-3.20611 / x.16>-3.20611
/
\
0.7841
0.7097
TOJSBW
0.42480
x168<0 592645 ' / x168>0 592645
/
\
/
\
0.6463
0.8212
0.4807
0.7014
TJ200WT
U.U/806
0.66330
0JBZ&
Fig. 6.15. The pruned regression tree for the quartz volume using the wave forms represented in the LRBP coordinates.
4
s X
t-
1*
AAA
A/V
T^
NAAAAAAA^ 0.0005
0.001
0.0015
0.002
0.0025
0.0015
0.002
0.0025
(a)
o M
^^^^^^^^^|
CO ■ *
u>
111110.0005 111
0.001
0>)
Fig. 6.16. (a) The LRB functions used in the pruned regression tree for the quartz vol ume shown in Fig. 6.15. These are displayed in the depth-first search manner in the tree. (b) The selected subspaces as the LRB.
Local Feature Extraction and Its Applications
365
corresponding basis functions and the selected basis pattern (subspaces) are displayed in Fig. 6.16. A close examination of Figs. 6.15 and 6.16 suggests that the most important LRB coordinate is #153, i.e. the localized wiggle around the S wave components of the sand wave forms. This basis function works as a detector: if the expansion coefficient of each wave form onto this function is below a certain threshold (—0.0408494), that wave form is considered as "sand wave form". Also, we can observe that the combination of the LRB functions # 8 1 #85 (both located around the P wave components) works also as a detector for the sandstone layer containing water: if the coordinate # 8 1 is smaller than —28.8741 and the coordinate #85 is larger than 1.4548, then the highest quartz volume (0.8212) is assigned. When we see the prediction curve of the quartz volume for the training data set in Fig. 6.17, we find that in fact this highest quartz volume is assigned around the indices ranging from 207 to 230 which corresponds to the depth indices 1,165 ~ 1,188 in the whole data set, i.e. the water sand region. In Fig. 6.4, we notice that the wave forms of the water sand region under discussion have rather different characteristics components than the other water/gas sand regions. The LRB functions # 8 1 and #85 "saved" these wave forms: if we had cut these coordinates, the quartz volume estimate of this region would have been much lower (in fact, 0.5346
(a)
(b)
Fig. 6.17. The prediction of the quartz volume by the pruned regression tree shown in Fig. 6.15. (a) The training data set. (b) The whole data set.
366
N. Saito
as we can see in Fig. 6.15). These subtle arguments could not have been done without extracting the local features in the time-frequency plane. For the illite volume, the pruned tree regression on all the LRBF coordi nates gives the best result in this table. The best tree, the LRB functions and the selected subspaces and the prediction results are plotted in Figs. 6.18-6.20, respectively. From these figures, we observe that the selected LRB functions are very similar to the LDB functions shown in Fig. 6.10 except that the LRB has a basis function #152 located around the 5 wave component of the sand wave forms. The sand wave forms have considerable energy in the LRB coor dinates (82,89,152) whereas the shale wave forms have very small energy in these coordinates. The LRB algorithm decides to use this information to infer the illite volume.
7.18200N x.82<-4.3207 •
x.82>-4.3207
0.06753 0.33130
2.85100N x.89<-6.44566 x.89>-6.44566 0.07024
0.80160" 0.04770 x.152<-0.860046 / x.152>-0.860046 0.13600 0.02532
0.61660N X.234<-1.92171e-11 / x.234>-1.921716-11
/ .
\
0.32430 0.26330 0.08673 0.45150 Fig. 6.18. The pruned regression tree for the illite volume using the wave forms represented in the LRBF coordinates.
For the gas volume, the pruned regression tree on the LRBP coordinates gives the lowest regression error among all the methods we have tried so far. The best tree, the LRB functions and the selected subspaces and the predic tion results are plotted in Figs. 6.21-6.23, respectively. From these figures, we observe that the time axis was first split into two. In the later half (which in cludes the S wave components), the LRB method decided to do the "frequency
Local Feature Extraction and Its Applications
367
-V
+■ 0.0005
0.001
0.0015
0.002
0.0025
0.0015
0.002
0.0025
(a)
0.0005
0.001 (b)
Fig. 6.19. (a) The LRB functions used in the pruned regression tree for the illite vol ume shown in Fig. 6.18. These are displayed in the depth-first search manner in the tree. (b) The selected subspaces as the LRB.
■Z.iOE.
8
0.0
0.1
0.2
0.3 (a)
0.4
0.5
00
0.1
0.2
0.3
0.4
0.5
(b)
Fig. 6.20. The prediction of the illite volume by the pruned regression tree shown in Fig. 6.18. (a) The training data set. (b) The whole data set.
368
N. Saito
TXXBBS? Fig. 6.21. T h e pruned regression tree for t h e g a s v o l u m e using t h e wave forms represented in the LRBP coordinates.
Local Feature Extraction and Its Applications
> X % c m DC -J
9 143 122 34 126 14> 34 151 16*
369
5E MM»»»»»-
-*^A^-
0.0005
0.001
0.0005
0.001
0.0015
0.002
0.0025
0.0015
0.002
0.0025
O) Fig. 6.22. (a) The LRB functions used in the full regression tree for the gas volume shown in Fig. 6.21. These are displayed in the depth-first search manner in the tree, (b) The selected subspaces as the LRB.
0.0
0.05
0.10 (a)
0.15
0.20
0.0
0.05
0.10 (b)
0.15
0.20
Fig. 6.23. The prediction of the gas volume by the full regression tree shown in Fig. 6.21. (a) The training data set. (b) The whole data set.
370
N. Saito
analysis." The earlier half of the time axis (which includes the P wave com ponents) was further segmented into finer time windows. The LRB functions (8,9,143,34,122,126) are responsible for high gas volume, and in particular, (143,122). The function #143 is a sine wave whose frequency content agrees well with the S wave components of the gas sand wave forms. The function #122 is located between t w 0.001 and t = 0.00125 and has rather high fre quency content. From Fig. 6.4, we observe that the gas sand wave forms have such wave form features in that interval whereas the water sand wave forms have smaller energy and the shale wave forms have much lower frequency con tent in that interval. 6.4.
Discussion
In the previous section, we showed the results of the regression analysis using the LDB and LRB methods. The selected basis functions (as the useful features for predicting the lithologic attributes) were interpreted more easily in the light of the physics of wave propagation than the previously proposed methods such as Refs. 71 and 75. But there still remain some questions to our approaches. 6.4.1. On the choice of the training data set In our study, we have selected the most representative regions of water sand, gas sand and shale from the whole data set. In other words, we have used our a priori knowledge on the data set and have selected the training data set in a nonrandom fashion. If we have such knowledge, we should actively use it for the training. To examine the dependence of the performance of our method on the selection of a training data set, we conduct the following experiment: the same number of the depth levels (402 levels) is chosen randomly from the whole data set (3,012 depth levels) and is used as a new training data set. In this case, it is extremely difficult to apply the LDB method since the classes of randomly sampled levels are not well-defined: the training data set includes the data from the "shaly-sand" region as mentioned in the beginning of Sec. 6.2. Therefore, we conducted the tree regression analysis on the LRB coordinates (and on the standard Euclidean coordinates also). The results are summarized in Table 6.3. For each lithologic attribute, the smallest regression error on the test data set in the table overcomes the best of all the results using the nonrandom training data set of the previous section. On the other hand, the resubstitution errors in this table are consistently larger than those
Local Feature Extraction
and Its Applications
371
Table 6.3. The prediction errors on the lithologic attributes using the LRB methods applied to the randomly-sampled training data set. The smallest errors in the test data columns are displayed in bold font. Illite
Quartz
FRT on STD PRT on STD F R T on LRBF PRT on LRBF FRT on LRBP P R T on LRBP FRT on LRB2F P R T on LRB2F FRT on LRB2P P R T on LRB2P
Gas
Training
Test
Training
Test
Training
Test
0.08916 0.1623
0.220 3 0.188 8 0.235 5
0.2296 0.4146 0.224 7 0.3963 0.240 2 0.385 9 0.224 7 0.3963 0.2323 0.4073
0.532 8 0.458 9 0.549 3 0.4570 0.5164 0.4595 0.549 3 0.4570 0.535 7 0.4688
0.342 6 0.542 3 0.278 4 0.458 3 0.288 8 0.4131 0.329 5 0.471 5 0.312 7
0.8068 0.732 9 0.7769 0.712 7 0.794 7 0.736 4 0.789 4 0.7372 0.728 7
0.4916
0.7218
0.086 52 0.1666 0.092 37 0.1666 0.09201
0.186 0 0.218 6 0.1860 0.242 5
0.178 5 0.09041 0.1686
0.195 0 0.2104 0.1909
using the nonrandom training data set. These two observations suggest that the nonrandom training data set, even though we thought we had selected the good representative regions, does not cover the actual response space well (e.g., the data from the "shaly-sand" region). In our future project, we will address more elaborate random sampling schemes such as the bootstrap 58 to increase the prediction accuracy. An interesting exercise would be to construct a regression rule using the whole data set from this well and then use that rule to predict the lithologic attributes of the neighboring wells using the wave forms recorded at those wells. We also note that the pruned tree regression on the LRBF coordinates consistently gives the best result for each lithologic attribute although there are some ties. For the quartz volume, the pruned tree regression on the LRBF coordi nates tie the one on the LRBP coordinates. Although the subspace patterns of these two bases are different, the selected basis functions by the RTs are exactly the same ones. Figures 6.24-6.26, show the best tree, the selected LRB functions and the predicted curves for the quartz volume, respectively. Although the functions located around the P and S wave components are se lected again, there are notable differences from those of the nonrandom train ing data set shown in Fig. 6.16: all the selected LRB functions using the randomly-sampled training data set have longer support than those using the
372
N. Saito
6.8220 x.73<-13.8488 x.73>-13.8488
/
0.4956 '5.1590 x.71<3.30831 x.71>3.30831
0.6399
\ (0.6399) 0.4311 '4.3090 \ 0.1074 x.182<0.737918 x.182>0.737918
A.
0.6921 '3.0810 0.8339 x.190<-0.315241 x.190>-0.315241
z
A_
0.6719
0.5839
0.6470
2.0490
Fig. 6.24. The pruned regression tree for the quartz volume using the randomly-sampled training data set. The tree was initially grown on the wave forms represented in the LRBF coordinates. The same tree was also obtained on the LRBP coordinates.
0.0005
0.001
0.0015
0.002
0.0025
o •F-
CM
o ■«
IO
1
H
j WE^ , PP JfLJ JLL jfTmiL I l l l 1I l l OB 1111TtL mmlifT™ ■ i0.0025 n m jJT 0.0005 0.001 0.0015 0.002 (b)
Fig. 6.25. (a) The LRB functions used in the pruned regression tree for the quartz volume shown in Fig. 6.24. These are displayed in the depth-first search manner in the tree, (b) The selected subspaces as the LRB.
Local Feature Extraction and Its Applications
0.0
0.2
0.4
0.6 (a)
0.8
1.0
0.0
0.2
0.4
0.6
0.8
373
1.0
(b)
Fig. 6.26. The prediction of the quartz volume by the pruned regression tree shown in Fig. 6.24. (a) The training data set. (b) The whole data set.
nonrandomly-sampled training data set. This implies that the LRB using the randomly-sampled training data set is more tolerable in the variability in the velocities of the P and S waves. This may be one of the advantages of using the randomly-sampled training data set. For the illite volume, we have again a tie as shown in Table 6.3. In both cases, our algorithm selected the discrete sine basis as the best LRB and the smallest regression error was obtained by the pruned RT on this coordinate system. The best tree, the selected basis functions, and the predicted curves for the illite volume, are shown in Figs. 6.27-6.29, respectively. It is interesting to observe that the highest illite volume (0.285) is assigned for the wave forms whose projections onto each discrete sine basis functions shown in Fig. 6.28 is less than a certain positive threshold. In other words, the shale wave forms have either very small energy in these frequencies or negative correlation with these sine basis. The pruned RT on the LRB2P coordinates also gives the same result. This implies that in fact the shale wave forms have rather small energy in these frequencies.
374
N. Saito
'9.0010 \ x.43<83.34* X43>«3 34»
/4asao\ x»»<3 72572 x.St>3.72S72
\
'314W X.S&XZOMM
x»8>2 09686
/2.0320 \ XW<2 3S02 / »»4>2 3S02
15370
01565
Fig. 6.27. The pruned regression tree for the illite volume using the randomly-sampled training data set. The tree was initially grown on the wave forms represented in the discrete sine basis.
o CM
n ■*■
IO to
T I II I I I I I I II I I I I I I I I I I I I0.0005 I I I I I I I0.001 0.0025 0.0015
0.002
(b)
Fig. 6.28. The basis functions used in the pruned regression tree for the illite volume shown in Fig. 6.27 turn out to be the discrete sine basis. These are displayed in the depth-first search manner in the tree.
Local Feature Extraction and Its Applications
(a)
375
(b)
Fig. 6.29. The prediction of the illite volume by the pruned regression tree shown in Fig. 6.27. (a) The training data set. (b) The whole data set.
Finally for the gas volume, the smallest regression error was obtained by the pruned tree regression on the LRBF coordinates. Figures 6.30-6.32 show the best tree, the selected LRB functions, and the predicted curves for the gas volume, respectively. From these figures, we observe that the time axis was first split into two. In the earlier half (which includes the P wave components), the LRB method decided to do the "frequency analysis". The later half of the time axis (which includes the S wave components) was further segmented into finer time windows. This segmentation is completely opposite to the one in the nonrandom training data; see Fig. 6.22. Interpreting this tree is rather difficult. Figure 6.32 shows that many depth levels are assigned the constant lowest value (0.006638). From the tree in Fig. 6.30, we can trace back the LRB functions giving this value; the functions (134, 165, 131, 145, 142) are responsible for this low value, and in particular, (145, 142). The functions (145, 142) work as detectors; if the projected values onto these basis functions are smaller than certain thresholds, the algorithm assigns the low gas volume; otherwise assign rather high gas volume.
376
N. Saito
0.006638
0.078000
0.064640
0.003480
Fig. 6.30. The pruned regression tree for the gas volume using the randomly-sampled train ing data set. The tree was initially grown on the wave forms represented in the LRBF coordinates.
Local Feature Extraction and Its Applications
377
-*V*^A/-
134 35 5 102 ■O 165 131
^Xy^
m 145 _l 142 111 »W*tf<»«>» W « f r « » W % W W » * » « » ¥ » * » * * ■ »
0.0005
0.001
0.0015
0.002
0.0025
0.0015
0.002
0.0025
(a)
0.0005
0.001
0» Fig. 6.31. (a) The LRB functions used in the pruned regression tree for the gas volume shown in Fig. 6.30. These are displayed in the depth-first search manner in the tree, (b) The selected subspaces as the LRB.
0.0
0.05
0.10 (a)
0.15
0.20
0.0
0.05
0.10 (b)
0.15
0.20
Fig. 6.32. The prediction of the gas volume by the pruned regression tree shown in Fig. 6.30. (a) The training data set. (b) The whole data set.
378
N. Saito
6.4.2. Using the physically-derived quantity In our study so far, we have not really used any physics-based insight to se lect the features except that we have chosen the good training data set in Sec. 6.3 and adopted the LST as the time-frequency decomposition method. Everything else have been automatic. Comparing the performances of these automatic procedures with those of the regression analysis using the quanti ties directly derived from the physics has its own interest. The physics of wave propagation suggests that the key parameter for determining the lithology is the ratio of P and S wave velocities Vp/Vt. Although this quantity is still affected by the other factors such as the crack, pore, and geometry, this is related to Poisson's ratio, a characteristic property of elastic solids. 109,143 In our data set, the velocities of the P and S wave components at each depth level were computed using the modified version of the Radon transform. 81 This technique was applied to the wave forms recorded at eight receivers8 above the specific depth level, which were generated by the same acoustic pulse from transmitter below. Based on this available velocity information, we compute the ratios of the P wave speed to the S wave speed at each depth and use these quantities as input signals (now the input signal space is one-dimensional). The Table 6.4. The table of regression errors using the Vp/V, values of the nonrandomly-sampled training data set and the randomly-sampled training data sets (which are denoted by NTR and RTR, respectively). The smallest errors in the test data columns are displayed in bold font. Quartz
Illite
Gas
Training
Test
Training
Test
Training
Test
FRT on N T R P R T on NTR FRT on RTR
0.1065 0.125 2 0.1540
0.2435 0.2218 0.2086
0.2393 0.2786 0.3833
0.5854 0.565 2 0.5124
0.4427 0.5346 0.5267
0.7804 0.7338 0.7881
P R T on RTR
0.1880
0.1821
0.4667
0.4522
0.6425
0.7147
regression errors by the full and pruned trees on the nonrandomly-sampled training data set and the randomly-sampled training data set are summarized in Table 6.4. For each lithologic attribute, the best result in this table is con sistently obtained by the pruned tree regression using the Vp/Vs values of the g The tool actually used has eight receivers; in the previous sections we have used only the wave forms recorded at the receiver nearest to the transmitter.
Local Feature Extraction and Its Applications
379
randomly-sampled training data set. We also note that the best estimates for the quartz and illite rates in this table are the "best" ones in all the ex periments we have done so far. They are even better than the corresponding resubstitution estimates (the results on the training data set). The following figures show the best trees and the best prediction curves of the quartz, the illite, and the gas volumes using the Vp/Va values.
0.6591 ^574"
0.5418 "2376"^
Fig. 6.33. The pruned regression tree for the quartz volume using the Vp/V, values of the randomly-sampled training data set. This tree has only two terminal nodes: the prediction values have only two possibilities.
0.0
0.2
0.4
0.6 (a)
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
(b)
Fig. 6.34. The prediction of the quartz volume by the pruned regression tree shown in Fig. 6.33. (a) The training data set. (b) The whole data set.
380
N. Saito
0.2591 ~F558 Fig. 6.35. The pruned regression tree for the illite volume using the Vp/V, values of the randomly-sampled training data set. This tree's structure is exactly the same as the one shown in Fig. 6.33.
8
0.0
0.1
0.2
0.3 (a)
0.4
0.5
0.0
0.1
0.2
0.3
0.4
0.5
(b)
Fig. 6.36. The prediction of the illite volume by the pruned regression tree shown in Fig. 6.35. (a) The training data set. (b) The whole data set.
Local Feature Extraction and Its Applications
0.027600
381
0.005088
0.11180 0.05547 Fig. 6.37. The pruned regression tree for the gas volume using the Vp/V, values of the randomly-sampled training data set.
0.0
0.05
0.10 (a)
0.15
0.20
0.0
0.05
0.10
0.15
0.20
(b)
Fig. 6.38. The prediction of the gas volume by the pruned regression tree shown in Fig. 6.37. (a) The training data set. (b) The whole data set.
382
N. Saito
From Figs. 6.33 and 6.35, we observe that the structures and split rules of the pruned RTs for the quartz and the illite volumes are exactly the same. This may again be explained by the "duality" of the quartz and the illite rates in this region. The prediction of the gas volume is slightly worse than the best LRB method. 6.4.3. On the measure of regression errors Is the I7, norm an appropriate measure of regression error? This measure has been used in all the splitting rules in the RTs as well as in the tables of the prediction errors in this section. Since the function we want to approxi mate/predict is not smooth at all (they are discretized versions of the functions in the space of functions of bounded variation), the error measure based on the £x norm would be more appropriate. It is true that the quartz and illite volume predictions derived from the Vp/Vs values shown in Figs. 6.34 and 6.36 give the least £2 error among all the experiments we have done; however, they have only two possible values both of which are close to the total mean values. At least, visually, these are not satisfactory compared to the predictions by the LRB methods shown in, e.g., Figs. 6.26 and 6.29. This problem is often encountered in the image processing field such as image compression and recon struction: the small £2 error does not necessary imply the satisfactory image estimate; see Ref. 49 for the details on the error criterion for image compres sion. Unfortunately, the software we rely on, S-PLUS,140 does not allow us to change the measure of regression error; the £2 norm-based error is hard-wired in the current version of S-PLUS. Breiman et al. described the RT based on the i1 norm (the so-called "Least Absolute Deviation" [LAD] regression) in Sec. 8.11 of Ref. 18. We hope that the RT based on the LAD regression error will be available soon in the S-PLUS. Besides what we mentioned earlier, our future projects also include: (a) ex amine the importance of the frequency contents/shape information, the ampli tude information, and the velocity information, independently, on predicting the lithologic information, (b) incorporate the user interactions to pick the ba sis functions flexibly for the regression analysis, and (c) examine the capability of other nonlinear regression methods such as neural networks.
6.5.
Summary
In Sec. 6, we have applied the LDB and LRB methods developed in Sees. 4 and 6 to a geophysical problem of predicting the lithologic information from
Local Feature Extraction and Its Applications
383
the acoustic well-logging wave forms. Using these methods, we could success fully extract the useful features for predicting this information. The results, in general, agree with the explanations from the physics of wave propagation, although our use of the physics in constructing the regression rules is mini mal. The best results using our methods are found to be comparable to the predictions using the physically-derived quantities such as the Vp/Vs values. 7. Multiresolution Representations Using the Autocorrelation Functions of Wavelets and their Applications 7.1.
Introduction
So far we have concentrated on the library of orthonormal bases and their applications. The orthogonality plays an important role over there mainly for the computational efficiency and the simplicity in implementation of the nu merical algorithms. In this section, we consider a library of "non-orthogonal" bases, namely, the autocorrelation functions of wavelets. We trade the or thogonality with the convenient properties for explicitly characterizing edges or singularities of signals. It is certainly possible to estimate the local be havior of signals by analyzing the growth or decay from scale to scale of the coefficients of the orthonormal wavelet expansions; however, the coefficients of the orthonormal wavelet expansions are not shift-invariant. Thus, redundant representations (without subsampling at each scale, e.g., Refs. 19, 123, 132 and 139, or the continuous wavelet transforms 85 ) are being used in order to simplify the analysis of coefficients from scale to scale. In particular, the or thonormal wavelet expansion of a vector of length n without subsampling is not only shift-invariant but also contains all the wavelet coefficients to represent N circularly-shifted versions of the original signal. 12 ' 132,139 The asymmetric shape of the orthonormal compactly supported wavelets presents another difficulty for the analysis of signals. The symmetric basis func tions are preferred since, for example, their use simplifies finding zero-crossings (or extrema) corresponding to the locations of edges in images at later stages of processing. There are several approaches for dealing with this problem. The first approach consists of constructing approximately symmetric orthonormal wavelets and gives rise to approximate quadrature mirror filters.94 The second consists of using biorthogonal bases, 27,147 so that the basis functions may be chosen to be exactly symmetric.
384
N. Saito
Alternatively, a redundant (shift-invariant) representation using dilations and translations of the autocorrelation functions of wavelets may be used for signal analysis instead of the wavelets per se. The exact filters for the decomposition process are the autocorrelations of the quadrature mirror fil ter coefficients of the compactly supported wavelets and, therefore, are exactly symmetric. The recursive definition of the autocorrelation functions of wavelets leads to fast iterative algorithms to generate a shift-invariant multiresolution representation which we call the autocorrelation shell representation. A re markable feature of this representation is a natural interpolation algorithm associated with it. This interpolation algorithm, the so-called symmetric it erative interpolation, is due to Dubuc 55 and Deslauriers and Dubuc. 47 The coefficients of the interpolation scheme of Refs. 47 and 55 generated from the Lagrange polynomials are the autocorrelation coefficients of the quadrature mirror filters associated with the compactly supported wavelets of Ref. 42. This connection was also noticed by Shensa in Ref. 139. This interpolation scheme of Dubuc is also related to the "algorithme a trous" in Refs. 56 and 72. Another interesting feature of this representation is its convertibility to the re dundant expansion (without subsampling) by the corresponding orthonormal wavelets on each scale, independently of other scales. As an application of the proposed representation, we consider the recon struction of a signal from its multiscale edge representation. Here, the multiscale edge representation means the pairs of: (a) locations of zero-crossings which indicate positions of edges; and (b) slopes at these zero-crossings, in the multiple scale representation of the signal, in particular, in the autocorrelation shell representation. This problem has intrigued many researchers in human and machine vision community, mainly due to David Marr's conjecture 101 : "The edge information of a signal in multiple scales is sufficient to recover the original signal itself." The autocorrelation functions of wavelets give two ad vantages toward this problem: (a) the symmetric iterative interpolation scheme allows us to detect zero-crossings and to compute the slopes at these points in a simple and efficient manner; and (b) the reconstruction of the original signal from its zero-crossing information is posed as solving a system of lin ear algebraic equations so that the relationship between the zero-crossings representation and the original signal becomes quite explicit. Such a repre sentation should be useful for nonlinear manipulation of signals, for example, edge-preserving smoothing and interpolation; see Mallat and Zhong, 98 Mallat and Hwang. 96 For other nonlinear edge-preserving smoothing algorithms, see e.g. Refs. I l l and 115.
Local Feature Extraction and Its Applications
385
Our results can also be viewed as a way to obtain the continuous-like multiresolution analysis starting from the discrete multiresolution analysis. Another approach to make the connection between continuous and discrete multiresolution analyses is developed by Duval-Destin et al.,57 where the start ing point is the continuous version of the multiresolution analysis. The in terested reader in this connection is referred to the recent work of Beylkin and Torresani 14 ; see also Donoho's "interpolating" wavelet transforms 50 which compute the (non-orthogonal) wavelet coefficients of continuous functions not by integration but by sampling based on the Deslauriers-Dubuc interpolation scheme. This section is organized as follows. In Sec. 7.2 we introduce the notion of the orthonormal shell expansion of signals which generates a shift-invariant representation using the orthonormal wavelets. In Sec. 7.3 we consider expan sion of signals into the autocorrelation shell. A new interpretation of Dubuc's iterative interpolation scheme is discussed in Sec. 7.4. In Sec. 7.5 we formulate our approach to the reconstruction of a signal from its multiscale zero-crossings representation and give examples. 7.2. Orthonormal shell: A shift-invariant using orthonormal wavelets
representation
In certain pattern recognition applications such as pattern matching, shift invariance of a signal representation is of critical importance. Although coef ficients of orthonormal wavelet expansions are not shift invariant, the wavelet coefficients of all N circulant shifts of a vector of size N = 2 n may be com puted in 0(N log N) operations. 12 In this section, we propose another way to compute such set of coefficients. Once all wavelet coefficients of N circulant shifts of the vector are computed, we may use them for a variety of applications where the shift invariance is essential. But, first, let us briefly review the properties of the compactly supported wavelets (for details we refer to Refs. 42 and 43). The orthonormal basis of compactly supported wavelets of L2(R) is formed by the dilation and transla tion of a single function rp(x), TPj>k{x) = 2-j'2ilj(2-ix-k),
(7.1)
where j , k € Z. The function ip(x) has a companion, the scaling function f(x). The wavelet basis induces a multiresolution analysis on L 2 (R), 9 3 ' 1 0 4 i.e. the decomposition of the Hilbert space L2(R) into a chain of closed subspaces
386
N. Saito
■ ■ ■ C V2 C Vi C V0 C V_i C V_2 C • • •
(7.2)
such that
U v i = L 2 (R).
f]Vj = {0}, j€Z
(7-3)
i€Z
By defining Wj as an orthogonal complement of Vj in V,_i, Vj-1=Vj®Wj,
(7.4)
the space L2(R) is represented as a direct sum L 2 (R) = 0 W , .
(7.5)
J€Z
On each fixed scale j , the wavelets {if>j,k(x)}k€Z form an orthonormal basis of Wj and the functions {(f>j,k(x) = 2~^24>(2~ix —fc)}jbezform an orthonormal basis of Vj. In addition, the function ip has M vanishing moments r+oo
/f —OO :
ip(x)xmdx
= 0,
m =0 , . . . , M - l .
(7.6)
Due to the fact that Vj, Wj C V,_i, the functions
•
(7-7)
fc=0 L-l
^(x) = -^5]fl^(2x-*), k=0
(7.8)
where 9k = (-l)khL-k-i,
k = 0,...,L-l.
(7.9)
The number of coefficients L in (7.7) and (7.8) is related to the number of vanishing moments M and for the wavelets in Ref. 42 L = 2M. If additional conditions (such as less asymmetry and regularity) are imposed, then the re lation might be different43; e.g., the coiflet, which is a less asymmetric version of the Daubechies wavelet, has L = 3Af. But L is always even. The coefficients H - {hk}o
K ( 0 l 2 + K ( 0 l a = i, where the 27r-periodic functions mo and mi are defined as
(7-io)
Local Feature Extraction and Its Applications 387 L-\
™o(0 = -J=£>^ ifc4 > 1 mi
(711)
i-1
» * e ^ = e <(f+,r) mo(£ + *) •
(^) = 7 | £
(7-12)
fc=0
7.2.1. The orthonormd shell In practical applications there is always the finest scale of interest and, there fore, it is sufficient to consider only shifts by multiples of some fixed unit. Throughout Sec. 7 we will assume that the number of scales is finite and that there exist a finest and a coarsest scale of interest. Without loss of generality, we will assume that the finest scale is described by the JV(= 2 n ) dimensional subspace Vo C L 2 (R) and consider only circulant shifts on Vo- In this case, the multiresolution decomposition of the space Vo may be written as Vo=(0WiJ©VJ,
(7.13)
where 1 < J < n and the subspace Vj describes the coarsest scale. Since the functions {ipj,k(x)}i<j<Jfi
and
{¥>J,fc(z)}o
(7-14)
form an orthonormal basis of Vo, for any vector / G Vo represented by n-l
/(x) = £ a t o , , f c ( z ) , fc=o we have the following relation:
(7.15)
II/II2 = £
(7.i6)
j=l
2
£ W + fc=0
2
£ W fc=0
The coefficients sjj and d^ in (7.16) are defined as s{ = j f{x)
(7.17)
d£= Jf(x)^k(x)dx,
(7.18)
for j = 1,2,..., J and k = 0 , 1 , . . . , 2 n _ J — 1, and the norm is defined as
388
N. Saito
(7.19) We refer to the set of coefficients {dj}i<j<j, o
and
and
{s£}o
{
(7.20)
where jjtk(x)
= 2-j'2tP(2-i(x
£/,*(*) = 2-Jf\{2~J(x
- *)),
(7.21)
- k)).
(7.22)
We call this family a shell of the orthonormal wavelets for shifts in Vo. Hence forth, we call this family an orthonormal shell for short. Let us define the following norm:
ll/lll = £ 2-J EW) 2 + 2_J X > * ) 2 . j=\
fc=0
(7-23)
fc=0
where the coefficients s^ and d£ are denned as
*L = J f{*)
(7-24)
4 = Jf(*WiA*)**-
(7-25)
We refer to the set of coefficients {^}i<j<j, o
(7-26)
7.2.2. A fast algorithm for expanding into the orthonormal shell Let us assume that the orthonormal wavelet coefficients of the finest scale {s°}o
Local Feature Extraction and Its Applications
389
/ = 5Zfc=o sfcVo,* e Vo. To obtain the orthonormal shell coefficients of this function / , we use the quadrature mirror filters H = {/ij}o
«*' = ZX2i-.i.
( 7 - 27 )
1=0 L-l
d
2' = £* 5 &i-w.
(7-28)
1=0
for j = 1 , . . . , J, fc = 0 , . . . , n — 1. Clearly, computations via (7.27) and (7.28) require 2NJ < 2N log2 N operations. The diagram for computing these co efficients via (7.27) and (7.28) is illustrated in Fig. 7.1. We note that the computational diagram of this algorithm is essentially identical to the Hier archical Discrete Correlation scheme (HDC) of P. Burt, 19 which was designed for efficient correlation of images at multiple scales. We also note that the HDC scheme was proposed prior to the Laplacian pyramid scheme 21 which, in turn, stimulated the development of the multiresolution analysis 93 and of the orthonormal bases of the compactly supported wavelets.42
Tf/7 hji.h
h,
Pig. 7.1. A scheme illustrating the algorithm for expanding into the orthonormal shell. Using the quadrature mirror filter H = {ho,hi,h.2,h$}, all points are computed for the orthonormal shell, whereas only points marked by • are computed for the orthonormal wavelet expansion.
390
N. Saito Using (7.11) and (7.12), we rewrite (7.27) and (7.28) in the Fourier domain,
P(0 = yftmap-W'-Ht),
(7-29)
(7.30)
Let us show that we have computed the orthonormal wavelet coefficients of all circulant shifts of the function / . Since the algorithmic structures for computing the coefficients {SJ£} and {d^} are exactly the same, we consider only {dfc}- At the first scale, we have L-l /=o We rewrite (7.31) as L-l 1=0 t-i
^fc+i = 5 3 9ls2k+i+l.
(7-33)
1=0
for k = 0 , . . . , N/2 — 1. The right-hand side of (7.32) coincides with the compu tation of the orthonormal wavelet coefficients of the shifted signal / ( x + 1). It is clear that the sequence {d£ fe } contains all the orthonormal wavelet coefficients that appear if f(x) is circularly shifted by 2 , 4 , . . . , and the sequence {d^fc+i) contains all the orthonormal wavelet coefficients for odd shifts ( 1 , 3 , . . . ) . Similarly, at the j'th scale, we may rewrite d3k as L-i d
(7-34)
vk = J2 91*17-1 (2k+i). 1=0 L-l
d
Vk+l
~ 2-,S' S 2i-»(2Jfc+0+l ' 1=0
(7.35)
L-l d
2ik+2i-l
=
Z^9ls2i-i(2k+l)+2i-l Z=0
'
(7.36)
Local Feature Extraction and Its Applications 391
for k = 0 , . . . , 2 n K Now the sequences {d£ik}, {d3Vk+1},..., {<8»k+2i-i} contain the orthonormal wavelet coefficients of the j t h scale of the signal shifted by 0 , 1 , . . . , 2 J — 1, respectively. Therefore, the set {<^}i<j<j, o
»
so d1
i
1
_ l
—I
d2 d3
I
I
1
|
I f
—*■
d4
—A
d5
^ w
1—
A - ^ ~
s5
0
100
200
300
400
500
Fig. 7.2. The expansion of two unit impulses into the orthonormal shell using the Daubechies wavelet with two vanishing moments and L = 4.
Figure 7.2 illustrates the shift invariance of the representation. In Fig. 7.2 we use the quadrature mirror filters with two vanishing moments and of length L = 4. We also set depth of expansions J = 5. The top row shows the original impulses. Dotted lines highlight the expansion of the shifted impulse. Clearly, the shift in the original signal is preserved at each scale. Note that in Fig. 7.2 we see the mirror images of wavelets with details appropriate at the corresponding scales. It is clear that due to "rough" shapes of wavelets there might be "too many" zero-crossings. Also, the positions of peaks are shifted across the scales due to the asymmetry of the Daubechies wavelets. The following proposition shows the relationship between the original signal {s\} and the coefficients of the orthonormal shell expansion of this signal.
392
N. Saito
P r o p o s i t i o n 7 . 1 . For any function f € Vo, f(x) = Y?k=o a\f(x — ^)> ^ie coefficients {s^} and {d£} defined in (7.24) and (7.25) satisfy the following identities N-l
N-l
£«, fc =X>° fc #, fc ,
(7-37)
fc=0 fc=0 N-l
N-l
E«, fc = Es°^>
fc=0 fc=0
( 7 - 38 )
where
S*(fl = 5°(0 2" a ]Jmo(2'-iO,
(7-39)
i=i j'-i
*'(0 = *°(0 2J/2 m,(2i-i0 I I ^(tf- 1 *),
(7.40)
i=i
for fc = 0 , . . . , N - 1. By multiplying (7.39) by tp(£), we have * > ( 0 W = s0(02J'/2nmo(2'-^)v(0 = -°(0 ^ # ( 2 * 0 ,
(7.41)
1=1
where we have used the identity oo
Z=l
The inverse Fourier transform of (7.41) yields (7.37). The relation (7.38) may be derived similarly. □ Figure 7.2 illustrates the proposition. By applying the proposition to the sequence {si = <$*<,,*:}, we have an expansion {2~^2ip{2~^(ko — x))}i<j<J and 2~J/2tp(2~J(ko — x)). Therefore, we see the mirror images of the basis functions themselves.
Local Feature Extraction and Its Applications
393
7.2.3. A fast reconstruction algorithm Given the coefficients {dfc}i<><j, o
.
(7.43)
This expression is equivalent to L-l l
*l~ = o £ ( * ' * i - * - > i +9idL2i-H),
(7.44)
for j = 1 , . . . , J, k — 0 , . . . , N — 1. As shown in Sec. 7.2.2, the orthonormal shell coefficients on the j'th scale are twice as redundant as on the (j — l)st scale, and the factor 1/2 in (7.44) accounts for this. 7.3. Autocorrelation multiresolution
shell: A symmetric representation
shift-invariant
The representation of a signal in the orthonormal shell is shift-invariant. This representation, however, has several drawbacks for certain applications, such as detecting and characterizing edges (or singularities) in the signal, where the scale-to-scale analysis of the coefficients is necessary. 66 ' 95 ' 98 This representation lacks symmetry because of the asymmetric shapes of compactly supported wavelets. It results in a rather complicated representation even in the case of the unit impulse sequence, as may be seen in Fig. 7.2. Also, because of the "rough" shape of compactly supported wavelets, there might be "too many" zero-crossings in this representation. Here we introduce the notion of an autocorrelation shell of compactly sup ported wavelets, i.e. a shell formed by dilations and translations of the au tocorrelation functions of compactly supported wavelets. The decomposition filters associated with these autocorrelation functions naturally have symmetric shapes which simplifies the scale-to-scale analysis of the coefficients. One of the interesting features of this representation is its convertibility to the orthonor mal shell of the corresponding compactly supported wavelets on each scale independently of other scales. The algorithm for such conversion is discussed in detail. We also investigate the subsampled version of the autocorrelation
394 N. Saito shell. It turns out to be possible to reconstruct the original signal, if we store a single additional number at each scale: the Nyquist frequency component of the signal on that scale. 7.3.1. Properties of the autocorrelation functions of compactly supported wavelets Let us first summarize the properties of the autocorrelation functions of a com pactly supported scaling function tp(x) and the corresponding wavelet \l>{x). By definition of the autocorrelation function, we have r+oo *(x) = /
+oo
WvMv -x)dy.
/
(7.46)
•oo Given the fact that {
(7.48)
where Jofc denotes the Kronecker delta. The Fourier transforms of the autocorrelation functions in (7.45) and (7.46) are as follows:
*(0 = WOI a . *
(7.49) (7.50)
By taking the Fourier transforms of (7.7) and (7.8), we have &O=motf/2)0tf/2),
(7.51)
&O="»iK/2)0K/2).
(7.52)
Using (7.51) and (7.52), we obtain
4«) = KK/2)|2*K/2),
(7.53)
* ( 0 = |mi(e/2)| 2 $(^/2).
(7.54)
Equation (7.10) also implies that
Local Feature Extraction and Its Applications
*(0 + *(0 = *tt/2).
395
(7-55)
Since |mo(£)| 2 is an even function, we have 1L/2
1
fc=i
where {a*} are the autocorrelation coefficients of the filter H, L-l-k
ak = 2 ^ h,fc,+fc i=o
forfc = l , . . . , £ - l
(7.57)
and a2fc = 0
for Jfe = l , . . . , L / 2 - l .
(7.58)
The coefficients {a2fc-i}i
*(z) = *(2x) + - y " o 2 j - i ( $ ( 2 x - 21 + 1) + $(2x + 21- 1)), 2
(7.59)
1=1 L/2
tf (as) = *(2x) - i V a 2 j - i ( * ( 2 x - 2Z + 1) + $(2x + 21 - 1)). 2
(7.60)
<-i
By direct examination of (7.59) and (7.60), we obtain that both $ and ^ are supported within the interval [—L + 1, L — 1]. Finally, $(x) and 'J'(x) have vanishing moments, 12 namely f -t-oo + OO
/ M
-
/
-oo -oo +oo _r+oo
x m * ( x ) d x = 0,
forO<m
(7.61)
x m $ ( x ) dx = 0,
for 1 < m < L - 1,
(7.62)
-oo ■oo
and
+oo r+oo
/
■oo
$(x)dx = l .
(7.63)
396
N. Saito
Since L consecutive moments of the autocorrelation function $(x) (7.61), we have H) = 0(£L).
vanish (7.64)
Therefore, '$'(£) may be viewed as the symbol of a pseudo-differential operator which behaves like an approximation of the derivative operator {d/dx)L. We note that due to its definition, this operator may be calculated recursively. The convolution with the function ty{x) has two properties useful in edge detection (see Ref. 101): it behaves essentially like a differential operator in detecting spatial intensity changes and it is designed to act at any desired scale. Finally, we display functions $(z), ¥>(x)> ^(x)> ^{x)> and t n e magnitudes of their Fourier transforms in Figs. 7.3 and 7.4. In these figures, we have used the Daubechies wavelet with two vanishing moments and L = 4. It is easy to see that the autocorrelation functions $(z) and ty{x) are smoother than the functions ip{x) and ip(x), furthermore $(f) and '!'(£) decay faster than <£(£) and V>(£) respectively. We also note that both $(z) and \£(x) are even.
-0.5
-0.5
Fig. 7.3. Plots of the autocorrelation function $(x) and the Daubechies scaling function
Local Feature Extraction and Its Applications
397
Fig. 7.4. Plots of the autocorrelation function * ( x ) and the Daubechies wavelet VC1) w i ' h two vanishing moments and L = 4. (a) * ( i ) , (b) ^(x), (c) magnitude of the Fourier transform of $ ( x ) , (d) magnitude of the Fourier transform of ^>(x).
Remark 7.2. It follows from (7.55) or (7.59) and (7.60) that tf (x) = 2$(2x) - $ ( x ) .
(7.65)
This may be compared with the approximation of the Laplacian of a Gaussian function (the so-called Mexican-hat function) by the difference of two Gaussian functions (the so-called DOG function) as cP —2 G(x; a) « G(x; aa) - G(x; a) dx = aG(ax; a) — G(x; a),
(7.66) (7.67)
where ,-*2/2*2
G{x;a) = V/27TCT
(7.68)
398
N. Saito
and a = 1.6 as Marr suggested in Ref. 101. Interestingly enough, Marr and Hildreth compared the DOG function with the difference of two boxcar func tions and the difference of two sine functions (i.e. the ideal bandpass filter) in their edge detection performances. 102 They claimed the superiority of the DOG function over the other two by saying that the latter two functions are too localized either in space domain or in spatial frequency domain. 7.3.2. The autocorrelation shell of compactly supported wavelets By analogy with the previous section, let us now consider the following family of functions, {®jAx)}i<j<J,o
and
{<£>j,k(x)}o
(7.69)
where * *fc(x)= 2 - ^ * ( 2 - > ( x - k)),
(7.70)
*./,*(*) = 2 - / / 2 $ ( 2 - 7 ( x - k)).
(7.71)
We call this family a shell of autocorrelations of compactly supported wavelets for shifts in Vo. Henceforth, we call this family an autocorrelation shell for short. The relation to the orthonormal shell. Let us now consider the relation between the autocorrelation shell and the orthonormal shell. Let / e Vo so that T»-l
/(x) = £^(x-fc),
(7-72)
fc=0
and consider the function Af, Af(x) = J2s0k^(x-k).
(7.73)
fc=0
These two functions are related via Af(x) = Jf(yMy-x)dy
(7.74)
= s°k,
(7.75)
and Af(k)
foik = 0,l,...,N-l.
Local Feature Extraction and Its Applications
Similarly at the scale j , let us define functions f3(x) orthonormal shell coefficients s^ and d3.,
and fd{x)
399
using the
N-l
//(*)=£«*¥>(*-*),
(7-76)
k=0 N-l
(7.77)
ti(x)='£di
By correlating the functions f3(x) and f3d{x) with the functions 2~3
Alf(x)
=J
3
3
3
f (y)2-
x))dy=J2Si*(x-k)>
( 7 - 78 )
fc=0
Af{x)
=j
fi(y)2-3H2-j(y
N-l
x))dy = £
Di*(x
- k),
(7.79)
k=o
where we define the coefficients {S3k} and {D3.} to be the autocorrelation shell coefficients (averages and differences). Prom (7.47), the coefficients {S£} and {D£} are the values of A3,f(x) and Adf(x) at integer points: Si = Aif(k)
= j fi(y)2-3
D3 = Ajdf{k) = j fi(y)2-^(2-i(y
- k)) dy,
(7.80)
- *)) dy.
(7.81)
To summarize, the coefficients {S£} and {D£} may be obtained as follows: Step 1. Expand the function / G Vo in the orthonormal shell and obtain the coefficients {s-J.} and {dj.} (see Sec. 7.2). Step 2. Expand the unit impulse {6ok} in the orthonormal shell and obtain the coefficients {v£} and {ui3.}, which are the values of 2~3^2ip<{2~3x) and 2~3/2ii><(2~3x) at integer points x. Step 3. For each scale j , correlate the coefficients {s3.} and {d3.} with the coefficients of the unit impulse {v£} and {w£} respectively, and multiply the results by the factor 2~3/2.
400
N. Saito
Remark 7.3. In applications of autocorrelation shell representations, one may assume that the original continuous signal is the function Af(x) rather than f(x). In this case {s£} are the values of Af(x) at integer points. A fast algorithm for expanding in an autocorrelation shell. Let us now derive a fast algorithm for expanding in an autocorrelation shell, i.e. the pyramid algorithm for computing {S£} and {D^} from {SjjJ -1 }. First, let us define the coefficients {pk} and {qk} by rewriting Eqs. (7.59) and (7.60) as
£•(./*> £•(./»
L-l k=-L+l
-*),
(7.82)
-*),
(7.83)
L-l k=-L+l
where
Pk = <
2-1/2
for k = 0,
2- 3 / 2 a |fc|
for Jfc = ± l , ± 3 , . . . , ± ( L - -1)
0
for k = ±2, ±4,. . . , ± ( L - -2)
2-1/2 qk = < I, -Pk — Pk
for k = 0, .
u
(7.84)
(7.85)
otherwise. otherwise.
We view these coefficients as filters P = {pk}-L+i
and Q = (7.84) a n d By taking (7.53) a n d
L-l
v^|mo(0|2=
£
Pkeik*,
(7.86)
9*e ik «.
(7.87)
k=-L+l L-l 2
V2|m!(0| =
£ fc=-L+l
Thus, using $ and \t for the decomposition is equivalent to viewing V ^ | T I O ( O I 2 and v^|"ii(^)| 2 as decomposition filters instead of mo(£) and mi(£) in the
Local Feature Extraction and Its Applications 401
orthonormal case. We note that the pair of filters V2\mo(£) | 2 and N/^ITOIC^)!2 is not aquadratureimrrorpairsince2|mo(£)| 4 +2|m 1 (£)| 4 ^ 1. Using the filters P and Q, we obtain the pyramid algorithm for expanding into the autocorrelation shell,
Z=-L+l
*>i= £ *s&>-u-
( 7 - 89 )
l=-L+l
As an example of the coefficients {pt}, for Daubechies' wavelets with two vanishing moments, L = 4 and the coefficients are 2 - 1 / 2 ( - l / 1 6 , 0 , 9 / 1 6 , 1,9/16,0,-1/16). The direct conversion from an autocorrelation shell to an orthonormal shell. We now consider an algorithm for the direct conversion of {S£} and {D£} to {s^} and {d?k}. Let us first derive a recursion relationship similar to (7.39) and (7.40) for the orthonormal shell expansion. From (7.86)-(7.89), we obtain &(0
= s°(02j/2 f l W ^ O I 2 ,
(7-90)
/=i
i-i j
D (0
= 5°(02
j/2
1
2
|m1(2^ 0| I I K ( 2 ' - J 0 | 2 •
(7-91)
On defining the functions mi(0
= f[mo(2l-10,
(7-92)
so that mJ1(0=m1(2>-10mr1(0>
(7-93)
we obtain from (7.39), (7.40), (7.90) and (7.91) Sj(0=<(0sj(0, j
t> (t) = m}(t)d'{Q.
(7-94) (7.95)
402
N. Saito
Thus, to compute the autocorrelation shell coefficients from the orthonormal shell coefficients, we have to evaluate the following two equations:
^'(0 = 7 - ^ ^ ( 0 ,
(7.96)
K(0l a In general, such computation leads to an unstable algorithm due to zeros of the denominators. In our case, however, this procedure is stable and the di vision by |mj(^)| 2 and |n*j(f)| 2 in (7.96) and (7.97) does not cause problems because the numerators are always zero when the denominators are zero (see Proposition 7.4 below). We observe that transforming back and forth between the orthonormal shell representation and the autocorrelation shell representation is done on each scale separately. Properties of the autocorrelation shell. Let us define the following norm for the autocorrelation shell expansion:
H/lli = E 2~j E ( ^ + 2 " J l i W ) 2 , i=i
fc=o
(7-98)
fc=o
where the coefficients D^ and S£ are denned in (7.81) and (7.80). The norm (7.98) may be compared with the norm for the orthonormal shell (7.23). As for the orthonormal shell, the factor 2 - J in (7.98) is used to offset the redundancy of this representation. We now obtain the following estimate: Proposition 7.4.
j^yll/ll^ll/lli^ll/ll 2 ,
(7-99)
where ||/|| is the L2 norm of the vector f in V0 defined in (7.19). This inequality guarantees that there exists a stable reconstruction algo rithm from the autocorrelation shell coefficients. If the number of dyadic scales J —► oo, then the lower bound in (7.99) approaches zero and the reconstruction algorithm becomes unstable. In feasible practical applications, however, the
Local Feature Extraction and Its Applications
403
number of dyadic scales does not exceed a small constant (e.g., J < 50), and the estimate (7.99) is sufficient for a stable reconstruction. It also shows that our construction is limited to finite-dimensional subspaces. Proof. From (7.26) we have
ll/ll2 = ll/lll = E 2-> f ) {4f + 2~J £ (^)2 • j=l
fc=0
(7-100)
*=0
Using Parseval's equahty, we have the following Fourier domain expressions: N-l
, N-l
(7.101)
fc=0 fc=0 N-l
d 2 E( *) fc=o
1
N-l
2 = "^E^'i *=o
(7.102)
Using the expressions (7.39), (7.40), (7.92) and (7.93), we rewrite (7.100) as
ll/lll = jf
(7.103) fc=0
where £* = 2wk/N. Similarly, for the norm of the autocorrelation shell expansion (7.98), we have N-l
N-l
2 i^E^'E^+^E^) fc=0 3=1
(7.104)
fc=0
With the same arguments, the Fourier domain expression of (7.104) becomes 1 ^ ~N
E EtfiViteoi4 + E i«2iai"^(&)i4 j=l
k=Q
(7.105)
fc=0
Since sup
K(0|2<1
and
0<4<2TT
sup K ( 0 | 3 < 1 ,
(7.106)
0<{<2ir
we immediately have from (7.105) (7.107)
404 N. Saito As for the rest of the inequality of (7.99), using the Schwarz inequality in (7.103) we have J
V2/JV_i
/N-l
£ £ ;s° \
ll/lll < jf
k
I
2
J
+ Ei 2i
2
y/N
J
\ !/2-
2
J
EI^I K (6)I
4
fe=0
\1/2
/N-l
E
xl/2
E i^i K(^)i
\1/2/N-l
Jfc=0
jmi
4
fc=0
j = l \ fc=0
/N-l
2
E i*2i V t o i
4
fc=0
/
j= l \
J JV-1
2
/N-l
S
2 J
J
\
4
/2
+ E I °*I K (&)i 4
V
fc=0
JV-1
a
/ 1/2
< v/JTI^ ( E E l^l K«fe)| + E tfl l»tf(601 Viv ,i=i fe=o fc=0 = v7TI il/ll ll/iu.
(7.108)
Since ||/|| = | | / | | 5 (see (7.26)), we obtain (7.99).
□
The following proposition (which is similar to Proposition 7.1) is essential in our approach to the reconstruction of signals from zero-crossings. Proposition 7.5. For any function f 6 V0, f(x) — J2N=o slf(x ~ *)» *^e coefficients {SJk} and {/?/?} defined in (7.80) and (7.81) satisfy the following identities N-l
5
N-l
E **°.* = E s ^.*>
fc=0 fc=0 fc=0 N-l 4-1
N-l
E^*o.* = fc=0 E*2*i.fc, fe=o
where $o,k(x) = $o,fc(x), *j,fc andQj^ tively.
(7.109)
(7.110)
are defined in (7.70) and (7.71) respec
Proof. By taking the Fourier transform of the left-hand side of (7.109) and using (7.90), we have
Local Feature Extraction and Its Applications
# ( 0 * ( 0 = s°mj/2
= S°(02j/2$(2^),
f l Imotf-'Ofm
405
(7.111)
1=1
where we have used the identity
*(o=iii m o( 2_i oi a
(7.112)
i=i
The inverse Fourier transform of (7.111) yields (7.109). The relation (7.110) may be derived similarly. □ Examples of the autocorrelation shell expansion. Let us illustrate the auto correlation shell expansion using several examples. In Fig. 7.5 we show the expansion of the unit impulse and its shifted version in the autocorrelation shell to illustrate the symmetry, smoothness, and shift invariance of this rep resentation (see Fig. 7.2 for comparison). In Fig. 7.6 we display an example of one-dimensional signal which is a natural radioactivity profile of certain sub surface formations used in Example 3.11 in Sec. 3. The autocorrelation shell coefficients of this signal are shown in Fig. 7.7. Figure 7.8 shows the corre sponding average coefficients. In this case, we have used the autocorrelation functions of the Daubechies wavelet with two vanishing moments and L = 4.
so D1 D2
.9
nAr D4
-^V-
D5 S5
100
200
300
400
500
Fig. 7.5. The expansion of two unit impulses in the autocorrelation shell using the autocor relation functions of the Daubechies wavelet with L = 2M = 4.
406 N. Saito 120
300
500
Fig. 7.6. The original signal (representing a natural radioactivity of certain subsurface formations) which will be used as an example for the autocorrelation shell expansion.
SO
"■n/\yv~v/w\VV_
J"U
D1
a
02 l ^ W I . ^ - ^ W ^ A V Y V -
3 03 ^ W / W l / ^ W V ^ 04
M/vvW\r-
W\Ar~
D5
100
200
300
400
500
Fig. 7.7. The expansion of the signal shown in Fig. 7.6 in the autocorrelation shell using the autocorrelation functions of the Daubechies wavelet with L = 2M = 4. The top row is the original signal {S^}. The coefficients {•Dj}i<j<s, o
Local Feature Extraction and Its Applications 407
J
Fig. 7.8. Plots of the average coefficients on different scales in the autocorrelation shell representation of the signal shown in Fig. 7.6. The top row is the original signal.
7.3.3. A direct reconstruction of signals from the autocorrelation shell coefficients As we have shown in Sec. 7.3.2, the original signal may be reconstructed from the autocorrelation shell coefficients by converting them to the orthonormal shell coefficients followed by the reconstruction algorithm (7.44). Let us now construct an algorithm for reconstructing the original signal directly from the autocorrelation shell coefficients. Rewriting (7.88) and (7.89) and using the coefficients {a*} of (7.57), we have L/2 5fc_
D
v/2
i = v=2
S
k
ci-i
+
S 1 2 Z^,a2l-1^Sk+2'-U2l-n k-2i-U2l-l)) ■7fc+2'-1(2'-l) + ~^~^k-2i(2l-l)' 1=1
x
(7.113)
L/2
2 / , Q 2<-lWfc+2J-i(2i-1) + ^k-2i-»(21-l))
(7.114)
1=1
for j = 1 , . . . , J, k = 0 , . . . , N - 1. By adding (7.113) and (7.114), we obtain a simple reconstruction formula, ji-i
>k
_ 1 =^Sk3
+
D
l),
(7.115)
408
N. Saito
for j = 1 , . . . , J, k = 0 , . . . , N — 1. Given the autocorrelation shell coefficients iDk}i<j<J, o
s°k = 2-J'2SJk + £
2-H2D{ ,
(7.116)
for fc = 0 , . . . , J V - l . Similarly to the orthonormal shell coefficients, {Sk} and {Dk} are redun dant. This may be used to reconstruct a signal from subsampled autocorrela tion shell coefficients. 7.3.4. The subsampled autocorrelation shell We now turn to a question of reconstructing the original vector from a subsam pled sequence of the autocorrelation shell coefficients. We demonstrate that it is possible to reconstruct the original signal if we keep a single additional number at each scale of the autocorrelation shell expansion, i.e. the Nyquist frequency components of the autocorrelation shell coefficients. Let us define the subsampled autocorrelation shell expansion of a vector / £ Vo as follows:
{
2
n-i_j
£
2n-J-l
v
di*,-,fc(i)[
and
Yl «**•/.*(*).
fe=o J I<J<J where $,,* and tyj^ are denned by 9jtk(x) = 2-j/2$(2-jx ^jik(x)
(7-117)
fe=o
= 2-''29(2->x
- k),
(7.118)
- k),
(7.119)
and the coefficients sk and d£ are defined as the subsamples of the autocorre lation shell coefficients S£ and Dk as 4 = Slk
dl=Dl
,
(7.120) (7.121)
Local Feature Extraction and Its Applications
409
for k = 0 , 1 , . . . , 2n~j - 1. The notation s{ and d3k is local to Sec. 7.3.4 (not to be confused with the one for the orthonormal wavelets). The decomposition algorithm is similar to that for the orthonormal wave lets, and we compute L-\ s
=
k
2-*/ l=-L+l
PlS
2k+l
(7.122)
'
L-l d
k=
^2
(7.123)
1lS2k+l '
fori = l , . . . , J , * = 0 > . . . , 2 B - > - l . For the reconstruction, we first rewrite (7.122) and (7.123) in terms of the coefficients {a*} as
Sfc
" s/2
S
J—1
2k
L/2
-^ \ "^
/ j —1
j —1
(7.124)
~ n / .a2t-l(s2fc-2t+l + S2*+2i-l)
(7.125)
+
S
1 L'2 S
2k
\
2k+2l-l)
+ n / ,a2l-l\.s2k-2t+l Z 1=1
1=1
By adding these two expressions, we obtain the coefficients of the (j — l)st scale with even indices,
& = ^(4+4),
(7.126)
for j = 1 , . . . , J, k = 0 , . . . , 2" - - 7 - 1. As for the coefficients with odd indices, we first define the sequence,
*L = ^4-4) L/2
=
J. \
">
*
o / Ja2l-\\s2k-2l + l+ (=1
(7.127) j
s
J
»
2k+2l-l) •
By taking the Fourier transform of (7.127), we have
(7.128)
410
N. Saito 2"-J-l
A'(0= £
Aie'fe«
(7.129)
fc=0
2n~i-lL/2
=5 E fc=0
E^-iWfc+w-i^ + ^-ai+i^)
(7-13°)
1=1 L/2
= «^«/2) E ° « - i
cos
(( 2 ' - w 2 ) '
(7-131)
•kV(tt+1K/a.
(7-132)
J=l
where
^«/2)-
2n-3-\
E E
Jfc=0
The division by X)/=i a2/-i cos((2i — l)£/2) is not denned at the Nyquist fre quency at £ = 7r. For the uniqueness of the reconstruction, we compute the Nyquist frequency component at each scale and store it as a part of the de composition algorithm. We then supply these data at the reconstruction stage. The Nyquist frequency component of odd samples at the (j — l)st scale may simply be computed as 2"" J -1
" & = ^d~>/2) = i £
(-l) fc *&-+i •
(7-133)
fe=0
In summary, the coefficients with odd indices can be recovered as the inverse Fourier transform of the following quantity: A J
L/2
S&K/2)
®
for 0 < £ < 7T,
E °«-i cos (( 2 ' ~ X)^/2)
(7.134)
1=1
, ^ q
for£ = 7r.
Remark 7.6. In Ref. 20, Burt introduced three different multiresolution image representation schemes: (a) "standard-density" pyramid; (b) "double-density" pyramid; and (c) "full-density" pyramid. The schemes (a) and (c) correspond to the representations using the subsampled shell (or the standard wavelets) and the shell, respectively. The "double-density pyramid" has twice as many
Local Feature Extraction and Its Applications
411
expansion coefficients as the standard-density pyramid, at each scale. Ac cording to Burt, this scheme is often used in practice for image processing applications. This comment makes sense in our scheme too; if we double the number of coefficients at each scale in the subsampled autocorrelation shell expansion, then there is no need to keep the Nyquist frequency component at each scale. 7.4. A review of Dubuc 's iterative
interpolation
scheme
The representation using the autocorrelation functions of compactly supported wavelets has a natural interpolation algorithm associated with it. This inter polation scheme is due to Dubuc 55 and has been extended in Ref. 47 by Deslauriers and Dubuc. This interpolation scheme may be arrived at by considering the autocorrelation function of the scaling function
/(*) = ^ ( / ( * " h) + Hx + h)) ~ i ( / ( x ~ 3/l) + /(* + 3/ 0) •
(7-135)
412
N. Saito
Fig. 7.9. The Lagrange iterative interpolation of the unit impulse sequence with the asso ciated quadrature mirror filter of length L = 4, i.e. ai = 9/8 and 03 = —1/8. Black nodes at x = 0 indicate 1 and white nodes at x = ±1 have value 0. Shaded nodes have values other than 0 or 1. Note that the values of nodes existing at the jth scale do not change at the (j — l)st scale and higher. The result of repeating this procedure converges to $(z) as j —► - c o .
Figure 7.9 illustrates a few steps of this iterative process applied to the unit impulse. Deslauriers and Dubuc generalized this interpolation scheme as follows: f(x) = J2F(k/2)f(x
+ kh)
for
x(=Bn+1\Bn
and
h = l/2n+1,
(7.136) where the coefficients F(k/2) are computed by generating the function satis fying F(x/2) = Y^ F(k/2)F(x - k). (7.137) kez By comparing (7.136) and (7.137), we observe that the function F(x) is an interpolation of the unit impulse {5ok}kez- Using this fact, Eq. (7.136) may be rewritten as f(^) = ^2f(k)F(x-k) kez
for
xeR.
(7.138)
In particular, they discussed an example connected with the Lagrange polyno mial with L = 2M nodes,
Local Feature Extraction and Its Applications
413
M
/(*)=
£
(7.139)
Vfc_\(0)f(x + (2k-l)h)
k=-M+l M
= £
^k-_\(0)(f(x
- (2k - l)h) + f(x + (2k - l)h)),
(7.140)
fc=i
where {P^J^x)}-M+i
tt
(7i4i)
\is^)-
In this case, Eq. (7.137) reduces to M
F(x) = F(2x) + J^ 'Pi'k-i(0)(F(2x - 2k + 1) + F(2x + 2k-
1)).
(7.142)
fc=i
This special case of (7.136) or (7.138) is called the "Lagrange iterative inter polation". The original Dubuc's scheme (7.135) corresponds to L = 4 in the scheme (7.139). For general wavelets, we have the following proposition: Proposition 7.7. F(x) = * ( z ) ,
(7.143)
where F(x) is the fundamental function defined in (7.137) and $(x) is the autocorrelation function of the scaling function
(7.144)
Using the property (7.47), we have from (7.144) *(fc/2) = o fc /2.
(7.145)
414 N. Saito In other words, the two-scale difference equation for the function $ in (7.59) may be rewritten as *(x/2) = J2 $ ( * / 2 ) * ( s - Jfc). (7.146) fcez Equivalence of (7.137) and (7.146) combined with the uniqueness of the nontrivial Z^-solution to these equations (Theorem 2.1 of Ref. 45) implies (7.143). □ The vanishing moments of $(x) (see (7.62) and (7.63)) and Proposition 7.7 yield Proposition 7.8. (Deslauriers & Dubuc 47 ) For any polynomial P of degree smaller than L, the Lagrange iterative interpolation of the sequence f(n) = P(n), n € Z via (7.139), is precisely the function f(x) = P(x) for any x G R. The regularity of the fundamental function F(x) may also be derived from the results of Daubechies and Lagarias. 46 To compute the derivative of the interpolated function, we differentiate (7.138): Proposition 7.9. Ifh = 2~n and ifx £ Bm, where m < n, then the derivative of an interpolation function f(x) is computed via L-2
f{x)
= Y, rk{f{x + kh) - f{x - kh)), fc=i
(7.147)
where
+ rr+oo °° d rk = /
(a) / / the integral in (7.148) exists, then the coefficients {rk}kez satisfy the following system of linear algebraic equations rfc = 2 and
1
T2k +
in (7.148)
L/2
9 Z2a2l-l(r2k-2l+l 2 i-l
+»"2fc+2Z-l)
(7.149)
Local Feature Extraction and Its Applications
X>r fc = - 1 ,
415
(7.150)
fcez
where the coefficients a.21-1 are given in (7.57). (b) / / the number of vanishing moments of the wavelet M > 2, then Eqs. (7.149) and (7.150) have a unique solution with a finite number of nonzero rjt, namely, r* ^ 0 for —L + 2
rk = -r-k.
Let us describe an algorithm for evaluating functions $(x) and $'(x) at any given point x € R for any given accuracy. This algorithm is used in Sec. 7.5 for generating the zero-crossings representation of a signal and reconstructing the signal from that representation. We use the iterative interpolation scheme to zoom in the interval around x until we reach the interval [x — e, x + e], where e is the prescribed accuracy. Once in this interval, the derivative at the center of this interval is computed using (7.147). The evaluation of the functions ^(x) and *'(x) may be obtained from the formula (7.65), i.e. V(x) = 2$(2x) - $ ( x ) . 1.5 p
1-
0.5-
0-
-0.5-
-1 -
-1.51 -
3
-
2
-
1
0
1
2
3
Fig. 7.10. The autocorrelation function * ( i ) (dashed line) and its derivative $'(x) (solid line) with L = 4. Note the rough shape of * ' ( x ) .
416
N. Saito
Using this algorithm, we computed $(x) and $'(x) which are shown in Fig. 7.10. The same figure may be obtained if we directly apply the iterative interpolation scheme to the unit impulse (see Fig. 7.9; see also Dubuc 55 and Daubechies & Lagarias 45 ' 46 ). Remark 7.11. The interpolation scheme discussed in this section "fills the gap" between the following two extreme cases: If the number of vanishing moments M = 1 and the length of the quadra ture mirror filter L = 2, then {il>j,k(x)} is the Haar basis and we have 1+ x $Haar(*) = < 1 - X
0
for - 1 < x < 0, for
0 < X < 1,
(7.152)
otherwise.
The interpolation process is exactly linear interpolation. (The autocorrelation function of the characteristic function $H»ar is often called the hat function.) Let us now consider the case where M —¥ co. Using the expressions (3.49)(3.52) of Ref. 12, the relation (7.56) for the 2x periodic function |mo(£)| 2 rnay be rewritten in terms of M as follows: 1
1
M
K(0I 2 = i1 + o S" 2 *- 1 cos(2k ~ W * fc=i
- + r V* (-i)fc-1coB(2fc-iK + M ^ (2* - 1)(M - m)\(M + k - 1)! " 2 2 ° f-< (2* - 1)(M - m)\(M + * - ! ) ! ' k l
l
where CM
=
,71„x (? 15 '
>
2
(2M - 1)!
(M-l)\4M-\
'
(7.154)
If M -» oo, then fc-1
K « ) I 2 -+ 2 + ^ E ^ r r
cos 2
fc=l
( *- w >
(7-155)
which is exactly the Fourier coefficient of the characteristic function X[-*/2,ir/2](Q- This implies that the corresponding autocorrelation function is $oo(x) = sinc(x) =
SmnX 7TX
.
(7.156)
Local Feature Extraction and Its Applications
417
The interpolation process then corresponds to the so-called band-limited in terpolation. Daubechies 44 noticed that if the number M of the vanishing mo ments of the compactly supported wavelet ip(x) approaches infinity, then the corresponding scaling function ip(x) itself also approaches Voo(z) = sinc(z).
(7.157)
As a result, we have the following relation: ¥>«>(*) = *«>(x)
(7.158)
and /-,
at
V hk =
*
sin7rfc/2
=
i -^]r
for
_
,„
fceZ
-
,
(7 159)
-
The autocorrelation function $ of the wavelet corresponding to the quadra ture mirror filter with M vanishing moments always satisfies the two-scale difference equation M
* ( i / 2 ) = J ^ *(k/2)*(x - k),
(7.160)
k=-M
and (7.152) and (7.156) may be considered as the two extreme examples. Thus, the quadrature mirror filters with M vanishing moments, where 1 < M < oo, provide a parametrized family of the symmetric iterative interpolation schemes. Remark 7.12. It turns out to be easy to adjust the auto-correlation shell to "life on the interval" (see Ref. 28 for a more delicate construction for wavelets on the interval). Since our filter coefficients pk are obtained by evaluating the Lagrange polynomials at the origin x — 0 (see (7.139)), it is natural to adjust the filter coefficients for the boundaries by simply generating them by evalu ating these polynomials at the desired points. For example, for the lowpass filter coefficients 2- 1 / 2 {-l/16,0,9/16,1,9/16,0, - 1 / 1 6 } based on Daubechies' QMF with L = 2M = 4, the adjusted lowpass filter coefficients for the left boundary are 2- 1 / 2 {5/16,1,15/16,0, -5/16,0,1/16}. These coefficients are convolved with the leftmost seven points of the signal to obtain the second leftmost point of the next scale.
418
N. Saito
7.5. On reconstructing
signals from
zero-crossings
Here, we formulate the problem of reconstructing signals from zero-crossings (and slopes at these points) in the autocorrelation shell. The outline of our approach is as follows: we compute and record the zero-crossings (and slopes at these zero-crossings) on each scale of the autocorrelation shell expansion within the prescribed numerical accuracy using the Dubuc's iterative inter polation scheme. For reconstruction, we set up a system of linear algebraic equations, where the unknown vector is the original signal itself and the en tries of the matrix are computed from the values of the autocorrelation function and its derivative at the integer translates of the zero-crossings. The signal is reconstructed by solving this linear system. Reconstructing a signal from its zero-crossings by solving a linear system of equations has been proposed by S. Curtis and A. Oppenheim. 41 Their method requires a solution of a linear system where the unknowns are the Fourier coefficients and, therefore, the linear system is dense. It also requires multiple threshold-crossings rather than zero-crossings, and moreover, the quality of the reconstruction depends strongly on the choice of the thresholds. We would like to note that in our approach we take advantage of multiresolution properties of the autocorrelation shell and, specifically, of Proposition 7.5. This proposition allows us to set the linear system directly for the unknown signal rather than the coefficients of its expansion. We note that this proposition does not hold if we were to use the scale-space filtering with Gaussians, for example. If we were to use biorthogonal wavelet bases, then it is possible to set up a linear system similar to that constructed in Sec. 7. In this case the difficulty is in keeping both the difference and average coefficients sufficiently smooth and balance this requirement with the computational efficiency. In Sec. 7, however, we limit ourselves to considering only the autocorrelation shell. 7.5.1. Zero-crossing detection and computation of slopes Using the iterative interpolation scheme described in Sec. 7.4, we locate the zero-crossing locations of the set of functions {53«:=o Dk^(x ~ k)}i<j<J withi*1 the prescribed numerical accuracy (in our examples we compute in double precision with e = 1 0 - 1 4 ) . To compute the locations of zero-crossings, we recursively subdivide the unit interval bracketing the zero-crossing until the length of the subdivided interval bracketing that zero-crossing becomes less than the accuracy e. The iterative interpolation scheme allows us to zoom in
Local Feature Extraction and Its Applications
419
as much as we want around the zero-crossing. This process requires at most 0(—L\og2e) operations per zero-crossing. Once the zero-crossing is found, the computation of the slope is merely the convolution of the 2(L — 2) points around the zero-crossing with the filter coefficients {TI}-L+2
££>j$(a4-fc)=0,
(7.161)
fc=0 N-l
X)^t*Vm "*) = < ,
(7-162)
fc=0
where 1 < j < J, 0 < m < Wz - 1. Applying Proposition 7.5 to (7.161) and (7.162), we have N-l
5 > 0 f c - M ^ ) = 0,
(7.163)
fc=0 N-l
53«°2" i *i,*(^) = ^ -
(7-164)
fc=o Since it is easy to evaluate $fj,k{x) and ^', k(x) for any x £ R within the prescribed accuracy as described in Sec. 7.4, we interpret (7.163) and (7.164)
420
N. Saito
as a system of linear algebraic equations where the original signal {s£} itself is the unknown vector. Using the same proposition for the available average coefficients, we have N-l
£
N-l
s°kZj,k(x) = £
#*(>,*(*).
(7.165)
fc=0 fc=0
If we evaluate (7.165) at the integer point x = 2Jl, then we have N-l
y|*fc*^(2J0 = ^
/ >
(7.166)
for I = 0 , 1 , . . . , N, - 1, where N, = 2n~J. We rewrite (7.163), (7.164) and (7.166) in a vector-matrix form as As = v,
(7.167)
where a G R w is a shorthand notation of the original signal {s^}, v € R2N*+Nis a data vector including the slopes and available coarsest subsampled coeffi cients, i.e. ,0,vNj _ j , S 0 , S 2 j , . . . , SN_2j) , (7.168) and A is a (2JV2 + Na) x N matrix and has the following structure: "*> =
(0,VQ,
... ,0,vlfi_v
... ,0,v0,...
A2 (7.169)
A = AJ
\SJJ where A J is a 2N% x TV submatrix whose entries are
{A*)3k+U
= 2-Wjit(x>),
(7.171)
for A; = 0 , . . . , N*' — 1 and I = 0 , . . . , N - 1 and SJ is an N, x N submatrix where (S J )fc,i=*j 1 j(2 J fc), (7.172)
Local Feature Extraction and Its Applications
421
for k = 0,...,Na and I = 0 , . . . , n — 1. (Note that we use vector and matrix indices starting from 0 rather than 1.) We note that the fcth row of the matrix SJ comprises the average coefficients at the scale J of the autocorrelation shell expansion of the shifted unit impulse {^2Jfc,i}o
NA = ^ 2 i V j 2 J + 1 ( L - 1) +N,2J+1{L
- 1)
j
= 2(L-1)
Y^2j2Ni+n U=i
(7.173)
J
The number of zero-crossings usually decreases as the scale j increases. As a result, the number of the nonzero entries of the matrix A is essentially O(N). The sparsity of this matrix allows one to solve the system (7.167) efficiently. Whether we can solve the linear system (7.167) depends on the condition number of the matrix (7.169), which is affected by the distribution of loca tions of zero-crossings. If there are very few zero-crossings (which means that the signal is zero over a significant part of its support) as, for example, in the expansion of the unit impulse {sjjj = 6k0,k} with only 2L zero-crossings at each scale, then we need to use additional constraints for solving the linear system (7.167), that if there are too few several approaches to introduce these additional constraints. One approach (which might be sufficient in some ap plications) would be to consider the generalized inverse of (7.169). Another possible approach (that we have experimented with), is to introduce a heuris tic constraint that the distance between the adjacent zero-crossings at the j'th scale does not exceed 2 J + 1 (L — 1). To impose these constraints on the solution of the system of linear equations (7.167), we rewrite (7.167) in terms of the difference coefficients {Dj.}. To do this, let d e ^NJ+N. ^ e a v e c t o r 0 f the autocorrelation shell coefficients including the subsampled coarsest averages: d= (D0,. ■ ■ ,DN_lt... Also, let T e R(NJ+N,)xN b e
,D0,.. a
.,DN_i,SQ,S2j,...
,SN_2j)
.
(7.174)
transformation matrix from s to d: d = Ts.
Then, the linear system (7.167) on a can be rewritten as
(7.175)
422
N. Saito
(7.176)
Ld = v, where L = AT is a (2NZ + Ns) x (NJ + N„) matrix, / L1
o \
0 2
0
L
(7.177)
\
LJ
0
0
IN.
0
)
where IN. is the 7V,-dimensional identity matrix, and IP is a 27V| x iV block matrix whose entries are
4w = *(**-'),
(7.178) (7.179)
for A; = 0 , . . . , N\\ — 1 and / = 0 , . . . , N — 1. We now impose a constraint that if there are 2 J + l ( L — 1) or more consecutive null columns in the submatrix L\ then the corresponding coefficients must be zero. These constraints may be written as Cd=0,
(7.180)
and C is an (NJ + JV,)-dimensional square matrix of the form 0 2
0
\
c
0
(7.181) CJ 0
\ o
0 0 /
where the submatrix (7 J is an AT-dimensional diagonal matrix as
{
1
if Di must be zero,
0
otherwise.
(7.182)
To eliminate d in favor of s in Eq. (7.180), we use the transformation matrix T and define the matrix B = CT, where B ∈ R^{(NJ + N_s) × N}. Then, (7.180) can be written as Bs = 0. Hence the problem may now be stated as follows:

$$\text{Minimize } \|A s - v\| \quad \text{subject to} \quad B s = 0. \qquad (7.183)$$
Using the method of Lagrange multipliers, we obtain the least squares solution

$$s = (A^T A + \lambda B^T B)^{-1} A^T v. \qquad (7.184)$$

Since we consider minimizing ‖As − v‖ and satisfying Bs = 0 equally important, we set λ = 1.
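As a minimal sketch of this final step (assuming NumPy/SciPy, with `A`, `B` and `v` standing for concrete sparse instances of (7.169), B = CT and (7.168); this is not the author's code):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

def reconstruct(A, B, v, lam=1.0):
    """Least squares solution (7.184): s = (A^T A + lam B^T B)^{-1} A^T v."""
    A, B = csr_matrix(A), csr_matrix(B)
    M = (A.T @ A + lam * (B.T @ B)).tocsc()   # sparse normal-equation matrix
    return spsolve(M, A.T @ v)                # exploits the O(N) sparsity of A
```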
We note that our formulation is completely linear except for the process of zero-crossing detection. It is clear from (7.167) and (7.183) that the slope information is essential for signal reconstruction: without it, we only have the trivial solution s = 0. Previously, this fact was examined only empirically.78 We also note that our approach may be modified to produce the maxima-based representation of Mallat and Zhong98 by considering ∫^x Ψ_t(y) dy instead of Ψ_t(x) and the corresponding two-scale difference equation. Using the symmetric iterative interpolation, we have better numerical control than with the approaches developed by Mallat and Zhong and by Hummel and Moniot.78 Recently, however, Berman and Baras showed in Ref. 11 that the dyadic wavelet maxima representation of Mallat and Zhong and the dyadic wavelet zero-crossings representation of Mallat95 are, in general, non-unique. In our future project, we will study the applicability of their results to our zero-crossings representation scheme.

7.5.3. Examples

We first use the signal shown in Fig. 7.6 as an example. In this case, the size of the matrix A is 1852 by 512. The relative ℓ² error of the reconstructed signal compared with the original signal (see Fig. 7.6) is 5.674436 × 10^{-13}. The accuracy threshold ε was set to 10^{-14} in this case. Next, let us consider the reconstruction of the unit impulse {δ_{k,0}} from its zero-crossings and slopes in the autocorrelation shell expansion. In this case, the size of the matrix A is 56 × 64. In Fig. 7.11a we display the matrix AᵀA in (7.184). In Fig. 7.11b we show the matrix with the constraints, AᵀA + BᵀB. It is easy to see that the constraints serve to condition the linear system.
Fig. 7.11. The effect of the constraints in the reconstruction of the unit impulse from zero-crossings and slopes. (a) The unconstrained matrix AᵀA. (b) The constrained matrix AᵀA + BᵀB.
The relative ℓ² error with the constraints is 7.417360 × 10^{-15}, whereas the error of the solution by the generalized inverse without the constraints is 3.247662 × 10^{-4}.

7.6. Summary
In Sec. 7 we have proposed the autocorrelation shell representation, a "hybrid" shift-invariant multiresolution representation using the autocorrelation functions of compactly supported wavelets. The autocorrelation functions of the corresponding scaling functions induce the symmetric iterative interpolation of Dubuc55 and Deslauriers and Dubuc,47 which allows us to interpolate efficiently on all dyadic rationals. This property of the autocorrelation functions enables us to compute zero-crossings and slopes at the zero-crossings of the autocorrelation shell representation. This representation also gives us an explicit relation between the original signal and its expansion coefficients, so that we can set up a system of linear algebraic equations for reconstructing the original signal from these zero-crossings and slopes. The original signal
is reconstructed within prescribed numerical accuracy by solving this linear system.

8. Further Development

8.1. Introduction
So far we have used a library of orthonormal bases and a library of nonorthogonal bases generated by the autocorrelation functions of wavelets for the various signal analysis tasks. In the time-frequency plane, we can associate each basis function in our libraries with a rectangular box called a "Heisenberg box" or a "phase cell".105,157 In particular, an orthonormal basis in the library corresponds to a disjoint cover (or tiling) of the time-frequency plane by these rectangular boxes (see e.g. Refs. 5, 69 and 157 for pictures of various tiling patterns). This is because our basis functions are generated by applying certain translation operators along both the time and frequency axes, as well as dilation operators, to a set of elementary functions such as the scaling functions for wavelets/wavelet packets and "bell" functions for local trigonometric bases. Thus, the tiling of the time-frequency plane by a particular basis in our library is completely governed by the Heisenberg box of the corresponding elementary function as well as the translation and dilation operators. For the classes of signals we have seen so far, these basis functions have worked quite well; however, for other types of signals, such as frequency-modulated signals (often called chirp signals), our basis functions may not be very efficient. In other words, the energy of these signals may not be captured efficiently by our basic analysis tools, which are constrained to rectangular boxes whose edges are parallel to the time and frequency axes; the entropy of the signal represented in such a basis may be large.

Recently, Coifman, Matviyenko, and Meyer proposed one approach toward breaking this limitation: tiling the time-frequency plane by a set of "oblique" boxes which consist of linear chirps (which have a modulation factor of the form e^{iπt(at+b)}) localized by smooth cutoff functions.29 Baraniuk and Jones made a similar observation,5 and for computing actual expansion coefficients onto such a chirp basis, they suggested "warping" the time axis of a signal with a predefined unitary operator followed by the standard wavelet expansion, so that standard software can be used. We can certainly add these new bases to our library; in this section, however, to extend the usefulness of our standard library, we propose another
exploratory approach using the entropy minimization principle. Unlike the approach of Baraniuk and Jones, which uses predefined warping functions, our approach exploratively finds a time-warping (or deformation or distortion) function, a monotonically increasing nonlinear function of time, that makes the observed signal complicated and makes our analysis tools inefficient. By undoing the warping effect (or "unwarping" the signal), we can obtain a simpler version of the signal suitable for analysis and interpretation using our library of bases. This idea is also related to the minimum description length principle: we can think of a shorter description of the observed signal as a two-part code, i.e., the sum of the code lengths of (a) the unwarped (simplified) signal and (b) the warping function itself. We can obtain a particularly compact representation of a signal if its warping function is a smooth function such as a polynomial. For simple chirp signals, our method "discovers" the modulation laws of such signals. Finding the modulation law of chirp signals essentially amounts to "straightening" their energy distributions in the time-frequency plane. This gives rise to a simpler representation of the signal. This idea is best explained by figures. In Fig. 8.1, we display an example chirp signal, specified by (8.1) below.
Fig. 8.1. An example of a chirp signal.
Fig. 8.2. The time-frequency energy distribution of the chirp signal shown in Fig. 8.1 in the local sine best basis coordinate. The horizontal and vertical axes represent time (increasing from left to right) and frequency (increasing from bottom to top), respectively. Gray levels in the image are proportional to the logarithm of squares of the expansion coefficients.
Fig. 8.3. The chirp signal of Fig. 8.1 after the "demodulation".
Fig. 8.4. The time-frequency energy distribution of the chirp signal after "demodulation" in its own local sine best basis coordinate.
$$x(t) = \sin(97\pi u(t)) + 2\sin(177\pi u(t)) + 0.5\sin(537\pi u(t)), \quad 0 \le t < 1, \qquad (8.1)$$

where u(t) = t², and t is sampled 1,024 times. The corresponding energy distribution in the time-frequency plane using the local sine best basis of this signal is shown in Fig. 8.2. We can see that the dominant energies increase linearly in this plane. The normalized entropy of this signal is 4.6909 bits in the local sine best basis. Figure 8.3 shows our result: the unwarped or "demodulated" version of the above chirp signal. All linear patterns in Fig. 8.2 are now straightened in Fig. 8.4. The normalized entropy of this unwarped signal in its own local sine best basis is just 2.7537 bits. The warping information costs 0.4576 bit with its own local sine best basis. The sum is 3.2113 bits, which is smaller than the original entropy.^h

^h In the strict MDL formulation, we need additional cost for describing the two different best bases.
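For concreteness, the test signal (8.1) can be generated with a few lines of NumPy (a sketch following the sampling convention stated above):

```python
import numpy as np

N = 1024
t = np.arange(N) / N              # 1,024 samples of t in [0, 1)
u = t ** 2                        # the warping function u(t) = t^2
x = (np.sin(97 * np.pi * u)       # Eq. (8.1)
     + 2.0 * np.sin(177 * np.pi * u)
     + 0.5 * np.sin(537 * np.pi * u))
```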
8.2. Discovering time-warping functions by the entropy minimization principle
In Sec. 8.2, we describe an algorithm for finding such a time-warping function. We do not take a strict MDL approach at this point: we do not minimize the sum of the description lengths of the unwarped signal and the warping function. Instead, we focus our attention on the first term, i.e. the simplicity of the unwarped signal, by the entropy minimization principle. We will study the strict MDL-based formulation in our future project.

8.2.1. Problem formulation

Let x = (x_0, ..., x_{N-1}) ∈ R^N be a signal vector at hand. By unwarping, the samples on the regular grid in the original domain are transformed to those on an irregular grid. To compute the samples on the regular grid after unwarping, we construct a continuous function x(t) of t ∈ [0, N - 1] from the original signal samples {x_k}. To do this, we use the autocorrelation function of the scaling function Φ(t) defined in (7.45):

$$x(t) = \sum_{k=0}^{N-1} x_k \Phi(t - k). \qquad (8.2)$$
We simply use (8.2) to obtain the interpolated values at any t with 0 ≤ t ≤ N - 1. Let us assume that a warping function t = v(τ) is a monotonically increasing function of τ ∈ [0, N - 1] with v(0) = 0 and v(N - 1) = N - 1. Let U ⊂ C¹[0, N - 1] denote a space of such functions. We also assume that its inverse unwarping function τ = v⁻¹(t) exists and v⁻¹ ∈ U. Let u denote this inverse v⁻¹ for short. Both time warping and unwarping are simply a change of variable of the function x. We want to find a nonlinear mapping u* from the t-domain to the τ-domain so that changing the variable from t to τ = u*(t) minimizes the entropy of the signal. To do this, we need to define the entropy of the unwarped signal. The samples on the regular grid {0, 1, ..., N - 1} in the τ-domain correspond to the irregular time positions {v(0), v(1), ..., v(N - 1)} in the original t-domain. Let x(v) = (x(v(0)), x(v(1)), ..., x(v(N - 1))). This vector x(v) represents the unwarped signal regularly sampled in the τ-domain. Now we can define the entropy of the unwarped signal x(v) as the normalized entropy of the sequence x(v) with respect to its own best basis selected from our library 𝓛:

$$H_{\mathcal{L}}(x(v)) = \min_{W} H_2(W^T x(v)), \qquad (8.3)$$
where H₂(·) is the normalized entropy of a sequence defined as (2.2) in Sec. 2:

$$H_2(x) = -\sum_{i=0}^{N-1} \frac{x_i^2}{\|x\|^2} \log_2 \frac{x_i^2}{\|x\|^2},$$

and W ∈ O(N) denotes a matrix corresponding to a basis in our library. For our problem, the normalization is critical, since the change of variable does not conserve the signal's energy. Our problem is now described as follows: find a function v*(τ) such that

$$H_{\mathcal{L}}(x(v^*)) = \min_{v \in U} H_{\mathcal{L}}(x(v)). \qquad (8.4)$$
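A minimal sketch of this objective follows; the two-member "library" (the standard basis and the orthonormal DCT from SciPy) is a toy stand-in for the local sine dictionaries actually used in the thesis.

```python
import numpy as np
from scipy.fft import dct

def h2(x):
    """Normalized entropy H_2 of a sequence; the normalization matters
    because the change of variable does not conserve energy."""
    p = np.asarray(x, dtype=float) ** 2
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

def h_library(x):
    """H_L(x): minimum of H_2 over the (toy) library of bases,
    cf. (8.3)-(8.4); each candidate plays the role of W^T x."""
    return min(h2(x), h2(dct(x, norm='ortho')))
```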
8.2.2. Numerical implementation

Since this is a nonlinear problem, it is difficult in general to find the v* giving the global minimum of (8.4). Moreover, if we do not restrict the space of solutions U, the problem may become computationally infeasible. Therefore, we use a one-parameter family of model functions to approximate U, since this reduces the problem (8.4) to a one-dimensional nonlinear optimization problem. In particular, we use a family of hyperbolas, mainly because: (a) it generates a set of smoothly varying functions covering the warping range [0, N - 1] × [0, N - 1] on the Cartesian product of the t and τ spaces, and (b) it allows explicit algebraic expressions for both v and u, which saves computational cost. Let ε denote this parameter, and let us write v⁻¹(t) = u_ε(t) = t + h_ε(t), where h_ε(t) is a hyperbola from this family:

$$h_\varepsilon(t) = \sqrt{a_\varepsilon^2 + b^2} - \sqrt{a_\varepsilon^2 + (t - b)^2}, \quad 0 \le t \le N - 1, \qquad (8.5)$$
where b = (N - 1)/2, i.e. the midpoint of the interval [0, N - 1], and a_ε = -b(ε - 1/ε)/2. This hyperbola takes its maximum or minimum bε at the midpoint t = b, depending on the sign of ε. Also, we have h_ε(0) = h_ε(N - 1) = 0, and h_ε(t) ≡ 0 for ε = 0. After some algebraic manipulations, its inverse v_ε(τ) can be written as
$$v_\varepsilon(\tau) = \tau - \frac{2\varepsilon\,\tau\,(N - 1 - \tau)}{(N - 1)(1 + \varepsilon)^2 - 4\varepsilon\tau}. \qquad (8.6)$$

This formula is necessary to evaluate the entropy of the unwarped signal x(v_ε). In the first step, what we are after is the ε* minimizing the entropy H_𝓛(x(v_{ε*})). Any standard method can be used for this one-dimensional nonlinear optimization problem; in general, however, there is no guarantee for the method to find the global minimum.
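A small numerical sketch of this family (with the closed form of (8.6) as reconstructed above; checking that v_ε inverts u_ε on [0, N - 1] is a cheap safeguard against sign conventions):

```python
import numpy as np

def u_eps(t, eps, N):
    """Warping map u_eps(t) = t + h_eps(t) of (8.5)."""
    b = (N - 1) / 2.0
    a = -b * (eps - 1.0 / eps) / 2.0
    return t + np.sqrt(a * a + b * b) - np.sqrt(a * a + (t - b) ** 2)

def v_eps(tau, eps, N):
    """Inverse map of (8.6); v_eps(tau) = tau when eps = 0."""
    return tau - 2 * eps * tau * (N - 1 - tau) / (
        (N - 1) * (1 + eps) ** 2 - 4 * eps * tau)

# sanity check: v_eps inverts u_eps (here for eps > 0)
N, eps = 1024, 0.3
t = np.linspace(0.0, N - 1, 7)
assert np.allclose(v_eps(u_eps(t, eps, N), eps, N), t)
```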
Even after obtaining the optimal v_{ε*}, we may still face a problem: the globally-minimizing solution v* to (8.4) may not be well-approximated by the single parametric function v_{ε*}. Therefore, we use an iterative approach (successive approximation): replace the original signal x by the unwarped signal x(v_{ε*}) obtained by the single optimization process and iterate the whole process on this new x. For the example shown in Sec. 8.3, we used the autocorrelation of the Haar function (i.e. the hat function) in (8.2), since it permits us to use linear interpolation, which has a computational advantage over the other interpolation schemes. We also used only a local sine dictionary as the library member. In Step 2, we adopted Brent's method (pp. 402-405 of Ref. 116). The unwarped signal shown in Fig. 8.3 was obtained after three iterations of the successive approximation process mentioned above.
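A sketch of the whole loop under the simplifications just described (linear interpolation in place of (8.2), the toy `h_library` above, and SciPy's bounded Brent minimizer; the parameter range is an illustrative assumption):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def unwarp_once(x):
    """One pass: find eps minimizing H_L(x(v_eps)) and resample x."""
    N = len(x)
    tau = np.arange(N, dtype=float)

    def objective(eps):
        grid = np.clip(v_eps(tau, eps, N), 0.0, N - 1)
        return h_library(np.interp(grid, tau, x))   # hat-function interpolation

    res = minimize_scalar(objective, bounds=(-0.9, 0.9), method='bounded')
    grid = np.clip(v_eps(tau, res.x, N), 0.0, N - 1)
    return np.interp(grid, tau, x), res.x

def unwarp(x, n_iter=3):
    """Successive approximation: iterate the one-parameter step."""
    for _ in range(n_iter):
        x, eps = unwarp_once(x)
    return x
```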
Fig. 8.5. Unwarping a signal warped by a tangent function. The original signal is shown in the top row.
We show another example in Fig. 8.5. The original signal was computed by the same formula (8.1) with the arctangent unwarping function

$$u(t) = \tfrac{1}{2}\left[\tan^{-1}\!\big(\tan(1)\cdot(2t - 1)\big) + 1\right], \quad 0 \le t \le 1.$$
Fig. 8.6. Discovering the modulation law of the signal shown in Fig. 8.5. Dotted lines indicate the true unwarping function. Solid lines indicate the approximation to the true unwarping function obtained at each iteration.
After five iterations of the above algorithm, it reached a minimum entropy solution (this may still be a local minimum). Figure 8.6 shows how the unwarping function obtained at each iteration successively approaches the true unwarping function.

8.3. Discussion
The above algorithm is designed for signals warped by smooth global functions. For a signal deformed locally, i.e. a signal whose time-frequency energy distribution has more complicated local structures, a hierarchical algorithm similar to the best-basis algorithm should be used:

Step 0: Segment a signal into a binary tree of dyadic blocks using the smooth cutoff functions, similarly to the local trigonometric transforms.

Step 1: At each node, compute the best unwarping function and the minimum entropy value.

Step 2: Prune the binary tree: at each parent node, eliminate its two children nodes if the sum of the minimum entropies obtained by unwarping the signals in the two child nodes separately is greater than or equal to the minimum entropy of the unwarped signal in the parent node.

Step 3: Construct the unwarping function by combining the unwarping functions at all the surviving nodes.

We will implement this multiple-window unwarping algorithm with careful boundary treatment in our future project.

Straightening the time-frequency energy distribution is an important issue not only for compact representations but also for understanding the pattern/signal generating mechanisms. This problem is also related to the nonuniform sampling of signals.25,159 In speech recognition, the concept of time-warping is used for matching distorted spoken words to standard templates.136 For images, deformation or domain warping has important applications in computer vision,64,108 medical image analysis,8 and structural geology.70 We will continue our research in this important area.

9. Conclusion

In this thesis, we have explored many problems related to feature extraction using a library of bases. Our philosophy, the best-basis paradigm, is to select the best possible basis for the problem at hand from a library of bases. Wavelet packet dictionaries (including wavelet bases) and local trigonometric dictionaries are the main constituents of our library. In Secs. 4-6, we have seen that this array of tools provides us with better understanding of and insight into the data-generating mechanisms or underlying physical phenomena than the conventional methods. Moreover, we have shown that these tools provide efficient and concrete numerical algorithms for the problem at hand.

We believe that these tools unify or extend various conventional methodologies and concepts. Table 9.1 summarizes the correspondences discussed or obtained in this thesis. We have also indicated that the best-basis paradigm provides a bridge between the statistical and syntactic approaches to pattern recognition problems; the best-basis paradigm allows one to represent a class of signals by a basis selected from a set of tree-structured bases, and one can interpret its tree structure as a grammar or a rule specifying that class. The best-basis paradigm may also be viewed as a unifying tool in the area of computational and descriptive complexities. Tom Cover says in his stimulating book (p. 3 of Ref. 40):
Table 9.1. Summary of the correspondences between the conventional concepts and the new concepts based on the "best-basis paradigm / a library of bases" reviewed, discussed, or developed in this thesis.

Object                 | Conventional concept                              | "Library" concept
Coordinate system      | Standard Euclidean basis; Fourier basis           | Wavelet packet bases; local trigonometric bases
Interpolation scheme   | Linear interpolation; band-limited interpolation  | Symmetric iterative interpolation
Edge operator          | Difference of boxcars; difference of Gaussians    | Difference of autocorrelation functions of wavelets
Compression            | Karhunen-Loeve basis                              | Joint best basis
Classification         | Linear discriminant analysis                      | Local discriminant basis
Regression             | Linear regression                                 | Local regression basis
"One can think about computational complexity (time complexity) and Kolmogorov complexity (program length or descriptive com plexity) as two axes corresponding to program running time and program length. Kolmogorov complexity focuses on minimizing along the second axis, and computational complexity focuses on minimizing along the first axis. Little work has been done on the simultaneous minimization of the two." We believe that the best-basis paradigm approaches a simultaneouslyminimizing solution in this plane. Many future projects have been mentioned in the course of this thesis. This thesis concludes with a list of most important problem areas which could not be treated here and will be studied in the future: (a) Robustness: In a real data set, the existence of outliers is not uncom mon. Therefore, robust estimation techniques 76 should be considered. These methods use median, median absolute deviation, and errors measured in l l norm, instead of mean, standard deviation, ^ 2 -based errors which have been used in this thesis. To make the results of this thesis more robust, it is necessary to incorporate these robust methodologies in the basis selection mechanisms and error estimates. Another robustness problem is the shift sensitivity of the expansion coefficients onto our orthonormal bases in the library. One possible solution is to create a few circularly-shifted versions of the original signals
(shifted either in time or in frequency or in both), as discussed in Sec. 3. This reduces the sensitivity of the coefficients to the shifts and at the same time increases the number of training samples for classification and regression problems.

(b) Invariance: Many real data, such as hand-written characters, speech signals, or subsurface formation patterns, are deformed or distorted from their standard (or normal) templates and patterns. Classifying such data sets requires splitting the input space into equivalence classes, each of which contains all kinds of distorted patterns generated from a single template. Simpler distortions include translation, dilation, and rotation operators and their combinations. Invariance in pattern recognition was studied by many researchers, in particular, Amari,2 Grenander,64 Kanatani,79 Lenz,89 Otsu112 and Segman et al.137 Research on dealing with general nonlinear distortion functions is still in its infancy. It is necessary to extend the ideas proposed in Sec. 8 and make them robust.

(c) How to combine the dictionaries: The current best-basis paradigm first selects the best basis (JBB/LDB/LRB) for the problem at hand from each dictionary individually in the library. Then, it selects the best of these best bases in the library. For signals having composite features, such as the signals containing spikes and sinusoids in Fig. 4.6, the best-basis paradigm iteratively "peels off" the features: it first extracts the major features by some best basis, then the residual signals are supplied to the algorithm to get the secondary features, and the process repeats. A more efficient (in the sense of descriptive complexity) way to handle this problem is to describe such a signal as a linear combination of the basis functions from different dictionaries; computationally, however, this may be very expensive; see also the related proposal called "matching pursuit" by Mallat and Zhang.97

(d) Higher dimensional signals: All the approaches proposed in this thesis can be extended to images and multidimensional signals; however, except for the denoising applications, we have not applied our methods to real images. Extending the library for images, e.g., adding the non-separable two-dimensional wavelets, should be implemented. Also, research on deformations and distortions of two-dimensional features is quite challenging but may provide us with fascinating new ideas.
Appendix A. MDL-Based Tree Pruning Algorithms

Appendix A.1. Introduction
In Secs. 4-6, we have applied pruning algorithms to eliminate unimportant branches of the fully-grown trees. Pruning trees is important to avoid "overtraining".18 The larger or more complex a tree becomes, the better the performance on the training data set that was used to generate that tree; in general, however, the performance on the test data set gets worse, since the tree learns too many specific features of the training data set and loses its generalization power. On the other hand, if a tree is too small, then it may not capture some important information in the training data set for the problem at hand. An important question is how to obtain the "right-sized" tree. This situation is similar to determining the number of basis coefficients to retain in the problem of simultaneous noise suppression and signal compression in Sec. 3. In this appendix, we consider two pruning algorithms and compare their performances.

Appendix A.2. Minimal cost-complexity pruning
Breiman et al. suggested the so-called "minimal cost-complexity pruning" method18 (see also Ref. 26). To explain their algorithm, we need some terminology. Let T_max denote a fully-grown tree where each node is either "sparse" (e.g., it contains fewer than 10 samples) or "pure" (e.g., the node deviance is less than 1% of that of the root node). Let t_k denote a node in the tree, and in particular, let t_0 denote the root node. A branch (or subtree) of a tree T consisting of the node t and all descendants of t is denoted by T_t, and we write this relationship as T ≻ T_t. Let T̃ denote the set of terminal nodes in T, and |T̃| the number of terminal nodes in T; we call this number the "size" of the tree T. Then the cost complexity of T is defined as

$$D_\alpha(T) := D(T) + \alpha |\tilde{T}|, \qquad (A.1)$$

where D(T) is the deviance of the tree T, i.e. the entropy for CTs and the residual sum of squares (RSS) for RTs. The parameter α ≥ 0 is called the complexity parameter. We can easily see that the second term is a regularization term, closely related to Barron's complexity regularization technique6 (see also Sec. 3.6). The minimal complexity (sub)tree T(α) for a fixed α is defined by

$$D_\alpha(T(\alpha)) := \min_{T \preceq T_{\max}} D_\alpha(T) \qquad (A.2)$$

and

$$\text{if } D_\alpha(T) = D_\alpha(T(\alpha)), \text{ then } T(\alpha) \preceq T. \qquad (A.3)$$

This definition selects the smallest minimizer of D_α if there are ties.
Fig. A.l. A comparison of curves of subtree size vs. deviance using the resubstitution estimates (dotted line) and the cross-validation estimates (solid line). The data set of Ex ample 4.6 represented in the LRBP coordinates is used. The algorithm of Breiman et al. does not necessarily generate a subtree sequence with regularly decreasing sizes. This situation is indicated by the symbols c and r on the curves. The values of the corresponding complexity parameter a is shown on top of the frame.
438
N. Saito
procedure first randomly divides the training data set into mutually exclusive M subsets (say, M = 10) and uses M — 1 subsets for constructing the sequence of subtrees of given sizes. Then, the remaining subset is used to evaluate the sequence: this subset is used as an independent test data set and the deviance of each subtree is computed. This process is iterated for M times, i.e., each subset is used for a test data set and the deviances are accumulated and then averaged. Figure A.l compares deviances computed by the resubstitution and the cross-validation. The tree was grown on the data set of Example 4.6 represented in the LRBP coordinates. The full tree has already been displayed in Fig. 5.1. If we use the resubstitution estimates, we get the full tree as the minimal complexity tree; on the other hand, a subtree with eight terminal nodes is chosen using the cross-validation estimates. The major problems of this cross-validation approach are: (a) how to determine the parameter M, i.e. into how many subsets should we divide the training data set (why not M = 2? why not the extreme case, M = N?), and (b) the computational burden of the repeated tree-growing and pruning processes. Because of these problems, we prefer to use the MDL-based pruning algorithm which we discuss in Sec. A.3. Appendix A.3. MDL-based
pruning
algorithms
Use of the MDL principle for tree pruning have been considered by several researchers, such as Rissanen 128 (Sec. 7.2 of Ref. 128), Quinlan and Rivest, 117 Wallace and Patrick. 148 Here, we modify the algorithm proposed by Rissanen specifically for CART. An essential idea behind Refs. 117, 128 and 148 is again the two-stage encoding: description of a tree and the data using that tree in binary strings. The MDL principle suggests picking the tree giving the shortest code length. As we saw in Sec. 3, this principle frees the user from any parameter setting such as M or a in the minimal complexity pruning. More precisely, the code length of this two-stage code L(y,T) for the response vector y in the training data set and a given tree T can be written as L(y,T)
= L(y\T) + L(T),
(A.4)
where L(y\T) represents the code length for describing y given the tree T and L(T) is the code length of the tree itself. The MDL principle suggests picking a tree T* minimizing (A.4), MDL(y,T')
=
nun L(y,T).
(A.5)
Local Feature Extraction
and Its Applications
439
To derive an actual algorithm to compute T", we first consider the second term L(T) of (A.4), i.e. how to encode a tree generated by the CART algorithm. Now let us check how a tree is represented in the S and S-PLUS package. Let us consider the small tree shown in Fig. A.2 which is a pruned (by our MDL algorithm) version of the full CT displayed in Fig. 5.1. The following is the printout of this tree. node), s p l i t , n, deviance, y v a l , (yprob) * denotes t e r m i n a l node 1) root 300 659.20 c l a s s l (0.33330 0.33330 0.33330) 2) x.3<-4.13682 131 175.30 c l a s s 2 (0.28240 0.70230 0.01527) 4) x.6<-0.182365 77 36.99 c l a s s 2 (0.02597 0.94810 0.02597) * 5) x.6>-0.182365 54 70.05 c l a s s l (0.64810 0.35190 0.00000) * 3) x.3>-4.13682 169 279.90 c l a s s 3 (0.37280 0.04734 0.57990) 6) x.6<-1.1214 87 53.42 c l a s s 3 (0.00000 0.09195 0.90800) * 7) x.6>-1.1214 82 88.78 c l a s s l (0.76830 0.00000 0.23170) *
; classl
x.3<-4.13682 x.3>-4.13682
x.6<-0.182365
\
x.6<-1.1214
x.6>-0.182365
/
x.6>-1.1214
/
\
/
class2
classl
class3
classl
\
4/77
19/54
8/87
19/82
Fig. A.2. The pruned classification tree (by the MDL-based pruning algorithm) from the full tree shown in Fig. 5.1.
440
N. Saito
We note that the representation of an RT has the averages of the responses instead of the class names and has no class probabilities. Suppose T under consideration has k terminal nodes. Since this is a binary tree, it is easy to see that the total number of nodes including internal nodes is 2A: — 1. Encoding the tree structure is very simple: use 0 for internal nodes (in this example, the root and nodes # 2 and #3) and 1 for terminal nodes (nodes # 4 - # 7 ) . Any nontrivial tree, however, always starts with the root node (which is an internal node) and ends with the terminal node. Hence it is not necessary to describe the first 0 and the last 1. Now the structure of T can be represented as a binary string of length 2A: — 3 containing k — 1 Is for k > 1. The code for the example shown in Fig. A.2 is 01101. From (3.4), the description length of such binary string is log ( k J7X ). Except the tree structure, we still need to encode the information contained in each node. Each node t £ T has: (a) the coordinate index m (1 < m < N = dim(x)), (b) the coordinate threshold 8m which is a real number, and (c) the node value (i.e. the class label for CT, and the average of the responses for RT which is a real number). We note that the deviance is not necessary for a tree description. These code lengths (ignoring the integer requirement again) can be easily computed using the examples studied in Sec. 3. Let us use the same notation as the previous sections, i.e., N and C denote the number of total training samples and the number of classes in the training data set. Then the code lengths of the above three terms are: (a) logiV; (b) (1/2) log N; and (c) logC for CT and (1/2) log N for RT, respectively. We can still reduce the code length by noting that each splitting rule is duplicated with only difference in inequality directions as shown in the printout of the tree representation in S and S-PLUS. Nodes pointed by a left branch always have "<" and the ones pointed by a right branch have ">" in the splitting rule. Thus, we can reduce the description length of the coordinate indices and thresholds by half. Also, it is not necessary to describe these attributes for the root node; only information encoded at the root node is the node value whose description length is simply a constant c — log 7 where 7 = C for classification and 7 = ^/N for regression. Hence the total code length to describe the tree is L(T) = log ( 2 * J x 3 ) + (2k - 2 ) Q logn + i logiV + l o g 7 ) + c = log (*£'*)
+ ( * " ! ) logtnVJvV) + c.
(A.6)
Local Feature Extraction and Its Applications
441
Now we proceed to encoding the response vector y given the tree T, i.e. the second term of (A.4). For the CTs, each vector Xi in the training class reaches one of the terminal nodes in the tree. Let us record y* in that node. Then after dropping all the training vectors into this tree, each terminal node has a subsequence of the class sequence y = ( y i , . . . , VN). Rissanen suggests using the sum of the description lengths of these subsequences. More precisely, let y(t) denote the subsequence of y = (3/1,..., yjv) which reached t. Let Nc(t) be the number of class c samples reached t and let N(t) = J3 c =i ^c(*)- As we can see, the subsequence y(t) can be described as a string of C-ary alphabet of length N(t) whose complexity can be computed via (3.5). Thus, we have
L(V\T) = Y±Hv(t)\T),
(A.7)
tef where
For the RTs, we need to assume some probability distributions. Since S and S-PLUS assume the i.i.d. Gaussian model and the RSS is used as the deviance for growing trees, we also follow this assumption, i.e., t/< ~ A/"(/ii,
(A.9)
where y(T) denotes the RT estimate of /1 = (fii,. ■ -,MAT), i-e. a piecewise constant function, given T and d is a constant independent of T. We can now compute the MDL value (A.4) for any tree in CART combining (A.6)-(A.9). Remark A . l . We note that to compute the terms of log of binomial, say log (£) where p > q are both non-negative integers, it is faster to use the builtin log-gamma function if available than the plain multiplication for large p. Since p! = T(p + 1) we should expand
log (- j = iogr(p -i-1) - logrte +1) - io g r(p - q +1).
442
N. Saito
Then we only need three times of log-gamma function calls) rather than directly computing log(p!/g!(p — q)\) (i.e. 2(p — q) multiplications, 1 division, and 1 log function call). Figure A.3 shows the description lengths of the subtrees in the sequence. This plot suggests using the subtree with four terminal nodes for this example.
1 •
2
y y
y
y
\
V 10 size
16
20
Fig. A.3. A curve of subtree size vs. MDL value of the tree shown in Fig. 5.1. The subtree with four terminal nodes gives the minimum value.
Acknowledgments

First of all, I would like to thank my advisor, Professor Ronald Raphy Coifman, for his support, encouragement, useful suggestions, and enthusiastic discussions. I was very fortunate to have the opportunity to work with him. Many thanks are also due to Professor Gregory Beylkin of the University of Colorado at Boulder for introducing me to the world of wavelets and suggesting that I should pursue my Ph.D. at Yale under the supervision of Professor Coifman. Section 7 is a result of our continuing collaboration. I would like to thank Professor Yves Meyer of the Universite de Paris-Dauphine, and Professor Vladimir Rokhlin of the Computer Science
Department at Yale for serving as the reading committee members of my thesis. I have been quite influenced by Professor Meyer's books. I am also indebted to the work of Professor David Donoho of Stanford University and Professor Victor Wickerhauser of Washington University in St. Louis.

I would like to thank Professor Andrew Barron of the Statistics Department at Yale for helpful discussions. I also learned a lot from his course on information theory.

Many of my colleagues at Schlumberger-Doll Research gave me helpful suggestions and encouragement; in particular, I appreciate valuable inputs from Drs. Robert Burridge, Stefan Luthi, Douglas Miller, Ram Shenoy and Lisa Stewart. Schlumberger's management team has been supportive throughout my study at Yale. I would like to thank especially Dr. Bill Murphy and Mr. Luis Ayestaran, the former and the current directors of the Interpretation Science Department.

I would like to thank Toshiki Saito, my father, and Shigeko Yamaki, my grandmother, for making this possible through their constant encouragement and support. I was deeply affected by my mother Teruko, who passed away when I was 11 years old. She was the first to show me how interesting mathematics is. My parents-in-law, Masuto and Yoko Yashiki, and my aunt-in-law, Hiroko Katayama, always encouraged me and supported my study. My sons, Tomoya and Yuta, refreshed my mind many times. Last but not least, I would like to give my special thanks to my wife, Mayumi. Without her, I could not have completed this thesis at all.

This research was supported in part by Schlumberger-Doll Research and by the ARPA ATR program.

Bibliography

1. N. Ahmed and K. R. Rao, Orthogonal Transforms for Digital Signal Processing (Springer-Verlag, 1975).
2. S. Amari, Invariant structures of signal and feature space in pattern recognition problems, RAAG Memoirs 4 (1968) 553-566.
3. P. Auscher, G. Weiss and M. V. Wickerhauser, Local sine and cosine bases of Coifman and Meyer and the construction of smooth wavelets, in Wavelets: A Tutorial in Theory and Applications, ed. C. K. Chui (Academic Press, 1992), pp. 237-256.
4. B. R. Bakshi and G. Stephanopoulos, Wave-net: a multiresolution, hierarchical neural network with localized learning, AIChE J. 39 (1993) 57-81.
5. R. G. Baraniuk and D. L. Jones, Shear madness: new orthonormal bases and frames using chirp functions, IEEE Trans. Signal Processing 41 (1993) 3543-3549.
6. A. R. Barron, Complexity regularization with application to artificial neural networks, in Nonparametric Functional Estimation and Related Topics, ed. G. Roussas (Kluwer, 1991), pp. 561-576.
7. A. R. Barron and T. M. Cover, Minimum complexity density estimation, IEEE Trans. Inform. Theory 37 (1991) 1034-1054.
8. K. A. Bartels and A. C. Bovik, Shape change analysis and shape modeling using three dimensional biomedical images, Proc. ICASSP-93, Vol. 5 (IEEE, 1993), pp. 93-96.
9. M. Basseville, Distance measures for signal processing and pattern recognition, Signal Processing 18 (1989) 349-369.
10. R. A. Becker, J. M. Chambers and A. R. Wilks, The New S Language: A Programming Environment for Data Analysis and Graphics (Chapman & Hall, 1988).
11. Z. Berman and J. S. Baras, Properties of the multiscale maxima and zero-crossings representations, IEEE Trans. Signal Processing 41 (1993) 3216-3231.
12. G. Beylkin, On the representation of operators in bases of compactly supported wavelets, SIAM J. Numer. Anal. 29 (1992) 1716-1740.
13. G. Beylkin, R. Coifman and V. Rokhlin, Fast wavelet transforms and numerical algorithms I, Comm. Pure Appl. Math. 44 (1991) 141-183.
14. G. Beylkin and B. Torresani, Implementation of operators via filter banks, autocorrelation shell and Hardy wavelets, Appl. Comput. Harmonic Anal. 3 (1996) 164-185.
15. T. I. Boubez and R. L. Peskin, Multiresolution neural networks, in Wavelet Applications, ed. H. H. Szu, April 1994, Proc. SPIE 2242, pp. 649-660.
16. B. Bradie, R. Coifman and A. Grossmann, Fast numerical computations of oscillatory integrals related to acoustic scattering, I, Appl. Comput. Harmonic Anal. 1 (1993) 94-99.
17. J. N. Bradley and C. M. Brislawn, Image compression by vector quantization of multiresolution decompositions, Physica D 60 (1992) 245-258.
18. L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees (Chapman & Hall, 1993); previously published by Wadsworth & Brooks/Cole in 1984.
19. P. J. Burt, Fast filter transforms for image processing, Comput. Graphics Image Processing 16 (1981) 20-51.
20. P. J. Burt, Algorithms and architectures for smart sensing, Proc. Image Understanding Workshop, April 1988, pp. 139-153.
21. P. J. Burt and E. H. Adelson, The Laplacian pyramid as a compact image code, IEEE Trans. Commun. 31 (1983) 532-540.
22. D. E. Cannon and G. R. Coates, Applying mineral knowledge to standard log interpretation, Trans. Soc. Prof. Well Log Anal., 31st Annual Logging Symposium, 1990, Paper V.
23. J. M. Chambers and T. R. Hastie, Statistical Models in S (Chapman & Hall, 1992).
24. B. Cheng and D. M. Titterington, Neural networks: a review from a statistical perspective (with comments), Statist. Sci. 9 (1994) 2-54.
25. J. J. Clark, M. R. Palmer and P. D. Lawrence, A transformation method for the reconstruction of functions from nonuniformly spaced samples, IEEE Trans. Acoust., Speech, Signal Processing ASSP-33 (1985) 1151-1165.
26. L. A. Clark and D. Pregibon, Tree-based models, in Statistical Models in S, eds. J. M. Chambers and T. R. Hastie (Chapman & Hall, 1992), pp. 377-419.
27. A. Cohen, I. Daubechies and J.-C. Feauveau, Biorthogonal bases of compactly supported wavelets, Comm. Pure Appl. Math. 45 (1992) 485-560.
28. A. Cohen, I. Daubechies and P. Vial, Wavelets on the interval and fast wavelet transforms, Appl. Comput. Harmonic Anal. 1 (1993) 54-81.
29. R. Coifman, G. Matviyenko and Y. Meyer, Modulated Malvar-Wilson bases, Appl. Comput. Harmonic Anal. 4 (1997) 58-61.
30. R. R. Coifman and F. Majid, Adapted waveform analysis and denoising, in Progress in Wavelet Analysis and Applications, eds. Y. Meyer and S. Roques (Editions Frontieres, 1993), pp. 63-76.
31. R. R. Coifman and Y. Meyer, Nouvelles bases orthonormées de L²(R) ayant la structure du système de Walsh, preprint, Department of Mathematics, Yale University, New Haven, CT, August 1989.
32. R. R. Coifman and Y. Meyer, Orthonormal wave packet bases, preprint, Department of Mathematics, Yale University, New Haven, CT, 1990.
33. R. R. Coifman and Y. Meyer, Remarques sur l'analyse de Fourier à fenêtre, Comptes Rendus Acad. Sci. Paris, Série I 312 (1991) 259-261.
34. R. R. Coifman and N. Saito, Constructions of local orthonormal bases for classification and regression, Comptes Rendus Acad. Sci. Paris, Série I 319 (1994) 191-196.
35. R. R. Coifman and M. V. Wickerhauser, Entropy-based algorithms for best basis selection, IEEE Trans. Inform. Theory 38 (1992) 713-719.
36. R. R. Coifman and M. V. Wickerhauser, Wavelets and adapted waveform analysis, in Wavelets: Mathematics and Applications, eds. J. Benedetto and M. Frazier (CRC Press, 1993).
37. J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comput. 19 (1965) 297-301.
38. T. M. Cover, The best two independent measurements are not the two best, IEEE Trans. Syst. Man Cybern. SMC-4 (1974) 116-117.
39. T. M. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inform. Theory IT-13 (1967) 21-27.
40. T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Interscience, 1991).
41. S. R. Curtis and A. V. Oppenheim, Reconstruction of multidimensional signals from zero crossings, J. Opt. Soc. Amer. A 4 (1987) 221-231.
42. I. Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math. 41 (1988) 909-996.
43. I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 61 (SIAM, 1992).
44. I. Daubechies, Orthonormal bases of compactly supported wavelets II. Variations on a theme, SIAM J. Math. Anal. 24 (1993) 499-519.
45. I. Daubechies and J. C. Lagarias, Two-scale difference equations I. Existence and global regularity of solutions, SIAM J. Math. Anal. 22 (1991) 1388-1410.
46. I. Daubechies and J. C. Lagarias, Two-scale difference equations II. Local regularity, infinite products of matrices and fractals, SIAM J. Math. Anal. 23 (1992) 1031-1079.
47. G. Deslauriers and S. Dubuc, Symmetric iterative interpolation processes, Constructive Approx. 5 (1989) 49-68.
48. P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach (Prentice-Hall, 1982).
49. R. A. DeVore, B. Jawerth and B. J. Lucier, Image compression through wavelet transform coding, IEEE Trans. Inform. Theory 38 (1992) 719-746.
50. D. L. Donoho, Interpolating wavelet transforms, Technical Report 408, Department of Statistics, Stanford University, Stanford, CA, Oct. 1992.
51. D. L. Donoho, Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data, Proc. Symp. Appl. Math., ed. I. Daubechies (AMS, 1993), pp. 173-205.
52. D. L. Donoho, Wavelet shrinkage and W.V.D.: A 10-minute tour, in Progress in Wavelet Analysis and Applications, eds. Y. Meyer and S. Roques (Editions Frontieres, 1993), pp. 109-128.
53. D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (1994) 425-455.
54. N. R. Draper and H. Smith, Applied Regression Analysis, 2nd ed. (John Wiley & Sons, 1981).
55. S. Dubuc, Interpolation through an iterative scheme, J. Math. Anal. Appl. 114 (1986) 185-204.
56. P. Dutilleux, An implementation of the "algorithme à trous" to compute the wavelet transform, in Wavelets, Time-Frequency Methods and Phase Space, eds. J. M. Combes, A. Grossmann and Ph. Tchamitchian (Springer-Verlag, 1989), pp. 298-304.
57. M. Duval-Destin, M. A. Muschietti and B. Torresani, Continuous wavelet decompositions, multiresolution and contrast analysis, SIAM J. Math. Anal. 24 (1993) 739-755.
58. B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap (Chapman & Hall, 1993).
59. R. A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 7 (1936) 179-188.
60. J. H. Friedman and W. Stuetzle, Projection pursuit regression, J. Amer. Statist. Assoc. 76 (1981) 817-823.
61. J. H. Friedman and J. W. Tukey, A projection pursuit algorithm for exploratory data analysis, IEEE Trans. Comput. 23 (1974) 881-890.
62. K. S. Fu, Syntactic Pattern Recognition and Applications (Prentice-Hall, 1982).
63. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. (Academic Press, 1990).
64. U. Grenander, General Pattern Theory: A Mathematical Study of Regular Structures (Oxford Univ. Press, 1993).
65. K. Gröchenig and W. R. Madych, Multiresolution analysis, Haar bases, and self-similar tilings of R^n, IEEE Trans. Inform. Theory 38 (1992) 556-568.
66. A. Grossmann, Wavelet transforms and edge detection, in Stochastic Processes in Physics and Engineering, eds. S. Albeverio, P. Blanchard, M. Hazewinkel and L. Streit (D. Reidel, 1988), pp. 149-157.
67. H. Guo and S. B. Gelfand, Classification trees with neural network feature extraction, IEEE Trans. Neural Networks 3 (1992) 923-933.
68. W. S. Harlan, J. F. Claerbout and F. Rocca, Signal/noise separation and velocity estimation, Geophys. 49 (1984) 1869-1880.
69. C. Herley, J. Kovacevic, K. Ramchandran and M. Vetterli, Tilings of the time-frequency plane: construction of arbitrary orthogonal bases and fast tiling algorithms, IEEE Trans. Signal Processing 41 (1993) 3341-3359.
70. V. Hirsinger and B. E. Hobbs, A general harmonic coordinate transformation to simulate the states of strain in inhomogeneously deformed rocks, J. Struct. Geol. 5 (1983) 307-320.
71. R. E. Hoard, Sonic waveform logging: a new way to obtain subsurface geologic information, SPWLA 24th Annual Logging Symposium, 1983, Paper XX.
72. M. Holschneider, R. Kronland-Martinet, J. Morlet and A. Grossmann, A real-time algorithm for signal analysis with the help of the wavelet transform, in Wavelets, Time-Frequency Methods and Phase Space, eds. J. M. Combes, A. Grossmann and Ph. Tchamitchian (Springer-Verlag, 1989), pp. 286-297.
73. T. Hopper, Compression of gray-scale fingerprint images, in Wavelet Applications, ed. H. H. Szu, April 1994, Proc. SPIE 2242, pp. 180-187.
74. H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psych. 24 (1933) 417-441; 498-520.
75. K. Hsu, Wave separation and feature extraction of acoustic well-logging waveforms using Karhunen-Loeve transformation, Geophys. 55 (1990) 176-184.
76. P. J. Huber, Robust Statistics (John Wiley & Sons, 1981).
77. P. J. Huber, Projection pursuit (with discussion), Ann. Statist. 13 (1985) 435-525.
78. R. Hummel and R. Moniot, Reconstruction from zero crossings in scale space, IEEE Trans. Acoust., Speech, Signal Processing 37 (1989) 2111-2130.
79. K. Kanatani, Group-Theoretical Methods in Image Understanding (Springer-Verlag, 1990).
80. K. Karhunen, Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann. Acad. Sci. Fennicae, Ser. A 37 (1947) No. 1.
81. C. V. Kimball and T. L. Marzetta, Semblance processing of borehole acoustic array data, Geophys. 49 (1984) 274-281.
82. A. N. Kolmogorov, Three approaches to the quantitative definition of information, Problems Inform. Transmission 1 (1965) 1-7.
83. G. Korvin, Fractal Models in the Earth Sciences (Elsevier, 1992).
84. J. Kovacevic and M. Vetterli, Nonseparable multidimensional perfect reconstruction filter banks and wavelet bases for R^n, IEEE Trans. Inform. Theory 38 (1992) 533-555.
85. R. Kronland-Martinet, J. Morlet and A. Grossmann, Analysis of sound patterns through wavelet transforms, J. Pattern Recognition Artificial Intell. 1 (1987) 273-302.
86. J. B. Kruskal, Toward a practical method which helps uncover the structure of a set of multivariate observations by finding the linear transformation which optimizes a new 'index of condensation', in Statistical Computation, eds. R. C. Milton and J. A. Nelder (Academic Press, 1969), pp. 427-440.
87. S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951) 79-86.
88. Y. G. Leclerc, Constructing simple stable descriptions for image partitioning, Int. J. Computer Vision 3 (1989) 73-102.
89. R. Lenz, Group Theoretical Methods in Image Processing, Lecture Notes in Computer Science, Vol. 413 (Springer-Verlag, 1990).
90. M. Li and P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications (Springer-Verlag, 1993).
91. M. Loeve, Sur les fonctions aléatoires stationnaires de second ordre, Rev. Sci. 83 (1945) 297-310.
92. J. Lu, J. B. Weaver and D. M. Healy, Jr., Noise reduction with multiscale edge representation and perceptual criteria, Proc. IEEE Intern. Symp. Time-Frequency and Time-Scale Analysis, Victoria, British Columbia, October 1992, pp. 555-558.
93. S. Mallat, Multiresolution approximations and wavelet orthonormal bases in L²(R), Trans. Amer. Math. Soc. 315 (1989) 69-87.
94. S. Mallat, A theory for multiresolution signal decomposition, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 674-693.
95. S. Mallat, Zero-crossings of a wavelet transform, IEEE Trans. Inform. Theory 37 (1991) 1019-1033.
96. S. Mallat and W. L. Hwang, Singularity detection and processing with wavelets, IEEE Trans. Inform. Theory 38 (1992) 617-643.
97. S. Mallat and Z. Zhang, Matching pursuit with time-frequency dictionaries, IEEE Trans. Signal Processing 41 (1993) 3397-3415.
98. S. Mallat and S. Zhong, Characterization of signals from multiscale edges, IEEE Trans. Pattern Anal. Machine Intell. 14 (1992) 710-732.
99. H. S. Malvar, The LOT: transform coding without blocking effects, IEEE Trans. Acoust., Speech, Signal Processing 37 (1989) 553-559.
100. H. S. Malvar, Lapped transforms for efficient transform/subband coding, IEEE Trans. Acoust., Speech, Signal Processing 38 (1990) 969-978.
101. D. Marr, Vision (W. H. Freeman and Co., 1982).
102. D. Marr and E. Hildreth, Theory of edge detection, Proc. R. Soc. London, Ser. B 207 (1980) 187-217.
103. G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition (John Wiley & Sons, 1992).
104. Y. Meyer, Ondelettes et fonctions splines, Technical report, Séminaire EDP, École Polytechnique, Paris, 1986.
105. Y. Meyer, Wavelets: Algorithms and Applications (SIAM, 1993), translated and revised by R. D. Ryan.
106. Y. Meyer, Wavelets and Operators, Cambridge Studies in Advanced Mathematics, Vol. 57 (Cambridge Univ. Press, 1993), translated by D. H. Salinger.
107. M. L. Minsky and S. A. Papert, Perceptrons, expanded ed. (MIT Press, 1988).
108. D. Mumford, Pattern theory: a unifying perspective, Proc. First European Congress of Mathematicians (Birkhäuser, 1993).
109. W. Murphy, A. Reischer and K. Hsu, Modulus decomposition of compressional and shear velocities in sand bodies, Geophys. 58 (1993) 227-239.
110. W. Niblack, MDL Methods in Image Analysis and Computer Vision, IEEE Conf. Comput. Vision, Pattern Recognition, New York, June 1993, tutorial note.
111. S. Osher and L. I. Rudin, Feature-oriented image enhancement using shock filters, SIAM J. Numer. Anal. 27 (1990) 919-940.
112. N. Otsu, Mathematical studies on feature extraction in pattern recognition, Researches of the Electrotechnical Laboratory 818, Electrotechnical Laboratory, 1-1-4, Umezono, Sakura-machi, Niihari-gun, Ibaraki, Japan, July 1981 (in Japanese).
113. Y. Pati and P. Krishnaprasad, Analysis and synthesis of feedforward neural networks using discrete affine wavelet transforms, IEEE Trans. Neural Networks 4 (1993) 73-85.
114. T. Pavlidis, Structural Pattern Recognition (Springer-Verlag, 1977).
115. P. Perona and J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Machine Intell. 12 (1990) 629-639.
116. W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes in C, 2nd ed. (Cambridge Univ. Press, 1992).
117. J. R. Quinlan and R. L. Rivest, Inferring decision trees using the minimum description length principle, Information and Computation 80 (1989) 227-248.
118. J. Quirein, S. Kimminau, J. LaVigne, J. Singer and F. Wendel, A coherent framework for developing and applying multiple formation evaluation models, Trans. Soc. Prof. Well Log Anal., 27th Annual Logging Symposium, 1986, Paper DD.
119. M. G. Rahim, A neural tree network for phoneme classification, Proc. ICASSP-92 (IEEE, 1992), pp. 345-348.
120. C. R. Rao, Linear Statistical Inference and Its Applications, 2nd ed. (John Wiley & Sons, 1973).
121. K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, and Applications (Academic Press, 1990).
122. O. Rioul, Regular wavelets: a discrete-time approach, IEEE Trans. Signal Processing 41 (1993) 3572-3579.
123. O. Rioul and P. Duhamel, Fast algorithms for discrete and continuous wavelet transforms, IEEE Trans. Inform. Theory 38 (1992) 569-586.
124. O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE SP Magazine 8 (1991) 14-38.
125. B. D. Ripley, Statistical aspects of neural networks, in Networks and Chaos: Statistical and Probabilistic Aspects, eds. O. E. Barndorff-Nielsen, J. L. Jensen, D. R. Cox and W. S. Kendall (Chapman & Hall, 1993), pp. 40-123.
126. J. Rissanen, A universal prior for integers and estimation by minimum description length, Ann. Statist. 11 (1983) 416-431.
127. J. Rissanen, Universal coding, information, prediction, and estimation, IEEE Trans. Inform. Theory 30 (1984) 629-636.
128. J. Rissanen, Stochastic Complexity in Statistical Inquiry (World Scientific, 1989).
129. N. Saito, Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion, in Wavelets in Geophysics, eds. E. Foufoula-Georgiou and P. Kumar (Academic Press, 1994), pp. 299-324.
130. N. Saito, Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion, in Wavelet Applications, ed. H. H. Szu, April 1994, Proc. SPIE 2242, pp. 224-235.
131. N. Saito and G. Beylkin, Multiresolution representations using the autocorrelation functions of compactly supported wavelets, Tech. report, Schlumberger-Doll Research, August 1991; expanded abstract in Proc. ICASSP-92, Vol. 4, March 1992, pp. 381-383.
132. N. Saito and G. Beylkin, Multiresolution representations using the auto-correlation functions of compactly supported wavelets, Proc. ICASSP-92 (IEEE, 1992), Vol. 4, pp. 381-384.
133. N. Saito and G. Beylkin, Multiresolution representations using the auto-correlation functions of compactly supported wavelets, IEEE Trans. Signal Processing 41 (1993) 3584-3590.
134. N. Saito and G. Beylkin, Multiresolution representations using the auto-correlation functions of wavelets, in Progress in Wavelet Analysis and Applications, eds. Y. Meyer and S. Roques (Editions Frontieres, 1993), pp. 721-726.
135. N. Saito and R. R. Coifman, Local discriminant bases, in Mathematical Imaging: Wavelet Applications in Signal and Image Processing, eds. A. F. Laine and M. A. Unser, July 1994, Proc. SPIE 2303.
136. H. Sakoe and S. Chiba, A dynamic programming approach to continuous speech recognition, in Proc. 7th Int. Congress Acoust., Budapest, 1971, Paper 20C-13.
137. J. Segman, J. Rubinstein and Y. Y. Zeevi, The canonical coordinates method for pattern deformation: theoretical and computational considerations, IEEE Trans. Pattern Anal. Machine Intell. 14 (1992) 1171-1183.
138. J. M. Shapiro, Image coding using the embedded zerotree wavelet algorithm, in Mathematical Imaging: Wavelet Applications in Signal and Image Processing, ed. A. F. Laine (IEEE, 1993), Proc. SPIE 2034, pp. 180-193.
139. M. J. Shensa, The discrete wavelet transform: wedding the à trous and Mallat algorithms, IEEE Trans. Signal Processing 40 (1992) 2464-2482.
140. StatSci, S-PLUS Reference Manual, Vols. 1 & 2, version 3.2, Seattle, WA, December 1993.
141. J.-E. Strömberg, J. Zrida and A. Isaksson, Neural trees - using neural nets in a tree classifier structure, Proc. ICASSP-91 (IEEE, 1991), pp. 137-140.
142. H. H. Szu, B. Telfer and S. Kadambe, Neural network adaptive wavelets for signal representation and classification, Opt. Engrg. 31 (1992) 1907-1916.
143. R. H. Tatham, Vp/Vs and lithology, Geophys. 47 (1982) 336-344.
144. J. Tittman, Geophysical Well Logging (Academic Press, 1986).
145. G. T. Toussaint, Note on optimal selection of independent binary-valued features for pattern recognition, IEEE Trans. Inform. Theory IT-17 (1971) 618.
146. D. L. Turcotte, Fractals and Chaos in Geology and Geophysics (Cambridge Univ. Press, 1992).
147. M. Vetterli and C. Herley, Wavelets and filter banks: theory and design, IEEE Trans. Signal Processing 40 (1992) 2207-2232.
148. C. S. Wallace and J. D. Patrick, Coding decision trees, Machine Learning 11 (1993) 7-22.
149. R. S. Wallace, Finding natural clusters through entropy minimization, Ph.D. thesis, School of Computer Science, Carnegie Mellon Univ., Pittsburgh, June 1989.
150. S. Watanabe, Karhunen-Loeve expansion and factor analysis: theoretical remarks and applications, Trans. 4th Prague Conf. Inform. Theory, Statist. Decision Functions, Random Processes, Prague (Czechoslovak Acad. of Sci., 1967), pp. 635-660.
151. S. Watanabe, Pattern recognition as a quest for minimum entropy, Pattern Recognition 13 (1981) 381-387.
152. S. Watanabe, Pattern Recognition: Human and Mechanical (John Wiley & Sons, 1985).
153. M. Wax and T. Kailath, Detection of signals by information theoretic criteria, IEEE Trans. Acoust., Speech, Signal Processing ASSP-33 (1985) 387-392.
154. J. E. White, Underground sound: applications of seismic waves, in Methods in Geochemistry and Geophysics, Vol. 18 (Elsevier, 1983).
155. M. V. Wickerhauser, Fast approximate factor analysis, in Curves and Surfaces in Computer Vision and Graphics II, October 1991, Proc. SPIE 1610, pp. 23-32.
156. M. V. Wickerhauser, High-resolution still picture compression, Digital Signal Processing: A Review J. 2 (1992) 204-226.
157. M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software (A. K. Peters, Ltd., 1994), with diskette.
158. K. W. Winkler and W. F. Murphy III, Acoustic velocity and attenuation in porous rocks, AGU Review Volume (Amer. Geophys. Union, 1994).
159. Y. Y. Zeevi and E. Shlomot, Nonuniform sampling and antialiasing in image representation, IEEE Trans. Signal Processing 41 (1993) 1223-1236.
160. Q. Zhang and A. Benveniste, Wavelet networks, IEEE Trans. Neural Networks 3 (1992) 889-898.